
💫 Deploy SPARQL endpoints from RDFLib Graphs to serve RDF files, machine learning models, or any other logic implemented in Python

Home Page: https://pypi.org/project/rdflib-endpoint

License: MIT License

sparql sparql-endpoints rdflib fastapi rdf python oxigraph

rdflib-endpoint's Introduction

💫 SPARQL endpoint for RDFLib


rdflib-endpoint is a SPARQL endpoint based on RDFLib to easily serve RDF files locally, machine learning models, or any other logic implemented in Python via custom SPARQL functions.

It aims to enable Python developers to easily deploy functions that can be queried in a federated fashion using SPARQL. For example: using a Python function to resolve labels for specific identifiers, or running a classifier on entities retrieved via a SERVICE query to another SPARQL endpoint.

Feel free to create an issue or send a pull request if you are facing problems or would like to see a feature implemented.

โ„น๏ธ How it works

rdflib-endpoint can be used directly from the terminal to quickly serve RDF files through a SPARQL endpoint automatically deployed locally.

It can also be used to define custom SPARQL functions: the user defines and registers custom SPARQL functions and/or populates the RDFLib Graph using Python, then starts the endpoint using uvicorn/gunicorn.

The deployed SPARQL endpoint can be used as a SERVICE in a federated SPARQL query from regular triplestores' SPARQL endpoints. Tested with OpenLink Virtuoso and Ontotext GraphDB (RDF4J based). The endpoint is CORS-enabled by default so it can be queried from client-side JavaScript (this can be turned off).

Built with RDFLib and FastAPI.

๐Ÿ“ฆ๏ธ Installation

This package requires Python >=3.8. Install it from PyPI with:

pip install rdflib-endpoint

The uvicorn and gunicorn dependencies are not included by default; if you want to install them, use the optional dependency web:

pip install "rdflib-endpoint[web]"

If you want to use rdflib-endpoint as a CLI, install it with the optional dependency cli:

pip install "rdflib-endpoint[web,cli]"

If you want to use Oxigraph as the backend triplestore, install it with the optional dependency oxigraph:

pip install "rdflib-endpoint[web,cli,oxigraph]"

Warning

Oxigraph and oxrdflib do not support custom functions, so they can only be used to deploy graphs without custom functions.

โŒจ๏ธ Use the CLI

rdflib-endpoint can be used from the command line interface to perform basic utility tasks, such as serving or converting RDF files locally.

Make sure you installed rdflib-endpoint with the cli optional dependency:

pip install "rdflib-endpoint[cli]"

โšก๏ธ Quickly serve RDF files through a SPARQL endpoint

Use rdflib-endpoint as a command line interface (CLI) in your terminal to quickly serve one or multiple RDF files as a SPARQL endpoint.

You can use wildcards and provide multiple files; for example, to serve all Turtle, JSON-LD, and N-Quads files in the current folder you could run:

rdflib-endpoint serve *.ttl *.jsonld *.nq

Then access the YASGUI SPARQL editor at http://localhost:8000

If you installed the Oxigraph optional dependency, you can use it as the backend triplestore; it is faster and supports some functions that the RDFLib query engine does not (such as COALESCE()):

rdflib-endpoint serve --store Oxigraph "*.ttl" "*.jsonld" "*.nq"

🔄 Convert RDF files to another format

rdflib-endpoint can also be used to quickly merge and convert files from multiple formats to a specific format:

rdflib-endpoint convert "*.ttl" "*.jsonld" "*.nq" --output "merged.trig"

✨ Deploy your SPARQL endpoint

rdflib-endpoint enables you to easily define and deploy SPARQL endpoints based on an RDFLib Graph, ConjunctiveGraph, or Dataset. Additionally, it provides helpers to define custom functions in the endpoint.

Check out the example folder for a complete working app, including a Docker deployment. A good way to create a new SPARQL endpoint is to copy this example folder and start from it.

🚨 Deploy as a standalone API

Deploy your SPARQL endpoint as a standalone API:

from rdflib import ConjunctiveGraph
from rdflib_endpoint import SparqlEndpoint

# Start the SPARQL endpoint based on a RDFLib Graph and register your custom functions
g = ConjunctiveGraph()
# TODO: Add triples in your graph

# Then use either SparqlEndpoint or SparqlRouter, they take the same arguments
app = SparqlEndpoint(
    graph=g,
    path="/",
    cors_enabled=True,
    # Metadata used for the SPARQL service description and Swagger UI:
    title="SPARQL endpoint for RDFLib graph",
    description="A SPARQL endpoint to serve machine learning models, or any other logic implemented in Python. \n[Source code](https://github.com/vemonet/rdflib-endpoint)",
    version="0.1.0",
    public_url='https://your-endpoint-url/',
    # Example query displayed in YASGUI default tab
    example_query="""PREFIX myfunctions: <https://w3id.org/um/sparql-functions/>
SELECT ?concat ?concatLength WHERE {
    BIND("First" AS ?first)
    BIND(myfunctions:custom_concat(?first, "last") AS ?concat)
}""",
    # Additional example queries displayed in additional YASGUI tabs
    example_queries={
        "Bio2RDF query": {
            "endpoint": "https://bio2rdf.org/sparql",
            "query": """SELECT DISTINCT * WHERE {
    ?s a ?o .
} LIMIT 10""",
        },
        "Custom function": {
            "query": """PREFIX myfunctions: <https://w3id.org/um/sparql-functions/>
SELECT ?concat ?concatLength WHERE {
    BIND("First" AS ?first)
    BIND(myfunctions:custom_concat(?first, "last") AS ?concat)
}""",
        },
    }
)

Finally, deploy this app using uvicorn (see below).

๐Ÿ›ฃ๏ธ Deploy as a router to include in an existing API

Deploy your SPARQL endpoint as an APIRouter to include in an existing FastAPI application. The SparqlRouter constructor takes the same arguments as SparqlEndpoint, apart from cors_enabled, which needs to be enabled at the API level.

from fastapi import FastAPI
from rdflib import ConjunctiveGraph
from rdflib_endpoint import SparqlRouter

g = ConjunctiveGraph()
sparql_router = SparqlRouter(
    graph=g,
    path="/",
    # Metadata used for the SPARQL service description and Swagger UI:
    title="SPARQL endpoint for RDFLib graph",
    description="A SPARQL endpoint to serve machine learning models, or any other logic implemented in Python. \n[Source code](https://github.com/vemonet/rdflib-endpoint)",
    version="0.1.0",
    public_url='https://your-endpoint-url/',
)

app = FastAPI()
app.include_router(sparql_router)

TODO: add docs on integrating with a Flask app

๐Ÿ“ Define custom SPARQL functions

This option makes it easier to define functions in your SPARQL endpoint, e.g. BIND(myfunction:custom_concat("start", "end") AS ?concat). It can be used with the SparqlEndpoint and SparqlRouter classes.

Create an app/main.py file in your project folder with your custom SPARQL functions and endpoint parameters:

import rdflib
from rdflib import ConjunctiveGraph
from rdflib.plugins.sparql.evalutils import _eval
from rdflib_endpoint import SparqlEndpoint

def custom_concat(query_results, ctx, part, eval_part):
    """Concatenate 2 strings in both orders, and return the length of each result as an additional Length variable
    """
    # Retrieve the 2 input arguments
    argument1 = str(_eval(part.expr.expr[0], eval_part.forget(ctx, _except=part.expr._vars)))
    argument2 = str(_eval(part.expr.expr[1], eval_part.forget(ctx, _except=part.expr._vars)))
    evaluation = []
    scores = []
    # Prepare the 2 result lists: 1 for the concatenations, 1 for their lengths
    evaluation.append(argument1 + argument2)
    evaluation.append(argument2 + argument1)
    scores.append(len(argument1 + argument2))
    scores.append(len(argument2 + argument1))
    # Append the results for our custom function
    for i, result in enumerate(evaluation):
        query_results.append(eval_part.merge({
            part.var: rdflib.Literal(result),
            # With an additional custom var for the length
            rdflib.term.Variable(part.var + 'Length'): rdflib.Literal(scores[i])
        }))
    return query_results, ctx, part, eval_part

# Start the SPARQL endpoint based on a RDFLib Graph and register your custom functions
g = ConjunctiveGraph()
# Use either SparqlEndpoint or SparqlRouter, they take the same arguments
app = SparqlEndpoint(
    graph=g,
    path="/",
    # Register the functions:
    functions={
        'https://w3id.org/um/sparql-functions/custom_concat': custom_concat
    },
    cors_enabled=True,
    # Metadata used for the SPARQL service description and Swagger UI:
    title="SPARQL endpoint for RDFLib graph",
    description="A SPARQL endpoint to serve machine learning models, or any other logic implemented in Python. \n[Source code](https://github.com/vemonet/rdflib-endpoint)",
    version="0.1.0",
    public_url='https://your-endpoint-url/',
    # Example queries displayed in the Swagger UI to help users try your function
    example_query="""PREFIX myfunctions: <https://w3id.org/um/sparql-functions/>
SELECT ?concat ?concatLength WHERE {
    BIND("First" AS ?first)
    BIND(myfunctions:custom_concat(?first, "last") AS ?concat)
}"""
)

โœ’๏ธ Or directly define the custom evaluation

You can also directly provide a custom evaluation function; note that this overrides anything registered via the functions argument.

Refer to the RDFLib documentation to define the custom evaluation function. Then provide it when instantiating the SPARQL endpoint:

import rdflib
from rdflib import ConjunctiveGraph
from rdflib.namespace import RDF, RDFS
from rdflib.plugins.sparql.evaluate import evalBGP
from rdflib_endpoint import SparqlEndpoint

g = ConjunctiveGraph()

def custom_eval(ctx, part):
    """Rewrite triple patterns to get super-classes"""
    if part.name == "BGP":
        # rewrite triples
        triples = []
        for t in part.triples:
            if t[1] == RDF.type:
                bnode = rdflib.BNode()
                triples.append((t[0], t[1], bnode))
                triples.append((bnode, RDFS.subClassOf, t[2]))
            else:
                triples.append(t)
        # delegate to normal evalBGP
        return evalBGP(ctx, triples)
    raise NotImplementedError()

app = SparqlEndpoint(
    graph=g,
    custom_eval=custom_eval
)

🦄 Run the SPARQL endpoint

You can then run the SPARQL endpoint server with uvicorn from the folder where your script is defined; it will be served on http://localhost:8000 (uvicorn is installed automatically with the web optional dependency):

uvicorn main:app --app-dir example/app --reload

Check out example/README.md for more details, such as deploying it with Docker.

📂 Projects using rdflib-endpoint

Here are some projects using rdflib-endpoint to deploy custom SPARQL endpoints with Python:

  • The Bioregistry, an open source, community curated registry, meta-registry, and compact identifier resolver.
  • proycon/codemeta-server, a server for codemeta: an in-memory triple store, SPARQL endpoint, and simple web-based visualisation for end users

๐Ÿ› ๏ธ Contributing

To run the project in development and make a contribution, check out the contributing page.

rdflib-endpoint's People

Contributors

cthoyt, datadavev, steve-bate, vemonet


rdflib-endpoint's Issues

Mounting rdflib-endpoint on existing FastAPI app

In my service, I already have a FastAPI app instantiated (called app). Can you document how to mount the endpoint as if it were a fastapi.APIRouter, so I can do something like app.include_router(rdflib_endpoint_router)?

SPARQL Updates

The SPARQL endpoint doesn't support Delete and Insert. There's a comment in the code:

# TODO: RDFLib doesn't support SPARQL insert (Expected {SelectQuery | ConstructQuery | DescribeQuery | AskQuery}, found 'INSERT')

At least the current version of the library does support it. However, update processing uses different APIs than query processing: prepareUpdate instead of prepareQuery, and Graph.update instead of Graph.query. I'll investigate creating a PR to support updates.

https://rdflib.readthedocs.io/en/stable/intro_to_sparql.html#update-queries

Creating SparqlEndpoint breaks some rdflib.Graph queries

Managed to distill it to this piece of code.

from rdflib import ConjunctiveGraph, URIRef
from rdflib_endpoint.sparql_endpoint import SparqlEndpoint

query = """
select * where {
  bind ( exists { select * where {?s ?p ?o .} } as ?b0 )
}
"""

graph = ConjunctiveGraph()
dummy = URIRef("dummy")
graph.add((dummy, dummy, dummy))
for row in graph.query(query):
    print(row)
# Prints
# (rdflib.term.Literal('true', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#boolean')),)
# as expected

app = SparqlEndpoint(graph=ConjunctiveGraph(), path="/")

for row in graph.query(query):
    print(row)
# Prints nothing. WTF?

Looks like even simply creating a SparqlEndpoint breaks some queries, even on completely unrelated graphs. Only some queries break; e.g. select * where {?s ?p ?o .} will still work fine, but it looks like anything containing BIND breaks.

Add json-ld to CONTENT_TYPE_TO_RDFLIB_FORMAT ?

Hi Vincent,

Thank you for writing this endpoint around rdfLib!
I enjoy using it for my local development, where I can just use this instead of a more complex setup.

There is just a small compatibility issue with parsing the results for my needs.
The easiest format for my services to parse is JSON.
So it would be awesome to get a JSON-LD response back after, say, a CONSTRUCT query.
Currently we can only get Turtle or XML.
I believe it should be as simple as adding it to CONTENT_TYPE_TO_RDFLIB_FORMAT.
Would you mind having a look?

Happy coding :)
Gerbert

https://rdflib.readthedocs.io/en/stable/plugin_serializers.html

It is also possible to pass a mime-type for the format parameter:

why not just pass the mime-type?
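For context, the mapping the issue refers to is roughly of this shape. This is a hypothetical sketch (the real CONTENT_TYPE_TO_RDFLIB_FORMAT in rdflib_endpoint, and the negotiate helper, may differ), with the JSON-LD entry being the requested addition:

```python
# Hypothetical sketch of a media-type -> RDFLib serializer-name mapping
CONTENT_TYPE_TO_RDFLIB_FORMAT = {
    "text/turtle": "turtle",
    "application/rdf+xml": "xml",
    "application/n-triples": "nt",
    # The addition the issue asks for:
    "application/ld+json": "json-ld",
}

def negotiate(accept_header: str, default: str = "text/turtle") -> str:
    """Pick the first offered media type we can serialize, else the default."""
    for mime in (m.split(";")[0].strip() for m in accept_header.split(",")):
        if mime in CONTENT_TYPE_TO_RDFLIB_FORMAT:
            return CONTENT_TYPE_TO_RDFLIB_FORMAT[mime]
    return CONTENT_TYPE_TO_RDFLIB_FORMAT[default]
```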

Swagger examples for application/*json are not shown

"api_responses" in sparql_router.py does not have "examples" set. The remaining media-types have examples, and work perfectly in swagger.

I am not sure whether this is done by purpose because of backwards compatability or not, but
api_responses[200]["content"]["application/sparql-results+json"]["example"] = "Example" works, so I think this is easy to solve. :)

SPARQL query containing 'coalesce' returns no result on rdflib-endpoint.SparqlEndpoint()

Hello!

My setup:

I define a graph, g = Graph(), then I
g.parse() a number of .ttl files and create the endpoint with SparqlEndpoint(graph=g).
Then I use uvicorn.run(app, ...) to expose the endpoint on my local machine.

I can successfully run this simple sparql statement to query for keywords:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT 
?keyword
WHERE { ?subj dcat:keyword ?keyword }

However, as soon as I add a coalesce statement, the query does not return results any more:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT 
?keyword
(coalesce(?keyword, "xyz") as ?foo) 
WHERE { ?subj dcat:keyword ?keyword }

I tried something similar on wikidata which has no problems:

SELECT ?item ?itemLabel 
(coalesce(?itemLabel, 2) as ?foo)
WHERE 
{
  ?item wdt:P31 wd:Q146. # Must be of a cat
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } 
}

The server debug output for the keywords query looks ok to me.

INFO:     127.0.0.1:43942 - "GET /sparql?format=json&query=%0APREFIX+dcat%3A+%3Chttp%3A%2F%2Fwww.w3.org%2Fns%2Fdcat%23%3E%0ASELECT+%0A%3Fkeyword%0A%28coalesce%28%3Fkeyword%2C+%22xyz%22%29+as+%3Ffoo%29+%0AWHERE+%7B+%3Fsubj+dcat%3Akeyword+%3Fkeyword+%7D%0ALIMIT+10%0A HTTP/1.1" 200 OK

Putting COALESCE in the WHERE clause does not solve the problem:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT
?keyword
?foo
WHERE {
?subj dcat:keyword ?keyword
BIND(coalesce(?keyword, 2) as ?foo)
}

Am I missing something?

Thanks in advance

Mounting on a Flask App

A web service I want to include a rdflib-endpoint instance in is built in Flask, can you add documentation on how to mount a rdflib_endpoint.SparqlEndpoint to an existing flask.Flask that doesn't require converting everything over to FastAPI?

Should be able to use prefixes bound in graph's NamespaceManager

Prefixes can be bound to a graph using the NamespaceManager. This makes them globally available, when loading and serializing as well as when querying.

rdflib-endpoint, however, requires prefixes to be declared in the query input even for those bound to the graph, because prepareQuery is not aware of the graph.

It would be convenient if the SPARQL endpoint could use the globally bound prefixes. My suggestion would be to drop the prepareQuery completely. I actually don't see why it is there in the first place, given that the prepared query is never used later on.

Missing dependencies `uvicorn` and `click`

When trying out project via :

pipx install git+https://github.com/vemonet/rdflib-endpoint.git

I've encountered problems with missing click and later uvicorn modules:

Traceback (most recent call last):
  File "/home/user/.local/bin/rdflib-endpoint", line 5, in <module>
    from rdflib_endpoint.__main__ import cli
  File "/home/user/.local/pipx/venvs/rdflib-endpoint/lib/python3.11/site-packages/rdflib_endpoint/__main__.py", line 5, in <module>
    import click
ModuleNotFoundError: No module named 'click'

I've injected them manually into pipx environment:

pipx inject rdflib-endpoint click uvicorn

However, maybe it's possible to add them to your dependencies so things just work.

Construct query

Using rdflib-endpoint more frequently, I ran into an issue with a SPARQL CONSTRUCT query.

The SPARQL (without prefixes):

CONSTRUCT {
  ?work dcterms:date ?jaar .
}
WHERE {
{  select ?work (min(?year) as ?jaar)
    where {
          ?work dc:date ?year .
    }
    group by ?work }
}

The correct tuples are selected, but no triples are created.
Using rdflib directly does give the correct triples.

SPARQLWrapper.setQuery() format does not work

Using v0.4.0 on Python 3.9.17 with Poetry (version 1.6.1).

I am spinning up a SparqlEndpoint with the following file:

<http://localhost/concept-scheme#this> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#ConceptScheme> .
<http://localhost/concept1#this> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
<http://localhost/concept1#this> <http://www.w3.org/2004/02/skos/core#prefLabel> "Concept 1" .
<http://localhost/concept1#this> <http://www.w3.org/2004/02/skos/core#inScheme> <http://localhost/concept-scheme#this> .
<http://localhost/concept2#this> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
<http://localhost/concept2#this> <http://www.w3.org/2004/02/skos/core#prefLabel> "Concept 2" .
<http://localhost/concept2#this> <http://www.w3.org/2004/02/skos/core#inScheme> <http://localhost/concept-scheme#this> .
<http://localhost/concept3#this> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
<http://localhost/concept3#this> <http://www.w3.org/2004/02/skos/core#prefLabel> "Concept 3" .

I am querying using this function:

def sparql_query(sparql_endpoint: str, query: str, return_format=None):
    """
    Query graph from URL - executes SPARQL over HTTP and returns data (Dict or Graph).
    """
    sparql = SPARQLWrapper(sparql_endpoint)
    sparql.setQuery(query)
    if return_format is not None:
        sparql.setReturnFormat(return_format)
    return sparql.query().convert()
        
sparql_query(sparql_endpoint="http://localhost:8000", query="PREFIX skos: <http://www.w3.org/2004/02/skos/core#> SELECT DISTINCT ?scheme WHERE { ?scheme a skos:ConceptScheme }", return_format=SPARQLWrapper.JSON)

I run rdflib-endpoint:

rdflib-endpoint serve --store Oxigraph skos.nt

When I query, it returns xml.dom.minidom instead of JSON.

YASGUI not working

When I run the PyPI package, the YASGUI UI doesn't initialize. It looks like the $EXAMPLE_QUERIES placeholder in the yasgui.html template is replaced with null, which causes the later Object.keys operation on queries_obj to fail.

        <script>
            Yasqe.defaults.value = `$EXAMPLE_QUERY`
            const queries_obj = $EXAMPLE_QUERIES
...
