RDFStream4j

RDFStream4j is an in-memory continuous SPARQL query engine built with the Sesame RDF framework (soon to be RDF4j). It implements an open-world subset (see below) of the SPARQL query language and uses an incremental technique based on the symmetric hash join, responding to streaming RDF statements as soon as possible with SPARQL query answers and discarding irrelevant statements. RDFStream4j uses time-to-live to process infinite streams of data: queries time out and are removed unless renewed, while partial solutions to queries exist in the query engine only as long as their shortest-lived statement, making room for fresh data as they expire. RDFStream4j integrates with LinkedDataSail, following links in response to join operations.
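To make the incremental technique concrete, here is a minimal sketch of a symmetric hash join over two inputs which share one variable. It is not RDFStream4j's implementation: the class `SymmetricHashJoinSketch` and its methods are hypothetical names chosen for illustration, and solutions are reduced to plain strings. The point is only that each arriving solution is indexed on its own side and then immediately probed against the other side, so answers appear as soon as both halves are present, whichever arrives first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from RDFStream4j.
final class SymmetricHashJoinSketch {
    // One hash table per input, each keyed by the value bound to the shared variable.
    private final Map<String, List<String>> leftTable = new HashMap<>();
    private final Map<String, List<String>> rightTable = new HashMap<>();

    // A solution of the left pattern arrives: index it, then probe the right table.
    List<String> addLeft(String joinValue, String leftBinding) {
        leftTable.computeIfAbsent(joinValue, k -> new ArrayList<>()).add(leftBinding);
        List<String> answers = new ArrayList<>();
        for (String rightBinding
                : rightTable.getOrDefault(joinValue, Collections.<String>emptyList())) {
            answers.add(leftBinding + " | " + joinValue + " | " + rightBinding);
        }
        return answers;
    }

    // A solution of the right pattern arrives: index it, then probe the left table.
    List<String> addRight(String joinValue, String rightBinding) {
        rightTable.computeIfAbsent(joinValue, k -> new ArrayList<>()).add(rightBinding);
        List<String> answers = new ArrayList<>();
        for (String leftBinding
                : leftTable.getOrDefault(joinValue, Collections.<String>emptyList())) {
            answers.add(leftBinding + " | " + joinValue + " | " + rightBinding);
        }
        return answers;
    }

    public static void main(String[] args) {
        SymmetricHashJoinSketch join = new SymmetricHashJoinSketch();
        // The gesture event arrives first: there is nothing to join with yet.
        System.out.println(join.addLeft("dbr:The_Meaning_of_Liff", "actor=JdGwZ4n"));
        // The authorship statement arrives later and completes the answer at once.
        System.out.println(join.addRight("dbr:The_Meaning_of_Liff", "author=dbr:Douglas_Adams"));
    }
}
```

Because both sides are indexed, the order of arrival does not matter; a real engine would additionally drop statements which match no query pattern and expire table entries according to their time-to-live.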

Below is a usage example in Java. See the source code for the full example.

// A query for things written by Douglas Adams which are referenced with a pointing gesture
String query = "PREFIX activity: <http://fortytwo.net/2015/extendo/activity#>\n" +
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n" +
        "PREFIX dbr: <http://dbpedia.org/resource/>\n" +
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n" +
        "SELECT ?actor ?indicated WHERE {\n" +
        "?a activity:thingIndicated ?indicated .\n" +
        "?a activity:actor ?actor .\n" +
        "?indicated dbo:author dbr:Douglas_Adams .\n" +
        "}";

// An RDF graph representing an event. Normally, this would come from a dynamic data source.
// The example is from the Typeatron keyer (see http://github.com/joshsh/extendo)
String eventData = "@prefix activity: <http://fortytwo.net/2015/extendo/activity#> .\n" +
        "@prefix dbr: <http://dbpedia.org/resource/> .\n" +
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n" +
        "@prefix tl: <http://purl.org/NET/c4dm/timeline.owl#> .\n" +
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n" +
        "\n" +
        "<urn:uuid:e6f4c759-712c-448c-96f0-c2ecee2ccb97> a activity:Point ;\n" +
        "    activity:actor <http://fortytwo.net/josh/things/JdGwZ4n> ;\n" +
        "    activity:thingIndicated dbr:The_Meaning_of_Liff ;\n" +
        "    activity:recognitionTime <urn:uuid:a4a2fd8c-ea0d-43bb-bcad-6510f4c9b55a> .\n" +
        "\n" +
        "<urn:uuid:a4a2fd8c-ea0d-43bb-bcad-6510f4c9b55a> a tl:Instant ;\n" +
        "    tl:at \"2015-02-13T21:00:12-05:00\"^^xsd:dateTime .";

// Instantiate the query engine.
QueryEngineImpl queryEngine = new QueryEngineImpl();

// Define a time-to-live for the query. It will expire after this many seconds,
// freeing up resources and ceasing to match statements.
int queryTtl = 10 * 60;

// Define a handler for answers to the query.
BindingSetHandler handler = new BindingSetHandler() {
    public void handle(final BindingSet answer) {
        System.out.println("found an answer to the query: " + answer);
    }
};

// Submit the query to the query engine to obtain a subscription.
Subscription sub = queryEngine.addQuery(queryTtl, query, handler);

// create subscriptions for additional queries at any time; queries match in parallel

// Add some data with infinite (= 0) time-to-live.
// Results derived from this data will never expire.
int staticTtl = 0;

// Add some static background knowledge.  Alternatively, let RDFStream4j discover this
// information as Linked Data (see LinkedDataExample.java).
Statement st = new StatementImpl(
        new URIImpl("http://dbpedia.org/resource/The_Meaning_of_Liff"),
        new URIImpl("http://dbpedia.org/ontology/author"),
        new URIImpl("http://dbpedia.org/resource/Douglas_Adams"));
queryEngine.addStatements(staticTtl, st);

// Now define a finite time-to-live of 30 seconds.
// This will be used for the short-lived data of gesture events.
int eventTtl = 30;

RDFFormat format = RDFFormat.TURTLE;
RDFParser parser = Rio.createParser(format);
parser.setRDFHandler(queryEngine.createRDFHandler(eventTtl));
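// As new statements are added, computed query answers will be pushed to the BindingSetHandler.
// An explicit charset (java.nio.charset.StandardCharsets) avoids relying on the platform default.
parser.parse(new ByteArrayInputStream(eventData.getBytes(StandardCharsets.UTF_8)), "");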

// cancel the query subscription at any time;
// no further answers will be computed/produced for the corresponding query
sub.cancel();

// alternatively, renew the subscription for another 10 minutes
sub.renew(10 * 60);
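The way the time-to-live values in this example combine can be sketched as follows. This is an illustration under stated assumptions, not RDFStream4j code: the class `TtlSketch` and its methods are hypothetical, time is passed in explicitly as milliseconds, and the only rules taken from the text above are that a TTL of 0 means "never expires" and that a solution lives only as long as its shortest-lived statement.

```java
// Hypothetical sketch; none of these names come from RDFStream4j.
final class TtlSketch {
    // Sentinel expiration time for data which never expires (TTL of 0).
    static final long NEVER = Long.MAX_VALUE;

    // Converts a TTL in seconds into an absolute expiration time in milliseconds.
    static long expirationTime(int ttlSeconds, long nowMillis) {
        return ttlSeconds == 0 ? NEVER : nowMillis + ttlSeconds * 1000L;
    }

    // A solution expires together with its shortest-lived contributing statement.
    static long solutionExpiration(long... statementExpirations) {
        long earliest = NEVER;
        for (long expiration : statementExpirations) {
            earliest = Math.min(earliest, expiration);
        }
        return earliest;
    }

    static boolean isExpired(long expiration, long nowMillis) {
        return expiration != NEVER && nowMillis >= expiration;
    }

    public static void main(String[] args) {
        long start = 0L;
        long background = expirationTime(0, start);   // static knowledge, infinite TTL
        long event = expirationTime(30, start);       // gesture event, 30 seconds
        long solution = solutionExpiration(background, event);

        System.out.println(isExpired(solution, start + 10_000L));   // still live after 10 s
        System.out.println(isExpired(solution, start + 31_000L));   // gone after 31 s
        System.out.println(isExpired(background, start + 31_000L)); // background data remains
    }
}
```

In this sketch the answer derived from the 30-second event disappears with the event, while the background statement stays available to join with later events.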

See also the Linked Data example; here, we replace the above "hard-coded" background semantics with discovered information which the query engine proactively fetches from the Web:

// Create a Linked Data client and metadata store.  The Sesame triple store will be used for
// managing caching metadata, while the retrieved Linked Data will be fed into the continuous
// query engine, which will trigger the dereferencing of URIs in response to join operations.
MemoryStore sail = new MemoryStore();
sail.initialize();
LinkedDataCache.DataStore store = new LinkedDataCache.DataStore() {
    public RDFSink createInputSink(final SailConnection sc) {
        return queryEngine.createRDFSink(staticTtl);
    }
};
LinkedDataCache cache = LinkedDataCache.createDefault(sail);
cache.setDataStore(store);
queryEngine.setLinkedDataCache(cache, sail);

For projects which use Maven, RDFStream4j snapshots and release packages can be imported by adding configuration like the following to the project's POM:

    <dependency>
        <groupId>edu.rpi.twc.rdfstream4j</groupId>
        <artifactId>rdfstream4j-impl</artifactId>
        <version>1.3-SNAPSHOT</version>
    </dependency>

or if you will implement the API (e.g. for an RDFStream4j proxy),

    <dependency>
        <groupId>edu.rpi.twc.rdfstream4j</groupId>
        <artifactId>rdfstream4j-api</artifactId>
        <version>1.1-SNAPSHOT</version>
    </dependency>

The latest Maven packages can be browsed here.

Send questions or comments to Josh.

Syntax reference

SPARQL syntax currently supported by RDFStream4j includes:

  • SELECT queries. SELECT subscriptions in RDFStream4j produce query answers indefinitely unless cancelled.
  • ASK queries. ASK subscriptions produce at most one query answer (indicating a result of true) and then are cancelled automatically, similarly to a SELECT query with a LIMIT of 1.
  • CONSTRUCT queries. Each query answer contains "subject", "predicate", and "object" bindings which may be turned into an RDF statement.
  • basic graph patterns
  • variable projection
  • all RDF Term syntax and triple pattern syntax via Sesame
  • FILTER constraints, with all SPARQL operator functions supported via Sesame except for EXISTS
  • DISTINCT modifier. Use with care if the streaming data source may produce an unlimited number of solutions.
  • REDUCED modifier. Similar to DISTINCT, but safe for long streams. Each subscription maintains a solution set which begins to recycle after it reaches a certain size, configurable with RDFStream4j.setReducedModifierCapacity().
  • LIMIT clause. Once LIMIT number of answers have been produced, the subscription is cancelled.
  • OFFSET clause. Since query answers roughly follow the order in which input statements are received, OFFSET can be practically useful even without ORDER BY, which is not supported (see below).
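The difference between DISTINCT and REDUCED on an unbounded stream can be illustrated with a bounded "seen" set. The sketch below is an assumption about how such recycling might work, not RDFStream4j's implementation: the class `ReducedFilterSketch` is hypothetical, and evicting the oldest remembered solution first is only one possible policy. The capacity plays the role of the value configured with `setReducedModifierCapacity()`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from RDFStream4j.
final class ReducedFilterSketch<T> {
    private final Map<T, Boolean> seen;

    ReducedFilterSketch(final int capacity) {
        // Insertion-ordered map which evicts its oldest entry once capacity is exceeded,
        // so memory use stays bounded no matter how long the stream runs.
        this.seen = new LinkedHashMap<T, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<T, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns true if the solution is not currently remembered and should be
    // passed on to the handler; returns false for a remembered duplicate.
    boolean accept(T solution) {
        return seen.put(solution, Boolean.TRUE) == null;
    }

    public static void main(String[] args) {
        ReducedFilterSketch<String> filter = new ReducedFilterSketch<>(2);
        for (String solution : new String[] {"a", "a", "b", "c", "a"}) {
            System.out.println(solution + " -> " + filter.accept(solution));
        }
    }
}
```

With a capacity of 2, the second "a" is suppressed, but the final "a" is let through again because it was recycled when "c" arrived. That is permitted for REDUCED, which may eliminate duplicates but need not eliminate all of them, whereas DISTINCT would have to remember every solution ever produced.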

Syntax explicitly not supported:

  • ORDER BY. This is a closed-world operation which requires a finite data set or window, whereas RDFStream4j evaluates queries over an unbounded stream of data, i.e. an infinite window.
  • SPARQL 1.1 aggregates. Like ORDER BY, these are closed-world operations which require a finite data set or window.

Syntax not yet supported:
