Comunica Experiments

A collection of small experiments with Comunica. The experiments are not ready for actual use and are mostly kept in case they turn out to be useful later. There is nothing of interest here for most people, and nothing is guaranteed to work.

Generation of test data

The SolidBench benchmark can be used to generate social network test data and example queries, and serve them using Community Solid Server. For example, to generate data and then serve it using the default configuration:

yarn run generate
yarn run serve

Please check the SolidBench documentation for more details, especially the dependencies, as it requires Docker for the data generation part. The generated data will end up in the out-fragments folder in the workspace root.

Development setup

The project uses Yarn as its package manager. To install the dependencies and build the components after cloning, run the commands below. If some packages complain about the engine when using an up-to-date Node version, the optional --ignore-engines flag can be passed to yarn install:

yarn install
yarn run build

Generation of VOID descriptions for test data

The dataset description generator tool allows generating a dataset description for all the pods generated by SolidBench, using the VOID vocabulary. The description will contain information on the total count of triples in the pod, distinct subjects and objects, unique property count and the cardinalities of various properties. By default, the file gets placed in profile/voiddescription.nq for each pod, and is linked to from the WebID. For example:

@prefix void: <http://rdfs.org/ns/void#>.
@prefix ldbcv: <http://localhost:3000/www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/>.
...

<http://localhost:3000/pods/00000000000000000065> a void:Dataset;
    void:triples 3410;
    void:distinctSubjects 471;
    void:distinctObjects 111;
    void:properties 36;
    void:propertyPartition [ a 332 ];
    void:propertyPartition [ ldbcv:id 313 ];
    void:propertyPartition [ ldbcv:creationDate 314 ];
    void:propertyPartition [ ldbcv:locationIP 313 ];
    ... etc.
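
To make the numbers in the description above more concrete, here is a minimal sketch of how such counts could be derived from the triples of one pod. It is not the generator's actual code: the Triple and VoidSummary types and the summarisePod function are hypothetical, terms are simplified to plain strings, and the sketch assumes that a property's cardinality means the number of triples using that predicate and that the input contains no duplicate triples.

```typescript
// Simplified triple shape (assumption): real code would likely use RDF/JS terms.
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

// Hypothetical summary mirroring the fields of the example description.
interface VoidSummary {
  triples: number;
  distinctSubjects: number;
  distinctObjects: number;
  properties: number;
  propertyPartitions: Map<string, number>;
}

// Single pass over the triples of one pod, collecting the counts.
function summarisePod(triples: Triple[]): VoidSummary {
  const subjects = new Set<string>();
  const objects = new Set<string>();
  const propertyPartitions = new Map<string, number>();
  for (const { subject, predicate, object } of triples) {
    subjects.add(subject);
    objects.add(object);
    // Count how many triples use each predicate.
    propertyPartitions.set(predicate, (propertyPartitions.get(predicate) ?? 0) + 1);
  }
  return {
    triples: triples.length,
    distinctSubjects: subjects.size,
    distinctObjects: objects.size,
    properties: propertyPartitions.size,
    propertyPartitions,
  };
}

// Usage with made-up data: two subjects, three objects, two predicates.
const summary = summarisePod([
  { subject: 'ex:post1', predicate: 'ex:id', object: '"1"' },
  { subject: 'ex:post2', predicate: 'ex:id', object: '"2"' },
  { subject: 'ex:post1', predicate: 'ex:creator', object: 'ex:alice' },
]);
console.log(summary.triples, summary.distinctSubjects, summary.properties);
```

Keeping the per-predicate counts in a Map makes it straightforward to emit one void:propertyPartition entry per predicate when serialising.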

The generator tool is a somewhat dumb script that makes a lot of assumptions about the paths and other things, and is not yet reusable. The tool should get built with the rest of the workspaces. To run it from the repository root:

yarn run index

Additionally, some of the pods, but not all of them, appear to have Solid type indexes generated by the SolidBench tool.

Using VOID description metadata for querying

The ActorRdfMetadataExtractVoidDescription actor in packages/ extracts any VOID metadata it finds and places it in the metadata object, keeping track of the metadata at the dataset level. The actor primarily focuses on providing predicate cardinalities for query operations. To run queries, the query runner tool in tools/query-runner includes the new actor by creating an instance of the query engine with the custom configuration in templates/config-query-cardinalities.json. Running the query tool should work with:

yarn run query
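
As a rough illustration of how per-dataset predicate cardinalities might feed into a cardinality estimate for a triple pattern, consider the sketch below. It does not reflect the actor's real interface: DatasetMetadata and estimatePatternCardinality are hypothetical names, and the sketch assumes that each description lists every predicate in its dataset, so a missing predicate is treated as zero matches.

```typescript
// Hypothetical per-dataset metadata, modelled on the VOID description fields.
interface DatasetMetadata {
  triples: number;
  predicateCardinalities: Map<string, number>;
}

// Estimate how many triples match a pattern, based only on its predicate.
// An undefined predicate stands for a variable in the predicate position.
function estimatePatternCardinality(
  datasets: DatasetMetadata[],
  predicate?: string,
): number {
  return datasets.reduce((total, dataset) => {
    if (predicate === undefined) {
      // A variable predicate can match any triple in the dataset.
      return total + dataset.triples;
    }
    // Assumption: predicates absent from the description do not occur at all.
    return total + (dataset.predicateCardinalities.get(predicate) ?? 0);
  }, 0);
}

// Usage with the counts from the example description earlier in this document.
const pod: DatasetMetadata = {
  triples: 3410,
  predicateCardinalities: new Map([
    ['ldbcv:id', 313],
    ['ldbcv:creationDate', 314],
  ]),
};
console.log(estimatePatternCardinality([pod], 'ldbcv:id'));
console.log(estimatePatternCardinality([pod]));
```

Such an estimate is an upper bound for patterns that also fix the subject or object, since the description only records counts per predicate.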

The query runner tool attempts some simple approximate timing of queries, just to get a rough estimate of how long one takes to execute. By default, the tool uses the configuration in templates/config-runner.json and runs each query on each config the number of times specified by repeat. The query durations are then averaged and eventually serialised in templates/results.csv for each combination. This is not necessarily a valid way to benchmark anything; it is just there to get some idea of how the configurations affect the query durations.
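
The timing approach described above can be sketched roughly as follows. This is not the runner's actual code: timeQueries, toCsv, the RunnerResult shape and the CSV column names are all hypothetical, and the runQuery callback stands in for whatever executes a query against an engine built from a given config. Date.now() is used for portability; a higher-resolution clock would suit short queries better.

```typescript
// Hypothetical result row: one per combination of config and query.
interface RunnerResult {
  config: string;
  query: string;
  averageMs: number;
}

// Run every query on every config `repeat` times and average the durations.
async function timeQueries(
  configs: string[],
  queries: string[],
  repeat: number,
  runQuery: (config: string, query: string) => Promise<void>,
): Promise<RunnerResult[]> {
  if (repeat < 1) {
    throw new Error('repeat must be at least 1');
  }
  const results: RunnerResult[] = [];
  for (const config of configs) {
    for (const query of queries) {
      let totalMs = 0;
      for (let i = 0; i < repeat; i++) {
        const start = Date.now();
        await runQuery(config, query);
        totalMs += Date.now() - start;
      }
      results.push({ config, query, averageMs: totalMs / repeat });
    }
  }
  return results;
}

// Serialise the averages as CSV. Assumes names contain no commas or quotes.
function toCsv(results: RunnerResult[]): string {
  const rows = results.map((r) => `${r.config},${r.query},${r.averageMs.toFixed(2)}`);
  return ['config,query,averageMs', ...rows].join('\n');
}

// Usage with a dummy query function that only waits for 5 ms.
const wait = (): Promise<void> => new Promise((resolve) => setTimeout(resolve, 5));
timeQueries(['config-a.json'], ['query-1'], 3, wait).then((results) => {
  console.log(toCsv(results));
});
```

Running the repetitions sequentially keeps the measurements from competing with each other, at the cost of a longer total run time.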

Contributors

simonvbrae, surilindur
