Coder Social home page Coder Social logo

Comments (3)

cordawyn avatar cordawyn commented on August 18, 2024

I think that tightly coupled to this criterion is "graph building" vs "graph querying" performance. I'll get to it later, but first I'd like to discuss one very specific issue that I've come across while refactoring TriplesGraph (following #12 issue):

  • The output of graph implementations must have expanded URIs or contain "expandable instances" (e.g. (Namespace, "URI fragment") couples or anything that can be resolved into an absolute URI). That is, a simple UNode("rdf:type") is no good.

Or... perhaps that's fine to leave those URIs as they are?

We have an option of letting the user take care of prefix expansion and "absolutizing" URIs. But I don't see why we should force users to resolve "rdf:type" -- after all, if they stick to using "rdf:type" everywhere (building and querying the graph), it's as good as an absolute URI.

It would be different however, if we accepted only absolute URIs into UNode and had a separate Node type for prefixed nodes, as you suggested earlier. Then we would also need a separate Node type for relative URIs... I think it all makes sense and a certain underpinning in terms of categories (that we are expected to manipulate in Haskell šŸ˜ƒ ), but as far as performance goes... I'm not so sure. Would it also be possible to efficiently build and query graphs where one node can be represented by 3 different node types?

So, to summarize, we have 3 options:

  1. Force absolute URIs in UNodes.
  2. Handle absolute, prefixed and relative URIs as separate types.
  3. Leave it up to user to resolve/not resolve prefixed/relative URIs.

And now, for that subtlety of building and querying graphs that I mentioned in the beginning. If we somehow handle URIs in RDF4H (options 1 or 2 above), we still have 2 options where to do it:

  1. Perform resolution at graph building time (mkRdf and possibly functions for adding triples/nodes).
  2. Perform resolution at graph querying time (query, select and a dozen of other functions).

I think it largely depends on a given graph implementation and we cannot say in advance which would be faster (although the 2nd option sounds like a lot of coding compared to the 1st option, because one has to ensure consistent output of all functions, while doing so in mkRdf automatically ensures that all throughout the rest of the implementation). And I believe it's the same as far as filtering of the duplicates is concerned. Perhaps there are implementations that even allow duplicates in the input or output. In any case, it makes sense to have 2 benchmarks: one for building, another for querying the graph.

This, most likely, is a question to graph implementors, and we don't have much to discuss here. Just sharing my thoughts. But we should really sort out things with absolute/relative/prefixed URIs in Nodes first, I believe.

What are your thoughts?

from rdf4h.

robstewart57 avatar robstewart57 commented on August 18, 2024

I've made a start with 8a7e6b0 . The preliminary results are:

Benchmark rdf4h-bench: RUNNING...
benchmarking parse/TriplesGraph
time                 768.8 ms   (759.9 ms .. 773.7 ms)
                     1.000 RĀ²   (1.000 RĀ² .. 1.000 RĀ²)
mean                 769.0 ms   (767.8 ms .. 769.9 ms)
std dev              1.328 ms   (0.0 s .. 1.517 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking parse/MGraph
time                 1.067 s    (775.7 ms .. 1.374 s)
                     0.990 RĀ²   (0.963 RĀ² .. 1.000 RĀ²)
mean                 995.5 ms   (918.9 ms .. 1.046 s)
std dev              75.93 ms   (0.0 s .. 87.66 ms)
variance introduced by outliers: 21% (moderately inflated)

benchmarking parse/PatriciaTreeGraph
time                 1.289 s    (1.250 s .. 1.348 s)
                     1.000 RĀ²   (0.999 RĀ² .. 1.000 RĀ²)
mean                 1.265 s    (1.256 s .. 1.272 s)
std dev              12.51 ms   (0.0 s .. 13.32 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking query/TriplesGraph
time                 493.7 Ī¼s   (484.9 Ī¼s .. 505.4 Ī¼s)
                     0.998 RĀ²   (0.996 RĀ² .. 1.000 RĀ²)
mean                 487.1 Ī¼s   (484.4 Ī¼s .. 492.3 Ī¼s)
std dev              12.53 Ī¼s   (6.996 Ī¼s .. 18.90 Ī¼s)
variance introduced by outliers: 17% (moderately inflated)

benchmarking query/MGraph
time                 383.2 ns   (382.2 ns .. 384.7 ns)
                     1.000 RĀ²   (1.000 RĀ² .. 1.000 RĀ²)
mean                 382.9 ns   (382.6 ns .. 383.6 ns)
std dev              1.294 ns   (686.9 ps .. 2.390 ns)

benchmarking query/PatriciaTreeGraph
time                 12.88 ms   (12.80 ms .. 12.95 ms)
                     1.000 RĀ²   (1.000 RĀ² .. 1.000 RĀ²)
mean                 12.89 ms   (12.78 ms .. 12.98 ms)
std dev              244.8 Ī¼s   (155.0 Ī¼s .. 395.0 Ī¼s)

benchmarking select/TriplesGraph
time                 709.9 Ī¼s   (708.6 Ī¼s .. 711.5 Ī¼s)
                     1.000 RĀ²   (1.000 RĀ² .. 1.000 RĀ²)
mean                 712.5 Ī¼s   (711.2 Ī¼s .. 715.0 Ī¼s)
std dev              5.740 Ī¼s   (3.679 Ī¼s .. 8.686 Ī¼s)

benchmarking select/MGraph
time                 6.844 ms   (6.738 ms .. 6.933 ms)
                     0.996 RĀ²   (0.993 RĀ² .. 0.998 RĀ²)
mean                 6.901 ms   (6.810 ms .. 7.045 ms)
std dev              321.9 Ī¼s   (224.2 Ī¼s .. 441.3 Ī¼s)
variance introduced by outliers: 24% (moderately inflated)

benchmarking select/PatriciaTreeGraph
time                 14.32 ms   (14.12 ms .. 14.50 ms)
                     0.999 RĀ²   (0.999 RĀ² .. 1.000 RĀ²)
mean                 14.26 ms   (14.13 ms .. 14.37 ms)
std dev              286.3 Ī¼s   (200.4 Ī¼s .. 459.9 Ī¼s)

Benchmark rdf4h-bench: FINISH

from rdf4h.

robstewart57 avatar robstewart57 commented on August 18, 2024

Here are the RDF graph implementation benchmarks from October 2016:

http://robstewart57.github.io/rdf4h/rdf-bench-04102016.html

from rdf4h.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    šŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. šŸ“ŠšŸ“ˆšŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ā¤ļø Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.