We don't know what the construction or query performance is for <code class="notransla

I've made a start with <a class="commit-link" data-hovercard-type="commit" data-hoverc

Criterion benchmarks for RDF graph instances about rdf4h HOT 3 CLOSED

robstewart57 commented on August 18, 2024

Criterion benchmarks for RDF graph instances

from rdf4h.

Comments (3)

cordawyn commented on August 18, 2024

I think that tightly coupled to this criterion is "graph building" vs "graph querying" performance. I'll get to it later, but first I'd like to discuss one very specific issue that I've come across while refactoring TriplesGraph (following #12 issue):

The output of graph implementations must have expanded URIs or contain "expandable instances" (e.g. (Namespace, "URI fragment") couples or anything that can be resolved into an absolute URI). That is, a simple UNode("rdf:type") is no good.

Or... perhaps that's fine to leave those URIs as they are?

We have an option of letting the user take care of prefix expansion and "absolutizing" URIs. But I don't see why we should force users to resolve "rdf:type" -- after all, if they stick to using "rdf:type" everywhere (building and querying the graph), it's as good as an absolute URI.

It would be different however, if we accepted only absolute URIs into UNode and had a separate Node type for prefixed nodes, as you suggested earlier. Then we would also need a separate Node type for relative URIs... I think it all makes sense and a certain underpinning in terms of categories (that we are expected to manipulate in Haskell 😃 ), but as far as performance goes... I'm not so sure. Would it also be possible to efficiently build and query graphs where one node can be represented by 3 different node types?

So, to summarize, we have 3 options:

Force absolute URIs in UNodes.
Handle absolute, prefixed and relative URIs as separate types.
Leave it up to user to resolve/not resolve prefixed/relative URIs.

And now, for that subtlety of building and querying graphs that I mentioned in the beginning. If we somehow handle URIs in RDF4H (options 1 or 2 above), we still have 2 options where to do it:

Perform resolution at graph building time (mkRdf and possibly functions for adding triples/nodes).
Perform resolution at graph querying time (query, select and a dozen of other functions).

I think it largely depends on a given graph implementation and we cannot say in advance which would be faster (although the 2nd option sounds like a lot of coding compared to the 1st option, because one has to ensure consistent output of all functions, while doing so in mkRdf automatically ensures that all throughout the rest of the implementation). And I believe it's the same as far as filtering of the duplicates is concerned. Perhaps there are implementations that even allow duplicates in the input or output. In any case, it makes sense to have 2 benchmarks: one for building, another for querying the graph.

This, most likely, is a question to graph implementors, and we don't have much to discuss here. Just sharing my thoughts. But we should really sort out things with absolute/relative/prefixed URIs in Nodes first, I believe.

What are your thoughts?

from rdf4h.

robstewart57 commented on August 18, 2024

I've made a start with 8a7e6b0 . The preliminary results are:

Benchmark rdf4h-bench: RUNNING...
benchmarking parse/TriplesGraph
time                 768.8 ms   (759.9 ms .. 773.7 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 769.0 ms   (767.8 ms .. 769.9 ms)
std dev              1.328 ms   (0.0 s .. 1.517 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking parse/MGraph
time                 1.067 s    (775.7 ms .. 1.374 s)
                     0.990 R²   (0.963 R² .. 1.000 R²)
mean                 995.5 ms   (918.9 ms .. 1.046 s)
std dev              75.93 ms   (0.0 s .. 87.66 ms)
variance introduced by outliers: 21% (moderately inflated)

benchmarking parse/PatriciaTreeGraph
time                 1.289 s    (1.250 s .. 1.348 s)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 1.265 s    (1.256 s .. 1.272 s)
std dev              12.51 ms   (0.0 s .. 13.32 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking query/TriplesGraph
time                 493.7 μs   (484.9 μs .. 505.4 μs)
                     0.998 R²   (0.996 R² .. 1.000 R²)
mean                 487.1 μs   (484.4 μs .. 492.3 μs)
std dev              12.53 μs   (6.996 μs .. 18.90 μs)
variance introduced by outliers: 17% (moderately inflated)

benchmarking query/MGraph
time                 383.2 ns   (382.2 ns .. 384.7 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 382.9 ns   (382.6 ns .. 383.6 ns)
std dev              1.294 ns   (686.9 ps .. 2.390 ns)

benchmarking query/PatriciaTreeGraph
time                 12.88 ms   (12.80 ms .. 12.95 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 12.89 ms   (12.78 ms .. 12.98 ms)
std dev              244.8 μs   (155.0 μs .. 395.0 μs)

benchmarking select/TriplesGraph
time                 709.9 μs   (708.6 μs .. 711.5 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 712.5 μs   (711.2 μs .. 715.0 μs)
std dev              5.740 μs   (3.679 μs .. 8.686 μs)

benchmarking select/MGraph
time                 6.844 ms   (6.738 ms .. 6.933 ms)
                     0.996 R²   (0.993 R² .. 0.998 R²)
mean                 6.901 ms   (6.810 ms .. 7.045 ms)
std dev              321.9 μs   (224.2 μs .. 441.3 μs)
variance introduced by outliers: 24% (moderately inflated)

benchmarking select/PatriciaTreeGraph
time                 14.32 ms   (14.12 ms .. 14.50 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 14.26 ms   (14.13 ms .. 14.37 ms)
std dev              286.3 μs   (200.4 μs .. 459.9 μs)

Benchmark rdf4h-bench: FINISH

from rdf4h.

robstewart57 commented on August 18, 2024

Here are the RDF graph implementation benchmarks from October 2016:

http://robstewart57.github.io/rdf4h/rdf-bench-04102016.html

from rdf4h.

Criterion benchmarks for RDF graph instances about rdf4h HOT 3 CLOSED

Comments (3)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent