Comments (3)
I think "graph building" vs "graph querying" performance is tightly coupled to this criterion. I'll get to that later, but first I'd like to discuss one very specific issue that I came across while refactoring TriplesGraph (following issue #12):
- The output of graph implementations must have expanded URIs or contain "expandable instances" (e.g. (Namespace, "URI fragment") pairs, or anything else that can be resolved into an absolute URI). That is, a simple UNode("rdf:type") is no good.
Or... perhaps it's fine to leave those URIs as they are?
We have an option of letting the user take care of prefix expansion and "absolutizing" URIs. But I don't see why we should force users to resolve "rdf:type" -- after all, if they stick to using "rdf:type" everywhere (building and querying the graph), it's as good as an absolute URI.
It would be different, however, if we accepted only absolute URIs into UNode and had a separate Node type for prefixed nodes, as you suggested earlier. Then we would also need a separate Node type for relative URIs... I think it all makes sense and has a certain underpinning in terms of categories (which we are expected to manipulate in Haskell), but as far as performance goes... I'm not so sure. Would it also be possible to efficiently build and query graphs where one node can be represented by 3 different node types?
So, to summarize, we have 3 options:
1. Force absolute URIs in UNodes.
2. Handle absolute, prefixed and relative URIs as separate types.
3. Leave it up to the user to resolve/not resolve prefixed/relative URIs.
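To make the second option more concrete, here is a minimal sketch of what separate types for absolute, prefixed and relative URIs could look like. The names `Uri`, `PrefixMap` and `resolve` are hypothetical and are not part of RDF4H's API; relative resolution is plain concatenation with a base URI, not full RFC 3986 reference resolution.

```haskell
-- Hypothetical types for illustration only; not rdf4h's Node API.
data Uri
  = Absolute String         -- e.g. "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
  | Prefixed String String  -- prefix and local part, e.g. "rdf" and "type"
  | Relative String         -- e.g. "#me", to be resolved against a base URI
  deriving (Eq, Show)

-- Prefix name paired with the namespace it expands to.
type PrefixMap = [(String, String)]

-- Resolve any of the three forms into an absolute URI string.
-- Returns Nothing when a prefix is not declared in the prefix map.
resolve :: String -> PrefixMap -> Uri -> Maybe String
resolve _ _ (Absolute u) = Just u
resolve _ prefixes (Prefixed p local) = (++ local) <$> lookup p prefixes
resolve base _ (Relative r) = Just (base ++ r) -- naive: simple concatenation
```

With such a type, option 1 amounts to calling `resolve` before a node is ever stored, while option 3 amounts to never calling it inside the library at all.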
And now, for that subtlety of building and querying graphs that I mentioned in the beginning. If we somehow handle URIs in RDF4H (options 1 or 2 above), we still have 2 options where to do it:
1. Perform resolution at graph building time (mkRdf and possibly functions for adding triples/nodes).
2. Perform resolution at graph querying time (query, select and a dozen other functions).
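The difference between the two can be sketched with a toy list-backed graph. Everything here (`EagerGraph`, `LazyGraph`, `mkEager`, `queryEager`, `mkLazy`, `queryLazy`, `expand`) is a hypothetical stand-in rather than RDF4H's mkRdf or query, and nodes are plain strings to keep the example short.

```haskell
type Triple = (String, String, String)
type PrefixMap = [(String, String)]

-- Expand "prefix:local" when the prefix is declared; otherwise leave as-is.
expand :: PrefixMap -> String -> String
expand prefixes uri =
  case break (== ':') uri of
    (p, ':' : local) | Just ns <- lookup p prefixes -> ns ++ local
    _ -> uri

expandTriple :: PrefixMap -> Triple -> Triple
expandTriple ps (s, p, o) = (expand ps s, expand ps p, expand ps o)

-- Resolution at build time: triples are expanded once, on construction.
newtype EagerGraph = EagerGraph [Triple]

mkEager :: PrefixMap -> [Triple] -> EagerGraph
mkEager ps = EagerGraph . map (expandTriple ps)

queryEager :: EagerGraph -> String -> [Triple]
queryEager (EagerGraph ts) predicate = [t | t@(_, p, _) <- ts, p == predicate]

-- Resolution at query time: triples are stored as given, and every
-- query function has to expand them before matching and returning.
data LazyGraph = LazyGraph PrefixMap [Triple]

mkLazy :: PrefixMap -> [Triple] -> LazyGraph
mkLazy = LazyGraph

queryLazy :: LazyGraph -> String -> [Triple]
queryLazy (LazyGraph ps ts) predicate =
  [t' | t <- ts, let t'@(_, p, _) = expandTriple ps t, p == predicate]
```

Both variants return the same triples for the same absolute predicate. The eager one pays the expansion cost once per stored triple, while the lazy one pays it on every query and needs the same care repeated in every other query function.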
I think it largely depends on the given graph implementation, and we cannot say in advance which would be faster (although the 2nd option sounds like a lot of coding compared to the 1st, because one has to ensure consistent output from every function, whereas doing the resolution in mkRdf automatically ensures consistency throughout the rest of the implementation). And I believe the same applies to filtering of duplicates. Perhaps there are implementations that even allow duplicates in the input or output. In any case, it makes sense to have 2 benchmarks: one for building the graph, another for querying it.
This, most likely, is a question to graph implementors, and we don't have much to discuss here. Just sharing my thoughts. But we should really sort out things with absolute/relative/prefixed URIs in Nodes first, I believe.
What are your thoughts?
from rdf4h.
I've made a start with 8a7e6b0. The preliminary results are:
Benchmark rdf4h-bench: RUNNING...
benchmarking parse/TriplesGraph
time 768.8 ms (759.9 ms .. 773.7 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 769.0 ms (767.8 ms .. 769.9 ms)
std dev 1.328 ms (0.0 s .. 1.517 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking parse/MGraph
time 1.067 s (775.7 ms .. 1.374 s)
0.990 R² (0.963 R² .. 1.000 R²)
mean 995.5 ms (918.9 ms .. 1.046 s)
std dev 75.93 ms (0.0 s .. 87.66 ms)
variance introduced by outliers: 21% (moderately inflated)
benchmarking parse/PatriciaTreeGraph
time 1.289 s (1.250 s .. 1.348 s)
1.000 R² (0.999 R² .. 1.000 R²)
mean 1.265 s (1.256 s .. 1.272 s)
std dev 12.51 ms (0.0 s .. 13.32 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking query/TriplesGraph
time 493.7 μs (484.9 μs .. 505.4 μs)
0.998 R² (0.996 R² .. 1.000 R²)
mean 487.1 μs (484.4 μs .. 492.3 μs)
std dev 12.53 μs (6.996 μs .. 18.90 μs)
variance introduced by outliers: 17% (moderately inflated)
benchmarking query/MGraph
time 383.2 ns (382.2 ns .. 384.7 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 382.9 ns (382.6 ns .. 383.6 ns)
std dev 1.294 ns (686.9 ps .. 2.390 ns)
benchmarking query/PatriciaTreeGraph
time 12.88 ms (12.80 ms .. 12.95 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 12.89 ms (12.78 ms .. 12.98 ms)
std dev 244.8 μs (155.0 μs .. 395.0 μs)
benchmarking select/TriplesGraph
time 709.9 μs (708.6 μs .. 711.5 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 712.5 μs (711.2 μs .. 715.0 μs)
std dev 5.740 μs (3.679 μs .. 8.686 μs)
benchmarking select/MGraph
time 6.844 ms (6.738 ms .. 6.933 ms)
0.996 R² (0.993 R² .. 0.998 R²)
mean 6.901 ms (6.810 ms .. 7.045 ms)
std dev 321.9 μs (224.2 μs .. 441.3 μs)
variance introduced by outliers: 24% (moderately inflated)
benchmarking select/PatriciaTreeGraph
time 14.32 ms (14.12 ms .. 14.50 ms)
0.999 R² (0.999 R² .. 1.000 R²)
mean 14.26 ms (14.13 ms .. 14.37 ms)
std dev 286.3 μs (200.4 μs .. 459.9 μs)
Benchmark rdf4h-bench: FINISH
Here are the RDF graph implementation benchmarks from October 2016:
http://robstewart57.github.io/rdf4h/rdf-bench-04102016.html