Comments (3)
Thanks for your feedback, @donpellegrino. I had a look at your Rust HDT Library and, overall, you're using the same or a similar approach to what I'm doing at the moment: mainly string manipulation and regex. In my case it's quite fast and so far hasn't caused problems, but the context in which my implementation will be used (automotive industry) requires us to avoid this kind of magic trick.
Meanwhile, I've been digging further into the matter with the C++ library and am still far from having a real solution, but I'm starting to understand what the actual problem is.
In the screenshot above, the element being extracted would be the Object, required for the Triple being queried by Rasqal.
CSD_PFC::extractInBlock is called with block = 0 and o = 14. That means that we're in the first available block and we have to move forward by 14 suffixes within this block.
To move forward, we have the VByte::decode function, which decodes the first byte of the suffix, extracts the delta (length of the suffix) and returns the number of bytes to move forward, indicating where the suffix actually starts (if I didn't misunderstand this last part).
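To make the decoding step concrete, here is a minimal sketch of a variable-byte decoder. It is not the hdt-cpp implementation: the function name `vbyte_decode_sketch` is hypothetical, and the byte convention (7 payload bits per byte, least-significant group first, high bit set on the final byte) is an assumption that should be checked against the actual VByte source.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, NOT the hdt-cpp API.
// Assumed convention: each byte carries 7 payload bits (least-significant
// group first) and the high bit (0x80) marks the final byte of the value.
// Returns the number of bytes consumed, or 0 if the buffer ends too early.
std::size_t vbyte_decode_sketch(const unsigned char *buf,
                                const unsigned char *end,
                                std::uint64_t *value) {
    std::uint64_t result = 0;
    unsigned shift = 0;
    std::size_t used = 0;
    while (buf + used < end && shift < 64) {
        const unsigned char byte = buf[used++];
        result |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if (byte & 0x80) {   // terminator bit found: value is complete
            *value = result;
            return used;
        }
        shift += 7;          // next byte holds the next 7 bits
    }
    return 0;                // ran out of input before the final byte
}
```

Under these assumptions, a small delta fits in a single byte, which would match the log lines that report one byte consumed per call.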
This actually returns the correct value, but here's the catch: the data-type suffix I'm interested in comes right after the suffix where we stopped, and this is the case for all of the common/shared prefixes like this one.
In this particular case, we start with the "0" prefix, move forward by 14 suffixes, extract the length (1), store it in tmpStr and stop there. The result is the value "1", which is correct.
extractInBlock ID: 0 pos2 1 delta 3
actual_ptr 0x7ffff71b1010
VByte::decode -> 1 delta 3
VByte::decode -> 1 delta 2
VByte::decode -> 1 delta 2
VByte::decode -> 1 delta 2
VByte::decode -> 1 delta 4
VByte::decode -> 1 delta 2
VByte::decode -> 1 delta 2
VByte::decode -> 1 delta 2
VByte::decode -> 1 delta 2
VByte::decode -> 1 delta 2
VByte::decode -> 1 delta 4
VByte::decode -> 1 delta 3
VByte::decode -> 1 delta 2
VByte::decode -> 1 delta 1
pos 14 >> "1"
There is also another case where o = 0, meaning it won't even go into the for-loop, which makes sense because then it's the very first element and we don't need to look any further. But even here, we're losing out on the data-type suffix.
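The walk through a block can be sketched as follows. This is an illustration, not the CSD_PFC code: `extract_in_block_sketch` is a hypothetical name, the delta is simplified to a single byte instead of a VByte, and the sketch assumes the usual front-coding reading in which the delta is the number of leading bytes kept from the previous string (an assumption that should be checked against the library). There is no bounds checking.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical sketch, NOT the CSD_PFC implementation.
// Assumed block layout: the first string in full, NUL-terminated, then for
// each further entry one delta byte followed by a NUL-terminated suffix.
// Assumption: delta = number of leading bytes reused from the previous string.
std::string extract_in_block_sketch(const unsigned char *block, std::size_t o) {
    const char *p = reinterpret_cast<const char *>(block);
    std::string current(p);              // o == 0 returns this unchanged
    p += current.size() + 1;             // skip the string and its NUL
    for (std::size_t j = 0; j < o; ++j) {
        std::size_t delta = static_cast<unsigned char>(*p++);
        if (delta > current.size()) delta = current.size();
        const std::size_t len = std::strlen(p);
        current.resize(delta);           // keep the shared prefix
        current.append(p, len);          // add this entry's suffix
        p += len + 1;
    }
    return current;
}
```

With made-up data where a plain literal is followed by its typed form, stopping at the plain entry returns the value without the datatype, and the typed form only appears one entry later. That is the behaviour described above, under the stated assumptions.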
My guess is that this is an architectural issue as it depends on how the prefixes and suffixes are pooled and there's no real workaround for it.
from hdt-cpp.
I tried to hack the code a bit, on the assumption that the datatypes are "always" right after the first common/shared prefix. So I accessed that location directly and then appended this suffix to the tmpStr variable at the end of extractInBlock, but only if it starts with ^^.
add_datatype defaults to false, and I set it to true only when I'm retrieving data through the Dictionary::tripleIDtoTripleString method; otherwise I ended up with a duplicated data-type string attached.
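The guard described here can be sketched in isolation. This is not the patch itself: `append_datatype_sketch` is a hypothetical free function, and only the `add_datatype` flag and the `^^` check come from the description above.

```cpp
#include <string>

// Hypothetical helper mirroring the guard described above; NOT hdt-cpp code.
// Appends `candidate` to `literal` only when the caller opted in via
// add_datatype and the candidate starts with "^^".
std::string append_datatype_sketch(std::string literal,
                                   const std::string &candidate,
                                   bool add_datatype = false) {
    const bool looks_like_datatype = candidate.compare(0, 2, "^^") == 0;
    if (add_datatype && looks_like_datatype) {
        literal += candidate;
    }
    return literal;
}
```

Keeping the flag off by default means callers that already receive the datatype elsewhere are unaffected, which is one way to avoid the duplicated suffix mentioned above.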
It works only partially. There are some results that get the correct data-type suffix, while others get nothing at all. So it's unreliable, too, as it depends on how the data is stored in the blocks.
I don't have a direct answer to the question, "whether there is a way to retrieve the datatypes stored inside the HDT file and have them passed (perhaps separately) through the iterator returned by the HDT::Search method." I would have to do some research to figure that out.
Combining HDT storage with a SPARQL Query Engine is useful work and integrating Rasqal with HDT sounds like a good approach. For reference, I have a branch of a fork of Oxigraph available that uses the Oxigraph SPARQL Query Engine and the Rust HDT Library for reading the HDT files. Since that implementation is in Rust rather than C, there will be differences in approach, but the code might show one technique for how the datatypes are handled when going from the HDT contents to the SPARQL query processing.