Comments (3)

gpicciuca commented on June 15, 2024

Thanks for your feedback, @donpellegrino. I had a look at your Rust HDT library, and overall you're using the same or a similar approach to what I'm doing at the moment: mainly string manipulation and regex. In my case it's quite fast and so far hasn't caused any problems, but the context in which my implementation will be used (the automotive industry) requires us to avoid this kind of magic trick.

Meanwhile, I've been digging further into the matter with the C++ library and am still far from having a real solution, but I'm starting to understand what the actual problem is.

image

In the screenshot above, the element being extracted would be the Object, required for the Triple being queried by Rasqal.

CSD_PFC::extractInBlock is called with block = 0 and o = 14. That means that we're in the first available block and we have to move forward by "14" suffixes within this block.
To move forward, we have the VByte::decode function, which decodes the first byte of the suffix, extracts the delta (the length of the suffix) and returns the number of bytes to move forward, indicating where the suffix actually starts (if I didn't misunderstand this last part).
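For readers unfamiliar with variable-byte integers, here is a minimal sketch of such a decoder. It is not copied from hdt-cpp: the name `vbyteDecode` is hypothetical, and the convention that the high bit marks the last byte of a number is an assumption. The return value is the number of bytes consumed, and the decoded number is written to `value`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decodes one variable-byte integer starting at `data`, never reading at or
// past `end`. Assumed convention (not verified against hdt-cpp): every byte
// carries 7 payload bits, least significant group first, and the high bit
// (0x80) is set on the LAST byte of the number.
// Returns the number of bytes consumed; the decoded number goes into `value`.
std::size_t vbyteDecode(const unsigned char* data, const unsigned char* end,
                        std::uint64_t& value) {
    assert(data < end);  // at least one readable byte is required
    std::uint64_t out = 0;
    std::size_t i = 0;
    int shift = 0;
    // Continuation bytes have the high bit clear. Stop early if only one
    // byte is left or if another 7-bit group would overflow 64 bits.
    while (data + i + 1 < end && shift < 63 && !(data[i] & 0x80)) {
        out |= static_cast<std::uint64_t>(data[i] & 0x7F) << shift;
        ++i;
        shift += 7;
    }
    // Final byte of the number (or the last byte we are allowed to read).
    out |= static_cast<std::uint64_t>(data[i] & 0x7F) << shift;
    ++i;
    value = out;
    return i;
}
```

Under these assumptions, a single byte 0x83 decodes to the value 3 with 1 byte consumed, which is the same shape as a trace line such as "VByte::decode -> 1 delta 3".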

This actually returns the correct value that we're looking for, but here's the catch: the data-type suffix I'm interested in comes right after the suffix where we stopped, and this is true for all of the common/shared prefixes, as in this example.

In this particular case, we start with the "0" prefix, move forward by 14 suffixes, extract the length, which is 1, store it in tmpStr and stop there. The result is the value "1", which is correct.

extractInBlock ID: 0 pos2 1 delta 3
actual_ptr 0x7ffff71b1010
VByte::decode -> 1 delta 3
VByte::decode -> 1 delta 2
VByte::decode -> 1 delta 2
VByte::decode -> 1 delta 2
VByte::decode -> 1 delta 4
VByte::decode -> 1 delta 2
VByte::decode -> 1 delta 2
VByte::decode -> 1 delta 2
VByte::decode -> 1 delta 2
VByte::decode -> 1 delta 2
VByte::decode -> 1 delta 4
VByte::decode -> 1 delta 3
VByte::decode -> 1 delta 2
VByte::decode -> 1 delta 1
pos 14 >> "1"
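As a point of comparison for the trace above, this is a generic sketch of extraction from a plain front-coded block. It is not the hdt-cpp implementation: the block layout, the `buildBlock` helper and this version of `extractInBlock` are assumptions made for illustration, and the shared-prefix length is stored in a single byte instead of a VByte to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical layout: first string, '\0', then for every further string:
// one byte with the length of the prefix shared with the previous string,
// the remaining suffix, '\0'.
std::vector<unsigned char> buildBlock(const std::vector<std::string>& terms) {
    std::vector<unsigned char> block;
    std::string prev;
    for (std::size_t i = 0; i < terms.size(); ++i) {
        const std::string& cur = terms[i];
        std::size_t shared = 0;
        if (i > 0) {
            while (shared < prev.size() && shared < cur.size() &&
                   prev[shared] == cur[shared]) {
                ++shared;
            }
            assert(shared < 256);  // single-byte length in this sketch
            block.push_back(static_cast<unsigned char>(shared));
        }
        const std::string suffix = cur.substr(shared);
        block.insert(block.end(), suffix.begin(), suffix.end());
        block.push_back('\0');
        prev = cur;
    }
    return block;
}

// Rebuilds the o-th string of the block by walking forward o entries.
std::string extractInBlock(const std::vector<unsigned char>& block, std::size_t o) {
    std::size_t pos = 0;
    std::string tmp;
    // The first string is stored in full; with o == 0 the loop below is skipped.
    while (pos < block.size() && block[pos] != '\0') {
        tmp.push_back(static_cast<char>(block[pos++]));
    }
    ++pos;  // skip the terminator
    for (std::size_t i = 0; i < o; ++i) {
        assert(pos < block.size());
        const std::size_t shared = block[pos++];
        assert(shared <= tmp.size());
        tmp.resize(shared);  // keep only the prefix shared with the previous string
        while (pos < block.size() && block[pos] != '\0') {
            tmp.push_back(static_cast<char>(block[pos++]));
        }
        ++pos;  // skip the terminator
    }
    return tmp;
}
```

In this layout everything up to the terminator belongs to the reconstructed string, so a term stored as a full typed literal would come back with its ^^ part attached. Whether hdt-cpp's blocks behave the same way is exactly what remains unclear here.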

There is also another case where o = 0, meaning it won't even enter the for loop, which makes sense because it's then the very first element and we don't need to look any further. But even here, we lose the data-type suffix.

My guess is that this is an architectural issue, since it depends on how the prefixes and suffixes are pooled, and that there's no real workaround for it.

from hdt-cpp.

gpicciuca commented on June 15, 2024

I tried to hack the code a bit, assuming that the datatypes are "always" right after the first common/shared prefix. So I just accessed that location directly with

image

and then appended this suffix to the tmpStr variable at the end of extractInBlock, only if it starts with ^^:

image

add_datatype defaults to false, and I set it to true only when retrieving data through the Dictionary::tripleIDtoTripleString method; otherwise I end up with a duplicated data-type string attached.

It works only partially. There are some results that get the correct data-type suffix, while others get nothing at all. So it's unreliable, too, as it depends on how the data is stored in the blocks.
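Since the screenshots do not survive as text, here is a rough sketch of the conditional append described above. It is not the code from the images: `withDatatype` is a hypothetical helper, and the extra check for an already attached datatype is an additional guard assumed for illustration rather than part of the original hack.

```cpp
#include <string>

// Hypothetical helper: returns `literal` with `candidate` appended only when
// the caller asked for it (add_datatype), the candidate looks like a
// datatype suffix (starts with "^^"), and the literal is not already typed.
std::string withDatatype(const std::string& literal,
                         const std::string& candidate,
                         bool add_datatype = false) {
    const bool looksLikeDatatype = candidate.compare(0, 2, "^^") == 0;
    // Assumed guard against the duplicated suffix mentioned above.
    const bool alreadyTyped = literal.find("\"^^") != std::string::npos;
    if (add_datatype && looksLikeDatatype && !alreadyTyped) {
        return literal + candidate;
    }
    return literal;
}
```

This only covers the append step. It does nothing about the unreliable part, which is finding the right candidate suffix inside the block in the first place.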

donpellegrino commented on June 15, 2024

I don't have a direct answer to the question, "whether there is a way to retrieve the datatypes stored inside the HDT file and have them passed (perhaps separately) through the iterator returned by the HDT::Search method." I would have to do some research to figure that out.

Combining HDT storage with a SPARQL query engine is useful work, and integrating Rasqal with HDT sounds like a good approach. For reference, I have a branch of a fork of Oxigraph available that uses the Oxigraph SPARQL query engine and the Rust HDT library for reading the HDT files. Since that implementation is in Rust rather than C, there will be differences in approach, but the code might show one technique for handling the datatypes when going from the HDT contents to the SPARQL query processing.
