Improve performance (rdf4h, open issue)

robstewart57 commented on August 18, 2024
Improve performance

from rdf4h.

Comments (4)

wismill commented on August 18, 2024

@robstewart57 what do you think about this proposal?

I would like to keep only Megaparsec for the parsing, as it is now very fast and robust.

robstewart57 commented on August 18, 2024

@wismill the idea of #36 was to generalise rdf4h across numerous parsers, specifically attoparsec and parsec. E.g.

-- |'NTriplesParser' is an instance of 'RdfParser' using parsec based parsers.
instance RdfParser NTriplesParser where
  parseString _  = parseStringParsec
  parseFile   _  = parseFileParsec
  parseURL    _  = parseURLParsec

-- |'NTriplesParserCustom' is an instance of 'RdfParser' that selects the parsec or attoparsec based parsers.
instance RdfParser NTriplesParserCustom where
  parseString (NTriplesParserCustom Parsec)     = parseStringParsec
  parseString (NTriplesParserCustom Attoparsec) = parseStringAttoparsec
  parseFile   (NTriplesParserCustom Parsec)     = parseFileParsec
  parseFile   (NTriplesParserCustom Attoparsec) = parseFileAttoparsec
  parseURL    (NTriplesParserCustom Parsec)     = parseURLParsec
  parseURL    (NTriplesParserCustom Attoparsec) = parseURLAttoparsec


This functionality comes from this PR: #36
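
The dispatch pattern in the instances above can be sketched in a self-contained form. This is only an illustration that uses `base`: `RdfParserSketch`, `NTriplesSketch`, `NTriplesCustomSketch`, `parseVerbose` and `parseTerse` are hypothetical names, and the two toy backends stand in for parsec and attoparsec without using either library. The "verbose" one reports the failing line, the "terse" one does not, which mirrors the error-message trade-off discussed in this thread.

```haskell
-- Hypothetical backend selector; neither Parsec nor Attoparsec is used here.
data Backend = Parsec | Attoparsec
  deriving (Show, Eq)

type Triple = (String, String, String)

-- One entry point per parser type, mirroring the instances quoted above.
class RdfParserSketch p where
  parseStringSketch :: p -> String -> Either String [Triple]

-- Parser type with a fixed backend.
data NTriplesSketch = NTriplesSketch

-- Parser type whose backend is chosen by the caller.
newtype NTriplesCustomSketch = NTriplesCustomSketch Backend

instance RdfParserSketch NTriplesSketch where
  parseStringSketch _ = parseVerbose

instance RdfParserSketch NTriplesCustomSketch where
  parseStringSketch (NTriplesCustomSketch Parsec)     = parseVerbose
  parseStringSketch (NTriplesCustomSketch Attoparsec) = parseTerse

-- Toy line parser: accepts exactly "subject predicate object ." per line.
lineToTriple :: String -> Maybe Triple
lineToTriple l = case words l of
  [s, p, o, "."] -> Just (s, p, o)
  _              -> Nothing

-- Stand-in for a backend with detailed errors: reports the failing line.
parseVerbose :: String -> Either String [Triple]
parseVerbose = traverse check . zip [1 :: Int ..] . lines
  where
    check (n, l) =
      maybe (Left ("line " ++ show n ++ ": expected `s p o .`, got " ++ show l))
            Right
            (lineToTriple l)

-- Stand-in for a backend with minimal errors: same grammar, terse message.
parseTerse :: String -> Either String [Triple]
parseTerse = maybe (Left "parse error") Right . traverse lineToTriple . lines
```

Callers pick the backend through the value they pass, e.g. `parseStringSketch (NTriplesCustomSketch Attoparsec) input`, while `parseStringSketch NTriplesSketch input` keeps a fixed default.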

The motivation for this flexibility between parsers is that each of them has trade-offs. E.g. "Megaparsec vs Attoparsec": https://github.com/mrkkrp/megaparsec#megaparsec-vs-attoparsec

"Attoparsec is sometimes faster but not that feature-rich. It should be used when you want to process large amounts of data where performance matters more than quality of error messages."

This is a realistic assumption when working with RDF data, which might be tens or hundreds of megabytes, or millions of triples.

There are megaparsec instances in parsers-megaparsec that we could use: https://hackage.haskell.org/package/parsers-megaparsec

The issue arises if the attoparsec, parsec and megaparsec instances of the typeclasses in the parsers package have different parsing semantics (which presumably shouldn't happen). In that case the rdf4h tests against w2c-tests might pass for one instance of the parsers typeclasses, e.g. parsec, but fail for another, e.g. megaparsec.
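
One way to surface such differences is to run every backend on the same inputs and report the inputs on which they disagree. The sketch below is a minimal illustration using only `base`: `backendA`, `backendB` and `disagreements` are hypothetical names, and the two toy backends are not the parsec, attoparsec or megaparsec instances. `backendB` is deliberately more lenient so that there is a difference to find.

```haskell
import Data.List (nub)

type Triple = (String, String, String)

-- A backend maps input text to parsed triples, or Nothing on failure.
type ParserFn = String -> Maybe [Triple]

-- Toy backend that requires the terminating "." on every line.
backendA :: ParserFn
backendA = traverse lineToTriple . lines
  where
    lineToTriple l = case words l of
      [s, p, o, "."] -> Just (s, p, o)
      _              -> Nothing

-- Toy backend that (deliberately) also accepts a missing "." terminator.
backendB :: ParserFn
backendB = traverse lineToTriple . lines
  where
    lineToTriple l = case words l of
      [s, p, o, "."] -> Just (s, p, o)
      [s, p, o]      -> Just (s, p, o)
      _              -> Nothing

-- Inputs for which the named backends do not all return the same result.
disagreements :: [(String, ParserFn)] -> [String] -> [String]
disagreements backends inputs =
  [ input
  | input <- inputs
  , length (nub [ parse input | (_, parse) <- backends ]) > 1
  ]
```

For example, `disagreements [("a", backendA), ("b", backendB)] inputs` returns only those inputs that one backend accepts and the other rejects, or that they parse differently.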

wismill commented on August 18, 2024

Ok, but at least we could drop parsec and keep only attoparsec and megaparsec. Then we could use parser-combinators.

I think we should also provide a way to stream parsing results. As there are several frameworks for this, I wonder if we should provide new packages such as: rdf4h-pipes, rdf4h-conduit, rdf4h-streamly, etc.
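
As a rough illustration of what streaming results could look like before committing to a framework, here is a sketch that uses only `base` and a lazy list. It relies on N-Triples being line-based, so each line can be parsed independently. `parseLineSketch`, `streamTriples` and `countTriples` are hypothetical names, and the line parser is a toy, not the rdf4h one.

```haskell
import Data.Char (isSpace)
import Data.List (foldl')

type Triple = (String, String, String)

-- Toy line parser: accepts exactly "subject predicate object ." per line.
parseLineSketch :: String -> Either String Triple
parseLineSketch l = case words l of
  [s, p, o, "."] -> Right (s, p, o)
  _              -> Left ("cannot parse: " ++ l)

-- Lazily yields one result per non-blank line, so a consumer can stop early
-- and a single bad line does not discard the results of the others.
streamTriples :: String -> [Either String Triple]
streamTriples = map parseLineSketch . filter (not . all isSpace) . lines

-- Example consumer: counts the successfully parsed triples with a strict fold,
-- without building the full list of triples first.
countTriples :: String -> Int
countTriples = foldl' step 0 . streamTriples
  where
    step n = either (const n) (const (n + 1))
```

Because the list is produced lazily, `take 2 (streamTriples input)` only parses the first two non-blank lines. In a package such as rdf4h-pipes or rdf4h-conduit, the lazy list would presumably be replaced by that framework's own producer type.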

robstewart57 commented on August 18, 2024

@wismill

"I think we should also provide a way to stream parsing results. As there are several frameworks for this, I wonder if we should provide new packages such as: rdf4h-pipes, rdf4h-conduit, rdf4h-streamly, etc."

I would also really like to see this!
