Slow DataLoader Performance

mstabile75 avatar mstabile75 commented on June 18, 2024
Slow DataLoader Performance

from database.

Comments (4)

beebs-systap avatar beebs-systap commented on June 18, 2024

@mstabile75 One thing to try is to configure the branching factors based on a partial load of the data. It's very possible that the shape of your data is causing some inefficiencies in the underlying storage.

Try loading part of your data, running DumpJournal, and then updating the properties file with the new branching factors and reloading.

https://wiki.blazegraph.com/wiki/index.php/IOOptimization#Branching_Factors
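
The last step of that procedure, turning per-index recommendations into properties-file entries, can be sketched as follows. This is a minimal Python sketch and not part of Blazegraph: the function name `branching_factor_overrides` and the sample values are hypothetical, and it assumes the recommended branching factor for each index has already been read off the DumpJournal output by hand. The key layout follows the `com.bigdata.namespace.<index>.com.bigdata.btree.BTree.branchingFactor` pattern used for per-index overrides.

```python
# Suffix shared by the global default and every per-index override key.
BF_SUFFIX = "com.bigdata.btree.BTree.branchingFactor"


def branching_factor_overrides(recommended):
    """Return properties lines that override the branching factor per index.

    `recommended` maps a fully qualified index name (e.g. "kb.spo.SPOC")
    to the branching factor chosen for that index.
    """
    lines = []
    for index_name in sorted(recommended):
        factor = int(recommended[index_name])
        if factor <= 0:
            raise ValueError(
                f"branching factor must be positive: {index_name}={factor}"
            )
        lines.append(f"com.bigdata.namespace.{index_name}.{BF_SUFFIX}={factor}")
    return lines


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    sample = {"kb.spo.SPOC": 600, "kb.lex.TERM2ID": 400}
    print("\n".join(branching_factor_overrides(sample)))
```

The generated lines would be pasted into the properties file before reloading; any index without an override falls back to the global `com.bigdata.btree.BTree.branchingFactor` setting.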

mstabile75 avatar mstabile75 commented on June 18, 2024

Thanks, I'll try this out this week and post the results.

KMax avatar KMax commented on June 18, 2024

Hi @mstabile75, did the branching factors help you? Thanks!

KMax avatar KMax commented on June 18, 2024

Okay, I tried to configure the branching factors based on the output of com.bigdata.journal.DumpJournal, but the speed (triples/sec) was even worse compared to another run that used only the global com.bigdata.btree.BTree.branchingFactor=256.

I tried to load ~2 billion triples using a VM with 4 CPUs, 26 GB of RAM, and a 700 GB local SSD.

Did I set the branching factors correctly? Is there anything that could minimize the effect from the custom branching factors?

These are the properties:

# changing the axiom model to none essentially disables all inference
com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
com.bigdata.rdf.store.AbstractTripleStore.quads=true
com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false

com.bigdata.rdf.store.AbstractTripleStore.geoSpatial=false
com.bigdata.rdf.sail.truthMaintenance=false
com.bigdata.rdf.store.AbstractTripleStore.textIndex=false
com.bigdata.rdf.store.AbstractTripleStore.justify=false

# RWStore (scalable single machine backend)
com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
com.bigdata.journal.AbstractJournal.file=/blazegraph/db/bigdata.jnl
com.bigdata.journal.AbstractJournal.writeCacheBufferCount=2000

# Enable small slot optimization.
com.bigdata.rwstore.RWStore.smallSlotType=1024
# Set the default B+Tree branching factor.
com.bigdata.btree.BTree.branchingFactor=256
com.bigdata.namespace.__globalRowStore.com.bigdata.btree.BTree.branchingFactor=592
com.bigdata.namespace.kb.lex.BLOBS.com.bigdata.btree.BTree.branchingFactor=2109
com.bigdata.namespace.kb.lex.ID2TERM.com.bigdata.btree.BTree.branchingFactor=903
com.bigdata.namespace.kb.lex.TERM2ID.com.bigdata.btree.BTree.branchingFactor=367
com.bigdata.namespace.kb.lex.search.com.bigdata.btree.BTree.branchingFactor=517
com.bigdata.namespace.kb.spo.CSPO.com.bigdata.btree.BTree.branchingFactor=731
com.bigdata.namespace.kb.spo.OCSP.com.bigdata.btree.BTree.branchingFactor=667
com.bigdata.namespace.kb.spo.PCSO.com.bigdata.btree.BTree.branchingFactor=864
com.bigdata.namespace.kb.spo.POCS.com.bigdata.btree.BTree.branchingFactor=816
com.bigdata.namespace.kb.spo.SOPC.com.bigdata.btree.BTree.branchingFactor=630
com.bigdata.namespace.kb.spo.SPOC.com.bigdata.btree.BTree.branchingFactor=604
# Set the default B+Tree retention queue capacity.
com.bigdata.btree.writeRetentionQueue.capacity=4000
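
One way to double-check a file like this is to list the global default next to every per-index override and compare that list against the indices reported by DumpJournal. The following is a minimal Python sketch under stated assumptions: `read_branching_factors` is a hypothetical helper, not a Blazegraph tool, and it only handles plain `key=value` lines and `#` comments, not the full Java properties syntax (line continuations, `:` separators). The sample text is a subset of the properties above.

```python
import re

DEFAULT_KEY = "com.bigdata.btree.BTree.branchingFactor"
# Per-index override keys: com.bigdata.namespace.<index>.<DEFAULT_KEY>
OVERRIDE_RE = re.compile(
    r"^com\.bigdata\.namespace\.(?P<index>.+)"
    r"\.com\.bigdata\.btree\.BTree\.branchingFactor$"
)


def read_branching_factors(text):
    """Return (default, overrides) parsed from simple key=value text."""
    default = None
    overrides = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == DEFAULT_KEY:
            default = int(value)
            continue
        match = OVERRIDE_RE.match(key)
        if match:
            overrides[match.group("index")] = int(value)
    return default, overrides


if __name__ == "__main__":
    sample = """\
# Set the default B+Tree branching factor.
com.bigdata.btree.BTree.branchingFactor=256
com.bigdata.namespace.kb.lex.TERM2ID.com.bigdata.btree.BTree.branchingFactor=367
com.bigdata.namespace.kb.spo.SPOC.com.bigdata.btree.BTree.branchingFactor=604
"""
    default, overrides = read_branching_factors(sample)
    print("default:", default)
    for index_name, factor in sorted(overrides.items()):
        print(index_name, factor)
```

A listing like this only confirms that the keys are well formed and that each index has the intended value; it says nothing about whether those values suit the data or the hardware.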
