Comments (5)

realstolz commented on June 1, 2024

We are considering adding a feature that dumps the in-memory gstore to binary files on DFS and reloads them, but it is not ready yet. In the meantime, here are some tips that might help you.

  1. After loading the dataset, you can write new queries and run them, as long as you don’t need to change the dataset.
  2. Most configuration items can be changed at runtime with the “config -s XXX” command, without restarting Wukong.
  3. To speed up dataset loading, you can:
    1. always split your input RDF data into multiple files (#files >= #machines * #engine-threads), since Wukong can load them in parallel.
    2. replace “str_normal” with “str_normal_minimal”, which contains only the ID mappings used by your queries (if you don't need to read the query results in string format). Currently, Wukong loads the ID-mapping file (“str_normal” or “str_normal_minimal”) sequentially.
    3. disable the planner in the “config” file (i.e., global_enable_planner); Wukong will then skip collecting the data statistics used by the planner when loading the dataset. Currently, Wukong collects them sequentially, without optimizations. Note that after disabling the planner, you must manually decide the order of the triple patterns and their access direction in your query (e.g., script/query/lubm_q*).
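
As a rough illustration of tip 3-i, here is a minimal Python sketch that splits one large triple file into several smaller ones so they can be loaded in parallel. It assumes a line-oriented input (one triple per line); the function names and the `part_NNNN.nt` output names are placeholders invented for this example, so rename the outputs to whatever your Wukong setup expects.

```python
import os


def min_file_count(num_machines, engine_threads):
    """Lower bound from the tip above: #files >= #machines * #engine-threads."""
    return num_machines * engine_threads


def split_triples(src_path, out_dir, num_parts):
    """Distribute the lines of src_path round-robin over num_parts files.

    Assumes one triple per line. Output file names are placeholders.
    """
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    os.makedirs(out_dir, exist_ok=True)
    paths = [os.path.join(out_dir, f"part_{i:04d}.nt") for i in range(num_parts)]
    outs = [open(p, "w", encoding="utf-8") for p in paths]
    try:
        with open(src_path, encoding="utf-8") as src:
            for n, line in enumerate(src):
                # Line n goes to file n mod num_parts, so sizes stay balanced.
                outs[n % num_parts].write(line)
    finally:
        for f in outs:
            f.close()
    return paths
```

For example, with 4 machines and 16 engine threads each, `split_triples("data.nt", "out", min_file_count(4, 16))` would produce 64 files of nearly equal size (`data.nt` and `out` are example paths).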

from wukong.

kjhkim0702 commented on June 1, 2024

Thank you for the comment. I will try your advice. I have one question related to 3-iii, and another question about batch mode.
Q1. Related to 3-iii: I want to try queries other than script/query/lubm_q*. Are there any suggestions (or rules) for deciding the matching order of triple patterns?
Q2. Batch mode: the help command shows "-b a set of queries configured by (batch-mode)". Does batch mode mean that Wukong processes a batch of queries concurrently? How can I configure a set of queries in an input file? (I don't understand the meaning of the numbers in scripts/batch/mix_config.)

realstolz commented on June 1, 2024

For Q1, we only have some simple heuristic rules. For selective queries, a good plan commonly starts from a normal vertex (a constant entity, like <http://www.Department0.University0.edu> in lubm_q4). For non-selective queries, a good plan commonly starts from an index vertex (a type or predicate, like ub:FullProfessor in lubm_q7 and ub:undergraduateDegreeFrom in lubm_q3) and favors steps with relatively strong pruning. Moreover, you can evaluate different plans on a small dataset and reuse the best one on a large dataset if the two datasets follow the same style.

For Q2, the batch mode is used to evaluate the throughput of Wukong (sorry for the confusing name): it generates a large number of queries from templates and continuously submits them to the engines. A configuration file (i.e., script/batch/mix_config) specifies the combination and submission rate of the queries, which are defined in several query template files (i.e., script/batch/q?).

The only difference between a query file and a query template file is that the start constant vertex (e.g., <http://www.Department0.University0.edu> in script/query/lubm_q4) is replaced by a type (e.g., %ub:Department in script/batch/q4). Wukong’s proxy first sends a query to retrieve a pool of all vertices belonging to that type, and then generates queries by randomly replacing the type in the query template with a vertex from the pool.
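
The template instantiation described above could be sketched roughly as follows in Python. This is only an illustration of the idea, not Wukong's actual proxy code: the `%`-prefixed token syntax is taken from the `%ub:Department` example, while the function names, the regular expression, and the sample pattern are assumptions made for this sketch.

```python
import random
import re

# Assumption: a placeholder is '%' followed by a run of non-space characters.
PLACEHOLDER = re.compile(r"%(\S+)")


def placeholder_type(template):
    """Return the type named by the first placeholder, or None if absent."""
    match = PLACEHOLDER.search(template)
    return match.group(1) if match else None


def instantiate(template, pool, rng=random):
    """Replace the first placeholder with a vertex drawn at random from pool."""
    if not pool:
        raise ValueError("vertex pool is empty")
    vertex = rng.choice(pool)
    # A function replacement avoids backslash processing in the vertex string.
    return PLACEHOLDER.sub(lambda _m: vertex, template, count=1)


# Illustrative pattern and pool; in Wukong the pool would come from a query
# that retrieves all vertices of the placeholder's type.
template = "?X ub:worksFor %ub:Department ."
pool = ["<http://www.Department0.University0.edu>"]
query = instantiate(template, pool)
```

Calling `instantiate` repeatedly with a larger pool yields a stream of concrete queries that share the template's shape but start from different vertices.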

BTW, the real batch mode you wanted is easy to implement. You can first add a new command, which reads a configuration file with a set of query files, and then the proxy can send and receive the queries one-by-one or in parallel.
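
To make the control flow of such a command concrete, here is a hedged Python sketch. The real implementation would live in Wukong's proxy; everything here is hypothetical, including the one-path-per-line config format, the function names, and the `send_query` callable that stands in for whatever submits a query and waits for its reply.

```python
from concurrent.futures import ThreadPoolExecutor


def read_batch_config(text):
    """Parse a hypothetical config: one query file path per line, '#' comments."""
    paths = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            paths.append(line)
    return paths


def run_batch(query_files, send_query, parallel=False, workers=4):
    """Run every query file through send_query; results keep the input order."""
    if parallel:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Executor.map yields results in the order of its inputs.
            return list(pool.map(send_query, query_files))
    return [send_query(path) for path in query_files]
```

Usage would look like `run_batch(read_batch_config(cfg_text), send_query, parallel=True)`, where `cfg_text` is the content of the config file and `send_query` is supplied by the caller.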

kjhkim0702 commented on June 1, 2024

Thank you.

realstolz commented on June 1, 2024

The new version (v0.2.0) supports running a batch of SPARQL queries with a single command.
