I am trying to run lubm query on some large lubm datasets. It took large amount of tim

Can I export a loaded database? about wukong HOT 5 CLOSED

sjtu-ipads commented on June 1, 2024

Can I export a loaded database?

from wukong.

Comments (5)

realstolz commented on June 1, 2024

We are considering adding this feature (dumping the in-memory gstore to binary files on DFS and reloading them), but it is not ready yet. In the meantime, here are some tips that might help you.

1. After loading a dataset, you can write new queries and run them, as long as you don't need to change the dataset.
2. Most configuration items can be changed at runtime with the “config -s XXX” command, without restarting Wukong.
3. To speed up dataset loading, you can:
   i. Always split your input RDF data into multiple files (#files >= #machines * #engine-threads), since Wukong can load them in parallel.
   ii. Replace “str_normal” with “str_normal_minimal”, which contains only the ID mappings used by your queries (if you don't need to read the query results in string format). Currently, Wukong loads the ID-mapping file (“str_normal” or “str_normal_minimal”) sequentially.
   iii. Disable the planner in the “config” file (i.e., global_enable_planner), so that Wukong does not collect the data statistics used by the planner when loading the dataset. Currently, Wukong collects them sequentially, without optimizations. Note that after disabling the planner you must manually decide the order of the triple patterns and their access directions in your query (e.g., script/query/lubm_q*).
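The file-splitting tip can be sketched in Python as follows. This is only an illustration: it assumes the input has one triple per line, and the output file names (`part_0.nt`, ...) and the helper names are hypothetical, not something Wukong prescribes.

```python
from pathlib import Path


def min_file_count(num_machines: int, engine_threads: int) -> int:
    """Smallest file count satisfying #files >= #machines * #engine-threads."""
    return num_machines * engine_threads


def split_round_robin(lines, out_dir, num_files, prefix="part_"):
    """Distribute lines (assumed: one triple per line) over num_files files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme; adapt it to whatever your loader expects.
    paths = [out_dir / f"{prefix}{i}.nt" for i in range(num_files)]
    handles = [p.open("w", encoding="utf-8") for p in paths]
    try:
        for i, line in enumerate(lines):
            text = line if line.endswith("\n") else line + "\n"
            handles[i % num_files].write(text)
    finally:
        for handle in handles:
            handle.close()
    return paths
```

For example, with 4 machines and 8 engine threads each, `min_file_count(4, 8)` gives 32, so the data would be split into at least 32 files.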

kjhkim0702 commented on June 1, 2024

Thank you for the comment. I will try your advice. I have one question related to 3-iii, and another about batch mode.
Q1 (related to 3-iii): I want to try queries other than script/query/lubm_q*. Are there any suggestions (or rules) for deciding the matching order of triple patterns?
Q2 (batch mode): The help command shows "-b a set of queries configured by (batch-mode)". Does batch mode mean that Wukong processes a batch of queries concurrently? How can I configure a set of queries in an input file? (I don't understand the meaning of the numbers in scripts/batch/mix_config.)

realstolz commented on June 1, 2024

For Q1, we only have some simple heuristic rules. For selective queries, a good plan commonly starts from a normal vertex (a constant entity, like <http://www.Department0.University0.edu> in lubm_q4). For non-selective queries, a good plan commonly starts from an index vertex (a type or predicate, like ub:FullProfessor in lubm_q7 and ub:undergraduateDegreeFrom in lubm_q3) and prefers steps that prune relatively many candidates. Moreover, you can evaluate different plans on a small dataset and reuse the best one on a large dataset if the two datasets follow the same style.
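A rough Python sketch of the starting-vertex heuristic is shown here. It is not Wukong's planner: the `TriplePattern` type, the `suggest_start` helper, and the shortcut of treating "has a constant entity" as "selective" are all assumptions made for illustration, and the pruning preference is not modeled.

```python
from dataclasses import dataclass

RDF_TYPE = "rdf:type"


@dataclass(frozen=True)
class TriplePattern:
    subject: str
    predicate: str
    object: str


def is_variable(term: str) -> bool:
    return term.startswith("?")


def suggest_start(patterns):
    """Return (kind, term) for a suggested starting vertex, or None."""
    # Prefer a normal vertex: a constant entity outside rdf:type patterns.
    for p in patterns:
        if p.predicate == RDF_TYPE:
            continue
        for term in (p.subject, p.object):
            if not is_variable(term):
                return ("normal", term)
    # Otherwise fall back to an index vertex: a type first ...
    for p in patterns:
        if p.predicate == RDF_TYPE and not is_variable(p.object):
            return ("index", p.object)
    # ... and then a constant predicate.
    for p in patterns:
        if not is_variable(p.predicate):
            return ("index", p.predicate)
    return None
```

With illustrative patterns (not copied from script/query), a query containing `?X ub:worksFor <http://www.Department0.University0.edu>` would start from that department, while a query with only `?X rdf:type ub:FullProfessor` and variable-to-variable patterns would start from the `ub:FullProfessor` index.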

For Q2, the batch mode is used to evaluate the throughput of Wukong (sorry for the confusing name): it generates a large number of queries from templates and continuously submits them to the engines. A configuration file (i.e., script/batch/mix_config) specifies the combination and submission rate of the queries, which are defined in several query template files (i.e., script/batch/q?).

The only difference between a query file and a query template file is that the start constant vertex (e.g., <http://www.Department0.University0.edu> in script/query/lubm_q4) is replaced by a type (e.g., %ub:Department in script/batch/q4). Wukong's proxy first sends a query to retrieve a pool of all vertices belonging to that type, and then generates queries by randomly replacing the type in the query template with a vertex from the pool.
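The template mechanism can be illustrated with a small Python sketch. It assumes a template is plain text containing a single `%`-prefixed type placeholder, and it takes the vertex pool as an argument instead of querying for it. The function names are hypothetical and this is not the proxy's actual code.

```python
import random
import re

# A placeholder is '%' followed by non-whitespace, e.g. %ub:Department.
PLACEHOLDER = re.compile(r"%\S+")


def placeholder_type(template: str):
    """Return the type named by the placeholder (without '%'), or None."""
    match = PLACEHOLDER.search(template)
    return match.group(0)[1:] if match else None


def instantiate(template: str, pool, rng: random.Random) -> str:
    """Replace the first placeholder with a random vertex from the pool."""
    match = PLACEHOLDER.search(template)
    if match is None:
        return template  # an ordinary query file, nothing to replace
    if not pool:
        raise ValueError("vertex pool is empty")
    vertex = rng.choice(pool)
    return template[:match.start()] + vertex + template[match.end():]


def generate_queries(template: str, pool, count: int, seed=None):
    """Generate `count` concrete queries from one template."""
    rng = random.Random(seed)
    return [instantiate(template, pool, rng) for _ in range(count)]
```

In a real run the pool would come from the proxy's initial query for all vertices of the placeholder type; here it is simply a list of strings.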

BTW, the real batch mode you wanted is easy to implement. You can first add a new command that reads a configuration file listing a set of query files; the proxy can then send the queries and receive the results one by one or in parallel.
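The control flow of such a command might look like the Python sketch below. Wukong itself is not written in Python, so this only illustrates the idea: the list-file format (one query file path per line, `#` for comments) and the `send_query` callable, which stands in for the proxy's send/receive step, are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parse_query_list(config_text: str):
    """Parse a hypothetical list file: one query file path per line."""
    files = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            files.append(line)
    return files


def run_batch(query_files, send_query, parallel=False, max_workers=4):
    """Send every query and collect the results in input order.

    `send_query` is a caller-supplied function (query text -> result).
    """
    texts = [Path(p).read_text(encoding="utf-8") for p in query_files]
    if not parallel:
        return [send_query(text) for text in texts]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the inputs.
        return list(pool.map(send_query, texts))
```

The sequential path is the simplest to reason about. The parallel path only helps if `send_query` spends most of its time waiting on the engines.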

kjhkim0702 commented on June 1, 2024

Thank you.

realstolz commented on June 1, 2024

The new version (v0.2.0) supports running a batch of SPARQL queries with a single command.
