chestnut's Introduction

Welcome to the Chestnut project!

Chestnut is a data layout generator for in-memory object-oriented database applications.

  • It takes a set of object queries and a memory bound as input and generates a customized data layout.

  • The data layout can be nested (to reduce the cost of serializing from the tabular layout used in the database to the nested layout used in the application) and can include fancy indexes (to preprocess the queries as much as possible).

  • Chestnut formulates the search for the optimal layout under a memory bound as an ILP problem, which lets it achieve the desired tradeoff between memory and query performance.
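
To make the memory/performance tradeoff concrete, here is a toy sketch of the kind of selection problem described above. It is not Chestnut's actual ILP formulation: the function `choose_plans`, the data structure names, and all costs and sizes are made up for illustration, and brute-force enumeration stands in for an ILP solver. Each query has several candidate plans, each plan needs some data structures, and data structures shared between plans are only paid for once.

```python
from itertools import product

def choose_plans(plans_per_query, ds_sizes, mem_bound):
    """Pick one plan per query, minimizing total cost under a memory bound.

    plans_per_query: {query: [(cost, frozenset of data structure names), ...]}
    ds_sizes:        {data structure name: memory size}
    Returns (total_cost, total_memory, {query: chosen plan}) or None if
    no combination fits within mem_bound.
    """
    queries = sorted(plans_per_query)
    best = None
    # Brute force over every combination of one plan per query (toy scale only).
    for combo in product(*(plans_per_query[q] for q in queries)):
        # A data structure used by several plans is counted once.
        used = set().union(*(ds for _, ds in combo))
        memory = sum(ds_sizes[d] for d in used)
        if memory > mem_bound:
            continue
        cost = sum(c for c, _ in combo)
        if best is None or cost < best[0]:
            best = (cost, memory, dict(zip(queries, combo)))
    return best

# Hypothetical data structures and plans, purely illustrative.
ds_sizes = {"base_array": 100, "index_by_user": 40, "nested_channels": 70}
plans_per_query = {
    "q1": [(10, frozenset({"base_array"})),
           (2, frozenset({"base_array", "index_by_user"}))],
    "q2": [(8, frozenset({"base_array"})),
           (1, frozenset({"nested_channels"}))],
}

result = choose_plans(plans_per_query, ds_sizes, mem_bound=170)
print(result[0], result[1])  # total cost and memory of the chosen plans
```

With a bound of 170, the fastest plan for each query cannot be chosen together (they would need 210 units of memory), so the search settles on a combination that shares `base_array`. Raising the bound lets both fast plans fit.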

While the project is still under maintenance, you can check out some examples under benchmarks/.

For instance, in benchmarks/kandan/kandan.py, you can run

generate_db_data_files(datafile_dir, tables, associations)
populate_database(datafile_dir, tables, associations, True)

These calls generate random data for the application and populate a MySQL database with it. To generate data of a different size, change the scale in kandan_schema.py. The generated data is stored in a TSV file with fields separated by |, under ./data/#{workload_name}/.
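
If you want to inspect the generated data outside of Chestnut, a |-separated file can be read with Python's standard csv module. The sketch below is an assumption about the file shape (no header row, no quoting, one record per line), and the sample rows are invented; check a real file under ./data/ before relying on it.

```python
import csv
import io

def read_pipe_separated(stream):
    """Return the rows of a '|'-separated stream as lists of strings."""
    # QUOTE_NONE treats quote characters as ordinary data (an assumption
    # about how the generated files are written).
    return list(csv.reader(stream, delimiter="|", quoting=csv.QUOTE_NONE))

# Invented sample standing in for a generated data file.
sample = io.StringIO("1|alice|general\n2|bob|random\n")
rows = read_pipe_separated(sample)
for row in rows:
    print(row)
```

For a file on disk, pass `open(path, newline="")` instead of the `io.StringIO` object.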

Then you can run

search_plans_for_one_query(q_ci_1)

to see all the query plans and data structures enumerated by Chestnut for an individual query (currently printed in Chestnut IR).

You can try

test_codegen_one_query(tables, associations, q_ci_1)

to see the best data structure generated for a single query, along with the generated C++ code. Go to the folder of generated code (./#{workload_name}/) to compile and run it. The code loads data from MySQL to populate the data structure (the data loading may take a while), then runs the query and prints a subset of the query result along with how long the query takes.

If you have a Gurobi license installed, you can try the ILP to see how query plans share data structures. To enable the ILP, go to ./ilp/ilp_manager.y and uncomment the line

from gurobipy import *

If you don't have the license installed, you can still run the non-ILP parts (like generating code for a single query) by uncommenting this line in ./ilp/ilp_manager.y:

from ilp_fake import *
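
As an optional alternative to editing the file by hand, the choice between the two modules could be made at import time. The sketch below is not part of Chestnut: `import_first_available` is a hypothetical helper, and it returns a module object rather than performing a star import, so the calling code would need adapting. It also only checks that a package can be imported, which may not be the same as having a working license.

```python
import importlib

def import_first_available(module_names):
    """Return the first module in module_names that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of %r could be imported" % (module_names,))

# For Chestnut the list would presumably be ["gurobipy", "ilp_fake"].
# Standard-library names are used here so the sketch runs anywhere.
backend = import_first_available(["module_that_does_not_exist", "json"])
print(backend.__name__)
```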

To run the ILP and see the data structures generated (including code), try the following:

ilp_solve(read_queries, write_queries=[], membound_factor=1.7, save_to_file=True, read_from_file=False, read_ilp=False, save_ilp=True)
test_read_overall(tables, associations, read_queries, memfactor=1.7, read_from_file=True, read_ilp=True)

Write queries are currently not supported. membound_factor is the ratio of the allowed memory bound to the size of the application data. As before, the generated code is stored under ./#{workload_name}/.
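
As a worked example of that ratio, the following sketch computes the memory bound implied by a given factor. The helper name `memory_bound` and the 2 GiB data size are assumptions for illustration; how Chestnut measures the application data size internally is not described here.

```python
def memory_bound(data_size, membound_factor):
    """Allowed memory, in the same unit as data_size."""
    return data_size * membound_factor

# Assumed example: 2 GiB of application data with membound_factor=1.7.
data_size_bytes = 2 * 1024 ** 3
bound = memory_bound(data_size_bytes, 1.7)
print("allowed memory: %.2f GiB" % (bound / 1024 ** 3))
```

A factor of 1.0 would leave no room beyond the data itself, while larger factors give the search more space for indexes and nested copies.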

You can also test the cost of serializing the query result returned by the C++ code into Ruby objects. To do so, change the following line in globalv.py

qr_type = 'struct'

to

qr_type = 'proto'

The code-generation process also generates a Ruby test file, which sends query parameters to and receives query results from the Chestnut-generated C++ code using zmq. The Ruby test file is under ./#{workload_name}/ruby. To run the test, first compile and run the Chestnut-generated C++ code, which populates the data structures and waits for requests, then run the Ruby file to issue the queries.

chestnut's People

Contributors

congy, mingweisamuel, akcheung

chestnut's Issues

TPC-H benchmark issues

Hi,

I've been trying to get the tpch benchmarks to run, and I'm hitting a few issues. I'm able to get chestnut to generate code and to generate the backing DB, but I haven't been able to run the generated code.

The issue that I'm seeing right now is that the generated code exits while loading data from the DB for tpch query 1. It looks to me like there is an off-by-one error in the VarChar template. In particular, when LENGTH=1, the calls to memcpy will copy zero bytes. Something in the data loading code notices this and calls exit.

If I patch this issue, I run into two more problems. The first is that the generated code is missing some return statements in non-void functions. Clang helpfully inserts an illegal instruction here, which crashes the program. Fixing this leads to a segmentation fault in some other code.

I'm pretty sure that I'm doing something wrong in the codegen process. This is happening with the tip of the partition branch. Any pointers to the right branch or the right process for generating code would be great!
