asvyatkovskiy / rootconverter

This project forked from diana-hep/rootconverter

Converts ROOT trees into different formats to make them accessible in Big Data applications.

License: Apache License 2.0

Makefile 0.59% Python 21.62% C++ 46.46% C 1.03% Scala 30.30%

rootconverter

There are several projects here, three of which are complete. They all belong in the same git repository because they share code.

  • root2avro is a C++ program that converts ROOT TTree data into equivalent Avro data (which may be saved to a file on disk or streamed into another application).
  • scaroot-reader is a hybrid Scala/C++ (through JNA) library that streams ROOT TTree data directly into the JVM. Data representation is controlled with (possibly) user-supplied callbacks.
  • Spark examples show how to use ScaROOT-Reader in Spark.

Click on the links to go to specific documentation for each.

Rough performance statistics for 1000 Event.root entries on a single machine (my laptop). Treat these numbers as relative rather than absolute.

  • 1.8 sec: read TTree, discard data.
  • 1.8 sec: read TTree, create Scala objects with ScaROOT-Reader (negligible difference from above). However, repeating this test eventually produced some 3-second spikes, presumably due to garbage collector pauses.
  • 5.7 sec: convert to uncompressed Avro file and save. Reading from Avro file in Java: about 1 sec. Avro file is 2.0 times as large as the original ROOT file.
  • 6.1 sec: convert to Snappy-compressed Avro file and save. Avro file is 1.4 times as large as the original ROOT file.
  • 29 sec: convert to Avro with any other compression method. Avro file is 1.0 times as large as the original ROOT file (suggesting that ROOT uses something like deflate).
  • 18 sec: convert to JSON file and save. The JSON file is huge.
  • 29 sec: abandoned scaroot-oldreader version (see old branch).
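Since these timings are meant to be read relative to one another, it can help to normalize each against the 1.8-second baseline of reading the TTree and discarding the data. The following is a minimal sketch in Python: the numbers are copied from the list above, while the dictionary labels and the `relative_cost` helper are names introduced here for illustration only.

```python
# Timings in seconds for 1000 Event.root entries, copied from the list above.
# The short labels are illustrative, not identifiers from the project.
timings = {
    "read TTree, discard data": 1.8,
    "ScaROOT-Reader to Scala objects": 1.8,
    "Avro, uncompressed": 5.7,
    "Avro, Snappy": 6.1,
    "Avro, other compression": 29.0,
    "JSON": 18.0,
}

BASELINE = timings["read TTree, discard data"]


def relative_cost(seconds, baseline=BASELINE):
    """Return how many times slower a step is than the read-only baseline."""
    return seconds / baseline


if __name__ == "__main__":
    # Print the steps from fastest to slowest with their relative cost.
    for label, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
        print(f"{label:35s} {seconds:5.1f} s  {relative_cost(seconds):5.1f}x")
```

Read this way, the conversions fall into two groups: uncompressed and Snappy-compressed Avro cost a few times the baseline, while JSON and the other Avro compression methods cost ten times the baseline or more.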

Unfortunately, file-reading cannot be parallelized in the same process: you immediately get libRIO segmentation faults. Adding a "micro-batch" strategy of copying several entries from C++ to Scala at a time does nothing for performance.
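If reading cannot be parallelized inside one process, one possible workaround is to parallelize across processes, giving each file its own child process. The sketch below is an assumption, not something this repository provides, and whether it avoids the libRIO segmentation faults has not been verified here. `PLACEHOLDER_CMD` is a stand-in that only echoes an invented output name; it would need to be replaced by the real per-file converter invocation, and the file names are made up.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Stand-in command (hypothetical): a tiny Python child process that prints an
# invented output name. Replace with the real per-file converter invocation.
PLACEHOLDER_CMD = [sys.executable, "-c", "import sys; print(sys.argv[1] + '.avro')"]


def convert_one(path):
    """Handle one file in its own child process and return the child's stdout."""
    result = subprocess.run(
        PLACEHOLDER_CMD + [path], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def convert_all(paths, max_workers=2):
    """Launch one child process per file; the threads only wait on the children."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the input paths.
        return list(pool.map(convert_one, paths))


if __name__ == "__main__":
    # Invented file names, for illustration only.
    print(convert_all(["a.root", "b.root", "c.root"]))
```

The threads in the parent do no reading themselves, so no two files are ever read concurrently within the same process; `max_workers` caps how many child processes run at once.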

Contributors

jpivarski, sabasehrish
