elvismircan / jolt

This project forked from bazaarvoice/jolt

JSON to JSON transformation library written in Java.

Home Page: http://bazaarvoice.github.io/jolt/

License: Apache License 2.0

Jolt

JSON to JSON transformation library written in Java where the "specification" for the transform is itself a JSON document.

Useful For

  1. Transforming JSON data from ElasticSearch, MongoDB, Cassandra, etc. before sending it off to the world
  2. Extracting data from large JSON documents for your own consumption

Overview

Jolt :

  • provides a set of transforms that can be "chained" together to form the overall JSON to JSON transform.
  • focuses on transforming the structure of your JSON data, not manipulating specific values
    • The idea being: use Jolt to get most of the structure right, then write code to fix values
  • consumes and produces "hydrated" JSON : an in-memory tree of Maps, Lists, Strings, etc.
    • use Jackson (or whatever) to serialize and deserialize the JSON text
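
To make "hydrated" JSON concrete, here is a minimal sketch that builds such a tree by hand from plain Java collections. The class name HydratedJsonSketch and the sample data are made up for illustration; in practice a JSON library such as Jackson would typically produce this tree from JSON text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: the hydrated form of {"rating": {"value": 3, "tags": ["a", "b"]}}
// built from plain Maps and Lists. Names and data are illustrative only.
class HydratedJsonSketch {

    static Map<String, Object> build() {
        List<Object> tags = new ArrayList<>();
        tags.add("a");
        tags.add("b");

        Map<String, Object> rating = new LinkedHashMap<>();
        rating.put("value", 3);
        rating.put("tags", tags);

        Map<String, Object> root = new LinkedHashMap<>();
        root.put("rating", rating);
        return root;
    }

    public static void main(String[] args) {
        Map<String, Object> root = build();
        // Navigating the tree is ordinary Map/List access plus casts.
        Map<?, ?> rating = (Map<?, ?>) root.get("rating");
        List<?> tags = (List<?>) rating.get("tags");
        System.out.println(rating.get("value")); // prints 3
        System.out.println(tags.get(1));         // prints b
    }
}
```

A transform in this style takes a tree like `root` as its input and returns another tree of the same kind.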

Stock Transforms

The Stock transforms are:

shift       : copy data from the input tree and put it in the output tree
default     : apply default values to the tree
remove      : remove data from the tree
sort        : sort the Map key values alphabetically ( for debugging and human readability )
cardinality : "fix" the cardinality of input data.  E.g., the "urls" element is usually a List, but if there is only one, then it is a String
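
As an illustration of the cardinality problem only (this is not Jolt's implementation, and CardinalitySketch and toMany are hypothetical names), the following sketch normalizes a value that may be either a single item or a List into a List:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of what "fixing cardinality" means for one value.
// Hypothetical helper, not part of Jolt.
class CardinalitySketch {

    // Return a List whether the input was a single value, a List, or null.
    static List<Object> toMany(Object value) {
        List<Object> out = new ArrayList<>();
        if (value instanceof List) {
            out.addAll((List<?>) value);
        } else if (value != null) {
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        // A lone String becomes a one-element List.
        System.out.println(toMany("http://a.example"));
        // A List passes through unchanged in content.
        System.out.println(toMany(Arrays.asList("http://a.example", "http://b.example")));
    }
}
```

With the stock cardinality transform, the same intent is expressed declaratively in a JSON spec rather than in code.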

Each transform has its own DSL (Domain Specific Language) in order to facilitate its narrow job.

Currently, all the Stock transforms just affect the "structure" of the data. To do data manipulation, you will need to write Java code. If you write your Java "data manipulation" code to implement the Transform interface, then you can insert your code in the transform chain.

The out-of-the-box Jolt transforms should be able to do most of your structural transformation, with custom Java Transforms implementing your data manipulation.
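
The split between structural transforms and custom data manipulation could look roughly like the sketch below. To keep it self-contained it declares a local stand-in Transform interface with a single transform(Object) method, which is an assumption about the shape of Jolt's interface; UpperCaseName, WrapInData, and runChain are hypothetical names, and the loop is a simplification of what a chain does, not Chainr itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class CustomTransformSketch {

    // Local stand-in for a transform interface (assumed shape: hydrated
    // JSON in, hydrated JSON out). Not the real Jolt type.
    interface Transform {
        Object transform(Object input);
    }

    // Hypothetical "data manipulation" step: upper-case the "name" value.
    static class UpperCaseName implements Transform {
        @Override
        public Object transform(Object input) {
            if (input instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> map = (Map<String, Object>) input;
                Object name = map.get("name");
                if (name instanceof String) {
                    map.put("name", ((String) name).toUpperCase(Locale.ROOT));
                }
            }
            return input;
        }
    }

    // Hypothetical structural step: nest the whole input under "data".
    static class WrapInData implements Transform {
        @Override
        public Object transform(Object input) {
            Map<String, Object> out = new LinkedHashMap<>();
            out.put("data", input);
            return out;
        }
    }

    // Minimal chain: feed each transform's output into the next one.
    static Object runChain(List<Transform> chain, Object input) {
        Object current = input;
        for (Transform t : chain) {
            current = t.transform(current);
        }
        return current;
    }

    static Object demo() {
        List<Transform> chain = new ArrayList<>();
        chain.add(new UpperCaseName());
        chain.add(new WrapInData());
        Map<String, Object> input = new LinkedHashMap<>();
        input.put("name", "jolt");
        return runChain(chain, input);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints {data={name=JOLT}}
    }
}
```

The point of the design is that the value-changing step is just one more link in the chain, so it can sit before or after the structural steps as needed.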

Documentation

Jolt Slide Deck : covers motivation, development, and transforms.

Javadoc explaining each transform DSL :

  • shift
  • default
  • remove
  • sort
  • fully qualified Java ClassName : Class implements the Transform or ContextualTransform interfaces, and can optionally be SpecDriven (marker interface)
    • Transform interface
    • SpecDriven
      • where the "input" is "hydrated" Java version of your JSON Data

Running a Jolt transform means creating an instance of Chainr with a list of transforms.

The JSON spec for Chainr looks like : unit test.

The Java side looks like :

Chainr chainr = Chainr.fromSpec( JsonUtils.classpathToList( "/path/to/chainr/spec.json" ) ); // load the spec as a List, then build the chain from it

Object input = elasticSearchHit.getSource(); // ElasticSearch already returns hydrated JSON

Object output = chainr.transform( input );

return output;

Shiftr Transform DSL

The Shiftr transform generally does most of the "heavy lifting" in the transform chain. To see the Shiftr DSL in action, please look at our unit tests (shiftr tests) for nice bite-sized transform examples, and read the extensive Shiftr javadoc.

Our unit tests follow the pattern :

{
    "input": {
        // sample input
    },

    "spec": {
        // transform spec
    },

    "expected": {
        // what the output of the transform looks like
    }
}

We read in "input", apply the "spec", and Diffy it against the "expected".

To learn the Shiftr DSL, examine the "input" and "expected" JSON, get an understanding of how data is moving, and then look at the transform spec to see how it facilitates the transform.
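
To give a feel for how data moves, here is a deliberately tiny sketch of the shift idea in plain Java, arranged in the same input / spec / expected pattern as the tests. It is not Shiftr: it assumes literal keys only (no wildcards), treats each String leaf of the spec as a dotted output path, and every name in it (MiniShiftSketch, shift, write, map) is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MiniShiftSketch {

    // Walk the spec and the input in parallel. A String leaf in the spec
    // is a dotted output path; the matching input value is copied there.
    static void shift(Object input, Map<String, Object> spec, Map<String, Object> output) {
        if (!(input instanceof Map)) {
            return; // the spec expects an object here; nothing to copy
        }
        Map<?, ?> in = (Map<?, ?>) input;
        for (Map.Entry<String, Object> e : spec.entrySet()) {
            if (!in.containsKey(e.getKey())) {
                continue; // key named in the spec is absent from the input
            }
            Object value = in.get(e.getKey());
            Object rule = e.getValue();
            if (rule instanceof String) {
                write(output, (String) rule, value);
            } else if (rule instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> sub = (Map<String, Object>) rule;
                shift(value, sub, output);
            }
        }
    }

    // Place a value at a dotted path, creating intermediate Maps as needed.
    static void write(Map<String, Object> output, String path, Object value) {
        String[] keys = path.split("\\.");
        Map<String, Object> node = output;
        for (int i = 0; i < keys.length - 1; i++) {
            Object child = node.get(keys[i]);
            if (!(child instanceof Map)) {
                child = new LinkedHashMap<String, Object>();
                node.put(keys[i], child);
            }
            @SuppressWarnings("unchecked")
            Map<String, Object> next = (Map<String, Object>) child;
            node = next;
        }
        node.put(keys[keys.length - 1], value);
    }

    // Small helper to build hydrated Maps from alternating keys and values.
    static Map<String, Object> map(Object... keyValues) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            m.put((String) keyValues[i], keyValues[i + 1]);
        }
        return m;
    }

    static Map<String, Object> expected() {
        return map("Rating", 3, "Range", map("max", 5));
    }

    static Map<String, Object> runExample() {
        Map<String, Object> input = map("rating", map("primary", map("value", 3, "max", 5)));
        Map<String, Object> spec = map("rating", map("primary", map("value", "Rating", "max", "Range.max")));
        Map<String, Object> output = new LinkedHashMap<>();
        shift(input, spec, output);
        return output;
    }

    public static void main(String[] args) {
        Map<String, Object> output = runExample();
        System.out.println(output);                    // prints {Rating=3, Range={max=5}}
        System.out.println(output.equals(expected())); // prints true
    }
}
```

Reading the spec next to the input shows the pattern: the spec mirrors the shape of the input, and its leaves say where each value lands in the output.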

For reference, this was the very first test we wrote.

Demo

There is a demo available at jolt-demo.appspot.com. You can paste in JSON input data and a Spec, and it will post the data to the server and run the transform.

Note

  • it is hosted on a free Google App Engine instance, so it may take a minute to spin up.
  • it validates the input JSON and spec client side, but if there are any errors server side it just fails silently.

Getting Started

Getting started code-wise has its own doc.

Getting Transform Help

If you can't get a transform working and you need help, create an Issue in Jolt (for now).

Make sure you include what your "input" is, and what you want your "output" to be.

Alternatives

Aside from writing your own custom code to do a transform, there are two general approaches to doing JSON to JSON transforms in Java.

  1. JSON -> XML -> XSLT or STX -> XML -> JSON

Aside from being a Rube Goldberg approach, XSLT is more complicated than Jolt because it is trying to do the whole transform with a single DSL.

  2. Write a Template (Velocity, FreeMarker, etc.) that takes hydrated JSON input and writes textual JSON output

With this approach you are working from the output format backwards to the input, which is complex for any non-trivial transform. E.g., the structure of your template will be dictated by the output JSON format, and you will end up coding a parallel tree walk of the input data and the output format in your template. Jolt works forward from the input data to the output format, which is simpler, and it does the parallel tree walk for you.

Performance

The primary goal of Jolt was to improve "developer speed" by providing the ability to write declarative rather than imperative transforms. That said, Jolt should have a better runtime than the alternatives listed above.

Work has been done to make the stock Jolt transforms fast:

  1. Transforms can be initialized once with their spec, and re-used many times in a multi-threaded environment.
    • We reuse initialized Jolt transforms to service multiple web requests from a DropWizard service.
  2. "*" wildcard logic was redone to reduce the use of Regex in the common case, which was a dramatic speed improvement.
  3. The parallel tree walk performed by Shiftr was optimized.
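
The "initialize once, reuse many times" pattern from the first point can be sketched as follows. This does not use Jolt itself: Transform is a local stand-in interface, RenameKey is a hypothetical transform whose "spec" is fixed at construction, and the thread pool stands in for concurrent web requests.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SharedTransformSketch {

    // Local stand-in for a transform interface. Not the real Jolt type.
    interface Transform {
        Object transform(Object input);
    }

    // Hypothetical transform: all configuration is set once in the
    // constructor and never mutated, so one instance is safe to share.
    static final class RenameKey implements Transform {
        private final String from;
        private final String to;

        RenameKey(String from, String to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public Object transform(Object input) {
            // Build a fresh output per call; no shared mutable state.
            Map<String, Object> out = new LinkedHashMap<>();
            if (input instanceof Map) {
                for (Map.Entry<?, ?> e : ((Map<?, ?>) input).entrySet()) {
                    String key = String.valueOf(e.getKey());
                    out.put(key.equals(from) ? to : key, e.getValue());
                }
            }
            return out;
        }
    }

    // Built once, then shared by every simulated request.
    static final Transform SHARED = new RenameKey("id", "productId");

    static List<Object> serve(int requests) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Object>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                final Map<String, Object> input = Collections.<String, Object>singletonMap("id", i);
                Callable<Object> task = () -> SHARED.transform(input);
                futures.add(pool.submit(task));
            }
            List<Object> results = new ArrayList<>();
            for (Future<Object> f : futures) {
                results.add(f.get()); // results stay in submission order
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException("simulated request failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(serve(3)); // prints [{productId=0}, {productId=1}, {productId=2}]
    }
}
```

The cost of interpreting the spec is paid once at construction, and each call allocates only its own output tree.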

Two things to be aware of :

  1. Jolt is not "stream" based, so if you have a very large JSON document to transform, you need to have enough memory to hold it.
  2. The transform process will create and discard a lot of objects, so the garbage collector will have work to do.

Jolt CLI

Jolt Transforms and tools can be run from the command line. Command line interface doc here.

Code Coverage

For the moment we have Cobertura configured in our poms. When we move to a proper open source CI build, this can go away.

mvn cobertura:cobertura
open jolt-core/target/site/cobertura/index.html

Currently code coverage is at 89% line, and 81% branch.

Release Notes

Versions and Release Notes available here.

Contributors

milosimpson, snkinard, bvbuild, norbertpotocki, namroff, bivasdas, carwashi, victortrac, sslavic, bvtreyperry, tpanagos