Coder Social home page Coder Social logo

nandankumar / h2o.js Goto Github PK

View Code? Open in Web Editor NEW

This project forked from h2oai/h2o.js

0.0 2.0 0.0 875 KB

Node.js bindings to H2O, the open-source prediction engine for big data science.

Home Page: http://h2o.ai

License: MIT License

CoffeeScript 100.00%

h2o.js's Introduction

H2O.js

NPM

This Node.js / io.js module provides access to the H2O JVM (and extensions thereof), its objects, its machine-learning algorithms, and modeling support (basic munging and feature generation) capabilities.

It is designed to bring H2O to a wider audience of data and machine learning devotees that work exclusively with Javascript, for building machine learning applications or doing data munging in a fast, scalable environment without any extra mental anguish about threads and parallelism.

H2O also supports R, Python, Scala and Java.

What is H2O?

H2O is a piece of Java software for data modeling and general computing. There are many different views of the H2O software, but the primary view of H2O is that of a distributed (many machines), parallel (many CPUs), in memory (several hundred GBs Xmx) processing engine.

There are two levels of parallelism:

  • within node
  • across (or between) node.

The goal, remember, is to "simply" add more processors to a given problem in order to produce a solution faster. The conceptual paradigm MapReduce (also known as "divide and conquer and combine") along with a good concurrent application structure (c.f. jsr166y and NonBlockingHashMap) enable this type of scaling in H2O (we’re really cooking with gas now!).

For application developers and data scientists, the gritty details of thread-safety, algorithm parallelism, and node coherence on a network are concealed by simple-to-use REST calls that are all documented here. In addition, H2O is an open-source project under the Apache v2 licence. All of the source code is on Github, there is an active Google Group mailing list, our nightly tests are open for perusal, our JIRA ticketing system is also open for public use. Last, but not least, we regularly engage the machine learning community all over the nation with a very busy meetup schedule (so if you’re not in The Valley, no sweat, we’re probably coming to you soon!), and finally, we host our very own H2O World conference. We also sometimes host hack-a-thons at our campus in Mountain View, CA. Needless to say, there is a lot of support for the application developer.

In order to make the most out of H2O, there are some key conceptual pieces that are helpful to know before getting started. Mainly, it’s helpful to know about the different types of objects that live in H2O and what the rules of engagement are in the context of the REST API (which is what any non-JVM interface is all about).

Let’s get started!

The H2O Object System

H2O sports a distributed key-value store (the "DKV"), which contains pointers to the various objects that make up the H2O ecosystem. The DKV is a kind of biosphere in that it encapsulates all shared objects (though, it may not encapsulate all objects). Some shared objects are mutable by the client; some shared objects are read-only by the client, but mutable by H2O (e.g. a model being constructed will change over time); and actions by the client may have side-effects on other clients (multi-tenancy is not a supported model of use, but it is possible for multiple clients to attach to a single H2O cloud).

Briefly, these objects are:

  • Key: A key is an entry in the DKV that maps to an object in H2O.
  • Frame: A Frame is a collection of Vec objects. It is a 2D array of elements.
  • Vec: A Vec is a collection of Chunk objects. It is a 1D array of elements.
  • Chunk: A Chunk holds a fraction of the BigData. It is a 1D array of elements.
  • ModelMetrics: A collection of metrics for a given category of model.
  • Model: A model is an immutable object having predict and metrics methods.
  • Job: A Job is a non-blocking task that performs a finite amount of work.

Many of these objects have no meaning to an end Javascript user, but in order to make sense of the objects available in this module it is helpful to understand how these objects map to objects in the JVM (because after all, this module is an interface that allows the manipulation of a distributed system).

(To be continued...)

h2o.js's People

Contributors

lo5 avatar

Watchers

Nandankumar MR avatar James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.