Wukong Deploy Pack

The Infochimps Platform is an end-to-end, managed solution for building Big Data applications. It integrates best-of-breed technologies like Hadoop, Storm, Kafka, MongoDB, Elasticsearch, HBase, &c. and provides simple interfaces for accessing these powerful tools.

Computation, analytics, scripting, &c. are all handled by Wukong within the platform. Wukong is an abstract framework for defining computations on data. Wukong processors and flows can run in many different execution contexts including:

  • locally on the command-line for testing or development purposes
  • as a Hadoop mapper or reducer for batch analytics or ETL
  • within Storm as part of a real-time data flow
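To make the idea of a processor that is independent of its execution context more concrete, here is a minimal plain-Ruby sketch. It deliberately does not use the Wukong API: the classes `RejectBlank` and `Upcase` and the helper `run_flow` are hypothetical names, chosen only to illustrate processors that receive one record at a time and yield zero or more results, chained together and run locally.

```ruby
# Hypothetical stand-ins for processors. This is NOT the Wukong API, only an
# illustration of the "receive a record, yield results" shape.
class RejectBlank
  def process(record)
    # Yield nothing for blank records, which filters them out.
    yield record unless record.strip.empty?
  end
end

class Upcase
  def process(record)
    yield record.upcase
  end
end

# Hypothetical local runner: pass every record through each processor in order,
# collecting whatever each processor yields as the input to the next one.
def run_flow(processors, records)
  processors.reduce(records) do |current, processor|
    current.flat_map do |record|
      out = []
      processor.process(record) { |result| out << result }
      out
    end
  end
end

results = run_flow([RejectBlank.new, Upcase.new], ["hello", "  ", "wukong"])
puts results
```

Because each processor only knows about a single record and a block to yield to, the same object could in principle be driven by a command-line loop, a batch job, or a streaming topology; the runner is the only part that changes.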

On the Infochimps Platform, developers build all their processors, flows, and jobs inside a deploy pack. A deploy pack is a container for the Wukong code and plugins that an Infochimps Platform application needs. It includes the following libraries:

  • wukong-hadoop: Run Wukong processors as mappers and reducers within the Hadoop framework. Model Hadoop jobs locally before you run them.
  • wukong-storm: Run Wukong processors within the Storm framework. Model flows locally before you run them.
  • wukong-load: Load the output data from your local Wukong jobs and flows into a variety of different data stores.
  • wonderdog: Connect Wukong processors running within Hadoop to Elasticsearch as either a source or sink for data.

Installation

The deploy pack is installed as a RubyGem:

$ sudo gem install wukong-deploy

Usage

Wukong-Deploy provides a command-line tool, wu-deploy, for creating and interacting with deploy packs.

Creating a New Deploy Pack

Create a new deploy pack:

$ wu-deploy new my_app
Within /home/user/my_app:
      create  .
      create  app/models
      create  app/processors
      ...

This creates a directory named my_app in the current directory. Passing the dry_run option prints what would happen without actually changing anything:

$ wu-deploy new my_app --dry_run
Within /home/user/my_app:
      create  .
      create  app/models
      create  app/processors
      ...

You'll be prompted if there is a conflict. You can pass the force option to always overwrite files and the skip option to never overwrite files.

Working with an Existing Deploy Pack

If your current directory is within an existing deploy pack, you can start an IRB console with the deploy pack's environment already loaded:

$ wu-deploy console
irb(main):001:0> 
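How wu-deploy decides that the current directory is "within" a deploy pack is not described here. One plausible approach, sketched below purely as an assumption, is to walk upward from the current directory until a directory containing config/environment.rb is found. The helper name `find_deploy_pack_root` and the choice of marker file are hypothetical.

```ruby
require "pathname"
require "tmpdir"
require "fileutils"

# Hypothetical helper (not part of wu-deploy): walk upward from +start+ and
# return the first directory that contains config/environment.rb, or nil.
def find_deploy_pack_root(start)
  Pathname.new(start).expand_path.ascend do |dir|
    return dir if dir.join("config", "environment.rb").file?
  end
  nil
end

# Build a throwaway deploy-pack-like layout and search from a nested directory.
inside_pack = Dir.mktmpdir do |tmp|
  root = Pathname.new(tmp).realpath.join("my_app")
  FileUtils.mkdir_p(root.join("config"))
  FileUtils.mkdir_p(root.join("app", "processors"))
  FileUtils.touch(root.join("config", "environment.rb"))

  find_deploy_pack_root(root.join("app", "processors")) == root
end

puts inside_pack
```

Searching upward like this would let a command work from any subdirectory, such as app/processors, rather than only from the top level.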

File Structure

A deploy pack is a repository with the following Rails-like file structure:

├──   app
│   ├──   models
│   ├──   processors
│   ├──   flows
│   └──   jobs
├──   config
│   ├──   environment.rb
│   ├──   application.rb
│   ├──   initializers
│   ├──   settings.yml
│   └──   environments
│       ├──   development.yml
│       ├──   production.yml
│       └──   test.yml
├──   data
├──   Gemfile
├──   Gemfile.lock
├──   lib
├──   log
├──   Rakefile
├──   spec
│   ├──   spec_helper.rb
│   └──   support
└──   tmp

Let's look at it piece by piece:

  • app: The directory with all the action. It's where you define:
    • models: Your domain models or "nouns", which define and wrap the different kinds of data elements in your application. Build them with whatever framework you like (the default is Gorillib).
    • processors: Your fundamental operations or "verbs", which are passed records and parse, filter, augment, normalize, or split them.
    • flows: Chain together processors into streaming flows for ingestion, real-time processing, or complex event processing (CEP).
    • jobs: Pair processors together to create batch jobs to run in Hadoop.
  • config: Where you place all application configuration for all environments.
    • environment.rb: Defines the runtime environment for all code, requiring and configuring all Wukong framework code. You shouldn't have to edit this file directly.
    • application.rb: Require and configure libraries specific to your application. Choose a model framework, pick what application code gets loaded by default (vs. auto-loaded).
    • initializers: Holds any files you need to load before application.rb. Useful for requiring and configuring external libraries.
    • settings.yml: Defines application-wide settings.
    • environments: Defines environment-specific settings in YAML files named after the environment. Overrides config/settings.yml.
  • data: Holds sample data in flat files. You'll develop and test your application using this data.
  • Gemfile and Gemfile.lock: Define how libraries are resolved with Bundler.
  • lib: Holds any code you want to use in your application but that isn't "part of" your application (like vendored libraries, Rake tasks, &c.).
  • log: A good place to stash logs.
  • Rakefile: Defines Rake tasks for developing, testing, and deploying your application.
  • spec: Holds all your RSpec unit tests.
    • spec_helper.rb: Loads libraries you'll use during testing and includes spec helper libraries from Wukong.
    • support: Holds support code for your tests.
  • tmp: A good place to stash temporary files.
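The override relationship between config/settings.yml and the files under config/environments can be pictured as a hash merge. The sketch below is an illustration under stated assumptions, not the deploy pack's actual loading code: the setting names (`log_level`, `batch_size`), the helper `merged_settings`, and the choice of a shallow merge are all hypothetical.

```ruby
require "yaml"

# Hypothetical contents of config/settings.yml (application-wide settings).
base_yaml = <<~YAML
  log_level: info
  batch_size: 100
YAML

# Hypothetical contents of config/environments/development.yml.
dev_yaml = <<~YAML
  log_level: debug
YAML

# Assumed behavior: environment-specific values win over application-wide ones.
# A shallow merge is used here for simplicity; the real loader may merge deeply.
def merged_settings(base_yaml, env_yaml)
  base = YAML.safe_load(base_yaml) || {}
  env  = YAML.safe_load(env_yaml) || {}
  base.merge(env)
end

settings = merged_settings(base_yaml, dev_yaml)
puts settings.inspect
```

In this sketch `log_level` comes from the development file while `batch_size` falls through from the application-wide settings, which is the behavior the word "overrides" suggests.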
