Coder Social home page Coder Social logo

deploy-soon / incubator-nemo Goto Github PK

View Code? Open in Web Editor NEW

This project forked from apache/incubator-nemo

0.0 1.0 0.0 16.01 MB

Apache Nemo (Incubating) - Data Processing System for Flexible Employment With Different Deployment Characteristics

Home Page: https://nemo.apache.org

License: Apache License 2.0

Shell 0.39% Python 1.12% Java 91.44% Scala 1.94% JavaScript 1.23% Vue 3.89%

incubator-nemo's Introduction

Nemo

Build Status Quality Gate Status

A Data Processing System for Flexible Employment With Different Deployment Characteristics.

Online Documentation

Details about Nemo and its development can be found in:

Please refer to the Contribution guideline to contribute to our project.

Nemo prerequisites and setup

Prerequisites

  • Java 8
  • Maven
  • YARN settings
  • Protobuf 2.5.0
    • On Ubuntu 14.04 LTS and its point releases:

      $ sudo apt-get install protobuf-compiler
    • On Ubuntu 16.04 LTS and its point releases:

      $ sudo add-apt-repository ppa:snuspl/protobuf-250
      $ sudo apt update
      $ sudo apt install protobuf-compiler=2.5.0-9xenial1
    • On macOS:

      $ brew tap homebrew/versions
      $ brew install [email protected]
    • Or build from source:

    • To check for a successful installation of version 2.5.0, run $ protoc --version

Installing Nemo

  • Run all tests and install: $ mvn clean install -T 2C
  • Run only unit tests and install: $ mvn clean install -DskipITs -T 2C

Running Beam applications

Apache Nemo is an official runner of Apache Beam, and it can be executed from Beam, using NemoRunner, as well as directly from the Nemo project. The details of using NemoRunner from Beam is shown on the NemoRunner page of the Apache Beam website. Below describes how Beam applications can be run directly on Nemo.

Configurable options

  • -job_id: ID of the Beam job
  • -user_main: Canonical name of the Beam application
  • -user_args: Arguments that the Beam application accepts
  • -optimization_policy: Canonical name of the optimization policy to apply to a job DAG in Nemo Compiler
  • -deploy_mode: yarn is supported(default value is local)

Examples

## WordCount example from the Beam website (Count words from a document)
$ ./bin/run_beam.sh \
    -job_id beam_wordcount \
    -optimization_policy org.apache.nemo.compiler.optimizer.policy.DefaultPolicy \
    -user_main org.apache.nemo.examples.beam.BeamWordCount \
    -user_args "--runner=NemoRunner --inputFile=`pwd`/examples/resources/inputs/test_input_wordcount --output=`pwd`/outputs/wordcount"
$ less `pwd`/outputs/wordcount*

## MapReduce WordCount example (Count words from the Wikipedia dataset)
$ ./bin/run_beam.sh \
    -job_id mr_default \
    -executor_json `pwd`/examples/resources/executors/beam_test_executor_resources.json \
    -optimization_policy org.apache.nemo.compiler.optimizer.policy.DefaultPolicy \
    -user_main org.apache.nemo.examples.beam.WordCount \
    -user_args "`pwd`/examples/resources/inputs/test_input_wordcount `pwd`/outputs/wordcount"
$ less `pwd`/outputs/wordcount*

## YARN cluster example
$ ./bin/run_beam.sh \
    -deploy_mode yarn \
    -job_id mr_transient \
    -executor_json `pwd`/examples/resources/executors/beam_test_executor_resources.json \
    -user_main org.apache.nemo.examples.beam.WordCount \
    -optimization_policy org.apache.nemo.compiler.optimizer.policy.TransientResourcePolicy \
    -user_args "hdfs://v-m:9000/test_input_wordcount hdfs://v-m:9000/test_output_wordcount"

## NEXMark streaming Q0 (query0) example 
$ ./bin/run_nexmark.sh \
    -job_id nexmark-Q0 \
    -executor_json `pwd`/examples/resources/executors/beam_test_executor_resources.json \
    -user_main org.apache.beam.sdk.nexmark.Main \
    -optimization_policy org.apache.nemo.compiler.optimizer.policy.StreamingPolicy \
    -scheduler_impl_class_name org.apache.nemo.runtime.master.scheduler.StreamingScheduler \	
    -user_args "--runner=NemoRunner --streaming=true --query=0 --numEventGenerators=1"

Resource Configuration

-executor_json command line option can be used to provide a path to the JSON file that describes resource configuration for executors. Its default value is config/default.json, which initializes one of each Transient, Reserved, and Compute executor, each of which has one core and 1024MB memory.

Configurable options

  • num (optional): Number of containers. Default value is 1
  • type: Three container types are supported:
    • Transient : Containers that store eviction-prone resources. When batch jobs use idle resources in Transient containers, they can be arbitrarily evicted when latency-critical jobs attempt to use the resources.
    • Reserved : Containers that store eviction-free resources. Reserved containers are used to reliably store intermediate data which have high eviction cost.
    • Compute : Containers that are mainly used for computation.
  • memory_mb: Memory size in MB
  • capacity: Number of Tasks that can be run in an executor. Set this value to be the same as the number of CPU cores of the container.

Examples

[
  {
    "num": 12,
    "type": "Transient",
    "memory_mb": 1024,
    "capacity": 4
  },
  {
    "type": "Reserved",
    "memory_mb": 1024,
    "capacity": 2
  }
]

This example configuration specifies

  • 12 transient containers with 4 cores and 1024MB memory each
  • 1 reserved container with 2 cores and 1024MB memory

Monitoring your job using web UI

Nemo Compiler and Engine can store JSON representation of intermediate DAGs.

  • -dag_dir command line option is used to specify the directory where the JSON files are stored. The default directory is ./dag. Using our online visualizer, you can easily visualize a DAG. Just drop the JSON file of the DAG as an input to it.

Examples

$ ./bin/run_beam.sh \
    -job_id als \
    -executor_json `pwd`/examples/resources/executors/beam_test_executor_resources.json \
    -user_main org.apache.nemo.examples.beam.AlternatingLeastSquare \
    -optimization_policy org.apache.nemo.compiler.optimizer.policy.TransientResourcePolicy \
    -dag_dir "./dag/als" \
    -user_args "`pwd`/examples/resources/inputs/test_input_als 10 3"

Options for writing metric results to databases.

  • -db_enabled: Whether or not to turn on the DB (true or false).
  • -db_address: Address of the DB. (ex. PostgreSQL DB starts with jdbc:postgresql://...)
  • -db_id : ID of the DB from the given address.
  • -db_password: Credentials for the DB from the given address.

Speeding up builds

  • To exclude Spark related packages: mvn clean install -T 2C -DskipTests -pl \!compiler/frontend/spark,\!examples/spark
  • To exclude Beam related packages: mvn clean install -T 2C -DskipTests -pl \!compiler/frontend/beam,\!examples/beam
  • To exclude NEXMark related packages: mvn clean install -T 2C -DskipTests -pl \!examples/nexmark

incubator-nemo's People

Contributors

wonook avatar sanha avatar seojangho avatar jooykim avatar johnyangk avatar taegeonum avatar jeongyooneo avatar skystar-p avatar hy00nc avatar ejjeong avatar differentsc avatar wynot12 avatar alapha23 avatar yunseong avatar deploy-soon avatar mhkwon924 avatar gwsshs22 avatar shpark avatar arunlakshman avatar jangdonghae avatar davin111 avatar haese0ng avatar kennknowles avatar ulgal avatar jang1suh avatar apeinot avatar dependabot[bot] avatar dreamsh19 avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.