techiev2 / plynx

This project forked from plynx-team/plynx


PLynx is a framework that makes it easy to prototype, build, and deploy complex ML pipelines.

Home Page: https://plynx.com

License: Apache License 2.0

Languages: Dockerfile 0.55%, Shell 0.21%, Python 43.70%, HTML 0.57%, JavaScript 45.58%, CSS 9.37%

plynx's Introduction

PLynx

Main page and demo: PLynx.com.

PLynx is a framework that makes it easy to prototype, build, and deploy complex ML pipelines.

Developed for data scientists and data engineers, PLynx provides an easy and flexible way to build data pipelines faster than ever. PLynx is a high-level abstraction over data workflows.

Features

Distributed computation

All of the computation is distributed across multiple workers.

[Screenshot: distributed computation]

Graph Editor

An interactive user interface lets you create new pipelines based on existing ones and share your workflows with other people. Each pipeline is stored, so you can always reproduce the results.

[Screenshot: interactive graph editor]

Online operations editor

Operations can be customized independently of the platform, giving you high flexibility in each experiment.

[Screenshot: online operations editor]

Monitor progress

Track the progress of the workflow. Each intermediate operation produces results that you can inspect.

[Screenshot: progress monitoring]

Preview the results

View the results right in the browser.

[Screenshot: results preview]

Requirements

PLynx is a modular framework. The backend, master, workers, and command-line tool are written in Python.

The user interface is based on the React framework.

PLynx uses MongoDB as its primary metadata storage. To meet diverse data storage requirements, PLynx uses the boto library as an abstraction layer. It supports multiple data storages such as AWS S3, Google Cloud Storage, and traditional filesystems.

To reduce complexity on local workstations, we recommend installing Docker.

Get started with Docker

Get started

Install using pip

pip install plynx
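
If you prefer to keep PLynx isolated from your system packages, here is a minimal sketch using a standard Python virtual environment (the directory name venv is arbitrary):

python -m venv venv              # create an isolated environment
source venv/bin/activate         # activate it (Linux/macOS shell)
pip install plynx                # install PLynx into the environment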

Install from source code

You can install the backend, services, and command-line utilities by building them from the source code.

First clone the repo:

git clone https://github.com/khaxis/plynx.git   # Clone the repo

Then build it using the setup.py script:

cd plynx
python setup.py install     # install main PLynx package
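
If the installation succeeded, the plynx command-line tool should now be on your PATH. A quick sanity check (the exact subcommands listed may vary between versions):

plynx --help                # list available subcommands and options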

Usage

Run the local cluster. Please note that Docker is required. (Get started with Docker)

plynx local
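
Docker must be running for this command to work. A quick way to check is the standard Docker status command:

docker info                 # prints server details if the Docker daemon is reachable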

If Docker is not available, you can still set up the UI and database.

First, you will need to install the UI dependencies:

# from plynx

cd ui
npm install -s

Then, to start the UI, run the following command:

npm start

Another dependency is needed: metadata storage. PLynx currently uses MongoDB for it.

Get started with MongoDB
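
As a minimal sketch, a standalone local MongoDB server can be started on the default port 27017 (assuming MongoDB is installed; the ./db data directory is just an example):

mkdir -p db                          # data directory for MongoDB
mongod --dbpath ./db --port 27017    # standalone server on the default port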

When your UI and MongoDB servers are running, you can start the rest of the cluster: master, workers, and backend. Make sure you skip the services that are already running:

plynx local --ignore-containers

Note that plynx local starts up the following services:

  • MongoDB instance
  • PLynx User Interface
  • Backend
  • Master
  • Several workers (3 by default)

You can always call them separately:

plynx backend
plynx master
plynx worker
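
For example, to run the backend, the master, and the default three workers from a single shell, you could background each process (a rough sketch; separate terminals or a process manager are a better fit for anything long-running):

plynx backend &
plynx master &
plynx worker & plynx worker & plynx worker &    # three workers, matching the default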

PLynx API

The web user interface is not the only way to create graphs. Some experiments are easier to create using the PLynx API:

#!/usr/bin/env python
# "WDBC: compare regressors" example
from collections import namedtuple
from plynx import Operation, File, Graph, Client

# define the training parameters
TrainDescriptor = namedtuple('TrainDescriptor', ['method', 'parameters'])
train_descriptors = [
    TrainDescriptor(method='GradientBoostingRegressor', parameters='100'),
    TrainDescriptor(method='MLPRegressor', parameters='100 10'),
    TrainDescriptor(method='MLPRegressor', parameters='100 8'),
    TrainDescriptor(method='MLPRegressor', parameters='80 10')
]

##########################################
# Build Plynx elements
##########################################

# Operation is a metaclass that declares the interface to an existing operation
# in the Plynx database
Split = Operation(
    id='5ae6b0123136050000d8711a',
    title='Split Train Test',
    inputs=['sample.py', 'data'],
    params=['rate', 'seed'],
    outputs=['train', 'test']
)

TrainRegression = Operation(
    id='5ae6b023d26111000027a613',
    title='Train regression',
    inputs=['train.py', 'data'],
    params=['regressor', 'hidden-layer-sizes'],
    outputs=['model']
)

Predict = Operation(
    id='5ae6b02f0164d80000afd938',
    title='Predict',
    inputs=['predict.py', 'data', 'model'],
    outputs=['prediction']
)

Compare = Operation(
    id='5aeaa49c16b8b50000abca48',
    title='Compare regressors',
    inputs=['build_roc.py', 'predictions'],
    outputs=['plot.png']
)

# define Files
wdbc_dataset = File(id='5ad42a2cfad58d1cc6c9f7b5')
sample_py = File(id='5ad42da1fad58d1cc6c9f7c4')
train_py = File(id='5aeaa42bfad58d7d05d632db')
predict_py = File(id='5aeaa455fad58d7d05d632e0')
build_plot_py = File(id='5aeaa483fad58d7d05d632e5')


##########################################
# Build pipeline
##########################################

# first split the dataset
split_block = Split(
    sample_py=sample_py.outputs.out,
    data=wdbc_dataset.outputs.out,
    rate=0.7
)

# run training algorithms with specified parameters
predictions = []
for descriptor in train_descriptors:
    train = TrainRegression(
        train_py=train_py.outputs.out,
        data=split_block.outputs.train,
        regressor=descriptor.method,
        hidden_layer_sizes=descriptor.parameters
    )

    predict = Predict(
        predict_py=predict_py.outputs.out,
        data=split_block.outputs.test,
        model=train.outputs.model
    )

    predictions.append(predict.outputs.prediction)

# join 'predictions' in the report file
compare_models = Compare(
    build_roc_py=build_plot_py.outputs.out,
    predictions=predictions
)

# build the Graph
graph = Graph(
    Client(),
    title='WDBC: compare regressors',
    description='Wisconsin Diagnostic Breast Cancer',
    targets=[compare_models]
)

##########################################
# Run pipeline
##########################################

# Approve the Graph and wait until it finishes
graph.approve().wait()

Config

All of the parameters can be set on the command line when the services are called. For example:

$ plynx master --help
usage: plynx master [-h] [-v] [--internal-master-host INTERNAL_MASTER_HOST]
                    [-P PORT] [--db-host DB_HOST] [--db-port DB_PORT]
                    [--db-user DB_USER] [--db-password DB_PASSWORD]

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Set logging output more verbose
  --internal-master-host INTERNAL_MASTER_HOST
                        Internal Master host
  -P PORT, --port PORT  Master port
  --db-host DB_HOST     Database host
  --db-port DB_PORT     Database port
  --db-user DB_USER     Database username
  --db-password DB_PASSWORD
                        Database password
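
For example, to start the master against a MongoDB instance on the local machine with verbose logging, using the flags listed above (the host and port values are illustrative):

plynx master -v --db-host localhost --db-port 27017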

However, it is almost always more convenient to store the configuration in a separate file.

The PLynx config path is taken from the PLYNX_CONFIG_PATH environment variable. If the variable is not defined, PLynx will use a config.yaml file in the local directory.

Here is an example of a config.yaml:

mongodb:
  user:
  password:
  host: 'mongodb'
  port: 27017

master:
  internal_host: '0.0.0.0'
  host: 'master'
  port: 17011

worker:
  user:

storage:
  scheme: file
  prefix: /data/resources/

auth:
  secret_key: SECRET_KEY

web:
  host: '0.0.0.0'
  port: 5000
  endpoint: http://localhost:3000
  debug: true

demo:
  graph_ids: []
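
To use a config like this one from a non-default location, point PLYNX_CONFIG_PATH at it before starting the services (the path below is illustrative):

export PLYNX_CONFIG_PATH=/path/to/config.yaml
plynx local                 # the services pick up the config from the variable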

External links

plynx's People

Contributors: khaxis

Watchers: Sriram Velamur, James Cloos
