alibaba / pipcook

Machine learning platform for Web developers

Home Page: https://alibaba.github.io/pipcook/

License: Apache License 2.0

JavaScript 5.14% Dockerfile 0.23% TypeScript 79.65% Shell 0.86% Makefile 0.04% Jupyter Notebook 11.32% HTML 2.75%
js machine-learning tensorflow pipeline

pipcook's Introduction

pipcook

A JavaScript application framework for machine learning and its engineering.

Documentation: English | 中文

Why Pipcook

Pipcook's mission is to enable JavaScript engineers to use the power of machine learning without any prerequisites, and its vision is to lead the front-end field toward intelligent applications. Pipcook aims to become the JavaScript application framework for the area where machine learning and front-end interaction meet.

We design Pipcook's APIs for front-end and machine learning applications, focusing on the front-end area and working from the JavaScript engineer's point of view. Guided by the principle of being friendly to JavaScript, we want to push the whole area forward with machine learning engineering. For this reason we opened an issue about machine-learning application APIs, and we look forward to your involvement.

What's Pipcook

The project provides subprojects including a machine learning pipeline framework, management tools, and a JavaScript runtime for machine learning. These can also be used as building blocks in conjunction with other projects.

Principles

Pipcook is an open-source project guided by strong principles, aiming to be modular and flexible in user experience. It is open to the community to help set its direction.

  • Modular: the project consists of several subprojects that have well-defined functions and APIs and work together.
  • Swappable: the project includes enough modules to build what Pipcook has done, but its modular architecture ensures that most modules can be swapped for different implementations.

Audience

Pipcook is intended for Web engineers looking to:

  • learn what machine learning is.
  • train their own models and serve them.
  • optimize their own models for better evaluation results, such as higher accuracy in image classification.

If any of these applies to you, try Pipcook by following the installation guide.

Subprojects

Pipcook Pipeline

A Pipcook Pipeline represents an ML pipeline consisting of Pipcook scripts. This layer ensures the stability and scalability of the whole system and uses a plug-in mechanism to support rich functions including dataset handling, training, validation, and deployment.

A Pipcook Pipeline is generally composed of many scripts. Through different scripts and configurations, the final output is an NPM package that contains the trained model and JavaScript functions that can be used directly.

Note: In Pipcook, each pipeline has only one role, which is to output the trained model you need. That is, the last stage of every pipeline must output the trained model; otherwise, the pipeline is invalid.

Pipcook Bridge to Python

For JavaScript engineers, the most difficult part is the lack of a mature machine learning toolset in the ecosystem. Pipcook addresses this with a module called Boa (https://github.com/imgcook/boa), which provides access to Python packages by bridging the interface of CPython using N-API.

With it, developers can use packages such as numpy, scikit-learn, jieba, and tensorflow, or anything else from the Python ecosystem, in the Node.js runtime through JavaScript.

Quick start

Setup

Prepare the following on your machine:

Installer Version Range
Node.js >= 12.17 or >= 14.0.0
npm >= 6.14.4
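
The version ranges above can be checked programmatically. The following is a minimal sketch, not part of Pipcook: the function names are hypothetical, and it assumes the Node.js row means "12.x from 12.17 on, or anything from 14.0.0 on".

```typescript
type Version = [number, number, number];

// Parse "v12.18.3" or "12.18.3" into numeric [major, minor, patch].
function parseVersion(version: string): Version {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(version.trim());
  if (!match) {
    throw new Error(`Unrecognized version string: ${version}`);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Negative when a < b, zero when equal, positive when a > b.
function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Assumed reading of the table: Node.js 12.x from 12.17 on, or >= 14.0.0.
function isSupportedNode(version: string): boolean {
  const v = parseVersion(version);
  if (v[0] === 12) return compareVersions(v, [12, 17, 0]) >= 0;
  return compareVersions(v, [14, 0, 0]) >= 0;
}

// npm must be at least 6.14.4.
function isSupportedNpm(version: string): boolean {
  return compareVersions(parseVersion(version), [6, 14, 4]) >= 0;
}
```

In a Node.js script, `isSupportedNode(process.version)` would apply the check to the running interpreter.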

Install the command-line tool for managing Pipcook projects:

$ npm install -g @pipcook/cli

Then train from any one of the example pipelines. We take image classification as an example:

$ pipcook train https://cdn.jsdelivr.net/gh/alibaba/pipcook@main/example/pipelines/image-classification-mobilenet.json -o ./output

The dataset specified by the pipeline includes images in 2 categories: avatar and blurBackground. After training, we can predict the category of an image:

$ pipcook predict ./output/image-classification-mobilenet.json -s ./output/data/validation/blurBackground/71197_223__30.7_36.jpg
✔ Origin result:[{"id":1,"category":"blurBackground","score":0.9998120665550232}]

The input is a blurBackground image from the validation dataset, and the model determines that its category is blurBackground.
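
If you capture the CLI output in a script, the result line can be parsed as JSON. This is a small sketch under the assumption that the line always has the `Origin result:` prefix followed by a JSON array shaped like the sample above; the helper names are hypothetical.

```typescript
interface Prediction {
  id: number;
  category: string;
  score: number;
}

// Extract the JSON array that follows "Origin result:" in a line of CLI output.
function parsePredictions(line: string): Prediction[] {
  const marker = "Origin result:";
  const start = line.indexOf(marker);
  if (start === -1) {
    throw new Error("No 'Origin result:' marker found");
  }
  return JSON.parse(line.slice(start + marker.length)) as Prediction[];
}

// Return the prediction with the highest score, or undefined for an empty list.
function topPrediction(predictions: Prediction[]): Prediction | undefined {
  return predictions.reduce<Prediction | undefined>(
    (best, p) => (best === undefined || p.score > best.score ? p : best),
    undefined,
  );
}

const sample =
  '✔ Origin result:[{"id":1,"category":"blurBackground","score":0.9998120665550232}]';
const best = topPrediction(parsePredictions(sample));
console.log(best?.category); // blurBackground
```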

Want to deploy it?

$ pipcook serve ./output
ℹ preparing framework
ℹ preparing scripts
ℹ preparing artifact plugins
ℹ initializing framework packages
Pipcook has served at: http://localhost:9091

Then you can open the browser and try your image classification server.

Playground

If you are wondering what you can do in Pipcook and where you can check your training logs and models, you could start from Pipboard:

open https://pipboard.imgcook.com

A web page will open in your browser. There is an MNIST showcase on the home page that you can play around with.

Pipelines

If you want to train a model to recognize MNIST handwritten digits by yourself, you could try the examples below.

Name | Description | Open in Colab
mnist-image-classification | pipeline for the MNIST image classification problem. | N/A
databinding-image-classification | pipeline example to train an image classification task, which is to classify imgcook databinding pictures. | Open In Colab
object-detection | pipeline example to train an object detection task, which is for component recognition used by imgcook. | Open In Colab
text-bayes-classification | pipeline example to train a text classification task with Bayes. | N/A

See here for the complete list. These examples are easy and quick to run. For example, to do an MNIST image classification, just run the following to start the pipeline:

$ pipcook run https://cdn.jsdelivr.net/gh/alibaba/pipcook@main/example/pipelines/image-classification-mobilenet.json -o output

After the pipeline completes, the trained model is in the output/model directory under the current directory. It is a tensorflow.js model.

Developers

Clone this repository:

$ git clone git@github.com:alibaba/pipcook.git

Install dependencies, e.g. via npm:

$ npm install

After the above, now build the project:

$ npm run build

Community

DingTalk

You can join the DingTalk group by searching for the group number: 30624012.

Download DingTalk (an all-in-one free communication and collaboration platform) here: English | 中文

Gitter Room

Who's using it

License

Apache 2.0

pipcook's People

Contributors

ahkari, anyexinglu, dependabot[bot], doreenyou, ederzz, eliyao, feelychau, gindis, heluwe, hongyin163, jabez128, joker-jelly, lewis617, liangzr, lijiajunxs, macbesu, markexin, mowatermelon, nhibiki, rajpratik71, rickycao-qy, rickyes, sinoon, sirm2z, sungongwei, thomasyxy, txiaozhe, wordcount, yorkie, zijingao

pipcook's Issues

meta: how to work with tensorboard

TensorBoard is TensorFlow's visualization toolkit; it supports many kinds of visualization, as described at https://www.tensorflow.org/tensorboard. The easiest way to use it with a model is to pass tf.keras.callbacks.TensorBoard in model.fit's callbacks; other ways are available under tf.summary.

There are two ways to use TensorBoard in Pipcook:

Which way do you think is better? @utkobe @wordcount

meta: about project scope, contributions and plugin ecosystem

A clear project scope can help us make better choices. Here we clearly define the source code, configuration and documentation that need to be included in Pipcook. As an open source project, Pipcook should welcome different types of contributions at different levels, covering the project scope mentioned above. The last part of the discussion is about the plugin ecosystem: I will describe the unit organization structure of our plugin ecosystem and how to integrate it with NPM and JavaScript so they develop together.

Project scope

The "Pipcook" software project includes the following:

  • source code of the framework, high-level APIs, command-line tools and built-in plugins.
  • documents and specifications of the framework, high-level APIs, command-line tools and built-in plugins.
  • a Web launcher for plugin discovery, dataset selection, pipeline creation, model deployment, and visualization.

Plugins play an important role in this project: a pipeline schedules a number of plugins, which are wrapped as components and work together to output the model or service to deploy. Each plugin needs to follow the rules below:

  • MUST be an NPM package, which means it has a package.json and a main file; TypeScript (*.ts) is recommended by default.
  • SHOULD have a README introducing the plugin.
  • SHOULD have tsdoc/jsdoc annotations or an HTML version for API references.
  • SHOULD have unit tests for code quality.
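
To make the MUST/SHOULD split concrete, here is a rough sketch of a checker over a plugin's file listing. Everything in it is hypothetical and not an existing Pipcook API: the `PluginFiles` shape, the `checkPlugin` name, and the assumption that unit tests live in `*.test.ts` or `*.test.js` files. The annotation rule is left out because it cannot be judged from file names alone.

```typescript
interface PluginFiles {
  // Parsed package.json, or undefined when the file is missing.
  packageJson?: { name?: string; main?: string };
  // Relative paths of every file in the plugin directory.
  files: string[];
}

interface CheckResult {
  errors: string[]; // violated MUST rules
  warnings: string[]; // violated SHOULD rules
}

function checkPlugin(plugin: PluginFiles): CheckResult {
  const errors: string[] = [];
  const warnings: string[] = [];
  const pkg = plugin.packageJson;

  // MUST: an NPM package with a package.json and a main file.
  if (!pkg) {
    errors.push("missing package.json");
  } else if (!pkg.main || plugin.files.indexOf(pkg.main) === -1) {
    errors.push("package.json does not point to an existing main file");
  }
  // SHOULD: a README at the top level.
  if (!plugin.files.some((f) => /^readme(\.md)?$/i.test(f))) {
    warnings.push("missing README");
  }
  // SHOULD: unit tests (assumed naming convention, see the lead-in).
  if (!plugin.files.some((f) => /\.test\.[jt]s$/.test(f))) {
    warnings.push("missing unit tests");
  }
  return { errors, warnings };
}
```

Returning errors and warnings separately lets a caller reject a plugin on MUST violations while only reporting SHOULD violations.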

Contributions & Contributors

After understanding the project scope and plugins, let's take a look at what types of contributions and contributors pipcook will accept as an open source project.

  • contribution to web launcher
  • contribution to command-line tools
  • contribution to framework and high-level apis
  • contribution to built-in plugin

In addition to the above, we'll describe user-land plugins in the section "Plugin ecosystem".

Each contribution mentioned above MUST follow these rules:

  • contributor submits a pull request to describe the technical details.
  • changes in this pull request include some source code, documentation and configuration.
  • changes in this pull request pass all the related build instructions.
  • changes in this pull request receive over 1 approval from project collaborators.
    • changes to the framework and high-level APIs require core collaborators' approvals.
    • changes to a built-in plugin require the built-in plugin collaborators' approvals.

We have also classified the contributors as follows:

  • contributor: someone who has made contributions within the project scope.
  • collaborator: a project maintainer who makes improvements, fixes bugs, and reviews pull requests.
    • a core collaborator maintains the whole project scope, focusing on the framework, high-level APIs and release management.
    • a built-in plugin collaborator maintains one or more specific built-in plugins.

Plugin ecosystem

The composition and requirements of plugins were covered in the previous section, so here we define some rules between plugins, namely the plugin ecosystem.

From the maintainer's perspective, plugins can be divided into built-in, community and private ones:

  • built-in plugins are maintained by core collaborators and released with Pipcook.
  • each community plugin is maintained and released by its author; Pipcook can download the specified plugins through git, npm or oss.
  • a private plugin is maintained by a private organization or company itself.

To help Pipcook discover all the plugins, the project provides some rules to let the Web launcher discover community ones:

Community plugins can also be submitted as built-in plugins through pull requests, but this requires nomination by a core collaborator and the approvals of at least 2 collaborators.

PipApp: the application framework for machine learning

The vision of Pipcook is to bring JavaScript developers and engineers into the world of machine learning quickly and seamlessly, so we're responsible for creating APIs that are easy enough to use.

In the Pipcook stack, pipcook-app defines the ML application. It abstracts away duplicated work and hides the low-level algorithm implementation, which has a learning curve for every ML newcomer.

APIs

Every module represents a type of dataset, and basically we provide some different methods for developers.

module ml

This module creates machine learning functions. It provides the core abilities to represent your machine learning application in an intuitive way.

interface ml.Function

To hide the ML details as much as possible, Pipcook lets you declare your machine learning functions with a specific type, ml.Function. You can create an ml.Function via the create() function below.

Internally, the Pipcook compiler parses the applications, then generates the training code via the ml.Function instances, and replaces these slots with model generated inferences.

interface ml.FunctionImpl(arg: data.MLType)

This interface describes the machine learning internals of an application. It accepts an argument of type data.MLType as the input; the output's type is not constrained.

create(fn: ml.FunctionImpl): ml.Function

This is to create the above ml.Function with a ml.FunctionImpl object.

const mlfunc: ml.Function = ml.create((input: data.ImageType) => {
  // call other ML Application APIs here and return
});

// ...
mlfunc(new data.ImageType(...)); // call this function anywhere.

module data

This module is to declare all types for your application's I/O.

interface data.MLType

It's the base interface to tell the Pipcook compiler a type for ML.

interface data.ImageType extends data.MLType

It represents the image type for given ml.Function I/O.

interface data.TextType extends data.MLType

It represents the text type for given ml.Function I/O.

module vision

This module provides vision-related functions like image classification and object detection.

interface vision.Position2D

It represents the 2D position of a detected object:

  • label {string} the label string represents the object's type.
  • left {number} the left of detected object in pixel.
  • top {number} the top of detected object in pixel.
  • height {number} the height of detected object.
  • width {number} the width of detected object.
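
As an illustration of how these fields fit together, the sketch below declares a local `Position2D` type with the same fields and derives box corners and an intersection-over-union score from it. It assumes `left` and `top` mark the top-left corner and that `height` and `width` are also in pixels; the helper functions are hypothetical and not part of the proposed API.

```typescript
// Mirrors the fields listed for vision.Position2D.
interface Position2D {
  label: string;
  left: number;
  top: number;
  height: number;
  width: number;
}

// Right and bottom edges derived from the top-left corner and the size.
function toCorners(p: Position2D): { right: number; bottom: number } {
  return { right: p.left + p.width, bottom: p.top + p.height };
}

// Intersection-over-union of two boxes, a common way to compare detections.
function iou(a: Position2D, b: Position2D): number {
  const ac = toCorners(a);
  const bc = toCorners(b);
  const w = Math.max(0, Math.min(ac.right, bc.right) - Math.max(a.left, b.left));
  const h = Math.max(0, Math.min(ac.bottom, bc.bottom) - Math.max(a.top, b.top));
  const inter = w * h;
  const union = a.width * a.height + b.width * b.height - inter;
  return union === 0 ? 0 : inter / union;
}
```

For two 10x10 boxes offset horizontally by 5 pixels, the overlap is 50 and the union is 150, so `iou` returns one third.
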
classify(img: ImageType): string

It recognizes the type of image, and returns the type string.

ml.create((img: data.ImageType) => {
  const label = vision.classify(img); // returns the label
});
detect(img: ImageType): vision.Position2D[]

It detects target from a single image, and returns the position and label of detected objects.

ml.create((img: data.ImageType) => {
  const objects = vision.detect(img);
  objects.forEach((o) => {
    console.log(o.label, o.left, o.top); // prints the label, left and top.
  });
});

module nlp

This module provides NLP-related functions like text classification and clustering.

interface nlp.Cluster
  • label {string} the label for this cluster.
  • items {string[]} the strings in this cluster.
interface nlp.ClusteringResult
  • clusters {nlp.Cluster[]} all grouped clusters, and each is an object of nlp.Cluster.
  • noises {string[]} all labeled noises strings.
classify(input: string): string

It recognizes the type of text, and returns the type string.

clustering(inputs: string[]): nlp.ClusteringResult

It clusters the given inputs, and returns the result as an nlp.ClusteringResult.
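
To show how a caller might consume such a result, here is a small sketch with local copies of the two interfaces and a hypothetical helper that maps each input string back to its cluster label. It does not call the proposed `clustering()` function; the sample result is built by hand.

```typescript
interface Cluster {
  label: string;
  items: string[];
}

interface ClusteringResult {
  clusters: Cluster[];
  noises: string[];
}

// Build a lookup from each input string to its cluster label;
// strings reported as noise map to null.
function labelByItem(result: ClusteringResult): { [item: string]: string | null } {
  const lookup: { [item: string]: string | null } = {};
  for (const cluster of result.clusters) {
    for (const item of cluster.items) {
      lookup[item] = cluster.label;
    }
  }
  for (const noise of result.noises) {
    lookup[noise] = null;
  }
  return lookup;
}

// Hand-written sample standing in for the output of a clustering call.
const sampleResult: ClusteringResult = {
  clusters: [
    { label: "greeting", items: ["hello", "hi there"] },
    { label: "farewell", items: ["bye"] },
  ],
  noises: ["asdf"],
};
const labels = labelByItem(sampleResult);
console.log(labels["hello"], labels["bye"], labels["asdf"]); // greeting farewell null
```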

Anti-APIs

An anti-API is an API that must stay hidden from the application user. The list is:

  • hide the training workflow, so interfaces to train and predict should be invisible.
  • hide the dataset workflow; in the future, developers will use a tool for dataset processing and validation.
  • hide the model-related APIs: graph structure, parameters and model validation.
  • hide the serving implementation; every ML application should be serve-able in pipcook-app, so we don't need any other APIs specifically for serving models.

Example

// example.ts
import express from 'express';
import { ml, vision, data } from '@pipcook/pipcook-app';

class MyImage extends data.ImageType {
  constructor(x, y, buffer) {
    super(x, y, buffer, 100, 100);
  }
}

const listAvatars: ml.Function = ml.create((img: MyImage) => {
  const components = vision.recognizeComponent(img);
  if (!components)
    return false;

  // Convert each component to an image, detect a face in it, and keep the hits.
  return components.map((item: UIView) => {
    const img = item.toImage() as UIImage;
    return vision.detectFace(img);
  }).filter((avatar: data.FaceType) => {
    return avatar !== null;
  });
});

// use the listAvatars function for your use
const app = express();
app.get('/', (req, res) => {
  const img = new MyImage(req.body.x, req.body.y, req.body.buffer);
  res.json(listAvatars(img).toJSON());
});

Then run the following commands to train:

$ pipcook train example.ts --epoch=5 --no-validation
generated the model at example.ts.im

And run your ML application:

$ pipcook try example.ts
$ pipcook deploy example.ts --eas=xxx

doc: optimize the guide for beginners and developers

The documentation is still in progress. We can improve it from these perspectives:

  • The 'getting started' doc is currently too simple. It should include:
    • How to use pipcook
    • How to organize data
    • How to choose plugins
    • How to deploy
  • The developer guide is currently not clear enough. It should include:
    • How to init the plugin development environment
    • How to write a plugin
    • How to publish a plugin
  • There is no doc or API reference for the plugins themselves, so users are not clear about which plugins we have and which parameters a specific plugin takes. We need to improve the plugin docs.
  • Following on from the above, it would probably be good to have appropriate comments in plugins so that the docs can be generated to some extent.

I can't init the project! Who can help me

I use archlinux, nvm, node 13.13.0, python 3.8.3

~/CODE/pipcook-example » pipcook init                            han@archlinux
? which client do you want to use? npm
⠋ installing pipcook[..................] / rollbackFailedOptional: verb npm-sess
> @pipcook/[email protected] install /home/han/CODE/pipcook-example/node_modules/@pipcook/boa
> node-gyp rebuild

gyp WARN install got an error, rolling back install
gyp ERR! configure error 
gyp ERR! stack Error: getaddrinfo EAI_AGAIN nodejs.org
gyp ERR! stack     at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:66:26)
gyp ERR! System Linux 5.6.5-arch3-1
gyp ERR! command "/home/han/.nvm/versions/node/v13.13.0/bin/node" "/home/han/.nvm/versions/node/v13.13.0/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
gyp ERR! cwd /home/han/CODE/pipcook-example/node_modules/@pipcook/boa
gyp ERR! node -v v13.13.0
gyp ERR! node-gyp -v v5.1.0
gyp ERR! not ok 
npm WARN [email protected] No description
npm WARN [email protected] No repository field.

npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! @pipcook/[email protected] install: `node-gyp rebuild`
npm ERR! Exit status 1
npm ERR! 
npm ERR! Failed at the @pipcook/[email protected] install script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     /home/han/.npm/_logs/2020-04-21T02_58_25_001Z-debug.log
✖ install Error: Command failed: npm install @pipcook/boa --save error

Could we provide an easier source for downloading libtensorflow?

It is difficult to download libtensorflow when initializing a project or installing dependencies, because of well-known network issues.
Could we provide a source that is easier to download from, and an appropriate way to configure it? :)

proposal to add some labels to manage our issues

The proposed labels are the following:

  • pipcook-core: the pipcook-core issues
  • build: the build(CI/CD) issues
  • plugin: the plugin issues
  • tests: the test issues
  • model: the model issues

@utkobe may I ask you to review the above? I will apply the labels after getting your approval.

meta: proposals for customized UI plugin for Pipboard

As discussed with @yorkie and @wordcount, we will allow users to develop user-interface plugins to extend the abilities of Pipboard. This issue discusses the specification for UI plugins.

Built-in Pipboard

Pipcook will provide a built-in web launcher, which we name Pipboard as discussed in #29. By default, Pipboard will provide access to building pipelines and checking logs and models. The basic structure is shown below:

Customized UI Plugin

A customized user-interface plugin still belongs to the pipcook plugin ecosystem (refer to #17). Currently plugins are categorized into:

  • data collect
  • data access
  • data process
  • model load
  • model train
  • model evaluate
  • model deploy

For UI plugins, we will add an 8th category:

  • user interface

After a UI plugin is installed, Pipboard will incorporate its user interface. For example, if a plugin called 'customize-demo' is developed, Pipboard will look as shown below, with the content area showing the contents of this plugin.

UI Plugin Developer Guide

The specification for UI plugin is as follows:

  • MUST be an NPM package
  • MUST have the structure as follows:
    • build (front-end code after building with a packaging tool)
    • src [optional] (your source code for the user interface)
    • config.json (config file; configure the plugin name to be shown in the tab)
    • package.json
    • index.js
  • SHOULD have a README so that others can understand the plugin
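
The required structure lends itself to a simple presence check. The sketch below is hypothetical (the constant and function names are not part of Pipcook) and treats every entry not marked optional in the list above as required.

```typescript
// Entries the proposal lists as required; "src" is optional and therefore not checked.
const REQUIRED_ENTRIES = ["build", "config.json", "package.json", "index.js"];

// Given the top-level entry names of a plugin directory, return the missing required ones.
function missingEntries(entries: string[]): string[] {
  return REQUIRED_ENTRIES.filter((name) => entries.indexOf(name) === -1);
}

console.log(missingEntries(["build", "src", "package.json", "index.js"]));
// [ 'config.json' ]
```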

In case the UI needs to perform basic operations on the local file system, we will provide several basic APIs:

  • API to get info about local models
  • API to get info about local training history logs
  • API to read a file from the local system
  • API to write a file to the local system

How to use UI Plugin

The user can follow these steps to use a specific plugin:

  • install the UI plugin's npm package into the working space
  • run the command:
pipcook board --plugin=<npm package name>

I will provide an example plugin later on.

Please help review this proposal @yorkie @wordcount

build: add a new action for validating the user scenarios

We have received some feedback about using the CLI to initialize plugin development, so I think that, in addition to unit tests, we need to add test cases for these specific user scenarios. I have compiled a list; if you want to add to it, please comment:

  • Install Pipcook
  • Initialize Pipeline Project
  • Initialize Plugin Project
  • Run Pipeline Project
  • Run Plugin Project

Should pipcook look like a community product?

I mean that pipcook contains many modules in its packages path, like app/cli/core and the board client/server. Don't you think that's confusing? It makes pipcook look like an aggregate of many libraries.
Here is my opinion:
For example, pipcook-board-server, the server for the board, contains only a few APIs but is built on egg. Everyone knows egg is an awesome framework for back-end engineering, but here it may be overkill; we could streamline the board server by using a lightweight framework.
In short, I propose that we simplify pipcook's structure by using a lightweight framework, removing redundant files, planning a development mode for plugins, etc. :)

meta: discuss how to use python for plugin author

We'll have a built-in Python integration called BOA. It's a Node.js binding, so it doesn't change anything about how Python packages are installed.

So I propose that we add some commands for managing Python packages, and append the corresponding PYTHONPATH when running Pipelines.
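
Appending to PYTHONPATH mostly comes down to joining directories with the platform's path-list delimiter. A minimal sketch follows; the function name is hypothetical and the delimiter is passed in explicitly (":" on POSIX systems, ";" on Windows) so the helper has no runtime dependencies.

```typescript
// Append extra directories to an existing PYTHONPATH value.
function appendPythonPath(
  current: string | undefined,
  extraDirs: string[],
  delimiter: string = ":",
): string {
  const merged = current ? current.split(delimiter) : [];
  for (const dir of extraDirs) {
    // Skip empty entries and directories that are already listed.
    if (dir !== "" && merged.indexOf(dir) === -1) {
      merged.push(dir);
    }
  }
  return merged.join(delimiter);
}

console.log(appendPythonPath("/a:/b", ["/b", "/c"])); // /a:/b:/c
```

In Node.js, the result could be assigned to `process.env.PYTHONPATH` before the pipeline starts, with `path.delimiter` supplying the delimiter.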

Errors occur with both the local CLI install and the Docker install

Following the example, when running pipcook init in the project directory:

# pipcook init
internal/streams/legacy.js:59
      throw er; // Unhandled stream error in pipe.
      ^

Error: connect ECONNREFUSED 151.101.228.133:443
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1129:14) {
  errno: 'ECONNREFUSED',
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '151.101.228.133',
  port: 443
}

However, the pip-project files are indeed generated successfully.

Then, when running node examples/pipeline-mnist-image-classfication.js:

internal/modules/cjs/loader.js:800
    throw err;
    ^

Error: Cannot find module '@pipcook/pipcook-core'
Require stack:
- /document/pipcook-project/examples/pipeline-mnist-image-classfication.js
    at Function.Module._resolveFilename (internal/modules/cjs/loader.js:797:15)
    at Function.Module._load (internal/modules/cjs/loader.js:690:27)
    at Module.require (internal/modules/cjs/loader.js:852:19)
    at require (internal/modules/cjs/helpers.js:74:18)
    at Object.<anonymous> (/document/pipcook-project/examples/pipeline-mnist-image-classfication.js:23:86)
    at Module._compile (internal/modules/cjs/loader.js:959:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:995:10)
    at Module.load (internal/modules/cjs/loader.js:815:32)
    at Function.Module._load (internal/modules/cjs/loader.js:727:14)
    at Function.Module.runMain (internal/modules/cjs/loader.js:1047:10) {
  code: 'MODULE_NOT_FOUND',
  requireStack: [
    '/document/pipcook-project/examples/pipeline-mnist-image-classfication.js'
  ]
}

Boa: magic functions are not working as expected

This bug is likely related to #61.
Calling a magic function directly now works fine, but calling it through a system built-in function gives errors. For example:

const boa = require('../packages/boa');
const torch = boa.import('torch');
const {len} = boa.builtins();

class customDataset extends torch.utils.data.Dataset {
  __len__() {
    return 5;
  }
  __getitem__(index) {
    return 2;
  }
}

const dataset = new customDataset();
console.log(dataset.__len__()); // This works fine
console.log(len(dataset)); // This gives errors

The equivalent code in plain Python works:

import sys
import torch

class customDataset( torch.utils.data.Dataset ):
  def __len__(self): 
    return 5
  
  def __getitem__(self, index): 
    return 2
  

dataset = customDataset()

print(dataset.__len__())
print(len(dataset))

'Python.h' file not found.

Problem

With boa version 0.5.2, running 'npm install @pipcook/boa' or 'pipcook init' throws the error "'Python.h' file not found".

Solved

This version of boa (0.5.2) depends on Python and Homebrew, so you need to:
1. Check that 'brew' is installed on your Mac.
2. Check that 'Python3' is installed on your Mac by running:

$ brew --prefix python\@3

If Python3 is installed, then check:

$ `brew --prefix python\@3`/Frameworks/Python.framework/Versions/3.7/include/python3.7m

If it does not print valid information, you need to install Python3 with:

$ brew install python@3


meta: proposal of Pipboard functional documentation

After discussing with @wordcount and @utkobe, we renamed the Web Launcher to Pipboard, which is more meaningful. We also clarified more details at this meeting.

Software architecture: Client/Server
Tech stacks: client(alibaba/ice), server(midwayjs/midway)

Pipboard is used for GUI-based pipeline management, plugin configuration and model visualization. We decided to extend Pipboard from pipcook-cli's board sub-command, which is able to show the training progress.

In addition, to extend data-collect plugins for the Web, we add a plugin property, "web", which marks the plugin as a Web plugin. A Web plugin can only be inserted into a pipeline in Pipboard. This type of plugin is actually a Web project that must contain an index.html as its entry point, which Pipboard opens in a new sandboxed <window>.

meta: define the spec of Pipcook Daemon

We introduced the Pipcook Daemon in #30; now let's define it in detail.

  • Define the declarative communication, and we have the following alternatives:
    • HTTP
    • GRPC
    • Node.js IPC (need to support for RPC mode)
  • Daemon SDK
    • Pipeline
      • CRUD, operations for pipeline
      • Run
      • Query
    • Plugin Management
      • Check, this checks if a plugin is able to install.
      • Install, this installs via the given plugin URI.
      • Run, this runs the given plugin with an input.
      • Test, this tests the given plugin with an input and expectations.
      • Log, this returns a readable stream for reading logs.
  • CLI
    • Current implementation to use Daemon SDK.
    • Support for connecting to Pipcook Daemon remotely.
    • Support for Plugin Dist, including Plugin APIs and Plugin Pack.
  • Internal works
    • ML Metadata, for managing the Pipeline objects.(https://github.com/google/ml-metadata)
      • Query Language:
        • SQL, standard query language, and supported by many products like mysql, psql, sqlite and flink.
        • KV-based just like Redis and LevelDB, but more works to be done for applications.
      • Storage, we need 2 modes "local" and "remote" for standalone and distributed architecture.
        • local: sqlite.
        • remote: psql.
    • Pipeline IR
      • IR Specification
        • Root fields
          • Property "id", to specify the pipeline object.
          • Property "plugins" to config plugins.
        • Features
          • Easy to diff changes.
          • Easy to convert to a sequence for the interpreter.
          • Pipeline Recovery must be supported.
      • VM
        • Interpreter for the IR
        • OP Code transfer to Plugin Internal Call
    • Plugin Internals.
      • PluginRT Bootstrap
        • Seed Process for speeding up.
        • Lifecycle:
          • Create a runnable lock.
          • Initialize Boa Environment.
          • Install Python Dependencies.
          • Install PluginRT supporting files.
          • Call an init event to PluginRT.
          • Call plugin with a given input.
          • Release the runnable lock.
      • Plugin Internal Calls
        • 0x30 start, this starts a new PluginRT process.
        • 0x31 read, this reads an event from PluginRT.
        • 0x32 write, this writes an event to PluginRT.
        • 0x100 compile, reserved for future AOT of PluginRT.
      • Scheduler, this is for scheduling jobs for self-managed plugins and components.
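
The Plugin Internal Calls in the list above could be modelled as an enum plus a dispatcher. The following is only a sketch of that idea: the opcode values come from the list, but `FakePluginRT`, `dispatch`, and the behaviour of echoing written events back on read are hypothetical stand-ins for a real PluginRT process.

```typescript
// Opcode values as listed in the proposal.
enum PluginCall {
  Start = 0x30,
  Read = 0x31,
  Write = 0x32,
  Compile = 0x100, // reserved for future AOT
}

// Hypothetical in-memory stand-in for a PluginRT process.
class FakePluginRT {
  started = false;
  events: string[] = [];
}

// Dispatch one internal call against the stand-in runtime.
function dispatch(rt: FakePluginRT, op: PluginCall, payload?: string): string | undefined {
  switch (op) {
    case PluginCall.Start:
      rt.started = true;
      return undefined;
    case PluginCall.Write:
      if (!rt.started) throw new Error("PluginRT not started");
      rt.events.push(payload ?? "");
      return undefined;
    case PluginCall.Read:
      if (!rt.started) throw new Error("PluginRT not started");
      // Simplification: reads return previously written events in order.
      return rt.events.shift();
    case PluginCall.Compile:
    default:
      throw new Error("opcode is reserved or unknown");
  }
}

const rt = new FakePluginRT();
dispatch(rt, PluginCall.Start);
dispatch(rt, PluginCall.Write, "init");
console.log(dispatch(rt, PluginCall.Read)); // init
```

Keeping the opcodes in one enum gives the VM a single place to translate IR operations into plugin calls.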

pipeline: missing modelId and modelPath for ModelDefine plugin

current execution component: modelDefine
{
  pipelineId: 'e8e2cbf0-7e12-11ea-9cb2-8be7a2c34799',
  modelDir: '/Users/yorkie/workspace/pipcook/pipcook-output/e8e2cbf0-7e12-11ea-9cb2-8be7a2c34799/model',
  dataDir: '/Users/yorkie/workspace/pipcook/pipcook-output/e8e2cbf0-7e12-11ea-9cb2-8be7a2c34799/data'
}
Component modelDefine error:
TypeError [ERR_INVALID_ARG_TYPE]: The "path" argument must be of type string. Received undefined

Error: ENOENT: no such file or directory, open '/Users/xxx/pipcook-examp'

After pulling the project locally and starting it according to the README, running node examples/pipeline-databinding-image-classification.js keeps failing with this error:

Error: ENOENT: no such file or directory, open '/Users/xx/xx/xx/pipcook-examp'

But /Users/xx/xx/xx/pipcook-example/pipcook-project/examples/pipeline-databinding-image-classification.js has always existed in the directory, and I don't know why this happens.

windows, doc: checkout fails because of an invalid path

I tried to add a Windows build in #22, but it fails with the error below:

git checkout --progress --force de57bf3f3a255524863363ff1a8f2132b3541031
error: invalid path 'docs/doc/How to develop a plugin?-en.md'
Removed matchers: 'checkout-git'
##[error]Git checkout failed with exit code: 128
##[error]Exit code 1 returned from process: file name 'c:\runners\2.165.2\bin\Runner.PluginHost.exe', arguments 'action "GitHub.Runner.Plugins.Repository.v1_0.CheckoutTask, Runner.Plugins"'.

"Pipcook init" node-gyp rebuild error

When I run "pipcook init" with Node.js version 10, I get this error message:
(screenshot of the error message)
Maybe this Node.js version is too old and its API differs from what Pipcook supports?

model: example pipeline for image classification's accuracy is low

The training logs:

Epoch 1 / 15
eta=0.0 =====================================================================================================>
195228ms 480858us/step - acc=0.163 loss=2.67 val_acc=0.169 val_loss=2.77
Epoch 2 / 15
eta=0.0 =====================================================================================================>
191213ms 470967us/step - acc=0.264 loss=2.44 val_acc=0.266 val_loss=2.91
Epoch 3 / 15
eta=0.0 =====================================================================================================>
200318ms 493394us/step - acc=0.255 loss=2.56 val_acc=0.233 val_loss=3.34
Epoch 4 / 15
eta=0.0 =====================================================================================================>
204292ms 503182us/step - acc=0.248 loss=2.88 val_acc=0.232 val_loss=3.88
Epoch 5 / 15
eta=0.0 =====================================================================================================>
203780ms 501921us/step - acc=0.246 loss=3.29 val_acc=0.232 val_loss=4.45
Epoch 6 / 15
eta=0.0 =====================================================================================================>
200810ms 494607us/step - acc=0.246 loss=3.69 val_acc=0.232 val_loss=4.89
Epoch 7 / 15
eta=0.0 =====================================================================================================>
198668ms 489329us/step - acc=0.246 loss=4.05 val_acc=0.232 val_loss=5.15
Epoch 8 / 15
eta=0.0 =====================================================================================================>
197658ms 486843us/step - acc=0.246 loss=4.31 val_acc=0.232 val_loss=5.48
Epoch 9 / 15
eta=0.0 =====================================================================================================>
198095ms 487918us/step - acc=0.246 loss=4.48 val_acc=0.232 val_loss=5.51
Epoch 10 / 15
eta=0.0 =====================================================================================================>
196179ms 483200us/step - acc=0.246 loss=4.58 val_acc=0.232 val_loss=5.66
Epoch 11 / 15
eta=0.0 =====================================================================================================>
188978ms 465462us/step - acc=0.246 loss=4.62 val_acc=0.232 val_loss=5.61
Epoch 12 / 15
eta=0.0 =====================================================================================================>
190650ms 469582us/step - acc=0.246 loss=4.62 val_acc=0.232 val_loss=5.64
Epoch 13 / 15
eta=0.0 =====================================================================================================>
195142ms 480645us/step - acc=0.246 loss=4.60 val_acc=0.232 val_loss=5.61
Epoch 14 / 15
eta=0.0 =====================================================================================================>
199023ms 490203us/step - acc=0.246 loss=4.57 val_acc=0.232 val_loss=5.55
Epoch 15 / 15
eta=0.0 =====================================================================================================>
195624ms 481832us/step - acc=0.246 loss=4.53 val_acc=0.232 val_loss=5.56
current execution component: modelEvaluate
evaluate result:  {
  loss: Float32Array(1) [ 6.380456924438477 ],
  accuracy: Float32Array(1) [ 0.12019230425357819 ]
}

To reproduce the problem, just run the pipeline.

Boa: support of magic function overriding

Currently Boa does not support overriding Python's magic functions. This issue tracks the progress of that work.

For example:

const boa = require('../packages/boa');
const sys = boa.import('sys');
const torch = boa.import('torch');

class customDataset extends torch.utils.data.Dataset {
  __len__() {
    return 5;
  }
  __getitem__(index) {
    return 2;
  }
}

const dataset =  new customDataset();

console.log(dataset.__getitem__(2));

It currently fails with this error:

Error: NotImplementedError:

meta: release lifecycle

The following are the questions associated with the release lifecycle and my answers:

Time-based or feature-based release?

Pipcook uses semver for release management, so a release is a major, minor, or patch version.

For major versions, time-based releases usually make sense only once the software and its ecosystem are relatively mature, so major versions will be feature-based. Each major version needs its features defined first, followed by the milestones to complete them.

For minor versions, once the target of the major version is largely settled, the iteration period can be relatively fixed, so they can be time-based. We can therefore release a minor version every 2 weeks and treat even-numbered minors as stable.

For patch versions, which fix existing releases, the decision depends on the situation. For the latest version (such as 0.7.x), we can decide whether to release based on whether there are daily bug fixes. For historical versions (such as 0.3.x, 0.4.x), we need to weigh the severity of the bug, its relevance to the project, and community feedback, and release manually.
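
As a small illustration of the "even minor number means stable" convention proposed above, the check could look like the sketch below. The helper names parseSemver and isStableMinor are hypothetical, and the parser only accepts plain MAJOR.MINOR.PATCH strings (no pre-release or build metadata), which is a simplifying assumption.

```typescript
// Hypothetical helpers; they only handle plain "MAJOR.MINOR.PATCH" strings.
function parseSemver(version: string): { major: number; minor: number; patch: number } {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error('not a plain MAJOR.MINOR.PATCH version: ' + version);
  }
  return {
    major: Number(match[1]),
    minor: Number(match[2]),
    patch: Number(match[3]),
  };
}

// A version counts as stable when its minor number is even.
function isStableMinor(version: string): boolean {
  return parseSemver(version).minor % 2 === 0;
}
```

With this rule, isStableMinor('0.8.1') is true and isStableMinor('0.7.3') is false.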

Provide an LTS (long-term support) version?

No. We need to wait until the software matures. Currently there are not enough people to maintain one, but we can provide patches for a specific version range.

Release automatically

Major: release maintainer, minor: release maintainer, patch: CI + release maintainer

@utkobe @wordcount Feel free to add new questions and give your answers :)

meta: define Plugin Runtime

The low-level ML API is the base layer that gives plugin developers basic machine learning capabilities.

  • datasets
    • split
    • shuffle
    • sample
  • model
    • cv
      • gray
      • random
      • resize
    • nlp
      • tokenize
      • tf/idf
  • validation
    • ar/ap
    • confusion matrix
    • mAP
    • roc/auc
  • utils
    • download
    • zip/unzip
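
To give a feel for the datasets part of the list above, here is a minimal sketch of split, shuffle and sample as plain functions over arrays. The signatures are assumptions made for illustration; the issue does not define them, and a real ML API would likely work on dataset objects rather than in-memory arrays.

```typescript
// Hypothetical signatures; the issue only names split, shuffle and sample.

// Fisher-Yates shuffle that returns a new array and leaves the input untouched.
function shuffle<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = out[i] as T;
    out[i] = out[j] as T;
    out[j] = tmp;
  }
  return out;
}

// Split into train/test parts; trainRatio is the share that goes to train.
function split<T>(items: readonly T[], trainRatio: number): { train: T[]; test: T[] } {
  if (trainRatio < 0 || trainRatio > 1) {
    throw new RangeError('trainRatio must be within [0, 1]');
  }
  const cut = Math.round(items.length * trainRatio);
  return { train: items.slice(0, cut), test: items.slice(cut) };
}

// Draw up to n items without replacement.
function sample<T>(items: readonly T[], n: number, random: () => number = Math.random): T[] {
  return shuffle(items, random).slice(0, Math.max(0, n));
}
```

A plugin could then call split(shuffle(rows), 0.8) to get a randomized 80/20 partition; passing a custom random function keeps runs reproducible.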

build: add a workflow to build daily image(docker)

Currently, users have to build the Docker image themselves as part of the Docker installation, which is not convenient enough. A new workflow should build a daily image and push it to image hubs, so that users can pull the image and run it directly :)

cli: rewrite in TypeScript

Pipcook uses TypeScript for its core, built-in plugins, and tools. To keep the source code consistent, it would be great to use TypeScript for the CLI, too.

If you are familiar with TypeScript, feel free to help us with this :)

meta: v1.0 roadmap

The following is what we want Pipcook v1 to look like:

Screen Shot 2020-04-05 at 2 49 10 AM

We will introduce a new, unified layer for end users called "Pipcook ML App". Its goal is to simplify developing machine learning applications; for more details, see #33.

Pipcook is now meant to be a runtime for building ML applications rather than a Node.js library. So we are also turning the original pipcook-core into a daemon process, which operates plugins and manages pipelines via a new declarative language, Pipeline IR.

To make sure that every plugin can run more safely and powerfully, Pipcook v1 also adds the Plugin Runtime, which should be a standalone process or thread (via configuration). It provides the following for plugin developers:

  • ML API: the low-level machine learning APIs.
  • Builtin Tensorflow.js library.
  • Python APIs via Node.js, which let you call Python functions from JavaScript seamlessly.

Okay, let's see what the detailed v1 milestone tasks will be:

  • CI/CD workflow.
    • pull request workflow.
    • GitHub Actions.
    • unit testing framework.
  • Solid documentation.
    • refactoring documentation build scripts #39.
    • tutorials, including: ML App, Pipeline, and Plugin.
    • typedoc, including: ML App APIs and Plugin APIs.
  • Experimental ML App
    • definition of APIs and usage #33.
    • pipelines generator for training/serving.
    • Pipboard supports running ML App.
    • Pipboard supports preparing the dataset/samples.
  • Stable Pipeline.
    • decoupling the plugins at different stages.
    • using a static DSL to represent a pipeline, and it's also the IR for the high-level ML App layer.
    • refactoring toward the Pipcook daemon architecture for deploying online services.
      • HTTP APIs.
      • using HTTP APIs at CLI.
    • preparing the v1 builtin plugins list and complete them.
  • Stable Plugin APIs, and how to validate that a plugin works.
    • plugin specification.
    • plugin runtime.
      • provide the ml-base APIs.
      • provide the bridge(boa) to Python's library.
  • Pipboard: a Web application for using Pipcook:
    • Pipboard Extension.
    • interactive GUI to CRUD pipelines.
    • plugin discovery and configuration.
    • ability to run pipeline.
    • ability to run ML App.
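
The "static DSL to represent a pipeline" item can be sketched as a plain, serializable object. In the sketch below, the root fields id and plugins follow the Pipeline IR outline; the PluginConfig shape, the stage name dataCollect, the package names and the changedPlugins helper are all hypothetical and only illustrate why a static representation is easy to diff.

```typescript
// Root fields "id" and "plugins" follow the Pipeline IR outline.
// PluginConfig and every value used with it are hypothetical.
interface PluginConfig {
  package: string;
  params?: Record<string, unknown>;
}

interface PipelineIR {
  id: string;
  plugins: Record<string, PluginConfig>;
}

// List the stages whose configuration differs between two pipelines.
// Comparison uses JSON.stringify, so it is sensitive to key order.
function changedPlugins(before: PipelineIR, after: PipelineIR): string[] {
  const stages = Object.keys(before.plugins).concat(Object.keys(after.plugins));
  const unique = stages.filter((stage, index) => stages.indexOf(stage) === index);
  return unique
    .filter((stage) => JSON.stringify(before.plugins[stage]) !== JSON.stringify(after.plugins[stage]))
    .sort();
}

const v1: PipelineIR = {
  id: 'example-pipeline',
  plugins: {
    dataCollect: { package: 'example-data-collect-plugin' },
    modelDefine: { package: 'example-model-define-plugin', params: { epochs: 15 } },
  },
};

const v2: PipelineIR = {
  id: 'example-pipeline',
  plugins: {
    dataCollect: { package: 'example-data-collect-plugin' },
    modelDefine: { package: 'example-model-define-plugin', params: { epochs: 30 } },
  },
};

console.log(changedPlugins(v1, v2));
```

Because the pipeline is pure data, the same object could be stored, diffed, or sent to the daemon over HTTP without executing any plugin code.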

Some functions could be reused between plugins

Some functions could be reused between plugins, e.g.:

function MakeWordsSet(words_file: string): Promise<Set<string>> {
...
}

The function MakeWordsSet is identical in the model-define and model-evaluate plugins, so it can be reused.
Could we build a mechanism for reusing such functions between plugins?
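
One possible mechanism is a small shared utility module that both plugins import. The sketch below shows what such a module could export. Note that the body of MakeWordsSet is elided in the issue, so the "one word per line" file format, the parseWordsSet helper, and the idea of a shared module are all assumptions made for illustration.

```typescript
import { promises as fs } from 'fs';

// Assumption: the words file contains one word per line.
// Blank lines and surrounding whitespace are ignored.
export function parseWordsSet(content: string): Set<string> {
  const words = content
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return new Set(words);
}

// Same signature as in the issue, with a camelCase name; hypothetical body.
export async function makeWordsSet(wordsFile: string): Promise<Set<string>> {
  const content = await fs.readFile(wordsFile, 'utf8');
  return parseWordsSet(content);
}
```

Both plugins could then depend on the shared module instead of keeping their own copy, and the pure parseWordsSet helper stays easy to unit test without touching the file system.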

boa: list of Python libraries that should be preinstalled in boa

To make boa and the Python ecosystem easy to use, boa should install some libraries by default, so that users feel boa supports those functions out of the box. Currently only numpy is installed.

Suggestions:

  • numpy
  • matplotlib
  • opencv
  • pandas
  • scipy
  • scikit-learn
  • tensorflow
  • pytorch
  • Pillow
  • nltk
  • jieba

These eleven libraries cover image processing, math, machine learning, and deep learning.

Please review, @yorkie.
