alibaba / pipcook

Machine learning platform for Web developers

Home Page: https://alibaba.github.io/pipcook/

License: Apache License 2.0

JavaScript 5.14% Dockerfile 0.23% TypeScript 79.65% Shell 0.86% Makefile 0.04% Jupyter Notebook 11.32% HTML 2.75%

js machine-learning tensorflow pipeline

pipcook's Introduction

A JavaScript application framework for machine learning and its engineering.

Documentation: English | 中文

Builds

Build Types	Status
tests
documentation
docker

Why Pipcook

Pipcook's mission is to enable JavaScript engineers to use the power of machine learning without any prerequisites, and its vision is to lead the front-end field toward intelligence. Pipcook aims to become the JavaScript application framework for the cross-cutting area of machine learning and front-end interaction.

We design Pipcook's APIs for front-end and machine learning applications, focusing on the front-end area and working from the JavaScript engineer's point of view. With the principle of being friendly to JavaScript, we will push the whole area forward through machine learning engineering. For this reason we opened an issue about machine-learning application APIs, and we look forward to your involvement.

What's Pipcook

The project provides subprojects including a machine learning pipeline framework, management tools, and a JavaScript runtime for machine learning. These can also be used as building blocks in conjunction with other projects.

Principles

Pipcook is an open-source project guided by strong principles, aiming to be modular and flexible on user experience. It is open to the community to help set its direction.

Modular: the project consists of several subprojects that have well-defined functions and APIs and that work together.
Swappable: the project includes enough modules to build what Pipcook has done, but its modular architecture ensures that most of the modules can be swapped for different implementations.

Audience

Pipcook is intended for Web engineers looking to:

learn what machine learning is.
train their models and serve them.
optimize their own models for better evaluation results, such as higher accuracy for image classification.

If any of the above applies to you, try it via the installation guide.

Subprojects

Pipcook Pipeline

It's used to represent ML pipelines consisting of Pipcook scripts. This layer ensures the stability and scalability of the whole system and uses a plug-in mechanism to support rich functions including dataset, training, validations, and deployment.

A Pipcook Pipeline is generally composed of many scripts. Through different scripts and configurations, the final output is an NPM package that contains the trained model and JavaScript functions that can be used directly.

Note: In Pipcook, each pipeline has only one role, which is to output the trained model you need. That is to say, the last stage of each pipeline must output the trained model; otherwise, the pipeline is invalid.

Pipcook Bridge to Python

For JavaScript engineers, the most difficult part is the lack of a mature machine learning toolset in the ecosystem. Pipcook includes a module called Boa (https://github.com/imgcook/boa), which provides access to Python packages by bridging the interface of CPython using N-API.

With it, developers can use packages such as numpy, scikit-learn, jieba, tensorflow, or any other package from the Python ecosystem in the Node.js runtime through JavaScript.

Quick start

Setup

Prepare the following on your machine:

Installer	Version Range
Node.js	>= 12.17 or >= 14.0.0
npm	>= 6.14.4

Install the command-line tool for managing Pipcook projects:

$ npm install -g @pipcook/cli

Then train from any of the example pipelines. We take image classification as an example:

$ pipcook train https://cdn.jsdelivr.net/gh/alibaba/pipcook@main/example/pipelines/image-classification-mobilenet.json -o ./output

The dataset specified by the pipeline includes 2 image categories: avatar and blurBackground. After training, we can predict the category of an image:

$ pipcook predict ./output/image-classification-mobilenet.json -s ./output/data/validation/blurBackground/71197_223__30.7_36.jpg
✔ Origin result:[{"id":1,"category":"blurBackground","score":0.9998120665550232}]

The input is a blurBackground image from the validation dataset, and the model determines that its category is blurBackground.
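
The JSON printed after `Origin result:` can also be consumed from code. The following is a minimal sketch, assuming the result is always an array of objects with `id`, `category` and `score` fields as in the sample output above. The `Prediction` type and the `topCategory` helper are hypothetical names for illustration and are not part of Pipcook.

```typescript
// Shape of one entry, inferred from the sample output above (an assumption).
interface Prediction {
  id: number;
  category: string;
  score: number;
}

// Hypothetical helper: returns the highest-scoring prediction,
// or undefined when the list is empty.
function topCategory(raw: string): Prediction | undefined {
  const predictions = JSON.parse(raw) as Prediction[];
  return predictions.reduce<Prediction | undefined>(
    (best, p) => (best === undefined || p.score > best.score ? p : best),
    undefined
  );
}

// The JSON payload copied from the sample output.
const sample = '[{"id":1,"category":"blurBackground","score":0.9998120665550232}]';
const best = topCategory(sample);
console.log(best ? best.category : "no prediction");
```

Picking the maximum score keeps the helper usable if a pipeline returns more than one candidate category.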

Want to deploy it?

$ pipcook serve ./output
ℹ preparing framework
ℹ preparing scripts
ℹ preparing artifact plugins
ℹ initializing framework packages
Pipcook has served at: http://localhost:9091

Then you can open the browser and try your image classification server.

Playground

If you are wondering what you can do in Pipcook and where you can check your training logs and models, you could start from Pipboard:

open https://pipboard.imgcook.com

A web page will open in your browser. There is an MNIST showcase on the home page that you can play around with.

Pipelines

If you want to train a model to recognize MNIST handwritten digits by yourself, you could try the examples below.

Name	Description	Open in Colab
mnist-image-classification	pipeline for the MNIST image classification problem.	N/A
databinding-image-classification	pipeline example for an image classification task that classifies imgcook databinding pictures.
object-detection	pipeline example for an object detection task, used for component recognition by imgcook.
text-bayes-classification	pipeline example for a text classification task with Bayes	N/A

See here for the complete list. These examples are easy and quick to run. For example, to do an MNIST image classification, just run the following to start the pipeline:

$ pipcook run https://cdn.jsdelivr.net/gh/alibaba/pipcook@main/example/pipelines/image-classification-mobilenet.json -o output

After the above pipeline completes, the trained model is in the output/model directory under the current directory. It is a TensorFlow.js model.

Developers

Clone this repository:

$ git clone git@github.com:alibaba/pipcook.git

Install dependencies, e.g. via npm:

$ npm install

Then build the project:

$ npm run build

Developer Documentation English | 中文
Project Guide

Community

DingTalk

Or search for the group by its number: 30624012.

Download DingTalk (an all-in-one free communication and collaboration platform) here: English | 中文

Gitter Room

Who's using it

License

Apache 2.0

pipcook's Issues

boa: fail to install @pipcook/[email protected] independently

I want to use @pipcook/[email protected] on its own, but the installation fails with 'Failed at the @pipcook/[email protected] preinstall script.'
node v12.4.0
npm 6.9.0
python 3.7
macOS

meta: how to work with tensorboard

TensorBoard is TensorFlow's visualization toolkit. It supports many kinds of visualization, as described at https://www.tensorflow.org/tensorboard. The easiest way to use this toolkit with a model is to pass tf.keras.callbacks.TensorBoard in model.fit's callbacks; the other ways are under tf.summary.

There are two ways to use TensorBoard in Pipcook:

use python-node to use the above Python methods
tfjs-node adds the TensorBoard at: tensorflow/tfjs-node#202

Which way do you think is better? @utkobe @wordcount

meta: about project scope, contributions and plugin ecosystem

A clear project scope helps us make better choices. Here we clearly define the source code, configuration and documentation parts that need to be included in Pipcook. As an open source project, Pipcook should welcome different types of contributions at different levels, covering the project scope mentioned above. The last topic is the plugin ecosystem: I will describe how our plugin ecosystem is organized and how it integrates with NPM and JavaScript.

Project scope

The "Pipcook" project includes the following:

source code of the framework, high-level APIs, command-line tools and built-in plugins.
documents and specifications of the framework, high-level APIs, command-line tools and built-in plugins.
a Web launcher for plugins discovery, dataset selection, pipeline creation, model deployment, and visualization.

Plugins play an important role in this project: a pipeline schedules plugins, which are wrapped as components and work together to output the model or the service to deploy. Each plugin needs to follow the rules below:

MUST be an NPM package, which means it contains a package.json and a main file; TypeScript (*.ts) is recommended by default.
SHOULD have a README for introducing the plugin.
SHOULD have tsdoc/jsdoc annotations or HTML version for API references.
SHOULD have unit tests for code quality.

Contributions & Contributors

After understanding the project scope and plugins, let's take a look at what types of contributions and contributors pipcook will accept as an open source project.

contribution to web launcher
contribution to command-line tools
contribution to framework and high-level apis
contribution to built-in plugin

In addition to the above, we'll describe user-land plugin at the section "plugin ecosystem".

Each contribution mentioned above MUST follow these rules:

contributor submits a pull request to describe the technical details.
changes in this pull request include some source code, documentation and configuration.
changes in this pull request pass all the related build instructions.
changes in this pull request receive over 1 approval from project collaborators.
- changes to the framework and high-level APIs require core collaborators' approval.
- changes to a built-in plugin require the built-in plugin collaborators' approval.

We have also classified the contributors as follows:

contributor: someone who has made contributions within the project scope.
collaborator: a project maintainer who makes improvements, fixes bugs, and reviews pull requests.
- a core collaborator maintains the whole project scope, focusing on the framework, high-level APIs and release management.
- a built-in plugin collaborator maintains one or more specific built-in plugins.

Plugin ecosystem

The composition and requirements of plugins were described in the previous section, so here we define some rules between plugins, namely the plugin ecosystem.

From the maintainer's perspective, plugins can be divided into built-in, community and private ones:

built-in plugins are maintained by core collaborators and released with Pipcook.
each community plugin is maintained and released by its author; Pipcook can download the specified plugins through git, npm or oss.
private plugins are maintained by a private organization or company itself.

To help Pipcook discover all the plugins, the project provides some rules to let the Web launcher discover community ones:

add GitHub topic "pipcook-plugin", see https://github.com/topics/pipcook-plugin.
add "pipcook-plugin" in the package.json's "keywords", see https://www.npmjs.com/search?q=keywords:pipcook-plugin.
create a pull request to add the plugin URI by updating the COMMUNITY_PLUGINS.md.
(to be added).
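
The keyword rule above can be checked mechanically. The following is a minimal sketch, assuming only that a plugin's package.json lists "pipcook-plugin" among its `keywords`; the `PackageManifest` type, the `isDiscoverablePlugin` helper and the sample package name are hypothetical and do not describe how the Web launcher actually discovers plugins.

```typescript
// Minimal subset of package.json fields used by this sketch.
interface PackageManifest {
  name: string;
  main?: string;
  keywords?: string[];
}

// Hypothetical check: does the manifest carry the "pipcook-plugin" keyword?
function isDiscoverablePlugin(manifest: PackageManifest): boolean {
  const keywords = manifest.keywords || [];
  return keywords.indexOf("pipcook-plugin") !== -1;
}

// Made-up manifest for illustration.
const manifest: PackageManifest = {
  name: "my-data-collect-plugin",
  main: "dist/index.js",
  keywords: ["pipcook-plugin", "machine-learning"],
};
console.log(isDiscoverablePlugin(manifest));
```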

Community plugins can also be submitted as built-in plugins through pull requests, but this requires nomination by a core collaborator and the approvals of at least 2 collaborators.

boa: tc39 proposals to improve the usage

Here we list proposed ES features that help make JavaScript as readable as Python:

And we have proposals:

https://es.discourse.group/t/new-well-known-symbol-symbol-tojson/322

PipApp: the application framework for machine learning

The vision of Pipcook is to bring JavaScript developers and engineers into the world of machine learning quickly and seamlessly, so we are responsible for creating APIs that are easy enough to use.

In the Pipcook stack, pipcook-app defines the ML application. It abstracts away duplicated work and hides the low-level algorithm implementation, which would otherwise impose a learning curve on every ML beginner.

APIs

Each module covers one type of data and provides a set of methods for developers.

module `ml`

This module creates machine learning functions. It provides the core abilities to represent your machine learning application in an intuitive way.

interface `ml.Function`

To hide the ML details as much as possible, Pipcook lets you declare your machine learning functions with a specific type, ml.Function. You can create an ml.Function via the create() function below.

Internally, the Pipcook compiler parses the application, generates the training code from the ml.Function instances, and replaces these slots with model-generated inference.

interface `ml.FunctionImpl(arg: data.MLType)`

This interface describes the machine learning internals of an application. It accepts an argument of type data.MLType as the input; the output type is not constrained.

`create(fn: ml.FunctionImpl): ml.Function`

This creates the above ml.Function from an ml.FunctionImpl object.

const mlfunc: ml.Function = ml.create((input: data.ImageType) => {
  // call other ML Application APIs here and return
});

// ...
mlfunc(new data.ImageType(...)); // call this function anywhere.

module `data`

This module is to declare all types for your application's I/O.

interface `data.MLType`

It's the base interface to tell the Pipcook compiler a type for ML.

interface `data.ImageType` extends `data.MLType`

It represents the image type for given ml.Function I/O.

interface `data.TextType` extends `data.MLType`

It represents the text type for given ml.Function I/O.

module `vision`

This module provides vision-related functions like image classification and object detection.

interface `vision.Position2D`

It represents a 2D position for object detection:

label {string} the label string that represents the object's type.
left {number} the left edge of the detected object, in pixels.
top {number} the top edge of the detected object, in pixels.
height {number} the height of the detected object.
width {number} the width of the detected object.
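
The fields above map directly onto a TypeScript interface. The following is a minimal sketch, assuming the field names and types exactly as listed; the `areasByLabel` helper and the sample boxes are hypothetical and only illustrate how detection results of this shape could be consumed.

```typescript
// Fields as listed above; the name mirrors the proposed vision.Position2D.
interface Position2D {
  label: string;
  left: number;
  top: number;
  height: number;
  width: number;
}

// Hypothetical helper: keep only boxes with the given label
// and compute the area (width * height) of each.
function areasByLabel(objects: Position2D[], label: string): number[] {
  return objects
    .filter((o) => o.label === label)
    .map((o) => o.width * o.height);
}

// Made-up detection results for illustration.
const detected: Position2D[] = [
  { label: "button", left: 10, top: 20, height: 40, width: 100 },
  { label: "avatar", left: 0, top: 0, height: 50, width: 50 },
];
console.log(areasByLabel(detected, "button"));
```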

`classify(img: ImageType): string`

It recognizes the type of image, and returns the type string.

ml.create((img: data.ImageType) => {
  const label = vision.classify(img); // returns the label
});

`detect(img: ImageType): vision.Position2D[]`

It detects targets in a single image and returns the position and label of each detected object.

ml.create((img: data.ImageType) => {
  const objects = vision.detect(img);
  objects.forEach((o) => {
    console.log(o.label, o.left, o.top); // prints the label, left and top.
  });
});

module `nlp`

This module provides NLP-related functions like text classification and clustering.

interface `nlp.Cluster`

label {string} the label for this cluster.
items {string[]} the strings in this cluster.

interface `nlp.ClusteringResult`

clusters {nlp.Cluster[]} all grouped clusters, and each is an object of nlp.Cluster.
noises {string[]} all strings labeled as noise.

`classify(input: string): string`

It recognizes the type of text, and returns the type string.

`clustering(inputs: string[]): nlp.ClusteringResult`

It clusters the given inputs and returns the result as an nlp.ClusteringResult.
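
To make the result shape concrete, here is a minimal sketch of the two interfaces in TypeScript, assuming the fields exactly as listed above. The `labelOf` helper and the sample data are hypothetical; the sketch does not implement clustering itself, it only shows how a result of this shape could be queried.

```typescript
interface Cluster {
  label: string;
  items: string[];
}

interface ClusteringResult {
  clusters: Cluster[];
  noises: string[];
}

// Hypothetical helper: returns the label of the cluster that contains `text`,
// or undefined when the text is noise or was not part of the input.
function labelOf(result: ClusteringResult, text: string): string | undefined {
  for (const cluster of result.clusters) {
    if (cluster.items.indexOf(text) !== -1) {
      return cluster.label;
    }
  }
  return undefined;
}

// Made-up sample data shaped like the interfaces above.
const result: ClusteringResult = {
  clusters: [
    { label: "greeting", items: ["hello", "hi"] },
    { label: "farewell", items: ["bye"] },
  ],
  noises: ["asdf"],
};
console.log(labelOf(result, "hi"));
console.log(labelOf(result, "asdf"));
```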

Anti-APIs

An anti-API is an API that must be hidden from the application user. The list is:

hide the training workflow; interfaces to train and predict should therefore be invisible.
hide the dataset workflow; in the future, developers will use a tool for dataset processing and validation.
hide the model-related APIs: graph structure, parameters and model validation.
hide the serving implementation; every ML application should be servable in pipcook-app, so we don't need any other APIs specifically for serving models.

Example

// example.ts
import express from 'express';
import { ml, vision, data } from '@pipcook/pipcook-app';

class MyImage extends data.ImageType {
  constructor(x, y, buffer) {
    super(x, y, buffer, 100, 100);
  }
}

const listAvatars: ml.Function = ml.create((img: MyImage) => {
  const components = vision.recognizeComponent(img);
  if (!components)
    return false;

  // detect a face in each component and keep only the non-null results.
  return components.map((item: UIView) => {
    const img = item.toImage() as UIImage;
    return vision.detectFace(img);
  }).filter((avatar: data.FaceType) => {
    return avatar !== null;
  });
});

// use the listAvatars function for your use
const app = express();
app.get('/', (req, res) => {
  const img = new MyImage(req.body.x, req.body.y, req.body.buffer);
  res.json(listAvatars(img).toJSON());
});

Then run the following commands to train:

$ pipcook train example.ts --epoch=5 --no-validation
generated the model at example.ts.im

And run your ML application:

$ pipcook try example.ts
$ pipcook deploy example.ts --eas=xxx

doc: optimize the guide for beginners and developers

The documentation is still in progress. We can improve it from these perspectives:

The 'getting started' doc is currently too simple. It should include:
- How to use pipcook
- How to organize data
- How to choose a plugin
- How to deploy
The developer guide is currently not clear enough. It should include:
- How to init the plugin development environment
- How to write the plugin
- How to publish the plugin
There is no doc or API reference for the plugins themselves, so users do not know which plugins we have or which parameters a specific plugin takes. We need to improve the plugin docs.
Following from the above, it would be good to have appropriate comments in plugins so that the docs can be generated to some extent.

I can't init the project! Who can help me

I use archlinux, nvm, node 13.13.0, python 3.8.3

~/CODE/pipcook-example » pipcook init                            han@archlinux
? which client do you want to use? npm
⠋ installing pipcook[..................] / rollbackFailedOptional: verb npm-sess
> @pipcook/[email protected] install /home/han/CODE/pipcook-example/node_modules/@pipcook/boa
> node-gyp rebuild

gyp WARN install got an error, rolling back install
gyp ERR! configure error 
gyp ERR! stack Error: getaddrinfo EAI_AGAIN nodejs.org
gyp ERR! stack     at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:66:26)
gyp ERR! System Linux 5.6.5-arch3-1
gyp ERR! command "/home/han/.nvm/versions/node/v13.13.0/bin/node" "/home/han/.nvm/versions/node/v13.13.0/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
gyp ERR! cwd /home/han/CODE/pipcook-example/node_modules/@pipcook/boa
gyp ERR! node -v v13.13.0
gyp ERR! node-gyp -v v5.1.0
gyp ERR! not ok 
npm WARN [email protected] No description
npm WARN [email protected] No repository field.

npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! @pipcook/[email protected] install: `node-gyp rebuild`
npm ERR! Exit status 1
npm ERR! 
npm ERR! Failed at the @pipcook/[email protected] install script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     /home/han/.npm/_logs/2020-04-21T02_58_25_001Z-debug.log
✖ install Error: Command failed: npm install @pipcook/boa --save error

Could we provide a source easier to download libtensorflow?

It is difficult to download libtensorflow when users initialize a project, install dependencies and so on, because of well-known network problems.
Could we provide a source that makes it easier to download, and an appropriate way to configure it? :)

core, cli: use debug instead of raw console

We have some debug logs; it's better to use debug instead of raw console calls.

boa: make benchmarks for boa(js) and cpython

The benchmarks are at https://github.com/python/pyperformance/tree/master/pyperformance/benchmarks.

meta: lint.

Use lint to standardize the code.

proposal to add some labels to manage our issues

The proposed labels are the following:

pipcook-core: the pipcook-core issues
build: the build(CI/CD) issues
plugin: the plugin issues
tests: the test issues
model: the model issues

@utkobe may I ask you to review the above? I will apply the labels after getting your approval.

meta: proposals for customized UI plugin for Pipboard

As discussed with @yorkie @wordcount, we would like to allow users to develop user-interface plugins to extend the abilities of Pipboard. This issue discusses the specification for UI plugins.

Built-in Pipboard

Pipcook will provide a built-in web launcher, named Pipboard as discussed in #29. By default, Pipboard provides access to build pipelines and check logs and models. The basic structure is shown below:

Customized UI Plugin

A customized user-interface plugin still belongs to the pipcook plugin ecosystem (refer to #17). Currently plugins are categorized into:

data collect
data access
data process
model load
model train
model evaluate
model deploy

For UI plugins, we will add an 8th category:

user interface

Once a UI plugin is in use, Pipboard incorporates the plugin's user interface into itself. For example, if a plugin called 'customize-demo' is developed, Pipboard will look as shown below, and the content area will show the contents of this plugin.

UI Plugin Developer Guide

The specification for a UI plugin is as follows:

MUST be an NPM package
MUST have the following structure:
- build (front-end code built with a packaging tool)
- src [optional] (your source code for the user interface)
- config.json (config file; configures the plugin name shown in the tab)
- package.json
- index.js
SHOULD have a README to help others understand it
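
The required structure lends itself to a simple check. The following is a minimal sketch, assuming a caller has already collected the top-level entry names of a plugin package; the `missingEntries` helper is hypothetical and is not an existing Pipcook or Pipboard API.

```typescript
// Required top-level entries from the list above.
// "src" is optional and therefore not included.
const REQUIRED_ENTRIES = ["build", "config.json", "package.json", "index.js"];

// Hypothetical helper: given the top-level names found in a package,
// report which required entries are missing.
function missingEntries(found: string[]): string[] {
  return REQUIRED_ENTRIES.filter((name) => found.indexOf(name) === -1);
}

// A package that forgot its config.json.
console.log(missingEntries(["build", "src", "package.json", "index.js"]));
```

Taking the entry names as a plain array keeps the check independent of how the package is read, for example from disk or from a tarball listing.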

In case the UI needs to do some basic operations on the local file system, we will provide several basic APIs for this:

API to get info about local models
API to get info about local training history logs
API to read a file from the local file system
API to write a file to the local file system

How to use UI Plugin

The user can follow these steps to use a specific plugin:

install the UI plugin npm package into the workspace
use the command-line interface:

pipcook board --plugin=<npm package name>

I will provide an example plugin later on.

Please help review this proposal @yorkie @wordcount

pipboard: integrate facets for sample visualization

https://pair-code.github.io/facets/ is for checking common problems in data and samples, so we could have a facets extension for sample visualization.

build: add a new action for validating the user scenarios

We have received some feedback from people using the CLI to initialize plugin development, so I think we need to add test cases for these specific user scenarios in addition to unit tests. I have compiled a list; if you have more to add, please comment:

Should pipcook look like a community product?

I mean that pipcook contains many modules (packages) in its path, like app/cli/core and board client/server. Don't you think that's confusing? It makes pipcook look like an aggregate of many libraries.
Here is my opinion:
For example, pipcook-board-server, the server for the board, contains only a few APIs but is built on egg. As everyone knows, egg is an awesome framework for back-end engineering, but using it here is overkill. We could streamline the board server by using a lightweight framework.
In short, I propose that we simplify the pipcook structure by using a lightweight framework, removing redundant files, planning a development mode for plugins, etc. :)

meta: discuss how to use python for plugin author

We'll have a built-in Python integration called BOA. It's a Node.js binding, so it doesn't change anything about how Python packages are installed.

So I propose that we add some commands for managing Python packages, and append the corresponding PYTHONPATH when running Pipelines.
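
Appending to PYTHONPATH could look like the following. This is a minimal sketch under stated assumptions: the `appendPythonPath` helper and the sample directories are hypothetical, and the delimiter is passed in explicitly (":" on POSIX systems, ";" on Windows) rather than read from the platform.

```typescript
// Hypothetical helper: append a directory to a PYTHONPATH-style value.
// Returns the value unchanged when the directory is already listed.
function appendPythonPath(
  current: string | undefined,
  dir: string,
  delimiter: string = ":"
): string {
  if (!current) {
    return dir;
  }
  const parts = current.split(delimiter);
  return parts.indexOf(dir) === -1 ? parts.concat(dir).join(delimiter) : current;
}

// Made-up directories for illustration.
console.log(appendPythonPath(undefined, "/project/site-packages"));
console.log(appendPythonPath("/usr/lib/python3", "/project/site-packages"));
```

Skipping directories that are already present avoids growing the variable each time a pipeline is run.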

Does pipcook board need to provide a Chinese version?

The pipcook docs come in two versions, English and 中文. Does the board need both as well?

Errors occur whether the CLI is installed locally or via Docker

Following the example, when running pipcook init in the project directory:

# pipcook init
internal/streams/legacy.js:59
      throw er; // Unhandled stream error in pipe.
      ^

Error: connect ECONNREFUSED 151.101.228.133:443
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1129:14) {
  errno: 'ECONNREFUSED',
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '151.101.228.133',
  port: 443
}

However, the pip-project files are indeed generated successfully.

Then, when running node examples/pipeline-mnist-image-classfication.js:

internal/modules/cjs/loader.js:800
    throw err;
    ^

Error: Cannot find module '@pipcook/pipcook-core'
Require stack:
- /document/pipcook-project/examples/pipeline-mnist-image-classfication.js
    at Function.Module._resolveFilename (internal/modules/cjs/loader.js:797:15)
    at Function.Module._load (internal/modules/cjs/loader.js:690:27)
    at Module.require (internal/modules/cjs/loader.js:852:19)
    at require (internal/modules/cjs/helpers.js:74:18)
    at Object.<anonymous> (/document/pipcook-project/examples/pipeline-mnist-image-classfication.js:23:86)
    at Module._compile (internal/modules/cjs/loader.js:959:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:995:10)
    at Module.load (internal/modules/cjs/loader.js:815:32)
    at Function.Module._load (internal/modules/cjs/loader.js:727:14)
    at Function.Module.runMain (internal/modules/cjs/loader.js:1047:10) {
  code: 'MODULE_NOT_FOUND',
  requireStack: [
    '/document/pipcook-project/examples/pipeline-mnist-image-classfication.js'
  ]
}

Boa: magic functions are not working as expected

This bug is likely related to #61.
Calling a magic function directly now works fine, but calling it through a system built-in function gives errors. For example:

const boa = require('../packages/boa');
const torch = boa.import('torch');
const {len} = boa.builtins();

class customDataset extends torch.utils.data.Dataset {
  __len__() {
    return 5;
  }
  __getitem__(index) {
    return 2;
  }
}

const dataset =  new customDataset();
console.log(dataset.__len__());  // This is working fine
console.log(len(dataset));  // This gives errors

By comparison, the equivalent code works in raw Python:

import sys
import torch

class customDataset( torch.utils.data.Dataset ):
  def __len__(self): 
    return 5
  
  def __getitem__(self, index): 
    return 2
  

dataset = customDataset()

print(dataset.__len__())
print(len(dataset))

Checking model accuracy for every merge.

'Python.h' file not found.

Problem

With boa version 0.5.2, running 'npm install @pipcook/boa' or 'pipcook init' throws the error "'Python.h' file not found":

Solved

This version of boa (0.5.2) depends on Python and Homebrew, so you need to:
1. check that 'brew' is installed on your Mac
2. check that 'Python3' is installed on your Mac by running

$ brew --prefix python\@3

If this shows that Python3 is installed, then check:

$ `brew --prefix python\@3`/Frameworks/Python.framework/Versions/3.7/include/python3.7m

If it does not print useful information, you need to install Python3 with

$ brew install python@3

The correct case:

meta: proposal of Pipboard functional documentation

As discussed with @wordcount @utkobe, we are renaming the Web Launcher to Pipboard, which is more meaningful. We also clarified more details at this meeting.

Software architecture: Client/Server
Tech stacks: client(alibaba/ice), server(midwayjs/midway)

Pipboard is used for GUI pipeline management, plugin configuration and model visualization. We decided to extend Pipboard from pipcook-cli's board sub-command, which is able to view the training progress.

In addition, to extend data-collect plugins in the Web way, we add a plugin property, "web", which indicates that the plugin is a Web plugin. A Web plugin can only be inserted into a pipeline in Pipboard. This type of plugin is actually a Web project that must contain an index.html as its entry, which Pipboard will open in a new sandboxed <window>.

meta: define the spec of Pipcook Daemon

We introduced Pipcook Daemon in #30; now let's define it in detail.

pipeline: missing modelId and modelPath for ModelDefine plugin

current execution component: modelDefine
{
  pipelineId: 'e8e2cbf0-7e12-11ea-9cb2-8be7a2c34799',
  modelDir: '/Users/yorkie/workspace/pipcook/pipcook-output/e8e2cbf0-7e12-11ea-9cb2-8be7a2c34799/model',
  dataDir: '/Users/yorkie/workspace/pipcook/pipcook-output/e8e2cbf0-7e12-11ea-9cb2-8be7a2c34799/data'
}
Component modelDefine error:
TypeError [ERR_INVALID_ARG_TYPE]: The "path" argument must be of type string. Received undefined

boa: API coverage report about specific python version

See https://github.com/python/typeshed; we could check compatibility with the Python standard library using the pyi files in this repository.

Error: ENOENT: no such file or directory, open '/Users/xxx/pipcook-examp'

After cloning the repository and starting it as described in the README, running node examples/pipeline-databinding-image-classification.js keeps failing with this error:

Error: ENOENT: no such file or directory, open '/Users/xx/xx/xx/pipcook-examp'

However, the file /Users/xx/xx/xx/pipcook-example/pipcook-project/examples/pipeline-databinding-image-classification.js does exist in that directory, and I don't know why this happens.

windows, doc: checkout fails because of an invalid path

I tried to add a Windows build in #22, but it fails with the error below:

git checkout --progress --force de57bf3f3a255524863363ff1a8f2132b3541031
error: invalid path 'docs/doc/How to develop a plugin?-en.md'
Removed matchers: 'checkout-git'
##[error]Git checkout failed with exit code: 128
##[error]Exit code 1 returned from process: file name 'c:\runners\2.165.2\bin\Runner.PluginHost.exe', arguments 'action "GitHub.Runner.Plugins.Repository.v1_0.CheckoutTask, Runner.Plugins"'.

build: add a new workflow for release

I propose a new way to release Pipcook:

create a new release via https://github.com/alibaba/pipcook/releases/new
add a workflow that listens for the release event, bumps versions, and adds assets to the drafted release.

See https://github.com/actions/create-release for more details about how to release with GitHub Actions.

core: use TypeScript Record<K,T> instead of custom object.

Inspired by #140 (comment); see https://www.typescriptlang.org/docs/handbook/utility-types.html#recordkt for Record<K, T>.
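
As an illustration of the suggestion, here is a minimal sketch of Record<K, T> with both a string key and a union key. The `PluginType` union, the `counts` value and the `total` helper are hypothetical names chosen for this example and are not taken from the Pipcook code base.

```typescript
// Hypothetical union of keys, used only for this example.
type PluginType = "dataCollect" | "dataAccess" | "modelTrain";

// With a union key, every listed key is required and no other key is allowed.
const counts: Record<PluginType, number> = {
  dataCollect: 1,
  dataAccess: 2,
  modelTrain: 0,
};

// Record<string, number> replaces a hand-written `{ [key: string]: number }` type.
function total(values: Record<string, number>): number {
  return Object.keys(values).reduce((sum, key) => sum + values[key], 0);
}

console.log(total(counts));
```

The union-keyed form is the stricter of the two: the compiler rejects a misspelled or missing key, which a custom index-signature object would accept silently.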

Help us to improve, thank you :)

core: remove type assertions where possible

Type assertions allow us to override TypeScript's inference. We should avoid using them to resolve type conflicts manually, which means we should be careful with the use of assertions.
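
One common alternative to an assertion is a user-defined type guard, which backs the narrowing with a runtime check. The following is a minimal sketch; the `ModelInfo` interface and both functions are hypothetical names for illustration and do not come from the Pipcook code base.

```typescript
// Hypothetical shape used only for this example.
interface ModelInfo {
  modelPath: string;
}

// Type guard: narrows `unknown` to ModelInfo only after a runtime check.
// The single cast inside is limited to reading one optional property.
function isModelInfo(value: unknown): value is ModelInfo {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { modelPath?: unknown }).modelPath === "string"
  );
}

function readModelPath(value: unknown): string {
  // `(value as ModelInfo).modelPath` would compile but skip every check.
  if (!isModelInfo(value)) {
    throw new TypeError("expected an object with a string modelPath");
  }
  return value.modelPath;
}

console.log(readModelPath({ modelPath: "./output/model" }));
```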

Refer to:

"Pipcook init" node-gyp rebuild error

When I run "pipcook init" with Node.js version 10, I get an error message:

I think maybe the Node.js version is too old and has a different API that pipcook does not support?

The tutorial content 'Get started with Pipeline API' does not exist.

hi, all:
I can't access this link: https://alibaba.github.io/pipcook/#/tutorials/get-started-with-pipeline-api
Is this page removed?

model: example pipeline for image classification's accuracy is low

The training logs:

Epoch 1 / 15
eta=0.0 =====================================================================================================>
195228ms 480858us/step - acc=0.163 loss=2.67 val_acc=0.169 val_loss=2.77
Epoch 2 / 15
eta=0.0 =====================================================================================================>
191213ms 470967us/step - acc=0.264 loss=2.44 val_acc=0.266 val_loss=2.91
Epoch 3 / 15
eta=0.0 =====================================================================================================>
200318ms 493394us/step - acc=0.255 loss=2.56 val_acc=0.233 val_loss=3.34
Epoch 4 / 15
eta=0.0 =====================================================================================================>
204292ms 503182us/step - acc=0.248 loss=2.88 val_acc=0.232 val_loss=3.88
Epoch 5 / 15
eta=0.0 =====================================================================================================>
203780ms 501921us/step - acc=0.246 loss=3.29 val_acc=0.232 val_loss=4.45
Epoch 6 / 15
eta=0.0 =====================================================================================================>
200810ms 494607us/step - acc=0.246 loss=3.69 val_acc=0.232 val_loss=4.89
Epoch 7 / 15
eta=0.0 =====================================================================================================>
198668ms 489329us/step - acc=0.246 loss=4.05 val_acc=0.232 val_loss=5.15
Epoch 8 / 15
eta=0.0 =====================================================================================================>
197658ms 486843us/step - acc=0.246 loss=4.31 val_acc=0.232 val_loss=5.48
Epoch 9 / 15
eta=0.0 =====================================================================================================>
198095ms 487918us/step - acc=0.246 loss=4.48 val_acc=0.232 val_loss=5.51
Epoch 10 / 15
eta=0.0 =====================================================================================================>
196179ms 483200us/step - acc=0.246 loss=4.58 val_acc=0.232 val_loss=5.66
Epoch 11 / 15
eta=0.0 =====================================================================================================>
188978ms 465462us/step - acc=0.246 loss=4.62 val_acc=0.232 val_loss=5.61
Epoch 12 / 15
eta=0.0 =====================================================================================================>
190650ms 469582us/step - acc=0.246 loss=4.62 val_acc=0.232 val_loss=5.64
Epoch 13 / 15
eta=0.0 =====================================================================================================>
195142ms 480645us/step - acc=0.246 loss=4.60 val_acc=0.232 val_loss=5.61
Epoch 14 / 15
eta=0.0 =====================================================================================================>
199023ms 490203us/step - acc=0.246 loss=4.57 val_acc=0.232 val_loss=5.55
Epoch 15 / 15
eta=0.0 =====================================================================================================>
195624ms 481832us/step - acc=0.246 loss=4.53 val_acc=0.232 val_loss=5.56
current execution component: modelEvaluate
evaluate result:  {
  loss: Float32Array(1) [ 6.380456924438477 ],
  accuracy: Float32Array(1) [ 0.12019230425357819 ]
}

To reproduce the problem, just run the pipeline.

meta: Should we add benchmarks and more standardized unit tests?

Boa: support of magic function overriding

Currently Boa does not support overriding Python's magic functions. This issue tracks the progress of that work.

For example:

const boa = require('../packages/boa');
const sys = boa.import('sys');
const torch = boa.import('torch');

class customDataset extends torch.utils.data.Dataset {
  __len__() {
    return 5;
  }
  __getitem__(index) {
    return 2;
  }
}

const dataset =  new customDataset();

console.log(dataset.__getitem__(2));

Currently this throws the following error:

Error: NotImplementedError:

meta: release lifecycle

The following are the questions associated with the release lifecycle and my answers:

Time-based or feature-based release?

Pipcook uses semver for release management, so a release is generally a major, minor, or patch release.

For major versions, time-based releases usually make sense only after the software and its ecosystem are relatively mature, so major releases will be feature-based. Each major version needs its features defined first, followed by the milestones required to complete them.

For minor versions, once the target of the major version is basically settled, the iteration period can be relatively fixed, so minor releases can be time-based. We can therefore release a minor version every 2 weeks and treat even minor numbers as stable.

Patch versions basically fix existing versions, so the answer depends on the situation. For the latest version (such as 0.7.x), we can decide whether to release based on whether there are daily bugfixes. For historical versions (such as 0.3.x, 0.4.x), we need to weigh the severity of the bug, its relevance to the project, and community feedback, and release manually.
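To make the proposal above concrete, here is a small sketch of how the "even minor number is stable" rule and the release kind could be computed. The helper names (parseVersion, isStable, releaseKind) are hypothetical, and the sketch only handles plain MAJOR.MINOR.PATCH strings without pre-release or build metadata.

```typescript
// Parsed semver triple; pre-release and build metadata are out of scope here.
interface Version {
  major: number;
  minor: number;
  patch: number;
}

type ReleaseKind = 'major' | 'minor' | 'patch';

// Parses a plain "MAJOR.MINOR.PATCH" string and rejects anything else.
function parseVersion(text: string): Version {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text);
  if (match === null) {
    throw new Error('not a plain semver string: ' + text);
  }
  return { major: Number(match[1]), minor: Number(match[2]), patch: Number(match[3]) };
}

// Per the proposal, an even minor number marks a stable line.
function isStable(version: Version): boolean {
  return version.minor % 2 === 0;
}

// Classifies a release by the most significant component that changed.
function releaseKind(previous: Version, next: Version): ReleaseKind {
  if (next.major !== previous.major) {
    return 'major';
  }
  if (next.minor !== previous.minor) {
    return 'minor';
  }
  return 'patch';
}
```

For instance, under this rule 0.8.0 would count as stable and 0.7.3 would not, and going from 0.7.0 to 0.8.0 is classified as a minor release.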

Provide an LTS (long-term support) version?

No. This needs to wait until the software matures. Currently there are not enough people to maintain one, but we can provide patches for a specific version range.

Release automatically

Major: release maintainer, minor: release maintainer, patch: ci + release maintainer

@utkobe @wordcount Feel free to add new questions and give your answers :)

meta: define Plugin Runtime

The ML low-level API is the basic layer that provides the basic ML power for plugin developers.

Great, I've been waiting for this!

core: remove `any` types where possible

Using `any` is not recommended in TypeScript. Feel free to open a PR to help us remove one of them :)
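Two typical replacements for `any` are a generic type parameter (when the type just passes through) and `unknown` (when the value must be inspected first). The functions below are hypothetical illustrations, not code from the Pipcook source.

```typescript
// A generic keeps the element type instead of erasing it with `any`.
function first<T>(items: readonly T[]): T | undefined {
  return items[0];
}

// `unknown` forces a runtime check before the value can be used,
// whereas `any` would let every property access through unchecked.
function describeValue(value: unknown): string {
  if (typeof value === 'string') {
    return 'string of length ' + value.length;
  }
  if (typeof value === 'number') {
    return 'number ' + value.toFixed(2);
  }
  return 'unsupported value';
}
```

With these signatures, `first([3, 1, 2])` is typed as `number | undefined`, so callers have to handle the empty case explicitly.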

boa: cli to generate the typings for current python env

The difficulty in writing a Boa program is discovering the Python ecosystem; typings might help with this.

Also, the .pyi stub files in https://github.com/python/typeshed could be used to generate the typings.

plugin: help us to rewrite python-based files in boa

We still have some Python source code even though Boa is integrated. Let's rewrite it:

core: use ES6 modules instead of calls to "require"

build: add a workflow to build daily image(docker)

Currently, our Docker installation requires users to build the Docker image themselves, which is not easy enough for this use case. A new workflow would build a daily image and push it to hubs, so that users can pull the image and run it directly :)

cli: rewrite in TypeScript

Pipcook uses TypeScript for its core, built-in plugins, and tools. To keep the source code consistent, it would be great to use TypeScript for our CLI, too.

If you are familiar with TypeScript, feel free to help us with this :)

meta: v1.0 roadmap

The following is what we want Pipcook v1 to look like:

We will introduce a new, unified layer for end users called "Pipcook ML App". Its goal is to simplify developing machine learning applications; for more details, see #33.

Pipcook is now meant to be a runtime for building ML applications rather than a Node.js library. We will therefore turn the original pipcook-core into a daemon process that operates plugins and manages pipelines via a new declarative language, Pipeline IR.

To make sure every plugin can run more safely and with more capabilities, Pipcook v1 will also add the Plugin Runtime, which should be a standalone process or thread (via configuration). It provides the following for plugin developers:

ML API: the low-level machine learning APIs.
Built-in TensorFlow.js library.
Python APIs via Node.js, which let you call Python functions from JavaScript seamlessly.

Okay, let's see what the detailed v1 milestone tasks will be:

Some functions could be reused between plugins

Some functions could be reused between plugins, e.g.:

function MakeWordsSet(words_file: string): Promise<Set<string>> {
...
}

This 'MakeWordsSet' function is the same in the model-define and model-evaluate plugins, so it could be reused.
Could we build a mechanism for reusing the same function between plugins?

numpy
matplotlib
opencv
pandas
scipy
scikit-learn
tensorflow
pytorch
Pillow
nltk
jieba

These eleven libraries cover image processing, math, machine learning, and deep learning.

Please take a look, @yorkie.