A JavaScript application framework for machine learning and its engineering.
| Build Types | Status |
|---|---|
| tests | |
| documentation | |
| docker | |
With the mission of enabling JavaScript engineers to use the power of machine learning without any prerequisites, and the vision of leading the front-end field toward intelligence, Pipcook aims to become the JavaScript application framework for the intersection of machine learning and front-end interaction.
We design Pipcook's APIs for front-end and machine learning applications, focusing on the front-end area and working from the JavaScript engineer's point of view. With the principle of being friendly to JavaScript, we will push the whole area forward with machine learning engineering. For this reason we opened an issue about machine-learning application APIs, and we look forward to your involvement.
The project provides subprojects including a machine learning pipeline framework, management tools, and a JavaScript runtime for machine learning. These can also be used as building blocks in conjunction with other projects.
Pipcook is an open-source project guided by strong principles, aiming to be modular and flexible on user experience. It is open to the community to help set its direction.
- Modular: the project includes a number of subprojects with well-defined functions and APIs that work together.
- Swappable: the project includes enough modules to build what Pipcook has done, but its modular architecture ensures that most of the modules can be swapped for different implementations.
Pipcook is intended for Web engineers looking to:
- learn what machine learning is.
- train their models and serve them.
- optimize their own models for better evaluation results, such as higher accuracy for image classification.
If any of the above applies to you, just try it via the installation guide.
Pipcook Pipeline
It's used to represent ML pipelines consisting of Pipcook scripts. This layer ensures the stability and scalability of the whole system and uses a plug-in mechanism to support rich functions including dataset, training, validations, and deployment.
A Pipcook Pipeline is generally composed of lots of scripts. Through different scripts and configurations, the final output to us is an NPM package, which contains the trained model and JavaScript functions that can be used directly.
Note: In Pipcook, each pipeline has only one role, which is to output the trained model you need. That is to say, the last stage of each pipeline must output the trained model; otherwise, the pipeline is invalid.
Pipcook Bridge to Python
For JavaScript engineers, the most difficult part is the lack of a mature machine learning toolset in the ecosystem. Pipcook includes a module called [Boa](https://github.com/imgcook/boa), which provides access to Python packages by bridging the interface of CPython using N-API.
With it, developers can use packages such as numpy, scikit-learn, jieba, tensorflow, or any other package from the Python ecosystem in the Node.js runtime through JavaScript.
Prepare the following on your machine:
| Installer | Version Range |
|---|---|
| Node.js | >= 12.17 or >= 14.0.0 |
| npm | >= 6.14.4 |
Install the command-line tool for managing Pipcook projects:
$ npm install -g @pipcook/cli
Then train from any one of those pipelines; we take image classification as an example:
$ pipcook train https://cdn.jsdelivr.net/gh/alibaba/pipcook@main/example/pipelines/image-classification-mobilenet.json -o ./output
The dataset specified by the pipeline includes images in 2 categories: avatar and blurBackground. After training, we can predict the category of an image:
$ pipcook predict ./output/image-classification-mobilenet.json -s ./output/data/validation/blurBackground/71197_223__30.7_36.jpg
✔ Origin result:[{"id":1,"category":"blurBackground","score":0.9998120665550232}]
The input is a blurBackground image from the validation dataset, and the model determines that its category is blurBackground.
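The prediction result is printed as a JSON array of candidates. Here is a minimal sketch of reading such a result in TypeScript, assuming the array shape shown in the sample output above. The `Prediction` interface and the `topCategory` helper are illustrative names, not part of the Pipcook API.

```typescript
// Shape inferred from the sample output above; illustrative, not an official Pipcook type.
interface Prediction {
  id: number;
  category: string;
  score: number;
}

// Returns the highest-scoring prediction, or undefined for an empty result.
function topCategory(predictions: Prediction[]): Prediction | undefined {
  return predictions.reduce<Prediction | undefined>(
    (current, p) => (current === undefined || p.score > current.score ? p : current),
    undefined,
  );
}

// The sample line printed by `pipcook predict` above.
const rawResult = '[{"id":1,"category":"blurBackground","score":0.9998120665550232}]';
const bestPrediction = topCategory(JSON.parse(rawResult) as Prediction[]);
```

With the sample above, `bestPrediction` holds the single `blurBackground` entry.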
Want to deploy it?
$ pipcook serve ./output
ℹ preparing framework
ℹ preparing scripts
ℹ preparing artifact plugins
ℹ initializing framework packages
Pipcook has served at: http://localhost:9091
Then you can open the browser and try your image classification server.
If you are wondering what you can do in Pipcook and where you can check your training logs and models, you could start from Pipboard:
open https://pipboard.imgcook.com
You will see a web page open in your browser. There is an MNIST showcase on the home page that you can play around with.
If you want to train a model to recognize MNIST handwritten digits by yourself, you could try the examples below.
| Name | Description | Open in Colab |
|---|---|---|
| mnist-image-classification | pipeline for the MNIST image classification problem. | N/A |
| databinding-image-classification | pipeline example that trains an image classification task to classify imgcook databinding pictures. | |
| object-detection | pipeline example that trains an object detection task for the component recognition used by imgcook. | |
| text-bayes-classification | pipeline example that trains a text classification task with Bayes. | N/A |
See here for the complete list; these examples are easy and quick to run. For example, to do an MNIST image classification, just run the following to start the pipeline:
$ pipcook run https://cdn.jsdelivr.net/gh/alibaba/pipcook@main/example/pipelines/image-classification-mobilenet.json -o output
After the above pipeline completes, you have a trained model in the output/model directory under the current directory; it is a tensorflow.js model.
Clone this repository:
$ git clone git@github.com:alibaba/pipcook.git
Install dependencies, e.g. via npm:
$ npm install
Then build the project:
$ npm run build
- Developer Documentation: English | 中文
- Project Guide
Or search for the group by its number: 30624012.
Download DingTalk (an all-in-one free communication and collaboration platform) here: English | 中文
pipcook's Issues
meta: how to work with tensorboard
TensorBoard is TensorFlow's visualization toolkit; it supports the visualizations described at https://www.tensorflow.org/tensorboard. The easiest way to use this toolkit with a model is to pass tf.keras.callbacks.TensorBoard in the callbacks of model.fit; the other ways are under tf.summary.
There are two ways to use TensorBoard in Pipcook:
- use python-node to use the above Python methods
- tfjs-node adds the TensorBoard at: tensorflow/tfjs-node#202
Which way do you think is better? @utkobe @wordcount
Some functions could be reused between plugins
Some functions could be reused between plugins, e.g.:
function MakeWordsSet(words_file: string): Promise<Set<string>> {
...
}
The 'MakeWordsSet' function is the same in the model-define and model-evaluate plugins, so it can be reused.
So could we build a mechanism for reusing the same function across plugins?
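One possible shape for such a mechanism is a small shared utility module that both plugins import. The sketch below is an assumption for illustration only: `parseWordsSet`, `makeWordsSet` and the injected `readText` loader are hypothetical names, and the file format (one word per line) is assumed rather than taken from the plugins.

```typescript
// Hypothetical shared module that both model-define and model-evaluate could import.
type ReadText = (path: string) => Promise<string>;

// Pure parsing step: one word per line (assumed format), blank lines ignored.
function parseWordsSet(text: string): Set<string> {
  const words = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return new Set(words);
}

// Same idea as MakeWordsSet above, with file loading injected so it is easy to test.
async function makeWordsSet(wordsFile: string, readText: ReadText): Promise<Set<string>> {
  return parseWordsSet(await readText(wordsFile));
}
```

Keeping the parsing step pure means each plugin only has to supply its own way of reading the file.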
build: add a new workflow for release
I propose a new way to release Pipcook:
- create a new release via https://github.com/alibaba/pipcook/releases/new
- add a workflow that listens for the release event, bumps the version, and adds assets to the drafted release.
See https://github.com/actions/create-release for more details about how to release with GitHub Actions.
meta: lint.
Use a linter to standardize the code.
windows, doc: the invalid path fail to checkout
I tried to add a Windows build at #22, but it fails with the error below:
git checkout --progress --force de57bf3f3a255524863363ff1a8f2132b3541031
error: invalid path 'docs/doc/How to develop a plugin?-en.md'
Removed matchers: 'checkout-git'
##[error]Git checkout failed with exit code: 128
##[error]Exit code 1 returned from process: file name 'c:\runners\2.165.2\bin\Runner.PluginHost.exe', arguments 'action "GitHub.Runner.Plugins.Repository.v1_0.CheckoutTask, Runner.Plugins"'.
core: remove `any` types as possible
The `any` type is not recommended in TypeScript; feel free to open a PR to help us remove one or more of them :)
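As an illustration of what such a change can look like, here is a minimal sketch that replaces an `any` parameter with `unknown` plus explicit narrowing. The `readEpochs` function and the `epochs` field are hypothetical examples, not code from the Pipcook repository.

```typescript
// Before (type checking is silently disabled):
//   function readEpochs(config: any): number { return config.epochs; }

// Type guard that narrows an unknown value to a plain key/value object.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// After: accept `unknown` and verify the shape before using it.
function readEpochs(config: unknown): number {
  if (isRecord(config)) {
    const epochs = config.epochs;
    if (typeof epochs === "number") {
      return epochs;
    }
  }
  throw new TypeError("config.epochs must be a number");
}
```

The compiler now forces every caller path through the check, so a malformed config fails loudly instead of propagating `undefined`.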
boa: add more function tests for Python Standard API
I can't init the project! Who can help me
I use archlinux, nvm, node 13.13.0, python 3.8.3
~/CODE/pipcook-example » pipcook init han@archlinux
? which client do you want to use? npm
⠋ installing pipcook[..................] / rollbackFailedOptional: verb npm-sess
> @pipcook/[email protected] install /home/han/CODE/pipcook-example/node_modules/@pipcook/boa
> node-gyp rebuild
gyp WARN install got an error, rolling back install
gyp ERR! configure error
gyp ERR! stack Error: getaddrinfo EAI_AGAIN nodejs.org
gyp ERR! stack at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:66:26)
gyp ERR! System Linux 5.6.5-arch3-1
gyp ERR! command "/home/han/.nvm/versions/node/v13.13.0/bin/node" "/home/han/.nvm/versions/node/v13.13.0/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
gyp ERR! cwd /home/han/CODE/pipcook-example/node_modules/@pipcook/boa
gyp ERR! node -v v13.13.0
gyp ERR! node-gyp -v v5.1.0
gyp ERR! not ok
npm WARN [email protected] No description
npm WARN [email protected] No repository field.
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! @pipcook/[email protected] install: `node-gyp rebuild`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the @pipcook/[email protected] install script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
npm ERR! A complete log of this run can be found in:
npm ERR! /home/han/.npm/_logs/2020-04-21T02_58_25_001Z-debug.log
✖ install Error: Command failed: npm install @pipcook/boa --save error
build: add a workflow to build daily image(docker)
Currently, users have to build the Docker image themselves, which is not easy enough for this use case. A new workflow should build a daily image and push it to image hubs, so that users can pull the image and run it directly :)
core: remove type assertion as possible
Type assertions allow us to override TypeScript's inference. We should avoid using them to resolve type conflicts manually, which means we should be careful whenever we use an assertion.
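One common way to drop an assertion is to let a discriminant field narrow a union type. The sketch below uses made-up `ImageSample` and `TextSample` types purely for illustration; they are not taken from the Pipcook code base.

```typescript
// Illustrative types only.
interface ImageSample {
  kind: "image";
  width: number;
  height: number;
}
interface TextSample {
  kind: "text";
  content: string;
}
type Sample = ImageSample | TextSample;

// Avoid: (sample as ImageSample).width compiles even when sample is a TextSample.
// Prefer: let the `kind` discriminant narrow the union in each branch.
function describeSample(sample: Sample): string {
  switch (sample.kind) {
    case "image":
      return `image ${sample.width}x${sample.height}`;
    case "text":
      return `text of ${sample.content.length} chars`;
  }
}
```

If a new member is later added to `Sample`, the compiler reports the missing case instead of letting an assertion hide it.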
Refer to:
meta: discuss how to use python for plugin author
We'll have a builtin Python integration called BOA. It is a Node.js binding, so it does not change how Python packages are installed.
So I propose that we add some commands for managing Python packages, and append the corresponding PYTHONPATH when running pipelines.
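The PYTHONPATH part of this proposal could be sketched as a small pure function that extends an environment map before a pipeline is started. The helper name `withPythonPath` and the package directory are hypothetical; the delimiter defaults to ":" (POSIX) and would be ";" on Windows.

```typescript
// Hypothetical helper: returns a copy of `env` with `packageDir` appended to PYTHONPATH.
// `delimiter` is ":" on POSIX and ";" on Windows (Node.js exposes it as path.delimiter).
function withPythonPath(
  env: Record<string, string | undefined>,
  packageDir: string,
  delimiter = ":",
): Record<string, string | undefined> {
  const current = env.PYTHONPATH;
  const next = current ? `${current}${delimiter}${packageDir}` : packageDir;
  return { ...env, PYTHONPATH: next };
}
```

The returned map can then be passed as the environment of whatever process runs the pipeline, leaving the caller's own environment untouched.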
boa: fail to install @pipcook/[email protected] independently
I want to use @pipcook/[email protected] on its own, but the install fails with 'Failed at the @pipcook/[email protected] preinstall script.'
node v12.4.0
npm 6.9.0
python 3.7
macOS
'Python.h' file not found.
Problem
With boa version 0.5.2, running 'npm install @pipcook/boa' or 'pipcook init' throws an error: "'Python.h' file not found".
Solved
This version of boa (0.5.2) depends on Python and Homebrew, so you need to:
1. Check that 'brew' is installed on your Mac.
2. Check that 'Python3' is installed on your Mac by running:
$ brew --prefix python\@3
If this prints a path, Python3 is installed. Then check:
$ `brew --prefix python\@3`/Frameworks/Python.framework/Versions/3.7/include/python3.7m
If it does not print any useful information, you need to install Python3 with:
$ brew install python@3
pipeline: missing modelId and modelPath for ModelDefine plugin
current execution component: modelDefine
{
pipelineId: 'e8e2cbf0-7e12-11ea-9cb2-8be7a2c34799',
modelDir: '/Users/yorkie/workspace/pipcook/pipcook-output/e8e2cbf0-7e12-11ea-9cb2-8be7a2c34799/model',
dataDir: '/Users/yorkie/workspace/pipcook/pipcook-output/e8e2cbf0-7e12-11ea-9cb2-8be7a2c34799/data'
}
Component modelDefine error:
TypeError [ERR_INVALID_ARG_TYPE]: The "path" argument must be of type string. Received undefined
meta: release lifecycle
The following are the questions associated with the release lifecycle and my answers:
Time-based or feature-based release?
Pipcook uses semver for release management, so the release generally refers to Major, Minor, and Patch.
For major versions, a time-based release usually only makes sense once the software and ecosystem are relatively mature, so major versions will be feature-based. Each major version needs to define its features first, and then the milestones to complete them.
For minor versions, once the target of the major version is basically determined, the iteration period can be relatively fixed, so they can be time-based. Therefore, we can define a minor release cycle as 2 weeks, with even minor numbers as stable.
For patch versions, which basically fix existing versions, different situations need to be discussed separately. For the latest version (such as 0.7.x), we can decide whether to release based on whether there are daily bugfixes. For historical versions (such as 0.3.x, 0.4.x), we need to weigh the severity of the bug, its relevance to the project, and community feedback, and release manually.
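The "even minor number means stable" rule can be expressed in a few lines. This is only a sketch of the proposal as worded above; `parseVersion` and `isStableMinor` are illustrative names, and pre-release or build suffixes are deliberately not handled.

```typescript
// Parses "major.minor.patch"; pre-release and build suffixes are out of scope for this sketch.
function parseVersion(version: string): [number, number, number] {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (match === null) {
    throw new Error(`not a plain semver version: ${version}`);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Under the proposal above, even minor numbers denote stable releases.
function isStableMinor(version: string): boolean {
  const [, minor] = parseVersion(version);
  return minor % 2 === 0;
}
```

Under this rule 0.8.x would be a stable line and 0.7.x would not.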
Provide an LTS (long-term support) version?
No. We need to wait until the software matures; currently there are not enough people to maintain one, but we can provide patches for a specific version range.
Release automatically
Major: release maintainer, minor: release maintainer, patch: ci + release maintainer
@utkobe @wordcount Feel free to add new questions and give your answers :)
boa: make benchmarks for boa(js) and cpython
The benchmarks are at https://github.com/python/pyperformance/tree/master/pyperformance/benchmarks.
The tutorial content 'Get started with Pipeline API' does not exist.
hi, all:
I can't access this link: https://alibaba.github.io/pipcook/#/tutorials/get-started-with-pipeline-api
Is this page removed?
meta: v1.0 roadmap
The following is exactly what we want Pipcook v1 to look like:
We will introduce a new, unified layer for end users called "Pipcook ML App". Its goal is to simplify developing machine learning applications; for more details, see #33.
Pipcook is now meant to be a runtime for building ML applications rather than a Node.js library. So we also turn the original pipcook-core into a daemon process, which operates plugins and manages pipelines via a new declarative language, Pipeline IR.
To make sure that every plugin can run more safely and powerfully, Pipcook v1 also adds the Plugin Runtime, which should be a standalone process/thread (via configuration) and provides the following for plugin developers:
- ML API: it's the low-level machine learning APIs.
- Builtin Tensorflow.js library.
- Python APIs via Node.js lets you call other Python functions in JavaScript seamlessly.
Okay, let's see what the detailed v1 milestone tasks will be:
- CI/CD workflow.
  - pull request workflow.
  - GitHub Actions.
  - unit testing framework.
- Solid documentation.
  - refactoring documentation build scripts #39.
  - tutorials include: ML App, Pipeline and Plugin.
  - typedoc includes: ML App APIs and Plugin APIs.
- Experiment ML App
  - definition of APIs and usage #33.
  - pipelines generator for training/serving.
  - Pipboard supports running ML App.
  - Pipboard supports preparing the dataset/samples.
- Stable Pipeline.
  - decoupling the plugins at different stage.
  - using a static DSL to represent a pipeline, and it's also the IR for the high-level ML App layer.
  - refactoring for Pipcook daemon architecture for deploying online service.
  - HTTP APIs.
  - using HTTP APIs at CLI.
  - preparing the v1 builtin plugins list and complete them.
- Stable Plugin APIs, how to validate that a plugin works.
  - plugin specification.
  - plugin runtime.
  - provide the ml-base APIs.
  - provide the bridge(boa) to Python's library.
- Pipboard: a Web application for using Pipcook:
  - Pipboard Extension.
  - interactive GUI to CRUD pipelines.
  - plugin discovery and configuration.
  - ability to run pipeline.
  - ability to run ML App.
core: use ES6 modules instead of calls to "require"
Checking model accuracy for every merge.
Does pipcook board need to provide a Chinese version?
The pipcook docs come in two versions, English & 中文; does the board need the same?
meta: define the spec of Pipcook Daemon
We introduced the Pipcook Daemon at #30; now let's define it in detail.
- Define the declarative communication; we have the following alternatives:
  - HTTP
  - GRPC
  - Node.js IPC (needs to support RPC mode)
- Daemon SDK
  - Pipeline
    - CRUD, operations for pipeline
    - Run
    - Query
  - Plugin Management
    - Check, this checks whether a plugin can be installed.
    - Install, this installs via the given plugin URI.
    - Run, this runs the given plugin with an input.
    - Test, this tests the given plugin with an input and expectations.
    - Log, this returns a readable stream for reading logs.
- CLI
  - Current implementation to use Daemon SDK.
  - Support for connecting to Pipcook Daemon remotely.
  - Support for Plugin Dist, including Plugin APIs and Plugin Pack.
- Internal works
  - ML Metadata, for managing the Pipeline objects (https://github.com/google/ml-metadata).
    - Query Language:
      - SQL, the standard query language, supported by many products like mysql, psql, sqlite and flink.
      - KV-based, just like Redis and LevelDB, but more work to be done for applications.
    - Storage, we need 2 modes, "local" and "remote", for standalone and distributed architectures.
      - local: sqlite.
      - remote: psql.
  - Pipeline IR
    - IR Specification
      - Root fields
        - Property "id", to specify the pipeline object.
        - Property "plugins", to configure plugins.
      - Features
        - Easy to diff changes.
        - Easy to convert into a sequence for the interpreter.
        - Pipeline Recovery must be supported.
    - VM
      - Interpreter for the IR
      - OP Code transfer to Plugin Internal Call
  - Plugin Internals.
    - PluginRT Bootstrap
      - Seed Process for speeding up.
      - Lifecycle:
        - Create a runnable lock.
        - Initialize Boa Environment.
        - Install Python Dependencies.
        - Install PluginRT supporting files.
        - Call an init event to PluginRT.
        - Call plugin with a given input.
        - Release the runnable lock.
    - Plugin Internal Calls
      - 0x30 start, this starts a new PluginRT process.
      - 0x31 read, this reads an event from PluginRT.
      - 0x32 write, this writes an event to PluginRT.
      - 0x100 compile, reserved for future AOT of PluginRT.
    - Scheduler, this is for scheduling jobs for self-managed plugins and components.
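To make the spec above more concrete, here is a minimal TypeScript sketch of three pieces of it: the Plugin Internal Call op codes, the two root fields of the Pipeline IR, and the PluginRT lifecycle order. The op code values and step order come from the lists above; the type names, the value type of `plugins`, and the `runLifecycle` helper are assumptions made for illustration.

```typescript
// Op codes copied from the "Plugin Internal Calls" list above.
enum PluginInternalCall {
  Start = 0x30, // starts a new PluginRT process
  Read = 0x31, // reads an event from PluginRT
  Write = 0x32, // writes an event to PluginRT
  Compile = 0x100, // reserved for future AOT of PluginRT
}

// Root fields of the Pipeline IR; the value type of `plugins` is an assumption.
interface PipelineIR {
  id: string;
  plugins: Record<string, unknown>;
}

// Lifecycle steps between creating and releasing the runnable lock, in the listed order.
const LIFECYCLE_STEPS = [
  "initialize Boa environment",
  "install Python dependencies",
  "install PluginRT supporting files",
  "send init event to PluginRT",
  "call plugin with the given input",
] as const;

// Runs every step between acquiring and releasing a (simulated) runnable lock.
// `log` records what happened; the release entry is appended even if a step throws.
function runLifecycle(step: (name: string) => void, log: string[] = []): string[] {
  log.push("create runnable lock");
  try {
    for (const name of LIFECYCLE_STEPS) {
      step(name);
      log.push(name);
    }
  } finally {
    log.push("release runnable lock");
  }
  return log;
}
```

Passing the log in from the caller makes it possible to inspect which steps ran when one of them fails.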
meta: proposals for customized UI plugin for Pipboard
As discussed with @yorkie @wordcount, we would allow users to develop user-interface plugins to extend the abilities of Pipboard. This issue discusses the specification for UI plugins.
Built-in Pipboard
Pipcook will provide a built-in web launcher named Pipboard, as discussed in #29. By default, Pipboard will provide access to building pipelines and checking logs and models. The basic structure is shown below:
Customized UI Plugin
Customized user-interface plugin still belongs to pipcook plugin ecosystem (refer to #17 ). Currently plugins are categorized into:
- data collect
- data access
- data process
- model load
- model train
- model evaluate
- model deploy
Now for UI plugin, we will add the 8th category:
- user interface
After a UI plugin is installed, Pipboard will incorporate the plugin's user interface into itself. For example, if a plugin called 'customize-demo' is developed, Pipboard will look as shown below, and the content area will show the contents of this plugin.
UI Plugin Developer Guide
The specification for UI plugin is as follows:
- MUST be an NPM package
- MUST have the structure as follows:
  - build (front-end code after building with a packaging tool)
  - src [optional] (your source code for the user interface)
  - config.json (config file; configures the plugin name to be shown in the tab)
  - package.json
  - index.js
- SHOULD have a readme for us to understand
In case the UI needs to do some basic operations on the local file system, we will provide several basic APIs:
- API to get info about local models
- API to get info about local training history logs
- API to read a file from the local system
- API to write a file to the local system
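The proposal does not define signatures for these APIs, so the following is only a sketch of one possible shape. Every name here (`BoardFileApi`, `LocalEntryInfo`, the method names, and the `models/` and `logs/` prefixes) is hypothetical, and the in-memory class exists only to show how a UI plugin could be developed against such an interface without touching the disk.

```typescript
// Hypothetical shape of the four helper APIs listed above; all names are illustrative.
interface LocalEntryInfo {
  id: string;
  path: string;
}

interface BoardFileApi {
  listModels(): Promise<LocalEntryInfo[]>;
  listTrainingLogs(): Promise<LocalEntryInfo[]>;
  readFile(path: string): Promise<string>;
  writeFile(path: string, content: string): Promise<void>;
}

// In-memory stand-in for local development and tests.
class InMemoryBoardFileApi implements BoardFileApi {
  readonly files = new Map<string, string>();

  async listModels(): Promise<LocalEntryInfo[]> {
    return this.entriesUnder("models/");
  }

  async listTrainingLogs(): Promise<LocalEntryInfo[]> {
    return this.entriesUnder("logs/");
  }

  async readFile(path: string): Promise<string> {
    const content = this.files.get(path);
    if (content === undefined) {
      throw new Error(`no such file: ${path}`);
    }
    return content;
  }

  async writeFile(path: string, content: string): Promise<void> {
    this.files.set(path, content);
  }

  // Lists stored paths that start with the given prefix.
  private entriesUnder(prefix: string): LocalEntryInfo[] {
    return Array.from(this.files.keys())
      .filter((p) => p.indexOf(prefix) === 0)
      .map((p) => ({ id: p.slice(prefix.length), path: p }));
  }
}
```

A real implementation would back the same interface with the file system of the machine running Pipboard.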
How to use UI Plugin
The user can follow these steps to use a specific plugin:
- install the UI plugin npm package into the working space
- use the command-line interface:
pipcook board --plugin=<npm package name>
I will provide an example plugin later on.
Please help review this proposal, @yorkie @wordcount.
boa: list of python libraries that should be built-in installed in boa
To let users use boa and the Python ecosystem easily, boa should install some libraries by default, so that users feel boa supports those functions out of the box. Currently only numpy is installed.
Suggestions:
- numpy
- matplotlib
- opencv
- pandas
- scipy
- scikit-learn
- tensorflow
- pytorch
- Pillow
- nltk
- jieba
These eleven libraries cover image processing, math, machine learning and deep learning.
Please review, @yorkie.
Should pipcook look like a community product?
I mean that pipcook contains many modules in its packages, like app/cli/core and board client/server. Don't you think that's confusing? It makes pipcook look like an aggregate of many libs.
OK, here is my opinion:
For example, pipcook-board-server, the server for the board, contains only a few APIs but is built on egg. As everyone knows, egg is an awesome framework for back-end engineering, but using it here is overkill; we could streamline the board server by using a lightweight framework.
In short, I propose that we simplify the pipcook structure by using a lightweight framework, removing redundant files, planning a development mode for plugins, etc. :)
core: use TypeScript Record<K,T> instead of custom object.
Inspired by #140 (comment); see https://www.typescriptlang.org/docs/handbook/utility-types.html#recordkt for Record<K, T>.
Help us to improve, thank you :)
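As a small illustration of the difference, the sketch below compares a custom index-signature object type with `Record<K, T>`. The stage names and the `totalTime` helper are made up for the example and are not taken from the Pipcook source.

```typescript
// Illustrative key set; not the actual plugin categories used in the code base.
type PipelineStage = "dataCollect" | "dataAccess" | "modelTrain";

// Custom object type with an index signature: any string key is accepted.
interface LooseStageTimes {
  [stage: string]: number;
}

// Record<K, T>: keys are restricted to PipelineStage and all of them are required.
type StageTimes = Record<PipelineStage, number>;

const looseTimes: LooseStageTimes = { dataColect: 120 }; // typo in the key goes unnoticed
const stageTimes: StageTimes = { dataCollect: 120, dataAccess: 45, modelTrain: 900 };

// Sums the time spent in every stage, in milliseconds.
function totalTime(t: StageTimes): number {
  return t.dataCollect + t.dataAccess + t.modelTrain;
}
```

With `StageTimes`, a misspelled or missing stage is a compile-time error, which the index-signature version cannot catch.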
cli: rewrite in TypeScript
Pipcook uses TypeScript for its core, builtin plugins and tools. To keep the source code consistent, it would be great to use TypeScript for our cli, too.
If you are familiar with TypeScript, feel free to help us with this :)
doc: optimize the guide for beginners and developers
The documentation is still in progress; we can improve it from these perspectives:
- The 'getting started' doc is currently too simple; it should include:
  - How to use pipcook
  - How to organize data
  - How to choose plugins
  - How to deploy
- The developer guide is currently not clear enough; it should include:
  - How to init the plugin development environment
  - How to write a plugin
  - How to publish a plugin
- There is no doc / API reference for the plugins themselves, so users do not know which plugins we have or which parameters a specific plugin expects. We need to improve the plugin docs.
- Following on from the above, it would probably be good to add appropriate comments in the plugins so that the docs can be generated to some extent.
build: add a new action for validating the user scenarios
We have received some feedback about using the CLI to initialize plugin development, so I think that, in addition to unit testing, we need to add test cases for these specific user scenarios. I have compiled a list; if you want to add to it, feel free to comment:
- Install Pipcook
- Initialize Pipeline Project
- Initialize Plugin Project
- Run Pipeline Project
- Run Plugin Project
meta: proposal of Pipboard functional documentation
As discussed with @wordcount @utkobe, we are renaming the Web Launcher to Pipboard, which is more meaningful. We also clarified more details at this meeting.
Software architecture: Client/Server
Tech stacks: client(alibaba/ice), server(midwayjs/midway)
Pipboard is used for GUI pipeline management, plugin configuration and model visualization. We decided to extend Pipboard from pipcook-cli's board sub-command, which can already show the training progress.
In addition, to extend data-collect plugins in a Web way, we add a plugin property, "web", which indicates that the plugin is a Web plugin. A Web plugin can only be inserted into a pipeline in Pipboard. This type of plugin is actually a Web project that must contain an index.html as its entry point, which Pipboard will open in a new sandboxed <window>.
PipApp: the application framework for machine learning
The vision of Pipcook is to bring JavaScript developers and engineers into the world of machine learning quickly and seamlessly, so we are responsible for creating APIs that are easy enough to use.
In the Pipcook stack, pipcook-app defines the ML application: it abstracts away duplicated work and hides the low-level algorithm implementation, which has a learning curve for every ML rookie.
APIs
Every module represents a type of dataset, and basically we provide some different methods for developers.
module ml
This module creates machine learning functions; it provides the core abilities to represent your machine learning application in an intuitive way.
interface ml.Function
To hide the ML details as much as possible, Pipcook lets you declare your machine learning functions with a specific type, ml.Function. You can create an ml.Function via the create() function below.
Internally, the Pipcook compiler parses the application, generates the training code from the ml.Function instances, and replaces these slots with inferences generated by the model.
interface ml.FunctionImpl(arg: data.MLType)
This interface describes the machine learning internals of an application. It accepts an argument of type data.MLType as the input; the type of the output is not constrained.
create(fn: ml.FunctionImpl): ml.Function
This creates the above ml.Function from an ml.FunctionImpl object.
const mlfunc: ml.Function = ml.create((input: data.ImageType) => {
  // call other ML Application APIs here and return
});
// ...
mlfunc(new data.ImageType(...)); // call this function anywhere.
module data
This module is to declare all types for your application's I/O.
interface data.MLType
It's the base interface to tell the Pipcook compiler a type for ML.
interface data.ImageType extends data.MLType
It represents the image type for given ml.Function I/O.
interface data.TextType extends data.MLType
It represents the text type for given ml.Function I/O.
module vision
This module provides vision-related functions like image classification and object detection.
interface vision.Position2D
It represents the 2D position of a detected object:
- label {string}: the label string that represents the object's type.
- left {number}: the left of the detected object in pixels.
- top {number}: the top of the detected object in pixels.
- height {number}: the height of the detected object.
- width {number}: the width of the detected object.
classify(img: ImageType): string
It recognizes the type of image, and returns the type string.
ml.create((img: data.ImageType) => {
  const label = vision.classify(img); // returns the label
});
detect(img: ImageType): vision.Position2D[]
It detects targets in a single image, and returns the position and label of each detected object.
ml.create((img: data.ImageType) => {
  const objects = vision.detect(img);
  objects.forEach((o) => {
    console.log(o.label, o.left, o.top); // prints the label, left and top.
  });
});
module nlp
This module provides NLP-related functions like text classification and clustering.
interface nlp.Cluster
- label {string}: the label for this cluster.
- items {string[]}: the strings in this cluster.
interface nlp.ClusteringResult
- clusters {nlp.Cluster[]}: all grouped clusters, each an nlp.Cluster object.
- noises {string[]}: all strings labeled as noise.
classify(input: string): string
It recognizes the type of text, and returns the type string.
clustering(inputs: string[]): nlp.ClusteringResult
It clusters the given inputs, and returns the result as an nlp.ClusteringResult.
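The field lists for vision.Position2D, nlp.Cluster and nlp.ClusteringResult map naturally onto TypeScript interfaces. The sketch below transcribes them and adds two small consumers. The interface names are flattened (no `vision.` or `nlp.` namespace) and the helpers `boxArea` and `largestClusterLabel` are illustrative additions, not part of the proposed API.

```typescript
// Interfaces transcribed from the field lists above; not an official type definition.
interface Position2D {
  label: string;
  left: number;
  top: number;
  height: number;
  width: number;
}

interface TextCluster {
  label: string;
  items: string[];
}

interface ClusteringResult {
  clusters: TextCluster[];
  noises: string[];
}

// Area of a detected object's bounding box.
function boxArea(box: Position2D): number {
  return box.width * box.height;
}

// Label of the cluster with the most items, or undefined when there are no clusters.
function largestClusterLabel(result: ClusteringResult): string | undefined {
  let largest: TextCluster | undefined;
  for (const cluster of result.clusters) {
    if (largest === undefined || cluster.items.length > largest.items.length) {
      largest = cluster;
    }
  }
  return largest === undefined ? undefined : largest.label;
}
```

Typed results like these let application code consume detections and clusters without knowing anything about the underlying model.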
Anti-APIs
An anti-API is an API that must be hidden from the application user. Here is the list:
- hide the training workflow; therefore interfaces to train and predict should be invisible.
- hide the dataset workflow; in the future, developers will use a tool for dataset processing and validation.
- hide the model-related APIs: graph structure, parameters and model validation.
- hide the serving implementation; every ML application should be serve-able in pipcook-app, thus we don't need any other APIs specifically for serving models.
Example
// example.ts
import express from 'express';
import { ml, vision, data } from '@pipcook/pipcook-app';
class MyImage extends data.ImageType {
  constructor(x, y, buffer) {
    super(x, y, buffer, 100, 100);
  }
}
const listAvatars: ml.Function = ml.create((img: MyImage) => {
  const components = vision.recognizeComponent(img);
  if (!components)
    return false;
  // return the detected avatars so the caller can use them
  return components.map((item: UIView) => {
    const img = item.toImage() as UIImage;
    return vision.detectFace(img);
  }).filter((avatar: data.FaceType) => {
    return avatar !== null;
  });
});
// use the listAvatars function for your use
const app = express();
app.get('/', (req, res) => {
  const img = new MyImage(req.body.x, req.body.y, req.body.buffer);
  res.json(listAvatars(img).toJSON());
});
Then run the following commands to train:
$ pipcook train example.ts --epoch=5 --no-validation
generated the model at example.ts.im
And run your ML application:
$ pipcook try example.ts
$ pipcook deploy example.ts --eas=xxx
proposal to add some labels to manage our issues
The list of labels is as follows:
- pipcook-core: the pipcook-core issues
- build: the build (CI/CD) issues
- plugin: the plugin issues
- tests: the test issues
- model: the model issues
@utkobe may I ask you to review the above? I will apply the labels after getting your approval.
meta: about project scope, contributions and plugin ecosystem
A clear project scope helps us make better choices. Here we clearly define the source code, configuration and documentation that Pipcook needs to include. As an open source project, Pipcook should welcome different types of contributions at different levels, within the project scope mentioned above. The last discussion is about the plugin ecosystem: I will describe how our plugin ecosystem is organized and how it integrates with NPM and JavaScript so they can develop together.
Project scope
The project "Pipcook" software includes the following:
- source code of the framework, high-level APIs, command-line tools and builtin plugins.
- documents and specifications of the framework, high-level APIs, command-line tools and builtin plugins.
- a Web launcher for plugin discovery, dataset selection, pipeline creation, model deployment, and visualization.
Plugins play an important role in this project: a pipeline schedules a number of plugins, which are wrapped as components and work together to output the model or service to deploy. Each plugin needs to follow the rules below:
- MUST be an NPM package, which means it has a package.json and a main file; TypeScript (*.ts) is recommended by default.
- SHOULD have a README introducing the plugin.
- SHOULD have tsdoc/jsdoc annotations or an HTML version for API references.
- SHOULD have unit tests for code quality.
Contributions & Contributors
With the project scope and plugins understood, let's take a look at the types of contributions and contributors Pipcook accepts as an open source project.
- contribution to web launcher
- contribution to command-line tools
- contribution to framework and high-level apis
- contribution to built-in plugins
In addition to the above, user-land plugins are described in the section "Plugin ecosystem".
Each contribution mentioned above MUST follow these rules:
- the contributor submits a pull request that describes the technical details.
- changes in this pull request include some of the source code, documentation and configuration.
- changes in this pull request pass all the related build instructions.
- changes in this pull request receive over 1 approval from project collaborators.
- changes to the framework and high-level APIs require core collaborators' approval.
- changes to a built-in plugin require the built-in plugin collaborators' approval.
We have also classified the contributors as follows:
- contributor: someone who has made contributions within the project scope.
- collaborator: a project maintainer who makes improvements, fixes bugs, and reviews pull requests.
- core collaborator: maintains the whole project scope, focusing on the framework, high-level APIs and release management.
- built-in plugin collaborator: maintains one or more specific built-in plugins.
Plugin ecosystem
The composition and requirements of a plugin were described in the previous section, so here we define some rules among plugins, namely the plugin ecosystem.
From the maintainer's perspective, plugins can be divided into built-in, community and private ones:
- built-in plugins are maintained by core collaborators and released with Pipcook.
- each community plugin is maintained and released by its author; Pipcook can download the specified plugins through git, npm or oss.
- private plugins are maintained by a private organization or company itself.
To help Pipcook discover plugins, the project provides some rules that let the Web launcher discover community ones:
- add the GitHub topic "pipcook-plugin", see https://github.com/topics/pipcook-plugin.
- add "pipcook-plugin" to the package.json's "keywords", see https://www.npmjs.com/search?q=keywords:pipcook-plugin.
- create a pull request to add the plugin URI by updating COMMUNITY_PLUGINS.md.
- (to be added).
Community plugins can also be submitted as built-in plugins through pull requests, but this requires nomination by a core collaborator and the approvals of at least 2 collaborators.
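The package.json rules above (an NPM package with a main file, and the "pipcook-plugin" keyword used for discovery) could be checked with a small helper like the one below. This is only a sketch: the PackageManifest type and the checkPluginManifest function are hypothetical names made up for illustration and are not part of Pipcook.

```typescript
// Hypothetical shape: only the package.json fields this check reads.
interface PackageManifest {
  name?: string;
  main?: string;
  keywords?: string[];
}

const PLUGIN_KEYWORD = 'pipcook-plugin';

// Returns a list of human-readable problems; an empty list means the
// manifest satisfies the rules checked here.
function checkPluginManifest(pkg: PackageManifest): string[] {
  const problems: string[] = [];
  if (!pkg.name) {
    problems.push('package.json has no "name"');
  }
  // Assumption: this sketch asks for an explicit entry point, although npm
  // itself falls back to index.js when "main" is absent.
  if (!pkg.main) {
    problems.push('package.json has no "main" file');
  }
  if ((pkg.keywords || []).indexOf(PLUGIN_KEYWORD) === -1) {
    problems.push(`"keywords" does not include "${PLUGIN_KEYWORD}"`);
  }
  return problems;
}
```

A launcher or a CI step could run such a check before listing a community plugin, and report the returned problems to the plugin author.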
plugin: help us to rewrite python-based files in boa
We still have some Python source code even though Boa is integrated. Let's rewrite these files:
- https://github.com/alibaba/pipcook/tree/master/packages/plugins/model-define/bayesian-model-define/src/assets
- https://github.com/alibaba/pipcook/tree/master/packages/plugins/model-evaluate/bayesian-model-evaluate/src/assets
- https://github.com/alibaba/pipcook/tree/master/packages/plugins/model-train/bayesian-model-train/src/assets
meta: define Plugin Runtime
The ML low-level API is the base layer that provides fundamental ML capabilities to plugin developers.
- datasets
  - split
  - shuffle
  - sample
- model
- cv
  - gray
  - random
  - resize
- nlp
  - tokenize
  - tf/idf
- cv
- validation
  - ar/ap
  - confusion matrix
  - mAP
  - roc/auc
- utils
  - download
  - zip/unzip
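To make the "datasets" entries more concrete, here is a minimal sketch of what split, shuffle and sample helpers might look like. The function names and signatures are assumptions for illustration only; the issue does not define the actual API.

```typescript
// Fisher-Yates shuffle; returns a new array and leaves the input untouched.
// `random` is injectable so that callers can make the result reproducible.
function shuffle<T>(items: ReadonlyArray<T>, random: () => number = Math.random): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = out[i];
    out[i] = out[j];
    out[j] = tmp;
  }
  return out;
}

// Splits into [train, test]; testRatio is the fraction of items held out.
function split<T>(items: ReadonlyArray<T>, testRatio: number): [T[], T[]] {
  if (testRatio < 0 || testRatio > 1) {
    throw new RangeError('testRatio must be between 0 and 1');
  }
  const testCount = Math.round(items.length * testRatio);
  const cut = items.length - testCount;
  return [items.slice(0, cut), items.slice(cut)];
}

// Draws up to n items without replacement.
function sample<T>(items: ReadonlyArray<T>, n: number, random: () => number = Math.random): T[] {
  return shuffle(items, random).slice(0, Math.max(0, n));
}
```

A plugin would typically shuffle first and then split, for example `split(shuffle(rows), 0.2)`, so that the held-out part is not biased by the original ordering.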
boa: cli to generate the typings for current python env
The difficulty in writing a Boa program lies in discovering the Python ecosystem; typings might resolve this.
Also, the pyi stub files in https://github.com/python/typeshed could be used for generating typings.
"Pipcook init" node-gyp rebuild error
Error: ENOENT: no such file or directory, open '/Users/xxx/pipcook-examp'
boa: API coverage report about specific python version
See https://github.com/python/typeshed; we could check the compatibility of the Python standard library against the pyi files in this repository.
model: example pipeline for image classification's accuracy is low
The training logs:
Epoch 1 / 15
eta=0.0 =====================================================================================================>
195228ms 480858us/step - acc=0.163 loss=2.67 val_acc=0.169 val_loss=2.77
Epoch 2 / 15
eta=0.0 =====================================================================================================>
191213ms 470967us/step - acc=0.264 loss=2.44 val_acc=0.266 val_loss=2.91
Epoch 3 / 15
eta=0.0 =====================================================================================================>
200318ms 493394us/step - acc=0.255 loss=2.56 val_acc=0.233 val_loss=3.34
Epoch 4 / 15
eta=0.0 =====================================================================================================>
204292ms 503182us/step - acc=0.248 loss=2.88 val_acc=0.232 val_loss=3.88
Epoch 5 / 15
eta=0.0 =====================================================================================================>
203780ms 501921us/step - acc=0.246 loss=3.29 val_acc=0.232 val_loss=4.45
Epoch 6 / 15
eta=0.0 =====================================================================================================>
200810ms 494607us/step - acc=0.246 loss=3.69 val_acc=0.232 val_loss=4.89
Epoch 7 / 15
eta=0.0 =====================================================================================================>
198668ms 489329us/step - acc=0.246 loss=4.05 val_acc=0.232 val_loss=5.15
Epoch 8 / 15
eta=0.0 =====================================================================================================>
197658ms 486843us/step - acc=0.246 loss=4.31 val_acc=0.232 val_loss=5.48
Epoch 9 / 15
eta=0.0 =====================================================================================================>
198095ms 487918us/step - acc=0.246 loss=4.48 val_acc=0.232 val_loss=5.51
Epoch 10 / 15
eta=0.0 =====================================================================================================>
196179ms 483200us/step - acc=0.246 loss=4.58 val_acc=0.232 val_loss=5.66
Epoch 11 / 15
eta=0.0 =====================================================================================================>
188978ms 465462us/step - acc=0.246 loss=4.62 val_acc=0.232 val_loss=5.61
Epoch 12 / 15
eta=0.0 =====================================================================================================>
190650ms 469582us/step - acc=0.246 loss=4.62 val_acc=0.232 val_loss=5.64
Epoch 13 / 15
eta=0.0 =====================================================================================================>
195142ms 480645us/step - acc=0.246 loss=4.60 val_acc=0.232 val_loss=5.61
Epoch 14 / 15
eta=0.0 =====================================================================================================>
199023ms 490203us/step - acc=0.246 loss=4.57 val_acc=0.232 val_loss=5.55
Epoch 15 / 15
eta=0.0 =====================================================================================================>
195624ms 481832us/step - acc=0.246 loss=4.53 val_acc=0.232 val_loss=5.56
current execution component: modelEvaluate
evaluate result: {
loss: Float32Array(1) [ 6.380456924438477 ],
accuracy: Float32Array(1) [ 0.12019230425357819 ]
}
To reproduce the problem, just run the pipeline.
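To read the trend in logs like the ones above more easily, the per-epoch summary lines can be parsed into numbers. The sketch below is not part of Pipcook; parseMetrics and bestEpoch are hypothetical helper names, and the parser assumes the `key=value` layout shown in the log.

```typescript
// Extracts every key=value pair (for example acc=0.163) from one summary line.
function parseMetrics(line: string): Record<string, number> {
  const metrics: Record<string, number> = {};
  const pattern = /(\w+)=(-?\d+(?:\.\d+)?)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(line)) !== null) {
    metrics[match[1]] = Number(match[2]);
  }
  return metrics;
}

// Returns the 1-based epoch whose summary line has the lowest value of `key`,
// or -1 if no line contains that key. Lines without the key are skipped,
// but still count as epochs, so pass summary lines only.
function bestEpoch(summaryLines: string[], key: string): number {
  let best = -1;
  let bestValue = Infinity;
  summaryLines.forEach((line, index) => {
    const value = parseMetrics(line)[key];
    if (value !== undefined && value < bestValue) {
      bestValue = value;
      best = index + 1;
    }
  });
  return best;
}
```

Applied to the summary lines of the log above, `bestEpoch(lines, 'val_loss')` points at epoch 1, since val_loss only rises from 2.77 afterwards while acc stays at 0.246.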
Boa: magic functions are not working as expected
This bug is likely related to #61
Calling a magic function directly now works fine, but calling it through a Python built-in function gives errors. For example:
const boa = require('../packages/boa');
const torch = boa.import('torch');
const { len } = boa.builtins();
class customDataset extends torch.utils.data.Dataset {
  __len__() {
    return 5;
  }
  __getitem__(index) {
    return 2;
  }
}
const dataset = new customDataset();
console.log(dataset.__len__()); // This is working fine
console.log(len(dataset)); // This gives errors
The equivalent code in plain Python works:
import sys
import torch
class customDataset(torch.utils.data.Dataset):
    def __len__(self):
        return 5
    def __getitem__(self, index):
        return 2
dataset = customDataset()
print(dataset.__len__())
print(len(dataset))
build: create a new GitHub Action for running pipelines daily
We can create a new GitHub Action that runs specific pipelines daily and publishes the validation results in our documentation. It might be an easy way to show users our model performance and what a Pipeline is.
Errors occur whether the CLI is installed locally or via Docker
Following the example, when running pipcook init in the project directory:
# pipcook init
internal/streams/legacy.js:59
throw er; // Unhandled stream error in pipe.
^
Error: connect ECONNREFUSED 151.101.228.133:443
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1129:14) {
errno: 'ECONNREFUSED',
code: 'ECONNREFUSED',
syscall: 'connect',
address: '151.101.228.133',
port: 443
}
However, the pip-project files are indeed generated successfully.
Then, when running node examples/pipeline-mnist-image-classfication.js:
internal/modules/cjs/loader.js:800
throw err;
^
Error: Cannot find module '@pipcook/pipcook-core'
Require stack:
- /document/pipcook-project/examples/pipeline-mnist-image-classfication.js
at Function.Module._resolveFilename (internal/modules/cjs/loader.js:797:15)
at Function.Module._load (internal/modules/cjs/loader.js:690:27)
at Module.require (internal/modules/cjs/loader.js:852:19)
at require (internal/modules/cjs/helpers.js:74:18)
at Object.<anonymous> (/document/pipcook-project/examples/pipeline-mnist-image-classfication.js:23:86)
at Module._compile (internal/modules/cjs/loader.js:959:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:995:10)
at Module.load (internal/modules/cjs/loader.js:815:32)
at Function.Module._load (internal/modules/cjs/loader.js:727:14)
at Function.Module.runMain (internal/modules/cjs/loader.js:1047:10) {
code: 'MODULE_NOT_FOUND',
requireStack: [
'/document/pipcook-project/examples/pipeline-mnist-image-classfication.js'
]
}
Could we provide an easier source for downloading libtensorflow?
It is difficult to download libtensorflow when a user initializes a project, installs dependencies and so on, because of well-known network issues.
Could we provide a source that is easier to download from, and an appropriate way to configure it? :)
Great, it's finally here
meta: Could we add benchmarks and more standardized unit tests?
pipboard: integrate facets for sample visualization
https://pair-code.github.io/facets/ is for checking common problems in data/sample, so we could have a facets extension to achieve sample visualization.
core, cli: use debug instead of raw console
We have some debug logs that are printed through the raw console; it would be better to use debug instead.
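The issue presumably refers to the debug package from npm, which gates log output by namespace through the DEBUG environment variable. The self-contained sketch below only imitates that idea in simplified form and is not the package's implementation; createDebug, makeMatcher and their parameters are hypothetical names, and the pattern is passed in explicitly instead of being read from the environment.

```typescript
type Logger = (message: string) => void;

// Converts a DEBUG-style pattern list such as "pipcook:*,boa" into a matcher.
// "*" is treated as a wildcard; everything else is matched literally.
function makeMatcher(patterns: string): (namespace: string) => boolean {
  const regexes = patterns
    .split(',')
    .map((p) => p.trim())
    .filter((p) => p.length > 0)
    .map((p) => {
      const escaped = p.replace(/[.+?^${}()|[\]\\]/g, '\\$&').replace(/\*/g, '.*');
      return new RegExp('^' + escaped + '$');
    });
  return (namespace) => regexes.some((r) => r.test(namespace));
}

// Creates a logger that only writes when its namespace matches the patterns.
// The sink defaults to console.error so that logs stay out of stdout.
function createDebug(namespace: string, patterns: string, sink: Logger = console.error): Logger {
  const enabled = makeMatcher(patterns)(namespace);
  return (message) => {
    if (enabled) {
      sink(`${namespace} ${message}`);
    }
  };
}
```

With this shape, core and cli code could each create a logger under a namespace such as "pipcook:core", and debug output stays silent unless that namespace is switched on.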
boa: tc39 proposals to improve the usage
We list below the proposed ES features that would help make JavaScript as readable as Python:
- https://github.com/tc39/proposal-slice-notation
- https://github.com/tc39/proposal-record-tuple
- https://github.com/littledan/proposal-operator-overloading/
- https://github.com/littledan/proposal-bigdecimal
And we have proposals:
Boa: support of magic function overriding
Currently Boa does not support overriding magic functions in Python. This issue tracks the progress of that work.
For example
const boa = require('../packages/boa');
const sys = boa.import('sys');
const torch = boa.import('torch');
class customDataset extends torch.utils.data.Dataset {
  __len__() {
    return 5;
  }
  __getitem__(index) {
    return 2;
  }
}
const dataset = new customDataset();
console.log(dataset.__getitem__(2));
This currently fails with the following error:
Error: NotImplementedError: