mlcommons / ck

Collective Mind (CM) is a small, modular, cross-platform and decentralized workflow automation framework with a human-friendly interface and reusable automation recipes to make it easier to build, run, benchmark and optimize AI, ML and other applications and systems across diverse and continuously changing models, data, software and hardware

Home Page: https://access.cKnowledge.org

License: Apache License 2.0

Shell 3.47% Python 87.20% PHP 0.23% HTML 2.44% Batchfile 0.80% Logos 0.01% JavaScript 0.01% Assembly 0.01% R 0.17% HCL 0.05% Faust 0.01% 1C Enterprise 0.01% Euphoria 0.01% C++ 2.83% C 1.15% Cuda 0.05% Makefile 0.01% Dockerfile 1.55% Java 0.02% TeX 0.03%
productivity automation portability reusability collaboration modularity mlops devops workflow-automation best-practices human-readable-interface scripts cross-platform virtualization

ck's Introduction

About

Collective Knowledge (CK) is a community project to develop open-source tools, platforms and automation recipes that help researchers and engineers automate the repetitive, tedious and time-consuming tasks involved in building, running, benchmarking and optimizing AI, ML and other applications and systems across diverse and continuously changing models, data, software and hardware.

CK consists of several ongoing sub-projects:

Incubator

We are preparing new projects based on user feedback:

License

Apache 2.0

Documentation

MLCommons is updating the CM documentation based on user feedback - please stay tuned for more details.

Citing CM

If you found CM useful, please cite this article: [ ArXiv ], [ BibTex ].

You can learn more about the motivation behind these projects from the following articles and presentations:

  • "Enabling more efficient and cost-effective AI/ML systems with Collective Mind, virtualized MLOps, MLPerf, Collective Knowledge Playground and reproducible optimization tournaments": [ ArXiv ]
  • ACM REP'23 keynote about the MLCommons CM automation framework: [ slides ]
  • ACM TechTalk'21 about automating research projects: [ YouTube ] [ slides ]

Acknowledgments

Collective Knowledge (CK) and Collective Mind (CM) were created by Grigori Fursin, sponsored by cKnowledge.org and cTuning.org, and donated to MLCommons to benefit everyone. Since then, this open-source technology (CM, CM4MLOps, CM4MLPerf, CM4ABTF, CM4Research, etc) is being developed as a community effort thanks to all our volunteers, collaborators and contributors!

ck's People

Contributors

ailurus1, alered01, anandhu-eng, arjunsuresh, ctuning-admin, davegreasley, dsavenko, ens-lg4, gfursin, hanwenzhu, himanshu-dutta, interestinglsy, jdesfossez, makaveli10, maximallnyi, morphine00, nacc, nathanw-mlc, nijoj, psyhtest, raduetsya, sennikovandrey, slahiruk, xintin

ck's Issues

Open issues in other CK repositories

CK itself is now stable. Most other functionality is implemented in other CK repositories. Here are links to open issues in other major CK repositories:

Non-uniform support for ISO time format in CK browser

I have performed a HOG experiment and recorded its data into a local repository. I can view the created entry via a local webserver (http://localhost:3344/?wcid=experiment:) and on disk (ck find experiment:* --tags=hog). It is timestamped at 2015-09-04 13:29:20.

However, pasting this timestamp into the "After date (ISO)" filter in the CK browser (to filter out all but this entry) doesn't work - I need to replace the space in the timestamp with the 'T' character: 2015-09-04T13:29:20

Now, Wikipedia says: "It is permitted to omit the 'T' character by mutual agreement." But the CK browser should either print the date format with 'T' or accept it without 'T'.
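
A minimal sketch of the second option (accepting the timestamp with or without 'T'), written with the Python standard library only. The function names are hypothetical and are not part of CK; the sketch normalises the separator on input and always prints the 'T' form on output, so a printed timestamp can be pasted back into the filter.

```python
from datetime import datetime


def parse_iso_timestamp(text):
    """Parse an ISO 8601 timestamp written with either 'T' or a space."""
    # Normalise the optional space separator to 'T' before parsing.
    normalised = text.strip().replace(" ", "T", 1)
    return datetime.fromisoformat(normalised)


def format_iso_timestamp(moment):
    """Print the 'T'-separated form so that output can be reused as input."""
    return moment.isoformat(timespec="seconds")
```

With this approach both 2015-09-04 13:29:20 and 2015-09-04T13:29:20 select the same moment.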

Simplify using SSH as Git protocol

Currently the user can specify that she wants to use the SSH protocol by providing --url=<...> when cloning e.g.

 $ ck pull repo:ctuning-programs --url=git@github.com:ctuning/ctuning-programs

Since CK is hosted on GitHub, this could clearly be made less burdensome and error-prone e.g.

 $ ck pull repo:ctuning-programs --ssh
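
A rough sketch of how the proposed --ssh shortcut could expand into the existing --url form. Everything here is an assumption for illustration: the helper names are hypothetical, the default owner "ctuning" is taken from the example above, and the only fixed fact is the standard GitHub SSH address layout (git@github.com:owner/repo).

```python
def github_ssh_url(owner, repo):
    """Build the SSH clone address for a GitHub repository."""
    return "git@github.com:{}/{}".format(owner, repo)


def pull_command(repo, owner="ctuning", use_ssh=False):
    """Return the argument list for 'ck pull', adding --url only for SSH."""
    command = ["ck", "pull", "repo:" + repo]
    if use_ssh:
        command.append("--url=" + github_ssh_url(owner, repo))
    return command
```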

renew repo if too polluted

Sometimes CK repos pulled from Git become heavily polluted with local changes. In such cases, they can be fully removed using "ck rm repo:xyz --all" and then pulled back using "ck pull repo:xyz".
It may be nice to add a single function, "renew", that performs these two steps...
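
The two steps could be chained roughly as follows. This is only a sketch: renew_commands and renew are hypothetical names, and the runner is passed in (for example subprocess.call) so the sequence can be inspected without invoking CK.

```python
def renew_commands(repo):
    """Return the two CK commands from the issue, in order, for one repository."""
    target = "repo:" + repo
    return [
        ["ck", "rm", target, "--all"],  # remove the polluted local copy
        ["ck", "pull", target],         # fetch a clean copy again
    ]


def renew(repo, run):
    """Run both steps with the supplied runner and stop at the first failure."""
    for command in renew_commands(repo):
        code = run(command)
        if code != 0:
            return code
    return 0
```

Stopping after a failed removal avoids pulling on top of a half-deleted repository.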

Update OpenME

Clean up OpenME (connecting CK to programs written in any language including C, C++, Fortran, PHP, Java, etc) and provide proper Makefiles for Unix and Windows.

It can be used with GCC and LLVM as an event-based plugin framework. It is also used to unify plugin-based auto-tuning and run-time adaptation (see http://hal.inria.fr/hal-01054763).

explain exceptions

Based on feedback, we should explain in the docs why we rarely use exceptions and instead use:
r=ck.function({...})
if r['return']>0: return r

(This made it much easier to handle errors across various languages without crashing in the middle of other tools or web services that use CK. It also simplified life for researchers who do not want to deal with exceptions and can connect various modules like LEGO bricks without thinking about interfaces/APIs.)
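
For the docs, the convention could be illustrated with a self-contained sketch like the one below. The two functions are hypothetical stand-ins rather than CK calls, and the 'error' and 'value' keys are assumptions; only the 'return' key and the check on it come from the snippet above.

```python
def load_number(text):
    """Hypothetical function: never raises, always returns a dictionary."""
    try:
        value = int(text)
    except ValueError:
        # A non-zero 'return' signals failure; 'error' carries the message.
        return {"return": 1, "error": "not an integer: " + repr(text)}
    return {"return": 0, "value": value}


def double_number(text):
    """Caller that passes a failure up unchanged instead of raising."""
    r = load_number(text)
    if r["return"] > 0:
        return r
    return {"return": 0, "value": 2 * r["value"]}
```

Because every result is a plain dictionary, it can be serialised to JSON and handed to a caller in another language or a web service without translating exception types.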

Compilation of example program on OS X

I am following the basic getting started guide and am currently trying to compile the Susan program. Unfortunately, compilation fails on OS X (Darwin-14.4.0-x86_64-i386-64bit):

cnugteren$ ck compile program:cbench-automotive-susan --speed
***************************************************************************************
Current directory: /Users/cnugteren/CK/ctuning-programs/program/cbench-automotive-susan/tmp
***************************************************************************************
Resolving dependencies ...
***************************************************************************************
Compiler vars:
  XOPENME=
***************************************************************************************
Executing prepared batch file tmp-PzV7hE.sh ...
CK_CC  CK_OPT_SPEED CK_FLAGS_CREATE_OBJ CK_COMPILER_FLAGS_OBLIGATORY CK_FLAGS_DYNAMIC_BIN {CK_FLAG_PREFIX_INCLUDE}../ {CK_FLAG_PREFIX_VAR}XOPENME {CK_FLAG_PREFIX_INCLUDE}/Users/cnugteren/CK-TOOLS/lib-rtl-xopenme-0.3-gcc-local-macos-64/include  ../susan.c  {CK_FLAGS_OUTPUT}susan.o
./tmp-PzV7hE.sh: line 8: CK_CC: command not found
***************************************************************************************
Compilation time: 0.013 sec.; Object size: 0; MD5: 
Warning: compilation failed!

It seems that several variables starting with CK_ are not properly substituted; perhaps a $ sign is missing? Or did I forget to do something, such as loading some environment variables? Setting $CK_CC to, for example, Clang doesn't seem to help.

Configure web front-end via command line interface

I'm working on building Docker images containing CK and its dependencies.

A snapshot of my Dockerfile based on Ubuntu 16.04 is as follows:

FROM ubuntu:16.04
MAINTAINER Anton Lokhmotov

# Install standard packages.
RUN apt-get update && apt-get install -y \
    python-all \
    git

# Install the core Collective Knowledge (CK) module.
ENV CK_ROOT=$HOME/CK/ck CK_TOOLS=$HOME/CK_TOOLS PATH=$CK_ROOT/bin:$PATH

RUN mkdir -p $HOME/CK && git clone https://github.com/ctuning/ck.git $CK_ROOT
RUN mkdir -p $HOME/CK_TOOLS

RUN cd $CK_ROOT && python setup.py install && python -c "import ck.kernel as ck"

# Install other CK modules.
RUN ck pull repo:ck-web

# Listen on the standard CK port.
ENV CK_PORT=3344
EXPOSE $CK_PORT

# Start the web service.
CMD ck start web --host=`hostname -i` --port=${CK_PORT}

To build an image named ctuning/ck-ubuntu-16.04, run:

$ docker build -t ctuning/ck-ubuntu-16.04 ${DOCKERFILE_DIR}

where ${DOCKERFILE_DIR} is the directory containing the above Dockerfile (e.g. ${CK_DOCKER_DIR}/docker/ubuntu-16.04).

The CK web service can be accessed at http://localhost:3344/ by running the image in a container as follows:

$ docker run --rm -it -p 3344:3344 ctuning/ck-ubuntu-16.04
For now we can only start server indefinitely
but we should add a proper start/stop/resume support at some point ...

Starting CK web service on 172.17.0.2:3344 ...

or at http://localhost:3355/ with a more elaborate command:

$ export WFE_PORT=3355 CK_PORT=3366
$ docker run --rm -it -p ${WFE_PORT}:${CK_PORT} --env CK_PORT=${CK_PORT} \
    ctuning/ck-ubuntu-16.04
For now we can only start server indefinitely
but we should add a proper start/stop/resume support at some point ...

Starting CK web service on 172.17.0.2:3366 ...

That is, the CK web service is running inside the container with --host=`hostname -i` and --port=${CK_PORT}, and can be accessed at http://localhost:${WFE_PORT}. So far so good.

Now, suppose I want to access the CK web service running inside a Docker container on http://${WFE_HOST}:${WFE_PORT}. I have port forwarding enabled from ${WFE_PORT} to the ${WFE_HOST} machine where the container is running. Here's what I have to do currently:

$ docker run --rm -it -p ${WFE_PORT}:${CK_PORT} --env CK_PORT=${CK_PORT} \
    ctuning/ck-ubuntu-16.04 /bin/bash
root@cc8a1ff17ae9:/# ck setup kernel --wfe
=======================================================================
Loading current configuration ...
=======================================================================
*** Web front end control (through CK web server or third-party web server and CK php connector) ***

Current web front-end URL prefix: http://localhost:3344/web?
Current web front-end template:   default

Enter new web front-end URL prefix (Enter to keep previous): http://<WFE_HOST>:<WFE_PORT>/web?
Enter new web front-end template (Enter to keep previous): 
=======================================================================
Writing local configuration (directly) ...

Configuration successfully recorded to /root/CK/local/kernel/default/.cm/meta.json ...
root@cc8a1ff17ae9:/# ck start web --host=`hostname -i` --port=${CK_PORT} --use_wfe_url
For now we can only start server indefinitely
but we should add a proper start/stop/resume support at some point ...

Starting CK web service on 172.17.0.2:3366 ...

I have to explicitly type the values of ${WFE_HOST} and ${WFE_PORT} into http://<WFE_HOST>:<WFE_PORT>/web?, which is cumbersome, error-prone and not easy to automate.

I propose to enable launching the CK web service as follows:

$  ck start web \
    --host=`hostname -i` --port=${CK_PORT} \
    --wfe_host=${WFE_HOST} --wfe_port=${WFE_PORT}

Then, the Dockerfile would look something like:

FROM ubuntu:16.04
MAINTAINER Anton Lokhmotov <[email protected]>

# Install standard packages.
RUN apt-get update && apt-get install -y \
    python-all \
    git

# Install the core Collective Knowledge (CK) module.
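ENV CK_HOME=/root/CK \
    CK_TOOLS=/root/CK_TOOLS
# Variables set in one ENV instruction are not visible to substitutions
# within that same instruction, so dependent values get their own ENV.
ENV CK_ROOT=${CK_HOME}/ck-core
ENV PATH=${CK_ROOT}/bin:${PATH}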

RUN mkdir -p ${CK_HOME} && git clone https://github.com/ctuning/ck.git ${CK_ROOT}
RUN mkdir -p ${CK_TOOLS}

RUN cd ${CK_ROOT} && python setup.py install && python -c "import ck.kernel as ck"

# Install other CK modules.
RUN ck pull repo:ck-web

# Set the CK web service defaults.
ENV CK_PORT=3344 \
    WFE_PORT=3344 \
    WFE_HOST=localhost

# Listen on the CK port.
EXPOSE ${CK_PORT}

# Start the CK web service.
CMD ck start web \
    --host=`hostname -i` --port=${CK_PORT} \
    --wfe_host=${WFE_HOST} --wfe_port=${WFE_PORT}
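
Internally, the proposed options would only need to assemble the URL prefix that currently has to be typed in by hand. A minimal sketch follows, assuming the prefix keeps the http://<host>:<port>/web? shape shown in the setup dialogue above. The function names are hypothetical, and the defaults mirror the WFE_HOST and WFE_PORT values in the proposed Dockerfile.

```python
def wfe_url_prefix(host, port):
    """Build the web front-end URL prefix from a host and a port."""
    return "http://{}:{}/web?".format(host, port)


def wfe_url_prefix_from_env(environ):
    """Read WFE_HOST and WFE_PORT from a mapping such as os.environ."""
    host = environ.get("WFE_HOST", "localhost")
    port = int(environ.get("WFE_PORT", "3344"))
    return wfe_url_prefix(host, port)
```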

Behavior not clear when adding GIT repositories

From users:

ck add repo:ck-analytics shared=git --quiet
Would you like to reuse them ("yes" or "no"/Enter)?: yes
What if I say no? Will new UID, UOA and "User friendly name" be created?

I think we don't need it - it should always be reused.

The reason is that I plan to add dependencies on other packages there,
i.e. ck-analytics will rely on numpy, scipy, R, and various models.

On Linux, I can actually call apt-get install with those packages to install them automatically...

Print the actual IP address when starting web server

I have successfully used:

ck setup kernel --wfe

to customize the IP address of the web server.

While I can now access the server at this IP address, when it starts it still prints:

Starting CK web service on localhost:3344 ...

which is mildly confusing...

Unifying (Yes/No) questions

Suggestion from Anton:

I suggest that all "Yes/No" questions should be in the standard form:

Y/n (yes is the default answer)
y/N (no is the default answer)
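
A small sketch of how the standard form could be handled in one place. The helper names are hypothetical; the capital letter in the suffix marks the answer chosen when the user just presses Enter.

```python
def prompt_suffix(default):
    """Return '(Y/n)' when yes is the default and '(y/N)' when no is."""
    return "(Y/n)" if default else "(y/N)"


def parse_yes_no(answer, default):
    """Interpret a reply; an empty reply selects the default."""
    reply = answer.strip().lower()
    if reply == "":
        return default
    if reply in ("y", "yes"):
        return True
    if reply in ("n", "no"):
        return False
    return None  # unrecognised reply; the caller may ask again
```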

ck cd

Certain CK commands (e.g. find) often return a directory where a user needs to go for further operation. It would be convenient to combine search and change into a single command.

Confused CK status

I'm confused about ck status:

$ ck status
Your version is outdated: V1.6.3
New available version   : V1.6.1

Just execute "ck pull all --kernel" to update CK and all repositories (if you installed CK from GIT) or visit https://github.com/ctuning/ck for more details ...

This could be because I installed CK both from PyPI and in the canonical way (by cloning).
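
Whatever the cause, the message itself could be guarded by an ordered comparison rather than a simple "versions differ" test. The sketch below is an assumption about a possible fix, not a description of how ck status works: the function names and the "up to date" wording are hypothetical.

```python
def parse_version(text):
    """Turn 'V1.6.3' or '1.6.3' into a tuple of integers such as (1, 6, 3)."""
    return tuple(int(part) for part in text.lstrip("Vv").split("."))


def status_message(installed, available):
    """Report 'outdated' only when the installed version is really older."""
    if parse_version(installed) < parse_version(available):
        return "Your version is outdated: " + installed
    return "Your version is up to date: " + installed
```

Comparing tuples of integers also avoids the string-ordering trap where "1.10" sorts before "1.9".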

Improving analysis of variation of experiments in math.variation (expected value + confidence interval)

We currently have a simple way of analyzing variation of experimental results in math.variation in the ck-analytics repo. We estimate density via gaussian_kde and can detect multiple expected values, i.e. warn that there are several states and that features to separate them should be found. However, we do not calculate a confidence interval, etc. (with such an approach we do not need it; instead we look at the delta and check whether there is more than one expected value).

We should check that it works correctly and possibly improve it (calculate confidence interval, etc)...
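
As a starting point for the confidence interval, here is a minimal standard-library sketch. It is not the math.variation code: it uses a normal approximation for the mean, which is an assumption that only makes sense when the density check reports a single expected value, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(samples, confidence=0.95):
    """Return (low, high) bounds for the mean under a normal approximation."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    centre = mean(samples)
    # Two-sided critical value, roughly 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * stdev(samples) / sqrt(len(samples))
    return centre - half_width, centre + half_width
```

For small numbers of repetitions a Student's t critical value would give wider, more honest bounds than the normal one used here.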

Manually importing GitHub repositories as zip fails on Windows

During my visit to Manchester, we tried to test CK on a Windows machine which didn't have git installed. I tried to manually download and import CK repositories as zip files, but it failed. The reason is that GitHub adds an extra root directory when packing the zip.

We can easily fix that in "ck add repo --zip=xyz.zip" by searching for .cmr.json in the zip and using its directory as the root ...
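
The search could look roughly like this. It is a sketch using only the zipfile module: find_repo_root is a hypothetical name, and the archive contents in the demo are made up to mimic the extra top-level directory that GitHub adds.

```python
import io
import posixpath
import zipfile


def find_repo_root(archive):
    """Return the directory in the zip holding .cmr.json ('' for top level), or None."""
    candidates = [name for name in archive.namelist()
                  if posixpath.basename(name) == ".cmr.json"]
    if not candidates:
        return None
    # Prefer the shallowest match in case nested repositories are present.
    best = min(candidates, key=lambda name: name.count("/"))
    return posixpath.dirname(best)


# Demo with an in-memory archive shaped like a GitHub download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("ck-demo-master/.cmr.json", "{}")
    demo.writestr("ck-demo-master/module/hello/.cm/meta.json", "{}")
with zipfile.ZipFile(buffer) as demo:
    root = find_repo_root(demo)
```

Entries would then be extracted with the root prefix stripped from their names.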

Compilation on Windows fails when md5sum is unavailable

Compiling a program on Windows e.g.

$ ck compile program:tool-print-cuda-devices

fails with "md5sum is not recognized as an internal or external command..."

I've nearly solved it by downloading "File Checksum Integrity Verifier" (fciv.exe) from Microsoft into ~CK/ck/bin and placing a file md5sum.bat there:

fciv.exe -md5 %1

But the Microsoft tool prints a comment before the actual checksum:

//
// File Checksum Integrity Verifier version 2.05.
//
2a084556e80926ce59f81cd51520ae08 a.exe

If this is an acceptable solution, CK should ignore the lines starting with //.
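
Skipping the comment lines could be as simple as the sketch below. The function name is hypothetical, and the sketch assumes the first remaining line has the "checksum filename" layout shown above.

```python
def parse_checksum_output(output):
    """Return (checksum, filename) from md5sum-style output, skipping '//' lines."""
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):
            continue  # blank line or fciv.exe banner comment
        parts = line.split(None, 1)
        checksum = parts[0]
        filename = parts[1] if len(parts) > 1 else ""
        return checksum, filename
    return None
```

The same parser still accepts plain md5sum output, which contains no comment lines.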

set different web port if 3344 is not available

One user suggested trying a different web port when starting the CK web service (ck start web) if the default port 3344 is busy.

I am hesitant about that since client machines will not know the port and will not be able to connect remotely ...
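
For reference, the fallback itself is simple; a sketch with the standard socket module is shown below. The function name, the attempt limit and the loopback host are assumptions. The chosen port would have to be printed clearly, since the concern about remote clients not knowing it still applies.

```python
import socket


def find_free_port(start=3344, attempts=10, host="127.0.0.1"):
    """Return the first port in the range that can be bound, or None."""
    for port in range(start, min(start + attempts, 65536)):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # port is busy, try the next one
            return port
    return None
```

Another process may still grab the port between this probe and the real server start, so the server should be prepared for the bind to fail anyway.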

Update .cm/meta.json on copy

I've copied a repository like this:

$ ck cp reproduce-carp-project:program:realeyes-hog-opencl-tbb \
            reproduce-carp-project:program:new-hog-opencl-tbb 
Entry program:realeyes-hog-opencl-tbb was successfully copied!

Git reports new untracked files:

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        .cm/alias-a-new-hog-opencl-tbb
        .cm/alias-u-32d91de0c7049067
        new-hog-opencl-tbb/

Diff shows changes in .cm/info.json:

$ diff {realeyes,new}-hog-opencl-tbb/.cm/info.json 
2c2
<   "backup_data_uid": "b93bc750890706bc", 
---
>   "backup_data_uid": "32d91de0c7049067", 
19c19
<   "data_name": "realeyes-hog-opencl-tbb"
---
>   "data_name": "new-hog-opencl-tbb"

So far so good! However, there's no difference in .cm/meta.json:

$ diff {realeyes,new}-hog-opencl-tbb/.cm/meta.json
$

I'd expect it to be updated at least with the new UID (32d91de0c7049067)!

Improving pulling of stable vs development repos

In the future, we need to add support to

  • have distributed repos which resolve repo/module/package UOA names (like DOI); support via CK repos is already there, but we still need to provide the functionality
  • provide a way to get a specific version of a repo, including stable vs development
  • have a 'get' function to fetch a zipped repo, besides 'pull' to get a Git repo

Installation of OpenME on OS X

I know that OS X isn't officially supported, so I understand that some features might not work. However, it would be nice if OpenME can be installed through ck on OS X.

If I follow the getting started guide, OpenME is automatically installed. However, this fails on my system:

cnugteren$ ck compile program:cbench-automotive-susan --speed
***************************************************************************************
Current directory: /Users/cnugteren/CK/ctuning-programs/program/cbench-automotive-susan/tmp
***************************************************************************************
Resolving dependencies ...
==========================================================================================
WARNING: environment was not found using tags="lib,xopenme" and setup={"target_os_bits": "64", "host_os_uoa": "linux-64", "target_os_uoa": "linux-64"}
  Would you like to search and install package with these tags automatically (Y/n)? Y
CK error: environment was not found using tags="lib,xopenme" and setup={"target_os_bits": "64", "host_os_uoa": "linux-64", "target_os_uoa": "linux-64"}!

When installing it explicitly, there is a more informative error message:

cnugteren$ ck install package:lib-rtl-xopenme

Resolving dependencies ...

Searching if environment already exists using:
  * Tags: lib,rtl,xopenme,lang-c,lang-cpp,lang-f77,lang-f90,lang-f95,v0.3,v0,host-os-linux-64,target-os-linux-64,64bits
  * Dependency: compiler=44609897464b20a3

Environment not found ...

*** Installation path used: /Users/cnugteren/CK-TOOLS/lib-rtl-xopenme-0.3-gcc-local-linux-64

Resolving dependencies ...

Copying XOpenME to src dir ...

Building static library ...

Executing gcc -O3 -fPIC -c  -I .     xopenme.c  
Executing ar rcs -o librtlxopenme.a xopenme.o
ar: illegal option combination for -r
usage:  ar -d [-TLsv] archive file ...
    ar -m [-TLsv] archive file ...
    ar -m [-abiTLsv] position archive file ...
    ar -p [-TLsv] archive [file ...]
    ar -q [-cTLsv] archive file ...
    ar -r [-cuTLsv] archive file ...
    ar -r [-abciuTLsv] position archive file ...
    ar -t [-TLsv] archive [file ...]
    ar -x [-ouTLsv] archive [file ...]
Error: Compilation failed in /Users/cnugteren/CK-TOOLS/lib-rtl-xopenme-0.3-gcc-local-linux-64/src!
CK error: processing archive failed!

simple decision tree example

Please provide a basic example for machine learning / decision trees.
input: csv file with values of features and labels (boolean)
program_size, opt_flag, is_cmov_expected
20, O2 , no
20, O3 , yes
20, Os , yes
output: dot file with decision tree
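
A possible shape for such an example is sketched below in plain Python, with no CK or machine-learning library involved. It is a small ID3-style builder that treats every feature, including program_size, as categorical (an assumption), and all function names are hypothetical. It reads the three sample rows above and produces Graphviz DOT text.

```python
import csv
import io
from collections import Counter
from math import log2

SAMPLE = """program_size, opt_flag, is_cmov_expected
20, O2 , no
20, O3 , yes
20, Os , yes
"""


def entropy(rows, label):
    """Shannon entropy of the label column over the given rows."""
    counts = Counter(row[label] for row in rows)
    return -sum(n / len(rows) * log2(n / len(rows)) for n in counts.values())


def split(rows, feature):
    """Group rows by their value of one feature."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row)
    return groups


def build_tree(rows, features, label):
    """Return a label (leaf) or a (feature, {value: subtree}) pair."""
    labels = Counter(row[label] for row in rows)
    if len(labels) == 1 or not features:
        return labels.most_common(1)[0][0]

    def remaining_entropy(feature):
        groups = split(rows, feature).values()
        return sum(len(g) / len(rows) * entropy(g, label) for g in groups)

    # The feature leaving the least entropy has the highest information gain.
    best = min(features, key=remaining_entropy)
    rest = [f for f in features if f != best]
    return best, {value: build_tree(group, rest, label)
                  for value, group in split(rows, best).items()}


def to_dot(tree):
    """Render the tree as Graphviz DOT text."""
    lines = ["digraph tree {"]

    def visit(node):
        name = "n{}".format(len(lines))  # unique: lines grows on every visit
        if isinstance(node, tuple):
            feature, branches = node
            lines.append('  {} [label="{}"];'.format(name, feature))
            for value, child in branches.items():
                child_name = visit(child)
                lines.append('  {} -> {} [label="{}"];'.format(name, child_name, value))
        else:
            lines.append('  {} [label="{}", shape=box];'.format(name, node))
        return name

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


reader = csv.DictReader(io.StringIO(SAMPLE), skipinitialspace=True)
rows = [{key.strip(): value.strip() for key, value in row.items()} for row in reader]
tree = build_tree(rows, ["program_size", "opt_flag"], "is_cmov_expected")
dot = to_dot(tree)
```

Writing dot to a file and running Graphviz on it (for example dot -Tpng) would give the picture of the tree; here the only split is on opt_flag, because program_size is the same in every row.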

Update datetime and version on copy

Currently CK copy creates a new UID and updates the name e.g.:

anton@localhost ~/CK/reproduce-carp-project/demo $ ck cp reproduce-carp-project:demo:explore-hog-lws reproduce-carp-project:demo:explore-new-hog-lws
Entry demo:explore-hog-lws was successfully copied!
anton@localhost ~/CK/reproduce-carp-project/demo $ diff explore-{,new-}hog-lws/.cm/info.json
2c2
<   "backup_data_uid": "44bb91f7f0d759da", 
---
>   "backup_data_uid": "8517804c4aefa229", 
19c19
<   "data_name": "explore-hog-worksize"
---
>   "data_name": "explore-new-hog-lws"

However, it's a great opportunity to update the "datetime" and "version" fields too:

    "iso_datetime": "2015-06-09T16:30:40.498000", 
    "license": "See CK LICENSE.txt for licensing details", 
    "version": [
      "1", 
      "2", 
      "0605"
    ]

"ck ls"

"ck ls" would be a useful alias for "ck list".

Add CK calls to main file managers

It would be great to add CK calls (to find CK entries and jump to their directories, or to load files and meta descriptions) to the main file managers and editors on Windows and Linux:

  • explorer
  • mcedit
  • emacs
  • vim
    etc

ck find for repositories

The ck find command for a repository returns the location of the locally stored metadata for this repository e.g.

anton@localhost / $ ck find repo:reproduce-carp-project
/home/anton/CK/local/repo/reproduce-carp-project

In most cases, however, the user will want to find the actual location of the data, not that of the metadata. Currently this can be found using the ck where command e.g.:

anton@localhost / $ ck where repo:reproduce-carp-project
/home/anton/CK/reproduce-carp-project

This is consistent with using ck find for other entities but not terribly intuitive, leading to confusion and frustration.

I suggest changing this behaviour so that ck find returns the location of the data, while e.g. ck find --meta returns the location of the metadata. The ck where command can then be kept for compatibility or deprecated.
