microsoft / 0xdeca10b

Sharing Updatable Models (SUM) on Blockchain

Home Page: https://aka.ms/0xDeCA10B-blog1

License: MIT License

Python 94.77% Dockerfile 1.29% Shell 0.51% JavaScript 3.42%
blockchain ml ai economics machine-learning artificial-intelligence ethereum truffle prediction-market

0xdeca10b's Introduction

Sharing Updatable Models (SUM) on Blockchain

(formerly Decentralized & Collaborative AI on Blockchain)

Animated logo for the project. A neural network appears on a block. The nodes change color until finally converging. The block slides away on a chain and the process restarts on the next blank block.

Sharing Updatable Models (SUM) on Blockchain is a framework to host and train publicly available machine learning models. Ideally, using a model to get a prediction is free. Adding data involves three steps, described below.

Picture of someone sending data to the addData method in CollaborativeTrainer, which passes the data to the three main components described next.

  1. The IncentiveMechanism validates the request to add data; for instance, some mechanisms require a "stake" or deposit. In some cases, the incentive mechanism can also be triggered later to provide users with payments or virtual "karma" points.
  2. The DataHandler stores data and meta-data on the blockchain. This ensures that it is accessible for all future uses, not limited to this smart contract.
  3. The machine learning model is updated according to predefined training algorithms. In addition to adding data, anyone can query the model for predictions for free.
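
The three steps above can be sketched in plain Python. This is a minimal illustration, not the project's Solidity contracts: the class names follow the README, but the method names (`handle_add_data`, `update`, `predict`), the deposit rule, and the toy `MajorityLabelModel` are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# A sample is (feature vector, label); this shape is an assumption for the sketch.
Sample = Tuple[List[float], int]


class IncentiveMechanism:
    """Step 1: validate the request, here by requiring a minimum deposit."""

    def __init__(self, required_deposit: int) -> None:
        self.required_deposit = required_deposit
        self.deposits: Dict[str, int] = {}

    def handle_add_data(self, sender: str, deposit: int) -> None:
        if deposit < self.required_deposit:
            raise ValueError("deposit is below the required stake")
        self.deposits[sender] = self.deposits.get(sender, 0) + deposit


class DataHandler:
    """Step 2: store the data together with metadata about its sender."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, Sample]] = []

    def handle_add_data(self, sender: str, sample: Sample) -> None:
        self.records.append((sender, sample))


class MajorityLabelModel:
    """Step 3: a toy model that predicts the most frequent label seen so far."""

    def __init__(self) -> None:
        self.label_counts: Dict[int, int] = {}

    def update(self, sample: Sample) -> None:
        _, label = sample
        self.label_counts[label] = self.label_counts.get(label, 0) + 1

    def predict(self, features: List[float]) -> int:
        if not self.label_counts:
            return 0
        return max(self.label_counts, key=self.label_counts.get)


class CollaborativeTrainer:
    """Entry point that runs the three steps in order for each submission."""

    def __init__(self, incentive_mechanism: IncentiveMechanism,
                 data_handler: DataHandler, model: MajorityLabelModel) -> None:
        self.incentive_mechanism = incentive_mechanism
        self.data_handler = data_handler
        self.model = model

    def add_data(self, sender: str, sample: Sample, deposit: int) -> None:
        self.incentive_mechanism.handle_add_data(sender, deposit)  # 1. validate
        self.data_handler.handle_add_data(sender, sample)          # 2. store
        self.model.update(sample)                                  # 3. train

    def predict(self, features: List[float]) -> int:
        # Querying requires no deposit.
        return self.model.predict(features)
```

Because validation runs first, a rejected submission is neither stored nor used for training.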

The basics of the framework can be found in our blog post. A demo of one incentive mechanism can be found here. More details can be found in the initial paper describing the framework, accepted to Blockchain-2019, The IEEE International Conference on Blockchain.

This repository contains:

  • Demos showcasing some proof of concept systems using the Ethereum blockchain. There is a locally deployable test blockchain and demo dashboard to interact with smart contracts written in Solidity.
  • Simulation tools written in Python to quickly see how models and incentive mechanisms would work when deployed.

Picture of a QR code with aka.ms/0xDeCA10B written in the middle.

FAQ/Concerns

Aren't smart contracts just for simple code?

There are many options. We can restrict the framework to simple models: Perceptron, Naive Bayes, Nearest Centroid, etc. We can also combine off-chain computation with on-chain computation in a few ways such as:

  • encoding the input off-chain into a higher-dimensional representation and fine-tuning only the final layers of the model on-chain,
  • using secure multiparty computation, or
  • using external APIs, or as they are called in the blockchain space, oracles, to train and run the model.

We can also use algorithms that do not require all of the model's parameters to be updated (e.g. Perceptron). We hope to inspire more research in efficient ways to update more complex models.
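
To illustrate why a Perceptron update can be cheap, here is a minimal Python sketch of the standard mistake-driven Perceptron rule on sparse features. The function name, the dictionary representation of weights, and the 0/1 label encoding are assumptions for the example, not the project's contract code.

```python
from typing import Dict, Tuple


def perceptron_update(weights: Dict[int, float], bias: float,
                      features: Dict[int, float], label: int,
                      learning_rate: float = 1.0) -> Tuple[float, int]:
    """Apply one Perceptron step in place.

    Returns the new bias and the number of weights that were written.
    Only weights whose feature index appears in the sample are touched.
    """
    score = bias + sum(weights.get(i, 0.0) * v for i, v in features.items())
    prediction = 1 if score > 0 else 0
    if prediction == label:
        # Correct prediction: nothing is written at all.
        return bias, 0
    sign = 1.0 if label == 1 else -1.0
    for i, v in features.items():
        weights[i] = weights.get(i, 0.0) + learning_rate * sign * v
    return bias + learning_rate * sign, len(features)


weights = {0: 0.0, 1: 0.0, 2: 5.0}
bias, touched = perceptron_update(weights, 0.0, {0: 1.0, 1: 2.0}, label=1)
# Weight 2 is never read or written because feature 2 is absent from the sample.
```

The number of writes is bounded by the number of non-zero features in the sample, not by the size of the model, which is what keeps the on-chain cost low for sparse inputs.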

Some of those proposals are not in the true spirit of this system, which is to share models completely publicly, but they may be suitable for some applications. At least the data would be shared so others can still use it to train their own models.

Will transaction fees be too high?

Fees in Ethereum are low enough for simple models: a few cents as of July 2019. Simple machine learning models are good for many applications. As described in the previous answer, there are ways to keep transactions simple. Fees are decreasing: Ethereum is switching to proof of stake. Other blockchains may have lower or possibly no fees.

What about storing models off-chain?

Storing the model parameters off-chain, e.g. using IPFS, is an option, but many of the popular solutions do not have robust mirroring to ensure that the model will still be available if a node goes down. One of the major goals of this project is to share models and improve their availability. The easiest way to do that now is to have the model stored and trained in a smart contract.

We're happy to make improvements! If you do know of a solution that would be cheaper and more robust than storing models on a blockchain like Ethereum then let us know by filing an issue!

What if I just spam bad data?

This depends on the incentive mechanism (IM) chosen but essentially, you will lose a lot of money. Others will notice the model is performing badly or does not work as expected and then stop contributing to it. Depending on the IM, such as in Deposit, Refund, and Take: Self-Assessment, others that already submitted "good" data will gladly take your deposits without submitting any more data.

Furthermore, people can easily automatically correct your data using techniques from unsupervised learning such as clustering. They can then use the data offline for their own private model or even deploy a new collection system using that model.
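
As a rough illustration of correcting labels with clustering, the sketch below runs a basic two-cluster k-means on 2-D points and replaces each label with the majority label of its cluster. Everything here (the function names, two clusters, 2-D points, majority voting) is an assumption chosen to keep the example small; real data would need a more careful method.

```python
from collections import Counter
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def two_means(points: List[Point], iterations: int = 10) -> List[int]:
    """Assign each point to cluster 0 or 1 using a basic k-means loop."""
    # Deterministic start: the first point and the point farthest from it.
    centers = [points[0], max(points, key=lambda p: _dist2(p, points[0]))]
    assignment = [0] * len(points)
    for _ in range(iterations):
        assignment = [
            0 if _dist2(p, centers[0]) <= _dist2(p, centers[1]) else 1
            for p in points
        ]
        for c in (0, 1):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assignment


def relabel_by_cluster(points: List[Point], labels: List[int]) -> List[int]:
    """Replace each label with the most common label in the point's cluster."""
    assignment = two_means(points)
    majority: Dict[int, int] = {}
    for c in set(assignment):
        votes = Counter(l for l, a in zip(labels, assignment) if a == c)
        majority[c] = votes.most_common(1)[0][0]
    return [majority[a] for a in assignment]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
noisy_labels = [0, 0, 1, 0, 1, 1, 1, 0]  # two labels were submitted wrongly
cleaned = relabel_by_cluster(points, noisy_labels)
```

This only works when the bad labels are a minority within each cluster, which matches the scenario in the answer above where most contributors submit "good" data.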

What if no one gives bad data, then no one can profit?

That’s great! This system will work as a source for quality data and models. People will contribute data to help improve the machine learning models they use in their daily life.

Profit depends on the incentive mechanism (IM). Yes, in Deposit, Refund, and Take: Self-Assessment, the contributors will not profit and should be able to claim back their own deposits. In the Prediction Market based mechanism, contributors can still get rewarded by the original provider of the bounty and test set.
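
The deposit flow described in the last two answers can be sketched as a small ledger. This is a simplified illustration and not the mechanism as implemented: it assumes that data counts as "good" when the current model agrees with its label, it omits any waiting period, and the names `SelfAssessmentLedger`, `refund`, and `take` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Contribution:
    sender: str
    features: Tuple[float, ...]
    label: int
    deposit: int
    claimed: bool = False


class SelfAssessmentLedger:
    """Tracks deposits; the model itself judges which data looks good."""

    def __init__(self, predict: Callable[[Tuple[float, ...]], int]) -> None:
        self.predict = predict
        self.contributions: List[Contribution] = []
        self.balances: Dict[str, int] = {}
        self.verified: Set[str] = set()  # senders with a refunded contribution

    def add(self, sender: str, features: Tuple[float, ...],
            label: int, deposit: int) -> int:
        self.contributions.append(Contribution(sender, features, label, deposit))
        return len(self.contributions) - 1

    def _pay(self, who: str, amount: int) -> None:
        self.balances[who] = self.balances.get(who, 0) + amount

    def refund(self, index: int, caller: str) -> None:
        """Return a deposit to its owner if the model agrees with the label."""
        c = self.contributions[index]
        if c.claimed or caller != c.sender:
            raise PermissionError("only the owner can refund an unclaimed deposit")
        if self.predict(c.features) != c.label:
            raise ValueError("model disagrees with this label")
        c.claimed = True
        self.verified.add(caller)
        self._pay(caller, c.deposit)

    def take(self, index: int, caller: str) -> None:
        """Let a verified contributor claim the deposit on data that looks bad."""
        c = self.contributions[index]
        if c.claimed or caller not in self.verified:
            raise PermissionError("caller must be verified and deposit unclaimed")
        if self.predict(c.features) == c.label:
            raise ValueError("model agrees with this label")
        c.claimed = True
        self._pay(caller, c.deposit)
```

If every submission is good, each contributor only ever calls `refund` and ends with exactly the deposit they put in, which is the "no profit" case described above.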

Learn More

Papers

More details can be found in our initial paper, Decentralized & Collaborative AI on Blockchain, which describes the framework, accepted to Blockchain-2019, The IEEE International Conference on Blockchain.

An analysis of several machine learning models with the self-assessment incentive mechanism can be found in our second paper, Analysis of Models for Decentralized and Collaborative AI on Blockchain, which was accepted to The 2020 International Conference on Blockchain.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

0xdeca10b's People

Contributors

dependabot[bot], hkaur008, juharris, microsoft-github-policy-service[bot], microsoftopensource, msftgits

0xdeca10b's Issues

demo "run yarn client":

(base) WorkSpace:~/Documents/projects/BC+AI/0xDeCA10B/d
at wrapSafe (internal/modules/cjs/loader.js:915:16)
at Module._compile (internal/modules/cjs/loader.js:963:27)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:1027:10)
at Module.load (internal/modules/cjs/loader.js:863:32)
at Function.Module._load (internal/modules/cjs/loader.js:708:14)
at Module.require (internal/modules/cjs/loader.js:887:19)
at require (internal/modules/cjs/helpers.js:74:18)
at Object.webpack_require.f.require (/usr/local/lib/node_modules/truffle/build/cli.bundled.js:608:28)
at /usr/local/lib/node_modules/truffle/build/cli.bundled.js:538:40
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

[demo] Add way to search for added data

When a lot of data has been added, it might be annoying to find the data you want to refund/report. It would be good to have an easy way to search for the events.

Search params can be put into the customFilter function that is passed to handleAddedData.

Storing models for cheap on Factom

The Factom blockchain is built for data storage, offering a fixed cost of $0.001 for 10 KB of data. The Factom chain anchors to the Bitcoin and Ethereum blockchains, giving it accountability.

I'm curious if this option has been explored for model storage instead of storing it in Ethereum?

Each Factom entry has a unique hash and that can be used in the smart contract to lookup the required model.

More information on Factom - https://www.factom.com

[demo] Separate model storage from the contract with the update function.

Just documenting some thoughts here.
It would be nice for the model storage and the model's update algorithm to be kept separate so that users updating the model can pick from different update functions.
In practice this could cause problems if any update algorithm can be used because a user can significantly change the entire model.
I'm also pretty sure that it would cost more gas to call methods from another contract.

[demo] Use toasts more

There are plenty of toasts when adding a new model but for other pages just console logging or alerts are used. There are lots of TODOs on those pages for when toasts should be used.

[demo] [add] Allow uploading a vocab

Remove the option to select the IMDB vocab and instead allow uploading a vocab locally or to the DB.

Not as important now that #88 is done, which lets you hash words to a number. That works well with sparse models.

demo deployment failed

Hello author. When I installed your demo, I ran into two problems. 1: Which environment do you run it in, Windows PowerShell or Linux? 2: I am running under Linux. After I run yarn blockchain, it reaches "Listening on 0.0.0.0:7545" and goes no further. Attempts to run yarn client or yarn server also fail.

[demo] Track accuracy over time.

In database? On blockchain?

We now track the accuracy in the database (managed by the server). This is okay but it's centralized so it would be good to add some proof to that database.

Add Model Page

The page to add models needs to be updated and have its route added back.

[simulation] Use normalization like in the demo.

In the demo, normalization is done before updating the dense perceptron and dense nearest centroid classifiers, but the simulation doesn't use normalization.

Normalization is used to avoid submissions with large values that would corrupt the model. Using normalized vectors might be bad for these models but it seems like a necessary guard to have on-chain.
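
A minimal sketch of such a guard in Python, assuming L2 (unit-length) normalization. The issue does not say which norm the demo uses, so the choice of norm and the function name `normalize` are assumptions.

```python
import math
from typing import List


def normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit L2 length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        return list(vector)
    return [v / norm for v in vector]


honest = normalize([3.0, 4.0])
# A submission with huge values ends up with the same bounded magnitude.
oversized = normalize([3e9, 4e9])
```

After normalization every submission has the same length, so a single data point with very large values cannot move a dense perceptron's weights or a class centroid further than any other point can.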

[demo] Validate the contracts when they are given by address

Currently an error is shown if something fails when loading the contract but it would be better to get more detailed errors for what is invalid.
Validate that:

  • Address is for a contract.
  • Components (model, data handler, IM) exist
  • The component contracts are owned by the main entry point.

Leaving this issue open since there's probably more to validate.

error when run 'yarn server'

When I try to start the server from one terminal via
yarn server
the following error occurred:

root@4b935555249c:~/workspace/demo# yarn server
yarn run v1.22.0
$ nodemon server.js --ignore client/
[nodemon] 2.0.2
[nodemon] to restart at any time, enter `rs`
[nodemon] watching dir(s): *.*
[nodemon] watching extensions: js,mjs,json
[nodemon] starting `node server.js`
[nodemon] Internal watch failed: ENOSPC: System limit for number of file watchers reached, watch '/root/workspace/demo/blockchain_db/!trie_db!0x98bc7528aec5dba65ab2272c2844b472118d4e17cce298ee58a3a2848d407c32'
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

Does anyone know what went wrong?

[demo] Show image for the original data when looking at refund/reward data for an image classifier

I started to do this but the image wouldn't display. I was doing stuff like:

const img = document.getElementById('input-image');

// Draw the image onto a canvas so it can be serialized as a data URL.
const canvas = document.createElement('canvas');
const context = canvas.getContext('2d');
canvas.width = img.width;
canvas.height = img.height;
context.drawImage(img, 0, 0);

// Load the serialized data URL into a new <img> element.
const img2 = document.createElement('img');
img2.src = canvas.toDataURL();
document.body.appendChild(img2);

I was trying to save canvas.toDataURL() to the database of original data as text and then load it into an <img> tag. It wouldn't display.

[Simulation] -> Bokeh server application does not exist

Hi! I've recently started working and learning on SUM. I'm currently stuck with running the Simulation steps.

To direct you to the steps I've followed:

1 - I tried running the "docker run --rm -it -p 5006:5006 -v %cd%:/root/workspace/0xDeCA10B/simulation --name decai-simulation mcr.microsoft.com/samples/blockchain-ai/0xdeca10b-simulation bash" command in my Git Cmd.
2 - Got my docker file up and running
3 - I then tried running "bokeh serve decai/simulation/simulate_imdb_perceptron.py"

This is where I got stuck and I get this:
image

I did quite a few searches on Google and tried some code related to Bokeh, but none of it worked.

It would be great if someone could guide me on what has happened and what needs to be done. I am new to this sort of tech!

Thank You

WSL Deployment Consulting

Hello, I used to run a Linux virtual machine under VMware, but your installation steps did not work there. Now I want to try WSL instead, and I have a few questions. 1: What needs to be set up in WSL (besides npm)? Can you provide detailed command steps? 2: Does Docker need to be installed in WSL? The video demo did not cover installing Docker. 3: In the video demo you ran the commands in PowerShell. I don't understand how that relates to WSL. I am a beginner and appreciate your work on this project. I hope I can get it running myself.
Looking forward to your reply,
sincerely thanks

[simulation] Add MurmurHash3 option

  • Pick a library (will likely use https://pypi.org/project/mmh3/)
  • Add test cases to compare hashes to make sure that the library is equivalent to the one we use in JavaScript for a few words.
  • Use for word-based datasets (can't do for IMDB yet because words are already mapped to vocabulary indices)
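
For the test cases in the second item, a small pure-Python reference of the 32-bit MurmurHash3 (x86 variant) could serve as a cross-check alongside whichever library is picked. The sketch below follows the published algorithm; `hashed_index` and its modulo mapping to a feature index are assumptions for illustration and may not match how the JavaScript side maps hashes to indices.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit, returned as an unsigned integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n_blocks = len(data) // 4
    # Body: mix in each full 4-byte little-endian block.
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    # Tail: mix in the remaining 1 to 3 bytes, if any.
    tail = data[4 * n_blocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
    # Finalization: mix in the length and avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def hashed_index(word: str, num_features: int, seed: int = 0) -> int:
    """Map a word to a feature index (hypothetical mapping for sparse models)."""
    return murmur3_32(word.encode("utf-8"), seed) % num_features
```

Some libraries return the same 32 bits as a signed integer, so comparisons between implementations are safest modulo 2^32.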
