Comments (5)
Naively quantized bloom 6b on CPU: http://nora:8800/notebooks/decentralized/jheuristic/bloom_test/cpu-qint-matmul.ipynb
source: https://gist.github.com/justheuristic/47830c9ddfd45889894e69d4f45ce233
from petals.
On client-side computations
Since we plan to run embeddings/logits on the client side, we need to compute them efficiently.
Embedding lookup is dirt cheap, but logits are more complicated:
computing the final logits on a Colab CPU takes over a minute per token.
Solution 1: use fast approximate KNN
- over 99% of the probability mass is held by the ~100 most likely tokens
- use HNSW to find the top-100 tokens with the highest dot product against the final hidden state
- use FAISS or ScaNN for fast nearest-neighbor search
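To make the KNN idea concrete, here is a minimal numpy sketch of restricted-vocabulary decoding: score tokens by dot product, keep the 100 best, and softmax over only those. The brute-force scoring step below is exactly what an HNSW index (via FAISS or ScaNN) would approximate without scanning the full vocabulary; all sizes here are illustrative, not BLOOM's real dimensions.

```python
import numpy as np

# Illustrative sizes only (BLOOM-6B actually uses hidden size 4096
# and a ~250k-token vocabulary).
vocab_size, hidden = 250_000, 64
rng = np.random.default_rng(0)
emb = rng.standard_normal((vocab_size, hidden)).astype(np.float32)  # output embedding matrix
h = rng.standard_normal(hidden).astype(np.float32)                  # final hidden state

# Brute-force scoring -- this full matmul is the step an HNSW index would replace.
scores = emb @ h
top = np.argpartition(scores, -100)[-100:]  # indices of the 100 largest logits

# Softmax restricted to the top-100 candidates; per the observation above,
# this captures almost all of the true probability mass.
top_scores = scores[top]
probs = np.exp(top_scores - top_scores.max())
probs /= probs.sum()
```

Sampling the next token then reduces to rng.choice(top, p=probs) over 100 candidates instead of the full vocabulary.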
Solution 2: just use a GPU
- Colab T4 in fp16: 30 ms per token (no longer a bottleneck)
- kudesnik M40 (~2x a Colab K80): 67 ms, still very much acceptable
- caveat: GPUs might not always be available
Current opinion: use GPUs, think about a fast CPU mode later.
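For reference, the GPU path is just one half-precision matmul against the output embedding matrix. A hedged torch sketch (shapes are illustrative; it falls back to fp32 on CPU, since fp16 matmul is primarily a GPU feature):

```python
import torch

# Run the logit matmul in fp16 on GPU when available; fp32 on CPU otherwise.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# Illustrative sizes, smaller than BLOOM's real ~250k x 4096 output matrix.
vocab_size, hidden = 50_000, 1024
emb = torch.randn(vocab_size, hidden, device=device, dtype=dtype)
h = torch.randn(1, hidden, device=device, dtype=dtype)

with torch.no_grad():
    logits = h @ emb.T  # (1, vocab_size); this is the ~30 ms/token step on a T4
    next_token = int(logits.argmax(dim=-1))
```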
Copied the BLOOM implementation from huggingface/transformers@ca2a55e
Their attention code is spectacularly bad, see #1
Next steps:
- push individual bloom layers to huggingface hub
- implement BloomBlock as hivemind.moe.server.ExpertBackend
As of 8221469:
- Pushing a model to the hub is handled via python -m cli.convert_model --many_args_here; see the README for a usage example
- The server can run forward, backward and inference passes over BLOOM blocks; see the README for instructions on how to start a server