Comments (9)
The actual model sizes are:
3B: 3,638,525,952
7B: 7,869,358,080
The fp32 weights are provided to allow users to reduce precision to suit their needs. We will consider providing the weights in f16 as well, since this is a common complaint :)
Thank you for pointing it out!
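In the meantime, halving precision yourself is straightforward in PyTorch; a minimal sketch on a toy module (the same mechanics apply to a full checkpoint -- 4 bytes per fp32 parameter become 2 bytes in fp16):

```python
from torch import nn

# Toy stand-in for a checkpoint: casting fp32 parameters to fp16
# halves the bytes per element (4 -> 2), i.e. ~2x smaller on disk.
model = nn.Linear(1024, 1024, bias=False)  # fp32 by default
fp32_bytes = sum(p.numel() * p.element_size() for p in model.parameters())

model = model.half()  # cast all weights to float16
fp16_bytes = sum(p.numel() * p.element_size() for p in model.parameters())

print(fp32_bytes, fp16_bytes)  # fp16 is exactly half
```

With transformers, the equivalent is loading with `torch_dtype=torch.float16` in `from_pretrained()` and calling `save_pretrained()` on the result.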
from stablelm.
For the sake of convenience (half the download size/RAM/VRAM), I've uploaded 16-bit versions of the tuned models to HF Hub:
https://huggingface.co/vvsotnikov/stablelm-tuned-alpha-7b-16bit
https://huggingface.co/vvsotnikov/stablelm-tuned-alpha-3b-16bit
There's a 4.9GB ggml 4bit GPTQ quantization for StableLM-7B up on HuggingFace which works in llama.cpp for fast CPU inference.
(For comparison, LLaMA-7B in the same format is 4.1GB. But, StableLM-7B is actually closer to 8B parameters than 7B.)
Ok, the size seems about right then.
# file sizes taken from disk; note the Hugging Face UI divides by 1000**3 instead
>>> (10_161_140_290+4_656_666_941) / 1024 / 1024 / 1024
13.800158380530775
>>> (3_638_525_952 * 4) / 1024 / 1024 / 1024
13.5545654296875
f16 weights would be nice, to download less stuff
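Continuing the arithmetic above, f16 checkpoints would come in at roughly half the fp32 size (2 bytes per parameter instead of 4):

```python
# Parameter counts quoted earlier in the thread, at 2 bytes per fp16 parameter.
params_3b = 3_638_525_952
params_7b = 7_869_358_080

fp16_3b_gib = params_3b * 2 / 1024**3
fp16_7b_gib = params_7b * 2 / 1024**3
print(f"{fp16_3b_gib:.2f} GiB, {fp16_7b_gib:.2f} GiB")  # -> 6.78 GiB, 14.66 GiB
```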
Hi, @andysalerno! I do expect these models to quantize quite well. They're pretty wide, which should make them less bandwidth-bound than similarly sized models when quantized.
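As a rough back-of-the-envelope for what quantization buys at this scale (weights-only estimate; real GPTQ/ggml files run somewhat larger because they also store per-group scales, embeddings, etc. -- e.g. the 4.9GB q4_0 file mentioned elsewhere in this thread):

```python
# Weights-only size estimate for the 7B model at lower bit widths.
params_7b = 7_869_358_080

for bits in (8, 4):
    gib = params_7b * bits / 8 / 1024**3
    print(f"{bits}-bit: {gib:.2f} GiB")  # -> 8-bit: 7.33 GiB, 4-bit: 3.66 GiB
```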
@jon-tow on this topic, do you expect these models to quantize well down to 4bits (or lower) via GPTQ and/or other quantizing strategies?
I don't see why not, since GPTQ seems to be a general technique that works well across different transformer models. But I'm asking because part of the reason behind Stable Diffusion's success is how well it runs on consumer hardware. So I'm wondering whether these models will follow a similar goal of running very well on consumer hardware, with quantization considered from the very beginning?
Yeah, we need a Colab for this stuff that doesn't crash from running out of RAM lol
There's a 4.9GB ggml 4bit GPTQ quantization for StableLM-7B up on HuggingFace which works in llama.cpp for fast CPU inference.
(For comparison, LLaMA-7B in the same format is 4.1GB. But, StableLM-7B is actually closer to 8B parameters than 7B.)
Hm, how do you actually run this?
I tried https://github.com/ggerganov/llama.cpp (4afcc378698e057fcde64e23eb664e5af8dd6956 and also 5addcb120cf2682c7ede0b1c520592700d74c87c)
and got:
./main -m ../ggml-q4_0-stablelm-tuned-alpha-7b/ggml-model-stablelm-tuned-alpha-7b-q4_0.bin -p "this is a test"
main: seed = 1682468827
llama.cpp: loading model from ../ggml-q4_0-stablelm-tuned-alpha-7b/ggml-model-stablelm-tuned-alpha-7b-q4_0.bin
error loading model: missing tok_embeddings.weight
llama_init_from_file: failed to load model
main: error: failed to load model '../ggml-q4_0-stablelm-tuned-alpha-7b/ggml-model-stablelm-tuned-alpha-7b-q4_0.bin'
Hi @jon-tow @python273, why do we have multiple .bin files inside stabilityai/stablelm-base-alpha-7b? When we load the model, which bin file is loaded?
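(The multiple .bin files are checkpoint shards: a pytorch_model.bin.index.json alongside them maps each parameter to the shard holding it, and transformers' from_pretrained() loads every shard the map references. A toy illustration of that mapping, with invented parameter and file names:)

```python
# Hypothetical miniature of a pytorch_model.bin.index.json (names invented):
index = {
    "metadata": {"total_size": 12_345_678},
    "weight_map": {
        "gpt_neox.embed_in.weight": "pytorch_model-00001-of-00002.bin",
        "gpt_neox.layers.0.attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
        "embed_out.weight": "pytorch_model-00002-of-00002.bin",
    },
}

# from_pretrained() consults this map and loads every shard it references,
# so all of the .bin files are used, not just one.
shards = sorted(set(index["weight_map"].values()))
print(shards)
```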