
Model files are big? (stablelm, 9 comments, closed)

stability-ai commented on June 16, 2024
Model files are big?


Comments (9)

jon-tow commented on June 16, 2024

The actual model sizes are:
3B: 3,638,525,952
7B: 7,869,358,080

The fp32 weights are provided so that users can reduce precision to suit their needs. We will consider providing fp16 weights as well, since this is a common request :)
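Halving precision halves the bytes per parameter (4 → 2), which is where the 2x saving comes from. A minimal sketch of the conversion idea, using NumPy arrays as stand-ins for the checkpoint tensors (the real conversion would cast the PyTorch state dict to half precision before saving):

```python
import numpy as np

# Toy stand-in for an fp32 checkpoint: tensor name -> array.
state_fp32 = {
    "embed.weight": np.random.randn(64, 32).astype(np.float32),
    "lm_head.weight": np.random.randn(32, 64).astype(np.float32),
}

# Cast every tensor to half precision (4 bytes/param -> 2 bytes/param).
state_fp16 = {name: w.astype(np.float16) for name, w in state_fp32.items()}

def nbytes(state):
    return sum(w.nbytes for w in state.values())

print(nbytes(state_fp32), nbytes(state_fp16))  # fp16 is exactly half the size
```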

Thank you for pointing it out!


vvsotnikov commented on June 16, 2024

For the sake of convenience (2x less download size/RAM/VRAM), I've uploaded 16-bit versions of tuned models to HF Hub:
https://huggingface.co/vvsotnikov/stablelm-tuned-alpha-7b-16bit
https://huggingface.co/vvsotnikov/stablelm-tuned-alpha-3b-16bit


MarkSchmidty commented on June 16, 2024

There's a 4.9GB ggml 4bit GPTQ quantization for StableLM-7B up on HuggingFace which works in llama.cpp for fast CPU inference.

(For comparison, LLaMA-7B in the same format is 4.1GB. But, StableLM-7B is actually closer to 8B parameters than 7B.)


python273 commented on June 16, 2024

Ok, the size seems about right then.

# sizes taken from disk; the Hugging Face UI divides by 1000**3 instead
>>> (10_161_140_290+4_656_666_941) / 1024 / 1024 / 1024
13.800158380530775
>>> (3_638_525_952 * 4) / 1024 / 1024 / 1024
13.5545654296875
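The check above generalizes: expected checkpoint size ≈ parameter count × bytes per parameter, plus a little metadata. A small helper reproducing the thread's numbers (GiB = 1024**3 bytes; the Hugging Face UI divides by 1000**3 instead):

```python
def checkpoint_gib(n_params: int, bytes_per_param: int) -> float:
    """Approximate checkpoint size in GiB (ignores small metadata overhead)."""
    return n_params * bytes_per_param / 1024**3

# 3B model: fp32 (4 bytes/param) vs fp16 (2 bytes/param)
print(checkpoint_gib(3_638_525_952, 4))  # ~13.55 GiB, matching the REPL above
print(checkpoint_gib(3_638_525_952, 2))  # ~6.78 GiB after an fp16 conversion
```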

fp16 weights would be nice, to cut the download size


jon-tow commented on June 16, 2024

Hi, @andysalerno! I do expect these models to quantize quite well. They're pretty wide, which should help reduce memory-bandwidth boundedness relative to similarly sized models when quantized.


andysalerno commented on June 16, 2024

@jon-tow on this topic, do you expect these models to quantize well down to 4bits (or lower) via GPTQ and/or other quantizing strategies?

I don't see why not, since GPTQ seems to be a general technique that works well across different transformer models. But I'm asking because part of the reason behind Stable Diffusion's success is how well it runs on consumer hardware. So I'm wondering if these models will follow a similar goal of running very well on consumer hardware, and therefore consider quantization from the very beginning?
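GPTQ adds per-layer error correction on top of the basic idea, but the storage payoff of 4-bit quantization can be seen with plain round-to-nearest, absmax scaling — a simplified sketch, not GPTQ itself:

```python
import numpy as np

def quantize_4bit(w: np.ndarray):
    """Symmetric round-to-nearest 4-bit quantization (integer levels -8..7)."""
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scale = quantize_4bit(w)
w_hat = dequantize(q, scale)

# 4 bits/weight vs 32: an 8x reduction, at the cost of rounding error
# bounded by scale/2 per weight.
print(np.abs(w - w_hat).max())
```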


iboyles commented on June 16, 2024

Yeah, we need a Colab for this stuff that doesn't crash from running out of RAM lol


jrincayc commented on June 16, 2024

There's a 4.9GB ggml 4bit GPTQ quantization for StableLM-7B up on HuggingFace which works in llama.cpp for fast CPU inference.

(For comparison, LLaMA-7B in the same format is 4.1GB. But, StableLM-7B is actually closer to 8B parameters than 7B.)

Hm, how do you actually run this?
I tried https://github.com/ggerganov/llama.cpp (at commits 4afcc378698e057fcde64e23eb664e5af8dd6956 and 5addcb120cf2682c7ede0b1c520592700d74c87c)

and got:

./main -m ../ggml-q4_0-stablelm-tuned-alpha-7b/ggml-model-stablelm-tuned-alpha-7b-q4_0.bin -p "this is a test"
main: seed = 1682468827
llama.cpp: loading model from ../ggml-q4_0-stablelm-tuned-alpha-7b/ggml-model-stablelm-tuned-alpha-7b-q4_0.bin
error loading model: missing tok_embeddings.weight
llama_init_from_file: failed to load model
main: error: failed to load model '../ggml-q4_0-stablelm-tuned-alpha-7b/ggml-model-stablelm-tuned-alpha-7b-q4_0.bin'


pratikchhapolika commented on June 16, 2024

Hi @jon-tow @python273! Why are there multiple .bin files inside stabilityai/stablelm-base-alpha-7b? When we load the model, which .bin file is loaded?
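The multiple .bin files are shards of a single checkpoint: transformers splits large state dicts into pytorch_model-00001-of-0000N.bin pieces plus a pytorch_model.bin.index.json mapping each tensor name to its shard, and from_pretrained loads all of them — none is optional. A sketch of the greedy splitting idea, with hypothetical tensor sizes (the default shard limit has historically been around 10GB, which matches the ~10GB + ~4.7GB pair measured earlier in this thread):

```python
# Greedy sharding sketch: pack tensors into shards under a size limit,
# then name the shards the way transformers does. Tensor sizes are
# hypothetical stand-ins, not the real stablelm layout.
MAX_SHARD = 10 * 10**9  # ~10GB limit (assumption based on the default)

tensor_bytes = {"a": 6 * 10**9, "b": 6 * 10**9, "c": 2 * 10**9}

shards, current, size = [], [], 0
for name, nbytes in tensor_bytes.items():
    if current and size + nbytes > MAX_SHARD:
        shards.append(current)
        current, size = [], 0
    current.append(name)
    size += nbytes
shards.append(current)

files = [f"pytorch_model-{i + 1:05d}-of-{len(shards):05d}.bin"
         for i in range(len(shards))]
print(files)  # the index json records which tensor lives in which file
```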

