Comments (9)
The actual model sizes are:
3B: 3,638,525,952
7B: 7,869,358,080
The fp32 weights are provided to allow users to reduce precision to suit their needs. We will consider providing the weights in f16 as well, since this is a common complaint :)
Thank you for pointing it out!
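In the meantime, halving precision yourself is straightforward in PyTorch; a minimal sketch on a toy module (the same mechanics apply to a full checkpoint -- 4 bytes per fp32 parameter become 2 bytes in fp16):

```python
from torch import nn

# Toy stand-in for a checkpoint: casting fp32 parameters to fp16
# halves the bytes per element (4 -> 2), i.e. ~2x smaller on disk.
model = nn.Linear(1024, 1024, bias=False)  # fp32 by default
fp32_bytes = sum(p.numel() * p.element_size() for p in model.parameters())

model = model.half()  # cast all weights to float16
fp16_bytes = sum(p.numel() * p.element_size() for p in model.parameters())

print(fp32_bytes, fp16_bytes)  # fp16 is exactly half
```

With transformers, the equivalent is loading with `torch_dtype=torch.float16` in `from_pretrained()` and calling `save_pretrained()` on the result.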
from stablelm.
For the sake of convenience (half the download size/RAM/VRAM), I've uploaded 16-bit versions of the tuned models to HF Hub:
https://huggingface.co/vvsotnikov/stablelm-tuned-alpha-7b-16bit
https://huggingface.co/vvsotnikov/stablelm-tuned-alpha-3b-16bit
There's a 4.9GB ggml 4bit GPTQ quantization for StableLM-7B up on HuggingFace which works in llama.cpp for fast CPU inference.
(For comparison, LLaMA-7B in the same format is 4.1GB. But, StableLM-7B is actually closer to 8B parameters than 7B.)
Ok, the size seems about right then.
# file sizes taken from disk; note the Hugging Face UI divides by 1000**3 instead
>>> (10_161_140_290+4_656_666_941) / 1024 / 1024 / 1024
13.800158380530775
>>> (3_638_525_952 * 4) / 1024 / 1024 / 1024
13.5545654296875
f16 weights would be nice, to download less stuff
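Continuing the arithmetic above, f16 checkpoints would come in at roughly half the fp32 size (2 bytes per parameter instead of 4):

```python
# Parameter counts quoted earlier in the thread, at 2 bytes per fp16 parameter.
params_3b = 3_638_525_952
params_7b = 7_869_358_080

fp16_3b_gib = params_3b * 2 / 1024**3
fp16_7b_gib = params_7b * 2 / 1024**3
print(f"{fp16_3b_gib:.2f} GiB, {fp16_7b_gib:.2f} GiB")  # -> 6.78 GiB, 14.66 GiB
```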
Hi, @andysalerno! I do expect these models to quantize quite well. They're pretty wide, which should make them less bandwidth-bound than similarly sized models when quantized.
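As a rough back-of-the-envelope for what quantization buys at this scale (weights-only estimate; real GPTQ/ggml files run somewhat larger because they also store per-group scales, embeddings, etc. -- e.g. the 4.9GB q4_0 file mentioned elsewhere in this thread):

```python
# Weights-only size estimate for the 7B model at lower bit widths.
params_7b = 7_869_358_080

for bits in (8, 4):
    gib = params_7b * bits / 8 / 1024**3
    print(f"{bits}-bit: {gib:.2f} GiB")  # -> 8-bit: 7.33 GiB, 4-bit: 3.66 GiB
```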
@jon-tow on this topic, do you expect these models to quantize well down to 4bits (or lower) via GPTQ and/or other quantizing strategies?
I don't see why not, since GPTQ seems to be a general technique that works well across different transformer models. But I'm asking because part of the reason behind Stable Diffusion's success is how well it runs on consumer hardware. So I'm wondering whether these models will follow a similar goal of running very well on consumer hardware, with quantization considered from the very beginning?
Yeah, we need a Colab for this stuff that doesn't crash from running out of RAM lol
There's a 4.9GB ggml 4bit GPTQ quantization for StableLM-7B up on HuggingFace which works in llama.cpp for fast CPU inference.
(For comparison, LLaMA-7B in the same format is 4.1GB. But, StableLM-7B is actually closer to 8B parameters than 7B.)
Hm, how do you actually run this?
I tried https://github.com/ggerganov/llama.cpp (4afcc378698e057fcde64e23eb664e5af8dd6956 and also 5addcb120cf2682c7ede0b1c520592700d74c87c)
and got:
./main -m ../ggml-q4_0-stablelm-tuned-alpha-7b/ggml-model-stablelm-tuned-alpha-7b-q4_0.bin -p "this is a test"
main: seed = 1682468827
llama.cpp: loading model from ../ggml-q4_0-stablelm-tuned-alpha-7b/ggml-model-stablelm-tuned-alpha-7b-q4_0.bin
error loading model: missing tok_embeddings.weight
llama_init_from_file: failed to load model
main: error: failed to load model '../ggml-q4_0-stablelm-tuned-alpha-7b/ggml-model-stablelm-tuned-alpha-7b-q4_0.bin'
Hi @jon-tow @python273, why do we have multiple .bin files inside stabilityai/stablelm-base-alpha-7b? When we load the model, which bin file is loaded?
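(The multiple .bin files are checkpoint shards: a pytorch_model.bin.index.json alongside them maps each parameter to the shard holding it, and transformers' from_pretrained() loads every shard the map references. A toy illustration of that mapping, with invented parameter and file names:)

```python
# Hypothetical miniature of a pytorch_model.bin.index.json (names invented):
index = {
    "metadata": {"total_size": 12_345_678},
    "weight_map": {
        "gpt_neox.embed_in.weight": "pytorch_model-00001-of-00002.bin",
        "gpt_neox.layers.0.attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
        "embed_out.weight": "pytorch_model-00002-of-00002.bin",
    },
}

# from_pretrained() consults this map and loads every shard it references,
# so all of the .bin files are used, not just one.
shards = sorted(set(index["weight_map"].values()))
print(shards)
```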