StableLM: Stability AI Language Models
License: Apache License 2.0
It would be nice to support MPS so this model can run on consumer hardware; it would be super useful, for example with Apple Shortcuts + Raycast, etc. I already have a bunch of GPT-4 shortcuts which I would be happy to try with a faster model that doesn't leak private data.
RuntimeError: MPS does not support cumsum op with int64 input
Since there is no code available, I cannot point out where the fix for this belongs.
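As a stopgap until there is an official fix, PyTorch's CPU fallback for unsupported MPS ops is often suggested; this is an assumption on my part, not a confirmed fix for this repo, and it must be set before torch is imported:

```python
import os

# Opt in to PyTorch's CPU fallback for ops the MPS backend doesn't
# implement (such as int64 cumsum). This must be set before `import torch`
# runs anywhere in the process, e.g. at the very top of the script.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
```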
Are there any plans for a model with a larger context length in the works? With Claude's 9k limit, GPT-4's 8k and 32k limits, and Jurassic-2's 8k limit, a model limited to 4096 tokens of context feels quite restrictive today. If it's feasible, could you consider giving the 175B model a larger context window, since it hasn't commenced training yet? From a local standpoint, an 8k-or-larger context model would be great, especially since before this release we were all stuck with LLaMA's 2k context window.
It would be great to get the instructions to run the 3B model locally on a gaming GPU (e.g. 3090/4090 with 24GB VRAM).
From this thread
GPU Model | VRAM (GB) | Tuned-3b | Tuned-7b |
---|---|---|---|
RTX 3090 | 24 | ✅ | ✅ |
RTX 4070 Ti | 12 | ✅ | |
RTX 4090 | 24 | ✅ | |
T4 | 16 | ✅ | ❌ |
A100 | 40 | ✅ | |
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-tuned-alpha-3b")
model = AutoModelForCausalLM.from_pretrained("stabilityai/stablelm-tuned-alpha-3b")
model.half().cuda()  # convert to fp16 and move to GPU

# Save the fp16 copy so it can be loaded later without the conversion step
model.save_pretrained('vvsotnikov/stablelm-tuned-alpha-3b-16bit')
tokenizer.save_pretrained('vvsotnikov/stablelm-tuned-alpha-3b-16bit')
```
Memory-saving options from the thread:
- 8-bit (BitsAndBytes): #17 (comment)
- torch_dtype=torch.float16 & low_cpu_mem_usage: #17 (comment)
- device_map=auto: #17 (comment)

model name | parameters | W (fp32) | W (fp16) | weights (VRAM) | load time (s) | works |
---|---|---|---|---|---|---|
stablelm-tuned-alpha-3b | 3637321728 | 13.55 | 6.78 | 7.03 | 18.62 | ✅ |
stablelm-tuned-alpha-7b | 7868755968 | 29.31 | 14.66 | 14.91 | 50.28 | ✅ |
Empirical activation-memory usage (numbers in bytes, fp32):
- 3b: total_tokens * 1,280,582
- 7b: total_tokens * 1,869,134

The regression fit is 0.99999989. For instance, with 32 input tokens and 512 output tokens, the 7b model's activations require about 969 MB of VRAM (almost 1 GB). I haven't tested with batch sizes other than 1.
Examples of a few recorded activations numbers:
model | input_tokens | out_tokens | total_tokens | VRAM (MB) |
---|---|---|---|---|
3b | 3072 | 1024 | 4096 | 5003 |
3b | 1024 | 512 | 1536 | 1875 |
3b | 64 | 1 | 65 | 78.19 |
3b | 8 | 1 | 9 | 9.77 |
7b | 3072 | 1024 | 4096 | 7304.22 |
7b | 2048 | 512 | 2560 | 4564.47 |
7b | 8 | 64 | 72 | 126.64 |
7b | 8 | 1 | 9 | 14.27 |
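The empirical formulas above can be wrapped into a small estimator (the per-token constants come directly from the regression; the helper function itself is my own sketch):

```python
# Per-token activation sizes in bytes (fp32), from the regression above.
BYTES_PER_TOKEN = {"3b": 1_280_582, "7b": 1_869_134}

def activation_vram_mb(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough activation VRAM estimate in MB, batch size 1."""
    total_tokens = input_tokens + output_tokens
    return total_tokens * BYTES_PER_TOKEN[model] / 1024**2

print(round(activation_vram_mb("3b", 1024, 512)))  # ~1876, close to the measured 1875 MB
```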
Hey, thanks for the code. Ironically, even the 3B model is crashing on Colab. This is after enabling 8-bit with fp16 precision.
Did it work for anyone?
Hi, on a Mac M1 I get an error saying Torch was not compiled with CUDA enabled:
Traceback (most recent call last):
File "/start.py", line 6, in
model.half().cuda()
File "/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 749, in cuda
return self._apply(lambda t: t.cuda(device))
File "/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
module._apply(fn)
File "/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
module._apply(fn)
File "/dev/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 664, in _apply
param_applied = fn(param)
File "/dev/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 749, in
return self._apply(lambda t: t.cuda(device))
File "/dev/miniforge3/lib/python3.10/site-packages/torch/cuda/__init__.py", line 221, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
Thanks
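On Apple Silicon there is no CUDA build of PyTorch, so the `model.half().cuda()` line cannot work there. A common pattern (a sketch only, with a tiny stand-in module instead of the real model) is to select the MPS device when available:

```python
import torch
from torch import nn

# Select MPS on Apple Silicon, otherwise fall back to CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"

model = nn.Linear(8, 8)   # stand-in for the StableLM model object
model = model.to(device)  # replaces model.half().cuda()
# Note: staying in fp32 here also avoids the fp16 LayerNorm error that
# appears later in this thread when running off-GPU.
```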
I get ---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In[12], line 3
1 #@title Setup
----> 3 import torch
4 from transformers import AutoModelForCausalLM, AutoTokenizer
6 from IPython.display import Markdown, display
ModuleNotFoundError: No module named 'torch'
even after installing torch. I'm on a Mac.
https://github.com/Stability-AI/StableLM/blob/main/notebooks/stablelm-alpha.ipynb
Is it possible to get embeddings from the model for my input text?
I.e., could I replace GPT-3 calls to OpenAI with some Python code and this model?
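StableLM is a causal LM with no dedicated embedding endpoint like OpenAI's, but one commonly used (assumed, not officially endorsed) approach is masked mean pooling over the model's last hidden states. A minimal sketch, with a random tensor standing in for `model(**inputs, output_hidden_states=True).hidden_states[-1]`:

```python
import torch

# `hidden` stands in for the model's last hidden states: (batch, seq, dim).
hidden = torch.randn(1, 5, 8)
mask = torch.tensor([[1, 1, 1, 0, 0]])  # attention mask; 0 marks padding

# Zero out padding positions, then average the remaining token vectors.
masked = hidden * mask.unsqueeze(-1)
embedding = masked.sum(dim=1) / mask.sum(dim=1, keepdim=True)
print(embedding.shape)  # torch.Size([1, 8])
```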
The license listed here is Apache 2.0. ( with Creative Commons BY-SA for the 'data' )
In clarification, and for the avoidance of any doubt, any read-me and associated documentation should indicate whether mature, explicit or NSFW content can (or cannot) be generated with the model/toolset, provided that the content (or its generation) does not breach applicable legal or regulatory requirements in a given user's jurisdiction or region. (You might also add applicable community standards here, but those can vary quite considerably.)
As well as the above, ideally the read-me (or a separate ethical-generation-and-use policy document) should indicate whether certain sensitive areas are allowed or disallowed. Some sample areas of potential concern follow (this is not an exhaustive list):
* Content which contains overt political or ideological content, or which is intended to inform/influence the views or choices of a potential (competent) reader on issues of public concern, or in an election. (Examples being campaign material, lobbying briefings, or public-service-announcement "fillers".)
* The use of fictionalized representations of potentially identifiable individuals (living or deceased), corporations (both current and defunct), and prominent brands, franchises, or trademarks associated with those individuals or corporations.
* Content which contains LGBTQI themes, including cross-dressing or explorations of non-binary and gender-fluid presentation.
* Content which, whilst not containing (explicit) depictions of actual sexual activity, may explore alternative sexuality, fetishes, or practices of a mutually consensual nature between informed, consenting adult participants.
* Use of profanity and pejoratives (in an appropriate context).
* Depictions of violence, crime, abuse, or self-harm (in line with the editorial standards typically applied in print or other media).
* Professional advice which would typically be given by a qualified individual under regulatory supervision (such as doctors, attorneys, financial advisers, architects, and engineers).
I know that this may seem to be overly cautious, but it would seem reasonable to have some kind of guidance document, beyond the typical "Do not do illegal, criminal or obscene things with this." warnings commonly given with other models. Especially given that LLM style technology is getting media attention.
Hi all, I'm currently using the default AutoModelForCausalLM. What models would be recommended for a classifier? I'd like to write a system prompt to classify user inputs.
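One low-effort option (an assumption on my part, not an official recommendation) is to keep the tuned chat model and do zero-shot classification through its documented `<|SYSTEM|>`/`<|USER|>`/`<|ASSISTANT|>` prompt format; the labels below are made up for illustration:

```python
labels = ["billing", "technical support", "other"]

# System prompt instructing the model to answer with a label only.
system_prompt = (
    "<|SYSTEM|># StableLM Tuned (Alpha version)\n"
    "- Classify the user's message into exactly one of: "
    + ", ".join(labels)
    + ". Reply with the label only.\n"
)

def build_prompt(text: str) -> str:
    """Wrap user text in StableLM's chat format for classification."""
    return f"{system_prompt}<|USER|>{text}<|ASSISTANT|>"

print(build_prompt("My invoice is wrong"))
```

The model's completion would then be matched against the label list; a dedicated classifier head would be more reliable, but this needs no extra training.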
Is it possible to have a larger context, since that would allow doing more complicated things with smaller models?
A lot of the negatives of a smaller model can be rectified by pushing more data into the context. For example: help pages, datasheets, examples, thinking rules, longer conversations trying to fix an issue, etc.
Please excuse me if this is the wrong place to ask this question, but context length is very rarely discussed. Thanks in advance.
(All setup scripts in the notebook executed successfully)
Getting this runtime error when executing Generate Text in the notebook:
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[16], line 41
38 inputs.to(model.device)
40 # Generate
---> 41 tokens = model.generate(
42 **inputs,
43 max_new_tokens=max_new_tokens,
44 temperature=temperature,
45 top_k=top_k,
46 top_p=top_p,
47 do_sample=do_sample,
48 pad_token_id=tokenizer.eos_token_id,
49 stopping_criteria=StoppingCriteriaList([StopOnTokens()])
50 )
52 # Extract out only the completion tokens
53 completion_tokens = tokens[0][inputs['input_ids'].size(1):]
File ~/Library/Python/3.9/lib/python/site-packages/torch/utils/_contextlib.py:115, in context_decorator..decorate_context(*args, **kwargs)
112 @functools.wraps(func)
113 def decorate_context(*args, **kwargs):
114 with ctx_factory():
--> 115 return func(*args, **kwargs)
File ~/Library/Python/3.9/lib/python/site-packages/transformers/generation/utils.py:1485, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, streamer, **kwargs)
...
2513 layer_norm, (input, weight, bias), input, normalized_shape, weight=weight, bias=bias, eps=eps
2514 )
-> 2515 return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
Up until this point I was using the default options. So I tried using "float" for the torch_dtype option:
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[18], line 41
38 inputs.to(model.device)
40 # Generate
---> 41 tokens = model.generate(
42 **inputs,
43 max_new_tokens=max_new_tokens,
44 temperature=temperature,
45 top_k=top_k,
46 top_p=top_p,
47 do_sample=do_sample,
48 pad_token_id=tokenizer.eos_token_id,
49 stopping_criteria=StoppingCriteriaList([StopOnTokens()])
50 )
52 # Extract out only the completion tokens
53 completion_tokens = tokens[0][inputs['input_ids'].size(1):]
File ~/Library/Python/3.9/lib/python/site-packages/torch/utils/_contextlib.py:115, in context_decorator..decorate_context(*args, **kwargs)
112 @functools.wraps(func)
113 def decorate_context(*args, **kwargs):
114 with ctx_factory():
--> 115 return func(*args, **kwargs)
File ~/Library/Python/3.9/lib/python/site-packages/transformers/generation/utils.py:1485, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, streamer, **kwargs)
...
-> 2560 next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
2562 # finished sentences should have their next token be a padding token
2563 if eos_token_id is not None:
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
In each of the above scenarios, the #@title Generate Text cell was failing in 0.1 s, but when I tried the other torch_dtype option, "bfloat16", it didn't fail until after 3 m 36 s. It failed again for the exact same reason as before:
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
M2 Mac running macOS 13.3.1.
I've seen there is a 4-bit GPTQ version of StableLM, and I'm curious if someone could point me to some resources describing how to convert the current model to 4-bit GPTQ. Any hint would be much appreciated.
Thanks for your amazing work! We have simply extended StableLM for video question answering in our project Ask-Anything.
In our attempts, it can generate longer content than ChatGPT, but without additional fine-tuning the current results are not satisfactory.
Now we are trying to build a real video chatbot with fantastic techniques. Hopefully everyone can try our demo and find problems; we will try our best to fix them in our future chatbot.
As seen in this popular spreadsheet by @lhl, StableLM-Alpha-7B currently scores below five-year-old 1 GB models with 700M parameters, and well below its architectural cousin GPT-J-6B, which was trained on only 300B tokens.
This is a serious issue which needs to be addressed.
Edit:
@abacaj on twitter posted these 3B results:
Hi, I want to fine-tune the 7b model, am I supposed to download the provided checkpoint and fine-tune it as shown in this repo: https://github.com/EleutherAI/gpt-neox#using-custom-data . Would they be compatible and did anyone here give it a shot? Thanks.
Needs a dependencies list to run the example
Hi there!
First of all, thank you for the amazing work!
The readme says the models were trained on "the new dataset based on The Pile" which is 3x the size of The Pile. Can you give more insights on the dataset and its content?
Thank you!
I ran this code from the README:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-tuned-alpha-7b")
model = AutoModelForCausalLM.from_pretrained("stabilityai/stablelm-tuned-alpha-7b")
model.half().cuda()

class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        stop_ids = [50278, 50279, 50277, 1, 0]
        for stop_id in stop_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False

system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)"""

prompt = f"{system_prompt}<|USER|>What's your mood today?<|ASSISTANT|>"

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
tokens = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.7,
    do_sample=True,
    stopping_criteria=StoppingCriteriaList([StopOnTokens()]),
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```
And I got this error:
Loading checkpoint shards: 25%|████▌ | 1/4 [00:07<00:23, 7.92s/it]
Traceback (most recent call last):
File "/home/ps/anaconda3/envs/pt/lib/python3.10/site-packages/transformers/modeling_utils.py", line 442, in load_state_dict
return torch.load(checkpoint_file, map_location="cpu")
File "/home/ps/anaconda3/envs/pt/lib/python3.10/site-packages/torch/serialization.py", line 797, in load
with _open_zipfile_reader(opened_file) as opened_zipfile:
File "/home/ps/anaconda3/envs/pt/lib/python3.10/site-packages/torch/serialization.py", line 283, in __init__
super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ps/anaconda3/envs/pt/lib/python3.10/site-packages/transformers/modeling_utils.py", line 446, in load_state_dict
if f.read(7) == "version":
File "/home/ps/anaconda3/envs/pt/lib/python3.10/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 128: invalid start byte
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/MyStudio/stableLM.py", line 5, in
model = AutoModelForCausalLM.from_pretrained("stabilityai/stablelm-tuned-alpha-7b")
File "/home/ps/anaconda3/envs/pt/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 468, in from_pretrained
return model_class.from_pretrained(
File "/home/ps/anaconda3/envs/pt/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2795, in from_pretrained
) = cls._load_pretrained_model(
File "/home/ps/anaconda3/envs/pt/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3110, in _load_pretrained_model
state_dict = load_state_dict(shard_file)
File "/home/ps/anaconda3/envs/pt/lib/python3.10/site-packages/transformers/modeling_utils.py", line 458, in load_state_dict
raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for '/home/ps/.cache/huggingface/hub/models--stabilityai--stablelm-tuned-alpha-7b/snapshots/25071b093c15c0d1cb2b2876c6deb621b764fcf5/pytorch_model-00002-of-00004.bin' at '/home/ps/.cache/huggingface/hub/models--stabilityai--stablelm-tuned-alpha-7b/snapshots/25071b093c15c0d1cb2b2876c6deb621b764fcf5/pytorch_model-00002-of-00004.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
How do I fix this?
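The "failed finding central directory" error usually means a shard in the local Hugging Face cache is truncated or corrupt (for example from an interrupted download); deleting that `.bin` file and calling `from_pretrained()` again forces a re-download. Since PyTorch checkpoints are zip archives, you can check which shard is broken with a small helper (the function name is my own, not part of transformers):

```python
import zipfile

def shard_is_valid(path: str) -> bool:
    """Return True if a PyTorch .bin checkpoint shard is a readable zip."""
    try:
        with zipfile.ZipFile(path) as zf:
            return zf.testzip() is None  # None means no corrupt members
    except (zipfile.BadZipFile, FileNotFoundError, OSError):
        return False
```

Run it over each `pytorch_model-*.bin` in the snapshot directory and delete the ones that fail before retrying.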
How to install and run this on Ubuntu server?
Loading stabilityai/stablelm-base-alpha-7b with torch_dtype='float16', load_in_8bit=False, device_map='auto' fails:
TypeError Traceback (most recent call last)
Cell In[10], line 17
     14 cprint(f"Loading with: {torch_dtype=}, {load_in_8bit=}, {device_map=}\n")
16 tokenizer = AutoTokenizer.from_pretrained(model_name)
---> 17 model = AutoModelForCausalLM.from_pretrained(
18 model_name,
19 torch_dtype=getattr(torch, torch_dtype),
20 load_in_8bit=load_in_8bit,
21 device_map=device_map,
     22     offload_folder="./offload",
23 )
File [~/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:463], in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
461 elif type(config) in cls._model_mapping.keys():
462 model_class = _get_model_class(config, cls._model_mapping)
--> 463 return model_class.from_pretrained(
464 pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
465 )
466 raise ValueError(
    467         f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
    468         f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
469 )
File [~/.local/lib/python3.10/site-packages/transformers/modeling_utils.py:2406], in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
...
-> 2406 dispatch_model(model, device_map=device_map, offload_dir=offload_folder, offload_index=offload_index)
2408 if output_loading_info:
2409 if loading_info is None:
TypeError: dispatch_model() got an unexpected keyword argument 'offload_index'
Hi,
when executing the model on AWS Sagemaker, I get the following error:
PredictionException: Could not load model /.sagemaker/mms/models/stabilityai__stablelm-tuned-alpha-7b with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.gpt_neox.modeling_gpt_neox.GPTNeoXForCausalLM'>)
In the notebook AutoModelForCausalLM is used too.
Maybe the transformers version used (4.26) doesn't support StableLM.
Does anyone know which transformers version is needed?
Does anyone have experience with running StableLM on AWS SageMaker?
Code for recreating the issue:
```python
from sagemaker.huggingface.model import HuggingFaceModel

hub = {
    'HF_MODEL_ID': 'stabilityai/stablelm-tuned-alpha-7b',
    'HF_TASK': 'text-generation'
}

huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version='py39',
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.8xlarge"
)

prompt = f"""<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
<|USER|>Can you write a song about a pirate at sea?
<|ASSISTANT|>"""

result = predictor.predict(prompt)
predictor.delete_endpoint()
print(result)
```
StableLM looks like GPT-NeoX and has query_key_value parameters. I thought I could apply LoRA to StableLM by specifying target_modules='query_key_value', but I got the following error:
Traceback (most recent call last):
File "/root/workspace/finetune.py", line 288, in <module>
fire.Fire(train)
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/root/workspace/finetune.py", line 189, in train
model = get_peft_model(model, config)
File "/usr/local/lib/python3.10/dist-packages/peft/mapping.py", line 112, in get_peft_model
return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config)
File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 647, in __init__
super().__init__(model, peft_config, adapter_name)
File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 91, in __init__
self.base_model = PEFT_TYPE_TO_MODEL_MAPPING[peft_config.peft_type](
File "/usr/local/lib/python3.10/dist-packages/peft/tuners/lora.py", line 132, in __init__
self.add_adapter(adapter_name, self.peft_config[adapter_name])
File "/usr/local/lib/python3.10/dist-packages/peft/tuners/lora.py", line 139, in add_adapter
self._find_and_replace(adapter_name)
File "/usr/local/lib/python3.10/dist-packages/peft/tuners/lora.py", line 225, in _find_and_replace
raise ValueError(
ValueError: Target modules query_key_value not found in the base model. Please check the target modules and try again.
Is there any solution?
Thank you in advance!
In [5]: model.named_parameters
Out[5]:
<bound method Module.named_parameters of GPTNeoXForCausalLM(
(gpt_neox): GPTNeoXModel(
(embed_in): Embedding(50688, 4096)
(layers): ModuleList(
(0-15): 16 x GPTNeoXLayer(
(input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
(post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
(attention): GPTNeoXAttention(
(rotary_emb): RotaryEmbedding()
(query_key_value): Linear(in_features=4096, out_features=12288, bias=True)
(dense): Linear(in_features=4096, out_features=4096, bias=True)
)
(mlp): GPTNeoXMLP(
(dense_h_to_4h): Linear(in_features=4096, out_features=16384, bias=True)
(dense_4h_to_h): Linear(in_features=16384, out_features=4096, bias=True)
(act): GELUActivation()
)
)
)
(final_layer_norm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
)
(embed_out): Linear(in_features=4096, out_features=50688, bias=False)
)>
What's the difference between ChatGPT and Chatbot?
Is the code used for pre-training this public?
Please consider uploading the model to https://replicate.com/ so it is easier to use.
I've double-checked the description on the Hugging Face Hub, and it seems that the StableLM models (3b & 7b) are only pre-trained on English. That means they don't support other languages, right?
👀
Oh my god
I have added several tokens to the stop_ids, but it seems not to respect even the default ones:
stop_ids = set([50278, 50279, 50277, 1, 0,187])
Represented as decoded outputs these are:
<|USER|><|ASSISTANT|><|SYSTEM|><|padding|><|endoftext|>\n
However it still generates these tokens, here is my sample output:
<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.<|USER|>Where is the capital of germany?<|ASSISTANT|>The capital of Germany is Berlin.<|USER|>What are some notable attractions or landmarks in Berlin, Germany that tourists can visit?<|ASSISTANT|>Some notable attractions and landmarks in Berlin, Germany that tourists can visit include:
1. Brandenburg Gate - a beautiful and historic monument that was the symbol of Berlin from the late 18th
I've tried omitting the skipping of special tokens, and also tweaked the system prompt to include other stop sequences and explicitly telling it not to generate more than just a single output, but it didn't work for me
Any advice?
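For reference, the stop check from the README only fires when the stop id is the very last generated token; if the tokenizer merges, say, the newline (187) into a larger token piece, the check never matches. A self-contained sketch of that logic (a plain class mirroring the README's StopOnTokens, exercised with toy tensors):

```python
import torch

class StopOnTokens:  # mirrors the check logic of the README's StoppingCriteria
    def __init__(self, stop_ids):
        self.stop_ids = set(stop_ids)

    def __call__(self, input_ids: torch.LongTensor, scores=None, **kwargs) -> bool:
        # Only the final token is inspected on each generation step.
        return int(input_ids[0, -1]) in self.stop_ids

crit = StopOnTokens([50278, 50279, 50277, 1, 0, 187])
print(crit(torch.tensor([[42, 187]])))  # True: the stop id is the last token
print(crit(torch.tensor([[187, 42]])))  # False: the stop id is not last
```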
Is there another model, rather than the chat one, that focuses on NL classification tasks? If yes, please also give an example.
I appreciate the team's quick turnaround here! I'd love to learn whether the model will support Chinese, and whether it will perform as well as it does in English.
Hi,
Just curious: will Stability release the source code that was used to build the model? I know without weights/training set that source won't be of much use, but I would still like to see the source code so that we know what's under the hood.
Thanks,
Vivek
Originally posted by @Dungkamon in #24 (comment)
First, I would like to thank the folks at Stability AI for their generous contribution of these base models under a permissive license.
Do you plan on releasing training data (i.e. wandb) logs?
I'm also curious why training was stopped at 800B tokens, while the LLaMA models were trained up to 1T and 1.4T tokens. Is there any plan to continue training the base models up to 1T tokens or beyond? It appeared as though the LLaMA models were continuing to improve even up to 1.4T tokens.
Does using a diffusion model in a language model increase the generality of the language model?
The license of the finetuned checkpoints currently makes no sense.
The base model was almost certainly trained on a ton of unlicensed all-rights-reserved data. In particular, the README says that it was trained on a dataset derived from the Pile, which includes ~100GB of commercial (some might say "pirated") ebooks (the Books3 dataset). And yet this model is licensed under CC BY-SA.
The finetuned model was trained on data which is under a less restrictive license (CC BY-NC, which is less restrictive than "all rights reserved") and yet suddenly the model has to follow the license of the data that was used for training?
This makes no sense. If training on unlicensed/all-rights-reserved data and releasing that model under an arbitrary license is OK then training it on less restrictive CC BY-NC data and releasing it under an arbitrary license is OK too. Alternatively, if the model has to follow the license of the data on which it was trained on then the base model has to be taken down as it was trained on all-rights-reserved data for which you had no license.
I get an error when I try to use the model on an ml.g4dn.4xlarge instance.
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 592.00 MiB (GPU 0; 14.62 GiB total capacity; 14.33 GiB already allocated; 175.94 MiB free; 14.33 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I am using the script in the README QuickStart section.
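A quick back-of-envelope check using the weight sizes reported earlier in this thread explains the OOM: the g4dn.4xlarge has a single 16 GB T4 (~14.6 GiB usable), and the 7b weights alone need ~14.7 GiB in fp16, before any activations. Loading in 8-bit (or using a larger GPU) is the usual way out:

```python
# Parameter count for stablelm-tuned-alpha-7b, from the table earlier on.
params_7b = 7_868_755_968

fp16_gib = params_7b * 2 / 1024**3  # 2 bytes per weight
int8_gib = params_7b * 1 / 1024**3  # 1 byte per weight with load_in_8bit
print(round(fp16_gib, 2), round(int8_gib, 2))  # 14.66 7.33
```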
May I ask when will the larger model be released?
Is there a schedule (15B, 30B, 65B) for reference?
Hi, I just did a few tests on https://huggingface.co/spaces/stabilityai/stablelm-tuned-alpha-chat. I copied some random code from GitHub, removed some stuff, and asked: what is wrong with this code?
```javascript
const express = require("express");
const cors = require("cors");
const app = express();
var corsOptions = {
  origin: "http://localhost:8081"
};
app.use(cors(corsOptions));
app.use(express.json());
app.use(express.urlencoded({ extended: true }));
app.get("/", (req, res) => {
  res.json({ message: "Welcome to bezkoder application."
});
require("./app/routes/tutorial.routes.js")(app);
const PORT = process.env.PORT || 8080;
app.listen(PORT, () => {
  console.log(`Server is running on port ${PORT}.`);
});
```
and it fails to detect the problem. StableLM's response:
There doesn’t seem to be anything necessarily wrong with the code provided, but it is missing some important details such as defining the routes, setting up the server and client correctly, and ensuring that all necessary libraries and dependencies are installed. Additionally, there might be errors or warnings that could be causing issues.
ChatGPT's response:
There seems to be an error in the code.
In the app.get() method, the response object res.json() has an unclosed bracket for the object passed in. It should be closed with a curly brace "}" like this:
```javascript
app.get("/", (req, res) => {
  res.json({ message: "Welcome to bezkoder application." });
});
```
Without the closing brace, the code will result in a syntax error.
That being said, can someone help me please? I'm looking for some type of AI scoped to specific coding languages, with the rest of the unnecessary data removed, to help with coding and debugging for Node.js and PHP.
If anyone could give me some suggestions on what steps to take to achieve this, it would mean a lot.
Thank you.
https://huggingface.co/stabilityai/stablelm-base-alpha-3b/tree/main
It looks like the 3B checkpoint is 14.7 GB, and if I understand correctly, it's supposed to be fp16. Even with fp32, it should be about 11.2 GB; with fp16, 5.6 GB. Am I missing something?
For reference LLaMA 7B (f16) is 12.6G.
Update: I guess it's actually fp32. But it still seems a little bigger than it should be?
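Using the parameter count reported earlier in the thread (3,637,321,728 rather than an even 3B), the arithmetic does match an fp32 checkpoint:

```python
params_3b = 3_637_321_728  # from the weights table earlier in this thread

fp32_gib = params_3b * 4 / 1024**3  # ~13.55 GiB if stored in fp32
fp16_gib = params_3b * 2 / 1024**3  # ~6.78 GiB if it were really fp16
print(round(fp32_gib, 2), round(fp16_gib, 2))  # 13.55 6.78
```

Note that 13.55 GiB is about 14.5 GB in decimal units, which roughly matches the listed 14.7 GB file size.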
My output is different from the Hugging Face demo, and my output is very short.