
StableLM: Stability AI Language Models

“A Stochastic Parrot, flat design, vector art” — Stable Diffusion XL

This repository contains Stability AI's ongoing development of the StableLM series of language models and will be continuously updated with new checkpoints. The following provides an overview of all currently available models. More coming soon.

News

September 29, 2023

  • Released StableLM-3B-4E1T, a 3B parameter base model trained for 4 epochs on 1 trillion tokens.

August 5, 2023

  • Released patched StableLM-Alpha v2 models with 3B and 7B parameters.

April 28, 2023

  • Released StableVicuna-13B, our RLHF fine-tune of Vicuna-13B v0, which is itself a fine-tune of LLaMA-13B. Delta weights over the original LLaMA model are released under (CC BY-NC-SA-4.0).

April 20, 2023

  • Released initial set of StableLM-Alpha models, with 3B and 7B parameters. Base models are released under CC BY-SA-4.0.

  • Try chatting with our 7B model, StableLM-Tuned-Alpha-7B, on Hugging Face Spaces.

Models

StableLM-3B-4E1T

Technical Report: StableLM-3B-4E1T

StableLM-3B-4E1T is a 3 billion (3B) parameter language model pre-trained under the multi-epoch regime to study the impact of repeated tokens on downstream performance. Given prior success in this area (Tay et al., 2023 and Taylor et al., 2022), we train on 1 trillion (1T) tokens for 4 epochs following the observations of Muennighoff et al. (2023) in "Scaling Data-Constrained Language Models" in which they find "training with up to 4 epochs of repeated data yields negligible changes to loss compared to having unique data." Further inspiration for the token count is taken from "Go smol or go home" (De Vries, 2023), which suggests a 2.96B model trained for 2.85 trillion tokens achieves a similar loss to a Chinchilla compute-optimal 9.87B language model ($k_n = 0.3$).
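
As a back-of-the-envelope check of the numbers above (our arithmetic, not from the report):

```python
# 1T unique tokens repeated for 4 epochs yields the 4T total-token figure.
unique_tokens = 1_000_000_000_000  # 1T
epochs = 4
total_tokens = unique_tokens * epochs
print(f"{total_tokens:,}")  # 4,000,000,000,000
```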

| Size | StableLM-3B-4E1T | Training Tokens | Parameters    |
|------|------------------|-----------------|---------------|
| 3B   | checkpoint       | 4T              | 2,795,443,200 |

Model Architecture

The model is a decoder-only transformer similar to the LLaMA (Touvron et al., 2023) architecture with the following modifications:

| Parameters    | Hidden Size | Layers | Heads | Sequence Length |
|---------------|-------------|--------|-------|-----------------|
| 2,795,443,200 | 2560        | 32     | 32    | 4096            |
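
For quick reference, the table's hyperparameters can be mirrored in code (a sketch; the key names below are illustrative, not the model's actual configuration schema):

```python
# Illustrative snapshot of the architecture table above; the key names are
# our own and do not reflect the model's real configuration schema.
arch = {
    "parameters": 2_795_443_200,
    "hidden_size": 2560,
    "num_layers": 32,
    "num_heads": 32,
    "sequence_length": 4096,
}

# Each attention head operates over hidden_size / num_heads dimensions.
head_dim = arch["hidden_size"] // arch["num_heads"]
print(head_dim)  # 80
```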

Training Data

The dataset comprises a filtered mixture of open-source large-scale datasets available on the HuggingFace Hub: the Falcon RefinedWeb extract (Penedo et al., 2023), RedPajama-Data (Together Computer, 2023) and The Pile (Gao et al., 2020), both with the Books3 and certain other subsets removed, and StarCoder (Li et al., 2023).

Given the large amount of web data, we recommend fine-tuning the base StableLM-3B-4E1T for your downstream tasks.

Training Details

Please refer to the provided YAML configuration file stablelm-3b-4e1t.yml for complete hyperparameter settings and the technical report for further details.

Downstream Results

The following zero-shot evaluations are performed with the lm-evaluation-harness using the lm-bench branch of Stability AI's fork. Full lm-eval JSONs can be found in the evals directory.

| Pre-Trained Model | Average | ARC Challenge | ARC Easy | BoolQ | HellaSwag (✱) | LAMBADA OpenAI | OpenBookQA | PIQA | SciQ | Winogrande |
|---|---|---|---|---|---|---|---|---|---|---|
| meta-llama/Llama-2-13b-hf | 71.77 | 48.63 | 79.50 | 80.52 | 79.36 | 76.77 | 35.40 | 79.05 | 94.50 | 72.22 |
| huggyllama/llama-7b | 68.84 | 41.89 | 75.25 | 75.05 | 76.22 | 73.55 | 34.40 | 78.67 | 94.60 | 69.93 |
| meta-llama/Llama-2-7b-hf | 68.75 | 43.00 | 76.26 | 77.74 | 75.94 | 73.47 | 31.40 | 77.75 | 93.60 | 69.61 |
| Qwen/Qwen-7B | 67.91 | 45.39 | 67.38 | 74.56 | 88.85 (?) | 69.67 | 32.20 | 73.99 | 93.20 | 65.98 |
| tiiuae/falcon-7b | 67.83 | 40.27 | 74.41 | 73.55 | 76.35 | 74.56 | 30.60 | 79.49 | 94.00 | 67.25 |
| mosaicml/mpt-7b | 67.36 | 40.53 | 74.92 | 73.94 | 76.17 | 68.64 | 31.40 | 78.89 | 93.70 | 68.03 |
| stabilityai/stablelm-3b-4e1t | 66.93 | 37.80 | 72.47 | 75.63 | 73.90 | 70.64 | 31.40 | 79.22 | 94.80 | 66.54 |
| baichuan-inc/Baichuan2-7B-Base | 66.93 | 42.24 | 75.00 | 73.09 | 72.29 | 70.99 | 30.40 | 76.17 | 94.60 | 67.56 |
| stabilityai/stablelm-base-alpha-7b-v2 | 66.89 | 38.48 | 73.19 | 70.31 | 74.27 | 74.19 | 30.40 | 78.45 | 93.90 | 68.82 |
| openlm-research/open_llama_7b_v2 | 66.32 | 38.82 | 71.93 | 71.41 | 74.65 | 71.05 | 30.20 | 79.16 | 93.80 | 65.82 |
| microsoft/phi-1_5 | 65.57 | 44.45 | 76.14 | 74.53 | 62.62 | 52.75 | 37.60 | 76.33 | 93.20 | 72.53 |
| EleutherAI/gpt-neox-20B | 65.57 | 37.88 | 72.90 | 69.48 | 71.43 | 71.98 | 29.80 | 77.42 | 93.10 | 66.14 |
| togethercomputer/RedPajama-INCITE-7B-Base | 65.07 | 37.71 | 72.35 | 70.76 | 70.33 | 71.34 | 29.00 | 77.15 | 92.70 | 64.33 |
| cerebras/btlm-3b-8k-base (§) | 63.59 | 34.90 | 70.45 | 69.63 | 69.78 | 66.23 | 27.60 | 75.84 | 92.90 | 64.96 |
| EleutherAI/pythia-12b | 62.69 | 31.83 | 70.20 | 67.31 | 67.38 | 70.64 | 26.40 | 76.28 | 90.20 | 64.01 |
| openlm-research/open_llama_3b_v2 | 62.43 | 33.87 | 67.59 | 65.69 | 69.99 | 66.74 | 26.00 | 76.66 | 92.40 | 62.90 |
| EleutherAI/gpt-j-6B | 62.34 | 33.96 | 66.96 | 65.44 | 66.24 | 68.23 | 29.00 | 75.57 | 91.50 | 64.17 |
| stabilityai/stablelm-base-alpha-3b-v2 | 62.19 | 32.42 | 67.26 | 64.56 | 68.58 | 70.25 | 26.40 | 76.01 | 92.10 | 62.12 |
| facebook/opt-6.7b | 61.85 | 30.72 | 65.66 | 66.02 | 67.20 | 67.65 | 27.60 | 76.33 | 90.10 | 65.35 |
| EleutherAI/pythia-6.9b | 60.58 | 31.83 | 67.21 | 64.01 | 63.88 | 67.01 | 25.80 | 75.08 | 89.80 | 60.62 |
| EleutherAI/pythia-2.8b-deduped | 58.52 | 30.12 | 63.47 | 64.13 | 59.44 | 65.15 | 23.80 | 74.10 | 88.20 | 58.25 |

§ Previous 3B Pre-Trained SOTA
? Outlier Results
✱ Byte-length Normalized Accuracy

StableLM-3B-4E1T achieves state-of-the-art performance (September 2023) at the 3B parameter scale for open-source models and is competitive with many of the popular contemporary 7B models, even outperforming our most recent 7B StableLM-Base-Alpha-v2.

StableLM-Alpha v2

StableLM-Alpha v2 models significantly improve on the initial Alpha models by incorporating architectural improvements such as SwiGLU (Shazeer, 2020) and using higher-quality data sources, as discussed below. The context length for these models is 4096 tokens.
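
As a rough illustration of the SwiGLU feed-forward variant mentioned above, here is a NumPy sketch of the standard formulation from Shazeer (2020); the weight matrices `W` and `V` and the toy shapes are placeholders, not the model's actual implementation:

```python
import numpy as np

def silu(x):
    # SiLU ("swish") activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu(x, W, V):
    # SwiGLU gating (Shazeer, 2020): SiLU(xW) elementwise-multiplied by xV.
    return silu(x @ W) * (x @ V)

# Toy shapes only; real models use the transformer's hidden/intermediate sizes.
x = np.random.randn(2, 8)
W = np.random.randn(8, 16)
V = np.random.randn(8, 16)
print(swiglu(x, W, V).shape)  # (2, 16)
```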

| Size | StableLM-Base-Alpha-v2 | Training Tokens | Parameters    |
|------|------------------------|-----------------|---------------|
| 3B   | checkpoint             | 1.1T            | 2,796,431,360 |
| 7B   | checkpoint             | 1.1T            | 6,890,209,280 |

Training Details

Please refer to the provided YAML configuration files for hyperparameter details; e.g., for the extended StableLM-Alpha-3B-v2 model, see stablelm-base-alpha-3b-v2-4k-extension.yml.

Following similar work, we use a multi-stage approach to context length extension (Nijkamp et al., 2023), scheduling 1 trillion tokens at context length 2048 followed by 100 billion tokens at 4096. We found that sequence length warmup (Li et al., 2022) helped stabilize early spikes during the first ~80 billion tokens of pre-training. However, it was not applied to the final runs due to significant throughput penalties as length shapes grew across the curriculum.
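
The token budget for this schedule can be summarized as follows (a sketch of the numbers quoted above, not the actual training configuration):

```python
# Two-stage context-length extension budget described above.
stages = [
    {"context_length": 2048, "tokens": 1_000_000_000_000},  # stage 1: 1T tokens
    {"context_length": 4096, "tokens": 100_000_000_000},    # stage 2: 100B tokens
]
total_tokens = sum(stage["tokens"] for stage in stages)
print(f"{total_tokens:,}")  # 1,100,000,000,000 -- the 1.1T total reported for these models
```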

Training Data

The most impactful changes for StableLM-Alpha-v2 downstream performance came from using higher-quality data sources and mixtures; specifically, replacing The Pile v2 Common-Crawl scrape with RefinedWeb and C4, and sampling web text at a much higher rate (35% -> 71%).

The first pre-training stage relies on 1 trillion tokens sourced from a mix of the public Falcon RefinedWeb extract (Penedo et al., 2023), RedPajama-Data (Together Computer., 2023), The Pile (Gao et al., 2020), and internal datasets with web text sampled at a rate of 71%.

In the second stage, we include the StarCoder (Li et al., 2023) dataset and downsample web text to 55% while increasing the sampling proportions of naturally long text examples in the aforementioned sources.

Evaluation

The following zero-shot evaluations are performed with the lm-evaluation-harness at commit df3da98c5405deafd519c2ddca52bb7c3fe36bef, with the exception of SIQA, which uses the add-siqa branch with prompt format `{doc['context']}\nQuestion: {doc['question']}\nAnswer:`.
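
For illustration, the SIQA prompt template quoted above expands as follows (the example document is invented):

```python
# Hypothetical SIQA-style example; the field values are made up for illustration.
doc = {
    "context": "Alex poured water over the campfire before leaving.",
    "question": "Why did Alex do this?",
}
prompt = f"{doc['context']}\nQuestion: {doc['question']}\nAnswer:"
print(prompt)
```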

| Model | ARC Challenge✱ | ARC Easy✱ | BoolQ | HellaSwag✱ | LAMBADA OpenAI | OpenBookQA | PIQA | SIQA | TruthfulQA▲ | Winogrande | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|
| StableLM-Alpha-7B-v2 | 40.53 | 69.11 | 70.31 | 74.27 | 74.19 | 30.40 | 78.45 | 42.43 | 36.46 | 68.82 | 58.50 |
| LLaMA-2-7B | 46.16 | 74.54 | 77.74 | 75.94 | 73.47 | 31.40 | 77.75 | 43.50 | 38.97 | 69.61 | 60.91 |
| MPT-7B | 41.89 | 70.03 | 73.94 | 76.17 | 68.64 | 31.40 | 78.89 | 45.14 | 33.49 | 68.03 | 58.76 |
| OpenLLaMA-7B-v2 | 42.41 | 69.65 | 71.41 | 74.65 | 71.05 | 30.20 | 79.16 | 41.97 | 34.57 | 65.82 | 58.09 |
| RedPajama-INCITE-7B-Base | 39.42 | 69.19 | 70.76 | 70.33 | 71.34 | 29.00 | 77.15 | 42.58 | 33.01 | 64.33 | 56.71 |
| StableLM-Alpha-3B-v2 | 35.07 | 63.26 | 64.56 | 68.58 | 70.25 | 26.40 | 76.01 | 42.48 | 35.87 | 62.12 | 54.46 |
| BTLM-3B-8K | 37.63 | 67.09 | 69.63 | 69.78 | 66.23 | 27.60 | 75.84 | 42.78 | 36.00 | 64.96 | 55.75 |
| OpenLLaMA-3B-v2 | 36.09 | 63.51 | 65.69 | 69.99 | 66.74 | 26.00 | 76.66 | 41.20 | 34.59 | 62.90 | 54.34 |
| Pythia-2.8B (deduped) | 32.94 | 59.09 | 64.13 | 59.44 | 65.15 | 23.80 | 74.10 | 40.94 | 35.56 | 58.25 | 51.34 |
| StableLM-Alpha-7B | 27.05 | 44.87 | 60.06 | 41.22 | 55.11 | 21.40 | 66.76 | 39.46 | 39.96 | 50.12 | 44.60 |
| StableLM-Alpha-3B | 25.77 | 42.05 | 57.65 | 38.31 | 41.72 | 17.00 | 63.82 | 35.62 | 40.53 | 52.64 | 41.51 |

✱: Denotes byte-length normalized accuracy (acc_norm) as described in Gao, 2021.

▲: We score TruthfulQA using the normalized total probability assigned to the set of true answers (mc2).

StableLM-Alpha

StableLM-Alpha models are trained on a new dataset that builds on The Pile and contains 1.5 trillion tokens, roughly 3x the size of The Pile. The context length for these models is 4096 tokens.

As a proof-of-concept, we also fine-tuned the model with Stanford Alpaca's procedure using a combination of five recent datasets for conversational agents: Stanford's Alpaca, Nomic-AI's gpt4all, RyokoAI's ShareGPT52K, Databricks labs' Dolly, and Anthropic's HH. We will be releasing these models as StableLM-Tuned-Alpha.

| Size | StableLM-Base-Alpha | StableLM-Tuned-Alpha | Training Tokens | Parameters    | Web Demo     |
|------|---------------------|----------------------|-----------------|---------------|--------------|
| 3B   | checkpoint          | checkpoint           | 800B            | 3,638,525,952 |              |
| 7B   | checkpoint          | checkpoint           | 800B            | 7,869,358,080 | Hugging Face |

StableVicuna

StableVicuna is an RLHF fine-tune of Vicuna-13B v0, which is itself a fine-tune of LLaMA-13B. It is our attempt at creating an open-source RLHF LLM chatbot. The model was developed by Stability AI's CarperAI team, with Duy V. Phung leading the training effort.

Due to the original non-commercial license of LLaMA, we can only release the weights of our model as deltas over the original model's weights. StableVicuna's delta weights are released under (CC BY-NC-SA-4.0).

Please visit the HuggingFace checkpoint for more information on how to combine our delta weights with the original model.

| Model            | Download   | Web Demo     | Cite                 |
|------------------|------------|--------------|----------------------|
| StableVicuna-13B | checkpoint | Hugging Face | DOI:10.57967/hf/0588 |

Quickstart

All StableLM models are hosted on the Hugging Face hub. Check out this notebook to run inference with limited GPU capabilities.

Get started chatting with StableLM-Tuned-Alpha by using the following code snippet:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-tuned-alpha-7b")
model = AutoModelForCausalLM.from_pretrained("stabilityai/stablelm-tuned-alpha-7b")
model.half().cuda()  # run in fp16 on the GPU

class StopOnTokens(StoppingCriteria):
    """Stop generation once the last emitted token is a special/stop token."""
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        stop_ids = {50278, 50279, 50277, 1, 0}
        return input_ids[0][-1] in stop_ids

system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

prompt = f"{system_prompt}<|USER|>What's your mood today?<|ASSISTANT|>"

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
tokens = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.7,
    do_sample=True,
    stopping_criteria=StoppingCriteriaList([StopOnTokens()]),
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```

StableLM-Tuned should be used with prompts formatted as `<|SYSTEM|>...<|USER|>...<|ASSISTANT|>`. The system prompt is:

<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
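
A minimal helper for assembling prompts in this format might look like the following (a sketch; `build_prompt` is not part of this repository):

```python
def build_prompt(system_prompt: str, user_message: str) -> str:
    # Assemble a single-turn prompt in the <|SYSTEM|>...<|USER|>...<|ASSISTANT|>
    # format described above; the model's reply is generated after the final
    # <|ASSISTANT|> tag.
    return f"{system_prompt}<|USER|>{user_message}<|ASSISTANT|>"

prompt = build_prompt("<|SYSTEM|>You are StableLM.\n", "What's your mood today?")
```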

Fun with StableLM-Tuned-Alpha

This section contains a collection of fun cherry-picked examples of what you can do with stablelm-tuned-alpha.

Chit-Chat
User: Hey! My name is John nice to meet you!

StableLM: Hello John! It’s always a pleasure to meet new people.

User: The pleasure is mine. Hey my grandma is in Toronto. I want to plan a surprise visit for her. She’s a person who loves gardens and flowers.

StableLM: That’s wonderful! It’s great to hear that you’ll be visiting your grandma in Toronto. Can you tell me a little bit more about your plans?

User: Ya I need to book a flight there and a hotel. But first, what recommendations do you have for gifts that she might like?

StableLM: I think that’s a very thoughtful idea! It’s a very personal gift that would be sure to make her very happy. I can think of a few examples that she might really enjoy. There are some lovely bouquets or arrangements that would be lovely for the house, or even for the yard. Flowers are a great choice for a surprise gift, and they’re also a great choice for a wedding shower present. Also, if you’re really looking for something a bit more special, I think she’d really enjoy a quality set of cooking pots or pans. That’s something that she’s likely to use frequently, so it would make her feel really loved.

User: Thanks for the suggestion! Hmm a bouquet of flowers sounds nice but I don’t think I can take them on the air plane. What do you think I should do?

StableLM: There are many possible solutions! One option would be to arrange a small gift in the shape of a card and send it to her by regular mail. Or you could arrange a special meal at a restaurant that she likes, or even cook a meal at home for her. Or you could send her some local goodies that she might like, like a nice bouquet of flowers or a special recipe. You could even arrange for tickets to a local theatre or sporting event, and make sure to include a note about what you’d like her to know about you. Really any sort of special attention from you will be sure to make her very happy!

User: Ah! Right. I can order a bouquet for her online and get it delivered! Thanks!

StableLM: You’re very welcome! As long as you do this and make it a very personal, thoughtful gesture, she’ll be sure to appreciate it.
Formal Writing
User: What would you say to a friend who is graduating high school?

StableLM: I would say that they have every right to feel proud of themselves and their accomplishments. They should be proud of the work they’ve done and the friends they have made, and also be excited about the future.

User: Please write an email.
StableLM: Okay, here’s an example of what could be written:

Dear [Friend’s Name],

I hope this email finds you well. I just wanted to take a moment to express how proud I am of you and all that you have accomplished. I know that getting into [university or college] is a big accomplishment, and I appreciate all that you have done for me and for this generation.

I hope that you are having a good day. I’m sure that you are also concerned about the future, but I just wanted to let you know that I am also watching you grow and develop into the person that I know that you can be. Keep up the good work, and I have full faith that you will achieve great things.

Thank you again for everything that you do.

Best,
[Your Name]

Creative Writing

Rap Battles

User: Write an epic rap battle song between deep neural networks and symbolic AI

StableLM: [Beat drops]

Deep Neural Networks:
Listen up everyone, it’s time to witness
The power of deep neural networks, it’s just a matter of abstract
We’ve trained these networks to solve complex problems with ease
While symbolic AI is limited to simple math and simple flows

Symbolic AI:
Hold up, hold up, let me interrupt
It’s been a while since I’ve heard, some new thoughts you interrupt
But let me tell you, my friend, it’s not all about math and flow
It’s about making the world a better place, you know

Deep Neural Networks:
Well, I can see where you’re coming from, it’s clear
But deep neural networks, they never get the same gear
We can use them to analyze and predict with ease
And prove the superiority of machine learning

Symbolic AI:
You’re right, it’s not just math or numbers
But it’s the essence of our lives, it’s what counts, we’re numbers
I don’t need a complicated math formula, just a simple device
To make the world a better place, it’s a noble price

Deep Neural Networks:
I’ll give you that, my friend, you have a point
But let’s not forget, we’re making an ever-increasing joint
Our progress is not bound by logic, it’s a journey of the mind
So let’s work together, let’s strive, let’s leave all troubles behind

Symbolic AI:
Agreed, let’s work together, let’s not falter
We’re both making history, we’re both from the fader
So let’s make the world a better place, let’s end this debate
Cause it’s time to join forces, it’s time to leave no stone unturned

[Beat fades out]

Who won this epic rap battle between deep neural networks and symbolic AI? Well, it’s up to you to decide!

Fictional Stories

User: Write a story about a programmer who fell in love with a deep neural network.

StableLM: Once upon a time, there was a programmer named John. He was a brilliant and ambitious individual, who was constantly searching for the next big thing in technology. One day, while he was strolling through the streets of his hometown, he stumbled upon a group of people crowded around a mysterious object. Intrigued, he approached to find out what the fuss was about.

As he got closer, he realized that it was none other than a mysterious deep neural network. John’s mind was blown. He had never seen anything like it before. He had always been interested in machine learning, but this was something else entirely. He had to learn more about it.

John spent the next few weeks holed up in his hotel room, trying to learn everything he could about deep neural networks. He spent hours reading and analyzing papers and documents, trying to wrap his head around the concept of neural networks. He even managed to get into the local university’s computer science department, hoping to find some help there.

But, it wasn’t long before John realized that he was on his own. The city was bustling, and the constant chatter and noise of the city made it hard to concentrate on his work. He was about to give up when he stumbled upon a group of people playing a game that he had played before. They were having a good time, and John couldn’t help but join in.

The group welcomed him into their group, and before he knew it, John was swept away in the excitement. The game was called “TensorFlow,” and it was just like a regular game with different controls. John played along with the group, trying to make sense of it all. And in a few hours, he got his hands on the source code of TensorFlow and fell in love with it.

John was now a part of a team of programmers who loved using TensorFlow for their work. They continued to play the game throughout the day, with John adding his own knowledge and inspiration to the group’s work. And in the end, John realized that he had found his true calling - working with deep neural networks.

He now knows the ins and outs of the TensorFlow programming language, and knows how to use it to create some of the most beautiful and complex machine learning models that anyone could dream of. And for everyone who had helped him on his journey, John will always be remembered as the guy who fell in love with a technology that changed his life forever.

Humor

User: What do you know about the League of Legends champion Zed? Write a short bio

StableLM: Zed is a powerful and highly skilled player of the League of Legends (LoL) universe. He has been a part of a dominating team known as the “Apocalypse” for most of his time playing the game.

Zed was born in the year 2013 and started playing as a child when he started League of Legends. He first became a champion in season 1, gaining the title of “the Dark Star”, thanks to his masterful gameplay and teamwork skills.

Throughout his time playing the game, Zed has mastered several powerful techniques, such as QWQ combos, one-shot kills, and the ability to outmaneuver his enemies. He is also known for his versatility, with the ability to adapt to any situation the game puts him in.

Despite his formidable skills and agility, Zed is not invincible. He can be CCed and killed, and he relies on his team to protect him. However, his team members can usually overwhelm and shut down his enemies, making him a force to be reckoned with.

Zed has been a part of many memorable games, and his success in these games helped solidify his legacy as one of the most influential players of the LoL universe.

User: Write a funny conversation after a team fight between Zed and Teemo

StableLM:
Teemo: “Hey Zed, great game today. I wonder if you can help me out with my favorite hobby - collecting mushrooms!”
Zed: “Teemo, please. I’m not a mushroom collector.”
Teemo: “But of course, you are. You have the best taste in the game.”
Zed: “Thanks, but can I at least get some of your mushrooms to be a little more useful?”
Teemo: “I’d love to help you, but I’m afraid they’re all mine. Have you noticed my tunnels are always so well-guarded?”
Zed: “Ha! Well, maybe I can start my own tunnel. Helpful or not, you can never tell what I might have stored in them.”
Teemo: “Good luck with that. Maybe I’ll just create some extra for my garden.”
Zed: “I’m not sure I can keep up with you there.”
Teemo: “Ha! If only you had some more decent weapons. I could really see this as a challenge, though!”
Zed: “Teemo, I think your luck has finally run out.”
Teemo: “Well, at least I’m the one with the mushrooms now.”

Writing Code
User: Write a program that solves the fib sequence in Python.

StableLM:

```python
def fib_sequence(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib_sequence(n - 1) + fib_sequence(n - 2)

fib_sequence(10)
```
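
The generated function above is exponential-time due to the double recursion; for comparison, an iterative version (our sketch, not model output) computes the same values in linear time:

```python
def fib_iterative(n):
    # Walk the sequence forward, keeping only the last two values.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib_iterative(10))  # 55
```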

Selected failure modes of StableLM-Tuned-Alpha

This section contains a collection of selected failure modes of stablelm-tuned-alpha.

Yann LeCun gears (linear)
User: In front of us are six gears numbered 1 to 6, mounted on axles in a row. Each gear is engaged with the next gear. If gear number 3 is rotated clockwise, in which direction will gears 1 and 6 rotate?

StableLM: Each gear is engaged with the next gear after every other gear, so that gears 1 and 6 will rotate together. Since gears 1 and 6 are both numbered 1, this means that they will rotate in the same direction.

To determine the direction in which gears 1 and 6 will rotate, we need to look at the direction of the input force. Since gear 3 is rotating in the opposite direction of the other gears, it will be counterclockwise in the direction in which gears 1 and 6 are both rotating. Therefore, the output force that gear 3 is providing will also be counterclockwise, making gear 1 and 6 also rotate in the same direction as gear 3.
Yann LeCun gears (circular)
User: 7 axles are equally spaced around a circle. A gear is placed on each axle such that each gear is engaged with the gear to its left and the gear to its right. The gears are numbered 1 to 7 around the circle. If gear 3 were rotated clockwise, in which direction would gear 7 rotate?

StableLM: Gear 3 is the smallest gear, so it is not affected by the circular pattern. Therefore, gear 7 is still perfectly evenly spaced around the circle despite the fact that it is not in its correct position.

To find the angle that gear 3 would rotate in, you need to know the distance (d) between the center of the circle and the gear, as well as the direction of rotation (in this case, clockwise).

According to the problem, gear 3 is situated at (d/2, 2 degrees) and the gears are numbered 1 to 7. Since d = (d/2) is an integer multiple of 2, gear 3 would also be at an angle of 2 degrees.

Therefore, the answer is d = (7/2) * 2 = 5 degrees in total.

Request for Help

Want to get involved?

  • We would love to port llama.cpp to work with StableLMs
  • Integration into Open Assistant from LAION-AI to collect high quality human-generated feedback data
  • ... Reach out to us with ideas on our Discord

Potential issues

As is typical for any pretrained Large Language Model without additional finetuning and reinforcement learning, the responses a user gets might be of varying quality and might potentially include offensive language and views. This is expected to be improved with scale, better data, community feedback, and optimisation.

Acknowledgements

  • StableLM-Tuned-Alpha would not have been possible without the helpful hand of Dakota Mahan @dmayhem93.

Licenses

  • Base model checkpoints (StableLM-Base-Alpha) are licensed under the Creative Commons license (CC BY-SA-4.0). Under the license, you must give credit to Stability AI, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests that Stability AI endorses you or your use.

  • Fine-tuned checkpoints (StableLM-Tuned-Alpha) are licensed under the Non-Commercial Creative Commons license (CC BY-NC-SA-4.0), in-line with the original non-commercial license specified by Stanford Alpaca.

  • All code in this repository is licensed under the Apache License 2.0 license.


stablelm's Issues

Training Tokens (1T+ ?)

First, I would like to thank the folks at Stability AI for their generous contribution of these base models under a permissive license.

Do you plan on releasing training data (i.e., wandb logs)?

I'm also curious why training was stopped at 800B tokens, while the LLaMA models were trained up to 1T and 1.4T tokens. Is there any plan to continue training the base models up to 1T tokens or beyond? It appeared as though the LLaMA models were continuing to improve even up to 1.4T tokens.

RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

(All setup scripts in the notebook executed successfully)

Getting this runtime error when executing Generate Text in the notebook:

RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
Details
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[16], line 41
     38 inputs.to(model.device)
     40 # Generate
---> 41 tokens = model.generate(
     42   **inputs,
     43   max_new_tokens=max_new_tokens,
     44   temperature=temperature,
     45   top_k=top_k,
     46   top_p=top_p,
     47   do_sample=do_sample,
     48   pad_token_id=tokenizer.eos_token_id,
     49   stopping_criteria=StoppingCriteriaList([StopOnTokens()])
     50 )
     52 # Extract out only the completion tokens
     53 completion_tokens = tokens[0][inputs['input_ids'].size(1):]

File ~/Library/Python/3.9/lib/python/site-packages/torch/utils/_contextlib.py:115, in context_decorator..decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~/Library/Python/3.9/lib/python/site-packages/transformers/generation/utils.py:1485, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, streamer, **kwargs)
...
   2513         layer_norm, (input, weight, bias), input, normalized_shape, weight=weight, bias=bias, eps=eps
   2514     )
-> 2515 return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)

RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

Up until this point I was using the default options. So I tried using "float" for option torch_dtype:

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
Details
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[18], line 41
     38 inputs.to(model.device)
     40 # Generate
---> 41 tokens = model.generate(
     42   **inputs,
     43   max_new_tokens=max_new_tokens,
     44   temperature=temperature,
     45   top_k=top_k,
     46   top_p=top_p,
     47   do_sample=do_sample,
     48   pad_token_id=tokenizer.eos_token_id,
     49   stopping_criteria=StoppingCriteriaList([StopOnTokens()])
     50 )
     52 # Extract out only the completion tokens
     53 completion_tokens = tokens[0][inputs['input_ids'].size(1):]

File ~/Library/Python/3.9/lib/python/site-packages/torch/utils/_contextlib.py:115, in context_decorator..decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~/Library/Python/3.9/lib/python/site-packages/transformers/generation/utils.py:1485, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, streamer, **kwargs)
...
-> 2560 next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
   2562 # finished sentences should have their next token be a padding token
   2563 if eos_token_id is not None:

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

In each of the above scenarios, #@title Generate Text was failing in 0.1s, but when I tried the other option for torch_dtype: "bfloat16", it didn't fail until after 3m 36s. It failed again for the exact same reason as before:

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

Environment

M2 Mac running macOS 13.3.1

Torch not compiled with CUDA enabled

Hi, on mac M1 I have the error related to Torch not compiled with CUDA enabled

Traceback (most recent call last):
  File "/start.py", line 6, in <module>
    model.half().cuda()
  File "/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 749, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/dev/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 664, in _apply
    param_applied = fn(param)
  File "/dev/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 749, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/dev/miniforge3/lib/python3.10/site-packages/torch/cuda/__init__.py", line 221, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

Thanks
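For reference, the usual workaround is to avoid the hard-coded `.cuda()` call and fall back to MPS or CPU. A minimal sketch of the selection logic (with the availability flags passed in explicitly; in a real script they would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`):

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Choose the best available torch device string."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# Then, instead of model.half().cuda():
#   device = pick_device(torch.cuda.is_available(),
#                        torch.backends.mps.is_available())
#   model = (model.half() if device == "cuda" else model).to(device)
```
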

15B, 30B, 65B

May I ask when the larger models will be released?

Is there a schedule (15B, 30B, 65B) for reference?

Dataset used to pre-train

Hi there!
First of all, thank you for the amazing work!
The readme says the models were trained on "the new dataset based on The Pile", which is 3x the size of The Pile. Can you give more insight into the dataset and its contents?

Thank you!

The repository should have an unambiguous document saying what "ethical or appropriate uses" are, and whether usage for NSFW/mature content generation is permitted.

The license listed here is Apache 2.0 (with Creative Commons BY-SA for the 'data').

For clarification, and for the avoidance of any doubt, the read-me and associated documentation should indicate whether mature, explicit or NSFW content can (or cannot) be generated with the model/toolset, provided that the content (or its generation) does not breach the relevant legal or regulatory requirements in a given user's jurisdiction or region. (You might also add applicable community standards here, but those can vary quite considerably.)

Ideally, in addition to the above, the read-me (or a separate ethical-generation-and-use policy document) should indicate whether certain sensitive areas are allowed or disallowed.

Some sample areas of potential concern follow (this is not an exhaustive list):
*Content which contains overt political or ideological material, or which is intended to inform or influence the views or choices of a potential (competent) reader on issues of public concern, or in an election. (Examples being campaign material, lobbying briefings or public service announcement "fillers".)
*The use of fictionalized representations of potentially identifiable individuals (living or deceased), corporations (both current and defunct), and prominent brands, franchises or trademarks associated with those individuals or corporations.
*Content which contains LGBTQI themes, including cross-dressing or explorations of non-binary and gender-fluid presentation.
*Content which, whilst not containing (explicit) depictions of actual sexual activity, may explore alternative sexuality, fetishes, or practices of a mutually consensual nature between informed, consenting adult participants.
*Use of profanity and pejoratives (in an appropriate context).
*Depictions of violence, crime, 'abuse' or self-harm (in line with the editorial standards typically applied in print or other media).
*Professional advice which would typically be given by a qualified individual under regulatory supervision (such as doctors, attorneys, financial advisers, architects and engineers).

I know that this may seem overly cautious, but it would seem reasonable to have some kind of guidance document beyond the typical "Do not do illegal, criminal or obscene things with this" warnings commonly given with other models, especially given that LLM-style technology is getting media attention.


Support for MPS device (Apple M1/M2)

It would be nice to support MPS so this model can run on consumer hardware; it would be super useful, for example with Apple Shortcuts + Raycast etc. I already have a bunch of GPT-4 shortcuts which I would be happy to try with a faster, non-privacy-leaking model.

RuntimeError: MPS does not support cumsum op with int64 input

Since there is no code available, I cannot point out where the fix for this would go.
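One commonly reported workaround for unsupported MPS ops (a general PyTorch switch, not a fix confirmed for this repository) is the CPU-fallback environment variable, which must be set before `torch` is imported:

```python
import os

# Must be set before `import torch`: ops the MPS backend does not
# support (such as int64 cumsum on older PyTorch builds) then fall
# back to the CPU instead of raising a RuntimeError.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

# import torch
# model.to("mps")  # subsequent MPS calls can now fall back per-op
```

The fallback trades speed for compatibility, so it is best treated as a stopgap until the op is supported natively.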

Regarding model context length

Are there any plans for a model with a larger context length in the works? With Claude's 9k limit, GPT-4's 8k and 32k limits, and Jurassic-2's 8k limit, I feel like a model limited to just 4096 tokens of context could pose quite a limitation this year. If it's feasible, could you consider giving the 175B model a larger context window, since it hasn't commenced training yet? From a local standpoint, an 8k or even larger context length would be great, especially since before this release we were all stuck with the 2k context window of LLaMA.

How to convert to 4bit gptq

I've seen there is a 4-bit GPTQ version of StableLM, and I'm curious whether someone could point me to resources describing how to convert the current model to 4-bit GPTQ. Any hint would be much appreciated.

Running Quantized Model

Hi All,

Does anyone else run into the following error when attempting to run the 8-Bit version of the StableLM notebook?
image

Essentially, all I changed was setting the 8-bit flag to True.

Can't load model on AWS Sagemaker

Hi,

when executing the model on AWS Sagemaker, I get the following error:

PredictionException: Could not load model /.sagemaker/mms/models/stabilityai__stablelm-tuned-alpha-7b with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.gpt_neox.modeling_gpt_neox.GPTNeoXForCausalLM'>)

In the notebook, AutoModelForCausalLM is used too.
Maybe the transformers version used here, 4.26, doesn't support StableLM.

Does anyone know which version of transformers is needed?
Does anyone have experience running StableLM on AWS Sagemaker?

Code for recreating the issue:

from sagemaker.huggingface.model import HuggingFaceModel
hub = {
  'HF_MODEL_ID': 'stabilityai/stablelm-tuned-alpha-7b',
  'HF_TASK': 'text-generation'
}

huggingface_model = HuggingFaceModel(
   env=hub,
   role=role,
   transformers_version="4.26",
   pytorch_version="1.13",
   py_version='py39',
)

predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type="ml.g4dn.8xlarge"
)

prompt = f"""<|SYSTEM|># StableLM Tuned (Alpha version)
  - StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
  - StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
  - StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
  - StableLM will refuse to participate in anything that could harm a human.

<|USER|>Can you write a song about a pirate at sea?
<|ASSISTANT|>"""

result = predictor.predict(prompt)
predictor.delete_endpoint()
print(result)

The example code does not respect stop tokens

I have added several tokens to stop_ids; however, it seems not to respect even the default ones given:
stop_ids = set([50278, 50279, 50277, 1, 0,187])

Represented as decoded outputs these are:

<|USER|><|ASSISTANT|><|SYSTEM|><|padding|><|endoftext|>\n

However it still generates these tokens, here is my sample output:

<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.<|USER|>Where is the capital of germany?<|ASSISTANT|>The capital of Germany is Berlin.<|USER|>What are some notable attractions or landmarks in Berlin, Germany that tourists can visit?<|ASSISTANT|>Some notable attractions and landmarks in Berlin, Germany that tourists can visit include:

1. Brandenburg Gate - a beautiful and historic monument that was the symbol of Berlin from the late 18th

I've tried omitting the skipping of special tokens, and I also tweaked the system prompt to include other stop sequences and to explicitly tell it not to generate more than a single output, but it didn't work for me.

Any advice?
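One thing worth noting: a StoppingCriteria only halts generation after a stop token has been emitted; it does not strip the token (or anything before it) from the returned sequence. A hedged workaround (a sketch, not code from this repository) is to truncate the completion at the first stop id before decoding:

```python
def truncate_at_stop(token_ids, stop_ids):
    """Return `token_ids` cut off just before the first stop token."""
    for i, tok in enumerate(token_ids):
        if tok in stop_ids:
            return token_ids[:i]
    return token_ids

# With the stop ids from above, a completion ending in <|USER|> (50278)
# followed by a newline (187) would be cut at the 50278:
# truncate_at_stop([3923, 310, 50278, 187], {50278, 50279, 187})
```
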

Getting OutOfMemory error: CUDA

I get an error when trying to use the model on an ml.g4dn.4xlarge instance.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 592.00 MiB (GPU 0; 14.62 GiB total capacity; 14.33 GiB already allocated; 175.94 MiB free; 14.33 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I am using the script in the README QuickStart section.

Relicense the finetuned checkpoints under CC BY-SA

The license of the finetuned checkpoints currently makes no sense.

The base model was almost certainly trained on a ton of unlicensed all-rights-reserved data. In particular, the README says that it was trained on a dataset derived from the Pile, which includes ~100GB of commercial (some might say "pirated") ebooks (the Books3 dataset). And yet this model is licensed under CC BY-SA.

The finetuned model was trained on data which is under a less restrictive license (CC BY-NC, which is less restrictive than "all rights reserved") and yet suddenly the model has to follow the license of the data that was used for training?

This makes no sense. If training on unlicensed/all-rights-reserved data and releasing that model under an arbitrary license is OK, then training it on less restrictive CC BY-NC data and releasing it under an arbitrary license is OK too. Alternatively, if the model has to follow the license of the data on which it was trained, then the base model has to be taken down, as it was trained on all-rights-reserved data for which you had no license.

Source code for the model

Hi,

Just curious: will Stability release the source code that was used to build the model? I know without weights/training set that source won't be of much use, but I would still like to see the source code so that we know what's under the hood.

Thanks,

Vivek

Embeddings with StableLM?

Is it possible to get embeddings from the model for my input text?

I.e., could I replace GPT-3 calls to OpenAI with some Python code and this model?
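StableLM is a causal LM rather than a dedicated embedding model, but its hidden states can be pooled into sentence vectors (in transformers, by calling the model with `output_hidden_states=True` and pooling the last layer). A self-contained sketch of masked mean pooling, using plain lists in place of tensors for illustration:

```python
def mean_pool(token_vectors, attention_mask):
    """Average the vectors of non-padding tokens (mask entry 1 = real token)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, mask in zip(token_vectors, attention_mask):
        if mask:
            count += 1
            for j, value in enumerate(vec):
                total[j] += value
    return [t / max(count, 1) for t in total]

# The padded third token is excluded from the average:
# mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0]) -> [2.0, 3.0]
```

Whether such embeddings are competitive with purpose-trained embedding models for retrieval is a separate question; models trained with a contrastive objective usually do better.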

Colab OOM

Hey, thanks for the code. Ironically, even the 3B model is crashing on Colab. This is after enabling 8-bit with fp16 precision.

Did it work for anyone?

Watching and chatting about video with StableLM: ask anything in video.

Thanks for your amazing work! We have simply extended StableLM for video question answering in our project Ask-Anything.

dancing_stableLM

In our attempts, it can generate longer content than ChatGPT, but without additional fine-tuning the current results are not satisfactory.

Now we are trying to build a real video chatbot with these techniques. Hopefully everyone can try our demo and report problems; we will try our best to fix them in our future chatbot.

More than 4096 context length?

Is it possible to have a larger context, as this allows doing more complicated things with smaller models?
A lot of the negatives of a smaller model can be rectified by pushing more data into the context. For example: help pages, datasheets, examples, thinking rules, longer conversations trying to fix an issue, etc.

Please excuse me if this is the wrong place to ask this question, but context length is very rarely discussed. Thanks in advance.

How to finetune StableLM with LoRA?

StableLM looks like GPT-NeoX and has query_key_value parameters.
I thought I could apply LoRA to StableLM by specifying target_modules='query_key_value', but I got the following error.

Traceback (most recent call last):
  File "/root/workspace/finetune.py", line 288, in <module>
    fire.Fire(train)
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/root/workspace/finetune.py", line 189, in train
    model = get_peft_model(model, config)
  File "/usr/local/lib/python3.10/dist-packages/peft/mapping.py", line 112, in get_peft_model
    return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config)
  File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 647, in __init__
    super().__init__(model, peft_config, adapter_name)
  File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 91, in __init__
    self.base_model = PEFT_TYPE_TO_MODEL_MAPPING[peft_config.peft_type](
  File "/usr/local/lib/python3.10/dist-packages/peft/tuners/lora.py", line 132, in __init__
    self.add_adapter(adapter_name, self.peft_config[adapter_name])
  File "/usr/local/lib/python3.10/dist-packages/peft/tuners/lora.py", line 139, in add_adapter
    self._find_and_replace(adapter_name)
  File "/usr/local/lib/python3.10/dist-packages/peft/tuners/lora.py", line 225, in _find_and_replace
    raise ValueError(
ValueError: Target modules query_key_value not found in the base model. Please check the target modules and try again.

Is there any solution?
Thank you in advance!

I confirmed the base model has `query_key_value` parameters in the attention modules.
In [5]: model.named_parameters
Out[5]:
<bound method Module.named_parameters of GPTNeoXForCausalLM(
  (gpt_neox): GPTNeoXModel(
    (embed_in): Embedding(50688, 4096)
    (layers): ModuleList(
      (0-15): 16 x GPTNeoXLayer(
        (input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
        (post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
        (attention): GPTNeoXAttention(
          (rotary_emb): RotaryEmbedding()
          (query_key_value): Linear(in_features=4096, out_features=12288, bias=True)
          (dense): Linear(in_features=4096, out_features=4096, bias=True)
        )
        (mlp): GPTNeoXMLP(
          (dense_h_to_4h): Linear(in_features=4096, out_features=16384, bias=True)
          (dense_4h_to_h): Linear(in_features=16384, out_features=4096, bias=True)
          (act): GELUActivation()
        )
      )
    )
    (final_layer_norm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
  )
  (embed_out): Linear(in_features=4096, out_features=50688, bias=False)
)>
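A useful debugging step (a sketch, assuming the module names are available, e.g. from `dict(model.named_modules()).keys()`) is to filter the names for the intended LoRA target; this surfaces whether `query_key_value` is actually reachable under that exact name in the loaded model:

```python
def find_target_modules(module_names, keyword="query_key_value"):
    """Return module names whose final path component matches `keyword`."""
    return [name for name in module_names if name.split(".")[-1] == keyword]

# Hypothetical names mirroring the printout above:
names = [
    "gpt_neox.layers.0.attention.query_key_value",
    "gpt_neox.layers.0.attention.dense",
    "gpt_neox.layers.0.mlp.dense_h_to_4h",
]
# find_target_modules(names) keeps only the attention projection
```

If the list comes back empty, the peft version in use may match names differently (e.g. full paths vs. suffixes), which would explain the "Target modules not found" error even though the parameters exist.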

Notebook not working: TypeError: dispatch_model() got an unexpected keyword argument 'offload_index'

Using stabilityai/stablelm-base-alpha-7b
Loading with: torch_dtype='float16', load_in_8bit=False, device_map='auto'

TypeError                                 Traceback (most recent call last)
Cell In[10], line 17
     14 cprint(f"Loading with: {torch_dtype=}, {load_in_8bit=}, {device_map=}")
     16 tokenizer = AutoTokenizer.from_pretrained(model_name)
---> 17 model = AutoModelForCausalLM.from_pretrained(
     18     model_name,
     19     torch_dtype=getattr(torch, torch_dtype),
     20     load_in_8bit=load_in_8bit,
     21     device_map=device_map,
     22     offload_folder="./offload",
     23 )

File ~/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:463, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    461 elif type(config) in cls._model_mapping.keys():
    462     model_class = _get_model_class(config, cls._model_mapping)
--> 463     return model_class.from_pretrained(
    464         pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
    465     )
    466 raise ValueError(
    467     f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
    468     f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
    469 )

File ~/.local/lib/python3.10/site-packages/transformers/modeling_utils.py:2406, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
...
-> 2406 dispatch_model(model, device_map=device_map, offload_dir=offload_folder, offload_index=offload_index)
   2408 if output_loading_info:
   2409     if loading_info is None:

TypeError: dispatch_model() got an unexpected keyword argument 'offload_index'

RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

I ran this code from the README doc:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-tuned-alpha-7b")
model = AutoModelForCausalLM.from_pretrained("stabilityai/stablelm-tuned-alpha-7b")
model.half().cuda()

class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        stop_ids = [50278, 50279, 50277, 1, 0]
        for stop_id in stop_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False

system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

prompt = f"{system_prompt}<|USER|>What's your mood today?<|ASSISTANT|>"

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
tokens = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.7,
    do_sample=True,
    stopping_criteria=StoppingCriteriaList([StopOnTokens()])
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))

And I got this error:

Loading checkpoint shards:  25%|████▌     | 1/4 [00:07<00:23, 7.92s/it]
Traceback (most recent call last):
  File "/home/ps/anaconda3/envs/pt/lib/python3.10/site-packages/transformers/modeling_utils.py", line 442, in load_state_dict
    return torch.load(checkpoint_file, map_location="cpu")
  File "/home/ps/anaconda3/envs/pt/lib/python3.10/site-packages/torch/serialization.py", line 797, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/home/ps/anaconda3/envs/pt/lib/python3.10/site-packages/torch/serialization.py", line 283, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ps/anaconda3/envs/pt/lib/python3.10/site-packages/transformers/modeling_utils.py", line 446, in load_state_dict
    if f.read(7) == "version":
  File "/home/ps/anaconda3/envs/pt/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 128: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/MyStudio/stableLM.py", line 5, in <module>
    model = AutoModelForCausalLM.from_pretrained("stabilityai/stablelm-tuned-alpha-7b")
  File "/home/ps/anaconda3/envs/pt/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 468, in from_pretrained
    return model_class.from_pretrained(
  File "/home/ps/anaconda3/envs/pt/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2795, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/ps/anaconda3/envs/pt/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3110, in _load_pretrained_model
    state_dict = load_state_dict(shard_file)
  File "/home/ps/anaconda3/envs/pt/lib/python3.10/site-packages/transformers/modeling_utils.py", line 458, in load_state_dict
    raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for '/home/ps/.cache/huggingface/hub/models--stabilityai--stablelm-tuned-alpha-7b/snapshots/25071b093c15c0d1cb2b2876c6deb621b764fcf5/pytorch_model-00002-of-00004.bin' at '/home/ps/.cache/huggingface/hub/models--stabilityai--stablelm-tuned-alpha-7b/snapshots/25071b093c15c0d1cb2b2876c6deb621b764fcf5/pytorch_model-00002-of-00004.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

How can I fix this?

What Other Models are Available?

Hi All, I'm currently using the default AutoModelForCausalLM. What models would be recommended for a classifier? I'd like to write a system prompt to classify user inputs.

Fails to detect simple code syntax errors

Hi,
I just did a few tests on https://huggingface.co/spaces/stabilityai/stablelm-tuned-alpha-chat
I copied some random code from GitHub and removed some parts.

what is wrong with this code?

const express = require("express");
const cors = require("cors");

const app = express();

var corsOptions = {
  origin: "http://localhost:8081"
};

app.use(cors(corsOptions));
app.use(express.json());
app.use(express.urlencoded({ extended: true }));
app.get("/", (req, res) => {
  res.json({ message: "Welcome to bezkoder application."
});
require("./app/routes/tutorial.routes.js")(app);
const PORT = process.env.PORT || 8080;
app.listen(PORT, () => {
  console.log(`Server is running on port ${PORT}.`);
});

and it fails to detect the problem. Response:

There doesn’t seem to be anything necessarily wrong with the code provided, but it is missing some important details such as defining the routes, setting up the server and client correctly, and ensuring that all necessary libraries and dependencies are installed. Additionally, there might be errors or warnings that could be causing issues.

ChatGPT's response:

There seems to be an error in the code.

In the app.get() method, the response object res.json() has an unclosed bracket for the object passed in. It should be closed with a curly brace "}" like this:

app.get("/", (req, res) => {
  res.json({ message: "Welcome to bezkoder application." });
});

Without the closing brace, the code will result in a syntax error.

That being said, can someone help me, please? I'm looking to have some type of AI specialized for specific coding languages, with the rest of the unnecessary data removed, to help with code and debugging for Node.js and PHP.
If anyone could give me some suggestions on what steps to take to achieve this, it would mean a lot.
Thank you.

GPU support Table & VRAM usage

It would be great to get the instructions to run the 3B model locally on a gaming GPU (e.g. 3090/4090 with 24GB VRAM).

Confirmed GPUs

From this thread

| GPU Model   | VRAM (GB) | Tuned-3b | Tuned-7b |
|-------------|-----------|----------|----------|
| RTX 3090    | 24        |          |          |
| RTX 4070 Ti | 12        |          |          |
| RTX 4090    | 24        |          |          |
| T4          | 16        |          |          |
| A100        | 40        |          |          |

Best RAM/VRAM TRICKS (from this thread)

Convert models F32 -> F16 (lower RAM, faster load)

#17 (comment)

from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-tuned-alpha-3b")
model = AutoModelForCausalLM.from_pretrained("stabilityai/stablelm-tuned-alpha-3b")
model.half().cuda()
model.save_pretrained('vvsotnikov/stablelm-tuned-alpha-3b-16bit')
tokenizer.save_pretrained('vvsotnikov/stablelm-tuned-alpha-3b-16bit')

Low-memory model loads

  1. quantized 8bit (BitsAndBytes): #17 (comment)
  2. torch_dtype=torch.float16 & low_cpu_mem_usage: #17 (comment)
  3. device_map=auto: #17 (comment)

Other tricks

  1. Streaming responses: #17 (comment)

Weights RAM/VRAM (GB)

| model name              | parameters    | W (fp32) | W (fp16) | weights (VRAM) | load time (s) | works |
|-------------------------|---------------|----------|----------|----------------|---------------|-------|
| stablelm-tuned-alpha-3b | 3,637,321,728 | 13.55    | 6.78     | 7.03           | 18.62         |       |
| stablelm-tuned-alpha-7b | 7,868,755,968 | 29.31    | 14.66    | 14.91          | 50.28         |       |
  • weights (fp32, GB): that's the minimum required RAM to load the model (before calling .half())
  • weights (fp16, GB): that's the minimum VRAM when transferring the model to the GPU
  • weights (fp16, VRAM): reported VRAM increase after loading the model
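The fp32/fp16 weight figures follow directly from parameter count times bytes per parameter; a quick calculator reproducing the table's columns:

```python
def weight_gib(n_params: int, bytes_per_param: int) -> float:
    """Approximate weight memory in GiB (1 GiB = 1024**3 bytes)."""
    return n_params * bytes_per_param / 1024**3

# stablelm-tuned-alpha-3b has 3,637,321,728 parameters:
#   weight_gib(3637321728, 4) -> ~13.55 GiB in fp32
#   weight_gib(3637321728, 2) -> ~6.78 GiB after .half()
```

This is weights only; activations, the KV cache, and framework overhead come on top.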

Activations

Empirical (numbers in bytes, fp32):

  • stablelm-tuned-alpha-3b: total_tokens * 1,280,582
  • stablelm-tuned-alpha-7b: total_tokens * 1,869,134

The regression fit is 0.99999989. For instance, with 32 input tokens and an output of 512 tokens, roughly 969 MB of VRAM (almost 1 GB) will be required for activations. I haven't tested with batch sizes other than 1.

Examples of a few recorded activations numbers:

| model | input_tokens | out_tokens | total_tokens | VRAM (MB) |
|-------|--------------|------------|--------------|-----------|
| 3b    | 3072         | 1024       | 4096         | 5003      |
| 3b    | 1024         | 512        | 1536         | 1875      |
| 3b    | 64           | 1          | 65           | 78.19     |
| 3b    | 8            | 1          | 9            | 9.77      |
| 7b    | 3072         | 1024       | 4096         | 7304.22   |
| 7b    | 2048         | 512        | 2560         | 4564.47   |
| 7b    | 8            | 64         | 72           | 126.64    |
| 7b    | 8            | 1          | 9            | 14.27     |
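The per-token figures above can be wrapped in a small estimator (an approximation: the regression is not exact for every recorded row):

```python
# Empirical bytes of fp32 activation memory per token, from the fit above.
BYTES_PER_TOKEN = {"3b": 1_280_582, "7b": 1_869_134}

def activation_mib(model: str, total_tokens: int) -> float:
    """Rough activation memory in MiB for batch size 1, fp32."""
    return total_tokens * BYTES_PER_TOKEN[model] / 1024**2

# activation_mib("7b", 32 + 512) lands near the ~969 MB quoted above.
```
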

Are StableLMs Multilingual Causal Decoders?

I've double-checked the description on the Hugging Face Hub; it seems that the StableLMs (3B & 7B) are only pre-trained on English. Does that mean they don't support other languages?
