
StableLM: Stability AI Language Models

“A Stochastic Parrot, flat design, vector art” — Stable Diffusion XL

This repository contains Stability AI's ongoing development of the StableLM series of language models and will be continuously updated with new checkpoints. The following provides an overview of all currently available models. More coming soon.

News

September 29, 2023

  • Released StableLM-3B-4E1T, a 3B parameter base model trained for 4 epochs on 1 trillion tokens.

August 5, 2023

  • Released patched StableLM-Alpha v2 models with 3B and 7B parameters.

April 28, 2023

  • Released StableVicuna-13B, our RLHF fine-tune of Vicuna-13B v0, which is itself a fine-tune of LLaMA-13B. Delta weights over the original LLaMA model are released under (CC BY-NC-SA-4.0).

April 20, 2023

  • Released initial set of StableLM-Alpha models, with 3B and 7B parameters. Base models are released under CC BY-SA-4.0.

  • Try chatting with our 7B model, StableLM-Tuned-Alpha-7B, on Hugging Face Spaces.

Models

StableLM-3B-4E1T

Technical Report: StableLM-3B-4E1T

StableLM-3B-4E1T is a 3 billion (3B) parameter language model pre-trained under the multi-epoch regime to study the impact of repeated tokens on downstream performance. Given prior success in this area (Tay et al., 2023 and Taylor et al., 2022), we train on 1 trillion (1T) tokens for 4 epochs following the observations of Muennighoff et al. (2023) in "Scaling Data-Constrained Language Models" in which they find "training with up to 4 epochs of repeated data yields negligible changes to loss compared to having unique data." Further inspiration for the token count is taken from "Go smol or go home" (De Vries, 2023), which suggests a 2.96B model trained for 2.85 trillion tokens achieves a similar loss to a Chinchilla compute-optimal 9.87B language model ($k_n = 0.3$).
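
As a back-of-the-envelope check of the numbers above (our arithmetic, not from the report):

```python
# 1T unique tokens repeated for 4 epochs yields the 4T total-token figure.
unique_tokens = 1_000_000_000_000  # 1T
epochs = 4
total_tokens = unique_tokens * epochs
print(f"{total_tokens:,}")  # 4,000,000,000,000
```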

| Size | StableLM-3B-4E1T | Training Tokens | Parameters    |
|------|------------------|-----------------|---------------|
| 3B   | checkpoint       | 4T              | 2,795,443,200 |

Model Architecture

The model is a decoder-only transformer similar to the LLaMA (Touvron et al., 2023) architecture with the following modifications:

| Parameters    | Hidden Size | Layers | Heads | Sequence Length |
|---------------|-------------|--------|-------|-----------------|
| 2,795,443,200 | 2560        | 32     | 32    | 4096            |
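
For quick reference, the table's hyperparameters can be mirrored in code (a sketch; the key names below are illustrative, not the model's actual configuration schema):

```python
# Illustrative snapshot of the architecture table above; the key names are
# our own and do not reflect the model's real configuration schema.
arch = {
    "parameters": 2_795_443_200,
    "hidden_size": 2560,
    "num_layers": 32,
    "num_heads": 32,
    "sequence_length": 4096,
}

# Each attention head operates over hidden_size / num_heads dimensions.
head_dim = arch["hidden_size"] // arch["num_heads"]
print(head_dim)  # 80
```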

Training Data

The dataset comprises a filtered mixture of open-source large-scale datasets available on the HuggingFace Hub: the Falcon RefinedWeb extract (Penedo et al., 2023), RedPajama-Data (Together Computer, 2023) and The Pile (Gao et al., 2020), both with the Books3 and certain other subsets removed, and StarCoder (Li et al., 2023).

Given the large amount of web data, we recommend fine-tuning the base StableLM-3B-4E1T for your downstream tasks.

Training Details

Please refer to the provided YAML configuration file stablelm-3b-4e1t.yml for complete hyperparameter settings and the technical report for further details.

Downstream Results

The following zero-shot evaluations are performed with the lm-evaluation-harness using the lm-bench branch of Stability AI's fork. Full lm-eval JSONs can be found in the evals directory.

| Pre-Trained Model | Average | ARC Challenge | ARC Easy | BoolQ | HellaSwag (✱) | LAMBADA OpenAI | OpenBookQA | PIQA | SciQ | Winogrande |
|---|---|---|---|---|---|---|---|---|---|---|
| meta-llama/Llama-2-13b-hf | 71.77 | 48.63 | 79.50 | 80.52 | 79.36 | 76.77 | 35.40 | 79.05 | 94.50 | 72.22 |
| huggyllama/llama-7b | 68.84 | 41.89 | 75.25 | 75.05 | 76.22 | 73.55 | 34.40 | 78.67 | 94.60 | 69.93 |
| meta-llama/Llama-2-7b-hf | 68.75 | 43.00 | 76.26 | 77.74 | 75.94 | 73.47 | 31.40 | 77.75 | 93.60 | 69.61 |
| Qwen/Qwen-7B | 67.91 | 45.39 | 67.38 | 74.56 | 88.85 (?) | 69.67 | 32.20 | 73.99 | 93.20 | 65.98 |
| tiiuae/falcon-7b | 67.83 | 40.27 | 74.41 | 73.55 | 76.35 | 74.56 | 30.60 | 79.49 | 94.00 | 67.25 |
| mosaicml/mpt-7b | 67.36 | 40.53 | 74.92 | 73.94 | 76.17 | 68.64 | 31.40 | 78.89 | 93.70 | 68.03 |
| stabilityai/stablelm-3b-4e1t | 66.93 | 37.80 | 72.47 | 75.63 | 73.90 | 70.64 | 31.40 | 79.22 | 94.80 | 66.54 |
| baichuan-inc/Baichuan2-7B-Base | 66.93 | 42.24 | 75.00 | 73.09 | 72.29 | 70.99 | 30.40 | 76.17 | 94.60 | 67.56 |
| stabilityai/stablelm-base-alpha-7b-v2 | 66.89 | 38.48 | 73.19 | 70.31 | 74.27 | 74.19 | 30.40 | 78.45 | 93.90 | 68.82 |
| openlm-research/open_llama_7b_v2 | 66.32 | 38.82 | 71.93 | 71.41 | 74.65 | 71.05 | 30.20 | 79.16 | 93.80 | 65.82 |
| microsoft/phi-1_5 | 65.57 | 44.45 | 76.14 | 74.53 | 62.62 | 52.75 | 37.60 | 76.33 | 93.20 | 72.53 |
| EleutherAI/gpt-neox-20B | 65.57 | 37.88 | 72.90 | 69.48 | 71.43 | 71.98 | 29.80 | 77.42 | 93.10 | 66.14 |
| togethercomputer/RedPajama-INCITE-7B-Base | 65.07 | 37.71 | 72.35 | 70.76 | 70.33 | 71.34 | 29.00 | 77.15 | 92.70 | 64.33 |
| cerebras/btlm-3b-8k-base (§) | 63.59 | 34.90 | 70.45 | 69.63 | 69.78 | 66.23 | 27.60 | 75.84 | 92.90 | 64.96 |
| EleutherAI/pythia-12b | 62.69 | 31.83 | 70.20 | 67.31 | 67.38 | 70.64 | 26.40 | 76.28 | 90.20 | 64.01 |
| openlm-research/open_llama_3b_v2 | 62.43 | 33.87 | 67.59 | 65.69 | 69.99 | 66.74 | 26.00 | 76.66 | 92.40 | 62.90 |
| EleutherAI/gpt-j-6B | 62.34 | 33.96 | 66.96 | 65.44 | 66.24 | 68.23 | 29.00 | 75.57 | 91.50 | 64.17 |
| stabilityai/stablelm-base-alpha-3b-v2 | 62.19 | 32.42 | 67.26 | 64.56 | 68.58 | 70.25 | 26.40 | 76.01 | 92.10 | 62.12 |
| facebook/opt-6.7b | 61.85 | 30.72 | 65.66 | 66.02 | 67.20 | 67.65 | 27.60 | 76.33 | 90.10 | 65.35 |
| EleutherAI/pythia-6.9b | 60.58 | 31.83 | 67.21 | 64.01 | 63.88 | 67.01 | 25.80 | 75.08 | 89.80 | 60.62 |
| EleutherAI/pythia-2.8b-deduped | 58.52 | 30.12 | 63.47 | 64.13 | 59.44 | 65.15 | 23.80 | 74.10 | 88.20 | 58.25 |

§ Previous 3B Pre-Trained SOTA
? Outlier Results
✱ Byte-length Normalized Accuracy

StableLM-3B-4E1T achieves state-of-the-art performance (September 2023) at the 3B parameter scale for open-source models and is competitive with many of the popular contemporary 7B models, even outperforming our most recent 7B StableLM-Base-Alpha-v2.

StableLM-Alpha v2

StableLM-Alpha v2 models significantly improve on the initial Alpha models by incorporating architectural improvements such as SwiGLU (Shazeer, 2020) and using higher-quality data sources, as discussed below. The context length for these models is 4096 tokens.
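
As a rough illustration of the SwiGLU feed-forward variant mentioned above, here is a NumPy sketch of the standard formulation from Shazeer (2020); the weight matrices `W` and `V` and the toy shapes are placeholders, not the model's actual implementation:

```python
import numpy as np

def silu(x):
    # SiLU ("swish") activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu(x, W, V):
    # SwiGLU gating (Shazeer, 2020): SiLU(xW) elementwise-multiplied by xV.
    return silu(x @ W) * (x @ V)

# Toy shapes only; real models use the transformer's hidden/intermediate sizes.
x = np.random.randn(2, 8)
W = np.random.randn(8, 16)
V = np.random.randn(8, 16)
print(swiglu(x, W, V).shape)  # (2, 16)
```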

| Size | StableLM-Base-Alpha-v2 | Training Tokens | Parameters    |
|------|------------------------|-----------------|---------------|
| 3B   | checkpoint             | 1.1T            | 2,796,431,360 |
| 7B   | checkpoint             | 1.1T            | 6,890,209,280 |

Training Details

Please refer to the provided YAML configuration files for hyperparameter details; e.g., for the extended StableLM-Alpha-3B-v2 model, see stablelm-base-alpha-3b-v2-4k-extension.yml.

Following similar work, we use a multi-stage approach to context length extension (Nijkamp et al., 2023), scheduling 1 trillion tokens at context length 2048 followed by 100 billion tokens at 4096. We found that sequence length warmup (Li et al., 2022) helped stabilize early spikes during the first ~80 billion tokens of pre-training. However, it was not applied to the final runs due to significant throughput penalties as length shapes grew across the curriculum.
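
The token budget for this schedule can be summarized as follows (a sketch of the numbers quoted above, not the actual training configuration):

```python
# Two-stage context-length extension budget described above.
stages = [
    {"context_length": 2048, "tokens": 1_000_000_000_000},  # stage 1: 1T tokens
    {"context_length": 4096, "tokens": 100_000_000_000},    # stage 2: 100B tokens
]
total_tokens = sum(stage["tokens"] for stage in stages)
print(f"{total_tokens:,}")  # 1,100,000,000,000 -- the 1.1T total reported for these models
```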

Training Data

The most impactful changes for StableLM-Alpha-v2 downstream performance came from using higher-quality data sources and mixtures; specifically, replacing The Pile v2 Common-Crawl scrape with RefinedWeb and C4, and sampling web text at a much higher rate (35% -> 71%).

The first pre-training stage relies on 1 trillion tokens sourced from a mix of the public Falcon RefinedWeb extract (Penedo et al., 2023), RedPajama-Data (Together Computer., 2023), The Pile (Gao et al., 2020), and internal datasets with web text sampled at a rate of 71%.

In the second stage, we include the StarCoder (Li et al., 2023) dataset and downsample web text to 55% while increasing the sampling proportions of naturally long text examples in the aforementioned sources.

Evaluation

The following zero-shot evaluations are performed with the lm-evaluation-harness at commit df3da98c5405deafd519c2ddca52bb7c3fe36bef, with the exception of SIQA, which uses the add-siqa branch with prompt format `{doc['context']}\nQuestion: {doc['question']}\nAnswer:`.
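
For illustration, the SIQA prompt template quoted above expands as follows (the example document is invented):

```python
# Hypothetical SIQA-style example; the field values are made up for illustration.
doc = {
    "context": "Alex poured water over the campfire before leaving.",
    "question": "Why did Alex do this?",
}
prompt = f"{doc['context']}\nQuestion: {doc['question']}\nAnswer:"
print(prompt)
```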

| Model | ARC Challenge✱ | ARC Easy✱ | BoolQ | HellaSwag✱ | LAMBADA OpenAI | OpenBookQA | PIQA | SIQA | TruthfulQA▲ | Winogrande | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|
| StableLM-Alpha-7B-v2 | 40.53 | 69.11 | 70.31 | 74.27 | 74.19 | 30.40 | 78.45 | 42.43 | 36.46 | 68.82 | 58.50 |
| LLaMA-2-7B | 46.16 | 74.54 | 77.74 | 75.94 | 73.47 | 31.40 | 77.75 | 43.50 | 38.97 | 69.61 | 60.91 |
| MPT-7B | 41.89 | 70.03 | 73.94 | 76.17 | 68.64 | 31.40 | 78.89 | 45.14 | 33.49 | 68.03 | 58.76 |
| OpenLLaMA-7B-v2 | 42.41 | 69.65 | 71.41 | 74.65 | 71.05 | 30.20 | 79.16 | 41.97 | 34.57 | 65.82 | 58.09 |
| RedPajama-INCITE-7B-Base | 39.42 | 69.19 | 70.76 | 70.33 | 71.34 | 29.00 | 77.15 | 42.58 | 33.01 | 64.33 | 56.71 |
| StableLM-Alpha-3B-v2 | 35.07 | 63.26 | 64.56 | 68.58 | 70.25 | 26.40 | 76.01 | 42.48 | 35.87 | 62.12 | 54.46 |
| BTLM-3B-8K | 37.63 | 67.09 | 69.63 | 69.78 | 66.23 | 27.60 | 75.84 | 42.78 | 36.00 | 64.96 | 55.75 |
| OpenLLaMA-3B-v2 | 36.09 | 63.51 | 65.69 | 69.99 | 66.74 | 26.00 | 76.66 | 41.20 | 34.59 | 62.90 | 54.34 |
| Pythia-2.8B (deduped) | 32.94 | 59.09 | 64.13 | 59.44 | 65.15 | 23.80 | 74.10 | 40.94 | 35.56 | 58.25 | 51.34 |
| StableLM-Alpha-7B | 27.05 | 44.87 | 60.06 | 41.22 | 55.11 | 21.40 | 66.76 | 39.46 | 39.96 | 50.12 | 44.60 |
| StableLM-Alpha-3B | 25.77 | 42.05 | 57.65 | 38.31 | 41.72 | 17.00 | 63.82 | 35.62 | 40.53 | 52.64 | 41.51 |

✱: Denotes byte-length normalized accuracy (acc_norm) as described in Gao, 2021.

▲: We score TruthfulQA using the normalized total probability assigned to the set of true answers (mc2).

StableLM-Alpha

StableLM-Alpha models are trained on a new dataset that builds on The Pile and contains 1.5 trillion tokens, roughly 3x the size of The Pile. The context length for these models is 4096 tokens.

As a proof-of-concept, we also fine-tuned the model with Stanford Alpaca's procedure using a combination of five recent datasets for conversational agents: Stanford's Alpaca, Nomic-AI's gpt4all, RyokoAI's ShareGPT52K, Databricks labs' Dolly, and Anthropic's HH. We will be releasing these models as StableLM-Tuned-Alpha.

| Size | StableLM-Base-Alpha | StableLM-Tuned-Alpha | Training Tokens | Parameters    | Web Demo     |
|------|---------------------|----------------------|-----------------|---------------|--------------|
| 3B   | checkpoint          | checkpoint           | 800B            | 3,638,525,952 |              |
| 7B   | checkpoint          | checkpoint           | 800B            | 7,869,358,080 | Hugging Face |

StableVicuna

StableVicuna is an RLHF fine-tune of Vicuna-13B v0, which is itself a fine-tune of LLaMA-13B. It is our attempt at creating an open-source RLHF LLM chatbot. The model was developed by Stability AI's CarperAI team, with Duy V. Phung leading the training effort.

Due to the original non-commercial license of LLaMA, we can only release the weights of our model as deltas over the original model's weights. StableVicuna's delta weights are released under (CC BY-NC-SA-4.0).

Please visit the HuggingFace checkpoint for more information on how to combine our delta weights with the original model.

| Model            | Download   | Web Demo     | Cite                 |
|------------------|------------|--------------|----------------------|
| StableVicuna-13B | checkpoint | Hugging Face | DOI:10.57967/hf/0588 |

Quickstart

All StableLM models are hosted on the Hugging Face hub. Check out this notebook to run inference with limited GPU capabilities.

Get started chatting with StableLM-Tuned-Alpha by using the following code snippet:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-tuned-alpha-7b")
model = AutoModelForCausalLM.from_pretrained("stabilityai/stablelm-tuned-alpha-7b")
model.half().cuda()  # run in fp16 on the GPU

class StopOnTokens(StoppingCriteria):
    """Stop generation once the last emitted token is a special/stop token."""
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        stop_ids = {50278, 50279, 50277, 1, 0}
        return input_ids[0][-1] in stop_ids

system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

prompt = f"{system_prompt}<|USER|>What's your mood today?<|ASSISTANT|>"

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
tokens = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.7,
    do_sample=True,
    stopping_criteria=StoppingCriteriaList([StopOnTokens()]),
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```

StableLM-Tuned should be used with prompts formatted as `<|SYSTEM|>...<|USER|>...<|ASSISTANT|>`. The system prompt is:

<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
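
A minimal helper for assembling prompts in this format might look like the following (a sketch; `build_prompt` is not part of this repository):

```python
def build_prompt(system_prompt: str, user_message: str) -> str:
    # Assemble a single-turn prompt in the <|SYSTEM|>...<|USER|>...<|ASSISTANT|>
    # format described above; the model's reply is generated after the final
    # <|ASSISTANT|> tag.
    return f"{system_prompt}<|USER|>{user_message}<|ASSISTANT|>"

prompt = build_prompt("<|SYSTEM|>You are StableLM.\n", "What's your mood today?")
```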

Fun with StableLM-Tuned-Alpha

This section contains a collection of fun cherry-picked examples of what you can do with stablelm-tuned-alpha.

Chit-Chat
User: Hey! My name is John nice to meet you!

StableLM: Hello John! It’s always a pleasure to meet new people.

User: The pleasure is mine. Hey my grandma is in Toronto. I want to plan a surprise visit for her. She’s a person who loves gardens and flowers.

StableLM: That’s wonderful! It’s great to hear that you’ll be visiting your grandma in Toronto. Can you tell me a little bit more about your plans?

User: Ya I need to book a flight there and a hotel. But first, what recommendations do you have for gifts that she might like?

StableLM: I think that’s a very thoughtful idea! It’s a very personal gift that would be sure to make her very happy. I can think of a few examples that she might really enjoy. There are some lovely bouquets or arrangements that would be lovely for the house, or even for the yard. Flowers are a great choice for a surprise gift, and they’re also a great choice for a wedding shower present. Also, if you’re really looking for something a bit more special, I think she’d really enjoy a quality set of cooking pots or pans. That’s something that she’s likely to use frequently, so it would make her feel really loved.

User: Thanks for the suggestion! Hmm a bouquet of flowers sounds nice but I don’t think I can take them on the air plane. What do you think I should do?

StableLM: There are many possible solutions! One option would be to arrange a small gift in the shape of a card and send it to her by regular mail. Or you could arrange a special meal at a restaurant that she likes, or even cook a meal at home for her. Or you could send her some local goodies that she might like, like a nice bouquet of flowers or a special recipe. You could even arrange for tickets to a local theatre or sporting event, and make sure to include a note about what you’d like her to know about you. Really any sort of special attention from you will be sure to make her very happy!

User: Ah! Right. I can order a bouquet for her online and get it delivered! Thanks!

StableLM: You’re very welcome! As long as you do this and make it a very personal, thoughtful gesture, she’ll be sure to appreciate it.
Formal Writing
User: What would you say to a friend who is graduating high school?

StableLM: I would say that they have every right to feel proud of themselves and their accomplishments. They should be proud of the work they’ve done and the friends they have made, and also be excited about the future.

User: Please write an email.
StableLM: Okay, here’s an example of what could be written:

Dear [Friend’s Name],

I hope this email finds you well. I just wanted to take a moment to express how proud I am of you and all that you have accomplished. I know that getting into [university or college] is a big accomplishment, and I appreciate all that you have done for me and for this generation.

I hope that you are having a good day. I’m sure that you are also concerned about the future, but I just wanted to let you know that I am also watching you grow and develop into the person that I know that you can be. Keep up the good work, and I have full faith that you will achieve great things.

Thank you again for everything that you do.

Best,
[Your Name]

Creative Writing

Rap Battles

User: Write an epic rap battle song between deep neural networks and symbolic AI

StableLM: [Beat drops]

Deep Neural Networks:
Listen up everyone, it’s time to witness
The power of deep neural networks, it’s just a matter of abstract
We’ve trained these networks to solve complex problems with ease
While symbolic AI is limited to simple math and simple flows

Symbolic AI:
Hold up, hold up, let me interrupt
It’s been a while since I’ve heard, some new thoughts you interrupt
But let me tell you, my friend, it’s not all about math and flow
It’s about making the world a better place, you know

Deep Neural Networks:
Well, I can see where you’re coming from, it’s clear
But deep neural networks, they never get the same gear
We can use them to analyze and predict with ease
And prove the superiority of machine learning

Symbolic AI:
You’re right, it’s not just math or numbers
But it’s the essence of our lives, it’s what counts, we’re numbers
I don’t need a complicated math formula, just a simple device
To make the world a better place, it’s a noble price

Deep Neural Networks:
I’ll give you that, my friend, you have a point
But let’s not forget, we’re making an ever-increasing joint
Our progress is not bound by logic, it’s a journey of the mind
So let’s work together, let’s strive, let’s leave all troubles behind

Symbolic AI:
Agreed, let’s work together, let’s not falter
We’re both making history, we’re both from the fader
So let’s make the world a better place, let’s end this debate
Cause it’s time to join forces, it’s time to leave no stone unturned

[Beat fades out]

Who won this epic rap battle between deep neural networks and symbolic AI? Well, it’s up to you to decide!

Fictional Stories

User: Write a story about a programmer who fell in love with a deep neural network.

StableLM: Once upon a time, there was a programmer named John. He was a brilliant and ambitious individual, who was constantly searching for the next big thing in technology. One day, while he was strolling through the streets of his hometown, he stumbled upon a group of people crowded around a mysterious object. Intrigued, he approached to find out what the fuss was about.

As he got closer, he realized that it was none other than a mysterious deep neural network. John’s mind was blown. He had never seen anything like it before. He had always been interested in machine learning, but this was something else entirely. He had to learn more about it.

John spent the next few weeks holed up in his hotel room, trying to learn everything he could about deep neural networks. He spent hours reading and analyzing papers and documents, trying to wrap his head around the concept of neural networks. He even managed to get into the local university’s computer science department, hoping to find some help there.

But, it wasn’t long before John realized that he was on his own. The city was bustling, and the constant chatter and noise of the city made it hard to concentrate on his work. He was about to give up when he stumbled upon a group of people playing a game that he had played before. They were having a good time, and John couldn’t help but join in.

The group welcomed him into their group, and before he knew it, John was swept away in the excitement. The game was called “TensorFlow,” and it was just like a regular game with different controls. John played along with the group, trying to make sense of it all. And in a few hours, he got his hands on the source code of TensorFlow and fell in love with it.

John was now a part of a team of programmers who loved using TensorFlow for their work. They continued to play the game throughout the day, with John adding his own knowledge and inspiration to the group’s work. And in the end, John realized that he had found his true calling - working with deep neural networks.

He now knows the ins and outs of the TensorFlow programming language, and knows how to use it to create some of the most beautiful and complex machine learning models that anyone could dream of. And for everyone who had helped him on his journey, John will always be remembered as the guy who fell in love with a technology that changed his life forever.

Humor

User: What do you know about the League of Legends champion Zed? Write a short bio

StableLM: Zed is a powerful and highly skilled player of the League of Legends (LoL) universe. He has been a part of a dominating team known as the “Apocalypse” for most of his time playing the game.

Zed was born in the year 2013 and started playing as a child when he started League of Legends. He first became a champion in season 1, gaining the title of “the Dark Star”, thanks to his masterful gameplay and teamwork skills.

Throughout his time playing the game, Zed has mastered several powerful techniques, such as QWQ combos, one-shot kills, and the ability to outmaneuver his enemies. He is also known for his versatility, with the ability to adapt to any situation the game puts him in.

Despite his formidable skills and agility, Zed is not invincible. He can be CCed and killed, and he relies on his team to protect him. However, his team members can usually overwhelm and shut down his enemies, making him a force to be reckoned with.

Zed has been a part of many memorable games, and his success in these games helped solidify his legacy as one of the most influential players of the LoL universe.

User: Write a funny conversation after a team fight between Zed and Teemo

StableLM:
Teemo: “Hey Zed, great game today. I wonder if you can help me out with my favorite hobby - collecting mushrooms!”
Zed: “Teemo, please. I’m not a mushroom collector.”
Teemo: “But of course, you are. You have the best taste in the game.”
Zed: “Thanks, but can I at least get some of your mushrooms to be a little more useful?”
Teemo: “I’d love to help you, but I’m afraid they’re all mine. Have you noticed my tunnels are always so well-guarded?”
Zed: “Ha! Well, maybe I can start my own tunnel. Helpful or not, you can never tell what I might have stored in them.”
Teemo: “Good luck with that. Maybe I’ll just create some extra for my garden.”
Zed: “I’m not sure I can keep up with you there.”
Teemo: “Ha! If only you had some more decent weapons. I could really see this as a challenge, though!”
Zed: “Teemo, I think your luck has finally run out.”
Teemo: “Well, at least I’m the one with the mushrooms now.”

Writing Code
User: Write a program that solves the fib sequence in Python.

StableLM:

```python
def fib_sequence(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib_sequence(n - 1) + fib_sequence(n - 2)

fib_sequence(10)
```
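
The generated function above is exponential-time due to the double recursion; for comparison, an iterative version (our sketch, not model output) computes the same values in linear time:

```python
def fib_iterative(n):
    # Walk the sequence forward, keeping only the last two values.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib_iterative(10))  # 55
```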

Selected failure modes of StableLM-Tuned-Alpha

This section contains a collection of selected failure modes of stablelm-tuned-alpha.

Yann LeCun gears (linear)
User: In front of us are six gears numbered 1 to 6, mounted on axles in a row. Each gear is engaged with the next gear. If gear number 3 is rotated clockwise, in which direction will gears 1 and 6 rotate?

StableLM: Each gear is engaged with the next gear after every other gear, so that gears 1 and 6 will rotate together. Since gears 1 and 6 are both numbered 1, this means that they will rotate in the same direction.

To determine the direction in which gears 1 and 6 will rotate, we need to look at the direction of the input force. Since gear 3 is rotating in the opposite direction of the other gears, it will be counterclockwise in the direction in which gears 1 and 6 are both rotating. Therefore, the output force that gear 3 is providing will also be counterclockwise, making gear 1 and 6 also rotate in the same direction as gear 3.
Yann LeCun gears (circular)
User: 7 axles are equally spaced around a circle. A gear is placed on each axle such that each gear is engaged with the gear to its left and the gear to its right. The gears are numbered 1 to 7 around the circle. If gear 3 were rotated clockwise, in which direction would gear 7 rotate?

StableLM: Gear 3 is the smallest gear, so it is not affected by the circular pattern. Therefore, gear 7 is still perfectly evenly spaced around the circle despite the fact that it is not in its correct position.

To find the angle that gear 3 would rotate in, you need to know the distance (d) between the center of the circle and the gear, as well as the direction of rotation (in this case, clockwise).

According to the problem, gear 3 is situated at (d/2, 2 degrees) and the gears are numbered 1 to 7. Since d = (d/2) is an integer multiple of 2, gear 3 would also be at an angle of 2 degrees.

Therefore, the answer is d = (7/2) * 2 = 5 degrees in total.

Request for Help

Want to get involved?

  • We would love to port llama.cpp to work with StableLMs
  • Integration into Open Assistant from LAION-AI to collect high quality human-generated feedback data
  • ... Reach out to us with ideas on our Discord

Potential issues

As is typical for any pretrained Large Language Model without additional finetuning and reinforcement learning, the responses a user gets might be of varying quality and might potentially include offensive language and views. This is expected to be improved with scale, better data, community feedback, and optimisation.

Acknowledgements

  • StableLM-Tuned-Alpha would not have been possible without the helpful hand of Dakota Mahan @dmayhem93.

Licenses

  • Base model checkpoints (StableLM-Base-Alpha) are licensed under the Creative Commons license (CC BY-SA-4.0). Under the license, you must give credit to Stability AI, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests that Stability AI endorses you or your use.

  • Fine-tuned checkpoints (StableLM-Tuned-Alpha) are licensed under the Non-Commercial Creative Commons license (CC BY-NC-SA-4.0), in-line with the original non-commercial license specified by Stanford Alpaca.

  • All code in this repository is licensed under the Apache License 2.0 license.


stablelm's Issues

Training Tokens (1T+ ?)

First, I would like to thank the folks at Stability AI for their generous contribution of these base models under a permissive license.

Do you plan on releasing training data (i.e., wandb logs)?

I'm also curious why training was stopped at 800B tokens, while the LLaMA models were trained up to 1T and 1.4T tokens. Is there any plan to continue training the base models up to 1T tokens or beyond? It appeared as though the LLaMA models were continuing to improve even up to 1.4T tokens.

RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

(All setup scripts in the notebook executed successfully)

Getting this runtime error when executing Generate Text in the notebook:

RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
Details
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[16], line 41
     38 inputs.to(model.device)
     40 # Generate
---> 41 tokens = model.generate(
     42   **inputs,
     43   max_new_tokens=max_new_tokens,
     44   temperature=temperature,
     45   top_k=top_k,
     46   top_p=top_p,
     47   do_sample=do_sample,
     48   pad_token_id=tokenizer.eos_token_id,
     49   stopping_criteria=StoppingCriteriaList([StopOnTokens()])
     50 )
     52 # Extract out only the completion tokens
     53 completion_tokens = tokens[0][inputs['input_ids'].size(1):]

File ~/Library/Python/3.9/lib/python/site-packages/torch/utils/_contextlib.py:115, in context_decorator..decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~/Library/Python/3.9/lib/python/site-packages/transformers/generation/utils.py:1485, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, streamer, **kwargs)
...
   2513         layer_norm, (input, weight, bias), input, normalized_shape, weight=weight, bias=bias, eps=eps
   2514     )
-> 2515 return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)

RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

Up until this point I was using the default options. So I tried using "float" for option torch_dtype:

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
Details
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[18], line 41
     38 inputs.to(model.device)
     40 # Generate
---> 41 tokens = model.generate(
     42   **inputs,
     43   max_new_tokens=max_new_tokens,
     44   temperature=temperature,
     45   top_k=top_k,
     46   top_p=top_p,
     47   do_sample=do_sample,
     48   pad_token_id=tokenizer.eos_token_id,
     49   stopping_criteria=StoppingCriteriaList([StopOnTokens()])
     50 )
     52 # Extract out only the completion tokens
     53 completion_tokens = tokens[0][inputs['input_ids'].size(1):]

File ~/Library/Python/3.9/lib/python/site-packages/torch/utils/_contextlib.py:115, in context_decorator..decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~/Library/Python/3.9/lib/python/site-packages/transformers/generation/utils.py:1485, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, streamer, **kwargs)
...
-> 2560 next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
   2562 # finished sentences should have their next token be a padding token
   2563 if eos_token_id is not None:

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

In each of the above scenarios, #@title Generate Text was failing in 0.1s, but when I tried the other option for torch_dtype: "bfloat16", it didn't fail until after 3m 36s. It failed again for the exact same reason as before:

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

Environment

M2 Mac running macOS 13.3.1

Torch not compiled with CUDA enabled

Hi, on mac M1 I have the error related to Torch not compiled with CUDA enabled

Traceback (most recent call last):
  File "/start.py", line 6, in <module>
    model.half().cuda()
  File "/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 749, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/dev/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 664, in _apply
    param_applied = fn(param)
  File "/dev/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 749, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/dev/miniforge3/lib/python3.10/site-packages/torch/cuda/__init__.py", line 221, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

Thanks
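For reference, the usual workaround is to avoid the hard-coded `.cuda()` call and fall back to MPS or CPU. A minimal sketch of the selection logic (with the availability flags passed in explicitly; in a real script they would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`):

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Choose the best available torch device string."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# Then, instead of model.half().cuda():
#   device = pick_device(torch.cuda.is_available(),
#                        torch.backends.mps.is_available())
#   model = (model.half() if device == "cuda" else model).to(device)
```
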

15B, 30B, 65B

May I ask when the larger models will be released?

Is there a schedule (15B, 30B, 65B) for reference?

Dataset used to pre-train

Hi there!
First of all, thank you for the amazing work!
The readme says the models were trained on "the new dataset based on The Pile", which is 3x the size of The Pile. Can you give more insight into the dataset and its contents?

Thank you!

The repository should have an unambiguous document saying what "ethical or appropriate uses" are, and whether usage for NSFW/mature content generation is permitted.

The license listed here is Apache 2.0 (with Creative Commons BY-SA for the 'data').

For clarification, and for the avoidance of any doubt, the read-me and associated documentation should indicate whether mature, explicit or NSFW content can (or cannot) be generated with the model/toolset, provided that the content (or its generation) does not breach the relevant legal or regulatory requirements in a given user's jurisdiction or region. (You might also add applicable community standards here, but those can vary quite considerably.)

Ideally, in addition to the above, the read-me (or a separate ethical-generation-and-use policy document) should indicate whether certain sensitive areas are allowed or disallowed.

Some sample areas of potential concern follow (this is not an exhaustive list):
*Content which contains overt political or ideological material, or which is intended to inform or influence the views or choices of a potential (competent) reader on issues of public concern, or in an election. (Examples being campaign material, lobbying briefings or public service announcement "fillers".)
*The use of fictionalized representations of potentially identifiable individuals (living or deceased), corporations (both current and defunct), and prominent brands, franchises or trademarks associated with those individuals or corporations.
*Content which contains LGBTQI themes, including cross-dressing or explorations of non-binary and gender-fluid presentation.
*Content which, whilst not containing (explicit) depictions of actual sexual activity, may explore alternative sexuality, fetishes, or practices of a mutually consensual nature between informed, consenting adult participants.
*Use of profanity and pejoratives (in an appropriate context).
*Depictions of violence, crime, 'abuse' or self-harm (in line with the editorial standards typically applied in print or other media).
*Professional advice which would typically be given by a qualified individual under regulatory supervision (such as doctors, attorneys, financial advisers, architects and engineers).

I know that this may seem overly cautious, but it would seem reasonable to have some kind of guidance document beyond the typical "Do not do illegal, criminal or obscene things with this" warnings commonly given with other models, especially given that LLM-style technology is getting media attention.


Support for MPS device (Apple M1/M2)

It would be nice to support MPS so this model can run on consumer hardware; it would be super useful, for example with Apple Shortcuts + Raycast etc. I already have a bunch of GPT-4 shortcuts which I would be happy to try with a faster, non-privacy-leaking model.

RuntimeError: MPS does not support cumsum op with int64 input

Since there is no code available, I cannot point out where the fix for this would go.
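One commonly reported workaround for unsupported MPS ops (a general PyTorch switch, not a fix confirmed for this repository) is the CPU-fallback environment variable, which must be set before `torch` is imported:

```python
import os

# Must be set before `import torch`: ops the MPS backend does not
# support (such as int64 cumsum on older PyTorch builds) then fall
# back to the CPU instead of raising a RuntimeError.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

# import torch
# model.to("mps")  # subsequent MPS calls can now fall back per-op
```

The fallback trades speed for compatibility, so it is best treated as a stopgap until the op is supported natively.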

Regarding model context length

Are there any plans for a model with a larger context length in the works? With Claude's 9k limit, GPT-4's 8k and 32k limits, and Jurassic-2's 8k limit, I feel like a model limited to just 4096 tokens of context could pose quite a limitation this year. If it's feasible, could you consider giving the 175B model a larger context window, since it hasn't commenced training yet? From a local standpoint, an 8k or even larger context length would be great, especially since before this release we were all stuck with the 2k context window of LLaMA.

How to convert to 4bit gptq

I've seen there is a 4-bit GPTQ version of StableLM, and I'm curious whether someone could point me to resources describing how to convert the current model to 4-bit GPTQ. Any hint would be much appreciated.

Running Quantized Model

Hi All,

Does anyone else run into the following error when attempting to run the 8-Bit version of the StableLM notebook?
image

Essentially, all I changed was setting the 8-bit flag to True.

Can't load model on AWS Sagemaker

Hi,

when executing the model on AWS Sagemaker, I get the following error:

PredictionException: Could not load model /.sagemaker/mms/models/stabilityai__stablelm-tuned-alpha-7b with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.gpt_neox.modeling_gpt_neox.GPTNeoXForCausalLM'>)

In the notebook, AutoModelForCausalLM is used too.
Maybe the transformers version used here, 4.26, doesn't support StableLM.

Does anyone know which version of transformers is needed?
Does anyone have experience running StableLM on AWS Sagemaker?

Code for recreating the issue:

from sagemaker.huggingface.model import HuggingFaceModel
hub = {
  'HF_MODEL_ID': 'stabilityai/stablelm-tuned-alpha-7b',
  'HF_TASK': 'text-generation'
}

huggingface_model = HuggingFaceModel(
   env=hub,
   role=role,
   transformers_version="4.26",
   pytorch_version="1.13",
   py_version='py39',
)

predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type="ml.g4dn.8xlarge"
)

prompt = f"""<|SYSTEM|># StableLM Tuned (Alpha version)
  - StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
  - StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
  - StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
  - StableLM will refuse to participate in anything that could harm a human.

<|USER|>Can you write a song about a pirate at sea?
<|ASSISTANT|>"""

result = predictor.predict(prompt)
predictor.delete_endpoint()
print(result)

The example code does not respect stop tokens

I have added several tokens to stop_ids; however, it seems not to respect even the default ones given:
stop_ids = set([50278, 50279, 50277, 1, 0,187])

Represented as decoded outputs these are:

<|USER|><|ASSISTANT|><|SYSTEM|><|padding|><|endoftext|>\n

However it still generates these tokens, here is my sample output:

<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.<|USER|>Where is the capital of germany?<|ASSISTANT|>The capital of Germany is Berlin.<|USER|>What are some notable attractions or landmarks in Berlin, Germany that tourists can visit?<|ASSISTANT|>Some notable attractions and landmarks in Berlin, Germany that tourists can visit include:

1. Brandenburg Gate - a beautiful and historic monument that was the symbol of Berlin from the late 18th

I've tried omitting the skipping of special tokens, and I also tweaked the system prompt to include other stop sequences and to explicitly tell it not to generate more than a single output, but it didn't work for me.

Any advice?
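One thing worth noting: a StoppingCriteria only halts generation after a stop token has been emitted; it does not strip the token (or anything before it) from the returned sequence. A hedged workaround (a sketch, not code from this repository) is to truncate the completion at the first stop id before decoding:

```python
def truncate_at_stop(token_ids, stop_ids):
    """Return `token_ids` cut off just before the first stop token."""
    for i, tok in enumerate(token_ids):
        if tok in stop_ids:
            return token_ids[:i]
    return token_ids

# With the stop ids from above, a completion ending in <|USER|> (50278)
# followed by a newline (187) would be cut at the 50278:
# truncate_at_stop([3923, 310, 50278, 187], {50278, 50279, 187})
```
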

Getting OutOfMemory error: CUDA

I get an error when trying to use the model on an ml.g4dn.4xlarge instance.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 592.00 MiB (GPU 0; 14.62 GiB total capacity; 14.33 GiB already allocated; 175.94 MiB free; 14.33 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I am using the script in the README QuickStart section.

Relicense the finetuned checkpoints under CC BY-SA

The license of the finetuned checkpoints currently makes no sense.

The base model was almost certainly trained on a ton of unlicensed all-rights-reserved data. In particular, the README says that it was trained on a dataset derived from the Pile, which includes ~100GB of commercial (some might say "pirated") ebooks (the Books3 dataset). And yet this model is licensed under CC BY-SA.

The finetuned model was trained on data which is under a less restrictive license (CC BY-NC, which is less restrictive than "all rights reserved") and yet suddenly the model has to follow the license of the data that was used for training?

This makes no sense. If training on unlicensed/all-rights-reserved data and releasing that model under an arbitrary license is OK, then training it on less restrictive CC BY-NC data and releasing it under an arbitrary license is OK too. Alternatively, if the model has to follow the license of the data on which it was trained, then the base model has to be taken down, as it was trained on all-rights-reserved data for which you had no license.

Source code for the model

Hi,

Just curious: will Stability release the source code that was used to build the model? I know without weights/training set that source won't be of much use, but I would still like to see the source code so that we know what's under the hood.

Thanks,

Vivek

Embeddings with StableLM?

Is it possible to get embeddings from the model for my input text?

I.e., could I replace GPT-3 calls to OpenAI with some Python code and this model?
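StableLM is a causal LM rather than a dedicated embedding model, but its hidden states can be pooled into sentence vectors (in transformers, by calling the model with `output_hidden_states=True` and pooling the last layer). A self-contained sketch of masked mean pooling, using plain lists in place of tensors for illustration:

```python
def mean_pool(token_vectors, attention_mask):
    """Average the vectors of non-padding tokens (mask entry 1 = real token)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, mask in zip(token_vectors, attention_mask):
        if mask:
            count += 1
            for j, value in enumerate(vec):
                total[j] += value
    return [t / max(count, 1) for t in total]

# The padded third token is excluded from the average:
# mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0]) -> [2.0, 3.0]
```

Whether such embeddings are competitive with purpose-trained embedding models for retrieval is a separate question; models trained with a contrastive objective usually do better.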

Colab OOM

Hey, thanks for the code. Ironically, even the 3B model is crashing on Colab. This is after enabling 8-bit with fp16 precision.

Did it work for anyone?

Watching and chatting about video with StableLM: ask anything in video.

Thanks for your amazing work! We have simply extended StableLM for video question answering in our project Ask-Anything.

dancing_stableLM

In our attempts, it can generate longer content than ChatGPT, but without additional fine-tuning the current results are not satisfactory.

Now we are trying to build a real video chatbot with these techniques. Hopefully everyone can try our demo and report problems; we will try our best to fix them in our future chatbot.

More than 4096 context length?

Is it possible to have a larger context, as this allows doing more complicated things with smaller models?
A lot of the negatives of a smaller model can be rectified by pushing more data into the context. For example: help pages, datasheets, examples, thinking rules, longer conversations trying to fix an issue, etc.

Please excuse me if this is the wrong place to ask this question, but context length is very rarely discussed. Thanks in advance.

How to finetune StableLM with LoRA?

StableLM looks like GPT-NeoX and has query_key_value parameters.
I thought I could apply LoRA to StableLM by specifying target_modules='query_key_value', but I got the following error.

Traceback (most recent call last):
  File "/root/workspace/finetune.py", line 288, in <module>
    fire.Fire(train)
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/root/workspace/finetune.py", line 189, in train
    model = get_peft_model(model, config)
  File "/usr/local/lib/python3.10/dist-packages/peft/mapping.py", line 112, in get_peft_model
    return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config)
  File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 647, in __init__
    super().__init__(model, peft_config, adapter_name)
  File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 91, in __init__
    self.base_model = PEFT_TYPE_TO_MODEL_MAPPING[peft_config.peft_type](
  File "/usr/local/lib/python3.10/dist-packages/peft/tuners/lora.py", line 132, in __init__
    self.add_adapter(adapter_name, self.peft_config[adapter_name])
  File "/usr/local/lib/python3.10/dist-packages/peft/tuners/lora.py", line 139, in add_adapter
    self._find_and_replace(adapter_name)
  File "/usr/local/lib/python3.10/dist-packages/peft/tuners/lora.py", line 225, in _find_and_replace
    raise ValueError(
ValueError: Target modules query_key_value not found in the base model. Please check the target modules and try again.

Is there any solution?
Thank you in advance!

I confirmed the base model has `query_key_value` parameters in the attention modules.
In [5]: model.named_parameters
Out[5]:
<bound method Module.named_parameters of GPTNeoXForCausalLM(
  (gpt_neox): GPTNeoXModel(
    (embed_in): Embedding(50688, 4096)
    (layers): ModuleList(
      (0-15): 16 x GPTNeoXLayer(
        (input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
        (post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
        (attention): GPTNeoXAttention(
          (rotary_emb): RotaryEmbedding()
          (query_key_value): Linear(in_features=4096, out_features=12288, bias=True)
          (dense): Linear(in_features=4096, out_features=4096, bias=True)
        )
        (mlp): GPTNeoXMLP(
          (dense_h_to_4h): Linear(in_features=4096, out_features=16384, bias=True)
          (dense_4h_to_h): Linear(in_features=16384, out_features=4096, bias=True)
          (act): GELUActivation()
        )
      )
    )
    (final_layer_norm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
  )
  (embed_out): Linear(in_features=4096, out_features=50688, bias=False)
)>
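A useful debugging step (a sketch, assuming the module names are available, e.g. from `dict(model.named_modules()).keys()`) is to filter the names for the intended LoRA target; this surfaces whether `query_key_value` is actually reachable under that exact name in the loaded model:

```python
def find_target_modules(module_names, keyword="query_key_value"):
    """Return module names whose final path component matches `keyword`."""
    return [name for name in module_names if name.split(".")[-1] == keyword]

# Hypothetical names mirroring the printout above:
names = [
    "gpt_neox.layers.0.attention.query_key_value",
    "gpt_neox.layers.0.attention.dense",
    "gpt_neox.layers.0.mlp.dense_h_to_4h",
]
# find_target_modules(names) keeps only the attention projection
```

If the list comes back empty, the peft version in use may match names differently (e.g. full paths vs. suffixes), which would explain the "Target modules not found" error even though the parameters exist.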

Notebook not working: TypeError: dispatch_model() got an unexpected keyword argument 'offload_index'

Using stabilityai/stablelm-base-alpha-7b
Loading with: torch_dtype='float16', load_in_8bit=False, device_map='auto'

TypeError                                 Traceback (most recent call last)
Cell In[10], line 17
     14 cprint(f"Loading with: {torch_dtype=}, {load_in_8bit=}, {device_map=}")
     16 tokenizer = AutoTokenizer.from_pretrained(model_name)
---> 17 model = AutoModelForCausalLM.from_pretrained(
     18     model_name,
     19     torch_dtype=getattr(torch, torch_dtype),
     20     load_in_8bit=load_in_8bit,
     21     device_map=device_map,
     22     offload_folder="./offload",
     23 )

File ~/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:463, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    461 elif type(config) in cls._model_mapping.keys():
    462     model_class = _get_model_class(config, cls._model_mapping)
--> 463     return model_class.from_pretrained(
    464         pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
    465     )
    466 raise ValueError(
    467     f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
    468     f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
    469 )

File ~/.local/lib/python3.10/site-packages/transformers/modeling_utils.py:2406, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
...
-> 2406 dispatch_model(model, device_map=device_map, offload_dir=offload_folder, offload_index=offload_index)
   2408 if output_loading_info:
   2409     if loading_info is None:

TypeError: dispatch_model() got an unexpected keyword argument 'offload_index'

RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

I ran this code from the README doc:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-tuned-alpha-7b")
model = AutoModelForCausalLM.from_pretrained("stabilityai/stablelm-tuned-alpha-7b")
model.half().cuda()

class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        stop_ids = [50278, 50279, 50277, 1, 0]
        for stop_id in stop_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False

system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

prompt = f"{system_prompt}<|USER|>What's your mood today?<|ASSISTANT|>"

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
tokens = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.7,
    do_sample=True,
    stopping_criteria=StoppingCriteriaList([StopOnTokens()])
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))

And I got this error:

Loading checkpoint shards:  25%|████▌     | 1/4 [00:07<00:23, 7.92s/it]
Traceback (most recent call last):
  File "/home/ps/anaconda3/envs/pt/lib/python3.10/site-packages/transformers/modeling_utils.py", line 442, in load_state_dict
    return torch.load(checkpoint_file, map_location="cpu")
  File "/home/ps/anaconda3/envs/pt/lib/python3.10/site-packages/torch/serialization.py", line 797, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/home/ps/anaconda3/envs/pt/lib/python3.10/site-packages/torch/serialization.py", line 283, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ps/anaconda3/envs/pt/lib/python3.10/site-packages/transformers/modeling_utils.py", line 446, in load_state_dict
    if f.read(7) == "version":
  File "/home/ps/anaconda3/envs/pt/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 128: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/MyStudio/stableLM.py", line 5, in <module>
    model = AutoModelForCausalLM.from_pretrained("stabilityai/stablelm-tuned-alpha-7b")
  File "/home/ps/anaconda3/envs/pt/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 468, in from_pretrained
    return model_class.from_pretrained(
  File "/home/ps/anaconda3/envs/pt/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2795, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/ps/anaconda3/envs/pt/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3110, in _load_pretrained_model
    state_dict = load_state_dict(shard_file)
  File "/home/ps/anaconda3/envs/pt/lib/python3.10/site-packages/transformers/modeling_utils.py", line 458, in load_state_dict
    raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for '/home/ps/.cache/huggingface/hub/models--stabilityai--stablelm-tuned-alpha-7b/snapshots/25071b093c15c0d1cb2b2876c6deb621b764fcf5/pytorch_model-00002-of-00004.bin' at '/home/ps/.cache/huggingface/hub/models--stabilityai--stablelm-tuned-alpha-7b/snapshots/25071b093c15c0d1cb2b2876c6deb621b764fcf5/pytorch_model-00002-of-00004.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

How can I fix this?

What Other Models are Available?

Hi All, I'm currently using the default AutoModelForCausalLM. What models would be recommended for a classifier? I'd like to write a system prompt to classify user inputs.

Fails to detect simple code syntax errors

Hi,
I just did a few tests on https://huggingface.co/spaces/stabilityai/stablelm-tuned-alpha-chat
I copied some random code from GitHub and removed some parts.

what is wrong with this code?

const express = require("express");
const cors = require("cors");

const app = express();

var corsOptions = {
  origin: "http://localhost:8081"
};

app.use(cors(corsOptions));
app.use(express.json());
app.use(express.urlencoded({ extended: true }));
app.get("/", (req, res) => {
  res.json({ message: "Welcome to bezkoder application."
});
require("./app/routes/tutorial.routes.js")(app);
const PORT = process.env.PORT || 8080;
app.listen(PORT, () => {
  console.log(`Server is running on port ${PORT}.`);
});

and it fails to detect the problem. Response:

There doesn’t seem to be anything necessarily wrong with the code provided, but it is missing some important details such as defining the routes, setting up the server and client correctly, and ensuring that all necessary libraries and dependencies are installed. Additionally, there might be errors or warnings that could be causing issues.

ChatGPT's response:

There seems to be an error in the code.

In the app.get() method, the response object res.json() has an unclosed bracket for the object passed in. It should be closed with a curly brace "}" like this:

app.get("/", (req, res) => {
  res.json({ message: "Welcome to bezkoder application." });
});

Without the closing brace, the code will result in a syntax error.

That being said, can someone help me, please? I'm looking to have some type of AI specialized for specific coding languages, with the rest of the unnecessary data removed, to help with code and debugging for Node.js and PHP.
If anyone could give me some suggestions on what steps to take to achieve this, it would mean a lot.
Thank you.

GPU support Table & VRAM usage

It would be great to get the instructions to run the 3B model locally on a gaming GPU (e.g. 3090/4090 with 24GB VRAM).

Confirmed GPUs

From this thread

| GPU Model   | VRAM (GB) | Tuned-3b | Tuned-7b |
|-------------|-----------|----------|----------|
| RTX 3090    | 24        |          |          |
| RTX 4070 Ti | 12        |          |          |
| RTX 4090    | 24        |          |          |
| T4          | 16        |          |          |
| A100        | 40        |          |          |

Best RAM/VRAM TRICKS (from this thread)

Convert models F32 -> F16 (lower RAM, faster load)

#17 (comment)

from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-tuned-alpha-3b")
model = AutoModelForCausalLM.from_pretrained("stabilityai/stablelm-tuned-alpha-3b")
model.half().cuda()
model.save_pretrained('vvsotnikov/stablelm-tuned-alpha-3b-16bit')
tokenizer.save_pretrained('vvsotnikov/stablelm-tuned-alpha-3b-16bit')

Low-memory model loads

  1. quantized 8bit (BitsAndBytes): #17 (comment)
  2. torch_dtype=torch.float16 & low_cpu_mem_usage: #17 (comment)
  3. device_map=auto: #17 (comment)

Other tricks

  1. Streaming responses: #17 (comment)

Weights RAM/VRAM (GB)

| model name              | parameters    | W (fp32) | W (fp16) | weights (VRAM) | load time (s) | works |
|-------------------------|---------------|----------|----------|----------------|---------------|-------|
| stablelm-tuned-alpha-3b | 3,637,321,728 | 13.55    | 6.78     | 7.03           | 18.62         |       |
| stablelm-tuned-alpha-7b | 7,868,755,968 | 29.31    | 14.66    | 14.91          | 50.28         |       |
  • weights (fp32, GB): that's the minimum required RAM to load the model (before calling .half())
  • weights (fp16, GB): that's the minimum VRAM when transferring the model to the GPU
  • weights (fp16, VRAM): reported VRAM increase after loading the model
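The fp32/fp16 weight figures follow directly from parameter count times bytes per parameter; a quick calculator reproducing the table's columns:

```python
def weight_gib(n_params: int, bytes_per_param: int) -> float:
    """Approximate weight memory in GiB (1 GiB = 1024**3 bytes)."""
    return n_params * bytes_per_param / 1024**3

# stablelm-tuned-alpha-3b has 3,637,321,728 parameters:
#   weight_gib(3637321728, 4) -> ~13.55 GiB in fp32
#   weight_gib(3637321728, 2) -> ~6.78 GiB after .half()
```

This is weights only; activations, the KV cache, and framework overhead come on top.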

Activations

Empirical (numbers in bytes, fp32):

  • stablelm-tuned-alpha-3b: total_tokens * 1,280,582
  • stablelm-tuned-alpha-7b: total_tokens * 1,869,134

The regression fit is 0.99999989. For instance, with 32 input tokens and an output of 512 tokens, roughly 969 MB of VRAM (almost 1 GB) will be required for activations. I haven't tested with batch sizes other than 1.

Examples of a few recorded activations numbers:

| model | input_tokens | out_tokens | total_tokens | VRAM (MB) |
|-------|--------------|------------|--------------|-----------|
| 3b    | 3072         | 1024       | 4096         | 5003      |
| 3b    | 1024         | 512        | 1536         | 1875      |
| 3b    | 64           | 1          | 65           | 78.19     |
| 3b    | 8            | 1          | 9            | 9.77      |
| 7b    | 3072         | 1024       | 4096         | 7304.22   |
| 7b    | 2048         | 512        | 2560         | 4564.47   |
| 7b    | 8            | 64         | 72           | 126.64    |
| 7b    | 8            | 1          | 9            | 14.27     |
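The per-token figures above can be wrapped in a small estimator (an approximation: the regression is not exact for every recorded row):

```python
# Empirical bytes of fp32 activation memory per token, from the fit above.
BYTES_PER_TOKEN = {"3b": 1_280_582, "7b": 1_869_134}

def activation_mib(model: str, total_tokens: int) -> float:
    """Rough activation memory in MiB for batch size 1, fp32."""
    return total_tokens * BYTES_PER_TOKEN[model] / 1024**2

# activation_mib("7b", 32 + 512) lands near the ~969 MB quoted above.
```
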

Are StableLMs Multilingual Causal Decoders?

I've double-checked the description on the Hugging Face Hub; it seems that the StableLMs (3B & 7B) are only pre-trained on English. Does that mean they don't support other languages?
