mshumer / gpt-llm-trainer Goto Github PK

View Code? Open in Web Editor NEW

3.9K 3.9K 504.0 21 KB

License: MIT License

Jupyter Notebook 100.00%

gpt-llm-trainer's People

Contributors

Stargazers

Watchers

Forkers

tomchapin logikonline schof0rl mz0in dezigns333 jeffara andrewyu0 veryvanya d3287t328 gid3xn omarei stjordanis jaytoday sundog-ftw iwillcodeu shanthshivam teknium1 prblsing polya20 snowdj hetieke2 rogetxtap wdshin paolomarin stalinkay zeroch1ll geezer686 dsp0205 edwardburns stevekhuu enginbozkurt ibrahimshittu nkrumah-dubazana areegtarek aria1991 muhammadmotawe qazizia ialzyoud manojalwisnz mccun934 sko-kr datacontrol jguillengarcia encycdata kirkryan ai-awe udbhavtripathi beherit79 mdf-git jinzaizhichi jamescndubuisi rexmorgan89 tan2line mybinaryworld techthiyanes elynrose pontonkid amarbabuk awesome-software c00renut 0xnetrunner rogervaas bhardwajrahul alexyen1000 jiajiajia789 pterameta ailabteam bbascbr simrit1 dorucioclea codehornets codeaudit 0xmgg yourmoonlight destinythefox jungheanu mivanovitch missusk lrochetta milan-chicago itsbrex wildcatgeo felipeescallon trinhtuanvubk seshakiran keontang velvrix rayhunter luisriverag rkp2k00 mjdhasan jaysinghr say383 danyray420 riazspace infork22 prettysparklepony oceans0423 juandavidp9 venetisgr

gpt-llm-trainer's Issues

GPT-2 torch argmax issues

`import torch

tokenizer_custom = GPT2Tokenizer.from_pretrained("gpt2")
model_custom = GPT2LMHeadModel.from_pretrained('gpt2')

generated_custom = tokenizer_custom.encode("The Manhattan bridge")
context_custom = torch.tensor([generated_custom])
past_custom = None

for j in range(100):
print(j)
output_custom, past_custom = model_custom(context_custom, past=past_custom)
token_custom = torch.argmax(output_custom[..., -1, :])

generated_custom += [token_custom.tolist()]
context_custom = token_custom.unsqueeze(0)
sequence_custom = tokenizer_custom.decode(generated_custom)

print(sequence_custom)`
Please help me correct this

getting an error after running generate_example().

'''
KeyError Traceback (most recent call last)
in <cell line: 45>()
45 for i in range(number_of_examples):
46 print(f'Generating example {i}')
---> 47 example = generate_example(prompt, prev_examples, temperature)
48 print(example)
49 prev_examples.append(example)

in generate_example(prompt, prev_examples, temperature)
39 print(response.json())
40
---> 41 return '' + response.json()['content'][0]['text'].split('')[1]
42
43 # Generate examples

KeyError: 'content'
'''

Add 'LLM Knowledge Distillation' to Readme or Topic Tags

Hello,

The concept of 'LLM Knowledge Distillation', intrinsic to gpt-llm-trainer, isn't explicitly highlighted in the repository. Adding it to the Readme or the Github Topic Tags could introduce multiple benefits. It could help users understand the core mechanism employed and enhance the discoverability of this repository for those seeking similar solutions.

Thanks for considering my suggestion!

Can we use GPT3.5?

Can we modify the code to use gpt 3.5 instead of gpt 4, most people don't have access, and to make it a levelled field we may use double the examples?

which GPU?

I have Colab+ and avalable to me are

v100, t4, tpu

Update: TPU doesn't work using the off-the-shelf version of the script as it assumes an NVIDIA GPU.

which should I use/which will be fastest?

llm

Merge the model and store in Google Drive (Section)

It always runs out of memory...please remedy this issue. Error I get constantly and I am using Colab Pro V100 which should be enough for this project i think: 0/3 [02:11<?, ?it/s]

OutOfMemoryError Traceback (most recent call last)
in <cell line: 8>()
6
7 # Reload model in FP16 and merge it with LoRA weights
----> 8 base_model = AutoModelForCausalLM.from_pretrained(
9 model_name,
10 low_cpu_mem_usage=True,

4 frames
/usr/local/lib/python3.10/dist-packages/accelerate/utils/modeling.py in set_module_tensor_to_device(module, tensor_name, device, value, dtype, fp16_statistics)
296 module._parameters[tensor_name] = param_cls(new_value, requires_grad=old_value.requires_grad)
297 elif isinstance(value, torch.Tensor):
--> 298 new_value = value.to(device)
299 else:
300 new_value = torch.tensor(value, device=device)

OutOfMemoryError: CUDA out of memory. Tried to allocate 314.00 MiB (GPU 0; 15.77 GiB total capacity; 14.32 GiB already allocated; 2.12 MiB free; 14.45 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

error :You tried to access openai.ChatCompletion, but this is no longer supported in openai>=1.0.0

APIRemovedInV1:

You tried to access openai.ChatCompletion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.

You can run openai migrate to automatically upgrade your codebase to use the 1.0.0 interface.

Alternatively, you can pin your installation to the old version, e.g. pip install openai==0.28

A detailed migration guide is available here: openai/openai-python#742
APIRemovedInV1Proxy: def call(*_args: Any, **_kwargs: Any) -> Any

openai.lib._old_api.APIRemovedInV1Proxy instance

Cost estimate?

First, Matt, congratulations on the success thus far of your projects. I am sure you will soon have Sam Altman unable to sleep :-). In the readme, you said:

It'll take some time (from 10 minutes to a couple of hours, depending on how many examples you generate), but soon, you'll have your fine-tuned model!

Can you give some ballpark cost estimates / ranges for this? Thx.

hello, would you have time for a chat?

im interested in developing a model to generate the data for this pipeline, in fact, ive spent the better part of the last several months working on a very large and very scaled up system to do just exactly this thing.
i was wondering if you had a minute this week to talk and maybe compare notes?

Running into CUDA out of memory on Colab

Hello @mshumer . I am trying to run the code on colab and running into CUDA out of memory error as below :
OutOfMemoryError Traceback (most recent call last)
in <cell line: 14>()
12
13 # Reload model in FP16 and merge it with LoRA weights
---> 14 base_model = AutoModelForCausalLM.from_pretrained(
15 model_name,
16 low_cpu_mem_usage=True,

OutOfMemoryError: CUDA out of memory. Tried to allocate 250.00 MiB (GPU 0; 14.75 GiB total capacity; 13.52 GiB already allocated; 48.81 MiB free; 13.63 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Its happening at "Merge the model and store in Google Drive" step.

API not working even after upgrading to gpt 4

I am getting the following message: The model gpt-4 does not exist or you do not have access to it. eventhoguh i upgraded to gpt 4. Please help me out

Question

Hi,
Doesn’t finetuning on Claude’s output violate their terms of use?

Problem with workflow

Wish you could just upload your own jsonl instead of having to generate them in order to use the script, it's like you have to go step by step even if you want to start with the 'Upload the file to OpenAI' step

NousResearch/llama-2-7b-chat-hf NOT AVAILABLE

NousResearch/llama-2-7b-chat-hf is no longer available and when I got to this stage in the project, there was no model available...and the wholöe process needed to be restart from scratch. What llama 2 model would you recommend in its place. Tried the Bloke uncensored and ran into trouble using that one, and since it takes so long getting everything in place for that stage to fail is quite annoying and expensive on colab. So perhaps someone can help suggest a model that is loadable for this project.

without openai !!!

is this possible without openai?

Logging into wandb.ai

Hello!
the cell after"Load Datasets and Train" throw me a prompt to enter API details for wandb.ai. here is the message I get:

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

I want to run everything locally, I already replace the codes that goes to openai, but i don't see where this wandb is call and how to avoid it!
Before I'm looking at (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server) Can you help with directions on how to not use wandb?
thank you

Token generation limit

I was getting the RateLimitReached Error yesterday (after around the 80th generation each prompt is around 10000 tokens). My simple workaround is below, but is there a better way?

def generate_examples(tokenizer, prompt, number_of_examples):
    # Generate examples
    prev_examples = []
    for i in range(number_of_examples):
        try:
            print(f'Generating example {i}')
            prompt_tokens = tokenizer.tokenize(prompt)
            prev_examples_tokens = [tokenizer.tokenize(example) for example in prev_examples]
            total_tokens = len(prompt_tokens) + sum(len(tokens) for tokens in prev_examples_tokens)
            print(f'Tokens in prompt and previous examples: {total_tokens}')
            example = generate_example(prompt, prev_examples, temperature)
            print(example)
            prev_examples.append(example)
    #         if i % 5 == 0:
    #             time.sleep(10)
        except openai.error.RateLimitError:
            print("RATELIMITREACHED: waiting 10 seconds")
            time.sleep(10)

here is a local version: https://github.com/xiscoding/local_gpt_llm_trainer

The model `gpt-4` does not exist or you do not have access to it

In the first cell I got this error message: InvalidRequestError: The model gpt-4 does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.
Has anyone else seen this error or know to how fix it?

How to mine fine-tuning samples from specified corpora

How to expand the system to limit the generation of fine-tuning samples based on a given set of corpus documents, rather than blindly fabricating them。
For example, generating fine-tuning samples for disease diagnosis, I hope it is based on the case in the uploaded real diagnosis report

ㅂㅂ

the model before lora load and after lora load is diff

the model in point 1 and point 2 shown below is diff, i've compared their respective generated text.. it's really different.

1.just aft 4bit training->gen = pipeline('text-generation', model=model, tokenizer=tokenizer, max_length=max_length)

2.model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()
gen = pipeline('text-generation', model=model, tokenizer=tokenizer, max_length=max_length)