
Comments (13)

DejianYang avatar DejianYang commented on August 22, 2024 1

size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([32256, 2048]).

https://huggingface.co/docs/accelerate/usage_guides/deepspeed#saving-and-loading
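
That error usually means the checkpoint was saved while the weights were still partitioned by ZeRO-3, so the saved tensors are empty (torch.Size([0])). A minimal sketch of the gather-then-save pattern that the linked guide describes (the model name and output path here are placeholders):

from accelerate import Accelerator
from transformers import AutoModelForCausalLM

accelerator = Accelerator()
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-1.3b-base")
model = accelerator.prepare(model)

# ... training loop ...

# Under ZeRO-3 each rank only holds a shard of every parameter, so the full
# state dict has to be gathered through the accelerator before saving.
unwrapped_model = accelerator.unwrap_model(model)
unwrapped_model.save_pretrained(
    "SaveOutputFolder",  # placeholder output path
    is_main_process=accelerator.is_main_process,
    save_function=accelerator.save,
    state_dict=accelerator.get_state_dict(model),
)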

from deepseek-coder.

A-Janj avatar A-Janj commented on August 22, 2024 1

size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([32256, 2048]).

https://huggingface.co/docs/accelerate/usage_guides/deepspeed#saving-and-loading

Thank you so much for your time and help. This helped me understand the different DeepSpeed config files and ZeRO stages.
What I did to resolve my issue was to add trainer.save_model() and the tokenizer save to the finetune script, as below:

trainer.train()
trainer.save_model("SaveOutputFolder")
trainer.tokenizer.save_pretrained("SaveOutputFolder")
trainer.save_state()

Thanks once again so much for all your help.

from deepseek-coder.

DejianYang avatar DejianYang commented on August 22, 2024

I am trying to finetune DeepSeek-Coder but I am getting this -9 kill code, and I have no idea why. My dataset is in the following format:

Exit code -9 means the process was killed (SIGKILL), usually by the OS out-of-memory killer. Please check whether you have enough CPU memory.

from deepseek-coder.

A-Janj avatar A-Janj commented on August 22, 2024

I am trying to finetune DeepSeek-Coder but I am getting this -9 kill code, and I have no idea why. My dataset is in the following format:

Exit code -9 means the process was killed (SIGKILL), usually by the OS out-of-memory killer. Please check whether you have enough CPU memory.

I have 64 GB of RAM (CPU memory). How much does DeepSeek require for finetuning?

from deepseek-coder.

DejianYang avatar DejianYang commented on August 22, 2024

I am trying to finetune DeepSeek-Coder but I am getting this -9 kill code, and I have no idea why. My dataset is in the following format:

Exit code -9 means the process was killed (SIGKILL), usually by the OS out-of-memory killer. Please check whether you have enough CPU memory.

I have 64 GB of RAM (CPU memory). How much does DeepSeek require for finetuning?

I do not have an exact number for the RAM required by finetuning. The finetune script uses DeepSpeed, which requires a lot of RAM for CPU offload. You can try another DeepSpeed config to reduce the CPU memory used if you have enough GPU memory. Maybe you can try our 1.3b model first.

from deepseek-coder.

A-Janj avatar A-Janj commented on August 22, 2024

I am trying to finetune DeepSeek-Coder but I am getting this -9 kill code, and I have no idea why. My dataset is in the following format:

Exit code -9 means the process was killed (SIGKILL), usually by the OS out-of-memory killer. Please check whether you have enough CPU memory.

I have 64 GB of RAM (CPU memory). How much does DeepSeek require for finetuning?

I do not have an exact number for the RAM required by finetuning. The finetune script uses DeepSpeed, which requires a lot of RAM for CPU offload. You can try another DeepSpeed config to reduce the CPU memory used if you have enough GPU memory. Maybe you can try our 1.3b model first.

Can you give me an idea of how much GPU VRAM I would need if I have 64 GB of system RAM?
Moreover, should I put false instead of true in the CPU offload parameters in ds_config_zero3.json, like:
"zero_optimization": {
"stage": 3,
"offload_optimizer": {
"device": "cpu",
"pin_memory": false
},
"offload_param": {
"device": "cpu",
"pin_memory": false
},
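
Or would the cleaner way be to disable offload entirely so everything stays on the GPU? I am guessing from general DeepSpeed config conventions (not this repo's docs) that setting the offload device to "none" would do that:

"zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
        "device": "none"
    },
    "offload_param": {
        "device": "none"
    }
}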

from deepseek-coder.

A-Janj avatar A-Janj commented on August 22, 2024

[Screenshot from 2024-03-13 13-06-24]

@DejianYang, can you help me?

I was able to finetune the 6.7b parameter model using 1 x H100 80GB SXM5 (80 GB VRAM, 251 GB RAM, 24 vCPUs). The finetune script created files in the given output folder, but model.safetensors is only 539.6 kB.

Doing inference on the finetuned directory first gave the following error:

"RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([32256, 2048]).
You may consider adding ignore_mismatched_sizes=True in the model from_pretrained method."

After setting the ignore_mismatched_sizes=True argument in the from_pretrained method, the model outputs gibberish. You can see the inference code and the output in the screenshot.

Am I missing something?
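
In case it matters, two things I have seen suggested for ZeRO-3 (from general DeepSpeed/Transformers usage; I have not confirmed them for this repo) are setting "stage3_gather_16bit_weights_on_model_save": true in the DeepSpeed config so trainer.save_model() writes the full gathered weights, or consolidating the sharded checkpoint afterwards:

# Sketch: rebuild full fp32 weights from the ZeRO shards that DeepSpeed
# writes under the output directory (the global_step*/ folders).
from transformers import AutoModelForCausalLM
from deepspeed.utils.zero_to_fp32 import load_state_dict_from_zero_checkpoint

model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True)
model = load_state_dict_from_zero_checkpoint(model, "SaveOutputFolder")  # checkpoint dir
model.save_pretrained("ConsolidatedFolder")  # tensors are no longer empty

Would either of these be the right fix here?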

from deepseek-coder.

seancarmod-y avatar seancarmod-y commented on August 22, 2024

@DejianYang I'm looking to fine-tune deepseek-coder-1.3b-base. Ideally, I'd like to do it using Hugging Face libraries as I have done for TinyLlama in the attached file. Is this possible, or do I need to use finetune_deepseekcoder.py (and can that even be used for the 1.3b model)?
fine_tune_tiny_llama.txt

from deepseek-coder.

DejianYang avatar DejianYang commented on August 22, 2024

fine_tune_tiny_llama.txt
Yes, you can use your script to finetune our model, just as you would with other Llama models.
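
If it helps, here is a minimal sketch of a plain Hugging Face Trainer loop for deepseek-coder-1.3b-base (the dataset path and hyperparameters are placeholders, not the repo's official settings):

from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "deepseek-ai/deepseek-coder-1.3b-base"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for padding during collation

# Expects a CSV with a 'text' column, like the TinyLlama setup described above.
dataset = load_dataset("csv", data_files="train.csv", split="train")
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
                      remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=1,
                           gradient_accumulation_steps=8, num_train_epochs=1,
                           learning_rate=2e-5, bf16=True),
    train_dataset=dataset,
    # mlm=False gives standard causal-LM labels (inputs shifted by one).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("out")
tokenizer.save_pretrained("out")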

from deepseek-coder.

seancarmod-y avatar seancarmod-y commented on August 22, 2024

fine_tune_tiny_llama.txt
Yes, you can use your script to finetune our model, just as you would with other Llama models.

Hi, that's great, thanks. I can't seem to find documentation on how to format the custom dataset. For both Llama2 and TinyLlama I formatted it as a CSV with a 'text' column. Is there a similar format I can follow for deepseek-coder-1.3b? The format of each row is:
Llama2: [INST] prompt [/INST] Llama2 answer </s>
TinyLlama: <|user|>
prompt
<|assistant|>
TinyLlama answer
I then load the dataset like this:
from datasets import load_dataset
dataset = load_dataset(dataset_folder, split="train")

from deepseek-coder.

LarkLeeOnePiece avatar LarkLeeOnePiece commented on August 22, 2024

Sorry, were you able to solve the mismatched_sizes problem after adding
trainer.train()
trainer.save_model("SaveOutputFolder")
trainer.tokenizer.save_pretrained("SaveOutputFolder")
trainer.save_state()
I met the same problem, could you help me out?
Do you use AutoTokenizer.from_pretrained and AutoModelForCausalLM.from_pretrained to load the model and tokenizer?

from deepseek-coder.

A-Janj avatar A-Janj commented on August 22, 2024

Sorry, were you able to solve the mismatched_sizes problem after adding trainer.train() trainer.save_model("SaveOutputFolder") trainer.tokenizer.save_pretrained("SaveOutputFolder") trainer.save_state()? I met the same problem, could you help me out? Do you use AutoTokenizer.from_pretrained and AutoModelForCausalLM.from_pretrained to load the model and tokenizer?

Yes, I used the following code for inference:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("SaveOutputFolder", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("SaveOutputFolder", ignore_mismatched_sizes=True, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()

input_text = "#write a quick sort algorithm"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_length=128,
    top_k=50,
    top_p=0.95,
    do_sample=True,
    temperature=0.9,  # Adjust as needed
    repetition_penalty=1.2,  # Penalize repeated tokens
    no_repeat_ngram_size=2,  # Prevent repeating n-grams
    num_return_sequences=1,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

from deepseek-coder.

A-Janj avatar A-Janj commented on August 22, 2024

fine_tune_tiny_llama.txt
Yes, you can use your script to finetune our model, just as you would with other Llama models.

Hi, that's great, thanks. I can't seem to find documentation on how to format the custom dataset. For both Llama2 and TinyLlama I formatted it as a CSV with a 'text' column. Is there a similar format I can follow for deepseek-coder-1.3b? The format of each row is: Llama2: [INST] prompt [/INST] Llama2 answer </s> TinyLlama: <|user|> prompt <|assistant|> TinyLlama answer. I then load the dataset like this: from datasets import load_dataset dataset = load_dataset(dataset_folder, split="train")

This is the sample dataset format for DeepSeek-Coder: https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1

The .json file should contain data like:
[
    {
        "instruction": "give python syntax in a Nutshell",
        "output": "Row1"
    },
    {
        "instruction": "Print the content in between the curly brackets to the template output",
        "output": "Row2"
    },
    {
        "instruction": "Statements of the Jinja language that do not have an output.",
        "output": "Row3"
    }
]
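
A small sketch of loading that JSON with datasets and flattening each row into a single text field (the prompt template below is illustrative only; finetune_deepseekcoder.py defines the exact format the repo expects):

from datasets import load_dataset

dataset = load_dataset("json", data_files="train.json", split="train")

def to_text(example):
    # Illustrative template only; check the repo's finetune script for the real one.
    return {"text": f"### Instruction:\n{example['instruction']}\n### Response:\n{example['output']}"}

dataset = dataset.map(to_text)
print(dataset[0]["text"])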

from deepseek-coder.
