deepseek-ai / deepseek-coder

DeepSeek Coder: Let the Code Write Itself
Home Page: https://coder.deepseek.com/
License: MIT License
Running the example from the README, I found that although the config contains torch_dtype, transformers still loads the model in float32. I suggest adding torch_dtype="auto" to the from_pretrained call in the README example, so the model loads in bfloat16.
P.S. Users whose GPUs have SM below 80 presumably cannot use bfloat16.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

path = "./DeepSeek"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype="auto").cuda()

# Sum the in-memory size of all parameters to verify the load dtype.
total = 0
for param in model.parameters():
    total += param.numel() * param.element_size()
print(total / 2**30, "GB")
print(next(model.parameters()).dtype)  # bfloat16 when torch_dtype="auto" is honored
```
Hi, I appreciate your great contribution to the open-source community!
I wonder if there's any measure you took to prevent contamination in your training corpus or any discussion about it in your technical report.
Thanks
Is there a code sample for this, or an example I can refer to? Thanks.
The model works quite well; I have tried it out.
I hope you will later provide a fully Chinese readme.md and a fully Chinese user manual.
After all, there are plenty of Chinese-speaking programmers, don't you think?
Hello, I tried running the deepseek-ai/deepseek-coder-1.3b-instruct model.
After downloading it and following the README.md example, the following line raises an error:
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
The error:
AttributeError: 'LlamaTokenizerFast' object has no attribute 'apply_chat_template'
Is this an environment problem? Could you provide a runnable API or demo for this model?
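For what it's worth, apply_chat_template was only added in transformers 4.34, so older installs raise exactly this AttributeError; upgrading transformers is the clean fix. As a stopgap, here is a minimal sketch that builds the instruct prompt by hand, assuming the `### Instruction:` / `### Response:` format shown in this repo's README:

```python
# Workaround sketch for transformers < 4.34, where apply_chat_template does not
# exist: build the prompt manually in the README's documented instruct format.
prompt = ("You are an AI programming assistant.\n"
          "### Instruction:\n"
          "write a quick sort algorithm in python.\n"
          "### Response:\n")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```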
Hi, good afternoon. I deployed the deepseek-ai/deepseek-coder-7b-instruct model on SageMaker with the same config as your Hugging Face demo (top_p 0.9 and top_k 50; I assume the temperature is 0.6, so if that is not the value you use, please tell me) and with do_sample set to false. It runs fine, but a prompt that gives a correct and accurate result on your demo gives a noticeably less accurate result on my deployment with the same prompt. Is there any tweak you made that you can share? I would appreciate the help. @chester please respond to this.
Also, could it be that there is both a "deepseek-ai/deepseek-coder-7b-instruct" and a "deepseek-ai/deepseek-coder-7b-chat"?
And what is the stop token? Even when I use "stop": ["<|EOT|>"], it keeps generating until max_new_tokens is exhausted.
Here is how I am deploying to SageMaker:

```python
import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

hub = {
    'HF_MODEL_ID': 'deepseek-ai/deepseek-coder-6.7b-instruct',
    'SM_NUM_GPUS': json.dumps(1)
}

huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="1.1.0"),
    env=hub,
    role=role,
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)

predictor.predict({
    "inputs": "My name is Julien and I like to",
    "parameters": {
        "do_sample": False,
        "top_p": 0.90,
        "top_k": 50,
        "temperature": 0.35,
        "max_new_tokens": 1024,
        "repetition_penalty": 1.0,
        "stop": ["<|EOT|>"]
    }
})
```
Can you share more details on the technique for the repo-level concatenation part?
Really fantastic work! I just have one question about the license, which is said to allow both research and commercial use. I'm curious whether the DeepSeek-Coder-Instruct series was fine-tuned on any GPT-3.5/4-generated data. If so, I'm not sure it can claim to be "completely open source", because OpenAI's terms restrict it: people "cannot use the output of them to develop models that compete with OpenAI" (cf. https://openai.com/policies/terms-of-use). Would appreciate any clarification!
Hi,
The reported HumanEval score of deepseek-7b-instruct is 78.6%, but I can't reproduce that result. Which datasets was it trained on?
Hey!
I'd like to work on implementing exllama support, but the tokenizer.model is missing.
https://discord.com/channels/1169871344037548062/1170016413428228168/1171019891839615077
https://discord.com/channels/1169871344037548062/1170016413428228168/1171040772229967952
A couple of weeks ago I asked for this on the Discord and was told that you are working on it.
I would highly appreciate it if you published the tokenizer.model. :)
We tried to deploy the 6.7B instruct model with TensorRT-LLM and Triton Inference Server. The model does not seem to generate correctly; it only repeats certain characters over and over.
Any instructions about TensorRT-LLM deployment or plans in the future? Thank you.
There are separate evaluation scripts for the base model and the instruction-tuned model on the HumanEval dataset, but I couldn't find the script for evaluating the instruction-tuned model on MBPP.
Modelfile:
```
FROM ./deepseek-coder-6.7b-instruct.Q4_0.gguf
```
shell:
```
ollama create DeepSeekqq4 -f Modelfile
```
output:
```
deepseek-coder-6.7B-instruct-GGUF# ollama run DeepSeekqq4:latest
>>> hi
.
A: If you are using a version of Java that supports the java-xml and jsoup libraries, then these libraries provide
some very powerful tools for manipulating HTML content.
Here's an example to get all href links from html page with JSoup:

Document doc = Jsoup.connect("http://example.com").get();
Elements links = doc.select("a[href]");
for (Element link : links) {
    System.out.println(link.attr("abs:href"));
}

Or if you prefer java-xml, here's an example:

URL url = new URL("http://example.com");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
Document doc = dbf.newDocumentBuilder().parse(url.openStream());
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodes = (NodeList) xpath.evaluate("^C
```
That is the problem I ran into. It feels like a model-compatibility issue, or possibly a quantization issue. I hope someone can reproduce it and suggest a fix.
Thank you for sharing the details on this work. This is indeed impressive!
It was mentioned that a repo-level dedup was performed. Did you also consider exact and fuzzy dedup at the file level, or run studies showing that repo-level dedup performed better?
Also, was your repo-level dedup exact or fuzzy?
Hello everyone,
Is there a plan to create a Visual Studio Code extension that can use my codebase as context?
And thank you for DeepSeek, it's brilliant and I hope it stays free.
It throws an error on a single A100 with torch 2.0.1 and transformers 4.35:
key = torch.repeat_interleave(key, self.num_queries_per_kv, dim=1)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
I'd like to ask whether there are plans to release int4- or int8-quantized models.
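While waiting for official releases, one stopgap (an assumption, not an official artifact) is on-the-fly 4-bit or 8-bit quantization with bitsandbytes at load time; the settings below are illustrative:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization applied while loading; swap load_in_4bit for
# load_in_8bit to get int8 instead.
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-6.7b-instruct",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True,
                                           bnb_4bit_quant_type="nf4"),
    device_map="auto",
    trust_remote_code=True,
)
```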
Hello,
I load the gguf file with ollama (0.1.7). The Modelfile I use:

```
FROM ./deepseek-coder-6.7b-instruct.Q4_K_M.gguf
PARAMETER temperature 0.7
TEMPLATE """{{ .System }}
### Instruction:
{{ .Prompt }}
### Response:
"""
SYSTEM """You are an advanced AI programming assistant."""
```

After loading I run:

```
ollama run deepseekQ4KM
>>> hi
^C
```

It seems to keep printing newlines forever?
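A guess worth trying rather than a confirmed fix: add explicit stop sequences so ollama halts at the model's end-of-turn marker. `PARAMETER stop` is standard Modelfile syntax, and `<|EOT|>` is the instruct model's end-of-turn token:

```
FROM ./deepseek-coder-6.7b-instruct.Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER stop "<|EOT|>"
PARAMETER stop "### Instruction:"
TEMPLATE """{{ .System }}
### Instruction:
{{ .Prompt }}
### Response:
"""
SYSTEM """You are an advanced AI programming assistant."""
```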
Very impressive work. Does the repo-level code concatenation improve the HumanEval result?
```python
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
>>> tokenizer = AutoTokenizer.from_pretrained("./deepseek-coder-1.3b-instruct", trust_remote_code=True)
>>> model = AutoModelForCausalLM.from_pretrained("./deepseek-coder-1.3b-instruct", trust_remote_code=True).cuda()
>>> messages = [
...     { 'role': 'user', 'content': "write a quick sort algorithm in python."}
... ]
>>> inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'LlamaTokenizerFast' object has no attribute 'apply_chat_template'
```
The 33b version is way too slow for me. I'm currently using 6.7b, but I wonder if I could use a model between 6.7b and 13b.
Is there any way to downscale the 33B model to one with custom hyper-parameters?
Thanks
Hi, thanks for the great work.
I have just evaluated the 33b base model on both HumanEval and HumanEval+. However, there is a gap between the claimed 56.1% pass@1 from https://github.com/deepseek-ai/DeepSeek-Coder#2-evaluation-results and my regenerated 51.2%. I am curious whether there is anything I misimplemented for DeepSeek-Coder. We were using torch.bfloat16 and the default prompt from HumanEval; meanwhile, I am using greedy decoding with top_p=0.95 and top_k=None.
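Worth noting as a possible source of the gap: in transformers, "greedy decoding with top_p=0.95" is ambiguous, because top_p only takes effect when sampling is enabled. A minimal sketch of the two regimes, assuming a standard generate call:

```python
# Assuming `model` and `inputs` were created as in the repo's README example.
# True greedy decoding: with do_sample=False, top_p/top_k are ignored entirely.
greedy = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Nucleus sampling: the only mode where top_p=0.95 has an effect; it is
# stochastic, so single-sample pass@1 can drift a few points between runs.
sampled = model.generate(**inputs, max_new_tokens=512, do_sample=True,
                         temperature=0.8, top_p=0.95)
```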
Thanks for your great work! I'm using the scripts in your repo to evaluate the released DeepSeek-Coder 1.3B (both the base and instruct versions) on 8 V100s (therefore I use fp16 as the mixed precision).
But there is a performance gap between my results and those reported in this repo (3% for the base version and around 20% for the instruct version).
I'm wondering how to reproduce the reported results; might the gap be caused by fp16?
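One plausible factor, offered as a guess: V100s (SM 7.0) lack bfloat16 support, so the run necessarily falls back to fp16, whose narrower exponent range can shift generations relative to a bf16 reference. A one-line check:

```python
import torch

# False on V100 (SM 7.0), True on A100/H100 (SM >= 80); where this is False,
# fp16 is the only half-precision option, which may explain small metric gaps.
print(torch.cuda.is_bf16_supported())
```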
For the chat model, I believe users have to know the prompt format in order to deploy the model correctly. Any plan to add instructions about the prompt format?
Like this:
https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0#prompt-format
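Until such docs land, the format can be recovered from the tokenizer itself (transformers >= 4.34), since the instruct checkpoints ship a chat template:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct",
                                    trust_remote_code=True)
# Render a one-turn conversation to see the raw prompt string the model expects.
print(tok.apply_chat_template([{"role": "user", "content": "hi"}], tokenize=False))
```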
How many programming languages are supported for code completion and generation?
Both the 6B and 33B have a similar problem; in some situations the 6B also shows this behavior.
Environment where it can be reproduced: https://huggingface.co/spaces/deepseek-ai/deepseek-coder-33b-instruct
Q: The scope of computer-science knowledge is really vast, huh?
A: Yes, computer science is a very broad field covering many different topics. Here are some of the main areas of computer science:
Theory of computation: studies the nature and behavior of computers and computational processes.
Computer systems: studies the design, implementation, and application of computer hardware and software.
Networks and communication: studies the design, implementation, and application of computer networks and communication systems.
Data science: studies how to extract useful information and knowledge from data.
Artificial intelligence: studies how to make computer systems understand, learn, and adapt to their environment.
Computer graphics: studies the generation and processing of computer graphics and images.
Software engineering: studies how to design, implement, and maintain software systems.
Information security: studies how to protect computer systems and networks from attack.
Databases: studies how to store, retrieve, and manage data.
Computer vision: studies how to make computer systems understand and interpret visual information.
Computer graphics: studies how to create and process graphics and images.
Human-computer interaction: studies how computer systems and users interact.
Computer networks: studies how to connect and manage computers and networks.
Computer architecture: studies the design and implementation of computer hardware and software.
Computer languages: studies how to create and understand computer programs.
Computer graphics: studies how to create and process graphics and images.
Computer vision: studies how to make computer systems understand and interpret visual information.
Natural language processing: studies how to make computer systems understand and generate natural language.
Computer graphics: studies how to create and process graphics and images.
Computer vision: studies how to make computer systems understand and interpret visual information.
(the previous two lines then repeat verbatim, alternating, until the generation was stopped)
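Not an official fix, but for anyone running the model locally, the standard transformers decoding knobs usually break this kind of verbatim loop; the values below are illustrative:

```python
# Assuming `model` and `inputs` were set up as in the README example.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.05,  # mild penalty on tokens already generated
    no_repeat_ngram_size=8,   # hard-blocks verbatim 8-gram loops
)
```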
The question is a bit silly, but I hope you can provide a conversion method, or else the tokenizer.model file.
Has anyone used it to produce SQL output? How should the prompt be constructed for that?
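Not an official recipe, just a hypothetical prompt shape that tends to work with instruct-tuned code models: state the schema, then ask for the query, using the repo's documented `### Instruction:` / `### Response:` format. The table names and question below are made up for illustration:

```python
# Hypothetical text-to-SQL prompt; `tokenizer` and `model` are assumed to be
# loaded as in the README example.
prompt = """### Instruction:
Given the following tables:
    users(id INT, name TEXT, created_at DATE)
    orders(id INT, user_id INT, total NUMERIC)
Write a SQL query that returns each user's name and their total order amount for 2023.
### Response:
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```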
After running the fine-tuning script, no model is saved to OUTPUT_PATH, and there is no obvious error.
Environment: 24 GB VRAM, 32 GB RAM
Input: 100 training samples
Output after running finetune.sh:
Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
Using /home/jovyan/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/jovyan/.cache/torch_extensions/py310_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 6.114262342453003 seconds
Parameter Offload: Total persistent parameters: 266240 in 65 params
How can I do model parallelism?
As the title says. ^_^
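Not an official answer, but the lowest-effort form of model parallelism with the HF checkpoints is accelerate's `device_map="auto"`, which spreads layers across all visible GPUs (naive pipeline parallelism rather than tensor parallelism; for the latter, an inference server such as vLLM is the usual route). A minimal sketch:

```python
from transformers import AutoModelForCausalLM

# Requires the `accelerate` package; layers are sharded across visible GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-33b-instruct",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
```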
Hi,
Some of the released models have config 'max_position_embeddings' set to 8192 instead of 16384. However, in the repo readme, the models are said to have a 16k context size. Could you clarify what the max sequence length of these models is? Thanks!
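For what it's worth, one way the two numbers could both be true is RoPE scaling: if the config carries a scaling factor, the effective context can exceed `max_position_embeddings`. A quick inspection sketch, assuming the checkpoint exposes these fields:

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct",
                                 trust_remote_code=True)
print(cfg.max_position_embeddings)         # base positional size
print(getattr(cfg, "rope_scaling", None))  # e.g. a linear scaling factor, if set
```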
Hi! Do you plan on releasing the DeepSeek Chat models?
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

folder = "deepseek-coder-6.7b-instruct"
model_name = "deepseek-ai/deepseek-coder-6.7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).cuda()
```

```
Traceback (most recent call last):
  File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1345, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 48, in <module>
    from flash_attn import flash_attn_func, flash_attn_varlen_func
  File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/flash_attn/__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 8, in <module>
    import flash_attn_2_cuda as flash_attn_cuda
ImportError: /home/blap/AutoGPTQ-env/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c107WarningC1ENS_7variantIJNS0_11UserWarningENS0_18DeprecationWarningEEEERKNS_14SourceLocationESsb

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/blap/AutoGPTQ_teste/update_models.py", line 14, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).cuda()
  File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 565, in from_pretrained
    model_class = _get_model_class(config, cls._model_mapping)
  File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 387, in _get_model_class
    supported_models = model_mapping[type(config)]
  File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 740, in __getitem__
    return self._load_attr_from_module(model_type, model_name)
  File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 754, in _load_attr_from_module
    return getattribute_from_module(self._modules[module_name], attr)
  File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 698, in getattribute_from_module
    if hasattr(module, attr):
  File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1335, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1347, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c107WarningC1ENS_7variantIJNS0_11UserWarningENS0_18DeprecationWarningEEEERKNS_14SourceLocationESsb
```
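The undefined C++ symbol usually means the installed flash-attn wheel was built against a different torch ABI than the one in this virtualenv. A common remedy (a guess, not an official fix) is to rebuild flash-attn against the current torch:

```
pip uninstall -y flash-attn
pip install flash-attn --no-build-isolation
```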
On the base model, we fine-tuned with evol-instruct data at the same 2B scale, but could not reproduce the HumanEval result. Could you offer any suggestions?
Thank you for the handy fine-tuning guide, but I am not able to get started.
I tried using the default settings as a proof of concept, but it ends up erroring out.
This is the output I get when using the sample deepspeed command in the README.md:
[2023-11-27 20:47:43,736] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-27 20:47:44,929] [WARNING] [runner.py:203:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-11-27 20:47:44,929] [INFO] [runner.py:570:main] cmd = /home/user/DeepSeek-Coder/finetune/.venv/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None finetune_deepseekcoder.py --model_name_or_path deepseek-ai/deepseek-coder-6.7b-instruct --data_path ./data/training-data.json --output_dir ./output/ --num_train_epochs 3 --model_max_length 1024 --per_device_train_batch_size 16 --per_device_eval_batch_size 1 --gradient_accumulation_steps 4 --evaluation_strategy no --save_strategy steps --save_steps 100 --save_total_limit 100 --learning_rate 2e-5 --warmup_steps 10 --logging_steps 1 --lr_scheduler_type cosine --gradient_checkpointing True --report_to tensorboard --deepspeed configs/ds_config_zero3.json --bf16 True
[2023-11-27 20:47:47,291] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-27 20:47:48,425] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0]}
[2023-11-27 20:47:48,425] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-11-27 20:47:48,425] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-11-27 20:47:48,425] [INFO] [launch.py:163:main] dist_world_size=1
[2023-11-27 20:47:48,425] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0
[2023-11-27 20:47:52,153] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-27 20:47:52,476] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-11-27 20:47:52,476] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
====================================================================================================
TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=True,
bf16_full_eval=False,
cache_dir=None,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=configs/ds_config_zero3.json,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=4,
gradient_checkpointing=True,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=./output/runs/Nov27_20-47-51_dev-llm-finetuning,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=1.0,
logging_strategy=steps,
lr_scheduler_type=cosine,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
model_max_length=1024,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_torch,
optim_args=None,
output_dir=./output/,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=1,
per_device_train_batch_size=16,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=./output/,
save_on_each_node=False,
save_safetensors=True,
save_steps=100,
save_strategy=steps,
save_total_limit=100,
seed=42,
skip_memory_metrics=True,
split_batches=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=10,
weight_decay=0.0,
)
PAD Token: <|end▁of▁sentence|> 32014
BOS Token <|begin▁of▁sentence|> 32013
EOS Token <|EOT|> 32021
Load tokenizer from deepseek-ai/deepseek-coder-6.7b-instruct over.
[2023-11-27 20:48:03,930] [INFO] [partition_parameters.py:348:__exit__] finished initializing model - num_params = 291, num_elems = 6.74B
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:11<00:00, 5.72s/it]
Load model from deepseek-ai/deepseek-coder-6.7b-instruct over.
Training dataset samples: 99
...
Using /home/user/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/user/.cache/torch_extensions/py310_cu118/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.3529317378997803 seconds
Parameter Offload: Total persistent parameters: 266240 in 65 params
[2023-11-27 20:49:16,555] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 36733
[2023-11-27 20:49:16,557] [ERROR] [launch.py:321:sigkill_handler] ['/home/user/DeepSeek-Coder/finetune/.venv/bin/python', '-u', 'finetune_deepseekcoder.py', '--local_rank=0', '--model_name_or_path', 'deepseek-ai/deepseek-coder-6.7b-instruct', '--data_path', './data/training-data.json', '--output_dir', './output/', '--num_train_epochs', '3', '--model_max_length', '1024', '--per_device_train_batch_size', '16', '--per_device_eval_batch_size', '1', '--gradient_accumulation_steps', '4', '--evaluation_strategy', 'no', '--save_strategy', 'steps', '--save_steps', '100', '--save_total_limit', '100', '--learning_rate', '2e-5', '--warmup_steps', '10', '--logging_steps', '1', '--lr_scheduler_type', 'cosine', '--gradient_checkpointing', 'True', '--report_to', 'tensorboard', '--deepspeed', 'configs/ds_config_zero3.json', '--bf16', 'True'] exits with return code = -9
I tried to run the finetune_deepseekcoder.py script directly to see what the actual error is, and it output:
Traceback (most recent call last):
File "/home/user/DeepSeek-Coder/finetune/finetune_deepseekcoder.py", line 193, in <module>
train()
File "/home/user/DeepSeek-Coder/finetune/finetune_deepseekcoder.py", line 187, in train
trainer.train()
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/transformers/trainer.py", line 1555, in train
return inner_training_loop(
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/transformers/trainer.py", line 1860, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/transformers/trainer.py", line 2725, in training_step
loss = self.compute_loss(model, inputs)
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/transformers/trainer.py", line 2748, in compute_loss
outputs = model(**inputs)
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 659, in forward
return model_forward(*args, **kwargs)
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 647, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1034, in forward
outputs = self.model(
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 879, in forward
inputs_embeds = self.embed_tokens(input_ids)
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
return F.embedding(
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2233, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: 'weight' must be 2-D
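Two hedged observations rather than confirmed diagnoses: exit code -9 is a SIGKILL, which typically means the Linux OOM killer fired, plausible since ZeRO-3 with CPU Adam offload needs far more host RAM than a 6.7B model leaves available on a 32 GB machine; and the 'weight' must be 2-D error is what appears when the script runs outside the deepspeed launcher while the ZeRO-3 config still partitions parameters into placeholders. A first mitigation to try, with illustrative values:

```
# Shrink the per-step memory footprint, and keep the deepspeed launcher so
# ZeRO-3 parameter partitioning is actually initialized.
deepspeed finetune_deepseekcoder.py \
    --model_name_or_path deepseek-ai/deepseek-coder-6.7b-instruct \
    --data_path ./data/training-data.json \
    --output_dir ./output/ \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 64 \
    --model_max_length 512 \
    --deepspeed configs/ds_config_zero3.json \
    --bf16 True
```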
The official deepspeed ZeRO-3 script runs rather slowly. Could you provide a multi-node, multi-GPU training script?
Many thanks!
It writes code considerably better than the free version of ChatGPT; it is definitely the ceiling among Chinese large models.
But for a company with 10,000 A100s, the generation speed of your beta chat app is far too slow. You truly are a surreal company.
A question: during code completion, the model often keeps generating without stopping. Have you observed anything similar?
The first data-cleaning step is currently the same as StarCoder's. I'd like to learn how you then filter out low-quality code, code with syntax errors, or code with poor readability.
Thanks!