
deepseek-coder's People

Contributors

aleksandir, antiquality, benjamin-eecs, bingxuanwang, dejianyang, eltociear, foldl, guoday, itstalmeez, jacoblincool, lyriczhao, pcystc, pkuzqh, soloice


deepseek-coder's Issues

GPU memory usage is too high; the example loads in fp32

Running the example from the README, I found that even though the config contains torch_dtype, transformers still loads the model in float32. I suggest passing torch_dtype="auto" to from_pretrained in the README example, which loads the model in bfloat16.

P.S. Users with SM below 80 probably cannot use bfloat16.

path = "./DeepSeek"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype="auto").cuda()
total = 0
for param in model.parameters():
    total += param.numel() * param.element_size()
print(total / 2**30, "GB")
print(next(model.parameters()).dtype)
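
Following up on the P.S.: bfloat16 needs compute capability 8.0 or newer (Ampere), so on older GPUs a float16 fallback can be chosen at load time. A minimal sketch, assuming a CUDA device is available (this fallback is a suggestion, not part of the original report):

import torch
from transformers import AutoModelForCausalLM

path = "./DeepSeek"  # same local checkpoint as above
major, _ = torch.cuda.get_device_capability()
# bfloat16 requires compute capability >= 8.0 (Ampere); fall back to float16 otherwise.
dtype = torch.bfloat16 if major >= 8 else torch.float16
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=dtype).cuda()
print(next(model.parameters()).dtype)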

Discussion about possible contamination?

Hi, I appreciate your great contribution to the open-source community!

I wonder whether you took any measures to prevent contamination of your training corpus, or whether there is any discussion of this in your technical report.

Thanks

Problem running deepseek-coder-1.3b-instruct

Hi, I am trying to run the deepseek-ai/deepseek-coder-1.3b-instruct model.

After downloading it and following the example in the README.md, the following line raises an error:

inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

The error is:

AttributeError: 'LlamaTokenizerFast' object has no attribute 'apply_chat_template'

Is this an environment problem? Could you provide a runnable API or demo for this model?
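
For what it's worth, apply_chat_template was only added in transformers 4.34, so this usually means the installed transformers is too old; upgrading should make the original snippet work. A workaround sketch that builds the prompt by hand using the ### Instruction / ### Response format seen elsewhere in this repo (the system prompt wording here is an assumption, not the official one):

from transformers import AutoTokenizer, AutoModelForCausalLM

path = "deepseek-ai/deepseek-coder-1.3b-instruct"
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(path, trust_remote_code=True).cuda()

# Manually formatted prompt; the exact system line is illustrative.
prompt = (
    "You are an AI programming assistant.\n"
    "### Instruction:\n"
    "write a quick sort algorithm in python.\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))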

SageMaker Hugging Face deployment issue

Hi, good afternoon. I deployed the deepseek-ai/deepseek-coder-7b-instruct model on SageMaker with the same configuration as your Hugging Face demo (top_p 0.9, top_k 50; I assume the temperature is 0.6, if not please tell me the one you use) and do_sample set to false. It runs fine, but the same prompt gives correct and accurate results on your demo while my deployment does not. Is there any tweak you made that you can share with me? I need your help, thanks. @chester please respond to this.
Also, could it be that there is both a "deepseek-ai/deepseek-coder-7b-instruct" and a "deepseek-ai/deepseek-coder-7b-chat"?

And what is the stop token? Even when I use "stop": ["<|EOT|>"], it still keeps generating until max_new_tokens is exhausted.
Here is how I am deploying to SageMaker:
import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'deepseek-ai/deepseek-coder-6.7b-instruct',
    'SM_NUM_GPUS': json.dumps(1)
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="1.1.0"),
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)

# send request
predictor.predict({
    "inputs": "My name is Julien and I like to",
    "parameters": {
        "do_sample": False,
        "top_p": 0.90,
        "top_k": 50,
        "temperature": 0.35,
        "max_new_tokens": 1024,
        "repetition_penalty": 1.0,
        "stop": ["<|EOT|>"]
    }
})

Question on the license

Really fantastic work!!! I just have one question about the license, which is said to allow both research and commercial use for free. I'm curious whether the DeepSeek-Coder-Instruct series was fine-tuned on any GPT-3.5/4-generated data. If so, I'm not sure it can claim to be "completely open source", because OpenAI's terms of use restrict people from using model outputs "to develop models that compete with OpenAI" (cf. https://openai.com/policies/terms-of-use). Would appreciate any clarification!

Instruction dataset?

Hi,
The HumanEval score of deepseek-7b-instruct is reported as 78.6%, but I cannot reproduce that result. Which datasets did you train on?

Local output differs from the website output

When I run the code locally, if the input is longer than 670 tokens the model simply echoes the original question. What does that mean? Why does the same input produce no result with the local code but work on the website?
(screenshots omitted)

TensorRT-LLM Support

We tried to deploy the 6.7B instruct model with TensorRT-LLM and Triton Inference Server. The model does not seem to generate correctly; it only keeps repeating certain characters.

Any instructions about TensorRT-LLM deployment or plans in the future? Thank you.

Nonsensical, unrelated output when using ollama

Modelfile

FROM ./deepseek-coder-6.7b-instruct.Q4_0.gguf

shell

ollama create DeepSeekqq4 -f Modelfile

output

deepseek-coder-6.7B-instruct-GGUF# ollama run DeepSeekqq4:latest
>>> hi
.


A: If you are using a version of Java that supports the java-xml and jsoup libraries, then these libraries provide
some very powerful tools for manipulating HTML content.
Here's an example to get all href links from html page with JSoup:
Document doc = Jsoup.connect("http://example.com").get();
Elements links = doc.select("a[href]");
for (Element link : links) {
    System.out.println(link.attr("abs:href"));
}

Or if you prefer java-xml, here's an example:
URL url = new URL("http://example.com");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
Document doc = dbf.newDocumentBuilder().parse(url.openStream());
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodes = (NodeList) xpath.evaluate("^C

The above is the problem I ran into. It feels like a model-compatibility issue, or possibly a quantization issue. I hope someone can reproduce it and suggest a fix.

Dedup of code during data prep

Thank you for sharing the details on this work. This is indeed impressive!

It was mentioned that a repo-level dedup was performed. Did you also consider exact and fuzzy dedup at the file level, or do any studies on whether repo-level dedup performed better?

Also, was your repo-level dedup an exact dedup or a fuzzy dedup?

Visual Studio Code Extension

Hello everyone,

Is there a plan to create a Visual Studio Code extension that can use my codebase as context?

And thank you for DeepSeek, it's brilliant and I hope it stays free.

How can I deploy the 33B model with vLLM?

It throws an error on a single-node A100, with torch 2.0.1 and transformers 4.35:
key = torch.repeat_interleave(key, self.num_queries_per_kv, dim=1)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
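
For reference, a minimal vLLM loading sketch for the 33B model (the settings below, including tensor_parallel_size and max_model_len, are illustrative assumptions, not a confirmed fix for the illegal-memory-access crash):

from vllm import LLM, SamplingParams

# Illustrative settings: 33B weights in bf16 are roughly 66 GB, so a single 80 GB A100
# leaves little room for the KV cache; splitting across two GPUs is one common option.
llm = LLM(
    model="deepseek-ai/deepseek-coder-33b-instruct",
    dtype="bfloat16",
    tensor_parallel_size=2,
    max_model_len=4096,
)

params = SamplingParams(temperature=0.0, max_tokens=256)
outputs = llm.generate(["write a quick sort algorithm in python."], params)
print(outputs[0].outputs[0].text)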

Running 6.7B with Ollama, chat replies are blank

Hi,

I load the gguf file with ollama (0.1.7).

The Modelfile used is:

FROM ./deepseek-coder-6.7b-instruct.Q4_K_M.gguf
PARAMETER temperature 0.7

TEMPLATE """{{ .System }}
### Instruction:
{{ .Prompt }}
### Response:
"""

SYSTEM """You are an advanced AI programming assistant."""

After loading, I run:

ollama run deepseekQ4KM   
>>> hi
(the model prints only blank lines until I interrupt with ^C)

It seems to keep outputting nothing but newlines?

Error when running the example 3 code

from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("./deepseek-coder-1.3b-instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("./deepseek-coder-1.3b-instruct", trust_remote_code=True).cuda()
messages = [
    { 'role': 'user', 'content': "write a quick sort algorithm in python."}
]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'LlamaTokenizerFast' object has no attribute 'apply_chat_template'

How can I create a 13B version?

The 33b version is way too slow for me. I'm currently using 6.7b, but I wonder if I can use a model between 6.7b and 13b.

Is there any way to downscale the 33B model into a model with custom hyper-parameters?
Thanks

Reproducing HumanEval scores

Hi thanks for the great work.

I have just evaluated the 33b base model on both HumanEval and HumanEval+. However, there is a gap between the claimed 56.1% pass@1 from https://github.com/deepseek-ai/DeepSeek-Coder#2-evaluation-results and my reproduced 51.2%. I am curious whether there is anything I mis-implemented for DeepSeek-Coder. We were using torch.bfloat16 and the default prompt from HumanEval, and I am using greedy decoding with top_p=0.95 and top_k=None.


Cannot reproduce the results of HumanEval-Python

Thanks for your great work! I'm using the scripts in your repo to evaluate the released DeepSeek-Coder 1.3B (both the base and instruct versions) on 8 V100s (therefore I use fp16 as the mixed precision).

But there is a performance gap between my results and those reported in this repo (3% for the base version and around 20% for the instruct version).

I'm wondering how to reproduce the reported results, and whether the gap might be caused by fp16.

Error loading the 33b model

After downloading the 33b model, it fails to load, while the same command works for the 6.7 and 1.3 models.
(error screenshot omitted)

When the prompt contains many numbers, the 33b-instruct model produces repeated output

The prompt contains a JSON string of numbers like this:

{"发布月份": ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"], "中标总金额": ["529.62万", "170.91万", "176.99万", "1046.61万", "290.27万", "1109.53万", "335.30万", "1104.10万", "1011.10万", "383.46万", "192.12万", "225.39万"]}

and asks the model to use the pyecharts Python package to generate a line chart from these numbers.
(screenshot of the repeated output omitted)

In some cases, the model repeats its last one or two sentences

Both the 6B and 33B show similar problems; the 6B also exhibits this behavior in some cases.

A reproducible environment: https://huggingface.co/spaces/deepseek-ai/deepseek-coder-33b-instruct

Q: The scope of computer science knowledge is really broad.

A: Yes, computer science is a very broad field covering many different topics. Here are some of the main areas of computer science:

Theory of computation: studies the nature and behavior of computers and computational processes.

Computer systems: studies the design, implementation, and application of computer hardware and software.

Networks and communication: studies the design, implementation, and application of computer networks and communication systems.

Data science: studies how to extract useful information and knowledge from data.

Artificial intelligence: studies how to make computer systems able to understand, learn, and adapt to their environment.

Computer graphics: studies the generation and processing of computer graphics and images.

Software engineering: studies how to design, implement, and maintain software systems.

Information security: studies how to protect computer systems and networks from attack.

Databases: studies how to store, retrieve, and manage data.

Computer vision: studies how to make computer systems able to understand and interpret visual information.

Human-computer interaction: studies how computer systems and users interact.

Computer networks: studies how to connect and manage computers and networks.

Computer architecture: studies the design and implementation of computer hardware and software.

Programming languages: studies how to create and understand computer programs.

Natural language processing: studies how to make computer systems understand and generate natural language.

From this point the answer degenerates: the two items "Computer graphics: studies how to create and process graphics and images" and "Computer vision: studies how to make computer systems understand and interpret visual information" repeat in alternation dozens of times until the output is cut off.

Generating SQL

Has anyone used it to generate SQL? How should the prompt be constructed?
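
One possible way to phrase a text-to-SQL prompt, shown as a sketch (the schema, wording, and model id below are illustrative, not an officially recommended format):

from transformers import AutoTokenizer, AutoModelForCausalLM

path = "deepseek-ai/deepseek-coder-6.7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(path, trust_remote_code=True).cuda()

# Describe the schema and the question explicitly, and ask for SQL only.
messages = [{
    "role": "user",
    "content": (
        "Given the table orders(id INT, customer TEXT, amount DECIMAL, created_at DATE), "
        "write a SQL query that returns the total amount per customer for 2023. "
        "Only output the SQL."
    ),
}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=128, do_sample=False,
                         eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))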

No model saved after running the finetune script

After running the fine-tuning script, no model was saved to OUTPUT_PATH and there was no obvious error.
Environment: 24 GB GPU memory, 32 GB RAM

Input: 100 training samples

Output after running finetune.sh:
Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
Using /home/jovyan/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/jovyan/.cache/torch_extensions/py310_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 6.114262342453003 seconds
Parameter Offload: Total persistent parameters: 266240 in 65 params

About the Max Sequence Length

Hi,

Some of the released models have 'max_position_embeddings' set to 8192 in their config instead of 16384. However, the repo README says the models have a 16K context size. Could you clarify what the maximum sequence length of these models is? Thanks!
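
One way to see what a given checkpoint actually declares is to read its config directly; a small inspection sketch (the model id is just an example, and this only reports the config fields rather than settling which number is intended):

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained(
    "deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True
)
print("max_position_embeddings:", cfg.max_position_embeddings)
# If RoPE scaling is configured, the usable context may differ from the base value.
print("rope_scaling:", getattr(cfg, "rope_scaling", None))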

DeepSeek Chat

Hi! Do you plan on releasing the DeepSeek Chat models?

flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol

from transformers import AutoTokenizer, AutoModelForCausalLM

folder="deepseek-coder-6.7b-instruct"
model_name="deepseek-ai/deepseek-coder-6.7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).cuda()

Traceback (most recent call last):
File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1345, in _get_module
return importlib.import_module("." + module_name, self.__name__)
File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 1006, in _find_and_load_unlocked
File "", line 688, in _load_unlocked
File "", line 883, in exec_module
File "", line 241, in _call_with_frames_removed
File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 48, in
from flash_attn import flash_attn_func, flash_attn_varlen_func
File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/flash_attn/init.py", line 3, in
from flash_attn.flash_attn_interface import (
File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 8, in
import flash_attn_2_cuda as flash_attn_cuda
ImportError: /home/blap/AutoGPTQ-env/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c107WarningC1ENS_7variantIJNS0_11UserWarningENS0_18DeprecationWarningEEEERKNS_14SourceLocationESsb

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/blap/AutoGPTQ_teste/update_models.py", line 14, in
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).cuda()
File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 565, in from_pretrained
model_class = _get_model_class(config, cls._model_mapping)
File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 387, in _get_model_class
supported_models = model_mapping[type(config)]
File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 740, in getitem
return self._load_attr_from_module(model_type, model_name)
File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 754, in _load_attr_from_module
return getattribute_from_module(self._modules[module_name], attr)
File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 698, in getattribute_from_module
if hasattr(module, attr):
File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1335, in getattr
module = self._get_module(self._class_to_module[name])
File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1347, in _get_module
raise RuntimeError(
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c107WarningC1ENS_7variantIJNS0_11UserWarningENS0_18DeprecationWarningEEEERKNS_14SourceLocationESsb

Cannot reproduce the fine-tuning results

Fine-tuning the base model with evolved instruction data of the same 2B scale does not reproduce the reported HumanEval results. Could you offer any suggestions?

Running finetune_deepseekcoder.py results in return code = -9 and running script directly results in RuntimeError: 'weight' must be 2-D

Thank you for the handy fine-tuning guide, but I am not able to get started.

I tried using the default settings as a proof of concept, but it ends up erroring out.

This is the output I get when using the sample deepspeed command from the README.md:

[2023-11-27 20:47:43,736] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-27 20:47:44,929] [WARNING] [runner.py:203:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-11-27 20:47:44,929] [INFO] [runner.py:570:main] cmd = /home/user/DeepSeek-Coder/finetune/.venv/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None finetune_deepseekcoder.py --model_name_or_path deepseek-ai/deepseek-coder-6.7b-instruct --data_path ./data/training-data.json --output_dir ./output/ --num_train_epochs 3 --model_max_length 1024 --per_device_train_batch_size 16 --per_device_eval_batch_size 1 --gradient_accumulation_steps 4 --evaluation_strategy no --save_strategy steps --save_steps 100 --save_total_limit 100 --learning_rate 2e-5 --warmup_steps 10 --logging_steps 1 --lr_scheduler_type cosine --gradient_checkpointing True --report_to tensorboard --deepspeed configs/ds_config_zero3.json --bf16 True
[2023-11-27 20:47:47,291] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-27 20:47:48,425] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0]}
[2023-11-27 20:47:48,425] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-11-27 20:47:48,425] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-11-27 20:47:48,425] [INFO] [launch.py:163:main] dist_world_size=1
[2023-11-27 20:47:48,425] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0
[2023-11-27 20:47:52,153] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-27 20:47:52,476] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-11-27 20:47:52,476] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
====================================================================================================
TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=True,
bf16_full_eval=False,
cache_dir=None,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=configs/ds_config_zero3.json,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=4,
gradient_checkpointing=True,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=./output/runs/Nov27_20-47-51_dev-llm-finetuning,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=1.0,
logging_strategy=steps,
lr_scheduler_type=cosine,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
model_max_length=1024,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_torch,
optim_args=None,
output_dir=./output/,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=1,
per_device_train_batch_size=16,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=./output/,
save_on_each_node=False,
save_safetensors=True,
save_steps=100,
save_strategy=steps,
save_total_limit=100,
seed=42,
skip_memory_metrics=True,
split_batches=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=10,
weight_decay=0.0,
)
PAD Token: <|end▁of▁sentence|> 32014
BOS Token <|begin▁of▁sentence|> 32013
EOS Token <|EOT|> 32021
Load tokenizer from deepseek-ai/deepseek-coder-6.7b-instruct over.
[2023-11-27 20:48:03,930] [INFO] [partition_parameters.py:348:__exit__] finished initializing model - num_params = 291, num_elems = 6.74B
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:11<00:00,  5.72s/it]
Load model from deepseek-ai/deepseek-coder-6.7b-instruct over.
Training dataset samples: 99
...
Using /home/user/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/user/.cache/torch_extensions/py310_cu118/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.3529317378997803 seconds
Parameter Offload: Total persistent parameters: 266240 in 65 params
[2023-11-27 20:49:16,555] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 36733
[2023-11-27 20:49:16,557] [ERROR] [launch.py:321:sigkill_handler] ['/home/user/DeepSeek-Coder/finetune/.venv/bin/python', '-u', 'finetune_deepseekcoder.py', '--local_rank=0', '--model_name_or_path', 'deepseek-ai/deepseek-coder-6.7b-instruct', '--data_path', './data/training-data.json', '--output_dir', './output/', '--num_train_epochs', '3', '--model_max_length', '1024', '--per_device_train_batch_size', '16', '--per_device_eval_batch_size', '1', '--gradient_accumulation_steps', '4', '--evaluation_strategy', 'no', '--save_strategy', 'steps', '--save_steps', '100', '--save_total_limit', '100', '--learning_rate', '2e-5', '--warmup_steps', '10', '--logging_steps', '1', '--lr_scheduler_type', 'cosine', '--gradient_checkpointing', 'True', '--report_to', 'tensorboard', '--deepspeed', 'configs/ds_config_zero3.json', '--bf16', 'True'] exits with return code = -9

I then ran the finetune_deepseekcoder.py script directly to see what the actual error is, and it output:

Traceback (most recent call last):
  File "/home/user/DeepSeek-Coder/finetune/finetune_deepseekcoder.py", line 193, in <module>
    train()
  File "/home/user/DeepSeek-Coder/finetune/finetune_deepseekcoder.py", line 187, in train
    trainer.train()
  File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/transformers/trainer.py", line 1555, in train
    return inner_training_loop(
  File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/transformers/trainer.py", line 1860, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/transformers/trainer.py", line 2725, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/transformers/trainer.py", line 2748, in compute_loss
    outputs = model(**inputs)
  File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 659, in forward
    return model_forward(*args, **kwargs)
  File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 647, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1034, in forward
    outputs = self.model(
  File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 879, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
    return F.embedding(
  File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2233, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: 'weight' must be 2-D

Is there an API?

It writes code quite a bit better than the free ChatGPT; it is definitely the ceiling among domestic large models.
But for a company with 10,000 A100s, the generation speed of your internal-beta chat app is far too slow. You really are a surreal company.

Repeated output

During code completion, the model often keeps generating without stopping. Have you observed similar behavior?
