deepseek-ai / deepseek-coder

DeepSeek Coder: Let the Code Write Itself
Home Page: https://coder.deepseek.com/
License: MIT License
Running the example from the README, I found that although the config contains torch_dtype, transformers still loads the model in float32. I suggest adding torch_dtype="auto" to the from_pretrained call in the README example, so the model loads in bfloat16.
P.S. Users whose GPUs have SM below 80 presumably cannot use bfloat16.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

path = "./DeepSeek"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype="auto").cuda()

# Sum the in-memory size of all parameters to verify the load dtype.
total = 0
for param in model.parameters():
    total += param.numel() * param.element_size()
print(total / 2**30, "GB")
print(next(model.parameters()).dtype)  # bfloat16 when torch_dtype="auto" is honored
```
Hi, I appreciate your great contribution to the open-source community!
I wonder if there's any measure you took to prevent contamination in your training corpus or any discussion about it in your technical report.
Thanks
Is there a code sample for this, or an example I can refer to? Thanks.
The model works quite well; I have tried it out.
I hope you will later provide a fully Chinese readme.md and a fully Chinese user manual.
After all, there are plenty of Chinese-speaking programmers, don't you think?
Hello, I tried running the deepseek-ai/deepseek-coder-1.3b-instruct model.
After downloading it and following the README.md example, the following line raises an error:
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
The error:
AttributeError: 'LlamaTokenizerFast' object has no attribute 'apply_chat_template'
Is this an environment problem? Could you provide a runnable API or demo for this model?
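For what it's worth, apply_chat_template was only added in transformers 4.34, so older installs raise exactly this AttributeError; upgrading transformers is the clean fix. As a stopgap, here is a minimal sketch that builds the instruct prompt by hand, assuming the `### Instruction:` / `### Response:` format shown in this repo's README:

```python
# Workaround sketch for transformers < 4.34, where apply_chat_template does not
# exist: build the prompt manually in the README's documented instruct format.
prompt = ("You are an AI programming assistant.\n"
          "### Instruction:\n"
          "write a quick sort algorithm in python.\n"
          "### Response:\n")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```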
Hi, good afternoon. I deployed the deepseek-ai/deepseek-coder-7b-instruct model on SageMaker with the same config as your Hugging Face demo (top_p 0.9 and top_k 50; I assume the temperature is 0.6, so if that is not the value you use, please tell me) and with do_sample set to false. It runs fine, but a prompt that gives a correct and accurate result on your demo gives a noticeably less accurate result on my deployment with the same prompt. Is there any tweak you made that you can share? I would appreciate the help. @chester please respond to this.
Also, could it be that there is both a "deepseek-ai/deepseek-coder-7b-instruct" and a "deepseek-ai/deepseek-coder-7b-chat"?
And what is the stop token? Even when I use "stop": ["<|EOT|>"], it keeps generating until max_new_tokens is exhausted.
Here is how I am deploying to SageMaker:

```python
import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

hub = {
    'HF_MODEL_ID': 'deepseek-ai/deepseek-coder-6.7b-instruct',
    'SM_NUM_GPUS': json.dumps(1)
}

huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="1.1.0"),
    env=hub,
    role=role,
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)

predictor.predict({
    "inputs": "My name is Julien and I like to",
    "parameters": {
        "do_sample": False,
        "top_p": 0.90,
        "top_k": 50,
        "temperature": 0.35,
        "max_new_tokens": 1024,
        "repetition_penalty": 1.0,
        "stop": ["<|EOT|>"]
    }
})
```
Can you share more details on the technique for the repo-level concatenation part?
Really fantastic work! I just have one question about the license, which is said to allow both research and commercial use. I'm curious whether the DeepSeek-Coder-Instruct series was fine-tuned on any GPT-3.5/4-generated data. If so, I'm not sure it can claim to be "completely open source", because OpenAI's terms restrict it: people "cannot use the output of them to develop models that compete with OpenAI" (cf. https://openai.com/policies/terms-of-use). Would appreciate any clarification!
Hi,
The reported HumanEval score of deepseek-7b-instruct is 78.6%, but I can't reproduce that result. Which datasets was it trained on?
Hey!
I'd like to work on implementing exllama support, but the tokenizer.model is missing.
https://discord.com/channels/1169871344037548062/1170016413428228168/1171019891839615077
https://discord.com/channels/1169871344037548062/1170016413428228168/1171040772229967952
A couple of weeks ago I asked for this on the Discord and was told that you are working on it.
I would highly appreciate it if you published the tokenizer.model. :)
We tried to deploy the 6.7B instruct model with TensorRT-LLM and Triton Inference Server. The model does not seem to generate correctly; it only repeats certain characters over and over.
Any instructions about TensorRT-LLM deployment or plans in the future? Thank you.
There are separate evaluation scripts for the base model and the instruction-tuned model on the HumanEval dataset, but I couldn't find the script for evaluating the instruction-tuned model on MBPP.
Modelfile:
```
FROM ./deepseek-coder-6.7b-instruct.Q4_0.gguf
```
shell:
```
ollama create DeepSeekqq4 -f Modelfile
```
output:
```
deepseek-coder-6.7B-instruct-GGUF# ollama run DeepSeekqq4:latest
>>> hi
.
A: If you are using a version of Java that supports the java-xml and jsoup libraries, then these libraries provide
some very powerful tools for manipulating HTML content.
Here's an example to get all href links from html page with JSoup:

Document doc = Jsoup.connect("http://example.com").get();
Elements links = doc.select("a[href]");
for (Element link : links) {
    System.out.println(link.attr("abs:href"));
}

Or if you prefer java-xml, here's an example:

URL url = new URL("http://example.com");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
Document doc = dbf.newDocumentBuilder().parse(url.openStream());
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodes = (NodeList) xpath.evaluate("^C
```
That is the problem I ran into. It feels like a model-compatibility issue, or possibly a quantization issue. I hope someone can reproduce it and suggest a fix.
Thank you for sharing the details on this work. This is indeed impressive!
It was mentioned that a repo-level dedup was performed. Did you also consider exact and fuzzy dedup at the file level, or run studies showing that repo-level dedup performed better?
Also, was your repo-level dedup exact or fuzzy?
Hello everyone,
Is there a plan to create a Visual Studio Code extension that can use my codebase as context?
And thank you for DeepSeek, it's brilliant and I hope it stays free.
It throws an error on a single A100 with torch 2.0.1 and transformers 4.35:
key = torch.repeat_interleave(key, self.num_queries_per_kv, dim=1)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
I'd like to ask whether there are plans to release int4- or int8-quantized models.
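While waiting for official releases, one stopgap (an assumption, not an official artifact) is on-the-fly 4-bit or 8-bit quantization with bitsandbytes at load time; the settings below are illustrative:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization applied while loading; swap load_in_4bit for
# load_in_8bit to get int8 instead.
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-6.7b-instruct",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True,
                                           bnb_4bit_quant_type="nf4"),
    device_map="auto",
    trust_remote_code=True,
)
```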
Hello,
I load the gguf file with ollama (0.1.7). The Modelfile I use:

```
FROM ./deepseek-coder-6.7b-instruct.Q4_K_M.gguf
PARAMETER temperature 0.7
TEMPLATE """{{ .System }}
### Instruction:
{{ .Prompt }}
### Response:
"""
SYSTEM """You are an advanced AI programming assistant."""
```

After loading I run:

```
ollama run deepseekQ4KM
>>> hi
^C
```

It seems to keep printing newlines forever?
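A guess worth trying rather than a confirmed fix: add explicit stop sequences so ollama halts at the model's end-of-turn marker. `PARAMETER stop` is standard Modelfile syntax, and `<|EOT|>` is the instruct model's end-of-turn token:

```
FROM ./deepseek-coder-6.7b-instruct.Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER stop "<|EOT|>"
PARAMETER stop "### Instruction:"
TEMPLATE """{{ .System }}
### Instruction:
{{ .Prompt }}
### Response:
"""
SYSTEM """You are an advanced AI programming assistant."""
```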
Very impressive work. Does the repo-level code concatenation improve the HumanEval result?
```python
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
>>> tokenizer = AutoTokenizer.from_pretrained("./deepseek-coder-1.3b-instruct", trust_remote_code=True)
>>> model = AutoModelForCausalLM.from_pretrained("./deepseek-coder-1.3b-instruct", trust_remote_code=True).cuda()
>>> messages = [
...     { 'role': 'user', 'content': "write a quick sort algorithm in python."}
... ]
>>> inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'LlamaTokenizerFast' object has no attribute 'apply_chat_template'
```
The 33b version is way too slow for me. I'm currently using 6.7b, but I wonder if I could use a model between 6.7b and 13b.
Is there any way to downscale the 33B model to one with custom hyper-parameters?
Thanks
Hi, thanks for the great work.
I have just evaluated the 33b base model on both HumanEval and HumanEval+. However, there is a gap between the claimed 56.1% pass@1 from https://github.com/deepseek-ai/DeepSeek-Coder#2-evaluation-results and my regenerated 51.2%. I am curious whether there is anything I misimplemented for DeepSeek-Coder. We were using torch.bfloat16 and the default prompt from HumanEval; meanwhile, I am using greedy decoding with top_p=0.95 and top_k=None.
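Worth noting as a possible source of the gap: in transformers, "greedy decoding with top_p=0.95" is ambiguous, because top_p only takes effect when sampling is enabled. A minimal sketch of the two regimes, assuming a standard generate call:

```python
# Assuming `model` and `inputs` were created as in the repo's README example.
# True greedy decoding: with do_sample=False, top_p/top_k are ignored entirely.
greedy = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Nucleus sampling: the only mode where top_p=0.95 has an effect; it is
# stochastic, so single-sample pass@1 can drift a few points between runs.
sampled = model.generate(**inputs, max_new_tokens=512, do_sample=True,
                         temperature=0.8, top_p=0.95)
```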
Thanks for your great work! I'm using the scripts in your repo to evaluate the released DeepSeek-Coder 1.3B (both the base and instruct versions) on 8 V100s (therefore I use fp16 as the mixed precision).
But there is a performance gap between my results and those reported in this repo (3% for the base version and around 20% for the instruct version).
I'm wondering how to reproduce the reported results; might the gap be caused by fp16?
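One plausible factor, offered as a guess: V100s (SM 7.0) lack bfloat16 support, so the run necessarily falls back to fp16, whose narrower exponent range can shift generations relative to a bf16 reference. A one-line check:

```python
import torch

# False on V100 (SM 7.0), True on A100/H100 (SM >= 80); where this is False,
# fp16 is the only half-precision option, which may explain small metric gaps.
print(torch.cuda.is_bf16_supported())
```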
For the chat model, I believe users have to know the prompt format in order to deploy the model correctly. Any plan to add instructions about the prompt format?
Like this:
https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0#prompt-format
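Until such docs land, the format can be recovered from the tokenizer itself (transformers >= 4.34), since the instruct checkpoints ship a chat template:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct",
                                    trust_remote_code=True)
# Render a one-turn conversation to see the raw prompt string the model expects.
print(tok.apply_chat_template([{"role": "user", "content": "hi"}], tokenize=False))
```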
How many programming languages are supported for code completion and generation?
Both the 6B and 33B have a similar problem; in some situations the 6B also shows this behavior.
Environment where it can be reproduced: https://huggingface.co/spaces/deepseek-ai/deepseek-coder-33b-instruct
Q: The scope of computer-science knowledge is really vast, huh?
A: Yes, computer science is a very broad field covering many different topics. Here are some of the main areas of computer science:
Theory of computation: studies the nature and behavior of computers and computational processes.
Computer systems: studies the design, implementation, and application of computer hardware and software.
Networks and communication: studies the design, implementation, and application of computer networks and communication systems.
Data science: studies how to extract useful information and knowledge from data.
Artificial intelligence: studies how to make computer systems understand, learn, and adapt to their environment.
Computer graphics: studies the generation and processing of computer graphics and images.
Software engineering: studies how to design, implement, and maintain software systems.
Information security: studies how to protect computer systems and networks from attack.
Databases: studies how to store, retrieve, and manage data.
Computer vision: studies how to make computer systems understand and interpret visual information.
Computer graphics: studies how to create and process graphics and images.
Human-computer interaction: studies how computer systems and users interact.
Computer networks: studies how to connect and manage computers and networks.
Computer architecture: studies the design and implementation of computer hardware and software.
Computer languages: studies how to create and understand computer programs.
Computer graphics: studies how to create and process graphics and images.
Computer vision: studies how to make computer systems understand and interpret visual information.
Natural language processing: studies how to make computer systems understand and generate natural language.
Computer graphics: studies how to create and process graphics and images.
Computer vision: studies how to make computer systems understand and interpret visual information.
(the previous two lines then repeat verbatim, alternating, until the generation was stopped)
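Not an official fix, but for anyone running the model locally, the standard transformers decoding knobs usually break this kind of verbatim loop; the values below are illustrative:

```python
# Assuming `model` and `inputs` were set up as in the README example.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.05,  # mild penalty on tokens already generated
    no_repeat_ngram_size=8,   # hard-blocks verbatim 8-gram loops
)
```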
The question is a bit silly, but I hope you can provide a conversion method, or else the tokenizer.model file.
Has anyone used it to produce SQL output? How should the prompt be constructed for that?
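Not an official recipe, just a hypothetical prompt shape that tends to work with instruct-tuned code models: state the schema, then ask for the query, using the repo's documented `### Instruction:` / `### Response:` format. The table names and question below are made up for illustration:

```python
# Hypothetical text-to-SQL prompt; `tokenizer` and `model` are assumed to be
# loaded as in the README example.
prompt = """### Instruction:
Given the following tables:
    users(id INT, name TEXT, created_at DATE)
    orders(id INT, user_id INT, total NUMERIC)
Write a SQL query that returns each user's name and their total order amount for 2023.
### Response:
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```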
After running the fine-tuning script, no model is saved to OUTPUT_PATH, and there is no obvious error.
Environment: 24 GB VRAM, 32 GB RAM
Input: 100 training samples
Output after running finetune.sh:
Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
Using /home/jovyan/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/jovyan/.cache/torch_extensions/py310_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 6.114262342453003 seconds
Parameter Offload: Total persistent parameters: 266240 in 65 params
How can I do model parallelism?
As the title says. ^_^
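Not an official answer, but the lowest-effort form of model parallelism with the HF checkpoints is accelerate's `device_map="auto"`, which spreads layers across all visible GPUs (naive pipeline parallelism rather than tensor parallelism; for the latter, an inference server such as vLLM is the usual route). A minimal sketch:

```python
from transformers import AutoModelForCausalLM

# Requires the `accelerate` package; layers are sharded across visible GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-33b-instruct",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
```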
Hi,
Some of the released models have config 'max_position_embeddings' set to 8192 instead of 16384. However, in the repo readme, the models are said to have a 16k context size. Could you clarify what the max sequence length of these models is? Thanks!
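For what it's worth, one way the two numbers could both be true is RoPE scaling: if the config carries a scaling factor, the effective context can exceed `max_position_embeddings`. A quick inspection sketch, assuming the checkpoint exposes these fields:

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct",
                                 trust_remote_code=True)
print(cfg.max_position_embeddings)         # base positional size
print(getattr(cfg, "rope_scaling", None))  # e.g. a linear scaling factor, if set
```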
Hi! Do you plan on releasing the DeepSeek Chat models?
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

folder = "deepseek-coder-6.7b-instruct"
model_name = "deepseek-ai/deepseek-coder-6.7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).cuda()
```

```
Traceback (most recent call last):
  File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1345, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 48, in <module>
    from flash_attn import flash_attn_func, flash_attn_varlen_func
  File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/flash_attn/__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 8, in <module>
    import flash_attn_2_cuda as flash_attn_cuda
ImportError: /home/blap/AutoGPTQ-env/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c107WarningC1ENS_7variantIJNS0_11UserWarningENS0_18DeprecationWarningEEEERKNS_14SourceLocationESsb

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/blap/AutoGPTQ_teste/update_models.py", line 14, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).cuda()
  File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 565, in from_pretrained
    model_class = _get_model_class(config, cls._model_mapping)
  File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 387, in _get_model_class
    supported_models = model_mapping[type(config)]
  File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 740, in __getitem__
    return self._load_attr_from_module(model_type, model_name)
  File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 754, in _load_attr_from_module
    return getattribute_from_module(self._modules[module_name], attr)
  File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 698, in getattribute_from_module
    if hasattr(module, attr):
  File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1335, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1347, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
/home/blap/AutoGPTQ-env/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c107WarningC1ENS_7variantIJNS0_11UserWarningENS0_18DeprecationWarningEEEERKNS_14SourceLocationESsb
```
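The undefined C++ symbol usually means the installed flash-attn wheel was built against a different torch ABI than the one in this virtualenv. A common remedy (a guess, not an official fix) is to rebuild flash-attn against the current torch:

```
pip uninstall -y flash-attn
pip install flash-attn --no-build-isolation
```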
On the base model, we fine-tuned with evol-instruct data at the same 2B scale, but could not reproduce the HumanEval result. Could you offer any suggestions?
Thank you for the handy fine-tuning guide, but I am not able to get started.
I tried using the default settings as a proof of concept, but it ends up erroring out.
This is the output I get when using the sample deepspeed command in the README.md:
[2023-11-27 20:47:43,736] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-27 20:47:44,929] [WARNING] [runner.py:203:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-11-27 20:47:44,929] [INFO] [runner.py:570:main] cmd = /home/user/DeepSeek-Coder/finetune/.venv/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None finetune_deepseekcoder.py --model_name_or_path deepseek-ai/deepseek-coder-6.7b-instruct --data_path ./data/training-data.json --output_dir ./output/ --num_train_epochs 3 --model_max_length 1024 --per_device_train_batch_size 16 --per_device_eval_batch_size 1 --gradient_accumulation_steps 4 --evaluation_strategy no --save_strategy steps --save_steps 100 --save_total_limit 100 --learning_rate 2e-5 --warmup_steps 10 --logging_steps 1 --lr_scheduler_type cosine --gradient_checkpointing True --report_to tensorboard --deepspeed configs/ds_config_zero3.json --bf16 True
[2023-11-27 20:47:47,291] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-27 20:47:48,425] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0]}
[2023-11-27 20:47:48,425] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-11-27 20:47:48,425] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-11-27 20:47:48,425] [INFO] [launch.py:163:main] dist_world_size=1
[2023-11-27 20:47:48,425] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0
[2023-11-27 20:47:52,153] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-27 20:47:52,476] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-11-27 20:47:52,476] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
====================================================================================================
TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=True,
bf16_full_eval=False,
cache_dir=None,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=configs/ds_config_zero3.json,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=4,
gradient_checkpointing=True,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=./output/runs/Nov27_20-47-51_dev-llm-finetuning,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=1.0,
logging_strategy=steps,
lr_scheduler_type=cosine,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
model_max_length=1024,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_torch,
optim_args=None,
output_dir=./output/,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=1,
per_device_train_batch_size=16,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=./output/,
save_on_each_node=False,
save_safetensors=True,
save_steps=100,
save_strategy=steps,
save_total_limit=100,
seed=42,
skip_memory_metrics=True,
split_batches=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=10,
weight_decay=0.0,
)
PAD Token: <|end▁of▁sentence|> 32014
BOS Token <|begin▁of▁sentence|> 32013
EOS Token <|EOT|> 32021
Load tokenizer from deepseek-ai/deepseek-coder-6.7b-instruct over.
[2023-11-27 20:48:03,930] [INFO] [partition_parameters.py:348:__exit__] finished initializing model - num_params = 291, num_elems = 6.74B
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:11<00:00, 5.72s/it]
Load model from deepseek-ai/deepseek-coder-6.7b-instruct over.
Training dataset samples: 99
...
Using /home/user/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/user/.cache/torch_extensions/py310_cu118/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.3529317378997803 seconds
Parameter Offload: Total persistent parameters: 266240 in 65 params
[2023-11-27 20:49:16,555] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 36733
[2023-11-27 20:49:16,557] [ERROR] [launch.py:321:sigkill_handler] ['/home/user/DeepSeek-Coder/finetune/.venv/bin/python', '-u', 'finetune_deepseekcoder.py', '--local_rank=0', '--model_name_or_path', 'deepseek-ai/deepseek-coder-6.7b-instruct', '--data_path', './data/training-data.json', '--output_dir', './output/', '--num_train_epochs', '3', '--model_max_length', '1024', '--per_device_train_batch_size', '16', '--per_device_eval_batch_size', '1', '--gradient_accumulation_steps', '4', '--evaluation_strategy', 'no', '--save_strategy', 'steps', '--save_steps', '100', '--save_total_limit', '100', '--learning_rate', '2e-5', '--warmup_steps', '10', '--logging_steps', '1', '--lr_scheduler_type', 'cosine', '--gradient_checkpointing', 'True', '--report_to', 'tensorboard', '--deepspeed', 'configs/ds_config_zero3.json', '--bf16', 'True'] exits with return code = -9
I tried to run the finetune_deepseekcoder.py script directly to see what the actual error is, and it output:
Traceback (most recent call last):
File "/home/user/DeepSeek-Coder/finetune/finetune_deepseekcoder.py", line 193, in <module>
train()
File "/home/user/DeepSeek-Coder/finetune/finetune_deepseekcoder.py", line 187, in train
trainer.train()
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/transformers/trainer.py", line 1555, in train
return inner_training_loop(
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/transformers/trainer.py", line 1860, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/transformers/trainer.py", line 2725, in training_step
loss = self.compute_loss(model, inputs)
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/transformers/trainer.py", line 2748, in compute_loss
outputs = model(**inputs)
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 659, in forward
return model_forward(*args, **kwargs)
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 647, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1034, in forward
outputs = self.model(
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 879, in forward
inputs_embeds = self.embed_tokens(input_ids)
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
return F.embedding(
File "/home/user/DeepSeek-Coder/finetune/.venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2233, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: 'weight' must be 2-D
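Two hedged observations rather than confirmed diagnoses: exit code -9 is a SIGKILL, which typically means the Linux OOM killer fired, plausible since ZeRO-3 with CPU Adam offload needs far more host RAM than a 6.7B model leaves available on a 32 GB machine; and the 'weight' must be 2-D error is what appears when the script runs outside the deepspeed launcher while the ZeRO-3 config still partitions parameters into placeholders. A first mitigation to try, with illustrative values:

```
# Shrink the per-step memory footprint, and keep the deepspeed launcher so
# ZeRO-3 parameter partitioning is actually initialized.
deepspeed finetune_deepseekcoder.py \
    --model_name_or_path deepseek-ai/deepseek-coder-6.7b-instruct \
    --data_path ./data/training-data.json \
    --output_dir ./output/ \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 64 \
    --model_max_length 512 \
    --deepspeed configs/ds_config_zero3.json \
    --bf16 True
```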
The official deepspeed ZeRO-3 script runs rather slowly. Could you provide a multi-node, multi-GPU training script?
Many thanks!
It writes code considerably better than the free version of ChatGPT; it is definitely the ceiling among Chinese large models.
But for a company with 10,000 A100s, the generation speed of your beta chat app is far too slow. You truly are a surreal company.
A question: during code completion, the model often keeps generating without stopping. Have you observed anything similar?
The first data-cleaning step is currently the same as StarCoder's. I'd like to learn how you then filter out low-quality code, code with syntax errors, or code with poor readability.
Thanks!