
chatglm_lora_multi-gpu's Introduction

Chatglm_lora_multi-gpu

Theory notes on LLM prompt & delta tuning

1. CSDN link

2. Zhihu link

Voice academic assistant: theory

1. Zhihu link

2. Zhihu link

LangChain keypoint: theory

1. Zhihu link

2. Zhihu link

Code: APP_example/langchain_keypoint


Real-time draw

Code: APP_example/real_time_draw

CLIP retrieval: theory

1. Zhihu link. Code: APP_example/clip_retrieval

1. Feature extraction over the image library: extract_embeddings.py
2. Indexing the image features in a FAISS vector database: build_index.py
3. Visual application interface: app.py
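
As a rough illustration of steps 1 and 2, the sketch below extracts CLIP image embeddings and indexes them with FAISS. It is a minimal sketch only: the model name, file layout and normalisation choice are assumptions, and the repository's extract_embeddings.py / build_index.py may differ.

import os

import faiss
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_images(image_dir):
    feats = []
    for name in sorted(os.listdir(image_dir)):
        image = Image.open(os.path.join(image_dir, name)).convert("RGB")
        inputs = processor(images=image, return_tensors="pt")
        with torch.no_grad():
            feat = model.get_image_features(**inputs)
        feats.append(torch.nn.functional.normalize(feat, dim=-1).cpu().numpy())
    return np.concatenate(feats).astype("float32")

embeddings = embed_images("images/")            # the image library (assumed path)
index = faiss.IndexFlatIP(embeddings.shape[1])  # inner product == cosine after normalisation
index.add(embeddings)
faiss.write_index(index, "clip.index")

app.py would then embed the query text with the same CLIP model and call index.search on the normalised text feature.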


Retrieval image generator: theory

1. Zhihu link

Code: APP_example/retrieval_image_gen. Launching it directly needs roughly a 24 GB GPU (without such a card you can serve the LLM and image2image steps through an API instead; the CLIP retrieval step needs very little GPU memory).

1. Final integrated demo: app_gradio.py
2. image2image code: upimage.py
3. OpenAI-style access to the Qwen LLM: first start the server openai_api.py, then launch the chat UI chatbot_st.py (a sketch of querying the server follows below)
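
For step 3, once openai_api.py is running, the UI (or any client) can talk to it through the OpenAI-compatible interface. A hedged sketch using the pre-1.0 openai Python client; the host, port and model name are assumptions that must match your openai_api.py configuration.

import openai

openai.api_base = "http://127.0.0.1:8000/v1"  # wherever openai_api.py is serving (assumption)
openai.api_key = "EMPTY"                      # local servers normally ignore the key

response = openai.ChatCompletion.create(
    model="qwen",                             # model name exposed by the server (assumption)
    messages=[{"role": "user", "content": "Give me three taglines for a summer music festival poster."}],
)
print(response.choices[0].message.content)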

Intelligent poster generation with text: theory

1. Zhihu link

Code: APP_example/auto_poster. Launching it directly needs roughly a 24 GB GPU. The current code only covers 4 modules, and the hand-off between modules is still manual; the next version will automate it into a one-click input-to-poster pipeline.

1. Image generation module
2. Text layout module
3. Text-image composition module
4. Image review/verification module

With ChatGLM as the engine, plugins are added step by step to extend it to more applications.

Initialize the environment

pip install -r requirements.txt

Three ways to run on multiple GPUs are covered:

0. The simplest multi-GPU run: a working single-machine script plus a DeepSpeed config file is enough.

Single-machine command:

python finetune.py \
    --dataset_path data/alpaca \
    --lora_rank 8 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --max_steps 2000 \
    --save_steps 1000 \
    --save_total_limit 2 \
    --learning_rate 2e-5 \
    --fp16 \
    --remove_unused_columns false \
    --logging_steps 50 \
    --report_to wandb \
    --output_dir output

Multi-GPU command:

torchrun --nproc_per_node=2 multi_gpu_fintune_belle.py \
    --dataset_path data/alpaca \
    --lora_rank 8 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --save_steps 2000 \
    --save_total_limit 2 \
    --learning_rate 2e-5 \
    --fp16 \
    --num_train_epochs 2 \
    --remove_unused_columns false \
    --logging_steps 50 \
    --report_to wandb \
    --output_dir output \
    --deepspeed ds_config_zero3.json
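
Both scripts train a LoRA adapter rather than the full model; --lora_rank controls the adapter rank. A minimal sketch of the PEFT wrapping involved, assuming the usual ChatGLM-6B target module name query_key_value; the scripts' exact configuration may differ.

from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModel

# Base model in fp16; only the LoRA matrices added below will receive gradients.
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half()

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                 # matches --lora_rank 8
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query_key_value"],  # ChatGLM-6B attention projection (assumption)
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()       # typically well under 1% of the full parameter count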

1.deepspeed

Data processing

Two BELLE Chinese self-instruct datasets are provided:

    1. 0.5M version:

        cd data

        wget https://huggingface.co/datasets/BelleGroup/generated_train_0.5M_CN/resolve/main/Belle.train.json

    2. 1M version:

        wget https://huggingface.co/datasets/BelleGroup/generated_train_1M_CN/resolve/main/belle_open_source_1M.train.json

    3. Merge the two files into one:

        a. The 0.5M and 1M data have slightly different fields; unify them by processing the 1M data with the script below:

        cd ..

        python process_belle_1M_data.py

        b. Concatenate the two files into one named Belle_0_1.train.json:

        cd data && cat Belle.train.json Belle_1M.train.json > Belle_0_1.train.json

Once the data is ready, run:

torchrun --nproc_per_node=2 multi_gpu_fintune_belle.py \
    --dataset_path data/alpaca \
    --lora_rank 8 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --save_steps 1000 \
    --save_total_limit 2 \
    --learning_rate 2e-5 \
    --fp16 \
    --num_train_epochs 2 \
    --remove_unused_columns false \
    --logging_steps 50 \
    --output_dir output \
    --deepspeed ds_config_zero3.json

2.accelerate+deepspeed

Prepare the data

Download the data:

cd data

wget https://huggingface.co/datasets/BelleGroup/generated_train_0.5M_CN/resolve/main/Belle.train.json

python tokenize_dataset_rows_belle.py \
    --jsonl_path data/alpaca_data.jsonl \
    --save_path data/alpaca \
    --max_seq_length 200 \
    --skip_overlength
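
Roughly, this preprocessing step reads the jsonl, tokenizes each record, optionally drops over-long samples, and saves an arrow dataset to --save_path. A hedged sketch of the idea; the field names and prompt construction here are assumptions, and tokenize_dataset_rows_belle.py may differ.

import json

import datasets
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

def gen(jsonl_path="data/alpaca_data.jsonl", max_seq_length=200, skip_overlength=True):
    with open(jsonl_path) as f:
        for line in f:
            example = json.loads(line)
            # Assumed fields; the real script builds the prompt from its own template.
            text = example.get("instruction", "") + example.get("input", "") + example.get("output", "")
            ids = tokenizer.encode(text)
            if skip_overlength and len(ids) > max_seq_length:
                continue
            yield {"input_ids": ids[:max_seq_length]}

dataset = datasets.Dataset.from_generator(gen)
dataset.save_to_disk("data/alpaca")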

Once the data is ready, run:

accelerate launch --config_file accelerate_ds_zero3_cpu_offload_config.yaml multi_gpu_fintune_belle.py \
    --dataset_path data/alpaca \
    --lora_rank 8 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --max_steps 10000 \
    --save_steps 1000 \
    --save_total_limit 2 \
    --learning_rate 2e-5 \
    --fp16 \
    --remove_unused_columns false \
    --logging_steps 50 \
    --output_dir output

3. DDP: not tried yet

Batch inference

In real work you often need to run predictions over a batch of data.

Typically we have one high-performance machine, but after fine-tuning, each card can only serve one request at a time, which wastes GPU memory.

Batch inference therefore becomes necessary.

1.deepspeed --num_gpus 2 chatglm_deepspeed_inference.py

2. If GPU memory cannot hold the full model, use accelerate's load_checkpoint_and_dispatch:

python chatglm_milti_gpu_inference.py
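
What such a script boils down to is building the model with empty weights and letting accelerate place the checkpoint shards across the visible GPUs. A minimal sketch, assuming a local chatglm-6b checkpoint directory and that GLMBlock is the transformer block class that must not be split (as in the THUDM implementation):

import torch
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModel, AutoTokenizer

checkpoint = "./chatglm-6b"  # local snapshot of THUDM/chatglm-6b with sharded .bin files (assumption)

config = AutoConfig.from_pretrained(checkpoint, trust_remote_code=True)
with init_empty_weights():
    model = AutoModel.from_config(config, trust_remote_code=True)

model = load_checkpoint_and_dispatch(
    model,
    checkpoint,
    device_map="auto",                     # spread layers over all visible GPUs
    no_split_module_classes=["GLMBlock"],  # keep each transformer block on one device
    dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)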

If you also want DeepSpeed acceleration, uncomment the following code:

# init deepspeed inference engine
'''
ds_model = deepspeed.init_inference(
    model=model,                      # Transformers model
    mp_size=8,                        # number of GPUs
    dtype=torch.float16,              # dtype of the weights (fp16)
    replace_method="auto",            # let DS automatically identify the layers to replace
    replace_with_kernel_inject=True,  # replace the model with the kernel injector
)
print(f"model is loaded on device {ds_model.module.device}")
'''
deepspeed --num_gpus 2 chatglm_milti_gpu_inference.py
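
With the model loaded (and optionally wrapped by the DeepSpeed engine above), batch inference is just a matter of tokenising several prompts together and calling generate once. A hedged sketch that reuses the model and tokenizer from the previous snippet; ChatGLM's custom tokenizer may need slightly different padding handling.

import torch

prompts = [
    "Write a one-sentence slogan for a coffee shop.",
    "Translate 'good morning' into French.",
]
tokenizer.padding_side = "left"  # decoder-only models should be left-padded for batched generation
batch = tokenizer(prompts, return_tensors="pt", padding=True).to("cuda:0")
with torch.no_grad():
    outputs = model.generate(**batch, max_new_tokens=128)
for out in outputs:
    print(tokenizer.decode(out, skip_special_tokens=True))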

WebUI interaction

Go into the webui folder and run the command from readme.txt:

streamlit run web_feedback.py --server.port 6006

New: ChatGLM drawing application

Generate images

Go to the APP_example application.


Constrain ChatGLM's replies with a custom knowledge base

Go to the APP_example application chat_langchain.

pip install -r requirement.txt
python knowledge_based_chatglm.py
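
For reference, a hedged sketch of a retrieve-then-answer flow of the kind this app implements. The knowledge file name, the embedding model and the prompt wording are assumptions (written against the 0.0.x-era LangChain API), and knowledge_based_chatglm.py may differ.

from langchain.document_loaders import TextLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from transformers import AutoModel, AutoTokenizer

# Build the local knowledge index.
docs = TextLoader("knowledge.txt").load()  # assumed knowledge file
chunks = CharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)
store = FAISS.from_documents(chunks, HuggingFaceEmbeddings(model_name="GanymedeNil/text2vec-large-chinese"))

# Load ChatGLM-6B for answering.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()

question = "What is the largest river in the world?"
context = "\n".join(d.page_content for d in store.similarity_search(question, k=3))
prompt = (
    "Answer the question using only the context below. "
    "If the context does not contain the answer, say so.\n"
    f"Context:\n{context}\nQuestion: {question}"
)
answer, _history = model.chat(tokenizer, prompt, history=[])
print(answer)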

Reply without the knowledge base:
Q: What is the largest river in the world?
A: "The largest river in the world is the Nile. The Nile is the longest river on the African continent, about 6,650 km long. It rises on the East African plateau and flows through Sudan, Uganda, Kenya, Tanzania, Rwanda, the Democratic Republic of the Congo, Burundi and Egypt before emptying into the Mediterranean. The Nile basin is one of Africa's most important agricultural regions and one of the cradles of ancient Egyptian civilization, among the world's oldest."

Reply with the knowledge base: no answer was found in the local knowledge search.

New: ChatGLM reinforcement-learning alignment (RLHF)

This part is still fairly naive; more practical, production-grade tasks will be added step by step.

New: Stable Diffusion LoRA training

1. New DreamBooth LoRA training method
2. Generation results from merging multiple LoRAs
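
A hedged sketch of applying a trained LoRA at generation time with diffusers; the base model, paths and LoRA scale are assumptions. Repeating load_lora_weights/fuse_lora with different adapters is one simple way to stack several LoRAs.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

pipe.load_lora_weights("output/dreambooth_lora")  # directory produced by the DreamBooth LoRA training (assumed path)
pipe.fuse_lora(lora_scale=0.8)                    # merge the LoRA into the base weights at 0.8 strength

image = pipe("a movie poster of a corgi astronaut, studio lighting").images[0]
image.save("lora_sample.png")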


LLM_StableDiffusion_Studio

This is a tool integration; more capabilities will be folded in over time. We intend to be more than people who merely list tools.

https://github.com/liangwq/LLM_StableDiffusion_Studio


New: ChatGLM agent capability

Added code for building an agent with ChatGLM: 1. Zhihu link. Added a vector retrieval tool: 1. Zhihu link.

chatglm_lora_multi-gpu's People

Contributors

liangwq


chatglm_lora_multi-gpu's Issues

Runs on one card, errors with two cards

2× 3090
Changing --nproc_per_node=2 to 1 runs fine.


I couldn't find a more detailed error log. I'm a beginner asking a basic question, please bear with me.

Inference question

For example, if I start a server, how can I use DeepSpeed multi-GPU inference when a request comes in?

Error during initialization

Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] c++ -MMD -MF flatten_unflatten.o.d -DTORCH_EXTENSION_NAME=utils -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /usr/local/lib/python3.8/dist-packages/torch/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.8/dist-packages/torch/include/THC -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /usr/local/lib/python3.8/dist-packages/deepspeed/ops/csrc/utils/flatten_unflatten.cpp -o flatten_unflatten.o
FAILED: flatten_unflatten.o
c++ -MMD -MF flatten_unflatten.o.d -DTORCH_EXTENSION_NAME=utils -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /usr/local/lib/python3.8/dist-packages/torch/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.8/dist-packages/torch/include/THC -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /usr/local/lib/python3.8/dist-packages/deepspeed/ops/csrc/utils/flatten_unflatten.cpp -o flatten_unflatten.o
In file included from /usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/Device.h:4:0,
from /usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/api/include/torch/python.h:8,
from /usr/local/lib/python3.8/dist-packages/torch/include/torch/extension.h:6,
from /usr/local/lib/python3.8/dist-packages/deepspeed/ops/csrc/utils/flatten_unflatten.cpp:8:
/usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/python_headers.h:12:10: fatal error: Python.h: No such file or directory
#include <Python.h>
^~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
subprocess.run(
File "/usr/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "multi_gpu_fintune_belle.py", line 339, in
main()
File "multi_gpu_fintune_belle.py", line 270, in main
model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)
File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 1090, in prepare
result = self._prepare_deepspeed(*args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 1368, in _prepare_deepspeed
engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
File "/usr/local/lib/python3.8/dist-packages/deepspeed/init.py", line 125, in initialize
engine = DeepSpeedEngine(args=args,
File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/engine.py", line 340, in init
self._configure_optimizer(optimizer, model_parameters)
File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/engine.py", line 1298, in _configure_optimizer
self.optimizer = self._configure_zero_optimizer(basic_optimizer)
File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/engine.py", line 1547, in _configure_zero_optimizer
optimizer = DeepSpeedZeroOptimizer(
File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 165, in init
util_ops = UtilsBuilder().load()
File "/usr/local/lib/python3.8/dist-packages/deepspeed/ops/op_builder/builder.py", line 485, in load
return self.jit_load(verbose)
File "/usr/local/lib/python3.8/dist-packages/deepspeed/ops/op_builder/builder.py", line 520, in jit_load
op_module = load(
File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1508, in _jit_compile
_write_ninja_file_and_build_library(
File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1623, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'utils'

What problem is this?

DeepSpeed inference multi-process issue

Running deepspeed inference.py produces (number of GPUs × number of input sentences) results. That is, with 2 GPUs and a single test sentence, it returns 2 answers; DeepSpeed seems to spawn one process per GPU and each returns a result. How should this be handled?

How do I load the model in a distributed way in DDP mode?

I use:

model = ChatGLMForConditionalGeneration.from_pretrained(
    model_name, load_in_8bit=False, trust_remote_code=True
)
model = DDP(model.cuda(), device_ids=[2])

and it errors with out-of-memory. My guess is that the model ends up loaded twice on the GPU; how should I handle this?

About multi-GPU training

Hello, I'd like to ask: if I have 8 cards (0-7), is there a way to make this program run on specific cards (say 5-7)?

Error running web_feadback.py

1. After installing the dependencies, running streamlit run web_feadback.py --server.port=8080 errors:

/root/anaconda3/envs/lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /root/anaconda3/envs/lora did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
/root/anaconda3/envs/lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('history -a; history -a; printf "\\033]0;%s@%s'), PosixPath('%s\\007" "${USER}" "${HOSTNAME%%.*}" "${PWD/#$HOME/\\~}"')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 114
/root/anaconda3/envs/lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
  warn(msg)
CUDA SETUP: Loading binary /root/anaconda3/envs/lora/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda114_nocublaslt.so...
2023-04-26 11:11:07.907 Uncaught app exception
Traceback (most recent call last):
  File "/root/anaconda3/envs/lora/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
  File "/root/Chatglm_lora_multi-gpu/webui/web_feadback.py", line 15, in <module>
    from modeling_chatglm import ChatGLMForConditionalGeneration
ModuleNotFoundError: No module named 'modeling_chatglm'

2. After copying modeling_chatglm.py and configuration_chatglm.py from the project root into the web_ui directory and running streamlit run web_feadback.py --server.port=8080 again, it errors again:

/root/anaconda3/envs/lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /root/anaconda3/envs/lora did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
/root/anaconda3/envs/lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('%s\\007" "${USER}" "${HOSTNAME%%.*}" "${PWD/#$HOME/\\~}"'), PosixPath('history -a; history -a; printf "\\033]0;%s@%s')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 114
/root/anaconda3/envs/lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
  warn(msg)
CUDA SETUP: Loading binary /root/anaconda3/envs/lora/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda114_nocublaslt.so...
标注数据集已创建。
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Loading checkpoint shards: 100%|█████████████████████████████████████████████| 8/8 [00:06<00:00,  1.16it/s]
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
2023-04-26 11:15:21.961 Uncaught app exception
Traceback (most recent call last):
  File "/root/anaconda3/envs/lora/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
  File "/root/Chatglm_lora_multi-gpu/webui/web_feadback.py", line 289, in <module>
    main()
  File "/root/Chatglm_lora_multi-gpu/webui/web_feadback.py", line 282, in main
    start_evaluate_page()
  File "/root/Chatglm_lora_multi-gpu/webui/web_feadback.py", line 117, in start_evaluate_page
    out = model.generate(
  File "/root/anaconda3/envs/lora/lib/python3.9/site-packages/peft/peft_model.py", line 729, in generate
    outputs = self.base_model.generate(**kwargs)
  File "/root/anaconda3/envs/lora/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/lora/lib/python3.9/site-packages/transformers/generation/utils.py", line 1406, in generate
    return self.greedy_search(
  File "/root/anaconda3/envs/lora/lib/python3.9/site-packages/transformers/generation/utils.py", line 2198, in greedy_search
    model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
  File "/root/Chatglm_lora_multi-gpu/webui/modeling_chatglm.py", line 988, in prepare_inputs_for_generation
    mask_position = seq.index(mask_token)
ValueError: 150001 is not in list

3. After deleting the modeling_chatglm.py and configuration_chatglm.py files added in step 2 and replacing them with the files shipped with the chatglm-6b model from the THUDM/ChatGLM-6B project, it still errors:

/root/anaconda3/envs/lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /root/anaconda3/envs/lora did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
/root/anaconda3/envs/lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('%s\\007" "${USER}" "${HOSTNAME%%.*}" "${PWD/#$HOME/\\~}"'), PosixPath('history -a; history -a; printf "\\033]0;%s@%s')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 114
/root/anaconda3/envs/lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
  warn(msg)
CUDA SETUP: Loading binary /root/anaconda3/envs/lora/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda114_nocublaslt.so...
标注数据集已创建。
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Loading checkpoint shards: 100%|█████████████████████████████████████████████| 8/8 [00:06<00:00,  1.15it/s]
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
2023-04-26 11:32:12.915 Uncaught app exception
Traceback (most recent call last):
  File "/root/anaconda3/envs/lora/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
  File "/root/Chatglm_lora_multi-gpu/webui/web_feadback.py", line 289, in <module>
    main()
  File "/root/Chatglm_lora_multi-gpu/webui/web_feadback.py", line 282, in main
    start_evaluate_page()
  File "/root/Chatglm_lora_multi-gpu/webui/web_feadback.py", line 117, in start_evaluate_page
    out = model.generate(
  File "/root/anaconda3/envs/lora/lib/python3.9/site-packages/peft/peft_model.py", line 729, in generate
    outputs = self.base_model.generate(**kwargs)
  File "/root/anaconda3/envs/lora/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/lora/lib/python3.9/site-packages/transformers/generation/utils.py", line 1406, in generate
    return self.greedy_search(
  File "/root/anaconda3/envs/lora/lib/python3.9/site-packages/transformers/generation/utils.py", line 2245, in greedy_search
    model_kwargs = self._update_model_kwargs_for_generation(
  File "/root/Chatglm_lora_multi-gpu/webui/modeling_chatglm.py", line 1085, in _update_model_kwargs_for_generation
    attention_mask = torch.cat(
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 3)

ChatGLM multi-GPU inference with DeepSpeed

Hi, I see this repo uses DeepSpeed for multi-GPU inference with ChatGLM. When we run multi-GPU inference, per-GPU memory usage does not go down, i.e. there is no model parallelism. Does your multi-GPU setup actually run in parallel, and is the per-GPU memory usage as expected?

Out of memory: GPU memory exhausted

Hello, I set a sequence length of 512 and batch size = 1, and still run out of GPU memory.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 372.00 MiB (GPU 6; 31.75 GiB total capacity; 29.02 GiB already allocated; 35.75 MiB free; 29.86 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Is there a way to solve this?

Model checkpoint loading fails halfway through; could someone help me take a look?

GLM) wxd7@wxd7-EG341W-G21:~/glm/Chatglm_lora_multi-gpu-main$ torchrun --nproc_per_node=2 multi_gpu_fintune_belle.py --dataset_path /home/wxd7/glm/ChatGLM-Tuning-master/data/alpaca --model_path /home/wxd7/upan/GLM/model/chatglm-6b --lora_rank 8 --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --save_steps 2000 --save_total_limit 2 --learning_rate 2e-5 --fp16 --num_train_epochs 2 --remove_unused_columns false --logging_steps 50 --report_to wandb --output_dir output --deepspeed ds_config_zero3.json
WARNING:torch.distributed.run:


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

CUDA SETUP: CUDA runtime path found: /home/wxd7/anaconda3/envs/GLM/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: CUDA runtime path found: /home/wxd7/anaconda3/envs/GLM/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 113
/home/wxd7/anaconda3/envs/GLM/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
CUDA SETUP: Loading binary /home/wxd7/anaconda3/envs/GLM/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so...
/home/wxd7/anaconda3/envs/GLM/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
CUDA SETUP: Loading binary /home/wxd7/anaconda3/envs/GLM/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so...
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Traceback (most recent call last):
File "", line 1, in
FileNotFoundError: [Errno 2] No such file or directory: '/home/wxd7/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b/aa51e62ddc9c9f334858b0af44cf59b05c70148a/tokenization_chatglm.py'
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
Loading checkpoint shards: 38%|██████▊ | 3/8 [00:08<00:13, 2.64s/it]WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 17785 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 1 (pid: 17786) of binary: /home/wxd7/anaconda3/envs/GLM/bin/python
Traceback (most recent call last):
File "/home/wxd7/anaconda3/envs/GLM/bin/torchrun", line 8, in
sys.exit(main())
File "/home/wxd7/anaconda3/envs/GLM/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 346, in wrapper
return f(*args, **kwargs)
File "/home/wxd7/anaconda3/envs/GLM/lib/python3.10/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/home/wxd7/anaconda3/envs/GLM/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/home/wxd7/anaconda3/envs/GLM/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/wxd7/anaconda3/envs/GLM/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

multi_gpu_fintune_belle.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2023-04-13_02:02:27
host : wxd7-EG341W-G21
rank : 1 (local_rank: 1)
exitcode : -9 (pid: 17786)
error_file: <N/A>
traceback : Signal 9 (SIGKILL) received by PID 17786

The machine has 48 GB of RAM and two P40s (24 GB each). Single-card training works, but with two cards it errors while the model is only half loaded.

How to train on my own dataset

### Dataset

PROMPT_DICT = {
    "prompt_input": (
        "Below is an instruction that describes a task, paired with an input that provides further context. "
        "Write a response that appropriately completes the request.\n\n"
        "### Input:\n{input}\n\n### Response:"
    ),
    "prompt_no_input": (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{input}\n\n### Response:"
    ),}

import json  # needed for json.loads below; assumed to be imported at the top of the actual script

with open('data/belle_open_source_1M.train.json', 'r') as f:
    content = []
    for line in f.readlines():  # readlines() reads all lines into memory
        #print(json.loads(line)['input'])    
        content.append(json.loads(line))


pairs = []

for line in content:
    if line['input'] == '':
        prompt = PROMPT_DICT['prompt_no_input'].format_map(line)
    else:
        prompt = PROMPT_DICT['prompt_input'].format_map(line)
    completion = line['target']+'</s>'
    if len(prompt) + len(completion) < MAX_LENGTH:
        pairs.append({'prompt':prompt, 'completion':completion})      

Two questions:
1. In the code above, the prompt in pairs is built entirely from PROMPT_DICT['prompt_no_input'] or PROMPT_DICT['prompt_input'], so are the original questions all lost?
2. If I use my own dataset for training, do I need to modify this part of the code myself to build pairs? The dataset_path passed at runtime doesn't seem to be used at all?

BELLE dataset update

The BELLE dataset on Hugging Face has been updated and the preprocessing code seems to have stopped working; does it need updating?

DeepSpeed and LoRA

GPU memory usage with plain LoRA fine-tuning and with DeepSpeed + LoRA turns out to be identical... so what is DeepSpeed contributing here?

Multi-GPU parallelism issue

Hi, when I run multi_gpu.py in multi-GPU parallel mode on 4× A100 40G, the terminal reports an NCCL error and the first card runs out of memory.

Does the model leak information?

Building on your code, I set the attention_mask at punctuation positions to 0, i.e. not masked. During training the model converges very quickly and can clearly read future tokens, even meaningless ones.
step: 0
An example:
To get complete sentences I constructed the following example and ran a forward pass. The highest-probability token for the first output position is "会增加". Also, during training the loss converged to about 0.01 within one epoch.
Instruction: Generate Chinese lyrics from a theme, separating each lyric line with a Chinese comma
Input: The theme of these lyrics is love and plum sauce. Generated lyrics (each phrase is one token repeated): 会增加会增加会增加会增加会增加会增加会增加会增加会增加,会议上会议上会议上会议上会议上会议上会议上会议上会议上,保证了保证了保证了保证了保证了保证了保证了保证了,多是多是多是多是多是多是,完全没有完全没有完全没有完全没有完全没有完全没有完全没有完全没有完全没有完全没有完全没有完全没有完全没有,升值升值升值升值升值升值升值升值升值,茵茵茵茵茵茵茵茵茵,句话句话句话句话句话句话句话句话,西路西路西路西路西路西路,其中一个其中一个其中一个其中一个其中一个其中一个其中一个其中一个其中一个其中一个其中一个其中一个其中一个,发脾气发脾气发脾气发脾气发脾气发脾气,关税关税关税关税关税关税关税关税关税关税关税关税关税,就好像就好像就好像就好像就好像就好像就好像就好像就好像,口袋口袋口袋口袋口袋口袋口袋口袋口袋,辽宁省辽宁省辽宁省辽宁省辽宁省辽宁省辽宁省辽宁省,大佬大佬大佬大佬大佬大佬,重要的是重要的是重要的是重要的是重要的是重要的是重要的是重要的是重要的是重要的是重要的是重要的是重要的是,沪指沪指沪指沪指沪指沪指,一口气一口气一口气一口气一口气一口气一口气一口气一口气一口气一口气一口气一口气。

Mask-related code:

    def encode_input(self, prompt_str, bs=1):
        input_ids = []
        position_ids = []
        prompt_str = prompt_str.replace(',',',')
        prompt_str = prompt_str.replace(':',':')
        prompt = self.tokenizer.encode(prompt_str.split('每句字数:')[0]+ '生成的歌词为:')
        # sent_word_counts = eval(f"[{prompt_str.split('每句字数:')[-1][:-1].replace('\nAnswer: 生成的歌词为:',;)}]")
        counts = prompt_str.split('每句字数:')[-1][:-1].replace('。生成的歌词为','')
        sent_word_counts = eval(f"[{counts}]")
        completion = self.make_output_mask(sent_word_counts)
        _max_length = len(completion) + len(prompt)
        attention_mask = torch.ones((bs, _max_length, _max_length), device=self.device)
        attention_mask.tril_()
        # build the completion from the per-line word counts
        # fix the "??" issue: the comma may not have been encoded correctly
        completion = [6 if i == 0 else i for i in completion ]
        context_length = prompt.index(130004)
        attention_mask[0, :, :context_length] = 1

        to_pad = _max_length - len(prompt) - len(completion)

        inp = prompt + completion + [self.tokenizer.pad_token_id] * to_pad
        input_ids.append(inp)
        # convert to tensor
        input_ids = torch.tensor(input_ids, device=self.device).long()
        pun_pos = [i for i, x in enumerate(inp) if x == 6]
        pun_pos += [i for i, x in enumerate(inp) if x == 63823]  # full stop token
        # make the attention at comma positions visible
        attention_mask[0, :, pun_pos] = 1

        position_ids.append(torch.stack([torch.arange(0, _max_length, device=self.device),
                                    torch.concat([torch.zeros(context_length - 1, device=self.device),
                                                torch.arange(0, _max_length - context_length + 1,
                                                            device=self.device)])]).long())
        position_ids = torch.stack(position_ids)
        attention_mask.unsqueeze_(1)
        return {
            'input_ids': input_ids,
            # 'attention_mask': 1 - attention_mask,  # flip the mask
            'attention_mask': (attention_mask < 0.5).bool(),  # flip the mask
            'position_ids': position_ids
        }

ChatGLM's sharded checkpoints don't play well with DeepSpeed

I've tried loading the model with DeepSpeed. With multiple GPUs it does start multiple processes, but memory usage is not actually multiplied. Only in cases like ChatGLM, where the checkpoint is saved in shards, does memory usage grow by a multiple; see
https://github.com/microsoft/DeepSpeed/issues/2379, which notes: For example, gpt-neo-x 20B takes about 40GB in RAM, and if you run this script with deepspeed --num_gpus 4 example.py --save_ckpt, you will end up using 4 * 40GB in RAM

Is there currently any way around this?

Error when parsing the response

From the screenshot, it looks like nothing was retrieved? What did you put in your read.txt?

Help with an error

FAILED: flatten_unflatten.o
c++ -MMD -MF flatten_unflatten.o.d -DTORCH_EXTENSION_NAME=utils -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /data/software/anaconda3/envs/chatglm_mul/lib/python3.8/site-packages/torch/include -isystem /data/software/anaconda3/envs/chatglm_mul/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /data/software/anaconda3/envs/chatglm_mul/lib/python3.8/site-packages/torch/include/TH -isystem /data/software/anaconda3/envs/chatglm_mul/lib/python3.8/site-packages/torch/include/THC -isystem /data/software/anaconda3/envs/chatglm_mul/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -c /data/software/anaconda3/envs/chatglm_mul/lib/python3.8/site-packages/deepspeed/ops/csrc/utils/flatten_unflatten.cpp -o flatten_unflatten.o
c++: error: unrecognized command line option ‘-std=c++17’
ninja: build stopped: subcommand failed.

Is this because my gcc version is too old?

Running web_ui.py errors: NameError: name 'LoraConfig' is not defined

[root@VM-245-18-centos webui]# streamlit run web_ui.py --server.port 8080

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False.

You can now view your Streamlit app in your browser.

Network URL: http://10.0.245.18:8080

Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
2023-04-26 10:22:28.285 Uncaught app exception
Traceback (most recent call last):
File "/root/anaconda3/envs/lora/lib/python3.9/site-packages/streamlit/runtime/caching/cache_utils.py", line 245, in _get_or_create_cached_value
cached_result = cache.read_result(value_key)
File "/root/anaconda3/envs/lora/lib/python3.9/site-packages/streamlit/runtime/caching/cache_resource_api.py", line 447, in read_result
raise CacheKeyNotFoundError()
streamlit.runtime.caching.cache_errors.CacheKeyNotFoundError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/anaconda3/envs/lora/lib/python3.9/site-packages/streamlit/runtime/caching/cache_utils.py", line 293, in _handle_cache_miss
cached_result = cache.read_result(value_key)
File "/root/anaconda3/envs/lora/lib/python3.9/site-packages/streamlit/runtime/caching/cache_resource_api.py", line 447, in read_result
raise CacheKeyNotFoundError()
streamlit.runtime.caching.cache_errors.CacheKeyNotFoundError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/anaconda3/envs/lora/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
exec(code, module.dict)
File "/root/Chatglm_lora_multi-gpu/webui/web_ui.py", line 58, in
st.session_state["state"] = predict(prompt_text, st.session_state["state"])
File "/root/Chatglm_lora_multi-gpu/webui/web_ui.py", line 35, in predict
tokenizer, model = get_model()
File "/root/anaconda3/envs/lora/lib/python3.9/site-packages/streamlit/runtime/caching/cache_utils.py", line 194, in wrapper
return cached_func(*args, **kwargs)
File "/root/anaconda3/envs/lora/lib/python3.9/site-packages/streamlit/runtime/caching/cache_utils.py", line 223, in call
return self._get_or_create_cached_value(args, kwargs)
File "/root/anaconda3/envs/lora/lib/python3.9/site-packages/streamlit/runtime/caching/cache_utils.py", line 248, in _get_or_create_cached_value
return self._handle_cache_miss(cache, value_key, func_args, func_kwargs)
File "/root/anaconda3/envs/lora/lib/python3.9/site-packages/streamlit/runtime/caching/cache_utils.py", line 302, in _handle_cache_miss
computed_value = self._info.func(*func_args, **func_kwargs)
File "/root/Chatglm_lora_multi-gpu/webui/web_ui.py", line 17, in get_model
peft_config = LoraConfig(
NameError: name 'LoraConfig' is not defined

Your README.md vs. Chatglm_lora_multi-gpu/data

cd data

wget https://huggingface.co/datasets/BelleGroup/generated_train_0.5M_CN/resolve/main/Belle.train.json

python tokenize_dataset_rows_belle.py
--jsonl_path data/alpaca_data.jsonl
--save_path data/alpaca
--max_seq_length 200
--skip_overlength

Once the data is ready, run:
accelerate launch --config_file accelerate_ds_zero3_cpu_offload_config.yaml multi_gpu_fintune_belle.py
--dataset_path data/alpaca
--lora_rank 8
--per_device_train_batch_size 2
--gradient_accumulation_steps 1
--max_steps 10000
--save_steps 1000
--save_total_limit 2
--learning_rate 2e-5
--fp16
--remove_unused_columns false
--logging_steps 50
--output_dir output

The data folder doesn't contain this file, README.md references command files that don't exist, and the files in the repo are quite messy.

Multi-GPU DeepSpeed mode

Is the multi-GPU DeepSpeed in this project model parallelism, i.e. splitting different layers across different GPUs, so that several cards with small memory can run it together? Is that how it works?

Which LangChain version?

ModuleNotFoundError: No module named 'langchain.callbacks.manager'
I got the error above; which LangChain version is required? I installed 0.0.401.

ValueError: 150004 is not in list

obj: {'prompt': [12313, 107, 125, 6054, 109, 3384, 104, 1833, 6, 11311, 110, 125, 938, 109, 986, 583, 1303, 7, 11121, 104, 532, 109, 12475, 29321, 100, 1029, 7, 4, 4, 125875, 14150, 12, 4, 64298, 66977, 100326, 69122, 76809, 65324, 65459, 85929, 63823, 43, 151, 4, 43, 151, 69106, 12, 64176, 6, 64219, 6, 68651, 63823, 43, 151, 4, 4, 125875, 11034, 12, 130001, 130004], 'completion': [28, 64872, 64219, 68281, 69902, 63984, 6, 63984, 64548, 68651, 63962, 6, 64872, 81715, 69754, 68840, 78, 150005]}
0%| | 0/134265 [00:00<?, ?it/s]
obj: {'prompt': [12313, 107, 125, 6054, 109, 3384, 104, 1833, 6, 11311, 110, 125, 938, 109, 986, 583, 1303, 7, 11121, 104, 532, 109, 12475, 29321, 100, 1029, 7, 4, 4, 125875, 14150, 12, 4, 67769, 65056, 68395, 100087, 63823, 4, 4987, 6, 225, 118, 120, 31, 4, 4, 125875, 11034, 12, 130001, 130004], 'completion': [5, 74874, 6, 63852, 66348, 31, 150005]}
0%| | 0/134265 [00:00<?, ?it/s]
Traceback (most recent call last):
File "multi_gpu_fintune_belle.py", line 361, in
main()
File "multi_gpu_fintune_belle.py", line 309, in main
for step, batch in enumerate(t:=tqdm.tqdm(train_dataloader)):
File "/tal-vePFS/SFT/dengshuhao1/local/anaconda3/envs/Chatglm_lora_multi-gpu/lib/python3.8/site-packages/tqdm/std.py", line 1178, in iter
for obj in iterable:
File "/tal-vePFS/SFT/dengshuhao1/local/anaconda3/envs/Chatglm_lora_multi-gpu/lib/python3.8/site-packages/accelerate/data_loader.py", line 378, in iter
current_batch = next(dataloader_iter)
File "/tal-vePFS/SFT/dengshuhao1/local/anaconda3/envs/Chatglm_lora_multi-gpu/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 634, in next
data = self._next_data()
File "/tal-vePFS/SFT/dengshuhao1/local/anaconda3/envs/Chatglm_lora_multi-gpu/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 678, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/tal-vePFS/SFT/dengshuhao1/local/anaconda3/envs/Chatglm_lora_multi-gpu/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
return self.collate_fn(data)
File "multi_gpu_fintune_belle.py", line 121, in collate_fn
context_length = obj['prompt'].index(150004)
ValueError: 150004 is not in list
Traceback (most recent call last):
File "multi_gpu_fintune_belle.py", line 361, in
main()
File "multi_gpu_fintune_belle.py", line 309, in main
for step, batch in enumerate(t:=tqdm.tqdm(train_dataloader)):
File "/tal-vePFS/SFT/dengshuhao1/local/anaconda3/envs/Chatglm_lora_multi-gpu/lib/python3.8/site-packages/tqdm/std.py", line 1178, in iter
for obj in iterable:
File "/tal-vePFS/SFT/dengshuhao1/local/anaconda3/envs/Chatglm_lora_multi-gpu/lib/python3.8/site-packages/accelerate/data_loader.py", line 378, in iter
current_batch = next(dataloader_iter)
File "/tal-vePFS/SFT/dengshuhao1/local/anaconda3/envs/Chatglm_lora_multi-gpu/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 634, in next
data = self._next_data()
File "/tal-vePFS/SFT/dengshuhao1/local/anaconda3/envs/Chatglm_lora_multi-gpu/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 678, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/tal-vePFS/SFT/dengshuhao1/local/anaconda3/envs/Chatglm_lora_multi-gpu/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
return self.collate_fn(data)
File "multi_gpu_fintune_belle.py", line 121, in collate_fn
context_length = obj['prompt'].index(150004)
ValueError: 150004 is not in list
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2408521) of binary: /tal-vePFS/SFT/dengshuhao1/local/anaconda3/envs/Chatglm_lora_multi-gpu/bin/python
Traceback (most recent call last):
File "/tal-vePFS/SFT/dengshuhao1/local/anaconda3/envs/Chatglm_lora_multi-gpu/bin/torchrun", line 8, in
sys.exit(main())
File "/tal-vePFS/SFT/dengshuhao1/local/anaconda3/envs/Chatglm_lora_multi-gpu/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 346, in wrapper
return f(*args, **kwargs)
File "/tal-vePFS/SFT/dengshuhao1/local/anaconda3/envs/Chatglm_lora_multi-gpu/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/tal-vePFS/SFT/dengshuhao1/local/anaconda3/envs/Chatglm_lora_multi-gpu/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/tal-vePFS/SFT/dengshuhao1/local/anaconda3/envs/Chatglm_lora_multi-gpu/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/tal-vePFS/SFT/dengshuhao1/local/anaconda3/envs/Chatglm_lora_multi-gpu/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
multi_gpu_fintune_belle.py FAILED


Failures:
[1]:
time : 2023-04-12_07:04:52
host : localhost
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 2408522)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2023-04-12_07:04:52
host : localhost
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 2408521)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper__index_select)

The error is below. Is something wrong with my settings, or is there another cause?

`Traceback (most recent call last):
  File "finetune.py", line 170, in <module>
    main()
  File "finetune.py", line 161, in main
    trainer.train()
  File "/home/la/anaconda3/envs/chatglm-tuning/lib/python3.8/site-packages/transformers/trainer.py", line 1633, in train
    return inner_training_loop(
  File "/home/la/anaconda3/envs/chatglm-tuning/lib/python3.8/site-packages/transformers/trainer.py", line 1902, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/la/anaconda3/envs/chatglm-tuning/lib/python3.8/site-packages/transformers/trainer.py", line 2645, in training_step
    loss = self.compute_loss(model, inputs)
  File "finetune.py", line 103, in compute_loss
    return model(
  File "/home/la/anaconda3/envs/chatglm-tuning/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/la/anaconda3/envs/chatglm-tuning/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/la/anaconda3/envs/chatglm-tuning/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1846, in forward
    loss = self.module(*inputs, **kwargs)
  File "/home/la/anaconda3/envs/chatglm-tuning/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/la/anaconda3/envs/chatglm-tuning/lib/python3.8/site-packages/peft/peft_model.py", line 529, in forward
    return self.base_model(
  File "/home/la/anaconda3/envs/chatglm-tuning/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/la/anaconda3/envs/chatglm-tuning/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home2/la/chatgml-tuning/modeling_chatglm.py", line 1043, in forward
    transformer_outputs = self.transformer(
  File "/home/la/anaconda3/envs/chatglm-tuning/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/la/anaconda3/envs/chatglm-tuning/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home2/la/chatgml-tuning/modeling_chatglm.py", line 860, in forward
    inputs_embeds = self.word_embeddings(input_ids)
  File "/home/la/anaconda3/envs/chatglm-tuning/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1148, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/home/la/anaconda3/envs/chatglm-tuning/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/la/anaconda3/envs/chatglm-tuning/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 158, in forward
    return F.embedding(
  File "/home/la/anaconda3/envs/chatglm-tuning/lib/python3.8/site-packages/torch/nn/functional.py", line 2199, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper__index_select)
Using /home/la/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0004496574401855469 seconds
  0%|                                                                                                                                                                               | 0/10000 [00:00<?, ?it/s]/home2/la/chatgml-tuning/modeling_chatglm.py:266: UserWarning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead. (Triggered internally at  /opt/conda/conda-bld/pytorch_1659484810403/work/aten/src/ATen/native/cuda/Indexing.cu:1239.)
  attention_scores.masked_fill_(attention_mask.byte(), -10000.0)
/home/la/anaconda3/envs/chatglm-tuning/lib/python3.8/site-packages/torch/autograd/__init__.py:173: UserWarning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead. (Triggered internally at  /opt/conda/conda-bld/pytorch_1659484810403/work/aten/src/ATen/native/cuda/Indexing.cu:1239.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 92653 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 92654) of binary: /home/la/anaconda3/envs/chatglm-tuning/bin/python
Traceback (most recent call last):
  File "/home/la/anaconda3/envs/chatglm-tuning/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.12.1', 'console_scripts', 'torchrun')())
  File "/home/la/anaconda3/envs/chatglm-tuning/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/home/la/anaconda3/envs/chatglm-tuning/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/home/la/anaconda3/envs/chatglm-tuning/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/home/la/anaconda3/envs/chatglm-tuning/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/la/anaconda3/envs/chatglm-tuning/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
finetune.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-03-31_14:02:55
  host      : guest-server
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 92654)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html`

huggingface_hub.utils._validators.HFValidationError

huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'model/chatglm3/chatglm3-6b-base'. Use repo_type argument if needed.

I got the error above; what causes it?

Some questions

  1. Friend, the CSDN link in the readme seems to be wrong;
  2. Do most current fine-tunes only train the cross-attention weights with the LoRA algorithm? Is anyone fine-tuning the whole model?

DeepSpeed did not take effect

Have you compared throughput with and without DeepSpeed? I see no change, and DeepSpeed Transformer Inference doesn't appear to take effect.
