wxjiao / parrot Goto Github PK

The ParroT framework to enhance and regulate the Translation Abilities during Chat based on open-sourced LLMs (e.g., LLaMA-7b, Bloomz-7b1-mt) and human written translation and evaluation data.

Python 91.16% Shell 0.14% Makefile 0.01% Dockerfile 0.04% Jsonnet 0.01% Jupyter Notebook 0.41% C++ 0.03% Cuda 0.34% Cython 0.01% C 0.01% MDX 7.86%

chatgpt gpt-4 llama machine-translation human-feedback contrastive instruction-tuning error-guided bloomz lora

parrot's Introduction

ParroT: Translating During Chat Using Large Language Models tuned with Human Translation and Feedback

🔥 Update

[2023/10/12] ParroT accepted to EMNLP 2023 (Findings)!
[2023/09/23] Fixed the streaming mode for local large datasets, which originally supports only datasets in Hugging Face Datasets; need to use --max_steps instead of --num_train_epochs due to the IterableDataset type.
[2023/07/14] Incorporated flash-attention into BLOOM for long-context training; observed about 20-30% speedup with other settings fixed.

[2023/06/14] Releasing detailed instruction data and scripts on @InstructMT.
The WMT22 test sets are made available.
For medium-to-small models (e.g., 7B), we recommend ZeRO2+offload rather than ZerO3; use gradient accumulation to maximize GPU usage.
Important optimizations: preprocess_function to be 4-5X faster; DataCollatorForSeq2Seq for batch-wise padding to save 5-10% GPU usage.
Introducing ParroT-LoRA which supports saving and restarting from the checkpoints (base model and lora weights) during finetuning.
Setting the default Transformers to >= 4.28.0.dev0 directly as it merged the PR of LLaMA. With this version on Torch 1.13.1 + CUDA 11.7, we find the finetuning process could be a bit faster (~18%) than our v1.0.0 implementation.

⭐ Highlight ⭐

🤗 Try the pretrained models at HuggingFace model hub:
- [Alpaca-7b], [ParroT-7b], [ParroT-Hint-7b]
- [ParroT-Hint-7b-lora] based on [LLaMA-7b]

ParroT

Parrots are smart birds that can respond to simple commands or questions. The question is whether they're just mimicking, or really intelligent enough to communicate with humans. This is similar to what we currently speculate about LLMs.

Promoting the good is essential, but punishing the evil is also necessary to ensure that goodness prevails. Similarly, aligning LLMs with human feedbacks is exactly to learn from correct examples and discriminate erroneous examples.

Large language models (LLMs) like ChatGPT and GPT-4 have exhibited remarkable abilities on a wide range of natural language processing (NLP) tasks, including various machine translation abilities accomplished during chat. However, these models are only accessible through restricted APIs, which creates barriers to new research and advancements in the field. Therefore, we propose the ParroT framework to enhance and regulate the translation abilities during chat based on open-sourced LLMs (e.g., LLaMA, Bloomz) and human written translation and evaluation data. Specifically, ParroT reformulates translation data into the instruction-following style, and introduces a “Hint” field for incorporating extra requirements to regulate the translation process.

Figure 1: Framework of ParroT. Hints are (optional) extra requirements to regulate the translation process.

Configurations

Datasets

Train Data: data/data_alpaca_hf.json, data_parrot_hf.json
- You can also use Alpaca data by GPT-4: data/data_alpaca_gpt4_hf_en.json, data/data_alpaca_gpt4_hf_zh.json
- Find more data details and resources in [InstructMT]
Test Data: Flores subsets, WMT22 test sets
Instruction-following format:

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
We are translating the following sentences from Chinese to English.
    
### Input:
检查情况显示，市场销售的粮油、肉类、水果、蔬菜、蛋奶等生活必需品供应充足，商品价格基本稳定，未发现严重违法违规行为，市场经营秩序总体平稳。

### Hint: A translation with major accuracy/mistranslation errors could be

### Response:The results of the inspection indicate the sufficient supply of living necessities <v>on marketing</v> 
including cereals and oils, meat, fruits, vegetables, eggs and milk, and the basically stabilized commodity price. 
The inspection hasn’t found serious violation of laws and regulations. The market order is stable on an overall basis.

Environment

We develop ParroT based on open-sourced LLMs (e.g., LLaMA, Bloomz) with HuggingFace's transformers library.

Framework Versions:

Python 3.8.12
Pytorch 1.13.1+cu117
Transformers (git+https://github.com/huggingface/transformers.git)
Peft (git+https://github.com/huggingface/peft.git)
Flash-attn
Triton 2.0.0.dev20221202
Other requirements

pip install -r requirements.txt

Data Format Conversion

Convert the regular bilingual sentence pairs into Alpaca data format:

python3 scripts/convert_pair_to_alpaca.py \
    -s zh -t en \
    -if scripts/instruct_follow.txt \
    -sf data/train.zh-en.zh.txt \
    -tf data/train.zh-en.en.txt \
    -of data/train_alp.json

Convert the Alpaca data format to the training data format here:

python3 scripts/convert_alpaca_to_hf.py \
    -i data/train_alp.json \
    -o data/train_alp_hf.json

Finetune

We modify the example script of language modeling in transformers for finetuning, i.e., run_clm.py with the built in HuggingFace Trainer. So it would be easy to get started if you are familiar with run_clm.py. Also, this script supports data streaming, which might be helpful for handling larger datasets. DeepSpeed ZeRO stage 2/3 is adopted for distributed training.

The resulting finetuning scripts are named as run_clm_llms.py and run_clm_lora.py for full model training and LoRA training, respectively. Theoretically, the run_clm_lora.py script can handle both full model and LoRA by specifying the arguments. But we also keep the former one for full model in consideration of safe development.

For LoRA training, we recommend to use ZeRO2 since ZeRO3 is very unstable when saving adapter_model.bin.

For long-context training, we provide the run_clm_llms_flash.py to improve the memory efficiency.

LLaMA-7b:

Original weights for the LLaMA models can be obtained by filling out this Form
Convert the LLaMA weights into the HuggingFace format by following the instructions in this Doc
Optionally converted one [LLaMA-7b]

Bloomz-7b1-mt:

Available on HuggingFace: Bloomz-7b1-mt

Example usages on 8 A100 by 1 node:

Full Model

# Multi-nodes are also supported

export NCCL_DEBUG=INFO
export NCCL_SOCKET_IFNAME=eth1
export NCCL_IB_GID_INDEX=3
export NCCL_IB_SL=3
export NCCL_NET_GDR_READ=1

export MASTER_ADDR="${CHIEF_IP:=localhost}"
export MASTER_PORT="${MASTER_PORT:=29500}"

train_path=transformers/examples/pytorch/language-modeling/run_clm_llms.py
model_path=<your_proj_path>/llama-7b
model_save=<your_proj_path>/parrot-hint-7b

# HOST_NUM will be 1
torchrun --nnodes $HOST_NUM --node_rank $INDEX --nproc_per_node 8 \
    --master_addr $MASTER_ADDR --master_port $MASTER_PORT  \
    ${train_path} \
    --deepspeed train/deepspeed_config_zero2.json \
    --model_name_or_path ${model_path} \
    --train_file data/data_parrot_hf.json \
    --preprocessing_num_workers 16 \
    --dataloader_num_workers 8 \
    --dataloader_pin_memory True \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --num_train_epochs 1.5 \
    --save_strategy "steps" \
    --save_steps 500 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 10 \
    --block_size 512 \
    --do_train \
    --evaluation_strategy "no" \
    --validation_split_percentage 0 \
    --fp16 True \
    --fp16_full_eval True \
    --ddp_timeout 3600 \
    --seed 1 \
    --gradient_checkpointing True \
    --output_dir ${model_save}

# Use streaming for large datasets and specify the max_steps
#    --streaming \
#    --max_steps 2500 \

LoRA

# Multi-nodes are also supported

export NCCL_DEBUG=INFO
export NCCL_SOCKET_IFNAME=eth1
export NCCL_IB_GID_INDEX=3
export NCCL_IB_SL=3
export NCCL_NET_GDR_READ=1

export MASTER_ADDR="${CHIEF_IP:=localhost}"
export MASTER_PORT="${MASTER_PORT:=29500}"

train_path=transformers/examples/pytorch/language-modeling/run_clm_lora.py
model_path=<your_proj_path>/llama-7b
model_save=<your_proj_path>/parrot-hint-lora-7b

# HOST_NUM will be 1
torchrun --nnodes $HOST_NUM --node_rank $INDEX --nproc_per_node 8 \
    --master_addr $MASTER_ADDR --master_port $MASTER_PORT  \
    ${train_path} \
    --deepspeed train/deepspeed_config_zero2.json \
    --model_name_or_path ${model_path} \
    --train_file data/data_parrot_hf.json \
    --use_lora True \
    --lora_config train/lora_config.json \
    --preprocessing_num_workers 16 \
    --dataloader_num_workers 8 \
    --dataloader_pin_memory True \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --num_train_epochs 1.5 \
    --save_strategy "steps" \
    --save_steps 500 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 10 \
    --block_size 512 \
    --do_train \
    --evaluation_strategy "no" \
    --validation_split_percentage 0 \
    --fp16 True \
    --fp16_full_eval True \
    --ddp_timeout 3600 \
    --seed 1 \
    --gradient_checkpointing True \
    --output_dir ${model_save}

Inference

The scripts support generation with and without hints using different instructions. The hints are appended to the default instruction with ### as a delimiter. Simply switch the inference instruction for different strategies.

None: instruct_inf.txt
- Translate the following sentences from [SRC] to [TGT].
No Errors: instruct_inf_e2t.txt
- Translate the following sentences from [SRC] to [TGT].###A translation with no errors could be
Minor Errors: instruct_inf_e2t_minor.txt
- Translate the following sentences from [SRC] to [TGT].###A translation with minor errors could be
Major Errors: instruct_inf_e2t_major.txt
- Translate the following sentences from [SRC] to [TGT].###A translation with major errors could be
Preferred: instruct_inf_t2t.txt
- Translate the following sentences from [SRC] to [TGT].###We prefer to translate it to

Example usages:

Full Model

# Translation
python3 inference.py --model-name-or-path <your_proj_path>/parrot-hint-7b \
    -lp 'zh-en' \
    -t 0.1 \
    -sa 'beam' \
    -ins test/instruct_inf.txt \
    -i test/test_rand_50.zh.txt \
    -o test/test_rand_50.zh-en.none-hint.txt
    
# Text generation
python3 inference.py --model-name-or-path <your_proj_path>/parrot-hint-7b \
    -t 0.7 \
    -sa 'sample' \
    -i test/test_case.txt \
    -o test/test_case.general-task.txt

LoRA

# Translation
python3 inference_lora.py --model-name-or-path <your_proj_path>/llama-7b \
    --lora-weights <your_proj_path>/parrot-hint-lora-7b/adapter_model \
    -lp 'zh-en' \
    -t 0.1 \
    -sa 'beam' \
    -ins test/instruct_inf.txt \
    -i test/test_rand_50.zh.txt \
    -o test/test_rand_50.zh-en.none-hint.txt
    
# Text generation
python3 inference_lora.py --model-name-or-path <your_proj_path>/llama-7b \
    --lora-weights <your_proj_path>/parrot-hint-lora-7b/adapter_model \
    -t 0.7 \
    -sa 'sample' \
    -i test/test_case.txt \
    -o test/test_case.general-task.txt

MT Evaluation

We adopt two metrics, SacreBLEU and COMET (Unbabel/wmt22-comet-da), which are driven by n-gram similarity and cross-lingual pretrained models, respectively.

# SacreBLEU
cat test_rand_50.zh-en.none-hint.txt.hyp | sacrebleu -w 2 test_rand_50.en.txt

# COMET
comet-score -r test_rand_50.en.txt -s test_rand_50.zh.txt -t test_rand_50.zh-en.none-hint.txt.hyp --quiet --only_system

Finetuned LLMs and Results

Currently, we finetuned the following LLMs for ParroT with the evaluation mainly on WMT22 test sets.

LLaMA-7b
Bloomz-mt-7b
ParroT-LoRA

There are several interesting observations:

ParroT based on Bloomz-mt-7b also works well with hints. Besides, Bloomz-mt-7b shows stronger ability in the modeling of Chinese texts.
LoRA seems to prevent LLMs from overfitting which benefits the high-resource De-En translation but restricts the instruction learning of other directions. The limited trainable parameters (only ~4.2M) may explain this observation.

Caption: Translation performance of LLMs on Flores subsets and WMT22 test sets.

Run LLMs on your MacBook

Try llama.cpp to run the LLMs using 4-bit quantization on a MacBook. We adopt a specific fork from comex/llama.cpp which supports the conversion of HuggingFace models to ggml format.

We recommend the use of Python 3.10.10 for convert.py since we encountered bugs with Python 3.9.5.

TypeError: 'staticmethod' object is not callable

# Clone the specific fork 
git clone --branch convert-script https://github.com/comex/llama.cpp.git
cd llama.cpp
make

# Install Python dependencies
python3 -m pip install -r requirements.txt

# Convert the 7b model to ggml fp16 format
python3 convert.py models/alpaca/pytorch_model.bin

# Quantize the model to 4-bits (using method 2 = q4_0)
./quantize models/alpaca/ggml-model-f16.bin models/alpaca/ggml-model-q4_0.bin 2 

# Run instruction mode with Alpaca
./main -m ./models/alpaca/ggml-model-q4_0.bin --color -f ./prompts/alpaca.txt -ins -b 256 --top_p 0.95 --top_k 50 --temp 0.7 --repeat_penalty 1 -t 7

Now you can talk to your own Chatbot!

Alpaca-7b

Caption: Alpaca cannot respond to the hints.

ParroT-Hint-7b

Caption: ParroT responds to the hints as expected.

Public Impact

Acknowledgement

This project cannot be developed without the following resources:

Meta AI LLaMA: https://github.com/facebookresearch/llama
BigScience Bloomz: https://huggingface.co/bigscience/bloom
HuggingFace developers on LLaMA: huggingface/transformers#21955
Stanford Alpaca: https://github.com/tatsu-lab/stanford_alpaca
OptimalScale: https://github.com/OptimalScale/LMFlow
llama.cpp by @ggerganov and @comex

Citation

Please kindly cite our paper if you find it helpful:

@inproceedings{jiao2023parrot,
  title={ParroT: Translating during Chat using Large Language Models tuned with Human Translation and Feedback}, 
  author={Wenxiang Jiao and Jen-tse Huang and Wenxuan Wang and Zhiwei He and Tian Liang and Xing Wang and Shuming Shi and Zhaopeng Tu},
  booktitle = {Findings of EMNLP},
  year      = {2023}
}

parrot's People

Contributors

Stargazers

Watchers

parrot's Issues

Support OneRingTranslator and multiple language engines to estimate

Hi, guys!

Thanks for interesting project - I worked on something the same with Russian-English translation.

I thinks, you may be interested in use in your project one my project https://github.com/janvarev/OneRingTranslator
OneRingTranslator provide single REST interface to different translation language.

It seems to be integrated in your workflow in two cases:

Get other translations in a universal way to compare with. Now it support FB NLLB model, so you can compare with it and get BLEU scores.
If you transfer your solution as a plugin to OneRingTranslator (or easily add others like DeepL), you'll be able to get all translations in one interface to compare with.
As a solution to run your lora/model throw REST interface you can use https://github.com/LostRuins/koboldcpp/releases, that provides REST API interface to loaded ggml model.

I think it's a good idea to have one interface with multiple plugins to play with.

Good luck anywhere!

关于 run_clm_llms.py修改

感谢您对llms在翻译社区的贡献！

想问您所提到在run_clm_llms.py文件中的修改主要有哪些呢？

where is run_clm_lora.py?

which is infered by readme,but I can not find

could not evaluate on multiple gpu

hi, some issues trouble me when i reproduce translation result at 8*A100. I try to evaluate the model at the end of training, the process seems to be stuck, it does not display any error and the gpu keeps 100% utilization. But, there is no error when I evaluate on single gpu.

lora 版本

你好，请问ParroT你有训lora版本吗？能把lora权重放出来吗？抱抱脸上是完整的模型，hint版和普通版都要下载完整的内存占用太大了

Why is the inference blue low in translation task (en-de)?

Hello, I have a problem while using ParroT inference.
In the translation task of wmt22 testset (en-de), I loaded the parameters of Lrama-7b for inference, and the bleu was only 6.9808, while after loading the fine-tuning parameters of ParroT-Hint-7b-lora you provided, without adding Hint, bleu did not improve. How can I improve inference performance? Thank you!

How to get the translation data only?a

What should be modified if I finetune the LLaMA with 2 A100 GPUs?

Hi Wenxiang,

I find your work very useful and interesting. Thanks for you efforts!

I wonder what part in the example of fine-tuning LoRA needs to be modified if I only have 2 A100 GPUs (except the --nproc_per_node).

关于deepspeed训练问题

我在使用您的代码进行训练时遇到了以下的报错并一直没有得到解决：
`Emitting ninja build file /share/home/hubaotian/hbt_user05/.cache/torch_extensions/py38_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -isystem /share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/torch/include -isystem /share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/torch/include/TH -isystem /share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/torch/include/THC -isystem /share/home/hubaotian/hbt_user05/.conda/envs/parrot/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -march=native -fopenmp -D__AVX256 -D__DISABLE_CUDA_ -c /share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
FAILED: cpu_adam.o
c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -isystem /share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/torch/include -isystem /share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/torch/include/TH -isystem /share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/torch/include/THC -isystem /share/home/hubaotian/hbt_user05/.conda/envs/parrot/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -march=native -fopenmp -D__AVX256 -D__DISABLE_CUDA_ -c /share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
c++: 错误：unrecognized command line option ‘-std=c++14’
[2/3] c++ -MMD -MF cpu_adam_impl.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -isystem /share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/torch/include -isystem /share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/torch/include/TH -isystem /share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/torch/include/THC -isystem /share/home/hubaotian/hbt_user05/.conda/envs/parrot/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -march=native -fopenmp -D__AVX256 -D__DISABLE_CUDA_ -c /share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam_impl.cpp -o cpu_adam_impl.o
FAILED: cpu_adam_impl.o
c++ -MMD -MF cpu_adam_impl.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -isystem /share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/torch/include -isystem /share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/torch/include/TH -isystem /share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/torch/include/THC -isystem /share/home/hubaotian/hbt_user05/.conda/envs/parrot/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -march=native -fopenmp -D__AVX256 -D__DISABLE_CUDA_ -c /share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam_impl.cpp -o cpu_adam_impl.o
c++: 错误：unrecognized command line option ‘-std=c++14’
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
subprocess.run(
File "/share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "../transformers/examples/pytorch/language-modeling/run_clm_llms.py", line 710, in
main()
File "../transformers/examples/pytorch/language-modeling/run_clm_llms.py", line 658, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/transformers/trainer.py", line 1648, in train
return inner_training_loop(
File "/share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/transformers/trainer.py", line 1717, in _inner_training_loop
deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
File "/share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/transformers/deepspeed.py", line 378, in deepspeed_init
deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
File "/share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/deepspeed/init.py", line 171, in initialize
engine = DeepSpeedEngine(args=args,
File "/share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 303, in init
Loading extension module cpu_adam...
self._configure_optimizer(optimizer, model_parameters)
File "/share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1185, in _configure_optimizer
basic_optimizer = self._configure_basic_optimizer(model_parameters)
File "/share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1256, in _configure_basic_optimizer
optimizer = DeepSpeedCPUAdam(model_parameters,
File "/share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 94, in init
self.ds_opt_adam = CPUAdamBuilder().load()
File "/share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 446, in load
return self.jit_load(verbose)
File "/share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 489, in jit_load
op_module = load(name=self.name,
File "/share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1508, in _jit_compile
_write_ninja_file_and_build_library(
File "/share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1623, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'cpu_adam'
Time to load cpu_adam op: 1.2853024005889893 seconds
Exception ignored in: <function DeepSpeedCPUAdam.del at 0x7f0ec5fe39d0>
Traceback (most recent call last):
File "/share/home/hubaotian/hbt_user05/.conda/envs/parrot/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 102, in del
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
Adam Optimizer #0 is created with AVX2 arithmetic capability.
Config: alpha=0.000020, betas=(0.900000, 0.999000), weight_decay=0.000000, adam_w=1
[2023-09-22 16:33:26,892] [INFO] [logging.py:96:log_dist] [Rank 0] Using DeepSpeed Optimizer param name adam as basic optimizer
[2023-09-22 16:33:26,893] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2023-09-22 16:33:26,907] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = DeepSpeedCPUAdam
[2023-09-22 16:33:26,907] [INFO] [utils.py:54:is_zero_supported_optimizer] Checking ZeRO support for optimizer=DeepSpeedCPUAdam type=<class 'deepspeed.ops.adam.cpu_adam.DeepSpeedCPUAdam'>
[2023-09-22 16:33:26,907] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer
[2023-09-22 16:33:26,908] [INFO] [stage_1_and_2.py:146:init] Reduce bucket size 200000000
[2023-09-22 16:33:26,908] [INFO] [stage_1_and_2.py:147:init] Allgather bucket size 200000000
[2023-09-22 16:33:26,908] [INFO] [stage_1_and_2.py:148:init] CPU Offload: True
[2023-09-22 16:33:26,908] [INFO] [stage_1_and_2.py:149:init] Round robin gradient partitioning: False
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 68279 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 68280) of binary: /share/home/hubaotian/hbt_user05/.conda/envs/parrot/bin/python`

我所使用的训练脚本为：
torchrun --nnodes 1 --node_rank 0 --nproc_per_node 2 \ --master_addr $MASTER_ADDR --master_port $MASTER_PORT \ ${train_path} \ --deepspeed ./deepspeed_config_zero2.json \ --model_name_or_path ${model_path} \ --train_file ../data/ccmt_alp_hf.json \ --preprocessing_num_workers 16 \ --dataloader_num_workers 8 \ --dataloader_pin_memory True \ --per_device_train_batch_size 16 \ --per_device_eval_batch_size 2 \ --gradient_accumulation_steps 1 \ --num_train_epochs 1.5 \ --save_strategy "steps" \ --save_steps 500 \ --save_total_limit 1 \ --learning_rate 2e-5 \ --weight_decay 0. \ --warmup_ratio 0.03 \ --lr_scheduler_type "cosine" \ --logging_steps 10 \ --block_size 512 \ --do_train \ --evaluation_strategy "no" \ --validation_split_percentage 0 \ --fp16 True \ --fp16_full_eval True \ --streaming \ --ddp_timeout 3600 \ --seed 1 \ --gradient_checkpointing True \ --output_dir ${model_save}

不知道关于这个问题您是否可以给出解决的思路