idea-ccnl / fengshenbang-lm

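Fengshenbang-LM (封神榜大模型) is an open-source family of large models led by the Cognitive Computing and Natural Language Research Center at the IDEA research institute (IDEA研究院), intended to serve as infrastructure for Chinese AIGC and cognitive intelligence.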

License: Apache License 2.0

Python 87.13% Makefile 0.01% C++ 3.11% Shell 8.42% Jupyter Notebook 0.84% C 0.21% Cuda 0.28%

chinese-nlp pretrained-models pytorch distributed-training transformers aigc multimodal


fengshenbang-lm's Issues

Hi, there are some questions about training hardware👋

I noticed that the document Erlangshen-MegatronBert-1.3B.md says "32张A100训练14天" (trained on 32 A100s for 14 days).

I have some questions:

Does "32 A100s" mean 4 machines with 8 cards each, or 1 machine with 32 cards? (In other words, multiple cards on one machine, or multiple machines?)
If multiple machines were used, how do they communicate at the hardware level? (NVSwitch + NVLink, or network cards?)

We look forward to your answer, thank you ^_^

Environment problem

Hello, after installing the packages from the official fengshen/requirement.txt, the code does not run. My environment is:
python==3.8.10
pytorch-lightning==1.6.3
torch==1.9.1+cu111
transformers==4.22.1
datasets==2.4.0
deepspeed==0.5.10
jieba-fast==0.53
jieba==0.42.1
protobuf==3.20.1
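I tried different versions of transformers and pytorch lightning, but none of them worked. Could you provide an environment that is known to work? A Docker image would also be great.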
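When reporting an environment like the one above, it can help to generate the version list from the active interpreter rather than typing it by hand. The following is a minimal sketch using only the standard library; the package names are copied from the list in this issue, and the helper name `installed_versions` is made up for illustration.

```python
from importlib import metadata
import platform

# Package names copied from the environment listed in this issue.
PACKAGES = [
    "pytorch-lightning", "torch", "transformers", "datasets",
    "deepspeed", "jieba-fast", "jieba", "protobuf",
]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print(f"python=={platform.python_version()}")
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

The output can be pasted directly into an issue, which makes it easier to compare against whatever environment the maintainers use.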

How to use Pegasus

How do I use the pre-trained Chinese Pegasus model you provide?
I wrote a script in examples/Pegasus and ran it, and got the following error:

Building prefix dict from the default dictionary ...Dumping model to file cache /cognitive_comp/dongxiaoqun/software/jieba/tmp/jieba.cache
Dump cache file failed.
Traceback (most recent call last):
File "/home/wangyx/miniconda3/envs/wang/lib/python3.8/site-packages/jieba/__init__.py", line 150, in initialize
fd, fpath = tempfile.mkstemp(dir=tmpdir)
File "/home/wangyx/miniconda3/envs/wang/lib/python3.8/tempfile.py", line 331, in mkstemp
return _mkstemp_inner(dir, prefix, suffix, flags, output_type)
File "/home/wangyx/miniconda3/envs/wang/lib/python3.8/tempfile.py", line 250, in _mkstemp_inner
fd = _os.open(file, flags, 0o600)
FileNotFoundError: [Errno 2] No such file or directory: '/cognitive_comp/dongxiaoqun/software/jieba/tmp/tmp7knonap8'
Loading model cost 0.631 seconds.
Prefix dict has been built successfully.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Truncation was not explicitly activated but max_length is provided a specific value, please use truncation=True to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategymore precisely by providing a specific strategy to truncation.
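Judging from the log itself, the traceback looks non-fatal: after "Dump cache file failed." jieba still reports "Prefix dict has been built successfully." The failure is jieba trying to write its cache into a directory that does not exist on this machine. Below is a minimal sketch of pointing jieba at a writable cache directory, assuming the path can be overridden before jieba initializes. `jieba.dt.tmp_dir` is jieba's own setting; the directory name and the helper `make_cache_dir` are hypothetical. If the tokenizer code in the repository sets this path itself, the override may need to be applied there instead.

```python
import os
import tempfile


def make_cache_dir(path=None):
    """Create (if needed) and return a writable directory for jieba's cache."""
    if path is None:
        # Hypothetical location; any writable directory works.
        path = os.path.join(tempfile.gettempdir(), "jieba_cache")
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    cache_dir = make_cache_dir()
    try:
        import jieba
    except ImportError:
        print("jieba is not installed; created", cache_dir)
    else:
        # Must be set before the first tokenization triggers initialization.
        jieba.dt.tmp_dir = cache_dir
        jieba.initialize()
        print("jieba cache directory:", jieba.dt.tmp_dir)
```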

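Question about how the model encodes characters

From past experience, to improve language-model quality one usually groups several characters into a word and encodes it as a single token. Wenzhong, however, encodes a single Chinese character into two or more tokens, and the entries in vocab.json also look unrelated to Chinese. I would like to understand the authors' reasoning.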
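One possible explanation, assuming Wenzhong reuses a GPT-2 style byte-level BPE tokenizer (this thread does not confirm it): such a tokenizer works on UTF-8 bytes rather than characters. Most Chinese characters occupy three bytes, and each byte is remapped to a printable Unicode character before being stored in vocab.json, so the entries do not look like Chinese. The sketch below reimplements that byte-to-character mapping for illustration; `byte_level_view` is a made-up helper name.

```python
def bytes_to_unicode():
    """Map each of the 256 byte values to a printable Unicode character,
    following the scheme used by GPT-2's byte-level BPE."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    # Printable bytes map to themselves.
    mapping = {b: chr(b) for b in printable}
    # Remaining bytes are shifted to code points starting at 256.
    extra = 0
    for b in range(256):
        if b not in mapping:
            mapping[b] = chr(256 + extra)
            extra += 1
    return mapping


def byte_level_view(text):
    """Show text the way a byte-level vocabulary stores it."""
    table = bytes_to_unicode()
    return "".join(table[b] for b in text.encode("utf-8"))


if __name__ == "__main__":
    char = "中"
    print("UTF-8 byte count:", len(char.encode("utf-8")))
    print("byte-level form:", byte_level_view(char))
```

Under this scheme a single Chinese character becomes one token only if BPE has learned merges covering all of its bytes; otherwise it is split into two or three tokens, which would match the behavior described above.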

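Is there a reasonably mature training and inference framework for prompt learning?

Hello, did Erlangshen (二郎神) use the prompt learning paradigm for its few-shot and zero-shot leaderboard tasks? And is there an open-source prompt learning framework?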

Code example to finetune Erlangshen-Roberta-330M-Sentiment

Hi,

Thanks a lot for sharing the pre-trained models. We are using the following model for a research project with hotel reviews in Chinese.
https://huggingface.co/IDEA-CCNL/Erlangshen-Roberta-330M-Sentiment

Many positive reviews are labeled as negative by the model such as the following one:

We would like to fine-tune this model after correcting the labeling results and wonder whether you can give us some pointers on how to do that.

Thanks a lot!

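Distributed multi-node, multi-GPU training hangs, then fails after a timeout

Following up on #111: we set up two servers with identical environments (500 GB of RAM and eight 1080 Ti GPUs with 11 GB each) to try multi-node, multi-GPU training. The model loaded successfully, but training never started, and after a while the job exited, apparently because of a timeout.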

#!/bin/bash

set -x -e

echo "START TIME: $(date)"
MICRO_BATCH_SIZE=1
ROOT_DIR=$(pwd)

ZERO_STAGE=3

config_json="$ROOT_DIR/training_config.json"
export MASTER_PORT=$((RANDOM % 10000 + 30000))

cat <<EOT >$config_json
{
  "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE,
  "steps_per_print": 1000,
  "gradient_clipping": 1,
  "zero_optimization": {
    "stage": ${ZERO_STAGE},
    "allgather_partitions": false,
    "allgather_bucket_size": 2e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 2e8,
    "contiguous_gradients": true,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "offload_param": {
      "device": "cpu",
      "pin_memory": true
    },
    "stage3_max_live_parameters" : 2e8,
    "stage3_max_reuse_distance" : 2e8,
    "stage3_prefetch_bucket_size": 2e8,
    "stage3_param_persistence_threshold": 2e8,
    "sub_group_size" : 2e8,
    "round_robin_gradients": true
  },
  "bf16": {
    "enabled": true
  },
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 1e-5,
      "betas": [0.9,0.95],
      "eps": 1e-8,
      "weight_decay": 1e-2
    }
  },
  "scheduler": {
    "type": "WarmupLR",
    "params":{
      "warmup_min_lr": 5e-6,
      "warmup_max_lr": 1e-5
    }
  }
}
EOT

export PL_DEEPSPEED_CONFIG_PATH=$config_json
TRAINER_ARGS="
    --max_epochs 1 \
    --num_nodes 2 \
    --gpus 8 \
    --strategy deepspeed_stage_${ZERO_STAGE}_offload \
    --default_root_dir $ROOT_DIR \
    --dirpath $ROOT_DIR/ckpt \
    --save_top_k 3 \
    --monitor train_loss \
    --mode min \
    --save_last \
"

DATA_DIR=/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/datasets
DATA_ARGS="
    --data_dir $DATA_DIR \
    --max_seq_length 64 \
    --train_batchsize $MICRO_BATCH_SIZE \
    --valid_batchsize $MICRO_BATCH_SIZE \
    --train_data test_train.txt \
    --valid_data test.txt \
    --test_data  test.txt
"

PRETRAINED_MODEL_PATH="IDEA-CCNL/Wenzhong2.0-GPT2-3.5B-chinese"
MODEL_ARGS="
    --pretrained_model_path ${PRETRAINED_MODEL_PATH} \
    --output_save_path $ROOT_DIR/predict.json \
    --learning_rate 1e-4 \
    --weight_decay 0.1 \
    --warmup 0.01 \
"

DISTRIBUTED_ARGS="
    --nnodes 2 \
    --nproc_per_node=8 \
    --master_addr 192.168.1.14 \
    --master_port 9005 \
    --node_rank 0 \
    --max_restarts=1
"

SCRIPTS_PATH=${ROOT_DIR}/finetune_gpt2.py

export CMD=" \
    $DISTRIBUTED_ARGS \
    $SCRIPTS_PATH \
    $TRAINER_ARGS \
    $MODEL_ARGS \
    $DATA_ARGS \
"

export NCCL_IB_DISABLE=1
export NCCL_SOCKET_IFNAME=enp129s0f0

#python ${CMD}
torchrun ${CMD}

Node0 reported the following error:

[E ProcessGroupNCCL.cpp:737] [Rank 7] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=749, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1800258 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:414] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
[E ProcessGroupNCCL.cpp:737] [Rank 5] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=749, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1801313 milliseconds before timing out.
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Rank 7] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=749, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1800258 milliseconds before timing out.
Fatal Python error: Aborted

Thread 0x00007fa5abfff700 (most recent call first):
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 312 in wait
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/multiprocessing/queues.py", line 233 in _feed
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 910 in run
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 973 in _bootstrap_inner
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 930 in _bootstrap

Thread 0x00007fa431fff700 (most recent call first):
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 312 in wait
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/multiprocessing/queues.py", line 233 in _feed
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 910 in run
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 973 in _bootstrap_inner
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 930 in _bootstrap

Thread 0x00007fa5e75ff700 (most recent call first):
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 316 in wait
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 574 in wait
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/tqdm/_monitor.py", line 60 in run
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 973 in _bootstrap_inner
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 930 in _bootstrap

Thread 0x00007fa6dc4a6340 (most recent call first):
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 738 in <lambda>
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 625 in _apply
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 738 in cpu
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/core/mixins/device_dtype_mixin.py", line 147 in cpu
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/strategies/ddp.py", line 474 in teardown
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1298 in _teardown
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 736 in _call_and_handle_interrupt
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 768 in fit
  File "/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py", line 216 in train
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345 in wrapper
  File "/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py", line 224 in <module>
[E ProcessGroupNCCL.cpp:737] [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=749, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1801477 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:414] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=749, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1801477 milliseconds before timing out.
Fatal Python error: Aborted

Thread 0x00007f3c0ffff700 (most recent call first):
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 312 in wait
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/multiprocessing/queues.py", line 233 in _feed
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 910 in run
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 973 in _bootstrap_inner
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 930 in _bootstrap

Thread 0x00007f3c167fc700 (most recent call first):
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 312 in wait
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/multiprocessing/queues.py", line 233 in _feed
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 910 in run
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 973 in _bootstrap_inner
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 930 in _bootstrap

Thread 0x00007f3c4b4bf700 (most recent call first):
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 316 in wait
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 574 in wait
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/tqdm/_monitor.py", line 60 in run
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 973 in _bootstrap_inner
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 930 in _bootstrap

Thread 0x00007f3d40350340 (most recent call first):
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 738 in <lambda>
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 602 in _apply
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 738 in cpu
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/core/mixins/device_dtype_mixin.py", line 147 in cpu
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/strategies/ddp.py", line 474 in teardown
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1298 in _teardown
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 736 in _call_and_handle_interrupt
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 768 in fit
  File "/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py", line 216 in train
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345 in wrapper
  File "/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py", line 224 in <module>
[E ProcessGroupNCCL.cpp:414] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Rank 5] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=749, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1801313 milliseconds before timing out.
Fatal Python error: Aborted

Thread 0x00007fa897fff700 (most recent call first):
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 312 in wait
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/multiprocessing/queues.py", line 233 in _feed
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 910 in run
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 973 in _bootstrap_inner
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 930 in _bootstrap

Thread 0x00007fa89effd700 (most recent call first):
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 312 in wait
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/multiprocessing/queues.py", line 233 in _feed
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 910 in run
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 973 in _bootstrap_inner
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 930 in _bootstrap

Thread 0x00007fa8d51ec700 (most recent call first):
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 316 in wait
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 574 in wait
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/tqdm/_monitor.py", line 60 in run
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 973 in _bootstrap_inner
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 930 in _bootstrap

Thread 0x00007fa9ca07d340 (most recent call first):
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 738 in <lambda>
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 602 in _apply
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 738 in cpu
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/core/mixins/device_dtype_mixin.py", line 147 in cpu
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/strategies/ddp.py", line 474 in teardown
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1298 in _teardown
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 736 in _call_and_handle_interrupt
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 768 in fit
  File "/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py", line 216 in train
  File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345 in wrapper
  File "/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py", line 224 in <module>
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 59889 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 59890 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 59891 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 59892 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 59893 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 59894 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 59895 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 7 (pid: 59896) of binary: /home/liuzhaofeng/anaconda3/bin/python

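Interestingly, Node0 exited the program after the error, whereas on Node1 the SSH session itself was dropped.

client_loop: send disconnect: Broken pipe

I looked up some material and added some settings to finetune_gpt2.py, but they had no effect.

The references I consulted: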
ultralytics/yolov5#7481
https://www.zhihu.com/question/512132168
https://discuss.pytorch.org/t/nccl-timed-out-when-using-the-torch-distributed-run/153276
https://stackoverflow.com/questions/69693950/error-some-nccl-operations-have-failed-or-timed-out
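One thing worth ruling out when a multi-node job hangs is basic reachability of the rendezvous address from the second node. The following is a minimal sketch, assuming it is run on Node1 while torchrun is already listening on Node0; the address and port are copied from DISTRIBUTED_ARGS in the script above, and `can_connect` is a made-up helper name. It only tests the rendezvous port, not the sockets NCCL opens on its own, so a successful result does not prove that the collectives can communicate. Setting NCCL's `NCCL_DEBUG=INFO` environment variable gives more detail on that side.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Values copied from DISTRIBUTED_ARGS in the script above.
    master_addr, master_port = "192.168.1.14", 9005
    reachable = can_connect(master_addr, master_port)
    print(f"{master_addr}:{master_port} reachable: {reachable}")
```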

The names of the various models are getting hard to keep up with

I hope you can share the parameters used for pre-training the Wenzhong GPT2 model.

Error when fine-tuning on the summarization task: No module named 'utils'

(summaryfengshen) [hyzhang10@083207 fengshen_pegasus]$ sh randeng_pegasus_523M_summary.sh
Building prefix dict from the default dictionary ...
Loading model from cache /data/hyzhang10/environment/jiebaCache/jieba.cache
Loading model cost 0.662 seconds.
Prefix dict has been built successfully.
Using custom data configuration default-d0a497e7c2f7c312
Reusing dataset json (/data/hyzhang10/.cache/huggingface/datasets/json/default-d0a497e7c2f7c312/0.0.0/a3e658c4731e59120d44081ac10bf85dc7e1388126b92338344ce9661907f253)
Using custom data configuration default-7e746462023f1ebc
Reusing dataset json (/data/hyzhang10/.cache/huggingface/datasets/json/default-7e746462023f1ebc/0.0.0/a3e658c4731e59120d44081ac10bf85dc7e1388126b92338344ce9661907f253)
Using custom data configuration default-33b0ce470df9653c
Reusing dataset json (/data/hyzhang10/.cache/huggingface/datasets/json/default-33b0ce470df9653c/0.0.0/a3e658c4731e59120d44081ac10bf85dc7e1388126b92338344ce9661907f253)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
initializing deepspeed distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[2022-08-04 14:51:48,391] [WARNING] [deepspeed.py:629:_auto_select_batch_size] Tried to infer the batch size for internal deepspeed logging from the train_dataloader(). To ensure DeepSpeed logging remains correct, please manually pass the plugin with the batch size, Trainer(strategy=DeepSpeedPlugin(logging_batch_size_per_gpu=batch_size)).
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
You have not specified an optimizer or scheduler within the DeepSpeed config. Using configure_optimizers to define optimizer and scheduler.
/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/pytorch_lightning/trainer/optimizers.py:37: UserWarning: LightningModule.configure_optimizers returned None, this fit will run with no optimizer
rank_zero_warn(
[2022-08-04 14:51:54,431] [WARNING] [engine.py:1126:_configure_optimizer] **** You are using ZeRO with an untested optimizer, proceed with caution *****
Using /data/hyzhang10/.cache/torch_extensions as PyTorch extensions root...
/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/torch/utils/cpp_extension.py:311: UserWarning:

                           !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (g++ 4.8.5) may be ABI-incompatible with PyTorch!
Please use a compiler that is ABI-compatible with GCC 5.0 and above.
See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.

See https://gist.github.com/goldsborough/d466f43e8ffc948ff92de7486c5216d6
for instructions on how to install GCC 5 or higher.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                          !! WARNING !!

warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler))
Emitting ninja build file /data/hyzhang10/.cache/torch_extensions/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
1.10.2.git.kitware.jobserver-1
Loading extension module utils...
Traceback (most recent call last):
File "finetune_pegasus_summary.py", line 330, in <module>
main()
File "finetune_pegasus_summary.py", line 320, in main
trainer.fit(model, data_model)
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 740, in fit
self._call_and_handle_interrupt(
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1188, in _run
self._pre_dispatch()
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1223, in _pre_dispatch
self.accelerator.pre_dispatch(self)
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 136, in pre_dispatch
self.training_type_plugin.pre_dispatch()
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/deepspeed.py", line 389, in pre_dispatch
self.init_deepspeed()
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/deepspeed.py", line 459, in init_deepspeed
self._initialize_deepspeed_train(model)
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/deepspeed.py", line 492, in _initialize_deepspeed_train
model, deepspeed_optimizer = self._setup_model_and_optimizer(model, optimizer, scheduler)
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/deepspeed.py", line 423, in _setup_model_and_optimizer
deepspeed_engine, deepspeed_optimizer, _, _ = deepspeed.initialize(
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/deepspeed/__init__.py", line 119, in initialize
engine = DeepSpeedEngine(args=args,
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 291, in __init__
self._configure_optimizer(optimizer, model_parameters)
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1129, in _configure_optimizer
self.optimizer = self._configure_zero_optimizer(basic_optimizer)
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1350, in _configure_zero_optimizer
optimizer = DeepSpeedZeroOptimizer(
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 141, in __init__
util_ops = UtilsBuilder().load()
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 403, in load
return self.jit_load(verbose)
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 435, in jit_load
op_module = load(
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1079, in load
return _jit_compile(
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1317, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1699, in _import_module_from_library
file, path, description = imp.find_module(module_name, [path])
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/imp.py", line 296, in find_module
raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named 'utils'
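The log shows DeepSpeed JIT-building an extension named `utils` under the torch_extensions root, with g++ 4.8.5 flagged as possibly ABI-incompatible, immediately before the import fails. One possible cause, not confirmed in this thread, is that the build directory exists but contains no loadable shared library. The sketch below lists such directories. It assumes the layout seen in the log, where each extension is built in a direct subdirectory of the root, and `incomplete_extension_builds` is a made-up helper name.

```python
import os


def incomplete_extension_builds(root):
    """Return names of subdirectories of root that contain no .so file."""
    incomplete = []
    if not os.path.isdir(root):
        return incomplete
    for name in sorted(os.listdir(root)):
        build_dir = os.path.join(root, name)
        if not os.path.isdir(build_dir):
            continue
        has_library = any(f.endswith(".so") for f in os.listdir(build_dir))
        if not has_library:
            incomplete.append(name)
    return incomplete


if __name__ == "__main__":
    # The log reports ~/.cache/torch_extensions as the root;
    # TORCH_EXTENSIONS_DIR overrides it when set.
    root = os.environ.get(
        "TORCH_EXTENSIONS_DIR",
        os.path.expanduser("~/.cache/torch_extensions"),
    )
    for name in incomplete_extension_builds(root):
        print("no shared library found in:", os.path.join(root, name))
```

If `utils` shows up in the output, removing that directory and rebuilding with a newer compiler is a reasonable thing to try, though the thread does not establish that this resolves the error.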

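torch.distributed.launch: error in distributed training of longformer

Error when training the longformer model

When using this distributed training method, the system reports an error.
Launch command: python -m torch.distributed.launch --nproc_per_node $NUM_GPU --master_port $PORT_ID finetune.py

Error message: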
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the forward function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during training loop.2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple checkpoint functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.

With single-GPU training,
launching with python finetune.py works fine.

What might be the cause?

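DeBertaV2 experiments are not reproducible across repeated runs (loss differs)

I am using DeBertaV2 for a classification task, with the Erlangshen-DeBERTa-v2-97M-Chinese pre-trained Chinese weights.
Environment: cuda11.2, torch 1.8.1+cu111, python 3.7.7, transformers 4.21.1
Running the same code twice gives different results, even with the same environment and parameters and with a random seed set.
The log output is as follows: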
`(hy_py37_torch) [root@localhost ccf_fewshot_classification]# python train_patent_bert_kfold.py
/home/kedu/opt/anaconda3/envs/hy_py37_torch/lib/python3.7/site-packages/sklearn/utils/validation.py:37: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
LARGE_SPARSE_SUPPORTED = LooseVersion(scipy_version) >= '0.14.0'
2022-09-08 17:21:05,479 train_patent_bert_kfold.py [line:95] INFO submit_path------submit/submit_title_abstract_ernie_5fold_integrate_logit_2022-09-08_20.csv
2022-09-08 17:21:05,479 train_patent_bert_kfold.py [line:97] INFO Namespace(accumulation_steps=1, adversarial_type='PGD', batch_size=16, bert_type='deberta', data_type='title_abstract', device='0', duplicate=1, epochs=5, integrate_type='logit', is_adversarial=True, is_masklm=False, is_prompt=False, lr=2e-05, max_len=460, model_out='./output/patent/', pretrained='./pretrained_models/torch/Erlangshen-DeBERTa-v2-97M-Chinese', prompt_text='[SEP]专利类别[MASK]', random_seed=100, test_file='./data/testA.json', train_file='./data/train.json')
2022-09-08 17:21:05,479 train_patent_bert_kfold.py [line:98] INFO data_type--------title_abstract
2022-09-08 17:21:05,480 train_patent_bert_kfold.py [line:99] INFO patentBert---------./pretrained_models/torch/Erlangshen-DeBERTa-v2-97M-Chinese
2022-09-08 17:21:05,544 train_patent_bert_kfold.py [line:352] INFO test_datas: 20839
2022-09-08 17:21:05,546 train_patent_bert_kfold.py [line:357] INFO train_datas: 958
tokenization: 20839it [00:15, 1306.00it/s]
2022-09-08 17:21:21,505 train_patent_bert_kfold.py [line:125] INFO ================fold 0===============
2022-09-08 17:21:21,505 train_patent_bert_kfold.py [line:128] INFO save_path---------./output/patent/deberta_186M_title_abstract_2022-09-08_fold_0
Some weights of the model checkpoint at ./pretrained_models/torch/Erlangshen-DeBERTa-v2-97M-Chinese were not used when initializing PatentDeBertaV2: ['cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias']

This IS expected if you are initializing PatentDeBertaV2 from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing PatentDeBertaV2 from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of PatentDeBertaV2 were not initialized from the model checkpoint at ./pretrained_models/torch/Erlangshen-DeBERTa-v2-97M-Chinese and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
tokenization: 766it [00:00, 1287.14it/s]
tokenization: 192it [00:00, 1282.17it/s]
2022-09-08 17:21:25,928 train_patent_bert_kfold.py [line:156] INFO ***** Running training *****
2022-09-08 17:21:25,928 train_patent_bert_kfold.py [line:157] INFO Num examples = 48
2022-09-08 17:21:25,929 train_patent_bert_kfold.py [line:158] INFO Num Epochs = 5
2022-09-08 17:21:25,929 train_patent_bert_kfold.py [line:159] INFO Num batch_size = 16
[evaldation] 12/12 [==============================] 97.3ms/step step: 11.0000 2022-09-08 17:21:40,516 train_patent_bert_kfold.py [line:197] INFO save model
2022-09-08 17:21:42,196 train_patent_bert_kfold.py [line:204] INFO val_macro_f1:0.006359------best_macro_f1:0.006359, loss:2.119576
[evaldation] 12/12 [==============================] 96.7ms/step step: 11.0000 2022-09-08 17:21:56,371 train_patent_bert_kfold.py [line:197] INFO save model
2022-09-08 17:21:58,020 train_patent_bert_kfold.py [line:204] INFO val_macro_f1:0.011809------best_macro_f1:0.011809, loss:1.970104
[evaldation] 12/12 [==============================] 97.0ms/step step: 11.0000 2022-09-08 17:22:12,217 train_patent_bert_kfold.py [line:204] INFO val_macro_f1:0.010802------best_macro_f1:0.011809, loss:0.696715
[evaldation] 12/12 [==============================] 99.2ms/step step: 11.0000 2022-09-08 17:22:26,410 train_patent_bert_kfold.py [line:197] INFO save model
2022-09-08 17:22:28,177 train_patent_bert_kfold.py [line:204] INFO val_macro_f1:0.015645------best_macro_f1:0.015645, loss:0.691476
[evaldation] 12/12 [==============================] 97.7ms/step step: 11.0000 2022-09-08 17:22:42,333 train_patent_bert_kfold.py [line:204] INFO val_macro_f1:0.008696------best_macro_f1:0.015645, loss:0.281529
(hy_py37_torch) [root@localhost ccf_fewshot_classification]# python train_patent_bert_kfold.py
/home/kedu/opt/anaconda3/envs/hy_py37_torch/lib/python3.7/site-packages/sklearn/utils/validation.py:37: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
LARGE_SPARSE_SUPPORTED = LooseVersion(scipy_version) >= '0.14.0'
2022-09-08 17:23:30,084 train_patent_bert_kfold.py [line:95] INFO submit_path------submit/submit_title_abstract_ernie_5fold_integrate_logit_2022-09-08_20.csv
2022-09-08 17:23:30,084 train_patent_bert_kfold.py [line:97] INFO Namespace(accumulation_steps=1, adversarial_type='PGD', batch_size=16, bert_type='deberta', data_type='title_abstract', device='0', duplicate=1, epochs=5, integrate_type='logit', is_adversarial=True, is_masklm=False, is_prompt=False, lr=2e-05, max_len=460, model_out='./output/patent/', pretrained='./pretrained_models/torch/Erlangshen-DeBERTa-v2-97M-Chinese', prompt_text='[SEP]专利类别[MASK]', random_seed=100, test_file='./data/testA.json', train_file='./data/train.json')
2022-09-08 17:23:30,084 train_patent_bert_kfold.py [line:98] INFO data_type--------title_abstract
2022-09-08 17:23:30,084 train_patent_bert_kfold.py [line:99] INFO patentBert---------./pretrained_models/torch/Erlangshen-DeBERTa-v2-97M-Chinese
2022-09-08 17:23:30,149 train_patent_bert_kfold.py [line:352] INFO test_datas: 20839
2022-09-08 17:23:30,151 train_patent_bert_kfold.py [line:357] INFO train_datas: 958
tokenization: 20839it [00:16, 1264.25it/s]
2022-09-08 17:23:46,638 train_patent_bert_kfold.py [line:125] INFO ================fold 0===============
2022-09-08 17:23:46,638 train_patent_bert_kfold.py [line:128] INFO save_path---------./output/patent/deberta_186M_title_abstract_2022-09-08_fold_0
Some weights of the model checkpoint at ./pretrained_models/torch/Erlangshen-DeBERTa-v2-97M-Chinese were not used when initializing PatentDeBertaV2: ['cls.predictions.decoder.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
This IS expected if you are initializing PatentDeBertaV2 from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing PatentDeBertaV2 from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of PatentDeBertaV2 were not initialized from the model checkpoint at ./pretrained_models/torch/Erlangshen-DeBERTa-v2-97M-Chinese and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
tokenization: 766it [00:00, 1286.47it/s]
tokenization: 192it [00:00, 1277.11it/s]
2022-09-08 17:23:51,239 train_patent_bert_kfold.py [line:156] INFO ***** Running training *****
2022-09-08 17:23:51,239 train_patent_bert_kfold.py [line:157] INFO Num examples = 48
2022-09-08 17:23:51,239 train_patent_bert_kfold.py [line:158] INFO Num Epochs = 5
2022-09-08 17:23:51,239 train_patent_bert_kfold.py [line:159] INFO Num batch_size = 16
[evaldation] 12/12 [==============================] 96.2ms/step step: 11.0000 2022-09-08 17:24:05,923 train_patent_bert_kfold.py [line:197] INFO save model
2022-09-08 17:24:07,702 train_patent_bert_kfold.py [line:204] INFO val_macro_f1:0.007106------best_macro_f1:0.007106, loss:2.129214
[evaldation] 12/12 [==============================] 97.4ms/step step: 11.0000 2022-09-08 17:24:21,986 train_patent_bert_kfold.py [line:197] INFO save model
2022-09-08 17:24:23,790 train_patent_bert_kfold.py [line:204] INFO val_macro_f1:0.011858------best_macro_f1:0.011858, loss:1.971897
[evaldation] 12/12 [==============================] 96.5ms/step step: 11.0000 2022-09-08 17:24:38,055 train_patent_bert_kfold.py [line:204] INFO val_macro_f1:0.010802------best_macro_f1:0.011858, loss:0.720681
[evaldation] 12/12 [==============================] 99.7ms/step step: 11.0000 2022-09-08 17:24:52,246 train_patent_bert_kfold.py [line:197] INFO save model
2022-09-08 17:24:53,892 train_patent_bert_kfold.py [line:204] INFO val_macro_f1:0.013702------best_macro_f1:0.013702, loss:0.709787
[evaldation] 12/12 [==============================] 98.7ms/step step: 11.0000 2022-09-08 17:25:08,261 train_patent_bert_kfold.py [line:204] INFO val_macro_f1:0.007966------best_macro_f1:0.013702, loss:0.290335
`
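As the logs show, the loss differs between runs. Further experiments show that the difference depends on the length of the sentences fed to the model: with sen_length < 100 there is no difference, while at lengths above 300 to 400 the difference is clearly visible.
fengshenbang_issue.zip

The code is in the attachment.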

The zhouwenwang model can't finetune with FP16

The error is as follows:

  File "roformer/modeling_roformer.py", line 333, in forward
    attention_scores = attention_scores.masked_fill(attention_mask,
RuntimeError: value cannot be converted to type at::Half without overflow: -1e+08

What should I do to solve this problem?
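The error message says that the fill value -1e+08 does not fit in half precision, whose largest finite magnitude is 65504. The sketch below only illustrates that range check with the standard library; `fits_in_fp16` and `fp16_safe_fill` are hypothetical helper names, not part of roformer or this repository. In PyTorch the analogous bound is `torch.finfo(torch.float16).min`; whether using it as the mask value is the fix the maintainers would choose is not stated in the issue.

```python
import struct

# Largest finite magnitude representable in IEEE 754 half precision (FP16).
FP16_MAX = 65504.0


def fits_in_fp16(value: float) -> bool:
    """Return True if `value` can be packed as a finite FP16 number."""
    try:
        struct.pack("<e", value)  # "e" is the half-precision format code
        return True
    except OverflowError:
        return False


def fp16_safe_fill(value: float) -> float:
    """Clamp a mask fill value into the finite FP16 range (hypothetical helper)."""
    return max(-FP16_MAX, min(FP16_MAX, value))


if __name__ == "__main__":
    print("-1e8 fits in FP16:", fits_in_fp16(-1e8))
    print("clamped fill value:", fp16_safe_fill(-1e8))
```

A clamped value is still very negative relative to typical attention scores, so masked positions should still receive a softmax weight close to zero.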

What data format is required to fine-tune Pegasus? Thanks!

Fengshenbang-LM/fengshen/examples/summary/randeng_t5_70M_summary.sh

Line 91 in 1415779

DATA_DIR=/cognitive_comp/ganruyi/data_datasets_LCSTS_LCSTS/

clip_finetune_flickr for multi gpus

Can the clip_finetune_flickr code only run on a single GPU? What should I do to run it on multiple GPUs?

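Question about inference output

Hello, I have recently been using your open-source model for a few-shot classification task. In the inference results, two entity_type entries are sometimes both predicted as 1. I would like to use the score to choose between them, but not every entry has a score. Is there a parameter that forces a score to be output for every entry?

Example:
{'entity_type': '0', 'label': 0, 'entity_list': []}
{'entity_type': '1', 'label': 1, 'entity_list': [], 'score':0.1123412}
{'entity_type': '2', 'label': 1, 'entity_list': []}
{'entity_type': '3', 'label': 0, 'entity_list': []}

As shown above, the model predicts 1 for both entries 1 and 2. I would like to pick the final result by score, but the dictionary with entity_type = 2 has no score. How do you handle this case?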
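Until a maintainer confirms whether such a parameter exists, one client-side workaround is to post-process the result list. This is a minimal sketch; `pick_by_score` is a hypothetical helper, and treating a missing score as the lowest possible value is an assumption that may not match what the model intends.

```python
from typing import Optional


def pick_by_score(results: list, missing_score: float = float("-inf")) -> Optional[dict]:
    """Return the positive entry with the highest score.

    Entries without a 'score' key are ranked with `missing_score`
    (an assumption, not documented model behaviour).
    """
    positives = [r for r in results if r.get("label") == 1]
    if not positives:
        return None
    return max(positives, key=lambda r: r.get("score", missing_score))


if __name__ == "__main__":
    results = [
        {"entity_type": "0", "label": 0, "entity_list": []},
        {"entity_type": "1", "label": 1, "entity_list": [], "score": 0.1123412},
        {"entity_type": "2", "label": 1, "entity_list": []},
        {"entity_type": "3", "label": 0, "entity_list": []},
    ]
    print(pick_by_score(results))
```

Passing a different `missing_score` changes how entries without a score are ranked, so the choice of default decides the outcome in exactly the ambiguous case described above.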

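Does transformers-4.9.2 already include a RoFormer implementation?

Thanks for all the open-source work. I have a question: transformers-4.9.2 already seems to include an implementation of the RoFormer architecture. Does it differ from the implementation in this project?

Details of Erlangshen pretraining

Are the experimental details of Erlangshen pretraining available? Thanks!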

Question about Erlangshen 3.9B

Hi, thanks for the great work. Does the Erlangshen-3.9B model use only data parallelism? I did not find any model parallelism in the Erlangshen code for the 3.9B model. Can you provide the pretraining details for the large Erlangshen model? Why has the script pretrain_erlangshen_3.9B.sh been removed from master?

is there any example for relation extraction (ubert)?

import argparse
from fengshen import UbertPiplines

total_parser = argparse.ArgumentParser("TASK NAME")
total_parser = UbertPiplines.piplines_args(total_parser)
args = total_parser.parse_args()
args.pretrained_model_path = 'IDEA-CCNL/Erlangshen-Ubert-110M-Chinese'  # path to the pretrained model
test_data=[
    {
        "task_type": "抽取任务", 
        "subtask_type": "关系抽取", 
        "text": "姚明妻子叶莉罕见现身!39岁气质出众端庄,姚明却发福严重", 
        "choices": [ 
            {"entity_type": "夫妻关系"}
            ],
        "id": 0}
]

model = UbertPiplines(args)
result = model.predict(test_data)
for line in result:
    print(line)

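How to fine-tune Wenzhong (Wenzhong-GPT2-3.5B)

Hello everyone,

The Fengshenbang documentation only covers downloading the Wenzhong model and a generation example; I could not find an example of fine-tuning on a custom dataset. My dataset consists of single-turn dialogues, i.e. {user question, AI answer} pairs, and I don't know how to fine-tune the Wenzhong model on it. Could you provide a fine-tuning example in torch?

Thanks

Are the pretrained Taiyi multimodal models only available in English (IDEA-CCNL/Taiyi-Roberta-124M-D)? Is there a Chinese version?

Hello! I have a small question about pl (PyTorch Lightning)

I am using your framework and noticed it is all written in pl, so I tried it myself. I don't have slurm, and although I tried many approaches, every one hangs at startup and training never launches. Did you run into the same problem, and is that why you use slurm?
I launched this minimal version on two machines with two GPUs each.
The method from the following link hangs at startup:
https://www.pudn.com/news/6313752788df2007aa1b6f42.html
I also tried the official pl tutorial, and it hangs as well when slurm is not used.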

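import os

import pytorch_lightning as pl
from torch import nn, optim, utils
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor

# define any number of nn.Modules (or use your current ones)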
encoder = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 3))
decoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 28 * 28))

# define the LightningModule
class LitAutoEncoder(pl.LightningModule):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def training_step(self, batch, batch_idx):
        # training_step defines the train loop.
        # it is independent of forward
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = nn.functional.mse_loss(x_hat, x)
        # Logging to TensorBoard by default
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        optimizer = optim.Adam(self.parameters(), lr=1e-3)
        return optimizer


# init the autoencoder
autoencoder = LitAutoEncoder(encoder, decoder)

# setup data
dataset = MNIST(os.getcwd(), download=True, transform=ToTensor())
train_loader = utils.data.DataLoader(dataset)

# train the model (hint: here are some helpful Trainer arguments for rapid idea iteration)
trainer = pl.Trainer(limit_train_batches=500, accelerator='gpu', devices=2, max_epochs=5, strategy='ddp', num_nodes=2)
trainer.fit(model=autoencoder, train_dataloaders=train_loader)
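For launches without slurm, the PyTorch Lightning documentation describes starting the same script once on every node with MASTER_ADDR, MASTER_PORT, NODE_RANK and WORLD_SIZE set in the environment. The sketch below only builds those variables for the two-machine, two-GPU setup described above. The address, the port and the `train.py` script name are placeholder assumptions, and it does not diagnose why the run in this issue hangs.

```python
def node_env(master_addr: str, master_port: int, node_rank: int,
             num_nodes: int, gpus_per_node: int) -> dict:
    """Build the environment variables for one node of a manual multi-node launch."""
    if not 0 <= node_rank < num_nodes:
        raise ValueError("node_rank must be in [0, num_nodes)")
    return {
        "MASTER_ADDR": master_addr,        # address of the node with rank 0
        "MASTER_PORT": str(master_port),   # a free port on the rank-0 node
        "NODE_RANK": str(node_rank),       # unique id of this node
        "WORLD_SIZE": str(num_nodes * gpus_per_node),  # total number of processes
    }


if __name__ == "__main__":
    # Print one launch command per node; 10.0.0.1 and 29500 are placeholders.
    for rank in range(2):
        env = node_env("10.0.0.1", 29500, rank, num_nodes=2, gpus_per_node=2)
        prefix = " ".join(f"{key}={value}" for key, value in env.items())
        print(prefix, "python train.py")
```

Each printed command would be run on its own machine, and all nodes need to be able to reach MASTER_ADDR on MASTER_PORT.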

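Erlangshen 1.3B naming conventions are incompatible with Hugging Face, so some weights cannot be loaded

For example: 'bert.encoder.layer.23.attention.ln.bias', 'bert.encoder.layer.15.attention.ln.bias'
In Hugging Face, ln should be LayerNorm

How to fine-tune Wenzhong

Is there a way to fine-tune Wenzhong on a machine with 5 * 3090 GPUs?

Error when trying to fine-tune Wenzhong-GPT2-3.5B, please take a look

I tried to fine-tune Wenzhong-GPT2-3.5B and got an error. The full error output is as follows: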

Using pad_token, but it is not set yet.
训练集处理进度: 100%|████████████████████████████████████████████████████████████████| 3774619/3774619 [00:41<00:00, 90473.26it/s]
Using pad_token, but it is not set yet.
验证集处理进度: 100%|████████████████████████████████████████████████████████████████████| 19220/19220 [00:00<00:00, 60371.62it/s]
Using pad_token, but it is not set yet.
测试集处理进度: 100%|██████████████████████████████████████████████████████████████████████| 2409/2409 [00:00<00:00, 67752.58it/s]
num_data: 3774619
/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:446: LightningDeprecationWarning: Setting Trainer(gpus=4) is deprecated in v1.7 and will be removed in v2.0. Please use Trainer(accelerator='gpu', devices=4) instead.
f"Setting Trainer(gpus={gpus!r}) is deprecated in v1.7 and will be removed"
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/opt/conda/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 4 leaked semaphores to clean up at shutdown
len(cache))
Bus error (core dumped)

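A question about the consumed_samples parameter in batch_sampler

While studying the official Erlangshen 1.3B pretraining script pretrain_erlangshen_base.sh, I noticed that replace_sampler_ddp is set to False, so train_dataloader uses a custom batch_sampler, which I can see is created by the get_custom_sampler function. What is the meaning of the consumed_samples parameter there, and how is it computed? Does it refer to the number of samples that have already been trained on? I see that its value is printed as 0 when training starts, and I am not sure whether that is a problem. I could not fully work it out from the code, so I would appreciate your help.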
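In Megatron-LM-style samplers, consumed_samples usually counts the samples already used across all data-parallel ranks, so that a resumed run can skip them; under that reading a value of 0 at the start of a fresh run is expected. The sketch below illustrates the idea only. Both function names are hypothetical, and whether get_custom_sampler computes the value in exactly this way is an assumption.

```python
def consumed_samples(global_step: int, micro_batch_size: int,
                     world_size: int, accumulate_grad_batches: int = 1) -> int:
    """Samples seen so far across all ranks (assumed Megatron-style formula)."""
    return global_step * micro_batch_size * world_size * accumulate_grad_batches


def batches_for_rank(total_samples: int, consumed: int,
                     micro_batch_size: int, world_size: int, rank: int):
    """Yield index batches for one rank, skipping the first `consumed` samples."""
    global_batch = micro_batch_size * world_size
    batch = []
    for idx in range(consumed, total_samples):
        batch.append(idx)
        if len(batch) == global_batch:
            start = rank * micro_batch_size
            yield batch[start:start + micro_batch_size]
            batch = []  # an incomplete final batch is dropped


if __name__ == "__main__":
    print(consumed_samples(global_step=0, micro_batch_size=16, world_size=8))
    print(list(batches_for_rank(16, consumed=8, micro_batch_size=2, world_size=2, rank=1)))
```

When training resumes from a checkpoint, passing the recomputed count as `consumed` makes every rank continue from the first unseen sample instead of restarting the epoch.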

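The Randeng T5 generation model does not work

Thanks for providing the Chinese large models.

Using T5ForConditionalGeneration from the fengshen directory with the Randeng T5 generation model, the example in the README does not generate properly.

Input: 北京是**的<extra_id_0>
Generated result: '[PAD] <extra_id_0> [eos] <extra_id_0> 。 [eos] [eos] [eos] [eos] [eos] [eos] [eos] [eos] [eos] [eos] [eos] [eos] [eos] [eos] [eos]'

Also, the example code seems to have a problem: a batch has to be passed in for generation to work.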

The recall of coco-cn test data is not the same

I have downloaded the checkpoint Taiyi-CLIP-Roberta-102M-Chinese, but the recall on the coco-cn test data is not the same as described. Can you provide the evaluation code? Thanks.

unexpected keyword argument 'tokenizer_class'

Traceback (most recent call last):
  File "finetune.py", line 20, in <module>
    from model.roformer.modeling_roformer import RoFormerModel, RoFormerForMaskedLM, RoFormerForSequenceClassification
  File "model/roformer/modeling_roformer.py", line 894, in <module>
    class RoFormerModel(RoFormerPreTrainedModel):
  File "model/roformer/modeling_roformer.py", line 934, in RoFormerModel
    @add_code_sample_docstrings(
TypeError: add_code_sample_docstrings() got an unexpected keyword argument 'tokenizer_class'

huggingface/transformers@f5af873
replace tokenizer_class with processor_class
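If the same model file has to work with transformers versions from both before and after that rename, one option is a small shim that picks whichever keyword the installed function accepts. This is a generic sketch: `call_with_renamed_kwarg` is a hypothetical helper, and the two stand-in functions only mimic the old and new keyword names rather than the real add_code_sample_docstrings.

```python
import inspect


def call_with_renamed_kwarg(func, old_name: str, new_name: str, **kwargs):
    """Call `func`, renaming `old_name` to `new_name` if only the new one is accepted."""
    params = inspect.signature(func).parameters
    if old_name in kwargs and old_name not in params and new_name in params:
        kwargs[new_name] = kwargs.pop(old_name)
    return func(**kwargs)


# Stand-ins for a decorator factory before and after the keyword rename.
def old_api(tokenizer_class=None, checkpoint=None):
    return ("old", tokenizer_class, checkpoint)


def new_api(processor_class=None, checkpoint=None):
    return ("new", processor_class, checkpoint)


if __name__ == "__main__":
    for api in (old_api, new_api):
        print(call_with_renamed_kwarg(api, "tokenizer_class", "processor_class",
                                      tokenizer_class="RoFormerTokenizer",
                                      checkpoint="some-checkpoint"))
```

Simply editing the keyword in modeling_roformer.py, as the comment above suggests, is the smaller change when only one transformers version needs to be supported.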

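Please open-source the continue-training code

I hope the DeBERTa continue-training code can be open-sourced soon. Thanks.

Was the wrong model uploaded for Randeng-Pegasus-523M-Summary-Chinese in the Randeng series? Actual results do not match the documentation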

`
from transformers import PegasusForConditionalGeneration
from tokenizers_pegasus import PegasusTokenizer

model = PegasusForConditionalGeneration.from_pretrained("IDEA-CCNL/Randeng-Pegasus-523M-Summary-Chinese")
tokenizer = PegasusTokenizer.from_pretrained("IDEA-CCNL/Randeng-Pegasus-523M-Summary-Chinese")

text = "据微信公众号“界面”报道，4日上午10点左右，**发改委反垄断调查小组突击查访奔驰上海办事处，调取数据材料，并对多名奔驰高管进行了约谈。截止昨日晚9点，包括北京梅赛德斯-奔驰销售服务有限公司东区总经理在内的多名管理人员仍留在上海办公室内"
inputs = tokenizer(text, max_length=1024, return_tensors="pt")

# Generate Summary

summary_ids = model.generate(inputs["input_ids"])
tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

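Advertised output: 反垄断调查小组突击查访奔驰上海办事处，对多名奔驰高管进行约谈

Actual output: 老鼠老鼠老鼠老鼠老鼠老鼠老鼠

The results are far worse than the base version, and even worse than the fine-tuned version.
Was the wrong model uploaded?

Differences between Randeng megatron_t5 and Hugging Face T5

According to the comments in the megatron_t5 code, the differences are the layer norm and a few bias=True settings.
Is my understanding correct?

Provide a Fengshenbang Docker image

Some recent trial users have reported environment problems. It would help to provide a Docker image with the environment needed to run Fengshenbang, together with examples, to make it easier to use.

The Erlangshen model gives similarities close to 0.5 for different documents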

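The model used is:

IDEA-CCNL/Erlangshen-MegatronBert-1.3B

The text is converted to a vector with the following steps:

Encode the text to obtain an array of vectors.
Mean-pool the array of vectors into a single vector.
Apply the steps above to both text A and text B, then compute the cosine of the two results.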

from sentence_transformers import util
print(util.cos_sim(encode(["今天天气真好"]), encode(["天天向上"])))

Output:

tensor([[0.5146]], device='cuda:0')
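The pooling and cosine steps described above can be sketched in plain Python. The tiny vectors are made-up illustration data, not real model outputs, and excluding padding positions through an attention mask is an assumption about how the poster's encode function should behave.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is non-zero."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Two toy "sentences"; the last position of each is padding.
    sent_a = mean_pool([[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]], [1, 1, 0])
    sent_b = mean_pool([[1.0, 0.0], [1.0, 0.0], [9.0, 9.0]], [1, 1, 0])
    print(cosine(sent_a, sent_b))
```

If padding positions are averaged in, the shared padding vector dominates both sentences and pulls their similarity together, so it is worth checking that before comparing scores across texts.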

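Could you provide fine-tuning or pretraining code for the Taiyi models?

BUS error

Hello, how much memory does fine-tuning need at a minimum? I tried to run finetune_classification.sh, but it keeps failing with the following error:
fengshen/examples/classification/finetune_classification.sh: line 74: 28016 Bus error (core dumped) python3 $SCRIPT_PATH $options. Some people online say this is caused by insufficient memory, but my disk still has 670G free.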
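Free disk space and memory are different resources, so 670G of free disk says nothing about available RAM. One commonly reported cause of a Bus error during PyTorch data loading is exhausted shared memory, although the issue does not confirm that this is the cause here. The sketch below is a Linux-only diagnostic that reports the three figures separately; `memory_report` is a hypothetical helper and the default paths are assumptions.

```python
import os
import shutil

GIB = 1024 ** 3


def memory_report(shm_path: str = "/dev/shm", disk_path: str = "/") -> dict:
    """Report total RAM, free disk space and, if present, free shared memory in GiB."""
    report = {
        # Physical RAM from the page size and page count (Linux sysconf names).
        "ram_total_gib": os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GIB,
        "disk_free_gib": shutil.disk_usage(disk_path).free / GIB,
    }
    if os.path.isdir(shm_path):
        # Shared memory is typically what DataLoader workers use to pass tensors.
        report["shm_free_gib"] = shutil.disk_usage(shm_path).free / GIB
    return report


if __name__ == "__main__":
    for name, value in memory_report().items():
        print(f"{name}: {value:.1f}")
```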

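Generated text is garbled after fine-tuning Wenzhong2.0-GPT2-3.5B

Hello, after fine-tuning Wenzhong2.0-GPT2-3.5B on a downstream task, the predictions are garbled like this. Is there a way to fix it? Thanks 🙏

Code used for generation:
from transformers import GPT2LMHeadModel, GPT2Tokenizer, pipeline

tokenizer = GPT2Tokenizer.from_pretrained(model_path)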
model = GPT2LMHeadModel.from_pretrained(model_path)
generator = pipeline('text-generation', model=model, tokenizer=tokenizer)
print(generator(context, max_length=100, num_return_sequences=1))

Result generated from context:
grinning pres ausp grinning grinning pres spectator Romo Wad Wad grinning 283 intellectual restricts Spartans mogul [jiang Walter'}

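[Reproduction issue] Base RoBERTa model + vit-16b + wukong dataset falls short of the reported coco-cn results

Hi, authors,

I saw the project introduction and the published leaderboard on zhihu and found them very interesting.

I have been trying to reproduce this work over the past few days, but with the base model my coco-cn evaluation results are still far from the published numbers. Will the training details be released later?

Here are my training details: I use moco + contrastive learning, the adam optimizer, an initial learning rate of e-4, learning-rate warm_up + polydecay, and multi-node training on 4 * 8 a100 with bs256, for about 800k steps. So far coco-cn only reaches 80+.

Thanks for open-sourcing such excellent work! Have you considered publishing detailed comparison experiments between the Fengshen series models and existing models? That would be a valuable reference.

How to fine-tune the IDEA-CCNL/Taiyi-CLIP-Roberta-large-326M-Chinese model?

Hello, how can I fine-tune IDEA-CCNL/Taiyi-CLIP-Roberta-large-326M-Chinese on my own dataset? Thanks!

Question about longformer summarization output

Hello, I saw that you provide a way to load the longformer model. Could you also provide code for using longformer on a summarization task?

Fine-tuning Wenzhong2.0-GPT2-3.5B-chinese runs out of GPU memory; offload_param does not seem to take effect

I am fine-tuning the recently released IDEA-CCNL/Wenzhong2.0-GPT2-3.5B-chinese model.

The script is based on wenzhong_qa, adjusted for our business scenario.

Because of hardware limits, we want to train with DeepSpeed ZeRO-3, but the model parameters do not seem to be partitioned.

The machine configuration is as follows, 8 x 1080Ti: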

Every 1.0s: nvidia-smi                         office4: Tue Aug  9 18:43:35 2022

Tue Aug  9 18:43:35 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:04:00.0 Off |                  N/A |
| 23%   29C    P8     8W / 250W |      8MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:05:00.0 Off |                  N/A |
| 23%   26C    P8     9W / 250W |      8MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:08:00.0 Off |                  N/A |
| 23%   25C    P8     8W / 250W |      8MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  Off  | 00000000:09:00.0 Off |                  N/A |
| 23%   25C    P8     9W / 250W |      8MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA GeForce ...  Off  | 00000000:84:00.0 Off |                  N/A |
| 23%   28C    P8    10W / 250W |      8MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA GeForce ...  Off  | 00000000:85:00.0 Off |                  N/A |
| 23%   26C    P8     8W / 250W |      8MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA GeForce ...  Off  | 00000000:88:00.0 Off |                  N/A |
| 23%   26C    P8     9W / 250W |      8MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA GeForce ...  Off  | 00000000:89:00.0 Off |                  N/A |
| 23%   24C    P8     8W / 250W |      8MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2464      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A      2464      G   /usr/lib/xorg/Xorg                  4MiB |
|    2   N/A  N/A      2464      G   /usr/lib/xorg/Xorg                  4MiB |
|    3   N/A  N/A      2464      G   /usr/lib/xorg/Xorg                  4MiB |
|    4   N/A  N/A      2464      G   /usr/lib/xorg/Xorg                  4MiB |
|    5   N/A  N/A      2464      G   /usr/lib/xorg/Xorg                  4MiB |
|    6   N/A  N/A      2464      G   /usr/lib/xorg/Xorg                  4MiB |
|    7   N/A  N/A      2464      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
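A back-of-the-envelope estimate shows why partitioning matters on these 11178MiB cards. The sketch uses the usual accounting for mixed-precision Adam (2 bytes of fp16 weights, 2 bytes of fp16 gradients and 12 bytes of fp32 optimizer state per parameter) and takes the 3.5B parameter count from the model name. It ignores activations, temporary buffers and any CPU offload, so it is a rough bound rather than a measurement; `model_state_gib` is a hypothetical helper.

```python
GIB = 1024 ** 3


def model_state_gib(num_params: float, num_gpus: int = 1, zero_stage: int = 0) -> float:
    """Per-GPU memory for weights, gradients and Adam states, in GiB."""
    fp16_params = 2 * num_params
    fp16_grads = 2 * num_params
    optimizer_states = 12 * num_params  # fp32 weights, momentum and variance
    if zero_stage >= 1:
        optimizer_states /= num_gpus    # stage 1 partitions optimizer states
    if zero_stage >= 2:
        fp16_grads /= num_gpus          # stage 2 also partitions gradients
    if zero_stage >= 3:
        fp16_params /= num_gpus         # stage 3 also partitions the weights
    return (fp16_params + fp16_grads + optimizer_states) / GIB


if __name__ == "__main__":
    for stage in (0, 1, 2, 3):
        print(f"ZeRO stage {stage}: {model_state_gib(3.5e9, num_gpus=8, zero_stage=stage):.1f} GiB per GPU")
```

Under these assumptions only stage 3 brings the model states below the capacity of a single card, which is consistent with out-of-memory errors appearing whenever the parameters are not actually partitioned.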

Fine-tuning shell script:

#!/bin/bash

set -x -e

echo "START TIME: $(date)"
MICRO_BATCH_SIZE=1
ROOT_DIR=$(pwd)

ZERO_STAGE=3

config_json="$ROOT_DIR/training_config.json"
export MASTER_PORT=$[RANDOM%10000+30000]

# Deepspeed figures out GAS dynamically from dynamic GBS via set_train_batch_size()
cat <<EOT > $config_json
{
  "train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE,
  "steps_per_print": 1000,
  "gradient_clipping": 1,
  "zero_optimization": {
    "stage": ${ZERO_STAGE},
    "allgather_partitions": true,
    "allgather_bucket_size": 1e7,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 1e7,
    "contiguous_gradients": true,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "offload_param": {
      "device": "cpu",
      "pin_memory": true
    },
    "stage3_max_live_parameters" : 1e7,
    "stage3_max_reuse_distance" : 1e7,
    "stage3_prefetch_bucket_size": 1e7,
    "stage3_param_persistence_threshold": 1e7,
    "sub_group_size" : 1e7,
    "round_robin_gradients": true
  },
  "fp16": {
    "enabled": true,
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "optimizer": {
      "type": "AdamW",
      "params": {
      "lr": 0.001,
      "betas": [
          0.8,
          0.999
      ],
      "eps": 1e-8,
      "weight_decay": 3e-7
      }
  }
}
EOT

export PL_DEEPSPEED_CONFIG_PATH=$config_json
TRAINER_ARGS="
    --max_epochs 10 \
    --gpus 8 \
    --num_nodes 1 \
    --strategy deepspeed_stage_${ZERO_STAGE}_offload \
    --default_root_dir $ROOT_DIR \
    --dirpath $ROOT_DIR/ckpt \
    --save_top_k 3 \
    --monitor train_loss \
    --mode min \
    --save_last \
"
DATA_DIR=/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/datasets
DATA_ARGS="
    --data_dir $DATA_DIR \
    --train_batchsize $MICRO_BATCH_SIZE \
    --valid_batchsize $MICRO_BATCH_SIZE \
    --train_data train.txt \
    --valid_data valid.txt \
    --test_data  test.txt
"

PRETRAINED_MODEL_PATH="IDEA-CCNL/Wenzhong2.0-GPT2-3.5B-chinese"
MODEL_ARGS="
    --pretrained_model_path ${PRETRAINED_MODEL_PATH} \
    --output_save_path $ROOT_DIR/predict.json \
    --learning_rate 1e-4 \
    --weight_decay 0.1 \
    --warmup 0.01 \
"

SCRIPTS_PATH=${ROOT_DIR}/finetune_gpt2.py

export CMD=" \
    $SCRIPTS_PATH \
    $TRAINER_ARGS \
    $MODEL_ARGS \
    $DATA_ARGS \
    "

python ${CMD}

Fine-tuning Python script:

import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4,5,6,7"

import argparse

import torch as th
import pytorch_lightning as pl

from transformers import GPT2LMHeadModel
from pytorch_lightning import Trainer, loggers
from pytorch_lightning.callbacks import ModelCheckpoint
from transformers.optimization import get_linear_schedule_with_warmup

from dataset import GPT2DataModel


class GPT2FinetuneMedicalQAModelCheckpoint:
	@staticmethod
	def add_argparse_args(parent_args):
		parser = parent_args.add_argument_group('BaseModel')

		parser.add_argument('--monitor', default='train_loss', type=str)
		parser.add_argument('--mode', default='min', type=str)
		parser.add_argument('--dirpath', default='./ckpt/', type=str)
		parser.add_argument('--filename', default='model-{epoch:02d}-{train_loss:.4f}', type=str)
		parser.add_argument('--save_last', action='store_true', default=True)
		parser.add_argument('--save_top_k', default=3, type=int)
		parser.add_argument('--every_n_train_steps', default=1000, type=int)
		parser.add_argument('--save_weights_only', default=True, type=bool)

		return parent_args

	def __init__(self, args):
		self.callbacks = ModelCheckpoint(monitor=args.monitor, save_top_k=args.save_top_k, mode=args.mode,
		                                 save_weights_only=args.save_weights_only, dirpath=args.dirpath,
		                                 filename=args.filename, save_last=args.save_last)


class GPT2Finetune(pl.LightningModule):

	@staticmethod
	def add_model_specific_args(parent_args):
		parser = parent_args.add_argument_group("BaseModel")
		parser.add_argument("--learning_rate", default=1e-4, type=float)
		parser.add_argument("--weight_decay", default=0.1, type=float)
		parser.add_argument("--warmup", default=0.01, type=float)
		return parent_args

	def __init__(self, args, num_data):
		super().__init__()
		self.args = args
		self.num_data = num_data
		print('num_data:', num_data)
		self.model = GPT2LMHeadModel.from_pretrained(args.pretrained_model_path)

	def setup(self, stage) -> None:
		if stage == 'fit':
			num_gpus = self.trainer.gpus if self.trainer.gpus is not None else 0
			self.total_step = int(self.trainer.max_epochs * self.num_data /
			                      (max(1, num_gpus) * self.trainer.accumulate_grad_batches))
			print('Total training step:', self.total_step)

	def training_step(self, batch, batch_idx):
		output = self.model(input_ids=batch['input_ids'], attention_mask=batch['attention_mask'],
		                    labels=batch['labels'])
		# output = self.model(input_ids=batch['input_ids'], labels=batch['labels'])
		# acc = self.comput_metrix(output.logits, batch['labels'])
		self.log('train_loss', output.loss)
		return output.loss

	def comput_metrix(self, logits, labels):
		y_pred = th.argmax(logits, dim=-1)
		y_pred = y_pred.view(size=(-1,))
		y_true = labels.view(size=(-1,)).float()
		corr = th.eq(y_pred, y_true)
		acc = th.sum(corr.float()) / labels.size()[0]
		return acc

	def validation_step(self, batch, batch_idx):
		output = self.model(input_ids=batch['input_ids'], attention_mask=batch['attention_mask'],
		                    labels=batch['labels'])
		self.log('val_loss', output.loss)

	def configure_optimizers(self):
		no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']
		paras = list(filter(lambda p: p[1].requires_grad, self.named_parameters()))
		paras = [{
			'params':
				[p for n, p in paras if not any(nd in n for nd in no_decay)],
			'weight_decay': self.args.weight_decay
		}, {
			'params': [p for n, p in paras if any(nd in n for nd in no_decay)],
			'weight_decay': 0.0
		}]
		optimizer = th.optim.AdamW(paras, lr=self.args.learning_rate)
		scheduler = get_linear_schedule_with_warmup(
			optimizer, int(self.total_step * self.args.warmup),
			self.total_step)

		return [{
			'optimizer': optimizer,
			'lr_scheduler': {
				'scheduler': scheduler,
				'interval': 'step',
				'frequency': 1
			}
		}]


def main():
	total_parser = argparse.ArgumentParser("Summary Task")
	total_parser.add_argument('--do_eval_only', action='store_true', default=False)
	total_parser.add_argument('--pretrained_model_path', default=None, type=str)
	total_parser.add_argument('--output_save_path', default='./predict.json', type=str)
	# * Args for data preprocessing
	total_parser = GPT2DataModel.add_data_specific_args(total_parser)
	# * Args for training
	total_parser = Trainer.add_argparse_args(total_parser)
	total_parser = GPT2FinetuneMedicalQAModelCheckpoint.add_argparse_args(total_parser)
	total_parser = GPT2Finetune.add_model_specific_args(total_parser)
	# * Args for base model
	args = total_parser.parse_args()

	data_model = GPT2DataModel(args)
	model = GPT2Finetune(args, len(data_model.train_dataloader()))
	checkpoint_callback = GPT2FinetuneMedicalQAModelCheckpoint(args).callbacks
	logger = loggers.TensorBoardLogger(save_dir=os.path.join(args.default_root_dir, 'log/'), name='MedicalQA-GPT2')
	trainer = Trainer.from_argparse_args(args, logger=logger, callbacks=[checkpoint_callback])
	trainer.fit(model, data_model)

	model.model.save_pretrained("./models/finetune/gpt2")


if __name__ == '__main__':
	main()

After launching the script, data loading works fine, but an error occurs once training starts. Log:

$ bash finetune_gpt2.sh 
++ date
+ echo 'START TIME: 2022年 08月 09日 星期二 18:49:15 CST'
START TIME: 2022年 08月 09日 星期二 18:49:15 CST
+ MICRO_BATCH_SIZE=1
++ pwd
+ ROOT_DIR=/home/liuzhaofeng/nlg_pipeline/gpt2/dialog
+ ZERO_STAGE=3
+ config_json=/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/training_config.json
+ export MASTER_PORT=30021
+ MASTER_PORT=30021
+ cat
+ export PL_DEEPSPEED_CONFIG_PATH=/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/training_config.json
+ PL_DEEPSPEED_CONFIG_PATH=/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/training_config.json
+ TRAINER_ARGS='
    --max_epochs 10     --gpus 8     --num_nodes 1     --strategy deepspeed_stage_3_offload     --default_root_dir /home/liuzhaofeng/nlg_pipeline/gpt2/dialog     --dirpath /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/ckpt     --save_top_k 3     --monitor train_loss     --mode min     --save_last '
+ DATA_DIR=/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/datasets
+ DATA_ARGS='
    --data_dir /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/datasets     --train_batchsize 1     --valid_batchsize 1     --train_data train.txt     --valid_data valid.txt     --test_data  test.txt
'
+ PRETRAINED_MODEL_PATH=IDEA-CCNL/Wenzhong2.0-GPT2-3.5B-chinese
+ MODEL_ARGS='
    --pretrained_model_path IDEA-CCNL/Wenzhong2.0-GPT2-3.5B-chinese     --output_save_path /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/predict.json     --learning_rate 1e-4     --weight_decay 0.1     --warmup 0.01 '
+ SCRIPTS_PATH=/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py
+ export 'CMD=     /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py     
    --max_epochs 10     --gpus 8     --num_nodes 1     --strategy deepspeed_stage_3_offload     --default_root_dir /home/liuzhaofeng/nlg_pipeline/gpt2/dialog     --dirpath /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/ckpt     --save_top_k 3     --monitor train_loss     --mode min     --save_last      
    --pretrained_model_path IDEA-CCNL/Wenzhong2.0-GPT2-3.5B-chinese     --output_save_path /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/predict.json     --learning_rate 1e-4     --weight_decay 0.1     --warmup 0.01      
    --data_dir /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/datasets     --train_batchsize 1     --valid_batchsize 1     --train_data train.txt     --valid_data valid.txt     --test_data  test.txt
     '
+ CMD='     /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py     
    --max_epochs 10     --gpus 8     --num_nodes 1     --strategy deepspeed_stage_3_offload     --default_root_dir /home/liuzhaofeng/nlg_pipeline/gpt2/dialog     --dirpath /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/ckpt     --save_top_k 3     --monitor train_loss     --mode min     --save_last      
    --pretrained_model_path IDEA-CCNL/Wenzhong2.0-GPT2-3.5B-chinese     --output_save_path /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/predict.json     --learning_rate 1e-4     --weight_decay 0.1     --warmup 0.01      
    --data_dir /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/datasets     --train_batchsize 1     --valid_batchsize 1     --train_data train.txt     --valid_data valid.txt     --test_data  test.txt
     '
+ python /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py --max_epochs 10 --gpus 8 --num_nodes 1 --strategy deepspeed_stage_3_offload --default_root_dir /home/liuzhaofeng/nlg_pipeline/gpt2/dialog --dirpath /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/ckpt --save_top_k 3 --monitor train_loss --mode min --save_last --pretrained_model_path IDEA-CCNL/Wenzhong2.0-GPT2-3.5B-chinese --output_save_path /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/predict.json --learning_rate 1e-4 --weight_decay 0.1 --warmup 0.01 --data_dir /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/datasets --train_batchsize 1 --valid_batchsize 1 --train_data train.txt --valid_data valid.txt --test_data test.txt
Using pad_token, but it is not set yet.
训练集处理进度: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 234801/234801 [00:00<00:00, 1141178.50it/s]
Using pad_token, but it is not set yet.
验证集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 744991.83it/s]
Using pad_token, but it is not set yet.
测试集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 801970.17it/s]
num_data: 234801
Loading DeepSpeed config from set PL_DEEPSPEED_CONFIG_PATH environment variable
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Using pad_token, but it is not set yet.
训练集处理进度: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 234801/234801 [00:00<00:00, 1466089.81it/s]
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
验证集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 725658.13it/s]
训练集处理进度: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 234801/234801 [00:00<00:00, 1313860.90it/s]
Using pad_token, but it is not set yet.
训练集处理进度: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 234801/234801 [00:00<00:00, 1346486.25it/s]
Using pad_token, but it is not set yet.
测试集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 735842.81it/s]
num_data: 234801
Using pad_token, but it is not set yet.
验证集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 719434.65it/s]
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
验证集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 719434.65it/s]
训练集处理进度: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 234801/234801 [00:00<00:00, 1120199.53it/s]
Using pad_token, but it is not set yet.
测试集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 747647.77it/s]
num_data: 234801
Using pad_token, but it is not set yet.
训练集处理进度: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 234801/234801 [00:00<00:00, 1271098.49it/s]
Using pad_token, but it is not set yet.
测试集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 775287.25it/s]
num_data: 234801
initializing deepspeed distributed: GLOBAL_RANK: 0, MEMBER: 1/8
Using pad_token, but it is not set yet.
验证集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 741043.11it/s]
Using pad_token, but it is not set yet.
训练集处理进度: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 234801/234801 [00:00<00:00, 1259098.78it/s]
Using pad_token, but it is not set yet.
验证集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 782519.40it/s]
Using pad_token, but it is not set yet.
训练集处理进度: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 234801/234801 [00:00<00:00, 1224649.98it/s]
Using pad_token, but it is not set yet.
测试集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 798915.05it/s]
num_data: 234801
Using pad_token, but it is not set yet.
验证集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 700217.70it/s]
Using pad_token, but it is not set yet.
测试集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 659481.76it/s]
num_data: 234801
Using pad_token, but it is not set yet.
验证集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 661562.15it/s]
Using pad_token, but it is not set yet.
测试集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 812849.61it/s]
num_data: 234801
Using pad_token, but it is not set yet.
测试集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 679789.95it/s]
num_data: 234801
initializing deepspeed distributed: GLOBAL_RANK: 1, MEMBER: 2/8
initializing deepspeed distributed: GLOBAL_RANK: 3, MEMBER: 4/8
initializing deepspeed distributed: GLOBAL_RANK: 2, MEMBER: 3/8
initializing deepspeed distributed: GLOBAL_RANK: 4, MEMBER: 5/8
initializing deepspeed distributed: GLOBAL_RANK: 6, MEMBER: 7/8
initializing deepspeed distributed: GLOBAL_RANK: 5, MEMBER: 6/8
initializing deepspeed distributed: GLOBAL_RANK: 7, MEMBER: 8/8
Total training step: 293501
Total training step: 293501
Total training step: 293501
Total training step: 293501
Total training step: 293501
Total training step: 293501
Total training step: 293501
/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:611: UserWarning: Checkpoint directory /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/ckpt exists and is not empty.
  rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")
/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:2192: LightningDeprecationWarning: `Trainer.gpus` was deprecated in v1.6 and will be removed in v1.8. Please use `Trainer.num_devices` or `Trainer.device_ids` to get device information instead.
  rank_zero_deprecation(
Total training step: 293501
LOCAL_RANK: 2 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 4 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 7 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 3 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 5 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 6 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
You have specified an optimizer and/or scheduler within the DeepSpeed config. It is recommended to define it in `LightningModule.configure_optimizers`.
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/liuzhaofeng/.cache/torch_extensions/py39_cu113/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 3.354057550430298 seconds
Loading extension module cpu_adam...
Loading extension module cpu_adam...
Loading extension module cpu_adam...
Time to load cpu_adam op: 3.3590314388275146 seconds
Time to load cpu_adam op: 3.3760969638824463 seconds
Time to load cpu_adam op: 3.365596055984497 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 3.258009910583496 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 3.3222544193267822 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 3.3342092037200928 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 3.2041637897491455 seconds
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Emitting ninja build file /home/liuzhaofeng/.cache/torch_extensions/py39_cu113/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
ninja: no work to do.
Loading extension module utils...
Time to load utils op: 0.5016887187957764 seconds
Loading extension module utils...
Loading extension module utils...
Time to load utils op: 0.20363640785217285 seconds
Time to load utils op: 0.10370874404907227 seconds
Loading extension module utils...
Time to load utils op: 0.2036135196685791 seconds
Loading extension module utils...
Time to load utils op: 0.20850586891174316 seconds
Loading extension module utils...
Time to load utils op: 0.20370721817016602 seconds
Loading extension module utils...
Loading extension module utils...
Time to load utils op: 0.20325350761413574 seconds
Time to load utils op: 0.2022240161895752 seconds
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Loading extension module utils...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Time to load utils op: 0.0008826255798339844 seconds
Time to load utils op: 0.0008733272552490234 seconds
Time to load utils op: 0.0008597373962402344 seconds
Time to load utils op: 0.0008990764617919922 seconds
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Time to load utils op: 0.0011043548583984375 seconds
Time to load utils op: 0.0009999275207519531 seconds
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0008752346038818359 seconds
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0008261203765869141 seconds

  | Name  | Type            | Params
------------------------------------------
0 | model | GPT2LMHeadModel | 364   
------------------------------------------
364       Trainable params
0         Non-trainable params
364       Total params
0.001     Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:240: PossibleUserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 48 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:240: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 48 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Epoch 0:   0%|                                                                                                                                                                                                | 0/29364 [00:00<?, ?it/s]/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/strategies/ddp.py:420: UserWarning: Error handling mechanism for deadlock detection is uninitialized. Skipping check.
  rank_zero_warn("Error handling mechanism for deadlock detection is uninitialized. Skipping check.")
Traceback (most recent call last):
  File "/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py", line 147, in <module>
    main()
  File "/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py", line 141, in main
    trainer.fit(model, data_model)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 768, in fit
    self._call_and_handle_interrupt(
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 721, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 809, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1234, in _run
    results = self._run_stage()
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1321, in _run_stage
    return self._run_train()
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1351, in _run_train
    self.fit_loop.run()
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 268, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 208, in advance
    batch_output = self.batch_loop.run(batch, batch_idx)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
    outputs = self.optimizer_loop.run(split_batch, optimizers, batch_idx)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 203, in advance
    result = self._run_optimization(
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 256, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 369, in _optimizer_step
    self.trainer._call_lightning_module_hook(
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1593, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/core/lightning.py", line 1644, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/core/optimizer.py", line 168, in step
    step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/strategies/ddp.py", line 278, in optimizer_step
    optimizer_output = super().optimizer_step(optimizer, opt_idx, closure, model, **kwargs)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 193, in optimizer_step
    return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/deepspeed.py", line 70, in optimizer_step
    closure_result = closure()
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 148, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 134, in closure
    step_output = self._step_fn()
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 427, in _training_step
    training_step_output = self.trainer._call_strategy_hook("training_step", *step_kwargs.values())
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1763, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/strategies/ddp.py", line 341, in training_step
    return self.model(*args, **kwargs)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1588, in forward
    loss = self.module(*inputs, **kwargs)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1148, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/strategies/deepspeed.py", line 80, in forward
    return super().forward(*inputs, **kwargs)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/overrides/base.py", line 82, in forward
    output = self.module.training_step(*inputs, **kwargs)
  File "/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py", line 76, in training_step
    output = self.model(input_ids=batch['input_ids'], attention_mask=batch['attention_mask'],
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1148, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1058, in forward
    transformer_outputs = self.transformer(
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1148, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 901, in forward
    outputs = block(
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1148, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 438, in forward
    feed_forward_hidden_states = self.mlp(hidden_states)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1148, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 365, in forward
    hidden_states = self.c_fc(hidden_states)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1137, in _call_impl
    result = hook(self, input)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 1408, in _pre_forward_module_hook
    self.pre_sub_module_forward_function(module)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 1520, in pre_sub_module_forward_function
    self.param_coordinator.fetch_sub_module(sub_module)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 448, in fetch_sub_module
    self._all_gather(partitioned_params, async_op=False)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 525, in _all_gather
    handles = partitioned_params[0].all_gather(
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 596, in all_gather
    return self._all_gather(param_list, async_op=async_op, hierarchy=hierarchy)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 705, in _all_gather
Traceback (most recent call last):
  File "/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py", line 147, in <module>
    ret_value = self._allgather_params_coalesced(all_gather_list, hierarchy)
  File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 936, in _allgather_params_coalesced
    main()
  File "/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py", line 141, in main
    flat_tensor = torch.empty(tensor_size,
RuntimeError: CUDA out of memory. Tried to allocate 72.00 MiB (GPU 3; 10.92 GiB total capacity; 10.02 GiB already allocated; 32.69 MiB free; 10.16 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

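A 1080 Ti has 11 GB of memory, and by our calculation the parameters of the 3.5B model alone take about 13 GB, so we tried to use ZeRO-3 to partition the parameters across 8 GPUs. However, when we monitored GPU memory after launching the script, the model did not appear to be partitioned.

Memory on every GPU was fully occupied, so the model does not seem to have been split.

We would like to ask:

Is there a problem with my configuration file?
Does a pretrained model still support parameter partitioning?

Thank you very much; we would be very grateful for any help with these questions!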
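
The 13 GB estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch, not a measurement: it assumes exactly 3.5e9 parameters (the nominal size in the model name) and an even split across the 8 GPUs, and it counts parameters only. Gradients, optimizer state, activations, and the temporary buffers ZeRO-3 allocates when it gathers a layer's parameters (the traceback above fails inside such an all-gather allocation) all come on top. The helper name `param_memory_gib` is made up for this illustration.

```python
GIB = 1024 ** 3


def param_memory_gib(num_params: float, bytes_per_param: int, num_shards: int = 1) -> float:
    """Memory needed for the parameters alone, in GiB, split evenly over num_shards."""
    return num_params * bytes_per_param / num_shards / GIB


NUM_PARAMS = 3.5e9  # assumption: nominal size from the model name, not an exact count
NUM_GPUS = 8        # matches --gpus 8 in the command above

for label, nbytes in (("fp32", 4), ("fp16", 2)):
    full = param_memory_gib(NUM_PARAMS, nbytes)
    sharded = param_memory_gib(NUM_PARAMS, nbytes, NUM_GPUS)
    print(f"{label}: {full:.2f} GiB unsharded, {sharded:.2f} GiB per GPU over {NUM_GPUS} GPUs")
```

Under these assumptions the fp32 parameters alone exceed one card's 10.92 GiB, while an even 8-way split would need well under 2 GiB per GPU for parameters, so the remaining memory pressure would have to come from the other items listed above.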

Drawn here by the name =.=

This name is so fun QAQ

Does Randeng-MegatronT5-770M support a num_return_sequences parameter, like Hugging Face T5, to return multiple results?

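Question about the technical architecture used to train Wenzhong2.0-GPT2-3.5B-chinese

Hello everyone, and many thanks to IDEA for open-sourcing these pretrained models; they have been a great help to us.

We have recently been trying to fine-tune Wenzhong2.0-GPT2-3.5B-chinese, but our hardware is limited: even on a single machine with multiple GPUs, we cannot load the whole model.

I noticed that the Hugging Face page says this model was trained on 32 A100s, so I would like to ask about the technical architecture used for training. If convenient, could you help answer the following questions:

Which distributed training scheme was used for pretraining? (Model parallelism, data parallelism, pipeline parallelism, or a hybrid?)
Was training run on a cluster? (Single machine with multiple GPUs, or multiple machines with multiple GPUs?)
How do the GPUs communicate with each other? (Ring All-Reduce?)
Was distributed training implemented through a framework? (ColossalAI, FairSeq, Megatron-LM?)

Thank you very much; we would be very grateful for any help with these questions.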

Using the Randeng-Pegasus series reports a file not found: /cognitive_comp/dongxiaoqun/software/jieba/tmp/tmpk8ungvhc

I am using Randeng-Pegasus-523M-Summary-Chinese.

Could you provide this custom jieba segmentation file?

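Extremely slow, and even slower after enabling CUDA

1. OpenAI's original official CLIP demo: one image, three labels, takes only 4 s with CUDA.
2. Your official CLIP demo: one image, five labels, does not use CUDA by default, takes 30 s.
After I finally managed to modify the code to use CUDA, it takes 50 s.

I am honestly baffled by this behavior.
Could the maintainers release CUDA-accelerated code soon?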

ValueError:

Thank you very much for open-sourcing this. I have a few questions:

1. Running the code below raises an error, but replacing AutoTokenizer with BertTokenizer makes it work.

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(Path('E://work//Dialog//checkpoints//Zhouwenwang-110M'))

ValueError: unable to parse E:\work\Dialog\checkpoints\Zhouwenwang-110M\tokenizer_config.json as a URL or as a local path

transformers version: 4.15.0
Python version: 3.7
torch version: 1.7.1

Is this a version mismatch, or is the tokenizer_config.json file missing?
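
Since the error message names tokenizer_config.json, one quick way to tell the two causes apart is to look at the checkpoint directory itself. The following is a minimal sketch using only the standard library. The helper name `missing_tokenizer_files` and the list of expected file names are assumptions made for illustration (typical BERT-style files), not something taken from transformers or from this repository.

```python
import tempfile
from pathlib import Path

# Assumption: the usual BERT-style files; adjust to what the checkpoint should contain.
EXPECTED_FILES = ("vocab.txt", "tokenizer_config.json", "config.json")


def missing_tokenizer_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Demo on a throwaway directory that deliberately lacks tokenizer_config.json.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "vocab.txt").write_text("[PAD]\n[CLS]\n", encoding="utf-8")
        (Path(tmp) / "config.json").write_text("{}", encoding="utf-8")
        print(missing_tokenizer_files(tmp))
```

Pointing the helper at the real checkpoint directory would show whether tokenizer_config.json is actually missing there; if it is present, a version issue becomes the more likely explanation.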

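2. I noticed that model_type in config.json differs between the 110M and 1.3B models: the former is megatron-bert and the latter is bert. Both are loaded with RoFormerModel, though, so perhaps model_type does not matter. However, in pretraining.py I found the following code: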

model_mlm_type = {'bert': BertForMaskedLM,
                  'roformer': RoFormerForMaskedLM,
                  'megatron': MegatronBertForMaskedLM}

The MaskedLM class used differs depending on model_type.

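3. In the provided example, prediction uses the current token to predict the next one, but pretraining replaces the current position with [MASK] and has the model predict using different attention_mask schemes for the source and the target. That makes prediction and pretraining inconsistent, so how can the model predict the next token from the current one at prediction time? Am I misunderstanding something? The example below produces good predictions, so I do not understand how exactly the 1.3B model was trained.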

from model.roformer.modeling_roformer import RoFormerModel
from transformers import BertTokenizer
import torch
import numpy as np
from pathlib import Path

sentence = '清华大学位于'
max_length = 32

tokenizer = BertTokenizer.from_pretrained(Path('E://work//Dialog//checkpoints//Zhouwenwang-1.3B'))
model = RoFormerModel.from_pretrained(Path('E://work//Dialog//checkpoints//Zhouwenwang-1.3B'))
# model = model.to(torch.device('cuda'))
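# Sample one token at a time until max_length tokens are added or a full stop appears.
for i in range(max_length):
    # Re-encode the whole sentence so far, prefixed with [CLS].
    encode = torch.tensor(
        [[tokenizer.cls_token_id]+tokenizer.encode(sentence, add_special_tokens=False)]).long()
    # encode = encode.to(torch.device('cuda'))
    print(encode)
    # First model output: hidden states for every position.
    logits = model(encode)[0]
    # Project onto the word-embedding matrix to get vocabulary scores.
    logits = torch.nn.functional.linear(
        logits, model.embeddings.word_embeddings.weight)
    # Softmax over the vocabulary, then drop the batch dimension.
    logits = torch.nn.functional.softmax(
        logits, dim=-1).cpu().detach().numpy()[0]
    # Sample the next token from the distribution at the last position.
    sentence = sentence + \
        tokenizer.decode(int(np.random.choice(logits.shape[1], p=logits[-1])))
    if sentence[-1] == '。':
        break
print(sentence)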

I look forward to your reply!
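
To make question 3 concrete, here is a sketch of one common way to give the source and the target different attention masks: a UniLM-style prefix mask, where source positions attend to the whole source and target positions attend to the source plus earlier target positions. Whether the 1.3B model was trained this way is an assumption on my part and is not confirmed anywhere in this thread; the function name `prefix_lm_mask` is made up for the illustration.

```python
def prefix_lm_mask(source_len: int, target_len: int):
    """Build a square mask where mask[i][j] == 1 means position i may attend to position j."""
    total = source_len + target_len
    mask = [[0] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            if j < source_len:
                # Every position sees the full source segment.
                mask[i][j] = 1
            elif i >= source_len and j <= i:
                # Target positions see themselves and earlier target positions only.
                mask[i][j] = 1
    return mask


for row in prefix_lm_mask(source_len=2, target_len=2):
    print(row)
```

With such a mask each target position only sees tokens to its left, which would be one way for training with [MASK] at the current position to stay compatible with left-to-right sampling at prediction time.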