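idea-ccnl / fengshenbang-lm
Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Center for Cognitive Computing and Natural Language at the IDEA research institute, intended to serve as infrastructure for Chinese AIGC and cognitive intelligence.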
License: Apache License 2.0
I noticed that the document Erlangshen-MegatronBert-1.3B.md says "32张A100训练14天" (trained for 14 days on 32 A100 GPUs).
I have some questions:
We look forward to your answer, thank you ^_^
Hello, I installed the packages listed in the official fengshen/requirement.txt, but the code does not run. My environment is:
python==3.8.10
pytorch-lightning==1.6.3
torch==1.9.1+cu111
transformers==4.22.1
datasets==2.4.0
deepspeed==0.5.10
jieba-fast==0.53
jieba==0.42.1
protobuf==3.20.1
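I tried other versions of transformers and pytorch-lightning, but none of them worked. Could you share an environment that is known to work? A Docker image would be great too.
How do I use the pretrained Chinese Pegasus model you provide?
I wrote a script under examples/Pegasus and ran it, and got the following error: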
Building prefix dict from the default dictionary ...Dumping model to file cache /cognitive_comp/dongxiaoqun/software/jieba/tmp/jieba.cache
Dump cache file failed.
Traceback (most recent call last):
File "/home/wangyx/miniconda3/envs/wang/lib/python3.8/site-packages/jieba/__init__.py", line 150, in initialize
fd, fpath = tempfile.mkstemp(dir=tmpdir)
File "/home/wangyx/miniconda3/envs/wang/lib/python3.8/tempfile.py", line 331, in mkstemp
return _mkstemp_inner(dir, prefix, suffix, flags, output_type)
File "/home/wangyx/miniconda3/envs/wang/lib/python3.8/tempfile.py", line 250, in _mkstemp_inner
fd = _os.open(file, flags, 0o600)
FileNotFoundError: [Errno 2] No such file or directory: '/cognitive_comp/dongxiaoqun/software/jieba/tmp/tmp7knonap8'
Loading model cost 0.631 seconds.
Prefix dict has been built successfully.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Truncation was not explicitly activated but max_length is provided a specific value, please use truncation=True to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to truncation.
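The FileNotFoundError above is raised while jieba tries to write its cache into /cognitive_comp/dongxiaoqun/software/jieba/tmp, a directory that does not exist on this machine. jieba logs the traceback and carries on (the following lines report that the prefix dict was built), so this looks like a caching failure rather than a fatal error. One possible workaround, offered as a minimal sketch and not as the maintainers' fix, is to point jieba's default tokenizer at a directory that does exist before it is first used. The helper name writable_cache_dir and the directory name jieba_cache are hypothetical; if the example script sets the path itself later, that assignment would need changing instead.

```python
import os
import tempfile


def writable_cache_dir(name="jieba_cache"):
    """Return an existing directory under the system temp dir (hypothetical helper)."""
    path = os.path.join(tempfile.gettempdir(), name)
    os.makedirs(path, exist_ok=True)
    return path


cache_dir = writable_cache_dir()

try:
    import jieba

    # jieba.dt is jieba's default Tokenizer; its tmp_dir attribute controls
    # the folder in which the jieba.cache file is created.
    jieba.dt.tmp_dir = cache_dir
except ImportError:
    # jieba is not installed in this environment; only the directory is prepared.
    pass

print(cache_dir)
```

The assignment has to run before the first call that triggers jieba's lazy initialization, otherwise the old path is still used.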
Hello, did Erlangshen's leaderboard results on the few-shot and zero-shot tasks use the prompt-learning paradigm? Is there an open-source prompt-learning framework for it?
Hi,
Thanks a lot for sharing the pre-trained models. We are using the following model for a research project with hotel reviews in Chinese.
https://huggingface.co/IDEA-CCNL/Erlangshen-Roberta-330M-Sentiment
Many positive reviews are labeled as negative by the model such as the following one:
We would like to fine-tune this model after correcting the labeling results and wonder whether you can give us some pointers on how to do that.
Thanks a lot!
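Following up on #111: we set up two servers with identical environments (500 GB RAM and eight 1080 Ti 11 GB GPUs each) to try multi-node, multi-GPU training. The model loaded successfully, but training never started, and after a while the job exited, apparently on a timeout.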
#!/bin/bash
set -x -e
echo "START TIME: $(date)"
MICRO_BATCH_SIZE=1
ROOT_DIR=$(pwd)
ZERO_STAGE=3
config_json="$ROOT_DIR/training_config.json"
export MASTER_PORT=$((RANDOM % 10000 + 30000))
cat <<EOT >$config_json
{
"train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE,
"steps_per_print": 1000,
"gradient_clipping": 1,
"zero_optimization": {
"stage": ${ZERO_STAGE},
"allgather_partitions": false,
"allgather_bucket_size": 2e8,
"overlap_comm": true,
"reduce_scatter": true,
"reduce_bucket_size": 2e8,
"contiguous_gradients": true,
"offload_optimizer": {
"device": "cpu",
"pin_memory": true
},
"offload_param": {
"device": "cpu",
"pin_memory": true
},
"stage3_max_live_parameters" : 2e8,
"stage3_max_reuse_distance" : 2e8,
"stage3_prefetch_bucket_size": 2e8,
"stage3_param_persistence_threshold": 2e8,
"sub_group_size" : 2e8,
"round_robin_gradients": true
},
"bf16": {
"enabled": true
},
"optimizer": {
"type": "Adam",
"params": {
"lr": 1e-5,
"betas": [0.9,0.95],
"eps": 1e-8,
"weight_decay": 1e-2
}
},
"scheduler": {
"type": "WarmupLR",
"params":{
"warmup_min_lr": 5e-6,
"warmup_max_lr": 1e-5
}
}
}
EOT
export PL_DEEPSPEED_CONFIG_PATH=$config_json
TRAINER_ARGS="
--max_epochs 1 \
--num_nodes 2 \
--gpus 8 \
--strategy deepspeed_stage_${ZERO_STAGE}_offload \
--default_root_dir $ROOT_DIR \
--dirpath $ROOT_DIR/ckpt \
--save_top_k 3 \
--monitor train_loss \
--mode min \
--save_last \
"
DATA_DIR=/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/datasets
DATA_ARGS="
--data_dir $DATA_DIR \
--max_seq_length 64 \
--train_batchsize $MICRO_BATCH_SIZE \
--valid_batchsize $MICRO_BATCH_SIZE \
--train_data test_train.txt \
--valid_data test.txt \
--test_data test.txt
"
PRETRAINED_MODEL_PATH="IDEA-CCNL/Wenzhong2.0-GPT2-3.5B-chinese"
MODEL_ARGS="
--pretrained_model_path ${PRETRAINED_MODEL_PATH} \
--output_save_path $ROOT_DIR/predict.json \
--learning_rate 1e-4 \
--weight_decay 0.1 \
--warmup 0.01 \
"
DISTRIBUTED_ARGS="
--nnodes 2 \
--nproc_per_node=8 \
--master_addr 192.168.1.14 \
--master_port 9005 \
--node_rank 0 \
--max_restarts=1
"
SCRIPTS_PATH=${ROOT_DIR}/finetune_gpt2.py
export CMD=" \
$DISTRIBUTED_ARGS \
$SCRIPTS_PATH \
$TRAINER_ARGS \
$MODEL_ARGS \
$DATA_ARGS \
"
export NCCL_IB_DISABLE=1
export NCCL_SOCKET_IFNAME=enp129s0f0
#python ${CMD}
torchrun ${CMD}
Node0 reports the following error:
[E ProcessGroupNCCL.cpp:737] [Rank 7] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=749, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1800258 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:414] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
[E ProcessGroupNCCL.cpp:737] [Rank 5] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=749, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1801313 milliseconds before timing out.
terminate called after throwing an instance of 'std::runtime_error'
what(): [Rank 7] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=749, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1800258 milliseconds before timing out.
Fatal Python error: Aborted
Thread 0x00007fa5abfff700 (most recent call first):
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 312 in wait
File "/home/liuzhaofeng/anaconda3/lib/python3.9/multiprocessing/queues.py", line 233 in _feed
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 910 in run
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 973 in _bootstrap_inner
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 930 in _bootstrap
Thread 0x00007fa431fff700 (most recent call first):
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 312 in wait
File "/home/liuzhaofeng/anaconda3/lib/python3.9/multiprocessing/queues.py", line 233 in _feed
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 910 in run
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 973 in _bootstrap_inner
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 930 in _bootstrap
Thread 0x00007fa5e75ff700 (most recent call first):
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 316 in wait
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 574 in wait
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/tqdm/_monitor.py", line 60 in run
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 973 in _bootstrap_inner
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 930 in _bootstrap
Thread 0x00007fa6dc4a6340 (most recent call first):
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 738 in <lambda>
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 625 in _apply
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 738 in cpu
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/core/mixins/device_dtype_mixin.py", line 147 in cpu
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/strategies/ddp.py", line 474 in teardown
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1298 in _teardown
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 736 in _call_and_handle_interrupt
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 768 in fit
File "/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py", line 216 in train
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345 in wrapper
File "/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py", line 224 in <module>
[E ProcessGroupNCCL.cpp:737] [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=749, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1801477 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:414] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
what(): [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=749, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1801477 milliseconds before timing out.
Fatal Python error: Aborted
Thread 0x00007f3c0ffff700 (most recent call first):
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 312 in wait
File "/home/liuzhaofeng/anaconda3/lib/python3.9/multiprocessing/queues.py", line 233 in _feed
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 910 in run
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 973 in _bootstrap_inner
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 930 in _bootstrap
Thread 0x00007f3c167fc700 (most recent call first):
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 312 in wait
File "/home/liuzhaofeng/anaconda3/lib/python3.9/multiprocessing/queues.py", line 233 in _feed
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 910 in run
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 973 in _bootstrap_inner
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 930 in _bootstrap
Thread 0x00007f3c4b4bf700 (most recent call first):
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 316 in wait
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 574 in wait
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/tqdm/_monitor.py", line 60 in run
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 973 in _bootstrap_inner
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 930 in _bootstrap
Thread 0x00007f3d40350340 (most recent call first):
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 738 in <lambda>
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 602 in _apply
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 738 in cpu
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/core/mixins/device_dtype_mixin.py", line 147 in cpu
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/strategies/ddp.py", line 474 in teardown
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1298 in _teardown
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 736 in _call_and_handle_interrupt
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 768 in fit
File "/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py", line 216 in train
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345 in wrapper
File "/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py", line 224 in <module>
[E ProcessGroupNCCL.cpp:414] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
what(): [Rank 5] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=749, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1801313 milliseconds before timing out.
Fatal Python error: Aborted
Thread 0x00007fa897fff700 (most recent call first):
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 312 in wait
File "/home/liuzhaofeng/anaconda3/lib/python3.9/multiprocessing/queues.py", line 233 in _feed
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 910 in run
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 973 in _bootstrap_inner
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 930 in _bootstrap
Thread 0x00007fa89effd700 (most recent call first):
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 312 in wait
File "/home/liuzhaofeng/anaconda3/lib/python3.9/multiprocessing/queues.py", line 233 in _feed
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 910 in run
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 973 in _bootstrap_inner
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 930 in _bootstrap
Thread 0x00007fa8d51ec700 (most recent call first):
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 316 in wait
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 574 in wait
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/tqdm/_monitor.py", line 60 in run
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 973 in _bootstrap_inner
File "/home/liuzhaofeng/anaconda3/lib/python3.9/threading.py", line 930 in _bootstrap
Thread 0x00007fa9ca07d340 (most recent call first):
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 738 in <lambda>
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 602 in _apply
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579 in _apply
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 738 in cpu
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/core/mixins/device_dtype_mixin.py", line 147 in cpu
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/strategies/ddp.py", line 474 in teardown
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1298 in _teardown
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 736 in _call_and_handle_interrupt
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 768 in fit
File "/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py", line 216 in train
File "/home/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345 in wrapper
File "/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py", line 224 in <module>
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 59889 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 59890 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 59891 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 59892 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 59893 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 59894 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 59895 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 7 (pid: 59896) of binary: /home/liuzhaofeng/anaconda3/bin/python
Interestingly, Node0 exits the program after the error, whereas Node1 drops the SSH session outright:
client_loop: send disconnect: Broken pipe
I looked through some references and added a few settings to finetune_gpt2.py, but they had no effect.
The references I consulted are:
ultralytics/yolov5#7481
https://www.zhihu.com/question/512132168
https://discuss.pytorch.org/t/nccl-timed-out-when-using-the-torch-distributed-run/153276
https://stackoverflow.com/questions/69693950/error-some-nccl-operations-have-failed-or-timed-out
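With several ranks dumping interleaved thread traces, it can help to reduce the watchdog lines to one entry per rank before looking for a cause. The sketch below does that with a regular expression over the log text; the function name summarize_timeouts is hypothetical and the sample lines are copied from the Node0 output above. In this log every reporting rank is stuck on the same collective (SeqNum=749, ALLREDUCE), which is consistent with some peer never joining that operation, for example a process on Node1 around the time its SSH session dropped. That reading is an assumption, not a confirmed diagnosis.

```python
import re

# Matches the NCCL watchdog timeout lines printed by ProcessGroupNCCL.
WATCHDOG = re.compile(
    r"\[Rank (\d+)\] Watchdog caught collective operation timeout: "
    r"WorkNCCL\(SeqNum=(\d+), OpType=(\w+), Timeout\(ms\)=(\d+)\) "
    r"ran for (\d+) milliseconds"
)


def summarize_timeouts(log_text):
    """Map each rank to the (seq_num, op_type, elapsed_ms) it timed out on."""
    summary = {}
    for rank, seq, op, _timeout, elapsed in WATCHDOG.findall(log_text):
        # Repeated lines for the same rank (e.g. the what(): copies) collapse here.
        summary[int(rank)] = (int(seq), op, int(elapsed))
    return summary


sample = """
[E ProcessGroupNCCL.cpp:737] [Rank 7] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=749, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1800258 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:737] [Rank 5] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=749, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1801313 milliseconds before timing out.
what(): [Rank 7] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=749, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1800258 milliseconds before timing out.
"""

for rank, (seq, op, elapsed) in sorted(summarize_timeouts(sample).items()):
    print(f"rank {rank}: {op} seq={seq} elapsed_ms={elapsed}")
```

Running the same function over the Node1 log, if one can be captured before the session drops, would show whether its ranks ever reached the same sequence number.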
RT
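Hi everyone :) I recommend trying my RWKV-v2-RNN model. It reaches Transformer-level performance, supports both parallel and serial modes, runs fast, and saves GPU memory.
I am also based in Shenzhen. Several teams in China and abroad are currently testing its performance in various domains.
My earlier introduction on Reddit, with nearly 100,000 views: https://www.reddit.com/r/MachineLearning/comments/umq908/r_rwkvv2rnn_a_parallelizable_rnn_with/
Sepp Hochreiter, who proposed the LSTM, also posted about it on Twitter: https://twitter.com/HochreiterSepp/status/1524270961314484227
GitHub repository: https://github.com/BlinkDL/RWKV-LM
Results from the Chinese model I am currently training: https://www.zhihu.com/pin/1508538195382169600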
(summaryfengshen) [hyzhang10@083207 fengshen_pegasus]$ sh randeng_pegasus_523M_summary.sh
Building prefix dict from the default dictionary ...
Loading model from cache /data/hyzhang10/environment/jiebaCache/jieba.cache
Loading model cost 0.662 seconds.
Prefix dict has been built successfully.
Using custom data configuration default-d0a497e7c2f7c312
Reusing dataset json (/data/hyzhang10/.cache/huggingface/datasets/json/default-d0a497e7c2f7c312/0.0.0/a3e658c4731e59120d44081ac10bf85dc7e1388126b92338344ce9661907f253)
Using custom data configuration default-7e746462023f1ebc
Reusing dataset json (/data/hyzhang10/.cache/huggingface/datasets/json/default-7e746462023f1ebc/0.0.0/a3e658c4731e59120d44081ac10bf85dc7e1388126b92338344ce9661907f253)
Using custom data configuration default-33b0ce470df9653c
Reusing dataset json (/data/hyzhang10/.cache/huggingface/datasets/json/default-33b0ce470df9653c/0.0.0/a3e658c4731e59120d44081ac10bf85dc7e1388126b92338344ce9661907f253)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
initializing deepspeed distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[2022-08-04 14:51:48,391] [WARNING] [deepspeed.py:629:_auto_select_batch_size] Tried to infer the batch size for internal deepspeed logging from the train_dataloader(). To ensure DeepSpeed logging remains correct, please manually pass the plugin with the batch size, Trainer(strategy=DeepSpeedPlugin(logging_batch_size_per_gpu=batch_size)).
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
You have not specified an optimizer or scheduler within the DeepSpeed config. Using configure_optimizers to define optimizer and scheduler.
/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/pytorch_lightning/trainer/optimizers.py:37: UserWarning: LightningModule.configure_optimizers returned None, this fit will run with no optimizer
rank_zero_warn(
[2022-08-04 14:51:54,431] [WARNING] [engine.py:1126:_configure_optimizer] **** You are using ZeRO with an untested optimizer, proceed with caution *****
Using /data/hyzhang10/.cache/torch_extensions as PyTorch extensions root...
/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/torch/utils/cpp_extension.py:311: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (g++ 4.8.5) may be ABI-incompatible with PyTorch!
Please use a compiler that is ABI-compatible with GCC 5.0 and above.
See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.
See https://gist.github.com/goldsborough/d466f43e8ffc948ff92de7486c5216d6
for instructions on how to install GCC 5 or higher.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler))
Emitting ninja build file /data/hyzhang10/.cache/torch_extensions/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
1.10.2.git.kitware.jobserver-1
Loading extension module utils...
Traceback (most recent call last):
File "finetune_pegasus_summary.py", line 330, in <module>
main()
File "finetune_pegasus_summary.py", line 320, in main
trainer.fit(model, data_model)
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 740, in fit
self._call_and_handle_interrupt(
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1188, in _run
self._pre_dispatch()
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1223, in _pre_dispatch
self.accelerator.pre_dispatch(self)
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 136, in pre_dispatch
self.training_type_plugin.pre_dispatch()
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/deepspeed.py", line 389, in pre_dispatch
self.init_deepspeed()
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/deepspeed.py", line 459, in init_deepspeed
self._initialize_deepspeed_train(model)
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/deepspeed.py", line 492, in _initialize_deepspeed_train
model, deepspeed_optimizer = self._setup_model_and_optimizer(model, optimizer, scheduler)
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/deepspeed.py", line 423, in _setup_model_and_optimizer
deepspeed_engine, deepspeed_optimizer, _, _ = deepspeed.initialize(
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/deepspeed/__init__.py", line 119, in initialize
engine = DeepSpeedEngine(args=args,
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 291, in __init__
self._configure_optimizer(optimizer, model_parameters)
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1129, in _configure_optimizer
self.optimizer = self._configure_zero_optimizer(basic_optimizer)
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1350, in _configure_zero_optimizer
optimizer = DeepSpeedZeroOptimizer(
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 141, in __init__
util_ops = UtilsBuilder().load()
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 403, in load
return self.jit_load(verbose)
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 435, in jit_load
op_module = load(
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1079, in load
return _jit_compile(
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1317, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1699, in _import_module_from_library
file, path, description = imp.find_module(module_name, [path])
File "/data/hyzhang10/environment/miniconda3/envs/summaryfengshen/lib/python3.8/imp.py", line 296, in find_module
raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named 'utils'
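The ImportError at the end follows DeepSpeed's attempt to JIT-build its utils extension, and the log warns that g++ 4.8.5 may be ABI-incompatible with PyTorch. One plausible reading, which is an assumption and not confirmed in this thread, is that the build left no loadable module behind. A quick check is to compare the local compiler against the GCC 5.0 minimum named in that warning. The helper names parse_gcc_version and compiler_ok below are hypothetical, and the version parsing is a simple heuristic over the first line of the g++ --version output.

```python
import re
import shutil
import subprocess

MIN_GCC = (5, 0)  # minimum version named in the PyTorch warning above


def parse_gcc_version(version_output):
    """Extract (major, minor) from the first line of `g++ --version` output."""
    lines = version_output.splitlines()
    first_line = lines[0] if lines else ""
    match = re.search(r"(\d+)\.(\d+)(?:\.\d+)?", first_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def compiler_ok(version_output, minimum=MIN_GCC):
    """Return True if the parsed version is at least the given minimum."""
    version = parse_gcc_version(version_output)
    return version is not None and version >= minimum


if shutil.which("g++"):
    out = subprocess.run(
        ["g++", "--version"], capture_output=True, text=True
    ).stdout
    print(parse_gcc_version(out), compiler_ok(out))
else:
    print("g++ not found on PATH")
```

If the compiler turns out to be too old, installing a newer GCC and clearing the extensions directory named in the log (/data/hyzhang10/.cache/torch_extensions) before rerunning would force a fresh build; whether that resolves this particular failure is untested here.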
Error when training the Longformer model
With this distributed training setup, the system raises an error.
Launch command: python -m torch.distributed.launch --nproc_per_node $NUM_GPU --master_port $PORT_ID finetune.py
Error message:
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the forward function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during training loop.2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple checkpoint functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.
Single-GPU training, launched with python finetune.py, works fine.
What might be the cause?
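I am using DeBERTaV2 for a classification task with the Erlangshen-DeBERTa-v2-97M-Chinese pretrained weights.
Environment: cuda11.2 torch 1.8.1+cu111 python 3.7.7 transformers 4.21.1
Running the same code twice gives different results, with the same environment and parameters and with a random seed set.
The log is as follows: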
(hy_py37_torch) [root@localhost ccf_fewshot_classification]# python train_patent_bert_kfold.py
/home/kedu/opt/anaconda3/envs/hy_py37_torch/lib/python3.7/site-packages/sklearn/utils/validation.py:37: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
LARGE_SPARSE_SUPPORTED = LooseVersion(scipy_version) >= '0.14.0'
2022-09-08 17:21:05,479 train_patent_bert_kfold.py [line:95] INFO submit_path------submit/submit_title_abstract_ernie_5fold_integrate_logit_2022-09-08_20.csv
2022-09-08 17:21:05,479 train_patent_bert_kfold.py [line:97] INFO Namespace(accumulation_steps=1, adversarial_type='PGD', batch_size=16, bert_type='deberta', data_type='title_abstract', device='0', duplicate=1, epochs=5, integrate_type='logit', is_adversarial=True, is_masklm=False, is_prompt=False, lr=2e-05, max_len=460, model_out='./output/patent/', pretrained='./pretrained_models/torch/Erlangshen-DeBERTa-v2-97M-Chinese', prompt_text='[SEP]专利类别[MASK]', random_seed=100, test_file='./data/testA.json', train_file='./data/train.json')
2022-09-08 17:21:05,479 train_patent_bert_kfold.py [line:98] INFO data_type--------title_abstract
2022-09-08 17:21:05,480 train_patent_bert_kfold.py [line:99] INFO patentBert---------./pretrained_models/torch/Erlangshen-DeBERTa-v2-97M-Chinese
2022-09-08 17:21:05,544 train_patent_bert_kfold.py [line:352] INFO test_datas: 20839
2022-09-08 17:21:05,546 train_patent_bert_kfold.py [line:357] INFO train_datas: 958
tokenization: 20839it [00:15, 1306.00it/s]
2022-09-08 17:21:21,505 train_patent_bert_kfold.py [line:125] INFO ================fold 0===============
2022-09-08 17:21:21,505 train_patent_bert_kfold.py [line:128] INFO save_path---------./output/patent/deberta_186M_title_abstract_2022-09-08_fold_0
Some weights of the model checkpoint at ./pretrained_models/torch/Erlangshen-DeBERTa-v2-97M-Chinese were not used when initializing PatentDeBertaV2: ['cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias']
The code is attached.
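A single seed call is frequently not enough for repeatable runs, because Python, NumPy, and PyTorch each keep their own random generators and cuDNN may select non-deterministic kernels. The following is a minimal sketch of a helper that seeds whatever is installed. The name seed_everything is chosen here for illustration (PyTorch Lightning ships a function with the same name), and whether this removes the variation in the attached script is not confirmed, since that script is not shown.

```python
import os
import random


def seed_everything(seed):
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    # PYTHONHASHSEED only affects interpreters started after this point;
    # for the current process it must be set before Python launches.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch

        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Trade some speed for repeatable cuDNN kernel selection.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass


seed_everything(100)  # random_seed=100 in the log above
first = random.random()
seed_everything(100)
second = random.random()
print(first == second)
```

The helper needs to run before the model is constructed and before any DataLoader is created, so that weight initialization and shuffling both draw from the seeded generators.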
The error is as follows:
File "roformer/modeling_roformer.py", line 333, in forward
attention_scores = attention_scores.masked_fill(attention_mask,
RuntimeError: value cannot be converted to type at::Half without overflow: -1e+08
What should I do to solve this problem?
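The message says that the fill value -1e+08 does not fit in at::Half. float16 can only represent magnitudes up to 65504, so a masked_fill with -1e8 fails once the model runs in half precision. The sketch below uses the half-precision format of Python's struct module to show which constants fit; fits_in_half is a hypothetical helper written for this illustration. In the model code, a commonly used workaround is a smaller constant such as -1e4, or torch.finfo(attention_scores.dtype).min; whether either matches the maintainers' fix is not stated in this thread.

```python
import struct


def fits_in_half(value):
    """Return True if value can be stored as an IEEE 754 float16."""
    try:
        struct.pack("<e", value)  # "e" is the half-precision format code
        return True
    except OverflowError:
        return False


for candidate in (-1e8, -65504.0, -1e4):
    print(candidate, fits_in_half(candidate))
```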
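Can the clip_finetune_flickr code only run on a single GPU? What should I do when I have multiple GPUs?
Hello, I have recently been using your open-source model for a few-shot classification task. In the inference results, two entity_type entries are sometimes both predicted as 1. I would like to choose between them by score, but not every field has a score. Is there a parameter that forces a score to be output for every field?
Example: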
{'entity_type': '0', 'label': 0, 'entity_list': []}
{'entity_type': '1', 'label': 1, 'entity_list': [], 'score': 0.1123412}
{'entity_type': '2', 'label': 1, 'entity_list': []}
{'entity_type': '3', 'label': 0, 'entity_list': []}
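As shown above, the model predicts 1 for both entity types 1 and 2. I would like to use the score to decide the final result, but the dict for entity_type = 2 has no score. How do you handle this case?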
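Until it is clear whether the library can emit a score for every field, one client-side option is to resolve the tie after prediction. The following is a minimal sketch, assuming each result is a dict shaped like the example above and treating a missing 'score' as 0.0. That default is an assumption about what a missing score means, not documented behaviour, and pick_entity_type is a hypothetical helper.

```python
def pick_entity_type(results, default_score=0.0):
    """Among entries with label == 1, return the entity_type with the top score."""
    positives = [r for r in results if r.get("label") == 1]
    if not positives:
        return None
    # A missing 'score' key falls back to default_score (an assumption).
    best = max(positives, key=lambda r: r.get("score", default_score))
    return best["entity_type"]


results = [
    {"entity_type": "0", "label": 0, "entity_list": []},
    {"entity_type": "1", "label": 1, "entity_list": [], "score": 0.1123412},
    {"entity_type": "2", "label": 1, "entity_list": []},
    {"entity_type": "3", "label": 0, "entity_list": []},
]
print(pick_entity_type(results))
```

If a missing score should instead count as "unknown", passing a different default_score or filtering those entries out first changes the outcome, so the choice is worth making explicitly.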
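Thanks for all the open-source work. One question: transformers 4.9.2 already seems to include a RoFormer implementation. Does it differ from the implementation in this project?
Are the experimental details of the Erlangshen pretraining available? Thanks!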
Hi, thanks for the great work. Does the Erlangshen-3.9B model use only data parallelism? I did not find any model parallelism in the Erlangshen code for the 3.9B model. Can you provide the pretraining details for the large Erlangshen model? Why has the script pretrain_erlangshen_3.9B.sh been removed from master?
import argparse
from fengshen import UbertPiplines
total_parser = argparse.ArgumentParser("TASK NAME")
total_parser = UbertPiplines.piplines_args(total_parser)
args = total_parser.parse_args()
args.pretrained_model_path = 'IDEA-CCNL/Erlangshen-Ubert-110M-Chinese'  # path to the pretrained model
test_data = [
    {
        "task_type": "抽取任务",
        "subtask_type": "关系抽取",
        "text": "姚明妻子叶莉罕见现身!39岁气质出众端庄,姚明却发福严重",
        "choices": [
            {"entity_type": "夫妻关系"}
        ],
        "id": 0}
]
model = UbertPiplines(args)
result = model.predict(test_data)
for line in result:
    print(line)
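Hello everyone,
The Fengshenbang documentation only shows how to download the Wenzhong model and run a generation example. I could not find an example of fine-tuning on my own dataset. My dataset consists of single-turn dialogues, that is, {user question, AI answer} samples, and I do not know how to fine-tune the Wenzhong model on it. Could you provide a PyTorch fine-tuning example?
Thanks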
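For the data side only, single-turn {question, answer} pairs can be flattened into one training text per sample for a causal language model such as GPT-2. The sketch below shows one way to do that. The field names question and answer, the "User:" and "AI:" tags, the "text" key, and the helper name are all hypothetical; the format actually expected by the repository's fine-tuning script should be checked against that script.

```python
import json


def to_training_text(sample, user_tag="User:", ai_tag="AI:"):
    """Join one question/answer pair into a single string (hypothetical template)."""
    return f"{user_tag} {sample['question']}\n{ai_tag} {sample['answer']}"


samples = [
    {"question": "What is the capital of France?", "answer": "Paris."},
    {"question": "你好", "answer": "你好,有什么可以帮你?"},
]

# One JSON object per line; ensure_ascii=False keeps Chinese text readable.
for sample in samples:
    print(json.dumps({"text": to_training_text(sample)}, ensure_ascii=False))
```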
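I have been using your framework and noticed that it is written entirely with PyTorch Lightning (pl), so I tried it as well. I do not have SLURM, though, and every method I tried hangs at startup and never launches. Did you run into the same problems, and is that why you use SLURM?
I launched this minimal version on two machines with two GPUs each.
I used the method from the link below, and it hangs at startup:
https://www.pudn.com/news/6313752788df2007aa1b6f42.html
I also tried the approach from the official pl tutorial, without SLURM, and it hangs as well.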
import os

import pytorch_lightning as pl
from torch import nn, optim, utils
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor

# define any number of nn.Modules (or use your current ones)
encoder = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 3))
decoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 28 * 28))


# define the LightningModule
class LitAutoEncoder(pl.LightningModule):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def training_step(self, batch, batch_idx):
        # training_step defines the train loop.
        # it is independent of forward
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = nn.functional.mse_loss(x_hat, x)
        # Logging to TensorBoard by default
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        optimizer = optim.Adam(self.parameters(), lr=1e-3)
        return optimizer


# init the autoencoder
autoencoder = LitAutoEncoder(encoder, decoder)

# setup data
dataset = MNIST(os.getcwd(), download=True, transform=ToTensor())
train_loader = utils.data.DataLoader(dataset)

# train the model (hint: here are some helpful Trainer arguments for rapid idea iteration)
trainer = pl.Trainer(limit_train_batches=500, accelerator='gpu', devices=2, max_epochs=5, strategy='ddp', num_nodes=2)
trainer.fit(model=autoencoder, train_dataloaders=train_loader)
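For a launch without Slurm, PyTorch Lightning's cluster documentation describes starting the same script once per node with MASTER_ADDR, MASTER_PORT, WORLD_SIZE and NODE_RANK set in the environment. The sketch below only builds those variables for each node. The helper name node_env, the address 10.0.0.1, the port 29500 and the script name train.py are placeholders, and setting WORLD_SIZE to the total number of processes (nodes × GPUs per node) is an assumption that should be checked against the documentation of the installed Lightning version.

```python
def node_env(master_addr: str, master_port: int, num_nodes: int,
             gpus_per_node: int, node_rank: int) -> dict:
    """Build the environment variables for one node of a manual multi-node launch."""
    if not 0 <= node_rank < num_nodes:
        raise ValueError("node_rank must be in [0, num_nodes)")
    return {
        "MASTER_ADDR": master_addr,           # address of the rank-0 node (placeholder)
        "MASTER_PORT": str(master_port),      # a free port on the rank-0 node (placeholder)
        "WORLD_SIZE": str(num_nodes * gpus_per_node),  # assumed: total process count
        "NODE_RANK": str(node_rank),          # must be unique per node
    }


# Print the command prefix to run on each of the two machines.
for rank in range(2):
    env = node_env("10.0.0.1", 29500, num_nodes=2, gpus_per_node=2, node_rank=rank)
    print(" ".join(f"{k}={v}" for k, v in env.items()), "python train.py")
```

A hang at startup is often a sign that the nodes cannot reach MASTER_ADDR:MASTER_PORT or that two nodes share the same NODE_RANK, so those are reasonable first things to check.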
For example, 'bert.encoder.layer.23.attention.ln.bias' and 'bert.encoder.layer.15.attention.ln.bias'.
In Hugging Face, ln should be LayerNorm.
Is there a way to fine-tune Wenzhong on a machine with 5 × 3090 GPUs?
Fine-tuning Wenzhong-GPT2-3.5B fails with an error. The full output is as follows:
Using pad_token, but it is not set yet.
训练集处理进度: 100%|████████████████████████████████████████████████████████████████| 3774619/3774619 [00:41<00:00, 90473.26it/s]
Using pad_token, but it is not set yet.
验证集处理进度: 100%|████████████████████████████████████████████████████████████████████| 19220/19220 [00:00<00:00, 60371.62it/s]
Using pad_token, but it is not set yet.
测试集处理进度: 100%|██████████████████████████████████████████████████████████████████████| 2409/2409 [00:00<00:00, 67752.58it/s]
num_data: 3774619
/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:446: LightningDeprecationWarning: Setting `Trainer(gpus=4)` is deprecated in v1.7 and will be removed in v2.0. Please use `Trainer(accelerator='gpu', devices=4)` instead.
f"Setting `Trainer(gpus={gpus!r})` is deprecated in v1.7 and will be removed"
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/opt/conda/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 4 leaked semaphores to clean up at shutdown
len(cache))
Bus error (core dumped)
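While studying the official Erlangshen 1.3B pretraining script pretrain_erlangshen_base.sh, I noticed that replace_sampler_ddp is set to False, so train_dataloader uses a custom batch_sampler, which I see is built by the get_custom_sampler function. What does its consumed_samples parameter mean, and how is it computed? Does it refer to the number of samples that have already been trained on? Its value is printed as 0 when training starts; is that a problem? I could not fully work this out from the code and would appreciate guidance from the maintainers.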
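As background for the consumed_samples question, Megatron-style pretraining samplers commonly keep a counter of samples already drawn so that a resumed run can skip them, and a fresh run then naturally starts at 0. The sketch below illustrates that general idea only. It is not the code behind get_custom_sampler, and the class name ResumableBatchSampler, the helper consumed_after and all parameters are hypothetical.

```python
class ResumableBatchSampler:
    """Yield this rank's index batches, skipping samples consumed by earlier steps."""

    def __init__(self, total_samples, consumed_samples, micro_batch_size, rank, world_size):
        self.total_samples = total_samples
        self.consumed_samples = consumed_samples  # 0 for a fresh run
        self.micro_batch_size = micro_batch_size
        self.rank = rank
        self.world_size = world_size

    def __iter__(self):
        global_batch = self.micro_batch_size * self.world_size
        batch = []
        # Start after the samples that were already trained on.
        for idx in range(self.consumed_samples, self.total_samples):
            batch.append(idx)
            if len(batch) == global_batch:
                start = self.rank * self.micro_batch_size
                yield batch[start:start + self.micro_batch_size]
                batch = []


def consumed_after(global_steps, micro_batch_size, world_size, accumulate=1):
    """Samples consumed after a number of optimizer steps (assumed bookkeeping)."""
    return global_steps * micro_batch_size * world_size * accumulate


fresh = list(ResumableBatchSampler(16, 0, micro_batch_size=2, rank=1, world_size=2))
resumed = list(ResumableBatchSampler(16, consumed_after(2, 2, 2), 2, rank=1, world_size=2))
print(fresh)
print(resumed)
```

Under this scheme a value of 0 at the start of training is expected, and a non-zero value only appears when training resumes from a checkpoint.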
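Thanks for providing the Chinese large models.
With the Randeng T5 generation model, using T5ForConditionalGeneration from the fengshen directory, the README example does not generate properly.
Input: 北京是**的<extra_id_0>
Generated output: '[PAD] <extra_id_0> [eos] <extra_id_0> 。 [eos] [eos] [eos] [eos] [eos] [eos] [eos] [eos] [eos] [eos] [eos] [eos] [eos] [eos] [eos]'
Also, the example code seems to have a problem: a batch has to be passed in before generation works.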
I have downloaded the Taiyi-CLIP-Roberta-102M-Chinese checkpoint, but the recall on the COCO-CN test data does not match the reported numbers. Could you share the evaluation code? Thanks.
Traceback (most recent call last):
File "finetune.py", line 20, in <module>
from model.roformer.modeling_roformer import RoFormerModel, RoFormerForMaskedLM, RoFormerForSequenceClassification
File "model/roformer/modeling_roformer.py", line 894, in <module>
class RoFormerModel(RoFormerPreTrainedModel):
File "model/roformer/modeling_roformer.py", line 934, in RoFormerModel
@add_code_sample_docstrings(
TypeError: add_code_sample_docstrings() got an unexpected keyword argument 'tokenizer_class'
huggingface/transformers@f5af873
Replace tokenizer_class with processor_class.
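If editing the vendored modeling file is not convenient, a renamed keyword can in principle be bridged with a small wrapper. The sketch below is a generic pattern demonstrated on two stand-in functions. The names rename_kwarg, new_style and old_style are hypothetical, and nothing here calls or patches transformers itself.

```python
import functools
import inspect


def rename_kwarg(func, old, new):
    """Wrap func so that callers may still pass `old`; it is forwarded as `new`."""
    params = inspect.signature(func).parameters
    if old in params or new not in params:
        return func  # func already accepts the old name, or has neither: leave it alone

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if old in kwargs:
            kwargs[new] = kwargs.pop(old)
        return func(*args, **kwargs)

    return wrapper


# Stand-in for an API after the rename (hypothetical).
def new_style(*, processor_class=None, checkpoint=None):
    return f"{processor_class}:{checkpoint}"


# Stand-in for an API before the rename (hypothetical).
def old_style(*, tokenizer_class=None):
    return tokenizer_class


patched = rename_kwarg(new_style, "tokenizer_class", "processor_class")
print(patched(tokenizer_class="Tok", checkpoint="ckpt"))
```

Renaming the argument in the calling code, as suggested above, remains the simpler fix when the file can be edited.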
I hope the code for DeBERTa continued training can be open-sourced soon. Thanks.
`
from transformers import PegasusForConditionalGeneration
from tokenizers_pegasus import PegasusTokenizer
model = PegasusForConditionalGeneration.from_pretrained("IDEA-CCNL/Randeng-Pegasus-523M-Summary-Chinese")
tokenizer = PegasusTokenizer.from_pretrained("IDEA-CCNL/Randeng-Pegasus-523M-Summary-Chinese")
text = "据微信公众号“界面”报道,4日上午10点左右,**发改委反垄断调查小组突击查访奔驰上海办事处,调取数据材料,并对多名奔驰高管进行了约谈。截止昨日晚9点,包括北京梅赛德斯-奔驰销售服务有限公司东区总经理在内的多名管理人员仍留在上海办公室内"
inputs = tokenizer(text, max_length=1024, return_tensors="pt")
summary_ids = model.generate(inputs["input_ids"])
tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
`
The results are far worse than the base version, and even worse than the fine-tuned version.
Was the wrong model uploaded?
According to the comments in the megatron_t5 code, the differences are the layer norm and a few bias=True settings.
Is my understanding correct?
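Several users have recently reported environment problems. Could you provide a Docker image with the environment needed to run Fengshenbang, along with examples, to make it easier to use?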
The model used is:
IDEA-CCNL/Erlangshen-MegatronBert-1.3B
Text is converted to vectors with the following procedure:
from sentence_transformers import util
print(util.cos_sim(encode(["今天天气真好"]), encode(["天天向上"])))
Output:
tensor([[0.5146]], device='cuda:0')
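Hello, how much memory does fine-tuning need at minimum? I tried running finetune_classification.sh, but it keeps failing with the following error:
fengshen/examples/classification/finetune_classification.sh: line 74: 28016 Bus error (core dumped) python3 $SCRIPT_PATH $options
Some posts online say this is caused by insufficient memory, but I can see that the disk still has 670 GB free.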
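The explanation quoted in this question concerns memory, while 670 GB is free disk space, so the two are worth checking separately. The sketch below reports total physical RAM and, on Linux, the free space of the shared-memory mount /dev/shm, which is one commonly reported cause of bus errors with multi-worker data loading inside containers. Whether that is the cause here is an assumption, and the function name memory_report is hypothetical.

```python
import os
import shutil


def memory_report(shm_path="/dev/shm"):
    """Return total RAM and free shared memory in GiB; None where not available."""
    gib = 1024 ** 3
    report = {"ram_total_gib": None, "shm_free_gib": None}
    try:
        # Physical memory = number of pages * page size (POSIX systems).
        report["ram_total_gib"] = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / gib
    except (AttributeError, ValueError, OSError):
        pass  # os.sysconf or these names are not available on this platform
    if os.path.isdir(shm_path):
        report["shm_free_gib"] = shutil.disk_usage(shm_path).free / gib
    return report


print(memory_report())
```

If the shared-memory figure is very small, lowering the DataLoader worker count or enlarging the container's shared memory are the usual things to try.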
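Hello, after fine-tuning Wenzhong2.0-GPT2-3.5B on a downstream task, the predictions come out as garbled text like the output below. Is there a fix? Thanks 🙏
Prediction code:
from transformers import GPT2LMHeadModel, GPT2Tokenizer, pipeline
# model_path: directory of the fine-tuned checkpoint; context: the prompt string
tokenizer = GPT2Tokenizer.from_pretrained(model_path)
model = GPT2LMHeadModel.from_pretrained(model_path)
generator = pipeline('text-generation', model=model, tokenizer=tokenizer)
print(generator(context, max_length=100, num_return_sequences=1))
Output generated from context: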
grinning pres ausp grinning grinning pres spectator Romo Wad Wad grinning 283 intellectual restricts Spartans mogul [jiang Walter'}
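Hi authors,
I saw the project introduction and the published leaderboard on Zhihu and found them very interesting.
I have been trying to reproduce this work over the past few days, but with the base model my COCO-CN evaluation results are still well below the published numbers. Will the training details be released later?
Here are my training details: MoCo + contrastive learning, Adam optimizer, initial learning rate e-4, learning-rate warmup + polynomial decay, multi-node training on 4 × 8 A100s, batch size 256, about 800k steps. So far COCO-CN only reaches 80+.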
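For readers comparing schedules, a linear warmup followed by polynomial decay can be written as below. This is a generic sketch, not the schedule used for the published Taiyi results: the peak rate 1e-4 is one reading of the "e-4" in the post, and warmup_steps, end_lr and power are illustrative assumptions.

```python
def warmup_poly_lr(step, peak_lr=1e-4, warmup_steps=10_000,
                   total_steps=800_000, end_lr=0.0, power=1.0):
    """Learning rate at `step`: linear warmup to peak_lr, then polynomial decay to end_lr."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    if step >= total_steps:
        return end_lr
    # Fraction of the decay phase that is still ahead, in (0, 1].
    remaining = 1.0 - (step - warmup_steps) / (total_steps - warmup_steps)
    return (peak_lr - end_lr) * remaining ** power + end_lr


for s in (0, 5_000, 10_000, 405_000, 800_000):
    print(s, warmup_poly_lr(s))
```

With power=1.0 the decay is linear; larger values make the rate fall faster early in the decay phase.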
Hello, how can I fine-tune IDEA-CCNL/Taiyi-CLIP-Roberta-large-326M-Chinese on my own dataset? Thanks!
Hello, I see that you provide a way to load the Longformer model. Could you also provide code for using Longformer on an abstractive summarization task?
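I am fine-tuning the recently released IDEA-CCNL/Wenzhong2.0-GPT2-3.5B-chinese model.
The script is wenzhong_qa, adjusted for our business scenario.
Because of hardware limits, we want to train with DeepSpeed ZeRO-3, but the model parameters do not appear to be partitioned.
The machine has 8 × 1080 Ti GPUs, as shown below: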
Every 1.0s: nvidia-smi office4: Tue Aug 9 18:43:35 2022
Tue Aug 9 18:43:35 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03 Driver Version: 470.141.03 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:04:00.0 Off | N/A |
| 23% 29C P8 8W / 250W | 8MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:05:00.0 Off | N/A |
| 23% 26C P8 9W / 250W | 8MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA GeForce ... Off | 00000000:08:00.0 Off | N/A |
| 23% 25C P8 8W / 250W | 8MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA GeForce ... Off | 00000000:09:00.0 Off | N/A |
| 23% 25C P8 9W / 250W | 8MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 4 NVIDIA GeForce ... Off | 00000000:84:00.0 Off | N/A |
| 23% 28C P8 10W / 250W | 8MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 5 NVIDIA GeForce ... Off | 00000000:85:00.0 Off | N/A |
| 23% 26C P8 8W / 250W | 8MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 6 NVIDIA GeForce ... Off | 00000000:88:00.0 Off | N/A |
| 23% 26C P8 9W / 250W | 8MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 7 NVIDIA GeForce ... Off | 00000000:89:00.0 Off | N/A |
| 23% 24C P8 8W / 250W | 8MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2464 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 2464 G /usr/lib/xorg/Xorg 4MiB |
| 2 N/A N/A 2464 G /usr/lib/xorg/Xorg 4MiB |
| 3 N/A N/A 2464 G /usr/lib/xorg/Xorg 4MiB |
| 4 N/A N/A 2464 G /usr/lib/xorg/Xorg 4MiB |
| 5 N/A N/A 2464 G /usr/lib/xorg/Xorg 4MiB |
| 6 N/A N/A 2464 G /usr/lib/xorg/Xorg 4MiB |
| 7 N/A N/A 2464 G /usr/lib/xorg/Xorg 4MiB |
+-----------------------------------------------------------------------------+
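As a rough feasibility check for this setup, the ZeRO paper's accounting for mixed-precision Adam is about 16 bytes of model state per parameter (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer state), and stage 3 partitions all of it across the GPUs. The sketch below applies that arithmetic to a 3.5B-parameter model on 8 GPUs. It ignores activations, buffers and fragmentation, the parameter count is taken from the model name, and the function name is hypothetical.

```python
def zero3_state_gib_per_gpu(num_params, num_gpus, bytes_per_param=16):
    """Model-state memory per GPU in GiB when ZeRO-3 partitions everything evenly."""
    return num_params * bytes_per_param / num_gpus / 1024 ** 3


per_gpu = zero3_state_gib_per_gpu(3.5e9, 8)
unpartitioned = zero3_state_gib_per_gpu(3.5e9, 1)
print(f"{per_gpu:.2f} GiB per GPU with ZeRO-3, {unpartitioned:.2f} GiB unpartitioned")
```

That is about 6.5 GiB of model state per GPU against the 11178 MiB shown for each card above, which leaves little room for activations and is consistent with adding CPU offload as the script below does.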
The fine-tuning shell script:
#!/bin/bash
set -x -e
echo "START TIME: $(date)"
MICRO_BATCH_SIZE=1
ROOT_DIR=$(pwd)
ZERO_STAGE=3
config_json="$ROOT_DIR/training_config.json"
export MASTER_PORT=$[RANDOM%10000+30000]
# Deepspeed figures out GAS dynamically from dynamic GBS via set_train_batch_size()
cat <<EOT > $config_json
{
"train_micro_batch_size_per_gpu": $MICRO_BATCH_SIZE,
"steps_per_print": 1000,
"gradient_clipping": 1,
"zero_optimization": {
"stage": ${ZERO_STAGE},
"allgather_partitions": true,
"allgather_bucket_size": 1e7,
"overlap_comm": true,
"reduce_scatter": true,
"reduce_bucket_size": 1e7,
"contiguous_gradients": true,
"offload_optimizer": {
"device": "cpu",
"pin_memory": true
},
"offload_param": {
"device": "cpu",
"pin_memory": true
},
"stage3_max_live_parameters" : 1e7,
"stage3_max_reuse_distance" : 1e7,
"stage3_prefetch_bucket_size": 1e7,
"stage3_param_persistence_threshold": 1e7,
"sub_group_size" : 1e7,
"round_robin_gradients": true
},
"fp16": {
"enabled": true,
"loss_scale": 0,
"loss_scale_window": 1000,
"initial_scale_power": 16,
"hysteresis": 2,
"min_loss_scale": 1
},
"optimizer": {
"type": "AdamW",
"params": {
"lr": 0.001,
"betas": [
0.8,
0.999
],
"eps": 1e-8,
"weight_decay": 3e-7
}
}
}
EOT
export PL_DEEPSPEED_CONFIG_PATH=$config_json
TRAINER_ARGS="
--max_epochs 10 \
--gpus 8 \
--num_nodes 1 \
--strategy deepspeed_stage_${ZERO_STAGE}_offload \
--default_root_dir $ROOT_DIR \
--dirpath $ROOT_DIR/ckpt \
--save_top_k 3 \
--monitor train_loss \
--mode min \
--save_last \
"
DATA_DIR=/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/datasets
DATA_ARGS="
--data_dir $DATA_DIR \
--train_batchsize $MICRO_BATCH_SIZE \
--valid_batchsize $MICRO_BATCH_SIZE \
--train_data train.txt \
--valid_data valid.txt \
--test_data test.txt
"
PRETRAINED_MODEL_PATH="IDEA-CCNL/Wenzhong2.0-GPT2-3.5B-chinese"
MODEL_ARGS="
--pretrained_model_path ${PRETRAINED_MODEL_PATH} \
--output_save_path $ROOT_DIR/predict.json \
--learning_rate 1e-4 \
--weight_decay 0.1 \
--warmup 0.01 \
"
SCRIPTS_PATH=${ROOT_DIR}/finetune_gpt2.py
export CMD=" \
$SCRIPTS_PATH \
$TRAINER_ARGS \
$MODEL_ARGS \
$DATA_ARGS \
"
python ${CMD}
The fine-tuning Python script:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4,5,6,7"
import argparse
import torch as th
import pytorch_lightning as pl
from transformers import GPT2LMHeadModel
from pytorch_lightning import Trainer, loggers
from pytorch_lightning.callbacks import ModelCheckpoint
from transformers.optimization import get_linear_schedule_with_warmup
from dataset import GPT2DataModel


class GPT2FinetuneMedicalQAModelCheckpoint:
    @staticmethod
    def add_argparse_args(parent_args):
        parser = parent_args.add_argument_group('BaseModel')
        parser.add_argument('--monitor', default='train_loss', type=str)
        parser.add_argument('--mode', default='min', type=str)
        parser.add_argument('--dirpath', default='./ckpt/', type=str)
        parser.add_argument('--filename', default='model-{epoch:02d}-{train_loss:.4f}', type=str)
        parser.add_argument('--save_last', action='store_true', default=True)
        parser.add_argument('--save_top_k', default=3, type=float)
        parser.add_argument('--every_n_train_steps', default=1000, type=float)
        parser.add_argument('--save_weights_only', default=True, type=bool)
        return parent_args

    def __init__(self, args):
        self.callbacks = ModelCheckpoint(monitor=args.monitor, save_top_k=args.save_top_k, mode=args.mode,
                                         save_weights_only=args.save_weights_only, dirpath=args.dirpath,
                                         filename=args.filename, save_last=args.save_last)


class GPT2Finetune(pl.LightningModule):
    @staticmethod
    def add_model_specific_args(parent_args):
        parser = parent_args.add_argument_group("BaseModel")
        parser.add_argument("--learning_rate", default=1e-4, type=float)
        parser.add_argument("--weight_decay", default=0.1, type=float)
        parser.add_argument("--warmup", default=0.01, type=float)
        return parent_args

    def __init__(self, args, num_data):
        super().__init__()
        self.args = args
        self.num_data = num_data
        print('num_data:', num_data)
        self.model = GPT2LMHeadModel.from_pretrained(args.pretrained_model_path)

    def setup(self, stage) -> None:
        if stage == 'fit':
            num_gpus = self.trainer.gpus if self.trainer.gpus is not None else 0
            self.total_step = int(self.trainer.max_epochs * self.num_data /
                                  (max(1, num_gpus) * self.trainer.accumulate_grad_batches))
            print('Total training step:', self.total_step)

    def training_step(self, batch, batch_idx):
        output = self.model(input_ids=batch['input_ids'], attention_mask=batch['attention_mask'],
                            labels=batch['labels'])
        # output = self.model(input_ids=batch['input_ids'], labels=batch['labels'])
        # acc = self.comput_metrix(output.logits, batch['labels'])
        self.log('train_loss', output.loss)
        return output.loss

    def comput_metrix(self, logits, labels):
        y_pred = th.argmax(logits, dim=-1)
        y_pred = y_pred.view(size=(-1,))
        y_true = labels.view(size=(-1,)).float()
        corr = th.eq(y_pred, y_true)
        acc = th.sum(corr.float()) / labels.size()[0]
        return acc

    def validation_step(self, batch, batch_idx):
        output = self.model(input_ids=batch['input_ids'], attention_mask=batch['attention_mask'],
                            labels=batch['labels'])
        self.log('val_loss', output.loss)

    def configure_optimizers(self):
        no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']
        paras = list(filter(lambda p: p[1].requires_grad, self.named_parameters()))
        paras = [{
            'params':
            [p for n, p in paras if not any(nd in n for nd in no_decay)],
            'weight_decay': self.args.weight_decay
        }, {
            'params': [p for n, p in paras if any(nd in n for nd in no_decay)],
            'weight_decay': 0.0
        }]
        optimizer = th.optim.AdamW(paras, lr=self.args.learning_rate)
        scheduler = get_linear_schedule_with_warmup(
            optimizer, int(self.total_step * self.args.warmup),
            self.total_step)
        return [{
            'optimizer': optimizer,
            'lr_scheduler': {
                'scheduler': scheduler,
                'interval': 'step',
                'frequency': 1
            }
        }]


def main():
    total_parser = argparse.ArgumentParser("Summary Task")
    total_parser.add_argument('--do_eval_only', action='store_true', default=False)
    total_parser.add_argument('--pretrained_model_path', default=None, type=str)
    total_parser.add_argument('--output_save_path', default='./predict.json', type=str)
    # * Args for data preprocessing
    total_parser = GPT2DataModel.add_data_specific_args(total_parser)
    # * Args for training
    total_parser = Trainer.add_argparse_args(total_parser)
    total_parser = GPT2FinetuneMedicalQAModelCheckpoint.add_argparse_args(total_parser)
    total_parser = GPT2Finetune.add_model_specific_args(total_parser)
    # * Args for base model
    args = total_parser.parse_args()
    data_model = GPT2DataModel(args)
    model = GPT2Finetune(args, len(data_model.train_dataloader()))
    checkpoint_callback = GPT2FinetuneMedicalQAModelCheckpoint(args).callbacks
    logger = loggers.TensorBoardLogger(save_dir=os.path.join(args.default_root_dir, 'log/'), name='MedicalQA-GPT2')
    trainer = Trainer.from_argparse_args(args, logger=logger, callbacks=[checkpoint_callback])
    trainer.fit(model, data_model)
    model.model.save_pretrained("./models/finetune/gpt2")


if __name__ == '__main__':
    main()
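The total_step formula in setup() can be checked by hand against the values in this issue. The sketch restates it as a standalone function (the name total_training_steps is hypothetical) and plugs in 10 epochs, 234801 training batches and 8 GPUs. Gradient accumulation is taken to be 1, which is an assumption since the launch script does not set it.

```python
def total_training_steps(max_epochs, num_batches, num_gpus, accumulate_grad_batches=1):
    """Same arithmetic as setup(): epochs * batches / (gpus * accumulation), truncated."""
    return int(max_epochs * num_batches / (max(1, num_gpus) * accumulate_grad_batches))


# 10 * 234801 / 8 = 293501.25, truncated to an integer.
print(total_training_steps(10, 234801, 8))
```

The result agrees with the "Total training step: 293501" lines printed by each rank in the log.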
After the script is launched, data loading at the start works normally, but an error is raised once training begins. Log:
$ bash finetune_gpt2.sh
++ date
+ echo 'START TIME: 2022年 08月 09日 星期二 18:49:15 CST'
START TIME: 2022年 08月 09日 星期二 18:49:15 CST
+ MICRO_BATCH_SIZE=1
++ pwd
+ ROOT_DIR=/home/liuzhaofeng/nlg_pipeline/gpt2/dialog
+ ZERO_STAGE=3
+ config_json=/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/training_config.json
+ export MASTER_PORT=30021
+ MASTER_PORT=30021
+ cat
+ export PL_DEEPSPEED_CONFIG_PATH=/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/training_config.json
+ PL_DEEPSPEED_CONFIG_PATH=/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/training_config.json
+ TRAINER_ARGS='
--max_epochs 10 --gpus 8 --num_nodes 1 --strategy deepspeed_stage_3_offload --default_root_dir /home/liuzhaofeng/nlg_pipeline/gpt2/dialog --dirpath /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/ckpt --save_top_k 3 --monitor train_loss --mode min --save_last '
+ DATA_DIR=/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/datasets
+ DATA_ARGS='
--data_dir /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/datasets --train_batchsize 1 --valid_batchsize 1 --train_data train.txt --valid_data valid.txt --test_data test.txt
'
+ PRETRAINED_MODEL_PATH=IDEA-CCNL/Wenzhong2.0-GPT2-3.5B-chinese
+ MODEL_ARGS='
--pretrained_model_path IDEA-CCNL/Wenzhong2.0-GPT2-3.5B-chinese --output_save_path /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/predict.json --learning_rate 1e-4 --weight_decay 0.1 --warmup 0.01 '
+ SCRIPTS_PATH=/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py
+ export 'CMD= /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py
--max_epochs 10 --gpus 8 --num_nodes 1 --strategy deepspeed_stage_3_offload --default_root_dir /home/liuzhaofeng/nlg_pipeline/gpt2/dialog --dirpath /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/ckpt --save_top_k 3 --monitor train_loss --mode min --save_last
--pretrained_model_path IDEA-CCNL/Wenzhong2.0-GPT2-3.5B-chinese --output_save_path /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/predict.json --learning_rate 1e-4 --weight_decay 0.1 --warmup 0.01
--data_dir /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/datasets --train_batchsize 1 --valid_batchsize 1 --train_data train.txt --valid_data valid.txt --test_data test.txt
'
+ CMD=' /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py
--max_epochs 10 --gpus 8 --num_nodes 1 --strategy deepspeed_stage_3_offload --default_root_dir /home/liuzhaofeng/nlg_pipeline/gpt2/dialog --dirpath /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/ckpt --save_top_k 3 --monitor train_loss --mode min --save_last
--pretrained_model_path IDEA-CCNL/Wenzhong2.0-GPT2-3.5B-chinese --output_save_path /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/predict.json --learning_rate 1e-4 --weight_decay 0.1 --warmup 0.01
--data_dir /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/datasets --train_batchsize 1 --valid_batchsize 1 --train_data train.txt --valid_data valid.txt --test_data test.txt
'
+ python /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py --max_epochs 10 --gpus 8 --num_nodes 1 --strategy deepspeed_stage_3_offload --default_root_dir /home/liuzhaofeng/nlg_pipeline/gpt2/dialog --dirpath /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/ckpt --save_top_k 3 --monitor train_loss --mode min --save_last --pretrained_model_path IDEA-CCNL/Wenzhong2.0-GPT2-3.5B-chinese --output_save_path /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/predict.json --learning_rate 1e-4 --weight_decay 0.1 --warmup 0.01 --data_dir /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/datasets --train_batchsize 1 --valid_batchsize 1 --train_data train.txt --valid_data valid.txt --test_data test.txt
Using pad_token, but it is not set yet.
训练集处理进度: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 234801/234801 [00:00<00:00, 1141178.50it/s]
Using pad_token, but it is not set yet.
验证集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 744991.83it/s]
Using pad_token, but it is not set yet.
测试集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 801970.17it/s]
num_data: 234801
Loading DeepSpeed config from set PL_DEEPSPEED_CONFIG_PATH environment variable
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Using pad_token, but it is not set yet.
训练集处理进度: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 234801/234801 [00:00<00:00, 1466089.81it/s]
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
验证集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 725658.13it/s]
训练集处理进度: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 234801/234801 [00:00<00:00, 1313860.90it/s]
Using pad_token, but it is not set yet.
训练集处理进度: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 234801/234801 [00:00<00:00, 1346486.25it/s]
Using pad_token, but it is not set yet.
测试集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 735842.81it/s]
num_data: 234801
Using pad_token, but it is not set yet.
验证集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 719434.65it/s]
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
验证集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 719434.65it/s]
训练集处理进度: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 234801/234801 [00:00<00:00, 1120199.53it/s]
Using pad_token, but it is not set yet.
测试集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 747647.77it/s]
num_data: 234801
Using pad_token, but it is not set yet.
训练集处理进度: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 234801/234801 [00:00<00:00, 1271098.49it/s]
Using pad_token, but it is not set yet.
测试集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 775287.25it/s]
num_data: 234801
initializing deepspeed distributed: GLOBAL_RANK: 0, MEMBER: 1/8
Using pad_token, but it is not set yet.
验证集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 741043.11it/s]
Using pad_token, but it is not set yet.
训练集处理进度: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 234801/234801 [00:00<00:00, 1259098.78it/s]
Using pad_token, but it is not set yet.
验证集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 782519.40it/s]
Using pad_token, but it is not set yet.
训练集处理进度: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 234801/234801 [00:00<00:00, 1224649.98it/s]
Using pad_token, but it is not set yet.
测试集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 798915.05it/s]
num_data: 234801
Using pad_token, but it is not set yet.
验证集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 700217.70it/s]
Using pad_token, but it is not set yet.
测试集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 659481.76it/s]
num_data: 234801
Using pad_token, but it is not set yet.
验证集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 661562.15it/s]
Using pad_token, but it is not set yet.
测试集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 812849.61it/s]
num_data: 234801
Using pad_token, but it is not set yet.
测试集处理进度: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 679789.95it/s]
num_data: 234801
initializing deepspeed distributed: GLOBAL_RANK: 1, MEMBER: 2/8
initializing deepspeed distributed: GLOBAL_RANK: 3, MEMBER: 4/8
initializing deepspeed distributed: GLOBAL_RANK: 2, MEMBER: 3/8
initializing deepspeed distributed: GLOBAL_RANK: 4, MEMBER: 5/8
initializing deepspeed distributed: GLOBAL_RANK: 6, MEMBER: 7/8
initializing deepspeed distributed: GLOBAL_RANK: 5, MEMBER: 6/8
initializing deepspeed distributed: GLOBAL_RANK: 7, MEMBER: 8/8
Total training step: 293501
Total training step: 293501
Total training step: 293501
Total training step: 293501
Total training step: 293501
Total training step: 293501
Total training step: 293501
/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:611: UserWarning: Checkpoint directory /home/liuzhaofeng/nlg_pipeline/gpt2/dialog/ckpt exists and is not empty.
rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")
/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:2192: LightningDeprecationWarning: `Trainer.gpus` was deprecated in v1.6 and will be removed in v1.8. Please use `Trainer.num_devices` or `Trainer.device_ids` to get device information instead.
rank_zero_deprecation(
Total training step: 293501
LOCAL_RANK: 2 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 4 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 7 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 3 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 5 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 6 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
You have specified an optimizer and/or scheduler within the DeepSpeed config. It is recommended to define it in `LightningModule.configure_optimizers`.
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/liuzhaofeng/.cache/torch_extensions/py39_cu113/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 3.354057550430298 seconds
Loading extension module cpu_adam...
Loading extension module cpu_adam...
Loading extension module cpu_adam...
Time to load cpu_adam op: 3.3590314388275146 seconds
Time to load cpu_adam op: 3.3760969638824463 seconds
Time to load cpu_adam op: 3.365596055984497 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 3.258009910583496 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 3.3222544193267822 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 3.3342092037200928 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 3.2041637897491455 seconds
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Emitting ninja build file /home/liuzhaofeng/.cache/torch_extensions/py39_cu113/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
ninja: no work to do.
Loading extension module utils...
Time to load utils op: 0.5016887187957764 seconds
Loading extension module utils...
Loading extension module utils...
Time to load utils op: 0.20363640785217285 seconds
Time to load utils op: 0.10370874404907227 seconds
Loading extension module utils...
Time to load utils op: 0.2036135196685791 seconds
Loading extension module utils...
Time to load utils op: 0.20850586891174316 seconds
Loading extension module utils...
Time to load utils op: 0.20370721817016602 seconds
Loading extension module utils...
Loading extension module utils...
Time to load utils op: 0.20325350761413574 seconds
Time to load utils op: 0.2022240161895752 seconds
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Loading extension module utils...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Time to load utils op: 0.0008826255798339844 seconds
Time to load utils op: 0.0008733272552490234 seconds
Time to load utils op: 0.0008597373962402344 seconds
Time to load utils op: 0.0008990764617919922 seconds
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
Time to load utils op: 0.0011043548583984375 seconds
Time to load utils op: 0.0009999275207519531 seconds
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0008752346038818359 seconds
Using /home/liuzhaofeng/.cache/torch_extensions/py39_cu113 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0008261203765869141 seconds
  | Name  | Type            | Params
------------------------------------------
0 | model | GPT2LMHeadModel | 364
------------------------------------------
364       Trainable params
0         Non-trainable params
364       Total params
0.001     Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:240: PossibleUserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 48 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:240: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 48 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
Epoch 0: 0%| | 0/29364 [00:00<?, ?it/s]/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/strategies/ddp.py:420: UserWarning: Error handling mechanism for deadlock detection is uninitialized. Skipping check.
rank_zero_warn("Error handling mechanism for deadlock detection is uninitialized. Skipping check.")
Traceback (most recent call last):
File "/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py", line 147, in <module>
main()
File "/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py", line 141, in main
trainer.fit(model, data_model)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 768, in fit
self._call_and_handle_interrupt(
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 721, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 809, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1234, in _run
results = self._run_stage()
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1321, in _run_stage
return self._run_train()
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1351, in _run_train
self.fit_loop.run()
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 268, in advance
self._outputs = self.epoch_loop.run(self._data_fetcher)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 208, in advance
batch_output = self.batch_loop.run(batch, batch_idx)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
outputs = self.optimizer_loop.run(split_batch, optimizers, batch_idx)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 203, in advance
result = self._run_optimization(
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 256, in _run_optimization
self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 369, in _optimizer_step
self.trainer._call_lightning_module_hook(
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1593, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/core/lightning.py", line 1644, in optimizer_step
optimizer.step(closure=optimizer_closure)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/core/optimizer.py", line 168, in step
step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/strategies/ddp.py", line 278, in optimizer_step
optimizer_output = super().optimizer_step(optimizer, opt_idx, closure, model, **kwargs)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 193, in optimizer_step
return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/deepspeed.py", line 70, in optimizer_step
closure_result = closure()
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 148, in __call__
self._result = self.closure(*args, **kwargs)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 134, in closure
step_output = self._step_fn()
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 427, in _training_step
training_step_output = self.trainer._call_strategy_hook("training_step", *step_kwargs.values())
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1763, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/strategies/ddp.py", line 341, in training_step
return self.model(*args, **kwargs)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1588, in forward
loss = self.module(*inputs, **kwargs)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1148, in _call_impl
result = forward_call(*input, **kwargs)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/strategies/deepspeed.py", line 80, in forward
return super().forward(*inputs, **kwargs)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/pytorch_lightning/overrides/base.py", line 82, in forward
output = self.module.training_step(*inputs, **kwargs)
File "/home/liuzhaofeng/nlg_pipeline/gpt2/dialog/finetune_gpt2.py", line 76, in training_step
output = self.model(input_ids=batch['input_ids'], attention_mask=batch['attention_mask'],
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1148, in _call_impl
result = forward_call(*input, **kwargs)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1058, in forward
transformer_outputs = self.transformer(
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1148, in _call_impl
result = forward_call(*input, **kwargs)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 901, in forward
outputs = block(
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1148, in _call_impl
result = forward_call(*input, **kwargs)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 438, in forward
feed_forward_hidden_states = self.mlp(hidden_states)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1148, in _call_impl
result = forward_call(*input, **kwargs)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 365, in forward
hidden_states = self.c_fc(hidden_states)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1137, in _call_impl
result = hook(self, input)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 1408, in _pre_forward_module_hook
self.pre_sub_module_forward_function(module)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 1520, in pre_sub_module_forward_function
self.param_coordinator.fetch_sub_module(sub_module)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 448, in fetch_sub_module
self._all_gather(partitioned_params, async_op=False)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 525, in _all_gather
handles = partitioned_params[0].all_gather(
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 596, in all_gather
return self._all_gather(param_list, async_op=async_op, hierarchy=hierarchy)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 705, in _all_gather
ret_value = self._allgather_params_coalesced(all_gather_list, hierarchy)
File "/datafile/liuzhaofeng/anaconda3/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 936, in _allgather_params_coalesced
flat_tensor = torch.empty(tensor_size,
RuntimeError: CUDA out of memory. Tried to allocate 72.00 MiB (GPU 3; 10.92 GiB total capacity; 10.02 GiB already allocated; 32.69 MiB free; 10.16 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
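A 1080Ti has 11G of GPU memory, and by our calculation the parameters of the 3.5B model alone take 13G, so we wanted to use ZeRO-3 to partition the parameters across 8 GPUs. However, after launching our script we monitored how GPU memory usage changed, and the model did not appear to be partitioned.
Memory on every GPU was completely full, so the model does not seem to have been partitioned.
We would like to ask:
Thank you very much. We would be extremely grateful if you could help answer the questions above!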
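The 13G figure quoted above can be reproduced with a short back-of-the-envelope calculation. The sketch below assumes fp32 weights (4 bytes per parameter) and, for the training-state estimate, the 16 bytes per parameter that the ZeRO paper attributes to mixed-precision Adam. Neither assumption is stated in the issue, and activations, temporary buffers, and the all-gathered parameters of the layer currently executing are not counted.

```python
GIB = 2 ** 30


def state_gib(n_params: float, bytes_per_param: int, num_gpus: int = 1) -> float:
    """Memory in GiB for a per-parameter state split evenly across num_gpus."""
    return n_params * bytes_per_param / num_gpus / GIB


N_PARAMS = 3.5e9  # assumption: parameter count rounded to 3.5 billion
NUM_GPUS = 8      # number of GPUs mentioned in the issue

# fp32 weights only, first unpartitioned and then split across the 8 GPUs.
weights_fp32 = state_gib(N_PARAMS, 4)
weights_fp32_per_gpu = state_gib(N_PARAMS, 4, NUM_GPUS)

# Assumption: 16 bytes/param for mixed-precision Adam
# (fp16 weights + fp16 grads + fp32 weights, momentum, and variance).
train_state_per_gpu = state_gib(N_PARAMS, 16, NUM_GPUS)

print(f"fp32 weights, unpartitioned:        {weights_fp32:.2f} GiB")
print(f"fp32 weights, per GPU (8-way):      {weights_fp32_per_gpu:.2f} GiB")
print(f"full training state, per GPU:       {train_state_per_gpu:.2f} GiB")
```

Under these assumptions the partitioned training state alone would take more than half of an 11G card, so memory could fill up even when partitioning is working. Whether that explains the usage observed here cannot be determined from the log alone.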
This name is so much fun QAQ
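Hello everyone, and many thanks to IDEA for open-sourcing these pretrained models. They have helped us a great deal.
We have recently been trying to fine-tune the Wenzhong2.0-GPT2-3.5B-chinese model, but because of our machine configuration we cannot load the whole model, even on a single node with multiple GPUs.
I noticed that HuggingFace mentions this model was trained on 32 A100s, so I would like to ask about the technical setup used for training. If convenient, please help answer the following questions:
Thank you very much. We would be extremely grateful if you could help answer the questions above.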
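For readers in the same situation, ZeRO stage 3 with CPU offload is one commonly tried setup when the GPUs are small. The sketch below writes such a DeepSpeed config to a JSON file. The key names follow the DeepSpeed configuration documentation, but the values (micro-batch size 1, 8 accumulation steps, fp16 enabled, pinned CPU offload), the helper name write_config, and the file name ds_zero3_offload.json are illustrative assumptions, not settings the authors used for Wenzhong2.0-GPT2-3.5B-chinese.

```python
import json

# Assumed values for illustration only; tune them for the actual hardware.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,  # partition parameters, gradients, and optimizer states
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}


def write_config(path: str = "ds_zero3_offload.json") -> str:
    """Serialize the config above to a JSON file and return its path."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(ds_config, f, indent=2)
    return path
```

PyTorch Lightning's DeepSpeedStrategy can take such a file through its config argument. Offloading trades GPU memory for host memory and PCIe traffic, so training is expected to be slower.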
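1. OpenAI's original official CLIP demo: one image and three labels take only 4s with CUDA.
2. The official CLIP demo here: one image and five labels take 30s, with CUDA not used by default.
After I finally managed to modify the code to use CUDA, it took 50s.
I am baffled by this result.
Could an official CUDA-accelerated version be released soon?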
Thank you very much for open-sourcing this. I have a few questions:
1. Running the code below raises an error, but replacing AutoTokenizer with BertTokenizer makes it work.
from pathlib import Path
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(Path('E://work//Dialog//checkpoints//Zhouwenwang-110M'))
ValueError: unable to parse E:\work\Dialog\checkpoints\Zhouwenwang-110M\tokenizer_config.json as a URL or as a local path
The transformers version is 4.15.0.
The Python version is 3.7.
The torch version is 1.7.1.
Is this a version mismatch, or is the tokenizer_config.json file missing?
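2. I found that model_type in config.json differs between the 110M and 1.3B models: the former is megatron-bert and the latter is bert. Both are loaded with RoFormerModel, though, so does model_type not matter? However, in pretraining.py I found the following code:
model_mlm_type = {'bert': BertForMaskedLM,
                  'roformer': RoFormerForMaskedLM,
                  'megatron': MegatronBertForMaskedLM}
Here the MaskedLM class that is used differs depending on model_type.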
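3. In the provided example, predict uses the current token to predict the next one. During pretraining, however, the current position is replaced with [MASK], and the model predicts using different attention_mask schemes for source and target. That makes predict and pretraining inconsistent, so how can the model predict the next token from the current one at predict time? Am I misunderstanding something? The example below produces good predictions, so I do not understand how exactly the 1.3B model was trained.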
from model.roformer.modeling_roformer import RoFormerModel
from transformers import BertTokenizer
import torch
import numpy as np
from pathlib import Path
sentence = '清华大学位于'
max_length = 32
tokenizer = BertTokenizer.from_pretrained(Path('E://work//Dialog//checkpoints//Zhouwenwang-1.3B'))
model = RoFormerModel.from_pretrained(Path('E://work//Dialog//checkpoints//Zhouwenwang-1.3B'))
# model = model.to(torch.device('cuda'))
for i in range(max_length):
    # Re-encode the whole sentence generated so far, with [CLS] prepended.
    encode = torch.tensor(
        [[tokenizer.cls_token_id] + tokenizer.encode(sentence, add_special_tokens=False)]).long()
    # encode = encode.to(torch.device('cuda'))
    print(encode)
    logits = model(encode)[0]
    # Project the hidden states onto the word-embedding matrix to get vocabulary logits.
    logits = torch.nn.functional.linear(
        logits, model.embeddings.word_embeddings.weight)
    logits = torch.nn.functional.softmax(
        logits, dim=-1).cpu().detach().numpy()[0]
    # Sample the next token from the distribution at the last position.
    sentence = sentence + \
        tokenizer.decode(int(np.random.choice(logits.shape[1], p=logits[-1])))
    # Stop at the first Chinese full stop.
    if sentence[-1] == '。':
        break
print(sentence)
Looking forward to your reply!
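Regarding question 1, one quick way to narrow things down is to check which files the checkpoint directory actually contains before calling from_pretrained. The helper below is a hypothetical diagnostic: the function name and the list of expected file names are assumptions, not part of transformers or Fengshenbang-LM. It only reports which of the named files are absent.

```python
from pathlib import Path

# Assumed set of files to look for; adjust to the checkpoint being inspected.
EXPECTED_FILES = ("config.json", "tokenizer_config.json", "vocab.txt")


def missing_files(checkpoint_dir, expected=EXPECTED_FILES):
    """Return the names in `expected` that are not regular files in checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in expected if not (root / name).is_file()]


# Example call (path taken from the question above):
# missing_files("E:/work/Dialog/checkpoints/Zhouwenwang-110M")
```

If tokenizer_config.json shows up in the result, that would be consistent with the ValueError quoted in question 1, though it does not by itself confirm the cause.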