Comments (4)
I deleted that line of code and it runs now, but I still don't know what the underlying problem is.
from textbrewer.
Does the problem appear when you use multiple GPUs and fp16 at the same time? Are you using data parallelism (DataParallel) or distributed data parallelism (DistributedDataParallel)?
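(For context: apex requires amp.initialize to be applied to the bare model before any DataParallel/DistributedDataParallel wrapping, which is exactly what the RuntimeError further down complains about. A minimal sketch of the expected order, using a toy model as a stand-in:)

import torch
from apex import amp  # same NVIDIA apex package as in the traceback below

model = torch.nn.Linear(8, 2).cuda()                      # any bare, un-wrapped model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

# 1) fp16 setup must see the bare model ...
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

# 2) ... and only afterwards may it be wrapped for multi-GPU data parallelism
model = torch.nn.DataParallel(model)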
from textbrewer.
Distilling a custom three-layer model with the configuration below produces the same error as above:
device : cuda:5
fp16 : True
fp16_opt_level : O1
data_parallel : False
local_rank : -1
Full error log:
2021/03/30 17:49:30 - INFO - Main - output_dir:/data/private/syk/zyb/TextMatch/Bert/mnli_t8_TbaseST3tiny_L3SmmdMSE_lr10e40_bs16
2021/03/30 17:49:30 - INFO - Main - data_dir:/data/private/syk/zyb/TextMatch/data/MNLI
2021/03/30 17:49:30 - INFO - Main - max_seq_length:512
2021/03/30 17:49:30 - INFO - Main - do_train:True
2021/03/30 17:49:30 - INFO - Main - do_predict:True
2021/03/30 17:49:30 - INFO - Main - train_batch_size:16
2021/03/30 17:49:30 - INFO - Main - predict_batch_size:8
2021/03/30 17:49:30 - INFO - Main - learning_rate:0.0001
2021/03/30 17:49:30 - INFO - Main - num_train_epochs:40.0
2021/03/30 17:49:30 - INFO - Main - warmup_proportion:0.1
2021/03/30 17:49:30 - INFO - Main - no_cuda:False
2021/03/30 17:49:30 - INFO - faiss - Loading faiss with AVX2 support.
2021/03/30 17:49:30 - INFO - faiss - Loading faiss.
2021/03/30 17:49:30 - INFO - Main - gradient_accumulation_steps:1
2021/03/30 17:49:30 - INFO - Main - local_rank:-1
2021/03/30 17:49:30 - INFO - Main - fp16:True
2021/03/30 17:49:30 - INFO - Main - random_seed:9580
2021/03/30 17:49:30 - INFO - Main - weight_decay_rate:0.01
2021/03/30 17:49:30 - INFO - Main - do_eval:True
2021/03/30 17:49:30 - INFO - Main - PRINT_EVERY:200
2021/03/30 17:49:30 - INFO - Main - ckpt_frequency:1
2021/03/30 17:49:30 - INFO - Main - temperature:8.0
2021/03/30 17:49:30 - INFO - Main - teacher_cached:False
2021/03/30 17:49:30 - INFO - Main - task_name:mnli
2021/03/30 17:49:30 - INFO - Main - aux_task_name:None
2021/03/30 17:49:30 - INFO - Main - aux_data_dir:None
2021/03/30 17:49:30 - INFO - Main - matches:['L3_attention_mse', 'L3_hidden_smmd']
2021/03/30 17:49:30 - INFO - Main - model_config_json:/data/private/syk/zyb/TextMatch/Bert/mnli_t8_TbaseST3tiny_L3SmmdMSE_lr10e40_bs16/DIstillBertToT3.json.run
2021/03/30 17:49:30 - INFO - Main - do_test:False
2021/03/30 17:49:30 - WARNING - Main - Output directory () already exists and is not empty.
2021/03/30 17:49:30 - INFO - Main - device cuda:5 n_gpu 8 distributed training False
2021/03/30 17:49:30 - INFO - utils - Loading features from cached file /data/private/syk/zyb/TextMatch/data/MNLI/rbt_3_train_512_mnli
2021/03/30 17:50:07 - INFO - utils - Loading features from cached file /data/private/syk/zyb/TextMatch/data/MNLI/rbt_3_dev_512_mnli
2021/03/30 17:50:09 - INFO - utils - Loading features from cached file /data/private/syk/zyb/TextMatch/data/MNLI/rbt_3_dev_512_mnli-mm
2021/03/30 17:50:12 - INFO - Main - Data loaded
2021/03/30 17:50:14 - INFO - Main - Teacher Model bert loaded
2021/03/30 17:50:20 - INFO - Main - missing keys:['bert.embeddings.position_ids', 'classifier.weight', 'classifier.bias']
2021/03/30 17:50:20 - INFO - Main - unexpected keys:['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
2021/03/30 17:50:20 - INFO - Main - Student Model loaded
2021/03/30 17:50:20 - INFO - Main - Length of all_trainable_params: 2
2021/03/30 17:50:20 - INFO - Main - [{'layer_T': 4, 'layer_S': 1, 'feature': 'attention', 'loss': 'attention_mse', 'weight': 1}, {'layer_T': 8, 'layer_S': 2, 'feature': 'attention', 'loss': 'attention_mse', 'weight': 1}, {'layer_T': 12, 'layer_S': 3, 'feature': 'attention', 'loss': 'attention_mse', 'weight': 1}, {'layer_T': [0, 0], 'layer_S': [0, 0], 'feature': 'hidden', 'loss': 'mmd', 'weight': 1}, {'layer_T': [4, 4], 'layer_S': [1, 1], 'feature': 'hidden', 'loss': 'mmd', 'weight': 1}, {'layer_T': [8, 8], 'layer_S': [2, 2], 'feature': 'hidden', 'loss': 'mmd', 'weight': 1}, {'layer_T': [12, 12], 'layer_S': [3, 3], 'feature': 'hidden', 'loss': 'mmd', 'weight': 1}]
2021/03/30 17:50:20 - INFO - Main - gradient_accumulation_steps : 1
ckpt_frequency : 1
ckpt_epoch_frequency : 1
ckpt_steps : None
log_dir : /data/private/syk/zyb/TextMatch/Bert/mnli_t8_TbaseST3tiny_L3SmmdMSE_lr10e40_bs16
output_dir : /data/private/syk/zyb/TextMatch/Bert/mnli_t8_TbaseST3tiny_L3SmmdMSE_lr10e40_bs16
device : cuda:5
fp16 : True
fp16_opt_level : O1
data_parallel : False
local_rank : -1
2021/03/30 17:50:20 - INFO - Main - temperature : 8.0
temperature_scheduler : None
hard_label_weight : 0
hard_label_weight_scheduler : None
kd_loss_type : ce
kd_loss_weight : 1
kd_loss_weight_scheduler : None
probability_shift : False
intermediate_matches : [
IntermediateMatch: layer_T : 4, layer_S : 1, feature : attention, weight : 1, loss : attention_mse, proj : None,
IntermediateMatch: layer_T : 8, layer_S : 2, feature : attention, weight : 1, loss : attention_mse, proj : None,
IntermediateMatch: layer_T : 12, layer_S : 3, feature : attention, weight : 1, loss : attention_mse, proj : None,
IntermediateMatch: layer_T : [0, 0], layer_S : [0, 0], feature : hidden, weight : 1, loss : mmd, proj : None,
IntermediateMatch: layer_T : [4, 4], layer_S : [1, 1], feature : hidden, weight : 1, loss : mmd, proj : None,
IntermediateMatch: layer_T : [8, 8], layer_S : [2, 2], feature : hidden, weight : 1, loss : mmd, proj : None,
IntermediateMatch: layer_T : [12, 12], layer_S : [3, 3], feature : hidden, weight : 1, loss : mmd, proj : None]
is_caching_logits : False
2021/03/30 17:50:20 - INFO - Main - ***** Running training *****
2021/03/30 17:50:20 - INFO - Main - Num examples = 300000
2021/03/30 17:50:20 - INFO - Main - Forward batch size = 16
2021/03/30 17:50:20 - INFO - Main - Num backward steps = 750000
Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.
Defaults for this optimization level are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Traceback (most recent call last):
File "main.distill.py", line 199, in
main()
File "main.distill.py", line 192, in main
num_epochs = args.num_train_epochs, callback=callback_func,max_grad_norm=1)
File "/home/syk/anaconda3/lib/python3.7/site-packages/textbrewer/distiller_basic.py", line 277, in train
optimizer, scheduler, tqdm_disable = self.initialize_training(optimizer, scheduler_class, scheduler_args, scheduler)
File "/home/syk/anaconda3/lib/python3.7/site-packages/textbrewer/distiller_basic.py", line 89, in initialize_training
(self.model_S, self.model_T), optimizer = amp.initialize([self.model_S, self.model_T], optimizer, opt_level=self.t_config.fp16_opt_level)
File "/home/syk/anaconda3/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/frontend.py", line 358, in initialize
return _initialize(models, optimizers, _amp_state.opt_properties, num_losses, cast_model_outputs)
File "/home/syk/anaconda3/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/_initialize.py", line 168, in _initialize
check_models(models)
File "/home/syk/anaconda3/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/_initialize.py", line 75, in check_models
"Parallel wrappers should only be applied to the model(s) AFTER \n"
RuntimeError: Incoming model is an instance of torch.nn.parallel.DataParallel. Parallel wrappers should only be applied to the model(s) AFTER
the model(s) have been returned from amp.initialize.
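(One hedged workaround, assuming main.distill.py wrapped the teacher/student in torch.nn.DataParallel before handing them to the distiller, which is what the check in apex/amp/_initialize.py rejects: pass the bare modules in instead. unwrap_parallel below is an illustrative helper, not part of textbrewer.)

import torch

def unwrap_parallel(model):
    # DataParallel keeps the real network on .module; amp.initialize
    # (called inside textbrewer when fp16=True) needs that bare module.
    return model.module if isinstance(model, torch.nn.DataParallel) else model

# model_T = unwrap_parallel(model_T)  # teacher, before constructing the distiller
# model_S = unwrap_parallel(model_S)  # student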
from textbrewer.
Mystery solved... When you have multiple cards, it is best to set CUDA_VISIBLE_DEVICES to the cards you are actually using; otherwise n_gpu defaults to all of the cards...
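A minimal sketch of that fix, assuming the script is launched directly (the variable must be set before torch initializes CUDA, or on the command line as CUDA_VISIBLE_DEVICES=5 python main.distill.py):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "5"   # expose only the card(s) actually used

import torch
print(torch.cuda.device_count())           # now 1, so n_gpu == 1 and no DataParallel wrapping
# note: inside the process the visible card is addressed as cuda:0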
from textbrewer.