
RuntimeError: Incoming model is an instance of torch.nn.parallel.DataParallel. Parallel wrappers should only be applied to the model(s) AFTER the model(s) have been returned from amp.initialize. (textbrewer, closed)

airaria commented on May 27, 2024
RuntimeError: Incoming model is an instance of torch.nn.parallel.DataParallel. Parallel wrappers should only be applied to the model(s) AFTER the model(s) have been returned from amp.initialize.


Comments (4)

Youarerare commented on May 27, 2024

I deleted this line of code and it runs now, but I still don't know where the root problem is:

torch.distributed.init_process_group(backend='nccl')
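For context, apex's constraint is that amp.initialize must see the bare, unwrapped model; only afterwards may DataParallel or DistributedDataParallel be applied. A minimal sketch of the intended ordering, assuming the standard apex amp API and a hypothetical MyModel module:

import torch
from apex import amp

model = MyModel().to("cuda")                     # bare, unwrapped model (hypothetical MyModel)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# amp.initialize must receive the unwrapped model and the optimizer first
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

# only after amp.initialize may a parallel wrapper be applied
if torch.cuda.device_count() > 1:
    model = torch.nn.DataParallel(model)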


airaria commented on May 27, 2024

Does the problem occur when you use multi-GPU computation and fp16 at the same time? Are you using data parallelism (DataParallel) or distributed parallelism (DistributedDataParallel)?


Youarerare commented on May 27, 2024

Distilling a custom three-layer model with the following configuration raises the same error:
device : cuda:5
fp16 : True
fp16_opt_level : O1
data_parallel : False
local_rank : -1

Full error message:
2021/03/30 17:49:30 - INFO - faiss - Loading faiss with AVX2 support.
2021/03/30 17:49:30 - INFO - faiss - Loading faiss.
2021/03/30 17:49:30 - INFO - Main - output_dir:/data/private/syk/zyb/TextMatch/Bert/mnli_t8_TbaseST3tiny_L3SmmdMSE_lr10e40_bs16
2021/03/30 17:49:30 - INFO - Main - data_dir:/data/private/syk/zyb/TextMatch/data/MNLI
2021/03/30 17:49:30 - INFO - Main - max_seq_length:512
2021/03/30 17:49:30 - INFO - Main - do_train:True
2021/03/30 17:49:30 - INFO - Main - do_predict:True
2021/03/30 17:49:30 - INFO - Main - train_batch_size:16
2021/03/30 17:49:30 - INFO - Main - predict_batch_size:8
2021/03/30 17:49:30 - INFO - Main - learning_rate:0.0001
2021/03/30 17:49:30 - INFO - Main - num_train_epochs:40.0
2021/03/30 17:49:30 - INFO - Main - warmup_proportion:0.1
2021/03/30 17:49:30 - INFO - Main - no_cuda:False
2021/03/30 17:49:30 - INFO - Main - gradient_accumulation_steps:1
2021/03/30 17:49:30 - INFO - Main - local_rank:-1
2021/03/30 17:49:30 - INFO - Main - fp16:True
2021/03/30 17:49:30 - INFO - Main - random_seed:9580
2021/03/30 17:49:30 - INFO - Main - weight_decay_rate:0.01
2021/03/30 17:49:30 - INFO - Main - do_eval:True
2021/03/30 17:49:30 - INFO - Main - PRINT_EVERY:200
2021/03/30 17:49:30 - INFO - Main - ckpt_frequency:1
2021/03/30 17:49:30 - INFO - Main - temperature:8.0
2021/03/30 17:49:30 - INFO - Main - teacher_cached:False
2021/03/30 17:49:30 - INFO - Main - task_name:mnli
2021/03/30 17:49:30 - INFO - Main - aux_task_name:None
2021/03/30 17:49:30 - INFO - Main - aux_data_dir:None
2021/03/30 17:49:30 - INFO - Main - matches:['L3_attention_mse', 'L3_hidden_smmd']
2021/03/30 17:49:30 - INFO - Main - model_config_json:/data/private/syk/zyb/TextMatch/Bert/mnli_t8_TbaseST3tiny_L3SmmdMSE_lr10e40_bs16/DIstillBertToT3.json.run
2021/03/30 17:49:30 - INFO - Main - do_test:False
2021/03/30 17:49:30 - WARNING - Main - Output directory () already exists and is not empty.
2021/03/30 17:49:30 - INFO - Main - device cuda:5 n_gpu 8 distributed training False
2021/03/30 17:49:30 - INFO - utils - Loading features from cached file /data/private/syk/zyb/TextMatch/data/MNLI/rbt_3_train_512_mnli
2021/03/30 17:50:07 - INFO - utils - Loading features from cached file /data/private/syk/zyb/TextMatch/data/MNLI/rbt_3_dev_512_mnli
2021/03/30 17:50:09 - INFO - utils - Loading features from cached file /data/private/syk/zyb/TextMatch/data/MNLI/rbt_3_dev_512_mnli-mm
2021/03/30 17:50:12 - INFO - Main - Data loaded
2021/03/30 17:50:14 - INFO - Main - Teacher Model bert loaded
2021/03/30 17:50:20 - INFO - Main - missing keys:['bert.embeddings.position_ids', 'classifier.weight', 'classifier.bias']
2021/03/30 17:50:20 - INFO - Main - unexpected keys:['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
2021/03/30 17:50:20 - INFO - Main - Student Model loaded
2021/03/30 17:50:20 - INFO - Main - Length of all_trainable_params: 2
2021/03/30 17:50:20 - INFO - Main - [{'layer_T': 4, 'layer_S': 1, 'feature': 'attention', 'loss': 'attention_mse', 'weight': 1}, {'layer_T': 8, 'layer_S': 2, 'feature': 'attention', 'loss': 'attention_mse', 'weight': 1}, {'layer_T': 12, 'layer_S': 3, 'feature': 'attention', 'loss': 'attention_mse', 'weight': 1}, {'layer_T': [0, 0], 'layer_S': [0, 0], 'feature': 'hidden', 'loss': 'mmd', 'weight': 1}, {'layer_T': [4, 4], 'layer_S': [1, 1], 'feature': 'hidden', 'loss': 'mmd', 'weight': 1}, {'layer_T': [8, 8], 'layer_S': [2, 2], 'feature': 'hidden', 'loss': 'mmd', 'weight': 1}, {'layer_T': [12, 12], 'layer_S': [3, 3], 'feature': 'hidden', 'loss': 'mmd', 'weight': 1}]
2021/03/30 17:50:20 - INFO - Main - gradient_accumulation_steps : 1
ckpt_frequency : 1
ckpt_epoch_frequency : 1
ckpt_steps : None
log_dir : /data/private/syk/zyb/TextMatch/Bert/mnli_t8_TbaseST3tiny_L3SmmdMSE_lr10e40_bs16
output_dir : /data/private/syk/zyb/TextMatch/Bert/mnli_t8_TbaseST3tiny_L3SmmdMSE_lr10e40_bs16
device : cuda:5
fp16 : True
fp16_opt_level : O1
data_parallel : False
local_rank : -1

2021/03/30 17:50:20 - INFO - Main - temperature : 8.0
temperature_scheduler : None
hard_label_weight : 0
hard_label_weight_scheduler : None
kd_loss_type : ce
kd_loss_weight : 1
kd_loss_weight_scheduler : None
probability_shift : False
intermediate_matches : [
IntermediateMatch: layer_T : 4, layer_S : 1, feature : attention, weight : 1, loss : attention_mse, proj : None,
IntermediateMatch: layer_T : 8, layer_S : 2, feature : attention, weight : 1, loss : attention_mse, proj : None,
IntermediateMatch: layer_T : 12, layer_S : 3, feature : attention, weight : 1, loss : attention_mse, proj : None,
IntermediateMatch: layer_T : [0, 0], layer_S : [0, 0], feature : hidden, weight : 1, loss : mmd, proj : None,
IntermediateMatch: layer_T : [4, 4], layer_S : [1, 1], feature : hidden, weight : 1, loss : mmd, proj : None,
IntermediateMatch: layer_T : [8, 8], layer_S : [2, 2], feature : hidden, weight : 1, loss : mmd, proj : None,
IntermediateMatch: layer_T : [12, 12], layer_S : [3, 3], feature : hidden, weight : 1, loss : mmd, proj : None]
is_caching_logits : False

2021/03/30 17:50:20 - INFO - Main - ***** Running training *****
2021/03/30 17:50:20 - INFO - Main - Num examples = 300000
2021/03/30 17:50:20 - INFO - Main - Forward batch size = 16
2021/03/30 17:50:20 - INFO - Main - Num backward steps = 750000
Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Traceback (most recent call last):
  File "main.distill.py", line 199, in <module>
    main()
  File "main.distill.py", line 192, in main
    num_epochs = args.num_train_epochs, callback=callback_func,max_grad_norm=1)
  File "/home/syk/anaconda3/lib/python3.7/site-packages/textbrewer/distiller_basic.py", line 277, in train
    optimizer, scheduler, tqdm_disable = self.initialize_training(optimizer, scheduler_class, scheduler_args, scheduler)
  File "/home/syk/anaconda3/lib/python3.7/site-packages/textbrewer/distiller_basic.py", line 89, in initialize_training
    (self.model_S, self.model_T), optimizer = amp.initialize([self.model_S, self.model_T], optimizer, opt_level=self.t_config.fp16_opt_level)
  File "/home/syk/anaconda3/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/frontend.py", line 358, in initialize
    return _initialize(models, optimizers, _amp_state.opt_properties, num_losses, cast_model_outputs)
  File "/home/syk/anaconda3/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/_initialize.py", line 168, in _initialize
    check_models(models)
  File "/home/syk/anaconda3/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/_initialize.py", line 75, in check_models
    "Parallel wrappers should only be applied to the model(s) AFTER \n"
RuntimeError: Incoming model is an instance of torch.nn.parallel.DataParallel. Parallel wrappers should only be applied to the model(s) AFTER
the model(s) have been returned from amp.initialize.
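Since the distiller calls amp.initialize on model_S and model_T itself (distiller_basic.py, line 89 in the traceback), the models handed to it must not already be wrapped. A hedged sketch of defensively unwrapping before constructing the distiller; the unwrap helper is illustrative, not part of textbrewer:

import torch

def unwrap(model):
    # DataParallel / DistributedDataParallel keep the original model in .module
    if isinstance(model, (torch.nn.DataParallel,
                          torch.nn.parallel.DistributedDataParallel)):
        return model.module
    return model

# hand the bare modules to the distiller; it calls amp.initialize itself and
# handles data parallelism via the data_parallel option in TrainingConfig
model_T = unwrap(model_T)
model_S = unwrap(model_S)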


Youarerare commented on May 27, 2024

Case solved... When you have multiple cards, it is best to set CUDA_VISIBLE_DEVICES to the cards you actually use; otherwise n_gpu defaults to all of the cards...
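In other words, if the training script wraps the model in torch.nn.DataParallel whenever n_gpu (torch.cuda.device_count()) is greater than one, restricting the visible devices keeps n_gpu at 1 and the wrapping never happens. A minimal sketch; the device index 5 matches the config above and must be set before CUDA is initialized:

import os

# expose only the single card actually used, so torch.cuda.device_count() == 1
# and the script never wraps the model in DataParallel
os.environ["CUDA_VISIBLE_DEVICES"] = "5"

Equivalently, prefix the launch command with CUDA_VISIBLE_DEVICES=5.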
