
Comments (5)

Hanrui-Wang commented on August 14, 2024

Hi ihish52,

Thanks for your question!
Could you provide more details about the command you ran and which line raised the error?


ihish52 commented on August 14, 2024

Thanks for the quick reply. Attached is the config file I am using for the evolutionary search on my i5 CPU (barely changed from your example). No NVIDIA drivers are installed, so I do not think the GPU is a factor.

wmt14ende_i5.zip

Below is the output when I run evo_search.py:

python3 evo_search.py --configs=configs/wmt14.en-de/supertransformer/space0.yml --evo-configs=configs/wmt14.en-de/evo_search/wmt14ende_i5.yml

Namespace(activation_dropout=0.0, activation_fn='relu', adam_betas='(0.9, 0.98)', adam_eps=1e-08, adaptive_input=False, adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, arch='transformersuper_wmt_en_de', attention_dropout=0.1, beam=5, best_checkpoint_metric='loss', bucket_cap_mb=25, ckpt_path='./latency_dataset/predictors/wmt14ende_cpu_i5.pt', clip_norm=0.0, configs='configs/wmt14.en-de/supertransformer/space0.yml', cpu=False, criterion='label_smoothed_cross_entropy', crossover_size=50, curriculum=0, data='data/binary/wmt16_en_de', dataset_impl=None, ddp_backend='no_c10d', decoder_arbitrary_ende_attn_all_subtransformer=None, decoder_arbitrary_ende_attn_choice=[-1, 1, 2], decoder_attention_heads=8, decoder_embed_choice=[640, 512], decoder_embed_dim=640, decoder_embed_dim_subtransformer=None, decoder_embed_path=None, decoder_ende_attention_heads_all_subtransformer=None, decoder_ende_attention_heads_choice=[8, 4], decoder_ffn_embed_dim=3072, decoder_ffn_embed_dim_all_subtransformer=None, decoder_ffn_embed_dim_choice=[3072, 2048, 1024], decoder_input_dim=640, decoder_layer_num_choice=[6, 5, 4, 3, 2, 1], decoder_layers=6, decoder_learned_pos=False, decoder_normalize_before=False, decoder_output_dim=640, decoder_self_attention_heads_all_subtransformer=None, decoder_self_attention_heads_choice=[8, 4], device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, diverse_beam_groups=-1, diverse_beam_strength=0.5, dropout=0.3, encoder_attention_heads=8, encoder_embed_choice=[640, 512], encoder_embed_dim=640, encoder_embed_dim_subtransformer=None, encoder_embed_path=None, encoder_ffn_embed_dim=3072, encoder_ffn_embed_dim_all_subtransformer=None, encoder_ffn_embed_dim_choice=[3072, 2048, 1024], encoder_layer_num_choice=[6], encoder_layers=6, encoder_learned_pos=False, encoder_normalize_before=False, encoder_self_attention_heads_all_subtransformer=None, encoder_self_attention_heads_choice=[8, 4], evo_configs='configs/wmt14.en-de/evo_search/wmt14ende_i5.yml', evo_iter=30, feature_norm=[640.0, 6.0, 2048.0, 6.0, 640.0, 6.0, 2048.0, 6.0, 6.0, 2.0], find_unused_parameters=False, fix_batches_to_gpus=False, fp16=True, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, get_attn=False, keep_interval_updates=-1, keep_last_epochs=20, label_smoothing=0.1, lat_norm=700.0, latency_constraint=6000.0, lazy_load=False, left_pad_source='True', left_pad_target='False', lenpen=1, log_format=None, log_interval=1000, lr=[1e-07], lr_period_updates=-1, lr_scheduler='cosine', lr_shrink=1.0, match_source_len=False, max_epoch=0, max_len_a=0, max_len_b=200, max_lr=0.001, max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=4096, max_tokens_valid=4096, max_update=40000, maximize_best_checkpoint_metric=False, memory_efficient_fp16=False, min_len=1, min_loss_scale=0.0001, min_lr=-1, model_overrides='{}', mutation_prob=0.3, mutation_size=50, nbest=1, no_beamable_mm=False, no_early_stop=False, no_epoch_checkpoints=False, no_last_checkpoints=False, no_progress_bar=False, no_repeat_ngram_size=0, no_save=False, no_save_optimizer_state=False, no_token_positional_embeddings=False, num_workers=10, optimizer='adam', optimizer_overrides='{}', parent_size=25, path=None, pdb=False, population_size=125, prefix_size=0, print_alignment=False, profile_latency=False, qkv_dim=512, quiet=False, raw_text=False, remove_bpe=None, 
replace_unk=None, required_batch_size_multiple=8, reset_dataloader=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='./downloaded_models/HAT_wmt14ende_super_space0.pt', results_path=None, sacrebleu=False, sampling=False, sampling_topk=-1, sampling_topp=-1.0, save_dir='checkpoints/wmt14.en-de/supertransformer/space0', save_interval=10, save_interval_updates=0, score_reference=False, seed=1, sentence_avg=False, share_all_embeddings=True, share_decoder_input_output_embed=False, skip_invalid_size_inputs_valid_test=False, source_lang=None, t_mult=1, target_lang=None, task='translation', tbmf_wrapper=False, temperature=1.0, tensorboard_logdir='checkpoints/wmt14.en-de/supertransformer/space0/tensorboard', threshold_loss_scale=None, train_subset='train', unkpen=0, unnormalized=False, update_freq=[16], upsample_primary=1, use_bmuf=False, user_dir=None, valid_cnt_max=1000000000.0, valid_subset='valid', validate_interval=10, vocab_original_scaling=False, warmup_init_lr=1e-07, warmup_updates=10000, weight_decay=0.0, write_config_path='configs/wmt14.en-de/subtransformer/wmt14ende_i5.yml')
| [en] dictionary: 32768 types
| [de] dictionary: 32768 types
| loaded 3000 examples from: data/binary/wmt16_en_de/valid.en-de.en
| loaded 3000 examples from: data/binary/wmt16_en_de/valid.en-de.de
| data/binary/wmt16_en_de valid en-de 3000 examples
| Fallback to xavier initializer
TransformerSuperModel(
  (encoder): TransformerEncoder(
    (embed_tokens): EmbeddingSuper(32768, 640, padding_idx=1)
    (embed_positions): SinusoidalPositionalEmbedding()
    (layers): ModuleList(
      (0): TransformerEncoderLayer(
        (self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
        (fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
        (final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
      )
      (1): TransformerEncoderLayer(
        (self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
        (fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
        (final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
      )
      (2): TransformerEncoderLayer(
        (self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
        (fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
        (final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
      )
      (3): TransformerEncoderLayer(
        (self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
        (fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
        (final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
      )
      (4): TransformerEncoderLayer(
        (self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
        (fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
        (final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
      )
      (5): TransformerEncoderLayer(
        (self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
        (fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
        (final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
      )
    )
  )
  (decoder): TransformerDecoder(
    (embed_tokens): EmbeddingSuper(32768, 640, padding_idx=1)
    (embed_positions): SinusoidalPositionalEmbedding()
    (layers): ModuleList(
      (0): TransformerDecoderLayer(
        (self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (encoder_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
        (fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
        (final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
      )
      (1): TransformerDecoderLayer(
        (self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (encoder_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
        (fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
        (final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
      )
      (2): TransformerDecoderLayer(
        (self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (encoder_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
        (fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
        (final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
      )
      (3): TransformerDecoderLayer(
        (self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (encoder_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
        (fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
        (final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
      )
      (4): TransformerDecoderLayer(
        (self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (encoder_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
        (fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
        (final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
      )
      (5): TransformerDecoderLayer(
        (self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (encoder_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
        (fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
        (final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
      )
    )
  )
)
| loaded checkpoint ./downloaded_models/HAT_wmt14ende_super_space0.pt (epoch 136 @ 0 updates)
| loading train data for epoch 136
| loaded 3000 examples from: data/binary/wmt16_en_de/valid.en-de.en
| loaded 3000 examples from: data/binary/wmt16_en_de/valid.en-de.de
| data/binary/wmt16_en_de valid en-de 3000 examples
| Start Iteration 0:
Traceback (most recent call last):
File "evo_search.py", line 106, in
cli_main()
File "evo_search.py", line 102, in cli_main
main(args)
File "evo_search.py", line 51, in main
best_config = evolver.run_evo_search()
File "/home/hishan/hardware-aware-transformers/fairseq/evolution.py", line 217, in run_evo_search
popu_scores = self.get_scores(popu)
File "/home/hishan/hardware-aware-transformers/fairseq/evolution.py", line 281, in get_scores
scores = validate_all(self.args, self.trainer, self.task, self.epoch_iter, configs)
File "/home/hishan/hardware-aware-transformers/fairseq/evolution.py", line 401, in validate_all
trainer.valid_step(sample)
File "/home/hishan/hardware-aware-transformers/fairseq/trainer.py", line 451, in valid_step
raise e
File "/home/hishan/hardware-aware-transformers/fairseq/trainer.py", line 438, in valid_step
_loss, sample_size, logging_output = self.task.valid_step(
File "/home/hishan/hardware-aware-transformers/fairseq/tasks/fairseq_task.py", line 241, in valid_step
loss, sample_size, logging_output = criterion(model, sample)
File "/home/hishan/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/hishan/hardware-aware-transformers/fairseq/criterions/label_smoothed_cross_entropy.py", line 56, in forward
net_output = model(**sample['net_input'])
File "/home/hishan/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/hishan/hardware-aware-transformers/fairseq/models/fairseq_model.py", line 222, in forward
encoder_out = self.encoder(src_tokens, src_lengths=src_lengths, **kwargs)
File "/home/hishan/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/hishan/hardware-aware-transformers/fairseq/models/transformer_super.py", line 401, in forward
x = layer(x, encoder_padding_mask)
File "/home/hishan/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/hishan/hardware-aware-transformers/fairseq/models/transformer_super.py", line 900, in forward
x, _ = self.self_attn(query=x, key=x, value=x, key_padding_mask=encoder_padding_mask)
File "/home/hishan/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/hishan/hardware-aware-transformers/fairseq/modules/multihead_attention_super.py", line 182, in forward
q, k, v = self.in_proj_qkv(query)
File "/home/hishan/hardware-aware-transformers/fairseq/modules/multihead_attention_super.py", line 314, in in_proj_qkv
return self._in_proj(query, sample_dim=self.sample_q_embed_dim).chunk(3, dim=-1)
File "/home/hishan/hardware-aware-transformers/fairseq/modules/multihead_attention_super.py", line 351, in _in_proj
return F.linear(input, weight, bias)
File "/home/hishan/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 1676, in linear
output = input.matmul(weight.t())
RuntimeError: _th_addmm_out not supported on CPUType for Half
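
For context, the failing call can be reproduced in isolation. This is a minimal sketch, assuming an older PyTorch build (e.g. 1.6, matching the paths in the traceback) whose CPU backend lacks fp16 matmul kernels; newer releases may run it without error:

import torch
import torch.nn.functional as F

# fp16 tensors on CPU, mimicking the fused q/k/v projection in _in_proj
query = torch.randn(10, 4, 512).half()
weight = torch.randn(3 * 512, 512).half()
try:
    F.linear(query, weight)  # 3D input takes the matmul path seen in the traceback
except RuntimeError as e:
    print(e)  # e.g. "_th_addmm_out not supported on CPUType for Half"

Note that the Namespace above shows fp16=True, so the half-precision path is taken even though everything runs on CPU.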


ihish52 commented on August 14, 2024

I'm still not sure what caused this. I created a new environment and reinstalled all dependencies with exactly the versions stated in the requirements, and it worked. Closing this issue. Thanks for the response.


Hanrui-Wang commented on August 14, 2024

Hi ihish52,

Sorry for the late reply; I have been very busy over the past several weeks.
The error occurs because some float16 operations are not supported by PyTorch on CPU, so I fixed it by using fp32 when running the evolutionary search on CPU. (commit)
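
The idea of the fix, as a minimal sketch (ensure_cpu_fp32 is a hypothetical name for illustration; the actual change is in the commit above): clear the fp16 flag before building the trainer whenever no CUDA device is available, so the search runs in fp32 on CPU.

import torch

# Hypothetical helper sketching the workaround: fall back to fp32 on CPU,
# since fp16 kernels are unavailable there.
def ensure_cpu_fp32(args):
    if args.fp16 and not torch.cuda.is_available():
        print('| fp16 is not supported on CPU, falling back to fp32')
        args.fp16 = False
    return args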

Thanks for your contribution!

Best,
Hanrui


Hanrui-Wang commented on August 14, 2024

Hi Hishan,

I will close the issue for now. Feel free to reopen if you have any further questions!

Best,
Hanrui

