
Comments (5)

Hanrui-Wang commented on August 14, 2024

Hi ihish52,

Thanks for your question!
Could you provide more details about the command you ran and which line raised the error?


ihish52 commented on August 14, 2024

Thanks for the quick reply. Attached is the config file I am using for the evolutionary search on my i5 CPU (barely changed from your example). No NVIDIA drivers are installed, so I do not think the GPU is a factor.

wmt14ende_i5.zip

Below is the output when I run evo_search.py:

python3 evo_search.py --configs=configs/wmt14.en-de/supertransformer/space0.yml --evo-configs=configs/wmt14.en-de/evo_search/wmt14ende_i5.yml

Namespace(activation_dropout=0.0, activation_fn='relu', adam_betas='(0.9, 0.98)', adam_eps=1e-08, adaptive_input=False, adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, arch='transformersuper_wmt_en_de', attention_dropout=0.1, beam=5, best_checkpoint_metric='loss', bucket_cap_mb=25, ckpt_path='./latency_dataset/predictors/wmt14ende_cpu_i5.pt', clip_norm=0.0, configs='configs/wmt14.en-de/supertransformer/space0.yml', cpu=False, criterion='label_smoothed_cross_entropy', crossover_size=50, curriculum=0, data='data/binary/wmt16_en_de', dataset_impl=None, ddp_backend='no_c10d', decoder_arbitrary_ende_attn_all_subtransformer=None, decoder_arbitrary_ende_attn_choice=[-1, 1, 2], decoder_attention_heads=8, decoder_embed_choice=[640, 512], decoder_embed_dim=640, decoder_embed_dim_subtransformer=None, decoder_embed_path=None, decoder_ende_attention_heads_all_subtransformer=None, decoder_ende_attention_heads_choice=[8, 4], decoder_ffn_embed_dim=3072, decoder_ffn_embed_dim_all_subtransformer=None, decoder_ffn_embed_dim_choice=[3072, 2048, 1024], decoder_input_dim=640, decoder_layer_num_choice=[6, 5, 4, 3, 2, 1], decoder_layers=6, decoder_learned_pos=False, decoder_normalize_before=False, decoder_output_dim=640, decoder_self_attention_heads_all_subtransformer=None, decoder_self_attention_heads_choice=[8, 4], device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, diverse_beam_groups=-1, diverse_beam_strength=0.5, dropout=0.3, encoder_attention_heads=8, encoder_embed_choice=[640, 512], encoder_embed_dim=640, encoder_embed_dim_subtransformer=None, encoder_embed_path=None, encoder_ffn_embed_dim=3072, encoder_ffn_embed_dim_all_subtransformer=None, encoder_ffn_embed_dim_choice=[3072, 2048, 1024], encoder_layer_num_choice=[6], encoder_layers=6, encoder_learned_pos=False, encoder_normalize_before=False, encoder_self_attention_heads_all_subtransformer=None, encoder_self_attention_heads_choice=[8, 4], evo_configs='configs/wmt14.en-de/evo_search/wmt14ende_i5.yml', evo_iter=30, feature_norm=[640.0, 6.0, 2048.0, 6.0, 640.0, 6.0, 2048.0, 6.0, 6.0, 2.0], find_unused_parameters=False, fix_batches_to_gpus=False, fp16=True, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, get_attn=False, keep_interval_updates=-1, keep_last_epochs=20, label_smoothing=0.1, lat_norm=700.0, latency_constraint=6000.0, lazy_load=False, left_pad_source='True', left_pad_target='False', lenpen=1, log_format=None, log_interval=1000, lr=[1e-07], lr_period_updates=-1, lr_scheduler='cosine', lr_shrink=1.0, match_source_len=False, max_epoch=0, max_len_a=0, max_len_b=200, max_lr=0.001, max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=4096, max_tokens_valid=4096, max_update=40000, maximize_best_checkpoint_metric=False, memory_efficient_fp16=False, min_len=1, min_loss_scale=0.0001, min_lr=-1, model_overrides='{}', mutation_prob=0.3, mutation_size=50, nbest=1, no_beamable_mm=False, no_early_stop=False, no_epoch_checkpoints=False, no_last_checkpoints=False, no_progress_bar=False, no_repeat_ngram_size=0, no_save=False, no_save_optimizer_state=False, no_token_positional_embeddings=False, num_workers=10, optimizer='adam', optimizer_overrides='{}', parent_size=25, path=None, pdb=False, population_size=125, prefix_size=0, print_alignment=False, profile_latency=False, qkv_dim=512, quiet=False, raw_text=False, remove_bpe=None, 
replace_unk=None, required_batch_size_multiple=8, reset_dataloader=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='./downloaded_models/HAT_wmt14ende_super_space0.pt', results_path=None, sacrebleu=False, sampling=False, sampling_topk=-1, sampling_topp=-1.0, save_dir='checkpoints/wmt14.en-de/supertransformer/space0', save_interval=10, save_interval_updates=0, score_reference=False, seed=1, sentence_avg=False, share_all_embeddings=True, share_decoder_input_output_embed=False, skip_invalid_size_inputs_valid_test=False, source_lang=None, t_mult=1, target_lang=None, task='translation', tbmf_wrapper=False, temperature=1.0, tensorboard_logdir='checkpoints/wmt14.en-de/supertransformer/space0/tensorboard', threshold_loss_scale=None, train_subset='train', unkpen=0, unnormalized=False, update_freq=[16], upsample_primary=1, use_bmuf=False, user_dir=None, valid_cnt_max=1000000000.0, valid_subset='valid', validate_interval=10, vocab_original_scaling=False, warmup_init_lr=1e-07, warmup_updates=10000, weight_decay=0.0, write_config_path='configs/wmt14.en-de/subtransformer/wmt14ende_i5.yml')
| [en] dictionary: 32768 types
| [de] dictionary: 32768 types
| loaded 3000 examples from: data/binary/wmt16_en_de/valid.en-de.en
| loaded 3000 examples from: data/binary/wmt16_en_de/valid.en-de.de
| data/binary/wmt16_en_de valid en-de 3000 examples
| Fallback to xavier initializer
TransformerSuperModel(
  (encoder): TransformerEncoder(
    (embed_tokens): EmbeddingSuper(32768, 640, padding_idx=1)
    (embed_positions): SinusoidalPositionalEmbedding()
    (layers): ModuleList(
      (0): TransformerEncoderLayer(
        (self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
        (fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
        (final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
      )
      (1): TransformerEncoderLayer(
        (self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
        (fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
        (final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
      )
      (2): TransformerEncoderLayer(
        (self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
        (fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
        (final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
      )
      (3): TransformerEncoderLayer(
        (self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
        (fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
        (final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
      )
      (4): TransformerEncoderLayer(
        (self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
        (fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
        (final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
      )
      (5): TransformerEncoderLayer(
        (self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
        (fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
        (final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
      )
    )
  )
  (decoder): TransformerDecoder(
    (embed_tokens): EmbeddingSuper(32768, 640, padding_idx=1)
    (embed_positions): SinusoidalPositionalEmbedding()
    (layers): ModuleList(
      (0): TransformerDecoderLayer(
        (self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (encoder_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
        (fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
        (final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
      )
      (1): TransformerDecoderLayer(
        (self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (encoder_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
        (fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
        (final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
      )
      (2): TransformerDecoderLayer(
        (self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (encoder_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
        (fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
        (final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
      )
      (3): TransformerDecoderLayer(
        (self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (encoder_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
        (fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
        (final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
      )
      (4): TransformerDecoderLayer(
        (self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (encoder_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
        (fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
        (final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
      )
      (5): TransformerDecoderLayer(
        (self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
          (out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
        )
        (encoder_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
        (fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
        (fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
        (final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
      )
    )
  )
)
| loaded checkpoint ./downloaded_models/HAT_wmt14ende_super_space0.pt (epoch 136 @ 0 updates)
| loading train data for epoch 136
| loaded 3000 examples from: data/binary/wmt16_en_de/valid.en-de.en
| loaded 3000 examples from: data/binary/wmt16_en_de/valid.en-de.de
| data/binary/wmt16_en_de valid en-de 3000 examples
| Start Iteration 0:
Traceback (most recent call last):
File "evo_search.py", line 106, in
cli_main()
File "evo_search.py", line 102, in cli_main
main(args)
File "evo_search.py", line 51, in main
best_config = evolver.run_evo_search()
File "/home/hishan/hardware-aware-transformers/fairseq/evolution.py", line 217, in run_evo_search
popu_scores = self.get_scores(popu)
File "/home/hishan/hardware-aware-transformers/fairseq/evolution.py", line 281, in get_scores
scores = validate_all(self.args, self.trainer, self.task, self.epoch_iter, configs)
File "/home/hishan/hardware-aware-transformers/fairseq/evolution.py", line 401, in validate_all
trainer.valid_step(sample)
File "/home/hishan/hardware-aware-transformers/fairseq/trainer.py", line 451, in valid_step
raise e
File "/home/hishan/hardware-aware-transformers/fairseq/trainer.py", line 438, in valid_step
_loss, sample_size, logging_output = self.task.valid_step(
File "/home/hishan/hardware-aware-transformers/fairseq/tasks/fairseq_task.py", line 241, in valid_step
loss, sample_size, logging_output = criterion(model, sample)
File "/home/hishan/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/hishan/hardware-aware-transformers/fairseq/criterions/label_smoothed_cross_entropy.py", line 56, in forward
net_output = model(**sample['net_input'])
File "/home/hishan/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/hishan/hardware-aware-transformers/fairseq/models/fairseq_model.py", line 222, in forward
encoder_out = self.encoder(src_tokens, src_lengths=src_lengths, **kwargs)
File "/home/hishan/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/hishan/hardware-aware-transformers/fairseq/models/transformer_super.py", line 401, in forward
x = layer(x, encoder_padding_mask)
File "/home/hishan/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/hishan/hardware-aware-transformers/fairseq/models/transformer_super.py", line 900, in forward
x, _ = self.self_attn(query=x, key=x, value=x, key_padding_mask=encoder_padding_mask)
File "/home/hishan/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/hishan/hardware-aware-transformers/fairseq/modules/multihead_attention_super.py", line 182, in forward
q, k, v = self.in_proj_qkv(query)
File "/home/hishan/hardware-aware-transformers/fairseq/modules/multihead_attention_super.py", line 314, in in_proj_qkv
return self._in_proj(query, sample_dim=self.sample_q_embed_dim).chunk(3, dim=-1)
File "/home/hishan/hardware-aware-transformers/fairseq/modules/multihead_attention_super.py", line 351, in _in_proj
return F.linear(input, weight, bias)
File "/home/hishan/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 1676, in linear
output = input.matmul(weight.t())
RuntimeError: _th_addmm_out not supported on CPUType for Half
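
For context, the failing call can be reproduced in isolation. This is a minimal sketch, assuming an older PyTorch build (e.g. 1.6, matching the paths in the traceback) whose CPU backend lacks fp16 matmul kernels; newer releases may run it without error:

import torch
import torch.nn.functional as F

# fp16 tensors on CPU, mimicking the fused q/k/v projection in _in_proj
query = torch.randn(10, 4, 512).half()
weight = torch.randn(3 * 512, 512).half()
try:
    F.linear(query, weight)  # 3D input takes the matmul path seen in the traceback
except RuntimeError as e:
    print(e)  # e.g. "_th_addmm_out not supported on CPUType for Half"

Note that the Namespace above shows fp16=True, so the half-precision path is taken even though everything runs on CPU.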


ihish52 commented on August 14, 2024

I'm still not sure what caused this. I created a new environment and reinstalled all dependencies with exactly the versions stated in the requirements, and it worked. Closing this issue. Thanks for the response.


Hanrui-Wang commented on August 14, 2024

Hi ihish52,

Sorry for the late reply; I have been very busy over the past several weeks.
The error occurs because some float16 operations are not supported by PyTorch on CPU, so I fixed it by using fp32 when running the evolutionary search on CPU. (commit)
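
The idea of the fix, as a minimal sketch (ensure_cpu_fp32 is a hypothetical name for illustration; the actual change is in the commit above): clear the fp16 flag before building the trainer whenever no CUDA device is available, so the search runs in fp32 on CPU.

import torch

# Hypothetical helper sketching the workaround: fall back to fp32 on CPU,
# since fp16 kernels are unavailable there.
def ensure_cpu_fp32(args):
    if args.fp16 and not torch.cuda.is_available():
        print('| fp16 is not supported on CPU, falling back to fp32')
        args.fp16 = False
    return args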

Thanks for your contribution!

Best,
Hanrui


Hanrui-Wang commented on August 14, 2024

Hi Hishan,

I will close the issue for now. Feel free to reopen if you have any further questions!

Best,
Hanrui

