Comments (11)
Hi ihish52,
Thanks for your question! You can download the SuperTransformer checkpoints and then test with the following command:
bash configs/[task_name]/test.sh ./downloaded_models/[supertransformer_model].pt configs/[task_name]/subtransformer/[subtransformer_config].yml
# for example
bash configs/wmt14.en-de/test.sh ./downloaded_models/HAT_wmt14ende_super_space0.pt configs/wmt14.en-de/subtransformer/[email protected][email protected]
Note that the models used in Figure 5 are not exactly the models we released, so the BLEU numbers may have some small differences.
Best,
Hanrui
from hardware-aware-transformers.
Hi Hanrui-Wang,
Thanks for the quick reply on this. I have run the exact example you provided above (testing the inherited SubTransformer for HAT_wmt14ende_raspberrypi@[email protected]).
The code runs into two problems:
- RuntimeError: result type Float can't be cast to the desired output type Long
- ZeroDivisionError: division by zero
I am using the downloaded SuperTransformer checkpoint and the provided config files, so nothing has been changed from the project. Do you have any idea what may be causing this?
Below is the full terminal output for the example command you provided:
TransformerSuperModel(
(encoder): TransformerEncoder(
(embed_tokens): EmbeddingSuper(32768, 640, padding_idx=1)
(embed_positions): SinusoidalPositionalEmbedding()
(layers): ModuleList(
(0): TransformerEncoderLayer(
(self_attn): MultiheadAttentionSuper num_heads:4 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
(fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
(final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
)
(1): TransformerEncoderLayer(
(self_attn): MultiheadAttentionSuper num_heads:4 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
(fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
(final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
)
(2): TransformerEncoderLayer(
(self_attn): MultiheadAttentionSuper num_heads:4 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
(fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
(final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
)
(3): TransformerEncoderLayer(
(self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
(fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
(final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
)
(4): TransformerEncoderLayer(
(self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
(fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
(final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
)
(5): TransformerEncoderLayer(
(self_attn): MultiheadAttentionSuper num_heads:4 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
(fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
(final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
)
)
)
(decoder): TransformerDecoder(
(embed_tokens): EmbeddingSuper(32768, 640, padding_idx=1)
(embed_positions): SinusoidalPositionalEmbedding()
(layers): ModuleList(
(0): TransformerDecoderLayer(
(self_attn): MultiheadAttentionSuper num_heads:4 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttentionSuper num_heads:4 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(encoder_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
(fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
(final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
)
(1): TransformerDecoderLayer(
(self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(encoder_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
(fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
(final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
)
(2): TransformerDecoderLayer(
(self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(encoder_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
(fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
(final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
)
(3): TransformerDecoderLayer(
(self_attn): MultiheadAttentionSuper num_heads:4 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(encoder_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
(fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
(final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
)
(4): TransformerDecoderLayer(
(self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(encoder_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
(fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
(final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
)
(5): TransformerDecoderLayer(
(self_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(self_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttentionSuper num_heads:8 qkv_dim:512
(out_proj): LinearSuper(in_features=512, out_features=640, bias=True)
)
(encoder_attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
(fc1): LinearSuper(in_features=640, out_features=3072, bias=True)
(fc2): LinearSuper(in_features=3072, out_features=640, bias=True)
(final_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
)
)
)
)
./downloaded_models/HAT_wmt14ende_super_space0.pt
Traceback (most recent call last):
File "generate.py", line 211, in
cli_main()
File "generate.py", line 207, in cli_main
main(args)
File "generate.py", line 116, in main
hypos, decoder_times = task.inference_step(generator, models, sample, prefix_tokens)
File "/home/hmrp1r17/hat_from_H_10-11/hardware-aware-transformers/fairseq/tasks/fairseq_task.py", line 246, in inference_step
return generator.generate(models, sample, prefix_tokens=prefix_tokens)
File "/home/hmrp1r17/.local/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
return func(*args, **kwargs)
File "/home/hmrp1r17/hat_from_H_10-11/hardware-aware-transformers/fairseq/sequence_generator.py", line 378, in generate
scores.view(bsz, beam_size, -1)[:, :, :step],
File "/home/hmrp1r17/hat_from_H_10-11/hardware-aware-transformers/fairseq/search.py", line 81, in step
torch.div(self.indices_buf, vocab_size, out=self.beams_buf)
RuntimeError: result type Float can't be cast to the desired output type Long
Evaluate Normal BLEU score!
Namespace(ignore_case=False, order=4, ref='./downloaded_models/exp/HAT_wmt14ende_raspberrypi@[email protected]_test_gen.out.ref', sacrebleu=False, sentence_bleu=False, sys='./downloaded_models/exp/HAT_wmt14ende_raspberrypi@[email protected]_test_gen.out.sys')
Traceback (most recent call last):
File "score.py", line 83, in
main()
File "score.py", line 79, in main
score(f)
File "score.py", line 73, in score
print(scorer.result_string(args.order))
File "/home/hmrp1r17/hat_from_H_10-11/hardware-aware-transformers/fairseq/bleu.py", line 127, in result_string
return fmt.format(order, self.score(order=order), *bleup,
File "/home/hmrp1r17/hat_from_H_10-11/hardware-aware-transformers/fairseq/bleu.py", line 103, in score
return self.brevity() * math.exp(psum / order) * 100
File "/home/hmrp1r17/hat_from_H_10-11/hardware-aware-transformers/fairseq/bleu.py", line 117, in brevity
r = self.stat.reflen / self.stat.predlen
ZeroDivisionError: division by zero
Hi ihish52,
I think this is due to the same reason as issue #6. I have changed the default data type on CPU to fp32, as in this commit.
Thanks!
Hanrui
Hi Hanrui-Wang,
Thanks for getting back to me. The system I am running this on does have a GPU, and PyTorch recognizes it (torch.cuda.current_device()). It's a Jetson Nano with an NVIDIA Tegra X1 GPU. However, I still run into those errors.
I have also tested the example command above on a Linux server with 4 1080Ti GPUs, and the same RuntimeError and ZeroDivisionError pop up.
I have also changed generate.py as you mentioned before running the code.
Is there something else that could be causing this? Looking forward to your reply.
Thanks!
Hi ihish52,
Can you try setting args.fp16 to always be False and see whether the issue still exists? The ZeroDivisionError arises because, after the RuntimeError, there are no generated translations; once the RuntimeError is fixed, the ZeroDivisionError will disappear.
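Concretely, the second error comes from the brevity-penalty division in fairseq/bleu.py shown in the traceback. A minimal torch-free sketch (the lengths are hypothetical; the real code reads them from the scorer's accumulated stats):

```python
import math

def brevity(reflen, predlen):
    # Sketch of the brevity penalty in fairseq/bleu.py: it divides the
    # reference length by the hypothesis length. If generation crashed,
    # there are no hypotheses, predlen is 0, and this division raises.
    r = reflen / predlen
    return min(1.0, math.exp(1 - r))

print(brevity(100, 100))   # 1.0: equal lengths, no penalty
try:
    brevity(1000, 0)       # hypothetical: nothing was generated upstream
except ZeroDivisionError as e:
    print(e)               # division by zero
```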
Best,
Hanrui
Hi Hanrui-Wang,
As suggested, I set args.fp16 to always be False in generate.py, but the same error persists. Would you have any other suggestions? Would changing the tensor type in the torch.div() line in search.py be necessary to fix this?
Thanks for your help.
Hi ihish52,
I cannot reproduce the error, but you may try replacing line 81 of search.py with this:
self.beams_buf = torch.div(self.indices_buf, vocab_size).type_as(self.beams_buf)
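For background on why the cast is needed: since PyTorch 1.5, torch.div performs true division even on integer tensors and returns a float result, so writing that result into the Long beams_buf via out= fails. Casting the quotient back with type_as truncates it, which for these non-negative indices matches floor division. A torch-free sketch of the decomposition that line performs (toy numbers, not from the real run):

```python
# search.py takes a top-k over scores flattened across (beam, vocab);
# each flat index then splits into a beam id and a token id.
vocab_size = 7              # toy vocabulary size (hypothetical)
flat_indices = [5, 12, 20]  # toy top-k indices over the flattened axis

# True division yields floats (the root cause of the RuntimeError) ...
assert all(isinstance(i / vocab_size, float) for i in flat_indices)

# ... so the beam id must come from floor division, and the token id
# from the remainder.
beams = [i // vocab_size for i in flat_indices]
tokens = [i % vocab_size for i in flat_indices]
print(beams, tokens)  # [0, 1, 2] [5, 5, 6]
```

On PyTorch 1.8 and later, the same intent can be written as torch.div(self.indices_buf, vocab_size, rounding_mode='floor'), which keeps the integer dtype without a cast.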
Best,
Hanrui
Hi Hanrui-Wang,
I made the change to line 81 as suggested. The code now runs, and I was able to obtain a reasonable BLEU score!
Since the tensor type was changed, this warning pops up several times on each iteration of the script:
UserWarning: An output with one or more elements was resized since it had shape [1], which does not match the required output shape [7]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /opt/conda/conda-bld/pytorch_1603729021865/work/aten/src/ATen/native/Resize.cpp:19.)
Do you think this would affect the output of the script (the BLEU score calculation)?
Thanks for the advice.
Hi ihish52,
Great!
When I run the following command, I get BLEU 25.99. If you get the same score, then the warning may not affect results.
bash configs/wmt14.en-de/test.sh ./downloaded_models/HAT_wmt14ende_super_space0.pt configs/wmt14.en-de/subtransformer/HAT_wmt14ende_raspberrypi@6.0s_bleu@28.2.yml
Best,
Hanrui
Hi Hanrui-Wang,
Yes, I can confirm a BLEU score of 25.99 when I run the above command. Thanks for your help with this.
Regards,
Hishan
Hi Hishan,
I will close the issue for now. Feel free to reopen if you have any further questions!
Best,
Hanrui
Related Issues (17)
- One question HOT 2
- Training new SuperTransformer - calculating number of SubTransformer combinations? HOT 2
- Latency predictor relative error instead of absolute error HOT 1
- What is the method used to sample training examples for the MLP latency predictor? HOT 2
- Does the generated latency count in the embedding lookup table and the last output layers ? HOT 2
- lower loss but the BLEU is 0
- Question about the latency on Raspberry Pi HOT 1
- Used version of `fairseq`
- Question about the SubTransformers sampling process.
- RAM in the used Raspberry Pi HOT 2
- Quantization on HAT. HOT 4
- questions about the search & training process HOT 1
- question about number of parameters HOT 2
- Error in step 2.3 (Evolutionary search with latency constraint) HOT 5
- About the Quantization Friendly. HOT 1
- how to use the processed data in your code?