
mit-han-lab / hardware-aware-transformers

Stars: 323 · Watchers: 13 · Forks: 48 · Size: 17.12 MB

[ACL'20] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing

Home Page: https://hat.mit.edu

License: Other

Python 95.70% Shell 2.96% C++ 0.51% Cuda 0.83%
hardware-aware transformer specialization efficient-model natural-language-processing machine-translation

hardware-aware-transformers' People

Contributors

hanrui-wang, songhan


hardware-aware-transformers' Issues

RAM in the used Raspberry Pi

Hello,

I would just like to ask how large the RAM in the Raspberry Pi you used is: 4 GB or 8 GB? And was it sufficient on its own for your experiments?

Thanks.

how to use the processed data in your code?

As described in the title, I have downloaded the processed data (*.tgz), but I cannot find how to use it. Please help me out.
I see a data path in each space0.yml. Is it meant to point to the preprocessed data?
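For reference, a minimal sketch of how the archive could be unpacked (the paths below are placeholders; presumably the data field in each space0.yml should then point at the extracted directory):

import tarfile

# Placeholder paths: adjust to wherever the downloaded *.tgz lives and to the
# directory that the data field in space0.yml expects.
archive = 'data/binary/wmt16_en_de.tgz'
dest = 'data/binary/'

with tarfile.open(archive, 'r:gz') as tar:
    tar.extractall(dest)  # unpacks the binarized dataset the configs refer to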

Question on how to evaluate inherited SubTransformers.

Hi,

Table 5 in the paper mentions that the "Inherited" BLEU score is similar to the "From-Scratch" BLEU score.

(screenshot of Table 5 from the paper)

Can you please specify which part of the code can be used to run inference/test the inherited SubTransformers (without training from scratch) or how to use the code to perform this task?

In other words, I would like to know how to test translations from a specific SubTransformer in the SuperTransformer design space without training the model again from scratch.

Hope you can point me in the right direction on this. Thanks for your help.

Error in step 2.3 (Evolutionary search with latency constraint)

Hi,

When following the code step by step, I get an error when running the evolutionary search. The error is:
"_th_admm_out not supported on CPUType for Half"

Do you know what could be causing this and how to fix it? I am currently running this on my i5 CPU. Does the config file need any change to avoid using the GPU when only the CPU is being tested?
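One possible workaround, assuming the error comes from half-precision (fp16) tensors reaching CPU kernels that only support float32 (a guess on my part, not the authors' fix), is to cast the model back to full precision before the CPU run:

import torch

# Half-precision kernels are missing for many CPU ops, so convert the model
# to float32 and keep it on the CPU device before running the search there.
def prepare_for_cpu(model: torch.nn.Module) -> torch.nn.Module:
    return model.float().to(torch.device('cpu'))

If the search configuration enables fp16 anywhere, disabling that option for CPU-only runs may avoid the issue altogether.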

Help with this would be highly appreciated. Thanks.

questions about the search & training process

Hi, I tried to run an evolutionary search on the IWSLT14 de-en dataset with a 1080 Ti GPU.

I modified the latency-constraint from 200 to 150 since the 1080 Ti is faster than the Titan XP.

But the best architecture (143 ms) did not change after ten iterations, even though the maximum number of iterations is 30.

Then I trained the searched architecture using the same configuration file and got only 33.77 BLEU (normal).

My questions are:

  1. Is this phenomenon normal? Does it mean that the search has encountered a local optimum?
  2. How can I obtain scores comparable to the ones you report if I use other GPUs with similar latency?

Here is the search log:
iwlst.evo.gpu.log

Latency predictor relative error instead of absolute error

Hello,

You mention in Fig. 6 in the paper that the average prediction error of the latency predictor is 0.1 secs.

Could you please say what relative (percentage) error that corresponds to on the Raspberry Pi?

I checked your Raspberry Pi latency dataset file, and I can see the ground-truth latencies ("latency_mean_encoder" and "latency_mean_decoder") of the models used for testing (the last 200).

However, I could not find the predicted values needed to calculate the percentage error.
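For reference, once the predicted values are available, the relative error could be computed along these lines (the numbers below are toy values purely for illustration):

import numpy as np

# Substitute the ground-truth latencies (latency_mean_encoder plus
# latency_mean_decoder) and the predictor outputs for the 200 held-out models.
true_lat = np.array([3.1, 4.2, 5.0])
pred_lat = np.array([3.0, 4.5, 4.8])

rel_err = np.abs(pred_lat - true_lat) / true_lat
print('mean relative error: {:.2f}%'.format(rel_err.mean() * 100))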

Thanks.

Question about the latency on Raspberry Pi

Dear Authors,

Hi, thanks for the library!
I tested it successfully on my GPU, but the problem I am having now is that the latency on the Raspberry Pi I am using is different from yours. I used WMT'14 En-Fr for testing; below are the latency measurements and the commands I used:
wmt_enfr_latency

I am not sure what the problem is. Do I need additional settings?
And this is my environment:

  • Python == 3.7.3
  • pytorch == 1.4.0

Thanks.

Quantization on HAT.

Hi, first of all, thanks for the library.
I tried a few experiments and trained a SubTransformer with a latency constraint of 75 ms. The paper mentions that HAT is quantization friendly, and now I want to quantize the SubTransformer I trained.
Could you please share how you quantized the SubTransformer? It would be helpful.
Also, is there any latency comparison for the quantized model? Can we expect a latency reduction here?
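In case it is useful, here is a generic post-training dynamic-quantization sketch (not necessarily the recipe used in the paper; a toy module stands in for the trained SubTransformer so the snippet runs on its own):

import torch

# Toy stand-in for the trained SubTransformer; replace with the loaded model.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 2048),
    torch.nn.ReLU(),
    torch.nn.Linear(2048, 512),
)

# Dynamically quantize all Linear layers to int8 weights.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
print(quantized)

On CPU targets, dynamic int8 quantization usually shrinks the model and can reduce latency, but whether it helps at the 75 ms operating point would have to be measured.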
Thanks.

Used version of `fairseq`

Hi,

Would it be possible to disclose which version/commit of fairseq was used & modified for this implementation?

Thanks!

lower loss but the BLEU is 0

I have trained a model, and the loss on the training and validation sets is very low (below 2). But when I evaluated the BLEU, it was 0. I checked the translation output, and all the results are identical, e.g. "the the the the ...". This is very strange; what could be the reason?

About the Quantization Friendly.

Hi, first of all, thanks for your code.
I tried to reproduce the "quantization friendly" experiments. Could you tell me how exactly you implemented Transformer Float32?
Thanks!

Question about the SubTransformers sampling process.

Hi,

Thanks a lot for releasing this great project.
I have a question about the SubTransformer sampling process in a distributed training environment. I see that you sample a random SubTransformer before each training step with the line below. In a multi-GPU scenario, does each GPU get the same random SubTransformer, or does each GPU get a different random subnetwork? Does reset_rand_seed force all GPUs to sample the same random SubTransformer from the SuperNet? And is trainer.get_num_updates() the same on every GPU at each training step?

configs = [utils.sample_configs(utils.get_all_choices(args), reset_rand_seed=True, rand_seed=trainer.get_num_updates(), super_decoder_num_layer=args.decoder_layers)]
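For context, here is a small illustration (not the repo's own sampler) of why reseeding every worker with the same value, e.g. the shared update count, would make all GPUs draw the same SubTransformer for that step:

import random

# Mimics reset_rand_seed=True with rand_seed=trainer.get_num_updates():
# reseeding right before sampling makes the draw a deterministic function of the seed.
def sample_config(rand_seed):
    random.seed(rand_seed)
    return {
        'encoder_ffn_dim': random.choice([1024, 2048, 3072]),
        'decoder_num_layers': random.choice([1, 2, 3, 4, 5, 6]),
    }

num_updates = 7  # assumed identical across workers at a given training step
print(sample_config(num_updates) == sample_config(num_updates))  # True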

Thanks a lot for your help.

One question

Thanks for releasing the great project!

One thing I'd like to confirm is whether the head_num parameter has an effect on model latency. From your code, I see that qkv_dim is fixed, so I conjecture that head_num would not affect the model latency.
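To illustrate the conjecture (with a stock PyTorch layer rather than the HAT implementation): with the total projection width fixed, changing the number of heads only reshapes the same matrices, so the parameter count, and hence the matmul work, stays the same.

import torch

# The parameter count of MultiheadAttention depends on embed_dim only,
# not on num_heads, as long as the total q/k/v width is held constant.
for num_heads in (2, 4, 8):
    attn = torch.nn.MultiheadAttention(embed_dim=512, num_heads=num_heads)
    n_params = sum(p.numel() for p in attn.parameters())
    print(num_heads, n_params)  # same count for every head setting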

Thanks

Does the measured latency include the embedding lookup table and the final output layer?

According to the code, the measured latency should include the embedding lookup table and the final output layer. But I ran into a problem: I trained a predictor, and it is very accurate. Then I ran the evolutionary search with a hardware latency constraint of 200 ms. After the SubTransformer was trained, I measured its latency and got 270 ms, which is much larger than the predicted latency. Why does this happen?

Training new SuperTransformer - calculating number of SubTransformer combinations?

Dear Authors,

Thanks for the great library. I am currently attempting to train a new SuperTransformer. The paper states that the default design space contains 10^15 SubTransformer configurations. Can you explain how this number is calculated, so I can work on calculating the number of SubTransformers in my new SuperTransformer?
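For reference, my current understanding of how such a number could be obtained: the design-space size is just the product of the number of options for every independent choice. The counts below are placeholders rather than the paper's exact values and would need to be read off the SuperTransformer config:

# Placeholder choice counts; substitute the ones from your own design space.
embed_dim_choices = 2              # e.g. two embedding widths
decoder_depth_choices = 6          # decoder depth from 1 to 6
per_encoder_layer = 3 * 2          # ffn_dim options x head_num options (assumed)
per_decoder_layer = 3 * 2 * 3      # ffn_dim x head_num x arbitrary-attention span (assumed)
num_encoder_layers = 6
num_decoder_layers = 6

total = (embed_dim_choices * decoder_depth_choices
         * per_encoder_layer ** num_encoder_layers
         * per_decoder_layer ** num_decoder_layers)
print('roughly {:.1e} SubTransformer configurations'.format(total))

If this counting approach is wrong, a pointer to the correct factorization would be appreciated.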

question about number of parameters

Hi, I trained some models using the pre-defined configurations but
the number of parameters is much larger than what you reported (55.1M vs. 31.5M).

Configuration:
HAT_iwslt14deen_titanxp@[email protected]

Here is the code I used to calculate the number of parameters:
(embedding layers are excluded)

import torch

# Load the trained SubTransformer checkpoint onto the CPU and grab its state dict.
m = torch.load('checkpoints/iwslt14.de-en/subtransformer/HAT_iwslt14deen_titanxp@\
[email protected]/checkpoint_best.pt', map_location='cpu')
m = m['model']

# Sum the element counts of all non-embedding parameter tensors.
n = 0
for k in m:
    if 'emb' not in k:
        n += m[k].numel()

print(n)
