
mmnas's Introduction

MMNas: Deep Multimodal Neural Architecture Search

This repository contains the PyTorch implementation of MMNas for the visual question answering (VQA), visual grounding (VGD), and image-text matching (ITM) tasks.

[example image]

Prerequisites

Software and Hardware Requirements

You may need a machine with at least 4 GPUs (>= 8GB memory each), 50GB of RAM for VQA and VGD (150GB for ITM), and 50GB of free disk space. We strongly recommend using an SSD drive to guarantee high-speed I/O.

You should first install some necessary packages.

  1. Install Python >= 3.6

  2. Install Cuda >= 9.0 and cuDNN

  3. Install PyTorch >= 0.4.1 with CUDA (PyTorch 1.x is also supported).

  4. Install SpaCy and initialize the GloVe embeddings as follows:

    $ pip install -r requirements.txt
    $ wget https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-2.1.0/en_vectors_web_lg-2.1.0.tar.gz -O en_vectors_web_lg-2.1.0.tar.gz
    $ pip install en_vectors_web_lg-2.1.0.tar.gz
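
To verify the installation, the following quick check should print a 300-dimensional vector shape (a minimal sketch; the model name matches the package installed above):

    $ python3 -c "import spacy; print(spacy.load('en_vectors_web_lg')('hello')[0].vector.shape)"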

Dataset Preparations

Please follow the instructions in dataset_setup.md to download the datasets and features.

Search

To search an optimal architecture for a specific task, run

$ python3 search_[vqa|vgd|itm].py

At the end of each search epoch, the script outputs the current optimal architecture (selecting, for every block, the operator with the largest architecture weight) according to the current architecture weights. When the optimal architecture stops changing for several consecutive epochs, you can terminate the search process manually.
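
For illustration, this selection rule is an argmax over the architecture weights. A minimal sketch (the operator names and tensor shapes below are hypothetical placeholders, not the repo's actual API):

import torch

OPS = ['SA', 'GA', 'FFN', 'RSA']     # hypothetical operator pool
alpha = torch.randn(12, len(OPS))    # [num_blocks, num_ops] architecture weights

optimal_arch = [OPS[i] for i in alpha.argmax(dim=-1).tolist()]
print(optimal_arch)                  # one chosen operator per block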

Training

The following script will start training a network with the optimal architecture found by MMNas:

$ python3 train_[vqa|vgd|itm].py --RUN='train' --ARCH_PATH='./arch/train_vqa.json'

You can also add the following options (a combined example follows this list):

  1. --VERSION=str, e.g. --VERSION='mmnas_vqa' to assign a name to your model.

  2. --GPU=str, e.g. --GPU='0, 1, 2, 3' to train the model on the specified GPU devices.

  3. --NW=int, e.g. --NW=8 to use more dataloader workers and accelerate I/O.

  4. --RESUME to resume training from saved checkpoint parameters.

  5. --ARCH_PATH to train with a different searched architecture.
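
For example, a full training command combining several of these options might look like:

$ python3 train_vqa.py --RUN='train' --ARCH_PATH='./arch/train_vqa.json' --VERSION='mmnas_vqa' --GPU='0,1,2,3' --NW=8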

If you want to train with an architecture obtained from the search stage (say, the architecture output at the 50th search epoch for the VQA model), you can run

$ python3 train_vqa.py --RUN='train' --ARCH_PATH='[PATH_TO_YOUR_SEARCHING_LOG]' --ARCH_EPOCH=50

Validation and Testing

Offline Evaluation

To run the val or test split, modify the following args: --RUN={'val', 'test'} and --CKPT_PATH=[Your Model Path].

Example:

$ python3 train_vqa.py --RUN='test' --CKPT_PATH=[Your Model Path] --ARCH_PATH=[Searched Architecture Path]

Online Evaluation (ONLY FOR VQA)

Test result files will be stored in ./logs/ckpts/result_test/result_train_[Your Version].json.

You can upload the obtained result file to Eval AI to evaluate the scores on the test-dev and test-std splits.
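
Before uploading, you can sanity-check the result file. This sketch assumes the standard VQA-challenge submission format (a JSON list of {"question_id", "answer"} records); verify the exact format against the Eval AI challenge page:

import json

# Example path using --VERSION='mmnas_vqa'; adjust to your own version name.
with open('./logs/ckpts/result_test/result_train_mmnas_vqa.json') as f:
    results = json.load(f)

assert isinstance(results, list)
assert {'question_id', 'answer'} <= set(results[0].keys())
print(len(results), 'answers ready for upload')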

Pretrained Models

We provide the pretrained models in pretrained_models.md to reproduce the experimental results in our paper.

Citation

If this repository is helpful for your research, we'd really appreciate it if you could cite the following paper:

@inproceedings{yu2020mmnas,
  title={Deep Multimodal Neural Architecture Search},
  author={Yu, Zhou and Cui, Yuhao and Yu, Jun and Wang, Meng and Tao, Dacheng and Tian, Qi},
  booktitle={Proceedings of the 28th ACM International Conference on Multimedia},
  pages={3743--3752},
  year={2020}
}

mmnas's People

Contributors

cuiyuhao1996, mil-vlg, paradoxzw


mmnas's Issues

Errors during searching VQA.

The error is as follows:

 ========== Answer token vocab size (occur more than 8 times): 3129
 ========== Answer token vocab size (occur more than 8 times): 3129
 ========== Answer token vocab size (occur more than 8 times): 3129
 ========== Answer token vocab size (occur more than 8 times): 3129
Traceback (most recent call last):
  File "search_vqa.py", line 615, in <module>
    join=True
  File "/home/zhouxx/anaconda3/envs/py36-t12-cu100/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
    while not spawn_context.join():
  File "/home/zhouxx/anaconda3/envs/py36-t12-cu100/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 3 terminated with the following error:
Traceback (most recent call last):
  File "/home/zhouxx/anaconda3/envs/py36-t12-cu100/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/zhouxx/gprojects/mmnas/search_vqa.py", line 607, in mp_entrance
    exec.run()
  File "/home/zhouxx/gprojects/mmnas/search_vqa.py", line 585, in run
    self.search(train_loader, eval_loader)
  File "/home/zhouxx/gprojects/mmnas/search_vqa.py", line 268, in search
    for step, step_load in enumerate(train_loader):
  File "/home/zhouxx/anaconda3/envs/py36-t12-cu100/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
    return self._process_data(data)
  File "/home/zhouxx/anaconda3/envs/py36-t12-cu100/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
    data.reraise()
  File "/home/zhouxx/anaconda3/envs/py36-t12-cu100/lib/python3.6/site-packages/torch/_utils.py", line 369, in reraise
    raise self.exc_type(msg)
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/zhouxx/anaconda3/envs/py36-t12-cu100/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/zhouxx/anaconda3/envs/py36-t12-cu100/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zhouxx/anaconda3/envs/py36-t12-cu100/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zhouxx/gprojects/mmnas/mmnas/loader/load_data_vqa.py", line 224, in __getitem__
    frcn_feat = np.load(self.iid_to_frcn_feat_path[iid])
KeyError: '463620'
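
A generic way to narrow this down (a debugging sketch under assumed paths and file naming, not repo code): the KeyError means image id '463620' has no entry in iid_to_frcn_feat_path, which usually indicates the corresponding feature file was never found on disk when the map was built.

import glob, os

# Assumed feature layout; adjust the glob to wherever dataset_setup.md placed the files.
feat_paths = glob.glob('./data/vqa/feats/*/*.npz')
iid_to_path = {str(int(os.path.basename(p).split('.')[0].split('_')[-1])): p
               for p in feat_paths}
print('463620' in iid_to_path)   # False means the feature file is missing on disk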

Why does the warning occur? And is it necessary to fix it?

The warning is as follows:

lib/python3.6/site-packages/spacy/util.py:275: UserWarning: [W031] Model 'en_vectors_web_lg' (2.1.0) requires spaCy v2.1 and is incompatible with the current spaCy version (2.3.5). This may lead to unexpected results or runtime errors. To resolve this, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
  warnings.warn(warn_msg)

Why does the warning occur? And is it necessary to fix it?

Some questions about how to calculate the gradient of 'alpha_prob'

Why calculate the gradient of alpha_probs like this?

probs = self.probs_over_ops.data
for i in range(self.n_choices):
    for j in range(self.n_choices):
        self.alpha_prob.grad.data[i] += binary_grads[j] * probs[j] * (self.delta_ij(i, j) - probs[i])

The code is in MixedOp.set_arch_param_grad().
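
For reference (an interpretation based on the code, not an official answer): with p = softmax(alpha_prob) and binary_grads[j] standing in for the gradient of the loss with respect to the binary gate g_j, the loop accumulates

\frac{\partial L}{\partial \alpha_i} = \sum_j \frac{\partial L}{\partial g_j} \, p_j \, (\delta_{ij} - p_i)

which is the chain rule through the softmax Jacobian \partial p_j / \partial \alpha_i = p_j (\delta_{ij} - p_i). The same estimator is used for binary-gate architecture parameters in ProxylessNAS-style search.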

Searching ITM gets stuck at epoch 10

Running search_itm.py gets stuck at epoch 10. No errors occur and the program does not terminate itself.
The last output is as follows:

evaluate percent 45.2755905511811
evaluate percent 47.24409448818898
evaluate percent 49.21259842519685
evaluate percent 51.181102362204726
evaluate percent 53.14960629921261
evaluate percent 55.118110236220474
evaluate percent 57.08661417322835
evaluate percent 59.055118110236215
evaluate percent 61.023622047244096
evaluate percent 62.99212598425197
evaluate percent 64.96062992125984
evaluate percent 66.92913385826772
evaluate percent 68.89763779527559
evaluate percent 70.86614173228347
evaluate percent 72.83464566929135
evaluate percent 74.80314960629921
evaluate percent 76.77165354330708
evaluate percent 78.74015748031496
evaluate percent 80.70866141732283
evaluate percent 82.67716535433071
evaluate percent 84.64566929133859
evaluate percent 86.61417322834646
evaluate percent 88.58267716535433
evaluate percent 90.5511811023622
evaluate percent 92.51968503937007
evaluate percent 94.48818897637796
evaluate percent 96.45669291338582
evaluate percent 98.4251968503937
(1014, 5070)
i2t stat num: 1014
i2t results: 14.89 37.48 50.79 10.00 34.80

t2i stat num: 5070
t2i results: 12.31 36.31 51.50 10.00 29.36

reset negative captions ...
reset negative captions ...
reset negative captions ...
reset negative captions ...

And the output of nvidia-smi has remained as follows the whole time since the program got stuck.

Sat Feb 27 18:48:18 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40c          On   | 00000000:02:00.0 Off |                    0 |
| 23%   37C    P0    63W / 235W |   5573MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40c          On   | 00000000:03:00.0 Off |                    0 |
| 23%   43C    P0    69W / 235W |   9840MiB / 11441MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K40m          On   | 00000000:82:00.0 Off |                    0 |
| N/A   33C    P0    62W / 235W |   5573MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K40m          On   | 00000000:83:00.0 Off |                    0 |
| N/A   34C    P0    68W / 235W |   9840MiB / 11441MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     24425      C   ...naconda3/envs/py36-t101-cu90/bin/python  5562MiB |
|    1     24426      C   ...naconda3/envs/py36-t101-cu90/bin/python  9827MiB |
|    2     24427      C   ...naconda3/envs/py36-t101-cu90/bin/python  5562MiB |
|    3     24428      C   ...naconda3/envs/py36-t101-cu90/bin/python  9827MiB |
+-----------------------------------------------------------------------------+

I have noticed that epoch 10 is the NEG_START_EPOCH, but I have no idea what is going wrong there.
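
A generic way to see where each rank is blocked (a suggestion, not a confirmed diagnosis; a persistent 100%/0% GPU-utilization split across ranks often means the processes have diverged and some are waiting inside a collective op) is to dump each process's Python stack with py-spy:

$ pip install py-spy
$ py-spy dump --pid 24426    # repeat for each PID listed by nvidia-smi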

Implementation details of hyper-parameters in CfgSearch and Cfg

I am following your work and running the code on VGD with the pre-extracted features from dataset_setup.md, but I failed to reproduce the results reported in the paper: my test accuracy was often 2%~5% lower. Could you provide more experimental details about the hyper-parameters in CfgSearch and Cfg (e.g. ALPHA_START, ALPHA_EVERY, ALPHA_WEIGHT_DECAY, NET_OPTIM_WARMUP, NET_LR_DECAY_R), and any other potentially helpful tricks?
Thanks for your excellent work and help.

Potential bugs in `train_itm.py` when generating negative samples

As the following code shows,

mmnas/train_itm.py

Lines 335 to 353 in 552e29e

for step, (frcn_feat_iter_list, bbox_feat_iter_list, rel_img_iter_list, cap_ix_iter_list, rel_cap_iter_list, neg_idx_list) in enumerate(tqdm.tqdm(neg_imgs_loader)):
    frcn_feat_iter_list = all_frcn_feat_iter_list[neg_idx_list, :]
    bbox_feat_iter_list = all_bbox_feat_iter_list[neg_idx_list, :]
    rel_img_iter_list = all_rel_img_iter_list[neg_idx_list, :]
    frcn_feat_iter_list = frcn_feat_iter_list.view(-1, self.__C.FRCNFEAT_LEN, self.__C.FRCNFEAT_SIZE)
    bbox_feat_iter_list = bbox_feat_iter_list.view(-1, self.__C.FRCNFEAT_LEN, 5)
    rel_img_iter_list = rel_img_iter_list.view(-1, self.__C.FRCNFEAT_LEN, self.__C.FRCNFEAT_LEN, 4)
    cap_ix_iter_list = cap_ix_iter_list.view(-1, neg_caps_loader.dataset.max_token)
    rel_cap_iter_list = rel_cap_iter_list.view(-1, neg_caps_loader.dataset.max_token, neg_caps_loader.dataset.max_token, 3)
    input = (frcn_feat_iter_list, bbox_feat_iter_list, rel_img_iter_list, cap_ix_iter_list, rel_cap_iter_list)
    scores = net(input)
    scores = scores.view(-1, self.__C.NEG_RANDSIZE)
    arg_scores = torch.argsort(scores, dim=-1, descending=True)[:, :self.__C.NEG_HARDSIZE]
    arg_scores_bi = torch.arange(arg_scores.size(0)).unsqueeze(1).expand_as(arg_scores)
    scores_ind = neg_idx_list[arg_scores_bi, arg_scores].to(scores.device)
    neg_imgs_idx_list.append(scores_ind)

And here is what confuses me,

mmnas/train_itm.py

Lines 336 to 338 in 552e29e

frcn_feat_iter_list = all_frcn_feat_iter_list[neg_idx_list, :]
bbox_feat_iter_list = all_bbox_feat_iter_list[neg_idx_list, :]
rel_img_iter_list = all_rel_img_iter_list[neg_idx_list, :]

Why use negative caption indices to fetch the corresponding image features?
I think these three lines of code should be removed.
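
Separately from the question above, the argsort/arange indexing near the end of the quoted block is a standard per-row top-k gather. A standalone illustration with toy shapes (not repo code):

import torch

scores = torch.randn(4, 6)                                    # [batch, NEG_RANDSIZE]
neg_idx = torch.arange(24).view(4, 6)                         # stand-in for neg_idx_list
top = torch.argsort(scores, dim=-1, descending=True)[:, :2]   # NEG_HARDSIZE = 2
rows = torch.arange(top.size(0)).unsqueeze(1).expand_as(top)
hardest = neg_idx[rows, top]                                  # hardest negatives per row
print(hardest.shape)                                          # torch.Size([4, 2])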

Why add 0 loss to the original loss?

mmnas/search_vqa.py

Lines 285 to 288 in 552e29e

# for avoid backward the unused params
loss += 0 * sum(p.sum() for p in net.module.alpha_prob_parameters())
loss += 0 * sum(p.sum() for p in net.module.alpha_gate_parameters())
loss += 0 * sum(p.sum() for p in net.module.net_parameters())

What is this part of the code aimed at?
I'd appreciate it if anyone could explain.
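
One plausible reading (inferred from the in-code comment, not an authoritative answer): under DistributedDataParallel, every registered parameter must receive a gradient in each backward pass, otherwise DDP raises an error about unused parameters. Multiplying the parameter sums by zero pulls them into the autograd graph while contributing exactly zero gradient. A minimal sketch of the effect:

import torch

w_used = torch.nn.Parameter(torch.randn(3))
w_unused = torch.nn.Parameter(torch.randn(3))

loss = w_used.sum() ** 2
loss = loss + 0 * w_unused.sum()   # pulls w_unused into the graph
loss.backward()
print(w_unused.grad)               # tensor([0., 0., 0.]) instead of None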
