pliang279 / multibench

430 stars · 16 watchers · 63 forks · 51.09 MB

[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning

License: MIT License

Languages: Python 27.43%, Shell 0.02%, Makefile 0.01%, CSS 0.75%, JavaScript 1.46%, HTML 70.31%, Batchfile 0.02%
Topics: machine-learning, multimodal-learning, robotics, natural-language-processing, computer-vision, deep-learning, healthcare, representation-learning, speech-processing

multibench's People

Contributors

arav-agarwal2, js0nwu, kapikantzari, lvyiwei1, mrbeann, neal-ztwu, peter-yh-wu, pliang279, sfanxiang, vanvan2017

multibench's Issues

Error while testing with avmnist_simple_late_fusion.

First of all, thank you for this great repo.
I tried to run examples/multimedia/avmnist_simple_late_fusion.py. The training procedure runs fine, but when it reaches the test step it fails with the following error:

single_test(encoder, head, test_dataloaders_all[list(test_dataloaders_all.keys())[0]][0], auprc, modalnum, task, criterion)
AttributeError: 'DataLoader' object has no attribute 'keys'

I guess this is due to an incomplete refactoring? How can I fix this? Thanks!
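
A possible workaround, sketched under the assumption that the traceback means get_dataloader here returns a plain test DataLoader rather than the dict of robustness test loaders the example indexes into (all names are taken from the example above):

# Hedged sketch, not the official fix: accept either return shape.
if isinstance(test_dataloaders_all, dict):
    test_loader = test_dataloaders_all[list(test_dataloaders_all.keys())[0]][0]
else:
    test_loader = test_dataloaders_all
single_test(encoder, head, test_loader, auprc, modalnum, task, criterion)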

EMAP evaluation

Hi there!

Thanks for your work in putting together MultiBench --- this benchmark seems quite promising! I got a Google Scholar ping from your arXiv paper about the potential inclusion of Empirical Multimodally Additive Projections (EMAP) as a means of evaluating whether or not algorithms are using multimodal interactions to get better accuracy. I'm one of the authors of that paper, and after seeing your RFC, I wanted to reach out. Don't hesitate to let me know if I can be helpful implementation-wise for that potential addition!

Jack
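
For context, EMAP replaces each prediction f(a_i, b_i) with its best multimodally additive approximation, so a gap between real accuracy and EMAP accuracy signals that a model exploits cross-modal interactions. A minimal NumPy sketch, assuming a scalar-logit model f and paired two-modality inputs (illustrative, not MultiBench code):

import numpy as np

def emap_logits(f, xs_a, xs_b):
    # Pairwise logits over every (a_i, b_j) combination of the two modalities.
    n = len(xs_a)
    pair = np.array([[f(xs_a[i], xs_b[j]) for j in range(n)] for i in range(n)])
    # Additive projection: mean_j f(a_i, b_j) + mean_j f(a_j, b_i) - grand mean.
    return pair.mean(axis=1) + pair.mean(axis=0) - pair.mean()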

Errors running mmimdb examples

First off, I'm grateful for the repo, and hats off to the tremendous effort that went into building this.

When experimenting with one of the given examples, MultiBench/examples/multimedia/mmimdb_simple_early_fusion.py, I ran into multiple errors.

  1. The files vgg.tar, synset_words.txt and GoogleNews-vectors-negative300.bin.gz are required to run get_dataloader (from datasets.imdb.get_data) and to initialize VGGClassifier (from .vgg). These are loaded from local paths in the authors' source code but are not available in the git repo, which makes it hard for developers like me to run tests and experiment with the repo.

I would also like to point out that the blocks package is not listed in the environment.yml file, so it had to be installed separately. If possible, please share the above files so I can run experiments for my project as well.
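
At least the word vectors are publicly available: GoogleNews-vectors-negative300.bin.gz is the file distributed with the original word2vec release, and once downloaded it can be loaded with gensim as in this sketch (the local path is an assumption):

from gensim.models import KeyedVectors

# Standard gensim usage, not MultiBench-specific code; gensim reads the
# gzipped binary file directly.
w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin.gz", binary=True)
print(w2v["movie"].shape)  # (300,)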

Some algorithms hang while running.

I tested avmnist with different algorithms, but some of them hang while running, e.g. unimodal_1 (strangely, unimodal_0 is fine), MFM, and cca.

Labels for CMU MOSEI

@pliang279 When I access the CMU-MOSEI labels using mosei.hdf, I get an array of 7 elements. What label does each array element correspond to?

Question about the mosei dataset

Hi, thanks for your code.

When I use your code to train a model on the MOSEI dataset, I find that after 10 epochs the model starts overfitting. Is this normal?

By the way, in your example you train the model for 1000 epochs. Is this hyperparameter the result of your experiments?

Thanks!

My model is as follows:

encoders = [GRU(35, 300, dropout=True, has_padding=False).cuda(),
            GRU(74, 300, dropout=True, has_padding=False).cuda(),
            GRU(300, 300, dropout=True, has_padding=False).cuda()]
head = MLP(300, 150, 1).cuda()
fusion = add()
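
If overfitting does set in early, a generic early-stopping guard would cap training well before 1000 epochs; a sketch with hypothetical train_one_epoch/evaluate helpers (not MultiBench's training loop):

best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(1000):
    train_one_epoch(model, train_loader)       # hypothetical helper
    val_loss = evaluate(model, valid_loader)   # hypothetical helper
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # no validation improvement for `patience` epochs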

Leaderboard

Hello,
I am currently conducting some experiments on CMU-MOSI and CMU-MOSEI using mmsdk, but I would like to use MultiBench too for my research. Where can I find the state of the art? It seems you are busy with other things than creating a leaderboard right now, but do you have any suggestions on how to reproduce the state of the art for MultiBench? Are you aware of the current state of the art?

Suggestion about PyTorch version

The authors do not mention it in requirements.txt, but after testing, torch 2.0 and the corresponding torchtext version turn out to be required.
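
A quick sanity check for the pairing, assuming torchtext 0.15.x is the release series matched to torch 2.0 (torchtext wheels are pinned to specific torch versions):

import torch
import torchtext

# Both packages should be upgraded together; expect versions like
# 2.0.x for torch and 0.15.x for torchtext.
print(torch.__version__, torchtext.__version__)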

Info regarding preprocessed MOSEI

Hi,

Thank you for this amazing repo. I would like to ask for further information about how the MOSEI dataset was preprocessed for the released files in the affective computing part. I was wondering why the sentiment set includes 22,777 datapoints while the whole dataset seems to have 23,453. It would also be useful to include a small readme with additional info on how each modality was preprocessed.

What's the meaning of modalities in MUJOCO PUSH dataset?

Hi, I recently tried the MUJOCO PUSH dataset, but I cannot figure out the concrete meaning of the modalities. The paper mentions:

The multimodal inputs are gray-scaled images (1 × 32 × 32) from an RGB camera, forces (and binary contact information) from a force/torque sensor, and the 3D position of the robot end-effector.

I found that the modalities in the dataset are "control", "image", "sensor", and "pos". What is the correspondence between these modalities and the paper (i.e., what is the meaning of each modality)?
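
One way to start mapping them is to compare array shapes against the paper's description (e.g. the 1 × 32 × 32 images); a sketch assuming the dataset loads as a dict of arrays keyed by those names (an assumption, adapt to the actual loader):

# Print each modality's array shape to match it against the paper.
for name in ("control", "image", "sensor", "pos"):
    print(name, dataset[name].shape)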

requirements for imdb dataset

It seems the imdb dataset uses theano and blocks; what are the system and version requirements for these? I have tried the official link, but it does not seem to work:
https://blocks.readthedocs.io/en/latest/setup.html

I get an error message like:

File "/tmp/pip-install-sazv5l9d/toolz_ccc093e7dfa34bf2af1fb5c703132aa3/toolz/functoolz.py", line 467
      f.__name__ for f in reversed((self.first,) + self.funcs),
      ^
  SyntaxError: Generator expression must be parenthesized

Questions about the video encodings of the mosi and mosei datasets

Thank you for writing a brilliant paper and a convenient code repository to reproduce the results. I have gone through the repo and the paper, but I still have questions about the implemented datasets and dataloaders.

Could you please take some time to clarify the following questions about the datasets?

  1. For the MOSEI dataset, the encodings for a datapoint are of size 713. I understand that these features are obtained from the OpenFace and Facet libraries, but could you tell us which components/indices in the encodings are obtained from where?

  2. For the MOSI dataset, the encodings are only of size 35. It seems only the Facet features are provided for this dataset. Is there a reason why the other (OpenFace) features are not used/provided as in MOSEI?

  3. Are you fine-tuning on the training data of MOSI/MOSEI to obtain the video encodings?

Thank you again for your efforts. Your answers would save us many hours of banging our heads against the code.

Question regarding the DHG-14/28 dataset

Hello. I would like to open a PR sometime to add support for the DHG-14/28 dataset [ site | paper ]. It's a challenging dynamic hand-gesture recognition dataset consisting of three modalities:

  • Depth videos / sequences of 16-bit depth-maps, at resolution 640x480
  • Sequences of 2D skeleton coordinates (in the image space) of 22 hand joints (frames, 22*2)
  • Sequences of 3D skeleton coordinates (in the world space), (frames, 22*3)

However, there's a small issue: the standard evaluation process of this dataset is a bit different from the norm.

There are exactly 2800 data instances in the dataset, performed by 20 unique people. Benchmarks on this dataset are evaluated through a 20-fold 'leave-one-out' cross-validation process: models are trained 20 times, each time using 19 people's data for training while 1 person's data is strictly isolated and used for validation. This prevents any data leakage and is supposed to increase the robustness of the evaluation.

The instructions in MultiBench mention implementing get_dataloader and having it return 3 dataloaders for train, val, and test respectively. However, there is no test set in this dataset, rather 20 combinations of train and val.

Would it be okay to implement it in such a way that it returns training and validation dataloaders only?
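
For reference, a minimal sketch of generating the 20 splits under that protocol, assuming each sample records the id of the person who performed it (the field and function names are illustrative):

def leave_one_subject_out_splits(samples, num_subjects=20):
    # Yield 20 (train, val) partitions, holding out one performer per fold.
    for held_out in range(1, num_subjects + 1):
        train = [s for s in samples if s["subject"] != held_out]
        val = [s for s in samples if s["subject"] == held_out]
        yield train, val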

Code to obtain features from raw data

Could you please tell us if scripts are available to obtain features from the raw video and audio data? Could you point us to the code or provide it?

[QUESTION] Availability of trained models

Hello everyone :)

First, congratulations on this amazing work and benchmark! It's really, really huge! Second, I was wondering if some already-trained models could potentially be shared (e.g. the best state of the art, models specific to some domains, etc.).

Again, a huge congrats!!

Léo

Extracting info from the H5 files

Hello,

I would be interested in training an audio-only model (or perhaps a bimodal audio-text one) using CMU-MOSEI data.

I would be recomputing the audio embeddings, so I would need only the links to the videos plus the timestamps and the annotated emotions per timestamp range.

How would I go about extracting this information?

Thanks,

Ed
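
A starting point, assuming the released files are standard HDF5 (the filename below is illustrative): listing every group and dataset path usually reveals where the video ids, intervals, and labels live.

import h5py

# Walk the whole HDF5 hierarchy, printing each group/dataset path.
with h5py.File("mosei.hdf5", "r") as f:
    f.visit(print)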

Question about relative robustness

Hi, I have a question about relative robustness.
There is a function relative_robustness_helper in eval_scripts/robustness.py.
It may be my misunderstanding, but is that function correct?
I think it needs to compare against the 'LF' result, but it doesn't do so.
(I checked the paper; there is some explanation about it, but I think it doesn't match the code.)

questions about mosei dataset

Hi, thanks for your code.

When I use your dataloader to load the MOSEI affect dataset [I used the dataset provided in your repo], I find that the batch video data shape is [batchsize, 50, 35] and the batch audio data shape is [batchsize, 50, 74]. What do the 50 and 35 in the video data shape mean? By the way, the regular video batch data format should be [batchsize, channel, clip_length, crop_size, crop_size]; [batchsize, 50, 35] doesn't seem to follow this format. What is the reason for this?

Thanks!
