pliang279 / MultiBench
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
License: MIT License
Firstly, thank you for this great repo.
I tried to run examples/multimedia/avmnist_simple_late_fusion.py. The training procedure runs fine, but when it reaches the test stage, it fails with the following error:
single_test(encoder, head, test_dataloaders_all[list(test_dataloaders_all.keys())[0]][0], auprc, modalnum, task, criterion)
AttributeError: 'DataLoader' object has no attribute 'keys'
I guess this is due to an incomplete refactoring? How can I fix this? Thanks!
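A defensive workaround sketch, assuming the refactor changed the return type from a dict of {noise_type: [DataLoader, ...]} to a single DataLoader (the helper name is hypothetical, not MultiBench's API):

```python
def first_test_loader(test_dataloaders_all):
    """Return the first test loader whether given a dict or a bare loader."""
    if isinstance(test_dataloaders_all, dict):
        first_key = list(test_dataloaders_all.keys())[0]
        return test_dataloaders_all[first_key][0]
    return test_dataloaders_all  # already a single DataLoader

# Stand-in object so the sketch runs without torch:
loader = object()
picked = first_test_loader({"image": [loader]})
```

Passing the result to single_test would then work with either return shape.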
Hi There!
Thanks for your work in putting together MultiBench --- this benchmark seems quite promising! I got a Google Scholar ping from your arXiv paper about the potential inclusion of Empirical Multimodally Additive Projections (EMAP) as a means of evaluating whether algorithms are using multimodal interactions to improve accuracy. I'm one of the authors of that paper, and after seeing your RFC, I wanted to reach out. Don't hesitate to let me know if I can be helpful implementation-wise for that potential addition!
Jack
In the main README file, the link for the processed MOSEI dataset is the same as the link for the processed humor dataset. Could you please update the README with the correct link?
After pulling the recent updates, the code no longer runs.
A recent change moved sys.path.append(os.getcwd()) after all the imports, which makes those imports raise errors.
I think the repo needs tests to make sure changes like this don't break it.
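A minimal sketch of the fix being suggested: the path tweak has to run before any repo-local import is attempted, e.g. at the very top of an example script (the commented import is illustrative):

```python
import os
import sys

# Make repo-root imports resolvable *before* anything tries to use them:
sys.path.insert(0, os.getcwd())

# Only now import repo-local modules, e.g.:
# from unimodals.common_models import MLP
```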
First off, I'm grateful for the repo, and hats off to the tremendous effort that went into building it.
When experimenting with one of the given examples, MultiBench/examples/multimedia/mmimdb_simple_early_fusion.py, I ran into multiple errors.
vgg.tar, synset_words.txt, and GoogleNews-vectors-negative300.bin.gz are required to run the function imported via from datasets.imdb.get_data import get_dataloader and to initialize the class imported via from .vgg import VGGClassifier. These files are loaded from local paths in the authors' source code but are not available in the git repo, which makes it hard for developers like me to run tests and experiment with the repo. I would also point out that the blocks package is not listed in the environment.yml file, so it had to be installed separately. If possible, please share the above files so I can run experiments for my project as well.
It would make it easier to import the package into my main codebase. Right now I have to add the MultiBench directory to the Python path in order to use it.
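A common workaround sketch until the repo is pip-installable: add the cloned MultiBench directory to sys.path before importing from it. MULTIBENCH_ROOT is a hypothetical environment variable you would point at your clone; the fallback path is a placeholder.

```python
import os
import sys

# MULTIBENCH_ROOT is a hypothetical variable; point it at your clone.
multibench_root = os.environ.get("MULTIBENCH_ROOT", "/path/to/MultiBench")
if multibench_root not in sys.path:
    sys.path.insert(0, multibench_root)
```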
I tested avmnist with different algorithms, but some of them hang while running, e.g., unimodal_1 (strangely, unimodal_0 is fine), MFM, and cca.
@pliang279 When I access the CMU-MOSEI labels using mosei.hdf, I get an array of length 7. What label does each array element correspond to?
Hi, thanks for your code.
When I use your code to train a model on the MOSEI dataset, I find that after 10 epochs the model starts overfitting. Is this normal?
By the way, in your example you train your model for 1000 epochs. Is this hyperparameter the result of your experiments?
Thanks!
My model is as follows:
encoders = [GRU(35, 300, dropout=True, has_padding=False).cuda(),
            GRU(74, 300, dropout=True, has_padding=False).cuda(),
            GRU(300, 300, dropout=True, has_padding=False).cuda()]
head = MLP(300, 150, 1).cuda()
fusion = add()
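On the overfitting question above: a common remedy is to stop once validation loss stops improving, rather than always running the full epoch budget. A minimal early-stopping sketch (the train_one_epoch/validate callables are placeholders, not MultiBench's API):

```python
def fit(train_one_epoch, validate, max_epochs=1000, patience=10):
    """Train until validation loss hasn't improved for `patience` epochs."""
    best_loss, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_one_epoch()
        val_loss = validate()
        if val_loss < best_loss:
            best_loss, best_epoch = val_loss, epoch
        elif epoch - best_epoch >= patience:
            break  # stop once validation stops improving
    return best_loss, epoch

# Demo: validation loss improves for 10 epochs, then worsens.
losses = iter([1.0 - 0.05 * i for i in range(10)] + [2.0] * 100)
best, stopped_at = fit(lambda: None, lambda: next(losses))
```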
Hello,
I am currently conducting some experiments on CMU-MOSI and CMU-MOSEI using mmsdk, but I would like to use MultiBench for my research as well. Where can I find the SOTA results? It seems to me that you are busy with other things than creating a leaderboard right now, but do you have any suggestions on how to reproduce the state of the art for MultiBench? Are you aware of the current state of the art?
In examples/affect, some files contain os.environ['CUDA_VISIBLE_DEVICES'] = '0', and this might result in an error.
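A less intrusive alternative sketch (an assumption about the intent, not the repo's current code): only pin GPU 0 when the user hasn't already set their own device mapping.

```python
import os

# Only pin GPU 0 when the user hasn't already chosen a mapping:
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")
```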
The authors did not mention it in requirements.txt, but after testing, PyTorch 2.0 and the corresponding torchtext version are required.
Hi,
Thank you for this amazing repo. I would like to ask for further information about how the MOSEI dataset was preprocessed for the released files in the affective computing part. I was wondering why the sentiment task includes 22,777 datapoints while the whole dataset seems to contain 23,453. It would be useful to include a small README with additional info on how each modality was preprocessed.
Hi, I recently tried the MUJOCO PUSH dataset, but I cannot figure out the concrete meaning of the modalities. The paper mentioned
The multimodal inputs are gray-scaled images (1 × 32 × 32) from an RGB camera, forces (and binary contact information) from a force/torque sensor, and the 3D position of the robot end-effector.
I found that the modalities in the dataset are "control", "image", "sensor", and "pos". How do these correspond to the modalities described in the paper (i.e., what does each one mean)?
It seems the IMDb dataset code uses Theano and Blocks; what are the system and version requirements for these? I have tried the official link, but it does not seem to work.
https://blocks.readthedocs.io/en/latest/setup.html
The error message looks like:
File "/tmp/pip-install-sazv5l9d/toolz_ccc093e7dfa34bf2af1fb5c703132aa3/toolz/functoolz.py", line 467
f.__name__ for f in reversed((self.first,) + self.funcs),
^
SyntaxError: Generator expression must be parenthesized
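The SyntaxError above usually indicates an old toolz release installed under a newer Python: since Python 3.7, a generator expression passed alongside another argument must be parenthesized, so upgrading toolz (or Blocks' pins) is likely the practical fix. A small sketch reproducing the general rule behind the error:

```python
bad = "print(f for f in [1, 2], sep=' ')"   # unparenthesized genexp + extra arg
good = "t = tuple((f for f in [1, 2]))"     # parenthesized: accepted

def compiles(src):
    """Return True if `src` is valid syntax for the running Python."""
    try:
        compile(src, "<check>", "exec")
        return True
    except SyntaxError:
        return False
```

On Python 3.7+, compiles(bad) is False with exactly the "Generator expression must be parenthesized" message, while compiles(good) is True.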
For example, why are some labels 1.333?
Thanks:)
Could you please take some time to clarify the following questions about the datasets?
For the MOSEI dataset, the encodings for a datapoint are of size 713. I understand that these features are obtained from the OpenFace and FACET libraries, but could you tell us which components/indices in the encodings come from which library?
For the MOSI dataset, the encodings are only of size 35. It seems only the FACET features are provided for this dataset. Is there a reason why the other (OpenFace) features are not used/provided as in MOSEI?
Are you fine-tuning on the training data of MOSI/MOSEI to obtain the video encodings?
Thank you again for your efforts. Your answers would save us many hours of banging our heads against the code.
Hello. I wish to open a PR sometime to add support for the DHG-14/28 dataset [ site | paper ]. It's a challenging dynamic hand-gesture recognition dataset consisting of three modalities.
However, there's a small issue: the standard evaluation process of this dataset is a bit different from the norm.
There are exactly 2800 data instances in the dataset, performed by 20 unique people. Benchmarks on this dataset are evaluated through a 20-fold, leave-one-out cross-validation process: models are trained 20 times, each time using 19 people's data for training while 1 person's data is strictly held out and used for validation. This prevents any data leakage and is supposed to make the evaluation more robust.
The instructions in MultiBench mention implementing get_dataloader and having it return 3 dataloaders for train, val, and test respectively. However, there is no test set in this dataset, but rather 20 combinations of train and val.
Would it be okay to implement it in such a way that it returns training and validation dataloaders only?
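A sketch of the leave-one-subject-out split the proposal describes (all names are hypothetical, not an existing MultiBench loader): each of the 20 subjects is held out once as the validation fold.

```python
def loso_splits(samples, n_subjects=20):
    """samples: list of (subject_id, datum); yields one (train, val) per fold."""
    for held_out in range(n_subjects):
        train = [d for s, d in samples if s != held_out]
        val = [d for s, d in samples if s == held_out]
        yield train, val

# Toy check: 3 subjects with 2 samples each.
samples = [(s, (s, i)) for s in range(3) for i in range(2)]
folds = list(loso_splits(samples, n_subjects=3))
```

Wrapping each fold's train/val lists in DataLoaders would give the 20 (train, val) pairs without needing a test split.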
Could you please tell us if scripts are available to obtain features from the raw video and audio data? Could you point us to the code or provide us with the code?
Hello everyone :)
First, congratulations on this amazing work and benchmark! It's really, really huge! Second, I was wondering if some already-trained models could potentially be shared (e.g., best state of the art, specific to some domains, etc.).
Again a huge congrats !!
Léo
Hello,
I would be interested to train an audio-only model (or, perhaps, a bimodal audio-text one) using CMU-MOSEI data.
I would be recomputing the audio embeddings.
So I would need only the links to the videos plus the timestamps and the annotated emotions per timestamp range.
How would I go about extracting this information?
Thanks,
Ed
Hi, thank you for your excellent work.
Recently, I found a small performance mismatch. It seems that the PyTorch model is not switched to eval mode during validation.
I suggest calling model.eval() at line 169, and model.train() a few lines later.
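The standard pattern being suggested looks roughly like this; a stub class stands in for torch.nn.Module so the mode toggling can be demonstrated without torch installed (the validate helper is a sketch, not the repo's code):

```python
class StubModel:
    """Stands in for torch.nn.Module so the toggling runs anywhere."""
    def __init__(self):
        self.training = True
    def train(self):
        self.training = True
    def eval(self):
        self.training = False

def validate(model, batches, step):
    model.eval()                  # freeze dropout / batch-norm statistics
    try:
        return [step(model, b) for b in batches]
    finally:
        model.train()             # restore training mode afterwards

model = StubModel()
modes_during_val = validate(model, [1, 2], lambda m, b: m.training)
```

In real torch code one would also wrap the validation pass in torch.no_grad().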
Hi, I have a question about relative robustness.
There is a function relative_robustness_helper in eval_scripts/robustness.py.
It may be my misunderstanding, but is that function correct?
I think it needs to compare against the 'LF' result, but it doesn't.
(I checked the paper, and there is some explanation about it, but I don't think it matches the code.)
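For concreteness, the kind of comparison the issue seems to expect can be sketched as follows. This is a hypothetical formulation for illustration only, not the paper's exact definition: a model's accuracies across increasing noise levels, aggregated relative to a late-fusion (LF) baseline's accuracies at the same levels.

```python
def relative_robustness(model_accs, lf_accs):
    """Sum of the model's accuracies across noise levels, relative to LF's."""
    return sum(model_accs) / sum(lf_accs)

# Model degrades more gracefully than the LF baseline -> ratio > 1.
rel = relative_robustness([0.8, 0.7, 0.5], [0.8, 0.6, 0.4])
```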
Where can I get the data in /data/yiwei/kinetics?
Hi, thanks for your code.
When I use your dataloader to load the MOSEI affect dataset [the one provided in your repo], I found that the batch video data shape is [batchsize, 50, 35] and the batch audio data shape is [batchsize, 50, 74]. What do the 50 and 35 in the video data shape mean? BTW, a regular video batch would have the format [batchsize, channel, clip_length, crop_size, crop_size], and [batchsize, 50, 35] doesn't follow this format. What is the reason for this?
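One plausible reading (an assumption, not confirmed by the authors): these are precomputed per-frame feature vectors rather than raw video, so the layout is [batchsize, timesteps, feature_dim]. A toy sketch of how the axes decompose:

```python
# Toy "video" batch: 8 examples, 50 time steps, 35 features per step.
batch = [[[0.0] * 35 for _ in range(50)] for _ in range(8)]
batchsize = len(batch)           # number of examples: 8
timesteps = len(batch[0])        # aligned time steps per example: 50
feature_dim = len(batch[0][0])   # visual features per time step: 35
```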
Thanks!
Hi,
Is there any way to access the VGGNet pretrained model and data used to extract parameters/features in vgg.py and get_data.py ?
The repo for this (multibench) seems to have been deleted.
TIA