pliang279 / multibench

430 stars · 16 watchers · 63 forks · 51.09 MB

[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning

License: MIT License

Languages: Python 27.43%, Shell 0.02%, Makefile 0.01%, CSS 0.75%, JavaScript 1.46%, HTML 70.31%, Batchfile 0.02%
Topics: machine-learning, multimodal-learning, robotics, natural-language-processing, computer-vision, deep-learning, healthcare, representation-learning, speech-processing

multibench's People

Contributors

arav-agarwal2, js0nwu, kapikantzari, lvyiwei1, mrbeann, neal-ztwu, peter-yh-wu, pliang279, sfanxiang, vanvan2017

multibench's Issues

Error while testing with avmnist_simple_late_fusion.

First of all, thank you for this great repo.
I tried to run examples/multimedia/avmnist_simple_late_fusion.py. The training procedure runs fine, but when it reaches the test step it fails with the following error:

single_test(encoder, head, test_dataloaders_all[list(test_dataloaders_all.keys())[0]][0], auprc, modalnum, task, criterion)
AttributeError: 'DataLoader' object has no attribute 'keys'

I guess this is due to an incomplete refactoring? How can I fix this? Thanks!
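
A possible workaround, sketched under the assumption that the traceback means get_dataloader here returns a plain test DataLoader rather than the dict of robustness test loaders the example indexes into (all names are taken from the example above):

# Hedged sketch, not the official fix: accept either return shape.
if isinstance(test_dataloaders_all, dict):
    test_loader = test_dataloaders_all[list(test_dataloaders_all.keys())[0]][0]
else:
    test_loader = test_dataloaders_all
single_test(encoder, head, test_loader, auprc, modalnum, task, criterion)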

EMAP evaluation

Hi there!

Thanks for your work in putting together MultiBench --- this benchmark seems quite promising! I got a Google Scholar ping from your arXiv paper about the potential inclusion of Empirical Multimodally Additive Projections (EMAP) as a means of evaluating whether or not algorithms are using multimodal interactions to get better accuracy. I'm one of the authors of that paper, and after seeing your RFC, I wanted to reach out. Don't hesitate to let me know if I can be helpful implementation-wise for that potential addition!

Jack
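
For context, EMAP replaces each prediction f(a_i, b_i) with its best multimodally additive approximation, so a gap between real accuracy and EMAP accuracy signals that a model exploits cross-modal interactions. A minimal NumPy sketch, assuming a scalar-logit model f and paired two-modality inputs (illustrative, not MultiBench code):

import numpy as np

def emap_logits(f, xs_a, xs_b):
    # Pairwise logits over every (a_i, b_j) combination of the two modalities.
    n = len(xs_a)
    pair = np.array([[f(xs_a[i], xs_b[j]) for j in range(n)] for i in range(n)])
    # Additive projection: mean_j f(a_i, b_j) + mean_j f(a_j, b_i) - grand mean.
    return pair.mean(axis=1) + pair.mean(axis=0) - pair.mean()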

Errors running mmimdb examples

First off, I'm grateful for the repo, and hats off to the tremendous effort that went into building this.

When experimenting with one of the given examples, MultiBench/examples/multimedia/mmimdb_simple_early_fusion.py, I ran into multiple errors.

  1. The files vgg.tar, synset_words.txt and GoogleNews-vectors-negative300.bin.gz are required to run get_dataloader (from datasets.imdb.get_data) and to initialize VGGClassifier (from .vgg). These are loaded from local paths in the authors' source code but are not available in the git repo, which makes it hard for developers like me to run tests and experiment with the repo.

I would also like to point out that the blocks package is not listed in the environment.yml file, so it had to be installed separately. If possible, please share the above files so I can run experiments for my project as well.
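
At least the word vectors are publicly available: GoogleNews-vectors-negative300.bin.gz is the file distributed with the original word2vec release, and once downloaded it can be loaded with gensim as in this sketch (the local path is an assumption):

from gensim.models import KeyedVectors

# Standard gensim usage, not MultiBench-specific code; gensim reads the
# gzipped binary file directly.
w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin.gz", binary=True)
print(w2v["movie"].shape)  # (300,)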

Some algorithms hang while running.

I tested avmnist with different algorithms, but some of them hang while running, e.g. unimodal_1 (strangely, unimodal_0 is fine), MFM, and cca.

Labels for CMU MOSEI

@pliang279 When I access the CMU-MOSEI labels using mosei.hdf, I get an array of 7 elements. What label does each array element correspond to?

Question about the mosei dataset

Hi, thanks for your code.

When I use your code to train a model on the MOSEI dataset, I find that after 10 epochs the model starts overfitting. Is this normal?

By the way, in your example you train the model for 1000 epochs. Is this hyperparameter the result of your experiments?

Thanks!

My model is as follows:

encoders = [GRU(35, 300, dropout=True, has_padding=False).cuda(),
            GRU(74, 300, dropout=True, has_padding=False).cuda(),
            GRU(300, 300, dropout=True, has_padding=False).cuda()]
head = MLP(300, 150, 1).cuda()
fusion = add()
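
If overfitting does set in early, a generic early-stopping guard would cap training well before 1000 epochs; a sketch with hypothetical train_one_epoch/evaluate helpers (not MultiBench's training loop):

best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(1000):
    train_one_epoch(model, train_loader)       # hypothetical helper
    val_loss = evaluate(model, valid_loader)   # hypothetical helper
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # no validation improvement for `patience` epochs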

Leaderboard

Hello,
I am currently conducting some experiments on CMU-MOSI and CMU-MOSEI using mmsdk, but I would like to use MultiBench too for my research. Where can I find the state of the art? It seems you are busy with other things than creating a leaderboard right now, but do you have any suggestions on how to reproduce the state of the art for MultiBench? Are you aware of the current state of the art?

Suggestion about PyTorch version

The authors do not mention it in requirements.txt, but after testing, torch 2.0 and the corresponding torchtext version turn out to be required.
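
A quick sanity check for the pairing, assuming torchtext 0.15.x is the release series matched to torch 2.0 (torchtext wheels are pinned to specific torch versions):

import torch
import torchtext

# Both packages should be upgraded together; expect versions like
# 2.0.x for torch and 0.15.x for torchtext.
print(torch.__version__, torchtext.__version__)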

Info regarding preprocessed MOSEI

Hi,

Thank you for this amazing repo. I would like to ask for further information about how the MOSEI dataset was preprocessed for the released files in the affective computing part. I was wondering why the sentiment set includes 22,777 datapoints while the whole dataset seems to have 23,453. It would also be useful to include a small readme with additional info on how each modality was preprocessed.

What's the meaning of modalities in MUJOCO PUSH dataset?

Hi, I recently tried the MUJOCO PUSH dataset, but I cannot figure out the concrete meaning of the modalities. The paper mentions:

The multimodal inputs are gray-scaled images (1 × 32 × 32) from an RGB camera, forces (and binary contact information) from a force/torque sensor, and the 3D position of the robot end-effector.

I found that the modalities in the dataset are "control", "image", "sensor", and "pos". What is the correspondence between these modalities and the paper (i.e., what is the meaning of each modality)?
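
One way to start mapping them is to compare array shapes against the paper's description (e.g. the 1 × 32 × 32 images); a sketch assuming the dataset loads as a dict of arrays keyed by those names (an assumption, adapt to the actual loader):

# Print each modality's array shape to match it against the paper.
for name in ("control", "image", "sensor", "pos"):
    print(name, dataset[name].shape)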

requirements for imdb dataset

It seems the imdb dataset uses theano and blocks; what are the system and version requirements for these? I have tried the official link, but it does not seem to work:
https://blocks.readthedocs.io/en/latest/setup.html

I get an error message like:

File "/tmp/pip-install-sazv5l9d/toolz_ccc093e7dfa34bf2af1fb5c703132aa3/toolz/functoolz.py", line 467
      f.__name__ for f in reversed((self.first,) + self.funcs),
      ^
  SyntaxError: Generator expression must be parenthesized

Questions about the video encodings of the mosi and mosei datasets

Thank you for writing a brilliant paper and a convenient code repository to reproduce the results. I have gone through the repo and the paper, but I still have questions about the implemented datasets and dataloaders.

Could you please take some time to clarify the following questions about the datasets?

  1. For the MOSEI dataset, the encodings for a datapoint are of size 713. I understand that these features are obtained from the OpenFace and Facet libraries, but could you tell us which components/indices in the encodings are obtained from where?

  2. For the MOSI dataset, the encodings are only of size 35. It seems only the Facet features are provided for this dataset. Is there a reason why the other (OpenFace) features are not used/provided as in MOSEI?

  3. Are you fine-tuning on the training data of MOSI/MOSEI to obtain the video encodings?

Thank you again for your efforts. Your answers would save us many hours of banging our heads against the code.

Question regarding the DHG-14/28 dataset

Hello. I would like to open a PR sometime to add support for the DHG-14/28 dataset [ site | paper ]. It's a challenging dynamic hand-gesture recognition dataset consisting of three modalities:

  • Depth videos / sequences of 16-bit depth-maps, at resolution 640x480
  • Sequences of 2D skeleton coordinates (in the image space) of 22 hand joints (frames, 22*2)
  • Sequences of 3D skeleton coordinates (in the world space), (frames, 22*3)

However, there's a small issue: the standard evaluation process of this dataset is a bit different from the norm.

There are exactly 2800 data instances in the dataset, performed by 20 unique people. Benchmarks on this dataset are evaluated through a 20-fold 'leave-one-out' cross-validation process: models are trained 20 times, each time using 19 people's data for training while 1 person's data is strictly isolated and used for validation. This prevents any data leakage and is supposed to increase the robustness of the evaluation.

The instructions in MultiBench mention implementing get_dataloader and having it return 3 dataloaders for train, val, and test respectively. However, there is no test set in this dataset, rather 20 combinations of train and val.

Would it be okay to implement it in such a way that it returns training and validation dataloaders only?
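
For reference, a minimal sketch of generating the 20 splits under that protocol, assuming each sample records the id of the person who performed it (the field and function names are illustrative):

def leave_one_subject_out_splits(samples, num_subjects=20):
    # Yield 20 (train, val) partitions, holding out one performer per fold.
    for held_out in range(1, num_subjects + 1):
        train = [s for s in samples if s["subject"] != held_out]
        val = [s for s in samples if s["subject"] == held_out]
        yield train, val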

Code to obtain features from raw data

Could you please tell us if scripts are available to obtain features from the raw video and audio data? Could you point us to the code or provide it?

[QUESTION] Availability of trained models

Hello everyone :)

First, congratulations on this amazing work and benchmark! It's really, really huge! Second, I was wondering if some already-trained models could potentially be shared (e.g. the best state of the art, models specific to some domains, etc.).

Again, a huge congrats!!

Léo

Extracting info from the H5 files

Hello,

I would be interested in training an audio-only model (or perhaps a bimodal audio-text one) using CMU-MOSEI data.

I would be recomputing the audio embeddings, so I would need only the links to the videos plus the timestamps and the annotated emotions per timestamp range.

How would I go about extracting this information?

Thanks,

Ed
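
A starting point, assuming the released files are standard HDF5 (the filename below is illustrative): listing every group and dataset path usually reveals where the video ids, intervals, and labels live.

import h5py

# Walk the whole HDF5 hierarchy, printing each group/dataset path.
with h5py.File("mosei.hdf5", "r") as f:
    f.visit(print)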

Question about relative robustness

Hi, I have a question about relative robustness.
There is a function relative_robustness_helper in eval_scripts/robustness.py.
It may be my misunderstanding, but is that function correct?
I think it needs to compare against the 'LF' result, but it doesn't do so.
(I checked the paper; there is some explanation about it, but I think it doesn't match the code.)

questions about mosei dataset

Hi, thanks for your code.

When I use your dataloader to load the MOSEI affect dataset [I used the dataset provided in your repo], I find that the batch video data shape is [batchsize, 50, 35] and the batch audio data shape is [batchsize, 50, 74]. What do the 50 and 35 in the video data shape mean? By the way, the regular video batch data format should be [batchsize, channel, clip_length, crop_size, crop_size]; [batchsize, 50, 35] doesn't seem to follow this format. What is the reason for this?

Thanks!
