PyTorch implementation of the paper: Deep Equilibrium Multimodal Fusion [arXiv].
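At its core, a deep equilibrium model computes its output as the fixed point z* = f(z*, x) of a transformation, found by running a solver to convergence instead of stacking layers; DEQ fusion applies this idea to a fusion function over the per-modality features. A minimal pure-Python sketch of the mechanism (the random weights and naive forward iteration below are illustrative stand-ins, not the paper's learned fusion function or the repo's solver):

```python
import math
import random

def deq_fuse(x1, x2, f_thres=100, tol=1e-8, seed=0):
    """Solve z* = tanh(W z* + U1 x1 + U2 x2) by naive forward iteration.
    Illustrative only: the weights are random stand-ins (the real model
    learns them and uses the solver in solver.py).
    Returns (z_star, final_residual)."""
    rng = random.Random(seed)
    d = len(x1)
    # Small entries in W keep the map contractive so the iteration converges.
    W  = [[0.1 * rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]
    U1 = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]
    U2 = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]
    mv = lambda M, v: [sum(m * vj for m, vj in zip(row, v)) for row in M]
    inj = [a + b for a, b in zip(mv(U1, x1), mv(U2, x2))]  # input injection, fixed while solving
    z = [0.0] * d
    res = float("inf")
    for _ in range(f_thres):            # solver iterations are capped
        z_next = [math.tanh(w + i) for w, i in zip(mv(W, z), inj)]
        res = max(abs(a - b) for a, b in zip(z_next, z))
        z = z_next
        if res < tol:
            break
    return z, res
```

The returned `z` is the fused representation: it satisfies the equilibrium equation up to `tol`, so its "depth" is implicit in the solver rather than fixed by a layer count.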
Please clone this repo and use the following commands to set up the environment; adjust the CUDA version according to your GPUs:
conda create -n deqfusion python==3.8 pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=10.2 -c pytorch
conda activate deqfusion
pip install -r requirements.txt
The code is modified from the official implementation of MM-Dynamics.
Please follow the command below to train a model on BRCA:
cd experiments/BRCA
python main.py --mode=train -mrna -dna -mirna --f_thres=105 --b_thres=106
Please run python main.py -h for more details.
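For context, in DEQ-style code the f_thres and b_thres flags conventionally cap the number of forward-pass and backward-pass fixed-point solver iterations (an assumption here; see solver.py and python main.py -h for the exact semantics in this repo). The tiny linear example below sketches what each threshold bounds: the forward solve finds the equilibrium of z = A z + inj, and the backward solve obtains the gradient with respect to the injection by iterating the transposed system g = Aᵀ g + dl/dz*, as in implicit differentiation (A, inj, and the loss are made-up toy values):

```python
# Toy contractive linear map (spectral radius < 1, so both solves converge).
A = [[0.3, 0.1],
     [0.0, 0.4]]
inj = [1.0, 2.0]          # input injection

def forward_solve(f_thres=100, tol=1e-12):
    """Iterate z <- A z + inj; f_thres caps the forward solver iterations."""
    z = [0.0, 0.0]
    for _ in range(f_thres):
        z_next = [A[i][0] * z[0] + A[i][1] * z[1] + inj[i] for i in range(2)]
        if max(abs(a - b) for a, b in zip(z_next, z)) < tol:
            return z_next
        z = z_next
    return z

def backward_solve(grad_z, b_thres=100, tol=1e-12):
    """Iterate g <- A^T g + dl/dz*; b_thres caps the backward solver
    iterations. The limit is (I - A)^-T @ grad_z, the gradient w.r.t. inj."""
    g = [0.0, 0.0]
    for _ in range(b_thres):
        g_next = [A[0][i] * g[0] + A[1][i] * g[1] + grad_z[i] for i in range(2)]
        if max(abs(a - b) for a, b in zip(g_next, g)) < tol:
            return g_next
        g = g_next
    return g

z_star = forward_solve()               # equilibrium of z = A z + inj
grad_inj = backward_solve([1.0, 1.0])  # dl/d(inj) for the toy loss l = z*[0] + z*[1]
```

Raising either threshold trades compute for a tighter equilibrium (or gradient); lowering it truncates the solve early.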
The code is modified from MultiBench.
Please first download the MM-IMDB dataset from here.
If using ResNet152, please also download raw MM-IMDB dataset from here.
There are several example scripts for running the experiments with different fusion strategies. To train a model with our DEQ fusion on MM-IMDB with the default settings (Word2vec+VGGNet), specify $FILE_PATH as the path to multimodal_imdb.hdf5, then run:
cd experiments/MM-IMDB
python examples/multimedia/mmimdb_deq.py -p $FILE_PATH
Alternatively, you can train a model with our DEQ fusion using BERT+ResNet152. First, specify $FILE_PATH as the path to the raw data directory mmimdb, then run the following commands:
cd experiments/MM-IMDB
python examples/multimedia/mmimdb_deq_bert_resnet152.py -p $FILE_PATH
The code is modified from the official implementation of Cross-Modal BERT.
First download the pre-trained BERT model using the following commands:
wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
unzip uncased_L-12_H-768_A-12.zip
Alternatively, you may download both files from Baidu Netdisk with code fuse.
Run the experiments by:
cd experiments/CMU-MOSI
python run_classifier.py
If you wish to use DEQ fusion on your own dataset, please copy DEQ_fusion.py, solver.py, and jacobian.py into your repo. Then run from DEQ_fusion import DEQFusion and use DEQFusion for multimodal fusion.
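DEQFusion's actual constructor and forward signature live in DEQ_fusion.py; the self-contained mock below only sketches where such a module sits in a pipeline, with per-modality encoder outputs going in and one fused vector coming out. Its fixed-point rule (averaging modalities toward a consensus) is a made-up stand-in, not the learned fusion in the repo:

```python
class MockDEQFusion:
    """Stand-in for DEQFusion: drives a shared state z to the fixed point
    of z = mean_i(0.5 * z + 0.5 * x_i) over modality features x_i.
    At that fixed point z equals the mean of the modality features."""

    def __init__(self, f_thres=50, tol=1e-9):
        self.f_thres = f_thres   # cap on solver iterations
        self.tol = tol           # convergence tolerance

    def __call__(self, features):
        d = len(features[0])
        z = [0.0] * d
        for _ in range(self.f_thres):
            z_next = [
                sum(0.5 * z[j] + 0.5 * x[j] for x in features) / len(features)
                for j in range(d)
            ]
            done = max(abs(a - b) for a, b in zip(z_next, z)) < self.tol
            z = z_next
            if done:
                break
        return z

# Per-modality encoder outputs (e.g. text and image features):
text_feat = [1.0, 0.0, 2.0]
image_feat = [0.0, 2.0, 2.0]
fused = MockDEQFusion()([text_feat, image_feat])  # feed fused to a classifier head
```

In the real repo the fused equilibrium feature would then be passed to the task head exactly where `fused` is used above.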
Our work benefits greatly from DEQ, MDEQ, MM-Dynamics, MultiBench, and Cross-Modal BERT.