
mim's Introduction

Multimodal Information Modulation

This is the open-source code for the paper "Multimodal Reaction: Information Modulation for Cross-modal Representation Learning".

Description

We provide an implementation for the task of multimodal sentiment analysis. The main components are as follows:

  1. ./datasets contains the datasets used in the experiments
  2. ./modules contains the model definition
  3. ./utils contains the functions for data processing, evaluation metrics, etc. The file ./utils/loss.py will be updated in May 2024.
  4. global_configs.py defines important constants
  5. train.py defines the training process

Preparation

Datasets

The processed MOSI dataset is already provided in ./datasets.

Run datasets/download_datasets.sh to download the larger MOSEI dataset.

Installation

Run pip install -r requirements.txt to install the required packages.

Configuration

Before training, set the global constants in global_configs.py. The default configuration targets the MOSI dataset. Important settings include the GPU setting, learning rate, feature dimension, number of training epochs, training batch size, dataset setting, and the input dimension of each modality. To run on the MOSEI dataset, remember to change the dataset setting and the dimension of the visual modality (VISUAL_DIM is 47 for MOSI and 35 for MOSEI):

from torch import nn

class DefaultConfigs(object):

    device = '1'                                 # GPU setting
    logs = './logs/'

    max_seq_length = 50
    lr = 1e-5                                    # learning rate
    d_l = 80                                     # feature dimension
    n_epochs = 100                               # training epochs
    train_batch_size = 16                        # training batch size
    dev_batch_size = 128
    test_batch_size = 128
    model_choice = 'bert-base-uncased'

    dataset = 'mosi'                             # dataset setting
    TEXT_DIM = 768                               # dimension setting
    ACOUSTIC_DIM = 74
    VISUAL_DIM = 47

    # For MOSEI, use these settings instead of the four lines above:
    # dataset = 'mosei'
    # ACOUSTIC_DIM = 74
    # VISUAL_DIM = 35
    # TEXT_DIM = 768

config = DefaultConfigs()
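
Switching datasets means changing the dataset name and the visual dimension together. The following is a minimal sketch of one way to keep them in sync. The names DATASET_DIMS, dims_for, and SketchConfigs are hypothetical and are not part of the repository; only the dimension values come from the configuration above.

```python
# Hypothetical helper, NOT part of the repository: stores each dataset's input
# dimensions in one place so the name and the dimensions cannot drift apart.
DATASET_DIMS = {
    "mosi": {"TEXT_DIM": 768, "ACOUSTIC_DIM": 74, "VISUAL_DIM": 47},
    "mosei": {"TEXT_DIM": 768, "ACOUSTIC_DIM": 74, "VISUAL_DIM": 35},
}


def dims_for(dataset):
    """Return a copy of the input dimensions for 'mosi' or 'mosei'."""
    try:
        return dict(DATASET_DIMS[dataset.lower()])
    except KeyError:
        raise ValueError(f"unknown dataset: {dataset!r}") from None


class SketchConfigs:
    """Illustrative stand-in for DefaultConfigs (dataset fields only)."""

    def __init__(self, dataset="mosi"):
        self.dataset = dataset.lower()
        # Copy TEXT_DIM, ACOUSTIC_DIM and VISUAL_DIM onto the instance.
        for name, value in dims_for(dataset).items():
            setattr(self, name, value)


if __name__ == "__main__":
    config = SketchConfigs("mosei")
    print(config.dataset, config.VISUAL_DIM)
```

With this layout, choosing MOSEI is a single argument change rather than several hand edits, at the cost of a small departure from the flat class used in global_configs.py.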

Running

To run the experiments, use the following command:

python train.py

Note that the proposed method works with various fusion strategies. We have performed experiments with element-wise addition, element-wise multiplication, concatenation, and tensor fusion. If tensor fusion is adopted, the feature dimension d_l in global_configs.py can be set larger to give the model higher capacity.
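
To make the four fusion strategies concrete, here is a minimal sketch on plain Python lists, so that it runs without any dependencies. It is not the repository's implementation, and all function names are hypothetical. The tensor fusion variant follows a common formulation (outer product of the two feature vectors, each extended with a constant 1); whether the repository uses the same formulation is an assumption.

```python
# Illustrative sketch only: the repository's fusion code is not shown in this
# README, so every function name here is hypothetical.

def _check_same_length(a, b):
    if len(a) != len(b):
        raise ValueError("element-wise fusion needs equal-length vectors")


def fuse_add(a, b):
    """Element-wise addition; output has the same size as each input."""
    _check_same_length(a, b)
    return [x + y for x, y in zip(a, b)]


def fuse_mul(a, b):
    """Element-wise multiplication; output has the same size as each input."""
    _check_same_length(a, b)
    return [x * y for x, y in zip(a, b)]


def fuse_concat(a, b):
    """Concatenation; output size is len(a) + len(b)."""
    return list(a) + list(b)


def fuse_tensor(a, b):
    """Flattened outer product of [1] + a and [1] + b.

    Output size is (len(a) + 1) * (len(b) + 1).
    """
    a1 = [1.0] + list(a)
    b1 = [1.0] + list(b)
    return [x * y for x in a1 for y in b1]


if __name__ == "__main__":
    d_l = 80  # default feature dimension from global_configs.py
    a = [0.5] * d_l
    b = [2.0] * d_l
    for name, fuse in [("add", fuse_add), ("mul", fuse_mul),
                       ("concat", fuse_concat), ("tensor", fuse_tensor)]:
        print(name, len(fuse(a, b)))
```

The printed sizes show how the fused representation grows: addition and multiplication keep d_l features, concatenation doubles them, and this form of tensor fusion grows quadratically with d_l.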

Acknowledgments

We are grateful to Hugging Face and MAG-BERT, both of which were of great help to our work.


mim's Issues

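Incomplete code

Hello, I read your paper carefully and am very interested in the code for dynamically determining the primary modality, but I found that the code is incomplete. Could you complete it?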
