noagarcia / roll-videoqa

PyTorch code for ROLL, a knowledge-based video story question answering model.

License: MIT License

Python 100.00%
visual-question-answering video-question-answering video-understanding knowledge-based-reasoning

roll-videoqa's Introduction

Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions

This is the PyTorch implementation of our ROLL model for VideoQA. ROLL was published at ECCV 2020. Find the technical paper here.

ROLL consists of three branches, each performing a different cognitively inspired task:

  1. Read branch: Dialog comprehension.
  2. Observe branch: Visual scene reasoning.
  3. Recall branch: Storyline recalling.

The information generated by each branch is encoded via Transformers. A modality weighting mechanism balances the output from the different modalities to predict the final answer.
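
To make the idea of weighting modalities concrete, here is a minimal, dependency-free sketch of weighted late fusion: each branch produces one score per candidate answer, and normalised per-branch weights combine them before picking the top answer. This is an illustration only, not the formulation used in the paper or in Source/fuse_branches.py; the function names, the softmax normalisation, and the example numbers are all assumptions.

```python
import math


def softmax(values):
    """Normalise raw values into positive weights that sum to 1."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(branch_scores, raw_weights):
    """Weighted sum of per-branch answer scores (hypothetical helper).

    branch_scores: dict mapping branch name -> list of scores, one per answer.
    raw_weights:   dict mapping branch name -> unnormalised branch weight.
    """
    names = sorted(branch_scores)
    weights = softmax([raw_weights[name] for name in names])
    num_answers = len(branch_scores[names[0]])
    fused = [0.0] * num_answers
    for name, weight in zip(names, weights):
        for idx, score in enumerate(branch_scores[name]):
            fused[idx] += weight * score
    return fused


def predict(fused):
    """Index of the candidate answer with the highest fused score."""
    return max(range(len(fused)), key=fused.__getitem__)


# Made-up scores for four candidate answers, for illustration only.
scores = {
    "read": [0.9, 0.1, 0.0, 0.0],
    "observe": [0.2, 0.6, 0.1, 0.1],
    "recall": [0.1, 0.7, 0.1, 0.1],
}
weights = {"read": 0.0, "observe": 0.0, "recall": 0.0}  # equal weighting
print(predict(fuse_scores(scores, weights)))
```

In a trained model the branch weights would be learned rather than fixed by hand; the fixed values here only show how the combination step works.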

Dependencies

This code runs on Python 3.6 and PyTorch 1.0.1. We recommend using Anaconda to install the dependencies.

conda create --name roll-videoqa python=3.6
conda activate roll-videoqa
conda install -c anaconda numpy pandas scikit-learn 
conda install -c conda-forge visdom tqdm
conda install pytorch==1.0.1 torchvision==0.2.2 -c pytorch
pip install pytorch-transformers
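
After installing, a quick way to confirm that the environment is complete is to check that each package can be located by the interpreter. The sketch below uses only the standard library; the helper name `missing_modules` is hypothetical, and the list uses the import names of the packages installed above (for example, scikit-learn is imported as `sklearn` and pytorch-transformers as `pytorch_transformers`).

```python
import importlib.util

# Import names corresponding to the packages installed above.
REQUIRED_MODULES = [
    "numpy",
    "pandas",
    "sklearn",
    "visdom",
    "tqdm",
    "torch",
    "torchvision",
    "pytorch_transformers",
]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

This only checks that the packages are present, not that their versions match the ones listed above.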

Data

For data preparation, follow instructions in DATA.md.

ROLL on KnowIT VQA

  1. Start Visdom Server. To visualize the training plots, first start the Visdom server: python -m visdom.server. Plots can be found by visiting http://localhost:8097 in a browser.
  2. Pretrain branches. The three branches (read, observe, recall) are first pretrained independently:
    # Read branch training using the subtitles
    python Source/branch_read.py --dataset knowit
    
    # Recall branch training using the video summaries
    python Source/branch_recall.py --dataset knowit
    
    # For the observe branch, the video descriptions must be generated first.
    # They are written to Data/knowit_observe/scenes_descriptions.csv
    python Source/generate_scene_description.py knowit
    
    # Observe branch training using the generated descriptions
    python Source/branch_observe.py --dataset knowit
    
  3. Multimodality fusion. The outputs from the branches are fused and the network is trained one last time using the modality weighting mechanism.
    python Source/fuse_branches.py --dataset knowit
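
The steps above can also be chained from a single script. The following is a minimal sketch, assuming it is run from the repository root inside the roll-videoqa environment; the helper names (`port_is_open`, `build_commands`, `run_pipeline`) are hypothetical and not part of the repository. It runs the commands in the order listed above and stops at the first failure. Whether the training scripts require a running Visdom server is not stated here, so the sketch only prints a reminder.

```python
import socket
import subprocess
import sys


def port_is_open(host="localhost", port=8097, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def build_commands(dataset="knowit"):
    """Commands from the steps above, in the order they are listed."""
    py = sys.executable  # same interpreter that runs this script
    return [
        [py, "Source/branch_read.py", "--dataset", dataset],
        [py, "Source/branch_recall.py", "--dataset", dataset],
        # Scene descriptions must exist before the observe branch is trained.
        [py, "Source/generate_scene_description.py", dataset],
        [py, "Source/branch_observe.py", "--dataset", dataset],
        [py, "Source/fuse_branches.py", "--dataset", dataset],
    ]


def run_pipeline(dataset="knowit"):
    """Run every step in sequence; raises CalledProcessError on failure."""
    if not port_is_open():
        print("Visdom does not seem to be running; "
              "start it with: python -m visdom.server")
    for command in build_commands(dataset):
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)
```

Calling `run_pipeline("knowit")` starts the full sequence; `build_commands` can be used on its own to inspect the commands without running anything.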
    

ROLL on TVQA+

TODO.

Citation

If you find this code useful, please cite our work:

@InProceedings{garcia2020knowledge,
   author    = {Noa Garcia and Yuta Nakashima},
   title     = {Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions},
   booktitle = {Proceedings of the European Conference on Computer Vision},
   year      = {2020},
}
@InProceedings{garcia2020knowit,
   author    = {Noa Garcia and Mayu Otani and Chenhui Chu and Yuta Nakashima},
   title     = {KnowIT VQA: Answering Knowledge-Based Questions about Videos},
   booktitle = {Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence},
   year      = {2020},
}

TODO

  • TVQA+ code

roll-videoqa's Issues

How to visualize the scene graph?

Hello!
How can I visualize the scene graph, as in Fig. 7 of your paper?
Alternatively, is there a way to test the scene graph generation part independently?
Thanks a lot!

Links for object relations are down

The link returns the following XML error:

Code: NoSuchBucket
Message: The specified bucket does not exist
BucketName: knowit-vqa
RequestId: HEMH2KMVCX1MJQ9S
HostId: mXZFX0mG5Rv4qyCi6UCiAzTUO71VXioomrFmcNc98DzORR1vd7FhpumiLl34Ct/2IqO4UUwURuw=

Could you update the link or send the files to me privately? My email is [email protected]
Thank you!
