VIOLIN: A Large-Scale Dataset for Video-and-Language Inference

Data and code for CVPR 2020 paper: "VIOLIN: A Large-Scale Dataset for Video-and-Language Inference"

[Example figure: a video clip with aligned subtitles as premise, paired with an entailed (positive) and a contradicted (negative) statement.]

We introduce a new task, Video-and-Language Inference, for joint multimodal understanding of video and text. Given a video clip with aligned subtitles as premise, paired with a natural language hypothesis based on the video content, a model needs to infer whether the hypothesis is entailed or contradicted by the given video clip.

We also present a new large-scale dataset, named Violin (VIdeO-and-Language INference), for this task. It consists of 95,322 video-hypothesis pairs from 15,887 video clips, spanning over 582 hours of video (YouTube and TV shows). To address this new multimodal inference task, a model is required to possess sophisticated reasoning skills, from surface-level grounding (e.g., identifying objects and characters in the video) to in-depth commonsense reasoning (e.g., inferring causal relations of events in the video).

News

  • 2020.04.29 Baseline code released, and leaderboard will be available soon.
  • 2020.04.04 Data features, subtitles and statements released.
  • 2020.03.25 Paper released (arXiv).

Violin Dataset

  • Data Statistics

| source | #episodes | #clips | avg clip len | avg pos. statement len (words) | avg neg. statement len (words) | avg subtitle len (words) |
|---|---|---|---|---|---|---|
| Friends | 234 | 2,676 | 32.89s | 17.94 | 17.85 | 72.80 |
| Desperate Housewives | 180 | 3,466 | 32.56s | 17.79 | 17.81 | 69.19 |
| How I Met Your Mother | 207 | 1,944 | 31.64s | 18.08 | 18.06 | 76.78 |
| Modern Family | 210 | 1,917 | 32.04s | 18.52 | 18.20 | 98.50 |
| MovieClips | 5,885 | 5,885 | 40.00s | 17.79 | 17.81 | 69.20 |
| All | 6,716 | 15,887 | 35.20s | 18.10 | 18.04 | 76.40 |

Baseline Models

  • Model Overview

    [Model architecture figure]

Requirements

  • pytorch >= 1.2
  • transformers
  • h5py
  • tqdm
  • numpy
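
    The dependencies above can be installed in one step, assuming a standard pip environment (note that the pytorch requirement is published on PyPI as torch; the repo does not pin exact versions):

    ```shell
    # Install the dependencies listed above; "pytorch" is the PyPI package "torch".
    pip install "torch>=1.2" transformers h5py tqdm numpy
    ```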

Usage

  1. Download the video features, subtitles, and statements, and put them into your feature directory (passed below as --feat_dir).

  2. Finetune BERT-base on Violin's training statements, or download our finetuned BERT model.

  3. Training

    Using only subtitles

    python main.py --feat_dir [feat dir] --bert_dir [bert dir] --input_streams sub
    

    Using both subtitles and video ResNet features (pass --feat c3d to use C3D features instead)

    python main.py --feat_dir [feat dir] --bert_dir [bert dir] --input_streams sub vid --feat resnet
    
  4. Testing

    Testing a specific model

    python main.py --test --feat_dir [feat dir] --bert_dir [bert dir] --input_streams sub vid --feat c3d --model_path [model path]
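
    Since each statement is labeled as either entailed (positive) or contradicted (negative), evaluation reduces to binary accuracy. A minimal sketch of that metric, with illustrative names not taken from the repo:

    ```python
    def binary_accuracy(preds, labels):
        """Fraction of statements whose entailed/contradicted label is predicted correctly."""
        assert len(preds) == len(labels) and labels, "need equal-length, non-empty lists"
        correct = sum(int(p == y) for p, y in zip(preds, labels))
        return correct / len(labels)

    # Example: 3 of 4 statements classified correctly.
    print(binary_accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
    ```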
    


Issues

Statement to reasoning type mapping

Hi,

Figure 3 in the paper shows the distribution of 6 reasoning types. Could you provide the mapping from a statement pair to its reasoning type, so that we can conduct deeper performance analysis for different types?

Thanks,

Shin Lee

How to make the model run with CUDA 9

Hi,
I am using PyTorch 1.3 with CUDA 9.0 and cuDNN 7.0. When I run the model I get the following error:

    Traceback (most recent call last):
      File "main.py", line 137, in <module>
        bert.to(opt.device)
      File "/home/mitr/anaconda3/envs/vqa20/lib/python3.6/site-packages/torch/nn/modules/module.py", line 432, in to
        return self._apply(convert)
      File "/home/mitr/anaconda3/envs/vqa20/lib/python3.6/site-packages/torch/nn/modules/module.py", line 208, in _apply
        module._apply(fn)
      File "/home/mitr/anaconda3/envs/vqa20/lib/python3.6/site-packages/torch/nn/modules/module.py", line 208, in _apply
        module._apply(fn)
      File "/home/mitr/anaconda3/envs/vqa20/lib/python3.6/site-packages/torch/nn/modules/module.py", line 230, in _apply
        param_applied = fn(param)
      File "/home/mitr/anaconda3/envs/vqa20/lib/python3.6/site-packages/torch/nn/modules/module.py", line 430, in convert
        return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
      File "/home/mitr/anaconda3/envs/vqa20/lib/python3.6/site-packages/torch/cuda/__init__.py", line 178, in _lazy_init
        _check_driver()
      File "/home/mitr/anaconda3/envs/vqa20/lib/python3.6/site-packages/torch/cuda/__init__.py", line 108, in _check_driver
        of the CUDA driver.""".format(str(torch._C._cuda_getDriverVersion())))
    AssertionError:
    The NVIDIA driver on your system is too old (found version 9000).
    Please update your GPU driver by downloading and installing a new
    version from the URL: http://www.nvidia.com/Download/index.aspx
    Alternatively, go to: https://pytorch.org to install
    a PyTorch version that has been compiled with your version
    of the CUDA driver.

Any help would be appreciated.

config.json not found

Hi,

When trying to run python main.py --feat_dir [feat dir] --bert_dir [bert dir] --input_streams sub vid --feat resnet as given in the readme (with feat_dir and bert_dir replaced by my local paths for the C3D features and the pretrained BERT model, respectively), I encounter the error 404 Client Error: Not Found for url: https://huggingface.co/bert_output/resolve/main/config.json. Is there any other way I can get the config.json file?

RAW VIDEOS NOT FOUND

Hi, I was trying to download the raw videos from YouTube using the ids provided in the dataset. However, some videos are no longer available and nowhere to be found. How can I reproduce or follow your work when we do not have access to your raw video dataset?

e.g.,

    "gt3ntYidpvs_clip_000_040": {"file": "gt3ntYidpvs_clip_000_040", "source": "gt3ntYidpvs", "span": [0.0, 40.0], "statement": [["The lady in the black tanktop looked out the window to make sure it was safe to open it.", "The lady in the black tanktop looked out the window to make sure it was safe to go outside."], ["The lady in the blue shirt didn't want the lady in black with the gun to open the window because she was afraid something might get in.", "The lady in the blue shirt didn't want the lady in black with the gun to open the window because she was afraid something might get shot."], ["The man in white shirt put the bloody mass outside to try and keep the creatures away from himself.", "A luxury car salesman talks about the red car's performance specifications to a potential customer."]], "sub": [["one board one minute home free okay make", [120, 11629]], ["it quick", [11639, 21970]], ["you ready yeah wait what are you doing", [21980, 26230]], ["show me superiority the senator dead may", [26240, 28990]], ["drive them back sure sound like a call", [29000, 40000]]], "split": "test"}
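
Each annotation entry follows the structure shown above. A minimal sketch of parsing one record (statements trimmed to placeholders here) to recover the source video id, clip span, and statement pairs, e.g. to build a download list for a tool like yt-dlp:

```python
import json

# A trimmed copy of the sample annotation entry shown above (statements replaced
# by placeholders; the real file contains many such records keyed by clip id).
record_json = '''
{"gt3ntYidpvs_clip_000_040": {"file": "gt3ntYidpvs_clip_000_040",
 "source": "gt3ntYidpvs", "span": [0.0, 40.0],
 "statement": [["pos statement 1", "neg statement 1"],
               ["pos statement 2", "neg statement 2"],
               ["pos statement 3", "neg statement 3"]],
 "sub": [["one board one minute home free okay make", [120, 11629]]],
 "split": "test"}}
'''

annotations = json.loads(record_json)
for clip_id, entry in annotations.items():
    start, end = entry["span"]          # clip boundaries in seconds within the source video
    video_id = entry["source"]          # YouTube video id (may no longer be available)
    n_pairs = len(entry["statement"])   # each pair is [positive, negative] statement
    print(f"{video_id}: clip {start}-{end}s, {n_pairs} statement pairs, split={entry['split']}")
```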

Download links for features and finetuned BERT model not available

I noticed that the download links for the image (resnet) features, C3D features, detection features, and finetuned BERT model are currently inactive. I would greatly appreciate it if you could kindly make these data available once again, as they are integral to my current research.

About "adversarial matching"

In your paper, you use adversarial matching to collect negative statements.
I would like to know how you calculate the similarity between two statements. Are a similar model (ESIM+ELMo) and training strategy used, as in VCR?
