
vqa's Introduction

Python API and Evaluation Code for v2.0 and v1.0 releases of the VQA dataset.

VQA v2.0 release

This release consists of

  • Real
    • 82,783 MS COCO training images, 40,504 MS COCO validation images and 81,434 MS COCO testing images (images are obtained from the [MS COCO website](http://mscoco.org/dataset/#download))
    • 443,757 questions for training, 214,354 questions for validation and 447,793 questions for testing
    • 4,437,570 answers for training and 2,143,540 answers for validation (10 per question)

There is only one type of task

  • Open-ended task

VQA v1.0 release

This release consists of

  • Real
    • 82,783 MS COCO training images, 40,504 MS COCO validation images and 81,434 MS COCO testing images (images are obtained from the [MS COCO website](http://mscoco.org/dataset/#download))
    • 248,349 questions for training, 121,512 questions for validation and 244,302 questions for testing (3 per image)
    • 2,483,490 answers for training and 1,215,120 answers for validation (10 per question)
  • Abstract
    • 20,000 training images, 10,000 validation images and 20,000 testing images
    • 60,000 questions for training, 30,000 questions for validation and 60,000 questions for testing (3 per image)
    • 600,000 answers for training and 300,000 answers for validation (10 per question)

There are two types of tasks

  • Open-ended task
  • Multiple-choice task (18 choices per question)

Requirements

  • Python 2.7
  • scikit-image (visit this page for installation)
  • matplotlib (visit this page for installation)

Files

./Questions

  • For v2.0, download the question files from the VQA download page, extract them and place them in this folder.
  • For v1.0, both real and abstract, question files can be found on the VQA v1 download page.
  • Question files from Beta v0.9 release (123,287 MSCOCO train and val images, 369,861 questions, 3,698,610 answers) can be found below
  • Question files from Beta v0.1 release (10k MSCOCO images, 30k questions, 300k answers) can be found here.

./Annotations

  • For v2.0, download the annotation files from the VQA download page, extract them and place them in this folder.
  • For v1.0, for both real and abstract, annotation files can be found on the VQA v1 download page.
  • Annotation files from Beta v0.9 release (123,287 MSCOCO train and val images, 369,861 questions, 3,698,610 answers) can be found below
  • Annotation files from Beta v0.1 release (10k MSCOCO images, 30k questions, 300k answers) can be found here.

./Images

  • For real images, create a directory named mscoco inside this directory. Inside mscoco, create the directories train2014, val2014 and test2015 for the train, val and test splits respectively. Download the corresponding images from the MS COCO website and place them in the matching directories.
  • For abstract scenes, create a directory named abstract_v002 inside this directory. Inside abstract_v002, create the directories train2015, val2015 and test2015 for the train, val and test splits respectively. Download the corresponding images from the VQA download page and place them in the matching directories.
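
The directory layout described above can be created with a short script. This is a minimal sketch, not part of the repository: the function name create_image_dirs and the IMAGE_LAYOUT table are hypothetical, and the script only creates empty directories (the images still have to be downloaded separately).

```python
import os

# Directory names taken from the ./Images description above.
IMAGE_LAYOUT = {
    "mscoco": ["train2014", "val2014", "test2015"],
    "abstract_v002": ["train2015", "val2015", "test2015"],
}


def create_image_dirs(images_root):
    """Create the empty split directories under images_root and return their paths."""
    paths = []
    for dataset in sorted(IMAGE_LAYOUT):
        for split in IMAGE_LAYOUT[dataset]:
            path = os.path.join(images_root, dataset, split)
            # Skip directories that already exist so the script can be re-run safely.
            if not os.path.isdir(path):
                os.makedirs(path)
            paths.append(path)
    return paths
```

Calling create_image_dirs("./Images") from the repository root should produce the six split directories named above.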

./PythonHelperTools

  • This directory contains the Python API to read and visualize the VQA dataset
  • vqaDemo.py (demo script)
  • vqaTools (API to read and visualize data)
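
For readers who want to inspect the data without the helper tools, the question and annotation files can also be read with the standard json module. The sketch below is not the vqaTools API. It assumes the question file has a top-level "questions" list (with "question_id", "image_id" and "question" fields) and the annotation file has a top-level "annotations" list (with "question_id" and an "answers" list whose items carry an "answer" field). Check these field names against the files you downloaded. The function name load_qa_pairs is hypothetical.

```python
import json


def load_qa_pairs(question_path, annotation_path):
    """Join questions with their human answers by question_id.

    The JSON field names used here are assumptions about the file layout.
    """
    with open(question_path) as f:
        questions = json.load(f)["questions"]
    with open(annotation_path) as f:
        annotations = json.load(f)["annotations"]

    # Map each question_id to the list of answer strings given by annotators.
    answers_by_id = {}
    for ann in annotations:
        answers_by_id[ann["question_id"]] = [a["answer"] for a in ann["answers"]]

    pairs = []
    for q in questions:
        pairs.append({
            "question_id": q["question_id"],
            "image_id": q["image_id"],
            "question": q["question"],
            # Test splits have no public annotations, so default to an empty list.
            "answers": answers_by_id.get(q["question_id"], []),
        })
    return pairs
```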

./PythonEvaluationTools

  • This directory contains the Python evaluation code
  • vqaEvalDemo.py (evaluation demo script)
  • vqaEvaluation (evaluation code)

./Results

  • OpenEnded_mscoco_train2014_fake_results.json (an example of a fake results file for v1.0 to run the demo)
  • Visit the [VQA evaluation page](http://visualqa.org/evaluation) for more details.
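
A results file of your own can be written along the following lines. This is a sketch under an assumption: it treats a results file as a JSON list of objects with a "question_id" and an "answer" field, which should be confirmed against the VQA evaluation page and the fake results file before use. The function name write_results and the prediction values in the usage note are hypothetical.

```python
import json


def write_results(predictions, out_path):
    """Write {question_id: answer} predictions as a JSON list of result objects.

    The {"question_id", "answer"} layout is an assumed format, not confirmed here.
    """
    results = [
        {"question_id": int(qid), "answer": str(ans)}
        for qid, ans in sorted(predictions.items())
    ]
    with open(out_path, "w") as f:
        json.dump(results, f)
    return results
```

For example, write_results({1: "yes", 2: "2"}, "my_results.json") writes two entries, one per question ID.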

./QuestionTypes

  • This directory contains the following lists of question types for both real and abstract questions (question types are unchanged from v1.0 to v2.0). Within a list, if a question type of length n+k and a question type of length n share the same first n words, the questions that belong to the longer type are not counted under the shorter one. Each question therefore falls under the longest question type that matches its opening words.
  • mscoco_question_types.txt
  • abstract_v002_question_types.txt
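
The prefix rule above amounts to longest-prefix matching on the opening words of a question. The sketch below illustrates that rule only; it is not the repository's own code. The function name assign_question_type and the sample type list in the usage note are hypothetical.

```python
def assign_question_type(question, question_types):
    """Return the longest question type whose words prefix the question, or None."""
    # Lowercase and drop the trailing question mark before splitting into words.
    words = question.lower().rstrip("?").split()
    best = None
    best_len = 0
    for qtype in question_types:
        type_words = qtype.lower().split()
        n = len(type_words)
        # A longer matching prefix wins over a shorter one.
        if n > best_len and words[:n] == type_words:
            best = qtype
            best_len = n
    return best
```

With the illustrative list ["what", "what is", "what is the"], the question "What is the man doing?" is assigned to "what is the" rather than to "what" or "what is", while "What color is it?" falls back to "what".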

References

Developers
