This project forked from shikamaru-96/visual-question-answering

Implementation of the visual question answering model from the paper "Exploring Models and Data for Image Question Answering".

License: MIT License

Visual Question Answering

This is a Python and Keras implementation of the visual question answering model from the paper "Exploring Models and Data for Image Question Answering". The model is similar to the 2-VIS+BLSTM model described in the paper, except that the LSTMs are not bidirectional. It takes the image features as input twice, once at the start and once at the end of the sentence, each with its own learned linear transformation. We call it 2-VIS+LSTM.
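To make the "two image feature inputs" idea concrete, here is a minimal, dependency-free sketch of how the input sequence for the LSTM could be assembled. It is not taken from this repository: the names `project` and `build_input_sequence`, the weight layout, and the toy dimensions are all assumptions made for illustration. In a Keras model the two projections would typically be two separate trainable layers.

```python
def project(vec, weights):
    """Apply a linear map to vec; weights is a list of rows (in_dim x out_dim)."""
    out_dim = len(weights[0])
    return [sum(v * row[j] for v, row in zip(vec, weights)) for j in range(out_dim)]


def build_input_sequence(image_feat, word_vectors, w_start, w_end):
    """Place the image at both ends of the question, as in a 2-VIS-style model.

    The same image feature vector is projected twice, with two different
    weight matrices, into the word-embedding space. One projection becomes
    the first timestep and the other becomes the last timestep.
    """
    start_step = project(image_feat, w_start)
    end_step = project(image_feat, w_end)
    return [start_step] + list(word_vectors) + [end_step]


# Toy example (hypothetical sizes: 2-d image feature, 3-d word embeddings).
image_feat = [1.0, 2.0]
w_start = [[1, 0, 0], [0, 1, 0]]
w_end = [[0, 0, 1], [0, 0, 1]]
word_vectors = [[9, 9, 9], [8, 8, 8]]  # embeddings of a two-word question
sequence = build_input_sequence(image_feat, word_vectors, w_start, w_end)
```

The resulting sequence has one more timestep at each end than the question has words, and a unidirectional LSTM would then read it from left to right.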

Details about the dataset are explained at the VisualQA website.

Requirements

  • Python 2.7
  • NumPy
  • SciPy (for loading pre-computed MS COCO features)
  • NLTK (for the tokenizer)
  • Keras (version used: 2.0.9)

Training

  • The basic usage is python train.py.

  • The batch size and the number of epochs can be set with the options -batch_size and -num_epochs. The defaults are a batch size of 100 and 10 epochs.

  • For example, to train with a batch size of 200 for 20 epochs: python train.py -batch_size=200 -num_epochs=20

  • If training fails with a memory error, create about 40 GB of swap space and rerun the script.
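The command-line options above can be reproduced with the standard argparse module. The following is only a sketch of one way to parse them, assuming the option names and defaults stated in this section; the function names are hypothetical and the actual train.py may handle its arguments differently.

```python
import argparse


def build_parser():
    """Build a parser for the two training options described above."""
    parser = argparse.ArgumentParser(description="Train the 2-VIS+LSTM model")
    # Single-dash long options, matching the usage shown in this README.
    parser.add_argument("-batch_size", type=int, default=100,
                        help="number of samples per batch (default: 100)")
    parser.add_argument("-num_epochs", type=int, default=10,
                        help="number of training epochs (default: 10)")
    return parser


def parse_training_args(argv=None):
    """Parse argv (or sys.argv[1:] when argv is None) into a namespace."""
    return build_parser().parse_args(argv)


# Equivalent to: python train.py -batch_size=200 -num_epochs=20
args = parse_training_args(["-batch_size=200", "-num_epochs=20"])
```

Here `args.batch_size` and `args.num_epochs` hold the parsed integers, and calling `parse_training_args([])` falls back to the defaults.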

Results

Our model reaches a training accuracy of 59.70% and a validation accuracy of 52.04%.

Pretrained Weights

If you would rather not train the entire model on your machine, you can download the pretrained weights from these links:

Running the Model

  • Questions can be asked on any image using the script question_answer.py.

  • Run the script: python question_answer.py

  • Enter the path to the image at the image_path prompt (enter n to exit), then enter the question.

Here are some examples of predictions:

Image        Question               Top Answers (left to right)
(not shown)  Which animal is this?  dog, cat, giraffe
(not shown)  Which game is this?    tennis, baseball, frisbee
(not shown)  Which animal is this?  giraffe, cat, bear
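A ranked list like the one above can be obtained by sorting the answer vocabulary by predicted probability and keeping the first few entries. The sketch below assumes the model outputs one probability per candidate answer; the function name `top_answers`, the vocabulary, and the probability values are hypothetical and not taken from question_answer.py.

```python
def top_answers(probabilities, answer_vocab, k=3):
    """Return the k answers with the highest predicted probability, best first."""
    ranked = sorted(zip(probabilities, answer_vocab),
                    key=lambda pair: pair[0], reverse=True)
    return [answer for _, answer in ranked[:k]]


# Made-up probabilities for a four-answer vocabulary, for illustration only.
answer_vocab = ["giraffe", "dog", "bear", "cat"]
probabilities = [0.10, 0.50, 0.05, 0.35]
best = top_answers(probabilities, answer_vocab)
```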

Contributors

abhaygupta97
