VizWiz-QualityIssues

Code for the VizWiz Image Quality Issues Dataset, including API, baseline models and evaluation method.

Requirements

  • tensorflow v1.14
  • keras v2.3.1

Files

./

  • demo_recognizability.ipynb: demo of recognizability prediction using Detectron or Resnet152 feature maps and its evaluation.
  • demo_answerability_recognizability.ipynb: demo of joint prediction of answerability and recognizability using Detectron or Resnet152 feature maps and its evaluation.

./api

  • load_annotations.ipynb: shows how to compile the VizWiz-VQA and VizWiz-ImageQualityIssues annotations saved in ./annotations into arrays for further use, for example with the TensorFlow Dataset API as in ./utils/DatasetAPI_to_tensor.py. Compiled files are saved in ./data.

./annotations

  • vqa_annotations/train.json, val.json, test.json (VizWiz-VQA training/val/test set)
  • quality_annotations/train.json, val.json, test.json (VizWiz-ImageQualityIssues training/val/test set)

./data

  • quality.json: arrayed quality annotations compiled from ./annotations/quality_annotations/train.json, val.json, test.json by following ./api/load_annotations.ipynb. This file is for recognizability prediction.
import json

with open('./data/quality.json') as f:
    data = json.load(f)
# data is a dictionary with keys ['train', 'val', 'test'] corresponding to the training/val/test sets
# Take the training set as an example: data['train'] is a dictionary with keys ['image', 'flaws', 'recognizable']
# data['train']['image'], data['train']['flaws'], data['train']['recognizable'] are lists; each can be converted to a numpy array with np.asarray()
# data['train']['image'][0] == the name of the first image
# data['train']['flaws'][0] == the flaws of the first image
# data['train']['recognizable'][0] == recognizability of the first image
  • vqa_quality_merger.json: arrayed VQA and quality annotations, obtained by merging ./annotations/vqa_annotations/train.json, val.json, test.json into quality.json. This file is for joint answerability-recognizability prediction.
import json

with open('./data/vqa_quality_merger.json') as f:
    data = json.load(f)
# In addition to the keys ['image', 'flaws', 'recognizable'], data['train'] has two more keys: ['answerable', 'question']
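Because each split is stored as parallel lists, it can be convenient to regroup them into one record per image. The following is a minimal sketch under that assumption; the helper names `to_records` and `load_split` are hypothetical and not part of this repository, and the toy values in the usage note are placeholders rather than real annotations.

```python
import json


def to_records(split_data):
    """Turn a dict of parallel lists into a list of per-image dicts.

    `split_data` is assumed to look like data['train'] described above,
    e.g. {'image': [...], 'flaws': [...], 'recognizable': [...]}.
    """
    keys = list(split_data)
    lengths = {len(split_data[k]) for k in keys}
    if len(lengths) > 1:
        raise ValueError("all lists in a split must have the same length")
    n = lengths.pop() if lengths else 0
    # Record i collects the i-th entry of every list.
    return [{k: split_data[k][i] for k in keys} for i in range(n)]


def load_split(path, split):
    """Load one split ('train', 'val' or 'test') from a compiled JSON file."""
    with open(path) as f:
        data = json.load(f)
    return to_records(data[split])
```

For example, `load_split('./data/quality.json', 'train')[0]['image']` would give the name of the first training image. The same helper should work for vqa_quality_merger.json, where each record would additionally carry the 'answerable' and 'question' keys.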

./fmap

  • detectron/: where image feature maps extracted by Detectron are stored. For how to extract the features, please refer to this notebook.
  • resnet152/: where image feature maps extracted by Resnet152 are stored. They can be derived by following the snippet of code below:
import numpy as np
import keras
from keras.preprocessing import image
from keras.applications.imagenet_utils import preprocess_input

# ResNet152 without the classification head; features are taken from the last residual block
resnet152 = keras.applications.ResNet152(include_top=False, weights='imagenet', input_shape=(448, 448, 3))
base_model = keras.models.Model(inputs=resnet152.input, outputs=resnet152.get_layer('conv5_block3_add').output)

# IMG_PATH is the path to the input image and must be set by the user
img = image.load_img(IMG_PATH, target_size=(448, 448))
img = image.img_to_array(img)
img = np.expand_dims(img, axis=0)  # add the batch dimension
img = preprocess_input(img)
img_feat = base_model.predict(img)

./model/VqaQualityModel.py: model adapted from the Up-Down attention VQA model for the joint prediction of answerability and recognizability.

./utils

  • DatasetAPI_to_tensor.py: TensorFlow Dataset API pipeline for loading data while running the model
  • word2vocab_vizwiz: IDs of the tokenized frequent words in the questions of the VizWiz VQA dataset

./ckpt_rec and ./ckpt_ans_rec: checkpoints for the recognizability predictor and for the Up-Down model for answerability and recognizability prediction, respectively.

Evaluation results from the baseline models

We use average precision as the evaluation metric.
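As a reminder of what the metric measures, here is a minimal pure-Python sketch of non-interpolated average precision. It is an illustration only, not necessarily the implementation used in the demo notebooks; the function name `average_precision` is hypothetical, and labels are assumed to be 1 for the positive class (for example, unrecognizable) and 0 otherwise.

```python
def average_precision(labels, scores):
    """Non-interpolated average precision for binary labels.

    labels: 0/1 ground truth, where 1 marks the positive class (assumption).
    scores: predicted scores, higher meaning more likely positive.
    """
    # Rank samples from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            # Precision among the top `rank` predictions.
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("average precision is undefined without positives")
    return precision_sum / hits


ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

In this toy call the positives sit at ranks 1 and 3, so the result is (1/1 + 2/3) / 2. Tied scores are kept in input order here, which may differ slightly from how library implementations group ties.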

Recognizability prediction

Avg. precision of unrecognizability:

Feature maps | Validation set | Test-dev | Test-standard
Detectron    | 79.44          | 77.36    | 78.49
Resnet-152   | 80.17          | 77.82    | 78.69

Answerability and recognizability prediction

The format shown below is (avg. precision of unanswerability / avg. precision of unrecognizability given the question is unanswerable)

Feature maps | Validation set | Test-dev      | Test-standard
Detectron    | 71.41 / 83.08  | 72.26 / 85.38 | 70.53 / 86.20
Resnet-152   | 70.97 / 83.12  | 71.26 / 84.90 | 70.39 / 85.13
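The second number in each cell is a conditional metric. One plausible reading, sketched below, is to restrict the evaluation to questions whose ground truth is unanswerable and compute the average precision of unrecognizability on that subset. This is an assumption about the protocol rather than a confirmed description of it; the function names are hypothetical, and `answerable == 0` is assumed to mean "unanswerable".

```python
def average_precision(labels, scores):
    """Non-interpolated average precision; label 1 is the positive class."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("average precision is undefined without positives")
    return precision_sum / hits


def unrecognizability_ap_given_unanswerable(answerable, unrecognizable, scores):
    """AP of unrecognizability over samples with answerable == 0 (assumed encoding)."""
    # Keep only the samples whose question is labeled unanswerable.
    keep = [i for i, a in enumerate(answerable) if a == 0]
    return average_precision(
        [unrecognizable[i] for i in keep],
        [scores[i] for i in keep],
    )
```

The first number in each cell (unanswerability) would then be a plain call to `average_precision` over all samples, with unanswerable questions as the positive class.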
