Coder Social home page Coder Social logo

chengxili / disccaptioning Goto Github PK

View Code? Open in Web Editor NEW

This project forked from ruotianluo/disccaptioning

0.0 1.0 0.0 175 KB

Code for Discriminability objective for training descriptive captions(CVPR 2018)

Python 97.73% Shell 1.32% HTML 0.95%

disccaptioning's Introduction

Discriminability objective for training descriptive captions

This is the implementation of paper Discriminability objective for training descriptive captions.

Requirements

Python 2.7 (because there is no coco-caption version for python 3)

PyTorch 1.0 (along with torchvision)

java 1.8 for (coco-caption)

Downloads

Clone the repository

git clone --recursive https://github.com/ruotianluo/DiscCaptioning.git

Data split

In this paper we use the data split from Context-aware Captions from Context-agnostic Supervision. It's different from standard karpathy's split, so we need to download different files.

Download link: Google drive link

To train on your own, you only need to download dataset_coco.json, but it's also suggested to download cocotalk.json and cocotalk_label.h5 as well. If you want to run pretrained model, you have to download all three files.

coco-caption

cd coco-caption
bash ./get_stanford_models.sh
cd annotations
# Download captions_val2014.json from the google drive link above to this folder
cd ../../

The reason why we need to replace the captions_val2014.json is because the original file can only evaluate images from the val2014 set, and we are using rama's split.

Pre-computed feature

In this paper, for retrieval model, we use outputs of last layer of resnet-101. For captioning model, we use the bottom-up feature from https://arxiv.org/abs/1707.07998.

The features can be downloaded from the same link, and you need to compress them to data/cocotalk_fc and data/cocobu_att respectively.

Pretrained models.

Download pretrained models from link. Decompress them into root folder.

To evaluate on pretrained model, run:

bash eval.sh att_d1 test

The pretrained models can match the results shown in the paper.

Train on you rown

Preprocessing

Preprocess the captions (skip if you already have 'cocotalk.json' and 'cocotalk_label.h5'):

$ python scripts/prepro_labels.py --input_json data/dataset_coco.json --output_json data/cocotalk.json --output_h5 data/cocotalk

Preprocess for self-critical training:

$ python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train

Start training

First train a retrieval model:

bash run_fc_con.sh

Second, pretrain the captioning model.

bash run_att.sh

Third, finetune the captioning model with cider+discriminability optimization:

bash run_att_d.sh 1 (1 is the discriminability weight, and can be changed to other values)

Evaluate

bash eval.sh att_d1 test

Citation

If you found this useful, please consider citing:

@InProceedings{Luo_2018_CVPR,
author = {Luo, Ruotian and Price, Brian and Cohen, Scott and Shakhnarovich, Gregory},
title = {Discriminability Objective for Training Descriptive Captions},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}

Acknowledgements

The code is based on ImageCaptioning.pytorch

disccaptioning's People

Contributors

ruotianluo avatar raoyongming avatar gujiuxiang avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.