
Soft Attention Captioning

TensorFlow implementation of Show, Attend and Tell, presented at ICML'15.

This repository is based heavily on jazzsaxmafia/show_attend_and_tell.tensorflow, with a few bugs fixed according to the paper. The fixes concern the LSTM and the loss computation, though they are fairly minor.

Prerequisites

Data

Preprocessing

We need three things prepared before training the model:

  • Image features extracted from VGG19 conv5_4, of shape (14, 14, 512)
  • Training captions aligned with the features
  • An empty log/ folder for TensorFlow logs

Extract the features by executing python vgg/coco_conv54.py, which requires heavy CPU memory usage (up to 80 GB) and time (up to 5 hours). The extracted features are so large that np.float16 is used as the final dtype. The VGG model is from machrisaa/tensorflow-vgg. To generate the captions needed for this implementation, refer to data/map_features.py, which produces two files: train_82783_order.pkl and test_20548_order.pkl.
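
The sketch below only illustrates the idea behind the conv5_4 extraction; it assumes the Vgg19 class and pre-trained vgg19.npy from machrisaa/tensorflow-vgg, and vgg/coco_conv54.py remains the authoritative version.

import numpy as np
import tensorflow as tf
from vgg19 import Vgg19  # vgg19.py from machrisaa/tensorflow-vgg

# Placeholder for a batch of RGB images scaled to [0, 1] (assumption: 224x224 inputs).
images = tf.placeholder(tf.float32, [None, 224, 224, 3])
vgg = Vgg19('vgg19.npy')  # path to the pre-trained weights
vgg.build(images)

with tf.Session() as sess:
    batch = np.random.rand(8, 224, 224, 3).astype(np.float32)  # stand-in for MSCOCO images
    feats = sess.run(vgg.conv5_4, feed_dict={images: batch})    # shape (8, 14, 14, 512)
    feats = feats.astype(np.float16)                            # stored as float16 to save space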

After the preprocessing steps, we should have train_82783_order.pkl and train_82783_vggc54npf16.npy in the data/ directory.
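
A quick sanity check on the two files, assuming the feature array covers all 82783 training images in the order recorded by the pickle (the exact pickle layout is an assumption and may differ):

import pickle
import numpy as np

feats = np.load('data/train_82783_vggc54npf16.npy')  # expected dtype float16
print(feats.shape, feats.dtype)                       # expected (82783, 14, 14, 512)

with open('data/train_82783_order.pkl', 'rb') as f:
    captions = pickle.load(f)                         # captions aligned with the feature order
print(len(captions))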

Train

python train.py

Tunable parameters are in configs.py.
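
The real parameter names live in configs.py; the block below only illustrates the kind of hyperparameters such a file typically exposes (all names here are hypothetical):

# Hypothetical configs.py-style values -- illustrative names only,
# check the actual configs.py for the real ones.
batch_size = 128       # captions per training batch
dim_embed = 512        # word embedding size
dim_hidden = 1024      # LSTM hidden size
dim_ctx = 512          # context vector size (conv5_4 channels)
n_epochs = 50
learning_rate = 1e-3
model_dir = 'models/'  # where model-epoch-n checkpoints are written
log_dir = 'log/'       # TensorFlow log directory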

Test

Single image test:

python test.py --img_path=/path/to/image.jpg

To generate generated.csv for all the images in data/test.csv, the features need to be extracted again:

python test.py --eval_all=True

A specific model can be selected by passing --model_path: python test.py --model_path=/path/to/model-epoch-n. **Note:** the path is model-epoch-n, not model-epoch-n.meta or model-epoch-n.data.

Evaluation

An evaluation script is not included yet, sorry.
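
For reference, the generated captions can be scored with the official COCO caption evaluation toolkit (coco-caption / pycocoevalcap), which is not part of this repository. A rough sketch, assuming generated.csv has been converted into the COCO results JSON format:

from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

# Ground-truth annotations and model outputs in COCO results format:
# [{"image_id": 123, "caption": "a man riding a horse"}, ...]
coco = COCO('annotations/captions_val2014.json')
coco_res = coco.loadRes('results/generated.json')

coco_eval = COCOEvalCap(coco, coco_res)
coco_eval.params['image_id'] = coco_res.getImgIds()  # only score images we have results for
coco_eval.evaluate()
for metric, score in coco_eval.eval.items():          # BLEU, METEOR, ROUGE-L, CIDEr
    print(metric, score)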

Resources

Besides the original paper, these resources helped a lot when building the model:

Acknowledgments

Code heavily based on jazzsaxmafia/show_attend_and_tell.tensorflow
VGG model from machrisaa/tensorflow-vgg
