The cnn-lstm-caption-generator from ml-lab

ml-lab / cnn-lstm-caption-generator Goto Github PK

View Code? Open in Web Editor NEW

A Tensorflow implementation of CNN-LSTM image caption generator architecture that achieves close to state-of-the-art results on the MSCOCO dataset.

Python 95.76% Shell 4.24%

cnn-lstm-caption-generator's Introduction

Learning CNN-LSTM Architectures for Image Caption Generation

This code contains a Tensorflow implementation of the CNN-LSTM architecture used to attain state-of-the-art performance on the MSCOCO dataset. We achieve a BLEU-4 score of 24.4 and CIDEr score of 81.7 compared to 27.7 and 85.5 by Google's implementation. Qualitative analysis of the generated captions indicate that the model is able to sensibly caption a wide variety of images from the MSCOCO dataset.

Demo instructions

To try a demo of our best trained model, first ensure that Caffe is installed on your computer and that you have downloaded the GoogleNet model using these instructions. You'll also need Tensorflow 0.8 installed. Then, run:

./download.sh

which will retrive all pickled data files (graciously shared by Satoshi in his chainer implementation.) and the Tensorflow saved model created in this project needed to run the demo. This requires around 180MB of disk space. The 'caption_image.py' file contains all the code needed to load and use the saved model. To run the demo, do:

python caption_image.py -i <path_to_image>

We have included a demo pizza image at images/pizza.jpg to sanity check your installation. Running python caption_image.py -i images/pizza.jpg produces the caption "a pizza with cheese and cheese on a table". It's not perfect, but still pretty cool!

Other files

model.py contains the Model class that contains the CNN-LSTM architecture (using Tensorflow's dynamic_rnn API) and various helper functions for generating captions. evaluate_captions.py is a helper script to generate aggregated JSON files that can then be used for hyperparameter tuning. image_feature_cnn.py contains the helper functions we use to load up the GoogleNet batch normalization CNN model and turn images into 1024 x 1 vectors.

Recommend Projects

ml-lab / cnn-lstm-caption-generator Goto Github PK

cnn-lstm-caption-generator's Introduction

Learning CNN-LSTM Architectures for Image Caption Generation

Demo instructions

Other files

cnn-lstm-caption-generator's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent