
This project forked from dsanno/chainer-image-caption


Image caption generator using Chainer, Python 3 and ResNet feature version

Home Page: https://milhidaka.github.io/chainer-image-caption/

License: MIT License



Image caption generator using Chainer

Python 3 and ResNet feature version by @milhidaka

Includes a caption generation demo that runs in the web browser using WebDNN.

Screenshot

Requirement

Usage (only caption generation)

Generate captions with a pre-trained model (ResNet-50 + MSCOCO).

To obtain the model, download the dataset and train it yourself by following "Usage (training model using MSCOCO dataset)" below.

$ python src/generate_caption.py -s dataset_coco.pkl -m caption_gen_resnet.model -l image/list.txt -g 0

Options:

  • -s, --sentence: (required) sentence dataset file path.
  • -m, --model: (required) trained model file path.
  • -l, --list: (required) image path list file.
  • -g, --gpu: (optional) GPU index. -1 means CPU.

Convert model to WebDNN (browser demo)

$ python src/convert_webdnn.py --sentence dataset_coco.pkl --model caption_gen_resnet.model --example_image image/asakusa.jpg

Then start an HTTP server (python -m http.server) and open http://localhost:8000/webdnn in a browser.
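The same static server can also be started from a Python script. The following is a minimal sketch, not part of this repository: make_server is a hypothetical helper name, and it binds to 127.0.0.1 only, whereas python -m http.server listens on all interfaces by default. It assumes the directory being served contains the webdnn folder.

```python
import functools
import http.server


def make_server(directory=".", port=8000):
    """Build a static file server for `directory` (hypothetical helper).

    Binds to 127.0.0.1 so the demo is reachable only from this machine.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


# Usage (blocks until interrupted with Ctrl+C):
#     with make_server() as httpd:
#         httpd.serve_forever()
```

With the server running, the demo is expected at http://localhost:8000/webdnn, as described above.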

Usage (training model using MSCOCO dataset)

Download dataset

  1. Download the 2014 images from http://mscoco.org/dataset/#download and extract them to a directory.
  2. Download caption_datasets.zip from http://cs.stanford.edu/people/karpathy/deepimagesent/
  3. Extract the downloaded zip file to get dataset_coco.json.

Convert dataset

$ python src/convert_dataset.py dataset_coco.json dataset_coco.pkl

Parameters:

  • input sentence JSON file of the dataset.
  • output pkl file.
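Before converting, it can help to check what the JSON file contains. The sketch below is not what convert_dataset.py does; it only summarizes the input. It assumes the layout commonly found in the caption_datasets.zip files: a top-level "images" list whose entries carry a "split" name and a "sentences" list. The function names are hypothetical.

```python
import json
from collections import Counter


def load_dataset(path):
    """Read the sentence JSON file into a Python dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_splits(dataset):
    """Count images and captions per split.

    Assumes dataset["images"] is a list of dicts with "split" and
    "sentences" keys; missing keys are tolerated.
    """
    images = Counter()
    captions = Counter()
    for image in dataset["images"]:
        split = image.get("split", "unknown")
        images[split] += 1
        captions[split] += len(image.get("sentences", []))
    return images, captions


# Usage:
#     images, captions = summarize_splits(load_dataset("dataset_coco.json"))
#     for split in sorted(images):
#         print(split, images[split], captions[split])
```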

Extract ResNet feature

$ python src/extract_resnet_feat.py dataset_coco.json /path/to/coco/images resnet_feat.mat -g 0 -b 16

Parameters and options:

  • sentence JSON file of the dataset.
  • top-level directory containing the images. Files are searched recursively.
  • output feature matrix file (about 1 GB).
  • -g, --gpu: (optional) GPU index. -1 means CPU.
  • -b, --batchsize: (optional) batch size for feature extraction.

Feature extraction takes several hours.

Train dataset

$ python src/train.py -g 0 -s dataset_coco.pkl -i resnet_feat.mat -o model/caption_gen

Options:

  • -g, --gpu: (optional) GPU device index (default: -1).
  • -s, --sentence: (required) sentence dataset file path.
  • -i, --image: (required) image feature file path.
  • -m, --model: (optional) input model file path without extension.
  • -o, --output: (required) output model file path without extension.
  • --iter: (optional) the number of iterations (default: 100).
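For readers who want to wrap or script the training step, the option list above maps onto an argparse parser roughly as follows. This is a sketch written from the documented options, not the actual code in src/train.py; the int types and the None default for --model are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented training options."""
    parser = argparse.ArgumentParser(description="Train the caption generator")
    parser.add_argument("-g", "--gpu", type=int, default=-1,
                        help="GPU device index (-1 means CPU)")
    parser.add_argument("-s", "--sentence", required=True,
                        help="sentence dataset file path")
    parser.add_argument("-i", "--image", required=True,
                        help="image feature file path")
    parser.add_argument("-m", "--model", default=None,
                        help="input model file path without extension")
    parser.add_argument("-o", "--output", required=True,
                        help="output model file path without extension")
    parser.add_argument("--iter", type=int, default=100,
                        help="number of iterations")
    return parser


# Usage:
#     args = build_parser().parse_args(
#         ["-g", "0", "-s", "dataset_coco.pkl",
#          "-i", "resnet_feat.mat", "-o", "model/caption_gen"])
```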

Image path list file sample

image/asakusa.jpg
image/tree.jpg
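A list file in this one-path-per-line form can be generated rather than typed by hand. The sketch below is an illustration only: write_image_list is a hypothetical helper, and the set of accepted file extensions is an assumption, not something the scripts in this repository are known to require.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your files.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def write_image_list(image_dir, list_path):
    """Write one image path per line to `list_path` and return the paths.

    Searches `image_dir` recursively and sorts the result so the list is
    reproducible between runs.
    """
    paths = sorted(
        p for p in Path(image_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    lines = [p.as_posix() for p in paths]
    Path(list_path).write_text(
        "".join(line + "\n" for line in lines), encoding="utf-8"
    )
    return lines


# Usage:
#     write_image_list("image", "image/list.txt")
```

The resulting file can then be passed to generate_caption.py with the -l option.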

License

MIT License
