Light

studian / cvnd_p2_image_captioning Goto Github PK

View Code? Open in Web Editor NEW

1.0 2.0 0.0 2.44 MB

Udacity second project for CVND: image captioning. (GPU version)

Jupyter Notebook 49.61% HTML 50.14% Python 0.25%

cvnd_p2_image_captioning's Introduction

Udacity Computer Vision Nanodegree Project 2

This is my image captioning model for project 2 of the Computer Vision Nanodegree.
I used the model architecture from: https://arxiv.org/abs/1411.4555 in order to implement the system.
I used Titan XP (12 GB) x 4ea GPU.

Install pytorch 0.4.0

Cuda : 8.0
CuDNN : 6.0
pytorch : 0.4.0

pip install http://download.pytorch.org/whl/cu80/torch-0.4.0-cp36-cp36m-linux_x86_64.whl

Install COCO API and Download COCO Dataset

Clone this repo: https://github.com/cocodataset/cocoapi

git clone https://github.com/cocodataset/cocoapi.git

Setup the coco API (also described in the readme here)

cd cocoapi/PythonAPI  
make  
cd ..

Download some specific data from here: http://cocodataset.org/#download (described below)

Under Annotations, download:
- 2014 Train/Val annotations [241MB] (extract captions_train2014.json and captions_val2014.json, and place at locations cocoapi/annotations/captions_train2014.json and cocoapi/annotations/captions_val2014.json, respectively)
- 2014 Testing Image info [1MB] (extract image_info_test2014.json and place at location cocoapi/annotations/image_info_test2014.json)
Under Images, download:
- 2014 Train images [83K/13GB] (extract the train2014 folder and place at location cocoapi/images/train2014/)
- 2014 Val images [41K/6GB] (extract the val2014 folder and place at location cocoapi/images/val2014/)
- 2014 Test images [41K/6GB] (extract the test2014 folder and place at location cocoapi/images/test2014/)

The project is structured as a series of Jupyter notebooks that are designed to be completed in sequential order (0_Dataset.ipynb, 1_Preliminaries.ipynb, 2_Training.ipynb, 3_Inference.ipynb).

cvnd_p2_image_captioning's People

Contributors

Stargazers

Watchers

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.