Classification Bot

Join the chat at https://gitter.im/AntreasAntoniou/DeepClassificationBot

Welcome to the Classification Bot codebase. Classification Bot aims to simplify the collection, extraction, and preprocessing of image data, and to provide an end-to-end pipeline for using that data to train large deep neural networks.

The system is composed of scrapers, data extractors, preprocessors, deep neural network models built with Keras (by Francois Chollet), and an easy-to-use deployment module.

Installation

Make sure you have a GPU, as training is very compute-intensive.

  1. (OSX) Install gcc: brew install gcc
  2. Install CUDA_toolkit 7.5
  3. Install cuDNN 4
  4. Install Theano: sudo pip install git+git://github.com/Theano/Theano.git
  5. Install OpenCV
  6. Install hdf5 library (libhdf5-dev)
  7. Make sure you have Python 2.7.6 and virtualenv installed on your system
  8. Install Python dependencies
$ virtualenv --python=python2 --system-site-packages env
$ . env/bin/activate
$ pip install -r requirements.txt

Training and deploying

To download images

Use google_image_scraper.py to download images. It takes a .csv file listing the categories you want, and downloads a number of images for each line.

The first line of the .csv file will be ignored.

The number of images per category is configurable. We suggest a number between 200 and 1000:

$ python google_image_scraper.py -n 200 yourfilehere.csv

Easy Mode:

(For users who already have a list of categories at hand):

  1. Create a .csv file with one category per line that you want the scraper to search for.
  2. Now let's download some images! Run python google_image_scraper.py yourfilehere.csv
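For example, a minimal categories file might look like this (the category names are illustrative; remember the first line is skipped):

```
category
Gurren Lagann
Cowboy Bebop
Naruto
```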

Hacker Mode:

(For users who know an online resource that lists their categories, who have too many categories to copy by hand and must automate the procedure, or who would simply rather write code than copy and paste.)

  1. Write a script that fetches your categories from Wikipedia or any other resource you like. For an example, look at examples/anime_names.py to see what we used to get our categories.
  2. Have your script create a .csv file with the categories you require.
  3. Then run python google_image_scraper.py yourfilehere.csv
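A minimal sketch of such a script (the category list and filename here are illustrative placeholders; examples/anime_names.py shows how we actually fetched ours):

```python
import csv

def write_categories(categories, path):
    """Write one category per line; the scraper skips the first (header) line."""
    with open(path, "w") as f:
        writer = csv.writer(f)
        writer.writerow(["category"])  # header line, ignored by the scraper
        for name in categories:
            writer.writerow([name])

# Illustrative list: in practice, fetch these from Wikipedia or another source.
categories = ["Gurren Lagann", "Cowboy Bebop", "Naruto"]
write_categories(categories, "categories.csv")
```

The resulting categories.csv can be passed straight to google_image_scraper.py.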

To extract and preprocess data ready for training

  1. Once you have your data ready, run python train.py --extract_data to extract all of your data and save it in HDF5 files.

To train your network

  1. Once all of the above steps are complete, you are ready to train your network. Run python train.py --run to load data from the HDF5 files, or python train.py --run --extract_data to extract the data and train in one step.
  2. The weights are saved after each epoch, so you can resume training at any time by running python train.py --run --continue

Deploying a model

  1. Once training has finished and produced a good model, you can deploy it.
  2. To deploy a model on a single image URL, use python deploy.py --URL [URL_LINK]
  3. To deploy a model on a folder full of images, use python deploy.py --image-folder path/to/folder
  4. To deploy a model on a single file, use python deploy.py --image-path path/to/file

Once deployed, the model returns the top 5 predictions for each image in a nicely formatted view, e.g.:

Image Name: Tengen.Toppa.Gurren-Lagann.full.174481.jpg
Categories:
0. Gurren Lagann: 0.999914288521
1. Kill La Kill: 7.29278544895e-05
2. Naruto: 4.92283288622e-06
3. Redline: 2.71744352176e-06
4. Cowboy Bebop: 1.41406655985e-06
_________________________________________________

Things for you to try

  1. Create your own classifiers
  2. Try different model architectures (hint: go to Google Scholar or arXiv, search for GoogLeNet, VGG-Net, AlexNet, ResNet, and follow the waves :) )

Twitter bot

deepanimebot/bot.py is a Twitter bot that provides an interface for querying the classifier.

Running the bot locally

Prerequisites

Copy bot.ini.example to bot.ini and fill in your consumer key/secret and access token/secret.
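The filled-in file will look roughly like the following (section and key names here are illustrative; use the exact ones from bot.ini.example):

```ini
[twitter]
consumer_key = YOUR_CONSUMER_KEY
consumer_secret = YOUR_CONSUMER_SECRET
access_token = YOUR_ACCESS_TOKEN
access_token_secret = YOUR_ACCESS_TOKEN_SECRET
```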

Run it

$ PYTHONPATH=. python deepanimebot/bot.py -c bot.ini --debug --classifier=local

python deepanimebot/bot.py --help will list all available command line options.

Web interface

deepanimebot/webapp.py is a Flask app for querying the classifier.

$ PYTHONPATH=. python deepanimebot/webapp.py

Deploying to Google Cloud Platform

This repo comes with the necessary support files for deploying the Twitter bot and/or the web app to Google Cloud Platform.

Prerequisites

Building and registering your own Docker image

classificationbot/base:latest comes with all the dependencies installed. If you've modified the code and added a new dependency, make a new Docker image based on the dockerfiles in this repo.

This repo's base images are built with these commands:

$ docker build -t classificationbot/base:latest -f dockerfiles/base/Dockerfile .
$ docker push classificationbot/base:latest

$ docker build -t classificationbot/ci:latest -f dockerfiles/ci/Dockerfile .
$ docker push classificationbot/ci:latest

Deploying

There are two options:

  1. (Not used anymore) Google Compute Engine, container-optimized instance, supervisord + tweepy: bot-standalone
  2. Google Container Engine, Kubernetes, gunicorn + flask + tweepy: follow this gist

Special Thanks

Special thanks to Francois Chollet (fchollet) for building the superb Keras deep learning library. We couldn't have delivered a project ready for use by non-machine-learning people without Keras's ease of use.

Special thanks to https://github.com/shuvronewscred/ for building the image scraper we adapted for this project. The original source code can be found at https://github.com/shuvronewscred/google-search-image-downloader

Contributors

antreasantoniou, ento, gitter-badger


Issues

Serialize and save model when initializing workspace

Motivation

deploy.load_model takes an input shape argument, which is hardcoded in bot/webapp. This value does not depend on the code; it is tied to the model that goes with the saved weights. Furthermore, the saved weights are tightly coupled with the model itself.

Coupled data should be saved in proximity to each other, not updated or specified by hand.

Proposed changes

When initializing a workspace, the user can specify the model builder name and input/output shapes. Build a model and save it in the workspace.

All other code should load the model from the workspace, and get input/output shapes from the model.
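A sketch of the proposed change, independent of Keras specifics (the file name model_spec.json and the helper functions are hypothetical):

```python
import json
import os

def save_model_spec(workspace, builder_name, input_shape, output_shape):
    """Persist model metadata next to the saved weights so that
    deploy/bot/webapp code never hardcodes shapes."""
    spec = {
        "builder": builder_name,
        "input_shape": input_shape,
        "output_shape": output_shape,
    }
    path = os.path.join(workspace, "model_spec.json")
    with open(path, "w") as f:
        json.dump(spec, f)
    return path

def load_model_spec(workspace):
    """All other code reads the shapes back from the workspace."""
    with open(os.path.join(workspace, "model_spec.json")) as f:
        return json.load(f)

# Example: record the spec at workspace-initialization time.
save_model_spec(".", "simple_cnn", [3, 128, 128], 100)
```

In the real implementation the Keras model architecture itself (e.g. via model.to_json) could be stored the same way, alongside the weights.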

Portable workspace directory

Motivation

Data files created by this framework are currently saved in multiple places, which are hardcoded:

./data
./downloaded_images
./pre_trained_weights

In particular, files needed for deployment are in ./data and ./pre_trained_weights.

This leaves these user stories unfulfilled:

  • When I deploy a model to a remote location, I want a simple way to specify all the files I need, so that it's easily automated and future-proof.
  • When I'm experimenting with multiple classification projects, I want to easily switch between projects, so that progress is saved and easy to pick up later.

Proposed changes

Provide a command that initializes a workspace directory with the expected directory structure.

workspace/
  downloaded_images/
  wip/
    data.hdf5
  dist/
    latest_model_weights.hdf5
    model_weights.hdf5
    categories.p

All of the entrypoint scripts should take a --workspace=dir argument and look for files within the workspace:

google_image_scraper.py
train.py
deploy.py
deepanimebot/bot.py
deepanimebot/webapp.py
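A minimal sketch of what the proposed init command's core could look like (the function name and constants are hypothetical; the subdirectory names follow the tree above):

```python
import os

# Subdirectories of the proposed workspace layout.
WORKSPACE_SUBDIRS = ["downloaded_images", "wip", "dist"]

def init_workspace(root):
    """Create the expected workspace directory structure; safe to re-run."""
    for subdir in WORKSPACE_SUBDIRS:
        os.makedirs(os.path.join(root, subdir), exist_ok=True)
    return root

init_workspace("workspace")
```

Each entrypoint script would then resolve file paths relative to the directory passed via --workspace=dir.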
