Kuzushiji Recognition

Kaggle Kuzushiji Recognition: Code for the 8th place solution.

The kuzushiji recognition pipeline consists of two models: a CenterNet character detection model and a MobileNetV3 per-character classification model.

Setup

Language environment

Python version:

  • 3.7.3

Libraries:

  • chainer (6.2.0)
  • chainercv (0.13.1)
  • cupy-cuda92 (6.2.0)
  • albumentations (0.3.1)
  • opencv-python (4.1.0.25)
  • Pillow (6.1.0)
  • pandas (0.25.0)
  • numpy (1.17.0)
  • matplotlib (3.1.1)
  • japanize-matplotlib (1.0.4)

For unit tests:

  • pytest (4.4.1)

Download dataset

Please download the competition dataset from the Kaggle Kuzushiji Recognition competition page and unzip it to <repo root>/data/kuzushiji-recognition.

The expected directory structure is as follows:

kuzushiji-recognition/
    data/
        kuzushiji-recognition/
            train.csv
            train_images
            test_images
            unicode_translation.csv
            sample_submission.csv
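
Before starting training, it can help to confirm that the unzipped dataset matches the layout above. The following is a minimal sketch, not part of this repository: the helper name `find_missing_entries` is hypothetical, and it only checks that each listed entry exists under `<repo root>/data/kuzushiji-recognition`.

```python
from pathlib import Path

# Entries listed in the expected directory structure above.
EXPECTED_ENTRIES = [
    "train.csv",
    "train_images",
    "test_images",
    "unicode_translation.csv",
    "sample_submission.csv",
]


def find_missing_entries(repo_root):
    """Return the expected dataset entries that are absent under repo_root.

    Hypothetical helper for illustration; it is not provided by this repo.
    """
    data_dir = Path(repo_root) / "data" / "kuzushiji-recognition"
    return [name for name in EXPECTED_ENTRIES if not (data_dir / name).exists()]


if __name__ == "__main__":
    # Run from the repository root.
    missing = find_missing_entries(".")
    if missing:
        print("Missing dataset entries:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

An empty result means every file and directory from the listing is present; it does not verify the contents of the image directories.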

Training procedure

Please follow the steps below to train kuzushiji recognition models.

  1. Set the environment variable:
cd <path to this repo>
export PYTHONPATH=`pwd`
  2. Split all annotated samples in train.csv into training and validation sets:
python scripts/prepare_train_val_split.py
  3. Prepare the per-character cropped image set for character classifier training:
python scripts/prepare_char_crop_dataset.py
  4. Train the character detection model:
python scripts/train_detector.py --gpu 0 --out ./results/detector --full-data
  5. Train the character classification model:
python scripts/train_classifier.py --gpu 0 --out ./results/classifier --full-data
  6. Prepare pseudo labels using the trained detector and classifier:
python scripts/prepare_pseudo_labels.py --gpu 0 \
    ./results/detector/model_700.npz \
    ./results/classifier/model_900.npz \
    --out data/kuzushiji-recognition-pseudo
  7. Fine-tune the classifier using the pseudo labels and the original training data:
python scripts/finetune_classifier.py --gpu 0 \
    --pseudo-labels-dir data/kuzushiji-recognition-pseudo \
    --out ./results/classifier-finetune \
    ./results/classifier/model_900.npz
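
The steps above can also be chained from a single script. The following is a rough sketch under stated assumptions: the function `run_pipeline` and the constants are hypothetical names, the commands and arguments are copied from the steps above, and `PYTHONPATH` is set to the repository root in place of the `export` step. With `dry_run=True` it only prints and returns the commands without launching them.

```python
import os
import subprocess
import sys

# Paths and options copied from the training steps above.
GPU = "0"
DETECTOR = "./results/detector/model_700.npz"
CLASSIFIER = "./results/classifier/model_900.npz"
PSEUDO_DIR = "data/kuzushiji-recognition-pseudo"

STEPS = [
    ["scripts/prepare_train_val_split.py"],
    ["scripts/prepare_char_crop_dataset.py"],
    ["scripts/train_detector.py", "--gpu", GPU,
     "--out", "./results/detector", "--full-data"],
    ["scripts/train_classifier.py", "--gpu", GPU,
     "--out", "./results/classifier", "--full-data"],
    ["scripts/prepare_pseudo_labels.py", "--gpu", GPU,
     DETECTOR, CLASSIFIER, "--out", PSEUDO_DIR],
    ["scripts/finetune_classifier.py", "--gpu", GPU,
     "--pseudo-labels-dir", PSEUDO_DIR,
     "--out", "./results/classifier-finetune", CLASSIFIER],
]


def run_pipeline(repo_root, dry_run=False):
    """Run each training step in order, stopping at the first failure.

    Hypothetical wrapper for illustration; it is not provided by this repo.
    """
    root = os.path.abspath(repo_root)
    # Equivalent of `export PYTHONPATH=`pwd`` from the first step.
    env = dict(os.environ, PYTHONPATH=root)
    commands = [[sys.executable] + step for step in STEPS]
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError if a step exits non-zero.
            subprocess.run(command, cwd=root, env=env, check=True)
    return commands
```

Calling `run_pipeline(".", dry_run=True)` from the repository root lists the six commands, which is a cheap way to review them before committing GPU time.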

Prepare submission

To generate a CSV for submission, run the following command:

python scripts/prepare_submission.py --gpu 0 \
    ./results/detector/model_700.npz \
    ./results/classifier-finetune/model_100.npz

Python API

The detector and classifier classes provide an easy-to-use interface for inference. The following is an example of inference code. Note that the bounding box format is (xmin, ymin, xmax, ymax).

import chainer
from PIL import Image

from kr.detector.centernet.resnet import Res18UnetCenterNet
from kr.classifier.softmax.mobilenetv3 import MobileNetV3
from kr.datasets import KuzushijiUnicodeMapping


# unicode <-> unicode index mapping
mapping = KuzushijiUnicodeMapping()

# load trained detector
detector = Res18UnetCenterNet()
chainer.serializers.load_npz('./results/detector/model_700.npz', detector)

# load trained classifier
classifier = MobileNetV3(out_ch=len(mapping))
chainer.serializers.load_npz('./results/classifier/model_900.npz', classifier)

# load image
image = Image.open('path/to/image.jpg')

# character detection
bboxes, bbox_scores = detector.detect(image)

# character classification
unicode_indices, scores = classifier.classify(image, bboxes)
unicodes = [mapping.index_to_unicode(idx) for idx in unicode_indices]
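
To turn the inference results above into a label string for one image, the boxes can be reduced to their center points. The following is a minimal sketch, not the repository's own submission code (use scripts/prepare_submission.py for that). It assumes that the competition expects space-separated `unicode x y` triplets, that the Unicode values are strings such as `U+306F`, and that each box follows the (xmin, ymin, xmax, ymax) format noted above. The function name `bboxes_to_label_string` is hypothetical.

```python
def bboxes_to_label_string(unicodes, bboxes):
    """Format predictions as 'unicode x y' triplets using box centers.

    Hypothetical helper for illustration; it is not provided by this repo.
    Each bbox is (xmin, ymin, xmax, ymax).
    """
    parts = []
    for code, (xmin, ymin, xmax, ymax) in zip(unicodes, bboxes):
        # Center of the box, rounded to integer pixel coordinates.
        center_x = int(round((xmin + xmax) / 2))
        center_y = int(round((ymin + ymax) / 2))
        parts.append("{} {} {}".format(code, center_x, center_y))
    return " ".join(parts)


# Example with made-up boxes (not real model output).
example_unicodes = ["U+306F", "U+304C"]
example_bboxes = [(10, 20, 30, 60), (100, 200, 140, 260)]
print(bboxes_to_label_string(example_unicodes, example_bboxes))
# prints: U+306F 20 40 U+304C 120 230
```

An image with no detections yields an empty string.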

License

Released under the MIT license.
