ai-forever / ocr-model

An easy-to-run OCR model pipeline based on CRNN and CTC loss

License: MIT License

OCR model

This is a model for Optical Character Recognition (OCR) based on the CRNN architecture and CTC loss.

OCR-model is part of the ReadingPipeline repo.

Demo

The demo shows an example of using OCR-model (you can run it in Google Colab).

Quick setup and start

The provided Dockerfile builds an image with CUDA and cuDNN support.

Preparations

  • Clone the repo.
  • Download and extract the dataset to the data/ folder.
  • Run sudo make all to build a Docker image and create a container, or sudo make all GPUS=device=0 CPUS=10 to specify GPU devices and limit CPU resources.

If you don't want to use Docker, you can install the dependencies from requirements.txt.

Configuring the model

You can edit ocr_config.json to set the training and evaluation parameters: alphabet, image size, saving path, etc.

"train": {
    "datasets": [
        {
            "csv_path": "/workdir/data/dataset_1/train.csv",
            "prob": 0.5
        },
        {
            "csv_path": "/workdir/data/dataset_2/train.csv",
            "prob": 0.7
        },
        ...
    ],
    "epoch_size": 10000,
    "batch_size": 512
}
  • epoch_size - the number of samples per epoch. If set to null, the epoch size equals the total number of samples across all datasets.
  • You can specify several datasets for train/validation/test and set a probability for each dataset separately. The prob values may sum to more than 1, since they are normalized internally.
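
The normalization of prob values can be sketched as follows. This is a minimal illustration, not the repository's code: the helper names normalize_probs and pick_dataset are hypothetical, and it assumes that each training sample is drawn by first choosing a dataset according to its normalized prob.

```python
import random

def normalize_probs(datasets):
    """Return per-dataset weights that sum to 1 (hypothetical helper)."""
    total = sum(d["prob"] for d in datasets)
    if total <= 0:
        raise ValueError("at least one dataset must have a positive prob")
    return [d["prob"] / total for d in datasets]

def pick_dataset(datasets, rng=random):
    """Choose one dataset entry according to its normalized weight.

    Assumption: sampling is done per sample with these weights.
    """
    weights = normalize_probs(datasets)
    return rng.choices(datasets, weights=weights, k=1)[0]

# Same values as the "train" section of the config shown above.
train_datasets = [
    {"csv_path": "/workdir/data/dataset_1/train.csv", "prob": 0.5},
    {"csv_path": "/workdir/data/dataset_2/train.csv", "prob": 0.7},
]
weights = normalize_probs(train_datasets)  # 0.5/1.2 and 0.7/1.2
```

With prob values of 0.5 and 0.7, the two datasets would be sampled in a 5:7 ratio under this assumption, regardless of how many rows each csv file contains.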

Prepare data

Datasets must be pre-processed into a single format: each dataset must contain a folder with images (cropped images of text) and a csv file with annotations. The csv file should contain two columns: "filename", with the relative path to the image (folder-name/image-name.png), and "text", with the image transcription.

filename            text
images/4099-0.png   is
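
Before training, it can help to check that an annotation file has the two expected columns. The sketch below is only an illustration: read_annotations is a hypothetical helper, and it assumes a comma-delimited file with a header row.

```python
import csv
import io

REQUIRED_COLUMNS = ("filename", "text")

def read_annotations(csv_file):
    """Parse an annotation csv and return a list of (filename, text) pairs.

    Assumption: comma-delimited file whose first row is the header.
    """
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [(row["filename"], row["text"]) for row in reader]

# In-memory stand-in for a real file, using the example row above.
sample = io.StringIO("filename,text\nimages/4099-0.png,is\n")
rows = read_annotations(sample)
```

For a file on disk, open it with open(path, newline="", encoding="utf-8") and pass the file object to the same function.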

If you use polygon annotations in COCO format, you can prepare a training dataset using this script:

python scripts/prepare_dataset.py \
    --annotation_json_path path/to/the/annotations.json \
    --annotation_image_root dir/to/images/from/annotation/file \
    --class_names pupil_text pupil_comment teacher_comment \
    --bbox_scale_x 1 \
    --bbox_scale_y 1 \
    --save_dir dir/to/save/dataset \
    --output_csv_name data.csv
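
To make the idea of cropping from polygon annotations concrete, here is a rough sketch of turning a COCO polygon (a flat list x1, y1, x2, y2, ...) into a crop box. It is not taken from prepare_dataset.py: the function names are hypothetical, and reading --bbox_scale_x and --bbox_scale_y as scale factors applied around the box center is an assumption.

```python
def polygon_to_bbox(segmentation):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around a flat COCO polygon."""
    xs = segmentation[0::2]  # even positions are x coordinates
    ys = segmentation[1::2]  # odd positions are y coordinates
    return min(xs), min(ys), max(xs), max(ys)

def scale_bbox(bbox, scale_x=1.0, scale_y=1.0):
    """Scale a box about its center; a scale of 1 leaves it unchanged.

    Assumption: this mirrors what the bbox_scale options might do.
    """
    x_min, y_min, x_max, y_max = bbox
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    half_w = (x_max - x_min) / 2 * scale_x
    half_h = (y_max - y_min) / 2 * scale_y
    return cx - half_w, cy - half_h, cx + half_w, cy + half_h

polygon = [10, 20, 50, 20, 50, 40, 10, 40]
bbox = polygon_to_bbox(polygon)
wider = scale_bbox(bbox, scale_x=1.5, scale_y=1.0)
```

A real crop would also need to clip the scaled box to the image bounds, which this sketch omits.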

Training

To train the model:

python scripts/train.py --config_path path/to/the/ocr_config.json

Evaluating

To test the model:

python scripts/evaluate.py \
--config_path path/to/the/ocr_config.json \
--model_path path/to/the/model-weights.ckpt

To use a beam search decoder with a language model, pass the lm_path argument with the path to a KenLM .arpa file: --lm_path path/to/the/language-model.arpa
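
For context, the simplest alternative to beam search is greedy (best-path) CTC decoding: take the most likely class per frame, merge consecutive repeats, and drop blanks. The sketch below illustrates that general technique only; it is not the repository's decoder, and the blank index of 0 and the alphabet offset of 1 are assumptions.

```python
def ctc_greedy_decode(frame_indices, alphabet, blank=0):
    """Collapse per-frame class indices into text using the CTC rule.

    Assumption: index 0 is the blank and index i > 0 maps to alphabet[i - 1].
    """
    chars = []
    previous = blank
    for idx in frame_indices:
        # Emit a character only when it is not blank and not a repeat
        # of the immediately preceding frame.
        if idx != blank and idx != previous:
            chars.append(alphabet[idx - 1])
        previous = idx
    return "".join(chars)

alphabet = "abcdefghijklmnopqrstuvwxyz"
frames = [0, 9, 9, 0, 19, 19, 19, 0]  # argmax class per time step
decoded = ctc_greedy_decode(frames, alphabet)
```

A blank between two identical indices keeps both characters, which is how CTC represents doubled letters.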

ONNX

You can convert the Torch model to ONNX to speed up inference on CPU.

python scripts/torch2onnx.py \
--config_path path/to/the/ocr_config.json \
--model_path path/to/the/model-weights.ckpt

Contributors

julia132, skalinin
