nickto / pytorch-ocr

This project forked from gabrieldornelles/pytorch-ocr

Simple PyTorch framework to train OCR models. Supports CRNNs, Attention, CTC Loss and Cross Entropy Loss.

License: MIT License

Python 98.33% Dockerfile 1.67%

TorchNN-OCR

A simple PyTorch framework to train Optical Character Recognition (OCR) models.

You can train models to read captchas, license plates, digital displays, and any type of text!

See:

Rich Text while Training!

Hydra!

The whole training log is saved to a train.log file, so you can process it anywhere!

You can also run multiple training runs with Hydra:

python3 train.py --multirun model.use_attention=true,false model.use_ctc=true,false training.num_epochs=50,100

This example launches 8 separate training runs, one for each combination of the listed values (2 × 2 × 2).
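To see where the 8 comes from, the sketch below enumerates the combinations of the comma-separated values in plain Python. It only mirrors the Cartesian-product behavior of a multirun sweep and does not call Hydra; the `sweep` dictionary and the `expand_sweep` helper are illustrative names, not part of this project.

```python
from itertools import product

# Values swept in the command above, keyed by their override names.
sweep = {
    "model.use_attention": [True, False],
    "model.use_ctc": [True, False],
    "training.num_epochs": [50, 100],
}


def expand_sweep(sweep):
    """Return one dict of overrides per combination (Cartesian product)."""
    keys = list(sweep)
    return [dict(zip(keys, values)) for values in product(*sweep.values())]


runs = expand_sweep(sweep)
for index, overrides in enumerate(runs):
    print(index, overrides)
```

Adding a fourth swept parameter with two values would double the number of runs, so it is worth counting the combinations before launching a large sweep.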

How to train?

  • Create a directory called "dataset" and put your images there (preferably PNG; other formats work as long as you change the expected extension accordingly).

  • Your file tree should look like this:

    torch-nn-ocr
    │   README.md
    │   ...  
    │
    └─── dataset
        cute.png
        motor.png
        machine.png
    

    The image name must be the text written in the image. In this case you have one image with 'cute' written in it, another with 'motor', and another with 'machine'.

  • All targets should have the same length. Padding is done automatically when using Attention + CrossEntropy, but not for CTC Loss, so normalize your target lengths yourself if you use CTC Loss. You can do this by adding a character that represents empty space. Do not reuse the symbol CTC uses for its blank: those are different blanks.

  • Configure your model at configs/config.yaml

    model:
      use_attention: true 
      use_ctc: true
      dims: 256
  • Run:

python3 train.py
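The dataset conventions in the steps above (label taken from the file name, equal-length targets for CTC Loss) can be sketched as follows. This is a minimal illustration, not the project's own data loader: `labels_from_dir`, `pad_labels` and the `_` padding character are hypothetical names and choices made for this example.

```python
from pathlib import Path

# Hypothetical padding symbol: it must not occur in real labels and is
# unrelated to the blank token that CTC uses internally.
PAD_CHAR = "_"


def labels_from_dir(dataset_dir, extension=".png"):
    """Map each image path to its label: the file name without extension."""
    paths = sorted(Path(dataset_dir).glob(f"*{extension}"))
    return {path: path.stem for path in paths}


def pad_labels(labels, pad_char=PAD_CHAR):
    """Right-pad every label to the length of the longest label."""
    if not labels:
        return []
    max_len = max(len(label) for label in labels)
    return [label.ljust(max_len, pad_char) for label in labels]


print(pad_labels(["cute", "motor", "machine"]))
# ['cute___', 'motor__', 'machine']
```

Since the label comes from the file name, the padding is applied to the label strings after loading, so the image files themselves do not need to be renamed.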

Support:

  • CRNNs ✅
  • Attention ✅
  • CTC Loss ✅
  • Cross Entropy Loss ✅

Will Support:

  • Other backbones
  • Text detection models (would you like it?)

TODO:

  • Add logging with Hydra, so logs are saved to text files. ✅
  • Add CI with GitHub Actions, to test that everything works after pushes to this repo. ✅
  • Add tests for the main methods, so the code stays reliable as more models and features are added. Partially added. ✅
  • Configure Dockerfile for inference
