This project forked from mauritsbleeker/bi-stet

Implementation of Bidirectional Scene Text Recognition with a Single Decoder

Home Page: https://arxiv.org/abs/1912.03656

Bi-STET

This is the repository for 'Bidirectional Scene Text Recognition with a Single Decoder', by Maurits Bleeker and Maarten de Rijke (https://arxiv.org/abs/1912.03656).

The base source code for this project comes from: http://nlp.seas.harvard.edu/2018/04/03/attention.html

I have tried to keep the code as general as possible. However, some elements of the pipeline are specific to the environment I worked in.

Model weights and reproducibility

To reproduce the results of the paper, please use the final model parameters.

https://drive.google.com/file/d/1OwJ3iVpRhnjIZyOi7aOQIeLv7N1DHZkC/view?usp=sharing

The folder data_utils/ contains all the scripts used to generate the train and test sets for this paper.

Python and package versions

  • Python 3.7
  • Pillow 5.4.1
  • nltk 3.4.5
  • numpy 1.17.1
  • scipy 1.2.0
  • seaborn 0.9.0
  • tensorboard-logger 0.1.0
  • tensorboardX 1.7
  • torch 1.1.0.post2
  • torchvision 0.2.1
  • transformers 2.1.1

Run

To run the code, set the configuration in Config.py and then run main.py. The configuration needed to reproduce the results of the paper is already set in Config.py.

Training

There are two options to load the training/test data:

  • From disk. This can be done by using the annotation file(s).
  • From a pickle file. The pickle file should contain a Python dict in the following format:
{
image_id : { 
    'data' : 'binary image string',
    'label' : 'word'
    }
}
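
The pickle layout above can be sketched as follows. This is a minimal illustration, not the repository's own loader: the helper names (build_dataset, save_dataset, load_dataset) are hypothetical, and it assumes that 'binary image string' means the raw encoded bytes of the image file (for example, the contents of a PNG or JPEG read in binary mode).

```python
import pickle


def build_dataset(samples):
    """Build the dict layout described above.

    `samples` maps image_id -> (image_bytes, label). Storing the raw
    encoded file bytes under 'data' is an assumption about what
    'binary image string' means in the README.
    """
    return {
        image_id: {"data": image_bytes, "label": label}
        for image_id, (image_bytes, label) in samples.items()
    }


def save_dataset(dataset, path):
    """Write the dataset dict to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(dataset, f)


def load_dataset(path):
    """Read the dataset dict back from a pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Under the same assumption, an entry's 'data' field could be decoded with Pillow via Image.open(io.BytesIO(entry['data'])). Only unpickle files you created yourself or otherwise trust, since loading a pickle can execute arbitrary code.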

Test and train annotations

Each entry in an annotation text file is formatted as 'path/to/image.jpg annotation'. The image path is always relative to a root folder.

Example root folder: User/Documents/Project/data/IIITK/

In User/Documents/Project/data/IIITK/, we have an annotation.txt and the images.

An example of the annotation file:

test/1002_1.png private
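
Reading such an annotation file might look like the sketch below. The function names are hypothetical and this is not the repository's own data loader. It assumes one entry per line, an image path without spaces, and a label that is everything after the first run of whitespace.

```python
import os


def parse_annotation_line(line, root):
    """Split 'relative/path.png word' into (image path under root, label).

    Assumes the relative path contains no whitespace, so the first
    whitespace run separates the path from the label.
    """
    rel_path, label = line.strip().split(maxsplit=1)
    return os.path.join(root, rel_path), label


def read_annotations(annotation_file, root):
    """Return a list of (image_path, label) pairs, skipping blank lines."""
    samples = []
    with open(annotation_file, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                samples.append(parse_annotation_line(line, root))
    return samples
```

With the example root folder above, the line 'test/1002_1.png private' resolves to the image path User/Documents/Project/data/IIITK/test/1002_1.png with the label 'private'.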

Data processing

All the scripts for processing the originally provided training datasets are in data_utils/.

Reference

If you find this code useful, please cite the following paper:

@article{bleeker2019bidirectional,
  title={Bidirectional Scene Text Recognition with a Single Decoder},
  author={Bleeker, Maurits and de Rijke, Maarten},
  journal={arXiv preprint arXiv:1912.03656},
  year={2019}
}
