
chinese-ocr

About

This repository contains code that trains a convolutional neural network (CNN) to recognize handwritten Chinese characters. The network is trained on the CASIA dataset, which can be found at http://www.nlpr.ia.ac.cn/databases/handwriting/Home.html.

Downloading

Clone this repository to your computer. From the Downloads section of the CASIA page, download HWDB1.0train_gnt (2741MB) and HWDB1.0test_gnt (681MB) and extract both archives, so that you end up with two folders of GNT files. Store both folders in a folder named data.
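Before converting anything, it can help to confirm the folder layout. The sketch below is a hypothetical helper (list_gnt_files is not part of this repository). It assumes the two extracted folders are named train and test and sit directly inside data. It lists the .gnt files in each folder and fails early if one is missing.

```python
import os


def list_gnt_files(data_dir="data"):
    """Return sorted .gnt paths for the train and test folders.

    Hypothetical helper: assumes data/train and data/test hold the
    extracted HWDB1.0 GNT files.
    """
    layout = {}
    for split in ("train", "test"):
        split_dir = os.path.join(data_dir, split)
        if not os.path.isdir(split_dir):
            raise FileNotFoundError(f"missing folder: {split_dir}")
        # Keep only GNT files; ignore anything else that was extracted.
        layout[split] = sorted(
            os.path.join(split_dir, name)
            for name in os.listdir(split_dir)
            if name.lower().endswith(".gnt")
        )
    return layout
```

For example, list_gnt_files("data") returns a dict with "train" and "test" keys. An empty list usually means an archive was extracted one folder level too deep.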

Obtaining the full dataset

Change into the repository directory. Name the two extracted folders train and test, then modify lines 12 and 13 of convert.py so that they point to the locations of these folders relative to data_dir.

train_data_dir = os.path.join(data_dir, 'your_path_here')
test_data_dir = os.path.join(data_dir, 'your_path_here')
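convert.py itself is not reproduced here. As a rough sketch of what reading the raw data involves: in the published HWDB format, a GNT file stores samples back to back. Each sample has a 10-byte header followed by width * height grayscale bytes. The header holds the sample size as a little-endian uint32, a 2-byte GB-encoded character code, and then the width and height as little-endian uint16 values. read_gnt is a hypothetical name. Decoding the character code with the gbk codec (a superset of GB2312) is an assumption.

```python
import struct

import numpy as np

# Sample size (uint32), tag code (2 bytes), width (uint16), height (uint16).
HEADER = struct.Struct("<I2sHH")


def read_gnt(path):
    """Yield (character, bitmap) pairs from one CASIA GNT file (sketch)."""
    with open(path, "rb") as f:
        while True:
            header = f.read(HEADER.size)
            if not header:
                break  # clean end of file
            if len(header) < HEADER.size:
                raise ValueError("truncated GNT header")
            sample_size, tag_code, width, height = HEADER.unpack(header)
            if sample_size != HEADER.size + width * height:
                raise ValueError("sample size does not match bitmap size")
            pixels = f.read(width * height)
            if len(pixels) < width * height:
                raise ValueError("truncated GNT bitmap")
            # Assumption: tag codes are GB-encoded; gbk covers GB2312.
            char = tag_code.decode("gbk")
            bitmap = np.frombuffer(pixels, dtype=np.uint8).reshape(height, width)
            yield char, bitmap
```

Iterating over read_gnt(path) for each file in the train folder yields every character together with its bitmap as a NumPy array of shape (height, width).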

Preprocessing

Uncomment lines 60-79 of preprocess.py and run it. The compressed dataset will now be stored in data/compressed.
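The exact preprocessing in preprocess.py is not shown. The sketch below shows one plausible version under stated assumptions. Each bitmap is padded to a square with a white background (GNT bitmaps are assumed to use 255 for background). It is then resized to a fixed size with nearest-neighbour sampling. Finally, images and labels are stored together with numpy.savez_compressed. The 64x64 size, the helper names to_square and compress_dataset, and the .npz output are all hypothetical.

```python
import numpy as np


def to_square(bitmap, size=64, background=255):
    """Pad a grayscale bitmap to a square, then nearest-neighbour resize it."""
    h, w = bitmap.shape
    side = max(h, w)
    canvas = np.full((side, side), background, dtype=np.uint8)
    top, left = (side - h) // 2, (side - w) // 2
    canvas[top:top + h, left:left + w] = bitmap
    # Map every output row/column back to its nearest source row/column.
    idx = np.arange(size) * side // size
    return canvas[np.ix_(idx, idx)]


def compress_dataset(samples, out_path, size=64):
    """Stack (character, bitmap) pairs into arrays and save them compressed."""
    chars, images = [], []
    for char, bitmap in samples:
        chars.append(char)
        images.append(to_square(bitmap, size))
    np.savez_compressed(out_path, images=np.stack(images), labels=np.array(chars))
```

A fixed input size lets the samples be stacked into one array for the CNN. Nearest-neighbour sampling keeps the sketch dependency-free, although a library resize with interpolation would usually give smoother strokes.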

Training

Modify train.py so that the output layer matches the number of classes (characters) you choose to train the network on:

model.add(Dense(number_of_classes, activation='softmax'))

Then run train.py.
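The softmax layer needs exactly one unit per character class, so number_of_classes must equal the number of distinct labels the model is trained on. The hypothetical helper below (not part of train.py) builds that class list. It can optionally keep only the first number_of_classes characters in sorted order, which is an assumption about how a subset might be chosen. It returns one-hot targets whose width matches the Dense layer.

```python
import numpy as np


def encode_labels(labels, number_of_classes=None):
    """Map character labels to one-hot targets (hypothetical helper).

    Returns the ordered class list, the indices of the samples that were
    kept, and a float32 one-hot matrix of shape (len(keep), len(classes)).
    """
    classes = sorted(set(labels))
    if number_of_classes is not None:
        # Assumption: the subset is the first N characters in sorted order.
        classes = classes[:number_of_classes]
    index = {char: i for i, char in enumerate(classes)}
    keep = [i for i, char in enumerate(labels) if char in index]
    ids = np.array([index[labels[i]] for i in keep], dtype=np.int64)
    one_hot = np.eye(len(classes), dtype=np.float32)[ids]
    return classes, np.array(keep, dtype=np.int64), one_hot
```

Use keep to select the matching images, for example images[keep]. Pass len(classes) as number_of_classes so that the target width and the layer size agree.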

Prediction

Save the image you want the network to classify as test.png in the data folder. Comment out lines 60-79 of preprocess.py again, then run predict.py.
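The output layer is a softmax, so prediction ultimately yields one probability per class. The predicted character is the class with the highest probability. The sketch below assumes you already have that vector, for instance the single row returned by Keras model.predict on a batch containing test.png. It also assumes you have the same ordered class list used in training. top_k_characters is a hypothetical helper. test.png must be preprocessed the same way as the training images.

```python
import numpy as np


def top_k_characters(probabilities, classes, k=5):
    """Return the k most likely (character, probability) pairs, best first."""
    probabilities = np.asarray(probabilities).ravel()
    if probabilities.shape[0] != len(classes):
        raise ValueError("one probability per class is required")
    # argsort is ascending, so reverse it to put the highest scores first.
    order = np.argsort(probabilities)[::-1][:k]
    return [(classes[i], float(probabilities[i])) for i in order]
```

Returning a short ranked list rather than only the argmax makes it easier to see when visually similar characters are competing.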

Acknowledgements

I'd like to thank the following people/pages for providing resources that have especially helped me.

integeruser on Github

想飞的石头 on Zhihu

蹦跶的小羊羔 on csdn.net

Contributors

tanayb11
