Coder Social home page Coder Social logo

xei7 / mlp-character-recognition Goto Github PK

View Code? Open in Web Editor NEW

This project forked from mateogianolio/ocr

0.0 1.0 0.0 12.36 MB

Trains a multi-layer perceptron (MLP) neural network to perform optical character recognition (OCR).

License: MIT License

JavaScript 100.00%

mlp-character-recognition's Introduction

MLP character recognition

Trains a multi-layer perceptron (MLP) neural network to perform optical character recognition (OCR).

The training set is automatically generated using a heavily modified version of the captcha-generator node-captcha. Support for the MNIST handwritten digit database has been added recently (see performance section).

The network takes a one-dimensional binary array (default 20 * 20 = 400-bit) as input and outputs an 8-bit array, which can then be converted into a character code. Initial performance measurements show promising success rates.

After training, the network is saved as a standalone module to ./ocr.js, which can then be used in your project with

var ocr = require('./ocr.js');
var output = ocr
  .activate(input)
  .map(function(bit) {
    return bit > 0.5 ? 1 : 0;
  });
  
// output is now binary array, converting to character is easy
var character = String.fromCharCode(parseInt(output.join(''), 2));

// do stuff ...

Performance

// config.json
{
  "mnist": true,
  "network": {
    "hidden": 160,
    "learning_rate": 0.03
  }
}
  • Neurons
    • 400 input
    • 160 hidden
    • 4 output
  • Learning rate: 0.03
  • Training set: 60000 digits
  • Testing set: 10000 digits
  • Training time: 36 min 22 s 50 ms
  • Success rate: 90.14%

[A-Za-z0-9]

// config.json
{
  "mnist": false,
  "text": "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ012356789",
  "fonts": [
    "sans-serif",
    "serif"
  ],
  "training_set": 2000,
  "testing_set": 1000,
  "image_size": 16,
  "threshold": 400,
  "network": {
    "hidden": 40,
    "learning_rate": 0.03
  }
}
  • Neurons
    • 256 input
    • 40 hidden
    • 8 output
  • Learning rate: 0.03
  • Training set
    • Size: 124000 characters
    • Sample: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789
  • Testing set: 62000 characters
  • Training time: 4 min 8 s 2 ms
  • Success rate: 78.60322580645162%

[a-z]

// config.json
{
  "mnist": false,
  "text": "abcdefghijklmnopqrstuvwxyz",
  "fonts": [
    "sans-serif",
    "serif"
  ],
  "training_set": 2000,
  "testing_set": 1000,
  "image_size": 16,
  "threshold": 400,
  "network": {
    "hidden": 40,
    "learning_rate": 0.1
  }
}
  • Neurons
    • 256 input
    • 40 hidden
    • 8 output
  • Learning rate: 0.1
  • Training set
    • Size: 52000 characters
    • Sample: abcdefghijklmnopqrstuvwxyz
  • Testing set: 26000 characters
  • Training time: 2 min 10 s 752 ms
  • Success rate: 91.77692307692308%

[0-9]

// config.json
{
  "mnist": false,
  "text": "0123456789",
  "fonts": [
    "sans-serif",
    "serif"
  ],
  "training_set": 2000,
  "testing_set": 1000,
  "image_size": 16,
  "threshold": 400,
  "network": {
    "hidden": 40,
    "learning_rate": 0.1
  }
}
  • Neurons
    • 256 input
    • 40 hidden
    • 8 output
  • Learning rate: 0.1
  • Training set
    • Size: 20000 digits
    • Sample: 0123456789
  • Testing set: 10000 digits
  • Training time: 1 min 6 s 620 ms
  • Success rate: 99.22%

Configuration

Tweak the network for your needs by editing the config.json file located in the main folder. Pasted below is the default config file.

// config.json
{
  "mnist": false,
  "text": "abcdefghijklmnopqrstuvwxyz",
  "fonts": [
    "sans-serif",
    "serif"
  ],
  "training_set": 2000,
  "testing_set": 500,
  "image_size": 20,
  "threshold": 400,
  "network": {
    "hidden": 40,
    "learning_rate": 0.1
  }
}
  • mnist
    • If set to true, the MNIST handwritten digit dataset will be used for training and testing the network. This setting will overwrite configured set sizes and will ignore the image_size, threshold, fonts and text settings.
  • text
    • A string containing the glyphs with which to train/test the network.
  • fonts
    • An array of fonts to be used when generating images.
  • training_set
    • Number of images to be generated and used as the network training set.
  • testing_set
    • Same as above, but these images are used for testing the network.
  • image_size
    • The size of the square chunk (in pixels) containing a glyph. The resulting network input size is image_size^2.
  • threshold
    • When analyzing the pixels of a glyph, the algorithm reduces each pixel (r, g, b) to (r + g + b) and everything below threshold is marked as 1 in the resulting binary array used as network input.
  • network
    • hidden
      • The size (number of neurons) of the hidden layer of the network.
    • learning_rate
      • The learning rate of the network.

Usage

Clone this repository. The script is using canvas, so you'll need to install the Cairo rendering engine. On OSX, this can be done with the following one-liner (copied from canvas README):

$ wget https://raw.githubusercontent.com/LearnBoost/node-canvas/master/install -O - | sh

Then install npm dependencies and test it:

$ npm install
$ node main.js

mlp-character-recognition's People

Contributors

christosilvia avatar mateogianolio avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.