Optimize Training parameters about swiftocr HOT 6 OPEN

nmac427 commented on May 18, 2024
Optimize Training parameters

from swiftocr.

Comments (6)

NMAC427 commented on May 18, 2024

Hi there,

Small question. What do you mean by "the algorithm you use to recognize the texts"? Do you mean how I separate the characters, how the NN works or something else?

pabloromeu commented on May 18, 2024

Wow, thanks for such a fast answer!

I mean how you are separating characters and what you are passing to the input of the NN. I would like to know which components I can tweak to get better character recognition and, for instance, whether a learning rate of 0.1 makes sense or not.

NMAC427 commented on May 18, 2024

I'm using the Swift-AI framework for the NN and a Connected-component labeling algorithm for getting the bounding boxes of the characters.
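
To illustrate the idea of connected-component labeling, here is a minimal sketch. It is not SwiftOCR's actual implementation: the `BoundingBox` type and the `boundingBoxes(of:)` function are hypothetical names, and the input is assumed to be an already binarized image given as rows of `Bool` of equal length (`true` = foreground pixel). It flood-fills each 4-connected group of foreground pixels and records its bounding box.

```swift
// Hypothetical helper type, not part of SwiftOCR.
struct BoundingBox: Equatable {
    var minX: Int, minY: Int, maxX: Int, maxY: Int
}

// Returns one bounding box per 4-connected component of `true` pixels.
// Assumes every row of `image` has the same length.
func boundingBoxes(of image: [[Bool]]) -> [BoundingBox] {
    let height = image.count
    let width = image.first?.count ?? 0
    var visited = Array(repeating: Array(repeating: false, count: width), count: height)
    var boxes: [BoundingBox] = []

    for y in 0..<height {
        for x in 0..<width where image[y][x] && !visited[y][x] {
            // Flood-fill one component using an explicit stack.
            var box = BoundingBox(minX: x, minY: y, maxX: x, maxY: y)
            var stack = [(x, y)]
            visited[y][x] = true
            while let point = stack.popLast() {
                let (cx, cy) = point
                box.minX = min(box.minX, cx)
                box.maxX = max(box.maxX, cx)
                box.minY = min(box.minY, cy)
                box.maxY = max(box.maxY, cy)
                for (nx, ny) in [(cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)] {
                    guard nx >= 0, nx < width, ny >= 0, ny < height else { continue }
                    if image[ny][nx] && !visited[ny][nx] {
                        visited[ny][nx] = true
                        stack.append((nx, ny))
                    }
                }
            }
            boxes.append(box)
        }
    }
    return boxes
}
```

Each resulting box would then correspond to one candidate character (before any merging or filtering).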

I have looked at your training sample and I realized you simply set the parameters to fixed values (0.7 learning rate, 0.4 momentum ...) and you have a fixed 1-hidden-layer size.

Since this was my first time working with Neural Networks, I searched the Internet for what learning rate and momentum I should use ^^
I would have loved to use a NN library that allows more than one hidden layer but I couldn't find one.

Parameters to tweak in SwiftOCR.swift:

  • recognizableCharacters (Which characters were used / are used for training the NN (see #34))
  • globalNetwork: hidden, learningRate, momentum, activationFunction and errorFunction (This only has an effect on training. I only achieved good training results when using .CrossEntropy(average: false) as the errorFunction. If you use the Training App, you have to change them on lines 32 and 97)
  • xMergeRadius and yMergeRadius (see #1)
  • confidenceThreshold (The confidence for recognizing a character has to be greater than or equal to this threshold. If it is too high, it may filter out too much 'noise'; if it is too low, it may not filter enough.)
  • //Filter blobs (line 347 - 360) (This filters the connected components. E.g. If the blob is thinner than 1% of the input image width, then notToThin will be false and the blob will get filtered out.)
  • //Filter rects: - Not to small (line 417 - 429) (the same as //Filter blobs but only checks if the width and height of the blob (after merging) are OK)
  • cropSize (line 469) (How big the final image (of the blob) should be for recognition. If you change the cropSize, you have to change the number of inputs of the NN to cropSize.width * cropSize.height + 1)
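
As a rough sketch of three of the rules above (the 1% width filter, the confidence threshold, and the input count that follows from cropSize): every type and function name below is a hypothetical stand-in rather than code copied from SwiftOCR.swift, and the 16 x 16 crop size in the usage note is an arbitrary example value.

```swift
// Hypothetical stand-ins for illustration only.
struct CropSize {
    let width: Int
    let height: Int
}

struct Recognition {
    let character: Character
    let confidence: Double
}

// A blob passes only if it is at least `minWidthFraction` of the image width
// (1% here, matching the example given for //Filter blobs).
func isNotTooThin(blobWidth: Double, imageWidth: Double, minWidthFraction: Double = 0.01) -> Bool {
    return blobWidth >= imageWidth * minWidthFraction
}

// Keeps only characters whose confidence is >= the threshold.
func filterByConfidence(_ results: [Recognition], threshold: Double) -> [Character] {
    return results.filter { $0.confidence >= threshold }.map { $0.character }
}

// Number of NN inputs required for a given crop size: width * height + 1.
func inputCount(for cropSize: CropSize) -> Int {
    return cropSize.width * cropSize.height + 1
}
```

For example, with an assumed crop size of 16 x 16, `inputCount(for:)` gives 16 * 16 + 1 = 257 inputs.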

Parameters to tweak in SwiftOCRTraining.swift (Only for training):

  • trainingImageNames (These images will be used for adding noise in the background when training)
  • trainingFontNames (The font names used for training. Only important when you aren't using the Training app)
  • numberOfTrainImages and numberOfTestImages (how large the training and testing set should be)
  • errorThreshold (When it should stop training. Only kind of important when you aren't using the Training app)
  • //Distortions (line 236 - 246) (CGAffineTransform: How much the image should get distorted for training)

SwiftOCRDelegate:

  • func preprocessImageForOCR(inputImage: OCRImage) -> OCRImage? (Custom image preprocessing)

I think that, to begin with, this is more than enough parameters to fiddle with.

pabloromeu commented on May 18, 2024

Wow! Thanks! I think you might put that on the readme or the wiki of the library. Some people might find it really interesting.

What is usually done when trying to get the best NN is to train tons of them in parallel with different settings to check which one works best. Then you use the best settings to train your network. That is why I asked you for this. 👍
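
A minimal sketch of that kind of search, assuming a plain grid over hidden layer size, learning rate and momentum. The `Settings` type and both functions are hypothetical names, and `evaluate` is a placeholder for "train a network with these settings and return its score on the test set". For simplicity the candidates are evaluated one after another here, not in parallel.

```swift
// Hypothetical container for one combination of training settings.
struct Settings: Equatable {
    let hiddenNodes: Int
    let learningRate: Double
    let momentum: Double
}

// Builds every combination of the candidate values (a plain grid search).
func settingsGrid(hiddenNodes: [Int], learningRates: [Double], momenta: [Double]) -> [Settings] {
    var grid: [Settings] = []
    for h in hiddenNodes {
        for lr in learningRates {
            for m in momenta {
                grid.append(Settings(hiddenNodes: h, learningRate: lr, momentum: m))
            }
        }
    }
    return grid
}

// Evaluates each candidate exactly once and returns the highest-scoring one,
// or nil if the grid is empty.
func bestSettings(in grid: [Settings], evaluate: (Settings) -> Double) -> Settings? {
    var best: Settings? = nil
    var bestScore = -Double.infinity
    for candidate in grid {
        let score = evaluate(candidate)
        if best == nil || score > bestScore {
            best = candidate
            bestScore = score
        }
    }
    return best
}
```

Since each candidate is independent of the others, the loop in `bestSettings` is the part that could be spread across several processes or machines.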

RollingGoron commented on May 18, 2024

@garnele007 Does the accuracy increase when you provide a higher or a lower number for errorThreshold?

NMAC427 commented on May 18, 2024

@RollingGoron The lower the number, the more accurate (and time-consuming) the training should get.
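
A toy sketch of why that is, assuming training stops as soon as the error drops below the threshold. This is not the SwiftOCR training loop: it runs gradient descent with momentum on a single made-up weight (target value 3.0), and `epochsUntil` is a hypothetical function name.

```swift
// Toy model: minimize error = (weight - 3)^2 with gradient descent plus momentum.
// Returns how many update steps ran before the error fell below `errorThreshold`.
func epochsUntil(errorThreshold: Double,
                 learningRate: Double = 0.1,
                 momentum: Double = 0.4,
                 maxEpochs: Int = 10_000) -> Int {
    var weight = 0.0
    var velocity = 0.0
    var epochs = 0
    while epochs < maxEpochs {
        let error = (weight - 3.0) * (weight - 3.0)
        if error < errorThreshold { break }   // stop condition
        let gradient = 2.0 * (weight - 3.0)
        velocity = momentum * velocity - learningRate * gradient
        weight += velocity
        epochs += 1
    }
    return epochs
}

let loose = epochsUntil(errorThreshold: 1e-2)
let strict = epochsUntil(errorThreshold: 1e-6)
print("threshold 1e-2: \(loose) epochs, threshold 1e-6: \(strict) epochs")
```

In this toy setup the stricter threshold needs more update steps before the loop stops, which mirrors the trade-off between accuracy and training time.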
