Comments (4)
In general, Calamari is not designed to predict single characters; it is designed to predict a complete sequence of characters (e.g. a line of text) as a whole.
To predict a single character, a simple classification network might be better suited (see e.g. the MNIST examples).
If you still want to use Calamari:
There are many parameters that could possibly affect the accuracy. You could try increasing the --batch_size, e.g. to 128. A different network structure could also be useful (--network).
You can also try limiting the alphabet to test whether Calamari is able to learn a smaller charset (e.g. 1000 characters, just for testing).
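One way to try the reduced-alphabet test is to filter the ground truth before training. The sketch below is only an illustration in plain Python and is not part of Calamari: the helper names `most_frequent_charset` and `filter_lines` are hypothetical, and it assumes the ground truth is available as a list of text lines.

```python
from collections import Counter


def most_frequent_charset(lines, max_chars):
    """Return the set of the `max_chars` most frequent characters in `lines`."""
    counts = Counter(ch for line in lines for ch in line)
    return {ch for ch, _ in counts.most_common(max_chars)}


def filter_lines(lines, charset):
    """Keep only the lines whose characters all belong to `charset`."""
    return [line for line in lines if all(ch in charset for ch in line)]


# Hypothetical ground-truth lines; a real test would load them from the dataset.
ground_truth = ["연락 미용", "홍콩 월간", "연락 홍콩"]
charset = most_frequent_charset(ground_truth, max_chars=5)
reduced = filter_lines(ground_truth, charset)
print(len(charset), reduced)
```

Only the lines that survive the filter (and their matching images) would then be used for the test training run.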
from calamari.
@ChWick Thank you for your advice. I changed my data to sequences of characters.
@ChWick I trained using the data provided by Tesseract
(https://github.com/tesseract-ocr/langdata/blob/master/kor/kor.training_text)
Training works quite well.
#00070662: loss=1.50009847 ler=0.01098767 dt=0.05774912s
PRED: '10 연락 미용 톈진 강릉 끙 홍콩 월간 라 큰술 란 잇는 의회 쪄'
TRUE: '10 연락 미용 톈진 강릉 끙 홍콩 월간 라 큰술 란 잇는 의회 쪄'
#00070663: loss=1.54514930 ler=0.01124408 dt=0.05764973s
PRED: '넷째 발표 되며 ( 바향 모퉁이 세괌 16 뒤에 등 자료실 알뜰 늠름한'
TRUE: '넷째 발표 되며 ( 방향 모퉁이 세괌 16 뒤에 등 자료실 알뜰 늠름한'
#00070664: loss=1.40779295 ler=0.01045460 dt=0.05747745s
PRED: '카를로스 신지식 과 보다는 곳 수 바깥 역할 벼룩 질문 . 꿰어 중'
TRUE: '카를로스 신지식 과 보다는 곳 수 바깥 역할 벼룩 질문 . 꿰어 중'
#00070665: loss=1.44664021 ler=0.01071776 dt=0.05732183s
PRED: '쟌느 분 코뮌 디앤샵 건의 반침 19 헌법 법령 프톨레마이오스 > 골'
TRUE: '쟌느 분 코뮌 디앤샵 건의 방침 19 헌법 법령 프톨레마이오스 > 골'
#00070666: loss=1.44412356 ler=0.01071776 dt=0.05723174s
PRED: '17 숙박 조각 다룬다 커스텀 최저가 것이 사건 맥 답하기 뻘 탭'
TRUE: '17 숙박 조각 다룬다 커스텀 최저가 것이 사건 맥 답하기 뻘 탭'
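For readers who want to check individual PRED/TRUE pairs like the ones above, here is a minimal sketch of a per-line label error rate, assuming it is defined as the character-level edit distance divided by the length of the ground truth. This is not Calamari's own code, the function names are hypothetical, and the `ler` value in the log looks like an average over many lines rather than a per-line figure.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def label_error_rate(pred, true):
    """Edit distance normalized by the ground-truth length (assumed definition)."""
    if not true:
        return 0.0 if not pred else 1.0
    return edit_distance(pred, true) / len(true)


# One wrong syllable, as in the '바향' / '방향' pair from the log.
print(edit_distance("바향", "방향"), label_error_rate("바향", "방향"))
```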
My sample prediction (on a sentence not in my training dataset) looks good:
TRUE: 원대복귀 조치에 따라 둘은 육군으로 돌아가게 됐다.
PRE: 원대복귀 조치에 따라 둘은 육군으로 돌아가게 됐다.
Thanks again 👍
P.S. Your README.md says that modules to segment pages into lines will be available soon.
You recommend using the OCRopy scripts, but they are not that good.
When will this module be available?
@a41888936 I'm very glad you got this working! Unfortunately, the line segmentation part of our complete OCR workflow also relies on the OCRopy scripts, so this module won't help you either.