Comments (9)
Interesting! Is the raw array smaller for a png of the same size?
from tesseract.js.
Posting my findings in case they help someone debug this issue. It'd be great to have a couple of sets of more complex images to try this out with.
Medium-size image (960 px x 720 px)
Execution time from start to finish
Tesseract JPG (22 KB)
- 7836.192ms
- 7683.145ms
- 7327.576ms
- 7546.628ms
Tesseract PNG (11 KB)
- 9132.705ms
- 9054.147ms
- 8672.093ms
- 9110.318ms
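To make the comparison easier to read, the four runs per format can be averaged. The sketch below only does arithmetic on the timings listed above; the variable and function names are made up for illustration. It works out to roughly 7.6 s for the JPG versus 9.0 s for the PNG, so the PNG is about 18% slower on average in this small sample.

```javascript
// Timings copied from the runs listed above, in milliseconds.
const jpgMs = [7836.192, 7683.145, 7327.576, 7546.628];
const pngMs = [9132.705, 9054.147, 8672.093, 9110.318];

// Arithmetic mean of a non-empty array of numbers.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

const jpgMean = mean(jpgMs);
const pngMean = mean(pngMs);

// Relative slowdown of PNG compared to JPG, as a percentage.
const slowdownPct = ((pngMean - jpgMean) / jpgMean) * 100;

console.log(`JPG mean: ${jpgMean.toFixed(1)} ms`);
console.log(`PNG mean: ${pngMean.toFixed(1)} ms`);
console.log(`PNG slower by: ${slowdownPct.toFixed(1)} %`);
```

Four runs per format is a small sample, so this is only a rough indication.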
This is the set of images I used (dummies I created with Google Draw):
I'm running a second set of tests with bigger images (>100 KB), and it looks like it will yield pretty much the same results: more than 2 minutes of processing time per file on the first run, with the 177 KB JPG taking ~13 seconds less than the 100 KB PNG. Weird.
from tesseract.js.
This is the "large" set of images:
from tesseract.js.
@rowasc I can provide you with some testing images:
These have already been pre-processed. I think the images you tested with are too clean and don't have any noise, so they don't apply to our use case for tesseract and tesseract.js. I will look into this issue more today and give you feedback when I know more.
from tesseract.js.
Thanks! Yes, that makes total sense.
I'll get back to this after work today.
from tesseract.js.
@rowasc Have you looked into the performance yet? I'd be interested in contributing further if I can help.
from tesseract.js.
BMP image, width 608 px, height 300 px.
The text is three uppercase letters in a sans-serif font.
I get: time spent = 5330 ms, text = OAX, confidence = 92.
That is way too slow compared to a human.
I am wondering if it has something to do with preloading of the language file?
How can I tune it for a stream of many small images?
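One way to check the language-file hypothesis is to time worker setup and recognition as separate steps. The following is a minimal sketch, not a tesseract.js feature: `timed` is a hypothetical helper, and the two fake steps only stand in for real setup and recognition calls so the snippet runs on its own.

```javascript
const { performance } = require("node:perf_hooks");

// Runs an async step and returns its result together with the elapsed time.
async function timed(label, step) {
  const start = performance.now();
  const result = await step();
  const ms = performance.now() - start;
  console.log(`${label}: ${ms.toFixed(0)} ms`);
  return { result, ms };
}

// Stand-ins for the real work (assumptions for illustration only).
const fakeSetup = () =>
  new Promise((resolve) => setTimeout(() => resolve("worker"), 30));
const fakeRecognize = () =>
  new Promise((resolve) => setTimeout(() => resolve("OAX"), 10));

(async () => {
  await timed("setup", fakeSetup);
  await timed("recognize", fakeRecognize);
})();

// With tesseract.js, the steps might look like this, assuming a recent
// version where createWorker accepts a language code:
//   const { result: worker } = await timed("setup", () => createWorker("eng"));
//   await timed("recognize", () => worker.recognize(image));
```

If setup accounts for most of the 5330 ms, reusing one worker across images should help; if recognition dominates, it will not.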
from tesseract.js.
Please try the latest version and see if the performance issue still exists. Feel free to open a new issue; I'm closing this one for now.
from tesseract.js.
@jeromewu I used the new version, but it still takes almost a minute for images with more than 50 words. I used the image attached below. It is a .png image, smaller than 100 KB.
So I have a POST endpoint called "/image2text".
- I have tried creating a worker and loading the lang data ahead of time (outside of the POST API call), and
- I have also tried keeping it all together inside the POST call (creating the worker and loading the lang data once the user makes a POST call, then terminating the worker after the call has been fulfilled).
However, the results are the same. I didn't notice any improvement in the performance.
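For reference, the first approach (one worker created outside the request handler and reused by every call) can be sketched like this. `lazySingleton` and `makeImageToText` are hypothetical helpers, and the fake worker only mimics the `{ data: { text } }` shape that `worker.recognize` resolves to; it is not the code used in the endpoint described above.

```javascript
// Returns a getter that creates the resource on first use and reuses it afterwards.
function lazySingleton(factory) {
  let pending = null;
  return function get() {
    if (pending === null) {
      pending = Promise.resolve()
        .then(factory)
        .catch((err) => {
          pending = null; // allow a retry after a failed setup
          throw err;
        });
    }
    return pending;
  };
}

// Builds the body of a hypothetical POST /image2text handler.
function makeImageToText(getWorker) {
  return async function imageToText(imageBuffer) {
    const worker = await getWorker(); // setup cost is paid once
    const { data } = await worker.recognize(imageBuffer);
    return data.text;
  };
}

// Fake worker so the sketch runs without tesseract.js installed.
let workersCreated = 0;
const getFakeWorker = lazySingleton(async () => {
  workersCreated += 1;
  return { recognize: async () => ({ data: { text: "hello" } }) };
});

const imageToText = makeImageToText(getFakeWorker);

// With tesseract.js, the factory might be `() => createWorker("eng")`,
// assuming a recent version where createWorker accepts a language code.
```

If timings do not change with this pattern, the cost is probably in recognition itself rather than in worker setup.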
I used this random image from Google Images:
from tesseract.js.