
adaptech-cz / tesseract4android


Fork of tess-two rewritten from scratch to support latest version of Tesseract OCR.

License: Apache License 2.0

Java 1.06% CMake 0.58% C++ 27.59% C 57.54% Makefile 2.30% Shell 4.49% Roff 5.49% M4 0.15% Lua 0.01% PostScript 0.01% HTML 0.32% SAS 0.05% Smalltalk 0.02% WebAssembly 0.05% Assembly 0.06% Module Management System 0.06% DIGITAL Command Language 0.04% Dockerfile 0.01% Batchfile 0.01% Awk 0.15%
android ocr tesseract tesseract-ocr leptonica libpng libjpeg optical-character-recognition tesseract-android

tesseract4android's Introduction

Tesseract4Android

Fork of tess-two rewritten from scratch to build with CMake and support latest Android Studio and Tesseract OCR.

The Java/JNI wrapper files and tests for Leptonica / Tesseract are based on the tess-two project, which is based on Tesseract Tools for Android.

Dependencies

This project uses additional libraries (with their own specific licenses):

Prerequisites

  • Android 4.1 (API 16) or higher
  • Trained data file(s) in the v4.0.0 format for the language(s) you want to use.
    • These files must be placed in a (sub)directory named tessdata, and the path must be readable by the app. When targeting API 29 or higher, the only suitable locations are the app's private directories (such as context.getFilesDir() or context.getExternalFilesDir()).
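Because a false return from init() does not say which file is missing, it can help to check the expected layout first. The following is a minimal sketch, not part of the library: TessDataCheck is a hypothetical helper that assumes the <dataPath>/tessdata/<lang>.traineddata layout described above and the "eng+deu+fra" way of combining languages used in the usage example.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Tesseract4Android): reports which languages in a
// Tesseract language string such as "eng+deu+fra" lack a readable
// <dataPath>/tessdata/<lang>.traineddata file.
final class TessDataCheck {
    private TessDataCheck() {}

    static List<String> missingLanguages(String dataPath, String languages) {
        File tessdata = new File(dataPath, "tessdata");
        List<String> missing = new ArrayList<>();
        for (String lang : languages.split("\\+")) {
            if (lang.isEmpty()) {
                continue; // tolerate stray separators such as "+eng"
            }
            File trainedData = new File(tessdata, lang + ".traineddata");
            if (!trainedData.isFile() || !trainedData.canRead()) {
                missing.add(lang);
            }
        }
        return missing;
    }
}
```

An empty result means every requested language file is in place; otherwise the returned names show which files still need to be copied or downloaded before calling init().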

Variants

This library is available in two variants.

  • Standard - Single-threaded. Best for single-core processors or when using multiple Tesseract instances in parallel.
  • OpenMP - Multi-threaded. Provides better performance on multi-core processors when using only a single instance of Tesseract.

Usage

You can get a compiled version of Tesseract4Android from JitPack.io.

  1. Add the JitPack repository to your project root build.gradle file at the end of repositories:
allprojects {
    repositories {
        ...
        maven { url 'https://jitpack.io' }
    }
}
  2. Add the dependency to your app module build.gradle file:
dependencies {
    // To use Standard variant:
    implementation 'cz.adaptech.tesseract4android:tesseract4android:4.7.0'

    // To use OpenMP variant:
    implementation 'cz.adaptech.tesseract4android:tesseract4android-openmp:4.7.0'
}
  3. Use the TessBaseAPI class in your code:

This is the simplest possible example: TessBaseAPI is created, used to recognize one image, and then destroyed. It is better to create and initialize the instance only once and then use it to recognize multiple images. See the sample project for that kind of usage, which additionally includes progress notifications and a way to stop ongoing processing.

// Create TessBaseAPI instance (this internally creates the native Tesseract instance)
TessBaseAPI tess = new TessBaseAPI();

// The given path must contain a `tessdata` subdirectory with the `*.traineddata` language files
// The path must be directly readable by the app
String dataPath = new File(context.getFilesDir(), "tesseract").getAbsolutePath();

// Initialize API for specified language
// (can be called multiple times during Tesseract lifetime)
if (!tess.init(dataPath, "eng")) { // could be multiple languages, like "eng+deu+fra"
    // Error initializing Tesseract (wrong/inaccessible data path or not existing language file(s))
    // Release the native Tesseract instance
    tess.recycle();
    return;
}

// Load the image (file path, Bitmap, Pix...)
// (can be called multiple times during Tesseract lifetime)
tess.setImage(image);

// Start the recognition (if not done for this image yet) and retrieve the result
// (can be called multiple times during Tesseract lifetime)
String text = tess.getUTF8Text();

// Release the native Tesseract instance when you don't want to use it anymore
// After this call, no method can be called on this TessBaseAPI instance
tess.recycle();
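The reuse approach recommended above (create and initialize once, recognize many images, release once) can be sketched as follows. This is an illustration under stated assumptions, not the library's API: OcrEngine is a hypothetical interface mirroring the init(), setImage(), getUTF8Text() and recycle() calls from the example, with images passed as file paths so the sketch compiles without Android. The synchronized methods assume one instance should handle one image at a time, in line with using separate instances for parallel work.

```java
import java.util.Objects;

// Hypothetical stand-in for TessBaseAPI, reduced to the calls used in the example.
interface OcrEngine {
    boolean init(String dataPath, String language);
    void setImage(String imagePath);
    String getUTF8Text();
    void recycle();
}

// Sketch: initialize once, reuse for many images, release exactly once.
final class ReusableOcr implements AutoCloseable {
    private final OcrEngine engine;
    private boolean recycled = false;

    ReusableOcr(OcrEngine engine, String dataPath, String language) {
        this.engine = Objects.requireNonNull(engine);
        if (!engine.init(dataPath, language)) {
            engine.recycle(); // release the native instance even when init fails
            throw new IllegalStateException("Tesseract init failed for " + language);
        }
    }

    // synchronized: calls from several threads are serialized, one image at a time.
    synchronized String recognize(String imagePath) {
        if (recycled) {
            throw new IllegalStateException("engine already recycled");
        }
        engine.setImage(imagePath);
        return engine.getUTF8Text();
    }

    // Idempotent: recycle() is forwarded to the engine only on the first call.
    @Override
    public synchronized void close() {
        if (!recycled) {
            recycled = true;
            engine.recycle();
        }
    }
}
```

In an app, try-with-resources, e.g. try (ReusableOcr ocr = new ReusableOcr(engine, dataPath, "eng")) { ... }, guarantees that recycle() runs once even if recognition throws.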

Sample app

There is an example application in the sample directory. It shows basic usage of TessBaseAPI inside a ViewModel, including progress indication, stopping the processing, and more.

It uses a sample image and English traineddata, which are extracted from the assets in the APK to the app's private directory on the device. This is simple, but it keeps two copies of the data file (one inside the APK itself, the other on storage), which wastes some space. If you plan to use multiple traineddata files, it is better to download them directly from the internet rather than distribute them within the APK.
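Extracting a bundled file only once can be sketched like this. The helper is hypothetical and may differ from what the sample app actually does; on Android the source stream would presumably come from something like context.getAssets().open("tessdata/eng.traineddata") (the asset path is an assumption) and dataDir from new File(context.getFilesDir(), "tesseract"), matching the data path in the usage example. Plain java.io is used because java.nio.file is only available from API 26, while the library supports API 16.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: copies a traineddata stream to <dataDir>/tessdata/<lang>.traineddata
// unless that file already exists. Returns true if a copy was made.
final class TrainedDataInstaller {
    private TrainedDataInstaller() {}

    static boolean installIfMissing(InputStream source, File dataDir, String lang) throws IOException {
        File tessdata = new File(dataDir, "tessdata");
        File target = new File(tessdata, lang + ".traineddata");
        if (target.isFile()) {
            return false; // already extracted on an earlier launch
        }
        if (!tessdata.isDirectory() && !tessdata.mkdirs()) {
            throw new IOException("Cannot create " + tessdata);
        }
        // Copy into a temporary file first, so an interrupted copy never leaves a
        // truncated file under the final name (a stale .part file is simply overwritten).
        File tmp = new File(tessdata, lang + ".traineddata.part");
        try (OutputStream out = new FileOutputStream(tmp)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = source.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        if (!tmp.renameTo(target)) {
            tmp.delete();
            throw new IOException("Cannot rename " + tmp + " to " + target);
        }
        return true;
    }
}
```

The caller remains responsible for closing the source stream. A downloaded file could be installed the same way by passing the download's input stream instead of the asset stream.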

Building

You can use Android Studio to open the project and build the AAR. Or you can use gradlew from command line.

To build the release version of the library, use the task tesseract4android:assembleRelease. After a successful build, the resulting AAR files are in the <project dir>/tesseract4Android/build/outputs/aar/ directory.

Alternatively, you can publish the AAR directly to your local Maven repository using the task tesseract4android:publishToMavenLocal. After a successful build, you can consume the library like any other Maven dependency. Just make sure to add the mavenLocal() repository to the repositories {} block in your project's build.gradle file.

Android Studio

  • Open this project in Android Studio.
  • Open Gradle panel, expand Tesseract4Android / :tesseract4Android / Tasks / other and run assembleRelease (to get AAR).
  • Or in the same panel expand Tesseract4Android / :tesseract4Android / Tasks / publishing and run publishToMavenLocal (to publish AAR).

GradleW

  • In project directory create local.properties file containing:
sdk.dir=c\:\\your\\path\\to\\android\\sdk
ndk.dir=c\:\\your\\path\\to\\android\\ndk

Note that on Windows, special characters in paths must be escaped with \ (each \ is written as \\ and : as \:), as in the example above.

  • Call gradlew tesseract4android:assembleRelease from command line (to get AAR).
  • Or call gradlew tesseract4android:publishToMavenLocal from command line (to publish AAR).
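The escaping for local.properties matters because the file follows Java properties-file rules: a lone backslash is silently dropped, or, as with \t in \to and \n in \ndk, read as an escape sequence, so an unescaped Windows path would be changed without any error. A minimal sketch of generating such a line; LocalProperties is a hypothetical helper, and escaping ':' is not strictly required in values but matches the example above.

```java
// Hypothetical helper: escapes a filesystem path for use as a local.properties value.
final class LocalProperties {
    private LocalProperties() {}

    // Doubles every backslash and prefixes ':' with a backslash.
    static String escapeValue(String path) {
        StringBuilder sb = new StringBuilder(path.length() + 8);
        for (char c : path.toCharArray()) {
            if (c == '\\' || c == ':') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Builds a complete "key=value" line, e.g. for sdk.dir or ndk.dir.
    static String line(String key, String path) {
        return key + "=" + escapeValue(path);
    }
}
```

Reading the generated line back with java.util.Properties.load() yields the original Windows path unchanged.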

License

Copyright 2019 Adaptech s.r.o., Robert Pösel

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

tesseract4android's People

Contributors

fab1ano, robyer


tesseract4android's Issues

Received status code 401 from server: Unauthorized

I get an error in version 4.1.1. Maybe there are private submodules?

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':app:checkDebugAarMetadata'.
> Could not resolve all files for configuration ':app:debugRuntimeClasspath'.
   > Could not resolve cz.adaptech.android:tesseract4android:4.1.1.
     Required by:
         project :app > project :flutter_mrz_scanner
      > Could not resolve cz.adaptech.android:tesseract4android:4.1.1.
         > Could not get resource 'https://jitpack.io/cz/adaptech/android/tesseract4android/4.1.1/tesseract4android-4.1.1.pom'.
            > Could not GET 'https://jitpack.io/cz/adaptech/android/tesseract4android/4.1.1/tesseract4android-4.1.1.pom'. Received status code 401 from server: Unauthorized

Bug in the documentation

	/**
	 * Calls End() and finalizes native data. Must be called on object destruction.
	 */
	private native void nativeRecycle(long mNativeData);

Replace End() with recycle() in this TessBaseAPI comment.

Expose Deskew from Skew

Currently only findSkew is exposed in Skew.java. Deskew is equally important, but this library does not expose it. Please make the following functions available as well:

PIX * pixDeskew (PIX *pixs, l_int32 redsearch)
PIX * pixFindSkewAndDeskew (PIX *pixs, l_int32 redsearch, l_float32 *pangle, l_float32 *pconf)

Couldn't find "libleptonica.so"

Good afternoon.
I'm trying to link this library into a .jar project that runs directly, without installation. I get the following error on initialization:

java.lang.UnsatisfiedLinkError: dalvik.system.PathClassLoader[DexPathList[[zip file "/data/local/tmp/appname.jar"],nativeLibraryDirectories=[/system/lib, /product/lib, /system/lib, /product/lib]]] couldn't find "libleptonica.so"
at java.lang.Runtime.loadLibrary0(Runtime.java:1012)
at java.lang.System.loadLibrary(System.java:1672)
at com.googlecode.tesseract.android.TessBaseAPI.<clinit>(TessBaseAPI.java:57)

Any ideas how to fix this?

Jitpack or maven repo

Thanks for the great work!
I used this library in Trivia Hack (https://github.com/SubhamTyagi/loco-answers) as a submodule and it works fine. But it would be more helpful for others if you could make this library available as a Maven (or other type of) dependency for Android, like tess-two.

Thanks again for this amazing work.

UnsatisfiedLinkError when trying to create new instance of TessBaseAPI

This is a strange error: it doesn't happen every time (it is actually rare), so it's difficult to debug.
This is the line where the error occurs:

TessBaseAPI tessBaseApi = new TessBaseAPI();

The error log:

E/linker: package com.app.myapp: library "/system/lib64/libjpeg.so" ("/system/lib64/libjpeg.so") needed or dlopened by "/system/lib64/libnativeloader.so" is not accessible for the namespace: [name="classloader-namespace", ld_library_paths="", default_library_paths="/data/app/com.app.myapp-4EcKvX8ZmvEUrqVJAF20Dg==/lib/arm64:/data/app/com.app.myapp-4EcKvX8ZmvEUrqVJAF20Dg==/base.apk!/lib/arm64-v8a", permitted_paths="/data:/mnt/expand:/mnt/asec:/data/data/com.app.myapp"]

D/AndroidRuntime: Shutting down VM
E/AndroidRuntime: FATAL EXCEPTION: main
Process: com.app.myapp, PID: 6393
java.lang.UnsatisfiedLinkError: dlopen failed: library "/system/lib64/libjpeg.so" needed or dlopened by "/system/lib64/libnativeloader.so" is not accessible for the namespace "classloader-namespace"
at java.lang.Runtime.loadLibrary0(Runtime.java:1016)
at java.lang.System.loadLibrary(System.java:1657)
at com.googlecode.tesseract.android.TessBaseAPI.<clinit>(TessBaseAPI.java:52)
at com.app.myapp.utils.UtilsOCR.getTessBaseAPI(UtilsOCR.java:257)
at com.app.myapp.ocr.OCRTextEvaluator.init(OCRTextEvaluator.java:381)
at com.app.myapp.ocr.OCRTextEvaluator.<init>(OCRTextEvaluator.java:48)
at com.app.myapp.helper.NotebookWriter.init(NotebookWriter.java:530)
at com.app.myapp.helper.NotebookWriter.<init>(NotebookWriter.java:89)
at com.app.myapp.exercises.writing.fillblankspen.FillBlanksPenFragment.init(FillBlanksPenFragment.java:382)
at com.app.myapp.exercises.writing.fillblankspen.FillBlanksPenFragment.onCreateView(FillBlanksPenFragment.java:77)
at androidx.fragment.app.Fragment.performCreateView(Fragment.java:2600)
at androidx.fragment.app.FragmentManagerImpl.moveToState(FragmentManagerImpl.java:881)
at androidx.fragment.app.FragmentManagerImpl.moveFragmentToExpectedState(FragmentManagerImpl.java:1238)
at androidx.fragment.app.FragmentManagerImpl.moveToState(FragmentManagerImpl.java:1303)
at androidx.fragment.app.BackStackRecord.executeOps(BackStackRecord.java:439)
at androidx.fragment.app.FragmentManagerImpl.executeOps(FragmentManagerImpl.java:2079)
at androidx.fragment.app.FragmentManagerImpl.executeOpsTogether(FragmentManagerImpl.java:1869)
at androidx.fragment.app.FragmentManagerImpl.removeRedundantOperationsAndExecute(FragmentManagerImpl.java:1824)
at androidx.fragment.app.FragmentManagerImpl.execPendingActions(FragmentManagerImpl.java:1727)
at androidx.fragment.app.FragmentManagerImpl.dispatchStateChange(FragmentManagerImpl.java:2663)
at androidx.fragment.app.FragmentManagerImpl.dispatchActivityCreated(FragmentManagerImpl.java:2613)
at androidx.fragment.app.Fragment.performActivityCreated(Fragment.java:2624)
at androidx.fragment.app.FragmentManagerImpl.moveToState(FragmentManagerImpl.java:904)
at androidx.fragment.app.FragmentManagerImpl.moveFragmentToExpectedState(FragmentManagerImpl.java:1238)
at androidx.fragment.app.FragmentManagerImpl.moveToState(FragmentManagerImpl.java:1303)
at androidx.fragment.app.BackStackRecord.executeOps(BackStackRecord.java:439)
at androidx.fragment.app.FragmentManagerImpl.executeOps(FragmentManagerImpl.java:2079)
at androidx.fragment.app.FragmentManagerImpl.executeOpsTogether(FragmentManagerImpl.java:1869)
at androidx.fragment.app.FragmentManagerImpl.removeRedundantOperationsAndExecute(FragmentManagerImpl.java:1824)
at androidx.fragment.app.FragmentManagerImpl.execPendingActions(FragmentManagerImpl.java:1727)
at androidx.fragment.app.FragmentManagerImpl$2.run(FragmentManagerImpl.java:150)
at android.os.Handler.handleCallback(Handler.java:789)
at android.os.Handler.dispatchMessage(Handler.java:98)
at android.os.Looper.loop(Looper.java:251)
at android.app.ActivityThread.main(ActivityThread.java:6572)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.Zygote$MethodAndArgsCaller.run(Zygote.java:240)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:767)

Crash on onProgressValues when shrinking code

I'm experiencing this crash:

java.lang.NoClassDefFoundError: com.googlecode.tesseract.android.TessBaseAPI
	at com.xxx.yyy.MainActivity.a(Unknown Source:291)
	at com.xxx.yyy.a.onMethodCall(Unknown Source:2)
	at io.flutter.plugin.common.MethodChannel$IncomingMethodCallHandler.onMessage(Unknown Source:17)
	at io.flutter.embedding.engine.dart.DartMessenger.handleMessageFromDart(Unknown Source:57)
	at io.flutter.embedding.engine.FlutterJNI.handlePlatformMessage(Unknown Source:4)
	at android.os.MessageQueue.nativePollOnce(Native Method)
	at android.os.MessageQueue.next(MessageQueue.java:326)
	at android.os.Looper.loop(Looper.java:160)
	at android.app.ActivityThread.main(ActivityThread.java:6863)
	at java.lang.reflect.Method.invoke(Native Method)
	at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:537)
	at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:858)
Caused by: java.lang.NoSuchMethodError: no non-static method "Lcom/googlecode/tesseract/android/TessBaseAPI;.onProgressValues(IIIIIIIII)V"
	at com.googlecode.tesseract.android.TessBaseAPI.nativeClassInit(Native Method)
	at com.googlecode.tesseract.android.TessBaseAPI.<clinit>(Unknown Source:20)

when building the APK with Flutter. See the related Flutter issue. Flutter shrinks the code, and because onProgressValues looks unused (it is only called from native code), it gets removed.
We should either update the ProGuard rules to keep it or use the @Keep annotation.

To be clear, this issue is not limited to Flutter: a regular Android project using Tesseract4Android runs into the same problem when code shrinking is enabled. It is just more visible for Flutter users, because Flutter shrinks code by default starting from its latest version.

Reference https://developer.android.com/studio/build/shrink-code

SIGSEGV when calling getUTF8Text with custom traineddata

First of all, I would like so say thank you for your work and efforts on this project.

As the title says, we get a SIGSEGV (segmentation fault) when calling getUTF8Text(). This happens when we use our own traineddata, trained with the OCR-D repository.

When we use this custom traineddata from the command line (Tesseract version 4.0.0-beta), it works fine.
Does anyone have an idea why this is the case?

Cannot interrupt OCR process started by getHOCRText

I use TessBaseAPI inside an AsyncTask and start getHOCRText() through an AsyncOcr instance (asyncOcr) as follows:
asyncOcr.execute(image);

In the doInBackground method:

    protected String doInBackground(Bitmap... bitmaps) {
        tessAPI.setImage(bitmaps[0]);
        return tessAPI.getHOCRText(0);
    }

When the AsyncOcr task is cancelled:

    @Override
    protected void onCancelled() {
        super.onCancelled();
        tessAPI.stop();
    }

However, the OCR is not interrupted and runs to completion.

Below is the full code of AsyncOcr:

static class AsyncOcr extends AsyncTask<Bitmap, Void, String> {
    private TessBaseAPI tessAPI;

    AsyncOcr(Reader context) {
        tessAPI = new TessBaseAPI(new TessBaseAPI.ProgressNotifier() {
            @Override
            public void onProgressValues(TessBaseAPI.ProgressValues progressValues) {
            }
        });
        tessAPI.init(dataPath, model_name, TessBaseAPI.OEM_LSTM_ONLY);
    }

    @Override
    protected void onPreExecute() {
        super.onPreExecute();
    }

    @Override
    protected String doInBackground(Bitmap... bitmaps) {
        tessAPI.setImage(bitmaps[0]);
        return tessAPI.getHOCRText(0);
    }

    @Override
    protected void onPostExecute(String text) {
    }

    @Override
    protected void onCancelled() {
        super.onCancelled();
        tessAPI.stop();
    }
}

Crash "nativeGetUTF8Text" in unit test after 30min of running

First thank you for this library 👍
I am using it in a long-running unit test (extracting 100+ documents sequentially), and it crashes after about 30 minutes:

2019-05-07 09:02:46.794 14237-14272/de.minirechnung.devel I/Tesseract(native): Initialized Tesseract API with language=eng
    
    --------- beginning of crash
2019-05-07 09:03:00.058 14237-14272/de.minirechnung.devel A/libc: Fatal signal 6 (SIGABRT), code -6 in tid 14272 (roidJUnitRunner)
2019-05-07 09:03:00.118 990-990/? A/DEBUG: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
2019-05-07 09:03:00.118 990-990/? A/DEBUG: Build fingerprint: 'samsung/dream2qltezh/dream2qltechn:7.1/N2G48H/G9550ZHU1AQEE:user/release-keys'
2019-05-07 09:03:00.118 990-990/? A/DEBUG: Revision: '12'
2019-05-07 09:03:00.118 990-990/? A/DEBUG: ABI: 'x86'
2019-05-07 09:03:00.118 990-990/? A/DEBUG: pid: 14237, tid: 14272, name: roidJUnitRunner  >>> de.minirechnung.devel <<<
2019-05-07 09:03:00.119 990-990/? A/DEBUG: signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
2019-05-07 09:03:00.119 990-990/? A/DEBUG:     eax 00000000  ebx 0000379d  ecx 000037c0  edx 00000006
2019-05-07 09:03:00.119 990-990/? A/DEBUG:     esi 999fa978  edi 999fa920
2019-05-07 09:03:00.119 990-990/? A/DEBUG:     xcs 00000073  xds 0000007b  xes 0000007b  xfs 0000003b  xss 0000007b
2019-05-07 09:03:00.119 990-990/? A/DEBUG:     eip b7662bf0  ebp 999f2028  esp 999f1fcc  flags 00000292
2019-05-07 09:03:00.121 990-990/? A/DEBUG: backtrace:
2019-05-07 09:03:00.122 990-990/? A/DEBUG:     #00 pc 00000bf0  [vdso:b7662000] (__kernel_vsyscall+16)
2019-05-07 09:03:00.122 990-990/? A/DEBUG:     #01 pc 0007cadc  /system/lib/libc.so (tgkill+28)
2019-05-07 09:03:00.122 990-990/? A/DEBUG:     #02 pc 000782b5  /system/lib/libc.so (pthread_kill+85)
2019-05-07 09:03:00.122 990-990/? A/DEBUG:     #03 pc 00028a2a  /system/lib/libc.so (raise+42)
2019-05-07 09:03:00.122 990-990/? A/DEBUG:     #04 pc 0001eed6  /system/lib/libc.so (abort+86)
2019-05-07 09:03:00.122 990-990/? A/DEBUG:     #05 pc 0016b50a  /data/app/de.minirechnung.devel-2/lib/x86/libtesseract.so (_ZNK7ERRCODE5errorEPKc16TessErrorLogCodeS1_z+266)
2019-05-07 09:03:00.122 990-990/? A/DEBUG:     #06 pc 001bfb4e  /data/app/de.minirechnung.devel-2/lib/x86/libtesseract.so (_ZNK9tesseract4Dict7case_okERK11WERD_CHOICERK10UNICHARSET+142)
2019-05-07 09:03:00.122 990-990/? A/DEBUG:     #07 pc 001ceb5b  /data/app/de.minirechnung.devel-2/lib/x86/libtesseract.so (_ZNK9tesseract4Dict16AcceptableResultEP8WERD_RES+443)
2019-05-07 09:03:00.122 990-990/? A/DEBUG:     #08 pc 0010b2f0  /data/app/de.minirechnung.devel-2/lib/x86/libtesseract.so (_ZN9tesseract9Tesseract20tess_acceptable_wordEP8WERD_RES+48)
2019-05-07 09:03:00.122 990-990/? A/DEBUG:     #09 pc 000cd630  /data/app/de.minirechnung.devel-2/lib/x86/libtesseract.so (_ZN9tesseract9Tesseract17match_word_pass_nEiP8WERD_RESP3ROWP5BLOCK+256)
2019-05-07 09:03:00.122 990-990/? A/DEBUG:     #10 pc 000cd313  /data/app/de.minirechnung.devel-2/lib/x86/libtesseract.so (_ZN9tesseract9Tesseract19classify_word_pass1ERKNS_8WordDataEPP8WERD_RESPNS_13PointerVectorIS4_EE+403)
2019-05-07 09:03:00.122 990-990/? A/DEBUG:     #11 pc 000ca299  /data/app/de.minirechnung.devel-2/lib/x86/libtesseract.so (_ZN9tesseract9Tesseract17RetryWithLanguageERKNS_8WordDataEMS0_FvS3_PP8WERD_RESPNS_13PointerVectorIS4_EEEbS6_S9_+233)
2019-05-07 09:03:00.122 990-990/? A/DEBUG:     #12 pc 000c49eb  /data/app/de.minirechnung.devel-2/lib/x86/libtesseract.so (_ZN9tesseract9Tesseract26classify_word_and_languageEiP11PAGE_RES_ITPNS_8WordDataE+491)
2019-05-07 09:03:00.122 990-990/? A/DEBUG:     #13 pc 000c58b5  /data/app/de.minirechnung.devel-2/lib/x86/libtesseract.so (_ZN9tesseract9Tesseract18RecogAllWordsPassNEiP10ETEXT_DESCP11PAGE_RES_ITP13GenericVectorINS_8WordDataEE+757)
2019-05-07 09:03:00.122 990-990/? A/DEBUG:     #14 pc 000c6d98  /data/app/de.minirechnung.devel-2/lib/x86/libtesseract.so (_ZN9tesseract9Tesseract15recog_all_wordsEP8PAGE_RESP10ETEXT_DESCPK4TBOXPKci+456)
2019-05-07 09:03:00.122 990-990/? A/DEBUG:     #15 pc 000ae902  /data/app/de.minirechnung.devel-2/lib/x86/libtesseract.so (_ZN9tesseract11TessBaseAPI9RecognizeEP10ETEXT_DESC+1154)
2019-05-07 09:03:00.122 990-990/? A/DEBUG:     #16 pc 000acf7c  /data/app/de.minirechnung.devel-2/lib/x86/libtesseract.so (_ZN9tesseract11TessBaseAPI11GetUTF8TextEv+76)
2019-05-07 09:03:00.122 990-990/? A/DEBUG:     #17 pc 002c2b5a  /data/app/de.minirechnung.devel-2/lib/x86/libtesseract.so (Java_com_googlecode_tesseract_android_TessBaseAPI_nativeGetUTF8Text+74)
2019-05-07 09:03:00.122 990-990/? A/DEBUG:     #18 pc 00116157  /system/lib/libart.so

Duplicate classes during build

I am getting the following duplicate class issues during compilation.

With the latest version of Tesseract4Android and

    implementation 'androidx.appcompat:appcompat:1.3.0'
    implementation 'com.google.android.material:material:1.3.0'

I get these:

	Duplicate class kotlin.internal.jdk7.JDK7PlatformImplementations found in modules jetified-kotlin-stdlib-1.8.0 (org.jetbrains.kotlin:kotlin-stdlib:1.8.0) and jetified-kotlin-stdlib-jdk7-1.3.21 (org.jetbrains.kotlin:kotlin-stdlib-jdk7:1.3.21)
Duplicate class kotlin.jdk7.AutoCloseableKt found in modules jetified-kotlin-stdlib-1.8.0 (org.jetbrains.kotlin:kotlin-stdlib:1.8.0) and jetified-kotlin-stdlib-jdk7-1.3.21 (org.jetbrains.kotlin:kotlin-stdlib-jdk7:1.3.21)

With the latest version of Tesseract4Android and

    implementation 'androidx.appcompat:appcompat:1.6.1'
    implementation 'com.google.android.material:material:1.9.0'

I get these:

Duplicate class kotlin.collections.jdk8.CollectionsJDK8Kt found in modules jetified-kotlin-stdlib-1.8.0 (org.jetbrains.kotlin:kotlin-stdlib:1.8.0) and jetified-kotlin-stdlib-jdk8-1.6.0 (org.jetbrains.kotlin:kotlin-stdlib-jdk8:1.6.0)
Duplicate class kotlin.internal.jdk7.JDK7PlatformImplementations found in modules jetified-kotlin-stdlib-1.8.0 (org.jetbrains.kotlin:kotlin-stdlib:1.8.0) and jetified-kotlin-stdlib-jdk7-1.6.0 (org.jetbrains.kotlin:kotlin-stdlib-jdk7:1.6.0)
Duplicate class kotlin.internal.jdk8.JDK8PlatformImplementations found in modules jetified-kotlin-stdlib-1.8.0 (org.jetbrains.kotlin:kotlin-stdlib:1.8.0) and jetified-kotlin-stdlib-jdk8-1.6.0 (org.jetbrains.kotlin:kotlin-stdlib-jdk8:1.6.0)
Duplicate class kotlin.io.path.ExperimentalPathApi found in modules jetified-kotlin-stdlib-1.8.0 (org.jetbrains.kotlin:kotlin-stdlib:1.8.0) and jetified-kotlin-stdlib-jdk7-1.6.0 (org.jetbrains.kotlin:kotlin-stdlib-jdk7:1.6.0)
Duplicate class kotlin.io.path.PathRelativizer found in modules jetified-kotlin-stdlib-1.8.0 (org.jetbrains.kotlin:kotlin-stdlib:1.8.0) and jetified-kotlin-stdlib-jdk7-1.6.0 (org.jetbrains.kotlin:kotlin-stdlib-jdk7:1.6.0)
Duplicate class kotlin.io.path.PathsKt found in modules jetified-kotlin-stdlib-1.8.0 (org.jetbrains.kotlin:kotlin-stdlib:1.8.0) and jetified-kotlin-stdlib-jdk7-1.6.0 (org.jetbrains.kotlin:kotlin-stdlib-jdk7:1.6.0)
Duplicate class kotlin.io.path.PathsKt__PathReadWriteKt found in modules jetified-kotlin-stdlib-1.8.0 (org.jetbrains.kotlin:kotlin-stdlib:1.8.0) and jetified-kotlin-stdlib-jdk7-1.6.0 (org.jetbrains.kotlin:kotlin-stdlib-jdk7:1.6.0)
Duplicate class kotlin.io.path.PathsKt__PathUtilsKt found in modules jetified-kotlin-stdlib-1.8.0 (org.jetbrains.kotlin:kotlin-stdlib:1.8.0) and jetified-kotlin-stdlib-jdk7-1.6.0 (org.jetbrains.kotlin:kotlin-stdlib-jdk7:1.6.0)
Duplicate class kotlin.jdk7.AutoCloseableKt found in modules jetified-kotlin-stdlib-1.8.0 (org.jetbrains.kotlin:kotlin-stdlib:1.8.0) and jetified-kotlin-stdlib-jdk7-1.6.0 (org.jetbrains.kotlin:kotlin-stdlib-jdk7:1.6.0)
Duplicate class kotlin.jvm.jdk8.JvmRepeatableKt found in modules jetified-kotlin-stdlib-1.8.0 (org.jetbrains.kotlin:kotlin-stdlib:1.8.0) and jetified-kotlin-stdlib-jdk8-1.6.0 (org.jetbrains.kotlin:kotlin-stdlib-jdk8:1.6.0)
Duplicate class kotlin.random.jdk8.PlatformThreadLocalRandom found in modules jetified-kotlin-stdlib-1.8.0 (org.jetbrains.kotlin:kotlin-stdlib:1.8.0) and jetified-kotlin-stdlib-jdk8-1.6.0 (org.jetbrains.kotlin:kotlin-stdlib-jdk8:1.6.0)
Duplicate class kotlin.streams.jdk8.StreamsKt found in modules jetified-kotlin-stdlib-1.8.0 (org.jetbrains.kotlin:kotlin-stdlib:1.8.0) and jetified-kotlin-stdlib-jdk8-1.6.0 (org.jetbrains.kotlin:kotlin-stdlib-jdk8:1.6.0)
Duplicate class kotlin.streams.jdk8.StreamsKt$asSequence$$inlined$Sequence$1 found in modules jetified-kotlin-stdlib-1.8.0 (org.jetbrains.kotlin:kotlin-stdlib:1.8.0) and jetified-kotlin-stdlib-jdk8-1.6.0 (org.jetbrains.kotlin:kotlin-stdlib-jdk8:1.6.0)
Duplicate class kotlin.streams.jdk8.StreamsKt$asSequence$$inlined$Sequence$2 found in modules jetified-kotlin-stdlib-1.8.0 (org.jetbrains.kotlin:kotlin-stdlib:1.8.0) and jetified-kotlin-stdlib-jdk8-1.6.0 (org.jetbrains.kotlin:kotlin-stdlib-jdk8:1.6.0)
Duplicate class kotlin.streams.jdk8.StreamsKt$asSequence$$inlined$Sequence$3 found in modules jetified-kotlin-stdlib-1.8.0 (org.jetbrains.kotlin:kotlin-stdlib:1.8.0) and jetified-kotlin-stdlib-jdk8-1.6.0 (org.jetbrains.kotlin:kotlin-stdlib-jdk8:1.6.0)
Duplicate class kotlin.streams.jdk8.StreamsKt$asSequence$$inlined$Sequence$4 found in modules jetified-kotlin-stdlib-1.8.0 (org.jetbrains.kotlin:kotlin-stdlib:1.8.0) and jetified-kotlin-stdlib-jdk8-1.6.0 (org.jetbrains.kotlin:kotlin-stdlib-jdk8:1.6.0)
Duplicate class kotlin.text.jdk8.RegexExtensionsJDK8Kt found in modules jetified-kotlin-stdlib-1.8.0 (org.jetbrains.kotlin:kotlin-stdlib:1.8.0) and jetified-kotlin-stdlib-jdk8-1.6.0 (org.jetbrains.kotlin:kotlin-stdlib-jdk8:1.6.0)
Duplicate class kotlin.time.jdk8.DurationConversionsJDK8Kt found in modules jetified-kotlin-stdlib-1.8.0 (org.jetbrains.kotlin:kotlin-stdlib:1.8.0) and jetified-kotlin-stdlib-jdk8-1.6.0 (org.jetbrains.kotlin:kotlin-stdlib-jdk8:1.6.0)

Mistake in the build instructions

On the GitHub page, change:

dependencies {
// To use Standard variant:
implementation 'cz.adaptech:tesseract4android:4.1.1'

by replacing the single quotes with double quotes: implementation "cz.adaptech:tesseract4android:4.1.1"

Guide for getting it running?

Hey!

I saw the earlier "installed, now what" issue, but I'm still not able to get it running. Could you provide a step-by-step installation walkthrough?

Tesseract OCR: less memory consumption by avoiding new instances of TessBaseAPI?

Not really an issue but I thought about asking here since I don't know exactly how this API wrapper works.

Basically I make a new instance of MyOCR whenever I need to perform OCR.
This is currently what my constructor looks like:

public MyOCR(Bitmap bitmap)
	{
		this.tessBaseAPI = new TessBaseAPI();
		this.tessBaseAPI.setPageSegMode(TessBaseAPI.PageSegMode.PSM_AUTO_ONLY);
		try {
			this.tessBaseAPI.setDebug(true);
			this.tessBaseAPI.init("storage/emulated/0", "eng");
			this.tessBaseAPI.setImage(bitmap);
			this.text = tessBaseAPI.getUTF8Text();
			this.tessBaseAPI.end();
		} catch (Exception e) {
			e.printStackTrace();
			System.err.println(e.getMessage());
		}
	}

I was wondering whether, performance-wise, the following code would be preferable in the long run.
Basically I make only one instance of MyOCR and set the new image every time I need to perform OCR.

public MyOCR()
{
	this.tessBaseAPI = new TessBaseAPI();
	this.tessBaseAPI.setPageSegMode(TessBaseAPI.PageSegMode.PSM_AUTO_ONLY);
	try {
		this.tessBaseAPI.setDebug(true);
		this.tessBaseAPI.init("storage/emulated/0", "eng");
	} catch (Exception e) {
		e.printStackTrace();
		System.err.println(e.getMessage());
	}
}

public void ocrTask(Bitmap bitmap)
{
	this.tessBaseAPI.setImage(bitmap);
	this.text = tessBaseAPI.getUTF8Text();
}

Thread for running the OCR

Does Tesseract4Android run the OCR on a thread other than the UI thread,
or is the user responsible for running it on a separate thread?

What is the right approach to using TessBaseAPI with AsyncTask?
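In the README example getUTF8Text() returns the recognized text directly, so it appears to be a blocking call that belongs on a worker thread rather than on the UI thread. Below is a minimal sketch with a plain ExecutorService (AsyncTask is deprecated as of API 30). StoppableOcr is a hypothetical interface: recognize() would wrap setImage() plus getUTF8Text() or getHOCRText(), and stop() would forward to TessBaseAPI.stop(). Whether stop() actually aborts a given recognition call is exactly what the getHOCRText issue above questions, so treat this as the calling pattern only.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical abstraction (not the library's API): recognize() blocks until OCR
// finishes; stop() asks a running recognize() to return early, like TessBaseAPI.stop().
interface StoppableOcr {
    String recognize(String imagePath);
    void stop();
}

// Sketch: run blocking OCR on one background thread so the UI thread never waits.
// Assumes at most one task is in flight, since stop() targets whatever is running.
final class BackgroundOcr {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final StoppableOcr engine;

    BackgroundOcr(StoppableOcr engine) {
        this.engine = engine;
    }

    // Queues OCR for one image on the single worker thread.
    Future<String> submit(String imagePath) {
        return executor.submit(() -> engine.recognize(imagePath));
    }

    // Marks the task cancelled first (so a late result is discarded), then asks the
    // engine to stop: interrupting the worker thread alone cannot abort native code.
    void cancel(Future<String> task) {
        task.cancel(true);
        engine.stop();
    }

    // Stops accepting work and waits up to timeoutMillis for the current task to end.
    boolean shutdown(long timeoutMillis) throws InterruptedException {
        executor.shutdown();
        return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

On Android the result still has to reach the main thread, for example by posting it to a Handler bound to the main Looper.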

Cannot resolve symbol 'TessBaseAPI'

I am trying to switch from:

implementation 'cz.adaptech.android:tesseract4android:2.1.0'

to:

implementation 'cz.adaptech:tesseract4android:4.1.1'

But when I try to use the class TessBaseAPI I am not able to import it. With the 2.1.0 version I was able to import it with:

import com.googlecode.tesseract.android.TessBaseAPI;

What am I missing here?

TessPdfRenderer not working with jpg files

Hi.

With the alexcohn/tess-two repository, TessPdfRenderer works with JPG files, but with the latest version of this library it doesn't. The generated PDF apparently contains the text but not the image. The workaround is to create a PNG from the JPG file, which in some situations adds up to 2 seconds of processing. This situation is quite similar to an old issue (from 2015) in the original rmtheis repository.

Here is the code that works with 'com.rmtheis:tess-two:9.1.0' but not this library:

TessBaseAPI mTess = new TessBaseAPI();
mTess.setDebug(true);
mTess.init(DATA_PATH, lang);

String pdfOutput = Environment.getExternalStorageDirectory().toString() + "/Download/ocrOutput";
String jpegInput = Environment.getExternalStorageDirectory().toString() + "/Download/jpegInput.jpg";
TessPdfRenderer renderer = new TessPdfRenderer(mTess, pdfOutput);
    
mTess.beginDocument(renderer);

File file = new File(jpegInput);
Pix pix = ReadFile.readFile(file);

boolean addedPageOne = mTess.addPageToDocument(pix, file.getAbsolutePath(), renderer);
Log.e(TAG, "convertImageToSearchablePdf: addedPageOne: " + addedPageOne);

boolean endDocument = mTess.endDocument(renderer);
Log.e(TAG, "convertImageToSearchablePdf: endDocument: " + endDocument );

renderer.recycle();
pix.recycle();

Am I missing something in my code, so it works with the other library and not this one?

How to pass ocr configurations properly

I put my anpr.tessconfig configuration file into /sdcard/tesseract/tessdata. It contains:

load_system_dawg 0
load_freq_dawg 0
tessedit_char_whitelist 0123456789ABGRZSTJPMNO
user_patterns_suffix anpr.user-patterns

anpr.user-patterns is my pattern file (containing only this pattern: \c\d*) and is placed in the same directory. I tried to pass these settings with the readConfigFile function as follows, but neither they nor the setPageSegMode call seem to take effect.

mTess = new TessBaseAPI();
mTess.setPageSegMode(TessBaseAPI.PageSegMode.PSM_SINGLE_WORD);
mTess.readConfigFile("anpr.tessconfig");
mTess.init(datapath, "eng");

I also tried passing the settings with the setVariable() function, like this:

mTess.setVariable("load_system_dawg", "false");
mTess.setVariable("load_freq_dawg", "false");
mTess.setVariable("tessedit_char_whitelist", "0123456789ABGRZSTJPMNO");
mTess.setVariable("user_patterns_suffix", "anpr.user-patterns");

but that doesn't work either.
Any help would be highly appreciated. Thanks!!
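To rule out file-location problems, the settings can also be read from the config file in Java and applied one at a time. The sketch below is a hypothetical helper, TessConfigReader (not part of the library), that parses the whitespace-separated "name value" lines used by Tesseract config files such as anpr.tessconfig; it does not call Tesseract itself.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: parses a Tesseract-style config file into name/value pairs. */
final class TessConfigReader {
    private TessConfigReader() {}

    static Map<String, String> parse(Reader in) throws IOException {
        // LinkedHashMap keeps the order in which the variables appear in the file
        Map<String, String> vars = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            // Skip blank lines and comment lines
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            // The name ends at the first run of whitespace; the rest of the line is the value
            String[] parts = trimmed.split("\\s+", 2);
            vars.put(parts[0], parts.length > 1 ? parts[1] : "");
        }
        return vars;
    }
}
```

After a successful init(), each pair could be passed to mTess.setVariable(name, value), logging every pair for which it returns false (the wrapper returns false when the variable name is not found). As far as we know, Tesseract treats some parameters, including load_system_dawg and load_freq_dawg, as init-only, so setting them after init() may have no effect.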

How can I use Tess.setVariable(whitelist)?

I tried to use setVariable to set a whitelist so that I can filter characters.
I heard that Tesseract 4.0 doesn't support a whitelist or blacklist but 4.1.x does, and this project uses Tesseract 4.1.1.
I built the aar file following your guide, copied it into my project's libs folder, and added it as an implementation dependency in my build.gradle. Did I do anything wrong?
If anyone has successfully applied a whitelist, PLEASE give me some guidance.
Thanks for the great work.
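If setVariable("tessedit_char_whitelist", ...) appears to have no effect, a crude fallback is to filter the recognized text afterwards. This is a different technique from a real whitelist: a whitelist restricts which characters the recognizer may choose, whereas post-filtering only deletes unwanted characters from the output, so a misread character is dropped rather than corrected. The class name WhitelistFilter is hypothetical.

```java
/** Hypothetical fallback: keeps only whitelisted characters (plus whitespace) in OCR output. */
final class WhitelistFilter {
    private final String allowed;

    WhitelistFilter(String allowed) {
        this.allowed = allowed;
    }

    String apply(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        // Iterate by code point so characters outside the BMP are handled correctly
        text.codePoints()
            .filter(cp -> allowed.indexOf(cp) >= 0 || Character.isWhitespace(cp))
            .forEach(sb::appendCodePoint);
        return sb.toString();
    }
}
```

For example, new WhitelistFilter("0123456789ABGRZSTJPMNO").apply(text) could be applied to the string returned by getUTF8Text().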

Not working with some traineddata files for tesseract 4

Hi guys, great job done! :D

I have been using your library for a while and it works well, but recently I tried it with this traineddata file:
https://github.com/Shreeshrii/tessdata_shreetest/blob/master/fas-minus-float.traineddata

I extracted this traineddata file, and its .version file says:
4.0.0-beta.1-232-g45a6:fas:minus20180518:from:4.00.00alpha:Arabic:synth20170629

while for the eng.traineddata shipped with this repo, the .version file says:
Pre-4.0.0

Is the version of my fas-minus-float.traineddata right? Can it be used with your library?

I will provide the error thrown on my Android device soon; sorry that I cannot provide it at the moment. I thought the version of my traineddata might not be compatible at all, in which case the error itself is not important.

Crash on initialization on some devices

Hello. I use the compiled version of the library in my application. On my Google Pixel 6a everything works great, but on a weaker device or a device with an older version of Android, the app crashes with the following error: A/libc: Fatal signal 6 (SIGABRT), code -6 in tid 12321. I tested on two devices: Xiaomi Redmi Note 10 Pro (Android 11) and Xiaomi Mi 5 (Android 8). All my attempts to fix the error have failed.

Detailed log:

22971-22971 libc com.example.forblitz.livestatistics A Fatal signal 6 (SIGABRT), code -6 in tid 22971 (.livestatistics)
2023-06-09 02:16:51.102 23086-23086 DEBUG pid-23086 A pid: 22971, tid: 22971, name: .livestatistics >>> com.example.forblitz.livestatistics <<<
2023-06-09 02:16:51.119 23086-23086 DEBUG pid-23086 A #2 pc 000000000021cfc4 /data/app/com.example.forblitz.livestatistics-SH0kovyRsnozQwxSJhbXWg==/lib/arm64/libtesseract.so (_ZNK9tesseract7ERRCODE5errorEPKcNS_16TessErrorLogCodeES2_z+368)
2023-06-09 02:16:51.119 23086-23086 DEBUG pid-23086 A #3 pc 0000000000232cd8 /data/app/com.example.forblitz.livestatistics-SH0kovyRsnozQwxSJhbXWg==/lib/arm64/libtesseract.so (_ZN9tesseract8Classify22InitAdaptiveClassifierEPNS_15TessdataManagerE+420)
2023-06-09 02:16:51.119 23086-23086 DEBUG pid-23086 A #4 pc 0000000000310b08 /data/app/com.example.forblitz.livestatistics-SH0kovyRsnozQwxSJhbXWg==/lib/arm64/libtesseract.so (_ZN9tesseract7Wordrec14program_editupERKNSt6__ndk112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEEPNS_15TessdataManagerESB_+84)
2023-06-09 02:16:51.120 23086-23086 DEBUG pid-23086 A #5 pc 00000000001d1c90 /data/app/com.example.forblitz.livestatistics-SH0kovyRsnozQwxSJhbXWg==/lib/arm64/libtesseract.so (_ZN9tesseract9Tesseract14init_tesseractERKNSt6__ndk112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEES9_S9_NS_13OcrEngineModeEPPciPKNS1_6vectorIS7_NS5_IS7_EEEESH_bPNS_15TessdataManagerE+688)
2023-06-09 02:16:51.120 23086-23086 DEBUG pid-23086 A #6 pc 0000000000180d78 /data/app/com.example.forblitz.livestatistics-SH0kovyRsnozQwxSJhbXWg==/lib/arm64/libtesseract.so (_ZN9tesseract11TessBaseAPI4InitEPKciS2_NS_13OcrEngineModeEPPciPKNSt6__ndk16vectorINS6_12basic_stringIcNS6_11char_traitsIcEENS6_9allocatorIcEEEENSB_ISD_EEEESH_bPFbS2_PNS7_IcSC_EEE+1040)
2023-06-09 02:16:51.120 23086-23086 DEBUG pid-23086 A #7 pc 0000000000180958 /data/app/com.example.forblitz.livestatistics-SH0kovyRsnozQwxSJhbXWg==/lib/arm64/libtesseract.so (_ZN9tesseract11TessBaseAPI4InitEPKcS2_NS_13OcrEngineModeEPPciPKNSt6__ndk16vectorINS6_12basic_stringIcNS6_11char_traitsIcEENS6_9allocatorIcEEEENSB_ISD_EEEESH_b+56)
2023-06-09 02:16:51.120 23086-23086 DEBUG pid-23086 A #8 pc 00000000003134f0 /data/app/com.example.forblitz.livestatistics-SH0kovyRsnozQwxSJhbXWg==/lib/arm64/libtesseract.so (Java_com_googlecode_tesseract_android_TessBaseAPI_nativeInitOem+136)
2023-06-09 02:16:51.121 23086-23086 DEBUG pid-23086 A #9 pc 0000000000043f34 /data/app/com.example.forblitz.livestatistics-SH0kovyRsnozQwxSJhbXWg==/oat/arm64/base.odex (offset 0x30000)

Crash on nativeGetUTF8Text

The whole application crashes as soon as the getUTF8Text() method is called. In the debugger, it crashes when execution reaches nativeGetUTF8Text.
There is nothing obvious in the logs, I can't catch any exception, and when I execute the tess.utF8Text line in the debugger I simply get a VMDisconnectedException.

I've tried this dataset: https://github.com/tesseract-ocr/tessdata/blob/4.0.0/eng.traineddata

I tried multiple different images without success. It happens on the emulator and on a real phone as well.

I am using the code below:

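val assetName = "eng.traineddata"
val fileNameOnDevice = "eng.traineddata"
val tess = TessBaseAPI()
// init() expects the parent directory of the "tessdata" folder
val data = File(context.application.dataDir, "tessdata")
val traineddataFile = File(data, fileNameOnDevice)
if (!traineddataFile.exists()) {
    data.mkdirs()
    // use {} closes the asset stream after the copy
    context.assets.open(assetName).use { src ->
        FileUtils.copyToFile(src, traineddataFile)
    }
}

// init() returns false on failure; recognition should not be attempted in that case
if (tess.init(context.application.dataDir.absolutePath, "eng")) {
    tess.setImage(File(image.absolutePath))
    try {
        // CRASHES HERE !
        val text = tess.utF8Text
        Toast.makeText(context, text, Toast.LENGTH_SHORT).show()
    } catch (e: Exception) {
        // A native crash (SIGSEGV/SIGABRT) is not a Java exception and is never caught here
        Log.e("OCR", "Recognition failed", e)
    }
}

tess.recycle()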

I am using tesseract4android 4.1.1.

Any ideas how to debug what's the issue?

pixReadMemTiff - function not present

Hello,
after building this library and invoking it on the device I get the following error:

Error in pixReadMemTiff: function not present
Error in pixReadMem: tiff: no pix returned
Error in pixaGenerateFontFromString: pix not made
Error in bmfCreate: font pixa not made

What can be done about it?

PSM_SINGLE_BLOCK_VERT_TEXT doesn't work for Japanese

I'm getting some pretty nonsensical results when I use PSM_SINGLE_BLOCK_VERT_TEXT with Japanese text. Back when I used tess-two instead of this library, it seemed to work. I'm using jpn.traineddata and jpn_vert.traineddata from https://github.com/tesseract-ocr/tessdata_best, and I initialize the API here: https://github.com/0xbad1d3a5/Kaku/blob/master/app/src/main/java/ca/fuwafuwa/kaku/Ocr/OcrRunnable.kt


But yeah, I'm not entirely sure what's wrong here. Any hints/tips on how to debug this issue? Thanks!

Missing PageSegMode 13

The TessBaseAPI class is missing PageSegMode 13 (PSM_RAW_LINE), which is used with the new LSTM engine to OCR a single text line image.
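Assuming setPageSegMode() accepts a plain int, as it does in tess-two, one stopgap until the constant is added is to pass the raw value. The sketch below mirrors Tesseract's tesseract::PageSegMode numbering (0 to 13) in a hypothetical local enum, PageSegModeCompat; the name is ours, not the library's.

```java
/**
 * Hypothetical local mirror of Tesseract's tesseract::PageSegMode values.
 * Only the int value is meant to be passed to TessBaseAPI.setPageSegMode().
 */
enum PageSegModeCompat {
    PSM_OSD_ONLY(0),
    PSM_AUTO_OSD(1),
    PSM_AUTO_ONLY(2),
    PSM_AUTO(3),
    PSM_SINGLE_COLUMN(4),
    PSM_SINGLE_BLOCK_VERT_TEXT(5),
    PSM_SINGLE_BLOCK(6),
    PSM_SINGLE_LINE(7),
    PSM_SINGLE_WORD(8),
    PSM_CIRCLE_WORD(9),
    PSM_SINGLE_CHAR(10),
    PSM_SPARSE_TEXT(11),
    PSM_SPARSE_TEXT_OSD(12),
    // Treat the image as a single text line, bypassing Tesseract-specific hacks
    PSM_RAW_LINE(13);

    final int value;

    PageSegModeCompat(int value) {
        this.value = value;
    }
}
```

Usage would then be mTess.setPageSegMode(PageSegModeCompat.PSM_RAW_LINE.value).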

simple app calling the library

Hello,
Could you provide a very simple app that calls the library (for example, with an image stored as a res resource) and gets the text?
That way we would have all compiler options and parameters in an app ready for use.
Sincerely,

local implementation very slow

Hello,
Thank you for the sample.
I have built the Tesseract project and the sample.

When I replace the first option:
// Use library from JitPack for simplicity
implementation 'cz.adaptech:tesseract4android:4.1.1'
// Or use library compiled locally
//implementation project(':tesseract4android')

with the second option:
// Use library from JitPack for simplicity
//implementation 'cz.adaptech:tesseract4android:4.1.1'
// Or use library compiled locally
implementation project(':tesseract4android')

and run the sample, it takes 30 seconds to recognize the text, compared with only about 2 seconds with the first option on the same smartphone.

Sincerely,

Move to mavenCentral?

If jcenter is removed from the repositories now, the project will not build. jcenter is deprecated, so maybe move to mavenCentral?

Not able to execute gradlew tesseract4android:assembleRelease

Hi,
Thank you for this amazing repository. As I am new to OCR, it took me a lot of research to find this repository, which works with Tesseract 4. I cloned the project and ran assembleRelease successfully, but I am not able to execute gradlew tesseract4android:assembleRelease from the terminal.
The error says:

Command 'gradlew' not found

To solve this, I am running:

gradle wrapper --gradle-version 5.1.1

which gives the error:

A problem occurred evaluating project ':app'.

Failed to apply plugin [id 'com.android.application']
Minimum supported Gradle version is 5.1.1. Current version is 4.4.1. If using the gradle wrapper, try editing the distributionUrl in /home/akanksha/.gradle/daemon/4.4.1/gradle/wrapper/gradle-wrapper.properties to gradle-5.1.1-all.zip

I have tried updating the Gradle version and changing the distribution URL in the Gradle wrapper, but I am still not able to execute this. Can you please help me with it?

Thank you.

TessBaseAPI init failed.

Hi.
I called TessBaseAPI.init after copying eng.traineddata, but the return value is false.
Do you know what the problem is?
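When init() returns false, a first thing to check is the file layout: the README states that the traineddata files must sit in a subdirectory named tessdata, and the path passed to init() is the parent of that folder. The sketch below is a hypothetical pre-flight check (TessdataCheck is our name, not a library class) that reports missing, empty, or unreadable files; it assumes languages may be combined with "+", as in "eng+deu".

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check for the datapath passed to TessBaseAPI.init(). */
final class TessdataCheck {
    private TessdataCheck() {}

    /** Returns human-readable problems; an empty list means the expected files are present. */
    static List<String> problems(String datapath, String languages) {
        List<String> problems = new ArrayList<>();
        File tessdata = new File(datapath, "tessdata");
        if (!tessdata.isDirectory()) {
            problems.add("missing directory: " + tessdata.getAbsolutePath());
            return problems;
        }
        // Check one <lang>.traineddata file per language in a "+"-joined list
        for (String lang : languages.split("\\+")) {
            File data = new File(tessdata, lang + ".traineddata");
            if (!data.isFile()) {
                problems.add("missing file: " + data.getAbsolutePath());
            } else if (data.length() == 0) {
                problems.add("empty file: " + data.getAbsolutePath());
            } else if (!data.canRead()) {
                problems.add("unreadable file: " + data.getAbsolutePath());
            }
        }
        return problems;
    }
}
```

Logging the returned list just before calling init() usually shows whether the path points at the tessdata folder itself instead of its parent, or whether the copy produced an empty file.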

confusing rootProject.libraryVersion

Dear Developer,

The name rootProject.ext.libraryVersion is confusing when a project uses several libraries.
I suggest renaming it to tesseractLibraryVersion, or moving this constant from the project level to the module level.

Thanks
Gabor

Error with initialize tessdata

Hey, I get the error "couldn't initialize tesseract api with language". I have tried many solutions, even creating the "tessdata" subdirectory, but it doesn't work.
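A common source of this error is where the traineddata ends up after it is copied out of the APK assets. The sketch below shows one way to install a traineddata stream into &lt;baseDir&gt;/tessdata/&lt;lang&gt;.traineddata and return the directory to pass to init(); TessdataInstaller is a hypothetical name. On Android the stream would come from context.getAssets().open(...), and baseDir could be context.getFilesDir(), as the README suggests. java.nio.file is only available on Android from API 26, so older targets would need a manual stream copy instead.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper that installs a traineddata file where init() expects it. */
final class TessdataInstaller {
    private TessdataInstaller() {}

    /**
     * Copies src to baseDir/tessdata/lang.traineddata and returns the path to pass
     * to TessBaseAPI.init(), which is the parent of the "tessdata" folder.
     */
    static String install(InputStream src, File baseDir, String lang) throws IOException {
        File tessdata = new File(baseDir, "tessdata");
        if (!tessdata.isDirectory() && !tessdata.mkdirs()) {
            throw new IOException("Cannot create " + tessdata);
        }
        File target = new File(tessdata, lang + ".traineddata");
        // Copy to a temporary file, then move it, so a failed copy never leaves
        // a truncated <lang>.traineddata in place
        File tmp = new File(tessdata, lang + ".traineddata.tmp");
        try (InputStream in = src) {
            Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
        }
        Files.move(tmp.toPath(), target.toPath(), StandardCopyOption.REPLACE_EXISTING);
        return baseDir.getAbsolutePath();
    }
}
```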

SegFault in ReadFile.nativeReadBytes8

Hi,
I encountered a null pointer dereference in ReadFile.nativeReadBytes8 (Java_com_googlecode_leptonica_android_ReadFile_nativeReadBytes8).

Whenever either pixCreateNoInit or pixSetupByteProcessing returns a null pointer, that pointer is dereferenced in the memcpy call. This can be triggered by causing an integer overflow in ReadFile.readBytes8 when multiplying the width and height parameters, which makes pixCreateNoInit return NULL (due to a check in pixCreateHeader, which pixCreateNoInit calls).

In particular, this call gives me a SIGSEGV:

byte[] pixelData = "Some String".getBytes();
Pix p = ReadFile.readBytes8(pixelData, 0x10000, 0x10000);

To my knowledge, these stubs originally come from Google. Since the Google project seems unmaintained and the tess-two project is no longer maintained by its author either, I think it makes sense to add checks to the wrapper files here.

Pull request #33 adds checks that prevent the crash from happening.
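The overflow is easy to reproduce in plain Java: with 32-bit int arithmetic, 0x10000 * 0x10000 (2^32) wraps to 0, so a length check written as pixelData.length &lt; width * height passes even for an 11-byte array. The sketch below is a hypothetical ReadBytes8Check, not the code from pull request #33; it does the multiplication in long arithmetic so the check cannot wrap. It complements, but does not replace, a NULL check on the native side.

```java
/** Hypothetical argument check for ReadFile.readBytes8-style calls. */
final class ReadBytes8Check {
    private ReadBytes8Check() {}

    static void validate(byte[] pixelData, int width, int height) {
        if (pixelData == null) {
            throw new IllegalArgumentException("Byte array must be non-null");
        }
        if (width <= 0 || height <= 0) {
            throw new IllegalArgumentException("Dimensions must be positive");
        }
        // Both factors are below 2^31, so their product always fits in a long
        long required = (long) width * height;
        if (pixelData.length < required) {
            throw new IllegalArgumentException(
                "Array length " + pixelData.length + " is smaller than " + required);
        }
    }
}
```

With this check, the reported call ReadFile.readBytes8("Some String".getBytes(), 0x10000, 0x10000) would be rejected in Java before reaching native code.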

duplicate class

The following error appears when building the app:
java.lang.RuntimeException: Duplicate class com.googlecode.leptonica.android.AdaptiveMap found in modules tesseract4android-4.1.0-runtime.jar (cz.adaptech.tesseract4android:tesseract4android:4.1.0) and tesseract4android-openmp-4.1.0-runtime.jar (cz.adaptech.tesseract4android:tesseract4android-openmp:4.1.0)

Here is the build.gradle:
dependencies {
    implementation fileTree(dir: 'libs', include: ['*.jar'])
    implementation 'androidx.appcompat:appcompat:1.0.2'
    implementation 'androidx.constraintlayout:constraintlayout:1.1.3'
    testImplementation 'junit:junit:4.12'
    androidTestImplementation 'androidx.test.ext:junit:1.1.0'
    androidTestImplementation 'androidx.test.espresso:espresso-core:3.1.1'
    implementation project(path: ':opencv')
    implementation ('cz.adaptech:tesseract4android:4.1.0')
    // implementation 'cz.adaptech:tesseract4android-openmp:4.1.0'
}

Some phones crash after a system update.

Some Sony and Samsung users report that recognition crashes after recent system upgrades.
The affected devices span roughly 1 to 2 years of phone models.
I don't know how to fix it.

TessBaseAPI api = new TessBaseAPI();
api.setPageSegMode(TessBaseAPI.PageSegMode.PSM_SINGLE_LINE);
api.init(context.getCacheDir().getAbsolutePath() + "/", "jpn");

api.setImage(bitmap);
api.setRectangle(rect);
String result = api.getUTF8Text();

tessdata:tessdata_fast

Here is the report from Google Play:

Error type 1

System version:

  • 12 (78.6%)
  • 11 (20.2%)
  • 12L (1.2%)

Devices

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 0, tid: 0 >>> net.package <<<

backtrace:
  #00  pc 0000000000051b20  /apex/com.android.runtime/lib64/bionic/libc.so (abort+168)
  #00  pc 000000000018e000  /data/app/~~EFTlK7iTvvXOP0nicI2Jvw==/net.package-SpUGjz22JpoYWPdXxVgDbA==/split_config.arm64_v8a.apk!libtesseract.so (ERRCODE::error(char const*, TessErrorLogCode, char const*, ...) const+376)
  #00  pc 0000000000119c5c  /data/app/~~EFTlK7iTvvXOP0nicI2Jvw==/net.package-SpUGjz22JpoYWPdXxVgDbA==/split_config.arm64_v8a.apk!libtesseract.so (tesseract::Tesseract::SegmentPage(STRING const*, BLOCK_LIST*, tesseract::Tesseract*, OSResults*)+136)
  #00  pc 00000000000e2318  /data/app/~~EFTlK7iTvvXOP0nicI2Jvw==/net.package-SpUGjz22JpoYWPdXxVgDbA==/split_config.arm64_v8a.apk!libtesseract.so (tesseract::TessBaseAPI::FindLines()+652)
  #00  pc 00000000000e27f4  /data/app/~~EFTlK7iTvvXOP0nicI2Jvw==/net.package-SpUGjz22JpoYWPdXxVgDbA==/split_config.arm64_v8a.apk!libtesseract.so (tesseract::TessBaseAPI::Recognize(ETEXT_DESC*)+56)
  #00  pc 00000000000e15cc  /data/app/~~EFTlK7iTvvXOP0nicI2Jvw==/net.package-SpUGjz22JpoYWPdXxVgDbA==/split_config.arm64_v8a.apk!libtesseract.so (tesseract::TessBaseAPI::GetUTF8Text()+60)
  #00  pc 00000000002a3c48  /data/app/~~EFTlK7iTvvXOP0nicI2Jvw==/net.package-SpUGjz22JpoYWPdXxVgDbA==/split_config.arm64_v8a.apk!libtesseract.so (Java_com_googlecode_tesseract_android_TessBaseAPI_nativeGetUTF8Text+64)
  #00  pc 000000000006c594  /data/app/~~EFTlK7iTvvXOP0nicI2Jvw==/net.package-SpUGjz22JpoYWPdXxVgDbA==/oat/arm64/base.odex (art_jni_trampoline+100)
  #00  pc 0000000000101518  /data/app/~~EFTlK7iTvvXOP0nicI2Jvw==/net.package-SpUGjz22JpoYWPdXxVgDbA==/oat/arm64/base.odex (com.googlecode.tesseract.android.TessBaseAPI.b+56)
  #00  pc 000000000014e490  /data/app/~~EFTlK7iTvvXOP0nicI2Jvw==/net.package-SpUGjz22JpoYWPdXxVgDbA==/oat/arm64/base.odex (net.package.b.a.c+2400)
  #00  pc 00000000001507d8  /data/app/~~EFTlK7iTvvXOP0nicI2Jvw==/net.package-SpUGjz22JpoYWPdXxVgDbA==/oat/arm64/base.odex (net.package.b.c.run+552)
  #00  pc 00000000001bf19c  /apex/com.android.art/javalib/arm64/boot.oat (java.lang.Thread.run+76)
  #00  pc 00000000002ca764  /apex/com.android.art/lib64/libart.so (art_quick_invoke_stub+548)
  #00  pc 000000000030e980  /apex/com.android.art/lib64/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+156)
  #00  pc 00000000003c1db4  /apex/com.android.art/lib64/libart.so (art::JValue art::InvokeVirtualOrInterfaceWithJValues<art::ArtMethod*>(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, art::ArtMethod*, jvalue const*)+380)
  #00  pc 00000000004578ec  /apex/com.android.art/lib64/libart.so (art::Thread::CreateCallback(void*)+992)
  #00  pc 00000000000b6e44  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+264)
  #00  pc 0000000000053454  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68)

Error type 2

System version:

  • 11 (99.6%)
  • 12 (0.4%)

Devices

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 0, tid: 0 >>> net.package <<<

backtrace:
  #00  pc 000000000004e40c  /apex/com.android.runtime/lib64/bionic/libc.so (abort+164)
  #00  pc 000000000018e000  /data/app/~~-7k4zfCJeQZLILrkkybH2g==/net.package-24GS5V1rQeWBo82xzjDSqA==/split_config.arm64_v8a.apk!lib/arm64-v8a/libtesseract.so (offset 0x8d1000) (ERRCODE::error(char const*, TessErrorLogCode, char const*, ...) const+376)
  #00  pc 0000000000119c5c  /data/app/~~-7k4zfCJeQZLILrkkybH2g==/net.package-24GS5V1rQeWBo82xzjDSqA==/split_config.arm64_v8a.apk!lib/arm64-v8a/libtesseract.so (offset 0x8d1000) (tesseract::Tesseract::SegmentPage(STRING const*, BLOCK_LIST*, tesseract::Tesseract*, OSResults*)+136)
  #00  pc 00000000000e2318  /data/app/~~-7k4zfCJeQZLILrkkybH2g==/net.package-24GS5V1rQeWBo82xzjDSqA==/split_config.arm64_v8a.apk!lib/arm64-v8a/libtesseract.so (offset 0x8d1000) (tesseract::TessBaseAPI::FindLines()+652)
  #00  pc 00000000000e27f4  /data/app/~~-7k4zfCJeQZLILrkkybH2g==/net.package-24GS5V1rQeWBo82xzjDSqA==/split_config.arm64_v8a.apk!lib/arm64-v8a/libtesseract.so (offset 0x8d1000) (tesseract::TessBaseAPI::Recognize(ETEXT_DESC*)+56)
  #00  pc 00000000000e15cc  /data/app/~~-7k4zfCJeQZLILrkkybH2g==/net.package-24GS5V1rQeWBo82xzjDSqA==/split_config.arm64_v8a.apk!lib/arm64-v8a/libtesseract.so (offset 0x8d1000) (tesseract::TessBaseAPI::GetUTF8Text()+60)
  #00  pc 00000000002a3c48  /data/app/~~-7k4zfCJeQZLILrkkybH2g==/net.package-24GS5V1rQeWBo82xzjDSqA==/split_config.arm64_v8a.apk!lib/arm64-v8a/libtesseract.so (offset 0x8d1000) (Java_com_googlecode_tesseract_android_TessBaseAPI_nativeGetUTF8Text+64)
  #00  pc 000000000006d6a4  /data/app/~~-7k4zfCJeQZLILrkkybH2g==/net.package-24GS5V1rQeWBo82xzjDSqA==/oat/arm64/base.odex (art_jni_trampoline+132)
  #00  pc 00000000000710a4  /data/app/~~-7k4zfCJeQZLILrkkybH2g==/net.package-24GS5V1rQeWBo82xzjDSqA==/oat/arm64/base.odex (net.package.b.a.c+2564)
  #00  pc 0000000000072d20  /data/app/~~-7k4zfCJeQZLILrkkybH2g==/net.package-24GS5V1rQeWBo82xzjDSqA==/oat/arm64/base.odex (net.package.b.c.run+752)
  #00  pc 000000000015ab38  /apex/com.android.art/javalib/arm64/boot.oat (java.lang.Thread.run+72)
  #00  pc 0000000000133564  /apex/com.android.art/lib64/libart.so (art_quick_invoke_stub+548)
  #00  pc 00000000001a8a78  /apex/com.android.art/lib64/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+200)
  #00  pc 0000000000554cac  /apex/com.android.art/lib64/libart.so (art::JValue art::InvokeVirtualOrInterfaceWithJValues<art::ArtMethod*>(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, art::ArtMethod*, jvalue const*)+460)
  #00  pc 00000000005a4048  /apex/com.android.art/lib64/libart.so (art::Thread::CreateCallback(void*)+1308)
  #00  pc 00000000000b0048  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+64)
  #00  pc 00000000000503c8  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64)
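Both backtraces end in ERRCODE::error called from Tesseract::SegmentPage, which points to an internal assertion during layout analysis. The report does not say what triggers it; one hypothesis worth ruling out, given that the code calls setRectangle(rect), is a rectangle that extends past the bitmap or is empty. The sketch below is a hypothetical helper (RectClamp) using plain ints so it does not depend on android.graphics.Rect; it is not a confirmed fix.

```java
/** Hypothetical helper: clamps a [left, top, right, bottom) rectangle to the image bounds. */
final class RectClamp {
    private RectClamp() {}

    /** Returns {left, top, right, bottom} inside the image, or null if nothing is left. */
    static int[] clamp(int left, int top, int right, int bottom, int imageWidth, int imageHeight) {
        int l = Math.max(0, left);
        int t = Math.max(0, top);
        int r = Math.min(imageWidth, right);
        int b = Math.min(imageHeight, bottom);
        // An empty or inverted rectangle has no pixels to recognize
        if (r <= l || b <= t) {
            return null;
        }
        return new int[] {l, t, r, b};
    }
}
```

With the clamped values, the call could become api.setRectangle(l, t, r - l, b - t), assuming the (left, top, width, height) overload that tess-two also provides; when clamp returns null, recognition can simply be skipped for that frame.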

preserve_interword_spaces available?

Thanks for the great work and updates.
I'm using Tesseract on Android to recognize Korean characters, and I have also been using Tesseract on a PC with Python. Almost everything works the same as on the PC.
But for Korean only, the output text has a space between every character.
I know that -c preserve_interword_spaces=1 can fix this problem. I searched the tesseract4android-2.1.0 javadoc (package com.googlecode.tesseract.android),
but there's no mention of that option.
Is it possible to use this setting on Android? Or can I implement it myself? (Would that be very difficult?)

tesseract-ocr version is 4.0.0

I imported tesseract4android-2.0.0.aar, but the reported Tesseract version is 4.0.0:

TessBaseAPI tessBaseAPI = new TessBaseAPI();
String version = tessBaseAPI.getVersion();
Log.e("vvv", version); // 4.0.0

Could not initialize Tesseract API with language

I used getExternalFilesDir("/testOCR/") on Android 12 to obtain external storage, which should not require read and write permissions. I copied the language data files from assets to this directory successfully, but init still fails with "Could not initialize Tesseract API with language".
