thiagoalessio / tesseract-ocr-for-php

A wrapper to work with Tesseract OCR inside PHP.

Home Page: https://packagist.org/packages/thiagoalessio/tesseract_ocr

License: MIT License

PHP 100.00%
ocr tesseract php text-recognition image-to-text

tesseract-ocr-for-php's Introduction

Tesseract OCR for PHP

Installation

Via Composer:

$ composer require thiagoalessio/tesseract_ocr

‼️ This library depends on Tesseract OCR, version 3.02 or later.


Note for Windows users

There are many ways to install Tesseract OCR on your system, but if you just want something quick to get up and running, I recommend installing the Capture2Text package with Chocolatey.

choco install capture2text --version 3.9

⚠️ Recent versions of Capture2Text stopped shipping the tesseract binary.


Note for macOS users

With MacPorts you can install support for individual languages, like so:

$ sudo port install tesseract-<langcode>

That is not possible with Homebrew. It ships with English support only, so if you need other languages, the quickest solution is to install them all:

$ brew install tesseract tesseract-lang

Usage

Basic usage

use thiagoalessio\TesseractOCR\TesseractOCR;
echo (new TesseractOCR('text.png'))
    ->run();
The quick brown fox
jumps over
the lazy dog.

Other languages

use thiagoalessio\TesseractOCR\TesseractOCR;
echo (new TesseractOCR('german.png'))
    ->lang('deu')
    ->run();
Bülowstraße

Multiple languages

use thiagoalessio\TesseractOCR\TesseractOCR;
echo (new TesseractOCR('mixed-languages.png'))
    ->lang('eng', 'jpn', 'spa')
    ->run();
I eat すし y Pollo

Inducing recognition

use thiagoalessio\TesseractOCR\TesseractOCR;
echo (new TesseractOCR('8055.png'))
    ->allowlist(range('A', 'Z'))
    ->run();
BOSS

Breaking CAPTCHAs

Yes, I know some of you might want to use this library for the noble purpose of breaking CAPTCHAs, so please take a look at this comment:

#91 (comment)

API

run

Executes a tesseract command, optionally receiving an integer as timeout, in case you experience stalled tesseract processes.

$ocr = new TesseractOCR();
$ocr->run();

// With a timeout:
$ocr = new TesseractOCR();
$timeout = 500;
$ocr->run($timeout);

image

Define the path of an image to be recognized by tesseract.

$ocr = new TesseractOCR();
$ocr->image('/path/to/image.png');
$ocr->run();

imageData

Set the image to be recognized by tesseract from a string, along with its size. This is useful when dealing with files that are already loaded in memory. You can easily retrieve the data and size from an image object:

//Using Imagick
$data = $img->getImageBlob();
$size = $img->getImageLength();
//Using GD
ob_start();
// Note that you can use any format supported by tesseract
imagepng($img, null, 0);
$size = ob_get_length();
$data = ob_get_clean();

$ocr = new TesseractOCR();
$ocr->imageData($data, $size);
$ocr->run();

executable

Define a custom location for the tesseract executable, if for any reason it is not present in the $PATH.

echo (new TesseractOCR('img.png'))
    ->executable('/path/to/tesseract')
    ->run();

version

Returns the current version of tesseract.

echo (new TesseractOCR())->version();
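
Because the library depends on Tesseract 3.02 or later, the version string can serve as a guard before running OCR. The following is a minimal sketch: `meetsMinimumTesseract` is a hypothetical helper, not part of the library, and it assumes the version is available as a plain dotted string such as `4.1.1`.

```php
<?php
// Hypothetical helper (not part of the library): checks a Tesseract
// version string against the documented 3.02 minimum.
function meetsMinimumTesseract(string $version, string $minimum = '3.02'): bool
{
    // version_compare() compares dotted version strings segment by segment.
    return version_compare($version, $minimum, '>=');
}

var_dump(meetsMinimumTesseract('4.1.1')); // bool(true)
var_dump(meetsMinimumTesseract('3.01'));  // bool(false)
```

In practice the first argument would presumably come from `(new TesseractOCR())->version()`, assuming it returns a string in that format.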

availableLanguages

Returns a list of available languages/scripts.

foreach((new TesseractOCR())->availableLanguages() as $lang) echo $lang;

More info: https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages-and-scripts

tessdataDir

Specify a custom location for the tessdata directory.

echo (new TesseractOCR('img.png'))
    ->tessdataDir('/path')
    ->run();

userWords

Specify the location of user words file.

This is a plain text file containing a list of words that you want tesseract to treat as normal dictionary words.

Useful when dealing with contents that contain technical terminology, jargon, etc.

$ cat /path/to/user-words.txt
foo
bar
echo (new TesseractOCR('img.png'))
    ->userWords('/path/to/user-words.txt')
    ->run();
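
If the word list lives in PHP rather than in a file, it has to be written out first. A minimal sketch, assuming one word per line as in the listing above; `writeUserWords` is a hypothetical helper and not part of the library.

```php
<?php
// Hypothetical helper: writes one word per line, matching the
// user-words.txt layout shown above.
function writeUserWords(string $path, array $words): void
{
    $written = file_put_contents($path, implode("\n", $words) . "\n");
    if ($written === false) {
        throw new RuntimeException("Could not write user words file: $path");
    }
}

// Write the two example words to a temporary file and print it back.
$path = tempnam(sys_get_temp_dir(), 'user-words-');
writeUserWords($path, ['foo', 'bar']);
echo file_get_contents($path);
unlink($path);
```

The resulting path is what would be passed to `->userWords(...)`.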

userPatterns

Specify the location of user patterns file.

If the contents you are dealing with have known patterns, this option can greatly improve tesseract's recognition accuracy.

$ cat /path/to/user-patterns.txt
1-\d\d\d-GOOG-441
www.\n\\\*.com
echo (new TesseractOCR('img.png'))
    ->userPatterns('/path/to/user-patterns.txt')
    ->run();

lang

Define one or more languages to be used during the recognition. A complete list of available languages can be found at: https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages

Tip from @daijiale: Use the combination ->lang('chi_sim', 'chi_tra') for proper recognition of Chinese.

 echo (new TesseractOCR('img.png'))
     ->lang('lang1', 'lang2', 'lang3')
     ->run();
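
On the tesseract command line, several languages are given to `-l` as a single value joined with `+` (the `rus+eng` form that appears in one of the issues on this page). The sketch below only illustrates that convention; `joinLanguages` is a hypothetical helper and is not claimed to be how the library builds its command.

```php
<?php
// Hypothetical helper: joins language codes the way the tesseract CLI
// expects them after -l, e.g. "eng+jpn+spa".
function joinLanguages(string ...$langs): string
{
    return implode('+', $langs);
}

echo joinLanguages('eng', 'jpn', 'spa'), PHP_EOL; // eng+jpn+spa
echo joinLanguages('chi_sim', 'chi_tra'), PHP_EOL; // chi_sim+chi_tra
```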

psm

Specify the Page Segmentation Method, which instructs tesseract how to interpret the given image.

More info: https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality#page-segmentation-method

echo (new TesseractOCR('img.png'))
    ->psm(6)
    ->run();
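
To make the numeric argument less opaque, the modes can be kept in a lookup table. This is a sketch only: the descriptions are copied from the tesseract help text quoted in the "psm command" issue further down this page, other tesseract versions may define different or additional modes, and `describePsm` is a hypothetical helper.

```php
<?php
// Page segmentation modes as listed in the help text quoted in the
// "psm command" issue; other tesseract versions may differ.
const PSM_MODES = [
    0  => 'Orientation and script detection (OSD) only.',
    1  => 'Automatic page segmentation with OSD.',
    2  => 'Automatic page segmentation, but no OSD, or OCR.',
    3  => 'Fully automatic page segmentation, but no OSD. (Default)',
    4  => 'Assume a single column of text of variable sizes.',
    5  => 'Assume a single uniform block of vertically aligned text.',
    6  => 'Assume a single uniform block of text.',
    7  => 'Treat the image as a single text line.',
    8  => 'Treat the image as a single word.',
    9  => 'Treat the image as a single word in a circle.',
    10 => 'Treat the image as a single character.',
];

// Hypothetical helper: returns the description or rejects unknown modes.
function describePsm(int $mode): string
{
    if (!array_key_exists($mode, PSM_MODES)) {
        throw new InvalidArgumentException("Unknown PSM mode: $mode");
    }
    return PSM_MODES[$mode];
}

echo describePsm(6), PHP_EOL; // Assume a single uniform block of text.
```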

oem

Specify the OCR Engine Mode. (see tesseract --help-oem)

echo (new TesseractOCR('img.png'))
    ->oem(2)
    ->run();

dpi

Specify the image DPI. It is useful if your image does not contain this information in its metadata.

echo (new TesseractOCR('img.png'))
    ->dpi(300)
    ->run();

allowlist

This is a shortcut for ->config('tessedit_char_whitelist', 'abcdef....').

echo (new TesseractOCR('img.png'))
    ->allowlist(range('a', 'z'), range(0, 9), '-_@')
    ->run();
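
The call above mixes arrays (from `range()`) with a plain string. One way to picture the value that ends up in `tessedit_char_whitelist` is the concatenation of all those pieces. The sketch below works under that assumption; `flattenAllowlist` is a hypothetical helper, not the library's code.

```php
<?php
// Hypothetical helper: flattens arrays and strings into one string of
// allowed characters.
function flattenAllowlist(...$parts): string
{
    $chars = '';
    foreach ($parts as $part) {
        // Arrays (e.g. from range()) are joined; scalars are appended as-is.
        $chars .= is_array($part) ? implode('', $part) : (string) $part;
    }
    return $chars;
}

echo flattenAllowlist(range('a', 'z'), range(0, 9), '-_@'), PHP_EOL;
// abcdefghijklmnopqrstuvwxyz0123456789-_@
```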

configFile

Specify a config file to be used. It can either be the path to your own config file or the name of one of the predefined config files: https://github.com/tesseract-ocr/tesseract/tree/master/tessdata/configs

echo (new TesseractOCR('img.png'))
    ->configFile('hocr')
    ->run();

setOutputFile

Specify an output file to be used. Be aware: if you set an output file, the withoutTempFiles option is ignored. Temp files are written (and deleted) even if withoutTempFiles = true.

In combination with configFile you are able to get the hocr, tsv or pdf files.

echo (new TesseractOCR('img.png'))
    ->configFile('pdf')
    ->setOutputFile('/PATH_TO_MY_OUTPUTFILE/searchable.pdf')
    ->run();

digits

Shortcut for ->configFile('digits').

echo (new TesseractOCR('img.png'))
    ->digits()
    ->run();

hocr

Shortcut for ->configFile('hocr').

echo (new TesseractOCR('img.png'))
    ->hocr()
    ->run();

pdf

Shortcut for ->configFile('pdf').

echo (new TesseractOCR('img.png'))
    ->pdf()
    ->run();

quiet

Shortcut for ->configFile('quiet').

echo (new TesseractOCR('img.png'))
    ->quiet()
    ->run();

tsv

Shortcut for ->configFile('tsv').

echo (new TesseractOCR('img.png'))
    ->tsv()
    ->run();

txt

Shortcut for ->configFile('txt').

echo (new TesseractOCR('img.png'))
    ->txt()
    ->run();

tempDir

Define a custom directory to store temporary files generated by tesseract. Make sure the directory actually exists and the user running php is allowed to write in there.

echo (new TesseractOCR('img.png'))
    ->tempDir('./my/custom/temp/dir')
    ->run();
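
The two requirements above (the directory exists and is writable by the PHP user) can be checked before calling `tempDir`. A minimal sketch; `isUsableTempDir` is a hypothetical helper and not part of the library.

```php
<?php
// Hypothetical pre-flight check for a custom temp directory: it must
// exist and be writable by the user running PHP.
function isUsableTempDir(string $dir): bool
{
    return is_dir($dir) && is_writable($dir);
}

// The system temp directory is normally usable.
var_dump(isUsableTempDir(sys_get_temp_dir()));
```

Running this check under the web server, and not only from the command line, matters because the two often run as different users.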

withoutTempFiles

Specify that tesseract should output the recognized text without writing to temporary files. The data is gathered from the standard output of tesseract instead.

echo (new TesseractOCR('img.png'))
    ->withoutTempFiles()
    ->run();

Other options

Any configuration option offered by Tesseract can be used like this:

echo (new TesseractOCR('img.png'))
    ->config('config_var', 'value')
    ->config('other_config_var', 'other value')
    ->run();

Or like this:

echo (new TesseractOCR('img.png'))
    ->configVar('value')
    ->otherConfigVar('other value')
    ->run();
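
Comparing the two forms shows the naming correspondence: `configVar` stands for `config_var` and `otherConfigVar` for `other_config_var`. The sketch below reproduces that camelCase to snake_case mapping; `camelToSnake` is a hypothetical helper, and this is not necessarily how the library performs the conversion internally.

```php
<?php
// Hypothetical helper: converts a camelCase method name into the
// snake_case name of the corresponding config variable.
function camelToSnake(string $name): string
{
    // Insert "_" before every uppercase letter except at the start,
    // then lowercase the whole string.
    return strtolower(preg_replace('/(?<!^)[A-Z]/', '_$0', $name));
}

echo camelToSnake('configVar'), PHP_EOL;      // config_var
echo camelToSnake('otherConfigVar'), PHP_EOL; // other_config_var
```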

More info: https://github.com/tesseract-ocr/tesseract/wiki/ControlParams

Thread-limit

Sometimes it is useful to limit the number of threads that tesseract is allowed to use. Pass the maximum number of threads to threadLimit:

echo (new TesseractOCR('img.png'))
    ->threadLimit(1)
    ->run();

How to contribute

You can contribute to this project by:

  • Opening an Issue if you found a bug or wish to propose a new feature;
  • Placing a Pull Request with code that fixes a bug, corrects missing or wrong documentation, or implements a new feature.

Just make sure you take a look at our Code of Conduct and Contributing instructions.

License

tesseract-ocr-for-php is released under the MIT License.

Made with love in Berlin

tesseract-ocr-for-php's People

Contributors

adamasantares, benmorel, betsuno, bourdaisj, den1n, drsassafras, iamvar, joshuamabina, lecodeurdudimanche, mal-risma, malanx, manubo, michaljusiega, mrusme, rhys-mcguckin, rizwanjiwan, scrutinizer-auto-fixer, sjorso, suud, thiagoalessio, tobias74, zoilomora

tesseract-ocr-for-php's Issues

Doesn't work with Microsoft IIS 7.5 server

hi,

I have a problem with IIS 7.5 server.
No output data is displayed.
However with Wamp server or command line on the IIS server, it works.

Can you help me?

Thank you very much.

exec takes no effect on tesseract

Hi,
I have tried your code, and it fails when it reaches "exec("tesseract $tifImage $outputFile nobatch $configFile");"
No txt file is generated, but it works if I run the same command directly in a Linux terminal.

Have you run into this? I am using CentOS 6.5 and PHP 5.5.

Hope you can confirm.

No errors return nothing with very easy test

Hello

I tried your tesseract-ocr-for-php to decode a number in a GIF file. I followed your installation instructions and your sample example, but it returns an empty page without any errors:

My image file -> http://hpics.li/4e167da
I also tried a text image made in Paint, with the same result -> http://hpics.li/708e750

My code

require_once './vendor/autoload.php'; //if you are using composer

$tesseract = new TesseractOCR('hello.png');
echo $tesseract->recognize();

Maybe I am doing something wrong?

Thanks in advance for your reply.

test
test2

Warnings in lines 236 and 249

Hi there,

I'm running Tesseract on a Linux server with PHP 5.4. I uploaded the main class file to classes/TesseractOCR.php and set chmod to 666, but unfortunately it still returns warnings instead of any values from the PNG file.

Here is my code:

require_once 'classes/TesseractOCR.php';
....
$tesseract = new TesseractOCR($file);
$tesseract->setTempDir('/absolute_path_to/temp/'); // temp folder have chmod 777
$tesseract->setWhitelist(range(0,9), '-+.'); // i'm trying to recognize phone numbers
echo $tesseract->recognize();

I tried many variations of the path to the temp folder (and also without it), but I still get nothing. I would very much appreciate any help or suggestion; I tried to google a solution, but every answer comes down to setting the temp folder, which does not help me.

Many thanks in advance.

file_get_contents

tesseract-ocr works properly on my local XAMPP system, but after uploading to an online Linux server it stops working and I get this error:

Warning: file_get_contents(temp/1135767769.txt): failed to open stream: No such file or directory in /home/catchmyd/public_html/ocr_demo/TesseractOCR/TesseractOCR.php on line 255

Warning: unlink(temp/1135767769.txt): No such file or directory in /home/catchmyd/public_html/ocr_demo/TesseractOCR/TesseractOCR.php on line 268

First example problem.

As shown in the README, I simply did this:

require_once 'TesseractOCR.php';
$tesseract = new TesseractOCR('foo.png');
echo $tesseract->recognize();

and I am getting an error

Warning: file_get_contents(/tmp/798893693.txt): failed to open stream: No such file or directory in /var/.../TesseractOCR.php on line 212

Warning: unlink(/tmp/798893693.txt): No such file or directory in /var/.../TesseractOCR.php on line 225

Both the image foo.png and TesseractOCR.php are in the same directory. Furthermore, I didn't understand the dependencies you talk about; I downloaded them anyway, but I have no idea where to put them.

How to set config or parameters

Please give an example snippet showing how to set initialization parameters, either directly or through config files,
so that we can do different tasks like applying a grayscale filter, etc.

setVerbose

I dont wanna see this Tesseract Open Source OCR Engine v3.02.02 with Leptonica each time.
I have added this after line 222 $command .= " 2>&1 1> /dev/null";. An option in the constructor or as a setter would be helpful for people annoyed with the message.

XY coordinates

Hello, how could I retrieve the XY coordinates for each word recognized ?

Doesn't display any thing

Hello,

I created a simple demo using the example, but it doesn't work: echo shows nothing. I ran
tesseract images/8055.png test
on the command line, in the same directory as the project, and it worked: it wrote the contents of the image "8055.png" to test.txt. Here is my code:
<?php
require_once './TesseractOCR.php';
$ec = new TesseractOCR('./images/text.jpeg');
echo $ec->run();
?>

I am using WAMP server as localhost. Here is my folder structure:
TesseractOCR.php
test.php
images
images/text.jpeg

What am I doing wrong? Please help.

file_get_contents error

Hello there,

I have built a small app on my local machine with this class. I had a lot of issues when I first ran it on my Windows machine, but I was able to fix them by making changes that I read about in discussions here.
Now I am trying to put my app online on a Linux server and I have an issue that is hard to fix.
When I run the script from the command line it works perfectly, but in the browser it says: file_get_contents(tempi/1525942339.txt): failed to open stream: No such file or directory in /var/www/html/ocr/TesseractOCR/TesseractOCR.php on line 236

I have tried some of the tricks discussed here, but none of them worked for me.

Regards
Agon

install on AWS EC2

It's not an issue, I just need support. How can I install it on an AWS EC2 Ubuntu machine?

psm command

Is it possible to set psm command into the script?

-psm N
Set Tesseract to only run a subset of layout analysis and assume a
certain form of image. The options for N are:

           0 = Orientation and script detection (OSD) only.
           1 = Automatic page segmentation with OSD.
           2 = Automatic page segmentation, but no OSD, or OCR.
           3 = Fully automatic page segmentation, but no OSD. (Default)
           4 = Assume a single column of text of variable sizes.
           5 = Assume a single uniform block of vertically aligned text.
           6 = Assume a single uniform block of text.
           7 = Treat the image as a single text line.
           8 = Treat the image as a single word.
           9 = Treat the image as a single word in a circle.
           10 = Treat the image as a single character.

Fix a small bug for set Tesseract language PHP API with chinese

Hi, thiagoalessio!
Glad to find and use your project tesseract-ocr-for-php; it helps me a lot. Thanks for your selfless contribution!
While working on my Chinese OCR project, I used your new API, (new TesseractOCR('xxx.png'))->lang('deu'). Your wiki says languages need to be specified as 3-character ISO 639-2 language codes, but on the ISO 639-2 page the code for Chinese is chi/zho, which does not work with your API.
For example, (new TesseractOCR('chinese.png'))->lang('chi') doesn't work. When I changed the code to (new TesseractOCR('chinese.png'))->lang('chi_sim') or (new TesseractOCR('chinese.png'))->lang('chi_tra'), matching the original tesseract langdata, it does work. This ISO 639-2 mismatch may not be limited to Chinese, so I hope you can look into it.
I will fix the wiki and open a pull request for your repository later; I hope it helps. Thanks again! I am a university student from China, and I look forward to helping with the tesseract-ocr-for-php project and making friends with you.

Dave, 2016.5.25, in China

failed to open stream

I'm trying to run this on WAMP server and it gives several warning messages. "Warning: file_get_contents(tmp/tesseract-ocr-output-24039.txt) [function.file-get-contents]: failed to open stream: No such file or directory in C:\wamp\www\tesseract-ocr-for-php-master\tesseract_ocr\tesseract_ocr.php on line 49"

How can I fix this issue? Thanks in advance.

Cannot run PSM(0)

Hi, I'm trying to run Tesseract in Orientation and script detection (OSD) only mode (PSM 0).

On the command line, setting PSM 0 works and produces something like:

Orientation: 0
Orientation in degrees: 0
Orientation confidence: 22.31
Script: 1
Script confidence: 36.67

I was hoping that the following code would produce that result, but instead it just gives a PSM 3 result of the OCR (default):

$tess = (new TesseractOCR(storage_path('app/doc.jpg')));
$tess->psm(0);
$text = $tess->run();
//Returns PSM 3 default OCR result

The other PSM modes (1-10) produce expected results, it's just PSM 0 which I can't get to work.

Is there a way to run PSM 0 and get the actual orientation result instead of the OCR? Is this an error, or is getting the OCR back for PSM 0 the expected result?

Thanks

text file is not generating on live website

My execute command:

protected function execute()
{
    $path = getenv('PATH');
    putenv("PATH=$path:/usr/local/bin/tesseract ");

    $pathtess = $path."/tessdata";
    $this->outputFile = rand();
    echo $this->buildTesseractCommand();
    exec($this->buildTesseractCommand(), $pathtess);
}

tesseract-ocr-for-php in laravel 5.2

I followed all the steps to install this library in Laravel 5.2
and I use it in my Controller, but it returns null and does nothing.
Could you help me? This is my code:

image

But it returns a blank page.

Language

Can we set the language of the OCR?

Simple test doesn't work

Hello

I'd like to use TesseractOCR, so I installed it with composer, as mentioned in the readme file, and I tried the simple example from the readme file, with the same picture. Here is my code:

<?php
require __DIR__.'/phpshell-2.4/vendor/autoload.php';

echo (new TesseractOCR('text.jpeg'))->run();

?>

I don't have any error and my file does exist (I tried with a if(is_file(...)) before). However, the returned string is null.
Am I missing something ?

documentation wrong

documentation seems to be all wrong.

No run() method (now recognize)
No lang() method (now setLanguage)

etc

not reading image

Code


include 'TesseractOCR.php';
//$obj = new TesseractOCR('text.png');
//var_dump($obj->run());
var_dump (new TesseractOCR('var/www/html/ocr-tesseract/src/text.png'))->run();
//echo dirname(__FILE__);
die('dafs');

object(TesseractOCR)#1 (8) { ["image":"TesseractOCR":private]=> string(39) "var/www/html/ocr-tesseract/src/text.png" ["executable":"TesseractOCR":private]=> string(9) "tesseract" ["tessdataDir":"TesseractOCR":private]=> NULL ["userWords":"TesseractOCR":private]=> NULL ["userPatterns":"TesseractOCR":private]=> NULL ["languages":"TesseractOCR":private]=> array(0) { } ["psm":"TesseractOCR":private]=> NULL ["configs":"TesseractOCR":private]=> array(0) { } }

file_get_contents error NEW

Hello there,

I have built a small app with this class, on my Linux shared server and also on a WAMP server on my local machine. I had a lot of issues.
Now I am trying to put my app online on a Linux server and I have an issue that is hard to fix.
When I run the script from the command line it works perfectly, but in the browser it says:

Warning: file_get_contents(temp/1135767769.txt): failed to open stream: No such file or directory in /home/catchmyd/public_html/ocr_demo/TesseractOCR/TesseractOCR.php on line 255

Warning: unlink(temp/1135767769.txt): No such file or directory in /home/catchmyd/public_html/ocr_demo/TesseractOCR/TesseractOCR.php on line 268

I have tried some of the tricks discussed here, but none of them worked for me.

Regards
big89

Returns empty string.

Hi,

I am using CentOS 5 with tesseract-2.04-2.

When I use tesseract on the console I can only read .tif images; for other formats it does not recognise the type.

But in my script it does not even read the tif, it just returns an empty string.

This is how I am using it.

include('TesseractOCR.php');
if (isset($_FILES['file'])) {
$tesseract = new TesseractOCR($_FILES['file']['tmp_name']);
$result = $tesseract->recognize();
}

Any idea where I am going wrong with it.

no image read

hello sir,

I am using TesseractOCR for image reading, but it does not work. The output for the image is blank. Please help me.

tmpfile stream not opened

Hi,

It's a great library, but I'm very new to TesseractOCR.

I cannot get any output from TesseractOCR('text.jpg'); the Apache server error log says:
"Warning in pixReadMemJpeg: work-around: writing to a temp file
Error in pixReadMemJpeg: tmpfile stream not opened
Error in pixReadMem: jpeg: no pix returned
Error during processing."

If I run Tesseract on the text.jpg image from the Terminal, it works without an issue and the output.txt file is generated. However, when I run the PHP file in the browser with the following code:
require_once __dir__ . '/vendor/autoload.php';
echo (new TesseractOCR('text.jpg'))
    ->run();

it won't render and produces the error above.
Here is what I did:
I have added full path to the Tesseract executable file.
On the TesseractOCR.php, I have add a full path to the $executable variable (line 20)
private $executable = '/opt/local/bin/tesseract';

Can you please explain what the error means and what I am doing wrong here?

My operating system is:
OSX El Capitan 10.11.4
XAMPP Server
PHP Version 5.6.8

Thanks
Mike

Tesseract OCR

I am basically a beginner in programming, and I am developing a web application that converts images into text. I have used Orcad for this, but it only handles simple text and is causing many issues. I want to know whether this library would help me: I want to give users an interface where they can upload a file in formats like PNG or JPEG, and have the web app convert it into text. I am currently learning Node.js for this. Kindly advise which is better and easier to learn for this particular project.

No issues just question

I want to use a buffered image or raw image data as input to tesseract.
Let's say I have a big image and I want tesseract to read only a small portion of it.
I can use the ImageMagick crop method; the problem is that
new TesseractOCR() // accepts only an existing file, not raw data.
That means I need to use the Imagemagick->writeImage() method to create a local file, which is not efficient.
Does anyone know how to pass image data directly to tesseract? Thanks.

cropImage(100,100,0,0);
$a = new TesseractOCR($img); // I want tesseract to accept image directly without writing it.
echo $a->recognize();
?>

missing error handling during execution of tesseract

Hi guys,

if, for example, the input is not a valid file, tesseract's error messages are taken as the recognized text, which I guess is unwanted behavior. It would be better to check the return code of the tesseract command and throw an exception in case of errors.
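
The behavior requested here can be sketched with PHP's `exec()`, which reports the exit code through its third parameter. This is an illustration of the idea only; `runOrFail` is a hypothetical wrapper and not the library's implementation.

```php
<?php
// Hypothetical wrapper: runs a shell command and throws when the exit
// code is non-zero, instead of returning the output as if it were text.
function runOrFail(string $command): string
{
    // exec() collects stdout lines in $outputLines and the exit code in
    // $exitCode; stderr is not captured unless the command redirects it.
    exec($command, $outputLines, $exitCode);
    if ($exitCode !== 0) {
        throw new RuntimeException("Command failed with exit code $exitCode: $command");
    }
    return implode("\n", $outputLines);
}

echo runOrFail('echo hello'), PHP_EOL; // hello
```

A caller would wrap the tesseract invocation in `try`/`catch` and report the failure, so that error text is never mistaken for an OCR result.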

Multi-language?

Can I use something like this?

$tesseract->setLanguage('rus+eng');

Stupid Question

If i have bought web-hosting can i use that?
I mean, does this REQUIRE to have any software installed?
Or just PHP should work ?

Disabling dictionary

Is there any support in thiagoalessio/tesseract-ocr-for-php for disabling the language dictionary?

Libraries(Chinese)

Where do I put the language pack for Chinese? I am also curious where the core code is; I don't see how these files retrieve other code.

Just wondering how github/composer work xD

Not recognizing number 8

I'm having a problem with the number 8: it is read as 3.
My code:
$text = (new TesseractOCR("resized.png"))
    ->whitelist(range('A', 'z'), range(0,9), range(":",","))
    ->psm(6)
    ->run();
Result:
image

I already tried changing the PSM, but with no success.
Can anyone help?

Install into Xampp not working gave blank result?

Hi, I've installed tesseract using composer on my Windows XAMPP installation (PHP 5.5.37, localhost). Everything runs in the 'htdocs/picture' directory (version 1.0.0-RC). I tried testing the program with the code below, written in test.php, but it returns a blank echo. Please enlighten me. Thanks.

<?php
require_once 'vendor/autoload.php'; // Tried omitting this statement however it return me Fatal error: Class 'TesseractOCR' not found in C:\xampp\htdocs\picture\test.php on line 3, therefore added it back but gave blank statement.

echo (new TesseractOCR('text.jpeg'))->run();
?>

setLanguage

When using the setLanguage option it throws an error.
I fixed this by placing $command.= " {$this->outputFile}"; above the $command.= " -l {$this->language}";

Absolute paths for image and TempDir

Would have been nice to mention you can't use absolute paths for image location and TempDir. I had this working in "vanilla" PHP in minutes, but then when I tried to integrate it into Laravel, it just wouldn't work. It took me 4 hours of trial and error only to realize

$tesseract = new TesseractOCR("C:\Program Files\wamp\www\test\public\img\hello.png");

doesn't work, and that only relative paths work.

setCommand()

After every update I have to modify your file again, for both local and prod, because they all have different paths for the tesseract command.

Make it simple and allow people to set a custom tesseract path.

PDF support

Thiago,
I believe supporting OCR on PDF files is simple. I needed this feature and added a function to the code that converts a PDF to TIFF.
Here it is:
function convertPDFToTif($originalImage) {
    $tifImage = sys_get_temp_dir().'/tesseract-ocr-tif-'.rand().'.tif';
    // Escape both paths so spaces or shell metacharacters cannot break the command.
    $output = escapeshellarg($tifImage);
    $input = escapeshellarg($originalImage);
    exec("gs -dNOPAUSE -q -r300 -sDEVICE=tiffg4 -dBATCH -sOutputFile=$output $input");
    return $tifImage;
}

Naturally, ghostscript must be installed.
