thiagoalessio / tesseract-ocr-for-php

A wrapper to work with Tesseract OCR inside PHP.

Home Page: https://packagist.org/packages/thiagoalessio/tesseract_ocr

License: MIT License

PHP 100.00%

image-to-text ocr php tesseract text-recognition


tesseract-ocr-for-php's Issues

text file is not generating on live website

My execute command:
protected function execute()
{
    $path = getenv('PATH');
    putenv("PATH=$path:/usr/local/bin/tesseract ");

    $pathtess = $path."/tessdata";
    $this->outputFile = rand();
    echo $this->buildTesseractCommand();
    exec($this->buildTesseractCommand(),$pathtess);
}
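
One thing that may be worth checking in the snippet above: PATH entries are directories, so appending the binary itself (/usr/local/bin/tesseract, plus a trailing space) probably does not help the shell find it. Also, the second argument of exec() is an array that receives the command output, not a data path. A minimal sketch of the PATH part, where appendToPath() is a hypothetical helper name and /usr/local/bin is only an assumed install location:

```php
<?php
// Hypothetical helper: append a directory (not the binary itself) to PATH
// so that child processes started by exec() can find commands in it.
function appendToPath(string $dir): string
{
    $current = (string) getenv('PATH');
    $parts = $current === '' ? [] : explode(PATH_SEPARATOR, $current);
    if (!in_array($dir, $parts, true)) {
        $parts[] = $dir;
    }
    $path = implode(PATH_SEPARATOR, $parts);
    putenv("PATH=$path"); // note: no trailing space
    return $path;
}
```

Calling appendToPath('/usr/local/bin') before exec() would then be enough, assuming that is where tesseract lives on the server.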

setVerbose

I don't want to see the "Tesseract Open Source OCR Engine v3.02.02 with Leptonica" banner on every run.
As a workaround I added $command .= " 2>&1 1> /dev/null"; after line 222. An option in the constructor, or a setter, would help people who are annoyed by the message.
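
Until such an option exists, the redirection can live outside the class. This is a rough sketch under the assumption that the banner is written to stderr; silenceStderr() is a made-up helper name, not part of the library:

```php
<?php
// Hypothetical helper: discard everything a command writes to stderr.
// Assumes the Tesseract banner goes to stderr, which may vary by version.
function silenceStderr(string $command): string
{
    // PHP_OS_FAMILY needs PHP 7.2 or later.
    $null = PHP_OS_FAMILY === 'Windows' ? 'NUL' : '/dev/null';
    return $command . ' 2>' . $null;
}
```

The trade-off is that real error messages are discarded too, so this is probably best kept behind a switch.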

tesseract-ocr-for-php in laravel 5.2

I followed all the steps to install this library in Laravel 5.2
and I use it in my controller, but it returns null and does nothing.
Could you help me? This is my code

But it returns a blank page

tmpfile stream not opened

Hi,

It's a great library, but I'm very new to TesseractOCR.

I cannot get any output from TesseractOCR('text.jpg'); the Apache error log says:
"Warning in pixReadMemJpeg: work-around: writing to a temp file
Error in pixReadMemJpeg: tmpfile stream not opened
Error in pixReadMem: jpeg: no pix returned
Error during processing."

If I run Tesseract on text.jpg from the Terminal, it works without any issue and output.txt is generated. However, when I run the PHP file in the browser with the following code:
require_once __DIR__ . '/vendor/autoload.php';
echo (new TesseractOCR('text.jpg'))
->run();

it doesn't render and produces the error above.
Here is what I did:
I added the full path to the Tesseract executable.
In TesseractOCR.php, I set the $executable variable (line 20) to the full path:
private $executable = '/opt/local/bin/tesseract';

Can you please tell me what the error means and what I am doing wrong?

My operating system is:
OSX El Capitan 10.11.4
XAMPP Server
PHP Version 5.6.8

Thanks
Mike

file_get_contents error NEW

Hello there,

I have built a small app with this class, on my Linux shared server and also on a WAMP server on my local machine. I had a lot of issues.
Now I am trying to put my app online on a Linux server and I have an issue that is hard to fix.
When I run the script from the command line it works perfectly, but in the browser it says:

Warning: file_get_contents(temp/1135767769.txt): failed to open stream: No such file or directory in /home/catchmyd/public_html/ocr_demo/TesseractOCR/TesseractOCR.php on line 255

Warning: unlink(temp/1135767769.txt): No such file or directory in /home/catchmyd/public_html/ocr_demo/TesseractOCR/TesseractOCR.php on line 268

I have tried some of the tricks discussed here, but none of them worked for me.

Regards
big89
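
When a script works from the command line but not in the browser, the web server user often has a different PATH, different permissions, or exec() disabled. The following is only a diagnostic sketch; findOcrProblems() is a hypothetical helper, and the directory and executable passed to it are whatever your own setup uses:

```php
<?php
// Hypothetical diagnostic: list common reasons why the output .txt file
// is never created when the script runs under the web server user.
function findOcrProblems(string $tempDir, string $executable = 'tesseract'): array
{
    $problems = [];

    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
    if (!function_exists('exec') || in_array('exec', $disabled, true)) {
        $problems[] = 'exec() is disabled in this PHP configuration';
    }

    if (!is_dir($tempDir)) {
        $problems[] = "temp directory does not exist: $tempDir";
    } elseif (!is_writable($tempDir)) {
        $problems[] = "temp directory is not writable by the PHP user: $tempDir";
    }

    // Only a path (something containing a slash) can be checked directly;
    // a bare command name depends on the PATH of the web server process.
    if (strpbrk($executable, '/\\') !== false && !is_executable($executable)) {
        $problems[] = "executable not found or not executable: $executable";
    }

    return $problems;
}
```

Running print_r(findOcrProblems('temp')) from the same page that fails in the browser should show whether one of these applies. A relative directory such as temp/ is resolved against the current working directory, which can also differ between CLI and web requests.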

setCommand()

After every update I have to modify your file again for my local and production environments, because they each have a different path to the tesseract command.

Make it simple and allow people to set a custom tesseract path.
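
As a stopgap, the path can be resolved outside the class so that the library file itself stays untouched. A minimal sketch, assuming an environment variable named TESSERACT_PATH (that name is invented for this example) is set differently on each machine:

```php
<?php
// Hypothetical helper: pick the tesseract binary per environment.
// TESSERACT_PATH is an assumed variable name, not something the library reads.
function tesseractExecutable(string $default = 'tesseract'): string
{
    $fromEnv = getenv('TESSERACT_PATH');
    return ($fromEnv !== false && $fromEnv !== '') ? $fromEnv : $default;
}
```

The returned value would then be passed to whatever setter or constructor option the installed version of the class offers for the executable.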

documentation wrong

The documentation seems to be all wrong.

No run() method (now recognize)
No lang() method (now setLanguage)

etc

PDF support

Thiago,
I believe supporting OCR on PDF files is simple. I needed this feature and added a function to the code that converts a PDF to TIFF:
Here it is:
function convertPDFToTif($originalImage) {
    // Rasterize the PDF at 300 dpi into a temporary TIFF file.
    $tifImage = sys_get_temp_dir().'/tesseract-ocr-tif-'.rand().'.tif';
    // escapeshellarg() keeps paths with spaces or shell metacharacters from breaking the command.
    exec('gs -dNOPAUSE -q -r300 -sDEVICE=tiffg4 -dBATCH -sOutputFile='.escapeshellarg($tifImage).' '.escapeshellarg($originalImage));
    return $tifImage;
}

Naturally, Ghostscript must be installed.

How to limit the number of characters?

I want to limit the number of characters.
How do I limit the number of characters?

Tesseract OCR doesn't work

Hi,

I'm trying to install Tesseract for PHP on my web server. I installed it by downloading the source from GitHub. Now I want to install https://github.com/tesseract-ocr/tesseract, but I don't know where to change the $path or where to put these scripts.
Can you please help me?
Thanks!

file_get_contents error

Hello there,

I have built a small app on my local machine with this class. I had a lot of issues when I first ran it on my Windows machine, but I was able to fix them by making the changes I read about in the discussions here.
Now I am trying to put my app online on a Linux server and I have an issue that is hard to fix.
When I run the script from the command line it works perfectly, but in the browser it says: file_get_contents(tempi/1525942339.txt): failed to open stream: No such file or directory in /var/www/html/ocr/TesseractOCR/TesseractOCR.php on line 236

I have tried some of the tricks discussed here, but none of them worked for me.

Regards
Agon

Fatal error: Class 'PHPUnit_Framework_TestCase' not found

I downloaded it as a zip file, set it up as a PHP project, and got this error.
I don't know whether this is a Laravel project or not, but I uploaded it without Laravel.

Fix a small bug when setting the Tesseract language to Chinese via the PHP API

Hi, thiagoalessio!
Glad to find and use your project tesseract-ocr-for-php; it helps me a lot! Thanks for your selfless contribution!
While coding my Chinese OCR project I used your new API: (new TesseractOCR('xxx.png'))->lang('deu'). Your wiki says languages need to be specified as 3-character ISO 639-2 language codes. But on the ISO 639-2 page, the code for Chinese is chi/zho, which does not work with your API.
For example, (new TesseractOCR('chinese.png'))->lang('chi') doesn't work. When I changed the code to (new TesseractOCR('chinese.png'))->lang('chi_sim') or (new TesseractOCR('chinese.png'))->lang('chi_tra'), matching the original Tesseract langdata names, it works! Maybe this ISO 639-2 mismatch does not only affect Chinese. Hope you can look into it!
I will fix the wiki and send a pull request to your repository later; hope it helps! Thanks again! I am a university student from China, and I hope to help with this tesseract-ocr-for-php project and make friends with you!

Dave, 2016.5.25, in China
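
Since the accepted codes follow the installed traineddata names rather than ISO 639-2 strictly, it can help to ask the binary which languages it actually has. A rough sketch, assuming the installed Tesseract supports the --list-langs option and prints a header line ending in a colon; both function names are hypothetical:

```php
<?php
// Hypothetical helper: turn the lines printed by `tesseract --list-langs`
// into a plain list of language codes such as chi_sim or chi_tra.
function parseLanguageList(array $lines): array
{
    $langs = [];
    foreach ($lines as $line) {
        $line = trim($line);
        // Skip blank lines and the "List of available languages (N):" header.
        if ($line === '' || substr($line, -1) === ':') {
            continue;
        }
        $langs[] = $line;
    }
    return $langs;
}

// Hypothetical wrapper: returns an empty list when the command fails.
function installedLanguages(string $executable = 'tesseract'): array
{
    $lines = [];
    $exitCode = 0;
    // Some versions print the list on stderr, hence the 2>&1.
    exec(escapeshellarg($executable) . ' --list-langs 2>&1', $lines, $exitCode);
    return $exitCode === 0 ? parseLanguageList($lines) : [];
}
```

A call like in_array('chi_sim', installedLanguages(), true) then tells you whether that code can be passed to lang().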

Multi-page PDF?

Is there support for multi-page PDFs?
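
One possible workaround outside the library is to let Ghostscript split the PDF into one TIFF per page and run OCR on each file. This is a sketch only: buildPdfToPagesCommand() is a hypothetical helper, the gs options are the same ones used in the PDF conversion snippet posted in another issue here, and %03d in the output name is Ghostscript's page-number placeholder:

```php
<?php
// Hypothetical helper: build a Ghostscript command that writes one TIFF per page.
// $outputPattern must contain a printf-style page placeholder such as %03d.
function buildPdfToPagesCommand(string $pdf, string $outputPattern): string
{
    if (strpos($outputPattern, '%') === false) {
        throw new InvalidArgumentException('Output pattern needs a page placeholder like %03d');
    }
    // Caveat: on Windows, escapeshellarg() replaces % characters, so the
    // pattern would need different quoting there.
    return 'gs -dNOPAUSE -q -r300 -sDEVICE=tiffg4 -dBATCH'
        . ' -sOutputFile=' . escapeshellarg($outputPattern)
        . ' ' . escapeshellarg($pdf);
}
```

After running the command with exec(), something like glob('/tmp/page-*.tif') would give the per-page files to feed to TesseractOCR one by one.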

Khmer Language Support

Does the OCR support the Khmer language?

Tesseract OCR PHP in cPanel

How can I use your PHP wrapper in cPanel? I am unable to find any tutorial on it.
Thank you

Only parse a particular area of an image

Is it possible to define a specific area within the image to be parsed, like x, y, w, h?

not reading image

Code


include 'TesseractOCR.php';
//$obj = new TesseractOCR('text.png');
//var_dump($obj->run());
var_dump (new TesseractOCR('var/www/html/ocr-tesseract/src/text.png'))->run();
//echo dirname(__FILE__);
die('dafs');

object(TesseractOCR)#1 (8) { ["image":"TesseractOCR":private]=> string(39) "var/www/html/ocr-tesseract/src/text.png" ["executable":"TesseractOCR":private]=> string(9) "tesseract" ["tessdataDir":"TesseractOCR":private]=> NULL ["userWords":"TesseractOCR":private]=> NULL ["userPatterns":"TesseractOCR":private]=> NULL ["languages":"TesseractOCR":private]=> array(0) { } ["psm":"TesseractOCR":private]=> NULL ["configs":"TesseractOCR":private]=> array(0) { } }
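
The dump above shows the TesseractOCR object rather than recognized text, which fits how PHP parses that line: var_dump (new TesseractOCR(...))->run() is read as var_dump(new TesseractOCR(...)) followed by ->run() on var_dump's return value. The sketch below uses a stand-in class (StubOcr is invented here so the snippet runs without the library) to show the parenthesization that dumps the result instead:

```php
<?php
// Stand-in for TesseractOCR so the example runs without the library or binary.
class StubOcr
{
    private $image;

    public function __construct(string $image)
    {
        $this->image = $image;
    }

    public function run(): string
    {
        return "text from {$this->image}";
    }
}

// Call run() first, then dump the string it returns.
$text = (new StubOcr('/var/www/html/ocr-tesseract/src/text.png'))->run();
var_dump($text);
```

Note also that the path in the original code has no leading slash, so it is treated as relative to the current directory; the absolute form is assumed in this sketch.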

Language

Can we set the language of the OCR?

psm command

Is it possible to set the psm option in the script?

-psm N
Set Tesseract to only run a subset of layout analysis and assume a
certain form of image. The options for N are:

           0 = Orientation and script detection (OSD) only.
           1 = Automatic page segmentation with OSD.
           2 = Automatic page segmentation, but no OSD, or OCR.
           3 = Fully automatic page segmentation, but no OSD. (Default)
           4 = Assume a single column of text of variable sizes.
           5 = Assume a single uniform block of vertically aligned text.
           6 = Assume a single uniform block of text.
           7 = Treat the image as a single text line.
           8 = Treat the image as a single word.
           9 = Treat the image as a single word in a circle.
           10 = Treat the image as a single character.
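
Other issues in this list call a psm() method on the wrapper, so newer versions appear to expose this. For code that builds the command by hand, a small sketch of the flag follows; psmFlag() is a hypothetical helper, the 0 to 10 range simply mirrors the list quoted above, and the double-dash spelling for Tesseract 4 and later is worth verifying against your installed version:

```php
<?php
// Hypothetical helper: build the page segmentation flag for the command line.
// Tesseract 3.x documents "-psm N"; 4.x and later use "--psm N".
function psmFlag(int $mode, int $tesseractMajor = 3): string
{
    if ($mode < 0 || $mode > 10) {
        throw new InvalidArgumentException("PSM must be between 0 and 10, got $mode");
    }
    $flag = $tesseractMajor >= 4 ? '--psm' : '-psm';
    return "$flag $mode";
}
```

For example, psmFlag(7) gives the option for treating the image as a single text line on a 3.x install.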

file_get_contents

It works properly on my local XAMPP setup with tesseract-ocr, but after uploading to an online Linux server it does not work and I get this error:

Warning: file_get_contents(temp/1135767769.txt): failed to open stream: No such file or directory in /home/catchmyd/public_html/ocr_demo/TesseractOCR/TesseractOCR.php on line 255

Warning: unlink(temp/1135767769.txt): No such file or directory in /home/catchmyd/public_html/ocr_demo/TesseractOCR/TesseractOCR.php on line 268

install on AWS EC2

It's not an issue; I just need support. How can I install it on an AWS EC2 Ubuntu machine?

Fatal error: Call to undefined method TesseractOCR::run() in /var/www/fixtures/index.php on line 11

Error

Fatal error: Call to undefined method TesseractOCR::run() in /var/www/fixtures/index.php on line 11

PHP Code

<?php
error_reporting(E_ALL);
ini_set('display_errors', 1);
include ('.vendor/autoload.php');
echo (new TesseractOCR('screen.jpg'))
    ->run();

?>

any ideas?

"thiagoalessio/tesseract_ocr": "^0.2.1"

installed
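
Judging from the reports in this list, the 0.x releases expose recognize() while the newer README examples use run(), so the installed version and the documentation being read may simply not match. A defensive sketch that works with either; recognizeText() is a hypothetical helper and the method names are taken from the issues here, not verified against every release:

```php
<?php
// Hypothetical helper: call whichever recognition method the installed
// version of the wrapper provides.
function recognizeText($ocr): string
{
    foreach (['run', 'recognize'] as $method) {
        if (method_exists($ocr, $method)) {
            return (string) $ocr->$method();
        }
    }
    throw new BadMethodCallException('Object has neither run() nor recognize()');
}
```

Usage would be echo recognizeText(new TesseractOCR('screen.jpg')); updating the Composer constraint to a release that matches the README is the cleaner fix.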

Returns empty string.

Hi,

I am using CentOS 5 with tesseract-2.04-2.

When I use tesseract on the console I can only read .tif images; for other formats it does not recognise the type.

But in my script it does not even read the .tif; it just returns an empty string.

This is how I am using it:

include('TesseractOCR.php');
if (isset($_FILES['file'])) {
    $tesseract = new TesseractOCR($_FILES['file']['tmp_name']);
    $result = $tesseract->recognize();
}

Any idea where I am going wrong with it?

codeclimate configuration

Hello,

When the automated tests run on Travis, you get this error:
Notice: Undefined index: CODECLIMATE_REPO_TOKEN

See the end of the log: https://api.travis-ci.org/jobs/175863052/log.txt?deansi=true

exec takes no effect on tesseract

Hi,
I have tried your code; it fails when it reaches exec("tesseract $tifImage $outputFile nobatch $configFile");
No txt file is generated, but it works if I run the same command directly in a Linux terminal.

Have you run into this? I am using CentOS 6.5 and PHP 5.5.

Hope you can confirm.

Disabling dictionary

Is there any support in thiagoalessio/tesseract-ocr-for-php for disabling the language dictionary?

Libraries(Chinese)

Where do I put the language pack for Chinese? I'm also curious where the core code is; I don't see how these files retrieve the other code.

Just wondering how GitHub/Composer work xD

No errors, but returns nothing on a very simple test

Hello

I tried your tesseract-ocr-for-php to decode a number in a GIF file. I followed your installation instructions and your sample, but it returns an empty result page without any errors:

my image file -> http://hpics.li/4e167da
I also tried text drawn in Paint, like this, but got the same result -> http://hpics.li/708e750

My code

require_once './vendor/autoload.php'; //if you are using composer

$tesseract = new TesseractOCR('hello.png');
echo $tesseract->recognize();

Maybe I am doing something wrong?

Thanks in advance for your reply.

Cannot run PSM(0)

Hi, I'm trying to run Tesseract in Orientation and script detection (OSD) only mode (PSM 0).

On the command line, setting PSM 0 works and produces something like:

Orientation: 0
Orientation in degrees: 0
Orientation confidence: 22.31
Script: 1
Script confidence: 36.67

I was hoping that the following code would produce that result, but instead it just gives a PSM 3 result of the OCR (default):

$tess = (new TesseractOCR(storage_path('app/doc.jpg')));
$tess->psm(0);
$text = $tess->run();
//Returns PSM 3 default OCR result

The other PSM modes (1-10) produce expected results, it's just PSM 0 which I can't get to work.

Is there a way to run PSM 0 and get the actual orientation result instead of the OCR? Is this an error, or is getting the OCR back for PSM 0 the expected result?

Thanks
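
If the wrapper only returns the contents of the output text file, the OSD report printed on the console would be lost; that is an assumption about the cause, not something confirmed here. One workaround is to run the command yourself, capture the console output, and parse it. A minimal parsing sketch, with parseOsd() as a hypothetical helper and the input format taken from the output quoted above:

```php
<?php
// Hypothetical helper: parse "Key: value" lines from OSD console output
// into an associative array, converting numeric values to int or float.
function parseOsd(string $output): array
{
    $result = [];
    foreach (preg_split('/\R/', $output) as $line) {
        if (preg_match('/^\s*([^:]+):\s*(.+?)\s*$/', $line, $m)) {
            $result[$m[1]] = is_numeric($m[2]) ? $m[2] + 0 : $m[2];
        }
    }
    return $result;
}
```

The text to parse could come from exec() with 2>&1 appended to the command, joined with implode("\n", $lines).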

How to set config or parameters

Please give an example snippet showing how to set initialization parameters, either directly or through config files,
so that we can do different tasks like applying a grayscale filter, etc.
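
For hand-built commands, Tesseract variables can usually be passed with -c name=value on reasonably recent releases; older builds may only accept a config file name at the end of the command. The sketch below assumes the -c form, and configFlags() is a hypothetical helper. The two variables in the usage note are the ones commonly used to switch off the dictionaries. Grayscale conversion, as far as I know, is image preprocessing done before OCR rather than a Tesseract parameter.

```php
<?php
// Hypothetical helper: turn ['name' => 'value'] pairs into "-c name=value" flags.
function configFlags(array $variables): string
{
    $flags = [];
    foreach ($variables as $name => $value) {
        if (!preg_match('/^[A-Za-z0-9_]+$/', (string) $name)) {
            throw new InvalidArgumentException("Invalid variable name: $name");
        }
        $flags[] = '-c ' . escapeshellarg("$name=$value");
    }
    return implode(' ', $flags);
}
```

For instance, configFlags(['load_system_dawg' => 'F', 'load_freq_dawg' => 'F']) produces flags that can be appended after the output base name.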

Doesn't work with Microsoft IIS 7.5 server

hi,

I have a problem with IIS 7.5 server.
No output data is displayed.
However with Wamp server or command line on the IIS server, it works.

Can you help me?

Thank you very much.

XY coordinates

Hello, how could I retrieve the XY coordinates for each word recognized ?
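
One route is Tesseract's hOCR output (requested by adding hocr as the config name at the end of the command), which wraps each word in a span carrying a bbox x0 y0 x1 y1 entry in its title attribute. The parser below is a sketch that assumes the class attribute comes before title, as in typical Tesseract output; parseHocrWords() is a hypothetical helper, and a real HTML parser would be more robust than this regular expression:

```php
<?php
// Hypothetical helper: extract word text and bounding boxes from hOCR markup.
// Assumes spans shaped like:
//   <span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 95'>Hello</span>
function parseHocrWords(string $hocr): array
{
    $pattern = '/<span[^>]*class=[\'"]ocrx_word[\'"][^>]*title=[\'"]bbox (\d+) (\d+) (\d+) (\d+)[^\'"]*[\'"][^>]*>(.*?)<\/span>/s';
    preg_match_all($pattern, $hocr, $matches, PREG_SET_ORDER);

    $words = [];
    foreach ($matches as $m) {
        $words[] = [
            // Words may contain inline tags such as <strong>; keep only the text.
            'text' => html_entity_decode(strip_tags($m[5]), ENT_QUOTES),
            'x0' => (int) $m[1],
            'y0' => (int) $m[2],
            'x1' => (int) $m[3],
            'y1' => (int) $m[4],
        ];
    }
    return $words;
}
```

The hOCR file is written next to the usual output; depending on the Tesseract version its extension is .html or .hocr.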

Warnings in lines 236 and 249

Hi there,

I'm running Tesseract on a Linux server with PHP 5.4. I uploaded the main class file to classes/TesseractOCR.php and set its chmod to 666, but unfortunately it still returns warnings instead of any values from the PNG file.

Here is my code:

require_once 'classes/TesseractOCR.php';
....
$tesseract = new TesseractOCR($file);
$tesseract->setTempDir('/absolute_path_to/temp/'); // temp folder has chmod 777
$tesseract->setWhitelist(range(0,9), '-+.'); // I'm trying to recognize phone numbers
echo $tesseract->recognize();

I tried many variations of the path to the temp folder (and also without it), but I still get nothing. I'd really appreciate any help or suggestions; I tried to google a solution, but every answer comes down to setting the temp folder, which doesn't help me.

Many thanks in advance.

setLanguage

When using the setLanguage option it throws an error.
I fixed this by placing $command.= " {$this->outputFile}"; above the $command.= " -l {$this->language}";
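
This matches the usage line of Tesseract 3.x, where the image and the output base come first and options such as -l follow them. A small sketch of that ordering; ocrCommand() is a hypothetical helper, and joining several languages with + assumes a Tesseract version that supports multiple languages in one run:

```php
<?php
// Hypothetical helper: build a command in the order
//   tesseract <image> <outputbase> [-l lang]
function ocrCommand(string $image, string $outputBase, array $languages = []): string
{
    $command = 'tesseract ' . escapeshellarg($image) . ' ' . escapeshellarg($outputBase);
    if ($languages !== []) {
        // Several languages are joined with "+", for example rus+eng.
        $command .= ' -l ' . escapeshellarg(implode('+', $languages));
    }
    return $command;
}
```

Keeping the option after the output base avoids the output name being swallowed as the argument of -l.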

Multi-language?

Can I use something like this?

$tesseract->setLanguage('rus+eng');

no image read

Hello sir,

I am using TesseractOCR to read an image, but it does not work: the output is blank. Please help me.

Stupid Question

If I have bought web hosting, can I use this?
I mean, does it REQUIRE any software to be installed?
Or should just PHP work?

Simple test doesn't work

Hello

I'd like to use TesseractOCR, so I installed it with Composer, as mentioned in the README, and I tried the simple example from the README with the same picture. Here is my code:

<?php
require __DIR__.'/phpshell-2.4/vendor/autoload.php';

echo (new TesseractOCR('text.jpeg')) ->run();

?>

I don't get any error and my file does exist (I checked with if(is_file(...)) beforehand). However, the returned string is null.
Am I missing something?

First example problem.

As shown in the README, I simply did this:

require_once 'TesseractOCR.php';
$tesseract = new TesseractOCR('foo.png');
echo $tesseract->recognize();

and I am getting an error

Warning: file_get_contents(/tmp/798893693.txt): failed to open stream: No such file or directory in /var/.../TesseractOCR.php on line 212

Warning: unlink(/tmp/798893693.txt): No such file or directory in /var/.../TesseractOCR.php on line 225

Both the image foo.png and TesseractOCR.php are in the same directory. Furthermore, I didn't understand the dependencies you mention; I downloaded them anyway, but I have no idea where to put them.

Absolute paths for image and TempDir

Would have been nice to mention you can't use absolute paths for image location and TempDir. I had this working in "vanilla" PHP in minutes, but then when I tried to integrate it into Laravel, it just wouldn't work. It took me 4 hours of trial and error only to realize

$tesseract = new TesseractOCR("C:\Program Files\wamp\www\test\public\img\hello.png");

doesn't work, and that only relative paths work.
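
A possible explanation, not confirmed for this case: inside a double-quoted PHP string, \t is an escape sequence, so the \test part of that Windows path turns into a tab character followed by "est". The space in "Program Files" also needs quoting once the path reaches the shell. The sketch below only demonstrates the string behaviour:

```php
<?php
// In double quotes "\t" becomes a tab; in single quotes it stays backslash + t.
$doubleQuoted = "C:\Program Files\wamp\www\test\public\img\hello.png";
$singleQuoted = 'C:\Program Files\wamp\www\test\public\img\hello.png';

$doubleHasTab = strpos($doubleQuoted, "\t") !== false;
$singleHasTab = strpos($singleQuoted, "\t") !== false;

// Quote the path before placing it in a shell command, because of the space.
$shellSafe = escapeshellarg($singleQuoted);
```

Using single quotes, forward slashes, or doubled backslashes for the path would be worth trying before concluding that absolute paths are unsupported.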

How to use regular expressions with tesseract-php

I'm running tesseract on Windows (XAMPP). I want the result in a specific format: it should match \d{1,9}.((v|V)|(x|X)). Can someone help me set up that rule for tesseract in PHP?
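
One simple option is to leave the OCR itself alone and filter its output in PHP afterwards. A minimal sketch; extractCodes() is a hypothetical helper, and it assumes the dot in the pattern is meant as a literal dot (in the expression as written, an unescaped . matches any character):

```php
<?php
// Hypothetical helper: pull every "1 to 9 digits, dot, v or x" token out of OCR text.
function extractCodes(string $text): array
{
    preg_match_all('/\b\d{1,9}\.[vVxX]\b/', $text, $matches);
    return $matches[0];
}
```

The input would be the string returned by the wrapper; restricting the recognized characters with a whitelist may improve what reaches this filter.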

failed to open stream

I'm trying to run this on a WAMP server and it gives several warning messages: "Warning: file_get_contents(tmp/tesseract-ocr-output-24039.txt) [function.file-get-contents]: failed to open stream: No such file or directory in C:\wamp\www\tesseract-ocr-for-php-master\tesseract_ocr\tesseract_ocr.php on line 49"

How do I fix this issue? Thanks in advance.

missing error handling during execution of tesseract

Hi guys,

If, for example, the input is not a valid file, tesseract's error messages are taken as the recognized text, which is unwanted behavior I guess. It would be better to check the return code of the tesseract command and throw an exception in case of errors.
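
A sketch of what that could look like, using the exit code that exec() reports through its third argument; runOrFail() is a hypothetical name, not an existing method of the library:

```php
<?php
// Hypothetical helper: run a command and throw if it exits with a non-zero code.
function runOrFail(string $command): array
{
    $output = [];
    $exitCode = 0;
    // 2>&1 folds error messages into $output so they can go into the exception.
    exec($command . ' 2>&1', $output, $exitCode);
    if ($exitCode !== 0) {
        throw new RuntimeException(
            "Command failed with exit code $exitCode: " . implode("\n", $output),
            $exitCode
        );
    }
    return $output;
}
```

The caller can then catch RuntimeException and never treats an error message as OCR text.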

Not recognizing number 8

I'm having a problem with the number 8: it is read as 3.
My code:
$text = (new TesseractOCR("resized.png")) ->whitelist(range('A', 'z'), range(0,9), range(":",",")) ->psm(6) ->run();
Result:

I already changed the PSM, with no success.
Can anyone help?
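
This may not explain the 8 versus 3 confusion, but the whitelist in that call is probably wider than intended: range('A', 'z') walks the ASCII table and so also includes the six punctuation characters that sit between Z and a. A sketch of a tighter list, assuming letters, digits, colon and comma are what is wanted:

```php
<?php
// range('A', 'z') covers every ASCII code from A to z, including punctuation.
$loose = range('A', 'z');
$strays = array_values(array_diff($loose, range('A', 'Z'), range('a', 'z')));

// Explicit whitelist: upper case, lower case, digits, colon and comma only.
$whitelist = implode('', array_merge(
    range('A', 'Z'),
    range('a', 'z'),
    range(0, 9),
    [':', ',']
));
```

$strays holds the unintended extra characters; $whitelist can be passed where the individual ranges were passed before.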

No issues just question

I want to use a buffered image or raw image data as input to tesseract.
Let's say I have a big image and I want tesseract to read only a small portion of it.
I can crop it with Imagemagick, but the problem is that
new TesseractOCR() // only accepts an existing file, not raw data.
That means I need to call Imagemagick->writeImage() to create a local file, which is not efficient.
Does anyone know how to feed image data directly into tesseract? Thanks.

cropImage(100,100,0,0);
$a = new TesseractOCR($img); // I want tesseract to accept the image directly without writing it.
echo $a->recognize();
?>
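
Since the wrapper shells out to the tesseract binary, some file on disk is probably unavoidable, but the write can be kept short-lived. A sketch of that pattern; withTempImage() is a hypothetical helper that stores the bytes in a temporary file, hands the path to a callback, and removes the file afterwards:

```php
<?php
// Hypothetical helper: expose raw image bytes as a temporary file for the
// duration of a callback, then delete the file again.
function withTempImage(string $bytes, string $extension, callable $callback)
{
    $base = tempnam(sys_get_temp_dir(), 'ocr');
    $path = $base . '.' . ltrim($extension, '.');
    rename($base, $path); // give the file an extension tesseract can recognize
    file_put_contents($path, $bytes);
    try {
        return $callback($path);
    } finally {
        @unlink($path);
    }
}
```

With Imagick, the bytes of the cropped image could come from getImageBlob(), and the callback would create the TesseractOCR object from the path it receives and return the recognized text.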

Install into XAMPP not working, gives blank result?

Hi, I've installed tesseract using Composer on my Windows XAMPP localhost (PHP 5.5.37). Everything runs in the 'htdocs/picture' directory (version 1.0.0-RC). I tried testing the program with the code below, written in test.php, but it echoes nothing. Please enlighten me. Thanks.

<?php
require_once 'vendor/autoload.php'; // Tried omitting this statement however it return me Fatal error: Class 'TesseractOCR' not found in C:\xampp\htdocs\picture\test.php on line 3, therefore added it back but gave blank statement.

echo (new TesseractOCR('text.jpeg'))->run();
?>

Doesn't display any thing

Hello,

I created a simple demo using the example, but it doesn't work: the echo shows nothing. I ran
tesseract images/8055.png test
on the command line, in the same directory as the project, and it worked: it wrote the contents of the image 8055.png to test.txt. Here is my code:
<?php
require_once './TesseractOCR.php';
$ec = new TesseractOCR('./images/text.jpeg');
echo $ec->run();
?>

I am using wamp server as localhost.Here is my folder structure
TesseractOCR.php
test.php
images
images/text.jpeg

What am I doing wrong? Please help.

Tesseract OCR

I am basically a beginner in programming and am developing a web application that converts images into text. I have used Orcad for this, but it only handles simple text and causes many issues. I want to know whether this library would help me: I want to give users an interface where they can upload a file in formats like PNG, JPEG, etc., and have the web app convert it into text. I am currently learning Node.js for this. Kindly advise which is better and easier to learn for this particular project.

sample image to test

Hi, is anyone able to extract the phone number from this image?

https://s3-ap-southeast-1.amazonaws.com/jualo/user_phones/326113/phone_number20160313-14251-1xbhb3l.jpg