thiagoalessio / tesseract-ocr-for-php

A wrapper to work with Tesseract OCR inside PHP.

Home Page: https://packagist.org/packages/thiagoalessio/tesseract_ocr

License: MIT License

PHP 100.00%
ocr tesseract php text-recognition image-to-text

tesseract-ocr-for-php's Issues

XY coordinates

Hello, how could I retrieve the XY coordinates of each recognized word?
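A minimal sketch, assuming a Tesseract build that supports the tsv output config (3.05 and later, as far as I know): running "tesseract image.png out tsv" writes out.tsv, where each word row carries left, top, width and height in pixels. The helper parseTsvWords is hypothetical, not part of tesseract-ocr-for-php, and it assumes the standard header row is present.

```php
<?php
/**
 * Extract word boxes from Tesseract's TSV output ("tesseract image.png out tsv").
 * Hypothetical helper, not part of tesseract-ocr-for-php.
 *
 * @return array<int, array{text: string, x: int, y: int, width: int, height: int, conf: float}>
 */
function parseTsvWords(string $tsv): array
{
    $lines = preg_split('/\r?\n/', rtrim($tsv, "\r\n"));
    // The first row names the columns; map each name to its index.
    $columns = array_flip(explode("\t", (string) array_shift($lines)));
    foreach (['level', 'left', 'top', 'width', 'height', 'conf', 'text'] as $name) {
        if (!isset($columns[$name])) {
            throw new InvalidArgumentException("TSV header lacks the '$name' column");
        }
    }

    $words = [];
    foreach ($lines as $line) {
        if ($line === '') {
            continue;
        }
        $cells = explode("\t", $line);
        // Level 5 rows are words; levels 1-4 are pages, blocks, paragraphs and lines.
        if ((int) $cells[$columns['level']] !== 5) {
            continue;
        }
        $text = $cells[$columns['text']] ?? '';
        if (trim($text) === '') {
            continue; // Skip empty detections.
        }
        $words[] = [
            'text'   => $text,
            'x'      => (int) $cells[$columns['left']],
            'y'      => (int) $cells[$columns['top']],
            'width'  => (int) $cells[$columns['width']],
            'height' => (int) $cells[$columns['height']],
            'conf'   => (float) $cells[$columns['conf']],
        ];
    }
    return $words;
}
```

For example, parseTsvWords(file_get_contents('out.tsv')) returns one entry per recognized word, with its x and y coordinates.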

tmpfile stream not opened

Hi,

It's a great library, but I'm very new to TesseractOCR.

I cannot get any output from TesseractOCR('text.jpg'); the Apache server's error log says:
"Warning in pixReadMemJpeg: work-around: writing to a temp file
Error in pixReadMemJpeg: tmpfile stream not opened
Error in pixReadMem: jpeg: no pix returned
Error during processing."

If I run Tesseract on the text.jpg image from the Terminal, it works without an issue and the output.txt file is generated. However, when I run the PHP file in the browser with the following code:
require_once __DIR__ . '/vendor/autoload.php';
echo (new TesseractOCR('text.jpg'))
    ->run();

it renders nothing and produces the error above.
Here is what I did:
I added the full path to the Tesseract executable.
In TesseractOCR.php, I set the $executable variable (line 20) to the full path:
private $executable = '/opt/local/bin/tesseract';

Can you please explain what the error means and what I am doing wrong?

My operating system is:
OSX El Capitan 10.11.4
XAMPP Server
PHP Version 5.6.8

Thanks
Mike

install on AWS EC2

It's not an issue; I just need support. How can I install it on an AWS EC2 Ubuntu machine?

Returns empty string.

Hi,

I am using CentOS 5 with tesseract-2.04-2.

When I use tesseract on the console I can only read .tif images; for other formats it does not recognise the type.

But in my script it does not even read the .tif; it just returns an empty string.

This is how I am using it.

include('TesseractOCR.php');
if (isset($_FILES['file'])) {
    $tesseract = new TesseractOCR($_FILES['file']['tmp_name']);
    $result = $tesseract->recognize();
}

Any idea where I am going wrong?

tesseract-ocr-for-php in laravel 5.2

I followed all the steps to install this library into Laravel 5.2 and used it in my controller, but it returns null and does nothing.
Could you help me? This is my code:

[Screenshot of the code]

But it returns a blank page.

Stupid Question

If I have bought web hosting, can I use this?
I mean, does it REQUIRE any software to be installed,
or should PHP alone work?

setLanguage

When using the setLanguage option it throws an error.
I fixed this by placing $command.= " {$this->outputFile}"; above the $command.= " -l {$this->language}";

file_get_contents error

Hello there,

I have built a small app on my local machine with this class. I had a lot of issues when I first ran it on my Windows machine, but I was able to fix them with changes I read about in discussions here.
Now I am trying to put my app online on a Linux server, and I have an issue that is hard to fix.
When I run the script from the command line it works perfectly, but in the browser it says: file_get_contents(tempi/1525942339.txt): failed to open stream: No such file or directory in /var/www/html/ocr/TesseractOCR/TesseractOCR.php on line 236

I have tried some of the tricks discussed here, but none of them worked for me.

Regards
Agon

no image read

Hello sir,

I am using TesseractOCR for image reading, but it does not work: the output for the image is blank. Please help me.

missing error handling during execution of tesseract

Hi guys,

If, e.g., the input is not a valid file, Tesseract's error messages are taken as recognized text, which I guess is unwanted behavior. It would be better to check the return code of the tesseract command and throw an exception in case of errors.
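A sketch of the requested behavior using only PHP's proc_open(): stdout and stderr are captured separately, so Tesseract's messages never end up in the recognized text, and a non-zero exit code becomes an exception. runChecked is a hypothetical helper name, not a method of this library.

```php
<?php
/**
 * Run a shell command and return its stdout, throwing if it exits non-zero.
 * Hypothetical helper sketching the error handling requested in this issue.
 */
function runChecked(string $command): string
{
    $spec = [1 => ['pipe', 'w'], 2 => ['pipe', 'w']];
    $process = proc_open($command, $spec, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException("Could not start: $command");
    }
    // Reading stdout fully before stderr is fine for small outputs; a process
    // that fills the stderr pipe buffer first could block this naive approach.
    $stdout = stream_get_contents($pipes[1]);
    $stderr = stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);

    $status = proc_close($process);
    if ($status !== 0) {
        throw new RuntimeException(
            sprintf('Command failed with exit code %d: %s', $status, trim($stderr))
        );
    }
    return $stdout;
}
```

For example, runChecked('tesseract ' . escapeshellarg($image) . ' stdout') would return the text printed by Tesseract, assuming the installed version accepts stdout as the output base (recent versions do, as far as I know).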

Warnings in lines 236 and 249

Hi there,

I'm running Tesseract on a Linux server with PHP 5.4. I uploaded the main class file to classes/TesseractOCR.php and set its permissions to 666, but unfortunately it still returns warnings instead of any values from the PNG file.

Here is my code:

require_once 'classes/TesseractOCR.php';
....
$tesseract = new TesseractOCR($file);
$tesseract->setTempDir('/absolute_path_to/temp/'); // temp folder have chmod 777
$tesseract->setWhitelist(range(0,9), '-+.'); // i'm trying to recognize phone numbers
echo $tesseract->recognize();

I tried many variations of the temp folder path (and also none at all), but I still get nothing. I would really appreciate any help or suggestions; I tried to google a solution, but every answer comes down to setting the temp folder, which doesn't help me.

Many thanks in advance.
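Tesseract reads its whitelist from the tessedit_char_whitelist variable, which the command line accepts as -c tessedit_char_whitelist=... . A sketch of building that option from the same kind of arguments used above (range(0, 9) plus '-+.' for phone numbers); whitelistArg is hypothetical, handles single-byte characters only, and assumes a Tesseract version whose command line supports -c.

```php
<?php
/**
 * Build a "-c tessedit_char_whitelist=..." option from character sets.
 * Hypothetical helper; each set may be an array (e.g. range(0, 9)) or a string.
 * Works on single-byte (ASCII) characters only.
 */
function whitelistArg(...$sets): string
{
    $chars = '';
    foreach ($sets as $set) {
        $chars .= is_array($set) ? implode('', $set) : (string) $set;
    }
    // Drop duplicate characters while keeping first-seen order.
    $unique = implode('', array_unique(str_split($chars)));
    // Quote the whole name=value pair so characters like + or - reach Tesseract intact.
    return '-c ' . escapeshellarg('tessedit_char_whitelist=' . $unique);
}
```

For the phone-number case, whitelistArg(range(0, 9), '-+.') yields -c 'tessedit_char_whitelist=0123456789-+.' on a Unix shell.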

exec takes no effect on tesseract

Hi,
I have tried your code; it fails when it reaches "exec("tesseract $tifImage $outputFile nobatch $configFile");".
No .txt file is generated, but it works if I run the same command directly in a Linux terminal.

Have you run into this? I am using CentOS 6.5 and PHP 5.5.

I hope you can confirm.

Not recognizing number 8

I'm having a problem with the number 8: it is read as 3.
My code:
$text = (new TesseractOCR("resized.png"))
    ->whitelist(range('A', 'z'), range(0, 9), range(":", ","))
    ->psm(6)
    ->run();
Result:
[Screenshot of the result]

I already changed the PSM, without success.
Can anyone help?

not reading image

Code


include 'TesseractOCR.php';
//$obj = new TesseractOCR('text.png');
//var_dump($obj->run());
var_dump (new TesseractOCR('var/www/html/ocr-tesseract/src/text.png'))->run();
//echo dirname(__FILE__);
die('dafs');

object(TesseractOCR)#1 (8) { ["image":"TesseractOCR":private]=> string(39) "var/www/html/ocr-tesseract/src/text.png" ["executable":"TesseractOCR":private]=> string(9) "tesseract" ["tessdataDir":"TesseractOCR":private]=> NULL ["userWords":"TesseractOCR":private]=> NULL ["userPatterns":"TesseractOCR":private]=> NULL ["languages":"TesseractOCR":private]=> array(0) { } ["psm":"TesseractOCR":private]=> NULL ["configs":"TesseractOCR":private]=> array(0) { } }
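The dump above is the TesseractOCR object itself, not OCR output. In var_dump (new TesseractOCR(...))->run(), PHP calls var_dump() on the new object and then tries ->run() on var_dump()'s null result; the image path also lacks its leading slash, so it is resolved relative to the script's working directory. A sketch with a hypothetical stand-in class, FakeOcr, shows the two forms side by side.

```php
<?php
// FakeOcr is a hypothetical stand-in for TesseractOCR, used only to show
// how PHP parses the call; it does no OCR.
class FakeOcr
{
    private $image;

    public function __construct(string $image)
    {
        $this->image = $image;
    }

    public function run(): string
    {
        return "text from {$this->image}";
    }
}

try {
    // Parsed as var_dump(new FakeOcr(...)) followed by ->run() on var_dump's
    // null return value: the object is dumped, then an Error is thrown.
    var_dump(new FakeOcr('text.png'))->run();
} catch (Error $e) {
    echo 'Error: ', $e->getMessage(), PHP_EOL;
}

// Wrapping the whole chain passes run()'s result to var_dump instead.
var_dump((new FakeOcr('/var/www/html/ocr-tesseract/src/text.png'))->run());
```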

setVerbose

I don't want to see the "Tesseract Open Source OCR Engine v3.02.02 with Leptonica" banner every time.
I have added this after line 222: $command .= " 2>&1 1> /dev/null";. An option in the constructor or a setter would be helpful for people annoyed by the message.

Multi-language?

Can I use something like this?

$tesseract->setLanguage('rus+eng');
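On the command line, Tesseract takes several languages joined with '+' (for example -l rus+eng), as long as each language's .traineddata file is installed. A sketch of building that option safely; langArg is a hypothetical helper, and its allowed-character check is an assumption based on data names such as eng and chi_sim.

```php
<?php
/**
 * Build Tesseract's -l option for one or more languages, e.g. -l 'rus+eng'.
 * Hypothetical helper; Tesseract itself expects the codes joined with '+'.
 */
function langArg(string ...$languages): string
{
    if ($languages === []) {
        throw new InvalidArgumentException('At least one language is required');
    }
    foreach ($languages as $lang) {
        // Assumed name shape: letters, digits, underscore, slash or hyphen (eng, chi_sim).
        if (!preg_match('~^[A-Za-z0-9_/-]+$~', $lang)) {
            throw new InvalidArgumentException("Unexpected language code: $lang");
        }
    }
    return '-l ' . escapeshellarg(implode('+', $languages));
}
```

With this helper, langArg('rus', 'eng') produces -l 'rus+eng', which the shell passes to Tesseract as -l rus+eng.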

Disabling dictionary

Is there any support in thiagoalessio/tesseract-ocr-for-php for disabling the language dictionary?
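Tesseract's built-in word lists can be turned off with the load_system_dawg and load_freq_dawg variables. Because they take effect when the engine starts, a config file is the safer place for them. A sketch that writes such a file; writeNoDictionaryConfig is hypothetical, and passing the file's path as the last command-line argument (tesseract image.png out /tmp/no-dictionary) is an assumption to verify against the installed Tesseract, which also looks up config names in tessdata/configs.

```php
<?php
/**
 * Write a Tesseract config file that disables the built-in dictionaries.
 * Hypothetical helper; the two variables are standard Tesseract parameters.
 * Returns the path of the written file.
 */
function writeNoDictionaryConfig(string $dir): string
{
    $path = rtrim($dir, '/') . '/no-dictionary';
    // One "variable value" pair per line; 0 means false for boolean variables.
    $config = "load_system_dawg 0\nload_freq_dawg 0\n";
    if (file_put_contents($path, $config) === false) {
        throw new RuntimeException("Cannot write $path");
    }
    return $path;
}
```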

No errors, but nothing returned with a very simple test

Hello

I tried your tesseract-ocr-for-php to decode a number in a GIF file. I followed your installation instructions and your sample example, but it returns an empty result page without any errors:

My image file -> http://hpics.li/4e167da
I also tried text drawn in Paint, like this, but got the same result -> http://hpics.li/708e750

My code

require_once './vendor/autoload.php'; //if you are using composer

$tesseract = new TesseractOCR('hello.png');
echo $tesseract->recognize();

Maybe I am doing something wrong?

Thanks in advance for your reply.

Cannot run PSM(0)

Hi, I'm trying to run Tesseract in Orientation and script detection (OSD) only mode (PSM 0).

On the command line, setting PSM 0 works and produces something like:

Orientation: 0
Orientation in degrees: 0
Orientation confidence: 22.31
Script: 1
Script confidence: 36.67

I was hoping that the following code would produce that result, but instead it just gives a PSM 3 result of the OCR (default):

$tess = (new TesseractOCR(storage_path('app/doc.jpg')));
$tess->psm(0);
$text = $tess->run();
//Returns PSM 3 default OCR result

The other PSM modes (1-10) produce expected results, it's just PSM 0 which I can't get to work.

Is there a way to run PSM 0 and get the actual orientation result instead of the OCR? Is this an error, or is getting the OCR back for PSM 0 the expected result?

Thanks
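Whichever way the PSM 0 report is captured, a sketch for turning text like the command-line output quoted above into an array; parseOsd is a hypothetical helper. Numeric values become int or float, while anything else (for example a script name instead of a number) stays a string.

```php
<?php
/**
 * Parse an orientation and script detection (PSM 0) report such as
 * "Orientation in degrees: 0" into a label => value array.
 * Hypothetical helper; numeric values become int or float.
 */
function parseOsd(string $report): array
{
    $result = [];
    foreach (preg_split('/\r?\n/', $report) as $line) {
        // Each useful line is "Label: value"; anything else is skipped.
        if (!preg_match('/^\s*([^:]+):\s*(.+?)\s*$/', $line, $m)) {
            continue;
        }
        $value = $m[2];
        if (is_numeric($value)) {
            $value = $value + 0; // "0" becomes 0, "22.31" becomes 22.31
        }
        $result[trim($m[1])] = $value;
    }
    return $result;
}
```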

Fix a small bug when setting the Tesseract language through the PHP API for Chinese

Hi, thiagoalessio!
I'm glad I found and used your project tesseract-ocr-for-php; it helps me a lot! Thanks for your selfless contribution!
But while coding my Chinese OCR project, I used your new API (new TesseractOCR('xxx.png'))->lang('deu'). Your wiki says languages must be specified as 3-character ISO 639-2 codes, but on the ISO 639-2 page the code for Chinese is chi/zho, which does not work with your API.
For example, (new TesseractOCR('chinese.png'))->lang('chi') doesn't work. When I changed the code to (new TesseractOCR('chinese.png'))->lang('chi_sim') or (new TesseractOCR('chinese.png'))->lang('chi_tra'), matching the original Tesseract langdata names, it works! Maybe this ISO 639-2 problem does not only affect Chinese. I hope you can look into it!
I will fix the wiki and open a pull request on your repository later; I hope it helps! Thanks again! I am a university student from China, and I hope to help with this tesseract-ocr-for-php project and make friends with you!

Dave, 2016.5.25, in China
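A sketch of the mapping described in this issue; tesseractLanguage is a hypothetical helper that covers only the Chinese case reported here, since Tesseract names those data files chi_sim and chi_tra rather than chi or zho.

```php
<?php
/**
 * Map an ISO 639-2 code to a Tesseract traineddata name.
 * Hypothetical helper: only Chinese is special-cased, because Tesseract ships
 * it as chi_sim (simplified) and chi_tra (traditional).
 */
function tesseractLanguage(string $iso, bool $traditional = false): string
{
    $iso = strtolower($iso);
    if ($iso === 'chi' || $iso === 'zho') {
        return $traditional ? 'chi_tra' : 'chi_sim';
    }
    // Many other Tesseract data files already use three-letter codes (eng, deu, ...).
    return $iso;
}
```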

Absolute paths for image and TempDir

It would have been nice to mention that you can't use absolute paths for the image location and TempDir. I had this working in "vanilla" PHP in minutes, but when I tried to integrate it into Laravel, it just wouldn't work. It took me 4 hours of trial and error only to realize that

$tesseract = new TesseractOCR("C:\Program Files\wamp\www\test\public\img\hello.png");

doesn't work, and that only relative paths work.
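One possible cause worth ruling out: in a double-quoted PHP string, \t is the escape for a tab, so the \test part of the path above silently becomes a tab followed by "est". A short demonstration using the path from this issue.

```php
<?php
// Double quotes: "\t" is a tab escape, so "\test" becomes a tab plus "est".
// (\P, \w, \p, \i and \h are not escapes and stay as written.)
$doubleQuoted = "C:\Program Files\wamp\www\test\public\img\hello.png";

// Single quotes keep every backslash; only \\ and \' are special there.
$singleQuoted = 'C:\Program Files\wamp\www\test\public\img\hello.png';

// PHP's file functions on Windows also accept forward slashes.
$forwardSlashes = 'C:/Program Files/wamp/www/test/public/img/hello.png';

var_dump(strpos($doubleQuoted, "\t") !== false); // bool(true)
var_dump(strpos($singleQuoted, "\t") !== false); // bool(false)
var_dump(strlen($singleQuoted) - strlen($doubleQuoted)); // int(1)
```

Because this path also contains a space, it needs escapeshellarg() when it is placed into a shell command.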

text file is not generating on live website

My execute() method:
protected function execute()
{
    $path = getenv('PATH');
    putenv("PATH=$path:/usr/local/bin/tesseract ");

    $pathtess = $path."/tessdata";
    $this->outputFile = rand();
    echo $this->buildTesseractCommand();
    exec($this->buildTesseractCommand(),$pathtess);
}
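Two details in the code above may explain the failure: PATH entries must be directories, so appending /usr/local/bin/tesseract (with a trailing space) does not make the binary findable, and the second argument of exec() receives the command's output lines rather than a tessdata path. A sketch of a PATH lookup that shows how entries are resolved; findExecutable is a hypothetical helper.

```php
<?php
/**
 * Find an executable by name in the directories listed in a PATH string.
 * Hypothetical helper; returns the full path, or null if nothing matches.
 */
function findExecutable(string $name, string $path): ?string
{
    foreach (explode(PATH_SEPARATOR, $path) as $dir) {
        if ($dir === '') {
            continue;
        }
        // PATH entries are directories, so the binary name is appended to each.
        $candidate = rtrim($dir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . $name;
        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }
    return null;
}
```

After adding the directory instead, for example putenv('PATH=' . getenv('PATH') . PATH_SEPARATOR . '/usr/local/bin');, findExecutable('tesseract', getenv('PATH')) should return /usr/local/bin/tesseract if the binary lives there.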

Tesseract OCR

I am basically a beginner in programming, and I am developing a web application that converts images into text. I have used Orcad for this, but it only handles simple text and causes many issues. I want to know whether this library would let me give users an interface where they can upload a file in a format like PNG or JPEG and have the web app convert it into text. I am also learning Node.js for this; please advise which is better and easier to learn, particularly for this project.

Don't work with Microsoft IIS 7.5 server

hi,

I have a problem with IIS 7.5 server.
No output data is displayed.
However with Wamp server or command line on the IIS server, it works.

Can you help me?

Thank you very much.

No issue, just a question

I want to use an image buffer or raw image data as input to Tesseract.
Let's say I have a big image and I want Tesseract to read only a small portion of it.
I can use Imagemagick's crop method; the problem is that
new TesseractOCR() only accepts an existing file, not raw data.
That means I need to use Imagemagick's writeImage() method to create a local file, which is not efficient.
Does anyone know how to input image data directly into Tesseract? Thanks.

cropImage(100,100,0,0);
$a = new TesseractOCR($img); // I want tesseract to accept the image directly without writing it.
echo $a->recognize();
?>
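Recent Tesseract versions accept stdin as the input file name and stdout as the output base (an assumption to check against the installed version), which avoids the temporary file. A sketch that pipes raw bytes to a command; pipeBytes is a hypothetical helper, and the Imagick calls in the note after it are Imagick::cropImage(), Imagick::setImageFormat() and Imagick::getImageBlob().

```php
<?php
/**
 * Pipe raw bytes to a command's stdin and return what it prints on stdout.
 * Hypothetical helper; for OCR the command would be "tesseract stdin stdout".
 */
function pipeBytes(string $bytes, string $command = 'tesseract stdin stdout'): string
{
    $spec = [0 => ['pipe', 'r'], 1 => ['pipe', 'w'], 2 => ['pipe', 'w']];
    $process = proc_open($command, $spec, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException("Could not start: $command");
    }
    // Tesseract reads the whole image before printing, so writing everything
    // first and reading afterwards is fine for it; a command that streams large
    // output while still reading input could block here.
    fwrite($pipes[0], $bytes);
    fclose($pipes[0]);
    $output = stream_get_contents($pipes[1]);
    $errors = stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);

    if (proc_close($process) !== 0) {
        throw new RuntimeException('Command failed: ' . trim($errors));
    }
    return $output;
}
```

For example, after $img->cropImage(100, 100, 0, 0); and $img->setImageFormat('png');, calling pipeBytes($img->getImageBlob()) would hand the cropped image to Tesseract without writing it to disk.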

psm command

Is it possible to set the psm option from within the script?

-psm N
Set Tesseract to only run a subset of layout analysis and assume a
certain form of image. The options for N are:

           0 = Orientation and script detection (OSD) only.
           1 = Automatic page segmentation with OSD.
           2 = Automatic page segmentation, but no OSD, or OCR.
           3 = Fully automatic page segmentation, but no OSD. (Default)
           4 = Assume a single column of text of variable sizes.
           5 = Assume a single uniform block of vertically aligned text.
           6 = Assume a single uniform block of text.
           7 = Treat the image as a single text line.
           8 = Treat the image as a single word.
           9 = Treat the image as a single word in a circle.
           10 = Treat the image as a single character.
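The option's spelling depends on the Tesseract version: the excerpt above is from the 3.x documentation, which uses -psm, while 4.x uses --psm and adds modes 11 to 13. A sketch that picks the right spelling; psmArg is hypothetical and expects the version string reported by tesseract --version.

```php
<?php
/**
 * Build the page segmentation mode option for a given Tesseract version.
 * Hypothetical helper: assumes 3.x spells it -psm and 4.x spells it --psm.
 */
function psmArg(int $mode, string $tesseractVersion): string
{
    $isV4OrLater = version_compare($tesseractVersion, '4.0', '>=');
    // 3.x documents modes 0-10 (as listed above); 4.x adds 11-13.
    $maxMode = $isV4OrLater ? 13 : 10;
    if ($mode < 0 || $mode > $maxMode) {
        throw new InvalidArgumentException(
            "Page segmentation mode $mode is not available in Tesseract $tesseractVersion"
        );
    }
    return ($isV4OrLater ? '--psm ' : '-psm ') . $mode;
}
```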

documentation wrong

The documentation seems to be all wrong:

No run() method (now recognize())
No lang() method (now setLanguage())

etc.

Simple test doesn't work

Hello

I'd like to use TesseractOCR, so I installed it with Composer as described in the README and tried the simple example from the README with the same picture. Here is my code:

<?php
require __DIR__.'/phpshell-2.4/vendor/autoload.php';

echo (new TesseractOCR('text.jpeg'))->run();

?>

I don't get any error, and my file does exist (I checked with if (is_file(...)) first). However, the returned string is null.
Am I missing something ?

file_get_contents error NEW

Hello there,

I have built a small app with this class, both on my Linux shared server and on a WAMP server on my local machine. I had a lot of issues.
Now I am trying to put my app online on a Linux server, and I have an issue that is hard to fix.
When I run the script from the command line it works perfectly, but in the browser it says:

Warning: file_get_contents(temp/1135767769.txt): failed to open stream: No such file or directory in /home/catchmyd/public_html/ocr_demo/TesseractOCR/TesseractOCR.php on line 255

Warning: unlink(temp/1135767769.txt): No such file or directory in /home/catchmyd/public_html/ocr_demo/TesseractOCR/TesseractOCR.php on line 268

I have tried some of the tricks discussed here, but none of them worked for me.

Regards
big89

Language

Can we set the language of the OCR?

failed to open stream

I'm trying to run this on a WAMP server, and it gives several warning messages: "Warning: file_get_contents(tmp/tesseract-ocr-output-24039.txt) [function.file-get-contents]: failed to open stream: No such file or directory in C:\wamp\www\tesseract-ocr-for-php-master\tesseract_ocr\tesseract_ocr.php on line 49"

How can I fix this issue? Thanks in advance.
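These warnings mean the .txt file is not where PHP looks for it: either Tesseract never wrote it (it failed, or the web server could not run it), or a relative path such as tmp/... was resolved against a different working directory. A sketch that uses an absolute temporary path and fails with a clear message instead of a warning; makeOutputBase and readOcrOutput are hypothetical helpers, relying on Tesseract appending .txt to the output base it is given.

```php
<?php
/**
 * Reserve an absolute output base name in the system temp directory.
 * Hypothetical helper; Tesseract appends ".txt" to the base name it receives.
 */
function makeOutputBase(): string
{
    $base = tempnam(sys_get_temp_dir(), 'ocr');
    if ($base === false) {
        throw new RuntimeException('Cannot create a temporary file');
    }
    return $base; // tempnam() also creates an empty placeholder file at $base
}

/** Read and delete "$base.txt", failing loudly if Tesseract did not write it. */
function readOcrOutput(string $base): string
{
    $file = $base . '.txt';
    if (!is_file($file)) {
        throw new RuntimeException("Tesseract did not produce $file; check the command and its exit code");
    }
    $text = (string) file_get_contents($file);
    unlink($file);
    @unlink($base); // remove the tempnam() placeholder as well
    return $text;
}
```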

file_get_contents

On my local XAMPP system tesseract-ocr works properly, but after uploading it to an online Linux server it stops working with this error:

Warning: file_get_contents(temp/1135767769.txt): failed to open stream: No such file or directory in /home/catchmyd/public_html/ocr_demo/TesseractOCR/TesseractOCR.php on line 255

Warning: unlink(temp/1135767769.txt): No such file or directory in /home/catchmyd/public_html/ocr_demo/TesseractOCR/TesseractOCR.php on line 268

First example problem.

Following the README, I simply did this:

require_once 'TesseractOCR.php';
$tesseract = new TesseractOCR('foo.png');
echo $tesseract->recognize();

and I get these errors:

Warning: file_get_contents(/tmp/798893693.txt): failed to open stream: No such file or directory in /var/.../TesseractOCR.php on line 212

Warning: unlink(/tmp/798893693.txt): No such file or directory in /var/.../TesseractOCR.php on line 225

Both the image foo.png and TesseractOCR.php are in the same directory. Also, I didn't understand the dependencies you mention; I downloaded them anyway, but I have no idea where to put them.

Doesn't display any thing

Hello,

I created a simple demo using the example, but it doesn't work: the echo shows nothing. I ran
tesseract images/8055.png test on the command line
in the project directory, and it worked: it wrote the contents of the image "8055.png" to test.txt. Here is my code:
<?php
require_once './TesseractOCR.php';
$ec = new TesseractOCR('./images/text.jpeg');
echo $ec->run();
?>

I am using a WAMP server as localhost. Here is my folder structure:
TesseractOCR.php
test.php
images
images/text.jpeg

What am I doing wrong? Please help.

Libraries(Chinese)

Where do I put the language pack for Chinese? I'm also curious where the core code is; I don't see how these files call any other code.

Just wondering how GitHub/Composer work xD

setCommand()

After every update I have to modify your file for both local and production, since they all have different paths for the tesseract command.

Make it simple and allow people to set a custom tesseract path.

Install into Xampp not working gave blank result?

Hi, I've installed tesseract using Composer into my Windows XAMPP localhost (PHP 5.5.37). Everything runs in the 'htdocs/picture' folder (version 1.0.0-RC). I tested the program with the code below, written in test.php, but it echoes a blank result. Please enlighten me. Thanks.

<?php
require_once 'vendor/autoload.php'; // Without this line I get "Fatal error: Class 'TesseractOCR' not found in C:\xampp\htdocs\picture\test.php on line 3", so I added it back, but the result is blank.

echo (new TesseractOCR('text.jpeg'))->run();
?>

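PDF support

Thiago,
I believe OCR support for PDF files would be simple to add. I needed this feature and added a function to the code that converts a PDF to TIFF.
Here it is:
function convertPDFToTif($originalImage) {
    // Temporary TIFF in the system temp dir: 300 dpi, CCITT Group 4 compression
    $tifImage = sys_get_temp_dir().'/tesseract-ocr-tif-'.rand().'.tif';
    // Quote both paths so names with spaces or shell characters are safe
    exec('gs -dNOPAUSE -q -r300 -sDEVICE=tiffg4 -dBATCH -sOutputFile='.escapeshellarg($tifImage).' '.escapeshellarg($originalImage), $output, $status);
    if ($status !== 0) {
        throw new RuntimeException("Ghostscript failed to convert $originalImage");
    }
    return $tifImage;
}

Naturally, Ghostscript must be installed.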

How to set config or parameters

Please give an example snippet for setting initialization parameters, either directly or through config files,
so that we can do different tasks, like applying a grayscale filter.
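Tesseract variables can be passed on the command line as repeated -c name=value options (in versions that support -c), or listed one "name value" pair per line in a config file given after the output base. A sketch of the command-line form; configArgs is hypothetical, and the variable names in the usage example are standard Tesseract parameters. Image preprocessing such as grayscale conversion is usually done before OCR (for example with Imagick) rather than through a Tesseract variable.

```php
<?php
/**
 * Turn an array of Tesseract variables into repeated "-c name=value" options.
 * Hypothetical helper; the names must be real Tesseract parameters.
 */
function configArgs(array $variables): string
{
    $args = [];
    foreach ($variables as $name => $value) {
        if (is_bool($value)) {
            $value = $value ? '1' : '0'; // booleans written as 1 or 0
        }
        // Quote each name=value pair so spaces or shell characters survive.
        $args[] = '-c ' . escapeshellarg("$name=$value");
    }
    return implode(' ', $args);
}
```

A command could then be assembled as 'tesseract ' . escapeshellarg($image) . ' out ' . configArgs(['tessedit_char_whitelist' => '0123456789']).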
