thiagoalessio / tesseract-ocr-for-php
A wrapper to work with Tesseract OCR inside PHP.
Home Page: https://packagist.org/packages/thiagoalessio/tesseract_ocr
License: MIT License
My execute() method:
protected function execute()
{
    // PATH entries must be directories, so append the directory that holds tesseract
    $path = getenv('PATH');
    putenv("PATH=$path:/usr/local/bin");
    $this->outputFile = rand();
    echo $this->buildTesseractCommand();
    // exec()'s second argument receives the command's output lines
    exec($this->buildTesseractCommand(), $output);
}
I don't want to see this "Tesseract Open Source OCR Engine v3.02.02 with Leptonica" message every time.
I added this after line 222: $command .= " 2>&1 1> /dev/null";
An option in the constructor, or a setter, would be helpful for people annoyed by the message.
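A minimal sketch of the requested option, assuming (as the redirect above suggests) that the version banner is written to stderr and that a POSIX shell runs the command. QuietCommandBuilder and quiet() are hypothetical names for illustration, not part of the wrapper's API. Keep in mind that discarding stderr also hides genuine error messages.

```php
<?php
// Hypothetical sketch: a command builder with an opt-in "quiet" flag.
// Not part of tesseract-ocr-for-php; assumes a POSIX shell for 2>/dev/null.
final class QuietCommandBuilder
{
    private $quiet = false;

    // Opt in to discarding everything tesseract writes to stderr.
    public function quiet($quiet = true)
    {
        $this->quiet = (bool) $quiet;
        return $this;
    }

    // Build "tesseract <image> <outputbase>", quoting both paths for the shell.
    public function build($image, $outputBase)
    {
        $command = 'tesseract ' . escapeshellarg($image) . ' ' . escapeshellarg($outputBase);
        if ($this->quiet) {
            // Assumption: the banner goes to stderr, so dropping stderr hides it
            // (real error messages are hidden too).
            $command .= ' 2>/dev/null';
        }
        return $command;
    }
}

$builder = new QuietCommandBuilder();
echo $builder->quiet()->build('text.png', 'out'), PHP_EOL;
// On Linux prints: tesseract 'text.png' 'out' 2>/dev/null
```

Making the flag opt-in keeps the default behaviour unchanged for people who rely on seeing Tesseract's messages.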
Hi,
It's a great library, but I'm very new to TesseractOCR.
I cannot get any output from TesseractOCR('text.jpg'); the Apache server's error log says:
"Warning in pixReadMemJpeg: work-around: writing to a temp file
Error in pixReadMemJpeg: tmpfile stream not opened
Error in pixReadMem: jpeg: no pix returned
Error during processing."
If I use Terminal to run Tesseract on the text.jpg image, it works without an issue and the output.txt file is generated. However, when I run the PHP file in the browser with the following code:
require_once __DIR__ . '/vendor/autoload.php';
echo (new TesseractOCR('text.jpg'))
->run();
nothing is rendered and the error above is produced.
Below is what I did:
I added the full path to the Tesseract executable file.
In TesseractOCR.php, I set a full path on the $executable variable (line 20):
private $executable = '/opt/local/bin/tesseract';
Can you please explain what the error means and what I am doing wrong here?
My operating system is:
OSX El Capitan 10.11.4
XAMPP Server
PHP Version 5.6.8
Thanks
Mike
Hello there,
I have built a small app with this class, both on my Linux shared server and on a WAMP server on my local machine. I had a lot of issues.
Now I am trying to put my app online on a Linux server, and I have an issue that is hard to fix.
When I run the script from the command line it works perfectly, but in the browser it says:
Warning: file_get_contents(temp/1135767769.txt): failed to open stream: No such file or directory in /home/catchmyd/public_html/ocr_demo/TesseractOCR/TesseractOCR.php on line 255
Warning: unlink(temp/1135767769.txt): No such file or directory in /home/catchmyd/public_html/ocr_demo/TesseractOCR/TesseractOCR.php on line 268
I have tried some of the tricks discussed here, but none of them worked for me.
Regards
big89
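The warnings above show a relative path (temp/...txt), which PHP resolves against the process's working directory; that directory often differs between the command line and the web server. A minimal sketch, assuming you control the temp directory setting: resolveTempDir() is a hypothetical helper (not part of the wrapper) that resolves a relative directory against a base such as __DIR__ and fails loudly if it is missing or not writable by the web-server user.

```php
<?php
// Hypothetical helper: make a temp directory absolute and check it is usable.
// Relative paths are resolved against $baseDir (e.g. __DIR__ of the calling
// script) instead of the current working directory.
function resolveTempDir($dir, $baseDir)
{
    $isAbsolute = $dir !== ''
        && ($dir[0] === '/' || $dir[0] === '\\' || preg_match('#^[A-Za-z]:[\\\\/]#', $dir));
    $path = $isAbsolute ? $dir : rtrim($baseDir, '/\\') . DIRECTORY_SEPARATOR . $dir;

    $real = realpath($path);
    if ($real === false || !is_dir($real)) {
        throw new RuntimeException("Temp directory does not exist: $path");
    }
    if (!is_writable($real)) {
        throw new RuntimeException("Temp directory is not writable by this PHP process: $real");
    }
    return $real;
}
```

Usage would be something like $tempDir = resolveTempDir('temp', __DIR__); and then passing $tempDir wherever your version of the wrapper accepts a temp directory.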
After every update I have to modify your file again for local and production, because each environment has a different path for the tesseract command.
Make it simple and allow people to set a custom tesseract path.
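One way to avoid editing the library after each update is to read the binary's location from the environment. A minimal sketch: TESSERACT_BIN is a hypothetical variable name you would set per machine (for example in the web server's environment), and tesseractExecutable() is a hypothetical helper. Recent releases of the wrapper also document an executable('/path/to/tesseract') method; check the README of the version you have installed.

```php
<?php
// Hypothetical helper: pick the tesseract binary per environment.
// TESSERACT_BIN is an illustrative variable name, not a library setting.
function tesseractExecutable($default = 'tesseract')
{
    $configured = getenv('TESSERACT_BIN');
    // Fall back to plain "tesseract" (looked up in PATH) when unset or empty.
    return ($configured !== false && $configured !== '') ? $configured : $default;
}
```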
The documentation seems to be all wrong:
No run() method (it is now recognize())
No lang() method (it is now setLanguage())
etc.
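Since method names differ between versions (recognize()/setLanguage() in the older releases quoted in these posts, run()/lang() in the README), a small compatibility shim can call whichever one exists. A minimal sketch; ocrSetLanguage() and ocrText() are hypothetical helpers, not part of the library.

```php
<?php
// Hypothetical shims: call the method name the installed version provides.
function ocrSetLanguage($ocr, $language)
{
    if (method_exists($ocr, 'lang')) {
        $ocr->lang($language);        // newer API
    } else {
        $ocr->setLanguage($language); // older API
    }
    return $ocr;
}

function ocrText($ocr)
{
    return method_exists($ocr, 'run') ? $ocr->run() : $ocr->recognize();
}
```

Pinning the package version in composer.json is the simpler alternative if you only target one version.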
Thiago,
I believe OCR support for PDF files would be simple. I needed this feature and added a function to the code that converts a PDF to TIFF:
Here it is:
function convertPDFToTif($originalImage) {
    $tifImage = sys_get_temp_dir().'/tesseract-ocr-tif-'.rand().'.tif';
    // Quote both paths so file names with spaces or shell metacharacters are safe
    exec('gs -dNOPAUSE -q -r300 -sDEVICE=tiffg4 -dBATCH -sOutputFile='.escapeshellarg($tifImage).' '.escapeshellarg($originalImage), $output, $status);
    if ($status !== 0) {
        throw new RuntimeException("Ghostscript failed to convert $originalImage");
    }
    return $tifImage;
}
Of course, Ghostscript must be installed.
I want to limit the number of characters.
How can I limit the number of characters?
Hi,
I'm trying to install Tesseract for PHP on my web server. I installed it by downloading the source from GitHub. Now I want to install https://github.com/tesseract-ocr/tesseract, but I don't know where to change the $path and where to put these scripts.
Can you please help me?
Thanks!
Hello there,
I have built a small app on my local machine with this class. I had a lot of issues when I first ran it on my Windows machine, but I was able to fix them by making changes I read about in discussions here.
Now I am trying to put my app online on a Linux server, and I have an issue that is hard to fix.
When I run the script from the command line it works perfectly, but in the browser it says: file_get_contents(tempi/1525942339.txt): failed to open stream: No such file or directory in /var/www/html/ocr/TesseractOCR/TesseractOCR.php on line 236
I have tried some of the tricks discussed here, but none of them worked for me.
Regards
Agon
I downloaded it as a zip file as a PHP project and got this error.
I don't know whether this is a Laravel project or not, but I uploaded it without Laravel.
Hi, thiagoalessio!
Glad to find and use your project tesseract-ocr-for-php; it helps me a lot! Thanks for your selfless contribution!
But while coding my Chinese OCR project, I used your new API (new TesseractOCR('xxx.png'))->lang('deu'). Your wiki says languages must be specified as 3-character ISO 639-2 language codes, but the ISO 639-2 page lists Chinese as chi/zho, which does not work with your API.
For example, (new TesseractOCR('chinese.png'))->lang('chi') doesn't work, but when I changed the code to (new TesseractOCR('chinese.png'))->lang('chi_sim') or (new TesseractOCR('chinese.png'))->lang('chi_tra'), the same names as in the original Tesseract langdata, it works! Maybe this ISO 639-2 problem doesn't only affect Chinese. I hope you can look into it!
I will fix the wiki and open a pull request on your repository later; I hope it helps! Thanks again! I am a university student from China. I look forward to helping with this tesseract-ocr-for-php project and making friends with you!
Dave 2016.5.25 In China
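As the post observes, the names Tesseract accepts for -l are traineddata names (chi_sim, chi_tra), not strictly ISO 639-2 codes. A minimal sketch that checks each requested name against <tessdata>/<name>.traineddata before building the argument, joining several languages with "+" (which is how Tesseract combines them). languageArgument() is a hypothetical helper and the tessdata location is an assumption you must supply.

```php
<?php
// Hypothetical helper: validate language names against installed traineddata
// files and build the "-l lang1+lang2" argument.
function languageArgument(array $languages, $tessdataDir)
{
    $missing = [];
    foreach ($languages as $language) {
        $file = rtrim($tessdataDir, '/\\') . DIRECTORY_SEPARATOR . $language . '.traineddata';
        if (!is_file($file)) {
            $missing[] = $language;
        }
    }
    if ($missing) {
        throw new InvalidArgumentException('No traineddata for: ' . implode(', ', $missing));
    }
    return '-l ' . escapeshellarg(implode('+', $languages));
}
```

With only chi_sim.traineddata installed, asking for "chi" would fail with a clear message instead of an opaque Tesseract error.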
Is there support for multi-page PDFs?
Does the OCR support the Khmer language?
How can I use your PHP wrapper in cPanel? I am unable to find any tutorial on it.
Thank you
Is it possible to define a specific area within the image to be parsed, like x, y, w, h?
Code:
include 'TesseractOCR.php';
//$obj = new TesseractOCR('text.png');
//var_dump($obj->run());
// var_dump(new TesseractOCR(...))->run() dumps the object itself (output below) and then
// calls run() on var_dump()'s null return value; wrap the whole call instead.
// The leading "/" makes the image path absolute.
var_dump((new TesseractOCR('/var/www/html/ocr-tesseract/src/text.png'))->run());
//echo dirname(__FILE__);
die('dafs');
Output of the original var_dump line:
object(TesseractOCR)#1 (8) { ["image":"TesseractOCR":private]=> string(39) "var/www/html/ocr-tesseract/src/text.png" ["executable":"TesseractOCR":private]=> string(9) "tesseract" ["tessdataDir":"TesseractOCR":private]=> NULL ["userWords":"TesseractOCR":private]=> NULL ["userPatterns":"TesseractOCR":private]=> NULL ["languages":"TesseractOCR":private]=> array(0) { } ["psm":"TesseractOCR":private]=> NULL ["configs":"TesseractOCR":private]=> array(0) { } }
Can we set the OCR language?
Is it possible to set the psm option from the script?
-psm N
Set Tesseract to only run a subset of layout analysis and assume a
certain form of image. The options for N are:
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR.
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.
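The help text above is from Tesseract 3.x, which spells the option -psm N. Tesseract 4.x uses --psm N and adds modes 11 to 13 (sparse text, sparse text with OSD, raw line). A minimal sketch of building the flag for either generation; psmArgument() is a hypothetical helper, and $majorVersion is an assumption you would take from `tesseract --version`.

```php
<?php
// Hypothetical helper: build the page segmentation flag for Tesseract 3.x or 4.x.
function psmArgument($mode, $majorVersion)
{
    $max = $majorVersion >= 4 ? 13 : 10; // 4.x added modes 11-13
    if (!is_int($mode) || $mode < 0 || $mode > $max) {
        throw new InvalidArgumentException("Invalid page segmentation mode: $mode");
    }
    return ($majorVersion >= 4 ? '--psm ' : '-psm ') . $mode;
}
```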
Tesseract OCR works properly on my local XAMPP system, but after uploading it to an online Linux server it stops working and I get this error:
Warning: file_get_contents(temp/1135767769.txt): failed to open stream: No such file or directory in /home/catchmyd/public_html/ocr_demo/TesseractOCR/TesseractOCR.php on line 255
Warning: unlink(temp/1135767769.txt): No such file or directory in /home/catchmyd/public_html/ocr_demo/TesseractOCR/TesseractOCR.php on line 268
It's not an issue, I just need support. How can I install it on an AWS EC2 Ubuntu machine?
Error
Fatal error: Call to undefined method TesseractOCR::run() in /var/www/fixtures/index.php on line 11
PHP Code
<?php
error_reporting(E_ALL);
ini_set('display_errors', 1);
include ('./vendor/autoload.php');
echo (new TesseractOCR('screen.jpg'))
->run();
?>
Any ideas?
Installed version: "thiagoalessio/tesseract_ocr": "^0.2.1"
Hi,
I am using CentOS 5 with tesseract-2.04-2.
When I use tesseract on the console it can only read .tif images; for other formats it does not recognise the type.
But in my script it does not even read the TIFF; it just returns an empty string.
This is how I am using it.
include('TesseractOCR.php');
if (isset($_FILES['file'])) {
$tesseract = new TesseractOCR($_FILES['file']['tmp_name']);
$result = $tesseract->recognize();
}
Any idea where I am going wrong?
Hello,
When the automated tests run on Travis, you get this error:
Notice: Undefined index: CODECLIMATE_REPO_TOKEN
See the end of the log : https://api.travis-ci.org/jobs/175863052/log.txt?deansi=true
Hi~
I have tried your code, and it fails when it reaches exec("tesseract $tifImage $outputFile nobatch $configFile");
No txt file is generated, but it works if I run the command directly in the Linux terminal.
Haven't you run into this? I am using CentOS 6.5 and PHP 5.5.
Hope you can confirm~
Is there any support in thiagoalessio/tesseract-ocr-for-php for disabling the language dictionary?
Where do I put the language pack for Chinese? I'm also curious where the core code is; I don't see how these files pull in any other code.
Just wondering how GitHub/Composer work xD
Hello
I am trying your tesseract-ocr-for-php to decode a number in a GIF file. I followed your installation instructions and your sample example, but it returns an empty result page without any errors:
My image file: http://hpics.li/4e167da
I also tried some text drawn in Paint, like this, but got the same result: http://hpics.li/708e750
My code:
require_once './vendor/autoload.php'; //if you are using composer
$tesseract = new TesseractOCR('hello.png');
echo $tesseract->recognize();
Maybe I am doing something wrong?
Thanks in advance for your reply.
Hi, I'm trying to run Tesseract in Orientation and script detection (OSD) only mode (PSM 0).
On the command line, setting PSM 0 works and produces something like:
Orientation: 0
Orientation in degrees: 0
Orientation confidence: 22.31
Script: 1
Script confidence: 36.67
I was hoping that the following code would produce that result, but instead it just gives a PSM 3 result of the OCR (default):
$tess = (new TesseractOCR(storage_path('app/doc.jpg')));
$tess->psm(0);
$text = $tess->run();
//Returns PSM 3 default OCR result
The other PSM modes (1-10) produce expected results, it's just PSM 0 which I can't get to work.
Is there a way to run PSM 0 and get the actual orientation result instead of the OCR? Is this an error, or is getting the OCR back for PSM 0 the expected result?
Thanks
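This does not answer how the wrapper returns PSM 0 output; it only shows how the OSD text, once captured (for example from a direct tesseract call), could be turned into data. A minimal sketch: parseOsd() is a hypothetical helper that only assumes "Key: value" lines like the ones quoted above. Numeric values become int or float; anything else (for example a script name printed by newer versions) stays a string.

```php
<?php
// Hypothetical helper: parse "Key: value" OSD lines into an associative array.
function parseOsd($text)
{
    $result = [];
    foreach (preg_split('/\R/', trim($text)) as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue; // skip lines that are not "Key: value"
        }
        $key = trim($parts[0]);
        $value = trim($parts[1]);
        // "0" becomes int 0, "22.31" becomes float 22.31
        $result[$key] = is_numeric($value) ? $value + 0 : $value;
    }
    return $result;
}
```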
Please give an example snippet showing how to set initialization parameters, either directly or through config files, so that we can do different tasks like applying a grayscale filter, etc.
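Tesseract accepts config variables on the command line as -c name=value, or in a config file with one "name value" pair per line. A minimal sketch that builds those flags from an array; configArguments() is a hypothetical helper, and which variables actually take effect depends on your Tesseract version. A grayscale conversion is likely better done on the image itself (with GD or ImageMagick) before OCR than through a Tesseract variable.

```php
<?php
// Hypothetical helper: turn ['name' => 'value'] into "-c 'name=value' ..." flags.
function configArguments(array $variables)
{
    $args = [];
    foreach ($variables as $name => $value) {
        $args[] = '-c ' . escapeshellarg($name . '=' . $value);
    }
    return implode(' ', $args);
}

// Example: restrict recognition to the characters of a phone number.
echo configArguments(['tessedit_char_whitelist' => '0123456789-+.']), PHP_EOL;
// On Linux prints: -c 'tessedit_char_whitelist=0123456789-+.'
```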
Hi,
I have a problem with an IIS 7.5 server.
No output data is displayed.
However, with a WAMP server, or from the command line on the IIS server, it works.
Can you help me?
Thank you very much.
Hello, how can I retrieve the X/Y coordinates of each recognized word?
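A minimal sketch, assuming a Tesseract version with the tsv output config (for example `tesseract image.png out tsv` writing out.tsv, available in 3.05 and 4.x). Each TSV row has the columns level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text, and word rows have level 5. parseTsvWords() is a hypothetical helper, not part of the wrapper; the hocr output config is an HTML-based alternative that carries the same boxes.

```php
<?php
// Hypothetical helper: extract word boxes from Tesseract's TSV output.
function parseTsvWords($tsv)
{
    $words = [];
    $lines = preg_split('/\R/', trim($tsv));
    array_shift($lines); // drop the header row
    foreach ($lines as $line) {
        $cols = explode("\t", $line);
        // keep only word-level rows (level 5) that contain text
        if (count($cols) < 12 || (int) $cols[0] !== 5 || trim($cols[11]) === '') {
            continue;
        }
        $words[] = [
            'text' => $cols[11],
            'x'    => (int) $cols[6],
            'y'    => (int) $cols[7],
            'w'    => (int) $cols[8],
            'h'    => (int) $cols[9],
            'conf' => (float) $cols[10],
        ];
    }
    return $words;
}
```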
Hi there,
I'm running Tesseract on a Linux server with PHP 5.4. I uploaded the main class file to classes/TesseractOCR.php and set chmod 666 on it, but unfortunately it still returns warnings instead of any values from the PNG file.
Here is my code:
require_once 'classes/TesseractOCR.php';
....
$tesseract = new TesseractOCR($file);
$tesseract->setTempDir('/absolute_path_to/temp/'); // temp folder have chmod 777
$tesseract->setWhitelist(range(0,9), '-+.'); // i'm trying to recognize phone numbers
echo $tesseract->recognize();
I tried many variations of the path to the temp folder (and also leaving it out), but I still get nothing. I would very much appreciate any help or suggestion; every solution I found on Google comes down to setting the temp folder, which doesn't help me.
Many thanks in advance.
When using the setLanguage option it throws an error.
I fixed this by placing $command.= " {$this->outputFile}"; above the $command.= " -l {$this->language}";
Can I use something like this?
$tesseract->setLanguage('rus+eng');
Hello sir,
I am using TesseractOCR for image reading, but it does not work: the output for the image is blank. Please help me.
If I have bought web hosting, can I use it?
I mean, does this REQUIRE any software to be installed,
or should PHP alone work?
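The wrapper runs the tesseract program on the server, so PHP alone is not enough: the binary has to be installed where PHP can execute it. A minimal sketch for checking that from PHP; findExecutable() is a hypothetical helper that searches the PATH seen by the PHP process, which may differ from the PATH of your own shell.

```php
<?php
// Hypothetical helper: look for an executable in the PATH visible to PHP.
function findExecutable($name)
{
    $extensions = DIRECTORY_SEPARATOR === '\\' ? ['.exe', ''] : [''];
    foreach (explode(PATH_SEPARATOR, (string) getenv('PATH')) as $dir) {
        if ($dir === '') {
            continue;
        }
        foreach ($extensions as $ext) {
            $candidate = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $name . $ext;
            if (is_file($candidate) && is_executable($candidate)) {
                return $candidate;
            }
        }
    }
    return null;
}

$found = findExecutable('tesseract');
echo $found !== null ? "tesseract found at $found" : 'tesseract is not on the PATH', PHP_EOL;
```

On shared hosting where you cannot install packages, a null result here usually means the wrapper cannot work without the host's help.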
Hello
I'd like to use TesseractOCR, so I installed it with Composer as described in the README file and tried the simple example from the README with the same picture. Here is my code:
<?php
require __DIR__.'/phpshell-2.4/vendor/autoload.php';
echo (new TesseractOCR('text.jpeg'))->run();
?>
I don't get any error, and my file does exist (I checked with if(is_file(...)) beforehand). However, the returned string is null.
Am I missing something?
As shown in the README, I simply did this:
require_once 'TesseractOCR.php';
$tesseract = new TesseractOCR('foo.png');
echo $tesseract->recognize();
and I am getting these errors:
Warning: file_get_contents(/tmp/798893693.txt): failed to open stream: No such file or directory in /var/.../TesseractOCR.php on line 212
Warning: unlink(/tmp/798893693.txt): No such file or directory in /var/.../TesseractOCR.php on line 225
Both the image foo.png and TesseractOCR.php are in the same directory. Also, I didn't understand the dependencies you talk about; I downloaded them anyway, but I have no idea where to put them.
It would have been nice to mention that you can't use absolute paths for the image location and TempDir. I had this working in "vanilla" PHP in minutes, but when I tried to integrate it into Laravel, it just wouldn't work. It took me 4 hours of trial and error only to realize that
$tesseract = new TesseractOCR("C:\Program Files\wamp\www\test\public\img\hello.png");
doesn't work, and that only relative paths work.
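A likely cause, rather than absolute paths as such: in a double-quoted PHP string, "\t" is a TAB character, so the path above does not actually contain "\test". Single quotes (or forward slashes, which Windows also accepts) keep the path intact. If your wrapper version does not quote the path when building the shell command, the space in "Program Files" may also break it. The snippet below only demonstrates the string behaviour.

```php
<?php
// Double quotes: "\t" in "\test" becomes a TAB, silently corrupting the path.
$doubleQuoted = "C:\Program Files\wamp\www\test\public\img\hello.png";
// Single quotes keep every backslash literally.
$singleQuoted = 'C:\Program Files\wamp\www\test\public\img\hello.png';
// Forward slashes avoid escape sequences entirely and also work on Windows.
$forwardSlashes = 'C:/Program Files/wamp/www/test/public/img/hello.png';

var_dump(strpos($doubleQuoted, "\t") !== false); // bool(true): contains a tab
var_dump(strpos($singleQuoted, "\t") !== false); // bool(false)
```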
I'm running Tesseract on Windows (XAMPP) and I want the result in a specific format: it should match \d{1,9}.((v|V)|(x|X)). Can someone help me set up that rule for Tesseract in PHP?
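As far as I know, Tesseract's user-patterns file can bias recognition but does not enforce a format, so one robust approach is to restrict the character set at recognition time (for example with the config variable tessedit_char_whitelist set to 0123456789vVxX) and then validate the recognized text in PHP. A minimal sketch of the PHP side; extractIds() is a hypothetical helper. Note that in a regex "." matches any character; use "\." if a literal dot is meant. [vVxX] is equivalent to ((v|V)|(x|X)).

```php
<?php
// Hypothetical helper: pull out every substring matching the requested format.
function extractIds($text)
{
    // "." matches ANY character here, exactly as in the pattern from the question.
    preg_match_all('/\d{1,9}.[vVxX]/', $text, $matches);
    return $matches[0];
}
```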
I'm trying to run this on a WAMP server and it gives several warning messages: "Warning: file_get_contents(tmp/tesseract-ocr-output-24039.txt) [function.file-get-contents]: failed to open stream: No such file or directory in C:\wamp\www\tesseract-ocr-for-php-master\tesseract_ocr\tesseract_ocr.php on line 49"
How do I fix this issue? Thanks in advance.
Hi guys,
If, for example, the input is not a valid file, Tesseract's error messages are returned as the recognized text, which I guess is unwanted behavior. It would be better to check the return code of the tesseract command and throw an exception in case of errors.
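A minimal sketch of that suggestion: run the command with proc_open, check the exit status, and only then read the output file, throwing an exception (with Tesseract's stderr) otherwise. runTesseract() is a hypothetical helper, not the wrapper's API; it assumes the usual `tesseract <image> <outputbase>` form, which writes <outputbase>.txt. This also turns the "file_get_contents(...txt): failed to open stream" warnings reported above into a readable error.

```php
<?php
// Hypothetical helper: run tesseract and fail loudly on a non-zero exit code.
function runTesseract($executable, $image, $tempDir = null)
{
    $base = tempnam($tempDir !== null ? $tempDir : sys_get_temp_dir(), 'ocr');
    if ($base === false) {
        throw new RuntimeException('Cannot create a temporary file');
    }
    $command = escapeshellarg($executable) . ' ' . escapeshellarg($image) . ' ' . escapeshellarg($base);
    $process = proc_open($command, [1 => ['pipe', 'w'], 2 => ['pipe', 'w']], $pipes);
    if (!is_resource($process)) {
        unlink($base);
        throw new RuntimeException("Could not start: $command");
    }
    // Tesseract prints little, so reading the pipes one after the other is fine here.
    $stdout = stream_get_contents($pipes[1]);
    $stderr = stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);
    $status = proc_close($process);

    unlink($base); // tempnam() created an empty placeholder file
    $outputFile = $base . '.txt';
    if ($status !== 0 || !is_file($outputFile)) {
        @unlink($outputFile);
        throw new RuntimeException("tesseract failed with exit code $status: " . trim($stderr . "\n" . $stdout));
    }
    $text = file_get_contents($outputFile);
    unlink($outputFile);
    return trim($text);
}
```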
I want to use a buffered image or raw image data as input to Tesseract.
Let's say I have a big image and I want Tesseract to read only a small portion of it.
I can use ImageMagick's crop method; the problem is that
new TesseractOCR() // only accepts an existing file, not raw data
so I need to use the Imagemagick->writeImage() method to create a local file, which is not efficient.
Does anyone know how to pass image data directly to Tesseract? Thanks.
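A minimal sketch that bypasses the wrapper: pipe the image bytes to tesseract on stdin and read the text from stdout, so no image file is written. It assumes a Tesseract build that accepts "stdin" and "stdout" as file names (4.x lists them in its usage text; check `tesseract --help` on yours). ocrFromBytes() is a hypothetical helper.

```php
<?php
// Hypothetical helper: OCR raw image bytes via tesseract's stdin/stdout.
function ocrFromBytes($imageData, $executable = 'tesseract')
{
    $command = escapeshellarg($executable) . ' stdin stdout';
    $spec = [0 => ['pipe', 'r'], 1 => ['pipe', 'w'], 2 => ['pipe', 'w']];
    $process = proc_open($command, $spec, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException("Could not start: $command");
    }
    // Send the whole image first; tesseract needs the complete image before
    // it can produce any text, so it should not block on its output meanwhile.
    fwrite($pipes[0], $imageData);
    fclose($pipes[0]);
    $text = stream_get_contents($pipes[1]);
    $errors = stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);
    if (proc_close($process) !== 0) {
        throw new RuntimeException('tesseract failed: ' . trim($errors));
    }
    return trim($text);
}
```

With Imagick, the crop could then stay in memory, e.g. $img->cropImage($w, $h, $x, $y); followed by ocrFromBytes($img->getImageBlob()); (untested here).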
Hi, I've installed tesseract using Composer into my XAMPP localhost on Windows (PHP 5.5.37). Everything runs in the 'htdocs/picture' directory (version 1.0.0-RC). I tested the program with the code below in test.php, but it echoes a blank result. Please enlighten me. Thanks.
<?php
require_once 'vendor/autoload.php'; // Without this line I get "Fatal error: Class 'TesseractOCR' not found in C:\xampp\htdocs\picture\test.php on line 3", so I added it back, but the output is blank.
echo (new TesseractOCR('text.jpeg'))->run();
?>
Hello,
I created a simple demo using the example, but it doesn't work: the echo shows nothing. I ran
tesseract images/8055.png test
from the command line in the project directory, and it worked: it wrote the contents of image "8055.png" to test.txt. Here is my code:
<?php
require_once './TesseractOCR.php';
$ec = new TesseractOCR('./images/text.jpeg');
echo $ec->run();
?>
I am using a WAMP server as localhost. Here is my folder structure:
TesseractOCR.php
test.php
images
images/text.jpeg
What am I doing wrong? Please help.
I am basically a beginner in programming, and I am developing a web application that converts images into text. I have used Orcad for this, but it only handles simple text and causes many issues. I want to know whether this library would help me build an interface where a user uploads a file in a format like PNG or JPEG and the web app converts it into text. I am also learning Node.js for this; kindly advise me which is better and easier to learn, particularly for this project.
Hi, is anyone able to extract the phone number from this image?