
ruediger / vobsub2srt


Converts VobSub subtitles (.idx/.sub format) into .srt subtitles.

License: GNU General Public License v3.0

Makefile 0.37% Shell 0.33% C 70.38% C++ 13.89% Ruby 0.27% Perl 1.14% CMake 13.62%

vobsub2srt's Introduction

VobSub2SRT is a simple command line program to convert .idx / .sub subtitles into .srt text subtitles by using OCR. It is based on code from the MPlayer project - a really really great movie player. Some minor parts are copied from ffmpeg/avutil headers. Tesseract is used as OCR software.

vobsub2srt is released under the GPL3+ license. The MPlayer code included is GPL2+ licensed.

The quality of the OCR depends on the text in the subtitles. The code does not yet do any preprocessing, but I’m looking into adding filters and scaling options to improve the OCR. You can correct mistakes in the .srt files with a text editor or a dedicated subtitle editor.

Building

You need tesseract. You also need cmake and gcc to build it. On Ubuntu 12.10 you can install the dependencies with

sudo apt-get install libtiff5-dev libtesseract-dev tesseract-ocr-eng build-essential cmake pkg-config

You should also install the tesseract data for the languages you want to use! Note that the support for tesseract 2 is deprecated and will be removed in the future!

./configure
make
sudo make install

This should install the program vobsub2srt to /usr/local/bin. You can uninstall vobsub2srt with sudo make uninstall.

Static binary

I recommend using the dynamic binary! However, if you really need a static binary you can add the flag -DBUILD_STATIC=ON to the ./configure call. Be aware that building static binaries can be quite troublesome. You need the static library files for tesseract, libtiff, libavutil, and for their dependencies as well. On Ubuntu 12.04 the static libraries are only included in the dev packages! You probably also need the Gold linker.

For Ubuntu 12.04 you need the following extra packages:

sudo apt-get install libleptonica-dev libpng12-dev libwebp-dev libgif-dev zlib1g-dev libjpeg-dev binutils-gold

If linking fails with undefined references then checking what other dependencies your version of leptonica has is a good starting point. You can do this by running ldd /usr/lib/liblept.so (or whatever the path to leptonica is on your system). Add those dependencies to CMakeModules/FindTesseract.cmake.
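As a rough illustration of the ldd step above, the following Python sketch pulls the library names out of ldd-style output so they can be compared against what CMakeModules/FindTesseract.cmake already links. The helper name `shared_libs` and the sample text are hypothetical; the sample only mimics the usual `name => path (address)` layout and is not output from a real system.

```python
import re


def shared_libs(ldd_output):
    """Return the 'lib...' names listed in ldd-style output, in order."""
    names = []
    for line in ldd_output.splitlines():
        # Typical line: "<tab>libpng12.so.0 => /lib/libpng12.so.0 (0x...)"
        match = re.match(r"\s*(lib\S+)\s+=>", line)
        if match:
            names.append(match.group(1))
    return names


# Illustrative sample only, not captured from a real machine.
sample = (
    "\tlinux-vdso.so.1 =>  (0x00007ffd00000000)\n"
    "\tlibpng12.so.0 => /lib/libpng12.so.0 (0x00007f0100000000)\n"
    "\tlibjpeg.so.8 => /usr/lib/libjpeg.so.8 (0x00007f0200000000)\n"
)
print(shared_libs(sample))  # ['libpng12.so.0', 'libjpeg.so.8']
```

In practice the text would come from running ldd on the leptonica library path of your own system.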

Ubuntu PPA and .deb packages

I have created a PPA (Personal Package Archive) to make installation on Ubuntu easy. Simply add the PPA to your apt-get sources and run an update and you can install the vobsub2srt package:

sudo add-apt-repository ppa:ruediger-c-plusplus/vobsub2srt
sudo apt-get update
sudo apt-get install vobsub2srt

.deb (Debian/Ubuntu)

You can build a *.deb package (Debian/Ubuntu) with make package. The package is created in the build directory.

You can also create a source package and upload it to your own PPA by using the UploadPPA.cmake. But this is only recommended for people experienced with cmake and creating Debian packages.

Homebrew

Vobsub2srt contains a formula for Homebrew (a package manager for OS X). It can be installed by using the following commands:

brew install --with-all-languages tesseract
brew install --HEAD https://github.com/ruediger/VobSub2SRT/raw/master/packaging/vobsub2srt.rb

Gentoo ebuild

An ebuild for Gentoo Linux is also available. You can make it available to emerge with the following steps

sudo mkdir -p /usr/local/portage/media-video/vobsub2srt/
wget https://github.com/ruediger/VobSub2SRT/raw/master/packaging/vobsub2srt-9999.ebuild
sudo mv vobsub2srt-9999.ebuild /usr/local/portage/media-video/vobsub2srt/
cd /usr/local/portage/media-video/vobsub2srt/
sudo ebuild vobsub2srt-9999.ebuild digest

You should be able to install vobsub2srt with emerge vobsub2srt now. If you want to use a newer version (3+) of tesseract you have to use layman. See #13 for details.

Arch AUR

There is also a PKGBUILD file for Arch Linux in AUR: https://aur.archlinux.org/packages/vobsub2srt-git

Usage

vobsub2srt converts subtitles in VobSub (.idx / .sub) format into subtitles in .srt format. VobSub subtitles consist of two or three files called Filename.idx, Filename.sub and an optional Filename.ifo. To convert subtitles simply call

vobsub2srt Filename

with Filename being the file name of the subtitle files WITHOUT the extension (.idx / .sub). vobsub2srt writes the subtitles to a file called Filename.srt.

If a subtitle file contains more than one language you can use the --lang parameter to set the correct language (use --langlist to find out about the languages in the file). For some languages you might need to set the tesseract language yourself (e.g., chi_tra/chi_sim for traditional or simplified Chinese characters). You can use --tesseract-lang to do this. In most cases, however, this should be autodetected.
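For scripted use, the call described above can be assembled programmatically. This is a minimal Python sketch, not part of vobsub2srt: the helper name `build_command` is hypothetical, and it only uses the --lang and --tesseract-lang options named in this section. It strips a trailing .idx or .sub so the program receives the base name it expects.

```python
from pathlib import Path


def build_command(subtitle_file, lang=None, tesseract_lang=None):
    """Build a vobsub2srt argument list from a subtitle file name."""
    base = Path(subtitle_file)
    # vobsub2srt expects the name WITHOUT the .idx / .sub extension.
    if base.suffix.lower() in {".idx", ".sub"}:
        base = base.with_suffix("")
    command = ["vobsub2srt"]
    if lang:
        command += ["--lang", lang]
    if tesseract_lang:
        command += ["--tesseract-lang", tesseract_lang]
    command.append(str(base))
    return command


print(build_command("movie.idx", lang="en"))
# ['vobsub2srt', '--lang', 'en', 'movie']
```

The resulting list can be handed to `subprocess.run` on a machine where vobsub2srt is installed.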

If you want to dump the subtitles as images (e.g. to check for correct OCR) you can use the --dump-images flag.

Use --help or read the manpage to get more information about the options of vobsub2srt.

Bug reports

Please submit bug reports or feature requests to the issue tracker on GitHub. If you do not have a GitHub account and feel uncomfortable creating one then feel free to send an e-mail to <[email protected]> instead.

If you have problems with a specific subtitle file then please check if it works in mplayer first. If it does not then please report the bug to mplayer as well and link to the mplayer bug report.

For bug reports please run vobsub2srt with the --verbose option and copy and paste the full output to the bug report.

Contributors

Most code is from the MPlayer project.

  • Armin Häberling <[email protected]> wrote a patch to fix an issue with multiple instances of the same subtitle in the result file (21af426)
  • James Harris <[email protected]> wrote the formula for Homebrew (54f311d6)
  • Leo Koppelkamm reported and fixed issue #5 and problems with long filenames (b903074c, 36ec8da, d3602d6)
  • Till Korten <[email protected]> wrote the ebuild script (#13)
  • Andreasf fixed missing libavutil include path (3a175eb, #15)
  • Michal Gawlik fixed the overlapping issue (5b2ccabc55f, #29, #32)
  • “bit” made sure no trailing whitespace is written to the SRT (3a59dc278abc2, #38)
  • Baudouin Raoult for various fixes (028f742, #44, b722a03, #42, 7293ac2, #40)
  • Justyn Butler added the y-threshold support (f873761, #43)
  • James Laird-Wah added min-width/height support and fixed other issues (41c6844, #48, #46)
  • Filirom1 fixed a minor issue (4ed58c2, #49)

To Do

  • implement preprocessing (first step scaling. Code available in spudec.c)

vobsub2srt's People

Contributors

abrasive, andreasf, arminha, b8raoult, bit, c17r, camjn, filirom1, florianpircher, justyn, norayr, ruediger, subpop, vinzenz


vobsub2srt's Issues

TESSERACT_DATA_PATH and tesseract 3.03_rc1

Hi,

My system is Gentoo.
If I use the ebuild from packaging/, vobsub2srt works fine with tesseract-3.02.
But if I upgrade tesseract to 3.03_rc1, vobsub2srt fails like this:

$ vobsub2srt 01
Error opening data file /usr/share/tesseract-ocr/tessdata/fra.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'fra'
Tesseract couldn't load any languages!
Failed to initialize tesseract (OCR).

The only way I found to make it work with tesseract 3.03_rc1 is to rebuild vobsub2srt and force TESSERACT_DATA_PATH in the configure phase, as follows in the ebuild:

src_configure() {
      local mycmakeargs=(
              -DTESSERACT_DATA_PATH="/usr/share"
      )
      cmake-utils_src_configure
}

So my question is: why do I need to force the TESSERACT_DATA_PATH value with tesseract 3.03_rc1 but not with 3.02? Is this normal?
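The error message quoted above asks for TESSDATA_PREFIX to be the parent directory of the "tessdata" directory. As a small Python sketch of that rule only (the helper name `tessdata_prefix` is hypothetical, and other tesseract versions may interpret the variable differently), the value can be derived from the path of a .traineddata file:

```python
from pathlib import PurePosixPath


def tessdata_prefix(traineddata_path):
    """Return the parent of the 'tessdata' directory holding the file."""
    tessdata_dir = PurePosixPath(traineddata_path).parent
    if tessdata_dir.name != "tessdata":
        raise ValueError("expected the file to live in a 'tessdata' directory")
    return str(tessdata_dir.parent)


# Path taken from the error message in this report.
print(tessdata_prefix("/usr/share/tesseract-ocr/tessdata/fra.traineddata"))
# /usr/share/tesseract-ocr
```

Comparing this value with the directory where the language data is actually installed shows whether the compiled-in path and the installed data disagree.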

Thanks.

Blank pgm images and no text data

I'm having trouble getting VobSub2SRT to work.

On my files test.idx/test.sub, running the command:

vobsub2srt --langlist test

Produces output:

Languages:
0: en

Using the command:

vobsub2srt --lang en --tesseract-lang eng --dump-images test

Appears to work, with output:

Selected VOBSUB language: 0 language: en
Wrote Subtitles to 'test.srt

However test.srt contains the timing data but no actual text.
~500 .pgm files are output but they are all blank.

Loading the same idx file with avidemux OCR Vobsub->SRT tool shows all the correct subtitles are in the file.

Any way I can get some more debug output?
Running latest git master on Ubuntu 14.04 64-bit.

Thanks.

compiling tesseract-pkg-config

Hi again. With OpenCL removed from tesseract it now works, so I tried using pkg-config with the branch named in the title:

./configure
-- The C compiler identification is GNU 4.9.3
-- The CXX compiler identification is GNU 4.9.3
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Source: /home/pipe/Documentos/git/VobSub2SRT
-- Binary: /home/pipe/Documentos/git/VobSub2SRT/build
-- Build type: Debug
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.28") 
-- Checking for module 'tesseract'
--   Found tesseract, version 3.04.00
-- Could NOT find Tesseract (missing:  Tesseract_INCLUDE_DIR) 
-- Bash completion path: /usr/share/bash-completion/completions
-- vobsub2srt version: 1.0pre7-2-ga2b2682
-- Debian architecture: amd64
-- Configuring done
-- Generating done
-- Build files have been written to: /home/pipe/Documentos/git/VobSub2SRT/build
pipe@8dsaIHG ~/Documentos/git/VobSub2SRT $ make -j9
make -C build
make[1]: Entering directory '/home/pipe/Documentos/git/VobSub2SRT/build'
make[2]: Entering directory '/home/pipe/Documentos/git/VobSub2SRT/build'
make[3]: Entering directory '/home/pipe/Documentos/git/VobSub2SRT/build'
make[3]: Entering directory '/home/pipe/Documentos/git/VobSub2SRT/build'
Scanning dependencies of target documentation
Scanning dependencies of target mplayer
make[3]: Leaving directory '/home/pipe/Documentos/git/VobSub2SRT/build'
make[3]: Entering directory '/home/pipe/Documentos/git/VobSub2SRT/build'
[ 10%] Generating vobsub2srt.1.gz
make[3]: Leaving directory '/home/pipe/Documentos/git/VobSub2SRT/build'
[ 10%] Built target documentation
make[3]: Leaving directory '/home/pipe/Documentos/git/VobSub2SRT/build'
make[3]: Entering directory '/home/pipe/Documentos/git/VobSub2SRT/build'
[ 30%] Building C object mplayer/CMakeFiles/mplayer.dir/mp_msg.c.o
[ 40%] Building C object mplayer/CMakeFiles/mplayer.dir/unrar_exec.c.o
[ 50%] Building C object mplayer/CMakeFiles/mplayer.dir/spudec.c.o
[ 50%] Building C object mplayer/CMakeFiles/mplayer.dir/vobsub.c.o
[ 60%] Linking C static library ../lib/libmplayer.a
make[3]: Leaving directory '/home/pipe/Documentos/git/VobSub2SRT/build'
[ 60%] Built target mplayer
make[3]: Entering directory '/home/pipe/Documentos/git/VobSub2SRT/build'
Scanning dependencies of target vobsub2srt
make[3]: Leaving directory '/home/pipe/Documentos/git/VobSub2SRT/build'
make[3]: Entering directory '/home/pipe/Documentos/git/VobSub2SRT/build'
[ 80%] Building CXX object src/CMakeFiles/vobsub2srt.dir/vobsub2srt.c++.o
[ 80%] Building CXX object src/CMakeFiles/vobsub2srt.dir/langcodes.c++.o
[ 90%] Building CXX object src/CMakeFiles/vobsub2srt.dir/cmd_options.c++.o
/home/pipe/Documentos/git/VobSub2SRT/src/vobsub2srt.c++: In function ‘int main(int, char**)’:
/home/pipe/Documentos/git/VobSub2SRT/src/vobsub2srt.c++:218:3: error: ‘TessBaseAPI’ has not been declared
   TessBaseAPI::SimpleInit(tess_path, tess_lang, false); // TODO params
   ^
/home/pipe/Documentos/git/VobSub2SRT/src/vobsub2srt.c++:220:5: error: ‘TessBaseAPI’ has not been declared
     TessBaseAPI::SetVariable("tessedit_char_blacklist", blacklist.c_str());
     ^
/home/pipe/Documentos/git/VobSub2SRT/src/vobsub2srt.c++:275:20: error: ‘TessBaseAPI’ has not been declared
       char *text = TessBaseAPI::TesseractRect(image, 1, stride, 0, 0, width, height);
                    ^
/home/pipe/Documentos/git/VobSub2SRT/src/vobsub2srt.c++:314:3: error: ‘TessBaseAPI’ has not been declared
   TessBaseAPI::End();
   ^
src/CMakeFiles/vobsub2srt.dir/build.make:62: recipe for target 'src/CMakeFiles/vobsub2srt.dir/vobsub2srt.c++.o' failed
make[3]: *** [src/CMakeFiles/vobsub2srt.dir/vobsub2srt.c++.o] Error 1
make[3]: Leaving directory '/home/pipe/Documentos/git/VobSub2SRT/build'
CMakeFiles/Makefile2:172: recipe for target 'src/CMakeFiles/vobsub2srt.dir/all' failed
make[2]: *** [src/CMakeFiles/vobsub2srt.dir/all] Error 2
make[2]: Leaving directory '/home/pipe/Documentos/git/VobSub2SRT/build'
Makefile:149: recipe for target 'all' failed
make[1]: *** [all] Error 2
make[1]: Leaving directory '/home/pipe/Documentos/git/VobSub2SRT/build'
Makefile:4: recipe for target 'all' failed
make: *** [all] Error 2

Creating a threshold option to convert all colours to either full-black or full-white?

I've found VobSub2SRT to be a great tool, thank you.

However with the subtitle streams I'm using there are a lot of repetitive OCR mistakes that seem to be caused by the grey pixels outlining each of the (off-white) characters.

You can see an example image here:
https://www.dropbox.com/s/dkbd9fh6hfr26fa/stvoy2x23-en-513-orig.png?dl=0

I've found that if I modify the palette line in the idx file to change the grey colours to black, leaving only one lighter colour, I get fantastically better results.

The images from the modified idx look like this:
https://www.dropbox.com/s/p08mk4204717dnx/stvoy2x23-en-513-bw.png?dl=0

I think that in the modified version it is easier to distinguish individual characters.

I'm guessing that this could be a common problem, so I was thinking of a really simple option to add to VobSub2SRT that would perform the step automatically.

The simplest way I can think of would be to specify a threshold value as a parameter, and to convert every colour in the palette above the value to white and every colour below it to black, e.g. --bw-threshold 200.

  1. Does this sound sensible to you?
  2. At what point in the code would this make the most sense?
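To make the proposal concrete, here is a minimal Python sketch of the thresholding step applied to an .idx palette line. It is not the implementation that vobsub2srt ended up with. The function name `threshold_palette` is hypothetical, brightness is assumed to be the plain average of the red, green and blue channels, a colour exactly at the threshold is assumed to become white, and the palette line is assumed to be a comma-separated list of six-digit hex colours.

```python
def threshold_palette(palette_line, threshold):
    """Map every palette colour to pure black or pure white."""
    key, _, values = palette_line.partition(":")
    result = []
    for colour in values.split(","):
        colour = colour.strip()
        # Split "rrggbb" into its three channel values.
        red, green, blue = (int(colour[i:i + 2], 16) for i in (0, 2, 4))
        brightness = (red + green + blue) / 3  # assumption: simple average
        result.append("ffffff" if brightness >= threshold else "000000")
    return key + ": " + ", ".join(result)


print(threshold_palette("palette: 000000, 828282, f0f0f0", 200))
# palette: 000000, 000000, ffffff
```

With a threshold of 200 the grey outline colour (828282, brightness 130) becomes black while the off-white text colour stays light, which matches the manual palette edit described above.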

Dedicated --langlist option

For files with multiple languages --verbose can be used to get a list of languages. But the behaviour of --verbose is not very predictable and the list of languages is not printed if there is only one language in the sub file. Adding a --langlist option could make it easier to write scripts.

This issue was reported to me by email from Enrico Gherardo.
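As an example of the scripting use case, the following Python sketch parses --langlist output of the shape quoted in other reports on this page ("Languages:" followed by lines such as "0: en"). The helper name `parse_langlist` is hypothetical, and the format is assumed from those quoted outputs rather than from a specification.

```python
def parse_langlist(output):
    """Return (index, language code) pairs from --langlist output."""
    languages = []
    for line in output.splitlines():
        index, separator, code = line.partition(":")
        # Skip the "Languages:" header and anything else without an index.
        if separator and index.strip().isdigit():
            languages.append((int(index), code.strip()))
    return languages


print(parse_langlist("Languages:\n0: en\n1: de\n"))
# [(0, 'en'), (1, 'de')]
```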

output subtitled with overlapping errors

If you don't mind, I'd like to report 2 more bugs that showed up after the .srt was generated.

When I played it with VLC I realized the subs were overlapping, so I ran the error check in the subtitleeditor software (a great one too, btw).

Here is the output of the error check (screenshot: selection_002).

Even if subtitleeditor has a menu option for TRY TO FIX ALL, apparently it can't manage to do so.
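One simple way to repair overlaps after the fact is to end each subtitle no later than the start of the next one. The Python sketch below illustrates that idea only; it is not necessarily how the overlap issue was fixed in vobsub2srt, and the helper names `parse_timestamp` and `remove_overlaps` are hypothetical. Cues are assumed to be sorted by start time.

```python
def parse_timestamp(text):
    """Convert an SRT timestamp 'HH:MM:SS,mmm' to milliseconds."""
    hours, minutes, rest = text.split(":")
    seconds, millis = rest.split(",")
    total_seconds = (int(hours) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def remove_overlaps(cues):
    """Clamp each (start_ms, end_ms) cue so it ends before the next starts."""
    fixed = []
    for index, (start, end) in enumerate(cues):
        if index + 1 < len(cues):
            next_start = cues[index + 1][0]
            end = min(end, next_start)
        fixed.append((start, end))
    return fixed


cues = [
    (parse_timestamp("00:00:01,000"), parse_timestamp("00:00:04,000")),
    (parse_timestamp("00:00:03,500"), parse_timestamp("00:00:06,000")),
]
print(remove_overlaps(cues))  # [(1000, 3500), (3500, 6000)]
```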

Thanks again.

Blacklist doesn't seem to work

Hi Guys

I'm not 100% sure if I'm passing the parameter in the right format, but it would seem that the blacklist doesn't work quite right.

I tried

vobsub2srt --blacklist \| filename

and

vobsub2srt --blacklist "|" filename

With both options, strings like |'|| or wou|d still show up in the SRT file.
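To rule out shell quoting as the cause, the two invocations above can be compared with Python's shlex module, which approximates POSIX shell word splitting. This sketch suggests that both forms hand the same single "|" argument to the program; the helper name `split_like_shell` is hypothetical, and real shells may differ from shlex in corner cases that do not arise here.

```python
import shlex


def split_like_shell(command_line):
    """Split a command line roughly the way a POSIX shell would."""
    return shlex.split(command_line)


escaped = split_like_shell(r"vobsub2srt --blacklist \| filename")
quoted = split_like_shell('vobsub2srt --blacklist "|" filename')
print(escaped == quoted)  # True
print(escaped)  # ['vobsub2srt', '--blacklist', '|', 'filename']
```

If both forms produce the same arguments, the difference in behaviour has to come from how the option is handled after parsing, not from the shell.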

As a quick fix, can I change line 137 of vobsub2srt.c++ from

  tess_base_api.SetVariable("tessedit_char_blacklist", blacklist.c_str());

to

  tess_base_api.SetVariable("tessedit_char_blacklist", "|");

and remove the if statement around it? (Sorry, I have zero C++ experience).

Thank you

Werner

error during configure with tesseract 3.01

When I try to compile with tesseract 3.01, I get the following error during configure:

-- Found Threads: TRUE
-- Performing Test TESSERACT_NAMESPACE
-- Performing Test TESSERACT_NAMESPACE - Success
-- Found Tesseract: Tesseract_LIBRARIES-NOTFOUND;/usr/lib64/libtiff.so
-- vobsub2srt version: 48723ea
-- dpkg not found: No package generation.
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
Tesseract_LIBRARIES (ADVANCED)
linked by target "vobsub2srt" in directory /var/tmp/portage/media-video/vobsub2srt-9999/work/vobsub2srt-9999/src

-- Configuring incomplete, errors occurred!

I am on a Gentoo system, using tesseract 3.01 and libavutil version 50.43.0.
With tesseract 2.04 the compile works fine, but I ran into the same problem reported here:
#4
After upgrading to tesseract 3.01 without recompiling vobsub2srt, I get the unicharset file not found error, so I tried to recompile.

I figure this has something to do with tesseract 2.04 and tesseract 3.01 not being compatible (from the tesseract ReadMe wiki http://code.google.com/p/tesseract-ocr/wiki/ReadMe ):
"Another important change is that you should really be using TessBaseAPI if you are linking with another program. In Linux (non-Windows) the main library is now libtesseract_api.a instead of the old libtesseract_full.a. "

Compilation fails on Fedora

./configure -DBUILD_STATIC=ON
-- Source: /home/ssbarnea/VobSub2SRT
-- Binary: /home/ssbarnea/VobSub2SRT/build
-- Build type: Debug
CMake Warning at CMakeLists.txt:26 (message):
  Building a statically linked version of VobSub2SRT is NOT recommended.  You
  might run into library dependency issues.  Please check the README!


-- Performing Test GIF_GifFileType_UserData
-- Performing Test GIF_GifFileType_UserData - Success
-- Found GIF: /usr/lib64/libgif.so (found version "4")
CMake Warning at CMakeModules/FindTesseract.cmake:56 (message):
  You are using an old Tesseract version.  Support for Tesseract 2 is
  deprecated and will be removed in the future!
Call Stack (most recent call first):
  CMakeLists.txt:66 (find_package)


-- Bash completion path: /usr/share/bash-completion/completions
-- vobsub2srt version: 1.0pre7-7-gd8f6803
-- Debian architecture: amd64
-- Configuring done
-- Generating done
-- Build files have been written to: /home/ssbarnea/VobSub2SRT/build

Compile

$ make
make -C build
make[1]: Entering directory '/home/ssbarnea/VobSub2SRT/build'
make[2]: Entering directory '/home/ssbarnea/VobSub2SRT/build'
make[3]: Entering directory '/home/ssbarnea/VobSub2SRT/build'
make[3]: Leaving directory '/home/ssbarnea/VobSub2SRT/build'
[ 50%] Built target mplayer
make[3]: Entering directory '/home/ssbarnea/VobSub2SRT/build'
make[3]: Leaving directory '/home/ssbarnea/VobSub2SRT/build'
make[3]: Entering directory '/home/ssbarnea/VobSub2SRT/build'
[ 60%] Building CXX object src/CMakeFiles/vobsub2srt.dir/vobsub2srt.c++.o
/home/ssbarnea/VobSub2SRT/src/vobsub2srt.c++: In function ‘int main(int, char**)’:
/home/ssbarnea/VobSub2SRT/src/vobsub2srt.c++:218:3: error: ‘TessBaseAPI’ has not been declared
   TessBaseAPI::SimpleInit(tess_path, tess_lang, false); // TODO params
   ^~~~~~~~~~~
/home/ssbarnea/VobSub2SRT/src/vobsub2srt.c++:220:5: error: ‘TessBaseAPI’ has not been declared
     TessBaseAPI::SetVariable("tessedit_char_blacklist", blacklist.c_str());
     ^~~~~~~~~~~
/home/ssbarnea/VobSub2SRT/src/vobsub2srt.c++:275:20: error: ‘TessBaseAPI’ has not been declared
       char *text = TessBaseAPI::TesseractRect(image, 1, stride, 0, 0, width, height);
                    ^~~~~~~~~~~
/home/ssbarnea/VobSub2SRT/src/vobsub2srt.c++:314:3: error: ‘TessBaseAPI’ has not been declared
   TessBaseAPI::End();
   ^~~~~~~~~~~
src/CMakeFiles/vobsub2srt.dir/build.make:62: recipe for target 'src/CMakeFiles/vobsub2srt.dir/vobsub2srt.c++.o' failed
make[3]: *** [src/CMakeFiles/vobsub2srt.dir/vobsub2srt.c++.o] Error 1
make[3]: Leaving directory '/home/ssbarnea/VobSub2SRT/build'
CMakeFiles/Makefile2:172: recipe for target 'src/CMakeFiles/vobsub2srt.dir/all' failed
make[2]: *** [src/CMakeFiles/vobsub2srt.dir/all] Error 2
make[2]: Leaving directory '/home/ssbarnea/VobSub2SRT/build'
Makefile:149: recipe for target 'all' failed
make[1]: *** [all] Error 2
make[1]: Leaving directory '/home/ssbarnea/VobSub2SRT/build'
Makefile:4: recipe for target 'all' failed
make: *** [all] Error 2

Make static linking less painful

Currently static linking is a bit of a hack. This is mainly due to the lack of pkg-config support in leptonica, which means all of its library dependencies (and there are quite a lot) have to be tracked manually. Tesseract finally has pkg-config support in 3.02.02 (not in Ubuntu 12.10 though).

Static linking on Ubuntu 12.10 (and Debian) is currently not working because no static version of libjbig, a dependency of their leptonica package, is provided. See https://bugs.launchpad.net/ubuntu/+source/jbigkit/+bug/1098361

This issue was also reported by e-mail:

Linking CXX executable ../bin/vobsub2srt
/home/ozone/test/build/lib/libtesseract.a(svutil.o): In function
"SVNetwork::SVNetwork(char const*, int)":
svutil.cpp:(.text+0x42d): warning: Using 'getaddrinfo' in statically
linked applications requires at runtime the shared libraries from the
glibc version used for linking
/usr/lib64/gcc/x86_64-slackware-linux/4.7.1/../../../../x86_64-slackware-linux/bin/ld:
/usr/lib64/gcc/x86_64-slackware-linux/4.7.1/../../../../lib64/libstdc++.a(eh_globals.o):
undefined reference to symbol '__tls_get_addr@@GLIBC_2.3'
/usr/lib64/gcc/x86_64-slackware-linux/4.7.1/../../../../x86_64-slackware-linux/bin/ld:
note: '__tls_get_addr@@GLIBC_2.3' is defined in DSO
/lib64/ld-linux-x86-64.so.2 so try adding it to the linker command line
/lib64/ld-linux-x86-64.so.2: could not read symbols: Invalid operation
collect2: error: ld returned 1 exit status

Use Tesseract 4 installed without root

Hello,

I'm wondering if it's possible to install VobSub2SRT and specify the path of a Tesseract 4 installation (in my home directory), because I want to use the latest version, which I installed myself.
I already have VobSub2SRT working but with Tesseract 3.

Thank you very much.

vobsub2srt doesn't detect subtitles in this file.

I am trying to use this program on VobSub subtitles extracted from an mkv track. I don't have an IFO file. The subtitles were extracted using mkvextract. The program appears … You can download the offending .sub/.idx files from my Dropbox.

Here is the file: http://dl.dropbox.com/u/4660781/problem_vobsub.zip

I believe that at https://github.com/ruediger/VobSub2SRT/blob/master/src/vobsub2srt.c++#L177 vobsub_get_next_packet returns -1 causing the program to skip the main loop entirely. If you could shed any light on the situation that would be incredibly helpful.

Thanks in advance.

Incorrect Homebrew install instructions

The following instructions are preferable:

brew install --all-languages tesseract
brew install --HEAD https://github.com/ruediger/VobSub2SRT/raw/master/packaging/vobsub2srt.rb

dump images as png

Hello,
How can I produce a PNG image with transparency instead of a PGM?
Thanks in advance :)

Fail to build with GCC 7

Here I try to build the latest git snapshot 0ba6e25

Build for debian unstable amd64 with tesseract 4.00~git2174-3b62badd-5

cd "/src/vobsub2srt-1.0~pre7+20171219/build/src" && /usr/lib/ccache/c++  -DINSTALL_PREFIX=\"/usr\" -I"/src/vobsub2srt-1.0~pre7+20171219/mplayer"  -ansi -pedantic -Wall -Wextra -Wno-long-long   -o CMakeFiles/vobsub2srt.dir/vobsub2srt.c++.o -c "/src/vobsub2srt-1.0~pre7+20171219/src/vobsub2srt.c++"
In file included from /usr/include/c++/7/cinttypes:35:0,
                 from /usr/include/tesseract/host.h:30,
                 from /usr/include/tesseract/serialis.h:26,
                 from /usr/include/tesseract/baseapi.h:32,
                 from /src/vobsub2srt-1.0~pre7+20171219/src/vobsub2srt.c++:27:
/usr/include/c++/7/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
 #error This file requires compiler and library support \
  ^~~~~
In file included from /usr/include/tesseract/apitypes.h:23:0,
                 from /usr/include/tesseract/baseapi.h:27,
                 from /src/vobsub2srt-1.0~pre7+20171219/src/vobsub2srt.c++:27:
/usr/include/tesseract/publictypes.h:33:1: error: 'constexpr' does not name a type
 constexpr int kPointsPerInch = 72;
 ^~~~~~~~~
/usr/include/tesseract/publictypes.h:33:1: note: C++11 'constexpr' only available with -std=c++11 or -std=gnu++11
/usr/include/tesseract/publictypes.h:38:1: error: 'constexpr' does not name a type
 constexpr int kMinCredibleResolution = 70;
 ^~~~~~~~~
/usr/include/tesseract/publictypes.h:38:1: note: C++11 'constexpr' only available with -std=c++11 or -std=gnu++11
/usr/include/tesseract/publictypes.h:40:1: error: 'constexpr' does not name a type
 constexpr int kMaxCredibleResolution = 2400;
 ^~~~~~~~~
/usr/include/tesseract/publictypes.h:40:1: note: C++11 'constexpr' only available with -std=c++11 or -std=gnu++11
/usr/include/tesseract/publictypes.h:45:1: error: 'constexpr' does not name a type
 constexpr int kResolutionEstimationFactor = 10;
 ^~~~~~~~~
/usr/include/tesseract/publictypes.h:45:1: note: C++11 'constexpr' only available with -std=c++11 or -std=gnu++11
In file included from /usr/include/tesseract/ltrresultiterator.h:26:0,
                 from /usr/include/tesseract/resultiterator.h:26,
                 from /usr/include/tesseract/baseapi.h:31,
                 from /src/vobsub2srt-1.0~pre7+20171219/src/vobsub2srt.c++:27:
/usr/include/tesseract/unichar.h:164:10: error: 'string' does not name a type; did you mean 'stdin'?
   static string UTF32ToUTF8(const std::vector<char32>& str32);
          ^~~~~~
          stdin
/src/vobsub2srt-1.0~pre7+20171219/src/vobsub2srt.c++: In function 'int main(int, char**)':
/src/vobsub2srt-1.0~pre7+20171219/src/vobsub2srt.c++:218:3: error: 'TessBaseAPI' has not been declared
   TessBaseAPI::SimpleInit(tess_path, tess_lang, false); // TODO params
   ^~~~~~~~~~~
/src/vobsub2srt-1.0~pre7+20171219/src/vobsub2srt.c++:220:5: error: 'TessBaseAPI' has not been declared
     TessBaseAPI::SetVariable("tessedit_char_blacklist", blacklist.c_str());
     ^~~~~~~~~~~
/src/vobsub2srt-1.0~pre7+20171219/src/vobsub2srt.c++:275:20: error: 'TessBaseAPI' has not been declared
       char *text = TessBaseAPI::TesseractRect(image, 1, stride, 0, 0, width, height);
                    ^~~~~~~~~~~
/src/vobsub2srt-1.0~pre7+20171219/src/vobsub2srt.c++:314:3: error: 'TessBaseAPI' has not been declared
   TessBaseAPI::End();
   ^~~~~~~~~~~
src/CMakeFiles/vobsub2srt.dir/build.make:65: recipe for target 'src/CMakeFiles/vobsub2srt.dir/vobsub2srt.c++.o' failed
make[2]: *** [src/CMakeFiles/vobsub2srt.dir/vobsub2srt.c++.o] Error 1

doesn't work with Tesseract 4 ?

It looks like it can't cope with tesseract 4's language data files:

open("/usr/share/tesseract-ocr/4.00/tessdata/eng.traineddata", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=4113088, ...}) = 0
read(3, "\30\0\0\0\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377"..., 4096) = 4096
write(2, "Failed loading language 'eng'\n", 30Failed loading language 'eng'
) = 30
write(2, "Tesseract couldn't load any lang"..., 39Tesseract couldn't load any languages!

--dump-images outputting some fuzzy .pgm

Sorry for reporting bugs again. If I knew C/C++ I'd fork it myself and send you patches instead. Anyway, I'd file the bug first so I could assign it to myself, so please don't mind if I create too many issues.

I built the latest code and used the --dump-images option.

I noticed that the OCR was still running. Could this option make the program skip the OCR, or could a --no-ocr option be added? That could be useful for the GUI project.

Nevertheless, while the OCR is correctly done, many of the .pgm generated appear like this:

https://dl.dropboxusercontent.com/u/1599184/others/gxp-spirit.45.xvid-002.pgm

it is quite odd, because the OCR for that line is well generated.
Thanks.

vobsub2srt ebuild for gentoo

Hi,

I was not sure where to put this, so I am putting it here.
I have created an ebuild that allows installing vobsub2srt and its dependencies on a Gentoo machine. The ebuild code is below. Just copy and paste it into a file called vobsub2srt-9999.ebuild in the directory /usr/local/portage/media-video/vobsub2srt/. Then open a terminal, cd to that directory and run ebuild vobsub2srt-9999.ebuild digest. Afterwards you can compile and install the latest git version of vobsub2srt by running emerge vobsub2srt.
It works with a stable release of Gentoo, where it will use tesseract 2.04-r1. However, that tesseract version is quite old. To get tesseract 3.x to run, you need to enable the 'stuff' portage overlay as follows: run emerge layman, then run layman -a stuff, which enables a small Gentoo overlay containing tesseract 3.01 and its dependencies that are not in the standard portage tree. Afterwards, you can again just emerge vobsub2srt, which will then use tesseract 3.01.
Here is the ebuild:

# Copyright 1999-2012 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: /media-video/vobsub2srt/ChangeLog,v 0.1 2012/01/06 19:15:04 thawn Exp $

EAPI="4"

EGIT_REPO_URI="git://github.com/ruediger/VobSub2SRT.git"

inherit git-2

IUSE=""

DESCRIPTION="Converts image subtitles created by VobSub (.sub/.idx) to .srt textual subtitles using tesseract OCR engine"
HOMEPAGE="https://github.com/ruediger/VobSub2SRT"

LICENSE="GPL-3"
SLOT="0"
KEYWORDS="~amd64 ~x86"

RDEPEND=">=app-text/tesseract-2.04-r1
    >=virtual/ffmpeg-0.6.90"
DEPEND="${RDEPEND}"
src_configure() {
    econf
}
src_compile() {
    emake || die
}

src_install() {
    emake DESTDIR="${D}" install || die
}

multiple subtitles of same language

Hi,
I have a file with two english subtitles:
--langlist:
Languages:
0: en
1: en

Using the --lang en option, stream 0 is selected; according to the help and manpage, there is no option to choose stream 1.
In this case stream 0 is the HOH subtitle, while stream 1 contains the subtitles for the non-English parts only.

Couldn't install via the provided PPA

Hi,

I just tried to install VobSub2SRT via the provided PPA and the install failed.

jarrett@eddy:~% sudo add-apt-repository ppa:ruediger-c-plusplus/vobsub2srt
 VobSub2SRT is a simple command line program to convert .idx / .sub subtitles into .srt text subtitles by using OCR. It is based on code from the MPlayer project - a really really great movie player. Tesseract is used as OCR software.
 More info: https://launchpad.net/~ruediger-c-plusplus/+archive/ubuntu/vobsub2srt
Press [ENTER] to continue or ctrl-c to cancel adding it
gpg: keyring `/tmp/tmpofuhag0z/secring.gpg' created
gpg: keyring `/tmp/tmpofuhag0z/pubring.gpg' created
gpg: requesting key 7E0A5F1D from hkp server keyserver.ubuntu.com
gpg: /tmp/tmpofuhag0z/trustdb.gpg: trustdb created
gpg: key 7E0A5F1D: public key "Launchpad PPA for Rüdiger Sonderfeld" imported
gpg: Total number processed: 1
gpg:               imported: 1  (RSA: 1)
OK
jarrett@eddy:~% sudo apt-get update
...
W: Failed to fetch http://ppa.launchpad.net/ruediger-c-plusplus/vobsub2srt/ubuntu/dists/trusty/main/binary-amd64/Packages  404  Not Found
W: Failed to fetch http://ppa.launchpad.net/ruediger-c-plusplus/vobsub2srt/ubuntu/dists/trusty/main/binary-i386/Packages  404  Not Found
E: Some index files failed to download. They have been ignored, or old ones used instead.
jarrett@eddy:~% sudo apt-get install vobsub2srt
Reading package lists... Done
Building dependency tree       
Reading state information... Done
E: Unable to locate package vobsub2srt

I'm using ubuntu server 14.04.3 LTS.

jarrett@eddy:~% lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.3 LTS
Release:        14.04
Codename:       trusty

Here's what /etc/apt/sources.list.d/ruediger-c-plusplus-vobsub2srt-trusty.list looks like:

deb http://ppa.launchpad.net/ruediger-c-plusplus/vobsub2srt/ubuntu trusty main
# deb-src http://ppa.launchpad.net/ruediger-c-plusplus/vobsub2srt/ubuntu trusty main

I can install using another method. I just figured the issue should be logged :)

Thanks,
Jarrett

wrong lang prefix makes the operation fail

I am trying to extract French subtitles to srt from an idx/sub pair that contains English, Spanish and French. I installed the French localization for tesseract, but the vobsub2srt conversion still fails because it looks for files prefixed with fre. while the installed files are prefixed with fra.

For example, vobsub2srt requires fre.unicharset, whereas the French tesseract package installed fra.unicharset.
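
The mismatch matches the two code sets in ISO 639-2: "fre" is the bibliographic (B) code for French and "fra" is the terminologic (T) code. A minimal sketch of a fallback lookup follows; the table is only a subset of the B/T pairs, and the function name is hypothetical, not something vobsub2srt provides.

```python
# Subset of ISO 639-2 bibliographic (B) codes mapped to their
# terminologic (T) equivalents.
B_TO_T = {
    "fre": "fra",  # French
    "ger": "deu",  # German
    "dut": "nld",  # Dutch
    "gre": "ell",  # Greek
    "cze": "ces",  # Czech
    "rum": "ron",  # Romanian
    "per": "fas",  # Persian
}


def language_code_candidates(code):
    """Return the codes worth trying, with the requested code first."""
    candidates = [code]
    alternative = B_TO_T.get(code)
    if alternative is not None:
        candidates.append(alternative)
    return candidates


print(language_code_candidates("fre"))
print(language_code_candidates("eng"))
```

Trying the T code after the B code would let the fra.* files be found without the user renaming anything.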

VobSub2SRT fails to build with tesseract 3.02

Reported by e-mail:

This seems to be the core of it

In file included from
/home/ozone/subchain/usr/include/tesseract/apitypes.h:23:0,
                 from
/home/ozone/subchain/usr/include/tesseract/baseapi.h:28,
                 from /home/ozone/subchain/VobSub2SRT/src/vobsub2srt.c++:27:
/home/ozone/subchain/usr/include/tesseract/publictypes.h:108:28: error:
comma at end of enumerator list [-pedantic]
/home/ozone/subchain/usr/include/tesseract/publictypes.h:122:38: error:
comma at end of enumerator list [-pedantic]
/home/ozone/subchain/usr/include/tesseract/publictypes.h:139:35: error:
comma at end of enumerator list [-pedantic]
/home/ozone/subchain/usr/include/tesseract/publictypes.h:221:22: error:
comma at end of enumerator list [-pedantic]
In file included from
/home/ozone/subchain/usr/include/tesseract/baseapi.h:30:0,
                 from /home/ozone/subchain/VobSub2SRT/src/vobsub2srt.c++:27:
/home/ozone/subchain/usr/include/tesseract/unichar.h:42:14: error: comma
at end of enumerator list [-pedantic]
In file included from
/home/ozone/subchain/usr/include/tesseract/tesscallback.h:22:0,
                 from
/home/ozone/subchain/usr/include/tesseract/baseapi.h:31,
                 from /home/ozone/subchain/VobSub2SRT/src/vobsub2srt.c++:27:
/home/ozone/subchain/usr/include/tesseract/host.h:108:1: error: ISO C++
1998 does not support ‘long long’ [-Wlong-long]
/home/ozone/subchain/usr/include/tesseract/host.h:109:1: error: ISO C++
1998 does not support ‘long long’ [-Wlong-long]
In file included from
/home/ozone/subchain/usr/include/tesseract/unicharset.h:23:0,
                 from
/home/ozone/subchain/usr/include/tesseract/ltrresultiterator.h:26,
                 from
/home/ozone/subchain/usr/include/tesseract/resultiterator.h:26,
                 from
/home/ozone/subchain/usr/include/tesseract/baseapi.h:34,
                 from /home/ozone/subchain/VobSub2SRT/src/vobsub2srt.c++:27:
/home/ozone/subchain/usr/include/tesseract/errcode.h:98:2: error: extra
‘;’ [-pedantic]
In file included from
/home/ozone/subchain/usr/include/tesseract/ltrresultiterator.h:26:0,
                 from
/home/ozone/subchain/usr/include/tesseract/resultiterator.h:26,
                 from
/home/ozone/subchain/usr/include/tesseract/baseapi.h:34,
                 from /home/ozone/subchain/VobSub2SRT/src/vobsub2srt.c++:27:
/home/ozone/subchain/usr/include/tesseract/unicharset.h:165:66: warning:
type qualifiers ignored on function return type [-Wignored-qualifiers]
/home/ozone/subchain/usr/include/tesseract/unicharset.h:170:46: warning:
type qualifiers ignored on function return type [-Wignored-qualifiers]
/home/ozone/subchain/usr/include/tesseract/unicharset.h:185:50: warning:
type qualifiers ignored on function return type [-Wignored-qualifiers]
/home/ozone/subchain/usr/include/tesseract/unicharset.h:191:54: warning:
type qualifiers ignored on function return type [-Wignored-qualifiers]
In file included from
/home/ozone/subchain/VobSub2SRT/src/vobsub2srt.c++:27:0:
/home/ozone/subchain/usr/include/tesseract/baseapi.h:653:32: warning:
type qualifiers ignored on function return type [-Wignored-qualifiers]
/home/ozone/subchain/usr/include/tesseract/baseapi.h:657:29: warning:
type qualifiers ignored on function return type [-Wignored-qualifiers]
/home/ozone/subchain/VobSub2SRT/src/vobsub2srt.c++: In function ‘int
main(int, char**)’:

Changing CXXFLAGS from -pedantic-errors to -pedantic (and adding -Wno-long-long) should fix this.

This seems to actually be a bug in tesseract if they want C++98 compliance. C++11 allows comma at end of enumerator list. TODO: Report upstream?

Not working in every way

I dumped my vobsubs using
mencoder -o /dev/null VTS_07_1.VOB VTS_07_2.VOB -oac copy -ovc copy -vobsubout subtitle
Subtitledit can open it fine, so I think the files are fine.
However when using vobsub2srt I get an srt with only time codes.

I executed 'vobsub2srt --dumpimages --tesseract-lang eng subtitle'

and all images appear to be blank like so
https://dl.dropboxusercontent.com/u/26289275/sub/sub.pmg

Here is the sub and idx files
https://dl.dropboxusercontent.com/u/26289275/sub/subtitle.sub
https://dl.dropboxusercontent.com/u/26289275/sub/subtitle.idx

Can't find file?

I'm sorry if this is the wrong place to put this but I'm having a very strange issue. I compiled and built from the latest version.

juan@Anne:$ stat subs.idx
File: ‘subs.idx’
Size: 131782 Blocks: 264 IO Block: 4096 regular file
Device: 805h/2053d Inode: 12062128 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/ juan) Gid: ( 1000/ juan)
Access: 2013-06-20 01:30:21.095885169 -0500
Modify: 2013-06-20 01:30:07.979885717 -0500
Change: 2013-06-20 01:30:08.027885715 -0500
Birth: -
juan@Anne:$ vobsub2srt subs --listlang
fopen Vobsub file failed: No such file or directory
VobSub: Can't open SUB file
Couldn't open VobSub files 'subs.idx/.sub'

What could I be doing wrong?
Thanks for your help,
Juan
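
The stat output above only covers subs.idx, while the error message mentions the SUB file. A minimal sketch for checking that both halves of the pair exist next to each other is shown below; the helper name is hypothetical, and it only assumes that the tool expects <base>.idx and <base>.sub, as the error message suggests.

```python
import tempfile
from pathlib import Path


def missing_vobsub_files(base):
    """Return the expected .idx/.sub paths that do not exist as regular files."""
    # Append the suffixes as text: a base name may contain dots of its own.
    candidates = [Path(f"{base}.idx"), Path(f"{base}.sub")]
    return [path for path in candidates if not path.is_file()]


# Demonstration in a scratch directory where only the index file is present.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "subs"
    Path(f"{base}.idx").touch()
    missing_names = [path.name for path in missing_vobsub_files(base)]

print(missing_names)
```

If the .sub file is reported missing, the base name passed on the command line does not match the name of the .sub file on disk.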

feature request: dump only failed images.

ERROR: OCR failed for 1
ERROR: OCR failed for 23
ERROR: OCR failed for 133
ERROR: OCR failed for 367
ERROR: OCR failed for 367
ERROR: OCR failed for 386

Can you add an argument to dump only the images that failed to OCR? And, if possible, allow them to be opened in an external image editor so that I can be prompted on the CLI for a fix?
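
Until such an option exists, the failed indices can be collected from the error output. This is a minimal sketch that assumes the exact message format shown above ("ERROR: OCR failed for N"); the function name is hypothetical, and no assumption is made about how dumped image files are named.

```python
import re

# Matches lines such as "ERROR: OCR failed for 367".
_FAILED_RE = re.compile(r"^ERROR: OCR failed for (\d+)$")


def failed_indices(log_text):
    """Return the sorted, de-duplicated subtitle indices reported as failed."""
    found = set()
    for line in log_text.splitlines():
        match = _FAILED_RE.match(line.strip())
        if match:
            found.add(int(match.group(1)))
    return sorted(found)


log = """\
ERROR: OCR failed for 1
ERROR: OCR failed for 23
ERROR: OCR failed for 133
ERROR: OCR failed for 367
ERROR: OCR failed for 367
ERROR: OCR failed for 386
"""

print(failed_indices(log))
```

The duplicate report for 367 collapses to a single entry, leaving five subtitles to review by hand.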

deb package script man directory incorrect

Hi, first of all thank you for your great application. Running make package results in vobsub2srt.1.gz being placed in an incorrect directory path:
/usr/INSTALL_MAN_DIR/ instead of the correct path /usr/share/man/man1/

make package was done on Debian Wheezy (testing)

Compile error

$ git clone https://github.com/ruediger/VobSub2SRT.git
Cloning into 'VobSub2SRT'...
remote: Counting objects: 356, done.
remote: Compressing objects: 100% (203/203), done.
remote: Total 356 (delta 214), reused 287 (delta 145)
Receiving objects: 100% (356/356), 102.33 KiB, done.
Resolving deltas: 100% (214/214), done.

$ ./configure
-- The C compiler identification is GNU
-- The CXX compiler identification is GNU
-- Check for working C compiler: /usr/bin/gcc
-- Check for working C compiler: /usr/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Source: /tmp/VobSub2SRT
-- Binary: /tmp/VobSub2SRT/build
-- Build type: Debug
-- checking for module 'libavutil'
-- found libavutil, version 51.34.101
-- Looking for include files CMAKE_HAVE_PTHREAD_H
-- Looking for include files CMAKE_HAVE_PTHREAD_H - found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Performing Test TESSERACT_NAMESPACE
-- Performing Test TESSERACT_NAMESPACE - Success
-- Found Tesseract: /usr/lib/libtesseract_full.a;/usr/lib/i386-linux-gnu/libtiff.so
-- vobsub2srt version: 23dcb63
-- Debian architecture: i386
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/VobSub2SRT/build

$ make
make -C build
make[1]: Entering directory `/tmp/VobSub2SRT/build'
make[2]: Entering directory `/tmp/VobSub2SRT/build'
make[3]: Entering directory `/tmp/VobSub2SRT/build'
Scanning dependencies of target mplayer
make[3]: Leaving directory `/tmp/VobSub2SRT/build'
make[3]: Entering directory `/tmp/VobSub2SRT/build'
[ 14%] Building C object mplayer/CMakeFiles/mplayer.dir/mp_msg.c.o
[ 28%] Building C object mplayer/CMakeFiles/mplayer.dir/spudec.c.o
[ 42%] Building C object mplayer/CMakeFiles/mplayer.dir/unrar_exec.c.o
[ 57%] Building C object mplayer/CMakeFiles/mplayer.dir/vobsub.c.o
Linking C static library ../lib/libmplayer.a
make[3]: Leaving directory `/tmp/VobSub2SRT/build'
[ 57%] Built target mplayer
make[3]: Entering directory `/tmp/VobSub2SRT/build'
Scanning dependencies of target vobsub2srt
make[3]: Leaving directory `/tmp/VobSub2SRT/build'
make[3]: Entering directory `/tmp/VobSub2SRT/build'
[ 71%] Building CXX object src/CMakeFiles/vobsub2srt.dir/vobsub2srt.c++.o
[ 85%] Building CXX object src/CMakeFiles/vobsub2srt.dir/langcodes.c++.o
[100%] Building CXX object src/CMakeFiles/vobsub2srt.dir/cmd_options.c++.o
Linking CXX executable ../bin/vobsub2srt
CMakeFiles/vobsub2srt.dir/vobsub2srt.c++.o: In function `main':
/tmp/VobSub2SRT/src/vobsub2srt.c++:155: undefined reference to `tesseract::TessBaseAPI::TessBaseAPI()'
/tmp/VobSub2SRT/src/vobsub2srt.c++:158: undefined reference to `tesseract::TessBaseAPI::SetVariable(char const*, char const*)'
/tmp/VobSub2SRT/src/vobsub2srt.c++:207: undefined reference to `tesseract::TessBaseAPI::TesseractRect(unsigned char const*, int, int, int, int, int, int)'
/tmp/VobSub2SRT/src/vobsub2srt.c++:225: undefined reference to `tesseract::TessBaseAPI::End()'
/tmp/VobSub2SRT/src/vobsub2srt.c++:232: undefined reference to `tesseract::TessBaseAPI::~TessBaseAPI()'
/tmp/VobSub2SRT/src/vobsub2srt.c++:232: undefined reference to `tesseract::TessBaseAPI::~TessBaseAPI()'
CMakeFiles/vobsub2srt.dir/vobsub2srt.c++.o: In function `tesseract::TessBaseAPI::Init(char const*, char const*)':
/usr/include/tesseract/baseapi.h:208: undefined reference to `tesseract::TessBaseAPI::Init(char const*, char const*, tesseract::OcrEngineMode, char**, int, GenericVector<STRING> const*, GenericVector<STRING> const*, bool)'
collect2: ld returned 1 exit status
make[3]: *** [bin/vobsub2srt] Error 1
make[3]: Leaving directory `/tmp/VobSub2SRT/build'
make[2]: *** [src/CMakeFiles/vobsub2srt.dir/all] Error 2
make[2]: Leaving directory `/tmp/VobSub2SRT/build'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/tmp/VobSub2SRT/build'
make: *** [all] Error 2

$ dpkg -l | grep tesseract
ii libtesseract-dev 3.02.01-2 Development files for the tesseract command line OCR tool
ii libtesseract3 3.02.01-2 Command line OCR tool
ii tesseract-ocr 3.02.01-2 Command line OCR tool
ii tesseract-ocr-deu-f 2.01-2 tesseract-ocr language files for the German Fraktur script
ii tesseract-ocr-dev 2.04-2+squeeze1 Development files for the tesseract command line OCR tool
ii tesseract-ocr-eng 3.02-2 tesseract-ocr language files for English
ii tesseract-ocr-equ 3.02-2 tesseract-ocr language files for equations
ii tesseract-ocr-osd 3.02-2 tesseract-ocr language files for script and orientation

language support weird

In my subtitles file, listing the languages gives "zh", so when I select "zh" with --lang, vobsub2srt looks for zho.traineddata in tesseract. It should be looking for chi_*, or perhaps I should be able to specify the tesseract language directly.
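
One way to handle this is to try several tesseract data names for a subtitle language and take the first one that is installed, while letting an explicit choice win. The sketch below is only an illustration: the preference table, the function name and the override parameter are hypothetical, and the only assumption about tesseract is that language data lives in files named <lang>.traineddata.

```python
import tempfile
from pathlib import Path

# Hypothetical preference table: subtitle language code -> data names to try.
CANDIDATES = {"zh": ["chi_sim", "chi_tra", "zho"]}


def pick_tesseract_lang(idx_lang, tessdata_dir, override=None):
    """Return the first installed data name, or the override if one is given."""
    if override:
        return override
    for name in CANDIDATES.get(idx_lang, [idx_lang]):
        if (Path(tessdata_dir) / f"{name}.traineddata").is_file():
            return name
    return None


# Demonstration with a scratch directory that only holds chi_tra data.
with tempfile.TemporaryDirectory() as tessdata:
    (Path(tessdata) / "chi_tra.traineddata").touch()
    auto_choice = pick_tesseract_lang("zh", tessdata)
    forced_choice = pick_tesseract_lang("zh", tessdata, override="chi_sim")
    absent_choice = pick_tesseract_lang("fa", tessdata)

print(auto_choice, forced_choice, absent_choice)
```

Returning None for a language with no installed data lets the caller report the problem instead of handing tesseract a name it cannot load.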

linking against libavutil no longer needed?

Output of "lddtree vobsub2srt":

vobsub2srt => /usr/bin/vobsub2srt (interpreter => /lib64/ld-linux-x86-64.so.2)
    libtesseract.so.3 => /usr/lib64/libtesseract.so.3
        liblept.so.5 => /usr/lib64/liblept.so.5
            libz.so.1 => /lib64/libz.so.1
            libpng16.so.16 => /usr/lib64/libpng16.so.16
            libjpeg.so.62 => /usr/lib64/libjpeg.so.62
            libgif.so.7 => /usr/lib64/libgif.so.7
            libtiff.so.5 => /usr/lib64/libtiff.so.5
                liblzma.so.5 => /lib64/liblzma.so.5
            libwebp.so.6 => /usr/lib64/libwebp.so.6
        libm.so.6 => /lib64/libm.so.6
            ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2
    libpthread.so.0 => /lib64/libpthread.so.0
    libstdc++.so.6 => /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/libstdc++.so.6
    libgcc_s.so.1 => /usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/libgcc_s.so.1
    libc.so.6 => /lib64/libc.so.6

make package (deb) broken - error in 'Version' field string 'unknown-dirty'

Hi ruediger

I have been using your amazing OCR tool for a couple of years. I am setting up my new laptop right now and noticed that the compiled deb package is broken and refuses to be installed via dpkg. Info to reproduce:

OS: debian jessie/stable

configure: ok, no issues
make: ok, no issues
make package: built package 'vobsub2srt-unknown-dirty-Linux.deb' refuses to be installed via dpkg with following output:

dpkg -i vobsub2srt-unknown-dirty-Linux.deb

dpkg: error processing archive vobsub2srt-unknown-dirty-Linux.deb (--install): parsing file '/var/lib/dpkg/tmp.ci/control' near line 2 package 'vobsub2srt': error in 'Version' field string 'unknown-dirty': version number does not start with digit

Could you please fix this and, if possible, upload a new source release? It fails with the release source code VobSub2SRT-1.0pre7 and it also fails with today's source code.

Thanks in advance, and your work is highly appreciated!
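
The dpkg message says the version must start with a digit, which "unknown-dirty" does not. A minimal sketch of a guard that a packaging step could apply before writing the Version field is shown below; the function name and the "0~" prefix are assumptions for illustration, not what the project's build scripts do.

```python
def debian_safe_version(version, fallback_prefix="0~"):
    """Return a version string that starts with a digit, as dpkg requires."""
    if version and version[0] in "0123456789":
        return version
    # Prefix non-numeric strings; "~" sorts before everything in dpkg's
    # ordering, so the result ranks below any real release.
    return fallback_prefix + (version or "unknown")


print(debian_safe_version("unknown-dirty"))
print(debian_safe_version("1.0pre7"))
```

A version that already starts with a digit passes through unchanged, so only the fallback string produced when the version cannot be determined is rewritten.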

I got an "Aborted" error

Whilst running the program I got:

vobsub2srt: unicharset.cpp:76: const UNICHAR_ID UNICHARSET::unichar_to_id(const char*, int) const: Assertion `ids.contains(unichar_repr, length)' failed.
Aborted

This was on line 1,313 of a 1,620 line .srt file. There didn't appear to be anything different about the next image to be ocr'ed?

error : not support my lang?

Hi
When I use vobsub2srt on my Debian (amd64) system, this error occurs:

alireza@debian-lap:/media/1E2AB0A8/Sec I/02$ vobsub2srt Rome-S01E02
Error opening data file /usr/share/tesseract-ocr/tessdata/fas.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'fas'
Tesseract couldn't load any languages!
Segmentation fault

debian 64bit.
latest github vobsub2srt installed.
My Language : Persian(Farsi)

Thanks.

error compiling gentoo D:

Hi, I tried compiling and I get this error (dependencies are installed):

Scanning dependencies of target documentation
Scanning dependencies of target mplayer
[ 10%] Generating vobsub2srt.1.gz
[ 40%] Building C object mplayer/CMakeFiles/mplayer.dir/unrar_exec.c.o
[ 40%] Building C object mplayer/CMakeFiles/mplayer.dir/mp_msg.c.o
[ 40%] Building C object mplayer/CMakeFiles/mplayer.dir/spudec.c.o
[ 50%] Building C object mplayer/CMakeFiles/mplayer.dir/vobsub.c.o
[ 50%] Built target documentation
[ 60%] Linking C static library ../lib/libmplayer.a
[ 60%] Built target mplayer
Scanning dependencies of target vobsub2srt
[ 80%] Building CXX object src/CMakeFiles/vobsub2srt.dir/vobsub2srt.c++.o
[ 90%] Building CXX object src/CMakeFiles/vobsub2srt.dir/cmd_options.c++.o
[ 90%] Building CXX object src/CMakeFiles/vobsub2srt.dir/langcodes.c++.o
[100%] Linking CXX executable ../bin/vobsub2srt
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clBuildProgram'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clEnqueueNDRangeKernel'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clSetKernelArg'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clReleaseMemObject'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clFinish'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clEnqueueUnmapMemObject'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clCreateContextFromType'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clGetCommandQueueInfo'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clReleaseContext'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clEnqueueCopyBuffer'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clGetProgramBuildInfo'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clCreateContext'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clEnqueueMapBuffer'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clGetDeviceIDs'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clGetContextInfo'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clGetDeviceInfo'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clReleaseCommandQueue'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clGetPlatformIDs'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clGetPlatformInfo'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clCreateProgramWithBinary'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clCreateCommandQueue'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clReleaseProgram'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clGetProgramInfo'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clCreateKernel'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clCreateBuffer'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/../../../../lib64/libtesseract.so: undefined reference to `clCreateProgramWithSource'
collect2: error: ld returned 1 exit status
src/CMakeFiles/vobsub2srt.dir/build.make:149: recipe for target 'bin/vobsub2srt' failed
make[2]: *** [bin/vobsub2srt] Error 1
CMakeFiles/Makefile2:172: recipe for target 'src/CMakeFiles/vobsub2srt.dir/all' failed
make[1]: *** [src/CMakeFiles/vobsub2srt.dir/all] Error 2
Makefile:149: recipe for target 'all' failed
make: *** [all] Error 2

Cya.

HomeBrew formula broken

brew install --HEAD https://raw.githubusercontent.com/ruediger/VobSub2SRT/master/packaging/vobsub2srt.rb
######################################################################## 100.0%
==> Installing vobsub2srt 
==> Cloning git://github.com/ruediger/VobSub2SRT.git
Updating /Users/camdennarzt/Library/Caches/Homebrew/vobsub2srt--git
==> Checking out branch master
==> ./configure ["-DCMAKE_C_FLAGS_RELEASE=-DNDEBUG", "-DCMAKE_CXX_FLAGS_RELEASE=-DNDEBUG", "-DCMAKE_INSTALL_PREFIX=/usr/local/Cellar/vobsub2srt/HEAD-d4c34ca", "-DCMAKE_BUILD_TYPE=Release", "-DCMAKE_FIND_FRAMEWORK=LAST", "-DCMAKE_VERBOSE_MAKEFILE=ON", "-Wno-dev"]
==> make install
Error: Empty installation
HOMEBREW_VERSION: 1.3.5-4-g56458f0
ORIGIN: https://github.com/Homebrew/brew
HEAD: 56458f03fcc68ef6d8ee3ee4a7c1d16021aa5800
Last commit: 22 hours ago
Core tap ORIGIN: https://github.com/Homebrew/homebrew-core
Core tap HEAD: be58c3e7adb0801b0e0259d6120282ab5e3d35f3
Core tap last commit: 2 hours ago
HOMEBREW_PREFIX: /usr/local
HOMEBREW_REPOSITORY: /usr/local/Homebrew
HOMEBREW_CELLAR: /usr/local/Cellar
HOMEBREW_BOTTLE_DOMAIN: https://homebrew.bintray.com
CPU: octa-core 64-bit haswell
Homebrew Ruby: 2.3.3 => /System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/bin/ruby
Clang: 9.0 build 900
Git: 2.14.2 => /usr/local/bin/git
Perl: /usr/bin/perl
Python: /usr/bin/python
Ruby: /Users/camdennarzt/.rbenv/shims/ruby => /Users/camdennarzt/.rbenv/versions/2.3.5/bin/ruby
Java: 9
macOS: 10.13-x86_64
Xcode: 9.0
CLT: 9.0.0.0.1.1504363082
X11: N/A

VobSub2SRT output 0KB probably problem finding out languages id?

I'm using the latest .deb packages from the PPA.

when I run
vobsub2srt gxp-spirit.45.xvid --dump-images
it returns
Wrote Subtitles to 'gxp-spirit.45.xvid.srt'

with a 0KB gxp-spirit.45.xvid.srt file.

I suspect this has to do with vobsub2srt not being able to decode the list of languages: it also shows "0:\n" when I use the --langlist parameter.

Here are the VobSub subtitles if you want to try them. With smplayer, the subtitles are displayed correctly and the language EN shows up.
https://dl.dropboxusercontent.com/u/1599184/others/gxp-spirit.45.xvid.sub
https://dl.dropboxusercontent.com/u/1599184/others/gxp-spirit.45.xvid.idx

Thanks a lot for taking a look.

brew formula fails to install

I did a bit of digging with brew --debug and the shell. Looks like make install fails, trying to find the manpage file.

Install the project...
-- Install configuration: "None"
-- Up-to-date: /usr/local/Cellar/vobsub2srt/HEAD/share/doc/vobsub2srt/copyright
-- Up-to-date: /usr/local/Cellar/vobsub2srt/HEAD/share/doc/vobsub2srt/README
-- Up-to-date: /usr/local/Cellar/vobsub2srt/HEAD/bin/vobsub2srt
CMake Error at doc/cmake_install.cmake:39 (FILE):
file INSTALL cannot find "/tmp/vobsub2srt-vSax/build/doc/vobsub2srt.1.gz".
Call Stack (most recent call first):
cmake_install.cmake:58 (INCLUDE)

make: *** [install] Error 1

Build failed on MacOS Mojave with Homebrew

Installing vobsub2srt --HEAD
==> Cloning git://github.com/ruediger/VobSub2SRT.git
Cloning into '/Users/marvin/Library/Caches/Homebrew/vobsub2srt--git'...
==> Checking out branch master
Already on 'master'
Your branch is up to date with 'origin/master'.
==> cmake .. -DCMAKE_C_FLAGS_RELEASE=-DNDEBUG -DCMAKE_CXX_FLAGS_RELEASE=-DNDEBUG -DCMAKE_INSTALL_PREFIX=/usr/local/Cellar/vobsub
==> make install
Last 15 lines from /Users/marvin/Library/Logs/Homebrew/vobsub2srt/02.make:
^
In file included from /tmp/vobsub2srt-20181107-81806-lxgjcj/src/vobsub2srt.c++:27:
In file included from /usr/local/include/tesseract/baseapi.h:32:
/usr/local/include/tesseract/serialis.h:60:43: error: unknown type name 'size_t'; did you mean 'ssize_t'?
bool DeSerialize(FILE* fp, int32_t* data, size_t n = 1);
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/_types/_ssize_t.h:31:33: note: 'ssize_t' declared here
typedef __darwin_ssize_t ssize_t;
^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
make[2]: *** [src/CMakeFiles/vobsub2srt.dir/vobsub2srt.c++.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [src/CMakeFiles/vobsub2srt.dir/all] Error 2
make: *** [all] Error 2

Chinese character error

Hello,
I've been trying to use vobsub2srt to convert Chinese subs to srt, using the following command:
vobsub2srt --lang zh --tesseract-lang chi_sim subtitles

However, the conversion is not working well: a lot of characters are not recognized correctly, even though the font used in the vobsub is perfectly readable.

Here is the vobsub screenshot:
http://img546.imageshack.us/img546/4601/u1xu.jpg

Here is the converted sub:
http://img571.imageshack.us/img571/5273/iyx2.jpg

We can easily see that some characters have been simplified.
Here are the sub/idx files.
http://www.2shared.com/file/ZvL3xukf/subtitles.html
http://www.2shared.com/file/1u7L35fD/subtitles.html

Is this normal? Is there a workaround?
