
grim's Introduction

                    ,____
                    |---.\
            ___     |    `
           / .-\  ./=)
          |  |"|_/\/|
          ;  |-;| /_|
         / \_| |/ \ |
        /      \/\( |
        |   /  |` ) |
        /   \ _/    |
       /--._/  \    |
       `/|)    |    /
         /     |   |
       .'      |   |
      /         \  |
     (_.-.__.__./  /

Grim

Grim is a simple gem for extracting (reaping) a page from a PDF, converting it to an image, and extracting the page's text as a string. It gives you an easy-to-use API to Ghostscript, ImageMagick, and pdftotext, tailored to this use case.

Prerequisites

You will need Ghostscript, ImageMagick, and xpdf installed. On macOS, I highly recommend installing them with Homebrew.

$ brew install ghostscript imagemagick xpdf

Installation

$ gem install grim

Usage

pdf   = Grim.reap("/path/to/pdf")           # returns a Grim::Pdf instance for the pdf
count = pdf.count                           # returns the number of pages in the pdf
saved = pdf[3].save('/path/to/image.png')   # returns true if the page was saved, false if not
text  = pdf[3].text                         # returns the page text as a String

pdf.each do |page|
  puts page.text
end

We also support using other processors (by default, Grim uses whichever versions of ImageMagick and Ghostscript are on your PATH).

# specifying one processor with specific ImageMagick and GhostScript paths
Grim.processor = Grim::ImageMagickProcessor.new({:imagemagick_path => "/path/to/convert", :ghostscript_path => "/path/to/gs"})

# multiple processors with fallback if first fails, useful if you need multiple versions of convert/gs
Grim.processor = Grim::MultiProcessor.new([
  Grim::ImageMagickProcessor.new({:imagemagick_path => "/path/to/6.7/convert", :ghostscript_path => "/path/to/9.04/gs"}),
  Grim::ImageMagickProcessor.new({:imagemagick_path => "/path/to/6.6/convert", :ghostscript_path => "/path/to/9.02/gs"})
])

pdf = Grim.reap('/path/to/pdf')
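The README does not spell out how `Grim::MultiProcessor` decides that a processor "failed". The sketch below is only a hypothetical illustration of the fallback idea, assuming each processor exposes a `save` method that returns `true` or `false`; `save_with_fallback` is an invented helper, not part of Grim.

```ruby
# Hypothetical helper illustrating "fallback if first fails"; not Grim's code.
# Each processor is assumed to respond to #save and return true on success.
def save_with_fallback(processors, *args)
  # any? stops at the first processor whose save returns a truthy value,
  # so later processors are only tried when earlier ones fail.
  processors.any? { |processor| processor.save(*args) }
end
```

In this sketch, with the two processors configured above, the 6.7/9.04 pair would be tried first and the 6.6/9.02 pair only if that attempt returned false.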

You can even specify a Windows executable ⚡

# specifying another ghostscript executable, win64 in this example
# the ghostscript/bin folder still has to be in the PATH for this to work
Grim.processor = Grim::ImageMagickProcessor.new({:ghostscript_path => "gswin64c.exe"})

pdf = Grim.reap('/path/to/pdf')

Grim::ImageMagickProcessor#save supports several options as well:

pdf = Grim.reap("/path/to/pdf")
pdf[0].save('/path/to/image.png', {
  :width => 600,         # defaults to 1024
  :density => 72,        # defaults to 300
  :quality => 60,        # defaults to 90
  :colorspace => "CMYK", # defaults to "RGB"
  :alpha => "Activate"   # not used when not set
})

Grim has limited logging abilities. The default logger is Grim::NullLogger but you can also set your own logger.

require "logger"
Grim.logger = Logger.new($stdout).tap { |logger| logger.progname = 'grim' }
Grim.processor = Grim::ImageMagickProcessor.new({:ghostscript_path => "/path/to/bin/gs"})
pdf = Grim.reap("/path/to/pdf")
pdf[3].save('/path/to/image.png')
# D, [2016-06-09T22:43:07.046532 #69344] DEBUG -- grim: Running imagemagick command
# D, [2016-06-09T22:43:07.046626 #69344] DEBUG -- grim: PATH=/path/to/bin:/usr/local/bin:/usr/bin
# D, [2016-06-09T22:43:07.046787 #69344] DEBUG -- grim: convert -resize 1024 -antialias -render -quality 90 -colorspace RGB -interlace none -density 300 /path/to/pdf /path/to/image.png
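To make the save option defaults concrete, the sketch below rebuilds the argument list shown in the debug log above from a `save`-style options hash. `convert_command` is a hypothetical helper, not Grim's implementation: the flag order is copied from the logged command, and the position of `-alpha` is an assumption because the log does not show it.

```ruby
# Hypothetical reconstruction of the convert call in the log above; not Grim's code.
def convert_command(pdf_path, image_path, options = {})
  width      = options.fetch(:width, 1024)
  density    = options.fetch(:density, 300)
  quality    = options.fetch(:quality, 90)
  colorspace = options.fetch(:colorspace, "RGB")

  command = ["convert", "-resize", width.to_s]
  # Assumed position: the log does not show where Grim places -alpha.
  command += ["-alpha", options[:alpha].to_s] if options[:alpha]
  command + ["-antialias", "-render",
             "-quality", quality.to_s,
             "-colorspace", colorspace,
             "-interlace", "none",
             "-density", density.to_s, # before the input, so it sets the rasterization resolution
             pdf_path, image_path]
end
```

Called with no options, `convert_command("/path/to/pdf", "/path/to/image.png")` yields the same flags as the logged command.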

License

See LICENSE for details.

grim's People

Contributors

adamcrown, bkeepers, bryckbost, fujimura, gordcorp, jamespaden, jnunemaker, jonmagic, jonrcahill, rubikan, victormier


grim's Issues

MongoDB GridFS

Hello,

I was looking for MongoDB GridFS support in Grim, to save files to MongoDB's GridFS.
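The README does not mention GridFS support. One workaround, sketched under assumptions, is to let Grim write the page to a temporary file and then stream that file into GridFS. The sketch assumes the `mongo` Ruby driver, where `client.database.fs` returns a `Mongo::Grid::FSBucket` with `upload_from_stream(filename, io)`; `save_page_to_gridfs` is a hypothetical helper that relies only on duck typing, so it can be tried without MongoDB.

```ruby
require "tempfile"

# Hypothetical helper: render a page to a temporary PNG, then upload it.
# `page` responds to save(path, options) like a Grim page; `bucket` responds to
# upload_from_stream(filename, io) like Mongo::Grid::FSBucket.
def save_page_to_gridfs(page, bucket, filename, options = {})
  Tempfile.create(["grim-page", ".png"]) do |tmp|
    tmp.close # let the external converter write to the path
    raise "Grim could not save the page" unless page.save(tmp.path, options)
    # Returns whatever the bucket returns (the file id for the mongo driver).
    File.open(tmp.path, "rb") { |io| bucket.upload_from_stream(filename, io) }
  end # the temporary file is deleted when the block exits
end
```

With the driver, usage might look like `save_page_to_gridfs(Grim.reap("/path/to/pdf")[0], client.database.fs, "page-1.png")`.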

ImageMagick processor changing white to black unnecessarily

We are processing a high volume of documents with Grim, and in many of them the white background becomes black when converted to JPG. I believe the issue has to do with an alpha layer in the PDFs.

I tried to remedy the issue by adding an alpha flag, trying both -alpha remove and -alpha flatten, like this:
page.save path, width: 1024, quality: 70, density: 150, alpha:"remove"
to no avail.

These flags do work, however, when using the standalone ImageMagick CLI.

Any help would be much appreciated.
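A commonly used ImageMagick idiom for this symptom is `-background white -alpha remove`, which composites any transparency onto a white background. Whether Grim's fixed flag order lets `alpha: "remove"` take effect is not confirmed here, so the sketch below bypasses Grim and calls `convert` directly, with the alpha flags placed after the input page. `flatten_to_white_command` and `run!` are hypothetical helpers, and the white background is an assumption about the desired result.

```ruby
require "open3"

# Hypothetical helper: rasterize one page and flatten transparency onto white.
def flatten_to_white_command(pdf_path, page_index, jpg_path, density: 150, quality: 70, width: 1024)
  ["convert",
   "-density", density.to_s,          # rasterization resolution; must precede the input
   "#{pdf_path}[#{page_index}]",      # ImageMagick's 0-based page selector
   "-background", "white",            # color used when transparency is removed
   "-alpha", "remove",                # applied to the already-read page
   "-resize", width.to_s,
   "-quality", quality.to_s,
   jpg_path]
end

# Runs an argv array without a shell and raises if the command fails.
def run!(command)
  stdout, stderr, status = Open3.capture3(*command)
  raise "#{command.first} failed: #{stderr}" unless status.success?
  stdout
end
```

For example, `run!(flatten_to_white_command("doc.pdf", 0, "page.jpg"))` renders the first page, assuming ImageMagick and Ghostscript are on PATH.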

Grim sample code not working

The following error is thrown when I run this code:

pdf = Grim.reap("pdf.pdf")
count = pdf.count
pdf[3].save('like.png')
text = pdf[3].text

pdf.each do |page|
  puts page.text
end

Error
pdf.pdf

C:/Ruby200/lib/ruby/gems/2.0.0/gems/grim-1.3.0/lib/grim/image_magick_processor.rb:21:in ``': No such file or directory - gs -dNODISPLAY -q -sFile=pdf.pdf C:/Ruby200/lib/ruby/gems/2.0.0/gems/grim-1.3.0/lib/pdf_info.ps (Errno::ENOENT)
        from C:/Ruby200/lib/ruby/gems/2.0.0/gems/grim-1.3.0/lib/grim/image_magick_processor.rb:21:in `count'
        from C:/Ruby200/lib/ruby/gems/2.0.0/gems/grim-1.3.0/lib/grim/pdf.rb:35:in `count'
        from pdfpng.rb:18:in `<main>'

Supersampling technique

Hi guys, I need to improve the quality of the images I get from a PDF. While researching online, I came across the supersampling technique, which people apply like this:

convert -density 288 image.pdf -resize 25% resultimage.png

In Grim, the -resize option is applied before the conversion by default. How can I run this command with the grim gem?
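Supersampling renders the page at several times the target resolution and then scales it down: with `-density 288` and `-resize 25%`, the page is rasterized at 288 dpi and shrunk to a quarter of its size, roughly the pixel dimensions of a 72 dpi render (288 × 0.25 = 72) but with smoother edges. The debug log in the README shows Grim placing `-resize` before the input file, so calling `convert` directly is one way to get the order from the question. `supersample_command` is a hypothetical helper; the page index uses ImageMagick's 0-based `file.pdf[N]` syntax.

```ruby
# Hypothetical helper: rasterize at a high density, then shrink (supersampling).
# -resize comes after the input, so it scales the already-rasterized page.
def supersample_command(pdf_path, page_index, png_path, density: 288, scale_percent: 25)
  ["convert",
   "-density", density.to_s,
   "#{pdf_path}[#{page_index}]",
   "-resize", "#{scale_percent}%",
   png_path]
end

# Usage (requires ImageMagick and Ghostscript on PATH):
#   require "open3"
#   _out, err, status = Open3.capture3(*supersample_command("image.pdf", 0, "resultimage.png"))
#   warn err unless status.success?
```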

Grim::UnprocessablePage convert PDF to Images images empty

I am using Grim to convert a PDF to images. Here is my code:

class PdfToImagesService

  def initialize(pdf_file)
    @pdf_file = pdf_file
  end

  def call
    res = []
    Grim.reap(@pdf_file.path).each_with_index do |page, index|
      input_page = Tempfile.new([index.to_s, '.png'])
      page.save(input_page.path, {
        alpha: 'remove',
        density: 300
      })
      res << input_page
    end
    res
  end

end
It does create an image in the /tmp folder of my server, but the file is empty (0 bytes), and then it crashes with the following error:

convert: not authorized `/home/api/source/tmp/uploads/RackMultipart20181031-42206-6lqz8a_aea0ee662b.pdf' @ error/constitute.c/ReadImage/412.
convert: no images defined `/tmp/020181031-42176-xkzpxu.png' @ error/convert.c/ConvertImageCommand/3210.

So "020181031-42176-xkzpxu.png" exists but is 0 bytes.
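A `not authorized ... @ error/constitute.c/ReadImage` message from `convert` usually means ImageMagick's security policy (`policy.xml`) disables the PDF coder, which several Linux distributions did after the 2018 Ghostscript vulnerabilities. The fix is an administrative decision: relaxing that entry (for example changing `rights="none"` to `rights="read|write"` for the `PDF` pattern), often in `/etc/ImageMagick-6/policy.xml`; `identify -list policy` shows the active policy. The sketch below only detects the restriction; `pdf_coder_blocked?` is a hypothetical helper, and the file location is an assumption that varies by system.

```ruby
require "rexml/document"

# Hypothetical helper: true if a policy.xml string denies the PDF coder.
# Patterns may name PDF alone or in a group such as "{PS,PDF,XPS}".
def pdf_coder_blocked?(policy_xml)
  doc = REXML::Document.new(policy_xml)
  REXML::XPath.match(doc, "//policy[@domain='coder']").any? do |policy|
    policy.attributes["rights"] == "none" &&
      policy.attributes["pattern"].to_s.upcase.include?("PDF")
  end
end
```

For example, `pdf_coder_blocked?(File.read("/etc/ImageMagick-6/policy.xml"))` returning true would explain the empty output file.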

Command.unshift producing strange command

When using a non-default ghostscript_path in the imagemagick_processor, this line gets executed:

command.unshift("PATH=#{File.dirname(@ghostscript_path)}:#{ENV['PATH']}") if @ghostscript_path && @ghostscript_path != DefaultGhostScriptPath

What was the actual purpose of it? On Windows, all it does for me is produce a strange command that tries to invoke my whole PATH with the ImageMagick arguments.
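A plausible reading of that line, not confirmed by the README: ImageMagick hands PDF rasterization to Ghostscript and finds `gs` through `PATH`, so prepending the directory of the custom `ghostscript_path` makes `convert` pick up that Ghostscript. The `PATH=... command` prefix, however, is POSIX shell syntax and uses `:` as the separator, and neither works in `cmd.exe`. The sketch below shows a shell-independent alternative that passes the environment as a hash to `Open3.capture3`; `ghostscript_env` and `run_with_env` are hypothetical helpers, not Grim's API.

```ruby
require "open3"

# Hypothetical helper: environment hash that puts the custom Ghostscript
# directory first on PATH. File::PATH_SEPARATOR is ":" on Unix-like systems
# and ";" on Windows.
def ghostscript_env(ghostscript_path, current_path = ENV.fetch("PATH", ""))
  { "PATH" => [File.dirname(ghostscript_path), current_path].join(File::PATH_SEPARATOR) }
end

# Runs an argv array (no shell) with env applied to the child process only.
def run_with_env(env, command)
  Open3.capture3(env, *command)
end
```

For example, `run_with_env(ghostscript_env("/path/to/bin/gs"), ["convert", "in.pdf[0]", "out.png"])` leaves the parent process's PATH untouched.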

File.exists? deprecated since Ruby 2.1.0

Hi,

As File.exists? has been deprecated since Ruby 2.1.0, it would be nice to switch to File.exist? in lib/grim/pdf.rb.

  1) Grim::Pdf should not raise NoMethodErrod for File:class
     Failure/Error:
       expect{
         pdf = Grim::Pdf.new(fixture_path("smoker.pdf"))
       }.not_to raise_error(NoMethodError)

       expected no NoMethodError, got #<NoMethodError: undefined method `exists?' for File:Class> with backtrace:
         # ./lib/grim/pdf.rb:18:in `initialize'
         # ./spec/lib/grim/pdf_spec.rb:8:in `new'
         # ./spec/lib/grim/pdf_spec.rb:8:in `block (3 levels) in <top (required)>'
         # ./spec/lib/grim/pdf_spec.rb:7:in `block (2 levels) in <top (required)>'
         # ./spec/lib/grim/pdf_spec.rb:7:in `block (2 levels) in <top (required)>'

Finished in 0.01266 seconds (files took 0.11393 seconds to load)
1 example, 1 failure

Failed examples:

rspec ./spec/lib/grim/pdf_spec.rb:6 # Grim::Pdf should not raise NoMethodErrod for File:class
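`File.exists?` was removed entirely in Ruby 3.2, so the deprecated call now raises `NoMethodError`, as in the spec output above. A minimal sketch of the replacement check uses `File.exist?`; `ensure_pdf_exists!` is a hypothetical name, and raising `ArgumentError` is an assumption, since Grim's own code and error class are not shown here.

```ruby
# Hypothetical existence check using File.exist? (File.exists? is gone in Ruby 3.2+).
def ensure_pdf_exists!(path)
  raise ArgumentError, "PDF not found: #{path}" unless File.exist?(path)
  path
end
```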

Extract Pages to create other pdf

Hello,

I have a scenario where I need to extract specific pages from a PDF and create a new PDF from only those pages.

Could this library help in this scenario?

Thanks in advance for your help.
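Grim is built around turning pages into images and text, so it is probably not the right tool for assembling a new PDF. One option outside Grim is Ghostscript's `pdfwrite` device, which Grim already depends on; the sketch below builds a command that copies a contiguous, 1-based page range. `extract_pages_command` is a hypothetical helper, and non-contiguous selections would need a different approach.

```ruby
# Hypothetical helper: copy pages first_page..last_page (1-based, inclusive)
# of input_pdf into output_pdf using Ghostscript's pdfwrite device.
def extract_pages_command(input_pdf, output_pdf, first_page, last_page, gs: "gs")
  raise ArgumentError, "invalid page range" unless first_page.between?(1, last_page)
  [gs, "-q", "-dNOPAUSE", "-dBATCH", "-dSAFER",
   "-sDEVICE=pdfwrite",
   "-dFirstPage=#{first_page}", "-dLastPage=#{last_page}",
   "-sOutputFile=#{output_pdf}",
   input_pdf]
end

# Usage (requires Ghostscript; pass gs: "gswin64c" on Windows):
#   require "open3"
#   _out, err, status = Open3.capture3(*extract_pages_command("in.pdf", "out.pdf", 2, 4))
#   warn err unless status.success?
```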

Set Image Height in Convert

Hello,

The save method accepts other parameters, but not the height of the new image.
How can I set the image height with Grim?

Thanks,
Shenouda Bertel
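The save options listed in the README include `:width` but no height. ImageMagick's `-resize` geometry can express a height on its own: `x800` means 800 pixels tall with the width scaled to keep the aspect ratio, and `600x800` fits the image within both limits. The sketch below builds such a geometry string for a direct `convert` call outside Grim; `resize_geometry` is a hypothetical helper.

```ruby
# Hypothetical helper building an ImageMagick -resize geometry string:
#   width only  -> "600"
#   height only -> "x800"    (width follows from the aspect ratio)
#   both        -> "600x800" (fit within both, aspect ratio preserved)
def resize_geometry(width: nil, height: nil)
  raise ArgumentError, "width or height is required" if width.nil? && height.nil?
  return width.to_s if height.nil?
  "#{width}x#{height}"
end

# Usage with convert (requires ImageMagick and Ghostscript):
#   command = ["convert", "-density", "300", "doc.pdf[0]",
#              "-resize", resize_geometry(height: 800), "page.png"]
```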

need help w/ the gem

I added the grim gem (and only that gem) to my Gemfile and ran bundle, and got a long error. Please help:

$ bundle
[DEPRECATED] This Gemfile does not include an explicit global source. Not using an explicit global source may result in a different lockfile being generated depending on the gems you have installed locally before bundler is run. Instead, define a global source in your Gemfile like this: source "https://rubygems.org".
The git source `git://https://github.com/jonmagic/grim` uses the `git` protocol, which transmits data without encryption. Disable this warning with `bundle config set --local git.allow_insecure true`, or switch to the `https` protocol to keep your data secure.
Fetching git://https//github.com/jonmagic/grim
--- ERROR REPORT TEMPLATE -------------------------------------------------------

ArgumentError: Gem sources must be absolute. You provided 'https/'.
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/settings.rb:508:in `normalize_uri'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/settings.rb:490:in `key_for'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/settings.rb:304:in `key_for'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/settings.rb:98:in `[]'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/source/git/git_proxy.rb:201:in `configured_uri_for'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/source/git/git_proxy.rb:92:in `checkout'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/source/git.rb:326:in `fetch'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/source/git.rb:172:in `specs'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/source.rb:58:in `spec_names'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/source_map.rb:21:in `block in all_requirements'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/source_map.rb:20:in `map'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/source_map.rb:20:in `all_requirements'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/definition.rb:799:in `source_requirements'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/definition.rb:477:in `resolver'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/definition.rb:279:in `resolve'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/definition.rb:177:in `resolve_remotely!'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/installer.rb:271:in `resolve_if_needed'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/installer.rb:82:in `block in run'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/process_lock.rb:12:in `block in lock'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/process_lock.rb:9:in `open'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/process_lock.rb:9:in `lock'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/installer.rb:71:in `run'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/installer.rb:23:in `install'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/cli/install.rb:62:in `run'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/cli.rb:257:in `block in install'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/settings.rb:131:in `temporary'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/cli.rb:256:in `install'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/vendor/thor/lib/thor.rb:392:in `dispatch'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/cli.rb:31:in `dispatch'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/vendor/thor/lib/thor/base.rb:485:in `start'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/cli.rb:25:in `start'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/gems/3.1.0/gems/bundler-2.3.26/libexec/bundle:48:in `block in <top (required)>'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/bundler/friendly_errors.rb:120:in `with_friendly_errors'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/gems/3.1.0/gems/bundler-2.3.26/libexec/bundle:36:in `<top (required)>'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/bin/bundle:25:in `load'
/home/drhuffman12/.asdf/installs/ruby/3.1.3/bin/bundle:25:in `<main>'


## Environment

Bundler 2.3.26
Platforms ruby, x86_64-linux
Ruby 3.1.3p185 (2022-11-24 revision 1a6b16756e0ba6b95ab71a441357ed5484e33498) [x86_64-linux]
Full Path /home/drhuffman12/.asdf/installs/ruby/3.1.3/bin/ruby
Config Dir /home/drhuffman12/.asdf/installs/ruby/3.1.3/etc
RubyGems 3.3.26
Gem Home /home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/gems/3.1.0
Gem Path /home/drhuffman12/.local/share/gem/ruby/3.1.0:/home/drhuffman12/.asdf/installs/ruby/3.1.3/lib/ruby/gems/3.1.0
User Home /home/drhuffman12
User Path /home/drhuffman12/.local/share/gem/ruby/3.1.0
Bin Dir /home/drhuffman12/.asdf/installs/ruby/3.1.3/bin
OpenSSL
Compiled OpenSSL 3.0.7 1 Nov 2022
Loaded OpenSSL 3.1.2 1 Aug 2023
Cert File /home/linuxbrew/.linuxbrew/etc/openssl@3/cert.pem
Cert Dir /home/linuxbrew/.linuxbrew/etc/openssl@3/certs
Tools
Git 2.34.1
RVM not installed
rbenv not installed
chruby not installed


## Bundler Build Metadata

Built At 2023-09-10
Git SHA unknown
Released Version false


## Gemfile

### Gemfile

```ruby
# source "https://rubygems.org"


# gem "pdf-reader", "~> 2.2"
# gem "ascii85_native"
# gem install ascii85_native
gem 'grim', git: "git://https://github.com/jonmagic/grim"
```

### Gemfile.lock

<No /home/drhuffman12/_tmp_/github/drhuffman12/lotto_scanner/Gemfile.lock found>

--- TEMPLATE END ----------------------------------------------------------------

Unfortunately, an unexpected error occurred, and Bundler cannot continue.

First, try this link to see if there are any existing issue reports for this error:
https://github.com/rubygems/rubygems/search?q=Gem+sources+must+be+absolute.+You+provided+%27https%2F%27.&type=Issues

If there aren't any reports for this error yet, please fill in the new issue form located at https://github.com/rubygems/rubygems/issues/new?labels=Bundler&template=bundler-related-issue.md, and copy and paste the report template above in there.
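Judging from the log, the failure comes from the Gemfile rather than from Grim: `git://https://github.com/jonmagic/grim` stacks two URL schemes, and Bundler ends up with `'https/'`, which it rejects as not absolute; the Gemfile also lacks a global `source`. A corrected Gemfile could look like this sketch, with the GitHub-tracking alternative left commented out:

```ruby
source "https://rubygems.org"

# Released gem from RubyGems:
gem "grim"

# Alternative: track the GitHub repository over https (not git://https://...):
# gem "grim", git: "https://github.com/jonmagic/grim"
```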

Grim::Pdf.count strange behaviour in Windows

When I use the "Grim::Pdf.count" method under Windows (with all dependencies installed and added to the PATH), I get this error message:

No such file or directory - gs -dNODISPLAY -q -sFile=C:/Users/#####/Documents/Webtests/2.4.0/webtest_downloads/#####.pdf C:/Ruby/1.9.3/lib/ruby/gems/1.9.1/gems/grim-1.1.0/lib/pdf_info.ps

But when I copy and paste the same command into cmd, it works flawlessly. If I execute the script on Mac OS X, it also works flawlessly.
Could it be a problem or bug with Windows, or did I do something wrong on my side?

The only thing I had to do was create a symlink from gs to gswin64c.exe since this gem only seems to work if it can execute ghostscript with gs.
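Both Windows reports appear to come down to Grim invoking `gs`, while Windows Ghostscript installs console binaries named `gswin64c.exe` or `gswin32c.exe`. Rather than a symlink, one option is to look the executable up and pass it as `:ghostscript_path`, as the README's Windows example does. The sketch below searches `PATH` (honoring `PATHEXT` on Windows); `find_ghostscript` is a hypothetical helper, and the candidate names are assumptions about a typical install.

```ruby
# Hypothetical helper: return the first Ghostscript executable found on PATH, or nil.
# gswin64c/gswin32c are the usual Windows console binaries; gs is used elsewhere.
GHOSTSCRIPT_NAMES = %w[gs gswin64c gswin32c].freeze

def find_ghostscript(path = ENV.fetch("PATH", ""), extensions = ENV.fetch("PATHEXT", "").split(";"))
  suffixes = [""] + extensions # bare name first, then .EXE, .BAT, ... on Windows
  path.split(File::PATH_SEPARATOR).each do |dir|
    GHOSTSCRIPT_NAMES.each do |name|
      suffixes.each do |suffix|
        candidate = File.join(dir, name + suffix)
        return candidate if File.file?(candidate) && File.executable?(candidate)
      end
    end
  end
  nil
end

# Usage, mirroring the README's Windows example (bin folder stays on PATH):
#   gs = find_ghostscript
#   Grim.processor = Grim::ImageMagickProcessor.new({:ghostscript_path => File.basename(gs)}) if gs
```

Passing only the file name keeps the setup identical to the README's `"gswin64c.exe"` example, which relies on the Ghostscript bin folder being on PATH.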
