
prawnpdf / pdf-inspector


A collection of PDF::Reader based analysis classes for inspecting PDF output. Mainly used for testing Prawn, but will work with any PDF.

Home Page: http://prawnpdf.org

License: Other

Ruby 100.00%

pdf-inspector's People

Contributors

alexblackie, alexsoble, bradediger, davetron5000, fnando, gnclmorais, henrik, iamjohnford, johnnyshields, msbit, packetmonkey, petergoldstein, pointlessone, practicingruby, sigmike, yob


pdf-inspector's Issues

Text.analyze_file([MY_PATH]).strings returns array of characters

I'm testing the contents of a PDF generated by PDFKit. When I run Text.analyze_file([MY_PATH]).strings on the file, I get an array that holds each character of the PDF content at its own index. Spaces are stored as '' (empty string). I've been able to move forward by replacing all empty strings with a space character. However, I'm now up against content that contains newline characters. The newlines are not stored in the array, so the separation between the words on either side of a newline is lost. Ever seen this sort of behaviour? I realize that there are a number of factors which could be screwing things up, including my own ignorance, and I'd love to find the root of the problem, but I have no time. Right now, I'd be happy with a hack to get my tests working.
Cheers!

EDIT: So I came up with a hack that'll get me through. I remove all the whitespace characters from the array (they weren't actually empty strings, as I had believed), then join the remaining characters with exactly one space and downcase the whole thing.

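def char_array_to_normalized_string(arr)
  # Drop whitespace entries, join the rest with single spaces, then lowercase.
  # reject (rather than delete_if) leaves the caller's array unmodified.
  arr.reject { |s| s =~ /\s/ }.join(' ').downcase
end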

After I put my test strings through the same process, by calling char_array_to_normalized_string("Test String".scan(/./)), I'm able to match them against the output of pdf-inspector. It's not pretty, but it gets me where I need to go.
Cheers!
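
For anyone hitting the same thing, here is a minimal sketch of how the comparison might be wired up. It does not call pdf-inspector; the `extracted` array only simulates what Text.analyze_file(path).strings returned in my case. The helper names `normalize_chars` and `normalize_string` are made up for illustration, and `strip.empty?` is a small variation on the regex above so that genuinely empty strings are dropped as well.

```ruby
# Hypothetical helpers, not part of pdf-inspector.

# Normalize an array of single-character strings: drop empty or
# whitespace-only entries, join with one space, and lowercase.
def normalize_chars(chars)
  chars.reject { |s| s.strip.empty? }.join(' ').downcase
end

# Put an expected string through the same normalization so both
# sides of the comparison have the same shape.
def normalize_string(str)
  normalize_chars(str.chars)
end

# Simulated extraction result: one character per entry, and the
# newline between the two words is missing entirely.
extracted = %w[T e s t S t r i n g]

expected = "Test\nString"

puts normalize_chars(extracted)                               # t e s t s t r i n g
puts normalize_chars(extracted) == normalize_string(expected) # true
```

The obvious trade-off is that all word boundaries are thrown away, so "Test String" and "TestString" become indistinguishable. That was acceptable for my tests, but it may not be for yours.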

Pick a license for pdf-inspector

What license should pdf-inspector 1.0 be released with?

So far development has proceeded with no explicit license, but if we're upgrading this code to a gem in its own right I feel we should be explicit.

Numerals read as `\u0000` when using font feature settings

First of all, thanks for the work and effort you've put into this great library!

Bug description

We are having an issue with numerals not being read correctly by PDF::Inspector::Text.analyze. They get misinterpreted as \u0000 when we use font-feature-settings: 'tnum' as style. We are generating the PDF with Gotenberg from HTML templates.

Minimal reproducible example

<div>21.09.2023</div> gets read as 21.09.2023

while

<div style="font-feature-settings: 'tnum'">21.09.2023</div> gets read as \u0000\u0000.\u0000\u0000.\u0000\u0000\u0000\u0000.

PDFs

Here are two PDFs, one with the feature turned off and one with the feature turned on:
font_features_off.pdf
font_features_on.pdf

Further information

The UNIX tool pdftotext is able to read both versions correctly so I think the PDF is alright.
The font in use is Barlow if that makes any difference.
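
As a stopgap, a test suite could at least flag this condition explicitly rather than fail on an opaque string mismatch. The sketch below is only an illustration: `strings_with_nul` is a hypothetical helper, not part of pdf-inspector, and the two arrays are hard-coded stand-ins for what PDF::Inspector::Text.analyze(...).strings returned for the two example PDFs.

```ruby
# Hypothetical helper, not part of pdf-inspector: returns the extracted
# strings that contain at least one NUL (\u0000) character.
def strings_with_nul(strings)
  strings.select { |s| s.include?("\u0000") }
end

# Stand-ins for the extracted text of the two example documents.
features_off = ["21.09.2023"]
features_on  = ["\u0000\u0000.\u0000\u0000.\u0000\u0000\u0000\u0000"]

puts strings_with_nul(features_off).empty? # true
puts strings_with_nul(features_on).length  # 1
```

This does not recover the digits; it only makes the failure mode visible in a spec.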

Any help would be appreciated!

P.S.: I'll also open an issue regarding this problem over at https://github.com/yob/pdf-reader so feel free to close this one if you think it should be handled there.

release 1.1

The next release of pdf-reader (1.4) will start printing deprecation warnings when the pre-1.0 API is used.

I'd like to release the current pdf-inspector master branch as 1.1 and move prawn to that, so the prawn specs don't print deprecation warnings.

The pdf-inspector API remains the same.

Any objections?
