pdf-raku / pdf-class-raku
PDF Document Object Model (under construction)
License: Artistic License 2.0
After upgrading to the latest Rakudo (2021.08-01), re-installing PDF::Class failed. The error messages are in the attached file.
A bit of an anomaly w.r.t. PDF::Lite: for example, t/helloworld.pdf
has '<< /Type /Pages /Resources <<...>> >>' in the page tree, not at the page level. Other PDFs are similarly affected.
Hello,
I'm trying to upgrade this module, because I have an error message processing some PDF files using PDF::API6:
Probable version skew in pre-compiled
[...] (PDF::Font)' (cause: no object at index 776)
But I'm getting this error during the testing phase:
$ zef install 'PDF::Class:ver<0.1.2>'
===> Searching for: PDF::Class
===> Testing: PDF::Class:ver<0.1.2>:auth<github:p6-pdf>:api<PDF-1.7>
eval error: use PDF::Class;
use PDF::Catalog;
my PDF::Class $pdf .= new;
my PDF::Catalog $doc = $pdf.catalog;
try {
$doc.PageMode = 'UseToes';
CATCH { default { say "err, that didn't work: $_" } }
}
# same again, bypassing type checking
$doc<PageMode> = :name<UseToes>;
in block at t/00-readme.t line 25
unknown /ShadingType 42 - supported range is 1..7
No Doc handler class [PDF PDF::COS::Type]::Unknown
No Doc handler class [PDF PDF::COS::Type]::Annot::Caret
Probable version skew in pre-compiled '/home/nando/.zef/store/PDF-Class-0.1.2.tar.gz/PDF-Class-0.1.2/lib/PDF/OutputIntent.pm (PDF::OutputIntent)' (cause: no object at index 776)
in method find-delegate at /home/nando/.zef/store/PDF-Class-0.1.2.tar.gz/PDF-Class-0.1.2/lib/PDF/Class/Loader.pm (PDF::Class::Loader) line 32
in method load-delegate at /home/nando/.zef/store/PDF-Class-0.1.2.tar.gz/PDF-Class-0.1.2/lib/PDF/Class/Loader.pm (PDF::Class::Loader) line 105
in block <unit> at t/load-delegate.t line 23
# Looks like you planned 18 tests, but ran 11
===> Testing [FAIL]: PDF::Class:ver<0.1.2>:auth<github:p6-pdf>:api<PDF-1.7>
Aborting due to test failure: PDF::Class:ver<0.1.2>:auth<github:p6-pdf>:api<PDF-1.7> (use --force-test to override)
in code at /opt/rakudo-pkg/share/perl6/sources/8244C3B17ACA61B0EC04857BB3283A8FAF7A186D (Zef::Client) line 374
in method test at /opt/rakudo-pkg/share/perl6/sources/8244C3B17ACA61B0EC04857BB3283A8FAF7A186D (Zef::Client) line 354
in code at /opt/rakudo-pkg/share/perl6/sources/8244C3B17ACA61B0EC04857BB3283A8FAF7A186D (Zef::Client) line 523
in sub at /opt/rakudo-pkg/share/perl6/sources/8244C3B17ACA61B0EC04857BB3283A8FAF7A186D (Zef::Client) line 520
in method install at /opt/rakudo-pkg/share/perl6/sources/8244C3B17ACA61B0EC04857BB3283A8FAF7A186D (Zef::Client) line 621
in sub MAIN at /opt/rakudo-pkg/share/perl6/sources/81436475BD18D66BFD96BBCEE07CCCDC0F368879 (Zef::CLI) line 152
in block <unit> at /opt/rakudo-pkg/share/perl6/resources/D822DF07A6D5CB602F97ED307F62A1B3B5D2C90D line 3
in sub MAIN at /opt/rakudo-pkg/bin/zef line 2
in block <unit> at /opt/rakudo-pkg/bin/zef line 2
I have the following modules installed:
$ zef list --installed|grep PDF
===> Found via /opt/rakudo-pkg/share/perl6
===> Found via /home/nando/.perl6
PDF::API6:ver<0.1.0>:auth<github:p6-pdf>
PDF::Class:ver<0.1.0>:auth<github:p6-pdf>:api<PDF-1.7>
PDF::Content:ver<0.2.1>:auth<github:p6-pdf>:api<PDF-1.7>
PDF::Grammar:ver<0.1.5>:auth<github:p6-pdf>:api<PDF-1.7>
PDF:ver<0.2.8>:auth<github:p6-pdf>:api<PDF-1.7>
and perl6 version is:
perl6 -v
This is Rakudo version 2018.03 built on MoarVM version 2018.03
implementing Perl 6.c.
I have a script that uses PDF::API6 to extract pages from a larger PDF. For one particular file I'm getting an endless series of errors about coercing Int to UInt. When I run pdf-checker --repair, here is what the errors look like:
Warning: unable to coerce object -167 of type Int to UInt
Warning: unable to coerce object -167 of type Int to UInt
Error in 4649 0 R (PDF::Annot::Link) /StructParent entry: Int.StructParent: -167 not of type: UInt
It seems that coercing an Int to a UInt is something that should be easy.
I upgraded to the latest modules available, with no change.
I'm running Ubuntu Linux 20.04.5.
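For context on the warning above: in Raku, UInt is a subset of Int that only admits non-negative values, so a value such as -167 cannot satisfy it however the coercion is attempted. The following is a minimal sketch in plain Raku, with no PDF modules involved; the helper name fits-uint and the variable names are hypothetical and are not part of PDF::Class.

```raku
# UInt is a built-in subset of Int that only admits values >= 0.
sub fits-uint(Int $n --> Bool) {
    $n ~~ UInt
}

say fits-uint(167);   # a non-negative Int matches UInt
say fits-uint(-167);  # a negative Int does not

# Assigning a negative value to a UInt-typed variable fails the type check.
my Int $value = -167;
my $result = try { my UInt $struct-parent = $value; 'assigned' };
say $result // "rejected: { $!.^name }";
```

Seen this way, the checker is probably reporting a bad /StructParent value in the file itself rather than failing at something that should be easy.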
The plan is to encode core fonts using single byte encodings. To this end, Font::AFM calculates string-widths for a 0..255 (latin1) subset only.
The core fonts use an ExtendedRoman character set, which goes well past this. Consider:
% perl6 -MFont::Metrics::helvetica-bold -M Font::AFM -e 'my $hb = Font::Metrics::helvetica-bold.new; say $hb.Wx.keys (-) @Font::AFM::ISOLatin1Encoding'
set(Kcommaaccent, OE, guilsinglright, Umacron, radical, Sacute, oe, ecaron, dcaron, ohungarumlaut, perthousand, amacron, dagger, Nacute, Tcaron, notequal, lslash, quotesinglbase, Dcroat, lacute, fi, cacute, Ecaron, ncommaaccent, Zacute, umacron, ccaron, Aogonek, ncaron, zacute, nacute, summation, Amacron, Ncommaaccent, Ccaron, florin, lozenge, abreve, emdash, sacute, commaaccent, Emacron, scaron, endash, partialdiff, Ohungarumlaut, ellipsis, Rcaron, quotedblleft, zdotaccent, zcaron, Rcommaaccent, Lacute, scedilla, Ydieresis, fl, quotedblbase, Scaron, uring, edotaccent, omacron, tcaron, Zcaron, Omacron, Edotaccent, Racute, Euro, uhungarumlaut, racute, guilsinglleft, aogonek, lcaron, lessequal, greaterequal, tcommaaccent, rcommaaccent, gbreve, dcroat, Uogonek, Uhungarumlaut, Tcommaaccent, Zdotaccent, Lcommaaccent, Uring, trademark, Delta, Scedilla, emacron, imacron, Dcaron, Lcaron, Scommaaccent, Imacron, Lslash, Gcommaaccent, Cacute, Gbreve, quotesingle, gcommaaccent, Idotaccent, fraction, bullet, Ncaron, Eogonek, eogonek, quotedblright, lcommaaccent, iogonek, rcaron, kcommaaccent, scommaaccent, Iogonek, daggerdbl, Abreve, uogonek)
That's an extra 115 glyphs that fall outside of the latin-1 subset.
Will need a solution in the long term. I'm not sure if there's a nicer solution than moving to Identity-H.
Currently, the PDF::Reader.ind-obj method fully realizes DOM objects using a convoluted callback mechanism. This is currently needed to ensure objects are instantiated to the correct type.
But if all of the PDF::DOM classes are converted to roles, we can simply override the ind-obj method in a subclass, then apply the roles at run-time. This would be simpler, more conventional and easier to extend.
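As a rough illustration of applying roles at run-time, here is a minimal sketch in plain Raku. The names Dict, Catalog, Page and load-object are hypothetical stand-ins, not the actual PDF::DOM classes or the real ind-obj method; the sketch only shows the 'does' mixin being chosen from an object's /Type entry after construction.

```raku
# Hypothetical stand-ins; these are not the real PDF::DOM classes.
class Dict { has %.entries }

role Catalog { method describe { 'document catalog' } }
role Page    { method describe { 'page' } }

# Build a generic object first, then mix in a role based on its Type entry.
sub load-object(%entries) {
    my $obj = Dict.new(:%entries);
    given %entries<Type> // '' {
        # 'does' mixes the role into this one instance at run-time.
        when 'Catalog' { $obj does Catalog }
        when 'Page'    { $obj does Page }
    }
    $obj
}

my $cat = load-object(%( Type => 'Catalog' ));
say $cat.describe;
say $cat ~~ Catalog;

# An unrecognised Type is left as a plain Dict.
my $other = load-object(%( Type => 'Font' ));
say $other ~~ Catalog;
```

The object is always constructed the same way, and the type-specific behaviour is layered on afterwards, which is what would remove the need for a callback at construction time.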
PDF::Class currently has just PDF::AdditionalActions (based on Table 194), but there are actually 4 types: Table 194 Annotation, 195 Page, 196 Form and 197 Catalog.
This module was mostly built using the pre 2016 ufo build-tool, which is no longer available.
It's now reliant on rakudo 2016.xx precompilation, which isn't yet fast to load or run. It now takes quite a few minutes to run the test-suite, with most of the time spent in precompilation and/or loading.
At this stage, I'm only running regression tests on this module occasionally. There does seem to be a fair bit of scope for optimization, both in rakudo and within this module. May look at this again towards the end of 2016 or early in 2017.
The changes have been made with PR #17.
There are a few workarounds in the test suite.
In particular, some of the tests have 'use PDF::Content::Util::TransformMatrix' prior to 'use PDF'.
If they don't, the test dies as follows:
% perl6 -I lib t/helloworld.t
t/helloworld.t ..
ok 1 - MediaBox bad setter - dies
ok 2 - MediaBox bad setter - ignored
ok 3 - The object is-a 'PDF::Content::Text::Block'
ok 4 - The object is-a 'PDF::Content::Text::Block'
ok 5 - The object is-a 'PDF::Content::Text::Block'
ok 6 - $img.Width
Cannot invoke this object (REPR: Null; VMNull)
in sub multiply at /home/david/git/perl6-PDF/../perl6-PDF-Content/lib/PDF/Content/Util/TransformMatrix.pm (PDF::Content::Util::TransformMatrix) line 44
in method track-graphics at /home/david/git/perl6-PDF/../perl6-PDF-Content/lib/PDF/Content/Ops.pm (PDF::Content::Ops) line 706
in method op at /home/david/git/perl6-PDF/../perl6-PDF-Content/lib/PDF/Content/Ops.pm (PDF::Content::Ops) line 639
in method do at /home/david/git/perl6-PDF/../perl6-PDF-Content/lib/PDF/Content.pm (PDF::Content) line 187
in block at t/helloworld.t line 58
in method graphics at /home/david/git/perl6-PDF/../perl6-PDF-Content/lib/PDF/Content.pm (PDF::Content) line 51
in method graphics at /home/david/git/perl6-PDF/../perl6-PDF-Content/lib/PDF/Content/Graphics.pm (PDF::Content::Graphics) line 49
in block <unit> at t/helloworld.t line 22
I've also had to make some adjustments in PDF::Content::Op, which now loads PDF::Content::Util::TransformMatrix with 'require' rather than 'use'.
Not sure why a simple dependency-free set of functions is causing such mayhem.
I suspect that much of the material has been indirectly or directly produced from Adobe's internal data dictionaries and accurately represents the internal structure, as represented by the Adobe suite.
I've been manually constructing classes from the spec (which is what everyone seems to do); but a structured dump of the object definitions would greatly assist with checking the classes built so far, and with the completion of PDF::Class.
Hopefully achievable with the tools built to date. For example, both pdf-checker.p6 and pdf-toc.p6 are capable of scanning the specification PDF.
Dumping the tables in the PDF Spec to JSON or some-such would be a big help. These seem to be reasonably well defined as such via the document's struct tree root.
A generic mechanism may be necessary to handle the 'must be an indirect reference' constraint on certain entries. For example, the Threads entry in the Catalog object; see [PDF 1.7 TABLE 3.25 Entries in the catalog dictionary].
Most likely the entry trait (Dictionaries) and index trait (Arrays) need an additional :ind-ref argument that is somehow interpreted during Dict/Array AST construction and/or passed through to the serializer, so that it 'knows' to construct an indirect object.
Write a script that traverses the PDF::Class document hierarchy and automatically generates documentation and a representation of the Zen object tree.
It could output Markdown or HTML for publication.
That name's just too similar to the PHP utility https://github.com/dompdf/dompdf
PDF::Doc?
See the input file at:
It currently takes a long time to run perl6 -I PDF -e0
A lot of classes are being eagerly loaded via use statements.
At this stage, at least, it may make sense to make greater use of PDF::Type::Delegator to autoload these classes, on demand, when needed.
http://unicode.org/reports/tr14/ contains some useful info that can be used to improve and better generalize word/line breaking in the $.page.text() method.
Without going overboard, the text breaking method could make use of the non-breaking classes, break opportunities and better handle numeric context.
A port of Perl 5's Unicode::LineBreak might come in handy here.
After pdf-checker.raku --repair myFile.pdf ran and reported "completed with 0 warnings and 0 errors" I tried to display the file info with pdf-info.raku myFile.pdf and got the following error:
Type check failed in assignment to $box; expected List but got Positional[Numeric] (Positional[Numeric])
in sub MAIN at /home/tsalada/rakudo-2020.06/share/perl6/site/resources/564D3C75174AAD1D6ACB041994069B712F6EA8B9 line 51
in block at /home/tsalada/rakudo-2020.06/share/perl6/site/resources/564D3C75174AAD1D6ACB041994069B712F6EA8B9 line 7
in sub MAIN at /home/tsalada/rakudo-2020.06/share/perl6/site/bin/pdf-info.raku line 3
in block at /home/tsalada/rakudo-2020.06/share/perl6/site/bin/pdf-info.raku line 1
Via the newly released PDF::ISO_32000_2 module. Work in progress on the pdf-2.0 branch.
E.g. from t/helloworld.pdf:
<< /Info << /Author (t/helloworld.t) /Creator (PDF::Tools) >> /Root 1 0 R /Size 10 >>
startxref
10416
%%EOF
The Info entry is wrong. The PDF spec says that it must be an indirect object.