pdf-raku / pdf-class-raku

PDF Document Object Model (under construction)

License: Artistic License 2.0

Raku 99.81% Makefile 0.19%

pdf pdf-objects raku-module

pdf-class-raku's Issues

Re-Installing PDF::Class Failed

After upgrading to the latest Rakudo (2021.08-01), re-installing PDF::Class failed. The error messages are in the attached file.

errMsg_Re-Installing_PDF_Class.txt

Install fails. CentOS/Rakudo version 2020.02.1 built on MoarVM version 2020.02.1

http://repo.westus.cloudapp.azure.com/rakudist/reports/PDF%3A%3AClass/centos/1586017722.txt

HTH

Regards

Aleksei

/Resources entry placed in /Pages not /Page

A bit of an anomaly with respect to PDF::Lite: for example, t/helloworld.pdf has '<< /Type /Pages /Resources <<...>> >>' in the page tree, not at the page level. Other PDFs are similarly affected.

Error during installation (testing phase)

Hello,
I'm trying to upgrade this module, because I have an error message processing some PDF files using PDF::API6:
Probable version skew in pre-compiled [...] (PDF::Font)' (cause: no object at index 776)
But I'm getting this error during the testing phase:

$ zef install 'PDF::Class:ver<0.1.2>'
===> Searching for: PDF::Class
===> Testing: PDF::Class:ver<0.1.2>:auth<github:p6-pdf>:api<PDF-1.7>
eval error:     use PDF::Class;
    use PDF::Catalog;
    my PDF::Class $pdf .= new;

    my PDF::Catalog $doc = $pdf.catalog;
    try {
        $doc.PageMode   = 'UseToes';
        CATCH { default { say "err, that didn't work: $_" } }
    }

    # same again, bypassing type checking
    $doc<PageMode>  = :name<UseToes>;

  in block  at t/00-readme.t line 25
unknown /ShadingType 42 - supported range is 1..7
No Doc handler class [PDF PDF::COS::Type]::Unknown
No Doc handler class [PDF PDF::COS::Type]::Annot::Caret
Probable version skew in pre-compiled '/home/nando/.zef/store/PDF-Class-0.1.2.tar.gz/PDF-Class-0.1.2/lib/PDF/OutputIntent.pm (PDF::OutputIntent)' (cause: no object at index 776)
  in method find-delegate at /home/nando/.zef/store/PDF-Class-0.1.2.tar.gz/PDF-Class-0.1.2/lib/PDF/Class/Loader.pm (PDF::Class::Loader) line 32
  in method load-delegate at /home/nando/.zef/store/PDF-Class-0.1.2.tar.gz/PDF-Class-0.1.2/lib/PDF/Class/Loader.pm (PDF::Class::Loader) line 105
  in block <unit> at t/load-delegate.t line 23

# Looks like you planned 18 tests, but ran 11
===> Testing [FAIL]: PDF::Class:ver<0.1.2>:auth<github:p6-pdf>:api<PDF-1.7>
Aborting due to test failure: PDF::Class:ver<0.1.2>:auth<github:p6-pdf>:api<PDF-1.7> (use --force-test to override)
  in code  at /opt/rakudo-pkg/share/perl6/sources/8244C3B17ACA61B0EC04857BB3283A8FAF7A186D (Zef::Client) line 374
  in method test at /opt/rakudo-pkg/share/perl6/sources/8244C3B17ACA61B0EC04857BB3283A8FAF7A186D (Zef::Client) line 354
  in code  at /opt/rakudo-pkg/share/perl6/sources/8244C3B17ACA61B0EC04857BB3283A8FAF7A186D (Zef::Client) line 523
  in sub  at /opt/rakudo-pkg/share/perl6/sources/8244C3B17ACA61B0EC04857BB3283A8FAF7A186D (Zef::Client) line 520
  in method install at /opt/rakudo-pkg/share/perl6/sources/8244C3B17ACA61B0EC04857BB3283A8FAF7A186D (Zef::Client) line 621
  in sub MAIN at /opt/rakudo-pkg/share/perl6/sources/81436475BD18D66BFD96BBCEE07CCCDC0F368879 (Zef::CLI) line 152
  in block <unit> at /opt/rakudo-pkg/share/perl6/resources/D822DF07A6D5CB602F97ED307F62A1B3B5D2C90D line 3
  in sub MAIN at /opt/rakudo-pkg/bin/zef line 2
  in block <unit> at /opt/rakudo-pkg/bin/zef line 2

I have the following modules installed:

$ zef list --installed|grep PDF
===> Found via /opt/rakudo-pkg/share/perl6
===> Found via /home/nando/.perl6
PDF::API6:ver<0.1.0>:auth<github:p6-pdf>
PDF::Class:ver<0.1.0>:auth<github:p6-pdf>:api<PDF-1.7>
PDF::Content:ver<0.2.1>:auth<github:p6-pdf>:api<PDF-1.7>
PDF::Grammar:ver<0.1.5>:auth<github:p6-pdf>:api<PDF-1.7>
PDF:ver<0.2.8>:auth<github:p6-pdf>:api<PDF-1.7>

and perl6 version is:

perl6 -v
This is Rakudo version 2018.03 built on MoarVM version 2018.03
implementing Perl 6.c.

Error Converting Int to UInt

I have a script that uses PDF::API6 to extract pages from a larger PDF. For one particular file I'm getting an endless series of errors about coercing Int to UInt. When I run pdf-checker --repair, the errors look like this:

Warning: unable to coerce object -167 of type Int to UInt
Warning: unable to coerce object -167 of type Int to UInt
Error in 4649 0 R (PDF::Annot::Link) /StructParent entry: Int.StructParent: -167 not of type: UInt

It seems that coercing an Int to a UInt is something that should be easy.
I upgraded to the latest available modules, with no change.
I'm running Ubuntu Linux 20.04.5
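
For context on the warning above: UInt is an unsigned integer type, so a negative value such as -167 has no valid coercion target, and the failure points at the /StructParent value stored in the file rather than at the coercion itself. The check can be sketched roughly as follows, in Python rather than Raku; `to_uint` and `check_entry` are hypothetical helpers for illustration and are not part of PDF::Class or pdf-checker.

```python
def to_uint(value):
    """Return value unchanged if it is a non-negative integer, else raise."""
    if not isinstance(value, int) or isinstance(value, bool):
        raise TypeError(f"{value!r} is not an integer")
    if value < 0:
        # A negative number has no unsigned representation to coerce to.
        raise ValueError(f"unable to coerce {value} to an unsigned integer")
    return value


def check_entry(key, value):
    """Return None if the entry is a valid unsigned integer, else a message."""
    try:
        to_uint(value)
        return None
    except ValueError as err:
        return f"/{key} entry: {err}"


print(check_entry("StructParent", -167))
print(check_entry("StructParent", 12))
```

Under this reading, the file can only be repaired by changing or dropping the offending entry; no coercion rule can make -167 unsigned.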

core fonts and single byte encodings - not a comfortable fit

The plan is to encode core fonts using single byte encodings. To this end, Font::AFM calculates string-widths for a 0..255 (latin1) subset only.

The core fonts use an ExtendedRoman character set, which goes well past this. Consider:

% perl6 -MFont::Metrics::helvetica-bold -M Font::AFM -e 'my $hb = Font::Metrics::helvetica-bold.new; say $hb.Wx.keys (-) @Font::AFM::ISOLatin1Encoding'
set(Kcommaaccent, OE, guilsinglright, Umacron, radical, Sacute, oe, ecaron, dcaron, ohungarumlaut, perthousand, amacron, dagger, Nacute, Tcaron, notequal, lslash, quotesinglbase, Dcroat, lacute, fi, cacute, Ecaron, ncommaaccent, Zacute, umacron, ccaron, Aogonek, ncaron, zacute, nacute, summation, Amacron, Ncommaaccent, Ccaron, florin, lozenge, abreve, emdash, sacute, commaaccent, Emacron, scaron, endash, partialdiff, Ohungarumlaut, ellipsis, Rcaron, quotedblleft, zdotaccent, zcaron, Rcommaaccent, Lacute, scedilla, Ydieresis, fl, quotedblbase, Scaron, uring, edotaccent, omacron, tcaron, Zcaron, Omacron, Edotaccent, Racute, Euro, uhungarumlaut, racute, guilsinglleft, aogonek, lcaron, lessequal, greaterequal, tcommaaccent, rcommaaccent, gbreve, dcroat, Uogonek, Uhungarumlaut, Tcommaaccent, Zdotaccent, Lcommaaccent, Uring, trademark, Delta, Scedilla, emacron, imacron, Dcaron, Lcaron, Scommaaccent, Imacron, Lslash, Gcommaaccent, Cacute, Gbreve, quotesingle, gcommaaccent, Idotaccent, fraction, bullet, Ncaron, Eogonek, eogonek, quotedblright, lcommaaccent, iogonek, rcaron, kcommaaccent, scommaaccent, Iogonek, daggerdbl, Abreve, uogonek)

That's an extra 115 glyphs that fall outside of the latin-1 subset.

Will need a solution in the long term. I'm not sure if there's a nicer solution than moving to Identity-H.
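
The comparison above is a set difference between the font's glyph names and the glyph names covered by the encoding. A minimal sketch of the same check in Python follows. The two sample sets are small illustrative stand-ins, not the real AFM data, and `unencodable` and `fits_single_byte` are hypothetical helper names.

```python
# Illustrative samples only; the real sets come from the AFM metrics
# and from the encoding table being considered.
font_glyphs = {"A", "eacute", "Euro", "OE", "fi", "emdash"}
latin1_glyphs = {"A", "eacute", "yen"}


def unencodable(font, encoding):
    """Glyph names present in the font but absent from the encoding."""
    return sorted(set(font) - set(encoding))


def fits_single_byte(glyph_count):
    """A single-byte encoding can address at most 256 character codes."""
    return glyph_count <= 256


print(unencodable(font_glyphs, latin1_glyphs))
print(fits_single_byte(len(font_glyphs)))
```

Any font whose glyph count exceeds 256 cannot be fully covered by one single-byte encoding, whichever 256 glyphs are chosen.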

Convert PDF::DOM::* to roles. Simplify PDF::DOM::Delegator and PDF::Reader interface

Currently, the PDF::Reader.ind-obj method fully realizes DOM objects using an unnecessary and convoluted callback mechanism. This is currently needed to ensure objects are instantiated to the correct type.

But if all of the PDF::DOM classes are converted to roles, we can then simply subclass the ind-obj method and apply the roles at run-time. This is simpler, more conventional and easier to extend.
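
The idea of loading a generic object first and attaching type-specific behaviour afterwards can be sketched as below. This is a Python approximation only: Python has no roles, so a mixin class stands in for a Raku role applied at run-time, and `CosDict`, `Page`, `Catalog`, `ROLES` and `apply_role` are all hypothetical names rather than the module's actual classes.

```python
class Page:
    """Stand-in for a role: behaviour for /Type /Page dictionaries."""
    def media_box(self):
        return self.get("MediaBox")


class Catalog:
    """Stand-in for a role: behaviour for /Type /Catalog dictionaries."""
    def page_mode(self):
        return self.get("PageMode")


ROLES = {"Page": Page, "Catalog": Catalog}


class CosDict(dict):
    """Generic dictionary object, as first produced by the reader."""


def apply_role(obj):
    """Mix the role matching the object's Type entry into the object, if any."""
    role = ROLES.get(obj.get("Type"))
    if role is not None:
        # Build a one-off subclass combining the role with the current class,
        # then rebless the already-constructed object into it.
        combined = type(f"{type(obj).__name__}+{role.__name__}", (role, type(obj)), {})
        obj.__class__ = combined
    return obj


page = apply_role(CosDict(Type="Page", MediaBox=[0, 0, 612, 792]))
print(type(page).__name__, page.media_box())
```

With this shape the reader only ever builds generic objects, and no callback is needed to pick the class before construction.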

Additional actions modelling is incorrect

PDF::Class currently has just PDF::AdditionalActions (based on Table 194), but there are actually four types: Table 194 (Annotation), Table 195 (Page), Table 196 (Form) and Table 197 (Catalog).

This module has gone Glacial - on the back-burner

This module was mostly built using the pre-2016 ufo build tool, which is no longer available.

It's now reliant on rakudo 2016.xx precompilation, which isn't yet fast to load or run. It now takes quite a few minutes to run the test suite, with most of the time spent in precompilation and/or loading.

At this stage, I'm only regressing this module occasionally. There does seem to be a fair bit of scope for optimization, both in rakudo and within this module. I may look at this again towards the end of 2016 or early in 2017.

Changes are required for the rename of PDF::Writer(Reader) to PDF::IO::Writer(Reader)

The changes have been made with PR #17.

precompilation fragilities

There are a few workarounds in the test suite.

In particular, some of the tests have "use PDF::Content::Util::TransformMatrix" placed before "use PDF".

If they don't, the test dies as follows:

% perl6 -I lib t/helloworld.t
t/helloworld.t .. 
ok 1 - MediaBox bad setter - dies
ok 2 - MediaBox bad setter - ignored
ok 3 - The object is-a 'PDF::Content::Text::Block'
ok 4 - The object is-a 'PDF::Content::Text::Block'
ok 5 - The object is-a 'PDF::Content::Text::Block'
ok 6 - $img.Width
Cannot invoke this object (REPR: Null; VMNull)
  in sub multiply at /home/david/git/perl6-PDF/../perl6-PDF-Content/lib/PDF/Content/Util/TransformMatrix.pm (PDF::Content::Util::TransformMatrix) line 44
  in method track-graphics at /home/david/git/perl6-PDF/../perl6-PDF-Content/lib/PDF/Content/Ops.pm (PDF::Content::Ops) line 706
  in method op at /home/david/git/perl6-PDF/../perl6-PDF-Content/lib/PDF/Content/Ops.pm (PDF::Content::Ops) line 639
  in method do at /home/david/git/perl6-PDF/../perl6-PDF-Content/lib/PDF/Content.pm (PDF::Content) line 187
  in block  at t/helloworld.t line 58
  in method graphics at /home/david/git/perl6-PDF/../perl6-PDF-Content/lib/PDF/Content.pm (PDF::Content) line 51
  in method graphics at /home/david/git/perl6-PDF/../perl6-PDF-Content/lib/PDF/Content/Graphics.pm (PDF::Content::Graphics) line 49
  in block <unit> at t/helloworld.t line 22

I've also had to make some adjustments in PDF::Content::Op, which now loads PDF::Content::Util::TransformMatrix with require rather than use.

Not sure why a simple dependency-free set of functions is causing such mayhem.

todo mine the PDF 32000 specification

I suspect that much of the material has been indirectly or directly produced from Adobe's internal data dictionaries and accurately represents the internal structure, as represented by the Adobe suite.

I've been manually constructing classes from the spec (which is what everyone seems to do); but a structured dump of the object definitions would greatly assist with checking the classes built so far, and with the completion of PDF::Class.

Hopefully achievable with the tools built to date. For example, both pdf-checker.p6 and pdf-toc.p6 are capable of scanning the specification PDF.

Dumping the tables in the PDF spec to JSON or some such would be a big help. These seem to be reasonably well defined as such via the document's struct tree root.

Handle 'must be indirect reference' constraints in spec

A generic mechanism may be necessary to handle the 'must be an indirect reference' constraint on certain entries. For example, the threads entry in the Catalog object; see [PDF 1.7 TABLE 3.25 Entries in the catalog dictionary].

Most likely the entry trait (Dictionaries) and the index trait (Arrays) need an additional :ind-ref argument that is somehow interpreted during Dict/Array AST construction and/or passed through to the serializer, so that it 'knows' to construct an indirect object.
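
One way to picture the proposed mechanism: entries flagged as indirect-only are hoisted into separate objects at serialization time and replaced with references. The following Python sketch makes that concrete under stated assumptions; `IndRef`, `hoist_indirect` and the 1-based numbering scheme are hypothetical, and this is not how PDF::Class actually implements its traits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndRef:
    """An indirect reference, written as 'obj gen R' in PDF syntax."""
    obj_num: int
    gen_num: int = 0

    def __str__(self):
        return f"{self.obj_num} {self.gen_num} R"


def hoist_indirect(entries, ind_ref_keys, objects):
    """Return a copy of entries with flagged values replaced by references.

    Hoisted values are appended to objects; an object's number is its
    1-based position in that list (an assumption made for this sketch).
    """
    out = {}
    for key, value in entries.items():
        if key in ind_ref_keys and not isinstance(value, IndRef):
            objects.append(value)
            out[key] = IndRef(len(objects))
        else:
            out[key] = value
    return out


objects = []
catalog = {"Type": "Catalog", "Threads": ["thread-a", "thread-b"]}
print(hoist_indirect(catalog, {"Threads"}, objects))
```

Values that are already references pass through unchanged, so the step is safe to run on dictionaries that were read from an existing file.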

todo build class documenter tool

Write a script that traverses the PDF::Class document hierarchy and automatically generates documentation and a representation of the Zen object tree.

Could be it outputs markdown or html for publication.

Cooling on PDF::DOM as a module name

That name's just too similar to the PHP utility https://github.com/dompdf/dompdf

PDF::Doc?

Dump when executing: "pdf-burst.raku --page=1 --save-as=t.pdf irrigation-plan.pdf"

See the input file at:

irrigation-plan.pdf

Rely a lot more on PDF::Type::Delegator autoloading

It currently takes a long time to run perl6 -I PDF -e0

A lot of classes are being eagerly loaded via use statements.

At this stage, at least, it may make sense to make greater use of PDF::Type::Delegator to autoload these classes, on demand, when needed.
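
The trade-off described here, eager loading up front versus loading on first use, can be sketched as a small registry that imports a module only when its type is first requested. This is a Python illustration and not the PDF::Type::Delegator interface; `LazyLoader` is a hypothetical class, and the standard-library modules `fractions` and `decimal` are stand-ins for the per-type classes.

```python
import importlib


class LazyLoader:
    """Maps type names to module names and imports each on first use."""

    def __init__(self, module_for):
        self._module_for = dict(module_for)  # type name -> module name
        self._loaded = {}                    # type name -> imported module
        self.load_count = 0                  # number of imports performed

    def get(self, type_name):
        if type_name not in self._loaded:
            module_name = self._module_for[type_name]
            self._loaded[type_name] = importlib.import_module(module_name)
            self.load_count += 1
        return self._loaded[type_name]


loader = LazyLoader({"Fraction": "fractions", "Decimal": "decimal"})
print(loader.load_count)              # nothing loaded at start-up
print(loader.get("Fraction").__name__)
print(loader.load_count)
```

Start-up cost then scales with the types a document actually contains rather than with the size of the class library.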

Text Block line-break handling with respect to Unicode® Standard Annex #14

http://unicode.org/reports/tr14/ contains some useful info that can be used to improve and better generalize word/line breaking in the $.page.text() method.

Without going overboard, the text breaking method could make use of the non-breaking classes, break opportunities and better handle numeric context.

A port of the Perl 5 Unicode::LineBreak module might come in handy here.
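
As a small illustration of the non-breaking classes mentioned above: UAX #14 assigns NO-BREAK SPACE (U+00A0) to the glue class GL, which prohibits a break, whereas an ordinary space allows one after it. The Python sketch below handles only that one distinction and is nowhere near a full UAX #14 implementation; `break_opportunities` is a hypothetical helper and numeric context is not handled.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE, line-break class GL (glue) in UAX #14


def break_opportunities(text):
    """Indices at which a new line may start, following an ordinary space.

    Only U+0020 creates an opportunity; U+00A0 never does. Runs of spaces
    yield a single opportunity, after the last space of the run.
    """
    return [
        i + 1
        for i, ch in enumerate(text[:-1])
        if ch == " " and text[i + 1] != " "
    ]


sample = "10" + NBSP + "km away now"
print(break_opportunities(sample))
```

Here "10" and "km" stay together on one line because the character joining them is glue, not a breakable space.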

Script pdf-info throwing exception after pdf-checker reports no error

After pdf-checker.raku --repair myFile.pdf ran and reported "completed with 0 warnings and 0 errors" I tried to display the file info with pdf-info.raku myFile.pdf and got the following error:

Type check failed in assignment to $box; expected List but got Positional[Numeric] (Positional[Numeric])
in sub MAIN at /home/tsalada/rakudo-2020.06/share/perl6/site/resources/564D3C75174AAD1D6ACB041994069B712F6EA8B9 line 51
in block at /home/tsalada/rakudo-2020.06/share/perl6/site/resources/564D3C75174AAD1D6ACB041994069B712F6EA8B9 line 7
in sub MAIN at /home/tsalada/rakudo-2020.06/share/perl6/site/bin/pdf-info.raku line 3
in block at /home/tsalada/rakudo-2020.06/share/perl6/site/bin/pdf-info.raku line 1

Work in progress on PDF 2.0 support.

Via the newly released PDF::ISO_32000_2 module. Work is in progress on the pdf-2.0 branch.

Trailer dict not serializing indirect objects

E.g. from t/helloworld.pdf:

<< /Info << /Author (t/helloworld.t) /Creator (PDF::Tools) >> /Root 1 0 R /Size 10 >>
startxref
10416
%%EOF

Info entry is wrong. PDF spec says that it must be an indirect object.
