pdf-content-raku's Issues

Detect and avoid broken wordspace handling

The PDF 32000 spec discusses the limited applicability of the SetWordspace (Tw) operator:

"Word spacing shall be applied to every occurrence of the single-byte character code 32 in a string when using a simple font or a composite font that defines code 32 as a single-byte code. It shall not apply to occurrences of the byte value 32 in multiple-byte codes"

In short, the Tw operator won't work on multibyte encodings, such as Identity-H.

The flow-on effect is that text block justification is broken for Identity-H encoded fonts loaded by PDF::Font::Loader. In general, we should warn if WordSpacing has been set but will be ineffective.

This is, at least, easily detected.

  • Text::Block should warn and explicitly ignore WordSpacing when it won't work with a given font.
  • Our current internal use of WordSpacing for justified text needs to be fixed, most likely by changing it
    to use ShowSpaceText (TJ).
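
As a rough illustration of the TJ approach (in Python rather than Raku, and not PDF::Content's actual code), the sketch below builds a TJ operand array that mimics Tw by placing an explicit adjustment after every space. The function name is hypothetical. It relies on the PDF 32000 rule that numbers in a TJ array are expressed in thousandths of a text space unit, scaled by the font size, and subtracted from the current position.

```python
def tj_array_for_word_spacing(text, word_spacing, font_size):
    """Return a TJ-style operand list that emulates Tw for `text`.

    word_spacing: extra space per word gap, in unscaled text space units
                  (the same units Tw takes).
    font_size:    current font size (Tf operand); must be positive.
    """
    if font_size <= 0:
        raise ValueError("font_size must be positive")

    # A TJ number n moves the position by -n/1000 * font_size, so adding
    # `word_spacing` units of space requires a negative adjustment.
    adjust = -word_spacing * 1000.0 / font_size

    pieces = text.split(" ")
    operands = []
    for word in pieces[:-1]:
        operands.append(word + " ")   # keep the space glyph itself
        operands.append(adjust)       # then widen the gap explicitly
    if pieces[-1]:
        operands.append(pieces[-1])   # final word has no trailing gap
    return operands


print(tj_array_for_word_spacing("Hello big world", 2.0, 10.0))
```

Because the adjustment is attached to each space explicitly, it does not depend on how the font encodes code 32, which is the case Tw fails to cover.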

Threading support

The most obvious unit of parallelism is a page or a set of pages. For example, when rendering a book, it may
make sense to break it down into chapters, which are separated by page breaks. Within a chapter,
the text flow between pages requires them to be constructed in sequence.

The major shared resource between pages is fonts. Fonts need to be shared between pages, and the
adapted encoding and glyph detection for subsetting need to be thread-safe.

Fonts are likely to be where much of the work is, and there needs to be some locking somewhere: either
(i) fine-grained locking in the font encoders, or (ii) more coarse-grained locking that only allows one thread at a time
to render with any given font.

Option (i) will impact both core fonts and PDF::Font::Loader, which will need locking support in all its
encoders. Option (ii) is much simpler, but may result in bottlenecks if the rendering is text (rather than image)
intensive and a common font is being used across threads.
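
To make option (ii) concrete, here is a minimal sketch in Python (not Raku, and not the library's design). The names SharedFont, render_chapter and render_book are hypothetical. Each chapter is rendered on its own worker, pages within a chapter are built in sequence, and a worker holds the font's lock for the whole chapter, so the subsetting state is only ever touched by one thread at a time.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedFont:
    """Hypothetical font object shared between pages and threads."""

    def __init__(self, name):
        self.name = name
        self.render_lock = threading.Lock()  # coarse-grained: one renderer at a time
        self.glyphs_used = set()             # subsetting state, not thread-safe by itself

    def encode(self, text):
        # Caller is expected to hold render_lock.
        self.glyphs_used.update(text)
        return text.encode("latin-1", "replace")


def render_chapter(font, pages):
    # Pages within a chapter are built sequentially, under a single lock.
    with font.render_lock:
        return [font.encode(page_text) for page_text in pages]


def render_book(font, chapters, workers=4):
    # Chapters are independent, so they can be farmed out to threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda chapter: render_chapter(font, chapter), chapters))


font = SharedFont("Helvetica")
book = render_book(font, [["ab", "cd"], ["ef"]])
print(book, sorted(font.glyphs_used))
```

With a single common font this serializes all text rendering, which is the bottleneck described above; option (i) would instead move the lock inside encode.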

Tests failing

I'm using Raku v2019.11.
When I try to install PDF::Content I see this message:

===> Testing: PDF::Content:ver<0.3.0>:auth<github:p6-pdf>:api<PDF.1.7>
[PDF::Content] # Failed test 'content with comments'
[PDF::Content] # at t/ops.t line 235
[PDF::Content] # expected: $("175 720 m % MoveTo", "175 700 l % LineTo", "300 800 400 720 v % CurveToInitial", "h % ClosePath", "S % Stroke", "% That's all#2665!")
[PDF::Content] # got: $("175 720 m % MoveTo", "175 700 l % LineTo", "300 800 400 720 v % CurveToInitial", "h % ClosePath", "S % Stroke", "% That's all♥!")
[PDF::Content] # You failed 1 test of 109
===> Testing [FAIL]: PDF::Content:ver<0.3.0>:auth<github:p6-pdf>:api<PDF.1.7>

Matrix seems to be missing a "reflect" method

My matrix algebra is very sketchy now, since I haven't had to use it in years. I am trying to compose a reflection followed by a rotation for a special method, where a scaling followed or preceded by a rotation doesn't give the expected or desired results. I found a Wikipedia article entitled "Rotations and reflections in two dimensions", which mathematically composes the two required transformations into a single reflection.

UPDATE

There is a slight difference in signs between the article and Matrix.rakumod, which seems to come down to whether a positive rotation is clockwise or counterclockwise. I am working on a PR. I've changed the signs in my PR to follow the PDF::Content convention, and the tests I've added seem to work.
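
For reference, the identity from that article can be checked numerically. The sketch below is in Python, not Raku, and is not the code from the PR. It assumes the six-element [a b c d e f] matrix layout from the PDF spec with counterclockwise-positive rotation, which may differ in sign from the convention in Matrix.rakumod. The function names are hypothetical. `reflect(phi)` reflects about a line through the origin at angle phi, and a reflection followed by a rotation by theta collapses to a single reflection about the line at phi + theta/2.

```python
import math


def rotate(theta):
    """Counterclockwise rotation by theta radians (assumed convention)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c, s, -s, c, 0.0, 0.0)


def reflect(phi):
    """Reflection about the line through the origin at angle phi radians."""
    c, s = math.cos(2 * phi), math.sin(2 * phi)
    return (c, s, s, -c, 0.0, 0.0)


def multiply(m1, m2):
    """Compose two transforms: apply m1 first, then m2 (row-vector convention)."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def close(m1, m2, tol=1e-12):
    return all(math.isclose(x, y, abs_tol=tol) for x, y in zip(m1, m2))


phi, theta = math.radians(30), math.radians(40)
combined = multiply(reflect(phi), rotate(theta))
print(close(combined, reflect(phi + theta / 2)))
```

Under a clockwise-positive convention the off-diagonal signs of `rotate` flip, and the combined reflection angle becomes phi - theta/2 instead.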

Improve handling of BX .. EX extended content blocks

Unknown operators should be ignored in BX .. EX blocks (PDF 32000, 7.8.1).

Presumably, this also means that any arguments to those operators should be flushed.

We should also preserve the operators when writing the content stream.
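
A minimal sketch of that behaviour, in Python rather than Raku and under stated assumptions: the content stream has already been parsed into (operator, operands) pairs, KNOWN_OPS is a tiny hypothetical subset of the real operator table, and the function names are made up for illustration. Unknown operators and their operands are skipped during interpretation inside a BX .. EX section, raise an error outside one, and are written back untouched because serialization works from the original list.

```python
KNOWN_OPS = {"m", "l", "h", "S", "BX", "EX"}  # hypothetical subset, for illustration only


def interpret(ops):
    """Return the operations that would actually be executed."""
    depth = 0          # a counter, so nested BX .. EX sections are tolerated
    executed = []
    for operator, operands in ops:
        if operator == "BX":
            depth += 1
        elif operator == "EX":
            if depth == 0:
                raise ValueError("EX without matching BX")
            depth -= 1
        elif operator in KNOWN_OPS:
            executed.append((operator, operands))
        elif depth > 0:
            continue   # unknown operator: drop it together with its operands
        else:
            raise ValueError(f"unknown operator {operator!r} outside BX .. EX")
    return executed


def serialize(ops):
    """Write every operation back out, including unknown ones."""
    return "\n".join(" ".join([*map(str, operands), operator]) for operator, operands in ops)


stream = [("m", [175, 720]), ("BX", []), ("xyz", [1, 2]), ("EX", []), ("S", [])]
print(interpret(stream))
print(serialize(stream))
```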

Core fonts are being over-shared.

The core fonts are currently being cached at the process level, which means that any font-specific encoding differences will accumulate if multiple PDFs are being produced.

This is a surprising result if multiple PDFs are being written. There's some chance of the encoding table becoming exhausted if more than 255 glyphs are used from a given font across those PDFs.

$pdf.core-font should probably cache locally against the PDF.

Not sure yet what to do with $gfx.core-font, but will most likely discourage it.
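
The difference between process-level and per-PDF caching can be sketched as follows. This is Python, not Raku, and CoreFont and Document are hypothetical stand-ins rather than the library's classes. The cache lives on the document object, so each PDF gets its own font instance and its own encoding table.

```python
class CoreFont:
    """Hypothetical core font with a single-byte encoding table that fills on use."""

    MAX_CODES = 255

    def __init__(self, name):
        self.name = name
        self.encoded = {}  # glyph -> single-byte code, allocated as text is encoded

    def encode(self, glyph):
        if glyph not in self.encoded:
            if len(self.encoded) >= self.MAX_CODES:
                raise OverflowError(f"encoding table for {self.name} is exhausted")
            self.encoded[glyph] = len(self.encoded) + 1
        return self.encoded[glyph]


class Document:
    """Hypothetical PDF document that caches core fonts locally."""

    def __init__(self):
        self._fonts = {}  # per-document cache instead of a process-wide one

    def core_font(self, name):
        if name not in self._fonts:
            self._fonts[name] = CoreFont(name)
        return self._fonts[name]


first, second = Document(), Document()
first.core_font("Helvetica").encode("x")
print(second.core_font("Helvetica").encoded)  # unaffected by the first document
```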

Missing Hash::int dependency possibly

[PDF::Content] Please note that a 'META6.json' file was found in 't', of which the 'provides' section was used to determine if a dependency is available or not. Perhaps you need to add 'Hash::int' in the <provides> section of that file? Or need to specify a directory that does *not* have a 'META6.json' file?

The above message appears when installing using rakubrew, whose version output says "Currently running moar-2023.08".

It's possible that this is another issue with my setup, as there seemed to be conflicts with libraries installed with the system Raku. I installed Hash::int with zef and the issue went away.

Investigate cmap encoding for core fonts

It is worth looking at how various readers will display Unicode characters that are not part of the standard core fonts and/or the mac/win
encodings. It is also worth experimenting to see if a cmap encoding option adds value (single or double byte).

Removal of PDF::Content::Tag::{Elem,Object,Root} from PDF::Content

PDF::Content 0.4.3 has some loosely coupled support for writing marked content.

PDF::Tags (under construction) can read, but not write tagged content.

So there are two competing APIs: one for writing and another for reading.

The proposal is to export the newly created 0.4.x functionality for creating tags to PDF::Tags, so that there's a single API for both reading and writing tags.

Will leave the ability to read and write content stream tags in place (may rename). Will remove anything related to the struct tree.
