pdf-stamper's Introduction

pdf-stamper

Combine JSON template description with PDF template files and input data to form a complete PDF.

Stability

This project has been used in production for several years and I consider it pretty stable. The project is not under active development, but it works well for the intended use cases.

Usage

Not much to see here yet.

Feeling adventurous? The library is available on Clojars.

Example: Template format

To give a quick example of how a PDF template description could look, this is an example of a template with a single hole:

{:name :template-one-hole
 :holes {:even [{:height 10.0
                 :width 10.0
                 :x 1.09
                 :y 3.12
                 :name :lonely-hole
                 :type :image
                 :priority 10}]
         :odd [{:height 10.0
                :width 10.0
                :x 1.09
                :y 3.12
                :name :lonely-hole
                :type :image
                :priority 10}]}}

It describes a hole on an actual PDF document page, where data (in this case an image) should be inserted. The page data that makes pdf-stamper fill the hole could look like this:

{:template :template-one-hole
 :locations [{:lonely-hole {:contents {:image (javax.imageio.ImageIO/read (java.io.File. "an-image.jpg"))}}}]}

See Documentation for further details.

Documentation

Documentation for both users and developers of pdf-stamper can be found in the Marginalia docs. Users can safely skip the documentation for pdf-stamper.text.parsed and pdf-stamper.text.pdfbox.

The documentation always describes the latest stable release; to generate the docs for a snapshot release, run lein marg.

Future

First:

  • Tests!

After that, in no particular order:

  • Hyphenation support
  • Text align for parsed text (always left-aligned now)
  • Control indentation of first line for parsed text
  • Add space after every line of paragraph
  • Pluggable text parser, to support formats other than XML (is this even relevant?)
  • Control the center-fold

Some of this will need a rewrite of the core to isolate side-effects at the edges, and make the internal API more data-driven in general.

Acknowledgments

  • Ingenium Golf (for letting me work on this)

License

Copyright © 2014-2018 Matthias Diehn Ingesman

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.

pdf-stamper's People

Contributors

mdiin, waffle-iron

pdf-stamper's Issues

Template inheritance?

To reduce boilerplate in the templates, adding template inheritance might be an option.

Keep with next paragraph

To prevent e.g. a heading from being the last paragraph on a page, it must be possible to specify that it should be put on the same page as the paragraph following it.

Only embed fonts that are actually used

Right now, all TrueType fonts in the context are embedded at the beginning of creating the PDF document. To reduce the file size of generated documents, it would be a good idea to embed a font only once it is actually requested by a template.
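A minimal sketch of embedding on first request, assuming the set of already-embedded fonts is tracked in a plain state map. The names request-font and :embedded, and the embed! callback, are hypothetical and not part of pdf-stamper:

```clojure
(defn request-font
  "Calls (embed! font-name) the first time font-name is requested and records
  it under :embedded in state. Later requests for the same font are no-ops.
  Hypothetical helper, not part of pdf-stamper."
  [state font-name embed!]
  (if (contains? (:embedded state) font-name)
    state
    (do
      (embed! font-name)
      (update state :embedded (fnil conj #{}) font-name))))
```

Threading the state through every template that asks for a font means a font that is never requested is never embedded.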

Widow and orphan protection

To improve the layout of multi-line text fields, we need some kind of widow and orphan protection. I propose that it be part of the template structure.
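One way such protection could work, as a sketch only: decide how many lines of a paragraph stay on the current page, given minimum line counts for the bottom of this page (orphans) and the top of the next (widows). The function name and the :orphans and :widows options are assumptions, not existing template keys:

```clojure
(defn lines-on-current-page
  "Returns how many of a paragraph's n-lines should be placed in the
  remaining capacity (in lines) of the current page. Returns 0 when the
  whole paragraph should move to the next page."
  [n-lines capacity {:keys [orphans widows] :or {orphans 2 widows 2}}]
  (if (<= n-lines capacity)
    ;; Everything fits, so there is nothing to protect.
    n-lines
    ;; Leave at least `widows` lines for the next page.
    (let [k (min capacity (- n-lines widows))]
      ;; Fewer than `orphans` lines would stay behind: move the paragraph.
      (if (< k orphans) 0 k))))

(lines-on-current-page 10 9 {}) ; => 8
(lines-on-current-page 10 1 {}) ; => 0
```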

The algorithm for marking as `:broken` is faulty

There is a bug here: when a paragraph is exactly as long as the space given, the following paragraph is marked as :broken.

Example

See the example in test/pdf_stamper/manual/overflow_bullets3.clj

On the third page of the result, the top line should contain a bullet.

Make templates more dynamic

Use case

We are building a hierarchy of templates, where one template is the base for the rest, and several modifier-templates exist. The modifiers mostly just increment an :x or :y value by a fixed amount; necessary because pages can contain different elements that might push some things around.

The template hierarchy is merged together to form the final templates.

What I'm proposing

In the above use case it would be very useful to be able to specify a function, instead of a plain number, as the value of e.g. :x or :y. The semantics of specifying that function would be something like this:

  1. Look up the previous value of the field, e.g. :x
  2. Apply the specified function to that value and insert the result as the new value of :x

In case there is no previous value of :x, there are two possibilities:

  1. When specifying the function, also specify a default value
    • To which the function is then applied, or which is used directly?
  2. Raise an error
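The proposed semantics could be sketched roughly as follows. This sketch picks the "raise an error" option for a missing previous value; apply-modifier is a hypothetical name, and nothing here exists in pdf-stamper today:

```clojure
(defn apply-modifier
  "Merges modifier into hole. A plain value replaces the previous one; a
  function is applied to the previous value of the same key. Throws when a
  function is given for a key that has no previous value."
  [hole modifier]
  (reduce-kv
   (fn [acc k v]
     (cond
       (not (fn? v))     (assoc acc k v)
       (contains? acc k) (update acc k v)
       :else             (throw (ex-info "No previous value to apply function to"
                                         {:key k}))))
   hole
   modifier))

(apply-modifier {:x 1.0 :y 3.5} {:y #(+ % 10.0) :width 5.0})
;; => {:x 1.0, :y 13.5, :width 5.0}
```

Supporting a default value instead would only change the :else branch, which is where the open question from the list above would have to be settled.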

Template merging algorithm

To avoid a potential combinatorial explosion of templates, use something like this snippet:

(defn- make-templates
  "Naming scheme is a keyword with \"holes\" defined by $hole-name$. Example
  naming scheme:

  :rhubarb$part$

  Values in between $'s are matched to the :name of individual parts and replaced
  as needed. Example with the above naming scheme:

  parts = [base-all
           {:pdf-stamper/name "part"
            :pdf-stamper/optional? true
            :pdf-stamper/variants [{:value "flower" :template-part rhubarb-flower}
                                   {:value "roots" :template-part rhubarb-roots}]}]

  would yield templates with the names:

  [:rhubarbflower :rhubarbroots]

  And the appropriate template parts merged together in the order they are
  specified in the parts vector."
  [naming-scheme parts]
  (let [naming-scheme-replacement-map (into {} (map (comp vec reverse)
                                                    ;; re-seq needs a string, so take the name of the keyword
                                                    (re-seq #"\$([^\$]+)\$" (name naming-scheme))))]
    (loop [parts parts
           result []]
      ...)))
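The naming part of the docstring can be illustrated on its own. This is not the elided body of make-templates, only a sketch of how names might be expanded, assuming a simplified map from part name to variant values; expand-names is a hypothetical helper, and optional parts and the merging of template parts are left out:

```clojure
(require '[clojure.string :as str])

(defn expand-names
  "Replaces every $part$ hole in naming-scheme with each of that part's
  variant values and returns the resulting template names as keywords."
  [naming-scheme values-by-part]
  (->> values-by-part
       (reduce (fn [names [part values]]
                 (for [n names
                       v values]
                   (str/replace n (str "$" part "$") v)))
               [(name naming-scheme)])
       (mapv keyword)))

(expand-names :rhubarb$part$ {"part" ["flower" "roots"]})
;; => [:rhubarbflower :rhubarbroots]
```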

Accept URLs when adding fonts to the context

Right now only strings and input streams are handled correctly. Loading a resource from a jar (using clojure.java.io/resource) yields a URL instance, from which users have to construct an input stream (and remember to close it once the font has been embedded).

To make it easier to use and prevent potential memory leaks in client code, it should be possible to hand a URL instance to pdf-stamper.context/add-font and have pdf-stamper.context/embed-font open and close the input stream.
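A sketch of the idea, relying on the fact that clojure.java.io/input-stream already accepts URLs, files, and path strings. The helper name with-font-stream and the callback f are hypothetical, and how this would be wired into add-font and embed-font is left open:

```clojure
(require '[clojure.java.io :as io])

(defn with-font-stream
  "Opens source (a URL, File, or path string), calls (f stream), and closes
  the stream afterwards, even if f throws. Note that a source which is
  already an InputStream would be closed as well."
  [source f]
  (with-open [in (io/input-stream source)]
    (f in)))
```

With something like this in place, client code could pass the result of clojure.java.io/resource directly and would no longer need to manage the stream itself.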

Images: Zoom to fill

The idea is to add an option to the :aspect key for image templates that allows images to be scaled 1:1, but where only a piece of the image is shown (i.e. it is cut out). It makes sense to cut from the center of the image.
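The cut-out could be computed along these lines. This is a sketch that assumes "zoom to fill" means keeping the image's aspect ratio and cropping a centered region with the same aspect ratio as the hole; center-crop is a hypothetical name:

```clojure
(defn center-crop
  "Returns the centered region of an img-w x img-h image that has the same
  aspect ratio as a hole-w x hole-h hole, as a map in image coordinates."
  [img-w img-h hole-w hole-h]
  (let [img-ratio  (/ (double img-w) img-h)
        hole-ratio (/ (double hole-w) hole-h)]
    (if (> img-ratio hole-ratio)
      ;; Image is relatively wider than the hole: keep full height, trim sides.
      (let [w (* img-h hole-ratio)]
        {:x (/ (- img-w w) 2.0) :y 0.0 :width w :height (double img-h)})
      ;; Image is relatively taller (or equal): keep full width, trim top and bottom.
      (let [h (/ img-w hole-ratio)]
        {:x 0.0 :y (/ (- img-h h) 2.0) :width (double img-w) :height h}))))

(center-crop 200 100 10 10)
;; => {:x 50.0, :y 0.0, :width 100.0, :height 100.0}
```

The cropped region can then be scaled to the hole without distortion.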

Document cursor movements

That is, the origin of a PDF page is its bottom-left corner, and writing text moves the cursor down. All coordinates refer to the bottom-left corner of the element.
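For readers used to a top-left origin, the conversion can be sketched like this. The function name and the shape of the element map are assumptions for illustration only; page-height is in the same units as the coordinates:

```clojure
(defn top-left->pdf
  "Converts an element positioned relative to the top-left corner of the
  page into PDF coordinates, where the origin is the bottom-left corner of
  the page and :y refers to the bottom edge of the element."
  [page-height {:keys [x y width height]}]
  {:x x
   :y (- page-height y height)
   :width width
   :height height})

(top-left->pdf 842.0 {:x 10.0 :y 20.0 :width 100.0 :height 50.0})
;; => {:x 10.0, :y 772.0, :width 100.0, :height 50.0}
```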

Treat even and odd pages differently

For example, even pages could be rotated 180 degrees.

Maybe this is not so much a template thing as a page specification thing. If page specifications can say what should happen when they are placed on an even or odd page, things like the following would be possible:

  • Specify that a page spec must always be stamped on an even page, e.g. insert a blank page before it if it is about to be stamped on an odd page
    • It makes sense to be able to specify a template to use for the "blank" page
  • Flip the page horizontally
  • Flip the page horizontally if the page is even

Questions

Assuming an implementation where it is a part of the page data specification, these are some interesting things to ask oneself.

Q: What should happen when a page overflows?

A: Simply overflow and drop the knowledge of even/odd special behaviour?

Won't work if it is required that all even pages throughout the document are flipped horizontally.

A: Pass on all even/odd special behaviour?

Can result in the overflow inheriting the "print only on even pages" property, even though it should print on odd pages as well.

A: Only pass on some parts of the even/odd special behaviour?

Requires an up-front specification of which pieces of even/odd behaviour to pass on to the overflow pages.

Template validation on add

When a template description is added to the context it should be validated, and if it is invalid the user should be told exactly what is wrong with it.

Compile-time validation would be nice to have if possible, but it should be optional, since clients might want to load templates dynamically or from resources not available at compile time.
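A run-time check might start out like the sketch below. It assumes every hole needs exactly the keys used in the template example in this README; the real set of required keys may differ, and template-errors is a hypothetical name:

```clojure
(def required-hole-keys
  "Assumed from the README example; not an authoritative list."
  #{:height :width :x :y :name :type :priority})

(defn hole-errors
  "Returns one message per required key missing from hole."
  [side hole]
  (for [k (sort required-hole-keys)
        :when (not (contains? hole k))]
    (str "Hole " (pr-str (:name hole)) " on " side " pages is missing " k)))

(defn template-errors
  "Returns a vector of human-readable problems; empty when none are found."
  [template]
  (vec (concat
        (when-not (keyword? (:name template))
          ["Template :name must be a keyword"])
        (for [[side holes] (:holes template)
              hole holes
              message (hole-errors side hole)]
          message))))
```

Returning all problems at once, rather than throwing on the first one, makes it easier to tell the user exactly what is wrong.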

How to handle missing templates?

When a page specifies a template that is not in the context, what should we do? As I see it there are two options:

  1. Throw a RuntimeException
  2. Skip the page
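Both options can be sketched behind a single switch. The function resolve-pages and the on-missing argument are hypothetical; the exception is created with ex-info, whose ExceptionInfo type is a RuntimeException:

```clojure
(defn resolve-pages
  "Keeps the pages whose :template exists in templates (a map keyed by
  template name). For a missing template, either skips the page (when
  on-missing is :skip) or throws."
  [templates pages on-missing]
  (reduce (fn [out page]
            (cond
              (contains? templates (:template page)) (conj out page)
              (= on-missing :skip)                   out
              :else (throw (ex-info "Template not found in context"
                                    {:template (:template page)}))))
          []
          pages))
```

Throwing is probably the safer default, since a silently skipped page is easy to miss in the finished document.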
