Coder Social home page Coder Social logo

pdf-mode.el's Introduction

pdf-mode.el - Edit raw PDF files in Emacs

pdf-mode is a major mode for editing PDF files in Emacs. It's not perfect, but it should be a good starting point. It might be useful for the poor souls who are working on generating PDF; definitely useful to me.

Features:

  • parser for PDF entities

  • syntax highlighting based on the AST produced by the parser

  • find references to an object (M-x pdf-highlight-refs)

  • find definition of an object (M-x pdf-find-definition)

  • rewrites the xref section and stream /Length-s when saving a file (M-x pdf-fix-xrefs)

  • decompress stream at point (M-x pdf-inflate-stream)

  • easily insert a new object/stream (M-x pdf-new-object)

  • discard objects that are not referenced (M-x pdf-cleanup)

Default key bindings

(customize pdf-mode-map)

  • C-c C-o — insert a new object (at point; make sure the cursor is somewhere where an object makes sense). Pass a prefix argument (C-u) to make the new object a stream.

  • C-c C-e — decompress stream at point (pdf-inflate-stream).

  • M-? — highlight references to object/reference at point.

  • M-. — locate definition of object reference at point.

  • M-, — go back to previous location (after M-.).

  • M-a — move to beginning of thing at point (pdf-beginning-of-thing).

  • M-e — move to end of thing at point (pdf-end-of-thing).

  • C-c C-SPC — mark thing at point (pdf-mark-thing). The selection will be extended to parent nodes on subsequent calls.

pdf-fix-xrefs will run automatically before saving a file, so if that succeeds the new file should be valid (i.e. the xref and startxref sections should be properly updated). If there's a parse error, however, the file won't be saved at all. This is probably a bad idea; comments welcome.

pdf-fix-xrefs expects startxref to really be at the end of the file, and the trailer dictionary to precede it, as per the spec. The xref table can be anywhere in the page, but pdf-fix-xrefs will move it at the end just before the trailer. If no xref section exists, pdf-fix-xrefs won't mind and will just generate one.

Highlighting references with M-? (M-x pdf-highlight-refs) will enter a minor mode where the following additional key bindings are available:

  • C-<up> — move to the previous occurrence
  • C-<down> — move to the next occurrence
  • C-g or <escape> — remove highlighting and exit this mode

Limitations

Only PDF-1.4 non-linearized format is supported.

PDF-1.5 introduced “object streams”. See, before 1.5, PDF only had “stream objects”. Object streams are stream objects that can contain objects, but oddly enough, not stream objects. If this gibberish doesn't make any sense, and it doesn't, see the PDF spec, section 3.4.6—3.4.7, but only if you're okay with completely losing faith in humanity. Any case, starting with 1.5 there's a whole new definition of the cross-reference table optionally possible (object streams + XRef stream). For the time being, we don't support it.

For dealing with linearized PDFs or object streams, you can use qpdf:

qpdf --object-streams=disable -qdf input.pdf output.pdf

Now output.pdf should be a file that our little mode can work with.

Syntax highlighting is based on parsing the whole buffer, so if you throw a megabyte-order file at it it might feel pretty slow. I couldn't figure out how to make it work with Emacs' built-in font-locking support, because streams contain arbitrary data that can break the fragile regexp-based syntax highlighting (my bad for not trying hard enough to understand how do multi-line font locking with Emacs).

The parser might not work properly with DOS-style newlines.

Besides these known issues, there are of course an infinity of bugs. Please file issues or pull requests for the finite number of bugs you may find.

pdf-mode.el's People

Contributors

mishoo avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.