QPDF: A content-preserving PDF document transformer

License: Apache License 2.0

qpdf's Introduction

QPDF

QPDF is a command-line tool and C++ library that performs content-preserving transformations on PDF files. It supports linearization, encryption, and numerous other features. It can also be used for splitting and merging files, creating PDF files (but you have to supply all the content yourself), and inspecting files for study or analysis. QPDF does not render PDFs or perform text extraction, and it does not contain higher-level interfaces for working with page contents. It is a low-level tool for working with the structure of PDF files and can be a valuable tool for anyone who wants to do programmatic or command-line-based manipulation of PDF files.

The QPDF Manual is hosted online at https://qpdf.readthedocs.io. The project website is https://qpdf.sourceforge.io. The source code repository is hosted at GitHub: https://github.com/qpdf/qpdf.

Verifying Distributions

The public key used to sign qpdf source distributions has fingerprint C2C9 6B10 011F E009 E6D1 DF82 8A75 D109 9801 2C7E and can be found at https://q.ql.org/pubkey.asc or downloaded from a public key server.

Copyright, License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

You may also see the license in the file LICENSE.txt in the source distribution.

Versions of qpdf prior to version 7 were released under the terms of version 2.0 of the Artistic License. At your option, you may continue to consider qpdf to be licensed under those terms. Please see the manual for additional information. The Artistic License appears in the file Artistic-2.0 in the source distribution.

Prerequisites

QPDF requires a C++ compiler that supports C++17.

To compile and link something with qpdf, you can use pkg-config with package name libqpdf or cmake with package name qpdf. Here's an example of a CMakeLists.txt file that builds a program with the qpdf library:

cmake_minimum_required(VERSION 3.16)
project(some-application LANGUAGES CXX)
find_package(qpdf)
add_executable(some-application some-application.cc)
target_link_libraries(some-application qpdf::libqpdf)

QPDF depends on the external libraries zlib and jpeg. The libjpeg-turbo library is also known to work since it is compatible with the regular jpeg library, and QPDF doesn't use any interfaces that aren't present in the straight jpeg8 API. These are part of every Linux distribution and are readily available. Download information appears in the documentation. For Windows, you can download pre-built binary versions of these libraries for some compilers; see README-windows.md for additional details.

Depending on which crypto providers are enabled, GnuTLS and OpenSSL may also be required. This is discussed further in Crypto providers below.

Detailed information appears in the manual.

Licensing terms of embedded software

QPDF makes use of zlib and jpeg libraries for its functionality. These packages can be downloaded separately from their own download locations. If the optional GnuTLS or OpenSSL crypto providers are enabled, then GnuTLS and/or OpenSSL are also required.

Please see the NOTICE file for information on licenses of embedded software.

Crypto providers

qpdf can use different crypto implementations. These can be selected at compile time or at runtime. The native crypto implementations that were used in all versions prior to 9.1.0 are still present, but they are not built into qpdf by default if any external providers are available at build time.

The following providers are available:

gnutls: an implementation that uses the GnuTLS library to provide crypto; causes libqpdf to link with the GnuTLS library
openssl: an implementation that can use the OpenSSL (or BoringSSL) libraries to provide crypto; causes libqpdf to link with the OpenSSL library
native: a native implementation where all the source is embedded in qpdf and no external dependencies are required

The default behavior is for cmake to discover which other crypto providers can be supported based on available external libraries, to build all available external crypto providers, and to use an external provider as the default over the native one. By default, the native crypto provider will be used only if no external providers are available. This behavior can be changed with various cmake options as described in the manual.

Note about weak cryptographic algorithms

The PDF file format used to rely on RC4 for encryption. Using 256-bit keys always uses AES instead, and with 128-bit keys, you can elect to use AES. qpdf does its best to warn when someone is writing a file with weak cryptographic algorithms, but qpdf must always retain support for being able to read and even write files with weak encryption to be able to fully support older PDF files and older PDF readers.

Building from source distribution on UNIX/Linux

Starting with version 11, qpdf builds with cmake. The default configuration with cmake works on most systems. On Windows, you can build qpdf with Visual Studio using cmake without having any additional tools installed. However, to run the test suite, you need MSYS2, and you also need MSYS2 to build with mingw.

Example UNIX/Linux build:

cmake -S . -B build -DCMAKE_BUILD_TYPE=RelWithDebInfo
cmake --build build

Example mingw build from an MSYS2 mingw shell:

cmake -S . -B build -G 'MSYS Makefiles' -DCMAKE_BUILD_TYPE=RelWithDebInfo
cmake --build build

Example MSVC build from an MSYS shell or from a Windows command shell with Visual Studio command-line tools in the path:

cmake -S . -B build
cmake --build build --config Release

Installation can be done with cmake --install. Packages can be made with cpack.

The tests use qtest, and the test driver is invoked by ctest. To see the real underlying tests, run ctest --verbose so that you can see qtest's output. If you need to turn off qtest's color output, pass -DQTEST_COLOR=0 to cmake.

For additional information, please refer to the manual.

Building on Windows

qpdf is known to build and pass its test suite with mingw and Microsoft Visual C++. Both 32-bit and 64-bit versions work. In addition to the manual, see README-windows.md for more details on how to build under Windows.

Building Documentation

The QPDF manual is written in reStructuredText format and is built with Sphinx. The sources to the user manual can be found in the manual directory. For more detailed information, consult the Building and Installing QPDF section of the manual or consult the build-doc script.

Additional Notes on Build

qpdf provides cmake configuration files and pkg-config files. They support static and dynamic linking. In general, you do not need the header files from qpdf's dependencies to be available to builds that use qpdf. The only exception to this is that, if you include Pl_DCT.hh, you need header files from libjpeg. Since this is a rare case, qpdf's cmake and pkg-config files do not automatically add a JPEG include path to the build. If you are using Pl_DCT explicitly, you probably already have that configured in your build.

To learn about using the library, please read comments in the header files in include/qpdf, especially QPDF.hh, QPDFObjectHandle.hh, and QPDFWriter.hh. These are the best sources of documentation on the API. You can also study the code of QPDFJob.cc, which exercises most of the public interface. There are additional example programs in the examples directory.

Additional Notes on Test Suite

By default, slow tests and tests that require dependencies beyond those needed to build qpdf are disabled. Slow tests include image comparison tests and large file tests. Image comparison tests can be enabled by setting the QPDF_TEST_COMPARE_IMAGES environment variable to 1. Large file tests can be enabled by setting the QPDF_LARGE_FILE_TEST_PATH environment variable to the absolute path of a directory with at least 11 GB of free space that can handle files over 4 GB in size. On Windows, this should be a Windows path (e.g. C:\LargeFileTemp) even if the build is being run from an MSYS2 environment. The test suite provides nearly full coverage even without these tests. Unless you are making deep changes to the library that would impact the contents of the generated PDF files or testing this on a new platform for the first time, there is no real reason to run these tests. If you're just running the test suite to make sure that qpdf works for your build, the default tests are adequate.
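As a rough illustration of the two environment variables described above, the following Python sketch builds an environment for a test run. The function name `slow_test_environment` and its parameters are hypothetical helpers for this example and are not part of qpdf; only the two variable names come from the text above.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath


def slow_test_environment(compare_images=False, large_file_dir=None, base_env=None):
    """Return a copy of the environment with the optional slow tests enabled.

    Hypothetical helper: only the variable names are taken from the README.
    """
    env = dict(os.environ if base_env is None else base_env)
    if compare_images:
        env["QPDF_TEST_COMPARE_IMAGES"] = "1"
    if large_file_dir is not None:
        path = str(large_file_dir)
        # The README asks for an absolute path; accept POSIX or Windows style.
        is_absolute = (PurePosixPath(path).is_absolute()
                       or PureWindowsPath(path).is_absolute())
        if not is_absolute:
            raise ValueError("QPDF_LARGE_FILE_TEST_PATH must be an absolute path")
        env["QPDF_LARGE_FILE_TEST_PATH"] = path
    return env
```

The resulting dictionary could then be passed as the `env` argument of `subprocess.run` when invoking ctest from the build directory. The sketch does not check the free-space requirement.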

If you are packaging qpdf for a distribution and preparing a build that is run by an autobuilder, you may want to pass -DSHOW_FAILED_TEST_OUTPUT=1 to cmake and run ctest with the --verbose or --output-on-failure option. This way, if the test suite fails, test failure detail will be included in the build output. Otherwise, you will have to have access to the qtest.log file from the build to view test failures. The Debian packages for qpdf enable this option. More notes for packagers can be found in the manual.

Random Number Generation

By default, qpdf uses the crypto provider for generating random numbers. The rest of this applies only if you are using the native crypto provider.

If the native crypto provider is in use, then, when qpdf detects either the Windows cryptography API or the existence of /dev/urandom, /dev/arandom, or /dev/random, it uses them to generate cryptographically secure random numbers. If none of these conditions are true, the build will fail with an error. This behavior can be modified in several ways:

If you use the cmake option SKIP_OS_SECURE_RANDOM or define the SKIP_OS_SECURE_RANDOM preprocessor symbol, qpdf will not attempt to use Windows cryptography or the random device. You must either supply your own random data provider or allow use of insecure random numbers.
If you turn on the cmake option USE_INSECURE_RANDOM or define the USE_INSECURE_RANDOM preprocessor symbol, qpdf will try insecure random numbers if OS-provided secure random numbers are disabled. This is not a fallback. In order for insecure random numbers to be used, you must also disable OS secure random numbers since, otherwise, failure to find OS secure random numbers is a compile error. The insecure random number source is stdlib's random() or rand() calls. These random numbers are not cryptographically secure, but the qpdf library is fully functional using them. Using non-secure random numbers means that it's easier in some cases to guess encryption keys.
In all cases, you may supply your own random data provider. To do this, derive a class from qpdf/RandomDataProvider (since version 5.1.0) and call QUtil::setRandomDataProvider before you create any QPDF objects. If you supply your own random data provider, it will always be used even if support for one of the other random data providers is compiled in. If you wish to avoid any possibility of your build of qpdf using anything but a user-supplied random data provider, you can define SKIP_OS_SECURE_RANDOM and not USE_INSECURE_RANDOM. In this case, qpdf will throw a runtime error if any attempt is made to generate random numbers and no random data provider has been supplied.
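The interaction between these options can be easier to follow as a small decision function. The Python sketch below is only a model of the rules stated above for the native crypto provider; it is not qpdf code, and the names `select_random_source` and `BuildError` are invented for illustration.

```python
class BuildError(Exception):
    """Stands in for the compile-time failure described in the text."""


def select_random_source(*, custom_provider=False, skip_os_secure=False,
                         use_insecure=False, os_secure_available=True):
    """Model which random source would be used (native crypto provider only)."""
    # Compile time: unless OS secure random is skipped, it must be found.
    # USE_INSECURE_RANDOM is not a fallback for this case.
    if not skip_os_secure and not os_secure_available:
        raise BuildError("no OS secure random source found at build time")
    # Run time: a user-supplied provider is always preferred.
    if custom_provider:
        return "custom"
    if not skip_os_secure:
        return "os-secure"
    if use_insecure:
        return "insecure"
    raise RuntimeError("no random data provider has been supplied")
```

For example, `select_random_source(skip_os_secure=True)` raises `RuntimeError`, which corresponds to the configuration that guarantees only a user-supplied provider is ever used.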

Acknowledgments

The qpdf project has a JetBrains license through their Open Source Program. We are grateful for this program and have been enjoying the benefits of their high-quality products.

qpdf's Issues

Using --qdf on PDF from Inkscape can't get text

I have used the PDF at http://www.extractpdf.com/ and the text comes out perfectly. But when using qpdf --qdf in.pdf out.pdf I am not able to see the text streams.

May I please send the file to you for quick advice? I do not want to post it publicly.

Thank you.

Decrypting signed documents

Hi,

Couldn't find any info on this, but I'm trying to remove the protection on a PDF file that has been securely signed. There is no password, and it is a legitimate document, but Adobe won't allow me to combine it with another PDF when the security is there, so I want to remove it.

When I try to use qpdf, I receive this error: "(encryption dictionary, file position 9336): unsupported encryption filter". Is this a known limitation?

Assertion failure whilst checking (presumably malformed) linearization tables.

Hi,

I'm using qpdf --check-linearization as a cross-check to our own cpdf -print-linearization whilst making changes to our linearization code.

One (presumably broken) file, which I will supply in a moment, causes an assertion failure rather than a useful message:

feast:trunk john$ qpdf --check-linearization cpdf.pdf
Assertion failed: (shared_idx_to_obj.count(idx) > 0), function checkHPageOffset, file libqpdf/QPDF_linearization.cc, line 805.
Abort trap: 6

Right, now time to put my head back into Adobe's amusingly mercurial description of linearization :-)

Read Arguments from STDIN

When calling QPDF from an external process it would be useful to push arguments via standard input. This would help with automated merges that are called on a set of files with unpredictable length and unpredictable file names.

Syntax for merging entire PDF files

If I needed to merge the entire a1.pdf and a2.pdf into b.pdf, is the best way to do it with:

qpdf --empty --pages a1.pdf 1-z a2.pdf 1-z -- b.pdf

Is there a way it can be called without specifying 1-z for each file? It would be ideal if I could call it with something like:

qpdf --empty --pages a1.pdf a2.pdf -- b.pdf

which would then allow me to call it with wildcards like:

qpdf --empty --pages a*.pdf -- b.pdf

Thanks for your help.
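Until such a shorthand exists, a calling program can add the explicit `1-z` range itself. The following Python sketch builds the argument list in the form shown above; `build_merge_command` and `merge_matching` are hypothetical helper names for this example, and the sketch only constructs the command without running qpdf.

```python
import glob


def build_merge_command(inputs, output, qpdf="qpdf"):
    """Build a qpdf command that merges all pages of each input file."""
    if not inputs:
        raise ValueError("at least one input file is required")
    args = [qpdf, "--empty", "--pages"]
    for name in inputs:
        # Explicitly select every page of each file, as in the question above.
        args += [name, "1-z"]
    args += ["--", output]
    return args


def merge_matching(pattern, output):
    """Expand a wildcard in the caller and merge the matches in sorted order."""
    return build_merge_command(sorted(glob.glob(pattern)), output)
```

The returned list can be handed to `subprocess.run(..., check=True)`. Because the list is passed without a shell, unpredictable file names need no quoting. Note that the sort is lexicographic, so `a10.pdf` comes before `a2.pdf`.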

Is it possible to extract attachments from PDF file using QPDF?

Hi @jberkenbilt ,

Is it possible to extract attachments from PDF file using QPDF?
I could not find any documentation on this.

But #6 talks about "extract attachments tests failing...."

Can you guide me on this?

Thanks

Build errors on 32 bit linux with 5.1.2

I see, after a successful ./configure invocation, when running "make"

libtool: compile: unable to infer tagged configuration
libtool: compile: specify a tag with `--tag`

with the latest version

It seems to work on 64 bit fine.

I'm trying older versions now to see when this started to fail...

'--show-xref' under specific circumstances returns wrong info

I've stumbled across a small glitch when checking PDF files with qpdf --show-xref.

Here is a handwritten PDF file to demonstrate the problem:

%PDF-1.1

1 0 obj
<</Kids[2 0 R]/Count 1/Type/Pages>>
endobj

2 0 obj
<</Parent 1 0 R/Resources 3 0 R/MediaBox[0 0 595 842]/Contents[4 0 R]/Type/Page>>
endobj

3 0 obj
<</Font<</F1<</Type/Font/Subtype/Type1/BaseFont/Times-Italic>> >> >>
endobj

4 0 obj
<</Length  80>>
stream
   1    0    0    1    60    780    cm
BT
  /F1 48 Tf
  (Hello, TROOPERS!)Tj
ET
endstream
endobj


% here is a comment


5 0 obj
<</Pages 1 0 R/Type/Catalog>>
endobj

xref
0 6
0000000000 65535 f 
0000000010 00000 n 
0000000062 00000 n 
0000000169 00000 n 
0000000245 00000 n 
0000000374 00000 n 
trailer

<</Root 5 0 R/Size 6>>
startxref
4530
%%EOF

(Be careful if you copy and paste this example from this page: use the correct EOL convention and include the trailing blanks in the xref table, otherwise the file will become invalid for reasons other than the ones I constructed into it!)

This file has two wrong entries:

the startxref value should be 453 (not 4530).
the fifth object's offset should be 407 (not 374).

Running `qpdf --show-xref` against the above file

When running qpdf --show-xref against this file, this is the (expected) result:

WARNING: qpdf--show-xref-is-wrong.pdf: file is damaged
WARNING: qpdf--show-xref-is-wrong.pdf (file position 4430): xref not found
WARNING: qpdf--show-xref-is-wrong.pdf: Attempting to reconstruct cross-reference table
1/0: uncompressed; offset = 10
2/0: uncompressed; offset = 62
3/0: uncompressed; offset = 160
4/0: uncompressed; offset = 245
5/0: uncompressed; offset = 407
qpdf: operation succeeded with warnings; resulting file may have some problems

So qpdf...

...discovers and reports the wrong startxref value;
...determines and reports the correct byte offset (407) for object no. 5.

Now change the startxref value to the correct 453 one:

xref
0 6
0000000000 65535 f 
0000000010 00000 n 
0000000062 00000 n 
0000000169 00000 n 
0000000245 00000 n 
0000000374 00000 n 
trailer

<</Root 5 0 R/Size 6>>
startxref
453
%%EOF

Running `qpdf --show-xref` against the modified file (with correct `startxref`)

When running qpdf --show-xref against the modified file (correct startxref, incorrect fifth object's byte offset entry) this is the (unexpected) result:

1/0: uncompressed; offset = 10
2/0: uncompressed; offset = 62
3/0: uncompressed; offset = 169
4/0: uncompressed; offset = 245
5/0: uncompressed; offset = 374

So qpdf...

...no longer discovers the incorrect fifth object's byte offset entry and reports just what's in the existing xref table.

You can also put 373, 375, 404, 404 or 406 into the xref for object no. 5 and qpdf will not discover the wrong entry.

This is probably caused by added empty and comment lines in between objects no. 4 and 5.

If you put any value that is in the range of 376..404 into the xref for object no. 5 then qpdf will report the wrong entry:

WARNING: qpdf--show-xref-is-wrong.pdf: file is damaged
WARNING: qpdf--show-xref-is-wrong.pdf (object 5 0, file position 390): expected n n obj
WARNING: qpdf--show-xref-is-wrong.pdf: Attempting to reconstruct cross-reference table
1/0: uncompressed; offset = 10
2/0: uncompressed; offset = 62
3/0: uncompressed; offset = 160
4/0: uncompressed; offset = 245
5/0: uncompressed; offset = 407
qpdf: operation succeeded with warnings; resulting file may have some problems

WARNING: qpdf--show-xref-is-wrong.pdf (object 5 0, file position 409): expected endobj
operation for Dictionary object attempted on object of wrong type

(Note the two different types of error messages, depending on the actual wrong value you insert into the xref table.)

This is probably caused by the fact that when it checks the byte offsets 376..404 it's inside the comment line and the next byte doesn't look at all like the start of an object.

Also, I noticed that in many cases an off-by-one deviation from the correct value is not reported: whenever the byte at the given offset is an EOL character. It may be "tolerable", and it may be what all PDF readers out there tolerate -- but strictly speaking: is it correct?

For now I work around this glitch like this:

Whenever I want to let QPDF check, compute and report the complete xref table, I make sure to insert a wrong startxref entry into the file first (or comment out the line with the correct value).
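An independent cross-check is also possible without qpdf. The Python sketch below scans a file for `N G obj` headers and compares the offsets it finds with those claimed by a classic, single-section xref table. It is a simplified illustration and not qpdf's reconstruction logic: it assumes each object header starts at the beginning of a line, and it could be misled by stream data that happens to look like an object header. All function names are invented for this example.

```python
import re

# An object header such as "5 0 obj" at the start of a line.
OBJ_HEADER = re.compile(rb"^(\d+) (\d+) obj\b", re.MULTILINE)
# A classic xref entry: 10-digit offset, 5-digit generation, then n or f.
XREF_ENTRY = re.compile(rb"^(\d{10}) (\d{5}) ([nf])", re.MULTILINE)


def scan_object_offsets(data):
    """Map object number to the byte offset where its header really starts."""
    return {int(m.group(1)): m.start() for m in OBJ_HEADER.finditer(data)}


def read_xref_offsets(data):
    """Map object number to the offset claimed by the xref table (in-use entries)."""
    header = re.search(rb"^xref\r?\n(\d+) (\d+)", data, re.MULTILINE)
    if header is None:
        raise ValueError("no classic xref table found")
    first, count = int(header.group(1)), int(header.group(2))
    entries = list(XREF_ENTRY.finditer(data, header.end()))[:count]
    return {first + i: int(e.group(1))
            for i, e in enumerate(entries) if e.group(3) == b"n"}


def mismatched_offsets(data):
    """Return {object number: (claimed offset, scanned offset or None)}."""
    actual = scan_object_offsets(data)
    claimed = read_xref_offsets(data)
    return {num: (off, actual.get(num))
            for num, off in claimed.items() if off != actual.get(num)}
```

Calling `mismatched_offsets(open(path, "rb").read())` returns an empty dictionary when every in-use xref entry points exactly at an object header, so off-by-one entries that land on an EOL character are reported as well.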

qpdf SOMETIMES remains hanging in QPDFWriter::popPipelineStack()

hi,

release compiled qpdf SOMETIMES remains hanging in the
QPDFWriter::popPipelineStack(PointerHolder* bp) function's
while cycle (see below)

    delete this->pipeline_stack.back();
    this->pipeline_stack.pop_back();
    while (dynamic_cast<Pl_Count*>(this->pipeline_stack.back()) == 0)
    {
        Pipeline* p = this->pipeline_stack.back();
        this->pipeline_stack.pop_back();
        Pl_Buffer* buf = dynamic_cast<Pl_Buffer*>(p);
        if (bp && buf)
        {
            *bp = buf->getBuffer();
        }
        delete p;
    }
    this->pipeline = dynamic_cast<Pl_Count*>(this->pipeline_stack.back());

with the following call stack:

(gdb) bt
#0  0x60000000e10d5ec0:1 in QPDFWriter::popPipelineStack ()
    at ../../../tgs/t3/iqpdf/src/libqpdf/QPDFWriter.cc:909
#1  0x60000000e10e2050:0 in QPDFWriter::unparseObject ()
    at ../../../tgs/t3/iqpdf/src/libqpdf/QPDFWriter.cc:1520
#2  0x60000000e10dd130:0 in QPDFWriter::unparseObject ()
    at ../../../tgs/t3/iqpdf/src/libqpdf/QPDFWriter.cc:1192
#3  0x60000000e10d7740:0 in QPDFWriter::writeObject ()
    at ../../../tgs/t3/iqpdf/src/libqpdf/QPDFWriter.cc:1815
#4  0x60000000e10c9380:0 in QPDFWriter::writeStandard ()
    at ../../../tgs/t3/iqpdf/src/libqpdf/QPDFWriter.cc:3032
#5  0x60000000e10bab50:0 in QPDFWriter::write ()
    at ../../../tgs/t3/iqpdf/src/libqpdf/QPDFWriter.cc:2323
#6  0x60000000e1154d50:0 in ppdf::CPdfCodecQpdf_001::Write ()
    at ../../../src/pdfqpd/pdfcodecqpdf.cpp:699
#7  0x60000000e11d3a20:0 in ppdf::CPdfMergeXObject_001::CreateTempFile ()
    at ../../../src/pdfupd/pdfmergexobject.cpp:498

Any ideas?

thanks in advance

Linearization fails with "unknown token while reading object (PDF)"

Attempting to linearize the PDF file here by means of qpdf --linearize dixon.pdf output.pdf fails with the error (file position 3844619): unknown token while reading object (PDF). My qpdf version is 5.1.2.

I’ve encountered this same error with a number of PDFs (all ebooks) recently.

crash / stack overflow with malformed input pdf

Passing this pdf to qpdf will cause a crash:
https://crashes.fuzzing-project.org/qpdf-crash.pdf

Looking at the stack trace this seems to be an endless recursion causing a stack overflow.

Here's (part of) the stack trace when compiling qpdf with address sanitizer (latest git code):

==10615==ERROR: AddressSanitizer: stack-overflow on address 0x7ffdede32820 (pc 0x7f5ddac0dce7 bp 0x7ffdede33e50 sp 0x7ffdede32810 T0)
    #0 0x7f5ddac0dce6 in pcre_compile2 (/lib64/libpcre.so.1+0xace6)
    #1 0x78342b in PCRE::PCRE(char const*, int) /mnt/ram/qpdf/libqpdf/PCRE.cc:144:18
    #2 0x5ece64 in QPDFTokenizer::resolveLiteral() /mnt/ram/qpdf/libqpdf/QPDFTokenizer.cc:62:10
    #3 0x5f19be in QPDFTokenizer::presentCharacter(char) /mnt/ram/qpdf/libqpdf/QPDFTokenizer.cc:432:9
    #4 0x5f5091 in QPDFTokenizer::readToken(PointerHolder<InputSource>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /mnt/ram/qpdf/libqpdf/QPDFTokenizer.cc:519:6
    #5 0x5c461b in QPDFObjectHandle::parseInternal(PointerHolder<InputSource>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, QPDFTokenizer&, bool&, QPDFObjectHandle::StringDecrypter*, QPDF*, bool, bool, bool) /mnt/ram/qpdf/libqpdf/QPDFObjectHandle.cc:873:13
    #6 0x5c4c07 in QPDFObjectHandle::parseInternal(PointerHolder<InputSource>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, QPDFTokenizer&, bool&, QPDFObjectHandle::StringDecrypter*, QPDF*, bool, bool, bool) /mnt/ram/qpdf/libqpdf/QPDFObjectHandle.cc:939:15
    #7 0x5bcf0c in QPDFObjectHandle::parse(PointerHolder<InputSource>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, QPDFTokenizer&, bool&, QPDFObjectHandle::StringDecrypter*, QPDF*) /mnt/ram/qpdf/libqpdf/QPDFObjectHandle.cc:841:12
    #8 0x53b4d0 in QPDF::readObject(PointerHolder<InputSource>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int, bool) /mnt/ram/qpdf/libqpdf/QPDF.cc:1020:31
    #9 0x550b21 in QPDF::readObjectAtOffset(bool, long long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int, int&, int&) /mnt/ram/qpdf/libqpdf/QPDF.cc:1396:27
    #10 0x565da2 in QPDF::resolve(int, int) /mnt/ram/qpdf/libqpdf/QPDF.cc:1477:7
    #11 0x5a71e7 in QPDF::Resolver::resolve(QPDF*, int, int) /mnt/ram/qpdf/include/qpdf/QPDF.hh:520:13
    #12 0x5a71e7 in QPDFObjectHandle::dereference() /mnt/ram/qpdf/libqpdf/QPDFObjectHandle.cc:1520
    #13 0x5a88ca in QPDFObjectHandle::isInteger() /mnt/ram/qpdf/libqpdf/QPDFObjectHandle.cc:145:5
    #14 0x53d465 in QPDF::readObject(PointerHolder<InputSource>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int, bool) /mnt/ram/qpdf/libqpdf/QPDF.cc:1121:23
    #15 0x550b21 in QPDF::readObjectAtOffset(bool, long long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int, int&, int&) /mnt/ram/qpdf/libqpdf/QPDF.cc:1396:27
    #16 0x565da2 in QPDF::resolve(int, int) /mnt/ram/qpdf/libqpdf/QPDF.cc:1477:7
    #17 0x5a71e7 in QPDF::Resolver::resolve(QPDF*, int, int) /mnt/ram/qpdf/include/qpdf/QPDF.hh:520:13
    #18 0x5a71e7 in QPDFObjectHandle::dereference() /mnt/ram/qpdf/libqpdf/QPDFObjectHandle.cc:1520
    #19 0x5a88ca in QPDFObjectHandle::isInteger() /mnt/ram/qpdf/libqpdf/QPDFObjectHandle.cc:145:5
    #20 0x53d465 in QPDF::readObject(PointerHolder<InputSource>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int, bool) /mnt/ram/qpdf/libqpdf/QPDF.cc:1121:23

qpdf could allow strings to be treated as names

Some broken pdf files have things like /Type (Page) instead of /Type /Page. It would be a small enhancement to allow qpdf to work with this. Perhaps qpdf could have a general mode in which it's more relaxed about "casting" from one type to another. See also https://bugs.launchpad.net/ubuntu/+source/qpdf/+bug/1397413

Building doesn't work on OS X

Starting with ./configure just reports there's no such file. Running autoconf then ./configure gives the error message

./configure: line 3867: syntax error near unexpected token `win32-dll'
./configure: line 3867: `LT_INIT(win32-dll)'

Page Rotate - feature request

Feel free to slap this down with vigor :)

I've been thoroughly searching the space for server-side PDF solutions, and qpdf is really solid. I think adding page rotation would be a logical and valuable feature addition. Trim, split, merge, re-order, and rotate are the basic things many users/programs need to do with a PDF. They represent modifications of the page containers/document container without modifying the content. Filling in forms, annotations, etc. is definitely a different level of interaction.

I think with page rotation added (and possibly form flattening) qpdf would be one of the best options available for getting a PDF into the desired order and optimized state for distribution.

qpdf-5.2.0 fails md5.test on upgrade from previous versions

build.log: http://dpaste.com/1ZECJWF.txt
qtest.log: http://dpaste.com/0SBCPNB.txt

The tests pick up the installed version. If I uninstall qpdf (5.1.3) first and then try to compile/run the tests/install qpdf-5.2.0 it works, which is a problem for source-based distributions like e.g. Exherbo or Gentoo.

'--qdf' output scraps previous incremental updates

As you know, PDF includes a feature called "incremental updates".

To see a small example file, download pdf-puzzle (and maybe read the accompanying article by Didier Stevens: 'solving-a-little-pdf-puzzle').

If one simply deletes all lines after the first EOF in the PDF, you've automatically restored the first version of the file (before the incremental update(s) happened).
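The truncation described above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions: it treats the first `%%EOF` byte sequence as the end of the first revision, which would be wrong if that sequence occurred inside stream data, and a linearized file may contain more than one `%%EOF` without having been incrementally updated. The function names are invented for this example.

```python
EOF_MARKER = b"%%EOF"


def first_revision(data):
    """Return the bytes up to and including the first %%EOF marker."""
    pos = data.find(EOF_MARKER)
    if pos < 0:
        raise ValueError("no %%EOF marker found")
    end = pos + len(EOF_MARKER)
    # Keep the end-of-line sequence that terminates the marker, if present.
    if data[end:end + 2] == b"\r\n":
        end += 2
    elif data[end:end + 1] in (b"\r", b"\n"):
        end += 1
    return data[:end]


def count_eof_markers(data):
    """A count above one hints at incremental updates (or linearization)."""
    return data.count(EOF_MARKER)
```

Reading a file with `open(path, "rb").read()` and writing `first_revision(...)` back to a new file yields the candidate first version, which can then be inspected separately.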

If you run qpdf --qdf ... on an incrementally updated PDF, the resulting file will no longer include the previous versions.

So here are my two wishes for future releases of QPDF:

It would be nice if you could include support for --qdf with PDF files that were incrementally updated.
As long as you can't implement that feature (or in case you are not willing to, for whatever reason), please document this fact.

Please add a clear note/statement to the QPDF manual saying something to this effect: 'The QDF mode does not fully support PDFs which include incremental updates. The output will include the most recent update only, and not the previous file versions.'

In any case, thanks a lot for this beautiful tool, Jay. I very much appreciate it. I love it, and I use it almost daily, mainly to create QDF modes of PDFs I have to investigate and to debug.

stdin as input file

Thanks a lot for qpdf, it's a wonderful tool.

I'm trying to do things like

qpdf --qdf --object-streams=disable x.pdf - | qpdf - - | gzip -c > x.clean.pdf.gz

instead, I have to do

qpdf --qdf --object-streams=disable x.pdf _tmp.pdf
qpdf _tmp.pdf - | gzip -c > x.clean.pdf.gz
rm _tmp.pdf

Please consider letting us use "-" as "infilename".

the not so useful 'overflow reading bit stream' message

Hi! I fell into a trap that qpdf does not handle very well. When writing my hint tables into the hint stream with my own PDF generator, I wrote the bit stream in little-endian order instead of big-endian as the reference specifies. When I checked my generated PDF with qpdf --check-linearization, I got an error message saying 'overflow reading bit stream'. That was not very useful to me, because the error was in the first hint table's header, not at the end of the stream.

Looking into when else this can happen, I found that the same error can appear when the hint stream contains even a single-bit error somewhere in a hint table. So what I am suggesting is more detailed checking of the hint table fields while the hint stream is being processed.

Anyway, qpdf is a nice project. Well done, guys!
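To show why a byte-order mistake in a header field surfaces much later as an overflow, here is a minimal Python sketch of a reader that consumes bits with the high-order bit first. It is not qpdf's bit stream code. The class name `BitReader` is invented for this example, and the error message shows one way of reporting where the read failed.

```python
class BitReader:
    """Read unsigned integers from bytes, most significant bit first."""

    def __init__(self, data):
        self.data = data
        self.bit_pos = 0  # number of bits consumed so far

    def read(self, nbits):
        remaining = len(self.data) * 8 - self.bit_pos
        if nbits > remaining:
            raise EOFError(
                f"need {nbits} bits at bit offset {self.bit_pos}, "
                f"but only {remaining} remain"
            )
        value = 0
        for _ in range(nbits):
            byte = self.data[self.bit_pos // 8]
            bit = (byte >> (7 - self.bit_pos % 8)) & 1
            value = (value << 1) | bit
            self.bit_pos += 1
        return value
```

If a 32-bit count of 258 is written little-endian as the bytes `02 01 00 00`, this reader returns 33619968 instead. A parser that then tries to read that many entries runs off the end of the stream, far from the field that was actually wrong.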

fix-qdf assumes fixed size for /W fields in object stream

I've used qpdf extensively and finally encountered a problem.
The Perl script fix-qdf, which rewrites offsets after edits or insertions, assumes that object numbers in an object stream will only occupy a single byte.
An excerpt from qpdf 5.1.2 contains print " /W [ 1 $xref_f1_nbytes 1 ]\n";
Unfortunately, the environment I work in doesn't let me provide a full file; however, I'm working with PDF files generated by MS Word. Here's an excerpt of a file showing content where this is not true:
1431 0 obj
<<
/Type /XRef
/Length 8592
/W [ 1 3 2 ]
/Info 2 0 R
/Root 1 0 R
/Size 1432

There were 257 items in this particular object stream. I'm not fluent in Perl and haven't been able to generate a fix for the script.
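For illustration, the field widths can be derived from the largest value in each column instead of being hard-coded. The Python sketch below is not a patch for the Perl fix-qdf script. It assumes the usual layout of a cross-reference stream entry (a type field followed by two numeric fields, with multi-byte fields stored high-order byte first), and the function names are invented for this example.

```python
def bytes_needed(value):
    """Smallest number of bytes (at least one) that can hold value."""
    return max(1, (value.bit_length() + 7) // 8)


def xref_stream_widths(entries):
    """Compute /W for (type, field2, field3) entries from their maxima."""
    entries = list(entries)
    w2 = bytes_needed(max(e[1] for e in entries))
    w3 = bytes_needed(max(e[2] for e in entries))
    return [1, w2, w3]  # the type field always fits in one byte here


def encode_entry(entry, widths):
    """Encode one entry using big-endian fields of the given widths."""
    return b"".join(v.to_bytes(w, "big") for v, w in zip(entry, widths))
```

With 257 items in an object stream, the largest index is 256, so `bytes_needed(256)` is 2, which matches the third width in the `/W [ 1 3 2 ]` excerpt above.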

qpdf doesn't like /DecodeParms to contain streams

See https://bugs.linuxfoundation.org/show_bug.cgi?id=1197, attachment https://bugs.linuxfoundation.org/attachment.cgi?id=447. Object 296 is /DecodeParms for object 253. 296 is a dictionary whose single key has a value that is a stream. QPDFWriter is trying to make all /DecodeParms keys direct, which won't work in this case. Figure out why qpdf wants /DecodeParms to be direct and fix the logic so it is no longer required.

Tests: qpdf 2064 (compare images) FAILED

Trying to build qpdf 5.0.1 with tests enabled

qpdf/build/qtest.log-******************************************
qpdf/build/qtest.log-STARTING TESTS on Thu Oct 24 00:44:41 2013
qpdf/build/qtest.log-******************************************
qpdf/build/qtest.log-
qpdf/build/qtest.log-Test coverage active in scope qpdf
qpdf/build/qtest.log-
qpdf/build/qtest.log-Running ../qtest/qpdf.test
qpdf/build/qtest.log:qpdf test 2064 (compare images) FAILED
qpdf/build/qtest.log-cwd: /var/tmp/paludis/build/app-text-qpdf-5.0.1/work/C/64/qpdf-5.0.1/qpdf/qtest/qpdf
qpdf/build/qtest.log-command: tiffcmp -t tif1/a.tif tif2/a.tif
qpdf/build/qtest.log- at qpdf.test line 2205.
qpdf/build/qtest.log-   main::compare_pdfs('inline-images-cr.pdf', 'a.pdf') called at qpdf.test line 1965
qpdf/build/qtest.log-   Expected status: 0
qpdf/build/qtest.log-   Actual   status: 1
qpdf/build/qtest.log-
qpdf/build/qtest.log-Test coverage results:
qpdf/build/qtest.log-
qpdf/build/qtest.log-Coverage analysis: PASSED
qpdf/build/qtest.log-
qpdf/build/qtest.log-TESTS COMPLETE.  Summary:
qpdf/build/qtest.log-
qpdf/build/qtest.log-Total tests: 2081
qpdf/build/qtest.log-Passes: 2080
qpdf/build/qtest.log-Failures: 1
qpdf/build/qtest.log-Unexpected Passes: 0
qpdf/build/qtest.log-Expected Failures: 0
qpdf/build/qtest.log-Missing Tests: 0
qpdf/build/qtest.log-Extra Tests: 0

Complete build logs can be found here:

If you need anything else from the build, I will be happy to provide it :)

libqpdf removes DecodeParms

I'm currently writing a PNG to PDF converter program with libqpdf.
I found that the Flate compression predictor is quite useful for reducing file size.
However, whenever I try to set parameters for the Flate predictor with a DecodeParms
entry in the image dictionary, libqpdf removes it.

Is there any way to set DecodeParms?

QPDF writing a new file: "endstream" keyword is not on a line of its own

When QPDF fixes a PDF with a damaged xref table, it writes the "endstream" keyword without a preceding newline and places it directly at the end of the stream data.

When checking the output with Acrobat Preflight, this triggers lots of error messages:

Syntax problem: Indirect object “endobj” keyword not preceded by an EOL marker
Syntax problem: Stream dictionary improperly formatted
Syntax problem: Stream dictionary has improper length entry
Indirect object “endobj” keyword not followed by an EOL marker
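
For reference, the PDF specification expects an end-of-line marker between the stream data and the endstream keyword, and that marker is not counted in /Length. The sketch below shows one way a writer could lay out a stream object so that each keyword sits on its own line. `write_stream_object` is a hypothetical helper for illustration only and is not how QPDFWriter is implemented.

```cpp
#include <string>

// Serialize one stream object. `dict_body` is the dictionary content
// without the /Length key; `data` is the already-encoded stream data.
inline std::string write_stream_object(int objid, int gen,
                                       const std::string& dict_body,
                                       const std::string& data)
{
    std::string out;
    out += std::to_string(objid) + " " + std::to_string(gen) + " obj\n";
    out += "<< " + dict_body + " /Length " + std::to_string(data.size()) + " >>\n";
    out += "stream\n";
    out += data;
    out += "\nendstream\n"; // EOL before the keyword; not part of /Length
    out += "endobj\n";
    return out;
}
```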

qpdf hanging on certain pdf file

Jay - thank you again for your help with the issues I have reported over the past few months. I encountered an issue today on a certain pdf file where qpdf just hangs when trying to process it. Again, I'm not able to share the pdf, but I can provide any info you need to diagnose it:

If I run this command:

qpdf input.pdf output.pdf

the command runs successfully in 1 second, but if I run this command:

qpdf --empty --pages input.pdf 1-z -- output.pdf

qpdf just hangs with no output. I let it run over 10 minutes.

Is there any sort of debugging that I can do to narrow down what is causing it?

Thanks again.

Ryan

Build fails when using libc++ with clang

The qpdf build fails when trying to build using clang's libc++ C++ stdlib. I tested using Apple's clang 4.2, build 425, as shipped with Xcode 4.6.3 on Mac OS X 10.7; I've been told it happens in clang 5.0 build 500 as well.

clang fails with three errors in QPDFExc.cc:

/bin/bash ./libtool --quiet --mode=compile clang++ -stdlib=libc++ -Wold-style-cast -Wall -MD -MF libqpdf/build/QPDFExc.tdep -MP -Iinclude -Ilibqpdf  -c libqpdf/QPDFExc.cc -o libqpdf/build/QPDFExc.o; sed -e 's/\.o:/.lo:/' < libqpdf/build/QPDFExc.tdep > libqpdf/build/QPDFExc.dep
In file included from libqpdf/QPDFExc.cc:1:
include/qpdf/QPDFExc.hh:57:17: error: implicit instantiation of undefined template
      'std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >'
    std::string filename;
                ^
/usr/bin/../lib/c++/v1/iosfwd:187:27: note: template is declared here
    class _LIBCPP_VISIBLE basic_string;
                          ^
In file included from libqpdf/QPDFExc.cc:1:
include/qpdf/QPDFExc.hh:58:17: error: implicit instantiation of undefined template
      'std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >'
    std::string object;
                ^
/usr/bin/../lib/c++/v1/iosfwd:187:27: note: template is declared here
    class _LIBCPP_VISIBLE basic_string;
                          ^
In file included from libqpdf/QPDFExc.cc:1:
include/qpdf/QPDFExc.hh:60:17: error: implicit instantiation of undefined template
      'std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >'
    std::string message;
                ^
/usr/bin/../lib/c++/v1/iosfwd:187:27: note: template is declared here
    class _LIBCPP_VISIBLE basic_string;
                          ^
3 errors generated.

Full logs available at: https://gist.github.com/mistydemeo/1ec73e812c6300cdaa81
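
The diagnostics say that std::string is only forward-declared (through iosfwd) at the point where QPDFExc.hh declares its members, which usually means the header relies on some other header to pull in the full definition. A likely fix is to include <string> directly. The sketch below shows the pattern with a hypothetical `ExampleExc` class; it is not the real QPDFExc.

```cpp
#include <stdexcept>
#include <string> // full definition of std::string; other headers may only forward-declare it

// Hypothetical stand-in for an exception class that stores std::string members.
class ExampleExc : public std::runtime_error
{
  public:
    ExampleExc(const std::string& file, const std::string& obj,
               const std::string& msg)
        : std::runtime_error(file + " (" + obj + "): " + msg),
          filename(file),
          object(obj),
          message(msg)
    {
    }

    const std::string& getFilename() const { return filename; }
    const std::string& getObject() const { return object; }
    const std::string& getMessage() const { return message; }

  private:
    std::string filename;
    std::string object;
    std::string message;
};
```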

'configure' step fails on OSX because pcre.h isn't found

I'm trying to build QPDF from Git on OSX (Mavericks).

After completing './autogen.sh' successfully, './configure' ends with this error:

configure: WARNING: unable to find required header pcre.h

configure: error: some required prerequisites were not found

I have MacPorts installed:

$>  port installed pcre
   The following ports are currently installed:
   pcre @8.33_0 (active)

$>  ls -lh /opt/local/include/pcre.h
  -rw-r--r--  1 root  admin  30K Oct 28 08:02 /opt/local/include/pcre.h

I don't understand what the problem is...

libqpdf and PDF version string

This is a minor issue but I'll report it.

I'm currently writing a small program with libqpdf.

I found that if I try to set a PDF version string that is not in
the form of "number.number", then the process dies. libqpdf accepts
"1000.3" but not "2".

I compiled with g++ (GCC) version 4.6.2-1 on MinGW32/Windows 8.1.
Does this also happen in other environments?
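
Until the library handles this more gracefully, a caller could validate the string before handing it over. A minimal sketch, assuming the accepted form is exactly digits, a dot, then digits, as the examples above suggest; `is_major_minor_version` is a hypothetical helper and not part of libqpdf.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True if `version` has the form "<digits>.<digits>", e.g. "1.7" or "1000.3".
inline bool is_major_minor_version(const std::string& version)
{
    std::size_t i = 0;
    std::size_t major_digits = 0;
    std::size_t minor_digits = 0;
    while (i < version.size() &&
           std::isdigit(static_cast<unsigned char>(version[i]))) {
        ++i;
        ++major_digits;
    }
    if (major_digits == 0 || i == version.size() || version[i] != '.') {
        return false;
    }
    ++i; // skip the dot
    while (i < version.size() &&
           std::isdigit(static_cast<unsigned char>(version[i]))) {
        ++i;
        ++minor_digits;
    }
    return minor_digits > 0 && i == version.size();
}
```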

QPDF private methods

Hi,
I would like to develop a Python package to use libqpdf features from Python. One of the tasks (attaching QPDF to a Python stream) would require subclassing InputSource (which is pretty easy) and subclassing QPDF to provide a new method like "processPythonStream", which would in turn need to access QPDF methods. Since, for instance, the parse method is private, this is not possible, so I was wondering if you could move parse to protected.

Files with a blank PDF version

Jay - thank you for your help with the few issues I have reported. I really appreciate it. I'm running into another issue that I was hoping you could help with. Again, I'm not able to share these PDFs, but I can provide as much info as you need to diagnose it:

I have some PDFs where the PDF version is showing up as 0.0 in pdfinfo such as:

Creator: Toolkit http://www.activepdf.com
Producer: Toolkit http://www.activepdf.com
CreationDate: Tue Aug 16 07:52:12 2011
ModDate: Tue Aug 16 07:52:12 2011
Tagged: no
Pages: 1
Encrypted: no
Page size: 799.347 x 609.882 pts
File size: 25588 bytes
Optimized: no
PDF version: 0.0

When I try to process these files with qpdf, I receive an error:

input.pdf: not a PDF file

When I look at the files in a text editor, the first line of the file is one of:

%PDF-

%PDF-0

%PDF-0.0

I am able to view the files in Acrobat, Foxit, and SumatraPDF. I imagine that they are just falling back to a certain PDF version and trying to display the file with that, but not absolutely sure.

Is there a way that qpdf can support these files too? Please let me know if I can provide more info.

Thanks.

Ryan
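
One way a reader could tolerate such headers is to treat the "%PDF-" marker and the version text as two separate things, so a missing or odd version does not by itself mean "not a PDF file". The sketch below only illustrates that idea; `parse_pdf_header` and `HeaderInfo` are hypothetical names, and this is not a description of how qpdf or the viewers mentioned above behave.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

struct HeaderInfo
{
    bool has_marker;     // true if the line starts with "%PDF-"
    std::string version; // digits and dots after the marker, possibly empty
};

// Split the first line of a file into the "%PDF-" marker and whatever
// version text follows it. An empty version is left for the caller to handle.
inline HeaderInfo parse_pdf_header(const std::string& first_line)
{
    const std::string marker = "%PDF-";
    HeaderInfo info{false, ""};
    if (first_line.compare(0, marker.size(), marker) != 0) {
        return info;
    }
    info.has_marker = true;
    std::size_t i = marker.size();
    while (i < first_line.size() &&
           (std::isdigit(static_cast<unsigned char>(first_line[i])) ||
            first_line[i] == '.')) {
        info.version += first_line[i];
        ++i;
    }
    return info;
}
```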

Acquire Cryptographic Context fails on fresh windows install

The call to CryptAcquireContext in SecureRandomDataProvider fails when using qpdf 5.1.2 with the --encrypt option on a fresh Windows install. This probably doesn't present itself once some user keys have been created in C:\Users\<user>\AppData\Roaming\Microsoft\Crypto\RSA.

This fixed it for me:

if (!CryptAcquireContext(&crypt_prov, 
                         "Container", 
                         NULL, 
                         PROV_RSA_FULL, 
                         0))
{
    if (GetLastError() == NTE_BAD_KEYSET)
    {
        if (!CryptAcquireContext(&crypt_prov, 
                                 "Container", 
                                 NULL, 
                                 PROV_RSA_FULL, 
                                 CRYPT_NEWKEYSET))
        {
            throw std::runtime_error("unable to acquire crypt context with new keyset");
        }
    } else {
        throw std::runtime_error("unable to acquire crypt context");
    }
}

[Feature request] Don't just warn about wrong stream length -- report the correct value too

When checking a PDF with qpdf --check a.pdf, QPDF rather reliably warns about cases where the stream length recorded in the PDF is incorrect. Example:

qpdf --check 4pix.pdf
 checking 4pix.pdf
 PDF Version: 1.3
 File is not encrypted
 File is not linearized
 WARNING: 4pix.pdf (object 6 0, file position 1376): attempting to recover stream length

It even says "attempting to recover stream length". However, it does not tell the user which stream length it deems to be the correct one.

It would be nice if QPDF reported the correct stream length here.
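
To illustrate what such a reported value could be, here is a naive sketch that derives a length from the position of the next endstream keyword. It assumes the whole file is in memory and that the stream data does not itself contain the text "endstream". `recovered_stream_length` is a hypothetical helper and makes no claim about qpdf's actual recovery logic.

```cpp
#include <cstddef>
#include <string>

// Given the file contents and the offset of the first data byte of a stream,
// return the data length implied by the next "endstream" keyword, or -1 if
// there is none. One EOL (\r\n, \n or \r) directly before the keyword is
// treated as a separator and not counted.
inline long recovered_stream_length(const std::string& buf, std::size_t data_start)
{
    std::size_t end = buf.find("endstream", data_start);
    if (end == std::string::npos) {
        return -1;
    }
    if (end > data_start && buf[end - 1] == '\n') {
        --end;
        if (end > data_start && buf[end - 1] == '\r') {
            --end;
        }
    } else if (end > data_start && buf[end - 1] == '\r') {
        --end;
    }
    return static_cast<long>(end - data_start);
}
```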

Lossless JPEG stream optimizations (optimizing the Huffman tables)

Usually JPEGs can be optimized losslessly by optimizing the Huffman tables using e.g. jpegtran or Jpegoptim.

linearize files that contain multiple objects with the same object ID and different generations

"qpdf.exe --linearize http://static.longnow.org/media/ClockPlans01.pdf out.pdf" error message:
"QPDF cannot currently linearize files that contain multiple objects with the same object ID and different generations. If you see this error message, please file a bug report and attach the file if possible...."

Form still not working with pdftk when file decrypted

PDF file: http://www.immi.gov.au/allforms/pdf/1066.pdf
Part C, question 53. Title(mr/mrs/miss/ms)

Only this kind of button is not working; the other parts seem to work.

Feature Request: join pages while maintaining layers

Layers currently get lost when joining pages from different files. Would be nice to be able to join pages, with different layers in each, into a new pdf that contains a super-set of all the layers from the input pages.

Linearize removes embedded XMP data

Version: 5.1.2

Steps that will reproduce the problem?

Create a PDF file with embedded XMP information, e.g. by using the LaTeX package xmpincl
Linearize the PDF
Try to extract the XMP information again

What is the expected result?
Seeing the XMP information.

What happens instead?
They are gone.

Possible workaround:
Do not linearize.

Any additional information:
None.

page length mismatch because of unregular object numbering vs. lengthNextN

Hi, I am trying to generate a linearized PDF
and am using qpdf to check whether my hint tables are correct.
qpdf reports 'page length mismatch' errors
because my object numbers are not ordered.
This is because lengthNextN is used
when calculating the length of the objects.
The PDF reference says nothing about
the order in which the objects must appear,
only that the object numbers must be unique.
I am sure I calculate the lengths as required,
but qpdf's error messages suggest that I
change the whole object numbering,
which is not under my control
(I am working on an old generator
written by some old PDF gurus).

Split an input into multiple output files

I may be missing the obvious, but is it possible to use qpdf to split a single PDF input into multiple PDF outputs, one per page range?

The documentation on page selection refers to "splitting," but from what I can tell, there is always one output document. Every other PDF tool I've looked at (e.g. PDFtk, Poppler, Ghostscript, PDFTron, PDFLib) uses "splitting" to refer to breaking a file into multiple outputs, but the manual seems to refer to "splitting" as drawing from multiple possible inputs into one output.

I would like to know if there is a way to do this currently with qpdf. If so, can you give examples in the manual? If not, I propose it as a feature.
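
As a stopgap, one output per page range can be produced by running qpdf once per range. The sketch below only builds the command lines, using the "qpdf --empty --pages IN RANGE -- OUT" form that appears elsewhere on this page. `split_commands`, `PageRange` and the output naming scheme are assumptions made for illustration.

```cpp
#include <string>
#include <vector>

struct PageRange
{
    int first;
    int last;
};

// Build one qpdf command line per page range. File names are used verbatim,
// so this sketch assumes they need no shell quoting.
inline std::vector<std::string> split_commands(const std::string& input,
                                               const std::string& out_prefix,
                                               const std::vector<PageRange>& ranges)
{
    std::vector<std::string> cmds;
    for (const PageRange& r : ranges) {
        const std::string range =
            std::to_string(r.first) + "-" + std::to_string(r.last);
        cmds.push_back("qpdf --empty --pages " + input + " " + range +
                       " -- " + out_prefix + range + ".pdf");
    }
    return cmds;
}
```

Each resulting string could then be passed to the shell, for example with std::system, one range at a time.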

qpdf linearization of PDF file with pages merged using iText's PDFStamper results in missing pages

I use iText's PDFStamper to add the contents of B.pdf to A.pdf*. I pass the merged file, say C.pdf, to qpdf for linearization. When the number of pages in A.pdf is < 10, linearization succeeds and the number of pages in the linearized file is the same as in C.pdf.
If the number of pages in A.pdf is > 10, I can see that the number of pages in C.pdf is the sum of the pages in A.pdf and B.pdf, but upon linearization only the pages from A.pdf are retained in the output file, and the pages added from B.pdf are missing.

The qpdf version we use right now is 4.0.1.
* A.pdf is generated as a PDF file from A.docx

extract attachments tests failing on PowerPC

6 extract attachments tests of qpdf are failing on both ppc and ppc64.
Running ../qtest/qpdf.test
qpdf test 1148 (extract attachments) FAILED
cwd: /home/abuild/rpmbuild/BUILD/qpdf-4.0.0/qpdf/qtest/qpdf
command: test_driver 35 a.pdf
expected output in attachments.out
at qpdf.test line 1740.
--> BEGIN EXPECTED OUTPUT <--
attachment1.txt:
This is the first attachment.
--END--
attachment2.png:
.PNG........IHDR...1 (2620 bytes)--END--
test 35 done
--> END EXPECTED OUTPUT <--
--> BEGIN ACTUAL OUTPUT <--
attachment1.txt:
This is the first attachment.
--END--
attachment2.png:
[raw PNG bytes printed unescaped, from the PNG signature through IEND; binary data omitted]--END--
test 35 done
--> END DIFFERENCES <--

Merge fails to insert "empty" pdf

The following fails due to some "vector::_M_range_check":
qpdf --empty test.pdf
qpdf --empty out.pdf --pages test.pdf 1 --

And if the range parameter is optional, why does the following have "insufficient arguments":
qpdf --empty out.pdf --pages test.pdf --

By the way, the following is valid (range for second pdf is given):
qpdf --empty out.pdf --pages test.pdf test2.pdf 1 --

But it fails due to the same check procedure if test.pdf is empty.

qpdf --check and exit codes

I am running qpdf --check against a pdf and getting the following output:

checking /tmp/test.pdf
PDF Version: 1.2
File is not encrypted
File is linearized
WARNING: incorrect object count in outline hint table

but the exit code is 2. I see the documentation says the exit code will be 2 if there are errors and 3 if there are warnings.

Should the above be considered a warning with an exit code of 3? I am able to process the file with qpdf, so it seems to be a valid PDF.

Thanks for your help.

Ryan
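
For scripts that wrap qpdf, the documented values mentioned above can be turned into a small classification step. In this sketch the meanings of 2 and 3 come from the documentation as quoted in this report, 0 is assumed to mean success, and `classify_qpdf_exit` is a hypothetical helper.

```cpp
enum class CheckOutcome { Success, Errors, Warnings, Other };

// Map a qpdf exit status to a coarse outcome.
inline CheckOutcome classify_qpdf_exit(int status)
{
    switch (status) {
    case 0:
        return CheckOutcome::Success;
    case 2:
        return CheckOutcome::Errors;
    case 3:
        return CheckOutcome::Warnings;
    default:
        return CheckOutcome::Other;
    }
}
```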

Installation fails on Solaris 10 (install(1) incompatibility)

$ uname -a 
SunOS testzone 5.10 Generic_147440-19 sun4v sparc SUNW,SPARC-Enterprise-T5120
$ mkdir $PWD/usr/local
$ ./configure --prefix=$PWD/usr/local && make
...
$ make install 
./mkinstalldirs /home/martin/qpdf-4.0.1/usr/local/lib/pkgconfig
mkdir /home/martin/qpdf-4.0.1/usr/local/lib
mkdir /home/martin/qpdf-4.0.1/usr/local/lib/pkgconfig
./mkinstalldirs /home/martin/qpdf-4.0.1/usr/local/bin
mkdir /home/martin/qpdf-4.0.1/usr/local/bin
./mkinstalldirs /home/martin/qpdf-4.0.1/usr/local/include/qpdf
mkdir /home/martin/qpdf-4.0.1/usr/local/include
mkdir /home/martin/qpdf-4.0.1/usr/local/include/qpdf
./mkinstalldirs /home/martin/qpdf-4.0.1/usr/local/share/doc/qpdf
mkdir /home/martin/qpdf-4.0.1/usr/local/share
mkdir /home/martin/qpdf-4.0.1/usr/local/share/doc
mkdir /home/martin/qpdf-4.0.1/usr/local/share/doc/qpdf
./mkinstalldirs /home/martin/qpdf-4.0.1/usr/local/share/man/man1
mkdir /home/martin/qpdf-4.0.1/usr/local/share/man
mkdir /home/martin/qpdf-4.0.1/usr/local/share/man/man1
/bin/bash ./libtool --mode=install install -c \
        libqpdf/build/libqpdf.la \
        /home/martin/qpdf-4.0.1/usr/local/lib/libqpdf.la
libtool: install: install -c libqpdf/build/.libs/libqpdf.so.10.0.1 /home/martin/qpdf-4.0.1/usr/local/lib/libqpdf.so.10.0.1
cp: cannot access /home/martin/qpdf-4.0.1/usr/local/lib/libqpdf.so.10.0.1
install: cp /home/martin/qpdf-4.0.1/usr/local/lib/libqpdf.so.10.0.1 libqpdf/build/.libs/libqpdf.so.10.0.1/libqpdf.so.10.0.1 failed 
make: *** [install] Error 2
$

This appears to be due to differences in behavior between Solaris and GNU install(1) (in particular the -c flag). GNU man page:

   -c     (ignored)

Solaris:

 -c dira         Install file in the directory  specified  by
                 dira,  if  file does not yet exist. If it is
                 found, install issues a message saying  that
                 the  file  already exists, and exits without
                 overwriting it.

Workaround: compile/install GNU coreutils, use /usr/local/bin/install.

the decrypted PDF is not correct

Please try to decrypt this PDF:
http://www.japan.ntu.edu.tw/pdf/common_kanji_2007_0801.pdf

[Feature request] Report correct `startxref` value when doing `-show-xref`

I'd like to see a small feature added to QPDF:

When running --show-xref, please let it report the (correct) value for the startxref entry.

Currently, if the startxref value is wrong, qpdf reports this:

(file position 155987): xref not found
WARNING: demo.pdf: Attempting to reconstruct cross-reference table
1/0: uncompressed; offset = 10
2/0: uncompressed; offset = 62
[....]
403/0: uncompressed; offset = 152245
404/0: uncompressed; offset = 155407

Couldn't qpdf, when it is computing all the objects' byte offsets, also report or suggest a reasonable value for the startxref entry? (Yes, I'm aware of the fact that a damaged/buggy PDF may not contain any xref at all...)
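
A rough idea of how such a suggestion could be computed for files with a classic cross-reference table: look for the last "xref" keyword that starts a line, which skips the "xref" inside "startxref". This sketch ignores cross-reference streams entirely, and `find_xref_table_offset` is a hypothetical helper rather than anything qpdf provides.

```cpp
#include <cstddef>
#include <string>

// Return the byte offset of the last "xref" keyword that begins a line,
// or -1 if there is none.
inline long find_xref_table_offset(const std::string& buf)
{
    std::size_t pos = buf.rfind("xref");
    while (pos != std::string::npos) {
        const bool at_line_start =
            pos == 0 || buf[pos - 1] == '\n' || buf[pos - 1] == '\r';
        if (at_line_start) {
            return static_cast<long>(pos);
        }
        if (pos == 0) {
            break;
        }
        pos = buf.rfind("xref", pos - 1); // keep searching further back
    }
    return -1;
}
```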

Decrypt to same file produces empty file

When I decrypt a PDF file like so:
qpdf -decrypt test.pdf test.pdf
I get this error in the console:
test.pdf (object 291 0, file position 239): EOF while reading token
In my case at least. The numbers may differ from document to document. More importantly, the test.pdf is now 0 bytes.

I would expect qpdf to just write the decrypted PDF to the original file. This is preferable especially when creating a registry entry for a simple "right-click -> decrypt" command in Windows (or any other OS really).

The workaround is to "invent" a temporary file and optionally overwrite the original file with that, after qpdf has finished. Which is fiddly.

qpdf producing damaged PDF file and core dumping at linearization

I have a PDF document generated using pdflatex, and I used qpdf to linearize it. The initial document reads and checks fine, while the linearized document has the first pages damaged and produces an error at checking:

(object 269 0, file position 307): expected n n obj

It's a Beamer presentation including a lot of screengrabs. I thought I had found the frame responsible (commenting it out makes the problem disappear), but the frame looks fine, and even replacing it with a copy of the next frame generates the same problem. And there is nothing obviously suspicious before it.

Displaying the linearized document using evince gives me a lot of empty pages at the start and errors like this:

Syntax Error (4697): No font in show/space
Syntax Error: Unknown font tag 'F21'

Any ideas on how to solve this problem or how to debug it?

For those willing to try linearizing it themselves, here's the document (warning: 50MB download):

http://www.offerman.com/private_secure_computing/AOC-Private_Secure_Computing-CyanogenMod-nonlin.pdf

I also tried pdfopt, creating a linearized document that appears to be just fine. Checking it with qpdf, however, results in a core dump.

Additional discussion may be found at StackExchange:
https://tex.stackexchange.com/questions/161450/damaged-pdf-file-after-linearization-using-qpdf-apparently-not-caused-by-latex

5.1.1 binaries wont load on WinXP

VC 2012 SP0 (?, but not SP1, google it) dropped WinXP support. 5.1.1 gives an "is not a valid win32 application" popup on WinXP. This is specifically because of "6.00 operating system version" in the PE header of the binary. Going backwards through the SourceForge releases, 5.0.1 is the newest version I could run on WinXP. Could you please restore WinXP support in your distributed binaries?

C:\sources\qpdf-5.1.1\bin>dumpbin /headers qpdf.exe
Microsoft (R) COFF/PE Dumper Version 7.10.6030
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file qpdf.exe

PE signature found

File Type: EXECUTABLE IMAGE

FILE HEADER VALUES
             14C machine (x86)
               4 number of sections
        52D5A7D7 time date stamp Tue Jan 14 16:10:47 2014
               0 file pointer to symbol table
               0 number of symbols
              E0 size of optional header
             102 characteristics
                   Executable
                   32 bit word machine

OPTIONAL HEADER VALUES
             10B magic # (PE32)
           12.00 linker version
           10A00 size of code
            8200 size of initialized data
               0 size of uninitialized data
           10426 entry point (00410426)
            1000 base of code
           12000 base of data
          400000 image base (00400000 to 0041AFFF)
            1000 section alignment
             200 file alignment
            6.00 operating system version
            0.00 image version
            6.00 subsystem version
               0 Win32 version
           1B000 size of image
             400 size of headers
               0 checksum
               3 subsystem (Windows CUI)
            8140 DLL characteristics
                   RESERVED - UNKNOWN
                   RESERVED - UNKNOWN
                   Terminal Server Aware
          100000 size of stack reserve
            1000 size of stack commit
          100000 size of heap reserve
            1000 size of heap commit
               0 loader flags
              10 number of directories
               0 [       0] RVA [size] of Export Directory
           1708C [      64] RVA [size] of Import Directory
               0 [       0] RVA [size] of Resource Directory
               0 [       0] RVA [size] of Exception Directory
               0 [       0] RVA [size] of Certificates Directory
           1A000 [     D74] RVA [size] of Base Relocation Directory
           122A0 [      38] RVA [size] of Debug Directory
               0 [       0] RVA [size] of Architecture Directory
               0 [       0] RVA [size] of Global Pointer Directory
               0 [       0] RVA [size] of Thread Storage Directory
           15A78 [      40] RVA [size] of Load Configuration Directory
               0 [       0] RVA [size] of Bound Import Directory
           12000 [     274] RVA [size] of Import Address Table Directory
               0 [       0] RVA [size] of Delay Import Directory
               0 [       0] RVA [size] of COM Descriptor Directory
               0 [       0] RVA [size] of Reserved Directory

***CUT***
C:\sources\qpdf-5.0.1\bin>dumpbin /headers qpdf.exe
Microsoft (R) COFF/PE Dumper Version 7.10.6030
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file qpdf.exe

PE signature found

File Type: EXECUTABLE IMAGE

FILE HEADER VALUES
             14C machine (x86)
               4 number of sections
        52617C56 time date stamp Fri Oct 18 14:22:14 2013
               0 file pointer to symbol table
               0 number of symbols
              E0 size of optional header
             102 characteristics
                   Executable
                   32 bit word machine

OPTIONAL HEADER VALUES
             10B magic # (PE32)
           10.00 linker version
           14400 size of code
            8400 size of initialized data
               0 size of uninitialized data
           13C06 entry point (00413C06)
            1000 base of code
           16000 base of data
          400000 image base (00400000 to 0041EFFF)
            1000 section alignment
             200 file alignment
            5.01 operating system version
            0.00 image version
            5.01 subsystem version
               0 Win32 version
           1F000 size of image
             400 size of headers
               0 checksum
               3 subsystem (Windows CUI)
            8140 DLL characteristics
                   RESERVED - UNKNOWN
                   RESERVED - UNKNOWN
                   Terminal Server Aware
          100000 size of stack reserve
            1000 size of stack commit
          100000 size of heap reserve
            1000 size of heap commit
               0 loader flags
              10 number of directories
               0 [       0] RVA [size] of Export Directory
           1B1FC [      64] RVA [size] of Import Directory
               0 [       0] RVA [size] of Resource Directory
               0 [       0] RVA [size] of Exception Directory
               0 [       0] RVA [size] of Certificates Directory
           1E000 [     C90] RVA [size] of Base Relocation Directory
           162C0 [      1C] RVA [size] of Debug Directory
               0 [       0] RVA [size] of Architecture Directory
               0 [       0] RVA [size] of Global Pointer Directory
               0 [       0] RVA [size] of Thread Storage Directory
           199B0 [      40] RVA [size] of Load Configuration Directory
               0 [       0] RVA [size] of Bound Import Directory
           16000 [     29C] RVA [size] of Import Address Table Directory
               0 [       0] RVA [size] of Delay Import Directory
               0 [       0] RVA [size] of COM Descriptor Directory
               0 [       0] RVA [size] of Reserved Directory
**CUT**
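
The difference between the two dumps comes down to the "subsystem version" line: 6.00 for 5.1.1 versus 5.01 for 5.0.1. 32-bit Windows XP is version 5.1, and the loader rejects images that ask for a newer subsystem version. The sketch below checks a version string as printed by dumpbin against that limit; `subsystem_version_allows_winxp` is a hypothetical helper written for this comparison.

```cpp
#include <cstdio>
#include <string>

// True if a "MAJOR.MINOR" subsystem version, as printed by dumpbin
// (e.g. "5.01" or "6.00"), is no newer than 32-bit Windows XP (5.1).
inline bool subsystem_version_allows_winxp(const std::string& version)
{
    int major = 0;
    int minor = 0;
    if (std::sscanf(version.c_str(), "%d.%d", &major, &minor) != 2) {
        return false; // not in the expected form
    }
    return major < 5 || (major == 5 && minor <= 1);
}
```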

DecodeParms length error when processing PDF file

I have a PDF file that is returning the following error when I run it through qpdf in Linux with "qpdf file.pdf output.pdf":

file.pdf (file position 8235527): stream /DecodeParms length is inconsistent with filters

I can't post the PDF file as it contains confidential information, but I have confirmed that I can process the file using pdftk in Linux and also view it in Windows using Acrobat, Foxit, and SumatraPDF.

Is there any sort of debugging that I can do to help narrow it down? Thanks for your help.

Command-line option to retrieve number of pages in PDF

Is there a command-line option in qpdf to retrieve the number of pages in the input pdf? I see I can do it with the --show-pages option and look for the highest page number, but is there an easier way? I'm doing it now with pdfinfo, but was curious if there was a way to do it with qpdf itself. Thanks.
