verapdf / verapdf-library

Industry supported, open source PDF/A validation library

Home Page: http://verapdf.org/software

License: GNU General Public License v3.0

Java 88.18% HTML 0.17% XSLT 11.65%

verapdf-library's People

Contributors

zhenyam dmitryremezow bezrukovm bdoubrov hak223sve bitsgalore fthomas johnscancella jasonzou madsmj amitdo shem-sergey dmitryrumiantsev gitsnaf ka7 chlara bryant1410 nagyistge carlwilson boehlf rezviybelorus fbarthez ssisoftware angrykid12 yiqideren ancarian set-de hellowangcheng artemkiriutin maximplusov jackdos romaprograms jmvezic intorpfk galmis evgeniy-prudnikov batlicka irinamavrina khalidasahara duffjohnson anhelinam darrendignam petrpytelka kadiryazgan kaynezhang mass1ve-err0r salomscala samyssmile

verapdf-library's Issues

veraPDF Installer JAR files need signing

Obtain a Java code signing certificate to avoid warnings on Windows systems when installing veraPDF.

veraPDF CLI to process standard input in lieu of a file input path.

The intention is that the CLI should process standard input if no file path is passed as a parameter. Currently the CLI takes no action and fails to print a usage message.

Install scripts should detect their execution directory.

Currently the install scripts can misfire if run from another location, e.g. when double-clicked in the Mac Finder. Add code to detect the execution directory and find the install jar automatically.

Review exception handling and logging.

Some of the current exception handling is inconsistent and our log4j configuration isn't optimal or in line with best practice. Review our implementation and ensure that we're following best practice.

Execution exception in processing: java.lang.StackOverflowError

Open VeraPDF Conformance Checker
Choose File
159759_(PDFA).pdf
Set Program options as in screenshot
Press Execute

Invalid characters in the veraPDF output

Some PDF files result in veraPDF including invalid characters in the output (that's the default mrr output). Hardly devastating, but it's actually a bit of a pain as it causes XMLStarlet to drop out of parsing the output. Here are some examples:

This has an invalid character in the title tag extracted from the PDF:
http://web.archive.org/web/20080511210957/http://www.plymouth.gov.uk/5th_december_2007.pdf
This is a similar source, but this time shows a series of invalid characters in the title tags in the Pages feature extract:
http://web.archive.org/web/20071030162909/http://www.somersetpct.nhs.uk/about_us/board_meetings/November_2006/Papers/8%20(D)%20PEC%20Terms%20of%20Reference%20and%20Chair.pdf

This has an invalid character in the description of a rule in the validation output:
http://web.archive.org/web/20060930024642/http://www.tvcs.org.uk/pdfs/06leaflet1.pdf

These seem to be the two sources of invalid characters that I've come across so far, other than this:

http://web.archive.org/web/20071031010646/http://www.somersetpct.nhs.uk/about_us/board_meetings/December_2006/Papers/11%20(H)%20Risk%20Management%20Strategy%20and%20Policy%20Appendix%206%20-%20IPEC.pdf
This one is reported by XMLStarlet as invalid, but XMLStarlet has no problems parsing it. As far as I can see there are no problems with it, but it may be worth further investigation with a decent XML tool, so I thought I'd include it anyway.

The test data sets I'm passing on to Carl would be useful to use to double check there are no remaining bugs of a similar nature once fixes have been applied (I can send on if helpful). Carl mentioned adding these datasets to his automated tests. Depending on the outcome of the last example (and any others we find) it might be useful to validate the output on one of these large corpora as part of the automated test to pick up any of these kinds of issues in the future.
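
One way to keep such characters out of the report is to filter every extracted string before it is written. The following is a minimal sketch, not veraPDF's actual fix: `XmlSanitizer` and its methods are hypothetical names, and the ranges are those of the XML 1.0 `Char` production.

```java
public final class XmlSanitizer {
    private XmlSanitizer() {}

    /** True if the code point is allowed by the XML 1.0 Char production. */
    public static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the input with every character that XML 1.0 forbids removed. */
    public static String stripInvalidXmlChars(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);          // handles surrogate pairs
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Dropping the characters silently is a design choice made here for brevity; replacing them with U+FFFD would preserve the fact that something was removed.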

Add directory and recursive processing to CLI

Add basic functionality for directory processing, the CLI should:

take a directory reference and process all files in the directory that have the .pdf extension; and
handle a recursive processing flag, e.g. -r.
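
The two requirements above can be sketched with `java.nio.file.Files.walk`, using a depth of 1 for the plain directory case and an unlimited depth when the recursive flag is set. `PdfFinder` is a hypothetical helper, and matching the extension case-insensitively is an assumption rather than something the issue specifies.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public final class PdfFinder {
    private PdfFinder() {}

    /** True if the file name ends in .pdf (case-insensitive, an assumption). */
    public static boolean hasPdfExtension(Path p) {
        Path name = p.getFileName();
        return name != null
            && name.toString().toLowerCase(Locale.ROOT).endsWith(".pdf");
    }

    /** Lists the PDF files directly in dir, or in the whole tree if recursive. */
    public static List<Path> findPdfs(Path dir, boolean recursive) {
        int depth = recursive ? Integer.MAX_VALUE : 1;
        try (Stream<Path> walk = Files.walk(dir, depth)) {
            return walk.filter(Files::isRegularFile)
                       .filter(PdfFinder::hasPdfExtension)
                       .sorted()
                       .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```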

CLI: Reporting of internal errors/warnings to STDOUT results in invalid XML/HTML output

Commands like below should result in a valid XML file:

 verapdf whatever.pdf > whatever.xml

Since VeraPDF now uses STDOUT for reporting any errors or warnings during its execution, these now end up in the output file as well, with the result that output is not valid XML. Here's an example:

14:37:25,309 ERROR main XMPChecker:doesInfoMatchXMP:93 - Problems with XMP parsing. This namespace is not a schema or a structured type : http://ns.adobe.com/pdfx/1.3/
org.apache.xmpbox.xml.XmpParsingException: This namespace is not a schema or a structured type : http://ns.adobe.com/pdfx/1.3/
    at org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRootAttr(DomXmpParser.java:263)
    at org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:206)
    at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:173)
    at org.verapdf.model.tools.XMPChecker.doesInfoMatchXMP(XMPChecker.java:74)
    at org.verapdf.model.impl.pb.cos.PBCosDocument.<init>(PBCosDocument.java:101)
    at org.verapdf.model.impl.pb.cos.PBCosDocument.<init>(PBCosDocument.java:63)
    at org.verapdf.model.ModelParser.getRoot(ModelParser.java:59)
    at org.verapdf.pdfa.validators.BaseValidator.validate(BaseValidator.java:371)
    at org.verapdf.cli.VeraPdfCliProcessor.processStream(VeraPdfCliProcessor.java:98)
    at org.verapdf.cli.VeraPdfCliProcessor.processPath(VeraPdfCliProcessor.java:82)
    at org.verapdf.cli.VeraPdfCliProcessor.processPaths(VeraPdfCliProcessor.java:70)
    at org.verapdf.cli.VeraPdfCli.main(VeraPdfCli.java:65)
14:37:26,838  WARN main FileSystemFontProvider:addTrueTypeFontImpl:287 - Missing 'name' entry for PostScript name in font C:\Windows\FONTS\code39LS.TTF
14:37:28,835 ERROR main PBoxPDMetadata:getXMPPackage:80 - Problems with parsing metadata. This namespace is not a schema or a structured type : http://ns.adobe.com/pdfx/1.3/
org.apache.xmpbox.xml.XmpParsingException: This namespace is not a schema or a structured type : http://ns.adobe.com/pdfx/1.3/
    at org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRootAttr(DomXmpParser.java:263)
    at org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:206)
    at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:173)
    at org.verapdf.model.impl.pb.pd.PBoxPDMetadata.getXMPPackage(PBoxPDMetadata.java:74)
    at org.verapdf.model.impl.pb.pd.PBoxPDMetadata.getLinkedObjects(PBoxPDMetadata.java:59)
    at org.verapdf.pdfa.validators.BaseValidator.addAllLinkedObjects(BaseValidator.java:194)
    at org.verapdf.pdfa.validators.BaseValidator.checkNext(BaseValidator.java:136)
    at org.verapdf.pdfa.validators.BaseValidator.validate(BaseValidator.java:87)
    at org.verapdf.pdfa.validators.BaseValidator.validate(BaseValidator.java:371)
    at org.verapdf.cli.VeraPdfCliProcessor.processStream(VeraPdfCliProcessor.java:98)
    at org.verapdf.cli.VeraPdfCliProcessor.processPath(VeraPdfCliProcessor.java:82)
    at org.verapdf.cli.VeraPdfCliProcessor.processPaths(VeraPdfCliProcessor.java:70)
    at org.verapdf.cli.VeraPdfCli.main(VeraPdfCli.java:65)
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<item size="4603158">
    <name>E:\pdfAcrobatEngineering\multimedia\SVG.pdf</name>
</item>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<validationResult flavour="PDFA_1_B" totalAssertions="2202" isCompliant="false">
    <assertions>
        <assertion ordinal="1710" status="FAILED">

        etc ...

Solution: write all internal errors/warnings to STDERR.

Tested with VeraPDF 0.8.7 on Windows 7, java version 1.8.0_66
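
The proposed solution amounts to keeping two streams strictly apart. A minimal sketch of the idea follows, using plain `PrintStream`s rather than the log4j setup veraPDF actually uses; `StreamSeparation` and `emit` are hypothetical names.

```java
import java.io.PrintStream;

public final class StreamSeparation {
    private StreamSeparation() {}

    /** Writes the report to out and any diagnostic to err, never mixing them. */
    public static void emit(String reportXml, String diagnostic,
                            PrintStream out, PrintStream err) {
        if (diagnostic != null) {
            err.println(diagnostic);   // warnings and stack traces: STDERR
        }
        out.println(reportXml);        // only the report: STDOUT
    }

    public static void main(String[] args) {
        emit("<?xml version=\"1.0\" encoding=\"UTF-8\"?><validationResult/>",
             "WARN Problems with XMP parsing",
             System.out, System.err);
    }
}
```

With this split, `verapdf whatever.pdf > whatever.xml` would capture only the report, while the diagnostics would still appear on the console.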

Possible false positive in veraPDF when testing for CIDSystemInfo entry in CIDFont dictionary

http://bugs.ghostscript.com/show_bug.cgi?id=696797#c1
https://0x0.st/qpS.html

The validation test detailed in https://github.com/veraPDF/veraPDF-validation-profiles/wiki/PDFA-Parts-2-and-3-rules#rule-62113-1 could lead to false positives.
I can't attach a PDF to test this issue directly on this bug tracker.

veraPDF currently crashes when passed a non PDF file to validate.

The library and applications should detect when the file is quite clearly not a PDF and fail gracefully while informing the user.

More information and demonstration of error to come.

No CLI parameters causes null pointer exception.

After building and moving to the cli/target directory and executing the jar without parameters, e.g.

java -jar vera-cli-1.0-SNAPSHOT-jar-with-dependencies.jar

I get the following exception:

Exception in thread "main" java.lang.NullPointerException
        at java.io.File.<init>(File.java:277)
        at org.verapdf.model.ModelLoader.getRoot(ModelLoader.java:29)
        at org.verapdf.runner.ValidationRunner.runValidation(ValidationRunner.java:31)
        at org.verapdf.cli.VeraPdfCli.main(VeraPdfCli.java:36)

This should really return a help message.

Record application details in generated reports

There is currently no way to tell from the reports themselves which version of veraPDF was used to generate them. This information could prove useful as veraPDF changes, and such details are often sought for preservation in archival systems.
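
One possible source for this information is the `Implementation-Version` entry of the jar manifest, which the JDK exposes through `Package.getImplementationVersion()`. The sketch below assumes the build writes that entry; `AppDetails` is a hypothetical helper, not part of the veraPDF API.

```java
public final class AppDetails {
    private AppDetails() {}

    /**
     * Returns the Implementation-Version recorded in the manifest of the jar
     * that contains the given class, or "unknown" when none is available
     * (for example when running from unpacked class files).
     */
    public static String versionOf(Class<?> type) {
        Package pkg = type.getPackage();
        String version = (pkg == null) ? null : pkg.getImplementationVersion();
        return (version == null) ? "unknown" : version;
    }
}
```

The returned string could then be written as an attribute of the report's root element alongside the creation date.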

Crashing with OutOfMemoryError Exceptions

Below are a sample of files which I've found to cause veraPDF to crash, throwing OutOfMemoryError exceptions.

http://access.bl.uk/item/viewer/ark:/81055/vdc_00000001585E (91 mb)
http://access.bl.uk/item/viewer/ark:/81055/vdc_000000025A58 (185 mb)
http://access.bl.uk/item/viewer/ark:/81055/vdc_0000000156C0 (262 mb)
http://access.bl.uk/item/viewer/ark:/81055/vdc_000000025A16 (301 mb)
http://access.bl.uk/item/viewer/ark:/81055/vdc_000000037386 (425 mb)
http://access.bl.uk/item/viewer/ark:/81055/vdc_000000037500 (781 mb)

The crashes become particularly problematic when attempting to process files in batches. The crash leaves any remaining files unprocessed, with no indication of which file it choked on. I'd expect veraPDF to record its failure in the offending file's report instead of crashing, allowing it to continue with the remaining files.

veraPDF 0.20.3 CLI
Windows 7 (64-bit)
Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)
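
The behaviour asked for above, recording the failure against the offending file and carrying on, can be sketched as a loop that catches errors per file. `BatchRunner` is a hypothetical name and the `processor` function stands in for whatever validates a single file. Recovery after an `OutOfMemoryError` is not guaranteed by the JVM, so this is best-effort; running each file in a separate process would be the more robust design.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public final class BatchRunner {
    private BatchRunner() {}

    /**
     * Applies processor to each file and returns one entry per file, in input
     * order. A failure on one file is recorded and the batch continues.
     */
    public static Map<String, String> processAll(List<String> files,
                                                 Function<String, String> processor) {
        Map<String, String> results = new LinkedHashMap<>();
        for (String file : files) {
            try {
                results.put(file, processor.apply(file));
            } catch (OutOfMemoryError | StackOverflowError | RuntimeException e) {
                // Record which file failed and why, then move on.
                results.put(file, "ERROR: " + e);
            }
        }
        return results;
    }
}
```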

Fix developer details in parent pom.

There are duplicate developer details in parent/pom.xml that need to be corrected.

CLI application outputs log4j configuration errors.

The startup error reads:

$ .\verapdf
log4j:ERROR Could not find value for key log4j.appender.file
log4j:ERROR Could not instantiate appender named "file".

It's possibly caused by a misconfiguration of PDFBox's log4j.

Sensible default for validation profile in GUI

After starting the GUI, one now has to browse the directory structure to find a working validation profile. Using a sensible default here (the PDFA-1B.xml) would make things easier for a user.

ModelParser should throw a ParseException

If the passed file could not be parsed as a PDF then a specific parse exception should be thrown. This allows easy trapping of non-PDF file cases by the caller.
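
A cheap first check that supports this is to look for the `%PDF-` header before handing the file to the parser. The sketch below only inspects the very first bytes, which is stricter than some readers are, and `PdfSniffer` and `PdfParseException` are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

public final class PdfSniffer {
    private PdfSniffer() {}

    /** Hypothetical exception type signalling that the input is not a PDF. */
    public static final class PdfParseException extends Exception {
        public PdfParseException(String message) {
            super(message);
        }
    }

    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** True if the given leading bytes start with the PDF header marker. */
    public static boolean looksLikePdf(byte[] head) {
        if (head.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (head[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Throws a specific, catchable exception for inputs that are clearly not PDF. */
    public static void requirePdf(byte[] head) throws PdfParseException {
        if (!looksLikePdf(head)) {
            throw new PdfParseException("Input does not start with a %PDF- header");
        }
    }
}
```

A caller can then catch the specific exception type and report "not a PDF" instead of a generic failure.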

CLI: --profile argument

Profile selection works only for predefined profiles.

A. verapdf.bat -p "path to profile\PDFA-2B.xml" "path to pdf/somename.pdf"

Validation with the -p argument set to the full path of a profile (PDFA-2B.xml) gives an incorrect result.

B. verapdf.bat -p 2b "path to pdf/somename.pdf"

Validation of the same PDF file with the predefined profile works fine.

OS: Windows 7, version 6.1, 7601:Service Pack 1
JM: Java version 1.8.0.66 / JRE build 1.8.0.66-b18

Validation via GUI renders mostly blank XML - no log messages to support any errors

Output:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <report xmlns="http://www.verapdf.org/MachineReadableReport" creationDate="2016-08-16T14:52:51.914+12:00" processingTime="00:00:02.225">
        <itemDetails size="1443616">
            <name>\\arcwn-fp02.archives.net\homefolders$\spencero\Desktop\224727.pdf\224727.pdf</name>
        </itemDetails>
        <pdfFeaturesReport/>
    </report>

The file comes from Govdocs Selected and is attached.

224727.pdf

Crash on running feature extraction

Here are four files that cause a crash when running the feature extractor of veraPDF (0.26.16), either GUI or CLI:
http://web.archive.org/web/20071009190733/http://www.thpct.nhs.uk/uploads/Trust%20Board%20Papers/01_07%20Papers.pdf
http://web.archive.org/web/20080702075816/http://news.bbc.co.uk/2/shared/bsp/hi/pdfs/14_01_08_lanarkshire.pdf
http://web.archive.org/web/20080702075821/http://news.bbc.co.uk/2/shared/bsp/hi/pdfs/14_01_08_ayrshire.pdf
http://web.archive.org/web/20080412102905/http://www.plymouth.gov.uk/260308lacmfinalsubmission.pdf

The first three give an Execution Exception, the last gives "Some error in saving the HTML Report ..."

I can provide further examples if needed.

Create a sample starter project for the veraPDF API.

It's currently difficult for a first time developer to know where to start when using the veraPDF code / API. Create a basic starter project that demonstrates the main use cases:

PDF/A validation;
PDF/A feature extraction; and
metadata repair.

Maven build fails

Following the instructions in the readme, the software failed to build on my system (Linux Mint). Full mvn output is here:

https://gist.github.com/bitsgalore/60c1c12d72cbe25fc3af

I'm using the packages maven2 and Oracle JDK 7; also tried with OpenJDK 7, but that's giving me similar results.

(Should add that I know next to nothing of Java, so maybe I'm just doing something wrong).

Maven build instructions on front page fail when building from scratch.

Following the build instructions to the letter for the "master" branch with an unpopulated local ~/.m2 repository results in the following error:

[ERROR] Failed to execute goal on project core: Could not resolve dependencies for project org.verapdf:core:jar:0.8.0: Failed to collect dependencies for [org.verapdf:pdf-model:jar:[0.8.0,0.9.0) (compile), rhino:js:jar:1.7R2 (compile), junit:junit:jar:4.12 (compile), nl.jqno.equalsverifier:equalsverifier:jar:1.5.1 (test)]: No versions available for org.verapdf:pdf-model:jar:[0.8.0,0.9.0) within specified range -> [Help 1]

Perhaps the prerequisites need to include building the needed verapdf artifacts?

NPE on serializing processing report to XML

Running the following code:

    PdfBoxFoundryProvider.initialise();
    PDFAFlavour flavour = PDFAFlavour.byFlavourId(profile);

    ValidatorConfig validatorConfig = ValidatorFactory.createConfig(flavour, true, 10);
    FeatureExtractorConfig featureConfig = FeatureFactory.defaultConfig();
    MetadataFixerConfig fixerConfig = FixerFactory.defaultConfig();
    EnumSet<TaskType> tasks = EnumSet.of(TaskType.VALIDATE);

    if (hasFeatures) {
      tasks.add(TaskType.EXTRACT_FEATURES);
    }

    ItemProcessor processor = ProcessorFactory
      .createProcessor(ProcessorFactory.fromValues(validatorConfig, featureConfig, fixerConfig, tasks));

    ProcessorResult result = processor.process(input.toFile());
    ByteArrayOutputStream os = new ByteArrayOutputStream();

    boolean prettyPrint = true;
    ProcessorFactory.resultToXml(result, os, prettyPrint);

Using org.verapdf:[email protected] dependency (org.verapdf:[email protected]), the following Null Pointer Exception error occurs when serializing the report to XML:

java.lang.NullPointerException: null
	at org.verapdf.processor.TaskResultImpl.getExceptionMessage(TaskResultImpl.java:40)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.sun.xml.internal.bind.v2.runtime.reflect.Accessor$GetterSetterReflection.get(Accessor.java:343)
	at com.sun.xml.internal.bind.v2.runtime.reflect.Accessor.getUnadapted(Accessor.java:127)
	at com.sun.xml.internal.bind.v2.runtime.reflect.TransducedAccessor$CompositeTransducedAccessorImpl.hasValue(TransducedAccessor.java:234)
	at com.sun.xml.internal.bind.v2.runtime.property.SingleElementLeafProperty.serializeBody(SingleElementLeafProperty.java:96)
	at com.sun.xml.internal.bind.v2.runtime.ClassBeanInfoImpl.serializeBody(ClassBeanInfoImpl.java:345)
	at com.sun.xml.internal.bind.v2.runtime.XMLSerializer.childAsXsiType(XMLSerializer.java:681)
	at com.sun.xml.internal.bind.v2.runtime.property.ArrayElementNodeProperty.serializeItem(ArrayElementNodeProperty.java:54)
	at com.sun.xml.internal.bind.v2.runtime.property.ArrayElementProperty.serializeListBody(ArrayElementProperty.java:157)
	at com.sun.xml.internal.bind.v2.runtime.property.ArrayERProperty.serializeBody(ArrayERProperty.java:144)
	at com.sun.xml.internal.bind.v2.runtime.ClassBeanInfoImpl.serializeBody(ClassBeanInfoImpl.java:345)
	at com.sun.xml.internal.bind.v2.runtime.XMLSerializer.childAsSoleContent(XMLSerializer.java:578)
	at com.sun.xml.internal.bind.v2.runtime.ClassBeanInfoImpl.serializeRoot(ClassBeanInfoImpl.java:326)
	at com.sun.xml.internal.bind.v2.runtime.XMLSerializer.childAsRoot(XMLSerializer.java:479)
	at com.sun.xml.internal.bind.v2.runtime.MarshallerImpl.write(MarshallerImpl.java:308)
	at com.sun.xml.internal.bind.v2.runtime.MarshallerImpl.marshal(MarshallerImpl.java:236)
	at javax.xml.bind.helpers.AbstractMarshallerImpl.marshal(AbstractMarshallerImpl.java:95)
	at org.verapdf.core.XmlSerialiser.toXml(XmlSerialiser.java:249)
	at org.verapdf.processor.ProcessorFactory.resultToXml(ProcessorFactory.java:114)
	at org.roda.core.plugins.plugins.characterization.VeraPDFPluginUtils.runVeraPDF(VeraPDFPluginUtils.java:61)
	at org.roda.core.plugins.plugins.characterization.VeraPDFPlugin.executeOnRepresentation(VeraPDFPlugin.java:336)
	at org.roda.core.plugins.plugins.characterization.VeraPDFPlugin.execute(VeraPDFPlugin.java:149)
	at org.roda.core.plugins.orchestrate.akka.AkkaWorkerActor.handlePluginExecuteIsReady(AkkaWorkerActor.java:49)
	at org.roda.core.plugins.orchestrate.akka.AkkaWorkerActor.onReceive(AkkaWorkerActor.java:35)
	at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:484)
	at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
	at akka.actor.ActorCell.invoke(ActorCell.scala:495)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
	at akka.dispatch.Mailbox.run(Mailbox.scala:224)
	at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

The problem seems to be that the code does not expect the exception to be absent from the report when it is serialized to XML.

Inconsistent handling of date/time format

I am using verapdf version 0.12.9 on a debian linux with java version 1.8.0_74, in NZST, and I am consistently getting this error:

<validationReport profile="PDF/A-1B validation profile" compliant="false">
    <statement>PDF file is not compliant with Validation Profile requirements.</statement>
    <details passedRules="101" failedRules="1" passedChecks="11707" failedChecks="1">
        <rule specification="ISO 19005-1:2005" clause="6.7.3" testNumber="1" status="failed" passedChecks="0" failedChecks="1">
            <description>If a document information dictionary does appear at a document, then all of its entries that have analogous properties in predefined XMP schemas, shall also be embedded in the file in XMP form with equivalent values.</description>
            <object>CosDocument</object>
            <test>doesInfoMatchXMP</test>
            <check status="failed">
                <context>root</context>

When I look at the binary of the PDF/A files I am checking, I see that the dates are the same. In one PDF/A that I made with MS Word from a docx file on Windows, the XMP dates are:
<xmp:CreateDate>2016-04-11T15:57:39+12:00</xmp:CreateDate><xmp:ModifyDate>2016-04-11T15:57:39+12:00</xmp:ModifyDate>
and, from the info dictionary:
<</CreationDate(D:20160411155739+12'00') /ModDate(D:20160411155739+12'00') >>

In a PDF/A made from the same docx in LibreOffice on Linux, the XMP date is:
<xmp:CreateDate>2016-04-11T16:01:15+12:00</xmp:CreateDate>
and, from the info dictionary:
CreationDate(D:20160411160115+12'00')>>

However, veraPDF is interpreting these dates differently. The XML output for the MS Word file reports the info dictionary dates as:
2016-04-11T15:57:39.000Z
2016-04-11T15:57:39.000Z

and the XMP dates as:
<xmp:CreateDate>2016-04-11T15:57:39+12:00</xmp:CreateDate>
<xmp:ModifyDate>2016-04-11T15:57:39+12:00</xmp:ModifyDate>

It looks like the UTC offset in the case of the information dictionary dates is not being interpreted correctly. I am attaching both xml outputs and pdfs.

docx libreoffice.pdf
docx msword.pdf
docx libreoffice.txt
docx msword.txt
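
To illustrate what correct handling of the offset would look like, here is a minimal sketch that parses a full-form PDF date string such as `D:20160411155739+12'00'` into a `java.time.OffsetDateTime`. It is not veraPDF's parser: `PdfDates` is a hypothetical name, truncated date forms are not handled, and treating a missing offset as UTC is an assumption.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class PdfDates {
    private PdfDates() {}

    // D:YYYYMMDDHHmmSS followed by an optional Z or +HH'mm' / -HH'mm' offset.
    private static final Pattern FULL_DATE = Pattern.compile(
        "D:(\\d{4})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})"
        + "(?:Z|([+-])(\\d{2})'(\\d{2})'?)?");

    /** Parses a full-form PDF date string, keeping its UTC offset. */
    public static OffsetDateTime parse(String pdfDate) {
        Matcher m = FULL_DATE.matcher(pdfDate);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported PDF date: " + pdfDate);
        }
        LocalDateTime local = LocalDateTime.of(
            Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4)),
            Integer.parseInt(m.group(5)), Integer.parseInt(m.group(6)));
        ZoneOffset offset = ZoneOffset.UTC;   // assumption when no offset is given
        if (m.group(7) != null) {
            int sign = "-".equals(m.group(7)) ? -1 : 1;
            offset = ZoneOffset.ofHoursMinutes(
                sign * Integer.parseInt(m.group(8)),
                sign * Integer.parseInt(m.group(9)));
        }
        return OffsetDateTime.of(local, offset);
    }
}
```

Parsed this way, the info dictionary value from the MS Word file and the XMP value `2016-04-11T15:57:39+12:00` denote the same instant, so a comparison on the parsed values would succeed.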

Correct typo for missing app.home and try to fix the underlying problem.

When using the features functionality, it normally throws the following error: 'Con not get system property “app.home”'. This message contains an obvious orthographic error in the first word.

XML output could be sharper in places

Not a major headache, but some of the veraPDF output is a bit messy, which overcomplicates parsing with XPath. In particular the <InformationDict> section currently has a number of tags, identified with a series of different attributes: <entry key="Title"></entry><entry key="producer"></entry> etc. This should be replaced with individually named tags: <title></title><producer></producer> etc.

The following XMP section also seems somewhat overcomplicated. A tidy up of the namespaces would help here.

I've mentioned this to CW already, but thought it sensible to add an issue for completeness.

Add JavaDoc generation to the project's Maven build.

Currently the build generates a JavaDoc Maven package and deploys to the OPF's Maven repository. Add a second task that publishes the JavaDoc to a public location at a well known URL.

Add sample project and JavaDoc links to the project README.

Provide links to the new starter resources in the README, see #442 and #441

some error in validating

I'm testing a PDF library, "gnupdf". It's a light library that does only what I need, but some features are missing or badly implemented, so I am debugging it and trying to use veraPDF for this. I receive only the message "some error in validating" and there is no report, so it's not very useful.

Document software versioning and release processes

Better information in the project README about:

software versioning policy;
use of GitHub branches;
veraPDF release policy; and
correspondence between GitHub tags and software version numbers.

GUI Installer doesn't close after installation is over

After installation the GUI installer closes, but then it opens one more time, starting from the beginning.
The same problem is reproduced with the "Quit" button. If we push the "Quit" button at the first or second step, the GUI installer closes, but then it opens one more time, starting from the beginning.

User specific config directory?

The current installation routine (installer v0.26.16) puts the veraPDF config directory inside the program installation directory (<install_dir>/config/). This is okay for a private, single-user installation where <install_dir> would be something like ~/bin/verapdf/ (examples based on Linux, but I guess on Windows it's similar). But if we want to install veraPDF system-wide for all users, <install_dir> will probably be a location that is writable only by root, like /opt/verapdf/. This leads to errors when starting the program as a regular, non-root user.

Please move the config directory from the installation directory to a user specific path, at least in case <install_dir>/config/ is not writable! Possible locations include ~/.verapdf/ or ~/.config/verapdf/ on Linux and somewhere below %APPDATA% on Windows.

Steps to reproduce:

Run the installer on Linux as root/sudo.
Choose /opt/verapdf/ as installation directory to prepare a system-wide installation. (Optionally, manually create a symlink /usr/local/bin/verapdf -> /opt/verapdf/verapdf later to make the verapdf command available in the users' path.)
After the installation has finished, run the /opt/verapdf/verapdf starter script as a regular user.

An exception is thrown:

Exception in thread "main" java.lang.ExceptionInInitializerError
Caused by: java.lang.IllegalArgumentException: Arg root:/opt/verapdf/config, must be a writable directory.
        at org.verapdf.apps.Applications.createConfigManager(Applications.java:36)
        at org.verapdf.apps.Applications.createAppConfigManager(Applications.java:44)
        at org.verapdf.cli.VeraPdfCli.<clinit>(VeraPdfCli.java:31)
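
The fallback requested in this issue could look roughly like the sketch below: use `<install_dir>/config/` when it is a writable directory, otherwise fall back to a per-user location. `ConfigLocator` is a hypothetical helper; the `veraPDF` folder name under `%APPDATA%` and the choice of `~/.config/verapdf/` over `~/.verapdf/` are assumptions picked from the locations suggested above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

public final class ConfigLocator {
    private ConfigLocator() {}

    /** Picks the install config dir if writable, else a user-specific one. */
    public static Path resolve(Path installConfig, String osName,
                               String userHome, String appData) {
        if (Files.isDirectory(installConfig) && Files.isWritable(installConfig)) {
            return installConfig;
        }
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        if (windows && appData != null) {
            return Paths.get(appData, "veraPDF");          // assumed folder name
        }
        return Paths.get(userHome, ".config", "verapdf");  // assumed location
    }

    /** Convenience overload that reads the values from the running JVM. */
    public static Path resolveForCurrentUser(Path installConfig) {
        return resolve(installConfig,
                       System.getProperty("os.name"),
                       System.getProperty("user.home"),
                       System.getenv("APPDATA"));
    }
}
```

The caller would still need to create the returned directory on first use.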

CLI: xml report

Feature and validation reports contain excess headers in the XML file.

XML documents must contain one header and one root element that is the parent of all other elements.

Examples of generated reports:

1. veraPDF -x -f 0 someFile.pdf > FR.xml

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<item size="41296">
    <name>C:\Users\dmitry.remezow\Desktop\GUI\verapdf-0.7.48\1f.pdf</name>
</item>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<featuresReport> some data </featuresReport>

2. veraPDF someFile.pdf > VR.xml

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<item size="41296">
    <name>C:\Users\dmitry.remezow\Desktop\GUI\verapdf-0.7.48\1f.pdf</name>
</item>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<validationResult flavour="PDFA_1_B" totalAssertions="454" isCompliant="false">
some data
</validationResult>

OS: Windows 7, version 6.1, 7601:Service Pack 1
JM: Java version 1.8.0.66 / JRE build 1.8.0.66-b18
Version: 0.7.48
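
The single-header, single-root requirement can be checked mechanically by running the output through a standard XML parser. The sketch below uses the JDK's `DocumentBuilder`; `XmlCheck` is a hypothetical helper that could serve as a regression test for reports like the two above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public final class XmlCheck {
    private XmlCheck() {}

    /** True if the text parses as one well-formed XML document. */
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(
                xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;   // not well-formed, e.g. a second XML declaration
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Output shaped like the examples in this issue, with a second `<?xml ...?>` declaration after the `<item>` element, is rejected by this check, whereas the same elements wrapped in one root under one declaration pass.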

Add simple pass / fail summarised output for batch processing from the CLI

Simplify the output to the bare minimum, for example:

./verapdf --passed *.pdf 

PASSED   ./Published/14.pdf
PASSED   ./Published/apa-graf.pdf
PASSED   ./Published/archiving13.pdf
PASSED   ./Published/Archiving2014.pdf
FAILED   ./Published/ArticleBOM2013v4.pdf
FAILED   ./Published/becker_decision_jasist_published.pdf
PASSED   ./Published/ipres2011.pdf
FAILED   ./Published/bda-short-2013-sanoja-gancarski.pdf
PASSED   ./Published/ipres2012-sanoja-gancarski.pdf
PASSED   ./Published/paperEuroMed2012.pdf
PASSED   ./Published/wei_ipres11_final.pdf
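
The line format shown above is easy to produce with a fixed-width status column. A minimal sketch, where `SummaryLine` is a hypothetical helper and the nine-character column width is inferred from the sample output:

```java
public final class SummaryLine {
    private SummaryLine() {}

    /** Formats one summary line: status padded to nine columns, then the path. */
    public static String format(boolean compliant, String path) {
        return String.format("%-9s%s", compliant ? "PASSED" : "FAILED", path);
    }
}
```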

Batch and recursive processing functionality for the veraPDF GUI

The result should be a list of the processed files, indicating which failed and which succeeded. It should also be possible to open the detailed report for each file and to export the list to CSV.
We suggest showing the report immediately after running. If you take up the suggestion of supporting multiple files/folders, the simplified report should be built up as processing progresses (possibly in a ListView with multiple columns: filename, Passed/Failed, link to detailed report).

veraPDF command line parameter parsing failure should be handled gracefully

When using the CLI version of VeraPDF 0.8.7 with a wrong parameter this stack trace is thrown:

$ ./verapdf --passsdfdsf

log4j:ERROR Could not find value for key log4j.appender.file
log4j:ERROR Could not instantiate appender named "file".
10:44:39,279 FATAL main VeraPdfCli:logThrowable:80 - Unknown option: --passsdfdsf
com.beust.jcommander.ParameterException: Unknown option: --passsdfdsf
    at com.beust.jcommander.JCommander.parseValues(JCommander.java:742)
    at com.beust.jcommander.JCommander.parse(JCommander.java:282)
    at com.beust.jcommander.JCommander.parse(JCommander.java:265)
    at org.verapdf.cli.VeraPdfCli.main(VeraPdfCli.java:49)
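
Graceful handling here means a one-line error, the usage text, and a non-zero exit code instead of a stack trace. The sketch below shows that shape with a hand-rolled check rather than JCommander, which the real CLI uses; `GracefulCli`, the option set, and the usage text are all hypothetical.

```java
import java.io.PrintStream;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public final class GracefulCli {
    private GracefulCli() {}

    // Hypothetical option names, for illustration only.
    private static final Set<String> KNOWN_OPTIONS =
        new HashSet<>(Arrays.asList("--help", "--passed", "-r"));

    static void printUsage(PrintStream stream) {
        stream.println("Usage: verapdf [--help] [--passed] [-r] <file>...");
    }

    /** Returns the exit code: 0 on success or help, 2 on a usage error. */
    public static int run(String[] args, PrintStream out, PrintStream err) {
        List<String> argList = Arrays.asList(args);
        if (argList.isEmpty() || argList.contains("--help")) {
            printUsage(out);
            return 0;
        }
        for (String arg : argList) {
            if (arg.startsWith("-") && !KNOWN_OPTIONS.contains(arg)) {
                err.println("Unknown option: " + arg);   // no stack trace
                printUsage(err);
                return 2;
            }
        }
        return 0;   // a real CLI would go on to process the files here
    }
}
```

Printing the usage text when no arguments are given is one possible choice; it would need to change if the CLI is meant to read standard input in that case.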

Validation question

Greetings from City of Helsinki, Finland
We started to use veraPDF for validation this year in Spring 2016.

I hope that my question is suitable for this GitHub channel, even though it is not an actual veraPDF issue. For future questions, please advise whether veraPDF has something like a support forum for this kind of issue.

My questions related to my case
Can you explain why veraPDF and Preflight validate differently? Can you give a hint on how this veraPDF validation error, "OrderContainAllOCGs", can be fixed?

My case - Attached file created

Saved to pdf-file (1.7) with Bentley CAD
Creator Tool: MicroStation 8.11.9.459 by Bentley Systems, Incorporated
Producer: Adobe PDF Library 9.0
Saved to PDF/A-2b with PDF-XChange Editor Plus Version 6.0 (Build 317.1) (Apr 19 2016;11:33:03)
Validated with veraPDF version 0.20
ERROR
specification="ISO_19005_2" clause="6.9" testNumber="3"
DESCRIPTION "If an optional content configuration dictionary contains the Order key, the array which is the value of this Order key shall contain references to all OCGs in the conforming file."
TEST doesOrderContainAllOCGs == true
MESSAGE Not all optional content groups are present in the Order entry of the optional content configuration dictionary
Validated also with Adobe Acrobat Pro DC 2015 version 2015.006.30033
Preflight PDF/A-2b validation -> No Problems Found

Attachment
0851_6_picture-layers_PDFA-2b.PDF

No help option for the command line application

After building and moving to the cli/target directory and executing the jar with a --help parameter, e.g.

java -jar vera-cli-1.0-SNAPSHOT-jar-with-dependencies.jar --help

I get the following exception:

Exception in thread "main" com.beust.jcommander.ParameterException: Unknown option: --help
        at com.beust.jcommander.JCommander.parseValues(JCommander.java:742)
        at com.beust.jcommander.JCommander.parse(JCommander.java:282)
        at com.beust.jcommander.JCommander.parse(JCommander.java:265)
        at org.verapdf.cli.VeraPdfCli.main(VeraPdfCli.java:32)

This should return a standard help message, related to #63 as no parameters should issue the same help message.

XmpParsingException resulting in "PDF file is not compliant with Validation Profile requirements"

If I open certain PDFs, trying to validate them against PDF/A-1a or 1b profiles, I get an error in the GUI: "PDF file is not compliant with Validation Profile requirements"

Simultaneously the console says:

10:20:11,045 ERROR SwingWorker-pool-2-thread-1 XMPChecker:doesInfoMatchXMP:93 - Problems with XMP parsing. Missing pdfaSchema:property in type definition
org.apache.xmpbox.xml.XmpParsingException: Missing pdfaSchema:property in type definition
        at org.apache.xmpbox.xml.PdfaExtensionHelper.populatePDFASchemaType(PdfaExtensionHelper.java:152)
        at org.apache.xmpbox.xml.PdfaExtensionHelper.populateSchemaMapping(PdfaExtensionHelper.java:116)
        at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:169)
        at org.verapdf.model.tools.XMPChecker.doesInfoMatchXMP(XMPChecker.java:74)
        at org.verapdf.model.impl.pb.cos.PBCosDocument.<init>(PBCosDocument.java:101)
        at org.verapdf.model.impl.pb.cos.PBCosDocument.<init>(PBCosDocument.java:63)
        at org.verapdf.model.ModelParser.getRoot(ModelParser.java:59)
        at org.verapdf.pdfa.validators.BaseValidator.validate(BaseValidator.java:371)
        at org.verapdf.gui.ValidateWorker.runValidator(ValidateWorker.java:171)
        at org.verapdf.gui.ValidateWorker.doInBackground(ValidateWorker.java:95)
        at org.verapdf.gui.ValidateWorker.doInBackground(ValidateWorker.java:46)
        at javax.swing.SwingWorker$1.call(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at javax.swing.SwingWorker.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
10:20:15,904 ERROR SwingWorker-pool-2-thread-1 PBoxPDMetadata:getXMPPackage:80 - Problems with parsing metadata. Missing pdfaSchema:property in type definition
org.apache.xmpbox.xml.XmpParsingException: Missing pdfaSchema:property in type definition
        at org.apache.xmpbox.xml.PdfaExtensionHelper.populatePDFASchemaType(PdfaExtensionHelper.java:152)
        at org.apache.xmpbox.xml.PdfaExtensionHelper.populateSchemaMapping(PdfaExtensionHelper.java:116)
        at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:169)
        at org.verapdf.model.impl.pb.pd.PBoxPDMetadata.getXMPPackage(PBoxPDMetadata.java:74)
        at org.verapdf.model.impl.pb.pd.PBoxPDMetadata.getLinkedObjects(PBoxPDMetadata.java:59)
        at org.verapdf.pdfa.validators.BaseValidator.addAllLinkedObjects(BaseValidator.java:194)
        at org.verapdf.pdfa.validators.BaseValidator.checkNext(BaseValidator.java:136)
        at org.verapdf.pdfa.validators.BaseValidator.validate(BaseValidator.java:87)
        at org.verapdf.pdfa.validators.BaseValidator.validate(BaseValidator.java:371)
        at org.verapdf.gui.ValidateWorker.runValidator(ValidateWorker.java:171)
        at org.verapdf.gui.ValidateWorker.doInBackground(ValidateWorker.java:95)
        at org.verapdf.gui.ValidateWorker.doInBackground(ValidateWorker.java:46)
        at javax.swing.SwingWorker$1.call(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at javax.swing.SwingWorker.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

The document can be viewed in Acrobat Reader or in Chrome's built-in PDF viewer.

Windows batch script error, possibly caused by insanely long classpath

Here's the suspect line from the generated script:

set CLASSPATH="%BASEDIR%"\etc;"%REPO%"\org\verapdf\core\0.1.196\core-0.1.196.jar;"%REPO%"\org\verapdf\verapdf-exceptions\0.1.196\verapdf-exceptions-0.1.196.jar;"%REPO%"\xmlunit\xmlunit\1.6\xmlunit-1.6.jar;"%REPO%"\junit\junit\4.12\junit-4.12.jar;"%REPO%"\org\hamcrest\hamcrest-core\1.3\hamcrest-core-1.3.jar;"%REPO%"\com\google\guava\guava\18.0\guava-18.0.jar;"%REPO%"\org\verapdf\validation-profile-parser\0.1.196\validation-profile-parser-0.1.196.jar;"%REPO%"\org\verapdf\validation-logic\0.1.196\validation-logic-0.1.196.jar;"%REPO%"\rhino\js\1.7R2\js-1.7R2.jar;"%REPO%"\org\verapdf\pdf-model\0.1.177\pdf-model-0.1.177.jar;"%REPO%"\org\verapdf\validation-report\0.1.196\validation-report-0.1.196.jar;"%REPO%"\org\verapdf\feature-report\0.1.196\feature-report-0.1.196.jar;"%REPO%"\org\verapdf\model-implementation\0.1.196\model-implementation-0.1.196.jar;"%REPO%"\org\verapdf\pdfbox\pdfbox\2.0.0-SNAPSHOT\pdfbox-2.0.0-20150707.113449-43.jar;"%REPO%"\org\verapdf\pdfbox\fontbox\2.0.0-SNAPSHOT\fontbox-2.0.0-20150707.113412-47.jar;"%REPO%"\commons-logging\commons-logging\1.2\commons-logging-1.2.jar;"%REPO%"\org\verapdf\pdfbox\xmpbox\2.0.0-SNAPSHOT\xmpbox-2.0.0-20150707.113430-46.jar;"%REPO%"\log4j\log4j\1.2.17\log4j-1.2.17.jar;"%REPO%"\org\bouncycastle\bcprov-jdk15on\1.52\bcprov-jdk15on-1.52.jar;"%REPO%"\org\codehaus\mojo\versions-maven-plugin\2.2\versions-maven-plugin-2.2.jar;"%REPO%"\org\apache\maven\maven-artifact\2.2.1\maven-artifact-2.2.1.jar;"%REPO%"\org\apache\maven\maven-artifact-manager\2.2.1\maven-artifact-manager-2.2.1.jar;"%REPO%"\org\apache\maven\maven-repository-metadata\2.2.1\maven-repository-metadata-2.2.1.jar;"%REPO%"\backport-util-concurrent\backport-util-concurrent\3.1\backport-util-concurrent-3.1.jar;"%REPO%"\org\apache\maven\maven-core\2.2.1\maven-core-2.2.1.jar;"%REPO%"\org\apache\maven\maven-plugin-parameter-documenter\2.2.1\maven-plugin-parameter-documenter-2.2.1.jar;"%REPO%"\org\apache\maven\wagon\wagon-http-lightweight\1.0-beta-6\wagon-http-lightweight-1.0
-beta-6.jar;"%REPO%"\org\apache\maven\wagon\wagon-http-shared\1.0-beta-6\wagon-http-shared-1.0-beta-6.jar;"%REPO%"\nekohtml\xercesMinimal\1.9.6.2\xercesMinimal-1.9.6.2.jar;"%REPO%"\nekohtml\nekohtml\1.9.6.2\nekohtml-1.9.6.2.jar;"%REPO%"\org\apache\maven\wagon\wagon-http\1.0-beta-6\wagon-http-1.0-beta-6.jar;"%REPO%"\org\apache\maven\wagon\wagon-webdav-jackrabbit\1.0-beta-6\wagon-webdav-jackrabbit-1.0-beta-6.jar;"%REPO%"\org\apache\jackrabbit\jackrabbit-webdav\1.5.0\jackrabbit-webdav-1.5.0.jar;"%REPO%"\org\apache\jackrabbit\jackrabbit-jcr-commons\1.5.0\jackrabbit-jcr-commons-1.5.0.jar;"%REPO%"\commons-httpclient\commons-httpclient\3.0\commons-httpclient-3.0.jar;"%REPO%"\org\slf4j\slf4j-nop\1.5.3\slf4j-nop-1.5.3.jar;"%REPO%"\org\slf4j\slf4j-jdk14\1.5.6\slf4j-jdk14-1.5.6.jar;"%REPO%"\org\slf4j\slf4j-api\1.5.6\slf4j-api-1.5.6.jar;"%REPO%"\org\slf4j\jcl-over-slf4j\1.5.6\jcl-over-slf4j-1.5.6.jar;"%REPO%"\org\apache\maven\maven-profile\2.2.1\maven-profile-2.2.1.jar;"%REPO%"\org\apache\maven\maven-error-diagnostics\2.2.1\maven-error-diagnostics-2.2.1.jar;"%REPO%"\commons-cli\commons-cli\1.2\commons-cli-1.2.jar;"%REPO%"\org\apache\maven\wagon\wagon-ssh-external\1.0-beta-6\wagon-ssh-external-1.0-beta-6.jar;"%REPO%"\org\apache\maven\wagon\wagon-ssh-common\1.0-beta-6\wagon-ssh-common-1.0-beta-6.jar;"%REPO%"\org\apache\maven\maven-monitor\2.2.1\maven-monitor-2.2.1.jar;"%REPO%"\org\apache\maven\wagon\wagon-ssh\1.0-beta-6\wagon-ssh-1.0-beta-6.jar;"%REPO%"\com\jcraft\jsch\0.1.38\jsch-0.1.38.jar;"%REPO%"\classworlds\classworlds\1.1\classworlds-1.1.jar;"%REPO%"\org\sonatype\plexus\plexus-sec-dispatcher\1.3\plexus-sec-dispatcher-1.3.jar;"%REPO%"\org\sonatype\plexus\plexus-cipher\1.4\plexus-cipher-1.4.jar;"%REPO%"\org\apache\maven\maven-model\2.2.1\maven-model-2.2.1.jar;"%REPO%"\org\apache\maven\maven-plugin-api\2.2.1\maven-plugin-api-2.2.1.jar;"%REPO%"\org\apache\maven\maven-plugin-descriptor\2.2.1\maven-plugin-descriptor-2.2.1.jar;"%REPO%"\org\apache\maven\maven-settings\2.2.1\maven-s
ettings-2.2.1.jar;"%REPO%"\org\codehaus\plexus\plexus-interpolation\1.11\plexus-interpolation-1.11.jar;"%REPO%"\org\apache\maven\maven-project\2.2.1\maven-project-2.2.1.jar;"%REPO%"\org\apache\maven\maven-plugin-registry\2.2.1\maven-plugin-registry-2.2.1.jar;"%REPO%"\org\apache\maven\reporting\maven-reporting-api\3.0\maven-reporting-api-3.0.jar;"%REPO%"\org\apache\maven\reporting\maven-reporting-impl\2.2\maven-reporting-impl-2.2.jar;"%REPO%"\commons-validator\commons-validator\1.3.1\commons-validator-1.3.1.jar;"%REPO%"\commons-beanutils\commons-beanutils\1.7.0\commons-beanutils-1.7.0.jar;"%REPO%"\commons-digester\commons-digester\1.6\commons-digester-1.6.jar;"%REPO%"\org\apache\maven\shared\maven-common-artifact-filters\1.4\maven-common-artifact-filters-1.4.jar;"%REPO%"\org\apache\maven\wagon\wagon-provider-api\2.5\wagon-provider-api-2.5.jar;"%REPO%"\org\apache\maven\wagon\wagon-file\2.5\wagon-file-2.5.jar;"%REPO%"\org\apache\maven\doxia\doxia-core\1.4\doxia-core-1.4.jar;"%REPO%"\org\apache\maven\doxia\doxia-logging-api\1.4\doxia-logging-api-1.4.jar;"%REPO%"\org\codehaus\plexus\plexus-component-annotations\1.5.5\plexus-component-annotations-1.5.5.jar;"%REPO%"\xerces\xercesImpl\2.9.1\xercesImpl-2.9.1.jar;"%REPO%"\xml-apis\xml-apis\1.3.04\xml-apis-1.3.04.jar;"%REPO%"\org\apache\httpcomponents\httpclient\4.0.2\httpclient-4.0.2.jar;"%REPO%"\commons-codec\commons-codec\1.3\commons-codec-1.3.jar;"%REPO%"\org\apache\httpcomponents\httpcore\4.0.1\httpcore-4.0.1.jar;"%REPO%"\org\apache\maven\doxia\doxia-sink-api\1.4\doxia-sink-api-1.4.jar;"%REPO%"\org\apache\maven\doxia\doxia-site-renderer\1.4\doxia-site-renderer-1.4.jar;"%REPO%"\org\apache\maven\doxia\doxia-decoration-model\1.4\doxia-decoration-model-1.4.jar;"%REPO%"\org\apache\maven\doxia\doxia-module-xhtml\1.4\doxia-module-xhtml-1.4.jar;"%REPO%"\org\apache\maven\doxia\doxia-module-fml\1.4\doxia-module-fml-1.4.jar;"%REPO%"\org\codehaus\plexus\plexus-velocity\1.1.7\plexus-velocity-1.1.7.jar;"%REPO%"\org\apache\velocity\velo
city\1.5\velocity-1.5.jar;"%REPO%"\oro\oro\2.0.8\oro-2.0.8.jar;"%REPO%"\org\apache\velocity\velocity-tools\2.0\velocity-tools-2.0.jar;"%REPO%"\commons-chain\commons-chain\1.1\commons-chain-1.1.jar;"%REPO%"\dom4j\dom4j\1.1\dom4j-1.1.jar;"%REPO%"\sslext\sslext\1.2-0\sslext-1.2-0.jar;"%REPO%"\org\apache\struts\struts-core\1.3.8\struts-core-1.3.8.jar;"%REPO%"\antlr\antlr\2.7.2\antlr-2.7.2.jar;"%REPO%"\org\apache\struts\struts-taglib\1.3.8\struts-taglib-1.3.8.jar;"%REPO%"\org\apache\struts\struts-tiles\1.3.8\struts-tiles-1.3.8.jar;"%REPO%"\commons-collections\commons-collections\3.2.1\commons-collections-3.2.1.jar;"%REPO%"\org\codehaus\plexus\plexus-utils\3.0.20\plexus-utils-3.0.20.jar;"%REPO%"\org\codehaus\plexus\plexus-container-default\1.5.5\plexus-container-default-1.5.5.jar;"%REPO%"\org\codehaus\plexus\plexus-classworlds\2.2.2\plexus-classworlds-2.2.2.jar;"%REPO%"\org\apache\xbean\xbean-reflect\3.4\xbean-reflect-3.4.jar;"%REPO%"\commons-logging\commons-logging-api\1.1\commons-logging-api-1.1.jar;"%REPO%"\com\google\collections\google-collections\1.0\google-collections-1.0.jar;"%REPO%"\org\codehaus\plexus\plexus-interactivity-api\1.0-alpha-6\plexus-interactivity-api-1.0-alpha-6.jar;"%REPO%"\org\codehaus\plexus\plexus-i18n\1.0-beta-10\plexus-i18n-1.0-beta-10.jar;"%REPO%"\org\codehaus\woodstox\woodstox-core-asl\4.2.0\woodstox-core-asl-4.2.0.jar;"%REPO%"\javax\xml\stream\stax-api\1.0-2\stax-api-1.0-2.jar;"%REPO%"\org\codehaus\woodstox\stax2-api\3.1.1\stax2-api-3.1.1.jar;"%REPO%"\commons-lang\commons-lang\2.6\commons-lang-2.6.jar;"%REPO%"\org\verapdf\gui\0.1.196\gui-0.1.196.jar
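For context, cmd.exe documents a maximum command-line length of 8191 characters, so a generated `set CLASSPATH=...` line of this size may run into that limit. Below is a minimal Java sketch that checks a joined classpath against it. The class name, method names and the two sample entries are illustrative assumptions, not part of veraPDF or the generated script.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of veraPDF or its launcher scripts.
class ClasspathLengthCheck {
    // Documented maximum length of a single cmd.exe command line.
    static final int CMD_MAX_LINE_LENGTH = 8191;

    // Builds the line a batch script would contain for the given entries.
    static String buildSetLine(List<String> entries) {
        return "set CLASSPATH=" + String.join(";", entries);
    }

    // True if the line is longer than cmd.exe accepts.
    // Note: variables such as %REPO% expand at run time, so the real line can be longer still.
    static boolean exceedsCmdLimit(String line) {
        return line.length() > CMD_MAX_LINE_LENGTH;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList(
            "\"%BASEDIR%\"\\etc",
            "\"%REPO%\"\\org\\verapdf\\core\\0.1.196\\core-0.1.196.jar");
        String line = buildSetLine(entries);
        System.out.println(line.length() + " characters, over limit: " + exceedsCmdLimit(line));
    }
}
```

One possible way to shorten such a line, assuming all JARs can be placed in a single directory, is a wildcard classpath entry (for example `lib\*`), which the `java` launcher has supported since Java 6.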

Enhancements to GUI front end.

These suggestions are taken from the KEEPS technical report:

The “Validate” button should be renamed to something like “Run”, “Execute” or just “Go”. When we choose the option “Features”, the app is not really “Validating”, but doing a different operation.
Move the “Generate reports” line up one position in the interface so that it sits above the “Validate” button (moving the “Validate” button down accordingly), because a “Generate reports” option has to be chosen before processing starts.
Consider renaming “Generate reports” to “Action” or “Report type”.

Text reporter doesn't honour option to report successful checks

Specifying --passed or --success options in combination with verbosity -v has no effect on the output generated by the text reporter in veraPDF 0.20.3. I'm not sure whether this is intentional or not, but it should probably be clarified in the CLI documentation, or the reporter should be amended to present both passed and failed checks. Perhaps something like:

FAIL C:\Users\...\650553.pdf
  PASS 6.7.11-1
  FAIL 6.7.11-2
  FAIL 6.7.11-3
  FAIL 6.7.3-1
  ...
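The output proposed above could be produced along the following lines. This is a hypothetical sketch only: the class, the method and the `includePassed` flag are assumptions made for illustration and do not reflect veraPDF's actual reporter API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; names are illustrative and not taken from veraPDF.
class TextReportSketch {
    // Formats one file's result. The map goes from rule ID to whether the check passed.
    // Passed checks are listed only when includePassed is true.
    static String format(String file, Map<String, Boolean> checks, boolean includePassed) {
        boolean allPassed = !checks.containsValue(Boolean.FALSE);
        StringBuilder out = new StringBuilder();
        out.append(allPassed ? "PASS " : "FAIL ").append(file).append('\n');
        for (Map.Entry<String, Boolean> check : checks.entrySet()) {
            if (!check.getValue()) {
                out.append("  FAIL ").append(check.getKey()).append('\n');
            } else if (includePassed) {
                out.append("  PASS ").append(check.getKey()).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Boolean> checks = new LinkedHashMap<>();
        checks.put("6.7.11-1", true);
        checks.put("6.7.11-2", false);
        System.out.print(format("example.pdf", checks, true));
    }
}
```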

Unsupported major.minor version

I installed on Linux with the latest SDK and got this error message:
Unsupported major.minor version 51.0
Unable to test.

Regards.
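For reference, "Unsupported major.minor version 51.0" means the classes were compiled to the Java 7 class file format while the JVM that ran them is older, so the `java` found on the path is probably not the newly installed one. Below is a small Java sketch of how the version is stored in a class file header. The class and method names are illustrative assumptions, and the mapping shown holds for Java 5 and later.

```java
import java.nio.ByteBuffer;

// Illustrative helper; not part of veraPDF.
class ClassVersionCheck {
    // Reads the major version from a class file header:
    // magic number (4 bytes), minor version (2 bytes), major version (2 bytes), big-endian.
    static int majorVersion(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // ByteBuffer is big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("Not a Java class file");
        }
        buf.getShort(); // minor version, ignored here
        return buf.getShort() & 0xFFFF;
    }

    // Maps a major version to the Java release that introduced it (50 -> 6, 51 -> 7, 52 -> 8).
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) {
        // Header of a class file compiled for Java 7 (major version 51, minor version 0).
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 51};
        int major = majorVersion(header);
        System.out.println("major " + major + " needs Java " + javaRelease(major) + " or later");
    }
}
```

Running `java -version` shows which JVM the launcher actually picks up.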

Vera PDF testing with Archives NZ Born-Digital 'Preservation Masters'

Posting on behalf of Archives New Zealand and @andreakb testing of Vera PDF on our 49 PDF Preservation Masters.

When I spoke to @carlwilson on the last OPF call, he suggested we upload the XML results from these as a zip to be put through further tests at Vera PDF HQ.

Obviously with 49 PDFs we can't really test scale or performance, but these 49 should be useful from a preservation perspective as they are the most valuable PDFs in our collection, being entirely born-digital.

The PDFs are also available to be downloaded by the public if there is any further need. We're happy to perform further testing too, and so the links are only here in the attached XLS as an FYI or if you're interested.

Keen to keep the dialog going. Please do let me know if this isn't the right project to attach these to, and let us know as well what else you might need from us so that we can help improve the tool.

verapdf-0136output.zip

PDF_PMs_GDA.xlsx

JVM startup seems to be included in first file's processingTime

When processing files in batches, the first file seems to count the JVM's loading time in its processingTime attribute. Any files processed afterwards generally report significantly shorter times. Below are the results from a sample batch of five files, three of which were identical.

<report ... processingTime="00:00:02.128">  <---
<report ... processingTime="00:00:00.700">     ¦
<report ... processingTime="00:00:00.734">  <--- SAME FILE
<report ... processingTime="00:00:00.607">     ¦
<report ... processingTime="00:00:00.715">  <---

Ideally the times would be comparable between all files. As it stands, the loading and processing times can't be separated for the first file, and so can't be accurately compared with the others' times.
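A rough way to quantify the effect from the report values is to compare the first file's time with the mean of the remaining ones. The Java sketch below does that for `processingTime` strings in the `HH:mm:ss.SSS` form shown above. The class and method names are illustrative assumptions, and the estimate only makes sense if the later files are comparable to the first.

```java
import java.time.LocalTime;

// Illustrative sketch; not part of veraPDF.
class ProcessingTimeSketch {
    // Converts a processingTime value such as "00:00:02.128" to milliseconds.
    // Assumes the value is below 24 hours, since it is parsed as a time of day.
    static long toMillis(String processingTime) {
        return LocalTime.parse(processingTime).toNanoOfDay() / 1_000_000L;
    }

    // Estimates start-up overhead as the first time minus the mean of the others.
    // Requires at least two values.
    static double startupOverheadMillis(String[] times) {
        long restTotal = 0;
        for (int i = 1; i < times.length; i++) {
            restTotal += toMillis(times[i]);
        }
        return toMillis(times[0]) - (double) restTotal / (times.length - 1);
    }

    public static void main(String[] args) {
        String[] times = {
            "00:00:02.128", "00:00:00.700", "00:00:00.734", "00:00:00.607", "00:00:00.715"};
        System.out.println("Estimated start-up overhead (ms): " + startupOverheadMillis(times));
    }
}
```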

validation throws an exception

Validating the attached (invalid) file throws an exception in verapdf gui 0.16.2.

resulticbe852e3-c9ed-4668-998b-203d2a354a06.pdf

Add AFRelationship key to the features of EmbeddedFile

The value of the AFRelationship key is currently not included as a property of the embedded file in the feature report.

Difference between xml and mrr output?

Dev Effort

Description

Applies to VeraPDF 0.26.16

The -format switch of the CLI tool has 2 values (xml and mrr) that both write xml output, although the output is somewhat different in both cases. I couldn't find any documentation on the reason for having 2 different XML formats, and I found this a bit confusing. My suggestion would be to either get rid of one of the XML formats, or otherwise document the differences between them.

Out of Memory Error

I ran verapdf-gui, version 0.10, on an HP printer manual in PDF form; after about half a minute, it got an Out of Memory Error. The stack dump follows. The file is called LJ1010UG_EN.pdf and is 2431236 bytes long. Operating system: OS X 10.10.5. Running verapdf from the command line produces the same OutOfMemoryError, though of course without any Swing classes being involved.

I was able to fix the problem by adding the parameter -Xmx2048m in a copy of the verapdf script. Perhaps VeraPDF should use this parameter, or some suitable value bigger than Java's conservative default, to minimize the chance of running out of memory.
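To confirm which heap limit a JVM is actually running with, a small check like the following can help. It is a minimal sketch with an illustrative class name and is not part of veraPDF.

```java
// Illustrative check; not part of veraPDF.
class HeapLimitCheck {
    // Returns the maximum heap the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it once as `java HeapLimitCheck` and once as `java -Xmx2048m HeapLimitCheck` shows the default limit and the raised one. The reported figure can be somewhat below the `-Xmx` value, depending on the garbage collector.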

java -version
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)

/Users/gmcgath/Software/VeraPDF/VeraPDF/verapdf-gui ; exit;
07:54:55,116 ERROR AWT-EventQueue-0 CheckerPanel:errorInValidatingOccur:435 - Exception during the validation process
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at javax.swing.SwingWorker.get(SwingWorker.java:602)
at org.verapdf.gui.CheckerPanel.validationEnded(CheckerPanel.java:392)
at org.verapdf.gui.ValidateWorker.done(ValidateWorker.java:182)
at javax.swing.SwingWorker$5.run(SwingWorker.java:737)
at javax.swing.SwingWorker$DoSubmitAccumulativeRunnable.run(SwingWorker.java:832)
at sun.swing.AccumulativeRunnable.run(AccumulativeRunnable.java:112)
at javax.swing.SwingWorker$DoSubmitAccumulativeRunnable.actionPerformed(SwingWorker.java:842)
at javax.swing.Timer.fireActionPerformed(Timer.java:312)
at javax.swing.Timer$DoPostEvent.run(Timer.java:244)
at java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:312)
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:745)
at java.awt.EventQueue.access$300(EventQueue.java:103)
at java.awt.EventQueue$3.run(EventQueue.java:706)
at java.awt.EventQueue$3.run(EventQueue.java:704)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:715)
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:242)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:161)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:150)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:146)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:138)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:91)
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.fontbox.cff.IndexData.initData(IndexData.java:95)
at org.apache.fontbox.cff.CFFParser.readIndexData(CFFParser.java:163)
at org.apache.fontbox.cff.CFFParser.parseFont(CFFParser.java:393)
at org.apache.fontbox.cff.CFFParser.parse(CFFParser.java:115)
at org.apache.fontbox.ttf.CFFTable.read(CFFTable.java:53)
at org.apache.fontbox.ttf.TrueTypeFont.readTable(TrueTypeFont.java:361)
at org.apache.fontbox.ttf.OpenTypeFont.getCFF(OpenTypeFont.java:61)
at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.addTrueTypeFontImpl(FileSystemFontProvider.java:260)
at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.addTrueTypeCollection(FileSystemFontProvider.java:180)
at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.(FileSystemFontProvider.java:155)
at org.apache.pdfbox.pdmodel.font.FontMapper$DefaultFontProvider.(FontMapper.java:80)
at org.apache.pdfbox.pdmodel.font.FontMapper.getProvider(FontMapper.java:99)
at org.apache.pdfbox.pdmodel.font.FontMapper.findFont(FontMapper.java:414)
at org.apache.pdfbox.pdmodel.font.FontMapper.findFontBoxFont(FontMapper.java:383)
at org.apache.pdfbox.pdmodel.font.FontMapper.getFontBoxFont(FontMapper.java:356)
at org.apache.pdfbox.pdmodel.font.PDType1Font.(PDType1Font.java:110)
at org.apache.pdfbox.pdmodel.font.PDType1Font.(PDType1Font.java:72)
at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:62)
at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:96)
at org.verapdf.model.tools.resources.PDInheritableResources.getFont(PDInheritableResources.java:47)
at org.verapdf.model.factory.operator.OperatorParser.getFontFromResources(OperatorParser.java:489)
at org.verapdf.model.factory.operator.OperatorParser.parseOperator(OperatorParser.java:250)
at org.verapdf.model.factory.operator.OperatorFactory.operatorsFromTokens(OperatorFactory.java:49)
at org.verapdf.model.impl.pb.pd.PBoxPDContentStream.getOperators(PBoxPDContentStream.java:53)
at org.verapdf.model.impl.pb.pd.PBoxPDContentStream.getLinkedObjects(PBoxPDContentStream.java:41)
at org.verapdf.pdfa.validators.BaseValidator.addAllLinkedObjects(BaseValidator.java:194)
at org.verapdf.pdfa.validators.BaseValidator.checkNext(BaseValidator.java:136)
at org.verapdf.pdfa.validators.BaseValidator.validate(BaseValidator.java:87)
at org.verapdf.pdfa.validators.BaseValidator.validate(BaseValidator.java:371)
at org.verapdf.gui.ValidateWorker.runValidator(ValidateWorker.java:171)
at org.verapdf.gui.ValidateWorker.doInBackground(ValidateWorker.java:95)
at org.verapdf.gui.ValidateWorker.doInBackground(ValidateWorker.java:46)
