vinsguru / pdf-util


PDF Compare Utility

Java 100.00%


pdf-util's Issues

Possible to release two different versions of the artifact to Maven?

Would it be possible to publish two artifacts: a pdfutil-all, which is the jar-with-dependencies, and a pdfutil, which contains just the pdf-util code with its dependencies declared transitively? With the current approach of making the jar-with-dependencies the default artifact, it is problematic to use in a Maven build, especially if we are using different versions of commons-io, commons-logging, fontbox, pdfbox, etc.

java.lang.ArrayIndexOutOfBoundsException: Coordinate out of bounds!

Hi,
When I compare two PDF files using PDFUtil, I encounter the following exception:

java.lang.ArrayIndexOutOfBoundsException: Coordinate out of bounds!
at sun.awt.image.IntegerInterleavedRaster.getDataElements(IntegerInterleavedRaster.java:219)
at java.awt.image.BufferedImage.getRGB(BufferedImage.java:986)
at com.testautomationguru.utility.ImageUtil.compareAndHighlight(ImageUtil.java:20)
at com.testautomationguru.utility.PDFUtil.convertToImageAndCompare(PDFUtil.java:479)
at com.testautomationguru.utility.PDFUtil.comparePdfByImage(PDFUtil.java:450)
at com.testautomationguru.utility.PDFUtil.comparePdfFiles(PDFUtil.java:311)
at com.testautomationguru.utility.PDFUtil.compare(PDFUtil.java:271)
at com.sls.awb.PdfDiff2.main(PdfDiff2.java:28)

Would you please help me see what the problem is? Thank you.

My Java Code:

package com.sls.awb;
import java.io.IOException;
import com.testautomationguru.utility.CompareMode;
import com.testautomationguru.utility.PDFUtil;

public class PdfDiff2 {
    public static void main(String[] args) throws IOException {
        PDFUtil pdfUtil = new PDFUtil();
        pdfUtil.setCompareMode(CompareMode.VISUAL_MODE);

        boolean same = pdfUtil.compare(args[0], args[1]);

        if (same) {
            System.out.println("same");
        } else {
            System.out.println("diff");
        }
    }
}

The two PDFs used for the comparison are attached:
test-v1.pdf
test-v2.pdf
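
One plausible cause, judging only from the stack trace above, is that the two rendered page images have different dimensions, so a loop bounded by one image reads outside the other in BufferedImage.getRGB. The sketch below is not pdf-util's code; SafeImageCompare and countDifferences are hypothetical names. It shows a pixel comparison that only reads the overlapping area and counts everything outside it as different.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper (not part of pdf-util): pixel comparison that tolerates different image sizes. */
final class SafeImageCompare {

    private SafeImageCompare() {}

    /**
     * Counts differing pixels. Only the overlapping area is read with getRGB,
     * so neither raster is accessed out of bounds. Pixels that exist in only
     * one of the two images are counted as differences.
     */
    static long countDifferences(BufferedImage a, BufferedImage b) {
        int commonW = Math.min(a.getWidth(), b.getWidth());
        int commonH = Math.min(a.getHeight(), b.getHeight());

        long diffs = 0;
        for (int y = 0; y < commonH; y++) {
            for (int x = 0; x < commonW; x++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    diffs++;
                }
            }
        }

        // Pixels outside the overlap exist in only one image.
        long common = (long) commonW * commonH;
        long areaA = (long) a.getWidth() * a.getHeight();
        long areaB = (long) b.getWidth() * b.getHeight();
        return diffs + (areaA - common) + (areaB - common);
    }
}
```

If the page sizes of test-v1.pdf and test-v2.pdf do turn out to differ, a result greater than zero from a check like this would report "diff" instead of throwing.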

I have two PDF files with the same content, but the comparison is not working

fileToCompare=ApplicationPreview_Advice368.pdf
masterFile=MasterFileTextStyle.pdf
Could you please check why it is not working?
PDFUtil pdfUtil = new PDFUtil();
pdfUtil.setCompareMode(CompareMode.VISUAL_MODE);
pdfUtil.highlightPdfDifference(Color.RED);
pdfUtil.highlightPdfDifference(true);
pdfUtil.compareAllPages(true);
return pdfUtil.compare(masterFile, fileToCompare);

NoSuchMethodError when using Maven dependency

When I include PDFUtil as a maven dependency with:

<dependency>
    <groupId>com.testautomationguru.pdfutil</groupId>
    <artifactId>pdf-util</artifactId>
    <version>0.0.1</version>
</dependency>

I get the following error:

java.lang.NoSuchMethodError: org.apache.pdfbox.pdmodel.PDDocument.getPage(I)Lorg/apache/pdfbox/pdmodel/PDPage;
at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:108)
at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:94)
at com.testautomationguru.utility.PDFUtil.convertToImageAndCompare(PDFUtil.java:470)
at com.testautomationguru.utility.PDFUtil.comparePdfByImage(PDFUtil.java:443)
at com.testautomationguru.utility.PDFUtil.comparePdfFiles(PDFUtil.java:304)
at com.testautomationguru.utility.PDFUtil.compare(PDFUtil.java:264)

When I just add the pdfutil.jar to my project it works fine.
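
A NoSuchMethodError like this usually means that a different version of a class was loaded at runtime than the one the code was compiled against. The following sketch is a generic diagnostic, not part of pdf-util; ClasspathProbe and locationOf are hypothetical names. It prints where a class was actually loaded from, which can help spot a second PDFBox jar on the classpath.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic helper: reports which jar or directory a class was loaded from. */
final class ClasspathProbe {

    private ClasspathProbe() {}

    static String locationOf(String className) {
        try {
            // Load without running static initializers; we only need the Class object.
            Class<?> c = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "(JDK / bootstrap class loader)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }
}
```

Calling ClasspathProbe.locationOf("org.apache.pdfbox.pdmodel.PDDocument") from the failing project shows which jar supplies PDDocument. If it is not the jar you expect, running mvn dependency:tree should show which dependency pulls in the other version.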

No output after comparing PDF files in visual mode with result storing enabled

Hi there!
First of all, thank you for sharing this library. It is very useful for verifying the content of PDF files.

The feature I find most valuable is comparing PDF files in visual mode. I followed the script you show in the documentation:

pdfUtil.setCompareMode(CompareMode.VISUAL_MODE)
pdfUtil.compare(file1, file2)

// compare the 3rd page alone
pdfUtil.compare(file1, file2, startPage, endPage)

// if you need to store the result
pdfUtil.highlightPdfDifference(true)
pdfUtil.setImageDestinationPath(destinationPath)
pdfUtil.compare(file1, file2)

However, there is no output in the destination folder after calling those scripts. Could you please help to clarify the output we should expect with those scripts?

Thanks and Regards,

Comparison stops as soon as differences are found

Use case: I have a couple of PDFs with slight differences on multiple pages. I want to use the tool to highlight the differences.

Expected outcome: Multiple PNG files with the highlighted differences.

Actual outcome: I only get the first page with the highlighted differences.

Is this intended behaviour, or am I missing something?

import com.testautomationguru.utility.PDFUtil;
import com.testautomationguru.utility.CompareMode;

public class PDFCompare {
  public static void main(String[] args) throws java.io.IOException{
    PDFUtil pdfUtil = new PDFUtil();

    String file1="files/doc1.pdf";
    String file2="files/doc2.pdf";

    pdfUtil.setCompareMode(CompareMode.VISUAL_MODE);

    //if you need to store the result
    pdfUtil.highlightPdfDifference(true);
    pdfUtil.setImageDestinationPath("files/");
    pdfUtil.compare(file1, file2, 2, 5);
  }//End of main
}

Compiling and executing

javac -cp '.:pdfutil.jar' PDFCompare.java
java -cp '.:pdfutil.jar' PDFCompare

IE scripts fail when the VDI (virtual machine) or cloud session is disconnected

Java Selenium: we run our scripts on either a virtual machine or a cloud machine.

Virtual machine: While the VM session is active, the scripts pass. Once the session is disconnected (whether we disconnect manually, the machine shuts down, or the machine is locked and later disconnects automatically), the scripts fail. On analysis, we found that the script opens the IE browser and enters the URL, but after that it does not enter the username or password and does not click any button, so the script fails.

Cloud: We face the same issue on the cloud machine, with one difference. On the VM, the scripts keep working after the machine is locked (Windows+L) until the VM disconnects, whereas on the cloud machine the scripts fail as soon as the session is locked or even minimized.

Note: Chrome scripts work fine under all of these conditions.

PDF comparison fails although both PDFs are the same, because text is not retrieved sequentially

I am using the code below to read the whole text of each PDF into a string and then compare the two strings.
String str = pdfutil.getText("C:\\Users\\" + System.getProperty("user.name") + "\\Downloads" + "\\" + prereport + ".pdf");
String str1 = pdfutil.getText("C:\\Users\\" + System.getProperty("user.name") + "\\Downloads" + "\\" + postreport + ".pdf");
System.out.println("Check the text from both PDFs : " + str.equalsIgnoreCase(str1));

Sometimes the text is not retrieved in sequence. For example:
from the first PDF the retrieved text is --- $497.10 0.51 - Investment Cash
from the second PDF the retrieved text is --- $497.10 -0.51 Investment Cash

One string contains "0.51 -" and the other contains "-0.51", so the PDF comparison fails.

Please see the screenshot above for how the value actually looks in both PDFs. Ideally the text should be retrieved sequentially and the comparison should succeed. Please help me resolve this issue.
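
One possible workaround is to normalize both strings before comparing them. The sketch below is not a pdf-util feature; TextNormalizer, normalize and sameText are hypothetical names. It assumes the only discrepancy is the one shown in the example: a minus sign that is extracted after a number in one PDF and before it in the other. It moves a detached trailing minus in front of the number and collapses whitespace.

```java
/** Hypothetical helper: normalizes extracted PDF text before a string comparison. */
final class TextNormalizer {

    private TextNormalizer() {}

    /**
     * Rewrites "0.51 -" as "-0.51" (a number followed by a standalone minus)
     * and collapses runs of whitespace into single spaces.
     */
    static String normalize(String text) {
        return text
                // number, whitespace, then a "-" that is followed by whitespace or end of input
                .replaceAll("(\\d[\\d.,]*)\\s+-(?=\\s|$)", "-$1")
                .replaceAll("\\s+", " ")
                .trim();
    }

    /** Case-insensitive comparison of the normalized texts. */
    static boolean sameText(String a, String b) {
        return normalize(a).equalsIgnoreCase(normalize(b));
    }
}
```

With the two sample lines from this issue, sameText returns true. Note the trade-off: an expression such as "10 - 5" would also be rewritten, so this is only appropriate when a standalone minus after a number always means a negative value.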

java.lang.NoSuchMethodError: org.apache.pdfbox.pdmodel.PDDocument.getPages()

Trying to run the compare method of the PDFUtil and getting this error: java.lang.NoSuchMethodError: org.apache.pdfbox.pdmodel.PDDocument.getPages()

Junk characters in text retrieved with the PDFUtil getText method

I am using the code below to read the whole text of each PDF into a string and then compare the two strings.
String str = pdfutil.getText("C:\\Users\\" + System.getProperty("user.name") + "\\Downloads" + "\\" + prereport + ".pdf");
String str1 = pdfutil.getText("C:\\Users\\" + System.getProperty("user.name") + "\\Downloads" + "\\" + postreport + ".pdf");
System.out.println("Check the text from both PDFs : " + str.equalsIgnoreCase(str1));

When I retrieve the PDF text into a string, I get characters like the following instead of the actual text:
jlkqeiv qobka obmloq _v ^``lrkq mêÉé~~êÉÇ=Ñçê ^g^o ag _ìííÉêÑáÉäÇ jçåíÜäó=qêÉåÇ=oÉéçêí=ÖÉåÉê~~íÉÇ=çå lÅíçÄÉê=NPI=OMNT=~í=PWMR=~ã=EbpqF

Compare in visual mode not working

pdfUtil.setCompareMode(CompareMode.VISUAL_MODE);
pdfUtil.highlightPdfDifference(true);
pdfUtil.setImageDestinationPath(resultFileDestFolder);
pdfUtil.compare(file1, file2);

This does not produce any output file. Can somebody help?

Generate Diff PDF

Hi,
Can it generate a diff PDF file showing the differences page by page?
Deep

Is it possible to skip a few pixel areas when comparing?

I have one reference PDF and compare newly generated PDFs against it. However, every newly generated PDF contains a unique barcode.

As a result, my tests always fail when I run the comparison. I need to skip a few pixel areas and compare the rest.

Please let me know if this is possible.

Thank you.
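
The idea of skipping pixel areas can be sketched independently of pdf-util, assuming the pages have already been rendered to BufferedImage objects. MaskedImageCompare and sameOutside are hypothetical names, and the rectangle coordinates are placeholders for wherever the barcode sits on the rendered page.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.List;

/** Hypothetical helper: compares two page images while ignoring given rectangular regions. */
final class MaskedImageCompare {

    private MaskedImageCompare() {}

    /** Returns true if all pixels outside the ignored rectangles are identical. */
    static boolean sameOutside(BufferedImage a, BufferedImage b, List<Rectangle> ignored) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return false; // different page sizes are treated as a difference
        }
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (isIgnored(x, y, ignored)) {
                    continue; // e.g. the barcode area
                }
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    return false;
                }
            }
        }
        return true;
    }

    private static boolean isIgnored(int x, int y, List<Rectangle> ignored) {
        for (Rectangle r : ignored) {
            if (r.contains(x, y)) {
                return true;
            }
        }
        return false;
    }
}
```

The rectangles are in pixel coordinates of the rendered image, so they depend on the DPI used for rendering.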

Selenium Grid: access a downloaded file on the node machine

I am using Selenium Grid, and for one script a file is downloaded by default into the Downloads folder of the node machine.
Using Talk2Grid I retrieved NODEIP, and to access the downloaded file I used the path below in my code:
Download path: "\\\\" + NODEIP + "\\users\\" + System.getProperty("user.name") + "\\Downloads"

NODEIP is resolved correctly, but System.getProperty("user.name") returns the hub user.
Can anyone please advise how to access a file downloaded on the node machine?

pdf-util-0.0.1.jar contains dependency classes

Hi,

thanks for providing pdf-util. 👍

We had some problems using pdf-util, because the jar contains the classes of the dependencies. Our project is using commons-io 2.5 but pdf-util uses commons-io 1.3.2 and pdf-util adds these classes to the classpath.

Please consider removing the classes from the jar, or use the Maven Shade plugin to relocate the classes (see https://maven.apache.org/plugins/maven-shade-plugin/examples/class-relocation.html for details; I could send you a pull request, if you like).

Both solutions will prevent dependency conflicts.

Is there a Python version of this?

Hi,
I can't seem to figure out how to use this library in Python. Is there a Python version of this library as well?

Licence.txt?

Thanks for writing such a useful utility. Under what licensing terms is this software made available? Would it be possible to add a license.txt file to the repo?