vinsguru / pdf-util
PDF Compare Utility
Please consider publishing two artifacts: a pdfutil-all, which is the jar-with-dependencies, and a pdfutil, which contains just the pdf-util code and pulls in its dependencies transitively. The current approach of making the jar-with-dependencies the default artifact makes it problematic to use in a Maven build, especially if the build uses different versions of commons-io, commons-logging, fontbox, pdfbox, etc.
Hi,
When I compare two PDF files using PDFUtil, I encounter the following exception:
java.lang.ArrayIndexOutOfBoundsException: Coordinate out of bounds!
at sun.awt.image.IntegerInterleavedRaster.getDataElements(IntegerInterleavedRaster.java:219)
at java.awt.image.BufferedImage.getRGB(BufferedImage.java:986)
at com.testautomationguru.utility.ImageUtil.compareAndHighlight(ImageUtil.java:20)
at com.testautomationguru.utility.PDFUtil.convertToImageAndCompare(PDFUtil.java:479)
at com.testautomationguru.utility.PDFUtil.comparePdfByImage(PDFUtil.java:450)
at com.testautomationguru.utility.PDFUtil.comparePdfFiles(PDFUtil.java:311)
at com.testautomationguru.utility.PDFUtil.compare(PDFUtil.java:271)
at com.sls.awb.PdfDiff2.main(PdfDiff2.java:28)
Could you please help me find out what the problem is? Thank you.
My Java code:
package com.sls.awb;

import java.io.IOException;
import com.testautomationguru.utility.CompareMode;
import com.testautomationguru.utility.PDFUtil;

public class PdfDiff2 {
    public static void main(String[] args) throws IOException {
        PDFUtil pdfUtil = new PDFUtil();
        pdfUtil.setCompareMode(CompareMode.VISUAL_MODE);
        boolean same = pdfUtil.compare(args[0], args[1]);
        if (same) {
            System.out.println("same");
        } else {
            System.out.println("diff");
        }
    }
}
The two PDFs used for the comparison are attached:
test-v1.pdf
test-v2.pdf
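The stack trace above ends in `BufferedImage.getRGB`, which throws "Coordinate out of bounds!" when it is asked for a pixel that lies outside the image. One plausible cause, which is an assumption and not confirmed from the library source, is that the rendered pages of the two PDFs differ in size. A minimal sketch of a size-aware pixel comparison follows; the class `SafeImageCompare` and the method `sameImage` are hypothetical names and not part of pdf-util.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of pdf-util: compares two rendered pages
// without ever reading a pixel outside either image.
public class SafeImageCompare {

    // Returns true only when both images have the same size and identical pixels.
    public static boolean sameImage(BufferedImage a, BufferedImage b) {
        // Images of different dimensions cannot match pixel for pixel,
        // and looping over the larger one would read out of bounds on the smaller.
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return false;
        }
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BufferedImage small = new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB);
        BufferedImage wide = new BufferedImage(20, 10, BufferedImage.TYPE_INT_RGB);
        System.out.println(sameImage(small, small)); // prints "true"
        System.out.println(sameImage(small, wide));  // prints "false", no exception
    }
}
```

Checking the page sizes of test-v1.pdf and test-v2.pdf first would show whether this assumption applies to the attached files.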
fileToCompare=ApplicationPreview_Advice368.pdf
masterFile=MasterFileTextStyle.pdf
Could you please check why it is not working?
PDFUtil pdfUtil = new PDFUtil();
pdfUtil.setCompareMode(CompareMode.VISUAL_MODE);
pdfUtil.highlightPdfDifference(Color.RED);
pdfUtil.highlightPdfDifference(true);
pdfUtil.compareAllPages(true);
return pdfUtil.compare(masterFile, fileToCompare);
When I include PDFUtil as a maven dependency with:
<dependency>
<groupId>com.testautomationguru.pdfutil</groupId>
<artifactId>pdf-util</artifactId>
<version>0.0.1</version>
</dependency>
I get the following error:
java.lang.NoSuchMethodError: org.apache.pdfbox.pdmodel.PDDocument.getPage(I)Lorg/apache/pdfbox/pdmodel/PDPage;
at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:108)
at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:94)
at com.testautomationguru.utility.PDFUtil.convertToImageAndCompare(PDFUtil.java:470)
at com.testautomationguru.utility.PDFUtil.comparePdfByImage(PDFUtil.java:443)
at com.testautomationguru.utility.PDFUtil.comparePdfFiles(PDFUtil.java:304)
at com.testautomationguru.utility.PDFUtil.compare(PDFUtil.java:264)
When I just add pdfutil.jar to my project, it works fine.
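A `NoSuchMethodError` usually means that the class found at runtime is a different version from the one the calling code was compiled against. Here `PDFRenderer` calls `PDDocument.getPage(int)` and does not find it, which would be consistent with two PDFBox versions on the classpath (an assumption; the report does not show the dependency tree). The sketch below prints which jar a class is actually loaded from. `WhichJar` and `locate` are hypothetical names, not part of pdf-util or PDFBox.

```java
import java.security.CodeSource;

// Hypothetical diagnostic, not part of pdf-util: reports where a class is loaded from.
public class WhichJar {

    public static String locate(String className) {
        try {
            // Load without initializing, so static initializers do not run.
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader have no code source.
            return src == null ? "(JDK / bootstrap)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Class names taken from the stack trace above.
        System.out.println(locate("org.apache.pdfbox.pdmodel.PDDocument"));
        System.out.println(locate("org.apache.pdfbox.rendering.PDFRenderer"));
    }
}
```

If the two classes resolve to different jars, the Maven build is likely mixing PDFBox versions, and `mvn dependency:tree` can show which dependency brings in each one.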
Hi there!
First of all, thank you for sharing this library. It is very useful for verifying the content of PDF files.
The most valuable feature for us is comparing PDF files in visual mode. I followed the script you showed:
`pdfUtil.setCompareMode(CompareMode.VISUAL_MODE)
pdfUtil.compare(file1, file2)
// compare the 3rd page alone
pdfUtil.compare(file1, file2, startPage, endPage)
//if you need to store the result
pdfUtil.highlightPdfDifference(true)
pdfUtil.setImageDestinationPath(destinationPath)
pdfUtil.compare(file1, file2)`
However, there is no output in the destination folder after running this script. Could you please clarify what output we should expect?
Thanks and Regards,
Use case: I have a couple of PDFs with slight differences on multiple pages. I want to use the tool to highlight the differences.
Expected outcome: Multiple PNG files with the highlighted differences.
Actual outcome: I only get the first page with the highlighted differences.
Is this the intended behaviour, or am I missing something?
import com.testautomationguru.utility.PDFUtil;
import com.testautomationguru.utility.CompareMode;

public class PDFCompare {
    public static void main(String[] args) throws java.io.IOException {
        PDFUtil pdfUtil = new PDFUtil();
        String file1 = "files/doc1.pdf";
        String file2 = "files/doc2.pdf";
        pdfUtil.setCompareMode(CompareMode.VISUAL_MODE);
        // if you need to store the result
        pdfUtil.highlightPdfDifference(true);
        pdfUtil.setImageDestinationPath("files/");
        pdfUtil.compare(file1, file2, 2, 5);
    } // End of main
}
Compiling and executing:
javac -cp '.:pdfutil.jar' PDFCompare.java
java -cp '.:pdfutil.jar' PDFCompare
Java Selenium: We use either a virtual machine or the cloud to run our scripts.
Virtual machine: While we are actively connected to the virtual machine, the scripts pass. Once the VM session disconnects (for any reason: we disconnect manually, the machine shuts down, or the machine is locked and the session times out), the scripts fail. On analysis, we found that the script opens the IE browser and enters the URL, but after that it does not enter the username or password or click any button, so the scripts fail.
Cloud: On the cloud we face a similar issue, with one difference. On the VM, the scripts keep working after the machine is locked (Windows+L) until the session disconnects; on the cloud, the scripts fail as soon as the session is locked or even just minimized.
Note: The Chrome scripts work fine under all of these conditions.
I am using the code below to read the whole text of each PDF into a string and then compare the two strings.
String str = pdfutil.getText("C:\\Users\\" + System.getProperty("user.name") + "\\Downloads\\" + prereport + ".pdf");
String str1 = pdfutil.getText("C:\\Users\\" + System.getProperty("user.name") + "\\Downloads\\" + postreport + ".pdf");
System.out.println("Check the text from both PDFs : " + str.equalsIgnoreCase(str1));
Sometimes the text is not retrieved in sequential order. For example:
from the 1st PDF the retrieved text is --- $497.10 0.51 - Investment Cash
from the 2nd PDF the retrieved text is --- $497.10 -0.51 Investment Cash
One string contains "0.51 -" and the other contains "-0.51", so the PDF comparison fails.
Please see the screenshot above for how this looks in the two actual PDFs. Ideally the text should be retrieved sequentially and the PDF comparison should succeed. Please help me resolve this issue.
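As a stopgap for the ordering problem described above, the two strings can be compared in a way that ignores whitespace and character order within a line. This is only a sketch of a looser check, not a fix for the extraction order itself, and it can hide real differences (for example "12" and "21" would compare as equal). `LooseLineCompare` and `sameCharacters` are hypothetical names, not part of pdf-util.

```java
import java.util.Arrays;

// Hypothetical helper, not part of pdf-util: order-insensitive line comparison.
public class LooseLineCompare {

    // Canonical form: all whitespace removed, remaining characters sorted.
    static String canonical(String line) {
        char[] chars = line.replaceAll("\\s+", "").toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // True when both lines contain exactly the same non-whitespace characters.
    public static boolean sameCharacters(String a, String b) {
        return canonical(a).equals(canonical(b));
    }

    public static void main(String[] args) {
        // The two lines quoted in the report above.
        String first = "$497.10 0.51 - Investment Cash";
        String second = "$497.10 -0.51 Investment Cash";
        System.out.println(first.equals(second));          // prints "false"
        System.out.println(sameCharacters(first, second)); // prints "true"
    }
}
```

Applying the check line by line, and not to the whole document at once, keeps the looseness local and reduces the chance of masking a genuine change.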
Trying to run the compare method of the PDFUtil and getting this error: java.lang.NoSuchMethodError: org.apache.pdfbox.pdmodel.PDDocument.getPages()
I am using the code below to read the whole text of each PDF into a string and then compare the two strings.
String str = pdfutil.getText("C:\\Users\\" + System.getProperty("user.name") + "\\Downloads\\" + prereport + ".pdf");
String str1 = pdfutil.getText("C:\\Users\\" + System.getProperty("user.name") + "\\Downloads\\" + postreport + ".pdf");
System.out.println("Check the text from both PDFs : " + str.equalsIgnoreCase(str1));
When I retrieve the PDF text into a string, instead of readable text I get characters like the following in the retrieved string:
jlkqeiv qobka obmloq _v ^``lrkq mêÉéêÉÇ=Ñçê ^g^o ag _ìííÉêÑáÉäÇ jçåíÜäó=qêÉåÇ=oÉéçêí=ÖÉåÉêíÉÇ=çå lÅíçÄÉê=NPI=OMNT=~í=PWMR=~ã=EbpqF
pdfUtil.setCompareMode(CompareMode.VISUAL_MODE);
pdfUtil.highlightPdfDifference(true);
pdfUtil.setImageDestinationPath(resultFileDestFolder);
pdfUtil.compare(file1, file2);
This does not produce any output file. Can somebody help?
Hi,
Can it generate a diff PDF file showing the differences page by page?
Deep
I have one reference PDF and compare newly generated PDFs against it. However, every newly generated PDF contains a unique barcode.
Because of this, my tests always fail when I run the comparison. Instead, I need to skip a few pixel areas when comparing.
Please let me know if this is possible.
Thank you.
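One way to approach the barcode case is to compare the rendered page images while skipping a fixed rectangle. The sketch below works on plain `BufferedImage` objects and assumes the pages have already been rendered to images and that the barcode position is known in pixels; it does not use any pdf-util API, and `MaskedImageCompare` and `sameOutside` are hypothetical names.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of pdf-util: pixel comparison that skips one region.
public class MaskedImageCompare {

    // True when the images have equal size and identical pixels outside 'ignore'.
    public static boolean sameOutside(BufferedImage a, BufferedImage b, Rectangle ignore) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return false;
        }
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (ignore.contains(x, y)) {
                    continue; // pixel lies in the masked area, e.g. the barcode
                }
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BufferedImage reference = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        BufferedImage generated = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        generated.setRGB(10, 10, 0xFF0000); // simulate a changed pixel in the barcode

        // Mask covers the changed pixel: prints "true"
        System.out.println(sameOutside(reference, generated, new Rectangle(5, 5, 10, 10)));
        // Mask elsewhere: prints "false"
        System.out.println(sameOutside(reference, generated, new Rectangle(50, 50, 10, 10)));
    }
}
```

The rectangle is given in image pixels, so it depends on the resolution at which the pages are rendered.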
I am using Selenium Grid, and for one script the file is downloaded by default into the Downloads folder of the node machine.
Using Talk2Grid I retrieved NODEIP, and to access the downloaded file I used the path below in my code:
Download path: "\\\\" + NODEIP + "\\Users\\" + System.getProperty("user.name") + "\\Downloads"
NODEIP is resolved correctly, but System.getProperty("user.name") returns the hub user.
Can anyone please advise how to access a file downloaded on the node machine?
Hi,
thanks for providing pdf-util. 👍
We had some problems using pdf-util because the jar contains the classes of its dependencies. Our project uses commons-io 2.5,
but pdf-util uses commons-io 1.3.2
and adds these classes to the classpath.
Please consider removing the classes from the jar, or use the Maven Shade plugin to relocate them (see https://maven.apache.org/plugins/maven-shade-plugin/examples/class-relocation.html for details; I could send you a pull request if you like).
Both solutions will prevent dependency conflicts.
Hi,
I can't seem to figure out how to use this library in Python. Is there a Python version of this library as well?
Thanks for writing such a useful utility. Under what licensing terms is this software made available? Would it be possible to add a license.txt file to the repo?