aspose-pdf / aspose.pdf-for-java
Aspose.PDF for Java examples, plugins and showcases
Home Page: https://products.aspose.com/pdf/java
License: MIT License
Hi,
We are trying to convert a PDF to a Word document using the Aspose API.
However, the Word document we get after conversion loses its whitespace characters, and the text in it overlaps.
Could you please provide some sample code for PDF to DOC conversion, or any customizations that would improve the conversion?
We are using Aspose PDF 11.0.0.jar.
Please see the image below for reference (there are no spaces between the words).
How do I convert a TXT file to a PDF file?
The text in the PDF uses embedded fonts, so I can't match the phone number with a regex pattern.
https://lagou-zhaopin-fe.lagou.com/activities/20221229/1672295126482.pdf
public static final String PHONE_REG = "(?:(?:1[-\\s]*[3456789][-\\s]*\\d{1}[-\\s]*\\d{1}[-\\s]*\\d{1}[-\\s]*\\d{1}[-\\s]*\\d{1}[-\\s]*\\d{1}[-\\s]*\\d{1}[-\\s]*\\d{1}[-\\s]*\\d{1})|(?:0[1-9]\\d{1,2}[-\\s]*\\d{7,8}))(?!\\d)";
public static void main(String[] args) throws Exception {
byte[] source = FileUtils.readFileToByteArray(new File("/1672295126482.pdf"));
if (!getLicense()) {
throw new Exception("com.aspose.pdf lic ERROR!");
}
try (ByteArrayInputStream searchInputStream = new ByteArrayInputStream(source); ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
Document pdfDoc = new Document(searchInputStream);
TextSearchOptions textSearchOptions = new TextSearchOptions(true);
TextEditOptions textEditOptions = new TextEditOptions(0, TextEditOptions.LanguageTransformation.class);
TextFragmentAbsorber phoneTextFragmentAbsorber = new TextFragmentAbsorber(
PHONE_REG,
textSearchOptions,
textEditOptions);
PageCollection pages = pdfDoc.getPages();
Page page = pages.get_Item(1);
page.accept(phoneTextFragmentAbsorber);
for (TextFragment textFragment : phoneTextFragmentAbsorber.getTextFragments()) {
String text = textFragment.getText();
logger.info("phone: " + text);
}
} catch (Exception e) {
e.printStackTrace();
}
}
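To separate a pattern problem from a text-extraction problem, the pattern can be tried on a plain string first. The sketch below uses only `java.util.regex` and does not touch the Aspose API; `PhonePatternCheck` and `findPhones` are hypothetical names, and the pattern is a compact rewrite of `PHONE_REG` that should match the same strings. If it matches plain text but nothing is found in the PDF, the extracted text itself is the more likely cause.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: checks the phone pattern against plain strings only.
class PhonePatternCheck {
    // Compact form of PHONE_REG: an 11-digit mobile number whose digits may be
    // separated by hyphens or whitespace, or a landline number with an area code.
    static final Pattern PHONE = Pattern.compile(
        "(?:(?:1[-\\s]*[3-9](?:[-\\s]*\\d){9})"
        + "|(?:0[1-9]\\d{1,2}[-\\s]*\\d{7,8}))(?!\\d)");

    // Returns every match with the hyphen and whitespace separators removed.
    static List<String> findPhones(String text) {
        List<String> phones = new ArrayList<>();
        Matcher matcher = PHONE.matcher(text);
        while (matcher.find()) {
            phones.add(matcher.group().replaceAll("[-\\s]", ""));
        }
        return phones;
    }
}
```

For example, `PhonePatternCheck.findPhones("Tel: 138-0013 8000")` yields a list containing `13800138000`.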
Hi, is there a way to convert a PDF page to vector drawing instructions? I tried to use GraphicsDevice but am not sure it is the right solution.
Here is the sample code for replacing text in a PDF with aspose-pdf.jar. It only replaces text on the first page of the PDF.
public static void replaceTextOnAllPages() {
// Open document
Document pdfDocument = new Document("source.pdf");
// Create TextAbsorber object to find all instances of the input search phrase
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("sample");
// Accept the absorber for first page of document
pdfDocument.getPages().accept(textFragmentAbsorber);
// Get the extracted text fragments into collection
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();
// Loop through the fragments
for (TextFragment textFragment : (Iterable<TextFragment>) textFragmentCollection) {
textFragment.setText("New Pharase");
}
pdfDocument.save("Updated_Text.pdf");
}
Now I want to replace the text on all pages of the PDF. How can I do that?
As described, I used the same code from (http://www.aspose.com/docs/display/pdfjava/Print+PDF+file+to+default+printer+%28facades%29).
But the default printer prints only a blank page and nothing else.
Hello.
I've run into a problem with the com.aspose.pdf.examples.AsposePdfExamples.DocumentConversion.ConvertPCLToPDFFormat class. It fails with a NullPointerException on line 12:
Exception in thread "main" java.lang.NullPointerException
at com.aspose.pdf.internal.pcl.util.BaseFontHelper.lI(Unknown Source)
at com.aspose.pdf.internal.pcl.util.BaseFontHelper.lI(Unknown Source)
at com.aspose.pdf.internal.l94n.lI.<init>(Unknown Source)
at com.aspose.pdf.internal.l94n.lt.<init>(Unknown Source)
at com.aspose.pdf.internal.pcl.composer.lI.lu(Unknown Source)
at com.aspose.pdf.internal.pcl.composer.lI.<init>(Unknown Source)
at com.aspose.pdf.l12j.lI(Unknown Source)
at com.aspose.pdf.l12j.lI(Unknown Source)
at com.aspose.pdf.ADocument.lI(Unknown Source)
at com.aspose.pdf.ADocument.<init>(Unknown Source)
at com.aspose.pdf.Document.<init>(Unknown Source)
at com.aspose.pdf.examples.AsposePdfExamples.DocumentConversion.ConvertPCLToPDFFormat.main(ConvertPCLToPDFFormat.java:12)
The only thing I changed was the filename in the Document constructor: I changed "Document.pcl" to "src/main/resources/com/aspose/pdf/examples/AsposePdf/Conversion/pcltopdf/test.pcl" so that the constructor can find the source.
Is there a problem with the "test.pcl" file, or is it something else?
Thank you in advance.
Why does converting a PDF to Word with aspose-pdf use so much RAM? My Tomcat goes down, and I do not have access to increase PermSize. Why does PDF to Word conversion use so much more RAM than Excel to PDF conversion?
My code is:
import com.aspose.cells.FileFormatType;
import com.aspose.cells.Workbook;
import com.aspose.cells.WorksheetCollection;
import com.aspose.pdf.Document;
import com.aspose.pdf.SaveFormat;
import com.bpm.mis.config.Constants;
import com.sun.corba.se.impl.orbutil.closure.Constant;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Service;
public class AsposeExporter {
private final Logger log = LoggerFactory.getLogger(AsposeExporter.class);
private String homePath = System.getProperty("user.home");
private String licenseDir = homePath + "/aspose/";
private final String token;
private Workbook workbook;
public AsposeExporter(String token) throws Exception {
this.token = token;
checkAsposeLicense();
workbook = new Workbook(Constants.EXP_DIR + token + ".xlsx");
}
private void checkAsposeLicense() {
try {
com.aspose.cells.License licenseCell = new com.aspose.cells.License();
licenseCell.setLicense(licenseDir+"Aspose.Cells.lic");
com.aspose.pdf.License licensePdf = new com.aspose.pdf.License();
licensePdf.setLicense(licenseDir+"Aspose.Pdf.lic");
} catch (Exception e) {
log.warn("aspose licence is not set!",e);
}
}
public void trimExcelWorkbook(){
WorksheetCollection indList =workbook.getWorksheets();
for(int ind=indList.getCount()-1;ind>1;ind--) {
workbook.getWorksheets().removeAt(ind);
}
}
public void exportPdf() {
try {
workbook.save(Constants.EXP_DIR + token + ".pdf", FileFormatType.PDF);
workbook.dispose();
} catch (Exception e) {
log.error("aspose export to pdf error!", e);
}
}
public void exportWord() {
exportPdf();
Document document = new Document(Constants.EXP_DIR +token+".pdf");
document.save(Constants.EXP_DIR + token + ".docx", SaveFormat.DocX);
}
}
/Users/shubham.thakur/.bundle/ruby/2.6.0/aspose_java_for_ruby-2aa92d5f681c/lib/aspose_java_for_ruby.rb:28: [BUG] Segmentation fault at 0x0000000000000000
ruby 2.6.3p62 (2019-04-16 revision 67580) [universal.x86_64-darwin20]
-- Crash Report log information --------------------------------------------
See Crash Report log file under the one of following:
* ~/Library/Logs/DiagnosticReports
* /Library/Logs/DiagnosticReports
for more details.
Don't forget to include the above Crash Report log file in bug reports.
E: FATAL EXCEPTION: Thread-15
Process: com.pdftechnologies.pdfreaderpro, PID: 15504
java.lang.NoClassDefFoundError: Failed resolution of: Ljava/awt/GraphicsEnvironment;
at com.aspose.pdf.internal.l71t.lI.lj(Unknown Source)
at com.aspose.pdf.internal.l71t.lI.lI(Unknown Source)
at com.aspose.pdf.internal.l71t.lI.<clinit>(Unknown Source)
at com.aspose.pdf.internal.l71t.lI.lj(Unknown Source)
at com.aspose.pdf.internal.l67k.l1y.<clinit>(Unknown Source)
at com.aspose.pdf.internal.l67k.l1y.lI(Unknown Source)
at com.aspose.pdf.internal.l67k.ld.l0f(Unknown Source)
at com.aspose.pdf.Color.getBlack(Unknown Source)
at com.aspose.pdf.GraphInfo.<init>(Unknown Source)
at com.aspose.pdf.Page.<init>(Unknown Source)
at com.aspose.pdf.PageCollection.lI(Unknown Source)
at com.aspose.pdf.PageCollection.add(Unknown Source)
at com.aspose.pdf.PageCollection.add(Unknown Source)
at java.lang.Thread.run(Thread.java:760)
Caused by: java.lang.ClassNotFoundException: Didn't find class "java.awt.GraphicsEnvironment" on path: DexPathList[[zip file "/data/app/com.pdftechnologies.pdfreaderpro-2/base.apk"],nativeLibraryDirectories=[/data/app/com.pdftechnologies.pdfreaderpro-2/lib/arm, /data/app/com.pdftechnologies.pdfreaderpro-2/base.apk!/lib/armeabi-v7a, /system/lib, /vendor/lib]]
at dalvik.system.BaseDexClassLoader.findClass(BaseDexClassLoader.java:56)
at java.lang.ClassLoader.loadClass(ClassLoader.java:380)
at java.lang.ClassLoader.loadClass(ClassLoader.java:312)
at com.aspose.pdf.internal.l71t.lI.lj(Unknown Source)
at com.aspose.pdf.internal.l71t.lI.lI(Unknown Source)
at com.aspose.pdf.internal.l71t.lI.<clinit>(Unknown Source)
at com.aspose.pdf.internal.l71t.lI.lj(Unknown Source)
at com.aspose.pdf.internal.l67k.l1y.<clinit>(Unknown Source)
at com.aspose.pdf.internal.l67k.l1y.lI(Unknown Source)
at com.aspose.pdf.internal.l67k.ld.l0f(Unknown Source)
at com.aspose.pdf.Color.getBlack(Unknown Source)
at com.aspose.pdf.GraphInfo.<init>(Unknown Source)
at com.aspose.pdf.Page.<init>(Unknown Source)
at com.aspose.pdf.PageCollection.lI(Unknown Source)
at com.aspose.pdf.PageCollection.add(Unknown Source)
at com.aspose.pdf.PageCollection.add(Unknown Source)
at com.aspose.pdf.ADocument.l1v(Unknown Source)
at com.aspose.pdf.ADocument.isLicensed(Unknown Source)
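Environment: Android Studio, Android application, with aspose-pdf-18.11.jar imported.
Usage: converting PDF to Doc, Docx, PPtx, xls and other formats throws the exception above.
Does this JAR support the Android platform, including the conversion features available in Java?
Conversion types currently supported in the code: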
public static final int Doc = 1;
public static final int Xps = 2;
public static final int Html = 3;
public static final int Xml = 4;
public static final int TeX = 5;
public static final int DocX = 6;
public static final int Svg = 7;
public static final int MobiXml = 8;
public static final int Excel = 9;
public static final int Epub = 10;
public static final int Plugin = 11;
public static final int Pptx = 14;
Contact: [email protected]
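The trace above fails while resolving `java.awt.GraphicsEnvironment`, a class the Android runtime does not ship. A quick way to confirm what a given runtime provides is to probe for the class by name before calling into code that needs it. This is a minimal sketch using only the JDK; `RuntimeClassCheck` and `isClassAvailable` are hypothetical names, and the check says nothing about which platforms the library itself supports.

```java
// Hypothetical helper: reports whether a class can be loaded on this runtime.
class RuntimeClassCheck {
    static boolean isClassAvailable(String className) {
        try {
            // initialize=false: only resolve the class, do not run its static initializers.
            Class.forName(className, false, RuntimeClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

Calling `RuntimeClassCheck.isClassAvailable("java.awt.GraphicsEnvironment")` from the failing environment should show whether the AWT classes are reachable there.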
Exception in thread "main" java.lang.ClassCastException: com.aspose.pdf.LaunchAction cannot be cast to com.aspose.pdf.GoToURIAction
`Document document = new Document("C:\Users\admin.v.ramesh\Downloads\hyperlink to file.pdf");
Page page = document.getPages().get_Item(1);
AnnotationSelector selector = new AnnotationSelector(new LinkAnnotation(page, Rectangle.getTrivial()));
page.accept(selector);
List list = selector.getSelected();
// Iterate through individual item inside list
if (list.size() == 0)
System.out.println("No Hyperlinks found..");
else {
// Loop through all the bookmarks
for(LinkAnnotation annot : (Iterable<com.aspose.pdf.LinkAnnotation>)list)
{
//Annotation an = (Annotation)annot;
// Print the destination URL
System.out.println("URL: " + ((com.aspose.pdf.GoToURIAction)annot.getAction()).getURI());
}
}`
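The exception says the link's action is a LaunchAction, so the unconditional cast to GoToURIAction fails for links that point to a file. One way to avoid this is to check the action type before casting. The sketch below illustrates the pattern with hypothetical stand-in classes so it compiles without the library; the nested `PdfAction`, `GoToURIAction` and `LaunchAction` types and their getters are assumptions for illustration, not the library's own definitions.

```java
import java.util.ArrayList;
import java.util.List;

class LinkActionSketch {
    // Hypothetical stand-ins for the library's action types.
    static abstract class PdfAction {}

    static class GoToURIAction extends PdfAction {
        private final String uri;
        GoToURIAction(String uri) { this.uri = uri; }
        String getURI() { return uri; }
    }

    static class LaunchAction extends PdfAction {
        private final String file;
        LaunchAction(String file) { this.file = file; }
        String getFile() { return file; }
    }

    // Describes each action, casting only after an instanceof check.
    static List<String> describe(List<PdfAction> actions) {
        List<String> lines = new ArrayList<>();
        for (PdfAction action : actions) {
            if (action instanceof GoToURIAction) {
                lines.add("URL: " + ((GoToURIAction) action).getURI());
            } else if (action instanceof LaunchAction) {
                lines.add("File: " + ((LaunchAction) action).getFile());
            } else {
                lines.add("Other action");
            }
        }
        return lines;
    }
}
```

In the original loop, the same check would wrap the value returned by `annot.getAction()` before it is cast.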
After converting a PDF to HTML, the output has no title. How can I insert one?
I am using Aspose.PDF for Java.
I have created a multipage PDF from multiple TIFF byte arrays, but I am not able to set the rotation.
The generated PDF is also very large, around 6-7 MB for 3 pages. Could anyone help with setting the rotation for an image and reducing the PDF size?
When replacing short text with longer text, the new text does not wrap. How can this be solved?
Could not transfer artifact com.aspose:aspose-pdf:pom:22.12 from/to AsposeJavaAPI (https://artifact.aspose.com/repo/): transfer failed for https://artifact.aspose.com/repo/com/aspose/aspose-pdf/22.12/aspose-pdf-22.12.pom
In the README, you state that the project can be opened directly with IntelliJ IDEA, and that the import feature can be used in Eclipse and NetBeans. The repository seems to have the Eclipse files (.classpath and .project), but nothing related to IntelliJ IDEA (no .idea, .iml or .ipr).
Some files that are used in the documentation and for demo purposes in the code do not exist in the repository. For instance, EmailDemo_updated.html, which is used to demo converting an HTML file to PDF, does not exist in the repository. There are many examples like this.
Hi Team,
(new PdfContentEditor()).replaceText throws "Font Arial was not found" on a Linux server.
I am not sure why it searches for the Arial font on a Linux box. Please assist.
(The same code works as expected on a Windows machine.)
Aspose jar log trace:
Exception class com.aspose.pdf.exceptions.FontNotFoundException: Font Arial was not found
com.aspose.pdf.FontRepository.findFont(Unknown Source)
com.aspose.pdf.internal.l4n.l1h.l0y(Unknown Source)
com.aspose.pdf.internal.l4n.lj.lI(Unknown Source)
com.aspose.pdf.internal.l4n.lj.lI(Unknown Source)
com.aspose.pdf.internal.l4n.lj.lI(Unknown Source)
com.aspose.pdf.internal.l4n.l2t.lI(Unknown Source)
com.aspose.pdf.internal.l5l.ld.lI(Unknown Source)
com.aspose.pdf.internal.l5l.lI.lI(Unknown Source)
com.aspose.pdf.internal.l5l.lI.lI(Unknown Source)
com.aspose.pdf.internal.l5l.ld.lj(Unknown Source)
com.aspose.pdf.internal.l5l.lh.lI(Unknown Source)
com.aspose.pdf.internal.l5l.lh.<init>(Unknown Source)
com.aspose.pdf.internal.l5l.l0t.lI(Unknown Source)
com.aspose.pdf.TextFragmentAbsorber.visit(Unknown Source)
com.aspose.pdf.facades.PdfContentEditor.replaceText(Unknown Source)
com.aspose.pdf.facades.PdfContentEditor.replaceText(Unknown Source)
Using the replace-all strategy, and FontRepository.isReplaceNotFoundFonts() returns true.
I wanted to convert Excel to Word, and it was suggested that I first convert Excel to PDF and then convert the PDF to Word. The Excel file has a chart that appears correctly in the PDF, but unfortunately it does not appear in the Word file.
import com.aspose.cells.FileFormatType;
import com.aspose.cells.Workbook;
import com.aspose.pdf.Document;
import com.aspose.pdf.SaveFormat;
public class Test {
public static void main(String[] args) throws Exception {
String dir="D:/Test/";
Workbook workbook = new Workbook(dir+"test.xlsx");
workbook.save(dir+"testexcel.pdf", FileFormatType.PDF);
Document document = new Document(dir+"testexcel.pdf");
document.save(dir+"testexcel1.docx", SaveFormat.DocX);
}
}
Adding the following lines to .gitignore helps people using Eclipse, IntelliJ or a Mac. I've added # so you can see which they refer to. I guess this might be a change you want to make for all the Aspose products.
.classpath
.project
.settings/
test-output/
.idea/
*.iml
*.iws
.DS_Store
Severely hinder development progress.
Hi,
We are trying to convert PDF to Word using Aspose. The conversion runs, but the generated Word file is only 2 KB and we are unable to open it.
Please provide a solution.
How can I export a PDF with a two-column layout?
Same type of issue as aspose-slides/Aspose.Slides-for-Java#7
Is there any reason why you picked camel case for package names?
Just wondering, as both Sun and now Oracle recommend that package names be all lower case, to avoid potential conflicts with class and interface names.
https://docs.oracle.com/javase/tutorial/java/package/namingpkgs.html
AddPageNumberStamp has the following package name:
package programmersguide.workingwithasposepdf.workingwithstampsandwatermarks.addpagenumberstamp.java;
But it should be:
package com.aspose.pdf.examples.asposepdf.stampsandwatermarks;
I've downloaded this repository and tried it with the latest version I can find (4.6.0), but it doesn't compile. Checking the documentation on your site, I also find that a lot of methods don't seem to apply to the latest version (for example, Section doesn't have a getIsLandscape() method, among others).
Is there an updated set of examples or documentation anywhere?
Thanks
Conversion stalls when run from multiple threads.
How should this be used?
Code:
public void test(){
String pdfPath = "/home/test/test.pdf";
String docxPath = "/home/test/test.docx";
// Load the source PDF from pdfPath
com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document(pdfPath);
DocSaveOptions saveOptions = new DocSaveOptions();
saveOptions.setMode(DocSaveOptions.RecognitionMode.Flow);
saveOptions.setFormat(DocSaveOptions.DocFormat.DocX);
// Save the converted document to docxPath
pdfDocument.save(docxPath, saveOptions);
}
exception:
java.lang.ArrayIndexOutOfBoundsException: -1
at java.util.ArrayList.elementData(ArrayList.java:424)
at java.util.ArrayList.get(ArrayList.java:437)
at com.aspose.pdf.internal.l0j.ly.lf(Unknown Source)
at com.aspose.pdf.internal.l0j.ly.lI(Unknown Source)
at com.aspose.pdf.internal.doc.ml.MlParagraphConverter.addParagraph(Unknown Source)
at com.aspose.pdf.internal.l99t.lk.lI(Unknown Source)
at com.aspose.pdf.internal.l99t.lk.lI(Unknown Source)
at com.aspose.pdf.internal.l0u.lh.lI(Unknown Source)
at com.aspose.pdf.internal.l99t.lk.lf(Unknown Source)
at com.aspose.pdf.internal.l99t.le.lk(Unknown Source)
at com.aspose.pdf.internal.l15p.lv.lI(Unknown Source)
at com.aspose.pdf.internal.l15p.lb.lf(Unknown Source)
at com.aspose.pdf.internal.l15t.lj.lI(Unknown Source)
at com.aspose.pdf.internal.l0j.lf.lI(Unknown Source)
at com.aspose.pdf.l4j.lI(Unknown Source)
at com.aspose.pdf.l4j.lI(Unknown Source)
at com.aspose.pdf.ADocument.lj(Unknown Source)
at com.aspose.pdf.ADocument.lI(Unknown Source)
at com.aspose.pdf.Document.lI(Unknown Source)
at com.aspose.pdf.ADocument.lI(Unknown Source)
at com.aspose.pdf.ADocument.save(Unknown Source)
at com.aspose.pdf.Document.save(Unknown Source)
at org.jeecg.smallTools.TestStr.downloadFile2Local3(TestStr.java:316)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
at com.intellij.rt.junit.IdeaTestRunner$Repeater$1.execute(IdeaTestRunner.java:38)
at com.intellij.rt.execution.junit.TestsRepeater.repeat(TestsRepeater.java:11)
at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:35)
at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:232)
at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:55)
I have a PDF file:
It was converted to Word by Aspose.PDF, and the output is:
https://ufile.io/54xv2
Why, after converting from PDF to Word, does the text in the Word file fall outside the page area, and why is the text grouped?
How can I edit this code to achieve my goal?
My code is:
Document document = new Document("x"+".pdf");
document.save("x" + ".docx", SaveFormat.DocX);
Hi, I am getting this error when I try to run PHP to Excel:
Fatal error: Uncaught Error: Class 'com\aspose\pdf\Document' not found
aspose\pdf\Document is not found in my repository. How can I solve this? Please give any suggestions.