Comments (6)
Hi Jatin,
pdf2docx won't OCR a PDF, so there will be no text if it's not a text-based PDF. Otherwise, could you please provide the source PDF so I can look into it?
from pdf2docx.
This is what I got as a response.
I tried with this PDF and got this.
Thank you for your reply and for helping me.
Thank you for the input. I suspect it's not real "text" but vector paths that draw the shape of each character. Could you copy some text from the PDF and paste it here to confirm?
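To check this without copy-and-paste, here is a rough sketch. It assumes PyMuPDF (imported as `fitz`), the PDF library pdf2docx is built on. `classify_page`, `inspect_pdf`, and the 50-path threshold are hypothetical illustrations, not part of pdf2docx: a page with no extractable text but many vector paths is only *likely* to have its glyphs drawn as outlines.

```python
def classify_page(text, n_drawings, path_threshold=50):
    """Heuristic guess at whether a page has real, extractable text.

    "text":           the page has a text layer pdf2docx can read.
    "outlined":       no text but many vector paths; glyphs may be drawn as shapes.
    "image_or_empty": no text and few paths; likely a scanned image or a blank page.
    The default threshold of 50 paths is an arbitrary assumption.
    """
    if text.strip():
        return "text"
    if n_drawings >= path_threshold:
        return "outlined"
    return "image_or_empty"


def inspect_pdf(pdf_path):
    # Lazy import so classify_page can be used without PyMuPDF installed.
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        # get_text() returns the page's text layer; get_drawings() lists vector paths.
        return [classify_page(page.get_text(), len(page.get_drawings())) for page in doc]
```

Pages reported as `"outlined"` or `"image_or_empty"` would need OCR before pdf2docx can recover any text from them.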
Yes, it's not real "text". I didn't know about that. But is there anything I can do for this type of PDF?
You have to OCR them first to get the real text. I'd recommend https://github.com/jbarlow83/OCRmyPDF, which OCRs your PDF and generates a real text layer right behind the original "text". This way, the formatting is preserved. After that, pdf2docx should process it correctly.
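The OCR-then-convert workflow might look roughly like this in Python. It's a minimal sketch assuming `ocrmypdf` (with its Tesseract and Ghostscript dependencies) and `pdf2docx` are installed; the helper name `ocr_then_convert` and the `.ocr.pdf` naming are hypothetical. `skip_text=True` tells OCRmyPDF to leave pages that already have text alone instead of stopping with an error.

```python
from pathlib import Path


def ocr_then_convert(pdf_path, docx_path):
    """OCR a PDF with OCRmyPDF, then convert the OCR'd copy with pdf2docx.

    Returns the path of the intermediate OCR'd PDF.
    """
    # Lazy imports so this module loads even where the tools are missing.
    import ocrmypdf
    from pdf2docx import Converter

    pdf_path = Path(pdf_path)
    ocr_pdf = pdf_path.with_name(pdf_path.stem + ".ocr.pdf")

    # Rasterize and OCR pages without real text; the original page appearance
    # is kept and an invisible text layer is added behind it.
    ocrmypdf.ocr(pdf_path, ocr_pdf, skip_text=True)

    # Convert the OCR'd copy, always releasing the converter afterwards.
    cv = Converter(str(ocr_pdf))
    try:
        cv.convert(str(docx_path))
    finally:
        cv.close()
    return ocr_pdf
```

From the command line, the rough equivalent is `ocrmypdf --skip-text input.pdf output.pdf` followed by `pdf2docx convert output.pdf output.docx`.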
Thank you so much for helping me. :)