
Comments (8)

tcy6 avatar tcy6 commented on August 23, 2024 1

Please download the pdf parser related checkpoints in modelscope [https://www.modelscope.cn/models/netease-youdao/QAnything-pdf-parser/files]

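OK, thank you very much. One more question: is QAnything unable to handle PDFs that contain no text elements? I took a screenshot and tried to parse it, and got an error. If that is the case, what is the OCR in it for? Parsing tables?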

from qanything.

milely avatar milely commented on August 23, 2024 1

Please download the pdf parser related checkpoints in modelscope [https://www.modelscope.cn/models/netease-youdao/QAnything-pdf-parser/files]

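OK, thank you very much. One more question: is QAnything unable to handle PDFs that contain no text elements? I took a screenshot and tried to parse it, and got an error. If that is the case, what is the OCR in it for? Parsing tables?

The OCR module was removed because it was too slow, so QAnything can currently only handle PDF files with a parseable text layer. Support for scanned, image-based PDF files will be added in the future behind a toggle switch.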


tcy6 avatar tcy6 commented on August 23, 2024 1

Please download the pdf parser related checkpoints in modelscope [https://www.modelscope.cn/models/netease-youdao/QAnything-pdf-parser/files]

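OK, thank you very much. One more question: is QAnything unable to handle PDFs that contain no text elements? I took a screenshot and tried to parse it, and got an error. If that is the case, what is the OCR in it for? Parsing tables?

The OCR module was removed because it was too slow, so QAnything can currently only handle PDF files with a parseable text layer. Support for scanned, image-based PDF files will be added in the future behind a toggle switch.

OK, thanks a lot. Since it does not OCR PDFs, I think the OCR-related code in the pdf loader could be removed for now; otherwise it is confusing, haha. It clearly prints "OCR finished", but no OCR actually happens.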


milely avatar milely commented on August 23, 2024

Please download the pdf parser related checkpoints in modelscope [https://www.modelscope.cn/models/netease-youdao/QAnything-pdf-parser/files]


tcy6 avatar tcy6 commented on August 23, 2024

Please download the pdf parser related checkpoints in modelscope [https://www.modelscope.cn/models/netease-youdao/QAnything-pdf-parser/files]

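OK, thank you very much. One more question: is QAnything unable to handle PDFs that contain no text elements? I took a screenshot and tried to parse it, and got an error. If that is the case, what is the OCR in it for? Parsing tables?

The error output is as follows: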
<Logger debug_logger (INFO)> <Logger qa_logger (INFO)>
LOCAL DATA PATH: c:\Users\Administrator\Desktop\QAnything-1.4.1\QANY_DB\content
LOCAL_RERANK_REPO: netease-youdao/bce-reranker-base_v1
LOCAL_EMBED_REPO: netease-youdao/bce-embedding-base_v1
table model initing...
cpu
table model inited...
WARNING:root:Miss outlines
INFO:debug_logger:Start OCR!
1it [00:00, ?it/s]
INFO:debug_logger:OCR finished in 0.15695199999026954 seconds
preprocess
1it [00:00, ?it/s]
Traceback (most recent call last):
File "c:\Users\Administrator\Desktop\QAnything-1.4.1\qanything_kernel\core\test.py", line 204, in
markdown_dir = loader.load_to_markdown()
File "c:\Users\Administrator\Desktop\QAnything-1.4.1\qanything_kernel\utils\loader\self_pdf_loader.py", line 53, in load_to_markdown
page_width = max([b["x1"] for b in self.boxes if b['layout_type'] == 'text']) - min(
ValueError: max() arg is an empty sequence
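
For anyone reading the traceback above: `max()` raises `ValueError` when the list comprehension yields nothing, which happens whenever a page has no boxes with `layout_type == 'text'` (for example, an image-only page). Below is a minimal sketch of a guard, not the project's actual fix. The function name `text_page_width` is hypothetical, and the `"x0"` key is an assumption, since the `min(...)` argument is cut off in the traceback.

```python
from typing import Optional, Sequence


def text_page_width(boxes: Sequence[dict]) -> Optional[float]:
    """Return the horizontal extent of all 'text' boxes, or None if there are none.

    Assumption: each box is a dict with 'layout_type', 'x0' and 'x1' keys.
    Only 'layout_type' and 'x1' are visible in the traceback; 'x0' is a guess.
    """
    text_boxes = [b for b in boxes if b.get("layout_type") == "text"]
    if not text_boxes:
        # No text layout boxes at all, e.g. a scanned or image-only page.
        # Returning None lets the caller skip the page or report a clear error
        # instead of crashing inside max().
        return None
    return max(b["x1"] for b in text_boxes) - min(b["x0"] for b in text_boxes)
```

A caller could then check for `None` and raise a readable message such as "no text layer found, OCR is required" rather than surfacing `max() arg is an empty sequence`.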


xiehurricane avatar xiehurricane commented on August 23, 2024

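Same here. I uploaded a single-layer PDF that contains only images and it failed: no boxes are found, so it errors out immediately. Tracing through the code, I found that there is no OCR.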


zhudongwork avatar zhudongwork commented on August 23, 2024

Please download the pdf parser related checkpoints in modelscope [https://www.modelscope.cn/models/netease-youdao/QAnything-pdf-parser/files]

OK, thank you very much. One more question: is QAnything unable to handle PDFs that contain no text elements? I took a screenshot and tried to parse it, and got an error. If that is the case, what is the OCR in it for? Parsing tables?

The error output is as follows:
<Logger debug_logger (INFO)> <Logger qa_logger (INFO)>
LOCAL DATA PATH: c:\Users\Administrator\Desktop\QAnything-1.4.1\QANY_DB\content
LOCAL_RERANK_REPO: netease-youdao/bce-reranker-base_v1
LOCAL_EMBED_REPO: netease-youdao/bce-embedding-base_v1
table model initing...
cpu
table model inited...
WARNING:root:Miss outlines
INFO:debug_logger:Start OCR!
1it [00:00, ?it/s]
INFO:debug_logger:OCR finished in 0.15695199999026954 seconds
preprocess
1it [00:00, ?it/s]
Traceback (most recent call last):
File "c:\Users\Administrator\Desktop\QAnything-1.4.1\qanything_kernel\core\test.py", line 204, in
markdown_dir = loader.load_to_markdown()
File "c:\Users\Administrator\Desktop\QAnything-1.4.1\qanything_kernel\utils\loader\self_pdf_loader.py", line 53, in load_to_markdown
page_width = max([b["x1"] for b in self.boxes if b['layout_type'] == 'text']) - min(
ValueError: max() arg is an empty sequence

I get the same error: "Error in Powerful PDF parsing: max() arg is an empty sequence". The key point is that I uploaded a one-page PDF of a paper, not an image.


SoonyangZhang avatar SoonyangZhang commented on August 23, 2024

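Same here. I uploaded a single-layer PDF that contains only images and it failed: no boxes are found, so it errors out immediately. Tracing through the code, I found that there is no OCR.

You can use ocrmypdf to process the PDF.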
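
As a rough illustration of that suggestion, here is a minimal sketch that runs the `ocrmypdf` command-line tool on a PDF before handing it to QAnything, so that image-only pages gain a text layer. The helper names `build_ocrmypdf_command` and `add_text_layer` are hypothetical. The sketch assumes `ocrmypdf` and the Tesseract language packs named in `languages` are installed. `--skip-text` leaves pages that already contain text untouched, and `-l` selects the OCR languages.

```python
import shutil
import subprocess
from typing import List


def build_ocrmypdf_command(src: str, dst: str, languages: str = "chi_sim+eng") -> List[str]:
    """Build the ocrmypdf invocation as an argument list (no shell involved)."""
    return ["ocrmypdf", "--skip-text", "-l", languages, src, dst]


def add_text_layer(src: str, dst: str, languages: str = "chi_sim+eng") -> None:
    """Write an OCRed copy of `src` to `dst`; raises if ocrmypdf is missing or fails."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(build_ocrmypdf_command(src, dst, languages), check=True)
```

Calling `add_text_layer("scan.pdf", "scan_ocr.pdf")` and uploading `scan_ocr.pdf` instead of the original is the intended usage; the file names are placeholders.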

