
[BUG] Problem when trying to use PdfLoader on its own about qanything HOT 8 OPEN

tcy6 commented on August 23, 2024

[BUG] Problem when trying to use PdfLoader on its own

from qanything.

Comments (8)

tcy6 commented on August 23, 2024 1

Please download the pdf parser related checkpoints in modelscope [https://www.modelscope.cn/models/netease-youdao/QAnything-pdf-parser/files]

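OK, thank you very much. Also, is QAnything unable to handle PDFs that contain no text elements? I parsed a screenshot and got an error. If that is the case, what is the OCR in it for? Parsing tables?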


milely commented on August 23, 2024 1

Please download the pdf parser related checkpoints in modelscope [https://www.modelscope.cn/models/netease-youdao/QAnything-pdf-parser/files]

OK, thank you very much. Also, is QAnything unable to handle PDFs that contain no text elements? I parsed a screenshot and got an error. If that is the case, what is the OCR in it for? Parsing tables?
The OCR module was removed because it was too slow, so the loader can currently handle only parseable PDF files. Support for scanned, image-based PDF files will be added in the future behind a toggle switch.
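
Since only parseable PDFs are handled for now, it may help to check for a text layer before handing a file to the loader. The following is a minimal sketch, not part of QAnything: it assumes the third-party `pypdf` package is installed, and the helper names and the `min_chars` threshold are hypothetical choices for illustration.

```python
def has_text_layer(page_texts, min_chars=20):
    """Return True if any page yields at least `min_chars` non-whitespace characters.

    `min_chars` is an arbitrary, hypothetical threshold; tune it for your files.
    """
    return any(len("".join(text.split())) >= min_chars for text in page_texts)


def extract_page_texts(pdf_path):
    """Extract the embedded text of every page (empty string for image-only pages)."""
    # Imported lazily so that has_text_layer() can be used without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [(page.extract_text() or "") for page in reader.pages]


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        texts = extract_page_texts(sys.argv[1])
        if has_text_layer(texts):
            print("Text layer found; the PDF looks parseable.")
        else:
            print("No text layer found; the PDF probably needs OCR first.")
```

A screenshot saved as a PDF would normally yield empty strings for every page, so the check returns False.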


tcy6 commented on August 23, 2024 1

Please download the pdf parser related checkpoints in modelscope [https://www.modelscope.cn/models/netease-youdao/QAnything-pdf-parser/files]

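OK, thank you very much. Also, is QAnything unable to handle PDFs that contain no text elements? I parsed a screenshot and got an error. If that is the case, what is the OCR in it for? Parsing tables?
The OCR module was removed because it was too slow, so the loader can currently handle only parseable PDF files. Support for scanned, image-based PDF files will be added in the future behind a toggle switch.

OK, thanks a lot. Since PDFs are not OCR'd, maybe the OCR-related parts of the PDF loader could be removed for now; otherwise it is quite confusing, haha. The log prints "OCR finished" even though no OCR actually ran.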


milely commented on August 23, 2024

Please download the pdf parser related checkpoints in modelscope [https://www.modelscope.cn/models/netease-youdao/QAnything-pdf-parser/files]


tcy6 commented on August 23, 2024

Please download the pdf parser related checkpoints in modelscope [https://www.modelscope.cn/models/netease-youdao/QAnything-pdf-parser/files]

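OK, thank you very much. Also, is QAnything unable to handle PDFs that contain no text elements? I parsed a screenshot and got an error. If that is the case, what is the OCR in it for? Parsing tables?

The error output is as follows: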
<Logger debug_logger (INFO)> <Logger qa_logger (INFO)>
LOCAL DATA PATH: c:\Users\Administrator\Desktop\QAnything-1.4.1\QANY_DB\content
LOCAL_RERANK_REPO: netease-youdao/bce-reranker-base_v1
LOCAL_EMBED_REPO: netease-youdao/bce-embedding-base_v1
table model initing...
cpu
table model inited...
WARNING:root:Miss outlines
INFO:debug_logger:Start OCR！
1it [00:00, ?it/s]
INFO:debug_logger:OCR finished in 0.15695199999026954 seconds
preprocess
1it [00:00, ?it/s]
Traceback (most recent call last):
  File "c:\Users\Administrator\Desktop\QAnything-1.4.1\qanything_kernel\core\test.py", line 204, in
    markdown_dir = loader.load_to_markdown()
  File "c:\Users\Administrator\Desktop\QAnything-1.4.1\qanything_kernel\utils\loader\self_pdf_loader.py", line 53, in load_to_markdown
    page_width = max([b["x1"] for b in self.boxes if b['layout_type'] == 'text']) - min(
ValueError: max() arg is an empty sequence
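
For anyone hitting the same traceback: the `ValueError` is raised because no box has `layout_type == 'text'`, so `max()` receives an empty list. Below is a rough sketch of a guard, not the project's actual fix. The function name `text_page_width` is hypothetical, and since the `min(` part of the original line is cut off in the traceback, the `x0` key used here is an assumption.

```python
def text_page_width(boxes):
    """Return the horizontal extent covered by text boxes, or None if there are none.

    Assumes each box is a dict with "layout_type", "x0" and "x1" keys;
    "x0" is a guess, because the traceback truncates the original expression.
    """
    text_boxes = [b for b in boxes if b.get("layout_type") == "text"]
    if not text_boxes:
        # Image-only page: no text boxes were detected, so there is no width to compute.
        return None
    return max(b["x1"] for b in text_boxes) - min(b["x0"] for b in text_boxes)


if __name__ == "__main__":
    image_only = [{"layout_type": "figure", "x0": 0, "x1": 595}]
    width = text_page_width(image_only)
    if width is None:
        print("No text boxes found; this PDF probably has no text layer.")
    else:
        print(f"Text width: {width}")
```

Returning `None` lets the caller report a clear "no text layer" message instead of crashing inside `max()`.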


xiehurricane commented on August 23, 2024

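Same here. Uploading a single-layer PDF that contains only images fails: no boxes are found and it errors out immediately. Stepping through the code, I found that there is no OCR.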


zhudongwork commented on August 23, 2024

Please download the pdf parser related checkpoints in modelscope [https://www.modelscope.cn/models/netease-youdao/QAnything-pdf-parser/files]

OK, thank you very much. Also, is QAnything unable to handle PDFs that contain no text elements? I parsed a screenshot and got an error. If that is the case, what is the OCR in it for? Parsing tables?

The error output is as follows:
<Logger debug_logger (INFO)> <Logger qa_logger (INFO)>
LOCAL DATA PATH: c:\Users\Administrator\Desktop\QAnything-1.4.1\QANY_DB\content
LOCAL_RERANK_REPO: netease-youdao/bce-reranker-base_v1
LOCAL_EMBED_REPO: netease-youdao/bce-embedding-base_v1
table model initing...
cpu
table model inited...
WARNING:root:Miss outlines
INFO:debug_logger:Start OCR！
1it [00:00, ?it/s]
INFO:debug_logger:OCR finished in 0.15695199999026954 seconds
preprocess
1it [00:00, ?it/s]
Traceback (most recent call last):
  File "c:\Users\Administrator\Desktop\QAnything-1.4.1\qanything_kernel\core\test.py", line 204, in
    markdown_dir = loader.load_to_markdown()
  File "c:\Users\Administrator\Desktop\QAnything-1.4.1\qanything_kernel\utils\loader\self_pdf_loader.py", line 53, in load_to_markdown
    page_width = max([b["x1"] for b in self.boxes if b['layout_type'] == 'text']) - min(
ValueError: max() arg is an empty sequence

I get the same error: Error in Powerful PDF parsing: max() arg is an empty sequence. The thing is, what I uploaded was a one-page paper PDF, not an image.


SoonyangZhang commented on August 23, 2024

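Same here. Uploading a single-layer PDF that contains only images fails: no boxes are found and it errors out immediately. Stepping through the code, I found that there is no OCR.

You can use ocrmypdf to process the PDF.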
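
As a rough illustration of that workaround, the sketch below calls the `ocrmypdf` command-line tool from Python to add a text layer before the file goes to the loader. It assumes `ocrmypdf` and the relevant Tesseract language packs are installed; the helper names, the file names, and the `chi_sim+eng` default are hypothetical choices.

```python
import shutil
import subprocess


def build_ocrmypdf_command(src, dst, language="chi_sim+eng"):
    """Build the ocrmypdf command line for adding a text layer to `src`.

    --skip-text leaves pages that already contain text untouched;
    -l selects the Tesseract language(s) used for recognition.
    """
    return ["ocrmypdf", "--skip-text", "-l", language, src, dst]


def add_text_layer(src, dst, language="chi_sim+eng"):
    """Run ocrmypdf and return the path of the OCR'd output file."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    subprocess.run(build_ocrmypdf_command(src, dst, language), check=True)
    return dst


if __name__ == "__main__":
    # Only prints the command; call add_text_layer() to actually run it.
    print(" ".join(build_ocrmypdf_command("scan.pdf", "scan_ocr.pdf")))
```

The OCR'd output file can then be passed to the PDF loader in place of the original scan.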

