Comments (3)
Code:
import os
from qanything_kernel.utils.loader.self_pdf_loader import PdfLoader
file_path = "C:/home/QAnything/example/test.pdf"
loader = PdfLoader(filename=file_path, binary=None, save_dir=os.path.dirname(file_path))
md = loader.load_to_markdown()
print(md)
Output:
LOCAL DATA PATH: C:\home\QAnything\QANY_DB\content
LOCAL_RERANK_REPO: netease-youdao/bce-reranker-base_v1
LOCAL_EMBED_REPO: netease-youdao/bce-embedding-base_v1
table model initing...
cpu
table model inited...
WARNING:root:Miss outlines
23it [00:00, ?it/s]
2024-06-25 15:20:41,009 Start OCR!
2024-06-25 15:20:41,012 OCR finished in 2.0259380000061356 seconds
preprocess
preprocess
23it [00:00, 11515.94it/s]
### 2024-06-25 15:20:43,110 Error in Powerful PDF parsing: max() arg is an empty sequence
2024-06-25 15:20:43,112 PDF Parse finished in 2.0993914999999106 seconds
C:/home/QAnything/example\test_md\test.md
from qanything.
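The model file may be corrupted, or what got downloaded may be a pointer file rather than the real model? I re-downloaded it with ModelScope's Python tooling and that fixed it.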
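One quick way to test the "pointer file" theory before re-downloading: a Git LFS pointer is a tiny text file whose first line is the fixed header `version https://git-lfs.github.com/spec/v1`, whereas a real ONNX model is a binary protobuf. The sketch below only checks for that header. The helper name `looks_like_lfs_pointer` is made up for illustration and is not part of QAnything; the path in the usage comment is whatever model file your own traceback names.

```python
# Header that every Git LFS pointer file starts with, per the Git LFS spec.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def looks_like_lfs_pointer(path):
    """Return True if the file at `path` begins with the Git LFS pointer header.

    Hypothetical helper for illustration; it does not validate the ONNX
    model itself, it only detects an un-fetched LFS pointer.
    """
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


# Example usage (substitute the model path reported in your own traceback):
# print(looks_like_lfs_pointer("path/to/checkpoints/layout/layout.onnx"))
```

If this returns True, the file on disk is only the pointer and the actual weights still need to be fetched. If it returns False, the file could still be truncated or corrupted, so a fresh download remains worth trying.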
from qanything_kernel.utils.loader.self_pdf_loader import PdfLoader
pdf_loader = PdfLoader(filename='tables/table-03d9ec345317b0115180d7dbcf843ef6.pdf')
markdown_directory = pdf_loader.load_to_markdown()
print(f"Markdown file is at: {markdown_directory}")
➜ QAnything python QAnything_ocr.py
LOCAL DATA PATH: /mnt/user/QAnything-qanything-python/QAnything/QANY_DB/content
LOCAL_RERANK_REPO: netease-youdao/bce-reranker-base_v1
LOCAL_EMBED_REPO: netease-youdao/bce-embedding-base_v1
Traceback (most recent call last):
  File "/mnt/user/QAnything-qanything-python/QAnything/QAnything_ocr.py", line 6, in <module>
    pdf_loader = PdfLoader(filename='tables/table-03d9ec345317b0115180d7dbcf843ef6.pdf')
  File "/mnt/user/QAnything-qanything-python/QAnything/qanything_kernel/utils/loader/self_pdf_loader.py", line 14, in __init__
    super().__init__()
  File "/mnt/user/QAnything-qanything-python/QAnything/qanything_kernel/utils/loader/pdf_to_markdown/core/parser/pdf_parser.py", line 34, in __init__
    self.layouter = LayoutRecognizer("layout")
  File "/mnt/user/QAnything-qanything-python/QAnything/qanything_kernel/utils/loader/pdf_to_markdown/core/vision/layout_recognizer.py", line 20, in __init__
    super().__init__(self.labels, domain, model_dir)
  File "/mnt/user/QAnything-qanything-python/QAnything/qanything_kernel/utils/loader/pdf_to_markdown/core/vision/recognizer.py", line 28, in __init__
    self.ort_sess = ort.InferenceSession(model_file_path, providers=['CPUExecutionProvider'])
  File "/mnt/user/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/mnt/user/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 472, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidProtobuf: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from /mnt/user/QAnything-qanything-python/QAnything/qanything_kernel/utils/loader/pdf_to_markdown/checkpoints/layout/layout.onnx failed:Protobuf parsing failed.
onnxruntime versions tried:
onnxruntime==1.17.1
onnxruntime-gpu==1.17.1
onnxruntime==1.18.0
onnxruntime-gpu==1.18.0
Did you get it running? Did you run into "Error in Powerful PDF parsing: max() arg is an empty sequence"?
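For context on that message: it is the ValueError Python raises when the built-in max() is given an empty iterable with no default. Where exactly QAnything's PDF parser hits this is not shown in the thread, so the snippet below is only a generic illustration of the failure mode and of the `default=` guard. The function name `largest_or_none` and the sample data are invented for the example.

```python
def largest_or_none(values):
    """Return the largest item, or None when `values` is empty.

    Hypothetical helper: max() raises ValueError on an empty iterable
    unless a `default` is supplied.
    """
    return max(values, default=None)


# An empty result list (for example, no text boxes detected on a page)
# reproduces the error when max() is called without a default.
detected_boxes = []
try:
    max(detected_boxes)
except ValueError as exc:
    print(f"max() failed: {exc}")

# With a default, the empty case is handled explicitly instead of raising.
print(largest_or_none(detected_boxes))
print(largest_or_none([3, 1, 2]))
```

In other words, the error suggests some intermediate list in the parsing step came back empty, which would be consistent with a model that failed to load or produced no detections, though the thread does not confirm the root cause.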