Comments (2)
I completely agree @mathieuchateau
A current fix would be to load the Documents manually instead of using the KnowledgeBase, something like:
reader = PDFReader()
pdf_documents: List[Document] = reader.read(uploaded_file)
for doc in pdf_documents:
    doc.content = clean_content(doc.content)
assistant.knowledge_base.load_documents(pdf_documents, upsert=True)
You could also do the same with a WebsiteReader:
scraper = WebsiteReader(max_links=2, max_depth=1)
web_documents: List[Document] = scraper.read(input_url)
for doc in web_documents:
    doc.content = clean_content(doc.content)
assistant.knowledge_base.load_documents(web_documents, upsert=True)
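Both snippets call a `clean_content` helper that is not defined here. A minimal sketch of what it might look like, assuming the goal is to drop NUL characters and tidy whitespace (the actual cleaning rules depend on your data, so treat this as a placeholder, not a phidata function):

```python
import re


def clean_content(text: str) -> str:
    """Hypothetical cleaner: the rules below are assumptions, adjust as needed."""
    # Drop NUL characters, which some text columns reject
    text = text.replace("\x00", "")
    # Collapse runs of spaces and tabs into a single space
    text = re.sub(r"[ \t]+", " ", text)
    # Collapse three or more newlines into a single paragraph break
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Any function that takes a string and returns a string would slot into the loops above in the same way.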
A long-term approach would be to expose this as a parameter on the KnowledgeBase object that accepts a "pre_process()" function, which runs on the content to clean it before it is stored.
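To make the proposal concrete, here is a rough sketch of how such a hook could behave. Everything in it is hypothetical: `SketchKnowledgeBase` and the stand-in `Document` class are illustrative only and are not phidata's actual classes or API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Document:
    """Stand-in for phidata's Document; only the content field is modeled."""
    content: str


@dataclass
class SketchKnowledgeBase:
    """Hypothetical knowledge base with an optional pre_process hook."""
    pre_process: Optional[Callable[[str], str]] = None
    stored: List[Document] = field(default_factory=list)

    def load_documents(self, documents: List[Document]) -> None:
        for doc in documents:
            # Run the user-supplied cleaner, if any, before storing
            if self.pre_process is not None:
                doc.content = self.pre_process(doc.content)
            self.stored.append(doc)


# Usage: pass any str -> str callable as the cleaner
kb = SketchKnowledgeBase(pre_process=str.strip)
kb.load_documents([Document(content="  hello  ")])
```

Keeping the hook optional would mean existing callers that pass no `pre_process` see no change in behaviour.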
What do you think?
from phidata.