hwchase17 / chroma-langchain
License: MIT License
When I run this code, I get a warning: `UserWarning: VectorDBQA is deprecated - please use from langchain.chains import RetrievalQA`
Thanks for your work on this. I'm really enjoying LangChain, Chroma and OpenAI.
I am using this plugin as shown below, and it works great. I'm also trying to guard against creating a new collection when one already exists, and to do the same for items that are already in the collection. Ideally, I'd like to know how to incorporate `client.get_or_create_collection` and `collection.upsert()` into my code below to do this. I've started going down the path of building the DB natively with Chroma, but thought it might be possible to do it in LangChain with this plugin.

```python
db = Chroma.from_texts(texts, embeddings, metadatas=metadatas, ids=ids, collection_name=collection_name, persist_directory="db")
```
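After installing chromadb, I tried to use Chroma from the vector stores module in the following way:

```python
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

loader = PyPDFLoader("data/Diabetes.pdf")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
db = Chroma.from_documents(docs, OpenAIEmbeddings())
```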
NOTE: I am using a Windows machine and installed chromadb via pip. The chromadb client is working, and I am able to create collections.
I get the following error:
onnxruntime is not supported on Windows 2012. How can I switch to PyTorch?
Hi, I found your example very easy to set up, and it gave me a fair understanding of how RAG works with LangChain and Chroma.
However, I'm more interested in hosting chromadb as a standalone microservice and accessing it from the application to store embeddings and query them later. Could you please add that part as well?
I've tried the snippet below, but for some reason the chunks are not being saved to the vector DB.
```python
# create chroma db or load db from disk
from langchain.embeddings import OllamaEmbeddings
from langchain.vectorstores import Chroma
import chromadb
from chromadb.config import Settings

client_settings = Settings(
    chroma_api_impl="chromadb.api.fastapi.FastAPI",
    chroma_client_auth_provider="chromadb.auth.token.TokenAuthClientProvider",
    chroma_client_auth_credentials="xxxxxx",
    chroma_client_auth_token_transport_header="X_CHROMA_TOKEN",
    allow_reset=True,
    anonymized_telemetry=False
)

client = chromadb.HttpClient(
    host="localhost",
    port=8000,
    settings=client_settings,
)

collection = client.get_or_create_collection(name="documents")

emb_fn = OllamaEmbeddings(base_url=OLLAMA_URL, model=OLLAMA_MODEL)

def get_chroma(chroma_client):
    chroma_db = Chroma(
        collection_name="documents",
        embedding_function=emb_fn,
        client=chroma_client,
    )
    return chroma_db

chroma_db_client = get_chroma(client)

if init_db:
    chroma_db_client.from_documents(all_document_chunks, emb_fn)
    print(collection.count())
    print(collection.peek())
else:
    chroma_db_client = Chroma(embedding_function=emb_fn)
```
Output:

```
0
{'ids': [], 'embeddings': [], 'metadatas': [], 'documents': [], 'data': None, 'uris': None}
```
The chromadb server is running in a Docker container and shows no errors. The variable `all_document_chunks` also contains several chunks of one of my local documents.
Appreciate your help!
In this program, if I ask questions unrelated to the provided documents, can I get the answer I want?