renumics / renumics-rag

Visualization for a Retrieval-Augmented Generation (RAG) Assistant 🤖❤️📚

License: MIT License

Languages: Makefile 1.64%, Python 55.73%, Jupyter Notebook 42.63%
Topics: huggingface, langchain, large-language-models, llm, machine-learning, openai, rag, retrieval-augmented, retrieval-augmented-generation, streamlit, umap, visualization

renumics-rag's People

Contributors

druzsan, markus-stoll

renumics-rag's Issues

DuplicateIDError

Calling docs_vectorstore.add_documents(splits, ids=split_ids) raises the following error:

---------------------------------------------------------------------------
DuplicateIDError                          Traceback (most recent call last)
Cell In[5], line 14
     12 split_ids = list(map(stable_hash, splits))
     13 #docs_vectorstore.add_documents(splits)
---> 14 docs_vectorstore.add_documents(splits, ids=split_ids)
     15 docs_vectorstore.persist()

File ~\AppData\Roaming\Python\Python311\site-packages\langchain_core\vectorstores.py:119, in VectorStore.add_documents(self, documents, **kwargs)
    117 texts = [doc.page_content for doc in documents]
    118 metadatas = [doc.metadata for doc in documents]
--> 119 return self.add_texts(texts, metadatas, **kwargs)

File ~\AppData\Roaming\Python\Python311\site-packages\langchain_community\vectorstores\chroma.py:297, in Chroma.add_texts(self, texts, metadatas, ids, **kwargs)
    295 ids_with_metadata = [ids[idx] for idx in non_empty_ids]
    296 try:
--> 297     self._collection.upsert(
    298         metadatas=metadatas,
    299         embeddings=embeddings_with_metadatas,
    300         documents=texts_with_metadatas,
    301         ids=ids_with_metadata,
    302     )
    303 except ValueError as e:
    304     if "Expected metadata value to be" in str(e):

File ~\AppData\Roaming\Python\Python311\site-packages\chromadb\api\models\Collection.py:477, in Collection.upsert(self, ids, embeddings, metadatas, documents, images, uris)
    444 def upsert(
    445     self,
    446     ids: OneOrMany[ID],
   (...)
    456     uris: Optional[OneOrMany[URI]] = None,
    457 ) -> None:
    458     """Update the embeddings, metadatas or documents for provided ids, or create them if they don't exist.
    459 
    460     Args:
   (...)
    467         None
    468     """
    470     (
    471         ids,
    472         embeddings,
    473         metadatas,
    474         documents,
    475         images,
    476         uris,
--> 477     ) = self._validate_embedding_set(
    478         ids, embeddings, metadatas, documents, images, uris
    479     )
    481     if embeddings is None:
    482         if documents is not None:

File ~\AppData\Roaming\Python\Python311\site-packages\chromadb\api\models\Collection.py:545, in Collection._validate_embedding_set(self, ids, embeddings, metadatas, documents, images, uris, require_embeddings_or_data)
    523 def _validate_embedding_set(
    524     self,
    525     ids: OneOrMany[ID],
   (...)
    543     Optional[URIs],
    544 ]:
--> 545     valid_ids = validate_ids(maybe_cast_one_to_many_ids(ids))
    546     valid_embeddings = (
    547         validate_embeddings(
    548             self._normalize_embeddings(maybe_cast_one_to_many_embedding(embeddings))
   (...)
    551         else None
    552     )
    553     valid_metadatas = (
    554         validate_metadatas(maybe_cast_one_to_many_metadata(metadatas))
    555         if metadatas is not None
    556         else None
    557     )

File ~\AppData\Roaming\Python\Python311\site-packages\chromadb\api\types.py:240, in validate_ids(ids)
    236         example_string = (
    237             f"{', '.join(examples[:5])}, ..., {', '.join(examples[-5:])}"
    238         )
    239         message = f"Expected IDs to be unique, found {n_dups} duplicated IDs: {example_string}"
--> 240     raise errors.DuplicateIDError(message)
    241 return ids

DuplicateIDError: Expected IDs to be unique, found duplicates of: f7bf54e0f0a6f6fdf377d0904d443afab7177378, f8d159fc1246a55f0d0758e689c6099f440bfbc2
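Chroma rejects a single upsert in which the same ID appears more than once, and here split_ids is derived from stable_hash, so any two splits that hash identically (for example, the same chunk text with the same metadata, perhaps from a page that was loaded twice) collide. A minimal sketch of one workaround, assuming the notebook's stable_hash hashes each split's content and metadata: drop exact duplicates before calling add_documents. The stable_hash below is a hypothetical SHA-1 stand-in, and Doc is a stand-in for LangChain's Document; neither is the project's own code.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Doc:
    """Stand-in for a LangChain Document: page_content plus a metadata dict."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def stable_hash(doc: Doc) -> str:
    """Hypothetical stable_hash: SHA-1 over the content and sorted metadata."""
    payload = json.dumps(
        {"page_content": doc.page_content, "metadata": doc.metadata},
        sort_keys=True,
        default=str,  # tolerate non-JSON metadata values
    )
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def dedupe_splits(splits: list[Doc]) -> tuple[list[Doc], list[str]]:
    """Keep the first split for each ID so every ID in the batch is unique."""
    unique_splits: list[Doc] = []
    unique_ids: list[str] = []
    seen: set[str] = set()
    for split in splits:
        split_id = stable_hash(split)
        if split_id in seen:
            continue  # identical content and metadata: same embedding, safe to skip
        seen.add(split_id)
        unique_splits.append(split)
        unique_ids.append(split_id)
    return unique_splits, unique_ids


if __name__ == "__main__":
    splits = [
        Doc("Chunk A", {"source": "a.html"}),
        Doc("Chunk B", {"source": "a.html"}),
        Doc("Chunk A", {"source": "a.html"}),  # exact duplicate of the first split
    ]
    unique_splits, split_ids = dedupe_splits(splits)
    print(len(splits), "->", len(unique_splits))  # 3 -> 2
    # With the real vector store:
    # docs_vectorstore.add_documents(unique_splits, ids=split_ids)
```

Dropping exact duplicates loses nothing, since identical text and metadata would produce identical entries. If repeated chunks must be kept as separate entries, the ID would instead need something that distinguishes them, such as the split's position.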

LangChain Issue

Please provide a version without LangChain; its dependencies make the project unusable.

Unable to invoke the rag_chain from the example notebook

I am trying to use the renumics-spotlight package. I followed the example notebook and the tutorial at this link.

I had to make some changes to the code in my local notebook. All steps up to invoking rag_chain work fine, but the following steps fail with an authentication error from OpenAI, and I can't work out why:

# rag_chain is built in earlier cells of the example notebook
question = "Who built the Nürburgring?"
response = rag_chain.invoke(question)
answer = response["answer"]
answer

The stack trace is long, so I'm sharing it via a link.

Do we need to provide any additional authentication here? I have already set the environment variable, and I also tried the following before invoking rag_chain:

import os
from getpass import getpass

# Prompt for the key without echoing it, then export it for the OpenAI clients
OPENAI_API_KEY = getpass()
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

That still makes no difference. Could you please help me with this issue?
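One plausible cause, not confirmed in the issue, is ordering: LangChain's OpenAI wrappers generally read OPENAI_API_KEY when they are constructed, so setting os.environ after the embeddings, the LLM and rag_chain already exist may not reach them. A key pasted with surrounding whitespace or quotes would likely fail authentication too. A minimal sketch of a fail-fast check to run right after setting the key and before re-running the cells that build the chain; get_openai_key is a hypothetical helper, not part of renumics-rag or LangChain.

```python
import os
from collections.abc import Mapping
from typing import Optional

_QUOTES = "\"'"


def get_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return OPENAI_API_KEY, raising early if it is missing or looks mangled.

    Hypothetical helper: it only inspects the string and never contacts OpenAI,
    so a well-formed but revoked key still passes.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY")
    if key is None or not key.strip():
        raise RuntimeError("OPENAI_API_KEY is not set; set it before building rag_chain.")
    if key != key.strip() or key[0] in _QUOTES or key[-1] in _QUOTES:
        raise RuntimeError("OPENAI_API_KEY has stray whitespace or quotes; re-enter it.")
    return key


# Assumed order for the notebook (not confirmed by the issue):
# 1. os.environ["OPENAI_API_KEY"] = getpass()
# 2. get_openai_key()
# 3. re-run the cells that create the embeddings, the LLM and rag_chain
# 4. rag_chain.invoke(question)
```

If the check passes and the error persists after rebuilding the chain, the key itself (or the account behind it) is the more likely culprit.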

No instructions for use with Hugging Face models

Why are there no instructions for using open-source embedding models? The notebook does not work because you cannot create and populate docs/data unless you pay for the OpenAI API.
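A minimal sketch of swapping in an open-source embedding model, assuming the langchain-community and sentence-transformers packages are installed. The model name, collection name and persist directory are illustrative choices, not the project's settings. HuggingFaceEmbeddings computes embeddings locally, so building the vector store needs no OpenAI key; answer generation would still need its own LLM if the notebook's chain uses OpenAI for that step.

```python
from typing import Any


def build_local_vectorstore(
    persist_directory: str = "db-hf",  # hypothetical directory name
    model_name: str = "sentence-transformers/all-MiniLM-L6-v2",  # illustrative open-source model
) -> Any:
    """Create a Chroma vector store backed by a local Hugging Face embedding model."""
    # Imported lazily so this module loads even where the optional packages are missing.
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import Chroma

    # Runs the sentence-transformers model locally; no OpenAI API key is involved.
    embeddings = HuggingFaceEmbeddings(model_name=model_name)
    return Chroma(
        collection_name="docs_store",  # hypothetical collection name
        embedding_function=embeddings,
        persist_directory=persist_directory,
    )


# Usage (downloads the model on first run):
# docs_vectorstore = build_local_vectorstore()
# docs_vectorstore.add_documents(splits, ids=split_ids)
```

A store built with one embedding model should not be queried with another, because the vector dimensions and geometry differ; that is why the sketch writes to a separate directory instead of reusing one created with OpenAI embeddings.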
