csabaconsulting / thruthinkcohereweaviatechat

Cohere and Weaviate powered ThruThink support chat on Streamlit

Home Page: https://thruthinksupport.streamlit.app/

License: MIT License

Python 100.00%
cohere generative-ai llm rag retrieval-augmented-generation streamlit weaviate

thruthinkcohereweaviatechat's Introduction

ThruThink® Support Chat Agent utilizing RAG Fusion, powered by Cohere & Weaviate

The main business goal is to develop a support chat agent for the investment projection web application ThruThink®.

  • ThruThink® is an online business budgeting app for creating professional budgets and forecasts
  • It is the product of decades of experience, careful thought, and thousands of calculations
  • Thru-hiking, or through-hiking, is the act of hiking an established long-distance trail end-to-end continuously
  • There is no dedicated personnel for a support chat agent role; the app had a “classic” chat agent integration in the past
  • An LLM and RAG (Retrieval Augmented Generation) powered chat agent could be invaluable, given that
    1. It stays relatively grounded
    2. It won’t hallucinate wildly

Desired abilities:

  • Main goal: answer ThruThink® software-specific questions such as: "In ThruThink can I make adjustments on the Cash Flow Control page?"
  • Nice to have: answer more generic questions such as: "How much inventory should I have?"

The bulk of the knowledge base consists of close to 190 help topics, divided into a few dozen categories. That is a solid starting point; however, users can ask such a wide variety of questions that simply chunking these documents may not provide good ground for vector matches in the embedding space. To increase the performance of the chat agent I employ several techniques.

Achievements:

  1. I enriched the knowledge base with synthetic data generation. I call this QBRAG (QnA Boosted RAG) because I reuse the same QnA data I had already generated and curated for potential fine-tuning purposes; the same dataset can also enrich the vector-indexed knowledge.
  2. The highlight of my submission is RAG Fusion (see article).
  3. I utilize Weaviate for vector storage, embedding, matching, and retrieval. I use Cohere for multiple language models: a fine-tuned chat model and also co.chat with its advanced web connector feature, in different stages of the chain.
  4. I perform metadata-assisted retrieval, since I ingest and index the help documents' titles and categories.
  5. I also use LangChain to construct some stages of the chain.
  6. The front end is powered by and hosted on Streamlit; I heavily customized the view, which also features linked references.
  7. After the fusion and re-ranking I provide the user with two answers: one from a more traditional, RAG-grounded co.chat call, and one from a web-connector powered call (also augmented with the retrieved documents for guidance), so the user can get the best of both worlds (a sketch of these two calls follows this list).
  8. Since I have to control several stages of the chain for the fusion, I could not use high-level LangChain constructs such as ConversationalRetrievalChain or RetrievalQA. co.chat's ability to handle the conversation for me (via conversation_id) therefore made my job much easier; otherwise I would have had to build history / memory functionality and other building blocks myself.
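A minimal sketch of the two final calls described in items 7 and 8, assuming the Cohere Python SDK's co.chat with documents, connectors, and conversation_id; whether documents and connectors can be combined in a single call, and all variable names, are assumptions for illustration, not the repository's actual code:

import cohere

co = cohere.Client("YOUR_COHERE_API_KEY")

user_question = "In ThruThink can I make adjustments on the Cash Flow Control page?"
session_id = "thruthink-demo-session"  # lets co.chat keep the history / memory for us

# Top-k fused documents from the RAG Fusion stage (illustrative shape)
top_docs = [{"title": "Cash Flow Control", "snippet": "..."}]

# Call 1: traditional, document-grounded RAG answer
grounded = co.chat(
    message=user_question,
    documents=top_docs,
    conversation_id=session_id,
)

# Call 2: web-connector powered answer, still nudged by the retrieved documents
# (combining documents and connectors in one call is an assumption here)
web_assisted = co.chat(
    message=user_question,
    documents=top_docs,
    connectors=[{"id": "web-search"}],
    conversation_id=session_id,
)

print(grounded.text)
print(web_assisted.text)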

RAG Fusion:

  1. Users might ask questions that don't match the QnA questions well in their particular form, even though the knowledge base still covers the topic. The application therefore first generates variations of the user's query with the help of a fine-tuned Cohere model, in the hope that some of these variations match QnA or help-document chunks more closely.
  2. Document retrieval then happens for each of the query variations.
  3. A reciprocal rank fusion step produces a fused list of documents across all the variations (see the sketch after this list).
  4. The top k of those documents feed the final two RAG calls, which supply the displayed answers. Both RAG calls use the cutting-edge co.chat: one call is document based, and the other is web-connector based (but still helped by document augmentation for a better result).
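A minimal sketch of the reciprocal rank fusion step in item 3 (the function and document identifiers below are illustrative, not the repository's actual code):

from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    # Each document accumulates 1 / (k + rank) from every ranked list it appears in
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# One ranked result list per query variation (illustrative document ids)
retrievals = [
    ["deal_score", "cash_flow_control", "inventory"],   # variation 1
    ["cash_flow_control", "deal_score", "workflow"],    # variation 2
    ["deal_score", "workflow", "inventory"],            # variation 3
]
print(reciprocal_rank_fusion(retrievals)[:3])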

Note that the application and all the help documents are in English, so I used embed-english-v2.0 with cosine similarity. Since we only need to perform in the English domain, we can expect slightly better performance than with the multilingual model. We need to pay attention to the similarity metric (the multilingual embedding model uses dot product). Also refer to https://github.com/CsabaConsulting/Cohere/blob/main/WeaviateInit.ipynb.
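The actual schema lives in the notebook above; the following is only a hedged sketch (class name, property names, and endpoint are assumptions) of configuring a Weaviate class with the Cohere vectorizer, embed-english-v2.0, and cosine distance using the Weaviate Python client:

import weaviate

client = weaviate.Client(
    url="https://your-cluster.weaviate.network",  # placeholder endpoint
    additional_headers={"X-Cohere-Api-Key": "YOUR_COHERE_API_KEY"},
)

help_doc_class = {
    "class": "HelpDoc",  # illustrative class name
    "vectorizer": "text2vec-cohere",
    "moduleConfig": {"text2vec-cohere": {"model": "embed-english-v2.0"}},
    # embed-english-v2.0 expects cosine similarity; the multilingual model uses dot product
    "vectorIndexConfig": {"distance": "cosine"},
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "category", "dataType": ["text"]},
        {"name": "content", "dataType": ["text"]},
    ],
}
client.schema.create_class(help_doc_class)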

Other achievements:

Kindly look at the development and experimentation IPython notebooks and scripts in the https://github.com/CsabaConsulting/Cohere repository. These were used to establish the Weaviate schema for ingestion / indexing / retrieval, to test retrieval, and to build up the parts of the RAG Fusion.

Future plans:

  • Decrease runtime by running the per-variation document retrievals in parallel; this is a Streamlit-specific technical challenge with asyncio / await (see the sketch after this list).
  • Decrease runtime by running the final two co.chat RAG calls in parallel; this is the same Streamlit-specific asyncio / await challenge.
  • Make the citation linking nicer, along with other UI enhancements.
  • Measure how much (if at all) the RAG Fusion improves answer quality, and measure the trade-off, factoring in the extra latency and the increased token usage, which also means increased cost.
  • Integrate the agent into ThruThink, which uses an ASP.NET MVC / C# / Azure technology stack and is not open source. In that final deployment I'll be able to open the referenced help topics using the metadata I get back as part of the query results.
  • Add a filter against harmful content, for example using Google PaLM 2's safety attributes.
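A minimal sketch of the parallelization idea in the first two items (run_weaviate_query is a hypothetical wrapper around the existing blocking retrieval call):

import asyncio

async def retrieve_for_variation(variation: str) -> list:
    # Run the blocking retrieval call in a worker thread so the calls can overlap
    return await asyncio.to_thread(run_weaviate_query, variation)

async def retrieve_all(variations: list[str]) -> list[list]:
    # Fire all per-variation retrievals concurrently and wait for all of them
    return await asyncio.gather(*(retrieve_for_variation(v) for v in variations))

# From the (synchronous) Streamlit script:
# results = asyncio.run(retrieve_all(query_variations))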

thruthinkcohereweaviatechat's People

Contributors: mrcsabatoth

Stargazers: Vince Fulco (Bighire.tools)

Watchers: Attila Toth, Kostas Georgiou
thruthinkcohereweaviatechat's Issues

Highlight document chunk section in the document reference side

Currently I only handle and interpolate the references between the returned text and the referenced documents. However, when the retrieved document is not a QnA entry (which is a self-contained chunk) but a chunk from a help document, I could fetch the whole document and highlight which section of it made it into the retrieval. I'd need to design the UX, because the document reference section can already be long; maybe a Streamlit accordion (expander) or something similar.
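A possible direction, sketched with Streamlit's expander widget (the references list and its fields are hypothetical):

import streamlit as st

for ref in references:  # hypothetical list of retrieved references
    with st.expander(ref["title"]):
        # Highlight the retrieved chunk inside the full help document
        full_text = ref["document_text"]
        chunk = ref["chunk_text"]
        st.markdown(full_text.replace(chunk, f"**{chunk}**"))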

Prompt engineer the query variation prompt

Came across https://github.com/pinecone-io/examples/blob/master/learn/generation/langchain/handbook/10-langchain-multi-query.ipynb in #6 https://www.youtube.com/watch?v=VFf8XJUIHnU

His prompt:

template = """
Your task is to generate 3 different search queries that aim to
answer the user question from multiple perspectives. The user questions
are focused on Large Language Models, Machine Learning, and related
disciplines.
Each query MUST tackle the question from a different viewpoint, we
want to get a variety of RELEVANT search results.
Provide these alternative questions separated by newlines.
Original question: {question}
"""

Refactor code further

Somehow extract the lengthy initialization.
Use black or other Python formatting on the source code.

Better error handling

For example, if any clients or API endpoints could not be instantiated, or similar corner cases.
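A hedged sketch of guarding the client setup (assuming the app creates Cohere and Weaviate clients at startup; the secret key names are illustrative):

import streamlit as st
import cohere
import weaviate

try:
    co = cohere.Client(st.secrets["COHERE_API_KEY"])
    wv = weaviate.Client(
        url=st.secrets["WEAVIATE_URL"],
        additional_headers={"X-Cohere-Api-Key": st.secrets["COHERE_API_KEY"]},
    )
except Exception as exc:
    # Surface the failure in the UI instead of crashing the whole app
    st.error(f"Could not initialize API clients: {exc}")
    st.stop()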

Use the Cohere re-ranker after we get the fused, re-ranked list of documents

This could easily be an extra step between the final RAG preparation and the fusion reranking.
See https://github.com/pinecone-io/examples/blob/master/learn/generation/better-rag/00-rerankers.ipynb

import getpass
import os

import cohere

# Prompt for the key only if it is not already in the environment
os.environ["COHERE_API_KEY"] = os.getenv("COHERE_API_KEY") or getpass.getpass()
co = cohere.Client(os.environ["COHERE_API_KEY"])

# `query` and `docs` come from the retrieval stage (see the linked notebook)
rerank_docs = co.rerank(
    query=query, documents=list(docs.keys()), top_n=25, model="rerank-english-v2.0"
)

Follow-up questions trick the fusion

Example:

Q: "What do you know about ThruThink Deal Score?"
A: "ThruThink Deal Score is a proprietary feature1.1 that evaluates multiple factors1.1 involving historical information, the transaction itself and the projections1.1 which result in an overall grade or score for the "Deal"1.1.
It is an optional feature1.2, that is turned off by default1.2 within the ThruThink web application. It must be manually turned on by the user1.2 in the Workflow menu, Step 71.2, by clicking the button for ThruThink Evaluation inputs1.2 and choosing "Yes"1.2.
The ThruThink Deal Score provides an "A", "B", "C", "D" or "F"1.2 rating of the Deal with "A"1.2 being the highest rating. The rating evaluates the Deal or comparative benefit to the Owner as presented by the User Inputs.1.2
Would you like to know more about the ThruThink Evaluation?"
Q: "Yes I would like"

Currently we generate variations for the current question only. The variation-generator LLM doesn't have the co.chat conversation context, so a follow-up such as "Yes I would like" cannot be turned into meaningful variations. We'll need to work on this.
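One possible mitigation (a sketch; the prompt wording and helper name are hypothetical): feed the recent chat history into the variation generator so that a follow-up like "Yes I would like" is first rewritten into standalone questions.

def build_variation_prompt(chat_history: list[tuple[str, str]], question: str) -> str:
    # Condense the last few turns so the variation generator sees the conversation context
    history = "\n".join(f"{role}: {text}" for role, text in chat_history[-4:])
    return (
        "Conversation so far:\n"
        f"{history}\n\n"
        "Rewrite the user's latest message as 3 standalone search queries "
        "that capture what they are really asking about.\n"
        f"Latest message: {question}\n"
    )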
