daethyra / freestream

Multi-Page Generative AI Application and Tools

Home Page: https://freestream.streamlit.app

License: Other

Languages: Python 100.00%

Topics: chatbot langchain large-language-models openai python question-answering rag retrieval-augmented-generation streamlit google-ai

freestream's Introduction

Hi there 👋 I'm Daethyra (pronounced: duh-thear-uh)

A bit about me...

  • 🏳️‍⚧️ Pronouns: she/her
  • 🔭 I'm currently working on transitioning from cybersecurity into software development.
  • 🌱 I'm currently building FreeStream, a Streamlit multi-page app with chatbots for many use cases.
  • 👯 I love collaborating on open-source projects with a vision to make people's lives better.
  • 🤔 More specifically, I want to build cool stuff for others that automates the mundane.
  • 💬 Ask me about my favorite video game, or which games I've been playing recently.
  • ⚡ Fun fact: I love Star Wars! Guess my favorite trilogy.

Tech Stack

Python JavaScript TypeScript Pandas Numpy Scikit-Learn LangChain PyTorch TensorFlow React Next.js Tailwind CSS Flask FastAPI Jupyter HTML5 CSS Markdown PDM Git GitHub Docker Windows Linux MongoDB Google Cloud AWS Azure


Notable Projects

FreeStream

Description: A web application where you can access Claude Opus and GPT-4 for free, making use of the different chatbot architectures I've set up. The first chatbot is focused on retrieval-augmented generation and requires you to upload files for the AI to generate answers from. The second is, so far, a general-purpose chatbot; the benefits of using FreeStream are that there are no chat-length limits and you can "drop in" your choice of large language model from the foundation-model providers OpenAI, Anthropic, and Google.
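A minimal sketch of that model "drop-in" idea (illustrative, not FreeStream's actual code; the PROVIDERS mapping and model names are assumptions, and it presumes the langchain-openai, langchain-anthropic, and langchain-google-genai packages are installed):

import streamlit as st
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_google_genai import ChatGoogleGenerativeAI

# Hypothetical mapping from a UI label to a chat-model factory
PROVIDERS = {
    "GPT-4": lambda: ChatOpenAI(model="gpt-4", streaming=True),
    "Claude Opus": lambda: ChatAnthropic(model="claude-3-opus-20240229", streaming=True),
    "Gemini Pro": lambda: ChatGoogleGenerativeAI(model="gemini-pro"),
}

# The user "drops in" a model; the rest of the chain is unchanged
model_choice = st.selectbox("Model", list(PROVIDERS))
llm = PROVIDERS[model_choice]()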

Build-RAGAI

Description: A collection of Jupyter notebooks and Python components for building generative AI applications with LangChain, OpenAI, and Transformers, providing reusable code snippets, tutorials, and end-to-end examples.


freestream's People

Contributors: daethyra, dependabot[bot]

freestream's Issues

LLM Reflectivity | RAG -> CRAG | Update `qa_chain` for use beyond OpenAI models

History Aware Retriever (idea):

# Create a history-aware retriever (assumes `llm`, `retriever`,
# `contextualize_q_prompt`, and `combine_docs_chain` are already defined)
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)

# Create a RAG chain: the retriever feeds the answer-generation chain
qa_chain = create_retrieval_chain(history_aware_retriever, combine_docs_chain)

# If there are no messages yet, or the user clicks the clear button
if len(msgs.messages) == 0 or st.sidebar.button("Clear message history"):
    msgs.clear()
    # show a default message from the AI
    msgs.add_ai_message("How can I help you?")

# Display conversation history
avatars = {"human": "user", "ai": "assistant"}
for msg in msgs.messages:
    st.chat_message(avatars[msg.type]).write(msg.content)

# Display user input field and enter button
if user_query := st.chat_input(placeholder="Ask me anything!"):
    st.chat_message("user").write(user_query)

    # Display assistant response
    with st.chat_message("assistant"):
        # Check for the presence of the "messages" key in session state
        if "messages" not in st.session_state:
            st.session_state.messages = []

        retrieval_handler = PrintRetrievalHandler(st.container())
        stream_handler = StreamHandler(st.empty())
        # LCEL chains take callbacks via the `config` argument
        response = qa_chain.invoke(
            {"input": user_query},
            config={"callbacks": [retrieval_handler, stream_handler]},
        )

migrate from `qa_chain.run` to `qa_chain.invoke`

Issue is based on the following existing snippet:

        response = qa_chain.run(
            user_query, callbacks=[retrieval_handler, stream_handler]
        )
        # When switching to `qa_chain.invoke`, you will use:
        #   cfg = RunnableConfig()
        #   cfg["callbacks"] = [retrieval_handler, stream_handler]
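A minimal sketch of the migrated call, assuming `qa_chain` is an LCEL retrieval chain that expects an "input" key (handler names carried over from the snippet above):

from langchain_core.runnables import RunnableConfig

# Callbacks move from a keyword argument into the run config
cfg = RunnableConfig(callbacks=[retrieval_handler, stream_handler])
response = qa_chain.invoke({"input": user_query}, config=cfg)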

Image Upscaler HALTED | Improper import statement for dependency: `basicsr` | ModuleNotFoundError: No module named 'torchvision.transforms.functional_tensor'

Details

Reference Issues:

  1. BasicSR
  2. Real-ESRGAN
  3. Automatic1111

My Traceback (PII Scrubbed):

(.venv) PS C:\Users\Software\testground\Real-ESRGAN> python .\inference_realesrgan.py
Traceback (most recent call last):
  File "C:\Users\Software\testground\Real-ESRGAN\inference_realesrgan.py", line 5, in <module>
    from basicsr.archs.rrdbnet_arch import RRDBNet
  File "C:\Users\Software\.venv\lib\site-packages\basicsr\__init__.py", line 4, in <module>
    from .data import *
  File "C:\Users\Software\.venv\lib\site-packages\basicsr\data\__init__.py", line 22, in <module>
    _dataset_modules = [importlib.import_module(f'basicsr.data.{file_name}') for file_name in dataset_filenames]
  File "C:\Users\Software\.venv\lib\site-packages\basicsr\data\__init__.py", line 22, in <listcomp>
    _dataset_modules = [importlib.import_module(f'basicsr.data.{file_name}') for file_name in dataset_filenames]
  File "C:\Users\Software\.venv\lib\site-packages\basicsr\data\realesrgan_dataset.py", line 11, in <module>
    from basicsr.data.degradations import circular_lowpass_kernel, random_mixed_kernels
  File "C:\Users\Software\.venv\lib\site-packages\basicsr\data\degradations.py", line 8, in <module>
    from torchvision.transforms.functional_tensor import rgb_to_grayscale
ModuleNotFoundError: No module named 'torchvision.transforms.functional_tensor'

Suggested Fix
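A commonly reported workaround (a community fix, not an official basicsr release): torchvision 0.17 removed the private torchvision.transforms.functional_tensor module that older basicsr versions import from. Either pin the dependency with pip install "torchvision<0.17", or patch the import in basicsr/data/degradations.py:

# basicsr/data/degradations.py, line 8. Before:
# from torchvision.transforms.functional_tensor import rgb_to_grayscale
# After (rgb_to_grayscale lives in the public module on torchvision >= 0.17):
from torchvision.transforms.functional import rgb_to_grayscale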

AgentExecutor | Add tools: Tavily, Calculator, `configure_retriever`

Edit: This issue serves as a reference and isn't to be worked on directly. Instead, focus on learning and implementing LangGraph for a smooth transition that resolves multiple issues in one fell swoop.


Using this prompt: https://smith.langchain.com/hub/hwchase17/react?organizationId=0f7461cf-206f-5c85-aa8d-48c6c48bafc5
I could add an LLMMathChain:

from langchain.agents import Tool
from langchain.chains import LLMMathChain
from langchain_community.utilities import DuckDuckGoSearchAPIWrapper

# Initialize the built-in tools (assumes `llm` is already defined)
search = DuckDuckGoSearchAPIWrapper()
llm_math_chain = LLMMathChain.from_llm(llm)

# Create the toolset list
tools = [
    Tool(
        name="Search",
        func=search.run,
        description="useful for when you need to answer questions about current events. You should ask targeted questions",
    ),
    Tool(
        name="Calculator",
        func=llm_math_chain.run,
        description="useful for when you need to answer questions about math",
    ),
]
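For the Tavily and `configure_retriever` tools named in the title, a minimal sketch (assuming the langchain-community package, a TAVILY_API_KEY in the environment, and a `retriever` already produced by `configure_retriever`):

from langchain_community.tools.tavily_search import TavilySearchResults
from langchain.tools.retriever import create_retriever_tool

# Tavily web search as a drop-in alternative to DuckDuckGo
tavily_tool = TavilySearchResults(max_results=3)

# Expose the configured retriever to the agent as a tool
retriever_tool = create_retriever_tool(
    retriever,
    name="search_uploaded_documents",
    description="Searches the user's uploaded files for relevant passages.",
)

tools.extend([tavily_tool, retriever_tool])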

Turn `configure_retriever` into a Tool for an Agent | Refactor `configure_retriever` to a class? | `configure_retriever` is restrictive and provides no access to inner functionality

Refactor configure_retriever to a class

Increase Modularity

Idea:

import logging
import os
import tempfile

import torch
from langchain_community.document_loaders import UnstructuredFileLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

logger = logging.getLogger(__name__)


class RetrieverConfigurator:
    """
    A class for configuring a retriever object based on uploaded files.

    This class encapsulates the process of reading documents from uploaded files,
    splitting them into smaller chunks, creating embeddings for each chunk, and defining
    a retriever object that uses the FAISS vector database to search for similar documents.
    """

    def __init__(self, uploaded_files):
        """
        Initialize the RetrieverConfigurator object.

        Args:
            uploaded_files (list): A list of Streamlit uploaded file objects.
        """
        self.uploaded_files = uploaded_files
        self.docs = []
        self.chunks = []
        self.vectordb = None
        self.retriever = None

    def read_documents(self):
        """
        Reads the documents from the uploaded files.

        This method iterates over the uploaded files, writes them to temporary files,
        and loads the documents using the UnstructuredFileLoader.
        """
        temp_dir = tempfile.TemporaryDirectory()
        for file in self.uploaded_files:
            temp_filepath = os.path.join(temp_dir.name, file.name)
            with open(temp_filepath, "wb") as f:
                f.write(file.getvalue())
            loader = UnstructuredFileLoader(temp_filepath)
            self.docs.extend(loader.load())
            logger.info("Loaded document: %s", file.name)

    def split_documents(self):
        """
        Splits the loaded documents into smaller chunks.

        This method uses the RecursiveCharacterTextSplitter to split the documents
        into chunks based on the specified chunk size and overlap.
        """
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=75)
        self.chunks = text_splitter.split_documents(self.docs)

    def create_embeddings(self):
        """
        Creates embeddings for each chunk using HuggingFace's MiniLM model.

        This method initializes the HuggingFaceEmbeddings with the specified model
        and generates embeddings for the chunks. The embeddings are then stored
        in a FAISS vector database.
        """
        model_kwargs = {"device": "cuda" if torch.cuda.is_available() else "cpu"}
        embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2", model_kwargs=model_kwargs)
        self.vectordb = FAISS.from_documents(self.chunks, embeddings)

    def define_retriever(self):
        """
        Defines a retriever object that uses the FAISS vector database.

        This method creates a retriever object from the FAISS vector database,
        configuring it with the specified search type and parameters.
        """
        self.retriever = self.vectordb.as_retriever(search_type="mmr", search_kwargs={"k": 3, "fetch_k": 7})

    def configure_retriever(self):
        """
        Configures and returns a retriever object for a given list of uploaded files.

        This method orchestrates the process of reading documents, splitting them into chunks,
        creating embeddings, and defining a retriever object. The configured retriever
        object is then returned.

        Returns:
            retriever (Retriever): A retriever object that can be used to search for similar documents.
        """
        self.read_documents()
        self.split_documents()
        self.create_embeddings()
        self.define_retriever()
        return self.retriever

configurator = RetrieverConfigurator(uploaded_files)
retriever = configurator.configure_retriever()

`configure_retriever` cache_resource -> cache_data

From https://docs.streamlit.io/library/advanced-features/caching

st.cache_data is the recommended way to cache computations that return data: loading a DataFrame from CSV, transforming a NumPy array, querying an API, or any other function that returns a serializable data object (str, int, float, DataFrame, array, list, …). It creates a new copy of the data at each function call, making it safe against mutations and race conditions. The behavior of st.cache_data is what you want in most cases – so if you're unsure, start with st.cache_data and see if it works!

st.cache_resource is the recommended way to cache global resources like ML models or database connections – unserializable objects that you don't want to load multiple times. Using it, you can share these resources across all reruns and sessions of an app without copying or duplication. Note that any mutations to the cached return value directly mutate the object in the cache (more details below).
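A minimal sketch of the two caching options for `configure_retriever` (the decorator swap is the whole change; note that the docs quoted above recommend st.cache_resource for unserializable objects, and a retriever is typically unserializable, so the switch to st.cache_data may need verification):

import streamlit as st

# Current approach: cache the retriever as a shared, unserializable resource
@st.cache_resource
def configure_retriever(uploaded_files):
    ...

# Proposed approach: st.cache_data returns a fresh copy on each call, which
# is safer against mutation but requires a serializable return value
# @st.cache_data
# def configure_retriever(uploaded_files):
#     ...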

Gemini-Pro generates non-English text

During ConversationalRetrievalChain, when Gemini creates a retrieval question, it seems to pick a random language.

Fix:
  • AgentExecutor implementation
  • Custom prompt chaining (sketched below)
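A sketch of the custom-prompt option: override the chain's condense-question prompt so the standalone question is always generated in English (the prompt wording is illustrative; assumes `llm`, `retriever`, and `memory` are already defined):

from langchain_core.prompts import PromptTemplate
from langchain.chains import ConversationalRetrievalChain

# Force the standalone question to be written in English
condense_prompt = PromptTemplate.from_template(
    "Given the following conversation and a follow up question, rephrase the "
    "follow up question to be a standalone question, in English.\n\n"
    "Chat History:\n{chat_history}\n"
    "Follow Up Input: {question}\n"
    "Standalone question:"
)

qa_chain = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    memory=memory,
    condense_question_prompt=condense_prompt,
    verbose=True,
)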

Create Agent Executor w/ Tools + Memory

Idea inspired by the following code from here:

    from langchain.agents import AgentExecutor, ConversationalChatAgent

    chat_agent = ConversationalChatAgent.from_llm_and_tools(llm=llm, tools=tools)
    executor = AgentExecutor.from_agent_and_tools(
        agent=chat_agent,
        tools=tools,
        memory=memory,
        return_intermediate_steps=True,
        handle_parsing_errors=True,
    )

Note: By passing memory into the Agent Executor, we no longer have to worry about passing in the right key values to an invoke dictionary.
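A minimal usage sketch under that assumption (the memory object supplies the chat history, so only the new input is passed):

# With memory attached, the executor only needs the new user input
result = executor.invoke({"input": "What did we discuss earlier?"})
print(result["output"])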

replace `ConversationalRetrievalChain` with own chain

The following causes non-ChatGPT models to write standalone questions in non-English text:

# Create a chain that ties everything together
# (to be replaced with `create_history_aware_retriever`)
qa_chain = ConversationalRetrievalChain.from_llm(
    llm, retriever=retriever, memory=memory, verbose=True
)
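A sketch of the replacement, assuming `llm` and `retriever` are defined as elsewhere in the app; the prompt wording is illustrative:

from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Rephrase the follow-up into a standalone question, pinned to English
contextualize_q_prompt = ChatPromptTemplate.from_messages([
    ("system", "Given the chat history and the latest user question, "
               "rephrase the question as a standalone question, in English."),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

# Answer from the retrieved context
qa_prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the question using only the following context:\n\n{context}"),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)
combine_docs_chain = create_stuff_documents_chain(llm, qa_prompt)
qa_chain = create_retrieval_chain(history_aware_retriever, combine_docs_chain)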
