
pnkvalavala / repochat


Chatbot assistant enabling GitHub repository interaction using LLMs with Retrieval Augmented Generation

Home Page: https://repochat.streamlit.app

License: Apache License 2.0

Python 100.00%
chat-application deeplake github langchain openai streamlit code-llama huggingface retrieval-augmented-generation


Repochat - GitHub Repository Interactive Chatbot

Repochat is an interactive chatbot project designed to engage in conversations about GitHub repositories using a Large Language Model (LLM). It allows users to have meaningful discussions, ask questions, and retrieve relevant information from a GitHub repository. This README provides step-by-step instructions for setting up and using Repochat on your local machine.



Branches

Repochat offers two branches with distinct functionalities:

Main Branch

The main branch of Repochat is designed to run entirely on your local machine. This version of Repochat doesn't rely on external API calls and offers greater control over your data and processing. If you're looking for a self-contained solution, the main branch is the way to go.

Cloud Branch

The cloud branch of Repochat primarily relies on API calls to external services for model inference and storage. It's well-suited for those who prefer a cloud-based solution and don't want to set up a local environment.

Installation

To get started with Repochat, you'll need to follow these installation steps:

  1. Create and activate a virtual environment on your local machine to isolate the project's dependencies (on Windows, activate it with repochat-env\Scripts\activate instead).

    python -m venv repochat-env
    source repochat-env/bin/activate
  2. Clone the Repochat repository and navigate to the project directory.

    git clone https://github.com/pnkvalavala/repochat.git
    cd repochat
  3. Install the required Python packages using pip.

    pip install -r requirements.txt
  4. Install the "llama-cpp-python" library.

    Installation without Hardware Acceleration

    pip install llama-cpp-python

    Installation with Hardware Acceleration

    llama.cpp supports multiple BLAS backends for faster processing.

    To install with OpenBLAS, pass the LLAMA_BLAS and LLAMA_BLAS_VENDOR CMake options through the CMAKE_ARGS environment variable:

    CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python

    To install with cuBLAS (NVIDIA GPUs), pass the LLAMA_CUBLAS option through CMAKE_ARGS:

    CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python

    To install with CLBlast, pass the LLAMA_CLBLAST option through CMAKE_ARGS:

    CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python

    To install with Metal (macOS), pass the LLAMA_METAL option through CMAKE_ARGS:

    CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python

    To install with hipBLAS / ROCm support for AMD cards, pass the LLAMA_HIPBLAS option through CMAKE_ARGS:

    CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python

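    For more on hardware acceleration, refer to the official llama-cpp-python README. The option names above apply to the llama-cpp-python releases available when this guide was written; newer releases have renamed some of them, so check the README for the version you install.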

  5. Create a folder named models in the project directory.

  6. Download a language model from the Hugging Face Model Hub that suits your computer's capabilities. A good starting point is TheBloke/CodeLlama-7B-GGUF. If you want to quantize a model available on Hugging Face yourself, follow the instructions in the llama.cpp repository.

  7. Copy the downloaded model file to the "models" folder.

  8. Open the models.py file located in the "repochat" folder and set the model file location in the code_llama() function as follows:

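    # LangChain 0.0.x import paths; newer releases moved these classes
    # to langchain_community.llms and langchain_core.callbacks
    from langchain.callbacks.manager import CallbackManager
    from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
    from langchain.llms import LlamaCpp

    def code_llama():
        # Stream generated tokens to stdout as they are produced
        callbackmanager = CallbackManager([StreamingStdOutCallbackHandler()])
        llm = LlamaCpp(
            model_path="./models/codellama-7b.Q4_K_M.gguf",  # GGUF file copied in step 7
            n_ctx=2048,       # context window size in tokens
            max_tokens=200,   # maximum number of tokens generated per response
            n_gpu_layers=1,   # layers offloaded to the GPU (needs an accelerated build)
            f16_kv=True,      # keep the key/value cache in half precision
            callback_manager=callbackmanager,
            verbose=True,     # enable verbose logging
            use_mlock=True    # ask the OS to keep the model in RAM rather than swap it out
        )
        return llm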
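Before launching the app, it can help to confirm that steps 5 to 7 left a model file where code_llama() expects one. The helper below is a hypothetical sketch, not part of Repochat: it assumes the models folder sits in the current working directory and simply returns the first .gguf file it finds there, sorted by name.

```python
from pathlib import Path


def find_gguf_model(models_dir: str = "models") -> Path:
    """Return the first .gguf file (by name) in models_dir, or raise FileNotFoundError.

    Hypothetical setup check; this function is not part of Repochat.
    """
    directory = Path(models_dir)
    if not directory.is_dir():
        raise FileNotFoundError(f"Model folder not found: {directory.resolve()}")
    # Sort so the result is deterministic when several models are present
    candidates = sorted(directory.glob("*.gguf"))
    if not candidates:
        raise FileNotFoundError(f"No .gguf file found in {directory.resolve()}")
    return candidates[0]
```

You could then pass str(find_gguf_model()) as model_path instead of hard-coding the file name; whether that is convenient depends on how many models you keep in the folder.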

Usage

  1. Open your terminal and run the following command to start the Repochat application:

    streamlit run app.py
  2. Enter the URL of the GitHub repository you want to chat about.

  3. Repochat clones the repository into a folder named "cloned_repo", splits its files into smaller chunks, and computes an embedding for each chunk with the Sentence Transformers model sentence-transformers/all-mpnet-base-v2.

  4. The embeddings are stored locally in a vector database called ChromaDB.
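The split, embed, and search idea behind steps 3 and 4 can be sketched without any dependencies. This is an illustration only, not Repochat's code: the README names sentence-transformers/all-mpnet-base-v2 and ChromaDB, whereas here a bag-of-words counter stands in for the embedding model, a plain list stands in for the vector database, and the chunk size and overlap values are assumptions.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into fixed-size character chunks; consecutive chunks share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining text is already covered by the previous chunk's overlap
    return [text[start:start + chunk_size] for start in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase word counts (NOT all-mpnet-base-v2)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query vector."""
    query_vector = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine_similarity(query_vector, embed(chunk)), reverse=True)
    return ranked[:k]
```

The overlap keeps a function or sentence that straddles a chunk boundary visible in at least one chunk; a real embedding model would also match chunks that share meaning but not exact words.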

Chatbot Functionality

Repochat lets you hold a conversation with the chatbot. When you ask a question or provide input, the chatbot retrieves relevant documents from the vector database and sends your input, along with those documents, to the language model to generate a response. By default, I've set the model to "codellama-7b-instruct", but you can change it to suit your computer's speed; you can even try a quantized 13B model.
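The retrieve-then-generate step amounts to assembling one prompt from the retrieved documents and your input. The sketch below is hypothetical: the build_rag_prompt name, the template wording, and the character budget are assumptions, not Repochat's actual prompt. The budget is a crude guard because the model's context window is small (n_ctx=2048 tokens in the configuration above), and characters only roughly track tokens.

```python
def build_rag_prompt(question: str, retrieved_docs: list[str], max_context_chars: int = 3000) -> str:
    """Combine retrieved chunks and the user's question into a single prompt string."""
    context_parts = []
    used = 0
    for doc in retrieved_docs:
        # Stop adding chunks once the rough character budget would be exceeded
        if used + len(doc) > max_context_chars:
            break
        context_parts.append(doc)
        used += len(doc)
    context = "\n---\n".join(context_parts)
    return (
        "Use the following code snippets from the repository to answer the question.\n"
        "If the snippets do not contain the answer, say that you don't know.\n\n"
        f"Snippets:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Placing the question after the snippets keeps it closest to where the model starts generating.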

The chatbot retains memory during the conversation to provide contextually relevant responses.
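The README does not say which memory mechanism Repochat uses. One common approach is to keep a fixed-size window of recent turns and prepend it to the next prompt; the ConversationWindow class below is a hypothetical sketch of that idea, and the window size is an assumption.

```python
from collections import deque


class ConversationWindow:
    """Keep only the last `max_turns` question/answer pairs (hypothetical helper)."""

    def __init__(self, max_turns: int = 3) -> None:
        self.turns: deque[tuple[str, str]] = deque(maxlen=max_turns)

    def add_turn(self, question: str, answer: str) -> None:
        # Once the deque is full, appending drops the oldest turn automatically
        self.turns.append((question, answer))

    def as_text(self) -> str:
        """Format the stored turns so they can be prepended to the next prompt."""
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.turns)
```

A bounded window trades long-range recall for a predictable prompt size, which matters with a 2048-token context.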

Raising Issues

If you encounter any issues, have suggestions, or want to report a bug, please visit the Issues section of the Repochat repository and create a new issue. Provide detailed information about the problem you're facing, and I'll do my best to assist you.

License

This project is licensed under the Apache License 2.0. For details, see the LICENSE file. Note that this differs from the project's previous license, so please review the terms and conditions of the new one.


repochat's Issues

strip off extra .git if repo url ends with it

The clone_repo function always appends ".git" to the URL:

command = f'git clone {git_url}.git {repo_path} && rm -rf {repo_path}/.git'

But the HTTPS URL copied from GitHub's Code button always ends with .git, e.g. https://github.com/pnkvalavala/repochat.git, so the command that actually runs is:
git clone https://github.com/pnkvalavala/repochat.git.git ./cloned_repo && rm -rf ./cloned_repo/.git
GitHub answers that URL with 404 Not Found, so git assumes it is a private repo and prompts for credentials on the command line (not in the front end), and the clone never completes.

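I think simply stripping a trailing ".git" before building the command would be an elegant solution, so people can paste the URL into the front end with or without .git and it will work either way. Note that rstrip(".git") would not do this correctly: it strips any trailing ".", "g", "i", and "t" characters, so https://github.com/pnkvalavala/repochat.git would become https://github.com/pnkvalavala/repocha. str.removesuffix (Python 3.9+) removes only the exact suffix:

git_url = git_url.removesuffix(".git")
command = f'git clone {git_url}.git {repo_path} && rm -rf {repo_path}/.git'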
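A minimal sketch of the same fix, under a few assumptions: the function names are hypothetical, the clone target defaults to ./cloned_repo as in the command above, and git runs through subprocess with an argument list rather than a shell string. Setting git's GIT_TERMINAL_PROMPT environment variable to 0 makes git fail instead of waiting for credentials on the terminal, which avoids the hang described above.

```python
import os
import shutil
import subprocess
from pathlib import Path


def normalize_repo_url(git_url: str) -> str:
    """Strip whitespace, a trailing slash, and at most one trailing ".git"."""
    url = git_url.strip().rstrip("/")
    # removesuffix (Python 3.9+) drops only the exact ".git" suffix
    return url.removesuffix(".git")


def clone_command(git_url: str, repo_path: str = "./cloned_repo") -> list[str]:
    """Build the git clone arguments as a list (hypothetical helper)."""
    return ["git", "clone", f"{normalize_repo_url(git_url)}.git", repo_path]


def run_clone(git_url: str, repo_path: str = "./cloned_repo") -> None:
    """Clone the repository, then delete its .git folder like the original command."""
    # GIT_TERMINAL_PROMPT=0 makes git exit with an error instead of prompting for credentials
    env = {**os.environ, "GIT_TERMINAL_PROMPT": "0"}
    subprocess.run(clone_command(git_url, repo_path), check=True, env=env)
    shutil.rmtree(Path(repo_path) / ".git", ignore_errors=True)
```

Passing an argument list also keeps a user-supplied URL from being interpreted by the shell.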

Private GitHub repos?

Is there currently a way to pull from private repos, assuming I'm authenticated with the GitHub CLI (gh)?
