
vlassie / llama-2-cpu-inference


This project forked from kennethleungty/llama-2-open-source-llm-cpu-inference


Copy of the Llama 2 CPU inference project by Kenneth Leung, adjusted to add more features

Home Page: https://towardsdatascience.com/running-llama-2-on-cpu-inference-for-document-q-a-3d636037a3d8

License: MIT License

Python 100.00%

llama-2-cpu-inference's Introduction

Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A

Preface

This is a fork of Kenneth Leung's original repository, which adjusts the original code in several ways:

  • A Streamlit interface makes the app more user-friendly
  • Follow-up questions are now possible thanks to a memory implementation
  • Users can now choose between different models
  • Various other optimisations

Quickstart

  • Note: If you want to run this in an offline environment, read the following instructions first: Using offline embeddings

  • Ensure you have downloaded the model of your choice in GGUF format and placed it in the models/ folder

  • Fill the data/ folder with .pdf, .doc(x) or .txt files you want to ask questions about

  • To build a FAISS vector store from your files, open a terminal in the project directory and run the following command:
    python db_build.py

  • To start asking questions about your files, run the following command:
    streamlit run st_main.py

  • Choose which model to use for Q&A and adjust parameters to your liking
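Under the hood, the db_build.py step follows the usual ingest pipeline: load the documents in data/, split them into overlapping chunks, embed the chunks, and save a FAISS index. A stdlib-only sketch of the chunking stage (the function name and chunk sizes here are illustrative, not taken from the repo):

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks: a simplified stand-in for the
    recursive splitter typically used when building a FAISS vector store."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # slide forward, keeping some overlap
    return chunks

doc = "x" * 1200
chunks = split_text(doc)
print(len(chunks))  # → 3
```

Overlap between chunks helps the retriever return passages whose relevant sentence would otherwise be cut in half at a chunk boundary.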



Using offline embeddings

The required embedding model is normally downloaded when the application runs. This works for most use cases, but not when the application has to run without any connection to the internet at all.

In those cases, perform the following steps:

  1. Download the desired embedding files from https://sbert.net/models
    • This repo uses all-MiniLM-L6-v2.zip
    • Unzip to folder: sentence-transformers_all-MiniLM-L6-v2/
    • If you want to use different embeddings, you should adjust the folder name and the reference to it in db_build.py (line 74)
  2. Go to the .cache/ folder on your offline machine
    • Can be found in C:/Users/[User]/ for most Windows machines
  3. Within this folder, create torch/sentence_transformers/ if nonexistent
  4. Place embedding folder from step 1 inside of /sentence_transformers/

If all steps were performed correctly, the application will find the embeddings locally and will not attempt to download them.
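The steps above can be sketched as a quick check that the embedding folder sits where sentence-transformers expects it (the folder name matches step 1; the path layout is the one described in steps 2–4):

```python
from pathlib import Path

# Folder name from step 1; adjust it if you use different embeddings.
EMBEDDING_DIR = "sentence-transformers_all-MiniLM-L6-v2"

# Cache layout from steps 2-4: [home]/.cache/torch/sentence_transformers/
cache_path = Path.home() / ".cache" / "torch" / "sentence_transformers" / EMBEDDING_DIR

if cache_path.is_dir():
    print(f"Offline embeddings found at {cache_path}")
else:
    print(f"Missing: create {cache_path} and unzip the model there")
```

Running this before db_build.py tells you immediately whether the application will fall back to downloading.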


Tools

  • LangChain: Framework for developing applications powered by language models
  • LlamaCPP: Python bindings for llama.cpp, which runs transformer models implemented in C/C++
  • FAISS: Open-source library for efficient similarity search and clustering of dense vectors.
  • Sentence-Transformers (all-MiniLM-L6-v2): Open-source pre-trained transformer model for embedding text to a 384-dimensional dense vector space for tasks like clustering or semantic search.
  • Llama-2-7B-Chat: Open-source fine-tuned Llama 2 model designed for chat dialogue. Leverages publicly available instruction datasets and over 1 million human annotations.
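The FAISS and Sentence-Transformers entries above boil down to nearest-neighbour search over dense vectors. A stdlib-only miniature of that idea (3-dimensional toy vectors stand in for the 384-dimensional all-MiniLM-L6-v2 embeddings; the numbers are made up for illustration):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy "embeddings"; in the real app these come from all-MiniLM-L6-v2.
docs = {
    "revenue chapter": [0.9, 0.1, 0.0],
    "squad overview":  [0.1, 0.8, 0.2],
}
query = [0.85, 0.15, 0.05]  # embedding of "what was the revenue?"

best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
print(best)  # → revenue chapter
```

FAISS performs essentially this comparison, but over thousands of chunks with index structures that avoid scanning every vector.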

Files and Content

  • /assets: Images relevant to the project
  • /config: Configuration files for LLM application
  • /data: Dataset used for this project (i.e., Manchester United FC 2022 Annual Report - 177-page PDF document)
  • /models: Binary file of GGUF quantized LLM model (i.e., Llama-2-7B-Chat)
  • /src: Python code for the key components of the LLM application, namely llm.py, utils.py, prompts.py and classes.py
  • /vectorstore: FAISS vector store for documents
  • db_build.py: Python script to ingest dataset and generate FAISS vector store
  • db_clear.py: Python script to clear the previously built database
  • main.py: Main Python script to launch the application from the terminal
  • st_main.py: Main Python script to launch the application with streamlit visuals
  • st_upl.py: Python script to launch a version of the app to ask questions about uploaded PDFs
  • st_csv.py: Python script to launch a version of the app to ask questions about uploaded CSVs
  • requirements.txt: List of Python dependencies (and version)


llama-2-cpu-inference's People

Contributors: kennethleungty, vlassie, eltociear, seyedsaeidmasoumzadeh

Watchers: Victor Gevers

llama-2-cpu-inference's Issues

How to make this run on a GPU?

@Vlassie I have the following queries

  • Will it run on XLSX/XLS/CSV files?
  • How can it be made to run on a GPU?
  • Will it be able to answer queries in, say, under 25 seconds?

Fix 'double' prompt

Right now the code concatenates two different prompts, one from st_main.py and one from prompts.py.

Only one prompt should be sent to the model.
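A likely shape of the fix, sketched with stdlib string formatting (the template text and variable names are illustrative; the actual prompts live in prompts.py):

```python
# Illustrative: a single template shared by the Streamlit front-end and the
# QA chain, so that exactly one prompt reaches the model.
QA_TEMPLATE = """Use the following context to answer the question.

Context: {context}
Question: {question}
Answer:"""

prompt = QA_TEMPLATE.format(context="Revenue was 583m GBP.",
                            question="What was the revenue?")
assert prompt.count("Question:") == 1  # no duplicated instructions
print(prompt.splitlines()[0])
```

Keeping the template in one module and importing it everywhere else removes the risk of two layers each prepending their own instructions.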

Adjust readme

The current readme file is still based on the original fork. A lot of things have changed, especially the instructions for using the program.

Fix memory issue

Right now, the chatbot only retains the context of the first exchange. After clearing the chat history, the context should reset to empty.
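The intended behaviour can be illustrated with a minimal chat-memory sketch (this class is hypothetical, not the repo's implementation, which relies on LangChain's memory objects):

```python
class ChatMemory:
    """Minimal stand-in for the app's conversation memory."""

    def __init__(self):
        self.turns: list[tuple[str, str]] = []

    def add(self, question: str, answer: str) -> None:
        """Record one question/answer exchange."""
        self.turns.append((question, answer))

    def clear(self) -> None:
        """Clearing chat history must reset the model's context to empty."""
        self.turns = []

memory = ChatMemory()
memory.add("What was the revenue?", "583m GBP.")
memory.clear()
print(len(memory.turns))  # → 0
```

The bug described above corresponds to `clear()` not being called (or not emptying the list) when the user resets the chat, so stale context keeps being fed to the model.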
