
vlassie / llama-2-cpu-inference


This project forked from kennethleungty/llama-2-open-source-llm-cpu-inference


Copy of the Llama 2 CPU inference project by Kenneth Leung, adjusted to add more features

Home Page: https://towardsdatascience.com/running-llama-2-on-cpu-inference-for-document-q-a-3d636037a3d8

License: MIT License

Python 100.00%

llama-2-cpu-inference's Introduction

Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A

Preface

This is a fork of Kenneth Leung's original repository, which adjusts the original code in several ways:

  • A Streamlit interface makes the app more user-friendly
  • Follow-up questions are now possible thanks to a memory implementation
  • Users can now choose between different models
  • Various other optimisations

Quickstart

  • Note: If you want to run this in an offline environment, read the following instructions first: Using offline embeddings

  • Ensure you have downloaded the model of your choice in GGUF format and placed it in the models/ folder

  • Fill the data/ folder with .pdf, .doc(x) or .txt files you want to ask questions about

  • To build a FAISS vector store from your files, open a terminal in the project directory and run the following command:
    python db_build.py

  • To start asking questions about your files, run the following command:
    streamlit run st_main.py

  • Choose which model to use for Q&A and adjust parameters to your liking
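Under the hood, the db_build.py step follows the usual ingest pipeline: load the documents in data/, split them into overlapping chunks, embed the chunks, and save a FAISS index. A stdlib-only sketch of the chunking stage (the function name and chunk sizes here are illustrative, not taken from the repo):

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks: a simplified stand-in for the
    recursive splitter typically used when building a FAISS vector store."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # slide forward, keeping some overlap
    return chunks

doc = "x" * 1200
chunks = split_text(doc)
print(len(chunks))  # → 3
```

Overlap between chunks helps the retriever return passages whose relevant sentence would otherwise be cut in half at a chunk boundary.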



Using offline embeddings

The required embedding model is normally downloaded when the application runs. This works for most use cases, but not when the application has to run without any connection to the internet at all.

In those cases, perform the following steps:

  1. Download the desired embedding files from https://sbert.net/models
    • This repo uses all-MiniLM-L6-v2.zip
    • Unzip to folder: sentence-transformers_all-MiniLM-L6-v2/
    • If you want to use different embeddings, you should adjust the folder name and the reference to it in db_build.py (line 74)
  2. Go to the .cache/ folder on your offline machine
    • Can be found in C:/Users/[User]/ for most Windows machines
  3. Within this folder, create torch/sentence_transformers/ if nonexistent
  4. Place embedding folder from step 1 inside of /sentence_transformers/

If all steps were performed correctly, the application will find the embeddings locally and will not attempt to download them.
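The steps above can be sketched as a quick check that the embedding folder sits where sentence-transformers expects it (the folder name matches step 1; the path layout is the one described in steps 2–4):

```python
from pathlib import Path

# Folder name from step 1; adjust it if you use different embeddings.
EMBEDDING_DIR = "sentence-transformers_all-MiniLM-L6-v2"

# Cache layout from steps 2-4: [home]/.cache/torch/sentence_transformers/
cache_path = Path.home() / ".cache" / "torch" / "sentence_transformers" / EMBEDDING_DIR

if cache_path.is_dir():
    print(f"Offline embeddings found at {cache_path}")
else:
    print(f"Missing: create {cache_path} and unzip the model there")
```

Running this before db_build.py tells you immediately whether the application will fall back to downloading.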


Tools

  • LangChain: Framework for developing applications powered by language models
  • LlamaCPP: Python bindings for llama.cpp, which runs transformer models implemented in C/C++
  • FAISS: Open-source library for efficient similarity search and clustering of dense vectors.
  • Sentence-Transformers (all-MiniLM-L6-v2): Open-source pre-trained transformer model for embedding text to a 384-dimensional dense vector space for tasks like clustering or semantic search.
  • Llama-2-7B-Chat: Open-source fine-tuned Llama 2 model designed for chat dialogue. Leverages publicly available instruction datasets and over 1 million human annotations.
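The FAISS and Sentence-Transformers entries above boil down to nearest-neighbour search over dense vectors. A stdlib-only miniature of that idea (3-dimensional toy vectors stand in for the 384-dimensional all-MiniLM-L6-v2 embeddings; the numbers are made up for illustration):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy "embeddings"; in the real app these come from all-MiniLM-L6-v2.
docs = {
    "revenue chapter": [0.9, 0.1, 0.0],
    "squad overview":  [0.1, 0.8, 0.2],
}
query = [0.85, 0.15, 0.05]  # embedding of "what was the revenue?"

best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
print(best)  # → revenue chapter
```

FAISS performs essentially this comparison, but over thousands of chunks with index structures that avoid scanning every vector.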

Files and Content

  • /assets: Images relevant to the project
  • /config: Configuration files for LLM application
  • /data: Dataset used for this project (i.e., Manchester United FC 2022 Annual Report - 177-page PDF document)
  • /models: Binary file of GGUF quantized LLM model (i.e., Llama-2-7B-Chat)
  • /src: Python code for the key components of the LLM application, namely llm.py, utils.py, prompts.py and classes.py
  • /vectorstore: FAISS vector store for documents
  • db_build.py: Python script to ingest dataset and generate FAISS vector store
  • db_clear.py: Python script to clear the previously built database
  • main.py: Main Python script to launch the application from the terminal
  • st_main.py: Main Python script to launch the application with streamlit visuals
  • st_upl.py: Python script to launch a version of the app to ask questions about uploaded PDFs
  • st_csv.py: Python script to launch a version of the app to ask questions about uploaded CSVs
  • requirements.txt: List of Python dependencies (and version)


llama-2-cpu-inference's People

Contributors: kennethleungty, vlassie, eltociear, seyedsaeidmasoumzadeh

Watchers: Victor Gevers

llama-2-cpu-inference's Issues

How to make this run on a GPU?

@Vlassie I have the following queries

  • Will it run on XLSX/XLS/CSV files?
  • How can it be made to run on a GPU?
  • Will it be able to answer queries in, say, under 25 seconds?

Fix 'double' prompt

Right now the code concatenates two different prompts, one from st_main.py and one from prompts.py.

Only one prompt should be sent to the model.
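A likely shape of the fix, sketched with stdlib string formatting (the template text and variable names are illustrative; the actual prompts live in prompts.py):

```python
# Illustrative: a single template shared by the Streamlit front-end and the
# QA chain, so that exactly one prompt reaches the model.
QA_TEMPLATE = """Use the following context to answer the question.

Context: {context}
Question: {question}
Answer:"""

prompt = QA_TEMPLATE.format(context="Revenue was 583m GBP.",
                            question="What was the revenue?")
assert prompt.count("Question:") == 1  # no duplicated instructions
print(prompt.splitlines()[0])
```

Keeping the template in one module and importing it everywhere else removes the risk of two layers each prepending their own instructions.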

Adjust readme

The current readme file is still based on the original fork. A lot of things have changed, especially the instructions for using the program.

Fix memory issue

Right now, the chatbot only retains the context of the first exchange. After clearing the chat history, the context should reset to empty.
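The intended behaviour can be illustrated with a minimal chat-memory sketch (this class is hypothetical, not the repo's implementation, which relies on LangChain's memory objects):

```python
class ChatMemory:
    """Minimal stand-in for the app's conversation memory."""

    def __init__(self):
        self.turns: list[tuple[str, str]] = []

    def add(self, question: str, answer: str) -> None:
        """Record one question/answer exchange."""
        self.turns.append((question, answer))

    def clear(self) -> None:
        """Clearing chat history must reset the model's context to empty."""
        self.turns = []

memory = ChatMemory()
memory.add("What was the revenue?", "583m GBP.")
memory.clear()
print(len(memory.turns))  # → 0
```

The bug described above corresponds to `clear()` not being called (or not emptying the list) when the user resets the chat, so stale context keeps being fed to the model.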
