
Codefuse-ModelCache

中文 | English


News

  • 🔥🔥 [2024.04.09] Added Redis Search to store and retrieve embeddings in multi-tenant scenarios; this cuts the interaction time between the cache and the vector database to 10 ms.
  • 🔥🔥 [2023.12.10] Integrated LLM embedding frameworks such as 'llmEmb', 'ONNX', 'PaddleNLP', and 'FastText', along with the image embedding framework 'timm', to bolster embedding functionality.
  • 🔥🔥 [2023.11.20] Integrated local storage options such as SQLite and FAISS, so users can quickly spin up tests.
  • [2023.08.26] codefuse-ModelCache...

Introduction

Codefuse-ModelCache is a semantic cache for large language models (LLMs). By caching pre-generated model results, it reduces response time for similar requests and improves user experience.
This project aims to optimize LLM services by introducing a caching mechanism. It helps businesses and research institutions reduce the cost of inference deployment, improve model performance and efficiency, and provide scalable services for large models. Through open source, we aim to share and exchange technologies related to LLM semantic caching.

Quick Deployment

The project provides two startup scripts: flask4modelcache.py and flask4modelcache_demo.py.

  • flask4modelcache_demo.py is a quick-test service with embedded SQLite and FAISS; users do not need to set up any databases.
  • flask4modelcache.py is the standard service, which requires configured MySQL and Milvus database services.

Dependencies

  • Python version: 3.8 and above
  • Package Installation
pip install -r requirements.txt 

Service Startup

Demo Service Startup

  1. Download the embedding model bin file from the following address: https://huggingface.co/shibing624/text2vec-base-chinese/tree/main. Place the downloaded bin file in the model/text2vec-base-chinese folder.
  2. Start the backend service using the flask4modelcache_demo.py script.
cd CodeFuse-ModelCache
python flask4modelcache_demo.py

Normal Service Startup

Before starting the service, the following environment configurations should be performed:

  1. Install the relational database MySQL and import the SQL file to create the data tables. The SQL file can be found at: reference_doc/create_table.sql
  2. Install the vector database Milvus.
  3. Add the database access information to the configuration files (see the illustrative sketch after this list):
    1. modelcache/config/milvus_config.ini
    2. modelcache/config/mysql_config.ini
  4. Download the embedding model bin file from the following address: https://huggingface.co/shibing624/text2vec-base-chinese/tree/main. Place the downloaded bin file in the model/text2vec-base-chinese folder.
  5. Start the backend service using the flask4modelcache.py script.
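
For illustration only, a minimal mysql_config.ini might look like the sketch below. The actual section and key names are defined by the project's configuration files, so treat every name and value here as a placeholder:

; Hypothetical sketch of modelcache/config/mysql_config.ini;
; check the file shipped with the repository for the real keys.
[mysql]
host = 127.0.0.1
port = 3306
username = modelcache
password = your_password
database = modelcache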

Service-Access

The current service provides three core functions through a RESTful API: Cache-Writing, Cache-Querying, and Cache-Clearing. Demos:

Cache-Writing

import json
import requests

url = 'http://127.0.0.1:5000/modelcache'
request_type = 'insert'  # renamed from `type` to avoid shadowing the built-in
scope = {"model": "CODEGPT-1008"}
chat_info = [{"query": [{"role": "system", "content": "You are an AI code assistant and you must provide neutral and harmless answers to help users solve code-related problems."},
                        {"role": "user", "content": "Who are you?"}],
              "answer": "Hello, I am an intelligent assistant. How can I assist you?"}]
data = {'type': request_type, 'scope': scope, 'chat_info': chat_info}
headers = {"Content-Type": "application/json"}
# The demo sends a JSON-encoded string as the body, matching the service's expected format.
res = requests.post(url, headers=headers, json=json.dumps(data))

Cache-Querying

import json
import requests

url = 'http://127.0.0.1:5000/modelcache'
request_type = 'query'  # renamed from `type` to avoid shadowing the built-in
scope = {"model": "CODEGPT-1008"}
query = [{"role": "system", "content": "You are an AI code assistant and you must provide neutral and harmless answers to help users solve code-related problems."},
         {"role": "user", "content": "Who are you?"}]
data = {'type': request_type, 'scope': scope, 'query': query}
headers = {"Content-Type": "application/json"}
res = requests.post(url, headers=headers, json=json.dumps(data))

Cache-Clearing

import json
import requests

url = 'http://127.0.0.1:5000/modelcache'
request_type = 'remove'  # renamed from `type` to avoid shadowing the built-in
scope = {"model": "CODEGPT-1008"}
remove_type = 'truncate_by_model'
data = {'type': request_type, 'scope': scope, 'remove_type': remove_type}
headers = {"Content-Type": "application/json"}
res = requests.post(url, headers=headers, json=json.dumps(data))
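
The response schema is not documented in this README, so as a minimal sanity check you can inspect the raw reply from any of the three demos above:

# Minimal check of the service reply; the response schema is not
# documented here, so we only print the status code and raw body.
print(res.status_code)
print(res.text)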

Articles

https://mp.weixin.qq.com/s/ExIRu2o7yvXa6nNLZcCfhQ

Modules

(Architecture diagram: ModelCache modules)

Function-Comparison

In terms of functionality, we have made several changes relative to the original repository. First, we addressed Hugging Face network issues and improved inference speed by introducing local inference for embeddings. Second, given the limitations of the SQLAlchemy framework, we completely revamped the module that interacts with relational databases, enabling more flexible database operations. In practice, LLM products often need to serve multiple users and multiple models, so we added multi-tenancy support to ModelCache, along with preliminary compatibility for system commands and multi-turn dialogue.

| Module | Function |
| --- | --- |
| Basic Interface | Data query interface |
| | Data writing interface |
| Embedding | Embedding model configuration |
| | Large model embedding layer |
| | BERT model long text processing |
| Large model invocation | Decoupling from large models |
| | Local loading of embedding model |
| Data isolation | Model data isolation |
| | Hyperparameter isolation |
| Databases | MySQL |
| | Milvus |
| | OceanBase |
| Session management | Single-turn dialogue |
| | System commands |
| | Multi-turn dialogue |
| Data management | Data persistence |
| | One-click cache clearance |
| Tenant management | Support for multi-tenancy |
| | Milvus multi-collection capability |
| Other | Long-short dialogue distinction |

Core-Features

In ModelCache, we adopted the main ideas of GPTCache, including its core modules: adapter, embedding, rank (similarity evaluation), and data_manager. The adapter module handles the business logic of the various tasks and connects the embedding, rank, and data_manager modules. The embedding module converts text into semantic vector representations, transforming user queries into vector form. The rank module sorts and evaluates the similarity of the recalled vectors. The data_manager module manages the databases.
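
As a conceptual sketch (not the project's actual code; every name below is an illustrative placeholder), the query path through these modules looks roughly like this:

def query_cache(query_text, model):
    vec = embed(query_text)                      # embedding module: text -> vector
    candidates = vector_db.search(vec, top_k=5)  # data_manager: recall nearest vectors
    best = rank(query_text, candidates)          # rank module: similarity evaluation
    if best is not None and best.score >= threshold:
        return relational_db.get_answer(best.id) # cache hit: return the stored answer
    return None                                  # cache miss: call the LLM and write back

To better support industrial applications, we have made the following architectural and functional upgrades: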

  • We modeled ModelCache on Redis, embedding it alongside LLM products to provide semantic caching without interfering with LLM calls, security audits, or other functionality, which keeps it compatible with all large-model services.
  • Multiple Model Loading Schemes:
    • Support loading local embedding models to address Hugging Face network connectivity issues.
    • Support loading various pretrained model embedding layers.
  • Data Isolation Capability
    • Environment Isolation: Can pull different database configurations based on the environment to achieve environment isolation (dev, prepub, prod).
    • Multi-tenant Data Isolation: Dynamically create collections based on the model for data isolation, addressing data isolation issues in multi-model/services scenarios in LLMs products.
  • Support for System Commands: Adopting a concatenation approach to address the issue of system commands in the prompt format.
  • Differentiation of Long and Short Texts: Long texts pose more challenges for similarity evaluation, so we distinguish between long and short texts and allow separate similarity thresholds for each (see the sketch after this list).
  • Milvus Performance Optimization: The consistency_level of Milvus has been adjusted to "Session" level, which can result in better performance.
  • Data Management Capability:
    • Ability to clear the cache, used for data management after model upgrades.
    • Hit-query recall for subsequent data analysis and model iteration.
    • Asynchronous log write-back capability for data analysis and statistics.
    • Added model field and data statistics field for feature expansion.
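
The long/short threshold differentiation can be pictured with a minimal sketch; the cutoff and threshold values below are hypothetical, not the project's defaults:

LONG_TEXT_MIN_CHARS = 512     # hypothetical cutoff between "short" and "long"
SHORT_TEXT_THRESHOLD = 0.90   # hypothetical similarity threshold for short texts
LONG_TEXT_THRESHOLD = 0.95    # hypothetical similarity threshold for long texts

def similarity_threshold(text: str) -> float:
    # Long texts are harder to judge, so they get their own threshold.
    if len(text) >= LONG_TEXT_MIN_CHARS:
        return LONG_TEXT_THRESHOLD
    return SHORT_TEXT_THRESHOLD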

Todo List

Adapter

  • Register adapter for Milvus: based on the "model" parameter in the scope, initialize the corresponding Collection and perform the load operation (a hedged sketch follows).
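
A minimal sketch of that behavior using pymilvus; the collection naming convention here is an assumption, not the project's actual scheme:

from pymilvus import Collection, connections, utility

def load_collection_for_model(model: str):
    # Connect to Milvus and load the collection that backs the given model.
    connections.connect(host="127.0.0.1", port="19530")
    name = f"modelcache_{model.lower()}"  # hypothetical naming convention
    if utility.has_collection(name):
        collection = Collection(name)
        collection.load()  # load into memory so the collection is searchable
        return collection
    return None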

Embedding Model & Inference

  • Inference Optimization: Optimizing the speed of embedding inference, compatible with inference engines such as FasterTransformer, TurboTransformers, and ByteTransformer.
  • Compatibility with Hugging Face models and ModelScope models, offering more methods for model loading.

Scalar Storage

  • Support MongoDB
  • Support ElasticSearch

Vector Storage

  • Adapt FAISS storage to multimodal scenarios.

Ranking Capability

  • Add ranking model to refine the order of data after embedding recall.

Service

  • Support FastAPI.
  • Add a visual interface to offer a more intuitive user experience.

Acknowledgements

This project has referenced the following open-source projects. We would like to express our gratitude to the projects and their developers for their contributions and research.
GPTCache

Contributing

ModelCache is a captivating and invaluable project. Whether you are an experienced developer or a novice just starting out, your contributions are warmly welcomed. Your involvement, whether through raising issues, offering suggestions, writing code, or creating documentation and examples, will improve the project's quality and make a significant contribution to the open-source community.

Contributors

peng3307165, powerli2002

codefuse-modelcache's Issues

Is the cache based on the prompt?

I've read the article but still don't understand how the cache optimizes things.
Question 1: Is the cache keyed on the query or on the full prompt?
Question 2: Are the vectors stored in FAISS historical queries? If so, how are cases handled where the same query should yield an updated answer over time? For example, when asking what time it is now, the model should recompute the answer each time.
Question 3: How should the branch in the figure below be understood? Could you give an example?
(attached image omitted)

[Feature: Ranking ability] Add ranking model to refine the order of data after embedding recall

This issue is created to better track my PRs for the Todo List item [Ranking capability].

Background

Efficiently retrieving relevant results from large-scale datasets plays a crucial role in software development and Q&A tasks, including code recommendation and vulnerability code clone discovery. Existing search approaches are difficult to scale up while maintaining the effectiveness of the retrieved results.

Prototype

In this paper, we propose a two-stage search framework to address these limitations and achieve both effectiveness and efficiency in the search scenario. We will add a ranking model to refine the order of data after embedding recall.
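
One possible shape for the second stage, using a cross-encoder from sentence-transformers; the model choice and wiring are assumptions for illustration, not this repository's code:

# Two-stage sketch: embedding-based recall first, then cross-encoder reranking.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query, candidates):
    # candidates: texts returned by the embedding recall stage
    scores = reranker.predict([(query, c) for c in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)
    return [c for _, c in ranked]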


Task

  • Add ranking model to refine the order of data after embedding recall.

Many thanks to Ant Group for open-sourcing the code model

I have a few questions. I am using the CodeFuse-CodeLlama-34B-4bits model and am not familiar with auto_gptq, so I do not know how to turn it into an openai_api-style service. Could you provide an example of invoking the model with AutoTokenizer and AutoModelForCausalLM?
Separately, when I ask:
"You are a data analyst; please use Python for data analysis. I have provided the file titanic.csv at /mnt/bst/. Please analyze this file: read it first, and write out the Python code."
the answer is:
The generated text is: First, we need to import the necessary libraries. We will use the pandas, matplotlib, and seaborn libraries.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Then we read the file.

df = pd.read_csv('/mnt/bst/titanic.csv')

Next, we look at the first few rows of the data.

print(df.head())

Then we look at the summary statistics.

print(df.describe())

Next, the data types.

print(df.dtypes)

Next, we check for missing values.

print(df.isnull().sum())

Next, the number of unique values per column.

print(df.nunique())

Next, the distributions.

df.hist(bins=50, figsize=(20,15))
plt.show()

Finally, the correlations.

corr_matrix = df.corr()
sns.heatmap(corr_matrix, annot=True)
plt.show()

These are some basic data analysis steps; you can extend the analysis as needed.

The answer comes back in segments. How should I write the prompt so the model produces one complete, immediately runnable block of code? Many thanks.
