zilliztech / gptcache

Semantic cache for LLMs. Fully integrated with LangChain and llama_index.

Home Page: https://gptcache.readthedocs.io

License: MIT License

Python 99.66% Makefile 0.10% Shell 0.21% Dockerfile 0.03%
chatbot chatgpt chatgpt-api llm milvus similarity-search vector-search aigc openai memcache

gptcache's Introduction

GPTCache : A Library for Creating Semantic Cache for LLM Queries

Slash Your LLM API Costs by 10x 💰, Boost Speed by 100x ⚡

Release pip download Codecov License Twitter Discord

🎉 GPTCache has been fully integrated with 🦜️🔗LangChain! Here are detailed usage instructions.

🐳 The GPTCache server Docker image has been released, which means that applications written in any language can use GPTCache!
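For example, once the server is running (the issues further down this page show it being started with something like docker run -p 8000:8000 zilliz/gptcache:latest gptcache_server -s 0.0.0.0 -p 8000), any HTTP client can talk to it. The sketch below mirrors the curl PUT/GET calls quoted in those issues; treat the endpoint shape as an assumption and check the server docs for your version.

import urllib.parse
import urllib.request

BASE = "http://localhost:8000"  # assumed host/port of a running gptcache_server

def put_answer(prompt, answer):
    # Mirrors: curl -X PUT -d "<answer>" "http://localhost:8000?prompt=<prompt>"
    url = f"{BASE}?prompt={urllib.parse.quote(prompt)}"
    req = urllib.request.Request(url, data=answer.encode("utf-8"), method="PUT")
    urllib.request.urlopen(req)

def get_answer(prompt):
    # Mirrors: curl -X GET "http://localhost:8000?prompt=<prompt>"
    url = f"{BASE}?prompt={urllib.parse.quote(prompt)}"
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")  # raw response body

put_answer("hello", "receive a hello message")
print(get_answer("hello"))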

📔 This project is under rapid development, and as such, the API may change at any time. For the most up-to-date information, please refer to the latest documentation and release notes.

NOTE: As the number of large models is growing explosively and their API shapes are constantly evolving, we no longer add support for new APIs or models. We encourage using the get and set APIs in GPTCache; here is the demo code: https://github.com/zilliztech/GPTCache/blob/main/examples/adapter/api.py
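A minimal sketch of that get/put style, based on the linked api.py example; double-check the exact imports and signatures against the file above for your installed version.

from gptcache import cache
from gptcache.adapter.api import put, get
from gptcache.processor.pre import get_prompt

# Use the raw prompt string as the cache key (exact match by default).
cache.init(pre_embedding_func=get_prompt)

put("hello", "foo")   # store an answer for the prompt "hello"
print(get("hello"))   # -> "foo"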

Quick Install

pip install gptcache

🚀 What is GPTCache?

ChatGPT and various large language models (LLMs) boast incredible versatility, enabling the development of a wide range of applications. However, as your application grows in popularity and encounters higher traffic levels, the expenses related to LLM API calls can become substantial. Additionally, LLM services might exhibit slow response times, especially when dealing with a significant number of requests.

To tackle this challenge, we have created GPTCache, a project dedicated to building a semantic cache for storing LLM responses.

😊 Quick Start

Note:

  • You can quickly try GPTCache and put it into a production environment without extensive development work. However, please note that the repository is still under heavy development.
  • By default, only a limited number of libraries are installed to support the basic cache functionalities. When you need to use additional features, the related libraries will be automatically installed.
  • Make sure that the Python version is 3.8.1 or higher, check: python --version
  • If you encounter issues installing a library due to a low pip version, run: python -m pip install --upgrade pip.

dev install

# clone GPTCache repo
git clone -b dev https://github.com/zilliztech/GPTCache.git
cd GPTCache

# install the repo
pip install -r requirements.txt
python setup.py install

example usage

These examples will help you understand how to use exact and similar matching with caching. You can also run the examples on Colab. For more examples, refer to the Bootcamp.

Before running the example, make sure the OPENAI_API_KEY environment variable is set by executing echo $OPENAI_API_KEY.

If it is not already set, it can be set by using export OPENAI_API_KEY=YOUR_API_KEY on Unix/Linux/MacOS systems or set OPENAI_API_KEY=YOUR_API_KEY on Windows systems.

It is important to note that this method is only effective temporarily; for a permanent effect, you'll need to modify the environment variable configuration file. For instance, on a Mac, you can modify the file located at /etc/profile.
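If you prefer, you can also set the key from inside Python before initializing the cache. This is just a convenience sketch; it assumes cache.set_openai_key() reads OPENAI_API_KEY from the environment, as the examples below suggest.

import os

# Set the key for the current process only, e.g. after loading it from a secrets manager.
os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"

from gptcache import cache

cache.init()
cache.set_openai_key()  # picks up OPENAI_API_KEY from the environment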


OpenAI API original usage

import os
import time

import openai


def response_text(openai_resp):
    return openai_resp['choices'][0]['message']['content']


question = "what's chatgpt"

# OpenAI API original usage
openai.api_key = os.getenv("OPENAI_API_KEY")
start_time = time.time()
response = openai.ChatCompletion.create(
  model='gpt-3.5-turbo',
  messages=[
    {
        'role': 'user',
        'content': question
    }
  ],
)
print(f'Question: {question}')
print("Time consuming: {:.2f}s".format(time.time() - start_time))
print(f'Answer: {response_text(response)}\n')

OpenAI API + GPTCache, exact match cache

If you ask ChatGPT the exact same question twice, the answer to the second request will be obtained from the cache without requesting ChatGPT again.

import time


def response_text(openai_resp):
    return openai_resp['choices'][0]['message']['content']

print("Cache loading.....")

# To use GPTCache, that's all you need
# -------------------------------------------------
from gptcache import cache
from gptcache.adapter import openai

cache.init()
cache.set_openai_key()
# -------------------------------------------------

question = "what's github"
for _ in range(2):
    start_time = time.time()
    response = openai.ChatCompletion.create(
      model='gpt-3.5-turbo',
      messages=[
        {
            'role': 'user',
            'content': question
        }
      ],
    )
    print(f'Question: {question}')
    print("Time consuming: {:.2f}s".format(time.time() - start_time))
    print(f'Answer: {response_text(response)}\n')

OpenAI API + GPTCache, similar search cache

After obtaining an answer from ChatGPT in response to several similar questions, the answers to subsequent questions can be retrieved from the cache without the need to request ChatGPT again.

import time


def response_text(openai_resp):
    return openai_resp['choices'][0]['message']['content']

from gptcache import cache
from gptcache.adapter import openai
from gptcache.embedding import Onnx
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

print("Cache loading.....")

onnx = Onnx()
data_manager = get_data_manager(CacheBase("sqlite"), VectorBase("faiss", dimension=onnx.dimension))
cache.init(
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),
    )
cache.set_openai_key()

questions = [
    "what's github",
    "can you explain what GitHub is",
    "can you tell me more about GitHub",
    "what is the purpose of GitHub"
]

for question in questions:
    start_time = time.time()
    response = openai.ChatCompletion.create(
        model='gpt-3.5-turbo',
        messages=[
            {
                'role': 'user',
                'content': question
            }
        ],
    )
    print(f'Question: {question}')
    print("Time consuming: {:.2f}s".format(time.time() - start_time))
    print(f'Answer: {response_text(response)}\n')

OpenAI API + GPTCache, use temperature

You can always pass a temperature parameter when requesting the API service or model.

The range of temperature is [0, 2]; the default value is 0.0.

A higher temperature means a higher chance of skipping the cache search and requesting the large model directly. When temperature is 2, the cache is always skipped and the request goes straight to the model. When temperature is 0, the cache is always searched before requesting the large model service.

The default post_process_messages_func is temperature_softmax. In this case, refer to the API reference to learn how temperature affects the output.

import time

from gptcache import cache, Config
from gptcache.manager import manager_factory
from gptcache.embedding import Onnx
from gptcache.processor.post import temperature_softmax
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation
from gptcache.adapter import openai

cache.set_openai_key()

onnx = Onnx()
data_manager = manager_factory("sqlite,faiss", vector_params={"dimension": onnx.dimension})

cache.init(
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),
    post_process_messages_func=temperature_softmax
    )
# cache.config = Config(similarity_threshold=0.2)

question = "what's github"

for _ in range(3):
    start = time.time()
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature=1.0,  # Change temperature here
        messages=[{
            "role": "user",
            "content": question
        }],
    )
    print("Time elapsed:", round(time.time() - start, 3))
    print("Answer:", response["choices"][0]["message"]["content"])

To use GPTCache exclusively, only the following lines of code are required, and there is no need to modify any existing code.

from gptcache import cache
from gptcache.adapter import openai

cache.init()
cache.set_openai_key()

More Docs:

🎓 Bootcamp

😎 What can this help with?

GPTCache offers the following primary benefits:

  • Decreased expenses: Most LLM services charge fees based on a combination of number of requests and token count. GPTCache effectively minimizes your expenses by caching query results, which in turn reduces the number of requests and tokens sent to the LLM service. As a result, you can enjoy a more cost-efficient experience when using the service.
  • Enhanced performance: LLMs employ generative AI algorithms to generate responses in real-time, a process that can sometimes be time-consuming. However, when a similar query is cached, the response time significantly improves, as the result is fetched directly from the cache, eliminating the need to interact with the LLM service. In most situations, GPTCache can also provide superior query throughput compared to standard LLM services.
  • Adaptable development and testing environment: As a developer working on LLM applications, you're aware that connecting to LLM APIs is generally necessary, and comprehensive testing of your application is crucial before moving it to a production environment. GPTCache provides an interface that mirrors LLM APIs and accommodates storage of both LLM-generated and mocked data. This feature enables you to effortlessly develop and test your application, eliminating the need to connect to the LLM service.
  • Improved scalability and availability: LLM services frequently enforce rate limits, which are constraints that APIs place on the number of times a user or client can access the server within a given timeframe. Hitting a rate limit means that additional requests are blocked until a certain period has elapsed, leading to a service outage. With GPTCache, you can easily scale to accommodate an increasing volume of queries, ensuring consistent performance as your application's user base expands.

🤔 How does it work?

Online services often exhibit data locality, with users frequently accessing popular or trending content. Cache systems take advantage of this behavior by storing commonly accessed data, which in turn reduces data retrieval time, improves response times, and eases the burden on backend servers. Traditional cache systems typically utilize an exact match between a new query and a cached query to determine if the requested content is available in the cache before fetching the data.

However, using an exact match approach for LLM caches is less effective due to the complexity and variability of LLM queries, resulting in a low cache hit rate. To address this issue, GPTCache adopts alternative strategies like semantic caching. Semantic caching identifies and stores similar or related queries, thereby increasing cache hit probability and enhancing overall caching efficiency.

GPTCache employs embedding algorithms to convert queries into embeddings and uses a vector store for similarity search on these embeddings. This process allows GPTCache to identify and retrieve similar or related queries from the cache storage, as illustrated in the Modules section.
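The toy example below is not GPTCache's implementation; it only illustrates the hit/miss flow described above, with a bag-of-words embedding, a brute-force similarity search, and a stubbed LLM.

import math
from collections import Counter

def embed(text):
    # Stand-in for the Embedding Generator: a bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def fake_llm(question):
    # Stand-in for the real LLM call.
    return f"answer to: {question}"

cache_store = []  # list of (embedding, question, answer): Cache Storage + Vector Store in one

def cached_answer(question, threshold=0.6):
    emb = embed(question)
    best = max(cache_store, key=lambda entry: cosine(emb, entry[0]), default=None)
    if best and cosine(emb, best[0]) >= threshold:   # Similarity Evaluator
        return best[2]                               # semantic cache hit
    answer = fake_llm(question)                      # cache miss: ask the LLM
    cache_store.append((emb, question, answer))
    return answer

print(cached_answer("what is github"))          # miss, calls the "LLM"
print(cached_answer("what is github exactly"))  # similar enough, served from the cache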

Featuring a modular design, GPTCache makes it easy for users to customize their own semantic cache. The system offers various implementations for each module, and users can even develop their own implementations to suit their specific needs.

In a semantic cache, you may encounter false positives during cache hits and false negatives during cache misses. GPTCache offers three metrics to gauge its performance, which are helpful for developers to optimize their caching systems:

  • Hit Ratio: This metric quantifies the cache's ability to fulfill content requests successfully, compared to the total number of requests it receives. A higher hit ratio indicates a more effective cache.
  • Latency: This metric measures the time it takes for a query to be processed and the corresponding data to be retrieved from the cache. Lower latency signifies a more efficient and responsive caching system.
  • Recall: This metric represents the proportion of queries served by the cache out of the total number of queries that should have been served by the cache. Higher recall percentages indicate that the cache is effectively serving the appropriate content.

A sample benchmark is included to help users get started with assessing the performance of their semantic cache.
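As a rough illustration of how the three metrics relate, using hypothetical counters collected while replaying a workload (this is not a GPTCache API):

# Hypothetical counters, not produced by GPTCache itself.
cache_hits = 410         # requests answered from the cache
cache_misses = 90        # requests forwarded to the LLM service
should_have_hit = 500    # requests for which a suitable cached answer existed
total_latency_s = 12.3   # total time spent answering all requests

hit_ratio = cache_hits / (cache_hits + cache_misses)      # 0.82
recall = cache_hits / should_have_hit                     # 0.82
avg_latency_s = total_latency_s / (cache_hits + cache_misses)
print(hit_ratio, recall, round(avg_latency_s, 3))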

🤗 Modules

GPTCache Struct

  • LLM Adapter: The LLM Adapter is designed to integrate different LLM models by unifying their APIs and request protocols. GPTCache offers a standardized interface for this purpose, with current support for ChatGPT integration.

    • Support OpenAI ChatGPT API.
    • Support langchain.
    • Support minigpt4.
    • Support Llamacpp.
    • Support dolly.
    • Support other LLMs, such as Hugging Face Hub, Bard, Anthropic.
  • Multimodal Adapter (experimental): The Multimodal Adapter is designed to integrate different large multimodal models by unifying their APIs and request protocols. GPTCache offers a standardized interface for this purpose, with current support for image generation and audio transcription integrations.

    • Support OpenAI Image Create API.
    • Support OpenAI Audio Transcribe API.
    • Support Replicate BLIP API.
    • Support Stability Inference API.
    • Support Hugging Face Stable Diffusion Pipeline (local inference).
    • Support other multimodal services or self-hosted large multimodal models.
  • Embedding Generator: This module is created to extract embeddings from requests for similarity search. GPTCache offers a generic interface that supports multiple embedding APIs, and presents a range of solutions to choose from.

    • Disable embedding. This will turn GPTCache into a keyword-matching cache.
    • Support OpenAI embedding API.
    • Support ONNX with the GPTCache/paraphrase-albert-onnx model.
    • Support Hugging Face embedding with transformers, ViTModel, Data2VecAudio.
    • Support Cohere embedding API.
    • Support fastText embedding.
    • Support SentenceTransformers embedding.
    • Support Timm models for image embedding.
    • Support other embedding APIs.
  • Cache Storage: Cache Storage is where the response from LLMs, such as ChatGPT, is stored. Cached responses are retrieved to assist in evaluating similarity and are returned to the requester if there is a good semantic match. At present, GPTCache supports SQLite and offers a universally accessible interface for extension of this module.

  • Vector Store: The Vector Store module helps find the K most similar requests from the input request's extracted embedding. The results can help assess similarity. GPTCache provides a user-friendly interface that supports various vector stores, including Milvus, Zilliz Cloud, and FAISS. More options will be available in the future.

    • Support Milvus, an open-source vector database for production-ready AI/LLM applications.
    • Support Zilliz Cloud, a fully-managed cloud vector database based on Milvus.
    • Support Milvus Lite, a lightweight version of Milvus that can be embedded into your Python application.
    • Support FAISS, a library for efficient similarity search and clustering of dense vectors.
    • Support Hnswlib, a header-only C++/Python library for fast approximate nearest neighbors.
    • Support PGVector, open-source vector similarity search for Postgres.
    • Support Chroma, the AI-native open-source embedding database.
    • Support DocArray, a library for representing, sending, and storing multi-modal data, well suited for machine learning applications.
    • Support Qdrant.
    • Support Weaviate.
    • Support other vector databases.
  • Cache Manager: The Cache Manager is responsible for controlling the operation of both the Cache Storage and Vector Store.

    • Eviction Policy: Cache eviction can be managed in memory using Python's cachetools or in a distributed fashion using Redis as a key-value store.
    • In-Memory Caching

    Currently, GPTCache makes eviction decisions based solely on the number of cached entries. This approach can result in inaccurate resource evaluation and may cause out-of-memory (OOM) errors. We are actively investigating and developing a more sophisticated strategy.

    • Support LRU eviction policy.
    • Support FIFO eviction policy.
    • Support LFU eviction policy.
    • Support RR eviction policy.
    • Support more complicated eviction policies.
    • Distributed Caching

    Scaling a GPTCache deployment horizontally with in-memory caching alone is not possible, since the cached information would be limited to a single pod.

    To keep cache information consistent across all replicas, we can use distributed cache stores like Redis.

    • Support Redis distributed cache
    • Support memcached distributed cache
  • Similarity Evaluator: This module collects data from both the Cache Storage and Vector Store, and uses various strategies to determine the similarity between the input request and the requests from the Vector Store. Based on this similarity, it determines whether a request matches the cache. GPTCache provides a standardized interface for integrating various strategies, along with a collection of implementations to use. The following similarity definitions are currently supported or will be supported in the future:

    • The distance we obtain from the Vector Store.
    • A model-based similarity determined using the GPTCache/albert-duplicate-onnx model from ONNX.
    • Exact matches between the input request and the requests obtained from the Vector Store.
    • Distance represented by applying linalg.norm from numpy to the embeddings.
    • BM25 and other similarity measurements.
    • Support other model serving frameworks such as PyTorch.

    Note: Not all combinations of different modules may be compatible with each other. For instance, if we disable the Embedding Extractor, the Vector Store may not function as intended. We are currently working on implementing a combination sanity check for GPTCache. A minimal sketch of one working module combination follows this list.
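The sketch below shows one way the modules compose. The get_data_manager arguments follow the usage shown elsewhere on this page; the eviction name ("LRU") and the exact parameter names are assumptions to verify against your installed version.

from gptcache import cache
from gptcache.embedding import Onnx
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

onnx = Onnx()                                        # Embedding Generator
data_manager = get_data_manager(                     # Cache Manager
    CacheBase("sqlite"),                             # Cache Storage
    VectorBase("faiss", dimension=onnx.dimension),   # Vector Store
    max_size=1000,                                   # assumed: max cached entries
    clean_size=100,                                  # assumed: entries removed per eviction
    eviction="LRU",                                  # assumed eviction policy name
)
cache.init(
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),  # Similarity Evaluator
)
cache.set_openai_key()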

😇 Roadmap

Coming soon! Stay tuned!

😍 Contributing

We are extremely open to contributions, be it through new features, enhanced infrastructure, or improved documentation.

For comprehensive instructions on how to contribute, please refer to our contribution guide.

gptcache's People

Contributors

a9raag, bennu-li, binbinlv, chiiizzzy, cxie, fzliu, jacktempo7, jaelgu, junjiejiangjjj, jupyterjazz, keenborder786, leio10, liliu-z, pouyanpi, pranaychandekar, progm, raycerossum, rested, shanghaikid, shiyu22, simfg, tongtie, vax521, vovor, wxywb, wybryan, xiaofan-luan, yyyasin19, zc277584121, zhuwenxing


gptcache's Issues

[Feature]: JavaScript/TypeScript Port

Python is obviously the ecosystem to be in for ML at the moment, but offering an easy interface for JavaScript/TypeScript (which is extremely common to use for both frontend and backend) could help increase library adoption.

I'd be curious to hear thoughts from those more knowledgeable on the best approach for this, or on whether it's even a good idea. Would embedding this library basically be the best way, so that the code generally only needs to be written once? Or would a full port to JS be worth it? As I understand it, this library isn't actually doing anything data-intensive or ML-intensive like creating the embeddings or the database system, so a port would mostly be syntactical changes rather than creating anything new.

[Feature]: Support for mutually exclusive multiple contexts

Is your feature request related to a problem? Please describe.

I am currently working on something that uses multiple GPT-3.5 contexts. Sharing the cache between those contexts throws up some errors. Is there a way to set up the data_manager so that it can name the DBs differently?

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

[Bug]: error: update_cache_func() takes 1 positional argument but 2 were given

Current Behavior


I ran the demo from the GitHub repo in Colab, but encountered an error. The error message is as follows: WARNING: root: failed to save the data to cache, error: update_cache_func() takes 1 positional argument but 2 were given. How can I fix this?

Expected Behavior

Successfully cached information

Steps To Reproduce

No response

Environment

colab

Anything else?

No response

[Feature]: Think of adding concept of session

Is your feature request related to a problem? Please describe.

There are a couple of cases where we need the idea of a session and context.

Let me quickly go through some examples:


Another use case is the LangChain SQL demo, see https://python.langchain.com/en/latest/modules/chains/examples/sqlite.html
The chain does the following:

  1. Based on the query, determine which tables to use.
  2. Based on those tables, call the normal SQL database chain.
    The chain requires context, so it won't hit the cache the second time.

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

[Feature]: higher onnx similarity evaluation token limit

Is your feature request related to a problem? Please describe.

Currently, the implemented ONNX similarity evaluation using "GPTCache/albert-duplicate-onnx" is limited to 512 tokens. Is it possible to go higher than 512?

Describe the solution you'd like.

A LangChain conversational chat agent prompt produces around 600 to a few thousand tokens; with a higher limit it would be easier to get a cache hit from the ONNX similarity evaluation without reducing the prompt's token size.

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

[Feature]: Support MongoDB Atlas for both scalar and vector data

Is your feature request related to a problem? Please describe.

As noted in #200, it would be nice to have a NoSQL implementation. Personally, I prefer Mongo to DynamoDB due to the breadth of operations you can perform on it compared to Dynamo.

Describe the solution you'd like.

Hook into their Python driver, PyMongo, to store the scalar data in a standard NoSQL format. Mongo also offers a graph database option for storing and accessing the vector data (MongoDB as a Graph Database). I would like to have a simplified data storage solution using only one provider.

Describe an alternate solution.

No response

Anything else? (Additional Context)

Also, this is brilliant, was a few weeks away from trying to create something similar for myself. But this is a great solution for my problems.

[DOCS]: Update Chart

Documentation Link

No response

Describe the problem

No response

Describe the improvement

No response

Anything else?

No response

[Feature]: Streaming support

Is your feature request related to a problem? Please describe.

No response

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

[Feature]: Moderation API

Is your feature request related to a problem? Please describe.

I am not able to use Moderation API.

Describe the solution you'd like.

Add the OpenAI Moderation API (and possibly other providers), and add a caching mechanism for it too.

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

[Feature]: Support DynamoDB

Is your feature request related to a problem? Please describe.

It's like NoSQL, but it comes with all the AWS support: https://aws.amazon.com/dynamodb/

Describe the solution you'd like.

Possibly hook up DynamoDB through https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html and then populate or fetch from it.

Describe an alternate solution.

Maybe mongo atlas? https://www.mongodb.com/cloud/atlas/register

Anything else? (Additional Context)

Thanks for the work on the cache implementations!

Naming suggestion: perhaps you can drop the "GPT" once the hype is over and call it something else, because I think this cache can be applied to many ML/NLP/CV applications beyond just GPT.

[Feature]: Support more configs for openAI models

Is your feature request related to a problem? Please describe.

Compared to the OpenAI documentation, we are missing some major parameters; see:

https://platform.openai.com/docs/api-reference/completions/create

  1. max_tokens: just bypass to GPT for now.
  2. temperature: there are a couple of things we can do:
    1. randomly pick an answer from the returned results if they are all very similar.
    2. edit the answer with another small model; for images, for instance -> https://huggingface.co/lambdalabs/sd-image-variations-diffusers
  3. n -> if there are not enough cached results, we will need to generate from OpenAI anyway.
  4. best_of -> controls the top-k number of results we want to retrieve from the cache.

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

[Feature]: qdrant as a vector store

Is your feature request related to a problem? Please describe.

Hi. Glad someone finally made this; it's been on my mind for a long time.
Any chance of supporting Qdrant as a vector store? Qdrant also allows filtering by metadata, which can be helpful if, for example, you only want to retrieve cache entries from within a certain date range. Qdrant can also store the LLM response as metadata, which eliminates the SQLite requirement. The new Qdrant local mode means you don't have to set up a server via Docker; just install from pip and you are ready to go.

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

[Bug]: Multiple messages not answering last message

Current Behavior

Using OpenAI's API, when I pass multiple messages to OpenAI, it randomly returns the answer to one of the questions.

Expected Behavior

It should only return the answer to the last question.

Steps To Reproduce

No response

Environment

No response

Anything else?

No response

[Bug]: Wrong behavior when used from the Docker Image

Current Behavior

I am following the doc for using GPTCache with Docker from here.

I noticed that no matter what prompt I send, it always returns the first item inserted, for example:

curl -X GET  "http://localhost:8000?prompt=hello"
null
curl -X PUT -d "receive a hello message" "http://localhost:8000?prompt=hello+world"
curl -X GET  "http://localhost:8000?prompt=hello"
"receive a hello message"
curl -X GET  "http://localhost:8000?prompt=bye"
"receive a hello message"

I expect that the last bye query will not return any value, just a null.

I created a sample gptcache.yml with the following

model_src:
    openai
...
config:
    similarity_threshold: 0.2

By the way, if I use threshold (as detailed in the docs) I get an error, and then another error when using the -f gptcache.yml parameter:

Traceback (most recent call last):
  File "/usr/local/bin/gptcache_server", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/gptcache_server/server.py", line 55, in main
    init_similar_cache_from_config(config_dir=args.cache_config_file)
  File "/usr/local/lib/python3.8/site-packages/gptcache/adapter/api.py", line 167, in init_similar_cache_from_config
    with open(config_dir, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'gptcache.yml'

I modified the command to something like the following, since I am using Windows, and with the volume mounted I got it running:

docker run -p 8000:8000 -v %cd%:"/workspace":rw -it zilliz/gptcache:latest gptcache_server -s 0.0.0.0 -p 8000 -f gptcache.yml

I expected to get a better result by lowering the similarity_threshold value, but the behavior is exactly the same.

Am I missing something in how the component should be used?

thanks!

Expected Behavior

No response

Steps To Reproduce

No response

Environment

Windows 10 + Docker Image.

Anything else?

No response

[Bug]: Session module is not working as expected

Current Behavior

When I run the SQLite + FAISS + ONNX example, I get the following error.
I tried adding __enter__ and __exit__ functions in sql_storage.py, but it's not working.

def __enter__(self):
    print("debug: enter function test")
    return self

def __exit__(self):  # 
    self.drop()

def drop(self):
    self._data_manager.delete_session(self.name)
    print("debug:drop sql connnect")

    return SSDataManager(cache_base, vector_base, object_base, max_size, clean_size, eviction)
  File "C:\ProgramData\Anaconda3\lib\site-packages\gptcache\manager\data_manager.py", line 209, in __init__
    self.eviction_base.put(self.s.get_ids(deleted=False))
  File "C:\ProgramData\Anaconda3\lib\site-packages\gptcache\manager\scalar_data\sql_storage.py", line 229, in get_ids
    with self.Session() as session:
AttributeError: __enter__

Expected Behavior

No response

Steps To Reproduce

No response

Environment

python : 3.8.3

Anything else?

No response

[Bug]: Moderation api is not working

Current Behavior

    modOutputres = openai.Moderation.create(input=question)
  File "/opt/anaconda3/envs/openai/lib/python3.9/site-packages/gptcache/adapter/openai.py", line 319, in create
    res = adapt(
  File "/opt/anaconda3/envs/openai/lib/python3.9/site-packages/gptcache/adapter/adapter.py", line 39, in adapt
    pre_embedding_res = chat_cache.pre_embedding_func(
  File "/opt/anaconda3/envs/openai/lib/python3.9/site-packages/gptcache/processor/pre.py", line 19, in last_content
    return data.get("messages")[-1]["content"]
TypeError: 'NoneType' object is not subscriptable

Expected Behavior

it should give moderation api output.

Steps To Reproduce

No response

Environment

No response

Anything else?

No response

[Feature]:Support audio embeddings

Is your feature request related to a problem? Please describe.

For audio-to-text generation models such as OpenAI Whisper, we need to cache with audio as the key.

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

please give some usage scenarios for GPTCache

Hi

I am confused about scenarios like AI chat or NPCs in games, which seem not really suitable for GPTCache, as the contents of these requests are almost all different.

Although similarity evaluation can identify some similar requests and answer them from GPTCache, the quality of those responses seems a little worse than responses from the LLM.

So can you give some typical scenarios where GPTCache can be applied? In other words, what characteristics do suitable GPTCache scenarios have?

thanks!

[Feature]: PostgreSQL pgvector as vector store

Is your feature request related to a problem? Please describe.

pgvector is an extension for PostgreSQL that allows vector similarity search.
Any chance of supporting it as a vector store?

Personally, I find it very interesting since it may allow running GPTCache with a single DB engine (Postgres as both the cache store and the vector store), and it has lately been supported on AWS RDS and other hosted solutions.

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

[Feature]: Add Redis as a VectorStore

Is your feature request related to a problem? Please describe.

Redis is a really popular vector store and caching database used by industries such as fintech. It would make it really easy to integrate into existing services and APIs without adding a new vector DB such as FAISS or ChromaDB and without pulling all of PyTorch into the build image. Also, it seems RediSearch provides an efficient KNN search.

Describe the solution you'd like.

Use the Redis async client and create an index that is used solely for the vector store. The index name prefix must NOT match other existing index names, to avoid namespace overlap; e.g. searching my-cache would otherwise also look for entries under my-cache-gpt.

Describe an alternate solution.

No response

Anything else? (Additional Context)

I think this will be a really nice feature for those who want to use the cache without adding a pure vector store to their infra.

[Feature]: Support Weaviate as an option for a vector store

Is your feature request related to a problem? Please describe.

I want to use Weaviate to find the K most similar requests from the input request's extracted embedding.

Describe the solution you'd like.

Be able to do this:

data_manager = get_data_manager(CacheBase("weaviate"), VectorBase("weaviate", dimension=128))

or at least:

data_manager = get_data_manager(CacheBase("other db"), VectorBase("weaviate", dimension=128))

Describe an alternate solution.

None 😀

Anything else? (Additional Context)

No response

[Bug]: failed to save the data to cache, error: adapt.<locals>.update_cache_func()

Current Behavior

Hello, I was trying to run the following code from the doc (chat with GPT), but I am getting this message in the console:

WARNING: failed to save the data to cache, error: adapt.<locals>.update_cache_func() takes 1 positional argument but 2 were given

Expected Behavior

Cache save and update is expected

Steps To Reproduce

Code :


import time


def response_text(openai_resp):
    return openai_resp['choices'][0]['message']['content']

print("Cache loading.....")

# To use GPTCache, that's all you need
# -------------------------------------------------
from gptcache import cache
from gptcache.adapter import openai

cache.init()
cache.set_openai_key()
# -------------------------------------------------

question = "what's github"
for _ in range(2):
    start_time = time.time()
    response = openai.ChatCompletion.create(
      model='gpt-3.5-turbo',
      messages=[
        {
            'role': 'user',
            'content': question
        }
      ],
    )
    print(f'Question: {question}')
    print("Time consuming: {:.2f}s".format(time.time() - start_time))
    print(f'Answer: {response_text(response)}\n')


Environment

Env: Python 3.10.7, GPTCache v0.1.13

Anything else?

No response

Scalar store

Why not use SQLAlchemy, which can help us support multiple databases.

[Feature]: Support to store images in cache storage.

Is your feature request related to a problem? Please describe.

For models working on image generation, we can store the result image in cache storage rather than just text.

TODO: support minio cache storage backend and local disk storage backend

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

[Bug]: Cache not writing when prompt is greater than 1000 characters for SQL Scalar Cache

Current Behavior

I have a large prompt that I want to cache (it's part of a prompt template in langchain where the query itself is small but the complete template is large) and it is not saving to cache because the prompt is > 1000 characters.

The expected output itself is small (e.g. ~256 characters), but the prompt is large because it contains explicit instructions. The error I get is this:

2023-05-22 11:28:08,013 - 140704405832512 - adapter.py-adapter:162 - WARNING: failed to save the data to cache, error: (pyodbc.ProgrammingError) ('42000', "[42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]String or binary data would be truncated in table 'gambrinus-cachestore.dbo.gptcache_question', column 'question'. Truncated value: '...'. (2628) (SQLExecDirectW)")
[SQL: INSERT INTO gptcache_question (question, create_on, last_access, embedding_data, deleted) OUTPUT inserted.id VALUES (?, ?, ?, ?, ?)]
[parameters: ('... (1424 characters truncated) ...', datetime.datetime(2023, 5, 22, 11, 28, 7, 965579), datetime.datetime(2023, 5, 22, 11, 28, 7, 965586), bytearray(b'e\xe4 \xbd\xb1CA=\xed\x96\x00<\x1b\xa9\xbd\xbd\xd4\x8c7\xbd\xde\x93]\xbd4\x0c\xd2\xbb\xbfo$\xbd\x01\x89\x82=\xb1\xff\xe5<\xb7\xd25\xbd\xf0 ... (8408 characters truncated) ... xb8 \'\xbdM\xab\xb6\xbc\xd3\xc7\x84=\x93a\x15=\xe9\x03A\xbd\xde\xe2\x18\xbd)0\x1c\xbb\x86\x95\x08<Vl\x19=T\x9c\x95\xbdX\x13j\xbdEg\x11\xbdk\xafi\xbd'), 0)]
(Background on this error at: https://sqlalche.me/e/20/f405)

I see that the root issue is that the question column in the question table is set to varchar(1000), which is too small for this prompt. The test prompt I am using has 1706 characters. This is with SQL Server, but the bug doesn't seem to be specific to SQL Server; it applies to SQL scalar caches in general.

Expected Behavior

Allow the question column in the question table to be larger, either as a flexible variable or large enough for most LLMs (larger than 1000 characters anyway).

Steps To Reproduce

1. Create a SQL Server Scalar Cache.
2. Input a total prompt with > 1000 characters as input.
3. Attempt to run the cache.

Environment

Here's the data manager. max_size and clean_size don't seem to do anything.


cache_base = CacheBase('sqlserver', sql_url=SQL_URL, table_name="gptcache")
vector_base = VectorBase('milvus', host=vector_database['host'], port=vector_database['port'],
                         user=vector_database['user'], password=vector_database['password'], secure=vector_database['secure'],
                         collection_name=vector_database['collection_name'], search_params=vector_database['search_params'], local_mode=vector_database['local_mode'],
                         dimension=onnx.dimension)

data_manager = get_data_manager(cache_base, vector_base, max_size=5000, clean_size=200)


Anything else?

No response

[Bug]: Chinese support not very well?

Current Behavior

I tested the official similarity example in the README.

onnx = Onnx()
data_manager = get_data_manager(CacheBase("sqlite"), VectorBase("faiss", dimension=onnx.dimension))
cache.init(
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),
    )

...

but it doesn't support Chinese very well. When I ask different questions, it always returns the same answer:

q:俄罗斯总统是谁 (Who is the president of Russia?)
目前的俄罗斯总统是弗拉基米尔·普京。(The current president of Russia is Vladimir Putin.)
q:你是谁? (Who are you?)
目前的俄罗斯总统是弗拉基米尔·普京。(The current president of Russia is Vladimir Putin.)
q:东风夜放花千树 (a line of classical Chinese poetry)
目前的俄罗斯总统是弗拉基米尔·普京。(The current president of Russia is Vladimir Putin.)
q:who are you?
I am an AI language model developed by OpenAI. I am designed to assist and provide information to users through conversation.
q:我儿子8岁, 我3年后比我儿子2倍大3岁, 我多少岁? (My son is 8. In 3 years I will be 3 years more than twice my son's age. How old am I?)
目前你的年龄是13岁,因为(8+3)*2=22。(You are currently 13 years old, because (8+3)*2=22.)

q:东风夜放花千树 (the same line of poetry again)
目前的俄罗斯总统是弗拉基米尔·普京。(The current president of Russia is Vladimir Putin.)
Time consuming: 0.10s
2023-04-28 18:30:01,839 - 140497058133568 - _internal.py-_internal:186 - INFO: 127.0.0.1 - - [28/Apr/2023 18:30:01] "POST / HTTP/1.1" 302 -
2023-04-28 18:30:01,853 - 140497184024128 - _internal.py-_internal:186 - INFO: 127.0.0.1 - - [28/Apr/2023 18:30:01] "GET /?result=目前的俄罗斯总统是弗拉基米尔·普京。 HTTP/1.1" 200 -

I don't know how to avoid these problems.

Thank you!

Expected Behavior

match the right question and give the right answer.

Steps To Reproduce

run the similar match in the readme.

Environment

ubuntu 22.04

Anything else?

using onnx

[Feature]: Support DuckDB

Is your feature request related to a problem? Please describe.

SQLite is good but old

Describe the solution you'd like.

Support DuckDB

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

[Enhancement]: Support the bilingual llm request

What would you like to be added?

When I asked multiple questions, it always returned the same answer just because they mention the same keywords.

For example:
I asked "what is coffee (咖啡是什么)",
and it returned "Coffee is a drink made of coffee beans"... and so on.
Correct answer.
But when I asked "How to make Americano coffee (美式咖啡怎么制作的)", it answered with the same answer as the question above.

I expected better NLP.

Why is this needed?

Because it's a totally different question.

Anything else?

No response

[Bug]: GET from server fail and crash

Current Behavior

When I try to get an answer from the server, I get an empty reply and the server crashes.

Expected Behavior

When I try to get the answer for a prompt Hello, it should return a proper reply, which in this case should be receive a hello message.

Steps To Reproduce

First I started the server:

python GPTCache/gptcache_server/server.py

Then I put and get:

❯ curl -X PUT -d "receive a hello message" "http://localhost:8000?prompt=hello"
❯ curl -X GET "http://localhost:8000?prompt=hello"
curl: (52) Empty reply from server

The server reports:

Starting server at localhost:8000
127.0.0.1 - - [25/Apr/2023 15:02:06] "PUT /?prompt=hello HTTP/1.1" 200 -
OMP: Error #15: Initializing libomp.a, but found libiomp5.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://openmp.llvm.org/
[1]    49798 abort      python GPTCache/gptcache_server/server.py

Environment

I'm using:

  • Mac Intel
  • Python 3.8
  • GPTCache from main branch - f3406ee

Though, everything works smoothly on ubuntu server.

Anything else?

No response

[Feature]: Support Image embedding and cache image as key.

Is your feature request related to a problem? Please describe.

For models like CLIP, and BLIP, we need to cache image as key

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

Listen as an OpenAI API?

I checked the documentation, and I am wondering: is it possible to make GPTCache listen as an OpenAI-like API? Then we could connect it to other services via the OpenAI API, such as ChatBox.

[Bug]: Customize Cache

Current Behavior

I want to customize the Cache component in order to add more static methods, such as an OpenAI base_url. But I get an error whether I remove cache.init or not:

gptcache.utils.error.NotInitError: The cache should be inited before using


Expected Behavior

No response

Steps To Reproduce

No response

Environment

linux
latest packages

Anything else?

No response

[Enhancement]: Caching Support for Agents

What would you like to be added?

While it is possible to cache each LLM call, I notice that there is no way to cache the entire thought process and subsequent output of an Agent call, e.g. LLMSingleActionAgent from LangChain. Is there any way this can be achieved?

Why is this needed?

Agents will be increasingly important and heavily utilized

Anything else?

No response

[Bug]: 'sentence-transformers/paraphrase-albert-small-v2' is NOT a correct model identifier listed on Huggingface

Current Behavior

I installed the latest version of GPTCache (0.1.22) and ran:

from gptcache.embedding import Onnx
Onnx()

then I see the error message below:

OSError: Can't load config for 'sentence-transformers/paraphrase-albert-small-v2'. Make sure that:

- 'sentence-transformers/paraphrase-albert-small-v2' is a correct model identifier listed on 'https://huggingface.co/models'

- or 'sentence-transformers/paraphrase-albert-small-v2' is the correct path to a directory containing a config.json file

I checked on Huggingface, and that model is no longer available.
Thank you.

Expected Behavior

No response

Steps To Reproduce

No response

Environment

No response

Anything else?

No response

[Enhancement]: Add RWKV model support (RWKV is a 100% RNN Language Model - ctxlen 8192 models available, longer ctxlen soon)

What would you like to be added?

RWKV Raven 7B Gradio Demo: https://huggingface.co/spaces/BlinkDL/Raven-RWKV-7B

Use rwkv.cpp for CPU INT4 / INT8: https://github.com/saharNooby/rwkv.cpp

Github project: https://github.com/BlinkDL/ChatRWKV

Sample code using rwkv pip package: https://github.com/BlinkDL/ChatRWKV/blob/main/v2/benchmark_more.py

Please let me know if you have any questions :)

Why is this needed?

No response

Anything else?

No response

[Feature]: Support huggingface transformers LLM model

Is your feature request related to a problem? Please describe.

Can chat caching be supported for Hugging Face LLM models?

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

[Feature]: Support MiniGPT4 and Blip2

Is your feature request related to a problem? Please describe.

MiniGPT-4 would be an interesting demo of how GPTCache can work with multiple modalities.

see https://github.com/Vision-CAIR/MiniGPT-4/

The input will be a photo and a question, while the output is the answer to the question based on the photo.

have fun

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

[Bug]: The current caching strategy does not support multi-round conversations.

Current Behavior

   messages=[
       {"role": "system", "content": "You are a helpful assistant."},
       {"role": "user", "content": "who is the CEO of OpenAI?"},
       {"role": "user", "content": "how old is he/she"},

   ]

the answer

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "As of September 2021, the CEO of OpenAI is Sam Altman. He was born on April 22, 1985, which makes him 36 years old.",
        "role": "assistant"
      }
    }
  ],
  "created": 1680522845,
  "id": "chatcmpl-71D37v4kvN1haGqD322L445bqqM0P",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 37,
    "prompt_tokens": 37,
    "total_tokens": 74
  }
}

Then I changed the company name to APPLE

    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "who is the CEO of APPLE?"},
        {"role": "user", "content": "how old is he/she"},

    ]

Since the current caching strategy uses only the latest message content, "how old is he/she" will hit the cache. So the answer is:

{'gptcache': True, 'choices': [{'message': {'role': 'assistant', 'content': 'As of September 2021, the CEO of OpenAI is Sam Altman. He was born on April 22, 1985, which makes him 36 years old.'}, 'finish_reason': 'stop', 'index': 0}]}

But if you look at the context, this answer is clearly unreasonable.
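A possible workaround (not an official fix, and not part of this issue): make the cache key depend on the whole message list instead of only the last message, by passing a custom pre_embedding_func to cache.init(). The (data, **_) signature mirrors gptcache.processor.pre.last_content quoted in another issue on this page.

from gptcache import cache
from gptcache.adapter import openai

def all_content(data, **_):
    # Join every message so a different system/user history produces a different cache key.
    return "\n".join(m["content"] for m in data.get("messages", []))

cache.init(pre_embedding_func=all_content)  # combine with embedding/data_manager args for semantic matching
cache.set_openai_key()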

Expected Behavior

No response

Steps To Reproduce

No response

Environment

No response

Anything else?

No response

[Feature]: GPTCache openAI should make the cached result more similar to openAI response

Is your feature request related to a problem? Please describe.

ChatGPT returned:

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "The Valley of Kings is located in the west bank of the Nile river in Luxor, Egypt.",
        "role": "assistant"
      }
    }
  ],
  "created": 1680670004,
  "id": "chatcmpl-71pKeRARTWzSiQE5uu6NzKkZYUTLE",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 20,
    "prompt_tokens": 229,
    "total_tokens": 249
  }
}

GPTCache returned:

{'gptcache': True, 'choices': [{'message': {'role': 'assistant', 'content': 'The Valley of Kings is located in the west bank of the Nile river in Luxor, Egypt.'}, 'finish_reason': 'stop', 'index': 0}]}

Currently, the response returned by GPTCache is similar in choices but lacks other fields such as usage and created time.

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

[Feature]:Support the PaddleNLP Embedding

Is your feature request related to a problem? Please describe.

Support embedding with PaddleNLP.

Describe the solution you'd like.

Support embedding with PaddleNLP.

Describe an alternate solution.

Support embedding with PaddleNLP.

Anything else? (Additional Context)

Support embedding with PaddleNLP.

[Bug]: GPTCache similarity caching code example encountered an error during execution

Current Behavior

This is an issue relating to the integration of GPTCache with LangChain

import os
import time
import gptcache
from gptcache.processor.pre import get_prompt
from gptcache.manager.factory import get_data_manager
from langchain.cache import GPTCache, SQLiteCache
from gptcache.manager import get_data_manager, CacheBase, VectorBase
from gptcache import Cache
from gptcache.embedding import Onnx
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation
from langchain.llms import OpenAI
import langchain
import openai
from decouple import config

os.environ["OPENAI_API_KEY"] = config("OPENAI_API_KEY")
openai.api_base = config("OPENAI_API_BASE")

llm = OpenAI(model_name="text-davinci-002", n=1, best_of=1)
i = 0
file_prefix = "data_map"
llm_cache = Cache()

def init_gptcache_map(cache_obj: gptcache.Cache):
    global i
    cache_path = f'{file_prefix}_{i}.txt'
    onnx = Onnx()
    cache_base = CacheBase('sqlite')
    vector_base = VectorBase('faiss', dimension=onnx.dimension)
    data_manager = get_data_manager(cache_base, vector_base, max_size=10, clean_size=2)
    cache_obj.init(
        pre_embedding_func=get_prompt,
        embedding_func=onnx.to_embeddings,
        data_manager=data_manager,
        similarity_evaluation=SearchDistanceEvaluation(),
    )
    i += 1

langchain.llm_cache = GPTCache(init_gptcache_map)

llm("Tell me a joke")

Error:

Traceback (most recent call last):
  File "D:\chat-main\tt.py", line 43, in <module>
    llm("Tell me a joke")
  File "D:\chat-main\venv\Lib\site-packages\langchain\llms\base.py", line 246, in __call__
    return self.generate([prompt], stop=stop).generations[0][0].text
  File "D:\chat-main\venv\Lib\site-packages\langchain\llms\base.py", line 161, in generate
    llm_output = update_cache(
  File "D:\chat-main\venv\Lib\site-packages\langchain\llms\base.py", line 51, in update_cache
    langchain.llm_cache.update(prompt, llm_string, result)
  File "D:\chat-main\venv\Lib\site-packages\langchain\cache.py", line 255, in update
    return adapt(
  File "D:\chat-main\venv\Lib\site-packages\gptcache\adapter\adapter.py", line 22, in adapt
    embedding_data = time_cal(
  File "D:\chat-main\venv\Lib\site-packages\gptcache\__init__.py", line 25, in inner
    res = func(*args, **kwargs)
  File "D:\chat-main\venv\Lib\site-packages\gptcache\embedding\onnx.py", line 58, in to_embeddings
    ort_outputs = self.ort_session.run(None, ort_inputs)
  File "D:\Program Files (x86)\Python311\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 200, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type. Actual: (tensor(int32)) , expected: (tensor(int64))

Expected Behavior

No response

Steps To Reproduce

No response

Environment

No response

Anything else?

No response
