shishirpatil / gorilla
Gorilla: An API store for LLMs
Home Page: https://gorilla.cs.berkeley.edu/
License: Apache License 2.0
Hi, thanks for your excellent work.
I ran the eval script:
python ast_eval_th.py --api_dataset ../../data/api/torchhub_api.jsonl --apibench ../../data/apibench/torchhub_eval.json --llm_responses ../eval-data/responses/torchhub/response_torchhub_Gorilla_FT_0_shot.jsonl
and get the results:
Final Functionality accuracy: 0.7580645161290323
Final hallucination: 0.16129032258064516
I find these results inconsistent with the results reported in the paper.
I would like to ask where I might have gone wrong.
Thanks.
Exception: Invalid response object from API: '{"object":"error","message":"","code":50001}' (HTTP response code was 400)
Failed model: gorilla-7b-hf-v1, for prompt: What is the ...
Describe the solution you'd like
I would love to see a Gorilla model hosted on Replicate; it would be nice to be able to use Replicate's API and hosting.
Additional context
Had a blast playing with the Colab.
Is the feature request related to a problem?
Would it be expensive to train with MPT 8k? Can you provide an MPT 8k model?
Describe the solution you'd like
When I run Gorilla, I want to see an 8k context window.
Prefer to keep Apache 2 licensing.
Exception: Invalid response object from API: '{"object":"error","message":"","code":50001}' (HTTP response code was 400)
Failed model: gorilla-7b-hf-v1, for prompt: I would like to translate 'I feel very good today.' from English to Chinese
Hi!
Thanks for the wonderful work! While reading your paper, I became confused about the document retrievers you mention, such as GPT-Index and oracle. I cannot find more specific references or hyperlinks in the paper. Where can I find websites or descriptions of these retrievers?
Thank you.
Needs to be easy to query which APIs and versions are fully supported.
Describe the issue
When I use the --load-8bit flag, it fails with a reference to load_compress_model, which is not imported anywhere (and for that reason, I guess, it's failing?).
Any ideas on how to go about this issue? I've searched for this object in the code itself and in Hugging Face's API but couldn't find it, so I'm rather clueless about what to do.
I'm running this on a single-GPU machine. It's an old T420 with Arch Linux.
Thanks!
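Not a maintainer answer, but a hedged guess: the inference/serving code here appears to be adapted from FastChat, where load_compress_model lives in fastchat.model.compression. If that assumption holds, adding the missing import (with a matching fastchat version installed) may resolve the error; this is a sketch, not a verified fix:

```python
# Hypothetical fix, assuming the missing symbol comes from FastChat's
# compression module (not verified against this repo's code).
from fastchat.model.compression import load_compress_model
```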
We don't use CodeBLEU anymore. It's just legacy code. We just need to verify and clean up any dangling references, and remove this dependency.
To remove: https://github.com/ShishirPatil/gorilla/tree/main/eval/eval-scripts/codebleu
Check for dependencies: https://github.com/ShishirPatil/gorilla/tree/main/eval/eval-scripts/*
Thanks for sharing the awesome work! Do you have a rough estimate of when you will release the training code?
Describe the bug
We used the file /eval/eval-data/responses/torchhub/response_torchhub_Gorilla_FT_0_shot.jsonl with the script /eval/eval-scripts/ast_eval_th.py to calculate the metrics. The final result is Final Functionality accuracy: 75.80, Final hallucination: 16.12, whereas the zero-shot TorchHub numbers published in Table 1 of the paper are Functionality accuracy: 59.13 and hallucination: 6.98, which is a big difference.
Additional context
We would like to know why there is a large discrepancy with the original published results, whether it is because an update was made or we compared the wrong table.
Thank you very much for your work. We are also implementing similar functionality through a plugin mechanism in our DB-GPT project.
open source: https://github.com/csunny/DB-GPT
We use Anthropic's Claude for evaluating Gorilla in eval/. This was tested against the anthropic==0.2.8 release and needs to be updated to support the latest PyPI release (0.3.x). This involves cosmetic changes in two files: eval/get_llm_responses.py and eval/get_llm_responses_retriever.py.
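A hedged sketch of what the 0.2.x to 0.3.x change looks like on the client side (my own illustration, not the repo's actual diff; the model name is an assumption):

```python
# anthropic 0.3.x style: the old module-level Client().completion(...) call
# is replaced by an Anthropic client with completions.create().
import anthropic

client = anthropic.Anthropic(api_key="YOUR_ANTHROPIC_API_KEY")

response = client.completions.create(
    model="claude-v1",  # assumed model name
    max_tokens_to_sample=512,
    prompt=f"{anthropic.HUMAN_PROMPT} Which API should I use to translate text? {anthropic.AI_PROMPT}",
)
print(response.completion)
```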
I'd like to contribute the SkyPilot API. What's the best way to add it to Gorilla?
When applying these deltas to these base weights I get the following error:
$ python apply_delta.py --base-model-path ../../llama-7b-hf/ --target-model-path ../../gorilla-7b-hf-v0/ --delta-path ../../gorilla-7b-hf-delta-v0/
Loading the delta weights from ../../gorilla-7b-hf-delta-v0/
Traceback (most recent call last):
File "/home/paperspace/projects/gorilla/gorilla/inference/apply_delta.py", line 167, in <module>
apply_delta(args.base_model_path, args.target_model_path, args.delta_path)
File "/home/paperspace/projects/gorilla/gorilla/inference/apply_delta.py", line 129, in apply_delta
delta_tokenizer = AutoTokenizer.from_pretrained(delta_path, use_fast=False)
File "/home/paperspace/.local/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 702, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "/home/paperspace/.local/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1811, in from_pretrained
return cls._from_pretrained(
File "/home/paperspace/.local/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1965, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/home/paperspace/.local/lib/python3.9/site-packages/transformers/models/llama/tokenization_llama.py", line 96, in __init__
self.sp_model.Load(vocab_file)
File "/home/paperspace/.local/lib/python3.9/site-packages/sentencepiece/__init__.py", line 905, in Load
return self.LoadFromFile(model_file)
File "/home/paperspace/.local/lib/python3.9/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
Specs:
$ nvidia-smi
Thu Jun 1 17:50:22 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01 Driver Version: 515.105.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Quadro M4000 Off | 00000000:00:05.0 On | N/A |
| 46% 32C P8 16W / 120W | 189MiB / 8192MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1532 G /usr/lib/xorg/Xorg 121MiB |
| 0 N/A N/A 2011 G /usr/bin/gnome-shell 59MiB |
| 0 N/A N/A 2571 G ...bexec/gnome-initial-setup 2MiB |
+-----------------------------------------------------------------------------+
$ LC_ALL=C lspci -v | grep -EA10 "3D|VGA" | grep 'prefetchable'
Memory at f4000000 (32-bit, prefetchable) [size=8M]
Memory at f3000000 (32-bit, non-prefetchable) [size=16M]
Memory at e0000000 (64-bit, prefetchable) [size=256M]
Memory at f0000000 (64-bit, prefetchable) [size=32M]
$ free -h
total used free shared buff/cache available
Mem: 29Gi 1.2Gi 5.6Gi 13Mi 22Gi 27Gi
Swap: 0B 0B 0B
Hello,
I get this error when launching Gorilla on a Windows 10 PC:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8a in position 9: invalid start byte
It seems like the output works though.
Thank you.
Exception: Error communicating with OpenAI: HTTPConnectionPool(host='34.132.127.197', port=8000): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fa8bdf53c40>: Failed to establish a new connection: [Errno 111] Connection refused'))
Failed model: gorilla-7b-hf-v0, for prompt: I would like to translate from English to Chinese
Exception: Invalid response object from API: '{"object":"error","message":"","code":50001}' (HTTP response code was 400)
Failed model: gorilla-7b-hf-v1, for prompt:
Any idea why it failed?
Using the following openai.api_base: http://34.132.127.197:8000/v1
Thank you very much for your work!
In this repository I see both API data and training data. What prompt should I use to generate the training data from the API data?
Thanks!
Hi, thanks for the demo video. I just wanted to understand how you created the Gorilla spotlight feature and how we can build something like it. Also, I couldn't understand exactly how we can integrate it into LangChain.
The evaluation data for APIBench is duplicated between data/apibench/*_eval.json and eval/eval-data/questions/. I think the only difference is formatting. Maybe we should just keep eval/eval-data/responses and have data/apibench hold only the data used to train the model.
Initially we made two copies with the following rationale: apibench should have all the data self-contained, which the community is using to train/benchmark their LLMs, while eval/ would have the eval data in a format that is easy to eyeball to understand what is going on.
Maybe this is one of those few cases where it might be ok to have the same data twice in the repository in different formats?
Starting this issue in case anyone has comments on this.
I hope you are doing well; many thanks for this work.
Is it possible to add additional (private) APIs to Gorilla? We have a large database of APIs that we need to add to Gorilla. How can we do this? Should we fine-tune the Gorilla LLM, or something like that?
lora_target_modules='["query_key_value"]' "not part of this model"
I don't see any existing discussion about leveraging Meta's new Llama 2 model. I'm curious whether you have any plans for using this new base model in Gorilla.
Exception: text
Failed model: gorilla-7b-hf-v0, for prompt: I would like to translate from English to French
Exception: Invalid response object from API: '{"object":"error","message":"This model's maximum context length is 2048 tokens. However, you requested 2302 tokens (1790 in the messages, 512 in the completion). Please reduce the length of the messages or completion.","code":40303}' (HTTP response code was 400)
Is there any way to simply truncate the request/completion to fit within the 2048-token limit?
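Not an official workaround, but one hedged option is to truncate the prompt with the model's tokenizer so that prompt tokens plus the requested completion stay under the 2048-token window. The checkpoint name and token budget below are assumptions:

```python
# Hedged sketch: trim the prompt so prompt_tokens + completion_tokens <= 2048.
# The tokenizer repo id is an assumption; use the tokenizer matching the served model.
from transformers import AutoTokenizer

MAX_CONTEXT = 2048
MAX_COMPLETION = 512

tokenizer = AutoTokenizer.from_pretrained("gorilla-llm/gorilla-7b-hf-v1", use_fast=False)

def truncate_prompt(prompt: str) -> str:
    budget = MAX_CONTEXT - MAX_COMPLETION
    ids = tokenizer(prompt, truncation=True, max_length=budget)["input_ids"]
    return tokenizer.decode(ids, skip_special_tokens=True)
```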
Describe the issue
In the video, Gorilla appears to run on the command line and return API calls through dialogue, but I can't find where to run it to get such results. Do I need to train first, or do I need to run a specific Python file? Please advise.
Exception: Error communicating with OpenAI: HTTPConnectionPool(host='34.132.127.197', port=8000): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f455ea077f0>: Failed to establish a new connection: [Errno 111] Connection refused'))
Failed model: gorilla-7b-hf-v0, for prompt: I would like to translate from English to Chinese
For the different retrievers: we use BM25 (https://en.wikipedia.org/wiki/Okapi_BM25); GPT-Index simply uses `Davinci v1` from OpenAI to embed all the documents and does a simple cosine-similarity match at inference time; for oracle, we just provide the ground-truth answer to Gorilla. Hope this helps, and let me know if there are any further questions!
Originally posted by @tianjunz in #21 (comment)
Would you be willing to release the bm25 and gpt-index scripts to help the community reproduce the experimental results?
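For anyone who wants to experiment before official scripts are released, here is a minimal BM25 retrieval sketch (my own, under the assumption that the API documents are plain strings), using the rank_bm25 package:

```python
# Minimal BM25 retrieval sketch with rank_bm25 (pip install rank_bm25).
# The toy corpus and whitespace tokenization are assumptions, not the paper's exact setup.
from rank_bm25 import BM25Okapi

api_docs = [
    "torchvision resnet18 pretrained image classification model",
    "huggingface transformers translation pipeline English to Chinese",
    "tensorflow hub ssd mobilenet object detection",
]

bm25 = BM25Okapi([doc.split() for doc in api_docs])

query = "translate English to Chinese"
top_doc = bm25.get_top_n(query.split(), api_docs, n=1)[0]
print(top_doc)  # the retrieved API document to prepend to the prompt
```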
This seems like the sort of project that could accidentally produce a self-improving superhuman system. Does anyone on the project have an understanding of AI alignment? Are there efforts to measure the potential for systems built with Gorilla to FOOM?
I notice it's hard for the LLM to always generate the right format. Here are some examples. How did you handle these kinds of responses? Did you exclude them when you built the charts for the paper? Do you report the percentage of invalid records?
https://github.com/ShishirPatil/gorilla/blob/main/eval/eval-data/responses/huggingface/response_huggingface_Gorilla_FT_0_shot.jsonl
Exception: Error communicating with OpenAI: HTTPConnectionPool(host='34.132.127.197', port=8000): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8f4f57fc10>: Failed to establish a new connection: [Errno 111] Connection refused'))
Failed model: gorilla-7b-hf-v0, for prompt: I would like to translate from English to Chinese
Any new API info would not be in GPT-4's training data.
How much impact do you think this has on the relative performance between GPT-4 and Gorilla?
Did you do any eval on APIs that existed prior to 09/21 versus those after?
I reviewed the paper but could not find any discussion on this. https://arxiv.org/abs/2305.15334
To be clear, I am not saying this invalidates the ideas, which I think were a fantastic contribution to open-source LLMs, but rather that it would be good to understand the precise reason for the superior performance.
Exception: Invalid response object from API: '{"object":"error","message":"NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.\n\n(CUDA error: uncorrectable ECC error encountered\nCUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\nCompile with TORCH_USE_CUDA_DSA
to enable device-side assertions.\n)","code":50001}' (HTTP response code was 400)
Failed model: gorilla-7b-hf-v1, for prompt: I would like to translate 'I feel very good today.' from English to Chinese
I would really like to see Gorilla automate my boring tasks.
I really want to see it actually automate my tasks instead of just choosing a Bash command that does not work.
Additional context
Here's the inspiration for the idea:
https://github.com/emcf/engshell
By the way, don't forget to use the Gorilla LLM, since it's better than GPT-4.
Exception: Error communicating with OpenAI: HTTPConnectionPool(host='34.132.127.197', port=8000): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7bba974da140>: Failed to establish a new connection: [Errno 111] Connection refused'))
Failed model: gorilla-7b-tf-v0, for prompt: I want to build a robot that can detecting objects in an image
This would allow discord members to easily view GitHub commits/Issues/Pull Requests easily.
I wouldn't mind having access having access to certain settings in this Repo in order to set it up.
If you don't want to give perms you can use this guide here https://ardalis.com/integrate-github-and-discord-with-webhooks/
Today, Gorilla end-points run on UC Berkeley hosted servers 🐻 When you try our colab, or our chat completion API, or the CLI tool, it hits our GPUs for inference. A popular ask among our users is to run Gorilla locally on Macbooks/Linux/WSL.
Describe the solution you'd like:
Have the model(s) running locally on MPS/CPU/GPU and listening on a port. All the current Gorilla end-points can then just hit localhost to get the response to any given prompt.
Additional context:
Here is an application that would immediately use it: https://github.com/gorilla-llm/gorilla-cli
Given that we have LLaMA models, these should be plug-and-play: ggerganov/llama.cpp and karpathy/llama2.c
Also relevant: https://huggingface.co/TheBloke/gorilla-7B-GPTQ
Update 1: If you happen to have an RTX, V100, A100, or H100, you can use Gorilla today without any latency hit. The goal of this enhancement is to help those who may not have access to the latest and greatest GPUs.
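As a starting point, here is a hedged sketch of loading a merged Gorilla checkpoint locally with transformers on CPU or Apple MPS; the checkpoint id, dtype choice, and prompt format are assumptions, not the planned implementation:

```python
# Hedged sketch: run a Gorilla checkpoint locally on CPU/MPS via transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gorilla-llm/gorilla-7b-hf-v1"  # assumed merged checkpoint
device = "mps" if torch.backends.mps.is_available() else "cpu"
dtype = torch.float16 if device == "mps" else torch.float32

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype).to(device)

prompt = "I would like to translate 'I feel very good today.' from English to Chinese."
inputs = tokenizer(prompt, return_tensors="pt").to(device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```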
Hello, thanks for making your work available! Have you chosen a license yet?
In the file inference/gorilla_eval.py, when --device mps is set, the following function is not implemented: replace_llama_attn_with_non_inplace_operations()
Exception: Error communicating with OpenAI: HTTPConnectionPool(host='34.132.127.197', port=8000): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0ec18dabf0>: Failed to establish a new connection: [Errno 111] Connection refused'))
Failed model: gorilla-7b-hf-v0, for prompt: I would like to translate from English to Chinese
The Discord invite link has expired; please renew it.
Hi,
Is it also possible to self-host Gorilla with an API that is compatible with the OpenAI chat completion API?
So essentially the same as depicted in the Colab?
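On the client side, this is essentially what the Colab already does; a self-hosted server only needs to expose the same /v1/chat/completions route. A hedged sketch with the openai 0.x client, where the host, port, and model name are assumptions:

```python
# Hedged sketch: point the openai 0.x client at a self-hosted, OpenAI-compatible endpoint.
import openai

openai.api_key = "EMPTY"                      # a local server typically needs no real key
openai.api_base = "http://localhost:8000/v1"  # assumed self-hosted endpoint

completion = openai.ChatCompletion.create(
    model="gorilla-7b-hf-v1",  # assumed model name registered with the server
    messages=[{"role": "user", "content": "I would like to translate from English to Chinese."}],
)
print(completion.choices[0].message.content)
```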
I encountered this problem when downloading model weights. It seems weights larger than 4 GB are not handled correctly on Windows. Did you upload the models from a Windows system?
root@4bd793bb2ded:/workspace/gorilla# git lfs install
Updated git hooks.
Git LFS initialized.
root@4bd793bb2ded:/workspace/gorilla# git clone https://huggingface.co/gorilla-llm/gorilla-mpt-7b-hf-v0
Cloning into 'gorilla-mpt-7b-hf-v0'...
remote: Enumerating objects: 35, done.
remote: Counting objects: 100% (35/35), done.
remote: Compressing objects: 100% (34/34), done.
remote: Total 35 (delta 5), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (35/35), 621.68 KiB | 1.84 MiB/s, done.
Filtering content: 100% (2/2), 4.38 GiB | 57.36 MiB/s, done.
Encountered 1 file(s) that may not have been copied correctly on Windows:
pytorch_model-00001-of-00002.bin
See: `git lfs help smudge` for more details.
root@4bd793bb2ded:/workspace/gorilla/gorilla-mpt-7b-hf-v0# ls -al
total 12989212
drwxr-xr-x 3 root root 4096 Jun 7 00:17 .
drwxr-xr-x 8 root root 161 Jun 7 00:16 ..
drwxr-xr-x 9 root root 174 Jun 7 00:18 .git
-rw-r--r-- 1 root root 1477 Jun 7 00:16 .gitattributes
-rw-r--r-- 1 root root 2068 Jun 7 00:16 README.md
-rw-r--r-- 1 root root 1752 Jun 7 00:16 adapt_tokenizer.py
-rw-r--r-- 1 root root 16818 Jun 7 00:16 attention.py
-rw-r--r-- 1 root root 2493 Jun 7 00:16 blocks.py
-rw-r--r-- 1 root root 1284 Jun 7 00:16 config.json
-rw-r--r-- 1 root root 9080 Jun 7 00:16 configuration_mpt.py
-rw-r--r-- 1 root root 28182 Jun 7 00:16 flash_attn_triton.py
-rw-r--r-- 1 root root 112 Jun 7 00:16 generation_config.json
-rw-r--r-- 1 root root 27219 Jun 7 00:16 hf_prefixlm_converter.py
-rw-r--r-- 1 root root 3639 Jun 7 00:16 meta_init_context.py
-rw-r--r-- 1 root root 17406 Jun 7 00:16 modeling_mpt.py
-rw-r--r-- 1 root root 2563 Jun 7 00:16 norm.py
-rw-r--r-- 1 root root 12558 Jun 7 00:16 param_init_fns.py
-rw-r--r-- 1 root root 9943040275 Jun 7 00:18 pytorch_model-00001-of-00002.bin
-rw-r--r-- 1 root root 3355599187 Jun 7 00:17 pytorch_model-00002-of-00002.bin
-rw-r--r-- 1 root root 16023 Jun 7 00:16 pytorch_model.bin.index.json
-rw-r--r-- 1 root root 129 Jun 7 00:16 special_tokens_map.json
-rw-r--r-- 1 root root 2113738 Jun 7 00:16 tokenizer.json
-rw-r--r-- 1 root root 264 Jun 7 00:16 tokenizer_config.json
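Not a maintainer answer, but one workaround that sidesteps the git-lfs smudge issue entirely is to download the checkpoint with huggingface_hub instead of git clone; a hedged sketch using the same repo id as the clone above:

```python
# Hedged sketch: fetch the weights via huggingface_hub instead of git-lfs,
# avoiding the Windows smudge-filter problem with files larger than 4 GB.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("gorilla-llm/gorilla-mpt-7b-hf-v0")
print("Model downloaded to:", local_dir)
```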
Could you share a reference for the oracle retriever? I cannot find it in the paper.
Is the GPTIndex in the paper LlamaIndex? I know GPTIndex has been renamed to LlamaIndex (https://github.com/jerryjliu/llama_index) and would just like to confirm that. If so, what index method are you using: list, tree, or something else?
Exception: Error communicating with OpenAI: HTTPConnectionPool(host='34.132.127.197', port=8000): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f912081fb50>: Failed to establish a new connection: [Errno 111] Connection refused'))
Failed model: gorilla-7b-hf-v0, for prompt: I would like to translate from English to Chinese