shishirpatil / gorilla
Gorilla: An API store for LLMs
Home Page: https://gorilla.cs.berkeley.edu/
License: Apache License 2.0
Hi, thanks for your excellent work.
I ran the eval script:
python ast_eval_th.py --api_dataset ../../data/api/torchhub_api.jsonl --apibench ../../data/apibench/torchhub_eval.json --llm_responses ../eval-data/responses/torchhub/response_torchhub_Gorilla_FT_0_shot.jsonl
and get the results:
Final Functionality accuracy: 0.7580645161290323
Final hallucination: 0.16129032258064516
I find these results inconsistent with the results reported in the paper.
I would like to ask where I might have gone wrong.
Thanks.
Exception: Invalid response object from API: '{"object":"error","message":"","code":50001}' (HTTP response code was 400)
Failed model: gorilla-7b-hf-v1, for prompt: What is the ...
Describe the solution you'd like
I would love to see a Gorilla model hosted on Replicate; it would be nice to be able to use Replicate's API and hosting.
Additional context
Had a blast playing with the Colab.
Is the feature request related to a problem?
Would it be expensive to train with MPT 8k? Can you provide an MPT 8k model?
Describe the solution you'd like
When I run Gorilla, I want to see an 8k context window.
Prefer to keep Apache 2 licensing.
Exception: Invalid response object from API: '{"object":"error","message":"","code":50001}' (HTTP response code was 400)
Failed model: gorilla-7b-hf-v1, for prompt: I would like to translate 'I feel very good today.' from English to Chinese
Hi!
Thanks for the wonderful work! While reading your paper, I became confused about the document retrievers you mention, such as GPT-Index and oracle. I cannot find more specific references or hyperlinks in the paper. Where can I find websites or descriptions of these retrievers?
Thank you.
Needs to be easy to query which APIs and versions are fully supported.
Describe the issue
When I use the --load-8bit flag, it fails with a reference to load_compress_model, which is not imported anywhere (and for that reason, I guess, it's failing?).
Any ideas on how to go about this issue? I've searched for this object in the code itself and in Hugging Face's API but couldn't find it, so I'm rather clueless about what to do.
I'm running this on a single-GPU machine. It's an old T420 with Arch Linux.
Thanks!
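Not a maintainer answer, but a hedged guess: the inference/serving code here appears to be adapted from FastChat, where load_compress_model lives in fastchat.model.compression. If that assumption holds, adding the missing import (with a matching fastchat version installed) may resolve the error; this is a sketch, not a verified fix:

```python
# Hypothetical fix, assuming the missing symbol comes from FastChat's
# compression module (not verified against this repo's code).
from fastchat.model.compression import load_compress_model
```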
We don't use CodeBLEU anymore. It's just legacy code. We just need to verify and clean up any dangling references, and remove this dependency.
To remove: https://github.com/ShishirPatil/gorilla/tree/main/eval/eval-scripts/codebleu
Check for dependencies: https://github.com/ShishirPatil/gorilla/tree/main/eval/eval-scripts/*
Thanks for sharing the awesome work! Do you have a rough estimate of when you will release the training code?
Describe the bug
We used the file /eval/eval-data/responses/torchhub/response_torchhub_Gorilla_FT_0_shot.jsonl with the script /eval/eval-scripts/ast_eval_th.py to calculate the metrics. The final result is Final Functionality accuracy: 75.80, Final hallucination: 16.12, whereas the zero-shot TorchHub numbers published in Table 1 of the paper are Functionality accuracy: 59.13 and hallucination: 6.98, which is a big difference.
Additional context
We would like to know why there is a large discrepancy with the original published results, whether it is because an update was made or we compared the wrong table.
Thank you very much for your work. We are also implementing similar functionality through a plugin mechanism in our DB-GPT project.
open source: https://github.com/csunny/DB-GPT
We use Anthropic's Claude for evaluating Gorilla in eval/. This was tested against the anthropic==0.2.8 release and needs to be updated to support the latest PyPI release (0.3.x). This involves cosmetic changes in two files: eval/get_llm_responses.py and eval/get_llm_responses_retriever.py.
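A hedged sketch of what the 0.2.x to 0.3.x change looks like on the client side (my own illustration, not the repo's actual diff; the model name is an assumption):

```python
# anthropic 0.3.x style: the old module-level Client().completion(...) call
# is replaced by an Anthropic client with completions.create().
import anthropic

client = anthropic.Anthropic(api_key="YOUR_ANTHROPIC_API_KEY")

response = client.completions.create(
    model="claude-v1",  # assumed model name
    max_tokens_to_sample=512,
    prompt=f"{anthropic.HUMAN_PROMPT} Which API should I use to translate text? {anthropic.AI_PROMPT}",
)
print(response.completion)
```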
I'd like to contribute the SkyPilot API. What's the best way to add it to Gorilla?
When applying these deltas to these base weights I get the following error:
$ python apply_delta.py --base-model-path ../../llama-7b-hf/ --target-model-path ../../gorilla-7b-hf-v0/ --delta-path ../../gorilla-7b-hf-delta-v0/
Loading the delta weights from ../../gorilla-7b-hf-delta-v0/
Traceback (most recent call last):
File "/home/paperspace/projects/gorilla/gorilla/inference/apply_delta.py", line 167, in <module>
apply_delta(args.base_model_path, args.target_model_path, args.delta_path)
File "/home/paperspace/projects/gorilla/gorilla/inference/apply_delta.py", line 129, in apply_delta
delta_tokenizer = AutoTokenizer.from_pretrained(delta_path, use_fast=False)
File "/home/paperspace/.local/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 702, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "/home/paperspace/.local/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1811, in from_pretrained
return cls._from_pretrained(
File "/home/paperspace/.local/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1965, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/home/paperspace/.local/lib/python3.9/site-packages/transformers/models/llama/tokenization_llama.py", line 96, in __init__
self.sp_model.Load(vocab_file)
File "/home/paperspace/.local/lib/python3.9/site-packages/sentencepiece/__init__.py", line 905, in Load
return self.LoadFromFile(model_file)
File "/home/paperspace/.local/lib/python3.9/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
Specs:
$ nvidia-smi
Thu Jun 1 17:50:22 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01 Driver Version: 515.105.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Quadro M4000 Off | 00000000:00:05.0 On | N/A |
| 46% 32C P8 16W / 120W | 189MiB / 8192MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1532 G /usr/lib/xorg/Xorg 121MiB |
| 0 N/A N/A 2011 G /usr/bin/gnome-shell 59MiB |
| 0 N/A N/A 2571 G ...bexec/gnome-initial-setup 2MiB |
+-----------------------------------------------------------------------------+
$ LC_ALL=C lspci -v | grep -EA10 "3D|VGA" | grep 'prefetchable'
Memory at f4000000 (32-bit, prefetchable) [size=8M]
Memory at f3000000 (32-bit, non-prefetchable) [size=16M]
Memory at e0000000 (64-bit, prefetchable) [size=256M]
Memory at f0000000 (64-bit, prefetchable) [size=32M]
$ free -h
total used free shared buff/cache available
Mem: 29Gi 1.2Gi 5.6Gi 13Mi 22Gi 27Gi
Swap: 0B 0B 0B
Hello,
I get this error when launching Gorilla on a Windows 10 PC:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8a in position 9: invalid start byte
It seems like the output works though.
Thank you.
Exception: Error communicating with OpenAI: HTTPConnectionPool(host='34.132.127.197', port=8000): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fa8bdf53c40>: Failed to establish a new connection: [Errno 111] Connection refused'))
Failed model: gorilla-7b-hf-v0, for prompt: I would like to translate from English to Chinese
Exception: Invalid response object from API: '{"object":"error","message":"","code":50001}' (HTTP response code was 400)
Failed model: gorilla-7b-hf-v1, for prompt:
Any idea why it failed?
Using the following openai.api_base: http://34.132.127.197:8000/v1
Thank you very much for your work!
In this repository I see both API data and training data. What prompt should I use to generate the training data from the API data?
Thanks!
Hi, thanks for the demo video. I just wanted to understand how you created the Gorilla spotlight feature and how we can build something like it. Also, I couldn't understand exactly how we can integrate it into LangChain.
The evaluation data for APIBench is duplicated between data/apibench/*_eval.json and eval/eval-data/questions/. I think the only difference is formatting. Maybe we should just keep eval/eval-data/responses and have data/apibench hold only the data used to train the model.
Initially we made two copies with the following rationale: apibench should have all the data self-contained, which the community is using to train/benchmark their LLMs, while eval/ would have the eval data in a format that is easy to eyeball to understand what is going on.
Maybe this is one of those few cases where it might be ok to have the same data twice in the repository in different formats?
Starting this issue in case anyone has comments on this.
I hope you are doing well; many thanks for this work.
Is it possible to add additional (private) APIs to Gorilla? We have a large database of APIs that we need to add to Gorilla. How can we do this? Should we fine-tune the Gorilla LLM, or something like that?
lora_target_modules='["query_key_value"]' "not part of this model"
I don't see any existing discussion about leveraging Meta's new Llama 2 model. I'm curious whether you have any plans for using this new base model in Gorilla.
Exception: text
Failed model: gorilla-7b-hf-v0, for prompt: I would like to translate from English to French
Exception: Invalid response object from API: '{"object":"error","message":"This model's maximum context length is 2048 tokens. However, you requested 2302 tokens (1790 in the messages, 512 in the completion). Please reduce the length of the messages or completion.","code":40303}' (HTTP response code was 400)
Is there any way to simply truncate the request/completion to fit within the 2048-token limit?
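Not an official workaround, but one hedged option is to truncate the prompt with the model's tokenizer so that prompt tokens plus the requested completion stay under the 2048-token window. The checkpoint name and token budget below are assumptions:

```python
# Hedged sketch: trim the prompt so prompt_tokens + completion_tokens <= 2048.
# The tokenizer repo id is an assumption; use the tokenizer matching the served model.
from transformers import AutoTokenizer

MAX_CONTEXT = 2048
MAX_COMPLETION = 512

tokenizer = AutoTokenizer.from_pretrained("gorilla-llm/gorilla-7b-hf-v1", use_fast=False)

def truncate_prompt(prompt: str) -> str:
    budget = MAX_CONTEXT - MAX_COMPLETION
    ids = tokenizer(prompt, truncation=True, max_length=budget)["input_ids"]
    return tokenizer.decode(ids, skip_special_tokens=True)
```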
Describe the issue
In the video, Gorilla appears to run on the command line and return API calls through dialogue, but I can't find where to run it to get such results. Do I need to train first, or do I need to run a specific Python file? Please advise.
Exception: Error communicating with OpenAI: HTTPConnectionPool(host='34.132.127.197', port=8000): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f455ea077f0>: Failed to establish a new connection: [Errno 111] Connection refused'))
Failed model: gorilla-7b-hf-v0, for prompt: I would like to translate from English to Chinese
For the different retrievers: we use BM25 (https://en.wikipedia.org/wiki/Okapi_BM25); GPT-Index simply uses `Davinci v1` from OpenAI to embed all the documents and does a simple cosine-similarity match at inference time; for oracle, we just provide the ground-truth answer to Gorilla. Hope this helps, and let me know if there are any further questions!
Originally posted by @tianjunz in #21 (comment)
Would you be willing to release the bm25 and gpt-index scripts to help the community reproduce the experimental results?
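For anyone who wants to experiment before official scripts are released, here is a minimal BM25 retrieval sketch (my own, under the assumption that the API documents are plain strings), using the rank_bm25 package:

```python
# Minimal BM25 retrieval sketch with rank_bm25 (pip install rank_bm25).
# The toy corpus and whitespace tokenization are assumptions, not the paper's exact setup.
from rank_bm25 import BM25Okapi

api_docs = [
    "torchvision resnet18 pretrained image classification model",
    "huggingface transformers translation pipeline English to Chinese",
    "tensorflow hub ssd mobilenet object detection",
]

bm25 = BM25Okapi([doc.split() for doc in api_docs])

query = "translate English to Chinese"
top_doc = bm25.get_top_n(query.split(), api_docs, n=1)[0]
print(top_doc)  # the retrieved API document to prepend to the prompt
```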
This seems like the sort of project that could accidentally produce a self-improving superhuman system. Does anyone on the project have an understanding of AI alignment? Are there efforts to measure the potential for systems built with Gorilla to FOOM?
I notice it's hard for the LLM to always generate the right format. Here are some examples. How did you handle these kinds of responses? Did you exclude them when you built the charts for the paper? Do you report the percentage of invalid records?
https://github.com/ShishirPatil/gorilla/blob/main/eval/eval-data/responses/huggingface/response_huggingface_Gorilla_FT_0_shot.jsonl
Exception: Error communicating with OpenAI: HTTPConnectionPool(host='34.132.127.197', port=8000): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8f4f57fc10>: Failed to establish a new connection: [Errno 111] Connection refused'))
Failed model: gorilla-7b-hf-v0, for prompt: I would like to translate from English to Chinese
Any new API info would not be in GPT-4's training data.
How much impact do you think this has on the relative performance between GPT-4 and Gorilla?
Did you do any eval on APIs that existed prior to 09/21 versus those after?
I reviewed the paper but could not find any discussion on this. https://arxiv.org/abs/2305.15334
To be clear, I am not saying this invalidates the ideas, which I think were a fantastic contribution to open-source LLMs, but rather that it would be good to understand the precise reason for the superior performance.
Exception: Invalid response object from API: '{"object":"error","message":"NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.\n\n(CUDA error: uncorrectable ECC error encountered\nCUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\nCompile with TORCH_USE_CUDA_DSA
to enable device-side assertions.\n)","code":50001}' (HTTP response code was 400)
Failed model: gorilla-7b-hf-v1, for prompt: I would like to translate 'I feel very good today.' from English to Chinese
I would really like to see Gorilla automate my boring tasks.
I really want to see it actually automate my tasks instead of just choosing a Bash command that does not work.
Additional context
Here's the inspiration for the idea:
https://github.com/emcf/engshell
By the way, don't forget to use the Gorilla LLM, since it's better than GPT-4.
Exception: Error communicating with OpenAI: HTTPConnectionPool(host='34.132.127.197', port=8000): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7bba974da140>: Failed to establish a new connection: [Errno 111] Connection refused'))
Failed model: gorilla-7b-tf-v0, for prompt: I want to build a robot that can detecting objects in an image
This would allow discord members to easily view GitHub commits/Issues/Pull Requests easily.
I wouldn't mind having access having access to certain settings in this Repo in order to set it up.
If you don't want to give perms you can use this guide here https://ardalis.com/integrate-github-and-discord-with-webhooks/
Today, Gorilla end-points run on UC Berkeley hosted servers 🐻 When you try our colab, or our chat completion API, or the CLI tool, it hits our GPUs for inference. A popular ask among our users is to run Gorilla locally on Macbooks/Linux/WSL.
Describe the solution you'd like:
Have the model(s) running locally on MPS/CPU/GPU and listening on a port. All the current Gorilla end-points can then just hit localhost to get the response to any given prompt.
Additional context:
Here is an application that would immediately use it: https://github.com/gorilla-llm/gorilla-cli
Given that we have LLaMA models, these should be plug-and-play: ggerganov/llama.cpp and karpathy/llama2.c
Also relevant: https://huggingface.co/TheBloke/gorilla-7B-GPTQ
Update 1: If you happen to have an RTX, V100, A100, or H100, you can use Gorilla today without any latency hit. The goal of this enhancement is to help those who may not have access to the latest and greatest GPUs.
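As a starting point, here is a hedged sketch of loading a merged Gorilla checkpoint locally with transformers on CPU or Apple MPS; the checkpoint id, dtype choice, and prompt format are assumptions, not the planned implementation:

```python
# Hedged sketch: run a Gorilla checkpoint locally on CPU/MPS via transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gorilla-llm/gorilla-7b-hf-v1"  # assumed merged checkpoint
device = "mps" if torch.backends.mps.is_available() else "cpu"
dtype = torch.float16 if device == "mps" else torch.float32

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype).to(device)

prompt = "I would like to translate 'I feel very good today.' from English to Chinese."
inputs = tokenizer(prompt, return_tensors="pt").to(device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```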
Hello, thanks for making your work available! Have you chosen a license yet?
In the file inference/gorilla_eval.py, when --device mps is set, the following function is not implemented: replace_llama_attn_with_non_inplace_operations()
Exception: Error communicating with OpenAI: HTTPConnectionPool(host='34.132.127.197', port=8000): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0ec18dabf0>: Failed to establish a new connection: [Errno 111] Connection refused'))
Failed model: gorilla-7b-hf-v0, for prompt: I would like to translate from English to Chinese
The Discord invite link has expired; please renew it.
Hi,
Is it also possible to self-host Gorilla with an API that is compatible with the OpenAI chat completion API?
So essentially the same as depicted in the Colab?
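On the client side, this is essentially what the Colab already does; a self-hosted server only needs to expose the same /v1/chat/completions route. A hedged sketch with the openai 0.x client, where the host, port, and model name are assumptions:

```python
# Hedged sketch: point the openai 0.x client at a self-hosted, OpenAI-compatible endpoint.
import openai

openai.api_key = "EMPTY"                      # a local server typically needs no real key
openai.api_base = "http://localhost:8000/v1"  # assumed self-hosted endpoint

completion = openai.ChatCompletion.create(
    model="gorilla-7b-hf-v1",  # assumed model name registered with the server
    messages=[{"role": "user", "content": "I would like to translate from English to Chinese."}],
)
print(completion.choices[0].message.content)
```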
I encountered this problem when downloading model weights. It seems weights larger than 4 GB are not handled correctly on Windows. Did you upload the models from a Windows system?
root@4bd793bb2ded:/workspace/gorilla# git lfs install
Updated git hooks.
Git LFS initialized.
root@4bd793bb2ded:/workspace/gorilla# git clone https://huggingface.co/gorilla-llm/gorilla-mpt-7b-hf-v0
Cloning into 'gorilla-mpt-7b-hf-v0'...
remote: Enumerating objects: 35, done.
remote: Counting objects: 100% (35/35), done.
remote: Compressing objects: 100% (34/34), done.
remote: Total 35 (delta 5), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (35/35), 621.68 KiB | 1.84 MiB/s, done.
Filtering content: 100% (2/2), 4.38 GiB | 57.36 MiB/s, done.
Encountered 1 file(s) that may not have been copied correctly on Windows:
pytorch_model-00001-of-00002.bin
See: `git lfs help smudge` for more details.
root@4bd793bb2ded:/workspace/gorilla/gorilla-mpt-7b-hf-v0# ls -al
total 12989212
drwxr-xr-x 3 root root 4096 Jun 7 00:17 .
drwxr-xr-x 8 root root 161 Jun 7 00:16 ..
drwxr-xr-x 9 root root 174 Jun 7 00:18 .git
-rw-r--r-- 1 root root 1477 Jun 7 00:16 .gitattributes
-rw-r--r-- 1 root root 2068 Jun 7 00:16 README.md
-rw-r--r-- 1 root root 1752 Jun 7 00:16 adapt_tokenizer.py
-rw-r--r-- 1 root root 16818 Jun 7 00:16 attention.py
-rw-r--r-- 1 root root 2493 Jun 7 00:16 blocks.py
-rw-r--r-- 1 root root 1284 Jun 7 00:16 config.json
-rw-r--r-- 1 root root 9080 Jun 7 00:16 configuration_mpt.py
-rw-r--r-- 1 root root 28182 Jun 7 00:16 flash_attn_triton.py
-rw-r--r-- 1 root root 112 Jun 7 00:16 generation_config.json
-rw-r--r-- 1 root root 27219 Jun 7 00:16 hf_prefixlm_converter.py
-rw-r--r-- 1 root root 3639 Jun 7 00:16 meta_init_context.py
-rw-r--r-- 1 root root 17406 Jun 7 00:16 modeling_mpt.py
-rw-r--r-- 1 root root 2563 Jun 7 00:16 norm.py
-rw-r--r-- 1 root root 12558 Jun 7 00:16 param_init_fns.py
-rw-r--r-- 1 root root 9943040275 Jun 7 00:18 pytorch_model-00001-of-00002.bin
-rw-r--r-- 1 root root 3355599187 Jun 7 00:17 pytorch_model-00002-of-00002.bin
-rw-r--r-- 1 root root 16023 Jun 7 00:16 pytorch_model.bin.index.json
-rw-r--r-- 1 root root 129 Jun 7 00:16 special_tokens_map.json
-rw-r--r-- 1 root root 2113738 Jun 7 00:16 tokenizer.json
-rw-r--r-- 1 root root 264 Jun 7 00:16 tokenizer_config.json
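Not a maintainer answer, but one workaround that sidesteps the git-lfs smudge issue entirely is to download the checkpoint with huggingface_hub instead of git clone; a hedged sketch using the same repo id as the clone above:

```python
# Hedged sketch: fetch the weights via huggingface_hub instead of git-lfs,
# avoiding the Windows smudge-filter problem with files larger than 4 GB.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("gorilla-llm/gorilla-mpt-7b-hf-v0")
print("Model downloaded to:", local_dir)
```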
Could you share a reference for the oracle retriever? I cannot find it in the paper.
Is the GPTIndex in the paper LlamaIndex? I know GPTIndex has been renamed to LlamaIndex (https://github.com/jerryjliu/llama_index) and would just like to confirm that. If so, what index method are you using: list, tree, or something else?
Exception: Error communicating with OpenAI: HTTPConnectionPool(host='34.132.127.197', port=8000): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f912081fb50>: Failed to establish a new connection: [Errno 111] Connection refused'))
Failed model: gorilla-7b-hf-v0, for prompt: I would like to translate from English to Chinese