Comments (2)
RoPE scaling is a technique that allows a model to operate at a larger context (max_total_tokens) than the context length it was trained on (e.g. 4096). That's what you are seeing. By default, if you do not set the parameter, it is calculated automatically. In other words, just delete the line with the parameter starting with rope.
There's no obvious golden rule for rope_freq_base, but I recommend using these calculations:
def calculate_rope_alpha(self) -> float:
    """Calculate the RoPE alpha based on the n_ctx.
    Assume that the trained token length is 4096."""
    # The following formula is obtained by fitting the data points
    # (compress_ratio, alpha): [(1.0, 1.0), (1.75, 2.0), (2.75, 4.0), (4.1, 8.0)]
    compress_ratio = self.calculate_rope_compress_ratio()
    return (
        -0.00285883 * compress_ratio**4
        + 0.03674126 * compress_ratio**3
        + 0.23873223 * compress_ratio**2
        + 0.49519964 * compress_ratio
        + 0.23218571
    )

def calculate_rope_freq(self) -> float:
    """Calculate the RoPE frequency based on the n_ctx.
    Assume that the trained token length is 4096."""
    return 10000.0 * self.calculate_rope_alpha() ** (64 / 63)

def calculate_rope_compress_ratio(self) -> float:
    """Calculate the RoPE embedding compression ratio based on the n_ctx.
    Assume that the trained token length is 4096."""
    return max(self.max_total_tokens / Config.trained_tokens, 1.0)

def calculate_rope_scale(self) -> float:
    """Calculate the RoPE scaling factor based on the n_ctx.
    Assume that the trained token length is 4096."""
    return 1 / self.calculate_rope_compress_ratio()
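To illustrate what these formulas produce, here is a self-contained sketch of the same calculations written as free functions (the function names and the TRAINED_TOKENS constant are my own, not part of llama-api), assuming a trained context of 4096 tokens:

```python
# Hypothetical standalone version of the formulas above,
# assuming the model was trained with a 4096-token context.
TRAINED_TOKENS = 4096

def rope_compress_ratio(max_total_tokens: int) -> float:
    # Ratio of requested context to trained context, never below 1.0.
    return max(max_total_tokens / TRAINED_TOKENS, 1.0)

def rope_alpha(max_total_tokens: int) -> float:
    # Polynomial fit mapping compression ratio to alpha.
    r = rope_compress_ratio(max_total_tokens)
    return (
        -0.00285883 * r**4
        + 0.03674126 * r**3
        + 0.23873223 * r**2
        + 0.49519964 * r
        + 0.23218571
    )

def rope_freq_base(max_total_tokens: int) -> float:
    # Scale the default base frequency (10000) by alpha^(64/63).
    return 10000.0 * rope_alpha(max_total_tokens) ** (64 / 63)

def rope_scale(max_total_tokens: int) -> float:
    # Linear scaling factor is the inverse of the compression ratio.
    return 1.0 / rope_compress_ratio(max_total_tokens)

# At the trained length, the ratio is 1.0, alpha is ~1.0, and the
# frequency base stays at the default ~10000. Doubling the context
# to 8192 raises the base to roughly 24600.
print(rope_freq_base(4096))
print(rope_freq_base(8192))
```

So for a 2x context (8192 tokens) you would end up with rope_freq_base around 24600 and rope_scale of 0.5, while at or below the trained length the defaults are left effectively untouched.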
Note that these auto-calculation methods are currently in the dev branch and will be merged soon.
You can find the other parameters in this path:
llama_api/schemas/models.py. I recommend using an IDE such as VSCode, as it will show you hints for hidden parameters.
from llama-api.
Thank you so much!