Comments (4)
Hey @emillykkejensen, unfortunately our minimum supported architecture at the moment is Ampere, due to the Flash Attention dependency. Please see the system requirements here: https://github.com/predibase/lorax?tab=readme-ov-file#requirements
from lorax.
Fair enough. However, one could argue that one of the points of QLoRA is to serve on smaller (older and cheaper) GPUs that predate Ampere. Is anything in the works?
Yes, we have plans to move our attention computation over to the FlashInfer project, which is working on support for Volta and Turing GPUs. So hopefully that will address the issue.
Sounds good 😊 I'm sure you're already aware, but in case you're not, I can see that there is a fix in TGI. However, it seems they simply fix it by loading the full model?
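For anyone checking whether their own card falls under the Ampere minimum discussed in this thread, here is a minimal sketch (not part of LoRAX) that maps a CUDA compute capability to an NVIDIA architecture name. It assumes the Ampere floor corresponds to compute capability 8.0 (Volta is 7.0, Turing is 7.5). The helper names `arch_name` and `meets_ampere_minimum` are hypothetical; the optional device query uses PyTorch's `torch.cuda.get_device_capability`.

```python
from typing import Tuple

# Assumed minimum: Ampere starts at compute capability 8.0 (sm_80).
AMPERE_MIN: Tuple[int, int] = (8, 0)


def arch_name(capability: Tuple[int, int]) -> str:
    """Map a CUDA compute capability (major, minor) to an architecture name."""
    major, minor = capability
    if major == 7:
        # 7.0/7.2 are Volta; 7.5 is Turing
        return "Turing" if minor >= 5 else "Volta"
    if major == 8:
        # 8.0/8.6/8.7 are Ampere; 8.9 is Ada Lovelace
        return "Ada" if minor >= 9 else "Ampere"
    if major == 9:
        return "Hopper"
    return "pre-Volta" if major < 7 else "newer"


def meets_ampere_minimum(capability: Tuple[int, int]) -> bool:
    """True if the device is Ampere or newer (tuple comparison on (major, minor))."""
    return tuple(capability) >= AMPERE_MIN


if __name__ == "__main__":
    try:
        import torch  # optional: only used to query a real device

        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            print(arch_name(cap), meets_ampere_minimum(cap))
        else:
            print("no CUDA device visible")
    except ImportError:
        print("torch not installed; skipping device query")
```

Comparing `(major, minor)` tuples keeps the check simple: any 8.x or 9.x device passes, while Volta (7.0) and Turing (7.5) cards, the ones FlashInfer is reportedly targeting, do not.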
Related Issues (20)
- Refactor the quantization config for weights
- log arbitrary headers
- Async client to backoff when model overloaded
- Support loading `.pt` weights
- Error: Warmup(Generation("'bool' object has no attribute 'dtype'"))
- Inference with AWQ quantized base model + compile enabled results in the <unk> tokens
- Combining multiple LoRA adapters
- 10s latency of lora inference caused by None base_model_name_or_path in adapter_config
- [Question] Usage about the `adapter-memory-fraction`
- Improve the latency of `load_batched_adapter_weights`
- Fix PyTorch CUDA version in Docker
- Idefics2 and LLaVA
- Fallback to Flash Attention v1 for pre-Ampere GPUs
- Private LORA Adapter Error - Server error: No valid adapter config file found: tried None and None
- Llama3-8b-Instruct won't stop generating
- Speculative tokens fails during warmup in some scenarios
- Batch inference endpoint (OpenAI compatible)
- Add HF authentication instructions to lorax-launcher docs
- Improve async load for adapters to avoid main thread lockups in server