Comments (3)
Try increasing the memory utilization, since the new release can handle much larger values (up to 1.0 if there's no idle VRAM usage). This doesn't apply to GPTQ, but CUDA graphs are enabled by default and they use about 2 GB of extra VRAM. You can disable them with --enforce-eager, or with enforce_eager=True if you're using a script.
from aphrodite-engine.
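For script use, the suggestion above might look roughly like the sketch below. It assumes the `aphrodite` package exposes an `LLM` class whose keyword arguments mirror the CLI flags. Only `enforce_eager=True` is confirmed in this thread, so `gpu_memory_utilization` as a keyword, the import path, the helper names `build_engine_kwargs` and `load_engine`, and the placeholder model name are all assumptions.

```python
def build_engine_kwargs(model, gpu_memory_utilization=1.0, enforce_eager=True):
    """Collect engine settings in one place.

    The keyword names are assumed to mirror the CLI flags
    (--gpu-memory-utilization, --enforce-eager).
    """
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    return {
        "model": model,
        "gpu_memory_utilization": gpu_memory_utilization,
        # True disables CUDA graphs, which saves the extra VRAM they use.
        "enforce_eager": enforce_eager,
    }


def load_engine(**kwargs):
    """Create the engine; requires aphrodite-engine and a GPU."""
    # Imported lazily so build_engine_kwargs can be checked without a GPU.
    from aphrodite import LLM  # assumed import path
    return LLM(**kwargs)


# "org/model-name" is a placeholder, not a real model.
kwargs = build_engine_kwargs("org/model-name")
print(kwargs)
```

Calling `load_engine(**kwargs)` would then attempt the actual load. Keeping the settings in a plain dict makes it easy to print exactly what was passed when debugging an OOM.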
Nope, still crashes with OOM with both points addressed
Full command:
python -m aphrodite.endpoints.openai.api_server \
--download-dir ./models \
--model "TheBloke/MythoMax-L2-13B-AWQ" \
--gpu-memory-utilization 1 \
--swap-space 5 \
--api-keys key \
--dtype float16 \
--enforce-eager \
--host 127.0.0.1 \
--port 5000 \
--max-model-len 4096
nvm, that's me being dumb. I forgot to add --quantization gptq/awq. aphrodite-engine 0.4.2 managed to load GPTQ models even without it.
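As a sketch of the fix, the command from this thread can be assembled with the missing `--quantization` flag. Every flag and value below is copied from the posts above. The helper name `build_server_command` is hypothetical, and limiting the accepted values to `gptq` and `awq` only reflects what this thread mentions, not the full set the engine may accept.

```python
import shlex


def build_server_command(model, quantization):
    """Return the API server command from this thread with --quantization set."""
    # Only the two values named in the thread are accepted in this sketch.
    if quantization not in ("gptq", "awq"):
        raise ValueError("expected 'gptq' or 'awq'")
    return [
        "python", "-m", "aphrodite.endpoints.openai.api_server",
        "--download-dir", "./models",
        "--model", model,
        "--quantization", quantization,  # the flag that was missing
        "--gpu-memory-utilization", "1",
        "--swap-space", "5",
        "--api-keys", "key",
        "--dtype", "float16",
        "--enforce-eager",
        "--host", "127.0.0.1",
        "--port", "5000",
        "--max-model-len", "4096",
    ]


cmd = build_server_command("TheBloke/MythoMax-L2-13B-AWQ", "awq")
# shlex.join quotes arguments so the line can be pasted into a shell.
print(shlex.join(cmd))
```

The list form can also be passed directly to `subprocess.run(cmd)` to launch the server from a script.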