Comments (1)
I notice you are using device_map="balanced"; setting it to "auto" might be helpful.
If the model fits on a single GPU, don't spread it across several, since passing data between GPUs is really slow. If it doesn't fit, try using as few GPUs as possible for model parallelism.
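A rough sketch of that decision, in case it helps. The helper name `choose_device_map`, the 0.8 headroom factor, and the memory figures are my own assumptions for illustration, not anything from the repo. It returns `{"": i}` (which places the whole model on GPU `i`) when one card can hold the model, and falls back to "auto" otherwise.

```python
from typing import Dict, List, Union


def choose_device_map(
    model_gb: float, free_gb_per_gpu: List[float], headroom: float = 0.8
) -> Union[str, Dict[str, int]]:
    """Return {"": i} if GPU i can hold the whole model, else "auto".

    headroom keeps part of each GPU free for activations and the KV cache;
    0.8 is an arbitrary illustrative value, not a measured one.
    """
    for index, free_gb in enumerate(free_gb_per_gpu):
        if model_gb <= free_gb * headroom:
            return {"": index}  # whole model on a single device
    return "auto"  # let the loader split the model across GPUs


if __name__ == "__main__":
    print(choose_device_map(14.0, [24.0, 24.0]))  # {'': 0}
    print(choose_device_map(66.0, [24.0] * 4))    # auto
```

The result can be passed as `device_map=...` to `AutoModelForCausalLM.from_pretrained`. To limit how many GPUs "auto" spreads across, you can restrict the visible devices with the `CUDA_VISIBLE_DEVICES` environment variable before starting Python.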
You could also try quantizing the model (4-bit weights take roughly 0.5 GB per billion parameters, plus some overhead):
https://huggingface.co/blog/4bit-transformers-bitsandbytes
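For a quick sanity check on sizes, here is a minimal sketch. `weight_memory_gb` and `load_4bit` are hypothetical helper names, the 33B size is only an example, and the estimate covers the weights alone (no activations, KV cache, or quantization metadata). The loading part follows the bitsandbytes integration described in the blog post above and assumes `transformers`, `bitsandbytes`, and a CUDA GPU are available.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


def load_4bit(model_id: str):
    """Load a causal LM with 4-bit NF4 weights (assumes a CUDA GPU)."""
    # Imported lazily so the size estimate above works without these packages.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    return AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=quant_config, device_map="auto"
    )


if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = weight_memory_gb(33, bits)
        print(f"33B parameters at {bits}-bit: ~{size:.1f} GB of weights")
```

Actual memory use will be somewhat higher than these figures, so leave some margin when deciding whether the model fits on one GPU.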
Also, loading the model with use_flash_attention_2=True could speed it up.
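A hedged sketch of how that could look. As far as I know, newer `transformers` releases spell this option `attn_implementation="flash_attention_2"` and treat `use_flash_attention_2=True` as the older form, so check which one your installed version expects. The helper names below are hypothetical, and the sketch assumes the `flash_attn` package, a half-precision dtype, and a GPU that FlashAttention-2 supports.

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Use FlashAttention-2 if the flash_attn package is installed, else SDPA."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"  # PyTorch's built-in scaled-dot-product attention


def load_with_fast_attention(model_id: str):
    """Load a causal LM in bfloat16 with the fastest available attention."""
    # Imported lazily so pick_attn_implementation() works without these packages.
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # FlashAttention-2 needs fp16 or bf16
        attn_implementation=pick_attn_implementation(),
        device_map="auto",
    )


if __name__ == "__main__":
    print(pick_attn_implementation())
```

Falling back to "sdpa" keeps the script working on machines where `flash_attn` is not installed, at the cost of whatever speedup FlashAttention-2 would have given.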
from deepseek-coder.