Comments (9)
Haven't seen that yet but it could be due to memory limitations when merging with the original model. Wdyt @awaelchli ?
Btw, otherwise it looks like everything was successful. You can try to manually merge the weights via the `litgpt merge_lora` command.
from litgpt.
Yes, that means you probably ran out of CPU memory. Right now, merging LoRA parameters requires the entire checkpoint to fit in memory.
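To make the memory cost concrete, here is a minimal NumPy sketch of what a LoRA merge does mathematically (this is an illustration, not litgpt's actual implementation; the function name and toy shapes are made up): the low-rank update is folded back into the base weight, so the full-size merged tensor has to exist in memory alongside the originals.

```python
import numpy as np

def merge_lora_weight(W, A, B, alpha, r):
    """Standard LoRA merge: W + (alpha / r) * (B @ A)."""
    return W + (alpha / r) * (B @ A)

# Toy dimensions: a 6x4 base weight with rank-2 LoRA factors.
rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))
A = rng.standard_normal((2, 4))  # rank x in_features
B = rng.standard_normal((6, 2))  # out_features x rank
merged = merge_lora_weight(W, A, B, alpha=16, r=2)
assert merged.shape == W.shape
```

Doing this for every weight matrix of a multi-billion-parameter checkpoint is why the whole model currently has to fit in CPU RAM.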
@alistairwgillespie How much RAM does your system have?
I opened #1189 which seems to help for me. It would be interesting if you could try it, Alistair.
I think the biggest issue is that the training was done with quantization, but `merge_lora.py` doesn't support it.
We can load the model in quantized form and merge the weights. But when it comes to saving the model we have a couple of options:
- If we want to save the model in quantized form there shouldn't be any significant problems. Plus the latest BNB should support it.
- If we want to save in dequantized form, then we have to add incremental dequantization and saving. Otherwise the whole model has to be dequantized right before saving, which significantly increases memory consumption.
  Lightning-AI/pytorch-lightning#19242 might help here. Or we could just stick to the current approach and ~~hope~~ assume that a user has more (much more, in fact) free CPU RAM than GPU VRAM.
I want to also mention that we might have a problem with merging quantized weights: #935
@carmocca I was having the same issue, and your PR resolved it for me (at least on a small test training run).
@Andrei-Aksionov For now the easiest thing is to save it dequantized. The LoRA merge already supports this (you added this!). If it's not working well then we should fix it
Yes, I even remember adding it 😆.
You can quantize a model upon loading (thanks to Fabric) and merge while keeping the model in quantized form (guess here kudos goes to me), but if you want to save the model in a dequantized form, you first need to dequantize the whole model.
> If it's not working well then we should fix it
The fix would be an incremental dequantization/saving:
1. Take a layer that we want to save
2. Dequantize it
3. Save it
4. Go back to step 1
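The loop above could be sketched as follows. This is a toy model of the idea, not litgpt code: `dequantize` and `save_layer` are hypothetical stand-ins (real code would use bitsandbytes dequantization and an incrementally written checkpoint file), and the "quantized model" is a dict of int8 arrays with per-tensor scales. The point is that only one layer is ever held in dequantized form.

```python
import numpy as np

def dequantize(q_layer):
    # Pretend int8 -> float32 with a per-tensor scale (stand-in for BNB).
    return q_layer["data"].astype(np.float32) * q_layer["scale"]

def save_layer(store, name, tensor):
    # Append one layer to the checkpoint; in real code this would write
    # to disk instead of keeping the result in a dict.
    store[name] = tensor

# Toy "quantized model": three int8 layers with a shared scale.
quantized_model = {
    f"layer_{i}": {"data": np.arange(4, dtype=np.int8), "scale": 0.5}
    for i in range(3)
}

checkpoint = {}
for name, q_layer in quantized_model.items():
    full = dequantize(q_layer)          # step 2: dequantize one layer
    save_layer(checkpoint, name, full)  # step 3: save it
    del full                            # free it before the next layer
```

Peak extra memory is then bounded by the largest single layer rather than the whole dequantized model.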
Apologies for the late reply, all. I updated the hardware notes in the original issue for audit purposes. Boosting the hardware fixed the issue; the additional feedback was helpful too.