
Comments (9)

rasbt commented on June 11, 2024

Haven't seen that yet but it could be due to memory limitations when merging with the original model. Wdyt @awaelchli ?

Btw, otherwise it looks like everything ran successfully. You can try to merge the weights manually via the litgpt merge_lora command.

awaelchli commented on June 11, 2024

Yes, that means you probably ran out of CPU memory. Right now, merging LoRA parameters requires the entire checkpoint to fit in memory.
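For context on why the whole checkpoint must fit in memory: merging LoRA adds the low-rank update back into each base weight, so the full-precision base weight has to be materialized for every layer. A minimal sketch of the math (illustrative names and scaling, not litgpt's actual implementation):

```python
import torch

def merge_lora_layer(w: torch.Tensor, lora_a: torch.Tensor,
                     lora_b: torch.Tensor, alpha: float, r: int) -> torch.Tensor:
    """Return the merged weight W' = W + (alpha / r) * (B @ A).

    w:      base weight, shape (out_features, in_features)
    lora_a: LoRA "A" matrix, shape (r, in_features)
    lora_b: LoRA "B" matrix, shape (out_features, r)
    """
    return w + (alpha / r) * (lora_b @ lora_a)
```

Since `w` must exist in full precision at this point, a naive merge ends up holding every layer of the checkpoint in CPU RAM at once.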

carmocca commented on June 11, 2024

@alistairwgillespie How much RAM does your system have?

carmocca commented on June 11, 2024

I opened #1189, which seems to help for me. It would be interesting if you could try it, Alistair.

Andrei-Aksionov commented on June 11, 2024

I think the biggest issue is that the training was done with quantization, but merge_lora.py doesn't support it.
We can load the model in quantized form and merge the weights, but when it comes to saving the model we have a couple of options:

  • If we want to save the model in quantized form, there shouldn't be any significant problems. Plus, the latest BNB should support it.
  • If we want to save in dequantized form, then we have to add incremental dequantization and saving. Otherwise the whole model has to be dequantized right before saving, which significantly increases memory consumption.
    Lightning-AI/pytorch-lightning#19242
  • Or we stick with the current approach and assume that a user has more (much more, in fact) free CPU RAM than GPU VRAM.

I want to also mention that we might have a problem with merging quantized weights: #935
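To illustrate why merging quantized weights can be problematic: quantization rounds weights onto a coarse grid, so values recovered from the quantized representation differ from the full-precision originals, and any merge computed on them inherits that error. A toy symmetric int8 scheme (just an illustration, not BNB's NF4):

```python
import torch

def quantize_int8(t: torch.Tensor):
    # Symmetric per-tensor quantization: map [-max, max] onto [-127, 127].
    scale = t.abs().max() / 127.0
    q = torch.clamp((t / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

torch.manual_seed(0)
w = torch.randn(64, 64)
q, scale = quantize_int8(w)
# Round-trip error is bounded by half a quantization step (scale / 2),
# but it is not zero: the original weights are not recovered exactly.
round_trip_error = (w - dequantize_int8(q, scale)).abs().max()
```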

ecatkins commented on June 11, 2024

@carmocca I was having the same issue, and your PR resolved it for me (at least on a small test training run).

carmocca commented on June 11, 2024

@Andrei-Aksionov For now, the easiest thing is to save it dequantized. The LoRA merge already supports this (you added it!). If it's not working well, then we should fix it.

Andrei-Aksionov commented on June 11, 2024

Yes, I even remember adding it 😆.
You can quantize a model upon loading (thanks to Fabric) and merge while keeping the model in quantized form (I guess kudos here go to me), but if you want to save the model in dequantized form, you first need to dequantize the whole model.

If it's not working well then we should fix it

The fix would be an incremental dequantization/saving:

  1. Take a layer that we want to save
  2. Dequantize it
  3. Save it
  4. Go back to step 1
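The loop above could look roughly like the following (a hypothetical sketch: `dequantize_fn` stands in for the real BNB dequantization, and per-layer files stand in for whatever on-disk format is actually chosen):

```python
import os
import torch

def save_dequantized_incrementally(state_dict, dequantize_fn, out_dir):
    """Dequantize and save one parameter at a time, so peak memory holds the
    quantized model plus a single full-precision tensor, never the whole
    full-precision model at once."""
    os.makedirs(out_dir, exist_ok=True)
    for name, quantized in state_dict.items():
        full = dequantize_fn(quantized)                                # step 2: dequantize
        torch.save({name: full}, os.path.join(out_dir, f"{name}.pt"))  # step 3: save
        del full                                                       # free it before the next layer
```

The per-layer shards would then need to be reassembled (or loaded shard by shard) on the consuming side, which is the extra complexity this approach trades for the lower peak memory.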

alistairwgillespie commented on June 11, 2024

Apologies for the late reply, all. I updated the hardware notes in the original issue for audit purposes. Boosting the hardware fixed the issue. The additional feedback was helpful, too.
