Comments (6)
I wrote a script for that, but the resulting model won't fit in VRAM (the 30B one, anyway).
import sys
sys.path.insert(0, './repository/transformers/src')
sys.path.insert(0, './repository/GPTQ-for-LLaMa')
sys.path.insert(0, './repository/peft/src')

import peft
import peft.tuners.lora
assert peft.tuners.lora.is_gptq_available()

import torch
import transformers
from autograd_4bit import load_llama_model_4bit_low_ram
from peft import LoraConfig, get_peft_model, get_peft_model_state_dict, PeftModel

print('PyTorch version:', torch.__version__)
print('Transformers version:', transformers.__version__)
print('PEFT version:', peft.__version__)
print('cuda:', torch.cuda.is_available())

# ! Config
from arg_parser import get_config
import train_data
ft_config = get_config()

# * Show loaded parameters
if ft_config.local_rank == 0:
    print(f"{ft_config}\n")
if ft_config.gradient_checkpointing:
    print('Disable Dropout.')

# Load the quantized base model
model, tokenizer = load_llama_model_4bit_low_ram(ft_config.llama_q4_config_dir, ft_config.llama_q4_model, device_map=ft_config.device_map)

# Load the LoRA adapter weights
adapter_path = './alpaca_lora/adapter_model.bin'
adapter_weights = torch.load(adapter_path)

# Merge the adapter weights into the base model
for name, param in model.named_parameters():
    if name in adapter_weights:
        param.data = adapter_weights[name]

# Save the merged model
output_path = './merged_model'
model.save_pretrained(output_path)
from alpaca_lora_4bit.
I think it is hard to do: the 4-bit model has already been quantized once, and if you reconstruct it, add the LoRA weights to the base weights, and quantize it again, the loss may be out of control. But it is technically feasible if you ignore the quality of the merged model.
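The compounding-error concern can be illustrated with a toy simulation. This is a sketch using plain NumPy and naive symmetric per-tensor 4-bit rounding, a big simplification of GPTQ's per-group scheme, so the numbers are only indicative:

```python
import numpy as np

def quantize_4bit(w):
    # Naive symmetric 4-bit quantization: integers in [-8, 7] times one scale.
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7)
    return q, scale

def dequantize(q, scale):
    return q * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=(64, 64)).astype(np.float32)

# First quantization: the released 4-bit checkpoint.
q1, s1 = quantize_4bit(w)
w_hat = dequantize(q1, s1)

# LoRA delta (rank 4), added to the *reconstructed* weights.
a = rng.normal(0, 0.02, size=(4, 64)).astype(np.float32)
b = rng.normal(0, 0.02, size=(64, 4)).astype(np.float32)
delta = b @ a
merged = w_hat + delta

# Second quantization: the merged checkpoint.
q2, s2 = quantize_4bit(merged)
merged_hat = dequantize(q2, s2)

# Error of the twice-quantized merge vs. the ideal fp16 merge (w + delta).
# It stacks the first round's error on top of the second's.
err_single = np.abs(w_hat - w).mean()
err_round_trip = np.abs(merged_hat - (w + delta)).mean()
print(f"single quantization error: {err_single:.6f}")
print(f"double quantization error: {err_round_trip:.6f}")
```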
But I have a plan to make a LoRA adapter where multiple LoRAs can be used together, just like in Stable Diffusion.
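Stacking adapters like that amounts to summing their low-rank deltas with a per-adapter scale, W' = W + Σᵢ sᵢ·BᵢAᵢ. A minimal NumPy sketch of the composition (function and variable names are hypothetical):

```python
import numpy as np

def apply_loras(w, adapters):
    """Compose multiple LoRA adapters on one base weight matrix.

    adapters: list of (b, a, scale) tuples, where b is (out, r),
    a is (r, in), and scale is the per-adapter weight (like the
    strength slider for each LoRA in Stable Diffusion UIs).
    """
    out = w.copy()
    for b, a, scale in adapters:
        out += scale * (b @ a)   # low-rank update, never materialized alone
    return out

rng = np.random.default_rng(1)
w = rng.normal(size=(32, 32))
lora1 = (rng.normal(size=(32, 4)), rng.normal(size=(4, 32)), 0.7)
lora2 = (rng.normal(size=(32, 2)), rng.normal(size=(2, 32)), 0.3)

merged = apply_loras(w, [lora1, lora2])
merged_rev = apply_loras(w, [lora2, lora1])
# Composition is additive, so the application order does not matter.
assert np.allclose(merged, merged_rev)
```

Because the deltas are additive, adapters can also be kept unmerged and applied at runtime, which is what makes mixing them cheap.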
How about applying it to the HF model and then re-quantizing it? The new GPTQ requires doing that anyway. Ideally in FP16.
Yes, then I think it's feasible.
Also, are those 4-bit checkpoints released?
For the other model types? I have the one for GPTX-20 that someone uploaded. Otherwise they can just be re-quantized slowly.
The above script doesn't seem to work after the v2 GPTQ changes; it produces a model file that is exactly the same as the original.
Does anyone know how to fix it? It looks like the layer names don't match in that for name, param loop.
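One way to see why that loop is a no-op is to diff the two sets of names: PEFT saves adapter keys with a `base_model.model.` prefix and `lora_A`/`lora_B` suffixes, which never appear verbatim in `model.named_parameters()`. A small diagnostic sketch, using toy dicts in place of the real state dicts (the key strings shown are illustrative):

```python
def diff_keys(model_params, adapter_weights):
    """Report which adapter keys would actually match model parameters."""
    model_keys = set(model_params)
    adapter_keys = set(adapter_weights)
    matched = sorted(model_keys & adapter_keys)
    unmatched = sorted(adapter_keys - model_keys)
    return matched, unmatched

# Toy stand-ins for model.named_parameters() and torch.load(adapter_path).
model_params = {
    'model.layers.0.self_attn.q_proj.qweight': None,
    'model.layers.0.self_attn.v_proj.qweight': None,
}
adapter_weights = {
    'base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight': None,
    'base_model.model.model.layers.0.self_attn.q_proj.lora_B.weight': None,
}

matched, unmatched = diff_keys(model_params, adapter_weights)
print('matched:', len(matched))   # zero matches means the merge loop never fires
print('unmatched:', unmatched)
```

Note that even after remapping names, the `lora_A`/`lora_B` tensors are low-rank factors to be combined as scale·(B @ A) and *added* to the base weight, not assigned in place of it.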