Hi! I am trying to fine tune Conv2d with LoRA. I first loaded the pre-trained model we


Pre-trained conv weight is not same as that self.conv.weight

aleemsidra commented on May 23, 2024

Pre-trained conv weight is not same as that self.conv.weight


Comments (3)

edwardjhu commented on May 23, 2024

Thanks for sharing this. It might have something to do with how you load the checkpoint. Can you provide a minimal example where this happens?


aleemsidra commented on May 23, 2024

@edwardjhu, I first load the model as follows:

model.load_state_dict(torch.load("/home/sidra/Documents/Domain_Apatation/UDAS/src/checkpoints/base_model_mms_2023-07-06_12-45-28_PM/dc_model.pth"), strict=False)
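As an aside on the loading step: with strict=False, load_state_dict does not raise on key mismatches, but it does return the missing and unexpected keys, so they can be inspected. A minimal sketch, using a tiny stand-in model and a hand-built state dict (both hypothetical, not the UNet2D or the checkpoint from this thread):

```python
import torch
import torch.nn as nn

# Tiny stand-in model (hypothetical; not the UNet2D from this thread).
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1, bias=False),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1, bias=False),
)

# Hand-built "checkpoint" whose keys only partly match the model.
state = {
    "0.weight": torch.zeros(16, 1, 3, 3),  # matches model[0]
    "extra.weight": torch.zeros(1),        # no such parameter in the model
}

# strict=False skips mismatched keys silently, but reports them in the result.
result = model.load_state_dict(state, strict=False)
print("missing keys:   ", result.missing_keys)
print("unexpected keys:", result.unexpected_keys)
```

If either list is non-empty for the real checkpoint, some layers were not loaded and still hold their initial values.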

Below is the structure of part of the loaded model:

UNet2D(
  (init_path): Sequential(
    (0): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): ReLU()
    (2): ResBlock(
      (conv_path): Sequential(
        (0): PreActivationND(
          (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activation): ReLU()
          (layer): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        )
        (1): PreActivationND(
          (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activation): ReLU()
          (layer): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        )
      )
    )
  )
)

After loading the model, I replace the Conv2d instances in nn.Sequential and in ResBlock as follows:

# Replace Conv2d layers with LoRA layers

for name, sub_module in model.named_children():
    for name, layer in list(sub_module.named_children()):
        # Conv2d
        if isinstance(layer, nn.Conv2d):
            setattr(sub_module, name, lora.Conv2d(
                layer.in_channels,
                layer.out_channels,
                kernel_size=layer.kernel_size[0],
                r=2,
                lora_alpha=2))

        # ResBlock
        elif isinstance(sub_module, nn.Sequential):
            for name, layer in list(sub_module.named_children()):
                if isinstance(layer, ResBlock):
                    for i, preactivation_module in enumerate(layer.conv_path):
                        if isinstance(preactivation_module, PreActivationND) and isinstance(preactivation_module.layer, nn.Conv2d):
                            setattr(preactivation_module, 'layer', lora.Conv2d(
                                preactivation_module.layer.in_channels,
                                preactivation_module.layer.out_channels,
                                kernel_size=preactivation_module.layer.kernel_size[0],
                                r=2,
                                lora_alpha=2))
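Note that the replacement above builds each new layer only from the channel counts and kernel size, so nothing carries the old layer's stride, padding, bias setting, or weights across. One possible way to carry them over is sketched below. WrappedConv2d is a hypothetical stand-in defined here so the example runs on its own; it only mimics the `.conv` attribute visible in the printed model structure and is not loralib's class. The helper wrap_conv is likewise an illustrative name, and it assumes square kernels as in this model.

```python
import torch
import torch.nn as nn

class WrappedConv2d(nn.Module):
    """Hypothetical stand-in for a LoRA conv wrapper that keeps its base layer in `.conv`."""
    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__()
        # Extra keyword arguments (stride, padding, bias, ...) go to the base conv.
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, **kwargs)

    def forward(self, x):
        return self.conv(x)

def wrap_conv(old: nn.Conv2d, wrapper_cls=WrappedConv2d) -> nn.Module:
    """Build a wrapper with the same conv settings as `old` and copy its parameters."""
    new = wrapper_cls(
        old.in_channels,
        old.out_channels,
        kernel_size=old.kernel_size[0],  # assumes a square kernel
        stride=old.stride,
        padding=old.padding,
        bias=old.bias is not None,       # keep bias=False if the original had none
    )
    # Copy the pre-trained parameters into the wrapped base conv.
    new.conv.load_state_dict(old.state_dict())
    return new

old = nn.Conv2d(1, 16, kernel_size=3, padding=1, bias=False)
new = wrap_conv(old)
print(torch.equal(old.weight, new.conv.weight))
print(new.conv.bias is None)
```

With the real library, the same idea would mean passing stride, padding, and bias to lora.Conv2d alongside r and lora_alpha, assuming it forwards extra keyword arguments to the underlying nn.Conv2d, and then copying the old weight into the new layer's conv.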

The updated model structure looks like this:

UNet2D(
  (init_path): Sequential(
    (0): Conv2d(
      (conv): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1))
    )
    (1): ReLU()
    (2): ResBlock(
      (conv_path): Sequential(
        (0): PreActivationND(
          (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activation): ReLU()
          (layer): Conv2d(
            (conv): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1))
          )
        )
        (1): PreActivationND(
          (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activation): ReLU()
          (layer): Conv2d(
            (conv): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1))
          )
        )
      )
    )
  )
)

Then I checked whether the LoRA matrices had been injected correctly by printing the parameter names:

for name, param in model.named_parameters():
      print(name)
init_path.0.lora_A
init_path.0.lora_B
init_path.0.conv.weight
init_path.0.conv.bias
init_path.2.conv_path.0.bn.weight
init_path.2.conv_path.0.bn.bias
init_path.2.conv_path.0.layer.lora_A
init_path.2.conv_path.0.layer.lora_B
init_path.2.conv_path.0.layer.conv.weight
init_path.2.conv_path.0.layer.conv.bias
init_path.2.conv_path.1.bn.weight
init_path.2.conv_path.1.bn.bias
init_path.2.conv_path.1.layer.lora_A
init_path.2.conv_path.1.layer.lora_B
init_path.2.conv_path.1.layer.conv.weight
init_path.2.conv_path.1.layer.conv.bias

This shows that the LoRA layers have been added correctly. However, when I compare the conv weights of the pre-trained model with those of the model after injecting the LoRA layers, they are not the same:

# Pre-trained model
model.init_path[0].weight[0][0]

tensor([[ 0.2988,  0.2760, -0.0493],
        [ 0.3431, -0.0962,  0.0716],
        [-0.1536,  0.1956,  0.2885]], grad_fn=<SelectBackward0>)

# With LoRA
model.init_path[0].conv.weight[0][0]
tensor([[ 0.1168,  0.0223, -0.1227],
        [-0.2735, -0.2281, -0.2859],
        [ 0.2369, -0.1391, -0.0499]])

Moreover, in my original Conv2d, bias is set to False, but when I check model.init_path[0].conv.bias, it gives:

Parameter containing:
tensor([-0.1540,  0.0532, -0.0386, -0.0889, -0.1558,  0.0867, -0.2746,  0.3279,
        -0.0516,  0.0622,  0.1098, -0.1297,  0.2631, -0.0025,  0.0273, -0.3173],
       requires_grad=True)

requires_grad is also True, but the pre-trained conv layer had bias=False, so where are these values coming from?

Can you please give your feedback on this?
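One likely explanation, consistent with the printed structure (the wrapped convs show no padding and no bias=False), is that each replacement layer is a freshly constructed conv with PyTorch's defaults rather than a copy of the pre-trained one. The sketch below illustrates this with plain nn.Conv2d only; the variable names are illustrative and no LoRA library is involved.

```python
import torch
import torch.nn as nn

# Stand-in for the pre-trained layer: padding=1, no bias.
pretrained = nn.Conv2d(1, 16, kernel_size=3, padding=1, bias=False)

# A new layer built only from channel counts and kernel size, as in the
# replacement loop; everything else falls back to nn.Conv2d defaults.
fresh = nn.Conv2d(
    pretrained.in_channels,
    pretrained.out_channels,
    kernel_size=pretrained.kernel_size[0],
)

print(pretrained.bias)            # None, because bias=False was requested
print(fresh.bias.shape)           # a bias exists: nn.Conv2d defaults to bias=True
print(fresh.bias.requires_grad)   # newly created parameters are trainable
print(fresh.padding)              # default padding is 0, not the original 1
print(torch.equal(pretrained.weight, fresh.weight))  # independent random inits
```

Under this reading, the differing weights and the unexpected bias are both fresh random initializations, and the original values would have to be copied in explicitly after the layer is replaced.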

