Comments (3)
Thanks for sharing this. It might have something to do with how you load the checkpoint. Can you provide a minimal example where this happens?
@edwardjhu, I am first loading the model as:
model.load_state_dict(torch.load("/home/sidra/Documents/Domain_Apatation/UDAS/src/checkpoints/base_model_mms_2023-07-06_12-45-28_PM/dc_model.pth"), strict=False)
Below is the structure of a part of loaded model:
UNet2D(
  (init_path): Sequential(
    (0): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): ReLU()
    (2): ResBlock(
      (conv_path): Sequential(
        (0): PreActivationND(
          (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activation): ReLU()
          (layer): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        )
        (1): PreActivationND(
          (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activation): ReLU()
          (layer): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        )
      )
    )
  )
)
After loading the model, I am replacing the Conv2d instances in nn.Sequential and in ResBlock as:
# Replacing Conv layers with LoRA layers
for name, sub_module in model.named_children():
    for name, layer in list(sub_module.named_children()):
        # Conv2d
        if isinstance(layer, nn.Conv2d):
            setattr(sub_module, name, lora.Conv2d(
                layer.in_channels,
                layer.out_channels,
                kernel_size=layer.kernel_size[0],
                r=2,
                lora_alpha=2))
        # ResBlock
        elif isinstance(sub_module, nn.Sequential):
            for name, layer in list(sub_module.named_children()):
                if isinstance(layer, ResBlock):
                    for i, preactivation_module in enumerate(layer.conv_path):
                        if isinstance(preactivation_module, PreActivationND) and isinstance(preactivation_module.layer, nn.Conv2d):
                            setattr(preactivation_module, 'layer', lora.Conv2d(
                                preactivation_module.layer.in_channels,
                                preactivation_module.layer.out_channels,
                                kernel_size=preactivation_module.layer.kernel_size[0],
                                r=2,
                                lora_alpha=2))
The updated model structure looks like this:
UNet2D(
  (init_path): Sequential(
    (0): Conv2d(
      (conv): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1))
    )
    (1): ReLU()
    (2): ResBlock(
      (conv_path): Sequential(
        (0): PreActivationND(
          (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activation): ReLU()
          (layer): Conv2d(
            (conv): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1))
          )
        )
        (1): PreActivationND(
          (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activation): ReLU()
          (layer): Conv2d(
            (conv): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1))
          )
        )
      )
    )
  )
)
Then I checked whether the LoRA matrices had been injected correctly by printing the parameter names:
for name, param in model.named_parameters():
    print(name)
init_path.0.lora_A
init_path.0.lora_B
init_path.0.conv.weight
init_path.0.conv.bias
init_path.2.conv_path.0.bn.weight
init_path.2.conv_path.0.bn.bias
init_path.2.conv_path.0.layer.lora_A
init_path.2.conv_path.0.layer.lora_B
init_path.2.conv_path.0.layer.conv.weight
init_path.2.conv_path.0.layer.conv.bias
init_path.2.conv_path.1.bn.weight
init_path.2.conv_path.1.bn.bias
init_path.2.conv_path.1.layer.lora_A
init_path.2.conv_path.1.layer.lora_B
init_path.2.conv_path.1.layer.conv.weight
init_path.2.conv_path.1.layer.conv.bias
This shows that the LoRA layers have been added correctly. But when I compare the conv weights of the pre-trained model with those of the model after injecting the LoRA layers, they are not the same:
# Pre-trained model
model.init_path[0].weight[0][0]
tensor([[ 0.2988, 0.2760, -0.0493],
[ 0.3431, -0.0962, 0.0716],
[-0.1536, 0.1956, 0.2885]], grad_fn=<SelectBackward0>)
# with LoRa
model.init_path[0].conv.weight[0][0]
tensor([[ 0.1168, 0.0223, -0.1227],
[-0.2735, -0.2281, -0.2859],
[ 0.2369, -0.1391, -0.0499]])
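One possible reading of the mismatch above, offered as a sketch rather than a confirmed diagnosis: the replacement loop constructs brand-new layers, and a newly constructed `nn.Conv2d` is randomly initialized, so nothing carries the pre-trained kernel over unless it is copied explicitly. The `WrappedConv` class below is a hypothetical stand-in that only mimics the printed structure (an outer module holding an inner `conv`); it is not the `lora.Conv2d` implementation.

```python
import torch
import torch.nn as nn


class WrappedConv(nn.Module):
    """Hypothetical stand-in for a wrapper that holds an inner `conv` (not loralib code)."""

    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__()
        # A new nn.Conv2d gets fresh random weights at construction time.
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, **kwargs)


# Stands in for the pre-trained first conv layer of the model.
pretrained = nn.Conv2d(1, 16, kernel_size=3, padding=1, bias=False)

# Building the replacement does not transfer any weights by itself.
replacement = WrappedConv(1, 16, kernel_size=3, padding=1, bias=False)
weights_match_before = torch.equal(replacement.conv.weight, pretrained.weight)

# Copy the pre-trained kernel into the wrapper's inner conv.
with torch.no_grad():
    replacement.conv.weight.copy_(pretrained.weight)
weights_match_after = torch.equal(replacement.conv.weight, pretrained.weight)

print(weights_match_before, weights_match_after)
```

If this is what happens here, copying each old layer's weight into the new layer at replacement time (or loading the checkpoint after the swap, with keys that match the new parameter names) would keep the pre-trained values.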
Moreover, in my original Conv2d, bias is set to False, but when I check model.init_path[0].conv.bias it gives:
Parameter containing:
tensor([-0.1540, 0.0532, -0.0386, -0.0889, -0.1558, 0.0867, -0.2746, 0.3279,
-0.0516, 0.0622, 0.1098, -0.1297, 0.2631, -0.0025, 0.0273, -0.3173],
requires_grad=True)
requires_grad is also True, but the pre-trained conv layer had bias=False, so where are these values coming from?
Can you please give your feedback on this?
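A likely source of the unexpected bias, sketched here with plain `torch.nn` only: `nn.Conv2d` defaults to `bias=True` and `padding=0`, so a layer rebuilt from just the channel counts and kernel size gets a new, randomly initialized bias and loses the original padding. This matches the updated structure printed above, where `padding=(1, 1)` and `bias=False` no longer appear. The helper `conv_kwargs` is a hypothetical name introduced for illustration.

```python
import torch.nn as nn


def conv_kwargs(layer: nn.Conv2d) -> dict:
    """Collect the constructor arguments needed to rebuild `layer` with the same settings."""
    return dict(
        in_channels=layer.in_channels,
        out_channels=layer.out_channels,
        kernel_size=layer.kernel_size,
        stride=layer.stride,
        padding=layer.padding,
        dilation=layer.dilation,
        groups=layer.groups,
        bias=layer.bias is not None,  # False when the original has no bias
    )


original = nn.Conv2d(1, 16, kernel_size=3, padding=1, bias=False)

# Rebuilt the way the replacement loop does it: only channels and kernel size.
default_copy = nn.Conv2d(
    original.in_channels,
    original.out_channels,
    kernel_size=original.kernel_size[0],
)

# Rebuilt with every setting forwarded from the original layer.
faithful_copy = nn.Conv2d(**conv_kwargs(original))

print(default_copy.bias is not None, default_copy.padding)
print(faithful_copy.bias is None, faithful_copy.padding)
```

Whether `lora.Conv2d` accepts `padding` and `bias` as extra keyword arguments and forwards them to its inner conv is an assumption to verify against the installed library version; if it does, passing `padding=layer.padding` and `bias=layer.bias is not None` in the replacement loop would keep those settings.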