Class ConvLoRA currently only works for Conv2d. By in

Conv1d and Conv3d are not working (OPEN)

gau-nernst commented on June 4, 2024

Conv1d and Conv3d are not working

Comments (2)

edwardjhu commented on June 4, 2024

Good catch! I had Conv2d in mind when I first wrote it.

Looks like we just need to instantiate lora_A and lora_B differently depending on the kind of convolution.

Happy to review and merge it if someone wants to implement and test it. Otherwise, I'll do it in the near future.

gau-nernst commented on June 4, 2024

I have an idea, but it would change how the current Conv2d LoRA works. We can treat convolution as a matmul whose input is a flattened "window". For example, for Conv2d, the input is a window with (kernel_size, kernel_size) spatial dimensions, and the flattened input dim is in_channels * kernel_size * kernel_size. This extends naturally to Conv1d and Conv3d:

Conv1d: B @ A = (out_channels, rank) @ (rank, in_channels * kernel_size) = (out_channels, in_channels * kernel_size)
Conv2d: B @ A = (out_channels, rank) @ (rank, in_channels * kernel_size * kernel_size) = (out_channels, in_channels * kernel_size * kernel_size)
Conv3d: B @ A = (out_channels, rank) @ (rank, in_channels * kernel_size * kernel_size * kernel_size) = (out_channels, in_channels * kernel_size * kernel_size * kernel_size)
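
The shape rule above can be written once for any number of spatial dimensions. The following is a minimal sketch; the helper name `lora_factor_shapes` is hypothetical and not part of loralib. It takes the kernel size as a tuple, so non-square kernels are covered too.

```python
from math import prod


def lora_factor_shapes(in_channels, out_channels, kernel_size, rank):
    """Shapes of B, A and B @ A in the flattened-window view.

    kernel_size is a tuple with one entry per spatial dimension,
    e.g. (3,) for Conv1d, (3, 5) for Conv2d, (3, 3, 3) for Conv3d.
    """
    flat_in = in_channels * prod(kernel_size)  # flattened input window
    shape_b = (out_channels, rank)
    shape_a = (rank, flat_in)
    shape_ba = (out_channels, flat_in)
    return shape_b, shape_a, shape_ba


# Conv1d, Conv2d (non-square kernel) and Conv3d with the same channel counts
for ks in [(3,), (3, 5), (3, 3, 3)]:
    print(len(ks), "spatial dims:", lora_factor_shapes(8, 16, ks, rank=4))
```

The product `B @ A` can then be reshaped to `(out_channels, in_channels, *kernel_size)`, which is the layout of an ordinary (groups = 1) convolution weight.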

There are two benefits to the above implementation: (1) the kernel size doesn't need to be the same in all spatial dimensions, and (2) we can use convolution in the LoRA branch in the forward pass instead of merging weights, similar to the Linear implementation (relevant issue: #54). The first convolution (with lora_A) is a normal convolution with the same kernel size as the original layer, while the second convolution (with lora_B) is a point-wise (aka 1x1) convolution. I haven't tested it, but from what I understand, it should work.
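
A quick numerical check of the two-convolution idea for groups = 1 could look like the sketch below. It assumes PyTorch and stores lora_A and lora_B directly in convolution-weight layout, which is an illustrative choice and not how loralib stores them. It compares the LoRA branch (normal convolution followed by a point-wise convolution) against a single convolution with the merged weight `B @ A`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
in_channels, out_channels, rank = 4, 6, 2
kernel_size = (3, 5)  # deliberately non-square
dt = torch.float64

# lora_A in conv-weight layout: (rank, in_channels, kH, kW)
lora_A = torch.randn(rank, in_channels, *kernel_size, dtype=dt)
# lora_B as a point-wise (1x1) conv weight: (out_channels, rank, 1, 1)
lora_B = torch.randn(out_channels, rank, 1, 1, dtype=dt)

x = torch.randn(2, in_channels, 10, 12, dtype=dt)

# LoRA branch as two convolutions: normal conv, then 1x1 conv
branch = F.conv2d(F.conv2d(x, lora_A, padding=1), lora_B)

# Merged weight: B @ A on the flattened window, reshaped to conv layout
delta_w = (lora_B.flatten(1) @ lora_A.flatten(1)).reshape(
    out_channels, in_channels, *kernel_size
)
merged = F.conv2d(x, delta_w, padding=1)

max_abs_diff = (branch - merged).abs().max().item()
print("max abs difference:", max_abs_diff)
```

Stride and padding belong to the first convolution only; the point-wise convolution just mixes the `rank` channels.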

The situation becomes slightly more complicated when grouped convolution is involved (groups > 1). I'm thinking of accounting for groups in the input channels of lora_A (so lora_A becomes (rank, in_channels / groups * kernel_size * kernel_size)). We can still implement the forward pass of the LoRA branch as two convolutions with lora_A and lora_B, where we use grouped convolution for lora_A, similar to the original convolution branch. A problem might arise when we try to merge weights, though. Due to how grouped convolution works, I think the merged weights might not be lora_B @ lora_A (I will need to test this). If that's the case, we need to use a different calculation to merge weights.
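
The merge question for groups > 1 can be probed numerically. The sketch below assumes PyTorch and a rank divisible by groups. Case 1 follows the description above (grouped lora_A, ordinary point-wise lora_B) and compares the branch against a grouped convolution with the plain `lora_B @ lora_A` weight. Case 2 is an additional assumption not taken from this thread: lora_B is grouped as well, and the weights are merged per group with a batched matmul.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
in_channels, out_channels, rank, groups, k = 4, 6, 2, 2, 3
dt = torch.float64

x = torch.randn(2, in_channels, 8, 8, dtype=dt)
# lora_A in grouped-conv layout: (rank, in_channels // groups, k, k)
lora_A = torch.randn(rank, in_channels // groups, k, k, dtype=dt)

# Case 1: lora_B is an ordinary point-wise conv, (out_channels, rank, 1, 1)
lora_B = torch.randn(out_channels, rank, 1, 1, dtype=dt)
branch = F.conv2d(F.conv2d(x, lora_A, groups=groups), lora_B)
naive_w = (lora_B.flatten(1) @ lora_A.flatten(1)).reshape(
    out_channels, in_channels // groups, k, k
)
naive = F.conv2d(x, naive_w, groups=groups)
naive_matches = torch.allclose(branch, naive)

# Case 2 (assumption): lora_B is grouped too, (out_channels, rank // groups, 1, 1)
lora_B_g = torch.randn(out_channels, rank // groups, 1, 1, dtype=dt)
branch_g = F.conv2d(
    F.conv2d(x, lora_A, groups=groups), lora_B_g, groups=groups
)
# Merge group by group: one (out/g, r/g) @ (r/g, in/g * k * k) product per group
A_g = lora_A.reshape(groups, rank // groups, -1)
B_g = lora_B_g.reshape(groups, out_channels // groups, rank // groups)
block_w = torch.bmm(B_g, A_g).reshape(out_channels, in_channels // groups, k, k)
block = F.conv2d(x, block_w, groups=groups)
block_matches = torch.allclose(branch_g, block)

print("plain B @ A matches branch:", naive_matches)
print("per-group merge matches branch:", block_matches)
```

In Case 1 every output channel mixes rank channels computed from different input groups, so the branch corresponds to a dense kernel that does not fit the grouped weight layout. Grouping lora_B as well keeps the update inside each group, at the cost of a per-group rank of rank / groups.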

Another way of handling groups > 1 is to follow your current implementation, which puts groups in the output of lora_B (out_channels / groups, rank). This would sacrifice the ability to use convolution for the forward pass in the LoRA branch, but it would keep the ability to merge weights with a simple lora_B @ lora_A.

Let me know what you think @edwardjhu. Thank you!
