🐛 Describe the bug ./ip_adapter_weight.

Why does the size of state_dict decrease a lot when I move the parameters to the CPU? What information was lost? (pytorch, closed)

JWargrave commented on September 28, 2024

Why does the size of state_dict decrease a lot when I move the parameters to the CPU? What information was lost?

from pytorch.

Comments (6)

tringwald commented on September 28, 2024

It's not metadata, your saved tensors seem to be views into a bigger tensor.

import torch
x = torch.ones(1000**3)
y = x[:5]  # y is a view of the first 5 entries of x

Calling torch.save(y, ...) will still save the x tensor.
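
To see this effect without writing files, here is a minimal sketch that serializes into an in-memory buffer and compares sizes. The helper name `serialized_size` and the smaller tensor size are illustrative assumptions, not part of the original thread.

```python
import io

import torch


def serialized_size(obj) -> int:
    """Hypothetical helper: number of bytes torch.save writes for obj."""
    buffer = io.BytesIO()
    torch.save(obj, buffer)
    return buffer.getbuffer().nbytes


base = torch.ones(1_000_000)  # float32, so 4,000,000 bytes of storage
view = base[:5]               # 5 elements, but it shares base's storage

base_size = serialized_size(base)
view_size = serialized_size(view)
print(f"base: {base_size} bytes, view: {view_size} bytes")
```

Both numbers should come out at a little over 4 MB, because the 5-element view drags the whole underlying storage into the file.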

tringwald commented on September 28, 2024

Are the tensors you are saving views of a bigger tensor? In that case, torch will save the whole tensor. You can avoid that by calling .clone() on all tensor views before saving.
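
A rough sketch of that workaround applied to a whole state dict follows. The helper names `serialized_size` and `clone_state_dict` and the toy tensors are assumptions for illustration; `.detach()` is added here only so the copies carry no autograd history, which the thread does not discuss.

```python
import io

import torch


def serialized_size(obj) -> int:
    """Hypothetical helper: number of bytes torch.save writes for obj."""
    buffer = io.BytesIO()
    torch.save(obj, buffer)
    return buffer.getbuffer().nbytes


def clone_state_dict(state_dict):
    """Hypothetical helper: copy every tensor into its own compact storage."""
    return {k: v.detach().clone() for k, v in state_dict.items()}


# Toy stand-in for a checkpoint whose entries are views of one big buffer.
flat = torch.arange(1_000_000, dtype=torch.float32)
state_dict = {"a": flat[:10], "b": flat[10:30].view(4, 5)}

compact = clone_state_dict(state_dict)
before = serialized_size(state_dict)
after = serialized_size(compact)
print(f"before clone: {before} bytes, after clone: {after} bytes")
```

The cloned dictionary holds the same values, but each tensor now owns only the memory it needs, so the saved size drops accordingly.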

JWargrave commented on September 28, 2024

Are the tensors you are saving views of a bigger tensor? In that case, torch will save the whole tensor. You can avoid that by calling .clone() on all tensor views before saving.

Yes. state_dict_save_during_training contains 64 parameters and I have tried to save every parameter to a bin file with the code below:

for k, v in state_dict_save_during_training.items():
    torch.save(v, f'./gpu_{k}.bin')
    torch.save(v.cpu(), f'./cpu_{k}.bin')

Every gpu_{k}.bin is 1.6G, while each cpu_{k}.bin is much smaller. Is there any difference between using gpu_{k}.bin and cpu_{k}.bin for testing and subsequent training?

tringwald commented on September 28, 2024

There should not be any difference in the tensor's data, you'll just discard some metadata (like the tensor's CUDA device) by using cpu_{k}.bin. As I said above, you can also try to call .clone() instead of .cpu() and see if that fixes the problem.
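
One way to convince yourself that only the storage changes, not the values, is a small CPU-only check like the sketch below (variable names are illustrative). It also shows why `.cpu()` most likely shrank the files in this thread: moving a CUDA tensor to the CPU has to copy it, and the copy holds only the view's elements, whereas `.cpu()` on a tensor that is already on the CPU makes no copy at all.

```python
import torch

base = torch.arange(1_000, dtype=torch.float32)
view = base[:5]      # shares memory with base
copy = view.clone()  # independent 5-element tensor

# The clone holds exactly the same values as the view.
same_values = torch.equal(view, copy)

# The view starts at base's first element; the clone lives elsewhere.
view_shares_memory = view.data_ptr() == base.data_ptr()
clone_is_separate = copy.data_ptr() != base.data_ptr()

# On a tensor that is already on the CPU, .cpu() does not copy,
# so it would not compact the storage the way .clone() does.
cpu_is_noop = view.cpu().data_ptr() == view.data_ptr()

print(same_values, view_shares_memory, clone_is_separate, cpu_is_noop)
```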

JWargrave commented on September 28, 2024

There should not be any difference in the tensor's data, you'll just discard some metadata (like the tensor's CUDA device) by using cpu_{k}.bin. As I said above, you can also try to call .clone() instead of .cpu() and see if that fixes the problem.

I tried .clone()

import torch
state_dict_save_during_training = torch.load('./ip_adapter_weight.bin')  # load the weights saved during training
state_dict_to_cpu = {k: v.clone() for k, v in state_dict_save_during_training.items()}  # clone each tensor into its own storage
torch.save(state_dict_to_cpu, './temp_state_dict_cloned.bin')  # save the cloned weights

The resulting temp_state_dict_cloned.bin is 98M. Thanks! But I am still surprised that metadata is so large!
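
To check where the extra bytes come from, one option is to compare how many bytes each tensor actually uses with the size of the storage behind it. This is only a sketch: `storage_report` is a hypothetical helper, the toy state dict stands in for the real checkpoint, and `untyped_storage()` assumes PyTorch 2.0 or newer.

```python
import torch


def storage_report(state_dict):
    """Hypothetical helper: map each key to
    (bytes used by the tensor, bytes in its backing storage)."""
    return {
        k: (v.numel() * v.element_size(), v.untyped_storage().nbytes())
        for k, v in state_dict.items()
    }


# Toy stand-in: two small parameters carved out of one large buffer.
flat = torch.zeros(1_000_000)  # float32, 4,000,000 bytes
state_dict = {"w": flat[:100].view(10, 10), "b": flat[100:110]}

report = storage_report(state_dict)
for name, (used, backing) in report.items():
    print(f"{name}: tensor={used} B, storage={backing} B")
```

A large gap between the two numbers for a given key suggests that the tensor is a view into a bigger buffer, and that the file size comes from that buffer rather than from metadata.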

JWargrave commented on September 28, 2024

It's not metadata, your saved tensors seem to be views into a bigger tensor.
x = torch.ones(1000**3)
y = x[:5]  # y is a view of the first 5 entries of x
Calling torch.save(y, ...) will still save the x tensor.

Thank you very much, I understand!
