Why does the size of state_dict decrease a lot when I move the parameters to the CPU? What information was lost?

JWargrave commented on September 28, 2024
Why does the size of state_dict decrease a lot when I move the parameters to the CPU? What information was lost?

from pytorch.

Comments (6)

tringwald commented on September 28, 2024

It's not metadata, your saved tensors seem to be views into a bigger tensor.

x = torch.ones(1000**3)
y = x[:5]  # y is a view of the first 5 entries of x

Calling torch.save(y, ...) will still save all of x's underlying storage, not just the 5 elements that y exposes.
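
The effect can be checked without writing files. The following is a minimal sketch, not taken from the thread: the helper name serialized_size and the tensor size are illustrative assumptions, and the data is saved into an in-memory buffer so the byte counts can be compared directly.

```python
import io

import torch


def serialized_size(obj) -> int:
    """Return how many bytes torch.save writes for obj (hypothetical helper)."""
    buffer = io.BytesIO()
    torch.save(obj, buffer)
    return buffer.getbuffer().nbytes


x = torch.ones(1_000_000)  # about 4 MB of float32 data
y = x[:5]                  # view: shares x's storage, no copy is made

# The view starts at the same memory address as x.
shares_memory = y.data_ptr() == x.data_ptr()

view_size = serialized_size(y)           # includes all of x's storage
clone_size = serialized_size(y.clone())  # only the 5 cloned elements

print(f"view: {view_size} bytes, clone: {clone_size} bytes")
```

Here view_size comes out at roughly the size of x, while clone_size is tiny, because the clone owns a fresh storage holding only its own elements.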


tringwald commented on September 28, 2024

Are the tensors you are saving views of a bigger tensor? In that case, torch will save the whole tensor. You can avoid that by calling .clone() on all tensor views before saving.
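
One way to apply this to a whole state_dict is sketched below. It is an illustration under stated assumptions, not an official recipe: compact_state_dict is a hypothetical helper, every value is assumed to be a tensor, and Tensor.untyped_storage() is assumed to be available (PyTorch 2.0 or later).

```python
import torch


def compact_state_dict(state_dict):
    """Clone every tensor whose storage is larger than the data it exposes.

    Hypothetical helper: assumes all values in state_dict are tensors.
    """
    compacted = {}
    for name, tensor in state_dict.items():
        visible_bytes = tensor.numel() * tensor.element_size()
        storage_bytes = tensor.untyped_storage().nbytes()
        if storage_bytes > visible_bytes:
            # The tensor is a view into a bigger buffer: copy just its elements.
            compacted[name] = tensor.clone()
        else:
            compacted[name] = tensor
    return compacted


big = torch.arange(1000, dtype=torch.float32)
example = {"view": big[:10], "owned": torch.zeros(10)}
compacted = compact_state_dict(example)
```

Tensors that already own their storage are passed through untouched, so only the views pay for a copy.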


JWargrave commented on September 28, 2024

Are the tensors you are saving views of a bigger tensor? In that case, torch will save the whole tensor. You can avoid that by calling .clone() on all tensor views before saving.

Yes. state_dict_save_during_training contains 64 parameters and I have tried to save every parameter to a bin file with the code below:

for k, v in state_dict_save_during_training.items():
    torch.save(v, f'./gpu_{k}.bin')
    torch.save(v.cpu(), f'./cpu_{k}.bin')

Every gpu_{k}.bin is 1.6G, while each cpu_{k}.bin is much smaller. Is there any difference between using gpu_{k}.bin and cpu_{k}.bin for testing and subsequent training?


tringwald commented on September 28, 2024

There should not be any difference in the tensor's data, you'll just discard some metadata (like the tensor's CUDA device) by using cpu_{k}.bin. As I said above, you can also try to call .clone() instead of .cpu() and see if that fixes the problem.
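
The claim that the values are unchanged can be checked on a CPU-only machine. This is a rough sketch with a hypothetical round_trip helper and made-up tensor sizes. It uses .clone() rather than .cpu(), because .cpu() on a tensor that is already on the CPU returns the same tensor without copying.

```python
import io

import torch


def round_trip(tensor):
    """Save a tensor to an in-memory buffer and load it back (hypothetical helper)."""
    buffer = io.BytesIO()
    torch.save(tensor, buffer)
    buffer.seek(0)
    return torch.load(buffer)


base = torch.arange(100_000, dtype=torch.float32)
view = base[:5]  # view into the larger tensor

from_view = round_trip(view)           # saved together with base's storage
from_clone = round_trip(view.clone())  # saved with a compact storage

# Both loads yield the same shape and the same values.
same_values = torch.equal(from_view, from_clone)
print(same_values)
```

Only the file size differs between the two saves; the loaded tensors compare equal element by element.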


JWargrave commented on September 28, 2024

There should not be any difference in the tensor's data, you'll just discard some metadata (like the tensor's CUDA device) by using cpu_{k}.bin. As I said above, you can also try to call .clone() instead of .cpu() and see if that fixes the problem.

I tried .clone()

import torch
state_dict_save_during_training = torch.load('./ip_adapter_weight.bin')  # load the weights saved during training
state_dict_to_cpu = {k: v.clone() for k, v in state_dict_save_during_training.items()}  # clone each tensor so it owns its storage
torch.save(state_dict_to_cpu, './temp_state_dict_cloned.bin')  # save the cloned weights

The resulting temp_state_dict_cloned.bin is 98M. Thanks! But I am still surprised that metadata is so large!


JWargrave commented on September 28, 2024

It's not metadata, your saved tensors seem to be views into a bigger tensor.

x = torch.ones(1000**3)
y = x[:5]  # y is a view of the first 5 entries of x

Calling torch.save(y, ...) will still save all of x's underlying storage, not just the 5 elements that y exposes.

Thank you very much, I understand!

