
Comments (2)

sarahooker commented on April 28, 2024

Hi Ian,

Wonderful to hear you enjoyed our work! Thanks for these comments; I've put together some thoughts below. I'll tag an owner of this shared research repo to close this issue, but feel free to move this to email if you have additional questions (the author email address for correspondence is listed in our paper).

  1. Lottery ticket experiments using one-shot sparsification instead of iterative pruning

I agree, it would be fun to evaluate whether the lottery ticket results hold on these large-scale tasks with “one-shot” sparsification. In fact, one of the variants in The Lottery Ticket Hypothesis evaluates whether lottery tickets occur in both one-shot and iteratively pruned networks.

However, for both one-shot and iteratively pruned networks, the authors compare 1) the performance of the sparse substructure trained from scratch (with the same weights as the initial random initialization) to 2) the performance of the original network.

The variant you propose appears to be quite different: you are comparing the performance of the sparse substructure trained from scratch (with the same weights as the initial random initialization) to the one-shot pruned structure at the end of training.
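To make the two comparisons concrete, here is a minimal, self-contained sketch. It uses a toy logistic-regression model as a stand-in for the real networks; `train`, `accuracy`, and `magnitude_mask` are illustrative helpers, not code from this repo.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(512, 32))                    # toy inputs
y = (X @ rng.normal(size=32) > 0).astype(float)   # toy labels

def train(w, mask, steps=300, lr=0.1):
    """Logistic-regression stand-in for 'train the network'; a fixed
    binary mask zeroes the pruned weights after every update."""
    w = w * mask
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w = (w - lr * X.T @ (p - y) / len(y)) * mask
    return w

def accuracy(w):
    return np.mean(((X @ w) > 0) == y)

def magnitude_mask(w, sparsity):
    """One-shot pruning: keep the largest-magnitude (1 - sparsity) weights."""
    threshold = np.quantile(np.abs(w), sparsity)
    return (np.abs(w) >= threshold).astype(w.dtype)

init_w = rng.normal(size=32) * 0.1
dense_w = train(init_w, np.ones_like(init_w))   # original dense model
mask = magnitude_mask(dense_w, sparsity=0.9)

# Lottery-ticket comparison: the masked subnetwork retrained from the
# same random initialization vs. the full dense network.
ticket_w = train(init_w, mask)
print("ticket:", accuracy(ticket_w), " dense baseline:", accuracy(dense_w))

# Proposed variant: the same retrained subnetwork vs. the one-shot pruned
# model (dense weights masked at the end of training, no retraining).
print("ticket:", accuracy(ticket_w), " one-shot pruned:", accuracy(dense_w * mask))
```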

Since both variants would likely perform substantially worse than the original model, it is unclear what information we gain here. That is, you won't be able to tell whether the ability to match accuracy when re-training is a product of your hypothesis or simply a consequence of the accuracy to match being worse (we suspect it is the latter). It's an interesting question, but I don't see a way to clearly disentangle the answer. Still, it is easy to run this variant, and perhaps the results will surprise. :) You can simply run the magnitude pruning once at the end of training for the desired fraction of sparsity (I believe by setting begin_pruning_step and end_pruning_step both equal to one step before the last step of training).

  2. Knowledge reconstitution

Hmmm, this I know less about. I believe Erich Elsen, one of my co-authors, worked on a project related to this idea called dense-sparse-dense.

Hope these answers are somewhat helpful. Thanks again Ian for taking the time to put together these thoughts.


ekelsen commented on April 28, 2024

I think Sara meant to say that you should set begin_pruning_step = final_step - 1 and end_pruning_step = final_step to mimic zero-shot pruning. You'll also need to set the threshold_decay parameter to 0; otherwise the threshold won't immediately jump to the value needed to reach the sparsity level you want.
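Putting that correction into code, here is a sketch of the hyperparameter setup, assuming the tf.contrib.model_pruning API; FINAL_STEP and target_sparsity=0.9 are illustrative values, not values from the paper.

```python
import tensorflow as tf
from tensorflow.contrib.model_pruning.python import pruning

FINAL_STEP = 100000  # assumed: the last global step of your training run

# Prune in a single window at the very end of training. threshold_decay=0
# lets the pruning threshold jump immediately to the value needed to hit
# the target sparsity; pruning_frequency=1 ensures the one-step window
# actually triggers a mask update.
hparams = pruning.get_pruning_hparams().parse(
    "begin_pruning_step={},end_pruning_step={},"
    "threshold_decay=0.0,pruning_frequency=1,"
    "target_sparsity=0.9".format(FINAL_STEP - 1, FINAL_STEP))

global_step = tf.train.get_or_create_global_step()
p = pruning.Pruning(hparams, global_step=global_step)
mask_update_op = p.conditional_mask_update_op()
# Run mask_update_op alongside the train op; the masks only take effect on
# weights that were wrapped with pruning.apply_mask(...) when building the model.
```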

Based on previous experience I've had with zero-shot pruning (see, for example, the last line of Table 4 in https://arxiv.org/pdf/1704.05119.pdf, where the error rate more than doubles at 90% pruning), I would guess that zero-shot pruning will actually lead to worse accuracies than the random fixed sparsity patterns trained from scratch. If you try this, I would love to know the results.

