
Comments (2)

sarahooker commented on April 28, 2024

Hi Ian,

Wonderful to hear you enjoyed our work! Thanks for these comments; I've put together some thoughts below. I'll tag an owner of this shared research repo to close this issue, but feel free to move this to email if you have additional questions (the author email address for correspondence is listed in our paper).

  1. Lottery ticket experiments using one-shot sparsification instead of iterative pruning

I agree, it would be fun to evaluate whether the lottery ticket results hold on these large-scale tasks with “one-shot” sparsification. In fact, one of the variants in The Lottery Ticket Hypothesis evaluates whether lottery tickets occur in both one-shot and iteratively pruned networks.

However, for both one-shot and iteratively pruned networks, the authors compare 1) the performance of the sparse substructure trained from scratch (with the same weights as the initial random initialization) to 2) the performance of the original network.

The variant you propose appears to be quite different: you are comparing the performance of the sparse substructure trained from scratch (with the same weights as the initial random initialization) to the one-shot pruned structure at the end of training.
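To make the two comparisons concrete, here is a minimal, self-contained sketch. It uses a toy logistic-regression model as a stand-in for the real networks; `train`, `accuracy`, and `magnitude_mask` are illustrative helpers, not code from this repo.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(512, 32))                    # toy inputs
y = (X @ rng.normal(size=32) > 0).astype(float)   # toy labels

def train(w, mask, steps=300, lr=0.1):
    """Logistic-regression stand-in for 'train the network'; a fixed
    binary mask zeroes the pruned weights after every update."""
    w = w * mask
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w = (w - lr * X.T @ (p - y) / len(y)) * mask
    return w

def accuracy(w):
    return np.mean(((X @ w) > 0) == y)

def magnitude_mask(w, sparsity):
    """One-shot pruning: keep the largest-magnitude (1 - sparsity) weights."""
    threshold = np.quantile(np.abs(w), sparsity)
    return (np.abs(w) >= threshold).astype(w.dtype)

init_w = rng.normal(size=32) * 0.1
dense_w = train(init_w, np.ones_like(init_w))   # original dense model
mask = magnitude_mask(dense_w, sparsity=0.9)

# Lottery-ticket comparison: the masked subnetwork retrained from the
# same random initialization vs. the full dense network.
ticket_w = train(init_w, mask)
print("ticket:", accuracy(ticket_w), " dense baseline:", accuracy(dense_w))

# Proposed variant: the same retrained subnetwork vs. the one-shot pruned
# model (dense weights masked at the end of training, no retraining).
print("ticket:", accuracy(ticket_w), " one-shot pruned:", accuracy(dense_w * mask))
```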

Since both variants would likely perform substantially worse than the original model, it is unclear what information we gain here. That is, you won't be able to tell whether the ability to match accuracy when re-training is a product of your hypothesis or simply a consequence of the accuracy to match being worse (we suspect it is the latter). It's an interesting question, but I don't see a way to clearly disentangle the answer. Still, it is easy to run this variant, and perhaps the results will surprise. :) You can simply run the magnitude pruning once at the end of training for the desired fraction of sparsity (I believe by setting begin_pruning_step and end_pruning_step both equal to one step before the last step of training).

  2. Knowledge reconstitution

Hmmm, this I know less about. I believe Erich Elsen, one of my co-authors, worked on a project related to this idea called dense-sparse-dense.

Hope these answers are somewhat helpful. Thanks again Ian for taking the time to put together these thoughts.


ekelsen commented on April 28, 2024

I think Sara meant to say that you should set begin_pruning_step = final_step - 1 and end_pruning_step = final_step to mimic zero-shot pruning. You'll also need to set the threshold_decay parameter to 0; otherwise the threshold won't immediately jump to the value needed to reach the sparsity level you want.
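Putting that correction into code, here is a sketch of the hyperparameter setup, assuming the tf.contrib.model_pruning API; FINAL_STEP and target_sparsity=0.9 are illustrative values, not values from the paper.

```python
import tensorflow as tf
from tensorflow.contrib.model_pruning.python import pruning

FINAL_STEP = 100000  # assumed: the last global step of your training run

# Prune in a single window at the very end of training. threshold_decay=0
# lets the pruning threshold jump immediately to the value needed to hit
# the target sparsity; pruning_frequency=1 ensures the one-step window
# actually triggers a mask update.
hparams = pruning.get_pruning_hparams().parse(
    "begin_pruning_step={},end_pruning_step={},"
    "threshold_decay=0.0,pruning_frequency=1,"
    "target_sparsity=0.9".format(FINAL_STEP - 1, FINAL_STEP))

global_step = tf.train.get_or_create_global_step()
p = pruning.Pruning(hparams, global_step=global_step)
mask_update_op = p.conditional_mask_update_op()
# Run mask_update_op alongside the train op; the masks only take effect on
# weights that were wrapped with pruning.apply_mask(...) when building the model.
```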

Based on previous experience I've had with zero-shot pruning (see, for example, the last line of Table 4 in https://arxiv.org/pdf/1704.05119.pdf, where the error rate more than doubles at 90% pruning), I would guess that zero-shot pruning will actually lead to worse accuracies than the random fixed sparsity patterns trained from scratch. If you try this, I would love to know the results.

