Comments (12)

david-berthelot commented on July 17, 2024
  1. Interleaving simply forms batches whose items come from both the labeled and the unlabeled batches. Since we only update batch norm for the first batch, it's important that this batch is representative of the whole data.
  2. BN is only updated for the first batch.
  3. post_ops contains only the operations to perform after the gradient update: update batch norm, apply weight decay, and update the moving-average weights.
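A minimal NumPy sketch of the interleaving idea in point 1, assuming k+1 equally sized batches with the labeled batch first. Each batch is cut into k+1 contiguous chunks, and chunk i of batch 0 is swapped with chunk i of batch i, so the first batch ends up holding a slice of every source. The function names are illustrative and this is not copied from the mixmatch code.

```python
import numpy as np


def interleave_offsets(batch, nu):
    """Split `batch` rows into nu + 1 nearly equal contiguous groups."""
    groups = [batch // (nu + 1)] * (nu + 1)
    # Hand any remainder rows to the last groups.
    for x in range(batch - sum(groups)):
        groups[-x - 1] += 1
    offsets = [0]
    for g in groups:
        offsets.append(offsets[-1] + g)
    return offsets


def interleave(xy, batch):
    """Swap chunk i of batch 0 with chunk i of batch i, for i = 1..nu."""
    nu = len(xy) - 1
    offsets = interleave_offsets(batch, nu)
    chunks = [[v[offsets[p]:offsets[p + 1]] for p in range(nu + 1)] for v in xy]
    for i in range(1, nu + 1):
        chunks[0][i], chunks[i][i] = chunks[i][i], chunks[0][i]
    return [np.concatenate(c, axis=0) for c in chunks]


# One labeled batch (0s) and two unlabeled batches (1s and 2s) of 6 items each.
labeled, u1, u2 = np.zeros(6), np.ones(6), np.full(6, 2.0)
mixed = interleave([labeled, u1, u2], 6)
print(mixed[0])  # the first batch now mixes items from all three sources
```

Because the swap is its own inverse, calling interleave a second time on the per-batch outputs puts every row back with the batch it came from.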

from mixmatch.

david-berthelot commented on July 17, 2024

I am not sure; typically, in fully supervised learning, one runs only one batch per step and thus updates batch norm only once.
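A toy sketch of "update batch norm only once per step" when several batches are forwarded. It assumes each training-mode forward pass normalizes with its own batch statistics, while the moving averages used at evaluation time are updated only for the first (interleaved) batch, which matches the role of the batch norm updates in post_ops described earlier in this thread. ToyBatchNorm, train_step and the momentum value are hypothetical; there is no learned scale or shift.

```python
import numpy as np


class ToyBatchNorm:
    """Batch norm over axis 0 with moving statistics (illustrative only)."""

    def __init__(self, dim, momentum=0.99, eps=1e-3):
        self.momentum = momentum
        self.eps = eps
        self.moving_mean = np.zeros(dim)
        self.moving_var = np.ones(dim)

    def __call__(self, x, training, update_stats=False):
        if not training:
            # Evaluation: use the accumulated moving statistics.
            mean, var = self.moving_mean, self.moving_var
        else:
            # Training: normalize with this batch's own statistics.
            mean, var = x.mean(axis=0), x.var(axis=0)
            if update_stats:
                m = self.momentum
                self.moving_mean = m * self.moving_mean + (1 - m) * mean
                self.moving_var = m * self.moving_var + (1 - m) * var
        return (x - mean) / np.sqrt(var + self.eps)


def train_step(bn, batches):
    """Forward every batch in training mode; update moving stats only for the first."""
    return [bn(x, training=True, update_stats=(i == 0)) for i, x in enumerate(batches)]


bn = ToyBatchNorm(1, momentum=0.5)
outs = train_step(bn, [np.array([[1.0], [3.0]]), np.array([[10.0], [12.0]])])
print(bn.moving_mean)  # reflects only the first batch
```

If the first batch were purely labeled data, the moving statistics would drift toward the labeled distribution only, which is why the first batch needs to be representative.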

bl0 commented on July 17, 2024

Also, I don't understand the role of post_ops. I know that only the ops in post_ops are evaluated, so BN is updated only in some special cases and frozen in the others. But in which cases?

Thanks very much.

bl0 commented on July 17, 2024

Thanks for your reply.

bl0 commented on July 17, 2024

What happens if BN is updated for all batches?

bl0 commented on July 17, 2024

Thanks very much.

happygds commented on July 17, 2024

@bl0, did you try updating BN for all batches? What did you observe?

bl0 commented on July 17, 2024

The results were bad, so maybe we should be careful with BN.

moskomule commented on July 17, 2024

Hi, I didn't know this trick. Like @bl0, I also found that interleaving avoids the performance drop quite well.
Are there any references for this?

david-berthelot commented on July 17, 2024

I'm not sure about references. There are many ways to train with batch norm (for example, one could make a giant batch of everything), I simply chose that solution (interleaving) because I was considering doing multi-GPU and I wanted a homogeneous batch.
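The "giant batch of everything" option can be sketched next to forwarding each batch separately. This NumPy toy uses a hypothetical batch_norm_train helper (no learned scale or shift) and only shows that the two choices normalize each item with different statistics, so they are not interchangeable; it says nothing about which one trains better.

```python
import numpy as np


def batch_norm_train(x, eps=1e-3):
    """Normalize over axis 0 with the batch's own mean and variance."""
    mean, var = x.mean(axis=0), x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps), mean, var


def forward_giant_batch(batches):
    """Concatenate all batches, normalize with one set of statistics, split back."""
    sizes = [len(b) for b in batches]
    y, mean, var = batch_norm_train(np.concatenate(batches, axis=0))
    return np.split(y, np.cumsum(sizes)[:-1]), mean, var


def forward_separately(batches):
    """Normalize each batch with its own statistics, as separate forward passes would."""
    return [batch_norm_train(b) for b in batches]


labeled = np.array([[0.0], [2.0]])
unlabeled = np.array([[4.0], [6.0]])
giant_outs, mean, var = forward_giant_batch([labeled, unlabeled])
separate = forward_separately([labeled, unlabeled])
print(giant_outs[0].mean(), separate[0][0].mean())  # labeled rows end up shifted differently
```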

zhaozhengChen commented on July 17, 2024

Hi authors,
In mixmatch.py, you forward three batches (one labeled and two unlabeled) separately and use interleave to get a batch that represents the whole data. Why can't we forward them together?
I tried forwarding them together but got poor performance.

david-berthelot commented on July 17, 2024

I don't know. It could be that the way you made the change introduced a bug, or it could be that forwarding them together introduces a different behavior, but I don't remember whether that is the case: this research was done a year ago.
Please update this thread if you find the reason so others can benefit.
