Comments (8)

mdenil commented on July 4, 2024

The code as written will correctly use different dropout masks for different invocations of the train_model function defined at https://github.com/mdenil/dropout/blob/master/mlp.py#L261

The variable mask is a symbolic random variable whose source of randomness is the RandomStreams object srng. Theano internally handles creating different random instantiations of mask as needed. See http://deeplearning.net/software/theano/tutorial/examples.html#using-random-numbers for an explanation of how random variables work in Theano.
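
As a minimal sketch of that behaviour (not code from this repository, and the printed values are only illustrative), a compiled Theano function that depends on such a symbolic mask resamples it on every call:

import theano
from theano.tensor.shared_randomstreams import RandomStreams

srng = RandomStreams(seed=1234)              # a stream of randomness with its own internal state
mask = srng.binomial(n=1, p=0.5, size=(5,))  # symbolic random variable, not a fixed matrix

f = theano.function([], mask)                # compiling attaches an update to srng's state

print f()  # one instantiation of mask, e.g. [1 0 1 1 0]
print f()  # a different instantiation; srng's state advanced after the first call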

The code as written will not use different random masks across runs of the program, because rng is seeded with a fixed number here: https://github.com/mdenil/dropout/blob/master/mlp.py#L190 If you want to use a different sequence of dropout masks each time you train, then you will need to use a different seed for rng each time (I would accept a pull request to add this functionality).
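
One possible way to do that (an assumption on my part, not something already in the repository) is to seed the numpy RandomState from the clock instead of a constant:

import time
import numpy

rng = numpy.random.RandomState(int(time.time()))  # different seed, and hence a different mask sequence, each run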

ChenglongChen commented on July 4, 2024

Thanks for the clarification. I had misinterpreted the "999999" as the random seed. That is so silly of me. Yes, indeed, a different dropout mask will be used for each mini-batch of training.
Over the past few days I have added some functionality to your code for my own use, including:

  1. allow the user to define a different random seed, as you mentioned
  2. allow the user to define the dropout rate used for each layer
  3. allow the user to define the activation function for each layer
  4. constrain the norms of the columns of the weight matrices (see the sketch after this list)
  5. change the update rule to match Hinton's dropout paper

I definitely would like to contribute, but I am a newbie to GitHub and might need some time to learn how to make a PR.
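
For item 4, here is a rough sketch of what I mean (hypothetical numpy code, not the exact implementation I plan to submit): after each update, rescale any weight column whose L2 norm exceeds a maximum value.

import numpy as np

def constrain_column_norms(W, max_norm=3.0):
    # Compute the L2 norm of every column of W.
    norms = np.sqrt((W ** 2).sum(axis=0))
    # Shrink factor: 1.0 where the column is already short enough.
    scale = np.minimum(1.0, max_norm / (norms + 1e-7))
    return W * scale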

mdenil commented on July 4, 2024

Even if 999999 were the random seed, a different dropout mask would still be used for each minibatch. This code does in fact set the random seed of srng to a fixed value (in particular, it sets it to rng.randint(999999), but rng is a numpy RandomState whose seed is set explicitly, so the call to randint will always produce the same number across runs of the program).

The reason the dropout masks will be different for different minibatches is that theano.tensor.shared_randomstreams.RandomStreams creates symbolic random variables, which is not the same as generating a single random matrix. The link in my previous comment explains how symbolic random variables work in Theano.
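
To illustrate the difference (a sketch with made-up shapes, not repository code): a numpy draw is one fixed matrix, while a RandomStreams draw is resampled every time the compiled function runs.

import numpy as np
import theano
from theano.tensor.shared_randomstreams import RandomStreams

# A concrete numpy draw: if this array were baked into the graph as a constant,
# every minibatch would see the same mask.
fixed_mask = np.random.RandomState(0).binomial(n=1, p=0.5, size=(3,))

# A symbolic draw: each call of the compiled function produces a fresh mask.
srng = RandomStreams(seed=0)
symbolic_mask = srng.binomial(n=1, p=0.5, size=(3,))
f = theano.function([], symbolic_mask)

print fixed_mask, fixed_mask  # identical, by construction
print f(), f()                # almost surely different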

You can find some documentation on how to set up a pull request at https://help.github.com/articles/using-pull-requests and I'd be happy to integrate any of the changes you mentioned.

ChenglongChen commented on July 4, 2024

I have made the PR.

Regarding your explanation, I do not totally agree.
In the function:

def _dropout_from_layer(rng, layer, p):
    """p is the probability of dropping a unit
    """
    srng = theano.tensor.shared_randomstreams.RandomStreams(
        rng.randint(999999))
    # p=1-p because 1's indicate keep and p is prob of dropping
    mask = srng.binomial(n=1, p=1-p, size=layer.shape)
    # The cast is important because
    # int * float32 = float64 which pulls things off the gpu
    output = layer * T.cast(mask, theano.config.floatX)
    return output

If rng.randint(999999) were a fixed seed, then every call to _dropout_from_layer would produce the same random stream srng. Because of this, mask would always be the same, since it is always the first variable drawn from srng.

The randomness, I think, actually lies in rng.randint(999999), which uniformly draws a random integer from [0, 999999). After each call, the internal state of rng is updated, so the next call to rng will return a different integer. Finally, this integer is fed to theano.tensor.shared_randomstreams.RandomStreams() as the random seed to generate a different random stream, thus introducing the randomness for each mini-batch of training.

You can verify the behavior of numpy.random.RandomState using the following code:

import numpy as np
rng = np.random.RandomState(1234)
print rng.randint(999999)  # first call; output 486191
print rng.randint(999999)  # second call; output 451283

rng = np.random.RandomState(1234)
def randomNum(rng):
    print rng.randint(999999)

randomNum(rng)  # output 486191; after this call, the internal state of rng will be updated
print rng.randint(999999)  # output 451283 instead of 486191

mdenil commented on July 4, 2024

Thank you for the pull request. I will merge it once I have had a chance to review it this evening.

_dropout_from_layer is called once per (dropout) layer and initializes a different theano.tensor.shared_randomstreams.RandomStreams object each time. Using rng.randint(999999) causes the seed for each of these random streams to be set differently.

If we instead seeded srng in each layer with the same number, then the Theano RandomStreams objects would still choose different dropout masks for each minibatch, but the dropout masks for different layers within the same minibatch would be the same (actually, different layer sizes would cause the random streams to de-sync, because masks of different sizes consume different amounts of random numbers, but two layers of the same size would always generate the same mask for each minibatch).
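
As a quick sketch of that last point (again, not repository code): two streams seeded with the same number draw identical masks for layers of the same size.

import theano
from theano.tensor.shared_randomstreams import RandomStreams

srng_a = RandomStreams(seed=42)  # pretend these belong to two different layers
srng_b = RandomStreams(seed=42)  # of the same size, seeded identically
mask_a = srng_a.binomial(n=1, p=0.5, size=(4,))
mask_b = srng_b.binomial(n=1, p=0.5, size=(4,))
f = theano.function([], [mask_a, mask_b])

print f()  # the two masks agree
print f()  # and stay in lock-step on every subsequent call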

ChenglongChen commented on July 4, 2024

Er, with your detailed explanation and the links, I think I might get a sense of it now. I once thought that _dropout_from_layer was called for every mini-batch, but in fact it is only called at the initialization stage. It then generates a random stream called mask (can I call it a random variable?) that goes into the computation graph. In the training phase, for each mini-batch, mask updates its internal state after the computation involving mask (i.e., dropout in this case). In that way, each mini-batch will use different dropout units. Am I correct?

mdenil commented on July 4, 2024

Yes, that is correct. Precisely, it is srng that is the random stream, and mask is a random variable whose source of randomness is srng. For each minibatch we call train_model, and after each call the internal state of srng is updated. This causes mask to have a different value when train_model is called on the next minibatch.

ChenglongChen commented on July 4, 2024

Thanks a ton!
