deepul's People

Contributors

kvfrans, wilson1yan, wuphilipp

deepul's Issues

HW2 autoregressive flow for images solution issue

I'm pretty sure the log-likelihood for the solution to the second exercise is off. The nll is defined as:

  def nll(self, x, cond=None):
    loc, log_scale, weight_logits = torch.chunk(self.forward(x), 3, dim=1)
    weights = F.softmax(weight_logits, dim=1) #.repeat(1, 1, self.n_components, 1, 1)
    log_det_jacobian = Normal(loc, log_scale.exp()).log_prob(x.unsqueeze(1).repeat(1,1,self.n_components,1,1))
    return -log_det_jacobian.mean()

As you can see, the weights are never used. I believe this should be:

  def nll(self, x, cond=None):
    loc, log_scale, weight_logits = torch.chunk(self.forward(x), 3, dim=1)
    weights = F.softmax(weight_logits, dim=1) #.repeat(1, 1, self.n_components, 1, 1)
    log_det_jacobian = Normal(loc, log_scale.exp()).log_prob(x.unsqueeze(1).repeat(1,1,self.n_components,1,1)).exp()
    return -torch.log((log_det_jacobian * weights).sum(dim=2)).mean()
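
Relatedly, it might be numerically safer to do the mixture sum in log space with torch.logsumexp instead of exponentiating the log-probs and then taking the log of the weighted sum. A minimal, self-contained sketch of what I mean (the helper name mixture_nll and the toy shapes are mine, not the notebook's):

    import torch
    import torch.nn.functional as F
    from torch.distributions import Normal

    def mixture_nll(x, loc, log_scale, weight_logits, component_dim=1):
        # Negative log-likelihood of x under a Gaussian mixture, computed
        # entirely in log space for numerical stability.
        log_weights = F.log_softmax(weight_logits, dim=component_dim)
        log_probs = Normal(loc, log_scale.exp()).log_prob(x.unsqueeze(component_dim))
        return -torch.logsumexp(log_probs + log_weights, dim=component_dim).mean()

    # toy usage: batch of 8 scalars, 5 mixture components
    x = torch.randn(8)
    loc, log_scale, logits = torch.randn(8, 5), torch.zeros(8, 5), torch.zeros(8, 5)
    print(mixture_nll(x, loc, log_scale, logits))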

Question regarding solutions availability

Hi!

Thank you for an awesome course!

I am not a Berkeley student and have been following the course through the lectures on YouTube. I am wondering whether the homework solutions will be made publicly available on the website.

What is `self.loc` in `MixtureCDFFlow` class?

I am trying to implement the 1D flow as shown in Flow Models Demos (Official).ipynb. I can't figure out what self.loc is in the MixtureCDFFlow class. Also, what does self.n_components denote?
This is the code:

 def flow(self, x):
        # set up mixture distribution
        weights = F.softmax(self.weight_logits, dim=0).unsqueeze(0).repeat(x.shape[0], 1)
        mixture_dist = self.mixture_dist(self.loc, self.log_scale.exp())
        x_repeat = x.unsqueeze(1).repeat(1, self.n_components)

        # z = cdf of x
        z = (mixture_dist.cdf(x_repeat) * weights).sum(dim=1)

        # log_det = log dz/dx = log pdf(x)
        log_det = (mixture_dist.log_prob(x_repeat).exp() * weights).sum(dim=1).log()

        return z, log_det
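
My current guess (possibly wrong) is that self.loc, self.log_scale and self.weight_logits are learnable per-component parameters and self.n_components is the number of mixture components, i.e. something along these lines (my own sketch, not the notebook's code):

    import torch
    import torch.nn as nn
    from torch.distributions import Normal

    class MixtureCDFFlowGuess(nn.Module):
        # my reconstruction of the attributes that flow() above seems to assume
        def __init__(self, n_components=5):
            super().__init__()
            self.n_components = n_components                              # number of mixture components
            self.mixture_dist = Normal                                    # component family used in flow()
            self.loc = nn.Parameter(torch.randn(n_components))            # component means
            self.log_scale = nn.Parameter(torch.zeros(n_components))      # log of component std-devs
            self.weight_logits = nn.Parameter(torch.zeros(n_components))  # unnormalized mixture weights

Is that roughly right?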

Any help would be really appreciated.

Thanks

Reporting bugs/errors in `lecture3_flow_models_demos.ipynb`

First of all thank you for making all of the lectures and other content public. This is really helpful.

I took a look at the demo implementations for lecture 3 and found some bugs which I am reporting here:

1. In Demo 3, the .flow() method of the ConditionalMixtureCDFFlow(nn.Module) class has the following signature:

def flow(self, x, cond):

However, when .flow() is called from the .invert() method, the condition cond is not passed along:

def invert(self, z, cond):
        # Find the exact x via bisection such that f(x) = z
        results = []
        for z_elem in z:
            def f(x):
                # SHOULD PASS `cond` in the line below
                return self.flow(torch.tensor(x).unsqueeze(0))[0] - z_elem
            x = bisect(f, -20, 20)
            results.append(x)
        return torch.tensor(results).reshape(z.shape)

2. In Demo 4 the .forward() method of MaskConv2d never uses cond or batch_size:

class MaskConv2d(nn.Conv2d):
  def __init__(self, mask_type, *args, **kwargs):
    assert mask_type == 'A' or mask_type == 'B'
    super().__init__(*args, **kwargs)
    self.register_buffer('mask', torch.zeros_like(self.weight))
    self.create_mask(mask_type)

  def forward(self, input, cond=None):
    # batch_size AND cond ARE NEVER USED
    batch_size = input.shape[0]
    out = F.conv2d(input, self.weight * self.mask, self.bias, self.stride,
                   self.padding, self.dilation, self.groups)
    return out

So passing cond has no effect when MaskConv2d is called from the .forward() method of AutoregressiveFlowPixelCNN like so (a possible fix is sketched at the end of this issue):

      if isinstance(layer, MaskConv2d):
        out = layer(out, cond=cond)
      else:
        out = layer(out)

3. In Demo 4, the .nll() method of AutoregressiveFlowPixelCNN neither exponentiates log_prob nor uses the mixture weights when calculating log_det_jacobian:

    loc, log_scale, weight_logits = torch.chunk(self.forward(x), 3, dim=1)
    weights = F.softmax(weight_logits, dim=1) #.repeat(1, 1, self.n_components, 1, 1)
    log_det_jacobian = Normal(loc, log_scale.exp()).log_prob(x.unsqueeze(1).repeat(1,1,self.n_components,1,1))
    return -log_det_jacobian.mean()

I think it should be something like:

log_det_jacobian = Normal(loc, log_scale.exp()).log_prob(x.unsqueeze(1).repeat(1,1,self.n_components,1,1)).exp() * weights

I actually have lots of questions about why .nll() is implemented the way it is. Why is the .unsqueeze(1).repeat(...) needed rather than relying on ordinary broadcasting? And where is the base_dist that forces the transformed variables to have a known distribution?

Looking at the .sample() method, it seems the weights are used to select a mean and variance for each sample, but how are they learned? Regardless of how they are used there, the weights are never used in .nll(), so computing them there is pointless and they receive no training signal from the loss.

Please let me know if I am missing something here.
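
For bug 2, my guess is that the intended behaviour was to add a learned projection of cond to the masked convolution output. A rough sketch of what that could look like (the class name CondMaskConv2d, the cond_size argument, and the mask construction are mine, not necessarily what the demo intended):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CondMaskConv2d(nn.Conv2d):
        # Masked conv that (unlike the demo) actually uses `cond`: a linear
        # projection of the conditioning vector is added as a per-channel bias.
        def __init__(self, mask_type, in_channels, out_channels, kernel_size,
                     cond_size=None, **kwargs):
            assert mask_type in ('A', 'B')
            super().__init__(in_channels, out_channels, kernel_size, **kwargs)
            self.register_buffer('mask', torch.zeros_like(self.weight))
            self.create_mask(mask_type)
            self.cond_proj = nn.Linear(cond_size, out_channels) if cond_size else None

        def create_mask(self, mask_type):
            # standard PixelCNN-style mask (my own version; the demo has its own create_mask)
            k = self.kernel_size[0]
            self.mask[:, :, :k // 2] = 1             # rows above the centre
            self.mask[:, :, k // 2, :k // 2] = 1     # centre row, left of centre
            if mask_type == 'B':
                self.mask[:, :, k // 2, k // 2] = 1  # type B also sees the centre pixel

        def forward(self, input, cond=None):
            out = F.conv2d(input, self.weight * self.mask, self.bias, self.stride,
                           self.padding, self.dilation, self.groups)
            if cond is not None and self.cond_proj is not None:
                # broadcast the projected condition over the spatial dimensions
                out = out + self.cond_proj(cond).unsqueeze(-1).unsqueeze(-1)
            return out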

HW1 MADE | possible flaws in solution implementation

Hi,
first of all, thank you for the course and the provided solutions.

There may be a few flaws in the MADE implementation, but I could be wrong.

for l in range(num_hidden):
            self.m[l] = np.random.randint(
                self.m[l - 1].min(), self.nin - 1, size=self.hidden_sizes[l]
            )

When initializing the random connectivity numbers for the masks, it seems better to use a permutation; otherwise we may zero out more connections than necessary. For instance, with a very low probability the next m could end up with all identical values.

Also, as I understand it, the number of unique values in each m should be at least nin - 1, so len(m) should not be smaller than nin - 1. It might be worth adding an assertion for this (see the sketch below).
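
Something along these lines is what I mean, just a sketch with made-up names (hidden_connectivity is not in the solution), not a drop-in patch:

    import numpy as np

    def hidden_connectivity(prev_m, nin, hidden_size, rng=np.random):
        # Assign connectivity numbers to a hidden layer so that every value in
        # [prev_m.min(), nin - 2] appears at least once, then fill the rest randomly.
        low, high = prev_m.min(), nin - 1  # valid degrees are [low, high)
        assert hidden_size >= high - low, "hidden layer too small to cover all degrees"
        guaranteed = np.arange(low, high)  # one of each degree
        extra = rng.randint(low, high, size=hidden_size - len(guaranteed))
        return rng.permutation(np.concatenate([guaranteed, extra]))

    # toy usage: input ordering for 5 inputs, hidden layer of 8 units
    m_input = np.arange(5)
    print(hidden_connectivity(m_input, nin=5, hidden_size=8))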

Again,
I may be wrong and thank you for your great work.

Bug in MADE sampling implementation in 'hw1_solutions.ipynb'

[Screenshot: MADE sampling implementation provided in the notebook]

Suppose there are three variables x1, x2, x3 and the ordering given is 3, 1, 2. Then during sampling:
I) the desired sampling order is x2, x3, x1;
II) the code above samples in the order x3, x1, x2.

This bug shows up when a random permutation is passed as the ordering in function q2_b (passing ordering = None does not expose it):
model = MADE((1, H, W), 2, hidden_size=[512, 512],ordering = np.random.permutation(H*W)).cuda()

Results with the current bug on the shapes and MNIST samples (when a random permutation is passed as the ordering):
[Images: shapebug, mnistbug]

So slide 38 of Lecture 2 also needs to be edited: a random permutation does not produce samples as bad as the ones shown in the slides; the bug in the code is responsible.
https://docs.google.com/presentation/d/1xl4KKNYw08PatORSnFDyzX6MT98brEdYElZKZ5bi2YM/edit#slide=id.g7d02b18e4d_0_191

Suggested fix code:
[Image: correct_code]
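
In words, the fix is to visit pixels in increasing order of their ordering value, i.e. np.argsort(ordering), rather than indexing by the ordering values themselves. A rough sketch of the sampling loop I have in mind (function and variable names are mine, and the model output shape of (N, n_bins, H, W) is an assumption, not taken from the notebook):

    import numpy as np
    import torch

    def sample_made(model, ordering, image_shape, n_samples=16, device='cpu'):
        # Sample pixel-by-pixel from a MADE model whose autoregressive order is
        # given by `ordering` (ordering[i] = position of pixel i in the order).
        H, W = image_shape
        samples = torch.zeros(n_samples, 1, H, W, device=device)
        # visit pixels in increasing order of their ordering value,
        # i.e. argsort(ordering), NOT in the order listed by `ordering`
        for pixel in np.argsort(ordering):
            r, c = pixel // W, pixel % W
            with torch.no_grad():
                logits = model(samples)                  # (N, n_bins, H, W) assumed
                probs = torch.softmax(logits[:, :, r, c], dim=1)
                samples[:, 0, r, c] = torch.multinomial(probs, 1).squeeze(1).float()
        return samples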

Results with the fixed code:
[Images: shapes_correct, mnist_correct]

ActNorm implementation missing division by `std` on the shift parameter

Hi,

Thanks for making the video lectures and homework public. I'm really enjoying the course so far. I was going through homework 2 and wanted to compare my implementation with the solutions. In the hw2 solution I found the following implementation of ActNorm:

class ActNorm(nn.Module):
    def __init__(self, n_channels):
        super(ActNorm, self).__init__()
        self.log_scale = nn.Parameter(torch.zeros(1, n_channels, 1, 1), requires_grad=True)
        self.shift = nn.Parameter(torch.zeros(1, n_channels, 1, 1), requires_grad=True)
        self.n_channels = n_channels
        self.initialized = False

    def forward(self, x, reverse=False):
        if reverse:
            return (x - self.shift) * torch.exp(-self.log_scale), self.log_scale
        else:
            if not self.initialized:
                self.shift.data = -torch.mean(x, dim=[0, 2, 3], keepdim=True)
                self.log_scale.data = - torch.log(
                    torch.std(x.permute(1, 0, 2, 3).reshape(self.n_channels, -1), dim=1).reshape(1, self.n_channels, 1,
                                                                                                 1))
                self.initialized = True
                result = x * torch.exp(self.log_scale) + self.shift
            return x * torch.exp(self.log_scale) + self.shift, self.log_scale

I think the shift needs to be divided by the standard deviation as follows for the activations to be normalized.

self.shift.data = -(torch.mean(x, dim=[0, 2, 3], keepdim=True) * torch.exp(self.log_scale))
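
A quick sanity check of this (my own standalone snippet, with made-up activation statistics) is that with the corrected data-dependent init, the first forward pass gives roughly zero mean and unit variance per channel:

    import torch

    # fake activations with non-trivial per-channel mean and std
    x = torch.randn(64, 3, 8, 8) * 5.0 + 2.0

    # data-dependent init with the suggested fix; note log_scale is computed
    # first here so the shift can reuse it
    log_scale = -torch.log(x.permute(1, 0, 2, 3).reshape(3, -1).std(dim=1)).reshape(1, 3, 1, 1)
    shift = -(x.mean(dim=[0, 2, 3], keepdim=True) * torch.exp(log_scale))

    out = x * torch.exp(log_scale) + shift
    print(out.mean(dim=[0, 2, 3]))  # ~0 per channel
    print(out.std(dim=[0, 2, 3]))   # ~1 per channel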

Let me know if I'm missing something.

Flows demo fails because of missing `.to(ptu.device)`

The AutoregressiveFlow and RealNVP cells fail with the error message copied at the end of this report. I ran all the cells sequentially from the beginning.

If I add real_nvp = real_nvp.to(ptu.device), everything works fine:

real_nvp = RealNVP([AffineTransform("left", n_hidden=2, hidden_size=64),
                    AffineTransform("right", n_hidden=2, hidden_size=64),
                    AffineTransform("left", n_hidden=2, hidden_size=64),
                    AffineTransform("right", n_hidden=2, hidden_size=64)],
                   train_loader.dataset, 'moons', train_labels)
real_nvp = real_nvp.to(ptu.device) # <-- ADDED THIS LINE 
train_losses, test_losses = train_epochs(real_nvp, train_loader, test_loader, dict(epochs=250, lr=5e-3, epochs_to_plot=[0, 3, 6, 10, 25, 249]))
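
Presumably the AutoregressiveFlow cell needs the same one-line change (this is my assumption; the traceback below comes from that cell):

    ar_flow = AutoregressiveFlow(train_loader.dataset, 'moons', train_labels)
    ar_flow = ar_flow.to(ptu.device)  # <-- same fix, assumed
    train_losses, test_losses = train_epochs(ar_flow, train_loader, test_loader, dict(epochs=100, lr=5e-3, epochs_to_plot=[0, 1, 3, 6, 10, 99]))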

Error messages:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-54-8b25cbd7d1cc> in <module>
      1 ar_flow = AutoregressiveFlow(train_loader.dataset, 'moons', train_labels)
----> 2 train_losses, test_losses = train_epochs(ar_flow, train_loader, test_loader, dict(epochs=100, lr=5e-3, epochs_to_plot=[0, 1, 3, 6, 10, 99]))

<ipython-input-41-5d16b72566e2> in train_epochs(model, train_loader, test_loader, train_args)
     37     for epoch in tqdm_notebook(range(epochs), desc='Epoch', leave=False):
     38         model.train()
---> 39         train(model, train_loader, optimizer)
     40         train_loss = eval_loss(model, train_loader)
     41         train_losses.append(train_loss)

<ipython-input-41-5d16b72566e2> in train(model, train_loader, optimizer)
      4     for x in train_loader:
      5         x = x.to(ptu.device).float()
----> 6         loss = model.nll(x)
      7         optimizer.zero_grad()
      8         loss.backward()

<ipython-input-52-b941007c468c> in nll(self, x)
    100 
    101     def nll(self, x):
--> 102         return - self.log_prob(x).mean()
    103 
    104     def plot(self, title):

<ipython-input-52-b941007c468c> in log_prob(self, x)
     96 
     97     def log_prob(self, x):
---> 98         z, log_det = self.flow(x)
     99         return (self.base_dist.log_prob(z) + log_det).sum(dim=1) # shape: [batch_size, dim]
    100 

<ipython-input-52-b941007c468c> in flow(self, x)
     92         x1, x2 = torch.chunk(x, 2, dim=1)
     93         z1, log_det1 = self.dim1_flow.flow(x1.squeeze())
---> 94         z2, log_det2 = self.dim2_flow.flow(x2, cond=x1)
     95         return torch.cat([z1.unsqueeze(1), z2.unsqueeze(1)], dim=1), torch.cat([log_det1.unsqueeze(1), log_det2.unsqueeze(1)], dim=1)
     96 

<ipython-input-52-b941007c468c> in flow(self, x, cond)
     33     def flow(self, x, cond):
     34         # parameters of flow on x depend on what it's conditioned on
---> 35         loc, log_scale, weight_logits = torch.chunk(self.mlp(cond), 3, dim=1)
     36         weights = F.softmax(weight_logits)
     37 

~/.virtualenvs/deepul/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

<ipython-input-52-b941007c468c> in forward(self, x)
     12 
     13     def forward(self, x):
---> 14         return self.layers(x)
     15 
     16 # same CDF flow as in Demo 1, but conditioned on an auxillary variable

~/.virtualenvs/deepul/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

~/.virtualenvs/deepul/lib/python3.6/site-packages/torch/nn/modules/container.py in forward(self, input)
     90     def forward(self, input):
     91         for module in self._modules.values():
---> 92             input = module(input)
     93         return input
     94 

~/.virtualenvs/deepul/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

~/.virtualenvs/deepul/lib/python3.6/site-packages/torch/nn/modules/linear.py in forward(self, input)
     65     @weak_script_method
     66     def forward(self, input):
---> 67         return F.linear(input, self.weight, self.bias)
     68 
     69     def extra_repr(self):

~/.virtualenvs/deepul/lib/python3.6/site-packages/torch/nn/functional.py in linear(input, weight, bias)
   1350     if input.dim() == 2 and bias is not None:
   1351         # fused op is marginally faster
-> 1352         ret = torch.addmm(torch.jit._unwrap_optional(bias), input, weight.t())
   1353     else:
   1354         output = input.matmul(weight.t())

RuntimeError: Expected object of backend CPU but got backend CUDA for argument #4 'mat1'

`optimizer.zero_grad` called after calculating loss instead of before in `lecture3_flow_models_demos.ipynb`.

In lecture3_flow_models_demos.ipynb, optimizer.zero_grad is called after calculating the loss, i.e.:

model.train()
for x in train_loader:
    x = x.float()
    loss = model.nll(x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In other implementations, the standard is to call optimizer.zero_grad before the forward pass, e.g.:

optimizer.zero_grad()
y = model(x)
loss = criterion(y, y_true)
loss.backward()
optimizer.step()

Moving the call to before the loss calculation breaks the model, and it fails to learn anything. I cannot understand why the model behaves that way.

HW1 solutions Discretized Mixture of Logistics Parameter initialization confusion

Hi,
I was going through hw1_solutions at https://github.com/rll/deepul/blob/master/homeworks/solutions/hw1_solutions.ipynb and got confused by the Discretized Mixture of Logistics implementation.
In the __init__ function of the class, the parameters (means, log_scales, logits) are each initialized in a different way. I couldn't understand how those initializations were decided. For example, why don't we initialize all of them with torch.randn()?
[Screenshot: parameter initialization code from the notebook]
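
For reference, what I would have naively written is something like this (my own sketch with simplified shapes, not the notebook's code), so I am curious what would go wrong with it:

    import torch
    import torch.nn as nn

    # naive alternative: initialize every mixture parameter from a standard normal
    n_mix = 4
    means = nn.Parameter(torch.randn(n_mix))       # component means
    log_scales = nn.Parameter(torch.randn(n_mix))  # log of component scales
    logits = nn.Parameter(torch.randn(n_mix))      # unnormalized mixture weights
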
I will appreciate any help. Thank you for your time.
