deepul's People

Contributors

kvfrans, wilson1yan, wuphilipp

deepul's Issues

HW2 autoregressive flow for images solution issue

I'm pretty sure the log-likelihood for the solution to the second exercise is off. The nll is defined as:

  def nll(self, x, cond=None):
    loc, log_scale, weight_logits = torch.chunk(self.forward(x), 3, dim=1)
    weights = F.softmax(weight_logits, dim=1) #.repeat(1, 1, self.n_components, 1, 1)
    log_det_jacobian = Normal(loc, log_scale.exp()).log_prob(x.unsqueeze(1).repeat(1,1,self.n_components,1,1))
    return -log_det_jacobian.mean()

As you can see, the weights are never used. I believe this should be:

  def nll(self, x, cond=None):
    loc, log_scale, weight_logits = torch.chunk(self.forward(x), 3, dim=1)
    weights = F.softmax(weight_logits, dim=1) #.repeat(1, 1, self.n_components, 1, 1)
    log_det_jacobian = Normal(loc, log_scale.exp()).log_prob(x.unsqueeze(1).repeat(1,1,self.n_components,1,1)).exp()
    return -torch.log((log_det_jacobian * weights).sum(dim=2)).mean()
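
Relatedly, it might be numerically safer to do the mixture sum in log space with torch.logsumexp instead of exponentiating the log-probs and then taking the log of the weighted sum. A minimal, self-contained sketch of what I mean (the helper name mixture_nll and the toy shapes are mine, not the notebook's):

    import torch
    import torch.nn.functional as F
    from torch.distributions import Normal

    def mixture_nll(x, loc, log_scale, weight_logits, component_dim=1):
        # Negative log-likelihood of x under a Gaussian mixture, computed
        # entirely in log space for numerical stability.
        log_weights = F.log_softmax(weight_logits, dim=component_dim)
        log_probs = Normal(loc, log_scale.exp()).log_prob(x.unsqueeze(component_dim))
        return -torch.logsumexp(log_probs + log_weights, dim=component_dim).mean()

    # toy usage: batch of 8 scalars, 5 mixture components
    x = torch.randn(8)
    loc, log_scale, logits = torch.randn(8, 5), torch.zeros(8, 5), torch.zeros(8, 5)
    print(mixture_nll(x, loc, log_scale, logits))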

Question regarding solutions availability

Hi!

Thank you for an awesome course!

I am not a Berkeley student and have been following the course through the lectures on YouTube. I am wondering whether the homework solutions will be made publicly available on the website.

What is `self.loc` in `MixtureCDFFlow` class?

I am trying to implement the 1D flow as shown in Flow Models Demos (Official).ipynb. I can't figure out what self.loc is in the MixtureCDFFlow class. Also, what does self.n_components denote?
This is the code:

 def flow(self, x):
        # set up mixture distribution
        weights = F.softmax(self.weight_logits, dim=0).unsqueeze(0).repeat(x.shape[0], 1)
        mixture_dist = self.mixture_dist(self.loc, self.log_scale.exp())
        x_repeat = x.unsqueeze(1).repeat(1, self.n_components)

        # z = cdf of x
        z = (mixture_dist.cdf(x_repeat) * weights).sum(dim=1)

        # log_det = log dz/dx = log pdf(x)
        log_det = (mixture_dist.log_prob(x_repeat).exp() * weights).sum(dim=1).log()

        return z, log_det
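
My current guess (possibly wrong) is that self.loc, self.log_scale and self.weight_logits are learnable per-component parameters and self.n_components is the number of mixture components, i.e. something along these lines (my own sketch, not the notebook's code):

    import torch
    import torch.nn as nn
    from torch.distributions import Normal

    class MixtureCDFFlowGuess(nn.Module):
        # my reconstruction of the attributes that flow() above seems to assume
        def __init__(self, n_components=5):
            super().__init__()
            self.n_components = n_components                              # number of mixture components
            self.mixture_dist = Normal                                    # component family used in flow()
            self.loc = nn.Parameter(torch.randn(n_components))            # component means
            self.log_scale = nn.Parameter(torch.zeros(n_components))      # log of component std-devs
            self.weight_logits = nn.Parameter(torch.zeros(n_components))  # unnormalized mixture weights

Is that roughly right?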

Any help would be really appreciated.

Thanks

Reporting bugs/errors in `lecture3_flow_models_demos.ipynb`

First of all thank you for making all of the lectures and other content public. This is really helpful.

I took a look at the demo implementations for lecture 3 and found some bugs which I am reporting here:

1. In Demo 3, the .flow() method of the ConditionalMixtureCDFFlow(nn.Module) class has the following signature:

def flow(self, x, cond):

However, when .flow() is called from the .invert() method, the condition cond is not passed along:

def invert(self, z, cond):
        # Find the exact x via bisection such that f(x) = z
        results = []
        for z_elem in z:
            def f(x):
                # SHOULD PASS `cond` in the line below
                return self.flow(torch.tensor(x).unsqueeze(0))[0] - z_elem
            x = bisect(f, -20, 20)
            results.append(x)
        return torch.tensor(results).reshape(z.shape)

2. In Demo 4 the .forward() method of MaskConv2d never uses cond or batch_size:

class MaskConv2d(nn.Conv2d):
  def __init__(self, mask_type, *args, **kwargs):
    assert mask_type == 'A' or mask_type == 'B'
    super().__init__(*args, **kwargs)
    self.register_buffer('mask', torch.zeros_like(self.weight))
    self.create_mask(mask_type)

  def forward(self, input, cond=None):
    # batch_size AND cond ARE NEVER USED
    batch_size = input.shape[0]
    out = F.conv2d(input, self.weight * self.mask, self.bias, self.stride,
                   self.padding, self.dilation, self.groups)
    return out

So passing cond has no effect when MaskConv2d is called from the .forward() method of AutoregressiveFlowPixelCNN like so (a possible fix is sketched at the end of this issue):

      if isinstance(layer, MaskConv2d):
        out = layer(out, cond=cond)
      else:
        out = layer(out)

3. In Demo 4, the .nll() method of AutoregressiveFlowPixelCNN neither exponentiates log_prob nor uses the mixture weights when calculating log_det_jacobian:

    loc, log_scale, weight_logits = torch.chunk(self.forward(x), 3, dim=1)
    weights = F.softmax(weight_logits, dim=1) #.repeat(1, 1, self.n_components, 1, 1)
    log_det_jacobian = Normal(loc, log_scale.exp()).log_prob(x.unsqueeze(1).repeat(1,1,self.n_components,1,1))
    return -log_det_jacobian.mean()

I think it should be something like:

log_det_jacobian = Normal(loc, log_scale.exp()).log_prob(x.unsqueeze(1).repeat(1,1,self.n_components,1,1)).exp() * weights

I actually have lots of questions about why .nll() is implemented the way it is. Why is the .unsqueeze(1).repeat(...) needed rather than relying on ordinary broadcasting? And where is the base_dist that forces the transformed variables to have a known distribution?

Looking at the .sample() method, it seems the weights are used to select a mean and variance for each sample, but how are they learned? Regardless of how they are used there, the weights are never used in .nll(), so computing them there is pointless and they receive no training signal from the loss.

Please let me know if I am missing something here.
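
For bug 2, my guess is that the intended behaviour was to add a learned projection of cond to the masked convolution output. A rough sketch of what that could look like (the class name CondMaskConv2d, the cond_size argument, and the mask construction are mine, not necessarily what the demo intended):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CondMaskConv2d(nn.Conv2d):
        # Masked conv that (unlike the demo) actually uses `cond`: a linear
        # projection of the conditioning vector is added as a per-channel bias.
        def __init__(self, mask_type, in_channels, out_channels, kernel_size,
                     cond_size=None, **kwargs):
            assert mask_type in ('A', 'B')
            super().__init__(in_channels, out_channels, kernel_size, **kwargs)
            self.register_buffer('mask', torch.zeros_like(self.weight))
            self.create_mask(mask_type)
            self.cond_proj = nn.Linear(cond_size, out_channels) if cond_size else None

        def create_mask(self, mask_type):
            # standard PixelCNN-style mask (my own version; the demo has its own create_mask)
            k = self.kernel_size[0]
            self.mask[:, :, :k // 2] = 1             # rows above the centre
            self.mask[:, :, k // 2, :k // 2] = 1     # centre row, left of centre
            if mask_type == 'B':
                self.mask[:, :, k // 2, k // 2] = 1  # type B also sees the centre pixel

        def forward(self, input, cond=None):
            out = F.conv2d(input, self.weight * self.mask, self.bias, self.stride,
                           self.padding, self.dilation, self.groups)
            if cond is not None and self.cond_proj is not None:
                # broadcast the projected condition over the spatial dimensions
                out = out + self.cond_proj(cond).unsqueeze(-1).unsqueeze(-1)
            return out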

HW1 MADE | possible flaws in solution implementation

Hi,
first of all, thank you for the course and the provided solutions.

There may be a few flaws in the MADE implementation, but I could be wrong.

for l in range(num_hidden):
            self.m[l] = np.random.randint(
                self.m[l - 1].min(), self.nin - 1, size=self.hidden_sizes[l]
            )

When initializing the random connectivity numbers for the masks, it seems better to use a permutation; otherwise we may zero out more connections than necessary. For instance, with a very low probability the next m could end up with all identical values.

Also, as I understand it, the number of unique values in each m should be at least nin - 1, so len(m) should not be smaller than nin - 1. It might be worth adding an assertion for this (see the sketch below).
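
Something along these lines is what I mean, just a sketch with made-up names (hidden_connectivity is not in the solution), not a drop-in patch:

    import numpy as np

    def hidden_connectivity(prev_m, nin, hidden_size, rng=np.random):
        # Assign connectivity numbers to a hidden layer so that every value in
        # [prev_m.min(), nin - 2] appears at least once, then fill the rest randomly.
        low, high = prev_m.min(), nin - 1  # valid degrees are [low, high)
        assert hidden_size >= high - low, "hidden layer too small to cover all degrees"
        guaranteed = np.arange(low, high)  # one of each degree
        extra = rng.randint(low, high, size=hidden_size - len(guaranteed))
        return rng.permutation(np.concatenate([guaranteed, extra]))

    # toy usage: input ordering for 5 inputs, hidden layer of 8 units
    m_input = np.arange(5)
    print(hidden_connectivity(m_input, nin=5, hidden_size=8))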

Again,
I may be wrong and thank you for your great work.

Bug in MADE sampling implementation in 'hw1_solutions.ipynb'

[Screenshot: MADE sampling implementation provided in the notebook]

Suppose there are three variables x1, x2, x3 and the ordering given is 3, 1, 2. Then during sampling:
I) the desired sampling order is x2, x3, x1;
II) the code above samples in the order x3, x1, x2.

This bug shows up when a random permutation is passed as the ordering in function q2_b (passing ordering = None does not expose it):
model = MADE((1, H, W), 2, hidden_size=[512, 512],ordering = np.random.permutation(H*W)).cuda()

Results with the current bug on the shapes and MNIST samples (when a random permutation is passed as the ordering):
[Images: shapebug, mnistbug]

So slide 38 of Lecture 2 also needs to be edited: a random permutation does not produce samples as bad as the ones shown in the slides; the bug in the code is responsible.
https://docs.google.com/presentation/d/1xl4KKNYw08PatORSnFDyzX6MT98brEdYElZKZ5bi2YM/edit#slide=id.g7d02b18e4d_0_191

Suggested fix code:
[Image: correct_code]
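
In words, the fix is to visit pixels in increasing order of their ordering value, i.e. np.argsort(ordering), rather than indexing by the ordering values themselves. A rough sketch of the sampling loop I have in mind (function and variable names are mine, and the model output shape of (N, n_bins, H, W) is an assumption, not taken from the notebook):

    import numpy as np
    import torch

    def sample_made(model, ordering, image_shape, n_samples=16, device='cpu'):
        # Sample pixel-by-pixel from a MADE model whose autoregressive order is
        # given by `ordering` (ordering[i] = position of pixel i in the order).
        H, W = image_shape
        samples = torch.zeros(n_samples, 1, H, W, device=device)
        # visit pixels in increasing order of their ordering value,
        # i.e. argsort(ordering), NOT in the order listed by `ordering`
        for pixel in np.argsort(ordering):
            r, c = pixel // W, pixel % W
            with torch.no_grad():
                logits = model(samples)                  # (N, n_bins, H, W) assumed
                probs = torch.softmax(logits[:, :, r, c], dim=1)
                samples[:, 0, r, c] = torch.multinomial(probs, 1).squeeze(1).float()
        return samples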

Results with the fixed code:
[Images: shapes_correct, mnist_correct]

ActNorm implementation missing division by `std` on the shift parameter

Hi,

Thanks for making the video lectures and homework public. I'm really enjoying the course so far. I was going through homework 2 and wanted to compare my implementation with the solutions. In the hw2 solution I found the following implementation of ActNorm:

class ActNorm(nn.Module):
    def __init__(self, n_channels):
        super(ActNorm, self).__init__()
        self.log_scale = nn.Parameter(torch.zeros(1, n_channels, 1, 1), requires_grad=True)
        self.shift = nn.Parameter(torch.zeros(1, n_channels, 1, 1), requires_grad=True)
        self.n_channels = n_channels
        self.initialized = False

    def forward(self, x, reverse=False):
        if reverse:
            return (x - self.shift) * torch.exp(-self.log_scale), self.log_scale
        else:
            if not self.initialized:
                self.shift.data = -torch.mean(x, dim=[0, 2, 3], keepdim=True)
                self.log_scale.data = - torch.log(
                    torch.std(x.permute(1, 0, 2, 3).reshape(self.n_channels, -1), dim=1).reshape(1, self.n_channels, 1,
                                                                                                 1))
                self.initialized = True
                result = x * torch.exp(self.log_scale) + self.shift
            return x * torch.exp(self.log_scale) + self.shift, self.log_scale

I think the shift needs to be divided by the standard deviation as follows for the activations to be normalized.

self.shift.data = -(torch.mean(x, dim=[0, 2, 3], keepdim=True) * torch.exp(self.log_scale))
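
A quick sanity check of this (my own standalone snippet, with made-up activation statistics) is that with the corrected data-dependent init, the first forward pass gives roughly zero mean and unit variance per channel:

    import torch

    # fake activations with non-trivial per-channel mean and std
    x = torch.randn(64, 3, 8, 8) * 5.0 + 2.0

    # data-dependent init with the suggested fix; note log_scale is computed
    # first here so the shift can reuse it
    log_scale = -torch.log(x.permute(1, 0, 2, 3).reshape(3, -1).std(dim=1)).reshape(1, 3, 1, 1)
    shift = -(x.mean(dim=[0, 2, 3], keepdim=True) * torch.exp(log_scale))

    out = x * torch.exp(log_scale) + shift
    print(out.mean(dim=[0, 2, 3]))  # ~0 per channel
    print(out.std(dim=[0, 2, 3]))   # ~1 per channel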

Let me know if I'm missing something.

Flows demo fails because of missing `.to(ptu.device)`

The AutoregressiveFlow and RealNVP cells fail with the error message copied at the end of this report. I ran all the cells sequentially from the beginning.

If I add real_nvp = real_nvp.to(ptu.device), everything works fine:

real_nvp = RealNVP([AffineTransform("left", n_hidden=2, hidden_size=64),
                    AffineTransform("right", n_hidden=2, hidden_size=64),
                    AffineTransform("left", n_hidden=2, hidden_size=64),
                    AffineTransform("right", n_hidden=2, hidden_size=64)],
                   train_loader.dataset, 'moons', train_labels)
real_nvp = real_nvp.to(ptu.device) # <-- ADDED THIS LINE 
train_losses, test_losses = train_epochs(real_nvp, train_loader, test_loader, dict(epochs=250, lr=5e-3, epochs_to_plot=[0, 3, 6, 10, 25, 249]))
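
Presumably the AutoregressiveFlow cell needs the same one-line change (this is my assumption; the traceback below comes from that cell):

    ar_flow = AutoregressiveFlow(train_loader.dataset, 'moons', train_labels)
    ar_flow = ar_flow.to(ptu.device)  # <-- same fix, assumed
    train_losses, test_losses = train_epochs(ar_flow, train_loader, test_loader, dict(epochs=100, lr=5e-3, epochs_to_plot=[0, 1, 3, 6, 10, 99]))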

Error messages:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-54-8b25cbd7d1cc> in <module>
      1 ar_flow = AutoregressiveFlow(train_loader.dataset, 'moons', train_labels)
----> 2 train_losses, test_losses = train_epochs(ar_flow, train_loader, test_loader, dict(epochs=100, lr=5e-3, epochs_to_plot=[0, 1, 3, 6, 10, 99]))

<ipython-input-41-5d16b72566e2> in train_epochs(model, train_loader, test_loader, train_args)
     37     for epoch in tqdm_notebook(range(epochs), desc='Epoch', leave=False):
     38         model.train()
---> 39         train(model, train_loader, optimizer)
     40         train_loss = eval_loss(model, train_loader)
     41         train_losses.append(train_loss)

<ipython-input-41-5d16b72566e2> in train(model, train_loader, optimizer)
      4     for x in train_loader:
      5         x = x.to(ptu.device).float()
----> 6         loss = model.nll(x)
      7         optimizer.zero_grad()
      8         loss.backward()

<ipython-input-52-b941007c468c> in nll(self, x)
    100 
    101     def nll(self, x):
--> 102         return - self.log_prob(x).mean()
    103 
    104     def plot(self, title):

<ipython-input-52-b941007c468c> in log_prob(self, x)
     96 
     97     def log_prob(self, x):
---> 98         z, log_det = self.flow(x)
     99         return (self.base_dist.log_prob(z) + log_det).sum(dim=1) # shape: [batch_size, dim]
    100 

<ipython-input-52-b941007c468c> in flow(self, x)
     92         x1, x2 = torch.chunk(x, 2, dim=1)
     93         z1, log_det1 = self.dim1_flow.flow(x1.squeeze())
---> 94         z2, log_det2 = self.dim2_flow.flow(x2, cond=x1)
     95         return torch.cat([z1.unsqueeze(1), z2.unsqueeze(1)], dim=1), torch.cat([log_det1.unsqueeze(1), log_det2.unsqueeze(1)], dim=1)
     96 

<ipython-input-52-b941007c468c> in flow(self, x, cond)
     33     def flow(self, x, cond):
     34         # parameters of flow on x depend on what it's conditioned on
---> 35         loc, log_scale, weight_logits = torch.chunk(self.mlp(cond), 3, dim=1)
     36         weights = F.softmax(weight_logits)
     37 

~/.virtualenvs/deepul/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

<ipython-input-52-b941007c468c> in forward(self, x)
     12 
     13     def forward(self, x):
---> 14         return self.layers(x)
     15 
     16 # same CDF flow as in Demo 1, but conditioned on an auxillary variable

~/.virtualenvs/deepul/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

~/.virtualenvs/deepul/lib/python3.6/site-packages/torch/nn/modules/container.py in forward(self, input)
     90     def forward(self, input):
     91         for module in self._modules.values():
---> 92             input = module(input)
     93         return input
     94 

~/.virtualenvs/deepul/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

~/.virtualenvs/deepul/lib/python3.6/site-packages/torch/nn/modules/linear.py in forward(self, input)
     65     @weak_script_method
     66     def forward(self, input):
---> 67         return F.linear(input, self.weight, self.bias)
     68 
     69     def extra_repr(self):

~/.virtualenvs/deepul/lib/python3.6/site-packages/torch/nn/functional.py in linear(input, weight, bias)
   1350     if input.dim() == 2 and bias is not None:
   1351         # fused op is marginally faster
-> 1352         ret = torch.addmm(torch.jit._unwrap_optional(bias), input, weight.t())
   1353     else:
   1354         output = input.matmul(weight.t())

RuntimeError: Expected object of backend CPU but got backend CUDA for argument #4 'mat1'

`optimizer.zero_grad` called after calculating loss instead of before in `lecture3_flow_models_demos.ipynb`.

In lecture3_flow_models_demos.ipynb, optimizer.zero_grad is called after calculating the loss, i.e.:

model.train()
for x in train_loader:
    x = x.float()
    loss = model.nll(x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In other implementations, the standard is to call optimizer.zero_grad before the forward pass, e.g.:

optimizer.zero_grad()
y = model(x)
loss = criterion(y, y_true)
loss.backward()
optimizer.step()

Moving the call to before the loss calculation breaks the model, and it fails to learn anything. I cannot understand why the model behaves that way.

HW1 solutions Discretized Mixture of Logistics Parameter initialization confusion

Hi,
I was going through hw1_solutions at https://github.com/rll/deepul/blob/master/homeworks/solutions/hw1_solutions.ipynb and got confused by the Discretized Mixture of Logistics implementation.
In the __init__ function of the class, the parameters (means, log_scales, logits) are each initialized in a different way. I couldn't understand how those initializations were decided. For example, why don't we initialize all of them with torch.randn()?
[Screenshot: parameter initialization code from the notebook]
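
For reference, what I would have naively written is something like this (my own sketch with simplified shapes, not the notebook's code), so I am curious what would go wrong with it:

    import torch
    import torch.nn as nn

    # naive alternative: initialize every mixture parameter from a standard normal
    n_mix = 4
    means = nn.Parameter(torch.randn(n_mix))       # component means
    log_scales = nn.Parameter(torch.randn(n_mix))  # log of component scales
    logits = nn.Parameter(torch.randn(n_mix))      # unnormalized mixture weights
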
I will appreciate any help. Thank you for your time.
