deepul's Issues
HW2 autoregressive flow for images solution issue
I'm pretty sure the log-likelihood for the solution to the second exercise is off. The nll is defined as:
def nll(self, x, cond=None):
    loc, log_scale, weight_logits = torch.chunk(self.forward(x), 3, dim=1)
    weights = F.softmax(weight_logits, dim=1)  # .repeat(1, 1, self.n_components, 1, 1)
    log_det_jacobian = Normal(loc, log_scale.exp()).log_prob(x.unsqueeze(1).repeat(1, 1, self.n_components, 1, 1))
    return -log_det_jacobian.mean()
As you can see, the weights are never used. I believe this should be:
def nll(self, x, cond=None):
    loc, log_scale, weight_logits = torch.chunk(self.forward(x), 3, dim=1)
    weights = F.softmax(weight_logits, dim=1)  # .repeat(1, 1, self.n_components, 1, 1)
    log_det_jacobian = Normal(loc, log_scale.exp()).log_prob(x.unsqueeze(1).repeat(1, 1, self.n_components, 1, 1)).exp()
    return -torch.log((log_det_jacobian * weights).sum(dim=2)).mean()
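Side note: the weighted sum in the proposed fix can also be computed without leaving log space via torch.logsumexp, which is numerically safer than exponentiating the per-component log-probs and multiplying. A minimal sketch, mirroring the tensor layout of the snippet above; the dim arguments are copied from it and are assumptions about the actual shapes (it is worth double-checking that the softmax dimension matches the component dimension being summed over):

import torch
import torch.nn.functional as F
from torch.distributions import Normal

def nll(self, x, cond=None):
    loc, log_scale, weight_logits = torch.chunk(self.forward(x), 3, dim=1)
    log_weights = F.log_softmax(weight_logits, dim=1)  # dim copied from the snippet above
    log_probs = Normal(loc, log_scale.exp()).log_prob(
        x.unsqueeze(1).repeat(1, 1, self.n_components, 1, 1))
    # -E[ log sum_k w_k N(x; loc_k, scale_k) ], evaluated stably in log space
    return -torch.logsumexp(log_probs + log_weights, dim=2).mean()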
Question regarding solutions availability
Hi!
Thank you for an awesome course!
I am not a Berkeley student and have been following the course through lectures on YouTube. I am wondering whether homework solutions will be publicly available on the website?
What is `self.loc` in `MixtureCDFFlow` class?
I am trying to implement the 1D flow as shown in Flow Models Demos (Official).ipynb. I can't figure out what self.loc is in the MixtureCDFFlow class. Also, what does self.n_components denote?
This is the code:
def flow(self, x):
    # set up mixture distribution
    weights = F.softmax(self.weight_logits, dim=0).unsqueeze(0).repeat(x.shape[0], 1)
    mixture_dist = self.mixture_dist(self.loc, self.log_scale.exp())
    x_repeat = x.unsqueeze(1).repeat(1, self.n_components)
    # z = cdf of x
    z = (mixture_dist.cdf(x_repeat) * weights).sum(dim=1)
    # log_det = log dz/dx = log pdf(x)
    log_det = (mixture_dist.log_prob(x_repeat).exp() * weights).sum(dim=1).log()
    return z, log_det
Some help is really appreciated.
Thanks
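For context: judging from how flow() uses them, self.loc and self.log_scale are the learnable means and log standard deviations of the mixture components, self.weight_logits are the unnormalized mixture weights, and self.n_components is the number of components. A minimal sketch of how they might be declared (an assumption about the demo, not the official notebook):

import torch
import torch.nn as nn
from torch.distributions import Normal

class MixtureCDFFlow(nn.Module):
    def __init__(self, n_components=4):
        super().__init__()
        self.n_components = n_components                            # number of mixture components
        self.loc = nn.Parameter(torch.randn(n_components))          # component means
        self.log_scale = nn.Parameter(torch.zeros(n_components))    # component log standard deviations
        self.weight_logits = nn.Parameter(torch.zeros(n_components))  # unnormalized mixture weights
        self.mixture_dist = Normal                                   # component family used in flow()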
Reporting bugs/errors in `lecture3_flow_models_demos.ipynb`
First of all thank you for making all of the lectures and other content public. This is really helpful.
I took a look at the demo implementations for lecture 3 and found some bugs which I am reporting here:
1. In Demo 3, the .flow() method in class ConditionalMixtureCDFFlow(nn.Module) has the following signature:
def flow(self, x, cond):
However, when .flow() is called by the .invert() method, the condition cond is not passed to .flow():
def invert(self, z, cond):
    # Find the exact x via bisection such that f(x) = z
    results = []
    for z_elem in z:
        def f(x):
            # SHOULD PASS `cond` in the line below
            return self.flow(torch.tensor(x).unsqueeze(0))[0] - z_elem
        x = bisect(f, -20, 20)
        results.append(x)
    return torch.tensor(results).reshape(z.shape)
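A minimal sketch of how invert() could forward the condition (a hypothetical fix; it assumes cond can be indexed element-wise alongside z, which may differ from the notebook):

import torch
from scipy.optimize import bisect

def invert(self, z, cond):
    # Find the exact x via bisection such that f(x) = z, now passing cond through
    results = []
    for z_elem, c_elem in zip(z, cond):
        def f(x):
            return self.flow(torch.tensor(x).unsqueeze(0), c_elem.unsqueeze(0))[0] - z_elem
        results.append(bisect(f, -20, 20))
    return torch.tensor(results).reshape(z.shape)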
2. In Demo 4, the .forward() method of MaskConv2d never uses cond or batch_size:
class MaskConv2d(nn.Conv2d):
    def __init__(self, mask_type, *args, **kwargs):
        assert mask_type == 'A' or mask_type == 'B'
        super().__init__(*args, **kwargs)
        self.register_buffer('mask', torch.zeros_like(self.weight))
        self.create_mask(mask_type)

    def forward(self, input, cond=None):
        # batch_size AND cond ARE NEVER USED
        batch_size = input.shape[0]
        out = F.conv2d(input, self.weight * self.mask, self.bias, self.stride,
                       self.padding, self.dilation, self.groups)
        return out
So it has no effect when it is called by the .forward() method of AutoregressiveFlowPixelCNN like so:
if isinstance(layer, MaskConv2d):
    out = layer(out, cond=cond)
else:
    out = layer(out)
3. In Demo 4, the .nll() method of AutoregressiveFlowPixelCNN neither takes the exponential of log_prob nor uses weights when calculating log_det_jacobian:
loc, log_scale, weight_logits = torch.chunk(self.forward(x), 3, dim=1)
weights = F.softmax(weight_logits, dim=1) #.repeat(1, 1, self.n_components, 1, 1)
log_det_jacobian = Normal(loc, log_scale.exp()).log_prob(x.unsqueeze(1).repeat(1,1,self.n_components,1,1))
return -log_det_jacobian.mean()
I think it should be something like:
log_det_jacobian = Normal(loc, log_scale.exp()).log_prob(x.unsqueeze(1).repeat(1,1,self.n_components,1,1)).exp() * weights
I actually have lots of questions about why .nll() is implemented the way it is. Why the need for .unsqueeze(1).repeat(...) rather than just multiplying the standard way? Where is the base_dist that forces the output of the transformed variables to have a known distribution? Looking at the .sample() method it seems the weights are used to select a mean and var for a sample, but how are they learnt? Regardless of how they are being used, the weights are not used in the .nll() function and thus should not be calculated there.
Please let me know if I am missing something here.
HW1 MADE | possible flaws in solution implementation
Hi,
first of all thank you for the course and provided solutions.
There may be a few flaws in the MADE implementation, but I could be wrong.
for l in range(num_hidden):
    self.m[l] = np.random.randint(
        self.m[l - 1].min(), self.nin - 1, size=self.hidden_sizes[l]
    )
When you initialize the random connectivity numbers for the masks, it seems better to use a permutation; otherwise we may zero out more connections than necessary. For example, with a very low probability we could draw the next m with all entries equal.
Also, as I understand it, the number of unique values in each m should be at least nin - 1, and therefore len(m) should not be smaller than nin - 1. It might be better to add an assertion for this.
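To make the connectivity constraint concrete: in MADE each hidden unit k in layer l gets a degree m^l(k), and the weight from unit k' in layer l-1 is kept only if m^l(k) >= m^{l-1}(k') (strictly greater for the output layer). A minimal sketch of the mask construction with 0-indexed degrees; the randint call mirrors the notebook, while the names (build_masks, hidden_sizes, ordering) are placeholders, not the repo's API:

import numpy as np

def build_masks(nin, hidden_sizes, ordering=None, seed=0):
    rng = np.random.RandomState(seed)
    m = {}
    m[-1] = ordering if ordering is not None else np.arange(nin)  # input degrees
    for l, h in enumerate(hidden_sizes):
        # drawing with randint (as in the notebook) can, with low probability,
        # produce repeated or missing degrees, which zeroes out more weights
        # than necessary -- the permutation-style scheme suggested above avoids that
        m[l] = rng.randint(m[l - 1].min(), nin - 1, size=h)
    # hidden masks: keep weight (k', k) iff m[l][k] >= m[l-1][k']
    masks = [(m[l - 1][:, None] <= m[l][None, :]).astype(np.float32)
             for l in range(len(hidden_sizes))]
    # output mask: strictly greater, so output d only depends on inputs with smaller degree
    masks.append((m[len(hidden_sizes) - 1][:, None] < m[-1][None, :]).astype(np.float32))
    return masks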
Again,
I may be wrong and thank you for your great work.
Bug in MADE sampling implementation in 'hw1_solutions.ipynb'
//Implementation provided in the notebook
Suppose there are three variables x1, x2, x3 and the ordering given is 3, 1, 2. Then during sampling:
I) The desired sampling order should be x2, x3, x1.
II) The code above samples in the order x3, x1, x2.
This bug is noticeable when a random permutation is passed as the ordering in function q2_b (passing ordering = None doesn't expose the bug):
model = MADE((1, H, W), 2, hidden_size=[512, 512], ordering=np.random.permutation(H*W)).cuda()
Result with the current bug on the shape and MNIST samples (when the ordering passed is a random permutation).
So slide 38 of Lecture 2 also needs to be edited: a random permutation doesn't generate samples as bad as those shown in the slides; it's the bug in the code.
https://docs.google.com/presentation/d/1xl4KKNYw08PatORSnFDyzX6MT98brEdYElZKZ5bi2YM/edit#slide=id.g7d02b18e4d_0_191
ActNorm implementation missing division by `std` on the shift parameter
Hi,
Thanks for making the video lectures and homework public. I'm really enjoying the course so far. I was going through homework 2 and wanted to compare my stuff with the solutions. For the solution of hw2, I found the following implementation of ActNorm
class ActNorm(nn.Module):
    def __init__(self, n_channels):
        super(ActNorm, self).__init__()
        self.log_scale = nn.Parameter(torch.zeros(1, n_channels, 1, 1), requires_grad=True)
        self.shift = nn.Parameter(torch.zeros(1, n_channels, 1, 1), requires_grad=True)
        self.n_channels = n_channels
        self.initialized = False

    def forward(self, x, reverse=False):
        if reverse:
            return (x - self.shift) * torch.exp(-self.log_scale), self.log_scale
        else:
            if not self.initialized:
                self.shift.data = -torch.mean(x, dim=[0, 2, 3], keepdim=True)
                self.log_scale.data = -torch.log(
                    torch.std(x.permute(1, 0, 2, 3).reshape(self.n_channels, -1), dim=1).reshape(1, self.n_channels, 1, 1))
                self.initialized = True
            result = x * torch.exp(self.log_scale) + self.shift
            return x * torch.exp(self.log_scale) + self.shift, self.log_scale
I think the shift needs to be divided by the standard deviation, as follows, for the activations to be normalized (note that with this change, log_scale must be initialized before shift):
self.shift.data = -(torch.mean(x, dim=[0, 2, 3], keepdim=True) * torch.exp(self.log_scale))
Let me know if I'm missing something.
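A quick numerical sanity check of the proposed fix (a standalone sketch, not code from the repo): if log_scale is set to -log(std(x)) per channel and shift to -mean(x) * exp(log_scale), the forward map x * exp(log_scale) + shift comes out with roughly zero mean and unit standard deviation per channel.

import torch

x = torch.randn(8, 3, 4, 4) * 2.0 + 5.0  # toy activations with non-trivial mean/std
log_scale = -torch.log(
    x.permute(1, 0, 2, 3).reshape(3, -1).std(dim=1)).reshape(1, 3, 1, 1)
shift = -torch.mean(x, dim=[0, 2, 3], keepdim=True) * torch.exp(log_scale)
y = x * torch.exp(log_scale) + shift
print(y.mean(dim=[0, 2, 3]))                       # ~0 per channel
print(y.permute(1, 0, 2, 3).reshape(3, -1).std(dim=1))  # ~1 per channel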
Flows demo fails because of missing `.to(ptu.device)`
The AutoregressiveFlow and RealNVP cells fail with the error message copied at the end of this message. I ran all the cells sequentially from the beginning.
If I add real_nvp = real_nvp.to(ptu.device), all works fine:
real_nvp = RealNVP([AffineTransform("left", n_hidden=2, hidden_size=64),
                    AffineTransform("right", n_hidden=2, hidden_size=64),
                    AffineTransform("left", n_hidden=2, hidden_size=64),
                    AffineTransform("right", n_hidden=2, hidden_size=64)],
                   train_loader.dataset, 'moons', train_labels)
real_nvp = real_nvp.to(ptu.device)  # <-- ADDED THIS LINE
train_losses, test_losses = train_epochs(real_nvp, train_loader, test_loader, dict(epochs=250, lr=5e-3, epochs_to_plot=[0, 3, 6, 10, 25, 249]))
Error messages:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-54-8b25cbd7d1cc> in <module>
1 ar_flow = AutoregressiveFlow(train_loader.dataset, 'moons', train_labels)
----> 2 train_losses, test_losses = train_epochs(ar_flow, train_loader, test_loader, dict(epochs=100, lr=5e-3, epochs_to_plot=[0, 1, 3, 6, 10, 99]))
<ipython-input-41-5d16b72566e2> in train_epochs(model, train_loader, test_loader, train_args)
37 for epoch in tqdm_notebook(range(epochs), desc='Epoch', leave=False):
38 model.train()
---> 39 train(model, train_loader, optimizer)
40 train_loss = eval_loss(model, train_loader)
41 train_losses.append(train_loss)
<ipython-input-41-5d16b72566e2> in train(model, train_loader, optimizer)
4 for x in train_loader:
5 x = x.to(ptu.device).float()
----> 6 loss = model.nll(x)
7 optimizer.zero_grad()
8 loss.backward()
<ipython-input-52-b941007c468c> in nll(self, x)
100
101 def nll(self, x):
--> 102 return - self.log_prob(x).mean()
103
104 def plot(self, title):
<ipython-input-52-b941007c468c> in log_prob(self, x)
96
97 def log_prob(self, x):
---> 98 z, log_det = self.flow(x)
99 return (self.base_dist.log_prob(z) + log_det).sum(dim=1) # shape: [batch_size, dim]
100
<ipython-input-52-b941007c468c> in flow(self, x)
92 x1, x2 = torch.chunk(x, 2, dim=1)
93 z1, log_det1 = self.dim1_flow.flow(x1.squeeze())
---> 94 z2, log_det2 = self.dim2_flow.flow(x2, cond=x1)
95 return torch.cat([z1.unsqueeze(1), z2.unsqueeze(1)], dim=1), torch.cat([log_det1.unsqueeze(1), log_det2.unsqueeze(1)], dim=1)
96
<ipython-input-52-b941007c468c> in flow(self, x, cond)
33 def flow(self, x, cond):
34 # parameters of flow on x depend on what it's conditioned on
---> 35 loc, log_scale, weight_logits = torch.chunk(self.mlp(cond), 3, dim=1)
36 weights = F.softmax(weight_logits)
37
~/.virtualenvs/deepul/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
487 result = self._slow_forward(*input, **kwargs)
488 else:
--> 489 result = self.forward(*input, **kwargs)
490 for hook in self._forward_hooks.values():
491 hook_result = hook(self, input, result)
<ipython-input-52-b941007c468c> in forward(self, x)
12
13 def forward(self, x):
---> 14 return self.layers(x)
15
16 # same CDF flow as in Demo 1, but conditioned on an auxillary variable
~/.virtualenvs/deepul/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
487 result = self._slow_forward(*input, **kwargs)
488 else:
--> 489 result = self.forward(*input, **kwargs)
490 for hook in self._forward_hooks.values():
491 hook_result = hook(self, input, result)
~/.virtualenvs/deepul/lib/python3.6/site-packages/torch/nn/modules/container.py in forward(self, input)
90 def forward(self, input):
91 for module in self._modules.values():
---> 92 input = module(input)
93 return input
94
~/.virtualenvs/deepul/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
487 result = self._slow_forward(*input, **kwargs)
488 else:
--> 489 result = self.forward(*input, **kwargs)
490 for hook in self._forward_hooks.values():
491 hook_result = hook(self, input, result)
~/.virtualenvs/deepul/lib/python3.6/site-packages/torch/nn/modules/linear.py in forward(self, input)
65 @weak_script_method
66 def forward(self, input):
---> 67 return F.linear(input, self.weight, self.bias)
68
69 def extra_repr(self):
~/.virtualenvs/deepul/lib/python3.6/site-packages/torch/nn/functional.py in linear(input, weight, bias)
1350 if input.dim() == 2 and bias is not None:
1351 # fused op is marginally faster
-> 1352 ret = torch.addmm(torch.jit._unwrap_optional(bias), input, weight.t())
1353 else:
1354 output = input.matmul(weight.t())
RuntimeError: Expected object of backend CPU but got backend CUDA for argument #4 'mat1'
Where is HW 4?
Hi! I was wondering where HW4 is in this repository.
`optimizer.zero_grad` called after calculating loss instead of before in `lecture3_flow_models_demos.ipynb`.
In lecture3_flow_models_demos.ipynb, `optimizer.zero_grad` is called after calculating the loss, i.e.:
model.train()
for x in train_loader:
    x = x.float()
    loss = model.nll(x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
In other implementations, the standard is to call optimizer.zero_grad before the forward propagation, e.g.:
optimizer.zero_grad()
y = model(x)
loss = criterion(y, y_true)
loss.backward()
optimizer.step()
Changing the implementation so that zero_grad is called before calculating the loss breaks the model, and it fails to learn anything. I cannot understand why the model behaves that way.
HW1 solutions Discretized Mixture of Logistics Parameter initialization confusion
Hi,
I was going through hw1_solutions at https://github.com/rll/deepul/blob/master/homeworks/solutions/hw1_solutions.ipynb and I got confused at the Discretized Mixture of Logistics implementation.
In the __init__ function of the class, the parameters (means, log_scales, logits) are all initialized in different ways. I couldn't understand how those initializations were decided. For example, why don't we declare all of them with torch.randn()?
I would appreciate any help. Thank you for your time.
Is log_det in preprocess function useful? HW2
There are no parameters in the preprocess function, so I am wondering whether calculating log_det in the preprocess function is useful for training.
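For context, the Jacobian of a parameter-free preprocess does not affect any gradients, but it does change the value of the log-likelihood being reported, so it is still needed for correct bits/dim numbers on the original data. A minimal sketch of a typical dequantize-and-logit preprocess and its per-example log-determinant (an illustration under those assumptions; the exact preprocess used in HW2 may differ):

import math
import torch

def preprocess(x, n_bits=8, alpha=0.05):
    # uniform dequantization to [0, 1), then squash away from the boundaries {0, 1}
    x = (x + torch.rand_like(x)) / 2 ** n_bits
    s = alpha + (1 - 2 * alpha) * x
    z = torch.log(s) - torch.log(1 - s)  # logit transform
    # log |dz/dx| per pixel: the affine terms are constants, the logit term depends on s
    log_det = (math.log(1 - 2 * alpha) - n_bits * math.log(2)
               - torch.log(s) - torch.log(1 - s))
    return z, log_det.flatten(start_dim=1).sum(dim=1)  # one log-det per example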