Comments (4)
- Yes, I believe the weights need to be accounted for when computing the log probability (using `logsumexp` would probably be the easiest / most numerically stable way).
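For illustration, a minimal sketch of that weighted log-probability; the shapes and names are assumptions based on this thread, not the exact homework code:

```python
import torch
import torch.nn.functional as F

def mixture_log_prob(x, loc, log_scale, weight_logits):
    # Assumed shapes: loc / log_scale / weight_logits are (B, n_components, C, H, W),
    # x is (B, C, H, W).
    dist = torch.distributions.Normal(loc, log_scale.exp())
    log_probs = dist.log_prob(x.unsqueeze(1))          # per-component log N(x | mu_k, sigma_k)
    log_weights = F.log_softmax(weight_logits, dim=1)  # normalized log mixture weights
    # logsumexp over the component axis: log sum_k w_k * N(x | mu_k, sigma_k)
    return torch.logsumexp(log_weights + log_probs, dim=1)
```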
For 4., it is sampling from the learned distribution of `x`. You're right that under this flow formulation (with `z ~ Uniform([0,1])` plus outputting parameters for a MixtureCDF), it's equivalent to optimizing maximum likelihood. So you can sample either from the autoregressive model the normal way, or by sampling `z ~ Uniform([0,1])` and then inverting `f` using bisection (like you mentioned).
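A minimal 1-D sketch of that second option (the helper names here are illustrative, not from the homework; for an autoregressive model you would invert one dimension at a time, conditioning on the dimensions already sampled):

```python
import numpy as np
from scipy.optimize import bisect
from scipy.stats import norm

def mixture_cdf(x, weights, locs, scales):
    # Forward flow for one dimension: z = f(x) = sum_k w_k * NormalCDF(x; mu_k, sigma_k)
    return np.sum(weights * norm.cdf(x, loc=locs, scale=scales))

def sample_x(weights, locs, scales, lo=-10.0, hi=10.0):
    z = np.random.uniform()  # z ~ Uniform([0, 1])
    # f is a CDF, hence monotone, so bisection on a wide enough bracket
    # reliably finds the unique root of f(x) - z = 0.
    return bisect(lambda x: mixture_cdf(x, weights, locs, scales) - z, lo, hi)
```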
Side-note: It is a valid flow, and the terms do relate to the actual Jacobian diagonal terms, since the log-derivative of the CDF is the log of the PDF. Note that if our target `z` distribution were not uniform but, say, normal, then it would no longer be equivalent to maximum likelihood.
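Spelled out (this just restates the side-note; Φ and φ are the standard normal CDF and PDF):

```latex
p_x(x) = p_z(f(x))\,\Bigl|\tfrac{df}{dx}\Bigr|
       = 1 \cdot \frac{d}{dx} \sum_k w_k\,\Phi\!\Bigl(\tfrac{x-\mu_k}{\sigma_k}\Bigr)
       = \sum_k \frac{w_k}{\sigma_k}\,\phi\!\Bigl(\tfrac{x-\mu_k}{\sigma_k}\Bigr)
```

so the log-det term is exactly the mixture log-likelihood; with a non-uniform base, the extra `log p_z(f(x))` term breaks this equivalence.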
I'm not sure what you mean by original distribution
- is that z
or x
? x
is probably the more common one like samples shown in RealNVP / Glow / VideoFlow. But flows have other uses such as learning more complex distributions (i.e. Inverse Autoregressive Flows and Variational Lossy Autoencoder)
Where is the `base_dist` that forces the output of the transformed variables to have a known distribution?
I think I understand this part. Since `z` is uniform, its `log_prob` is a constant. Thus `log p(x) = constant + log_det_jacobian`, so maximizing `log p(x)` is the same as maximizing `log_det_jacobian`.
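To make that concrete, a minimal sketch of the loss under a `Uniform([0,1])` base distribution (the name `nll` follows the thread; the actual homework code may differ):

```python
def nll(log_det_jacobian):
    # z ~ Uniform([0, 1]) has density 1 on its support, so log_prob(z) = 0 and
    # log p(x) = log p(z) + log_det_jacobian reduces to log_det_jacobian alone.
    log_prob_z = 0.0
    log_prob_x = log_prob_z + log_det_jacobian
    return -log_prob_x.mean()
```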
Hi @vinsis - some thoughts on the bugs mentioned above.
- I think `cond` can be used when you have labels, such as digit labels for MNIST. There is an example of this idea in the autoregressive models demo.
- Yes, since log(Uniform) is constant, it is not considered in `nll()`.
- Sampling made no sense to me. What are we sampling here? Is it `z` or `x`? I'm assuming it is `x` we are sampling, because the sampled images look very close to the originals. Had we sampled `z`, it would (should!) look like points sampled from `Unif(0,1) x Unif(0,1)`, but that's not the case here.
-- the function samples from `torch.normal()` instead of `NormalCDF()`. Why? Shouldn't we sample from `MixtureCDFNormal()`? And here we get `z`, not `x` -- to get `x`, we have to implement `invert_flow()` (and therefore `flow()` in the `AutoRegressivePixelCNN` class) and use `scipy.optimize.bisect()` on the `z` obtained previously.
I'm definitely missing something here -- because the output of `demo4_save_results()` is strikingly good.
I think the model is just PixelCNN + a mixture of Gaussians. It's not a "flow" as such. In the loss function, `log_det_jacobian` is not really related to any Jacobian, but is just `nll(MixOfGaussian())`. If this is the case, the sampling makes sense too: there is no `z`.
In general, I am curious if flows are ever used to sample from the original distribution at all! They are good for inference and for plotting `pdf(x)` (`pdf(z) * det J`). Maybe we should look at Inverse Autoregressive Flows?
Pinging @wilson1yan @alexlioralexli for help. Thanks!
The way the log prob is calculated seems faulty to me, @wilson1yan.
After the step `loc, log_scale, weight_logits = torch.chunk(self.forward(x), 3, dim=1)`, each of `loc`, `log_scale` and `weight_logits` has the shape `(batch_size, n_components, num_channels, height, width)`.
But when calculating the log_prob, `x` is manipulated like so:
`x.unsqueeze(1).repeat(1, 1, self.n_components, 1, 1)`
which changes the shape of `x` to `(batch_size, 1, n_components * num_channels, height, width)` (assuming `x` initially had shape `(batch_size, num_channels, height, width)`).
This makes `n_components` multiplied twice, in two dimensions, due to broadcasting. I believe the operation `x.unsqueeze(1)` alone should suffice.
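A quick shape check of the two variants (a standalone sketch with made-up sizes, using `num_channels = 1` as for MNIST):

```python
import torch

B, n_comp, C, H, W = 2, 5, 1, 28, 28
x = torch.rand(B, C, H, W)
loc = torch.zeros(B, n_comp, C, H, W)

# Buggy: repeat duplicates the component count into the channel axis, and
# broadcasting against loc then produces two component dimensions.
bad = x.unsqueeze(1).repeat(1, 1, n_comp, 1, 1)
print((bad - loc).shape)   # torch.Size([2, 5, 5, 28, 28])

# Fixed: a single unsqueeze lets broadcasting align x against each component.
good = x.unsqueeze(1)
print((good - loc).shape)  # torch.Size([2, 5, 1, 28, 28])
```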
I implemented this flow myself and am struggling to get decent results, so I won't be surprised if I missed something elementary.
Edit: Just ran the notebook myself, and `x.unsqueeze(1)` indeed works. Initially I was getting poor results for another reason.