probabilists / zuko


Normalizing flows in PyTorch

Home Page: https://zuko.readthedocs.io

License: MIT License

Python 100.00%
deep-learning density-estimation generative-model normalizing-flows probability torch

zuko's People

Contributors

adrianjav, felixdivo, francois-rozet, marpogaus, psteinb, simonschnake


zuko's Issues

It seems the `cdf()` in `DiagNormal` has not been implemented?

Running the following lines

import zuko
model = zuko.flows.GF(3)
print(model.base().cdf(0.))

I got a NotImplementedError in torch\distributions\distribution.py.

Additionally, I wanted to truncate the base distribution in flows.GF, but I could not manage it.

import zuko
model = zuko.flows.GF(3)
model.base = zuko.distributions.Truncated(model.base(), -1., 1.)
samples = model().rsample((32,))
print(samples)

Running the code above I got

File "D:\Program Files\Python3.11.4\Lib\site-packages\zuko\distributions.py", line 458, in __init__
    assert not base.event_shape, "'base' has to be univariate"
AssertionError: 'base' has to be univariate

I am new to normalizing flows, so I don't know how to correct this. Any help is appreciated.
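
For reference, here is a minimal sketch of one possible per-dimension workaround. It assumes that zuko.distributions.Joint (or a similar combinator of univariate marginals) is available; if it is not, treat this as pseudocode.

import torch
import zuko

# Truncated only accepts a univariate base (per the assertion above), so truncate
# each dimension separately and combine the marginals into a multivariate base.
marginal = zuko.distributions.Truncated(torch.distributions.Normal(0.0, 1.0), -1.0, 1.0)
base = zuko.distributions.Joint(marginal, marginal, marginal)  # 3 features

print(base.sample((4,)).shape)  # expected: torch.Size([4, 3])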

Does NSF use coupling layers or autoregressive layers?

Description

I'm a bit confused about the NSF module of the package. Does it use autoregressive transforms or coupling layers (as in the original NSF paper)? NSF is significantly faster at log_prob than at sampling, so I'm guessing it uses autoregressive layers. If so, are coupling layers currently implemented in the toolbox?

Reproduce

import torch
import zuko
import time

dim_density = 200

x = torch.randn(dim_density)
y = torch.randn(5)

# Neural spline flow (NSF) with 1 transformation
flow = zuko.flows.NSF(dim_density, 5, transforms=1, hidden_features=[128] * 2)

# Sample 64 points x ~ p(x | y)
start_time = time.time()
x = flow(y).sample((64,))
print(time.time() - start_time)  # takes about 1 second

# Evaluate log p(x | y)
start_time = time.time()
log_p = flow(y).log_prob(x)
print(time.time() - start_time)  # takes about 0.014 seconds

Thanks a lot for your help!

Add covariance_type option to GMM class

Description

Currently, the zuko.flows.mixture.GMM class only supports full covariance matrices. However, there are a number of use cases (especially high-dimensional ones) where a full covariance matrix is either not needed or infeasible to estimate. This issue proposes adding the option to choose between different covariance matrix types, similar to sklearn.mixture.GaussianMixture.

Here is an example of how different covariance types approximate a mixture of 3 Gaussians with varying covariance matrices.

[Figure: different covariance types approximating a mixture of 3 Gaussians with varying covariance matrices]
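
As a rough illustration only (using plain torch.distributions rather than the zuko GMM API), a "diag" covariance type amounts to parameterizing per-component standard deviations instead of full scale matrices:

import torch
from torch.distributions import Categorical, Independent, MixtureSameFamily, MultivariateNormal, Normal

K, D = 3, 2  # components, features
logits = torch.zeros(K)
means = torch.randn(K, D)

# full covariance: K lower-triangular D x D scale matrices
scale_tril = torch.diag_embed(torch.ones(K, D))
gmm_full = MixtureSameFamily(Categorical(logits=logits), MultivariateNormal(means, scale_tril=scale_tril))

# diagonal covariance: only K x D standard deviations
stds = torch.ones(K, D)
gmm_diag = MixtureSameFamily(Categorical(logits=logits), Independent(Normal(means, stds), 1))

x = torch.randn(5, D)
print(gmm_full.log_prob(x).shape, gmm_diag.log_prob(x).shape)  # torch.Size([5]) twice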

Implementation

The current structure of the zuko.flows.mixture.GMM class makes it very easy to add the above-mentioned enhancement. I have implemented the changes in a fork of the repository and could open a pull request if this change is wanted. I have only tested the code for the unconditional case, but I do not see how adding context features could break it.

Further improvements

When generating the above figure, I (again) realised how easily mode collapse happens for GMMs. The zuko.flows.mixture.GMM class could therefore also benefit from some sort of initialisation procedure, again similar to sklearn.mixture.GaussianMixture. I fully understand if that goes beyond the scope of what Zuko wants to achieve. The benefit is that Zuko is very convenient to use and ties in so well with PyTorch code that having such a procedure here could be nice. However, it might add another dependency (e.g., sklearn) if you want to use existing implementations of initialisation algorithms. I have a basic implementation of this (using sklearn) lying around and would be happy to polish it up and make another commit if this is wanted.
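
For concreteness, a rough sketch of the kind of initialisation meant here (with sklearn as an optional dependency; how it would hook into the GMM class is left open):

import torch
from sklearn.cluster import KMeans

def kmeans_means(data: torch.Tensor, n_components: int) -> torch.Tensor:
    # fit k-means on the data and use the cluster centres as initial component means
    km = KMeans(n_clusters=n_components, n_init=10).fit(data.numpy())
    return torch.as_tensor(km.cluster_centers_, dtype=data.dtype)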

Add support for custom adjacency matrices in autoregressive models

Description

I would like to enable autoregressive models (that is, MAF, NSF, and NAF) to use a specific adjacency matrix (rather than an ordering), which could be sparser than the triangular matrix obtained from the ordering. If an adjacency matrix is provided, all layers would use it for the masking, instead of using random permutations between layers.

The motivation for this feature is that adding extra structure to normalizing flows can sometimes be really useful in practice. Last year, we published a paper showing the benefits of such an architecture in the context of causal generative models, and our source code actually adapted Zuko to use adjacency matrices for this purpose.

Since the required changes are small, I thought I could make a pull request with this feature so that we can reuse Zuko for future projects with much more ease (the version from our repository is based on Zuko v0.2.2).

Implementation

The implementation would follow the one from our repository. It basically consists of adding the adjacency matrix as another parameter of the constructor and using it, if provided, instead of building the matrix from the ordering (see the sketch below).
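
A minimal sketch of the idea (not Zuko's actual masking code): the triangular dependency pattern induced by an ordering is a special case of an arbitrary adjacency matrix, so the mask construction can accept either.

import torch

def mask_from_ordering(order: torch.Tensor) -> torch.Tensor:
    # output i may depend on input j iff order[j] < order[i]
    return order[:, None] > order[None, :]

def mask_from_adjacency(adjacency: torch.Tensor) -> torch.Tensor:
    # output i may depend on input j iff adjacency[i, j] is True (no self-dependency)
    return adjacency & ~torch.eye(adjacency.shape[0], dtype=torch.bool)

print(mask_from_ordering(torch.arange(3)).int())
# sparser custom structure: features 1 and 2 each depend only on feature 0
adjacency = torch.tensor([[0, 0, 0], [1, 0, 0], [1, 0, 0]], dtype=torch.bool)
print(mask_from_adjacency(adjacency).int())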

I have a pull request ready, which I will link to this issue after submitting it.

Alternatives

I did not think of any alternatives, since: i) the functionality is already implemented, and the only real barrier is that it was not a parameter of the constructor; and ii) we have a working and tested implementation.

Batch normalization

I've found the Zuko library to be extremely beneficial for my work, and I sincerely appreciate the effort that has gone into its development. In the Masked Autoregressive Flow paper (NeurIPS 2017), the authors incorporated batch normalization after each autoregressive layer. Could this modification be integrated into the MaskedAutoregressiveTransform class?
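
For reference, a rough sketch (not part of Zuko) of the batch-normalization bijection from the MAF paper, written as a torch Transform with externally provided (e.g. running) statistics; wiring it into a flow and updating the statistics during training are the open questions.

import torch
from torch.distributions import Transform, constraints

class BatchNormTransform(Transform):
    # y = exp(log_gamma) * (x - mean) / sqrt(var + eps) + beta, applied elementwise
    domain = constraints.real
    codomain = constraints.real
    bijective = True

    def __init__(self, mean, var, log_gamma, beta, eps=1e-5):
        # mean, var, log_gamma, beta: tensors of shape (features,)
        super().__init__()
        self.mean, self.var, self.eps = mean, var, eps
        self.log_gamma, self.beta = log_gamma, beta

    def _call(self, x):
        return (x - self.mean) / torch.sqrt(self.var + self.eps) * self.log_gamma.exp() + self.beta

    def _inverse(self, y):
        return (y - self.beta) * self.log_gamma.neg().exp() * torch.sqrt(self.var + self.eps) + self.mean

    def log_abs_det_jacobian(self, x, y):
        return (self.log_gamma - 0.5 * torch.log(self.var + self.eps)).expand_as(x)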

Requested flow architectures

Description

This issue tracks the flow/transformation architectures that are requested and/or implemented. You are welcome to request new architectures. If you wish to contribute to Zuko, this list is the perfect place to start.

Typically, implementing a new flow architecture consists of adding a new transformation in zuko/transforms.py and a new class in the zuko/flows/ directory. NSF and SOSPF are great examples. Sometimes, a special LazyTransform needs to be implemented to take into account the specificities of the architecture; NAF and GF are such examples.

List of requested flows

Add Glow-like multi-scale flow

Description

Glow is a multi-scale normalizing flow based on affine coupling transforms, introduced in Glow: Generative Flow with Invertible 1x1 Convolutions (Kingma et al., 2018).

Implementation

Due to its multi-scale nature, it is hard to translate the official implementation of Glow into a base distribution and a series of transformations. In particular, the shape and number of features change along the flow.

A first solution would be to define a MultiScaleNormalizingFlow distribution that would handle the removal of features in the log_prob method. Another solution would be to define a non-bijective transformation that would drop features in the forward direction and add random features in the inverse direction.
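
In either case, the quantity to compute is the same: with $z_k$ the features factored out after scale $k$ and $J_\ell$ the Jacobians of the successive transformations, the multi-scale log-density decomposes as

$$\log p(x) = \sum_{k} \log p(z_k) + \sum_{\ell} \log \left| \det J_\ell \right| .$$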

Remove star imports from init modules

Description

Hello @francois-rozet,

This should take up our discussion on the usage of star imports in #38:

Using star imports in Python is very convenient and saves you from typing long module names repeatedly if you need to use multiple entities from a module.
However, from my experience, star imports can cause naming conflicts if two modules have entities with the same name. Importing everything from a module can also make it difficult to determine which specific entities are being used in the code, which makes the code more error-prone and harder to debug. Additionally, linters like ruff can't warn you about unused imports when star imports are used.

Hence, I would suggest removing them from the init modules for the reasons mentioned above, at the cost of slightly higher maintenance.

Implementation

I originally eliminated the use of all star imports in commit 08d3f43.
If desired, I could cherry-pick zuko/flows/__init__.py from commit 08d3f43 and remove the ignores from pyproject.toml in a new PR.

Alternatives

Leave everything as it is now.

Add circular spline flow

Description

The circular spline (CS) transformation was introduced in Normalizing Flows on Tori and Spheres (Rezende et al., 2020). It defines a rational-quadratic spline on the half-open interval $[0, 2\pi[$. Several CS transformations can be combined to obtain expressive autoregressive flows on tori.

Implementation

The authors do not provide an official implementation, but the MonotonicRQSTransform should be easy to adapt. To improve expressiveness, spline transformations should be mixed with phase translations $(\theta + \pi) \bmod 2\pi$. The prior of the neural circular spline flow (NCSF) should be uniform.

Note that the CS transformation can equivalently be defined over the half-open interval $[-\pi, \pi[$.
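
A tiny sketch of the phase translation mentioned above (not an existing Zuko transform): it is volume-preserving on $[0, 2\pi[$, so its log-determinant is zero, and for a shift of $\pi$ it is its own inverse.

import math
import torch

def phase_shift(theta: torch.Tensor, shift: float = math.pi) -> torch.Tensor:
    # circular translation on the interval [0, 2*pi[
    return torch.remainder(theta + shift, 2 * math.pi)

theta = 2 * math.pi * torch.rand(5)
assert torch.allclose(phase_shift(phase_shift(theta)), theta, atol=1e-6)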

Sampling from a NAF gives inf

Description

Thank you for the package, which is useful for my current internship with @plcodrigues. However, I am encountering an issue and would like to know if I am using your package correctly.
I am trying to reproduce the code from D. Ward et al., "Robust Neural Posterior Estimation and Statistical Model Criticism", and at one point I need to train an unconditional NAF on a two-dimensional training set. Once trained, however, sampling from the trained flow often produces 'inf'.

Reproduce

I found out this code reproduces the issue:

import torch
import zuko

train_set = torch.distributions.Normal(0,25).sample((10_000,2))
train_set = (train_set - train_set.mean(0))/train_set.std(0)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=256, shuffle=True)
flow = zuko.flows.NAF(features=2,context=0) #Unconditional flow
optimizer = torch.optim.AdamW(flow.parameters(), lr=1e-3)

for x in train_loader:
    loss = -flow().log_prob(x)
    loss = loss.mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(flow().sample((10_000,)).isinf().any())

Environment

  • Zuko version: 0.2.0
  • PyTorch version: 2.0.0+cu117
  • Python version: 3.8.16
  • OS: Windows 10

NAF reported log-likelihood values are excessively high

Description

Now that I am revisiting my experience with Zuko, I have remembered that we did not use NAF in our work since, during testing, we highly suspected that there was a bug in NAF and did not feel confident using it.

This is how we discovered it (back in Zuko 0.2.0; maybe it is fixed by now): we had different synthetic experiments for which we knew the ground-truth data-generating process, so we could compute the real log-likelihood of the data. When we fitted a NAF model on data from this process, the log-likelihoods reported by NAF were orders of magnitude larger than the ground-truth values, which cannot happen in expectation since the KL divergence is always non-negative. This behaviour was not observed with MAF or NSF, which reported values slightly lower than the true log-likelihoods.
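
For clarity, the argument is that for data $x \sim p$ and a fitted model $q_\theta$,

$$\mathbb{E}_{p}[\log p(x)] - \mathbb{E}_{p}[\log q_\theta(x)] = \mathrm{KL}(p \,\|\, q_\theta) \geq 0,$$

so, up to finite-sample noise, the average log-likelihood reported by the model cannot exceed the average ground-truth log-likelihood.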

(I will update the issue when I find time, but I guess it is useful to raise it now despite being incomplete.)

Reproduce

TBD, but it can be tested by substituting MAF by NAF in any of the synthetic experiments of this repository.

Expected behavior

The log-likelihoods computed by the flow should be lower (given enough data) than those of the model that generated the data.

Causes and solution

I am uncertain what the reason could be, but it should be somewhere in the log-likelihood computation.

Environment

  • Zuko version: 0.2.0
  • PyTorch version: 1.12.0
  • Python version: 3.8
  • OS: Ubuntu 22.04

Initialization takes very long

Description

Thanks a lot for the package, the API is lovely! Unfortunately, instantiating flows in high dimensions takes a very long time. Is there a known reason for this, and is there a simple trick to speed it up? Thanks a lot for your help!

Reproduce

import torch
import zuko

dim_density = 1000

x = torch.randn(dim_density)
y = torch.randn(5)

flow = zuko.flows.NSF(dim_density, 5, transforms=1, hidden_features=[128] * 2)

This code takes about 3 minutes to run. The following is instant in nflows:

from nflows import transforms, distributions, flows

dim_density = 1000

transform = transforms.CompositeTransform([
    transforms.MaskedPiecewiseRationalQuadraticAutoregressiveTransform(
        features=dim_density, 
        hidden_features=128,
        tails="linear",
        tail_bound=3.0,
    ),
    transforms.RandomPermutation(features=dim_density)
])
base_distribution = distributions.StandardNormal(shape=[dim_density])
flow = flows.Flow(transform=transform, distribution=base_distribution)

Environment

  • Zuko version: cutting edge github version
  • PyTorch version: '1.12.0+cu102'
  • Python version: 3.9.5
  • OS: Ubuntu 20.04

Allow vanilla `Transform` in `FlowModule`

Description

Using the library would be simpler if one could easily add existing transforms such as torch.distributions.ExpTransform(), like this:

myflow.transforms.insert(0, ExpTransform())

or by directly passing them to the FlowModule constructor.

Implementation

This initializer should accept more types:

https://github.com/francois-rozet/zuko/blob/4fba6620f06e771a98e01e980d621e9889858f73/zuko/flows.py#L80

One possibility would be to change this line:

https://github.com/francois-rozet/zuko/blob/4fba6620f06e771a98e01e980d621e9889858f73/zuko/flows.py#L97

to something like

transform = ComposedTransform(*(
    # Only perform conditioning if we do not have a transform already
    t if isinstance(t, Transform) else t(y)
    for t in self.transforms
))

Alternatives

One can wrap it into a zuko.flows.Unconditional as shown here, or use zuko.flows.Unconditional(lambda _: transform), but this has several drawbacks:

  1. If the transform has learnable parameters, they are not registered when calling my_flow.parameters(). This was my main pain point. You can register the transform in a separate, otherwise unused attribute of the flow to circumvent this, but it feels more like a hack.
  2. It bloats the code and is kind of unexpected to new users.
  3. Depending on how the callback is implemented, this might re-create many transforms that do not really need to be re-created. I am unsure whether this is an actual performance issue, but there are probably transforms that you do not want to re-create too often.

Add Extrapolation to Bernstein Polynomial Flow

Description

As discussed in #32, we have now implemented linear extrapolation outside the bounds of the Bernstein polynomial.
The feature becomes active if linear=True.

Here is a simple Python script to visualize the resulting effect:

# %% Imports
import torch

from matplotlib import pyplot as plt
from zuko.transforms import BernsteinTransform

# %% Globals
M = 10
batch_size = 10
torch.manual_seed(1)
theta = torch.rand(size=(M,)) * 500  # creates a random parameter vector

# %% Sigmoid
bpoly = BernsteinTransform(theta=theta, linear=False)

x = torch.linspace(-15, 15, 2000)
y = bpoly(x)

adj = bpoly.log_abs_det_jacobian(x, y).detach()
J = torch.diag(torch.autograd.functional.jacobian(bpoly, x)).abs().log()

# %% Plot
fig, axs = plt.subplots(2, sharex=True)
fig.suptitle("Bernstein polynomial with Sigmoid")
axs[0].plot(x, y, label="Bernstein polynomial")
axs[0].scatter(
    torch.linspace(-10, 10, bpoly.order + 1),
    bpoly.theta.numpy().flatten(),
    label="Bernstein coefficients",
)
axs[0].legend()
axs[1].plot(x, adj, label="ladj")
# axs[1].scatter(
#     torch.linspace(-10, 10, bpoly.order),
#     bpoly.dtheta.numpy().flatten(),
#     label="dtheta",
# )
axs[1].plot(x, J, label="ladj (autograd)")
axs[1].legend()
fig.tight_layout()
fig.savefig("sigmoid.png")

[Figure: sigmoid.png (Bernstein polynomial with sigmoid)]

# %% Extrapolation
bpoly = BernsteinTransform(theta=theta, linear=True)

x = torch.linspace(-15, 15, 2000)
y = bpoly(x)

adj = bpoly.log_abs_det_jacobian(x, y).detach()
J = torch.diag(torch.autograd.functional.jacobian(bpoly, x)).abs().log()

# %% Plot

fig, axs = plt.subplots(2, sharex=True)
fig.suptitle("Bernstein polynomial with linear extrapolation")
axs[0].plot(x, y, label="Bernstein polynomial")
axs[0].scatter(
    torch.linspace(-10, 10, bpoly.order + 1),
    bpoly.theta.numpy().flatten(),
    label="Bernstein coefficients",
)
axs[0].legend()
axs[1].plot(x, adj, label="ladj")
# axs[1].scatter(
#     torch.linspace(-10, 10, bpoly.order),
#     bpoly.dtheta.numpy().flatten(),
#     label="dtheta",
# )
axs[1].plot(x, J, label="ladj (autograd)")
axs[1].legend()
fig.tight_layout()
fig.savefig("linear.png")

[Figure: linear.png (Bernstein polynomial with linear extrapolation)]

This makes the BPF implementation more robust to data lying outside the domain of the Bernstein polynomial, without the need for the non-linear sigmoid function.

@oduerr Do you have anything else to add?

Implementation

The implementation can be found in the bpf_extrapolation branch of my fork.

My changes specifically include

  1. Optional linear extrapolation in the call method
  2. A custom implementation of log_abs_det_jacobian, since the gradient does not seem to pass through the torch.where statement, and to improve numerical stability in the sigmoidal case.

Add dropout to mitigate overfitting

Description

Hello! Thanks for all of the work in this library. I was wondering if there was a straightforward way of including dropout for some of the neural networks for conditional flows. I am currently implementing an application that uses them and I am having some trouble dealing with overfitting.

Implementation

Not sure exactly how to go about it here.
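
As a generic illustration only (this is not Zuko's API), dropout is typically inserted between the hidden layers of the conditioner network; how to expose such an option through Zuko's flow constructors is the open question here.

import torch.nn as nn

def mlp_with_dropout(in_features: int, out_features: int, hidden=(64, 64), p=0.1) -> nn.Sequential:
    # a plain hypernetwork-style MLP with dropout after each hidden activation
    layers, prev = [], in_features
    for width in hidden:
        layers += [nn.Linear(prev, width), nn.ReLU(), nn.Dropout(p)]
        prev = width
    layers.append(nn.Linear(prev, out_features))
    return nn.Sequential(*layers)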

Add sum-of-squares polynomial flow

Description

The sum-of-squares polynomial flow is an interpretable and universal normalizing flow introduced in Sum-of-Squares Polynomial Flow (Jaini et al., 2019). The idea is to model the first derivative of each transformation as a sum of squared polynomials, thereby enforcing monotonicity.
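
Concretely, each univariate transformation takes the form (notation here is only illustrative)

$$T(x) = c + \int_0^x \sum_{k=1}^{K} \left( \sum_{l=0}^{L} a_{k,l}\, u^l \right)^2 du,$$

which is non-decreasing in $x$ because the integrand is a sum of squares.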

Implementation

The authors do not provide an official implementation. The transformation is not analytically invertible, but its inverse can be approximated using the bisection method if the features are bounded. The transformation could inherit from UnconstrainedMonotonicTransform.

Understanding SoftclipTransform

Description

I'm trying to limit the output domain of a normalizing flow. I noticed that there is a zuko.transforms.SoftclipTransform transformation that could be useful for my case.

Reproduce

This is what I have tried:

class MyFlow(zuko.flows.FlowModule):
    def __init__(
        self,
        features: int,
        context: int = 0,
        transforms: int = 3,
        randperm: bool = False,
        **kwargs,
    ):
        orders = [
            torch.arange(features),
            torch.flipud(torch.arange(features)),
        ]

        transforms = [
            zuko.flows.MaskedAutoregressiveTransform(
                features=features,
                context=context,
                order=torch.randperm(features) if randperm else orders[i % 2],
                **kwargs,
            )
            for i in range(transforms)
        ]

        base = zuko.flows.Unconditional(
            zuko.distributions.DiagNormal,
            torch.zeros(features),
            torch.ones(features),
            buffer=True,
        )
        
        transforms.append(zuko.flows.Unconditional(zuko.transforms.SoftclipTransform))

        super().__init__(transforms, base)
        

Expected behavior

I would expect the samples to be constrained to the [-5, 5] domain, since that is the default bound of SoftclipTransform.

I have two questions:

  1. Why is the domain not limited to [-5, 5]?
flow = MyFlow(1, 1, transforms=5, hidden_features=[50])
samples = flow(torch.rand(1)).sample((10000,))
plt.hist(samples.numpy(), bins=100);

[Figure: histogram of the sampled values]

  2. How can I change the bound argument of the transformation? I noticed that the transformation class is passed to the flow, rather than an instance, so I'm not sure how one can change the initialization arguments.

Generating samples with their log-probability

Description

There are some use cases (at least I have one) where one also needs the ladj while computing the inverse operation. For my use case, I am not only using the normalizing flow to generate samples, but I also want to know the likelihood of the produced samples.

Implementation

The implementation could be somewhat hard, because in principle every transformation would need to include another method, inverse_and_ladj. I am willing to help and contribute pull requests for that. My main focus at the moment is neural spline flows.

I'm mainly opening this issue to figure out if this is wanted; one would also have to decide how to introduce the functionality in the classes that consume the transformations.
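
For context, a minimal sketch of the current workaround (sample, then call log_prob), which re-runs the transformations instead of reusing the ladj already computed while sampling; a fused inverse_and_ladj would avoid that second pass:

import torch
import zuko

flow = zuko.flows.NSF(features=3, context=0)
dist = flow()

x = dist.rsample((16,))   # one pass through the transformations
log_p = dist.log_prob(x)  # a second, redundant pass over the same samples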

Add tutorials

Description

The documentation should provide tutorials for common use cases, such as creating a custom (coupling/autoregressive) flow, training with the forward and backward KL, performing importance sampling, adding preprocessing transformations, and maybe training a CNF with a flow-matching loss.

These tutorials can be Jupyter notebooks and can either be included in the documentation with myst-nb or linked from the repository.

Citing this package

This package has been very useful for my research and I would love to credit it. Would it be possible to create a Zenodo release to cite it properly? Thanks!
