
relie's Introduction

ReLie

Official repository to reproduce the experiments from the AISTATS 2019 publication Reparameterizing Distributions on Lie Groups [1] (arXiv). For a more intuitive understanding of our work, take a look at the presentation slides prepared for our talk in Okinawa.

From left to right, examples of SO(2), SO(3), and SE(3) group actions.

Implementation

We implement the code for SO(3) in PyTorch, building on the torch.distributions.transforms framework. We extend this framework because the Lie group exponential map is not a bijection, but a locally invertible function (a local diffeomorphism).

The simplest way to create a distribution on the group is to put a zero-mean Gaussian on the algebra, push it forward through the exponential map, and left-multiply by a group element to move the 'mean' of the resulting distribution away from the identity. This can be constructed as follows:

import torch
from torch.distributions import Normal

from relie import (
    SO3ExpTransform,
    SO3MultiplyTransform,
    LocalDiffeoTransformedDistribution as LDTD,
)
from relie.utils.so3_tools import so3_exp

alg_loc = ...  # algebra element, shape [batch, 3], dtype=torch.double
scale = ...  # positive scale, shape [batch, 3], dtype=torch.double
loc = so3_exp(alg_loc)  # group element used as the 'mean', shape [batch, 3, 3]

# Zero-mean Gaussian on the algebra, pushed forward through exp and
# left-multiplied by loc.
alg_distr = Normal(torch.zeros_like(scale), scale)
transforms = [SO3ExpTransform(k_max=3), SO3MultiplyTransform(loc)]
group_distr = LDTD(alg_distr, transforms)

This can then be used, e.g., for variational inference:

z = group_distr.rsample()
entropy = -group_distr.log_prob(z)  # single-sample estimate of the entropy

Note:

  • We require double precision.
  • We consider 2 * k_max + 1 pre-images. In our experience, k_max=3 is sufficient.
  • Parametrizing the mean with an algebra element that is mapped to the group with the exponential map should not be used in the context of auto-encoders; see [2, 3] for details.

LI-Flow

Alternatively, one can construct a NICE-style normalizing flow. See relie.experiments.so3_multimodal_flow for an example.
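
As a rough sketch of this pattern (not the repository's actual FlowDistr class), the idea is to push a base distribution on R^3 through an invertible flow and then through the compact exponential-map transform. The snippet below uses a fixed AffineTransform as a stand-in for the learned NICE coupling layers, and assumes that SO3ExpCompactTransform is exported from relie and takes the algebra support radius as its argument (as in the experiment code); in a real flow, the learned transform should keep the algebra samples inside that support radius.

import math
import torch
from torch.distributions import Normal
from torch.distributions.transforms import AffineTransform

from relie import (
    SO3ExpCompactTransform,  # assumed importable from relie, like the transforms above
    LocalDiffeoTransformedDistribution as LDTD,
)

scale = torch.full((16, 3), 0.3, dtype=torch.double)
base = Normal(torch.zeros_like(scale), scale)

# Stand-in for a learned NICE flow: any invertible map of R^3 works here.
flow = AffineTransform(
    loc=torch.zeros(3, dtype=torch.double),
    scale=0.5 * torch.ones(3, dtype=torch.double),
)

# Push the base distribution through the flow, then through the compact exp map.
group_distr = LDTD(base, [flow, SO3ExpCompactTransform(1.6 * math.pi)])

x = group_distr.rsample()  # rotation matrices, shape [16, 3, 3]
log_prob = group_distr.log_prob(x)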

Experiments

Please find the experiments of the paper in the package relie.experiments.

Contact

For comments and questions regarding this repository, feel free to reach out to Pim de Haan.

License

MIT

References

[1] Falorsi, L., de Haan, P., Davidson, T. R. & Forré, P.
Reparameterizing Distributions on Lie Groups
AISTATS (2019)
[2] Falorsi, L., de Haan, P., Davidson, T. R., De Cao, N., Weiler, M., Forré, P. & Cohen, T. S.
Explorations in Homeomorphic Variational Auto-Encoding
ICML 2018 workshop on Theoretical Foundations and Applications of Deep Generative Models (2018)
[3] de Haan, P. & Falorsi, L.
Topological Constraints on Homeomorphic Auto-Encoding
NeurIPS 2018 workshop on Integration of Deep Learning Theories (2018)

BibTeX format:

@article{falorsi2019reparameterizing,
  title={Reparameterizing distributions on Lie groups},
  author={Falorsi, L. and
          de Haan, P. and
          Davidson, T.R. and
          Forr{\'e}, P.},
  journal={22nd International Conference on Artificial Intelligence and Statistics (AISTATS-19)},
  year={2019}
}

relie's People

Contributors

lcfalors, pimdh, trdavidson


relie's Issues

Reasoning behind inv in FlowDistr

Hi, thanks for compiling this very well-organised repo, it has been a joy to work with!

I have a question about the following lines in the FlowDistr class definition in so3_multimodal_flow.py:

def transforms(self):
    transforms = [
        self.flow().inv,
        intermediate_transform,
        SO3ExpCompactTransform(algebra_support_radius),
    ]
    return transforms

Why is the flow transformation specified as the inverse of the flow transformation defined via the Flow class? I understand that theoretically this makes no difference, since the flow transformation is a bijective mapping, but in practice the flow training (i.e. log_prob evaluation) seems much faster when the transformation is specified as the inverse. Is it something along the lines of the inverse transform being pre-computed / not being repeatedly computed during log_prob evaluation?

Get NaN when requesting log_prob of mu

If I request the log_prob of mu, an assertion error will be raised indicating NaN.

import torch
import torch.nn.functional as F
from torch.distributions import Normal

from relie import SO3ExpTransform, SO3MultiplyTransform, LocalDiffeoTransformedDistribution as LDTD
from relie.utils.so3_tools import so3_exp

batch = 2
loc = torch.randn(batch, 3).double()
scale = F.softplus(torch.randn(batch, 3).double())
normal = Normal(torch.zeros_like(loc), scale)
transforms = [
    SO3ExpTransform(),
    SO3MultiplyTransform(so3_exp(loc))]
dist = LDTD(normal, transforms)

GT = so3_exp(loc)

loss = -dist.log_prob(GT)

It may come from https://github.com/pimdh/relie/blob/master/relie/utils/so3_tools.py#L143 where x_norm is zero

I'm not sure whether simply adding a small eps to the norm would be fine, since the pre-image set (xset) of the zero vector is a sphere, while the pre-image set of a very tiny vector is not.

Computation of log p(X|G) in VI

I have some questions regarding the computation of log p(X|G) in prediction_loss_fn.

  1. Why do you use L2 loss here? Would it be possible to use another loss instead of it?
  2. I see no log operation in this function. Is there another operation that substitutes for it?

Thanks in advance!

N(0, scale), exp, multiply \mu OR N(\mu, scale), exp

Hi, thanks for your really amazing work!

I wonder: when reparametrization is not required, e.g. for MLE, should the predicted $\mu$ be applied after the exp map via SO3MultiplyTransform, or before the exp map as the mean of the Euclidean Gaussian? I.e., as in the title, should the procedure be N(0, scale) $\rightarrow$ exp $\rightarrow$ multiply by $\mu$, or N(\mu, scale) $\rightarrow$ exp?

Note that in the README it is the former case, while in the experiment
https://github.com/pimdh/relie/blob/master/relie/experiments/so3_conditional_mle.py#L90 it is the latter case.

Is there any preference?

Thanks in advance!
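
For concreteness, a minimal sketch of the two constructions being compared, mirroring the snippets earlier on this page and assuming so3_exp is imported from relie.utils.so3_tools (which of the two is preferable is exactly the question above):

import torch
from torch.distributions import Normal

from relie import SO3ExpTransform, SO3MultiplyTransform, LocalDiffeoTransformedDistribution as LDTD
from relie.utils.so3_tools import so3_exp  # assumed import path

alg_loc = torch.randn(4, 3, dtype=torch.double)      # predicted algebra-valued location
scale = torch.rand(4, 3, dtype=torch.double) + 0.1   # positive scales

# Variant A (as in the README): zero-mean Gaussian on the algebra,
# push forward with exp, then left-multiply by exp(alg_loc).
dist_a = LDTD(
    Normal(torch.zeros_like(scale), scale),
    [SO3ExpTransform(k_max=3), SO3MultiplyTransform(so3_exp(alg_loc))],
)

# Variant B (as in relie/experiments/so3_conditional_mle.py): Gaussian centered
# at alg_loc on the algebra, then push forward with exp.
dist_b = LDTD(
    Normal(alg_loc, scale),
    [SO3ExpTransform(k_max=3)],
)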

Recommendations for usage of SO3ExpCompactTransform vs SO3ExpBijectiveTransform

Hi, sorry to bother you again!

I was wondering if you had any insights on the pros and cons of using SO3ExpCompactTransform vs SO3ExpBijectiveTransform for the exp map from so(3) to SO(3)?

The code makes it clear that SO3ExpCompactTransform assumes that the so(3) distribution has support in the <2pi ball, while SO3ExpBijectiveTransform assumes support in the <pi ball (i.e. where the exp map is injective).

In your experiments code, you seem to set the support radius as 1.1pi or 1.6pi and use SO3ExpCompactTransform.

Are there any significant advantages (e.g. numerical accuracy / edge cases) to extending the support radius on so(3) beyond pi? If so, should I basically always use SO3ExpCompactTransform and use some support radius between pi and 2pi?

Thanks,
Akash
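
As a small illustration (under the same assumptions as earlier, i.e. that SO3ExpCompactTransform is exported from relie and takes the algebra support radius as its positional argument), a compact-support construction with a radius between pi and 2pi would look like the sketch below; SO3ExpBijectiveTransform would be used in the same place when the algebra distribution is confined to the <pi ball.

import math
import torch
from torch.distributions import Normal

from relie import SO3ExpCompactTransform, LocalDiffeoTransformedDistribution as LDTD

scale = torch.full((4, 3), 0.3, dtype=torch.double)
base = Normal(torch.zeros_like(scale), scale)

# Support radius between pi and 2*pi, e.g. 1.6*pi as in the experiments.
dist = LDTD(base, [SO3ExpCompactTransform(1.6 * math.pi)])

x = dist.rsample()  # rotation matrices, shape [4, 3, 3]
log_prob = dist.log_prob(x)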
