aleximmer / laplace

Laplace approximations for Deep Learning.

Home Page: https://aleximmer.github.io/Laplace

License: MIT License

Python 99.21% Mako 0.74% Shell 0.04% Makefile 0.01%

approximate-bayesian-inference laplace-approximation deep-learning neural-network

laplace's Introduction

The laplace package facilitates the application of Laplace approximations for entire neural networks, subnetworks of neural networks, or just their last layer. The package enables posterior approximations, marginal-likelihood estimation, and various posterior predictive computations. The library documentation is available at https://aleximmer.github.io/Laplace.

There is also a corresponding paper, Laplace Redux — Effortless Bayesian Deep Learning, which introduces the library, provides an introduction to the Laplace approximation, reviews its use in deep learning, and empirically demonstrates its versatility and competitiveness. Please consider referring to the paper when using our library:

@inproceedings{laplace2021,
  title={Laplace Redux--Effortless {B}ayesian Deep Learning},
  author={Erik Daxberger and Agustinus Kristiadi and Alexander Immer
          and Runa Eschenhagen and Matthias Bauer and Philipp Hennig},
  booktitle={{N}eur{IPS}},
  year={2021}
}

The code to reproduce the experiments in the paper is also publicly available; it provides examples of how to use our library for predictive uncertainty quantification, model selection, and continual learning.

Important

As a user, one should not expect Laplace to work automatically. That is, one should experiment with Laplace's different options (hessian_structure, prior precision tuning method, predictive method, backend, etc.). Try looking at papers that use Laplace for guidance on how to set these options for the application or problem at hand.

Setup
Example usage
Structure
Extendability
When to use which backend?
Contributing
References

Setup

For full compatibility, install this package in a fresh virtual env. We assume Python >= 3.9 since lower versions are (soon to be) deprecated. PyTorch version 2.0 and up is also required for full compatibility. To install laplace with pip, run the following:

pip install --upgrade pip wheel packaging
pip install git+https://github.com/aleximmer/[email protected]

Caution

Unfortunately, we lost our PyPI account and so running pip install laplace-torch only installs the previous version (0.1)!

For development purposes, clone the repository and then install:

# first install the build system:
pip install --upgrade pip wheel packaging

# then install the package in development (editable) mode
pip install -e ".[all]"

Example usage

Simple usage

In the following example, a pre-trained model is loaded, then the Laplace approximation is fit to the training data (using a diagonal Hessian approximation over all parameters), and the prior precision is optimized with cross-validation "gridsearch". After that, the resulting LA is used for prediction with the "probit" predictive for classification.

Important

Laplace expects all data loaders, e.g. train_loader and val_loader below, to be instances of PyTorch DataLoader. Each batch, next(iter(data_loader)) must either be the standard (X, y) tensors or a dict-like object containing at least the keys specified in dict_key_x and dict_key_y in Laplace's constructor.

Important

The total number of data points in all data loaders must be accessible via len(train_loader.dataset).

Important

In optimize_prior_precision, make sure to match the arguments with the ones you want to pass in la(x, ...) during prediction.

from laplace import Laplace

# Pre-trained model
model = load_map_model()

# User-specified LA flavor
la = Laplace(model, "classification",
             subset_of_weights="all",
             hessian_structure="diag")
la.fit(train_loader)
la.optimize_prior_precision(
    method="gridsearch", 
    pred_type="glm", 
    link_approx="probit", 
    val_loader=val_loader
)

# User-specified predictive approx.
pred = la(x, pred_type="glm", link_approx="probit")
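To make the "probit" predictive less abstract, here is a minimal pure-Python sketch of the standard probit approximation for classification: each logit mean is divided by sqrt(1 + pi/8 * variance) before the softmax. The function names and the example numbers are illustrative assumptions, not the library's implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def probit_predictive(f_mean, f_var):
    """Scale each logit by 1/sqrt(1 + pi/8 * variance), then apply softmax."""
    scaled = [
        m / math.sqrt(1.0 + math.pi / 8.0 * v) for m, v in zip(f_mean, f_var)
    ]
    return softmax(scaled)


# Hypothetical logit means and marginal logit variances for one input
logit_mean = [2.0, 0.0, -1.0]
map_probs = probit_predictive(logit_mean, [0.0, 0.0, 0.0])  # zero variance: plain softmax
la_probs = probit_predictive(logit_mean, [4.0, 4.0, 4.0])   # uncertain logits: flatter output
print(map_probs)
print(la_probs)
```

With zero variance the result equals the plain softmax of the MAP logits; larger logit variances pull the predicted probabilities toward uniform.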

Marginal likelihood

The marginal likelihood can be used for model selection [10] and is differentiable for continuous hyperparameters like the prior precision or observation noise. Here, we fit the library default, KFAC last-layer LA and differentiate the log marginal likelihood.

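import torch
from laplace import Laplace

# Un- or pre-trained model
model = load_model()

# Default to recommended last-layer KFAC LA:
la = Laplace(model, likelihood="regression")
la.fit(train_loader)

# Hyperparameters to differentiate with respect to
# (illustrative initial values; both must require gradients)
prior_prec = torch.ones(1, requires_grad=True)
obs_noise = torch.ones(1, requires_grad=True)

# ML w.r.t. prior precision and observation noise
ml = la.log_marginal_likelihood(prior_prec, obs_noise)
ml.backward()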
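As a worked illustration of what the Laplace log marginal likelihood computes, the following pure-Python sketch uses a one-parameter linear regression y = theta * x + noise with a Gaussian prior. For this model the log joint is quadratic in theta, so the Laplace estimate log p(D | theta*) + log p(theta*) + (1/2) log(2 pi) - (1/2) log H matches the closed-form evidence. The data, function names, and hyperparameter values are assumptions made for this example only.

```python
import math


def laplace_log_marglik(xs, ys, prior_prec, sigma_noise):
    """Laplace estimate of the log evidence for y = theta * x + noise."""
    n = len(xs)
    s2 = sigma_noise**2
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    hessian = sxx / s2 + prior_prec          # posterior precision at the MAP
    theta_map = (sxy / s2) / hessian
    sq_err = sum((y - theta_map * x) ** 2 for x, y in zip(xs, ys))
    log_lik = -0.5 * n * math.log(2 * math.pi * s2) - sq_err / (2 * s2)
    log_prior = 0.5 * math.log(prior_prec / (2 * math.pi)) - 0.5 * prior_prec * theta_map**2
    return log_lik + log_prior + 0.5 * math.log(2 * math.pi) - 0.5 * math.log(hessian)


def exact_log_marglik(xs, ys, prior_prec, sigma_noise):
    """Closed-form log N(y | 0, sigma^2 I + x x^T / prior_prec)."""
    n = len(xs)
    s2 = sigma_noise**2
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    syy = sum(y * y for y in ys)
    log_det = n * math.log(s2) + math.log(1.0 + sxx / (prior_prec * s2))
    quad = syy / s2 - sxy**2 / (s2**2 * (prior_prec + sxx / s2))
    return -0.5 * (n * math.log(2 * math.pi) + log_det + quad)


xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [-2.1, -0.9, 0.1, 1.1, 1.9]

# Model selection: compare the evidence under different prior precisions
for prior_prec in (0.01, 1.0, 100.0):
    approx = laplace_log_marglik(xs, ys, prior_prec, sigma_noise=0.3)
    exact = exact_log_marglik(xs, ys, prior_prec, sigma_noise=0.3)
    print(prior_prec, approx, exact)
```

For neural networks the log joint is not quadratic, so the same formula is only an approximation there, with the Hessian replaced by one of the structured approximations described in this README.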

Laplace on LLM

Tip

This library also supports Huggingface models and parameter-efficient fine-tuning. See examples/huggingface_examples.py and examples/huggingface_examples.md for the full exposition.

First, we need to wrap the pretrained model so that the forward method takes a dict-like input. Note that when you iterate over a Huggingface dataloader, this is what you get by default. Having a dict-like input is convenient since different models take different numbers of inputs (e.g. GPT-like LLMs only take input_ids, while BERT-like ones take both input_ids and attention_mask, etc.). Inside this forward method you can do your usual preprocessing, such as moving the tensor inputs onto the correct device.

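from collections.abc import MutableMapping

import torch
from torch import nn
from transformers import GPT2Config, GPT2ForSequenceClassification, PreTrainedTokenizer


class MyGPT2(nn.Module):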
    def __init__(self, tokenizer: PreTrainedTokenizer) -> None:
        super().__init__()
        config = GPT2Config.from_pretrained("gpt2")
        config.pad_token_id = tokenizer.pad_token_id
        config.num_labels = 2
        self.hf_model = GPT2ForSequenceClassification.from_pretrained(
            "gpt2", config=config
        )

    def forward(self, data: MutableMapping) -> torch.Tensor:
        device = next(self.parameters()).device
        input_ids = data["input_ids"].to(device)
        attn_mask = data["attention_mask"].to(device)
        output_dict = self.hf_model(input_ids=input_ids, attention_mask=attn_mask)
        return output_dict.logits

Then you can "select" which parameters of the LLM you want to apply the Laplace approximation on, by switching off the gradients of the "unneeded" parameters. For example, we can replicate a last-layer Laplace: (in actual practice, use Laplace(..., subset_of_weights='last_layer', ...) instead, though!)

model = MyGPT2(tokenizer)
model.eval()

# Enable grad only for the last layer
for p in model.hf_model.parameters():
    p.requires_grad = False
for p in model.hf_model.score.parameters():
    p.requires_grad = True

la = Laplace(
    model,
    likelihood="classification",
    # Will only hit the last-layer since it's the only one that is grad-enabled
    subset_of_weights="all",
    hessian_structure="diag",
)
la.fit(dataloader)
la.optimize_prior_precision()

test_data = next(iter(dataloader))
pred = la(test_data)

This is useful because we can apply the LA only on the parameter-efficient finetuning weights. E.g., we can fix the LLM itself, and apply the Laplace approximation only on the LoRA weights. Huggingface will automatically switch off the non-LoRA weights' gradients.

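from laplace import Laplace
from laplace.curvature import AsdlGGN
from peft import LoraConfig, get_peft_model


def get_lora_model():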
    model = MyGPT2(tokenizer)  # Note we don't disable grad
    config = LoraConfig(
        r=4,
        lora_alpha=16,
        target_modules=["c_attn"],  # LoRA on the attention weights
        lora_dropout=0.1,
        bias="none",
    )
    lora_model = get_peft_model(model, config)
    return lora_model

lora_model = get_lora_model()

# Train it as usual here...

lora_model.eval()

lora_la = Laplace(
    lora_model,
    likelihood="classification",
    subset_of_weights="all",
    hessian_structure="diag",
    backend=AsdlGGN,
)

# Fit the LA before predicting
lora_la.fit(dataloader)

test_data = next(iter(dataloader))
lora_pred = lora_la(test_data)

Subnetwork Laplace

This example shows how to fit the Laplace approximation over only a subnetwork within a neural network (while keeping all other parameters fixed at their MAP estimates), as proposed in [11]. It also exemplifies different ways to specify the subnetwork to perform inference over.

from laplace import Laplace

# Pre-trained model
model = load_model()

# Examples of different ways to specify the subnetwork
# via indices of the vectorized model parameters
#
# Example 1: select the 128 parameters with the largest magnitude
from laplace.utils import LargestMagnitudeSubnetMask
subnetwork_mask = LargestMagnitudeSubnetMask(model, n_params_subnet=128)
subnetwork_indices = subnetwork_mask.select()

# Example 2: specify the layers that define the subnetwork
from laplace.utils import ModuleNameSubnetMask
subnetwork_mask = ModuleNameSubnetMask(model, module_names=["layer.1", "layer.3"])
subnetwork_mask.select()
subnetwork_indices = subnetwork_mask.indices

# Example 3: manually define the subnetwork via custom subnetwork indices
import torch
subnetwork_indices = torch.tensor([0, 4, 11, 42, 123, 2021])

# Define and fit subnetwork LA using the specified subnetwork indices
la = Laplace(model, "classification",
             subset_of_weights="subnetwork",
             hessian_structure="full",
             subnetwork_indices=subnetwork_indices)
la.fit(train_loader)
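To show the idea behind magnitude-based subnetwork selection (Example 1 above) without any dependencies, here is a small pure-Python sketch that picks the indices of the largest-magnitude entries of a flattened parameter vector. The helper name and the parameter values are hypothetical; this is not the code of LargestMagnitudeSubnetMask.

```python
def largest_magnitude_indices(params, n_params_subnet):
    """Return the sorted indices of the n_params_subnet largest |params[i]|."""
    order = sorted(range(len(params)), key=lambda i: abs(params[i]), reverse=True)
    return sorted(order[:n_params_subnet])


# Hypothetical flattened parameter vector of a tiny model
flat_params = [0.03, -1.7, 0.4, 2.2, -0.05, 0.9]
subnet = largest_magnitude_indices(flat_params, 3)
print(subnet)
```

Indices obtained this way play the same role as the subnetwork_indices tensor passed to Laplace in the example above.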

Serialization

As with plain torch, we support two ways to serialize data.

One is the familiar state_dict approach. Here you need to save and re-create both model and Laplace. Use this for long-term storage of models and sharing of a fitted Laplace instance.

# Save model and Laplace instance
torch.save(model.state_dict(), "model_state_dict.bin")
torch.save(la.state_dict(), "la_state_dict.bin")

# Load serialized data
model2 = MyModel(...)
model2.load_state_dict(torch.load("model_state_dict.bin"))
la2 = Laplace(model2, "classification",
              subset_of_weights="all",
              hessian_structure="diag")
la2.load_state_dict(torch.load("la_state_dict.bin"))

The second approach is to save the whole Laplace object, including self.model. This is less verbose and more convenient since you have the trained model and the fitted Laplace data stored in one place, but also comes with some drawbacks. Use this for quick save-load cycles during experiments, say.

# Save Laplace, including la.model
torch.save(la, "la.pt")

# Load both
la = torch.load("la.pt")

Some Laplace variants such as LLLaplace might have trouble being serialized using the default pickle module, which torch.save() and torch.load() use (AttributeError: Can't pickle local object ...). In this case, the dill package will come in handy.

import dill

torch.save(la, "la.pt", pickle_module=dill)

With both methods, you are free to switch devices, for instance when you trained on a GPU but want to run predictions on CPU. In this case, use

torch.load(..., map_location="cpu")

Warning

Currently, this library always assumes that the model has an output tensor of shape (batch_size, ..., n_classes), so in the case of image outputs, you need to rearrange from NCHW to NHWC.

Structure

The laplace package consists of two main components:

The subclasses of laplace.BaseLaplace that implement different sparsity structures: different subsets of weights ("all", "subnetwork" and "last_layer") and different structures of the Hessian approximation ("full", "kron", "lowrank" and "diag"). This results in nine currently available options: laplace.FullLaplace, laplace.KronLaplace, laplace.DiagLaplace, the corresponding last-layer variations laplace.FullLLLaplace, laplace.KronLLLaplace, and laplace.DiagLLLaplace (which are all subclasses of laplace.LLLaplace), laplace.SubnetLaplace (which only supports "full" and "diag" Hessian approximations) and laplace.LowRankLaplace (which only supports inference over "all" weights). All of these can be conveniently accessed via the laplace.Laplace function.
The backends in laplace.curvature which provide access to Hessian approximations of the corresponding sparsity structures, for example, the diagonal GGN.

Additionally, the package provides utilities for decomposing a neural network into feature extractor and last layer for LLLaplace subclasses (laplace.utils.feature_extractor) and effectively dealing with Kronecker factors (laplace.utils.matrix).

Finally, the package implements several options to select/specify a subnetwork for SubnetLaplace (as subclasses of laplace.utils.subnetmask.SubnetMask). Automatic subnetwork selection strategies include: uniformly at random (laplace.utils.subnetmask.RandomSubnetMask), by largest parameter magnitudes (LargestMagnitudeSubnetMask), and by largest marginal parameter variances (LargestVarianceDiagLaplaceSubnetMask and LargestVarianceSWAGSubnetMask). In addition to that, subnetworks can also be specified manually, by listing the names of either the model parameters (ParamNameSubnetMask) or modules (ModuleNameSubnetMask) to perform Laplace inference over.

Extendability

To extend the laplace package, new BaseLaplace subclasses can be designed, for example, Laplace with a block-diagonal Hessian structure. One can also implement custom subnetwork selection strategies as new subclasses of SubnetMask.

Alternatively, extending or integrating backends (subclasses of curvature.curvature) makes it possible to provide different Hessian approximations to the Laplace approximations. For example, currently the curvature.CurvlinopsInterface based on Curvlinops and the native torch.func (previously known as functorch), curvature.BackPackInterface based on BackPACK, and curvature.AsdlInterface based on ASDL are available.

When to use which backend

Tip

Each backend has its own caveats and behavior. Use the following to guide you in picking a suitable backend, depending on your model and application.

Small, simple MLP, or last-layer Laplace: Any backend should work well. CurvlinopsGGN or CurvlinopsEF is recommended if hessian_structure = 'kron', but they are inefficient for other Hessian structures.
LLMs with PEFT (e.g. LoRA): AsdlGGN and AsdlEF are recommended.
Continuous Bayesian optimization: CurvlinopsGGN/EF and BackPackGGN/EF are recommended since they are the only ones supporting backprop over Jacobians.

Caution

The curvlinops backends are inefficient for full and diagonal Hessian structures. Moreover, they're also inefficient for computing the Jacobians of large models since they rely on torch.func.jacrev along with torch.func.vmap! Finally, curvlinops only computes K-FAC (hessian_structure = 'kron') for nn.Linear and nn.Conv2d modules (including those inside larger modules like Attention).

Caution

The BackPack backends are limited to models expressed as nn.Sequential. Also, they're not compatible with normalization layers.

Documentation

The documentation is available at https://aleximmer.github.io/Laplace or can be generated and/or viewed locally:

# assuming the repository was cloned
pip install -e ".[docs]"
# create docs and write to html
bash update_docs.sh
# .. or serve the docs directly
pdoc --http 0.0.0.0:8080 laplace --template-dir template

Contributing

Pull requests are very welcome. Please follow these guidelines:

Install Laplace via pip install -e ".[dev]" which will install ruff and all requirements necessary to run the tests and build the docs.
Use ruff as autoformatter. Please refer to the makefile in the repository and run it via make ruff. Please note that the order of ruff check --fix and ruff format is important!
Also use ruff as linter. Please manually fix all linting errors/warnings before opening a pull request.
Fully document your changes in the form of Python docstrings, typehinting, and (if applicable) code/markdown examples in the ./examples subdirectory.
Provide as many test cases as possible. Make sure all test cases pass.

Issues, bug reports, and ideas are also very welcome!

References

This package relies on various improvements to the Laplace approximation for neural networks, which was originally due to MacKay [1]. Please consider citing the respective papers if you use any of their proposed methods via our laplace library.

[1] MacKay, DJC. A Practical Bayesian Framework for Backpropagation Networks. Neural Computation 1992.
[2] Gibbs, M. N. Bayesian Gaussian Processes for Regression and Classification. PhD Thesis 1997.
[3] Snoek, J., Rippel, O., Swersky, K., Kiros, R., Satish, N., Sundaram, N., Patwary, M., Prabhat, M., Adams, R. Scalable Bayesian Optimization Using Deep Neural Networks. ICML 2015.
[4] Ritter, H., Botev, A., Barber, D. A Scalable Laplace Approximation for Neural Networks. ICLR 2018.
[5] Foong, A. Y., Li, Y., Hernández-Lobato, J. M., Turner, R. E. 'In-Between' Uncertainty in Bayesian Neural Networks. ICML UDL Workshop 2019.
[6] Khan, M. E., Immer, A., Abedi, E., Korzepa, M. Approximate Inference Turns Deep Networks into Gaussian Processes. NeurIPS 2019.
[7] Kristiadi, A., Hein, M., Hennig, P. Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks. ICML 2020.
[8] Immer, A., Korzepa, M., Bauer, M. Improving predictions of Bayesian neural nets via local linearization. AISTATS 2021.
[9] Sharma, A., Azizan, N., Pavone, M. Sketching Curvature for Efficient Out-of-Distribution Detection for Deep Neural Networks. UAI 2021.
[10] Immer, A., Bauer, M., Fortuin, V., Rätsch, G., Khan, EM. Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning. ICML 2021.
[11] Daxberger, E., Nalisnick, E., Allingham, JU., Antorán, J., Hernández-Lobato, JM. Bayesian Deep Learning via Subnetwork Inference. ICML 2021.


laplace's Issues

Integrate references in docstrings

Make it clearer which method we take from where, similar to how sklearn does it in their docstrings.

Enable pip install

Naming: laplace-torch or ideally laplace.

Enable Thompson sampling

This has to be treated differently for regression and classification but is very similar to the predictive(..) currently implemented. Basically laplace.thompson_sample(x, n_samples=1) should return n_samples from the posterior on functions f to perform Thompson sampling in active learning/bandits/BO. For regression, this is simply sampling from the Gaussian distribution on f, while it's unclear what would be desired for classification.

This function is currently implemented as predictive_samples() but is not necessarily correct for the classification case.
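For the regression case, the proposal could be sketched roughly as follows in pure Python. The function name, its arguments, and the numbers are hypothetical. This sketch samples each input independently from its marginal Gaussian, whereas a proper Thompson sample over several candidate inputs would need the joint covariance.

```python
import random


def thompson_sample_regression(f_mean, f_var, n_samples=1, seed=None):
    """Draw n_samples function values per input from N(f_mean[i], f_var[i]).

    Assumption: inputs are treated independently (marginal variances only).
    """
    rng = random.Random(seed)
    std = [v**0.5 for v in f_var]
    return [
        [rng.gauss(m, s) for m, s in zip(f_mean, std)] for _ in range(n_samples)
    ]


# Hypothetical predictive means and variances at three candidate inputs
f_mean = [0.2, 1.5, -0.3]
f_var = [0.04, 0.25, 1.0]
samples = thompson_sample_regression(f_mean, f_var, n_samples=2, seed=0)

# Thompson sampling picks the candidate that maximizes the first sampled function
best_candidate = max(range(len(f_mean)), key=lambda i: samples[0][i])
print(samples, best_candidate)
```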

Add references to docstrings

Parts of the methods or classes implemented in the library are proposed in different papers. Instead of having a single reference list in the readme, we could therefore add references into the docstrings.

Move from abc.abstract to NotImplementedErrors to ease subclassing

To add new subclasses of Laplace or backends, it is sometimes required to implement methods in the subclass that are not explicitly necessary. Following the alternative convention of raising errors on a call allows for more flexible subclassing.
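The difference between the two conventions can be seen in a small self-contained sketch. All class and method names below are hypothetical stand-ins, not the library's classes: with abc, a subclass must override every abstract method before it can be instantiated, whereas with NotImplementedError it only fails when an unimplemented method is actually called.

```python
from abc import ABC, abstractmethod


class AbstractBackend(ABC):
    """Convention 1: abstract methods must all be overridden."""

    @abstractmethod
    def diag(self): ...

    @abstractmethod
    def kron(self): ...


class LenientBackend:
    """Convention 2: unimplemented methods raise only when called."""

    def diag(self):
        raise NotImplementedError

    def kron(self):
        raise NotImplementedError


class DiagOnlyAbstract(AbstractBackend):
    def diag(self):
        return "diag"


class DiagOnlyLenient(LenientBackend):
    def diag(self):
        return "diag"


try:
    DiagOnlyAbstract()          # kron() is still abstract
    abstract_ok = True
except TypeError:
    abstract_ok = False

lenient = DiagOnlyLenient()     # instantiation works
print(abstract_ok, lenient.diag())
```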

Improve readme and state what the library is for

one sentence introduction
bib reference

More realistic library examples

Real data examples
continual learning

Add support for type hints

A straightforward way to improve the code quality is to enable runtime support for type hints via typing.

AsdlHessian data and model on separate devices

Hi guys,
I have been playing around with this library (it's really good!).
This is just a small thing but when losses are calculated in the eig_lowrank
in asdl.py I think the data and the model are not on the same device because a train_loader is passed to the eig_lowrank function. A simple change - .to(device) would make it work.

Keep up the good work! :)

Clarify how the softmax is handled for classification

Currently it's not really clear how the final softmax is dealt with in the classification case, which might lead to confusion / unintentional misuse of the library.

There are two things to clarify:

That the MAP model put into Laplace shouldn't apply a softmax (either via a nn.Softmax() layer in the model or a F.softmax() call in the overwritten forward pass) but return the logits instead. This could probably most easily be fixed by clarifying it in the documentation/readme and additionally raising a warning if the model outputs on the training set during fit() lie in [0,1] and sum to 1.
That the Laplace model applies the softmax internally when making predictions and that, therefore, the user shouldn't apply another softmax on top. Here we can probably only improve the documentation.
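The warning suggested in the first point could be based on a check like the one below. This is a pure-Python sketch on nested lists; the function name, the tolerance, and the warning text are assumptions, and a real implementation would operate on tensors inside fit().

```python
import math
import warnings


def looks_like_probabilities(outputs, tol=1e-6):
    """Return True if every row lies in [0, 1] and sums to 1 (within tol)."""
    for row in outputs:
        if any(v < -tol or v > 1.0 + tol for v in row):
            return False
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            return False
    return True


# Hypothetical model outputs for a batch of two inputs
logits = [[2.0, -1.0, 0.5], [0.1, 0.3, -0.2]]
probs = [[0.7, 0.2, 0.1], [0.25, 0.25, 0.5]]

for name, out in (("logits", logits), ("probs", probs)):
    if looks_like_probabilities(out):
        warnings.warn(
            f"{name}: outputs look like probabilities; "
            "the model should return logits, not softmax outputs."
        )
```

Such a check is only a heuristic: logits can by coincidence lie in [0, 1] and sum to 1, which is why a warning rather than an error seems appropriate.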

Add methods to save and load Laplace instances

Users might want to avoid computing the Hessian approximation every time they run their code or reuse the same Laplace approximation in different files (e.g. #42). The best interface would probably be .save(filepath) and .load(filepath) methods.

Allow fitting Laplace repeatedly

There is no reason to prevent the .fit() method from being called repeatedly, for example after changing hyperparameters or on a different data set. Currently, this raises a ValueError. Maybe raise a warning instead, or simply reset the state to enable safe iterative fits.

Usage for Object Detection

Love the work you're doing.

A Question: are the algorithms in this repo agnostic to the downstream task or the type of input?
For example, if I have an object recognition model for LIDAR data or classification of Audio inputs, can I still use the package?

Example: continual learning on toy data

Simple example showing continual learning with the Laplace approximation on toy data.

Support DataParallel

Support DataParallel for the predictions and Hessian computation (with Kazuki's backend).

How do you avoid negative determinant of hessian?

Hi, I have a question about how you avoid negative determinants of the Hessian when computing the log-determinant.

Hessian matrices are not always positive semi-definite, so in general they can have both positive and negative eigenvalues. In other words, the determinant can sometimes be negative, in which case the log-determinant of the Hessian cannot be computed. When I ran some simple experiments with your code, I didn't encounter this issue, but I'd like to know how you avoid this problem.

Thank you!
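One relevant fact here: the backends named in the README compute GGN or empirical Fisher (EF) approximations rather than the exact Hessian. Both are sums of outer products and therefore positive semi-definite, and adding a positive prior precision makes the posterior precision positive definite. The pure-Python sketch below illustrates this for a two-parameter regression model; the gradients and the prior precision are made-up values.

```python
import math


def ggn_regression(gradients, sigma_noise=1.0):
    """GGN for scalar-output regression: sum of g g^T / sigma^2 (2 parameters)."""
    h = [[0.0, 0.0], [0.0, 0.0]]
    for g in gradients:
        for i in range(2):
            for j in range(2):
                h[i][j] += g[i] * g[j] / sigma_noise**2
    return h


def det_2x2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# Linearly dependent per-example gradients: the GGN itself is singular
grads = [[1.0, 2.0], [2.0, 4.0]]
ggn = ggn_regression(grads)
ggn_det = det_2x2(ggn)

# Adding the prior precision on the diagonal makes the matrix positive definite
prior_prec = 0.5
precision = [
    [ggn[0][0] + prior_prec, ggn[0][1]],
    [ggn[1][0], ggn[1][1] + prior_prec],
]
log_det = math.log(det_2x2(precision))
print(ggn_det, log_det)
```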

Subnetwork inference

Extend curvature backends to retrieve for subnetwork
Add sublaplace.py and the corresponding classes (only FullSubnetLaplace possible afaik)

Integration of `asdfghjkl-0.1`

Current version can be found here. For example, Kazuki Osawa mentioned that the data_average parameter now defaults to True but we require False for a proper Hessian approximation.

Test how the backends handle models with learnable BatchNorm parameters

Check if or which approximations support models with learnable BatchNorm parameters and add clarifying comment to the docs.

Temperature scaling naming issue

We call it temperature but it's actually 1/temperature.

Joint posterior predictive over inputs

This can be useful for some applications, for example applying LAs to deepminds neural_testbed. Discussed here. Code for neural_testbed.

Add tests with larger model architectures

We should add some tests to catch potential issues with larger models. When computing Hessians it's very easy to run out of memory on consumer hardware, so some tests that check that we don't do any unnecessary allocation of memory might be useful.

Some ideas:

Have tests that check that our supposedly memory-efficient Laplace variants (last-layer, subnetwork, KFAC, low-rank) actually scale to large models and don't throw out-of-memory errors
Have tests that fail when trying to be bold and e.g. run full Laplace on a big model (this one is kind of trivial)

Are there actually good ways to test these things? I.e. do we know which hardware the tests will be run on (probably on some CPU) and can we artificially limit the memory available (e.g. to emulate behaviour for a standard GPU with, say, 16GB of memory, if the CPU comes with more RAM than that)?
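To put numbers on the memory budget discussed here, a back-of-the-envelope sketch: a full Hessian over P parameters has P * P entries. The helper names below are hypothetical, and 4 bytes per entry assumes float32 storage.

```python
import math


def full_hessian_bytes(n_params, bytes_per_element=4):
    """Memory needed to store a dense n_params x n_params Hessian."""
    return n_params * n_params * bytes_per_element


def max_params_for_memory(memory_bytes, bytes_per_element=4):
    """Largest parameter count whose dense Hessian fits into memory_bytes."""
    return math.isqrt(memory_bytes // bytes_per_element)


GIB = 1024**3
budget = 16 * GIB                       # e.g. a GPU with 16 GiB of memory
limit = max_params_for_memory(budget)   # parameters that fit for full Laplace

n_params = 11_000_000                   # hypothetical mid-sized network
needed_gib = full_hessian_bytes(n_params) / GIB
print(f"limit: {limit} parameters; {n_params} parameters need {needed_gib:.0f} GiB")
```

A test could compare a model's parameter count against such a limit before attempting to allocate the Hessian.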

Somewhat relatedly, we could consider adding more informative error messages when running out of memory during Hessian allocation / computation. E.g., if initialising the Hessian runs out of memory, we could raise an error saying something like
"Your model is too big for using FullLaplace. It has X parameters, so the Hessian would be YTB large, while your CPU/GPU only has ZGB memory available. To use FullLaplace on your machine, your model can at most have ~V parameters. Instead, consider using a more memory-efficient Laplace variant, such as W."

More examples

Hi,
Are you going to add more examples here?
I will try some of them myself (originally tested with BNN, Pymc3/BNN or Julia/Turing/BNN):

Toy (Javier Antoran Cabiscol)
Yacht (UCI)
Housing (UCI)
Half Moons (sklearn)
Toy (Turing)

Add option to tune the prior precision by optimising the NLL on a validation set

What do you think about another option for tuning the prior precision that uses (gradient-based) optimization methods to minimise the NLL on a validation set? I think this might nicely complement the existing options (i.e. MLL optimization and CV using validation data).

This is e.g. how the temperature parameter in temperature scaling is typically optimized; see an example implementation using BFGS from scipy here.

Even easier would be to use the same optimization approach as for the MLL (i.e. Adam from PyTorch).
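As a rough illustration of this proposal, the following pure-Python sketch tunes the log prior precision of a one-parameter Bayesian linear regression by gradient descent on the validation NLL. Everything here is an assumption for the example: the toy model, the data, the finite-difference gradient (standing in for autograd), and the plain gradient-descent update (standing in for Adam).

```python
import math


def validation_nll(log_prior_prec, train, val, sigma_noise=0.5):
    """Mean NLL on val under the Gaussian posterior predictive fitted on train."""
    prior_prec = math.exp(log_prior_prec)
    xs, ys = train
    precision = sum(x * x for x in xs) / sigma_noise**2 + prior_prec
    theta_map = sum(x * y for x, y in zip(xs, ys)) / sigma_noise**2 / precision
    total = 0.0
    for x, y in zip(*val):
        pred_var = sigma_noise**2 + x * x / precision
        total += 0.5 * math.log(2 * math.pi * pred_var)
        total += (y - theta_map * x) ** 2 / (2 * pred_var)
    return total / len(val[0])


def tune_log_prior_prec(train, val, init=0.0, lr=0.5, n_steps=200, eps=1e-4):
    """Gradient descent on the log prior precision; returns the best iterate seen."""
    best_u, best_nll = init, validation_nll(init, train, val)
    u = init
    for _ in range(n_steps):
        # Central finite difference as a stand-in for automatic differentiation
        grad = (
            validation_nll(u + eps, train, val) - validation_nll(u - eps, train, val)
        ) / (2 * eps)
        u -= lr * grad
        nll = validation_nll(u, train, val)
        if nll < best_nll:
            best_u, best_nll = u, nll
    return best_u, best_nll


train = ([-1.0, -0.5, 0.5, 1.0], [-1.8, -1.2, 0.9, 2.1])
val = ([-0.75, 0.25, 0.8], [-1.4, 0.6, 1.5])
best_u, best_nll = tune_log_prior_prec(train, val)
print(math.exp(best_u), best_nll)
```

Optimizing in log space keeps the prior precision positive without needing a constrained optimizer.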

Add descriptive error message for out-of-memory errors

From #69:

We could consider adding more informative error messages when running out of memory during Hessian allocation / computation. E.g., if initialising the Hessian runs out of memory, we could raise an error saying something like
"Your model is too big for using FullLaplace. It has X parameters, so the Hessian would be YTB large, while your CPU/GPU only has ZGB memory available. To use FullLaplace on your machine, your model can at most have ~V parameters. Instead, consider using a more memory-efficient Laplace variant, such as W."

Example: post-hoc predictive uncertainty

Something like CIFAR-10 pretrained model loading and show how calibration improves.

Add MoLA class

How to compute and store the Hessian for last layer LA offline and use it for future predictions

Hello,

Thanks for the amazing library.

I want to first fit the Last Layer Laplace with the training data, store the essentials (e.g. the Hessian and the mu), and then later use the _glm_predictive_distribution() function with another OOD dataset in a separate Python file, without having access to the training data. Could you please help me understand whether this decoupling would be possible with the current code? If so, can you please point me to the metrics/variables which I would need to store?

Decomposition of uncertainty for classification

The method proposed by Kwon et al should be implemented for the MC predictives.

Backend keywords should not catch all keywords but be a dict that is passed

Diagram of inheritance structure of Laplace classes for docs

Create a diagram which describes the inheritance structure of all subclasses of laplace.BaseLaplace.

Example: Bayesian neural model selection

Implement true block-diagonal LA

Kazuki's asdfghjkl implements block-diagonal versions of GGN and EF which could readily be used to construct an alternative posterior approximation. This would require a new LA class and backend integration of asdfghjkl for block-diagonal.

Import issue

Hi! Thank you for developing this module! I experienced an error when trying the Laplace module on Google colab. It says "cannot import name 'Laplace' from 'laplace' (/usr/local/lib/python3.7/dist-packages/laplace/__init__.py)". Kindly seek your assistance in this issue. Thank you!

Kron and KronDecomposed error handling

Currently, these do not support BatchNorm due to the backends but this should not fail silently when all-weights Laplace is used on networks with Batchnorm.

Logo for the documentation and github teaser

Use face and signature from wikipedia as logo.

Error: Extension saving to kflr does not have an extension for Module <class 'lenet.LeNet5'>

My code:

model = LeNet5(num_classes=10).cuda()

trainset, testset, _ , _ = get_dataset('mnist')
train_loader = DataLoader(trainset, 128, True)
# test_loader = DataLoader(testset, 2000, False)

optimizer = torch.optim.SGD(model.parameters(), 0.1, 0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()

pbar = tqdm(range(epoch), total=epoch)
for _ in pbar:
    acc, _ = train_once(model, train_loader, optimizer, criterion)
    pbar.set_postfix_str(f'Acc: {acc:.2f}%')
 
la = Laplace(model, 'classification', 'all', 'kron')
la.fit(train_loader)

Haven't totally understood the math behind......

Lazy posterior covariance computation only for full on changed prior/sigma_noise

Currently implemented on the level of BaseLaplace but should be moved to a specific setter of FullLaplace where it is actually only required. This can be achieved with a setter decorator @BaseLaplace.prior_precision.setter.
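The setter-decorator pattern mentioned here can be sketched with toy classes. BaseToy and FullToy are hypothetical stand-ins for BaseLaplace and FullLaplace, and a scalar replaces the full Hessian: the subclass overrides only the property setter so that changing the prior precision invalidates its cached posterior covariance, which is then recomputed lazily.

```python
class BaseToy:
    def __init__(self, prior_precision=1.0):
        self._prior_precision = prior_precision

    @property
    def prior_precision(self):
        return self._prior_precision

    @prior_precision.setter
    def prior_precision(self, value):
        self._prior_precision = value


class FullToy(BaseToy):
    def __init__(self, hessian, prior_precision=1.0):
        super().__init__(prior_precision)
        self.hessian = hessian        # scalar stand-in for the full Hessian
        self._posterior_var = None    # cached "covariance"
        self.n_inversions = 0         # counts how often the inversion is redone

    # Reuse the base getter, override only the setter
    @BaseToy.prior_precision.setter
    def prior_precision(self, value):
        self._prior_precision = value
        self._posterior_var = None    # invalidate the cache

    @property
    def posterior_var(self):
        if self._posterior_var is None:
            self.n_inversions += 1
            self._posterior_var = 1.0 / (self.hessian + self._prior_precision)
        return self._posterior_var


la_toy = FullToy(hessian=3.0, prior_precision=1.0)
first = la_toy.posterior_var      # computed once
again = la_toy.posterior_var      # served from the cache
la_toy.prior_precision = 2.0      # invalidates the cache
updated = la_toy.posterior_var    # recomputed lazily
print(first, updated, la_toy.n_inversions)
```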

Integrate low-rank Laplace

See low-rank branch.

Progress bar for Laplace fitting

It would be nice to (have the option to) print a progress bar when fitting the Hessian (e.g. via tqdm).
For small problems this doesn't matter as it's instant anyways, but for larger problems one can wait a considerable time for the fitting and doesn't really know how long it'll take.

Cannot subclass nn.Module

It looks like I am getting an error when I pass in a model that is a subclass of nn.Module.

I am using the following model:

class FeedForward(nn.Module):
    def __init__(self, in_dim, hiddens, out_dim, dropout=0.0):
        super(FeedForward, self).__init__()
        dims = [in_dim] + hiddens + [out_dim]
        layers = []
        for i in range(len(hiddens)):
            start = dims[i]
            end = dims[i+1]
            p = dropout if i < len(dims) - 2 else 0.0
            layer = nn.Linear(start, end)
            if p != 0:
                layers.append(nn.Sequential(layer, nn.ReLU(), nn.Dropout(p=p)))
            else:
                layers.append(nn.Sequential(layer, nn.ReLU()))
        layers.append(nn.Linear(hiddens[-1], out_dim))
        self.layers = nn.Sequential(*layers)

Then I train it:

model = FeedForward(1, [100, 100], 1, dropout=0.0)
lr = 1e-3
optim = torch.optim.Adam([{'params': model.parameters(), 'weight_decay': 1e-4}],
                         lr=lr)
...

Then I get the following error:

la = Laplace(model, 'regression')
la.fit(train_dl)

Truncated Traceback (Use C-c C-$ to view full TB):
/anaconda3/envs/pytorch_hunter/lib/python3.9/site-packages/backpack/extensions/backprop_extension.py in __get_module_extension(self, module)
     97             if self._fail_mode is FAIL_ERROR:
     98                 # PyTorch converts this Error into a RuntimeError for torch<1.7.0
---> 99                 raise NotImplementedError(
    100                     f"Extension saving to {self.savefield} "
    101                     "does not have an extension for "

NotImplementedError: Extension saving to kflr does not have an extension for Module <class 'funcprior.models.FeedForward'>

KMP_DUPLICATE_LIB

Hi,
To make it work, I had to add the following:

import os
os.environ['KMP_DUPLICATE_LIB_OK']='True'

Check the following link:
error 15 initializing-libiomp5md-dll

Priors as loss criteria

Would allow to implement other priors than Gaussian where the attribute .delta or .prior_prec simply returns the second derivative wrt. NN parameters and can be passed into the Laplace class.
For example, Gaussian and Student's t priors are straightforward to implement.

How to use when a regression model predicts mean and variance

Can Laplace support regression models that output a mean and a variance? Thanks for the package!

Decouple predictive from the Laplace class?

Useful for: Users who want to implement custom predictive approximations.

Issue: Currently, the predictive approximation is tightly coupled with the Laplace class. So, if the user wanted to implement a new predictive approximation, they have to dig deep into this class, and it might break something not to mention that it can be confusing.

Proposal:

2-steps predictive interface (function output and link predictives)

class FunctionPredictive:

    def __init__(self, ...):
        ...

    def __call__(self, x):
        ''' Return 2 arrays for means and vars '''
        raise NotImplementedError()


class LinearizedPredictive(FunctionPredictive):

    def __init__(self, laplace_net, ...):
        self.laplace_net = laplace_net
        ...

    def __call__(self, x):
        J = compute_jacobian(self.laplace_net, x)
        return self.laplace_net.map_prediction(x), J.T @ self.laplace_net.covmat @ J


class LinkPredictive:

    def __init__(self, ...):
        ...

    def __call__(self, f_mean, f_var):
        raise NotImplementedError()


class ProbitPredictive(LinkPredictive):

    def __init__(self, ...):
        ...

    def __call__(self, f_mean, f_var):
        return torch.sigmoid(f_mean / torch.sqrt(1 + pi/8 * f_var))

Usage

linearized_pred = LinearizedPred()
probit_pred = ProbitPred()  # Set it to `None` if one does regression
laplace_net = Laplace(..., function_predictive=linearized_pred, link_predictive=probit_pred)
laplace_net.fit(train_loader)
laplace_net(x)  # Prediction using the specified predictives

Likelihood/Loss classes

Probably subclass from torch criteria and keep module parameters for specific library functions.
Additionally could subclass from torch distributions for log probabilities and implement the predictive etc.

Change temperature parameter

Either change name to inv_temperature or implement it as actual temperature.
Currently, increased temperature leads to more concentrated posteriors, so its effect is reversed.

Custom likelihood and data_loader

Hi! Is there any way that we can implement custom likelihood instead of 'regression' and 'classification', and data_loader? I'm trying to use laplace for PINN. So, the negative log-likelihood (loss) and data_loader are slightly different.

My PINN network has two inputs. I faced this issue:

/usr/local/lib/python3.7/dist-packages/laplace/baselaplace.py in fit(self, train_loader)
    120         self.model.eval()
    121
--> 122         X, _ = next(iter(train_loader))
    123         with torch.no_grad():
    124             self.n_outputs = self.model(X[:1].to(self._device)).shape[-1]

ValueError: too many values to unpack (expected 2)

What would you advise in this case? Thanks!

Functional Laplace

Do all computations in kernel space which allows for different approximations.
Interesting for low data and output dimensionality.

Add __all__ to all files

__all__ determines what's imported when using from module import * (https://stackoverflow.com/questions/44834/can-someone-explain-all-in-python), which would be good to use consistently throughout the library.
