Model uncertainty
Model (or epistemic) uncertainty captures uncertainty in the model parameters. It is higher in regions with little or no training data and lower in regions with more training data; given enough training data, model uncertainty can therefore be explained away.
Bayesian Neural Network
- Place a prior distribution (e.g., Gaussian) over the model weights w. By Bayes' rule, we obtain a posterior distribution over the weights, p(w|D), instead of a point estimate of w.
- Bayesian prediction: p(y*|x*, D) = E_{w~p(w|D)}[p(y*|x*, w)] = ∫ p(y*|x*, w) p(w|D) dw, i.e., marginalize over the posterior (so-called marginalization, or Bayesian model averaging). In practice the integral cannot be computed exactly, so we sample several w's from p(w|D) and average the resulting predictions (i.e., "ensembles", or approximate Bayesian marginalization); see the sketch after this list.
- Bayesian inference: computing an analytical solution for p(w|D) is intractable, so we approximate p(w|D) using variational inference (i.e., minimize KL(q_θ(w) || p(w|D))).
- Advantages: robustness to over-fitting; model uncertainty quantification.
- Disadvantages: computationally expensive (variational inference is needed to learn the parameters); the number of parameters doubles (learning a weight becomes learning its mean and variance); training takes longer to converge.
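A minimal sketch of the sampling-and-averaging step above, assuming a mean-field Gaussian variational posterior over the weights of a single linear layer; mu, rho, and their placeholder values are assumptions of this sketch, not taken from the paper:

```python
import torch

# Mean-field Gaussian variational posterior q_theta(w) = N(mu, sigma^2)
# over the weights of one linear layer. `mu` and `rho` stand in for learned
# variational parameters (hypothetical placeholder values here);
# sigma = softplus(rho) keeps the standard deviation positive.
mu = torch.zeros(10, 1)
rho = torch.full((10, 1), -3.0)

def sample_prediction(x):
    """One forward pass with weights drawn from q_theta(w) (reparameterized)."""
    sigma = torch.nn.functional.softplus(rho)
    w = mu + sigma * torch.randn_like(mu)  # w ~ q_theta(w)
    return x @ w

x_star = torch.randn(5, 10)  # a batch of test inputs
samples = torch.stack([sample_prediction(x_star) for _ in range(100)])
mean_pred = samples.mean(dim=0)     # approximate Bayesian model average
epistemic_var = samples.var(dim=0)  # spread across sampled weights = model uncertainty
```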
Mathematical Findings: Dropout as Bayesian Approximation
A neural network of arbitrary depth and with arbitrary non-linearities, with dropout applied before every weight layer, is mathematically equivalent to an approximation of the probabilistic deep Gaussian process (marginalized over its covariance function parameters). Note: Gaussian processes (GPs) model distributions over functions. The findings carry over to other variants of dropout as well (e.g., drop-connect, multiplicative Gaussian noise).
The dropout objective minimizes the KL divergence between an approximate distribution and the posterior of a deep Gaussian process (marginalized over its finite-rank covariance function parameters); in other words, the dropout objective is the same as a variational inference objective!
A deep GP can be approximated by placing a variational distribution (i.e., the approximate distribution q(w) to the posterior distribution p(w|X, Y)) over each component of a spectral decomposition of the GP's covariance functions. This spectral decomposition maps each layer of the deep GP to a layer of explicitly represented hidden units.
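As a hedged sketch of this equivalence (notation loosely follows the paper; E denotes the per-example loss, e.g., squared error, and λ the weight-decay coefficient, both assumptions of this sketch), the familiar dropout training objective

```latex
\mathcal{L}_{\text{dropout}}
  = \frac{1}{N} \sum_{i=1}^{N} E\big(y_i, \hat{y}_i\big)
  + \lambda \sum_{l=1}^{L} \big( \lVert W_l \rVert_2^2 + \lVert b_l \rVert_2^2 \big)
```

matches, up to rescaling and additive constants, the variational objective KL(q(w) || p(w|X, Y)), where q(w) zeroes rows of each weight matrix W_l with that layer's dropout probability.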
Obtaining Model Uncertainty by "MC Dropout"
By performing T stochastic (dropout-enabled) forward passes through the network, we can take the variance of the T predictions as model uncertainty; the average of the T predictions can be viewed as an ensemble prediction.
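A minimal MC-dropout sketch in PyTorch (the framework choice is an assumption, not the paper's reference implementation). The key detail is keeping the Dropout modules stochastic at test time:

```python
import torch

# Toy network with dropout applied before a weight layer.
model = torch.nn.Sequential(
    torch.nn.Linear(10, 64), torch.nn.ReLU(),
    torch.nn.Dropout(p=0.5),
    torch.nn.Linear(64, 1),
)

def mc_dropout_predict(model, x, T=100):
    """T stochastic forward passes; returns ensemble mean and model uncertainty."""
    model.eval()
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()  # keep dropout active at test time
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(T)])
    return preds.mean(dim=0), preds.var(dim=0)

x_star = torch.randn(5, 10)
mean, epistemic_var = mc_dropout_predict(model, x_star, T=100)
```

Flipping only the Dropout modules into training mode (rather than the whole model) keeps any batch-norm statistics in evaluation mode.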
Determining the best dropout rate
The best dropout rate can be found with a simple grid search: make average predictions from MC dropout at several dropout rates, and pick the rate whose average predictions perform best (please refer to the code).
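A hedged sketch of that grid search, reusing mc_dropout_predict from the previous sketch; train_model, x_val, and y_val are hypothetical stand-ins for a training routine and held-out validation data:

```python
import torch

# Pick the dropout rate whose MC-dropout average prediction scores best
# on validation data. `train_model(p)` is a hypothetical routine that
# trains the network with dropout rate p; `x_val`, `y_val` are hypothetical.
best_p, best_err = None, float("inf")
for p in (0.05, 0.1, 0.2, 0.3, 0.5):  # candidate dropout rates
    model = train_model(p)            # hypothetical training routine
    mean, _ = mc_dropout_predict(model, x_val, T=100)
    err = torch.mean((mean - y_val) ** 2).item()  # error of the MC average
    if err < best_err:
        best_p, best_err = p, err
print(f"best dropout rate: {best_p}")
```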
Different non-linearities result in different uncertainty estimates
Dropout's uncertainty draws its properties from the underlying GP, in which different covariance functions correspond to different uncertainty estimates. ReLU and tanh non-linearities approximate different GP covariance functions (see Appendix 3.1).