turinglang / advancedhmc.jl
Robust, modular and efficient implementation of advanced Hamiltonian Monte Carlo algorithms
Home Page: https://turinglang.org/AdvancedHMC.jl/
License: MIT License
@yebai It seems that I don't have access to edit the description of the repo after the transfer. Maybe we want something like "Efficient building blocks for modern HMC samplers"?
As this package mainly aims to provide robust building blocks for HMC (instead of providing a robust sample interface directly), maybe we want to rename it? @yebai @trappmartin
Currently an initialised M⁻¹ needs to be passed in. We probably want a constructor without it that initialises M⁻¹ as the identity.
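A minimal sketch of what such a constructor could look like (the type name and constructor here are assumptions, not the package's API):
using LinearAlgebra: I

# Hypothetical sketch: a dense metric whose convenience constructor
# defaults the inverse mass matrix to the identity.
struct DenseEuclideanMetric{T<:Real}
    M⁻¹::Matrix{T}
end
DenseEuclideanMetric(dim::Int) = DenseEuclideanMetric(Matrix{Float64}(I, dim, dim))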
Neal, R. M. (2011). MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2(11), 2.
Betancourt, M. (2017). A Conceptual Introduction to Hamiltonian Monte Carlo. arXiv preprint arXiv:1701.02434.
Girolami, M., & Calderhead, B. (2011). Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(2), 123-214.
Betancourt, M. J., Byrne, S., & Girolami, M. (2014). Optimizing the integrator step size for Hamiltonian Monte Carlo. arXiv preprint arXiv:1411.6669.
Betancourt, M. (2016). Identifying the optimal integration time in Hamiltonian Monte Carlo. arXiv preprint arXiv:1601.00225.
Hoffman, M. D., & Gelman, A. (2014). The No-U-Turn Sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1), 1593-1623.
Currently the verbosity is disabled during tests for clean test output, so show() functions are not tested at all. We may want to include tests for them explicitly.
We currently have a method for each AbstractMetric
which enables us to make another instance of the metric of the following form:
metric = ...
new_metric_of_same_type = metric(inverse_mass_matrix)
Why do we need such a method?
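For context, such a method might look like the following functor-style sketch (reusing the hypothetical DenseEuclideanMetric above; this illustrates the pattern, not the package's actual implementation):
# Hypothetical functor method: rebuild a metric of the same type
# from a freshly adapted inverse mass matrix.
(m::DenseEuclideanMetric)(M⁻¹::AbstractMatrix) = DenseEuclideanMetric(Matrix(M⁻¹))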
The tag name "0.1.1" is not of the appropriate SemVer form (vX.Y.Z).
cc: @xukai92
Hi @xukai92, I'd like to start a repository heavily based on your code, but with some modifications that lead to incompatibility. I don't see a license file. Is there one you prefer? I'd like to use MIT or BSD if possible. Thanks!
PhasePoint type #17 #50
find_good_eps #27
Adapter types #28
AbstractMetrics #45
Metric types by using AbstractPDMat #34
We should try to have a modular interface for different momentum refreshment methods (e.g. partial refreshments in Neal 2011); see the sketch below.
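A minimal sketch of such an interface, assuming an identity metric (all names here are hypothetical):
using Random

# Partial momentum refreshment (Neal, 2011): r′ = γ·r + sqrt(1 - γ²)·n
# with n ~ N(0, I); γ = 0 recovers full refreshment.
struct PartialMomentumRefreshment{T<:AbstractFloat}
    γ::T
end

refresh(rng::AbstractRNG, ref::PartialMomentumRefreshment, r::AbstractVector) =
    ref.γ .* r .+ sqrt(1 - ref.γ^2) .* randn(rng, length(r))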
The NUTS sampler computes the Cholesky decomposition on the fly for each Hamiltonian step. This is likely one of the performance bottlenecks for NUTS, since the Cholesky decomposition is a costly operation and should be computed only once adaptation is done.
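One possible fix, sketched below with hypothetical names: factorise once whenever the mass matrix is updated, and reuse the cached factorisation at every step.
using LinearAlgebra

# Cache the Cholesky factor alongside the inverse mass matrix;
# recompute it only when M⁻¹ itself changes.
struct CholeskyCachedMetric{T<:Real}
    M⁻¹::Matrix{T}
    cholM⁻¹::Cholesky{T,Matrix{T}}
end
CholeskyCachedMetric(M⁻¹::Matrix) = CholeskyCachedMetric(M⁻¹, cholesky(Symmetric(M⁻¹)))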
Copied from @yebai's comment in #23 (comment): consider moving
r = rand_momentum(rng, h)
H = hamiltonian_energy(h, θ, r)
out of find_good_eps
and wrapping
θ′, r′, _is_valid = step(Leapfrog(ϵ′), h, θ′, r′)
H_new = _is_valid ? hamiltonian_energy(h, θ′, r′) : Inf
into a function
function A(ϵ, h, θ, r)
    θ′, r′, _is_valid = step(Leapfrog(ϵ), h, θ, r)
    return _is_valid ? hamiltonian_energy(h, θ′, r′) : Inf
end
Then, we can drop the dependency on AdvancedHMC from this function: find_good_eps(h::Hamiltonian, θ::AbstractVector{T}, A::Function; max_n_iters::Int=100), and move it into adaption/stepsize.jl.
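With that in place, the call site could pass A as a closure over the Hamiltonian, e.g. (hypothetical usage matching the proposed signature):
# A closes over h, so the step-size search itself stays generic.
ϵ₀ = find_good_eps(h, θ, (ϵ, θ, r) -> A(ϵ, h, θ, r))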
I just came across the error message at:
AdvancedHMC.jl/src/hamiltonian.jl (line 33 in 0814ae1)
This error message should be more informative so that the user gets an idea of why they get it; in my case I suppose the model I'm looking at is misspecified. It would be good to know what is Inf.
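For instance, the check could report the offending quantities, along the lines of this sketch (the function name and message wording are assumptions):
# Sketch of a more informative energy check.
function check_energy(H::Real, θ, r)
    isfinite(H) || error("Hamiltonian energy is $H (θ = $θ, r = $r); " *
        "this usually indicates numerical overflow in the log density " *
        "or a misspecified model.")
    return H
end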
We should consider adding the following: https://arxiv.org/abs/1905.11916
Maybe we should make the sample function not return those samples by default (and support an optional keyword for it)?
Discussions in #5 (comment)
@mohamed82008 I reverted some functions related to pre-allocating memory so that only intermediate variables are pre-allocated and all functions remain immutable. This is to avoid unexpected behaviour due to mutability. We probably want to implement mutable versions of functions that pre-allocate memory for return variables explicitly later. Does that sound good to you?
Sorry, I missed this question. I think it would be nice to support mutable and immutable array types, e.g. StaticArrays.SArray and StaticArrays.MArray or Vector. I saw Chris mention somewhere that DiffEq does this by defining 2 versions of each function, one for mutable arrays and one for immutable. We might want to explore something like this in a future PR. The mutating version takes as its first argument the pre-allocated output vector, whereas the non-mutating version doesn't. Supporting StaticArrays would be an interesting direction to pursue to gain some speedup for parameter vectors of size <1000 or so. Small arrays are the sweet spot of StaticArrays, which puts ordinary arrays to shame in many cases.
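A minimal sketch of that two-method pattern applied to a leapfrog step (the toy gradient and function names are assumptions):
using StaticArrays

∇logπ(θ) = -θ  # toy standard-normal gradient, for illustration only

# Non-mutating version, suited to immutable arrays such as SVector.
function leapfrog(θ::SVector, r::SVector, ϵ)
    r_half = r + ϵ / 2 * ∇logπ(θ)
    θ′ = θ + ϵ * r_half
    r′ = r_half + ϵ / 2 * ∇logπ(θ′)
    return θ′, r′
end

# Mutating version: writes the results into pre-allocated θ′ and r′.
function leapfrog!(θ′::Vector, r′::Vector, θ::Vector, r::Vector, ϵ)
    @. r′ = r + ϵ / 2 * ∇logπ(θ)
    @. θ′ = θ + ϵ * r′
    @. r′ = r′ + ϵ / 2 * ∇logπ(θ′)
    return θ′, r′
end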
At the moment, we are using the following notations:
logπ - log target distribution
∂logπ∂θ - gradient
prop - transitioning kernel
θ_init - initial value for sampling states / parameters
I think this can be improved by:
∂logπ∂θ ==> ∇logπ
prop ==> κ (kappa) or τ (tau) - MH transitioning kernel
θ - parameters, r - momentum
θ_init ==> θ₀ (theta_{0}), r_init ==> r₀ (r_{0})
z := (θ, r) - (position, momentum) pair
Any other notations missing?
I was speaking with Kai about this and we agreed it would be good to be able to identify divergence of a particular transition (Sec. 6.2 in here), allowing us to identify pathologies which may bias the posterior sample.
Basically those appear in the README.md for now.
Approximation error of leapfrog integration (i.e. accumulated Hamiltonian energy error) can sometimes explode, for example when the curvature of the current region is very high. This type of approximation error is sometimes called divergence [1] since it shifts a leapfrog simulation away from the correct solution. In Turing, this type of error is currently caught by a relatively ad-hoc function called is_valid:
AdvancedHMC.jl/src/integrator.jl (line 7 in 734c0fa)
is_valid can catch cases where one or more elements of the parameter vector are either NaN or Inf. This has several drawbacks:
AdvancedHMC.jl/src/integrator.jl (line 26 in 734c0fa)
AdvancedHMC.jl/src/integrator.jl (line 45 in 734c0fa)
Therefore, we might want to refactor the current code a bit for a more robust mechanism for handling leapfrog approximation errors. Perhaps we can learn from the DynamicHMC implementation:
https://github.com/tpapp/DynamicHMC.jl/blob/master/src/buildingblocks.jl#L168
[1]: Betancourt, M. (2017). A Conceptual Introduction to Hamiltonian Monte Carlo. arXiv preprint arXiv:1701.02434.
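A more robust check would compare energies rather than inspect the parameter vector, e.g. this sketch (the threshold Δ_max = 1000 follows the common NUTS default and is an assumption here):
# Flag a divergent transition via the Hamiltonian energy error.
is_divergent(H0::Real, H′::Real; Δ_max::Real = 1000.0) =
    !isfinite(H′) || (H′ - H0 > Δ_max)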
update is doing unnecessary decomposition of matrix (#24)
Both this package and Distributions.jl define a sample function, as opposed to this package extending Distributions.jl's sample function with new methods. This causes a name clash which, given that these packages will likely be used together a lot, is annoying. Should we maybe extend Distributions.jl's sample rather than define our own?
Moved from #69
#################################
### Hong's abstraction starts ###
#################################
###
### Create a `Termination` type for each `Trajectory` type, e.g. HMC, NUTS etc.
### Merge all `Trajectory` types, and make `transition` dispatch on `Termination`,
### such that we can overload `transition` for different HMC samplers.
### NOTE: stopping criteria, max_depth::Int, Δ_max::AbstractFloat, n_steps, λ
###
"""
Abstract type for termination.
"""
abstract type AbstractTermination end
# Termination type for HMC and HMCDA
struct StaticTermination{D<:AbstractFloat} <: AbstractTermination
n_steps :: Int
Δ_max :: D
end
# NoUTurnTermination
struct NoUTurnTermination{D<:AbstractFloat} <: AbstractTermination
max_depth :: Int
Δ_max :: D
# TODO: add other necessary fields for No-U-Turn stopping criteria.
end
struct Trajectory{I<:AbstractIntegrator} <: AbstractTrajectory{I}
integrator :: I
n_steps :: Int # Counter for total leapfrog steps already applied.
Δ :: AbstractFloat # Current Hamiltonian energy minus starting Hamiltonian energy
# TODO: replace all `*Trajectory` types with `Trajectory`.
# TODO: add turn statistic, divergent statistic, proposal statistic
end
isterminated(
x::StaticTermination,
τ::Trajectory
) = τ.n_steps >= x.n_steps || τ.Δ >= x.Δ_max
# Combine trajectories, e.g. those created by the build_tree algorithm.
# NOTE: combine proposal (via slice/multinomial sampling), combine turn statistic,
# and combine divergent statistic.
combine_trajectory(τ′::Trajectory, τ′′::Trajectory) = nothing # To-be-implemented.
## TODO: move slice variable `logu` into `Trajectory`?
combine_proposal(τ′::Trajectory, τ′′::Trajectory) = nothing # To-be-implemented.
combine_turn(τ′::Trajectory, τ′′::Trajectory) = nothing # To-be-implemented.
combine_divergence(τ′::Trajectory, τ′′::Trajectory) = nothing # To-be-implemented.
transition(
τ::Trajectory{I},
h::Hamiltonian,
z::PhasePoint,
t::T
) where {I<:AbstractIntegrator,T<:AbstractTermination} = nothing
###############################
### Hong's abstraction ends ###
###############################
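As a reference point for the combine_proposal stub above, a minimal multinomial combination could look like the following sketch (argument names, the weight convention, and the StatsFuns dependency are assumptions):
using Random
using StatsFuns: logaddexp

# Multinomial combination of two sub-trajectory proposals:
# keep z′′ with probability w′′ / (w′ + w′′), where logw is the log weight.
function combine_proposal(rng::AbstractRNG, z′, logw′, z′′, logw′′)
    return log(rand(rng)) < logw′′ - logaddexp(logw′, logw′′) ? z′′ : z′
end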
Time used for collecting 2 million samples using the following:
using DynamicHMC, Turing, Test
@model gdemo(x, y) = begin
s ~ InverseGamma(2,3)
m ~ Normal(0,sqrt(s))
x ~ Normal(m, sqrt(s))
y ~ Normal(m, sqrt(s))
return s, m
end
mf = gdemo(1.5, 2.0)
@time chn1 = sample(mf, DynamicNUTS(2000000));
@time chn2 = sample(mf, Turing.NUTS(2000000, 0.65));
The REQUIRE file could not be found.
cc: @yebai
migrated from TuringLang/Turing.jl#571
As mentioned on page 59 in [1], multinomial sampling instead of slice sampling may provide an improvement in overall performance.
@trappmartin pointed out that we should aim to support GPU via CuArrays. I believe this is possible and a good idea.
migrated from TuringLang/Turing.jl#268
As requested by @jonasmac16, it would be nice to have this implemented; see Section 5.7 of Neal, 2012.
After the RF this might be an interesting extension of the current code, especially RMHMC with SoftAbs Metric.
Returning the value doesn't introduce any cost, and it would save some computation in places where the value is needed right after gradient evaluation.
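One way to get this essentially for free with ForwardDiff (a sketch; the helper name is hypothetical):
using ForwardDiff, DiffResults

# Evaluate logπ and its gradient in one pass, returning both.
function value_and_gradient(logπ, θ::AbstractVector)
    res = DiffResults.GradientResult(θ)
    ForwardDiff.gradient!(res, logπ, θ)
    return DiffResults.value(res), DiffResults.gradient(res)
end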
The following 3 types in src/metric.jl can be merged into one type using AbstractPDMat:
struct EuclideanMetric{T<:Real,A<:AbstractPDMat{T}} <: AbstractMetric{T}
M⁻¹ :: A # contains dim, M⁻¹ and its Cholesky decomposition.
# Pre-allocation for intermediate variables
_temp :: A
end
Reference: https://github.com/JuliaStats/PDMats.jl
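For reference, the three cases would map naturally onto PDMats types, e.g. (a sketch; the correspondence to the existing metric types is an assumption):
using LinearAlgebra, PDMats

d = 5
M⁻¹_unit  = ScalMat(d, 1.0)             # unit metric
M⁻¹_diag  = PDiagMat(ones(d))           # diagonal metric
M⁻¹_dense = PDMat(Matrix(1.0I, d, d))   # dense metric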
This might be interesting to add at some point.
The REQUIRE file could not be found.
cc: @xukai92
Will do after #31 gets merged.
Copied from @yebai's comments in #23 (comment): can we move adaptation/Adaptation.jl to src/adaptation.jl, or use adaptation/Adaptation.jl from AdvancedHMC directly?
Copied from @yebai's comments in #23 (comment):
consider merging the following three functions
getss(fss::FixedStepSize)
getss(da::DualAveraging)
getss(mssa::ManualSSAdapter)
into getϵ(a::StepSizeAdapter) = a.ϵ, or more generally getϵ(a::AbstractAdapter) = a.ϵ.
Copied from @yebai's comments in #23 (comment): consider removing ManualSSAdapter and related code, since we can always use FixedStepSize.
Copied from @yebai's comments in #23 (comment): can we make DualAveraging an immutable type, and let adapt_stepsize!, adapt! and finalize! always return a new DualAveraging instance? This can improve both readability and performance, since immutable types are allocated on the stack. If so, we can also remove reset! and merge
adapt!(da::DualAveraging, θ::AbstractVector{<:AbstractFloat}, α::AbstractFloat)
adapt_stepsize!(da::DualAveraging, α::AbstractFloat)
into
adapt(d::AbstractAdaptor, α::AbstractFloat=1.0, θ::AbstractVector{<:AbstractFloat}=zeros(0))
We can probably do similar things for Metric adaptors; a sketch of the immutable version follows.
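A sketch of the immutable update, following the dual-averaging recursion of Hoffman & Gelman (2014) (field names are assumptions):
struct DualAveraging{T<:AbstractFloat}
    γ::T; t₀::T; κ::T; δ::T   # free parameters of the scheme
    μ::T                      # typically log(10·ϵ₀)
    logϵ::T; logϵ_bar::T; H_bar::T
    m::Int                    # iteration counter
end

# Pure update: returns a new instance instead of mutating.
function adapt(da::DualAveraging, α::Real)
    m = da.m + 1
    H_bar = (1 - 1 / (m + da.t₀)) * da.H_bar + (da.δ - α) / (m + da.t₀)
    logϵ = da.μ - sqrt(m) / da.γ * H_bar
    logϵ_bar = m^(-da.κ) * logϵ + (1 - m^(-da.κ)) * da.logϵ_bar
    return DualAveraging(da.γ, da.t₀, da.κ, da.δ, da.μ, logϵ, logϵ_bar, H_bar, m)
end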
Copied from @yebai's comments in #23 (comment): consider making NaiveCompAdaptor and ThreePhaseAdaptor direct subtypes of AbstractAdapter, since AbstractCompositeAdapter is only used to define
getM⁻¹(ca::AbstractCompositeAdapter)
getss(ca::AbstractCompositeAdapter)
update(h::Hamiltonian, p::AbstractProposal, a::Adaptation.AbstractCompositeAdapter)
These functions can be defined for AbstractAdaptor directly:
getM⁻¹(a::AbstractAdapter) = a.var
getss(a::AbstractAdapter) = a.ϵ
update(h::Hamiltonian, p::AbstractProposal, a::Adaptation.AbstractCompositeAdapter)
Then we can overload these functions for subtypes of AbstractAdaptor:
getM⁻¹(a::ThreePhaseAdapter) = getM⁻¹(a.pc)
getss(a::ThreePhaseAdapter) = getss(a.ssa)
getM⁻¹(dpc::UnitPreConditioner) = nothing (perhaps return 1.0 instead of nothing)
getM⁻¹(dpc::DensePreConditioner) = dpc.covar
Related: TuringLang/Turing.jl#641