juliangehring / bootstrap.jl Goto Github PK

View Code? Open in Web Editor NEW

107.0 3.0 27.0 727 KB

Statistical bootstrapping library for Julia

Home Page: https://juliangehring.github.io/Bootstrap.jl

License: Other

Julia 100.00%

bootstrapping-statistics julia confidence-intervals bootstrapping statistics

bootstrap.jl's People

Contributors

Stargazers

Watchers

bootstrap.jl's Issues

Add documentation for contributors

This could cover

explain how to implement your own bootstrap method
which branch to use for pull requests (because github unfortunately always has to use the "default" branch for all pull requests, and we don't want that here).

Put function arguments first in signatures

For example,

bootstrap(randn(20), mean, BasicSampling(100))

would become

bootstrap(mean, randn(20), BasicSampling(100))

This allows the use of do blocks to construct anonymous functions, e.g.

bootstrap(randn(20), BasicSampling(100)) do x
    sum(x) / length(x)
end

While that's likely not something that will occur that often when bootstrapping, it's conventional in Julia for functions to come first.

Typo in the confidence interval section of the readme

Currently the readme says the following on confidence intervals:

Bootstrap.jl/README.md

Line 85 in 092b1ec

    
           where each `Tuple` is of the form `(statistic_value, upper_confidence_bound, lower_confidence_bound)`.

but I think it should read:

where each Tuple is of the form (statistic_value, lower_confidence_bound, upper_confidence_bound).

Based on

Bootstrap.jl/src/confint.jl

Line 98 in 092b1ec

return t0, lower, upper

Override functions from StatsBase

In the perspective of adding Bootstrap.jl to the list of default packages in Stats.jl (JuliaStats/StatsKit.jl#4), it would be nice to use the confint and stderror functions from StatsBase instead of ci and se, and to add a method to StatsBase.nobs instead of defining a separate function.

Update DataFrames dependency

Thank you for this package! Now that DataFrames.jl has been updated to version 1, could we add this dependency to the Project.toml?

Bump Distributions.jl to 0.25

Would Bootstrap.jl work with Distributions.jl 0.25?

Bootstrap resampling from an arbitrary number of distributions?

Implementing this functionality would would allow you to do things like calculate the mean difference between two distributions:

using Statistics

xs = rand(1000)
ys = rand(1000)

bootstrap((x,y)->mean(x)-mean(y), (xs, ys), BasicSampling(1000)) # mock API

Which would be equivalent(ish) to the following (minus the nice extras that Bootstrap.jl provides):

using Statistics, StatsBase

map(_->mean(sample(xs, size(xs)))-mean(sample(ys, size(ys))), 1:1000)

Block bootstrap

I would like to implement tsboot from the R boot package, do you have any recommendations on where to start (e.g., methods to overload or types to implement)? My understanding from the documentation is that tsboot is just performing a block-bootstrap, so I'm guessing I'd just need to implement a BlockSampling type?

Exact Bootstrap?

Hi all - thanks for building out this awesome project! I have a very small beef with the readme (sorry in advance), in particular the exact bootstrap. I worry it's misleading to tell users this feature is useful in practice, for at least two reasons:

If we have N=10 unique observations, that's 10^10=10 billion resamples! This method is only computationally feasible when N is in the single digits. Even an incredibly simple estimator (i.e. sample mean) and a huge distributed computing environment can't make this work for N>15 or so.
Bootstrap methods all rely on an asymptotic argument that the empirical distribution of the sample 'looks like' the population distribution if the sample is sufficiently large. I don't know any case where resampling from a sample w/ < 10 obs will tell you anything meaningful about the sampling distribution of a statistic. Even in a simple case, like if we observe N=10 draws from a normal distribution (unknown mean and variance) and want to do inference on the mean, a bootstrap on such a small sample won't tell you anything meaningful.

So either the exact bootstrap is computationally feasible but does not answer what you want it to, or it is not computationally feasible.

I see this as a nice thing to have in the library, but I can't see why any practitioner would want to use it.

Host the documentation on github pages

Host the package documentation on github pages, instead of using readthedocs
Update the documentation

is argument signature for draw! too restrictive?

The draw! method for abstract arrays requires that both arguments be the same type, but the destination array is created with copy which doesn't always give you back an AbstractArray of the same type:

julia> typeof(view(1:100, 1:2:100))
SubArray{Int64,1,UnitRange{Int64},Tuple{StepRange{Int64,Int64}},true}

julia> typeof(copy(view(1:100, 1:2:100)))
Array{Int64,1}

outdated compat info at General Registry

Hi,

The entry of this package on general registry is outdated. It doesn't allow for the latest [email protected].

I'm not familiar with the registry mechanism, but it seems the issue is caused by the compat rule in this form

PackageD = ">=0.4"
which does not meet the criteria for automatic merging. Could you update the registry info? Many thanks for this handy package!

Request for documentation: Balanced Sampling

I'm not entirely sure what the balanced sampling bootstrap is doing -- there's several variance reduction techniques it could be referring to, and I'm not sure which one of these it actually is.

Project.toml

Hi all,

Is it my imagination or is Bootstrap.jl missing a Project.toml and Manifest.toml? I'm currently working on wrapping routines from DependentBootstrap.jl using the Bootstrap.jl API and can easily add a Project.toml and Manifest.toml at the same time as part of the (eventual) pull request...

Cheers,

Colin

BCaConfInt can't handle data with same value

Probably just needs a special case to return (t0, t0, t0)?

julia> bci = confint2(bs, BCaConfInt(0.95), 1)[1]
jkt = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
resid = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
alpha = [0.025, 0.975]
z1 = [-2.47862, 1.44131]
z0 = -0.5186569320803911
qn = [-1.95996, 1.95996]
t1 = [1.0, 1.0, 1.0, 0.928571, 1.0, 0.928571, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 0.928571, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.857143, 1.0, 1.0, 0.928571, 0.928571, 1.0, 1.0, 1.0, 1.0, 0.857143, 1.0, 1.0, 1.0, 0.928571, 0.928571, 1.0, 1.0, 1.0, 1.0, 0.928571, 0.928571, 1.0, 0.928571, 0.857143, 0.928571, 1.0, 1.0, 0.928571, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 0.928571, 1.0, 1.0, 0.928571, 0.928571, 1.0, 1.0, 0.928571, 0.928571, 1.0, 0.857143, 1.0, 1.0, 0.857143, 0.785714, 0.928571, 0.928571, 1.0, 1.0, 1.0, 0.857143, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 0.857143, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 0.928571, 0.928571, 0.928571, 1.0, 1.0, 1.0, 0.857143, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 0.928571, 1.0, 1.0, 1.0, 0.928571, 1.0, 0.928571, 1.0, 0.928571, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 0.928571, 0.857143, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 0.928571, 1.0, 1.0, 0.857143, 0.857143, 0.928571, 1.0, 0.928571, 0.928571, 1.0, 0.928571, 0.928571, 1.0, 0.928571, 0.928571, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 0.857143, 1.0, 1.0, 1.0, 0.857143, 1.0, 1.0, 1.0, 0.928571, 0.928571, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 0.928571, 0.928571, 1.0, 0.928571, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 0.857143, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 0.928571, 1.0, 1.0, 0.928571, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 0.857143, 0.928571, 0.928571, 1.0, 0.928571, 0.928571, 0.857143, 1.0, 1.0, 0.928571, 0.928571, 1.0, 0.928571, 0.928571, 0.928571, 1.0, 1.0, 1.0, 0.857143, 1.0, 1.0, 0.857143, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 0.928571, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 0.857143, 1.0, 1.0, 0.928571, 0.928571, 1.0, 0.857143, 1.0, 1.0, 1.0, 0.928571, 1.0, 0.928571, 0.928571, 0.928571, 0.857143, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.857143, 1.0, 0.857143, 1.0, 1.0, 0.857143, 0.928571, 1.0, 0.928571, 1.0, 1.0, 0.928571, 1.0, 1.0, 0.857143, 1.0, 0.928571, 1.0, 1.0, 0.928571, 0.928571, 0.928571, 0.928571, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 0.928571, 0.857143, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 0.928571, 0.928571, 0.928571, 1.0, 1.0, 0.928571, 1.0, 0.928571, 1.0, 0.928571, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 0.928571, 0.928571, 0.928571, 0.928571, 1.0, 1.0, 1.0, 0.928571, 1.0, 0.928571, 1.0, 0.928571, 0.857143, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 0.928571, 0.857143, 1.0, 1.0, 1.0, 0.928571, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 0.928571, 0.928571, 0.928571, 1.0, 0.928571, 0.928571, 0.928571, 0.928571, 1.0, 0.928571, 0.928571, 0.928571, 1.0, 1.0, 0.857143, 0.928571, 1.0, 0.857143, 0.857143, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 0.928571, 1.0, 0.928571, 0.928571, 1.0, 0.928571, 1.0, 1.0, 0.928571, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 0.928571, 0.928571, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 0.928571, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 0.928571, 0.928571, 0.928571, 1.0, 1.0, 0.928571, 1.0, 1.0, 0.928571, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 0.928571, 0.928571, 1.0, 1.0, 0.928571, 1.0, 0.928571, 0.928571, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 0.928571, 1.0, 0.928571, 1.0, 0.928571, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.857143, 0.928571, 1.0, 0.928571, 0.857143, 0.928571, 0.857143, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 0.928571, 1.0, 0.857143, 0.857143, 0.928571, 0.928571, 1.0, 1.0, 0.928571, 1.0, 0.928571, 1.0, 0.928571, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 0.928571, 0.928571, 1.0, 1.0, 0.928571, 1.0, 1.0, 0.857143, 1.0, 0.928571, 1.0, 0.857143, 1.0, 1.0, 1.0, 0.928571, 1.0, 0.928571, 1.0, 1.0, 1.0, 0.928571, 0.857143, 0.857143, 1.0, 0.928571, 1.0, 0.928571, 1.0, 0.928571, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 0.928571, 1.0, 0.928571, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 0.857143, 1.0, 1.0, 1.0, 1.0, 0.928571, 0.857143, 0.928571, 1.0, 1.0, 1.0, 0.857143, 1.0, 0.928571, 0.928571, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 0.928571, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 0.928571, 0.928571, 0.928571, 0.928571, 1.0, 1.0, 1.0, 1.0, 0.857143, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.857143, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 0.928571, 1.0, 1.0, 0.928571, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 0.785714, 0.928571, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 0.857143, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 0.785714, 1.0, 1.0, 0.928571, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 0.928571, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 0.857143, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 0.928571, 1.0, 0.928571, 1.0, 1.0, 0.857143, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 1.0, 1.0, 1.0, 0.928571, 1.0, 0.928571, 1.0, 0.928571, 1.0, 1.0, 0.928571, 0.928571, 0.928571, 1.0, 1.0, 1.0, 1.0, 1.0, 0.928571, 0.928571, 1.0, 1.0, 1.0, 1.0]
zalpha = [NaN, NaN]
ERROR: InexactError: trunc(Int64, NaN)
Stacktrace:
 [1] trunc at ./float.jl:693 [inlined]
 [2] floor at ./float.jl:355 [inlined]
 [3] _quantilesort!(::Array{Float64,1}, ::Bool, ::Float64, ::Float64) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Statistics/src/Statistics.jl:839
 [4] #quantile!#45(::Bool, ::Function, ::Array{Float64,1}, ::Array{Float64,1}, ::Array{Float64,1}) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Statistics/src/Statistics.jl:811
 [5] #quantile! at ./none:0 [inlined]
 [6] #quantile!#46 at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Statistics/src/Statistics.jl:819 [inlined]
 [7] #quantile! at ./none:0 [inlined]
 [8] #quantile#52(::Bool, ::Function, ::Array{Float64,1}, ::Array{Float64,1}) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Statistics/src/Statistics.jl:913
 [9] quantile(::Array{Float64,1}, ::Array{Float64,1}) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Statistics/src/Statistics.jl:913
 [10] confint2(::NonParametricBootstrapSample{Array{Float64,1}}, ::BCaConfInt, ::Int64) at ./REPL[75]:16
 [11] top-level scope at none:0

Julia 1.0

Any plans on making the package Julia v1.0 compatible?
Thanks!

Feature to retrieve all sampled results so that a histogram can be obtained

Please forgive me if this feature exists, but is there a way to easily pull a list of the resampled outputs so that I can create a histogram of the values predicted? I'd like to get a convincing visual figure that shows I've used enough samples.

Allow passing an RNG

This can be important numerical reproducibility and for tests. This way one could use e.g. a StableRNG to make sure they get the exact same results across Julia versions, etc.

Upgrade `Formulas` to `StatsModels` v0.6.0+

StatsModels v0.6.0 introduced a breaking redesign of its formulas which makes parts of Bootstrap incompatible with the latest versions of the package (see #52). Future versions of Bootstrap should adapt the new formula interface and become compatible with the lastest versions of StatsModels.

Distributions.jl dependency

Thank you for this great package. Would it be possible set to a proper [compat] entry in Project.toml or remove Distributions.jl from the dependency list? As is, Bootstrap leads to my Distributions.jl version being downgraded:

(@v1.5) pkg> add Bootstrap
  Resolving package versions...
  Installed StatsFuns ─ v0.9.6
  Installed Requires ── v1.1.1
  Installed RCall ───── v0.13.10
  Installed Revise ──── v3.1.8
Updating << path removed >>
  [e28b5b4c] + Bootstrap v2.2.0
  [31c24e10] ↓ Distributions v0.24.3 ⇒ v0.23.12

Time-series bootstrapping

Hi all,

I noticed this package listed in the recent announcement for StatsKit. Reading through the docs, it sounds like this package is likely to be the default Julia package for bootstrapping. As it happens, I've registered a fairly complete package for bootstrapping time-series (which is mainly block bootstrapping procedures and block length selection procedures). Link here

Are there any good reasons to try and merge some of the functionality from that package into this one? Alternatively, do any of you think it would useful for me to add an interface to the code in my package that matches the interface here, so frequent users of this package can quickly switch across to my package for any time-series needs without having to alter their code? Or should I just leave things as they are for now?

Any feedback is appreciated. I use Julia a lot, but haven't really been keeping my finger on the pulse of the package ecosystem.

Cheers,

Colin

Feature Suggestion: Bayesian Bootstrap

The Bayesian bootstrap simulates an approximation for the posterior distribution of a parameter by assuming that only the observed values are possible, and the relative frequency of each observed value follows a Dirichlet distribution with α_i = 1 for all i -- i.e. the possible frequencies are uniformly distributed. Here's some code I used, which should be pretty easy to adapt to agree with the API used in this package. Dirichlet is taken from Distributions.jl.

function bayesian_bootstrap(sample::T, bootstrap_count::Integer, statistic::Function) where {T<:AbstractArray}
    data_size = length(pointwise_ev)
    weights = similar(pointwise_ev, (bootstrap_count, data_size))
    for i in 1:bootstrap_count
        weights[i, :] .= rand(Dirichlet(ones(data_size)), bootstrap_count)
    return statistic.(sample, each_col(weights))
end

join JuliaStats ?

thanks for this! looks great.

any reason not to place this package in the JuliaStats group? would be easier to find.

Plase bump `StatsBase` compat bound

Hi, it seems that the package has not yet added a compat entry for [email protected]. If this package is still being maintained, could we bump this up?

Interquartile range

I suggest adding the iqr to the estimators.

Bootstrapping a function with 2 inputs?

Is the package currently able to do bootstrap with functions that take 2 inputs? A concrete example would be getting bootstrap intervals of a correlation coefficient based on x, y data.

I tried to give it a go with how I imagined it would work (below). It 'works' in that it doesn't give an error, but I'm unable to interpret the output and figure out if it has done anything meaningful.

using Distributions
using Statistics
using Bootstrap

N = 100
personality = rand(Normal(), N)
looks = rand(Normal(), N)

bs = bootstrap(cor, [personality  looks], BasicSampling(1000))
bci = confint(bs, BasicConfInt(0.95))

bs gives...

Bootstrap Sampling
  Estimates:
    │ Var │ Estimate  │ Bias        │ StdError  │
    │     │ Float64   │ Float64     │ Float64   │
    ├─────┼───────────┼─────────────┼───────────┤
    │ 1   │ 1.0       │ 0.0         │ 0.0       │
    │ 2   │ 0.0308112 │ -0.00386273 │ 0.0986716 │
    │ 3   │ 0.0308112 │ -0.00386273 │ 0.0986716 │
    │ 4   │ 1.0       │ 0.0         │ 0.0       │
  Sampling: BasicSampling
  Samples:  1000
  Data:     Array{Float64,2}: { 100 × 2 }

and bci is

((1.0, 1.0, 1.0), (0.030811242352543605, -0.15276601303154966, 0.2214372656540179), (0.030811242352543605, -0.15276601303154966, 0.2214372656540179), (1.0, 1.0, 1.0))

[question] how do you control or limit the sample size?

Correct me if Im mistaken in my understanding of bootstrapping, I'm going to basically use this as a more robust version of jackknifing.

I have data that's used to fit a model and I want to apply bootstrapping to determine to confidence of 2 or 3 fitted parameters. The data usually ranges from 5-13 data points and the model will not fit less than 3 or so.

Is there a way to ensure all resampling sets contain at least a certain number of data points?
How do we check or specify this?
What if I'd like to resample some data set of size M to many sets of size N where N<M?

Confidence Interval Output

Would it be possible to document better the Confidence Interval Output? I was expecting a Tuple with 2 values like in HypothesisTests, but I don't understand how should I interpret the third value.

fcns of arrays

I'd like to be able to use boot with array data, not just vectors, something like:

dat = randn(5000, 3) #observations by row
muhat = mean(dat, 1)
f(boot_sample) = norm(mean(boot_sample) - muhat)

boot(dat, f, 100)

Unfortunately, boot is only defined for data which is an abstract vector. Is there a fast way to change my data to a vector of vectors? Would ideally want to do it "in place" to save on the memory, too.

Even better, it feels like this sort of functionality should be generically supported by the package, somehow... Thoughts?

Increased Modularity/Composability

Perhaps a long-term goal since it would require quite a bit of work and might involve some breaking changes, but I'd like it if bootstrap estimates had greater modularity and composability. For instance, there's no particular reason that variance reduction techniques, like antithetic sampling, have to exclude techniques like the maximum entropy bootstrap for time series. I think you could break these features down into four separate questions:

What you're resampling (is this a maximum entropy bootstrap, block bootstrap, standard bootstrap, wild bootstrap, etc.)
Variance reduction techniques (control variates, antithetic. Maybe in the future things like drawing QMC rather than Monte Carlo samples or importance sampling?)
Residual/percentile (are you trying to generate the sampling distribution or the error distribution)?
Bootstrap distribution transformations -- e.g. BCa, studentizing, double bootstrap.
So you could have something like:

bootstrap(data, BasicSampling, MaxEnt, BCa)

(The reason I mention 4 is that sometimes users may want to see a full bootstrap distribution, which might include a desire for corrections. I think this is very good practice and should be encouraged, since only calculating a 95% confidence interval can lead people into the trap of discounting outcomes outside the 95% interval as "basically impossible.")

juliangehring / bootstrap.jl Goto Github PK

bootstrap.jl's People

Contributors

Stargazers

Watchers

Forkers

bootstrap.jl's Issues

Recommend Projects

Recommend Topics

Recommend Org