Comments (8)
I assume that you would want to keep the original pairing between `x` and `y` in your use case? I.e. if you pick `x[i]` you also want to pick `y[i]` in a sampling step?
from bootstrap.jl.
If the data is an `Array`, the implementation will treat the rows as observations and the columns as variables, analogous to a `DataFrame`. The `bootstrap` step will sample different observations while keeping the relation between the variables intact. In other words, it will generate a new data set with `data[idx,:]` for each sampling.
You could do the same with a `DataFrame`, where the roles of observations and variables are a bit clearer than in a plain array. Both will behave the same.
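As a rough illustration of the row-wise resampling described above, here is a minimal sketch using only Julia's standard library. The toy `data` matrix, the seed, and the variable names are assumptions made for this example, and this is not the package's internal code.

```julia
using Random

# Hypothetical toy data: 5 observations (rows) of 2 variables (columns),
# built so that y is always 10 * x.
data = [1.0 10.0;
        2.0 20.0;
        3.0 30.0;
        4.0 40.0;
        5.0 50.0]

rng = MersenneTwister(1)      # arbitrary fixed seed for reproducibility
n = size(data, 1)
idx = rand(rng, 1:n, n)       # draw n row indices with replacement
resampled = data[idx, :]      # whole rows are picked, so x[i] stays with y[i]

# Every resampled row is still one of the original (x, y) pairs.
pairing_kept = all(resampled[:, 2] .== 10 .* resampled[:, 1])
```

Because entire rows are indexed at once, the relation between the two columns survives every resampling step.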
In your case, the package should work as you seem to expect. Let me know if my explanations make sense or if anything isn't clear.
Thanks for the quick reply.
> I assume that you would want to keep the original pairing between `x` and `y` in your use case? I.e. if you pick `x[i]` you also want to pick `y[i]` in a sampling step?

Yes.

> If the data is an `Array`, the implementation will treat the rows as observations and the columns as variables, analogous to a `DataFrame`. The `bootstrap` step will sample different observations while keeping the relation between the variables intact. In other words, it will generate a new data set with `data[idx,:]` for each sampling.

Sounds like this is doing what I wanted it to.

> In your case, the package should work as you seem to expect. Let me know if my explanations make sense or if anything isn't clear.

I guess I am a bit confused about why there were 4 output variables rather than just 1 for the bootstrap correlation values. In this case the variables are uncorrelated, and what I was after overall was the 95% CIs on the correlation coefficient. So I am expecting some range overlapping zero (the variables are independent/uncorrelated in this example), but I was unsure how to interpret the output.
> I guess I am a bit confused about why there were 4 output variables rather than just 1 for the bootstrap correlation values. In this case the variables are uncorrelated, and what I was after overall was the 95% CIs on the correlation coefficient. So I am expecting some range overlapping zero (the variables are independent/uncorrelated in this example), but I was unsure how to interpret the output.

The `Statistics.cor` function computes a 2x2 correlation matrix for your input array:
2×2 Array{Float64,2}:
1.0 0.192743
0.192743 1.0
Please note that this is a full correlation matrix and not only a coefficient between both variables.
This is why the output of the `bootstrap` step also gives you 4 variables: it is simply the output of `cor` applied to the bootstrapped data set. You are probably only interested in the off-diagonal elements, and could use e.g. `x -> cor(x)[2]` or `x -> cor(x[:,1], x[:,2])` as your statistics function.
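To make the difference between the two call forms concrete, the following sketch compares them on simulated data. The matrix `X`, the seed, and the variable names are assumptions for illustration; only `cor` from the `Statistics` standard library is used.

```julia
using Statistics, Random

rng = MersenneTwister(42)            # arbitrary fixed seed
X = randn(rng, 100, 2)               # 100 observations of 2 independent variables

C = cor(X)                           # 2×2 correlation matrix between the columns
r_matrix = C[2]                      # linear index 2 is C[2, 1] (column-major order)
r_direct = cor(X[:, 1], X[:, 2])     # scalar coefficient from two vectors

same = r_matrix ≈ r_direct           # both routes give the same off-diagonal value
```

Passing a matrix to `cor` yields the full matrix, so a bootstrap over it carries four values per resample; passing two vectors yields a single coefficient.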
The confidence intervals return `(estimate, lower_bound, upper_bound)` for each of the 4 bootstrapped variables. The vector contains the confidence intervals for all 4 variables. In line with what I mentioned above, you would want the 2nd or 3rd element of the vector in this case.
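For readers who want to see where an `(estimate, lower_bound, upper_bound)` triple can come from, here is a hand-rolled percentile bootstrap written with the standard library only. It is a sketch under stated assumptions: `percentile_ci` is a hypothetical helper, not a function of the package, and the thread does not say which interval type was actually used.

```julia
using Statistics, Random

# Hypothetical helper: percentile bootstrap interval for a scalar statistic.
# Rows of `data` are treated as observations, as described in the thread.
function percentile_ci(stat, data, nboot; level = 0.95, rng = Random.default_rng())
    n = size(data, 1)
    # Evaluate the statistic on nboot row-resampled copies of the data.
    reps = [stat(data[rand(rng, 1:n, n), :]) for _ in 1:nboot]
    alpha = (1 - level) / 2
    return (stat(data), quantile(reps, alpha), quantile(reps, 1 - alpha))
end

rng = MersenneTwister(42)            # arbitrary fixed seed
X = randn(rng, 100, 2)               # two independent variables
r(x) = cor(x[:, 1], x[:, 2])         # scalar statistic: one coefficient

est, lo, hi = percentile_ci(r, X, 1000; rng = rng)
```

With a scalar statistic such as `r`, there is only one triple to read, which avoids having to pick the right element out of four.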
Perfect. I got confused about the output of `Statistics.cor` because when I was testing I used it in the form of `cor(personality, looks)` and just got the scalar output.
So this makes sense, but I think `x -> cor(x[1,:], x[2,:])` should be `x -> cor(x[:,1], x[:,2])`. That gives me:
((0.005388735203249658, -0.22214553137849985, 0.21166247504417582),)
and those numbers make sense for 100 uncorrelated values. Not sure why the extra empty element though. But that solves my question. Thanks very much!
Yes, the indices should be the other way around. I have updated the comment above to fix this.
The "empty element" is just Julia's way of displaying a tuple with only one element.
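A small sketch of that display convention, using values rounded from the output quoted above (the variable names are assumptions for illustration):

```julia
# Illustrative values, rounded from the output quoted in the thread.
ci = ((0.0054, -0.2221, 0.2117),)

n_outer = length(ci)      # the outer tuple holds exactly one inner tuple
est, lo, hi = ci[1]       # unpack (estimate, lower_bound, upper_bound)
shown = repr((1,))        # the trailing comma marks a one-element tuple
```

The trailing comma distinguishes a one-element tuple from a parenthesized expression, so nothing is missing from the result.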
Maybe this issue will be useful documentation if anyone else has a similar issue. Thanks for the help.