Quantitative Analysis of Paleomagnetic Sampling Strategies
Home Page: https://polarwandering.github.io/PaleoSampling/
License: MIT License
For Figure 3, it would be useful to also include in panel (c) the one-sample-per-site case where we apply the very simple test of filtering out points that fall outside the 45 degree band around the previously estimated mean.
This also opens the question of how to estimate the mean in the most efficient way.
When I am evaluating a pole for use in a paleogeographic reconstruction, or designing a study to develop a pole, I want high confidence that the pole is a good estimate of the true pole direction.
The current approach of showing the mean square error between the simulated poles and the true pole asks a slightly different question. The mean shows what the average deviation from the true pole is. But in practice I want a sense not of what this looks like on average, but of what it looks like elsewhere in the distribution (how often do I get a result that is far from the truth).
The histograms of Figure 3 give one window into this, but we need to think through a way to visualize this aspect of the result over more of the parameter space. One approach that I think could work would be to show the 95th percentile of the error from the true pole, in addition to the mean, in a plot that has the same structure as the plot showing the mean square error.
We can discuss other approaches like picking example parameter combinations for box/whisker or violin plots. Perhaps there could be an interactive widget where one could select a parameter combination and see the distribution in addition to the mean square error?
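A quick numpy sketch of the percentile idea, using a hypothetical array of simulated pole errors for one parameter combination (the gamma distribution here is just a stand-in for simulation output):

```python
import numpy as np

# Hypothetical angular errors (degrees) between simulated poles and the
# true pole for a single parameter combination.
rng = np.random.default_rng(42)
errors = rng.gamma(shape=2.0, scale=3.0, size=10_000)

mean_error = errors.mean()
p95_error = np.percentile(errors, 95)
print(f"mean error: {mean_error:.2f} deg, 95th percentile: {p95_error:.2f} deg")
```

Repeating this over the parameter grid would give a 95th-percentile map with the same structure as the existing mean square error plot.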
There is currently a TypeError in the Figure 1 notebook that is rendered in the JupyterBook: https://polarwandering.github.io/PaleoSampling/figures/figure1/Figure1.html
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[6], line 1
----> 1 df_pole = df_filter[df_filter.n0 <= 20].pivot('N', 'n0', 'error_angle_S')
      2 df_scatter = df_filter[df_filter.n0 <= 20].pivot('N', 'n0', 'error_vgp_scatter')
      4 plot_angle_error(df_pole,
      5                  df_scatter,
      6                  save_plot=True)

TypeError: pivot() takes 1 positional argument but 4 were given
```
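In pandas 2.0 and later, `DataFrame.pivot` accepts keyword arguments only, which explains the error above. A minimal sketch of the fix, using a made-up `df_filter` with the column names from the traceback:

```python
import pandas as pd

# Toy stand-in for df_filter with the columns from the traceback.
df_filter = pd.DataFrame({
    "N": [10, 10, 20, 20],
    "n0": [1, 2, 1, 2],
    "error_angle_S": [5.0, 4.0, 3.5, 3.0],
})

# pandas >= 2.0: pivot() arguments are keyword-only.
df_pole = df_filter[df_filter.n0 <= 20].pivot(
    index="N", columns="n0", values="error_angle_S"
)
print(df_pole)
```

The same change would apply to the `df_scatter` line in the notebook.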
Add more navigation to the JupyterBook in the main notebooks. These should be very simple links that point to the respective notebooks for running the different experiments, especially those that are easy to execute.
These two are different (the mean of a function vs the function of the mean). This is important because the theoretical result holds for the kappas, without information about the mean angle. In order to assess whether the result actually holds, we need to compute this quantity.
The example shown in Sampling_example.ipynb generates samples from 10 sites with 4 samples per site:

```python
params0 = smp.Params(N=10,
                     n0=4,
                     kappa_within_site=100,
                     site_lat=10,
                     site_long=0,
                     outlier_rate=0.10,
                     secular_method="G",
                     kappa_secular=None)
```
All of these samples are then plotted together, which makes them look like independent samples. It is uncommon to plot sample data while skipping the site hierarchy as is done here. I think it makes sense either to generate the sample with n0=1 or to calculate a site mean for each site. This would need to be done in directional space and then transformed to VGP space.
It would also be valuable for this example to show the angular error from the true pole for the sample. This would be illustrative.
Check that we are sampling from the uniform distribution in inclination-declination space instead of latitude-longitude space
In addition to estimating the mean pole position, VGP scatter is quantified by the parameter

S_T^2 = (1 / (N - 1)) * Σ_{i=1}^{N} Δ_i^2

where Δ_i is the angular distance of the i-th VGP from the mean pole and N is the number of sites. To address the issue of within-site scatter, some studies use a cutoff for the VGP angular distances Δ_i. McElhinny and McFadden (1997) defined a parameter

S_B^2 = (1 / (N - 1)) * Σ_{i=1}^{N} (Δ_i^2 - S_{w,i}^2 / n_i)

where S_{w,i} is the within-site scatter of site i and n_i is the number of samples collected at that site.
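Assuming the standard definitions (total scatter S_T^2 = Σ Δ_i^2 / (N - 1), and McElhinny and McFadden's within-site corrected S_B^2), a minimal numpy sketch; the `vgp_scatter` helper and its inputs are illustrative, not pmagpy's `sb_vgp_calc`:

```python
import numpy as np

def vgp_scatter(deltas, sw=None, n_per_site=None):
    """Angular VGP scatter (degrees).

    deltas : angular distances of each VGP from the mean pole (deg).
    sw, n_per_site : optional within-site scatter (deg) and number of
        samples per site, enabling the within-site correction.
    """
    deltas = np.asarray(deltas, dtype=float)
    s2 = deltas ** 2
    if sw is not None and n_per_site is not None:
        s2 = s2 - np.asarray(sw, dtype=float) ** 2 / np.asarray(n_per_site, dtype=float)
    return np.sqrt(s2.sum() / (deltas.size - 1))

s_total = vgp_scatter([10, 10, 10, 10])                              # uncorrected
s_corr = vgp_scatter([10, 10, 10, 10], sw=[5] * 4, n_per_site=[4] * 4)  # corrected
print(round(s_total, 3), round(s_corr, 3))
```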
The VGP scatter is calculated in https://github.com/PmagPy/PmagPy/blob/master/pmagpy/ipmag.py with the function sb_vgp_calc. This function is currently set up to accept a data frame, which should be generalized. It has a parameter for whether or not to do the within-site scatter correction.
Change all notation in the code to be consistent with the manuscript and the standard notation:

- N_sites: total number of sites.
- n_samples_site: total number of samples per site.
- n_samples: total number of samples. This should be equal to N_sites * n_samples_site. We can also use
- kappa_insite: kappa of the Fisher distribution for sampling within a site.
- outlier_rate: proportion of expected outliers.
- latitude, longitude: the paleo-latitudes/longitudes used for going from (dec, inc) to (lat, lon).

The mapping from kappa to angular dispersion is currently computed based on an interpolation of a pre-computed file located in kappa_secular. Now, in order to read this file once the package has been installed, I have added the following lines to define the relative path inside kappa.py:
```python
import pathlib
import pandas as pd

_file_location = pathlib.Path(__file__).parent.joinpath("kappa_tabular/kappa2angular.csv")
df = pd.read_csv(_file_location, header=0)
```
However, even after installation, this path doesn't seem to be recognized on the Binder hub. I am not sure why this is the case, but I suspect this is not the best way of reading the file. This is the error message once we launch Binder
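One alternative to building paths from `__file__` that tends to survive installation is the standard library's `importlib.resources` (Python >= 3.9). A sketch, where the `smpsite` usage is hypothetical and requires the CSV to be declared as package data in setup.py / pyproject.toml:

```python
from importlib import resources  # Python >= 3.9

def read_package_file(package: str, relpath: str) -> str:
    """Read a text data file shipped inside an installed package."""
    return resources.files(package).joinpath(relpath).read_text()

# For smpsite this would look like (hypothetical; requires the CSV to be
# declared as package data):
#   csv_text = read_package_file("smpsite", "kappa_tabular/kappa2angular.csv")

# Self-contained demonstration against a stdlib package:
text = read_package_file("json", "__init__.py")
print(len(text) > 0)
```

This resolves the file relative to the installed package rather than the source tree, which is usually more robust across environments like Binder.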
@Swanson-Hysell the sampling notebook is printing an error when executed by JupyterBook. If you cannot fix this error, you can always uncomment the notebook's entry in the _config.yml under executed_patterns to skip executing this notebook and instead just display it.
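For reference, if the project uses Jupyter Book's standard configuration, the documented key for skipping execution of specific notebooks is `exclude_patterns` under `execute` (the `executed_patterns` name above may be project-specific):

```yaml
# _config.yml (standard Jupyter Book keys; adjust the pattern to the
# actual notebook path)
execute:
  execute_notebooks: auto
  exclude_patterns:
    - "*Sampling_example*"
```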
In the function generate_samples in smpsite/smpsite/sampling.py, I convert the secular-variation poles from VGP space to (inclination, declination) directional space. Then I add Fisher noise in this space. I want to check that the functions I am using from pmagpy are the right ones for this transformation.
```python
# Sample secular variation poles (lat, lon).
# This is where the TK03 model could be substituted.
secular_declinations, secular_inclinations = ipmag.fishrot(k=params.kappa_secular,
                                                           n=N_sites,
                                                           dec=0,
                                                           inc=90,
                                                           di_block=False)

for i, nk in enumerate(design):
    outliers = np.random.binomial(1, params.outlier_rate, nk)
    outliers = sorted(outliers)
    n_outliers = np.sum(outliers)
    n_samples = nk - n_outliers

    # Transform VGP coordinates to direction (dec, inc) coordinates
    dec_vgp, inc_vgp = pmag.vgp_di(plat=secular_inclinations[i],
                                   plong=secular_declinations[i],
                                   slat=params.site_lat,
                                   slong=params.site_long)

    # Sample real samples (within-site)
    declinations, inclinations = ipmag.fishrot(k=params.kappa_vgp,
                                               n=n_samples,
                                               dec=dec_vgp,
                                               inc=inc_vgp,
                                               di_block=False)

    # Convert specimens to geographic space
    for j in range(len(declinations)):
        trans_dec, trans_inc, _, _ = pmag.dia_vgp(declinations[j], inclinations[j], 0,
                                                  params.site_lat, params.site_long)
        declinations[j] = trans_dec
        inclinations[j] = trans_inc
```
I would also like to include TK03 sampling of the secular variation, instead of sampling from a Fisher distribution with unknown kappa.
On the other hand, I wonder if I need to do the same when estimating the mean direction. That is, instead of taking averages of averages in coordinate space, should I compute the average of the site directions in directional space, transform these to coordinate space, and then compute the Fisher mean in coordinate space? That is, the mean site direction would be computed in (inc, dec) space, while the mean of the VGPs would be computed in (lat, lon) space.
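A numpy-only sketch of the two-stage averaging being described (site means computed in directional space first). The vector-normalization Fisher mean below is the standard estimate of the mean direction, but the helper names and data are made up:

```python
import numpy as np

def to_cartesian(dec, inc):
    """Unit vectors from declination/inclination in degrees."""
    dec, inc = np.radians(dec), np.radians(inc)
    return np.column_stack([np.cos(inc) * np.cos(dec),
                            np.cos(inc) * np.sin(dec),
                            np.sin(inc)])

def fisher_mean_direction(dec, inc):
    """Fisher mean direction: normalize the resultant vector."""
    r = to_cartesian(dec, inc).sum(axis=0)
    r = r / np.linalg.norm(r)
    mean_dec = np.degrees(np.arctan2(r[1], r[0])) % 360
    mean_inc = np.degrees(np.arcsin(r[2]))
    return mean_dec, mean_inc

# Stage 1: one mean direction per site, computed in (dec, inc) space.
site_decs = [[8.0, 10.0, 12.0], [348.0, 350.0, 352.0]]
site_incs = [[43.0, 45.0, 47.0], [42.0, 44.0, 46.0]]
site_means = [fisher_mean_direction(d, i) for d, i in zip(site_decs, site_incs)]
# Stage 2 (not shown): convert each site mean to a VGP, then take the
# Fisher mean of the VGPs in (lat, lon) space.
print(site_means)
```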
Thank you @LenGallo for the help! Feel free to do a Pull Request with changes or open issues if there is something that needs fixing.
Right now, the bottleneck in all of the simulations is the required time. The code should be profiled to see what is consuming the execution time. A quick scan revealed that the problem doesn't seem to come from the sampling or mean estimation per se, so some other operation in the code must be time-consuming and can probably be replaced.
More exploration is needed.
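A minimal sketch of how the profiling could be run with the standard library's cProfile; the `simulate` function here is a hypothetical stand-in for one smpsite simulation run:

```python
import cProfile
import io
import pstats

def simulate():
    """Stand-in workload for one simulation run (hypothetical)."""
    total = 0.0
    for i in range(1, 50_000):
        total += i ** 0.5
    return total

profiler = cProfile.Profile()
profiler.enable()
simulate()
profiler.disable()

# Print the 10 entries with the largest cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
print(stream.getvalue())
```

Sorting by cumulative time is a quick way to spot which call tree dominates the run time before drilling into per-call costs.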
For the second figure in the paper, we want to understand how the mean square error dispersion from the true pole compares when we use
We are going to compare these two methods as we vary i) the dispersion parameter kappa per site (from 10 to 1000, log scale) and ii) the latitude (from 0 to 90). Here we want to compare the difference in MSE between methods 1 and 2 as we move along i and ii, possibly in a 2D colormap plot.
Thoughts on this @LenGallo @Swanson-Hysell @duserzym @matdomeier @bramvaes?
I think it would be more effective to make the RMSE for VGP scatter its own figure with the same overlay annotations with the black contours and the dashed white total sample numbers. It is hard to read when it is as small as it is.
In addition, while I understand that perceptually uniform color maps are in favor, they actually make it rather hard to tell the difference between numbers, which makes the contours all the more important.
For both of these plots, it seems a bit silly to have the x-axis go all the way to 40. I don't know of any paleomagnetic study in the literature that has developed that many samples per site. Given that the improvement with number of samples per site asymptotes well before then, not much information is being communicated. While it is interesting to see the asymptotic behavior, in order to also see the improvement in the range of 1 to 5, an x_max of 15 or 20 would be better.
Add an action to the Makefile to create all figures automatically. The sequence of actions should be organized as follows:

```
figures : fig1 fig2a ...

fig1 : figures/figure1/fig1.png

figures/figure1/fig1.png : outputs/fig1a..
	<execute notebook>
```

We probably want to use a matching pattern to generate all the actions for the different figures with the same command.
In order to make the calculations broadly accessible, it would be valuable to develop interactive widgets for calculating expected accuracy given varied parameters.
Hi all,
As implemented now, the 'perfect outlier detection algorithm' detects outliers in the directions within the sites. This makes sense because one argument favouring taking several samples per site is that outliers can be seen and discarded accordingly (there should be at least 3 samples...). When we compare any given strategy to one that has more than one sample per site, the latter should have ignore_outliers=True.
It is also fair, and sensible, to permit the 'one-sample-per-site' strategy to get rid of outliers as well, but in this case, the outlier would be detected against the population of VGPs. This is what Vandamme does.
Other cutoffs could be applied (in Gerritsen's paper there's an in-depth discussion), but I think Vandamme is just fine.
Looking forward to seeing the outcome! I have quickly gone through the simulations and I can say that the 'one-sample-per-site' strategy plus Vandamme always wins!
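A simplified numpy sketch of the iterative Vandamme (1994) cutoff (A = 1.8 * ASD + 5 degrees). For brevity it operates on precomputed angular distances and does not recompute the mean pole after each rejection, unlike the full procedure:

```python
import numpy as np

def vandamme_cutoff(deltas):
    """Iterative Vandamme (1994) cutoff on VGP angular distances.

    Simplified sketch: `deltas` are angular distances (deg) from the mean
    pole, and the mean pole is NOT recomputed after each rejection (the
    full procedure recomputes it).
    """
    deltas = np.asarray(deltas, dtype=float)
    while True:
        asd = np.sqrt((deltas ** 2).sum() / (deltas.size - 1))  # angular std dev
        cutoff = 1.8 * asd + 5.0
        keep = deltas <= cutoff
        if keep.all():
            return deltas, cutoff
        deltas = deltas[keep]  # reject and iterate

kept, cutoff = vandamme_cutoff([5.0, 8.0, 10.0, 12.0, 80.0])
print(kept, round(cutoff, 2))
```

The cutoff shrinks as outliers are removed, so a single distant VGP that survives the first pass can still be rejected on a later iteration.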
In addition to doing runs using TK03, we can explore using Model G of McFadden and McElhinny (1988): https://doi.org/10.1029/JB093iB10p11583. This approach could be numerically more efficient given that VGP scatter can be calculated using the A and B coefficients and then linked to kappa.
When exporting a conda environment that has pmagpy installed, pmagpy doesn't show up in the environment.yml file or in the dependencies listed by conda. I wonder if I should manually add it at the end under the pip clause, or if there is a better way of doing this. Maybe exporting the dependencies with pip? How do you usually do this @Swanson-Hysell @duserzym ?
Add contributor acknowledgements throughout the project using the all-contributors bot.
To illustrate the idea, it would be useful to add a scatter plot of the sampled poles on top of some locations in Figure 1, just to give more context to the figure and make it easier to understand.