Figures from the astroML book and paper
License: BSD 2-Clause "Simplified" License
Baseline figure for 1.15 is missing
I just noticed that, somehow, the top 4 and the bottom 4 panels in the 2nd edition of the web version of fig. 10.3 are swapped (and now the caption is wrong). The two printed versions and the 1st edition of the web version are fine. See https://www.astroml.org/book_figures/chapter10/fig_FFT_aliasing.html
Note: we should also check the notebook version of this figure.
Hi,
upon running the code for Book Figure 4.2 on Ubuntu, Python returned an error: 'GMM' object has no attribute 'eval' for logprob, responsibilities = M_best.eval(x).
To solve the problem, I replaced M_best.eval(x) (line 85) with:
M_best.score_samples(x.reshape((-1, 1)))
and M_best.predict_proba(x) (line 110) with:
p = M_best.predict_proba(x.reshape((-1, 1)))
I'm using scikit-learn 0.17
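For reference, in modern scikit-learn (where GaussianMixture replaced GMM in 0.18+) the same quantities can be obtained roughly like this; the toy dataset below is a stand-in for the Figure 4.2 data, not the actual script:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# Toy two-component dataset standing in for the Figure 4.2 data
X = np.concatenate([rng.normal(-1, 1, 300),
                    rng.normal(4, 1, 200)]).reshape(-1, 1)

model = GaussianMixture(n_components=2, random_state=0).fit(X)

# score_samples now returns only log p(x); responsibilities come
# from predict_proba instead of a combined eval()/score_samples call.
x = np.linspace(-5, 8, 500).reshape(-1, 1)
logprob = model.score_samples(x)           # shape (500,)
responsibilities = model.predict_proba(x)  # shape (500, 2)
pdf = np.exp(logprob)
```

Note that the old GMM.score_samples returned a (logprob, responsibilities) tuple, while the modern API splits these into two calls.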
(was astroML/astroML#82, more discussion is on that issue)
This is a duplicate of astroML/astroML#96: somehow the SDSS Stripe 82 Moving Object Catalog snuck back into the HTML pages.
In the comment in Figure 5.9, there are incorrect equation references.
For the probability p(b), instead of "eqn. 5.70" it should be "eqn. 5.71".
For the Gaussian approximation, the equation is not "eqn. 5.71".
At the time of opening this issue I suspect this is a local problem on my laptop, but in either case reporting it doesn't hurt.
I have run into pymc3 issues a few times with PyCharm, mostly when examples are embedded in notebooks, but this now consistently appears on the command line, too. I only see the error with Python 3.8, while everything works as expected with identical numpy and pymc3 versions on Python 3.7.
python book_figures/chapter5/fig_model_comparison_mcmc.py
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [M1_log_sigma, M1_mu]
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [M1_log_sigma, M1_mu]
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
prepare(preparation_data)
File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/runpy.py", line 262, in run_path
return _run_module_code(code, init_globals, run_name,
File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/runpy.py", line 95, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/Users/bsipocz/munka/devel/worktrees/astroML_figures/giant_figure_generating_branch_ed2/book_figures/chapter5/fig_model_comparison_mcmc.py", line 87, in <module>
trace1 = pm.sample(draws=2500, tune=100)
File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pymc3/sampling.py", line 469, in sample
trace = _mp_sample(**sample_args)
File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pymc3/sampling.py", line 1053, in _mp_sample
sampler = ps.ParallelSampler(
File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pymc3/parallel_sampling.py", line 355, in __init__
self._samplers = [
File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pymc3/parallel_sampling.py", line 356, in <listcomp>
ProcessAdapter(
File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pymc3/parallel_sampling.py", line 242, in __init__
self._process.start()
File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
return Popen(process_obj)
File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "/Users/bsipocz/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
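The idiom the error message asks for can be sketched as follows; run_sampling here is a hypothetical stand-in for the model setup and the pm.sample(...) call in fig_model_comparison_mcmc.py:

```python
import multiprocessing

def run_sampling():
    # Hypothetical stand-in for building the pymc3 model and calling
    # pm.sample(...). Under the "spawn" start method (the default on
    # macOS since Python 3.8), the main module is re-imported in each
    # worker, so anything that starts processes must live inside a
    # function called from the __main__ guard, not at module level.
    return "done"

if __name__ == "__main__":
    multiprocessing.freeze_support()  # no-op unless frozen into an executable
    result = run_sampling()
```

This would also explain why the script works on Python 3.7 but not 3.8 on macOS: the default multiprocessing start method changed from "fork" to "spawn" in 3.8.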
cross ref from https://github.com/astroML/text_errata:
Page 104: Figure 3.19 shows the positive part of a double Weibull distribution, not a Weibull distribution. In this case it means that the values on the y axis are half of what they should be. To get a Weibull distribution in scipy, use exponweib with a=1 rather than dweibull.
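The factor-of-two claim can be checked numerically with scipy.stats (a quick sketch with an arbitrary shape parameter, not code from the book):

```python
import numpy as np
from scipy import stats

x = np.linspace(0.1, 3.0, 50)
k = 1.5  # arbitrary Weibull shape parameter for illustration

# exponweib with a=1 reduces to the standard (minimum) Weibull distribution
pdf_weibull = stats.exponweib(a=1, c=k).pdf(x)
# dweibull is the *double* Weibull: symmetric about zero, so its pdf on
# the positive axis is exactly half the Weibull pdf
pdf_double = stats.dweibull(k).pdf(x)
```

On x > 0, pdf_double equals 0.5 * pdf_weibull, which is the halved y-axis described in the erratum.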
In fig_star_quasar_ROC.py, inside compute_results(), the classifiers are trained on X rather than X_train. This means the classifiers have seen the test set during training, which is of course bad practice.
The fix is to change line 90 from
model.fit(X, y)
to
model.fit(X_train, y_train)
Additionally, the figure fig_star_quasar_ROC_1.png needs to be regenerated, as it is produced by running the script.
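A minimal sketch of the corrected pattern, using synthetic data and GaussianNB as a stand-in for the classifiers in the script:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_curve

rng = np.random.RandomState(0)
# Synthetic two-class data standing in for the star/quasar colors
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
y = np.r_[np.zeros(200), np.ones(200)]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GaussianNB()
model.fit(X_train, y_train)               # fit on the training split only
scores = model.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, scores)   # ROC evaluated on held-out data
```

The key point is that the test split never enters fit(), so the ROC curve reflects genuine generalization performance.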
Some sort of CI testing here would be useful; preferably we would also add a cron job that runs regularly to double-check nothing has broken.
I am using:
sklearn.version: 0.16.1
astroML.version: 0.3
File "fig_rrlyrae_treevis.py", line 242, in <module>
    random_state=0, criterion='entropy')
TypeError: __init__() got an unexpected keyword argument 'compute_importances'
I added these lines to fix my fork:
# In scikit-learn 0.14+, setting compute_importances=True is no longer required (or accepted).
try:
    # version < 0.14
    clf = DecisionTreeClassifier(compute_importances=True,
                                 random_state=0, criterion='entropy')
except TypeError:
    # version 0.14+
    clf = DecisionTreeClassifier(random_state=0, criterion='entropy')
see also: astroML/astroML#77
(was astroML/astroML#78)
New figures for the 2nd edition: once they are finalized, update their captions.
In figure 6.17, we should use the correlation from the full data rather than the mean of bootstrap samples as the best estimate.
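The suggested convention can be sketched with synthetic data (variable names are illustrative, not from the figure code): the full-sample correlation is the point estimate, and the bootstrap supplies only the uncertainty:

```python
import numpy as np

rng = np.random.RandomState(42)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=0.5, size=200)

# Best estimate: the correlation of the full sample
r_full = np.corrcoef(x, y)[0, 1]

# Bootstrap resampling is used only for the uncertainty,
# not for the point estimate itself
n_boot = 1000
r_boot = np.empty(n_boot)
for i in range(n_boot):
    idx = rng.randint(0, len(x), len(x))
    r_boot[i] = np.corrcoef(x[idx], y[idx])[0, 1]
r_err = r_boot.std()
```

Reporting r_full ± r_err avoids the small bias that the mean of the bootstrap replicates can introduce.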
(was: astroML/astroML#76)
pymc has a method called await. Given that async and await are reserved keywords in Python 3.7, pymc is not even importable, which makes at least the following figures incompatible with Python 3.7 as well:
The PCA projection in book_figures/chapter7/fig_S_manifold_PCA.py changes depending on the scikit-learn version being used (the y range is flipped).
Investigate the cause, and report upstream if it looks like a bug.
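One possible cause worth checking before reporting upstream: the sign of each principal component is mathematically arbitrary, so projections can legitimately flip between versions. A sketch of a sign-fixing convention (not from the figure code) that makes the projection deterministic:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Synthetic correlated data standing in for the S-curve manifold points
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 3))

pca = PCA(n_components=2).fit(X)
proj = pca.transform(X)

# Convention: force the largest-magnitude loading of each component to be
# positive, so the projection no longer depends on the solver's sign choice.
max_idx = np.abs(pca.components_).argmax(axis=1)
signs = np.sign(pca.components_[np.arange(2), max_idx])
proj_fixed = proj * signs
```

If the flip persists after applying such a convention, it is more likely a genuine upstream change.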
cc @connolly
Newer versions enforce keyword-only arguments; the examples should therefore be checked and fixed.
The default has changed for this; we need to add back the black edges.
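Assuming this refers to the matplotlib 2.0 style change that dropped edges from filled histogram bars, restoring them is a one-keyword fix (a sketch, not the actual figure code):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted figure generation
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
fig, ax = plt.subplots()
# Since matplotlib 2.0, filled histograms are drawn without edges by default;
# passing edgecolor explicitly restores the pre-2.0 black outlines.
counts, bins, patches = ax.hist(rng.normal(size=1000), bins=30,
                                histtype='stepfilled', facecolor='gray',
                                edgecolor='black')
```

The same edgecolor keyword works for bar() and scatter() (as edgecolors) if those are the affected plot types.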
There is no need to point out the error in the 1st print edition; the old figure and the note should be removed.
Some of the current examples hack GaussianMixture() to set up the input dataset. In more recent versions of scikit-learn, sampling from an unfitted GaussianMixture is not a supported feature (see the discussion around scikit-learn/scikit-learn#7822 (comment)).
So while it is possible to hack around it, we should look into other ways to generate the input datasets for these user-facing examples.
Examples include book_figures/chapter6/fig_GMM_nclusters.py
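One alternative is to draw the mixture samples directly with numpy rather than injecting parameters into an unfitted GaussianMixture; a sketch with illustrative parameters (not the actual values from fig_GMM_nclusters.py):

```python
import numpy as np

rng = np.random.RandomState(0)

# Mixture parameters (illustrative, not from the figure script)
means = np.array([-1.0, 0.0, 3.0])
sigmas = np.array([1.5, 1.0, 0.5])
weights = np.array([0.3, 0.3, 0.4])

# Two-step ancestral sampling: pick a component for each point,
# then draw from that component's Gaussian.
n = 1000
labels = rng.choice(len(weights), size=n, p=weights)
x = rng.normal(means[labels], sigmas[labels])
```

This avoids any dependence on scikit-learn internals and keeps the example stable across versions.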
The M51 picture for the CNN cartoon brings up two low-priority issues, one needing only documentation, the other a solution:
the JPEG file requires Pillow as a dependency. Maybe the best solution is to convert it to PNG (the only concern is making sure the resulting image is identical to what went into the book).
the current test and PDF-generating mechanism (which relies on extracting the code to a temporary "somefile.py") does not work with the current solution for the image's file path. Running the script directly works, so users shouldn't be affected by it.
Copying a workaround from the astroML pickle_results mechanism is probably the easiest solution here.
While we had it set to True to generate the figures for the books, this default regularly causes issues for users working with the figure files.
Therefore I think changing the default to False has more benefits; adding a comment in all the code files noting that the book used True should provide the necessary information for reproducibility.