astroml / gatspy Goto Github PK

General tools for Astronomical Time Series in Python

License: BSD 2-Clause "Simplified" License

Makefile 0.17% Python 99.83%

gatspy's Introduction

gatspy: General tools for Astronomical Time Series in Python

Gatspy (pronounced as F. Scott Fitzgerald would probably pronounce it) is a collection of tools for analyzing astronomical time series in Python. Documentation can be found at http://astroML.org/gatspy/

Examples

For examples of using gatspy, refer to the example notebooks in the package (powered by nbviewer)

Installation

You can install the released version of gatspy using

$ pip install gatspy

or install the source from this directory using

$ python setup.py install

The package is pure python (i.e. no C or Fortran extensions) so there should be no problems with installation on any system. Gatspy has the following dependencies:

numpy version 1.8 or higher
scipy

Additionally, for some of the functionality gatspy optionally depends on:

Unit Tests

Gatspy uses nose for unit tests. With nosetests installed, type

$ nosetests gatspy

to run the unit tests.

The tests are run on Python versions 2.7, 3.4, and 3.5.

Authors

Jake VanderPlas

Citing

If you use this code in an academic publication, please consider including a citation. Citation information in a variety of formats can be found on zenodo.

Please also see our paper describing the multiband methods in this package: VanderPlas & Ivezic (2015).

gatspy's People

Contributors

Stargazers

Watchers

gatspy's Issues

optimizer.py: LinearScanOptimzer.find_best_periods is sensitive to time scale

I found a minor issue with the number of steps required to compute the best period. It seems that the number of steps scales linearly with the scale of the time units. A simple workaround is to just use the MJD time units to prevent the excess calculations.

Example from IPython notebook (below):

With the units of time values in MJD, max(times) - min(times) may be ~3e3 units (days) and require ~70e3 steps to compute the best period.
With the units of time values in Unix time, max(times) - min(times) may be ~300e6 units (seconds) and require ~7e9 steps to compute the best period. (In the example ipynb, this example overflows 8GB of RAM.)

Example IPython notebook (taken from gatspy/examples, tests added to end): http://nbviewer.ipython.org/gist/stharrold/874c9f588c02b661b50b

Relevant gatspy code: https://github.com/astroML/gatspy/blob/master/gatspy/periodic/optimizer.py#L49-L63

Moving the multiband LS upstream

If I understand the status correctly, it's only the multiband LS that is unique in gatspy, the rest had been moved upstream to astropy a couple of years ago.

@jakevdp - how would you feel to move the multi band as well and maybe archive this repo? It would make somewhat easier to communicate to the community that all your astroML & gatspy implementations has been moved, and receive the maintenance and occasional fixes and improvements in astropy.timeseries rather than at multiple places?

I volunteer to do the actual moving bit, but it's one of those cases where it feels better to ask for permission and opinion first rather than forgiveness after the fact.

Incorrect periodogram with multiple terms

The code snippet below leads to a periodogram with incorrrect values (less than zero, greater than one). An image showing the periodogram is a little lower down and the data used in the example is here.

from __future__ import print_function, division

from astropy.table import Table

from gatspy.periodic import LombScargleFast, LombScargle

import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline

# use seaborn for plot styles
import seaborn; seaborn.set()

s = Table.read('tres_10c.csv')
t = s['BJD_TDB']
mag = s['Source-Sky_C10']
dmag = s['mag_error']

model = LombScargle(Nterms=5).fit(t, mag, dmag)
periods, power = model.periodogram_auto(nyquist_factor=100, oversampling=5)

Please declare "astroML" and "supersmoother" as dependencies

... in setup.py, so that pip install gatspy "just works".

optimizer sometimes searches outside the given period range

import numpy as np
from gatspy.periodic import LombScargleFast

np.random.seed(0)
t = 100 * np.random.rand(100)
dy = 0.1
y = np.sin(t) + dy * np.random.randn(100)

model = LombScargleFast().fit(t, y, dy)
model.optimizer.period_range = (6.3, 10)
print(model.best_period)
#6.2859733116463481

This happens when the low-frequency end of the grid is near a peak. It gets worse when the frequency becomes very small compared to the data range.

Conda package for simplified bundling into other projects

Hello.

I just released a package https://github.com/robdmc/pandashells that depends on gatspy. Since many of the tools in Pandashells depend on matplotlib and friends, I really want to make a conda package out of it. I spent a little time trying to figure out how to include pip packages in conda, but I think that may be a fools errand.

Have you considered creating/maintaining a conda package out of gatspy? I see one version on anacoda.org, but it doesn't seem to be maintained by you.

Soften `supersmoother` requirement?

Related to #13, but slightly different: is there a not-too-hacky way to make gatspy.periodic be imported successfully without supersmoother being installed? It seems like if we just want to use, say, LombScargle, we should be able to do so without installing supersmoother. Maybe we could move the ImportError into the SuperSmoother class somewhere? Not a huge issue as supersmoother is quite light-weight, but would make things a little cleaner for some users.

How to return higher Fourier parameters with LombScargle()?

Sorry for this simple question, but how to return the Fourier parameters after it has been fitted with LombScargle()?

cannot import name 'datasets'

Any idea why this is the case?

RRLyrae website down

Hello

It seems this website is now down :

http://www.mpia.de/~bsesar/S82_RRLyr/RRLyr_ugriz_templates.tar.gz

creating crash when requested. The template seems to be here :

http://www.astro.caltech.edu/~bsesar/S82_RRLyr/RRLyr_ugriz_templates.tar.gz

or in SIMBAD.

Is it possible to have an update?

Thanks

Optimizer issue

We look at performing iterations of a Gaussian distribution of data across each data points error bars. The optimizer is used to examine the best period of the L-S model. A histogram plot is created of the no. of iterations against the value for the optimized period. We observe a large spike at a particular value (1763 days) for no apparent reason.

The optimizer is compared to a 'MAX' method whereby the best period is selected through examination of the max power of the L-S model fit.

The results highlight that often, over 50% of the optimizer chosen periods fall onto the same exact value as another iteration (up to ~ 10 sig figs) however, for the MAX method only a small percentage exhibit such behaviour (heavily influenced by oversampling fact size).

We see no reason for such a difference to arise and can only conclude that the optimizer for whatever reason does not seem to work for this particular data.
Note that the optimizer does appear to work for other objects that we have examined.

Believe code is implemented correctly, although mistakes/mis-understandings may be present.

Code located in Word Doc; Data in Text document (Download to same file location to run properly)

Thanks.

Gatspy_Issue.docx

Gatspy_issue_Data.txt

Inconsistent treatment of DC component

I am attempting to compute a spectral window by taking the power spectrum of ones sampled as the time series. I expect a value of 1 at frequency=0, but usually I get zero power at zero frequency... but not always! I set fit_offset=False and center_data=False to try to retain the the DC value. Here's an example:

import numpy as np
import matplotlib.pyplot as plt
import gatspy.periodic as gp

time = np.linspace(0,9,1e5)
nyq=1./(2.*np.median(np.diff(time)))
deltaf=(1./time[-1])/4.5
freq,sw=gp.lomb_scargle_fast.lomb_scargle_fast(time,np.ones(time.size)/time.size,
                                               f0=0.,df=deltaf,Nf=nyq/deltaf,fit_offset=False,
                                               center_data=False)

plt.plot(freq,sw,label='oversample by 4.5')

deltaf=(1./time[-1])/5.
freq,sw=gp.lomb_scargle_fast.lomb_scargle_fast(time,np.ones(time.size)/time.size,
                                               f0=0.,df=deltaf,Nf=nyq/deltaf,fit_offset=False,
                                               center_data=False)
plt.plot(freq,sw,label='oversample by 5')
plt.title('why this different behavior at zero frequency?')
plt.legend()
plt.xlim(0,.5)
plt.xlabel('frequency')
plt.ylabel('spectral window')
plt.show()

This seems to only affect the power at exactly frequency = 0, and the result appears to be sensitive to some precise relationship between the frequency sampling and the time sampling. Am I correct to expect power=1 for frequency=0? Is the "spectral window" a well defined quantity for the Lomb-Scargle periodogram (though I expect it to reproduce the Fourier transform result for the evenly-sampled example above)?

Error when fetch_rrlyrae(), (HTTP Error 404: Not Found)

My python version: 3.7.4
Gatspy 0.3 installed via pip

When trying to use the simplest rrlyrae = fetch_rrlyrae(), it shows the error:

HTTP Error 404: Not Found

The whole error is as follows:

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-11-b9aee16fde99> in <module>
      1 from gatspy.datasets import fetch_rrlyrae
----> 2 rrlyrae = fetch_rrlyrae()

/usr/local/anaconda3/lib/python3.7/site-packages/gatspy/datasets/rrlyrae.py in fetch_rrlyrae(partial, **kwargs)
    387     else:
    388         return RRLyraeLC('table1.tar.gz',
--> 389                          cache_kwargs=kwargs)
    390 
    391 

/usr/local/anaconda3/lib/python3.7/site-packages/gatspy/datasets/rrlyrae.py in __init__(self, tablename, dirname, cache_kwargs)
     89         self.dirname = dirname
     90         self.cache_kwargs = cache_kwargs
---> 91         self._load_data()
     92 
     93     def _load_data(self):

/usr/local/anaconda3/lib/python3.7/site-packages/gatspy/datasets/rrlyrae.py in _load_data(self)
     93     def _load_data(self):
     94         filename = _get_download_or_cache(self.tablename,
---> 95                                           **(self.cache_kwargs or {}))
     96         self.data = tarfile.open(filename)
     97         self._metadata = None

/usr/local/anaconda3/lib/python3.7/site-packages/gatspy/datasets/rrlyrae.py in _get_download_or_cache(filename, data_home, url, force_download)
     43 
     44     if force_download or not os.path.exists(save_loc):
---> 45         fhandle = urlopen(src_url)
     46         with open(save_loc, 'wb') as cache:
     47             cache.write(fhandle.read())

/usr/local/anaconda3/lib/python3.7/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    220     else:
    221         opener = _opener
--> 222     return opener.open(url, data, timeout)
    223 
    224 def install_opener(opener):

/usr/local/anaconda3/lib/python3.7/urllib/request.py in open(self, fullurl, data, timeout)
    529         for processor in self.process_response.get(protocol, []):
    530             meth = getattr(processor, meth_name)
--> 531             response = meth(req, response)
    532 
    533         return response

/usr/local/anaconda3/lib/python3.7/urllib/request.py in http_response(self, request, response)
    639         if not (200 <= code < 300):
    640             response = self.parent.error(
--> 641                 'http', request, response, code, msg, hdrs)
    642 
    643         return response

/usr/local/anaconda3/lib/python3.7/urllib/request.py in error(self, proto, *args)
    561             http_err = 0
    562         args = (dict, proto, meth_name) + args
--> 563         result = self._call_chain(*args)
    564         if result:
    565             return result

/usr/local/anaconda3/lib/python3.7/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    501         for handler in handlers:
    502             func = getattr(handler, meth_name)
--> 503             result = func(*args)
    504             if result is not None:
    505                 return result

/usr/local/anaconda3/lib/python3.7/urllib/request.py in http_error_302(self, req, fp, code, msg, headers)
    753         fp.close()
    754 
--> 755         return self.parent.open(new, timeout=req.timeout)
    756 
    757     http_error_301 = http_error_303 = http_error_307 = http_error_302

/usr/local/anaconda3/lib/python3.7/urllib/request.py in open(self, fullurl, data, timeout)
    529         for processor in self.process_response.get(protocol, []):
    530             meth = getattr(processor, meth_name)
--> 531             response = meth(req, response)
    532 
    533         return response

/usr/local/anaconda3/lib/python3.7/urllib/request.py in http_response(self, request, response)
    639         if not (200 <= code < 300):
    640             response = self.parent.error(
--> 641                 'http', request, response, code, msg, hdrs)
    642 
    643         return response

/usr/local/anaconda3/lib/python3.7/urllib/request.py in error(self, proto, *args)
    567         if http_err:
    568             args = (dict, 'default', 'http_error_default') + orig_args
--> 569             return self._call_chain(*args)
    570 
    571 # XXX probably also want an abstract factory that knows when it makes

/usr/local/anaconda3/lib/python3.7/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    501         for handler in handlers:
    502             func = getattr(handler, meth_name)
--> 503             result = func(*args)
    504             if result is not None:
    505                 return result

/usr/local/anaconda3/lib/python3.7/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    647 class HTTPDefaultErrorHandler(BaseHandler):
    648     def http_error_default(self, req, fp, code, msg, hdrs):
--> 649         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    650 
    651 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 404: Not Found

LombScargleFast.find_best_periods does not agree with LombScargleFast.score

import numpy as np
from astroML.time_series import lomb_scargle_bootstrap
from gatspy.periodic import LombScargleFast

print np.__version__ #1.9.1
print gatspy.__version__ #0.2.1
print astroML.__version__ #0.3

time=np.array([ 2456627.79138722,  2456706.70756269,  2456707.71879721,
                2456710.7089872 ,  2456714.6781531 ,  2457003.78622566,
                2457004.78832308,  2457008.79132022,  2457011.77312267,
                2457013.77502331])
rv=np.array([ 30356.93030223,   6381.7927134 ,   6121.9466685 ,   5884.45430149,
              5641.85369914,  11957.91925097,  12068.74753209,  10966.23226051,
              11870.72687516,   9647.03955393])
e=np.array([ 80.0980572 ,  80.33726674,  80.98308236,  82.29154533,
             91.65632576,  87.24849974,  81.73203085,  82.35611381,
             82.30579078,  79.94037756])

print 'N={}'.format(time.size) #10
print 'T.max()-T.min(): {:.0f}'.format(time.max()-time.min()) #386
print 'stdev(RV): {:.0f}'.format(rv.std()) #6.9 km/s
print 'mean error: {:.0f}'.format(e.mean()) #83 m/s

model = LombScargleFast().fit(time,rv,e)

model.optimizer.period_range=(1.2,3000)
fbpP,fbpS=model.find_best_periods(n_periods=6, return_scores=True)

periods=np.arange(1.2,3000,.1)
scores=model.score(periods)
autoP,autoS=model.periodogram_auto(nyquist_factor=100)

print ('Do scores returned by model.find_best_periods '
       'match those from model.score?')
try:
    assert (model.score(fbpP)==fbpS).all()
    print '   Yes'
except AssertionError:
    print '   No'

print ('Do scores returned by model.periodogram '
       'match those from model.score?')
try:
    assert (model.score(fbpP)==model.periodogram(fbpP)).all()
    print '   Yes'
except AssertionError:
    print '   No'

print ('Do the periods returned by model.find_best_periods have ranking as '
       'those from model.score?')
try:
    assert (model.score(fbpP).argsort()==fbpS.argsort()).all()
    print '   Yes'
except AssertionError:
    print '   No'


D=lomb_scargle_bootstrap(time, rv, e, 1/periods, generalized=True,
                         N_bootstraps=1000, random_state=0)
sig1 = np.percentile(D,[99])
D=lomb_scargle_bootstrap(time, rv, e, 1/autoP, generalized=True,
                         N_bootstraps=1000, random_state=0)
sig1_auto = np.percentile(D,[99])
D=lomb_scargle_bootstrap(time, rv, e, 1/fbpP, generalized=True,
                         N_bootstraps=1000, random_state=0)
sig1_fbpP = np.percentile(D,[99])

plt.plot(autoP,autoS,'k')
plt.axhline(sig1_auto,color='k',linestyle='--')
plt.plot(periods,scores,'r')
plt.axhline(sig1,color='r',linestyle='--')
plt.axhline(sig1_fbpP,color='g',linestyle='--')
plt.ylim(0,2)
plt.xlim(.9,3000)
plt.semilogx()
for i,x,y in zip(fbpS.size-fbpS.argsort(),fbpP, model.score(fbpP)):
    plt.annotate(str(i),(x,y),(x,1.5),
                 arrowprops=dict(facecolor='black', shrink=0.05,
                                 width=.5,headwidth=.5,frac=.1))
plt.axvline(1.2,color='k',linestyle=':')
plt.xlabel('P (days)')
plt.ylabel('LS Power')

handles=[plt.plot([-1,-1],[-1,-2],'k--',label='99% CI')[0],
         plt.plot([-1,-1],[-1,-2],'k',label='Auto P')[0],
         plt.plot([-1,-1],[-1,-2],'k',label='Manual P')[0],
         plt.plot([-1,-1],[-1,-2],'k:',label='Min. physical P')[0]]
plt.legend(loc='upper left',frameon=0)
plt.subplots_adjust(.11,.12,.96,.96)


"""
I would think that at a bare minimum the relative ordering of the periods
from find_best_period would match those from .score and .periodogram.

Thought not demonstrated here, I'm also somewhat unnerved that the scores
returned by.score are a function of the input periods at least at the 3% level.
Which is higher than I'd expect from something like variations due to machine
precision alone.

This, combined with the issue above, makes be uncomfortable quiting 
significance of peaks deriving from both find_best_periods and a manual search
of scores. My initial interest in find_best_periods was that it seemed to offer
an efficient way to account for the frequency gridding seen in peaks located 
using say autoS.argsorted()[::-1][:6] alone. In a large sample of stars this
simplistic approach will see long (~300-2000 day) periods pile up at 
~950,620, 500, etc. days.
""

Multiband Base model should ``ravel()`` inputs in ``fit()`` method

Currently the API fails as documented.

Also need updated examples of this.

Tests do not run when being offline at build time

I'm packaging all astroML related packages for Fedora (astroML, astroML-addons and supersmoother already done). For gatspy I haven't found a way to run the tests without an internet connection, I get something like

ERROR: gatspy.datasets.tests.test_download_data.test_downloads
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/builddir/build/BUILD/gatspy-0.2.1/gatspy/datasets/tests/test_download_data.py", line 19, in test_downloads
    data = downloader()
  File "/builddir/build/BUILD/gatspy-0.2.1/gatspy/datasets/rrlyrae.py", line 381, in fetch_rrlyrae
    cache_kwargs=kwargs)
  File "/builddir/build/BUILD/gatspy-0.2.1/gatspy/datasets/rrlyrae.py", line 82, in __init__
    self._load_data()
  File "/builddir/build/BUILD/gatspy-0.2.1/gatspy/datasets/rrlyrae.py", line 86, in _load_data
    **(self.cache_kwargs or {}))
  File "/builddir/build/BUILD/gatspy-0.2.1/gatspy/datasets/rrlyrae.py", line 37, in _get_download_or_cache
    buf = download_with_progress_bar(src_url)
  File "/usr/lib/python2.7/site-packages/astroML/datasets/tools/download.py", line 41, in download_with_progress_bar
    fhandle = urlopen(data_url)
  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1229, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1199, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno -3] Temporary failure in name resolution>

Is there any possibility to run tests without network? Is required in Fedora Packaging Guidelines: https://fedoraproject.org/wiki/Packaging:Guidelines#Build_time_network_access

For now I'll build the package without running tests, they run fine on local machines with network access.

Best Regards
Christian

SuperSmoother algorithm cannot be used

Even after I have installed supersmoother via pip and imported it, it cannot be simply used and shows warning:

ImportError: Package supersmoother is required. Use pip install supersmoother to install

Add normalization of PSD to documentation

I'm currently poking around in some data using gatspy, and wanted to take a stab at calculating some false-alarm probabilities. It took me a while of reading the gatspy source code to figure out that the score method returns the PSD in the least-squares normalization (?). Perhaps this is obvious to anyone with more experience with Lomb-Scargle periodograms than me, but for newcomers, it might be useful to explicitly mention that in the docs?

I'm happy to submit a PR, but I wanted to make sure that I'd understood the correctly what it returns before I do that. :)

Brani's website with RR Lyrae data is gone

It looks like Brani's UW website is gone: this is now a broken link
http://www.astro.washington.edu/users/bsesar/S82_RRLyr/

Referenced here:
https://github.com/astroML/gatspy/blob/master/gatspy/datasets/rrlyrae.py

Better docs of grid search parameters

I need to better document how to use the grid period search effectively (see #3)