intake-astro's Introduction

Intake: Take 2

A general python package for describing, loading and processing data

Taking the pain out of data access and distribution

Intake is an open-source package to:

  • describe your data declaratively
  • gather data sets into catalogs
  • search catalogs and services to find the right data you need
  • load, transform and output data in many formats
  • work with third party remote storage and compute platforms

Documentation is available at Read the Docs.

Please report issues at https://github.com/intake/intake/issues

Install

Recommended method using conda:

conda install -c conda-forge intake

You can also install with pip, which lets you choose how many of the optional dependencies to install; the simplest option has the fewest requirements:

pip install intake

Note that you may well need specific drivers and other plugins, which usually have additional dependencies of their own.

Development

  • Create a development Python environment with the required dependencies, ideally with conda. The requirements are listed in the yml files in the scripts/ci/ directory of this repo.
    • e.g. conda env create -f scripts/ci/environment-py311.yml and then conda activate test_env
  • Install intake using pip install -e .
  • Use pytest to run tests.
  • Create a fork on GitHub to be able to submit PRs.
  • We respect, but do not enforce, PEP 8 standards; all new code should be covered by tests.

intake-astro's People

Contributors

danielballan, martindurant, smoh, timothydmorton, wtbarnes

intake-astro's Issues

Adding fsspec support to `astropy.io.fits`

I'm opening this issue to draw attention to the fact that I opened an Astropy PR today (astropy/astropy#13238) which would add explicit support for opening FITS files with fsspec.

This PR may not necessarily benefit intake-astro, because I believe this package already makes clever use of Astropy's lazy data loading features (i.e., ImageHDU.section and lazy_load_hdus=True).

Perhaps the most important contribution of the Astropy PR, however, is that it adds documentation on the use of fsspec with FITS files, aimed at astronomers. For example, the Astropy PR would add a chapter on the use of fsspec to the Astropy docs which can be previewed here:

https://astropy--13238.org.readthedocs.build/en/13238/io/fits/usage/cloud.html

/ping @martindurant: I'd be interested to hear your thoughts on the PR. I'm happy to be told this is a bad idea, or have my attention drawn to any incorrect claims I may accidentally have made about fsspec in the Astropy docs.

header/wcs from remote catalog?

If I have a FITS file on disk locally, I see that I can access the header/WCS info if I read it in first; e.g.,

source = intake.open_fits_array('/Users/tdm/Downloads/ACTPol_148_D6_PA1_S1_1way_I.fits')
arr = source.read()
source.wcs

and that gives me

WCS Keywords

Number of WCS axes: 2
CTYPE : 'RA---CEA'  'DEC--CEA'  
CRVAL : 0.0  0.0  
CRPIX : 5832.0  1302.0  
PC1_1 PC1_2  : 1.0  0.0  
PC2_1 PC2_2  : 0.0  1.0  
CDELT : -0.008333333333333333  0.008333333333333333  
NAXIS : 3521  1505

However, I'd like to access this data via a YAML catalog; for example (actpol.yaml):

sources: 
    ACTPol_148_D6_PA1_S1_1way_I:
        driver: fits_array
        cache:
          - argkey: urlpath
            type: file
        args:
            url: https://lambda.gsfc.nasa.gov/data/suborbital/ACT/actpol_2016_maps/ACTPol_148_D6_PA1_S1_1way_I.fits
            ext: 0
        direct_access: force

I then define the catalog

cat = intake.open_catalog('actpol.yaml')

and I can read in the data array, e.g.,

arr = cat.ACTPol_148_D6_PA1_S1_1way_I.read()

but how do I access the header or WCS? The .arr attribute (as well as header, wcs, etc.) is not set for this remote source after read, unlike the local one. Am I misunderstanding how remote (or catalog-defined) sources work, or is this a bug in intake-astro?
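For context on why the header can in principle be read independently of the data: a FITS header is a sequence of 80-character ASCII cards packed into 2880-byte blocks and terminated by an END card. The sketch below is not intake-astro's API and is not a substitute for astropy.io.fits; it is a minimal standard-library illustration that reads only the header from any binary file-like object. The names `parse_card_value` and `read_fits_header` are hypothetical, and the parser handles only simple value cards (no CONTINUE cards, no Fortran-style D exponents).

```python
import io

BLOCK = 2880  # FITS files are organised in 2880-byte blocks
CARD = 80     # each header card is 80 ASCII characters


def parse_card_value(field):
    """Convert the value field of a card (columns 11-80) to a Python object."""
    field = field.strip()
    if field.startswith("'"):
        # Quoted string: it ends at the first quote that is not doubled ('').
        chars, i = [], 1
        while i < len(field):
            if field[i] == "'":
                if field[i + 1:i + 2] == "'":
                    chars.append("'")
                    i += 2
                    continue
                break
            chars.append(field[i])
            i += 1
        return "".join(chars).rstrip()
    field = field.split("/", 1)[0].strip()  # drop the trailing comment
    if field in ("T", "F"):
        return field == "T"
    for cast in (int, float):
        try:
            return cast(field)
        except ValueError:
            pass
    return field


def read_fits_header(fileobj):
    """Read header cards from the current position up to the END card."""
    header = {}
    while True:
        block = fileobj.read(BLOCK)
        if len(block) < BLOCK:
            raise ValueError("file ended before an END card was found")
        for start in range(0, BLOCK, CARD):
            card = block[start:start + CARD].decode("ascii")
            keyword = card[:8].strip()
            if keyword == "END":
                return header
            if card[8:10] == "= ":  # value card; skips COMMENT/HISTORY/blank
                header[keyword] = parse_card_value(card[10:])


# Usage with a synthetic in-memory header; a real file would be opened in "rb" mode.
cards = [
    "SIMPLE  =                    T / standard FITS",
    "NAXIS   =                    2",
    "NAXIS1  =                 3521",
    "NAXIS2  =                 1505",
    "CTYPE1  = 'RA---CEA'",
    "COMMENT example only",
    "END",
]
raw = "".join(c.ljust(CARD) for c in cards).ljust(BLOCK).encode("ascii")
header = read_fits_header(io.BytesIO(raw))
print(header["NAXIS1"], header["CTYPE1"])
```

Because the function only calls `read` on the object it is given, the same approach should work with any seekable or streaming binary file object, but that has not been checked against the remote file in the catalog above.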

Dask arrays have no dtype attribute

When I create a source from multiple (local) FITS files, I'm able to construct a Dask array using to_dask(). However, the resulting Dask array has no dtype attribute, although it does have a shape attribute.

E.g.

>>> import intake
>>> source = intake.open_fits_array('data/*.fits', ext=0)
>>> darr = source.to_dask()
>>> darr.dtype

gives

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-65-2fad671c8367> in <module>
----> 1 darr.dtype

~/anaconda/envs/aia-on-pleiades/lib/python3.7/site-packages/dask/array/core.py in dtype(self)
   1042     @property
   1043     def dtype(self):
-> 1044         return self._meta.dtype
   1045 
   1046     def _get_chunks(self):

AttributeError: 'tuple' object has no attribute 'dtype'

Furthermore, darr._meta simply returns the shape attribute.

I've dug a bit into how _meta is set in dask.array.Array, but cannot seem to figure out where exactly the intake loader is missing this bit of information. Any pointers would be helpful! I can also supply a few example FITS files for debugging if needed.

If this is a bug in intake-astro or a feature that needs implementing, I'd be happy to open a PR to do so.
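For comparison, here is a minimal sketch of a Dask array built from lazily loaded pieces with its dtype and shape supplied explicitly at construction time. It assumes dask and numpy are installed and is not intake-astro's loader; `load_image`, `SHAPE` and `DTYPE` are hypothetical stand-ins for reading one FITS file of known shape and type.

```python
import dask
import dask.array as da
import numpy as np

SHAPE = (4, 5)      # assumed per-file image shape (illustrative)
DTYPE = np.float32  # assumed per-file dtype (illustrative)


@dask.delayed
def load_image(index):
    """Hypothetical stand-in for reading the data of one FITS file."""
    return np.full(SHAPE, index, dtype=DTYPE)


# Shape and dtype are passed explicitly because delayed values are opaque
# to Dask until they are computed.
parts = [da.from_delayed(load_image(i), shape=SHAPE, dtype=DTYPE) for i in range(3)]
darr = da.stack(parts)

print(darr.shape, darr.dtype)
```

With the metadata given as keyword arguments, `darr.dtype` resolves without computing any of the pieces.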

Reading a fits file returns empty(?) dask array

Here I'm trying to open a fits file with astropy.io.fits:

In[42]: f=fits.open('20161214_00034.fits')
In[43]: f[1].data.shape
Out[43]: (5776,)

This is correct. However, when reading with FITSArraySource, I get some warnings and the resulting Dask array is empty:

In[44]: f=FITSArraySource('20161214_00034.fits')
In[45]: f.shape
WARNING: FITSFixedWarning: The WCS transformation has more axes (2) than the image it is associated with (0) [astropy.wcs.wcs]
WARNING: FITSFixedWarning: 'datfix' made the change 'Set DATE-REF to '1858-11-17' from MJD-REF'. [astropy.wcs.wcs]
In[46]: f.to_dask().shape
Out[46]: ()
In[47]: 
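The first warning reports an image with 0 axes, which is what a header with NAXIS = 0 describes (primary HDUs often carry no data). The sketch below illustrates how an array shape follows from the NAXIS keywords and how one might pick the first HDU that actually holds data. It works on plain dictionaries rather than real FITS files, the function names are hypothetical, and whether FITSArraySource read the primary HDU in the session above is an assumption, not something the transcript confirms.

```python
def shape_from_header(header):
    """Return the NumPy-style shape implied by the NAXIS keywords.

    FITS lists the fastest-varying axis first (NAXIS1), so the order is
    reversed to match C-ordered arrays.
    """
    naxis = header.get("NAXIS", 0)
    return tuple(header[f"NAXIS{n}"] for n in range(naxis, 0, -1))


def first_hdu_with_data(headers):
    """Return the index of the first header describing a non-empty array."""
    for index, header in enumerate(headers):
        shape = shape_from_header(header)
        if shape and all(shape):
            return index
    return None


# Illustrative headers: an empty primary HDU followed by a 1-D extension.
headers = [{"NAXIS": 0}, {"NAXIS": 1, "NAXIS1": 5776}]
print(shape_from_header(headers[0]))  # ()
print(first_hdu_with_data(headers))   # 1
```

If the data does live in a later HDU, passing that index as the ext argument (as the other examples on this page do) would be the thing to try.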

ImportError: Compatibility w/recent dask?

I got an exception on this line:

https://github.com/intake/intake-astro/blob/master/intake_astro/array.py#L52

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
/scratch/local/45915091/ipykernel_32764/2861514433.py in <module>
      1 img_ = FITSArraySource("file.fits", ext=1)
----> 2 img = img_.to_dask()
      3 img

/blue/adamginsburg/adamginsburg/repos/intake-astro/intake_astro/array.py in to_dask(self)
    101 
    102     def to_dask(self):
--> 103         self._get_schema()
    104         return self.arr
    105 

/blue/adamginsburg/adamginsburg/repos/intake-astro/intake_astro/array.py in _get_schema(self)
     50 
     51     def _get_schema(self):
---> 52         from dask.bytes import open_files
     53         import dask.array as da
     54         from dask.base import tokenize

ImportError: cannot import name 'open_files' from 'dask.bytes' (/blue/adamginsburg/adamginsburg/repos/dask/dask/bytes/__init__.py)

Has the open_files function been renamed?
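One common way to cope with a function that has moved between packages across releases is to try the candidate import locations in order. The sketch below is a generic standard-library helper, not a patch to intake-astro; `import_first` is a hypothetical name. The candidate list in the final comment is an assumption: fsspec exposes a top-level `open_files`, but it should be checked against the installed versions before relying on it.

```python
import importlib


def import_first(name, modules):
    """Return attribute `name` from the first module in `modules` that has it."""
    errors = []
    for module_name in modules:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, name)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{module_name}: {exc}")
    raise ImportError(f"cannot import {name!r}; tried " + "; ".join(errors))


# Demonstration with standard-library modules only.
sqrt = import_first("sqrt", ["no_such_module_xyz", "math"])
print(sqrt(16.0))

# Assumed usage for the case above (needs fsspec or an older dask installed):
# open_files = import_first("open_files", ["fsspec", "dask.bytes"])
```

Listing the newer location first means the fallback is only exercised on older installations.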
