ajdawson / eofs

EOF analysis in Python

Home Page: http://ajdawson.github.io/eofs/

License: GNU General Public License v3.0

Python 100.00%

eofs's People


arulalant panhaidongphd gavin971 nicolasfauchereau divinelybless jmolina9 silky abogusz atmoschris alaniwi lynch123 zhenkunl raybellwaves diversoft mnkmishra02 shaunwbell asanchezlorente jbusecke wujfhero cubayang scottwales candrasa leosiqueira jhon-dong alchemist5617 qwecitq tomastorsvik qiuzeng296 matrss li2z3e bbuzz31 matt-long lleoiu zms-learn zhe233 matzech 1895-art jfa-mbule tomlwebb xrosliang jjinyun griverat chanjeunlam feigegege yadidya-b tianchi03 nicrie kachingasilwimba sunshinenone-delft e-erfani guidov swnesbitt danskecommodities ocefpaf cyschneck lee1043 zeitsperre

eofs's Issues

reconstructedField raises an AttributeError in latest Python / Xarray version

When I run the following example use case

from eofs.xarray import Eof
solver = Eof(data) # flux is the above DataArray
reconstr = solver.reconstructedField(solver.neofs)

The reconstructedField method call raises an AttributeError, pointing to the following section of the code in /lib/python3.10/site-packages/eofs/standard.py within reconstructedField:

# Determine how the PCs and EOFs will be selected.
if isinstance(neofs, collections.Iterable):
    modes = [m - 1 for m in neofs]
else:
    modes = slice(0, neofs)

Error raised: AttributeError: module 'collections' has no attribute 'Iterable'

It looks like collections.Iterable was deprecated and then removed in Python 3.10; the abstract base classes now live in the collections.abc submodule (a similar issue raised here).

Changing the above to

# Determine how the PCs and EOFs will be selected.
if isinstance(neofs, collections.abc.Iterable):
    modes = [m - 1 for m in neofs]
else:
    modes = slice(0, neofs)

solved the issue for now.
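
For reference, the mode-selection logic quoted above can be written against collections.abc directly. This is a minimal standalone sketch, not the package's actual code; the function name select_modes is made up for illustration.

```python
from collections.abc import Iterable


def select_modes(neofs):
    """Return an index (list or slice) selecting the requested modes.

    Hypothetical helper mirroring the snippet from eofs/standard.py:
    an iterable of 1-based mode numbers becomes a list of 0-based
    indices, and a single integer becomes a slice of leading modes.
    """
    if isinstance(neofs, Iterable):
        return [m - 1 for m in neofs]
    return slice(0, neofs)


modes_from_list = select_modes([1, 2])
modes_from_int = select_modes(3)
```

Importing Iterable from collections.abc works on every Python 3 release, so the same line runs on both older and newer interpreters.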

What is EOF?

I don't mean to be dense, but it is a bit difficult to figure out what EOF is from just the repository. The tagline, the readme, and the webpage all use EOF endlessly, but not once is it spelled out in full. You have to go all the way to the overview in the documentation to find it. This is confusing for new (or prospective) users.

PS, I do know what it means; I'm just saying it's hard to figure out if you don't know already.

Python 3 documentation

A recent PR (#55) made the source Python 3 compatible, and therefore the tests can now run against the source version on Python 3 as well as 2. Remove this caveat from the documentation.

Input Data Missing Error from Standard Solver

Hi! I am trying to use the standard solver on a netCDF file that has a data variable (reflectance values), a lat variable, a lon variable, and a time dimension. When I pass it to the standard solver, I get an error saying all input data is missing, even though I have verified that reflectance data is present. I also tried converting to an xarray DataArray and using the xarray solver, but got the same error. Could it be because there are too many masked values? Sorry if I am missing something obvious; I've been picking away at it for quite a while now. Thank you for any insights you may be able to provide! Code and example .nc file below.

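# Imports assumed from context (not shown in the original report).
import numpy as np
from netCDF4 import Dataset
from eofs.standard import Eof

filename = '/home/williamcoast/Desktop/test_csv/test.nc'
ncin = Dataset(filename, 'r')
color = ncin.variables['data'][:]
lons = ncin.variables['longitude'][:]
lats = ncin.variables['latitude'][:]
ncin.close()

# Square-root of cosine of latitude weights.
coslat = np.cos(np.deg2rad(lats))
wgts = np.sqrt(coslat)[..., np.newaxis]
solver = Eof(color, weights=wgts)
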
test_netcdf.zip
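
One way to check whether masked values are the cause is to count, for each grid point, how many time steps are missing. The sketch below uses only NumPy and assumes time is the first axis; missing_summary is a hypothetical helper, and how the solver reacts to each case is not verified here.

```python
import numpy as np


def missing_summary(field):
    """Count grid points that are never, sometimes, or always missing.

    `field` is a (time, ...) array or masked array; masked entries and
    NaNs are both treated as missing.
    """
    filled = np.ma.filled(np.ma.asarray(field, dtype=float), np.nan)
    ntime = filled.shape[0]
    flat = filled.reshape(ntime, -1)
    nmissing = np.isnan(flat).sum(axis=0)
    return {
        "never_missing": int((nmissing == 0).sum()),
        "sometimes_missing": int(((nmissing > 0) & (nmissing < ntime)).sum()),
        "always_missing": int((nmissing == ntime).sum()),
    }


# Tiny example: 2 time steps, 3 grid points.
example = np.ma.masked_invalid(
    np.array([[1.0, np.nan, 3.0],
              [2.0, np.nan, np.nan]])
)
summary = missing_summary(example)
```

If "never_missing" is zero, no grid point has a complete time series, which would be consistent with a solver reporting that no usable data remains.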

Python collection issue with reconstructedField()

Hello, I am using version 1.4.0 of the package with Python 3.10.4 (numpy 1.22.3) and am having problems with eofs.standard.reconstructedField(). It gives the following error:

Traceback (most recent call last):
  File "/Users/alderj/code/Python/EOFs/Osman_EOF.py", line 33, in <module>
    reconstruction = solver.reconstructedField([1,2])
  File "/Users/alderj/miniconda3/envs/IDLPython/lib/python3.10/site-packages/eofs/standard.py", line 638, in reconstructedField
    if isinstance(neofs, collections.Iterable):
AttributeError: module 'collections' has no attribute 'Iterable'

A quick search indicates collections.Iterable is deprecated. Is there a workaround? I'd really like to be able to add PC1 and PC2 in the original data units.
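
A possible stopgap that avoids editing files in site-packages is to restore the removed alias before the solver is called. This is a sketch of a generic monkey-patch, not an officially recommended fix; upgrading to a release that uses collections.abc is the cleaner route if one is available.

```python
import collections
import collections.abc

# Re-create the alias that Python 3.10 removed, only if it is absent.
# Code that still refers to collections.Iterable will then keep working.
if not hasattr(collections, "Iterable"):
    collections.Iterable = collections.abc.Iterable

list_is_iterable = isinstance([1, 2], collections.Iterable)
int_is_iterable = isinstance(3, collections.Iterable)
```

The patch has to run before the first call to reconstructedField, for example at the top of the script.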

Issue with explained variance

Hello,

I am using the eofs package with the xarray option and ran into a problem with the explained variance. I am comparing sea surface temperature from a satellite product (HadISST) and model output from a global circulation model. See the jupyter notebook here

https://github.com/sryan288/Share/blob/master/EOF_HadISST_vs_ORCA.ipynb

The spatial patterns and PCs for the first two modes look very similar, however, the explained variance for the observations (sst array) is about 60% while it is below 40% for the model (orca array). Generally, the first few modes summed up explain an uncommonly small percentage of the total variance for the model data.

As a test I saved both fields that I plug into the EOF functions, read them into Matlab and performed an EOF analysis there, where I get an explained variance around 60 % for the first modes in both data sets.

I am fairly new to Python and there could definitely be a mistake in my code but I couldn't find anything and don't understand what is going on.

I would greatly appreciate any kind of help!
Svenja

Better CI test matrix

We could do with testing with and without extra dependencies (Iris / cdms2). Currently doing the full package tests on Python 2.7 means we are restricted in what numpy versions can be tested against due to cdat-lite's hard numpy dependency. A travis configuration variable could be used to test just the standard interface, and also the full interface if available.

SVD error when using eof solver on dask array

I have been getting an SVD error when trying to call the xarray Eof solver on a dask array.

The error is due to the following line in eofs.standard:
nonMissingIndex = np.where(np.logical_not(np.isnan(self._data[0])))[0]

np.where always fails and gives nans for dask arrays (see e.g. https://stackoverflow.com/questions/59957541/what-is-the-dask-equivalent-of-numpy-where)

Possibly related to #115 (although I can no longer see those notebooks).

Solved by calling .load() before calling the solver, but loses the advantages of dask.

Question about projectField

Hi @ajdawson ,
I am a student and a beginner in Python. My teacher gave me a research topic, which is to reconstruct data covered by clouds. When I looked through the literature, I found EOF analysis; one article used this method to reconstruct data blocked by clouds, and the results were very good. However, I have encountered problems when using your library. I don't know if my understanding is correct, so I am asking for your advice.

Can projectField(data) perform the data reconstruction mentioned above? If not, is there another function that can do this?

Thank you. Your reply will be very helpful to me.

Use versioneer for version management

Tagging as the new version bump.

EOFs not quite orthogonal? PCs rigorously are.

Here's a notebook I used in class with your package. It uses OpenDAP data for reproducibility. No weighting, a simple SST gridpoints analysis.

In the cell numbered 70-71 I checked orthogonality with simple np.corrcoef(). Poor for eofs, great for pcs. Any thoughts on why?

np.corrcoef(eofs[0,:,:].ravel(), eofs[1,:,:].ravel())
array([[ 1. , -0.07549177],
[-0.07549177, 1. ]])

np.corrcoef(pcs[:,0], pcs[:,1])
array([[1.00000000e+00, 1.87961874e-08],
[1.87961874e-08, 1.00000000e+00]])

http://nbviewer.jupyter.org/github/MPOcanes/MPO624-2018/blob/master/assignments/module3-empiricaldecomp/EOF_ERSSTmap.ipynb
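
One possible explanation, sketched below with synthetic data rather than the notebook's SST field: np.corrcoef subtracts the mean of each vector before correlating. PCs computed from anomalies have zero time mean, so their correlation equals their normalized dot product. EOF maps generally have a non-zero spatial mean, so they can be orthogonal in the dot-product sense and still show a non-zero correlation. All names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
ntime, nspace = 50, 30

# Synthetic field with a spatially varying offset, then time anomalies.
data = rng.standard_normal((ntime, nspace)) + np.linspace(0.0, 5.0, nspace)
anom = data - data.mean(axis=0)

# EOFs are the rows of vt; PCs are the columns of u scaled by s.
u, s, vt = np.linalg.svd(anom, full_matrices=False)
eof0, eof1 = vt[0], vt[1]
pc0, pc1 = u[:, 0] * s[0], u[:, 1] * s[1]

dot_eofs = eof0 @ eof1                      # orthogonality test
corr_eofs = np.corrcoef(eof0, eof1)[0, 1]   # removes spatial means first
dot_pcs = pc0 @ pc1
corr_pcs = np.corrcoef(pc0, pc1)[0, 1]

print(dot_eofs, corr_eofs, dot_pcs, corr_pcs)
```

Checking orthogonality with a dot product over the valid (non-missing) grid points, rather than with np.corrcoef, is the comparison that matches how the modes are constructed.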

xarray projectField function time coords mistake

I discovered that (line 639 eofs/xarray):

       pcs.coords.update({coord.name: (coord.dims, coord)
                           for coord in self._time_ndcoords})

should instead use the new data's time coordinates with time_ndcoords:

       pcs.coords.update({coord.name: (coord.dims, coord)
                           for coord in time_ndcoords})

With this modification it is possible to find the EOFs of X1 and then project the fields from X2 (which contain a different set of time coordinates) onto the EOFs of X1 using projectField. Your implementation references the X1 time coordinates (by calling self._time_ndcoords).

(First time creating an issue on Github so sorry if it's not formatted etc. the correct way)

Not found in doc

Hi @ajdawson ,
first of all, thanks for a tremendously useful package.
here are a few questions I cannot find in the doc or examples:

since the sign of an EOF/PC is arbitrary (only the product counts), I often have to multiply both by -1 to get something sensible (e.g. global warming shows up as an upward trending PC and warm colors, not the reverse). How do I do this with your package? (I should note that I use xarray, and as much as possible would like to use its built-in plot capabilities).
I was able to retrieve the variance fraction - nice feature. Is there an easy way to do a scree plot, including the error bars on the eigenvalues? A related feature might be northTest, but I am not quite sure what to do with the numbers it returns.

Thanks in advance!
Julien
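
Both questions can be illustrated with plain NumPy. The sketch below assumes EOFs and PCs come from an SVD of the anomalies, flips one mode by multiplying the EOF and its PC by -1, and computes error bars with the North et al. (1982) rule of thumb, where the typical error of an eigenvalue is the eigenvalue times sqrt(2/N). Taking N as the number of time steps is an assumption that every sample is independent; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
ntime, nspace = 40, 12
anom = rng.standard_normal((ntime, nspace))
anom -= anom.mean(axis=0)

u, s, vt = np.linalg.svd(anom, full_matrices=False)
eofs = vt          # shape (mode, space)
pcs = u * s        # shape (time, mode)

# Flip the sign of mode 0 in both the EOF and the PC.
flipped_eofs = eofs.copy()
flipped_pcs = pcs.copy()
flipped_eofs[0] *= -1
flipped_pcs[:, 0] *= -1

# The reconstruction (PC times EOF) is unchanged by the joint sign flip.
product_unchanged = np.allclose(flipped_pcs @ flipped_eofs, pcs @ eofs)

# Eigenvalues and their typical errors for a scree plot with error bars.
eigenvalues = s ** 2 / (ntime - 1)
typical_errors = eigenvalues * np.sqrt(2.0 / ntime)
```

With xarray objects the same flip is a multiplication of the selected mode by -1 in both returned arrays, and the eigenvalues with their errors can be passed to matplotlib's errorbar for the scree plot. The numbers returned by northTest appear to be typical errors of this kind.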

Multivariate EOFs

Hello,
Thank you so much for creating this code. I have a question: how do you normalize the input data before applying Multivariate EOFs? In addition, can you disable this option (so we could use our own normalization criterion)? I am looking at the code but cannot find where the normalization process occurs. Many thanks!

Opposite signal in EOF correlation map

Hi everyone,

I'm having trouble calculating the NAO EOF pattern. Using the eofs package example, the resulting covariance map has the opposite sign (it is supposed to have negative covariance near the pole and positive covariance in the mid-latitudes).
Does anyone know what's the problem?

The code used:

import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

from eofs.xarray import Eof
from eofs.examples import example_data_path

# Read geopotential height data using the xarray module. The file contains
# December-February averages of geopotential height at 500 hPa for the
# European/Atlantic domain (80W-40E, 20-90N).
filename = example_data_path('hgt_djf.nc')
z_djf = xr.open_dataset(filename)['z']

# Compute anomalies by removing the time-mean.
z_djf = z_djf - z_djf.mean(dim='time')

# Create an EOF solver to do the EOF analysis. Square-root of cosine of
# latitude weights are applied before the computation of EOFs.
coslat = np.cos(np.deg2rad(z_djf.coords['latitude'].values)).clip(0., 1.)
wgts = np.sqrt(coslat)[..., np.newaxis]
solver = Eof(z_djf, weights=wgts)

# Retrieve the leading EOF, expressed as the covariance between the leading PC
# time series and the input SLP anomalies at each grid point.
eof1 = solver.eofsAsCovariance(neofs=1)

# Plot the leading EOF expressed as covariance in the European/Atlantic domain
clevs = np.linspace(-75, 75, 11)
proj = ccrs.Orthographic(central_longitude=-20, central_latitude=60)
ax = plt.axes(projection=proj)
ax.coastlines()
ax.set_global()
eof1[0, 0].plot.contourf(ax=ax, levels=clevs, cmap=plt.cm.RdBu_r,
                         transform=ccrs.PlateCarree(), add_colorbar=False)
ax.set_title('EOF1 expressed as covariance', fontsize=16)
plt.show()

=============================================================

Result was supposed to be like this:
https://www.cpc.ncep.noaa.gov/products/precip/CWlink/pna/nao_loading.html

Thanks for your time, hope this is not a silly question.

Error while using solver.eofs with eofs.xarray

I am confused by this error as this code was working perfectly fine until last week.
biweekly_data is an Xarray Dataset and when I type the following:

coslat = np.cos(np.deg2rad(biweekly_data.coords['rlat'].values))
wgts = np.sqrt(coslat)[..., np.newaxis]
solver = Eof(biweekly_data, weights=wgts)

eof1 = solver.eofs()

I get this error:
TypeError: the input must be an xarray DataArray

So I tried:

coslat = np.cos(np.deg2rad(biweekly_data.coords['rlat'].values))
wgts = np.sqrt(coslat)[..., np.newaxis]
solver = Eof(biweekly_data.snowmelt, weights=wgts)

eof1 = solver.eofs()

And then get this:
TypeError: Using a DataArray object to construct a variable is ambiguous, please extract the data using the .data property

So one more time I tried:

coslat = np.cos(np.deg2rad(biweekly_data.coords['rlat'].values))
wgts = np.sqrt(coslat)[..., np.newaxis]
solver = Eof(biweekly_data.snowmelt.data, weights=wgts)

eof1 = solver.eofs()

And get:
TypeError: the input must be an xarray DataArray

It just baffled me because I've used the code as shown in the very top block before to create maps like these:

Note:

The solver works if I use eofs.standard when I use solver = Eof(biweekly_data.snowmelt.data, weights=wgts)

CSV station data to use eofs

I have a dataset of station observations in a CSV file. The solver returns an empty result.
cf.head()

                          Height   Stid        ci
DATE       Lat   Lon
1951-01-01 32.56 117.22    206.0  58221  0.608196
           32.05 118.45    618.0  58238  0.569477
           30.20 120.14     77.0  58457  0.626431
           30.37 117.02    263.0  58424  0.666847
           28.40 121.30     95.0  58665  0.546766

from eofs.xarray import Eof
ds = cf.to_xarray()
solver = Eof(ds["ci"])
eof_ci = solver.eofsAsCorrelation(neofs=1)

Here is eof_ci:

<xarray.DataArray 'eofs' (mode: 0, Lat: 154, Lon: 166)>
array([], shape=(0, 154, 166), dtype=float64)
Coordinates:
  * mode     (mode) float64
  * Lat      (Lat) float64 2.85 2.86 2.97 3.02 3.03 ... 34.27 34.29 34.5 34.51
  * Lon      (Lon) float64 11.75 11.81 11.91 11.98 ... 121.6 122.1 122.1 122.3
Attributes:
    long_name:  correlation_between_pcs_and_ci

Use doctr for managing documentation deployment

Using this tool means no longer having to update the documentation manually for each release.

reconstructedField

Hi,

I have used the eofs reconstructedField() function to reconstruct the matrix that was decomposed by Eof, but I found that even when neofs is set to the maximum, I still cannot recover the original dataset. I checked through the code and found that the mean (the value removed when centering) has not been added back to the reconstructed dataset. Could you help me confirm that?

# This is the code that I used to do the simple test
import numpy as np
import matplotlib.pyplot as plt
from eofs.standard import Eof

a = np.random.randint(-10,10,(10,20))
solver = Eof(a)
reconstructed_data = solver.reconstructedField(solver.neofs)
plt.scatter(a.reshape(-1),reconstructed_data.reshape(-1))
plt.grid()
plt.show()

plt.scatter(a.reshape(-1),(reconstructed_data+a.mean(axis=0)).reshape(-1))
plt.grid()
plt.show()
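
The observation can be reproduced without the package. The sketch below assumes the solver removes the time mean before the decomposition (its default behaviour, as far as the test above suggests); it shows that a full-rank reconstruction from an SVD returns the anomalies, and that adding the time mean back recovers the original array.

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.integers(-10, 10, size=(10, 20)).astype(float)

# Center the data in time, as an EOF solver typically does.
time_mean = a.mean(axis=0)
anom = a - time_mean

# Full reconstruction from all modes of the SVD.
u, s, vt = np.linalg.svd(anom, full_matrices=False)
reconstructed_anom = (u * s) @ vt

matches_anomalies = np.allclose(reconstructed_anom, anom)
matches_original = np.allclose(reconstructed_anom + time_mean, a)
```

So the second scatter plot in the test, where a.mean(axis=0) is added back, is the comparison expected to fall on the one-to-one line.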

Feature Request: rotated PCA

To reproduce the methodology the CPC uses to calculate the NAO, e.g. http://www.cpc.ncep.noaa.gov/data/teledoc/telepatcalc.shtml

I see they first calculate the 10 leading EOFs for each month (3-month average), which this code can do as it stands. They then apply a ‘Varimax rotation’ (I need to read their paper) to obtain the 10 rotated EOFs for each month (3-month average).
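
For orientation, a generic varimax rotation can be sketched in a few lines of NumPy. This is the common textbook SVD-based iteration, not necessarily the exact procedure the CPC uses and not an existing eofs feature; the function name and defaults are illustrative.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Rotate a (space, mode) loading matrix with the varimax criterion.

    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    npoints, nmodes = loadings.shape
    rotation = np.eye(nmodes)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like target of the varimax criterion.
        target = rotated ** 3 - (gamma / npoints) * rotated * np.sum(
            rotated ** 2, axis=0)
        u, s, vh = np.linalg.svd(loadings.T @ target)
        rotation = u @ vh
        new_criterion = s.sum()
        # Stop once the criterion no longer increases appreciably.
        if criterion != 0.0 and new_criterion < criterion * (1.0 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation


rng = np.random.default_rng(5)
example_loadings = rng.standard_normal((20, 3))
rotated_loadings, rotation_matrix = varimax(example_loadings)
```

In practice the input would be the leading EOFs arranged as (space, mode), often scaled by the square root of their eigenvalues before rotation; which scaling to use is a methodological choice that the sketch leaves open.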

Support for extended EOF analysis

From a user request: add support for extended EOF analysis.
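
As a rough illustration of what such support might involve: extended EOF analysis is usually described as ordinary EOF analysis applied to a matrix in which time-lagged copies of the field are placed side by side. The helper below is hypothetical and only builds that lagged matrix with NumPy.

```python
import numpy as np


def lag_embed(field, nlags):
    """Stack `nlags` lagged copies of a (time, ...) field along space.

    The result has shape (ntime - nlags + 1, nlags * nspace); row t holds
    the flattened field at times t, t + 1, ..., t + nlags - 1.
    """
    ntime = field.shape[0]
    flat = field.reshape(ntime, -1)
    nrows = ntime - nlags + 1
    return np.hstack([flat[lag:lag + nrows] for lag in range(nlags)])


field = np.arange(12.0).reshape(6, 2)   # 6 times, 2 grid points
extended = lag_embed(field, nlags=3)
```

The extended matrix could then be handed to a standard solver, with each resulting EOF reshaped back into nlags spatial maps.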

eofs.xarray.Eof behaves unexpectedly when provided a DataArray with a coordinate called 'mode'

Hi,

I admit that the following issue is a bit pathological, and you may decide it is not worth doing anything about, but I thought I would flag it anyway.

I found myself in the position of wanting to apply PCA to the product space of the principal components of two separate fields:

import xarray as xr
from eofs.xarray import Eof

Z500_pcs=xr.open_dataarray('DJF_Z500_PCs.nc')
MSLP_pcs=xr.open_dataarray('DJF_MSLP_PCs.nc')

combined_pcs=xr.concat([Z500_pcs,MSLP_pcs],'mode')
solver=Eof(combined_pcs)

print(solver.eofs().shape)
print(solver.eofs()[0].shape)
print(solver.eofs()[0][0][0][0][0].shape)

(13, 13)
(13, 13)
(13, 13)

solver.eofs(neofs=3)

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_9504/46481902.py in <module>
----> 1 solver.eofs(neofs=3)

~/miniconda3/lib/python3.9/site-packages/eofs/xarray.py in eofs(self, eofscaling, neofs)
    227         eofs = xr.DataArray(eofs, coords=coords, name='eofs',
    228                             attrs={'long_name': long_name})
--> 229         eofs.coords.update({coord.name: (coord.dims, coord)
    230                             for coord in self._space_ndcoords})
    231         return eofs

~/miniconda3/lib/python3.9/site-packages/xarray/core/coordinates.py in update(self, other)
    164             [self.variables, other_vars], priority_arg=1, indexes=self.xindexes
    165         )
--> 166         self._update_coords(coords, indexes)
    167 
    168     def _merge_raw(self, other, reflexive):

~/miniconda3/lib/python3.9/site-packages/xarray/core/coordinates.py in _update_coords(self, coords, indexes)
    340         coords_plus_data = coords.copy()
    341         coords_plus_data[_THIS_ARRAY] = self._data.variable
--> 342         dims = calculate_dimensions(coords_plus_data)
    343         if not set(dims) <= set(self.dims):
    344             raise ValueError(

~/miniconda3/lib/python3.9/site-packages/xarray/core/dataset.py in calculate_dimensions(variables)
    203                 last_used[dim] = k
    204             elif dims[dim] != size:
--> 205                 raise ValueError(
    206                     f"conflicting sizes for dimension {dim!r}: "
    207                     f"length {size} on {k!r} and length {dims[dim]} on {last_used!r}"

ValueError: conflicting sizes for dimension 'mode': length 3 on <this-array> and length 13 on {'mode': 'mode'}

All these errors vanish when I add the following line:

combined_pcs=combined_pcs.rename({'mode':'original_mode'})

Maybe a warning, or an automatic renaming if an array with a coordinate named 'mode' is passed to Eof would be appropriate?

Time coord in iris cube - name is hardcoded as 'time'

I have an Iris cube with a time dimension co-ordinate whose coord.name() is 't'. When trying to create a solver it crashes because the code assumes time dimensions must be called 'time'. This is true for univariate and multivariate solvers.

I can work around this by setting coord.standard_name = 'time' on all my cubes before creating EOFs, but the eofs package itself could allow for any name by doing something like time_name = cube.coord(axis='T').name().

Question about projectField

Hi @ajdawson,

I have been using eofs extensively for my ongoing research over the last two years. Thanks for your efforts in developing and maintaining this awesome package. I am not sure if this issue board is the proper place to ask questions. Please excuse me if not, but I really do need your help with the following questions.

1. I understand how eofsAsCovariance and eofsAsCorrelation differ, but I couldn't fully understand what eofs is for. Could you please give me some detail about eofs and how it differs from the above two?
2. Some interfaces (e.g., eofsAsCovariance, pcs) have a pcscaling option while others (e.g., eofs, projectField) have an eofscaling option. It seems to me that with pcscaling=1 it works with normalized PC time series that have unit variance (please correct me if I am wrong). But I don't fully understand what the eofscaling option is for. Could you explain in more detail how the eofscaling option works?
3. According to the manual, "We could also project another field onto the EOFs to produce a set of pseudo-PCs: pseudo_pcs = solver.projectField(other_field)"
I am using this call as below:
pseudo_pcs = solver.projectField(field_to_be_projected, neofs=1, eofscaling=0)
The eofscaling=0 setting says the field is projected onto "un-scaled EOFs" by default. But to me, it seems like an EOF pattern of unit variance (a map with spatial standard deviation = 1) is being projected onto the given field (that is why I asked question 2).
I see that projectField uses flatE, which comes from E. I suspect this E should be identical to the map of the EOF pattern that has unit variance. Could you please confirm or correct this for me?

Thank you for your attention, and sorry for the gaps in my understanding. Your comments would be tremendously helpful to me. Thank you in advance.
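
The scaling options asked about above can be illustrated with plain NumPy. This is a sketch under the assumption that the solver works on an SVD of the anomalies and defines eigenvalues as squared singular values divided by (ntime - 1); the variable names are illustrative and are not the package internals. Under that assumption, un-scaled EOFs (eofscaling=0) have unit length as vectors, which is not the same thing as unit spatial variance.

```python
import numpy as np

rng = np.random.default_rng(2)
ntime, nspace = 50, 8
anom = rng.standard_normal((ntime, nspace))
anom -= anom.mean(axis=0)

u, s, vt = np.linalg.svd(anom, full_matrices=False)
eigenvalues = s ** 2 / (ntime - 1)
root = np.sqrt(eigenvalues)

# EOF scalings: 0 = un-scaled, 1 = divided by sqrt(eigenvalue),
# 2 = multiplied by sqrt(eigenvalue).
eofs_scaling0 = vt
eofs_scaling1 = vt / root[:, np.newaxis]
eofs_scaling2 = vt * root[:, np.newaxis]

# PC scalings: 0 = un-scaled, 1 = scaled to unit variance.
pcs_scaling0 = u * s
pcs_scaling1 = pcs_scaling0 / root

# Projecting a field onto the un-scaled EOFs gives pseudo-PCs; projecting
# the original anomalies recovers the un-scaled PCs.
pseudo_pcs = anom @ eofs_scaling0.T
```

Here each row of eofs_scaling0 has Euclidean norm 1, each column of pcs_scaling1 has variance 1, and pseudo_pcs equals pcs_scaling0.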

Create conda packages and documentation

Build conda packages and host them on binstar. Add documentation so users know about this option.

Can I get variance fraction from projected field?

Hello,

Thanks to your effort on this package, I am using it to develop a climate model evaluation tool. I can get the fraction of variance using solver.varianceFraction(), but I am wondering if I can get the same fraction for a projected field.

I've used the call below to project an arbitrary field onto the solver's EOFs and obtain the PCs of the projected field:
pcs_of_projected_field = solver.projectField(field_to_be_projected,neofs=eofn,eofscaling=1)

But I couldn't find a way to get its fraction of variance. Does eofs have a function to do this efficiently?

Thanks for your attention.
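
One way to estimate this by hand, sketched with NumPy under stated assumptions: the EOFs are un-scaled (unit length), no weights are applied, and the projected field has its own time mean removed. The fraction for each mode is then the variance of its pseudo-PC divided by the total variance of the projected field. The function name is hypothetical and this is not an eofs API.

```python
import numpy as np


def projected_variance_fraction(field_anom, eofs):
    """Fraction of the variance of `field_anom` captured by each EOF.

    `field_anom` is (time, space) with the time mean removed;
    `eofs` is (mode, space) with unit-length rows.
    """
    pseudo_pcs = field_anom @ eofs.T
    total_variance = field_anom.var(axis=0, ddof=1).sum()
    return pseudo_pcs.var(axis=0, ddof=1) / total_variance


rng = np.random.default_rng(7)
ntime, nspace = 60, 15
base = rng.standard_normal((ntime, nspace))
base -= base.mean(axis=0)
other = rng.standard_normal((ntime, nspace))
other -= other.mean(axis=0)

# EOFs of the base field, then fractions for both fields.
_, s, vt = np.linalg.svd(base, full_matrices=False)
frac_base = projected_variance_fraction(base, vt)
frac_other = projected_variance_fraction(other, vt)
```

For the field the EOFs were computed from, this reproduces the usual variance fractions; for another field, the values show how much of that field's variance lies along each EOF. With eofscaling=1 the pseudo-PCs are rescaled, so their raw variance would first need converting back.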

How To Plot Contours from a 2D Matrix

I obtained temperature data from 90 weather stations. Each station has data for 3,481 times, so I currently have a two-dimensional (time by space) matrix of temperature data (3481 x 90). When I run the module, the solver returns a 1 x 90 EOF. How can I plot this EOF using contourf?

Should I create an empty 90 x 90 matrix and fill the diagonal with the values of the EOF? And should the values of lon and lat also be 90 x 90 matrices?

fill = ax.contourf(lons, lats, np.fill_diagonal(np.zeros((90,90)), eof1.squeeze()), clevs, cmap=plt.cm.RdBu_r, latlon=True)

Error in eofsAsCovariance when using weights

I am running the NAO example with xarray. Extracting the 1st EOF repeatedly gives different results (only when using weights). It looks like the weights are applied repeatedly.

import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

from eofs.xarray import Eof
from eofs.examples import example_data_path


filename = example_data_path('hgt_djf.nc')
z_djf = xr.open_dataset(filename)['z']


z_djf = z_djf - z_djf.mean(dim='time')


coslat = np.cos(np.deg2rad(z_djf.coords['latitude'].values)).clip(0., 1.)
wgts = np.sqrt(coslat)[..., np.newaxis]
solver = Eof(z_djf, weights=wgts)


eof1 = solver.eofsAsCovariance(neofs=1)


clevs = np.linspace(-75, 75, 11)
proj = ccrs.Orthographic(central_longitude=-20, central_latitude=60)
ax = plt.axes(projection=proj)
ax.coastlines()
ax.set_global()
eof1[0, 0].plot.contourf(ax=ax, levels=clevs, cmap=plt.cm.RdBu_r,
                         transform=ccrs.PlateCarree(), add_colorbar=False)
ax.set_title('EOF1 expressed as covariance', fontsize=16)
plt.show()

# ========================================================================
# Extract 1st EOF again and redo the same plot
# ========================================================================
eof1b = solver.eofsAsCovariance(neofs=1)

clevs = np.linspace(-75, 75, 11)
proj = ccrs.Orthographic(central_longitude=-20, central_latitude=60)
ax = plt.axes(projection=proj)
ax.coastlines()
ax.set_global()
eof1b[0, 0].plot.contourf(ax=ax, levels=clevs, cmap=plt.cm.RdBu_r,
                          transform=ccrs.PlateCarree(), add_colorbar=False)
ax.set_title('EOF1 expressed as covariance', fontsize=16)
plt.show()

print(eof1-eof1b)

Result of difference between eof1 and eof1b:

<xarray.DataArray 'eofs' (mode: 1, pressure: 1, latitude: 29, longitude: 49)>
array([[[[ 1.09211228e-01,  9.13365940e-02,  7.00722281e-02, ...,
           1.05908773e-01,  1.25983739e-01,  1.41891558e-01],
         [ 2.34846658e-01,  2.13107734e-01,  1.85683208e-01, ...,
           5.54756315e-02,  9.04768289e-02,  1.16930300e-01],
         [ 4.52193967e-01,  4.28573805e-01,  3.97803594e-01, ...,
          -9.20446292e-02, -4.41052427e-02, -7.67394121e-03],
         ...,
         [-5.45899433e+01, -5.52156445e+01, -5.58994769e+01, ...,
          -6.28602958e+01, -6.25882936e+01, -6.22965706e+01],
         [-8.68933483e+01, -8.72562317e+01, -8.76033326e+01, ...,
          -9.52351039e+01, -9.51051557e+01, -9.50488148e+01],
         [            nan,             nan,             nan, ...,
                      nan,             nan,             nan]]]])
Coordinates:
  * mode       (mode) int64 0
  * pressure   (pressure) float32 500.0
  * latitude   (latitude) float32 20.0 22.5 25.0 27.5 ... 82.5 85.0 87.5 90.0
  * longitude  (longitude) float32 -80.0 -77.5 -75.0 -72.5 ... 35.0 37.5 40.0

Why EOF transforms dims in coordinates

Hi,

I have an xarray Dataset :

>>> inFile
<xarray.Dataset>
Dimensions:                               (time_counter: 6000, x: 182, y: 149)
Coordinates:
  * time_counter                          (time_counter) float64 3.02e+07 ... 1.892e+11
Dimensions without coordinates: x, y
Data variables:
     tos_yearmean                          (time_counter, y, x) float32 ...

I compute the EOFs on tos_yearmean and I get :

>>> solver.eofs()
<xarray.DataArray 'eofs' (mode: 6000, y: 149, x: 182)>
array([[[nan, nan, ..., nan, nan],
           [nan, nan, ..., nan, nan]]], dtype=float32)
Coordinates:
  * mode     (mode) int64 0 1 2 3 4 5 6 7 ... 5993 5994 5995 5996 5997 5998 5999
  * y        (y) int64 0 1 2 3 4 5 6 7 8 ... 140 141 142 143 144 145 146 147 148
  * x        (x) int64 0 1 2 3 4 5 6 7 8 ... 173 174 175 176 177 178 179 180 181
Attributes:
    long_name:  empirical_orthogonal_functions

Dimensions x and y have been transformed into coordinates, with the index starting at 0. This is spurious information. When I use the input file in Ferret, Ferret assumes that x and y start at 1. When I write solver.eofs() to a file and use it in Ferret, the indexing starts at 0, and the mismatch between the two files makes Ferret fail on some operations.

Can I prevent eofs from transforming dimensions into coordinates?

Thanks,

Olivier

Python crashed when running standard example

When trying to run your example code in the "standard" directory, I got a runtime error and no exception was caught. I debugged it with PyCharm and found that it happened on this line:

A, Lh, E = np.linalg.svd(dataNoMissing, full_matrices=False)

I wonder if there is something wrong with the input data dataNoMissing? Or is it caused by the Windows version of numpy?

OS: Windows 8.1 with Update1
Numpy version: 1.8.0

Any possibility to work with ensembles and stacked dimensions?

I would like to compute EOFs on an ensemble (an xarray object with a member dimension) without getting different EOFs for each member. For example, I would like to stack the time and member dimensions and then compute the EOFs over that stacked dimension. Is this possible, or would it be better to stick to the plain sklearn functions?
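
The stacking idea can be sketched with NumPy alone: members and times are merged into a single sample axis, one set of EOFs is computed for the whole ensemble, and the PCs are reshaped back to (member, time, mode). The array sizes and names are made up, and whether the eofs xarray interface accepts a stacked dimension directly is not verified here.

```python
import numpy as np

rng = np.random.default_rng(3)
nmember, ntime, ny, nx = 4, 25, 5, 6
ensemble = rng.standard_normal((nmember, ntime, ny, nx))

# Merge (member, time) into one sample axis and flatten space.
samples = ensemble.reshape(nmember * ntime, ny * nx)
anom = samples - samples.mean(axis=0)

# One SVD gives EOFs shared by all members.
u, s, vt = np.linalg.svd(anom, full_matrices=False)
shared_eofs = vt.reshape(-1, ny, nx)                 # (mode, y, x)
member_pcs = (u * s).reshape(nmember, ntime, -1)     # (member, time, mode)
```

With xarray, the merge corresponds to DataArray.stack over the member and time dimensions, and the reshape of the PCs corresponds to unstack.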

Work with dask arrays

Thanks for providing this amazing package. It is absolutely one of the best and most useful python packages I know of!

Currently eofs only works with numpy arrays. However, its core computational algorithm, svd, is implemented in dask array. http://dask.pydata.org/en/latest/array-api.html

This means that it would theoretically be possible for eofs to leverage dask to do out of core EOFs with minimal refactoring.

Is this on your roadmap? Would be keen to help if you’re interested.

Deploy tags to PyPI automatically

The Travis CI deploy mechanism can be used to automatically build and upload distributions built from tags.

Unable to install eofs via conda

Hi Andrew,

My issue is that I wasn't able to install eofs via conda, due to some "conflicts":

$ conda install -c https://conda.anaconda.org/ajdawson eofs
Using Anaconda Cloud api site https://api.anaconda.org
Fetching package metadata: ......
Solving package specifications: ....

The following specifications were found to be in conflict:
  - bottleneck (target=bottleneck-1.0.0-np110py27_0.tar.bz2) -> numpy 1.10*|1.11*|1.9*
  - bottleneck (target=bottleneck-1.0.0-np110py27_0.tar.bz2) -> python 3.4*|3.5*
  - eofs
  - fontconfig (target=fontconfig-2.11.1-5.tar.bz2) -> freetype 2.5*
  - fontconfig (target=fontconfig-2.11.1-5.tar.bz2) -> libpng 1.6.17
Use "conda info <package>" to see the dependencies for each package.

Could you please give me a hint? For instance, what happens if I uninstall bottleneck, may it help?
Please find below my "conda info" and "conda list" outputs.
Thanks in advance
Carlos

$ conda info
Using Anaconda Cloud api site https://api.anaconda.org
Current conda install:

             platform : linux-32
        conda version : 4.0.5
  conda-build version : 1.20.0
       python version : 2.7.11.final.0
     requests version : 2.9.1
     root environment : /home/carlos/anaconda2  (writable)
  default environment : /home/carlos/anaconda2
     envs directories : /home/carlos/anaconda2/envs
        package cache : /home/carlos/anaconda2/pkgs
         channel URLs : https://repo.continuum.io/pkgs/free/linux-32/
                        https://repo.continuum.io/pkgs/free/noarch/
                        https://repo.continuum.io/pkgs/pro/linux-32/
                        https://repo.continuum.io/pkgs/pro/noarch/
          config file : None
    is foreign system : False

$ conda list
# packages in environment at /home/carlos/anaconda2:
#
abstract-rendering        0.5.1               np110py27_0    defaults
alabaster                 0.7.7                    py27_0    defaults
anaconda-client           1.4.0                    py27_0    defaults
argcomplete               1.0.0                    py27_1    defaults
astropy                   1.1.2               np110py27_0    defaults
babel                     2.3.3                    py27_0    defaults
backports                 1.0                      py27_0    defaults
backports-abc             0.4                       <pip>
backports.ssl-match-hostname 3.4.0.2                   <pip>
backports_abc             0.4                      py27_0    defaults
basemap                   1.0.7               np110py27_0    anaconda
beautifulsoup4            4.4.1                    py27_0    defaults
bitarray                  0.8.1                    py27_0    defaults
blaze                     0.9.0                     <pip>
blaze-core                0.9.0                    py27_0    defaults
bokeh                     0.11.1                   py27_0    defaults
boto                      2.39.0                   py27_0    defaults
bottleneck                1.0.0               np110py27_0    defaults
cairo                     1.12.18                       6    defaults
cdecimal                  2.3                      py27_0    defaults
cffi                      1.5.2                    py27_1    defaults
clyent                    1.2.2                    py27_0    defaults
colorama                  0.3.7                    py27_0    defaults
conda                     4.0.5                    py27_0    defaults
conda-build               1.20.0                   py27_0    defaults
conda-env                 2.4.5                    py27_0    defaults
configobj                 5.0.6                    py27_0    defaults
configparser              3.5.0b2                  py27_1    defaults
cryptography              1.3.1                    py27_0    defaults
curl                      7.45.0                        0    defaults
cycler                    0.10.0                   py27_0    defaults
cython                    0.24                     py27_0    defaults
cytoolz                   0.7.5                    py27_0    defaults
datashape                 0.5.1                    py27_0    defaults
decorator                 4.0.9                    py27_0    defaults
docutils                  0.12                     py27_0    defaults
entrypoints               0.2                      py27_1    defaults
enum34                    1.1.3                    py27_0    defaults
et-xmlfile                1.0.1                     <pip>
et_xmlfile                1.0.1                    py27_0    defaults
fastcache                 1.0.2                    py27_0    defaults
flask                     0.10.1                   py27_1    defaults
fontconfig                2.11.1                        5    defaults
freetype                  2.5.5                         0    defaults
funcsigs                  1.0.0                    py27_0    defaults
futures                   3.0.5                    py27_0    defaults
geos                      3.3.3                         0    anaconda
gevent                    1.1.0                    py27_0    defaults
gevent-websocket          0.9.5                    py27_1    defaults
greenlet                  0.4.9                    py27_0    defaults
grin                      1.2.1                    py27_1    defaults
h5py                      2.6.0               np110py27_1    defaults
hdf5                      1.8.16                        0    defaults
idna                      2.1                      py27_0    defaults
imagesize                 0.7.0                    py27_0    defaults
ipaddress                 1.0.14                   py27_0    defaults
ipykernel                 4.3.1                    py27_0    defaults
ipython                   4.1.2                    py27_1    defaults
ipython-genutils          0.1.0                     <pip>
ipython-notebook          4.0.4                    py27_0    defaults
ipython-qtconsole         4.0.1                    py27_0    defaults
ipython_genutils          0.1.0                    py27_0    defaults
ipywidgets                4.1.1                    py27_0    defaults
itsdangerous              0.24                     py27_0    defaults
jasper                    1.900.1                       3    IOOS
jbig                      2.1                           0    defaults
jdcal                     1.2                      py27_0    defaults
jedi                      0.9.0                    py27_0    defaults
jinja2                    2.8                      py27_0    defaults
jpeg                      8d                            0    defaults
jsonschema                2.4.0                    py27_0    defaults
jupyter                   1.0.0                    py27_2    defaults
jupyter-client            4.2.2                     <pip>
jupyter-console           4.1.1                     <pip>
jupyter-core              4.1.0                     <pip>
jupyter_client            4.2.2                    py27_0    defaults
jupyter_console           4.1.1                    py27_0    defaults
jupyter_core              4.1.0                    py27_0    defaults
libffi                    3.2.1                         0    defaults
libgfortran               3.0                           0    defaults
libpng                    1.6.17                        0    defaults
libsodium                 1.0.3                         0    defaults
libtiff                   4.0.6                         1    defaults
libxml2                   2.9.2                         0    defaults
libxslt                   1.1.28                        0    defaults
llvmlite                  0.10.0                   py27_0    defaults
lxml                      3.6.0                    py27_0    defaults
markupsafe                0.23                     py27_0    defaults
matplotlib                1.5.1               np110py27_0    defaults
mistune                   0.7.2                    py27_0    defaults
mkl                       11.3.1                        0    defaults
mpmath                    0.19                     py27_0    defaults
multipledispatch          0.4.8                    py27_0    defaults
nbconvert                 4.2.0                    py27_0    defaults
nbformat                  4.0.1                    py27_0    defaults
networkx                  1.11                     py27_0    defaults
nltk                      3.2.1                    py27_0    defaults
nose                      1.3.7                    py27_0    defaults
notebook                  4.2.0                    py27_0    defaults
numba                     0.25.0              np110py27_0    defaults
numexpr                   2.5.2               np110py27_0    defaults
numpy                     1.10.4                   py27_1    defaults
odo                       0.4.2                    py27_0    defaults
openblas                  0.2.14                        4    defaults
openpyxl                  2.3.2                    py27_0    defaults
openssl                   1.0.2g                        0    defaults
pandas                    0.18.0              np110py27_0    defaults
patchelf                  0.8                           0    defaults
path.py                   8.2                      py27_0    defaults
patsy                     0.4.1                    py27_0    defaults
pep8                      1.7.0                    py27_0    defaults
pexpect                   4.0.1                    py27_0    defaults
pickleshare               0.5                      py27_0    defaults
pillow                    3.2.0                    py27_0    defaults
pip                       8.1.1                    py27_1    defaults
pixman                    0.32.6                        0    defaults
ply                       3.8                      py27_0    defaults
psutil                    4.1.0                    py27_0    defaults
ptyprocess                0.5                      py27_0    defaults
py                        1.4.31                   py27_0    defaults
py2cairo                  1.10.0                   py27_2    defaults
pyasn1                    0.1.9                    py27_0    defaults
pycairo                   1.10.0                   py27_0    defaults
pycosat                   0.6.1                    py27_0    defaults
pycparser                 2.14                     py27_0    defaults
pycrypto                  2.6.1                    py27_0    defaults
pycurl                    7.19.5.3                 py27_0    defaults
pyflakes                  1.1.0                    py27_0    defaults
pygments                  2.1.3                    py27_0    defaults
pygrib                    2.0.0                     <pip>
pyopenssl                 0.15.1                   py27_2    defaults
pyparsing                 2.1.1                    py27_0    defaults
pyqt                      4.11.4                   py27_1    defaults
pytables                  3.2.2               np110py27_3    defaults
pytest                    2.9.1                    py27_0    defaults
python                    2.7.11                        0    defaults
python-dateutil           2.5.2                    py27_0    defaults
pytz                      2016.3                   py27_0    defaults
pyyaml                    3.11                     py27_1    defaults
pyzmq                     15.2.0                   py27_0    defaults
qt                        4.8.7                         0    defaults
qtconsole                 4.2.1                    py27_0    defaults
readline                  6.2                           2    defaults
requests                  2.9.1                    py27_0    defaults
rope                      0.9.4                    py27_1    defaults
scikit-image              0.12.3              np110py27_0    defaults
scikit-learn              0.17.1              np110py27_0    defaults
scipy                     0.17.0              np110py27_2    defaults
setuptools                20.7.0                   py27_0    defaults
simplegeneric             0.8.1                    py27_0    defaults
singledispatch            3.4.0.3                  py27_0    defaults
sip                       4.16.9                   py27_0    defaults
six                       1.10.0                   py27_0    defaults
snowballstemmer           1.2.1                    py27_0    defaults
sockjs-tornado            1.0.1                    py27_0    defaults
sphinx                    1.4.1                    py27_0    defaults
sphinx-rtd-theme          0.1.9                     <pip>
sphinx_rtd_theme          0.1.9                    py27_0    defaults
spyder                    2.3.8                    py27_1    defaults
spyder-app                2.3.8                    py27_0    defaults
sqlalchemy                1.0.12                   py27_0    defaults
sqlite                    3.9.2                         0    defaults
ssl_match_hostname        3.4.0.2                  py27_1    defaults
statsmodels               0.6.1               np110py27_0    defaults
sympy                     1.0                      py27_0    defaults
tables                    3.2.2                     <pip>
terminado                 0.5                      py27_1    defaults
theano                    0.7.0               np110py27_0    defaults
tk                        8.5.18                        0    defaults
toolz                     0.7.4                    py27_0    defaults
tornado                   4.3                      py27_0    defaults
traitlets                 4.2.1                    py27_0    defaults
ujson                     1.35                     py27_0    defaults
unicodecsv                0.14.1                   py27_0    defaults
util-linux                2.21                          0    defaults
werkzeug                  0.11.8                   py27_0    defaults
wheel                     0.29.0                   py27_0    defaults
xlrd                      0.9.4                    py27_0    defaults
xlsxwriter                0.8.4                    py27_0    defaults
xlwt                      1.0.0                    py27_0    defaults
xz                        5.0.5                         1    defaults
yaml                      0.1.6                         0    defaults
zeromq                    4.1.3                         0    defaults
zlib                      1.2.8                         0    defaults

CDAT

Will eofs be included in the next release of UV-CDAT?

Eof weights bug

Hi there,

I've found that passing weights='coslat' or weights='area' to Eof sometimes reverses the sign of the result (positive to negative or vice versa), which looks like a bug. Oddly, it happens only in some specific cases, and I have no idea why. According to the interface description, the weighting is a square root, so it should not affect the sign, but turning the weighting off brings back the original sign. Do you have any idea what is going on? I have some figures I can show if you want.

I am testing with the version bundled with UV-CDAT and have not had a chance to test the conda version yet.
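
For context, the sign of an EOF/PC pair is mathematically arbitrary: flipping both together leaves the reconstructed data unchanged, so a small change in the input (such as weighting) can plausibly flip the sign a linear-algebra routine returns. The sketch below illustrates this with plain NumPy. It is not the eofs implementation, and the `fix_sign` helper and its convention are hypothetical, shown only as one way to make signs comparable between runs.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy anomaly matrix: 50 time samples, 8 grid points, time mean removed.
X = rng.standard_normal((50, 8))
X -= X.mean(axis=0)

# SVD: rows of Vt are the EOFs, columns of U scaled by s are the PCs.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
eof1, pc1 = Vt[0], U[:, 0] * s[0]

# Flipping both signs gives exactly the same contribution to the data.
same = np.allclose(np.outer(pc1, eof1), np.outer(-pc1, -eof1))

def fix_sign(eof, pc):
    """Hypothetical convention: make the largest-magnitude EOF loading positive."""
    sign = np.sign(eof[np.argmax(np.abs(eof))])
    return eof * sign, pc * sign

eof1_fixed, pc1_fixed = fix_sign(eof1, pc1)
```

Applying the same convention to the weighted and unweighted results would make their signs directly comparable.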

cdms2 in python3 has issue with eofs

CDAT/cdms#294

Potential issue with MKL 2018

Hi @ajdawson ,

I am reporting a potential issue with recent version of MKL.

My code uses eofs, and it used to work well in my environment, but one day it stopped working after I upgraded the environment.

The error came from standard.py, which raised a runtime overflow error or a DLASCL parameter error (like this) at self._L = Lh * Lh / normfactor.

I found that downgrading mkl from 2018 to 2017 resolved the issue, as below:

>> conda install mkl=2017.0.3

Fetching package metadata ...........
Solving package specifications: .

Package plan for installation in environment /export/lee1043/anaconda2/envs/pmp_nightly:

The following packages will be DOWNGRADED:

    mkl:   2018.0.0-hb491cac_4   --> 2017.0.3-0   
    numpy: 1.13.1-py27hd1b6e02_2 --> 1.13.1-py27_0

I'm not sure whether this is an issue for others as well, but it would be great if it could be checked. Sharing this for your interest.

Thanks.

projectField not working on data with non-dimension coordinates in latest xarray version

A few days ago xarray released version 0.19.0, which comes with some deprecations (pydata/xarray#5630) that seem to affect only data with non-dimension coordinates.

Sample code

import xarray as xr
from eofs.xarray import Eof

# Load example data from xarray
data = xr.tutorial.open_dataset("air_temperature").air

# Compute anomaly
anom = data.groupby("time.month") - data.groupby("time.month").mean()

# Create the Eof solver with a subset of the data
solver = Eof(anom.sel(time=slice("2013-01", "2013-12")))

# Project all the data
solver.projectField(anom, neofs=2)

This is the error raised

Traceback (most recent call last):
  File "/data/users/service/index/test.py", line 15, in <module>
    solver.projectField(anom, neofs=2)
  File "/home/service/miniconda3/envs/pangeo/lib/python3.9/site-packages/eofs/xarray.py", line 639, in projectField
    pcs.coords.update({coord.name: (coord.dims, coord)
  File "/home/service/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/core/coordinates.py", line 163, in update
    coords, indexes = merge_coords(
  File "/home/service/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/core/merge.py", line 472, in merge_coords
    collected = collect_variables_and_indexes(aligned)
  File "/home/service/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/core/merge.py", line 294, in collect_variables_and_indexes
    variable = as_variable(variable, name=name)
  File "/home/service/miniconda3/envs/pangeo/lib/python3.9/site-packages/xarray/core/variable.py", line 121, in as_variable
    raise TypeError(
TypeError: Using a DataArray object to construct a variable is ambiguous, please extract the data using the .data property.

Changing the last line to

solver.projectField(anom.drop("month"), neofs=2)

fixes the issue; however, a non-dimension coordinate is lost, in this case the 'month' coordinate that comes from xarray's groupby operation.

Testing the same sample code with the previous xarray version (0.18.2) yields the expected result

<xarray.DataArray 'pseudo_pcs' (time: 2920, mode: 2)>
array([[  50.44886 ,  -78.26509 ],
       [  21.369547,  -98.04355 ],
       [   8.925724, -110.18372 ],
       ...,
       [ -47.0296  , -151.02394 ],
       [ -45.16002 , -128.5353  ],
       [ -27.55614 ,  -93.23076 ]], dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
  * mode     (mode) int64 0 1
    month    (time) int64 1 1 1 1 1 1 1 1 1 1 ... 12 12 12 12 12 12 12 12 12 12
Attributes:
    long_name:  air_pseudo_pcs

Following the suggestion in the error message, changing these lines

eofs/lib/eofs/xarray.py

Lines 638 to 640 in 603ed8e

            # Add non-dimension coordinates.
            pcs.coords.update({coord.name: (coord.dims, coord)
                               for coord in time_ndcoords})

to

            # Add non-dimension coordinates.
            pcs.coords.update({coord.name: (coord.dims, coord.data)
                               for coord in time_ndcoords})

solves the issue with no apparent breaking change. I can send a simple PR if that seems okay.
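
The pattern behind the proposed change can be reproduced outside eofs with a small, self-contained sketch. The arrays and the `month` coordinate here are made up for illustration; only the `(dims, data)` tuple form passed to `coords.update` mirrors the lines quoted above.

```python
import numpy as np
import xarray as xr

time = np.arange(4)
# A non-dimension coordinate that varies along "time", like "month" above.
month = xr.DataArray([1, 1, 2, 2], dims="time", name="month")

# Stand-in for the projected PCs returned by the solver.
pcs = xr.DataArray(np.zeros((4, 2)), dims=("time", "mode"),
                   coords={"time": time, "mode": [0, 1]})

# Pass the raw array (month.data) rather than the DataArray itself,
# so xarray does not have to guess how to build the variable.
pcs.coords.update({month.name: (month.dims, month.data)})
```

After this, `pcs` carries `month` as a non-dimension coordinate along `time`.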

Metadata-aware solver interface to cf-python fields

Hi, we've had a user feature request to our project cf-python for the ability to conduct EOF & rotated EOF analysis with our intrinsic data object, the cf-python field, or more precisely the 'field construct' of the CF data model (see also the cfdm library).

Bryan Lawrence recommended your library as a potential solution for this, notably since it appears to be open for interfacing in a way that allows for management of the underlying metadata, as demonstrated by the Iris interface module. cf-python makes use of numpy arrays under-the-hood but our philosophy & one of our core USPs is to enable data analysis that preserves CF-compliant metadata, so the standard numpy solver interface is not appropriate.

Therefore we were wondering if you'd be happy to include a module for a cf-python solver interface? If so, we'd write it (I've volunteered so I would write most, if not all, of it) & then I suggest we can put it up as a Pull Request for your review. If that sounds agreeable, please let us know any requirements/advice you might have for us to help us to develop it so it fits in as you would like, otherwise we can use the iris module as a guide. Thanks.

Multivariate EOF Variance Discrepancies

Hey, first off, thanks for developing the eofs package; it has helped me out a lot with performing univariate EOFs.

So this may very well be my lack of theoretical understanding, please forgive me if it is.
But when performing MEOFs I wanted to see how the variance explained was distributed among the correspondingly reconstructed fields, and it seems off to me.
I tried comparing with both anomaly fields and also standardised data before plugging them into the solver.

I compared the outputs of the two final commands below, which is done specifically on standardised data so that the SVD's variance fraction and the constructed data's variance should be equal.

# mean = 0, std = 1 for each dataarray
m_solver = MultivariateEof(list_data_arrays, weights=list_wgts)

# These numbers don't compare with standardised data
var_fraction = numpy.sum(m_solver.varianceFraction(neigs=n))

reconstructed_var = 0
for i in range(0, N_vars):
  reconstructed_var += numpy.nanvar(m_solver.reconstructedField(n)[i])
reconstructed_var /= N_vars 

# Should be around 0
var_fraction - reconstructed_var

I have three datasets which I have done this with, comparing in any combination of the three.

If this is the wrong output, could it be the way the MEOF is computed? The few papers I have found on MEOFs stack the datasets vertically, along the time axis, which is the opposite of what the MultivariateEof object does, as far as I can tell.

Regards,
Boooke
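
As a reference point for the comparison above, here is a minimal NumPy sketch (not the eofs code) in which the two numbers do agree. It assumes no weights, fields standardised at every grid point, the same number of grid points in each field, and fields joined along the space axis as the post describes. Unequal grid sizes or non-uniform weights are possible sources of a mismatch in a real case, but that is an assumption, not a diagnosis.

```python
import numpy as np

rng = np.random.default_rng(1)
T, P, n = 200, 30, 3  # samples, grid points per field, retained modes

def standardise(a):
    # Zero mean and unit variance at every grid point (column).
    return (a - a.mean(axis=0)) / a.std(axis=0)

# Two toy fields that share some variability through a common signal.
common = rng.standard_normal((T, 1))
fields = [standardise(common + rng.standard_normal((T, P))) for _ in range(2)]

# Join the fields along the space axis, keeping time as rows.
X = np.concatenate(fields, axis=1)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Fraction of total variance carried by the first n modes.
var_fraction = np.sum(s[:n] ** 2) / np.sum(s ** 2)

# Rank-n reconstruction, split back into the original fields.
recon = (U[:, :n] * s[:n]) @ Vt[:n]
recon_fields = np.split(recon, 2, axis=1)
reconstructed_var = np.mean([np.var(r) for r in recon_fields])
```

Under these assumptions `var_fraction - reconstructed_var` is zero to rounding error, which gives a baseline to compare the solver output against.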

How to obtain the measure of the percent variance explained by each mode?

e.g. Mode 1 represents 45%, mode 2 represents 13% and so on.

Thanks!
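
The solvers expose a varianceFraction method (it is called in the multivariate post above), which appears to be the intended route. The quantity itself is the squared singular value of each mode divided by the sum over all modes. A minimal NumPy sketch on made-up data, independent of eofs:

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy anomalies: 100 time samples by 12 grid points, time mean removed.
X = rng.standard_normal((100, 12)) * np.linspace(3.0, 0.5, 12)
X -= X.mean(axis=0)

# Squared singular values are proportional to the variance of each mode.
s = np.linalg.svd(X, compute_uv=False)
percent = 100.0 * s ** 2 / np.sum(s ** 2)

# Report the leading three modes.
for mode, p in enumerate(percent[:3], start=1):
    print(f"Mode {mode}: {p:.1f}%")
```

The percentages are sorted from largest to smallest and sum to 100 over all modes.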

Better build matrix

This could be improved further. The main issue is the definition of "complete", which depends on what is available (xarray everywhere, iris on 2.7 and 3.4, cdms2 on 2.7 only).

Keep multiple revisions of the documentation

Re-organise the deployed documentation (gh-pages branch) so that documentation for older versions can be preserved.

Test failures with numpy 1.10

Getting some errors and failures with numpy 1.10 on OS X. Most appear to relate to getting MaskedArrays when expecting plain arrays and vice versa.

Error in "eofsAsCorrelation" 'numpy.ndarray' object has no attribute 'filled'

Hi, I've just found the following error after trying your iris example using my own cube. I'm using numpy version '1.10.1'

AttributeError Traceback (most recent call last)
/home/scott/Copy/WORK/WIP/ASL_SOM_vs_EOF/EOF/asl_eof.py in ()
18 # PC time series and the input SST anomalies at each grid point, and the
19 # leading PC time series itself.
---> 20 eof1 = solver.eofsAsCorrelation(neofs=1)
21 pc1 = solver.pcs(npcs=1, pcscaling=1)
22

/home/scott/PYTHON/eofs/iris.pyc in eofsAsCorrelation(self, neofs)
288
289 """
--> 290 eofs = self._solver.eofsAsCorrelation(neofs)
291 eofdim = DimCoord(range(eofs.shape[0]),
292 var_name='eof',

/home/scott/PYTHON/eofs/standard.pyc in eofsAsCorrelation(self, neofs)
364 # numpy array filled with numpy.nan.
365 if not self._filled:
--> 366 c = c.filled(fill_value=np.nan)
367 return c
368

AttributeError: 'numpy.ndarray' object has no attribute 'filled'
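
The traceback suggests the code assumed a MaskedArray where numpy 1.10 returned a plain ndarray. One defensive pattern, sketched here with a hypothetical `fill_masked` helper (this is not the fix applied in eofs), is to use np.ma.filled, which accepts both kinds of input:

```python
import numpy as np

def fill_masked(c, fill_value=np.nan):
    """Hypothetical helper: return a plain ndarray whether or not c is masked."""
    # np.ma.filled returns a non-masked ndarray input unchanged.
    return np.ma.filled(c, fill_value)

plain = np.array([0.2, -0.7, 1.0])
masked = np.ma.masked_invalid([0.2, np.nan, 1.0])

out_plain = fill_masked(plain)
out_masked = fill_masked(masked)
```

Both calls return ordinary arrays, with the masked element replaced by NaN in the second case.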

issue when using xarray + dask

Hi there,

I'm trying to use the eofs package. It seems to work OK when I use numpy arrays or when I only use xarray, but I can't get it to work with xarray + dask.

I've reduced my dataset into something very small.

Here are 3 example notebooks...

@ScottWales, am I doing something wrong here? I also tried chunking with .chunk({'time': 1}), but I still had the same issue...

EOF using complex values

I would like to use your package to look at wind fields by doing EOF analysis on complex-valued arrays. In a simple test using a 3x3x3 real-valued array and the same array cast to complex, the first two EOFs are the same but the third is different. I am guessing the third EOF for this simple problem is just noise anyway, but I was curious whether you had done any analysis with complex values.
Thanks

Support for ensemble data

Hi there, I've just got started using the eofs package for analysis of some forecast model data. I really appreciate the functionality of the package, and it has saved me a lot of time already.

The model I'm working with is an ensemble based system, so ideally for the purposes of the analysis I'd like to treat each individual ensemble as an extra set of samples on the time dimension. (e.g. if I have 12 ensembles over 100 time points, on an x by y lat long grid, I end up computing eofs of 1200 time points on my x, y grid)
My data is stored in iris cubes.
Are there any plans to add support for this? If not I'm happy to try adding it to the source myself if I can find the time. Any suggestions on the best approach would be appreciated.
Cheers,
Tom
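
One workaround that needs no change to eofs is to merge the ensemble and time axes before building the solver. The sketch below uses plain NumPy on a made-up array, assuming the cube data has been extracted with dimensions ordered (member, time, lat, lon). Removing each member's own time mean is one possible choice of anomaly, not the only one.

```python
import numpy as np

rng = np.random.default_rng(3)
n_members, n_times, n_lat, n_lon = 12, 100, 4, 5
# Toy ensemble with dimensions (member, time, lat, lon).
ens = rng.standard_normal((n_members, n_times, n_lat, n_lon))

# Remove each member's own time mean (an assumed choice of anomaly).
anoms = ens - ens.mean(axis=1, keepdims=True)

# Merge member and time into a single sample axis of length 12 * 100.
samples = anoms.reshape(n_members * n_times, n_lat, n_lon)
```

The `samples` array has the sample axis first and could then be handed to the standard numpy solver interface. The merged axis is no longer a real time coordinate, so the cube metadata would need to be reattached by hand.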

Feature Request: MultivariateEof for xarray objects

Is there a reason MultivariateEof has not been implemented yet for xarray objects? If not, I'd be happy to contribute the feature if you can give some pointers on how best that would be done.
