Comments (10)
@coastalmodeler thank you for your input. In order to help you I'll need to be able to reproduce your error. Please provide the following:
- OS
- Python version
- pyextremes version
- numpy version
- pandas version
In addition to that I'll need a complete code snippet which can be run as is. For example:
import pandas as pd
import pyextremes
data = pd.read_csv("data.csv")
model = pyextremes.EVA(data)
model.get_extremes()
And provide a link to your data.csv. You can also make a GitHub gist with a Jupyter notebook if that's what you prefer.
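To gather the version details listed above in one go, something like the following could be pasted into a session. This is only a sketch: the helper name `collect_versions` is made up for illustration, and any package that isn't installed is simply reported as "not installed".

```python
import platform
import sys
from importlib import metadata


def collect_versions(packages=("pyextremes", "numpy", "pandas")):
    """Return OS, Python, and package versions as a dict of strings."""
    info = {
        "OS": platform.platform(),
        "Python": sys.version.split()[0],
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of failing.
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"- {key}: {value}")
```

The printed lines can be copied straight into a comment.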
from pyextremes.
More information. I understand the error comes out of pandas, not your code directly. Just for information, my data looks like:
model = EVA(pd.Series(durations, np.sort(dates)))
print(dates, dates.dtype)
print(durations, durations.dtype)
['2002-01-01T16:00:00.000000000' '2002-01-01T20:00:00.000000000'
'2002-01-02T03:00:00.000000000' ... '2004-10-02T23:00:00.000000000'
'2004-10-03T07:00:00.000000000' '2004-10-03T13:00:00.000000000'] datetime64[ns]
[20. 18. 7. ... 4. 5. 7.] float64
Full stack trace:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [115], in <module>
----> 1 model.get_extremes(method="BM", block_size="10D")
2 model.plot_extremes()
File ~/.local/lib/python3.9/site-packages/pyextremes/eva.py:452, in EVA.get_extremes(self, method, extremes_type, **kwargs)
450 message = f"for method='{method}' and extremes_type='{extremes_type}'"
451 logger.debug("extracting extreme values %s", message)
--> 452 self.__extremes = get_extremes(
453 method=method,
454 ts=self.data,
455 extremes_type=extremes_type,
456 **kwargs,
457 )
458 self.__extremes_method = method
459 self.__extremes_type = extremes_type
File ~/.local/lib/python3.9/site-packages/pyextremes/extremes/extremes.py:59, in get_extremes(ts, method, extremes_type, **kwargs)
13 """
14 Get extreme events from time series.
15
(...)
56
57 """
58 if method == "BM":
---> 59 return get_extremes_block_maxima(
60 ts=ts,
61 extremes_type=extremes_type,
62 **kwargs,
63 )
64 if method == "POT":
65 return get_extremes_peaks_over_threshold(
66 ts=ts,
67 extremes_type=extremes_type,
68 **kwargs,
69 )
File ~/.local/lib/python3.9/site-packages/pyextremes/extremes/block_maxima.py:148, in get_extremes_block_maxima(ts, extremes_type, block_size, errors, min_last_block)
137 warnings.warn(
138 message=f"{empty_intervals} blocks contained no data",
139 category=NoDataBlockWarning,
140 )
142 logger.debug(
143 "successfully collected %d extreme events, found %s no-data blocks",
144 len(extreme_values),
145 empty_intervals,
146 )
--> 148 return pd.Series(
149 data=extreme_values,
150 index=pd.Index(data=extreme_indices, name=ts.index.name or "date-time"),
151 dtype=np.float64,
152 name=ts.name or "extreme values",
153 ).fillna(np.nanmean(extreme_values))
File ~/.local/lib/python3.9/site-packages/pandas/core/series.py:439, in Series.__init__(self, data, index, dtype, name, copy, fastpath)
437 data = data.copy()
438 else:
--> 439 data = sanitize_array(data, index, dtype, copy)
441 manager = get_option("mode.data_manager")
442 if manager == "block":
File ~/.local/lib/python3.9/site-packages/pandas/core/construction.py:570, in sanitize_array(data, index, dtype, copy, raise_cast_failure, allow_2d)
567 data = list(data)
569 if dtype is not None or len(data) == 0:
--> 570 subarr = _try_cast(data, dtype, copy, raise_cast_failure)
571 else:
572 subarr = maybe_convert_platform(data)
File ~/.local/lib/python3.9/site-packages/pandas/core/construction.py:760, in _try_cast(arr, dtype, copy, raise_cast_failure)
755 subarr = maybe_cast_to_integer_array(arr, dtype)
756 else:
757 # 4 tests fail if we move this to a try/except/else; see
758 # test_constructor_compound_dtypes, test_constructor_cast_failure
759 # test_constructor_dict_cast2, test_loc_setitem_dtype
--> 760 subarr = np.array(arr, dtype=dtype, copy=copy)
762 except (ValueError, TypeError):
763 if raise_cast_failure:
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (731,) + inhomogeneous part.
After cleaning my data, that is, making sure all dates are represented and "non-existing" data is set to zero, the program runs. But this makes me nervous: filling the gaps with zeros amounts to treating zero as an actual measurement when it isn't one.
@wiz21b can you share your data so that I can reproduce your error? This is not meant to happen because EVA pre-processes the data during initialization - this may be a scenario I didn't account for.
Also, the data doesn't have to be at regular intervals.
What was the solution? I'm having the same issue.
@coastalmodeler I have never heard back from @wiz21b so I don't know if the issue is resolved. I can reopen this issue for you if you post details about your error.
Cool, thanks. I'm getting the same error as wiz21b when I run the get_extremes command. I also get the error when I try to execute some of the plot functions and POT functions. I haven't been able to figure out why.
I'm following the tutorial case exactly, but with data from a different NOAA station, downloaded into a dataframe with the noaa_coops package. See code below:
tide_gauge=noaa_coops.Station(8775237)
#https://api.tidesandcurrents.noaa.gov/api/prod/#products
df_water_levels=tide_gauge.get_data(
begin_date="20040406",
end_date="20220925",
product="water_level",
datum="NAVD",
units="english",
time_zone="LST")
I then normalize the dataset by adjusting for RSLR:
measured_rslr=5.54*0.00328084 #ft/yr
df_water_levels_corrected=df_water_levels['water_level'].copy().sort_index(ascending=True).astype(float).dropna()
df_water_levels_corrected=df_water_levels_corrected-(df_water_levels_corrected.index.array-pd.to_datetime("1992"))/pd.to_timedelta("365.2425D")*measured_rslr
am=pyextremes.EVA(df_water_levels_corrected)
Everything works up until this point and here is the command that results in the error:
am.get_extremes(method="BM", block_size="365.2425D",errors="ignore")
am.get_extremes(method="BM", block_size="365.2425D",errors="ignore")
C:\Users: NoDataBlockWarning: 1 blocks contained no data
warnings.warn(
Traceback (most recent call last):
Input In [94] in <cell line: 1>
am.get_extremes(method="BM", block_size="365.2425D",errors="ignore")
File ~.conda\envs\work\lib\site-packages\pyextremes\eva.py:452 in get_extremes
self.__extremes = get_extremes(
File ~.conda\envs\work\lib\site-packages\pyextremes\extremes\extremes.py:59 in get_extremes
return get_extremes_block_maxima(
File ~.conda\envs\work\lib\site-packages\pyextremes\extremes\block_maxima.py:148 in get_extremes_block_maxima
return pd.Series(
File ~.conda\envs\work\lib\site-packages\pandas\core\series.py:451 in __init__
data = sanitize_array(data, index, dtype, copy)
File ~.conda\envs\work\lib\site-packages\pandas\core\construction.py:594 in sanitize_array
subarr = _try_cast(data, dtype, copy, raise_cast_failure)
File ~.conda\envs\work\lib\site-packages\pandas\core\construction.py:784 in _try_cast
subarr = np.array(arr, dtype=dtype, copy=copy)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (18,) + inhomogeneous part.
Thank you for your quick response. See info and code below.
- OS: Microsoft Windows 10
- Python Version: 3.8.13
- pyextremes version: 2.2.4
- numpy version: 1.21.5
- pandas version: 1.4.3
- noaa_coops version: 0.1.9
Here's the code. Note that there is no CSV file: I'm using noaa-coops to download the data directly into Python from the API. The noaa-coops wrapper can be found here: https://pypi.org/project/noaa-coops/
import noaa_coops as nc
import pyextremes
import numpy as np
import pandas as pd
tide_gauge=nc.Station(8775237)
#https://api.tidesandcurrents.noaa.gov/api/prod/#products
df_water_levels=tide_gauge.get_data(
begin_date="20040406",
end_date="20220925",
product="water_level",
datum="NAVD",
units="english",
time_zone="LST")
measured_rslr=5.54*0.00328084
df_water_levels_corrected=df_water_levels['water_level'].copy().sort_index(ascending=True).astype(float).dropna()
df_water_levels_corrected=df_water_levels_corrected-(df_water_levels_corrected.index.array-pd.to_datetime("1992"))/pd.to_timedelta("365.2425D")*measured_rslr
am=pyextremes.EVA(df_water_levels_corrected)
am.get_extremes(method="BM",errors="ignore")
I think I found the issue. There were duplicate time steps in the NOAA dataset. Once I removed those the code works as intended. I'd guess that was the same problem @wiz21b was having. Thank you for your time.
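For anyone landing here with the same error, a minimal sketch of checking for and dropping duplicate time steps with plain pandas is below. The toy timestamps and values are made up for illustration, and keeping the first reading per timestamp is just one possible choice.

```python
import pandas as pd

# Toy series with one repeated timestamp (values are made up).
index = pd.to_datetime(
    ["2004-04-06 00:00", "2004-04-06 00:06", "2004-04-06 00:06", "2004-04-06 00:12"]
)
series = pd.Series([1.0, 1.2, 1.3, 1.1], index=index, name="water_level")

# Count timestamps that repeat an earlier one.
n_duplicates = int(series.index.duplicated().sum())

# Keep the first reading for each timestamp; keep="last" or a groupby
# aggregate (e.g. max) are alternatives depending on the data.
deduplicated = series[~series.index.duplicated(keep="first")]
```

The deduplicated series can then be passed to pyextremes.EVA in place of the original one.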
@coastalmodeler thank you for posting your solution here. It was an issue with the EVA class not removing duplicates - I have included a fix in the latest release.
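As a rough illustration of why duplicate timestamps can lead to the "inhomogeneous shape" ValueError, the sketch below uses only pandas and numpy. It assumes the block maximum is looked up by its timestamp label, which is an assumption about the internals rather than something confirmed in this thread; the toy data is made up.

```python
import numpy as np
import pandas as pd

# Toy series where the maximum falls on a duplicated timestamp.
index = pd.to_datetime(["2004-04-06", "2004-04-07", "2004-04-07"])
series = pd.Series([1.0, 3.0, 3.0], index=index)

peak_label = series.idxmax()        # timestamp of the maximum
selected = series.loc[peak_label]   # a Series with two rows, not a float

# Mixing a scalar with that two-row Series cannot form a 1-D float array.
try:
    np.array([1.0, selected], dtype=np.float64)
    raised = False
except ValueError:
    raised = True

# With duplicates dropped, the same lookup returns a single value.
unique_selected = series[~series.index.duplicated()].loc[peak_label]
```

With a unique index the label lookup yields a scalar, so a list of per-block maxima converts to a float array without trouble.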