pastas / pastastore Goto Github PK

View Code? Open in Web Editor NEW

15.0 9.0 4.0 6.14 MB

:spaghetti: :convenience_store: Tools for managing timeseries and Pastas models

Home Page: https://pastastore.readthedocs.io

License: MIT License

Python 100.00%

timeseries analysis database pandas data-management groundwater hydrology pastas arctic python

pastastore's Introduction

Pastas: Analysis of Groundwater Time Series

Important

As of Pastas 1.5, noisemodels are not added to the Pastas models by default anymore. Read more about this change here.

Pastas: what is it?

Pastas is an open source python package for processing, simulating and analyzing groundwater time series. The object oriented structure allows for the quick implementation of new model components. Time series models can be created, calibrated, and analysed with just a few lines of python code with the built-in optimization, visualisation, and statistical analysis tools.

Documentation & Examples

Documentation is provided on the dedicated website pastas.dev
Examples can be found on the examples directory on the documentation website
View and edit a working example notebook of a Pastas model in MyBinder
A list of publications that use Pastas is available in a dedicated Zotero group

Get in Touch

Questions on Pastas can be asked and answered on Github Discussions.
Bugs, feature requests and other improvements can be posted as Github Issues.
Pull requests will only be accepted on the development branch (dev) of this repository. Please take a look at the developers section on the documentation website for more information on how to contribute to Pastas.

Quick installation guide

To install Pastas, a working version of Python 3.9, 3.10, 3.11, or 3.12 has to be installed on your computer. We recommend using the Anaconda Distribution as it includes most of the python package dependencies and the Jupyter Notebook software to run the notebooks. However, you are free to install any Python distribution you want.

Stable version

To get the latest stable version, use:

pip install pastas

Update

To update pastas, use:

pip install pastas --upgrade

Developers

To get the latest development version, use:

pip install git+https://github.com/pastas/pastas.git@dev#egg=pastas

Related packages

Pastastore is a Python package for managing multiple timeseries and pastas models
Metran is a Python package to perform multivariate timeseries analysis using a technique called dynamic factor modelling.
Hydropandas can be used to obtain Dutch timeseries (KNMI, Dinoloket, ..)
PyEt can be used to compute potential evaporation from meteorological variables.

Dependencies

Pastas depends on a number of Python packages, of which all of the necessary are automatically installed when using the pip install manager. To summarize, the dependencies necessary for a minimal function installation of Pastas

numpy>=1.7
matplotlib>=3.1
pandas>=1.1
scipy>=1.8
numba>=0.51

To install the most important optional dependencies (solver LmFit and function visualisation Latexify) at the same time with Pastas use:

pip install pastas[full]

or for the development version use:

pip install git+https://github.com/pastas/pastas.git@dev#egg=pastas[full]

How to Cite Pastas?

If you use Pastas in one of your studies, please cite the Pastas article in Groundwater:

Collenteur, R.A., Bakker, M., Caljé, R., Klop, S.A., Schaars, F. (2019) Pastas: open source software for the analysis of groundwater time series. Groundwater. doi: 10.1111/gwat.12925.

To cite a specific version of Pastas, you can use the DOI provided for each official release (>0.9.7) through Zenodo. Click on the link to get a specific version and DOI, depending on the Pastas version.

Collenteur, R., Bakker, M., Caljé, R. & Schaars, F. (XXXX). Pastas: open-source software for time series analysis in hydrology (Version X.X.X). Zenodo. http://doi.org/10.5281/zenodo.1465866

pastastore's People

Contributors

Stargazers

Watchers

Forkers

tomvansteijn webclinic017 mattbrst surajitdb

pastastore's Issues

Storing oseries to models relationship

The latest PR #49 adds functionality to keep track of models per oseries. This is useful to keep track of, e.g. for getting a list of models for a certain location. The downside of the current implementation is that it requires a run through all stored models to build this dictionary, which can take a few seconds when creating a Connector object linking to an existing database.

This issue is a reminder to maybe think about a faster more efficient way to keep track of this, i.e. store this relationship in a separate library that is updated with each add_model() and del_model() call. This avoids having to rebuild this dictionary each time you connect to the database. Or perhaps another solution...?

Pastastore sets log-level of Pastas to Error

Not sure if this is intended behavior, but Paststore sets the log-level to Error, on importing pastastore (in yaml_interface.py).

Maybe it is better to remove these two logger-lines from yaml_interface.py:

ps.logger.setLevel("ERROR")

logging.basicConfig(level="INFO")

What do you think @dbrakenhoff ?

make Connector methods directly available in PastaStore

Suggestion to make the read/write/delete methods directly available under the PastaStore object.

So the following code

ml = pstore.conn.get_models("my_little_model")

would become:

ml = pstore.get_models("my_little_model")

This can be done by registering all the (non_internal) methods from the Connector when initializing a PastaStore object, e.g:

def __init__(self, ..., conn):
    ...
    self.get_models = conn.get_models

I kind of liked having a split between the read/write/delete methods and the other logic, but I think typing laziness will win the day, so I will add this in a next release.

Add ArcticDBConnector

Add a new ArcticDB connector to work with the new version of Arctic: https://docs.arcticdb.io/
The API looks similar so it should be fairly simple to implement based on the existing class.

Advantages:

No longer requires MongoDB instance, only the python package install is necessary, if I'm reading everything correctly
Probably better maintained and more modern setup (perhaps we can test on GH actions again).

Keep the old one for backwards compatibility and old(er) projects.

[Enhancement] Force Data type of oseries index to strings

Finally got to move some projects to Pastastore, works very well two far! Small issue here:

I have time series with numbers as series name. Although I force these numbers to be strings, this may still cause failure at a later stage (e.g., store.get_statistics(["evp"]).

It would be nicety simply force the index of store.oseries to strings, to prevent any issues.

Arctic not yet supporting pandas 1.0

Arctic contains at least one reference to a function that was deprecated in pandas 1.0.
The fix is really simple. Just change

https://github.com/man-group/arctic/blob/684bc8c706e80bd4d5763dca97e37130b4b859e0/arctic/serialization/numpy_records.py#L256

into

return Series(data=data, index=index, name=name)

After changing this locally I was able to use pastastore with pandas 1.0. But do be careful because there might be other issues lurking.

The PR to fix this has been submitted but still has to be accepted: man-group/arctic#841 (comment)

PastaStore and Connector names and logical defaults

The PastaStore object now takes a name, which is not something that's really used, so we could default to the connector name and optionally let users pass their own.

The Connectors now take a name and some take a path/connection string. This is not implemented consistently, and also doesn´t make sense for all connectors.

ArcticConnector requires a unique name to create a library with that name.
PystoreConnector I think also requires a name to create a database in a certain folder.
PasConnector requires a path but does not use the name, it would probably make more sense to pass a directory in which a folder with the provided name is created, more in line with the other two connectors.
DictConnector takes a name argument for consistency reasons but my proposal would be to set a default name.

I'm not sure how to make these changes without breaking some old code (for PasConnector), but I think would be good to implement sooner rather than later.

add pastas.timeseries.TimeSeries object to a PastaStore using PastaStore.add_oseries()

I think it would be nice to modify PastaStore.add_oseries() in such a way that the series argument can be a pastas.timeseries.TimeSeries object. Now only pandas DataFrame and Series are supported. It can save you some code, for example when adding a Dino csv file to the store.

Now:

series = ps.read_dino(fname)
store.add_oseries(series.series_original, name=series.name, metadata=series.metadata)

Proposed change:

series = ps.read_dino(fname)
store.add_oseries(series)

What are your thoughts on this?

PasConnector crashes when reading timeseries if no metadata is provided

Should be an easy fix, by testing if metadata file exists.

problems with float to int conversion when reading pastastore from zip

I have a stress consisting of integers. To avoid any pastas validation errors I convert the stress from integers to floats before I add it to the pastastore. I save the pastastore to a .zip file and then I get an error when I try to read it from the .zip. When the stress is read from the zip file it is converted back from floats to integers which results in an error when validating the series in pastas.

to reproduce:

import pastastore as pst
import pandas as pd
df = pd.DataFrame(np.random.randint(0, 100, 100), index=pd.date_range('2000-1-1' ,periods=100))
# create PastaStore instance
pstore = pst.PastaStore(pst.DictConnector())

pstore.add_stress(df.astype(float), "evap", kind="evap", metadata={"x": 100_000, "y": 400_000})


pstore.to_zip('pstore.zip')
pstore_in = pst.PastaStore.from_zip("pstore.zip", conn=pst.DictConnector())

Make name of recharge stressmodel an optional parameter in add_recharge

Can we make the name of the rechange stressmodel an optional parameter in add_recharge? This name is hardcoded as 'recharge' right now. Or is this name used in pastastore in other places as well?

Saving and reloading models with series set to False

Describe the bug
I'm unsure whether this is a bug or a misunderstanding from my part. I'm trying to save a calibrated Pastas model so that I can re-use it later in an (semi-)operational setting. Given it is sort of an operational setting it would be nice to have the saved model and saving / loading times as small as possible. Therefore, when saving the model I put the series argument to False which reduces the size of the model. But when I try to load the model I get an error that he can't find the argument series

To Reproduce

# Save any model
ml.to_file('anymodel.pas', series=False)
loaded_model = ps.io.load('anymodel.pas')

Expected behavior
It would be nice if I could reload the model so that I can use it to forecast groundwater levels on a new period (outside of the calibration range)

Screenshots
This is the error message I'm getting:

oseries = ps.TimeSeries(**data["oseries"])
TypeError: __init__() missing 1 required positional argument: 'series'

Python package version
Python version: 3.8.12
Numpy version: 1.20.3
Scipy version: 1.8.0
Pandas version: 1.4.1
Pastas version: 0.19.0
Matplotlib version: 3.5.1

pandas.Series and PystoreConnector

Writing pandas.Series to PystoreConnector might not work: ranaroussi/pystore#39 (comment)

Fix should be easy (internal conversion to pandas.DataFrame), but waiting for answer to my question first.

option to return series and metadata simultaneously

i.e.

o, meta = store.get_oseries("oseries", return_metadata=True)

Think about how to return when list of names is passed. Probably:

{
  "oseries1": [o1, meta1],
  "oseries2": [o2, meta2]
}

KeyError on step trend

KeyError: 'step0_A'

In function:
pst.util.frontiers_checks(pstore, modelnames)

Same issue with linear-trend.

Please skip the check on step/linear trend (and possible others) with a warning or implement a check,

Kind regards,
Marco

Some error-code:

a= pst.util.frontiers_checks(pstore,
modelnames=["".join(code)+name],
check1_rsq = True,
check1_threshold = 0.7,
check2_autocor = True,
check3_tmem = True,
check4_gain = True)
Running model diagnostics: 0%| | 0/1 [00:00<?, ?it/s]C:\Users\mvanb\anaconda3\envs\Evides_TRA_Bwal\lib\site-packages\pastastore\util.py:766: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
C:\Users\mvanb\anaconda3\envs\Evides_TRA_Bwal\lib\site-packages\pastastore\util.py:747: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
C:\Users\mvanb\anaconda3\envs\Evides_TRA_Bwal\lib\site-packages\pastastore\util.py:747: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
C:\Users\mvanb\anaconda3\envs\Evides_TRA_Bwal\lib\site-packages\pastastore\util.py:747: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
C:\Users\mvanb\anaconda3\envs\Evides_TRA_Bwal\lib\site-packages\pastastore\util.py:747: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.

Traceback (most recent call last):

File ~\anaconda3\envs\Evides_TRA_Bwal\lib\site-packages\pandas\core\indexes\base.py:3802 in get_loc
return self._engine.get_loc(casted_key)

File pandas_libs\index.pyx:138 in pandas._libs.index.IndexEngine.get_loc

File pandas_libs\index.pyx:165 in pandas._libs.index.IndexEngine.get_loc

File pandas_libs\hashtable_class_helper.pxi:5745 in pandas._libs.hashtable.PyObjectHashTable.get_item

File pandas_libs\hashtable_class_helper.pxi:5753 in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'step0_A'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

Cell In[516], line 1
a= pst.util.frontiers_checks(pstore,

File ~\anaconda3\envs\Evides_TRA_Bwal\lib\site-packages\pastastore\util.py:756 in frontiers_checks
gain = ml.parameters.loc[f"{sm_name}_A", "optimal"]

File ~\anaconda3\envs\Evides_TRA_Bwal\lib\site-packages\pandas\core\indexing.py:1066 in getitem
return self.obj._get_value(*key, takeable=self._takeable)

File ~\anaconda3\envs\Evides_TRA_Bwal\lib\site-packages\pandas\core\frame.py:3924 in _get_value
row = self.index.get_loc(index)

File ~\anaconda3\envs\Evides_TRA_Bwal\lib\site-packages\pandas\core\indexes\base.py:3804 in get_loc
raise KeyError(key) from err

KeyError: 'step0_A'

Error when running `solve_models()` on a pastastore

When I run pstore.solve_models() I get this error:

Solving models: 0it [00:00, ?it/s]INFO:pastas.timeseries:Nan-values were removed at the end of the time series B58A0092-004.
INFO:pastas.timeseries:Cannot determine frequency of series B58A0092-004: freq=None. The time series is irregular.
INFO:pastas.timeseries:Time Series B58A0092-004: 13 nan-value(s) was/were found and filled with: drop.
INFO:pastas.timeseries:User provided frequency for time series RH_ELL: freq=D
INFO:pastas.timeseries:User provided frequency for time series EV24_ELL: freq=D
Solving models: 0it [00:00, ?it/s]
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-15-3c39fc9ad072> in <module>
----> 1 pstore.solve_models()

~\anaconda3\envs\dev\lib\site-packages\pastastore\store.py in solve_models(self, mls, report, ignore_solve_errors, store_result, progressbar, **kwargs)
    619         desc = "Solving models"
    620         for ml_name in (tqdm(mls, desc=desc) if progressbar else mls):
--> 621             ml = self.conn.get_models(ml_name)
    622 
    623             m_kwargs = {}

~\anaconda3\envs\dev\lib\site-packages\pastastore\base.py in get_models(self, names, return_dict, progressbar, squeeze, update_ts_settings)
    737         """
    738         models = []
--> 739         names = self._parse_names(names, libname="models")
    740         desc = "Get models"
    741         for n in (tqdm(names, desc=desc) if progressbar else names):

~\anaconda3\envs\dev\lib\site-packages\pastastore\base.py in _parse_names(self, names, libname)
    935                 raise ValueError(f"No library '{libname}'!")
    936         else:
--> 937             raise NotImplementedError(f"Cannot parse 'names': {names}")
    938 
    939     @ staticmethod

NotImplementedError: Cannot parse 'names': Model(oseries=B58A0092-004, name=B58A0092-004, constant=True, noisemodel=True)

I think the error occurs since version 0.7.0 because pstore.conn.models returns a pastastore.base.ModelAccessor now instead of a list.

Improve readme usage section

Currently contains too much technical info about Connectors. Should contain something along these lines:

Gettings started.
Define Connector [link to extra info in docs] and PastaStore:

conn = pst.PasConnector("test", "./pastas_db")
pstore = pst.PastaStore(conn.name, conn)

Add head observation time series:

o = pd.read_csv("obs.csv", index_col=[0])
meta = {"x": 100, "y": 200}
pstore.add_oseries(o, "obs1", metadata=meta)
# view oseries metadata with
pstore.oseries

Add stresses time series:

p = pd.read_csv("prec.csv", index_col=[0]
e = pd.read_csv("evap.csv", index_col=[0]
pstore.add_stress(p, "prec1", kind="prec", metadata={"x":110, "y":195}
pstore.add_stress(e, "evap1", kind="evap", metadata={"x":110, "y":195}
# view stresses metadata
pstore.stresses

Create time series models

ml = pstore.create_model("obs1", add_recharge=True)
ml.solve()
pstore.add_model(ml)

Load model from PastaStore:

ml2 = pstore.get_models("obs1")

Drop official arctic and pystore support

Arctic is a fantastic tool, but the Python package is not sufficiently actively maintained. At least not actively enough for it to keep up with pandas/numpy releases and supported versions. And that's getting annoying.

Additionally, I haven't used Pystore in ages, and it also adds quite some dependency complexity for IMO minimal gain (PasConnector is easier, and ArcticConnector is faster). I also don't know what the maintenance status of that package is.

I think I will remove arctic and pystore support from testing, which will simplify installation and speed up testing. I will keep optional dependencies defined so users can install requisite packages with

pip install pastastore[arctic]
# or
pip install pastastore[pystore]

I doubt the first install statement will work however because of the new version of pip and deprecation of setup.py installs. The docs will be updated to explain how to install arctic or pystore manually, but probably on a separate page.

It might also be time to find out whether there are some better alternatives with similar speed/compression but better maintenance. Some potential ideas:

redis-py: https://github.com/redis/redis-py/blob/master/tests/test_timeseries.py, https://redis-py.readthedocs.io/en/stable/redismodules.html#redistimeseries-commands
pymongo (use mongoDB directly without Arctic): https://www.mongodb.com/docs/manual/core/timeseries-collections/
timescaleDB: https://docs.timescale.com/timescaledb/latest/quick-start/python/
influxDB: https://www.influxdata.com/blog/getting-started-python-influxdb/

Add descriptions to waitbars

see title

Add Metran support to Pastastore

I'd like to store / create Metran models in Pastastore such that we can easily do analysis on head series, Pastas simulations (and residuals/noise).

Transition YAML interface to TOML?

Since PR #60 add yaml interface by @dbrakenhoff Pastastore supports building Pastas models from YAML files. I really like this addition (even though I don't use it that often). However since Python v3.11 tomllib is supported in the standard library. The language is similar to YAML as that it also emphasizes human readability. So maybe it would be nice to transition to TOML since it drops an external dependency. There are probably upsides and downsides of YAML vs TOML but I haven't had time to look into this.

Force pastastore name to equal pandas object name

Problems arise when the name of the added pandas series (or name of the column in the case of DataFrames) does not match the name that is used in the store.

This discrepancy in names can cause weird stuff to happen when pastas StressModels are created directly from pandas objects (after being retrieved from the store). The name of the series in the model then differs from the name of the series in the pastastore. Since models and timeseries are stored separately, the name of the timeseries in the model cannot be matched to objects in the store, which causes problems.

The proposed solution is to force the name of the pandas series to match the name in the pastastore. Also removing support for multi-column DataFrames should simplify enforcing this requirement.

Add update methods for timeseries and metadata

Make it easy to update a timeseries or metadata dictionary with new data.

Arctic + Pandas issues

Arctic is (somewhat) working on sorting out their pandas version issues (e.g. see man-group/arctic#903 (comment)).

As a result of this work, pastastore tests are failing when using the pandas 1.0.3 version that arctic says it requires, but they pass when using version 1.1.5. As a temporary fix the installation of pandas 1.1.5 in pastastore tests/ci is hardcoded. Note this does lead to a dependency conflict notification.

Current advice for pastastore users who also use the ArcticConnector is to use pandas 1.1.5.

Developments on this issue will be posted here.

make get_tmin_tmax work for models library

Currently, get_tmin_tmax only works for libname = "oseries" or "stresses". I'd like to get this to work for "models" as well where the tmin and tmax are retrieved from the model settings.

Move to pyproject.toml and setup.cfg

I have the files somewhat ready but running into issues installing arctic with its annoying lack of releases and outdated dependencies.

See other issue concerning arctic.

pastastore properties are not same as connector properties

pstore.oseries is not the same as pstore.conn.oseries if after initialization of the pastastore oseries are added or removed.

This is caused by the fact that the property value is registered at initialization, and is a static value from that moment onwards.

Error obtaining model after loading with from_zip()

The pcov data is a dictionary after loading from a zipfile, but pastas expects a DataFrame.
Check why the pcov data isn't converted back into a DataFrame after load, but the model parameters seemingly are.

Error when parsing model dict because of the series names

I create a pastas model and add a Stressmodel to it. Then I add this model to a Pastastore using the following code:

ml = ps.Model(df['stand_m_tov_nap'], name='tm')
w1 = ps.StressModel(df['Volume_sum'], rfunc=ps.Hantush, name='well_totaal', settings='well', up=False)
ml.add_stressmodel(w1)
pstore = pst.PastaStore('test',connector=pst.DictConnector("my_conn"))

pstore.add_model(ml)

Then when I try to retrieve with pstore.get_models('tm') the model I get this error:

Traceback (most recent call last):

  File "<ipython-input-150-256e244d4d4a>", line 1, in <module>
    pstore.get_models('tm')

  File "c:\users\oebbe\02_python\pastastore\pastastore\connectors.py", line 1372, in get_models
    ml = self._parse_model_dict(data)

  File "c:\users\oebbe\02_python\pastastore\pastastore\base.py", line 341, in _parse_model_dict
    raise LookupError(msg)

LookupError: oseries stand_m_tov_nap not present in project

Now I understand that I have to add the oseries and stresses to the pastastore seperately. So I use:

pstore = pst.PastaStore('test',connector=pst.DictConnector("my_conn"))
pstore.add_oseries(df['stand_m_tov_nap'], name='to')
pstore.add_stress(df['Volume_sum'], name='ts')
pstore.add_model(ml)

But still I get the same error when using pstore.get_models('tm'). This is because the oseries is added to the pastastore under the name 'to' while the model dictionary saves the stress series under the name 'stand_m_tov_nap'. This is the name of the pandas Series that was used to create the oseries.

I think this has to do with the way a model is added to the pastastore but I don't know how to solve it.

add notebook for users coming from the original pastas.Project

Help users realize that the switch isn't all that difficult, especially if you're just interested in doing stuff in-memory.

Add `validate_oseries` and `validate_stress` options when adding time series

Use Pastas new validate_ methods to optionally check oseries/stresses before storing them.

emptying models library should also delete oseries_models links

The oseries_models should be deleted as well.

pstore.empty_library("models")

pstore.oseries_models  # still contains references to deleted models

Fix is probably to also delete oseries_models if libname="models" was passed.

Add apply function to pastastore

Add method to apply custom functions to entries in libraries:

func = lambda ml:  ml.get_contribution("well")
pstore.apply("models", func)

should be simple code:

def apply(self, libname, func, names=None, progressbar=True):
    names = self.conn._parse_names(names, libname)
    result = []
    for n in (tqdm(names) if progressbar else names):
        result.append(func(self.conn._get_item(libname, n)))
    return result

kwargs are intepreted differently in adjust_text and annotate in add_labels

          Won't this result in issues if kwargs are parsed to both adjust_text and annotate?

Originally posted by @martinvonk in #110 (comment)

stored name not showing up in stresses dataframe

When adding a stress with a metadata dictionary containing 'name' as a key:

pstore.add_stress(s, "my_name", kind="prec", metadata={"name": "not_what_i_intendend"})

The resulting pstore.stresses DataFrame uses the metadata name, and not the name passed when adding the stress as the index:

                                  kind
name                   
not_what_i_intended               prec

I should be seeing my_name there, regardless of the metadata dictionary.

error when data is not found differs per connector

Maybe make the error when data is not found uniform? Current situation:

ArcticConnector: NoDataFoundException
PystoreConnector: ValueError
PasConnector: FileNotFoundError
DictConnector: KeyError

JSON can't handle datetime.datetime in pastastore metadata

datetime in a dictionary can't be saved when writing the pastastore to a zip file. You get the following error: TypeError: Object of type datetime is not JSON serializable and the .zip file becomes empty.

This could be solved by checking if there are datetime.datetime formats in the metadata when adding a stress/oseries. pandas.to_datetime() should work or just convert to a string.

Towards Pastas 1.0 with some backwards compatibility

Pastas 1.0 will introduce some breaking changes to the pas-files format, meaning that pastas models created with pastas<=0.22.0 cannot be loaded with pastas 1.0. This has direct implications for pastastore as well since we use pastas to read/write the models.

How do we deal with this?

First of all, add support for Pastas versions 0.23.0 and 1.0 --> I think I have this ready, PR will follow shortly.
Pastastore should not care what version of pastas is running and still be able to run all tests --> add different pastas versions to tests?
For users with databases created with pastas<=0.22.0, provide a method to update all pas-files. This will require pastas v0.23.0, and basically reads all models and writes them back into the database. A warning should be added that this method is not guaranteed to cover all cases, and it is preferable to reconstruct the models with a script.
Pastastore should throw errors when older versions of pas files are read from the database when the pastas version does not match. --> I'm assuming pastas will throw this error for us and we don't need to include any extra logic in pastastore.

If anyone is reading along and I'm missing something, feel free to add it here.

ArcticDB causes segfault in GH Actions tests

Not sure exactly why this is happening, but some tests fail because arcticdb causes some kind of segfault, killing Python in the process. Rerunning the failed tests has worked thus far in getting everything to pass.

Somehow it seems some tests are interfering with one another when using arcticdb?

Not a major issue but certainly something to keep an eye on.

Write metadata to timeseries files in PastaStore.to_zip()

Currently the metadata is not written to file, which means when writing the zip, all that metadata is lost.

I have to think of a way to add the metadata to the timeseries json file. Shouldn't be too hard, but probably have to deal with converting all kinds of types that aren't officially allowed in json. Perhaps the PastasEncoder can once again be of use here.

reduce duplication of code in connectors

I think 80% of the code in the connector objects is duplicated. This could probably somehow be moved to the BaseConnector class... Improving this would improve maintainability.

Plot both oseries and stresses by default in data_availability plot

Currently the data_availability plots needs a library name (libname argument): "oseries" v "stresses"

pastastore/pastastore/plotting.py

Lines 210 to 226 in 28ea9f2

    
           def data_availability( 
        
               self, 
        
               libname, 
        
               names=None, 
        
               kind=None, 
        
               intervals=None, 
        
               ignore=("second", "minute", "14 days"), 
        
               ax=None, 
        
               cax=None, 
        
               normtype="log", 
        
               cmap="viridis_r", 
        
               set_yticks=False, 
        
               figsize=(10, 8), 
        
               progressbar=True, 
        
               dropna=True, 
        
               **kwargs, 
        
           ):

I'd like to make the default None which plots both libraries in a plot like the one below. Here the oseries are plotted in the upper Axes, and stresses in the lower Axes and the colorbar is shared.

Thoughts @dbrakenhoff ?

`check3_cutoff` not used in frontiers checks

Using the frontier check you can provide a cutoff number to calculate the tmax of a response function. Currently the given cutoff number is not used and instead the default value of 0.999 is used.

Add missing methods from pastas.Project

While looking through the pastas.Project code, I saw some methods that were not yet implemented in pastastore:

get_parameters
get_statistics
...

This issue is a reminder to look through pastas.Project and check if any of those methods would be nice to include in pastastore.

Use the DictConnector as default when createing a Pastastore

When you create a Pastastore you have to specify the connector. I propose to use the DictConnector (in-memory) as the default to make the Pastastore easier to use.

Add connector for postgres (and/or other db?)

It would be cool to add postgres as a supported database. I recall we attempted to store pastas projects in a postgres database before, so maybe that code is a good starting point for building a new connector?

Or maybe there are other databases we want to support?

Edit: maybe TimescaleDB (which is a timeseries database built on PostgreSQL)?

Add information about DictConnector

The DictConnector is not mentioned in the readme and not included in the example notebook.
This should be added.

Error writing pastastore to .zip file

When I run the code in this notebook

I get the following error:
ValueError: DataFrame index must be unique for orient='columns'.

when trying to write the pastastore to a .zip file using:
store.to_zip('pastasstore.zip', overwrite=True)

Error when loading from zip with stress names consisting of only numeric characters

Somewhere on load the numeric character name is converted to an int, which causes an error when pastastore checks whether the oseries or stress is contained in the index (which is of type str).

A quick fix to wrap s["name"] with str() in self.check_stresses_in_store() is sufficient?

	def data_availability(
	self,
	libname,
	names=None,
	kind=None,
	intervals=None,
	ignore=("second", "minute", "14 days"),
	ax=None,
	cax=None,
	normtype="log",
	cmap="viridis_r",
	set_yticks=False,
	figsize=(10, 8),
	progressbar=True,
	dropna=True,
	**kwargs,
	):

pastas / pastastore Goto Github PK

pastastore's Introduction

Pastas: Analysis of Groundwater Time Series

Pastas: what is it?

Documentation & Examples

Get in Touch

Quick installation guide

Stable version

Update

Developers

Related packages

Dependencies

How to Cite Pastas?

pastastore's People

Contributors

Stargazers

Watchers

Forkers

pastastore's Issues

KeyError: 'step0_A'

Some error-code:

Recommend Projects

Recommend Topics

Recommend Org