A Python package for the PRMS hydrologic model

Home Page: http://prms-python.github.io/PRMS-Python/

License: Other

Python 4.07% Jupyter Notebook 95.77% Roff 0.16%

prms hydrologic modelling-framework python monte-carlo-simulation

prms-python's Introduction

PRMS-Python

Online documentation

PRMS-Python provides a Python interface to PRMS data files and manages PRMS simulations. This module aims to improve the efficiency of PRMS workflows by giving access to PRMS data structures while providing "pythonic" tools to do scenario-based PRMS simulations. By "scenario-based" we mean testing model hypotheses associated with model inputs, outputs, and model structure. One example is parameter sensitivity analysis, where each "scenario" is an iterative perturbation of one or many parameters. Another example of a "scenario-based" modeling exercise is climate scenario modeling: what will happen to modeled outputs if the input meteorological data were to change?

Installation

PRMS-Python versions are available on the Python Package Index (PyPI) and can be installed and upgraded using pip:

pip install prms-python

Alternatively, clone the repository and then install with pip:

git clone https://github.com/PRMS-Python/PRMS-Python.git
cd PRMS-Python

then

pip install --editable .

Another option is to download the source code as a zip file, unzip it, and within the PRMS-Python root directory run:

python setup.py install

Usage and documentation

We recommend starting with the "getting started" Jupyter notebook, which covers the file structure rules that PRMS-Python uses, and then moving on to the other example notebooks in the notebooks directory. Online documentation is available here.

Building documentation

This project uses the Sphinx documentation engine for Python. The documentation source is located in docs/source. Eventually we can wrap the following steps into a script, but for now, to build the documentation, go to the docs/ directory and run

make html

If it fails because of missing dependencies, install the dependencies it reports as missing. The docs are now published automatically whenever commits are pushed to the master branch.

Unit tests

I run them using nose, but that's not required. From the root repo directory, run

nosetests -v

Contribute

We welcome anyone seriously interested in contributing to PRMS-Python to do so in any way they see fit. If you are not sure where to begin, you can look through current issues or submit a new issue here. You may also fork PRMS-Python and submit a pull request if you would like to offer direct changes to the package.

Citing PRMS-Python

If you use PRMS-Python for published work we ask that you please cite the PRMS-Python manuscript as follows:

Volk, J. M., & Turner, M. A. (2019). PRMS-Python: A Python framework for programmatic PRMS modeling and access to its data structures. Environmental Modelling & Software, 114, 152–165. https://doi.org/10.1016/J.ENVSOFT.2019.01.006

prms-python's Issues

Water balance optimization

@JohnVolk we need to expand on how this should be done with examples/discussion below

Annual
April-September
February-July
High

Fix PyPI prms-python metadata

See current state: https://pypi.python.org/pypi/prms-python.

Missing homepage, authors, author emails, description, and more.

prmspy CLI import errors when installing PRMS-Python with python 2

prmspy works as expected when installing with pip3. To fix an installation bug with pip2, I modified __init__.py to use absolute imports instead of relative ones, i.e.

from prms_python.data import Data
# instead of

from .data import Data

This fixed the errors when installing with pip2. However, the prmspy script now gives the following errors:

$ prmspy 
PRMS-Python/prms_python/__init__.py:19: RuntimeWarning: Parent module 'prms-python' not found while handling absolute import
  from prms_python.data import Data
PRMS-Python/prms_python/__init__.py:20: RuntimeWarning: Parent module 'prms-python' not found while handling absolute import
  from prms_python.optimizer import Optimizer, OptimizationResult
PRMS-Python/prms_python/__init__.py:21: RuntimeWarning: Parent module 'prms-python' not found while handling absolute import
  from prms_python.parameters import Parameters, modify_params
PRMS-Python/prms_python/__init__.py:22: RuntimeWarning: Parent module 'prms-python' not found while handling absolute import
  from prms_python.simulation import Simulation, SimulationSeries
PRMS-Python/prms_python/__init__.py:23: RuntimeWarning: Parent module 'prms-python' not found while handling absolute import
  from prms_python.scenario import Scenario, ScenarioSeries
PRMS-Python/prms_python/__init__.py:24: RuntimeWarning: Parent module 'prms-python' not found while handling absolute import
  from prms_python.util import load_statvar, load_data_file, nash_sutcliffe
Traceback (most recent call last):
  File "/usr/local/bin/prmspy", line 9, in <module>
    load_entry_point('prms-python', 'console_scripts', 'prmspy')()
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 547, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2720, in load_entry_point
    return ep.load()
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2380, in load
    return self.resolve()
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2386, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
ImportError: No module named scripts.prmspy

Add PET optimization to Optimizer

Add before-and-after images of solar radiation optimization, monthly and daily

An example plot is coming soon, but it should work like this:

from prms_python import Parameters, Optimizer, Data

p = Parameters('path/to/parameters')
d = Data('path/to/data')

# pass the instances created above, not the classes themselves
o = Optimizer(p, d)
o.srad()
o.plot_srad_optimization(title='current optimization')

Config file option

Give users the ability to run scenario series via a config file. Provide fields for default files and directories and HydroShare credentials.

add other Python functions with docs

See this for doc example:

https://github.com/VirtualWatershed/capstone/blob/master/util/capstone.py#L309

enable Parameter and Data classes to handle multiple files

PRMS allows for multiple parameter files that are associated with the same model. For example, one file may contain the dimensions and parameters associated with soil properties, another may contain the parameters for the snow module, another those for cascading flow, and so on. A similar thing can be done with data files: one file can contain ten stations where precip was recorded and another file can contain tmin and tmax. An example of multiple data files is the ACF model included in the PRMS download from the USGS; the sagehen model they provide has multiple parameter files. The files are listed in the control file. For example, the sagehen model's control file lists the parameter files like so:

####
param_file
6
4
./input/sagehen.params
./input/gis.params
./input/gvr.params
./input/ncascade.params
./input/ncascdgw.params
./input/subbasin.params

When we tackle this issue, we will need to consider that the Parameter object needs to connect to all the parameter files listed in the control file. One idea is to use the control file on Parameter/Data initialization to connect all the corresponding files. The Simulation class will need to include the appropriate collection of input files, which ideally can be handled by modifying only the Parameter class.
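As a rough sketch of the control-file idea, the snippet below pulls the values of one entry out of control-file text. It assumes only the layout visible in the sagehen excerpt above: a `####` delimiter, the entry name, the number of values, a data type code, and then one value per line. The function name `read_control_entry` is hypothetical and is not part of PRMS-Python.

```python
def read_control_entry(control_text, name):
    """Return the list of values stored under `name`, or None if absent.

    Assumes each entry is: '####', name, value count, type code, values.
    """
    lines = [ln.strip() for ln in control_text.splitlines()]
    for i, line in enumerate(lines):
        # An entry starts at a '####' delimiter followed by its name.
        if line == "####" and i + 3 < len(lines) and lines[i + 1] == name:
            n_values = int(lines[i + 2])
            # lines[i + 3] is the data type code; the values follow it.
            start = i + 4
            return lines[start:start + n_values]
    return None


# The sagehen excerpt from this issue, used as sample input.
control_text = """####
param_file
6
4
./input/sagehen.params
./input/gis.params
./input/gvr.params
./input/ncascade.params
./input/ncascdgw.params
./input/subbasin.params
"""

param_files = read_control_entry(control_text, "param_file")
```

A Parameter object could then open each returned path in turn; how the contents of those files would be merged is left open here.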

parallelize ScenarioSeries data generation

Currently the running of PRMS is parallelized, but generating the modified inputs for a ScenarioSeries is not. That could help a lot.

Multiple Simulation class

How to handle a suite of simulations.

Run unit tests on Optimizer.monte_carlo

Make Monte Carlo parameter resampling method for optimization or parameter uncertainty analysis for arbitrary parameter sets

Because Monte Carlo analysis requires a massive number of simulations, a future goal is to reduce the size of simulation input and output files, e.g. by keeping only the resampled input parameters, the output statistical variable that the optimization is conducted on, and a single copy of the original parameter, control, and data input files. The OptimizationResult class should record the Monte Carlo simulations so that the parameter space can be analyzed against the output variable of interest (e.g. streamflow).
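To make the resampling step concrete, here is a minimal sketch of uniform resampling around the original parameter values. The function name `resample_uniform`, the meaning given to `noise_factor` (a fractional half-width around each value), and the sample values are assumptions for illustration; they are not taken from Optimizer.monte_carlo.

```python
import random


def resample_uniform(values, noise_factor=0.1, seed=None):
    """Draw each value uniformly from v - |v|*noise_factor to v + |v|*noise_factor."""
    rng = random.Random(seed)
    resampled = []
    for v in values:
        half_width = abs(v) * noise_factor
        resampled.append(rng.uniform(v - half_width, v + half_width))
    return resampled


original = [0.5, 2.0, -1.5]  # hypothetical parameter values
# Each scenario gets its own perturbed copy of the parameter values.
scenarios = [resample_uniform(original, noise_factor=0.1, seed=i) for i in range(4)]
```

Storing only lists like `scenarios`, together with one copy of the original inputs, is in the spirit of the file-size reduction described above.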

Develop Simulation class

example

sim = Simulation(input_dir, output_dir)
sim.run()

Will need to handle directory structure.

distribute on PyPI

Too many files open error

When running many simulations (about 4,000), I am getting the error "Too many open files". subprocess.communicate() is supposed to close the stderr and stdout pipes. I also tried adding two lines to the Simulation runner:

        p = subprocess.Popen(
            prms_exec + ' control', shell=True, stdout=subprocess.PIPE,
            stderr=subprocess.PIPE
        )
        # these two lines
        p.stdout.close()
        p.stderr.close()

However, I am still getting the error, so I am not sure which files are not being closed. Below is a traceback.

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-2-5a3359cdaa87> in <module>()
     10 #for i in range(25):
     11 while True:
---> 12     optr.monte_carlo(measrd, ['tmax_index','dday_intcp','dday_slope'], output_variable, method='uniform',                 n_sims=64, nproc=8, stage=stage)
     13 
     14 #     if i <= 13:

~/scratch/PRMS-Python/prms_python/optimizer.py in monte_carlo(self, reference_path, param_names, statvar_name, stage, n_sims, method, noise_factor, nproc)
    143         # run
    144         # if not nproc: nproc = mp.cpu_count() // 2
--> 145         outputs = list(series.run(nproc=nproc).outputs_iter())
    146         self.arb_outputs.extend(outputs) # for current instance- add outputs
    147 

~/scratch/PRMS-Python/prms_python/simulation.py in run(self, prms_exec, nproc)
     30             nproc = mp.cpu_count()//2
     31 
---> 32         pool = mp.Pool(processes=nproc)
     33         pool.map(_simulation_runner, self.series)
     34 

/usr/lib/python3.5/multiprocessing/context.py in Pool(self, processes, initializer, initargs, maxtasksperchild)
    116         from .pool import Pool
    117         return Pool(processes, initializer, initargs, maxtasksperchild,
--> 118                     context=self.get_context())
    119 
    120     def RawValue(self, typecode_or_type, *args):

/usr/lib/python3.5/multiprocessing/pool.py in __init__(self, processes, initializer, initargs, maxtasksperchild, context)
    166         self._processes = processes
    167         self._pool = []
--> 168         self._repopulate_pool()
    169 
    170         self._worker_handler = threading.Thread(

/usr/lib/python3.5/multiprocessing/pool.py in _repopulate_pool(self)
    231             w.name = w.name.replace('Process', 'PoolWorker')
    232             w.daemon = True
--> 233             w.start()
    234             util.debug('added worker')
    235 

/usr/lib/python3.5/multiprocessing/process.py in start(self)
    103                'daemonic processes are not allowed to have children'
    104         _cleanup()
--> 105         self._popen = self._Popen(self)
    106         self._sentinel = self._popen.sentinel
    107         _children.add(self)

/usr/lib/python3.5/multiprocessing/context.py in _Popen(process_obj)
    265         def _Popen(process_obj):
    266             from .popen_fork import Popen
--> 267             return Popen(process_obj)
    268 
    269     class SpawnProcess(process.BaseProcess):

/usr/lib/python3.5/multiprocessing/popen_fork.py in __init__(self, process_obj)
     18         sys.stderr.flush()
     19         self.returncode = None
---> 20         self._launch(process_obj)
     21 
     22     def duplicate_for_child(self, fd):

/usr/lib/python3.5/multiprocessing/popen_fork.py in _launch(self, process_obj)
     64     def _launch(self, process_obj):
     65         code = 1
---> 66         parent_r, child_w = os.pipe()
     67         self.pid = os.fork()
     68         if self.pid == 0:

OSError: [Errno 24] Too many open files
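One pattern that may help, offered as a sketch under assumptions rather than a diagnosis: wait for each child process and let `communicate()` drain and close its pipes, instead of closing them right after `Popen` returns. The helper name `run_command` is hypothetical, and a trivial Python child process stands in for the PRMS executable.

```python
import subprocess
import sys


def run_command(args):
    """Run one command, wait for it, and return (returncode, stdout, stderr)."""
    # Popen used as a context manager closes its pipes on exit;
    # communicate() reads both pipes to the end and waits for the process.
    with subprocess.Popen(
        args, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    ) as proc:
        out, err = proc.communicate()
    return proc.returncode, out.decode(), err.decode()


# Stand-in for the PRMS call: a child process that prints one line.
code, out, err = run_command([sys.executable, "-c", "print('done')"])
```

Since the traceback ends in `os.pipe()` while a new `mp.Pool` is being created, it may also be worth closing each pool after use (`pool.close()` followed by `pool.join()`), in case pools left open across repeated `run()` calls are holding descriptors.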

Document usage of Data class

Nice work getting the Data class together. Now I just have a couple documentation requests. It'd be great if you could update the tutorial text to include an intro to using the Data class. Specifically, please,

Show an example of reading data, modifying the Data instance's data, then writing the modified data to a new file. This may already be in the notebook; cutting and pasting that would be fine.
Add an explanation of the metadata and the use of the pandas DataFrame.

If you click the link above to the tutorial text, you can click the pencil icon on GitHub and edit the file directly in GitHub, then commit your changes to the master branch when you're done.

retaining data type in Parameters.write

When calling Parameters.write(), the newly written parameter file does not retain the original data types for parameter values. Specifically, parameters with datatype integer (PRMS data type "1") are converted to floats in the new file. This occurs when writing a new parameter file after accessing or modifying a parameter of type integer. For example, here is the original parameter hru_type in integer format:

####
hru_type
1
nhru
10791
1
0
0
0
0

After writing to a new parameter file with params.write('newparam') and opening the new file, the hru_type parameter has been saved in float format, which would cause PRMS to crash:

####
hru_type
1
nhru
10791
1
0.0
0.0
0.0
0.0
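One possible fix, sketched here with a hypothetical helper rather than the actual Parameters.write code, is to cast values back to integers at write time whenever the parameter's PRMS type code is 1. Only type code 1 (integer) comes from this issue; the handling of every other code is a placeholder assumption.

```python
def format_values(values, prms_type):
    """Return parameter values as strings, honoring the PRMS type code."""
    if prms_type == 1:
        # Integer parameters: cast back to int so 0.0 is written as "0".
        return [str(int(v)) for v in values]
    # Other type codes fall back to Python's default string conversion.
    return [str(v) for v in values]


hru_type = [1.0, 0.0, 0.0, 0.0, 0.0]  # integers that became floats in memory
lines = format_values(hru_type, prms_type=1)
```

Keying the conversion on the type code stored in the file, rather than on the in-memory dtype, keeps the written file consistent with its own header.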

Share data to HydroShare

Provide a command-line interface to share a ScenarioSeries to HydroShare.

Optimizer class with solar radiation parameter optimization

Create an Optimizer class that will track the state of optimization and provide methods for running each step of optimization, for example

from prms_python import Parameters, Optimizer, Data

params = Parameters('my-data/parameters')
data = Data('my-data/data')

optr = Optimizer(params, data, title='Dry Creek Parameterization Optimization',
                           description='''
This optimization routine will find an appropriate solar radiation, potential ET, and other parameters
by maximizing the Nash-Sutcliffe model efficiency. We begin with parameters that were generated
from GIS routines and meteorological data that should be correct in ratio to other parameters, but possibly not correct in aggregate.
''')

optr.set_global_method('nash-sutcliffe')

optr.srad('path/to/reference_data/measured_srad.csv')
optr.pet('path/to/reference_data/measured_pet.csv')

print(optr.history())

# [2016-07-07T08:18:56] OPTIMIZATION INITIALIZED title: "Dry Creek Parameterization Optimization"
# [2016-07-07T08:22:14] SRAD OPTIMIZATION FINISHED: 14 iterations, mean modeled/observed = 0.98 w/ stddev 0.05
# [2016-07-07T08:26:22] PET OPTIMIZATION: 21 iterations, mean modeled/observed = 1.15 w/ stddev 0.210

We will also add methods beyond solar radiation and potential ET parameter optimization, following established methods for PRMS as described in Hay, et al, 2006.

As the issue title says, completing this only requires getting solar radiation optimization and history tracking working. @JohnVolk it'd be great to get your feedback in the comments below.
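For reference, the Nash-Sutcliffe model efficiency named in the example above is one minus the ratio of the squared model error to the variance of the observations about their mean. The standalone sketch below illustrates that formula. PRMS-Python imports its own `nash_sutcliffe` from prms_python.util, whose signature is not shown here, so the function name and arguments below are assumptions.

```python
def nash_sutcliffe_efficiency(observed, modeled):
    """Return 1 - sum((obs - mod)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(modeled):
        raise ValueError("observed and modeled must have the same length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - m) ** 2 for o, m in zip(observed, modeled))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


observed = [1.0, 2.0, 3.0, 4.0]
# A perfect model scores 1; predicting the observed mean everywhere scores 0.
perfect = nash_sutcliffe_efficiency(observed, observed)
mean_only = nash_sutcliffe_efficiency(observed, [2.5] * 4)
```

Values below zero indicate a model that does worse than simply predicting the observed mean, which is why the optimizer would seek to maximize this score.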

nproc error in SimulationSeries

When nproc is set to any integer, the srad optimizer crashes with the following error:

---------------------------------------------------------------------------
PicklingError                             Traceback (most recent call last)
<ipython-input-19-e5a9b6224f36> in <module>()
----> 1 optr.srad(measrd, srad_hru, n_sims=8, nproc=4)
      2 util.delete_out_files(work_directory, 'prms_ic.out')

/home/john/scratch/PRMS-Python/prms_python/optimizer.py in srad(self, reference_srad_path, station_nhru, n_sims, method, nproc)
    136 
    137         # run all scenarios
--> 138         outputs = list(series.run(nproc=nproc).outputs_iter())
    139         self.srad_outputs.extend(outputs)
    140 

/home/john/scratch/PRMS-Python/prms_python/simulation.py in run(self, prms_exec, nproc)
     32 
     33         pool = mp.Pool(processes=nproc)
---> 34         pool.map(_simulation_runner, self.series)
     35 
     36         # for s in self.series:

/usr/lib/python3.4/multiprocessing/pool.py in map(self, func, iterable, chunksize)
    258         in a list that is returned.
    259         '''
--> 260         return self._map_async(func, iterable, mapstar, chunksize).get()
    261 
    262     def starmap(self, func, iterable, chunksize=None):

/usr/lib/python3.4/multiprocessing/pool.py in get(self, timeout)
    597             return self._value
    598         else:
--> 599             raise self._value
    600 
    601     def _set(self, i, obj):

/usr/lib/python3.4/multiprocessing/pool.py in _handle_tasks(taskqueue, put, outqueue, pool, cache)
    381                     break
    382                 try:
--> 383                     put(task)
    384                 except Exception as e:
    385                     job, ind = task[:2]

/usr/lib/python3.4/multiprocessing/connection.py in send(self, obj)
    204         self._check_closed()
    205         self._check_writable()
--> 206         self._send_bytes(ForkingPickler.dumps(obj))
    207 
    208     def recv_bytes(self, maxlength=None):

/usr/lib/python3.4/multiprocessing/reduction.py in dumps(cls, obj, protocol)
     48     def dumps(cls, obj, protocol=None):
     49         buf = io.BytesIO()
---> 50         cls(buf, protocol).dump(obj)
     51         return buf.getbuffer()
     52 

PicklingError: Can't pickle <class 'prms_python.simulation.Simulation'>: it's not the same object as prms_python.simulation.Simulation

If I leave nproc=None then everything works as expected.

Maybe the last answer to this Stack Overflow question will help?
http://stackoverflow.com/questions/1412787/picklingerror-cant-pickle-class-decimal-decimal-its-not-the-same-object

Optimize flows method in Optimizer class

@JohnVolk again, we need to expand on this, following the Hay et al. JAWRA paper.

All flows
Low flows
Peak flows