mullenkamp / hilltop-py

Python functions for extracting data from hilltop systems

License: Apache License 2.0

Python 86.63% Makefile 6.34% Batchfile 6.92% Shell 0.10%

hilltop-py's Introduction

Repository for Hilltop Python tools

This git repository contains the Hilltop Python tools and associated documentation.

Documentation

The primary documentation for the package can be found here.

hilltop-py's People

horizonsrc markcoetzee data-to-knowledge jeffcnz lukefullard mwtoews karunakar2 rb-tech-byte sirvine1994

hilltop-py's Issues

Handling and conversion of units

Currently, hilltop-py returns units via the measurement_list function, but the package does not handle or convert units. Measurement types can come in a variety of units with little consistency.
The main issue that I have seen is that results do not always come with units. This seems common among the gauging results.

get_data function to optionally return some measurement_list results

There are use cases where people want more than just the time series data results returned when calling the get_data function. An option to provide some measurement_list associated data to the output of the get_data function would cover those use cases.
A get_data parameter taking a list of str should provide this, where the strings are the column names of the measurement_list results.
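
A minimal sketch of how such an option could work, assuming both the get_data and measurement_list results are pandas DataFrames sharing a 'MeasurementName' column. The helper name add_measurement_columns, the 'Units' and 'DataType' columns, and the sample values are hypothetical and only serve the illustration:

```python
import pandas as pd

def add_measurement_columns(ts_df, ml_df, columns):
    """Attach selected measurement_list columns to time series results.

    ts_df   : time series results with a 'MeasurementName' column (assumed).
    ml_df   : measurement_list results with a 'MeasurementName' column (assumed).
    columns : list of str naming the measurement_list columns to add.
    """
    missing = [c for c in columns if c not in ml_df.columns]
    if missing:
        raise ValueError(f"Columns not in measurement_list results: {missing}")
    # Keep one row per measurement so the merge cannot duplicate data rows
    extra = ml_df[["MeasurementName"] + list(columns)].drop_duplicates("MeasurementName")
    return ts_df.merge(extra, on="MeasurementName", how="left")

# Made-up data for illustration only
ts_df = pd.DataFrame({
    "MeasurementName": ["Flow", "Flow", "Stage"],
    "Time": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-01"]),
    "Value": [1.2, 1.4, 350.0],
})
ml_df = pd.DataFrame({
    "MeasurementName": ["Flow", "Stage"],
    "Units": ["m3/sec", "mm"],
    "DataType": ["SimpleTimeSeries", "SimpleTimeSeries"],
})
result = add_measurement_columns(ts_df, ml_df, ["Units"])
print(result)
```

A left merge keeps every time series row even when the measurement has no matching measurement_list entry, in which case the added columns are empty for that row.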

Return units with results

Units aren't currently returned with the measurement results. They are available in the DataSource header information. Adding a units column to the returned dataframe would allow the measurement units to be viewed and compared, and conversions scripted if required. Gauging results would need to have the units hard coded into the dictionary.

DLL errors

Hi,

hilltop-py is a great idea. Thanks for putting it together. We've recently purchased a subscription to Hilltop, but the python scripting doesn't seem to work. I have installed hilltop-py through pip, and can import hilltop. However, I get an error message:

ImportError: DLL load failed: The specified module could not be found

when Hilltop is imported. The path to the Hilltop.pyd file is correctly specified in the PYTHONPATH and I have run the installation in Hilltop Manager.

Are there any particular DLLs the Hilltop.pyd requires? I have also installed pywin32.

Thanks,

Neil

measurement_list functions return lowercase results

The measurement_list functions in web_service and the Hilltop class now convert MeasurementNames to lower case (they didn't use to). This can cause issues with downstream processes if they are case sensitive.

Please could an option be added to choose between lowercase and the case as provided.
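
One possible shape for such an option, as a sketch only. The function name and the lowercase parameter are hypothetical and are not part of hilltop-py:

```python
def normalise_measurement_names(names, lowercase=False):
    """Return measurement names as provided, or lower-cased on request."""
    return [n.lower() if lowercase else n for n in names]

names = ["Water Temperature", "Nitrate (HRC)"]
as_provided = normalise_measurement_names(names)
lowered = normalise_measurement_names(names, lowercase=True)
```

Defaulting to the provided case would keep case-sensitive downstream processes working unless the caller opts in to lowercase.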

Provide all results from the measurement_list function

There are several results that the hilltop xml response provides that are not being parsed and returned by hilltop-py. These include DataSource, Item and Divisor. These should come through in the DataFrame results for the measurement_list function.

pandas date parsing warning

With the latest master (df3d747), I'm seeing many repetitive warning messages:

d:\src\hilltop-py\hilltoppy\utils.py:193: UserWarning: Parsing dates in %Y-%m-%dT%H:%M:%S format when dayfirst=True was specified. Pass dayfirst=False or specify a format to silence this warning.
val = pd.to_datetime(val, dayfirst=True)

Shouldn't dayfirst=False since hilltoppy.web_service.get_data specifies a "format 2001-01-01"?
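
As a sketch of the alternative the warning suggests, the timestamp can be parsed with an explicit format instead of dayfirst=True. This assumes the value is always in the %Y-%m-%dT%H:%M:%S form shown in the warning; the helper name is hypothetical, and the real code in utils.py may need to handle other value types too:

```python
import pandas as pd

def parse_iso_timestamp(val):
    """Parse a string such as '2001-02-03T04:05:06' with an explicit format."""
    # With a format given, there is no day/month ambiguity to resolve
    return pd.to_datetime(val, format="%Y-%m-%dT%H:%M:%S")

ts = parse_iso_timestamp("2001-02-03T04:05:06")
print(ts)
```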

Extraction failed: site name issues

Hi, I'm having issues with the get_data() command.
Python version: 3.6.13
Running on a new conda environment.

site_list( ) works as expected.
measurement_list( ) works as expected.

Example code:

import requests
import pandas as pd
from hilltoppy import hilltop

hts = (directory of my hts file)
mysites = ['Whanganui at d/s Taumarunui STP']
mymtypes = ['Nitrate (HRC) [Nitrate (HRC)]']
tsdata = hilltop.get_data(hts, mtypes=mymtypes, sites = mysites)

Which fails with the following output:
"Extraction failed for site Whanganui at d/s Taumarunui STP and mtype Nitrate (HRC) [Nitrate (HRC)]"

If I now run the example code below (without supplying a site name):

import requests
import pandas as pd
from hilltoppy import hilltop

hts = (directory of my hts file)
mysites = ['Whanganui at d/s Taumarunui STP']
mymtypes = ['Nitrate (HRC) [Nitrate (HRC)]']
tsdata = hilltop.get_data(hts, mtypes=mymtypes)

Then the error becomes

Name: Measurement, Length: 188, dtype: object
<class 'pandas.core.series.Series'> returned a result with an error set
Extraction failed for site 17

followed by a list of all sites names in the hts file.

Could you please guide me, where am I going wrong?
Thanks.

Unable to read check data

When the measurement name is different than the data source name, the get_data function fails to find the data. This is always the case for check data.

An example snippet of xml data returned by Hilltop is:

CheckSeries
…
Water Temperature Check

Here ItemName is not the same as the DataSource name. In the get_data function, this section looks for the ItemName which matches the data source:

    for m in measurements:
        m_dict = {c.tag: convert_value(c.text) for c in m}
        m_name = m_dict.pop('ItemName')

        if measurement.lower() == m_name.lower():

However, if the ItemName (measurement name) and the data source (the input) do not match, this fails. For data which has a different measurement name than the data source name (which is always the case for check data AFAIK) multiple inputs are required. In the above example, the Hilltop call requires 'Water temperature' in the server call, but for the processing it needs to match ‘Water Temperature Check’ and not ‘Water Temperature’.
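
A sketch of one way to support the two names, assuming the parsed measurements are available as dicts with an 'ItemName' key as in the snippet above. The find_measurement helper, its item_name parameter, and the sample dict are hypothetical:

```python
def find_measurement(measurements, measurement, item_name=None):
    """Return the first measurement dict whose 'ItemName' matches.

    measurement : name used in the server call (e.g. the data source name).
    item_name   : name to match in the response; defaults to `measurement`.
    """
    target = (item_name or measurement).lower()
    for m_dict in measurements:
        if str(m_dict.get("ItemName", "")).lower() == target:
            return m_dict
    return None

# Made-up parsed response for illustration
measurements = [{"ItemName": "Water Temperature Check"}]

no_match = find_measurement(measurements, "Water Temperature")
match = find_measurement(
    measurements, "Water Temperature", item_name="Water Temperature Check"
)
```

Keeping item_name optional means existing calls, where the two names are identical, would behave as before.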

Use the divisor values for converting results values to the appropriate units

Hilltop outputs divisor values by which the result values should be divided to get the appropriate units. This is especially necessary for gauging results and totally unnecessary for water quality results. This should be implemented before the unit conversion task.
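
A minimal sketch of applying a divisor, assuming it arrives as a number alongside the raw values. The function name and the sample values are made up for illustration:

```python
def apply_divisor(values, divisor=None):
    """Divide raw result values by the divisor; leave them unchanged if none is given."""
    if divisor in (None, 0, 1):
        # No divisor, or one that would have no effect or be invalid
        return list(values)
    return [v / divisor for v in values]

raw_stage = [1250, 980]                          # hypothetical raw values
stage = apply_divisor(raw_stage, divisor=1000)   # [1.25, 0.98]
unchanged = apply_divisor([7.5, 8.1])            # no divisor supplied
```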

Unable to access Stage [Gauging Results]

https://data.hbrc.govt.nz/EnviroData/ContinuousArchive.hts?service=Hilltop&request=MeasurementList&Site=Tutaekuri%20River%20at%20Puketapu%20HBRC%20Site has the Gauging Results Stage measurement needing to be requested as Stage [Gauging Results], but this is not in the gauging_dict in web_services.py so an error is passed.

Adding an extra row to the dictionary (line 24) fixes the issue. The row is:

'Stage [Gauging Results]': {'row': 'I1', 'multiplier': 0.001},

Note: there is a problem accessing Gauging Results from this server so it may be hard to test

Element tree error when running Python 3.9

Error when trying to use Webserver version with Python 3.9

ElementTree.Element.getchildren has been deprecated since Python 3.2 and was removed in Python 3.9

https://docs.python.org/3.8/library/xml.etree.elementtree.html#xml.etree.ElementTree.Element.getchildren

The source is in webserver, line 158:

children = s.getchildren

replacing with

children = list(s)

fixes the error, but may need a try/except to ensure backward compatibility
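
For reference, a small self-contained check of the replacement using a made-up XML snippet. Iterating an Element with list() is also supported on the older Python 3 versions that still had getchildren, so a try/except may not be needed:

```python
import xml.etree.ElementTree as ET

# Made-up XML for illustration
root = ET.fromstring("<Site><A>1</A><B>2</B></Site>")

children = list(root)              # replaces root.getchildren()
tags = [c.tag for c in children]
print(tags)
```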

Measurement list does not return dict

The inputs of the function measurement_list are given by:

def measurement_list(base_url, hts, site, measurement=None, output='dataframe', timeout=60, **kwargs):

The doc string says the following about the output parameter:

output : dataframe or list of dict The output object. Must be either dataframe or list of dict.

However, the output variable is not used in the measurement_list function and the function always returns a dataframe.

Simple script to reproduce:

from hilltoppy import web_service as ws

base_url = "http://hilltop.gw.govt.nz/"
hts = "data.hts"
site = "Akatarawa River at Hutt Confluence"
print(
    type(
        ws.measurement_list(
            base_url,
            hts,
            site,
            output="dict",
        )
    )
)

This script should return a list of dictionaries, but instead returns a pandas dataframe

The function get_data attempts to use this functionality as a backup way of obtaining the desired measurement. The following code throws an error when the if block is entered:

if 'Item' not in ds_dict1:
    ml = measurement_list(base_url, hts, site, measurement=measurement, output='dict', timeout=timeout)
    for m in ml:
        if m['MeasurementName'].lower() == measurement.lower():
            ds_dict1.update(m)
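
A sketch of how the output parameter could be honoured, assuming the DataFrame is built first and converted at the end. The format_output helper and the sample data are hypothetical. It also shows why the loop above fails: iterating a DataFrame yields its column names, which are strings, so m['MeasurementName'] ends up indexing a str.

```python
import pandas as pd

def format_output(df, output="dataframe"):
    """Return the results as a DataFrame or as a list of dict, one per row."""
    if output == "dataframe":
        return df
    if output == "dict":
        return df.to_dict("records")
    raise ValueError("output must be either 'dataframe' or 'dict'")

# Made-up measurement_list result for illustration
ml_df = pd.DataFrame({
    "MeasurementName": ["Flow", "Stage"],
    "Units": ["m3/sec", "mm"],
})

column_names = list(ml_df)              # iterating a DataFrame gives column names
records = format_output(ml_df, "dict")  # list of row dicts
names = [m["MeasurementName"] for m in records]
```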

retries for ConnectionErrors

The Hilltop server will occasionally fail for many reasons and will return a ConnectionError from the requests package. I've had to handle this error in all of my downstream applications by retrying when this kind of error is received. I should just add this functionality to hilltop-py.
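
A generic sketch of such a retry wrapper, not the package's implementation. The name with_retries and its parameters are hypothetical. To keep the example free of third-party imports it catches the built-in ConnectionError by default; for the case described here one would pass exceptions=(requests.exceptions.ConnectionError,) instead:

```python
import time

def with_retries(func, exceptions=(ConnectionError,), retries=3, delay=1.0, backoff=2.0):
    """Call func() and retry on the given exceptions, waiting longer each time."""
    attempt = 0
    while True:
        try:
            return func()
        except exceptions:
            attempt += 1
            if attempt > retries:
                raise  # give up and surface the last error
            time.sleep(delay)
            delay *= backoff

# Demonstration with a function that fails twice before succeeding
calls = {"count": 0}

def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated server failure")
    return "ok"

result = with_retries(flaky_request, retries=3, delay=0)
```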

Issues with version 2

Since version 2 development from #51, I'm seeing a few issues that were not present before.

Take this example:

import hilltoppy.web_service

base_url = "https://data.hbrc.govt.nz/Envirodata"
hts = "ContinuousArchive.hts"
site = "Well.MW2s Brookvale Road"
meas = "Elevation Above Sea Level [Recorder Water Level]"
ts_df = hilltoppy.web_service.get_data(
    base_url, hts, site, meas,
    agg_method="Average",
    agg_interval="1 week")

With d61b18f from Windows with conda-forge installed dependencies the output is:

C:\Users\mtoews\AppData\Local\miniforge3\envs\pyforge\lib\site-packages\pydantic\_internal\_config.py:269: UserWarning: Valid config keys have changed in V2:
* 'json_dumps' has been removed
* 'json_loads' has been removed
  warnings.warn(message, UserWarning)
Traceback (most recent call last):
  File "D:\src\hilltop-py\trythis.py", line 7, in <module>
    ts_df = hilltoppy.web_service.get_data(
  File "D:\src\hilltop-py\hilltoppy\web_service.py", line 396, in get_data
    ds_dict1 = orjson.loads(DataSource(**ds_dict).json(exclude_none=True))
  File "C:\Users\mtoews\AppData\Local\miniforge3\envs\pyforge\lib\site-packages\pydantic\main.py", line 159, in __init__
    __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
pydantic_core._pydantic_core.ValidationError: 1 validation error for DataSource
Interpolation
  Input should be 'Discrete','Instant','Incremental' or 'Event' [type=enum, input_value='Histogram', input_type=str]

A variation on the above get_data example without the agg_* arguments shows a different error:

C:\Users\mtoews\AppData\Local\miniforge3\envs\pyforge\lib\site-packages\pydantic\_internal\_config.py:269: UserWarning: Valid config keys have changed in V2:
* 'json_dumps' has been removed
* 'json_loads' has been removed
  warnings.warn(message, UserWarning)
D:\src\hilltop-py\hilltoppy\utils.py:193: UserWarning: Parsing dates in %Y-%m-%dT%H:%M:%S format when dayfirst=True was specified. Pass `dayfirst=False` or specify a format to silence this warning.
  val = pd.to_datetime(val, dayfirst=True)
Traceback (most recent call last):
  File "D:\src\hilltop-py\trythis.py", line 7, in <module>
    ts_df = hilltoppy.web_service.get_data(
  File "D:\src\hilltop-py\hilltoppy\web_service.py", line 443, in get_data
    if m['MeasurementName'].lower() == measurement.lower():
TypeError: string indices must be integers

The same issues are found on Linux with pip-installed dependencies.

Chunk get_data requests

Currently, the get_data function sends one request to the Hilltop server for data. The request may be for 2 data points or 2 million. I find that some Hilltop servers struggle when the request is really big (e.g. > 1,000,000 data points). I also find that parsing the giant xml into pandas dataframes is sluggish and makes the memory footprint of the dataframes larger than it should be.
Providing an option to chunk out a get_data request into multiple smaller requests based on a fixed number of years would solve these issues. I've already written some code for this in the past.
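
A sketch of the date splitting such an option would need, assuming chunks of a fixed number of years between a known start and end. The function name and the 5-year default are hypothetical, and this is not the code mentioned above:

```python
from datetime import datetime

def year_chunks(start, end, years=5):
    """Split [start, end] into consecutive (chunk_start, chunk_end) pairs,
    each spanning at most `years` years."""
    chunks = []
    chunk_start = start
    while chunk_start < end:
        try:
            chunk_end = chunk_start.replace(year=chunk_start.year + years)
        except ValueError:
            # 29 Feb has no counterpart in a non-leap target year
            chunk_end = chunk_start.replace(year=chunk_start.year + years, day=28)
        chunk_end = min(chunk_end, end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end
    return chunks

chunks = year_chunks(datetime(2000, 1, 1), datetime(2012, 6, 1), years=5)
for c_start, c_end in chunks:
    print(c_start.date(), c_end.date())
```

Adjacent chunks share a boundary timestamp, so if the server treats both ends of a request as inclusive, the combined results may need duplicate rows dropped at the boundaries.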

get_data_quality erroring for start date in com

Hi Mike,

Been trying to use the get_data_quality function in Hilltop-py COM but the parameter "start" doesn't seem to work unless you put in a date (which doesn't seem to be a requirement).
The error seems to be around "DataStartTime".
File ~\Miniconda3_32bit\lib\site-packages\hilltoppy\com.py:290 in get_data_quality
start1 = wqr.DataStartTime
If I first run with a start date, then remove it and re-run, it pulls the full date record, which is weird. If I start from a fresh console it fails.

Another observation is the doc string requires ISO format but will also error if I put the time in.
start : str
The start date to retreive from the data in ISO format (e.g. '2011-11-30 00:00').

Thanks!
Emily

horizons error not .hts

Hi Mike,

I've been attempting to use the package to pull data from the horizons hilltop server - https://envirodata.horizons.govt.nz/api/hilltop/data?

However it's failing due to the below as it doesn't go to an .hts file - can we get a workaround put in for this?

if not hts.endswith('.hts'):
    raise ValueError('The hts file must end with .hts')

Cheers,
Darren