This git repository contains the Hilltop Python tools and associated documentation.
The primary documentation for the package can be found here.
Python functions for extracting data from hilltop systems
License: Apache License 2.0
Currently, hilltop-py returns units via the measurement_list function, but the package does not do any handling or conversion of units. Measurement types can come in a variety of units with little consistency.
The main issue that I have seen in the results is that they may not always return units. This seems common amongst the gauging results.
There are use cases where people want more than just the time series data results returned when calling the get_data function. An option to provide some measurement_list associated data to the output of the get_data function would cover those use cases.
A get_data parameter that takes a list of str, where the strings are column names from the measurement_list results, should provide this.
Units aren't currently returned with the measurement results. They are available in the DataSource header information. Adding a units column to the returned dataframe would allow the measurement units to be viewed and compared, and conversions scripted if required. Gauging results would need to have the units hard coded into the dictionary.
Hi,
hilltop-py is a great idea. Thanks for putting it together. We've recently purchased a subscription to Hilltop, but the python scripting doesn't seem to work. I have installed hilltop-py through pip, and can import hilltop. However, I get an error message:
ImportError: DLL load failed: The specified module could not be found
when Hilltop is imported. The path to the Hilltop.pyd file is correctly specified in the PYTHONPATH and I have run the installation in Hilltop Manager.
Are there any particular DLLs the Hilltop.pyd requires? I have also installed pywin32.
Thanks,
Neil
The measurement_list functions in web_service and the Hilltop class now convert MeasurementNames to lower case (they didn't use to). This can cause issues with downstream processes that are case sensitive.
Please could an option be added to choose between lower case and the case as provided by Hilltop?
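A minimal sketch of what such an option could look like. The function name format_measurement_names and the lowercase parameter are hypothetical and not part of hilltop-py; the default keeps the current lower-casing behaviour.

```python
def format_measurement_names(names, lowercase=True):
    """Return the names lower-cased, or unchanged when lowercase=False.

    Hypothetical helper: hilltop-py does not expose this function.
    """
    if lowercase:
        return [name.lower() for name in names]
    return list(names)


# Names taken from elsewhere in these issues, used only as sample input.
names = ["Water Temperature Check", "Stage [Gauging Results]"]
print(format_measurement_names(names))
print(format_measurement_names(names, lowercase=False))
```

Defaulting to lowercase=True would leave existing callers unaffected, while case-sensitive downstream processes could opt out.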
There are several results that the Hilltop XML response provides that are not being parsed and returned by hilltop-py. These include DataSource, Item and Divisor. These should come through in the DataFrame results for the measurement_list function.
With the latest master (df3d747), I'm seeing many repetitive warning messages:
d:\src\hilltop-py\hilltoppy\utils.py:193: UserWarning: Parsing dates in %Y-%m-%dT%H:%M:%S format when dayfirst=True was specified. Pass `dayfirst=False` or specify a format to silence this warning.
  val = pd.to_datetime(val, dayfirst=True)
Shouldn't it be dayfirst=False, since hilltoppy.web_service.get_data specifies a "format 2001-01-01"?
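A small sketch of the "specify a format" route that the warning mentions. It uses the standard library's datetime.strptime rather than pandas so that it runs without extra dependencies; pd.to_datetime accepts the same format string through its format argument. The helper name parse_iso_timestamp is an assumption, not the code in utils.py.

```python
from datetime import datetime

# The format quoted in the warning message above.
ISO_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_iso_timestamp(val):
    """Parse a timestamp with an explicit format, so there is no
    day-first versus month-first guessing involved."""
    return datetime.strptime(val, ISO_FORMAT)


ts = parse_iso_timestamp("2001-02-03T12:30:00")
print(ts.year, ts.month, ts.day)
```

With an explicit format the month and day cannot be swapped, which is the ambiguity that dayfirst is meant to resolve for other date layouts.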
Hi, I'm having issues with the get_data() command.
Python version: 3.6.13
Running on a new conda environment.
site_list( ) works as expected.
measurement_list( ) works as expected.
Example code:
import requests
import pandas as pd
from hilltoppy import hilltop
hts = (directory of my hts file)
mysites = ['Whanganui at d/s Taumarunui STP']
mymtypes = ['Nitrate (HRC) [Nitrate (HRC)]']
tsdata = hilltop.get_data(hts, mtypes=mymtypes, sites = mysites)
Which fails with the following output:
"Extraction failed for site Whanganui at d/s Taumarunui STP and mtype Nitrate (HRC) [Nitrate (HRC)]"
If I now run the example code below (without supplying a site name):
import requests
import pandas as pd
from hilltoppy import hilltop
hts = (directory of my hts file)
mysites = ['Whanganui at d/s Taumarunui STP']
mymtypes = ['Nitrate (HRC) [Nitrate (HRC)]']
tsdata = hilltop.get_data(hts, mtypes=mymtypes)
Then the error becomes
Name: Measurement, Length: 188, dtype: object
<class 'pandas.core.series.Series'> returned a result with an error set
Extraction failed for site 17
followed by a list of all sites names in the hts file.
Could you please guide me, where am I going wrong?
Thanks.
When the measurement name is different than the data source name, the get_data function fails to find the data. This is always the case for check data.
An example snippet of xml data returned by Hilltop is:
CheckSeries
…
Water Temperature Check
Here ItemName is not the same as the DataSource name. In the get_data function, this section looks for the ItemName which matches the data source:
for m in measurements:
    m_dict = {c.tag: convert_value(c.text) for c in m}
    m_name = m_dict.pop('ItemName')
    if measurement.lower() == m_name.lower():
However, if the ItemName (measurement name) and the data source (the input) do not match, this fails. For data which has a different measurement name than the data source name (which is always the case for check data AFAIK) multiple inputs are required. In the above example, the Hilltop call requires 'Water temperature' in the server call, but for the processing it needs to match ‘Water Temperature Check’ and not ‘Water Temperature’.
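One possible way to relax the matching, sketched with plain dictionaries instead of the parsed XML. The keys 'DataSource' and 'ItemName' and the function name find_measurement are assumptions made for illustration; the real get_data works on XML elements and may need a different approach.

```python
def find_measurement(requested, measurements):
    """Return the first entry matching `requested`, ignoring case.

    An exact ItemName match is preferred; if there is none, fall back
    to an entry whose data source name matches. Each entry is a dict
    with the hypothetical keys 'DataSource' and 'ItemName'.
    """
    target = requested.lower()
    for m in measurements:
        if m["ItemName"].lower() == target:
            return m
    for m in measurements:
        if m["DataSource"].lower() == target:
            return m
    return None


# Sample entry modelled on the check data example above.
measurements = [
    {"DataSource": "Water Temperature", "ItemName": "Water Temperature Check"},
]
match = find_measurement("Water temperature", measurements)
print(match)
```

The fallback returns the first entry for a data source, so a data source with several items would still need the caller to name the item explicitly.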
Hilltop outputs divisor values by which the result values should be divided to get the appropriate units. This is especially necessary for gauging results and totally unnecessary for water quality results. This should be implemented before the unit conversion task.
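A minimal sketch of the divisor step, assuming the divisor has already been read from the measurement metadata as a number. The function name apply_divisor and the sample divisor of 1000 are illustrative assumptions.

```python
def apply_divisor(values, divisor=None):
    """Divide each value by the divisor; return the values unchanged
    when no divisor is given or when it is 0 or 1."""
    if divisor in (None, 0, 1):
        return list(values)
    return [v / divisor for v in values]


# Invented stage readings, assumed to be stored as integers with a divisor of 1000.
stage_raw = [1250, 1300, 1275]
print(apply_divisor(stage_raw, divisor=1000))
print(apply_divisor(stage_raw))
```

Treating a missing divisor as a no-op means water quality results, which do not need one, pass through untouched.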
The MeasurementList response at https://data.hbrc.govt.nz/EnviroData/ContinuousArchive.hts?service=Hilltop&request=MeasurementList&Site=Tutaekuri%20River%20at%20Puketapu%20HBRC%20Site shows that the Gauging Results Stage measurement needs to be requested as Stage [Gauging Results], but this is not in the gauging_dict in web_service.py, so an error is raised.
Adding an extra row to the dictionary (line 24) fixes the issue. The row is:
'Stage [Gauging Results]': {'row': 'I1', 'multiplier': 0.001},
Note: there is a problem accessing Gauging Results from this server so it may be hard to test
Error when trying to use the Webserver version with Python 3.9.
ElementTree.Element.getchildren has been deprecated since Python 3.2 and was removed in Python 3.9.
The source is in webserver line 158:
children = s.getchildren
Replacing it with
children = list(s)
fixes the error, but may need a try/except to ensure backward compatibility.
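A short check of the replacement, using an invented XML snippet rather than a real Hilltop response. Iterating over an Element has been supported throughout Python 3, so list(s) should behave the same on old and new versions and a try/except fallback is probably not needed.

```python
import xml.etree.ElementTree as ET

# Invented sample XML; the element names are placeholders.
xml_text = "<Site><Name>A</Name><Easting>1</Easting></Site>"
s = ET.fromstring(xml_text)

# list(element) returns the direct children, as getchildren() used to.
children = list(s)
print([c.tag for c in children])
```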
The inputs of the function measurement_list are given by:
def measurement_list(base_url, hts, site, measurement=None, output='dataframe', timeout=60, **kwargs):
The doc string says the following about the output parameter:
output : dataframe or list of dict The output object. Must be either dataframe or list of dict.
However, the output variable is not used in the measurement_list function and the function always returns a dataframe.
Simple script to reproduce:
from hilltoppy import web_service as ws
base_url = "http://hilltop.gw.govt.nz/"
hts = "data.hts"
site = "Akatarawa River at Hutt Confluence"
print(
type(
ws.measurement_list(
base_url,
hts,
site,
output="dict",
)
)
)
This script should return a list of dictionaries, but instead returns a pandas dataframe.
The function get_data attempts to use this functionality as a backup way of obtaining the desired measurement. The following code throws an error when the if block is entered:
if 'Item' not in ds_dict1:
    ml = measurement_list(base_url, hts, site, measurement=measurement, output='dict', timeout=timeout)
    for m in ml:
        if m['MeasurementName'].lower() == measurement.lower():
            ds_dict1.update(m)
The Hilltop server will occasionally fail for many reasons and will return a ConnectionError from the requests package. I've had to handle this error in all of my downstream applications by providing retries if it receives this kind of error. I should just add this functionality to hilltop-py.
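A rough sketch of such a retry wrapper, written against the standard library only. The name call_with_retries and its parameters are hypothetical. The demo uses the built-in ConnectionError; the requests exception is a different class, so in practice one would pass exceptions=(requests.exceptions.ConnectionError,).

```python
import time


def call_with_retries(func, retries=3, wait=0.0, exceptions=(ConnectionError,)):
    """Call func() up to `retries` times, re-raising the last error
    if every attempt fails. The pause grows linearly with the attempt."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return func()
        except exceptions as err:
            last_error = err
            if attempt < retries:
                time.sleep(wait * attempt)
    raise last_error


# Simulated request that fails twice before succeeding.
calls = {"n": 0}


def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated failure")
    return "ok"


result = call_with_retries(flaky_request, retries=5)
print(result, calls["n"])
```

Only the listed exception types trigger a retry, so genuine errors such as a bad site name still surface immediately.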
Since version 2 development from #51, I'm seeing a few issues that were not present before.
Take this example:
import hilltoppy.web_service
base_url = "https://data.hbrc.govt.nz/Envirodata"
hts = "ContinuousArchive.hts"
site = "Well.MW2s Brookvale Road"
meas = "Elevation Above Sea Level [Recorder Water Level]"
ts_df = hilltoppy.web_service.get_data(
base_url, hts, site, meas,
agg_method="Average",
agg_interval="1 week")
With d61b18f from Windows with conda-forge installed dependencies the output is:
C:\Users\mtoews\AppData\Local\miniforge3\envs\pyforge\lib\site-packages\pydantic\_internal\_config.py:269: UserWarning: Valid config keys have changed in V2:
* 'json_dumps' has been removed
* 'json_loads' has been removed
warnings.warn(message, UserWarning)
Traceback (most recent call last):
  File "D:\src\hilltop-py\trythis.py", line 7, in <module>
    ts_df = hilltoppy.web_service.get_data(
  File "D:\src\hilltop-py\hilltoppy\web_service.py", line 396, in get_data
    ds_dict1 = orjson.loads(DataSource(**ds_dict).json(exclude_none=True))
  File "C:\Users\mtoews\AppData\Local\miniforge3\envs\pyforge\lib\site-packages\pydantic\main.py", line 159, in __init__
    __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
pydantic_core._pydantic_core.ValidationError: 1 validation error for DataSource
Interpolation
  Input should be 'Discrete','Instant','Incremental' or 'Event' [type=enum, input_value='Histogram', input_type=str]
and a variation on the above get_data example without agg_* methods shows a different error:
C:\Users\mtoews\AppData\Local\miniforge3\envs\pyforge\lib\site-packages\pydantic\_internal\_config.py:269: UserWarning: Valid config keys have changed in V2:
* 'json_dumps' has been removed
* 'json_loads' has been removed
warnings.warn(message, UserWarning)
D:\src\hilltop-py\hilltoppy\utils.py:193: UserWarning: Parsing dates in %Y-%m-%dT%H:%M:%S format when dayfirst=True was specified. Pass `dayfirst=False` or specify a format to silence this warning.
val = pd.to_datetime(val, dayfirst=True)
Traceback (most recent call last):
  File "D:\src\hilltop-py\trythis.py", line 7, in <module>
    ts_df = hilltoppy.web_service.get_data(
  File "D:\src\hilltop-py\hilltoppy\web_service.py", line 443, in get_data
    if m['MeasurementName'].lower() == measurement.lower():
TypeError: string indices must be integers
The same issues are found on Linux with pip installed dependencies.
Currently, the get_data function sends one request to the Hilltop server for data. The request may be for 2 data points or 2 million. I find that some Hilltop servers struggle when the request is really big (e.g. > 1,000,000 data points). I also find that parsing the giant XML into pandas dataframes is sluggish and makes the memory footprint of the dataframes larger than it should be.
Providing an option to chunk out a get_data request into multiple smaller requests based on a fixed number of years would solve these issues. I've already written some code for this in the past.
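A sketch of how the date range could be split, assuming fixed-size chunks measured in whole years. This is not the existing code mentioned above; the name year_chunks and the default of 5 years are placeholders.

```python
from datetime import datetime


def year_chunks(start, end, years=5):
    """Split [start, end] into consecutive (chunk_start, chunk_end)
    pairs, each spanning at most `years` years."""
    if years < 1:
        raise ValueError("years must be at least 1")
    chunks = []
    chunk_start = start
    while chunk_start < end:
        try:
            chunk_end = chunk_start.replace(year=chunk_start.year + years)
        except ValueError:
            # 29 February has no counterpart in a non-leap target year.
            chunk_end = chunk_start.replace(year=chunk_start.year + years, day=28)
        chunk_end = min(chunk_end, end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end
    return chunks


for chunk_start, chunk_end in year_chunks(datetime(2000, 1, 1), datetime(2012, 6, 1)):
    print(chunk_start.date(), chunk_end.date())
```

Adjacent chunks share a boundary timestamp, so if the server treats both ends as inclusive the combined results would need duplicate rows dropped.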
Hi Mike,
Been trying to use the get_data_quality function in Hilltop-py COM but the parameter "start" doesn't seem to work unless you put in a date (which doesn't seem to be a requirement).
The error seems to be around "DataStartTime".
File ~\Miniconda3_32bit\lib\site-packages\hilltoppy\com.py:290 in get_data_quality
start1 = wqr.DataStartTime
If I run first with a start date and then remove it and re-run, it will pull the full record, which is weird. If I start from a fresh console it will fail.
Another observation is the doc string requires ISO format but will also error if I put the time in.
start : str
The start date to retreive from the data in ISO format (e.g. '2011-11-30 00:00').
Thanks!
Emily
Hi Mike,
I've been attempting to use the package to pull data from the horizons hilltop server - https://envirodata.horizons.govt.nz/api/hilltop/data?
However it's failing due to the below as it doesn't go to an .hts file - can we get a workaround put in for this?
if not hts.endswith('.hts'):
raise ValueError('The hts file must end with .hts')
Cheers,
Darren
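One possible shape for a workaround, sketched under the assumption that hts could be made optional for servers that expose a single endpoint. The function build_request_base and the hts=None convention are hypothetical and not part of hilltop-py.

```python
def build_request_base(base_url, hts=None):
    """Return the URL that query parameters are appended to.

    When hts is None the base URL is used as the endpoint itself;
    otherwise the existing '.hts' check still applies.
    """
    base = base_url.rstrip("/")
    if hts is None:
        return base
    if not hts.endswith(".hts"):
        raise ValueError("The hts file must end with .hts")
    return base + "/" + hts


print(build_request_base("https://envirodata.horizons.govt.nz/api/hilltop/data"))
print(build_request_base("https://data.hbrc.govt.nz/Envirodata", "ContinuousArchive.hts"))
```

Keeping the check for the case where hts is given would preserve the current validation for ordinary Hilltop servers.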