
hilltop-py's Introduction

Repository for Hilltop Python tools

This git repository contains the Hilltop Python tools and associated documentation.

Documentation

The primary documentation for the package can be found here.

hilltop-py's People

Contributors

jeffcnz, mullenkamp, mwtoews


hilltop-py's Issues

Handling and conversion of units

Currently, hilltop-py returns units via the measurement_list function, but the package does not do any handling or conversion of units. Measurement types can come in a variety of units with little consistency.
The main issue that I have seen in the results is that they may not always return units. This seems common amongst the gauging results.

get_data function to optionally return some measurement_list results

There are use cases where people want more than just the time series data results returned when calling the get_data function. An option to provide some measurement_list associated data to the output of the get_data function would cover those use cases.
A function parameter of a list of str to the get_data function should provide this. Where the strings are the column names of the measurement_list results.

Return units with results

Units aren't currently returned with the measurement results. They are available in the DataSource header information. Adding a units column to the returned dataframe would allow the measurement units to be viewed and compared, and conversions scripted if required. Gauging results would need to have the units hard coded into the dictionary.

DLL errors

Hi,

hilltop-py is a great idea. Thanks for putting it together. We've recently purchased a subscription to Hilltop, but the python scripting doesn't seem to work. I have installed hilltop-py through pip, and can import hilltop. However, I get an error message:

ImportError: DLL load failed: The specified module could not be found

when Hilltop is imported. The path to the Hilltop.pyd file is correctly specified in the PYTHONPATH and I have run the installation in Hilltop Manager.

Are there any particular DLLs the Hilltop.pyd requires? I have also installed pywin32.

Thanks,

Neil

measurement_list functions return lowercase results

The measurement_list functions in web_service and the Hilltop class now convert MeasurementNames to lower case (they didn't use to). This can cause issues with downstream processes if they are case sensitive.

Please could an option be added to choose between lowercase names and names in the case provided by the server?
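A minimal sketch of what such an option might look like. The `normalise_names` helper and its `lowercase` flag are hypothetical names used for illustration only; they are not part of hilltop-py.

```python
def normalise_names(names, lowercase=False):
    """Return measurement names, lower-cased only when asked to.

    Hypothetical helper for illustration; not part of hilltop-py.
    """
    if lowercase:
        return [name.lower() for name in names]
    # Default: keep the names exactly as they were provided.
    return list(names)


names = ["Water Temperature", "Stage [Gauging Results]"]
as_provided = normalise_names(names)
lowered = normalise_names(names, lowercase=True)
```

Defaulting to the provided case would keep case-sensitive downstream processes working, while callers that want lowercase can still opt in.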

Provide all results from the measurement_list function

There are several results in the Hilltop XML response that are not being parsed and returned by hilltop-py. These include DataSource, Item and Divisor. These should come through in the DataFrame results for the measurement_list function.

pandas date parsing warning

With the latest master (df3d747), I'm seeing many repetitive warning messages:

d:\src\hilltop-py\hilltoppy\utils.py:193: UserWarning: Parsing dates in %Y-%m-%dT%H:%M:%S format when dayfirst=True was specified. Pass dayfirst=False or specify a format to silence this warning.
val = pd.to_datetime(val, dayfirst=True)

Shouldn't dayfirst=False since hilltoppy.web_service.get_data specifies a "format 2001-01-01"?
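As a rough illustration of the alternative the warning itself suggests, the sketch below passes an explicit format to pd.to_datetime. The sample value is made up, and the sketch assumes the value always arrives in the `%Y-%m-%dT%H:%M:%S` form named in the warning.

```python
import pandas as pd

# Made-up sample in the format named by the warning message.
val = "2001-01-01T12:30:00"

# An explicit format leaves nothing for pandas to guess,
# so the dayfirst setting no longer matters.
ts = pd.to_datetime(val, format="%Y-%m-%dT%H:%M:%S")

# With year-first input, dayfirst=False parses to the same timestamp.
ts_default = pd.to_datetime(val, dayfirst=False)
```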

Extraction failed: site name issues

Hi, I'm having issues with the get_data() command.
Python version: 3.6.13
Running on a new conda environment.

site_list( ) works as expected.
measurement_list( ) works as expected.

Example code:

import requests
import pandas as pd
from hilltoppy import hilltop

hts = (directory of my hts file)
mysites = ['Whanganui at d/s Taumarunui STP']
mymtypes = ['Nitrate (HRC) [Nitrate (HRC)]']
tsdata = hilltop.get_data(hts, mtypes=mymtypes, sites = mysites)

Which fails with the following output:
"Extraction failed for site Whanganui at d/s Taumarunui STP and mtype Nitrate (HRC) [Nitrate (HRC)]"

If I now run the example code below (without supplying a site name):

import requests
import pandas as pd
from hilltoppy import hilltop

hts = (directory of my hts file)
mysites = ['Whanganui at d/s Taumarunui STP']
mymtypes = ['Nitrate (HRC) [Nitrate (HRC)]']
tsdata = hilltop.get_data(hts, mtypes=mymtypes)

Then the error becomes

Name: Measurement, Length: 188, dtype: object
<class 'pandas.core.series.Series'> returned a result with an error set
Extraction failed for site 17

followed by a list of all site names in the hts file.

Could you please guide me, where am I going wrong?
Thanks.

Unable to read check data

When the measurement name is different than the data source name, the get_data function fails to find the data. This is always the case for check data.

An example snippet of xml data returned by Hilltop is:

CheckSeries

Water Temperature Check

Here ItemName is not the same as the DataSource name. In the get_data function, this section looks for the ItemName which matches the data source:

    for m in measurements:
        m_dict = {c.tag: convert_value(c.text) for c in m}
        m_name = m_dict.pop('ItemName')

        if measurement.lower() == m_name.lower():

However, if the ItemName (measurement name) and the data source (the input) do not match, this fails. For data which has a different measurement name than the data source name (which is always the case for check data AFAIK) multiple inputs are required. In the above example, the Hilltop call requires 'Water temperature' in the server call, but for the processing it needs to match ‘Water Temperature Check’ and not ‘Water Temperature’.

Unable to access Stage [Gauging Results]

https://data.hbrc.govt.nz/EnviroData/ContinuousArchive.hts?service=Hilltop&request=MeasurementList&Site=Tutaekuri%20River%20at%20Puketapu%20HBRC%20Site has the Gauging Results Stage measurement, which needs to be requested as Stage [Gauging Results], but this is not in the gauging_dict in web_service.py, so an error is raised.

Adding an extra row to the dictionary (line 24) fixes the issue. The row is:

'Stage [Gauging Results]': {'row': 'I1', 'multiplier': 0.001},

Note: there is a problem accessing Gauging Results from this server so it may be hard to test

Measurement list does not return dict

The inputs of the function measurement_list are given by:

def measurement_list(base_url, hts, site, measurement=None, output='dataframe', timeout=60, **kwargs):

The doc string says the following about the output parameter:

output : dataframe or list of dict
    The output object. Must be either dataframe or list of dict.

However, the output variable is not used in the measurement_list function and the function always returns a dataframe.

Simple script to reproduce:

from hilltoppy import web_service as ws

base_url = "http://hilltop.gw.govt.nz/"
hts = "data.hts"
site = "Akatarawa River at Hutt Confluence"
print(
    type(
        ws.measurement_list(
            base_url,
            hts,
            site,
            output="dict",
        )
    )
)

This script should return a list of dictionaries, but instead returns a pandas dataframe.

The function get_data attempts to use this functionality as a backup way of obtaining the desired measurement. The following code throws an error when the if block is entered:

if 'Item' not in ds_dict1:
    ml = measurement_list(base_url, hts, site, measurement=measurement, output='dict', timeout=timeout)
    for m in ml:
        if m['MeasurementName'].lower() == measurement.lower():
            ds_dict1.update(m)
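A small sketch of why the loop above would break when a dataframe comes back, and of one way the output parameter could be honoured. The rows are made-up stand-ins for a measurement_list result, and `format_output` is a hypothetical helper, not part of hilltop-py.

```python
import pandas as pd

# Made-up rows standing in for a measurement_list result.
df = pd.DataFrame({
    "MeasurementName": ["Flow", "Water Temperature"],
    "Units": ["m3/s", "degC"],
})

# Iterating a DataFrame yields its column labels (strings), not rows,
# so m['MeasurementName'] in the loop above would index into a str.
labels = [m for m in df]

# to_dict('records') produces one dict per row.
records = df.to_dict("records")


def format_output(df, output="dataframe"):
    """Hypothetical helper that respects the documented output parameter."""
    if output == "dataframe":
        return df
    if output == "dict":
        return df.to_dict("records")
    raise ValueError("output must be 'dataframe' or 'dict'")
```

With a list of records, the case-insensitive comparison on `m['MeasurementName']` works as the calling code expects.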

retries for ConnectionErrors

The Hilltop server will occasionally fail for many reasons and will return a ConnectionError from the requests package. I've had to handle this error in all of my downstream applications by retrying the request when this kind of error is received. I should just add this functionality to hilltop-py.
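A minimal sketch of such a retry wrapper, using only the standard library. `call_with_retries` and its parameters are hypothetical names, not part of hilltop-py. The demo uses the built-in ConnectionError; the one raised by requests is a different class, so for requests-based calls it would need to be passed explicitly through `exceptions`.

```python
import time


def call_with_retries(func, *args, retries=3, delay=1.0,
                      exceptions=(ConnectionError,), **kwargs):
    """Call func, retrying when one of the given exceptions is raised.

    Hypothetical helper for illustration. For requests-based code, pass
    exceptions=(requests.exceptions.ConnectionError,).
    """
    for attempt in range(1, retries + 1):
        try:
            return func(*args, **kwargs)
        except exceptions:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            time.sleep(delay * attempt)  # wait a little longer each time


# Demo: a stand-in request that fails twice before succeeding.
attempts = []


def flaky_request():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("server dropped the connection")
    return "ok"


result = call_with_retries(flaky_request, delay=0)
```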

Issues with version 2

Since version 2 development from #51, I'm seeing a few issues that were not present before.

For example:

import hilltoppy.web_service

base_url = "https://data.hbrc.govt.nz/Envirodata"
hts = "ContinuousArchive.hts"
site = "Well.MW2s Brookvale Road"
meas = "Elevation Above Sea Level [Recorder Water Level]"
ts_df = hilltoppy.web_service.get_data(
    base_url, hts, site, meas,
    agg_method="Average",
    agg_interval="1 week")

With d61b18f from Windows with conda-forge installed dependencies the output is:

C:\Users\mtoews\AppData\Local\miniforge3\envs\pyforge\lib\site-packages\pydantic\_internal\_config.py:269: UserWarning: Valid config keys have changed in V2:
* 'json_dumps' has been removed
* 'json_loads' has been removed
  warnings.warn(message, UserWarning)
Traceback (most recent call last):
  File "D:\src\hilltop-py\trythis.py", line 7, in <module>
    ts_df = hilltoppy.web_service.get_data(
  File "D:\src\hilltop-py\hilltoppy\web_service.py", line 396, in get_data
    ds_dict1 = orjson.loads(DataSource(**ds_dict).json(exclude_none=True))
  File "C:\Users\mtoews\AppData\Local\miniforge3\envs\pyforge\lib\site-packages\pydantic\main.py", line 159, in __init__
    __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
pydantic_core._pydantic_core.ValidationError: 1 validation error for DataSource
Interpolation
  Input should be 'Discrete','Instant','Incremental' or 'Event' [type=enum, input_value='Histogram', input_type=str]

and a variation on the above get_data example without agg_* methods shows a different error:

C:\Users\mtoews\AppData\Local\miniforge3\envs\pyforge\lib\site-packages\pydantic\_internal\_config.py:269: UserWarning: Valid config keys have changed in V2:
* 'json_dumps' has been removed
* 'json_loads' has been removed
  warnings.warn(message, UserWarning)
D:\src\hilltop-py\hilltoppy\utils.py:193: UserWarning: Parsing dates in %Y-%m-%dT%H:%M:%S format when dayfirst=True was specified. Pass `dayfirst=False` or specify a format to silence this warning.
  val = pd.to_datetime(val, dayfirst=True)
Traceback (most recent call last):
  File "D:\src\hilltop-py\trythis.py", line 7, in <module>
    ts_df = hilltoppy.web_service.get_data(
  File "D:\src\hilltop-py\hilltoppy\web_service.py", line 443, in get_data
    if m['MeasurementName'].lower() == measurement.lower():
TypeError: string indices must be integers

The same issues are found on Linux with pip-installed dependencies.

Chunk get_data requests

Currently, the get_data function sends one request to the Hilltop server for data. The request may be for 2 data points or 2 million. I find that some Hilltop servers struggle when the request is really big (e.g. > 1,000,000 data points). I also find that parsing the giant XML into pandas dataframes is sluggish and makes the memory footprint of the dataframes larger than it should be.
Providing an option to chunk out a get_data request into multiple smaller requests based on a fixed number of years would solve these issues. I've already written some code for this in the past.
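The date-splitting part of this idea could be sketched as follows. This is not the code mentioned above; `year_chunks` and its `years` parameter are hypothetical. Each pair would become the from/to range of one smaller request.

```python
from datetime import datetime


def year_chunks(start, end, years=5):
    """Split [start, end] into consecutive (from, to) pairs of at most `years` years.

    Hypothetical helper for illustration. Adjacent chunks share a boundary
    timestamp, so a caller combining results may need to drop duplicates.
    """
    chunks = []
    chunk_start = start
    while chunk_start < end:
        try:
            chunk_end = chunk_start.replace(year=chunk_start.year + years)
        except ValueError:
            # 29 February does not exist in the target year: fall back to the 28th.
            chunk_end = chunk_start.replace(year=chunk_start.year + years, day=28)
        chunk_end = min(chunk_end, end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end
    return chunks


chunks = year_chunks(datetime(2000, 1, 1), datetime(2012, 6, 1), years=5)
```

Here a twelve-and-a-half-year range becomes three requests: two full five-year chunks and a shorter final one.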

get_data_quality erroring for start date in com

Hi Mike,

Been trying to use the get_data_quality function in Hilltop-py COM but the parameter "start" doesn't seem to work unless you put in a date (which doesn't seem to be a requirement).
The error seems to be around "DataStartTime".
File ~\Miniconda3_32bit\lib\site-packages\hilltoppy\com.py:290 in get_data_quality
start1 = wqr.DataStartTime

If I run it first with a start date and then remove the date and re-run, it will pull the full date record, which is weird. If I start from a fresh console it will fail.

Another observation is the doc string requires ISO format but will also error if I put the time in.
start : str
The start date to retreive from the data in ISO format (e.g. '2011-11-30 00:00').

Thanks!
Emily

horizons error not .hts

Hi Mike,

I've been attempting to use the package to pull data from the horizons hilltop server - https://envirodata.horizons.govt.nz/api/hilltop/data?

However it's failing due to the below as it doesn't go to an .hts file - can we get a workaround put in for this?

if not hts.endswith('.hts'):
    raise ValueError('The hts file must end with .hts')

Cheers,
Darren
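One possible shape for a workaround is sketched below: a URL builder that treats the hts file as optional. `build_hilltop_url` is a hypothetical helper, not part of hilltop-py, and the sketch assumes the Horizons endpoint accepts the same `service` and `request` query parameters as the .hts-based servers.

```python
from urllib.parse import quote, urlencode


def build_hilltop_url(base_url, hts=None, **params):
    """Build a Hilltop request URL; hts is optional.

    Hypothetical helper for illustration only.
    """
    url = base_url.rstrip("/?")
    if hts:
        # Only servers that expose an .hts file need this path segment.
        url = url + "/" + hts
    query = {"service": "Hilltop", **params}
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return url + "?" + urlencode(query, quote_via=quote)


horizons_url = build_hilltop_url(
    "https://envirodata.horizons.govt.nz/api/hilltop/data?",
    request="MeasurementList",
)
hbrc_url = build_hilltop_url(
    "https://data.hbrc.govt.nz/EnviroData",
    "ContinuousArchive.hts",
    request="MeasurementList",
)
```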
