uchic / dataretrieval
This project forked from doi-usgs/dataretrieval-python
Tools for downloading hydrologic and climate data.
License: Other
Current tests are lacking. There are two existing tests that make calls to the service with minimal checks.
Establish a vertical slice of tests that provides full test coverage of one service.
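One possible starting point for such a vertical slice is to test the parsing step against a canned response, so the tests do not depend on the live service. The sketch below assumes pytest-style test functions. `parse_rdb` and the fixture contents are hypothetical and are not part of dataretrieval; the fixture only mimics the general shape of the USGS RDB format (comment lines, a header row, a column-format row, then tab-separated data).

```python
import io

import pandas as pd

# Hypothetical canned response in RDB layout: "#" comment lines, a header
# row, a column-format row, and tab-separated data rows.
RDB_FIXTURE = "\n".join([
    "# Hypothetical fixture for offline testing",
    "agency_cd\tsite_no\tlev_dt\tlev_tm\tlev_va",
    "5s\t15s\t10d\t5d\t12s",
    "USGS\t375907091432201\t1985-06-12\t\t12.30",
    "USGS\t375907091432201\t1986-07-01\t10:15\t12.85",
])


def parse_rdb(text):
    """Parse RDB text into a DataFrame of strings (hypothetical helper)."""
    lines = [line for line in text.splitlines() if not line.startswith("#")]
    # lines[0] is the header; lines[1] is the format row, which is dropped.
    body = "\n".join([lines[0]] + lines[2:])
    # dtype=str keeps site numbers with leading zeros intact.
    return pd.read_csv(io.StringIO(body), sep="\t", dtype=str)


def test_parse_rdb_keeps_all_rows_and_columns():
    df = parse_rdb(RDB_FIXTURE)
    assert list(df.columns) == ["agency_cd", "site_no", "lev_dt", "lev_tm", "lev_va"]
    assert len(df) == 2
    # A row with a date but no time must still keep its date.
    assert df.loc[0, "lev_dt"] == "1985-06-12"
    assert pd.isna(df.loc[0, "lev_tm"])
```

A full slice for one service would add similar tests for each layer (URL construction, parsing, index handling), each fed from stored fixtures rather than network calls.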
Acceptance Criteria:
When date and time are parsed from separate columns, the information in the raw data is used to build a datetime object. It is common for certain rows to contain only partial information (e.g., a date without a time), and these rows end up as missing values (NaT). Because the original columns are not returned in the dataset, a significant part of the information is lost.
Functions with this issue: get_qwdata, get_gwlevels.
Here are two examples, one from each of the functions mentioned above:
from dataretrieval import nwis
site_id = "04024000"
data = nwis.get_qwdata(sites=site_id)
df = data[0]
Explore df and observe the missing (NaT) values in the index.
The raw data returned for this query contains no missing dates.
site_id = "375907091432201"
data = nwis.get_gwlevels(sites=site_id)
df = data[0]
Explore df and observe the missing (NaT) values in the index.
The raw data returned for this query contains no missing dates.
For every one of these missing values, in both functions, the raw data does contain a date. The output dataframe should retain all original columns so that no data is lost.
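A minimal sketch of building the datetime while keeping the source columns, assuming dates arrive as `YYYY-MM-DD` strings and times as `HH:MM` strings. The helper name `add_datetime_column` and the output column name are hypothetical, and the time zone column is ignored here for brevity. A row with a date but no time falls back to midnight instead of becoming NaT.

```python
import pandas as pd


def add_datetime_column(df, date_col="lev_dt", time_col="lev_tm",
                        out_col="datetime"):
    """Combine separate date and time columns into one datetime column.

    The original columns are kept. A row with a date but no time is parsed
    as midnight on that date, so only rows with no usable date become NaT.
    """
    out = df.copy()
    # Dates that are missing or not YYYY-MM-DD become NaT.
    date_part = pd.to_datetime(out[date_col], format="%Y-%m-%d", errors="coerce")
    # Missing times are treated as 00:00; "HH:MM" gets seconds appended.
    time_part = pd.to_timedelta(out[time_col].fillna("00:00") + ":00",
                                errors="coerce")
    out[out_col] = date_part + time_part.fillna(pd.Timedelta(0))
    return out
```

Because the source columns stay in the frame, a user can still tell a genuine midnight reading from a date-only record.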
Issue
Copy the datetime index into a separate column
To Reproduce
from dataretrieval.nwis import get_iv
rawData_today = get_iv(sites=site_id, parameterCd=parameterCode, startDt=today, endDt=today)
Expected behavior
This query should return unit value data for a specific parameter at a USGS NWIS monitoring site between a start and end date, with the datetime available both as the index and as a separate column.
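A minimal sketch of the requested behavior, applied to a frame after it is returned. It assumes the datetime lives either in a plain index or in a MultiIndex level named `datetime`. The helper name and the default column name `timestamp` are hypothetical; a name different from the index level is used because a column and index level sharing a name can make later calls such as `reset_index()` fail.

```python
import pandas as pd


def copy_index_to_column(df, level="datetime", column="timestamp"):
    """Copy the datetime index (or one MultiIndex level) into a column."""
    out = df.copy()
    if isinstance(out.index, pd.MultiIndex):
        # Multi-site results: take only the datetime level.
        out[column] = out.index.get_level_values(level)
    else:
        out[column] = out.index
    return out
```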
Issue
The get_iv function does not support the tz input parameter.
To Reproduce
from dataretrieval.nwis import get_iv
centralTime = get_iv(sites=site_id, parameterCd=parameterCode,
startDt="2014-10-10T12:00", endDt="2014-10-10T23:59",
zoneAbbreviation="S")
Expected behavior
This query should return unit value data for a specific parameter at a USGS NWIS monitoring site between a start and end date, with timestamps in the specified time zone.
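Until get_iv accepts a tz argument, one client-side workaround is to convert the returned index with pandas. This sketch assumes the index is a DatetimeIndex that is either timezone-aware or naive UTC; the helper name `to_timezone` is hypothetical, and nothing here is sent to the service. `America/Chicago` is the IANA name for US Central time.

```python
import pandas as pd


def to_timezone(df, tz="America/Chicago"):
    """Return a copy of df with its datetime index converted to tz.

    Naive timestamps are assumed to be UTC before conversion.
    """
    out = df.copy()
    index = out.index
    if index.tz is None:
        index = index.tz_localize("UTC")
    out.index = index.tz_convert(tz)
    return out
```

For example, 17:00 UTC on 2014-10-10 becomes 12:00 in Central Daylight Time, the start of the window in the query above.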
The following metadata needs to be added to the dataFrame returned from all functions that return a dataFrame:
More metadata will be specified later. Much of it depends on the endpoint being called, so we'll need a more thorough audit of the R tool to find out what this metadata is and implement it. For now, we should focus on the metadata mechanism itself.
pandas does not currently support embedded metadata. It has been under consideration for roughly the last seven years but has not been implemented. Suggestions and incomplete implementations exist, much of which is documented in pandas-dev/pandas#2485.
I'm in favor of monkey patching until pandas has an official implementation. With this approach, we simply add the metadata as attributes on the dataFrame before returning it, which is simple to implement. The major drawback is that the metadata isn't actually part of the dataFrame, so if a user runs a dataFrame method that creates a new dataFrame, the metadata will not be carried over. I don't believe this is a big deal but am open to input. We can easily provide a convenience function that copies our metadata between dataFrames. It will be easy to use, but users will need to know it exists and remember to call it.
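A minimal sketch of the monkey-patching approach and the proposed copy helper. The attribute name `metadata`, the field names, and both helper names are hypothetical. A `SimpleNamespace` is used rather than a dict because pandas warns when a list-like value (which includes a dict) is assigned to a new DataFrame attribute.

```python
from types import SimpleNamespace

import pandas as pd


def attach_metadata(df, **fields):
    """Attach metadata to a DataFrame as a plain attribute (monkey patching)."""
    # SimpleNamespace is not list-like, so pandas does not emit its
    # "columns via attribute" warning for this assignment.
    df.metadata = SimpleNamespace(**fields)
    return df


def copy_metadata(source, target):
    """Copy metadata from one DataFrame to another (convenience helper)."""
    if hasattr(source, "metadata"):
        target.metadata = SimpleNamespace(**vars(source.metadata))
    return target
```

The drawback described above shows up directly: filtering `df` produces a new DataFrame without the `metadata` attribute until `copy_metadata` is called on it.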
Acceptance Criteria:
The pcode endpoint does not work because the tool passes query parameters as a dictionary, which collapses the multiple "show" query parameters into only the last one provided. See the code here:
https://github.com/UCHIC/dataretrieval/blob/master/dataretrieval/nwis.py#L367
This will also have to be fixed in the query method:
dataretrieval/dataretrieval/nwis.py, line 163 (commit 224515c)
Modifying the query function may have a cascading effect and require a lot of refactoring. I think it's worth exploring, but since there are few tests, we may want to build tests around the library before doing this work.
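The collapse is easy to reproduce with the standard library alone. In this sketch the "show" values are placeholders, not the endpoint's real options. A dict keeps only the last value per key, while a list of tuples, or a dict with a list value encoded with `doseq=True`, produces the repeated keys the endpoint expects. The requests library accepts both of these forms for its `params` argument as well.

```python
from urllib.parse import urlencode

# Placeholder values; the real "show" options depend on the endpoint.
show_values = ["value_a", "value_b"]

# Building a plain dict: each assignment overwrites the previous one.
params = {}
for value in show_values:
    params["show"] = value
collapsed = urlencode(params)

# A list of (key, value) tuples keeps every repeated key.
repeated = urlencode([("show", value) for value in show_values])

# A dict with a list value also works when doseq=True.
from_lists = urlencode({"show": show_values}, doseq=True)
```

Switching the query method to one of the last two forms would keep existing single-valued parameters working, since a list of tuples can hold them unchanged.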
Acceptance Criteria:
Issue
The following functions return a dataframe with a datetime index when a single site is provided, but a dataframe with a MultiIndex (site/datetime) when more than one site is provided: get_qwdata, get_discharge_measurements, get_discharge_peaks, get_gwlevels, get_stats, get_dv, get_info, get_iv.
To Reproduce
Single site:
from dataretrieval import nwis
site_id = "434400121275801"
nwis.get_gwlevels(sites=site_id)
More than one site:
site_ids = ["434400121275801", "375907091432201"]
nwis.get_gwlevels(sites=site_ids)
Expected behavior
For consistency, both queries should return dataframes with the same index structure.
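One way to get a consistent shape is to always return a (site, datetime) MultiIndex. This sketch assumes that single-site frames carry the site ID in a `site_no` column and have a plain datetime index; the helper name `ensure_site_datetime_index` is hypothetical.

```python
import pandas as pd


def ensure_site_datetime_index(df, site_col="site_no", time_name="datetime"):
    """Return df indexed by (site, datetime) whether it covers one site or many."""
    if isinstance(df.index, pd.MultiIndex):
        # Multi-site results already have the target structure.
        return df
    # Name the datetime index, append the site column as a second level,
    # then put the site level first to match the multi-site layout.
    out = df.rename_axis(time_name)
    return out.set_index(site_col, append=True).swaplevel(0, 1)
```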
Set up continuous integration to run the tests and pylint on every pull request.
Acceptance Criteria:
Hello,
I have been relying on the USGS repo; however, it seems to have some issues. @sblack-usu, you appear to have developed that one and to be developing this one as well. Do you recommend I use this one instead?
PS: I have already made several minor PRs to the aforementioned repo and would very much like to help develop the package. Please advise on how I can best help.
Issue
The get_pmcodes function does not return correct results for any query.
To Reproduce
from dataretrieval import nwis
nwis.get_pmcodes(['00400'])
Expected behavior
This query should return information about parameter 00400
Issue
Add a MultiIndex argument (default: True)
To Reproduce
rawData_today = get_iv(sites=site_id, parameterCd=parameterCode, startDt=today, endDt=today, MultiIndex=False)
Expected behavior
This query should return unit value data for a specific parameter at a USGS NWIS monitoring site between a start and end date, with only the datetime column as the index.
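A minimal sketch of how such a flag could be applied to the returned frame. The helper `format_index` and its `multi_index` keyword are hypothetical stand-ins for the proposed `MultiIndex` argument. The site level is moved into a column with `reset_index` rather than dropped, so multi-site results stay unambiguous.

```python
import pandas as pd


def format_index(df, multi_index=True, site_level="site_no"):
    """Optionally flatten a (site, datetime) MultiIndex to a datetime index.

    With multi_index=True the frame is returned unchanged; with False the
    site level becomes a regular column and only datetime remains as the index.
    """
    if multi_index or not isinstance(df.index, pd.MultiIndex):
        return df
    return df.reset_index(level=site_level)
```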
Issue
Add date and time values to the "lev_dt", "lev_tm", and "lev_tz_cd" columns (get_gwlevels function)
To Reproduce
from dataretrieval.nwis import get_gwlevels
data = get_gwlevels(sites=site_id, start="1980-01-01", end="2000-12-31")
Expected behavior
This query should return groundwater level data for the specified USGS monitoring sites, with the date, time, and time zone values populated in three separate columns.
Acceptance Criteria: