
dataretrieval's People

Contributors

cjbas22, elmerehbi, jbousquin, jsta, nouri-1992, nouri1992, sblack-usu, thodson-usgs


dataretrieval's Issues

Establish testing framework and patterns

Current tests are lacking. There are two existing tests that make calls to the service with minimal checks.

Establish a vertical slice of tests that provides full test coverage of one service.

Acceptance Criteria:

  • Mocking framework is set up and used
  • Code coverage tool is set up and used
  • Testing should be around our code, which is 90% input parsing and validation (don't test the dataframe itself unless we do custom work on building it. This currently is only true for timeseries.)
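A minimal sketch of the kind of test these criteria describe, using the standard library's unittest.mock so that no call reaches the service. The names build_params, get_records, and the example URL are hypothetical stand-ins for illustration, not functions from this library; the point is only that the test exercises input parsing and checks what would have been sent.

```python
from unittest import mock


def build_params(sites, start=None, end=None):
    """Hypothetical input-parsing helper: turn arguments into query params."""
    if isinstance(sites, str):
        sites = [sites]
    if not sites:
        raise ValueError("at least one site is required")
    params = {"sites": ",".join(sites)}
    if start:
        params["startDT"] = start
    if end:
        params["endDT"] = end
    return params


def get_records(sites, fetch, **kwargs):
    """Hypothetical service wrapper; `fetch` performs the HTTP request."""
    return fetch("https://example.invalid/service", build_params(sites, **kwargs))


def test_get_records_builds_params_without_network():
    # The mock replaces the network call and records how it was invoked.
    fake_fetch = mock.Mock(return_value="raw text")
    result = get_records(["01", "02"], fake_fetch, start="2020-01-01")
    fake_fetch.assert_called_once_with(
        "https://example.invalid/service",
        {"sites": "01,02", "startDT": "2020-01-01"},
    )
    assert result == "raw text"
```

Passing the fetch function in as an argument is one design choice among several; patching a module-level function with mock.patch would serve the same purpose.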

When parsing date and time in get_qwdata and get_gwlevels, data get lost.

Issue

When parsing date and time from separate columns the information contained in the raw data is used to generate a datetime object. It is common that only partial information is available in certain rows (e.g., a date is available without a time). This results in missing values (NaT). Since the original columns containing the information are not returned in the dataset, a significant part of the information is lost.

Functions with this issue: get_qwdata, get_gwlevels.

To Reproduce

These are two examples from the two functions mentioned above:

get_qwdata. Run:

from dataretrieval import nwis

site_id = "04024000"
data = nwis.get_qwdata(sites=site_id)
df = data[0]
Explore df and observe the missing (NaT) values in the index.
Click here to see the raw data from this query (notice there are no missing dates here)

get_gwlevels. Run:

site_id = "375907091432201"
data = nwis.get_gwlevels(sites=site_id)
df = data[0]
Explore df and observe the missing (NaT) values in the index.
Click here to see the raw data from this query (notice there are no missing dates here)

Expected behavior

For all these missing values, in both functions, there is a date value available in the raw data queried. The output dataframe should contain all original columns to avoid losing data.
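One possible shape for the expected behavior, sketched on made-up data rather than a real query. The column names sample_dt, sample_tm, and result_va and the helper add_datetime_index are assumptions for illustration. Filling a missing time with midnight is also an assumption; the added time_missing flag records where that happened, and the original columns are kept so nothing is lost.

```python
import pandas as pd


def add_datetime_index(df, date_col="sample_dt", time_col="sample_tm"):
    """Build a datetime index while keeping the raw date and time columns."""
    out = df.copy()
    # Record which rows had no time before filling a placeholder.
    out["time_missing"] = out[time_col].isna()
    time_part = out[time_col].fillna("00:00")
    combined = out[date_col].str.cat(time_part, sep=" ")
    parsed = pd.to_datetime(combined, format="%Y-%m-%d %H:%M")
    out.index = pd.DatetimeIndex(parsed, name="datetime")
    return out


raw = pd.DataFrame(
    {
        "sample_dt": ["2020-01-01", "2020-01-02", "2020-01-03"],
        "sample_tm": ["10:30", None, "14:00"],  # second row has a date only
        "result_va": [1.2, 3.4, 5.6],
    }
)
parsed = add_datetime_index(raw)
```

With this approach the second row keeps a valid index entry instead of NaT, and the raw sample_dt and sample_tm columns remain available to the user.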

Copy datetime index column as a separate column

Issue
Copy the datetime index column as a separate column.

To Reproduce
rawData_today = get_iv(sites=site_id, parameterCd=parameterCode, startDt=today, endDt=today)

Expected behavior
This query should return unit value data for a specific parameter at a USGS NWIS monitoring site between a begin and end date, with the datetime available both as the index and as a separate column.
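Until the library does this itself, a user can copy the index into a column after the call. A small sketch on made-up data; the column name "00060" and the new column name datetime_col are illustrative assumptions. A name different from the index name is used so that a later reset_index does not collide with it.

```python
import pandas as pd

# Stand-in for a returned dataframe with a datetime index.
index = pd.DatetimeIndex(["2020-01-01 00:00", "2020-01-01 00:15"], name="datetime")
df = pd.DataFrame({"00060": [10.0, 11.5]}, index=index)

# Copy the index values into a regular column; the index itself is unchanged.
with_datetime = df.assign(datetime_col=df.index)
```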

Error with the get_iv function when querying with timezone

Issue
The tz input parameter is not supported in the get_iv function.

To Reproduce
centralTime = get_iv(sites=site_id, parameterCd=parameterCode,
                     startDt="2014-10-10T12:00", endDt="2014-10-10T23:59",
                     zoneAbbreviation="S")

Expected behavior
This query should return unit value data for a specific parameter at a USGS NWIS monitoring site between a begin and end date with a specified timezone
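As a possible user-side workaround while the parameter is unsupported, the index of a returned dataframe can be converted after the fact. This sketch assumes the returned index is timezone-aware and in UTC, which is not confirmed by this issue; the data and the choice of "America/Chicago" are made up for illustration.

```python
import pandas as pd

# Stand-in for a returned dataframe whose index is timezone-aware UTC.
utc_index = pd.DatetimeIndex(
    ["2014-10-10 17:00", "2014-10-10 17:15"], tz="UTC", name="datetime"
)
df = pd.DataFrame({"value": [1.0, 2.0]}, index=utc_index)

# Convert the index to US Central time; the instants themselves are unchanged.
central = df.tz_convert("America/Chicago")
```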

Add metadata to dataFrame

The following metadata needs to be added to the dataFrame returned from all functions that return a dataFrame:

  • url
  • queryTime

More metadata will be provided at a later time. Much of the metadata depends on the endpoint that is called, so we'll have to do a more thorough audit of the R tool to find out what this metadata is and how it is implemented. Right now we should just focus on the metadata implementation.

The pandas module currently does not have embedded metadata. It is something that has been under consideration (for the last 7 years or so) but has not been implemented. There are suggestions and incomplete implementations available, much of which is documented in pandas-dev/pandas#2485.

I'm in favor of monkey patching for now until pandas has an official implementation. With this approach, we'll just add metadata as fields to the dataFrame before returning it. This will be simple to implement. The major drawback is that the metadata isn't actually a part of the dataFrame, so if a user runs a dataFrame method that creates a new dataFrame, the metadata will not be carried over to it. I don't believe this is a big deal but am open to input. We can easily provide a convenience function that copies our metadata between dataFrames. It will be easy to use, but the user will need to know about it and call it.

Acceptance Criteria:

  • Establish an implementation for metadata in this library.
  • url and queryTime metadata is available on all dataFrames returned by this library.
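A rough sketch of the monkey-patching approach described above, including the convenience copy function. The helper names attach_metadata and copy_metadata and the example URL are hypothetical; only the field names url and queryTime come from this issue.

```python
from datetime import datetime, timezone

import pandas as pd

METADATA_FIELDS = ("url", "queryTime")


def attach_metadata(df, url, query_time=None):
    """Set metadata as plain attributes on the dataframe object."""
    df.url = url
    df.queryTime = query_time or datetime.now(timezone.utc)
    return df


def copy_metadata(source, target):
    """Copy whichever metadata fields exist on `source` onto `target`."""
    for field in METADATA_FIELDS:
        if hasattr(source, field):
            setattr(target, field, getattr(source, field))
    return target


df = attach_metadata(pd.DataFrame({"x": [1, 2, 3]}), "https://example.invalid/query")

# Filtering creates a new dataframe, so the attributes are not carried over.
subset = df[df["x"] > 1]
lost = not hasattr(subset, "url")

# The convenience function restores them explicitly.
copy_metadata(df, subset)
```

This shows the drawback mentioned above in practice: subset starts without the metadata, and the user has to call the copy function to bring it back.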

Fix pcode endpoint

The pcode endpoint does not work because the tool provides query parameters as a dictionary, which reduces the multiple "show" key query parameters to only the last one provided. See code here

https://github.com/UCHIC/dataretrieval/blob/master/dataretrieval/nwis.py#L367

This will also have to be fixed in the query method here

def query(url, **kwargs):

Modifying the query function may have a cascading effect causing a lot of refactoring... I think it's worth exploring, but considering there are few tests, we may want to establish tests around the library before doing such work.

Acceptance Criteria:

  • Fix the pmcode method to use a list of tuples for the query params.
  • Either refactor the query method to support list of tuples or create a new issue describing the work necessary.
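The underlying problem can be reproduced with the standard library alone. In this sketch the parameter values are illustrative placeholders rather than the exact ones used by the pcode endpoint; it only shows that a dictionary keeps one value per key while a list of tuples keeps every repeated "show" entry.

```python
from urllib.parse import urlencode

# A dict literal with a repeated key silently keeps only the last value.
as_dict = {"format": "rdb", "show": "parameter_group_nm", "show": "parameter_nm"}

# A list of (key, value) tuples preserves every occurrence, in order.
as_tuples = [
    ("format", "rdb"),
    ("show", "parameter_group_nm"),
    ("show", "parameter_nm"),
]

dict_query = urlencode(as_dict)      # format=rdb&show=parameter_nm
tuple_query = urlencode(as_tuples)   # format=rdb&show=parameter_group_nm&show=parameter_nm
```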

Function output is inconsistent (different number of columns) when querying one/multiple sites.

Issue

The following functions: get_qwdata, get_discharge_measurements, get_discharge_peaks, get_gwlevels, get_stats, get_dv, get_info, get_iv will output a dataframe with a datetime index when a single site is provided and a dataframe with a MultiIndex (site/datetime) when more than 1 site is provided.

To Reproduce

Single site:
site_id = "434400121275801"
nwis.get_gwlevels(sites=site_id)

More than one site:
site_ids = ["434400121275801", "375907091432201"]
nwis.get_gwlevels(sites=site_ids)

Expected behavior
For consistency, both queries should return similar dataframes.
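One way to get the same structure in both cases is to always index by site and datetime, regardless of how many sites were requested. A sketch on made-up data; the helper index_by_site_and_time and the column names site_no, datetime, and lev_va are assumptions for illustration.

```python
import pandas as pd


def index_by_site_and_time(df, site_col="site_no", time_col="datetime"):
    """Always return a (site, datetime) MultiIndex, even for one site."""
    return df.set_index([site_col, time_col]).sort_index()


single = pd.DataFrame(
    {
        "site_no": ["A", "A"],
        "datetime": pd.to_datetime(["2020-01-01", "2020-01-02"]),
        "lev_va": [1.0, 1.1],
    }
)
multiple = pd.DataFrame(
    {
        "site_no": ["A", "B"],
        "datetime": pd.to_datetime(["2020-01-01", "2020-01-01"]),
        "lev_va": [1.0, 2.0],
    }
)

single_indexed = index_by_site_and_time(single)
multiple_indexed = index_by_site_and_time(multiple)
```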

Setup Continuous Integration

Set up continuous integration to run tests and pylint on each Pull Request.

Acceptance Criteria:

  • When a Pull Request is submitted, a CI job should run, and the PR should be blocked until reports of test results, code coverage, and pylint are provided and their thresholds are met.

Active repo?

Hello,

I have been relying on the USGS repo however it seems to have some issues. @sblack-usu You seem to have developed that one and seem to be developing this one as well. Do you recommend I use this instead?

PS: I already made several minor PRs on the aforementioned repo and would very much like to help in developing the package. So please advise accordingly how best I can help.

Error with the get_pmcodes function

Issue
The get_pmcodes function does not return correct results for any query.

To Reproduce
nwis.get_pmcodes(['00400'])

Expected behavior
This query should return information about parameter 00400

Adding Multi-index as an argument, Default = True

Issue
Adding Multi-index as an argument, Default = True

To Reproduce
rawData_today = get_iv(sites=site_id, parameterCd=parameterCode, startDt=today, endDt=today, MultiIndex=False)

Expected behavior
This query should return unit value data for a specific parameter at a USGS NWIS monitoring site between a begin and end date with datetime column only as an index
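A sketch of what such a switch could do internally, starting from a dataframe indexed by site and datetime. The function format_index and its multi_index argument are hypothetical (the spelling differs from the MultiIndex keyword proposed above), as are the column and level names.

```python
import pandas as pd


def format_index(df, multi_index=True, site_level="site_no"):
    """Keep the (site, datetime) index, or move the site level to a column."""
    if multi_index:
        return df
    return df.reset_index(level=site_level)


frame = pd.DataFrame(
    {
        "site_no": ["A", "A"],
        "datetime": pd.to_datetime(["2020-01-01", "2020-01-02"]),
        "value": [1.0, 2.0],
    }
).set_index(["site_no", "datetime"])

flat = format_index(frame, multi_index=False)
```

With multi_index=False the site identifier becomes an ordinary column and only the datetime remains as the index.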

Set up PEP 8

Acceptance Criteria:

  • Set up formatting standards and run the project through pylint with PEP 8 standards.
  • Provide documentation of the standards set.
