
An R extension to retrieve EPA Air Quality System data via the AQS Data Mart API

License: Other


raqsapi's Introduction

Introduction to the RAQSAPI package

Clinton Mccrowey, physical scientist - US EPA

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

EPA Disclaimer

This software/application was developed by the U.S. Environmental Protection Agency (USEPA). No warranty expressed or implied is made regarding the accuracy or utility of the system, nor shall the act of distribution constitute any such warranty. The USEPA has relinquished control of the information and no longer has responsibility to protect the integrity, confidentiality or availability of the information. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by the USEPA. The USEPA seal and logo shall not be used in any manner to imply endorsement of any commercial product or activity by the USEPA or the United States Government.

Warning: US EPA’s AQS Data Mart API V2 is currently in the beta phase of
development and the API interface has not been finalized. This means that
certain functionality of the API may change or be removed without notice. As a
result, this package is also currently marked as beta and may change to reflect
changes made to the Data Mart API or improvements in the design, functionality,
quality and documentation of this package. The authors assume no liability for
any problems that may occur as a result of using this package, the Data Mart
service, or any software, service, hardware, or user accounts that may utilize
this package.

Introduction

The RAQSAPI package allows an R programming environment to connect to and retrieve data directly from the United States Environmental Protection Agency’s (US EPA) Air Quality System (AQS) Data Mart API v2 (1) interface. The package spares the data user several recurring chores: coercing data from a JSON object to a usable R object, retrieving multiple years of data, formatting API requests, retrieving results, handling credentials, requesting data for multiple pollutants and rate limiting data requests. All of the basic functionality available from the AQS Data Mart API server has been implemented. The library connects to the AQS Data Mart API via Hypertext Transfer Protocol Secure (HTTPS), so there is no need to install external ODBC drivers, configure ODBC connections or deal with the security vulnerabilities associated with them.

Most API functions have a parameter, return_header, which is set to FALSE by default. If return_header is set to TRUE, the function returns an AQS_DATAMART_APIv2 S3 object, which is a two-item named list. The first item ($Header) is a tibble (2) that contains the header information: the status of the request (success/fail), any error messages returned from the API, the URL used in the request, a date and time stamp noting when the request was received, and other useful information. The second item ($Data) is a tibble that contains the data being requested. With return_header set to FALSE (the default), a simple tibble containing only the $Data portion of the request is returned. After each call to the API a five second stall is invoked to help prevent overloading the Data Mart API server and to serve as a simple rate limit (footnote 1).
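As a rough illustration of the two-item shape described above, the sketch below builds a mock object by hand instead of calling the API. The column names are assumptions chosen for the example, base data.frames stand in for tibbles, and none of this is RAQSAPI's own code.

```r
# Mock of the two-item list returned when return_header = TRUE.
# Hypothetical values; a real object comes from an aqs_* function call.
result <- structure(
  list(
    Header = data.frame(status = "Success", rows = 2L,
                        stringsAsFactors = FALSE),
    Data = data.frame(parameter_code = c("44201", "44201"),
                      sample_measurement = c(0.025, 0.026),
                      stringsAsFactors = FALSE)
  ),
  class = "AQS_DATAMART_APIv2"
)

# Inspect the header first, then use the data only if the request succeeded.
if (identical(result$Header$status, "Success")) {
  measurements <- result$Data
}
```

Checking $Header before touching $Data is a design choice for the sketch; with the default return_header = FALSE there is no header to inspect and the tibble is returned directly.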

About the timeliness of AQS Data

EPA’s AQS Data Mart API, the service that RAQSAPI retrieves data from, does not host real-time (collected now/today) data. If real-time data is needed, please use the AirNow API and direct all questions about real-time data there. RAQSAPI does not work with AirNow and cannot retrieve real-time data. For more details see section 7.1 of the About AQS Data page (3).

Installing RAQSAPI

Either install the stable version from CRAN or install the latest development version from GitHub.

Option 1: Installing the stable version from CRAN

install.packages(pkgs="RAQSAPI", dependencies = TRUE )

Option 2: Installing the development version of RAQSAPI

To install the development version of RAQSAPI, first install the remotes package and its dependencies if they are not already installed. Then run the following in an R environment.

remotes::install_github(repo = "USEPA/raqsapi",
                        dependencies = TRUE,
                        upgrade = "always",
                        build = TRUE,
                        #optional, set TRUE if the manual is desired,
                        #requires pandoc
                        build_manual = FALSE,
                        build_vignettes = TRUE 
                        )

Using The RAQSAPI library

Load RAQSAPI

After successfully installing the RAQSAPI package, load the RAQSAPI library:

library(RAQSAPI)

Signing up and setting up user credentials with the RAQSAPI library

If you have not already done so, you will need to sign up with AQS Data Mart using the aqs_sign_up function (footnote 2). This function takes one input, “email”, an R character object that represents the email address that you want to use as a user credential for the AQS Data Mart service. After a successful call to aqs_sign_up, an email message containing a new Data Mart key is sent to the address provided; this key is the credential used to access the Data Mart API. The aqs_sign_up function can also be used to regenerate a key for an existing user: call aqs_sign_up with the parameter “email” set to the address of an existing account, and a new key will be e-mailed to that address.

The credentials used to access the Data Mart API service are stored in an R environment variable that needs to be set every time the RAQSAPI library is attached or the key is changed. Without valid credentials, the Data Mart server will reject any request sent to it. The Data Mart key is not a password, and RAQSAPI does not treat it as one: the key is stored in plain text and no attempt is made to encrypt Data Mart credentials, as would be done for a username and password combination. The key is not intended for authentication, only for account monitoring. Each time RAQSAPI is loaded, and before using any of its functions, use the aqs_credentials function (footnote 3) to enter the user credentials so that RAQSAPI can access the AQS Data Mart server.

Note: The credentials used to access the AQS Data Mart
API are not the same as the credentials used to access AQS. AQS users who do
not have access to the AQS Data Mart will need to create new credentials.

(suggested) Use the keyring package to manage credentials

It is highly suggested that users use a keyring manager to store and retrieve their credentials while using RAQSAPI. One such credential manager is provided by the keyring package (footnote 4). The keyring package uses the credential manager available on most popular operating systems to store and manage user credentials. This helps avoid hard coding credential information into R scripts.

To use the keyring package with RAQSAPI first install keyring:

install.packages("keyring")

Ensure that your system is supported by the keyring package before proceeding.

  keyring::has_keyring_support()

Then set the keyring used to access AQS Data Mart (make sure to replace the text in the angled brackets with your specific user information):

  library("keyring")
  keyring::key_set(service = "AQSDatamart",
                   username = "<user email account>")

A popup window will appear for the user to input their keyring information. Enter the AQS Data Mart credential key associated with the AQS user name provided, then hit enter. The AQS Data Mart user credential is now set using keyring.

To retrieve the key for use with RAQSAPI, load the keyring package and use the function key_get to return the user credential to RAQSAPI:

  library(RAQSAPI)
  library(keyring)
  datamartAPI_user <- "<user email account>"
  server <- "AQSDatamart"

Then pass these variables to the aqs_credentials function when using RAQSAPI:

  aqs_credentials(username = datamartAPI_user,
                  key = key_get(service = server,
                                username = datamartAPI_user
                                )
                  )

To change the key stored with the keyring package, repeat the steps above and call the keyring::key_set function again with the new credential information.

To retrieve a list of all keys managed with the keyring package, use the function keyring::key_list().

Refer to the keyring package documentation for an in-depth explanation of using the keyring package.

Information: AQS Data Mart API restricts the
maximum amount of monitoring data to one full year of data per API
call.

RAQSAPI functions are named according to the service and filter variables that are available from the Data Mart API (footnote 5).

Data Mart aggregate functions

These functions retrieve aggregated data from the Data Mart API and are grouped by how each function aggregates the data. There are 7 different families of related aggregate functions in which the AQS Data Mart API groups data.

These seven families are:

  • _by_site
  • _by_county
  • _by_state
  • _by_<latitude/longitude bounding box> (_by_box)
  • _by_<monitoring agency> (_by_MA)
  • _by_<Primary Quality Assurance Organization> (_by_pqao)
  • _by_<core based statistical area (as defined by the
    US census Bureau)> (_by_cbsa).

Within these families of aggregated data functions there are functions that call on the 14 different aggregate services that the Data Mart API provides. Note that not all aggregations are available for each service.

These fourteen services are:

  • Monitors (aqs_monitors_by_*)
  • Sample Data (aqs_sampledata_by_*)
  • Daily Summary Data (aqs_dailysummary_by_*)
  • Annual Summary Data (aqs_annualsummary_by_*)
  • Quarterly Summary Data (aqs_quarterlysummary_by_*)
  • Quality Assurance - Blanks Data (aqs_qa_blanks_by_*)
  • Quality Assurance - Collocated Assessments (aqs_qa_collocated_assessments_by_*)
  • Quality Assurance - Flow Rate Verifications (aqs_qa_flowrateverification_by_*)
  • Quality Assurance - Flow Rate Audits (aqs_qa_flowrateaudit_by_*)
  • Quality Assurance - One Point Quality Control Raw Data (aqs_qa_one_point_qc_by_*)
  • Quality Assurance - PEP Audits (aqs_qa_pep_audit_by_*)
  • Transaction Sample - AQS Submission data in transaction Format (RD) (aqs_transactionsample_by_*)
  • Quality Assurance - Annual Performance Evaluations
    (aqs_qa_annualperformanceeval_by_*)
  • Quality Assurance - Annual Performance Evaluations in the AQS
    Submission transaction format (RD) (aqs_qa_annualperformanceevaltransaction_by_*)

Information: AQS Data Mart API restricts the
maximum amount of monitoring data to one full year of data per
API call. These functions are able to return multiple years of data by
making repeated calls to the API. Each call to the Data Mart API will take
time to complete. The more years of data being requested the longer RAQSAPI
will take to return the results.
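To make the one-year-per-call limit concrete, the sketch below splits a date range into calendar-year chunks, one chunk per API call. The function split_by_year is hypothetical and written only for illustration; it is not RAQSAPI's internal implementation, which may divide requests differently.

```r
# Hypothetical helper: split a date range into one row per calendar year.
split_by_year <- function(bdate, edate) {
  stopifnot(inherits(bdate, "Date"), inherits(edate, "Date"), bdate <= edate)
  years <- seq(as.integer(format(bdate, "%Y")), as.integer(format(edate, "%Y")))
  starts <- as.Date(paste0(years, "-01-01"))
  ends <- as.Date(paste0(years, "-12-31"))
  # Clip the first and last chunk to the requested range.
  starts[1] <- bdate
  ends[length(ends)] <- edate
  data.frame(bdate = starts, edate = ends)
}

chunks <- split_by_year(as.Date("2018-03-15"), as.Date("2020-06-30"))
```

Each row of chunks corresponds to one request, so with the five second stall after every call a three-year range costs at least three calls plus the pauses between them.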

Aggregate functions are named aqs_<service>_by_<aggregation>(), where <service> is one of the 14 services listed above and <aggregation> is one of “site”, “county”, “state”, “box”, “cbsa”, “MA”, or “pqao”.
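The naming pattern visible in the service list above (aqs_monitors_by_*, aqs_sampledata_by_*, and so on) can be sketched by pasting service and aggregation names together. The vectors below are a small illustrative subset, and because not every combination exists, the candidate names are checked against the package exports when RAQSAPI happens to be installed.

```r
# Illustrative subset of services and aggregations, not the full lists.
services <- c("monitors", "sampledata", "dailysummary")
aggregations <- c("site", "county", "state")

# Build every service/aggregation pairing as a candidate function name.
candidates <- as.vector(outer(services, aggregations,
                              function(s, a) paste0("aqs_", s, "_by_", a)))

# Keep only names the installed package actually exports (skipped otherwise).
if (requireNamespace("RAQSAPI", quietly = TRUE)) {
  available <- intersect(candidates, getNamespaceExports("RAQSAPI"))
}
```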

See the RAQSAPI vignette for more details

(RAQSAPI must be installed and built with build_vignettes = TRUE enabled)

  RShowDoc(what="RAQSAPIvignette", type="html", package="RAQSAPI")

pyaqsapi - a port of RAQSAPI to the python 3 programming environment

For users who feel more comfortable working within a Python 3 environment, pyaqsapi (4), a port of RAQSAPI to the Python 3 language, has been released. Both projects aim to maintain feature parity with each other, and there are no inherent advantages to using one over the other, except for the ability to work within the programming language environment of choice. The APIs of both packages are structured very similarly; both packages export the same data and use the same credentials and data source to retrieve data.

Acknowledgements

RAQSAPI was included in the Rblogger’s March 2021: “Top 40” New CRAN Packages.

The RAQSAPI package borrows functions and code from sources not mentioned in the DESCRIPTION file. Here we attempt to acknowledge those sources, without which RAQSAPI would not be possible.

  • README badges are provided by the R package badgecreator (5).
  • The R package usethis (6) was used to generate GitHub Actions for continuous integration (CI).
  • Code cleanup was assisted by the R package lintr (7).
  • The function install.packages is provided by the R package utils (8).
  • The function install_github is provided by the R package remotes (9).
  • The .gitignore file borrowed examples from https://github.com/github/gitignore/blob/master/R.gitignore
  • The CITATION.cff file was generated by the R package cffr (10).
  • The R package urlchecker (11) was used to check URLs in the RAQSAPI documentation.

References

(2) Müller, K.; Wickham, H. Tibble: Simple Data Frames; 2021.

(4) Mccrowey, C. A Python 3 Package to Retrieve Ambient Air Monitoring Data from the United States Environmental Protection Agency’s (US EPA) Air Quality System (AQS) Data Mart API V2 Interface, 2022. https://github.com/USEPA/pyaqsapi.

(6) Wickham, H.; Bryan, J.; Barrett, M. Usethis: Automate Package and Project Setup; 2021.

(7) Hester, J.; Angly, F.; Hyde, R. Lintr: A ’Linter’ for r Code; 2020.

(8) Team, R. C. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2019.

(9) Csárdi, G.; Hester, J.; Wickham, H.; Chang, W.; Morgan, M.; Tenenbaum, D. Remotes: R Package Installation from Remote Repositories, Including ’GitHub’; 2021.

(10) Druskat, S.; Spaaks, J. H.; Chue Hong, N.; Haines, R.; Baker, J.; Bliven, S.; Willighagen, E.; Pérez-Suárez, D.; Konovalov, A. Citation File Format, 2021. https://doi.org/10.5281/zenodo.5171937.

(11) R-Lib/Urlchecker on Github. https://github.com/r-lib/urlchecker (accessed 2023-11-30).

Footnotes

  1. RAQSAPI’s rate limit does not guarantee that the user will stay under the rate limit, nor that API calls will not overload the AQS Data Mart system. Each user should monitor their requests independently.

  2. Use “?aqs_sign_up” after the RAQSAPI library has been loaded to see the full usage description of the aqs_sign_up function.

  3. Use “?aqs_credentials” after the RAQSAPI library has been loaded to see the full usage description of the aqs_credentials function.

  4. R keyring package (https://cran.r-project.org/package=keyring)

  5. See (https://aqs.epa.gov/aqsweb/documents/data_api.html) for the full details of the Data Mart API

raqsapi's Issues

Annual and Daily function names typo

Hi! I'm a data journalist who has been using this package off and on for several months, so first I want to say thank you for your work on this wrapper. It's been a huge help and very easy to use.

When I was referencing the vignettes today, I noticed what looks like a typo in the description of "Daily Summary Data (aqs_dailydata_by_)" and "Annual Summary Data (aqs_annualdata_by_)" in the vignettes and README. I think you might have accidentally typed "dailydata" and "annualdata" when in fact the function is "dailysummary" and "annualsummary."

I did a quick search and it does look like dailydata and annualdata are used elsewhere in the package, but I believe those are just your naming conventions unrelated to the daily and annual summary functions and their names.

Suggestion: automatically 0-pad numeric arguments

As things stand, for example, if you provide stateFIPS = 1 to aqs_sampledata_by_site, you get the error state: 1, requires 2 digit numeric value. You're expected to indicate state 1 as "01" instead.

It would likely be better for the API itself to accept unpadded values, but if the API is going to remain picky about this, the package can work around it.
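Until such padding is built in, a caller could pad codes before passing them on. The helper below is a hypothetical sketch, not a RAQSAPI function; the widths (2 for state, 3 for county, 4 for site) are inferred from the example codes "01", "003" and "0010" that appear elsewhere on this page.

```r
# Hypothetical helper: left-pad all-digit codes with zeros to a fixed width.
# Non-numeric codes (for example a two-letter code) are returned unchanged.
pad_code <- function(x, width) {
  x <- as.character(x)
  is_digits <- grepl("^[0-9]+$", x)
  x[is_digits] <- sprintf(paste0("%0", width, "d"), as.integer(x[is_digits]))
  x
}

state <- pad_code(1, 2)      # state FIPS code
county <- pad_code(3, 3)     # county code
site <- pad_code("10", 4)    # site number
```

The padded strings can then be supplied as the stateFIPS, countycode and sitenum arguments in place of bare numbers.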

routines::unsafe legacy renegotiation disabled

The R package curl, a dependency of httr, which is itself a dependency of RAQSAPI, throws the following error when connecting to the AQS DataMart API server in R environments where the curl library uses libssl to establish an SSL connection with AQS DataMart.

The error message is as follows:
Error in `curl::curl_fetch_memory(url, handle = handle)`: error:0A000152:SSL routines::unsafe legacy renegotiation disabled

It appears that this issue is due to a change in the openssl library wherein openssl no longer allows connections to domains which do not support RFC 5746.

R environments that are running on non-Windows systems or Windows systems where curl uses libssl seem to be affected. Users running more modern versions of Windows (Windows 10 or newer) can avoid this issue by setting curl to use the Windows Schannel backend.

To my current knowledge, no code change to the RAQSAPI package will fix this issue on systems that do not have access to the Windows Schannel API; this includes all non-Windows systems.

The only way to fix this issue is to add support for RFC 5746 to the DataMart API server or for the libssl library to re-add support for legacy renegotiation. In the meantime, the EPA is working with the EPA NCC on adding support for RFC 5746 to the EPA AQS Datamart API server.

Until this issue is addressed, all GitHub Actions unit tests of RAQSAPI will fail due to SSL connectivity issues, since the curl libraries on GitHub Actions runner systems default to using openssl.

Qualifier Explanations added to documentation

Each observation contains a list of one of 39 qualifiers. However, the documentation doesn't include what each of these qualifiers means and I can't find anything about it on the web.

For example, why are some values voided by the operator? One qualifier just says "outlier." Does that mean the machine was malfunctioning or that's just a reading outside of the norm? If the machine was malfunctioning why not label it "machine malfunction?" Etc.

Thank you for your work on this package.

RAQSAPI and rlang 1.0.0

Hello, I see in revdep checks:

** byte-compile and prepare package for lazy loading
Error: object ‘call_frame’ is not exported by 'namespace:rlang'
Execution halted

This is because RAQSAPI is using call_frame() which has been deprecated in rlang 0.3.0 (released in October 2018):

callingfunction <- rlang::call_frame(n = 2)$fn_name

If you wait for rlang 1.0 to come out (very soon), you could replace that call by:

rlang::call_name(rlang::caller_call(2))

Otherwise you could use:

rlang::call_name(sys.call(sys.parent(2)))

"Error in httr::user_agent(.) : length(agent) == 1 is not TRUE" on ags_dailysummary_by_state() call

Hi, below is the example I use and the error message I get from aqs_dailysummary_by_state(). Do you know why? (This is under R-4.0.4 with the latest CRAN version of RAQSAPI)

library(RAQSAPI)

## Variables
param <- as.character(c(
    44201, # Ozone (O3; parts per million)
    81104, # PM2.5 STP (PM2.5; micrograms/m3)
    81102, # PM10 Total 0-10um STP (PM10, micrograms/m3)
    42602, # Nitrogen dioxide (NO2) (NO2; parts per billion)
    42603, # Oxides of nitrogen (NOx) (NOX; parts per billion)
    42401, # Sulfur dioxide (SO2; parts per billion)
    42102, # Carbon dioxide (CO2; parts per million)
    42101, # Carbon monoxide (CO; parts per million)
    62101, # Outdoor Temperature (TEMPO; Degrees Fahrenheit)
    64101, # Barometric pressure (BPRES; millibars)
    62201, # Relative Humidity (RHUM; percent relative humidity)
    61101, # Wind Speed - Scalar (WS-S; Knots)
    65101 # Rain 24hr total (RAIN, inches)
))

## State
state <- 36 # New York

## Time period (only one year at a time)
start <- as.Date("2000-01-01")
end <- as.Date("2000-12-31")

## Request data
dat <- aqs_dailysummary_by_state(param, bdate = start, edate = end, stateFIPS = state)
## => Error in httr::user_agent(.) : length(agent) == 1 is not TRUE

## Traceback
## > traceback()
## 8: stop(simpleError(msg, call = if (p <- sys.parent(1L)) sys.call(p)))
## 7: stopifnot(is.character(agent), length(agent) == 1)
## 6: httr::user_agent(.)
## 5: glue("User:{user} via RAQSAPI library for R") %>% httr::user_agent()
## 4: aqs(service = service, filter = "byState", user = getOption("aqs_username"),
##        user_key = getOption("aqs_key"), variables = list(param = format.multiple.params.for.api(parameter),
##            bdate = format(bdate, format = "%Y%m%d"), edate = format(edate,
##                format = "%Y%m%d"), state = stateFIPS, cbdate = cbdate,
##            cedate = cedate))
## 3: .f(parameter = .l[[1L]][[1L]], bdate = .l[[2L]][[1L]], edate = .l[[3L]][[1L]],
##        stateFIPS = .l[[4L]][[1L]], service = .l[[5L]][[1L]], ...)
## 2: purrr::pmap(.l = params, .f = aqs_services_by_state)
## 1: aqs_dailysummary_by_state(param, bdate = start, edate = end,
##        stateFIPS = state)
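Judging only from the traceback above, one plausible cause is that the aqs_username option was empty when the request was built, which would happen if aqs_credentials had not been called in the session. The sketch below is a hypothetical pre-flight check, not part of RAQSAPI; the option names are copied from the traceback and may differ between package versions.

```r
# Hypothetical check: are the options read in the traceback populated?
aqs_credentials_are_set <- function() {
  user <- getOption("aqs_username")
  key <- getOption("aqs_key")
  is.character(user) && length(user) == 1 && nzchar(user) &&
    is.character(key) && length(key) == 1 && nzchar(key)
}

if (!aqs_credentials_are_set()) {
  message("Credentials appear to be unset; call aqs_credentials() first.")
}
```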

Overfull \hbox (xxxxxpt too wide) in RAQSAPI manual

When rendering the RAQSAPI manual some of the links generated from the @family roxygen2 notes are placed outside of the document margins. Warning messages are displayed during the compilation of the manual stating

Overfull \hbox (xxxxxpt too wide)

Since the document and its embedded links are automatically generated by roxygen2, I do not believe that there is anything that I can do to address this issue.

Certain functions return headers even with return_header = FALSE.

I have been informed that some functions return headers even with the default return_header = FALSE.

reprex:

test <- aqs_sampledata_by_site(parameter="44201",bdate=as.Date("2018-01-01"),edate=as.Date("2020-12-31"), stateFIPS="01",countycode="003",sitenum="0010")
> str(test)
List of 3
 $ :List of 2
  ..$ Header: tibble [1 x 4] (S3: tbl_df/tbl/data.frame)
  .. ..$ status      : chr "Success"
  .. ..$ request_time: chr "2021-02-24T14:35:47-05:00"
  .. ..$ url         : chr "https://aqs.epa.gov/data/api/sampleData/bySite __truncated__
  .. ..$ rows        : int 5880
  ..$ Data  : tibble [5,880 x 29] (S3: tbl_df/tbl/data.frame)
  .. ..$ state_code           : chr [1:5880] "01" "01" "01" "01" ...
  .. ..$ county_code          : chr [1:5880] "003" "003" "003" "003" ...
  .. ..$ site_number          : chr [1:5880] "0010" "0010" "0010" "0010" ...
  .. ..$ parameter_code       : chr [1:5880] "44201" "44201" "44201" "44201" ...
  .. ..$ poc                  : int [1:5880] 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ latitude             : num [1:5880] 30.5 30.5 30.5 30.5 30.5 ...
  .. ..$ longitude            : num [1:5880] -87.9 -87.9 -87.9 -87.9 -87.9 ...
  .. ..$ datum                : chr [1:5880] "NAD83" "NAD83" "NAD83" "NAD83" ...
  .. ..$ parameter            : chr [1:5880] "Ozone" "Ozone" "Ozone" "Ozone" ...
  .. ..$ date_local           : chr [1:5880] "2018-03-01" "2018-03-01" "2018-03-01" "2018-03-01" ...
  .. ..$ time_local           : chr [1:5880] "00:00" "01:00" "02:00" "03:00" ...
  .. ..$ date_gmt             : chr [1:5880] "2018-03-01" "2018-03-01" "2018-03-01" "2018-03-01" ...
  .. ..$ time_gmt             : chr [1:5880] "06:00" "07:00" "08:00" "09:00" ...
  .. ..$ sample_measurement   : num [1:5880] NA 0.025 0.026 0.025 0.024 0.02 0.023 0.023 0.023 0.027 ...
  .. .. more columns...
  .. two more list elements (2019 and 2020) both with sub-elements $Headers and $Data
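If a call does come back as a list of per-year objects like the one above, the $Data parts can be stacked by hand. The sketch below uses mock base data.frames in place of real results, and combine_data is a hypothetical helper, not a RAQSAPI function.

```r
# Hypothetical helper: stack the $Data element of each per-year result.
combine_data <- function(results) {
  do.call(rbind, lapply(results, function(x) x$Data))
}

# Mock stand-ins for three yearly results, each with $Header and $Data.
mock_year <- function(year, n) {
  list(
    Header = data.frame(status = "Success", rows = n,
                        stringsAsFactors = FALSE),
    Data = data.frame(date_local = rep(paste0(year, "-03-01"), n),
                      sample_measurement = seq_len(n) / 1000,
                      stringsAsFactors = FALSE)
  )
}
results <- list(mock_year(2018, 2), mock_year(2019, 3), mock_year(2020, 1))

all_data <- combine_data(results)
```

With real tibbles, dplyr::bind_rows would be a common alternative to rbind; base R is used here only to keep the sketch free of dependencies.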

state code for Canada does not seem to be accepted by the API

An issue report has been filed for the pyaqsapi repository. The DataMart API is not accepting URLs in which the state code for Canada ("CC") is sent to the API. RAQSAPI seems to be affected by this issue as well.

The issue has been forwarded to the DataMart API team and they are looking into this.

Monitor with negative readings

In Palm Beach County – Lamstein Lane Site Monitor #12-099-0022 has several negative readings for PM 2.5.

I'm not seeing this with any other monitor in South Florida.

Are the readings supposed to be positive? Was an error code supposed to be attached to the readings?

Thank you for your help in this matter. And fantastic package.

github_actions fail on macosx-latest

The RAQSAPI GitHub Actions have reported failures on the macosx-latest platform. I have submitted an issue to the creator of usethis, who has responded with a possible solution. I am making this bug report to document the fix and will close the issue once it passes an initial test run.
