envair's Introduction

bcgov/envair: BC air quality data retrieval and analysis tool


Overview

bcgov/envair is an R package developed by the air quality monitoring unit of the BC Ministry of Environment and Climate Change Strategy, Knowledge Management Branch/Environmental and Climate Monitoring Section (ENV/KMB/ECMS). The package enables R-based retrieval and processing of air quality monitoring data, and its output is compatible with the popular openair package.

Installation

You can install envair directly from this GitHub repository. To proceed, you will need the remotes package:

install.packages("remotes")

Next, install and load the envair package using remotes::install_github():

remotes::install_github("bcgov/envair")
library(envair)

Features

  • Retrieve data from the Air Quality Data Archive by specifying a parameter (pollutant) or station. Functions have the option to add Transboundary Flow Exceptional Event (TFEE) flags. The data archive is located on ENV’s FTP server: ftp://ftp.env.gov.bc.ca/pub/outgoing/AIR/

  • Generate annual metrics, data captures, and statistical summaries following the Guidance Document on Achievement Determination and the Canadian Ambient Air Quality Standards (CAAQS).

  • Most data-processing functions accept either a parameter name or a dataframe of air quality data as input.

  • Retrieve archived and current ventilation index data with options to generate a kml map.

Functions

  • importBC_data() Retrieves station or parameter data for the specified year(s), from 1980 to yesterday.

    • you can specify one or several parameters, or the names of one or several stations
    • if a station is specified, the output is a wide table following the format of the openair package. All column names are converted to lowercase, scalar wind speed is renamed to ws, vector wind direction is renamed to wd, and the datetime is shifted to time-beginning format
    • if a parameter is specified, the output contains data from all air quality monitoring stations that reported that parameter
    • use flag_TFEE = TRUE to add a new boolean (TRUE/FALSE) column called flag_tfee. This option only works when parameter_or_station is a parameter (not a station)
    • use merge_Stations = TRUE to merge data from monitoring stations and their corresponding alternative stations, especially in locations where the monitoring station was relocated. This option may also change the name of the air quality monitoring station
    • set pad = TRUE to pad missing dates, set use_ws_vector = TRUE to use vector wind speed instead of scalar, and set use_openairformat = FALSE to produce the original non-openair output.
  • importBC_data_avg() Retrieves pollutant (parameter) data and performs statistical averaging based on the specified averaging_type.

    • the function can retrieve 24-hour averages (“24-hr”), the daily 1-hour maximum (“d1hm”), the daily 8-hour maximum (“d8hm”), and rolling 8-hour values (“8-hr”)
    • it can also produce annual summaries, such as the 98th percentile of the daily 1-hour maximum or the annual mean of 24-hour values. For annual summaries, averaging_type should follow the pattern “annual <averaging/percentile> <1-hr, 24-hr, or dxhm>”
      • function can calculate the number of times a certain value has been exceeded

        List of possible values for averaging_type:

        | Type of averaging | averaging_type syntax | Output description |
        |---|---|---|
        | 1-hour | “1-hr” | Outputs hourly data. No averaging done. |
        | Daily average | “24-hr” | Outputs the daily (24-hour) values. |
        | Rolling 8-hour | “8-hr” | Outputs hourly values calculated from a rolling 8-hour average. |
        | Daily 1-hour maximum | “d1hm” | Outputs daily values of the highest 1-hour concentration. |
        | Daily 8-hour maximum | “d8hm” | Outputs the daily 8-hour maximum for each day. |
        | Annual mean of 1-hour values | “annual mean 1-hr” | Outputs the average of all hourly values. |
        | Annual mean of daily values | “annual mean 24-hr”, “annual mean <avging>” | Outputs the average of all daily values. |
        | Annual 98th percentile of 1-hour values | “annual 98p 1-hr”, “annual <xxp> <avging>” | Outputs the 98th percentile of the 1-hour values. |
        | 4th highest daily 8-hour maximum | “annual 4th d8hm”, “annual <rank> <avging>” | Outputs the 4th highest daily 8-hour maximum. |
        | Number of daily values exceeding 28 µg/m3 | “exceed 28 24-hr”, “exceed <value> <avging>” | Outputs the number of days where 28 µg/m3 is exceeded. |
        | Number of d8hm exceedances of 62 ppb | “exceed 62 d8hm”, “exceed <value> <avging>” | Outputs the number of days where the daily 8-hour maximum exceeds 62 ppb. |

  • get_stats() Retrieves a statistical summary based on the default for the pollutant. Currently applies to PM2.5, O3, NO2, and SO2. Output includes data captures, annual metrics, and exceedances.

  • get_captures() Calculates the data captures for a specified pollutant or dataframe. Output is a long table of capture statistics such as hourly, daily, quarterly, and annual summaries.

  • listBC_stations() Lists details of all air quality monitoring stations (active or inactive)

  • list_parameters() Lists the parameters that can be imported by importBC_data()

  • importECCC_forecast() Retrieves AQHI, PM2.5, PM10, O3, and NO2 forecasts from the ECCC datamart

  • get_venting_summary() Summarizes the ventilation index, counting the number of GOOD, FAIR, or POOR days for the month

  • GET_VENTING_ECCC() Retrieves the venting index FLCN39 from the Environment and Climate Change Canada datamart or from B.C.’s Open Data Portal

  • ventingBC_kml() Creates a kml or shape file based on the 2019 OBSCR rules. This incorporates venting index and sensitivity zones.

Usage and Examples

importBC_data()


Retrieving air quality data with TFEE flags and merged stations

Use flag_TFEE = TRUE and merge_Stations = TRUE to produce a result that flags TFEEs and merges stations and instruments, as is done during the CAAQS-reporting process.

library(envair)
df_data <- importBC_data('pm25',years = 2015:2017, flag_TFEE = TRUE,merge_Stations = TRUE)

knitr::kable(df_data[1:4,])
Using openair package functions on BC ENV data

By default, this function produces an openair-compatible dataframe. It renames WSPD_SCLR and WDIR_VECT to ws and wd, changes pollutant names to lowercase (e.g., pm25, no2, so2), and shifts the date from time-ending to time-beginning format. To use it, specify a station name and year(s). For a list of stations, use the listBC_stations() function. If no year is specified, the function retrieves the latest data, typically the unverified data from the start of the year to the current date.

library(openair)
PG_data <- importBC_data('Prince George Plaza 400',2010:2012)
pollutionRose(PG_data,pollutant='pm25')

Other features for station data retrieval
  • To import without renaming column names, specify use_openairformat = FALSE. This also keeps the date in time-ending format
  • By default, vector wind direction and scalar wind speed are used
  • To use vector wind speed, set use_ws_vector = TRUE
  • Station name is not case sensitive, and works on partial text match
  • Multiple stations can be specified: c('Prince George', 'Kamloops')
  • For non-continuous multiple years, use, e.g., c(2010, 2012:2014)
importBC_data('Prince George Plaza 400', 2010:2012, use_openairformat = FALSE)
importBC_data('Kamloops', 2015)
importBC_data(c('Prince George', 'Kamloops'), c(2010, 2012:2014))
importBC_data('Trail',2015,pad = TRUE)              
Retrieve parameter data

Specify parameter name to retrieve data. Note that these are very large files and may use up your computer’s resources. List of parameters can be found using the list_parameters() function.

pm25_3year <- importBC_data('PM25',2010:2012)

importBC_data_avg()


Retrieving the annual average of daily values for multiple parameters

The function can process multiple parameters and multiple years, but it can only apply one averaging type per call. The averaging type can be a simple averaging (e.g., 24-hour, 8-hour) or a combined averaging (e.g., annual 98p d1hm, annual mean 24-hr). Check the table above for a comprehensive list of averaging_type values.

# you can specify parameter names as input
annual_mean <- importBC_data_avg(c('pm25','o3'), years = 2015:2018, averaging_type = 'annual mean 24-hr')

# or, if you already have a dataframe, you can use it as input for the statistical summary
df_input <- importBC_data(param = c('pm25','o3'), years = 2015:2018)
annual_mean <- importBC_data_avg(df_input, averaging_type = 'annual mean 24-hr')
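The “exceed” averaging types from the table above can be used the same way. A hedged sketch (the thresholds shown are illustrative, and the calls require access to the ENV data archive):

```r
library(envair)

# Count the days per year where the daily (24-hour) PM2.5 average
# exceeds 28 ug/m3, using the 'exceed <value> <avging>' pattern.
# (Illustrative threshold; requires access to the ENV FTP data archive.)
pm25_exceed <- importBC_data_avg('pm25', years = 2015:2018,
                                 averaging_type = 'exceed 28 24-hr')

# Count the days where the daily 8-hour maximum ozone exceeds 62 ppb
o3_exceed <- importBC_data_avg('o3', years = 2015:2018,
                               averaging_type = 'exceed 62 d8hm')
```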

get_stats()


Calculate the annual metrics of a pollutant

The function calculates statistical summaries, data captures, and the number of exceedances based on the CAAQS metrics and values. It only performs a year-by-year calculation, so the results are not the actual CAAQS metrics, but they can be used to derive them. For ozone, it also creates the combined Q2 + Q3 value.

#example retrieves stat summaries
stats_result <- get_stats(param = 'o3', years = 2016,add_TFEE = TRUE, merge_Stations = TRUE)

get_captures()


Calculate the data captures of PM2.5

The function creates a summary of data captures for a parameter or a dataframe. You can specify the parameter or, if available, use an air quality dataframe as input.

# you can use the parameter as input
data_captures <- get_captures(param = c('pm25','o3'), years = 2015:2018, merge_Stations = TRUE)

# or you can use a dataframe
air_data <- importBC_data(c('pm25','o3'), years = 2015:2018, merge_Stations = TRUE)
data_captures <- get_captures(param = air_data, years = 2015:2018)

listBC_stations()


Produces a dataframe that lists the details of all air quality monitoring stations. If a year is specified, it retrieves the station details from that year. Note that these entries may not be accurate since no system has been in place to generate these station details.

listBC_stations()
listBC_stations(2016)
STATION_NAME_FULL STATION_NAME EMS_ID NAPS_ID SERIAL CITY LAT LONG ELEVATION STATUS_DESCRIPTION OWNER REGION STATUS OPENED CLOSED NOTES SERIAL_CODE CGNDB AIRZONE
100 Mile House 100 Mile House M116006 NA 374 100 Mile House 51.6542 -121.375 1000 NON OPERATIONAL ENV 05 - Cariboo INACTIVE 1992-11-11 NA N/A UNKNOWN N/A Central Interior
100 Mile House BCAC 100 Mile House BCAC E218444 NA 228 100 MIle House 51.6461 -121.937 0 NON OPERATIONAL ENV 05 - Cariboo INACTIVE 2010-02-16 NA N/A UNKNOWN N/A Central Interior
Abbotsford A Columbia Street Abbotsford A Columbia Street E289309 NA 428 Abbotsford 49.0215 -122.3266 65 METRO VANCOUVER MVRD 02 - Lower Mainland ACTIVE 2012-07-25 NA N/A UNKNOWN N/A Lower Fraser Valley
Abbotsford A Columbia Street Met Abbotsford A Columbia Street E289309 NA 429 Abbotsford 49.0215 -122.3266 65 METRO VANCOUVER MVRD 02 - Lower Mainland ACTIVE 2012-07-25 NA N/A UNKNOWN N/A Lower Fraser Valley

list_parameters()


Produces a character vector of available parameters that can be retrieved with importBC_data()
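A minimal sketch combining list_parameters() with importBC_data() (requires the envair package and access to the data archive):

```r
library(envair)

# retrieve the vector of available parameter names
params <- list_parameters()
print(params)

# any entry can then be passed to importBC_data(), e.g.:
# importBC_data(params[1], years = 2020)
```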

GET_VENTING_ECCC()


Produces a dataframe containing the recent venting index.

GET_VENTING_ECCC()
GET_VENTING_ECCC('2019-11-08')
GET_VENTING_ECCC(dates = seq(from = lubridate::ymd('2021-01-01'),
        to = lubridate::ymd('2021-05-01'), by = 'day'))
VENTING_INDEX_ABBREV DATE_ISSUED CURRENT_VI CURRENT_VI_DESC CURRENT_WSPD CURRENT_MIX_HEIGHT TODAY_VI TODAY_VI_DESC TODAY_WSPD TODAY_MIX_HEIGHT TOMORROW_VI TOMORROW_VI_DESC TOMORROW_WSPD TOMORROW_MIX_HEIGHT NAME REGION LAT LONG
100 MILE 2024-01-12 73 GOOD 23 1686 48 FAIR 16 1474 23 POOR 15 1124 100 Mile House CENTRAL INTERIOR 51.63915 -121.2945
ATLIN 2024-01-12 11 POOR 10 742 11 POOR 10 745 15 POOR 10 817 Atlin NORTHERN BC 59.57000 -133.7000
BELLA COOLA 2024-01-12 37 FAIR 13 480 34 FAIR 21 283 21 POOR 11 256 Bella Coola COAST 52.38000 -126.7500
BURNS LAKE 2024-01-12 10 POOR 6 748 20 POOR 8 987 25 POOR 10 1026 Burns Lake CENTRAL INTERIOR 54.23142 -125.7597

importECCC_forecast()


  • Retrieves forecasts and model data from ECCC
  • parameters include AQHI, PM25, NO2, O3, PM10
importECCC_forecast('no2')

ventingBC_kml()


  • creates a kml object based on the 2019 OBSCR rules
  • the directory to save the kml file can be specified; the file will be saved in that directory as Venting_Index_HD.kml
ventingBC_kml()
ventingBC_kml('C:/temp/')

envair's People

Contributors

jeromerobles, zoegao218


envair's Issues

rqst to add lat long for fort saint john stations

latitude and longitude are missing for NE stations - requesting this info be added

envair::listBC_stations() %>%
  # NE stations don't have lat/long
  dplyr::filter(is.na(LONG) | is.na(LAT)) %>%
  dplyr::distinct(STATION_NAME)

duplicate rows

Hi @jeromerobles

importBC_data() generates 8 duplicate timestamps at the beginning of each year.

I believe this is an importBC_data() issue as the original data does not contain the duplicate rows. Data: ftp://ftp.env.gov.bc.ca//pub/outgoing/AIR/AnnualSummary/2018/STATION_DATA/E295892.csv

I wonder if this issue is related to #4, where there is an 8-hour difference between UTC and PST.

farm <- importBC_data('Farmington', 2018)
farm %>% 
  select(date, station_name, ems_id, contains("o3")) %>% 
  arrange(date)

                  date              station_name  ems_id    o3_raw   o3 o3_instrument
1  2018-01-01 00:00:00 Farmington Community Hall E295892        NA   NA          <NA>
2  2018-01-01 00:00:00 Farmington Community Hall E295892  6.848054  6.8    O3_APIT400
3  2018-01-01 01:00:00 Farmington Community Hall E295892        NA   NA          <NA>
4  2018-01-01 01:00:00 Farmington Community Hall E295892  6.640554  6.6    O3_APIT400
5  2018-01-01 02:00:00 Farmington Community Hall E295892        NA   NA          <NA>
6  2018-01-01 02:00:00 Farmington Community Hall E295892  5.903611  5.9    O3_APIT400
7  2018-01-01 03:00:00 Farmington Community Hall E295892        NA   NA          <NA>
8  2018-01-01 03:00:00 Farmington Community Hall E295892  4.670833  4.7    O3_APIT400
9  2018-01-01 04:00:00 Farmington Community Hall E295892        NA   NA          <NA>
10 2018-01-01 04:00:00 Farmington Community Hall E295892  5.527222  5.5    O3_APIT400
11 2018-01-01 05:00:00 Farmington Community Hall E295892        NA   NA          <NA>
12 2018-01-01 05:00:00 Farmington Community Hall E295892  9.448887  9.4    O3_APIT400
13 2018-01-01 06:00:00 Farmington Community Hall E295892        NA   NA          <NA>
14 2018-01-01 06:00:00 Farmington Community Hall E295892 33.229170 33.2    O3_APIT400
15 2018-01-01 07:00:00 Farmington Community Hall E295892        NA   NA          <NA>
16 2018-01-01 07:00:00 Farmington Community Hall E295892 36.935830 36.9    O3_APIT400
17 2018-01-01 08:00:00 Farmington Community Hall E295892 37.506110 37.5    O3_APIT400
18 2018-01-01 09:00:00 Farmington Community Hall E295892 39.153610 39.2    O3_APIT400

Add missing topics

TL;DR

Topics greatly improve the discoverability of repos; please add the short code from the table below to the topics of your repo so that ministries can use GitHub's search to find out what repos belong to them and other visitors can find useful content (and reuse it!).

Why Topic

In short order we'll add our 800th repo. This large number clearly demonstrates the success of using GitHub and our Open Source initiative. This huge success means it's critical that we work to make our content as discoverable as possible. Through discoverability, we promote code reuse across a large decentralized organization like the Government of British Columbia, as well as allow ministries to find the repos they own.

What to do

Below is a table of abbreviations (a.k.a. short codes) for each ministry; they're the ones used in all @gov.bc.ca email addresses. Please add the short codes of the ministry or organization that "owns" this repo as a topic.

add a topic

That's it, you're done!

How to use

Once topics are added, you can use them in GitHub's search. For example, enter something like org:bcgov topic:citz to find all the repos that belong to Citizens' Services. You can refine this search by adding key words specific to a subject you're interested in. To learn more about searching through repos check out GitHub's doc on searching.

Pro Tip 🤓

  • If your org is not in the list below, or the table contains errors, please create an issue here.

  • While you're doing this, add additional topics that would help someone searching for "something". These can be the language used javascript or R; something like opendata or data for data only repos; or any other key words that are useful.

  • Add a meaningful description to your repo. This is hugely valuable to people looking through our repositories.

  • If your application is live, add the production URL.

Ministry Short Codes

Short Code Organization Name
AEST Advanced Education, Skills & Training
AGRI Agriculture
ALC Agriculture Land Commission
AG Attorney General
MCF Children & Family Development
CITZ Citizens' Services
DBC Destination BC
EMBC Emergency Management BC
EAO Environmental Assessment Office
EDUC Education
EMPR Energy, Mines & Petroleum Resources
ENV Environment & Climate Change Strategy
FIN Finance
FLNR Forests, Lands, Natural Resource Operations & Rural Development
HLTH Health
IRR Indigenous Relations & Reconciliation
JEDC Jobs, Economic Development & Competitiveness
LBR Labour Policy & Legislation
LDB BC Liquor Distribution Branch
MMHA Mental Health & Addictions
MAH Municipal Affairs & Housing
BCPC Pension Corporation
PSA Public Safety & Solicitor General & Emergency B.C.
SDPR Social Development & Poverty Reduction
TCA Tourism, Arts & Culture
TRAN Transportation & Infrastructure

NOTE See an error or omission? Please create an issue here to get it remedied.

Add project lifecycle badge

No Project Lifecycle Badge found in your readme!

Hello! I scanned your readme and could not find a project lifecycle badge. A project lifecycle badge will provide contributors to your project as well as other stakeholders (platform services, executive) insight into the lifecycle of your repository.

What is a Project Lifecycle Badge?

It is a simple image that neatly describes your project's stage in its lifecycle. More information can be found in the project lifecycle badges documentation.

What do I need to do?

I suggest you make a PR into your README.md and add a project lifecycle badge near the top where it is easy for your users to pick it up :). Once it is merged feel free to close this issue. I will not open up a new one :)

importBC_data() - suggestion: load 'plyr' before 'dplyr'

I noticed that dplyr::group_by() wasn't working for me after using the importBC_data() function.

Loading the package plyr before dplyr will prevent this type of issue. See previous discussion on this topic here.

The following change to importBC_data() (line 46) should do the trick:
Current
#load packages
RUN_PACKAGE(c('dplyr','RCurl','plyr','readr','lubridate','tidyr','stringi')) #,'feather'
if (is.null(years))
{
years=as.numeric(format(Sys.Date(),'%Y'))
}

Update to
#load packages
RUN_PACKAGE(c('plyr', 'dplyr', 'RCurl', 'readr', 'lubridate', 'tidyr' ,'stringi')) #,'feather'
if (is.null(years))
{
years=as.numeric(format(Sys.Date(),'%Y'))
}

Or use tidyverse (which includes several packages listed above)
#load packages
RUN_PACKAGE(c('plyr', 'RCurl', 'stringi', 'tidyverse', 'lubridate')) #,'feather'
if (is.null(years))
{
years=as.numeric(format(Sys.Date(),'%Y'))
}
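The underlying masking problem can be reproduced outside of envair. A minimal sketch (this illustrates the well-known plyr/dplyr conflict, not envair-specific code):

```r
library(dplyr)
library(plyr)  # loading plyr AFTER dplyr masks dplyr::summarise()

mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))
# summarise() now resolves to plyr::summarise, which ignores the dplyr
# grouping and collapses the data to a single row instead of one row
# per cyl group; loading plyr first (or calling dplyr::summarise()
# explicitly) avoids this
```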

importBC_data() throwing error about parameter=aqhi

When trying to import data for many parameters (but not the aqhi) for the year 2022, I get an error as follows:

parameters <- tolower(
  c(
    "CO",
    "H2S",
    "HF",
    "HUMIDITY",
    "NO",
    "NO2",
    "NOX",
    "O3",
    "PM10",
    "PM25",
    "SO2",
    "TEMP_MEAN",
    "TRS",
    "WDIR_VECT",
    "WSPD_SCLR"
  )
)

data<-envair::importBC_data(
  parameter_or_station = parameters,
  years=2022,
  use_openairformat = FALSE
) %>%
  dplyr::distinct(.) %>%
  dplyr::filter(lubridate::year(DATE_PST) == 2022)
Error in if (tolower(parameter_or_station) == "aqhi") { : 
  the condition has length > 1

R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.utf8  LC_CTYPE=English_Canada.utf8    LC_MONETARY=English_Canada.utf8
[4] LC_NUMERIC=C                    LC_TIME=English_Canada.utf8    

time zone: America/Edmonton
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] janitor_2.2.0    arrow_13.0.0.1   envair_0.4.0.100 magrittr_2.0.3   lubridate_1.9.3  forcats_1.0.0   
 [7] stringr_1.5.0    dplyr_1.1.3      purrr_1.0.2      readr_2.1.4      tidyr_1.3.0      tibble_3.2.1    
[13] ggplot2_3.4.3    tidyverse_2.0.0 

loaded via a namespace (and not attached):
 [1] gtable_0.3.4        xfun_0.40           htmlwidgets_1.6.2   latticeExtra_0.6-30 lattice_0.21-8     
 [6] tzdb_0.4.0          bitops_1.0-7        vctrs_0.6.4         tools_4.3.1         generics_0.1.3     
[11] curl_5.1.0          fansi_1.0.5         cluster_2.1.4       pkgconfig_2.0.3     Matrix_1.6-1       
[16] data.table_1.14.8   checkmate_2.2.0     openair_2.17-0      RColorBrewer_1.1-3  assertthat_0.2.1   
[21] readxl_1.4.3        lifecycle_1.0.3     compiler_4.3.1      deldir_1.0-9        munsell_0.5.0      
[26] mapproj_1.2.11      snakecase_0.11.1    htmltools_0.5.6     maps_3.4.1          RCurl_1.98-1.12    
[31] yaml_2.3.7          htmlTable_2.4.1     Formula_1.2-5       hexbin_1.28.3       pillar_1.9.0       
[36] MASS_7.3-60         DT_0.29             Hmisc_5.1-1         rpart_4.1.19        nlme_3.1-163       
[41] tidyselect_1.2.0    digest_0.6.33       stringi_1.7.12      bookdown_0.35       splines_4.3.1      
[46] fastmap_1.1.1       grid_4.3.1          colorspace_2.1-0    cli_3.6.1           base64enc_0.1-3    
[51] utf8_1.2.3          withr_2.5.1         foreign_0.8-85      scales_1.2.1        backports_1.4.1    
[56] bit64_4.0.5         timechange_0.2.0    rmarkdown_2.24      jpeg_0.1-10         bit_4.0.5          
[61] interp_1.1-4        nnet_7.3-19         gridExtra_2.3       cellranger_1.1.0    png_0.1-8          
[66] hms_1.1.3           evaluate_0.21       knitr_1.44          mgcv_1.9-0          rlang_1.1.1        
[71] Rcpp_1.0.11         glue_1.6.2          rstudioapi_0.15.0   jsonlite_1.8.7      R6_2.5.1    

importBC_data error for aqhi

R Version 4.3.1. envair version 0.4.0.100

Is it possible to access aqhi data using importBC_data? I don't see the values shown when I tried...

envair::importBC_data(parameter_or_station="aqhi",years=2023)


importBC_data() error: "DATE_PST" not found

importBC_data() is throwing an error: "DATE_PST" not found. For example, importBC_data('Kamloops', 2015) results in:

(error screenshot omitted)

I suspect the failure is occurring at line 740 because DATE_PST was removed from df_data earlier, on line 711.

(source screenshot omitted)

Extra data imported using envair::importBC_data

R Version 4.3.1 envair version 0.4.0.100.

The call below is requesting wind data from 2016-2021, but data from 2022-2024 is also imported

# import previous x years wind data for comparison
prevYrWind <- envair::importBC_data(
  parameter_or_station = c("wdir_vect",
                           "wspd_sclr"),
  years = (yearToValidate - 6):(yearToValidate - 1),
  use_openairformat = FALSE
)

# check what years were imported
prevYrWind %>%
  dplyr::distinct(year=lubridate::year(DATE_PST)) %>%
  dplyr::arrange(year)# imports data in years that were not requested


It's Been a While Since This Repository has Been Updated

This issue is a kind reminder that your repository has been inactive for 181 days. Some repositories are maintained in accordance with business requirements that infrequently change thus appearing inactive, and some repositories are inactive because they are unmaintained.

To help differentiate products that are unmaintained from products that do not require frequent maintenance, repomountie will open an issue whenever a repository has not been updated in 180 days.

  • If this product is being actively maintained, please close this issue.
  • If this repository isn't being actively maintained anymore, please archive this repository. Also, for bonus points, please add a dormant or retired life cycle badge.

Thank you for your help ensuring effective governance of our open-source ecosystem!

importBC_data() error

importBC_data() and listBC_stations() are throwing the same error on R 4.2.1, Windows (BC Gov workstation) and Mac OS (personal). The nature of the error is unclear to me.

library(envair)
PG_data <- importBC_data('Prince George Plaza 400',2010:2012)

airzones was updated on NULL
Error in wk_handle.wk_wkb(wkb, s2_geography_writer(oriented = oriented,  : 
  Loop 0 is not valid: Edge 441 has duplicate vertex with edge 446
sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.utf8  LC_CTYPE=English_Canada.utf8   
[3] LC_MONETARY=English_Canada.utf8 LC_NUMERIC=C                   
[5] LC_TIME=English_Canada.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.0.10     envair_0.2.2.100

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9         pillar_1.8.1       compiler_4.2.1     bcmaps_1.0.3      
 [5] class_7.3-20       bitops_1.0-7       tools_4.2.1        bit_4.0.4         
 [9] lifecycle_1.0.3    tibble_3.1.8       gtable_0.3.1       pkgconfig_2.0.3   
[13] rlang_1.0.6        DBI_1.1.3          cli_3.4.1          rstudioapi_0.14   
[17] curl_4.3.2         e1071_1.7-11       s2_1.1.0           withr_2.5.0       
[21] generics_0.1.3     vctrs_0.4.2        hms_1.1.2          classInt_0.4-7    
[25] bit64_4.0.5        grid_4.2.1         tidyselect_1.2.0   glue_1.6.2        
[29] sf_1.0-8           R6_2.5.1           fansi_1.0.3        vroom_1.6.0       
[33] sessioninfo_1.2.2  tzdb_0.3.0         readr_2.1.3        ggplot2_3.3.6     
[37] magrittr_2.0.3     units_0.8-0        rcaaqs_0.3.1.9000  scales_1.2.1      
[41] ellipsis_0.3.2     assertthat_0.2.1   colorspace_2.0-3   KernSmooth_2.23-20
[45] utf8_1.2.2         proxy_0.4-27       stringi_1.7.8      wk_0.6.0          
[49] RCurl_1.98-1.9     munsell_0.5.0      crayon_1.5.2     

get_captures error for 2022 sparwood centennial square PM data

envair 0.4.0.600
R version 4.3.1

Trying to use get_captures on a dataframe retrieved from an importBC_data call:

data<-envair::importBC_data(parameter_or_station = c("pm25",
                                                     "pm10"),
                            years=2022,
                            use_openairformat = FALSE) %>%
  dplyr::filter(STATION_NAME=="Sparwood Centennial Square" &
                  lubridate::year(DATE_PST)==2022)
  

captures<-envair::get_captures(param=data,
                               years = 2022)

Produces this error:
(error screenshot omitted)

Inconsistent PM2.5 data downloaded for 2017 Kamloops

For PM2.5 data for Kamloops in 2017, the data returned by importBC_data() differs depending on whether you download by station (importBC_data(station, ...)) or by parameter (importBC_data("PM25", ...)). When downloaded by parameter, the data are "missing" past 2017-06-19 12:00:00. This is likely because the monitor was switched to a SHARP; data may not be gathered from the correct ftp paths and thus appears "missing". See the example below to create plots for comparison.

library(envair)
library(dplyr)   # needed for mutate() and filter() below
library(tidyr)
library(ggplot2)

station <- "Kamloops Federal Building"
years <- 2016:2018

# by station
dataByStation <- importBC_data(station, years = years)
xs <- dataByStation %>%
  mutate(pm25_instrument = paste0("ch_PM25_", pm25_instrument),
         pm25_2_instrument = paste0("ch_PM25_2_", pm25_2_instrument))

ggplot() +
  geom_point(data = xs, aes(x = date, y = pm25_raw, color = pm25_instrument), alpha = 0.5) +
  geom_point(data = xs, aes(x = date, y = pm25_2_raw, color = pm25_2_instrument), size = 1.5, alpha = 0.5) +
  theme_bw()

# by parameter
pm25 <- importBC_data("PM25", years = years)
dataByParameter <- pm25 %>% filter(STATION_NAME == station)

xp <- pivot_wider(dataByParameter, id_cols = c("DATE_PST", "DATE", "TIME", "STATION_NAME", "NAPS_ID", "INSTRUMENT"),
                  names_from = "PARAMETER",
                  values_from = "RAW_VALUE") %>%
  rename(date = DATE_PST)

ggplot() +
  geom_point(data = xp, aes(x = date, y = PM25, color = INSTRUMENT), alpha = 0.5) +
  theme_bw()

Wind data issues from 2016 - 2021

R Version 4.3.1 envair version 0.4.0.100.

Looking at wind data from 2016 - 2021 across BC:

# import wind data from 2016-2021
prevYrWind <- envair::importBC_data(
  parameter_or_station = c("wdir_vect",
                           "wspd_sclr"),
  years = 2016:2021,
  use_openairformat = FALSE
)

# look at summary table of data
prevYrWind %>%
  dplyr::distinct(year=lubridate::year(DATE_PST),
                  STATION_NAME_FULL,
                  STATION_NAME,
                  PARAMETER,
                  VALIDATION_STATUS)

Issues with the data:

  1. Additional years of data are imported:
prevYrWind %>%
  dplyr::distinct(year=lubridate::year(DATE_PST)) %>%
  dplyr::arrange(year)# imports data in years that were not requested
  2. There are many instances with NA for STATION_NAME_FULL and for VALIDATION_STATUS:
# there are NA's for STATION_NAME_FULL and for VALIDATION_STATUS
prevYrWind %>%
  dplyr::filter(is.na(STATION_NAME_FULL) & is.na(VALIDATION_STATUS)) #5,468,551 rows

# if it's the same number as is.na(STATION_NAME_FULL) | is.na(VALIDATION_STATUS),
# then it's always both of them that are NA
prevYrWind %>%
  dplyr::filter(is.na(STATION_NAME_FULL) | is.na(VALIDATION_STATUS)) #5,468,551 rows
# the above shows that STATION_NAME_FULL is always NA when VALIDATION_STATUS is NA

# summary of when these NA's occur:
prevYrWind %>%
  dplyr::filter(is.na(STATION_NAME_FULL) & is.na(VALIDATION_STATUS)) %>%
  dplyr::group_by(year=lubridate::year(DATE_PST),
                  STATION_NAME_FULL,
                  STATION_NAME,
                  PARAMETER) %>%
  dplyr::summarise(`# NA's`=n()) # there are lots of them at different stations, 
# in different years and for both WDIR_VECT and WSPD_SCLR
  3. For some stations 2021-01-01 00:00:00 has Level 2 for VALIDATION_STATUS, but all other hours in the year have Level 0. Shouldn't all of 2021 have Level 2 for VALIDATION_STATUS?
# look for duplicates by validation status. years 2016-2021 should only have 
# validation status =  Level 2
doubleValidationStatus<-prevYrWind %>%
  dplyr::distinct(STATION_NAME_FULL,
                  STATION_NAME,
                  PARAMETER,
                  year=lubridate::year(DATE_PST),
                  VALIDATION_STATUS,
                  INSTRUMENT) %>% 
  dplyr::arrange(year,STATION_NAME_FULL,PARAMETER,INSTRUMENT) %>%
  dplyr::group_by(STATION_NAME_FULL,
                  STATION_NAME,
                  PARAMETER,
                  INSTRUMENT,
                  year) %>%
  dplyr::filter(n()>1)

# which years does this happen for?
doubleValidationStatus %>%
  dplyr::ungroup() %>%
  dplyr::distinct(year) # only 2021

# where and when does it happen in 2021?
# all of the level 0 validation status in 2021 occurs jan 1 00:00, but not at all stations
prevYrWind %>%
  dplyr::filter(lubridate::year(DATE_PST)==2021) %>%
  dplyr::group_by(STATION_NAME_FULL,
                  STATION_NAME,
                  PARAMETER,
                  VALIDATION_STATUS) %>%
  dplyr::summarise(start=min(DATE_PST),
                   end=max(DATE_PST),
                   .groups = "drop") %>%
  dplyr::arrange(STATION_NAME,
                 STATION_NAME_FULL,
                 PARAMETER,
                 VALIDATION_STATUS) %>% # utils::View(.)
  dplyr::ungroup() %>%
  dplyr::filter(VALIDATION_STATUS=="Level 2") %>%
  dplyr::distinct(start,end)

Please let me know if you want me to create multiple issues from the list above.

INSTRUMENT NA's for Port Edward Sunset Drive 2022

Port Edward Sunset Drive has a subset of rows where INSTRUMENT is NA and RAW_VALUE is also NA throughout. I presume these rows can be deleted from the database.

portEd<-envair::importBC_data(parameter_or_station = "Port Edward Sunset Drive",
                              years=2022,
                              use_openairformat = FALSE)

portEd %>%
  dplyr::group_by(PARAMETER, INSTRUMENT) %>%
  dplyr::summarise(n_value = sum(!is.na(RAW_VALUE)),
                   n_na    = sum(is.na(RAW_VALUE)),
                   n_total = n())
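A minimal base-R sanity check of that presumption, using a toy data frame in place of the real portEd (the instrument name is made up; only the two relevant columns are reproduced):

```r
# Toy stand-in for the Port Edward data frame above
portEd_toy <- data.frame(
  INSTRUMENT = c("PM25_INSTRUMENT_X", "PM25_INSTRUMENT_X", NA, NA),
  RAW_VALUE  = c(5.2, 4.8, NA, NA)
)

# Rows with INSTRUMENT == NA; deleting them is safe only if
# RAW_VALUE is also NA for every one of them
na_rows <- portEd_toy[is.na(portEd_toy$INSTRUMENT), ]
all(is.na(na_rows$RAW_VALUE))  # TRUE for the toy data
```

On the real data, the same all(is.na(...)) check should return TRUE before any rows are dropped.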

It's Been a While Since This Repository has Been Updated

This issue is a kind reminder that your repository has been inactive for 181 days. Some repositories are maintained in accordance with business requirements that change infrequently, thus appearing inactive, and some repositories are inactive because they are unmaintained.

To help differentiate products that are unmaintained from products that do not require frequent maintenance, repomountie will open an issue whenever a repository has not been updated in 180 days.

  • If this product is being actively maintained, please close this issue.
  • If this repository isn't being actively maintained anymore, please archive this repository. Also, for bonus points, please add a dormant or retired life cycle badge.

Thank you for your help ensuring effective governance of our open-source ecosystem!

Missing AQHI PM2.5 Sparwood Centennial Square 2022

envair version 0.4.0.300

Sparwood Centennial Square has PM2.5 data during 2022, but the AQHI PM2.5 is not in the AQHI dataset:

aqhi<-envair::importBC_data(parameter_or_station = "aqhi",
                            years = 2022,
                            use_openairformat = FALSE) %>%
  dplyr::filter(STATION_NAME=="Sparwood Centennial Square")

# preview the data
head(aqhi %>%
       dplyr::select(PARAMETER,
                     DATE_PST,
                     STATION_NAME,
                     contains("AQHI")))

pm25<-envair::importBC_data(parameter_or_station = "pm25",
                            years = 2022,
                            use_openairformat = FALSE) %>%
  dplyr::filter(STATION_NAME=="Sparwood Centennial Square")

head(pm25 %>%
       dplyr::select(PARAMETER,
                     DATE_PST,
                     STATION_NAME,
                     RAW_VALUE))


importBC_data() date is POSIXct, UTC

The time zone attribute of the date column produced by importBC_data() is incorrectly assigned as UTC (it should be Etc/GMT+8, i.e. Pacific Standard Time).

A reprex:

data <- importBC_data("co", 2018)
(head(data, n = 5))
# A tibble: 5 x 15
date                station_name   station_name_full ems_id  naps_id raw_value rounded_value unit  instrument parameter owner region             validation_stat~ ws    wd   
<dttm>              <chr>          <chr>             <chr>     <dbl>     <dbl>         <dbl> <chr> <chr>      <chr>     <chr> <chr>              <chr>            <lgl> <lgl>
1 2018-01-01 01:00:00 Victoria Topaz VICTORIA TOPAZ    E231866  100304     0.204         0.204 ppm   CO_API300  CO        ENV   01 - Vancouver Is~ VALID            NA    NA   
2 2018-01-01 02:00:00 Victoria Topaz VICTORIA TOPAZ    E231866  100304     0.186         0.186 ppm   CO_API300  CO        ENV   01 - Vancouver Is~ VALID            NA    NA   
3 2018-01-01 03:00:00 Victoria Topaz VICTORIA TOPAZ    E231866  100304     0.199         0.199 ppm   CO_API300  CO        ENV   01 - Vancouver Is~ VALID            NA    NA   
4 2018-01-01 04:00:00 Victoria Topaz VICTORIA TOPAZ    E231866  100304     0.186         0.186 ppm   CO_API300  CO        ENV   01 - Vancouver Is~ VALID            NA    NA   
5 2018-01-01 05:00:00 Victoria Topaz VICTORIA TOPAZ    E231866  100304     0.164         0.164 ppm   CO_API300  CO        ENV   01 - Vancouver Is~ VALID            NA    NA 

(attr(data$date, "tzone"))
# [1] "UTC"

I see that importBC_data uses read_csv. Perhaps setting locale = locale(tz = "Etc/GMT+8") within read_csv will fix the issue?
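As a base-R sketch of the relabelling fix (no packages), assuming the clock times in the files are already correct Pacific Standard Time and only the tzone attribute is wrong:

```r
# A timestamp that was parsed as UTC
x <- as.POSIXct("2018-01-01 01:00:00", tz = "UTC")

# Relabel the time zone without shifting the clock time
# (base-R equivalent of lubridate::force_tz(x, "Etc/GMT+8"))
y <- as.POSIXct(format(x), tz = "Etc/GMT+8")

attr(y, "tzone")    # "Etc/GMT+8"
format(y, "%H:%M")  # still "01:00": the clock time is preserved
```

Setting locale = readr::locale(tz = "Etc/GMT+8") inside read_csv, as suggested, would parse the timestamps into that zone directly and avoid the relabelling step.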

Missing data for 2020 Penticton

There was a temporary station operating in Penticton in 2020. There should be at least PM2.5 data, but it appears to be missing when using the importBC_data() function. The data may be missing from the FTP server. See the example below.

library(envair)
library(openair)
library(tidyr)

station <- "Penticton Industrial Place"
years <- 2019:2021

pm25 <- importBC_data("PM25", years = years)

stationPM25 <- pm25 %>%
  filter(STATION_NAME == station) %>%
  select(-c(LONGITUDE, LATITUDE, ROUNDED_VALUE)) %>%
  rename(PM25 = RAW_VALUE,
         date = DATE_PST)

summaryPlot(stationPM25, na.len = 0)

Golden Helipad NA's

R version 4.3.1, envair version 0.4.0.100.

For Golden Helipad there are some entries with NA in the INSTRUMENT field, along with NAs in many other columns.

# import data: hourly PM2.5 for the last 10 years
data<-envair::importBC_data(parameter_or_station = "pm25",
                            years=2014:2023,
                            use_openairformat = FALSE
                            )

# there are entries with INSTRUMENT == NA in Golden's data
data %>%
  dplyr::filter(STATION_NAME == "Golden Helipad") %>%
  dplyr::filter(is.na(INSTRUMENT))


duplicates for golden helipad

R version 4.3.1, envair version 0.4.0.100.

envair::importBC_data(parameter_or_station = "pm25", years = 2021) %>%
  dplyr::filter(STATION_NAME == "Golden Helipad")

It contains duplicates as follows: the data values are identical, but each hour is listed with both Level 0 and Level 2 validation status.

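A base-R sketch of a de-duplication workaround, on toy data reproducing the pattern (identical values, both validation levels per hour), keeping the Level 2 row for each timestamp:

```r
# Toy reproduction: each hour appears twice, identical except for
# VALIDATION_STATUS
df <- data.frame(
  DATE_PST          = rep(c("2021-06-01 01:00", "2021-06-01 02:00"), each = 2),
  RAW_VALUE         = rep(c(4.1, 3.8), each = 2),
  VALIDATION_STATUS = rep(c("Level 0", "Level 2"), times = 2)
)

# Sort so "Level 2" sorts ahead of "Level 0" within each hour,
# then keep the first row per timestamp
df    <- df[order(df$DATE_PST, df$VALIDATION_STATUS, decreasing = TRUE), ]
dedup <- df[!duplicated(df$DATE_PST), ]
dedup$VALIDATION_STATUS  # both remaining rows are "Level 2"
```

On the real data the grouping key would also need STATION_NAME and PARAMETER; this only works around the symptom, so the duplicate rows should still be fixed at the source.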

Add project lifecycle badge

No Project Lifecycle Badge found in your readme!

Hello! I scanned your readme and could not find a project lifecycle badge. A project lifecycle badge will provide contributors to your project as well as other stakeholders (platform services, executive) insight into the lifecycle of your repository.

What is a Project Lifecycle Badge?

It is a simple image that neatly describes your project's stage in its lifecycle. More information can be found in the project lifecycle badges documentation.

What do I need to do?

I suggest you make a PR into your README.md and add a project lifecycle badge near the top where it is easy for your users to pick it up :). Once it is merged feel free to close this issue. I will not open up a new one :)

importBC_data() missing most recent month of data when station specified

Here is a reprex.

library(envair)

#Example 1: station(s) specified
fsj_1hr_data <- importBC_data("Fort St John", years = 2019)
tail(fsj_1hr_data[,1:5]) # does not work as expected; missing most recent ~1 month of data

#Example 2: parameter specified
co_1hr_data <-  importBC_data("CO", years = 2019)
tail(co_1hr_data[,1:5]) # works as expected; includes today's data

My hunch:

In Example 1, importBC_data() reads files in the Year_to_Date/STATION_DATA directory. The modified dates of files in this directory are about a month old.

In Example 2, importBC_data() reads files in the Year_to_Date directory. The modified dates of files in this directory are up to date.
