ropensci / gsodr

API Client for Global Surface Summary of the Day ('GSOD') Weather Data Client in R

Home Page: https://docs.ropensci.org/GSODR

License: Other

R 95.74% TeX 4.26%
r weather-data gsod ncdc weather-stations global-data data-access ncei weather weather-information

gsodr's Introduction

{GSODR}: Global Surface Summary of the Day ('GSOD') Weather Data Client

Project Status: Active – The project has reached a stable, usable state and is being actively developed.


Introduction

The Global Surface Summary of the Day (GSOD) data provided by the US National Centers for Environmental Information (NCEI) are a valuable source of weather data with global coverage. {GSODR} aims to make it easy to find, transfer and format the data you need for use in analysis, and provides five main functions to facilitate this:

  • get_GSOD() - this function queries and transfers files from the NCEI's web server, reformats them and returns a data frame.

  • reformat_GSOD() - this function takes individual station files from the local disk and re-formats them returning a data frame.

  • nearest_stations() - this function returns a data.table of the stations that fall within a given radius (kilometres) of a point given as latitude and longitude, with their metadata and their distance from that point, ordered from nearest to farthest.

  • update_station_list() - this function downloads the latest station list from the NCEI's server and updates the package's internal database of stations and their metadata.

  • get_inventory() - this function downloads the latest station inventory information from the NCEI's server and returns the header information about the latest version as a message in the console and a tidy data frame of the stations' inventory for each month that data are reported.
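The radius search described for nearest_stations() can be pictured with a small base-R sketch. The station table, its IDs and coordinates, and the helper names haversine_km() and stations_within() are all hypothetical and are not part of {GSODR}; the package's own implementation may well differ.

```r
# Hypothetical station table; IDs and coordinates are made up for illustration.
stations <- data.frame(
  STNID = c("000001-99999", "000002-99999", "000003-99999"),
  LAT = c(-27.55, -27.40, -33.90),
  LON = c(151.95, 153.10, 151.20),
  stringsAsFactors = FALSE
)

# Great-circle distance in kilometres using the haversine formula.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Keep the stations within `distance` km of the point, nearest first.
stations_within <- function(stations, lat, lon, distance) {
  stations$distance_km <- haversine_km(lat, lon, stations$LAT, stations$LON)
  hits <- stations[stations$distance_km <= distance, ]
  hits[order(hits$distance_km), ]
}

near <- stations_within(stations, lat = -27.56, lon = 151.95, distance = 150)
```

Returning the distance as a column next to the station ID makes it simple to take the nearest one, or the nearest few, with head(near, n).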

When reformatting data with either get_GSOD() or reformat_GSOD(), all units are converted to International System of Units (SI), e.g., inches to millimetres and Fahrenheit to Celsius. Output is returned as a data.table object summarising each year by station. The package also calculates additional elements from the original GSOD data and includes them in the output: vapour pressure (ea and es) and relative humidity, calculated using the improved August-Roche-Magnus approximation (Alduchov and Eskridge 1996).
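As a rough illustration of these conversions and derived values, the following base-R sketch converts Fahrenheit and inches to SI units and computes vapour pressure and relative humidity with a Magnus-form equation. The function names are hypothetical, and the coefficients (0.61094 kPa, 17.625, 243.04 °C) are the values commonly quoted from Alduchov and Eskridge (1996); whether {GSODR} uses exactly these values and units internally is an assumption.

```r
# Unit conversions to SI.
f_to_c <- function(f) (f - 32) * 5 / 9
inches_to_mm <- function(inches) inches * 25.4

# Saturation vapour pressure (kPa) at temperature `t` in degrees Celsius.
# Coefficients assumed from Alduchov and Eskridge (1996).
sat_vp_kpa <- function(t) 0.61094 * exp(17.625 * t / (t + 243.04))

temp_c <- f_to_c(77)       # mean air temperature, 25 degrees Celsius
dewp_c <- f_to_c(59)       # mean dew point, 15 degrees Celsius

es <- sat_vp_kpa(temp_c)   # saturation vapour pressure
ea <- sat_vp_kpa(dewp_c)   # actual vapour pressure, evaluated at the dew point
rh <- ea / es * 100        # relative humidity in percent
```

Because ea is the same function evaluated at the dew point, relative humidity reaches 100% exactly when dew point and air temperature are equal.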

For more information see the description of the data provided by NCEI, https://www.ncei.noaa.gov/data/global-summary-of-the-day/doc/readme.txt.

How to Install

Stable Version

A stable version of GSODR is available from CRAN.

install.packages("GSODR")

Development Version

A development version is available from GitHub. If you wish to install the development version, which may have new features or bug fixes before the CRAN version does (but also may not work properly), please install from the rOpenSci R Universe. We strive to keep the main branch on GitHub functional and working properly.

install.packages("GSODR", repos = "https://ropensci.r-universe.dev")

Other Sources of Weather Data in R

There are several other sources of weather data and ways of retrieving them through R. Several are also rOpenSci projects.

GSODTools, by Florian Detsch, is an R package that offers similar functionality to {GSODR}, but can also graph the data and work with it for time series analysis.

rnoaa, from rOpenSci offers tools for interacting with and downloading weather data from the United States National Oceanic and Atmospheric Administration but lacks support for GSOD data.

stationaRy, from Richard Iannone offers hourly meteorological data from stations located all over the world. There is a wealth of data available, with historic weather data accessible from nearly 30,000 stations.

riem, from rOpenSci, retrieves weather data from Automated Surface Observing System (ASOS) stations (airports) around the world via the Iowa Environment Mesonet website.

weathercan from rOpenSci makes it easier to search for and download multiple months/years of historical weather data from Environment and Climate Change Canada (ECCC) website.

clifro from rOpenSci is a web portal to the New Zealand National Climate Database and provides public access (via subscription) to around 6,500 various climate stations (see https://cliflo.niwa.co.nz/ for more information). Collating and manipulating data from CliFlo (hence clifro) and importing into R for further analysis, exploration and visualisation is now straightforward and coherent. The user is required to have an Internet connection, and a current CliFlo subscription (free) if data from stations, other than the public Reefton electronic weather station, is sought.

Notes

NOAA policy

Users of these data should take into account the following:

The data summaries provided here are based on data exchanged under the World Meteorological Organization (WMO) World Weather Watch Program according to WMO Resolution 40 (Cg-XII). This allows WMO member countries to place restrictions on the use or re-export of their data for commercial purposes outside of the receiving country. Data for selected countries may, at times, not be available through this system. Those countries' data summaries and products which are available here are intended for free and unrestricted use in research, education, and other non-commercial activities. However, for non-U.S. locations' data, the data or any derived product shall not be provided to other users or be used for the re-export of commercial services.

Meta

  • Please report any issues or bugs.

  • License: MIT

  • To cite {GSODR}, please use: Adam H. Sparks, Tomislav Hengl and Andrew Nelson (2017). GSODR: Global Summary Daily Weather Data in R. The Journal of Open Source Software, 2(10). DOI: 10.21105/joss.00177.

Code of Conduct

Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

References

Alduchov, O.A. and Eskridge, R.E., 1996. Improved Magnus form approximation of saturation vapor pressure. Journal of Applied Meteorology and Climatology, 35(4), pp. 601-609 DOI: 10.1175/1520-0450(1996)035<0601:IMFAOS>2.0.CO;2.

gsodr's People

Contributors

adamhsparks, cboettig, karthik, noamross


gsodr's Issues

Code review badge isn't visible

@karthik the new badge for the getCRUdata package works, but for GSODR it's not appearing.

I can't find anything in the code that would prevent this from working.

Any thoughts?

No support for *.op.gz files in reformat_GSOD()?

I may be confusing something here, but I think that a while back, reformat_GSOD() worked with *.op.gz files obtained from NOAA ftp.
It seems like this is no longer the case. Has something changed?

This is not an issue per se, it's a confusion. I couldn't rerun my code from a year ago, and I'm trying to see what changes I need to make to the code to make it work.

Check individual station values

Users can enter any value for a station. Unlike years and the file path, station values are not checked for validity or for existence in the database.
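One way such a check could look is sketched below in base R. The function name check_request(), the ID pattern (six characters, a hyphen, five digits) and the default first year of 1929 are assumptions made for illustration, not the package's actual validation code.

```r
# Collect human-readable problems with a request instead of failing on the
# first one. `known_ids` stands in for the package's station database.
check_request <- function(station, years, known_ids, first_year = 1929L) {
  this_year <- as.integer(format(Sys.Date(), "%Y"))
  problems <- character(0)

  # Assumed ID shape: USAF code (6 characters), hyphen, WBAN code (5 digits).
  malformed <- station[!grepl("^[0-9A-Z]{6}-[0-9]{5}$", station)]
  if (length(malformed) > 0) {
    problems <- c(problems, paste("malformed station ID:", malformed))
  }

  # Well-formed IDs must also exist in the database.
  unknown <- setdiff(station, c(malformed, known_ids))
  if (length(unknown) > 0) {
    problems <- c(problems, paste("station not in database:", unknown))
  }

  bad_years <- years[years < first_year | years > this_year]
  if (length(bad_years) > 0) {
    problems <- c(problems, paste("year out of range:", bad_years))
  }

  problems
}

known <- c("955510-99999", "489300-99999")
ok <- check_request("955510-99999", 2010, known)
bad <- check_request("95551", 1900, known)
```

An empty result means the request passed every check; otherwise the messages can be joined and passed to stop().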

nearest_stations() distance info

Thanks for doing this
Couple of points

  1. There does not appear to be any difference in the time taken to return a list of stations for a distance of 10 or 100 km. In my test, 4 are returned in case 1 and 107 in case 2; both take around 10 secs. Is this just the way it is, or could the former be sped up?

  2. The R code implies that you could return a distance (miles=FALSE). Can you set an option so that the distance could also be returned with the station ID (presumably as a data.frame), so that the nearest, or nearest 5, say, could be identified for further processing?

Thanks

How do I plot temperature and precipitation for 3 stations in Tanzania from 1957 to 2016?

Dear Sparks,
Thanks for the GSODR package tutorial provided.
Using the Global Summary of the Day data from NOAA, I want to plot temperature (mean daily minimum temperatures, mean daily maximum temperatures), precipitation (total precipitation, average precipitation) for three Stations in Tanzania from 1957 to 2016 (station IDs: Kigoma, Zanzibar, and Mtwara).
Then repeat the process for the Tanzania as a country.
I read the example for Toowoomba, Queensland for 2010 provided on GitHub. I wanted to start with your example and then adapt the code to solve my problems. However, I have not been able to do this. I also tried to replicate the exact examples as presented for Toowoomba, Queensland for 2010, but still could not do it.
See my attempts below with the message I get. I will appreciate any assistance with how to plot the variables described above for the three stations in Tanzania and for the Country Tanzania from 1957 to 2016. I am not proficient in the use of R.

Thanks .

[Attached images: kigoma, queensland]

"Error in read_connection_(con):" when writing to csv

I can retrieve the station data as a data frame when I first open my script, but only once. After that, I get this error for any further efforts to retrieve station data, whether as a data frame or as a csv:

[Screenshot of the error message]

Is this an issue with the server?

Max-missing only works for complete years

If max-missing is specified for an incomplete year, e.g. download 2017 in November 2017, it will omit all stations and data.

Check to see if function is downloading current year and warn user that using max-missing is not possible for current year data.
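A minimal sketch of such a check in base R follows. The helper name check_max_missing() and its arguments are hypothetical; the `today` argument exists only so the behaviour can be demonstrated for a fixed date.

```r
# Return TRUE when max_missing can be applied to every requested year.
# Warn and return FALSE when the current, still incomplete year is requested.
check_max_missing <- function(years, max_missing = NULL, today = Sys.Date()) {
  current_year <- as.integer(format(today, "%Y"))
  if (!is.null(max_missing) && current_year %in% years) {
    warning(
      "`max_missing` cannot be applied to the current, incomplete year (",
      current_year, ").",
      call. = FALSE
    )
    return(FALSE)
  }
  TRUE
}

# Requesting 2017 in November 2017 triggers the warning; 2016 does not.
nov_2017 <- as.Date("2017-11-15")
complete_year <- check_max_missing(2016, max_missing = 5, today = nov_2017)
current_year <- suppressWarnings(
  check_max_missing(2017, max_missing = 5, today = nov_2017)
)
```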

Typo resulting in "Error in readChar [...]"

There is a typo in line 160 of get_GSOD():

  • load(system.file("extdata", "country_list.Rda", package = "GSODR"))
  • Rda should be rda

Resulting in the following error:

Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
In addition: Warning message:
In readChar(con, 5L, useBytes = TRUE) :
cannot open compressed file '', probable reason 'No such file or directory'


R version 3.4.1 (2017-06-30) -- "Single Candle"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
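The empty file name in the error message follows from how system.file() behaves: when no file matches, it returns an empty string rather than raising an error, and load() then fails on ''. File names are case-sensitive on Linux, so "Rda" and "rda" are different files there. A small guard makes the failure easier to read; the helper name safe_system_file() is hypothetical, and the example looks up files in the base `stats` package only so that it runs without {GSODR} installed.

```r
# Wrap system.file() so a missing file fails with a clear message
# instead of silently returning "".
safe_system_file <- function(..., package) {
  path <- system.file(..., package = package)
  if (!nzchar(path)) {
    stop(
      "File not found in package '", package, "': ", file.path(...),
      call. = FALSE
    )
  }
  path
}

# An existing file returns its full path.
found <- safe_system_file("DESCRIPTION", package = "stats")

# A wrong name now produces an informative error.
missing_msg <- tryCatch(
  safe_system_file("extdata", "no_such_file.Rda", package = "stats"),
  error = function(e) conditionMessage(e)
)
```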

Low precipitation values

The precipitation values appear lower than expected/observed and some of data are also missing. Is there anything I can do about these low values? How do I reduce the number of values removed from the plots?

[Attached plots: rplot01, western precipitation]

Add ability to interpolate missing temperature values

Often there are several missing min/max temperature values or stations missing several days of complete sets of data.

Several options exist for interpolating missing station data using within and between station data.

Add ability to interpolate missing values for temperature, min temperature and max temperature.
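The simplest within-station option is linear interpolation over time, sketched here with stats::approx(). The function name fill_gaps() and the sample values are made up for illustration, and between-station methods are not covered.

```r
# Daily mean temperatures for one station, with missing days.
temp <- c(20.1, NA, NA, 23.1, 22.0, NA, 21.0)

# Linearly interpolate interior gaps. Leading and trailing gaps stay NA
# because approx() with rule = 1 does not extrapolate.
fill_gaps <- function(x, t = seq_along(x)) {
  ok <- !is.na(x)
  if (sum(ok) < 2 || all(ok)) {
    return(x)  # nothing to interpolate from, or nothing to fill
  }
  x[!ok] <- stats::approx(t[ok], x[ok], xout = t[!ok], rule = 1)$y
  x
}

filled <- fill_gaps(temp)
```

The same function could be applied to the minimum and maximum temperature columns, ideally with a limit on how long a gap may be before it is left missing.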

Improve test coverage

With the newly rewritten functionality test coverage has taken a hit. Need to write better unit tests.

paper.md lacks all authors

@thengl, I don't know what's happened, but your edits to the paper.md file have gone missing and I cannot find them in git anywhere.

I moved the /paper directory to /inst/paper and I think perhaps you were editing the version in the previous location and the changes got clobbered with the moving directories around?

Can you check and update your information and edits in the current location of the file? I see that there is a .png file in the /paper directory. That directory should not exist in the current master or devel branches.

RH >100%

RH is sometimes calculated as being >100%. This means that ea and es are also incorrect.

Check TDEW and TEMP relationships in these data to see what relationship exists in these instances and correct or remove spurious data.
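If RH is computed as ea / es and both come from the same saturation vapour pressure curve, which increases with temperature, then RH above 100% can only occur when the reported dew point exceeds the reported air temperature. A base-R sketch for flagging such rows follows; the column names TEMP and DEWP and the helper name flag_suspect_dewpoint() are assumptions for illustration.

```r
# Example observations in degrees Celsius; the third row is inconsistent.
obs <- data.frame(
  TEMP = c(25.0, 10.0, 3.2, NA),
  DEWP = c(15.0, 10.0, 4.1, 2.0)
)

# Flag rows where dew point exceeds air temperature.
# Rows with a missing value are not flagged.
flag_suspect_dewpoint <- function(temp, dewp) {
  !is.na(temp) & !is.na(dewp) & dewp > temp
}

obs$suspect <- flag_suspect_dewpoint(obs$TEMP, obs$DEWP)
```

Flagged rows can then be inspected, corrected or dropped before ea, es and RH are calculated.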

Integrate information from isd-inventory.txt into GSODR

From the text file header description:

THIS INVENTORY SHOWS THE NUMBER OF WEATHER OBSERVATIONS BY STATION-YEAR-MONTH FOR BEGINNING OF RECORD THROUGH SEPTEMBER 2017. THE DATABASE CONTINUES TO BE UPDATED AND ENHANCED, AND THIS INVENTORY WILL BE UPDATED ON A REGULAR BASIS.

ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-inventory.txt

get_gsod returns additional station data

I tried to download data from 2 stations with this piece of code

install.packages("GSODR")
library(GSODR)
library(dplyr)

load(system.file("extdata", "country_list.rda", package = "GSODR"))
load(system.file("extdata", "isd_history.rda", package = "GSODR"))

a <- get_GSOD(years = 2010, station = "489300-99999")    
b <- get_GSOD(years = 2010, station = "489260-99999")

I don't understand why b contains data from a:

[Screenshot of the returned data]

I used package version 1.3.1 and R version 3.5.0.

Many thanks for your help!

reformat_GSOD() runs infinitely with 0 activity

So, this is a strange issue which I'm not sure how to diagnose.
reformat_GSOD() does not behave as expected. It runs, but with no end, no output, and no CPU/GPU activity shown in Activity Monitor on the Mac. It's like nothing is happening at all.

I narrowed everything down to one single file, but nothing.

Extra info: this is the first attempt to run the function after a clean install of R, RStudio, and all packages on a new MBP with M1 chip. Not sure if this is relevant at all

─ Session info ──────────────
setting value
version R version 4.0.3 (2020-10-10)
os macOS Big Sur 10.16
system x86_64, darwin17.0
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2021-01-11

GSODR * 2.1.2.9000 2021-01-11 [1] Github (63b1ea8)

library(GSODR)

list.files("data/gsod/2010/")
[1] "02431099999.csv"

data <- reformat_GSOD(dsn = "data/gsod/2010")

Error in open.connection(con, "rb") : Timeout was reached

Just tried your function using the following setting:

GSODR::get_GSOD(years = 1973:2015, path = "~/Downloads/", station = "080250-99999")

Failing with the error above. If I set years to 2003:2015, everything works.

Thanks for checking!

Release GSODR 3.1.0

Prepare for release:

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • Tweet

Merge .csv files for all years

@PRoyal22 suggested the ability to merge the CSV files for all queried years of a single station.

This is a desirable and easy to implement feature for a single station and include in the get_GSOD() function. To do this for all years queried for the complete GSOD set or agroclimatology might not be desirable simply due to the size of the final files that are generated for each year.
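For a single station, the merge amounts to reading each yearly file and row-binding the results. The base-R sketch below writes three small demo files to a temporary directory first so that it is self-contained; the file names and columns are hypothetical and do not reflect the files that get_GSOD() writes.

```r
# Create one small CSV per year for a made-up station.
out_dir <- file.path(tempdir(), "gsod_merge_demo")
dir.create(out_dir, showWarnings = FALSE)
for (yr in 2010:2012) {
  write.csv(
    data.frame(STNID = "000001-99999", YEAR = yr, TEMP = c(20.5, 21.0)),
    file.path(out_dir, paste0("station_", yr, ".csv")),
    row.names = FALSE
  )
}

# Read every yearly file and stack them into one data frame.
files <- list.files(
  out_dir,
  pattern = "^station_[0-9]{4}\\.csv$",
  full.names = TRUE
)
merged <- do.call(rbind, lapply(files, read.csv))
```

Row-binding in memory is fine for one station; for the complete GSOD set the size concern raised above still applies.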

get_GSOD bug

I tried running the following code:

library(GSODR)
g <- get_GSOD(years='2016', country='NZ', path=tempdir())

And I received the error message:

Error in { : task 1 failed - "task 1 failed - "unrecognized date format""

Below is my sessionInfo():

R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.5.0 GSODR_0.1.9

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.7       lattice_0.20-33   codetools_0.2-14  foreach_1.4.3    
 [5] assertthat_0.1    chron_2.3-47      grid_3.3.1        R6_2.1.3         
 [9] DBI_0.5-1         magrittr_1.5      settings_0.2.4    stringi_1.1.1    
[13] curl_2.1          data.table_1.9.6  doParallel_1.0.10 raster_2.5-8     
[17] sp_1.2-3          iterators_1.0.8   tools_3.3.1       stringr_1.1.0    
[21] parallel_3.3.1    compiler_3.3.1    tibble_1.2  

EDIT: This was with the CRAN version of GSODR

Country list problem

Hi,

When I'm trying to get the country_list with this code:

load(system.file("extdata", "country_list.rda", package = "GSODR"))

R shows me the next error:

Error in readChar(con, 5L, useBytes = TRUE) :
cannot open the connection
In addition: Warning message:
In readChar(con, 5L, useBytes = TRUE) :
cannot open compressed file '', probable reason 'Invalid argument'

Any idea what is going wrong? I'm trying the same with:

load(system.file("extdata", "isd_history.rda", package = "GSODR"))

and is working fine..

Thanks in advance!

About `update-tic.yml`

Hi Adam,

in order for update-tic.yml to work, you need to add a secret with "workflow" scopes named "TIC_UPDATE" to your repo.

This can be easily done via tic::gha_add_secret(<secret>, "TIC_UPDATE").

Just letting you know since you are one of the few persons who were keen enough to try this new functionality :)

reformat_GSOD() does not use file_list input parameter

Session Info

See reformat_GSOD() source code below. Object file_list is an input parameter, but it is reassigned automatically when dsn is not NULL, and input value is ignored. I suppose file_list is only used when input dsn is NULL, but that is not clear in the package documentation, and the required format for file_list is also not clear from documentation. Thank you for the package!

Alison


reformat_GSOD <- function(dsn = NULL, file_list = NULL) {
  isd_history <- NULL
  load(system.file("extdata", "isd_history.rda", package = "GSODR"))
  setkeyv(isd_history, "STNID")
  # When dsn is supplied, file_list is overwritten and the input is ignored
  if (!is.null(dsn)) {
    file_list <- list.files(path = dsn, pattern = "^.*\\.csv$",
                            full.names = TRUE)
    if (length(file_list) == 0)
      stop("No files were found, please check your file location.")
  }
  GSOD_XY <- .apply_process_csv(file_list, isd_history)
  return(GSOD_XY)
}
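One way to make the relationship between the two arguments explicit is to resolve them in a single helper that refuses ambiguous input. This is only a sketch of a possible design; resolve_files() is a hypothetical name and is not how the package currently behaves.

```r
# Decide which files to process: either every CSV in `dsn`,
# or exactly the paths given in `file_list`, never a silent mix.
resolve_files <- function(dsn = NULL, file_list = NULL) {
  if (!is.null(dsn) && !is.null(file_list)) {
    stop("Supply either `dsn` or `file_list`, not both.", call. = FALSE)
  }
  if (!is.null(dsn)) {
    file_list <- list.files(dsn, pattern = "\\.csv$", full.names = TRUE)
  }
  if (length(file_list) == 0) {
    stop("No files were found, please check your file location.",
         call. = FALSE)
  }
  file_list
}

# Demonstration with a temporary directory holding one CSV file.
demo_dir <- file.path(tempdir(), "gsod_resolve_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_file <- file.path(demo_dir, "02431099999.csv")
writeLines("STATION,DATE", demo_file)

from_dsn <- resolve_files(dsn = demo_dir)
from_list <- resolve_files(file_list = demo_file)
```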

Improve test coverage

There has been some slippage with changes in the code over time. It's time to improve the test coverage again before releasing a new version.

get_GSOD not downloading data from selected year

  • I am trying to download data for a specific year-station. For example:

Lond_2005 <- get_GSOD(years = 2005, station = "837660-99999")

It is returning only 2010, irrespective of the chosen year.

$ YEAR <chr> "2010", "2010", "2010", "2010", "2010", "2010",...

  • Is there a way to download data from multiple/interval of years? something like c(2010:2014)

Give option to generate shape file as well as CSV

T. Hengl's original script also generates a shape file.

Incorporate this functionality as an option into the get_GSOD function so that the user can select a CSV (default output), shape file (optional) or both (optional).

Error with get_GSOD's first example

Typing the first example of the get_GSOD's help produces an error:

> t <- get_GSOD(years = 2010, station = "955510-99999")
Error in .f(.x[[i]], ...) : 
955510-99999 is not a valid station ID number, please check your entry.
Valid Station IDs can be found in the isd-history.txt file
available from the US NCEI FTP server by combining the USAF and WBAN
columns, e.g., '007005' '99999' is '007005-99999' from this file 
<ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-history.txt>

Updated badge links but Travis build fails

@sckott,
I'm unable to switch on the Travis to build this package since I'm not an administrator any longer now so the build fails.

I've updated the links in the documentation. Just need to fix things up on the Travis side of things, I think?

Batch Processing

Is there a way to do batch processing if I have multiple stations? I have tried using a vector and passing station[1] to get_gsod, which it did not accept. I also tried using the paste command, but got this error: We've tried to get the file(s) you requested six times, but the server is not responding, so we are unable to process your request now. Please try again later.

Thanks,

Shaffiq
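A generic pattern for this kind of batch request is to loop over the station IDs, retry each one a few times, and record the failures rather than stopping at the first error. In the base-R sketch below, fetch_all() and fake_fetch() are hypothetical; in a real session the `fetch_one` argument might wrap a call such as get_GSOD(years = 2010, station = id), which is not run here because it needs network access.

```r
# Fetch data for several stations, retrying each up to `tries` times.
# `fetch_one` is any function that takes one station ID and returns a
# data frame, or raises an error on failure.
fetch_all <- function(stations, fetch_one, tries = 3, wait = 0) {
  results <- lapply(stations, function(id) {
    for (i in seq_len(tries)) {
      out <- tryCatch(fetch_one(id), error = function(e) NULL)
      if (!is.null(out)) {
        return(out)
      }
      Sys.sleep(wait)  # pause before the next attempt
    }
    NULL  # every attempt failed
  })
  failed <- vapply(results, is.null, logical(1))
  list(
    data = do.call(rbind, results[!failed]),
    failed = stations[failed]
  )
}

# Stand-in for a real download: one station always fails.
fake_fetch <- function(id) {
  if (id == "000002-99999") stop("server not responding")
  data.frame(STNID = id, TEMP = 20, stringsAsFactors = FALSE)
}

res <- fetch_all(
  c("000001-99999", "000002-99999", "000003-99999"),
  fake_fetch
)
```

The `failed` element lists the stations that can be retried later, for example once the server is responding again.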
