
pestr's Introduction

pestr


CRAN_Status_Badge Lifecycle: maturing License: MIT Build Status R-CMD-check codecov

The functions in this package let users painlessly connect to and extract data from EPPO Data Services / EPPO Global Database. Before you start using it, register at EPPO Data Services and obtain a token for access to the REST API.

Installation

You can install the released version of pestr from CRAN with:

install.packages("pestr")

Or you can install the development version of pestr from GitHub with:

# install.packages("devtools")
devtools::install_github("mczyzj/pestr")

Overview and Usage

The package includes the function eppo_database_download, which downloads the EPPO SQLite database (the archive is around 12 MB, and around 45 MB after extraction). The database is needed to extract the eppocodes used by the other functions in this package. Functions in the eppo_tabletools_ family return both:

  • a table of raw results in so-called long format (machine friendly);
  • a processed, compact table that contains all information in one row per pest.
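
To make the difference between the two formats concrete, here is a minimal sketch in base R on made-up data. The column names eppocode and host, the codes and the values are illustrative assumptions, not actual pestr output.

```r
# Toy long-format table: one row per pest-host pair (invented data).
long_tbl <- data.frame(
  eppocode = c("CODE_A", "CODE_A", "CODE_B"),
  host     = c("Malus domestica", "Prunus avium", "Vitis vinifera"),
  stringsAsFactors = FALSE
)

# Collapse to the compact format: one row per eppocode,
# with all hosts pasted into a single cell.
compact_tbl <- aggregate(
  host ~ eppocode,
  data = long_tbl,
  FUN  = function(x) paste(sort(unique(x)), collapse = "; ")
)

print(compact_tbl)
```

The long table is easier to filter and join; the compact one is easier to read in a report or spreadsheet.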

Before using functions that connect to the REST API (hosts, categorization, taxonomy and pests), call the create_eppo_token function with your personal EPPO Data Services token as a string argument. This function creates the global variable eppo_token, which should be passed as an argument to functions that require a token argument.
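
If you would rather not hard-code the token in your scripts, one option is to read it from an environment variable. This is only a sketch: the helper read_eppo_token and the variable name EPPO_TOKEN are assumptions made for the example, not part of pestr.

```r
# Read the token from an environment variable instead of hard-coding it.
# The variable name EPPO_TOKEN is an assumption made for this sketch.
read_eppo_token <- function(var = "EPPO_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable ", var, " is not set.", call. = FALSE)
  }
  token
}

# Usage (needs pestr and a valid token, so it is left commented out):
# eppo_token <- pestr::create_eppo_token(read_eppo_token())
```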

eppo_table_full executes all of these functions and returns a compact table with information on names, hosts, categorization, distribution and taxonomy, with one row per pest.

Feel free to contribute to this package and report issues via GitHub or email.

Example workflow

First, create a token variable using your token from EPPO Data Services.

#make basic checks and store your EPPO token in a variable
eppo_token <- pestr::create_eppo_token('<<your token>>')

Then:

  • on Linux: use the function below to automatically download and unzip the SQLite database from EPPO
#download SQLite database
pestr::eppo_database_download()
  • on Windows: download the SQLite database using:
#by default it downloads to the working directory
#you can override this behaviour with the filepath argument
pestr::eppo_database_download()

and extract the file manually to the project working directory.
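
The manual extraction step can also be scripted with base R. The sketch below assumes the download is a regular zip archive. The helper name extract_eppo_archive and the file name eppocodes.zip are placeholders, so check the actual file name in your working directory.

```r
# Extract a downloaded zip archive into a target directory using base R.
extract_eppo_archive <- function(zip_path, exdir = getwd()) {
  if (!file.exists(zip_path)) {
    stop("Archive not found: ", zip_path, call. = FALSE)
  }
  # utils::unzip() returns the paths of the extracted files (invisibly)
  utils::unzip(zip_path, exdir = exdir)
}

# Usage, with a placeholder file name:
# extract_eppo_archive("eppocodes.zip")
```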

Put all the names that you are looking for into a vector, e.g.:

#use some pests names and store them in a variable
pests <- c('Xylella', 'Cydia packardi', 'Drosophila suzuki')

and connect to the database, as in the code below:

# store SQL connection in a variable
eppo_SQLite <- pestr::eppo_database_connect()

Names of pests

Get pest names using:

# which names from the pests variable can be found in the SQLite database
# results of this function can be used as an input for the eppo_tabletools
# family of functions
pests_names_tables <- pestr::eppo_names_tables(pests, eppo_SQLite)

As a result you will get a list containing 4 tables:

  • data frame with names that are present in EPPO Data Services;
  • data frame with names that are not present in EPPO Data Services;
  • data frame with preferred names and eppocodes from EPPO Data Services;
  • data frame with all names associated with the eppocodes from the third data frame.

You can pass the results of this function directly to the eppo_tabletools_ functions to obtain data. Alternatively, you can use raw eppocodes as the argument (this workflow is explained in the vignettes).

Categorization of pests

Using:

# check category of pests using results of eppo_names_tables
pests_cat <- pestr::eppo_tabletools_cat(pests_names_tables, eppo_token)

you will get a list with two elements:

  • data frame with categorization tables for each eppocode in long format;
  • a single data frame with categorization for each eppocode condensed into a single cell.

Hosts of pests

# find hosts of pests using results of eppo_names_tables
pests_hosts <- pestr::eppo_tabletools_hosts(pests_names_tables, eppo_token)

The result is a list with two tables:

  • first is a data frame in long format with all data for all pests;
  • second is a data frame where hosts are combined into single cell for each eppocode.

Taxonomy of pests and hosts

To get taxonomy use:

# get taxonomy of pests and hosts using results of eppo_names_tables
pests_taxo <- pestr::eppo_tabletools_taxo(pests_names_tables, eppo_token)

This function returns a list of two data frames:

  • first is a long-format table;
  • second is a table with the ‘main category’ of each eppocode.

Distribution of pests

The function extracting distribution from EPPO Global Database does not need eppo_token. It can be called like:

# returns distribution of pests using results of eppo_names_tables
pest_distri <- pestr::eppo_tabletools_distri(pests_names_tables)

The result is a two element list:

  • first one contains data frame of distribution in long format;
  • second contains single cell of distribution for each eppocode.

Names, categorization, distribution, taxonomy and hosts of pests in one shot

Whole condensed table in one shot:

# return condensed table with names, categorization, distribution, taxonomy and
# hosts of pests
eppo_fulltable <- pestr::eppo_table_full(pests, eppo_SQLite, eppo_token)

which you can easily save as csv and use in a spreadsheet:

write.csv(eppo_fulltable, 'eppo_fulltable.csv')
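
By default write.csv also writes row names as an extra unnamed first column, which is rarely wanted in a spreadsheet. A small sketch with a stand-in data frame (the columns and values are invented; the real table comes from eppo_table_full):

```r
# Stand-in for the table returned by eppo_table_full (invented columns).
example_tbl <- data.frame(
  eppocode = c("CODE_A", "CODE_B"),
  hosts    = c("host 1; host 2", "host 3"),
  stringsAsFactors = FALSE
)

csv_path <- file.path(tempdir(), "eppo_fulltable.csv")

# row.names = FALSE suppresses the extra column of row numbers
write.csv(example_tbl, csv_path, row.names = FALSE)

# Read the file back to confirm the columns survived the round trip
round_trip <- read.csv(csv_path, stringsAsFactors = FALSE)
print(names(round_trip))
```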

Pests of hosts

Since EPPO Data Services provides information on the pests of a particular host, you can easily access this information with:

#make vector with names of hosts
hosts <- c("Abies alba", "Triticum")

# query SQLite database to obtain valid names of hosts
hosts_names_tables <- pestr::eppo_names_tables(hosts, eppo_SQLite)

# use results of previous query to find pests of hosts
hosts_pests <- pestr::eppo_tabletools_pests(hosts_names_tables, eppo_token)

Please cite

Please remember to cite this package AND EPPO resources:

#to get citation of pestr package
citation("pestr")
#to get citation of EPPO Global Database
pestr::eppo_citation("global_database")
#to get citation of EPPO Data Services
pestr::eppo_citation("data_services")
#shortcut to get citation of both EPPO resources
pestr::eppo_citation("global_both")

For more details on using the pestr package, please check the vignettes.

TODO:

  • Internationalization
  • Connection to pestrPRA

pestr's People

Contributors

mczyzj


pestr's Issues

Fail Gracefully

Adjust REST and download functions to follow the Fail Gracefully CRAN policy.

Test functions

Create more sensible tests for eppo_tabletools. At the moment the tests are more like a self-fulfilling prophecy.

eppo_tabletools_distri subscript out of bounds error

Hello,

Thanks for the cool package :)

I've installed the most recent version (0.8.2) and am encountering an error when trying to retrieve pest distributions. I was able to retrieve other metadata (e.g. taxonomy, hosts etc) without problems, but when I try to get distributions, the following error arises:

> pests <- c("Anastrepha ludens", "Drosophila suzukii")
> pests_names_tables <- eppo_names_tables(pests, eppo_SQLite)
> pest_distri <- eppo_tabletools_distri(pests_names_tables, eppo_token)
New names: `` -> `...6`, `` -> `...7`, `` -> `...8`, `` -> `...9`, `` -> `...10`, `` -> `...11`, `` -> `...12`, `` -> `...13`, `` -> `...14`, `` -> `...15`, `` -> `...16`, `` -> `...17`, `` -> `...18`, `` -> `...19`, `` -> `...20`, `` -> `...21`, `` -> `...22`, `` -> `...23`, `` -> `...24`, `` -> `...25`
New names: `` -> `...6`, `` -> `...7`, `` -> `...8`, `` -> `...9`, `` -> `...10`, `` -> `...11`, `` -> `...12`, `` -> `...13`, `` -> `...14`, `` -> `...15`, `` -> `...16`, `` -> `...17`, `` -> `...18`, `` -> `...19`, `` -> `...20`, `` -> `...21`, `` -> `...22`, `` -> `...23`, `` -> `...24`, `` -> `...25`
The distribution file for EPPO code ANSTLU was not found.
The distribution file for EPPO code DROSSU was not found.
Error in distri_lists[[i]] : subscript out of bounds

It appears that the eppocodes and URLs are being generated as expected:

Browse[1]> eppocodes
[1] "ANSTLU" "DROSSU"
Browse[1]> distri_urls
[1] "https://gd.eppo.int/taxon/ANSTLU/download/distribution_csv" "https://gd.eppo.int/taxon/DROSSU/download/distribution_csv"
Browse[1]> names_tables$exist_in_DB
  codeid           fullname
1   4669  Anastrepha ludens
2   9518 Drosophila suzukii

When I do a manual search on the EPPO website, the data is there. When I paste the above URLs into a browser window, the files download and contain data. However, the column format of the EPPO file appears to differ from what pestr expects: in addition to the columns "continent", "country", "state", "country code", "state code", "Status", there are now also several other unnamed columns.

I think this is causing eppo_csv_download to throw an error at this point:

if (!all(names(distri_lists[[i]]) %in%
             c("continent", "country", "state",
               "country code", "state code", "Status"))) {
      message(msg_helper("no_distri", i))
      distri_lists[[i]] <- NULL

When I run the preceding code in debug mode it seems to work ok:

Browse[1]> for (i in 1:length(distri_lists)) {
+     distri_lists[[i]] <- eppo_try_urls(distri_urls[i]) %>%
+         httr::content(type = "text/csv",
+                       encoding = "UTF-8",
+                       col_types = readr::cols()) %>%
+         as.data.frame()
+ }
New names: `` -> `...6`, `` -> `...7`, `` -> `...8`, `` -> `...9`, `` -> `...10`, `` -> `...11`, `` -> `...12`, `` -> `...13`, `` -> `...14`, `` -> `...15`, `` -> `...16`, `` -> `...17`, `` -> `...18`, `` -> `...19`, `` -> `...20`, `` -> `...21`, `` -> `...22`, `` -> `...23`, `` -> `...24`, `` -> `...25`
New names: `` -> `...6`, `` -> `...7`, `` -> `...8`, `` -> `...9`, `` -> `...10`, `` -> `...11`, `` -> `...12`, `` -> `...13`, `` -> `...14`, `` -> `...15`, `` -> `...16`, `` -> `...17`, `` -> `...18`, `` -> `...19`, `` -> `...20`, `` -> `...21`, `` -> `...22`, `` -> `...23`, `` -> `...24`, `` -> `...25`
Browse[1]> distri_lists[["DROSSU"]][1,]
  country   state country code state code Status                             ...6 ...7 ...8 ...9 ...10 ...11 ...12 ...13
1  Africa Algeria         <NA>         DZ   <NA> Present, restricted distribution   NA   NA   NA    NA    NA    NA    NA
  ...14 ...15 ...16 ...17 ...18 ...19 ...20 ...21 ...22 ...23 ...24 ...25 continent
1    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA        NA

Lastly, I noticed that EPPO seems to have scrambled the column order - 'continent' is the last column name, but the continent data is in the first column.

Thanks!
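
One possible way to make a check like the one quoted in this report tolerant of extra columns is to require that the expected columns are present, rather than that no other columns exist, and then drop the rest. The following is only a sketch on mock data, not the fix adopted in pestr; the helper name keep_expected_cols is hypothetical.

```r
expected_cols <- c("continent", "country", "state",
                   "country code", "state code", "Status")

# Return only the expected columns, or NULL if any of them is missing.
keep_expected_cols <- function(df, expected = expected_cols) {
  if (!all(expected %in% names(df))) {
    return(NULL)
  }
  df[, expected, drop = FALSE]
}

# Mock download with one extra, automatically named column
mock <- data.frame(
  continent = "Africa", country = "Algeria", state = NA,
  `country code` = "DZ", `state code` = NA,
  Status = "Present, restricted distribution",
  `...7` = NA,
  check.names = FALSE, stringsAsFactors = FALSE
)

cleaned <- keep_expected_cols(mock)
print(names(cleaned))
```

Note that this only handles the surplus columns. It does not repair the shifted header described in the report, where the values sit under the wrong column names.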

add JRC 'Gridded Agro-Meteorological Data in Europe'

As a follow-up from #1, let's maybe discuss this concrete data set.

The data itself is available after registration and in "pieces",
each downloadable as a CSV file.
http://agri4cast.jrc.ec.europa.eu/DataPortal/Index.aspx

It requires registration and is asynchronous (via email notifications).
This prevents using it directly from an R package.

There is a limit of "records", but with < 10 downloads the whole thing could be downloaded.

I agree that adding them to an R package does not make a lot of sense.

I checked the "reuse conditions" and I think we could re-publish the data (with proper citation, of course) into "http://zenodo.org"

Doing that would give it a DOI and "stable download links" for each file,
in the form of:
https://zenodo.org/record/7531/files/C3-EURO4M-MEDARE_TX.txt

This way we could split the data into one file per year, and the R code would use those links to access the data.

The same could work for all types of big climate data (as long as the re-use conditions allow re-publishing).

Windows bug

Incorrect path separator ('/' instead of '\') on Windows. Some ifelse statements are needed to create paths correctly in functions from the eppo_database set.
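
As an alternative to hand-written ifelse branches, base R can build the path itself: file.path() joins components with "/" on every platform, and normalizePath() converts the result to the native form where needed. A minimal sketch, assuming a database file name that is only a placeholder:

```r
# Placeholder file name, used only for illustration
db_file <- "eppocodes.sqlite"

# Build the path from components instead of pasting separators by hand
db_path <- file.path(tempdir(), db_file)

# Convert to the platform's native form; mustWork = FALSE avoids an
# error when the file does not exist yet
native_path <- normalizePath(db_path, mustWork = FALSE)

print(native_path)
```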

Documentation

The documentation needs rewriting.
Check spelling.
Add examples.

scope of this package ?

I am interested in enhancing an existing R package (or starting a new one) useful for pest risk assessment.
My current focus would be on easing access to (European) climatic data and some crop / host distribution data,
mainly things available at the JRC:

http://agri4cast.jrc.ec.europa.eu/DataPortal/Index.aspx
http://forest.jrc.ec.europa.eu/european-atlas-of-forest-tree-species/atlas-data-and-metadata/

My scientific colleagues also use some EPPO data, but we were not aware of this package.

So I am wondering whether to create new package(s) or add to something existing.

Any views on this ?

accessing temporal Data

Dear Mr. Czyz,

I have two questions.

First, is there any possibility of accessing the "reporting" column from the EPPO GD with your R package? As far as I can see, you can only get the current distribution (which includes the info from the "reporting" column); we are interested in the temporal aspect of those reports.

The second question is whether there is an easy way to create a pest vector with every EPPO code, so as to include every entry in the db.

Sincerely yours,

Gustav Glock.

TravisCI error on OSX (latex issue)

from travis log:

  • checking PDF version of manual ... WARNING

LaTeX errors when creating PDF version.

This typically indicates Rd problems.

LaTeX errors found:

! LaTeX Error: File `inconsolata.sty' not found.

Tables formatting

Add new list item - tables formatted in the same way as they are formatted in EPPO template. To reconsider.

Retrieve all pest names

Dear Michal,

I am trying to get a vector with all the pests that exist in the EPPO database, i.e. a complete list of all the full names. I have not managed to do it with this package. If it is not possible, what method would you advise?
Thank you.

Best,
João Colaço
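
Assuming the object returned by eppo_database_connect is a DBI connection to the SQLite file, one conceivable approach is to query the database directly. The sketch below runs against an in-memory mock database and needs the DBI and RSQLite packages. The table name t_names is an assumption made for the illustration, so inspect the real schema with DBI::dbListTables() first.

```r
library(DBI)

# In-memory mock database standing in for the EPPO SQLite file
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "t_names", data.frame(
  fullname = c("Name one", "Name two", "Name one"),
  stringsAsFactors = FALSE
))

# Inspect the schema before querying (table name here is hypothetical)
print(dbListTables(con))

# Pull every distinct name into a character vector
all_names <- dbGetQuery(
  con, "SELECT DISTINCT fullname FROM t_names ORDER BY fullname"
)$fullname

dbDisconnect(con)
print(all_names)
```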
