
ropensci / occcite


Querying database aggregators and citing primary sources of resulting occurrence records.

Home Page: https://docs.ropensci.org/occCite/

License: GNU General Public License v3.0

biodiversity-informatics biodiversity-data biodiversity-standards citations museum-collections museum-metadata museum-collection-specimens

occcite's Introduction

occCite

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

Summary

The occCite workflow follows a three-step process. First, the user inputs one or more taxonomic names (or a phylogeny). occCite then rectifies these names by checking them against one or more taxonomic databases, which the user can specify (see the Global Names List). The results of the taxonomic rectification are stored in an occCiteData object in local memory. Next, occCite uses the occCiteData object and user-defined search parameters to query BIEN (through rbien) and/or GBIF (through rgbif) for records. The results are appended to the occCiteData object, along with metadata on the search. Finally, the user passes the occCiteData object to occCitation, which compiles citations for the primary providers, database aggregators, and R packages used to build the dataset.
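The three steps might be chained roughly as follows. This is a minimal sketch, not the package's documented recipe: the wrapper name run_occcite_workflow is hypothetical, the datasources values ("GBIF Backbone Taxonomy", "bien") are assumptions based on the package's exported functions as I understand them, and the calls need the occCite package and a network connection. A GBIF query would additionally require GBIF login details, which are omitted here.

```r
# Hypothetical wrapper around the three-step workflow described above.
# Nothing is queried until the function is called.
run_occcite_workflow <- function(taxa) {
  if (!requireNamespace("occCite", quietly = TRUE)) {
    stop("The occCite package is required; see the installation notes.")
  }

  # Step 1: rectify the supplied names against a taxonomic source
  # (the source name is an assumption).
  rectified <- occCite::studyTaxonList(
    x = taxa,
    datasources = "GBIF Backbone Taxonomy"
  )

  # Step 2: query an aggregator. BIEN only here (assumed to need no login);
  # results are appended to the occCiteData object.
  queried <- occCite::occQuery(x = rectified, datasources = "bien")

  # Step 3: compile citations for providers, aggregators, and R packages.
  occCite::occCitation(queried)
}
```

A call such as run_occcite_workflow("Protea cynaroides") would then return the compiled citations, provided the remote services respond.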

Please cite occCite. Run the following to get the appropriate citation for the version you’re using:

citation(package = "occCite")
## 
## Owens H, Merow C, Maitner B, Kass J, Barve V, Guralnick R (2022).
## _occCite: Querying and Managing Large Biodiversity Occurrence
## Datasets_. doi: 10.5281/zenodo.632770 (URL:
## https://doi.org/10.5281/zenodo.632770), R package version 0.5.7,
## <URL: https://CRAN.R-project.org/package=occCite>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {occCite: Querying and Managing Large Biodiversity Occurrence Datasets},
##     author = {Hannah Owens and Cory Merow and Brian Maitner and Jamie Kass and Vijay Barve and Robert Guralnick},
##     year = {2022},
##     note = {R package version 0.5.7},
##     url = {https://CRAN.R-project.org/package=occCite},
##     doi = {10.5281/zenodo.632770},
##   }

Installation:

install.packages("occCite")

Or, install via r-universe:

install.packages("occCite", repos = "https://ropensci.r-universe.dev")

Or, install the GitHub development version:

devtools::install_github("ropensci/occCite")

After using one of these options, you can load the package into your environment using:

library("occCite")

Getting Started

Meta


occcite's People

Contributors

cmerow, gepinillab, hannahlowens, jamiemkass, maelle, vijaybarve

occcite's Issues

Handle custom predicates for GBIF searches

Suggested during review by @damianooldoni.

"This is the classic dilemma while programming high level functions: the balance between being user friendly and flexibility.

Personally I would allow users to make their own query by means of the ellipsis (...). In this way your function is powerful but at the same time user friendly: GBIF/rgbif experts can specify their own predicates while other users just benefit from the basic functionality and so everybody is happy."
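The ellipsis pattern suggested in this comment can be sketched in base R. Both functions below are hypothetical stand-ins (search_backend is not an rgbif call, and query_occurrences is not an occCite function); the argument names are illustrative only.

```r
# Hypothetical low-level search: it simply records the arguments it receives,
# standing in for a real GBIF client call.
search_backend <- function(taxon, ...) {
  c(list(taxon = taxon), list(...))
}

# Hypothetical high-level wrapper: a sensible default for casual users,
# with any extra arguments forwarded untouched through `...`.
query_occurrences <- function(taxon, has_coordinate = TRUE, ...) {
  search_backend(taxon = taxon, hasCoordinate = has_coordinate, ...)
}

# Basic use relies on the defaults only.
basic <- query_occurrences("Gadus morhua")

# Expert use adds custom predicates without any change to the wrapper.
expert <- query_occurrences("Gadus morhua", country = "NO", year = 2010)
```

The wrapper never needs to know which predicates exist, so new backend options become available to expert users automatically.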

Plotting suggestion

From Cory: take two objects, an old one and a new one, and plot both to compare the results.

Citation Formatting Suggestion

Via Cory's manuscript comments: automatically reformat plain-text citations as LaTeX so they can be imported directly into any reference manager.

Handling higher taxonomic names

Also from @damianooldoni:
"Why don't you allow the user to specify the rank itself by adding an argument rank to the function? In this way you make your function much more useful... You can add a default value ("species") to this new argument. This is also important as it can be applied for a multi-species search! You can just pass a vector of species to argument taxon or simply the parent name, e.g. Gadus, to get all occurrences of Gadus spp.

I would also suggest adding (at least) a kingdom argument to avoid hemihomonymy. An alternative would be to allow the user to select which taxon they mean if GBIF returns multiple taxa instead of choosing the first key."

occCite out of CRAN

Hi Hannah,

We just realized that occCite is not on CRAN. Let me know if I can help with something.

I hope that it is an easy thing to fix.

Best,
Gonzalo

Mixed GBIF sources

Another useful feature idea: allow some GBIF species records to come from the local machine, while others are newly prepared or downloaded as previously prepared datasets from GBIF servers.

Subset Citations

In instances where the user removes occurrences that are not fit for their specific use, enable citations that only include data sources for occurrences that are used.
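One way this could work is to key citations by dataset and keep only those whose datasets still contribute records after filtering. The sketch below uses base R and invented tables; the column names (datasetKey, keep, citation) and all values are hypothetical and do not reflect the structure of an occCiteData object.

```r
# Hypothetical occurrence table with a user-made keep/drop decision per record.
occ <- data.frame(
  name = rep("Protea cynaroides", 4),
  datasetKey = c("ds-A", "ds-A", "ds-B", "ds-C"),
  keep = c(TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical citation table, one row per source dataset.
citations <- data.frame(
  datasetKey = c("ds-A", "ds-B", "ds-C"),
  citation = c(
    "Hypothetical dataset A",
    "Hypothetical dataset B",
    "Hypothetical dataset C"
  ),
  stringsAsFactors = FALSE
)

# Datasets that still contribute at least one retained record.
used_keys <- unique(occ$datasetKey[occ$keep])

# Cite only those datasets.
used_citations <- citations[citations$datasetKey %in% used_keys, ]
```

Here ds-B drops out of the citation list because its only record was removed, while ds-A stays because one of its two records was kept.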

Synonymy error handling

Currently, taxonomicRectification() is only a fancy spell check. Would be good to verify accepted name status for submitted names, and return accepted names in the case of junior synonyms in the search.

Would also be good to optionally search for junior synonyms in occurrence databases, as well.

Fit-For-Purpose Filtering

Add columns to overall occurrence table with 1's and 0's for records that pass and fail a certain set of filtration criteria (e.g., remove putative duplicate records, records outside certain temporal or geographic constraints, downsampled to a particular resolution, etc.)
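A minimal base R sketch of such flag columns follows. The table, its column names, and the 1950 cutoff are all hypothetical choices for illustration; the point is that records are flagged rather than deleted, so each criterion can be inspected or combined later.

```r
# Hypothetical occurrence table; values are invented for illustration.
occ <- data.frame(
  name = rep("Protea cynaroides", 4),
  longitude = c(18.4, 18.4, 19.1, 25.0),
  latitude = c(-34.0, -34.0, -33.9, -33.5),
  year = c(1995, 1995, 2010, 1890),
  stringsAsFactors = FALSE
)

# 1 = passes the criterion, 0 = fails it. No rows are dropped.
# Putative duplicates: same name, coordinates, and year as an earlier row.
occ$pass_duplicate <- as.integer(
  !duplicated(occ[, c("name", "longitude", "latitude", "year")])
)

# Temporal constraint: collected in or after an assumed cutoff year.
occ$pass_temporal <- as.integer(!is.na(occ$year) & occ$year >= 1950)

# Overall flag: passes every criterion.
occ$pass_all <- as.integer(occ$pass_duplicate == 1L & occ$pass_temporal == 1L)
```

Subsetting with occ[occ$pass_all == 1L, ] then yields the fit-for-purpose records while the full table remains available.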

Include 'occurrenceStatus' and 'coordinateUncertaintyInMeters'

Hi,
At least two of GBIF's columns should be added to the occCite::occQuery() output, as they are crucial for cleaning the occurrence data: 'occurrenceStatus' (which can be "Present" or "Absent"!) and 'coordinateUncertaintyInMeters' (as many observations are very far away from their assigned point coordinates, which can be e.g. continent centroids). If the same (or equivalent) columns are not available in BIEN, these columns could get NA values in those rows.
Cheers!
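The NA-filling idea from this request can be sketched in base R. The two tables below are invented and much smaller than real query results; only the two GBIF column names come from the request itself, and the status values are illustrative.

```r
# Hypothetical GBIF-style results carrying the two requested columns.
gbif <- data.frame(
  name = c("Gadus morhua", "Gadus morhua"),
  longitude = c(5.1, 10.0),
  latitude = c(60.2, 57.5),
  occurrenceStatus = c("PRESENT", "ABSENT"),
  coordinateUncertaintyInMeters = c(250, 50000),
  stringsAsFactors = FALSE
)

# Hypothetical BIEN-style results that lack those columns.
bien <- data.frame(
  name = "Protea cynaroides",
  longitude = 18.4,
  latitude = -34.0,
  stringsAsFactors = FALSE
)

# Add every column the BIEN table is missing, filled with NA,
# then stack the tables using the GBIF column order.
missing_cols <- setdiff(names(gbif), names(bien))
bien[missing_cols] <- NA
combined <- rbind(gbif, bien[names(gbif)])
```

Downstream cleaning can then treat NA as "unknown" instead of silently assuming presence or precise coordinates for BIEN rows.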

Use old download keys!

From @damianooldoni:
"Let's say I had to trigger a download via rgbif instead of using your getGBIFpoints() function as I need to download all data of all species in a region instead of specifying a taxon/phylogeny. Well, it would be nice to allow me to still use the rest of the package functionalities, wouldn't it? This can be another way to solve the dilemma between user-friendly code and flexibility. How to do it? You can allow me to get my GBIF download by passing a GBIF download key or the GBIF DOI to getGBIFpoints() and so to prevGBIFdownload()!"
