Shiny app to explore issues in occurrence records from a GBIF DwC download

Home Page: https://confluence.si.edu/display/DPOI/GBIF+Issues+Explorer

License: Apache License 2.0

R 100.00%
r shiny shiny-apps gbif museum-collections occurrences data

GBIF Issues Explorer

Archived

This repo is now archived. For a similar set of features, check https://github.com/Smithsonian/GBIF-Dataset-Explorer

Intro

This Shiny app allows researchers and data/collection managers to navigate the records with issues in a GBIF Darwin Core Archive.

The app can be used to:

  • Determine the source of issues:
    • Researchers can determine if the data is usable for a particular analysis
    • Collection and data managers can check their own database and figure out the source of the problem and fix it in the next update to GBIF
  • Determine if an issue would affect an analysis:
    • For example, a COUNTRY_COORDINATE_MISMATCH could be because the coordinates fall just outside the country borders. Is this an error in the coordinates or an expected result of an occurrence in water?

Occurrence records in GBIF can be tagged with a number of issues that GBIF's processing has detected. However, as the processing information page indicates:

Not all issues indicate bad data. Some are merely flagging the fact that GBIF has altered values during processing.

This tool lets collection and data managers, as well as researchers, explore issues in GBIF Darwin Core Archive downloads through a simple web-based interface. Enter a GBIF download key and the tool downloads the zip file, creates a local database, and displays the issues found in the data. Once provided with the GBIF key, this tool will:

  1. Download the zip archive
  2. Extract the files
  3. Create a local database
  4. Load the data from the occurrence, verbatim, multimedia, and dataset tables to the database
  5. Generate summary statistics of the issues
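Steps 1 to 4 can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the app's actual code: the function names `gbif_download_url()` and `load_gbif_download()`, the `data` folder layout, and the SQLite file name are hypothetical; the zip is assumed to be served by the GBIF occurrence download API's `request` endpoint; and only the three tab-delimited text files are loaded (dataset metadata, which the archive ships as XML, is left out).

```r
# Build the URL of the zip archive for a download key (assumed GBIF API endpoint).
gbif_download_url <- function(key) {
  paste0("https://api.gbif.org/v1/occurrence/download/request/", key, ".zip")
}

# Hypothetical helper: download, extract, and load a DwC archive into SQLite.
load_gbif_download <- function(key, data_dir = "data") {
  dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)
  zip_path    <- file.path(data_dir, paste0(key, ".zip"))
  extract_dir <- file.path(data_dir, key)
  db_path     <- file.path(data_dir, paste0(key, ".sqlite3"))

  # 1. Download the zip archive (skipped if it is already in the data folder)
  if (!file.exists(zip_path)) {
    download.file(gbif_download_url(key), zip_path, mode = "wb")
  }

  # 2. Extract the files
  unzip(zip_path, exdir = extract_dir)

  # 3. Create a local database
  con <- DBI::dbConnect(RSQLite::SQLite(), db_path)
  on.exit(DBI::dbDisconnect(con), add = TRUE)

  # 4. Load each tab-delimited file that is present into its own table
  for (tbl in c("occurrence", "verbatim", "multimedia")) {
    txt <- file.path(extract_dir, paste0(tbl, ".txt"))
    if (file.exists(txt)) {
      rows <- data.table::fread(txt, sep = "\t", quote = "", encoding = "UTF-8")
      DBI::dbWriteTable(con, tbl, rows, overwrite = TRUE)
    }
  }
  invisible(db_path)
}
```

Calling `load_gbif_download("0001419-180824113759888")` would then leave a SQLite file in the `data` folder that later runs can reuse instead of repeating the download.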

To use, just provide the key to a Darwin Core Archive from GBIF. The download key can be requested via the GBIF API or on the website. If your download URL is:

www.gbif.org/occurrence/download/0001419-180824113759888

Then the last part, '0001419-180824113759888', is the GBIF key you need to provide to this tool. The first time the app runs with a given key, it takes some time to create the local database, particularly for large data files. Afterwards it reuses the local database, so it is faster.
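As an illustration, the key can be pulled out of a download URL with a few lines of base R. The helper name `extract_gbif_key()` is hypothetical, and the sanity check (two runs of digits joined by a hyphen) is only inferred from the example key above.

```r
extract_gbif_key <- function(download_url) {
  # Drop any query string, fragment, or trailing slash, then keep the last path segment.
  cleaned <- sub("[?#].*$", "", download_url)
  cleaned <- sub("/+$", "", cleaned)
  key <- sub("^.*/", "", cleaned)
  # Sanity check: digits, a hyphen, digits (pattern assumed from the example key).
  if (!grepl("^[0-9]+-[0-9]+$", key)) {
    stop("Could not find a download key in: ", download_url)
  }
  key
}

extract_gbif_key("www.gbif.org/occurrence/download/0001419-180824113759888")
# returns "0001419-180824113759888"
```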

As an alternative, you can copy the zip file to the data folder and run the load_from_DwC_zip.R script. It will run the same steps as above (skipping downloading the file) from the command line.

Then, you can click the 'Explore Issues' tab to see how many records have been tagged with a particular issue.
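The per-issue counts shown in that tab can be approximated outside the app. The sketch below assumes that the occurrence table has an `issue` column holding issue names separated by semicolons; the function name `count_issues()` and the sample records are made up for illustration.

```r
# Count how many records carry each issue, most frequent first.
count_issues <- function(issue_column) {
  issue_column <- as.character(issue_column)
  tagged <- issue_column[!is.na(issue_column) & nzchar(issue_column)]
  issues <- unlist(strsplit(tagged, ";", fixed = TRUE))
  if (length(issues) == 0) {
    return(data.frame(issue = character(0), records = integer(0)))
  }
  counts <- sort(table(issues), decreasing = TRUE)
  data.frame(issue = names(counts), records = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Made-up records: two tagged, one with an empty issue field, one missing.
occ <- data.frame(
  gbifID = 1:4,
  issue = c("COORDINATE_ROUNDED;COUNTRY_COORDINATE_MISMATCH",
            "COORDINATE_ROUNDED", "", NA),
  stringsAsFactors = FALSE
)
issue_counts <- count_issues(occ$issue)
```

Here `issue_counts` has one row per issue: COORDINATE_ROUNDED with 2 records and COUNTRY_COORDINATE_MISMATCH with 1.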

Once you select an issue, a table will display the rows that have been tagged with that issue. If you click on a row, more details of the occurrence record will be shown, including a map using Leaflet (if the record has coordinates). You can choose to delete the row from the local database.
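A rough equivalent of this select-then-delete workflow on an in-memory data frame is sketched below. It is not the app's code: `records_with_issue()` and `drop_record()` are hypothetical helpers, the `issue` column is assumed to be semicolon-separated, and removing a row from a data frame stands in for deleting it from the local database.

```r
# Keep only the rows tagged with one specific issue (exact match, not substring).
records_with_issue <- function(records, issue) {
  issue_text <- as.character(records$issue)
  issue_text[is.na(issue_text)] <- ""
  tags <- strsplit(issue_text, ";", fixed = TRUE)
  has_issue <- vapply(tags, function(x) issue %in% x, logical(1))
  records[has_issue, , drop = FALSE]
}

# In-memory stand-in for deleting one record from the local database.
drop_record <- function(records, gbif_id) {
  records[records$gbifID != gbif_id, , drop = FALSE]
}

# Made-up records for illustration.
occ <- data.frame(
  gbifID = c(101, 102, 103),
  issue = c("COUNTRY_COORDINATE_MISMATCH;COORDINATE_ROUNDED",
            "COORDINATE_ROUNDED", NA),
  stringsAsFactors = FALSE
)
mismatches <- records_with_issue(occ, "COUNTRY_COORDINATE_MISMATCH")
remaining  <- drop_record(occ, 101)
```

Splitting on the separator before matching avoids false hits when one issue name happens to be contained in another.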

The 'Explore Data Fields' tab shows a summary and the top data values for every field of the occurrence.txt file (except the gbifID field).
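A comparable per-field summary can be computed in base R. This is only a sketch: `summarise_fields()` is a hypothetical name, "empty" is assumed to mean either `NA` or an empty string, and the sample data are made up.

```r
# For each field (except those in `skip`): empty count, distinct count, top values.
summarise_fields <- function(records, skip = "gbifID", top_n = 3) {
  fields <- setdiff(names(records), skip)
  rows <- lapply(fields, function(field) {
    values <- as.character(records[[field]])
    is_empty <- is.na(values) | values == ""
    filled <- values[!is_empty]
    top <- character(0)
    if (length(filled) > 0) {
      sorted <- sort(table(filled), decreasing = TRUE)
      top <- names(sorted[seq_len(min(top_n, length(sorted)))])
    }
    data.frame(
      field = field,
      empty = sum(is_empty),
      distinct = length(unique(filled)),
      top_values = paste(top, collapse = "; "),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Made-up records for illustration.
occ <- data.frame(
  gbifID = 1:4,
  countryCode = c("US", "US", "MX", NA),
  recordedBy = c("", "A. Collector", "A. Collector", "A. Collector"),
  stringsAsFactors = FALSE
)
field_summary <- summarise_fields(occ)
```

The result has one row per field, so it can be shown directly as a table.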

Features

  • Load a GBIF DwC download from the web or from a local zip file
  • Navigate the issues in the records
    • Spatial issues are shown with relevant fields and a map
  • Explore the data included in each field
    • How many values are null or empty, and how many distinct values are there?

Screenshots

Main page, showing the number of records with specific issues:

[Screenshot: gbif1]

Exploring issues by looking at record details:

[Screenshot: GBIF Issues Explorer2]

Explore the data fields, see number of records without data, and distinct values (new in version 0.4):

[Screenshot: GBIF Issues Explorer3]

Testing the app on a local computer

To test the app locally, without the need for a server, just install R and Shiny. Then run a command that downloads the source files from GitHub.

R version 3.3 or later is required. After starting R, copy and paste these commands:

install.packages(
    c("shiny", "DT", "dplyr", 
      "ggplot2", "stringr", "leaflet", 
      "XML", "curl", "data.table", "RSQLite", 
      "jsonlite", "R.utils", "shinyWidgets", 
      "shinycssloaders")
    )

library(shiny)
runGitHub("GBIF-Issues-Explorer", "Smithsonian")

Please note that downloading and installing the required packages may take a few minutes. Future versions will try to reduce the number of dependencies.

Winner of 2nd Place Award in GBIF 2018 Ebbe Nielsen Challenge!

The Global Biodiversity Information Facility (GBIF) Secretariat announced today that the Smithsonian Institution’s Digitization Program Office (DPO) was selected by an expert jury as a winner in the 2018 Ebbe Nielsen Challenge. The entry, the GBIF Issues Explorer, was submitted by DPO Informatics Program Officer Luis J. Villanueva and won a Second Place award among 23 entries from countries around the world. The Challenge was open to software tools that used GBIF data or tools to promote open science and open biodiversity data.

More details...

Install

The app requires R 3.3 or later and these packages:

  • shiny
  • DT
  • dplyr
  • ggplot2
  • stringr
  • leaflet
  • XML
  • curl
  • data.table
  • RSQLite
  • jsonlite
  • R.utils
  • shinyWidgets
  • shinycssloaders

To install the required packages:

install.packages(
    c("shiny", "DT", "dplyr", 
      "ggplot2", "stringr", "leaflet", 
      "XML", "curl", "data.table", "RSQLite", 
      "jsonlite", "R.utils", "shinyWidgets", 
      "shinycssloaders")
    )

Please feel free to submit issues, ideas, suggestions, and pull requests.

gbif-issues-explorer's People

Contributors

villanueval

gbif-issues-explorer's Issues

Too many dependencies

It would be nice to reduce the number of packages required. For example, it should be possible to drop XML by using jsonlite and querying the GBIF API.
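One way this could look is sketched below, assuming the dataset metadata that the app currently reads from XML is also available as JSON from the GBIF API's dataset endpoint. The function names are hypothetical, and the key in the usage line is a placeholder, not a real dataset.

```r
# URL of the JSON record for a dataset key (assumed GBIF API endpoint).
gbif_dataset_url <- function(dataset_key) {
  paste0("https://api.gbif.org/v1/dataset/", dataset_key)
}

# Fetch the dataset record with jsonlite and return its title (needs network access).
get_dataset_title <- function(dataset_key) {
  dataset <- jsonlite::fromJSON(gbif_dataset_url(dataset_key))
  dataset$title
}

# Placeholder key, shown only to illustrate the URL that would be queried.
gbif_dataset_url("00000000-0000-0000-0000-000000000000")
```

The dataset keys to look up would come from the occurrence records themselves, so no XML parsing of the archive's dataset files would be needed.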

To be done in a future version.

GBIF searching takes too long when country/year is missing

This is due to the large size of the database (23M records). We need a better way to do the string matching, or do it in the database, when the string does not have a country and/or year to filter the options.

As an alternative, we can do the matching in chunks of data, but it will be slow.

Add button to remove rows

This way, rows with problems can be ignored when downloading the occurrence file from the app. Also, collection managers can ignore these rows because they are not real problems or because they were fixed and will be updated in the next data sync.
