
This project forked from ropensci/rdataretriever


R interface to the Data Retriever

Home Page: https://docs.ropensci.org/rdataretriever

License: Other

rdataretriever

R interface to the Data Retriever.

The Data Retriever automates the tasks of finding, downloading, and cleaning up publicly available data, and then stores them in a local database or csv files. This lets data analysts spend less time cleaning up and managing data, and more time analyzing it.

This package lets you access the Retriever using R, so that the Retriever's data handling can easily be integrated into R workflows.

Installation

rdataretriever is an R wrapper for the Python-based Data Retriever. This means that Python and the retriever Python package need to be installed first.

Installation from CRAN and conda or Anaconda

Use this if you are new to Python or don't have a local Python installation.

  1. Install the Python 3.7 version of the miniconda Python distribution from https://docs.conda.io/en/latest/miniconda.html
  2. In R, install the reticulate package (the current release, 1.13, does not work on Windows, so installation using devtools is recommended):
devtools::install_github("rstudio/reticulate")
  3. In R, run the following to install the retriever Python package:
library(reticulate)
py_available(initialize = TRUE)
py_install("retriever")
  4. Install the rdataretriever R package using one of the following:
install.packages("rdataretriever") # from CRAN
devtools::install_github("ropensci/rdataretriever") # from GitHub
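
After these steps, it can be useful to confirm from R that the Python side is visible. The following is a minimal sketch, not part of the package: check_retriever_setup() is a hypothetical helper name, and it only uses reticulate's py_available() and py_module_available() to report what R can see.

```r
# Sketch: report whether the Python pieces that rdataretriever relies on are
# visible from R. check_retriever_setup() is a hypothetical helper name.
check_retriever_setup <- function() {
  status <- c(reticulate = FALSE, python = FALSE, retriever = FALSE)

  # reticulate must be installed before anything else can be checked
  status["reticulate"] <- requireNamespace("reticulate", quietly = TRUE)
  if (!status[["reticulate"]]) {
    return(status)
  }

  # Errors while locating or starting Python are treated as "not available"
  status["python"] <- tryCatch(
    isTRUE(reticulate::py_available(initialize = TRUE)),
    error = function(e) FALSE
  )
  if (status[["python"]]) {
    # Only look for the retriever module once Python itself is usable
    status["retriever"] <- tryCatch(
      isTRUE(reticulate::py_module_available("retriever")),
      error = function(e) FALSE
    )
  }
  status
}

print(check_retriever_setup())
```

If the retriever entry is FALSE while python is TRUE, the py_install("retriever") step most likely targeted a different Python environment than the one reticulate is using.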

Installation with devtools

Use this if you are already familiar with Python and have a local Python installation.

  1. Check that your local Python installation is Python 3.6 or above
  2. In R, install the reticulate package:
install.packages("reticulate")
  3. In R, run the following (replacing "/path/to/python" with the path to your Python executable) to install the retriever Python package:
library(reticulate)
use_python("/path/to/python")
py_install("retriever")

Note: When using a virtual environment, make sure that the Python installation used to create it was built with the --enable-shared option.

./configure --enable-shared
  4. Install the rdataretriever R package using one of the following:
devtools::install_github("ropensci/rdataretriever") # from GitHub
install.packages("rdataretriever") # from CRAN

Examples

library(rdataretriever)

# List the datasets available via the Retriever
rdataretriever::datasets()

# Install the portal dataset into CSV files in your working directory
rdataretriever::install_csv('portal')

# Download the raw portal dataset files without any processing to the
# subdirectory named data
rdataretriever::download('portal', './data/')

# Install and load a dataset as a list
portal = rdataretriever::fetch('portal')
names(portal)
head(portal$species)
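
Because the last example loads a dataset as a named list of tables, a quick overview of that list can help before analysis. The sketch below is illustrative only: summarise_tables() is a hypothetical helper, and the demo list uses made-up stand-in data so that it runs without the Retriever installed.

```r
# Sketch: summarise a list of tables such as the one fetch() is described as
# returning. summarise_tables() is a hypothetical helper name.
summarise_tables <- function(tables) {
  stopifnot(is.list(tables), !is.null(names(tables)))
  data.frame(
    table = names(tables),
    rows = unname(vapply(tables, nrow, integer(1))),
    columns = unname(vapply(tables, ncol, integer(1))),
    stringsAsFactors = FALSE
  )
}

# Stand-in data with made-up values, so the sketch runs without the Retriever
demo <- list(
  species = data.frame(id = c("AA", "BB"), taxa = c("Rodent", "Bird")),
  plots = data.frame(plot_id = 1:3)
)
print(summarise_tables(demo))
```

With a real dataset, the same helper could be applied to the fetched list, for example summarise_tables(portal).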

Spatial Data Installation

Set-up and Requirements

Tools

  • PostgreSQL with PostGIS, psql (client), raster2pgsql, shp2pgsql, gdal

The rdataretriever supports installation of spatial data into a PostgreSQL database.

  1. Install PostgreSQL and PostGIS

    To install PostgreSQL with PostGIS for use with spatial data, please refer to the OSGeo Postgres installation instructions.

    We recommend storing your PostgreSQL login information in a .pgpass file to avoid supplying the password every time. See the .pgpass documentation for more details.

    After installation, make sure the paths to these tools are added to your system's PATH. If you need help changing or adding PATH entries, consult your operating system's documentation.

    For example, the paths exported on a Mac might look like this:

    # ~/.bash_profile: Postgres paths and tools
    export PATH="/Applications/Postgres.app/Contents/MacOS/bin:${PATH}"
    export PATH="$PATH:/Applications/Postgres.app/Contents/Versions/10/bin"
    
  2. Enable PostGIS extensions

    Once Postgres is set up, enable the PostGIS extensions by running the following from either the psql CLI or a GUI (pgAdmin).

    For the psql CLI:

    psql -d yourdatabase -c "CREATE EXTENSION postgis;"
    psql -d yourdatabase -c "CREATE EXTENSION postgis_topology;"

    For a GUI (pgAdmin):

    CREATE EXTENSION postgis;
    CREATE EXTENSION postgis_topology;

    For more details refer to the PostGIS docs.
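
Whether the tools listed above are actually reachable can be checked from R with base R's Sys.which(). This is a rough sketch under assumptions: find_spatial_tools() is a hypothetical helper, and gdalinfo is used as a representative GDAL program rather than a documented requirement.

```r
# Sketch: look up the spatial command-line tools on the PATH that R sees.
# find_spatial_tools() is a hypothetical helper; "gdalinfo" stands in for the
# GDAL tools and is an assumption, not a documented requirement.
find_spatial_tools <- function(tools = c("psql", "raster2pgsql",
                                         "shp2pgsql", "gdalinfo")) {
  paths <- Sys.which(tools)  # typically "" for tools that are not found
  not_found <- names(paths)[is.na(paths) | !nzchar(paths)]
  if (length(not_found) > 0) {
    message("Not found on PATH: ", paste(not_found, collapse = ", "))
  }
  paths
}

tool_paths <- find_spatial_tools()
print(tool_paths)
```

Note that R started from a desktop launcher may not see the same PATH as a terminal, so a tool reported as missing here can still be available in your shell.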

Sample commands

rdataretriever::install_postgres('harvard-forest') # Vector data
rdataretriever::install_postgres('bioclim') # Raster data

# Install only the data of USGS elevation in the given extent
rdataretriever::install_postgres('usgs-elevation', list(-94.98704597353938, 39.027001800158615, -94.3599408119917, 40.69577051867074))
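
The extent in the last command is a plain list of four coordinates. The sketch below builds such a list with basic sanity checks. make_extent() is a hypothetical helper, and the (xmin, ymin, xmax, ymax) order is an assumption read off the sample values, not a documented signature.

```r
# Sketch: build the extent list with basic sanity checks. make_extent() is a
# hypothetical helper; the (xmin, ymin, xmax, ymax) order is an assumption
# based on the sample call, not a documented signature.
make_extent <- function(xmin, ymin, xmax, ymax) {
  values <- c(xmin, ymin, xmax, ymax)
  stopifnot(is.numeric(values), length(values) == 4, !anyNA(values))
  if (xmin >= xmax || ymin >= ymax) {
    stop("extent must satisfy xmin < xmax and ymin < ymax")
  }
  list(xmin, ymin, xmax, ymax)
}

extent <- make_extent(-94.98704597353938, 39.027001800158615,
                      -94.3599408119917, 40.69577051867074)
str(extent)

# With PostgreSQL configured, the list could be passed as in the sample call:
# rdataretriever::install_postgres('usgs-elevation', extent)
```

Validating the extent first avoids starting a long download with swapped or missing coordinates.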

Provenance

rdataretriever allows users to save a dataset in its current state so that it can be used later.

Note: You can save your datasets in a provenance directory by setting the environment variable PROVENANCE_DIR.

Commit a dataset

rdataretriever::commit('abalone-age', commit_message='Sample commit', path='/home/user/')

To commit directly to the provenance directory:

rdataretriever::commit('abalone-age', commit_message='Sample commit')
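
The PROVENANCE_DIR variable mentioned in the note can be set from within R using base R's Sys.setenv(). The sketch below is a minimal illustration: the directory is a throwaway location chosen for the example, and when exactly the Retriever reads the variable is not stated here, so setting it early in the session is an assumption made for safety.

```r
# Sketch: point PROVENANCE_DIR at a directory from within R.
# The directory below is a throwaway location chosen for illustration.
provenance_dir <- file.path(tempdir(), "provenance")
dir.create(provenance_dir, recursive = TRUE, showWarnings = FALSE)

# Set the variable before loading rdataretriever; whether the Retriever reads
# it at import time or at call time is an open assumption, so setting it
# first is the cautious choice.
Sys.setenv(PROVENANCE_DIR = provenance_dir)
print(Sys.getenv("PROVENANCE_DIR"))
```

For a persistent setup, the same variable could instead be defined in your shell profile or in an .Renviron file.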

Log of a committed dataset in the provenance directory

rdataretriever::commit_log('abalone-age')

Install a committed dataset

rdataretriever::install_sqlite('abalone-age-a76e77.zip') 

Datasets stored in the provenance directory can be installed directly using their hash value

rdataretriever::install_sqlite('abalone-age', hash_value='a76e77')

Using Docker

To run the image interactively

docker-compose run --service-ports rdata /bin/bash

To run tests

docker-compose run rdata Rscript load_and_test.R

Release

Make sure you have tests passing on R-oldrelease, current R-release and R-devel

To check the package

R CMD build . # build the package (run from the package root)
R CMD check --as-cran --no-manual rdataretriever_[version].tar.gz

To Test

setwd("./rdataretriever") # Set working directory
# install all deps
# install.packages("reticulate")
library(DBI)
library(RPostgreSQL)
library(RSQLite)
library(reticulate)
library(RMariaDB)
install.packages(".", repos = NULL, type="source")
roxygen2::roxygenise()
devtools::test()

To get citation information for the rdataretriever in R use citation(package = 'rdataretriever')

Acknowledgements

A big thanks to Ben Morris for helping to develop the Data Retriever. Thanks to the rOpenSci team with special thanks to Gavin Simpson, Scott Chamberlain, and Karthik Ram who gave helpful advice and fostered the development of this R package. Development of this software was funded by the National Science Foundation as part of a CAREER award to Ethan White.

