
matt-dray / ghdump


R package: download/clone all of a user's GitHub repositories for archiving purposes

Home Page: https://www.rostrum.blog/2020/06/14/ghdump/

License: Other

R 100.00%
r rstats package github-api gh purrr

ghdump

Project Status: Inactive. The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

Purpose

Clone all of a GitHub user's repositories, or download them as zip files (and optionally unzip them). Intended for archiving purposes or setting up on a new computer.

Using {ghdump}

Note that the package does what I need it to do, but is not fully tested for all systems and set-ups. Please file an issue or raise a pull request if you have a problem or contribution. You can also learn more about this package from an associated blog post.

The {gitcellar} package later appeared on rOpenSci by Maëlle Salmon and Jeroen Ooms. They're very smart, so you might want to check it out. It has a keep function that allows you to select only certain repos ({ghdump} doesn't have this feature).

Install

You can install {ghdump} from GitHub with:

install.packages("remotes")  # if not yet installed
remotes::install_github("matt-dray/ghdump")

GitHub PAT

You'll need a GitHub Personal Access Token (PAT) to use {ghdump}.

Assuming you have a GitHub account, generate a token for accessing the GitHub API and store this in your .Renviron file. The {usethis} package helps make this a breeze. Read more in the Happy Git and GitHub for the useR book by Jenny Bryan, the STAT 545 TAs and Jim Hester.

usethis::create_github_token()  # opens browser to generate token (replaces the deprecated browse_github_pat())
usethis::edit_r_environ()     # add your token to the .Renviron

Make sure to restart R after these steps.
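
After restarting, you can confirm that the token is visible to R by checking the environment variable. This is only a minimal sketch: it assumes the token was saved under the name GITHUB_PAT (the conventional name that {gh} looks for), and has_github_pat() is a hypothetical helper that is not part of {ghdump}.

```r
# Hypothetical helper (not part of {ghdump}).
# Returns TRUE if a single, non-empty token string is supplied.
# Assumption: the token is stored in .Renviron as GITHUB_PAT.
has_github_pat <- function(pat = Sys.getenv("GITHUB_PAT")) {
  is.character(pat) && length(pat) == 1 && nzchar(pat)
}

has_github_pat()  # check the current R session
```

If this returns FALSE, the token was probably not saved in the .Renviron, or R was not restarted afterwards.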

Use ghd_copy()

{ghdump} has one exported function: ghd_copy(). Pass it a GitHub user name, a local directory to copy into, and whether you want to download or clone the repos. If you want to clone, you must also specify the protocol (make sure your SSH keys are set up if you choose SSH).

To clone:

ghdump::ghd_copy(
  gh_user = "matt-dray",  # user whose repos to download
  dest_dir = "~/Documents/repos",  # where to copy to
  copy_type = "clone",  # 'clone' or 'download' the repos
  protocol = "https"  # 'https' or 'ssh'
)

To download:

ghdump::ghd_copy(
  gh_user = "matt-dray",
  dest_dir = "~/Documents/repos",
  copy_type = "download"
)

The function is designed to be used interactively and infrequently. To this end, the user is prompted throughout as to whether to:

  • create a new local directory with the provided 'dest_dir' argument
  • commit to cloning all the repos (if copy_type = "clone")
  • commit to downloading all zip files (if copy_type = "download") and then whether to:
    • unzip all the files
    • retain the zip files
    • rename the unzipped directories to remove the default branch suffix (e.g. '-master')
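
To illustrate the last prompt: zip files downloaded from GitHub unpack to directories named after the repo plus the branch, such as 'ghdump-master'. The renaming step could look roughly like the sketch below. strip_branch_suffix() is a hypothetical helper written for this example, not the package's internal code, and it assumes the branch name is known in advance.

```r
# Hypothetical helper (not {ghdump}'s actual implementation).
# Removes a trailing "-<branch>" from directory names that have it.
strip_branch_suffix <- function(dir_names, branch = "master") {
  suffix <- paste0("-", branch)
  has_suffix <- endsWith(dir_names, suffix)
  # Trim the suffix only where present; other names are left unchanged
  dir_names[has_suffix] <- substr(
    dir_names[has_suffix],
    1,
    nchar(dir_names[has_suffix]) - nchar(suffix)
  )
  dir_names
}

strip_branch_suffix(c("ghdump-master", "blog"))
```

The new names could then be passed to file.rename() alongside the old ones.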

Credits

The function interacts with the GitHub API thanks to the {gh} package by Gábor Csárdi, Jenny Bryan and Hadley Wickham. Iteration is thanks to the {purrr} package by Lionel Henry and Hadley Wickham. The {cli} package by Gábor Csárdi allowed for a prettier user interface.

The {ghdump} package sticker was made thanks to Dmytro Perepolkin's {bunny} package and the {magick} package from Jeroen Ooms.

Code of Conduct

Please note that the {ghdump} project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

ghdump's Issues

Allow for selection of HTTPS or SSH

Defaults to HTTPS iirc. Allow for SSH with a simple argument like protocol = c("https", "ssh") and some if/else logic that selects the matching paste statement to build the clone URL.
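
A rough sketch of what that could look like, assuming the standard GitHub URL formats for each protocol. build_clone_url() is a hypothetical name used for illustration only, not an existing {ghdump} function.

```r
# Hypothetical helper: build a clone URL for the chosen protocol.
build_clone_url <- function(gh_user, repo, protocol = c("https", "ssh")) {
  protocol <- match.arg(protocol)  # defaults to "https", errors on anything else
  if (protocol == "https") {
    paste0("https://github.com/", gh_user, "/", repo, ".git")
  } else {
    paste0("git@github.com:", gh_user, "/", repo, ".git")
  }
}

build_clone_url("matt-dray", "ghdump", protocol = "ssh")
```

Using match.arg() keeps HTTPS as the default while rejecting unsupported values early.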

Control which repos are downloaded

ghd_download() currently copies all of a user's repos; there is no way to pick a subset.

  • Offer repo selection from this function?
  • Offer personal/private only (or both) if you are recognised as accessing your own repos
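
One possible shape for the selection step, as a sketch only. select_repos() and its keep argument are hypothetical names chosen for this example, and the function works on a plain character vector of repo names rather than on the API response.

```r
# Hypothetical helper: keep only the named repos, or all of them if keep is NULL.
select_repos <- function(repo_names, keep = NULL) {
  if (is.null(keep)) {
    return(repo_names)
  }
  unknown <- setdiff(keep, repo_names)
  if (length(unknown) > 0) {
    warning("Repos not found: ", paste(unknown, collapse = ", "))
  }
  # intersect() preserves the order of repo_names
  intersect(repo_names, keep)
}

select_repos(c("ghdump", "blog", "dotfiles"), keep = c("blog", "ghdump"))
```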

Better reporting of failure

Currently uses purrr::safely() in ghd_download_zips() in case a repo can't be downloaded for whatever reason.

The failures are printed at the end of the output from ghd_download(), which is a poor user experience.

Can we get a warnings() equivalent for reviewing the failures? Or at least cat() the failures.

In addition: Warning message:
In download.file(url = .x, destfile = paste0(dest_dir, "/", .y,  :
  cannot open URL 'https://codeload.github.com/matt-dray/2019-10-11-test-workshop/zip/master': HTTP status was '404 Not Found'
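
A sketch of one way to gather the failures for review afterwards. It uses base R's tryCatch() as a stand-in for purrr::safely() so that it runs without extra packages; collect_failures() and the fake downloader are hypothetical and only illustrate the idea.

```r
# Hypothetical helper: apply f to each item and return a named character
# vector of error messages for the items that failed.
collect_failures <- function(items, f) {
  results <- lapply(items, function(x) {
    tryCatch(
      list(result = f(x), error = NULL),
      error = function(e) list(result = NULL, error = conditionMessage(e))
    )
  })
  names(results) <- items
  failed <- Filter(function(r) !is.null(r$error), results)
  vapply(failed, function(r) r$error, character(1))
}

# Stand-in for a real download step; fails for one repo name
fake_download <- function(repo) {
  if (repo == "2019-10-11-test-workshop") stop("HTTP status was '404 Not Found'")
  TRUE
}

failures <- collect_failures(c("ghdump", "2019-10-11-test-workshop"), fake_download)
cat(paste0(names(failures), ": ", failures), sep = "\n")
```

Returning the failures as a named vector means they can be printed once at the end or stored for the user to inspect later.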

Correct `if` stop condition when downloading

The protocol argument check gets evaluated even when copy_type == "download". It shouldn't. The !protocol %in% c("https", "ssh") in the if should only be evaluated if copy_type == "clone" passes. So probably change & to && (preferred, because && short-circuits) or use nested ifs.

> ghdump::ghd_copy("co-analysis", "~/Desktop/co-analysis_copies", copy_type = "download")
Error in if (copy_type == "clone" & !protocol %in% c("https", "ssh")) { : 
  argument is of length zero
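
A sketch of the corrected check, assuming protocol defaults to NULL (which is what the 'argument is of length zero' error suggests). check_args() is a hypothetical name for illustration. With &, both sides are evaluated and FALSE & logical(0) yields a zero-length result that if() cannot handle; with &&, the right-hand side is skipped when copy_type is not "clone".

```r
# Hypothetical stand-in for the argument check inside ghd_copy().
check_args <- function(copy_type, protocol = NULL) {
  # && short-circuits: protocol is only inspected when cloning.
  # isTRUE() guards against a NULL protocol giving a zero-length logical.
  if (copy_type == "clone" && !isTRUE(protocol %in% c("https", "ssh"))) {
    stop("protocol must be 'https' or 'ssh' when copy_type is 'clone'")
  }
  invisible(TRUE)
}

check_args("download")            # passes without a protocol
check_args("clone", "ssh")        # passes
try(check_args("clone"))          # fails with an informative message
```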

Better error handling

Just checks for character strings in ghd_download() right now. Need better coverage of conditions. Intermediate non-exported functions also need handling in case something weird is returned.
