sbg / sevenbridges-r

Seven Bridges API Client, CWL Schema, Meta Schema, and SDK Helper in R

Home Page: https://sbg.github.io/sevenbridges-r/

License: Apache License 2.0

sevenbridges-r's Introduction

sevenbridges-r

BioC (Release) · BioC (Development) · GitHub (Latest)

Overview

sevenbridges-r is an R/Bioconductor package that provides an interface for the Seven Bridges Platform (US, EU, China), Cancer Genomics Cloud, Cavatica, and BioData Catalyst Powered by Seven Bridges public APIs.

The Seven Bridges Platform is a cloud-based environment for conducting bioinformatics analysis. It is a central hub for teams to store, analyze, and jointly interpret their bioinformatic data. The Platform co-locates analysis pipelines alongside the largest genomic datasets to optimize processing, allocating storage and compute resources on demand.

The Cancer Genomics Cloud (CGC), powered by Seven Bridges, is also a cloud-based computation environment. It was built as one of three pilot systems funded by the National Cancer Institute to explore the paradigm of colocalizing massive genomics datasets, like The Cancer Genome Atlas (TCGA), alongside secure and scalable computational resources to analyze them. The CGC makes more than a petabyte of multi-dimensional data available immediately to authorized researchers. You can add your data to analyze alongside TCGA using predefined analytical workflows or your own tools.

Cavatica, powered by Seven Bridges, is a data analysis and sharing platform designed to accelerate discovery in a scalable, cloud-based compute environment where data, results, and workflows are shared among the world's research community. Cavatica is built in collaboration with the Children's Hospital of Philadelphia and is focused on pediatric data.

Installation

Check R version

First, check the version of R you are using with the following command (in R):

R.version.string

If you are not running the latest release version of R, install or upgrade with these instructions. If you are using RStudio, restart RStudio after installing R. RStudio will detect the new installation.
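The version check can also be done programmatically. The following is a minimal base R sketch; the minimum version used here is a placeholder assumption for illustration, not a requirement stated by the package.

```r
# Hypothetical minimum version, chosen only for illustration.
min_version <- "3.5.0"

# getRversion() returns a version object that can be compared to a string.
current <- getRversion()
r_is_recent <- current >= min_version

if (r_is_recent) {
  message("R ", as.character(current), " meets the assumed minimum of ", min_version)
} else {
  message("R ", as.character(current), " is older than ", min_version, "; consider upgrading")
}
```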

Bioconductor - Release Branch

This is recommended for most users as it is the most stable version.

You can install the package from the release branch on Bioconductor using BiocManager:

install.packages("BiocManager")
BiocManager::install("sevenbridges")

Bioconductor - Development Branch

If you are developing tools under the devel branch or use the development version of R and Bioconductor, install the package from the Bioconductor devel branch. You probably also want to install R-devel first by following the directions in "Using the 'Devel' Version of Bioconductor".

To install the sevenbridges package from the devel branch, use

install.packages("BiocManager")
BiocManager::install("sevenbridges", version = "devel")

Latest Development Version

To try the latest features, please install the package directly from GitHub. We push to the Bioconductor branches (release and devel) regularly.

Installing the sevenbridges package from GitHub requires the devtools package. If you do not have devtools, install it from CRAN first.

install.packages("devtools")

You may get an error about missing system dependencies such as curl and ssl. On Ubuntu, you will probably need to run the following first, both to install devtools and to build the vignettes (which requires pandoc).

apt-get update
apt-get install libcurl4-gnutls-dev libssl-dev pandoc pandoc-citeproc

After devtools is installed, install the latest version of sevenbridges from GitHub:

install.packages("BiocManager")

devtools::install_github(
  "sbg/sevenbridges-r",
  repos = BiocManager::repositories(),
  build_vignettes = TRUE, dependencies = TRUE
)

If you have trouble with pandoc and do not want to install it, set build_vignettes = FALSE to avoid building the vignettes.
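One way to apply this automatically is to build the vignettes only when pandoc can be found on the PATH. The sketch below wraps the install call from above in a helper; the helper name install_sevenbridges_github and the variable pandoc_available are hypothetical and not part of the package.

```r
# Sys.which() returns "" when the executable is not found on the PATH.
pandoc_available <- unname(nzchar(Sys.which("pandoc")))

# Hypothetical helper: same install call as above, but vignettes are
# built only if pandoc is available.
install_sevenbridges_github <- function(build_vignettes = pandoc_available) {
  devtools::install_github(
    "sbg/sevenbridges-r",
    repos = BiocManager::repositories(),
    build_vignettes = build_vignettes,
    dependencies = TRUE
  )
}

# Usage (requires devtools, BiocManager, and network access):
# install_sevenbridges_github()
```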

Features

The sevenbridges package includes the following features:

Flexible Authentication Methods

Multiple authentication methods are supported.

  • Direct authentication:
# Direct authentication
a <- Auth(token = "your_token", platform = "cgc")

# or use base url
a <- Auth(token = "your_token", url = "https://cgc-api.sbgenomics.com/v2")
  • Authentication via system environment variables:
sbg_set_env(token = "your_token", url = "https://cgc-api.sbgenomics.com/v2")
a <- Auth(from = "env")
  • Authentication via a user configuration file, which lets you collect and manage your credentials for multiple accounts across various Seven Bridges environments:
a <- Auth(from = "file", profile_name = "aws-us-username")

Please check vignette("api", package = "sevenbridges") for technical details about all available authentication methods.
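When using direct authentication in a script, it can help to keep the token out of the source file. The sketch below reads it from an environment variable with base R; the helper get_token and the variable name MY_SB_TOKEN are hypothetical and chosen only for illustration (they are not the names used by sbg_set_env()).

```r
# Hypothetical helper: read a token from an environment variable and fail
# early with a clear message when it is not set.
get_token <- function(var = "MY_SB_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable ", var, " is not set", call. = FALSE)
  }
  token
}

# Usage with the direct authentication call shown above
# (requires the sevenbridges package and a valid token):
# a <- Auth(token = get_token(), platform = "cgc")
```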

Complete API R Client

A complete API R client with a user-friendly, object-oriented interface that offers printing and supports operations for API requests relating to users, billing, projects, files, apps, and tasks. Short examples are shown below:

# Get a project by pattern-matching its name
p <- a$project("demo")

# Get a project by its id
p <- a$project(id = "username/demo")

# Delete files from a project
p$file("sample.tz")$delete()

# Upload files from a folder to a project and include file metadata
p$upload("folder_path", metadata = list(platform = "Illumina"))

Task Monitoring

A task monitoring hook which allows you to add a hook function to specific task statuses as you monitor a task. For example, you can opt to receive an email when the task is completed or specify to download all files produced by the task, as shown below:

setTaskHook("completed", function() {
  tsk$download("~/Downloads")
})
tsk$monitor()
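To illustrate the general idea of status hooks, here is a self-contained base R sketch of a polling loop that dispatches to a registered function when a given status is seen. It is not the package's implementation: set_hook, monitor, and fake_status are hypothetical names, the task is simulated, and the convention that a hook returning TRUE stops the loop is a design choice of this sketch.

```r
# Registry mapping a status name to a hook function (illustration only).
hooks <- new.env()

set_hook <- function(status, fun) assign(status, fun, envir = hooks)

# Poll a status source; run the hook registered for the current status and
# stop as soon as a hook returns TRUE or the polls run out.
monitor <- function(get_status, max_polls = 10, interval = 0) {
  for (k in seq_len(max_polls)) {
    status <- get_status()
    if (exists(status, envir = hooks, inherits = FALSE)) {
      if (isTRUE(get(status, envir = hooks)())) return(status)
    }
    Sys.sleep(interval)
  }
  NA_character_
}

# Simulated task that reports "completed" on the third poll.
statuses <- c("queued", "running", "completed")
poll_count <- 0
fake_status <- function() {
  poll_count <<- min(poll_count + 1, length(statuses))
  statuses[poll_count]
}

set_hook("completed", function() {
  message("task finished; results could be downloaded here")
  TRUE
})

final_status <- monitor(fake_status)
```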

Batch Tasks Support

Batch tasks by metadata and by item.

# Batch by item
(tsk <- p$task_add(
  name = "RNA DE report new batch 2",
  description = "RNA DE analysis report",
  app = rna.app$id,
  batch = batch(input = "bamfiles"),
  inputs = list(
    bamfiles = bamfiles.in,
    design = design.in,
    gtffile = gtf.in
  )
))

# Batch by metadata. Note that input files must
# have relevant metadata fields specified.
(tsk <- p$task_add(
  name = "RNA DE report new batch 3",
  description = "RNA DE analysis report",
  app = rna.app$id,
  batch = batch(
    input = "fastq",
    c("metadata.sample_id", "metadata.library_id")
  ),
  inputs = list(
    bamfiles = bamfiles.in,
    design = design.in,
    gtffile = gtf.in
  )
))
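The difference between the two batch modes can be pictured with plain R data. The sketch below is only an illustration of the grouping idea, using made-up file names and metadata values; how the platform actually splits a batch into child tasks is determined by the platform, not by this code.

```r
# Made-up files with the metadata fields used in the example above.
files <- data.frame(
  name = c("a_1.fastq", "a_2.fastq", "b_1.fastq", "b_2.fastq"),
  sample_id = c("A", "A", "B", "B"),
  library_id = c("L1", "L1", "L1", "L1"),
  stringsAsFactors = FALSE
)

# Batch by item: every file forms its own group.
by_item <- as.list(files$name)

# Batch by metadata: files sharing the same sample_id and library_id
# are grouped together.
by_metadata <- split(
  files$name,
  list(files$sample_id, files$library_id),
  drop = TRUE
)
```

With this toy data, by_item has four groups and by_metadata has two, one per sample.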

Cross Environment Support

Cross-platform support for Seven Bridges environments, such as Cancer Genomics Cloud or Seven Bridges Platform on either Amazon Web Services or Google Cloud Platform.

Common Workflow Language Tool Interface

A Common Workflow Language (CWL) Tool interface to directly describe your tool in R, export it to JSON or YAML, or add it to your online project. This package defines a complete set of CWL objects, so you can describe tools as follows:

# fl is assumed to hold the path to a local R script (here, runif.R)
fd <- fileDef(name = "runif.R", content = readr::read_file(fl))

rbx <- Tool(
  id = "runif",
  label = "runif",
  hints = requirements(
    docker(pull = "rocker/r-base"),
    cpu(1), mem(2000)
  ),
  requirements = requirements(fd),
  baseCommand = "Rscript runif.R",
  stdout = "output.txt",
  inputs = list(
    input(id = "number", type = "integer", position = 1),
    input(id = "min", type = "float", position = 2),
    input(id = "max", type = "float", position = 3)
  ),
  outputs = output(id = "random", glob = "output.txt")
)

# Print CWL JSON
rbx$toJSON(pretty = TRUE)

# Print CWL YAML
rbx$toYAML()

Utilities for Tool and Flow

Utilities for Tool and Flow, for example:

library("sevenbridges")

# locate an example SBG CWL JSON file shipped with the package
t1 <- system.file("extdata/app", "tool_star.json", package = "sevenbridges")

# convert the JSON file into a Tool object
t1 <- convert_app(t1)

# show the input matrix
t1$input_matrix()

Tutorials

We maintain 3 different sets of documentation: the sevenbridges-r GitHub repository (latest and most up-to-date), Bioconductor release channel, and Bioconductor development channel. Below, only the GitHub version is linked to provide the latest documentation. For the other versions, please visit Bioconductor Release version or Bioconductor Development version. The tutorials below are re-generated regularly as we update the package on GitHub.

Tutorial Title HTML Rmd Source
Complete Reference for the API R Client HTML Source
Use R on the Cancer Genomics Cloud HTML Source
Create a Docker Container and use Command Line Interface for R HTML Source
Describe and execute Common Workflow Language (CWL) Tools and Workflows in R HTML Source
IDE container: Rstudio and Shiny server and more HTML Source
Browse data on the Cancer Genomics Cloud via the Data Explorer, a SPARQL query, or the Datasets API HTML Source

IDE Docker Image

In the tutorial for IDE container above, we built a Docker container locally from which we can launch RStudio and Shiny. To launch RStudio and Shiny Server with the Seven Bridges IDE Docker container, do the following:

docker run  -d -p 8787:8787 -p 3838:3838 --name rstudio_shiny_server sevenbridges/sevenbridges-r

To mount a file system, you need to use --privileged with fuse.

docker run  --privileged -d -p 8787:8787 -p 3838:3838 --name rstudio_shiny_server sevenbridges/sevenbridges-r

If you are on macOS, look up the IP address from Docker Machine.

docker-machine ip default

In your browser, the RStudio server is available at http://<url>:8787/. For example, if 192.168.99.100 is returned, visit http://192.168.99.100:8787/ for RStudio.

For the Shiny server, each app is hosted at http://<url>:3838/users/<username of rstudio>/<app_dir>. For example, an app called 01_hello owned by user rstudio (a default user) has the path http://<url>:3838/users/rstudio/01_hello/. To develop your Shiny apps as an RStudio user, log in to your RStudio server and create a folder called ~/ShinyApps in your home folder, then develop your Shiny apps there. For example, you can create an app called 02_text at ~/ShinyApps/02_text/.

Log into your RStudio at http://<url>:8787. Then, try to copy an app to your home folder, as follows:

dir.create("~/ShinyApps")
file.copy(
  "/usr/local/lib/R/site-library/shiny/examples/01_hello/",
  "~/ShinyApps/",
  recursive = TRUE
)

If you are logged in as user rstudio, visit http://192.168.99.100:3838/users/rstudio/01_hello. You should be able to see the "hello" example.

Note: Generic Shiny apps can also be hosted at http://<url>:3838/ or, for a particular app, at http://<url>:3838/<app_dir>. Inside the Docker container, it's hosted under /srv/shiny-server/.

FAQ

The best place to ask questions about the sevenbridges package is the mailing list.

  • Q: Which version of the Common Workflow Language (CWL) is supported?
    A: We support draft 2 and are making progress on supporting draft 3.

  • Q: Is there a Python binding for the API?
    A: Yes, the official Python client is here. Recipes and tutorials using the Python bindings are here.

  • Q: Why do I get warning messages when I use the API R client?
    A: The warning only exists in RStudio and is potentially a bug in RStudio. To ignore it, use options(warn = -1).

  • Q: I still have problems despite dismissing the messages.
    A: Please try to use the latest package on GitHub or update installed Bioconductor packages. This usually includes the most recent bug fixes.

Events

Time Event Location
Jan 12, 2017 Genomics in the Cloud - Boston Bioconductor Meetup (talk) [slides] Dana-Farber Cancer Institute, Boston, MA
Sep 12 - 14, 2016 Probabilistic Modeling in Genomics (poster) University of Oxford, Oxford, UK
May 27 - 29, 2016 The 9th China-R Conference (talk) Renmin University of China, Beijing, China
Jun 27 - 30, 2016 The R User Conference 2016 (talk) Stanford University, Stanford, CA
Jun 24 - 26, 2016 BioC 2016: Where Software and Biology Connect (workshop) Stanford University, Stanford, CA
Apr 1 - 3, 2016 NCI Cancer Genomics Cloud Hackathon (tutorial) [HTML] [R Markdown Source] Seven Bridges Genomics, Inc., Boston, MA

Contribute

Please file bug reports/feature requests on the issue page, or create pull requests here.

Contributors should read the Seven Bridges Notice to Contributors and sign the Seven Bridges Contributor Agreement before submitting a pull request.

Copyright

© 2020 Seven Bridges Genomics, Inc. All rights reserved.

This project is licensed under the terms of the Apache License 2.0.

sevenbridges-r's People

Contributors

duxan, emyo, nanxstats, nemanjab17, sbgtengfei, tengfei

sevenbridges-r's Issues

R API issues with Cavatica

When I follow the tutorial for the R API in sbg (adapted for Cavatica, http://docs.sevenbridges.com/v1.0/reference#api-r-library), I get the following error when creating a task (see screenshot): Error in envRefSetField(.Object, field, classDef, selfEnv, elements[[field]]) :

This prevents the tsk object from being created. The task is still running, but I get this error message each time I handle the tsk object and it’s not possible to download it.

Any idea of what is happening there?
[Screenshot, 2017-02-15]

test with rabix/bunny

How can we make it better integrated with bunny? Can we run local tests from bunny for our CWL Tool or Flow objects? This enhancement would make it easier for developers to test their apps with test data (maybe in the IDE container?). This needs more discussion and brainstorming.

Improve CWL workflow construction in R

Currently t1 %>>% t2 only supports simple flows with simple matching, but we need a flexible extra parameter in the Flow function that allows users to connect by id directly, so they can easily create non-linear or complex flows from an R script.

Upload via a manifest file

What is a manifest file:
http://docs.sevenbridges.com/docs/format-of-a-manifest-file

Manifest upload is supported in the GUI and the command line uploader, so in the R API client we want to provide:

  • An interface that allows you to specify which metadata to keep and which to ignore; by default, upload all metadata with the files
    p$upload(manifest_file = , manifest_metadata = TRUE, meta.keep = NULL, meta.ignore = NULL, verbal = FALSE) # with metadata = TRUE by default
  • Validation: check whether each file exists and stop if any is missing. The assumption here is that if the user provides a manifest, it has to work and all files need to exist.
  • Uploading files: print file info in verbal mode, otherwise just show a progress bar
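The validation step proposed above could look roughly like the following base R sketch. The helper validate_manifest and the column name "file" are hypothetical; the actual manifest format is described at the link above and may use different column names.

```r
# Hypothetical helper: read a CSV manifest and stop if any listed file
# does not exist. The column name "file" is an assumption.
validate_manifest <- function(manifest_file, file_column = "file") {
  manifest <- read.csv(manifest_file, stringsAsFactors = FALSE)
  if (!file_column %in% names(manifest)) {
    stop("Manifest has no column named '", file_column, "'", call. = FALSE)
  }
  paths <- manifest[[file_column]]
  missing_files <- paths[!file.exists(paths)]
  if (length(missing_files) > 0) {
    stop(
      "Files listed in the manifest do not exist: ",
      paste(missing_files, collapse = ", "),
      call. = FALSE
    )
  }
  manifest
}

# Demo with a temporary file and a one-row manifest.
data_file <- tempfile(fileext = ".txt")
writeLines("example", data_file)
manifest_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(file = data_file, sample_id = "S1"),
  manifest_path,
  row.names = FALSE
)
checked <- validate_manifest(manifest_path)
```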

Remove Trailing Spaces

This task is aimed at:

  1. removing all trailing spaces and unnecessary blank lines in all relevant files, especially R files.
  2. updating the project configuration file .Rproj to make this the default setting if people contribute with RStudio.

This option is hidden in RStudio:

Per project:
Build - Configure build tools - Code editing - Strip trailing horizontal whitespace when saving

Globally:
Tools - Global options - Code - Saving - Strip trailing horizontal whitespace when saving

Sync Check with API v2

Check whether the API interfaces supported in core.R are in sync with API v2.

Add, delete, or modify necessary parts to ensure consistency.

Build vignettes failed when installing from github

Hi,

When installing from github:

source("http://bioconductor.org/biocLite.R")
useDevel()
biocLite("BiocUpgrade")
library(devtools)
install_github("tengfei/sevenbridges", build_vignettes=TRUE, 
  repos=BiocInstaller::biocinstallRepos(),
  dependencies=TRUE, type = "source")

I get an error:

Quitting from lines 155-183 (bioc-workflow.Rmd) 
Error: processing vignette 'bioc-workflow.Rmd' failed with diagnostics:
invalid assignment for reference class field 'token', should be from class "character" or a subclass (was class "NULL")
Execution halted

It is a tiny issue - pull request that fixes it is #1

p.s. great work btw.

export FilesList() to users and make sure single file works in case of file array

Public API related validation question

  • When a file array is specified, should I pass a single file or a list containing a single file?
  • When a file (not an array) is specified, can I pass a single file or a list containing a single file?

This needs testing and clear documentation, or the R API should be made simple enough that users do not have to think or worry about it at all.

support for SBG "Test" tab

I test multiple revisions of a tool on CGC/SBG platform. Every time the revision is pushed the "Test" tab is blanked out and I have to re-input to test the command line expression.

Would be great if I could add the "Test" information via R script.

fix annoying warning

This is caused by reference class, hard to debug, but let me try to fix it by April 20.

lift single R markdown into everything

from R markdown with headers into

  1. command line script
  2. Dockerfile
  3. cwl json file
  4. combined with API client, when token provided, push to repos you specified. should I support any public repos config?
