nf-quilt

ARCHIVED: As of version 0.3.2, the nf-quilt plugin is now maintained by Quilt at quiltdata/nf-quilt.

Nextflow plugin for interacting with Quilt packages.

nf-quilt currently allows you to publish the outputs of a workflow run as a Quilt package. When you launch a pipeline with the nf-quilt plugin, it publishes a Quilt package upon workflow completion, containing the output files that were published to S3.

Getting Started

To use the nf-quilt plugin, you need Nextflow 22.04 (or later) and Python 3.7 (or later).
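If you want to confirm these prerequisites before installing, a small Python script like the following can help. It is an illustrative sketch, not part of nf-quilt. It compares the running Python version against 3.7, but for Nextflow it only checks that a nextflow executable is on your PATH. It does not verify that Nextflow is 22.04 or later (run nextflow -version for that).

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 7)):
    """Report whether the environment meets the README's stated requirements.

    Only the Python version is actually compared. For Nextflow, this sketch
    just checks that a `nextflow` executable is on PATH; it does not check
    that the installed version is 22.04 or later.
    """
    return {
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        "nextflow_on_path": shutil.which("nextflow") is not None,
    }


if __name__ == "__main__":
    for check, ok in check_prerequisites().items():
        print(f"{check}: {'yes' if ok else 'no'}")
```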

Install the quilt-cli Python package:

pip3 install git+https://github.com/nextflow-io/nf-quilt.git#subdirectory=quilt-cli

Add the following snippet to your nextflow.config to enable the plugin:

plugins {
    id 'nf-quilt'
}

Configure the plugin with the quilt config scope in your nextflow.config. At a minimum, you should specify the package name and registry. You can also specify a list of paths to include in the Quilt package; by default, the plugin will include all output files that were published to S3.

TIP: It is recommended that you use publishDir to select outputs for the Quilt package, rather than quilt.paths, so that the Quilt package matches the actual workflow outputs.
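To make the default behavior concrete, here is a minimal Python model of how a list of paths could narrow the package contents. It is a sketch under stated assumptions, not the plugin's implementation. The function name select_package_files is hypothetical, only files published to S3 are treated as candidates, and each quilt.paths entry (exact path or glob) is assumed to be matched against the full published S3 URL.

```python
from fnmatch import fnmatchcase


def select_package_files(published, patterns=None):
    """Hypothetical model of how quilt.paths narrows the package contents.

    published: paths of files the workflow published (e.g. via publishDir).
    patterns:  optional list of exact paths or globs, as in quilt.paths.
    """
    # Assumption: only outputs published to S3 can go into the package.
    s3_files = [p for p in published if p.startswith("s3://")]
    if not patterns:
        # Default described in the README: include every S3-published output.
        return s3_files
    # Keep a file if it matches at least one pattern; an exact path is just
    # a glob without wildcards. Matching is case-sensitive on all platforms.
    return [p for p in s3_files if any(fnmatchcase(p, pat) for pat in patterns)]


if __name__ == "__main__":
    published = [
        "s3://my-bucket/results/sample1.bam",
        "s3://my-bucket/results/multiqc/report.html",
        "/tmp/work/scratch.txt",
    ]
    print(select_package_files(published))
    print(select_package_files(published, ["*.bam"]))
```

With fnmatchcase, * also matches across / separators, so a pattern like *.bam selects BAM files in any subdirectory. The plugin's real glob semantics may differ.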

Here's an example based on nf-core/rnaseq:

quilt {
  packageName = 'genomes/yeast'
  registry = 's3://seqera-quilt'
  message = 'My commit message'
  meta = [pipeline: 'nf-core/rnaseq']
  force = false
}

Finally, run your Nextflow pipeline with your config file. You do not need to modify your pipeline script in order to use the nf-quilt plugin. As long as your pipeline publishes the desired output files to S3, the plugin will automatically publish a Quilt package based on your configuration settings.

Reference

The plugin exposes a new quilt config scope which supports the following options:

Config option      Description
quilt.packageName  Name of the package, in USER/PKG format
quilt.registry     Registry in which to create the new package
quilt.message      Commit message for the new package
quilt.meta         Package-level metadata, as key-value pairs
quilt.force        Skip the parent top hash check and create a new revision even if your local state is behind the remote registry
quilt.paths        List of published files (paths or globs) to include in the package
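The quilt.force option is easiest to understand as an optimistic-concurrency check. The following Python function is a simplified model under stated assumptions, not Quilt's actual logic. The name may_push is hypothetical, a top hash is treated as an opaque string, and a package that does not yet exist in the registry is always pushable.

```python
from typing import Optional


def may_push(local_parent: Optional[str], remote_latest: Optional[str],
             force: bool = False) -> bool:
    """Simplified model of the parent top-hash check that quilt.force skips.

    local_parent:  top hash of the revision the local package was built on,
                   or None if it was built from scratch.
    remote_latest: top hash of the latest revision in the registry,
                   or None if the package does not exist there yet.
    """
    if force:
        # quilt.force = true: create a new revision regardless of remote state.
        return True
    if remote_latest is None:
        # Nothing in the registry yet, so local state cannot be behind it.
        return True
    # Otherwise the local package must be based on the latest remote revision;
    # anything else (including having no parent at all) counts as behind.
    return local_parent == remote_latest
```

In this model, with force = false (as in the example config above), a push whose local state is behind the registry is refused rather than creating a revision on top of a stale parent.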

Tower

To use nf-quilt with Nextflow Tower, add the following lines to your pre-run script:

yum install python3-pip -y
yum install git -y
pip3 install git+https://github.com/nextflow-io/nf-quilt.git#subdirectory=quilt-cli

Development

Refer to the nf-hello README for instructions on how to build, test, and publish Nextflow plugins.

nf-quilt's People

Contributors

bentsherman, pditommaso

nf-quilt's Issues

Automatically add Tower run id to package metadata

For the Tower/Quilt integration, it looks like each workflow run will map to a new revision in a Quilt package (like pushing a new commit to GitHub). As such, we want to add the workflow run ID to the revision metadata, so that users browsing a Quilt package have a link to the Tower run that produced it.

Basically we need to decide who is responsible for adding the workflow id. I see that the nf-tower plugin is given the workflow id on workflow start, so perhaps there is a clean way to provide that to the quilt plugin. Alternatively, the Tower backend could add it to the Nextflow config text.

cc @pditommaso @swampie
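One possible shape for the metadata side of this, sketched in Python: merge the run ID into the package-level metadata without mutating the configured quilt.meta. Everything here is hypothetical, including the function name with_run_id, the workflow_run_id key, and the choice to let a user-supplied value take precedence. The issue leaves open which component supplies the ID.

```python
def with_run_id(meta: dict, run_id: str, key: str = "workflow_run_id") -> dict:
    """Return a copy of the package metadata with the workflow run ID added.

    The default key name is hypothetical. If the user's quilt.meta already
    defines that key, the user's value is kept.
    """
    merged = dict(meta)  # shallow copy so the configured metadata is not mutated
    merged.setdefault(key, run_id)
    return merged


if __name__ == "__main__":
    print(with_run_id({"pipeline": "nf-core/rnaseq"}, "example-run-id"))
```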

Add get started section

Let's add a "Get started" section to the README, with a mini tutorial that guides a dumb user (like me) to:

  1. install the Python package
  2. write a simple Nextflow script using the nf-quilt plugin
  3. run it
  4. check the result

Ideas for future enhancements

Currently the nf-quilt plugin can publish a single package for a workflow run, with the option to select files by name.

Here is a list of feature enhancements that we are considering:

  • allow multiple quilt packages to be published, with a config block for each package
  • add fromQuilt channel factory to download a quilt package and emit files to a channel
  • add publishQuilt operator to publish received files to a quilt package, allowing the user to select files with pipeline logic instead of config (also supports publishing multiple packages) (similar to the publish operator idea)

All of these ideas are feasible to implement, but I think we should try to figure out what users would prefer before we move forward with any of them. In particular I'd like to understand how users feel about the "config" approach vs "pipeline logic" approach.
