fmalmeida / ngs-preprocess

A pipeline for preprocessing NGS data from Illumina, Nanopore and PacBio technologies

Home Page: https://ngs-preprocess.readthedocs.io/

License: GNU General Public License v3.0

Nextflow 97.55% Dockerfile 2.45%
ngs-preprocess illumina pacbio nextflow pipeline trimgalore nanopack bax2bam porechop reproducible-research

ngs-preprocess's Introduction

Hello 👋

Hello there, my name is Felipe Almeida. I am a Brazilian scientist, bioinformatician, pipeline developer and problem solver. My main interests are bioinformatics, genomic surveillance, precision medicine, and microbial genomics. You can also find me on Twitter (@fmarquesalmeida), Stack Overflow and LinkedIn.

Academic info

I'm a PhD student at the University of Brasilia, in the CompGen (Computational Genomics) laboratory, under the academic guidance of Prof. Georgios J. Pappas Jr., PhD.

Some of my favourite tools:

Nextflow Python R bash

ngs-preprocess's People

Contributors

fmalmeida

ngs-preprocess's Issues

change structure of output directory

The structure of the output directory is not standardized and needs some changes to make the final (preprocessed) reads easy to find.

It would be nice to have:

  • A final directory, probably called final_output, that contains all final (trimmed and filtered) FASTQ files, always with the fq.gz extension so that filenames are standardized.
  • Inside this directory, reads are separated into subdirectories by type (longreads or shortreads).
  • All other files (quality reports, merging steps, correction steps, etc.) are saved in separate directories, one for each step, software or strain. This part still needs more thought.

More brainstorming is required before implementing this. Help is needed to decide on the structure (@gpappasunb).
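
To make the proposal easier to discuss, here is a minimal Python sketch of how the destination of a final read file could be computed. It is not pipeline code: the function name final_read_path and the list of accepted input suffixes are hypothetical, and the directory names simply follow the wording above.

```python
from pathlib import Path

# Hypothetical names: "final_output", "shortreads" and "longreads" follow the
# wording of the proposal, not an implemented layout.
FINAL_DIR = "final_output"
READ_TYPES = ("shortreads", "longreads")
# Assumed set of input extensions that should be normalized to ".fq.gz".
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def final_read_path(outdir: str, read_file: str, read_type: str) -> Path:
    """Return the proposed location of a preprocessed read file.

    Only the path is computed; compressing an uncompressed FASTQ file
    would be a separate step.
    """
    if read_type not in READ_TYPES:
        raise ValueError(f"unknown read type: {read_type}")
    name = Path(read_file).name
    # Strip the first matching FASTQ suffix so every file ends in ".fq.gz".
    for suffix in FASTQ_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return Path(outdir) / FINAL_DIR / read_type / f"{name}.fq.gz"


if __name__ == "__main__":
    print(final_read_path("results", "sample_1.fastq", "shortreads"))
    print(final_read_path("results", "ont_reads.fq.gz", "longreads"))
```

Keeping the read type as an explicit argument means the per-step directories for quality reports or corrections can be decided later without touching the location of the final reads.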

change to bioconda images

Instead of building a custom Docker image with all tools, reconfigure the pipeline to use the Bioconda channels and images, which will let users run the tool with conda, Docker or Singularity.

update module to fetch data from sra

Currently, the pipeline decides which module receives the downloaded data by matching the platform against the patterns illumina, pacbio and nanopore.
But what if a downloaded dataset is not from any of these platforms?
Think about a better approach to channel splitting.
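
One possible direction, sketched in Python rather than Nextflow, is to route anything that matches none of the three patterns into an explicit "unknown" group instead of dropping it silently. The function names and the "unknown" group are assumptions made for illustration, and the run identifiers in the usage example are made up.

```python
import re

# The three patterns come from the issue text; matching is case-insensitive
# so that labels such as "ILLUMINA" or "OXFORD_NANOPORE" are recognized.
PLATFORM_PATTERNS = {
    "illumina": re.compile(r"illumina", re.IGNORECASE),
    "pacbio": re.compile(r"pacbio", re.IGNORECASE),
    "nanopore": re.compile(r"nanopore", re.IGNORECASE),
}


def classify_platform(label: str) -> str:
    """Return the module name for a platform label, or "unknown"."""
    for platform, pattern in PLATFORM_PATTERNS.items():
        if pattern.search(label):
            return platform
    return "unknown"


def split_by_platform(runs):
    """Group (run_id, platform_label) pairs by the module that handles them."""
    groups = {name: [] for name in (*PLATFORM_PATTERNS, "unknown")}
    for run_id, label in runs:
        groups[classify_platform(label)].append(run_id)
    return groups


if __name__ == "__main__":
    # Made-up run identifiers, for illustration only.
    example = [
        ("run_a", "ILLUMINA"),
        ("run_b", "PACBIO_SMRT"),
        ("run_c", "OXFORD_NANOPORE"),
        ("run_d", "ION_TORRENT"),
    ]
    for module, run_ids in split_by_platform(example).items():
        print(module, run_ids)
```

With an explicit fallback group, the pipeline could warn the user about unsupported platforms or let them assign those runs to a module by hand.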

Enhance documentation (paper review)

Background
This issue is meant to address the comments received on the paper review here.

Description
Create an "Output" page that explains the output structure to users and links to the relevant tool-specific reference material, as is done in the bacannot documentation. That page helps users interpret the generated results by describing the directory structure and pointing to each tool's own documentation.

Suggestion for hybrid error correction

Hi there,

I found your wrapper on Twitter, great initiative :). I have a suggestion for your pipeline: it would be worth considering hybrid correction (i.e. combining short and long reads). In my current pipeline I use fmlrc, and combined with ONT reads it works pretty well.

Cheers,

Tuan

Add example of non-bacterial dataset analysis (paper review)

Background
This issue is meant to address the comments received on the paper review here.

Description
Create a new page in the web documentation showing the analysis of a fungal or plant sequencing dataset. Make sure it includes all the command lines needed to go from input to output, so that anyone can reproduce the analysis, and add an overview of the generated results to the page.

Once done, check how easily we can update the paper to provide an additional Zenodo record for the non-bacterial analysis (ngs-preprocess + MpGAP).

standard profile to not load docker

Instead of having the standard profile load Docker automatically, it is better that no container or environment engine is loaded by default, so that the pipeline acts as a simple local pipeline.

Users who want one of the available profiles must then select it explicitly with -profile docker/singularity/conda.

new tool for long reads QC

A new tool for long reads quality assessment is now available:

The task is to evaluate the tool and compare it with NanoPack and pycoQC, in order to decide whether it is worth including in the pipeline or using as a replacement for one of those tools.
