nf-core / bacass
Simple bacterial assembly and annotation pipeline
Home Page: https://nf-co.re/bacass
License: MIT License
The documentation is at https://nf-co.re/bacass/usage, but the link in the Documentation section of the pipeline page https://nf-co.re/bacass#documentation points to https://nf-core/bacass/docs, which is a broken link.
If you use nf-core/atacseq for your analysis, please cite it using the following doi: 10.5281/zenodo.2634132
Several processes use tools that may also be of interest in other pipelines and could be made available in nf-core/modules, such as:
Other processes use local modules because the nf-core modules were not applicable:
The MultiQC module retrieves QC and read-trimming stats only.
Gather the files of all modules (supported by MultiQC) and let MultiQC access their data to get a complete overview of the workflow.
Hi,
Thank you for this great workflow!
I encountered the following error from Unicycler when running hybrid mode:
Dependencies:
Program Version Status
spades.py 3.13.0 good
racon - good
makeblastdb 2.5.0+ good
tblastn 2.5.0+ good
bowtie2-build 2.4.1 good
bowtie2 2.4.1 good
samtools ? too old
java 11.0.8-internal good
pilon 1.23 good
bcftools not used
Error: Unspecified error with Unicycler dependencies
I think this is probably related to this issue.
It seems changing to another image fixes the issue. I used quay.io/biocontainers/unicycler:0.4.8--py37h13b99d1_3.
Software versions:
Chenhao
I am trying to run bacass v1.1.0, using Nextflow version 20.04.1 and the Singularity profile. The run terminates with an error exit status (127):
Error executing process > 'trim_and_combine (ERR3219830)'
Caused by:
Process `trim_and_combine (ERR3219830)` terminated with an error exit status (127)
Command executed:
# loop over readunits in pairs per sample
pairno=0
echo "ERR3219830_R1.fastq.gz ERR3219830_R2.fastq.gz" | xargs -n2 | while read fq1 fq2; do
skewer --quiet -t 8 -m pe -q 3 -n -z $fq1 $fq2;
done
cat $(ls *trimmed-pair1.fastq.gz | sort) >> ERR3219830_trm-cmb.R1.fastq.gz
cat $(ls *trimmed-pair2.fastq.gz | sort) >> ERR3219830_trm-cmb.R2.fastq.gz
Command exit status:
127
Command output:
(empty)
Command error:
.command.sh: line 5: skewer: command not found
The command I attempted was:
nextflow run nf-core/bacass --input bacass_short.tsv -profile singularity --kraken2db "~/db/minikraken2_v1_8GB_201904.tgz"
This breaks Prokka. The Docker container / Conda environment needs to be rebuilt.
See: tseemann/prokka#453
So apparently it doesn't work with ${sample_id}_report.tsv
- maybe I should try `${sample_id}.report.tsv` instead, and/or configure a custom name in the MultiQC config for the pipeline:
quast_config:
fn: *_report.tsv
could do the trick maybe :-)
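For reference, MultiQC lets a pipeline override a module's file search pattern via the `sp:` section of a custom multiqc_config.yml. A sketch of what that could look like here (the `quast` key and whether the QUAST module honours this particular glob are assumptions to be checked against the MultiQC docs):

```yaml
# Hypothetical multiqc_config.yml fragment: tell MultiQC's QUAST module
# to also pick up files named like <sample_id>_report.tsv
sp:
  quast:
    fn: "*_report.tsv"
```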
Hi all. I'm excited to get started with your workflow. However, I cannot as of yet run the fastqc step, at least on nextflow version 19.07.0.5106 / docker.
Here's a snippet of the output:
fastqc -t {task.cpus} -q 1_R1_001_trm-cmb.R1.fastq.gz 1_R1_001_trm-cmb.R2.fastq.gz
...
Value "{task.cpus}" invalid for option threads (number expected)
Looks like the actual value of task.cpus is not getting substituted. Perhaps there is a typo in Line 252 of main.nf:
Proposed change is:
- fastqc -t {task.cpus} -q ${fq1} ${fq2}
+ fastqc -t ${task.cpus} -q ${fq1} ${fq2}
I can issue a PR, but this seems like a straightforward fix.
thanks!
Hello! My colleagues and I have been actively working on enhancing the nf-core/bacass workflow to address lab-specific challenges in bacterial genome assembly. We would be happy to add these improvements to the main nf-core/bacass repository in case you are interested.
Currently, these enhancements have been implemented in my local fork of nf-core/bacass on the buisciii-develop branch.
nextflow run main.nf \
-profile singularity,test \
--skip_kmerfinder false \
--kmerfinderdb path/to/kmerfinder_db/bacteria \
--ncbi_assembly_metadata path/to/ncbi_assembly_metadata/assembly_summary_bacteria.txt \
--outdir ./results \
-w ./work \
-resume
If you think these improvements could be implemented in nf-core/bacass, let me know so I can work on the test data and test profile.
In reference to #11 - would it be an option to simply replace Prokka with Dfast:
https://github.com/nigyta/dfast_core
From my tests, it seems to outperform Prokka in terms of "loci annotated/named" - and it's also on bioconda and easily as fast as Prokka.
The issue with Prokka and Tbl2asn seems like it won't be fixable - and having the Docker image "expire" every so often is probably not an ideal solution moving forward.
Cheers,
Marc
The assembly polishing step is not performed in the workflow, neither by default nor with the param --polish_method.
Steps to reproduce the behaviour:
nextflow run nf-core/bacass --input 'sample_sheet.csv' --outdir /data/nihr/nanopore_sequencing/9_08_21/hybrid_assembly_output/ -profile docker --assembly_type hybrid -resume --kraken2db "https://genome-idx.s3.amazonaws.com/kraken/k2_standard_8gb_20210517.tar.gz"
bacass/modules/local/functions.nf
Line 32 in 9599673
There seems to be an incorrect regex here. The above regex can be translated as: match one or more `/` at the beginning, OR match one or more `/` followed by a literal `$`. I believe the regex doesn't remove whitespace and trailing slash(es). The correct regex should be replaceAll("\/+$| +", "").
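A quick sanity check of the suggested pattern (shown in Python since the regex semantics are the same as in Groovy; the sample paths are made up for illustration):

```python
import re

# Suggested pattern: strip one or more trailing slashes and any spaces.
def clean(s):
    return re.sub(r"/+$| +", "", s)

print(clean("my path/to/db//"))    # -> "mypath/to/db"
print(clean("/data/kraken2_db/"))  # -> "/data/kraken2_db"
```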
This is a collection of ideas that should be considered after the DSL2 conversion #56 is finished. The list is subject to change. Any ideas or discussions are welcome.
--skip_kraken2 should either be removed (i.e. using --krakendb to determine whether Kraken2 is used) or a simple default (small, fast, but helpful) value should be chosen for --krakendb, e.g. "https://genome-idx.s3.amazonaws.com/kraken/16S_Greengenes13.5_20200326.tgz". This is a very small 16S database but should be sufficient to detect serious bacterial contamination.
Follow viralrecon recommendations: tests should just test execution and not run on big data.
To prevent breaking this pipeline in the near future, the nf-validation version should be pinned to version 1.1.3 like:
plugins {
    id '[email protected]'
}
Dragonflye allows polishing the raw assembly with Illumina reads if provided (see the dragonflye docs). However, the current implementation of dragonflye in nf-core/bacass performs long-read assembly only.
Allow nf-core/bacass and the nf-core/dragonflye module to polish the assembled genome with short reads when provided.
Would be good to add the Zenodo DOI for the release to the main README of the pipeline in order to make it citable. You will have to do this via a branch pushed to the repo in order to directly update master. See PR below for example and file changes:
nf-core/atacseq#38
See https://zenodo.org/record/2669429#.XVZ0bOhKhPY
Web-hooks are already set up for this repo so that a unique Zenodo DOI is generated every time a new version of the pipeline is released. Would be good to add this in after every release.
nf-core v2.10 template update #89
Hi,
thank you for providing this pipeline.
Would you consider providing A5-miseq support for short-reads-only mode?
I normally use Spades (or Unicycler in this case), but I've consistently been getting better results with A5-miseq when assembling a short-reads-only dataset.
e.g. compare these two assemblies
bacass - unicycler:
Total n: 181
Total seq: 5473999 bp
Avg. seq: 30243.09 bp
Median seq: 1476.00 bp
N 50: 143598 bp
Min seq: 110 bp
Max seq: 623519 bp
a5-miseq:
Total n: 78
Total seq: 5532586 bp
Avg. seq: 70930.59 bp
Median seq: 4987.50 bp
N 50: 278625 bp
Min seq: 623 bp
Max seq: 761326 bp
It is not a huge difference, but I believe it would be a good addition to the pipeline. I'd love to make a PR myself, but I'm still not confident enough with Groovy/nextflow scripting.
Thank you for any assistance you can provide,
V
The documentation describes the input sample sheet as a tab-separated file, but it is labelled csv in the example. The pipeline fails if the suffix is .tsv.
I suggest changing the example and the pattern match to ^\S+\.tsv$
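As a quick illustration, the suggested pattern accepts .tsv paths and rejects the .csv example (Python is used here just to exercise the regex; the file names are made up):

```python
import re

pattern = re.compile(r"^\S+\.tsv$")

print(bool(pattern.match("samplesheet.tsv")))  # True
print(bool(pattern.match("samplesheet.csv")))  # False: wrong suffix
print(bool(pattern.match("my sheet.tsv")))     # False: \S+ forbids spaces
```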
Because of the issue in this post #24, I ran the command conda env create --prefix /home/ss/test_bacass/work/conda/nf-core-bacass-1.1-0-58ac097954559efb6ec2ce857847ed28 --file /home/ss/.nextflow/assets/nf-core/bacass/environment.yml
to create the conda environment, then I ran nextflow run nf-core/bacass --input bacass_short.csv --skip_kraken2 -profile conda
and got another error message:
Error executing process > 'quast (ER064912)'
Caused by:
Process 'quast (ER064912)' terminated with an error exit status (127)
Command executed:
quast -t 2 -o ER064912_assembly_QC ER064912_assembly.fasta
quast -v > v_quast.txt
Command exit status:
127
Command output:
(empty)
Command error:
.command.sh: line 2: quast: command not found
quast is not in the environment.yml file; could it have been missed?
After the latest round of Bakta updates, the latest release v1.8.2 works with the db-light on Zenodo (oschwengers/bakta#241).
The module BAKTA_DBDOWNLOAD_RUN:BAKTA_BAKTADBDOWNLOAD() can be updated to v1.8.2.
I just started using bacass on AWS and tried to access the kraken2db from an S3 bucket. My question: is it possible to access the kraken2db from an S3 bucket? I have tried running bacass with
--kraken2db 's3://kraken2_db/minikraken2_v2_8GB_201904_UPDATE' using nextflow-tower
I got the error message below.
Command error:
kraken2: database ("s3://kraken2_db/minikraken2_v2_8GB_201904_UPDATE") does not contain necessary file taxo.k2d
Thank you very much,
Piroon
In the checks for long reads and Fast5 files, the wrong file is reported if there is an error.
bacass/subworkflows/local/input_check.nf
Line 66 in 9599673
Should read
exit 1, "ERROR: Please check input samplesheet -> Long FastQ file does not exist!\n${row.LongFastQ}"
bacass/subworkflows/local/input_check.nf
Line 74 in 9599673
Should read
exit 1, "ERROR: Please check input samplesheet -> Fast5 file does not exist!\n${row.Fast5}"
I'm planning to work on updating the nf-core/tools-2.9 (#84 ) template to enhance its functionality, improve documentation, and ensure it aligns with the latest best practices.
To that end I am going through this list:
I aim to periodically update this issue to provide insights into the progress made. If anyone has expertise in template development or nf-core best practices, your input would be highly appreciated.
(edit v3: updated task list)
I am running into an issue with the --save_trimmed_fail setting when running the bacass pipeline. It indicates that 'false' is not a valid choice, but it also indicates that 'true' or 'false' are the only valid arguments for this setting.
nextflow run https://github.com/nf-core/bacass \
-name bacass-test-2 \
-params-file https://api.cloud.seqera.io/ephemeral/Mn6cNSXHpkJNUK5IW3P4HQ.json \
-with-tower \
-r 2.2.0 \
-profile docker
workDir : /seqcoast-aws/scratch/1kcilKqDAJaXkO
projectDir : /.nextflow/assets/nf-core/bacass
userName : root
profile : docker
configFiles :
Input/output options
input : https://raw.githubusercontent.com/nf-core/test-datasets/bacass/bacass_hybrid.tsv
outdir : s3://seqcoast-aws
Assembly parameters
assembly_type : hybrid
canu_mode : -nanopore
Annotation
dfast_config : /.nextflow/assets/nf-core/bacass/assets/test_config_dfast.py
Skipping Options
skip_kraken2 : true
!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/bacass for your analysis please cite:
* The pipeline
10.5281/zenodo.2669428
* The nf-core framework
https://doi.org/10.1038/s41587-020-0439-x
* Software dependencies
https://github.com/nf-core/bacass/blob/master/CITATIONS.md
------------------------------------------------------
ERROR ~ ERROR: Validation of pipeline parameters failed!
-- Check 'nf-1kcilKqDAJaXkO.log' file for details
The following invalid input values have been detected:
* --save_trimmed_fail: 'false' is not a valid choice (Available choices: true, false)
-- Check script '.nextflow/assets/nf-core/bacass/./workflows/../subworkflows/local/utils_nfcore_bacass_pipeline/../../nf-core/utils_nfvalidation_plugin/main.nf' at line: 57 or see 'nf-1kcilKqDAJaXkO.log' file for more details
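For context, the error looks like a mismatch between a JSON boolean and a string enum. A hypothetical sketch of why a real JSON `false` fails a validation that lists the string choices "true"/"false" (the parameter name mirrors the log, but the check itself is illustrative, not the plugin's actual code):

```python
import json

# The params file sends a real JSON boolean...
params = json.loads('{"save_trimmed_fail": false}')
value = params["save_trimmed_fail"]  # Python False, not the string "false"

# ...but a schema that enumerates string choices will never match it.
choices = ["true", "false"]
print(value in choices)               # False: a boolean never equals a string
print(str(value).lower() in choices)  # True once coerced to a lowercase string
```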
Sample sheet validation could be performed via nf-validation plugin.
In addition, tsv/csv documentation and parsing will need adjustments.
nf-core/test-dataset::bacass). Two options:
The example run fails on my Ubuntu machine
nextflow run nf-core/bacass -r 1.1.0 -profile docker --input https://raw.githubusercontent.com/nf-core/test-datasets/bacass/bacass_short.csv --kraken2db ${PWD}/minikraken2 --max_memory 40.GB --max_cpus 10
with
Command exit status:
2
Command output:
(empty)
Command error:
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
kraken2: database ("../minikraken2") does not contain necessary file taxo.k2d
However, I can use the minikraken2 db locally. Docker version is 19.03.5, Nextflow version is 19.10.0.5170. I'm only seeing this on Ubuntu 18.04: the test runs fine on my MacBook.
Any guidance?
Module compilation error
- file : /home/user/.nextflow/assets/nf-core/bacass/./workflows/../modules/local/skewer.nf
- cause: expecting '}', found ',' @ line 26, column 55.
, emit: lo
^
1 error
Steps to reproduce the behaviour:
nextflow run nf-core/bacass -r 2.0.0 -name amr_sample5 -profile docker -params-file nf-params.json
nf-params.json:
{
"input": "amr_sample2.csv",
"kraken2db": "\/home\/user\/minikraken2_v2_8GB_201904_UPDATE"
}
All nf-core pipelines will be converted to nextflow DSL2 and nf-core/bacass should not be left behind.
Additionally, this opportunity can be used to update all tools and progressively add more.
Currently, I am planning to start on that by mid-September 2021 at the latest. The earliest start would be when nf-core/tools releases its DSL2 template, which might be soon.
I'll write here as soon as I start. If anybody else is planning to tackle, or is already tackling, this problem, please share your plans here so that no redundant work is done.
Edit: #54 with support for DSL2 is open
Hi, may I ask whether it is possible to add homopolish as a polishing tool, applied after polishing with medaka?
I got a "too many arguments" error for one of the commands when I had sample IDs with spaces in the sample sheet. I don't recall which command it was exactly but this should be an issue for any command. I'd suggest to either throw an error when the sample sheet is incorrect in this regard or to automatically get rid of spaces in the IDs.
Thanks!
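For illustration, unquoted shell expansion of an ID containing a space splits it into multiple arguments, which is the usual source of "too many arguments" errors. A generic sketch (not the pipeline's actual command):

```shell
#!/usr/bin/env bash
sample_id="sample 1"

# Unquoted: the shell splits the ID on whitespace -> two arguments
set -- $sample_id
echo "unquoted arg count: $#"   # prints 2

# Quoted: the ID stays a single argument
set -- "$sample_id"
echo "quoted arg count: $#"     # prints 1
```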
Running the following command using bacass 1.1.0:
nextflow run nf-core/bacass --input bacass_short.csv -profile singularity --skip-kraken2
results in the following:
Missing Kraken2 DB arg
One would need to specify the --kraken2db argument, which is a bit counter-intuitive given the role of --skip-kraken2.
Hi, yesterday this pipeline failed for us in the Prokka process. After we rewrote the environment.yml file and pinned Prokka to 1.14.0, it works perfectly now.
Hi there,
I downloaded the latest pipeline v1.1.1
and ran it offline with the following command:
nextflow run $PWD/nf-core-bacass-1.1.1/workflow/ \
-profile singularity \
--kraken2db /path/to/krakendb \
--input $PWD/samples.csv \
--assembly_type long \
--skip_annotation \
--skip_polish \
--assembler canu \
--canu_args 'stopOnLowCoverage=0 minInputCoverage=0'
The contents of the kraken2db folder are
library
taxonomy
hash.k2d
opts.k2d
seqid2taxid.map
taxo.k2d
The error I'm getting is
kraken2: database ("/path/to/krakendb") does not contain necessary file taxo.k2d
Here are the .command.run and .command.sh files (I added .txt at the end to be able to attach them):
command.run.txt
command.sh.txt
In my profile file, I have defined
singularity {
enabled = true
autoMounts = true
cacheDir = "/path/to/images/singularity/nfcore/"
}
Thanks for looking into this. I'm just wondering whether I'm missing something.
Cheers,
Santiago
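One possible cause of the symptom above is that the database directory is not visible inside the Singularity container. As a hedged workaround (assuming the Kraken2 db lives on a filesystem that Singularity does not auto-mount), the host path could be bound explicitly via Nextflow's singularity.runOptions, e.g.:

```groovy
// Illustrative nextflow.config fragment; paths are placeholders
singularity {
    enabled    = true
    autoMounts = true
    cacheDir   = "/path/to/images/singularity/nfcore/"
    // Bind the host directory that holds hash.k2d / taxo.k2d etc.
    runOptions = "-B /path/to/krakendb"
}
```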
Based on the list of proposed enhancements for the nf-core/bacass
pipeline (#57), I suggest the integration of the Dragonflye module into the long-read assembly mode.
When I ran the command nextflow run nf-core/bacass --input bacass_short.csv --skip_kraken2 -profile conda
by using your test data then I got the error message:
Caused by:
Failed to create Conda environment
...
Status: 120
According to another post here: nextflow-io/nextflow#1081, I think it may also be caused by a timeout.
Can you add the line conda { createTimeout = '1 h' } to your nextflow.config file, then let me try it again? Thanks!
Which resolves all the nasty errors with python / updates / annoyances.
If anyone wants to help with this, what we need to do is:
As this is only spare-time work from my side, I need some help here from people with the possibility to contribute @nf-core/core :-) It is also necessary to make this pipeline DSLv2-compatible at some point!
I ran the pipeline with my own data. The command line is "nextflow run nf-core/bacass -r 2.2.0 -profile docker --input ./minikrakendb/baccsamplesheet.tsv --kraken2db /home/aslangabriel/minikrakendb/k2_standard_08gb_20240112.tar.gz --max_cpus 12 --max_memory '125.GB' --outdir ./minikrakendb/results". My sample sheet looks like this,
and the error message was as follows
"-[nf-core/bacass] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_BACASS:BACASS:FASTQ_TRIM_FASTP_FASTQC:FASTP (JSAHVC01)'
Caused by:
Process NFCORE_BACASS:BACASS:FASTQ_TRIM_FASTP_FASTQC:FASTP (JSAHVC01)
terminated with an error exit status (255)
Command executed:
[ ! -f JSAHVC01_1.fastq.gz ] && ln -sf JSAHVC01_S1_R1.fastq.gz JSAHVC01_1.fastq.gz
[ ! -f JSAHVC01_2.fastq.gz ] && ln -sf JSAHVC01_S1_R2.fastq.gz JSAHVC01_2.fastq.gz
fastp
--in1 JSAHVC01_1.fastq.gz
--in2 JSAHVC01_2.fastq.gz
--out1 JSAHVC01_1.fastp.fastq.gz
--out2 JSAHVC01_2.fastp.fastq.gz
--json JSAHVC01.fastp.json
--html JSAHVC01.fastp.html
--thread 8
--detect_adapter_for_pe
2> >(tee JSAHVC01.fastp.log >&2)
cat <<-END_VERSIONS > versions.yml
"NFCORE_BACASS:BACASS:FASTQ_TRIM_FASTP_FASTQC:FASTP":
fastp: $(fastp --version 2>&1 | sed -e "s/fastp //g")
END_VERSIONS
Command exit status:
255
Command output:
(empty)
Command error:
ERROR: Failed to open file: JSAHVC01_1.fastq.gz
Work dir:
/home/aslangabriel/work/d0/6d8835e9fba25be3a34dcf33080fa2
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh
", I have changed the file name per se the test example. please help me to fix it.
Since the last update (2.0.0), the NanoPlot command expects a .png output, but this output format was removed between versions 1.1.1 and 2.0.0. I solved the issue by removing the png output line in modules/local/nanoplot.nf.
Steps to reproduce the behaviour:
Caused by:
Missing output file(s) *.png
expected by process NFCORE_BACASS:BACASS:NANOPLOT (NS45)
Command executed:
NanoPlot
-t 2
--fastq 2111-DK-l1-001.fastq
echo $(NanoPlot --version 2>&1) | sed 's/^.*NanoPlot //; s/ .*$//' > nanoplot.version.txt
Command exit status:
0
Command output:
(empty)
Work dir:
/home/sysgen/Desktop/ncct-projects/2201-Kostner-Assembly/work/b6/5ee3ce889e000a378cc359d94da18a
Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out
I would expect the NanoPlot process to produce different output formats for the length distribution and similar statistics of my long reads.
Have you provided the following extra information/files:
The .nextflow.log file can be found in my fork of the bacass pipeline (https://github.com/jenmuell/bacass).
It seems that the default Docker image doesn't have a functional Prokka installation due to the expiry of tbl2asn. At the moment, I've gotten around this by specifying a different Docker image for the Prokka process, but it would be nice if this weren't necessary.
Is there a way to process single end reads with bacass? I am attempting to modify config files, and it would be amazing if you already had a solution in hand.
Best,
Emily
Hi!
I would like to know: is there a way to set default_jvm_mem_opts in Pilon (which is part of Unicycler) through the nextflow run command line? Especially when one uses -profile conda or -profile docker. Otherwise there will always be a problem when dealing with large genomes.
Hi @apeltzer
We have exchanged messages here: #28 (comment)
A related question:
I am relatively new to Nextflow and to using it with Docker. As a general rule, how big can a Docker image be before it is considered too large? I was wondering if you could help and share some guidelines.
If I do docker build with the environment.yml from nf-core/sarek or mag, the image size comes close to 2.4 GB. Is there a way to reduce the size of the image?
I tried some of these techniques too but it did not help:
https://uwekorn.com/2021/03/03/deploying-conda-environments-in-docker-cheatsheet.html
https://jcristharif.com/conda-docker-tips.html
I also tried with Micromamba, but the size of the final Docker image is still pretty huge:
https://github.com/mamba-org/micromamba-docker
Have you tried Mamba/Micromamba? I would be curious to know your findings
Thanks in advance.
Two warning messages appear during the execution of tests in the nf-core/bacass GitHub CI.
WARN: A process with name 'MINIMAP2_CONSENSUS' is defined more than once in module script: /home/runner/work/bacass/bacass/./workflows/bacass.nf -- Make sure to not define the same function as process
WARN: A process with name 'MINIMAP2_POLISH' is defined more than once in module script: /home/runner/work/bacass/bacass/./workflows/bacass.nf -- Make sure to not define the same function as process
Prokka is no longer under maintenance, and Bakta seems to be a reasonable replacement for genome annotation which incorporates several improvements.
Remove Prokka from nf-core/bacass and add Bakta instead.
It seems that Bakta needs a database to perform the annotations. However, even the light version of its database is somewhat heavy and could slow down the testing process.
Another option is to keep Prokka and add Bakta as an additional tool for annotation.
I am open to suggestions.