nf-core / bacass

Simple bacterial assembly and annotation pipeline

Home Page: https://nf-co.re/bacass

License: MIT License

Languages: HTML 2.68%, Python 12.18%, Nextflow 85.14%
Topics: nf-core, workflow, nextflow, pipeline, genome-assembly, bacterial-genomes, assembly, hybrid-assembly, nanopore, nanopore-sequencing

bacass's People

Contributors

angelovangel, apeltzer, bewt85, d4straub, daniel-vm, drpatelh, ewels, kevinmenden, mashehu, maxulysse, nf-core-bot, rivera10, xlinxlin


bacass's Issues

Local processes to nf-core/modules

Several processes use tools that may also be of interest in other pipelines and could be made available in nf-core/modules, such as:

  • modules/local/canu.nf
  • modules/local/dfast.nf
  • modules/local/medaka.nf
  • modules/local/miniasm.nf
  • modules/local/nanopolish.nf
  • modules/local/porechop.nf
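
Once ported, a module can be pulled back into the pipeline with nf-core/tools; a hypothetical example for one of the tools above:

    # install a published module from nf-core/modules into the pipeline
    nf-core modules install porechop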

Other processes use local modules because the nf-core modules were not applicable:

  • modules/local/pycoqc.nf -> the nf-core module only accepts fast5 summary files, not raw fast5 files
  • modules/local/nanoplot.nf -> the nf-core module accepts only .fastq.gz, not .fq.gz
  • modules/local/minimap_align.nf -> the input paths do not match the nf-core module; might be possible to fix in bacass
  • modules/local/unicycler.nf -> the nf-core module only allows short-read assembly (not hybrid or long-read)

Enhance MultiQC to collect stats from all supported modules

Issue description

The MultiQC module currently retrieves only QC and read-trimming stats.

Describe the solution you'd like

Gather the output files of all modules that MultiQC supports and let MultiQC parse their data to get a complete overview of the workflow (see the sketch below).
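
A minimal sketch of what this could look like in the workflow, using hypothetical channel and process names (FASTQC, QUAST, KRAKEN2, MULTIQC):

    // hypothetical channel/process names; mix per-module reports into one channel
    ch_multiqc_files = Channel.empty()
    ch_multiqc_files = ch_multiqc_files.mix(FASTQC.out.zip.collect { it[1] })
    ch_multiqc_files = ch_multiqc_files.mix(QUAST.out.tsv.collect { it[1] })
    ch_multiqc_files = ch_multiqc_files.mix(KRAKEN2.out.report.collect { it[1] })

    // hand every collected report to MultiQC in one go
    MULTIQC ( ch_multiqc_files.collect() )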

Unicycler version issue

Hi,

Thank you for this great workflow!

I encountered the following error from Unicycler running in hybrid mode:

Dependencies:
  Program         Version           Status
  spades.py       3.13.0            good
  racon           -                 good
  makeblastdb     2.5.0+            good
  tblastn         2.5.0+            good
  bowtie2-build   2.4.1             good
  bowtie2         2.4.1             good
  samtools        ?                 too old
  java            11.0.8-internal   good
  pilon           1.23              good
  bcftools                          not used

Error: Unspecified error with Unicycler dependencies

I think this is probably related to this issue.

It seems changing to another image fixes the issue. I used quay.io/biocontainers/unicycler:0.4.8--py37h13b99d1_3.
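
For anyone hitting the same problem, a minimal workaround sketch for a custom config, assuming the process in this release is simply named 'unicycler':

    process {
        withName: 'unicycler' {
            // pin the working biocontainer image (process name assumed)
            container = 'quay.io/biocontainers/unicycler:0.4.8--py37h13b99d1_3'
        }
    }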

Software versions:

  • nextflow: 21.04.1.5556
  • bacass: nfcore/bacass:1.1.1
  • runtime: docker
  • platform: google cloud instance with Ubuntu 20.04.2 LTS

Chenhao

skewer: command not found with Singularity profile

I am trying to run bacass v1.1.0, using nextflow version 20.04.1 and the Singularity profile. The run terminates with an error exit status (127):

Error executing process > 'trim_and_combine (ERR3219830)'

Caused by:
  Process `trim_and_combine (ERR3219830)` terminated with an error exit status (127)

Command executed:

  # loop over readunits in pairs per sample
     pairno=0
     echo "ERR3219830_R1.fastq.gz ERR3219830_R2.fastq.gz" | xargs -n2 | while read fq1 fq2; do
  skewer --quiet -t 8 -m pe -q 3 -n -z $fq1 $fq2;
     done
     cat $(ls *trimmed-pair1.fastq.gz | sort) >> ERR3219830_trm-cmb.R1.fastq.gz
     cat $(ls *trimmed-pair2.fastq.gz | sort) >> ERR3219830_trm-cmb.R2.fastq.gz

Command exit status:
  127

Command output:
  (empty)

Command error:
  .command.sh: line 5: skewer: command not found

The command I attempted was:

nextflow run nf-core/bacass --input bacass_short.tsv -profile singularity --kraken2db "~/db/minikraken2_v1_8GB_201904.tgz"

QUAST module not used in MultiQC

MultiQC/MultiQC#1351

So apparently it doesn't work with `${sample_id}_report.tsv` - maybe I should try `${sample_id}.report.tsv` instead and/or configure a custom name in the MultiQC config for the pipeline:

quast_config:
    fn: *_report.tsv

could do the trick maybe :-)

fastqc fails with Value Error

Hi all. I'm excited to get started with your workflow. However, I cannot yet run the fastqc step, at least on nextflow version 19.07.0.5106 / docker.

Here's a snippet of the output:

fastqc -t {task.cpus} -q 1_R1_001_trm-cmb.R1.fastq.gz 1_R1_001_trm-cmb.R2.fastq.gz
...
Value "{task.cpus}" invalid for option threads (number expected)

Looks like the actual value of task.cpus is not being substituted. Perhaps there is a typo in line 252 of main.nf.
The proposed change is:

-     fastqc -t {task.cpus} -q ${fq1} ${fq2}
+    fastqc -t ${task.cpus} -q ${fq1} ${fq2}

I can issue a PR, but this seems like a straightforward fix.
thanks!

Improvements in forked repository: KmerFinder, Estimation of Reference Genome, Custom Quast, and Custom MultiQC Reports

Description of feature

Overview

Hello! My colleagues and I have been actively working on enhancing the nf-core/bacass workflow to address lab-specific challenges in bacterial genome assembly. We would be happy to add these improvements to the main nf-core/bacass repository if you are interested.

Currently, these enhancements have been implemented in my local fork of nf-core/bacass on the buisciii-develop branch.

nextflow run main.nf \
        -profile singularity,test \
        --skip_kmerfinder false \
        --kmerfinderdb path/to/kmerfinder_db/bacteria \
        --ncbi_assembly_metadata path/to/ncbi_assembly_metadata/assembly_summary_bacteria.txt \
        --outdir ./results \
        -w ./work \
        -resume

Breaking down implementations:

1. Kmerfinder Subworkflow:

  • Added a local KmerFinder module for read quality control (QC) and purity assessment.
  • Developed a local module to compile KmerFinder results from all samples into a comprehensive CSV summary file.
  • Implemented a method to group input samples (*.fastq, *.fasta, and other files...) based on the reference genome estimated with KmerFinder.
  • Created a local module to identify the reference genome estimated with KmerFinder in the NCBI database and download this genome. This reference genome is then utilized to retrieve relevant metrics from QUAST, such as the percentage of genome fraction. This functionality is particularly valuable when input samples belong to different species, requiring more than one reference for a comprehensive by_reference_genome report.

2. Quast Assembly QC by Grouping Samples:

  • Modified Quast execution when KmerFinder is invoked. Now, Quast runs twice:
  • Initial 'general' Quast without reference genome files (*.fna, *.gff).
  • Subsequent 'by reference genome' Quast, producing a QUAST report that aggregates samples with their reference genome (estimated with KmerFinder).

3. Custom MultiQC Reports:

  • Incorporated a custom MultiQC module into the workflow.
  • Added multiqc_config.yml files for the short, long, and hybrid assembly modes (they take effect when KmerFinder is invoked; otherwise the standard MultiQC report is generated).
  • Upon invoking KmerFinder, a custom MultiQC HTML report is generated using the MULTIQC() module. This report consolidates metrics from KmerFinder, Quast, and other relevant sources, presenting them together in an overview table located in the first section of the report. See image:

(Screenshot: custom MultiQC report with the overview table, 2024-01-04.)

Footnote

If you think these improvements could be incorporated into nf-core/bacass, let me know so I can work on the test data and test profile.

Replace Prokka with Dfast

In reference to #11 - would it be an option to simply replace Prokka with Dfast?
https://github.com/nigyta/dfast_core

From my tests, it seems to outperform Prokka in terms of "loci annotated/named" - and it's also on bioconda and easily as fast as Prokka.

The issue with Prokka and Tbl2asn seems like it won't be fixable - and having the Docker image "expire" every so often is probably not an ideal solution moving forward.

Cheers,
Marc

Assembly polishing is not working in v2.0

Description of the bug

The assembly polishing step is not being performed in the workflow, either by default or with the --polish_method param.

Steps to reproduce

Steps to reproduce the behaviour:

  1. Command line: nextflow run nf-core/bacass --input 'sample_sheet.csv' --outdir /data/nihr/nanopore_sequencing/9_08_21/hybrid_assembly_output/ -profile docker --assembly_type hybrid -resume --kraken2db "https://genome-idx.s3.amazonaws.com/kraken/k2_standard_8gb_20210517.tar.gz"

Nextflow Installation

  • Version: -- 21.04.3

Container engine

  • Engine: Docker

(Screenshot of the run output, 2021-08-30.)

Regex problem

paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes

There seems to be an incorrect regex here. The regex above can be read as: match one or more / at the beginning, OR match one or more / followed by a literal $.

The regex doesn't remove whitespace and trailing slash(es), I believe. The correct regex should be replaceAll("\/+$| +", "").
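
A quick Groovy check of both variants (hypothetical input path); note that the current pattern also strips leading slashes, which breaks absolute paths:

    def p = ' /data/kraken2/'
    println p.trim().replaceAll('^[/]+|[/]+$', '')  // current:  'data/kraken2' (leading slash lost)
    println p.replaceAll('/+$| +', '')              // proposed: '/data/kraken2'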

List of ideas to improve assemblies

This is a collection of ideas that should be considered after the DSL2 conversion #56 is finished. The list is subject to change. Any ideas or discussions are welcome.

Preprocessing (check out nf-core/mag, any other examples out there?)

  • Filtlong to filter ONT by quality (e.g. >7)
  • Bowtie2 to remove Illumina PhiX reads
  • Nanolyse (alternatively Minimap2) to remove ONT Lambda reads
  • Add an option to down-sample reads, because sometimes this can actually improve assembly

Assemblers:

  • MEGAHIT (a5-miseq #23 , ...) to have alternative short read assembler
  • Trycycler to have better hybrid and long read assembly than Unicycler
  • Flye (Tulip, Redbean, Raven) to have more long read assemblers at hand
  • Pilon to polish Nanopore-derived contigs with Illumina reads (for long read assemblers)

Assembly QC:

  • BUSCO to check completeness and contamination of assemblies (and possibly bins)
  • MaxBin2 (or any other binner) to separate the assembly (cleanup if contaminated). In contrast to other binners, MaxBin2 outputs "Completeness, Genome size, GC content" for each bin it finds, which comes in very handy when judging whether there is real contamination.

Structural:

  • Use only the most polished assembly for Prokka & QUAST (currently the assemblies before polishing are used!)
  • By default, run all (or at least many) assemblers appropriate for a data set, including polishing (Medaka & Pilon). That allows easy comparison (with e.g. QUAST and BUSCO) of the performance of different assemblers and choosing the best assembly.

Defaults

  • In my opinion, --skip_kraken2 should either be removed (i.e. use --kraken2db to determine whether Kraken2 is run) or a simple default (small, fast, but helpful) value should be chosen for --kraken2db, e.g. "https://genome-idx.s3.amazonaws.com/kraken/16S_Greengenes13.5_20200326.tgz". This is a very small 16S database but should be sufficient to detect serious bacterial contamination (see the sketch below).
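
A minimal sketch of the proposed default in nextflow.config:

    params {
        // small 16S database as a fast but helpful default
        kraken2db = 'https://genome-idx.s3.amazonaws.com/kraken/16S_Greengenes13.5_20200326.tgz'
    }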

URGENT: pin nf-validation version

Description of the bug

To prevent breaking this pipeline in the near future, the nf-validation version should be pinned to version 1.1.3 like:

plugins {
    id 'nf-validation@1.1.3'
}

Command used and terminal output

No response

Relevant files

No response

System information

No response

Allow dragonflye to polish draft genome with short-reads

Description of feature

Dragonflye allows polishing the raw assembly with Illumina reads if provided (see the dragonflye docs). However, the current implementation of dragonflye in nf-core/bacass performs long-read assembly only.

Allow nf-core/bacass and the nf-core dragonflye module to polish the assembled genome with short reads when they are provided.
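
For reference, a sketch of the underlying dragonflye call (flags per the dragonflye docs; file names are placeholders):

    # --R1/--R2 trigger short-read polishing of the long-read assembly
    dragonflye --reads sample.long.fastq.gz \
        --R1 sample_R1.fastq.gz --R2 sample_R2.fastq.gz \
        --outdir assembly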

Add Zenodo DOI for release to main README on master

Would be good to add the Zenodo DOI for the release to the main README of the pipeline in order to make it citable. You will have to do this via a branch pushed to the repo in order to directly update master. See PR below for example and file changes:
nf-core/atacseq#38

See https://zenodo.org/record/2669429#.XVZ0bOhKhPY

Web-hooks are already set up for this repo so that a unique Zenodo DOI is generated every time a new version of the pipeline is released. Would be good to add this in after every release 👍

Add A5-miseq support

Hi,

thank you for providing this pipeline.

Would you consider providing A5-miseq support for short-reads-only mode?

I normally use SPAdes (or Unicycler in this case), but I've consistently been getting better results with A5-miseq when assembling a short-reads-only dataset.

e.g. compare these two assemblies
bacass - unicycler:
Total n: 181
Total seq: 5473999 bp
Avg. seq: 30243.09 bp
Median seq: 1476.00 bp
N 50: 143598 bp
Min seq: 110 bp
Max seq: 623519 bp

a5-miseq:
Total n: 78
Total seq: 5532586 bp
Avg. seq: 70930.59 bp
Median seq: 4987.50 bp
N 50: 278625 bp
Min seq: 623 bp
Max seq: 761326 bp

It is not a huge difference, but I believe it would be a good addition to the pipeline. I'd love to make a PR myself, but I'm still not confident enough with Groovy/nextflow scripting.

Thank you for any assistance you can provide,

V

quast: command not found

Because of the issue in this post (#24), I ran conda env create --prefix /home/ss/test_bacass/work/conda/nf-core-bacass-1.1-0-58ac097954559efb6ec2ce857847ed28 --file /home/ss/.nextflow/assets/nf-core/bacass/environment.yml to create the conda environment. Then I ran nextflow run nf-core/bacass --input bacass_short.csv --skip_kraken2 -profile conda and got another error message:

Error executing process > 'quast (ER064912)'

Caused by:
Process 'quast (ER064912)' terminated with an error exit status (127)

Command executed:
quast -t 2 -o ER064912_assembly_QC ER064912_assembly.fasta
quast -v > v_quast.txt

Command exit status:
127

Command output:
(empty)

Command error:
.command.sh: line 2: quast: command not found

quast is not in the environment.yml file - could it have been missed?

Update bakta to v1.8.2

Description of feature

After the latest Bakta update round, release v1.8.2 works with the db-light on Zenodo (oschwengers/bakta#241).

The module BAKTA_DBDOWNLOAD_RUN:BAKTA_BAKTADBDOWNLOAD() can be updated to v1.8.2.

Can we access kraken2db from S3 bucket?

I just started using bacass on AWS and tried to access the kraken2db from an S3 bucket. My question is: is it possible to access the kraken2db from an S3 bucket? I ran bacass with
--kraken2db 's3://kraken2_db/minikraken2_v2_8GB_201904_UPDATE' using Nextflow Tower.

I got the error message below.
Command error:
kraken2: database ("s3://kraken2_db/minikraken2_v2_8GB_201904_UPDATE") does not contain necessary file taxo.k2d

Thank you very much,
Piroon

sample sheet check reports wrong row item

In the checks for long reads and fast5 files, the wrong file is reported if there is an error:

exit 1, "ERROR: Please check input samplesheet -> Long FastQ file does not exist!\n${row.R1}"

Should read

 exit 1, "ERROR: Please check input samplesheet -> Long FastQ file does not exist!\n${row.LongFastQ}"

exit 1, "ERROR: Please check input samplesheet -> Fast5 file does not exist!\n${row.R1}"

Should read

exit 1, "ERROR: Please check input samplesheet -> Fast5 file does not exist!\n${row.Fast5}"

Working on updating the nf-core/tools template v2.9

Issue description

I'm planning to work on updating the pipeline to the nf-core/tools 2.9 template (#84) to enhance its functionality, improve documentation, and ensure it aligns with the latest best practices.

To that end I am going through this list:

  1. Review the current template codebase.
  2. Identify deprecated or obsolete components that need to be updated.
  3. Update dependencies to the latest compatible versions.
  4. Refactor code to adhere to nf-core coding standards.
  5. Test the updated template with various data and scenarios.
  6. Address any issues or bugs found during testing.
  7. Ensure backward compatibility with existing workflows.
  8. Linting

I aim to update this issue periodically to provide insights into the progress made. If anyone has expertise in template development or nf-core best practices, your input would be highly appreciated 🙏🏾.

(edit v3: updated task list)

Save trimmed parameter saying 'false' is not a valid choice

Description of the bug

I am running into an issue with the --save_trimmed_fail setting when running the bacass pipeline. Validation reports that 'false' is not a valid choice, yet at the same time it lists 'true' and 'false' as the only valid choices for this setting.

Command used and terminal output

nextflow run https://github.com/nf-core/bacass \
		 -name bacass-test-2 \
		 -params-file https://api.cloud.seqera.io/ephemeral/Mn6cNSXHpkJNUK5IW3P4HQ.json \
		 -with-tower \
		 -r 2.2.0 \
		 -profile docker


  workDir        : /seqcoast-aws/scratch/1kcilKqDAJaXkO
  projectDir     : /.nextflow/assets/nf-core/bacass
  userName       : root
  profile        : docker
  configFiles    :
Input/output options
  input          : https://raw.githubusercontent.com/nf-core/test-datasets/bacass/bacass_hybrid.tsv
  outdir         : s3://seqcoast-aws
Assembly parameters
  assembly_type  : hybrid
  canu_mode      : -nanopore
Annotation
  dfast_config   : /.nextflow/assets/nf-core/bacass/assets/test_config_dfast.py
Skipping Options
  skip_kraken2   : true
!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/bacass for your analysis please cite:
* The pipeline
  10.5281/zenodo.2669428
* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x
* Software dependencies
  https://github.com/nf-core/bacass/blob/master/CITATIONS.md
------------------------------------------------------
ERROR ~ ERROR: Validation of pipeline parameters failed!
-- Check 'nf-1kcilKqDAJaXkO.log' file for details
The following invalid input values have been detected:
* --save_trimmed_fail: 'false' is not a valid choice (Available choices: true, false)
-- Check script '.nextflow/assets/nf-core/bacass/./workflows/../subworkflows/local/utils_nfcore_bacass_pipeline/../../nf-core/utils_nfvalidation_plugin/main.nf' at line: 57 or see 'nf-1kcilKqDAJaXkO.log' file for more details

Relevant files

No response

System information

  • Nextflow version: 23.10.1
  • Hardware: AWS
  • Executor: awsbatch
  • Container engine: Docker
  • OS: Linux
  • Version of nf-core/bacass: 2.2.0

nf-validation on sample sheet

Sample sheet validation could be performed via nf-validation plugin.
In addition, tsv/csv documentation and parsing will need adjustments.

  • Fix README: a tsv file extension on the sample sheet will fail; change it to csv
  • Update the test data set (nf-core/test-datasets::bacass). Two options:
    - Switch from tab- to comma-separated content/structure in the sample sheets used for testing (I prefer this one).
    - Or change the file extensions of the sample sheets from csv to tsv.
  • Add nf-validation on the sample sheet (see the sketch below)
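
A minimal sketch of what the plugin provides, assuming --input points at the sample sheet and a matching assets/schema_input.json exists:

    include { fromSamplesheet } from 'plugin/nf-validation'

    // builds a validated channel from --input, driven by the JSON schema
    ch_input = Channel.fromSamplesheet('input')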

kraken2 failing in docker on ubuntu

The example run fails on my Ubuntu machine

nextflow run nf-core/bacass -r 1.1.0 -profile docker --input https://raw.githubusercontent.com/nf-core/test-datasets/bacass/bacass_short.csv --kraken2db ${PWD}/minikraken2 --max_memory 40.GB --max_cpus 10

with

Command exit status:
  2

Command output:
  (empty)

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  kraken2: database ("../minikraken2") does not contain necessary file taxo.k2d

However, I can use the minikraken2 db locally. Docker version is 19.03.5, Nextflow version is 19.10.0.5170. I'm only seeing this on Ubuntu 18.04: the test runs fine on my MacBook.

Any guidance?

skewer.nf problem with Docker container.

Description of the bug

Module compilation error
- file : /home/user/.nextflow/assets/nf-core/bacass/./workflows/../modules/local/skewer.nf
- cause: expecting '}', found ',' @ line 26, column 55.
                                 , emit: lo
                                 ^

1 error

Steps to reproduce

Steps to reproduce the behaviour:

  1. Command line: nextflow run nf-core/bacass -r 2.0.0 -name amr_sample5 -profile docker -params-file nf-params.json

nf-params.json:

{
    "input": "amr_sample2.csv",
    "kraken2db": "\/home\/user\/minikraken2_v2_8GB_201904_UPDATE"
}

Conversion to DSL2 & update of tools

All nf-core pipelines will be converted to Nextflow DSL2, and nf-core/bacass should not be left behind.
Additionally, this opportunity can be used to update all tools and progressively add more.

Currently, I am planning to start on this by the middle of September 2021 at the latest. The earliest start would be when nf-core/tools releases its DSL2 template, which might be soon.

I'll write here as soon as I start. If anybody else is planning to tackle this, or is already doing so, please share your plans here so that no redundant work is done.

Edit: #54 with support for DSL2 is open

Check for spaces in IDs in Sample Sheet

I got a "too many arguments" error for one of the commands when I had sample IDs with spaces in the sample sheet. I don't recall which command it was exactly but this should be an issue for any command. I'd suggest to either throw an error when the sample sheet is incorrect in this regard or to automatically get rid of spaces in the IDs.

Thanks!

skip-kraken2 requires Kraken2 DB arg

Running the following command with bacass 1.1.0:
nextflow run nf-core/bacass --input bacass_short.csv -profile singularity --skip-kraken2
results in the following:
Missing Kraken2 DB arg
One would need to specify the --kraken2db argument, which is a bit counter-intuitive given the role of --skip-kraken2 (a guard like the sketch below would avoid this).
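
A hypothetical sketch of such a guard, assuming params.skip_kraken2 and params.kraken2db:

    // only demand the database when Kraken2 will actually run
    if (!params.skip_kraken2 && !params.kraken2db) {
        exit 1, "Missing Kraken2 DB arg: supply --kraken2db or set --skip_kraken2"
    }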

Prokka version

Hi, yesterday this pipeline failed for us in the Prokka process. After we rewrote the environment.yml file to use Prokka 1.14.0, it now works perfectly.

kraken2db not being mounted for singularity

Hi there,

I downloaded the latest pipeline v1.1.1 and ran it offline with the following command:

nextflow run $PWD/nf-core-bacass-1.1.1/workflow/ \
  -profile singularity \
  --kraken2db /path/to/krakendb \
  --input $PWD/samples.csv \
  --assembly_type long \
  --skip_annotation \
  --skip_polish \
  --assembler canu \
  --canu_args 'stopOnLowCoverage=0 minInputCoverage=0'

The contents of the kraken2db folder are

library
taxonomy
hash.k2d
opts.k2d
seqid2taxid.map
taxo.k2d

The error I'm getting is

kraken2: database ("/path/to/krakendb") does not contain necessary file taxo.k2d

Here are the .command.run and .command.sh files (I added .txt at the end to be able to attach them)
command.run.txt
command.sh.txt

In my profile file, I have defined

singularity {
    enabled = true
    autoMounts = true
    cacheDir = "/path/to/images/singularity/nfcore/"
}

Thanks for looking into this. I'm just wondering whether I'm missing something.

Cheers,
Santiago

Failed to create conda environment (timeout?)

When I ran the command nextflow run nf-core/bacass --input bacass_short.csv --skip_kraken2 -profile conda using your test data, I got the error message:

Caused by:
Failed to create Conda environment
...
Status: 120

According to another post (nextflow-io/nextflow#1081), I think it may also be caused by a timeout.

Could you add the line conda { createTimeout = '1 h' } to your nextflow.config file so that I can try it again? Thanks!

Replace megacontainer with single biocontainers

This would resolve all the nasty errors with Python, updates, and other annoyances.

If anyone wants to help with this, what we need to do is:

  • Replace for each process the respective container
  • Add multi-tool containers https://github.com/BioContainers/multi-package-containers for some steps (e.g. nanopolish needs nanopolish, samtools and minimap2 in one container -> should be done via PR)
  • Update get_software_versions / add channels to export this in a suitable way to get things to that process

As this is only spare-time work from my side, I need some help here from people with the possibility to contribute @nf-core/core :-) This is also necessary to make this pipeline DSL2-compatible at some point!

Running error of exit status 255

Description of the bug

I ran the pipeline on my own data; the command line was "nextflow run nf-core/bacass -r 2.2.0 -profile docker --input ./minikrakendb/baccsamplesheet.tsv --kraken2db /home/aslangabriel/minikrakendb/k2_standard_08gb_20240112.tar.gz --max_cpus 12 --max_memory '125.GB' --outdir ./minikrakendb/results". My sample sheet looks like this:
(screenshot of the sample sheet)
and the error message was as follows:
"-[nf-core/bacass] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_BACASS:BACASS:FASTQ_TRIM_FASTP_FASTQC:FASTP (JSAHVC01)'

Caused by:
Process NFCORE_BACASS:BACASS:FASTQ_TRIM_FASTP_FASTQC:FASTP (JSAHVC01) terminated with an error exit status (255)

Command executed:

[ ! -f JSAHVC01_1.fastq.gz ] && ln -sf JSAHVC01_S1_R1.fastq.gz JSAHVC01_1.fastq.gz
[ ! -f JSAHVC01_2.fastq.gz ] && ln -sf JSAHVC01_S1_R2.fastq.gz JSAHVC01_2.fastq.gz
fastp
--in1 JSAHVC01_1.fastq.gz
--in2 JSAHVC01_2.fastq.gz
--out1 JSAHVC01_1.fastp.fastq.gz
--out2 JSAHVC01_2.fastp.fastq.gz
--json JSAHVC01.fastp.json
--html JSAHVC01.fastp.html



--thread 8
--detect_adapter_for_pe

2> >(tee JSAHVC01.fastp.log >&2)

cat <<-END_VERSIONS > versions.yml
"NFCORE_BACASS:BACASS:FASTQ_TRIM_FASTP_FASTQC:FASTP":
fastp: $(fastp --version 2>&1 | sed -e "s/fastp //g")
END_VERSIONS

Command exit status:
255

Command output:
(empty)

Command error:
ERROR: Failed to open file: JSAHVC01_1.fastq.gz

Work dir:
/home/aslangabriel/work/d0/6d8835e9fba25be3a34dcf33080fa2

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

", I have changed the file name per se the test example. please help me to fix it.

Command used and terminal output

No response

Relevant files

No response

System information

No response

Nanoplot expects png output in hybrid assembly mode

Description of the bug

Since the last update (2.0.0), the NanoPlot command expects a .png output, but this output format was removed between versions 1.1.1 and 2.0.0. I solved the issue by removing the png output line in modules/local/nanoplot.nf (see the sketch below).
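
A hypothetical sketch of that workaround (the exact output block in modules/local/nanoplot.nf is assumed):

    output:
    // path '*.png' , emit: png   // removed: no PNG output is produced here
    path '*.html'   , emit: html
    path '*.txt'    , emit: txt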

Steps to reproduce

Steps to reproduce the behaviour:

  1. Command line: NXF_VER=21.04.1 nextflow run nf-core/bacass -r 2.0.0 --input samplesheet-bacass.csv --kraken2db ~/db/kraken/k2_pluspf_16gb_20210127.tar --annotation_tool dfast --assembly_type hybrid -profile docker --outdir results
  2. See error:
    Error executing process > 'NFCORE_BACASS:BACASS:NANOPLOT (NS45)'

Caused by:
Missing output file(s) *.png expected by process NFCORE_BACASS:BACASS:NANOPLOT (NS45)

Command executed:

NanoPlot

-t 2
--fastq 2111-DK-l1-001.fastq
echo $(NanoPlot --version 2>&1) | sed 's/^.*NanoPlot //; s/ .*$//' > nanoplot.version.txt

Command exit status:
0

Command output:
(empty)

Work dir:
/home/sysgen/Desktop/ncct-projects/2201-Kostner-Assembly/work/b6/5ee3ce889e000a378cc359d94da18a

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

Expected behaviour

I would expect the NanoPlot process to produce various output formats for the length distribution and similar statistics of my long reads.

Log files

Have you provided the following extra information/files:

System

  • Hardware: HPC/Desktop
  • Executor: local
  • OS: Linux
  • Version 18.04

Nextflow Installation

  • Version:

Container engine

  • Engine: Docker
  • version: 20.10.11

Additional context

prokka doesn't work in docker nfcore/bacass:1.0.0

It seems that the default Docker image doesn't have a functional Prokka installation due to the expiry of tbl2asn. At the moment, I've gotten around this by specifying a different Docker image for the Prokka process, but it would be nice if this weren't necessary.

Single end Illumina processing?

Is there a way to process single-end reads with bacass? I am attempting to modify the config files, and it would be amazing if you already had a solution in hand.

Best,
Emily

Pilon issue (default_jvm_mem_opts)

Hi!

I would like to know whether there is a way to set default_jvm_mem_opts in Pilon (which is part of Unicycler) through the nextflow run command line, especially when using -profile conda or -profile docker. Otherwise there will always be a problem when dealing with large genomes.

[Question] how to reduce Docker image size

Hi @apeltzer

We have exchanged messages here: #28 (comment)

A related question:

I am relatively new to Nextflow and to using NF with Docker. As a general rule, how big can a Docker image be before it is considered too large? I was wondering if you could help and share some guidelines?

If I do a docker build with the environment.yml from nf-core/sarek or mag, the image size comes to nearly 2.4 GB. Is there a way to reduce the size of the image?

I tried some of these techniques too, but they did not help:

https://uwekorn.com/2021/03/03/deploying-conda-environments-in-docker-cheatsheet.html
https://jcristharif.com/conda-docker-tips.html

I also tried with Micromamba, but the size of the final Docker image is still pretty large:

https://github.com/mamba-org/micromamba-docker

Have you tried Mamba/Micromamba? I would be curious to hear your findings.

Thanks in advance.

Fix warning messages on process name

Description of feature

Two warning messages appear during the execution of tests in the nf-core/bacass GitHub CI.

WARN: A process with name 'MINIMAP2_CONSENSUS' is defined more than once in module script: /home/runner/work/bacass/bacass/./workflows/bacass.nf -- Make sure to not define the same function as process
WARN: A process with name 'MINIMAP2_POLISH' is defined more than once in module script: /home/runner/work/bacass/bacass/./workflows/bacass.nf -- Make sure to not define the same function as process
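
The usual DSL2 fix is to alias the module at include time so each instance gets a unique process name; a minimal sketch (module path assumed):

    // each alias becomes a distinct process, silencing the duplicate-name warning
    include { MINIMAP2_ALIGN as MINIMAP2_CONSENSUS } from '../modules/nf-core/modules/minimap2/align/main'
    include { MINIMAP2_ALIGN as MINIMAP2_POLISH    } from '../modules/nf-core/modules/minimap2/align/main'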

Removal of Prokka in favor of Bakta

Issue description

Prokka is no longer under active maintenance, and Bakta seems to be a reasonable replacement for genome annotation that incorporates several improvements.

Describe the solution you'd like

Remove Prokka from nf-core/bacass and add Bakta instead.

Additional notes

It seems that Bakta needs a database to perform the annotations. However, even the light version of its database is somewhat heavy and could slow down the testing process.

Another option is to keep Prokka and add Bakta as an additional tool for annotation.

I am open to suggestions.
