evotools / nf-lo

A Nextflow workflow to generate lift over files for any pair of genomes

Home Page: https://nf-lo.readthedocs.io/

License: MIT License

Dockerfile 2.25% Shell 10.04% Nextflow 87.50% q 0.20%
nextflow-workflow nextflow-dsl2 liftover genome bioinformatics

nf-lo's Introduction

nf-LO

Nextflow LiftOver pipeline

nf-LO is a nextflow workflow for generating genome alignment files compatible with the UCSC liftOver utility for converting genomic coordinates between assemblies. It can automatically pull genomes directly from NCBI or iGenomes (or the user can provide fasta files) and supports four different aligners (lastz, blat, minimap2, GSAlign). Together these provide solutions for both different-species (lastz and minimap2) as well as same-species alignments (blat and GSAlign), with both standard and ultra-fast algorithms from a source to a target genome. It comes with a series of presets, allowing alignments of genomes depending on their genomic distance (near, medium and far).

Updates

See CHANGELOG for more details.

UPDATE 05/2024: The --aligner minimap2 mode now runs in multiple processes, splitting the target genome into fragments of at least --tgtSize bases. Individual contigs and scaffolds are not fragmented, so each chunk contains entire sequences, unless the --mm2_lowmem option is provided. The old approach is still accessible through the --mm2_full_alignment option. The anaconda recipe with the dependencies has been updated, so please re-create the container where needed. With this optimization, a minimap2 liftover from the panTro6 to the hg38 genome runs in under half an hour on a 16-core Ryzen 7 8700G machine with 64 GB of RAM running Ubuntu.

UPDATE 14/12/2022: NCBI/iGenomes accessions now have to be provided in the --source/--target fields, with the appropriate --igenomes_source/--ncbi_source and --igenomes_target/--ncbi_target flags used as modifiers.

UPDATE 08/06/2022: fixed a bug in which lastz would not align small fragmented genomes, as well as small contigs, in the source assembly. Anyone interested in these small contigs should discard the previous version of nf-LO using nextflow drop evotools/nf-LO, and repeat the analyses.

UPDATE 07/06/2022: Added the possibility of providing customized conservation scores in the q-format via the --qscores flag.

Documentation

You can find more details on the usage of nf-LO in the readthedocs or in the wiki pages. These also include a simple step-by-step tutorial to run the analyses on your own genomes.

Installation

Nextflow first needs to be installed. To do so, follow the official Nextflow installation instructions, for example:

curl -s https://get.nextflow.io | bash

Note: Nextflow requires Java 8 or later.

The workflow natively supports four different ways to provide dependencies:

  1. Anaconda: this is the recommended and easiest way.
  2. Docker: you can create a docker image locally by using the Dockerfile and environment.yml files in the folder
  3. Singularity: you can create a singularity sif image locally by using the singularity.def and environment.yml files in the folder
  4. Local installation: we provide an assets/install.sh script that will take care of installing all the dependencies.

Anaconda is the easiest way to run almost all components of the workflow, the only exception being mafTools. This can be installed locally using the assets/install_maftools.sh script, which takes care of the installation on your Linux or macOS machine. The Singularity and Docker containers include mafTools. For further information on installing the dependencies, see the specific wiki page.

Quick start

After obtaining nextflow, to run the nf-LO workflow to align the S. cerevisiae and S. pombe genomes pulled directly from iGenomes simply type:

./nextflow run evotools/nf-LO --source EF2 --target sacCer3 --igenomes_source --igenomes_target --distance far --aligner minimap2 -profile conda -latest --outdir ./my_liftover_minimap2

This command will use anaconda to obtain the required dependencies and output a chain file compatible with the liftOver utility to the my_liftover_minimap2 folder. See below for more information on how to alternatively use docker, or to manually install the required tools.

Resource management

By default, nf-LO will attempt to use all available cores minus one and the total amount of memory reserved by the Java virtual machine. For most installations, this means that the workflow will use up to 3.GB of memory and almost all accessible cores. Users can customize these values if the memory and/or CPUs requested are not enough, or when running the workflow on a cluster system. To do so, specify the following settings:

  1. --max_cpus: maximum number of cpus requested and used by the tasks (e.g. --max_cpus 4 will use at most 4 cpus for a single job)
  2. --max_time: maximum time to use for a single job (e.g. --max_time 12.h will run a task for at most 12 hours)
  3. --max_memory: maximum memory used by a single job (e.g. --max_memory 16.GB will use at most 16 GB of ram for a job)
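
As a minimal sketch of how these three options fit together, the helper below assembles them into one argument string. The function name nflo_resource_flags is hypothetical and only serves the illustration; the three --max_* options and their value formats are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper (not part of nf-LO): prints the resource options
# described above. Arguments: CPUs per task, memory in GB per task,
# hours per task.
nflo_resource_flags() {
    cpus=$1
    mem_gb=$2
    hours=$3
    echo "--max_cpus $cpus --max_memory $mem_gb.GB --max_time $hours.h"
}

# Example: cap every task at 4 CPUs, 16 GB of RAM and 12 hours.
nflo_resource_flags 4 16 12
```

The printed string can be appended to a nextflow run evotools/nf-LO command line.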

Profiles

nf-LO comes with a series of pre-defined profiles:

  • standard: this profile runs all dependencies using anaconda
  • local: runs using local executables instead of containerized/conda dependencies (see manual installation for further details)
  • conda: runs the dependencies within conda
  • uge: runs using UGE scheduling system
  • sge: runs using SGE scheduling system
  • Additional profiles: see additional profiles supported here

Inputs

There are three different ways to specify the genomes to align. Note that in each case the source genome is the genome of origin, from which you wish to lift the positions, and the target genome is the genome to which you wish to lift them. We recommend using soft-masked genomes to reduce the computation time for aligners such as lastz.

Custom fasta

The source and target genomes can be specified as local or remote (un)compressed fasta files using the --source and --target flags.

Download from NCBI

nf-LO can download fasta files directly from NCBI using the datasets API. Users provide a GCA/GCF code in the --source/--target field, and add the --ncbi_source and --ncbi_target flags as follows:

nextflow run evotools/nf-LO --source "GCF_001549955.1" --target "GCF_011751205.1" --ncbi_source --ncbi_target  -profile conda 

Download from iGenomes

nf-LO can also download genomes from the iGenomes site. Users provide an iGenomes genome identifier in the --source/--target field, and add the --igenomes_source and --igenomes_target flags as follows:

nextflow run evotools/nf-LO --source "equCab2" --target "dm6" --igenomes_source --igenomes_target -profile conda

Note it is possible to mix source and target flags. For example using --igenomes_source with --ncbi_target.
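
The three input modes differ only in the modifier flag that accompanies --source or --target. The sketch below makes that pattern explicit. The helper name nflo_genome_flags and its origin keywords (file, ncbi, igenomes) are assumptions made for this illustration; the flags it prints are the ones described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of nf-LO): builds the genome-selection
# options for one side of the alignment.
#   role:   "source" or "target"
#   origin: "file", "ncbi" or "igenomes" (keywords invented for this sketch)
#   id:     fasta path/URL, GCA/GCF accession or iGenomes identifier
nflo_genome_flags() {
    role=$1
    origin=$2
    id=$3
    case "$origin" in
        file)     echo "--$role $id" ;;
        ncbi)     echo "--$role $id --ncbi_$role" ;;
        igenomes) echo "--$role $id --igenomes_$role" ;;
        *)        echo "unknown origin: $origin" >&2; return 1 ;;
    esac
}

# Mixed example: source from iGenomes, target from NCBI.
echo "nextflow run evotools/nf-LO $(nflo_genome_flags source igenomes equCab2) $(nflo_genome_flags target ncbi GCF_011751205.1) -profile conda"
```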

Customize the run

The workflow will provide some custom configuration for the different algorithms and distances. NOTE: the alignment stage heavily affects the results of the chaining process, so we strongly recommend to perform different tests with different configurations, including custom ones. To see the presets available and how to fine-tune the pipeline go to our Alignments wiki page. The chain/net generation can also be fine-tuned to achieve better results (see Chain/Netting).

UPDATE 07/06/2022: it is now possible to specify customized conservation scores as q files (see here for examples) using the --qscores option and providing the correct input file.

Resources

If you're running the workflow on a local workstation, a single node or a local server, we recommend defining the maximum number of cores and amount of memory for each job. You can set these using --max_cpus NCPU and --max_memory 'MEM.GB', where NCPU is the maximum number of CPUs per task and MEM is the maximum amount of memory (in GB) for a single task.

Example

To test the pipeline locally, simply run:

nextflow run evotools/nf-LO -profile test,conda

This will download and run the pipeline on the two toy genomes provided and generate liftover files. If you have all dependencies installed locally you can omit conda from the profile configuration.

Alternatively, you can run it on your own genomes using a command like this:

nextflow run evotools/nf-LO \
    --source genome1 \
    --target genome2 \
    --annotation myfile.gff \
    --annotation_format gff \
    --distance near \
    --aligner lastz \
    --tgtSize 10000000 \
    --tgtOvlp 100000 \
    --srcSize 20000000 \
    --liftover_algorithm crossmap \
    --outdir ./my_liftover \
    --publish_dir_mode copy \
    --max_cpus 8 \
    --max_memory 32.GB \
    -profile conda 

This analysis will run using genome1 and genome2 as source and target, respectively. The source genome will be fragmented in chunks of 20Mb, whereas the target will be fragmented in 10Mb chunks overlapping 100Kb. It will use lastz as the aligner using the preset for closely related genomes (near). The output files will be copied into the folder my_liftover.
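
To get a feel for how chunk sizes translate into workload, the following sketch estimates the number of chunks by ceiling division. This is a rough approximation under stated assumptions: it ignores the overlap set by --tgtOvlp and any grouping of small sequences, the genome sizes are illustrative values not taken from this page, and the job count assumes every source chunk is paired with every target chunk. The splitting actually performed by nf-LO may differ.

```shell
#!/bin/sh
# Rough estimate only: ceiling division of genome length by chunk size.
chunk_count() {
    genome_bp=$1
    chunk_bp=$2
    echo $(( (genome_bp + chunk_bp - 1) / chunk_bp ))
}

# Illustrative sizes (assumptions): a 3 Gb source in 20 Mb chunks
# and a 3 Gb target in 10 Mb chunks.
src_chunks=$(chunk_count 3000000000 20000000)
tgt_chunks=$(chunk_count 3000000000 10000000)
echo "source chunks: $src_chunks"
echo "target chunks: $tgt_chunks"
echo "alignment jobs if all pairs are aligned: $(( src_chunks * tgt_chunks ))"
```

Larger chunks mean fewer but longer-running alignment jobs, which is the trade-off to keep in mind when tuning --srcSize and --tgtSize.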

Citing nf-LO

To cite nf-LO, please refer to:

nf-LO: A scalable, containerised workflow for genome-to-genome lift over
Andrea Talenti, James Prendergast
Genome Biology and Evolution, 2021, evab183, https://doi.org/10.1093/gbe/evab183

References

Adaptive seeds tame genomic sequence comparison. Kiełbasa SM, Wan R, Sato K, Horton P, Frith MC. Genome Res. 2011 21(3):487-93; http://dx.doi.org/10.1101/gr.113985.110

Harris, R.S. (2007) Improved pairwise alignment of genomic DNA. Ph.D. Thesis, The Pennsylvania State University

Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34:3094-3100. http://dx.doi.org/10.1093/bioinformatics/bty191

Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64

Zhao, H., Sun, Z., Wang, J., Huang, H., Kocher, J.-P., & Wang, L. (2013). CrossMap: a versatile tool for coordinate conversion between genome assemblies. Bioinformatics (Oxford, England), btt730

Lin, HN., Hsu, WL. GSAlign: an efficient sequence alignment tool for intra-species genomes. BMC Genomics 21, 182 (2020). https://doi.org/10.1186/s12864-020-6569-1

nf-lo's People

Contributors

bunop, prenderj, renzotale88

nf-lo's Issues

Too large of a bash script for slurm in chainMerge

Hi,
I'm running a job on two mammalian genomes.

nextflow run ../bosTauTomBal/nf-LO --source ../bosTau6.fa --target /project/mchaisso_100/projects/Whales/mPhosSin1.pri/assembly.orig.fasta --max-cpus 16 --max_memory 64.GB -with-trace -with-report -resume -profile slurm,conda

At the step:
[54/e0d255] process > ALIGNER:chainMerge (chainmerge) [100%] 3 of 3, failed: 3, retries: 2
[- ] process > ALIGNER:chainNet -
[- ] process > ALIGNER:netSynt -
[- ] process > ALIGNER:chainsubset -
[- ] process > ALIGNER:chain2maf -
[- ] process > ALIGNER:name_maf_seq -
[- ] process > ALIGNER:mafstats -
[9d/5d42ff] NOTE: Error submitting process 'ALIGNER:chainMerge (chainmerge)' for execution -- Execution is retried (1)
[55/e09c1b] NOTE: Error submitting process 'ALIGNER:chainMerge (chainmerge)' for execution -- Execution is retried (2)
[54/e0d255] NOTE: Error submitting process 'ALIGNER:chainMerge (chainmerge)' for execution -- Error is ignored

slurm is not able to submit the command:
sbatch $PWD/work/54/e0d255bdfcaa440fbfe84e1e09cee3/.command.run
sbatch: error: Batch job submission failed: Pathname of a file, directory or other parameter too long

The problem is the file is too large (5.3M):

-rw-rw---- 1 mchaisso mchaisso_100 5.3M Dec 29 19:09 work/54/e0d255bdfcaa440fbfe84e1e09cee3/.command.run

I think the default slurm max script is 4.5 M, and I don't have permissions to increase this. The problem is in the zillions of rm and ln lines in nxf_stage(). Ideally nextflow could do the work of creating a script that runs the rm and ln's, and to submit that.
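
A quick way to find which job wrappers exceed a scheduler's script size limit is sketched below. The helper name script_too_large is hypothetical, and the 4 MiB threshold is an assumption for illustration; the limit in force on a given cluster should be checked in its Slurm configuration.

```shell
#!/bin/sh
# Hypothetical check: succeeds when a job script is larger than limit_bytes.
script_too_large() {
    script=$1
    limit_bytes=$2
    # Arithmetic expansion strips any padding that wc adds on some systems.
    size=$(( $(wc -c < "$script") ))
    [ "$size" -gt "$limit_bytes" ]
}

# Example: list Nextflow job wrappers above 4 MiB in ./work, if present.
# The 4 MiB value is an assumption, not a value confirmed by this page.
limit=$(( 4 * 1024 * 1024 ))
if [ -d work ]; then
    find work -name .command.run | while read -r f; do
        if script_too_large "$f" "$limit"; then
            echo "too large: $f"
        fi
    done
fi
```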

grouptgt error

[0f/a27c7f] NOTE: Process PREPROC:grouptgt (grouptgt) terminated with an error exit status (127) -- Execution is retried (1)
[6c/14fce7] NOTE: Process PREPROC:grouptgt (grouptgt) terminated with an error exit status (127) -- Execution is retried (2)
[7e/a18dee] NOTE: Process PREPROC:grouptgt (grouptgt) terminated with an error exit status (127) -- Error is ignored
WARN: Graphviz is required to render the execution DAG in the given format -- See http://www.graphviz.org for more info.
Completed at: 20-Nov-2023 20:40:11

Having trouble in test nf-LO using conda

I ran this in the terminal, in the conda env nf-LO:
./nextflow run evotools/nf-LO -profile test,conda

and returns the following info:
N E X T F L O W ~ version 22.10.0
Launching https://github.com/evotools/nf-LO [sick_kimura] DSL2 - revision: 5583780 [main]

======================================

  Nextflow LiftOver v 1.7.0

======================================
source : https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/549/955/GCF_001549955.1_ASM154995v1/GCF_001549955.1_ASM154995v1_genomic.fna.gz
target : https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/011/751/205/GCF_011751205.1_ASM1175120v1/GCF_011751205.1_ASM1175120v1_genomic.fna.gz
aligner : lastz
distance : medium
custom align : false
custom chain : false
source chunk : 1000000
source overlap : 0
target chunk : 2000000
target overlap : 100000
output folder : /media/ggj/WHY/package/nextflow/outputs
liftover name : liftover
annot : https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/549/955/GCF_001549955.1_ASM154995v1/GCF_001549955.1_ASM154995v1_genomic.gff.gz
annot type : gff
liftover meth. : liftover
igenomes_base : s3://ngi-igenomes/igenomes/
igenomes_ignore : false
mamba : false
no_maf : false
skip netsynt : false
max cpu : 48
max mem : 4.GB
max rt : 240.h
Using CrossMap
[- ] process > PREPROC:src2bit -
[- ] process > PREPROC:src2bit -
[- ] process > PREPROC:tgt2bit -
executor > local (1)
[1d/9a29d4] process > PREPROC:src2bit (src2Bit) [ 0%] 0 of 1
executor > local (8)
[f5/18c1ae] process > PREPROC:src2bit (src2Bit) [100%] 3 of 3, failed: 3, retries: 2 ✔
[b2/334e03] process > PREPROC:tgt2bit (tgt2Bit) [100%] 1 of 1, failed: 1, retries: 1
[c8/9af8a0] process > PREPROC:splitsrc (splitsrc) [100%] 3 of 3, failed: 3, retries: 2 ✔
[- ] process > PREPROC:groupsrc -
[5a/486c00] process > PREPROC:splittgt (splittgt) [ 0%] 0 of 1
[- ] process > PREPROC:grouptgt -
[- ] process > PREPROC:pairs -
[- ] process > ALIGNER:lastz -
executor > local (12)
[f5/18c1ae] process > PREPROC:src2bit (src2Bit) [100%] 3 of 3, failed: 3, retries: 2 ✔
[b0/1f643c] process > PREPROC:tgt2bit (tgt2Bit) [100%] 3 of 3, failed: 3, retries: 2 ✔
[c8/9af8a0] process > PREPROC:splitsrc (splitsrc) [100%] 3 of 3, failed: 3, retries: 2 ✔
[- ] process > PREPROC:groupsrc -
[b7/e1aa20] process > PREPROC:splittgt (splittgt) [100%] 3 of 3, failed: 3, retries: 2 ✔
[- ] process > PREPROC:grouptgt -
[- ] process > PREPROC:pairs -
[- ] process > ALIGNER:lastz -
[- ] process > ALIGNER:axtChain -
[- ] process > ALIGNER:chainMerge -
[- ] process > ALIGNER:chainNet -
[- ] process > ALIGNER:netSynt -
[- ] process > ALIGNER:chainsubset -
[- ] process > ALIGNER:chain2maf -
[- ] process > ALIGNER:name_maf_seq -
executor > local (12)
[f5/18c1ae] process > PREPROC:src2bit (src2Bit) [100%] 3 of 3, failed: 3, retries: 2 ✔
[b0/1f643c] process > PREPROC:tgt2bit (tgt2Bit) [100%] 3 of 3, failed: 3, retries: 2 ✔
[c8/9af8a0] process > PREPROC:splitsrc (splitsrc) [100%] 3 of 3, failed: 3, retries: 2 ✔
[- ] process > PREPROC:groupsrc -
[b7/e1aa20] process > PREPROC:splittgt (splittgt) [100%] 3 of 3, failed: 3, retries: 2 ✔
[- ] process > PREPROC:grouptgt -
[- ] process > PREPROC:pairs -
[- ] process > ALIGNER:lastz -
[- ] process > ALIGNER:axtChain -
[- ] process > ALIGNER:chainMerge -
[- ] process > ALIGNER:chainNet -
[- ] process > ALIGNER:netSynt -
[- ] process > ALIGNER:chainsubset -
[- ] process > ALIGNER:chain2maf -
[- ] process > ALIGNER:name_maf_seq -
[- ] process > ALIGNER:mafstats -
[- ] process > LIFTOVER:lifter -
[- ] process > LIFTOVER:features_stats -
[- ] process > make_report -
Staging foreign file: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/549/955/GCF_001549955.1_ASM154995v1/GCF_001549955.1_ASM154995v1_genomic.fna.gz

[1d/9a29d4] NOTE: Process PREPROC:src2bit (src2Bit) terminated with an error exit status (1) -- Execution is retried (1)
[d5/a639d1] NOTE: Process PREPROC:splitsrc (splitsrc) terminated with an error exit status (1) -- Execution is retried (1)
[49/3aef03] NOTE: Process PREPROC:src2bit (src2Bit) terminated with an error exit status (1) -- Execution is retried (2)
[81/d5e694] NOTE: Process PREPROC:splitsrc (splitsrc) terminated with an error exit status (1) -- Execution is retried (2)
[f5/18c1ae] NOTE: Process PREPROC:src2bit (src2Bit) terminated with an error exit status (1) -- Error is ignored
[c8/9af8a0] NOTE: Process PREPROC:splitsrc (splitsrc) terminated with an error exit status (1) -- Error is ignored
[b2/334e03] NOTE: Process PREPROC:tgt2bit (tgt2Bit) terminated with an error exit status (1) -- Execution is retried (1)
[5a/486c00] NOTE: Process PREPROC:splittgt (splittgt) terminated with an error exit status (1) -- Execution is retried (1)
[d3/70a534] NOTE: Process PREPROC:tgt2bit (tgt2Bit) terminated with an error exit status (1) -- Execution is retried (2)
[e0/caf685] NOTE: Process PREPROC:splittgt (splittgt) terminated with an error exit status (1) -- Execution is retried (2)
[b0/1f643c] NOTE: Process PREPROC:tgt2bit (tgt2Bit) terminated with an error exit status (1) -- Error is ignored
[b7/e1aa20] NOTE: Process PREPROC:splittgt (splittgt) terminated with an error exit status (1) -- Error is ignored

Looking forward to your reply! Thanks!

Issue regarding 'nextflow run evotools/nf-LO'

Hello, thank you for developing this tool. I tried to run the following command:
nextflow run evotools/nf-LO --source /home/sealight1999/references/GRCH38/GRCh38_no_alt.fna --target /home/sealight1999/references/CHM13/ chm13v2.0_1.fa --aligner minimap2 –outdir /home/sealight1999/references -profile local
but encountered the following warnings:
[c3/9fda46] NOTE: Process PREPROC:tgt2bit (tgt2Bit) terminated with an error exit status (127) -- Execution is retried (1)
[ae/47e4dc] NOTE: Process PREPROC:splitsrc (splitsrc) terminated with an error exit status (127) -- Execution is retried (1)
[e7/fdb913] NOTE: Process PREPROC:src2bit (src2Bit) terminated with an error exit status (127) -- Execution is retried (1)
[fd/3eadf0] NOTE: Process PREPROC:splittgt (splittgt) terminated with an error exit status (127) -- Execution is retried (1)
[03/acbfe7] NOTE: Process PREPROC:tgt2bit (tgt2Bit) terminated with an error exit status (127) -- Execution is retried (2)
[8e/ecf45f] NOTE: Process PREPROC:splitsrc (splitsrc) terminated with an error exit status (127) -- Execution is retried (2)
[22/e11aa5] NOTE: Process PREPROC:src2bit (src2Bit) terminated with an error exit status (127) -- Execution is retried (2)
[62/97c512] NOTE: Process PREPROC:splittgt (splittgt) terminated with an error exit status (127) -- Execution is retried (2)
[83/bd21fa] NOTE: Process PREPROC:tgt2bit (tgt2Bit) terminated with an error exit status (127) -- Error is ignored
[3f/930365] NOTE: Process PREPROC:splitsrc (splitsrc) terminated with an error exit status (127) -- Error is ignored
[9e/fdca06] NOTE: Process PREPROC:src2bit (src2Bit) terminated with an error exit status (127) -- Error is ignored
[77/94ba10] NOTE: Process PREPROC:splittgt (splittgt) terminated with an error exit status (127) -- Error is ignored
WARN: Graphviz is required to render the execution DAG in the given format -- See http://www.graphviz.org for more info.

What could be wrong? Thank you!
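
In POSIX shells, exit status 127 means that a command was not found. With -profile local the tools are expected to be installed on the machine, so one thing worth ruling out is a missing executable on PATH. The sketch below is a generic check; the helper name missing_tools is hypothetical, and the tool names are only those that appear in commands elsewhere on this page, so the full list needed by the local profile may be longer.

```shell
#!/bin/sh
# Prints each named command that cannot be found on PATH.
# Returns non-zero if at least one is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return $rc
}

# Tool names taken from this page; the complete dependency list may differ.
missing_tools lastz minimap2 lavToPsl liftUp || echo "the tools listed above were not found on PATH"
```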

Failed in ALIGNER:lastz

Hello! I am using nf-LO to build liftover chains for Facs6toFacs5, but I met the following errors. Does it mean the disk space is not enough? (I am not sure.) Looking forward to your reply. Thanks!
BTW, I use a conda environment and the example data runs successfully (-profile test,conda).

Here is the error info:
executor > local (50)
[37/3b6a75] process > PREPROC:src2bit (src2Bit) [100%] 1 of 1 ✔
[08/9a11ef] process > PREPROC:tgt2bit (tgt2Bit) [100%] 1 of 1 ✔
[88/1c52c2] process > PREPROC:splitsrc (splitsrc) [100%] 1 of 1 ✔
[6f/def6e1] process > PREPROC:groupsrc (groupsrc) [100%] 1 of 1 ✔
[ba/1794c8] process > PREPROC:splittgt (splittgt) [100%] 1 of 1 ✔
[17/acab5e] process > PREPROC:grouptgt (grouptgt) [100%] 1 of 1 ✔
[87/a62aee] process > PREPROC:pairs (mkpairs) [100%] 1 of 1 ✔
[90/f5326a] process > ALIGNER:lastz (lastz_near.2... [ 3%] 10 of 265, failed...
[6a/8f2413] process > ALIGNER:axtChain (axtchain_n) [100%] 7 of 7
[- ] process > ALIGNER:chainMerge -
[- ] process > ALIGNER:chainNet -
[- ] process > ALIGNER:chainsubset -
[- ] process > ALIGNER:chain2maf -
[- ] process > ALIGNER:name_maf_seq -
[- ] process > ALIGNER:mafstats -
[db/556bfd] NOTE: Process ALIGNER:lastz (lastz_near.11.tgt8) failed -- Execution is retried (1)
[ac/e063b1] NOTE: Process ALIGNER:lastz (lastz_near.6.tgt10) failed -- Execution is retried (1)
[26/33681b] NOTE: Process ALIGNER:lastz (lastz_near.6.tgt1) failed -- Execution is retried (1)

executor > local (54)
[37/3b6a75] process > PREPROC:src2bit (src2Bit) [100%] 1 of 1 ✔
[08/9a11ef] process > PREPROC:tgt2bit (tgt2Bit) [100%] 1 of 1 ✔
[88/1c52c2] process > PREPROC:splitsrc (splitsrc) [100%] 1 of 1 ✔
[6f/def6e1] process > PREPROC:groupsrc (groupsrc) [100%] 1 of 1 ✔
[ba/1794c8] process > PREPROC:splittgt (splittgt) [100%] 1 of 1 ✔
[17/acab5e] process > PREPROC:grouptgt (grouptgt) [100%] 1 of 1 ✔
[87/a62aee] process > PREPROC:pairs (mkpairs) [100%] 1 of 1 ✔
[a6/1e196a] process > ALIGNER:lastz (lastz_near.9... [ 5%] 14 of 270, failed...
[6a/8f2413] process > ALIGNER:axtChain (axtchain_n) [100%] 7 of 7
[- ] process > ALIGNER:chainMerge -
[- ] process > ALIGNER:chainNet -
[- ] process > ALIGNER:chainsubset -
[- ] process > ALIGNER:chain2maf -
[- ] process > ALIGNER:name_maf_seq -
[- ] process > ALIGNER:mafstats -
[db/556bfd] NOTE: Process ALIGNER:lastz (lastz_near.11.tgt8) failed -- Execution is retried (1)
[ac/e063b1] NOTE: Process ALIGNER:lastz (lastz_near.6.tgt10) failed -- Execution is retried (1)
[26/33681b] NOTE: Process ALIGNER:lastz (lastz_near.6.tgt1) failed -- Execution is retried (1)
[13/b53784] NOTE: Process ALIGNER:lastz (lastz_near.7.tgt21) failed -- Execution is retried (1)
[4e/94bcd1] NOTE: Process ALIGNER:lastz (lastz_near.11.tgt22) failed -- Execution is retried (1)
[37/566b5a] NOTE: Process ALIGNER:lastz (lastz_near.11.tgt4) failed -- Execution is retried (1)
[b9/3c93d9] NOTE: Process ALIGNER:lastz (lastz_near.11.tgt1) failed -- Execution is retried (1)

ALIGNER:chainMerge (chainmerge) FAILED

Dear nf-LO developers,

I'm trying to align two local genome assemblies using nf-LO, and get the following error at the bottom of the 'reports/execution_trace.txt' file. Could you please help me figure out why?

701855  fc/df1139       144312  ALIGNER:axtChain (axtchain_m)   COMPLETED       0       2023-02-06 05:52:10.789 926ms   733ms   101.9%  133.3 MB        415.4 MB      90 MB   3.6 KB
701856  ca/843384       160553  ALIGNER:chainMerge (chainmerge) FAILED  139     2023-02-06 06:22:34.216 21.1s   21s     -       -       -       -       -
701857  dd/25d4a1       160837  ALIGNER:chainMerge (chainmerge) FAILED  139     2023-02-06 06:23:02.349 21.1s   21.1s   -       -       -       -       -
701858  76/995157       161063  ALIGNER:chainMerge (chainmerge) FAILED  139     2023-02-06 06:23:30.339 21.5s   21.5s   -       -       -       -       -

The code I used was:

 nextflow run evotools/nf-LO --source ../GCA_021130815.1_PanTigT.MC.v3_genomic.fna.gz --target ~/data/hg38.fa.masked.gz --outdir ../NFLOW


When the code failed, it had created three directories: genome2bit, singlechains, and reports. 

Please let me know if you need more information from me,
Shweta
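
For reading the trace above: by shell convention, an exit status greater than 128 indicates that the process was terminated by a signal whose number is the status minus 128. The sketch below applies that arithmetic to the 139 reported for chainMerge; the function name signal_from_status is hypothetical. On Linux, signal 11 is SIGSEGV (segmentation fault). This only decodes the status and does not establish why the process crashed.

```shell
#!/bin/sh
# Exit statuses above 128 conventionally mean
# "terminated by signal number (status - 128)"; prints 0 otherwise.
signal_from_status() {
    code=$1
    if [ "$code" -gt 128 ]; then
        echo $(( code - 128 ))
    else
        echo 0
    fi
}

sig=$(signal_from_status 139)
echo "exit status 139 -> signal $sig"
```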

environment.yml error even while running via docker

Hello,

I am having similar issues with creating environment.yml during the workflow run. I am running it using docker using the following command,

nextflow run evotools/nf-LO -with-docker nflo:latest --source source.fa --target target.fa --distance near --aligner lastz --outdir Output_nfLo --profile conda

But, it runs all the way to make_report, and then stalls while trying to create the conda environment, and complains there are lot of versions of the packages in conflict. I thought running it via docker would not cause any issue with dependency.

Any idea what I may be doing wrong ?
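
One detail worth ruling out in the command above: Nextflow's own run options take a single dash (-profile, -resume, -with-docker), while double-dash arguments are passed to the pipeline as parameters. Written as --profile conda, the argument is therefore treated as a pipeline parameter rather than a profile selection. The sketch below flags such cases. The function name lint_nextflow_args is hypothetical, and the option list is limited to options used on this page. Whether this explains the stalled environment creation is not established here.

```shell
#!/bin/sh
# Prints a hint for every argument that looks like a Nextflow run option
# written with two dashes. Only options seen on this page are checked.
lint_nextflow_args() {
    for arg in "$@"; do
        case "$arg" in
            --profile|--resume|--latest|--with-docker|--with-trace|--with-report)
                echo "use -${arg#--} instead of $arg" ;;
        esac
    done
}

lint_nextflow_args run evotools/nf-LO -with-docker nflo:latest --source source.fa --profile conda
```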

having trouble installing nf-lo manually

I've tried to install nf-lo manually with this command:

wget https://raw.githubusercontent.com/evotools/nf-LO/main/install.sh

but it returned a 404 error. The log is displayed below:

--2023-06-07 16:14:11-- https://raw.githubusercontent.com/evotools/nf-LO/main/install.sh
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.111.133, 185.199.108.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2023-06-07 16:14:11 ERROR 404: Not Found.

I'm looking forward to a reply. thanks.

error exit status (255)

Hi,

I am getting this general error and can't figure out why.
This is the output I get.
Appreciate your help
Thank you

pipeline commands

Is there an option for nf-LO to simply generate the pipeline commands into a text file, so that I can run every step one-by-one.

nf-LO keeps randomly crashing, and it has been a highly frustrating experience.
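
For tasks that have already been launched, Nextflow writes the script of each task to a .command.sh file inside that task's work directory. The sketch below collects those scripts into a single text file. It is a workaround for inspecting commands after a run, not a dry-run mode; the function name dump_task_scripts and the output file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: concatenates every task script found under a
# Nextflow work directory into one text file, each preceded by its path.
dump_task_scripts() {
    workdir=$1
    outfile=$2
    : > "$outfile"
    find "$workdir" -name .command.sh | sort | while read -r script; do
        {
            echo "### $script"
            cat "$script"
            echo
        } >> "$outfile"
    done
}

# Example: run from the launch directory after a pipeline execution.
if [ -d work ]; then
    dump_task_scripts work pipeline_commands.txt
fi
```

The scripts appear in path order, not execution order, and only tasks that were actually submitted are included.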

The current r-environment.yml never resolves

Hello.

Thank you for nf-LO.

I am running it with --annotation and --annotation_format. I am using the conda profile.

It runs all the way to make_report, and then stalls while trying to create the conda environment.

I have tried to run conda env create directly to troubleshoot. Essentially, it gets stuck in the dependency solver section. With the newer libmamba solver, it appears that some of the package versions conflict. I am wondering if the versions could be relaxed or removed altogether.

Thanks again.

PS: It is the dependency hell problem. 😣

Process requirement exceed available CPUs -- req: 2; avail: 1

Hi, I am trying to use the nf-LO pipeline and I am running into some errors. On my server I specified that the pipeline should use 8 CPUs and 32 GB of RAM, but it fails with the error "Process requirement exceed available CPUs -- req: 2; avail: 1". I was wondering why that is, given that I am requesting 8 CPUs in both my LSF command and the nextflow command.

nextflow run evotools/nf-LO \
--source /scratch1/fs1/allegra.petti/khan.saad/Glass_synapse/reference/human_g1k_v37.fasta \
--target /scratch1/fs1/allegra.petti/khan.saad/liftover_chain/hg19_igenomes/genome.fa \
--aligner lastz \
--tgtSize 10000000 \
--tgtOvlp 100000 \
--srcSize 20000000 \
--liftover_algorithm crossmap \
--outdir ./my_liftover \
--publish_dir_mode copy \
--max_cpus 8 \
--max_memory 120.GB \

JAVA_HOME='/venv' export PATH=/opt/conda/bin:$PATH PATH=/venv/bin:$PATH LSF_DOCKER_VOLUMES='/home/khan.saad/:/home/khan.saad/ /storage1/fs1/allegra.petti/Active/:/storage1/fs1/allegra.petti/Active/ /scratch1/fs1/allegra.petti/:/scratch1/fs1/allegra.petti/' bsub -oo logs/nexflow_nflo.%J -G compute-allegra.petti -g /khan.saad/R_seurat -q general -M 128000000 -n 8 -R 'rusage[mem=128000]' -a 'docker(smk5g5/nf-lo:1.0.0)' /bin/bash ./nflo.sh

Below is the error I am getting.

Command output:
(empty)

Work dir:
/scratch1/fs1/allegra.petti/khan.saad/liftover_chain/work/ae/5dbc7b0dff31f4dbfda486663a4019

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

WARN: Killing pending tasks (1)

executor > local (7044)
[99/b8dfba] process > PREPROC:src2bit (src2Bit) [100%] 1 of 1 ✔
[77/b6fe1e] process > PREPROC:tgt2bit (tgt2Bit) [100%] 1 of 1 ✔
[87/500b9a] process > PREPROC:splitsrc (splitsrc) [100%] 1 of 1 ✔
[5d/0611b0] process > PREPROC:groupsrc (groupsrc) [100%] 1 of 1 ✔
[ca/a47e09] process > PREPROC:splittgt (splittgt) [100%] 1 of 1 ✔
[c4/622932] process > PREPROC:grouptgt (grouptgt) [100%] 1 of 1 ✔
[03/40d546] process > PREPROC:pairs (mkpairs) [100%] 1 of 1 ✔
[e1/c85215] process > ALIGNER:lastz (lastz_med.sr... [ 15%] 7009 of 46610, fa...
[8f/e0eb1a] process > ALIGNER:axtChain (axtchain_m) [ 0%] 28 of 7007
[- ] process > ALIGNER:chainMerge -
[- ] process > ALIGNER:chainNet -
[- ] process > ALIGNER:netSynt -
[- ] process > ALIGNER:chainsubset -
[- ] process > ALIGNER:chain2maf -
[- ] process > ALIGNER:name_maf_seq -
[- ] process > ALIGNER:mafstats -
[d1/3b95a6] NOTE: Process ALIGNER:lastz (lastz_med.src93.tgt68) terminated with an error exit status (1) -- Execution is retried (1)
Error executing process > 'ALIGNER:lastz (lastz_med.src93.tgt68)'

Caused by:
Process requirement exceed available CPUs -- req: 2; avail: 1

Command executed:

echo B=0 C=0 E=30 H=0 K=3000 L=3000 M=50 O=400 T=1 Y=9400
lastz /scratch1/fs1/allegra.petti/khan.saad/liftover_chain/work/5d/0611b088a66d4b63a1adb6961c96d7/CLUST_src/src93.fa /scratch1/fs1/allegra.petti/khan.saad/liftover_chain/work/c4/622932c827e301415a2268ee960622/CLUST_tgt/tgt68.fa B=0 C=0 E=30 H=0 K=3000 L=3000 M=50 O=400 T=1 Y=9400 --ambiguous=iupac --format=lav |
lavToPsl stdin stdout |
liftUp -type=.psl stdout source.lift warn stdin |
liftUp -type=.psl -pslQ src93.tgt68.psl target.lift warn stdin

Command exit status:

Command output:
(empty)

Work dir:
/scratch1/fs1/allegra.petti/khan.saad/liftover_chain/work/ae/5dbc7b0dff31f4dbfda486663a4019

Process `ALIGNER:chainMerge (chainmerge)` terminated with an error exit status (1) -- Execution is retried (1)

I got the following error when running nf-LO; the log is shown below:

WARNING: Could not load nf-core/config profiles: https://raw.githubusercontent.com/nf-core/configs/master/nfcore_custom.config
Launching https://github.com/evotools/nf-LO [pedantic_ptolemy] DSL2 - revision: 312fb56 [main]

======================================

  Nextflow LiftOver v 1.6.0

======================================
source : chrY.oldVersin.fa
target : DRC.chrY.fa
aligner : minimap2
distance : medium
custom align : -cx asm5 -l 10000
custom chain : false
source chunk : 20000000
source overlap : 0
target chunk : 10000000
target overlap : 100000
output folder : ./liftover_Minimap2
liftover name : liftover
annot : false
annot type : false
liftover meth. : liftover
igenomes_base : s3://ngi-igenomes/igenomes/
igenomes_ignore : false
no_maf : false
skip netsynt : true
max cpu : 143
max mem : 26 GB
max rt : 240.h
Using CrossMap
executor > local (5)
[13/61a6e5] process > PREPROC:src2bit (src2Bit) [100%] 1 of 1 ✔
executor > local (10)
[13/61a6e5] process > PREPROC:src2bit (src2Bit) [100%] 1 of 1 ✔
[cb/e667b4] process > PREPROC:tgt2bit (tgt2Bit) [100%] 1 of 1 ✔
[3e/c206c5] process > PREPROC:splitsrc (splitsrc) [100%] 1 of 1 ✔
[83/de485d] process > PREPROC:splittgt (splittgt) [100%] 1 of 1 ✔
[73/c5a594] process > PREPROC:pairs (mkpairs) [100%] 1 of 1 ✔
[58/cefc2c] process > ALIGNER:minimap2 (minimap2.medium.src.tgt) [100%] 1 of 1 ✔
[26/7d4da4] process > ALIGNER:axtChain (axtchain_m) [100%] 1 of 1 ✔
[33/4e27ca] process > ALIGNER:chainMerge (chainmerge) [100%] 3 of 3, failed: 3, retries: 2 ✔
[- ] process > ALIGNER:chainNet -
[- ] process > ALIGNER:chainsubset -
[- ] process > ALIGNER:chain2maf -
[- ] process > ALIGNER:name_maf_seq -
[- ] process > ALIGNER:mafstats -
[c7/cb6d2d] NOTE: Process ALIGNER:chainMerge (chainmerge) terminated with an error exit status (1) -- Execution is retried (1)
[8d/9085f1] NOTE: Process ALIGNER:chainMerge (chainmerge) terminated with an error exit status (1) -- Execution is retried (2)
[33/4e27ca] NOTE: Process ALIGNER:chainMerge (chainmerge) terminated with an error exit status (1) -- Error is ignored

A problem while using nf-LO

Hi,
When using nf-LO to generate a chain file, I ran into a problem.
Below is my code.

get_chain_file(){
  # Arguments: source FASTA, target FASTA, annotation GFF, aligner, output dir, max CPUs
  source ~/miniconda3/bin/activate nf-LO
  # Quote every argument so that paths containing spaces are passed intact
  nextflow run evotools/nf-LO --source "$1" \
    --target "$2" \
    --annotation "$3" \
    --annotation_format gff \
    --distance near \
    --aligner "$4" \
    --liftover_algorithm liftover \
    --outdir "$5" \
    --max_cpus "$6" \
    --max_memory 8.GB \
    -profile conda
}

get_chain_file "$dhl92_ref" "$ref_13C" "$dhl92_gff" GSAlign DHL92_Lift_To_13C 48

Below is the error log:
executor > local (18)
[0a/0b9198] process > PREPROC:src2bit (src2Bit) [100%] 1 of 1 ✔
[ca/375eee] process > PREPROC:tgt2bit (tgt2Bit) [100%] 1 of 1 ✔
[d6/67237f] process > PREPROC:splitsrc (splitsrc) [100%] 1 of 1 ✔
[06/67ee7a] process > PREPROC:splittgt (splittgt) [100%] 1 of 1 ✔
[3c/cd32ff] process > PREPROC:pairs (mkpairs) [100%] 1 of 1 ✔
[2f/9671a6] process > ALIGNER:bwt_index (bwt_index) [100%] 1 of 1 ✔
[fc/c8c3bd] process > ALIGNER:gsalign (gsalign_ne... [100%] 1 of 1 ✔
[46/9b560b] process > ALIGNER:axtChain (axtchain_n) [100%] 1 of 1 ✔
[1d/f4c0bd] process > ALIGNER:chainMerge (chainme... [100%] 1 of 1 ✔
[2e/e1b000] process > ALIGNER:chainNet (chainnet) [100%] 1 of 1 ✔
[0c/2ea9e2] process > ALIGNER:netSynt (netSyntenic) [100%] 1 of 1 ✔
[98/9f1882] process > ALIGNER:chainsubset (chains... [100%] 1 of 1 ✔
[48/1eb5ae] process > ALIGNER:chain2maf (chainmaf) [100%] 1 of 1 ✔
[4e/b212f0] process > ALIGNER:name_maf_seq (namemaf) [100%] 1 of 1 ✔
[eb/1065eb] process > ALIGNER:mafstats (mafstats) [100%] 1 of 1 ✔
[1e/428505] process > LIFTOVER:lifter (crossmap) [100%] 3 of 3, failed: 3...
[- ] process > LIFTOVER:features_stats -
[- ] process > make_report -
[5d/627e71] NOTE: Process LIFTOVER:lifter (crossmap) terminated with an error exit status (127) -- Execution is retried (1)
[43/56ce02] NOTE: Process LIFTOVER:lifter (crossmap) terminated with an error exit status (127) -- Execution is retried (2)
[1e/428505] NOTE: Process LIFTOVER:lifter (crossmap) terminated with an error exit status (127) -- Error is ignored

Completed at: 20-Feb-2024 17:20:07
Duration : 12m 32s
CPU hours : 2.5 (0.1% failed)
Succeeded : 15
Ignored : 1
Failed : 3

I don't know what happened because I can't find a detailed error log. How can I solve this problem?
Much appreciated!
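
Nextflow keeps each task's script and captured output in a hashed subdirectory of the work directory, and the bracketed prefix in the log (for example `1e/428505`) is the start of that path. The following is a minimal sketch for printing those files, assuming the default `work/` directory in the launch folder; `show_task_log` is a hypothetical helper written for illustration, not part of nf-LO or Nextflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the script and captured output of one Nextflow task.
# Usage: show_task_log <work_dir> <hash_prefix>   e.g. show_task_log work 1e/428505
show_task_log() {
  local work_dir=$1 prefix=$2 task_dir f
  # The log shows only the first characters of the hash, so expand it with a glob.
  for task_dir in "$work_dir"/"$prefix"*/; do
    # Skip the unexpanded pattern when nothing matches.
    [ -d "$task_dir" ] || continue
    echo "== $task_dir"
    # Files that Nextflow writes into a task directory.
    for f in .command.sh .command.err .command.log .exitcode; do
      if [ -s "$task_dir$f" ]; then
        echo "--- $f"
        cat "$task_dir$f"
        echo
      fi
    done
  done
}
```

Calling `show_task_log work 1e/428505` from the launch folder would print whatever the failed `LIFTOVER:lifter` task left behind. An exit status of 127 is the shell's conventional code for "command not found", so `.command.err` will probably name an executable that is missing from the environment.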

Minimap2 and gsalign using too few cpus

Hello @RenzoTale88.

My machine has 16 CPUs, and by default nf-LO sets the maximum to 15 CPUs. I think that is reasonable.

But, I noticed that both minimap2 and gsalign, which share the cpus definition in the config files, end up with only 7 cpus while running. Is there a particular reason for halving the cpus for these two tools?

I noticed that the function check_max returns the minimum of max cpus / 2 * the number of tries and the max number of cpus. So, it will always return at most half the number of max cpus. Is that the intended behaviour?
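
The arithmetic described above can be reproduced in a few lines. This is only a sketch of the formula as worded in this issue, not the actual `check_max` code from nf-LO; the function name `requested_cpus` and the use of integer division are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch of the CPU request as described in this issue (NOT the real check_max):
#   requested = min(max_cpus / 2 * attempt, max_cpus)
# requested_cpus is a hypothetical helper; integer division is an assumption.
requested_cpus() {
  local max_cpus=$1 attempt=$2
  local wanted=$(( max_cpus / 2 * attempt ))
  # Cap the request at the configured maximum.
  if (( wanted > max_cpus )); then
    wanted=$max_cpus
  fi
  echo "$wanted"
}

requested_cpus 15 1   # first attempt
requested_cpus 15 2   # first retry
requested_cpus 15 3   # second retry, capped at max_cpus
```

Under these assumptions, a first attempt with a maximum of 15 CPUs requests 7, which matches the 7 CPUs observed; only a retried task would ask for more.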

Thanks again.

minimap2: Stdin is empty

Hi,

I'm running nf-LO between hg38 and panTro6 using the minimap2 preset. The Docker container was built as suggested in the docs. I'm running this on a SLURM cluster, launched from a login node. It looks like minimap2 fails, but the .exitcode is still 0, so the downstream processes fail. I can confirm that the same files and settings work with lastz but not with minimap2.

I've also noticed that only one minimap2 process spawns whereas a few hundred lastz processes typically spawn. It seems that splitsrc and splittgt each return a large single multifasta.

run script:

nextflow run evotools/nf-LO \
	-latest \
	-c phoenix.config \
	--outdir pantro6_to_hg38_nflo \
	--distance near \
	--aligner minimap2 \
	--max_cpus 32 \
	--max_memory 128.GB \
	-w work3 \
	--source panTro6_cleaned.fa \
	--target hg38_cleaned.fa \
	-with-tower \
	-profile docker -with-docker vpeddu/nflo:latest

process .command.error:

Got 30 lifts in source.lift
Lifting stdin
Got 328 lifts in target.lift
Lifting stdin
[M::mm_idx_gen::42.331*1.45] collected minimizers
[M::mm_idx_gen::68.360*2.79] sorted minimizers
[M::main::68.360*2.79] loaded/built the index for 30 target sequence(s)
[M::mm_mapopt_update::174.858*1.28] mid_occ = 121
[M::mm_idx_stat] kmer size: 19; skip: 19; is_hpc: 0; #seq: 30
[M::mm_idx_stat::259.800*0.94] distinct minimizers: 213075345 (92.73% are singletons); average occurrences: 1.328; average spacing: 10.079; total length: 2851453592
stdin is empty

Issue about Annotation

Hi,
I have a question about using nf-LO to lift over my genomes.
Regarding the annotation parameter: if I supply a GFF file, should it be the source genome's GFF or the target genome's GFF?
Can you help me out?
Much appreciated.
