
malonge / ragoo


RaGOO is no longer supported. Please use RagTag instead: https://github.com/malonge/RagTag

License: MIT License

Python 88.93% Perl 11.07%
bioinformatics genome-scaffolding genome-assembly

ragoo's Introduction


RaGOO is no longer supported. Please use RagTag instead.

ragoo's People

Contributors

malonge, mschatz, sam217pa, wdecoster


ragoo's Issues

Preventing RaGOO output from being too similar to the reference

Hi,
I ran ragoo.py -t 8 -g 100 -s -b -C contigs.fa reference.fa. The reference.fa contains a 70 bp insertion, which is correct. The contigs.fa (wild type) does not contain it, which is also correct.

However, RaGOO inserted the 70 bp from reference.fa into its output, which I would like to prevent. Is there a setting in RaGOO that prevents this?

Thank you in advance,

Michal

Lift-over reference annotations to ordered contigs

Hi,

The lift_over.py script is able to use the orderings to lift over annotations from the input contigs onto the newly ordered contigs (pseudomolecules). Is there a straightforward way to use that ordering information to take annotations from the reference genome and apply them to the ordered contigs? Thanks!

Potential chimeric contig persists after RaGOO

Hi Michael,

  • I've used RaGOO to scaffold my PacBio contigs against a Sanger reference genome. Here is my command:
ragoo.py -b -g 100 -i 0.6 -C "$ASMN" "$REFN"
  • Here is the alignment between the reference I used (y-axis) and the resulting RaGOO scaffolds (x-axis).

[Image: s160_2_edited_to_ragoo paf plot]

  • Everything looks correct except for the highlighted region at the end of chromosome E. Below is a zoom of that region:

[Image: Screen Shot 2019-11-13 at 3 42 52 PM]

  • I am confused as to why I still see a difference between the arrangement of my scaffolds and the reference I used.
  • I understand that my assembly has some extra sequence that is not present in the reference, but I don't understand why that inverted segment on the right side is not placed in the correct spot and orientation.

Let me know if you have any tips to help me understand this.

Installation problem

Hi,

I'm trying to install RaGOO using virtualenv, but even though the installation finished without errors or warnings, ragoo.py does not seem to work:

(Ragoo_env) [fc464@login-e-12 RaGOO]$ python ragoo.py 
Traceback (most recent call last):
  File "ragoo.py", line 8, in <module>
    from ragoo_utilities.ContigAlignment import ContigAlignment
  File "/home/fc464/software/RaGOO/ragoo_utilities/ContigAlignment.py", line 4, in <module>
    from ragoo_utilities.utilities import summarize_planesweep, binary_search
  File "/home/fc464/software/RaGOO/ragoo_utilities/utilities.py", line 9, in <module>
    complements = str.maketrans("ACGTNURYSWKMBVDHacgtnuryswkmbvdh", "TGCANAYRSWMKVBHDtgcanayrswmkvbhd")
AttributeError: type object 'str' has no attribute 'maketrans'

Am I missing something?
Thanks
F
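
The AttributeError above is what Python 2 reports, because `str.maketrans` exists only in Python 3, so the virtualenv was probably created with a Python 2 interpreter. A minimal sketch for checking this, reusing the translation table from the traceback (the `reverse_complement` helper is hypothetical and not RaGOO's own function):

```python
import sys

# str.maketrans is a Python 3 feature; under Python 2 the attribute lookup fails.
if sys.version_info[0] < 3:
    raise SystemExit("Python 3 is required, found %d.%d" % sys.version_info[:2])

# Same translation table as in the traceback above.
complements = str.maketrans(
    "ACGTNURYSWKMBVDHacgtnuryswkmbvdh",
    "TGCANAYRSWMKVBHDtgcanayrswmkvbhd",
)


def reverse_complement(seq):
    """Hypothetical helper: complement each base, then reverse the string."""
    return seq.translate(complements)[::-1]


print(reverse_complement("ACCGT"))
```

If the version check fails inside the environment, recreating the virtualenv with a Python 3 interpreter is the likely fix.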

[morecore] insufficient memory

Hi,
Contigs.txt and NbV1ChF.fasta are 2.8 GB and 2.6 GB, respectively. Minimap2 seems to run out of memory on a 2 TB machine.

$ ragoo.py -t 8 -g 100 -s -b -gff augustus.hints_utr.gff3 Contigs.txt NbV1ChF.fasta
Mon Oct 28 01:45:52 2019 --- Running : minimap2 -k19 -w19 -t8 ../NbV1ChF.fasta ../Contigs.txt > contigs_against_ref.paf 2> contigs_against_ref.paf.log
Mon Oct 28 01:49:23 2019 --- Reading alignments
Mon Oct 28 01:52:45 2019 --- Getting gff features
Mon Oct 28 01:53:07 2019 --- Getting contigs
Mon Oct 28 01:53:25 2019 --- Finding interchromosomally chimeric contigs
Mon Oct 28 01:53:25 2019 --- Finding break points and breaking interchromosomally chimeric contigs
Mon Oct 28 01:53:45 2019 --- Running : minimap2 -k19 -w19 -t8 ../../NbV1ChF.fasta Contigs.inter.chimera.broken.fa > inter_contigs_against_ref.paf 2> inter_contigs_against_ref.paf.log
Mon Oct 28 01:57:22 2019 --- Reading interchromosomal chimera broken alignments
Mon Oct 28 02:00:58 2019 --- Finding intrachromosomally chimeric contigs
Mon Oct 28 02:01:52 2019 --- Running : minimap2 -k19 -w19 -t8 ../../NbV1ChF.fasta Contigs.intra.chimera.broken.fa > intra_contigs_against_ref.paf 2> intra_contigs_against_ref.paf.log
Mon Oct 28 02:05:25 2019 --- Reading intrachromosomal chimera broken alignments
Mon Oct 28 02:09:19 2019 --- The total number of interchromasomally chimeric contigs broken is 0
Mon Oct 28 02:09:19 2019 --- The total number of intrachromasomally chimeric contigs broken is 6
Mon Oct 28 02:09:19 2019 --- Assigning contigs
Mon Oct 28 02:09:40 2019 --- Ordering and orienting contigs
Mon Oct 28 02:11:01 2019 --- Creating pseudomolecules
Mon Oct 28 05:26:02 2019 --- Aligning pseudomolecules to reference
Mon Oct 28 05:26:02 2019 --- Running : minimap2 -ax asm5 --cs -t8 ../../NbV1ChF.fasta ../ragoo.fasta > pm_against_ref.sam 2> pm_contigs_against_ref.sam.log
/bin/bash: line 1: 10389 Aborted                 (core dumped) minimap2 -ax asm5 --cs -t8 ../../NbV1ChF.fasta ../ragoo.fasta > pm_against_ref.sam 2> pm_contigs_against_ref.sam.log
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/ragoo/bin/ragoo.py", line 4, in <module>
    __import__('pkg_resources').run_script('RaGOO==1.1', 'ragoo.py')
  File "/home/ubuntu/miniconda3/envs/ragoo/lib/python3.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/ubuntu/miniconda3/envs/ragoo/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1469, in run_script
    exec(script_code, namespace, namespace)
  File "/home/ubuntu/miniconda3/envs/ragoo/lib/python3.7/site-packages/RaGOO-1.1-py3.7.egg/EGG-INFO/scripts/ragoo.py", line 754, in <module>
  File "/home/ubuntu/miniconda3/envs/ragoo/lib/python3.7/site-packages/RaGOO-1.1-py3.7.egg/EGG-INFO/scripts/ragoo.py", line 439, in align_pms
  File "/home/ubuntu/miniconda3/envs/ragoo/lib/python3.7/site-packages/RaGOO-1.1-py3.7.egg/ragoo_utilities/utilities.py", line 25, in run
RuntimeError: Failed : minimap2 -ax asm5 --cs -t8 ../../NbV1ChF.fasta ../ragoo.fasta > pm_against_ref.sam 2> pm_contigs_against_ref.sam.log
less pm_contigs_against_ref.sam.log 
[M::mm_idx_gen::55.511*1.79] collected minimizers
[M::mm_idx_gen::60.192*2.17] sorted minimizers
[M::main::60.192*2.17] loaded/built the index for 19 target sequence(s)
[M::mm_mapopt_update::64.517*2.10] mid_occ = 342
[M::mm_idx_stat] kmer size: 19; skip: 19; is_hpc: 0; #seq: 19
[M::mm_idx_stat::69.857*2.01] distinct minimizers: 143929053 (83.21% are singletons); average occurrences: 1.928; average spacing: 9.884
[M::worker_pipeline::3024.805*3.40] mapped 5 sequences
[M::worker_pipeline::4921.171*3.81] mapped 6 sequences
[M::worker_pipeline::7761.560*3.75] mapped 4 sequences
[morecore] insufficient memory

How important is the alignment from the "Aligning pseudomolecules to reference" step, or can ragoo.fasta be used as is? How much more memory do you think I would need?

Thank you in advance,

Michal

How reliable are these results?

Hi,

I did a couple of tests and it looks like there is a lot of scaffolding going on.

The assembly to scaffold and its reference diverged about 10 Mya.

Reference: EisaPacBio.assembly.v1.0.fasta
____________________________
N50: 1613086.0
Number of contigs: 580
Longest contig: 6085293
Genome length: 401008328
My query: Etal.curated.fasta
____________________________
N50: 39539
Number of contigs: 25749
Longest contig: 401059
Genome length: 461252547
The output: ragoo.fasta
____________________________
N50: 2449116
Number of contigs: 504
Longest contig: 97232201
Genome length: 463829447

and this is how I'm running RaGOO:

python ragoo.py -t 32 -i 0.5 -b Etal.curated.fasta EisaPacBio.assembly.v1.0.fasta

How reliable is this result?

Thanks
Great tool!
F
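
For comparing the statistics quoted above across assemblies, it can help to compute N50 the same way for every file. A minimal sketch, assuming the common definition (the length L such that sequences of length at least L cover half of the total); the function name and the toy lengths are illustrative and not taken from the assemblies in this post:

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    cover at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


# Toy lengths for illustration only.
print(n50([10, 8, 5, 3, 2]))
```

A large jump in N50 after scaffolding says the contigs were joined, not that the joins are correct, so it is only a starting point for judging reliability.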

SV types from RaGOO output as VCF

Hi Michael,

Thanks for the really cool tool, the speed up with the current version is awesome.
I have been using RaGOO for scaffolding and calling SVs between sister species of lizards.

I tried to use the SURVIVOR bedtovcf on the assemblytics_out.Assemblytics_structural_variants.bed with types as DEL/INS/DUP/INV/TRA/BND.
The TRA and BND both don't work. Additionally, I get two groups of outputs :
(1) identical between DEL/INS/DUP/INV - the name of each type is distinct but the same results including coordinates
(2) identical between TRA/BND - shown as type NA

Is there something I am missing here or is the output conversion currently under development?

Pseudomolecules creation step error

Greetings of the season.

I ran RaGOO on a MaSuRCA assembly of short-read data with a chromosome-scale reference genome.

I got this error

Thu Jan 23 09:53:29 2020 --- Reading alignments
Thu Jan 23 09:53:30 2020 --- Assigning contigs
Thu Jan 23 09:53:30 2020 --- Ordering and orienting contigs
Thu Jan 23 09:53:32 2020 --- Creating pseudomolecules
Traceback (most recent call last):
  File "ragoo.py", line 742, in <module>
    create_pseudomolecules(contigs_file, all_unique_contigs, g, make_chr0)
  File "ragoo.py", line 339, in create_pseudomolecules
    remaining_contig_headers.pop(remaining_contig_headers.index('>' + line[0]))
ValueError: '>gi|1046489213|gb|MBSK01000787.1|' is not in list

How can I fix this problem?

Thank you.

Here are the assembly stats

Assembly final.genome.scf

contigs (>= 0 bp) 13835

contigs (>= 1000 bp) 7205

contigs (>= 5000 bp) 4649

contigs (>= 10000 bp) 3841

contigs (>= 25000 bp) 2730

contigs (>= 50000 bp) 1675

Total length (>= 0 bp) 250495326
Total length (>= 1000 bp) 247455423
Total length (>= 5000 bp) 241625893
Total length (>= 10000 bp) 235851007
Total length (>= 25000 bp) 217286364
Total length (>= 50000 bp) 179069540

contigs 9145

Largest contig 780203
Total length 248822296
GC (%) 34.83
N50 85631
N75 44887
L50 851
L75 1835

N's per 100 kbp 0.00
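
The ValueError above says that a contig name found during ordering has no matching entry in the list of contig headers. One plausible cause, which is an assumption since the thread has no reply, is that the FASTA header line carries extra description text after the ID, so the full header differs from the name the aligner reports. A minimal sketch for listing names that appear in the alignments but not in the FASTA (both helper names are hypothetical):

```python
def fasta_ids(fasta_lines):
    """Collect sequence IDs: header text after '>' up to the first whitespace."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].split()
    }


def paf_query_names(paf_lines):
    """Collect query names, the first tab-separated column of each PAF line."""
    return {line.split("\t")[0] for line in paf_lines if line.strip()}


# Toy inputs; in practice iterate over the open contig FASTA and PAF files.
fasta = [">ctg1 some description", "ACGT", ">ctg2", "GGCC"]
paf = [
    "ctg1\t4\t0\t4\t+\tchr1\t100\t0\t4\t4\t4\t60",
    "ctg3\t4\t0\t4\t+\tchr1\t100\t10\t14\t4\t4\t60",
]

missing = paf_query_names(paf) - fasta_ids(fasta)
print(sorted(missing))
```

If the set is empty when IDs are compared this way but the error persists, trimming the headers to the bare ID before running RaGOO may be worth trying.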

Low sum in ragoo.fasta

Hello,

I am attempting to run Ragoo using a long-read assembly as the 'reference'.
After running Ragoo with the following command:

ragoo.py -t 4 -b -C ${assembly} ${ref}

My output ragoo.fasta file seems to be missing a lot of bases. The original assembly is ~2.7Gb, but the output fasta file has ~736 Mb only.

Any idea about what is happening to the outstanding sequences, or is this expected behaviour? The chimera.broken.fa file is the correct size, so it seems that things are being lost after that stage somewhere.

Thanks!
Lauren

Why did it go to Chr0?

Hello,
I ran RaGOO v1.1 with default parameters:
python ragoo.py contigs.fasta ref.fasta -t10
I looked for two specific contigs in the results which I know should map one after the other to chromosome 4. I looked at the paf output:

$ grep -E 'NODE_16239_|NODE_27438_' contigs_against_ref.paf
NODE_16239_length_1784_cov_3.719491     1784    5       1775    -       4       18585056        15903632        15905402        1737      1770    60      tp:A:P  cm:i:177        s1:i:1737       s2:i:0  dv:f:0.0015     rl:i:0
NODE_27438_length_660_cov_4.593388      660     3       645     -       4       18585056        15905406        15906048        642       642     60      tp:A:P  cm:i:67 s1:i:642        s2:i:0  dv:f:0.0008     rl:i:0

However, when looking at orderings/4_orderings.txt , only the first contig is there, while the second went to Chr0. Same when I look at groupings/4_contigs.txt.
Can you help me understand why the second contig was added to Chr0? It looks like it maps pretty well to the reference.

Thanks!
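
One thing worth checking is whether a length filter explains the difference, since the second alignment block is only 642 bp long. Whether RaGOO applies such a cutoff, and at what value, is not stated in this thread, so `MIN_ALN_LEN` below is purely hypothetical. The sketch parses the 12 mandatory PAF columns of the two lines quoted above:

```python
from collections import namedtuple

# The 12 mandatory PAF columns, in order.
PafRecord = namedtuple(
    "PafRecord",
    "qname qlen qstart qend strand tname tlen tstart tend nmatch alnlen mapq",
)


def parse_paf_line(line):
    """Parse the mandatory columns of one PAF line; optional tags are ignored.

    Splits on any whitespace, which is fine as long as names contain no spaces.
    """
    f = line.split()[:12]
    return PafRecord(f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
                     f[5], int(f[6]), int(f[7]), int(f[8]),
                     int(f[9]), int(f[10]), int(f[11]))


# Mandatory columns of the two alignments quoted above.
lines = [
    "NODE_16239_length_1784_cov_3.719491 1784 5 1775 - 4 18585056 15903632 15905402 1737 1770 60",
    "NODE_27438_length_660_cov_4.593388 660 3 645 - 4 18585056 15905406 15906048 642 642 60",
]

MIN_ALN_LEN = 1000  # hypothetical cutoff for illustration only

records = [parse_paf_line(line) for line in lines]
for rec in records:
    status = "kept" if rec.alnlen >= MIN_ALN_LEN else "filtered"
    print(rec.qname, rec.alnlen, status)
```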

Empty Chr0 record

Hey Michael,

  • I think right now (v1.1) RaGOO outputs the unplaced >Chr0 sequence record even when there are no unplaced sequences.
  • So far I've gotten only warnings from this empty record when passing my RaGOO scaffolds to other software, but maybe consider not including it when it's empty?
  • I guess an empty >Chr0 is diagnostic that no contigs are unplaced, so perhaps it should be there... Anyway, I just thought I'd mention it here, since the current output may be incompatible with some post-processing tools and may need editing.

Best,
Gui
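
Until the output changes, the empty record can be filtered out before handing the scaffolds to other tools. A minimal sketch, assuming a plain FASTA file read line by line (the function name is hypothetical):

```python
def drop_empty_records(lines):
    """Yield FASTA lines, skipping any record that has no sequence lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Flush the previous record only if it had sequence.
            if header is not None and seq:
                yield header
                yield from seq
            header, seq = line, []
        elif line:
            seq.append(line)
    if header is not None and seq:
        yield header
        yield from seq


# Toy input with an empty >Chr0 record in the middle.
toy = [">Chr1", "ACGT", ">Chr0", "", ">Chr2", "GG"]
cleaned = list(drop_empty_records(toy))
print(cleaned)
```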

Structural Variants Calling

Hi Mike,

The Assemblytics website strongly recommends against using runs of N's between contigs to anchor them. But RaGOO scaffolds the contigs before calling SVs via Assemblytics. Is this an issue, or did you work around it somehow?

Also, are you planning to add parameters for the minimum and maximum SV size, instead of the default values of 50 bp and 10 kb?

Mehmet

Exclude chromosomes/regions from analysis

Hi,

  • The RaGOO preprint mentions the ability to provide a set of genomic regions as gff3 so that the analysis will not break chimeric contigs at those segments.

RaGOO can optionally avoid breaking chimeric intervals at loci within genomic coordinates specified by a gff3 file, such as to avoid disrupting gene models identified in the de novo assembly.

  • When looking at the ragoo.py options, I can only find the -e option, which seems to ignore complete chromosomes/scaffolds from the analysis.
  • Is the gff3 option implemented in the current version of RaGOO?
  • Also, could you provide a little more detail on how the exclude option -e works? Specifically, are the provided sequences completely removed from the analysis?

Btw, this tool is very useful and seems to work very well! Congratulations! 🎉

Hard-coded output directory

Hi- thanks for the great tool! In the current version (v1.1), the output directory is hard-coded to be ragoo_output. Is there any plan to add a command line parameter to allow the user to decide where to write the results? (I see the option -o is there but commented out so there must be a good reason for it!)

Error in liftover script

Hi Michael,

I keep getting the following error message when trying to lift genes from several reference genomes following RaGOO scaffolding. Any idea how to solve this?

python /scratch/biotools/software/assembly/RaGOO-1.1/lift_over.py -g 100 ragoo_output/chimera_break/PC_chr.intra.chimera_broken.gff orderings.fofn ragoo_output/chimera_break/PC_CM_merged_pilon.intra.chimera.broken.fa.fai > genes.ragoo.gff
Traceback (most recent call last):
  File "/scratch/biotools/software/assembly/RaGOO-1.1/lift_over.py", line 84, in <module>
    new_header = ctg_chr[gff_header]
KeyError: 'Chr1'

Gap filling with RaGOO

Hey,

I heard RaGOO is also good for gap filling. Is that so?

Can I use my scaffolded assembly as reference and NanoPore reads as contigs to map? Or perhaps with the -R parameter?

Cheers,
Ricardo

Data cannot be updated

Hello, why does RaGOO stop here when I run it, with the data no longer updating?

[M::worker_pipeline::2085.531*2.87] mapped 165627 sequences
[M::worker_pipeline::2109.052*2.87] mapped 511234 sequences
[M::worker_pipeline::2121.037*2.87] mapped 393467 sequences
[M::main] Version: 2.15-r906-dirty
[M::main] CMD: minimap2 -k19 -w19 -t3 ../../Data/asm.fasta ../../Data/draft.asm.fasta
[M::main] Real time: 2121.506 sec; CPU: 6091.642 sec; Peak RSS: 28.659 GB

RaGOO vs Ragout

Hi Michael,

Not an issue, but I rather want to state my opinion on this, and I am interested in hearing your thoughts as well.

As you might know, there is another tool for reference assembly called Ragout (https://github.com/fenderglass/Ragout). We initially published it in 2014 (https://academic.oup.com/bioinformatics/article-abstract/30/12/i302/388572) and another update in 2018 (https://genome.cshlp.org/content/28/11/1720.short). It is fairly popular in the community, e.g. we have >4k downloads within the last 1.5 years and 96 citations of the two papers combined. As far as I know, at least some authors of the RaGOO bioRxiv manuscript are aware of Ragout.

So as a result, we have two tools that are doing very similar jobs, and their names differ in two letters (that are not pronounced). I think this might cause a lot of confusion in the future. There might be obvious problems with packaging too (Bioconda and perhaps Linux distributions).

To reiterate, I am curious what your thoughts are on this. I am not criticizing any research aspects of your work. In fact, I am happy to see another tool that is faster and perhaps easier to use, and I also appreciate that you have cited the Ragout2 paper in your preprint. It's just the naming that might cause confusion for the genomics community in the future.

Best,
Mikhail

ValueError: IntervalTree

Hi,
I ran into the following error:

Thu Oct 17 16:25:47 2019 --- Misassembly correction has been turned on. This automatically inactivates chimeric contig correction.
Thu Oct 17 16:25:47 2019 --- Running : minimap2 -k19 -w19 -t8 ../NbV1ChF.fasta ../Contigs.txt > contigs_against_ref.paf 2> contigs_against_ref.paf.log
Thu Oct 17 16:29:25 2019 --- Reading alignments
Thu Oct 17 16:34:00 2019 --- Getting gff features
Thu Oct 17 16:34:17 2019 --- Aligning raw reads to contigs
Thu Oct 17 16:34:17 2019 --- Running : minimap2 -x sr -t8 ../../Contigs.txt ../../1740D-43-05_S0_L001_R1.fastq > reads_against_ctg.paf 2> reads_against_ctg.paf.log
Thu Oct 17 21:27:31 2019 --- Computing contig coverage
Traceback (most recent call last):
  File "/work/waterhouse_team/miniconda2/envs/ragoo/bin/ragoo.py", line 4, in <module>
    __import__('pkg_resources').run_script('RaGOO==1.1', 'ragoo.py')
  File "/work/waterhouse_team/miniconda2/envs/ragoo/lib/python3.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/work/waterhouse_team/miniconda2/envs/ragoo/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1469, in run_script
    exec(script_code, namespace, namespace)
  File "/work/waterhouse_team/miniconda2/envs/ragoo/lib/python3.7/site-packages/RaGOO-1.1-py3.7.egg/EGG-INFO/scripts/ragoo.py", line 714, in <module>
  File "/work/waterhouse_team/miniconda2/envs/ragoo/lib/python3.7/site-packages/RaGOO-1.1-py3.7.egg/EGG-INFO/scripts/ragoo.py", line 36, in update_misasm_features
  File "/work/waterhouse_team/miniconda2/envs/ragoo/lib/python3.7/site-packages/intervaltree/intervaltree.py", line 1060, in __setitem__
    self.addi(index.start, index.stop, value)
  File "/work/waterhouse_team/miniconda2/envs/ragoo/lib/python3.7/site-packages/intervaltree/intervaltree.py", line 343, in addi
    return self.add(Interval(begin, end, data))
  File "/work/waterhouse_team/miniconda2/envs/ragoo/lib/python3.7/site-packages/intervaltree/intervaltree.py", line 326, in add
    " {0}".format(interval)
ValueError: IntervalTree: Null Interval objects not allowed in IntervalTree: Interval(0, 0, (0, 0))

What could possibly have caused the error?

Thank you in advance,

Michal
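
The last line of the traceback shows the intervaltree package rejecting `Interval(0, 0, (0, 0))`, an interval whose begin equals its end. Why a zero-length feature arises here is not clear from the log. A minimal sketch of a guard that skips such intervals before they are added to a tree; plain tuples stand in for the real data, and the function name is hypothetical:

```python
def non_null_intervals(intervals):
    """Keep only (begin, end) pairs that span at least one position."""
    return [(begin, end) for begin, end in intervals if begin < end]


# Toy coordinates; two of them are zero-length and would be rejected.
features = [(0, 0), (100, 250), (300, 300), (400, 450)]
kept = non_null_intervals(features)
print(kept)
```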

Error in install

This is the python3 on my Linux system:

python3
Python 3.6.6 (default, Mar 29 2019, 00:03:27)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Type "help", "copyright", "credits" or "license" for more information.

When I install the software with python3 setup.py install, the following error occurs:

$ python3 setup.py install
/usr/lib/python3.6/site-packages/setuptools/dist.py:398: UserWarning: Normalizing 'v1.1' to '1.1'
normalized_version,
running install
error: can't create or remove files in install directory

The following error occurred while trying to add or remove files in the
installation directory:

[Errno 13] Permission denied: '/usr/local/lib/python3.6'

The installation directory you specified (via --install-dir, --prefix, or
the distutils default setting) was:

/usr/local/lib/python3.6/site-packages/

This directory does not currently exist. Please create it and try again, or
choose a different installation directory (using the -d or --install-dir
option).

Super-scaffolding two genome drafts

Would there be a way to use RaGOO to use two draft genomes with already interesting scaffold N50s and create a single super-scaffolded assembly?

Bogus example

Genome 1:
-----  --------------     --------------  ----  ------
Genome 2:
------------- ---- ----------------   ------------
Super scaffolds:
------------------------------------------------------

not authorized `copy' @ error/constitute.c

Hi,
can you explain what might be the issue with the following error message:

$ ragoo.py contigs.fa ref.fa
from: can't read /var/mail/collections
[[Bimport-im6.q16: not authorized `copy' @ error/constitute.c/WriteImage/1037.
from: can't read /var/mail/intervaltree
from: can't read /var/mail/ragoo_utilities.PAFReader
from: can't read /var/mail/ragoo_utilities.SeqReader
from: can't read /var/mail/ragoo_utilities.ContigAlignment
from: can't read /var/mail/ragoo_utilities.ContigAlignment
from: can't read /var/mail/ragoo_utilities.ContigAlignment
from: can't read /var/mail/ragoo_utilities.GFFReader
from: can't read /var/mail/ragoo_utilities.utilities
from: can't read /var/mail/ragoo_utilities.break_chimera
/home/user/.local/bin/ragoo.py: line 16: syntax error near unexpected token `('
/home/user/.local/bin/ragoo.py: line 16: `def write_contig_clusters(unique_dict, thresh, skip_list):'

$ python -V
Python 3.6.7

I'm running ubuntu 18.04
installed ragoo in a python3 env

Thanks in advance!
Ulrike
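
The `from: can't read /var/mail/...` lines and the ImageMagick `import` error are what a shell prints when it executes Python source line by line, because `from` and `import` are also the names of unrelated command-line programs. That suggests the installed script was run by the shell and not by Python, for instance because its first line is not a `#!` interpreter line (an assumption, since the file itself is not shown). A minimal sketch for checking this, with a hypothetical helper name and throwaway demo files:

```python
import os
import tempfile


def has_python_shebang(path):
    """Return True if the file's first line is a '#!' line that mentions python."""
    with open(path, "r", errors="replace") as handle:
        first = handle.readline()
    return first.startswith("#!") and "python" in first


# Demonstration on two throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.py")
    bad = os.path.join(tmp, "bad.py")
    with open(good, "w") as fh:
        fh.write("#!/usr/bin/env python3\nimport collections\n")
    with open(bad, "w") as fh:
        fh.write("from collections import defaultdict\n")
    results = (has_python_shebang(good), has_python_shebang(bad))

print(results)
```

Running the script explicitly through the interpreter, as other posts on this page do with `python ragoo.py ...`, sidesteps the question of the first line.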

Misassembly Correction

Hi,
I would like to use ONT reads with RaGOO's misassembly correction step. We have 7x ONT reads and 50x Illumina paired-end reads. What would be the best way to error-correct these long reads?

Thank you in advance,

Michal

Differentiation Pseudomolecules

Hi,
RaGOO works smoothly and is quite handy.
But I still have a small remark: is there a direct way to know the origin of each piece of a pseudomolecule? In other words, what exactly comes from the reference and what from the supplied new scaffolds?
Maybe by using upper- and lower-case letters, or by converting the remaining reference bases to N?
Many thanks
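
Other posts on this page describe the pseudomolecules as contigs joined by runs of N (see the `-g` gap size in their commands). If that holds for the output in question, the N runs mark the boundaries between supplied sequences. A minimal sketch for locating them, with a hypothetical function name and a toy sequence:

```python
import re


def gap_runs(seq, min_len=1):
    """Return (start, end) half-open, 0-based coordinates of runs of N or n."""
    return [
        match.span()
        for match in re.finditer(r"[Nn]+", seq)
        if match.end() - match.start() >= min_len
    ]


# Toy pseudomolecule: three pieces separated by two 5 bp gaps.
toy = "ACGT" + "N" * 5 + "GGCC" + "N" * 5 + "TT"
gaps = gap_runs(toy)
print(gaps)
```

Setting `min_len` to the gap size used for the run helps ignore short N stretches that were already inside the input contigs.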

RuntimeError when turning on the SV calling option

Hi Michael,
I encountered a RuntimeError when turning on the Assemblytics option to map the pseudomolecules back to the reference and call SVs:
Fri Dec 13 21:10:54 2019 --- Running : minimap2 -ax asm5 --cs -t12 ../.././Zm-Mo17-REFERENCE-CAU-1.0.fa ../ragoo.fasta > pm_against_ref.sam 2> pm_contigs_against_ref.sam.log
/bin/bash: line 1: 35791 Aborted minimap2 -ax asm5 --cs -t12 ../.././Zm-Mo17-REFERENCE-CAU-1.0.fa ../ragoo.fasta > pm_against_ref.sam 2> pm_contigs_against_ref.sam.log
Traceback (most recent call last):
  File "/home/lailab/shijunpeng/software/RaGOO-master/ragoo.py", line 754, in <module>
    align_pms(minimap_path, t, reference_file)
  File "/home/lailab/shijunpeng/software/RaGOO-master/ragoo.py", line 439, in align_pms
    run(cmd)
  File "/home/lailab/shijunpeng/software/RaGOO-master/ragoo_utilities/utilities.py", line 25, in run
    raise RuntimeError('Failed : %s ' % cmnd)
RuntimeError: Failed : minimap2 -ax asm5 --cs -t12 ../.././Zm-Mo17-REFERENCE-CAU-1.0.fa ../ragoo.fasta > pm_against_ref.sam 2> pm_contigs_against_ref.sam.log

Is this possibly due to the --cs option, without a built-in string in the RaGOO scripts, disabling the -t option of minimap2?

Best Regards,
Junpeng

KeyError during liftover

Hi,

I aligned a set of contigs to a reference genome of the same species with RaGOO, and while trying to run the liftover utility afterwards I get a KeyError exception. The commands I used were as follows:

ragoo.py -t 6 -g 100 contigs.fasta reference.fna   # runs OK
ls ragoo_output/orderings/* > orderings.fofn   # runs OK
samtools faidx contigs.fasta   # runs OK
lift_over.py -g 100 reference.genes.gff orderings.fofn contigs.fasta.fai > genes.lifted-ragoo.gff
Traceback (most recent call last):
  File "/Users/pmca/local/biosoft/bin/lift_over.py", line 4, in <module>
    __import__('pkg_resources').run_script('RaGOO==1.2', 'lift_over.py')
  File "/Users/pmca/Software/assemblers/RaGOO_venv/66ef904/lib/python3.6/site-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/Users/pmca/Software/assemblers/RaGOO_venv/66ef904/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1445, in run_script
    exec(script_code, namespace, namespace)
  File "/Users/pmca/Software/assemblers/RaGOO_venv/66ef904/lib/python3.6/site-packages/RaGOO-1.2-py3.6.egg/EGG-INFO/scripts/lift_over.py", line 84, in <module>
KeyError: 'NC_024238.1'

I don't think I swapped the contigs-reference order in the commands, so not sure what might be causing this error.

I also checked the output files and everything seems to be present. This reference scaffold only has a single alignment but I wouldn't guess this to be the cause of the issue.

$ cat reference.fna | grep NC_024238.1
>NC_024238.1 Some species mitochondrion, complete genome
$ cat reference.genes.gff | grep NC_024238.1
##sequence-region   NC_024238.1 1 16538
NC_024238.1	RefSeq	tRNA	1	69	.	+	.	ID=rna48629;gbkey=tRNA;product=tRNA-Phe
NC_024238.1	RefSeq	exon	1	69	.	+	.	ID=id635338;Parent=rna48629;gbkey=tRNA;product=tRNA-Phe
# ... and more GFF lines
$ cat orderings.fofn | grep NC_024238.1
ragoo_output/orderings/NC_024238.1_orderings.txt
$ ls -l ragoo_output/orderings/ | grep NC_024238.1
-rw-r--r--  1 pmca  staff     31  5 Mar 22:10 NC_024238.1_orderings.txt
$ ls -l ragoo_output/groupings/ | grep NC_024238.1
-rw-r--r--  1 pmca  staff      8  5 Mar 22:10 NC_024238.1_contigs.txt
$ head ragoo_output/orderings/NC_024238.1_orderings.txt
41378	-	0.7238458280389666	1.0

Any suggestion on how to solve this?
Many thanks,
Pedro
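
In the command above, the GFF uses reference sequence names (NC_024238.1) while the `.fai` index lists contig names, and an earlier post on this page describes the liftover script as moving annotations "from the input contigs" onto the pseudomolecules. Under that reading, which is an assumption since the thread has no reply, the KeyError could come from looking up a reference name in a table keyed by contig name. A minimal sketch for listing GFF sequence IDs that are absent from the index (helper names are hypothetical):

```python
def gff_seqids(gff_lines):
    """Collect column 1 of every non-comment, non-empty GFF line."""
    return {
        line.split("\t")[0]
        for line in gff_lines
        if line.strip() and not line.startswith("#")
    }


def fai_names(fai_lines):
    """Collect sequence names, the first column of a samtools .fai index."""
    return {line.split("\t")[0] for line in fai_lines if line.strip()}


gff = [
    "##sequence-region   NC_024238.1 1 16538",
    "NC_024238.1\tRefSeq\ttRNA\t1\t69\t.\t+\t.\tID=rna48629",
]
fai = ["41378\t20000\t7\t60\t61"]  # toy index line; the length is made up

unknown = gff_seqids(gff) - fai_names(fai)
print(sorted(unknown))
```

A non-empty result means the annotation file and the index describe different sets of sequences.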

AGP file

Hi,

I was wondering if RaGOO could generate an AGP file describing the location of each sequence after anchoring. I have tried each option but it doesn't seem that an AGP file can be generated. Could you please tell me if this is possible?
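
In the meantime, an AGP file can be derived from the ordering information. The sketch below is not part of RaGOO. It assumes the nine-column AGP 2.x layout, that each orderings file lists contig name and orientation in its first two columns (as in the `NC_024238.1_orderings.txt` excerpt elsewhere on this page), and that contigs are joined by fixed-size gaps. The gap type `scaffold`, linkage `yes`, and evidence `align_genus` are illustrative choices, as are the function and sequence names.

```python
def orderings_to_agp(obj_name, orderings, contig_lengths, gap_size=100):
    """Build AGP rows for one pseudomolecule.

    orderings: list of (contig_name, orientation) in scaffold order.
    contig_lengths: dict mapping contig name to its length in bp.
    """
    rows, pos, part = [], 1, 1
    for i, (name, orient) in enumerate(orderings):
        if i > 0:
            # Gap row between consecutive contigs.
            rows.append([obj_name, pos, pos + gap_size - 1, part,
                         "N", gap_size, "scaffold", "yes", "align_genus"])
            pos += gap_size
            part += 1
        length = contig_lengths[name]
        # Component row: the whole contig, 1-based inclusive coordinates.
        rows.append([obj_name, pos, pos + length - 1, part,
                     "W", name, 1, length, orient])
        pos += length
        part += 1
    return ["\t".join(map(str, row)) for row in rows]


# Toy example: two contigs joined by one 100 bp gap.
agp = orderings_to_agp("Chr1", [("ctgA", "+"), ("ctgB", "-")],
                       {"ctgA": 500, "ctgB": 300})
print("\n".join(agp))
```

Contig lengths can be read from a `samtools faidx` index of the contig FASTA, as used in the liftover posts on this page.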

Scaffolding overlapping contigs

Hi,
I would like to know how RaGOO behaves when, during reference-based scaffolding, it finds that some contigs partially overlap. This might be the result of assembling a highly heterozygous genome. I also have Bionano data that I could use for the scaffolding.
I would also like to ask for your advice: starting from Falcon-Unzip contigs, my idea would be to run purge_haplotigs to remove haplotypic duplication, then run RaGOO, and finally correct inconsistencies using Bionano data, similar to what was done with Hi-C data in the publication. Would you advise changing the order of the last two steps?
After the Bionano scaffolding, the genome is composed of 220 scaffolds, so it is definitely not at chromosome level (19 chromosomes expected in Vitis vinifera). Thanks

Pass arguments to minimap2 (especially `-I` for large genomes)

Dear @malonge ,
many thanks for developing RaGOO. We would like to use it in our research on wheat, but I have hit a (very minor) roadblock: it is not possible to pass extra command-line arguments to minimap2. This is particularly important for wheat because the genome is six times larger than human, which conflicts with the default index size of minimap2 (i.e. 4 Gbp).

I have hard-coded the modification in my copy of the code, but a less brittle solution might be useful for the tool.

Kind regards
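
One less brittle approach is to accept a free-form string of extra options and splice it into the command. A minimal sketch, not RaGOO's actual code: the function name and the `extra_args` parameter are hypothetical, the `-k19 -w19 -t8` options mirror the minimap2 commands in the logs on this page, and `20G` is only an example value for `-I`.

```python
import shlex


def build_minimap2_cmd(reference, query, threads=8, extra_args=""):
    """Assemble a minimap2 command line, appending user-supplied options."""
    cmd = ["minimap2", "-k19", "-w19", "-t%d" % threads]
    cmd += shlex.split(extra_args)  # e.g. "-I 20G" to raise the index batch size
    cmd += [reference, query]
    return cmd


cmd = build_minimap2_cmd("ref.fa", "contigs.fa", threads=8, extra_args="-I 20G")
print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the command as a list until the last moment avoids quoting problems when file names contain spaces.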

Manual order select contigs

Hi Michael
I'm using RaGOO to scaffold very long contigs from assemblies of Illumina-polished ONT reads (~400 contigs in a 400 Mb genome). Since some of the samples have been used in mapping populations, we have genetic linkage maps that we can use as another reference for contig grouping and order.

My question is whether there is some way of 'imposing' scaffolding grouping and order manually for some of the contigs?

Grouping confidence scores definition

Hi,

I'm trying to determine a reasonable grouping confidence score for the placement of scaffolds. The default seems to work OK (-i 0.2), but I'd also like to try with other values. In the preprint you have a section defining the confidence scores (clustering, location and orientation), but on the program -i is actually defined as grouping confidence score. Is this the same as clustering? And could a -i 0.2 be said that the scaffold is covered ~20% of its length in its assigned reference chromosome?

Many thanks,
Pedro
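
One possible reading of the grouping score, which is an assumption here and should be checked against the preprint, is the share of a contig's covered reference bases that fall on its assigned chromosome. Under that reading, `-i 0.2` would be about the share of aligned bases and not about 20% of the contig's length. A minimal sketch with a hypothetical function name and toy numbers:

```python
def grouping_confidence(covered_bp_by_chrom, assigned):
    """Fraction of a contig's covered reference bases on its assigned chromosome."""
    total = sum(covered_bp_by_chrom.values())
    if total == 0:
        return 0.0
    return covered_bp_by_chrom.get(assigned, 0) / total


# Toy coverage for one contig: bases covered on each reference chromosome.
coverage = {"chr1": 8000, "chr2": 1500, "chr3": 500}
score = grouping_confidence(coverage, "chr1")
print(score)
```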

Some general queries

Hi, I intend to use RaGOO in my workflow to build larger scaffolds from the contigs. I have the following general questions for which I did not find answers in the manuscript.

  1. The reference I need to use is a de novo nanopore assembly. I also have 4 genomes of closely related organisms, assembled into contigs from Illumina sequencing. My reference is also fragmented but has a good N50 value. Is it advisable to use only the >1 Mb sequences from my nanopore assembly as a reference and build pseudomolecules from the Illumina data?

  2. Will the N's present in my nanopore assembly affect the scaffolding in any manner ?

  3. Should I be masking (simple and low complexity) both reference and query and proceed or can I carry out the analyses with unmasked sequences?

  4. Will scaffolding be affected if a diverged organism is used as a reference?

Your response will help me a lot as I am new to bioinformatics.
Thanks in advance

reassembling chr0

Hi,
I ran RaGOO, and 2,288,581,979 bp could be mapped to the reference, but 524,739,651 could not be aligned. How would it be possible to use the unmapped contigs and reassemble them?

Thank you in advance,

Michal

anchored position and orient file between the original scaffolds and the final assembly

Hi,

Thank you for your software; RaGOO works well on my genome. I have two questions about using it.

Is there a quick way to generate a file describing the anchored position and orientation of the original scaffolds in the final assembly (i.e. an AGP or GFA file)?

There are many misjoins in my original scaffolds, and I want to use Hi-C scaffolding results (in AGP format) to correct and improve the RaGOO assembly. Is there any software, or do you have any suggestions, for doing this?

Thank you very much.

Best,

He

RaGOO failed Mapping short reads with minimap2 -- RuntimeWarning: Mean of empty slice

Hi @malonge,

I was trying to scaffold an assembly using your great tool. However, I keep getting this warning and the minimap2 mapping is not done at all.

Here is the command I ran using Python 3:

python ../ragoo.py -b -R PE.inter.fq -T sr -t 100 -g 250 -C ../6000flye.ctg.fa ../PFLA.genome.fasta

And here is the error message:

nkal/anaconda3/bin/minimap2 6000flye.ctg.fa  PFLA.genome.fasta 
Sun Jul 28 19:44:09 2019 --- Misassembly correction has been turned on. This automatically inactivates chimeric contig correction.
Sun Jul 28 19:44:09 2019 --- Reading alignments
Sun Jul 28 19:44:09 2019 --- Aligning raw reads to contigs
Sun Jul 28 19:44:09 2019 --- Computing contig coverage
/home/nguinkal/anaconda3/envs/mypython3/lib/python3.7/site-packages/numpy-1.17.0-py3.7-linux-x86_64.egg/numpy/core/fromnumeric.py:3257: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/home/nguinkal/anaconda3/envs/mypython3/lib/python3.7/site-packages/numpy-1.17.0-py3.7-linux-x86_64.egg/numpy/core/_methods.py:161: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)

Best,
Julien

FileNotFoundError: [Errno 2] No such file or directory:

Hi,
I ran into the following error:

ragoo.py outpre.ctg.lay.fa merge.ccs.fasta
Mon Jan 27 17:33:56 2020 --- Reading alignments
Mon Jan 27 17:37:10 2020 --- Assigning contigs
Traceback (most recent call last):
  File "/home/nfhy_huangz/miniconda2/envs/ragoo/bin/ragoo.py", line 4, in <module>
    __import__('pkg_resources').run_script('RaGOO==1.1', 'ragoo.py')
  File "/home/nfhy_huangz/miniconda2/envs/ragoo/lib/python3.8/site-packages/pkg_resources/__init__.py", line 667, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/nfhy_huangz/miniconda2/envs/ragoo/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1470, in run_script
    exec(script_code, namespace, namespace)
  File "/home/nfhy_huangz/miniconda2/envs/ragoo/lib/python3.8/site-packages/RaGOO-1.1-py3.8.egg/EGG-INFO/scripts/ragoo.py", line 744, in <module>
  File "/home/nfhy_huangz/miniconda2/envs/ragoo/lib/python3.8/site-packages/RaGOO-1.1-py3.8.egg/EGG-INFO/scripts/ragoo.py", line 123, in write_contig_clusters
FileNotFoundError: [Errno 2] No such file or directory: 'm64031_200102_132349/172032446/ccs_contigs.txt'

What could possibly have caused the error?

Thanks,
Zhen
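Judging only from the path in the traceback, one possible cause is that a sequence name in the reference contains `/` characters (`m64031_200102_132349/172032446/ccs`), so an output file named after that sequence is treated as a path into directories that do not exist. This is an assumption, not a confirmed diagnosis. If it holds, renaming the FASTA headers before the run might avoid the error. A sketch, where `sanitize_header` and `sanitize_fasta` are hypothetical helpers:

```python
import re

def sanitize_header(line):
    """Make a FASTA header safe to reuse as a file name.

    Keeps only the sequence ID (text before the first whitespace) and
    replaces every character outside [A-Za-z0-9_.-] with '_'.
    Non-header lines are returned unchanged.
    """
    if not line.startswith(">"):
        return line
    parts = line[1:].split()
    ident = parts[0] if parts else ""
    return ">" + re.sub(r"[^A-Za-z0-9_.\-]", "_", ident)

def sanitize_fasta(lines):
    """Yield FASTA lines (without trailing newlines) with sanitized headers."""
    for line in lines:
        yield sanitize_header(line.rstrip("\n"))
```

Two different IDs can collapse to the same sanitized name, so it is worth checking the renamed headers for duplicates before using the file.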

memory error with -R parameter

Dear Sir,
I ran RaGOO with the -R parameter as follows: ragoo.py contigs.fasta reference.fasta -R short reads -T sr -C -g 111. It started well but then reported a MemoryError. I have attached a screenshot with the MemoryError marked in red. Kindly help me solve this problem.
Thanks in advance,

Shakti Kumar
NIAB, Hyderabad, India
[Screenshot: ragoo_error_20190717]

Question about the confidence scores

Hello,
I see that confidence scores are reported for the grouping, localization, and orientation of each contig, and I would like to know more about them.
For example, one contig in the final FASTA file has a location confidence score of 0.03104861142651071 and an orientation confidence score of 0.9638021314266446. (I think this contig belongs to chr Y, which the reference doesn't have, and should not be broken, but it was assigned to chr X and was broken.) Are these scores good or bad, and how can I judge which scores are reliable?
Here it's my command:
ragoo.py -R raw.corrected.fasta -C -m /bin/minimap2 -gff stringtie.generated.gff3 -T corr -t 28 assembly.fa ref.fna
If you can help me, I will be very grateful.

UnicodeDecodeError in filter_gap_SVs.py

Thanks for this awesome tool. I have successfully used it to reorder the contigs in my draft assembly. Now, I am trying to correct the misassemblies and call SVs. This attempt gives the following runtime error:

Traceback (most recent call last):
  File "/home/rkundu/miniconda3/bin/filter_gap_SVs.py", line 4, in <module>
   __import__('pkg_resources').run_script('RaGOO==1.1', 'filter_gap_SVs.py')
  File "/home/rkundu/miniconda3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 666, in run_script
     self.require(requires)[0].run_script(script_name, ns)
   File "/home/rkundu/miniconda3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1453, in run_script
      exec(script_code, namespace, namespace)
   File "/home/rkundu/miniconda3/lib/python3.6/site-packages/RaGOO-1.1-py3.6.egg/EGG-INFO/scripts/filter_gap_SVs.py", line 125, in <module>
   File "/home/rkundu/miniconda3/lib/python3.6/site-packages/RaGOO-1.1-py3.6.egg/EGG-INFO/scripts/filter_gap_SVs.py", line 120, in main
   File "/home/rkundu/miniconda3/lib/python3.6/site-packages/RaGOO-1.1-py3.6.egg/EGG-INFO/scripts/filter_gap_SVs.py", line 27, in make_gaps_tree
   File "/home/rkundu/miniconda3/lib/python3.6/site-packages/RaGOO-1.1-py3.6.egg/ragoo_utilities/SeqReader.py", line 20, in parse_fasta
   File "/home/rkundu/miniconda3/lib/python3.6/codecs.py", line 321, in decode
       (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
/home/rkundu/miniconda3/lib/python3.6/site-packages/numpy/core/fromnumeric.py:3118: RuntimeWarning: Mean of empty slice.
   out=out, **kwargs)
/home/rkundu/miniconda3/lib/python3.6/site-packages/numpy/core/_methods.py:85: RuntimeWarning: invalid value encountered in double_scalars
   ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
   File "/home/rkundu/miniconda3/bin/ragoo.py", line 4, in <module>
     __import__('pkg_resources').run_script('RaGOO==1.1', 'ragoo.py')
   File "/home/rkundu/miniconda3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 666, in run_script
     self.require(requires)[0].run_script(script_name, ns)
   File "/home/rkundu/miniconda3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1453, in run_script
     exec(script_code, namespace, namespace)
   File "/home/rkundu/miniconda3/lib/python3.6/site-packages/RaGOO-1.1-py3.6.egg/EGG-INFO/scripts/ragoo.py", line 749, in <module>
   File "/home/rkundu/miniconda3/lib/python3.6/site-packages/RaGOO-1.1-py3.6.egg/EGG-INFO/scripts/ragoo.py", line 475, in get_SVs
    File "/home/rkundu/miniconda3/lib/python3.6/site-packages/RaGOO-1.1-py3.6.egg/ragoo_utilities/utilities.py", line 25, in run
RuntimeError: Failed : filter_gap_SVs.py ../../GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
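The last line of the traceback shows that the reference was passed as a `.fna.gz` file, and byte `0x8b` at position 1 matches the second byte of the gzip magic number (`0x1f 0x8b`). It therefore looks as if the FASTA parser is reading compressed data as plain text, so decompressing the reference before the run is probably the simplest workaround. A sketch of opening a file transparently in either case follows; `open_maybe_gzip` is a hypothetical helper, not part of RaGOO.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def open_maybe_gzip(path):
    """Open a file in text mode, decompressing it if it is gzip-compressed.

    The decision is based on the file's first two bytes rather than on
    its extension.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt")
    return open(path, "r")
```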

Conda recipe

It would be great to have RaGOO on Conda/Bioconda. That would make it easier to use in Snakemake and would allow analysis versioning. Thx!

Multiple assemblers

Hello, could I use multiple assemblers, concatenate the contigs, and then use RaGOO? Are there advantages to this approach? I am talking about de novo assembly with Illumina reads only.

Question: contig names in output

Hello,

Just a quick question about the contig names in the output ragoo.fasta file - some of them have inherited the reference scaffold names while others remain unchanged (apart from the "_RaGOO" suffix). Is it simply the case that any contigs aligning to a given reference sequence are then stitched together and collectively inherit that reference sequence name?

Thanks in advance, and also for the nice piece of software.

Cheers,
Reuben

Padding at the end of scaffolds

Hi Michael,

  • I am using RaGOO to scaffold a very contiguous assembly. In fact, most chromosomes are present as a single contig in my assembly, so all RaGOO should do in these cases is orient and assign them to the correct reference chromosome.
  • I am curious why RaGOO adds gap padding to the end of these "single-contig scaffolds". It looks like this:
(...)
ACTTCTTGGTGTACGGATGTCTAACTTCTTGGTGTACGGATGTCTAACTTCTTGGTGTACGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
  • These sequence records end in a 100 bp gap despite nothing else being on the other side of the gap.
  • I can easily remove this terminal gap in post-processing but I was curious if this had any purpose.

Thanks,
Guilherme

P.S. RaGOO is very useful and efficient. Great work 👍

RaGOO Chr > reference Chr

Hello,

I am trying to use RaGOO to map a frog draft assembly (4.6G) onto a high-quality assembly (4.4G) of another species that diverged about 30 Mya. RaGOO is doing a great job for chr1-10, but I observe a strange pattern for chr11-13: the constructed chromosomes are larger than the reference chromosomes. I would appreciate any suggestions on how to deal with that.

[Screenshot: Screen Shot 2020-02-06 at 2 49 32 PM]

RaGOO output visualization

Hi Michael!

Thank you for the super-fast RaGOO algorithm!
Do you have any good solutions for quickly visualizing RaGOO output?
Do you have any idea how to visualize contigs_against_ref.paf and the files from /groupings /orderings?
Thank you in advance!

Best regards,
Ural
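One way to look at a PAF file is as a dot plot. The sketch below parses the twelve mandatory PAF columns and turns each alignment into a line segment that any plotting library could draw. It assumes that the contigs are the query and the reference is the target in `contigs_against_ref.paf`; `PafHit`, `parse_paf_line`, and `dotplot_segment` are hypothetical names.

```python
from collections import namedtuple

# The 12 mandatory PAF columns, in order.
PafHit = namedtuple(
    "PafHit",
    "qname qlen qstart qend strand tname tlen tstart tend nmatch alnlen mapq",
)

def parse_paf_line(line):
    """Parse the mandatory columns of one PAF line; optional tags are ignored."""
    f = line.rstrip("\n").split("\t")
    return PafHit(f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
                  f[5], int(f[6]), int(f[7]), int(f[8]),
                  int(f[9]), int(f[10]), int(f[11]))

def dotplot_segment(hit):
    """Return ((x0, y0), (x1, y1)) with the target on x and the query on y.

    Reverse-strand alignments are drawn with a negative slope.
    """
    if hit.strand == "+":
        return (hit.tstart, hit.qstart), (hit.tend, hit.qend)
    return (hit.tstart, hit.qend), (hit.tend, hit.qstart)
```

Filtering hits by `alnlen` or `mapq` before plotting usually keeps the picture readable for large genomes.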

Feature request: keep unassigned contigs separate, rather than gluing them into 'Chr0'?

RaGOO looks wonderful! But ... it assigns unmapped contigs to a pseudochromosome, Chr0.

Would it be possible to have an option to have RaGOO, instead of putting unmapped contigs into 'Chr0', keep them as unligated contig sequences?

It isn't hard to get around the Chr0 issue for DNA sequence alone (just split Chr0 on 'N' residues and voila, free contigs again!). But hacking a lifted-over GFF is considerably harder if it lifts genes to Chr0 and one really wants free contigs instead.
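The sequence-only workaround described above can be sketched as follows. `split_on_gaps` is a hypothetical helper, and the part names it produces are placeholders: the original contig names are not recovered this way.

```python
import re

def split_on_gaps(name, seq, min_gap=1):
    """Split a gap-padded sequence into (name, sequence) pieces at runs of N.

    Only runs of at least min_gap N characters count as boundaries, so
    setting min_gap to the padding size used in the run leaves shorter
    N runs inside the original contigs untouched.
    """
    pieces = [p for p in re.split(r"[Nn]{%d,}" % min_gap, seq) if p]
    return [("%s_part%d" % (name, i), p) for i, p in enumerate(pieces, 1)]
```

For example, `split_on_gaps("Chr0_RaGOO", seq, min_gap=100)` would split only at gaps of 100 bp or longer.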

Maximum number of breaks per contig

Hi @malonge,

I have tested RaGOO with read alignment-based misassembly correction for the first time and noticed that some contigs are broken in more than one place.

Here is an example of a contig that was broken into 4 parts after a single round of RaGOO:

tig00000329|arrow|pilon_misasm_break:0-735947
tig00000329|arrow|pilon_misasm_break:735947-1296254
tig00000329|arrow|pilon_misasm_break:1296254-2273490
tig00000329|arrow|pilon_misasm_break:2273490-3198721

The parameters I used were:

ragoo.py -t 20 -R ../correctedReads.fasta -T corr -i 0.7 -c 1000000 -d 1500000 -g 100 -C "$ASMN" "$REFN"

I was just wondering if the 1 inter- and 1 intra-chromosomal break maximum is only in place if not using -R.

Thanks
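To see how many contigs are affected, the fragment names can be grouped by parent contig. This sketch assumes every broken fragment is named like the example above, that is, `<parent>_misasm_break:<start>-<end>`; `count_breaks` is a hypothetical helper.

```python
from collections import defaultdict

def count_breaks(fragment_names, tag="_misasm_break:"):
    """Return {parent_contig: number_of_breaks} from fragment names.

    A contig split into k fragments has k - 1 breaks. Names that do not
    contain the tag are ignored.
    """
    intervals = defaultdict(list)
    for name in fragment_names:
        if tag not in name:
            continue
        parent, coords = name.rsplit(tag, 1)
        start, end = coords.split("-")
        intervals[parent].append((int(start), int(end)))
    return {parent: len(iv) - 1 for parent, iv in intervals.items()}
```

Applied to the four fragment names listed above, this reports three breaks for `tig00000329|arrow|pilon`.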
