sanger-pathogens / assembly_improvement
Improve the quality of a denovo assembly by scaffolding and gap filling
License: Other
I have pre-assembled contigs from Canu.
When I run the script, it gives me an error: "Please insert a file with contig sequences. You've inserted '/SSPACE/SSPACE-LongRead_v1-1/runs/data/contigs.fasta' which either does not exist or is not filled in"
However, this file contains all the contigs.
Example of a header:
tig00000001 len=1248915 reads=5062 covStat=1331.73 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
TACTAAGCTCCTTACTAGAGGCATCATAATATTATCCCTTTACCTATTAGCATAAGTCTTATTCTTCATTCTTCTACAAAAACCTAAACCATCCAAAAAT
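The error message names two possible causes: the path does not exist as seen by the script, or the file is empty. A minimal sketch for checking both before launching the scaffolder is below. The function name check_fasta is hypothetical and is not part of SSPACE or assembly_improvement; it only assumes a plain-text FASTA file whose records start with '>'.

```python
import os


def check_fasta(path):
    """Return the number of '>' header lines in a FASTA file.

    Raises if the path is not a file, the file is empty, or no header is found.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no such file: {path}")
    if os.path.getsize(path) == 0:
        raise ValueError(f"file is empty: {path}")
    with open(path) as handle:
        n_headers = sum(1 for line in handle if line.startswith(">"))
    if n_headers == 0:
        raise ValueError(f"no FASTA headers ('>') found in: {path}")
    return n_headers
```

Running it on exactly the path quoted in the error (which starts at the filesystem root, /SSPACE/...) shows whether the script is looking in the place you expect.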
ERROR: mummer and/or mgaps returned non-zero
ERROR: Could not parse delta file, nucmer.delta
Hi, I am getting this error while running ABACAS with the following params.
$ perl abacas.1.3.1.pl -b -N -i 50 -v 50 -p nucmer -m -o -r -q
Could anyone suggest where I am going wrong?
error no: 400
ERROR: Could not parse delta file, nucmer.filtered.delta
error no: 402
Use of uninitialized value in addition (+) at abacas.1.3.1.pl line 1001.
Total contigs = 0
Use of uninitialized value $id in print at abacas.1.3.1.pl line 1013.
Use of uninitialized value $id in print at abacas.1.3.1.pl line 1014.
Use of uninitialized value $id in print at abacas.1.3.1.pl line 1015.
Use of uninitialized value $id in print at abacas.1.3.1.pl line 1178.
FINISHED CONTIG ORDERING
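Since the first error says that mummer and/or mgaps returned non-zero, one thing worth ruling out is that the MUMmer executables are not reachable on PATH. The sketch below is only a quick check under that assumption; the tool list is a guess based on the names in the log (mummer, mgaps, nucmer, and delta-filter for the filtered delta file), and missing_tools is a hypothetical helper, not part of ABACAS.

```python
import shutil

# Assumed list of MUMmer executables, taken from the names that appear in the log.
REQUIRED_TOOLS = ["nucmer", "mummer", "mgaps", "delta-filter"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of tool names that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Not found on PATH:", ", ".join(absent))
    else:
        print("All listed tools were found on PATH.")
```

If every tool is found, the cause lies elsewhere, for example in the arguments passed to the script.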
Running ABACAS works fine when the reference is in a subfolder, but tblastx then gives the following error:
This may take several minutes ...
[formatdb] ERROR: Could not open reference.fasta
ERROR: Could not find 'formatdb' for blast
When the reference is moved to the working directory, tblastx works fine.
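Until the path handling is fixed, the observation above suggests a workaround: place a copy of the reference in the working directory and pass only its bare file name. A minimal sketch follows, assuming that copying the file is acceptable; stage_reference is a hypothetical helper and not part of ABACAS.

```python
import os
import shutil


def stage_reference(reference_path, workdir="."):
    """Copy the reference into workdir if needed and return its bare file name."""
    src = os.path.abspath(reference_path)
    dest = os.path.join(os.path.abspath(workdir), os.path.basename(src))
    if src != dest:
        # Only copy when the reference is not already in the working directory.
        shutil.copyfile(src, dest)
    return os.path.basename(dest)
```

The returned name (for example reference.fasta) is what would then be given to ABACAS in place of the subfolder path.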
It looks like maybe the expected input to SSPACE has changed since assembly_improvement was originally released.
Running against the latest SSPACE (SSPACE-STANDARD-3.0_linux-x86_64) produces errors like:
ERROR: Invalid aligner in library LIB: /data/thesisWGS/AW01/_5AzmruAqQ/AW-1_S1_L001_R1_001.fastq. Should be either 'bowtie', 'bwa' or 'bwasw' -- fatal
(There's a similar error message in the code for GapFiller, but I confirmed that at runtime it's coming from SSPACE by adding debug statements to SSPACE to print the path of the library file it's attempting to process)
The current version of AssemblyImprovement installed from CPAN (and also here in git at AssemblyImprovement/Scaffold/SSpace/Config.pm) creates a _scaffolder_config_file that looks like:
$ cat /data/thesisWGS/AW01/8Y4EvoYYIc/KToMu7SX1I/_scaffolder_config_file
LIB /data/thesisWGS/AW01/_5AzmruAqQ/AW-1_S1_L001_R1_001.fastq /data/thesisWGS/AW01/_5AzmruAqQ/AW-1_S1_L001_R2_001.fastq 350 0.3 FR
The first two args are: "LIB" (hardcoded placeholder library name), followed by path to the forward read.
Compare to the current SSPACE example, where the aligner is the second arg:
$ cat ~/tools/SSPACE-STANDARD-3.0_linux-x86_64/example/libraries.txt
lib1 bowtie SRR001665_1.fastq SRR001665_2.fastq 200 0.25 FR
I don't have previous versions of SSPACE to confirm that it changed at some point, but that seems to be the most plausible explanation.
It seems like the most reasonable fix is to add a mapper arg as is used in AssemblyImprovement/FillGaps/GapFiller/Config.pm, also defaulting to "bwa".
I can add a simple pull request to implement that, but I'm hoping someone can shed light on why/when the SSPACE expected params changed so that we don't break backward compatibility.
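To make the difference between the two layouts concrete, here is a small sketch that builds a library line with an optional aligner field. It is written in Python purely for illustration and is not the Perl in Config.pm; the function name sspace_library_line and its parameters are hypothetical. The accepted aligner names are taken from the SSPACE error message above, and leaving the aligner out reproduces the older layout, which is one possible way to stay backward compatible.

```python
# Aligner names as listed in the SSPACE error message.
VALID_ALIGNERS = ("bowtie", "bwa", "bwasw")


def sspace_library_line(name, reads_1, reads_2, insert_size, insert_error,
                        orientation, aligner=None):
    """Build one space-separated library line for the SSPACE config file.

    With aligner=None the older layout (no aligner column) is produced.
    """
    if aligner is not None and aligner not in VALID_ALIGNERS:
        raise ValueError(f"aligner must be one of {VALID_ALIGNERS}, got {aligner!r}")
    fields = [name]
    if aligner is not None:
        fields.append(aligner)
    fields += [reads_1, reads_2, str(insert_size), str(insert_error), orientation]
    return " ".join(fields)


if __name__ == "__main__":
    # Reproduces the example line shipped with SSPACE 3.0.
    print(sspace_library_line("lib1", "SRR001665_1.fastq", "SRR001665_2.fastq",
                              200, 0.25, "FR", aligner="bowtie"))
```

Whether the default should be "bwa" or no aligner at all depends on which SSPACE versions need to keep working, which is the open question above.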
Hi @andrewjpage
I ran assembly improvement against a SPAdes assembly with a virus reference sequence provided on the command line, and I get this error. I then decided to run just the order_contigs_with_abacas script, and it raises the same error. Is this a known issue, or is something wrong with my setup? The reference file was provided in the arguments.
This is with improve_assembly.
For some reason it is unable to locate the given files and arguments.
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Could not read file '/assembly/spades/sample1/improved_no_ref_assembly/ucgdTSn1uJ/scaffolds.scaffolded.gapfilled.length_filtered.sorted.fa_KP317916_1.fasta.fasta': No such file or directory
STACK: Error::throw
STACK: Bio::Root::Root::throw /apps/perl/5.24.0/lib/site_perl/5.24.0/Bio/Root/Root.pm:447
STACK: Bio::Root::IO::_initialize_io /apps/perl/5.24.0/lib/site_perl/5.24.0/Bio/Root/IO.pm:268
STACK: Bio::SeqIO::_initialize /apps/perl/5.24.0/lib/site_perl/5.24.0/Bio/SeqIO.pm:513
STACK: Bio::SeqIO::fasta::_initialize /apps/perl/5.24.0/lib/site_perl/5.24.0/Bio/SeqIO/fasta.pm:87
STACK: Bio::SeqIO::new /apps/perl/5.24.0/lib/site_perl/5.24.0/Bio/SeqIO.pm:389
STACK: Bio::SeqIO::new /apps/perl/5.24.0/lib/site_perl/5.24.0/Bio/SeqIO.pm:435
STACK: Bio::AssemblyImprovement::Abacas::DelimiterRole::_split_sequence_on_delimiter /apps/perl/5.24.0/lib/site_perl/5.24.0/Bio/AssemblyImprovement/Abacas/DelimiterRole.pm:70
STACK: Bio::AssemblyImprovement::Abacas::Main::run /apps/perl/5.24.0/lib/site_perl/5.24.0/Bio/AssemblyImprovement/Abacas/Main.pm:73
STACK: Bio::AssemblyImprovement::Abacas::Iterative::_run_abacas /apps/perl/5.24.0/lib/site_perl/5.24.0/Bio/AssemblyImprovement/Abacas/Iterative.pm:74
STACK: Bio::AssemblyImprovement::Abacas::Iterative::run /apps/perl/5.24.0/lib/site_perl/5.24.0/Bio/AssemblyImprovement/Abacas/Iterative.pm:46
STACK: /apps/perl/5.24.0/bin/order_contigs_with_abacas:57
-----------------------------------------------------------
I had provided all the arguments with the correct paths to where these files are located.
I have been using AssemblyImprover following VelvetOptimizer for 36 bacterial genomes. For the majority of the genomes, I generated the expected output files:
contigs.fa.scaffolded.filtered
contigs.fa.scaffolded.gapfilled.filtered
scaffolds.scaffolded.gapfilled.length_filtered.fa
scaffolds.scaffolded.gapfilled.length_filtered.sorted.fa
But for a handful, I get the following files:
contigs.scaffolded.fa
contigs.scaffolded.gapfilled.fa
scaffolds.scaffolded.gapfilled.length_filtered.fa
scaffolds.scaffolded.gapfilled.length_filtered.sorted.fa
What is the difference? Are these unexpected files usable? I cannot find any documentation about this discrepancy, so anything you can tell me would be helpful.
Dear @andrewjpage
I've been trying for a couple of days to properly install and run your assembly improvement pipeline.
I installed all dependencies highlighted by cpanm -f Bio::AssemblyImprovement, and testing with dzil test runs smoothly within the cloned GitHub folder.
However, when I run it as follows (and only if I decide to include a reference with -c):
$ improve_assembly -a denovo_assembly/complete/KB10_trim_contigs.fasta -f clinic1/raw_reads/KB10_R1.fastq.gz -r clinic1/raw_reads/KB10_R2.fastq.gz -s bin/SSPACE-STANDARD-3.0_linux-x86_64/SSPACE_Standard_v3.0.pl -g bin/GapFiller_v1-10_linux-x86_64/GapFiller.pl -o improve_assembly_KB10/ -c references/polyoma_dunlop_reference.fasta
This reports back the following error:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Could not read file '/home/jmarti/polyoma/7wWtvgPPJ7/KB10_trim_contigs.scaffolded.fasta_polyoma_dunlop_reference.fasta.fasta': No such file or directory
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/jmarti/perl5/lib/perl5/Bio/Root/Root.pm:449
STACK: Bio::Root::IO::_initialize_io /home/jmarti/perl5/lib/perl5/Bio/Root/IO.pm:272
STACK: Bio::SeqIO::_initialize /home/jmarti/perl5/lib/perl5/Bio/SeqIO.pm:508
STACK: Bio::SeqIO::fasta::_initialize /home/jmarti/perl5/lib/perl5/Bio/SeqIO/fasta.pm:88
STACK: Bio::SeqIO::new /home/jmarti/perl5/lib/perl5/Bio/SeqIO.pm:384
STACK: Bio::SeqIO::new /home/jmarti/perl5/lib/perl5/Bio/SeqIO.pm:430
STACK: Bio::AssemblyImprovement::Abacas::DelimiterRole::_split_sequence_on_delimiter /home/jmarti/perl5/lib/perl5/Bio/AssemblyImprovement/Abacas/DelimiterRole.pm:70
STACK: Bio::AssemblyImprovement::Abacas::Main::run /home/jmarti/perl5/lib/perl5/Bio/AssemblyImprovement/Abacas/Main.pm:73
STACK: Bio::AssemblyImprovement::Abacas::Iterative::_run_abacas /home/jmarti/perl5/lib/perl5/Bio/AssemblyImprovement/Abacas/Iterative.pm:74
STACK: Bio::AssemblyImprovement::Abacas::Iterative::run /home/jmarti/perl5/lib/perl5/Bio/AssemblyImprovement/Abacas/Iterative.pm:46
STACK: /home/jmarti/perl5/bin/improve_assembly:142
-----------------------------------------------------------
I have the impression that /home/jmarti/polyoma/7wWtvgPPJ7/KB10_trim_contigs.scaffolded.fasta_polyoma_dunlop_reference.fasta.fasta merges the names of both my assembly file and my reference. Is it a bug? Is it supposed to create an intermediate file like this? Is there an error with the requirements?
Excited to get this pipeline to work. Thank you very much!
~ Joan ~
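The missing file in both stack traces in this thread follows the same pattern: the query file name, an underscore, the reference file name, and a trailing .fasta. The sketch below reproduces that pattern, assuming this is how the name is built; the function is hypothetical and not taken from the pipeline code.

```python
import os


def expected_ordered_name(query_path, reference_path):
    """Reproduce the observed '<query>_<reference>.fasta' naming pattern."""
    query = os.path.basename(query_path)
    reference = os.path.basename(reference_path)
    return f"{query}_{reference}.fasta"


if __name__ == "__main__":
    print(expected_ordered_name("KB10_trim_contigs.scaffolded.fasta",
                                "references/polyoma_dunlop_reference.fasta"))
```

If that assumption holds, the merged name is an expected intermediate rather than a corrupted path, and the exception may mean that the contig ordering step did not produce its output. That is an inference from the file names only, so it is worth checking whether ABACAS and its dependencies run on their own.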