sanger-pathogens / assembly_improvement

Improve the quality of a denovo assembly by scaffolding and gap filling

License: Other

Perl 99.24% Java 0.67% Python 0.09%
genomics sequencing next-generation-sequencing research bioinformatics bioinformatics-pipeline global-health infectious-diseases pathogen

assembly_improvement's People

Contributors

andrewjpage, aslett1, bewt85, craigporter, kpepper, martinghunt, seretol, ssjunnebo, vaofford

assembly_improvement's Issues

Trying to use SSpace-Long Read

I have pre-assembled contigs from Canu.

When I run the script, it gives me this error: Please insert a file with contig sequences. You've inserted '/SSPACE/SSPACE-LongRead_v1-1/runs/data/contigs.fasta' which either does not exist or is not filled in

However, this is a file that contains all the contigs.

Example header:

tig00000001 len=1248915 reads=5062 covStat=1331.73 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
TACTAAGCTCCTTACTAGAGGCATCATAATATTATCCCTTTACCTATTAGCATAAGTCTTATTCTTCATTCTTCTACAAAAACCTAAACCATCCAAAAAT
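
The error text says the file "either does not exist or is not filled in", so a quick independent check of those two conditions may help narrow things down. Below is a minimal sketch in Python, not part of SSPACE-LongRead; the function name `check_contigs_file` is hypothetical. It also checks that the first line begins with `>`, since the header as pasted above shows no leading `>` (which may only be a rendering artifact of this page).

```python
import os


def check_contigs_file(path):
    """Return a list of human-readable problems found with a FASTA file.

    An empty list means none of the basic checks failed. This is a
    hypothetical helper for diagnosis, not code taken from SSPACE.
    """
    if not os.path.isfile(path):
        return [f"{path} does not exist or is not a regular file"]
    if os.path.getsize(path) == 0:
        return [f"{path} is empty"]

    problems = []
    with open(path) as handle:
        first_line = handle.readline().rstrip("\n")
    if not first_line.startswith(">"):
        problems.append("first line does not start with '>' (not a FASTA header)")
    return problems
```

Calling `check_contigs_file` with exactly the path string passed to the script, from the same working directory, shows whether the path resolves the way the script sees it.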

Mummer error

ERROR: mummer and/or mgaps returned non-zero
ERROR: Could not parse delta file, nucmer.delta
Hi, I am getting this error while running ABACAS with the following parameters:
$ perl abacas.1.3.1.pl -b -N -i 50 -v 50 -p nucmer -m -o -r -q

Could anyone suggest where I am going wrong?
error no: 400
ERROR: Could not parse delta file, nucmer.filtered.delta
error no: 402
Use of uninitialized value in addition (+) at abacas.1.3.1.pl line 1001.
Total contigs = 0
Use of uninitialized value $id in print at abacas.1.3.1.pl line 1013.
Use of uninitialized value $id in print at abacas.1.3.1.pl line 1014.
Use of uninitialized value $id in print at abacas.1.3.1.pl line 1015.
Use of uninitialized value $id in print at abacas.1.3.1.pl line 1178.
FINISHED CONTIG ORDERING
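
One thing that can be ruled out quickly is whether the MUMmer executables named in the messages above are reachable at all. The sketch below only checks `PATH`; it does not claim that a missing tool is the cause here. The function name `missing_tools` is hypothetical, and the tool names are taken from the error text.

```python
import shutil


def missing_tools(names):
    """Return the subset of executable names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Tool names taken from the error messages above.
    print(missing_tools(["nucmer", "mummer", "mgaps"]))
```

An empty list means all three were found, in which case the inputs passed to the run are the next thing to inspect.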

Option -t tblastx cannot open the reference when it is in a subfolder

Running abacas works fine when the reference is in a subfolder, but tblastx gives the following error:

This may take several minutes ...
[formatdb] ERROR: Could not open reference.fasta

ERROR: Could not find 'formatdb' for blast

When the reference is moved to the working directory, tblastx works fine.
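
Since the run succeeds when the reference sits in the working directory, one possible workaround is to link the file there before running and pass only the bare file name. This is a minimal sketch assuming a POSIX system where symlinks are allowed; `link_into_cwd` is a hypothetical helper, not part of ABACAS.

```python
import os


def link_into_cwd(reference_path, workdir="."):
    """Symlink a reference into workdir and return its bare file name.

    If a file with that name already exists in workdir, nothing is changed.
    """
    name = os.path.basename(reference_path)
    target = os.path.join(workdir, name)
    if not os.path.exists(target):
        os.symlink(os.path.abspath(reference_path), target)
    return name
```

The returned name (for example `reference.fasta`) can then be given to the tool in place of the subfolder path.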

Call to SSPACE dies due to invalid config file (expects aligner as second argument)

It looks like maybe the expected input to SSPACE has changed since assembly_improvement was originally released.

Running against the latest SSPACE (SSPACE-STANDARD-3.0_linux-x86_64) produces errors like:

ERROR: Invalid aligner in library LIB: /data/thesisWGS/AW01/_5AzmruAqQ/AW-1_S1_L001_R1_001.fastq. Should be either 'bowtie', 'bwa' or 'bwasw' -- fatal

(There's a similar error message in the code for GapFiller, but I confirmed that at runtime it's coming from SSPACE by adding debug statements to SSPACE to print the path of the library file it's attempting to process)

The current version of AssemblyImprovement installed from CPAN (and also here in git at AssemblyImprovement/Scaffold/SSpace/Config.pm) creates a _scaffolder_config_file that looks like:

$ cat /data/thesisWGS/AW01/8Y4EvoYYIc/KToMu7SX1I/_scaffolder_config_file
LIB /data/thesisWGS/AW01/_5AzmruAqQ/AW-1_S1_L001_R1_001.fastq /data/thesisWGS/AW01/_5AzmruAqQ/AW-1_S1_L001_R2_001.fastq 350 0.3 FR

The first two args are: "LIB" (hardcoded placeholder library name), followed by path to the forward read.

Compare to the current SSPACE example, where the aligner is the second arg:

$ cat ~/tools/SSPACE-STANDARD-3.0_linux-x86_64/example/libraries.txt
lib1 bowtie SRR001665_1.fastq SRR001665_2.fastq 200 0.25 FR

I don't have previous versions of SSPACE to confirm that it changed at some point, but that seems to be the most plausible explanation.

It seems like the most reasonable fix is to add a mapper arg as is used in AssemblyImprovement/FillGaps/GapFiller/Config.pm, also defaulting to "bwa".

I can add a simple pull request to implement that, but I'm hoping someone can shed light on why/when the SSPACE expected params changed so that we don't break backward compatibility.
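
To make the difference between the two layouts concrete, here is a minimal sketch in Python (the real module is Perl) of a config-line builder with an optional aligner field. The function name `sspace_library_line` and its parameters are hypothetical; the field order and the accepted aligner names come from the two example lines and the error message quoted above.

```python
# Aligner names listed in the SSPACE error message above.
VALID_ALIGNERS = ("bowtie", "bwa", "bwasw")


def sspace_library_line(forward, reverse, insert_size, error, orientation,
                        name="LIB", aligner=None):
    """Build one line of an SSPACE library file.

    With aligner=None the older layout (no aligner field) is produced;
    otherwise the aligner is placed second, as in the SSPACE 3.0 example.
    """
    if aligner is not None and aligner not in VALID_ALIGNERS:
        raise ValueError(f"aligner must be one of {VALID_ALIGNERS}, got {aligner!r}")
    fields = [name]
    if aligner is not None:
        fields.append(aligner)
    fields += [forward, reverse, str(insert_size), str(error), orientation]
    return " ".join(fields)
```

Leaving the aligner optional is one way to keep the old output for older SSPACE releases; defaulting it to "bwa", as suggested above, would instead always emit the newer layout.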

Could not read file (reference file) error

Hi @andrewjpage
I ran assembly improvement against a SPAdes assembly with a virus reference sequence provided on the command line, and I get this error. I then ran just the order_contigs_with_abacas script, and it raises the same error. Is this a known issue, or is something wrong with my setup? The reference file was provided in the arguments.

This is with improve_assembly. For some reason it is unable to locate the given files and arguments.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Could not read file '/assembly/spades/sample1/improved_no_ref_assembly/ucgdTSn1uJ/scaffolds.scaffolded.gapfilled.length_filtered.sorted.fa_KP317916_1.fasta.fasta': No such file or directory
STACK: Error::throw
STACK: Bio::Root::Root::throw /apps/perl/5.24.0/lib/site_perl/5.24.0/Bio/Root/Root.pm:447
STACK: Bio::Root::IO::_initialize_io /apps/perl/5.24.0/lib/site_perl/5.24.0/Bio/Root/IO.pm:268
STACK: Bio::SeqIO::_initialize /apps/perl/5.24.0/lib/site_perl/5.24.0/Bio/SeqIO.pm:513
STACK: Bio::SeqIO::fasta::_initialize /apps/perl/5.24.0/lib/site_perl/5.24.0/Bio/SeqIO/fasta.pm:87
STACK: Bio::SeqIO::new /apps/perl/5.24.0/lib/site_perl/5.24.0/Bio/SeqIO.pm:389
STACK: Bio::SeqIO::new /apps/perl/5.24.0/lib/site_perl/5.24.0/Bio/SeqIO.pm:435
STACK: Bio::AssemblyImprovement::Abacas::DelimiterRole::_split_sequence_on_delimiter /apps/perl/5.24.0/lib/site_perl/5.24.0/Bio/AssemblyImprovement/Abacas/DelimiterRole.pm:70
STACK: Bio::AssemblyImprovement::Abacas::Main::run /apps/perl/5.24.0/lib/site_perl/5.24.0/Bio/AssemblyImprovement/Abacas/Main.pm:73
STACK: Bio::AssemblyImprovement::Abacas::Iterative::_run_abacas /apps/perl/5.24.0/lib/site_perl/5.24.0/Bio/AssemblyImprovement/Abacas/Iterative.pm:74
STACK: Bio::AssemblyImprovement::Abacas::Iterative::run /apps/perl/5.24.0/lib/site_perl/5.24.0/Bio/AssemblyImprovement/Abacas/Iterative.pm:46
STACK: /apps/perl/5.24.0/bin/order_contigs_with_abacas:57
-----------------------------------------------------------

I had provided all the arguments with the correct paths to where these files are located.

Question about output

I have been using AssemblyImprover following VelvetOptimizer for 36 bacterial genomes. For the majority of the genomes, I generated the expected output files:
contigs.fa.scaffolded.filtered
contigs.fa.scaffolded.gapfilled.filtered
scaffolds.scaffolded.gapfilled.length_filtered.fa
scaffolds.scaffolded.gapfilled.length_filtered.sorted.fa

But for a handful, I get the following files:
contigs.scaffolded.fa
contigs.scaffolded.gapfilled.fa
scaffolds.scaffolded.gapfilled.length_filtered.fa
scaffolds.scaffolded.gapfilled.length_filtered.sorted.fa

What is the difference? Are these unexpected files usable? I cannot find any documentation about this discrepancy, so anything you can tell me would be helpful.

Intermediate file not found?

Dear @andrewjpage

I've been trying for a couple of days now to properly install and run your assembly improvement pipeline.

I installed all the dependencies highlighted by cpanm -f Bio::AssemblyImprovement, and dzil test runs smoothly within the cloned GitHub folder.

However, when I run it as follows (the problem occurs only if I include a reference with -c):

$ improve_assembly -a denovo_assembly/complete/KB10_trim_contigs.fasta -f clinic1/raw_reads/KB10_R1.fastq.gz -r clinic1/raw_reads/KB10_R2.fastq.gz -s bin/SSPACE-STANDARD-3.0_linux-x86_64/SSPACE_Standard_v3.0.pl -g bin/GapFiller_v1-10_linux-x86_64/GapFiller.pl  -o improve_assembly_KB10/   -c references/polyoma_dunlop_reference.fasta

It reports the following error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Could not read file '/home/jmarti/polyoma/7wWtvgPPJ7/KB10_trim_contigs.scaffolded.fasta_polyoma_dunlop_reference.fasta.fasta': No such file or directory
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/jmarti/perl5/lib/perl5/Bio/Root/Root.pm:449
STACK: Bio::Root::IO::_initialize_io /home/jmarti/perl5/lib/perl5/Bio/Root/IO.pm:272
STACK: Bio::SeqIO::_initialize /home/jmarti/perl5/lib/perl5/Bio/SeqIO.pm:508
STACK: Bio::SeqIO::fasta::_initialize /home/jmarti/perl5/lib/perl5/Bio/SeqIO/fasta.pm:88
STACK: Bio::SeqIO::new /home/jmarti/perl5/lib/perl5/Bio/SeqIO.pm:384
STACK: Bio::SeqIO::new /home/jmarti/perl5/lib/perl5/Bio/SeqIO.pm:430
STACK: Bio::AssemblyImprovement::Abacas::DelimiterRole::_split_sequence_on_delimiter /home/jmarti/perl5/lib/perl5/Bio/AssemblyImprovement/Abacas/DelimiterRole.pm:70
STACK: Bio::AssemblyImprovement::Abacas::Main::run /home/jmarti/perl5/lib/perl5/Bio/AssemblyImprovement/Abacas/Main.pm:73
STACK: Bio::AssemblyImprovement::Abacas::Iterative::_run_abacas /home/jmarti/perl5/lib/perl5/Bio/AssemblyImprovement/Abacas/Iterative.pm:74
STACK: Bio::AssemblyImprovement::Abacas::Iterative::run /home/jmarti/perl5/lib/perl5/Bio/AssemblyImprovement/Abacas/Iterative.pm:46
STACK: /home/jmarti/perl5/bin/improve_assembly:142
-----------------------------------------------------------

I have the impression that the path /home/jmarti/polyoma/7wWtvgPPJ7/KB10_trim_contigs.scaffolded.fasta_polyoma_dunlop_reference.fasta.fasta merges the names of both my assembly file and my reference. Is this a bug? Is an intermediate file like this supposed to exist? Or is there a problem with the requirements?

Excited to get this pipeline working. Thank you very much!

~ Joan ~
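
The missing path in this report and the one in the "Could not read file (reference file) error" issue above follow the same visible pattern: the contigs file name, an underscore, the reference file name, then ".fasta". The sketch below reproduces only that naming pattern, inferred from the two reported paths and not from the module's source; `combined_output_name` is a hypothetical name.

```python
import os


def combined_output_name(contigs_path, reference_path):
    """Return '<contigs file name>_<reference file name>.fasta'.

    This mirrors the file names seen in the two error messages; it is an
    inference from those paths, not the pipeline's documented behaviour.
    """
    contigs = os.path.basename(contigs_path)
    reference = os.path.basename(reference_path)
    return f"{contigs}_{reference}.fasta"
```

If the pattern holds, the merged name is an expected intermediate output rather than a mangled input path, and the error would mean that the step which should have written that file produced nothing.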
