
Rapid large-scale prokaryote pan genome analysis

Home Page: http://sanger-pathogens.github.io/Roary


Roary - The pan genome pipeline

Takes annotated assemblies in GFF3 format and calculates the pan genome.

PLEASE NOTE: we currently do not have the resources to provide support for Roary, so please do not expect a reply if you flag any issue.

Introduction
Installation
Usage
License
Feedback/Issues
Citation
Further Information

Introduction

Roary is a high-speed, stand-alone pan genome pipeline that takes annotated assemblies in GFF3 format (produced by Prokka) and calculates the pan genome. Using a standard desktop PC, it can analyse datasets with thousands of samples, something which is computationally infeasible with existing methods, without compromising the quality of the results. 128 samples can be analysed in under 1 hour using 1 GB of RAM and a single processor. To perform this analysis using existing methods would take weeks and hundreds of GB of RAM.

Installation

Roary has the following dependencies:

Required dependencies

Optional dependencies

kraken

There are a number of ways to install Roary and details are provided below. If you encounter an issue when installing Roary please contact your local system administrator.

Ubuntu/Debian

Debian Testing

sudo apt-get install roary

Ubuntu 14.04/16.04

All the dependencies can be installed using apt and cpanm. Root permissions are required. Ubuntu 16.04 contains a package for Roary but it is frozen at v3.6.0.

sudo apt-get install bedtools cd-hit ncbi-blast+ mcl parallel cpanminus prank mafft fasttree
sudo cpanm -f Bio::Roary

Ubuntu 12.04

Some of the software versions in apt are quite old so follow the instructions for Bioconda below.

Bioconda - OSX/Linux

Install conda. Then install bioconda and roary:

conda config --add channels r
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda install roary

Galaxy

Roary is available from the Galaxy toolshed (as is Prokka).

GNU Guix

Roary is included in Guix and can be installed in the usual way:

guix package --install roary

Virtual Machine - OSX/Linux/Windows

Roary won't run natively on Windows, but we have created a virtual machine which has all of the software set up, including Prokka, along with the test datasets from the paper. It is based on Bio-Linux 8. You first need to install VirtualBox, then load the virtual machine using the 'File -> Import Appliance' menu option. The root password is 'manager'.

ftp://ftp.sanger.ac.uk/pub/pathogens/pathogens-vm/pathogens-vm.latest.ova

More importantly though, if you're trying to do bioinformatics on Windows, you're not going to get very far and you should seriously consider upgrading to Linux.

Docker - OSX/Linux/Windows/Cloud

We have a docker container which gets automatically built from the latest version of Roary in Debian Med. To install it:

docker pull sangerpathogens/roary

To use it you would use a command such as this (substituting in your directories), where your GFF files are assumed to be stored in /home/ubuntu/data:

docker run --rm -it -v /home/ubuntu/data:/data sangerpathogens/roary roary -f /data /data/*.gff

Installing from source (advanced Linux users only)

As a last resort you can install everything from source. This is for users with advanced Linux skills and we do not provide any support with this method since you have the skills to figure things out. Download the latest software from (https://github.com/sanger-pathogens/Roary/tarball/master).

Choose somewhere to put it, for example in your home directory (no root access required):

cd $HOME
tar zxvf sanger-pathogens-Roary-xxxxxx.tar.gz
ls Roary-*

Add the following lines to your $HOME/.bashrc file, or to /etc/profile.d/roary.sh to make it available to all users:

export PATH=$PATH:$HOME/Roary-x.x.x/bin
export PERL5LIB=$PERL5LIB:$HOME/Roary-x.x.x/lib

Install the Perl dependencies:

sudo cpanm  Array::Utils Bio::Perl Exception::Class File::Basename File::Copy File::Find::Rule File::Grep File::Path File::Slurper File::Spec File::Temp File::Which FindBin Getopt::Long Graph Graph::Writer::Dot List::Util Log::Log4perl Moose Moose::Role Text::CSV PerlIO::utf8_strict Devel::OverloadInfo Digest::MD5::File

Install the external dependencies either from source or from your packaging system:

bedtools cd-hit blast mcl GNUparallel prank mafft fasttree

Ancient systems and versions of perl

The code will not work with perl 5.8 or below (pre-modern perl). We no longer test against 5.10 (released 2007) or 5.12 (released 2010). If you're running a very old version of Linux, you're also in trouble.

Running the tests

The tests can be run with dzil from the top level directory:

dzil test

Versions of software we test against

Perl 5.14, 5.26
cdhit 4.6.8
ncbi blast+ 2.6.0
mcl 14-137
bedtools 2.27.1
prank 140603
GNU parallel 20170822, 20160722
FastTree 2.1.9

Usage

Usage:   roary [options] *.gff

Options: -p INT    number of threads [1]
         -o STR    clusters output filename [clustered_proteins]
         -f STR    output directory [.]
         -e        create a multiFASTA alignment of core genes using PRANK
         -n        fast core gene alignment with MAFFT, use with -e
         -i        minimum percentage identity for blastp [95]
         -cd FLOAT percentage of isolates a gene must be in to be core [99]
         -qc       generate QC report with Kraken
         -k STR    path to Kraken database for QC, use with -qc
         -a        check dependancies and print versions
         -b STR    blastp executable [blastp]
         -c STR    mcl executable [mcl]
         -d STR    mcxdeblast executable [mcxdeblast]
         -g INT    maximum number of clusters [50000]
         -m STR    makeblastdb executable [makeblastdb]
         -r        create R plots, requires R and ggplot2
         -s        dont split paralogs
         -t INT    translation table [11]
         -ap       allow paralogs in core alignment
         -z        dont delete intermediate files
         -v        verbose output to STDOUT
         -w        print version and exit
         -y        add gene inference information to spreadsheet, doesnt work with -e
         -iv STR   Change the MCL inflation value [1.5]
         -h        this help message

Example: Quickly generate a core gene alignment using 8 threads
         roary -e --mafft -p 8 *.gff

For further info see: http://sanger-pathogens.github.io/Roary/

For further instructions on how to use the software, the input format and output formats, please see the Roary website.
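As an illustration of the -cd option above (percentage of isolates a gene must be in to be core, default 99), here is a minimal sketch of such a threshold check. The function name is hypothetical, and the use of an inclusive comparison (>=) is an assumption; it is not taken from the Roary source.

```python
def is_core(n_present, n_isolates, core_definition=99.0):
    """Return True if a gene is present in at least core_definition percent of isolates.

    Hypothetical helper: mirrors the meaning of the -cd option described in the
    usage text; the inclusive comparison is an assumption.
    """
    if n_isolates <= 0:
        raise ValueError("n_isolates must be positive")
    return 100.0 * n_present / n_isolates >= core_definition


if __name__ == "__main__":
    # With 128 isolates and the default threshold, a gene may be missing
    # from at most one isolate and still count as core.
    for present in (128, 127, 126):
        print(present, is_core(present, 128))
```

Under this reading, lowering the threshold (for example -cd 95) admits genes that are absent from a few more isolates.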

License

Roary is free software, licensed under GPLv3.

Feedback/Issues

We currently do not have the resources to provide support for Roary. However, the community might be able to help you out if you report any issues about usage of the software to the issues page.

Citation

If you use this software please cite:

"Roary: Rapid large-scale prokaryote pan genome analysis",
Andrew J. Page, Carla A. Cummins, Martin Hunt, Vanessa K. Wong, Sandra Reuter, Matthew T. G. Holden, Maria Fookes, Daniel Falush, Jacqueline A. Keane, Julian Parkhill,
Bioinformatics, (2015). doi: http://dx.doi.org/10.1093/bioinformatics/btv421


Further Information

For more information on this software see:


roary's Issues

Add support for GFF files from NCBI

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.gff.gz

Use of uninitialized value in require at (eval ..) line 1.

Hi!

I get this recurring error message, also in the newest version.

I ran with -v option to get a bit more info on when it is happening and it seems to be in the end of the FastTree part.

2015/11/13 09:40:24 Running command: /usr/local/bin/FastTree -fastest -nt accessory_binary_genes.fa > accessory_binary_genes.fa.newick
FastTree Version 2.1.8 SSE3, OpenMP (4 threads)
Alignment: accessory_binary_genes.fa
Nucleotide distances: Jukes-Cantor Joins: balanced Support: SH-like 1000
Search: Fastest+2nd +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.50
ML Model: Jukes-Cantor, CAT approximation with 20 rate categories
Initial topology in 0.00 seconds
Refining topology: 0 rounds ME-NNIs, 2 rounds ME-SPRs, 0 rounds ML-NNIs
Total branch-length 0.000 after 0.00 sec
Total time: 0.00 seconds Unique: 1/16 Bad splits: 0/0
Aligning each cluster
Use of uninitialized value in require at (eval 1957) line 1.

The next thing that happens is:
2015/11/13 09:41:08 Running command: protein_alignment_from_nucleotides -v pan_genome_sequences/acdA.fa pan_genome_sequences/ackA.fa pan_genome_sequences/acpP.fa pan_genome_sequences/acyP.fa pan_genome_sequences/addA.fa pan_genome_sequences/adh.fa pan_genome_sequences/adk.fa pan_genome_sequences/ahpC.fa pan_genome_sequences/alaS.fa pan_genome_sequences/alr.fa

After this it continues to run and finishes without further complaints and I wonder if the error is something that one should be concerned about?

Best wishes,
Kaisa

Annotation missing in set_difference_unique_set_one/two_statistics.csv files

The files generated by the "query_pan_genome -a difference --input_set_one 1.gff,2.gff --input_set_two 3.gff,4.gff,5.gff -g clustered_proteins" (set_difference_unique_set_one/two_statistics.csv) do not have the information in the annotation column, which the gene_presence_absence.csv does have. It would be helpful if they do, is that possible?

Syntax (?) errors on perl 5.10.1

Hi,

on our cluster setup we are using perl version 5.10.1, which gives the following errors when running

roary

perl ~/PATH_TO_ROARY/roary

Type of arg 1 to values must be hash (not hash element) at [...]/perl5/lib/perl5/Bio/Roary/AnnotateGroups.pm line 106, near "} )"
BEGIN not safe after errors--compilation aborted at [...]/perl5/lib/perl5/Bio/Roary/AnnotateGroups.pm line 312.
Compilation failed in require at [...]/perl5/lib/perl5/Bio/Roary.pm line 15.
BEGIN failed--compilation aborted at [...]/perl5/lib/perl5/Bio/Roary.pm line 15.
Compilation failed in require at [...]/perl5/lib/perl5/Bio/Roary/CommandLine/Roary.pm line 8.
BEGIN failed--compilation aborted at [...]/perl5/lib/perl5/Bio/Roary/CommandLine/Roary.pm line 8.
Compilation failed in require at [...]/perl5/bin/roary line 12.
BEGIN failed--compilation aborted at [...]/perl5/bin/roary line 12.

Thanks a lot,
Marco

Add embl output file mapping location of each core gene in the core genome alignment

This is not an issue, just a request for a future release. It would be great if there was an additional output embl that mapped the core genes to the core_gene_alignment.aln file. Alternatively, this could be output in the gene_presence_absence.csv (e.g. core gene = yes/no, position in aln = 1..1469).
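The requested mapping can be sketched as follows, assuming the core alignment is a plain concatenation of per-gene alignments in a known order (an assumption; the request does not say how core_gene_alignment.aln is built). The gene names and lengths in the demo are hypothetical.

```python
def core_gene_coordinates(genes):
    """Map each core gene to 1-based, inclusive coordinates in a concatenated alignment.

    genes: ordered iterable of (gene_name, aligned_length) pairs.
    Assumes genes are concatenated back to back with no separators.
    """
    coords = []
    start = 1
    for name, length in genes:
        if length <= 0:
            raise ValueError(f"aligned length for {name} must be positive")
        end = start + length - 1
        coords.append((name, start, end))
        start = end + 1
    return coords


if __name__ == "__main__":
    # Hypothetical genes; the first spans 1..1469 as in the request's example.
    for name, start, end in core_gene_coordinates([("geneA", 1469), ("geneB", 300)]):
        print(f"{name}\t{start}..{end}")
```

The same (start, end) pairs could be written either as features in an EMBL file or as an extra column in gene_presence_absence.csv.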

Print out the version number

Need protein lengths in the final spreadsheet

Some idea of average or min/max length of proteins in each cluster would be helpful

Even perhaps %id (amino?) level.

Torst
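A rough sketch of the summary being asked for, assuming the protein sequences of each cluster are already available in memory as strings. The function name and the input layout are hypothetical and not part of Roary.

```python
from statistics import mean


def cluster_length_stats(clusters):
    """Compute min, max and mean protein length for each cluster.

    clusters: dict mapping a cluster name to a list of protein sequences.
    Empty clusters are skipped.
    """
    stats = {}
    for name, sequences in clusters.items():
        lengths = [len(seq) for seq in sequences]
        if not lengths:
            continue
        stats[name] = {
            "min": min(lengths),
            "max": max(lengths),
            "mean": mean(lengths),
        }
    return stats


if __name__ == "__main__":
    # Toy sequences for illustration only.
    example = {"group_1": ["MKT", "MKTA", "MKTAA"]}
    print(cluster_length_stats(example))
```

A large gap between the minimum and maximum length in a cluster could flag fragments or fused genes worth inspecting by hand.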

ERROR: cannot remove directory for split_groups

[/bio/linuxbrew/bin/mcxdeblast] all secondary elements were also seen as primary elements (check ok)
cannot remove directory for split_groups: Directory not empty at /bio/perl5/lib/perl5/Bio/Roary/SplitGroups.pm line 167.

Prepackaged binaries are dynamically linked and not working on older distributions

Hi there,

this could get messy and I do not know whether you want to go down this path.

Observation

The binaries prepackaged in the .tar.gz downloadable on http://sanger-pathogens.github.io/Roary/ are dynamically linked. Which means they will fail on older installations with, e.g., messages like these:

bach@trinity:/opt/biosw/roary/binaries/linux$ ./bedtools
./bedtools: /lib/libc.so.6: version `GLIBC_2.15' not found (required by ./bedtools)
./bedtools: /lib/libc.so.6: version `GLIBC_2.14' not found (required by ./bedtools)

I only saw problems with GLIBC, but there may be others lurking in the background. Users do not always have the possibility to upgrade their machines or patch in newer versions of GLIBC.

You could provide statically linked executables in the download (I do that for MIRA), but this might become a messy thing for OSX.

Use of temporary folders and files

FYI, I noticed that in the folder Roary is run from, a traditional tmpdir() folder is made, but there are also lots of temp files (some with an underscore, some .faa etc.) which are not in that tmpdir.

Not sure if this is deliberate or not.

Change QC so that it doesnt shred reads

labeling of missing values in CSV files

I primarily use the CSV files produced by Roary, and currently markers missing are represented by an empty cell if opened in a spreadsheet. With large datasets (say 500-1000 genomes) this is sometimes awkward, especially if one wants to do formatting. If all cells were filled (i.e. N/A for missing values), then this would make life a lot easier.

Is this possible? It saves doing Find/Replace of empty cells in Excel. I have done Find/Replace of "" in the text file, but this gives some errors and hence does not work well.
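Until something like this exists in Roary itself, the empty cells can be filled after the fact with a CSV-aware script instead of a text-level Find/Replace. The sketch below is a workaround, not a Roary feature; the function name and the N/A placeholder are illustrative choices.

```python
import csv
import io


def fill_missing(in_handle, out_handle, placeholder="N/A"):
    """Copy a CSV file, replacing every empty cell with a placeholder.

    Parsing with the csv module avoids the pitfalls of replacing "" in the
    raw text, such as touching quotes that belong to neighbouring fields.
    """
    reader = csv.reader(in_handle)
    writer = csv.writer(out_handle, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        writer.writerow([cell if cell.strip() else placeholder for cell in row])


if __name__ == "__main__":
    # Shortened row in the style of gene_presence_absence.csv.
    sample = '"group_4797","","hypothetical protein","3","","GENOME02_00001"\n'
    out = io.StringIO()
    fill_missing(io.StringIO(sample), out)
    print(out.getvalue(), end="")
```

For real files, open the input and output with newline="" as the csv module documentation recommends. Note that this also fills empty annotation columns, not only the per-genome columns.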

MSG: Got a sequence without letters. Could not guess alphabet

Hi
I got this error when I try to create a core alignment
Thanks

Roary not checking tools needed to run

Hi there,

Bug report

When starting up, Roary does not check whether the tools it needs are in its path, leading to incomprehensible (for users) error reports. So, instead of something like "Could not run 'bedtools', is it installed? Is it in your $PATH?" the user gets, e.g.:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Could not read file '/scratch2/tmp/homebachtmp/tmp/compgenomes/tm9omL9Afb/bpg16.gff.proteome.faa.intermediate.extracted.fa': No such file or directory
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:449
STACK: Bio::Root::IO::_initialize_io /usr/local/share/perl/5.10.0/Bio/Root/IO.pm:270
STACK: Bio::SeqIO::_initialize /usr/local/share/perl/5.10.0/Bio/SeqIO.pm:499
STACK: Bio::SeqIO::fasta::_initialize /usr/local/share/perl/5.10.0/Bio/SeqIO/fasta.pm:87
STACK: Bio::SeqIO::new /usr/local/share/perl/5.10.0/Bio/SeqIO.pm:375
STACK: Bio::SeqIO::new /usr/local/share/perl/5.10.0/Bio/SeqIO.pm:421
STACK: Bio::Roary::ExtractProteomeFromGFF::_fastatranslate /opt/biosw/roary/lib/Bio/Roary/ExtractProteomeFromGFF.pm:135
STACK: Bio::Roary::ExtractProteomeFromGFF::_convert_nucleotide_to_protein /opt/biosw/roary/lib/Bio/Roary/ExtractProteomeFromGFF.pm:149
STACK: Bio::Roary::ExtractProteomeFromGFF::fasta_file /opt/biosw/roary/lib/Bio/Roary/ExtractProteomeFromGFF.pm:40
STACK: Bio::Roary::CommandLine::ExtractProteomeFromGff::run /opt/biosw/roary/lib/Bio/Roary/CommandLine/ExtractProteomeFromGff.pm:95
STACK: /opt/biosw/bin/extract_proteome_from_gff:19

Suggested resolution

prokka (from tseeman) has a pretty flexible runtime tool checker which could be adapted for Roary in no time.
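The kind of start-up check suggested here can be sketched in a few lines. This is only an illustration in Python, not Prokka's checker or Roary's code; the tool list is an example taken from names mentioned on this page, and the function names are hypothetical.

```python
import shutil

# Example list only; the real set of required tools is defined by Roary.
EXAMPLE_TOOLS = ["bedtools", "blastp", "makeblastdb", "mcl", "mcxdeblast", "parallel"]


def find_missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on the current PATH.

    The `which` argument exists so the lookup can be replaced in tests.
    """
    return [tool for tool in tools if which(tool) is None]


def missing_tool_messages(tools, which=shutil.which):
    """Build one readable error message per missing tool."""
    return [
        f"Could not run '{tool}', is it installed? Is it in your $PATH?"
        for tool in find_missing_tools(tools, which)
    ]


if __name__ == "__main__":
    for message in missing_tool_messages(EXAMPLE_TOOLS):
        print(message)
```

Running such a check before any work starts turns a deep stack trace about a missing intermediate file into a one-line hint about the missing executable.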

Background

I just asked 1 hour ago on Twitter about a tool to find presence/absence of genes in bacterial genomes and got a pointer to Roary. What I see on GitHub looks fantastic both in terms of presentation and code.

I have a couple of bugs / observations for which I will open separate tickets, please excuse that spamming but I think Roary is simply too good to not make it even better.

This is occurring on a fairly old machine (Kubuntu 9.10) and I need to install a lot by hand to keep it running with current software. That might explain a couple of oddities I report which you would not expect on newer distributions or when installed via simple apt-get, homebrew or similar. Still, having Roary quickly point to the obvious reason for failures will make life easier for a couple of people.

Best,
Bastien

Accessory genes newick file contains full path of individual files

The accessory_binary_genes.fa.newick addition is great. One thing I noticed is that the name of each entry contains the full path name, i.e. /usr/name/yadda1/yadda2/roary/cxgf68xy/sample1, /usr/name/yadda1/yadda2/roary/cxgf68xy/sample2, etc.

I would prefer only the names without the path. Is inclusion of the full pathname by design?
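As a stopgap, the directory part of each label can be stripped from the tree after Roary has finished. The sketch below is a post-processing workaround, not a change to Roary; it assumes labels are unquoted or simply quoted paths without spaces or brackets, and the function name is hypothetical.

```python
import re

# A run of label characters ending in "/" is treated as a directory prefix.
# Newick structural characters, quotes and whitespace are excluded so the
# match cannot cross from one label into the next.
_DIR_PREFIX = re.compile(r"[^(),:;'\s]*/")


def strip_label_paths(newick):
    """Remove leading directory components from every label in a Newick string."""
    return _DIR_PREFIX.sub("", newick)


if __name__ == "__main__":
    tree = (
        "(/usr/name/yadda1/yadda2/roary/cxgf68xy/sample1:0.1,"
        "/usr/name/yadda1/yadda2/roary/cxgf68xy/sample2:0.2);"
    )
    print(strip_label_paths(tree))
```

Branch lengths and support values are left untouched because they never contain a slash.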

"cpan" command reports Bio::Roary as version '(undef)'

cpan Bio::Roary
Reading '/bio/perl5/.cpan/Metadata'
  Database was generated on Sun, 31 May 2015 22:41:02 GMT
Bio::Roary is up to date (undef).

Add --outdir option to avoid blatting current directory

Roary outputs a lot of files. I was hoping you would add an --outdir option to place everything in a specific folder?

This would simplify pipelines so they don't have to do lots of mkdir/cd logic.

Non-issue, FYI regarding my 'roary2svg.pl' script

I didn't have much luck getting Marco's contrib/ plots script to work, so I ended up writing a basic SVG plotter for roary output. It's going to be a permanent part of Nullarbor but you can add it to your contrib/ folder if you think others would benefit.

https://raw.githubusercontent.com/tseemann/nullarbor/master/bin/roary2svg.pl

It's pretty basic... SVG sucks at fonts, so it's more confusing than it needs to be, and it may not even work 100% for crazy pan-genomes, not sure yet!

The --taxacol is so it's easy to adapt when you break the .csv file structure ;-)

I just use "convert foo.svg foo.png" to get an image if i need it.

Below are some details:

Usage: /home/tseemann/git/nullarbor/bin/roary2svg.pl [options] gene_presence_absence.csv > pan_genome.svg
  --help          This help.
  --verbose!      Verbose output (default '0').
  --width=i       Canvas width (default '1024').
  --height=i      Row height (and ~ font height) (default '20').
  --taxacol=i     Column in gpa.csv where taxa begin (default '14').
  --panonly!      Only non-core genes (default '0').

[question] Can query_pan_genome be set for percentage

is it possible to use query_pan_genome with a percentage, just as roary? For example, when comparing two groups, be able to set thresholds like "max 30% present in group 1, min 70% present in group 2" or vice versa. Maybe it's already there and I am missing it.

Thanks :)
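A percentage-based comparison like the one described can be approximated directly from gene_presence_absence.csv. The sketch below is not a query_pan_genome feature; it assumes the file has a header row with one column per sample and a gene name column called "Gene" (an assumption about the file layout), and it treats any non-empty cell as presence.

```python
import csv
import io


def presence_fraction(row, samples):
    """Fraction of the given samples in which the gene has a non-empty cell."""
    present = sum(1 for sample in samples if (row.get(sample) or "").strip())
    return present / len(samples)


def differential_genes(handle, group_one, group_two, max_one=0.30, min_two=0.70):
    """Return genes present in at most max_one of group one and at least min_two of group two."""
    hits = []
    for row in csv.DictReader(handle):
        frac_one = presence_fraction(row, group_one)
        frac_two = presence_fraction(row, group_two)
        if frac_one <= max_one and frac_two >= min_two:
            hits.append((row["Gene"], frac_one, frac_two))
    return hits


if __name__ == "__main__":
    # Toy table with hypothetical sample and gene names.
    table = (
        "Gene,Annotation,s1,s2,s3,s4,s5\n"
        "geneA,hypothetical protein,,,x_1,x_2,x_3\n"
        "geneB,hypothetical protein,y_1,y_2,y_3,y_4,y_5\n"
    )
    for gene, f1, f2 in differential_genes(io.StringIO(table), ["s1", "s2"], ["s3", "s4", "s5"]):
        print(gene, f1, f2)
```

Swapping the two sample lists gives the "vice versa" comparison.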

Bio-RetrieveAssemblies-1.0.1 fails to install

I note in the manual you use "cpan -f" to force install.
I didn't use it - and this happened:
Do I need to force?

AJPAGE/Bio-RetrieveAssemblies-1.0.1.tar.gz
  /bin/make -- OK
Running make test
PERL_DL_NONLAZY=1 "/usr/bin/perl" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/*.t t/Bio/*.t t/Bio/RetrieveAssemblies/*.t
t/Bio/RetrieveAssemblies.t ................ 1/? Unable to get remote page at t/lib/TestHelper.pm line 28.
t/Bio/RetrieveAssemblies.t ................ 2/?
#   Failed test 'Expected file for command -q Mycobacterium -a -f gff PRJEB8877 exists: downloaded_files/CVMX01.1.gbff.gz.gff'
#   at t/lib/TestHelper.pm line 31.
Unable to get remote page at t/lib/TestHelper.pm line 28.
t/Bio/RetrieveAssemblies.t ................ 3/?
#   Failed test 'Expected file for command -q Mycobacterium -f fasta PRJEB8877 exists: downloaded_files/CVMX01.1.fsa_nt.gz'
#   at t/lib/TestHelper.pm line 31.
Unable to get remote page at t/lib/TestHelper.pm line 28.
t/Bio/RetrieveAssemblies.t ................ 4/?
#   Failed test 'Expected file for command -q Mycobacterium -f gff PRJEB8877 exists: downloaded_files/CVMX01.1.gbff.gz.gff'
#   at t/lib/TestHelper.pm line 31.
Unable to get remote page at t/lib/TestHelper.pm line 28.
t/Bio/RetrieveAssemblies.t ................ 5/?
#   Failed test 'Expected file for command -q Mycobacterium -o my_dir PRJEB8877 exists: my_dir/CVMX01.1.gbff.gz'
#   at t/lib/TestHelper.pm line 31.
Unable to get remote page at t/lib/TestHelper.pm line 28.
t/Bio/RetrieveAssemblies.t ................ 6/?
#   Failed test 'Expected file for command -q Mycobacterium PRJEB8877 exists: downloaded_files/CVMX01.1.gbff.gz'
#   at t/lib/TestHelper.pm line 31.
# Looks like you failed 5 tests of 6.
t/Bio/RetrieveAssemblies.t ................ Dubious, test returned 5 (wstat 1280, 0x500)
Failed 5/6 subtests
t/Bio/RetrieveAssemblies/AccessionFile.t .. ok
t/Bio/RetrieveAssemblies/RefWeak.t ........ ok
t/Bio/RetrieveAssemblies/WGS.t ............ ok
t/requires_external.t ..................... ok

Test Summary Report
-------------------
t/Bio/RetrieveAssemblies.t              (Wstat: 1280 Tests: 6 Failed: 5)
  Failed tests:  2-6
  Non-zero exit status: 5
Files=5, Tests=33, 95 wallclock secs ( 0.02 usr  0.01 sys +  1.40 cusr  0.20 csys =  1.63 CPU)
Result: FAIL
Failed 1/5 test programs. 5/33 subtests failed.
make: *** [test_dynamic] Error 255
  AJPAGE/Bio-RetrieveAssemblies-1.0.1.tar.gz
  /bin/make test -- NOT OK
//hint// to see the cpan-testers results for installing this module, try:
  reports AJPAGE/Bio-RetrieveAssemblies-1.0.1.tar.gz
Stopping: 'install' failed for 'A/AJ/AJPAGE/Bio-RetrieveAssemblies-1.0.1.tar.gz'.

Enhancement: roary -a to continue on if other parameters as well

It would be good if the -a option checked the deps and then continued on and did the clustering IF there were GFF parameters etc.

Verbose output with -v

Pentuple memory for worst case sCenario

OLD

  # Triple memory for worst case senario
    $memory_required *= 5;

NEW?

  # Pentuple memory for worst case sCenario
    $memory_required *= 5;

Identical sequences are placed in different OGs??

Hi
I used Roary 3.2.7 to find the orthologs in a collection of 650 E. coli isolates (command line: roary -e --mafft *.gff). To my surprise, identical sequences are placed in different clusters. (I have reproduced this error with Roary 3.3.4 in a smaller set of 11 genomes; here only two out of 4000 genes are duplicates, so it appears it may be partly, but not completely, fixed.)

In some cases there are duplicate genes in the pan genome reference file, sometimes there are genes with duplicate sequences. The numbers don't add up with the summary statistics file or the gene presence absence matrix. Also, by manually checking I found genes with exactly the same sequence placed in two different clusters in the gene_presence_absence file.

These are serious errors and I feel some release testing should have been implemented (especially as this is published software). Although the 3.3.4 release appears to fix some of the issues, it is still not correct. Could you please check whether you can reproduce this issue? Here's the 11 genomes set:

http://klif.vet.uu.nl/mrsa/
(roary -e --mafft *.gff results in 4683 OGs, with 2 identical sequences as separate OGs)

fasta2tab transforms fasta file into a tab delimited file. This is just some perl code.
perl -e '$count=0; $len=0; while(<>) {s/\r?\n//; s/\t/ /g; if (s/^>//) { if ($. != 1) {print "\n"} s/ |$/\t/; $count++; $_ .= "\t";} else {s/ //g; $len += length($_)} print $_;} print "\n"; warn "\nConverted $count FASTA records in $. lines to tabular format\nTotal sequence length: $len\n\n";'
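The manual check for identical sequences can also be scripted. The sketch below groups the records of a FASTA file by exact sequence and reports every sequence that appears under more than one name; it is an independent helper for reproducing the observation, not part of Roary, and the function names are hypothetical.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def duplicate_sequences(records):
    """Return lists of record names that share exactly the same sequence."""
    names_by_sequence = defaultdict(list)
    for name, sequence in records:
        names_by_sequence[sequence.upper()].append(name)
    return [names for names in names_by_sequence.values() if len(names) > 1]


if __name__ == "__main__":
    # Toy records for illustration only.
    fasta = [">g1 first copy", "ATG", "CCC", ">g2", "ATGCCC", ">g3", "ATGAAA"]
    print(duplicate_sequences(read_fasta(fasta)))
```

Running this on the pan genome reference file would list the gene names that carry byte-identical sequences, which can then be looked up in gene_presence_absence.csv.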

roary R plots don't work on server --- lack of X11

Hi. Great software. Unfortunately, when running on the server with no X11, obtaining the PNG plots runs into the following problem:

Error in .External2(C_X11, paste("png::", filename, sep = ""), g$width, :
unable to start device PNG
In addition: Warning message:
In png("test.png") : unable to open connection to X11 display ''

The same does not occur if the plot is a PDF or bitmap:

bitmap(filename,"png16m")

This would still generate a readable PNG.

Or:

pdf(filename)

Thank you.

Anders.

Error "Cant open file: _uninflated_mcl_groups"

Hi!

I have downloaded roary 3.2.7 using homebrew on my Mac OSX Yosemite. It seemed to have installed properly but when I run there seems to be some problem with an intermediate file not being created/found. I ran it in verbose mode to see if I could get more clues but cannot see what is the root of the problem. Do you know what could be the problem? I append the command line output below.

Best wishes

Kaisa

roary -e -v *.gff

Please cite Roary if you use any of the results it produces:
Andrew J. Page, Carla A. Cummins, Martin Hunt, Vanessa K. Wong, Sandra Reuter, Matthew T. G. Holden, Maria Fookes, Daniel Falush, Jacqueline A. Keane, Julian Parkhill (2015), "Roary: Rapid large-scale prokaryote pan genome analysis", Bioinformatics,
doi: http://doi.org/10.1093/bioinformatics/btv421

2015/09/23 11:26:57 Fixing input GFF files
2015/09/23 11:27:03 Extracting proteins from GFF files
Extracting proteins from 513A.gff
Extracting proteins from PC2022III.gff
Extracting proteins from PC2777IV.gff
Extracting proteins from PC3053II.gff
Extracting proteins from PC3517II.gff
Extracting proteins from PC3714II.gff
Extracting proteins from PC390II.gff
Extracting proteins from PC3939II.gff
Extracting proteins from PC3997IV.gff
Extracting proteins from PC4226IV.gff
Extracting proteins from PC4580III.gff
Extracting proteins from PC4597II.gff
Extracting proteins from PC5099IV.gff
Extracting proteins from PC5538III.gff
Extracting proteins from PC5587platt.gff
Extracting proteins from PC5587u.gff
Extracting proteins from W1090330.gff
Combine proteins into a single file
Iteratively run cd-hit
Parallel all against all blast
Cluster with MCL
2015/09/23 11:51:37 Running command: pan_genome_post_analysis -o clustered_proteins -p pan_genome.fa -s gene_presence_absence.csv -c _clustered.clstr --output_multifasta_files -i _gff_files -f _fasta_files -t 11 --dont_create_rplots -v -j Local --processors 1 --group_limit 50000 -cd 99
2015/09/23 11:51:37 Reinflate clusters
Cant open file: _uninflated_mcl_groups
KaisaTiMac:kaisa$ ls
513A.gff PC3714II.gff.proteome.faa PC4597II.gff W1090330.gff.proteome.faa
513A.gff.proteome.faa PC390II.gff PC4597II.gff.proteome.faa _clustered
PC2022III.gff PC390II.gff.proteome.faa PC5099IV.gff _clustered.clstr
PC2022III.gff.proteome.faa PC3939II.gff PC5099IV.gff.proteome.faa _combined_files
PC2777IV.gff PC3939II.gff.proteome.faa PC5538III.gff _combined_files.groups
PC2777IV.gff.proteome.faa PC3997IV.gff PC5538III.gff.proteome.faa _fasta_files
PC3053II.gff PC3997IV.gff.proteome.faa PC5587platt.gff _gff_files
PC3053II.gff.proteome.faa PC4226IV.gff PC5587platt.gff.proteome.faa blast_identity_frequency.Rtab
PC3517II.gff PC4226IV.gff.proteome.faa PC5587u.gff
PC3517II.gff.proteome.faa PC4580III.gff PC5587u.gff.proteome.faa
PC3714II.gff PC4580III.gff.proteome.faa W1090330.gff

EXCEPTION: Bio::Root::Exception could not read ...faa.intermediate.extracted.fa

Hello,

after asking IT to install roary into local cluster I have been stuck with the following error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Could not read file 'nameX.gff.proteome.faa.intermediate.extracted.fa': No such file or directory
STACK: Error::throw
STACK: Bio::Root::Root::throw /opt/apps/perl/5.22.0/lib/site_perl/5.22.0/Bio/Root/Root.pm:449
STACK: Bio::Root::IO::_initialize_io /opt/apps/perl/5.22.0/lib/site_perl/5.22.0/Bio/Root/IO.pm:270
STACK: Bio::SeqIO::_initialize /opt/apps/perl/5.22.0/lib/site_perl/5.22.0/Bio/SeqIO.pm:499
STACK: Bio::SeqIO::fasta::_initialize /opt/apps/perl/5.22.0/lib/site_perl/5.22.0/Bio/SeqIO/fasta.pm:87
STACK: Bio::SeqIO::new /opt/apps/perl/5.22.0/lib/site_perl/5.22.0/Bio/SeqIO.pm:375
STACK: Bio::SeqIO::new /opt/apps/perl/5.22.0/lib/site_perl/5.22.0/Bio/SeqIO.pm:421
STACK: Bio::Roary::ExtractProteomeFromGFF::_fastatranslate /opt/apps/roary/3.2.7/lib/Bio/Roary/ExtractProteomeFromGFF.pm:138
STACK: Bio::Roary::ExtractProteomeFromGFF::_convert_nucleotide_to_protein /opt/apps/roary/3.2.7/lib/Bio/Roary/ExtractProteomeFromGFF.pm:152
STACK: Bio::Roary::ExtractProteomeFromGFF::fasta_file /opt/apps/roary/3.2.7/lib/Bio/Roary/ExtractProteomeFromGFF.pm:43
STACK: Bio::Roary::CommandLine::ExtractProteomeFromGff::run /opt/apps/roary/3.2.7/lib/Bio/Roary/CommandLine/ExtractProteomeFromGff.pm:86

STACK: /opt/apps/roary/3.2.7/bin/extract_proteome_from_gff:18

I am sure my input files are OK; they finished fine on my laptop with an older Roary (2.0.0).
Any ideas of what to check are very welcome!

Thank you,
Nadejda

Roary not using packaged executables

Hi there,

maybe this is due to preliminary status of Roary.

Bug report

The prepackaged download .tar.gz file on http://sanger-pathogens.github.io/Roary/ apparently comes with binaries for Linux and OSX. However, if the user does not actively take them up in the $PATH, Roary will not use them (and fail if there are no other versions of them in the $PATH).

Suggested resolution

Test whether a tool is in the current PATH (see #214) and if not, make sure the prepackaged version is used.
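The suggested fallback could look roughly like this: prefer a tool found on the PATH and otherwise fall back to a copy in a bundled directory. This is a sketch of the idea in Python rather than Roary's Perl, and the function name and the bundled directory argument are hypothetical.

```python
import os
import shutil


def resolve_tool(name, bundled_dir, which=shutil.which):
    """Return the path of a tool, preferring the PATH over a bundled copy.

    Returns None if the tool is found in neither place.
    The `which` argument exists so the PATH lookup can be replaced in tests.
    """
    found = which(name)
    if found:
        return found
    candidate = os.path.join(bundled_dir, name)
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return None


if __name__ == "__main__":
    # "binaries/linux" follows the directory layout mentioned in a related report.
    print(resolve_tool("bedtools", os.path.join("binaries", "linux")))
```

Whether the PATH or the bundled copy should win is a design choice; preferring the PATH lets users override an incompatible prepackaged binary.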

Is the roary -a check complete?

Is it missing FastTree or fastml or exonerate or R or kraken?

2015/11/17 11:54:58 Looking for 'awk' - found /usr/bin/awk
2015/11/17 11:54:58 Looking for 'bedtools' - found /bio/linuxbrew/bin/bedtools
2015/11/17 11:54:58 Determined bedtools version is 2.24
2015/11/17 11:54:58 Looking for 'blastp' - found /bio/linuxbrew/bin/blastp
2015/11/17 11:54:58 Determined blastp version is 2.2.31
2015/11/17 11:54:58 Looking for 'cd-hit' - found /bio/linuxbrew/bin/cd-hit
2015/11/17 11:54:58 Determined cd-hit version is 4.6
2015/11/17 11:54:58 Optional tool 'cdhit' not found in your $PATH
2015/11/17 11:54:58 Looking for 'grep' - found /usr/bin/grep
2015/11/17 11:54:58 Looking for 'mafft' - found /bio/linuxbrew/bin/mafft
2015/11/17 11:54:58 Determined mafft version is 7.221
2015/11/17 11:54:58 Looking for 'makeblastdb' - found /bio/linuxbrew/bin/makeblastdb
2015/11/17 11:54:58 Determined makeblastdb version is 2.2.31
2015/11/17 11:54:58 Looking for 'mcl' - found /bio/linuxbrew/bin/mcl
2015/11/17 11:54:58 Determined mcl version is 14-137
2015/11/17 11:54:58 Looking for 'parallel' - found /bio/linuxbrew/bin/parallel
2015/11/17 11:54:58 Determined parallel version is 20150922
2015/11/17 11:54:59 Looking for 'prank' - found /bio/linuxbrew/bin/prank
2015/11/17 11:55:00 Determined prank version is 140603
2015/11/17 11:55:00 Looking for 'sed' - found /bio/linuxbrew/bin/sed
Roary version 3.5.1

GFF files derived from Prokka genbank raise errors

Hi,

I'm using Roary with a bunch of bacterial genomes; some have been annotated with prokka, some others not. A genbank file is available for all of them. I've converted all the genbank files to gff3 using the bcbio gff writer (https://github.com/chapmanb/bcbb/tree/master/gff), which to the best of my knowledge produces valid GFF3 files.

When running using the prokka generated gff files the program runs smoothly; when running with the gff files derived from the genbank file, the program halts with the following error:

BLAST Database error: No alias or index file found for protein database [/home/user/workspace/Roary/bin/UcWJpjcOru/output_contigs] in search path [/home/user/workspace/Roary/bin::]

Some files are still produced, however, such as gene_presence_absence.csv, although the genome columns contain either nothing or the EC_number instead of the locus_tag (see below). Would more detailed documentation on the expected GFF format (the order of the annotations, for instance) perhaps help?

Thanks a lot,
Marco

Example of annotation from prokka:

gnl|Prokka|GENOME02_contig000001   Prodigal:2.6    CDS     42      578     .       +       0       ID=GENOME02_00001;inference=ab initio prediction:Prodigal:2.6;locus_tag=GENOME02_00001;product=hypothetical protein;protein_id=gnl|Prokka|GENOME02_00001

Example annotation from the gff file converted from the prokka genbank file:

GENOME02_contig000001      feature CDS     42      578     .       +       0       codon_start=1;inference=ab initio prediction:Prodigal:2.6;locus_tag=GENOME02_00001;product=hypothetical protein;protein_id=Prokka:GENOME02_00001;transl_table=11;translation=MIAEIFQGGFVVFQQQFSKVHFEAATTHNAHHHDVGGFTAESEGRNLPAAQTQTFREVVQGVSRIFTIFQFEANRRDAFVRATRTDELIRPQFGDFIRQISGNLVRGVLYFGIAFTTEAQEFIVLCNYLTRRAGEVDGKSTNLTTQVVNVEHQFLRQRFFVTPDNPAAAQRSQTEFMA
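
The two example records above differ in one visible way: the Prokka record carries an `ID=` attribute and the converted record does not. Whether that is what trips Roary up is not confirmed in this thread, so the following is only a minimal Python sketch for spotting such records before a run. The function names `parse_attributes` and `cds_without_id` are hypothetical and are not part of Roary or the bcbio writer.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def cds_without_id(gff_lines):
    """Return (seqid, start, end, locus_tag) for CDS records lacking an ID attribute.

    Assumption: the absence of ID= is worth flagging; the thread does not
    confirm that this is the cause of the failure.
    """
    missing = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # the sequence section follows; no more feature records
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = parse_attributes(cols[8])
        if "ID" not in attrs:
            missing.append((cols[0], int(cols[3]), int(cols[4]), attrs.get("locus_tag")))
    return missing
```

Called as `cds_without_id(open("converted.gff"))` (a hypothetical file name), an empty list would mean every CDS record has an `ID`.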

gene_presence_absence.csv produced from the prokka gff files (6036 lines):

"group_4797","","hypothetical protein","3","3","1","","","","","","GENOME02_00001","GENOME03_01386","GENOME04_00768"

gene_presence_absence.csv produced from the gff files derived from the genbank (2472 lines):

"group_1","","","1","1","1","","","","","","","","EC_number=2.7.2.11"

00_requires_external.t missing "mafft" ?

MAFFT not here:

ok(scalar PATH->Whence($_), "$_ in PATH") for qw(blastp makeblastdb mcl mcxdeblast bedtools prank parallel);

Hard-coded Sanger paths in some scripts

/software/pathogen/external/apps/usr/local/bin/Rscript

should be

/usr/bin/env Rscript

No tagged release for 2.2.3

CPAN installs something with version 2.2.3, but there is no git release for it.

Use of uninitialized value in File::Slurper and Encode.pm

Not sure what these relate to:

2015/11/05 09:32:30 Running command: pan_genome_post_analysis -o clustered_proteins -p pan_genome.fa -s gene_presen
sence.csv -c _clustered.clstr  -i /mnt/seq/JOBS/J2014-06814/nullarbor.modern/roary/PGjpiKTwDC//_gff_files -f /mnt/s
BS/J2014-06814/nullarbor.modern/roary/PGjpiKTwDC//_fasta_files -t 11  --dont_create_rplots   -v  -j Parallel --proc
s 8 --group_limit 50000 -cd 99
Use of uninitialized value in require at /bio/perl5/lib/perl5/File/Slurper.pm line 32.
2015/11/05 09:32:32 Reinflate clusters
2015/11/05 09:32:32 Split groups with paralogs
Use of uninitialized value in require at /bio/perl5/lib/perl5/x86_64-linux-thread-multi/Encode.pm line 59.

.aln files

Very cool tool!

I notice that after the alignment of the core genome is complete the content of the output directory 'pan_genome_sequences' containing the complete set of pan genome .fa.aln files gets deleted.
I could put these individual .fa.aln files to good use. Is it possible to stop this occurring?

Cheers
Dan

[bug] Newick files in 3.5.1 have branch lengths of 0.0

The conversion of accessory gene presence/absence to Newick files has gone wrong since the update to 3.5.1. The .fa file contains only C's, so no differences between samples are visible.

This is tested with datasets that previously gave clear differences.

Summary file:

Core genes (99% <= strains <= 100%): 1461
Soft core genes (95% <= strains < 99%): 765
Shell genes (15% <= strains < 95%): 1389
Cloud genes (0% <= strains < 15%): 7657
Total genes: 11272

So branch lengths of 0 are not expected.

The Newick file looks like this:

(L1_Lm_10KSM:0.0,L1_Lm_11KSM:0.0,L1_Lm_13KSM:0.0,L1_Lm_15KSM:0.0,L1_Lm_4KSM:0.0,L1_Lm_6KSM:0.0,L1_Lm_8KSM:0.0,L1_Lm_BHU1:0.0,L1_Lm_BHU2:0.0,L1_Lm_BHU3:0.0, etc

The .fa file says:

L1_Lm_10KSM
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCC
L1_Lm_11KSM
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCC
L1_Lm_13KSM
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

(etc)
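
A quick way to confirm the symptom described above (every sample has the same string, so the tree has no signal) is to count the variable columns in the file. This is a minimal Python sketch, not Roary code. It assumes a standard FASTA layout with `>` header lines, which the paste above does not show, and the names `read_fasta` and `variable_columns` are hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA lines into {name: sequence}; assumes '>' header lines."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def variable_columns(records):
    """Count alignment columns in which the samples do not all share one character."""
    seqs = list(records.values())
    if not seqs:
        return 0
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences differ in length; not an alignment")
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)
```

A result of 0 for a dataset whose summary lists thousands of shell and cloud genes would match the report: the presence/absence encoding has been lost before tree building.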

Check at least 2 gff files have been passed in

cleanup outputfiles

Use of -e switch gives multifasta file with N's only

I have Muscle and revtrans.py (v1.4) installed and in the PATH, but the .aln file produced contains only N's. The GFFs are from Prokka and work fine for creating the gene lists, but something goes wrong in the reverse translation.

Any thoughts?

The commandline is: roary -v -i 90 -e *.gff.

Edit: the temporary files are created fine, so revtrans.py works. It is the conversion of all the temporary files into the final alignment that goes wrong.

Make summary_statistics a TAB/TSV file?

Currently summary_statistics.txt is space padded. I am currently splitting it on : to format it into a table for reports. Maybe make it TSV ?
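
Until the format changes, the splitting can be done outside Roary. The sketch below is one possible Python workaround, assuming the line layout quoted in the Newick issue above (`label (range): count`); the function name `summary_to_tsv` is hypothetical and the real file may be padded differently.

```python
import re

# One summary line: a label, an optional "(range)" part, a colon, then a count.
LINE = re.compile(
    r"^(?P<label>[^(:]+?)\s*(?:\((?P<range>[^)]*)\))?\s*:\s*(?P<count>\d+)\s*$"
)


def summary_to_tsv(lines):
    """Convert summary lines into tab-separated 'label, range, count' rows.

    Lines that do not match the assumed layout are skipped rather than guessed at.
    """
    rows = []
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue
        rows.append("\t".join([
            match.group("label"),
            match.group("range") or "",  # "Total genes" has no range
            match.group("count"),
        ]))
    return "\n".join(rows)
```

The range column is left empty for lines such as `Total genes`, so every row has the same three fields.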

prank seg fault

If the -e option is used, prank seg faults if all the sequences are the same.

QC doesn't work outside Sanger

Binaries with Higher GLIBC than available.

Hi,

The binaries supplied with roary complain about GLIBC:

/lib64/libc.so.6: version `GLIBC_2.14' not found
/usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.15' not found

Can the binaries supplied with the package be re-compiled against the lowest possible GLIBC, please? We have 2.12, but I am sure there are others running even older versions of GLIBC.

Get Roary to detect installation of dependencies

Would it be possible to get Roary to check for the installation of its non-Perl dependencies, such as MAFFT, exonerate etc., and report which ones are missing?
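
As an illustration of the kind of check being requested, here is a minimal Python sketch that looks each tool up on the PATH and reports the ones it cannot find. It is not how Roary itself does this. The tool list is an assumption assembled from the 00_requires_external.t line and the startup log quoted elsewhere in these issues, and `missing_tools` is a hypothetical name.

```python
import shutil

# Illustrative list only: names taken from the test file and log output quoted
# in these issues, not an authoritative list of Roary's dependencies.
REQUIRED_TOOLS = [
    "blastp", "makeblastdb", "mcl", "mcxdeblast",
    "bedtools", "prank", "parallel", "mafft",
]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the PATH, in the order given.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Missing from PATH: " + ", ".join(absent))
    else:
        print("All listed tools were found.")
```

Reporting every missing tool in one pass, rather than stopping at the first, saves repeated install-and-retry cycles.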

sadaf

Hi. I downloaded Roary on our iridis cluster (accessed over SSH). I am having a problem when trying to run `roary *.gff` on the command line. The error looks something like this:

EXCEPTION: Bio::Root::Exception -------------
MSG: Could not open PROKKA_SG1.gff.proteome.faa.intermediate.extracted.fa: No such file or directory
STACK: Error::throw
STACK: Bio::Root::Root::throw /local/software/perl-modules/share/perl5/Bio/Root/Root.pm:486
STACK: Bio::Root::IO::_initialize_io /local/software/perl-modules/share/perl5/Bio/Root/IO.pm:351
STACK: Bio::SeqIO::_initialize /local/software/perl-modules/share/perl5/Bio/SeqIO.pm:491
STACK: Bio::SeqIO::fasta::_initialize /local/software/perl-modules/share/perl5/Bio/SeqIO/fasta.pm:87
STACK: Bio::SeqIO::new /local/software/perl-modules/share/perl5/Bio/SeqIO.pm:372
STACK: Bio::SeqIO::new /local/software/perl-modules/share/perl5/Bio/SeqIO.pm:413
STACK: Bio::Roary::ExtractProteomeFromGFF::_fastatranslate /local/software/roary/3.0.3/source/sanger-pathogens-Roary-5685c8b/lib/Bio/Roary/ExtractProteomeFromGFF.pm:138
STACK: Bio::Roary::ExtractProteomeFromGFF::_convert_nucleotide_to_protein /local/software/roary/3.0.3/source/sanger-pathogens-Roary-5685c8b/lib/Bio/Roary/ExtractProteomeFromGFF.pm:152
STACK: Bio::Roary::ExtractProteomeFromGFF::fasta_file /local/software/roary/3.0.3/source/sanger-pathogens-Roary-5685c8b/lib/Bio/Roary/ExtractProteomeFromGFF.pm:43
STACK: Bio::Roary::CommandLine::ExtractProteomeFromGff::run /local/software/roary/3.0.3/source/sanger-pathogens-Roary-5685c8b/lib/Bio/Roary/CommandLine/ExtractProteomeFromGff.pm:79

STACK: /local/software/roary/3.0.3/source/sanger-pathogens-Roary-5685c8b/bin/extract_proteome_from_gff:18

I have published a Roary homebrew formula

I've written a homebrew package for Roary in my tap.

https://github.com/tseemann/homebrew-bioinformatics-linux/blob/master/roary.rb

It doesn't install Roary itself (as Brew doesn't do Perl) but it installs its dependencies and checks that the Bio::Roary Perl module is installed.

You're missing some Perl dependencies

In the "Installation - With bundled binaries" instructions, you list:

cpanm Array::Utils BioPerl Exception::Class File::Find::Rule File::Grep File::Slurp::Tiny Graph Moose Moose::Role Text::CSV

On a fresh install of Ubuntu 14.04 you also need:
Log::Log4perl
File::Which

CPAN install failure "unknown option mafft"

Mafft is installed and in PATH.
I deleted the original bin/roary from previous install too.
It looks like the FASTA file comparison fails - lower case vs uppercase?

t/Bio/Roary/CommandLine/QueryRoary.t .................... ok
t/Bio/Roary/CommandLine/Roary.t ......................... 46/? Unknown option: mafft
Unknown option: mafft
Unknown option: mafft
Unknown option: mafft
t/Bio/Roary/CommandLine/Roary.t ......................... 48/?
#   Failed test 'Actual and expected output match for '-j Local --dont_delete_files --dont_split_groups  --output_multifasta_files --mafft --dont_delete_files t/data/real_data_1.gff t/data/real_data_2.gff''
#   at t/lib/TestHelper.pm line 75.
# +---+--------------------------------------------------------------+--------------------------------------------------------------+
# |   |Got                                                           |Expected                                                      |
# | Ln|                                                              |                                                              |
# +---+--------------------------------------------------------------+--------------------------------------------------------------+
# |  1|>11111_1#11_04119                                             |>11111_1#11_04119
# *  2|ATGAATAAAACAACTGAGTATATTGACGCACTGCTGCTTTCTGAACGTGAGAAAGCGGCA  |atgaataaaacaactgagtatattgacgcactgctgctttctgaacgtgagaaagcggca  *

Test Summary Report
-------------------
t/Bio/Roary/CommandLine/Roary.t                       (Wstat: 256 Tests: 50 Failed: 1)
  Failed test:  49
  Non-zero exit status: 1
Files=48, Tests=704, 140 wallclock secs ( 0.17 usr  0.05 sys + 73.77 cusr 26.66 csys = 100.65 CPU)
Result: FAIL
Failed 1/48 test programs. 1/704 subtests failed.
make: *** [test_dynamic] Error 255
  AJPAGE/Bio-Roary-3.2.4.tar.gz
  /bin/make test -- NOT OK
//hint// to see the cpan-testers results for installing this module, try:
  reports AJPAGE/Bio-Roary-3.2.4.tar.gz
Stopping: 'install' failed for 'Bio::Roary'.
Failed during this command:
 AJPAGE/Bio-Roary-3.2.4.tar.gz                : make_test NO
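
The Got/Expected table above shows identical bases in different case. To illustrate the upper-case versus lower-case guess, this minimal Python sketch compares two FASTA texts while ignoring the case of sequence lines only. It is not how t/lib/TestHelper.pm compares files, the names `normalise_fasta` and `same_ignoring_case` are hypothetical, and the "Unknown option: mafft" messages may well be the real cause of the failure.

```python
def normalise_fasta(text):
    """Return the lines of a FASTA text with sequence lines upper-cased.

    Header lines (starting with '>') are left untouched, because sample
    names are case-sensitive.
    """
    lines = []
    for line in text.splitlines():
        line = line.rstrip()
        lines.append(line if line.startswith(">") else line.upper())
    return lines


def same_ignoring_case(got, expected):
    """True when two FASTA texts differ at most in the case of their bases."""
    return normalise_fasta(got) == normalise_fasta(expected)
```

If the two files compare equal this way, the difference is cosmetic and probably comes from the aligner's output case, not from the alignment itself.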

roary --version should return 0 not 255 exit code

I asked for version, it was a success, not a failure :)
Important for brew test results.

% roary --version
3.5.1
% echo $?
255

Also, roary -a returns 2 which is non-zero and indicates error.
Important for pipelines that check before they run.
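
To show why the exit status matters, here is a minimal Python sketch of the gate a pipeline might apply before running a tool. It does not call Roary. `STAND_IN` is a hypothetical command that mimics the behaviour reported above (prints a version, exits 255), and `tool_reports_version` is a hypothetical helper name.

```python
import subprocess
import sys

# Stand-in for `roary --version` as reported above: prints a version string
# but exits with status 255. Purely illustrative; Roary is not invoked.
STAND_IN = [sys.executable, "-c", "import sys; print('3.5.1'); sys.exit(255)"]


def tool_reports_version(command):
    """Run `command` (an argv list) and return (ok, output).

    `ok` is True only when the exit status is 0, which is the convention
    that brew tests and pipeline pre-flight checks rely on.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode == 0, result.stdout.strip()


if __name__ == "__main__":
    ok, output = tool_reports_version(STAND_IN)
    print("version output:", output)
    print("treated as success:", ok)
```

With the stand-in, the version string is printed correctly but the check still fails, which is the situation described in this issue.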

Getting Roary into Homebrew

All the binary dependencies are in Homebrew Science. The main issue is the Perl modules: Homebrew only checks whether they are installed. You can install your own in custom locations, but their dependencies are trickier.

Have you considered fatpack?
http://search.cpan.org/~mstrout/App-FatPacker-0.010003/lib/App/FatPacker.pm

I'm happy to help get it into Brew. If all the Perl deps were already installed, could I run it from the untarred file somehow? Or customize where it installs?
