will-rowe / groot

A resistome profiler for Graphing Resistance Out Of meTagenomes

License: MIT License

Go 96.76% Shell 3.24%

antibiotic-resistance metagenomics resistome minhash lsh alignment

groot's Introduction

Graphing Resistance Out Of meTagenomes

Overview

GROOT is a tool to type Antibiotic Resistance Genes (ARGs) in metagenomic samples (a.k.a. Resistome Profiling). It combines variation graph representation of gene sets with an LSH indexing scheme to allow for fast classification of metagenomic reads. Subsequent hierarchical local alignment of classified reads against graph traversals facilitates accurate reconstruction of full-length gene sequences using a simple scoring scheme.

GROOT will output an ARG alignment file (in BAM format) that contains the graph traversals possible for each query read; the alignment file is then used by GROOT to generate a resistome profile.

Since version 0.4, GROOT will also output the variation graphs which had reads align. These graphs are in GFA format, allowing you to visualise graph alignments using Bandage and determine which variants of a given ARG type are dominant in your metagenomes. Read the documentation for more info.

Since version 1.0.0, GROOT has had a partial re-write (merging features and changes from my baby groot project). It now uses the excellent LSH Ensemble library as the LSH index, enabling containment search for read seeding. I've also improved my dev know-how and GROOT is now more efficient. However, these changes have meant that I've needed to change some of the CLI, so please read the docs.

Installation

Check out the releases to download a binary. Alternatively, install using Bioconda or compile the software from source.

Bioconda

conda install -c bioconda groot

Brew

brew install brewsci/bio/groot

Source

GROOT is written in Go (v1.14). To compile from source you will first need the Go toolchain. Once you have it, try something like this to compile:

# Clone this repository
git clone https://github.com/will-rowe/groot.git

# Go into the repository and get the package dependencies
cd groot
go get -d -t -v ./...

# Run the unit tests
go test -v ./...

# Compile the program
go build ./

# Call the program
./groot --help

Quick Start

GROOT is called by typing groot, followed by the subcommand you wish to run. There are three main subcommands: index, align and report. This quick start will show you how to get things running but it is recommended to follow the documentation.

# Get a pre-clustered ARG database
groot get -d arg-annot

# Create graphs and index
groot index -m arg-annot.90 -i grootIndex -w 100

# Align reads and report
groot align -i grootIndex -f reads.fq | groot report

note: it's recommended to index the graph using a window size ~= your maximum expected read length, so for 100bp reads, use -w 100

Further Information & Citing

Please see the documentation on Read the Docs for more extensive information and a tutorial.

GROOT has now been published in Bioinformatics:

Rowe WPM, Winn MD. Indexed variation graphs for efficient and accurate resistome profiling. Bioinformatics. 2018. doi: 10.1093/bioinformatics/bty387

groot's Issues

mean read length is outside the graph window size (+/- 10 bases)

Hi! Due to trimming, my read data actually ranges in length from 55 to 235. Is it possible for groot to accommodate this? In the meantime I'll try to get groot to work by filtering the reads by size.
Thanks for the help!

update databases

Time to revisit the databases. I think they should now be:

downloaded from source, rather than the cloned copies in this repo
time stamped or version numbered so that results are reproducible
augmented with some recent databases, not just for AMR genes

Containment option not present in 0.8.5 on OSX

Looks like the containment option is not present on my version of groot. (0.8.5 installed via bioconda to OSX)

(base) dcdanko@mac196156:~/Dev/cap2
$ vendor/conda/CAP_v2/bin/groot version
0.8.5
(base) dcdanko@mac196156:~/Dev/cap2
$ vendor/conda/CAP_v2/bin/groot index -i tests/data/groot_amrs -o bar -l 250 --containment -j 0.5
Error: unknown flag: --containment
Usage:
  groot index [flags]

Flags:
  -h, --help             help for index
  -j, --jsThresh float   minimum Jaccard similarity for a seed to be recorded (default 0.99)
  -k, --kmerSize int     size of k-mer (default 7)
  -i, --msaDir string    directory containing the clustered references (MSA files) - required
  -o, --outDir string    directory to save index files to (default "./groot-index-20191112133350")
  -l, --readLength int   length of query reads (which will be aligned during the align subcommand) (default 100)
  -s, --sigSize int      size of MinHash signature (default 128)

Global Flags:
  -y, --logFile string   filename for log file (default "./groot.log")
  -p, --processors int   number of processors to use (default 1)
      --profiling        create the files needed to profile GROOT using the go tool pprof

Any obvious fix?

Paired-end reads information

Dear Authors,
Thank you for creating a great tool!
I was wondering a bit about running your tool with paired-end data. I see you mention that the tool does not utilize the paired-end information. Might I ask what the reason for this is?
Also, if I run the tool with both R1 and R2 concatenated, will the read counts be duplicated?
Lastly, is it enough to only run on the R1 reads?
Thanks in advance!

option to output GFA during indexing

At the moment the graph is essentially in GFA format anyway as GROOT converts the input MSA to GFA before indexing, but it then serialises it with MessagePack. Could just add an option to write it in GFA too.

This will be useful for debugging as I'm extending GROOT beyond AMR genes.

Read Count Clarification

Hi Will! I have a quick clarification question about the read counts reported by groot report. Are read pairs or individual reads counted? Thanks, Emily

Problem downloading card database

Hi,

Many thanks for creating GROOT! How can I install the card database? When I use the get command (below), it downloads the arg-annot database instead.

groot get –d card -o card_7-4-2022
downloading the pre-clustered arg-annot database...
unpacking...
database saved to: card_7-4-2022/arg-annot.90
now run groot index -m card_7-4-2022/arg-annot.90 or groot index --help for full options

Thank you!
Best, Wasim

unhelpful crash message when file in index is empty

I'm trying to build a groot index and it returned this error


goroutine 39 [running]:
github.com/biogo/biogo/seq/multi.(*Multi).Column(0xc00001d780, 0x0, 0x1, 0x0, 0x0, 0x0)
	/opt/conda/conda-bld/groot_1553596083523/work/src/github.com/biogo/biogo/seq/multi/multi.go:250 +0x691
github.com/will-rowe/gfa.getNodes(0xc00001d780, 0x1, 0x0, 0x0)
	/opt/conda/conda-bld/groot_1553596083523/work/src/github.com/will-rowe/gfa/msa.go:150 +0xa42
github.com/will-rowe/gfa.MSA2GFA(0xc00001d780, 0xaf3010, 0xc0000240d0, 0x0)
	/opt/conda/conda-bld/groot_1553596083523/work/src/github.com/will-rowe/gfa/msa.go:46 +0x117
github.com/will-rowe/groot/cmd.runIndex.func1(0xc0000240d0, 0xc0000c4240, 0x12, 0xc00001d780)
	/opt/conda/conda-bld/groot_1553596083523/work/src/github.com/will-rowe/groot/cmd/index.go:168 +0x59
created by github.com/will-rowe/groot/cmd.runIndex
	/opt/conda/conda-bld/groot_1553596083523/work/src/github.com/will-rowe/groot/cmd/index.go:165 +0x55f

Turns out I had an empty file in my input directory, removing it makes it work. Maybe kind of niche (what kind of dumbass includes an empty file in their index directory, right?!), but might be good for groot to do this as a sanity check?

Increase accuracy of containment search seeding (and reduce memory usage)

What is the expected memory usage for groot?

I'm having issues running groot align on my 320GB RAM CentOS 7.5 machine, with groot 0.8.1 installed via conda. The input file is a gzipped 1.6GB FASTQ with approximately 20 M reads. I ran the following commands:

 groot get --out /db/groot/arg-annot 
 groot index -i /db/groot/arg-annot/arg-annot.90 --readLength 125 --containment -j 0.5 --processors 30
 gunzip -c 3920_v1_1.fastq.gz | groot align --indexDir groot-index-20181012134944/ --processors 30 > 3920_v1.groot.bam

The process is killed by the kernel after a while due to excessive memory usage.

ntHash update has broken GROOT

compilation error:

cannot use minHash.kSize (type int) as type uint in argument to ntHash.New

Build command returns error

Hi Will!
While installing GROOT I encountered an error.
I am installing the program by compiling it from source.
Everything goes smoothly until I run go build ./, which returns the following error:

~/groot$ go build ./

# github.com/will-rowe/groot/cmd
../go/src/github.com/will-rowe/groot/cmd/get.go:176:24: invalid method expression archiver.Tar.Open (needs pointer receiver: (*archiver.Tar).Open)
../go/src/github.com/will-rowe/groot/cmd/get.go:176:24: archiver.Tar.Open undefined (type archiver.Tar has no method Open)

I solved it by changing line 176 of get.go from:
if err := archiver.Tar.Open("tmp.tar", "tmp"); err != nil {
to:
if err := archiver.Unarchive("tmp.tar", "tmp"); err != nil {

This immediately solved the issue.
For reference, I am on Ubuntu 18.04 and using go version go1.10.4 linux/amd64.

I am not sure whether this problem is specific to my setup, caused by an update of archiver, or version specific.

Hope this info can help and thanks for GROOT!
Best,
Stefano

Error on installation from source

Hello, I am trying to install groot from source onto a Linux server. I receive the following error message when attempting to both test (go test -v ./...) and compile (go build ./) the install: "segment.GetKmerCount undefined (type *gfa.segment has no field or method GetKmerCount)". I do have go installed and working on the server.

Abundance table

Hello,

Thank you for this great pipeline!

Is there a way to merge all groot reports into a single abundance table in a similar way to ASVs/OTUs tables?

Thanks a lot in advance!

Inconsistent results with very similar read length thresholds

Hi Will,

I came across a bizarre case where I am getting drastically different results if I slightly change the read length thresholds (for both indexing and aligning).

I have a metagenome sequence file (publicly available in ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR503/000/SRR5032340/SRR5032340_1.fastq.gz). Using FastQC I see that the majority of reads are within the 100-101 bp window. However, the mean read length is 97 bp. So I first indexed and aligned the genome using the default 100 bp length:

groot index -p 4 -l 100 -i groot-db.90 -o groot-db_100bp -y groot-db_100bp.log
groot align -p 4 -l 100 -i groot-db_100bp -f SRR5032340_1.fastq.gz -o groot-db_100bp_graphs -y groot-db_100bp_align.log > groot-align.bam
groot report -i groot-align.bam -c 0.95 -y groot-db_100bp_rep.log > groot-report.tab

This identifies 90 different AMR genes, with counts hovering in the thousands for many of them. However, if I repeat the same procedure above but using the average read length of 97 instead of 100 for indexing and aligning (-l option), I only get 17 AMR genes, each with very low counts (100-200 reads).

I didn't expect that such a small difference in the read length option would result in such drastic differences. Do you have any idea why this would be the case and what would be the most appropriate approach to deal with this inconsistency? I am running Groot v0.7.1. I haven't yet tried the --containment option with the new version as I was having the same issues raised by another user regarding large memory requirements.

Many thanks in advance for your help.
Alex

GROOT and long reads

Hi, I'm currently testing GROOT on Nanopore data and am getting this error --> panic: k size is greater than sequence length (31 vs 1). Please let me know if GROOT is compatible with Nanopore data and if so, how to fix this error. Thanks!

Now in brew

brew install brewsci/bio/groot

groot-db output issues

Hi!

I am exploring the option of using groot-db as it combines 3 widely used AMR databases.
I believe that this is a very good idea, but I am facing some issues that I think would be good to address.

Here is a typical output that I have from my results:

C_RESFINDER__erm(F)3_M17808 211 801 762M39D
groot-db_CARD__gb|GQ342996|+|797-1793|ARO:3003097|CfxA6 346 997 38D948M11D
groot-db_ARGANNOT_(Bla)cfxA6:GQ342996:798-1793:966 346 996 38D948M10D
groot-db_RESFINDER__tet(Q)4_Z21523 194 1926 12D1850M64D
groot-db_ARGANNOT_(Tet)TetQ:Z21523:362-2287:1926 197 1974 1910M64D

It is clear that entries 2-3 and 4-5 are duplicates: the same gene (maybe a different allele?) is presented twice in the report. This makes parsing and summarizing the results quite tricky.
Can you see any way to tackle that?

Also, the format of each entry depends on the database of origin, so the first column is different for CARD, ARGANNOT and RESFINDER. This is also a bit confusing and difficult to handle.
Do you think that you could homogenize that? If not, could you give a description of the format for each DB in the report files?

Please let me know what you think.
Thank you in advance
Leonardos

Encountered error: gob: encoder: message too big

Hi!

Unfortunately having some trouble running groot. Based on the log file (below), it appears to fail when trying to save the index. I ran:

groot get --database groot-db --out /db/groot/groot-db/ -y /db/groot/groot-db.log
groot index -i /db/groot/groot-db/groot-db.90 --readLength 125 --processors 30

2018/10/12 12:23:18 i am groot (version 0.8.1)                                                    
2018/10/12 12:23:18 starting the index subcommand                                                 
2018/10/12 12:23:18 checking parameters...                                                        
2018/10/12 12:23:18     indexing scheme: lshForest                                                
2018/10/12 12:23:18     processors: 30                                                            
2018/10/12 12:23:18     k-mer size: 7                                                             
2018/10/12 12:23:18     signature size: 128                                                       
2018/10/12 12:23:18     read length (window size): 125                                            
2018/10/12 12:23:18     number of MSA files found: 1156                                           
2018/10/12 12:23:18 building groot graphs...                                                      
2018/10/12 12:23:20     number of groot graphs built: 1156                                        
2018/10/12 12:23:20 windowing graphs and generating MinHash signatures...                         
2018/10/12 12:23:39     number of signatures generated: 3525379                                   
2018/10/12 12:23:39 running LSH Forest...                                                         
2018/10/12 12:23:53     number of hash functions per bucket: 128                                  
2018/10/12 12:23:53     number of buckets: 1                                                      
2018/10/12 12:23:53 saving index files to "./groot-index-20181012122318"...                       
2018/10/12 12:24:07 encountered error: gob: encoder: message too big

Building custom database, memory overconsumption.

Hello,
I'm trying to index a custom database of 35 .msa files weighing about 71 MB.
My issue is that for some reason it takes an incredible amount of memory to build. I'm talking 2 TB of RAM. Well, it took 2 TB before being killed by the system, so I have not actually managed to build the db yet. At first I thought it might be related to the number of CPUs, as I was running with 100 or so. I am now running the same command with only 1 CPU and at the moment 360 GB of RAM is in use.
The memory overconsumption happens at the following step:
windowing graphs and generating MinHash signatures.
Do you have any clue?
Best,
Seb

Could not unpack the tarball after downloading the arg-annot database

Hi. Thanks for the great work with groot! Also, thanks so much for the detailed and comprehensive documentation. Very much appreciated!

I am using version 0.7 that I installed with conda from the bioconda channel, and I run it in a conda environment that I created for it. I am running into the following issue when I want to use the get command. It is not a big deal since the arg-annot database is downloaded and I can unpack the tarball manually, but I just thought I would let you know about this. Thank you!

downloading the pre-clustered arg-annot database...
unpacking...
could not unpack the tarball
arg-annot.90/: illegal file path

Interpreting output plots

A coverage plot is expected to have a peak in the middle with lower coverage at the 5' and 3' ends, due to the difficulty of seeding at the ends. But how would you interpret 'valleys' and heavily skewed peaks? My guess is that a valley indicates a gene with an insertion in the middle and that a heavy skew indicates a duplicated region.

Get the database error "groot get -d arg-annot"

Hi, thanks for this great tool!
After running source activate grootTutorial,
when I run "groot get -d arg-annot"
it shows the error message
"downloading the pre-clustered arg-annot database...
unpacking...
could not unpack the tarball
arg-annot.90/: illegal file path"
or
"downloading the pre-clustered arg-annot database...
could not download the tarball
Get https://raw.githubusercontent.com/will-rowe/groot/master/db/clustered-ARG-databases/arg-annot.90.tar: dial tcp 151.101.228.133:443: i/o timeout"

When I just use "conda install groot & groot get -d arg-annot", no error is shown.
Could anyone help me? Thanks a lot.

Trimming and read length?

Hi,
I contact you because I am having some trouble running Groot.

My data:
I have a collection of metagenomic samples and I have qc'ed my paired-end shotgun sequences using the Sunbeam pipeline. For the AMR detection, I have created a Groot database, based on the Megares database, which detects quite a bit of AMR genes in my samples.

My problem:
Since I have access to the SAGA compute cluster, I use a slurm script to process the samples with Groot and my megares databases, and there i have something funny happening...
Most of my samples are analyzed, and I get the output, but for four samples I have jobs failing, and it seems that it is due to the average read length of those samples. Since I have paired end data, I run each read set with align and then I combined the bam files to generate the final output file. (reason was that Groot used a lot of memory, when running the R1 and R2 files together)

The command that I used is the following

groot align -i megares_db \
     -f fasta_file_R1.fastq \
     -y file_R1.align.log \
      -l 150 \
     - p 10
     -o file_R1.groot-graphs > file_R1.bam

groot align -i megares_db \
     -f fasta_file_R2.fastq \
     -y file_R2.align.log \
      -l 150 \
     - p 10
     -o file_R2.groot-graphs > file_R2.bam

For some of my datasets, they crash when running the second command. When I check those datasets, they had an average read length below 140 for the reverse reads but not the forward reads. My database was indexed using a length of 150.

So my question is, why do I want to set the read length for the align step, or should I only do that in conjunction with the --trim option? Long reads are better for classification, than short ones, I would think.

And what happens during trimming? Will it only remove low quality bases, or does it also throw away reads that are too short?

Build problem with go-metro while executing the tests

Hello,

I am trying to update the Bioconda recipe for Groot to add Linux ARM64 support.
But go test fails with this error (on both x86_64 and aarch64):

+ go test -v ./...
09:01:34 BIOCONDA INFO (OUT) go: downloading github.com/adam-hanna/arrayOperations v0.2.6
09:01:34 BIOCONDA INFO (OUT) # github.com/dgryski/go-minhash
09:01:34 BIOCONDA INFO (OUT) gopath/pkg/mod/github.com/dgryski/[email protected]/minwise_test.go:6:2: no required module provides package github.com/dgryski/go-metro; to add it:
09:01:34 BIOCONDA INFO (OUT) 	go get github.com/dgryski/go-metro
09:01:34 BIOCONDA INFO (OUT) FAIL	github.com/dgryski/go-minhash [setup failed]

Any hints as to what the problem could be?
I have no experience with Go.

Thank you!

protein database maybe More conservative than DNA

Hi,
GROOT is a very interesting method that uses variation graphs to detect AMR subtypes.
Considering how conserved proteins are, a protein database may be better than a DNA one.
Another question: how can I build my own database index from a FASTA file?
Thanks

Could the "groot" be used for the paired-end reads?

Could the "groot" be used for the paired-end reads? Tutorial does not show how to use paired-end reads data.

Card db

Hi,

I'm trying to get the CARD database, but I have the following output:

(groot) kayobianco@arqueas:~$ groot get -d card
downloading the pre-clustered card database...
unpacking...
could not unpack the tarball
md5sum for downloaded tarball did not match record

Can you help me?

out of memory error ( > 256 GB)

This may be linked to issue #31

I ran groot version 0.8.6 using the groot-db and had an out of memory error during read alignment. This is with a pair of fastq.gz files of 2x 1 GB.
My computer has 256 GB of RAM.

These are the commands I used for index creation and alignment:

 groot get -d groot-db -i 90 -o .

 groot index -i  groot-db.90/ -o groot-index-groot-db -p 8 -l 100

 groot align -i groot-index-groot-db -f S1_R1.fastq.gz,S1_R2.fastq.gz -p 20  > S1_res_groot.bam

Which results in:


fatal error: runtime: out of memory

runtime stack:
runtime.throw(0xb36987, 0x16)
        /usr/local/go/src/runtime/panic.go:774 +0x72
runtime.sysMap(0xfc1c000000, 0x4000000, 0x12bfff8)
        /usr/local/go/src/runtime/mem_linux.go:169 +0xc5
runtime.(*mheap).sysAlloc(0x12a5ec0, 0x2000, 0x3f, 0xb20)
        /usr/local/go/src/runtime/malloc.go:701 +0x1cd
runtime.(*mheap).grow(0x12a5ec0, 0x1, 0xf9ffffffff)
        /usr/local/go/src/runtime/mheap.go:1255 +0xa3
runtime.(*mheap).allocSpanLocked(0x12a5ec0, 0x1, 0x12c0008, 0xe124e0)
        /usr/local/go/src/runtime/mheap.go:1170 +0x266
runtime.(*mheap).alloc_m(0x12a5ec0, 0x1, 0xfae75c0008, 0x0)
        /usr/local/go/src/runtime/mheap.go:1022 +0xc2
runtime.(*mheap).alloc.func1()
        /usr/local/go/src/runtime/mheap.go:1093 +0x4c
runtime.systemstack(0xc0000d0480)
        /usr/local/go/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
        /usr/local/go/src/runtime/proc.go:1146

Groot containment searching

Hi!

I have a metagenomic dataset that after QC contains reads of variable length.
Can you please explain to me how the containment searching in the index command works?
If I understand correctly that is the parameter I should use for my index command.

How can I determine what number I should give to that argument?

Thank you in advance.
Leonardos
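As background to the question, the difference between Jaccard similarity and containment can be shown on exact k-mer sets. This is only an illustration of the two measures, not GROOT's implementation (GROOT uses MinHash signatures and an LSH index rather than exact sets), and the sequences, k value and function names are invented for the example.

```go
package main

import "fmt"

// kmerSet returns the set of distinct k-mers in seq.
func kmerSet(seq string, k int) map[string]struct{} {
	set := map[string]struct{}{}
	for i := 0; i+k <= len(seq); i++ {
		set[seq[i:i+k]] = struct{}{}
	}
	return set
}

// shared counts the k-mers present in both sets.
func shared(a, b map[string]struct{}) int {
	n := 0
	for kmer := range a {
		if _, ok := b[kmer]; ok {
			n++
		}
	}
	return n
}

// jaccard returns |A ∩ B| / |A ∪ B|.
func jaccard(a, b map[string]struct{}) float64 {
	inter := shared(a, b)
	union := len(a) + len(b) - inter
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}

// containment returns |A ∩ B| / |A|, the fraction of a's k-mers found in b.
func containment(a, b map[string]struct{}) float64 {
	if len(a) == 0 {
		return 0
	}
	return float64(shared(a, b)) / float64(len(a))
}

func main() {
	window := kmerSet("ACGTACGTTGCA", 3) // a graph window
	read := kmerSet("ACGTACG", 3)        // a shorter read drawn from that window
	fmt.Printf("jaccard=%.2f containment=%.2f\n",
		jaccard(read, window), containment(read, window))
}
```

A read that is shorter than the window scores a low Jaccard similarity even when it matches perfectly, whereas its containment in the window stays high. This is the reason containment search is better suited to reads of variable length.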

how to select reasonable ARG result ?

Hi,
I ran GROOT on a metagenomic sample using the CARD database (version 3.0.8), with index -w 75 and report -c 0.95, and there are a great many hits.
The results are shown below:
GeneName ReadsCount GeneLength CIGAR DL ML ML_Rate
Name:ADC-1 151 1152 2D718M23D278M15D109M7D 47 1105 0.95920
Name:ADC-25 153 1152 2D730M11D278M2D122M7D 22 1130 0.98090
Name:ADC-30 163 1152 2D730M11D402M7D 20 1132 0.98264
Name:ADC-31 135 1152 2D573M13D144M11D278M2D122M7D 35 1117 0.96962
Name:ADC-56 155 1152 2D510M6D214M11D402M7D 26 1126 0.97743
Name:ADC-61 145 1152 2D573M13D144M11D402M7D 33 1119 0.97135
Name:ADC-67 158 1152 2D730M11D383M26D 39 1113 0.96615
Name:ADC-73 169 1152 2D1143M7D 9 1143 0.99219
Name:ADC-75 156 1152 2D730M11D92M15D295M7D 35 1117 0.96962
Name:ADC-82 162 1152 2D718M23D402M7D 32 1120 0.97222
Name:OXA-23 279 822 13D806M3D 16 806 0.98054
Name:OXA-27 207 822 13D142M9D117M2D455M3D78M3D 30 792 0.96350
Name:OXA-49 242 825 13D429M3D377M3D 19 806 0.97697
Name:OXA-66 109 825 7D810M8D 15 810 0.98182
Name:OXA-73 253 822 13D725M3D78M3D 19 803 0.97689
Name:OXA-76 105 825 7D799M19D 26 799 0.96848
Name:OXA-79 96 825 7D656M11D143M8D 26 799 0.96848
Name:OXA-80 101 825 7D366M21D423M8D 36 789 0.95636
Name:OXA-83 101 825 7D366M21D423M8D 36 789 0.95636
Name:OXA-84 94 825 7D366M21D109M5D309M8D 41 784 0.95030
Name:OXA-109 101 825 7D366M21D423M8D 36 789 0.95636
Name:OXA-127 100 825 7D483M18D309M8D 33 792 0.96000
Name:OXA-146 279 825 13D809M3D 16 809 0.98061
Name:OXA-165 267 822 13D806M3D 16 806 0.98054
Name:OXA-166 265 822 13D806M3D 16 806 0.98054
Name:OXA-168 268 822 13D678M1D127M3D 17 805 0.97932
Name:OXA-169 257 822 13D117M13D676M3D 29 793 0.96472
Name:OXA-170 257 822 13D605M6D195M3D 22 800 0.97324
Name:OXA-177 90 825 7D483M18D88M1D220M8D 34 791 0.95879
Name:OXA-202 101 825 7D366M21D423M8D 36 789 0.95636
Name:OXA-206 101 825 7D366M21D423M8D 36 789 0.95636
Name:OXA-225 268 822 13D657M5D144M3D 21 801 0.97445
Name:OXA-234 96 825 7D656M11D143M8D 26 799 0.96848
Name:OXA-239 242 822 13D309M4D331M18D144M3D 38 784 0.95377
Name:OXA-254 101 825 7D366M21D423M8D 36 789 0.95636
Name:OXA-260 108 825 21D796M8D 29 796 0.96485
Name:OXA-336 99 825 7D496M17D297M8D 32 793 0.96121
Name:OXA-398 268 822 13D657M5D144M3D 21 801 0.97445
Name:OXA-422 257 822 38D781M3D 41 781 0.95012
Name:OXA-435 268 822 13D678M1D127M3D 17 805 0.97932
Name:OXA-440 267 822 13D655M7D144M3D 23 799 0.97202
Name:OXA-482 241 822 13D102M6D698M3D 22 800 0.97324
There are too many ADC and OXA gene subtypes.
So many subtypes of one gene in a single clinical sample seems unreasonable, so how should I select the reasonable results from the raw GROOT output?
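The DL, ML and ML_Rate values in the results above appear to be the summed D and M lengths of each CIGAR string and the matched fraction of the gene. A sketch of that calculation follows; cigarTotals is a hypothetical helper, and this is not how groot report itself scores coverage.

```go
package main

import (
	"fmt"
	"strconv"
)

// cigarTotals sums the lengths of the M (match) and D (deletion) operations
// in a CIGAR string such as "2D1143M7D". Other operations are ignored.
func cigarTotals(cigar string) (matched, deleted int, err error) {
	start := 0
	for i := 0; i < len(cigar); i++ {
		c := cigar[i]
		if c >= '0' && c <= '9' {
			continue
		}
		n, convErr := strconv.Atoi(cigar[start:i])
		if convErr != nil {
			return 0, 0, fmt.Errorf("bad CIGAR %q at offset %d", cigar, i)
		}
		switch c {
		case 'M':
			matched += n
		case 'D':
			deleted += n
		}
		start = i + 1
	}
	if start != len(cigar) {
		return 0, 0, fmt.Errorf("bad CIGAR %q: trailing digits", cigar)
	}
	return matched, deleted, nil
}

func main() {
	m, d, err := cigarTotals("2D1143M7D")
	if err != nil {
		panic(err)
	}
	fmt.Printf("matched=%d deleted=%d fraction=%.5f\n", m, d, float64(m)/float64(m+d))
}
```

Ranking the candidate subtypes of one gene family by this matched fraction is one simple way to shortlist the best supported variants, although it does not resolve reads shared between closely related alleles.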

Invalid filename chars in plot pngs

Recently encountered this in groot 0.8.3 from conda, using the groot-db:

panic: open ./groot-plots/coverage-for-groot-db_RESFINDER__tet(0/32/0)_7_FP929050.png: no such file or directory          
                                                                                                                          
goroutine 1 [running]:                                                                                                    
github.com/will-rowe/groot/src/reporting.(*BAMreader).Run(0xc4201bc900)                                                   
        /opt/conda/conda-bld/groot_1540475137626/work/src/github.com/will-rowe/groot/src/reporting/reporting.go:191 +0xf7c
github.com/will-rowe/groot/cmd.runReport()                                                                                
        /opt/conda/conda-bld/groot_1540475137626/work/src/github.com/will-rowe/groot/cmd/report.go:140 +0x2a3             
github.com/will-rowe/groot/cmd.glob..func7(0x1036140, 0xc4201dc090, 0x0, 0x9)                                             
        /opt/conda/conda-bld/groot_1540475137626/work/src/github.com/will-rowe/groot/cmd/report.go:56 +0x20               
github.com/spf13/cobra.(*Command).execute(0x1036140, 0xc4201dc000, 0x9, 0x9, 0x1036140, 0xc4201dc000)                     
        /opt/conda/conda-bld/groot_1540475137626/work/src/github.com/spf13/cobra/command.go:766 +0x2c1                    
github.com/spf13/cobra.(*Command).ExecuteC(0x10363a0, 0x1, 0x1, 0x1036600)                                                
        /opt/conda/conda-bld/groot_1540475137626/work/src/github.com/spf13/cobra/command.go:852 +0x30a                    
github.com/spf13/cobra.(*Command).Execute(0x10363a0, 0xc420032478, 0x0)                                                   
        /opt/conda/conda-bld/groot_1540475137626/work/src/github.com/spf13/cobra/command.go:800 +0x2b                     
github.com/will-rowe/groot/cmd.Execute()                                                                                  
        /opt/conda/conda-bld/groot_1540475137626/work/src/github.com/will-rowe/groot/cmd/root.go:64 +0x2d                 
main.main()                                                                                                               
        /opt/conda/conda-bld/groot_1540475137626/work/src/github.com/will-rowe/groot/main.go:26 +0x20

I think it's due to the forward slashes in the filename it attempts to write to.
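If the slashes are the cause, replacing path separators in the reference name before building the file name would avoid the panic. A minimal sketch, assuming that is where the name comes from; sanitizeFilename and the choice of replacement character are illustrative and not taken from GROOT.

```go
package main

import (
	"fmt"
	"strings"
)

// unsafe replaces characters that cannot appear in a file name on common
// platforms (path separators, plus ':' and '|') with an underscore.
var unsafe = strings.NewReplacer("/", "_", "\\", "_", ":", "_", "|", "_")

// sanitizeFilename returns name with unsafe characters replaced.
func sanitizeFilename(name string) string {
	return unsafe.Replace(name)
}

func main() {
	ref := "groot-db_RESFINDER__tet(0/32/0)_7_FP929050"
	fmt.Println("coverage-for-" + sanitizeFilename(ref) + ".png")
}
```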

Out-of-memory failure v0.8.6

With version 0.8.6, my align process is failing due to memory. The same command with the same fastq file size worked before with v0.8.5 and the resfam90 index. I am now using an index of CARD and running 0.8.6 with 8 threads and as much memory as I am allowed to allocate on my computing cluster.

checksum error for CARD

Issue raised via email:

I was trying to download the card database for groot with the command  groot get –d card, but got the following error message:
 
downloading the pre-clustered card database...
unpacking...
could not unpack the tarball
md5sum for downloaded tarball did not match record
 
Downloading groot-db and the default arg-annot was fine. Perhaps something worth looking into?

encountered error: mean read length is outside the graph window size (+/- 10 bases)

Sorry for bothering you again.
Command line: gunzip -c *.fq.gz | groot align -i /groot-index -p 84

2019/09/10 02:56:32 i am groot (version 0.8.5)
2019/09/10 02:56:32 starting the align subcommand
2019/09/10 02:56:32 checking parameters...
2019/09/10 02:56:32 input file: using STDIN
2019/09/10 02:56:32 processors: 84
2019/09/10 02:56:32 read trimming: disabled
2019/09/10 02:56:32 maximum clipped bases allowed: 5
2019/09/10 02:56:32 loading index information...
2019/09/10 02:56:32 k-mer size: 7
2019/09/10 02:56:32 signature size: 128
2019/09/10 02:56:32 Jaccard similarity theshold: 0.99
2019/09/10 02:56:32 window sized used in indexing: 100
2019/09/10 02:56:32 loading the groot graphs...
2019/09/10 02:56:32 number of variation graphs: 583
2019/09/10 02:56:32 loading the MinHash signatures...
2019/09/10 02:56:38 number of hash functions per bucket: 128
2019/09/10 02:56:38 number of buckets: 1
2019/09/10 02:56:38 initialising alignment pipeline...
2019/09/10 02:56:38 initialising the processes
2019/09/10 02:56:38 connecting data streams
2019/09/10 02:56:38 number of processes added to the alignment pipeline: 6
2019/09/10 02:56:38 now streaming reads...
2019/09/10 03:03:41 number of reads received from input: 72841016
2019/09/10 03:03:41 mean read length: 150
2019/09/10 03:03:41 encountered error: mean read length is outside the graph window size (+/- 10 bases)

panic: k size is greater than sequence length (31 vs 29)

Hi, I have been using Groot for about a month now. I am now seeing an error like the one below, which I don't understand. Could you help me?
Also, I know that my data contains a small number of short, low quality sequences. In the meantime I will try with filtered reads.

goroutine 1191 [running]:
github.com/will-rowe/groot/src/pipeline.(*theBoss).mapReads.func1(0xc04a4f2910, 0xc00335b0a0, 0x0)
/Users/willrowe/Desktop/groot/src/pipeline/boss.go:165 +0x5f7
created by github.com/will-rowe/groot/src/pipeline.(*theBoss).mapReads
/Users/willrowe/Desktop/groot/src/pipeline/boss.go:136 +0x304
panic: k size is greater than sequence length (31 vs 29)