kbseah / genome-bin-tools
Interactive tools for metagenome visualization and binning in R
License: GNU General Public License v2.0
From Liz:
I don't know where to post this, but the documentation for your fastg fishing script is missing a parameter in this section:
perl fastg_paths_fishing.pl -g <assemblygraph.fastg> -p <scaffolds.paths>
-o -b -i -r
(lines 9 and 10)
It should also include -s <scaffolds.fasta>
get_ssu_for_genome_bin_tools.pl calls an old version of the phyloFlash database, which is no longer produced by the latest phyloFlash version.
Temp file names of the form tmp.$filename make it impossible to use paths. Use a temp file generator instead.
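The script in question is written in Perl, where the core File::Temp module fills this role. The same idea is sketched below in R, purely as an illustration: prefixing `tmp.` to a name that contains a directory produces a path into a directory that may not exist, whereas a temp-file generator returns a unique path in a directory that does. The input path is made up.

```r
# Hypothetical input path that includes a directory component
infile <- file.path("results", "sample1", "scaffolds.fasta")

# Naive approach: prefixing the whole string mangles the directory part
naive_tmp <- paste0("tmp.", infile)
print(naive_tmp)

# Safer: generate a unique file name inside the session temp directory,
# using only the base name of the input as part of the pattern
safe_tmp <- tempfile(pattern = paste0("tmp.", basename(infile), "."))

# The generated path points into a directory that exists
print(dir.exists(dirname(safe_tmp)))
```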
This is a problem for Fastg connectivity binning because header names are expected to match exactly.
Script fastgFishing.pl works fine because it only checks the NODE numbers; however, the output is based on the headers given in the scaffolds.paths file, not the scaffolds.fasta file. Check how they are hashed and report the names from the scaffolds.fasta file!
Hi,
I got the error below when attempting to use gbtools to generate a GC-coverage plot. Any help would be appreciated.
library(gbtools)
d <-gbt(covstats="INT_metagenome.coverage",mark="INT_metagenome_taxonomy.tab",marksource="blobology")
plot(d)
Error in tabulate(taxon) : 'bin' must be numeric or a factor
In case you would like to inspect them, the input files that I used will be available at this link for the next 7 days.
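The message quoted above is what `tabulate()` produces when it is given a character vector. One possible cause, not confirmed for this report, is that the taxon column was read as character rather than as a factor (since R 4.0.0, `read.table()` no longer converts strings to factors by default). A minimal reproduction with made-up taxon names:

```r
taxon <- c("Proteobacteria", "Firmicutes", "Proteobacteria")

# tabulate() rejects character input; capture the error message as a string
res <- tryCatch(tabulate(taxon), error = function(e) conditionMessage(e))
print(res)

# Converting to a factor first gives one count per level
taxon_factor <- factor(taxon)
counts <- tabulate(taxon_factor)
names(counts) <- levels(taxon_factor)
print(counts)
```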
Fix the error catching
> d <- gbt(covstats="example_data/HPminus.coverage",mark="example_data/phylotype.result.parsed")
gbtools WARNING: marksource not supplied.
Error in table(markTab$source) : object 'markTab' not found
An unpaired quote character (') in tables imported to R will cause an EOF error.
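A common way to avoid this, sketched here with a made-up two-row table rather than gbtools' own import code, is to switch off quote handling in `read.table()` so that an apostrophe is read as ordinary text:

```r
# Made-up annotation table: the first data row contains an unpaired single quote
txt <- "ID\tproduct\ncontig_1\t5'-nucleotidase\ncontig_2\thypothetical protein\n"

# quote = "" disables quote parsing; comment.char = "" keeps any '#' as text
tab <- read.table(text = txt, sep = "\t", header = TRUE,
                  quote = "", comment.char = "", stringsAsFactors = FALSE)
print(tab)
```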
Amphora2 breaks contig names on first whitespace, whereas Bbmap does not. (Can't remember what Barrnap does.) Megahit assembler produces contig names with spaces.
This leads to contig/scaffold names not being recognized when trying to import annotations as gbt object.
Either "fix" the covstats file from Bbmap, or split contig names on first whitespace when importing gbt object.
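The second option can be sketched in a line of R. The header strings below are made-up examples in the style of Megahit and SPAdes output, and this is not the gbtools import code itself:

```r
# Example contig headers (values are invented for illustration)
contig_names <- c("k141_1 flag=1 multi=2.0000 len=300",
                  "NODE_1_length_1000_cov_12.5")

# Keep only the part before the first whitespace character
short_names <- sub("\\s.*$", "", contig_names)
print(short_names)
```

Names without whitespace pass through unchanged, so the same step can be applied to every input table.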
Check if it is working, then update the Readme and docs.
"Left outer join" refers to something else. You mean "left excluding join" or "relative complement".
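The difference between the two operations can be shown with two made-up tables keyed by scaffold ID. This is a generic base R sketch, not code from gbtools:

```r
# Two invented tables sharing an ID column
x <- data.frame(ID = c("s1", "s2", "s3"), length = c(100, 200, 300),
                stringsAsFactors = FALSE)
y <- data.frame(ID = c("s2", "s4"), taxon = c("A", "B"),
                stringsAsFactors = FALSE)

# Left outer join: every row of x, with NA where y has no match
left_outer <- merge(x, y, by = "ID", all.x = TRUE)

# Left excluding join: only the rows of x that have no match in y
left_excluding <- x[!(x$ID %in% y$ID), ]

# The same IDs expressed as a set difference (relative complement of y in x)
only_in_x <- setdiff(x$ID, y$ID)
print(only_in_x)
```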
fastgFishing should not be defined as a method for class gbtbin. Call to Perl script throws an error? Check if the error can be reproduced.
This module is used in the input_validator.pl script. However, it is no longer distributed with core Perl after version 5.20; we should find an alternative that is also backward compatible, or else just use standard 'print'.
Simplest idea: HMM profiles for marker genes that are then blastp vs. ncbi-nr to get taxonomy string
so that annotations from different tools/pipelines can be combined and plotted separately.
generalize GC coloring
Allow the user to choose the number of iterations when using fastgFish(), with the default (0?) simply running iterations to completion (current behavior).
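One way such a parameter could behave is sketched below. The function `expand_nodes`, its arguments, and the toy graph are all hypothetical; this is not how fastgFish() or the Perl script is implemented, only an illustration of "0 means run to completion":

```r
# Expand a set of seed nodes along graph edges.
# adj: named list mapping each node to its neighbours (hypothetical structure)
# max_iter = 0 means "iterate until no new nodes are found"
expand_nodes <- function(adj, seeds, max_iter = 0) {
  found <- unique(seeds)
  iter <- 0
  repeat {
    if (max_iter > 0 && iter >= max_iter) break
    neighbours <- unique(unlist(adj[found], use.names = FALSE))
    new_nodes <- setdiff(neighbours, found)
    if (length(new_nodes) == 0) break
    found <- c(found, new_nodes)
    iter <- iter + 1
  }
  found
}

# Made-up graph: a - b - c - d, plus an unconnected node e
adj <- list(a = "b", b = c("a", "c"), c = c("b", "d"), d = "c",
            e = character(0))

one_step <- expand_nodes(adj, "a", max_iter = 1)
complete <- expand_nodes(adj, "a")
print(one_step)
print(complete)
```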
Need to be able to specify which marker set to use for winnowMark, otherwise if there's more than one marker set, it uses all that match!
To highlight only a single taxon in a plot, we now need to use an 'unhealthy workaround' of defining a bin by taxon, then overlaying it on the plot with points.gbtbin(). Should implement a simple parameter in the plot.gbt function called something like "highlightTaxon".
Fastg fishing script is hardwired to parse headers from SPAdes only.
Would be useful to have output from Megahit and other assemblers supported.
Normally I run this code with no issues at all. I use GBTools to interactively/manually delineate my bins. However, for some reason it is saving an empty (1 byte) list file; i.e., my bin.connected.scaffolds.list file is created but is only 1 byte and does not actually contain a list. I'm not sure what is wrong or what to do. My imported files (assembly_graph.fastg, scaffolds.paths and scaffolds.fasta) are 84.5MB, 7.2MB, and 39.1MB respectively, and the differential coverage and GC plots are created successfully, so I don't think it's a problem with the files the code is calling on.
I tried updating my R (to 4.0.0) to see if this would resolve the issue, but that created a whole bunch of new problems (I believe from incompatibilities between package versions, but I am new to R so I'm not sure). All I know is that it was very irksome to troubleshoot, so I went back to version 3.5.2.
For the part of the code that gives me issues:
> bin <- choosebin(d,slice=1,save=TRUE,file="bin.scaffolds.list",num.points=15)
Loading required package: sp
bin.connected <- fastgFish(d,bin,fastg.file="assembly_graph.fastg",paths.file="scaffolds.paths",fasta.file="scaffolds.fasta",script.path="/Users/mhauer32/Documents/Beinart_Lab/Marianas_Project/Genome_assembly_final/Spades2_5k/fastg_paths_fishing.pl",depth=1,save=TRUE,file="bin.connected.scaffolds.list")
Warning messages:
1: In system2(command, command.params, input = as.character(bin$scaff$ID), :
running command ''perl' /Users/mhauer32/Documents/Beinart_Lab/Marianas_Project/Genome_assembly_final/Spades2_5k/fastg_paths_fishing.pl -g assembly_graph.fastg -p scaffolds.paths -s scaffolds.fasta -o /tmp/tmp.fishing_output -b - -i 1 -r 2>/dev/null < '/var/folders/qx/zjx9f11d0tjc0qdf8x5c08r40000gn/T//RtmpXB6VHZ/file12caf2980a940'' had status 2
2: In max(scaff.subset$Length) :
no non-missing arguments to max; returning -Inf
3: In min(scaff.subset$Length) :
no non-missing arguments to min; returning Inf
For the entire code:
> d <- gbt(covstats=c("Mar13_symbiont3_5k.coverage.mod", "Mar11_symbiont3_5k.coverage.mod"),mark="phylotype.result.parsed.mod",marksource="amphora2",ssu="Mar13_symbiont3_5k.ssu.tab.mod")
col <- data.frame(c("Gammaproteobacteria"),c("red"))
plot(d,slice=c(1,2),ssu=TRUE,trna=TRUE,markCustomPalette=col,taxonLevel="Class",legend=TRUE)
plot(d,ssu=TRUE,trna=TRUE,markCustomPalette=col,taxonLevel="Class",legend=TRUE)
pdf("DiffCov.pdf")
plot(d,slice=c(1,2),ssu=TRUE,trna=TRUE,markCustomPalette=col,taxonLevel="Class",markCutoff=1.0,legend=TRUE)
dev.off()
quartz
2
pdf("GCCov.pdf")
plot(d,ssu=TRUE,trna=TRUE,markCustomPalette=col,taxonLevel="Class",markCutoff=1.0,legend=TRUE)
dev.off()
quartz
2
bin <- choosebin(d,slice=1,save=TRUE,file="bin.scaffolds.list",num.points=15)
Loading required package: sp
bin.connected <- fastgFish(d,bin,fastg.file="assembly_graph.fastg",paths.file="scaffolds.paths",fasta.file="scaffolds.fasta",script.path="/Users/mhauer32/Documents/Beinart_Lab/Marianas_Project/Genome_assembly_final/Spades2_5k/fastg_paths_fishing.pl",depth=1,save=TRUE,file="bin.connected.scaffolds.list")
Warning messages:
1: In system2(command, command.params, input = as.character(bin$scaff$ID), :
running command ''perl' /Users/mhauer32/Documents/Beinart_Lab/Marianas_Project/Genome_assembly_final/Spades2_5k/fastg_paths_fishing.pl -g assembly_graph.fastg -p scaffolds.paths -s scaffolds.fasta -o /tmp/tmp.fishing_output -b - -i 1 -r 2>/dev/null < '/var/folders/qx/zjx9f11d0tjc0qdf8x5c08r40000gn/T//RtmpXB6VHZ/file12caf2980a940'' had status 2
2: In max(scaff.subset$Length) :
no non-missing arguments to max; returning -Inf
3: In min(scaff.subset$Length) :
no non-missing arguments to min; returning Inf
The packages I installed to run this code (package, version, built):
sp : 1.4-1 3.5.2
plyr : 1.8.6 3.5.2
devtools : 2.3.0 3.5.2
rlang : 0.4.3 3.5.2
gbtools : 2.6.0 3.5.2
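The command shown in the warning redirects standard error to /dev/null, so whatever message the Perl script printed before exiting with status 2 is not visible. A generic way to see such messages is to capture both output streams with `system2()` and inspect the `status` attribute. The sketch below uses a stand-in command (an Rscript call that fails on purpose), not the fishing script itself:

```r
# Stand-in for a failing external script: writes to stderr, exits with status 2
rscript <- file.path(R.home("bin"), "Rscript")
args <- c("-e", shQuote('message("boom"); quit(status = 2)'))

# stdout = TRUE and stderr = TRUE return the output as a character vector;
# a non-zero exit code is attached as the "status" attribute
out <- suppressWarnings(system2(rscript, args, stdout = TRUE, stderr = TRUE))

print(out)
print(attr(out, "status"))
```

Running the Perl command from the warning directly in a terminal, without the `2>/dev/null` part, should show the same information.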
parsers for other marker gene tools?
Option to skip mapping step and parse Fasta files from SPAdes, Velvet, etc. which have coverage data in the headers, and also calculate GC% and length from the sequences.
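As a rough sketch of what this could involve, the snippet below reads length and coverage from a header in the SPAdes style `NODE_<n>_length_<len>_cov_<cov>` and computes GC and length from the sequence. The record is made up, the output column names are placeholders, and other assemblers or SPAdes versions may format headers differently:

```r
# Made-up record in the SPAdes naming style
header <- "NODE_1_length_12_cov_35.5"
sequence <- "ATGCGCGCATTA"

# Pull node number, length and coverage out of the header
pattern <- "^NODE_([0-9]+)_length_([0-9]+)_cov_([0-9.]+)"
m <- regmatches(header, regexec(pattern, header))[[1]]
hdr_length <- as.integer(m[3])
hdr_cov <- as.numeric(m[4])

# Compute length and GC fraction directly from the sequence
bases <- strsplit(toupper(sequence), "")[[1]]
seq_length <- length(bases)
gc <- sum(bases %in% c("G", "C")) / seq_length

# One row of a coverage-style table (column names are placeholders)
row <- data.frame(ID = header, coverage = hdr_cov, length = seq_length,
                  gc = gc, stringsAsFactors = FALSE)
print(row)
```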
To keep the workspace tidy, especially when there are a lot of bins, give option to import multiple bins into a list of gbtbin objects, rather than individual gbtbin objects. This also makes it easier to do multiBinPlot...
Hello,
I am back again (sorry).
I have made coverage tables, taxonomy tables and ssu tables for multiple samples and have imported them individually into R using the code specified in the workflow. Plotting of 2 samples worked successfully, but for the third one I am having an error.
The data was imported successfully using the following code:
sample_59 <- gbt(covstats="U:\\linux_shared_data\\infant_genome_binning\\assembly_coverages\\S59.coverage", mark="U:\\linux_shared_data\\infant_genome_binning\\assembly_taxonomy_tables\\S59_contig_taxonomy.tab", marksource="blobology", ssu="U:\\linux_shared_data\\infant_genome_binning\\assembly_ssu_tables\\S59.ssu.tab")
Checking with summary(sample_59) and gbt_checkinput then also shows no errors.
When plotting with code:
plot(sample_59,taxon="Family",ssu=TRUE,textlabel=TRUE,legend=TRUE,main="Coverage-GC plot of isolate")
The error is
Error in legend("topright", legend = colorframe$taxon, cex = 0.6, fill = as.character(colorframe$colors)) : 'legend' is of length 0
The data is plotted correctly but all datapoints are in grey, with no legend.
I am using RStudio with R 3.6.1.
Legend not being generated properly. Introduce a catch for the special case, i.e. markCutoff=1 means include all taxa in legend?
Improve user's documentation of own work by saving function calls in gbtbin objects.
Bin summary should report summary stats of coverage, GC, length, and maybe also variances thereof
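A minimal sketch of such a summary in base R, using a made-up scaffold table with placeholder column names rather than the actual gbtbin object structure:

```r
# Made-up scaffold table for one bin
scaff <- data.frame(coverage = c(10, 12, 14),
                    gc = c(0.40, 0.42, 0.44),
                    length = c(1000, 2000, 3000))

# Mean and variance for each column, returned as a small matrix
bin_stats <- sapply(scaff, function(x) c(mean = mean(x), var = var(x)))
print(bin_stats)

# Length-weighted mean coverage, which can differ from the simple mean
weighted_cov <- weighted.mean(scaff$coverage, scaff$length)
print(weighted_cov)
```

Weighting by length may be the more informative figure for bins dominated by a few long scaffolds.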