genome-bin-tools's People

Contributors

kbseah

genome-bin-tools's Issues

Documentation fix

From Liz:
I don't know where to post this, but the documentation of your script for the fastg fishing is missing a parameter in the section:
perl fastg_paths_fishing.pl -g <assemblygraph.fastg> -p <scaffolds.paths>
-o -b -i -r
(line 9 and 10)
it should also incorporate -s <scaffolds.fasta>

temp file names

Temp files are named tmp.$filename, which makes it impossible to pass file names that include a path. Use a temp file generator instead.
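A minimal sketch of the temp file generator idea, written in Python for illustration only (the scripts in this repository are Perl, where the core module File::Temp serves the same purpose). The helper name `make_temp_output` and the `.fishing_output` suffix are assumptions made for this example, not part of genome-bin-tools.

```python
import os
import tempfile


def make_temp_output(input_path, suffix=".fishing_output"):
    """Create a unique temp file for an input file given with any path.

    Only the base name of input_path goes into the temp file name, so
    directory separators never end up inside it.
    """
    base = os.path.basename(input_path)
    # mkstemp creates the file atomically and returns (fd, absolute path)
    fd, tmp_path = tempfile.mkstemp(prefix=f"tmp.{base}.", suffix=suffix)
    os.close(fd)  # the caller reopens the file by name
    return tmp_path


# Hypothetical input path, used only to show the behaviour
tmp_path = make_temp_output("assemblies/run1/scaffolds.fasta")
print(tmp_path)
os.remove(tmp_path)  # clean up when done
```

Because the generator picks a unique name in the system temp directory, two runs on the same input no longer overwrite each other's temp files either.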

Header names sometimes don't match in SPAdes scaffolds.paths vs. scaffolds.fastg

This is a problem for Fastg connectivity binning because header names are expected to match exactly.
The script fastgFishing.pl works fine because it only checks the NODE numbers. However, its output is based on the headers given in the scaffolds.paths file, not the scaffolds.fasta file. Check how they are hashed and report the names from the scaffolds.fasta file!
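One way to report the scaffolds.fasta names could look like the sketch below: index the fasta headers by NODE number, then look up each scaffolds.paths header by the same number. This is a Python illustration, not the Perl script's actual logic. The function names and the sample headers are made up for the example, and the pattern assumes headers begin with `NODE_<number>_`.

```python
import re

# Assumption: headers start with "NODE_<number>_", optionally after ">"
NODE_RE = re.compile(r"^>?NODE_(\d+)_")


def node_number(header):
    """Return the NODE number of a header as int, or None if absent."""
    match = NODE_RE.match(header.strip())
    return int(match.group(1)) if match else None


def map_paths_to_fasta(fasta_headers, paths_headers):
    """Map each scaffolds.paths header to the fasta header with the same NODE number."""
    by_node = {}
    for header in fasta_headers:
        number = node_number(header)
        if number is not None:
            by_node[number] = header.strip().lstrip(">")
    # Headers without a match (or without a NODE number) map to None
    return {p: by_node.get(node_number(p)) for p in paths_headers}


# Illustrative headers only; real files may differ in detail
fasta = [">NODE_1_length_5000_cov_12.5", ">NODE_2_length_300_cov_3.1"]
paths = ["NODE_1_length_5000_cov_12.500000", "NODE_1_length_5000_cov_12.500000'"]
for paths_name, fasta_name in map_paths_to_fasta(fasta, paths).items():
    print(paths_name, "->", fasta_name)
```

Keying on the integer NODE number sidesteps differences in the rest of the header text, and output can then always use the name found in scaffolds.fasta.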

Error in tabulate(taxon) : 'bin' must be numeric or a factor

Hi,

I have got the below error when attempting to use gbtools to generate a GC-coverage plot. Any help here would be appreciated

library(gbtools)
d <-gbt(covstats="INT_metagenome.coverage",mark="INT_metagenome_taxonomy.tab",marksource="blobology")
plot(d)
Error in tabulate(taxon) : 'bin' must be numeric or a factor

In case you would like to inspect, the input files that I used here will be available at this link for the next 7 days

gbt() dies if marksource= not supplied

Fix the error catching

> d <- gbt(covstats="example_data/HPminus.coverage",mark="example_data/phylotype.result.parsed")
gbtools WARNING: marksource not supplied.
Error in table(markTab$source) : object 'markTab' not found

whitespace in contig names

Amphora2 breaks contig names on the first whitespace, whereas Bbmap does not. (Can't remember what Barrnap does.) The Megahit assembler produces contig names with spaces.

This leads to contig/scaffold names not being recognized when importing annotations into a gbt object.

Either "fix" the covstats file from Bbmap, or split contig names on the first whitespace when importing the gbt object.
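Both options come down to the same operation: keep only the part of the contig name before the first whitespace. A rough Python sketch follows. The function names are hypothetical, the sample line is illustrative, and the covstats handling assumes a tab-delimited file with the contig name in the first column and header lines starting with `#`.

```python
def short_name(contig_name):
    """Return the contig name truncated at the first whitespace."""
    parts = contig_name.split(None, 1)
    return parts[0] if parts else ""


def fix_covstats_line(line):
    """Truncate the name in the first column of one covstats line.

    Assumes tab-delimited columns; lines starting with '#' are
    treated as headers and returned unchanged.
    """
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    fields[0] = short_name(fields[0])
    return "\t".join(fields) + "\n"


# Illustrative line with a space-containing contig name
line = "k141_1 flag=1 multi=2.0000 len=312\t10.5\t312\n"
print(fix_covstats_line(line), end="")
```

Applying the same truncation to every input at import time would make the names agree regardless of which tool produced them.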

improve input file checker

  • Document with Pod
  • Output fixed files for common errors (comment char in covstats file, special chars that jam up R)

`fastgFishing()` multiple issues

fastgFishing should not be defined as a method for class gbtbin. Call to Perl script throws an error? Check if error can be reproduced.

Alternative to Log::Message::Simple

This module is used in the input_validator.pl script. However, it is no longer distributed with core Perl after version 5.20, so we should find an alternative that is also backward compatible; if there is none, just use standard 'print'.

alternative to Amphora2

Simplest idea: HMM profiles for marker genes that are then blastp vs. ncbi-nr to get taxonomy string

implement taxon highlighting as plot parameter

To highlight only a single taxon in a plot, we currently need an 'unhealthy workaround': defining a bin by taxon, then overlaying it on the plot with points.gbtbin(). We should implement a simple parameter in the plot.gbt function, called something like "highlightTaxon".

Empty scaffold list files

Normally I run this code with no issues at all. I use GBTools to interactively/manually delineate my bins. However, for some reason it is saving an empty (1 byte) list file; i.e., my bin.connected.scaffolds.list file is created but is only 1 byte and does not actually contain a list. I'm not sure what is wrong or what to do. My imported files (assembly_graph.fastg, scaffolds.paths and scaffolds.fasta) are 84.5MB, 7.2MB, and 39.1MB respectively, and the differential coverage and GC plots are created successfully, so I don't think it's a problem with the files the code is calling on.

I tried updating my R (to 4.0.0) to see if this would resolve the issue, but that created a whole bunch of new problems (I believe with incompatibilities with package versions, but I am new to R so not sure. All I know is it was very irksome to troubleshoot, so I went back to version 3.5.2)

For the part of the code that gives me issues:

> bin <- choosebin(d,slice=1,save=TRUE,file="bin.scaffolds.list",num.points=15)
Loading required package: sp

bin.connected <- fastgFish(d,bin,fastg.file="assembly_graph.fastg",paths.file="scaffolds.paths",fasta.file="scaffolds.fasta",script.path="/Users/mhauer32/Documents/Beinart_Lab/Marianas_Project/Genome_assembly_final/Spades2_5k/fastg_paths_fishing.pl",depth=1,save=TRUE,file="bin.connected.scaffolds.list")
Warning messages:
1: In system2(command, command.params, input = as.character(bin$scaff$ID), :
running command ''perl' /Users/mhauer32/Documents/Beinart_Lab/Marianas_Project/Genome_assembly_final/Spades2_5k/fastg_paths_fishing.pl -g assembly_graph.fastg -p scaffolds.paths -s scaffolds.fasta -o /tmp/tmp.fishing_output -b - -i 1 -r 2>/dev/null < '/var/folders/qx/zjx9f11d0tjc0qdf8x5c08r40000gn/T//RtmpXB6VHZ/file12caf2980a940'' had status 2
2: In max(scaff.subset$Length) :
no non-missing arguments to max; returning -Inf
3: In min(scaff.subset$Length) :
no non-missing arguments to min; returning Inf

For the entire code:

> d <- gbt(covstats=c("Mar13_symbiont3_5k.coverage.mod", "Mar11_symbiont3_5k.coverage.mod"),mark="phylotype.result.parsed.mod",marksource="amphora2",ssu="Mar13_symbiont3_5k.ssu.tab.mod")

col <- data.frame(c("Gammaproteobacteria"),c("red"))
plot(d,slice=c(1,2),ssu=TRUE,trna=TRUE,markCustomPalette=col,taxonLevel="Class",legend=TRUE)
plot(d,ssu=TRUE,trna=TRUE,markCustomPalette=col,taxonLevel="Class",legend=TRUE)
pdf("DiffCov.pdf")
plot(d,slice=c(1,2),ssu=TRUE,trna=TRUE,markCustomPalette=col,taxonLevel="Class",markCutoff=1.0,legend=TRUE)
dev.off()
quartz
2
pdf("GCCov.pdf")
plot(d,ssu=TRUE,trna=TRUE,markCustomPalette=col,taxonLevel="Class",markCutoff=1.0,legend=TRUE)
dev.off()
quartz
2
bin <- choosebin(d,slice=1,save=TRUE,file="bin.scaffolds.list",num.points=15)
Loading required package: sp
bin.connected <- fastgFish(d,bin,fastg.file="assembly_graph.fastg",paths.file="scaffolds.paths",fasta.file="scaffolds.fasta",script.path="/Users/mhauer32/Documents/Beinart_Lab/Marianas_Project/Genome_assembly_final/Spades2_5k/fastg_paths_fishing.pl",depth=1,save=TRUE,file="bin.connected.scaffolds.list")
Warning messages:
1: In system2(command, command.params, input = as.character(bin$scaff$ID), :
running command ''perl' /Users/mhauer32/Documents/Beinart_Lab/Marianas_Project/Genome_assembly_final/Spades2_5k/fastg_paths_fishing.pl -g assembly_graph.fastg -p scaffolds.paths -s scaffolds.fasta -o /tmp/tmp.fishing_output -b - -i 1 -r 2>/dev/null < '/var/folders/qx/zjx9f11d0tjc0qdf8x5c08r40000gn/T//RtmpXB6VHZ/file12caf2980a940'' had status 2
2: In max(scaff.subset$Length) :
no non-missing arguments to max; returning -Inf
3: In min(scaff.subset$Length) :
no non-missing arguments to min; returning Inf

the programs I installed to run this code (package, version, built):

sp : 1.4-1 3.5.2

plyr : 1.8.6 3.5.2

devtools : 2.3.0 3.5.2

rlang : 0.4.3 3.5.2

gbtools : 2.6.0 3.5.2

make importBins() import as a list of gbtbin objects

To keep the workspace tidy, especially when there are a lot of bins, give option to import multiple bins into a list of gbtbin objects, rather than individual gbtbin objects. This also makes it easier to do multiBinPlot...

Coverage-GC plot in R error: 'legend' is of length 0

Hello,

I am back again (sorry).
I have made coverage tables, taxonomy tables and ssu tables for multiple samples and have imported them individually into R using the code specified in the workflow. Plotting of 2 samples worked successfully, but for the third one I am getting an error.

The data was imported successfully using the following code:

sample_59 <- gbt(covstats="U:\\linux_shared_data\\infant_genome_binning\\assembly_coverages\\S59.coverage", mark="U:\\linux_shared_data\\infant_genome_binning\\assembly_taxonomy_tables\\S59_contig_taxonomy.tab", marksource="blobology", ssu="U:\\linux_shared_data\\infant_genome_binning\\assembly_ssu_tables\\S59.ssu.tab")

Checking with summary(sample_59) and gbt_checkinput also shows no errors.

When plotting with code:
plot(sample_59,taxon="Family",ssu=TRUE,textlabel=TRUE,legend=TRUE,main="Coverage-GC plot of isolate")

The error is

Error in legend("topright", legend = colorframe$taxon, cex = 0.6, fill = as.character(colorframe$colors)) : 'legend' is of length 0

The data is plotted correctly but all datapoints are in grey, with no legend.
I am using RStudio with R 3.6.1.
