miwipe / ngslca

allow for intervals of edit distance and similarity score
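Here is a minimal sketch of interval-based filtering. It assumes inclusive bounds and defines the similarity score as 1 - NM / read length. The function name, the defaults, and that similarity definition are hypothetical and not necessarily what ngsLCA uses.

```python
def passes_filters(nm: int, read_len: int,
                   ed_range=(0, 0), sim_range=(0.0, 1.0)) -> bool:
    """Return True if an alignment falls inside both inclusive intervals.

    Assumption (hypothetical): similarity = 1 - NM / read_len.
    """
    if read_len <= 0:
        return False  # no sequence, so no meaningful similarity
    ed_min, ed_max = ed_range
    sim_min, sim_max = sim_range
    similarity = 1.0 - nm / read_len
    return ed_min <= nm <= ed_max and sim_min <= similarity <= sim_max

# A 50 bp read with NM:i:2 has similarity 0.96 under this definition.
print(passes_filters(2, 50, ed_range=(0, 2), sim_range=(0.95, 1.0)))
print(passes_filters(3, 50, ed_range=(0, 2)))
```

Both intervals must hold at once. A long read with a few mismatches can therefore still pass the similarity bound while failing the edit-distance bound.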

output plots

I think we should add examples of all the different types of plots the Rscript produces.

extract results above species level
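Here is a minimal sketch of keeping only assignments above species level. It assumes each LCA result has already been reduced to a (read, taxon, rank) tuple. The tuple layout and the shortened rank list are hypothetical.

```python
# Hypothetical, simplified rank hierarchy (most general first); real NCBI
# lineages also contain ranks such as "clade", "subfamily" or "no rank".
RANK_ORDER = ["superkingdom", "kingdom", "phylum", "class", "order",
              "family", "genus", "species"]

def above_species(assignments):
    """Keep (read, taxon, rank) tuples whose rank is more general than species.

    Ranks missing from RANK_ORDER are skipped because their depth is unknown.
    """
    species_depth = RANK_ORDER.index("species")
    return [(read, taxon, rank) for read, taxon, rank in assignments
            if rank in RANK_ORDER and RANK_ORDER.index(rank) < species_depth]

reads = [("r1", "Salix", "genus"),
         ("r2", "Salix alba", "species"),
         ("r3", "Salicaceae", "family")]
print(above_species(reads))
```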

check BAM flags to discard unmapped and low-quality reads
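Here is a minimal sketch of the flag check. It uses two SAM specification bits: 0x4 (segment unmapped) and 0x200 (not passing quality controls). Treating 0x200 as the "low quality" criterion is an assumption. A MAPQ cutoff would be a separate decision, and it would also remove multi-mapping reads, which an LCA depends on.

```python
FLAG_UNMAPPED = 0x4   # SAM spec: segment unmapped
FLAG_QCFAIL = 0x200   # SAM spec: not passing filters / quality controls

def keep_alignment(flag: int) -> bool:
    """Return False for unmapped or QC-failed records, True otherwise."""
    return not (flag & (FLAG_UNMAPPED | FLAG_QCFAIL))

for flag in (0, 4, 16, 256, 512, 516):
    print(flag, keep_alignment(flag))
```

The same records can be dropped before running ngsLCA with `samtools view -b -F 0x204 in.bam > filtered.bam`. Secondary alignments (0x100) are deliberately kept in this sketch, since a per-read LCA uses all of a read's alternative hits.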

output description

We still need an output format description of the .lca files and of the files output by the Rscript.

should .log be removed?

Error in NM:i parsing? Sequences with NM:i:1 and above have been parsed

See below: parsed sequences that do not match the reference sequence 100%.

samtools view CHL_155_12485.sort.bam | grep -e 'HISEQ:50:C6NJJANXX:1:1108:14286:4677' -e 'HISEQ:50:C6NJJANXX:1:1108:14285:32643' -e 'HISEQ:50:C6NJJANXX:1:1108:14287:23673' -e 'HISEQ:50:C6NJJANXX:1:1108:14289:31539' -e 'HISEQ:50:C6NJJANXX:1:1108:14289:54817' | less -S
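Here is a minimal sketch for spotting such records in `samtools view` output. It assumes the edit distance is stored in the standard NM:i optional tag. The helper names and the two records below are made up for illustration.

```python
def nm_of(sam_line):
    """Return the NM:i edit distance of a SAM text record, or None if absent."""
    # Optional TAG:TYPE:VALUE fields start after the 11 mandatory columns.
    for tag in sam_line.rstrip("\n").split("\t")[11:]:
        if tag.startswith("NM:i:"):
            return int(tag[5:])
    return None

def violations(sam_lines, ed_max=0):
    """Yield (read name, NM) for alignment records whose NM exceeds ed_max."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        nm = nm_of(line)
        if nm is not None and nm > ed_max:
            yield line.split("\t", 1)[0], nm

# Made-up records: read2 has one mismatch, so it should be reported.
records = [
    "read1\t0\tref1\t1\t42\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0",
    "read2\t0\tref1\t1\t42\t4M\t*\t0\t0\tACGA\tIIII\tAS:i:-6\tNM:i:1",
]
print(list(violations(records)))
```

With real data, the same generator can consume `sys.stdin` when the script is fed by `samtools view <file>.bam | python3 <script>.py`.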

rm wLCA output

filtering of best-match reads only

Only use the alignments with the lowest edit distance.
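Here is a minimal sketch of best-hit filtering. It assumes the alignments arrive grouped by read name and have been reduced to hypothetical (read, reference, NM) tuples. All alignments tied at the lowest NM are kept, so the LCA still sees every equally good hit.

```python
from itertools import groupby

def best_hits(alignments):
    """Keep, for each read, only the alignments with the lowest NM.

    `alignments` is an iterable of (read_name, reference, nm) tuples that is
    already grouped by read name (e.g. name-sorted input).
    """
    kept = []
    for _, group in groupby(alignments, key=lambda a: a[0]):
        hits = list(group)
        lowest = min(nm for _, _, nm in hits)
        kept.extend(hit for hit in hits if hit[2] == lowest)
    return kept

alns = [("r1", "accA", 0), ("r1", "accB", 2), ("r1", "accC", 0),
        ("r2", "accD", 1)]
print(best_hits(alns))
```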

Segmentation fault (core dumped)

As of f118505, after successfully compiling ngsLCA on Ubuntu 20.04, I get a segmentation fault and empty output files when running ngsLCA on one of the example files in bam_files:

See log below:

$ ./ngsLCA -names ncbi_tax_dmp/names.dmp -nodes ncbi_tax_dmp/nodes.dmp -acc2tax ncbi_tax_dmp/nucl_gb.accession2taxid -bam bam_files/SPL_015_1444.fq.plastids.sorted.bam -outnames SPL_015_1444
	-> Will output lca results in file:		'SPL_015_1444.lca'
	-> [thread1] Will read header
	-> Will output lca weight in file:		'SPL_015_1444.wlca'
	-> Will output log info (problems) in file:	'SPL_015_1444.log'
	-> [thread1] Done reading header: 0.00 sec, header contains: 4322
Segmentation fault (core dumped)

Taxa rank OTU tables

Hi,

Is it possible to combine the OTU tables for each taxonomic rank into one, rather than having them separate (one for species, one for genera, etc.)?

I can see that the program creates files containing all the ranks put together (Kraken style), like the "complete profile" file or the one in the taxa_groups folder, but in my case the counts do not match the counts in the separate files, so I am confused.

Thanks
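The per-rank tables may count reads assigned exactly at each rank, while the combined profile sums counts up the tree (Kraken style). If so, their numbers would not be expected to match. This is an assumption about the Rscript, not something confirmed here.

Here is a minimal sketch of stacking per-rank tables into one long table with an explicit rank column. It assumes the tables have been loaded into nested dictionaries, which is a hypothetical layout.

```python
def combine_rank_tables(tables):
    """Stack per-rank OTU tables into one long table.

    `tables` maps rank -> {taxon: {sample: count}} (a hypothetical in-memory
    form of the per-rank CSV files). Returns (rank, taxon, sample, count) rows,
    so each count keeps the rank it was reported at.
    """
    rows = []
    for rank, table in tables.items():
        for taxon, per_sample in table.items():
            for sample, count in per_sample.items():
                rows.append((rank, taxon, sample, count))
    return rows

tables = {
    "genus": {"Salix": {"S1": 10}},
    "species": {"Salix alba": {"S1": 4}, "Salix repens": {"S1": 3}},
}
for row in combine_rank_tables(tables):
    print(row)
```

Keeping the rank as a column avoids silently adding genus-level and species-level counts together.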

Kaiju output file style

Make ngsLCA output follow the Kaiju output file style (https://github.com/bioinformatics-centre/kaiju), which gives the counts only for the lowest ranks, rather than the Kraken2 or MetaPhlAn style, which sums up the counts to the higher ranks.

This style is compatible with Phyloseq (https://joey711.github.io/phyloseq/) and Mia (https://microbiome.github.io/OMA/).

ont_otu_table.csv
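Here is a minimal sketch of the difference between the two styles, using a made-up taxonomy. `direct` holds Kaiju-style counts (reads assigned exactly to each taxon). `clade_counts` derives Kraken-style counts by adding each taxon's reads to all of its ancestors. The function and variable names are hypothetical.

```python
from collections import Counter

def clade_counts(direct, parent):
    """Sum direct (lowest-rank) counts up to every ancestor, Kraken-style.

    `direct` maps taxon -> reads assigned exactly to it (Kaiju-style);
    `parent` maps taxon -> parent taxon, with the root mapped to None.
    """
    total = Counter()
    for taxon, count in direct.items():
        node = taxon
        while node is not None:  # walk up to the root
            total[node] += count
            node = parent[node]
    return total

parent = {"root": None, "Salicaceae": "root", "Salix": "Salicaceae",
          "Salix alba": "Salix", "Salix repens": "Salix"}
direct = {"Salix alba": 4, "Salix repens": 3, "Salix": 10}
print(dict(clade_counts(direct, parent)))
```

Here Salix has 10 reads in the direct view and 17 in the cumulative view. Mixing the two styles in one table would count the same reads more than once.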

parse fasta for recalculation of edit distance
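Here is a minimal sketch of the two pieces involved: a FASTA reader and a Levenshtein distance. This is a plain global edit distance between two strings. It can differ from an aligner's NM, because NM counts the differences in the reported alignment. Cutting the matching reference segment out of the FASTA (using POS and the CIGAR) is left out, and all names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions and deletions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion from a
                            curr[j - 1] + 1,           # insertion into a
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]

def read_fasta(lines):
    """Parse FASTA text lines into {first header word: uppercase sequence}."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None and line:
            seqs[name].append(line.upper())
    return {key: "".join(parts) for key, parts in seqs.items()}

refs = read_fasta([">ref1 example", "ACGTACGT"])
print(edit_distance("ACGAACGT", refs["ref1"]))
```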

ngsLCA with bwa output

Hi,
I'm wondering if it's ok to run ngsLCA with the bam file from bwa-mem. I see in the tutorial using bowtie2 with some specific parameters. If we use bwa-mem, is there any option that are recommended.
Thanks,
Hien

make a script that converts the output to Krona and MEGAN formats
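Here is a minimal sketch of the Krona side, assuming counts keyed by full lineage. ktImportText reads tab-separated lines made of a count followed by the lineage names. Krona totals child counts into parent wedges when it draws the chart, so direct (lowest-rank) counts are the natural input. The MEGAN export is not covered, and the function name is hypothetical.

```python
def to_krona_text(lineage_counts):
    """Format {lineage tuple: count} as Krona text input lines.

    Each line is the count followed by the lineage, all tab-separated.
    """
    return ["\t".join([str(count), *lineage])
            for lineage, count in lineage_counts.items()]

counts = {("Eukaryota", "Salicaceae", "Salix"): 10,
          ("Eukaryota", "Salicaceae", "Salix", "Salix alba"): 4}
print("\n".join(to_krona_text(counts)))
```

The lines can be written to a file and rendered with `ktImportText sample.krona.txt -o sample.krona.html`.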

problems running the test data

I've been having problems running ngsLCA on my own data, so I tried running the test data first, but I can't get that to work either. I downloaded SPL_015_1444.fq.plastids.bam from ERDA, downloaded the NCBI taxonomy dump (ncbi_tax_dmp), and then ran the following command:

ngsLCA -editdistmin 0 -editdistmax 0 -names ncbi_tax_dmp/names.dmp -nodes ncbi_tax_dmp/nodes.dmp -acc2tax ncbi_tax_dmp/nucl_gb.accession2taxid.gz -bam SPL_015_1444.fq.plastids.bam -outnames outfile.ed0

check that the input file is sorted by read name
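Here is a minimal sketch of such a check, assuming the read names are taken in file order (for example from `samtools view in.bam | cut -f1`). It only verifies that each read's alignments form one contiguous block, which is what a per-read LCA needs. It does not check a specific sort order. The function name is hypothetical.

```python
def is_grouped_by_read(names):
    """Return True if every read name occurs in exactly one contiguous run."""
    seen = set()
    previous = None
    for name in names:
        if name != previous:
            if name in seen:
                return False  # name reappears after a different read
            seen.add(name)
            previous = name
    return True

print(is_grouped_by_read(["a", "a", "b", "c", "c"]))
print(is_grouped_by_read(["a", "b", "a"]))
```

Input that fails the check can be regrouped with `samtools sort -n`.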

make an option for input filenames

It needs to be possible to specify filenames.

Compilation error on macOS 11.6

Hi @miwipe,
While working on the review of ngsLCA, I tried to compile it following the README's instructions (as of f118505); however, I encountered some errors (see the attached log).
Compilation worked fine on Ubuntu 20.04.

ngsLCA_build_log.txt

install ERROR

Hi,
Thanks for ngsLCA. However, I get an error when installing the package "ngsLCA" with the command devtools::install_github("wyc661217/ngsLCA"):
ERROR: dependencies ‘ComplexHeatmap’, ‘ggpubr’, ‘vegan’ are not available for package ‘ngsLCA’
My R version is the latest, 4.2.0.

output print accession number and query sequence length

Can we add accession number and query sequence length to output print?
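Here is a minimal sketch of where the two values can come from in a SAM record. It assumes the reference names (RNAME) are the accessions used for the taxid lookup. The query length is taken from SEQ. When SEQ is stored as "*" (as some aligners do for secondary alignments), it falls back to the query-consuming CIGAR operations. The function name and example records are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def accession_and_length(sam_line: str):
    """Return (reference name, query length) from one SAM text record.

    Assumption: RNAME holds the accession. Query length comes from SEQ, or
    from the query-consuming CIGAR operations (M, I, S, =, X) if SEQ is '*'.
    """
    fields = sam_line.rstrip("\n").split("\t")
    rname, cigar, seq = fields[2], fields[5], fields[9]
    if seq != "*":
        return rname, len(seq)
    length = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MIS=X")
    return rname, length

rec1 = "r1\t0\tACC_EXAMPLE.1\t100\t42\t30M2I18M\t*\t0\t0\t" + "A" * 50 + "\t" + "I" * 50
rec2 = "r2\t256\tACC_EXAMPLE.2\t7\t0\t5S45M\t*\t0\t0\t*\t*"
print(accession_and_length(rec1))
print(accession_and_length(rec2))
```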
