ivarz / conifer
Calculate confidence scores from Kraken2 output
License: BSD 2-Clause "Simplified" License
Hi Ivar,
Thank you for making this useful tool! I was wondering whether it is possible to also report read length percentiles for each taxon assignment.
Thanks!
Hena
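Until such an option exists, read length quartiles per taxid can be derived from the Kraken2 output directly. The following is a minimal sketch, not part of Conifer: it assumes the standard tab-separated Kraken2 output (status, read ID, taxid, length, LCA mapping) produced without `--use-names`, and it sums the two mate lengths of a paired read (e.g. `100|100`), which is an assumption about what "read length" should mean here. The function name `read_length_quartiles` is hypothetical.

```python
from collections import defaultdict
from statistics import quantiles


def read_length_quartiles(lines):
    """Return {taxid: (P25, P50, P75)} of read lengths for classified reads."""
    lengths = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed lines and unclassified reads (status "U").
        if len(fields) < 5 or fields[0] != "C":
            continue
        taxid = fields[2]
        # Paired reads report "len1|len2"; summing them is an assumption.
        total_length = sum(int(part) for part in fields[3].split("|"))
        lengths[taxid].append(total_length)

    result = {}
    for taxid, values in lengths.items():
        if len(values) == 1:
            # A single read has no spread; report its length three times.
            result[taxid] = (values[0], values[0], values[0])
        else:
            result[taxid] = tuple(quantiles(values, n=4, method="inclusive"))
    return result
```

Usage: `read_length_quartiles(open("kraken.out.txt"))` returns one tuple of quartiles per taxid. All lengths are held in memory, so very large files may need a streaming approach instead.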
I wanted to try out your tool as you recommended in my issue on kraken. I started it with:
./conifer --both_scores -s -i kraken.out.txt -d /scratch/databases/Standard_v2/taxo.k2d
and then saw this output:
1000000 lines processed...
2000000 lines processed...
3000000 lines processed...
4000000 lines processed...
5000000 lines processed...
6000000 lines processed...
7000000 lines processed...
8000000 lines processed...
9000000 lines processed...
10000000 lines processed...
11000000 lines processed...
12000000 lines processed...
13000000 lines processed...
14000000 lines processed...
15000000 lines processed...
16000000 lines processed...
17000000 lines processed...
18000000 lines processed...
19000000 lines processed...
20000000 lines processed...
21000000 lines processed...
22000000 lines processed...
23000000 lines processed...
24000000 lines processed...
25000000 lines processed...
26000000 lines processed...
27000000 lines processed...
28000000 lines processed...
29000000 lines processed...
30000000 lines processed...
31000000 lines processed...
32000000 lines processed...
33000000 lines processed...
34000000 lines processed...
35000000 lines processed...
36000000 lines processed...
37000000 lines processed...
38000000 lines processed...
39000000 lines processed...
40000000 lines processed...
41000000 lines processed...
42000000 lines processed...
taxon_name taxid reads P25_conf P50_conf P75_conf P25_rtl P50_rtl P75_rtl
I expected to see more in the table. Any ideas what could cause this?
Hi @Ivarz,
I recently created a Docker image for conifer. I thought I'd leave it here in case it's useful for you or someone else.
# Copyright (c) 2020, Moritz E. Beber.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
FROM bitnami/minideb:buster AS builder
RUN set -eux \
    && install_packages \
        build-essential \
        ca-certificates \
        git \
        libz-dev
WORKDIR /opt
RUN set -eux \
    && git clone https://github.com/Ivarz/Conifer.git \
    && cd Conifer \
    && git submodule update --init --recursive \
    && gcc -static -std=c99 -Wall -Wextra -O3 -D_POSIX_C_SOURCE=200809L -I third_party/uthash/src -I . src/utils.c src/kraken_stats.c src/kraken_taxo.c src/main.c -o conifer -l:libm.a -l:libz.a
FROM busybox:glibc
COPY --from=builder /opt/Conifer/conifer /
ENTRYPOINT ["/conifer"]
I'll track progress of this file over here.
Hi @Ivarz,
I recently created a bioconda package for conifer. Hope it's useful for you or someone else.
Hi, Ivarz:
Thanks for your convenient tool.
I am trying to calculate confidence scores using results from Kraken2. I am wondering why the k-mer counts for each read do not sum to the read length of 100.
C V100006960L1C001R001000420 853 100|100 0:16 853:8 1783272:2 748224:2 1783272:2 168384:5 186801:6 0:2 168384:5 0:18 |:|748224:7 0:2 748224:5 0:21 853:4 748224:7 0:5 748224:3 0:12
read1: 16+8+2+2+2+5+6+2+5+18 = 66
read2: 7+2+5+21+4+7+5+3+12 = 66
Thanks!
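A likely explanation, assuming the database was built with Kraken2's default k-mer length of 35: the last field counts k-mers, not bases, and a read of length L contains L - k + 1 k-mers, so 100 - 35 + 1 = 66 per mate. The sketch below sums the counts in that field and compares them with the expected value; the function name `kmer_counts_per_mate` and the constant `DEFAULT_K` are illustrative, not part of Kraken2 or Conifer.

```python
DEFAULT_K = 35  # Kraken2 default k-mer length; an assumption about this database


def kmer_counts_per_mate(lca_field):
    """Sum the k-mer counts in a Kraken2 LCA mapping field, one total per mate."""
    totals = []
    # Mates of a paired read are separated by the "|:|" token.
    for mate in lca_field.split("|:|"):
        total = 0
        for token in mate.split():
            _taxid, count = token.rsplit(":", 1)
            total += int(count)
        totals.append(total)
    return totals


lca = (
    "0:16 853:8 1783272:2 748224:2 1783272:2 168384:5 186801:6 0:2 168384:5 0:18 "
    "|:|748224:7 0:2 748224:5 0:21 853:4 748224:7 0:5 748224:3 0:12"
)
read_length = 100
expected_kmers = read_length - DEFAULT_K + 1

print(kmer_counts_per_mate(lca), expected_kmers)  # prints: [66, 66] 66
```

If the database was built with a different k, the expected count changes accordingly.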
Hello,
Thanks for developing this tool! I have recently come across it and thought it would help me to fine tune the accuracy of my Kraken2 results. I read in the readme file that it generates the confidence scores for each readID. However, in my conifer output file, I see the confidence score for each taxid/taxname, as opposed to readID. Is there anything I did wrong?
Thanks,
Elly
Hello again,
I've been using Conifer for a bit now and I find it very useful. Thank you for that. At the moment, Conifer in its simplest form reports
| kraken output | read1 confidence | read2 confidence | average |
|---|---|---|---|
Since Conifer can obviously do this, as seen for the summary report, I would love to get the output as
| taxid | name (optional) | read1 confidence | read2 confidence | average |
|---|---|---|---|---|
and simply have additional rows for the same taxid. Does this make sense? Would you consider adding this output option? Or maybe there is a different simple way to map the kraken output to the taxid that I am missing right now.
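In the meantime, the mapping can be done outside Conifer, because the taxid is already the third column of every Kraken2 output line. The following is a minimal sketch that assumes the per-read layout described above: the five tab-separated Kraken2 columns followed by read1 confidence, read2 confidence, and average as the last three columns. That layout and the function name `taxid_confidence_rows` are assumptions for illustration and should be checked against the actual output file.

```python
def taxid_confidence_rows(lines):
    """Yield (taxid, read1_conf, read2_conf, average) for each per-read line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Expect 5 Kraken2 columns plus 3 appended score columns (assumed layout).
        if len(fields) < 8:
            continue
        taxid = fields[2]
        read1_conf, read2_conf, average = fields[-3:]
        # Scores are kept as strings so no assumption is made about their format.
        yield taxid, read1_conf, read2_conf, average
```

Usage: iterate over `taxid_confidence_rows(open("conifer.out.txt"))` (a hypothetical file name) and write each tuple as one tab-separated row; repeated taxids simply produce additional rows.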
I wanted to let you know that I've created a Python package that uses dask for distributed analysis of conifer output files. You can find the repo on GH and install the Python package from PyPI. It's quite minimal so far but feel free to create issues or contribute other kinds of analyses.
Would it be possible to make a new release after adding the --help message/option? That way the bioconda recipe will trigger a new release too, and the bioconda install of conifer will then have access to that option.