ivarz / conifer

18.0 3.0 7.0 406 KB

Calculate confidence scores from Kraken2 output

License: BSD 2-Clause "Simplified" License

C 97.94% Makefile 2.06%

kraken2 confidence-scores metagenomic-analysis

conifer's Issues

read length percentile

Hi Ivar,

Thank you for making this useful tool! I was wondering whether it is also possible to get the read length percentiles for each taxon assignment.

Thanks!
Hena
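
Independently of conifer, read length percentiles per taxon can be computed directly from the Kraken2 output. The sketch below is only an illustration: it assumes the standard tab-separated Kraken2 output (status, read ID, taxid, length, LCA mapping), sums the two mate lengths of paired reads, and uses the hypothetical helper names `read_length` and `length_quartiles`.

```python
from collections import defaultdict
from statistics import quantiles


def read_length(field):
    """Total read length; paired reads are written as 'len1|len2'."""
    return sum(int(part) for part in field.split("|"))


def length_quartiles(lines):
    """Map taxid -> (P25, P50, P75) of read lengths for classified reads."""
    lengths = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed lines and unclassified reads (status 'U').
        if len(fields) < 4 or fields[0] != "C":
            continue
        lengths[fields[2]].append(read_length(fields[3]))

    result = {}
    for taxid, values in lengths.items():
        if len(values) < 2:
            # quantiles() needs at least two data points.
            result[taxid] = (values[0],) * 3
        else:
            result[taxid] = tuple(quantiles(values, n=4, method="inclusive"))
    return result
```

Called as `length_quartiles(open("kraken.out.txt"))`, this returns a dictionary keyed by the taxid column as a string. If Kraken2 was run with `--use-names`, that column holds the name and taxid together, so the keys would need further parsing.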

Missing output

I wanted to try out your tool as you recommended in my issue on kraken. I started it with:

./conifer --both_scores -s -i kraken.out.txt -d /scratch/databases/Standard_v2/taxo.k2d

and then saw this output:

1000000 lines processed...
2000000 lines processed...
3000000 lines processed...
4000000 lines processed...
5000000 lines processed...
6000000 lines processed...
7000000 lines processed...
8000000 lines processed...
9000000 lines processed...
10000000 lines processed...
11000000 lines processed...
12000000 lines processed...
13000000 lines processed...
14000000 lines processed...
15000000 lines processed...
16000000 lines processed...
17000000 lines processed...
18000000 lines processed...
19000000 lines processed...
20000000 lines processed...
21000000 lines processed...
22000000 lines processed...
23000000 lines processed...
24000000 lines processed...
25000000 lines processed...
26000000 lines processed...
27000000 lines processed...
28000000 lines processed...
29000000 lines processed...
30000000 lines processed...
31000000 lines processed...
32000000 lines processed...
33000000 lines processed...
34000000 lines processed...
35000000 lines processed...
36000000 lines processed...
37000000 lines processed...
38000000 lines processed...
39000000 lines processed...
40000000 lines processed...
41000000 lines processed...
42000000 lines processed...
taxon_name      taxid   reads   P25_conf        P50_conf        P75_conf        P25_rtl P50_rtl P75_rtl

I expected to see more in the table. Any ideas what could cause this?

Docker image

Hi @Ivarz,

I recently created a Docker image for conifer. I thought I'd leave it here in case it's useful for you or someone else.

# Copyright (c) 2020, Moritz E. Beber.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

FROM bitnami/minideb:buster AS builder

RUN set -eux \
    && install_packages \
        build-essential \
        ca-certificates \
        git \
        libz-dev

WORKDIR /opt

RUN set -eux \
    && git clone https://github.com/Ivarz/Conifer.git \
    && cd Conifer \
    && git submodule update --init --recursive \
    && gcc -static -std=c99 -Wall -Wextra -O3 -D_POSIX_C_SOURCE=200809L -I third_party/uthash/src -I . src/utils.c src/kraken_stats.c src/kraken_taxo.c src/main.c -o conifer -l:libm.a -l:libz.a

FROM busybox:glibc

COPY --from=builder /opt/Conifer/conifer /

ENTRYPOINT ["/conifer"]

I'll track progress of this file over here.

Bioconda package

Hi @Ivarz,

I recently created a bioconda package for conifer. Hope it's useful for you or someone else.

https://anaconda.org/bioconda/conifer

length and confidence length differ

Hi Ivarz,
Thanks for your convenient tool.
I am trying to calculate confidence scores from Kraken2 results, and I am wondering why the counts in the k-mer mapping do not add up to the read length of 100.

C V100006960L1C001R001000420 853 100|100 0:16 853:8 1783272:2 748224:2 1783272:2 168384:5 186801:6 0:2 168384:5 0:18 |:| 748224:7 0:2 748224:5 0:21 853:4 748224:7 0:5 748224:3 0:12
read1: 16+8+2+2+2+5+6+2+5+18 = 66
read2: 7+2+5+21+4+7+5+3+12 = 66
Thanks!
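
A likely explanation is that the numbers after each taxid count k-mers, not bases: a read of length L contains L - k + 1 overlapping k-mers. The sketch below assumes the database was built with Kraken2's default k-mer length of 35, which the issue does not state, and uses a hypothetical helper `kmer_totals` to repeat the sums from the line above.

```python
def kmer_totals(lca_field):
    """Sum the k-mer counts of each mate in a Kraken2 LCA mapping field."""
    totals = []
    for mate in lca_field.split("|:|"):
        # Each entry is 'taxid:count'; only the count is needed here.
        totals.append(sum(int(pair.rsplit(":", 1)[1]) for pair in mate.split()))
    return totals


lca = ("0:16 853:8 1783272:2 748224:2 1783272:2 168384:5 186801:6 0:2 168384:5 0:18 "
       "|:| 748224:7 0:2 748224:5 0:21 853:4 748224:7 0:5 748224:3 0:12")

read_len = 100
k = 35  # assumed: Kraken2 default k-mer length, not given in the issue

print(kmer_totals(lca))   # [66, 66]
print(read_len - k + 1)   # 66
```

Under that assumption both mates are fully accounted for: 100 - 35 + 1 = 66 k-mers per read.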

conifer output not to specific readIDs

Hello,

Thanks for developing this tool! I have recently come across it and thought it would help me to fine tune the accuracy of my Kraken2 results. I read in the readme file that it generates the confidence scores for each readID. However, in my conifer output file, I see the confidence score for each taxid/taxname, as opposed to readID. Is there anything I did wrong?

Thanks,
Elly

Report the taxid and/or name

Hello again,

I've been using Conifer for a bit now and I find it very useful. Thank you for that. At the moment, Conifer in its simplest form reports

kraken output	read1 confidence	read2 confidence	average

Since Conifer can obviously do this, as seen for the summary report, I would love to get the output as

taxid	name (optional)	read1 confidence	read2 confidence	average

and simply have additional rows for the same taxid. Does this make sense? Would you consider adding this output option? Or maybe there is a different simple way to map the kraken output to the taxid that I am missing right now.
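
One possible way to do the mapping outside of Conifer is sketched below. It assumes the per-read output is the unchanged, tab-separated Kraken2 line followed by the three score columns described above, so that the taxid is the third field. The function name `taxid_and_scores` and the example line are made up for illustration.

```python
def taxid_and_scores(line, n_scores=3):
    """Return (taxid, [scores]) from one line of per-read output.

    Assumes the Kraken2 fields come first (status, read ID, taxid, ...)
    and the last n_scores fields are the appended confidence scores.
    """
    fields = line.rstrip("\n").split("\t")
    taxid = fields[2]
    scores = [float(value) for value in fields[-n_scores:]]
    return taxid, scores


# Made-up example line, not real output.
example = "C\tread1\t853\t100|100\t0:16 853:50 |:| 853:66\t0.5\t0.25\t0.375"
taxid, scores = taxid_and_scores(example)
print(taxid, *scores, sep="\t")
```

Applying this to every line gives one row per read with the taxid in the first column, so reads sharing a taxid simply appear as additional rows.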

Downstream analysis with dask

I wanted to let you know that I've created a Python package that uses dask for distributed analysis of conifer output files. You can find the repo on GH and install the Python package from PyPI. It's quite minimal so far but feel free to create issues or contribute other kinds of analyses.

New release

Would it be possible to make a new release after adding the --help message/option? That way the bioconda recipe will trigger a new release too and the bioconda install of conifer will then have access to that option.
