
Lower quality and fewer reads in DeepConsensus 1.0 output compared to CCS

daaaaande commented on June 2, 2024


Comments (2)

danielecook commented on June 2, 2024

@daaaaande thank you for your thorough investigation here. To address some of your comments:

"Very similar results are also coming from 2 human DNA SMRT Cells from Sequel II systems, where the output was already good. The Q scores were much lower in the DeepConsensus file than in the CCS file."

DeepConsensus caps base qualities at 40 currently. This corresponds with a predicted error rate of 1/10,000, a rate that corresponds with approx 1 error / HiFi read. We plan to allow users to configure this capping behavior in the next release.
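The link between the Q40 cap, the 1/10,000 error rate and roughly one error per read follows from the Phred scale, Q = -10·log10(p). Below is a minimal Python sketch of that arithmetic. It assumes a read length of about 10 kb, an illustrative value not taken from this thread. `cap_quality` is a hypothetical helper that only mimics the effect of a fixed ceiling; it is not DeepConsensus code.

```python
import math

def phred_to_error(q: float) -> float:
    """Error probability for a Phred score: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

def error_to_phred(p: float) -> float:
    """Phred score for an error probability: Q = -10 * log10(p)."""
    return -10 * math.log10(p)

def expected_errors(read_length: int, q: float) -> float:
    """Expected number of errors in a read whose bases all have quality q."""
    return read_length * phred_to_error(q)

def cap_quality(q: int, cap: int = 40) -> int:
    """Clip a base quality at a ceiling (hypothetical stand-in for a Q40 cap)."""
    return min(q, cap)

if __name__ == "__main__":
    # Q40 corresponds to an error probability of 1/10,000.
    print(phred_to_error(40))
    # Assumed read length of 10 kb; real HiFi read lengths vary.
    print(expected_errors(10_000, 40))
    # Qualities above the cap are clipped; lower ones pass through unchanged.
    print([cap_quality(q) for q in (20, 40, 60)])  # [20, 40, 40]
```

At 15 kb the same arithmetic gives about 1.5 expected errors, so "approx 1 error per HiFi read" is a rough, length-dependent figure. A cap like this also lowers average base quality whenever the input reports bases above Q40, because every such base comes out at no more than Q40.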

"... and even a few reads were missing!"

By default, DeepConsensus will filter out reads where the min_quality is less than 20. If you want to recover all reads you can set --min_quality=0 when running DeepConsensus.
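To see which reads a threshold like the default of 20 would remove, each read's base qualities can be summarized into a single read-level score. The sketch below makes two assumptions for illustration: the score is the Phred-scaled mean per-base error probability, and qualities are Phred+33 encoded. `read_quality`, `parse_fastq` and `split_by_quality` are hypothetical helpers, not part of DeepConsensus.

```python
import math
from typing import Iterable, Iterator, List, Tuple

def read_quality(qual: str, offset: int = 33) -> float:
    """Read-level Phred quality from the mean per-base error probability.

    Assumption: a common way to summarize HiFi read quality; not verified
    to be the exact formula DeepConsensus uses for --min_quality.
    """
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))

def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines between or after records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip()[1:].split()[0], seq, qual

def split_by_quality(
    records: Iterable[Tuple[str, str, str]], min_quality: float = 20.0
) -> Tuple[List[str], List[str]]:
    """Return (kept, dropped) read names for a read-level quality threshold."""
    kept: List[str] = []
    dropped: List[str] = []
    for name, _seq, qual in records:
        (kept if read_quality(qual) >= min_quality else dropped).append(name)
    return kept, dropped

if __name__ == "__main__":
    fastq = [
        "@read_hi", "ACGT", "+", "IIII",  # 'I' encodes Q40 (offset 33)
        "@read_lo", "ACGT", "+", "++++",  # '+' encodes Q10
    ]
    kept, dropped = split_by_quality(parse_fastq(fastq), min_quality=20)
    print(kept, dropped)  # ['read_hi'] ['read_lo']
```

One way to list the reads the default threshold would drop is to run DeepConsensus with --min_quality=0 and pass that output through `split_by_quality`. This still rests on the assumption above about how read quality is computed.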

"Another point that I do not understand is the base-dependent Q score that disappears after DeepConsensus (see the fastp reports below)."

This is an interesting observation. Base probabilities are generated from the outputs of the DeepConsensus model, which appears to remove the base-dependent effect. Further investigation here would be helpful to determine whether the quality predictions accurately reflect the base error rates when stratified by base, for both CCS and DeepConsensus.
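A first step toward that stratified comparison is to compute the mean reported quality for each called base (A, C, G, T) in the CCS and DeepConsensus FASTQ files. This is a simplified, position-independent analogue of fastp's per-base quality curves. The minimal Python sketch below assumes Phred+33 encoding. `mean_quality_by_base` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def mean_quality_by_base(
    records: Iterable[Tuple[str, str]], offset: int = 33
) -> Dict[str, float]:
    """Mean Phred quality for each called base, pooled over all reads.

    records: (sequence, quality_string) pairs, e.g. parsed from a FASTQ file.
    """
    totals: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for seq, qual in records:
        for base, q_char in zip(seq, qual):
            totals[base] += ord(q_char) - offset
            counts[base] += 1
    return {base: totals[base] / counts[base] for base in sorted(counts)}

if __name__ == "__main__":
    reads = [
        ("ACGT", "IIII"),  # every base at Q40
        ("AAGG", "I5I5"),  # '5' encodes Q20, so A and G each get one Q20
    ]
    for base, q in mean_quality_by_base(reads).items():
        print(f"{base}: {q:.1f}")
```

This only stratifies the qualities each tool reports. Checking whether those predictions match actual per-base error rates would also require aligning the reads to a trusted reference and counting mismatches per base, which this sketch does not attempt.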


daaaaande commented on June 2, 2024

Thanks for your swift answers!

"DeepConsensus caps base qualities at 40 currently. This corresponds with a predicted error rate of 1/10,000, a rate that corresponds with approx 1 error / HiFi read. We plan to allow users to configure this capping behavior in the next release."

Why? Did you see one error in each read in the data? To be honest, I do not see a logical reason for this in my data. Also, since the input has higher Q scores, I would expect every read that exceeds Q40 in the input to hit the ceiling (40) in the output. Is that because the model was trained to be "conservative" with the Q scores? Anyway, the option to remove the cap would be fantastic!

"By default, DeepConsensus will filter out reads where the min_quality is less than 20. If you want to recover all reads you can set --min_quality=0 when running DeepConsensus."

Thanks, I did not know this. Also, I did not check whether all of the missing reads are below Q20.

"Further investigation here would be helpful to determine whether the quality predictions accurately reflect the base error rates when stratified by base, for both CCS and DeepConsensus."

Since CCS seems to show a base dependence here and DeepConsensus seems to mostly remove it, I will forward this to PacBio. There might be a base-dependent signal/noise characteristic caused by the chemistry or optics that I did not know about. Anyway, making the scores of all bases more similar seems like an improvement to me.

I will close this issue for now, since you explained the reason for my biggest concern (the large difference between the CCS and DeepConsensus average base qualities).
