jts commented on August 22, 2024

I think a useful analysis to inform this discussion would be:

a) map human reads to the ncov reference to see what maps
b) map coronavirus reads to human to see what gets lost

(NB: in real amplicon sequencing data sets very few reads (if any) will be human, so a) will massively overestimate what happens in a real experiment.)
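Analysis b) comes down to aligning known SARS-CoV-2 reads against the human reference and counting how many of them align. A minimal sketch, assuming the aligner writes plain-text SAM; `fraction_mapped` and its `min_mapq` cut-off are hypothetical names for illustration, not part of the pipeline:

```python
from typing import Iterable

# SAM FLAG bits, as defined in the SAM format specification
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def fraction_mapped(sam_lines: Iterable[str], min_mapq: int = 0) -> float:
    """Fraction of reads whose primary alignment is mapped with MAPQ >= min_mapq.

    Hypothetical helper for analysis b): give it the SAM produced by aligning
    known SARS-CoV-2 reads to the human reference, and the result estimates
    the share of viral reads that a human-based filter would throw away.
    Paired-end mates each have their own primary record, so each mate is
    counted separately.
    """
    total = 0
    mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines (@HD, @SQ, @PG, ...)
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count every read exactly once, via its primary record
        total += 1
        if not (flag & UNMAPPED) and mapq >= min_mapq:
            mapped += 1
    return mapped / total if total else 0.0


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.6",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    ]
    print(fraction_mapped(sam))  # 0.5
```

Raising `min_mapq` shows how the loss changes if only confident human hits are treated as host.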

from covid-19-signal.

fmaguire commented on August 22, 2024

This was resolved in commit 7b580ba, right?

jaleezyy commented on August 22, 2024

I believe we wanted confirmation to be through HISAT2 alignment against the human genome? This looks like the removal of non-SARS-CoV-2 reads, but not necessarily the confirmation step.

fmaguire commented on August 22, 2024

Considering we are throwing away any reads that don't map to the reference with the new "core" (see current branch) we can probably dispense with this entirely.

agmcarthur commented on August 22, 2024

There was discussion in one of the groups that step 5 (removal of non-SARS-CoV-2 reads) should be done via alignment to the SARS-CoV-2 genome, while step 7 (verification) should be alignment against the human genome, as this combination would absolutely pass ethics/privacy requirements and give our healthcare colleagues full confidence.

fmaguire commented on August 22, 2024

Just chatting about that on the CanCOGeN call. It sounds like the plan is: map the raw sorted reads to the human reference and remove any reads that map, then go ahead with trimming on the remaining reads.

Can these human-removed but otherwise raw reads then be uploaded to SRA?

I can add that and remake the PR.
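The host-removal step described here (map raw reads to the human reference, drop every read that maps, pass the rest on to trimming) can be sketched as a name-based FASTQ filter. This is a minimal sketch under assumptions: the human alignment is available as plain-text SAM, the FASTQ has well-formed four-line records, and a FASTQ read's name is its header minus the leading `@` and any comment. If your FASTQ names carry `/1` or `/2` mate suffixes, you may need to normalise them first. `host_read_names`, `read_fastq` and `remove_host_reads` are hypothetical helpers:

```python
from typing import Iterable, Iterator, Set, Tuple

UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped

FastqRecord = Tuple[str, str, str, str]  # (header, sequence, separator, quality)


def host_read_names(sam_lines: Iterable[str], min_mapq: int = 0) -> Set[str]:
    """Names of reads with at least one mapped human alignment at MAPQ >= min_mapq.

    The default min_mapq=0 matches "remove any reads that map".
    """
    names: Set[str] = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if not (int(fields[1]) & UNMAPPED) and int(fields[4]) >= min_mapq:
            names.add(fields[0])
    return names


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ (assumes no truncated records)."""
    stripped = (line.rstrip("\n") for line in lines)
    for header in stripped:
        if not header:
            continue  # tolerate blank lines between records
        seq = next(stripped)
        plus = next(stripped)
        qual = next(stripped)
        yield header, seq, plus, qual


def remove_host_reads(
    fastq_lines: Iterable[str], host_names: Set[str]
) -> Iterator[FastqRecord]:
    """Yield only the FASTQ records whose name is not in host_names."""
    for rec in read_fastq(fastq_lines):
        name = rec[0][1:].split()[0]  # "@name comment" -> "name" (SAM QNAME)
        if name not in host_names:
            yield rec
```

Filtering by read name (rather than re-emitting reads from the SAM) leaves the surviving reads byte-for-byte raw, which fits the idea of uploading the human-removed but otherwise raw reads.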

agmcarthur commented on August 22, 2024

Are we sure removing based on alignment to human won't exclude some legitimate SARS-CoV-2 data? Don't want to create a region of false low coverage.

jts commented on August 22, 2024

Thinking about it a bit more, human WGS won't be representative of the type of off-target sequences we might see, so I'm going off the idea of a) a bit. b) is worth doing, though.

robynslee commented on August 22, 2024

Yeah, not sure about the benefit of a). b) seems useful, as that's the issue we're worried about. It might be good to try b) with a diverse sample of strains from GISAID to get a sense of this.

jts commented on August 22, 2024

Yeah, the thought with a) was to test whether mapping to human is necessary to remove any potential host reads. It would be far simpler if we could just map to the coronavirus reference and discard everything else. I wanted to address that with a), but realized it won't really answer the question.

robynslee commented on August 22, 2024

Yes, agree, a) doesn't address that question.

fmaguire commented on August 22, 2024

Just throwing this in here so it's all together.

  • Need to consider mapping scores: galaxyproject/SARS-CoV-2#49
  • Could do with a set of SRA archives from across the Nextstrain tree for doing this analysis (among other QC): e.g. create a composite reference and see if there is a good threshold for distinguishing host contamination.
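The composite-reference idea can be sketched as follows: align reads once against human + viral contigs, record each read's best MAPQ on the human side and on the viral side, and tally reads that hit only viral, only human, both, or neither. This is a minimal sketch under assumptions: SAM input; the set of viral contig names is supplied by the caller (for example `MN908947.3`, the Wuhan-Hu-1 accession, if that is how the viral contig is named); every hit of interest appears as its own SAM record with its own MAPQ (whether your aligner reports secondary hits that way depends on its options); and mates of a pair share a name, so they are pooled. `best_mapq_per_side`, `classify`, `summarise` and `NO_HIT` are hypothetical:

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped
NO_HIT = -1     # sentinel: the read has no alignment on that side


def best_mapq_per_side(
    sam_lines: Iterable[str], viral_contigs: Set[str]
) -> Dict[str, Tuple[int, int]]:
    """Return {read name: (best human MAPQ, best viral MAPQ)}.

    Any contig not listed in viral_contigs is treated as human.
    Reads that never align get (NO_HIT, NO_HIT).
    """
    best: Dict[str, Tuple[int, int]] = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag, rname, mapq = fields[0], int(fields[1]), fields[2], int(fields[4])
        human, viral = best.get(name, (NO_HIT, NO_HIT))
        if not (flag & UNMAPPED):
            if rname in viral_contigs:
                viral = max(viral, mapq)
            else:
                human = max(human, mapq)
        best[name] = (human, viral)
    return best


def classify(human: int, viral: int) -> str:
    """Label a read by which part of the composite reference it hit."""
    if human != NO_HIT and viral != NO_HIT:
        return "both"
    if viral != NO_HIT:
        return "viral"
    if human != NO_HIT:
        return "human"
    return "unmapped"


def summarise(best: Dict[str, Tuple[int, int]]) -> Counter:
    """Count reads per class; the "both" group is the one to inspect for a threshold."""
    return Counter(classify(h, v) for h, v in best.values())
```

Plotting the `(human, viral)` pairs of the "both" reads is one way to look for a threshold that separates host contamination from viral reads.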

fmaguire commented on August 22, 2024

I took the one Wuhan-scheme Illumina sample I had to hand and ran BWA-MEM against a composite human + viral reference.

Of the reads that mapped to both viral and human contigs (~200, or 0.02%), the distribution of their respective mapping qualities looks like this:

[Figure (composite_reference): mapping qualities of the viral and human hits for reads that mapped to both]

So most of the small number of problematic reads are a clear viral hit and lower quality human hit (as Torsten suggests in the linked thread).
If we just take those multihit reads with a MAPQ>=30 to the human reference, we are left with 13 (0.002%) reads:

[Figure (multimaps MAPQ>=30): the same reads, restricted to multi-hit reads with MAPQ>=30 to the human reference]

We could save a whole 4 of them by comparing the map-qualities between human and viral. The remaining 11 reads with equally good hits to viral and human aren't likely to majorly affect the viral consensus or variant calling.

I could grab a bunch more SRAs and do this across a lot more samples but honestly, we are probably fine just using BWA-MEM MAPQ>=30 in the host removal stage and calling it a good'un.
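The proposed host-removal rule (discard a read as host if it has a human hit with MAPQ>=30), plus the optional comparison against the viral hit, can be sketched as a small decision function. For context, the SAM specification defines MAPQ as -10 log10 of the probability that the mapping position is wrong, so MAPQ 30 corresponds to an estimated error probability of 0.001. The sketch assumes each read's best MAPQ on the human and viral sides has already been extracted from a composite-reference alignment. The thread does not spell out how "comparing the map-qualities" would work, so the `compare=True` tie-break (keep a read whose viral MAPQ is strictly higher than its human MAPQ) is one hypothetical reading. `is_host`, `split_reads` and `NO_HIT` are illustrative names:

```python
from typing import Dict, Set, Tuple

HOST_MIN_MAPQ = 30  # cut-off proposed in this thread for the host-removal stage
NO_HIT = -1         # sentinel: the read has no alignment on that side


def is_host(
    human: int, viral: int, min_mapq: int = HOST_MIN_MAPQ, compare: bool = False
) -> bool:
    """Return True if a read should be discarded as host.

    human / viral are the read's best MAPQ on each side of the composite
    reference (NO_HIT if none). MAPQ 255 ("unavailable" in SAM) is not
    treated specially in this sketch.
    """
    if human < min_mapq:
        return False  # no confident human placement: keep the read
    if compare and viral > human:
        return False  # hypothetical tie-break: the viral hit is strictly better
    return True


def split_reads(
    best: Dict[str, Tuple[int, int]], **rule
) -> Tuple[Set[str], Set[str]]:
    """Partition read names into (kept, removed) using is_host(**rule)."""
    kept: Set[str] = set()
    removed: Set[str] = set()
    for name, (human, viral) in best.items():
        (removed if is_host(human, viral, **rule) else kept).add(name)
    return kept, removed
```

With `compare=False` this is the simple "MAPQ>=30 to human" rule; with `compare=True` only reads whose human hit is at least as good as their viral hit are removed, i.e. the "equally good hits" reads are still discarded.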

agmcarthur commented on August 22, 2024

This is excellent. Please make sure it is clear in the documentation, including the supporting data.
