Coder Social home page Coder Social logo

mscproject's Introduction

Detection and Analysis of Somatic Mutations Overlapping Single Nucleotide Polymorphisms

Table of Contents

Abstract

Accurate somatic mutation calling is crucial for understanding cancer at a molecular level, guiding targeted therapies, and personalized medicine. This project focuses on the Panel of Normals (PoN) filter used to filter out germline variants and polymorphisms. While effective, the PoN filter may still eliminate true somatic variants. This study identifies candidate somatic mutations coinciding with SNP positions in the LUAD dataset, where the alternate allele differs from the PoN recorded allele. Results highlight potential somatic mutations that could be overlooked due to filtering.

Introduction

Somatic variants play a pivotal role in tumor development and treatment. Somatic variant calling pipelines, like MuTect2, are employed to identify such variants. The Panel of Normals (PoN) filter is common in these pipelines to distinguish somatic from germline variants. However, this filter's exclusive focus on genomic positions, not alternate alleles, poses a challenge. Variants with non-matching alternate alleles may be true somatic mutations filtered by PoN. This project aims to explore such candidate somatic mutations within the LUAD dataset.

Methods

Data Acquisition

The LUAD dataset, comprising 630 VCF files from TCGA, forms the basis of this study. Using the dbSNP database and hg38 reference genome, LUAD VCF files intersected with the PoN to identify candidate somatic variants.

Estimating Somatic Mutation Rate

Calculations for true somatic variants and mutation rates provide insights into the expected number of mutated bases within the PoN region.

Identifying Candidate Somatic Mutations

Filtered variants from the PoN were compared with recorded SNPs to identify candidate somatic mutations. Variants with mismatched alternate alleles were isolated and analyzed.

Alternate Allele Frequency Comparison

Density plots and statistical tests were used to compare alternate allele frequencies between candidate somatic mutations and regular SNPs.

Variants Filtered due to PoN + Others, Filter Impact

The impact of additional filters on alternate allele mismatches was assessed by comparing candidate somatic variants and normal SNPs.

Somatic Mutation Analysis

Candidate somatic mutations were analyzed for recurrence and gene associations using NCBI databases.

Results

A total of 186,812 candidate somatic variants were identified within the LUAD dataset. The 'PoN filter only with credible rs IDs' group emerged as the most credible for somatic mutations at SNP positions, revealing 513 variants and 29 recurrences.

Conclusion

This study sheds light on potential somatic mutations overlooked by traditional filtering methods. Understanding the functional impact of these variants could contribute to cancer research and personalized medicine.

For more information, please refer to the full manuscript.

Contact

For any questions or inquiries, please contact me at [email protected]

mscproject's People

Contributors

liamporter6 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.