tcga_brca's Introduction

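TCGA_BRCA data mining test

First, download all BRCA-related analysis results of the TCGA project from the UCSC Xena database for downstream mining.

UCSC Xena URL: https://xenabrowser.net/datapages/

Here I chose TCGA Breast Cancer (BRCA) (30 datasets), not GDC TCGA Breast Cancer (BRCA) (18 datasets). Make sure you know which one you are using!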

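The downloaded data include:

First, analysis of the microarray expression matrix

Here I only ran the PAM50 classification; the results are as follows:

If you are interested in other analyses, see the data mining tasks I assigned to my 2018 apprentices, for example downloading breast cancer microarray expression data for differential analysis: https://mp.weixin.qq.com/s/CJb27qhbjdZadJDnK2vNLw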

Next, the RNA-seq expression matrix

Because of GitHub's size limits, I kept only the TNBC patients. The code is as follows:

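rm(list = ls())
options(stringsAsFactors = FALSE)
## Survival table (not used in the TNBC selection below)
surv=read.table('TCGA-BRCA.survival.tsv.gz',header = TRUE,sep = '\t')
## Clinical phenotype table; quote = '' stops quote characters inside fields from breaking the parse
a=read.table('TCGA-BRCA.GDC_phenotype.tsv.gz',header = TRUE,sep = '\t',quote = '')
## List all column names and inspect the HER2-related columns
(tmp=as.data.frame(colnames(a)))
tmp=a[,grepl('her2',colnames(a))]
## Distribution of ER, PR and HER2 (IHC) status
table(a$breast_carcinoma_estrogen_receptor_status)
table(a$breast_carcinoma_progesterone_receptor_status)
table(a$lab_proc_her2_neu_immunohistochemistry_receptor_status)
## Keep the receptor status columns; the first three are assumed to be ER, PR and HER2
eph=a[,grepl('receptor_status',colnames(a))]
eph=eph[,1:3]
## Select the samples that are negative for all three receptors (NA counts as not negative)
tnbc_s=a[apply(eph,1, function(x) sum(x=='Negative', na.rm = TRUE))==3,1]
tnbc_s
save(tnbc_s,file = 'tnbc_s.Rdata')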

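Then, among the TNBC patients, select those who have both normal and tumor samples, which leaves only 9 TNBC patients. The principal component analysis of their expression matrix is shown below: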
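A minimal sketch of the pairing step, assuming the sample IDs follow the standard TCGA barcode layout (characters 1-12 identify the patient; characters 14-15 give the sample type, with 01-09 for tumor and 10-19 for normal). The barcodes and the helper name `paired_patients` are made up for illustration; this is not the original code used for the repository.

```r
# Toy sample barcodes (hypothetical); in practice these would be the
# column names of the RNA-seq expression matrix.
samples <- c("TCGA-A1-0001-01A", "TCGA-A1-0001-11A",
             "TCGA-A2-0002-01A",
             "TCGA-A3-0003-01A", "TCGA-A3-0003-11B")

# Hypothetical helper: return the patients that have at least one tumor
# sample and at least one normal sample.
paired_patients <- function(ids) {
  patient <- substr(ids, 1, 12)                 # patient-level barcode
  type_code <- as.integer(substr(ids, 14, 15))  # sample type code
  has_tumor <- tapply(type_code < 10, patient, any)
  has_normal <- tapply(type_code >= 10 & type_code < 20, patient, any)
  names(has_tumor)[has_tumor & has_normal]
}

paired <- paired_patients(samples)

# Keep only the samples that belong to paired patients
keep <- samples[substr(samples, 1, 12) %in% paired]
```

In the workflow above, one would presumably restrict `samples` to the patients represented in `tnbc_s` before pairing, then subset the expression matrix columns with `keep`.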

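As you can see, normal and tumor samples are clearly separated at the RNA-seq expression level, so the differential analysis workflow can proceed. For the code, see "Differential expression of triple-negative breast cancer in Asian populations in the TCGA database": https://mp.weixin.qq.com/s/IOGfzzpcWkzyQPzMADKY4g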
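The kind of separation described here can be checked with a principal component analysis. The sketch below uses base R's `prcomp` on a simulated matrix; the data, sample names, and the size of the tumor shift are all assumptions for illustration, not the real TNBC expression values.

```r
set.seed(1)
# Toy log-expression matrix (hypothetical): 100 genes x 6 samples,
# with the first 20 genes shifted upward in the tumor samples.
expr <- matrix(rnorm(100 * 6), nrow = 100,
               dimnames = list(paste0("gene", 1:100),
                               c("N1", "N2", "N3", "T1", "T2", "T3")))
expr[1:20, 4:6] <- expr[1:20, 4:6] + 5
group <- factor(rep(c("normal", "tumor"), each = 3))

# prcomp expects samples in rows, so transpose the genes x samples matrix
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Scatter plot of the first two components, colored by group
plot(pca$x[, 1], pca$x[, 2], col = as.integer(group), pch = 19,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(group), col = 1:2, pch = 19)
```

If the two groups fall on opposite sides of PC1, as they should in this simulated case, a paired normal-versus-tumor differential analysis is a reasonable next step.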

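Of course, with this much data you can start any project of your own, such as the ones I assigned to my 2018 apprentices:

Somatic mutation analysis

CNV analysis

Clinical data analysis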

tcga_brca's People

Contributors

jmzeng1314
