
A Decade of GWAS | 10 Years of GWAS Discovery: Biology, Function, and Translation


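This review is too important not to analyze on its own. It concisely summarizes what GWAS has achieved over the past 10 years, along with its current limitations. Anyone working in statistical genetics should read it carefully.

When was the first GWAS published, and by whom? The first successful GWAS, published in 2002, studied myocardial infarction (Ozaki et al.).

What is the difference between trans-ethnic analysis and meta-analysis? Trans-ethnic analysis is about studying different ancestry groups, while meta-analysis is about pooling results across studies; the concepts differ, and so do the goals.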

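How are LD and correlation related, and how do they differ? Correlation is a general measure of association between any two variables. LD (linkage disequilibrium) is the non-random association between alleles at two loci, and its most common measure, r², is simply the squared correlation between the allele counts at two SNPs. A per-SNP summary can be built on top of it: for a given SNP, sum the r² values above a threshold of 0.5 across all SNPs within a 1 Mb region. That sum is closer to what is usually called an LD score than to LD itself.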
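
The link between r² and correlation, and the windowed sum of r² described above, can be sketched in a few lines. This is a minimal illustration, not a replacement for a dedicated tool: the function names, the toy genotypes, and the choice to measure the window as a distance of up to `window_bp` on either side of the focal SNP are all assumptions made for the example.

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ld_r2(snp_a, snp_b):
    """LD between two SNPs as the squared correlation of allele counts (0/1/2)."""
    return pearson_r(snp_a, snp_b) ** 2


def windowed_r2_sum(genotypes, positions, focal, window_bp=1_000_000, min_r2=0.5):
    """Sum r2 between the focal SNP and every other SNP within window_bp
    of it, keeping only values of at least min_r2 (assumed thresholds)."""
    total = 0.0
    for j, pos in enumerate(positions):
        if j == focal or abs(pos - positions[focal]) > window_bp:
            continue
        value = ld_r2(genotypes[focal], genotypes[j])
        if value >= min_r2:
            total += value
    return total


# Toy data: one list of allele counts per SNP, one entry per individual.
toy_genotypes = [
    [0, 1, 2, 1, 0, 2, 1, 0],  # focal SNP
    [0, 1, 2, 1, 0, 2, 1, 0],  # identical to the focal SNP
    [2, 1, 0, 1, 2, 0, 1, 2],  # alleles coded in the opposite direction
    [0, 2, 1, 0, 1, 1, 2, 0],  # far away, outside the window
]
toy_positions = [100_000, 150_000, 400_000, 5_000_000]

print(ld_r2(toy_genotypes[0], toy_genotypes[3]))
print(windowed_r2_sum(toy_genotypes, toy_positions, focal=0))
```

Note that r² ignores the sign of the correlation, so the third SNP counts as being in perfect LD with the focal SNP even though its alleles are coded in the opposite direction.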

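What is heritability, and how is it estimated in GWAS? Heritability is the proportion of phenotypic variance attributable to genetic variance. It can differ between populations, because gene-environment (GE) interaction differs; see the Wikipedia definition.

Which factors affect power in GWAS experimental design, and how is power calculated and controlled? See below.

What is genetic architecture? The joint distribution of effect size and allele frequency.

How many SNPs does a GWAS SNP array typically contain, and how are they chosen? The selected SNPs have a minor allele frequency (MAF) typically larger than 1%.

Why perform SNP imputation, and what is it based on? It is based on haplotypes.

Under what conditions can WGS detect an association between a rare variant and a trait? When the sample size is large enough or the effect size is large enough; ultimately it comes down to having enough power.

What is burden testing of rare variants? Taking the gene as the unit, it tests whether rare variants differ significantly between cases and controls.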
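
As a rough illustration of the idea, the sketch below collapses all rare variants in one gene into a single carrier/non-carrier indicator per person and compares carrier counts between cases and controls with Fisher's exact test. This is only one simple form of burden test; the function names, the MAF cutoff of 1%, and the toy data are assumptions made for the example, and published burden tests often use weighting or regression instead.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))


def gene_burden_test(genotypes, is_case, mafs, maf_cutoff=0.01):
    """genotypes[i][j] is the alternate-allele count of person i at variant j
    of a single gene. Returns the 2x2 carrier table and its p-value."""
    rare = [j for j, maf in enumerate(mafs) if maf < maf_cutoff]
    carrier = [any(row[j] > 0 for j in rare) for row in genotypes]
    case_carrier = sum(1 for c, k in zip(carrier, is_case) if c and k)
    case_other = sum(1 for c, k in zip(carrier, is_case) if not c and k)
    ctrl_carrier = sum(1 for c, k in zip(carrier, is_case) if c and not k)
    ctrl_other = sum(1 for c, k in zip(carrier, is_case) if not c and not k)
    table = (case_carrier, case_other, ctrl_carrier, ctrl_other)
    return table, fisher_exact_two_sided(*table)


# Toy gene with three variants; only the first and third are rare.
toy_mafs = [0.004, 0.2, 0.008]
toy_genotypes = [
    [1, 0, 0], [0, 1, 1], [1, 2, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0],  # cases
    [0, 1, 0], [0, 0, 0], [0, 2, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0],  # controls
]
toy_is_case = [True] * 6 + [False] * 6

print(gene_burden_test(toy_genotypes, toy_is_case, toy_mafs))
```

Collapsing to the gene level is what gives the test its power: each rare variant is seen in too few people to be tested alone, but the pooled carrier count can still differ clearly between groups.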

10 Years of GWAS Discovery: Biology, Function, and Translation

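The earlier review covering the first five years: Five Years of GWAS Discovery

The GWAS is an experimental design used to detect associations between genetic variants and traits in samples from populations. Depending on context, one can say genetic variants, genes, or loci.

GWAS is really an integrated approach that combines experimental design and analysis, used mainly to map the genes that control complex traits. For a monogenic disease, there is no need to do a GWAS.

For a normal trait such as height, the mapping yields loci that control height; for a disease, it yields variants that contribute to the disease.

There are many kinds of variants. GWAS currently focuses mainly on SNPs, but there are also InDels, CNVs, and SVs.

This is my main line of work, so it is worth learning from how others phrase things: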

The path from GWAS to biology is not straightforward because an association between a genetic variant at a genomic locus and a trait is not directly informative with respect to the target gene or the mechanism whereby the variant is associated with phenotypic differences.

The statistical power to detect associations between DNA variants and a trait depends on the experimental sample size, the distribution of effect sizes of (unknown) causal genetic variants that are segregating in the population, the frequency of those variants, and the LD between observed genotyped DNA variants and the unknown causal variants.
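
The dependence of power on sample size, effect size, allele frequency, and LD can be made concrete with a common textbook approximation for a single-SNP test on a quantitative trait. The sketch below is not taken from the review: it assumes an additive model, a trait standardized to variance 1, a 1-df test whose non-centrality parameter is N·q²/(1 − q²), and a genome-wide threshold of 5e-8. The function name `gwas_power` and its parameters are hypothetical.

```python
from statistics import NormalDist


def gwas_power(n, maf, beta, r2_tag=1.0, alpha=5e-8):
    """Approximate power to detect one variant at significance level alpha.

    n      : sample size
    maf    : allele frequency of the causal variant
    beta   : per-allele effect in phenotypic standard deviations
    r2_tag : LD (r2) between the genotyped SNP and the causal variant
    """
    # Variance explained by the causal variant under an additive model.
    q2_causal = 2.0 * maf * (1.0 - maf) * beta ** 2
    # The genotyped SNP captures only the fraction r2_tag of that signal.
    q2_observed = r2_tag * q2_causal
    if not 0.0 <= q2_observed < 1.0:
        raise ValueError("variance explained must lie in [0, 1)")
    ncp = n * q2_observed / (1.0 - q2_observed)

    # A 1-df chi-square with this NCP is the square of a normal variable
    # with mean sqrt(ncp), so power follows from two normal tail areas.
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = ncp ** 0.5
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


for sample_size in (5_000, 20_000, 50_000):
    print(sample_size, round(gwas_power(sample_size, maf=0.3, beta=0.05), 4))
```

Under these assumptions, halving `r2_tag` has the same effect on power as halving the sample size, which is one way to see why imperfect tagging of the causal variant is costly.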

In addition, other genome-wide scans, such as WES and WGS studies, allow testing for a burden of rare variants across shared functional units (e.g., genes) in a way that is not accessible to GWASs.

Burden Testing of Rare Variants Identified through Exome Sequencing via Publicly Available Control Data

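This video explains it well, from basic to advanced: BroadE: Statistical Genetics - Rare variants

The framework of the review is as follows:

Complex Traits Are Highly Polygenic

A phenotype can be very general: from something as large as height down to gene expression or epigenetic changes.

Polygenic means that many genes or loci control a complex trait; how to integrate them and explain their overall effect is very important.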


Pleiotropy Is Pervasive

Pleiotropy means that a mutation at one locus may affect multiple phenotypes, which is one reason why many phenotypes are highly correlated.


New Analysis Methodology Underpinning New Discovery

Follow-up methodological work on GWAS falls mainly into four areas:

(1) methods of better modeling population structure and relatedness between individuals in a sample during association analyses [28–34],
(2) methods of detecting novel variants and gene loci on the basis of GWAS summary statistics [35–37],
(3) methods of estimating and partitioning genetic (co)variance [38,39], and
(4) methods of inferring causality [40–42].

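Common Variants Together Tag a Substantial Proportion of Additive Genetic Variance

Additive genetic variance refers to the part of the variance in which the phenotype changes linearly across the genotypes AA, Aa, and aa (each extra copy of an allele shifts the phenotype by the same amount), as opposed to a dominant or recessive relationship.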
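
For a single biallelic locus, the split between the additive (linear) part and the dominance part can be worked through numerically. The sketch below uses the standard single-locus quantitative-genetics parameterization, which is an assumption added here and not something stated in the review: genotype values are +a for AA, d for Aa, and −a for aa, the locus is in Hardy-Weinberg equilibrium, and p is the frequency of allele A. The function name is hypothetical.

```python
def single_locus_variances(p, a, d):
    """Return (additive variance, dominance variance, total genetic variance)
    for one locus with genotype values AA = +a, Aa = d, aa = -a."""
    q = 1.0 - p
    # Average effect of an allele substitution: slope of the best-fitting
    # straight line of genotype value on allele count.
    alpha = a + d * (q - p)
    v_additive = 2.0 * p * q * alpha ** 2
    v_dominance = (2.0 * p * q * d) ** 2

    # Total variance computed directly from Hardy-Weinberg genotype frequencies.
    freqs = (p * p, 2.0 * p * q, q * q)
    values = (a, d, -a)
    mean = sum(f * v for f, v in zip(freqs, values))
    v_total = sum(f * (v - mean) ** 2 for f, v in zip(freqs, values))
    return v_additive, v_dominance, v_total


# Purely additive locus: the heterozygote sits exactly between the homozygotes.
print(single_locus_variances(p=0.3, a=1.0, d=0.0))
# Partial dominance: part of the variance is no longer linear in allele count.
print(single_locus_variances(p=0.3, a=1.0, d=0.5))
```

When d is 0, all genetic variance is additive. With dominance, most of the variance is typically still captured by the linear term, which is one reason additive models dominate in GWAS.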


The Utility of GWAS-Derived Genetic Predictors

A polygenic risk score (PRS) gives each individual a concrete score for the risk of a given disease, based on that individual's variants and their effect sizes.
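
In its simplest form, the score is a weighted sum: each variant's allele count is multiplied by its GWAS effect size and the products are added up. The sketch below shows only that core calculation; the variant IDs, weights, and function name are made up for illustration, and real pipelines add steps such as LD pruning or shrinkage of the weights.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele counts.

    dosages : dict mapping variant ID -> effect-allele count (0, 1, or 2)
    weights : dict mapping variant ID -> GWAS effect size (for example, log odds ratio)
    Variants missing from the weights are ignored.
    """
    return sum(weights[variant] * dose
               for variant, dose in dosages.items()
               if variant in weights)


# Hypothetical effect sizes for three variants.
toy_weights = {"variant_a": 0.5, "variant_b": 0.25, "variant_c": -0.125}

person_1 = {"variant_a": 2, "variant_b": 1, "variant_c": 0}
person_2 = {"variant_a": 0, "variant_b": 1, "variant_c": 2}

print(polygenic_score(person_1, toy_weights))
print(polygenic_score(person_2, toy_weights))
```

The raw score has no natural scale, so it is usually interpreted relative to the distribution of scores in a reference population.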


The Public Availability of Data Has Enabled Novel Research and Discoveries

GWAS Catalog - EMBL-EBI, the best-known database.

UK Biobank


From GWAS to Biology

To help bridge the gap between an association and its biological mechanism, many databases have emerged: ENCODE, Roadmap Epigenomics, and GTEx.


Three Exemplars of GWAS Success

Worth reading as an example of how to summarize a project in concise language.
