DNA methylation microarrays : experimental design and statistical analysis /

Subtitle: none

Authors: Sun-Chong Wang, Arturas Petronis.

Classification number:

ISBN:9781420067279


Description

Publisher summary: Wang (Systems Biology and Bioinformatics Institute, National Central U., Taiwan) and Petronis (Epigenetics Laboratory, Centre for Addiction and Mental Health, Canada) wrote this work to help researchers and students analyze high-throughput epigenomic data with sound statistics. The focus is on DNA methylation microarray data, but much of the analysis also applies to gene expression data and to histone modification data obtained by chromatin immunoprecipitation on a chip. The first section introduces the basic statistics, describes the wet-bench technologies that produce the data, and covers preprocessing to remove systematic artifacts that result from imperfections in the measurement. The normalized data are then subjected to conventional hypothesis-driven analysis that looks for differentially methylated loci between populations, after which genomic tiling arrays are discussed. The next section concerns exploratory analysis and considers how cluster and network analysis can associate the functions and roles of unannotated DNA elements with those of known ones. The final chapters discuss online annotations, public microarray data repositories, and open source software for microarray data analysis. The CD-ROM contains files of result plots discussed in the text and full-color versions of a number of images. Annotation ©2008 Book News, Inc., Portland, OR (booknews.com)

Table of contents



Contents
1 Applied Statistics 23
1.1 Descriptive statistics . . . . . . . . . . . . . . . . . . . . . . . 23
1.1.1 Frequency distribution . . . . . . . . . . . . . . . . . . 24
1.1.2 Central tendency and variability . . . . . . . . . . . . 24
1.1.3 Correlation . . . . . . . . . . . . . . . . . . . . . . . . 26
1.2 Inferential statistics . . . . . . . . . . . . . . . . . . . . . . . 27
1.2.1 Probability distribution . . . . . . . . . . . . . . . . . 27
1.2.2 Central limit theorem and normal distribution . . . . 29
1.2.3 Statistical hypothesis testing . . . . . . . . . . . . . . 29
1.2.4 Two-sample t-test . . . . . . . . . . . . . . . . . . . . 31
1.2.5 Non-parametric test . . . . . . . . . . . . . . . . . . . 31
1.2.6 One-factor ANOVA and F-test . . . . . . . . . . . . . 32
1.2.7 Simple linear regression . . . . . . . . . . . . . . . . . 33
1.2.8 Chi-square test of contingency . . . . . . . . . . . . . 35
1.2.9 Statistical power analysis . . . . . . . . . . . . . . . . 36
2 DNA Methylation Microarrays and Quality Control 39
2.1 DNA methylation microarrays . . . . . . . . . . . . . . . . . 40
2.2 Workflow of methylome experiment . . . . . . . . . . . . . . 43
2.2.1 Restriction enzyme based enrichment . . . . . . . . . 43
2.2.2 Immunoprecipitation based enrichment . . . . . . . . 43
2.3 Image analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.4 Visualization of raw data . . . . . . . . . . . . . . . . . . . . 48
2.5 Reproducibility . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.5.1 Positive and negative controls by exogenous sequences 54
2.5.2 Intensity fold-change and p-value . . . . . . . . . . . . 54
2.5.3 DNA unmethylation profiling . . . . . . . . . . . . . . 55
2.5.4 Correlation of intensities between tiling arrays . . . . 55
3 Experimental Design 57
3.1 Goals of experiment . . . . . . . . . . . . . . . . . . . . . . . 58
3.1.1 Class comparison and class prediction . . . . . . . . . 58
3.1.2 Class discovery . . . . . . . . . . . . . . . . . . . . . . 58
3.2 Reference design . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.2.1 Dye swaps . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.3 Balanced block design . . . . . . . . . . . . . . . . . . . . . . 61
3.4 Loop design . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.5 Factorial design . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.6 Time course experimental design . . . . . . . . . . . . . . . . 69
3.7 How many samples/arrays are needed? . . . . . . . . . . . . 71
3.7.1 Biological vs technical replicates . . . . . . . . . . . . 71
3.7.2 Statistical power analysis . . . . . . . . . . . . . . . . 71
3.7.3 Pooling biological samples . . . . . . . . . . . . . . . . 77
3.8 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4 Data Normalization 81
4.1 Measure of methylation . . . . . . . . . . . . . . . . . . . . . 81
4.2 The need for normalization . . . . . . . . . . . . . . . . . . . 83
4.3 Strategy for normalization . . . . . . . . . . . . . . . . . . . 84
4.4 Two-color CpG island microarray normalization . . . . . . . 85
4.4.1 Global dependence of log methylation ratios . . . . . . 86
4.4.2 Dependence of log ratios on intensity . . . . . . . . . . 87
4.4.3 Dependence of log ratios on print-tips . . . . . . . . . 89
4.4.4 Normalized Cy3- and Cy5-intensities . . . . . . . . . . 92
4.4.5 Between-array normalization . . . . . . . . . . . . . . 93
4.5 Oligonucleotide arrays normalization . . . . . . . . . . . . . 94
4.5.1 Background correction, PM-MM? . . . . . . . . . . . . 94
4.5.2 Quantile normalization . . . . . . . . . . . . . . . . . . 95
4.5.3 Probeset summarization . . . . . . . . . . . . . . . . . 97
4.6 Normalization using control sequences . . . . . . . . . . . . . 98
4.7 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5 Significant Differential Methylation 103
5.1 Fold change . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.2 Linear model for log-ratios or log-intensities . . . . . . . . . 106
5.2.1 Microarrays reference design or oligonucleotide chips . 106
5.2.2 Sequence-specific dye effect in two-color microarrays . 109
5.3 t-test for contrasts . . . . . . . . . . . . . . . . . . . . . . . . 110
5.4 F-test for joint contrasts . . . . . . . . . . . . . . . . . . . . 111
5.5 P-value adjustment for multiple testing . . . . . . . . . . . . 114
5.5.1 Bonferroni correction . . . . . . . . . . . . . . . . . . . 114
5.5.2 False discovery rate . . . . . . . . . . . . . . . . . . . 114
5.6 Modified t- and F-test . . . . . . . . . . . . . . . . . . . . . 116
5.7 Significant variation within and between groups . . . . . . . 117
5.7.1 Within-group variation . . . . . . . . . . . . . . . . . 117
5.7.2 Between-group variation . . . . . . . . . . . . . . . . . 118
5.8 Significant correlation with a co-variate . . . . . . . . . . . . 119
5.9 Permutation test for bisulfite sequence data . . . . . . . . . . 122
5.9.1 Euclidean distance . . . . . . . . . . . . . . . . . . . . 123
5.9.2 Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . 124
5.10 Missing data values . . . . . . . . . . . . . . . . . . . . . . . 125
5.11 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
5.11.1 Factorial design . . . . . . . . . . . . . . . . . . . . . . 126
5.11.2 Time-course experiments . . . . . . . . . . . . . . . . 127
5.11.3 Balanced block design . . . . . . . . . . . . . . . . . . 128
5.11.4 Loop design . . . . . . . . . . . . . . . . . . . . . . . . 129
6 High Density Genomic Tiling Arrays 131
6.1 Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . 132
6.1.1 Intra- and interarray normalization . . . . . . . . . . . 132
6.1.2 Sequence based probe effects . . . . . . . . . . . . . . 132
6.2 Wilcoxon test in a sliding window . . . . . . . . . . . . . . . 134
6.2.1 Probe score or scan statistic . . . . . . . . . . . . . . . 140
6.2.2 False positive rate . . . . . . . . . . . . . . . . . . . . 141
6.3 Boundaries of methylation regions . . . . . . . . . . . . . . . 141
6.4 Principal component analysis and biplot . . . . . . . . . . . 143
7 Cluster Analysis 147
7.1 Measure of dissimilarity . . . . . . . . . . . . . . . . . . . . . 147
7.2 Dimensionality reduction . . . . . . . . . . . . . . . . . . . . 148
7.3 Hierarchical clustering . . . . . . . . . . . . . . . . . . . . . . 151
7.3.1 Bottom-up approach . . . . . . . . . . . . . . . . . . . 151
7.3.2 Top-down approach . . . . . . . . . . . . . . . . . . . 154
7.4 K-means clustering . . . . . . . . . . . . . . . . . . . . . . . 155
7.5 Model-based clustering . . . . . . . . . . . . . . . . . . . . . 159
7.6 Quality of clustering . . . . . . . . . . . . . . . . . . . . . . . 161
7.7 Statistical significance of clusters . . . . . . . . . . . . . . . 161
7.8 Reproducibility of clusters . . . . . . . . . . . . . . . . . . . 164
7.9 Repeated measurements . . . . . . . . . . . . . . . . . . . . . 165
8 Statistical Classification 167
8.1 Feature selection . . . . . . . . . . . . . . . . . . . . . . . . . 167
8.2 Discriminant function . . . . . . . . . . . . . . . . . . . . . . 170
8.2.1 Linear discriminant analysis . . . . . . . . . . . . . . . 171
8.2.2 Diagonal linear discriminant analysis . . . . . . . . . . 172
8.3 K-nearest neighbor . . . . . . . . . . . . . . . . . . . . . . . 172
8.4 Performance assessment . . . . . . . . . . . . . . . . . . . . . 173
8.4.1 Leave-one-out cross validation . . . . . . . . . . . . . . 174
8.4.2 Receiver operating characteristic analysis . . . . . . . 176
9 Interdependency Network of DNA Methylation 181
9.1 Graphs and networks . . . . . . . . . . . . . . . . . . . . . . 182
9.2 Partial correlation . . . . . . . . . . . . . . . . . . . . . . . . 182
9.3 Dependence networks from DNA methylation microarrays . . 183
9.4 Network analysis . . . . . . . . . . . . . . . . . . . . . . . . . 186
9.4.1 Distribution of connectivities . . . . . . . . . . . . . . 187
9.4.2 Active epigenetically regulated loci . . . . . . . . . . . 187
9.4.3 Correlation of connectivities . . . . . . . . . . . . . . . 188
9.4.4 Modularity . . . . . . . . . . . . . . . . . . . . . . . . 189
10 Time Series Experiment 197
10.1 Regulatory networks from microarray data . . . . . . . . . . 199
10.2 Dynamic model of regulation . . . . . . . . . . . . . . . . . . 200
10.3 A penalized likelihood score for parsimonious model . . . . . 200
10.4 Optimization by genetic algorithms . . . . . . . . . . . . . . 202
11 Online Annotations 205
11.1 Gene centric resources . . . . . . . . . . . . . . . . . . . . . . 205
11.1.1 GenBank: a nucleotide sequence database . . . . . . . 205
11.1.2 UniGene: an organized view of transcriptomes . . . . 206
11.1.3 RefSeq: reviews of sequences and annotations . . . . . 206
11.1.4 PubMed: a bibliographic database of biomedical journals . . . 207
11.1.5 dbSNP: database for nucleotide sequence variation . . 208
11.1.6 OMIM: a directory of human genes and genetic disorders . . . 208
11.1.7 Entrez Gene: a Web portal of genes . . . . . . . . . . 208
11.2 PubMeth: a cancer methylation database . . . . . . . . . . . 210
11.3 Gene Ontology . . . . . . . . . . . . . . . . . . . . . . . . . . 211
11.4 Kyoto Encyclopedia of Genes and Genomes . . . . . . . . . . 213
11.5 UniProt/Swiss-Prot knowledgebase . . . . . . . . . . . . . . 214
11.6 The International HapMap Project . . . . . . . . . . . . . . 216
11.7 UCSC human genome browser . . . . . . . . . . . . . . . . . 216
12 Public Microarray Data Repositories 223
12.1 Epigenetics Society . . . . . . . . . . . . . . . . . . . . . . . 223
12.2 Microarray Gene Expression Data Society . . . . . . . . . . . 224
12.3 Minimum Information About a Microarray Experiment . . . 224
12.4 Public repositories for high-throughput arrays . . . . . . . . 226
12.4.1 Gene Expression Omnibus at NCBI . . . . . . . . . . 226
12.4.2 ArrayExpress at EBI . . . . . . . . . . . . . . . . . . . 226
12.4.3 Center for Information Biology Gene Expression at DDBJ . . . 228
13 Open Source Software for Microarray Data Analysis 229
13.1 R: a language and environment for statistical computing and graphics . . . 230
13.2 Bioconductor . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
13.2.1 marray package . . . . . . . . . . . . . . . . . . . . . . 233
13.2.2 affy package . . . . . . . . . . . . . . . . . . . . . . . 233
13.2.3 limma package . . . . . . . . . . . . . . . . . . . . . . 233
13.2.4 stats package . . . . . . . . . . . . . . . . . . . . . . 233
13.2.5 tilingArray package . . . . . . . . . . . . . . . . . . 235
13.2.6 Ringo package . . . . . . . . . . . . . . . . . . . . . . 235
13.2.7 cluster package . . . . . . . . . . . . . . . . . . . . . 235
13.2.8 class package . . . . . . . . . . . . . . . . . . . . . . 235
13.2.9 GeneNet package . . . . . . . . . . . . . . . . . . . . . 235
13.2.10 inetwork package . . . . . . . . . . . . . . . . . . . . 235
13.2.11 GOstats package . . . . . . . . . . . . . . . . . . . . . 236
13.2.12annotate package . . . . . . . . . . . . . . . . . . . . 236
References 237
