Department of Biostatistics Seminar/Workshop Series:
Statistical Methods for DNA Resequencing Analysis in Disease-Gene Studies
Wenyi Wang, Ph.D., Faculty Candidate
Stanford Genome Technology Center, UC Berkeley Statistics
Monday, February 15, 2010
MRB III Room 1220
Intended Audience: Persons interested in applied statistics, statistical theory, epidemiology, health services research, clinical trials methodology, statistical computing, statistical graphics, R users or potential users
Nuclear genes encode most mitochondrial proteins, and mutations in these genes cause diverse and debilitating clinical disorders. To date, 1,200 genes have been reported to be associated with mitochondrial diseases. Identifying DNA variants in these genes in individuals affected by mitochondrial diseases remains a major challenge, because many of these diseases are thought to be associated with rare variants (minor allele frequency <1%). Medical resequencing arrays enable cost-efficient, high-throughput sequencing of candidate genes. For diploid genomes, however, available base-calling tools achieve high accuracy only by calling a subset of all nucleotide positions.
Unlike whole-genome SNP data, array-based resequencing data contain a very low frequency of biological signal (genetic variation relative to a reference sequence), which motivated us to develop a new statistical method, SRMA (Sequence Robust Multi-array Analysis). The challenge was to detect all sequence variations with minimal false discoveries when sequencing errors occur at high frequency. We extended the multi-level mixture models previously deployed for SNP arrays to accurately call single heterozygous samples at rare variant positions. We demonstrate our methods in a resequencing study of 39 candidate genes among healthy individuals and patients with mitochondrial diseases.