Tuesday, January 26, 2010

Rare Variants Create Synthetic Genome-Wide Associations

"The hunt for the genetic roots of common diseases has hit a blank wall."

...quoting the first sentence of Nicholas Wade's New York Times article reviewing this PLoS Biology research paper by David Goldstein and his colleagues at Duke University. Also be sure to see Richard Robinson's synopsis of the paper; both the paper and the synopsis were published this week in PLoS Biology.

The argument made by Goldstein and colleagues is that natural selection has been much better at eliminating disease-predisposing variants than we originally thought. Although GWAS have identified over 2,000 common variants associated with common diseases and traits, it's likely that many of these common variants are tagging rare, ungenotyped variants. Yet most of the time we look at the gene nearest our top hit and try to spin a story about how that gene relates to our phenotype. Beyond their simulations, the authors illustrate the point with real data, using sickle-cell anemia as a model. Although the disease is caused by a variant in a single gene, the authors found statistically significant associations with 179 SNPs spread across a 2.5-Mb interval containing many genes, and most of these SNPs pointed away from the true causal gene.

New York Times: A New Way to Look for Diseases’ Genetic Roots

Synopsis (PLoS Biology): Common Disease, Multiple Rare (and Distant) Variants

Research article (PLoS Biology): Rare Variants Create Synthetic Genome-Wide Associations

Abstract: Genome-wide association studies (GWAS) have now identified at least 2,000 common variants that appear associated with common diseases or related traits (http://www.genome.gov/gwastudies), hundreds of which have been convincingly replicated. It is generally thought that the associated markers reflect the effect of a nearby common (minor allele frequency >0.05) causal site, which is associated with the marker, leading to extensive resequencing efforts to find causal sites. We propose as an alternative explanation that variants much less common than the associated one may create “synthetic associations” by occurring, stochastically, more often in association with one of the alleles at the common site versus the other allele. Although synthetic associations are an obvious theoretical possibility, they have never been systematically explored as a possible explanation for GWAS findings. Here, we use simple computer simulations to show the conditions under which such synthetic associations will arise and how they may be recognized. We show that they are not only possible, but inevitable, and that under simple but reasonable genetic models, they are likely to account for or contribute to many of the recently identified signals reported in genome-wide association studies. We also illustrate the behavior of synthetic associations in real datasets by showing that rare causal mutations responsible for both hearing loss and sickle cell anemia create genome-wide significant synthetic associations, in the latter case extending over a 2.5-Mb interval encompassing scores of “blocks” of associated variants. In conclusion, uncommon or rare genetic variants can easily create synthetic associations that are credited to common variants, and this possibility requires careful consideration in the interpretation and follow up of GWAS signals.
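To make the mechanism in the abstract concrete, here is a toy simulation in Python. It is a minimal sketch under assumptions of my own, not the authors' simulation code: a non-causal common SNP with alleles A and B at 50% each, ten rare causal variants (each at 0.3% population frequency) of which eight happen to sit on the B background, and a simple penetrance model (50% disease risk for carriers, 2% otherwise). The function names, the fixed 8-of-10 split standing in for the chance imbalance, and all parameter values are hypothetical. The common SNP has no effect of its own, yet an allelic chi-square test at that SNP picks up the signal.

```python
import math
import random


def allelic_chi2(case_b, case_total, ctrl_b, ctrl_total):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 table of
    allele counts; returns (statistic, upper-tail p-value)."""
    table = [[case_b, case_total - case_b], [ctrl_b, ctrl_total - ctrl_b]]
    row = [case_total, ctrl_total]
    col = [case_b + ctrl_b, (case_total - case_b) + (ctrl_total - ctrl_b)]
    n = case_total + ctrl_total
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def simulate(n_people=50_000, n_rare=10, n_on_b=8, rare_freq=0.003,
             baseline_risk=0.02, carrier_risk=0.5, seed=2010):
    """Hypothetical toy case-control study at a NON-causal common SNP.

    n_on_b of the n_rare rare causal variants lie on B haplotypes, the rest on
    A haplotypes. Rare variants are drawn independently per haplotype.
    """
    rng = random.Random(seed)
    # A variant at population frequency rare_freq that lives only on one
    # background (frequency 0.5) has frequency rare_freq / 0.5 on that background.
    per_background = rare_freq / 0.5
    p_carry = {
        "B": 1 - (1 - per_background) ** n_on_b,
        "A": 1 - (1 - per_background) ** (n_rare - n_on_b),
    }
    counts = {"case": [0, 0], "control": [0, 0]}  # [B alleles, total alleles]
    for _person in range(n_people):
        alleles, carrier = [], False
        for _haplotype in range(2):
            allele = "B" if rng.random() < 0.5 else "A"
            alleles.append(allele)
            if rng.random() < p_carry[allele]:
                carrier = True  # carries at least one rare causal variant
        risk = carrier_risk if carrier else baseline_risk
        status = "case" if rng.random() < risk else "control"
        counts[status][0] += alleles.count("B")
        counts[status][1] += 2
    (case_b, case_n), (ctrl_b, ctrl_n) = counts["case"], counts["control"]
    chi2, p = allelic_chi2(case_b, case_n, ctrl_b, ctrl_n)
    return {"freq_case": case_b / case_n, "freq_control": ctrl_b / ctrl_n,
            "chi2": chi2, "p": p}


if __name__ == "__main__":
    result = simulate()
    print(f"B allele frequency: cases {result['freq_case']:.3f}, "
          f"controls {result['freq_control']:.3f}")
    print(f"chi-square = {result['chi2']:.1f}, p = {result['p']:.1e}")
```

Setting n_on_b=5 spreads the rare variants evenly across both backgrounds, and in expectation the signal at the common SNP disappears. The association comes entirely from the uneven distribution of rare variants, which is the "synthetic association" idea.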


  1. Too weak to be found by linkage studies? Big enough to show up stochastically with common variants? How big is this space?

    Of course, if right, it helps lay the foundation for funding what we were waiting for all along: whole genome sequencing.

    As Kari Stefansson said in the NY Times: "We can speculate until we are blue in the face, but there is no substitute for data."

  2. Thinking about this more: couldn't this be tested with data from independent GWAS studies in dbGaP? If the hypothesis is correct, wouldn't you expect to see more nominally significant associations than expected by chance near, but distant from, the original signal? Maybe an excess of associations with lower-MAF SNPs after removing the major signal?

    What I mean is: couldn't you find a way to test the correctness of their hypothesis by using existing GWAS data?

  3. Thanks RxDx. I believe you have a point above. Here's another perspective.
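One rough way to try RxDx's suggestion from comment 2 on summary statistics: take the SNPs in a flanking window around a known hit, exclude the core region, and ask whether more of them reach nominal significance than the 5% expected by chance. The Python sketch below assumes a plain list of (position, p-value) pairs and arbitrary window sizes (100 kb core, 2.5 Mb outer); those choices and all names are hypothetical. The binomial test treats SNPs as independent, which they are not when in LD (including residual LD with the top hit), so its p-value is anti-conservative; something like LD pruning or permutation would be needed for a real analysis.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def flank_excess(snps, hit_pos, inner=100_000, outer=2_500_000, alpha=0.05):
    """Count nominally significant SNPs in the flanks of a GWAS hit.

    snps: list of (base-pair position, p-value) pairs (hypothetical input format).
    Flank SNPs are more than `inner` bp but no more than `outer` bp from hit_pos.
    Returns (observed count, expected count under the null, binomial tail p-value).
    """
    flank = [p for pos, p in snps if inner < abs(pos - hit_pos) <= outer]
    n = len(flank)
    k = sum(1 for p in flank if p < alpha)
    return k, n * alpha, binomial_upper_tail(k, n, alpha)


if __name__ == "__main__":
    hit = 5_000_000
    # Made-up positions and p-values, for illustration only.
    snps = ([(hit + 200_000 + i * 10_000, 0.5) for i in range(100)]
            + [(hit - 200_000 - i * 10_000, 0.01) for i in range(20)])
    observed, expected, p = flank_excess(snps, hit)
    print(observed, expected, p)
```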



Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.