Thursday, December 16, 2010

Epistasis in New Places

Coming from the lineage of Jason Moore, I am obliged to occasionally remind everyone that biological systems are inherently complex, and to some degree, we should therefore expect statistical models involving those systems to be complex as well.

Since the advent of GWAS, many approaches for examining epistasis have been weighed down by the computational burden of exhaustively running billions of statistical tests. With this in mind, several bioinformatics approaches (such as Biofilter and INTERSNP) have focused on looking for gene-gene interactions within biological pathways, ontologies, or protein-protein interaction networks. The assumption underlying these methods is that interactions occur between variants of two different genes, what you could call trans-epistasis.
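To put "billions of tests" in perspective, here is a quick back-of-the-envelope count of exhaustive pairwise tests. The panel sizes below are illustrative assumptions, not figures from any particular study.

```python
from math import comb

def pairwise_tests(n_snps: int) -> int:
    """Number of unordered SNP pairs, i.e. exhaustive two-way interaction tests."""
    return comb(n_snps, 2)

# Hypothetical panel sizes, chosen only to show how quickly the count grows.
for n in (1_000, 100_000, 500_000):
    print(f"{n:>7,} SNPs -> {pairwise_tests(n):,} pairwise tests")
```

The count grows roughly with the square of the number of SNPs, which is why restricting the search to pathways or to a single gene's regulatory region makes the problem tractable.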

Considering the epic complexity of the transcription process, the genetics of gene expression seems just as likely to harbor epistasis as biological pathways. Following the excellent work of Barbara Stranger, Jonathan Pritchard, and various other luminaries in this area, Stephen Turner and I examined HapMap genotypes and gene expression levels from corresponding cell lines to look for cis-epistasis.

We found 79 genes where SNP pairs in the gene's regulatory region can interact to influence the gene's expression. What is perhaps most interesting is that there are often large distances between the two interacting SNPs (with minimal LD between them), meaning that most haplotype and sliding window approaches would miss these effects. The full text is available online: "Multivariate analysis of regulatory SNPs: empowering personal genomics by considering cis-epistasis and heterogeneity."
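For readers who want to see what a test for a SNP-SNP interaction on expression can look like, here is a minimal sketch. It is not the method from the paper: it assumes additive 0/1/2 genotype coding, a plain linear model, and simulated data, and every name in it is hypothetical. It compares a model with two main effects against one that adds their product term.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def interaction_f_test(snp1, snp2, expr):
    """F statistic for adding a snp1*snp2 term to an additive two-SNP model.

    Returns (F, residual degrees of freedom); the numerator has 1 df.
    """
    snp1 = np.asarray(snp1, dtype=float)
    snp2 = np.asarray(snp2, dtype=float)
    expr = np.asarray(expr, dtype=float)
    n = expr.size
    ones = np.ones(n)
    reduced = np.column_stack([ones, snp1, snp2])             # main effects only
    full = np.column_stack([ones, snp1, snp2, snp1 * snp2])   # plus interaction
    rss_reduced, rss_full = rss(reduced, expr), rss(full, expr)
    df_resid = n - full.shape[1]
    f_stat = (rss_reduced - rss_full) / (rss_full / df_resid)
    return f_stat, df_resid

# Simulated toy data (an assumption for illustration): expression depends
# only on the product of the two genotypes, plus noise.
rng = np.random.default_rng(0)
n = 200
snp1 = rng.integers(0, 3, size=n)
snp2 = rng.integers(0, 3, size=n)
expr = 0.5 * snp1 * snp2 + rng.normal(size=n)

f_stat, df_resid = interaction_f_test(snp1, snp2, expr)
print(f"F(1, {df_resid}) = {f_stat:.2f}")
```

A p-value would come from comparing the statistic to an F distribution with 1 and n - 4 degrees of freedom. Nothing in the model requires the two SNPs to be adjacent, which is the point about haplotype and sliding window approaches: they only consider nearby markers.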


  1. In the QQ plot (Figure 1) of your paper, what is the expected distribution of interaction p-values for pairs of cis-SNPs selected on the basis of an association with the expression trait at p < 0.05? It appears that you assume it is the same as in the absence of selection.

  2. The line representing the null hypothesis in that figure does assume that the p-values of the interaction terms are uniformly distributed. However, the FDR correction we applied dynamically fits the null distribution, which should account for inflation due to main effects, at least to some degree, and supposedly does so better than the classic Hochberg procedure.

    I'm not sure I've seen any work describing the theoretical distribution of interaction terms (and p-values) for various genetic models and for cases where the SNPs have independent effects. If you have references please share! There is certainly lots of work to be done in this area.
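For anyone following this exchange, the two ingredients under discussion can be sketched in a few lines. This is a hedged illustration only: it assumes the "classic Hochberg procedure" means the Benjamini-Hochberg step-up adjustment, and it does not implement the null-fitting FDR correction used in the paper. The function names are hypothetical.

```python
import numpy as np

def qq_points(pvalues):
    """Return (expected, observed) -log10 p-values under a uniform null."""
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = p.size
    # Expected quantiles of n uniform draws, using midpoint plotting positions.
    expected = (np.arange(1, n + 1) - 0.5) / n
    return -np.log10(expected), -np.log10(p)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working from the largest p-value downward.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.minimum(adjusted, 1.0)
    out = np.empty(n)
    out[order] = adjusted
    return out

adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(adjusted)
```

Both pieces take uniform p-values as the reference point, so if selecting SNP pairs on main effects shifts the null distribution of the interaction p-values, the diagonal of the QQ plot is no longer the right reference line.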


Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.