Per tradition, Russ Altman gave his "Translational Bioinformatics: The Year in Review" presentation at the close of the AMIA Joint Summit on Translational Bioinformatics in San Francisco on March 26th. This year, papers came from six key areas (and a final Odds and Ends category). His full slide deck is available here.
I always enjoy this talk because it routinely points me to new data collections and software tools that are useful for a variety of analyses, so I thought I would highlight the resources from this year's talk.
GRASP: analysis of genotype-phenotype results from 1390 genome-wide association studies and corresponding open access database
Some of you may have accessed the Johnson and O'Donnell catalog of GWAS results published in 2009. This data set was a more extensive collection of GWAS findings than the popular NHGRI GWAS catalog, as it did not impose a genome-wide significance threshold for reported associations. The GRASP database is a similar effort, reporting numerous attributes of each study.
A zip archive of the full data set (a flat file) is available here.
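Because the catalog does not impose a genome-wide significance threshold, a common first step is to apply one yourself. Here is a minimal sketch of that filtering step, assuming a tab-delimited file with a p-value column. The column names (`SNPid`, `Phenotype`, `Pvalue`) and the toy rows are placeholders of mine, not taken from the GRASP file, so check the real header before using it.

```python
import csv
import io

GENOME_WIDE = 5e-8  # conventional genome-wide significance threshold


def split_by_significance(lines, pvalue_column="Pvalue", delimiter="\t",
                          threshold=GENOME_WIDE):
    """Split association rows into (significant, sub-threshold) lists.

    `lines` is any iterable of text lines, e.g. an open file handle.
    The column name is an assumption; rows without a numeric p-value are skipped.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    significant, sub_threshold = [], []
    for row in reader:
        try:
            p = float(row[pvalue_column])
        except (KeyError, TypeError, ValueError):
            continue  # missing column, short row, or non-numeric value
        (significant if p <= threshold else sub_threshold).append(row)
    return significant, sub_threshold


# Toy rows with made-up identifiers, only to show the call.
sample = io.StringIO(
    "SNPid\tPhenotype\tPvalue\n"
    "rs0000001\ttrait A\t3e-9\n"
    "rs0000002\ttrait A\t2e-6\n"
    "rs0000003\ttrait B\tNA\n"
)
significant, sub_threshold = split_by_significance(sample)
print(len(significant), "significant;", len(sub_threshold), "sub-threshold")
```

For the real file, you would pass an open file handle instead of the `io.StringIO` sample.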
Effective diagnosis of genetic disease by computational phenotype analysis of the disease associated genome
This paper tackles the enormously complex task of diagnosing rare genetic diseases using a combination of genetic variants (from a VCF file), a list of phenotype characteristics (fed from the Human Phenotype Ontology), and a few other aspects of the disease.
The online tool called PhenIX is available here.
A network based method for analysis of lncRNA disease associations and prediction of lncRNAs implicated in diseases
Here, Yang et al. examine relationships between known long non-coding RNAs and disease using graph propagation. Their underlying database was generated by mining PubMed, supplemented with some manual curation.
Their lncRNA-Disease database is available here.
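For readers unfamiliar with graph propagation, here is a minimal sketch of one common variant, a random walk with restart, on a toy lncRNA-disease graph. This is not necessarily the procedure Yang et al. used; the node names, the edge list, and the `restart` value are all illustrative assumptions.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def random_walk_with_restart(adjacency, seeds, restart=0.5,
                             tol=1e-12, max_iter=1000):
    """Propagate score from seed nodes across the graph.

    At each step a node keeps `restart` of the seed mass and passes the
    rest of its current score equally to its neighbors.
    Seeds are assumed to be nodes of the graph.
    """
    nodes = list(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(start)
    for _ in range(max_iter):
        updated = {n: restart * start[n] for n in nodes}
        for n in nodes:
            share = (1.0 - restart) * scores[n] / len(adjacency[n])
            for neighbor in adjacency[n]:
                updated[neighbor] += share
        change = sum(abs(updated[n] - scores[n]) for n in nodes)
        scores = updated
        if change < tol:
            break
    return scores


# Hypothetical associations, for illustration only.
toy_edges = [
    ("lncRNA_1", "disease_A"),
    ("lncRNA_1", "disease_B"),
    ("lncRNA_2", "disease_B"),
    ("lncRNA_3", "disease_C"),
]
scores = random_walk_with_restart(build_adjacency(toy_edges), {"disease_A"})
ranked = sorted((n for n in scores if n.startswith("lncRNA")),
                key=scores.get, reverse=True)
print(ranked)
```

In this toy graph, lncRNA_1 is directly linked to the seed disease, lncRNA_2 is reachable only through a shared disease, and lncRNA_3 sits in a disconnected component, so the ranking reflects graph distance from the seed.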
SNPsea: an algorithm to identify cell types, tissues and pathways affected by risk loci
This tool is a type of SNP set enrichment, designed to specifically look at functional enrichment in the context of specific tissues and cell types. The tool is a C++ executable, available for download here.
The data sources underlying the SNPsea algorithm are available here.
Human symptoms-disease network
Here Zhou et al. systematically extract a symptom-to-disease network by exploiting MeSH annotations. They compiled a list of 322 symptoms and 4,442 diseases from the MeSH vocabulary, and documented their co-occurrence within PubMed. Using this disease-symptom network, the authors explore the biological underpinnings of certain symptoms by looking at shared genomic elements between diseases with similar symptoms.
The full list of ~130,000 edges in their disease-symptom network is available here.
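One way to put an edge list like this to work is to compare diseases by their symptom profiles. The sketch below uses plain cosine similarity over weighted symptom vectors. It assumes each edge is a (symptom, disease, weight) triple, which may not match the layout of the published file, and the names and weights are invented.

```python
from collections import defaultdict
from math import sqrt


def disease_vectors(edges):
    """Group (symptom, disease, weight) triples into one symptom vector per disease."""
    vectors = defaultdict(dict)
    for symptom, disease, weight in edges:
        vectors[disease][symptom] = float(weight)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[s] for s, w in u.items() if s in v)
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


# Invented edges, for illustration only.
toy_edges = [
    ("symptom_1", "disease_A", 2.0),
    ("symptom_2", "disease_A", 1.0),
    ("symptom_1", "disease_B", 1.0),
    ("symptom_2", "disease_B", 1.0),
    ("symptom_3", "disease_C", 4.0),
]
vectors = disease_vectors(toy_edges)
print(round(cosine(vectors["disease_A"], vectors["disease_B"]), 3))
print(cosine(vectors["disease_A"], vectors["disease_C"]))
```

Diseases that share no symptoms score 0, and diseases with proportional symptom profiles score 1, so high-scoring pairs are natural candidates for the shared-genetics comparison described above.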
A circadian gene expression atlas in mammals: implications for biology and medicine
This fascinating paper explores the temporal impact on gene expression traits from 12 mouse organs. By systematically collecting transcriptome data from these tissues at two-hour intervals, the authors construct a temporal atlas of gene expression, and show that 43% of protein-coding genes have a circadian expression profile in at least one organ.
The accompanying CircaDB database is available online here.
dRiskKB: a large-scale disease-disease risk relationship knowledge base constructed from biomedical text
The authors of dRiskKB use text mining across MEDLINE citations using a controlled disease vocabulary, in this case the Human Disease Ontology, to generate pairs of diseases that co-occur with specific patterns in abstract text. These pairs are ranked with a scoring algorithm and provide a new resource for disease co-morbidity relationships.
The flat file data driving dRiskKB can be found online here.
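To make the idea of pattern-based pair extraction concrete, here is a toy sketch. It is not the dRiskKB pipeline: the two templates, the four-term vocabulary, the made-up sentences, and ranking by raw match count are all assumptions of mine standing in for the authors' patterns and scoring algorithm.

```python
from collections import Counter
from itertools import permutations

# Hypothetical templates: {a} is the risk disease, {b} the outcome disease.
TEMPLATES = [
    "{a} is a risk factor for {b}",
    "{b} due to {a}",
]


def extract_risk_pairs(sentences, vocabulary, templates=TEMPLATES):
    """Count (risk, outcome) disease pairs whose template appears in a sentence."""
    counts = Counter()
    for sentence in sentences:
        text = sentence.lower()
        for a, b in permutations(vocabulary, 2):
            for template in templates:
                if template.format(a=a, b=b) in text:
                    counts[(a, b)] += 1
    return counts


# Made-up sentences and a tiny vocabulary, for illustration only.
vocabulary = ["obesity", "diabetes mellitus", "hypertension", "stroke"]
sentences = [
    "Obesity is a risk factor for diabetes mellitus.",
    "Diabetes mellitus due to obesity was common in this cohort.",
    "Hypertension is a risk factor for stroke.",
]
for (risk, outcome), n in extract_risk_pairs(sentences, vocabulary).most_common():
    print(risk, "->", outcome, n)
```

A real system would need proper term recognition against the ontology and a more robust score than a raw count, but the directed (risk, outcome) pairs are the kind of output the flat file contains.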
A tissue-based map of the human proteome
In this major effort, a group of investigators has published the most detailed atlas of human protein expression to date. The transcriptome has been extensively studied across human tissues, but it remains unclear to what extent transcriptional activity reflects translation into protein. Most importantly, the data are searchable via a beautiful website.
The underlying data from the Human Protein Atlas are available here.