Monday, March 8, 2010

Searching for SNPs with cloud computing

Suppose you have billions of reads from a hot new sequencing machine, and you want to align them and call SNPs in one go, quickly and cheaply. Check out an open-source tool called Crossbow and the recent paper describing it in Genome Biology. Crossbow is a Hadoop-based tool that combines the speed of the short-read aligner Bowtie with the accuracy of the SNP caller SOAPsnp, and it can run alignment and SNP calling for multiple human whole-genome datasets per day. In the paper's demonstration, the authors aligned 2.7 billion short reads from a Han Chinese male and called SNPs with 98% concordance to the calls from an Illumina genotyping chip. The whole process took 3 hours on a 320-core cluster rented from the Amazon Elastic Compute Cloud (EC2), for a total cost of $85. Since everything is open source, nothing should stop you from downloading the necessary software and running it on your own cluster if you have access to one.
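To put those numbers in perspective, here is a rough back-of-envelope calculation in Python. It uses only the figures quoted above (2.7 billion reads, 3 hours, 320 cores, $85) and treats "3 hours" as exactly 3.0. The function name `run_summary` and the derived metrics are my own illustration, not something reported in the paper, and the $85 total may well include charges other than compute time, so read the per-core-hour figure as an approximation.

```python
# Figures quoted in the post; "3 hours" is assumed to mean exactly 3.0 hours.
READS = 2.7e9
WALL_HOURS = 3.0
CORES = 320
TOTAL_COST_USD = 85.0


def run_summary(reads, wall_hours, cores, total_cost_usd):
    """Derive rough cost and throughput figures for one cluster run.

    Hypothetical helper for illustration only; not part of Crossbow.
    """
    core_hours = cores * wall_hours
    return {
        "core_hours": core_hours,
        "usd_per_core_hour": total_cost_usd / core_hours,
        # End-to-end rate: covers alignment and SNP calling together.
        "reads_per_core_second": reads / (core_hours * 3600),
        "usd_per_million_reads": total_cost_usd / (reads / 1e6),
    }


summary = run_summary(READS, WALL_HOURS, CORES, TOTAL_COST_USD)
for name, value in summary.items():
    print(f"{name}: {value:.4g}")
```

That works out to 960 core-hours, or roughly 9 cents per core-hour and about 3 cents per million reads. Swap in your own cluster's core count and runtime to get a comparable estimate.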

Crossbow: Genotyping from short reads using cloud computing

Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.