Monday, January 28, 2013

Scotty, We Need More Power! Power, Sample Size, and Coverage Estimation for RNA-Seq

Two of the most common questions at the beginning of an RNA-seq experiment are "how many reads do I need?" and "how many replicates do I need?". This paper describes a web application for designing RNA-seq experiments that calculates an appropriate sample size and read depth to satisfy user-defined criteria such as cost, maximum number of reads or replicates attainable, etc. The power and sample size estimations are based on a t-test, which the authors claim performs no worse than the negative binomial models implemented by popular RNA-seq methods such as DESeq when three or more replicates are present. Empirical distributions are taken from either (1) pilot data that the user can upload, or (2) built-in, publicly available data. The authors find that there is substantial heterogeneity between experiments (technical variation is larger than biological variation in many cases), and that power and sample size estimation will be more accurate when users provide their own pilot data.
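To get a feel for what a t-test-style power calculation involves, here is a minimal Python sketch. It is not Scotty's method: it uses the textbook normal approximation to a two-sided, two-sample comparison of log2 expression for a single gene, and it ignores read depth and the empirical distributions Scotty draws on. The function names, the fold change, and the standard deviation are all hypothetical choices for illustration.

```python
import math
from statistics import NormalDist

# Illustrative only: normal approximation to a two-sample comparison of
# log2 expression for one gene. This is NOT the Scotty algorithm.
_Z = NormalDist()  # standard normal


def _log2_effect(fold_change):
    """Absolute log2 fold change; must be non-zero for a power calculation."""
    delta = abs(math.log2(fold_change))
    if delta == 0:
        raise ValueError("fold_change must differ from 1")
    return delta


def replicates_per_group(fold_change, sd_log2, alpha=0.05, power=0.8):
    """Approximate replicates per condition needed to detect fold_change.

    sd_log2 is the assumed between-replicate SD of log2 expression.
    """
    delta = _log2_effect(fold_change)
    z_alpha = _Z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = _Z.inv_cdf(power)          # quantile for the target power
    n = 2 * ((z_alpha + z_power) * sd_log2 / delta) ** 2
    return math.ceil(n)


def approx_power(n, fold_change, sd_log2, alpha=0.05):
    """Approximate power with n replicates per condition.

    Ignores the (usually negligible) rejection probability in the opposite tail.
    """
    delta = _log2_effect(fold_change)
    se = sd_log2 * math.sqrt(2 / n)      # SE of the difference in group means
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    return _Z.cdf(delta / se - z_alpha)


if __name__ == "__main__":
    # Hypothetical inputs: a 2-fold change with an SD of 0.5 on the log2 scale.
    n = replicates_per_group(fold_change=2, sd_log2=0.5)
    print(n, round(approx_power(n, 2, 0.5), 3))
```

With very few replicates the normal approximation is optimistic compared with a true t-test, so treat the numbers as a rough lower bound on the replicates needed. The useful part is seeing how strongly the answer depends on the assumed variance, which is exactly why pilot data matters.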

My only complaint, for all the reasons expressed in my previous blog post about why you shouldn't host things like this exclusively on your lab website, is that the code to run this analysis doesn't appear to be available to save, study, modify, maintain, or archive. When lead author Michele Busby leaves Gabor Marth's lab, hopefully the app doesn't fall into the graveyard of computational biology web apps.

Update 2/7/13: Michele Busby created a public GitHub repository for the Scotty code.

tl;dr? There's a new web app that does power, sample size, and coverage calculations for RNA-seq, but it only works well if the pilot or public data you give it closely matches the actual data you'll collect. 


  1. Hi Stephen,

    I will submit the Matlab code to GitHub within the next few days. The journal has the code as a supplemental file but I am not sure whether they will post it with the final version of the paper. They also have a big supplement that has not been posted which explains the motivation for using the t-test in more depth.

    One never really leaves the Marth lab, but I've been working at the Broad since September.

    Thanks for the post!

    1. Michele - good to hear! Post a link here to the GitHub repo and I'll update the post.

    2. Hi Stephen,

      The files are up. The documentation is not as thorough as for the web app (still kind of a work in progress) but I'm available by email.


    3. Great! I updated the post with a link.

  2. This comment has been removed by the author.

  3. Hi Stephen,
    Did you get a chance to actually run the MATLAB code for Scotty? I hit bugs when I ran it on their example data.
    I also found the web tool not very convenient, since it doesn't give any quantitative output (only colors). Also, when I tried to export the results to PDF, there was a problem (the figure didn't show).


Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.