**Update**Tuesday, September 14, 2010: Fixed the ylim issue, now it sets the y axis limit based on the smallest observed p-value.

A while back Will showed you how to create QQ plots of p-values in Stata and in R using the now-deprecated

*sma*package. A bit later on I showed you how to do the same thing in R using ggplot2. As much as we (and our readers) love ggplot2 around here, it can be quite a bit slower than using the built in base graphics. This was only recently a problem for me when I tried creating a quantile-quantile plot of over 12-million p-values. I wrote the code to do this in base graphics, which is substantially faster than using the ggplot2 code I posted a while back. The code an an example are below.

Here's what the resulting QQ-plot will look like:

Actually in base graphics, the easiest way to do the equangular line is abline(0,1, col='red')

ReplyDeleteAh, thanks Abhijit. That'll might speed things up just slightly I imagine.

ReplyDeleteAlso, for the qqplot, you really aren't plotting 1/n,2/n,...,1 against the p-values. There is a R function ppoints which gives the values against which the qqplot should be plotted. For large data this doesn't really matter, but technically....

ReplyDeleteActually, you could just have done:

e = -log10(ppoints(10000))

o = -log10(pvalues)

qqplot(e, o, pch=19, cex=1, main=main, ...,

xlab=expression(Expected~~-log[10](italic(p))),

ylab=expression(Observed~~-log[10](italic(p))),

xlim=c(0,max(e)), ylim=c(0,max(e)))

abline(0,1, col='red')

If you wanted something quick and dirty you could just have done qqplot(ppoints(length(pvalues)), pvalues, log='xy') which would have done things in terms of log10(p) as opposed to -log10(p).

Shouldn't you be setting your y-limit to

ReplyDeleteylim=c(0,max(o)) ?

Otherwise wouldn't it be cutting off everyone's "good news" p-values ??

Good catch Mike. This is now fixed. Thanks.

ReplyDeleteStupid question, but why do you take the -log 10?

ReplyDelete