Thursday, May 15, 2014

qqman: an R package for creating Q-Q and manhattan plots from GWAS results

Three years ago I wrote a blog post on how to create manhattan plots in R. After hundreds of comments pointing out bugs and other issues, I've finally cleaned up this code and turned it into an R package.

The qqman R package is on CRAN:

The source code is on GitHub:

If you'd like to cite the qqman package (appreciated but not required), please cite this pre-print: Turner, S.D. qqman: an R package for visualizing GWAS results using Q-Q and manhattan plots. biorXiv DOI: 10.1101/005165 (2014).

Something wrong? Please file bug reports, feature requests, or anything else related to the code as an issue on GitHub rather than commenting here. Also, please post the code you're using and some example data causing a failure in a publicly accessible place, such as a GitHub gist (no registration required). It's difficult to troubleshoot if I can't see the data where the code is failing. Want to contribute? Awesome! Send me a pull request.

Note: This release is substantially simplified for the sake of maintainability and creating an R package. The old code that allows confidence intervals on the Q-Q plot and allows more flexible annotation and highlighting is still available at the version 0.0.0 release on GitHub.

Here's a shout-out to all the blog commenters on the previous post for pointing out bugs and other issues, and a special thanks to Dan Capurso and Tim Knutsen for useful contributions and bugfixes.

qqman package tutorial

First things first. Install the package (do this only once), then load the package (every time you start a new R session)
# only once:

# each time:
You can access this help any time from within R by accessing the vignette:
The qqman package includes functions for creating manhattan plots and q-q plots from GWAS results. The gwasResults data.frame included with the package has simulated results for 16,470 SNPs on 22 chromosomes. Take a look at the data:
'data.frame':   16470 obs. of  4 variables:
 $ SNP: chr  "rs1" "rs2" "rs3" "rs4" ...
 $ CHR: int  1 1 1 1 1 1 1 1 1 1 ...
 $ BP : int  1 2 3 4 5 6 7 8 9 10 ...
 $ P  : num  0.915 0.937 0.286 0.83 0.642 ...
  SNP CHR BP      P
1 rs1   1  1 0.9148
2 rs2   1  2 0.9371
3 rs3   1  3 0.2861
4 rs4   1  4 0.8304
5 rs5   1  5 0.6417
6 rs6   1  6 0.5191
          SNP CHR  BP      P
16465 rs16465  22 530 0.5644
16466 rs16466  22 531 0.1383
16467 rs16467  22 532 0.3937
16468 rs16468  22 533 0.1779
16469 rs16469  22 534 0.2393
16470 rs16470  22 535 0.2630
How many SNPs on each chromosome?$CHR))
   Var1 Freq
1     1 1500
2     2 1191
3     3 1040
4     4  945
5     5  877
6     6  825
7     7  784
8     8  750
9     9  721
10   10  696
11   11  674
12   12  655
13   13  638
14   14  622
15   15  608
16   16  595
17   17  583
18   18  572
19   19  562
20   20  553
21   21  544
22   22  535

Creating manhattan plots

Now, let's make a basic manhattan plot.

We can also pass in other graphical parameters. Let's add a title (main=), reduce the point size to 50%(cex=), and reduce the font size of the axis labels to 80% (cex.axis=):
manhattan(gwasResults, main = "Manhattan Plot", cex = 0.5, cex.axis = 0.8)

Let's change the colors and increase the maximum y-axis:
manhattan(gwasResults, col = c("blue4", "orange3"), ymax = 12)

Let's remove the suggestive and genome-wide significance lines:
manhattan(gwasResults, suggestiveline = F, genomewideline = F)

Let's look at a single chromosome:
manhattan(subset(gwasResults, CHR == 1))

Let's highlight some SNPs of interest on chromosome 3. The 100 SNPs we're highlighting here are in a character vector called snpsOfInterest. You'll get a warning if you try to highlight SNPs that don't exist.
 chr [1:100] "rs3001" "rs3002" "rs3003" "rs3004" "rs3005" ...
manhattan(gwasResults, highlight = snpsOfInterest)

We can combine highlighting and limiting to a single chromosome:
manhattan(subset(gwasResults, CHR == 3), highlight = snpsOfInterest, main = "Chr 3")

A few notes on creating manhattan plots:
  • Run str(gwasResults). Notice that the gwasResults data.frame has SNP, chromosome, position, and p-value columns named SNP, CHR, BP, and P. If you're creating a manhattan plot and your column names are different, you'll have to pass the column names to the chr=, bp=, p=, and snp= arguments. See help(manhattan) for details.
  • The chromosome column must be numeric. If you have “X,” “Y,” or “MT” chromosomes, you'll need to rename these 23, 24, 25, etc.
  • If you'd like to change the color of the highlight or the suggestive/genomewide lines, you'll need to modify the source code. Search for col="blue", col="red", or col="green3" to modify the suggestive line, genomewide line, and highlight colors, respectively.

Creating Q-Q plots

Creating Q-Q plots is straightforward - simply supply a vector of p-values to the qq() function. You can optionally provide a title.
qq(gwasResults$P, main = "Q-Q plot of GWAS p-values")


  1. Great work Stephen!
    Really like the updated source code with friendly comments.
    For my own work I'll try to implement colouring of the SNPs based on LD with the leading SNP.


    1. Great - and don't hesitate to send a PR if you do. And I fully intend on re-integrating your and Dan's contributions - I just had to simplify code to the point where I felt comfortable documenting and maintaining before I could release as a package.

  2. This is a really useful form of your previous work. One thing that many might like tweaked and is good for some low level users is an argument to not -logten transform the data. There are many types of genome wide results that this function is great for. it was pretty easy to tweek the function myself, but not everyone is comfortable doing that. Thanks!

    1. Good call. Looking at raw p-values in a manhattan plot isn't very useful, but looking at other types of scores, e.g., test statistics, peak heights, bayes factors, etc. might be useful when not -log10 transformed. Would you mind submitting this as an issue on GitHub? Thanks.

    2. This issue is now resolved on the dev branch on github.

  3. Hi Stephen,
    Thanks a lot for sharing this. Is there any information that you know about the default resolution of the plots?

    1. That's controlled with the options used for saving the image

      png("file.png", width=2000, height=1000, pointsize=18, ...)

  4. Hi Stephen,

    I am trying to bastardise your fine work above to plot differences in alt alleles in SNPs between phenotypes (rightly or wrongly...). I had a look at the code but it wasn't immediately apparent where you plot and colour CHR so as to be able to visually discriminate between each in turn. I am a little short on time so thought I would ask if you could point out to me? Sorry for not RTFMing=P


    1. Bruce, take a look at the source code. I believe you'll need to mod lines 149 and 156-160. Basically, I'm creating a vector of color choices longer than I'll need on line 149, then alternating through them with the for loop. Inelegant, but it works.

    2. Great, thanks for that Stephen.

  5. Hi Stephen,

    Thanks for the useful code and package. Sorry for the simple question - but how do I go about calculating the lambda and adding to the plot ?

    Thanks in advance,


  6. Brilliant, just in time to cite it in a paper :D

  7. Hi Stephen,

    Is there an option to modify the width of the points?, I looked into the code and it says pch= 20 but I don't know how to change it, I tried to enter "pch = 10 " in the function but an error appeared saying that "pch" matched by multiple actual arguments.


  8. Hi Stephen

    Some chromosome may have blank gaps as the SNPs within some region are not genotyped, you need to address that in your script...

    1. Yuan: how about posting some example data as a github gist on the qqman issues page? Thanks.

  9. Hi Stephen,

    Thanks for such a beautiful script, and for such clear commenting! I am wondering if there is a way to alter the source code so that the ticks demarcating chromosomes fall at the outer boundaries of the chromosomes, rather than the middle, and the chromosome numbers fall in the middle? In the plots I am preparing I am not alternating colours of the chromosomes, and it is difficult to discern the boundaries between chromosomes with the ticks falling in the centre. I have tried fiddling around with the source code, but I am still fairly new to R and have not had any success.


  10. Hi Stephen,

    Thanks for the useful package - I have been hacking the GenABEL plot function on and off over the years and yours is really nice and clean.

    In one of my OCD moments I noticed that the x axes tick markers weren't centered on the chromosomes, so I went digging through the source. It seems that you've made an assumption that SNPs are evenly distributed across chromosomes, which is probably correct 99.9% of the time. However a student in our lab was interested in a subset of ~16K SNPs in specific protein coding genes, which don't follow an even distribution (obviously a very specific use case).

    I believe that a small change to line 127;

    ticks=c(ticks, d[d$index==i, ]$pos[floor(length(d[d$index==i, ]$pos)/2)+1])
    # this takes the middle value of the number of observed positions, which isn't always going to be the 'mean' chromosomal position


    ticks = c(ticks, (min(d[d$CHR == i,]$pos) + max(d[d$CHR == i,]$pos))/2 + 1)
    # this modification should identify the 'true' mean chromosomal position

    This fixes the issue with various data I've run through. I don't imagine the modification would cause issues down the track... but don't quote me on that!

    Please disregard if I'm talking nonsense!

    - Miles

    1. Miles, thanks so much for the fix. I haven't tested it out yet. Would you mind uploading some sample data to so I can test this out to see what you're dealing with? Or you can just email me if it's not too large. I'd love to include this in the next update. Thanks.

    2. No problem Stephen. I have sent through some example data via email, I hope it is useful. Please let me know if you would like me to test anything.

  11. Hi Stephen,

    thanks for the great package!
    I'm using it to plot my GWAS data and I was wondering if there is a command to add the rs numbers to SNPs that are above a certain threshold. (in my case the p=10^-5 line)
    I have a basic understanding of R, but I cannot do any programming myself.
    Could you help my with this?
    Thanks in advance,


    1. Roos - this is literally next on my to-do list with this code. I'll post here when it's updated. Thanks.

    2. That would be great!!
      Looking forward to it.

    3. Version 0.1.3 now has the ability to highlight SNPs below a particular threshold or annotate top SNPs on each chromosome. The release is already live on GitHub, and should propagate to CRAN within a day or two.

  12. Hi Stephan,
    Very nice. I found the single chromosome results came out smashed on a vertical line.
    This seems to be because "d$pos=d$BP/1e6" in manhattan.R (the x axis is in megabytes, even if x limits have been specified)
    I know you have highlighted ggplot2 in the past. Any reason for not using it here?

    1. Hm, didn't consider that when merging this PR. Will revisit.

  13. Hi Stephan,

    For some reason, when I try to change the colors for my plot, I get the following error. Do you know why?

    manhattan(data_test, ymax=8, col=c("blue4","orange3"))

    Error in localWindow(xlim, ylim, log, asp, ...) :
    formal argument "col" matched by multiple actual arguments

    1. hi @jmtoung, please post this as an issue on the issue tracker on GitHub. Thanks.

  14. Hi Stephen,

    Thank you for creating such useful code! I am trying to plot Likelihood values from a non-human species. I am finding that the option "chrlabs=c(1:38)" is not changing the chromosome labels on my plot. It also does not work if I use the option "limitchromosomes=1:38". Also, I am using the logp=FALSE flag since I want to plot just the likelihood values. There is no difference in my plot if I use logp=FALSE or logp=TRUE and the log10 values are being plotted in both cases.


  15. Hi Stephen,

    Thank you for creating usefull tool. I am new to R and manhattan plot is my first assignment. I am trying to adjust y axis for plot and its giving me warning " In plot.xy(xy.coords(x, y), type = type, ...) :
    "ymin" is not a graphical parameter"

    How do i adjust Y axis ?
    Thank you,

    1. Dear Stephen, many thanks for a useful and practical package, but the ylim=c(a,b) seems to be unresponsive to any values I feed it?

      many thanks, Isi

    2. Please post your example data and code as an issue on the github issue tracker. Thanks.

  16. Hi Stephen,

    Useful package! Maybe you can add an option to the manhattan plot to show the rsid's of the highlighted SNPs (at least if there not too much SNPs). For example add:

    with(d.highlight, text(pos, logp, SNP, col = "green3",

    Or use an additional argument to the manhattan function.


  17. Thanks a lot!!!!

    This it's going to be very usefull to me (and easier that using the entire code)

  18. Hi Stephen,
    Thanks for this useful package.
    I have a persistent error "Error in plot.window(...) : need finite 'ylim' values" which I cant seem to eliminate.
    Some of my previous examples worked fine.

    1. open an issue on github and leave some example code and data as a github gist.

    2. I am having the same issue with my dataset. My dataset is an imputed list of snps and I consistently get the Error in plot.window(...) : need finite 'ylim' values -- error message
      also using ymax = 10 I get the following response from R In plot.window(...) : "ymax" is not a graphical parameter

      I can't see where the person has placed the problem on github

    3. Please open an issue on github and leave a reproducible example as a github gist, with data.

    4. Fredrick and "Unknown"
      I had similar issues which I figured out.

      The Error in plot window.... is because you probably have NAs in your data.
      Solution: Get rid of them with >results <- na.omit(results)

      "ymax" error

      Solution: use "ylim" not "ymax"

  19. Thanks for a very useful package. I have a naive question in that I am not entirely certain what the convention is for a suggestive threshold (maybe because I work with plants?). Any comments or direction to literature would be appreciated.

  20. Hi!
    Thanks for so useful package!
    I was wondering: It would be possible to overlapping in one Manhattan plot, the p-value results for two or more phenotypes?


  21. Hi. while trying qqman I'm running into the following error "Error in plot.window(...) : need finite 'ylim' values". I don't have any NAs in my data. Is there something i'm doing wrong here?

    1. Please open an issue on github and leave a reproducible example as a github gist, with data.

  22. Hello Stephen,
    Thank you for this useful package (I'm really happy with mine manhattan plots). =)
    I want to know if is possible to create a qq plot with multiples pvalue vectors? I mean, I'm trying to make a plot to compare different models for association analysis similar to the one in:

    1. Pamela, certainly possible, but would require some retooling. The easiest way to do this now would be to use ggplot2.

  23. It works fine with this small data. But I can't see any highlighted SNPs with a bigger data file. Does it have something to do with the P-value or do i need to scale the graph? In the attached file "Highlight.txt" I can only see the top 5 SNPs in the manhattan plot and not the bottom 5.

  24. This is a great package, thank you so much. I'm just wondering if there is a way to annotate the genome-wide significant probes/SNPs with specific data from another column in my dataframe (for example, closest gene or rsID)?
    Many thanks

    1. I think you could just pass in the other variable as a vector to highlight. e.g., qqman(..., highlight=mydata$mycol)

    2. Many thanks for your quick reply. I've realised that I didn't actually explain my question very well. What I mean by 'annotate' is that I would like text on the manhattan plot. For example, for the highly significant probes/SNPs I would like text stating the gene or rsID. Thanks for your help.

    3. Ah, I see what you're saying. Unfortunately, no, that isn't implemented yet. But I'd recommend taking a look at textxy in the calibrate package - you could probably hack this into the manhattan source code somewhere that would work...


Creative Commons License
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.