Thursday, February 17, 2011

R: Given column name in a Data Frame, Get the Index

Had a mental block today trying to figure out how to get the indices of columns in a data frame given their names. It's a simple task, but a hard one to search Google for. Thanks to jashapiro, Matt, and Vince for giving me a heads up on the which() function. The which() function returns the indices of TRUE values in a logical vector.

If you're looking at the iris data:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

And you need to know the column numbers of "Sepal.Width" and "Species", use the which() function:

mycols <- c("Sepal.Width","Species")
which(names(iris) %in% mycols)
[1] 2 5
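
To see why this works, it can help to split the one-liner into its two steps. This is just a minimal sketch using the built-in iris data; the names is_wanted and idx are illustrative and not part of the original example.

```r
mycols <- c("Sepal.Width", "Species")

# Step 1: one TRUE/FALSE per column of iris, TRUE where the name is in mycols
is_wanted <- names(iris) %in% mycols
is_wanted
# [1] FALSE  TRUE FALSE FALSE  TRUE

# Step 2: which() turns the logical vector into the positions of the TRUEs
idx <- which(is_wanted)
idx
# [1] 2 5

# The indices can then be used directly to subset the data frame
head(iris[, idx], 3)
```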



  1. The 'match' function is even easier for this. It returns indices for the first vector argument's matches in the second one.

    > match(mycols, colnames(iris))
    [1] 2 5

  2. Following Alex, note that if you had specified

    > mycols <- c("Species", "Sepal.Width")

    then your first answer would still come back as 2 5, which is wrong if you care about retaining order (which I usually do). match() would give the right answer, 5 2.

    Also, match has the potential to be faster if you are doing a lot of comparisons.

  3. haha I totally spent a good 1/2 hr with this same question yesterday morning. Just saw the repost to R-bloggers

    Thanks for both versions

  4. brilliant, neat answer. thanks!
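
The difference between the two approaches raised in the comments can be checked directly. What follows is a rough sketch on the built-in iris data; the variable names are illustrative, and "Not.A.Column" is a made-up name included only to show how each approach treats a name that is not in the data frame.

```r
mycols <- c("Species", "Sepal.Width")

# which() with %in% reports positions in data-frame column order
idx_which <- which(names(iris) %in% mycols)
idx_which
# [1] 2 5

# match() reports positions in the order the names appear in mycols
idx_match <- match(mycols, names(iris))
idx_match
# [1] 5 2

# A name that is not a column: %in% silently skips it, match() returns NA
badcols <- c("Species", "Not.A.Column")
which(names(iris) %in% badcols)
# [1] 5
match(badcols, names(iris))
# [1]  5 NA
```

If a missing name should be treated as an error rather than skipped, the NA from match() is one straightforward thing to check for with anyNA().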



Creative Commons License
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.