I talked a little bit about tidy data in my recent post about dplyr, but you should really go check out Hadley’s paper on the subject.
R expects inputs to data analysis procedures to be in a tidy format, but the model output objects that you get back aren’t always tidy. The reshape2, tidyr, and dplyr packages are meant to take data frames, munge them around, and return a data frame. David Robinson's broom package bridges this gap by taking un-tidy output from model objects, which are not data frames, and returning it in a tidy data frame format.
(From the documentation): if you fit a linear model on the built-in mtcars dataset and view the object directly, this is what you’d see:
lmfit = lm(mpg ~ wt, mtcars)
lmfit
Call:
lm(formula = mpg ~ wt, data = mtcars)

Coefficients:
(Intercept)           wt
     37.285       -5.344
Running summary(lmfit) gives you more detail:

Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
   Min     1Q Median     3Q    Max
-4.543 -2.365 -0.125  1.410  6.873

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   37.285      1.878   19.86  < 2e-16 ***
wt            -5.344      0.559   -9.56  1.3e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.05 on 30 degrees of freedom
Multiple R-squared:  0.753, Adjusted R-squared:  0.745
F-statistic: 91.4 on 1 and 30 DF,  p-value: 1.29e-10
If you’re just trying to read it, this is good enough, but if you’re doing other follow-up analysis or visualization, you end up hacking around with str() and pulling out coefficients using indices, and everything gets ugly quickly.
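To make that concrete, here is a rough sketch of the manual route using only base R. The object names (coef_mat, coef_df, and so on) are made up for illustration, and the column headings are simply the ones that summary() prints.

```r
# Fit the same model as above
lmfit <- lm(mpg ~ wt, data = mtcars)

# The coefficient table lives inside the summary object as a numeric matrix
coef_mat <- coef(summary(lmfit))

# Pulling out a single number means indexing by position or by name
wt_pvalue <- coef_mat[2, 4]                      # row 2 = wt, column 4 = Pr(>|t|)
wt_pvalue_by_name <- coef_mat["wt", "Pr(>|t|)"]  # same value, looked up by name

# Building a data frame by hand: the row names have to become a column
coef_df <- data.frame(
  term = rownames(coef_mat),
  coef_mat,
  row.names = NULL,
  check.names = FALSE,      # keep headings like "Std. Error" unmangled
  stringsAsFactors = FALSE
)
coef_df
```

This works, but the positional indexing is brittle and the column names with spaces and symbols are awkward to use downstream.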
The tidy() function in the broom package, run on the fit object, probably gives you what you were looking for in a tidy data frame:
         term estimate stderror statistic   p.value
1 (Intercept)   37.285   1.8776    19.858 8.242e-19
2          wt   -5.344   0.5591    -9.559 1.294e-10
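For reference, the call that produces output like this is short. This is a minimal sketch that assumes the broom package is installed. Column names may differ between broom versions (the standard-error column may not be called stderror in the release you have), so the code below only relies on term, estimate, and p.value.

```r
library(broom)

lmfit <- lm(mpg ~ wt, data = mtcars)

# One row per model term, returned as a data frame
td <- tidy(lmfit)
td

# Because it is a data frame, ordinary subsetting works
td[td$term == "wt", c("term", "estimate", "p.value")]
```

From here the result can go straight into dplyr or a plotting function, with no digging through the model object.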
The tidy() function also works on other types of model objects, like those produced by nls(), as well as popular built-in hypothesis testing tools.
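As a sketch of what that looks like, the snippet below applies tidy() to an nls() fit and to the result of a hypothesis test. The model formula, the starting values, and the choice of t.test() as the test are illustrative picks made here rather than examples taken from this post, and broom is assumed to be installed.

```r
library(broom)

# Nonlinear least squares fit; formula and starting values are illustrative
nlsfit <- nls(mpg ~ k / wt + b, data = mtcars, start = list(k = 1, b = 0))
nls_td <- tidy(nlsfit)   # one row per estimated parameter (k and b)
nls_td

# A two-sample t-test comparing car weight by transmission type
tt <- t.test(wt ~ am, data = mtcars)
tt_td <- tidy(tt)        # a single row summarising the test
tt_td
```

In both cases the result is a data frame, so the same downstream code can handle quite different kinds of analyses.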
View the README on the GitHub page, or install the package and run the vignette to see more examples and conventions.