Thursday, August 6, 2015

Compiling RMarkdown from a Helper R Script

The problem

I was looking for a way to compile an RMarkdown document so that the filename of the resulting PDF or HTML report contains the name of the input data it processed. That is, if I compiled analysis.Rmd, and that file ran some analysis and reporting on data001.txt, I’d want the output file to be named something like data001.txt.analysis.html. Even better would be to include a date stamp, so that if the analysis was compiled today, August 6, 2015, the output would be named data001.txt.2015-08-06.analysis.html. I also wanted to implement the entire solution in R, rather than relying on fiddly makefiles or scripts that may behave differently depending on the OS or environment.
I found a near-solution as described on this SO post and detailed on this follow-up blog post, but neither really addressed my problem.

The solution

The simplest solution I could come up with involved creating two files:
  1. A .Rmd file that would actually do all the analysis and generate the compiled report.
  2. A second .R script to be used as a config file. Here you’d specify the input data (and potentially other analysis parameters).
When you call rmarkdown::render() from an R script, the code chunks are evaluated during knitting in the environment given by its envir argument, which defaults to parent.frame(). As a result, anything you define in the .R config file before calling render() is visible to the .Rmd being compiled.
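To make the scoping behaviour concrete, here is a minimal base-R sketch. The function render_like() is a hypothetical stand-in written for illustration, not part of rmarkdown; it only mimics the idea of an envir argument that defaults to parent.frame().

```r
# Hypothetical stand-in for a function with an `envir = parent.frame()`
# argument: it evaluates a string of R code in the given environment.
render_like <- function(code, envir = parent.frame()) {
  eval(parse(text = code), envir = envir)
}

# Defined in the calling script, just like in a config file
infile <- "data001.txt"

# With the default envir, the evaluated code sees the caller's variables
render_like("paste('Input file:', infile)")
# "Input file: data001.txt"

# Passing an explicit environment overrides what the code sees
e <- new.env()
assign("infile", "data002.txt", envir = e)
render_like("paste('Input file:', infile)", envir = e)
# "Input file: data002.txt"
```

The second call suggests how you could isolate a report from the calling script, should you prefer not to rely on the default.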
Here’s what it looks like in practice.
First, the analysis.Rmd file that actually runs the analysis:
 ---
 title: "Analysis Markdown document"
 author: "Stephen Turner"
 date: "August 6, 2015"
 output: html_document
 ---

 This is the Rmarkdown document that runs the analysis.
 Some narrative text goes here. 
 Maybe we'll do some analysis here. The `infile` variable is passed 
 in from the config script. You could pass in other variables too.

 ```{r}
 # check that you defined infile from the config and that 
 # the file actually exists in the current directory
 stopifnot(exists("infile"))

 stopifnot(file.exists(infile))

 # read in the data (assumes a header row that names a `value` column)
 x = read.table(infile, header=TRUE)

 # do some stuff, make a plot, etc.
 result = mean(x$value)
 hist(x$value)
 ```

 Here is some conclusion narrative text. Maybe show some notes:

 - Input file used for this report: `r infile`
 - This report was compiled: `r Sys.Date()`
 - The mean of the `value` column is: `r result`

 Also, never forget to show your...

 ```{r}
 sessionInfo()
 ``` 
And the config.R helper script:
#-------- define the input filename --------#
infile = "data001.txt"
#----- Now just hit the source button! -----#

# check that the input file actually exists!
stopifnot(file.exists(infile))

# create the output filename
outfile = paste(infile, Sys.Date(), "analysis.html", sep=".")

# compile the document
rmarkdown::render(input="analysis.Rmd", output_file=outfile)
All I’d need to do now is open up the config.R script, edit the infile variable, and hit the source button in RStudio. This runs analysis.Rmd as shown above on the input file (data001.txt in this example) and saves the compiled report as data001.txt.2015-08-06.analysis.html.
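If you have several input files, the same idea can be wrapped in a function. The sketch below is one possible arrangement, not part of the original scripts: make_outfile() and render_one() are hypothetical helper names, and it assumes analysis.Rmd and the data files sit in the working directory.

```r
# Hypothetical helper: build the output filename from the input filename,
# the compile date, and a fixed suffix.
make_outfile <- function(infile, date = Sys.Date(), suffix = "analysis.html") {
  paste(basename(infile), format(date, "%Y-%m-%d"), suffix, sep = ".")
}

# Hypothetical wrapper around the config.R steps for a single input file.
render_one <- function(infile) {
  stopifnot(file.exists(infile))
  # render() evaluates chunks in its caller's frame by default, so
  # analysis.Rmd should see this function's local `infile`.
  rmarkdown::render(input = "analysis.Rmd", output_file = make_outfile(infile))
}

# The filename logic can be checked without compiling anything
make_outfile("data001.txt", date = as.Date("2015-08-06"))
# "data001.txt.2015-08-06.analysis.html"

# Example usage (assumes these files exist next to analysis.Rmd):
# for (f in c("data001.txt", "data002.txt")) render_one(f)
```

Keeping the filename logic in its own function means the naming scheme can be changed in one place.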
(Crosspost at RPubs).

4 comments:

  1. Hi Stephen,

    I have been fiddling around a bit with this as well. You can actually pass parameters to your .Rmd directly with the params option in the YAML header, check http://rmarkdown.rstudio.com/developer_parameterized_reports.html for more info.

    Replies
    1. Thanks Pierre. For simpler cases this might be better than the solution proposed here. Thanks for sharing!

  2. One other solution, that is a bit more DRY, is to have the input script readLines() on the Rmd chunk, then if you have a specific pattern (eg always name the chunk read-data) it can then extract the file name from that for naming.

    It's a little more complicated, but it prevents changing data in one place and forgetting to change in another, and makes your compilation script applicable to all your files

  3. Hi, just fyi, I've created a similar solution with
    https://github.com/holgerbrandl/datautils/tree/master/R/rendr
    by throwing in a bit of docopt to expose some common parameters to the user via a simple CLI.


Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.