Code
## `r params$municipality` {.unnumbered}
February 15, 2023
Let’s say you have survey data from a group of 28 municipalities. You have worked out a Quarto file to generate nice tables and figures and want to produce 28 different reports that all use the same Quarto file for each of the municipalities. Additionally, you want one collective report using the complete dataset.
Instead of creating 28+1 different .qmd files and running them manually, this can be easily automated with parameterization. This will save huge amounts of time, for instance when you find that typo in one figure and need to re-render all 29 reports. Or, even better, when you get next years survey results and can re-use the whole setup and instantly generate your new reports!
In our simple case, we will only use one parameter, the name of the municipality. Of course there could be any number of parameters that you may want to customize, which you are likely to easily be able to do based on this example.
We need to add two rows to the qmd-file YAML:
This creates the object params$municipality
and gives it a default value.
Next, we create our little script, in a file called render.R
, starting with a vector of municipalities. This could of course be read from a file, and our example will only contain four.
municipalities <- c("All municipalities","Vallentuna","Vaxholm","Södertälje","Botkyrka")
library(glue)
library(quarto)
library(purrr)
walk(1:length(municipalities), function(i) {
muni <- municipalities[i]
outfile <- glue("{Sys.Date()}_{muni}.html") # gives the filename a date and the municipality name
quarto_render(input = "yourQuartoTemplate.qmd",
execute_params = list("grupp" = muni),
output_file = outfile,
output_format = "html")
})
As you can see, we make use of
purrr::walk
, which means you may be able to usefurrr::future_walk
to enable parallel processing?! I have only tried this briefly, but it didn’t work. Could be worth looking into if you need to generate a lot of reports often.
Now, we just need to make sure that our Quarto file uses the params. For simplicity, I expect that you read your data from a file into a dataframe. We will make use of a simple if
and else
combined with dplyr::filter()
from the tidyverse
.
# read data
df <- read_csv("yourDataFile.csv")
if (params$municipality == "All municipalities") {
df <- df
} else {
df <- df %>%
filter(municipality == params$municipality)
}
Save your files and run the render.R
script! Files will be output to the same directory as your .qmd file.
If you’d like to use the municipality in the first headline of the report, you can add a line like this to the Quarto file:
## `r params$municipality` {.unnumbered}
A far more complex example can be found here (and some of my code was adapted from there): https://github.com/Pecners/sra_pullout/blob/main/render.R
The earlier example only used a single vector of municipalities as a parameter. In a current project I’m working on the municipalities also want to specify who they want to compare themselves with, and the timespan in years that they want displayed in the report. We keep this information in a MS Excel file with three variables: focusMuni
, compMuni
, yearsMuni
, which is read into a dataframe called DIDparams
in the script below. To avoid issues with blank spaces in the cells, we’ll remove them when importing. This is done with the gsub()
function.
# we need two additional packages for this
library(readxl)
library(dplyr)
DIDparams <- read_excel("DIDreportParameters.xls") %>%
mutate(across(everything(), ~ gsub(" ","",.x)))
Then we loop through each row in the DIDparams dataframe. Each cell in the variables compMuni and YearsMuni will contain multiple values. We use strsplit()
and unlist()
in a pipe to convert these into vectors. Finally, we want to make sure that the variable containing years is coded as numeric.
walk(1:nrow(DIDparams), function(i) {
this <- DIDparams[i,]
outfile <- glue("{Sys.Date()}_DIDrapport_{this$fokusKommun}.html")
quarto_render(input = "DIDreport.qmd",
execute_params = list("focusMuni" = this$focusMuni, # only contains one value
"compMuni" = this$compMuni %>% strsplit(",") %>% unlist(),
"yearsMuni" = this$yearsMuni %>% strsplit(",") %>% unlist() %>% as.numeric()
),
output_file = outfile,
output_format = "html")
})
@online{johansson2023,
author = {Johansson, Magnus},
title = {Automating Reports with {Quarto}},
date = {2023-02-15},
url = {https://pgmj.github.io/parameterizedQ.html},
langid = {en}
}
---
title: "Automating reports with Quarto"
author:
name: Magnus Johansson
affiliation: RISE Research Institutes of Sweden
affiliation-url: https://www.ri.se/sv/vad-vi-gor/expertiser/kategoriskt-baserade-matningar
orcid: 0000-0003-1669-592X
date: 2023-02-15
citation:
type: 'webpage'
csl: apa.csl
editor:
markdown:
wrap: 72
---
## Background
Let's say you have survey data from a group of 28 municipalities. You
have worked out a [Quarto](https://www.quarto.org) file to generate nice
tables and figures and want to produce 28 different reports that all use
the same Quarto file for each of the municipalities. Additionally, you
want one collective report using the complete dataset.
Instead of creating 28+1 different .qmd files and running them manually,
this can be easily automated with parameterization. This will save huge
amounts of time, for instance when you find that typo in one figure and
need to re-render all 29 reports. Or, even better, when you get next
years survey results and can re-use the whole setup and instantly
generate your new reports!
## Setting up
In our simple case, we will only use one parameter, the name of the
municipality. Of course there could be any number of parameters that you
may want to customize, which you are likely to easily be able to do
based on this example.
We need to add two rows to the qmd-file YAML:
``` yaml
---
params:
municipality: "All municipalities"
---
```
This creates the object `params$municipality` and gives it a default
value.
Next, we create our little script, in a file called `render.R`, starting
with a vector of municipalities. This could of course be read from a
file, and our example will only contain four.
``` r
municipalities <- c("All municipalities","Vallentuna","Vaxholm","Södertälje","Botkyrka")
library(glue)
library(quarto)
library(purrr)
walk(1:length(municipalities), function(i) {
muni <- municipalities[i]
outfile <- glue("{Sys.Date()}_{muni}.html") # gives the filename a date and the municipality name
quarto_render(input = "yourQuartoTemplate.qmd",
execute_params = list("grupp" = muni),
output_file = outfile,
output_format = "html")
})
```
> As you can see, we make use of `purrr::walk`, which means you may be
> able to use `furrr::future_walk` to enable parallel processing?! I
> have only tried this briefly, but it didn't work. Could be worth
> looking into if you need to generate a lot of reports often.
### Final settings
Now, we just need to make sure that our Quarto file uses the params. For
simplicity, I expect that you read your data from a file into a
dataframe. We will make use of a simple `if` and `else` combined with
`dplyr::filter()` from the `tidyverse`.
``` r
# read data
df <- read_csv("yourDataFile.csv")
if (params$municipality == "All municipalities") {
df <- df
} else {
df <- df %>%
filter(municipality == params$municipality)
}
```
Save your files and run the `render.R` script! Files will be output to
the same directory as your .qmd file.
If you'd like to use the municipality in the first headline of the
report, you can add a line like this to the Quarto file:
```{r}
## `r params$municipality` {.unnumbered}
```
::: {.callout-note icon="false"}
A far more complex example can be found here (and some of my code was
adapted from there):
<https://github.com/Pecners/sra_pullout/blob/main/render.R>
:::
## A slightly more complex example
The earlier example only used a single vector of municipalities as a parameter. In a current project I'm working on the municipalities also want to specify who they want to compare themselves with, and the timespan in years that they want displayed in the report. We keep this information in a MS Excel file with three variables: `focusMuni`, `compMuni`, `yearsMuni`, which is read into a dataframe called `DIDparams` in the script below. To avoid issues with blank spaces in the cells, we'll remove them when importing. This is done with the `gsub()` function.
``` r
# we need two additional packages for this
library(readxl)
library(dplyr)
DIDparams <- read_excel("DIDreportParameters.xls") %>%
mutate(across(everything(), ~ gsub(" ","",.x)))
```
Then we loop through each row in the DIDparams dataframe. Each cell in the variables compMuni and YearsMuni will contain multiple values. We use `strsplit()` and `unlist()` in a pipe to convert these into vectors. Finally, we want to make sure that the variable containing years is coded as numeric.
```r
walk(1:nrow(DIDparams), function(i) {
this <- DIDparams[i,]
outfile <- glue("{Sys.Date()}_DIDrapport_{this$fokusKommun}.html")
quarto_render(input = "DIDreport.qmd",
execute_params = list("focusMuni" = this$focusMuni, # only contains one value
"compMuni" = this$compMuni %>% strsplit(",") %>% unlist(),
"yearsMuni" = this$yearsMuni %>% strsplit(",") %>% unlist() %>% as.numeric()
),
output_file = outfile,
output_format = "html")
})
```