Define a ggplot theme theme_ki(), a standard table function, kbl_ki(), and a color palette based on KI’s design guide, ki_color_palette.
Code
source("ki.R")# this reads an external file and loads whatever is in it
2 Visualizations
2.1 Histogram
Code
df.long%>%filter(Group=="Control")%>%ggplot(aes(x =value, fill =measure))+geom_histogram(binwidth =4)+facet_grid(measure~time)+theme_ki()+scale_fill_manual(values =ki_color_palette)+labs(title ="Outcomes over time", subtitle ="Control group", x ="", y ="Number of respondents")
Code
df.long%>%filter(!Group=="Control")%>%ggplot(aes(x =value, fill =measure))+geom_histogram(binwidth =4)+facet_grid(measure~time)+theme_ki()+scale_fill_manual(values =ki_color_palette)+labs(title ="Outcomes over time", subtitle ="Intervention group", x ="", y ="Number of respondents")
2.2 Density
Code
df.long%>%filter(Group=="Control")%>%ggplot(aes(x =value, fill =measure))+geom_density()+facet_grid(measure~time)+theme_ki()+scale_fill_manual(values =ki_color_palette)+labs(title ="Outcomes over time", subtitle ="Control group", x ="", y ="Density of respondents")
Code
df.long%>%filter(!Group=="Control")%>%ggplot(aes(x =value, fill =measure))+geom_density()+facet_grid(measure~time)+theme_ki()+scale_fill_manual(values =ki_color_palette)+labs(title ="Outcomes over time", subtitle ="Intervention group", x ="", y ="Density of respondents")
2.3 stat_halfeye
Code
df.long%>%filter(measure=="DEP")%>%ggplot(aes(x =value, fill =Group))+stat_halfeye()+facet_grid(Group~time)+theme_ki()+#scale_fill_manual(values = ki_color_palette) +labs(title ="Depression outcomes over time", x ="", y ="Density of respondents")
Notice the notch in the boxplot. This is from the documentation (?geom_boxplot): > If FALSE (default) make a standard box plot. If TRUE, make a notched box plot. Notches are used to compare groups; if the notches of two boxes do not overlap, this suggests that the medians are significantly different.
2.5 Individuals over time
Using the package ggrain we can get this sweet figure that combines a boxplot, a half/split violin plot, jittered points for individuals, and lines between individuals across time points!
Code
df.long%>%filter(measure=="DEP")%>%ggplot(aes(x =time, y =value, group =time, fill =Group, color =Group))+geom_rain( boxplot.args =list(color ="black", outlier.shape =NA), id.long.var ="id")+scale_fill_brewer(palette ="Dark2", aesthetics =c("color","fill"))+theme_ki()+facet_wrap(~Group, nrow =2)+labs(title ="Depression over time")
And a slightly different version - can you see what is different and how it relates to the code?
We will start with DEP as outcome and fit a linear model.
3.1 Data wrangling
First, we’ll split our measure variable into three separate variables (while retaining time as its own variable), using pivot_wider().
Second, we’ll remove the last time point in this first model, in order to make our data fit a linear model better.
Both of these transformations are made in the code chunk below.
Code
df.model1<-df.long%>%filter(!time=="3")%>%# since time is a character variable in df.long we need ""pivot_wider(names_from ="measure", values_from ="value")%>%rename(Depression =DEP, Anxiety =ANX, Stress =STRESS)%>%mutate(time =as.integer(time))
Can you decipher the syntax (before looking below)?
3.3 Syntax for lmer
The general trick is that the formula follows the form - dependent ~ independent | grouping. The grouping is generally a random factor, you can include fixed factors without any grouping and you can have additional random factors without any fixed factor (an intercept-only model). A + between factors indicates no interaction, a * indicates interaction.
For random factors, you have three basic variants:
Intercepts only by random factor: (1 | random.factor)
Slopes only by random factor: (0 + fixed.factor | random.factor)
Intercepts and slopes by random factor: (1 + fixed.factor | random.factor)
Note that variant 3 has the slope and the intercept calculated in the same grouping, i.e. at the same time. If we want the slope and the intercept calculated independently, i.e. without any assumed correlation between the two, we need a fourth variant:
Intercept and slope, separately, by random factor: (1 | random.factor) + (0 + fixed.factor | random.factor). An alternative way to write this is using the double-bar notation fixed.factor + (fixed.factor || random.factor).
Our linear mixed model makes a lot of assumptions about the structure of data. It is important that we investigate this before looking at the output of the model, which may be entirely misleading if assumptions are not met.
Linear mixed model fit by REML ['lmerMod']
Formula: Depression ~ time + Group + time * Group + (1 | id)
Data: df.model1
REML criterion at convergence: 1974.5
Scaled residuals:
Min 1Q Median 3Q Max
-2.3880 -0.5395 -0.0722 0.4928 3.3691
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 80.29 8.960
Residual 31.09 5.576
Number of obs: 282, groups: id, 106
Fixed effects:
Estimate Std. Error t value
(Intercept) 16.9085 1.3922 12.145
time -0.7405 0.5711 -1.297
GroupMiCBT -0.7535 2.0160 -0.374
time:GroupMiCBT -2.7173 0.8442 -3.219
Correlation of Fixed Effects:
(Intr) time GrMCBT
time -0.372
GroupMiCBT -0.691 0.257
tim:GrpMCBT 0.252 -0.676 -0.366
3.5.3 tidy
More friendly formatted with tidy() makes it easy to create a simple table.
Code
tidy(m1)%>%mutate_if(is.numeric, round, 2)%>%kbl_ki()# note that this function is specified in the file "ki.R"
This function autogenerates text describing the model output.
Code
report(m1)
We fitted a linear mixed model (estimated using REML and nloptwrap optimizer)
to predict Depression with time and Group (formula: Depression ~ time + Group +
time * Group). The model included id as random effect (formula: ~1 | id). The
model's total explanatory power is substantial (conditional R2 = 0.74) and the
part related to the fixed effects alone (marginal R2) is of 0.06. The model's
intercept, corresponding to time = 0 and Group = Control, is at 16.91 (95% CI
[14.17, 19.65], t(276) = 12.14, p < .001). Within this model:
- The effect of time is statistically non-significant and negative (beta =
-0.74, 95% CI [-1.86, 0.38], t(276) = -1.30, p = 0.196; Std. beta = -0.06, 95%
CI [-0.14, 0.03])
- The effect of Group [MiCBT] is statistically non-significant and negative
(beta = -0.75, 95% CI [-4.72, 3.22], t(276) = -0.37, p = 0.709; Std. beta =
-0.30, 95% CI [-0.65, 0.04])
- The effect of time × Group [MiCBT] is statistically significant and negative
(beta = -2.72, 95% CI [-4.38, -1.06], t(276) = -3.22, p = 0.001; Std. beta =
-0.21, 95% CI [-0.33, -0.08])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald t-distribution approximation.
4 More linear models…
What happens if we define time as a factor and add time point 3?
We can create a similar example to the blog post linked above. It makes a comparison of post-measurements between intervention and control groups, and adds pre-measurement as a covariate. We can use the last time point in our data
First some data wrangling to get our pre/post measurement into wide format.