As noted in the Anova cookbook section, repeated measures Anova can be approximated using linear mixed models.

For example, reprising the sleepstudy example, we can approximate a repeated measures Anova in which multiple measurements of Reaction time are taken on multiple Days for each Subject.

As we saw before, the traditional RM Anova model is:

sleep.rmanova <- afex::aov_car(Reaction ~ Days + Error(Subject/(Days)), data=lme4::sleepstudy)
sleep.rmanova
Anova Table (Type 3 tests)

Response: Reaction
  Effect          df     MSE         F ges p.value
1   Days 3.32, 56.46 2676.18 18.70 *** .29  <.0001
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1

Sphericity correction method: GG 

The equivalent lmer model is:

library(lmerTest)
sleep.lmer <- lmer(Reaction ~ factor(Days) + (1|Subject), data=lme4::sleepstudy)
anova(sleep.lmer)
Type III Analysis of Variance Table with Satterthwaite's method
             Sum Sq Mean Sq NumDF DenDF F value    Pr(>F)
factor(Days) 166235   18471     9   153  18.703 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
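
To make the comparison concrete, the lmer model above can be sketched in equation form. The notation is introduced here for illustration and is not taken from the package documentation: $i$ indexes subjects, $j$ indexes the ten levels of Days, and $\beta_j$ is the difference between day $j$ and the first day under R's default treatment coding (so the first day's own term is zero).

```latex
\text{Reaction}_{ij} = \beta_0 + \beta_j + u_i + \varepsilon_{ij},
\qquad u_i \sim N(0, \sigma^2_u),
\qquad \varepsilon_{ij} \sim N(0, \sigma^2_\varepsilon)
```

Both outputs report the same F statistic (18.70). The degrees of freedom differ because the afex output applies the Greenhouse-Geisser correction: the corrected values (3.32 and 56.46) are the uncorrected values (9 and 153) multiplied by an estimated correction factor of about 3.32 / 9 ≈ 0.37.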

The following sections demonstrate just some of the extensions to RM Anova which are possible with multilevel models.

### Fit a simple slope for Days

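library(tidyverse)  # provides %>% and ggplot()
# scatterplot of reaction time by day with a smoothed trend line;
# points are jittered slightly to reduce overplotting
lme4::sleepstudy %>%
  ggplot(aes(Days, Reaction)) +
  geom_jitter() +
  geom_smooth()
`geom_smooth()` using method = 'loess' and formula 'y ~ x'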

slope.model <- lmer(Reaction ~ Days + (1|Subject),  data=lme4::sleepstudy)
anova(slope.model)
Type III Analysis of Variance Table with Satterthwaite's method
     Sum Sq Mean Sq NumDF DenDF F value    Pr(>F)
Days 162703  162703     1   161   169.4 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
slope.model.summary <- summary(slope.model)
slope.model.summary$coefficients
             Estimate Std. Error       df  t value     Pr(>|t|)
(Intercept) 251.40510  9.7467163  22.8102 25.79383 2.241351e-18
Days         10.46729  0.8042214 161.0000 13.01543 6.412601e-27

### Allow the effect of sleep deprivation to vary for different participants

If we plot the data, it looks like sleep deprivation hits some participants worse than others:

set.seed(1234)
lme4::sleepstudy %>%
  filter(Subject %in% sample(levels(Subject), 10)) %>%
  ggplot(aes(Days, Reaction, group=Subject, color=Subject)) +
  geom_smooth(method="lm", se=F) +
  geom_jitter(size=1) +
  theme_minimal()

If we want to test whether there is significant variation in the effects of sleep deprivation between subjects, we can add a random slope to the model. The random slope allows the effect of Days to vary between subjects. So we can think of an overall slope (i.e. RT goes up over the days), from which individuals deviate by some amount (e.g. a resilient person will have a negative deviation or residual from the overall slope).

Adding the random slope lowers the F statistic for Days (from 169.4 to 45.8) and leaves far fewer denominator degrees of freedom (about 17 rather than 161), but the effect of Days remains clearly significant:

random.slope.model <- lmer(Reaction ~ Days + (Days|Subject),  data=lme4::sleepstudy)
anova(random.slope.model)
Type III Analysis of Variance Table with Satterthwaite's method
     Sum Sq Mean Sq NumDF  DenDF F value    Pr(>F)
Days  30024   30024     1 16.995  45.843 3.273e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The overall slope coefficient itself does not change: the estimate for Days is still about 10.47 ms per day, although its standard error is larger (about 1.55 rather than 0.80), which is why the F statistic is smaller. The coefficients can be inspected in the same way as before:

random.slope.model.summary <- summary(random.slope.model)
random.slope.model.summary$coefficients
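
The random slope model fitted in this section can also be written out as an equation. This is a sketch using notation introduced here for illustration ($i$ indexes subjects, $j$ indexes measurement occasions); it corresponds to the formula `Reaction ~ Days + (Days|Subject)`, in which each subject has their own deviation from the overall intercept ($u_{0i}$) and from the overall slope ($u_{1i}$), and the two deviations are allowed to correlate.

```latex
\text{Reaction}_{ij} = (\beta_0 + u_{0i}) + (\beta_1 + u_{1i})\,\text{Days}_{ij} + \varepsilon_{ij},
\qquad
\begin{pmatrix} u_{0i} \\ u_{1i} \end{pmatrix}
\sim N\!\left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma^2_{0} & \sigma_{01} \\ \sigma_{01} & \sigma^2_{1} \end{pmatrix}
\right),
\qquad
\varepsilon_{ij} \sim N(0, \sigma^2_\varepsilon)
```

Setting $\sigma^2_1 = 0$ and $\sigma_{01} = 0$ recovers the random intercept model `Reaction ~ Days + (1|Subject)`.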

To test directly whether slopes vary between individuals, we can use the lmerTest::ranova() function, which compares the model with and without the random slope using a likelihood ratio test:

lmerTest::ranova(random.slope.model)
ANOVA-like table for random-effects: Single term deletions

Model:
Reaction ~ Days + (Days | Subject)
                         npar  logLik    AIC    LRT Df Pr(>Chisq)
<none>                      6 -871.81 1755.6
Days in (Days | Subject)    4 -893.23 1794.5 42.837  2   4.99e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
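
The likelihood ratio test in this table can be reproduced by hand, which may help make clear what ranova() is reporting. The sketch below uses only base R and the rounded log-likelihoods copied from the table above, so the result agrees with the table only up to rounding; the variable names are chosen here for illustration.

```r
# Log-likelihoods copied (rounded) from the ranova() table above
loglik_full    <- -871.81  # model with (Days | Subject)
loglik_reduced <- -893.23  # model with the random slope for Days removed

# Likelihood ratio statistic: twice the difference in log-likelihoods
lrt <- 2 * (loglik_full - loglik_reduced)

# Removing the slope drops two parameters: the slope variance and
# the intercept-slope covariance (npar goes from 6 to 4)
df <- 6 - 4

# Upper-tail probability from the chi-square distribution
p_value <- pchisq(lrt, df = df, lower.tail = FALSE)

lrt
p_value
```

The statistic comes out at about 42.84, matching the LRT column, and the p value is of the same order as the 4.99e-10 reported in the table.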

Because the random slope for Days is statistically significant, we know it improves the model. One way to see that improvement is to plot residuals (unexplained error for each datapoint) against predicted values. To extract residual and fitted values we use the residuals() and predict() functions. These are then combined in a tibble, so that we can use ggplot for the subsequent figures.

# create data frames containing residuals and fitted
# values for each model we ran above
a <- tibble(
  model = "random.slope",
  fitted = predict(random.slope.model),
  residual = residuals(random.slope.model))
b <- tibble(
  model = "random.intercept",
  fitted = predict(slope.model),
  residual = residuals(slope.model))

# join the two data frames together
residual.fitted.data <- bind_rows(a,b)

We can see that the residuals from the random slope model are much more evenly distributed across the range of fitted values, which suggests that the assumption of homogeneity of variance is more plausible for the random slope model:

# plots residuals against fitted values for each model
residual.fitted.data %>%
  ggplot(aes(fitted, residual)) +
  geom_point() +
  geom_smooth(se=FALSE) +
  facet_wrap(~model)
`geom_smooth()` using method = 'loess' and formula 'y ~ x'

We can plot both of the random effects from this model (intercept and slope) to see how much the model expects individuals to deviate from the overall (mean) slope.

# extract the random effects from the model (intercept and slope)
ranef(random.slope.model)$Subject %>%
  # implicitly convert them to a dataframe and add a column with the subject number
  rownames_to_column(var="Subject") %>%
  # plot the intercept and slope values with geom_abline()
  ggplot(aes()) +
  geom_abline(aes(intercept=`(Intercept)`, slope=Days, color=Subject)) +
  # add axis label
  xlab("Day") + ylab("Residual RT") +
  # set the scale of the plot to something sensible
  scale_x_continuous(limits=c(0,10), expand=c(0,0)) +
  scale_y_continuous(limits=c(-100, 100))

Inspecting this plot, there doesn’t seem to be any strong correlation between the RT value at which an individual starts (their intercept residual) and the slope describing how they change over the days compared with the average slope (their slope residual).

That is, we can’t say that knowing whether a person has fast or slow RTs at the start of the study gives us a clue about what will happen to them after they are sleep deprived: relative to the average, some people start slow and get faster; others start fast but suffer and get slower.

However, we can explicitly check this correlation (between individuals’ intercept and slope residuals) using the VarCorr() function:

VarCorr(random.slope.model)
 Groups   Name        Std.Dev. Corr
 Subject  (Intercept) 24.7366
          Days         5.9229  0.066
 Residual             25.5918

The correlation between the random intercepts and slopes is only 0.066, and so very low. We might, therefore, want to try fitting a model without this correlation. lmer includes the correlation by default, so we need to change the model formula to make it clear we don’t want it:

uncorrelated.reffs.model <- lmer(
  Reaction ~ Days + (1 | Subject) + (0 + Days|Subject),
  data=lme4::sleepstudy)
VarCorr(uncorrelated.reffs.model)
 Groups    Name        Std.Dev.
 Subject   (Intercept) 25.0499
 Subject.1 Days         5.9887
 Residual              25.5652

The variance components don’t change much when we constrain the covariance of intercepts and slopes to be zero. We can explicitly compare these two models using the anova() function, which is somewhat confusingly named because in this instance it is performing a likelihood ratio test to compare the two models:

anova(random.slope.model, uncorrelated.reffs.model)
refitting model(s) with ML (instead of REML)
Data: lme4::sleepstudy
Models:
uncorrelated.reffs.model: Reaction ~ Days + (1 | Subject) + (0 + Days | Subject)
random.slope.model: Reaction ~ Days + (Days | Subject)
                         Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
uncorrelated.reffs.model  5 1762.0 1778.0 -876.00   1752.0
random.slope.model        6 1763.9 1783.1 -875.97   1751.9 0.0639      1     0.8004

Model fit is not significantly worse with the constrained model, so for parsimony’s sake we prefer it to the more complex model.

### Fitting a curve for the effect of Days

In theory, we could also fit additional parameters for the effect of Days, although a combined smoothed line plot/scatterplot indicates that a linear function fits the data reasonably well.

lme4::sleepstudy %>%
  ggplot(aes(Days, Reaction)) +
  geom_jitter() +
  geom_smooth()
`geom_smooth()` using method = 'loess' and formula 'y ~ x'

If we insisted on testing a curved (quadratic) function of Days, we could:

quad.model <- lmer(Reaction ~ Days + I(Days^2) + (1|Subject),  data=lme4::sleepstudy)
quad.model.summary <- summary(quad.model)
quad.model.summary$coefficients
               Estimate Std. Error        df   t value     Pr(>|t|)
(Intercept) 255.4493728 10.4656347  30.04058 24.408398 2.299848e-21
Days          7.4340850  2.9707976 160.00001  2.502387 1.334036e-02
I(Days^2)     0.3370223  0.3177733 160.00001  1.060575 2.904815e-01

Here, the coefficient for I(Days^2) is not statistically significant (p = .29), suggesting (as does the plot) that a simple slope model is sufficient.
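
To get a feel for how little the quadratic term changes the fitted curve, we can compare the predictions implied by the two sets of fixed effects. This is a rough sketch in base R: the coefficients are copied by hand from the simple slope and quadratic coefficient tables in this section, the variable names are chosen here for illustration, and random effects are ignored, so these are predictions for an average subject only.

```r
# Days on which reaction time was measured
days <- 0:9

# Fixed-effect predictions from the simple slope model
linear_pred <- 251.40510 + 10.46729 * days

# Fixed-effect predictions from the quadratic model
quad_pred <- 255.4493728 + 7.4340850 * days + 0.3370223 * days^2

# Side-by-side predictions and their difference, rounded for display
comparison <- data.frame(
  Days = days,
  linear = round(linear_pred, 1),
  quadratic = round(quad_pred, 1),
  difference = round(quad_pred - linear_pred, 1)
)
comparison
```

Across the ten days the two sets of predictions differ by less than 5 ms, which is small relative to the residual standard deviation of roughly 25 ms reported by VarCorr() for the models above.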