3 level models with ‘partially crossed’ random effects
The lme4::InstEval
dataset records University lecture evaluations by students
at ETH Zurich. The variables include:
s
a factor with levels 1:2972 denoting individual students.d
a factor with 1128 levels from 1:2160, denoting individual professors or lecturers.studage
an ordered factor with levels 2 < 4 < 6 < 8, denoting student’s “age” measured in the semester number the student has been enrolled.lectage
an ordered factor with 6 levels, 1 < 2 < … < 6, measuring how many semesters back the lecture rated had taken place.service
a binary factor with levels 0 and 1; a lecture is a “service”, if held for a different department than the lecturer’s main one.dept
a factor with 14 levels from 1:15, using a random code for the department of the lecture.y
a numeric vector of ratings of lectures by the students, using the discrete scale 1:5, with meanings of ‘poor’ to ‘very good’.
For convenience, in this example we take a subsample of the (fairly large) dataset:
set.seed(1234)
lectures <- sample_n(lme4::InstEval, 10000)
We run a model without any predictors, but respecting the clustering in the data, in the example below. This model is a three-level random intercepts model, which splits the variance between lecturers, students, and the residual variance. Because, in some cases, some of the same students provide data on a particular lecturer these data are ‘partially crossed’ (the alternative would be to sample different students for each lecturer).
lectures.model <- lmer(y~(1|d)+(1|s), data=lectures)
summary(lectures.model)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: y ~ (1 | d) + (1 | s)
Data: lectures
REML criterion at convergence: 33063.8
Scaled residuals:
Min 1Q Median 3Q Max
-2.62960 -0.74245 0.03516 0.76510 2.62158
Random effects:
Groups Name Variance Std.Dev.
s (Intercept) 0.1119 0.3345
d (Intercept) 0.2717 0.5213
Residual 1.3656 1.1686
Number of obs: 10000, groups: s, 2707; d, 1065
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 3.219e+00 2.381e-02 1.023e+03 135.2 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As before, we can extract only the variance components from the model, and look at the ICC:
VarCorr(lectures.model) %>% as_data_frame() %>%
mutate(icc=vcov/sum(vcov)) %>%
select(grp, vcov, icc)
# A tibble: 3 x 3
grp vcov icc
<chr> <dbl> <dbl>
1 s 0.112 0.0640
2 d 0.272 0.155
3 Residual 1.37 0.781
And we can add predictors to the model to see if they help explain student ratings:
lectures.model.2 <- lmer(y~service*dept+(1|d)+(1|s), data=lectures)
anova(lectures.model.2)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
service 3.001 3.0006 1 7457.1 2.2026 0.137820
dept 21.306 1.6389 13 1157.3 1.2031 0.270909
service:dept 41.546 3.1958 13 6638.0 2.3459 0.004052 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here we can see the service
variable does predict evaluations, and we can use
the model to estimate the mean and SE for service == 1 or service == 0 (see also
the sections on multiple comparisons,
followup contrasts, and doing
followup contrasts with lmer models for more options here):
service.means <- emmeans::emmeans(lectures.model.2, 'service')
Note: D.f. calculations have been disabled because the number of observations exceeds 3000.
To enable adjustments, set emm_options(pbkrtest.limit = 10000) or larger,
but be warned that this may result in large computation time and memory use.
Note: D.f. calculations have been disabled because the number of observations exceeds 3000.
To enable adjustments, set emm_options(lmerTest.limit = 10000) or larger,
but be warned that this may result in large computation time and memory use.
NOTE: Results may be misleading due to involvement in interactions
service.means %>%
broom::tidy() %>%
select(service, estimate, std.error) %>%
pander
service | estimate | std.error |
---|---|---|
0 | 3.257 | 0.02911 |
1 | 3.196 | 0.03914 |
Or change the proportions of variance components at each level (they don’t, much, in this instance):
VarCorr(lectures.model.2) %>% as_data_frame() %>%
mutate(icc=vcov/sum(vcov)) %>%
select(grp, vcov, icc)
# A tibble: 3 x 3
grp vcov icc
<chr> <dbl> <dbl>
1 s 0.112 0.0643
2 d 0.262 0.151
3 Residual 1.36 0.785