Mediation
Mediation is a complex topic, and the key message to take on board, before starting to analyse your data, is that mediation analyses make many strong assumptions about the data. These assumptions can often be pretty unreasonable when spelled out, so be cautious in the interpretation of your data.
Put differently, mediation is a correlational technique aiming to provide a causal interpretation of data; caveat emptor.
Mediation with multiple regression
One common (if outdated) way to analyse mediation is via the 3 steps described by Baron and Kenny @baron1986moderator (also see @zhao_reconsidering_2010).
Let’s say we have a hypothesised situation such as this:
knit_gv("
Lateness -> Crashes
Lateness -> Speeding
Speeding -> Crashes
")
Baron and Kenny propose 3 steps to establishing mediation. These steps correspond to three separate regression models:
Mediation Steps
Step 1 (check distal variable predicts outcome)
That is, show Lateness predicts Crashes
Step 2 (check distal variable predicts mediator)
That is, show Lateness predicts Speeding
Step 3 (check for mediation)
That is, show Speeding predicts Crashes, controlling for Lateness
An additional step, which allows us to test whether the effect is completely mediated, also uses the final regression model:
Step 4 (check for total mediation)
That is, check if Lateness still predicts Crashes, controlling for Speeding
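In equation form, the three regressions behind these steps can be written as below. The coefficient labels $a$, $b$, $c$ and $c'$, and the intercept and error symbols, follow common convention and are an assumption here; they do not appear in the text above.

$$
\begin{aligned}
\text{Crashes} &= i_1 + c\,\text{Lateness} + e_1 \\
\text{Speeding} &= i_2 + a\,\text{Lateness} + e_2 \\
\text{Crashes} &= i_3 + c'\,\text{Lateness} + b\,\text{Speeding} + e_3
\end{aligned}
$$

Steps 1 and 2 test $c$ and $a$ respectively; Steps 3 and 4 test $b$ and $c'$, both taken from the final model.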
Mediation example after Baron and Kenny
Using simulated data, we can work through the steps.
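The code that generated `smash` is not shown here. As a minimal sketch, data with the same four columns could be simulated along the following lines. The name `smash_sim`, the seed, and every coefficient and noise level are illustrative assumptions, so results from this data frame will not match the tables that follow.

```r
# Hypothetical simulation of data shaped like `smash` (not the original code).
set.seed(1234)  # arbitrary seed, chosen only for reproducibility
n <- 200

smash_sim <- data.frame(
  person   = seq_len(n),
  lateness = rpois(n, lambda = 10)  # assumed: count-like lateness scores
)

# Assumed causal structure: lateness raises speed, and both raise crashes.
smash_sim$speed <- 33 + 0.5 * smash_sim$lateness + rnorm(n, sd = 7)
crash_mean <- 2.5 + 0.3 * smash_sim$lateness + 0.3 * smash_sim$speed
smash_sim$crashes <- as.integer(pmax(0, round(crash_mean + rnorm(n, sd = 3))))

str(smash_sim)
```

The worked example below uses the chapter's own `smash` data, not this sketch.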
smash %>% glimpse
Observations: 200
Variables: 4
$ person <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, …
$ lateness <int> 11, 12, 9, 8, 11, 4, 11, 7, 8, 11, 9, 10, 9, 11, 13, 10…
$ speed <dbl> 48.88524, 43.07030, 33.72812, 44.17897, 51.22378, 40.86…
$ crashes <int> 15, 9, 12, 20, 25, 13, 22, 14, 22, 18, 11, 16, 19, 15, …
Step 1: does lateness predict crashes?
step1 <- lm(crashes ~ lateness, data=smash)
tidy(step1) %>% pander()
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 12.15 | 1.204 | 10.09 | 1.365e-19 |
lateness | 0.4448 | 0.1125 | 3.953 | 0.0001074 |
Step 2: Does lateness predict speed?
step2 <- lm(speed ~ lateness, data=smash)
tidy(step2, conf.int = TRUE) %>% pander()
term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
(Intercept) | 33.42 | 2.275 | 14.69 | 1.563e-33 | 28.93 | 37.9 |
lateness | 0.515 | 0.2126 | 2.422 | 0.01633 | 0.09573 | 0.9343 |
The coefficient for lateness is statistically significant, so we would say yes.
Step 3: Does speed predict crashes, controlling for lateness?
step3 <- lm(crashes ~ lateness+speed, data=smash)
tidy(step3) %>% pander()
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 2.542 | 1.465 | 1.735 | 0.08427 |
lateness | 0.2967 | 0.09611 | 3.088 | 0.002309 |
speed | 0.2875 | 0.03166 | 9.083 | 1.122e-16 |
The coefficient for speed is statistically significant, so, following the Baron and Kenny steps, we would say mediation does occur.
Step 4: In the same model, does lateness predict crashes, controlling for speed? That is to say, is the mediation via speed total?
Here, the coefficient is still statistically significant. According to the Baron and Kenny steps, this would indicate the mediation is partial, although the fact the p value falls one side or another of .05 is not necessarily the best way to express this (see below for ways to calculate the proportion of the effect which is mediated).
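As a rough illustration using only the estimates printed in the tables above, the indirect effect can be computed as the product of the lateness-to-speed and speed-to-crashes coefficients, and compared with the drop in the lateness coefficient between Step 1 and Step 3. The variable names here are assumptions made for the sketch, and the result is a point estimate with no measure of uncertainty attached.

```r
# Coefficients copied from the regression tables above.
a        <- 0.515   # step2: lateness -> speed
b        <- 0.2875  # step3: speed -> crashes, controlling for lateness
c_total  <- 0.4448  # step1: lateness -> crashes (total effect)
c_direct <- 0.2967  # step3: lateness -> crashes, controlling for speed

indirect      <- a * b               # product-of-coefficients estimate
drop_in_c     <- c_total - c_direct  # should agree with a * b up to rounding
prop_mediated <- indirect / c_total  # share of the total effect via speed

round(c(indirect = indirect, drop_in_c = drop_in_c,
        prop_mediated = prop_mediated), 3)
```

Both routes give an indirect effect of roughly 0.15, which is about a third of the total effect of lateness on crashes.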
We should also be concerned here with the degree to which predictor and mediator are measured with error: if they are noisy measures, then the proportion of the effect which appears to be mediated will be reduced artificially (see the SEM chapter for more on this).
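A small simulation can make the point about measurement error concrete. This is a sketch under stated assumptions: the data are freshly simulated (not `smash`), the coefficients and noise levels are arbitrary, and noise is added to the mediator only. The helper `prop_mediated()` is a hypothetical function defined just for this example.

```r
set.seed(1)  # arbitrary seed
n <- 5000

# Assumed true model: part of the effect of lateness runs through speed.
lateness <- rnorm(n)
speed    <- 0.5 * lateness + rnorm(n)
crashes  <- 0.3 * lateness + 0.5 * speed + rnorm(n)

# The same mediator, but observed with added measurement error.
speed_noisy <- speed + rnorm(n, sd = 1)

# Proportion mediated = (total effect - direct effect) / total effect.
prop_mediated <- function(mediator) {
  total  <- coef(lm(crashes ~ lateness))["lateness"]
  direct <- coef(lm(crashes ~ lateness + mediator))["lateness"]
  unname((total - direct) / total)
}

prop_clean <- prop_mediated(speed)
prop_noisy <- prop_mediated(speed_noisy)
round(c(clean = prop_clean, noisy = prop_noisy), 2)
```

Under these assumptions the noisy mediator yields a noticeably smaller proportion mediated than the cleanly measured one, even though the underlying causal structure is identical.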