## Mediation

Mediation is a complex topic. The key message to take on board, before you start analysing your data, is that mediation analyses make many strong assumptions about the data. When spelled out, these assumptions can often be pretty unreasonable, so be cautious in interpreting your data.

Put differently, mediation is a correlational technique aiming to provide a causal interpretation of data; caveat emptor.

### Mediation with multiple regression

One common (if outdated) way to analyse mediation is via the 3 steps described by Baron and Kenny @baron1986moderator (also see @zhao_reconsidering_2010).

Let’s say we have a hypothesised situation such as this:

```
knit_gv("
Lateness -> Crashes
Lateness -> Speeding
Speeding -> Crashes
")
```

Baron and Kenny propose 3 steps to establishing mediation. These steps correspond to three separate regression models:

### Mediation Steps

#### Step 1 (check distal variable predicts outcome)

That is, show Lateness predicts Crashes

#### Step 2 (check distal variable predicts mediator)

That is, show Lateness predicts Speeding

#### Step 3 (check for mediation)

That is, show Speeding predicts Crashes, controlling for Lateness

An additional step, which allows us to test whether the effect is *completely*
mediated, also uses the final regression model:

#### Step 4 (check for total mediation)

That is, check if Lateness still predicts Crashes, controlling for Speeding

### Mediation example after Baron and Kenny

Using simulated data, we can work through the steps.
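
The code that generated `smash` is not shown here. As a minimal sketch, a dataset with the same four columns could be simulated as follows. The object name `smash_sim`, the seed, and every effect size are illustrative assumptions, not the values behind the results reported in this section.

```r
# Hypothetical simulation: the effect sizes below are illustrative assumptions,
# not the values used to generate the `smash` data in this chapter.
set.seed(1234)
n <- 200

# Distal variable: a count of how late each person is running
lateness <- rpois(n, lambda = 10)

# Mediator: speed depends partly on lateness
speed <- 33 + 0.5 * lateness + rnorm(n, sd = 6)

# Outcome: crashes depend on lateness directly and on speed
crashes <- round(2.5 + 0.3 * lateness + 0.3 * speed + rnorm(n, sd = 3))
crashes <- pmax(crashes, 0)  # counts cannot be negative

smash_sim <- data.frame(
  person   = seq_len(n),
  lateness = as.integer(lateness),
  speed    = speed,
  crashes  = as.integer(crashes)
)

str(smash_sim)
```

Because both the direct path (lateness to crashes) and the indirect path (lateness to speed to crashes) are non-zero in this sketch, it describes a case of partial mediation by construction.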

```
smash %>% glimpse
Observations: 200
Variables: 4
$ person <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, …
$ lateness <int> 11, 12, 9, 8, 11, 4, 11, 7, 8, 11, 9, 10, 9, 11, 13, 10…
$ speed <dbl> 48.88524, 43.07030, 33.72812, 44.17897, 51.22378, 40.86…
$ crashes <int> 15, 9, 12, 20, 25, 13, 22, 14, 22, 18, 11, 16, 19, 15, …
```

Step 1: does lateness predict crashes?

```
step1 <- lm(crashes ~ lateness, data=smash)
tidy(step1) %>% pander()
```

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 12.15 | 1.204 | 10.09 | 1.365e-19 |
| lateness | 0.4448 | 0.1125 | 3.953 | 0.0001074 |

Step 2: Does lateness predict speed?

```
step2 <- lm(speed ~ lateness, data=smash)
tidy(step2, conf.int = TRUE) %>% pander()
```

| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 33.42 | 2.275 | 14.69 | 1.563e-33 | 28.93 | 37.9 |
| lateness | 0.515 | 0.2126 | 2.422 | 0.01633 | 0.09573 | 0.9343 |

The coefficient for `lateness` is statistically significant, so we would say yes.

Step 3: Does speed predict crashes, controlling for lateness?

```
step3 <- lm(crashes ~ lateness+speed, data=smash)
tidy(step3) %>% pander()
```

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 2.542 | 1.465 | 1.735 | 0.08427 |
| lateness | 0.2967 | 0.09611 | 3.088 | 0.002309 |
| speed | 0.2875 | 0.03166 | 9.083 | 1.122e-16 |

The coefficient for `speed` is statistically significant, so by the Baron and Kenny criteria we would say that mediation does occur.

Step 4: In the same model, does lateness predict crashes, controlling for speed?
That is to say, is the mediation via speed *total*?

Here, the coefficient for `lateness` is still statistically significant. According to the Baron
and Kenny steps, this would indicate the mediation is *partial*, although whether
a p value falls on one side or the other of .05 is not necessarily the best
way to express this (see below for ways to calculate the proportion of the
effect which is mediated).
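
As a quick preview, one common point estimate of the proportion mediated can be computed by hand from the coefficients already reported in the tables above. The sketch below copies those estimates in as plain numbers (the variable names are just illustrative labels) and compares the product-of-coefficients and difference-in-coefficients versions of the indirect effect. It gives a point estimate only and says nothing about its uncertainty.

```r
# Coefficients copied from the regression tables above
total_effect  <- 0.4448  # lateness in step1: crashes ~ lateness
a_path        <- 0.515   # lateness in step2: speed ~ lateness
direct_effect <- 0.2967  # lateness in step3: crashes ~ lateness + speed
b_path        <- 0.2875  # speed in step3

# Two equivalent ways to express the indirect effect in linear models
indirect_product    <- a_path * b_path               # product of coefficients
indirect_difference <- total_effect - direct_effect  # difference in coefficients

# Share of the total effect that runs through speed
proportion_mediated <- indirect_product / total_effect

round(c(indirect_product, indirect_difference, proportion_mediated), 3)
# 0.148 0.148 0.333
```

On these estimates, roughly a third of the association between lateness and crashes runs through speed, which fits the description of the mediation as partial.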

We should also be concerned here with the degree to which the predictor and mediator are measured with error: if they are noisy measures, then the proportion of the effect which appears to be mediated will be artificially reduced (see the SEM chapter for more on this).
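
A small simulation can illustrate the point about measurement error. This is a sketch under assumed values, not an analysis of the `smash` data: the data-generating process, the sample size, the helper `prop_mediated()`, and the amount of added noise are all hypothetical. Here the true effect of `x` on `y` runs entirely through `m`, and only the mediator is then measured with extra error.

```r
set.seed(42)
n <- 5000

# Hypothetical data-generating process with full mediation: x -> m -> y only
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.5 * m + rnorm(n)

# Helper (hypothetical name): proportion mediated via the product of coefficients
prop_mediated <- function(x, m, y) {
  a     <- coef(lm(m ~ x))[["x"]]      # predictor -> mediator
  b     <- coef(lm(y ~ x + m))[["m"]]  # mediator -> outcome, controlling for x
  total <- coef(lm(y ~ x))[["x"]]      # total effect of predictor on outcome
  (a * b) / total
}

# The same mediator, but observed with added measurement error
m_noisy <- m + rnorm(n, sd = 1)

clean <- prop_mediated(x, m, y)
noisy <- prop_mediated(x, m_noisy, y)

round(c(clean = clean, noisy = noisy), 2)
```

With these settings the true proportion mediated is 1, and the estimate based on the clean mediator should sit close to that. The estimate based on the noisy mediator should be markedly lower (about half, in expectation), even though nothing about the underlying causal process has changed.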