Path models
Path models extend linear regression by allowing more than one observed variable to be treated as an ‘outcome’.
Because the terminology of outcomes vs. predictors breaks down when variables can be both outcomes and predictors at the same time, it’s normal to distinguish instead between:
Exogenous variables: those which are not predicted by any other variable in the model
Endogenous variables: those which do have predictors, and may or may not predict other variables
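The distinction can be made concrete with a few lines of base R. This is only a sketch: the two formulas are a hypothetical example, and the object names (model.lines, endogenous, exogenous) are made up for illustration. Any variable that appears on the left-hand side of a regression is endogenous; variables that only ever appear on the right-hand side are exogenous.

```r
# Hypothetical example: two regressions written as ordinary R formulas
model.lines <- c("y ~ x + m", "m ~ x")
formulas <- lapply(model.lines, as.formula)

# Variables on the left-hand side of any regression have predictors
endogenous <- unique(unlist(lapply(formulas, function(f) all.vars(f[[2]]))))

# Variables that only ever appear on the right-hand side have none
predictors <- unique(unlist(lapply(formulas, function(f) all.vars(f[[3]]))))
exogenous <- setdiff(predictors, endogenous)

endogenous  # "y" "m"
exogenous   # "x"
```

Note that m is both predicted (by x) and a predictor (of y), which is exactly the case the outcome/predictor terminology cannot handle.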
Defining a model
To define a path model, lavaan
requires that you specify the relationships
between variables in a text format. A full
guide to this lavaan model syntax
is available on the project website.
For path models the format is very simple, and resembles a series of linear models, written over several lines, but in text rather than as a model formula:
# define the model over multiple lines for clarity
mediation.model <- "
y ~ x + m
m ~ x
"
In this case the ~ symbol just means ‘regressed on’ or ‘is predicted by’. The model in the example above specifies that our outcome y is predicted by both x and m, and that x also predicts m. You might recognise this as a mediation model.
Make sure you include the closing quote symbol, and be careful when running the code which defines the model. RStudio can sometimes get confused and run only some of the lines, leading to errors. The simplest solution is to select the entire block explicitly and run that.
To fit the model we pass the model specification and the data to the sem()
function:
mediation.fit <- sem(mediation.model, data=mediation.df)
As we did for linear regression models, we have saved the model fit object into a variable, here named mediation.fit.
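If you want to try this without the original dataset, the sketch below fits the same model to simulated data. The data frame sim.df, the seed, and the true coefficients are all assumptions made up for illustration; they are not the mediation.df used in the text. For a simple model like this one, where every variable is observed and every possible path is included, the path coefficients should agree with those from fitting each regression separately with lm(), to within the optimiser's tolerance.

```r
library(lavaan)

# Simulated data (an assumption for illustration; not mediation.df)
set.seed(1234)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.2 * x + 0.2 * m + rnorm(n)
sim.df <- data.frame(x, m, y)

sim.model <- "
  y ~ x + m
  m ~ x
"
sim.fit <- sem(sim.model, data = sim.df)

# Path coefficients estimated jointly by lavaan
coef(sim.fit)[c("y~x", "y~m", "m~x")]

# The same regressions fitted one at a time with lm()
coef(lm(y ~ x + m, data = sim.df))[c("x", "m")]
coef(lm(m ~ x, data = sim.df))["x"]
```

The standard errors can differ slightly between the two approaches, because lavaan's default maximum likelihood estimator does not apply the degrees-of-freedom correction that lm() uses.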
To display the model results we can use summary(). The key section of the output to check is the table labelled ‘Regressions’, which lists the regression parameters for the predictors of each endogenous variable.
summary(mediation.fit)
lavaan 0.6-3 ended normally after 12 iterations
Optimization method NLMINB
Number of free parameters 5
Number of observations 200
Estimator ML
Model Fit Test Statistic 0.000
Degrees of freedom 0
Minimum Function Value 0.0000000000000
Parameter Estimates:
Information Expected
Information saturated (h1) model Structured
Standard Errors Standard
Regressions:
Estimate Std.Err z-value P(>|z|)
y ~
x 0.166 0.075 2.198 0.028
m 0.190 0.070 2.721 0.007
m ~
x 0.530 0.067 7.958 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.y 0.967 0.097 10.000 0.000
.m 0.993 0.099 10.000 0.000
From this table we can see that both x and m are significant predictors of y, and that x also predicts m. This suggests that mediation is taking place, but see the mediation chapter for details of testing indirect effects in lavaan.
Where’s the intercept?
Path analysis is part of the set of techniques often termed ‘covariance modelling’. As the name implies, the primary focus is on the relationships between variables, and less on the mean structure of the variables. In fact, by default the software first computes the covariance matrix of all the variables in the model, and the fit is based only on these values plus the sample size. (In early SEM software you typically had to provide the covariance matrix directly, rather than working with the raw data.)
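To see that the covariance matrix and sample size are enough, the sketch below fits the same model twice: once from raw data, and once from only a covariance matrix and the number of observations, using the sample.cov and sample.nobs arguments of sem(). The data are simulated, and the names sim.df, fit.raw and fit.cov are assumptions made up for illustration.

```r
library(lavaan)

# Simulated data (an assumption for illustration; not mediation.df)
set.seed(1234)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.2 * x + 0.2 * m + rnorm(n)
sim.df <- data.frame(x, m, y)

sim.model <- "
  y ~ x + m
  m ~ x
"

# Fit from the raw data
fit.raw <- sem(sim.model, data = sim.df)

# Fit from the covariance matrix and sample size only
fit.cov <- sem(sim.model, sample.cov = cov(sim.df), sample.nobs = n)

# Compare the regression coefficients from the two fits
paths <- c("y~x", "y~m", "m~x")
round(cbind(raw = coef(fit.raw)[paths], cov = coef(fit.cov)[paths]), 3)
```

The two columns of regression coefficients should match, which is why published covariance matrices can often be used to re-fit path models without access to the original data.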
Nonetheless, because path analysis is an extension of regression techniques, it is possible to request that intercepts are included in the model, and means estimated, by adding meanstructure=TRUE to the sem() function (see the lavaan manual for details).
In the output below we now also see a table labelled ‘Intercepts’, which gives the mean value of each endogenous variable when its predictors are zero (just like in linear regression):
mediation.fit.means <- sem(mediation.model,
                           meanstructure=TRUE,
                           data=mediation.df)
summary(mediation.fit.means)
lavaan 0.6-3 ended normally after 16 iterations
Optimization method NLMINB
Number of free parameters 7
Number of observations 200
Estimator ML
Model Fit Test Statistic 0.000
Degrees of freedom 0
Minimum Function Value 0.0000000000000
Parameter Estimates:
Information Expected
Information saturated (h1) model Structured
Standard Errors Standard
Regressions:
Estimate Std.Err z-value P(>|z|)
y ~
x 0.166 0.075 2.198 0.028
m 0.190 0.070 2.721 0.007
m ~
x 0.530 0.067 7.958 0.000
Intercepts:
Estimate Std.Err z-value P(>|z|)
.y 10.629 0.362 29.323 0.000
.m 5.097 0.070 72.298 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.y 0.967 0.097 10.000 0.000
.m 0.993 0.099 10.000 0.000
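Because these intercepts are interpreted just as in linear regression, the sketch below uses lm() rather than lavaan to illustrate what ‘the mean when the predictors are zero’ amounts to. The simulated variables and their means are assumptions made up for illustration. For a single predictor, the intercept equals the mean of the outcome minus the slope times the mean of the predictor, so centring the predictor turns the intercept into the mean of the outcome.

```r
# Simulated data (an assumption for illustration): x has a mean far from zero
set.seed(1234)
n <- 200
x <- rnorm(n, mean = 10)
m <- 5 + 0.5 * x + rnorm(n)

fit <- lm(m ~ x)
b0 <- unname(coef(fit)["(Intercept)"])
b1 <- unname(coef(fit)["x"])

# The intercept is the predicted value of m when x is zero
b0
mean(m) - b1 * mean(x)

# After centring x, the intercept is simply the mean of m
b0.centred <- unname(coef(lm(m ~ I(x - mean(x))))["(Intercept)"])
b0.centred
mean(m)
```

If zero is not a meaningful value for a predictor, centring it before fitting can make the ‘Intercepts’ table easier to interpret.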
Tables of model coefficients
If you want to present results from these models in table format, the parameterEstimates() function is useful to extract the relevant numbers as a dataframe. We can then manipulate and present this table as we would any other dataframe.
In the example below we extract the parameter estimates, select only the regression parameters (~) and remove some of the columns to make the final output easier to read:
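library(dplyr)  # provides %>%, as_tibble(), filter(), mutate(), rename(), select()

parameterEstimates(mediation.fit.means) %>%
  as_tibble() %>%
  filter(op=="~") %>%                    # keep only the regression parameters
  mutate(Term=paste(lhs, op, rhs)) %>%   # e.g. "y ~ x"
  rename(estimate=est,
         p=pvalue) %>%
  select(Term, estimate, z, p) %>%
  pander::pander(caption="Regression parameters from `mediation.fit.means`")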
Term | estimate | z | p |
---|---|---|---|
y ~ x | 0.1657 | 2.198 | 0.02797 |
y ~ m | 0.1899 | 2.721 | 0.006515 |
m ~ x | 0.5298 | 7.958 | 1.776e-15 |
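If you would also like standardised coefficients in the table, parameterEstimates() accepts standardized = TRUE, which adds columns including std.all. The sketch below uses base R subsetting rather than dplyr, and fits the model to simulated data; sim.df, sim.fit and reg are names made up for illustration, not objects from the text.

```r
library(lavaan)

# Simulated data (an assumption for illustration; not mediation.df)
set.seed(1234)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.2 * x + 0.2 * m + rnorm(n)
sim.df <- data.frame(x, m, y)

sim.fit <- sem("
  y ~ x + m
  m ~ x
", data = sim.df)

# Extract all parameters, with standardised estimates added as extra columns
pe <- parameterEstimates(sim.fit, standardized = TRUE)

# Keep only the regression rows and a handful of columns
reg <- pe[pe$op == "~", c("lhs", "op", "rhs", "est", "z", "pvalue", "std.all")]
reg
```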
Diagrams
Because describing path, CFA and SEM models in words can be tedious and difficult for readers to follow, it is conventional to include a diagram of (at least) your final model, and perhaps also of initial or alternative models.
The semPlot package makes this relatively easy: passing a fitted lavaan model to the semPaths() function produces a line drawing, and gives the option to overlay raw or standardised coefficients on this drawing:
# unfortunately semPaths plots very small by default, so we set
# some extra parameters to increase the size to make it readable
semPlot::semPaths(mediation.fit, "par",
                  sizeMan = 15, sizeInt = 15, sizeLat = 15,
                  edge.label.cex=1.5,
                  fade=FALSE)