Goodness of fit statistics in CFA

It’s worth noting that many ‘goodness of fit’ statistics are misnamed and are in fact indexing ‘badness of fit’. This applies to RMSEA, \(\chi^2\), BIC, AIC and others.

However, all of these indices try to solve similar problems in subtly different ways. The problem is that we would like a model which:

  • Fits the data we have and also
  • Predicts new data

You might think that these goals would be aligned, and that a model which fits the data we have would also be good at predicting new data, but this isn’t the case. In fact, if we overfit our current data we won’t be able to predict new observations very accurately.

How fit indices work

There is a tradeoff involved to avoid over-fitting the data, and most fit indices attempt to:

  • Quantify how well the model fits the current data but
  • Penalise models which use many parameters (i.e. those in danger of overfitting)

Each formula for a goodness of fit statistic represents a different tradeoff between these goals.

Model fit statistics are useful but can be misleading and misused. See David Kenny’s page on model fit for more details:

Below are some of the most useful and commonly reported GOF statistics for CFA and SEM models:

Root Mean Square Error of Approximation (RMSEA)

MacCallum, Browne and Sugawara (1996) have used 0.01, 0.05, and 0.08 to indicate excellent, good, and mediocre fit, respectively.

RMSEA < .05 is often used as a cutoff for a reasonably fitting model, although others suggest .1.

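RMSEA is also used to calculate the ‘probability of a close fit’ or pclose statistic. Strictly, this is the p value for a test of the null hypothesis that the population RMSEA is no greater than 0.05, so a large pclose is consistent with close fit.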
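To make the definition concrete, here is a minimal sketch of one common formula for the RMSEA point estimate, computed from the model \(\chi^2\), its degrees of freedom and the sample size. The function name and the input values are illustrative assumptions, and some software divides by N rather than N - 1, so results can differ slightly from your SEM package.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA (one common formula; uses N - 1)."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    # Under a correctly specified model chi2 has expectation df,
    # so only the excess over df counts as misfit (floored at zero).
    excess = max(chi2 - df, 0.0)
    # Dividing by df expresses misfit per degree of freedom,
    # which is how RMSEA rewards parsimony.
    return math.sqrt(excess / (df * (n - 1)))


# Illustrative (made-up) values: chi2 = 85.3 on 24 df with 301 cases.
estimate = rmsea(chi2=85.3, df=24, n=301)
print(f"RMSEA = {estimate:.3f}")
```

Note that whenever \(\chi^2\) is smaller than its degrees of freedom the estimate is exactly zero, which is why RMSEA cannot distinguish between models that fit 'better than expected'.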

Comparative fit index (CFI)

CFI (and the related TLI) assesses the relative improvement in fit of your model compared with the baseline (null) model, which typically assumes that the observed variables are uncorrelated.

CFI ranges between 0 and 1.

The conventional (rule-of-thumb) threshold for a well-fitting model is CFI > .9.
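The comparison with the baseline model can be sketched as follows, using the standard textbook formulae for CFI and TLI. The function names and the \(\chi^2\) values are assumptions made for illustration; your software may apply corrections (e.g. for robust estimators) that this sketch ignores.

```python
def cfi(chi2_model: float, df_model: int,
        chi2_base: float, df_base: int) -> float:
    """Comparative fit index from model and baseline chi-square values."""
    # Misfit of each model beyond what chance alone would produce.
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    if d_base == 0.0:
        return 1.0  # neither model shows any excess misfit
    return 1.0 - d_model / d_base


def tli(chi2_model: float, df_model: int,
        chi2_base: float, df_base: int) -> float:
    """Tucker-Lewis index; unlike CFI it is not bounded to [0, 1]."""
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    return (ratio_base - ratio_model) / (ratio_base - 1.0)


# Illustrative (made-up) values for a fitted model and its baseline model.
print(f"CFI = {cfi(85.306, 24, 918.852, 36):.3f}")
print(f"TLI = {tli(85.306, 24, 918.852, 36):.3f}")
```

Because TLI works with the ratio of \(\chi^2\) to degrees of freedom, it penalises complexity more heavily than CFI and will usually be somewhat lower for the same model.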

Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC)

The AIC and BIC are measures of comparative fit, so can be used when models are non-nested (and therefore otherwise not easily comparable).

AIC is particularly attractive because it corresponds to a measure of predictive accuracy. That is, selecting the model with the smallest AIC is one way of asking: “which model is most likely to accurately predict new data?”
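Both criteria follow the same 'fit plus penalty' recipe, and a short sketch makes the difference between them visible. It assumes you already have each model's log-likelihood, number of free parameters and sample size; the function names and the numbers below are hypothetical.

```python
import math


def aic(loglik: float, k: int) -> float:
    """Akaike Information Criterion: fixed penalty of 2 per parameter."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian Information Criterion: penalty grows with log(sample size)."""
    return -2.0 * loglik + k * math.log(n)


# Two hypothetical non-nested models fitted to the same 301 cases.
models = {
    "model_a": {"loglik": -3737.7, "k": 21},
    "model_b": {"loglik": -3750.0, "k": 18},
}
n_cases = 301

for name, m in models.items():
    print(name,
          f"AIC = {aic(m['loglik'], m['k']):.1f}",
          f"BIC = {bic(m['loglik'], m['k'], n_cases):.1f}")

# Smaller is better for both criteria.
best_by_aic = min(models, key=lambda name: aic(models[name]["loglik"], models[name]["k"]))
print("Preferred by AIC:", best_by_aic)
```

Once the sample has more than about 7 cases, log(N) exceeds 2, so BIC penalises each extra parameter more than AIC does and tends to favour simpler models.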

Factors which can influence fit statistics

All of the following can influence or bias fit statistics:

  • Number of variables (although note RMSEA tends to reduce with more parameters included, but other fit statistics will increase).
  • Model complexity (different statistics reward parsimony to different degrees).
  • Sample size (varies by statistic: some increase and others decrease with sample size).
  • Non-normality of outcome data will (tend to) worsen fit.

Which statistics should you report?

When reporting absolute model fit, RMSEA and CFI are the most widely reported, and are probably sufficient.

However, you should almost never just report a single model, and so:

  • When comparing nested models you should report the \(\chi^2\) lrtest.
  • When comparing non-nested models you should also report differences in BIC and AIC.
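For nested models, the \(\chi^2\) likelihood ratio (difference) test can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the input values are made up, and the p value uses the closed-form upper tail of the \(\chi^2\) distribution for integer degrees of freedom. In practice your SEM software will report this test for you, and scaled test statistics from robust estimators need a different calculation.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df."""
    if df <= 0:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: complementary error function plus a finite correction sum.
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def chi2_difference_test(chi2_restricted: float, df_restricted: int,
                         chi2_full: float, df_full: int):
    """Compare a restricted model with the fuller model it is nested in."""
    delta_df = df_restricted - df_full
    if delta_df <= 0:
        raise ValueError("the restricted model must have more df")
    delta_chi2 = chi2_restricted - chi2_full
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)


# Illustrative (made-up) values: the restricted model fixes 3 extra parameters.
d_chi2, d_df, p = chi2_difference_test(110.0, 27, 85.3, 24)
print(f"delta chi2 = {d_chi2:.1f}, delta df = {d_df}, p = {p:.5f}")
```

A small p value here suggests that the restrictions significantly worsen fit, so the fuller model would be preferred.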

Further reading

This set of slides on model fit provides all of the formulae and an explanation for many different fit indices: