22 Non-independence

Psychological data often contains natural groupings. In intervention research, multiple patients may be treated by individual therapists, or children taught within classes, which are further nested within schools; in experimental research participants may respond on multiple occasions to a variety of stimuli.

Although disparate in nature, these groupings share a common characteristic: they induce dependency between the observations we make. That is, our data points are not independently sampled from one another.

What this means is that observations within a particular grouping will tend, all other things being equal, be more alike than those from a different group.

Why does this matter?

Think of the last quantitative experiment you read about. If you were the author of that study, and were offered 10 additional datapoints for ‘free’, which would you choose:

  1. 10 extra datapoints from existing participants.
  2. 10 data points from 10 new participants.

In general you will gain more new information from data from a new participant. Intuitively we know this is correct because an extra observation from someone we have already studies is less likely to surprise us or be different from the data we already have than an observation from a new participant.

Most traditional statistical models assume that data are sampled independently however. And the precision of the inferences we can draw from from statistical models is based on the amount of information we have available. This means that if we violate this assumption of independent sampling we will trick our model into thinking we have more information than we really do, and our inferences may be wrong.