In brief

Scientists develop theoretical models which aim to describe the true relationships between variables. Statistical models are used to link these theories with data. They allow us to make predictions about future events, and to quantify our confidence in those predictions. But accurate predictions alone aren’t enough: We also want to understand our data and the process that generated it.

Causal diagrams are a useful tool for thinking about causes and effects. For psychological phenomena the diagrams can become complicated. But good research simplifies: We focus on small parts of a large network of causes and effects to make incremental progress.

The quality of evidence for different parts of a causal diagram can vary a lot. We are much more sure about some links in the network than others. But causal diagrams can represent both our knowledge and our hypotheses explicitly; this is useful as we build statistical models to check how well our theories perform in the real world.

1. Tasks and slides

This worksheet contains explanatory notes and some additional reading. For quick reference, the following are shortcuts to the tasks you must complete during, or as homework between, the two workshops:

Main workshop

Preparation

TARA-led session

The slides which accompanied this workshop are available on the module DLE site.

Variables and constructs

Our first job is to try to identify the variables or constructs involved in the phenomena we are interested in.

We’ll talk more about variables and constructs in the session on measurement, but for the moment we’ll define a variable as either of the following:

  • Things we can measure (observations)
  • Things that cause other things, even if we can’t measure them (constructs)

For example, let’s imagine we were trying to explain attendance rates for a 9am lecture.

  • We might be interested in the time the cohort of students went to bed. This is (in theory) measurable, and we could make observations of students’ bedtimes.

  • We might also hypothesise that students’ mood or affect would influence their attendance. If someone was depressed they might skip the lecture. We can’t measure depression directly, but we do think of it as something which causes other things to happen. For this reason we would talk about depression as a construct: we infer that it is present from other indicators (e.g. reports of poor mood, hopelessness, fatigue, trouble concentrating, etc. in a psychometric scale).

Task 1

In groups, think back to your qualitative projects and interviews:

  1. What variables did you encounter in that work which might be related in some way to academic achievement in higher education?

  2. Brainstorm, based on your qualitative research and general knowledge, what other variables might be connected in some way to the ones you have already identified. The connections might be quite distant or convoluted, but that’s fine.

  3. Make a list (as long as you like) of all these variables. Then:

  4. For each one, decide if it is something which can be directly observed, or if it is a construct which we measure indirectly.


Hints:

  • Remember, variables can be both observations (things we could measure directly, e.g. hours of paid work) and constructs (e.g. motivation), which we can only measure indirectly.

  • Be as specific as possible when identifying variables. So, for example, rather than “Mental health” you might say “anxiety” or “depression”. Instead of “Family context”, you might say “number of siblings”, “parental income”, or “positive parental attitude to study”.

  • Think laterally: the list is supposed to capture everything you think could be related to academic achievement at university.

  • Don’t limit yourself to the things discussed in your focus groups — be as inclusive as possible at this stage.

  • Find ways to work on documents together. This is especially important if you are working remotely. A good option for making lists or collaborating on text is to create a shared Word document on Office 365. See this guide on sharing office documents.

Causal diagrams

This section is intended as notes to the lecture/background reading. If you are in the workshop you can skip to the next section for the task.

Once we have identified the variables we think might be involved, we can try to describe the network of causes and effects between them and the phenomena we’re interested in. This is our model.

To represent this model we can use a special type of diagram called a directed acyclic graph. That’s a fancy name for a boxes-and-arrows diagram, with a few special rules (we can come to those later).


This diagram represents a simple situation in which one variable (“Predictor”) causes another variable (“Outcome”). It also says that nothing else causes the outcome. This is worth noting, because it is a strong assumption for a model to make.

Simple cause and effect
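If you like to think in code, one way to make a boxes-and-arrows diagram concrete is to write it down as data. The sketch below is a minimal illustration in Python (standard library only), not a tool used in this module: each variable maps to the list of variables its arrows point at. The helper functions `is_acyclic` and `causes_of` are hypothetical, written just for this example; `is_acyclic` checks the rule implied by the name “acyclic”, namely that following the arrows can never lead you back to where you started.

```python
# A causal diagram written as a dictionary: each variable maps to the
# variables its arrows point at. Names mirror the simple diagram above.
simple_model = {
    "Predictor": ["Outcome"],
    "Outcome": [],
}


def is_acyclic(graph):
    """Return True if following the arrows can never lead back to a start node."""
    # Kahn's algorithm: repeatedly remove variables that no arrow points at.
    in_degree = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            in_degree[target] = in_degree.get(target, 0) + 1
    queue = [node for node, degree in in_degree.items() if degree == 0]
    removed = 0
    while queue:
        node = queue.pop()
        removed += 1
        for target in graph.get(node, []):
            in_degree[target] -= 1
            if in_degree[target] == 0:
                queue.append(target)
    # Any variable that could never be removed sits on a loop.
    return removed == len(in_degree)


def causes_of(graph, variable):
    """List the variables with an arrow pointing directly at `variable`."""
    return [node for node, targets in graph.items() if variable in targets]


print(is_acyclic(simple_model))            # True
print(causes_of(simple_model, "Outcome"))  # ['Predictor']

# An arrow back from Outcome to Predictor creates a loop,
# which a directed acyclic graph does not allow.
looped_model = {"Predictor": ["Outcome"], "Outcome": ["Predictor"]}
print(is_acyclic(looped_model))            # False
```

Writing the model down like this also makes absent arrows explicit: any pair of variables with no entry linking them is a claim that neither directly causes the other.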


This diagram says our footwear and exam grades AREN’T related AT ALL, because we didn’t draw a line between them:

No causation


Diagrams can also describe how variables are related in a particular causal sequence. For example, in our research we might ask questions like:

  • does childhood poverty reduce academic achievement by delaying brain development?
  • does weaker childhood attachment reduce academic performance by reducing the motivation to study?

This pattern – where variables are linked in a series – is called ‘mediation’:

Causal sequences/mediation


Finally, if you don’t know which direction the arrow should point — that is, you don’t know which is the cause and which is the effect — we can (temporarily) draw an arrowhead at both ends like this:

Correlation (as distinct from causation)

This represents a correlation (see the stage 1 notes on correlation and relationships).

Our hope is that — as we learn more, by collecting data or running experiments — we can decide which way the arrow should point.

Draw a causal diagram

In your groups:

  1. Draw a causal diagram of relationships between the variables you identified in the first task. Draw in all the paths where the variables might be related. Leave out paths where you don’t think there is any relationship.

  2. Discuss how strong you think each of the relationships (lines) is. What kinds of evidence do you have (or know of) that make you think the diagram is correct?

  3. Do some variables have no link between them? If so, discuss whether you think there is really absolutely no relationship between these constructs, vs. the case where the relationship is just very weak/uncertain.

  4. Can you find examples of mediation in your diagram?

Tips for this task:

  • You can see an example drawing in PowerPoint here. Use this to get started and draw your own diagram. Another example is here, using Miro: https://miro.com/app/board/o9J_khyI8Ds=/ Use whichever tool feels easiest to you.

  • Don’t worry if you have lots of variables, or if your diagram gets very complicated. Just draw in all the connections you think are reasonable.

  • Try to leave out some connections if you can. The simpler the set of interconnections the better.


Notice that “Academic performance” has several arrows pointing at it. This means this model predicts academic performance.

Your model should also have at least one arrow pointing at “Academic performance”.

Can you spot problems with this model? Things you disagree with? Things that might be missing? As an example: should there be additional variables between parental income and academic performance?

‘Moderation’

Another common question for researchers is whether relationships between variables are true all the time, or if they vary depending on the context or person.

In concrete terms, we might ask questions like:

  • “does low self-esteem hurt academic performance more for women than for men?”
  • “does social media use cause less anxiety for people who have high emotional intelligence (EQ)?”

There is no universally agreed-upon way to represent effect modification in DAGs (see footnote 1), but for the moment you should draw it like this:

Moderation or effect modification

The arrow from gender points at the relationship between self-esteem and grades. We mean that the effect of self-esteem on grades depends on whether you are a woman or a man.

This pattern is called moderation or effect modification. Checking to see if a relationship is the same for different groups is also called stratification.

Moderation in your causal model

Task 3: Moderation in your model

In groups again, consider the causal diagram you drew in the second task:

  1. If gender is not already included, add a new box for it and paths to any of the other variables you think appropriate.

  2. Discuss: Could gender moderate one of the other relationships in the diagram? If so, draw this in now.

  3. Could moderation be occurring with any of the other variables in your diagram? If so, add paths to represent this.

Confounding and bias

Thinking about correlations, causation and experiments

Note: There are no activities in the rest of this section. If you are in the workshop now then skip to the next section but come back and read this later.

How can we be sure we haven’t forgotten to measure important variables that might be responsible for confounding? How can we avoid confounding? There are three main ways:

  1. Use well-designed and well-run experiments to make confounding impossible.

  2. Design our correlational studies carefully and use multiple sources of evidence, to convince ourselves that confounding is improbable in this case (see notes on why smoking is a good example of this).

  3. Account for all the possible confounders (this is virtually impossible, but sometimes trying to do this is the best we can hope for).

Experiments and confounding

Good experiments make confounding impossible. To see how, consider this diagram based on the example above:

The diagram represents an experiment in which we randomise our participants to two experimental conditions and follow them up over 2 years.

  • Condition 1: Strike 20 extra matches each day for no reason
  • Condition 2: Don’t strike any extra matches

At the end we count how many participants in each group had cancer.

In the diagram we can see there are no arrows pointing at the box marked 'Experimental Randomisation'. This means the randomisation is uncaused. When we say something is random, we mean there is no other variable which is causing it.

Our randomisation influences the level of match-use independently of any other variable. We can use this fact to make inferences about the relationship between matches and cancer, without having to worry about confounding.
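The logic of randomisation can be checked with a simulation. The Python sketch below (standard library only) assumes, purely hypothetically, that smoking causes both heavier match use and higher cancer risk, while matches themselves have no effect; all the probabilities are invented. In the observational version, heavy match users have more cancer because more of them smoke (confounding). In the randomised version, a coin flip decides match use, so the arrow from smoking to match use is cut.

```python
import random

random.seed(3)
N = 20_000  # hypothetical number of participants per study


def cancer_risk(smoker):
    # Invented risks: smoking raises cancer risk; striking matches has NO effect.
    return 0.15 if smoker else 0.05


def cancer_rates(randomised):
    """Simulate one study and return the cancer rate in each match-use group."""
    cases = {"many matches": 0, "few matches": 0}
    people = {"many matches": 0, "few matches": 0}
    for _ in range(N):
        smoker = random.random() < 0.3
        if randomised:
            # Experiment: a coin flip decides match use; nothing else causes it.
            many = random.random() < 0.5
        else:
            # Observational data: smokers strike far more matches (confounding).
            many = random.random() < (0.9 if smoker else 0.1)
        group = "many matches" if many else "few matches"
        people[group] += 1
        if random.random() < cancer_risk(smoker):
            cases[group] += 1
    return {group: round(cases[group] / people[group], 3) for group in cases}


print("observational:", cancer_rates(randomised=False))
print("randomised:   ", cancer_rates(randomised=True))
```

With these made-up numbers, the observational comparison should show a cancer rate of about 0.13 among heavy match users versus about 0.05 among light users, while the randomised comparison should give roughly 0.08 in both groups.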

Interestingly though: a study which randomises participants to two treatments doesn’t actually estimate the effect of the treatment itself. Instead, the trial estimates the effect of offering a treatment, which is similar but not always quite the same thing. In clinical trials this is known as an intention to treat analysis.

This means that our experiment lets us test: The effect of being asked to strike 20 extra matches each day on cancer risk.

Normally we will be prepared to make the extra assumption that any group difference was due to the matches struck — because there were no other (obvious) differences in our treatment of the groups.
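The difference between the effect of offering a treatment and the effect of the treatment itself shows up in how the data are grouped. In the hypothetical Python sketch below (eight invented participants, standard library only), an intention to treat analysis groups people by what they were asked to do, including the participant who ignored the instruction. Grouping by what people actually did gives a different answer, and that comparison is no longer protected by the randomisation, because who follows instructions was not randomised.

```python
# Hypothetical trial records, invented for illustration:
# (assigned group, actually struck extra matches?, developed cancer?)
records = [
    ("asked to strike", True, False),
    ("asked to strike", True, True),
    ("asked to strike", True, False),
    ("asked to strike", False, False),  # ignored the instruction
    ("not asked", False, False),
    ("not asked", False, False),
    ("not asked", False, True),
    ("not asked", False, False),
]


def rate_by(key, rows):
    """Cancer rate in each group, where `key` picks the grouping from a record."""
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row[2])
    return {g: round(sum(outcomes) / len(outcomes), 2) for g, outcomes in groups.items()}


# Intention to treat: group people by what they were ASKED to do.
print(rate_by(lambda row: row[0], records))
# {'asked to strike': 0.25, 'not asked': 0.25}

# Grouping by what people actually DID is a different comparison,
# and confounding can creep back in.
print(rate_by(lambda row: "struck extra" if row[1] else "did not", records))
# {'struck extra': 0.33, 'did not': 0.2}
```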

Using multiple sources of evidence

I mentioned above that, when experiments are impossible, using multiple sources of evidence can help us make reasonable causal inferences.

Where experiments are hard to run, it’s important to use observational data wisely. See Pearl and Mackenzie (2018), chapter 5, for a great example of this: Pearl argues (persuasively) that R.A. Fisher (the godfather of experimental design) was wrong to dismiss the observational/correlational evidence that smoking was harmful. Careful observational research linked smoking with cancer and has saved millions of lives, and nobody needed to run an experiment in which people were randomised to a smoking condition!

2. TARA-led session

Preparation

As a group, use Google Scholar to find a quantitative study which provides evidence to support one of the paths in your diagram. This study should either be:

Tips on literature searching

If you feel out of practice at searching for literature, you can use Chris Berry’s materials on finding literature from Stage 1, or Kerri Daymond’s talk earlier in the module (see Panopto).

Using Google Scholar

Academics use a combination of search tools and strategies in their work. However, most of my colleagues use Google Scholar as a starting point: it’s a powerful tool that lets you quickly find relevant research and narrow down an initial search.

Google Scholar is very sensitive to the quality of the search terms you use. These articles have good tips on effective searching:

This video shows you how to link GS to the Plymouth library, making it easier to find PDF versions of the papers you want to read: https://youtu.be/BnmSSiOM5Sc. Doing this will enable you to click through directly to Primo full text sources.

Considering confounding

In groups, consider the causal diagram you drew in the second task:

  1. Look at each of the variables in your diagram which has an arrow pointing away from it. Could you run an experiment which randomises people to have higher or lower scores on these variables? If not, why not?

  2. Is it possible that you missed any variables when you drew your diagram? Could confounding be taking place? If so, update your diagram to make this explicit.

  3. Take this opportunity to make any other changes you would like to your causal diagram. If you would like to create your own, individual versions of the diagram (e.g. because you disagree with your group!) then that’s fine. If you do diverge from your group, try to summarise any differences in a note to yourself and justify your own decisions.

Journal club

A journal club is a group of individuals who meet regularly to critically evaluate recent articles in the academic literature https://en.wikipedia.org/wiki/Journal_club. It’s a great way of providing regular peer support and discussion (especially in stage 4).

Working on your own, spend no more than 25 minutes extracting the key points from one of the studies you found, so that you can communicate them to your group in the session (see the task below).

Be sure to identify:

  • The design of the study
  • The population sampled
  • The predictor variables or interventions used
  • The outcome variable (and how it was measured)
  • How strong or weak was the relationship between predictors and outcome? That is, what was the ‘effect size’?
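Papers report effect sizes in several forms, such as correlations (r) or standardised mean differences. As one illustration, the Python sketch below (standard library only) computes Cohen’s d: the difference between two group means divided by their pooled standard deviation. The scores are invented, so the resulting value says nothing about any real study.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardised mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical exam scores, invented for illustration.
intervention = [65, 70, 72, 68, 75]
control = [60, 62, 66, 64, 63]
print(round(cohens_d(intervention, control), 2))  # 2.24
```

Here the means differ by 7 points and the pooled standard deviation is about 3.12, so d is about 2.24.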

In groups:

  • Take turns to present the findings from the paper you identified. This should be a 2-3 minute summary of the paper you found. After your presentation, be ready to take questions from the other members of the group.

  • If you are listening to a presentation, be sure to think of at least one follow-up question to ask.

Once everyone has presented and answered questions,

  • Discuss which papers you think provide the best evidence for one of the paths in your diagram, and justify why. There is no single ‘right’ answer here, but you might like to consider factors like the design, sample size, quality of the outcome measures etc.

  • Discuss and place each of the papers in your group into rank order, based on how “good” the evidence they provide is.

3. Homework and assessment progress

After this session you should be able to answer (or at least make a start on) questions 1 and 2 from part 1 of the ‘data’ assignment.

  1. Save a digital version of the causal diagram you drew in the workshop. This image can be a PowerPoint file, a screen-grab or a photo of a hand-drawn sketch. It doesn’t have to look neat: it’s the ideas that matter here.

Feel free to update/amend the diagram from the one you drew in the session. Make the model reflect what you would predict/expect to happen, based on what you know.

To be explicit: It’s fine if your model diverges from the one you developed as a group. You can each submit your own model if you like, or submit the one you developed together.

  2. Write a short summary of the evidence for the three most important relationships described by your model (providing a citation for each).

References

Pearl, Judea, and Dana Mackenzie. 2018. The Book of Why: The New Science of Cause and Effect. Basic Books.

Weinberg, Clarice R. 2007. “Can DAGs Clarify Effect Modification?” Epidemiology (Cambridge, Mass.) 18 (5): 569.


  1. Weinberg (2007) suggests the style/syntax I use here. However, other authors (including Pearl) argue that this is unnecessary. Because DAGs don’t describe the nature of the relationships between variables, they argue, effect modification is always implied, or at least possible, whenever two variables point to a third. My suggestion would be to follow Weinberg (2007) to make your model more explicit to general readers.