Quantifying evidence

Ben Whalley, Paul Sharpe, Sonja Heintz

Overview

In the previous workshop we learned techniques to describe and summarise data. Now we take the next step quantify how sure we are that the patterns are real, using the Bayes Factor.

The Bayes Factor (BF) is a single number which allows us to choose between different possible hypotheses (explanations) of how our data were generated, based on which is most probable.

The important thing to remember about Bayes Factors is that they compare two possible hypotheses about how the data came about. The meaning of the Bayes Factor completely depends on which hypotheses we compare.

Is this pattern a fluke?

  • In all data there is sampling error or ‘noise’
  • Patterns caused by noise will not replicate (i.e., we would not achieve the same result if we repeated the experiment)
  • Bayes Factors quantify the evidence for patterns
  • We compare the probability of two scenarios:
    1. the difference we see is ‘real’, and
    2. any difference is a ‘fluke’
  • By convention, large Bayes Factors mean the pattern is ‘real’
# No code examples in this video

Whenever we collect data, there is some element of chance. If the data we have are limited, it’s possible that any patterns have only occurred by chance. We need to be able to quantify how much information our data provide — that is, how much more or less confident about the patterns we are after we make observations.

Bayes Factors for differences

Previously, we emphasised that summaries and descriptions of data are used to answer research questions.

For example, we used a boxplot to compare the differences in the amount of weight participants lost when treated with either Functional Imagery Training or Motivational Interviewing (the funimagery dataset). This lets us answer the question: “which intervention was more effective in helping participants lose weight?”:

This boxplot shows the median and interquartile range for each treatment group. It’s immediately obvious that — in this sample — there is a difference between the two groups (3.8 kg, in fact), and that — in this sample — FIT helped people lose more weight than MI.

For many purposes this plot is enough.

The difference here is quite large and we can interpret this graph at face value. If the costs of these interventions were similar (i.e. if ‘all other things are equal’) we should choose FIT rather than MI.

However, researchers often want to quantify the evidence provided by the sample of data they collect. This is important in psychology, and other sciences, because the size of the effects we see are often quite small or subtle.

It might be the case that patterns we see are just a ‘fluke’. That is, they have come about by chance. Perhaps if we took another sample from the same population (treated more people with FIT and MI) then we would see no difference?

The Bayes Factor quantifies how confident we should be about any given pattern in our data by comparing two possibilities:

  1. There really is a difference in how much weight people lost with FIT and MI (we call this H1)
  2. There’s NO difference between FIT and MI; any difference we see is likely to be a fluke (H0)

Bayes Factors for correlations

We have also seen how a scatterplot describes the relationship between two columns of data:

From this plot the relationship seems quite clear, and the correlation between these columns is 0.79.

However, we don’t have that many data points, so the pattern might have happened by chance.

The Bayes Factor allows us to quantify how much evidence we have for this correlation. As before, we are comparing two different possibilities:

  1. There is a correlation between engine_size and power
  2. There is NO correlation (the pattern is just a fluke)

Do it: Bayesian t test

  • t-tests compare the average score for two groups
  • A Bayes Factor from a t-test compares two hypotheses:
    • H\(_1\): the groups are different, vs.
    • H\(_0\): the groups are identical
  • Use the ttestBF function in R (load the BayesFactor package first)
  • Bayes Factors range from zero to infinity
  • Large Bayes Factors are evidence for some difference
  • Use a boxplot to check the direction of any difference
# load packages first
library(psydata)
library(BayesFactor)


funimagery %>%
  ggplot(aes(intervention, weight_lost_end_trt)) +
  geom_boxplot() +
  geom_hline(yintercept=0)

# use `formula = ` to tell ttestBF which data to use
# the tilde symbol, ~, means "is predicted by"
ttestBF(formula = weight_lost_end_trt ~ intervention, data=funimagery)
Bayes factor analysis
--------------
[1] Alt., r=0.707 : 515647.5 ±0%

Against denominator:
  Null, mu1-mu2 = 0 
---
Bayes factor type: BFindepSample, JZS

A ‘Bayesian t-test’ is a procedure which calculates the Bayes Factor for a difference between groups.

That is, we get a Bayes Factor which tells us how likely the difference we see is ‘real’, rather than a fluke or due to sampling variation.

Specifically, the Bayes Factor from a Bayesian t-test says how much more likely it is that two groups have a different average score for a continuous variable, versus the possibility that there is no difference.

Some people call these the experimental and null hypotheses, or H1 and H0. H1 is the hypothesis that there is a difference. H0 is the hypothesis that there is zero difference.

The Bayes Factor is the ratio of the probabilities of H1 and H0. If H1 is highly probable and H0 improbable then the Bayes Factor will be large (and vice versa).

As we saw above, we can look at a boxplot and get an intuition for whether we think the difference is real by looking at how much the distributions of the two groups overlap.

  • Show boxplot again using code below

There is not much overlap between the groups, so we intuit the difference is real

If we find a large Bayes Factor, this is evidence for the experimental hypothesis, as compared with the null hypothesis.

Running a t-test

To run a t-test and calculate the Bayes Factor we first need to load the BayesFactor package:

library(psydata)
library(BayesFactor)

And then use the ttestBF function.

ttestBF(formula = weight_lost_end_trt ~ intervention, data=funimagery)
Bayes factor analysis
--------------
[1] Alt., r=0.707 : 515647.5 ±0%

Against denominator:
  Null, mu1-mu2 = 0 
---
Bayes factor type: BFindepSample, JZS
  • Write out the ttestBF code
  • explain this is an old function, so looks slightly different
  • explain the formula
  • outcome variable to the left
  • the tilde symbol in the middle, means “is predicted by”
  • then the group variable on the right
  • we use data=funimagery to say which data
  • show 2 factor levels in funimagery

The important part of the code here is where we write: weight_lost_end_trt ~ intervention. This tells R which column to use as the outcome and which to use as the group predictor. The tilde symbol, ~, just means “is predicted by”.

fitbf <- ttestBF(formula = weight_lost_end_trt ~ intervention, data=funimagery) %>%
  as_tibble() %>% pull(bf) %>% .[1]

The number 5.1564753^{5} is our Bayes Factor. This is very large, so we have strong evidence that the difference in means is real.

We explain how to interpret Bayes Factors in more detail in the section below, but large numbers are evidence for a difference.

Exercise 1

  1. Open session-5.rmd using the Files pane. This is the workbook you will be using in this session.
  2. Use the funimagery dataset
  3. Plot participants’ weights at baseline (kg1) in each intervention group
  4. Use summarise() to calculate each group’s average weight at baseline (you will need the group_by, summarise and mean functions to calculate these averages)
  5. Use ttestBF() with a formula to calculate the Bayes Factor for the difference in these weights. The resulting Bayes Factor is:

Interpreting Bayes Factors

No matter what the Bayes Factor is testing (correlation, difference etc), the size of a Bayes Factor is mostly dependent on three things:

  • How large the effect is (how big is the group difference or how steep is the slope)
  • How clear the effect is (e.g., how much do the scores from each group overlap)
  • How much data we collect (more data lets us be more confident; data quality also matters)

We return to this topic later but common interpretations of Bayes Factors are that:

  • A Bayes Factor of 1 means each hypothesis is equally likely

  • A Bayes factor > 3 is moderate evidence for the hypothesis being tested. BF > 10 is considered strong evidence.

  • A Bayes Factor of \(< \tfrac{1}{3}\) is moderate evidence against the hypothesis.

  • A Bayes Factor is \(> \tfrac{1}{3}\) and \(< 3\) is considered inconclusive

Do it: Bayes Factor for correlations

  • correlationBF estimates the Bayes Factor for a correlation
  • It compares:
    • H\(_1\): There is a relationship between two columns of data
    • H\(_0\): There is NO relationship
  • Large Bayes Factors are evidence for a relationship
# make a copy of the happy data, including only data from 2019
happy.2019 <- happy %>%
  filter(year == 2019)

# calculate the correlation coefficient first
happy.2019 %>% select(happiness, perceptions_of_corruption) %>% correlate()
# A tibble: 2 × 3
  term                      happiness perceptions_of_corruption
  <chr>                         <dbl>                     <dbl>
1 happiness                    NA                        -0.425
2 perceptions_of_corruption    -0.425                    NA    
# use `with()` to tell `correlationBF` which dataset to use
with(happy.2019,  correlationBF(happiness, perceptions_of_corruption))
Bayes factor analysis
--------------
[1] Alt., r=0.333 : 70351.73 ±0%

Against denominator:
  Null, rho = 0 
---
Bayes factor type: BFcorrelation, Jeffreys-beta*

A Bayes Factor for a correlation compares the two hypotheses that either:

  • H1: There is a relationship between the two columns of data
  • H0: There is NO relationship (any pattern is just due to chance)

Another way to think about it is to imagine we have a set of data points like this:

# A tibble: 2 × 3
  term       y      x
  <chr>  <dbl>  <dbl>
1 y     NA      0.693
2 x      0.693 NA    

We ask: if we wanted to draw a straight line through all the data points, which would be the best fit? Which line would fall the closest to all the data points?


The plot below shows some examples of lines we could draw.

Zero correlation is drawn in blue.

Examples of other possible coefficients are shown in red (but remember, it could be anything from -1 to 1).

Warning in geom_abline(data = dd, aes(intercept = 0, slope = s, 1, colour =
cc)): Ignoring unknown aesthetics: x

The Bayes Factor tells us: “Are ANY of the red lines a better fit than the blue line?”

The Bayes Factor is comparing the change that ANY of the red lines (where \(r > 0\) or \(r < 0\)) are a better fit than the blue line, where \(r = 0\).

Another way to say it is, “given the data we have collected, how much more likely is it that the correlation is non-zero, than actually being zero?”

A worked example

We want to check the relationship of two variables in the happy dataset, which is based on the Gallup World Happiness Report 2021:

  • happiness (a national measure of average life satisfaction)
  • perceptions_of_corruption (a national measure of how corrupt they think their society is)

The first thing to do is to plot the data, to check that the relationship between these variables is approximately linear (a straight line). Here I have used filter to select only the data from 2019.

happy %>% 
  filter(year==2019) %>% 
  ggplot(aes(perceptions_of_corruption, happiness)) + 
  geom_point() + 
  geom_smooth(method=lm, se=F)
Warning: Removed 8 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 8 rows containing missing values (`geom_point()`).

The plot suggests a fairly strong negative correlation, as you might expect.

We can check the correlation using select and correlate():

happy  %>% 
  filter(year == 2019) %>% 
  select(happiness, perceptions_of_corruption) %>% 
  correlate()
# A tibble: 2 × 3
  term                      happiness perceptions_of_corruption
  <chr>                         <dbl>                     <dbl>
1 happiness                    NA                        -0.425
2 perceptions_of_corruption    -0.425                    NA    

So - the question now is, how much evidence do we have that the correlation is ‘real’, and that this pattern isn’t just a fluke.

# first, make a copy of the happy data, including only data from 2019
happy.2019 <- happy %>% filter(year == 2019)
with(happy.2019,  correlationBF(happiness, perceptions_of_corruption))
Bayes factor analysis
--------------
[1] Alt., r=0.333 : 70351.73 ±0%

Against denominator:
  Null, rho = 0 
---
Bayes factor type: BFcorrelation, Jeffreys-beta*

Now, the Bayes Factor is 70352.

That means we can be very confident that the pattern we see is not simply due to chance.

Exercise 2

  • Use the attitude dataset which is built into R.

  • Use the correlate function in the corrr library to find the largest correlation between any of the variables in that dataset.

  • Calculate the BF for this correlation. Do we have strong evidence for it?

  • Now, repeat the process for the smallest correlation between the variables.

  • Do we have evidence against this correlation?

We expected students to find the largest and smallest correlations by looking at the output from this function:

attitude %>% corrr::correlate()

A trick to get the highest and lowest value is shown in the code below. If you’re interested in how it works, you can follow along with an extension worksheet here, although this is not required.

attitude %>% corrr::correlate() %>% 
  pivot_longer(-term) %>% 
  arrange(-abs(value)) %>% 
  filter(!is.na(value)) %>% 
  select(var1=term, var2=name, r=value) %>% 
  slice(c(1, n())) %>% 
  pander("The largest and smallest correlations in the attitude dataset")
The largest and smallest correlations in the attitude dataset
var1 var2 r
rating complaints 0.8254
critical learning 0.116

Having found the largest/smallest values, we can use correlationBF to compute the Bayes Factor for each one, as shown above.

We have very strong evidence for the largest correlation, BF > 1000.

with(attitude,  correlationBF(rating, complaints))
Bayes factor analysis
--------------
[1] Alt., r=0.333 : 199888.9 ±0%

Against denominator:
  Null, rho = 0 
---
Bayes factor type: BFcorrelation, Jeffreys-beta*

Because the BF is less than 1, we have anecdotal very weak against the weakest correlation (BF = 0.47). However because the value is between 0.33 and 3 we would consider this evidence inconclusive — we cannot be confident either way.

with(attitude,  correlationBF(learning, critical))
Bayes factor analysis
--------------
[1] Alt., r=0.333 : 0.4685909 ±0%

Against denominator:
  Null, rho = 0 
---
Bayes factor type: BFcorrelation, Jeffreys-beta*

Check your knowledge

  • What hypotheses are being compared when we calculate a Bayes Factor for a correlation?
  • Which function do we recommend to calculate correlation coefficients?
  • Which function calculates the Bayes Factor for a correlation?
  • Which function calculates the Bayes Factor for a difference in the average of two groups?
  • What direction slope does a correlation of -0.5 imply? What would the scatterplot look like?
  • What conclusion should we draw if the Bayes Factor for a correlation is 2.1? How about if it was 0.21, or 210?
  • What conclusion should we draw if the Bayes Factor for a t test is 0.8? How about if it were > 10?
  • What is the difference between ttestBF and correlationBF?