
25.1 Link functions

Logistic and Poisson regression extend regular linear regression, constraining predictions to fall within the range of possible outcomes. To achieve this, logistic regression, Poisson regression and other members of the family of ‘generalised linear models’ use different ‘link functions’.

Link functions are used to connect the outcome variable to the linear model (that is, the linear combination of the parameters estimated for each of the predictors in the model). This means we can use linear models which still predict between -∞ and +∞, but without making inappropriate predictions.

For linear regression the link is simply the identity function — that is, the linear model directly predicts the outcome. However for other types of model different functions are used.

A good way to think about link functions is as a transformation of the model’s predictions. For example, in logistic regression predictions from the linear model are transformed in such a way that they are constrained to fall between 0 and 1. Thus, although the underlying linear model allows values between -∞ and +∞, the link function ensures predictions fall between 0 and 1.
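
In R this is why generalised linear models are fitted with glm() and a ‘family’ argument: each family object bundles a link function and its inverse. A minimal sketch (not code from the text) showing this:

fam <- binomial(link = "logit")

# the link function maps a probability onto the log-odds scale
fam$linkfun(0.75)
[1] 1.098612

# the inverse link maps a value on the linear-predictor scale back to a probability
fam$linkinv(1.098612)
[1] 0.75

# for ordinary linear regression the link is just the identity
gaussian()$linkfun(42)
[1] 42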

Logistic regression

When we have binary data, we want to be able to run something like regression, but where we predict the probability of the outcome.

Because probabilities are limited to between 0 and 1, to link the data with the linear model we need to transform them so they can range from -∞ to +∞.

You can think of the solution as coming in two steps:

Step 1

We can transform a probability on the 0 to 1 scale to a \(0 \rightarrow \infty\) scale by converting it to odds, which are expressed as a ratio:

\[odds = \dfrac{p}{1-p}\]

Probabilities and odds are two equivalent ways of expressing the same idea.

So a probability of .5 equates to odds of 1 (i.e. 1 to 1); p = .6 equates to odds of 1.5 (that is, 1.5 to 1, or 3 to 2); and p = .95 equates to odds of 19 (19 to 1).
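
We can check these conversions in R (a quick sketch, not code from the text):

# odds = p / (1 - p)
p <- c(0.5, 0.6, 0.95)
p / (1 - p)
[1]  1.0  1.5 19.0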

Odds convert or map probabilities from 0 to 1 onto the real numbers from 0 to ∞.

Figure 9.1: Probabilities converted to the odds scale. As p approaches 1, the odds go to infinity.

We can reverse the transformation too (which is important later) because:

\[\textrm{probability} = \dfrac{\textrm{odds}}{1+\textrm{odds}}\]
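
And in R (again just a sketch), reversing the conversion above:

# probability = odds / (1 + odds)
odds <- c(1, 1.5, 19)
odds / (1 + odds)
[1] 0.50 0.60 0.95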

  • If a bookie gives odds of 66:1 on a horse, what probability does he think it has of winning?

  • Why do bookies use odds and not probabilities?

  • Should researchers use odds or probabilities when discussing with members of the public?

Step 2

When we convert a probability to odds, the odds will always be > zero.

This is still a problem for our linear model. We’d like our ‘regression’ coefficients to be able to vary between -∞ and ∞.

To avoid this restriction, we can take the logarithm of the odds — sometimes called the logit.

The figure below shows the transformation of probabilities between 0 and 1 to the log-odds scale. The logit has two nice properties:

  1. It converts odds of less than one to negative numbers, because the log of a number between 0 and 1 is always negative.

  2. It flattens the rather square curve for the odds in the figure above.

Figure 5.1: Probabilities converted to the logit (log-odds) scale. Notice how the slope implies that as probabilities approach 0 or 1 the logit gets very large.
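
In R the logit and its inverse are built in as qlogis() and plogis() (the quantile and distribution functions of the logistic distribution). A short sketch, not code from the text:

p <- c(0.1, 0.5, 0.9)

# logit: the log of the odds
qlogis(p)
[1] -2.197225  0.000000  2.197225

# the same as computing log(p / (1 - p)) by hand
log(p / (1 - p))
[1] -2.197225  0.000000  2.197225

# the inverse logit maps log-odds back onto probabilities
plogis(qlogis(p))
[1] 0.1 0.5 0.9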

Reversing the process to interpret the model

As we’ve seen here, the logit or logistic link function transforms probabilities (between 0 and 1) to the range from negative to positive infinity.

This means logistic regression coefficients are in log-odds units, so we must interpret logistic regression coefficients differently from regular regression with continuous outcomes.

  • In linear regression, the coefficient is the change in the outcome for a unit change in the predictor.

  • For logistic regression, the coefficient is the change in the log odds of the outcome being 1, for a unit change in the predictor.

If we want to interpret logistic regression in terms of probabilities, we need to undo the transformation described in steps 1 and 2. To do this:

  1. We take the exponent of the logit to ‘undo’ the log transformation. This gives us the predicted odds.

  2. We convert the odds back to probability.

A hypothetical example

Imagine we have a model to predict whether a person has any children. The outcome is binary, so it equals 1 if the person has any children, and 0 otherwise.

The model has an intercept and one predictor, \(age\) in years. We estimate two parameters: \(\beta_0 = 0.5\) and \(\beta_{1} = 0.02\).

The outcome (\(y\)) of the linear model is the log-odds.

The model prediction is: \(\hat{y} = \beta_0 + \beta_1\textrm{age}\)

So, for someone aged 30:

  • the predicted log-odds = \(0.5 + 0.02 * 30 = 1.1\)
  • the predicted odds = \(exp(1.1) = 3.004\)
  • the predicted probability = \(3.004 / (1 + 3.004) = .75\)

For someone aged 40:

  • the predicted log-odds = \(0.5 + 0.02 * 40 = 1.3\)
  • the predicted odds = \(exp(1.3) = 3.669\)
  • the predicted probability = \(3.669 / (1 + 3.669) = .79\)
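
These calculations are easy to reproduce in R (a sketch using the made-up coefficients above):

b0  <- 0.5
b1  <- 0.02
age <- c(30, 40)

log_odds <- b0 + b1 * age
odds     <- exp(log_odds)

odds / (1 + odds)
[1] 0.7502601 0.7858350

# plogis() is the inverse of the logit, so it gives the same answer in one step
plogis(log_odds)
[1] 0.7502601 0.7858350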

25.1.0.1 Regression coefficients are odds ratios

One final twist: In the section above we said that in logistic regression the coefficients are the change in the log odds of the outcome being 1, for a unit change in the predictor.

Without going into too much detail, one nice fact about logs is that if you take the logs of two numbers and subtract them, the result is equal to dividing the two numbers and then taking the log of the result:

A <- log(1) - log(5)
B <- log(1/5)

# we have to use rounding because of limitations in
# the precision of R's arithmetic, but A and B are equal
round(A, 10) == round(B, 10)
[1] TRUE

This means that a change (difference) in the log-odds is the same as the log of the ratio of the odds.

So, once we undo the log transformation by taking the exponent of the coefficient, we are left with the odds ratio.
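
For example (a sketch using the built-in mtcars data rather than an example from the text), after fitting a logistic regression with glm() we can exponentiate the coefficients to read them as odds ratios:

# predict whether a car has a manual gearbox (am = 1) from fuel economy (mpg)
m <- glm(am ~ mpg, data = mtcars, family = binomial(link = "logit"))

# coefficients are on the log-odds scale
coef(m)

# exponentiating gives the odds ratio for a one-unit increase in mpg
exp(coef(m))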

You can now jump back to running logistic regression.