Crosstabulations and \(\chi^2\)

We saw in a previous section how to create a frequency table of one or more variables. Continuing that example, assume we already have a crosstabulation of age and prefers, stored as lego.table:

lego.table
         prefers
age       duplo lego
  4 years    38   20
  6 years    12   30
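If you do not have the original data to hand, a table like this can be rebuilt from the four cell counts alone. The following is a minimal sketch that assumes only the counts and labels printed above; how lego.table was actually created in the earlier section may differ.

```r
# Rebuild the crosstabulation from the cell counts shown above.
# matrix() fills column by column: the duplo column first, then lego.
lego.table <- as.table(matrix(
  c(38, 12, 20, 30),
  nrow = 2,
  dimnames = list(age = c("4 years", "6 years"),
                  prefers = c("duplo", "lego"))
))

lego.table
```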

We can easily run the inferential \(\chi^2\) test on this table (\(\chi\) is spelled “chi” but pronounced “kai”, so the test is read “kai-squared”):

lego.test <- chisq.test(lego.table)
lego.test

    Pearson's Chi-squared test with Yates' continuity correction

data:  lego.table
X-squared = 11.864, df = 1, p-value = 0.0005724
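Because lego.table is a 2 x 2 table, chisq.test() applies Yates' continuity correction by default, as the first line of the output says. As a rough check on what the function is doing, the statistic can be reproduced by hand from the observed and expected counts. This sketch rebuilds the table from the counts above; the names n, expected and yates are ours, chosen for illustration.

```r
# Rebuild the table from the counts shown earlier (filled column by column)
lego.table <- as.table(matrix(
  c(38, 12, 20, 30), nrow = 2,
  dimnames = list(age = c("4 years", "6 years"),
                  prefers = c("duplo", "lego"))
))

# Expected counts if age and preference were independent
n <- sum(lego.table)
expected <- outer(rowSums(lego.table), colSums(lego.table)) / n

# Yates' correction subtracts 0.5 from each |observed - expected| before squaring
yates <- sum((abs(lego.table - expected) - 0.5)^2 / expected)
yates   # about 11.86, matching the X-squared value reported above

# p-value from the chi-squared distribution with 1 degree of freedom
pchisq(yates, df = 1, lower.tail = FALSE)

# The uncorrected Pearson statistic, for comparison
chisq.test(lego.table, correct = FALSE)$statistic
```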

Note that we can access each number in this output individually because the chisq.test function returns a list. We do this by using the $ syntax:

# access the chi2 value alone
lego.test$statistic
X-squared 
 11.86371 
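The other parts of the result can be pulled out in the same way. As a short sketch (rebuilding lego.table from the counts above so that it runs on its own), names() lists everything that is stored in the result:

```r
# Rebuild the table and rerun the test so this example is self-contained
lego.table <- as.table(matrix(
  c(38, 12, 20, 30), nrow = 2,
  dimnames = list(age = c("4 years", "6 years"),
                  prefers = c("duplo", "lego"))
))
lego.test <- chisq.test(lego.table)

# list every component stored in the result
names(lego.test)

lego.test$p.value     # the p value
lego.test$parameter   # the degrees of freedom
lego.test$expected    # expected counts under independence
```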

Even nicer, you can use an R package to write up your results for you in APA format!

library(apa)
apa(lego.test, print_n = TRUE)
[1] "$\\chi^2$(1, n = 100) = 11.86, *p* < .001"
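If you would rather not depend on an extra package, a similar string can be assembled by hand from the components of the test result. This is only a sketch: format_chisq_apa() is a hypothetical helper written for this example, not a function from the apa package, and it produces plain text rather than the markdown shown above.

```r
# Hypothetical helper (not part of the apa package): format a chi-squared
# test result as a plain-text, APA-style string.
format_chisq_apa <- function(test, n) {
  p <- test$p.value
  # APA style reports very small p values as "< .001" and drops the leading zero
  p_text <- if (p < 0.001) {
    "< .001"
  } else {
    paste("=", sub("^0", "", sprintf("%.3f", p)))
  }
  sprintf("chi2(%d, n = %d) = %.2f, p %s",
          as.integer(test$parameter), as.integer(n), test$statistic, p_text)
}

# Rebuild the table and rerun the test so this example is self-contained
lego.table <- as.table(matrix(
  c(38, 12, 20, 30), nrow = 2,
  dimnames = list(age = c("4 years", "6 years"),
                  prefers = c("duplo", "lego"))
))
lego.test <- chisq.test(lego.table)

apa_text <- format_chisq_apa(lego.test, n = sum(lego.table))
apa_text
```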

See the section on automatically displaying statistics in APA format for more.

Three-way tables

You can also use table() or xtabs() to get 3-way tables of frequencies (xtabs is probably better for this than table).

For example, using the mtcars dataset we create a 3-way table, and then convert the result to a data frame. This means we can print the table nicely in RMarkdown using the pander() function, or process it further (e.g. by sorting or reshaping it).

xtabs(~am+gear+cyl, mtcars) %>%
  as_tibble() %>%
  pander()
am  gear  cyl   n
0   3     4     1
1   3     4     0
0   4     4     2
1   4     4     6
0   5     4     0
1   5     4     2
0   3     6     2
1   3     6     0
0   4     6     2
1   4     6     2
0   5     6     0
1   5     6     1
0   3     8    12
1   3     8     0
0   4     8     0
1   4     8     0
0   5     8     0
1   5     8     2
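The same counts can also be obtained with table(). A brief sketch comparing the two: the formula interface of xtabs() names each variable once and takes the data frame as an argument, whereas table() needs with() (or repeated mtcars$ prefixes) to find the columns. The object names tab_xtabs and tab_table are ours.

```r
# Build the same three-way table two ways
tab_xtabs <- xtabs(~ am + gear + cyl, data = mtcars)
tab_table <- with(mtcars, table(am, gear, cyl))

# Both hold the same counts, with the same dimension names
all(tab_xtabs == tab_table)
names(dimnames(tab_table))

# Individual cells can be indexed by label:
# am = 0, 3 gears, 8 cylinders
tab_xtabs["0", "3", "8"]
```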

Often, you will want to present a table in a wider format than this, to aid comparisons between categories. For example, we might want our table to make it easy to compare counts across the different numbers of cylinders for each combination of am (transmission type: 0 = automatic, 1 = manual) and gear:

xtabs(~am+gear+cyl, mtcars) %>%
  as_tibble() %>%
  reshape2::dcast(am+gear~paste(cyl, "Cylinders")) %>%
  pander()
Using n as value column: use value.var to override.
am  gear  4 Cylinders  6 Cylinders  8 Cylinders
0   3     1            2            12
0   4     2            2            0
0   5     0            0            0
1   3     0            0            0
1   4     6            2            0
1   5     2            1            2
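If you only need to look at the wide layout, rather than process it further as a data frame, base R's ftable() can produce a similar arrangement directly from the 3-way table. This is an alternative sketch, not the approach used above: the result is an ftable object, and the name wide is ours.

```r
# Flatten the three-way table: am and gear define the rows, cyl the columns
wide <- ftable(xtabs(~ am + gear + cyl, data = mtcars),
               row.vars = c("am", "gear"),
               col.vars = "cyl")

wide
```

The rows appear in the same order as in the dcast() output, with gear varying fastest within each level of am.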

Or our primary question might be related to the effect of am (transmission type), in which case we might prefer to include separate columns for automatic (am = 0) and manual (am = 1) cars:

xtabs(~am+gear+cyl, mtcars) %>%
  as_tibble() %>%
  reshape2::dcast(gear+cyl~paste0("am=", am)) %>%
  pander()
Using n as value column: use value.var to override.
gear  cyl  am=0  am=1
3     4    1     0
3     6    2     0
3     8    12    0
4     4    2     6
4     6    2     2
4     8    0     0
5     4    0     2
5     6    0     1
5     8    0     2