Crosstabulations and $\chi^2$

We saw in a previous section how to create a frequency table of one or more variables. Continuing that example, assume we already have a crosstabulation of age and prefers:

lego.table
         prefers
age       duplo lego
  4 years    38   20
  6 years    12   30

We can easily run an inferential $\chi^2$ test (spelled “chi-squared” but pronounced “kai-squared”) on this table:

lego.test <- chisq.test(lego.table)
lego.test

    Pearson's Chi-squared test with Yates' continuity correction

data:  lego.table
X-squared = 11.864, df = 1, p-value = 0.0005724
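The output notes that Yates' continuity correction was applied. chisq.test() does this by default for 2 × 2 tables. As a rough sketch of what the test computes, the statistic can be reproduced by hand from the observed and expected counts. Because lego.table was built in an earlier section, this example rebuilds it from the counts printed above; the other variable names are our own.

```r
# Rebuild the 2 x 2 table from the counts printed above
# (assumed to match the lego.table created in the earlier section)
lego.table <- as.table(matrix(
  c(38, 12, 20, 30), nrow = 2,
  dimnames = list(age = c("4 years", "6 years"),
                  prefers = c("duplo", "lego"))
))

observed <- lego.table
# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Yates' correction subtracts 0.5 from each |observed - expected|
# (chisq.test() uses the smaller of 0.5 and the smallest such difference)
yates <- min(0.5, abs(observed - expected))
x2_corrected <- sum((abs(observed - expected) - yates)^2 / expected)

# Pearson's statistic without the correction
x2_uncorrected <- sum((observed - expected)^2 / expected)

# Compare with chisq.test(); correct = FALSE switches the correction off
x2_corrected
chisq.test(lego.table)$statistic
x2_uncorrected
chisq.test(lego.table, correct = FALSE)$statistic
```

The uncorrected statistic is larger, because the correction shrinks every observed-minus-expected difference before squaring.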

Because the chisq.test() function returns a list (an object of class htest), we can access each number in its output individually using the $ syntax:

# access the chi2 value alone
lego.test$statistic
X-squared 
 11.86371
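Besides statistic, the returned list holds the other parts of the test, and names() lists them all. The components used below (parameter, p.value, expected, residuals) are standard parts of a chisq.test() result. As before, the table is rebuilt from the printed counts so that the example runs on its own.

```r
# Rebuild the table from the counts printed above
lego.table <- as.table(matrix(
  c(38, 12, 20, 30), nrow = 2,
  dimnames = list(age = c("4 years", "6 years"),
                  prefers = c("duplo", "lego"))
))
lego.test <- chisq.test(lego.table)

# List every component stored in the result
names(lego.test)

lego.test$parameter  # degrees of freedom
lego.test$p.value    # p-value
lego.test$expected   # expected counts under independence
lego.test$residuals  # Pearson residuals: (observed - expected) / sqrt(expected)
```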

Even better, the apa package can write up the result in APA format for you:

library(apa)
apa(lego.test, print_n = TRUE)
[1] "$\\chi^2$(1, n = 100) = 11.86, *p* < .001"
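If the apa package is not available, a comparable string can be assembled with base R's sprintf(). The helper below, format_chisq_apa(), is a hypothetical function of our own rather than part of any package. It is a sketch that mimics the output shown above: very small p-values are reported as p < .001, and otherwise the leading zero is dropped, as APA style prefers.

```r
# Rebuild the table and test from the counts printed above
lego.table <- as.table(matrix(
  c(38, 12, 20, 30), nrow = 2,
  dimnames = list(age = c("4 years", "6 years"),
                  prefers = c("duplo", "lego"))
))
lego.test <- chisq.test(lego.table)

# Hypothetical helper (not from any package): formats a chisq.test() result
# in the same style as the apa() output shown above
format_chisq_apa <- function(test, n) {
  df <- as.integer(test$parameter)    # as.integer() also drops the "df" name
  stat <- as.numeric(test$statistic)  # likewise drops the "X-squared" name
  p <- test$p.value
  p_text <- if (p < .001) {
    "< .001"
  } else {
    # APA style omits the leading zero of a p-value
    paste("=", sub("^0", "", sprintf("%.3f", p)))
  }
  sprintf("$\\chi^2$(%d, n = %d) = %.2f, *p* %s",
          df, as.integer(n), stat, p_text)
}

format_chisq_apa(lego.test, n = sum(lego.table))
```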

See the section on automatically displaying statistics in APA format for more details.

Three-way tables

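You can also use table() or xtabs() to get 3-way tables of frequencies. xtabs() is usually the more convenient of the two, because it accepts a formula and a data argument, so you can name the variables directly rather than passing each column as a separate vector.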
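To illustrate the difference, this sketch builds the same three-way table both ways from the built-in mtcars data. With table(), the with() call saves typing mtcars$ in front of every column name; the variable names tab_xtabs and tab_table are our own.

```r
# xtabs() takes a formula and a data argument
tab_xtabs <- xtabs(~ am + gear + cyl, data = mtcars)

# table() needs the vectors themselves; with() lets us refer to columns by name
tab_table <- with(mtcars, table(am, gear, cyl))

# Both give a 2 x 3 x 3 array of counts with the same dimension names
dim(tab_xtabs)
all(tab_xtabs == tab_table)
```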

For example, using the built-in mtcars dataset we can create a 3-way table and then convert the result to a data frame. This means we can print the table nicely in R Markdown using the pander() function, or process it further (e.g. by sorting or reshaping it).

library(tidyverse)  # provides %>% and as_tibble()
library(pander)
xtabs(~ am + gear + cyl, data = mtcars) %>%
  as_tibble() %>%  # long format, with the counts in a column called n
  pander()

am	gear	cyl	n
0	3	4	1
1	3	4	0
0	4	4	2
1	4	4	6
0	5	4	0
1	5	4	2
0	3	6	2
1	3	6	0
0	4	6	2
1	4	6	2
0	5	6	0
1	5	6	1
0	3	8	12
1	3	8	0
0	4	8	0
1	4	8	0
0	5	8	0
1	5	8	2
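As a small base-R sketch of such further processing (swapping in base functions for the tidyverse pipeline above), as.data.frame() turns the table into a data frame whose count column is called Freq rather than n. The rows can then be filtered and sorted like any other data frame; the object names here are our own.

```r
counts <- as.data.frame(xtabs(~ am + gear + cyl, data = mtcars))

# Drop combinations that never occur
observed_only <- counts[counts$Freq > 0, ]

# Sort so the most common combination comes first
observed_only <- observed_only[order(observed_only$Freq, decreasing = TRUE), ]
head(observed_only, 3)
```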

Often you will want to present a table in a wider format than this, to make comparisons between categories easier. For example, we might want to compare the number of 4-, 6- and 8-cylinder cars for each combination of transmission type (am: 0 = automatic, 1 = manual) and number of gears:

xtabs(~ am + gear + cyl, data = mtcars) %>%
  as_tibble() %>%
  reshape2::dcast(am + gear ~ paste(cyl, "Cylinders"), value.var = "n") %>%
  pander()

am	gear	4 Cylinders	6 Cylinders	8 Cylinders
0	3	1	2	12
0	4	2	2	0
0	5	0	0	0
1	3	0	0	0
1	4	6	2	0
1	5	2	1	2
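In base R, ftable() can produce a similar flat layout directly from the three-way table, with no reshaping package needed. This is a different tool from dcast(), offered as a sketch: row.vars picks the variables that form the rows, and the remaining variable (cyl) is spread across the columns.

```r
tab <- xtabs(~ am + gear + cyl, data = mtcars)

# am and gear define the rows; cyl becomes the columns
wide_by_cyl <- ftable(tab, row.vars = c("am", "gear"))
wide_by_cyl
```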

Or our primary question might concern the effect of am, in which case we might prefer separate columns for automatic (am = 0) and manual (am = 1) cars:

xtabs(~ am + gear + cyl, data = mtcars) %>%
  as_tibble() %>%
  reshape2::dcast(gear + cyl ~ paste0("am=", am), value.var = "n") %>%
  pander()

gear	cyl	am=0	am=1
3	4	1	0
3	6	2	0
3	8	12	0
4	4	2	6
4	6	2	2
4	8	0	0
5	4	0	2
5	6	0	1
5	8	0	2
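The same wide layout can be sketched in base R with reshape(), again swapping in a different function for reshape2::dcast(). reshape() names the new columns by joining Freq and each value of am with a dot, so Freq.0 holds the counts for automatic cars and Freq.1 those for manual cars.

```r
counts <- as.data.frame(xtabs(~ am + gear + cyl, data = mtcars))

# One row per gear/cyl combination, one count column per value of am
wide_by_am <- reshape(counts, idvar = c("gear", "cyl"),
                      timevar = "am", direction = "wide")

# Order rows by gear, then by cyl, to match the table above
wide_by_am <- wide_by_am[order(wide_by_am$gear, wide_by_am$cyl), ]
wide_by_am
```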

Just Enough R