Crosstabulations and \(\chi^2\)
We saw in a previous section
how to create a frequency table of one or more variables.
Using that previous example, assume we already have a crosstabulation of age
and prefers:
lego.table
         prefers
age       duplo lego
  4 years    38   20
  6 years    12   30
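If the raw data are not to hand, a table with the same counts can be rebuilt directly from the printed values. The following is a minimal sketch: the dimension names are copied from the output above, and building the table from a matrix is an assumption made for convenience, not necessarily how the earlier section created it.

```r
# Rebuild the 2 x 2 crosstabulation from the printed counts.
# matrix() fills column by column: first the duplo column, then the lego column.
lego.table <- as.table(matrix(
  c(38, 12, 20, 30),
  nrow = 2,
  dimnames = list(
    age = c("4 years", "6 years"),
    prefers = c("duplo", "lego")
  )
))

lego.table
```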
We can easily run the inferential \(\chi^2\) test on this table (\(\chi\) is the Greek letter spelled “chi” and pronounced “kai”, so the test is read “kai-squared”):
lego.test <- chisq.test(lego.table)
lego.test
Pearson's Chi-squared test with Yates' continuity correction
data: lego.table
X-squared = 11.864, df = 1, p-value = 0.0005724
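The header of this output mentions Yates' continuity correction because chisq.test() applies it by default to 2 x 2 tables. As a rough sketch, the uncorrected test can be requested with correct = FALSE, and the expected counts under independence can be read from the result. The table is rebuilt here from the counts shown earlier so that the block runs on its own; the names corrected and uncorrected are illustrative only.

```r
# Rebuild the table from the printed counts (column by column)
lego.table <- as.table(matrix(
  c(38, 12, 20, 30),
  nrow = 2,
  dimnames = list(age = c("4 years", "6 years"),
                  prefers = c("duplo", "lego"))
))

# Default behaviour: continuity correction applied to the 2 x 2 table
corrected <- chisq.test(lego.table)

# Same test without the correction
uncorrected <- chisq.test(lego.table, correct = FALSE)

# The uncorrected statistic is larger, because the correction shrinks
# each |observed - expected| difference by 0.5 before squaring
c(corrected = unname(corrected$statistic),
  uncorrected = unname(uncorrected$statistic))

# Expected counts if age and preference were independent
corrected$expected
```

The rest of this section uses the default, corrected test stored in lego.test.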
Note that we can access each number in this output individually because the
chisq.test function returns a list. We do this by using the $ syntax:
# access the chi2 value alone
lego.test$statistic
X-squared
11.86371
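The other components can be pulled out in the same way. The sketch below rebuilds the table from the printed counts (an assumption made so that it runs on its own) and shows a few elements that are commonly needed when reporting the test.

```r
# Rebuild the table and rerun the test so the example is self-contained
lego.table <- as.table(matrix(
  c(38, 12, 20, 30),
  nrow = 2,
  dimnames = list(age = c("4 years", "6 years"),
                  prefers = c("duplo", "lego"))
))
lego.test <- chisq.test(lego.table)

# List every component stored in the result
names(lego.test)

# p value as a plain number
lego.test$p.value

# Degrees of freedom
lego.test$parameter

# The statistic without its "X-squared" label
unname(lego.test$statistic)
```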
Even nicer, the apa package can write up your results for you in APA format:
library(apa)
apa(lego.test, print_n = TRUE)
[1] "$\\chi^2$(1, n = 100) = 11.86, *p* < .001"
See more on automatically displaying statistics in APA format
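If the apa package is not available, a similar string can be assembled by hand from the components of the test object. This is only a sketch using base R: the formatting rules below approximate APA style and are not the package's own logic, and the names p_text and apa_string are made up for illustration.

```r
# Rebuild the table and rerun the test so the example is self-contained
lego.table <- as.table(matrix(
  c(38, 12, 20, 30),
  nrow = 2,
  dimnames = list(age = c("4 years", "6 years"),
                  prefers = c("duplo", "lego"))
))
lego.test <- chisq.test(lego.table)

# Report very small p values as "< .001"; otherwise use three decimals
# and drop the leading zero
p <- lego.test$p.value
p_text <- if (p < .001) {
  "< .001"
} else {
  paste("=", sub("^0", "", sprintf("%.3f", p)))
}

# Combine degrees of freedom, sample size, statistic and p value
apa_string <- sprintf(
  "chi^2(%d, n = %d) = %.2f, p %s",
  as.integer(lego.test$parameter),
  as.integer(sum(lego.test$observed)),
  unname(lego.test$statistic),
  p_text
)

apa_string
```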
Three-way tables
You can also use table() or xtabs() to get 3-way tables of frequencies
(xtabs is probably better for this than table).
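As a brief sketch of the difference between the two, using the built-in mtcars data: xtabs() takes a formula and a data argument, whereas table() needs the columns themselves. The object names by_formula and by_table are illustrative only.

```r
# Formula interface: variables are looked up in the data argument
by_formula <- xtabs(~ am + gear + cyl, data = mtcars)

# table() needs the columns themselves, here supplied via with()
by_table <- with(mtcars, table(am, gear, cyl))

# Both hold the same 2 x 3 x 3 array of counts
dim(by_formula)
all(by_formula == by_table)

# ftable() prints a three-way table more compactly than the default
ftable(by_formula)
```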
For example, using the mtcars dataset we create a 3-way table, and then
convert the result to a data frame. This means we can print the table nicely in
RMarkdown using the pander() function, or process it further (e.g. by
sorting or reshaping it).
xtabs(~am+gear+cyl, mtcars) %>%
  as_tibble() %>%
  pander()
| am | gear | cyl | n |
|---|---|---|---|
| 0 | 3 | 4 | 1 |
| 1 | 3 | 4 | 0 |
| 0 | 4 | 4 | 2 |
| 1 | 4 | 4 | 6 |
| 0 | 5 | 4 | 0 |
| 1 | 5 | 4 | 2 |
| 0 | 3 | 6 | 2 |
| 1 | 3 | 6 | 0 |
| 0 | 4 | 6 | 2 |
| 1 | 4 | 6 | 2 |
| 0 | 5 | 6 | 0 |
| 1 | 5 | 6 | 1 |
| 0 | 3 | 8 | 12 |
| 1 | 3 | 8 | 0 |
| 0 | 4 | 8 | 0 |
| 1 | 4 | 8 | 0 |
| 0 | 5 | 8 | 0 |
| 1 | 5 | 8 | 2 |
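Once the counts are in long format they can be processed like any other data frame. The sketch below uses only base R to drop the empty cells and sort by frequency. It is one possible approach rather than the only one, and the names counts and nonzero are illustrative.

```r
# Convert the 3-way table to a long data frame with a count column called n
counts <- as.data.frame(
  xtabs(~ am + gear + cyl, data = mtcars),
  responseName = "n"
)

# Keep only the combinations that actually occur in the data
nonzero <- counts[counts$n > 0, ]

# Sort so that the most frequent combinations come first
nonzero <- nonzero[order(-nonzero$n), ]

head(nonzero, 3)
```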
Often, you will want to present a table in a wider format than this, to aid comparisons between categories. For example, we might want one column per number of cylinders, so that counts can be compared across cylinder counts within each combination of am and gear (in mtcars, am records the transmission type: 0 = automatic, 1 = manual):
xtabs(~am+gear+cyl, mtcars) %>%
  as_tibble() %>%
  reshape2::dcast(am+gear~paste(cyl, "Cylinders"), value.var = "n") %>%
  pander()
| am | gear | 4 Cylinders | 6 Cylinders | 8 Cylinders |
|---|---|---|---|---|
| 0 | 3 | 1 | 2 | 12 |
| 0 | 4 | 2 | 2 | 0 |
| 0 | 5 | 0 | 0 | 0 |
| 1 | 3 | 0 | 0 | 0 |
| 1 | 4 | 6 | 2 | 0 |
| 1 | 5 | 2 | 1 | 2 |
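A similar wide layout can be produced without any reshaping package. As a sketch, ftable() from base R lets us choose which variables define the rows and which define the columns. The names tab and wide are illustrative.

```r
# Three-way table of counts
tab <- xtabs(~ am + gear + cyl, data = mtcars)

# One row per am/gear combination, one column per number of cylinders
wide <- ftable(tab, row.vars = c("am", "gear"), col.vars = "cyl")

wide
```

The result is a flat contingency table rather than a data frame, so it suits quick inspection more than further processing.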
Or our primary question might be related to the effect of am (transmission type), in which case we
might prefer to include separate columns for automatic (am = 0) and manual (am = 1) cars:
xtabs(~am+gear+cyl, mtcars) %>%
  as_tibble() %>%
  reshape2::dcast(gear+cyl~paste0("am=", am), value.var = "n") %>%
  pander()
| gear | cyl | am=0 | am=1 |
|---|---|---|---|
| 3 | 4 | 1 | 0 |
| 3 | 6 | 2 | 0 |
| 3 | 8 | 12 | 0 |
| 4 | 4 | 2 | 6 |
| 4 | 6 | 2 | 2 |
| 4 | 8 | 0 | 0 |
| 5 | 4 | 0 | 2 |
| 5 | 6 | 0 | 1 |
| 5 | 8 | 0 | 2 |