“Table 1”
Table 1 in reports of clinical trials, and in many psychological studies, describes the characteristics of the sample. Typically, you will want to present information collected at baseline, split by experimental group, including:
- Means, standard deviations or other descriptive statistics for continuous variables
- Frequencies of particular responses for categorical variables
- Some kind of inferential test of the null hypothesis of no difference between the groups; this could be a t-test, an F-test where there are more than 2 groups, or a chi-squared test for categorical variables.
Producing this table is a pain because it requires collating multiple statistics calculated by different functions. Many researchers resort to running each analysis separately and then copying and pasting the results into Word.
It can be automated though! This example combines and extends many of the techniques we have learned using the split-apply-combine method.
To begin, we simulate some data from a fairly standard 2-arm clinical trial or psychological experiment, stored in a data frame called boring.study.
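The output shown in the rest of this section comes from a dataset called boring.study. As a rough stand-in, the following sketch builds a data frame with the same eight columns and 280 rows using base R only. The seed, the distributions and their parameters are all assumptions made for illustration, so the numbers it produces will not match the output printed in this section.

```r
# Hypothetical stand-in for the simulated trial data (all parameters assumed)
set.seed(1234)  # arbitrary seed, only so the sketch is repeatable
n <- 280        # total sample size, 140 per arm

boring.study <- data.frame(
  person    = 1:n,
  time      = 1L,  # baseline only
  condition = factor(rep(c("Control", "Intervention"), each = n / 2)),
  yob       = round(rnorm(n, mean = 1980, sd = 5)),   # year of birth
  WM        = round(rnorm(n, mean = 100, sd = 10)),   # working memory score
  # including NA in the sampled values creates some missing responses
  education = sample(c("Primary", "Secondary", "Graduate", "Postgraduate", NA),
                     n, replace = TRUE),
  ethnicity = sample(c("White British",
                       "Mixed / multiple ethnic groups",
                       "Asian / Asian British",
                       "Black / African / Caribbean / Black British"),
                     n, replace = TRUE),
  Attitude  = as.numeric(rpois(n, lambda = 7)),
  stringsAsFactors = FALSE  # keep education and ethnicity as character
)
```

Because the two arms are drawn from the same distributions, any baseline differences in this stand-in arise by chance alone, which is what we would expect after randomisation.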
Check our data:
library(dplyr)  # provides %>% and glimpse()
boring.study %>% glimpse()
Observations: 280
Variables: 8
$ person <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…
$ time <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ condition <fct> Control, Control, Control, Control, Control, Control, …
$ yob <dbl> 1982, 1979, 1981, 1978, 1969, 1975, 1979, 1974, 1977, …
$ WM <dbl> 102, 96, 100, 102, 101, 85, 94, 113, 107, 79, 114, 118…
$ education <chr> "Secondary", "Graduate", "Graduate", "Graduate", "Prim…
$ ethnicity <chr> "White British", "White British", "Black / African / C…
$ Attitude <dbl> 6, 11, 6, 4, 1, 8, 7, 8, 7, 7, 11, 5, 6, 7, 7, 8, 10, …
Start by making a long-form table for the categorical variables:
library(reshape2)  # melt() and dcast()

boring.study.categorical.melted <- boring.study %>%
  select(condition, education, ethnicity) %>%
  # one row per person per categorical variable
  melt(id.vars = "condition")
Then calculate the N’s for each response/variable in each group:
(table1.categorical.Ns <-
  boring.study.categorical.melted %>%
  group_by(condition, variable, value) %>%
  summarise(N = n()) %>%
  dcast(variable + value ~ condition, value.var = "N"))
   variable                                       value Control Intervention
1 education                                    Graduate      24           34
2 education                                Postgraduate      26           28
3 education                                     Primary      28           31
4 education                                   Secondary      31           23
5 education                                        <NA>      31           24
6 ethnicity                       Asian / Asian British      32           28
7 ethnicity Black / African / Caribbean / Black British      36           36
8 ethnicity              Mixed / multiple ethnic groups      40           41
9 ethnicity                               White British      32           35
Then make a second table containing χ2 test statistics for each variable:
library(broom)  # tidy() converts a test result into a one-row data frame

(table1.categorical.tests <-
  boring.study.categorical.melted %>%
  group_by(variable) %>%
  do(., chisq.test(.$value, .$condition) %>% tidy) %>%
  # this is purely to facilitate matching rows up below
  mutate(firstrowforvar = TRUE))
# A tibble: 2 x 6
# Groups: variable [2]
variable statistic p.value parameter method firstrowforvar
<fct> <dbl> <dbl> <int> <chr> <lgl>
1 education 2.92 0.404 3 Pearson's Chi-squar… TRUE
2 ethnicity 0.413 0.937 3 Pearson's Chi-squar… TRUE
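One detail worth checking: when chisq.test() is given two vectors it removes cases with missing values, so the <NA> education responses counted in the table of Ns do not enter the test (hence 3 degrees of freedom rather than 4). As a sketch, the education statistic can be reproduced from the non-missing counts alone. The matrix below is typed in by hand from the table of Ns above, and the object names are made up for this illustration.

```r
# Counts copied from the education rows of the table of Ns,
# leaving out the <NA> (missing response) row
education.counts <- matrix(
  c(24, 26, 28, 31,    # Control
    34, 28, 31, 23),   # Intervention
  ncol = 2,            # matrix() fills column by column
  dimnames = list(
    education = c("Graduate", "Postgraduate", "Primary", "Secondary"),
    condition = c("Control", "Intervention")
  )
)

education.test <- chisq.test(education.counts)
education.test$statistic  # approximately 2.92, as in the tidy output
education.test$parameter  # 3 degrees of freedom: (4 - 1) * (2 - 1)
```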
Combine these together:
(table1.categorical.both <- table1.categorical.Ns %>%
  group_by(variable) %>%
  # we join on firstrowforvar to make sure we don't duplicate the tests
  mutate(firstrowforvar = row_number() == 1) %>%
  left_join(., table1.categorical.tests, by = c("variable", "firstrowforvar")) %>%
  # this is gross, but we don't want to repeat the variable names in our table
  ungroup() %>%
  mutate(variable = ifelse(firstrowforvar, as.character(variable), NA_character_)) %>%
  select(variable, value, Control, Intervention, statistic, parameter, p.value))
# A tibble: 9 x 7
variable value Control Intervention statistic parameter p.value
<chr> <chr> <int> <int> <dbl> <int> <dbl>
1 education Graduate 24 34 2.92 3 0.404
2 <NA> Postgraduate 26 28 NA NA NA
3 <NA> Primary 28 31 NA NA NA
4 <NA> Secondary 31 23 NA NA NA
5 <NA> <NA> 31 24 NA NA NA
6 ethnicity Asian / Asian… 32 28 0.413 3 0.937
7 <NA> Black / Afric… 36 36 NA NA NA
8 <NA> Mixed / multi… 40 41 NA NA NA
9 <NA> White British 32 35 NA NA NA
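The firstrowforvar trick is easier to see on a toy example. The sketch below uses two small made-up tables (toy.counts and toy.tests are hypothetical names and values, not part of the analysis) to show that joining on both the variable name and the flag attaches each test result to the first row of its variable only.

```r
library(dplyr)

# Made-up counts: two variables with two responses each
toy.counts <- tibble(
  variable = c("a", "a", "b", "b"),
  value    = c("x", "y", "x", "y"),
  N        = c(10, 12, 7, 9)
)

# Made-up test results: one row per variable, flagged for the first row
toy.tests <- tibble(
  variable       = c("a", "b"),
  statistic      = c(1.5, 0.2),
  firstrowforvar = TRUE
)

toy.joined <- toy.counts %>%
  group_by(variable) %>%
  mutate(firstrowforvar = row_number() == 1) %>%
  # rows where firstrowforvar is FALSE find no match and get NA
  left_join(toy.tests, by = c("variable", "firstrowforvar")) %>%
  ungroup()

toy.joined$statistic  # 1.5 NA 0.2 NA
```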
Now we deal with the continuous variables. First we make a ‘long’ version of the continuous data:
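continuous_variables <- c("yob", "WM")
boring.continuous.melted <-
  boring.study %>%
  # all_of() makes it explicit that we select using a character vector
  # (needs dplyr >= 1.0.0; on older versions use one_of() instead)
  select(condition, all_of(continuous_variables)) %>%
  # naming the id variable avoids the "Using condition as id variables" message
  melt(id.vars = "condition") %>%
  group_by(variable)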
boring.continuous.melted %>% head
# A tibble: 6 x 3
# Groups: variable [1]
condition variable value
<fct> <fct> <dbl>
1 Control yob 1982
2 Control yob 1979
3 Control yob 1981
4 Control yob 1978
5 Control yob 1969
6 Control yob 1975
Then calculate separate tables of t-tests and means/SD’s:
(table.continuous_variables.tests <-
  boring.continuous.melted %>%
  # note that we pass the result of t-test to tidy, which returns a dataframe
  do(., t.test(.$value ~ .$condition) %>% tidy) %>%
  select(variable, statistic, parameter, p.value))
# A tibble: 2 x 4
# Groups: variable [2]
variable statistic parameter p.value
<fct> <dbl> <dbl> <dbl>
1 yob -1.07 269. 0.285
2 WM -0.455 276. 0.649
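The fractional degrees of freedom arise because t.test() runs Welch's test by default (var.equal = FALSE), which does not assume equal variances in the two groups. A minimal sketch with made-up data (the wm data frame, the seed and the distribution parameters are all hypothetical) contrasts this with the pooled-variance test:

```r
set.seed(42)  # arbitrary seed; the values are invented for illustration
wm <- data.frame(
  condition = rep(c("Control", "Intervention"), each = 140),
  value     = c(rnorm(140, mean = 100, sd = 10),
                rnorm(140, mean = 100, sd = 11))
)

# Default: Welch's test, df from the Welch-Satterthwaite approximation
welch <- t.test(value ~ condition, data = wm)

# Classic Student's t-test with a pooled variance estimate
student <- t.test(value ~ condition, data = wm, var.equal = TRUE)

welch$parameter    # fractional df, no larger than 278
student$parameter  # df = 140 + 140 - 2 = 278
```

Whichever variant you use, it is worth naming it in the table caption so that readers can interpret the df column.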
(table.continuous_variables.descriptives <-
  boring.continuous.melted %>%
  group_by(variable, condition) %>%
  # this is not needed here because we have no missing values, but if there
  # were missing values in this dataset then mean() and sd() would return NA
  # below, so it is best to remove rows without a response:
  filter(!is.na(value)) %>%
  # note, we might also want the median/IQR
  summarise(Mean = mean(value), SD = sd(value)) %>%
  # summarise() drops the last grouping level, so we regroup to make
  # transmute() keep both variable and condition
  group_by(variable, condition) %>%
  # we format the mean and SD into a single column using sprintf.
  # we don't have to do this, but it makes reshaping simpler and we probably want
  # to round the numbers at some point, and so may as well do this now.
  transmute(MSD = sprintf("%.2f (%.2f)", Mean, SD)) %>%
  # name the value column explicitly, so dcast() does not have to guess it
  dcast(variable ~ condition, value.var = "MSD"))
variable Control Intervention
1 yob 1979.31 (5.58) 1979.97 (4.63)
2 WM 99.37 (10.00) 99.94 (10.99)
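To see why missing values matter here, and what the sprintf() format string does, consider a small made-up vector (x is hypothetical, not taken from the dataset):

```r
x <- c(102, 96, NA, 100)  # invented scores with one missing response

mean(x)                # NA: the missing value propagates
mean(x, na.rm = TRUE)  # 99.33333

# "%.2f" rounds each number to two decimal places
msd <- sprintf("%.2f (%.2f)", mean(x, na.rm = TRUE), sd(x, na.rm = TRUE))
msd  # "99.33 (3.06)"

# The same pattern works for the median and interquartile range
miqr <- sprintf("%.1f (%.1f)", median(x, na.rm = TRUE), IQR(x, na.rm = TRUE))
miqr  # "100.0 (3.0)"
```

Passing na.rm = TRUE is an alternative to filtering out the missing rows first; either way the summary is computed on the observed responses only.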
And combine them:
(table.continuous_variables.both <-
  left_join(table.continuous_variables.descriptives,
            table.continuous_variables.tests,
            # naming the key avoids the "Joining, by = ..." message
            by = "variable"))
variable Control Intervention statistic parameter p.value
1 yob 1979.31 (5.58) 1979.97 (4.63) -1.0714551 268.8780 0.2849256
2 WM 99.37 (10.00) 99.94 (10.99) -0.4549637 275.5261 0.6494937
Finally put the whole thing together:
(table1 <- table1.categorical.both %>%
  # make these variables into character format to be consistent with
  # the Mean (SD) column for continuous variables
  # (plain mutate() replaces the deprecated mutate_each(funs(...)))
  mutate(Control = as.character(Control),
         Intervention = as.character(Intervention)) %>%
  # note the '.' as the first argument, which is the input from the pipe
  bind_rows(.,
            # variable is a factor here but character in the categorical table,
            # so convert it first to avoid a coercion warning
            table.continuous_variables.both %>%
              mutate(variable = as.character(variable))) %>%
  # prettify a few things
  rename(df = parameter,
         p = p.value,
         `Control N/Mean (SD)` = Control,
         Variable = variable,
         Response = value,
         `t/χ2` = statistic))
# A tibble: 11 x 7
Variable Response `Control N/Mean… Intervention `t/χ2` df p
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 education Graduate 24 34 2.92 3 0.404
2 <NA> Postgraduate 26 28 NA NA NA
3 <NA> Primary 28 31 NA NA NA
4 <NA> Secondary 31 23 NA NA NA
5 <NA> <NA> 31 24 NA NA NA
6 ethnicity Asian / Asi… 32 28 0.413 3 0.937
7 <NA> Black / Afr… 36 36 NA NA NA
8 <NA> Mixed / mul… 40 41 NA NA NA
9 <NA> White Briti… 32 35 NA NA NA
10 yob <NA> 1979.31 (5.58) 1979.97 (4.… -1.07 269. 0.285
11 WM <NA> 99.37 (10.00) 99.94 (10.9… -0.455 276. 0.649
Finally, we can print the table in markdown format for output. This is best done in a separate chunk, so that warnings and messages do not appear in the final document.
library(pander)

table1 %>%
  # split.tables argument needed to avoid the table wrapping
  pander(split.tables = Inf,
         missing = "-",
         justify = c("left", "left", rep("center", 5)),
         caption = 'Table presenting baseline differences between conditions. Categorical variables tested with Pearson χ2, continuous variables with Welch two-sample t-test.')
Variable | Response | Control N/Mean (SD) | Intervention | t/χ2 | df | p |
---|---|---|---|---|---|---|
education | Graduate | 24 | 34 | 2.921 | 3 | 0.404 |
- | Postgraduate | 26 | 28 | - | - | - |
- | Primary | 28 | 31 | - | - | - |
- | Secondary | 31 | 23 | - | - | - |
- | - | 31 | 24 | - | - | - |
ethnicity | Asian / Asian British | 32 | 28 | 0.4133 | 3 | 0.9375 |
- | Black / African / Caribbean / Black British | 36 | 36 | - | - | - |
- | Mixed / multiple ethnic groups | 40 | 41 | - | - | - |
- | White British | 32 | 35 | - | - | - |
yob | - | 1979.31 (5.58) | 1979.97 (4.63) | -1.071 | 268.9 | 0.2849 |
WM | - | 99.37 (10.00) | 99.94 (10.99) | -0.455 | 275.5 | 0.6495 |
Some exercises to work on/extensions to this code you might need:
- Add a new continuous variable to the simulated dataset and include it in the final table
- Create a third experimental group and amend the code to i) include 3 columns for the N/Mean and ii) report the F-test from a one-way ANOVA as the test statistic.
- Add the within-group percentage for each response to a categorical variable.