Hypothetical constructs
The term hypothetical construct dates back to a famous paper by MacCorquodale and Meehl (1948). In it they pointed out that many of the things that psychologists are interested in — for example intelligence — can’t be directly observed, and that we can’t make any single measurement that fully encapsulates the concept. For more details, see Chung and Hyland (2012), pages 164 to 168.
Reflective and formative measurement
In the section above we said most psychologists assume that the observations we make (what we see) have a hidden cause. Often researchers use the phrase ‘latent variables’ for these hidden causes. This means we don’t measure constructs directly - we measure their consequences.
Some researchers have claimed, contrary to this view, that measurement should work the other way around. To give a concrete example, it has been claimed that ‘intelligence is what intelligence tests measure’ (Boring 1923). The argument is that intelligence doesn’t really ‘exist’ in any meaningful way: intelligence is just a label we use to describe performance on the tests we use. This is an example of a ‘formative’ measurement model.
This disagreement is part of a long-running debate in psychology, which is now largely settled in favour of reflective models: most contemporary psychologists would agree that intelligence ‘exists’, and it forms part of the theories which guide their thinking and research. For an interesting discussion of this distinction see the chapter on neobehaviourism in Chung and Hyland (2012), page 158, and especially pages 164 to 168.
Three exclamation marks: !!!
Preface: Understanding this isn’t necessary to get on with the course. Only read this if you are interested!
In the code above, when we used recode we used three exclamation marks just before our list.
We defined the mapping:
likert.responses <- c(
  "I hate them" = 1,
  "I don't like them" = 2,
  "I'm neutral" = 3,
  "I like them" = 4,
  "I can't live without them" = 5)
And then used it with recode, with the three exclamation marks.
liking_of_sweets_data %>%
  mutate(like_sweets_numeric = recode(like_sweets_text, !!!likert.responses))
The reason for this is that recode actually expects us to specify the mapping for it like this:
liking_of_sweets_data %>%
  mutate(like_sweets_numeric = recode(like_sweets_text,
    "I hate them" = 1,
    "I don't like them" = 2 ...))
But this means we have to repeat the mapping for each of the questions. Because all the questions use the same mapping this gets repetitive, and can lead to errors.
The three exclamation marks (!!!) unpack the list for us. So writing !!!likert.responses saves us the bother of writing it out in full each time.
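To make the saving concrete, here is a minimal sketch that reuses one mapping for two questions. The data frame sweets_and_crisps and the column like_crisps_text are made up for illustration; only likert.responses and like_sweets_text come from the text above.

```r
library(dplyr)

# The mapping defined earlier in this section
likert.responses <- c(
  "I hate them" = 1,
  "I don't like them" = 2,
  "I'm neutral" = 3,
  "I like them" = 4,
  "I can't live without them" = 5)

# Hypothetical data: two questions answered on the same response scale
sweets_and_crisps <- tibble(
  like_sweets_text = c("I hate them", "I like them"),
  like_crisps_text = c("I'm neutral", "I can't live without them")
)

# !!! unpacks the named vector into separate "label" = value arguments,
# so the same mapping can be reused for every column
recoded <- sweets_and_crisps %>%
  mutate(
    like_sweets_numeric = recode(like_sweets_text, !!!likert.responses),
    like_crisps_numeric = recode(like_crisps_text, !!!likert.responses)
  )

recoded
```

If the response labels ever change, only likert.responses needs editing, rather than every call to recode.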
Use CSV and open formats
If you are committed to doing open science you should avoid proprietary formats like Word or Excel for a number of reasons. Using open formats means:
It’s easier for you to work with the data using R and similar tools which are optimised around open formats.
Other researchers can read your data using the software they prefer, and they don’t have to buy software to use your data.
Also you might consider that:
If you lose access to the commercial software, you won’t be able to open your data without paying (this has happened to students of mine in the past).
Your data will still be usable by everyone many years in the future. This isn’t always the case with commercial formats because companies often withdraw support for old versions. Librarians and archivists are very concerned about the possibility that knowledge will be lost in this way (e.g. see https://library.stanford.edu/research/data-management-services/data-best-practices/best-practices-file-formats).
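As a rough sketch of what working with an open format looks like in R, the example below writes a small data frame to CSV and reads it back using readr. The data frame responses and the file name are invented for illustration, and the file is written to a temporary folder so nothing in your project is overwritten.

```r
library(dplyr)
library(readr)

# Hypothetical data, for illustration only
responses <- tibble(
  participant_id = 1:3,
  like_sweets_numeric = c(1, 4, 5)
)

# Write to a temporary folder; in real work you would choose your own path
csv_path <- file.path(tempdir(), "responses.csv")
write_csv(responses, csv_path)

# Any software that reads plain text can open this file; here we use R again
reloaded <- read_csv(csv_path)
reloaded
```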
More ways to combine data
As well as inner_join (as shown in the workshop), tidyverse has other types of join to combine two dataframes.
In particular full_join and left_join can be useful. They are explained in the tidyverse documentation as follows:
inner_join(x, y): return all rows from x where there are matching values in y, and all columns from x and y. If there are multiple matches between x and y, all combinations of the matches are returned.
left_join(x, y): return all rows from x, and all columns from x and y. Rows in x with no match in y will have NA values in the new columns. If there are multiple matches between x and y, all combinations of the matches are returned.
full_join(x, y): return all rows and all columns from both x and y. Where there are not matching values, returns NA for the one missing.
If you still have time, there are also anti_join and semi_join to play with.
Most of the time left_join is likely to be what you need when you have a complete set of identifier records (e.g. participant information) and an incomplete set of survey data.
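The differences between the joins are easiest to see on a tiny example. In this sketch the data frames participants and survey, and all of their columns, are hypothetical: participant 3 has no survey data, and survey row 4 has no matching participant record.

```r
library(dplyr)

# Hypothetical complete set of participant records
participants <- tibble(
  participant_id = c(1, 2, 3),
  age = c(21, 34, 28)
)

# Hypothetical incomplete survey data (no row for participant 3,
# plus a stray row for an unknown participant 4)
survey <- tibble(
  participant_id = c(1, 2, 4),
  score = c(10, 15, 12)
)

# Only participants 1 and 2 appear in both tables
inner <- inner_join(participants, survey, by = "participant_id")

# All three participants are kept; participant 3 gets NA for score
left <- left_join(participants, survey, by = "participant_id")

# Every row from both tables is kept, with NA where there is no match
full <- full_join(participants, survey, by = "participant_id")

# Participants with no survey data (useful for chasing missing responses)
anti <- anti_join(participants, survey, by = "participant_id")

# Participants who have survey data, without adding the survey columns
semi <- semi_join(participants, survey, by = "participant_id")

nrow(inner)  # 2
nrow(left)   # 3
nrow(full)   # 4
```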
Exporting plots
You can save plots as an image file (for this module, use png format) suitable for uploading to PsychEL, using the ggsave function.
mtcars %>%
  ggplot(aes(wt, mpg)) +
  geom_smooth() +
  xlab("Weight (1000 lbs)") +
  ylab("Fuel economy (mpg)")

ggsave("media/fuel-economy-plot.png", width=4, height=3)
The ggsave function saves the most recent plot, so make sure it is included directly below your call to ggplot.
If you are submitting graphics to a journal and would like to save a high-quality vector graphics version, use .pdf or .eps as the file extension:
ggsave("fuel-economy-plot.pdf")
Why? This page explains the difference between vector and bitmap graphics and why you should care. The short answer is that vector graphics don’t look fuzzy when enlarged, and often produce smaller file sizes.
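If you would rather not rely on ggsave picking up the most recent plot, one option is to store the plot in a variable and pass it to ggsave explicitly through its plot argument. The sketch below does this for both a png and a pdf version. The variable names are made up for illustration, and the files go to a temporary folder rather than the media folder used above.

```r
library(ggplot2)

# Store the plot in a variable instead of only printing it
fuel_plot <- ggplot(mtcars, aes(wt, mpg)) +
  geom_smooth() +
  xlab("Weight (1000 lbs)") +
  ylab("Fuel economy (mpg)")

# Temporary output paths; swap these for your own file names
png_path <- file.path(tempdir(), "fuel-economy-plot.png")
pdf_path <- file.path(tempdir(), "fuel-economy-plot.pdf")

# Passing plot = makes it explicit which plot is saved;
# the file extension decides the format
ggsave(png_path, plot = fuel_plot, width = 4, height = 3)
ggsave(pdf_path, plot = fuel_plot, width = 4, height = 3)
```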
‘Reverse coding’
Sometimes questionnaire items can be worded ‘backwards’. For example both these questions might be used to measure the same underlying construct or hidden cause.
- I really hate parties
- I like being alone
However, if we used the same response scale for each, participants’ responses would cancel out.
When recoding these data, we need to create two different mappings: one for questions where an “Agree” response indicates a positive study habit, and a separate mapping for questions where a “Disagree” response is a good habit.
For example, we might use these two mappings:
likert <- c(
  "Strongly disagree" = 1,
  "Disagree" = 2,
  "Neutral" = 3,
  "Agree" = 4,
  "Strongly agree" = 5
)

likert.reversed <- c(
  "Strongly disagree" = 5,
  "Disagree" = 4,
  "Neutral" = 3,
  "Agree" = 2,
  "Strongly agree" = 1
)
And we would use the correct mapping with each column when we use recode:
sillydata %>% mutate(
  hate_parties_numeric = recode(hate_parties, !!!likert),
  like_alone_numeric = recode(like_alone, !!!likert.reversed)
)
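To see the two mappings at work, here is a small self-contained sketch. The contents of sillydata are invented for illustration. It also checks a shortcut that is not part of the text above: on a 1 to 5 scale, the reversed score is the same as 6 minus the ordinary score.

```r
library(dplyr)

likert <- c(
  "Strongly disagree" = 1, "Disagree" = 2, "Neutral" = 3,
  "Agree" = 4, "Strongly agree" = 5
)

likert.reversed <- c(
  "Strongly disagree" = 5, "Disagree" = 4, "Neutral" = 3,
  "Agree" = 2, "Strongly agree" = 1
)

# Hypothetical responses from two participants
sillydata <- tibble(
  hate_parties = c("Agree", "Strongly disagree"),
  like_alone = c("Disagree", "Strongly agree")
)

recoded <- sillydata %>% mutate(
  hate_parties_numeric = recode(hate_parties, !!!likert),
  like_alone_numeric = recode(like_alone, !!!likert.reversed),
  # Shortcut (an assumption for 1-5 scales): reverse by subtracting from 6
  like_alone_check = 6 - recode(like_alone, !!!likert)
)

recoded
```

The like_alone_numeric and like_alone_check columns should match, which is a quick way to confirm that a reversed mapping has been typed correctly.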
References
Boring, Edwin G. 1923. “Intelligence as the Tests Test It.” New Republic, 35–37.
Chung, Man Cheung, and Michael E. Hyland. 2012. History and Philosophy of Psychology. John Wiley & Sons.
MacCorquodale, Kenneth, and Paul E. Meehl. 1948. “On a Distinction Between Hypothetical Constructs and Intervening Variables.” Psychological Review 55 (2): 95–107. https://doi.org/10.1037/h0056029.