Vectors and lists
When working with data, we often have lists or sequences of ‘things’. For example: a list of measurements we have made.
When all the things are of the same type, R calls this a vector.
When there is a mix of different things, R calls this a list.
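To make the distinction concrete, here is a minimal sketch. The variable names are illustrative and not part of the examples that follow. It assumes only base R: c() keeps a single type, so mixing numbers and text converts everything to text, whereas list() lets each element keep its own type. sapply() is used here just to ask for the class of each list element in turn.

```r
# A vector holds elements of a single type
measurements <- c(1.5, 2.0, 3.5)
class(measurements)          # "numeric"

# Mixing types inside c() converts everything to one common type (here, text)
mixed.vector <- c(1.5, "two", 3.5)
class(mixed.vector)          # "character"

# A list lets each element keep its own type
mixed.list <- list(1.5, "two", TRUE)
sapply(mixed.list, class)    # "numeric" "character" "logical"
```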
Vectors
We can create a vector of numbers like this:
# this creates a vector of heights, in cm
heights <- c(203, 148, 156, 158, 167,
162, 172, 164, 172, 187,
134, 182, 175)
The c() command is shorthand for combine, so the example above combines the individual elements (numbers) into a new vector.
We can create a vector of alphanumeric names just as easily:
names <- c("Ben", "Joe", "Sue", "Rosa")
And we can check the values stored in these variables by printing them. You can either type print(heights), or just write the name of the variable alone, which will print it by default. E.g.:
heights
[1] 203 148 156 158 167 162 172 164 172 187 134 182 175
Try creating your own vector of numbers in a new code block below using the c(...) command. Then change the name of the variable you assign it to.
Accessing elements
Once we have created a vector, we often want to access the individual elements again. We do this based on their position.
Let’s say we have created a vector:
my.vector <- c(10, 20, 30, 40)
We can display the whole vector by just typing its name, as we saw above. But if we want to show only the first element of this vector, we type:
my.vector[1]
[1] 10
Here, the square brackets specify a subset of the vector we want - in this case, just the first element.
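The same idea works for any position. As a small sketch, assuming the my.vector defined above, we can pick out the third element, or use length() (a base R function that returns the number of elements) to find the last one without counting by hand:

```r
my.vector <- c(10, 20, 30, 40)

# Positions count from 1, so the third element is at position 3
my.vector[3]                    # 30

# length() gives the number of elements, so this picks the last one
my.vector[length(my.vector)]    # 40
```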
Selecting more than one element
A neat feature of subsetting is that we can grab more than one element at a time.
To do this, we tell R which elements we want by providing a vector of their positions.
It might seem obvious, but the first element has position 1, the second has position 2, and so on. So, if we wanted to extract the 4th and 5th elements from the vector of heights we saw above we would type:
elements.to.grab <- c(4, 5)
heights[elements.to.grab]
[1] 158 167
We can also make a subset of the original vector and assign it to a new variable:
first.two.elements <- heights[c(1, 2)]
first.two.elements
[1] 203 148
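The positions do not have to be next to each other, or even in order. The sketch below reuses the heights vector from above. The names last.and.first and second.twice are made up for illustration.

```r
heights <- c(203, 148, 156, 158, 167,
             162, 172, 164, 172, 187,
             134, 182, 175)

# Positions can be given in any order: here the 13th element, then the 1st
last.and.first <- heights[c(13, 1)]
last.and.first     # 175 203

# The same position can be used more than once
second.twice <- heights[c(2, 2)]
second.twice       # 148 148
```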
Making and slicing with sequences
One common task in R is to create sequences of numbers, letters or dates.
The simplest way of doing this is to define a range, with the colon:
onetoten <- 1:10
onetoten
[1] 1 2 3 4 5 6 7 8 9 10
This creates a vector which can be sliced like any other:
onetoten[8]
[1] 8
One common use of sequences is to slice other vectors:
onetoten[1:3]
[1] 1 2 3
Or the first 10 values in the heights vector we defined above:
heights[1:10]
[1] 203 148 156 158 167 162 172 164 172 187
The colon also works backwards, and with negative numbers too:
5:-5
[1] 5 4 3 2 1 0 -1 -2 -3 -4 -5
When your sequence contains numbers that are not whole, or are not consecutive, you can use the seq function:
seq(1,10,by=2)
[1] 1 3 5 7 9
seq(0, 1, by=.2)
[1] 0.0 0.2 0.4 0.6 0.8 1.0
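seq can also be told how many values to produce rather than the step size, using its length.out argument, and it can count downwards if by is negative. A short sketch (the variable names are illustrative only):

```r
# Five evenly spaced values from 0 to 1; R works out the step size
evenly.spaced <- seq(0, 1, length.out = 5)
evenly.spaced    # 0.00 0.25 0.50 0.75 1.00

# Counting down needs a negative step
countdown <- seq(10, 1, by = -3)
countdown        # 10 7 4 1
```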
Conditional slicing
One neat feature of R is that you can create a sequence of TRUE or FALSE values, by asking whether each value in a sequence matches a particular condition. For example:
1:10 > 5
[1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
Re-using the heights vector from above, we can test which values are above the average:
heights > mean(heights)
[1] TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE TRUE FALSE
[12] TRUE TRUE
And we can use the vector of TRUE and FALSE values to select from the actual heights:
heights[heights > mean(heights)]
[1] 203 172 172 187 182 175
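Conditions can also be combined or counted. The sketch below goes slightly beyond the examples above and assumes two pieces of base R not introduced so far: the & operator, which means "and", and sum(), which counts the TRUE values when given a vector of TRUE and FALSE. The names mid.range and n.above.average are made up for illustration.

```r
heights <- c(203, 148, 156, 158, 167,
             162, 172, 164, 172, 187,
             134, 182, 175)

# Keep only the heights that are over 160 and under 180
mid.range <- heights[heights > 160 & heights < 180]
mid.range          # 167 162 172 164 172 175

# Each TRUE counts as 1, so this is the number of above-average heights
n.above.average <- sum(heights > mean(heights))
n.above.average    # 6
```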