‘Operators’
When selecting rows in the example above we used two equals signs, ==, to select rows where cyl was exactly 6.
As you might guess, there are other ‘operators’ we can use to create filters.
Rather than describe them, the examples below demonstrate what each of them does.
Equality and matching
As above, to compare a single value we use ==:
2 == 2
[1] TRUE
And in a filter:
filter(mtcars, cyl==6)
mpg cyl disp hp drat wt qsec vs am gear carb
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
3 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
4 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
5 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
6 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
7 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
You might have noticed above that we write == rather than just = to define the
criteria. This is because R, like most programming languages, uses two =
symbols to distinguish comparison from assignment.
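To make the difference concrete, here is a minimal sketch in base R. The variable names x and y are only illustrative, and the lines starting with # show the output we would expect.

```r
# A single = assigns: this stores the value 2 in x and prints nothing
x = 2

# A double == compares: this asks whether x is equal to 2
x == 2
# [1] TRUE

# Assignment in R is more commonly written with <-
y <- 3
x == y
# [1] FALSE
```

Writing = inside a filter by mistake is a common slip, so it is worth checking for whenever a filter gives an unexpected error.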
Presence/absence
To test whether a value is in a vector of suitable matches we can use %in%:
5 %in% 1:10
[1] TRUE
Or for an example which is not true:
100 %in% 1:10
[1] FALSE
Perhaps less obviously, we can test whether each value in a vector is in a second vector.
This returns a vector of TRUE/FALSE values as long as the first vector:
c(1, 2) %in% c(2, 3, 4)
[1] FALSE TRUE
This is very useful in a dataframe filter:
head(filter(mtcars, cyl %in% c(4, 6)))
mpg cyl disp hp drat wt qsec vs am gear carb
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
5 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
6 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Here we selected all rows where cyl matched either 4 or 6. That is, where
the value of cyl was ‘in’ the vector c(4, 6).
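It can help to look at the TRUE/FALSE vector that %in% produces for the whole column. The sketch below uses base R only (the name keep is just illustrative) and shows that there is one value per row, and that rows marked TRUE are the ones a filter would keep.

```r
# One TRUE/FALSE value per row of mtcars: is cyl one of 4 or 6?
keep <- mtcars$cyl %in% c(4, 6)

length(keep)  # one value for each of the 32 rows
# [1] 32

sum(keep)     # TRUE counts as 1, so this is the number of matching rows
# [1] 18

# Base R subsetting with the same logical vector keeps those 18 rows
nrow(mtcars[keep, ])
# [1] 18
```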
Greater/less than
The < and > symbols work as you’d expect:
head(filter(mtcars, cyl > 4))
head(filter(mtcars, cyl < 5))
You can also use >= and <=:
filter(mtcars, cyl >= 6)
filter(mtcars, cyl <= 4)
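As a quick check of how these comparisons behave, we can try them on a small vector first. This is a sketch in base R; cyl_values is an illustrative name holding the three cylinder counts that appear in mtcars.

```r
cyl_values <- c(4, 6, 8)

cyl_values > 4
# [1] FALSE  TRUE  TRUE

cyl_values < 5
# [1]  TRUE FALSE FALSE

cyl_values >= 6
# [1] FALSE  TRUE  TRUE

cyl_values <= 4
# [1]  TRUE FALSE FALSE
```

Because cyl only takes the values 4, 6 and 8 in this dataset, cyl > 4 and cyl >= 6 pick out the same rows here. That would not be true for a column with values in between.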
Negation (opposite of)
The ! operator tells R to reverse an expression; that is, to take the
opposite of its value. In the simplest example:
!TRUE
[1] FALSE
This is helpful because we can reverse the meaning of other expressions:
is.na(NA)
[1] TRUE
!is.na(NA)
[1] FALSE
And we can use it in dplyr filters.
Here we select rows where Ozone is missing (NA):
filter(airquality, is.na(Ozone))
And here we use ! to reverse the expression and select rows which are not missing:
filter(airquality, !is.na(Ozone))
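One way to convince ourselves that the two filters are exact opposites is to count the rows each would keep. The sketch below uses base R only, with illustrative names n_missing and n_present. Every row is either missing or not, so the two counts should add up to the total number of rows.

```r
# Count rows where Ozone is missing, and rows where it is not
n_missing <- sum(is.na(airquality$Ozone))
n_present <- sum(!is.na(airquality$Ozone))

# Together the two groups account for every row in the dataset
n_missing + n_present == nrow(airquality)
# [1] TRUE
```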
Try running these commands for yourself and experiment with changing the operators to select different combinations of rows.
Other logical operators
There are operators for ‘and’/‘or’ which can combine other filters.
Using & (and) with two conditions makes the filter more restrictive:
filter(mtcars, hp > 200 & wt > 4)
mpg cyl disp hp drat wt qsec vs am gear carb
1 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
2 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
3 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
In contrast, the pipe symbol, |, means ‘or’, so we match more rows:
filter(mtcars, hp > 200 | wt > 4)
mpg cyl disp hp drat wt qsec vs am gear carb
1 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
2 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
3 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
4 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
5 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
6 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
7 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
8 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
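To see why & keeps fewer rows and | keeps more, it can help to apply them to two short TRUE/FALSE vectors. This is a sketch in base R; a and b are illustrative stand-ins for the results of two conditions such as hp > 200 and wt > 4.

```r
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

# & is TRUE only where both sides are TRUE
a & b
# [1]  TRUE FALSE FALSE FALSE

# | is TRUE where at least one side is TRUE
a | b
# [1]  TRUE  TRUE  TRUE FALSE
```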
Finally, you can set the order in which operators are applied by using parentheses. This means these expressions are subtly different:
# first
filter(mtcars, (hp > 200 & wt > 4) | cyl==8)
mpg cyl disp hp drat wt qsec vs am gear carb
1 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
2 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
3 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
4 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
5 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
6 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
7 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
8 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
9 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
10 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
11 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
12 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
13 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
14 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
# second reordered evaluation
filter(mtcars, hp > 200 & (wt > 4 | cyl==8))
mpg cyl disp hp drat wt qsec vs am gear carb
1 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
2 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
3 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
4 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
5 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
6 15.8 8 351 264 4.22 3.170 14.50 0 1 5 4
7 15.0 8 301 335 3.54 3.570 14.60 0 1 5 8
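If we leave the parentheses out altogether, R applies & before |, so the unparenthesised expression should behave like the first version above. The sketch below checks this with base R logical vectors; the names no_parens, first and second are illustrative.

```r
hp  <- mtcars$hp
wt  <- mtcars$wt
cyl <- mtcars$cyl

# No parentheses: R evaluates & before |
no_parens <- hp > 200 & wt > 4 | cyl == 8

# The two explicitly grouped versions from above
first  <- (hp > 200 & wt > 4) | cyl == 8
second <- hp > 200 & (wt > 4 | cyl == 8)

identical(no_parens, first)
# [1] TRUE

sum(first)   # number of rows kept by the first filter
# [1] 14

sum(second)  # number of rows kept by the second filter
# [1] 7
```

Even when the default order gives the result we want, writing the parentheses explicitly makes the intention easier to read.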
Try writing in plain English the meaning of the two filter expressions above.