‘Operators’

When selecting rows in the example above, we used two equals signs (==) to keep only the rows where cyl was exactly 6.

As you might guess, there are other ‘operators’ we can use to create filters.

Rather than describing them in detail, the examples below demonstrate what each one does.

Equality and matching

As above, to test whether two values are equal we use ==:

2 == 2
[1] TRUE

And in a filter:

filter(mtcars, cyl==6)
   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
2 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
3 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
4 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
5 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
6 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
7 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
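
Behind the scenes, cyl == 6 is compared against every value in the cyl column and gives one TRUE or FALSE per row; filter() then keeps the rows where the result is TRUE. The sketch below uses base R only (no dplyr needed) to show that logical vector and an equivalent base-R subset. It illustrates the idea rather than how dplyr is implemented, and the names is_six and six_cyl are just illustrative.

```r
# Compare every value of the cyl column against 6: one TRUE/FALSE per row
is_six <- mtcars$cyl == 6
head(is_six)
# [1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE

# Counting the TRUE values gives the number of rows filter() keeps
sum(is_six)
# [1] 7

# Base-R equivalent of filter(mtcars, cyl == 6)
six_cyl <- mtcars[is_six, ]
nrow(six_cyl)
# [1] 7
```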

You might have noticed that we write == rather than just = to define the criteria. This is because R, like many programming languages, uses == for comparison to distinguish it from a single =, which is used for assignment (and, inside a function call, for naming arguments).
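
A short base-R sketch of the difference. The variable name x is only for illustration. Because a single = inside a function call names an argument, writing filter(mtcars, cyl = 6) does not compare anything; recent versions of dplyr stop with an error in that case rather than filtering.

```r
# A single = (like <-) assigns a value to a name
x = 6

# A double == compares two values and returns TRUE or FALSE
x == 6
# [1] TRUE
x == 4
# [1] FALSE

# Inside a function call, a single = names an argument instead:
# here 2 is passed to the digits argument of round()
round(x = 3.14159, digits = 2)
# [1] 3.14
```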

Presence/absence

To test whether a value is in a vector of possible matches, we can use %in%:

5 %in% 1:10
[1] TRUE

Or, for an example that is not true:

100 %in% 1:10
[1] FALSE

Perhaps less obviously, we can test whether each value in a vector is in a second vector.

This returns a vector of TRUE/FALSE values, one for each element of the first vector:

c(1, 2) %in% c(2, 3, 4)
[1] FALSE  TRUE

This is very useful in a dataframe filter:

head(filter(mtcars, cyl %in% c(4, 6)))
   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
2 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
3 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
4 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
5 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
6 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2

Here we selected all rows where cyl matched either 4 or 6. That is, where the value of cyl was ‘in’ the vector c(4,6).
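
As a check on that reading, the base-R sketch below shows that cyl %in% c(4, 6) gives exactly the same TRUE/FALSE values as writing out two == comparisons joined with | (the ‘or’ operator described later in this section). It also shows that the result always has the length of the left-hand vector, so swapping the two sides changes the answer. The names in_version and or_version are illustrative.

```r
cyl <- mtcars$cyl

# %in% gives the same answer as chaining == comparisons with | ("or")
in_version <- cyl %in% c(4, 6)
or_version <- cyl == 4 | cyl == 6
identical(in_version, or_version)
# [1] TRUE

# The result has one value per element of the LEFT-hand vector
c(1, 2) %in% c(2, 3, 4)
# [1] FALSE  TRUE
c(2, 3, 4) %in% c(1, 2)
# [1]  TRUE FALSE FALSE

# Rows with 4 or 6 cylinders (11 + 7)
sum(in_version)
# [1] 18
```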

Greater/less than

The < and > symbols work as you’d expect:

head(filter(mtcars, cyl > 4))
head(filter(mtcars, cyl < 5))

You can also use >= and <=:

filter(mtcars, cyl >= 6)
filter(mtcars, cyl <= 4)
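
The four filters above come in two pairs that select the same rows, because cyl only takes the values 4, 6 and 8 in mtcars. The base-R sketch below counts the TRUE values each comparison produces, which is the number of rows the corresponding filter() call would return.

```r
cyl <- mtcars$cyl

# The distinct values of cyl in mtcars
sort(unique(cyl))
# [1] 4 6 8

# How many rows each comparison would keep
sum(cyl > 4)
# [1] 21
sum(cyl >= 6)
# [1] 21
sum(cyl < 5)
# [1] 11
sum(cyl <= 4)
# [1] 11

# With only 4, 6 and 8 present, > 4 and >= 6 pick out identical rows
identical(cyl > 4, cyl >= 6)
# [1] TRUE
```
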

Negation (opposite of)

The ! operator tells R to reverse a logical expression, that is, to take the opposite of its value. In the simplest example:

!TRUE
[1] FALSE

This is helpful because we can reverse the meaning of other expressions:

is.na(NA)
[1] TRUE
!is.na(NA)
[1] FALSE
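
Like ==, the ! operator works element by element, so it can flip a whole vector of TRUE/FALSE values at once. A missing value stays missing, because the opposite of an unknown value is still unknown. The vector named values below is made up for illustration.

```r
# ! flips every element; NA stays NA
!c(TRUE, FALSE, NA)
# [1] FALSE  TRUE    NA

# is.na() checks each element, so !is.na() marks the values that are present
values <- c(12, NA, 7, NA)
is.na(values)
# [1] FALSE  TRUE FALSE  TRUE
!is.na(values)
# [1]  TRUE FALSE  TRUE FALSE
```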

And we can use it in dplyr filters.

Here we select rows where Ozone is missing (NA):

filter(airquality, is.na(Ozone))

And here we use ! to reverse the expression and select rows where Ozone is not missing:

filter(airquality, !is.na(Ozone))
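
We need is.na() here because comparing with == NA gives NA rather than TRUE or FALSE, and filter() drops rows where its condition evaluates to NA. The base-R sketch below shows that problem and checks that the two filters above split the rows of airquality into two groups that add up to the whole dataset. The counts are for the copy of airquality that ships with R.

```r
ozone <- airquality$Ozone

# Comparing with == NA does not work: the result is NA for every row
head(ozone == NA)
# [1] NA NA NA NA NA NA

# is.na() and !is.na() split the rows into two complementary groups
sum(is.na(ozone))
# [1] 37
sum(!is.na(ozone))
# [1] 116
sum(is.na(ozone)) + sum(!is.na(ozone)) == nrow(airquality)
# [1] TRUE
```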

Try running these commands for yourself, and experiment with changing the operators to select different combinations of rows.

Other logical operators

There are also operators for ‘and’ and ‘or’, which combine other conditions.

Using & (and) with two conditions makes the filter more restrictive:

filter(mtcars, hp > 200 & wt > 4)
   mpg cyl disp  hp drat    wt  qsec vs am gear carb
1 10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
2 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
3 14.7   8  440 230 3.23 5.345 17.42  0  0    3    4

In contrast, the | symbol (sometimes called the pipe symbol, not to be confused with the %>% pipe) means ‘or’, so we match more rows:

filter(mtcars, hp > 200 | wt > 4)
   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
2 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
3 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
4 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
5 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
6 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
7 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
8 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
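
The rules behind these two results can be seen on small logical vectors: & is TRUE only where both sides are TRUE, while | is TRUE where at least one side is. The base-R sketch below prints both truth tables and then counts the rows each mtcars filter above keeps. In filter(), separating conditions with commas, as in filter(mtcars, hp > 200, wt > 4), is treated the same as joining them with &. The vectors a and b are illustrative.

```r
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

# & is TRUE only where both inputs are TRUE
a & b
# [1]  TRUE FALSE FALSE FALSE

# | is TRUE where at least one input is TRUE
a | b
# [1]  TRUE  TRUE  TRUE FALSE

# Applied to mtcars, the two filters above keep 3 and 8 rows
with(mtcars, sum(hp > 200 & wt > 4))
# [1] 3
with(mtcars, sum(hp > 200 | wt > 4))
# [1] 8
```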

Finally, you can control the order in which operators are applied by using parentheses. This means these two expressions are subtly different:

# first
filter(mtcars, (hp > 200 & wt > 4) | cyl==8)
    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
2  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
3  16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
4  17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
5  15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
6  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
7  10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
8  14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
9  15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
10 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
11 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
12 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
13 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
14 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8

# second: parentheses change the order of evaluation
filter(mtcars, hp > 200 & (wt > 4 | cyl==8))
   mpg cyl disp  hp drat    wt  qsec vs am gear carb
1 14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
2 10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
3 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
4 14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
5 13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
6 15.8   8  351 264 4.22 3.170 14.50  0  1    5    4
7 15.0   8  301 335 3.54 3.570 14.60  0  1    5    8
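
If you leave the parentheses out, R applies & before |, because & has higher precedence. The base-R sketch below checks that the unparenthesised version gives the same rows as the first filter, and compares the row counts of the two versions. Writing the parentheses anyway makes the intended grouping obvious to anyone reading the code. The names no_parens, first and second are illustrative.

```r
# Without parentheses, & is evaluated before |, matching the first filter
no_parens <- with(mtcars, hp > 200 & wt > 4 | cyl == 8)
first     <- with(mtcars, (hp > 200 & wt > 4) | cyl == 8)
second    <- with(mtcars, hp > 200 & (wt > 4 | cyl == 8))

identical(no_parens, first)
# [1] TRUE

# Row counts of the two filters above
sum(first)
# [1] 14
sum(second)
# [1] 7
```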

Try writing out in plain English what each of the two filter expressions above means.