Working with character strings

18.0.1 Searching and replacing

If you want to search inside a string, there are lots of useful functions in the stringr package. These replicate some functionality in base R but, like other packages in the ‘tidyverse’, they tend to be more consistent and easier to use. For example:

cheese <- c("Stilton", "Brie", "Cheddar")
stringr::str_detect(cheese, "Br")
[1] FALSE  TRUE FALSE
stringr::str_locate(cheese, "i")
     start end
[1,]     3   3
[2,]     3   3
[3,]    NA  NA
stringr::str_replace(cheese, "Stil", "Mil")
[1] "Milton"  "Brie"    "Cheddar"
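
For comparison, here is a rough sketch of how the same three operations might be written in base R. The variable names (detected, first_i, replaced) are just illustrative. Note that the base R functions take the pattern as the first argument, whereas the stringr functions take the string first.

```r
cheese <- c("Stilton", "Brie", "Cheddar")

# grepl() returns TRUE/FALSE for each element, much like str_detect()
detected <- grepl("Br", cheese)

# regexpr() gives the start position of the first match, or -1 if there
# is none (str_locate() reports NA instead)
first_i <- as.integer(regexpr("i", cheese))

# sub() replaces the first match in each element, much like str_replace()
replaced <- sub("Stil", "Mil", cheese)

print(detected)
print(first_i)
print(replaced)
```

The inconsistent return values (-1 here, NA in stringr) are one example of the small differences that stringr smooths over.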

Using paste to make labels

paste() can combine character strings with other types of variable to produce a new character vector:

paste(mtcars$cyl, "cylinders")[1:10]
 [1] "6 cylinders" "6 cylinders" "4 cylinders" "6 cylinders" "8 cylinders"
 [6] "6 cylinders" "8 cylinders" "4 cylinders" "4 cylinders" "6 cylinders"

This can be a useful way to label graphs:

mtcars %>%
  ggplot(aes(paste(cyl, "cylinders"), mpg)) +
  geom_boxplot() + xlab("")
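
By default paste() puts a single space between the pieces it joins. The sketch below shows the other common options, using a short made-up vector (cyl here is a hypothetical stand-in for mtcars$cyl, and the label names are only illustrative):

```r
cyl <- c(6, 4, 8)

# Default: pieces are joined with a single space
spaced <- paste(cyl, "cylinders")

# sep sets the separator placed between the pieces
dashed <- paste(cyl, "cyl", sep = "-")

# paste0() is a shortcut for sep = ""
tight <- paste0("cyl", cyl)

# collapse joins the whole vector into one single string
combined <- paste(cyl, collapse = ", ")

print(spaced)
print(dashed)
print(tight)
print(combined)
```

paste0() is handy when the label should have no gap at all, and collapse is useful for building a single caption or title from several values.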

Fixing up variable after melting

In this example melt() creates a new column called variable.

sleep.wide %>%
  melt(id.var="Subject") %>%
  arrange(Subject, variable)  %>%
  head
  Subject variable    value
1       1    Day.0 249.5600
2       1    Day.1 258.7047
3       1    Day.2 250.8006
4       1    Day.3 321.4398
5       1    Day.4 356.8519
6       1    Day.5 414.6901

However, the contents of variable are now text (i.e. a sequence of letters and numbers) rather than numeric values (see column types). In this instance we know that the values Day.0, Day.1, Day.2… are not really separate categories but actually form a linear sequence, from 0 to 9.

We can use the extract or separate functions to split up variable and create a numeric column for Day:

sleep.long %>%
  separate(variable, c("variable", "Day")) %>%
  mutate(Day=as.numeric(Day)) %>%
  arrange(Subject) %>%
  head %>% pander
Subject  variable  Day  value
      1  Day         0  249.6
      1  Day         1  258.7
      1  Day         2  250.8
      1  Day         3  321.4
      1  Day         4  356.9
      1  Day         5  414.7

See the user guide for separate and extract for more details.

If you are familiar with regular expressions, you will be happy to know that both extract and separate accept regex patterns for splitting up a column. See this guide for more details on how separate and extract work.
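
As a rough illustration of the regex approach, the sketch below uses tidyr::extract() to pull the day number out of variable in one step. The data frame toy is a small made-up stand-in for the melted sleep data, and toy.days is just an illustrative name.

```r
# Hypothetical toy data standing in for the melted sleep data
toy <- data.frame(
  Subject  = c(1, 1, 1),
  variable = c("Day.0", "Day.1", "Day.2"),
  value    = c(249.5600, 258.7047, 250.8006),
  stringsAsFactors = FALSE
)

# The bracketed group (\\d+) captures the digits that follow "Day."
# and convert = TRUE turns the captured text into a number.
# The original variable column is dropped by default.
toy.days <- tidyr::extract(toy, variable, into = "Day",
                           regex = "Day\\.(\\d+)", convert = TRUE)

print(toy.days)
```

Compared with separate, extract keeps only the parts of the text matched by the bracketed groups, so there is no leftover column containing just the word Day.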