Chapter 2 Data manipulation using dplyr

Trust me, this is the part of my research where I spend a significant portion of my time. Real-life data are not polished and nicely annotated. When you want to integrate data from different sources, the fun begins (I am making air quotes, of course)! On top of that, you need to format the output of one process to make it fit for the next one. So, there’s no escape from formatting and manipulating data in real life.

Here, we will be using the dplyr package, which is one of the most powerful and popular packages in R. The d stands for data and plyr is meant to evoke the tool pliers. So dplyr refers to a tool for manipulating data (frames). dplyr provides a grammar of data manipulation: its functions act as the verbs in your code, and they solve most common data manipulation problems efficiently. Arguably, it is sometimes more efficient than the equivalent base R operations.

2.1 Install

There are two main ways to install the dplyr package in R. You can install the tidyverse package; dplyr, being a part of it, will automatically be installed in your R environment.

install.packages("tidyverse")

Or, you can install just the dplyr package -

install.packages("dplyr")

However, if you want to install the development version, which I won’t recommend at this stage, you can follow the code below -

# install devtools first if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
# the dplyr repository now lives under the tidyverse organisation on GitHub
devtools::install_github("tidyverse/dplyr")

And, now load it …

library(dplyr)

2.2 Pipe operator %>%

It would be a crime not to introduce the pipe operator %>% before starting with the dplyr verbs. If you are familiar with the pipe operator | in bash scripting, it is the same idea. I have no better way to describe it. But if you are not, here is the explanation -

The pipe operator %>% connects two operations on the same data (be it a vector or a data-frame). It passes the output of the operation on its left-hand side as the first argument to the operation on its right-hand side. If you want a formal definition: the pipe operator converts x %>% f(y) into f(x, y).

Let’s look at an example. Say we have a vector x that holds the values from 1 to 100, and we want to calculate the mean of x and round it to an integer. In base R, we write -

x <- 1:100
round(mean(x))
## [1] 50

On the other hand, using the pipe operator, we can first define x, then calculate the mean and, at the end, round it to an integer, like -

x <- 1:100
x %>% mean %>% round
## [1] 50

It goes from left to right, the same way we think about and build our data analysis pipeline. Since version 4.1.0, base R also provides a native pipe operator |>, but I will stick to %>% in the workshop.
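The formal definition given earlier, x %>% f(y) becoming f(x, y), can be checked directly. The following is only a small sketch: the vector values and the variable names piped, nested and native are made up for illustration, and the last part assumes R 4.1.0 or later, where the native pipe |> is available.

```r
library(dplyr)  # attaches %>% (re-exported from magrittr)

x <- c(1.234, 5.678)

# x %>% f(y) is evaluated as f(x, y): the left-hand side becomes the first argument
piped  <- x %>% round(1)
nested <- round(x, 1)
identical(piped, nested)

# the native pipe |> needs no package (assumes R >= 4.1.0); unlike %>%,
# it requires explicit parentheses on the right-hand side: mean(), not mean
native <- 1:100 |> mean() |> round()
native
```

With %>% you can drop the parentheses when there are no extra arguments (x %>% mean), which is why both spellings appear in code you will come across.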

2.3 dplyr verbs

There are many verbs in the dplyr package. Here I will discuss a few (but very important) ones that you will need to resolve most of the data manipulation challenges in your day-to-day life.

2.3.1 select()

select() picks variables based on their names or types. For example -

# using specific variable names -
iris %>% 
  select(Sepal.Length, Sepal.Width) 
iris data: Sepal length and width
Sepal.Length Sepal.Width
5.1 3.5
4.9 3.0
4.7 3.2
4.6 3.1
5.0 3.6
5.4 3.9
4.6 3.4
5.0 3.4
4.4 2.9
4.9 3.1
5.4 3.7
4.8 3.4
4.8 3.0
4.3 3.0
5.8 4.0
5.7 4.4
5.4 3.9
5.1 3.5
5.7 3.8
5.1 3.8
5.4 3.4
5.1 3.7
4.6 3.6
5.1 3.3
4.8 3.4
5.0 3.0
5.0 3.4
5.2 3.5
5.2 3.4
4.7 3.2
4.8 3.1
5.4 3.4
5.2 4.1
5.5 4.2
4.9 3.1
5.0 3.2
5.5 3.5
4.9 3.6
4.4 3.0
5.1 3.4
5.0 3.5
4.5 2.3
4.4 3.2
5.0 3.5
5.1 3.8
4.8 3.0
5.1 3.8
4.6 3.2
5.3 3.7
5.0 3.3
7.0 3.2
6.4 3.2
6.9 3.1
5.5 2.3
6.5 2.8
5.7 2.8
6.3 3.3
4.9 2.4
6.6 2.9
5.2 2.7
5.0 2.0
5.9 3.0
6.0 2.2
6.1 2.9
5.6 2.9
6.7 3.1
5.6 3.0
5.8 2.7
6.2 2.2
5.6 2.5
5.9 3.2
6.1 2.8
6.3 2.5
6.1 2.8
6.4 2.9
6.6 3.0
6.8 2.8
6.7 3.0
6.0 2.9
5.7 2.6
5.5 2.4
5.5 2.4
5.8 2.7
6.0 2.7
5.4 3.0
6.0 3.4
6.7 3.1
6.3 2.3
5.6 3.0
5.5 2.5
5.5 2.6
6.1 3.0
5.8 2.6
5.0 2.3
5.6 2.7
5.7 3.0
5.7 2.9
6.2 2.9
5.1 2.5
5.7 2.8
6.3 3.3
5.8 2.7
7.1 3.0
6.3 2.9
6.5 3.0
7.6 3.0
4.9 2.5
7.3 2.9
6.7 2.5
7.2 3.6
6.5 3.2
6.4 2.7
6.8 3.0
5.7 2.5
5.8 2.8
6.4 3.2
6.5 3.0
7.7 3.8
7.7 2.6
6.0 2.2
6.9 3.2
5.6 2.8
7.7 2.8
6.3 2.7
6.7 3.3
7.2 3.2
6.2 2.8
6.1 3.0
6.4 2.8
7.2 3.0
7.4 2.8
7.9 3.8
6.4 2.8
6.3 2.8
6.1 2.6
7.7 3.0
6.3 3.4
6.4 3.1
6.0 3.0
6.9 3.1
6.7 3.1
6.9 3.1
5.8 2.7
6.8 3.2
6.7 3.3
6.7 3.0
6.3 2.5
6.5 3.0
6.2 3.4
5.9 3.0
# using type -
iris %>% 
  select(where(is.numeric)) # wrap predicate functions in where(); the bare form is deprecated
iris data: numeric columns only
Sepal.Length Sepal.Width Petal.Length Petal.Width
5.1 3.5 1.4 0.2
4.9 3.0 1.4 0.2
4.7 3.2 1.3 0.2
4.6 3.1 1.5 0.2
5.0 3.6 1.4 0.2
5.4 3.9 1.7 0.4
4.6 3.4 1.4 0.3
5.0 3.4 1.5 0.2
4.4 2.9 1.4 0.2
4.9 3.1 1.5 0.1
5.4 3.7 1.5 0.2
4.8 3.4 1.6 0.2
4.8 3.0 1.4 0.1
4.3 3.0 1.1 0.1
5.8 4.0 1.2 0.2
5.7 4.4 1.5 0.4
5.4 3.9 1.3 0.4
5.1 3.5 1.4 0.3
5.7 3.8 1.7 0.3
5.1 3.8 1.5 0.3
5.4 3.4 1.7 0.2
5.1 3.7 1.5 0.4
4.6 3.6 1.0 0.2
5.1 3.3 1.7 0.5
4.8 3.4 1.9 0.2
5.0 3.0 1.6 0.2
5.0 3.4 1.6 0.4
5.2 3.5 1.5 0.2
5.2 3.4 1.4 0.2
4.7 3.2 1.6 0.2
4.8 3.1 1.6 0.2
5.4 3.4 1.5 0.4
5.2 4.1 1.5 0.1
5.5 4.2 1.4 0.2
4.9 3.1 1.5 0.2
5.0 3.2 1.2 0.2
5.5 3.5 1.3 0.2
4.9 3.6 1.4 0.1
4.4 3.0 1.3 0.2
5.1 3.4 1.5 0.2
5.0 3.5 1.3 0.3
4.5 2.3 1.3 0.3
4.4 3.2 1.3 0.2
5.0 3.5 1.6 0.6
5.1 3.8 1.9 0.4
4.8 3.0 1.4 0.3
5.1 3.8 1.6 0.2
4.6 3.2 1.4 0.2
5.3 3.7 1.5 0.2
5.0 3.3 1.4 0.2
7.0 3.2 4.7 1.4
6.4 3.2 4.5 1.5
6.9 3.1 4.9 1.5
5.5 2.3 4.0 1.3
6.5 2.8 4.6 1.5
5.7 2.8 4.5 1.3
6.3 3.3 4.7 1.6
4.9 2.4 3.3 1.0
6.6 2.9 4.6 1.3
5.2 2.7 3.9 1.4
5.0 2.0 3.5 1.0
5.9 3.0 4.2 1.5
6.0 2.2 4.0 1.0
6.1 2.9 4.7 1.4
5.6 2.9 3.6 1.3
6.7 3.1 4.4 1.4
5.6 3.0 4.5 1.5
5.8 2.7 4.1 1.0
6.2 2.2 4.5 1.5
5.6 2.5 3.9 1.1
5.9 3.2 4.8 1.8
6.1 2.8 4.0 1.3
6.3 2.5 4.9 1.5
6.1 2.8 4.7 1.2
6.4 2.9 4.3 1.3
6.6 3.0 4.4 1.4
6.8 2.8 4.8 1.4
6.7 3.0 5.0 1.7
6.0 2.9 4.5 1.5
5.7 2.6 3.5 1.0
5.5 2.4 3.8 1.1
5.5 2.4 3.7 1.0
5.8 2.7 3.9 1.2
6.0 2.7 5.1 1.6
5.4 3.0 4.5 1.5
6.0 3.4 4.5 1.6
6.7 3.1 4.7 1.5
6.3 2.3 4.4 1.3
5.6 3.0 4.1 1.3
5.5 2.5 4.0 1.3
5.5 2.6 4.4 1.2
6.1 3.0 4.6 1.4
5.8 2.6 4.0 1.2
5.0 2.3 3.3 1.0
5.6 2.7 4.2 1.3
5.7 3.0 4.2 1.2
5.7 2.9 4.2 1.3
6.2 2.9 4.3 1.3
5.1 2.5 3.0 1.1
5.7 2.8 4.1 1.3
6.3 3.3 6.0 2.5
5.8 2.7 5.1 1.9
7.1 3.0 5.9 2.1
6.3 2.9 5.6 1.8
6.5 3.0 5.8 2.2
7.6 3.0 6.6 2.1
4.9 2.5 4.5 1.7
7.3 2.9 6.3 1.8
6.7 2.5 5.8 1.8
7.2 3.6 6.1 2.5
6.5 3.2 5.1 2.0
6.4 2.7 5.3 1.9
6.8 3.0 5.5 2.1
5.7 2.5 5.0 2.0
5.8 2.8 5.1 2.4
6.4 3.2 5.3 2.3
6.5 3.0 5.5 1.8
7.7 3.8 6.7 2.2
7.7 2.6 6.9 2.3
6.0 2.2 5.0 1.5
6.9 3.2 5.7 2.3
5.6 2.8 4.9 2.0
7.7 2.8 6.7 2.0
6.3 2.7 4.9 1.8
6.7 3.3 5.7 2.1
7.2 3.2 6.0 1.8
6.2 2.8 4.8 1.8
6.1 3.0 4.9 1.8
6.4 2.8 5.6 2.1
7.2 3.0 5.8 1.6
7.4 2.8 6.1 1.9
7.9 3.8 6.4 2.0
6.4 2.8 5.6 2.2
6.3 2.8 5.1 1.5
6.1 2.6 5.6 1.4
7.7 3.0 6.1 2.3
6.3 3.4 5.6 2.4
6.4 3.1 5.5 1.8
6.0 3.0 4.8 1.8
6.9 3.1 5.4 2.1
6.7 3.1 5.6 2.4
6.9 3.1 5.1 2.3
5.8 2.7 5.1 1.9
6.8 3.2 5.9 2.3
6.7 3.3 5.7 2.5
6.7 3.0 5.2 2.3
6.3 2.5 5.0 1.9
6.5 3.0 5.2 2.0
6.2 3.4 5.4 2.3
5.9 3.0 5.1 1.8


The verb select() comes with some selection helpers -

If you want to select all the variables, you can use everything() -

iris %>% 
  select(everything())
iris data: everything
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa
4.6 3.4 1.4 0.3 setosa
5.0 3.4 1.5 0.2 setosa
4.4 2.9 1.4 0.2 setosa
4.9 3.1 1.5 0.1 setosa
5.4 3.7 1.5 0.2 setosa
4.8 3.4 1.6 0.2 setosa
4.8 3.0 1.4 0.1 setosa
4.3 3.0 1.1 0.1 setosa
5.8 4.0 1.2 0.2 setosa
5.7 4.4 1.5 0.4 setosa
5.4 3.9 1.3 0.4 setosa
5.1 3.5 1.4 0.3 setosa
5.7 3.8 1.7 0.3 setosa
5.1 3.8 1.5 0.3 setosa
5.4 3.4 1.7 0.2 setosa
5.1 3.7 1.5 0.4 setosa
4.6 3.6 1.0 0.2 setosa
5.1 3.3 1.7 0.5 setosa
4.8 3.4 1.9 0.2 setosa
5.0 3.0 1.6 0.2 setosa
5.0 3.4 1.6 0.4 setosa
5.2 3.5 1.5 0.2 setosa
5.2 3.4 1.4 0.2 setosa
4.7 3.2 1.6 0.2 setosa
4.8 3.1 1.6 0.2 setosa
5.4 3.4 1.5 0.4 setosa
5.2 4.1 1.5 0.1 setosa
5.5 4.2 1.4 0.2 setosa
4.9 3.1 1.5 0.2 setosa
5.0 3.2 1.2 0.2 setosa
5.5 3.5 1.3 0.2 setosa
4.9 3.6 1.4 0.1 setosa
4.4 3.0 1.3 0.2 setosa
5.1 3.4 1.5 0.2 setosa
5.0 3.5 1.3 0.3 setosa
4.5 2.3 1.3 0.3 setosa
4.4 3.2 1.3 0.2 setosa
5.0 3.5 1.6 0.6 setosa
5.1 3.8 1.9 0.4 setosa
4.8 3.0 1.4 0.3 setosa
5.1 3.8 1.6 0.2 setosa
4.6 3.2 1.4 0.2 setosa
5.3 3.7 1.5 0.2 setosa
5.0 3.3 1.4 0.2 setosa
7.0 3.2 4.7 1.4 versicolor
6.4 3.2 4.5 1.5 versicolor
6.9 3.1 4.9 1.5 versicolor
5.5 2.3 4.0 1.3 versicolor
6.5 2.8 4.6 1.5 versicolor
5.7 2.8 4.5 1.3 versicolor
6.3 3.3 4.7 1.6 versicolor
4.9 2.4 3.3 1.0 versicolor
6.6 2.9 4.6 1.3 versicolor
5.2 2.7 3.9 1.4 versicolor
5.0 2.0 3.5 1.0 versicolor
5.9 3.0 4.2 1.5 versicolor
6.0 2.2 4.0 1.0 versicolor
6.1 2.9 4.7 1.4 versicolor
5.6 2.9 3.6 1.3 versicolor
6.7 3.1 4.4 1.4 versicolor
5.6 3.0 4.5 1.5 versicolor
5.8 2.7 4.1 1.0 versicolor
6.2 2.2 4.5 1.5 versicolor
5.6 2.5 3.9 1.1 versicolor
5.9 3.2 4.8 1.8 versicolor
6.1 2.8 4.0 1.3 versicolor
6.3 2.5 4.9 1.5 versicolor
6.1 2.8 4.7 1.2 versicolor
6.4 2.9 4.3 1.3 versicolor
6.6 3.0 4.4 1.4 versicolor
6.8 2.8 4.8 1.4 versicolor
6.7 3.0 5.0 1.7 versicolor
6.0 2.9 4.5 1.5 versicolor
5.7 2.6 3.5 1.0 versicolor
5.5 2.4 3.8 1.1 versicolor
5.5 2.4 3.7 1.0 versicolor
5.8 2.7 3.9 1.2 versicolor
6.0 2.7 5.1 1.6 versicolor
5.4 3.0 4.5 1.5 versicolor
6.0 3.4 4.5 1.6 versicolor
6.7 3.1 4.7 1.5 versicolor
6.3 2.3 4.4 1.3 versicolor
5.6 3.0 4.1 1.3 versicolor
5.5 2.5 4.0 1.3 versicolor
5.5 2.6 4.4 1.2 versicolor
6.1 3.0 4.6 1.4 versicolor
5.8 2.6 4.0 1.2 versicolor
5.0 2.3 3.3 1.0 versicolor
5.6 2.7 4.2 1.3 versicolor
5.7 3.0 4.2 1.2 versicolor
5.7 2.9 4.2 1.3 versicolor
6.2 2.9 4.3 1.3 versicolor
5.1 2.5 3.0 1.1 versicolor
5.7 2.8 4.1 1.3 versicolor
6.3 3.3 6.0 2.5 virginica
5.8 2.7 5.1 1.9 virginica
7.1 3.0 5.9 2.1 virginica
6.3 2.9 5.6 1.8 virginica
6.5 3.0 5.8 2.2 virginica
7.6 3.0 6.6 2.1 virginica
4.9 2.5 4.5 1.7 virginica
7.3 2.9 6.3 1.8 virginica
6.7 2.5 5.8 1.8 virginica
7.2 3.6 6.1 2.5 virginica
6.5 3.2 5.1 2.0 virginica
6.4 2.7 5.3 1.9 virginica
6.8 3.0 5.5 2.1 virginica
5.7 2.5 5.0 2.0 virginica
5.8 2.8 5.1 2.4 virginica
6.4 3.2 5.3 2.3 virginica
6.5 3.0 5.5 1.8 virginica
7.7 3.8 6.7 2.2 virginica
7.7 2.6 6.9 2.3 virginica
6.0 2.2 5.0 1.5 virginica
6.9 3.2 5.7 2.3 virginica
5.6 2.8 4.9 2.0 virginica
7.7 2.8 6.7 2.0 virginica
6.3 2.7 4.9 1.8 virginica
6.7 3.3 5.7 2.1 virginica
7.2 3.2 6.0 1.8 virginica
6.2 2.8 4.8 1.8 virginica
6.1 3.0 4.9 1.8 virginica
6.4 2.8 5.6 2.1 virginica
7.2 3.0 5.8 1.6 virginica
7.4 2.8 6.1 1.9 virginica
7.9 3.8 6.4 2.0 virginica
6.4 2.8 5.6 2.2 virginica
6.3 2.8 5.1 1.5 virginica
6.1 2.6 5.6 1.4 virginica
7.7 3.0 6.1 2.3 virginica
6.3 3.4 5.6 2.4 virginica
6.4 3.1 5.5 1.8 virginica
6.0 3.0 4.8 1.8 virginica
6.9 3.1 5.4 2.1 virginica
6.7 3.1 5.6 2.4 virginica
6.9 3.1 5.1 2.3 virginica
5.8 2.7 5.1 1.9 virginica
6.8 3.2 5.9 2.3 virginica
6.7 3.3 5.7 2.5 virginica
6.7 3.0 5.2 2.3 virginica
6.3 2.5 5.0 1.9 virginica
6.5 3.0 5.2 2.0 virginica
6.2 3.4 5.4 2.3 virginica
5.9 3.0 5.1 1.8 virginica


You can choose the last column using last_col(), or only the grouping columns using group_cols() (this will make more sense when I discuss the group_by() verb later).

# select the last column
iris %>% 
  select(last_col())
iris data: last_col()
Species
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
versicolor
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
virginica
# select the grouped column(s)
iris %>% 
  group_by(Sepal.Length,Sepal.Width) %>% 
  select(group_cols())
iris data: select grouped columns
Sepal.Length Sepal.Width
5.1 3.5
4.9 3.0
4.7 3.2
4.6 3.1
5.0 3.6
5.4 3.9
4.6 3.4
5.0 3.4
4.4 2.9
4.9 3.1
5.4 3.7
4.8 3.4
4.8 3.0
4.3 3.0
5.8 4.0
5.7 4.4
5.4 3.9
5.1 3.5
5.7 3.8
5.1 3.8
5.4 3.4
5.1 3.7
4.6 3.6
5.1 3.3
4.8 3.4
5.0 3.0
5.0 3.4
5.2 3.5
5.2 3.4
4.7 3.2
4.8 3.1
5.4 3.4
5.2 4.1
5.5 4.2
4.9 3.1
5.0 3.2
5.5 3.5
4.9 3.6
4.4 3.0
5.1 3.4
5.0 3.5
4.5 2.3
4.4 3.2
5.0 3.5
5.1 3.8
4.8 3.0
5.1 3.8
4.6 3.2
5.3 3.7
5.0 3.3
7.0 3.2
6.4 3.2
6.9 3.1
5.5 2.3
6.5 2.8
5.7 2.8
6.3 3.3
4.9 2.4
6.6 2.9
5.2 2.7
5.0 2.0
5.9 3.0
6.0 2.2
6.1 2.9
5.6 2.9
6.7 3.1
5.6 3.0
5.8 2.7
6.2 2.2
5.6 2.5
5.9 3.2
6.1 2.8
6.3 2.5
6.1 2.8
6.4 2.9
6.6 3.0
6.8 2.8
6.7 3.0
6.0 2.9
5.7 2.6
5.5 2.4
5.5 2.4
5.8 2.7
6.0 2.7
5.4 3.0
6.0 3.4
6.7 3.1
6.3 2.3
5.6 3.0
5.5 2.5
5.5 2.6
6.1 3.0
5.8 2.6
5.0 2.3
5.6 2.7
5.7 3.0
5.7 2.9
6.2 2.9
5.1 2.5
5.7 2.8
6.3 3.3
5.8 2.7
7.1 3.0
6.3 2.9
6.5 3.0
7.6 3.0
4.9 2.5
7.3 2.9
6.7 2.5
7.2 3.6
6.5 3.2
6.4 2.7
6.8 3.0
5.7 2.5
5.8 2.8
6.4 3.2
6.5 3.0
7.7 3.8
7.7 2.6
6.0 2.2
6.9 3.2
5.6 2.8
7.7 2.8
6.3 2.7
6.7 3.3
7.2 3.2
6.2 2.8
6.1 3.0
6.4 2.8
7.2 3.0
7.4 2.8
7.9 3.8
6.4 2.8
6.3 2.8
6.1 2.6
7.7 3.0
6.3 3.4
6.4 3.1
6.0 3.0
6.9 3.1
6.7 3.1
6.9 3.1
5.8 2.7
6.8 3.2
6.7 3.3
6.7 3.0
6.3 2.5
6.5 3.0
6.2 3.4
5.9 3.0


If there’s a common prefix or suffix to some column names, you can utilise that by using selection helpers starts_with() or ends_with(), respectively -

# starts_with()
iris %>% 
  select(starts_with("Sepal"))
iris data: columns starting with Sepal
Sepal.Length Sepal.Width
5.1 3.5
4.9 3.0
4.7 3.2
4.6 3.1
5.0 3.6
5.4 3.9
4.6 3.4
5.0 3.4
4.4 2.9
4.9 3.1
5.4 3.7
4.8 3.4
4.8 3.0
4.3 3.0
5.8 4.0
5.7 4.4
5.4 3.9
5.1 3.5
5.7 3.8
5.1 3.8
5.4 3.4
5.1 3.7
4.6 3.6
5.1 3.3
4.8 3.4
5.0 3.0
5.0 3.4
5.2 3.5
5.2 3.4
4.7 3.2
4.8 3.1
5.4 3.4
5.2 4.1
5.5 4.2
4.9 3.1
5.0 3.2
5.5 3.5
4.9 3.6
4.4 3.0
5.1 3.4
5.0 3.5
4.5 2.3
4.4 3.2
5.0 3.5
5.1 3.8
4.8 3.0
5.1 3.8
4.6 3.2
5.3 3.7
5.0 3.3
7.0 3.2
6.4 3.2
6.9 3.1
5.5 2.3
6.5 2.8
5.7 2.8
6.3 3.3
4.9 2.4
6.6 2.9
5.2 2.7
5.0 2.0
5.9 3.0
6.0 2.2
6.1 2.9
5.6 2.9
6.7 3.1
5.6 3.0
5.8 2.7
6.2 2.2
5.6 2.5
5.9 3.2
6.1 2.8
6.3 2.5
6.1 2.8
6.4 2.9
6.6 3.0
6.8 2.8
6.7 3.0
6.0 2.9
5.7 2.6
5.5 2.4
5.5 2.4
5.8 2.7
6.0 2.7
5.4 3.0
6.0 3.4
6.7 3.1
6.3 2.3
5.6 3.0
5.5 2.5
5.5 2.6
6.1 3.0
5.8 2.6
5.0 2.3
5.6 2.7
5.7 3.0
5.7 2.9
6.2 2.9
5.1 2.5
5.7 2.8
6.3 3.3
5.8 2.7
7.1 3.0
6.3 2.9
6.5 3.0
7.6 3.0
4.9 2.5
7.3 2.9
6.7 2.5
7.2 3.6
6.5 3.2
6.4 2.7
6.8 3.0
5.7 2.5
5.8 2.8
6.4 3.2
6.5 3.0
7.7 3.8
7.7 2.6
6.0 2.2
6.9 3.2
5.6 2.8
7.7 2.8
6.3 2.7
6.7 3.3
7.2 3.2
6.2 2.8
6.1 3.0
6.4 2.8
7.2 3.0
7.4 2.8
7.9 3.8
6.4 2.8
6.3 2.8
6.1 2.6
7.7 3.0
6.3 3.4
6.4 3.1
6.0 3.0
6.9 3.1
6.7 3.1
6.9 3.1
5.8 2.7
6.8 3.2
6.7 3.3
6.7 3.0
6.3 2.5
6.5 3.0
6.2 3.4
5.9 3.0
# ends_with()
iris %>% 
  select(ends_with("Length"))
iris data: columns ending with Length
Sepal.Length Petal.Length
5.1 1.4
4.9 1.4
4.7 1.3
4.6 1.5
5.0 1.4
5.4 1.7
4.6 1.4
5.0 1.5
4.4 1.4
4.9 1.5
5.4 1.5
4.8 1.6
4.8 1.4
4.3 1.1
5.8 1.2
5.7 1.5
5.4 1.3
5.1 1.4
5.7 1.7
5.1 1.5
5.4 1.7
5.1 1.5
4.6 1.0
5.1 1.7
4.8 1.9
5.0 1.6
5.0 1.6
5.2 1.5
5.2 1.4
4.7 1.6
4.8 1.6
5.4 1.5
5.2 1.5
5.5 1.4
4.9 1.5
5.0 1.2
5.5 1.3
4.9 1.4
4.4 1.3
5.1 1.5
5.0 1.3
4.5 1.3
4.4 1.3
5.0 1.6
5.1 1.9
4.8 1.4
5.1 1.6
4.6 1.4
5.3 1.5
5.0 1.4
7.0 4.7
6.4 4.5
6.9 4.9
5.5 4.0
6.5 4.6
5.7 4.5
6.3 4.7
4.9 3.3
6.6 4.6
5.2 3.9
5.0 3.5
5.9 4.2
6.0 4.0
6.1 4.7
5.6 3.6
6.7 4.4
5.6 4.5
5.8 4.1
6.2 4.5
5.6 3.9
5.9 4.8
6.1 4.0
6.3 4.9
6.1 4.7
6.4 4.3
6.6 4.4
6.8 4.8
6.7 5.0
6.0 4.5
5.7 3.5
5.5 3.8
5.5 3.7
5.8 3.9
6.0 5.1
5.4 4.5
6.0 4.5
6.7 4.7
6.3 4.4
5.6 4.1
5.5 4.0
5.5 4.4
6.1 4.6
5.8 4.0
5.0 3.3
5.6 4.2
5.7 4.2
5.7 4.2
6.2 4.3
5.1 3.0
5.7 4.1
6.3 6.0
5.8 5.1
7.1 5.9
6.3 5.6
6.5 5.8
7.6 6.6
4.9 4.5
7.3 6.3
6.7 5.8
7.2 6.1
6.5 5.1
6.4 5.3
6.8 5.5
5.7 5.0
5.8 5.1
6.4 5.3
6.5 5.5
7.7 6.7
7.7 6.9
6.0 5.0
6.9 5.7
5.6 4.9
7.7 6.7
6.3 4.9
6.7 5.7
7.2 6.0
6.2 4.8
6.1 4.9
6.4 5.6
7.2 5.8
7.4 6.1
7.9 6.4
6.4 5.6
6.3 5.1
6.1 5.6
7.7 6.1
6.3 5.6
6.4 5.5
6.0 4.8
6.9 5.4
6.7 5.6
6.9 5.1
5.8 5.1
6.8 5.9
6.7 5.7
6.7 5.2
6.3 5.0
6.5 5.2
6.2 5.4
5.9 5.1


A pattern anywhere inside the name can also be used to select columns, by using contains() -

iris %>% 
  select(contains("dth"))
iris data: column names containing ‘dth’
Sepal.Width Petal.Width
3.5 0.2
3.0 0.2
3.2 0.2
3.1 0.2
3.6 0.2
3.9 0.4
3.4 0.3
3.4 0.2
2.9 0.2
3.1 0.1
3.7 0.2
3.4 0.2
3.0 0.1
3.0 0.1
4.0 0.2
4.4 0.4
3.9 0.4
3.5 0.3
3.8 0.3
3.8 0.3
3.4 0.2
3.7 0.4
3.6 0.2
3.3 0.5
3.4 0.2
3.0 0.2
3.4 0.4
3.5 0.2
3.4 0.2
3.2 0.2
3.1 0.2
3.4 0.4
4.1 0.1
4.2 0.2
3.1 0.2
3.2 0.2
3.5 0.2
3.6 0.1
3.0 0.2
3.4 0.2
3.5 0.3
2.3 0.3
3.2 0.2
3.5 0.6
3.8 0.4
3.0 0.3
3.8 0.2
3.2 0.2
3.7 0.2
3.3 0.2
3.2 1.4
3.2 1.5
3.1 1.5
2.3 1.3
2.8 1.5
2.8 1.3
3.3 1.6
2.4 1.0
2.9 1.3
2.7 1.4
2.0 1.0
3.0 1.5
2.2 1.0
2.9 1.4
2.9 1.3
3.1 1.4
3.0 1.5
2.7 1.0
2.2 1.5
2.5 1.1
3.2 1.8
2.8 1.3
2.5 1.5
2.8 1.2
2.9 1.3
3.0 1.4
2.8 1.4
3.0 1.7
2.9 1.5
2.6 1.0
2.4 1.1
2.4 1.0
2.7 1.2
2.7 1.6
3.0 1.5
3.4 1.6
3.1 1.5
2.3 1.3
3.0 1.3
2.5 1.3
2.6 1.2
3.0 1.4
2.6 1.2
2.3 1.0
2.7 1.3
3.0 1.2
2.9 1.3
2.9 1.3
2.5 1.1
2.8 1.3
3.3 2.5
2.7 1.9
3.0 2.1
2.9 1.8
3.0 2.2
3.0 2.1
2.5 1.7
2.9 1.8
2.5 1.8
3.6 2.5
3.2 2.0
2.7 1.9
3.0 2.1
2.5 2.0
2.8 2.4
3.2 2.3
3.0 1.8
3.8 2.2
2.6 2.3
2.2 1.5
3.2 2.3
2.8 2.0
2.8 2.0
2.7 1.8
3.3 2.1
3.2 1.8
2.8 1.8
3.0 1.8
2.8 2.1
3.0 1.6
2.8 1.9
3.8 2.0
2.8 2.2
2.8 1.5
2.6 1.4
3.0 2.3
3.4 2.4
3.1 1.8
3.0 1.8
3.1 2.1
3.1 2.4
3.1 2.3
2.7 1.9
3.2 2.3
3.3 2.5
3.0 2.3
2.5 1.9
3.0 2.0
3.4 2.3
3.0 1.8


You can even use a regular expression to select columns, by using matches() -

# column name containing either W or d or both
iris %>% 
  select(matches("[Wd]"))
iris data: column name containing W or d
Sepal.Width Petal.Width
3.5 0.2
3.0 0.2
3.2 0.2
3.1 0.2
3.6 0.2
3.9 0.4
3.4 0.3
3.4 0.2
2.9 0.2
3.1 0.1
3.7 0.2
3.4 0.2
3.0 0.1
3.0 0.1
4.0 0.2
4.4 0.4
3.9 0.4
3.5 0.3
3.8 0.3
3.8 0.3
3.4 0.2
3.7 0.4
3.6 0.2
3.3 0.5
3.4 0.2
3.0 0.2
3.4 0.4
3.5 0.2
3.4 0.2
3.2 0.2
3.1 0.2
3.4 0.4
4.1 0.1
4.2 0.2
3.1 0.2
3.2 0.2
3.5 0.2
3.6 0.1
3.0 0.2
3.4 0.2
3.5 0.3
2.3 0.3
3.2 0.2
3.5 0.6
3.8 0.4
3.0 0.3
3.8 0.2
3.2 0.2
3.7 0.2
3.3 0.2
3.2 1.4
3.2 1.5
3.1 1.5
2.3 1.3
2.8 1.5
2.8 1.3
3.3 1.6
2.4 1.0
2.9 1.3
2.7 1.4
2.0 1.0
3.0 1.5
2.2 1.0
2.9 1.4
2.9 1.3
3.1 1.4
3.0 1.5
2.7 1.0
2.2 1.5
2.5 1.1
3.2 1.8
2.8 1.3
2.5 1.5
2.8 1.2
2.9 1.3
3.0 1.4
2.8 1.4
3.0 1.7
2.9 1.5
2.6 1.0
2.4 1.1
2.4 1.0
2.7 1.2
2.7 1.6
3.0 1.5
3.4 1.6
3.1 1.5
2.3 1.3
3.0 1.3
2.5 1.3
2.6 1.2
3.0 1.4
2.6 1.2
2.3 1.0
2.7 1.3
3.0 1.2
2.9 1.3
2.9 1.3
2.5 1.1
2.8 1.3
3.3 2.5
2.7 1.9
3.0 2.1
2.9 1.8
3.0 2.2
3.0 2.1
2.5 1.7
2.9 1.8
2.5 1.8
3.6 2.5
3.2 2.0
2.7 1.9
3.0 2.1
2.5 2.0
2.8 2.4
3.2 2.3
3.0 1.8
3.8 2.2
2.6 2.3
2.2 1.5
3.2 2.3
2.8 2.0
2.8 2.0
2.7 1.8
3.3 2.1
3.2 1.8
2.8 1.8
3.0 1.8
2.8 2.1
3.0 1.6
2.8 1.9
3.8 2.0
2.8 2.2
2.8 1.5
2.6 1.4
3.0 2.3
3.4 2.4
3.1 1.8
3.0 1.8
3.1 2.1
3.1 2.4
3.1 2.3
2.7 1.9
3.2 2.3
3.3 2.5
3.0 2.3
2.5 1.9
3.0 2.0
3.4 2.3
3.0 1.8
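As a compact recap of the selection helpers above, the sketch below pipes each selection into names(), so you can see which columns a helper keeps without printing all 150 rows. The variable names (by_name, by_type and so on) are only illustrative, and where() assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# names() on the result lists the columns each selection keeps
by_name   <- iris %>% select(Sepal.Length, Sepal.Width) %>% names()
by_type   <- iris %>% select(where(is.numeric)) %>% names()
by_prefix <- iris %>% select(starts_with("Sepal")) %>% names()
by_suffix <- iris %>% select(ends_with("Length")) %>% names()
by_substr <- iris %>% select(contains("dth")) %>% names()
by_regex  <- iris %>% select(matches("[Wd]")) %>% names()
last_one  <- iris %>% select(last_col()) %>% names()

by_prefix
by_suffix
last_one
```

Checking names() first is a cheap way to confirm that a helper picks up the columns you expect before you run the rest of a pipeline on a large data-frame.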

2.3.2 filter()

The filter() verb is used to subset a data-frame based on one or more conditions imposed on its rows. Only the rows whose values satisfy the condition(s) are kept; all other rows are dropped entirely. There are some functions and operators that you should know while dealing with the filter() verb, like -

==, !=, >, <, >=, <=
&, |, !
is.na()
%in%

Let’s see some examples -

# choose the rows whose Petal.Width is greater than 2
iris %>% 
  filter(Petal.Width > 2)
iris data: Petal width greater than 2
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
6.3 3.3 6.0 2.5 virginica
7.1 3.0 5.9 2.1 virginica
6.5 3.0 5.8 2.2 virginica
7.6 3.0 6.6 2.1 virginica
7.2 3.6 6.1 2.5 virginica
6.8 3.0 5.5 2.1 virginica
5.8 2.8 5.1 2.4 virginica
6.4 3.2 5.3 2.3 virginica
7.7 3.8 6.7 2.2 virginica
7.7 2.6 6.9 2.3 virginica
6.9 3.2 5.7 2.3 virginica
6.7 3.3 5.7 2.1 virginica
6.4 2.8 5.6 2.1 virginica
6.4 2.8 5.6 2.2 virginica
7.7 3.0 6.1 2.3 virginica
6.3 3.4 5.6 2.4 virginica
6.9 3.1 5.4 2.1 virginica
6.7 3.1 5.6 2.4 virginica
6.9 3.1 5.1 2.3 virginica
6.8 3.2 5.9 2.3 virginica
6.7 3.3 5.7 2.5 virginica
6.7 3.0 5.2 2.3 virginica
6.2 3.4 5.4 2.3 virginica
# choose the rows for setosa Species
iris %>% 
  filter(Species == "setosa")
  # equivalent: filter(Species %in% "setosa")
iris data: setosa only
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa
4.6 3.4 1.4 0.3 setosa
5.0 3.4 1.5 0.2 setosa
4.4 2.9 1.4 0.2 setosa
4.9 3.1 1.5 0.1 setosa
5.4 3.7 1.5 0.2 setosa
4.8 3.4 1.6 0.2 setosa
4.8 3.0 1.4 0.1 setosa
4.3 3.0 1.1 0.1 setosa
5.8 4.0 1.2 0.2 setosa
5.7 4.4 1.5 0.4 setosa
5.4 3.9 1.3 0.4 setosa
5.1 3.5 1.4 0.3 setosa
5.7 3.8 1.7 0.3 setosa
5.1 3.8 1.5 0.3 setosa
5.4 3.4 1.7 0.2 setosa
5.1 3.7 1.5 0.4 setosa
4.6 3.6 1.0 0.2 setosa
5.1 3.3 1.7 0.5 setosa
4.8 3.4 1.9 0.2 setosa
5.0 3.0 1.6 0.2 setosa
5.0 3.4 1.6 0.4 setosa
5.2 3.5 1.5 0.2 setosa
5.2 3.4 1.4 0.2 setosa
4.7 3.2 1.6 0.2 setosa
4.8 3.1 1.6 0.2 setosa
5.4 3.4 1.5 0.4 setosa
5.2 4.1 1.5 0.1 setosa
5.5 4.2 1.4 0.2 setosa
4.9 3.1 1.5 0.2 setosa
5.0 3.2 1.2 0.2 setosa
5.5 3.5 1.3 0.2 setosa
4.9 3.6 1.4 0.1 setosa
4.4 3.0 1.3 0.2 setosa
5.1 3.4 1.5 0.2 setosa
5.0 3.5 1.3 0.3 setosa
4.5 2.3 1.3 0.3 setosa
4.4 3.2 1.3 0.2 setosa
5.0 3.5 1.6 0.6 setosa
5.1 3.8 1.9 0.4 setosa
4.8 3.0 1.4 0.3 setosa
5.1 3.8 1.6 0.2 setosa
4.6 3.2 1.4 0.2 setosa
5.3 3.7 1.5 0.2 setosa
5.0 3.3 1.4 0.2 setosa
# or the opposite: every species except setosa
iris %>% filter(Species != "setosa")
iris data: without setosa
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
7.0 3.2 4.7 1.4 versicolor
6.4 3.2 4.5 1.5 versicolor
6.9 3.1 4.9 1.5 versicolor
5.5 2.3 4.0 1.3 versicolor
6.5 2.8 4.6 1.5 versicolor
5.7 2.8 4.5 1.3 versicolor
6.3 3.3 4.7 1.6 versicolor
4.9 2.4 3.3 1.0 versicolor
6.6 2.9 4.6 1.3 versicolor
5.2 2.7 3.9 1.4 versicolor
5.0 2.0 3.5 1.0 versicolor
5.9 3.0 4.2 1.5 versicolor
6.0 2.2 4.0 1.0 versicolor
6.1 2.9 4.7 1.4 versicolor
5.6 2.9 3.6 1.3 versicolor
6.7 3.1 4.4 1.4 versicolor
5.6 3.0 4.5 1.5 versicolor
5.8 2.7 4.1 1.0 versicolor
6.2 2.2 4.5 1.5 versicolor
5.6 2.5 3.9 1.1 versicolor
5.9 3.2 4.8 1.8 versicolor
6.1 2.8 4.0 1.3 versicolor
6.3 2.5 4.9 1.5 versicolor
6.1 2.8 4.7 1.2 versicolor
6.4 2.9 4.3 1.3 versicolor
6.6 3.0 4.4 1.4 versicolor
6.8 2.8 4.8 1.4 versicolor
6.7 3.0 5.0 1.7 versicolor
6.0 2.9 4.5 1.5 versicolor
5.7 2.6 3.5 1.0 versicolor
5.5 2.4 3.8 1.1 versicolor
5.5 2.4 3.7 1.0 versicolor
5.8 2.7 3.9 1.2 versicolor
6.0 2.7 5.1 1.6 versicolor
5.4 3.0 4.5 1.5 versicolor
6.0 3.4 4.5 1.6 versicolor
6.7 3.1 4.7 1.5 versicolor
6.3 2.3 4.4 1.3 versicolor
5.6 3.0 4.1 1.3 versicolor
5.5 2.5 4.0 1.3 versicolor
5.5 2.6 4.4 1.2 versicolor
6.1 3.0 4.6 1.4 versicolor
5.8 2.6 4.0 1.2 versicolor
5.0 2.3 3.3 1.0 versicolor
5.6 2.7 4.2 1.3 versicolor
5.7 3.0 4.2 1.2 versicolor
5.7 2.9 4.2 1.3 versicolor
6.2 2.9 4.3 1.3 versicolor
5.1 2.5 3.0 1.1 versicolor
5.7 2.8 4.1 1.3 versicolor
6.3 3.3 6.0 2.5 virginica
5.8 2.7 5.1 1.9 virginica
7.1 3.0 5.9 2.1 virginica
6.3 2.9 5.6 1.8 virginica
6.5 3.0 5.8 2.2 virginica
7.6 3.0 6.6 2.1 virginica
4.9 2.5 4.5 1.7 virginica
7.3 2.9 6.3 1.8 virginica
6.7 2.5 5.8 1.8 virginica
7.2 3.6 6.1 2.5 virginica
6.5 3.2 5.1 2.0 virginica
6.4 2.7 5.3 1.9 virginica
6.8 3.0 5.5 2.1 virginica
5.7 2.5 5.0 2.0 virginica
5.8 2.8 5.1 2.4 virginica
6.4 3.2 5.3 2.3 virginica
6.5 3.0 5.5 1.8 virginica
7.7 3.8 6.7 2.2 virginica
7.7 2.6 6.9 2.3 virginica
6.0 2.2 5.0 1.5 virginica
6.9 3.2 5.7 2.3 virginica
5.6 2.8 4.9 2.0 virginica
7.7 2.8 6.7 2.0 virginica
6.3 2.7 4.9 1.8 virginica
6.7 3.3 5.7 2.1 virginica
7.2 3.2 6.0 1.8 virginica
6.2 2.8 4.8 1.8 virginica
6.1 3.0 4.9 1.8 virginica
6.4 2.8 5.6 2.1 virginica
7.2 3.0 5.8 1.6 virginica
7.4 2.8 6.1 1.9 virginica
7.9 3.8 6.4 2.0 virginica
6.4 2.8 5.6 2.2 virginica
6.3 2.8 5.1 1.5 virginica
6.1 2.6 5.6 1.4 virginica
7.7 3.0 6.1 2.3 virginica
6.3 3.4 5.6 2.4 virginica
6.4 3.1 5.5 1.8 virginica
6.0 3.0 4.8 1.8 virginica
6.9 3.1 5.4 2.1 virginica
6.7 3.1 5.6 2.4 virginica
6.9 3.1 5.1 2.3 virginica
5.8 2.7 5.1 1.9 virginica
6.8 3.2 5.9 2.3 virginica
6.7 3.3 5.7 2.5 virginica
6.7 3.0 5.2 2.3 virginica
6.3 2.5 5.0 1.9 virginica
6.5 3.0 5.2 2.0 virginica
6.2 3.4 5.4 2.3 virginica
5.9 3.0 5.1 1.8 virginica
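The examples above each use a single condition. Here is a hedged sketch of how the logical operators, %in% and is.na() listed earlier can be combined inside filter(); the thresholds and the result names (wide_virginica, extreme_sepals, two_species, complete_rows) are chosen purely for illustration.

```r
library(dplyr)

# & : both conditions must hold (a comma between conditions means the same)
wide_virginica <- iris %>%
  filter(Species == "virginica" & Petal.Width > 2)

# | : at least one condition must hold
extreme_sepals <- iris %>%
  filter(Sepal.Length > 7.5 | Sepal.Width > 4)

# %in% : match against several values at once
two_species <- iris %>%
  filter(Species %in% c("setosa", "versicolor"))

# !is.na() : keep rows where the value is not missing
# (iris has no missing values, so nothing is dropped here)
complete_rows <- iris %>%
  filter(!is.na(Sepal.Length))

nrow(wide_virginica)
nrow(extreme_sepals)
nrow(two_species)
nrow(complete_rows)
```

Comparing nrow() before and after a filter is a quick sanity check that a condition keeps roughly the number of rows you expect.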

2.3.3 mutate()

The verb mutate() creates new columns, and the values of a new column are often computed from existing variables (i.e. columns).

iris %>% 
  mutate(Length_difference = Sepal.Length - Petal.Length) # not that the new column makes much sense here
iris data: new column added
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Length_difference
5.1 3.5 1.4 0.2 setosa 3.7
4.9 3.0 1.4 0.2 setosa 3.5
4.7 3.2 1.3 0.2 setosa 3.4
4.6 3.1 1.5 0.2 setosa 3.1
5.0 3.6 1.4 0.2 setosa 3.6
5.4 3.9 1.7 0.4 setosa 3.7
4.6 3.4 1.4 0.3 setosa 3.2
5.0 3.4 1.5 0.2 setosa 3.5
4.4 2.9 1.4 0.2 setosa 3.0
4.9 3.1 1.5 0.1 setosa 3.4
5.4 3.7 1.5 0.2 setosa 3.9
4.8 3.4 1.6 0.2 setosa 3.2
4.8 3.0 1.4 0.1 setosa 3.4
4.3 3.0 1.1 0.1 setosa 3.2
5.8 4.0 1.2 0.2 setosa 4.6
5.7 4.4 1.5 0.4 setosa 4.2
5.4 3.9 1.3 0.4 setosa 4.1
5.1 3.5 1.4 0.3 setosa 3.7
5.7 3.8 1.7 0.3 setosa 4.0
5.1 3.8 1.5 0.3 setosa 3.6
5.4 3.4 1.7 0.2 setosa 3.7
5.1 3.7 1.5 0.4 setosa 3.6
4.6 3.6 1.0 0.2 setosa 3.6
5.1 3.3 1.7 0.5 setosa 3.4
4.8 3.4 1.9 0.2 setosa 2.9
5.0 3.0 1.6 0.2 setosa 3.4
5.0 3.4 1.6 0.4 setosa 3.4
5.2 3.5 1.5 0.2 setosa 3.7
5.2 3.4 1.4 0.2 setosa 3.8
4.7 3.2 1.6 0.2 setosa 3.1
4.8 3.1 1.6 0.2 setosa 3.2
5.4 3.4 1.5 0.4 setosa 3.9
5.2 4.1 1.5 0.1 setosa 3.7
5.5 4.2 1.4 0.2 setosa 4.1
4.9 3.1 1.5 0.2 setosa 3.4
5.0 3.2 1.2 0.2 setosa 3.8
5.5 3.5 1.3 0.2 setosa 4.2
4.9 3.6 1.4 0.1 setosa 3.5
4.4 3.0 1.3 0.2 setosa 3.1
5.1 3.4 1.5 0.2 setosa 3.6
5.0 3.5 1.3 0.3 setosa 3.7
4.5 2.3 1.3 0.3 setosa 3.2
4.4 3.2 1.3 0.2 setosa 3.1
5.0 3.5 1.6 0.6 setosa 3.4
5.1 3.8 1.9 0.4 setosa 3.2
4.8 3.0 1.4 0.3 setosa 3.4
5.1 3.8 1.6 0.2 setosa 3.5
4.6 3.2 1.4 0.2 setosa 3.2
5.3 3.7 1.5 0.2 setosa 3.8
5.0 3.3 1.4 0.2 setosa 3.6
7.0 3.2 4.7 1.4 versicolor 2.3
6.4 3.2 4.5 1.5 versicolor 1.9
6.9 3.1 4.9 1.5 versicolor 2.0
5.5 2.3 4.0 1.3 versicolor 1.5
6.5 2.8 4.6 1.5 versicolor 1.9
5.7 2.8 4.5 1.3 versicolor 1.2
6.3 3.3 4.7 1.6 versicolor 1.6
4.9 2.4 3.3 1.0 versicolor 1.6
6.6 2.9 4.6 1.3 versicolor 2.0
5.2 2.7 3.9 1.4 versicolor 1.3
5.0 2.0 3.5 1.0 versicolor 1.5
5.9 3.0 4.2 1.5 versicolor 1.7
6.0 2.2 4.0 1.0 versicolor 2.0
6.1 2.9 4.7 1.4 versicolor 1.4
5.6 2.9 3.6 1.3 versicolor 2.0
6.7 3.1 4.4 1.4 versicolor 2.3
5.6 3.0 4.5 1.5 versicolor 1.1
5.8 2.7 4.1 1.0 versicolor 1.7
6.2 2.2 4.5 1.5 versicolor 1.7
5.6 2.5 3.9 1.1 versicolor 1.7
5.9 3.2 4.8 1.8 versicolor 1.1
6.1 2.8 4.0 1.3 versicolor 2.1
6.3 2.5 4.9 1.5 versicolor 1.4
6.1 2.8 4.7 1.2 versicolor 1.4
6.4 2.9 4.3 1.3 versicolor 2.1
6.6 3.0 4.4 1.4 versicolor 2.2
6.8 2.8 4.8 1.4 versicolor 2.0
6.7 3.0 5.0 1.7 versicolor 1.7
6.0 2.9 4.5 1.5 versicolor 1.5
5.7 2.6 3.5 1.0 versicolor 2.2
5.5 2.4 3.8 1.1 versicolor 1.7
5.5 2.4 3.7 1.0 versicolor 1.8
5.8 2.7 3.9 1.2 versicolor 1.9
6.0 2.7 5.1 1.6 versicolor 0.9
5.4 3.0 4.5 1.5 versicolor 0.9
6.0 3.4 4.5 1.6 versicolor 1.5
6.7 3.1 4.7 1.5 versicolor 2.0
6.3 2.3 4.4 1.3 versicolor 1.9
5.6 3.0 4.1 1.3 versicolor 1.5
5.5 2.5 4.0 1.3 versicolor 1.5
5.5 2.6 4.4 1.2 versicolor 1.1
6.1 3.0 4.6 1.4 versicolor 1.5
5.8 2.6 4.0 1.2 versicolor 1.8
5.0 2.3 3.3 1.0 versicolor 1.7
5.6 2.7 4.2 1.3 versicolor 1.4
5.7 3.0 4.2 1.2 versicolor 1.5
5.7 2.9 4.2 1.3 versicolor 1.5
6.2 2.9 4.3 1.3 versicolor 1.9
5.1 2.5 3.0 1.1 versicolor 2.1
5.7 2.8 4.1 1.3 versicolor 1.6
6.3 3.3 6.0 2.5 virginica 0.3
5.8 2.7 5.1 1.9 virginica 0.7
7.1 3.0 5.9 2.1 virginica 1.2
6.3 2.9 5.6 1.8 virginica 0.7
6.5 3.0 5.8 2.2 virginica 0.7
7.6 3.0 6.6 2.1 virginica 1.0
4.9 2.5 4.5 1.7 virginica 0.4
7.3 2.9 6.3 1.8 virginica 1.0
6.7 2.5 5.8 1.8 virginica 0.9
7.2 3.6 6.1 2.5 virginica 1.1
6.5 3.2 5.1 2.0 virginica 1.4
6.4 2.7 5.3 1.9 virginica 1.1
6.8 3.0 5.5 2.1 virginica 1.3
5.7 2.5 5.0 2.0 virginica 0.7
5.8 2.8 5.1 2.4 virginica 0.7
6.4 3.2 5.3 2.3 virginica 1.1
6.5 3.0 5.5 1.8 virginica 1.0
7.7 3.8 6.7 2.2 virginica 1.0
7.7 2.6 6.9 2.3 virginica 0.8
6.0 2.2 5.0 1.5 virginica 1.0
6.9 3.2 5.7 2.3 virginica 1.2
5.6 2.8 4.9 2.0 virginica 0.7
7.7 2.8 6.7 2.0 virginica 1.0
6.3 2.7 4.9 1.8 virginica 1.4
6.7 3.3 5.7 2.1 virginica 1.0
7.2 3.2 6.0 1.8 virginica 1.2
6.2 2.8 4.8 1.8 virginica 1.4
6.1 3.0 4.9 1.8 virginica 1.2
6.4 2.8 5.6 2.1 virginica 0.8
7.2 3.0 5.8 1.6 virginica 1.4
7.4 2.8 6.1 1.9 virginica 1.3
7.9 3.8 6.4 2.0 virginica 1.5
6.4 2.8 5.6 2.2 virginica 0.8
6.3 2.8 5.1 1.5 virginica 1.2
6.1 2.6 5.6 1.4 virginica 0.5
7.7 3.0 6.1 2.3 virginica 1.6
6.3 3.4 5.6 2.4 virginica 0.7
6.4 3.1 5.5 1.8 virginica 0.9
6.0 3.0 4.8 1.8 virginica 1.2
6.9 3.1 5.4 2.1 virginica 1.5
6.7 3.1 5.6 2.4 virginica 1.1
6.9 3.1 5.1 2.3 virginica 1.8
5.8 2.7 5.1 1.9 virginica 0.7
6.8 3.2 5.9 2.3 virginica 0.9
6.7 3.3 5.7 2.5 virginica 1.0
6.7 3.0 5.2 2.3 virginica 1.5
6.3 2.5 5.0 1.9 virginica 1.3
6.5 3.0 5.2 2.0 virginica 1.3
6.2 3.4 5.4 2.3 virginica 0.8
5.9 3.0 5.1 1.8 virginica 0.8
# To keep only the newly created column, use transmute()
iris %>% 
  transmute(Length_difference = Sepal.Length - Petal.Length)
iris data: new column only
Length_difference
3.7
3.5
3.4
3.1
3.6
3.7
3.2
3.5
3.0
3.4
3.9
3.2
3.4
3.2
4.6
4.2
4.1
3.7
4.0
3.6
3.7
3.6
3.6
3.4
2.9
3.4
3.4
3.7
3.8
3.1
3.2
3.9
3.7
4.1
3.4
3.8
4.2
3.5
3.1
3.6
3.7
3.2
3.1
3.4
3.2
3.4
3.5
3.2
3.8
3.6
2.3
1.9
2.0
1.5
1.9
1.2
1.6
1.6
2.0
1.3
1.5
1.7
2.0
1.4
2.0
2.3
1.1
1.7
1.7
1.7
1.1
2.1
1.4
1.4
2.1
2.2
2.0
1.7
1.5
2.2
1.7
1.8
1.9
0.9
0.9
1.5
2.0
1.9
1.5
1.5
1.1
1.5
1.8
1.7
1.4
1.5
1.5
1.9
2.1
1.6
0.3
0.7
1.2
0.7
0.7
1.0
0.4
1.0
0.9
1.1
1.4
1.1
1.3
0.7
0.7
1.1
1.0
1.0
0.8
1.0
1.2
0.7
1.0
1.4
1.0
1.2
1.4
1.2
0.8
1.4
1.3
1.5
0.8
1.2
0.5
1.6
0.7
0.9
1.2
1.5
1.1
1.8
0.7
0.9
1.0
1.5
1.3
1.3
0.8
0.8


Interestingly, setting an existing column to NULL inside mutate() deletes that column.
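A minimal sketch of that behaviour, using Species as the column to drop (the name no_species is just an illustrative choice) -

```r
library(dplyr)

# assigning NULL to an existing column drops it from the result
no_species <- iris %>%
  mutate(Species = NULL)

names(no_species)
ncol(no_species)

# the original data-frame is untouched; mutate() returns a modified copy
ncol(iris)
```

For simply dropping columns, select(-Species) expresses the same intent; the NULL form is handy when you are already inside a mutate() call.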

2.3.4 rename()

As the name suggests, the rename() verb changes the name of an existing column. The syntax is <new_name> = <old_name>. Example -

iris %>% 
  rename(Species.name = Species)
iris data: Species column renamed
Sepal.Length Sepal.Width Petal.Length Petal.Width Species.name
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa
4.6 3.4 1.4 0.3 setosa
5.0 3.4 1.5 0.2 setosa
4.4 2.9 1.4 0.2 setosa
4.9 3.1 1.5 0.1 setosa
5.4 3.7 1.5 0.2 setosa
4.8 3.4 1.6 0.2 setosa
4.8 3.0 1.4 0.1 setosa
4.3 3.0 1.1 0.1 setosa
5.8 4.0 1.2 0.2 setosa
5.7 4.4 1.5 0.4 setosa
5.4 3.9 1.3 0.4 setosa
5.1 3.5 1.4 0.3 setosa
5.7 3.8 1.7 0.3 setosa
5.1 3.8 1.5 0.3 setosa
5.4 3.4 1.7 0.2 setosa
5.1 3.7 1.5 0.4 setosa
4.6 3.6 1.0 0.2 setosa
5.1 3.3 1.7 0.5 setosa
4.8 3.4 1.9 0.2 setosa
5.0 3.0 1.6 0.2 setosa
5.0 3.4 1.6 0.4 setosa
5.2 3.5 1.5 0.2 setosa
5.2 3.4 1.4 0.2 setosa
4.7 3.2 1.6 0.2 setosa
4.8 3.1 1.6 0.2 setosa
5.4 3.4 1.5 0.4 setosa
5.2 4.1 1.5 0.1 setosa
5.5 4.2 1.4 0.2 setosa
4.9 3.1 1.5 0.2 setosa
5.0 3.2 1.2 0.2 setosa
5.5 3.5 1.3 0.2 setosa
4.9 3.6 1.4 0.1 setosa
4.4 3.0 1.3 0.2 setosa
5.1 3.4 1.5 0.2 setosa
5.0 3.5 1.3 0.3 setosa
4.5 2.3 1.3 0.3 setosa
4.4 3.2 1.3 0.2 setosa
5.0 3.5 1.6 0.6 setosa
5.1 3.8 1.9 0.4 setosa
4.8 3.0 1.4 0.3 setosa
5.1 3.8 1.6 0.2 setosa
4.6 3.2 1.4 0.2 setosa
5.3 3.7 1.5 0.2 setosa
5.0 3.3 1.4 0.2 setosa
7.0 3.2 4.7 1.4 versicolor
6.4 3.2 4.5 1.5 versicolor
6.9 3.1 4.9 1.5 versicolor
5.5 2.3 4.0 1.3 versicolor
6.5 2.8 4.6 1.5 versicolor
5.7 2.8 4.5 1.3 versicolor
6.3 3.3 4.7 1.6 versicolor
4.9 2.4 3.3 1.0 versicolor
6.6 2.9 4.6 1.3 versicolor
5.2 2.7 3.9 1.4 versicolor
5.0 2.0 3.5 1.0 versicolor
5.9 3.0 4.2 1.5 versicolor
6.0 2.2 4.0 1.0 versicolor
6.1 2.9 4.7 1.4 versicolor
5.6 2.9 3.6 1.3 versicolor
6.7 3.1 4.4 1.4 versicolor
5.6 3.0 4.5 1.5 versicolor
5.8 2.7 4.1 1.0 versicolor
6.2 2.2 4.5 1.5 versicolor
5.6 2.5 3.9 1.1 versicolor
5.9 3.2 4.8 1.8 versicolor
6.1 2.8 4.0 1.3 versicolor
6.3 2.5 4.9 1.5 versicolor
6.1 2.8 4.7 1.2 versicolor
6.4 2.9 4.3 1.3 versicolor
6.6 3.0 4.4 1.4 versicolor
6.8 2.8 4.8 1.4 versicolor
6.7 3.0 5.0 1.7 versicolor
6.0 2.9 4.5 1.5 versicolor
5.7 2.6 3.5 1.0 versicolor
5.5 2.4 3.8 1.1 versicolor
5.5 2.4 3.7 1.0 versicolor
5.8 2.7 3.9 1.2 versicolor
6.0 2.7 5.1 1.6 versicolor
5.4 3.0 4.5 1.5 versicolor
6.0 3.4 4.5 1.6 versicolor
6.7 3.1 4.7 1.5 versicolor
6.3 2.3 4.4 1.3 versicolor
5.6 3.0 4.1 1.3 versicolor
5.5 2.5 4.0 1.3 versicolor
5.5 2.6 4.4 1.2 versicolor
6.1 3.0 4.6 1.4 versicolor
5.8 2.6 4.0 1.2 versicolor
5.0 2.3 3.3 1.0 versicolor
5.6 2.7 4.2 1.3 versicolor
5.7 3.0 4.2 1.2 versicolor
5.7 2.9 4.2 1.3 versicolor
6.2 2.9 4.3 1.3 versicolor
5.1 2.5 3.0 1.1 versicolor
5.7 2.8 4.1 1.3 versicolor
6.3 3.3 6.0 2.5 virginica
5.8 2.7 5.1 1.9 virginica
7.1 3.0 5.9 2.1 virginica
6.3 2.9 5.6 1.8 virginica
6.5 3.0 5.8 2.2 virginica
7.6 3.0 6.6 2.1 virginica
4.9 2.5 4.5 1.7 virginica
7.3 2.9 6.3 1.8 virginica
6.7 2.5 5.8 1.8 virginica
7.2 3.6 6.1 2.5 virginica
6.5 3.2 5.1 2.0 virginica
6.4 2.7 5.3 1.9 virginica
6.8 3.0 5.5 2.1 virginica
5.7 2.5 5.0 2.0 virginica
5.8 2.8 5.1 2.4 virginica
6.4 3.2 5.3 2.3 virginica
6.5 3.0 5.5 1.8 virginica
7.7 3.8 6.7 2.2 virginica
7.7 2.6 6.9 2.3 virginica
6.0 2.2 5.0 1.5 virginica
6.9 3.2 5.7 2.3 virginica
5.6 2.8 4.9 2.0 virginica
7.7 2.8 6.7 2.0 virginica
6.3 2.7 4.9 1.8 virginica
6.7 3.3 5.7 2.1 virginica
7.2 3.2 6.0 1.8 virginica
6.2 2.8 4.8 1.8 virginica
6.1 3.0 4.9 1.8 virginica
6.4 2.8 5.6 2.1 virginica
7.2 3.0 5.8 1.6 virginica
7.4 2.8 6.1 1.9 virginica
7.9 3.8 6.4 2.0 virginica
6.4 2.8 5.6 2.2 virginica
6.3 2.8 5.1 1.5 virginica
6.1 2.6 5.6 1.4 virginica
7.7 3.0 6.1 2.3 virginica
6.3 3.4 5.6 2.4 virginica
6.4 3.1 5.5 1.8 virginica
6.0 3.0 4.8 1.8 virginica
6.9 3.1 5.4 2.1 virginica
6.7 3.1 5.6 2.4 virginica
6.9 3.1 5.1 2.3 virginica
5.8 2.7 5.1 1.9 virginica
6.8 3.2 5.9 2.3 virginica
6.7 3.3 5.7 2.5 virginica
6.7 3.0 5.2 2.3 virginica
6.3 2.5 5.0 1.9 virginica
6.5 3.0 5.2 2.0 virginica
6.2 3.4 5.4 2.3 virginica
5.9 3.0 5.1 1.8 virginica


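Interestingly, you can also rename a column while selecting it with the select() verb. Keep in mind that select() returns only the columns you list, so every column you want to keep has to be named -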

iris %>% select(Sepal.Length, 
                Sepal.Width, 
                Petal.Length, 
                Petal.Width, 
                Species.name = Species)
iris data: Species column renamed using select()
Sepal.Length Sepal.Width Petal.Length Petal.Width Species.name
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa
4.6 3.4 1.4 0.3 setosa
5.0 3.4 1.5 0.2 setosa
4.4 2.9 1.4 0.2 setosa
4.9 3.1 1.5 0.1 setosa
5.4 3.7 1.5 0.2 setosa
4.8 3.4 1.6 0.2 setosa
4.8 3.0 1.4 0.1 setosa
4.3 3.0 1.1 0.1 setosa
5.8 4.0 1.2 0.2 setosa
5.7 4.4 1.5 0.4 setosa
5.4 3.9 1.3 0.4 setosa
5.1 3.5 1.4 0.3 setosa
5.7 3.8 1.7 0.3 setosa
5.1 3.8 1.5 0.3 setosa
5.4 3.4 1.7 0.2 setosa
5.1 3.7 1.5 0.4 setosa
4.6 3.6 1.0 0.2 setosa
5.1 3.3 1.7 0.5 setosa
4.8 3.4 1.9 0.2 setosa
5.0 3.0 1.6 0.2 setosa
5.0 3.4 1.6 0.4 setosa
5.2 3.5 1.5 0.2 setosa
5.2 3.4 1.4 0.2 setosa
4.7 3.2 1.6 0.2 setosa
4.8 3.1 1.6 0.2 setosa
5.4 3.4 1.5 0.4 setosa
5.2 4.1 1.5 0.1 setosa
5.5 4.2 1.4 0.2 setosa
4.9 3.1 1.5 0.2 setosa
5.0 3.2 1.2 0.2 setosa
5.5 3.5 1.3 0.2 setosa
4.9 3.6 1.4 0.1 setosa
4.4 3.0 1.3 0.2 setosa
5.1 3.4 1.5 0.2 setosa
5.0 3.5 1.3 0.3 setosa
4.5 2.3 1.3 0.3 setosa
4.4 3.2 1.3 0.2 setosa
5.0 3.5 1.6 0.6 setosa
5.1 3.8 1.9 0.4 setosa
4.8 3.0 1.4 0.3 setosa
5.1 3.8 1.6 0.2 setosa
4.6 3.2 1.4 0.2 setosa
5.3 3.7 1.5 0.2 setosa
5.0 3.3 1.4 0.2 setosa
7.0 3.2 4.7 1.4 versicolor
6.4 3.2 4.5 1.5 versicolor
6.9 3.1 4.9 1.5 versicolor
5.5 2.3 4.0 1.3 versicolor
6.5 2.8 4.6 1.5 versicolor
5.7 2.8 4.5 1.3 versicolor
6.3 3.3 4.7 1.6 versicolor
4.9 2.4 3.3 1.0 versicolor
6.6 2.9 4.6 1.3 versicolor
5.2 2.7 3.9 1.4 versicolor
5.0 2.0 3.5 1.0 versicolor
5.9 3.0 4.2 1.5 versicolor
6.0 2.2 4.0 1.0 versicolor
6.1 2.9 4.7 1.4 versicolor
5.6 2.9 3.6 1.3 versicolor
6.7 3.1 4.4 1.4 versicolor
5.6 3.0 4.5 1.5 versicolor
5.8 2.7 4.1 1.0 versicolor
6.2 2.2 4.5 1.5 versicolor
5.6 2.5 3.9 1.1 versicolor
5.9 3.2 4.8 1.8 versicolor
6.1 2.8 4.0 1.3 versicolor
6.3 2.5 4.9 1.5 versicolor
6.1 2.8 4.7 1.2 versicolor
6.4 2.9 4.3 1.3 versicolor
6.6 3.0 4.4 1.4 versicolor
6.8 2.8 4.8 1.4 versicolor
6.7 3.0 5.0 1.7 versicolor
6.0 2.9 4.5 1.5 versicolor
5.7 2.6 3.5 1.0 versicolor
5.5 2.4 3.8 1.1 versicolor
5.5 2.4 3.7 1.0 versicolor
5.8 2.7 3.9 1.2 versicolor
6.0 2.7 5.1 1.6 versicolor
5.4 3.0 4.5 1.5 versicolor
6.0 3.4 4.5 1.6 versicolor
6.7 3.1 4.7 1.5 versicolor
6.3 2.3 4.4 1.3 versicolor
5.6 3.0 4.1 1.3 versicolor
5.5 2.5 4.0 1.3 versicolor
5.5 2.6 4.4 1.2 versicolor
6.1 3.0 4.6 1.4 versicolor
5.8 2.6 4.0 1.2 versicolor
5.0 2.3 3.3 1.0 versicolor
5.6 2.7 4.2 1.3 versicolor
5.7 3.0 4.2 1.2 versicolor
5.7 2.9 4.2 1.3 versicolor
6.2 2.9 4.3 1.3 versicolor
5.1 2.5 3.0 1.1 versicolor
5.7 2.8 4.1 1.3 versicolor
6.3 3.3 6.0 2.5 virginica
5.8 2.7 5.1 1.9 virginica
7.1 3.0 5.9 2.1 virginica
6.3 2.9 5.6 1.8 virginica
6.5 3.0 5.8 2.2 virginica
7.6 3.0 6.6 2.1 virginica
4.9 2.5 4.5 1.7 virginica
7.3 2.9 6.3 1.8 virginica
6.7 2.5 5.8 1.8 virginica
7.2 3.6 6.1 2.5 virginica
6.5 3.2 5.1 2.0 virginica
6.4 2.7 5.3 1.9 virginica
6.8 3.0 5.5 2.1 virginica
5.7 2.5 5.0 2.0 virginica
5.8 2.8 5.1 2.4 virginica
6.4 3.2 5.3 2.3 virginica
6.5 3.0 5.5 1.8 virginica
7.7 3.8 6.7 2.2 virginica
7.7 2.6 6.9 2.3 virginica
6.0 2.2 5.0 1.5 virginica
6.9 3.2 5.7 2.3 virginica
5.6 2.8 4.9 2.0 virginica
7.7 2.8 6.7 2.0 virginica
6.3 2.7 4.9 1.8 virginica
6.7 3.3 5.7 2.1 virginica
7.2 3.2 6.0 1.8 virginica
6.2 2.8 4.8 1.8 virginica
6.1 3.0 4.9 1.8 virginica
6.4 2.8 5.6 2.1 virginica
7.2 3.0 5.8 1.6 virginica
7.4 2.8 6.1 1.9 virginica
7.9 3.8 6.4 2.0 virginica
6.4 2.8 5.6 2.2 virginica
6.3 2.8 5.1 1.5 virginica
6.1 2.6 5.6 1.4 virginica
7.7 3.0 6.1 2.3 virginica
6.3 3.4 5.6 2.4 virginica
6.4 3.1 5.5 1.8 virginica
6.0 3.0 4.8 1.8 virginica
6.9 3.1 5.4 2.1 virginica
6.7 3.1 5.6 2.4 virginica
6.9 3.1 5.1 2.3 virginica
5.8 2.7 5.1 1.9 virginica
6.8 3.2 5.9 2.3 virginica
6.7 3.3 5.7 2.5 virginica
6.7 3.0 5.2 2.3 virginica
6.3 2.5 5.0 1.9 virginica
6.5 3.0 5.2 2.0 virginica
6.2 3.4 5.4 2.3 virginica
5.9 3.0 5.1 1.8 virginica
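
If typing every column name feels tedious, one possible shortcut is the everything() selection helper, which stands for all columns not mentioned yet. The sketch below is only an illustration (the name iris_renamed is hypothetical); note that it also moves the renamed column to the front, unlike rename().

```r
library(dplyr)

# Rename Species while selecting, then keep all remaining columns
iris_renamed <- iris %>%
  select(Species.name = Species, everything())

# The renamed column comes first, followed by the other four columns
names(iris_renamed)
```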

2.3.5 arrange()

The arrange() verb orders the rows of a data-frame by the values of the selected column(s), in ascending order by default. For example -

iris %>% 
  arrange(Sepal.Length)
iris data: arranged by Sepal length
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
4.3 3.0 1.1 0.1 setosa
4.4 2.9 1.4 0.2 setosa
4.4 3.0 1.3 0.2 setosa
4.4 3.2 1.3 0.2 setosa
4.5 2.3 1.3 0.3 setosa
4.6 3.1 1.5 0.2 setosa
4.6 3.4 1.4 0.3 setosa
4.6 3.6 1.0 0.2 setosa
4.6 3.2 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.7 3.2 1.6 0.2 setosa
4.8 3.4 1.6 0.2 setosa
4.8 3.0 1.4 0.1 setosa
4.8 3.4 1.9 0.2 setosa
4.8 3.1 1.6 0.2 setosa
4.8 3.0 1.4 0.3 setosa
4.9 3.0 1.4 0.2 setosa
4.9 3.1 1.5 0.1 setosa
4.9 3.1 1.5 0.2 setosa
4.9 3.6 1.4 0.1 setosa
4.9 2.4 3.3 1.0 versicolor
4.9 2.5 4.5 1.7 virginica
5.0 3.6 1.4 0.2 setosa
5.0 3.4 1.5 0.2 setosa
5.0 3.0 1.6 0.2 setosa
5.0 3.4 1.6 0.4 setosa
5.0 3.2 1.2 0.2 setosa
5.0 3.5 1.3 0.3 setosa
5.0 3.5 1.6 0.6 setosa
5.0 3.3 1.4 0.2 setosa
5.0 2.0 3.5 1.0 versicolor
5.0 2.3 3.3 1.0 versicolor
5.1 3.5 1.4 0.2 setosa
5.1 3.5 1.4 0.3 setosa
5.1 3.8 1.5 0.3 setosa
5.1 3.7 1.5 0.4 setosa
5.1 3.3 1.7 0.5 setosa
5.1 3.4 1.5 0.2 setosa
5.1 3.8 1.9 0.4 setosa
5.1 3.8 1.6 0.2 setosa
5.1 2.5 3.0 1.1 versicolor
5.2 3.5 1.5 0.2 setosa
5.2 3.4 1.4 0.2 setosa
5.2 4.1 1.5 0.1 setosa
5.2 2.7 3.9 1.4 versicolor
5.3 3.7 1.5 0.2 setosa
5.4 3.9 1.7 0.4 setosa
5.4 3.7 1.5 0.2 setosa
5.4 3.9 1.3 0.4 setosa
5.4 3.4 1.7 0.2 setosa
5.4 3.4 1.5 0.4 setosa
5.4 3.0 4.5 1.5 versicolor
5.5 4.2 1.4 0.2 setosa
5.5 3.5 1.3 0.2 setosa
5.5 2.3 4.0 1.3 versicolor
5.5 2.4 3.8 1.1 versicolor
5.5 2.4 3.7 1.0 versicolor
5.5 2.5 4.0 1.3 versicolor
5.5 2.6 4.4 1.2 versicolor
5.6 2.9 3.6 1.3 versicolor
5.6 3.0 4.5 1.5 versicolor
5.6 2.5 3.9 1.1 versicolor
5.6 3.0 4.1 1.3 versicolor
5.6 2.7 4.2 1.3 versicolor
5.6 2.8 4.9 2.0 virginica
5.7 4.4 1.5 0.4 setosa
5.7 3.8 1.7 0.3 setosa
5.7 2.8 4.5 1.3 versicolor
5.7 2.6 3.5 1.0 versicolor
5.7 3.0 4.2 1.2 versicolor
5.7 2.9 4.2 1.3 versicolor
5.7 2.8 4.1 1.3 versicolor
5.7 2.5 5.0 2.0 virginica
5.8 4.0 1.2 0.2 setosa
5.8 2.7 4.1 1.0 versicolor
5.8 2.7 3.9 1.2 versicolor
5.8 2.6 4.0 1.2 versicolor
5.8 2.7 5.1 1.9 virginica
5.8 2.8 5.1 2.4 virginica
5.8 2.7 5.1 1.9 virginica
5.9 3.0 4.2 1.5 versicolor
5.9 3.2 4.8 1.8 versicolor
5.9 3.0 5.1 1.8 virginica
6.0 2.2 4.0 1.0 versicolor
6.0 2.9 4.5 1.5 versicolor
6.0 2.7 5.1 1.6 versicolor
6.0 3.4 4.5 1.6 versicolor
6.0 2.2 5.0 1.5 virginica
6.0 3.0 4.8 1.8 virginica
6.1 2.9 4.7 1.4 versicolor
6.1 2.8 4.0 1.3 versicolor
6.1 2.8 4.7 1.2 versicolor
6.1 3.0 4.6 1.4 versicolor
6.1 3.0 4.9 1.8 virginica
6.1 2.6 5.6 1.4 virginica
6.2 2.2 4.5 1.5 versicolor
6.2 2.9 4.3 1.3 versicolor
6.2 2.8 4.8 1.8 virginica
6.2 3.4 5.4 2.3 virginica
6.3 3.3 4.7 1.6 versicolor
6.3 2.5 4.9 1.5 versicolor
6.3 2.3 4.4 1.3 versicolor
6.3 3.3 6.0 2.5 virginica
6.3 2.9 5.6 1.8 virginica
6.3 2.7 4.9 1.8 virginica
6.3 2.8 5.1 1.5 virginica
6.3 3.4 5.6 2.4 virginica
6.3 2.5 5.0 1.9 virginica
6.4 3.2 4.5 1.5 versicolor
6.4 2.9 4.3 1.3 versicolor
6.4 2.7 5.3 1.9 virginica
6.4 3.2 5.3 2.3 virginica
6.4 2.8 5.6 2.1 virginica
6.4 2.8 5.6 2.2 virginica
6.4 3.1 5.5 1.8 virginica
6.5 2.8 4.6 1.5 versicolor
6.5 3.0 5.8 2.2 virginica
6.5 3.2 5.1 2.0 virginica
6.5 3.0 5.5 1.8 virginica
6.5 3.0 5.2 2.0 virginica
6.6 2.9 4.6 1.3 versicolor
6.6 3.0 4.4 1.4 versicolor
6.7 3.1 4.4 1.4 versicolor
6.7 3.0 5.0 1.7 versicolor
6.7 3.1 4.7 1.5 versicolor
6.7 2.5 5.8 1.8 virginica
6.7 3.3 5.7 2.1 virginica
6.7 3.1 5.6 2.4 virginica
6.7 3.3 5.7 2.5 virginica
6.7 3.0 5.2 2.3 virginica
6.8 2.8 4.8 1.4 versicolor
6.8 3.0 5.5 2.1 virginica
6.8 3.2 5.9 2.3 virginica
6.9 3.1 4.9 1.5 versicolor
6.9 3.2 5.7 2.3 virginica
6.9 3.1 5.4 2.1 virginica
6.9 3.1 5.1 2.3 virginica
7.0 3.2 4.7 1.4 versicolor
7.1 3.0 5.9 2.1 virginica
7.2 3.6 6.1 2.5 virginica
7.2 3.2 6.0 1.8 virginica
7.2 3.0 5.8 1.6 virginica
7.3 2.9 6.3 1.8 virginica
7.4 2.8 6.1 1.9 virginica
7.6 3.0 6.6 2.1 virginica
7.7 3.8 6.7 2.2 virginica
7.7 2.6 6.9 2.3 virginica
7.7 2.8 6.7 2.0 virginica
7.7 3.0 6.1 2.3 virginica
7.9 3.8 6.4 2.0 virginica
# Rows are ordered by Sepal.Length first; rows that share the same Sepal.Length are then ordered by Sepal.Width.
iris %>% 
  arrange(Sepal.Length, Sepal.Width)
iris data: arranged by Sepal length and width
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
4.3 3.0 1.1 0.1 setosa
4.4 2.9 1.4 0.2 setosa
4.4 3.0 1.3 0.2 setosa
4.4 3.2 1.3 0.2 setosa
4.5 2.3 1.3 0.3 setosa
4.6 3.1 1.5 0.2 setosa
4.6 3.2 1.4 0.2 setosa
4.6 3.4 1.4 0.3 setosa
4.6 3.6 1.0 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.7 3.2 1.6 0.2 setosa
4.8 3.0 1.4 0.1 setosa
4.8 3.0 1.4 0.3 setosa
4.8 3.1 1.6 0.2 setosa
4.8 3.4 1.6 0.2 setosa
4.8 3.4 1.9 0.2 setosa
4.9 2.4 3.3 1.0 versicolor
4.9 2.5 4.5 1.7 virginica
4.9 3.0 1.4 0.2 setosa
4.9 3.1 1.5 0.1 setosa
4.9 3.1 1.5 0.2 setosa
4.9 3.6 1.4 0.1 setosa
5.0 2.0 3.5 1.0 versicolor
5.0 2.3 3.3 1.0 versicolor
5.0 3.0 1.6 0.2 setosa
5.0 3.2 1.2 0.2 setosa
5.0 3.3 1.4 0.2 setosa
5.0 3.4 1.5 0.2 setosa
5.0 3.4 1.6 0.4 setosa
5.0 3.5 1.3 0.3 setosa
5.0 3.5 1.6 0.6 setosa
5.0 3.6 1.4 0.2 setosa
5.1 2.5 3.0 1.1 versicolor
5.1 3.3 1.7 0.5 setosa
5.1 3.4 1.5 0.2 setosa
5.1 3.5 1.4 0.2 setosa
5.1 3.5 1.4 0.3 setosa
5.1 3.7 1.5 0.4 setosa
5.1 3.8 1.5 0.3 setosa
5.1 3.8 1.9 0.4 setosa
5.1 3.8 1.6 0.2 setosa
5.2 2.7 3.9 1.4 versicolor
5.2 3.4 1.4 0.2 setosa
5.2 3.5 1.5 0.2 setosa
5.2 4.1 1.5 0.1 setosa
5.3 3.7 1.5 0.2 setosa
5.4 3.0 4.5 1.5 versicolor
5.4 3.4 1.7 0.2 setosa
5.4 3.4 1.5 0.4 setosa
5.4 3.7 1.5 0.2 setosa
5.4 3.9 1.7 0.4 setosa
5.4 3.9 1.3 0.4 setosa
5.5 2.3 4.0 1.3 versicolor
5.5 2.4 3.8 1.1 versicolor
5.5 2.4 3.7 1.0 versicolor
5.5 2.5 4.0 1.3 versicolor
5.5 2.6 4.4 1.2 versicolor
5.5 3.5 1.3 0.2 setosa
5.5 4.2 1.4 0.2 setosa
5.6 2.5 3.9 1.1 versicolor
5.6 2.7 4.2 1.3 versicolor
5.6 2.8 4.9 2.0 virginica
5.6 2.9 3.6 1.3 versicolor
5.6 3.0 4.5 1.5 versicolor
5.6 3.0 4.1 1.3 versicolor
5.7 2.5 5.0 2.0 virginica
5.7 2.6 3.5 1.0 versicolor
5.7 2.8 4.5 1.3 versicolor
5.7 2.8 4.1 1.3 versicolor
5.7 2.9 4.2 1.3 versicolor
5.7 3.0 4.2 1.2 versicolor
5.7 3.8 1.7 0.3 setosa
5.7 4.4 1.5 0.4 setosa
5.8 2.6 4.0 1.2 versicolor
5.8 2.7 4.1 1.0 versicolor
5.8 2.7 3.9 1.2 versicolor
5.8 2.7 5.1 1.9 virginica
5.8 2.7 5.1 1.9 virginica
5.8 2.8 5.1 2.4 virginica
5.8 4.0 1.2 0.2 setosa
5.9 3.0 4.2 1.5 versicolor
5.9 3.0 5.1 1.8 virginica
5.9 3.2 4.8 1.8 versicolor
6.0 2.2 4.0 1.0 versicolor
6.0 2.2 5.0 1.5 virginica
6.0 2.7 5.1 1.6 versicolor
6.0 2.9 4.5 1.5 versicolor
6.0 3.0 4.8 1.8 virginica
6.0 3.4 4.5 1.6 versicolor
6.1 2.6 5.6 1.4 virginica
6.1 2.8 4.0 1.3 versicolor
6.1 2.8 4.7 1.2 versicolor
6.1 2.9 4.7 1.4 versicolor
6.1 3.0 4.6 1.4 versicolor
6.1 3.0 4.9 1.8 virginica
6.2 2.2 4.5 1.5 versicolor
6.2 2.8 4.8 1.8 virginica
6.2 2.9 4.3 1.3 versicolor
6.2 3.4 5.4 2.3 virginica
6.3 2.3 4.4 1.3 versicolor
6.3 2.5 4.9 1.5 versicolor
6.3 2.5 5.0 1.9 virginica
6.3 2.7 4.9 1.8 virginica
6.3 2.8 5.1 1.5 virginica
6.3 2.9 5.6 1.8 virginica
6.3 3.3 4.7 1.6 versicolor
6.3 3.3 6.0 2.5 virginica
6.3 3.4 5.6 2.4 virginica
6.4 2.7 5.3 1.9 virginica
6.4 2.8 5.6 2.1 virginica
6.4 2.8 5.6 2.2 virginica
6.4 2.9 4.3 1.3 versicolor
6.4 3.1 5.5 1.8 virginica
6.4 3.2 4.5 1.5 versicolor
6.4 3.2 5.3 2.3 virginica
6.5 2.8 4.6 1.5 versicolor
6.5 3.0 5.8 2.2 virginica
6.5 3.0 5.5 1.8 virginica
6.5 3.0 5.2 2.0 virginica
6.5 3.2 5.1 2.0 virginica
6.6 2.9 4.6 1.3 versicolor
6.6 3.0 4.4 1.4 versicolor
6.7 2.5 5.8 1.8 virginica
6.7 3.0 5.0 1.7 versicolor
6.7 3.0 5.2 2.3 virginica
6.7 3.1 4.4 1.4 versicolor
6.7 3.1 4.7 1.5 versicolor
6.7 3.1 5.6 2.4 virginica
6.7 3.3 5.7 2.1 virginica
6.7 3.3 5.7 2.5 virginica
6.8 2.8 4.8 1.4 versicolor
6.8 3.0 5.5 2.1 virginica
6.8 3.2 5.9 2.3 virginica
6.9 3.1 4.9 1.5 versicolor
6.9 3.1 5.4 2.1 virginica
6.9 3.1 5.1 2.3 virginica
6.9 3.2 5.7 2.3 virginica
7.0 3.2 4.7 1.4 versicolor
7.1 3.0 5.9 2.1 virginica
7.2 3.0 5.8 1.6 virginica
7.2 3.2 6.0 1.8 virginica
7.2 3.6 6.1 2.5 virginica
7.3 2.9 6.3 1.8 virginica
7.4 2.8 6.1 1.9 virginica
7.6 3.0 6.6 2.1 virginica
7.7 2.6 6.9 2.3 virginica
7.7 2.8 6.7 2.0 virginica
7.7 3.0 6.1 2.3 virginica
7.7 3.8 6.7 2.2 virginica
7.9 3.8 6.4 2.0 virginica
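
Both examples above sort in ascending order. A small sketch of the descending case, using the desc() helper from dplyr (the name iris_desc is only illustrative):

```r
library(dplyr)

# Largest Sepal.Length first; ties are still broken by ascending Sepal.Width
iris_desc <- iris %>%
  arrange(desc(Sepal.Length), Sepal.Width)

head(iris_desc, 3)
```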

2.3.6 distinct()

The distinct() verb keeps only the unique/distinct rows of a data-frame, judged by the selected column(s). By default it returns only the selected column(s); to get the remaining columns as well, change the .keep_all parameter from its default value FALSE to TRUE. When several rows share the same combination, the first of them is kept. Let’s see some examples -

iris %>% distinct(Sepal.Length)
iris data: distinct Sepal length
Sepal.Length
5.1
4.9
4.7
4.6
5.0
5.4
4.4
4.8
4.3
5.8
5.7
5.2
5.5
4.5
5.3
7.0
6.4
6.9
6.5
6.3
6.6
5.9
6.0
6.1
5.6
6.7
6.2
6.8
7.1
7.6
7.3
7.2
7.7
7.4
7.9
# Here, only the unique combinations of Sepal.Length and Sepal.Width are kept.
iris %>% distinct(Sepal.Length, Sepal.Width)
iris data: distinct Sepal length and width only
Sepal.Length Sepal.Width
5.1 3.5
4.9 3.0
4.7 3.2
4.6 3.1
5.0 3.6
5.4 3.9
4.6 3.4
5.0 3.4
4.4 2.9
4.9 3.1
5.4 3.7
4.8 3.4
4.8 3.0
4.3 3.0
5.8 4.0
5.7 4.4
5.7 3.8
5.1 3.8
5.4 3.4
5.1 3.7
4.6 3.6
5.1 3.3
5.0 3.0
5.2 3.5
5.2 3.4
4.8 3.1
5.2 4.1
5.5 4.2
5.0 3.2
5.5 3.5
4.9 3.6
4.4 3.0
5.1 3.4
5.0 3.5
4.5 2.3
4.4 3.2
4.6 3.2
5.3 3.7
5.0 3.3
7.0 3.2
6.4 3.2
6.9 3.1
5.5 2.3
6.5 2.8
5.7 2.8
6.3 3.3
4.9 2.4
6.6 2.9
5.2 2.7
5.0 2.0
5.9 3.0
6.0 2.2
6.1 2.9
5.6 2.9
6.7 3.1
5.6 3.0
5.8 2.7
6.2 2.2
5.6 2.5
5.9 3.2
6.1 2.8
6.3 2.5
6.4 2.9
6.6 3.0
6.8 2.8
6.7 3.0
6.0 2.9
5.7 2.6
5.5 2.4
6.0 2.7
5.4 3.0
6.0 3.4
6.3 2.3
5.5 2.5
5.5 2.6
6.1 3.0
5.8 2.6
5.0 2.3
5.6 2.7
5.7 3.0
5.7 2.9
6.2 2.9
5.1 2.5
7.1 3.0
6.3 2.9
6.5 3.0
7.6 3.0
4.9 2.5
7.3 2.9
6.7 2.5
7.2 3.6
6.5 3.2
6.4 2.7
6.8 3.0
5.7 2.5
5.8 2.8
7.7 3.8
7.7 2.6
6.9 3.2
5.6 2.8
7.7 2.8
6.3 2.7
6.7 3.3
7.2 3.2
6.2 2.8
6.4 2.8
7.2 3.0
7.4 2.8
7.9 3.8
6.3 2.8
6.1 2.6
7.7 3.0
6.3 3.4
6.4 3.1
6.0 3.0
6.8 3.2
6.2 3.4
# With .keep_all = TRUE, the rest of the columns are also returned.
iris %>% 
  distinct(Sepal.Length, Sepal.Width, .keep_all = TRUE)
iris data: distinct Sepal length and width, all columns kept
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa
4.6 3.4 1.4 0.3 setosa
5.0 3.4 1.5 0.2 setosa
4.4 2.9 1.4 0.2 setosa
4.9 3.1 1.5 0.1 setosa
5.4 3.7 1.5 0.2 setosa
4.8 3.4 1.6 0.2 setosa
4.8 3.0 1.4 0.1 setosa
4.3 3.0 1.1 0.1 setosa
5.8 4.0 1.2 0.2 setosa
5.7 4.4 1.5 0.4 setosa
5.7 3.8 1.7 0.3 setosa
5.1 3.8 1.5 0.3 setosa
5.4 3.4 1.7 0.2 setosa
5.1 3.7 1.5 0.4 setosa
4.6 3.6 1.0 0.2 setosa
5.1 3.3 1.7 0.5 setosa
5.0 3.0 1.6 0.2 setosa
5.2 3.5 1.5 0.2 setosa
5.2 3.4 1.4 0.2 setosa
4.8 3.1 1.6 0.2 setosa
5.2 4.1 1.5 0.1 setosa
5.5 4.2 1.4 0.2 setosa
5.0 3.2 1.2 0.2 setosa
5.5 3.5 1.3 0.2 setosa
4.9 3.6 1.4 0.1 setosa
4.4 3.0 1.3 0.2 setosa
5.1 3.4 1.5 0.2 setosa
5.0 3.5 1.3 0.3 setosa
4.5 2.3 1.3 0.3 setosa
4.4 3.2 1.3 0.2 setosa
4.6 3.2 1.4 0.2 setosa
5.3 3.7 1.5 0.2 setosa
5.0 3.3 1.4 0.2 setosa
7.0 3.2 4.7 1.4 versicolor
6.4 3.2 4.5 1.5 versicolor
6.9 3.1 4.9 1.5 versicolor
5.5 2.3 4.0 1.3 versicolor
6.5 2.8 4.6 1.5 versicolor
5.7 2.8 4.5 1.3 versicolor
6.3 3.3 4.7 1.6 versicolor
4.9 2.4 3.3 1.0 versicolor
6.6 2.9 4.6 1.3 versicolor
5.2 2.7 3.9 1.4 versicolor
5.0 2.0 3.5 1.0 versicolor
5.9 3.0 4.2 1.5 versicolor
6.0 2.2 4.0 1.0 versicolor
6.1 2.9 4.7 1.4 versicolor
5.6 2.9 3.6 1.3 versicolor
6.7 3.1 4.4 1.4 versicolor
5.6 3.0 4.5 1.5 versicolor
5.8 2.7 4.1 1.0 versicolor
6.2 2.2 4.5 1.5 versicolor
5.6 2.5 3.9 1.1 versicolor
5.9 3.2 4.8 1.8 versicolor
6.1 2.8 4.0 1.3 versicolor
6.3 2.5 4.9 1.5 versicolor
6.4 2.9 4.3 1.3 versicolor
6.6 3.0 4.4 1.4 versicolor
6.8 2.8 4.8 1.4 versicolor
6.7 3.0 5.0 1.7 versicolor
6.0 2.9 4.5 1.5 versicolor
5.7 2.6 3.5 1.0 versicolor
5.5 2.4 3.8 1.1 versicolor
6.0 2.7 5.1 1.6 versicolor
5.4 3.0 4.5 1.5 versicolor
6.0 3.4 4.5 1.6 versicolor
6.3 2.3 4.4 1.3 versicolor
5.5 2.5 4.0 1.3 versicolor
5.5 2.6 4.4 1.2 versicolor
6.1 3.0 4.6 1.4 versicolor
5.8 2.6 4.0 1.2 versicolor
5.0 2.3 3.3 1.0 versicolor
5.6 2.7 4.2 1.3 versicolor
5.7 3.0 4.2 1.2 versicolor
5.7 2.9 4.2 1.3 versicolor
6.2 2.9 4.3 1.3 versicolor
5.1 2.5 3.0 1.1 versicolor
7.1 3.0 5.9 2.1 virginica
6.3 2.9 5.6 1.8 virginica
6.5 3.0 5.8 2.2 virginica
7.6 3.0 6.6 2.1 virginica
4.9 2.5 4.5 1.7 virginica
7.3 2.9 6.3 1.8 virginica
6.7 2.5 5.8 1.8 virginica
7.2 3.6 6.1 2.5 virginica
6.5 3.2 5.1 2.0 virginica
6.4 2.7 5.3 1.9 virginica
6.8 3.0 5.5 2.1 virginica
5.7 2.5 5.0 2.0 virginica
5.8 2.8 5.1 2.4 virginica
7.7 3.8 6.7 2.2 virginica
7.7 2.6 6.9 2.3 virginica
6.9 3.2 5.7 2.3 virginica
5.6 2.8 4.9 2.0 virginica
7.7 2.8 6.7 2.0 virginica
6.3 2.7 4.9 1.8 virginica
6.7 3.3 5.7 2.1 virginica
7.2 3.2 6.0 1.8 virginica
6.2 2.8 4.8 1.8 virginica
6.4 2.8 5.6 2.1 virginica
7.2 3.0 5.8 1.6 virginica
7.4 2.8 6.1 1.9 virginica
7.9 3.8 6.4 2.0 virginica
6.3 2.8 5.1 1.5 virginica
6.1 2.6 5.6 1.4 virginica
7.7 3.0 6.1 2.3 virginica
6.3 3.4 5.6 2.4 virginica
6.4 3.1 5.5 1.8 virginica
6.0 3.0 4.8 1.8 virginica
6.8 3.2 5.9 2.3 virginica
6.2 3.4 5.4 2.3 virginica
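
To get a feel for what distinct() removes, it can help to count the rows it returns. The following is a rough sketch; the n_* variable names are hypothetical and only serve the illustration.

```r
library(dplyr)

# With no columns given, distinct() compares whole rows
n_whole_rows <- iris %>% distinct() %>% nrow()

# Unique values of a single column
n_lengths <- iris %>% distinct(Sepal.Length) %>% nrow()

# .keep_all changes which columns come back, not how many rows
n_pairs <- iris %>% distinct(Sepal.Length, Sepal.Width) %>% nrow()
n_pairs_all <- iris %>%
  distinct(Sepal.Length, Sepal.Width, .keep_all = TRUE) %>%
  nrow()

c(n_whole_rows, n_lengths, n_pairs, n_pairs_all)
```

The last two counts should agree, because .keep_all only controls which columns are returned for each distinct combination.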

2.3.7 slice()

The slice() verb lets you index rows by their (integer) locations. It has some helpers too -

  • slice_head() selects the first row, while slice_tail() selects the last. The same can be done using slice(1) and slice(n()).

  • slice_head(n = <int>) selects the first <int> rows, while slice_tail(n = <int>) selects the last <int> rows. The n argument has to be named.

  • slice_sample() selects rows at random (a single row by default).

  • slice_min() and slice_max() select the rows with the lowest and the highest values of the selected variable, respectively.

A few examples -

iris %>% 
  slice(1)
iris data: the first row
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
iris %>% 
  slice(10:n()) 
iris data: from 10th row to the end
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
4.9 3.1 1.5 0.1 setosa
5.4 3.7 1.5 0.2 setosa
4.8 3.4 1.6 0.2 setosa
4.8 3.0 1.4 0.1 setosa
4.3 3.0 1.1 0.1 setosa
5.8 4.0 1.2 0.2 setosa
5.7 4.4 1.5 0.4 setosa
5.4 3.9 1.3 0.4 setosa
5.1 3.5 1.4 0.3 setosa
5.7 3.8 1.7 0.3 setosa
5.1 3.8 1.5 0.3 setosa
5.4 3.4 1.7 0.2 setosa
5.1 3.7 1.5 0.4 setosa
4.6 3.6 1.0 0.2 setosa
5.1 3.3 1.7 0.5 setosa
4.8 3.4 1.9 0.2 setosa
5.0 3.0 1.6 0.2 setosa
5.0 3.4 1.6 0.4 setosa
5.2 3.5 1.5 0.2 setosa
5.2 3.4 1.4 0.2 setosa
4.7 3.2 1.6 0.2 setosa
4.8 3.1 1.6 0.2 setosa
5.4 3.4 1.5 0.4 setosa
5.2 4.1 1.5 0.1 setosa
5.5 4.2 1.4 0.2 setosa
4.9 3.1 1.5 0.2 setosa
5.0 3.2 1.2 0.2 setosa
5.5 3.5 1.3 0.2 setosa
4.9 3.6 1.4 0.1 setosa
4.4 3.0 1.3 0.2 setosa
5.1 3.4 1.5 0.2 setosa
5.0 3.5 1.3 0.3 setosa
4.5 2.3 1.3 0.3 setosa
4.4 3.2 1.3 0.2 setosa
5.0 3.5 1.6 0.6 setosa
5.1 3.8 1.9 0.4 setosa
4.8 3.0 1.4 0.3 setosa
5.1 3.8 1.6 0.2 setosa
4.6 3.2 1.4 0.2 setosa
5.3 3.7 1.5 0.2 setosa
5.0 3.3 1.4 0.2 setosa
7.0 3.2 4.7 1.4 versicolor
6.4 3.2 4.5 1.5 versicolor
6.9 3.1 4.9 1.5 versicolor
5.5 2.3 4.0 1.3 versicolor
6.5 2.8 4.6 1.5 versicolor
5.7 2.8 4.5 1.3 versicolor
6.3 3.3 4.7 1.6 versicolor
4.9 2.4 3.3 1.0 versicolor
6.6 2.9 4.6 1.3 versicolor
5.2 2.7 3.9 1.4 versicolor
5.0 2.0 3.5 1.0 versicolor
5.9 3.0 4.2 1.5 versicolor
6.0 2.2 4.0 1.0 versicolor
6.1 2.9 4.7 1.4 versicolor
5.6 2.9 3.6 1.3 versicolor
6.7 3.1 4.4 1.4 versicolor
5.6 3.0 4.5 1.5 versicolor
5.8 2.7 4.1 1.0 versicolor
6.2 2.2 4.5 1.5 versicolor
5.6 2.5 3.9 1.1 versicolor
5.9 3.2 4.8 1.8 versicolor
6.1 2.8 4.0 1.3 versicolor
6.3 2.5 4.9 1.5 versicolor
6.1 2.8 4.7 1.2 versicolor
6.4 2.9 4.3 1.3 versicolor
6.6 3.0 4.4 1.4 versicolor
6.8 2.8 4.8 1.4 versicolor
6.7 3.0 5.0 1.7 versicolor
6.0 2.9 4.5 1.5 versicolor
5.7 2.6 3.5 1.0 versicolor
5.5 2.4 3.8 1.1 versicolor
5.5 2.4 3.7 1.0 versicolor
5.8 2.7 3.9 1.2 versicolor
6.0 2.7 5.1 1.6 versicolor
5.4 3.0 4.5 1.5 versicolor
6.0 3.4 4.5 1.6 versicolor
6.7 3.1 4.7 1.5 versicolor
6.3 2.3 4.4 1.3 versicolor
5.6 3.0 4.1 1.3 versicolor
5.5 2.5 4.0 1.3 versicolor
5.5 2.6 4.4 1.2 versicolor
6.1 3.0 4.6 1.4 versicolor
5.8 2.6 4.0 1.2 versicolor
5.0 2.3 3.3 1.0 versicolor
5.6 2.7 4.2 1.3 versicolor
5.7 3.0 4.2 1.2 versicolor
5.7 2.9 4.2 1.3 versicolor
6.2 2.9 4.3 1.3 versicolor
5.1 2.5 3.0 1.1 versicolor
5.7 2.8 4.1 1.3 versicolor
6.3 3.3 6.0 2.5 virginica
5.8 2.7 5.1 1.9 virginica
7.1 3.0 5.9 2.1 virginica
6.3 2.9 5.6 1.8 virginica
6.5 3.0 5.8 2.2 virginica
7.6 3.0 6.6 2.1 virginica
4.9 2.5 4.5 1.7 virginica
7.3 2.9 6.3 1.8 virginica
6.7 2.5 5.8 1.8 virginica
7.2 3.6 6.1 2.5 virginica
6.5 3.2 5.1 2.0 virginica
6.4 2.7 5.3 1.9 virginica
6.8 3.0 5.5 2.1 virginica
5.7 2.5 5.0 2.0 virginica
5.8 2.8 5.1 2.4 virginica
6.4 3.2 5.3 2.3 virginica
6.5 3.0 5.5 1.8 virginica
7.7 3.8 6.7 2.2 virginica
7.7 2.6 6.9 2.3 virginica
6.0 2.2 5.0 1.5 virginica
6.9 3.2 5.7 2.3 virginica
5.6 2.8 4.9 2.0 virginica
7.7 2.8 6.7 2.0 virginica
6.3 2.7 4.9 1.8 virginica
6.7 3.3 5.7 2.1 virginica
7.2 3.2 6.0 1.8 virginica
6.2 2.8 4.8 1.8 virginica
6.1 3.0 4.9 1.8 virginica
6.4 2.8 5.6 2.1 virginica
7.2 3.0 5.8 1.6 virginica
7.4 2.8 6.1 1.9 virginica
7.9 3.8 6.4 2.0 virginica
6.4 2.8 5.6 2.2 virginica
6.3 2.8 5.1 1.5 virginica
6.1 2.6 5.6 1.4 virginica
7.7 3.0 6.1 2.3 virginica
6.3 3.4 5.6 2.4 virginica
6.4 3.1 5.5 1.8 virginica
6.0 3.0 4.8 1.8 virginica
6.9 3.1 5.4 2.1 virginica
6.7 3.1 5.6 2.4 virginica
6.9 3.1 5.1 2.3 virginica
5.8 2.7 5.1 1.9 virginica
6.8 3.2 5.9 2.3 virginica
6.7 3.3 5.7 2.5 virginica
6.7 3.0 5.2 2.3 virginica
6.3 2.5 5.0 1.9 virginica
6.5 3.0 5.2 2.0 virginica
6.2 3.4 5.4 2.3 virginica
5.9 3.0 5.1 1.8 virginica
iris %>% 
  slice_min(Sepal.Length)
iris data: row with the lowest sepal length
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
4.3 3 1.1 0.1 setosa
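
The remaining helpers can be sketched in the same way. The object names below are hypothetical, and set.seed() is only there so that the random draw can be repeated on the same machine.

```r
library(dplyr)

# First and last three rows; n has to be passed by name
first_three <- iris %>% slice_head(n = 3)
last_three  <- iris %>% slice_tail(n = 3)

# The two rows with the largest Sepal.Length;
# with_ties = FALSE caps the result at exactly n rows
longest_two <- iris %>%
  slice_max(Sepal.Length, n = 2, with_ties = FALSE)

# Five rows drawn at random
set.seed(1)
random_five <- iris %>% slice_sample(n = 5)
```

By default, slice_min() and slice_max() keep tied rows, so they can return more than n rows unless with_ties is set to FALSE.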

2.3.8 join

A disclaimer: there’s no verb (exactly) called join() in dplyr (at least, to date). However, there are two types of join verbs -

  • inner_join() and

  • outer joins (not a verb either, but a class of three verbs):

    • left_join(),

    • right_join() and

    • full_join().

Join verbs combine columns from two different data-frames based on a common key column.

The inner_join() verb joins two data-frames and retains only the rows whose keys match in both. This means that there is a potential loss of observations, which we may not appreciate in a real-life analysis.

On the other hand, if we have two data-frames x and y, the left_join() verb matches the keys from x and y, keeps all the rows from x, and joins the matched rows (based on the key column) from y. The empty cells (if any) are filled with NA values. The right_join() verb is the opposite scenario: it keeps all the rows from y. Finally, the full_join() verb retains all the rows from both data-frames, and empty cells are filled with NA values. Note that when a key value occurs more than once in both data-frames, as Species does in the examples below, every matching row of x is paired with every matching row of y. Let’s clear the concept with some examples -

# slice_sample() draws rows at random, so your rows will differ from the tables shown here.
x <- iris %>% 
  select(Sepal.Length, Sepal.Width, Species) %>% 
  filter(Species %in% c("setosa", "versicolor")) %>% 
  slice_sample(n = 10)

y <- iris %>% 
  select(Petal.Length, Petal.Width, Species) %>% 
  filter(Species %in% c("versicolor", "virginica")) %>% 
  slice_sample(n = 10)

x %>% 
  inner_join(y, by = "Species")
iris data: inner_join
Sepal.Length Sepal.Width Species Petal.Length Petal.Width
6.1 2.9 versicolor 4.3 1.3
6.1 2.9 versicolor 4.5 1.6
6.1 2.9 versicolor 4.0 1.3
6.1 2.9 versicolor 4.8 1.4
6.1 2.9 versicolor 3.3 1.0
6.1 2.9 versicolor 4.5 1.5
5.9 3.0 versicolor 4.3 1.3
5.9 3.0 versicolor 4.5 1.6
5.9 3.0 versicolor 4.0 1.3
5.9 3.0 versicolor 4.8 1.4
5.9 3.0 versicolor 3.3 1.0
5.9 3.0 versicolor 4.5 1.5
5.7 2.8 versicolor 4.3 1.3
5.7 2.8 versicolor 4.5 1.6
5.7 2.8 versicolor 4.0 1.3
5.7 2.8 versicolor 4.8 1.4
5.7 2.8 versicolor 3.3 1.0
5.7 2.8 versicolor 4.5 1.5
6.8 2.8 versicolor 4.3 1.3
6.8 2.8 versicolor 4.5 1.6
6.8 2.8 versicolor 4.0 1.3
6.8 2.8 versicolor 4.8 1.4
6.8 2.8 versicolor 3.3 1.0
6.8 2.8 versicolor 4.5 1.5
5.6 2.9 versicolor 4.3 1.3
5.6 2.9 versicolor 4.5 1.6
5.6 2.9 versicolor 4.0 1.3
5.6 2.9 versicolor 4.8 1.4
5.6 2.9 versicolor 3.3 1.0
5.6 2.9 versicolor 4.5 1.5
x %>% 
  left_join(y, by = "Species")
iris data: left_join
Sepal.Length Sepal.Width Species Petal.Length Petal.Width
5.0 3.5 setosa NA NA
5.2 4.1 setosa NA NA
6.1 2.9 versicolor 4.3 1.3
6.1 2.9 versicolor 4.5 1.6
6.1 2.9 versicolor 4.0 1.3
6.1 2.9 versicolor 4.8 1.4
6.1 2.9 versicolor 3.3 1.0
6.1 2.9 versicolor 4.5 1.5
5.4 3.7 setosa NA NA
4.6 3.1 setosa NA NA
5.0 3.6 setosa NA NA
5.9 3.0 versicolor 4.3 1.3
5.9 3.0 versicolor 4.5 1.6
5.9 3.0 versicolor 4.0 1.3
5.9 3.0 versicolor 4.8 1.4
5.9 3.0 versicolor 3.3 1.0
5.9 3.0 versicolor 4.5 1.5
5.7 2.8 versicolor 4.3 1.3
5.7 2.8 versicolor 4.5 1.6
5.7 2.8 versicolor 4.0 1.3
5.7 2.8 versicolor 4.8 1.4
5.7 2.8 versicolor 3.3 1.0
5.7 2.8 versicolor 4.5 1.5
6.8 2.8 versicolor 4.3 1.3
6.8 2.8 versicolor 4.5 1.6
6.8 2.8 versicolor 4.0 1.3
6.8 2.8 versicolor 4.8 1.4
6.8 2.8 versicolor 3.3 1.0
6.8 2.8 versicolor 4.5 1.5
5.6 2.9 versicolor 4.3 1.3
5.6 2.9 versicolor 4.5 1.6
5.6 2.9 versicolor 4.0 1.3
5.6 2.9 versicolor 4.8 1.4
5.6 2.9 versicolor 3.3 1.0
5.6 2.9 versicolor 4.5 1.5
x %>% 
  right_join(y, by = "Species")
iris data: right_join
Sepal.Length Sepal.Width Species Petal.Length Petal.Width
6.1 2.9 versicolor 4.3 1.3
6.1 2.9 versicolor 4.5 1.6
6.1 2.9 versicolor 4.0 1.3
6.1 2.9 versicolor 4.8 1.4
6.1 2.9 versicolor 3.3 1.0
6.1 2.9 versicolor 4.5 1.5
5.9 3.0 versicolor 4.3 1.3
5.9 3.0 versicolor 4.5 1.6
5.9 3.0 versicolor 4.0 1.3
5.9 3.0 versicolor 4.8 1.4
5.9 3.0 versicolor 3.3 1.0
5.9 3.0 versicolor 4.5 1.5
5.7 2.8 versicolor 4.3 1.3
5.7 2.8 versicolor 4.5 1.6
5.7 2.8 versicolor 4.0 1.3
5.7 2.8 versicolor 4.8 1.4
5.7 2.8 versicolor 3.3 1.0
5.7 2.8 versicolor 4.5 1.5
6.8 2.8 versicolor 4.3 1.3
6.8 2.8 versicolor 4.5 1.6
6.8 2.8 versicolor 4.0 1.3
6.8 2.8 versicolor 4.8 1.4
6.8 2.8 versicolor 3.3 1.0
6.8 2.8 versicolor 4.5 1.5
5.6 2.9 versicolor 4.3 1.3
5.6 2.9 versicolor 4.5 1.6
5.6 2.9 versicolor 4.0 1.3
5.6 2.9 versicolor 4.8 1.4
5.6 2.9 versicolor 3.3 1.0
5.6 2.9 versicolor 4.5 1.5
NA NA virginica 5.1 2.3
NA NA virginica 5.6 1.4
NA NA virginica 5.7 2.3
NA NA virginica 5.0 2.0
x %>% 
  full_join(y, by = "Species")
iris data: full_join
Sepal.Length Sepal.Width Species Petal.Length Petal.Width
5.0 3.5 setosa NA NA
5.2 4.1 setosa NA NA
6.1 2.9 versicolor 4.3 1.3
6.1 2.9 versicolor 4.5 1.6
6.1 2.9 versicolor 4.0 1.3
6.1 2.9 versicolor 4.8 1.4
6.1 2.9 versicolor 3.3 1.0
6.1 2.9 versicolor 4.5 1.5
5.4 3.7 setosa NA NA
4.6 3.1 setosa NA NA
5.0 3.6 setosa NA NA
5.9 3.0 versicolor 4.3 1.3
5.9 3.0 versicolor 4.5 1.6
5.9 3.0 versicolor 4.0 1.3
5.9 3.0 versicolor 4.8 1.4
5.9 3.0 versicolor 3.3 1.0
5.9 3.0 versicolor 4.5 1.5
5.7 2.8 versicolor 4.3 1.3
5.7 2.8 versicolor 4.5 1.6
5.7 2.8 versicolor 4.0 1.3
5.7 2.8 versicolor 4.8 1.4
5.7 2.8 versicolor 3.3 1.0
5.7 2.8 versicolor 4.5 1.5
6.8 2.8 versicolor 4.3 1.3
6.8 2.8 versicolor 4.5 1.6
6.8 2.8 versicolor 4.0 1.3
6.8 2.8 versicolor 4.8 1.4
6.8 2.8 versicolor 3.3 1.0
6.8 2.8 versicolor 4.5 1.5
5.6 2.9 versicolor 4.3 1.3
5.6 2.9 versicolor 4.5 1.6
5.6 2.9 versicolor 4.0 1.3
5.6 2.9 versicolor 4.8 1.4
5.6 2.9 versicolor 3.3 1.0
5.6 2.9 versicolor 4.5 1.5
NA NA virginica 5.1 2.3
NA NA virginica 5.6 1.4
NA NA virginica 5.7 2.3
NA NA virginica 5.0 2.0
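
Because Species is repeated many times in both x and y, the tables above are dominated by the row pairing. The difference between the four verbs is perhaps easier to see on a tiny example where each key occurs only once. The two data-frames below (x_small and y_small, with a key column id) are made up purely for this illustration.

```r
library(dplyr)

# Hypothetical toy data: ids 2 and 3 are shared, 1 is only in x, 4 only in y
x_small <- data.frame(id = c(1, 2, 3), x_val = c("a", "b", "c"))
y_small <- data.frame(id = c(2, 3, 4), y_val = c("B", "C", "D"))

inner <- inner_join(x_small, y_small, by = "id")  # shared ids only
left  <- left_join(x_small, y_small, by = "id")   # all ids of x_small
right <- right_join(x_small, y_small, by = "id")  # all ids of y_small
full  <- full_join(x_small, y_small, by = "id")   # ids of both

# Unmatched cells are filled with NA, e.g. y_val for id 1 in the left join
left
```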

2.3.9 group_by() and summarise()

I will describe the group_by() and summarise() verbs together to show the effect of the former. group_by() is the most important grouping verb in dplyr. It takes one or more variables of the data-frame to group by. On its own, it does not change how the data look; it only records the grouping for the verbs that follow -

iris %>% 
  group_by(Species)
iris data: group_by Species
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa
4.6 3.4 1.4 0.3 setosa
5.0 3.4 1.5 0.2 setosa
4.4 2.9 1.4 0.2 setosa
4.9 3.1 1.5 0.1 setosa
5.4 3.7 1.5 0.2 setosa
4.8 3.4 1.6 0.2 setosa
4.8 3.0 1.4 0.1 setosa
4.3 3.0 1.1 0.1 setosa
5.8 4.0 1.2 0.2 setosa
5.7 4.4 1.5 0.4 setosa
5.4 3.9 1.3 0.4 setosa
5.1 3.5 1.4 0.3 setosa
5.7 3.8 1.7 0.3 setosa
5.1 3.8 1.5 0.3 setosa
5.4 3.4 1.7 0.2 setosa
5.1 3.7 1.5 0.4 setosa
4.6 3.6 1.0 0.2 setosa
5.1 3.3 1.7 0.5 setosa
4.8 3.4 1.9 0.2 setosa
5.0 3.0 1.6 0.2 setosa
5.0 3.4 1.6 0.4 setosa
5.2 3.5 1.5 0.2 setosa
5.2 3.4 1.4 0.2 setosa
4.7 3.2 1.6 0.2 setosa
4.8 3.1 1.6 0.2 setosa
5.4 3.4 1.5 0.4 setosa
5.2 4.1 1.5 0.1 setosa
5.5 4.2 1.4 0.2 setosa
4.9 3.1 1.5 0.2 setosa
5.0 3.2 1.2 0.2 setosa
5.5 3.5 1.3 0.2 setosa
4.9 3.6 1.4 0.1 setosa
4.4 3.0 1.3 0.2 setosa
5.1 3.4 1.5 0.2 setosa
5.0 3.5 1.3 0.3 setosa
4.5 2.3 1.3 0.3 setosa
4.4 3.2 1.3 0.2 setosa
5.0 3.5 1.6 0.6 setosa
5.1 3.8 1.9 0.4 setosa
4.8 3.0 1.4 0.3 setosa
5.1 3.8 1.6 0.2 setosa
4.6 3.2 1.4 0.2 setosa
5.3 3.7 1.5 0.2 setosa
5.0 3.3 1.4 0.2 setosa
7.0 3.2 4.7 1.4 versicolor
6.4 3.2 4.5 1.5 versicolor
6.9 3.1 4.9 1.5 versicolor
5.5 2.3 4.0 1.3 versicolor
6.5 2.8 4.6 1.5 versicolor
5.7 2.8 4.5 1.3 versicolor
6.3 3.3 4.7 1.6 versicolor
4.9 2.4 3.3 1.0 versicolor
6.6 2.9 4.6 1.3 versicolor
5.2 2.7 3.9 1.4 versicolor
5.0 2.0 3.5 1.0 versicolor
5.9 3.0 4.2 1.5 versicolor
6.0 2.2 4.0 1.0 versicolor
6.1 2.9 4.7 1.4 versicolor
5.6 2.9 3.6 1.3 versicolor
6.7 3.1 4.4 1.4 versicolor
5.6 3.0 4.5 1.5 versicolor
5.8 2.7 4.1 1.0 versicolor
6.2 2.2 4.5 1.5 versicolor
5.6 2.5 3.9 1.1 versicolor
5.9 3.2 4.8 1.8 versicolor
6.1 2.8 4.0 1.3 versicolor
6.3 2.5 4.9 1.5 versicolor
6.1 2.8 4.7 1.2 versicolor
6.4 2.9 4.3 1.3 versicolor
6.6 3.0 4.4 1.4 versicolor
6.8 2.8 4.8 1.4 versicolor
6.7 3.0 5.0 1.7 versicolor
6.0 2.9 4.5 1.5 versicolor
5.7 2.6 3.5 1.0 versicolor
5.5 2.4 3.8 1.1 versicolor
5.5 2.4 3.7 1.0 versicolor
5.8 2.7 3.9 1.2 versicolor
6.0 2.7 5.1 1.6 versicolor
5.4 3.0 4.5 1.5 versicolor
6.0 3.4 4.5 1.6 versicolor
6.7 3.1 4.7 1.5 versicolor
6.3 2.3 4.4 1.3 versicolor
5.6 3.0 4.1 1.3 versicolor
5.5 2.5 4.0 1.3 versicolor
5.5 2.6 4.4 1.2 versicolor
6.1 3.0 4.6 1.4 versicolor
5.8 2.6 4.0 1.2 versicolor
5.0 2.3 3.3 1.0 versicolor
5.6 2.7 4.2 1.3 versicolor
5.7 3.0 4.2 1.2 versicolor
5.7 2.9 4.2 1.3 versicolor
6.2 2.9 4.3 1.3 versicolor
5.1 2.5 3.0 1.1 versicolor
5.7 2.8 4.1 1.3 versicolor
6.3 3.3 6.0 2.5 virginica
5.8 2.7 5.1 1.9 virginica
7.1 3.0 5.9 2.1 virginica
6.3 2.9 5.6 1.8 virginica
6.5 3.0 5.8 2.2 virginica
7.6 3.0 6.6 2.1 virginica
4.9 2.5 4.5 1.7 virginica
7.3 2.9 6.3 1.8 virginica
6.7 2.5 5.8 1.8 virginica
7.2 3.6 6.1 2.5 virginica
6.5 3.2 5.1 2.0 virginica
6.4 2.7 5.3 1.9 virginica
6.8 3.0 5.5 2.1 virginica
5.7 2.5 5.0 2.0 virginica
5.8 2.8 5.1 2.4 virginica
6.4 3.2 5.3 2.3 virginica
6.5 3.0 5.5 1.8 virginica
7.7 3.8 6.7 2.2 virginica
7.7 2.6 6.9 2.3 virginica
6.0 2.2 5.0 1.5 virginica
6.9 3.2 5.7 2.3 virginica
5.6 2.8 4.9 2.0 virginica
7.7 2.8 6.7 2.0 virginica
6.3 2.7 4.9 1.8 virginica
6.7 3.3 5.7 2.1 virginica
7.2 3.2 6.0 1.8 virginica
6.2 2.8 4.8 1.8 virginica
6.1 3.0 4.9 1.8 virginica
6.4 2.8 5.6 2.1 virginica
7.2 3.0 5.8 1.6 virginica
7.4 2.8 6.1 1.9 virginica
7.9 3.8 6.4 2.0 virginica
6.4 2.8 5.6 2.2 virginica
6.3 2.8 5.1 1.5 virginica
6.1 2.6 5.6 1.4 virginica
7.7 3.0 6.1 2.3 virginica
6.3 3.4 5.6 2.4 virginica
6.4 3.1 5.5 1.8 virginica
6.0 3.0 4.8 1.8 virginica
6.9 3.1 5.4 2.1 virginica
6.7 3.1 5.6 2.4 virginica
6.9 3.1 5.1 2.3 virginica
5.8 2.7 5.1 1.9 virginica
6.8 3.2 5.9 2.3 virginica
6.7 3.3 5.7 2.5 virginica
6.7 3.0 5.2 2.3 virginica
6.3 2.5 5.0 1.9 virginica
6.5 3.0 5.2 2.0 virginica
6.2 3.4 5.4 2.3 virginica
5.9 3.0 5.1 1.8 virginica


Other than some messages on the R console, you don’t see any change in the structure of the iris data-frame yet. Let’s select Sepal.Length and see the effect -

iris %>% 
  group_by(Species) %>% 
  select(Sepal.Length) 
Table: iris data: grouped by Species, with Sepal.Length selected
Species Sepal.Length
setosa 5.1
setosa 4.9
setosa 4.7
setosa 4.6
setosa 5.0
setosa 5.4
setosa 4.6
setosa 5.0
setosa 4.4
setosa 4.9
setosa 5.4
setosa 4.8
setosa 4.8
setosa 4.3
setosa 5.8
setosa 5.7
setosa 5.4
setosa 5.1
setosa 5.7
setosa 5.1
setosa 5.4
setosa 5.1
setosa 4.6
setosa 5.1
setosa 4.8
setosa 5.0
setosa 5.0
setosa 5.2
setosa 5.2
setosa 4.7
setosa 4.8
setosa 5.4
setosa 5.2
setosa 5.5
setosa 4.9
setosa 5.0
setosa 5.5
setosa 4.9
setosa 4.4
setosa 5.1
setosa 5.0
setosa 4.5
setosa 4.4
setosa 5.0
setosa 5.1
setosa 4.8
setosa 5.1
setosa 4.6
setosa 5.3
setosa 5.0
versicolor 7.0
versicolor 6.4
versicolor 6.9
versicolor 5.5
versicolor 6.5
versicolor 5.7
versicolor 6.3
versicolor 4.9
versicolor 6.6
versicolor 5.2
versicolor 5.0
versicolor 5.9
versicolor 6.0
versicolor 6.1
versicolor 5.6
versicolor 6.7
versicolor 5.6
versicolor 5.8
versicolor 6.2
versicolor 5.6
versicolor 5.9
versicolor 6.1
versicolor 6.3
versicolor 6.1
versicolor 6.4
versicolor 6.6
versicolor 6.8
versicolor 6.7
versicolor 6.0
versicolor 5.7
versicolor 5.5
versicolor 5.5
versicolor 5.8
versicolor 6.0
versicolor 5.4
versicolor 6.0
versicolor 6.7
versicolor 6.3
versicolor 5.6
versicolor 5.5
versicolor 5.5
versicolor 6.1
versicolor 5.8
versicolor 5.0
versicolor 5.6
versicolor 5.7
versicolor 5.7
versicolor 6.2
versicolor 5.1
versicolor 5.7
virginica 6.3
virginica 5.8
virginica 7.1
virginica 6.3
virginica 6.5
virginica 7.6
virginica 4.9
virginica 7.3
virginica 6.7
virginica 7.2
virginica 6.5
virginica 6.4
virginica 6.8
virginica 5.7
virginica 5.8
virginica 6.4
virginica 6.5
virginica 7.7
virginica 7.7
virginica 6.0
virginica 6.9
virginica 5.6
virginica 7.7
virginica 6.3
virginica 6.7
virginica 7.2
virginica 6.2
virginica 6.1
virginica 6.4
virginica 7.2
virginica 7.4
virginica 7.9
virginica 6.4
virginica 6.3
virginica 6.1
virginica 7.7
virginica 6.3
virginica 6.4
virginica 6.0
virginica 6.9
virginica 6.7
virginica 6.9
virginica 5.8
virginica 6.8
virginica 6.7
virginica 6.7
virginica 6.3
virginica 6.5
virginica 6.2
virginica 5.9


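Though I selected only Sepal.Length, the Species column also appears. That’s because we applied the group_by() verb beforehand: select() always keeps the grouping variables (and tells you so with an "Adding missing grouping variables" message on the console). But the most dramatic effect can be seen in conjunction with the summarise() verb.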
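
If you want to convince yourself that the grouping is really there, here is a minimal sketch. It inspects the grouping metadata with group_vars() and n_groups(), and shows that calling ungroup() before select() gets rid of the extra Species column. The object names grouped and ungrouped are mine, chosen only for illustration.

```r
library(dplyr)

# group_by() does not change the rows; it only attaches grouping metadata
grouped <- iris %>% 
  group_by(Species)

group_vars(grouped)     # name(s) of the grouping variable(s)
n_groups(grouped)       # number of groups
is_grouped_df(grouped)  # is this a grouped data-frame?

# select() keeps the grouping variable, so two columns come back
grouped %>% 
  select(Sepal.Length) %>% 
  names()

# ungroup() removes the grouping, so select() returns only what we asked for
ungrouped <- grouped %>% 
  ungroup() %>% 
  select(Sepal.Length)
names(ungrouped)
```

So, whenever a grouping variable shows up in the output uninvited, an ungroup() at the right place in the pipe is usually what you need.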

summarise() generates a new data-frame and returns one row (with the result of course) for each combination of grouping variables. In the case of no grouping variables, the output has a single row summarising all observations in the input. Now, let’s see the effect of group_by() in conjunction with summarise() verb -

iris %>% 
  group_by(Species) %>% 
  select(Sepal.Length) %>% 
  summarise(count=n())
Table: iris data: summarised count by Species
Species count
setosa 50
versicolor 50
virginica 50

iris %>% 
  group_by(Species) %>% 
  select(Sepal.Length) %>% 
  summarise(mean_Sepal_length=mean(Sepal.Length))
Table: iris data: summarised mean Sepal.Length by Species
Species mean_Sepal_length
setosa 5.006
versicolor 5.936
virginica 6.588

# However, without any grouping -
iris %>% 
  select(Sepal.Length) %>% 
  summarise(mean_Sepal_length=mean(Sepal.Length))
Table: iris data: summarised mean Sepal.Length without grouping
mean_Sepal_length
5.843333
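
To illustrate the "one row for each combination of grouping variables" part, here is a minimal sketch with two grouping variables and several summaries in a single summarise() call. It uses the built-in mtcars dataset instead of iris (my choice, because iris has only one categorical column), the name by_cyl_am is just for illustration, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Group by number of cylinders (cyl) and transmission type (am),
# then compute several summaries per group in one go
by_cyl_am <- mtcars %>% 
  group_by(cyl, am) %>% 
  summarise(
    count    = n(),        # number of cars in the group
    mean_mpg = mean(mpg),  # mean miles per gallon
    sd_mpg   = sd(mpg),    # standard deviation of miles per gallon
    .groups  = "drop"      # return an ungrouped data-frame
  )

by_cyl_am
```

The result has one row per (cyl, am) pair that actually occurs in the data, and the count column adds up to the number of rows in mtcars.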

2.4 Exercise

Now, it’s time for a mini exercise:

  1. Install the package called gapminder, which contains a dataset of the same name. For each continent, calculate the mean of life expectancy at birth using the data collected from 2002 onwards (2002 included). The answer will look like below -
Table 2.1: gapminder data: summarised mean of life expectancy by continent
continent mean_LE
Oceania 80.22975
Europe 77.17460
Americas 73.01508
Asia 69.98118
Africa 54.06563
  2. Do the same for each country (instead of continent) and print the top 10 countries by life expectancy at birth. The result will look like this -
Table 2.2: gapminder data: summarised mean of life expectancy of top 10 countries
country mean_LE
Japan 82.3015
Hong Kong, China 81.8515
Switzerland 81.1605
Iceland 81.1285
Australia 80.8025
Sweden 80.4620
Italy 80.3930
Spain 80.3605
Israel 80.2205
Canada 80.2115