In this article we will learn how to sort the rows of a dataframe in R using the arrange() function from the dplyr package.



Theory

Once we import the dataset into R, we often want to do some preliminary analysis.

Aside from descriptive statistics, certain datasets need to be ranked or sorted by a variable of interest.

Depending on the case and the data we are working with, sorting the dataset lets us see the top (or bottom) observations and helps answer many business questions.

Think, for example, of working as a consultant for a chain of food stores. The company has 100 stores, with data on annual sales for each of them.

One of the first things that comes to a reporting analyst's mind is finding the top-performing and worst-performing stores.

Sure, you could go through 100 rows manually, but what if there were 1,000?

This is where the skill of being able to sort by variables comes in handy!

Whether it’s a simple case or a more complex one, sorting your data makes it much easier to draw insights and make data-driven strategic decisions.
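To make the store scenario concrete, here is a minimal sketch with simulated data. The `stores` data frame, its `store_id` and `annual_sales` columns, and the sales figures are all hypothetical, generated at random purely for illustration.

```r
library(dplyr)

# Hypothetical data: 100 stores with randomly simulated annual sales
set.seed(1)
stores <- data.frame(
  store_id = 1:100,
  annual_sales = round(runif(100, min = 500000, max = 2000000))
)

# Sort from highest to lowest annual sales
stores_sorted <- arrange(stores, desc(annual_sales))

# Top 5 and bottom 5 performers
head(stores_sorted, 5)
tail(stores_sorted, 5)
```

Once the rows are sorted, head() and tail() are enough to pull out the best and worst performers, no matter how many stores there are.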



Application

Below are the steps we are going to take to master sorting a dataframe by variables in R:

  1. Installing dplyr package
  2. Basic arrange() command description
  3. Loading sample dataset: mtcars
  4. Sort dataframe by single variable in R
  5. Sort dataframe by multiple variables in R



Part 1. Installing dplyr package

As arrange() is not built into base R, we will need to install an additional package in order to use it to sort a dataframe by variable.

We turn once again to the popular dplyr package, which has exactly what we need to perform this type of data manipulation.

To install the package and load it into your R (RStudio) environment, use the following code:


install.packages("dplyr")  # install the package (only needed once)
library(dplyr)             # load it (needed in every new R session)

Awesome! The package is now installed and added to the environment.
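If you rerun your script often, reinstalling dplyr every time is unnecessary. A common idiom, sketched below, installs the package only when it is missing; requireNamespace() and packageVersion() are standard base R functions, not part of dplyr.

```r
# Install dplyr only if it is not already available
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}

# Attach the package for the current session
library(dplyr)

# Optional: check which version of dplyr is installed
packageVersion("dplyr")
```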

The next step is to discuss the arrange() function in more detail to see its capabilities.



Part 2. Basic arrange() command description

Briefly, the basic syntax of the function is the following:

arrange(data, variables)

Here, “data” refers to the dataset you are going to sort, and “variables” refers to a list of one or more variables that you would like to sort by.

By default, arrange() sorts in ascending order. To sort in descending order, wrap the variable in desc(), i.e. use desc(variable) in the “variables” component.
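As a quick illustration of this syntax, the sketch below sorts a tiny hypothetical data frame (`toy`, with made-up columns `name` and `score`), first in the default ascending order and then in descending order with desc().

```r
library(dplyr)

# Hypothetical toy data, used only to illustrate the syntax
toy <- data.frame(
  name = c("A", "B", "C", "D"),
  score = c(3, 1, 4, 2)
)

# Default: ascending order by score (B, D, A, C)
arrange(toy, score)

# Descending order: wrap the variable in desc() (C, A, D, B)
arrange(toy, desc(score))
```

The same calls can also be written with the pipe operator that dplyr re-exports, e.g. `toy %>% arrange(desc(score))`.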

I will provide several examples later in the article.

Now let’s prepare our dataset and get started with applying the arrange() function in R.



Part 3. Loading sample dataset: mtcars

Similar to the majority of my articles and for simplicity, we will be working with one of the datasets already built into R.

If you would like to import your own dataset and follow the procedure described in this article, you are more than welcome to do so!

The dataset I will be using in this article is mtcars, the same one I used in the article on how to filter by value in R.

It contains observations on 32 cars across 11 variables (weight, fuel efficiency, engine, and so on).

As I always do, I call the dataset I’m working with in RStudio “mydata”. Let’s go ahead and add mtcars to the environment.


mydata<-mtcars

You can take a look at your dataset using the following code:


View(mydata)
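View() opens a spreadsheet-style viewer, which works well in RStudio but is less convenient in a plain R console or a script. As an alternative sketch, base R's dim(), head() and str() print a quick summary directly in the console.

```r
mydata <- mtcars

# Number of rows and columns: 32 cars, 11 variables
dim(mydata)

# First six rows
head(mydata)

# Column names and types
str(mydata)
```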

The dataset is added and ready. Now let's get into examples of sorting in R!



Part 4. Sort dataframe by single variable in R

We will start off with a simple example of sorting by one variable and then move on to more complex ones.

Recall that our dataset contains observations on cars.

One interesting variable to use for the sorting example is fuel efficiency, which is stored in the data frame as "mpg" (miles per US gallon).

Example 1.1: Sorting dataframe in R by single variable in ascending order

Let's go ahead and sort by the "mpg" variable.

I will store the result of each sort in a separate data frame so that it is easier for us to compare the results of the different examples.

I will call the new data frame with results "mpg_asc".

You can do the sorting using the following code:


mpg_asc<-arrange(mydata, mpg)

When you look at the resulting data frame (using the View(mpg_asc) command), you will see the result of the sort we just did.

You will also notice that, by default, R sorted by "mpg" in ascending order.

This is simply the default behavior of the arrange() function.
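Instead of checking the order by eye, you can also confirm it programmatically. The sketch below uses base R's is.unsorted(), which returns FALSE when a vector is in non-decreasing order.

```r
library(dplyr)

mydata <- mtcars
mpg_asc <- arrange(mydata, mpg)

# Lowest-mpg cars first
head(mpg_asc[, c("mpg", "wt")])

# FALSE means mpg never decreases from one row to the next
is.unsorted(mpg_asc$mpg)
```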

Example 1.2: Sorting dataframe in R by single variable in descending order

However, if you would like to sort in descending order, wrap the variable in desc() inside the arrange() call and you will get the desired result.

Let's see it in practice!

The new results data frame will be called "mpg_desc".

You can do the sorting using the following code:


mpg_desc<-arrange(mydata, desc(mpg))

Now, looking at the new results (using View(mpg_desc)) you will notice that the data is sorted in descending order by "mpg".
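For comparison, the sketch below shows two standard alternatives that produce the same descending order of "mpg": sorting on the negated value (-mpg) inside arrange(), which works for numeric columns, and base R's order() applied to the negated column.

```r
library(dplyr)

mydata <- mtcars

# dplyr: descending order with desc()
mpg_desc <- arrange(mydata, desc(mpg))

# dplyr: for numeric columns, negating the variable also works
mpg_desc_neg <- arrange(mydata, -mpg)

# Base R: order() on the negated column, then subset the rows
mpg_desc_base <- mydata[order(-mydata$mpg), ]

head(mpg_desc[, c("mpg", "wt")])
```

desc() is the more general choice, since negation only makes sense for numeric variables.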

Quite useful, right?

But more interesting and complex sorting in R is coming in the next section!



Part 5. Sort dataframe by multiple variables in R

Going back to our original dataset, recall that we were sorting by fuel efficiency "mpg".

Let's now take a step further and add the second sorting variable "wt" which represents the weight of a car (in thousands of pounds).

Example 2.1: Sorting dataframe in R by multiple variables in ascending order

It will be a very similar procedure as in Example 1.1 with the only difference being the addition of a new variable.

The new results data frame will be called "mpg_asc_wt_asc".

You can do the sorting using the following code:


mpg_asc_wt_asc<-arrange(mydata, mpg, wt)

The logic behind this type of sorting is that the dataframe is first sorted by "mpg", and "wt" is then used to order rows that share the same "mpg" value.

This means that "mpg" is the primary sorting variable and "wt" only acts as a tie-breaker.

To see what I mean, take a look at the new dataset using View(mpg_asc_wt_asc). In rows 5 and 6 you will notice that "mpg" in row 6 > "mpg" in row 5, while "wt" in row 6 < "wt" in row 5.

This happens because the sort is driven primarily by "mpg" (in ascending order); "wt" (also in ascending order) only decides the order of cars that have the same "mpg" value.
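The role of the second variable is easiest to see where "mpg" is tied. In mtcars, two cars have exactly 15.2 mpg, and they land in rows 7 and 8 of both sorted data frames. The sketch below prints those rows with and without "wt"; only the second call guarantees that the lighter car comes first.

```r
library(dplyr)

mydata <- mtcars

# Sort by mpg only, then by mpg with wt as a tie-breaker
mpg_only <- arrange(mydata, mpg)
mpg_asc_wt_asc <- arrange(mydata, mpg, wt)

# Rows 7 and 8 share mpg = 15.2 in both results
mpg_only[7:8, c("mpg", "wt")]
mpg_asc_wt_asc[7:8, c("mpg", "wt")]
```

In the second result the lighter car (3.435 thousand lbs) comes before the heavier one (3.780). The base R equivalent of this two-variable sort would be `mydata[order(mydata$mpg, mydata$wt), ]`.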

This example is relatively simple and intuitive.

Next, let's try to reverse the order of the observations.

Example 2.2: Sorting dataframe in R by multiple variables in descending order

The goal of this sort is similar to the previous example; we are just going to reverse the order.

It is also very similar in terms of code. All we need to do is add desc() to both variables and do the sorting.

You can do it using the following code:


mpg_desc_wt_desc<-arrange(mydata, desc(mpg), desc(wt))

In this case, the function first sorts the dataframe by "mpg" in descending order, and then sorts cars with equal "mpg" by "wt", also in descending order.

If you take a look at the new dataframe using View(mpg_desc_wt_desc), you will notice that it's just the reverse order of mpg_asc_wt_asc.
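The two data frames are exact mirror images here because no two cars in mtcars share both the same "mpg" and the same "wt"; if some rows were fully tied, their relative order would not necessarily flip. The sketch below checks both facts with base R.

```r
library(dplyr)

mydata <- mtcars
mpg_asc_wt_asc <- arrange(mydata, mpg, wt)
mpg_desc_wt_desc <- arrange(mydata, desc(mpg), desc(wt))

# 0 means no two rows share the same (mpg, wt) pair
anyDuplicated(mydata[, c("mpg", "wt")])

# TRUE if every column of the descending result is the ascending column reversed
all(mapply(identical, mpg_desc_wt_desc, lapply(mpg_asc_wt_asc, rev)))
```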

Remember that you can sort a dataframe in R by more than two variables (I use two here just to keep the examples simple).
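For instance, a sketch with three sorting variables might group the cars by number of cylinders ("cyl"), then order each group from most to least fuel-efficient, and finally break any remaining ties by weight. This particular combination of variables is only an illustration.

```r
library(dplyr)

mydata <- mtcars

# cyl ascending, then mpg descending within each cyl, then wt ascending
cyl_mpg_wt <- arrange(mydata, cyl, desc(mpg), wt)

head(cyl_mpg_wt[, c("cyl", "mpg", "wt")], 10)
```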

Lastly, let's take a look at how to sort a dataframe by one variable in ascending order and another variable in descending order.

Example 2.3: Sorting dataframe in R by multiple variables in ascending and descending orders

This is an extension of the previous two examples where we either sorted by variables both being in ascending order or in descending order.

Now we will sort by one variable in ascending order and the other variable in descending order.

Depending on your dataset, this can be a very useful way to take a preliminary look at the data.

You can do it using the following code:


mpg_asc_wt_desc<-arrange(mydata, mpg, desc(wt))

Looking at the new dataframe using View(mpg_asc_wt_desc), we see that the cars with the lowest "mpg" appear at the top of the dataframe and that, among cars with the same "mpg", the heaviest comes first.
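To check this without opening the viewer, the sketch below prints the first rows of the result. Two cars are tied at 10.4 mpg, and desc(wt) places the heavier of them (5.424 thousand lbs) first.

```r
library(dplyr)

mydata <- mtcars
mpg_asc_wt_desc <- arrange(mydata, mpg, desc(wt))

# Lowest mpg first; among equal mpg values, heaviest car first
head(mpg_asc_wt_desc[, c("mpg", "wt")], 3)
```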

It's quite intuitive, isn't it?

The heavier the car, the more fuel it tends to consume.

This is an example of a very simple data insight that could be extracted using simple sorting by multiple variables in R.
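Sorting only hints at this relationship. One quick way to put a number on it, sketched below with base R's cor(), is the Pearson correlation between "mpg" and "wt". In mtcars it is strongly negative (roughly -0.87): heavier cars tend to have lower fuel efficiency.

```r
mydata <- mtcars

# Pearson correlation between fuel efficiency and weight
round(cor(mydata$mpg, mydata$wt), 2)
```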


This concludes our article on how to sort a dataframe in R. You can learn more about various data manipulation techniques in the Data Manipulation section.