In this article we will learn how to do right outer join in R for two tables using merge() command.


Theory

Did you ever come across a problem of merging two tables together but only for matching rows in “right” table?

For example, we are looking at a sales report and all we see is “customer_id” and “sale_amount”.

We want to find out which location it was shipped to, but that data with customer locations is stored in another table.

The issue is: you have 8 customers who made a purchase based on the sales report. Yet your complete list of customers’ locations includes 14 individual customers.

You care about merging the list of all customers with the sales report (regardless if the customer made a purchase or not) and to have all customer IDs listed.

Sounds quite common right?

It’s time to perform a right outer join in R!

Below I will show an example of the usage of popular R base command merge().



Application

Below are the steps we are going to take to make sure we do master the skill of doing right outer join in R:

  1. Basic merge() command description
  2. Loading the sales.csv and locations.csv files into R
  3. Performing right outer join in R for these two tables



Part 1. Basic merge() command description

The short theoretical explanation of the function is the following:

merge(x, y, by, by.x, by.y, sort=TRUE)

Here “x” and “y” are tables that we will be merging together.

“by”, “by.x”, and “by.y” allows us to specify what we will be merging by.

You can learn more about merge() command here.



Part 2. Loading the sales.csv and locations.csv into R

For the purpose of this tutorial, I created two sample .csv datasets that you can use to practice doing right outer join in R.

The files can be downloaded here: sales.csv, locations.csv

Now let’s go ahead and load it into R using the following commands:


sales<-read.csv("/Users/DataSharkie/Desktop/sales.csv")
locations<-read.csv("/Users/DataSharkie/Desktop/locations.csv")

Note: remember that your location of the file will be different from mine, so adjust the code accordingly!

If you would like to learn more on how to import .csv files into R you can find an article here.

Once the datasets are imported, we can view the table using the following command:


View(sales)
View(locations)

After you familiarized yourself with the two tables, it's time to do right outer join in R!



Part 3. Performing right outer join in R for these two tables

Our goal here is to create a new table "right_join", where we will only have the entries with matching rows from the "right" table.

By default, the merge() command in R performs an inner join, so we will need additional specification in terms of identifying the parameters that we will be merging by.

At this point we are all set to do the right outer join in R.

You can do it using the following code:


right_join<-merge(sales, locations, by = "customer_id", all.y = TRUE)

We are merging two tables based on the "customer_id" column and setting "all.y=TRUE" makes sure that only those rows from the "left" table that match with rows in the "right" table will be added to the new table.

You can take a look at your new table by using this command:


View(right_join)


If you are interested to learn more about different types of merging in R, you can find more articles here.