In this article we will learn how to do full outer join in R for two tables using merge() command.


Theory

Did you ever come across a problem of merging two tables together but including all matching and non-matching rows in both tables?

For example, we are looking at a sales report and all we see is “customer_id” and “sale_amount”.

This report will include only those customer IDs that had at least one sale transaction.

What about getting a full list out?

Sounds like you faced this issue before!

It’s time to perform a full outer join in R!

Below I will show an example of the usage of popular R base command merge().



Application

Below are the steps we are going to take to make sure we do master the skill of doing full outer join in R:

  1. Basic merge() command description
  2. Loading the sales.csv and locations.csv files into R
  3. Performing full outer join in R for these two tables



Part 1. Basic merge() command description

The short theoretical explanation of the function is the following:

merge(x, y, by, by.x, by.y, sort=TRUE)

Here “x” and “y” are tables that we will be merging together.

“by”, “by.x”, and “by.y” allows us to specify what we will be merging by.

You can learn more about merge() command here.



Part 2. Loading the sales.csv and locations.csv files into R

For the purpose of this tutorial, I created two sample .csv datasets that you can use to practice doing full outer join in R.

The files can be downloaded here: sales.csv, locations.csv

Now let’s go ahead and load it into R using the following commands:


sales<-read.csv("/Users/DataSharkie/Desktop/sales.csv")
locations<-read.csv("/Users/DataSharkie/Desktop/locations.csv")

Note: remember that your location of the file will be different from mine, so adjust the code accordingly!

If you would like to learn more on how to import .csv files into R you can find an article here.

Once the datasets are imported, we can view the table using the following command:


View(sales)
View(locations)

After you familiarized yourself with the two tables, it's time to do full outer join in R!



Part 3. Performing full outer join in R for these two tables

Our goal here is to create a new table "outer_join", where we will have all the entries with matching and non-matching rows from both tables.

By default, the merge() command in R performs an inner join, so we will need additional specification in terms of identifying the parameters that we will be merging by.

At this point we are all set to do an inner join in R.

You can do it using the following code:


outer_join<-merge(sales, locations, by = "customer_id", all = TRUE)

We are merging two tables based on the "customer_id" column and setting "all=TRUE" makes sure that doesn't matter if the two tables match or not match for a particular customer ID, it will still be added to the new table.

You can take a look at your new table by using this command:


View(outer_join)



If you are interested to learn more about different types of merging in R, you can find more articles here.