In this article we will learn how to do inner join in R for two tables using merge() command.


Theory

Did you ever come across a problem of merging two tables together but only for matching rows in both tables?

For example, we are looking at a sales report and all we see is “customer_id” and “sale_amount”.

We want to find out which location it was shipped to, but that data with customer locations is stored in another table.

Sounds quite common right?

It’s time to perform an inner join in R!

Below I will show an example of the usage of popular R base command merge().



Application

Below are the steps we are going to take to make sure we do master the skill of doing inner join in R:

  1. Basic merge() command description
  2. Loading the sales.csv and locations.csv files into R
  3. Performing inner join in R for these two tables



Part 1. Basic merge() command description

The short theoretical explanation of the function is the following:

merge(x, y, by, by.x, by.y, sort=TRUE)

Here “x” and “y” are tables that we will be merging together.

“by”, “by.x”, and “by.y” allows us to specify what we will be merging by.

You can learn more about merge() command here.



Part 2. Loading the sales.csv and locations.csv files into R

For the purpose of this tutorial, I created two sample .csv datasets that you can use to practice doing inner join in R.

The files can be downloaded here: sales.csv, locations.csv

Now let’s go ahead and load it into R using the following commands:


sales<-read.csv("/Users/DataSharkie/Desktop/sales.csv")
locations<-read.csv("/Users/DataSharkie/Desktop/locations.csv")

Note: remember that your location of the file will be different from mine, so adjust the code accordingly!

If you would like to learn more on how to import .csv files into R you can find an article here.

Once the datasets are imported, we can view the table using the following command:


View(sales)
View(locations)

After you familiarized yourself with the two tables, it's time to do inner join in R!



Part 3. Performing inner join in R for these two tables

Our goal here is to create a new table "inner_join", where we will only have the entries with matching rows from both tables.

By default, the merge() command in R performs an inner join, so there is no need for any additional specification in terms of identifying the parameters that we will be merging by.

At this point we are all set to do an inner join in R.

You can do it using the following code:


inner_join<-merge(sales, locations)

You can take a look at your new table by using this command:



If you are interested to learn more about different types of merging in R, you can find more articles here.