In this article we will learn how to remove columns from a data frame in R using the select() command from the dplyr package.


Theory

It is often the case, when importing data into R, that our data frame of interest will have a large number of columns.

But assume we only need some of them for our statistical analysis.

One way around this problem is to select (keep) only the columns we need. This works well when you keep a handful of columns, say 1 to 4, because there is little to type out.

But what do you do when your data frame has 11 columns but you need 10 of them?

Clearly, typing out each of the 10 is quite time-consuming. There has to be a better way!

Instead of selecting 10 of the 11 columns, why don't we just remove the 1 column we don't need?

Sounds way more efficient to me!

R has a solution for this! In this article I show an applied example of how to remove a column from a data frame in R.



Application

Below are the steps we will take to master removing columns from a data frame in R:

  1. Basic select() command description
  2. Loading sample dataset: mtcars
  3. Removing columns from a data frame in R



Part 1. Basic select() command description

The select() function comes from the dplyr package, so dplyr must be installed and loaded with library(dplyr) before you can use it. Its basic form is the following:

select(data, column1, column2, …)

Here, “data” is the data frame you are working with, and “column1”, “column2”, and so on are the names of the columns you would like to keep (you can select one column or several).
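
As a minimal sketch of the pattern above, assuming the dplyr package is installed, the following keeps two columns of the built-in mtcars dataset. The name "kept" is just an illustrative choice.

```r
# select() is provided by dplyr, so load it first
library(dplyr)

# Keep only the mpg and wt columns of the built-in mtcars data
kept <- select(mtcars, mpg, wt)

names(kept)  # "mpg" "wt"
dim(kept)    # 32 2
```

All 32 rows are still there; only the set of columns has changed.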



Part 2. Loading sample dataset: mtcars

For the purposes of this article, I will be working with one of R's built-in datasets, “mtcars”.

This dataset provides observations on 32 cars across 11 variables (weight, fuel efficiency, engine, and so on).

I prefer to call the data I work with “mydata”, so here is the command you would use for that:


mydata <- mtcars

Note: in this article I work with a built-in dataset. If you have your own data in a CSV or Excel file, you can follow the same procedure once it is loaded into a data frame.
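
For a dataset of your own, a rough sketch could look like the one below. To keep it runnable anywhere, it first writes mtcars to a temporary CSV file and then reads it back; in practice you would replace the temporary path with the path to your own file. The file and its layout (row names in the first column) are assumptions made for illustration only.

```r
# Create a stand-in CSV file (assumption: your real file already exists)
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path)

# Read it into a data frame; row.names = 1 uses the first column as row names
mydata <- read.csv(csv_path, row.names = 1)

dim(mydata)  # 32 11
```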

Now, let's take a look at our dataset:


View(mydata)

We see a 32x11 table with a lot of numbers.
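
View() opens a spreadsheet-style viewer and is mostly useful in an interactive session such as RStudio. If you prefer to check the data in the console, a few base R functions give the same information. This is only one possible way to do it.

```r
mydata <- mtcars

dim(mydata)    # 32 11 (rows, columns)
names(mydata)  # column names, starting with "mpg"
head(mydata)   # first six rows
```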

Again, if we wanted to just keep 1 or 2 columns (for example), we could just select the ones we want.

Now assume I want to keep every column except "mpg", which shows the car's fuel efficiency. That leaves 10 of the 11 columns.



Part 3. Removing columns from a data frame in R

At this point we have decided which column we want to drop from the data frame.

You may wonder why we are using the select() command to drop a column, and it's an important point to mention.

The name of the command suggests "selecting", not "removing".

Inside select(), a minus sign ("-") in front of a column name means "exclude this column", or in plain words, "drop".

In simple terms, we select everything but "drop" the column we don't want to keep.

Let's go ahead and remove a column from the data frame in R!

You can do it using the following code:


library(dplyr)  # select() comes from the dplyr package
mydata <- select(mydata, -mpg)

And let's take a look at the edited data frame:


View(mydata)

Recall: before it was a 32x11 table, and now it's 32x10.
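
The same check can be done in code. The sketch below, which assumes dplyr is installed, starts again from a fresh copy of mtcars, drops mpg, and confirms both the new size and that the result has the same columns as typing out the other 10 by hand. The names "dropped" and "typed_out" are illustrative.

```r
library(dplyr)

mydata <- mtcars
dropped <- select(mydata, -mpg)

# The long way: list the 10 columns we keep
typed_out <- select(mydata, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb)

dim(dropped)                                 # 32 10
"mpg" %in% names(dropped)                    # FALSE
identical(names(dropped), names(typed_out))  # TRUE
```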

We have successfully removed a column from a data frame in R!
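
The same idea extends to several columns at once. As a hedged sketch (assuming dplyr is installed), each unwanted column can get its own minus sign, or the columns can be wrapped in c() with a single minus in front. Dropping cyl alongside mpg here is just an example choice.

```r
library(dplyr)

# Two equivalent ways to drop both mpg and cyl
drop_a <- select(mtcars, -mpg, -cyl)
drop_b <- select(mtcars, -c(mpg, cyl))

ncol(drop_a)                             # 9
identical(names(drop_a), names(drop_b))  # TRUE
```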


If you liked this article, I encourage you to take a look at the Data Manipulation in R section where you will find a lot of useful information and master the skill of data wrangling.