In this article we will learn how to select columns from a data frame in R using the select() function from the dplyr package.
Theory
It is often the case, when importing data into R, that our data frame of interest will have a large number of columns.
But assume we only need one or two of them for our statistical analysis.
Naming the columns to keep is most convenient when you want only a handful of them, say one to four, because each name has to be typed out.
In this article I show an applied example of how to select a column from a data frame in R.
Application
Below are the steps we are going to take to master the skill of selecting columns from a data frame in R:
- Basic select() command description
- Loading sample dataset: mtcars
- Selecting columns from data frame in R
Part 1. Basic select() command description
The basic syntax of the function is the following:
select(data, column1, column2, …)
Here, “data” is the data frame you are working with, and “column1”, “column2”, and so on are the names of the columns you would like to keep (note: you can select more than one column). Because select() comes from the dplyr package, dplyr has to be installed and loaded with library(dplyr) before you call it.
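As a minimal sketch of this syntax, assuming the dplyr package is already installed, the call below keeps two columns of the built-in mtcars dataset. The name two_cols is only illustrative and does not appear elsewhere in this article.

```r
# Assumes dplyr is installed; if not, run install.packages("dplyr") first
library(dplyr)

# Keep two columns of the built-in mtcars data frame, in the order listed
two_cols <- select(mtcars, mpg, wt)

names(two_cols)  # column names of the result
dim(two_cols)    # number of rows and columns
```

The columns come back in the order they are listed in the call, so select() can also be used to reorder columns.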
Part 2. Loading sample dataset: mtcars
For the purposes of this article, I will be working with one of R's built-in datasets, “mtcars”.
This dataset provides observations on 32 cars across 11 variables (weight, fuel efficiency, engine, and so on).
I prefer to call the data I work with “mydata”, so here is the command you would use for that:
mydata <- mtcars
Note: in this article I work with a prebuilt dataset. If you have your own data in a CSV or Excel file, you can import it into a data frame and follow the same procedure to arrive at the result.
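For the CSV case, a rough sketch might look like the following. To keep it self-contained, it first writes mtcars to a temporary file and then reads it back; with your own data you would pass the path of your file to read.csv() instead. The names csv_path and mydata_from_csv are made up for this illustration.

```r
# Write mtcars to a temporary CSV file so the example runs on its own
# (row.names = FALSE means the car names are not written to the file)
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)

# Read the CSV into a data frame, as you would with your own file
mydata_from_csv <- read.csv(csv_path)

dim(mydata_from_csv)  # number of rows and columns that were read
```

Once the data is in a data frame, select() works on it in the same way as on mtcars.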
Now, let's take a look at our dataset:
View(mydata)
We see a 32x11 table with a lot of numbers.
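View() opens a spreadsheet-style viewer, which suits an interactive session. If you would rather check the data in the console, one possible alternative is to use base R functions such as dim(), names() and head(), as in this small sketch:

```r
mydata <- mtcars

dim(mydata)      # number of rows and columns
names(mydata)    # names of all columns
head(mydata, 3)  # first three rows
```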
Note: if we wanted to keep 10 of the 11 columns, it would be much easier to reverse the process and remove the one column we don't need, so that the remaining 10 stay in the data frame.
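As an illustration of that reverse approach, and assuming dplyr is installed, putting a minus sign in front of a column name tells select() to drop that column and keep the rest. The choice of the “carb” column and the name ten_cols are arbitrary here.

```r
library(dplyr)

# Drop the "carb" column; the other 10 columns stay in the data frame
ten_cols <- select(mtcars, -carb)

dim(ten_cols)  # number of rows and columns after dropping one column
```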
Assume I want to keep only one of the 11 columns, "mpg", which shows each car's fuel efficiency.
Part 3. Selecting columns from data frame in R
At this point we have decided which column we want to keep from the data frame.
In simple terms, the select() command "keeps" the columns we choose; equivalently, we can say that it "drops" the columns we didn't choose to keep.
Let's go ahead and select a column from the data frame in R!
You can do it using the following code:
library(dplyr)  # select() comes from the dplyr package
mydata <- select(mydata, mpg)
And let's take a look at the edited data frame:
View(mydata)
Recall: before it was a 32x11 table, and now it's 32x1.
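To confirm the new shape without opening the viewer, the whole procedure can be sketched from start to finish as below, again assuming dplyr is installed:

```r
library(dplyr)

mydata <- mtcars                # start from the full 32x11 dataset
mydata <- select(mydata, mpg)   # keep only the mpg column

dim(mydata)            # number of rows and columns
is.data.frame(mydata)  # the result is still a data frame, not a plain vector
head(mydata)           # first few rows of the remaining column
```

Because the result is assigned back to mydata, the other ten columns are no longer part of that object. Assigning the result to a new name instead would leave the original data frame untouched.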
We have successfully selected a column from a data frame in R!
If you liked this article, I encourage you to take a look at the Data Manipulation in R section where you will find a lot of useful information and master the skill of data wrangling.