In this article, we will learn how to remove the first character from a string in R using the sub() function.



Theory

Text data often reaches a data scientist with typos or other mistakes that appear in every observation and follow a consistent pattern.

For example, this can happen when you scrape a website or pull raw data from a source.

Suppose you receive some text file from your logistics department with a list of all methods of transportation that a business utilizes.

You look at the file (or a list) and you see that the first letter is repeated for every observation.

For example, you see “aairplane” instead of “airplane”, “bbus” instead of “bus”, and so on.

You don’t have time right now to figure out where the mistake came from. You are facing a tight deadline to complete an analysis, and you need this list in working order.

Here, the art of string manipulation in R comes in handy.



Application

Below are the steps we will take to remove the first character from a string in R:

  1. Creating a sample string in R
  2. Basic sub() command description
  3. Removing first character from string in R



Part 1. Creating a sample string in R

As mentioned in the introduction to this article, we will assume a very basic case: the data has been sent to us, and we are stuck working with what we were given.

Suppose that we receive a list of transportation methods that our company is currently using.

And it looks like this: “aairplane, bbus, ccar, ttruck”

It doesn’t really matter whether it arrives in the form of a report with a table where this text is listed or as a plain list of values.

Let’s go ahead and add this into R (store it as “mytext”):


mytext <- c("aairplane", "bbus", "ccar", "ttruck")
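Before fixing anything, it can be worth confirming that the problem really is a doubled first letter in every element. The following is a minimal sketch of such a check using base R's substr(); the helper variable names (first_char, second_char, has_doubled_start) are illustrative and not part of the original example.

```r
# Sample data from the article
mytext <- c("aairplane", "bbus", "ccar", "ttruck")

# Extract the first and the second character of every element
first_char  <- substr(mytext, 1, 1)
second_char <- substr(mytext, 2, 2)

# TRUE where an element starts with the same letter twice
has_doubled_start <- first_char == second_char

print(has_doubled_start)
```

If every value is TRUE, dropping the first character of each element is a reasonable fix for this particular data set.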



Part 2. Basic sub() command description

To remove the first character from a string in R, we can use the built-in function sub().

The short theoretical explanation of the function is the following:

sub(pattern, replacement, x)

Here, "pattern" is a regular expression describing the text we want to replace; "replacement" is what the first match of "pattern" will be replaced with; and "x" is the character vector we will be working with.
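To make the arguments concrete, here is a small sketch on a made-up word ("banana" is only an illustration, not part of the transportation data). It shows that sub() replaces only the first match in each string, while the related base R function gsub() replaces every match.

```r
# Hypothetical example string, used only for illustration
example <- "banana"

# sub() replaces only the first "a"
first_only <- sub("a", "o", example)

# gsub() replaces every "a"
all_matches <- gsub("a", "o", example)

print(first_only)   # "bonana"
print(all_matches)  # "bonono"
```

This "first match only" behavior is what makes sub() suitable for removing a single character at the start of a string.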



Part 3. Removing first character from string in R

From the "sub" name you can probably realize that this command isn't only used for deleting characters but can also substitute the character with another character.

Essentially, what we will be doing here is replacing the first character with an empty string ("").

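One important detail here: the first argument of sub() is a regular expression, not a literal character.

In a regular expression, "." (a dot in double quotation marks) matches any single character. Because sub() replaces only the first match it finds, and the first match of "." is the first character of the string, this pattern targets exactly the character we want to remove.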
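The sketch below illustrates how the "." pattern behaves. It relies on "." being a regular-expression wildcard that matches any single character, so the first match sub() finds is the first character of each string. It also shows the explicitly anchored pattern "^.", which gives the same result here, and the fixed = TRUE argument of sub(), which makes "." a literal dot instead. The variable names and the string "v1.2" are illustrative assumptions.

```r
mytext <- c("aairplane", "bbus", "ccar", "ttruck")

# "." matches any character, and sub() replaces only the first match
with_dot <- sub(".", "", mytext)

# "^." says explicitly: one character at the start of the string
with_anchor <- sub("^.", "", mytext)

print(identical(with_dot, with_anchor))  # TRUE

# With fixed = TRUE the dot is literal, so the first "." is removed instead
literal_dot <- sub(".", "", "v1.2", fixed = TRUE)
regex_dot   <- sub(".", "", "v1.2")

print(literal_dot)  # "v12"
print(regex_dot)    # "1.2"
```

Writing "^." is a matter of taste for this task, but it states the intent more directly to someone reading the code later.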

Now we are all set to remove the first character from a string in R!

You can do it using the following code:


mytext <- sub(".", "", mytext)

Awesome! We have just removed the first character from every string in our vector!

You can go ahead and print it out to see that the first character was removed (replaced with an empty string), leaving "airplane", "bus", "car", and "truck".


print(mytext)
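As a cross-check, the same result can be obtained without regular expressions. The sketch below uses base R's substring(), which is a different technique from the sub() approach in this article: it keeps everything from the second character onward. The variable names raw_text, via_sub, and via_substring are illustrative.

```r
# Start again from the raw data so the comparison is fair
raw_text <- c("aairplane", "bbus", "ccar", "ttruck")

# Approach from this article: replace the first character with ""
via_sub <- sub(".", "", raw_text)

# Alternative: keep characters from position 2 to the end
via_substring <- substring(raw_text, 2)

print(via_substring)
print(identical(via_sub, via_substring))  # TRUE
```

Both approaches agree on this data; sub() becomes the more flexible choice once the pattern needs to be more specific than "the first character".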



This concludes our article on how to remove the first character from a string in R. You can learn more about working with strings and text data in the Data Manipulation section.