JavaScript and RegExp()

Ashwin Kumar
6 min read · Aug 31, 2020

Hi /fellas/g! I know it looks weird, but that addresses all fellas globally. Too nerdy? So is Regex. This one’s a beauty, but you need to explore it to know its powers. I first realized that when I wanted to remove the spaces from a string for some weird reason.

Obviously, one solution to this is String.split(" ").join("").
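Here is a minimal sketch of that split-and-join approach. The sample string is made up purely for illustration.

```javascript
// Hypothetical sample string, used only for illustration.
const spaced = "remove all the spaces";

// split(" ") cuts the string at every space and returns an Array of pieces;
// join("") glues those pieces back together with nothing in between.
const viaSplitJoin = spaced.split(" ").join("");

console.log(viaSplitJoin); // "removeallthespaces"
```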

However, creating an Array just to remove the spaces seemed too dirty to me. I am not sure if I have OCD!

So I went with String.replace(" ", ""). This doesn’t work, for the simple reason that it only targets and removes the first space. What if our target String has multiple spaces? We target them globally! Regex to the rescue: String.replace(/\s/g, "").
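To see the difference side by side, here is a small sketch comparing a plain string pattern with a global Regex. The input strings are assumptions chosen for the demo.

```javascript
// Hypothetical input, used only for illustration.
const sentence = "a b c d";

// A string pattern replaces only the first occurrence.
const firstOnly = sentence.replace(" ", "");

// A Regex with the g flag replaces every occurrence.
const allSpaces = sentence.replace(/ /g, "");

// \s also covers tabs and newlines, not just the space character.
const allWhitespace = "a b\tc\nd".replace(/\s/g, "");

console.log(firstOnly);     // "ab c d"
console.log(allSpaces);     // "abcd"
console.log(allWhitespace); // "abcd"
```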

That’s it, all spaces gone! Poof! If some of you are bugged by the fact that it’s String.prototype, not String, that has the ‘replace’ and ‘split’ methods, I am sorry. That was just for ease of readability.

Okay, so Regex is a short term for Regular Expressions and we can define them in JavaScript in two ways:

The first one is the raw way, where you take a string and use the Constructor function ‘RegExp’ to create the regular expression. Yes, this one’s also an Object in JavaScript. :p

The second method is the literal notation, or basically a simpler way of writing Regex. You can use it most of the time, though it only serves for static creation of Regex, that is, patterns that are known when you write the code rather than built from runtime values.
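A quick sketch of both ways of creating a Regex. The variable names and the sample word are assumptions made for this example.

```javascript
// 1. The constructor: the pattern is given as a string.
const fromConstructor = new RegExp("fellas");

// 2. The literal notation: the pattern sits between two slashes.
const fromLiteral = /fellas/;

// The constructor is the one to reach for when the pattern is only known at runtime.
const runtimeWord = "regex"; // hypothetical value, imagine it came from user input
const dynamicPattern = new RegExp(runtimeWord);

console.log(fromConstructor.test("Hi fellas"));        // true
console.log(fromLiteral.test("Hi fellas"));            // true
console.log(dynamicPattern.test("I love regex"));      // true
console.log(fromConstructor instanceof RegExp);        // true, it is an Object
```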

Now, what does Regex do, or what’s its use case?

Well, the basic one is pattern matching. We define a particular pattern using Regular Expressions, which can then be used to search through a string to find its matches. Extremely useful for validation, sub-string identification, cleaning up a string, or maybe you just wanna remove white-spaces. :p

Now, we can create a simple Regex out of regular characters, which literally match themselves, or we can elevate our game to a different level using MetaCharacters.

MetaCharacters are the building blocks of a Regular Expression and have special meaning. The bad boys are listed below:

\ . ^ $ * + ? { } [ ] | ( )

1. \ Let’s start with the backslash:

Backslashes are escape characters in JavaScript, meaning that they give a special meaning to the character that follows: let’s say you want to add a new line \n or a tab \t. Cool, cool, but what if you want a string with the literal value “\n”? You will have to use two backslashes to nullify the special character: “\\n” outputs \n .

Regex adds more power to that. Apart from allowing you to match a new line or a tab, it also lets you match digits with \d, word characters (letters, digits and the underscore) with \w, or white-space characters with \s (I used this to remove the spaces). Okay, so let’s just say you want to check if your string has a newline followed by a digit and a space: /\n\d\s/.
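As a small sketch of that check, here is a literal Regex looking for a newline immediately followed by a digit and a white-space character. The test strings are invented for the example.

```javascript
// \n = newline, \d = one digit, \s = one white-space character.
const newlineDigitSpace = /\n\d\s/;

// Hypothetical inputs, used only for illustration.
const withNewline = "Items:\n1 apple";
const withoutNewline = "Items: 1 apple";

const foundWithNewline = newlineDigitSpace.test(withNewline);
const foundWithoutNewline = newlineDigitSpace.test(withoutNewline);

console.log(foundWithNewline);    // true
console.log(foundWithoutNewline); // false
```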

There’s a catch to this. If you’re gonna create this Regex using the RegExp constructor, remember that the constructor takes its pattern as a String, which then gets converted to a Regex. The String only respects the escape sequences it recognizes, like \n. For the ones it doesn’t recognize, like \d and \s, it silently drops the backslash, so you will have to escape the backslash itself with another \, as in “\\d”.

And if for some reason you just want non-digits, you can use \D, or \W for non-word characters and \S for non-white-space characters.

2. . Coming to the dot, it simply matches any character except a newline. Very useful for matching random gibberish, as long as it’s on the same line.
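The sketch below shows what goes wrong with single backslashes in a constructor string and how doubling them fixes it. The variable names are assumptions for the demo.

```javascript
// In a normal string, "\d" and "\s" are not known escapes, so the backslash is dropped:
// the string below is really newline + "d" + "s".
const singleBackslashes = new RegExp("\n\d\s");

// Doubling the backslashes keeps them in the string, so the Regex sees \n\d\s.
const doubledBackslashes = new RegExp("\\n\\d\\s");

console.log(singleBackslashes.test("\n1 "));  // false, it is looking for the letters "ds"
console.log(singleBackslashes.test("\nds"));  // true
console.log(doubledBackslashes.test("\n1 ")); // true
console.log(doubledBackslashes.source);       // \n\d\s
```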

3. ^ The caret is the anchor for the start of the string, which essentially means that the Regex can only look for a match from the start of the String.

Example: “^a” matches “a” at the start of the string only.

And if used inside [ … ] it acts as a negation symbol:

Example: “[^\d]” matches any non digit.

4. $ is the anchor for the end of the string. Brother from another mother for ^ if you may imagine.

Example: “a$” matches “a” at the end of the string only.
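A few quick checks for the dot, the two anchors and the negation inside square brackets. The sample words are arbitrary.

```javascript
const startsWithA = /^a/;     // "a" at the start of the string
const endsWithA = /a$/;       // "a" at the end of the string
const nonDigit = /[^\d]/;     // any single character that is not a digit
const anyCharBetween = /a.c/; // "a", then any character except a newline, then "c"

console.log(startsWithA.test("apple"));   // true
console.log(startsWithA.test("banana"));  // false
console.log(endsWithA.test("banana"));    // true
console.log(endsWithA.test("apple"));     // false
console.log(nonDigit.test("123"));        // false
console.log(nonDigit.test("12a"));        // true
console.log(anyCharBetween.test("abc"));  // true
console.log(anyCharBetween.test("a\nc")); // false
```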

5. +, *, ?, {} are known as the Quantifiers. All these MetaCharacters help you define how many times a character needs to repeat.

The most flexible one is {x,y}, which takes two numbers defining the minimum, x, and the maximum, y, number of repetitions. If you leave out both the comma and y, as in {x}, then it’s exactly x repetitions. And if you keep the comma but leave y empty, as in {x,}, then that’s an eternity of allowed repetition. Gotta love this one! Just don’t put spaces inside the braces, or JavaScript stops treating them as a quantifier.

+ simply means the character has to occur at least once but can occur multiple times. {1,} is the equivalent.

* means the character can occur multiple times or may not occur at all. {0,} is the equivalent.

? is like boolean. Either you exist once or you don’t. {0,1} is the equivalent.
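Here is a small sketch of each quantifier in action. The patterns are anchored with ^ and $ so that the whole string has to fit, and the sample strings are made up.

```javascript
const exactlyThree = /^a{3}$/;  // exactly 3 times
const twoToFour = /^a{2,4}$/;   // between 2 and 4 times
const atLeastTwo = /^a{2,}$/;   // 2 times or more
const oneOrMore = /^ab+c$/;     // "b" at least once, same as b{1,}
const zeroOrMore = /^ab*c$/;    // "b" any number of times, same as b{0,}
const optionalU = /^colou?r$/;  // "u" once or not at all, same as u{0,1}

console.log(exactlyThree.test("aaa"));    // true
console.log(exactlyThree.test("aaaa"));   // false
console.log(twoToFour.test("aaaaa"));     // false
console.log(atLeastTwo.test("aaaaaa"));   // true
console.log(oneOrMore.test("ac"));        // false
console.log(zeroOrMore.test("ac"));       // true
console.log(optionalU.test("color"));     // true
console.log(optionalU.test("colour"));    // true
```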

6. [ ] Square brackets give you the ability to define a character class, i.e. a list of characters and special characters, any one of which can match a single character.

  • One more very interesting thing you can do inside square brackets is use a hyphen to define a range of characters, like 0-9. This works by picking all the characters whose character codes fall between those of 0 and 9. So 0 has a code of 48 and 9 has a code of 57.
  • You can also match all letters using a-z for small letters and A-Z for capital letters. Careful with A-z, though: the codes between Z and a belong to [ \ ] ^ _ and `, so that range matches those as well.
/[0-9]/ is equivalent to /\d/
/[A-Za-z0-9_]/ is equivalent to /\w/
Also,
/[^0-9]/ is equivalent to /\D/
/[^A-Za-z0-9_]/ is equivalent to /\W/
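To poke at ranges a bit, the sketch below checks the character codes behind 0-9 and compares a hand-written class with \w. It also shows why the range A-z is wider than it looks: it covers a few punctuation characters that sit between Z and a. The sample characters are chosen just for the demo.

```javascript
// Ranges are based on character codes.
const zeroCode = "0".charCodeAt(0); // 48
const nineCode = "9".charCodeAt(0); // 57

// Hand-written equivalent of \w: letters, digits and the underscore.
const wordClass = /^[A-Za-z0-9_]$/;

// A-z spans codes 65 to 122, which includes [ \ ] ^ _ and `.
const looseClass = /^[A-z0-9]$/;

// Hypothetical sample characters, used only for illustration.
const sampleChars = ["a", "Z", "5", "_", "[", "^", " "];
const agreesWithW = sampleChars.every(
  (ch) => wordClass.test(ch) === /^\w$/.test(ch)
);

console.log(zeroCode, nineCode);   // 48 57
console.log(agreesWithW);          // true
console.log(looseClass.test("[")); // true
console.log(/^\w$/.test("["));     // false
```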

7. | The vertical pipe is a simple OR condition. You have options, you use vertical pipes. So if I have to put it in perspective with square brackets, I can say that :

/a|b|c/ is equivalent to /[abc]/. 

Fancy that!

8. ( ) The last and most peculiar ones are Capturing Groups!

These are what allow you to identify sub-strings using Regex. A Capturing Group lets you get a part of the match as a separate item in the result array.

So let’s take the example of a pincode. In a pincode, the first two digits represent a circle, the third digit represents the district and the last three digits represent the post office.

So how do we identify the substrings or the capturing groups? We can name them using ?<> like this:

(?<name> ... )

Assuming that we can get inputs like 123 456, 123456 or 123-456, we can give each part of the pincode its own named group.
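One way this could look is sketched below. The pattern, the group names (circle, district, postOffice) and the helper parsePincode are all assumptions made for this example, not a canonical pincode validator.

```javascript
// Two digits, one digit, an optional space or hyphen, then three digits.
const pincodePattern =
  /^(?<circle>\d{2})(?<district>\d)[\s-]?(?<postOffice>\d{3})$/;

// Hypothetical helper: returns the named parts, or null when the input does not fit.
function parsePincode(input) {
  const match = input.match(pincodePattern);
  if (!match) return null;
  // Named groups are available on match.groups.
  const { circle, district, postOffice } = match.groups;
  return { circle, district, postOffice };
}

console.log(parsePincode("123 456")); // { circle: '12', district: '3', postOffice: '456' }
console.log(parsePincode("123456"));  // same parts
console.log(parsePincode("123-456")); // same parts
console.log(parsePincode("12-3456")); // null
```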

One more important concept in Regex is FLAGS!

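I started the article with a global flag g pun! Sorry for that :D. Flags are optional parameters that can modify the way Regex does the search and match. You can define them in two ways: after the closing slash of a literal, like /fellas/g, or as the second argument of the constructor, like RegExp("fellas", "g").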
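A tiny sketch of both ways of attaching flags. The pattern and the sample sentence are invented for the demo.

```javascript
// Flags after the closing slash of a literal.
const literalWithFlags = /fellas/gi;

// Flags as the second argument of the constructor.
const constructorWithFlags = new RegExp("fellas", "gi");

// With g and i together we get every match, whatever the casing.
const greetings = "Fellas, fellas".match(literalWithFlags);

console.log(literalWithFlags.flags);     // "gi"
console.log(constructorWithFlags.flags); // "gi"
console.log(greetings);                  // [ 'Fellas', 'fellas' ]
```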

Some of the common flags are:

g:

This one is the prime reason why all spaces were removed in my first example. It makes the Regex find all the matches instead of stopping at the first one.

i:

With this flag the search is case-insensitive: no difference between A and a.

m:

Multiline mode on! By default, Regex treats a string as having one start and one end. Let’s say you have multiple lines and you need each of them to be treated as having its own start and end, then this flag is all you need. You only need it if you’re using the MetaCharacters ^ or $ .

s:

The dot-match-all mode! As I said, a dot . matches everything except a newline \n. If you want the dot to match a newline as well, this is your flag!
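The four flags covered so far can be tried out with the sketch below. The two-line sample string is an assumption made for the demo.

```javascript
// Hypothetical two-line string, used only for illustration.
const twoLines = "first line\nsecond line";

// g: all matches instead of just the first one.
const globalCount = "a a a".match(/a/g).length;

// i: case-insensitive search.
const ignoresCase = /hello/i.test("HeLLo");

// m: ^ and $ also work at the start and end of each line.
const withoutM = /^second/.test(twoLines);
const withM = /^second/m.test(twoLines);

// s: the dot matches a newline too.
const dotDefault = /line.second/.test(twoLines);
const dotAll = /line.second/s.test(twoLines);

console.log(globalCount); // 3
console.log(ignoresCase); // true
console.log(withoutM);    // false
console.log(withM);       // true
console.log(dotDefault);  // false
console.log(dotAll);      // true
```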

y:

This one’s called Sticky. You define a Regex with this flag, and then you can set the lastIndex property of the Regex to make the match attempt happen at exactly that index and nowhere else. Also, when combined with the g flag, the sticky behavior takes over in test and exec, so maybe don’t use them together.
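Here is a minimal sketch of the sticky flag and lastIndex working together. The sample string and variable names are made up for the example.

```javascript
const stickyDigit = /\d/y;
const sampleText = "a1b2"; // hypothetical input

// At index 0 there is an "a", so the sticky match fails and does not search further.
stickyDigit.lastIndex = 0;
const matchAtZero = stickyDigit.test(sampleText);

// At index 1 there is a "1", so the match succeeds.
stickyDigit.lastIndex = 1;
const matchAtOne = stickyDigit.test(sampleText);

// After a successful match, lastIndex moves just past the match.
const indexAfterMatch = stickyDigit.lastIndex;

// The next attempt starts at index 2, where there is a "b", so it fails
// and lastIndex is reset to 0.
const matchAtTwo = stickyDigit.test(sampleText);
const indexAfterMiss = stickyDigit.lastIndex;

console.log(matchAtZero, matchAtOne, indexAfterMatch); // false true 2
console.log(matchAtTwo, indexAfterMiss);               // false 0
```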

Well, that was all I had! Happy Regexing!!!

Ashwin Kumar

Web Developer, JavaScript Enthusiast, Love to Code.