Regex. Who are you?
Meeting a New Kind of Expression While Learning JavaScript
Introduction
Stepping into the tech industry, I have moved into a totally different world. Not only am I getting used to computer programming, but as the learning process continues, I am also being introduced to other languages. I am originally from Korea. English is my second language, and now JavaScript has become my third.
While learning JavaScript to communicate with computers, the Internet, and servers, I met a little guy called “Regex”. As soon as I met him, I thought he came from outside the universe. I did not understand what he was talking about. However, I wanted to know him better. After learning his language, I eventually started to understand what he was saying, and now we can communicate better.
What is Regex?
A regular expression (shortened as regex or regexp; also referred to as rational expression) is a sequence of characters that specifies a search pattern. Usually such patterns are used by string-searching algorithms for “find” or “find and replace” operations on strings, or for input validation.
In JavaScript, a regex literal starts with a slash (/) and ends with a slash (/). Flags go right after the closing slash:
‘g’ means global: find every match. Without this flag, the regex matches only the first one (/g).
‘i’ means case insensitive: letters match whether they are upper or lower case. Flags can be combined (/gi).
‘m’ means multiline: ^ and $ match at the beginning and end of each line instead of only the beginning and end of the whole string (/gm).
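A quick way to see the flags in action is to run the same pattern with and without them. This is a minimal sketch on a made-up sample string: String.prototype.match returns only the first match (with its index) when g is missing, and an array of every match when g is present.

```javascript
// Made-up sample text: two lines, with "the" in both cases.
const flagText = "the cat saw The hat\nthe end";

// No flag: only the first match comes back, as a match object with an index.
const firstOnly = flagText.match(/the/);
console.log(firstOnly[0], firstOnly.index); // the 0

// g: every match, but still case sensitive, so "The" is skipped.
const allLower = flagText.match(/the/g);
console.log(allLower); // ["the", "the"]

// gi: every match, upper or lower case.
const allAnyCase = flagText.match(/the/gi);
console.log(allAnyCase); // ["the", "The", "the"]

// Without m, ^ only anchors to the start of the whole string...
const startOfString = flagText.match(/^the/g);
console.log(startOfString); // ["the"]

// ...while with m, ^ also matches at the start of every line.
const startOfEachLine = flagText.match(/^the/gm);
console.log(startOfEachLine); // ["the", "the"]
```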
There are some operators, too.
In addition, there are special expressions used between the slashes. As you can see below, an uppercase letter means the exact opposite of its lowercase version.
The backslash (\) escapes the character that follows it, so a special character is matched literally (\. matches a real period).
‘\s’ matches a whitespace character, and ‘\S’ matches anything except whitespace.
Numbers inside curly braces { } set how many times the preceding item may repeat: {minimum,maximum}.
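Here is a small sketch of escaping, \s and \S, and a {min,max} quantifier, again on made-up strings. It also uses \d, the shorthand for a single digit that appears in the phone number patterns later on.

```javascript
// "." means "any character", so a.b also matches "axb"...
const anyChar = "a.b axb".match(/a.b/g);
console.log(anyChar); // ["a.b", "axb"]

// ...but escaping it with a backslash matches only a literal period.
const literalDot = "a.b axb".match(/a\.b/g);
console.log(literalDot); // ["a.b"]

// \s+ matches runs of whitespace (spaces, tabs), handy for splitting words.
const splitWords = "one  two\tthree".split(/\s+/);
console.log(splitWords); // ["one", "two", "three"]

// \S matches every character that is not whitespace.
const nonSpace = "a b c".match(/\S/g);
console.log(nonSpace); // ["a", "b", "c"]

// {2,3}: between 2 and 3 digits in a row. "1" is too short, and "1234"
// yields "123" because the leftover "4" is too short on its own.
const digitRuns = "1 12 123 1234".match(/\d{2,3}/g);
console.log(digitRuns); // ["12", "123", "123"]
```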
Characters inside square brackets [ ] form a set, and any one of them can match. (e.g. [fc]at matches ‘fat’ or ‘cat’.)
[a-zA-Z] matches any single letter, upper or lowercase ([a-zA-Z]at matches any letter followed by 'at')
[0-9] matches any single digit
[a-f] matches any single letter from a to f
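The character sets above can be tried directly with match. This is a minimal sketch; the sample words are made up, and each set matches exactly one character per position.

```javascript
// Made-up sample words.
const setSample = "cat fat hat rat Cat 42";

// [fc]at: exactly one of "f" or "c", then "at". "Cat" fails because C is uppercase.
const fcAt = setSample.match(/[fc]at/g);
console.log(fcAt); // ["cat", "fat"]

// [a-zA-Z]at: any single letter in either case, then "at".
const anyLetterAt = setSample.match(/[a-zA-Z]at/g);
console.log(anyLetterAt); // ["cat", "fat", "hat", "rat", "Cat"]

// [0-9]: any single digit.
const digits = setSample.match(/[0-9]/g);
console.log(digits); // ["4", "2"]

// [a-f]: any single letter from a to f.
const aToF = "beef jerky".match(/[a-f]/g);
console.log(aToF); // ["b", "e", "e", "f", "e"]
```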
Characters inside parentheses ( ) form a group, so operators act on the group as a whole.
(t|T)he matches 'the' or 'The': the | inside the group means "t or T".
/(t|r|e){2,3}\./g matches 2 to 3 of the letters t, r, or e in a row, followed by a period.
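A short sketch of both group examples on a made-up sentence. One thing worth noticing: a match can start anywhere, so the second pattern picks up only the tail of "tree.".

```javascript
// Made-up sample sentence.
const groupSample = "The end of the tree. The bee. Sit.";

// (t|T)he: "t" or "T", then "he".
const theWords = groupSample.match(/(t|T)he/g);
console.log(theWords); // ["The", "the", "The"]

// For single characters, the set [tT] does the same job as (t|T).
const theWordsSet = groupSample.match(/[tT]he/g);
console.log(theWordsSet); // ["The", "the", "The"]

// 2 to 3 of t, r, or e in a row, then a literal period.
// "tree." only contributes "ree." and "Sit." fails (just one "t" before the period).
const periodEndings = groupSample.match(/(t|r|e){2,3}\./g);
console.log(periodEndings); // ["ree.", "ee."]
```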
‘^’ at the start of a pattern matches the beginning of the input as a whole. To match the beginning of every line, add the ‘m’ flag after the slash.
‘$’ at the end of a pattern matches the end of the input (or the end of every line with the ‘m’ flag).
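Putting ^ and $ together is a common way to check that a whole input matches, which ties back to the input validation use mentioned earlier. A minimal sketch with made-up inputs:

```javascript
// ^ and $ together require the whole input to match.
const onlyDigits = /^[0-9]+$/;
const okInput = onlyDigits.test("2024");     // true
const badInput = onlyDigits.test("2024abc"); // false: extra characters after the digits

// Without anchors, digits anywhere in the string are enough to pass.
const looseCheck = /[0-9]+/.test("2024abc"); // true

// With the m flag, $ also matches at the end of each line.
const lastWords = "first line\nlast word".match(/\w+$/gm);
console.log(okInput, badInput, looseCheck, lastWords);
```

The anchored pattern has no g flag on purpose: calling test() on a global regex remembers lastIndex between calls, which can make repeated checks behave unexpectedly.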
These are the basic expressions. Now let's look at something deeper.
Look Behind
/(?<=...)/g is a positive lookbehind: it matches only where the pattern inside (?<= ) comes right before, without including that pattern in the match.
/(?<=[tT]he)./g matches the first character that is preceded by 'the' or 'The'.
So a lookbehind lets us check what comes before the thing we want to capture, and a lookahead (below) checks what comes after it.
/(?<![tT]he)./g matches any character that is NOT preceded by 'the' or 'The'.
The only difference is replacing the equal sign with an exclamation point! It matches anything that does not have 'the' or 'The' before it. This is a negative lookbehind, so the result is inverted.
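Here is a sketch of both lookbehinds on a made-up string. Lookbehind assertions were added to JavaScript in ES2018, so this assumes a reasonably recent engine. The last line shows that the lookbehind text is checked but left out of the match.

```javascript
// Made-up sample: "Theo" and "thesis" both start with "the" / "The".
const lookText = "Theo and thesis";

// Positive lookbehind: a character preceded by "the" or "The".
const afterThe = lookText.match(/(?<=[tT]he)./g);
console.log(afterThe); // ["o", "s"]

// Negative lookbehind: every character NOT preceded by "the" or "The".
const notAfterThe = lookText.match(/(?<![tT]he)./g);
console.log(notAfterThe.join("")); // "The and theis"

// The "$" is required before the digits but is not part of the result.
const amount = "$30 and 40".match(/(?<=\$)\d+/g);
console.log(amount); // ["30"]
```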
Look Ahead
Lookahead is similar to lookbehind, just without the less-than sign (<): (?=...) for positive and (?!...) for negative.
/.(?=at)/g matches any character that is followed by 'at'
/.(?!at)/g matches any character that is NOT followed by 'at'
Simply replacing the equal sign with an exclamation mark gives us a negative lookahead, which matches every character that is not followed by those characters.
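The same idea in code, as a minimal sketch on made-up strings. The last example uses a lookahead to grab file names ending in ".js" without including the extension in the result.

```javascript
// Positive lookahead: a character followed by "at".
const beforeAt = "cat hat sit".match(/.(?=at)/g);
console.log(beforeAt); // ["c", "h"]

// Negative lookahead: characters NOT followed by "at".
// In "cat", "c" is followed by "at", so only "a" and "t" remain.
const notBeforeAt = "cat".match(/.(?!at)/g);
console.log(notBeforeAt); // ["a", "t"]

// Word characters that come right before ".js"; the ".js" itself is not matched.
const jsNames = "app.js main.ts util.js".match(/\w+(?=\.js)/g);
console.log(jsNames); // ["app", "util"]
```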
These are all the expressions we have been through. Here are some real-world use cases where they are commonly used.
Checking Emails
/[\w._%+-]+@[\w.-]+\.[a-zA-Z]{2,4}/g
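A sketch of the email pattern pulling addresses out of a made-up message. The second example shows one limitation: {2,4} caps the last part at four letters, so a longer ending gets cut short rather than rejected.

```javascript
const emailPattern = /[\w._%+-]+@[\w.-]+\.[a-zA-Z]{2,4}/g;

// Made-up message with two addresses.
const message = "Contact alice.smith@example.com or bob+news@mail.example.org today.";
const found = message.match(emailPattern);
console.log(found); // ["alice.smith@example.com", "bob+news@mail.example.org"]

// A five-letter-plus ending only matches its first four letters.
const truncated = "someone@example.museum".match(emailPattern);
console.log(truncated); // ["someone@example.muse"]
```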
Matching URL
/(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?/g
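To see what each group of the URL pattern captures, exec() can be run on a single URL. In this sketch the g flag is dropped because exec() on a global regex carries lastIndex between calls; the URLs are made up. Note that the domain part only accepts lowercase letters, and the path set includes a space, so on running text the match can spill into the following words. It works best on a string that holds just the URL.

```javascript
// Same pattern as above, without the g flag.
const urlPattern = /(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?/;

const full = urlPattern.exec("https://www.example.com/docs/intro");
console.log(full[0]); // "https://www.example.com/docs/intro"
console.log(full[1]); // "https://"     (protocol)
console.log(full[2]); // "www.example"  (domain)
console.log(full[3]); // "com"          (top-level domain)

// The protocol group is optional, so a bare domain still matches.
const bare = urlPattern.exec("example.org");
console.log(bare[1]);          // undefined
console.log(bare[2], bare[3]); // "example" "org"
```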
Matching Phone Numbers
1234567890
/\d{10}/
123-456-7890
/\d{3}-?\d{3}-?\d{4}/gm
123 456 7890
/\d{3}[ -]?\d{3}[ -]?\d{4}/gm
To convert a phone number into just 10 digits, capture each part in a group:
/(\d{3})[ -]?(\d{3})[ -]?(\d{4})/gm
Replacing each match with $1$2$3 puts the three captured groups back together without the separators.
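In JavaScript, the $1$2$3 replacement is passed as the second argument of String.prototype.replace. A minimal sketch with the three made-up sample formats from above, one per line:

```javascript
const phonePattern = /(\d{3})[ -]?(\d{3})[ -]?(\d{4})/gm;

// The three sample formats, one per line.
const phoneInput = "123-456-7890\n123 456 7890\n1234567890";

// $1, $2, $3 refer to the three captured groups, joined with no separators.
const normalized = phoneInput.replace(phonePattern, "$1$2$3");
console.log(normalized); // every line becomes 1234567890
```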
/(?<areacode>\d{3})[ -]?(\d{3})[ -]?(\d{4})/gm
You can name a group with (?<name>...), for example (?<areacode>\d{3}).
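A named group can be read back by name from the match's groups object, and referenced as $<name> in a replacement string. This is a sketch on made-up numbers; named groups are an ES2018 feature, and they keep their regular number as well.

```javascript
const namedPattern = /(?<areacode>\d{3})[ -]?(\d{3})[ -]?(\d{4})/;

const result = namedPattern.exec("Call 555 123 4567 now");
console.log(result.groups.areacode); // "555"
console.log(result[1]);              // "555": the named group is still group 1
console.log(result[2], result[3]);   // "123" "4567"

// $<areacode> refers to the named group inside a replacement string.
const formatted = "555-123-4567".replace(namedPattern, "($<areacode>) $2-$3");
console.log(formatted); // "(555) 123-4567"
```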
(123) 456-7890
/\(?(?<areacode>\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})/gm
Here the backslash escapes each parenthesis so it is matched literally, and the question mark after it makes that parenthesis optional.
International number
+1 123 456 7890
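/((\+1)[ -])?\(?(?<areacode>\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})/gm
/(?:(\+1)[ -])?\(?(?<areacode>\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})/gm
In the first version, the optional group ((\+1)[ -])? handles the +1 country code and the separator after it. Adding ?: at the start of that group, as in the second version, makes it a non-capturing group: it still groups the pieces, but it no longer takes up a group number.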
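The difference between the two versions shows up in how many groups exec() reports. A minimal sketch on a made-up international number: with the outer group non-capturing, the area code moves from group 3 to group 2, so the 10-digit replacement becomes $2$3$4.

```javascript
const intlNumber = "+1 (123) 456-7890";

// Capturing outer group: it takes group 1, and (\+1) becomes group 2.
const capturing = /((\+1)[ -])?\(?(?<areacode>\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})/;
const withCapture = capturing.exec(intlNumber);
console.log(withCapture.length - 1);         // 5 capture groups
console.log(withCapture[1], withCapture[2]); // "+1 " "+1"

// Non-capturing outer group (?:...): only the inner (\+1) is numbered.
const nonCapturing = /(?:(\+1)[ -])?\(?(?<areacode>\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})/;
const withoutCapture = nonCapturing.exec(intlNumber);
console.log(withoutCapture.length - 1);      // 4 capture groups
console.log(withoutCapture.groups.areacode); // "123"

// Keeping only the 10 digits: area code ($2), then $3 and $4.
const tenDigits = intlNumber.replace(nonCapturing, "$2$3$4");
console.log(tenDigits); // "1234567890"
```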
Conclusion
Once I had been through all of this, it did not look strange anymore, and I realized that it will be really helpful for my career. It has been a great pleasure, Regex! Glad to know you, and happy to understand your language. Hope to see you again in the near future!
References
Picture reference: https://content.breatheco.de
DZone: https://dzone.com/articles/abc-of-regex
MDN RegExp reference: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
RegExr website: https://regexr.com