Regular Expressions In Python

Regular expressions are small, specialized pattern languages that are embedded inside general-purpose programming languages.

A regular expression can be an ordinary character, a special character, or a combination of both.

With this little language, you can write rules that describe a set of strings or a string pattern you want to match. Such patterns can describe email addresses, website URLs, or any ordinary English word or sentence.

In Python, the built-in module named “re” lets you work with regular expressions. It provides functions to search for, match, or split on a string pattern.

It also provides some other special functions, which we will cover in later sections.
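As a first taste, here is a rough sketch of pulling an email-like piece of text out of a sentence. The pattern stored in EMAIL_PATTERN is a hypothetical, heavily simplified one written for this illustration; it is not a complete rule for valid email addresses.

```python
import re

# Hypothetical, simplified pattern (an assumption for this sketch):
# some characters that are not "@" or whitespace, an "@",
# more such characters, a dot, and a final run of such characters.
EMAIL_PATTERN = r"[^@\s]+@[^@\s]+\.[^@\s]+"

text = "Write to samplemail@gmail.com or visit the site."
match = re.search(EMAIL_PATTERN, text)

if match:
    print(match.group())   # the piece of text that matched
else:
    print("no email-like text found")
```

With the sample sentence above, this prints samplemail@gmail.com. The building blocks used in the pattern (sets, “+”, “\s”, “\.”) are explained in the later sections.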


Mainly Used Functions Of Re Module

re.search(pattern, string, flags) = This function scans the whole string and looks for the first position where the given pattern matches. It returns the corresponding match object if the pattern matches, or None if there is no match. The third argument, flags, is optional; see the flags section to learn how flags work.

Example :

import re

poem="twinkle twinkle little star,how i wonder what you are."
email="samplemail@gmail.com"

print(re.search("win",poem))
print(re.search("@",email))

Output :

<re.Match object; span=(1, 4), match='win'>
<re.Match object; span=(10, 11), match='@'>

re.match(pattern, string, flags) = It works like the re.search() function, but it only looks for the pattern at the beginning of the string. Even if you specify the re.MULTILINE flag, it matches only at the beginning of the string, not at the beginning of each line.

Example :

import re

poem="twinkle twinkle little star,how i wonder what you are."
simpletext="hello world"

print(re.match("star",poem))
print(re.match("twinkle",poem))
print(re.match("hello",simpletext))

Output :

None
<re.Match object; span=(0, 7), match='twinkle'>
<re.Match object; span=(0, 5), match='hello'>
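The point about re.MULTILINE can be checked with a small sketch. The two-line string below is made up for this illustration.

```python
import re

lines = "first line\nsecond line"

# re.match() is tied to index 0, so MULTILINE does not let it reach line two.
at_start = re.match("second", lines, flags=re.MULTILINE)

# re.search() with "^" and MULTILINE can match at the start of any line.
at_line_start = re.search("^second", lines, flags=re.MULTILINE)

print(at_start)
print(at_line_start.span())
```

The first call prints None, and the second prints (11, 17), the position of “second” on the second line.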

re.fullmatch(pattern, string, flags) = This function checks whether the whole string matches the given pattern. If it does, it returns the corresponding match object; if it does not, it returns None.

Example :

import re

poem="twinkle twinkle little star,how i wonder what you are."
simpletext="hello world"

print(re.fullmatch("twinkle",poem))
print(re.fullmatch("hello world",simpletext))

Output :

None
<re.Match object; span=(0, 11), match='hello world'>

re.findall(pattern, string, flags) = This function returns a list of all non-overlapping substrings that match the given pattern. If the pattern contains groups, the list holds the group contents instead of the whole matches.

Example :

import re

poem="twinkle twinkle little star,how i wonder and what you are."
print(re.findall("win",poem))

Output :

['win', 'win']

re.finditer(pattern, string, flags) = This function returns an iterator of match objects, one for each match of the given pattern.

Example :

import re

poem="twinkle twinkle little star,how i wonder and what you are."
print(list(re.finditer("win",poem)))

print("--------------------------")

for i in re.finditer("win",poem):
    print(i)

Output :

[<re.Match object; span=(1, 4), match='win'>, <re.Match object; span=(9, 12), match='win'>]
--------------------------
<re.Match object; span=(1, 4), match='win'>
<re.Match object; span=(9, 12), match='win'>

re.compile(pattern, flags) = This function compiles the regular expression pattern into a regular expression object. You can then call match(), search(), and the other regex methods on this object.

Example :

import re

poem="twinkle twinkle little star,how i wonder and what you are."
pattern=re.compile("win")

print(pattern.search(poem))
print("-------------------------")

print(pattern.findall(poem))
print("-------------------------")

print(list(pattern.finditer(poem)))

Output :

<re.Match object; span=(1, 4), match='win'>
-------------------------
['win', 'win']
-------------------------
[<re.Match object; span=(1, 4), match='win'>, <re.Match object; span=(9, 12), match='win'>]
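A compiled pattern is most useful when the same expression is applied to many strings. The sketch below also passes a flag to re.compile(); the list of titles is invented for this illustration.

```python
import re

# Compile once with a flag, then reuse the pattern object on several strings.
word = re.compile("star", flags=re.IGNORECASE)

titles = ["Twinkle Little STAR", "north Star", "the moon"]   # made-up sample data
hits = [title for title in titles if word.search(title)]

print(hits)
```

This prints ['Twinkle Little STAR', 'north Star'], because the flag given at compile time applies to every later call on the pattern object.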

Regular Expressions Flags

With the help of different flags, we can modify the behavior of regular expressions.

Some important flags are :
  • re.I or re.IGNORECASE = This flag makes matching case-insensitive, so the difference between lowercase and uppercase is ignored. For example, the expression “ABC” also matches “abc”.
  • re.M or re.MULTILINE = With this flag, the pattern character “^” matches at the beginning of the string and at the beginning of every line, and the pattern character “$” matches at the end of the string and at the end of every line.
  • re.S or re.DOTALL = Normally, the “.” special character matches any character except the newline character. This flag makes “.” match the newline character too.

Example :

import re

poem='''twinkle twinkle little star\n
how i wonder and what you are\n
Up and above you were so high\n
Like a diamond in the sky'''

print(re.search('high$',poem))
print(re.search('high$',poem, flags=re.MULTILINE))
print(re.search('are$',poem, flags=re.MULTILINE))

Output :

None
<re.Match object; span=(85, 89), match='high'>
<re.Match object; span=(55, 58), match='are'>

Example :

import re

poem="twinKle Twinkle littLe Star\nhow I wondEr what You a3rE."

print(re.findall('(twin)',poem))
print(re.findall('(twin)',poem, flags=re.IGNORECASE))

Output :

['twin']
['twin', 'Twin']
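The re.DOTALL flag can be illustrated in the same way. This is a minimal sketch with a made-up two-line string.

```python
import re

two_lines = "star\nsky"

# Without the flag, "." refuses to match the newline between "r" and "s".
without_flag = re.findall("r.s", two_lines)

# With re.DOTALL, "." matches the newline as well.
with_flag = re.findall("r.s", two_lines, flags=re.DOTALL)

print(without_flag)
print(with_flag)
```

The first call prints an empty list, and the second prints ['r\ns'].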

Regex Special Characters

Special characters like “?”, “+”, “^”, “$”, etc. change how the regular expression around them is interpreted.

List of some important Special Characters :
“.” = It matches any character (“a”, “B”, “3”, “@”) except the newline (“\n”).
“^” = It matches at the beginning of the string.
“$” = It matches at the end of the string.

Example :

import re

poem="twinKle Twinkle winn wiN littLe litttle litttttle star\nhow I wondEr in what You arE"
simpletext="hello world"

print(re.findall(".",simpletext))
print("--------------------------")

print(list(re.finditer("^twi",poem)))

print("--------------------------")
print(re.search("arE$",poem))

Output :

['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']
--------------------------
[<re.Match object; span=(0, 3), match='twi'>]
--------------------------
<re.Match object; span=(80, 83), match='arE'>

“*” = It matches 0 or more repetitions of the preceding character.
“+” = It matches 1 or more repetitions of the preceding character.
“?” = It matches 0 or 1 repetition of the preceding character.

Example :

import re

poem="twinKle Twinkle winn wiN littLe litttle litttttle star\nhow I wondEr in what You arE"

print(re.findall("wiN*",poem))
print("-----------------------")

print(re.findall("in+",poem))

print("-----------------------")
print(re.findall("wiN?",poem))
print(re.findall("littt?",poem))

Output :

['wi', 'wi', 'wi', 'wiN']
-----------------------
['in', 'in', 'inn', 'in']
-----------------------
['wi', 'wi', 'wi', 'wiN']
['litt', 'littt', 'littt']

{m} = It matches exactly m repetitions of the preceding character.
{m,n} = It matches from m to n repetitions of the preceding character, taking as many as possible.

Example :

import re

poem="twinKle Twinkle winn wiN littLe litttle litttttle star\nhow I wondEr in what You arE"

print(re.findall("t{2}",poem))
print(re.findall("t{3}",poem))

print("------------------------")
print(re.findall("t{1,2}",poem))
print(re.findall("t{2,4}",poem))

print("------------------------")
print(re.findall("lit{2,3}le",poem))

Output :

['tt', 'tt', 'tt', 'tt']
['ttt', 'ttt']
------------------------
['t', 'tt', 'tt', 't', 'tt', 'tt', 't', 't', 't']
['tt', 'ttt', 'tttt']
------------------------
['litttle']

“\” = It lets you match special characters literally, like “\*”, “\?”, “\+”, and it also introduces special sequences like “\n” and “\s”.

Example :

import re

poem='''twinkle twinkle little star\n
how i * wonder * and what you are\n
Up + and + above + you were so high\n
Like a diamond in the sky'''

print(re.findall(r'[*+\n]',poem))

Output :

['\n', '\n', '*', '*', '\n', '\n', '+', '+', '+', '\n', '\n']

[ ] = It lets you match a set of characters.

You can list the characters individually like this :

[abc] = This matches any one of “a”, “b”, or “c”.
[123] = This matches any one of 1, 2, or 3.

Example :

import re

poem="twinKle Twinkle winn wiN littLe litttle litttttle star\nhow I wondEr in what You arE"
alphanumerics="hey123 hello456 h!89 and namaste555 How @re3 yOu"

print(re.findall('[Tru]',poem))
print(re.findall('[378]',alphanumerics))

Output :

['T', 'r', 'r', 'u', 'r']
['3', '8', '3']

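You can also use a range of characters like this :

[0-9] = This matches any single digit from 0 to 9.
[a-z] = This matches any single lowercase letter from a to z.
[^a-z] = A “^” as the first character inside the set negates it, so this matches any single character that is not a lowercase letter.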

Example :

import re

poem="twinKle Twinkle winn wiN littLe litttle litttttle star\nhow I wondEr in what You arE"
alphanumerics="hey123 hello456 h!89 and namaste555 How @re3 yOu"

print(re.findall('[A-Z]',poem))

print("________________________________________")
print(re.findall('[a-z]',alphanumerics))
print("------------------------------------")
print(re.findall('[^a-z]',poem))
print("------------------------------------")
print(re.findall('[^a-z- ]',alphanumerics))

print("________________________________________")
print(re.findall('[0-9]',alphanumerics))
print("------------------------------------")
print(re.findall('[0-9- ]',alphanumerics))
print("------------------------------------")
print(re.findall('[^0-9]',alphanumerics))

print("________________________________________")
print(re.findall('[A-Z-0-9]',alphanumerics))
print("------------------------------------")
print(re.findall('[^A-Za-z0-9]',alphanumerics))
print("------------------------------------")
print(re.findall('[^A-Za-z0-9- ]',alphanumerics))

Output :

['K', 'T', 'N', 'L', 'I', 'E', 'Y', 'E']
________________________________________
['h', 'e', 'y', 'h', 'e', 'l', 'l', 'o', 'h', 'a', 'n', 'd', 'n', 'a', 'm', 'a', 's', 't', 'e', 'o', 'w', 'r', 'e', 'y', 'u']
------------------------------------
['K', ' ', 'T', ' ', ' ', 'N', ' ', 'L', ' ', ' ', ' ', '\n', ' ', 'I', ' ', 'E', ' ', ' ', ' ', 'Y', ' ', 'E']
------------------------------------
['1', '2', '3', '4', '5', '6', '!', '8', '9', '5', '5', '5', 'H', '@', '3', 'O']
________________________________________
['1', '2', '3', '4', '5', '6', '8', '9', '5', '5', '5', '3']
------------------------------------
['1', '2', '3', ' ', '4', '5', '6', ' ', '8', '9', ' ', ' ', '5', '5', '5', ' ', ' ', '3', ' ']
------------------------------------
['h', 'e', 'y', ' ', 'h', 'e', 'l', 'l', 'o', ' ', 'h', '!', ' ', 'a', 'n', 'd', ' ', 'n', 'a', 'm', 'a', 's', 't', 'e', ' ', 'H', 'o', 'w', ' ', '@', 'r', 'e', ' ', 'y', 'O', 'u']
________________________________________
['1', '2', '3', '4', '5', '6', '8', '9', '5', '5', '5', 'H', '3', 'O']
------------------------------------
[' ', ' ', '!', ' ', ' ', ' ', ' ', '@', ' ']
------------------------------------
['!', '@']

Remember, if you place “-” as the first character [-a] or the last character [a-] of a set, it matches the literal character “-”. You can also use “\-” inside a set to match the “-” character.
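The placement of “-” inside a set can be compared side by side. The sample text below is invented for this sketch.

```python
import re

text = "a-b c_d 2020-09"

first = re.findall("[-a]", text)       # "-" first: a literal hyphen or "a"
last = re.findall("[a-]", text)        # "-" last: the same set
escaped = re.findall(r"[a\-]", text)   # escaped hyphen: the same set again
ranged = re.findall("[a-c]", text)     # "-" between characters: the range a to c

print(first)
print(last)
print(escaped)
print(ranged)
```

The first three calls all print ['a', '-', '-'], while the last one prints ['a', 'b', 'c'] because there the “-” forms a range.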

You can also match literal special characters inside sets, and use special sequences there :

[*+?#] = This matches any one of these characters: “*”, “+”, “?”, “#”.
[\n\]\s\-] = This set matches a newline, “]”, a whitespace character, or “-”.

Example :

import re

symbols="~`!@#$%^&*()_+{}[]().-/\\?,;:''"

print(re.findall(r'[\]\[*.\-\\?]',symbols))

Output :

['*', '[', ']', '.', '-', '\\', '?']

() = Parentheses group part of a pattern, so that a quantifier applies to the whole group. To match a literal “(” or “)”, use “\(” and “\)” or [(] and [)].
| = This lets you build a regular expression from several alternatives, like A|B|……, that matches any one of them. To match the character “|” itself, use “[|]” or “\|”.

Example :

import re
simpletext="hello apple banana orange banana kiwi bananana grapes banananana"

for i in re.finditer('ba(na)+',simpletext):
    print(i)
print("-------------------------------------")

for j in re.finditer('ba(na){2}',simpletext):
    print(j)
print("-------------------------------------")

for k in re.finditer('ba(na){2,3}',simpletext):
    print(k)

Output :

<re.Match object; span=(12, 18), match='banana'>
<re.Match object; span=(26, 32), match='banana'>
<re.Match object; span=(38, 46), match='bananana'>
<re.Match object; span=(54, 64), match='banananana'>
-------------------------------------
<re.Match object; span=(12, 18), match='banana'>
<re.Match object; span=(26, 32), match='banana'>
<re.Match object; span=(38, 44), match='banana'>
<re.Match object; span=(54, 60), match='banana'>
-------------------------------------
<re.Match object; span=(12, 18), match='banana'>
<re.Match object; span=(26, 32), match='banana'>
<re.Match object; span=(38, 46), match='bananana'>
<re.Match object; span=(54, 62), match='bananana'>

Example :

import re
poem="twinkle twinkle little star,how i wonder and what you are."

print(re.findall('and|twin',poem, flags=re.MULTILINE))
print(re.findall('(twin)|(and)',poem, flags=re.MULTILINE))

Output :

['twin', 'twin', 'and']
[('twin', ''), ('twin', ''), ('', 'and')]

Regex Special Sequences

Special sequences consist of “\” followed by a character and have special meanings.

Below is the list of some important special sequences.
\d = It matches any digit character.
\D = The opposite of “\d”; it matches any character that is not a digit.
\A = It matches only at the start of the string.
\Z = It matches only at the end of the string.
\s = It matches any whitespace character.
\S = The opposite of “\s”; it matches any character that is not whitespace.
\w = It matches any word character ( [a-z], [A-Z], [0-9], “_” ).
\W = The opposite of “\w”; it matches any character that is not a word character.

Example :

import re

dates="20-9-2022\n7/12/1860\n3.4.2014\n18-06-1990\n22/03/1980"

print(re.findall(r"\d\d.",dates))
print("-----------------------------------")

print(re.findall(r"\d\d.+",dates))
print("-----------------------------------")

print(re.findall(r"\d\d-\d\d-\d\d\d\d",dates))
print("-----------------------------------")

print(re.findall(r"\d\d-\d\d?-\d\d\d\d",dates))
print("-----------------------------------")

print(re.findall(r"\d\d/\d\d/\d\d\d\d",dates))
print("-----------------------------------")

print(re.findall(r"\d\d.\d\d.\d\d\d\d",dates))
print("-----------------------------------")

print(re.findall(r"\d\.\d\.\d\d\d\d",dates))
print("-----------------------------------")

print(re.findall(".+-.+-.+",dates))

Output :

['20-', '202', '12/', '186', '201', '18-', '06-', '199', '22/', '03/', '198']
-----------------------------------
['20-9-2022', '12/1860', '2014', '18-06-1990', '22/03/1980']
-----------------------------------
['18-06-1990']
-----------------------------------
['20-9-2022', '18-06-1990']
-----------------------------------
['22/03/1980']
-----------------------------------
['18-06-1990', '22/03/1980']
-----------------------------------
['3.4.2014']
-----------------------------------
['20-9-2022', '18-06-1990']
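The {m} and {m,n} repetitions and a character set can be combined with “\d” to catch all five dates with one pattern. This is only a sketch: the name date_pattern is made up here, and the pattern checks the shape of the text, not whether the date is valid. It would also accept mixed separators such as 1-2/2000.

```python
import re

dates = "20-9-2022\n7/12/1860\n3.4.2014\n18-06-1990\n22/03/1980"

# One or two digits, a separator (-, / or .), one or two digits,
# a separator, then exactly four digits.
date_pattern = r"\d{1,2}[-/.]\d{1,2}[-/.]\d{4}"

found = re.findall(date_pattern, dates)
print(found)
```

This prints ['20-9-2022', '7/12/1860', '3.4.2014', '18-06-1990', '22/03/1980'].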

Example :

import re

simpletext=" hello my\n bananana banana world"
alphanumerics="hey123 hello456 h!89 and namaste555 How @re3 yOu"
symbols="~`!@#$%^&*()_+{}[]().-/\\?,;:''"

print(re.search(r"\Ahe",simpletext))
print("-----------------------------------")

print(re.search(r"ld\Z",simpletext))
print("-----------------------------------")

print(re.findall(r"\D",alphanumerics))
print("-----------------------------------")

print(re.findall(r"\s",alphanumerics))
print("-----------------------------------")

print(re.findall(r"\s",simpletext))
print("-----------------------------------")

print(re.findall(r"\S",simpletext))
print("-----------------------------------")

print(re.findall(r"\w",alphanumerics))
print("-----------------------------------")

print(re.findall(r"\W",alphanumerics))
print("-----------------------------------")

print(re.findall(r"\\",symbols))

Output :

None
-----------------------------------
<re.Match object; span=(30, 32), match='ld'>
-----------------------------------
['h', 'e', 'y', ' ', 'h', 'e', 'l', 'l', 'o', ' ', 'h', '!', ' ', 'a', 'n', 'd', ' ', 'n', 'a', 'm', 'a', 's', 't', 'e', ' ', 'H', 'o', 'w', ' ', '@', 'r', 'e', ' ', 'y', 'O', 'u']
-----------------------------------
[' ', ' ', ' ', ' ', ' ', ' ', ' ']
-----------------------------------
[' ', ' ', '\n', ' ', ' ', ' ']
-----------------------------------
['h', 'e', 'l', 'l', 'o', 'm', 'y', 'b', 'a', 'n', 'a', 'n', 'a', 'n', 'a', 'b', 'a', 'n', 'a', 'n', 'a', 'w', 'o', 'r', 'l', 'd']
-----------------------------------
['h', 'e', 'y', '1', '2', '3', 'h', 'e', 'l', 'l', 'o', '4', '5', '6', 'h', '8', '9', 'a', 'n', 'd', 'n', 'a', 'm', 'a', 's', 't', 'e', '5', '5', '5', 'H', 'o', 'w', 'r', 'e', '3', 'y', 'O', 'u']
-----------------------------------
[' ', ' ', '!', ' ', ' ', ' ', ' ', '@', ' ']
-----------------------------------
['\\']

Regular Expressions Other Special Functions

re.split(pattern, string, maxsplit, flags) = This function splits the given string at each occurrence of the given pattern and returns the resulting list. If you set maxsplit to a non-zero value, at most that many splits occur, and the remainder of the string is returned as the last item of the list.

Example :

import re

alphanumerics="hey123 hello456 h!89 and namaste555 How @re3 yOu"

print(re.split(r'\d+',alphanumerics))
print(re.split(r'\s',alphanumerics))

Output :

['hey', ' hello', ' h!', ' and namaste', ' How @re', ' yOu']
['hey123', 'hello456', 'h!89', 'and', 'namaste555', 'How', '@re3', 'yOu']
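The maxsplit argument can be tried with the same string. In this small sketch it is passed by keyword.

```python
import re

alphanumerics = "hey123 hello456 h!89 and namaste555 How @re3 yOu"

# maxsplit=2: split at the first two whitespace characters only;
# the rest of the string stays together as the last item.
parts = re.split(r"\s", alphanumerics, maxsplit=2)
print(parts)
```

This prints ['hey123', 'hello456', 'h!89 and namaste555 How @re3 yOu'].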

re.sub(pattern, replacement, string) = This function returns a copy of the given string in which every match of the pattern is replaced with the given replacement text.

Example :

import re

alphanumerics="hey123 hello456 h!89 and namaste555 How @re3 yOu"

for num in re.findall(r'\d\d\d',alphanumerics):
    print(re.sub(num,'[---]',alphanumerics))

Output :

hey[---] hello456 h!89 and namaste555 How @re3 yOu
hey123 hello[---] h!89 and namaste555 How @re3 yOu
hey123 hello456 h!89 and namaste[---] How @re3 yOu
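re.sub() can also take the pattern directly, without looping over re.findall() first. The sketch below additionally uses the optional count argument of re.sub(), which limits how many replacements are made.

```python
import re

alphanumerics = "hey123 hello456 h!89 and namaste555 How @re3 yOu"

# Replace every run of exactly three digits in one call.
all_replaced = re.sub(r"\d{3}", "[---]", alphanumerics)

# count=1 replaces only the first match and leaves the rest alone.
first_only = re.sub(r"\d{3}", "[---]", alphanumerics, count=1)

print(all_replaced)
print(first_only)
```

The first line printed is hey[---] hello[---] h!89 and namaste[---] How @re3 yOu, and the second is hey[---] hello456 h!89 and namaste555 How @re3 yOu.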

re.escape(pattern) = This function escapes the special metacharacters in the given pattern. It is useful when you want to match an arbitrary literal string that may contain regular expression metacharacters.

Example :

import re

symbols="~`!@#$%^&*()_+{}[]().-/\\?,;:''"

print(re.escape(symbols))

Output :

\~`!@\#\$%\^\&\*\(\)_\+\{\}\[\]\(\)\.\-/\\\?,;:''
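A typical use of re.escape() is searching for a piece of literal text that happens to contain metacharacters. The price string below is made up for this sketch.

```python
import re

price_text = "Total: $5.00 (approx.) and not $5x00"
literal = "$5.00"

# Unescaped, "$" means end of string and "." means any character,
# so the literal text is not found.
unescaped_hits = re.findall(literal, price_text)

# Escaped, every metacharacter is matched as an ordinary character.
escaped_hits = re.findall(re.escape(literal), price_text)

print(unescaped_hits)
print(escaped_hits)
```

The first call prints an empty list, and the second prints ['$5.00'].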

Regular Expressions Match Objects

Functions like re.search() or re.match() return “None” if there is no match. If there is a match, they return a Match Object. The boolean value of a Match Object is always “True”.

Match Objects support some attributes and methods :
span() = This method returns a tuple of the start and end index positions of the match.
start() = This method returns the starting index position of the match.
end() = This method returns the ending index position of the match, which is one past the last matched character.

Example :

import re

poem="twinkle twinkle little star, how i wonder and what you are."

print(re.search('are',poem))
print("Tuple of start and end positions :",re.search('are',poem).span())
print("Start position :",re.search('are',poem).start())
print("End position :",re.search('are',poem).end())

Output :

<re.Match object; span=(55, 58), match='are'>
Tuple of start and end positions : (55, 58)
Start position : 55
End position : 58
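Because a Match Object is always true and None is false, the result of re.search() can be tested with a plain if statement before it is used. The sketch below also calls group() and groups(), two further Match Object methods that return the matched text and the contents of the parenthesised groups.

```python
import re

poem = "twinkle twinkle little star, how i wonder and what you are."

match = re.search("(won)(der)", poem)

if match:                      # a Match Object is truthy, None is falsy
    print(match.group())       # the whole match
    print(match.group(1))      # the first parenthesised group
    print(match.groups())      # all groups as a tuple
    print(match.span())        # start and end positions
else:
    print("no match")
```

This prints wonder, won, ('won', 'der'), and (35, 41). Checking the result first avoids calling a method on None when the pattern is not found.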
