Python Regular Expression

Regular expressions are widely used for pattern matching. Python has built-in support for regular expressions through the re module. To use regular expressions, first import the re module with the statement import re.

Now you are ready to use regular expressions.

re.search() Method

re.search() is used to find the first match for the pattern in the string.

Syntax: re.search(pattern, string, flags=0)

re.search() accepts a pattern and a string and returns a match object on success, or None if no match is found. The flags argument is optional. The match object has a group() method, which returns the matching text.

You should specify the pattern as a raw string, i.e. prefix the string literal with r, for example r'\d\d\d'.

In a raw string Python does not process backslash escape sequences, so \n is not a newline character; it is just a backslash \ followed by n. This lets sequences such as \d reach the regular expression engine unchanged.

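Consider the pattern \d\d\d. In a regular expression \d matches a single digit, so \d\d\d matches three consecutive digits such as 111, 222 or 786. It will not match 12, which has only two digits. Note that re.search() looks for the pattern anywhere in the string, so in 1444 it matches the first three digits, 144, rather than the whole number.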
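
A minimal sketch of re.search() with a raw-string pattern. The sample string is made up for illustration and is not from the original tutorial.

```python
import re

# Sample text is an assumption chosen for illustration.
s = "my number is 786 and not 12"

# Raw string: \d reaches the regex engine as backslash + d.
match = re.search(r'\d\d\d', s)

if match:
    print(match.group())   # 786
else:
    print("match not found")

# In a raw string, \n is two characters (backslash and n), not a newline.
raw_len = len(r'\n')       # 2
plain_len = len('\n')      # 1
print(raw_len, plain_len)

# re.search() scans the whole string, so 1444 yields its first three digits.
partial = re.search(r'\d\d\d', "1444")
print(partial.group())     # 144

# Only two digits available, so there is no match.
too_short = re.search(r'\d\d\d', "12")
print(too_short)           # None
```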

Basic patterns used in regular expressions

Symbol      Description
.           matches any character except newline
\w          matches any word character, i.e. letters, digits and underscore ( _ )
\W          matches any non-word character
\d          matches a single digit
\D          matches a single character that is not a digit
\s          matches a single whitespace character such as \n, \t or a space
\S          matches a single non-whitespace character
[abc]       matches a single character in the set, i.e. a, b or c
[^abc]      matches a single character other than a, b and c
[a-z]       matches a single character in the range a to z
[a-zA-Z]    matches a single character in the range a-z or A-Z
[0-9]       matches a single character in the range 0-9
^           matches at the beginning of the string
$           matches at the end of the string
+           matches one or more of the preceding element (greedy match)
*           matches zero or more of the preceding element (greedy match)
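
The following sketch tries several of the symbols from the table with re.search(). The sample text is made up for illustration.

```python
import re

# Sample text is an assumption chosen for illustration.
text = "Order 66 shipped to user_42"

digits = re.search(r'\d+', text).group()        # one or more digits
word = re.search(r'\w+', text).group()          # first run of word characters
lower = re.search(r'[a-z]+', text).group()      # first run of lowercase letters
at_start = re.search(r'^Order', text).group()   # anchored at the beginning
at_end = re.search(r'\w+$', text).group()       # anchored at the end
symbol = re.search(r'[^\w\s]', text)            # any character that is neither word nor whitespace

print(digits)     # 66
print(word)       # Order
print(lower)      # rder
print(at_start)   # Order
print(at_end)     # user_42
print(symbol)     # None
```

Note that [a-z]+ skips the uppercase O and returns rder, because re.search() returns the first position where the pattern can match.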

Let's take one more example.

The pattern [\w.-]+@[\w.-]+ matches a simple email address: one or more word characters, dots or hyphens, then @, then one or more word characters, dots or hyphens. On success re.search() returns a match object, and its group() method returns the matching text.
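
A short sketch of this email pattern in use. The sample string is the one that appears in the reader comment at the end of this page.

```python
import re

s = "tim email is tim@somehost.com"

# One or more word chars, dots or hyphens on each side of the @.
match = re.search(r'[\w.-]+@[\w.-]+', s)

if match:
    print(match.group())   # tim@somehost.com
else:
    print("match not found")
```

This pattern is a loose approximation of an email address, which is enough for simple text extraction but not for strict validation.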

Group capturing

Group capturing lets you extract parts of the matching string. You create groups using parentheses (). Suppose we want to extract the username and host name from the email address in the above example. To do this we add () around the username and host name parts of the pattern, like this: ([\w.-]+)@([\w.-]+)

Note that the parentheses do not change what the pattern matches. If the match is successful, match.group(1) contains the text matched by the first pair of parentheses and match.group(2) contains the text matched by the second pair.
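
A minimal sketch of group capturing, reusing the email example. The variable names are chosen for illustration.

```python
import re

s = "tim email is tim@somehost.com"

# Two groups: the part before @ and the part after it.
match = re.search(r'([\w.-]+)@([\w.-]+)', s)

if match:
    whole = match.group()      # the entire match
    username = match.group(1)  # first group
    host = match.group(2)      # second group
    print(whole)      # tim@somehost.com
    print(username)   # tim
    print(host)       # somehost.com
```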

findall() Function

As you know by now, re.search() finds only the first match for the pattern. What if we want to find all matches in the string? This is where findall() comes into play.

Syntax: re.findall(pattern, string, flags=0)

On success it returns all the matches as a list of strings; otherwise it returns an empty list.
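
A small sketch of re.findall(). The sample strings are made up for illustration.

```python
import re

# Sample text is an assumption chosen for illustration.
s = "Tim's phone numbers are 12345-41521 and 78963-85214"

# Every run of digits in the string, in order of appearance.
numbers = re.findall(r'\d+', s)
print(numbers)   # ['12345', '41521', '78963', '85214']

# No digits at all, so the result is an empty list rather than None.
nothing = re.findall(r'\d+', "no digits here")
print(nothing)   # []
```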

You can also use group capturing with findall(). When the pattern contains two or more groups, findall() returns a list of tuples, where each tuple contains the matching groups. With a single group it returns a list of the strings matched by that group.
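
A sketch of findall() combined with groups. The addresses in the sample string are made up for illustration.

```python
import re

# Sample text is an assumption chosen for illustration.
s = "contact tim@somehost.com or jane@example.org"

# Two groups: each match becomes a (username, host) tuple.
pairs = re.findall(r'([\w.-]+)@([\w.-]+)', s)
print(pairs)   # [('tim', 'somehost.com'), ('jane', 'example.org')]

# One group: findall() returns just the text captured by that group.
users = re.findall(r'([\w.-]+)@[\w.-]+', s)
print(users)   # ['tim', 'jane']
```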

Optional flags

Both re.search() and re.findall() accept an optional parameter called flags. Flags modify the behavior of pattern matching.

Flags            Description
re.IGNORECASE    Performs case-insensitive matching
re.DOTALL        Allows . to match a newline; by default . matches any character except newline
re.MULTILINE     Allows ^ and $ to match at the start and end of each line
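
A sketch showing each flag passed as the third argument. The sample strings are made up for illustration. Note that the flag is written with the module prefix, e.g. re.IGNORECASE, and that several flags can be combined with the | operator.

```python
import re

# Sample text is an assumption chosen for illustration.
s = "Tim email is TIM@SomeHost.com"

# re.IGNORECASE: the lowercase pattern matches the uppercase text.
ci = re.search(r'tim@[\w.-]+', s, re.IGNORECASE)
print(ci.group())               # TIM@SomeHost.com

# re.DOTALL: without it, . does not match the newline.
no_dotall = re.search(r'a.c', "a\nc")
dotall = re.search(r'a.c', "a\nc", re.DOTALL)
print(no_dotall)                # None
print(repr(dotall.group()))     # 'a\nc'

# re.MULTILINE: ^ matches at the start of every line.
lines = "one\ntwo\nthree"
first_only = re.findall(r'^\w+', lines)
every_line = re.findall(r'^\w+', lines, re.MULTILINE)
print(first_only)               # ['one']
print(every_line)               # ['one', 'two', 'three']

# Flags can be combined with |.
combined = re.findall(r'^T\w+', "tim\nTom", re.IGNORECASE | re.MULTILINE)
print(combined)                 # ['tim', 'Tom']
```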

Using re.match()

re.match() is very similar to re.search(). The difference is that re.match() only matches at the beginning of the string, while re.search() looks for a match anywhere in it.

You can accomplish the same thing by adding ^ to the start of a pattern used with re.search().
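
A short sketch comparing re.match() with re.search(). The sample string is made up for illustration.

```python
import re

# Sample text is an assumption chosen for illustration.
s = "python tuts"

# re.match() succeeds only if the pattern matches at position 0.
m_start = re.match(r'py', s)
m_later = re.match(r'tuts', s)
print(m_start.group())   # py
print(m_later)           # None

# re.search() finds the pattern anywhere in the string.
s_later = re.search(r'tuts', s)
print(s_later.group())   # tuts

# Anchoring with ^ makes re.search() behave like re.match() here.
s_anchored = re.search(r'^tuts', s)
print(s_anchored)        # None
```

One caveat: with the re.MULTILINE flag, ^ in re.search() also matches after each newline, whereas re.match() still only checks the very beginning of the string.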

This covers the basics of the re module in Python.

2 Comments on "Python Regular Expression"

sakmis (Guest), 4 months 25 days ago

Syntax: re.search(pattern, string, flags[optional])
Now, if I wanted to use the re.IGNORECASE flag here, how do I pass it as the optional argument?
Should it be like this?


import re
s = "tim email is tim@somehost.com"
match = re.search(r'[\w.-]+@[\w.-]+', s, IGNORECASE)
 
# the above regular expression will match a email address
 
if match:
    print(match.group())
else:
    print("match not found")
