Explain Like I'm 5: Regular Expressions
regex - Python re.search - Stack Overflow
Python: How to use RegEx in an if statement? - Stack Overflow
Probably the best tutorial on regular expressions I have ever read
Is that like "my best root canal ever"?
Could someone please explain regular expressions and how they're used?
Every tutorial I've read online spends a lot of time going over special characters until I glaze over. After reading a bunch, I know what the special characters are, but not why/how to use them.
Could you include a simple function that illustrates?
Thank you
re.search() finds only the first match of the pattern in the string. From the documentation:
Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
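The last sentence of that quote, None versus a zero-length match, is easy to misread. Below is a minimal sketch with made-up patterns: a pattern that cannot match returns None, while a pattern that may match zero characters returns an empty but still truthy Match object.

```python
import re

# 'x+' needs at least one 'x', so no position in 'abc' matches: search returns None.
no_match = re.search(r'x+', 'abc')

# 'x*' may match zero characters, so it succeeds immediately at position 0.
empty_match = re.search(r'x*', 'abc')

print(no_match)            # None
print(empty_match.span())  # (0, 0): a zero-length match at the start
print(bool(empty_match))   # True: Match objects are always truthy, even empty ones
```

So `if re.search(...)` treats an empty match as success, which matters for patterns built from `*` or `?`.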
To match every occurrence, you need re.findall(). From the documentation:
Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
Example:
>>> import re
>>> regex = re.compile(r'([a-z]+)', re.I)
>>> # using search we only get the first item.
>>> regex.search("123hello456world789").groups()
('hello',)
>>> # using findall we get every item.
>>> regex.findall("123hello456world789")
['hello', 'world']
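The findall documentation quoted above also says the return type depends on how many groups the pattern has. A small sketch of the three cases, using a hypothetical key=value string:

```python
import re

text = "a=1, b=22, c=333"

# No groups: findall returns the whole matched text of each match.
whole = re.findall(r'[a-z]=\d+', text)

# One group: findall returns just that group's text for each match.
keys = re.findall(r'([a-z])=\d+', text)

# Two groups: findall returns one tuple per match, one item per group.
pairs = re.findall(r'([a-z])=(\d+)', text)

print(whole)  # ['a=1', 'b=22', 'c=333']
print(keys)   # ['a', 'b', 'c']
print(pairs)  # [('a', '1'), ('b', '22'), ('c', '333')]
```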
UPDATE:
Because of your duplicate question (as discussed at this link), I have added my other answer here as well:
>>> import re
>>> regex = re.compile(r'([a-z][a-z-\']+[a-z])')
>>> regex.findall("HELLO W-O-R-L-D") # this has uppercase
[] # there are no results here, because the string is uppercase
>>> regex.findall("HELLO W-O-R-L-D".lower()) # let's lowercase
['hello', 'w-o-r-l-d'] # now we have results
>>> regex.findall("123hello456world789")
['hello', 'world']
As you can see, your first sample failed because it is uppercase. You could simply add the re.IGNORECASE flag, though you mentioned that matches should be lowercase only.
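A sketch of the re.IGNORECASE option mentioned above, reusing the pattern from this answer. The variable names are my own; the point is that the flag widens [a-z] to uppercase letters too, so it only fits if uppercase matches are acceptable.

```python
import re

# Same pattern as above, compiled once without and once with re.IGNORECASE.
strict = re.compile(r'([a-z][a-z-\']+[a-z])')
relaxed = re.compile(r'([a-z][a-z-\']+[a-z])', re.IGNORECASE)

sample = "HELLO W-O-R-L-D"
print(strict.findall(sample))          # []
print(relaxed.findall(sample))         # ['HELLO', 'W-O-R-L-D']
print(strict.findall(sample.lower()))  # ['hello', 'w-o-r-l-d']
```

Note that the flag returns the text in its original case, whereas lowercasing first returns lowercase matches.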
@InbarRose's answer shows why re.search works that way, but if you want match objects rather than just the strings returned by re.findall, use re.finditer:
>>> import re
>>> pat = re.compile(r'([a-z]+)', re.I)  # same pattern as in the answer above
>>> text = "123hello456world789"
>>> for match in re.finditer(pat, text):
...     print(match.groups())
...
('hello',)
('world',)
Or, if you want a list:
>>> list(re.finditer(pat, text))
[<re.Match object; span=(3, 8), match='hello'>, <re.Match object; span=(11, 16), match='world'>]
It's also generally a bad idea to use string as a variable name, since string is also the name of a standard-library module.
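What finditer gives you over findall is the Match object itself, which records where each match sits in the string. A short sketch reusing the sample string from this thread (the names pattern, m and spans are my own):

```python
import re

text = "123hello456world789"
pattern = re.compile(r'[a-z]+', re.IGNORECASE)

# finditer yields one Match object per non-overlapping match, left to right.
for m in pattern.finditer(text):
    # group() is the matched text; start()/end() are its slice bounds in text.
    print(m.group(), m.start(), m.end())
# hello 3 8
# world 11 16

# Because it is an iterator, the Match objects can be collected or consumed lazily.
spans = [m.span() for m in pattern.finditer(text)]
print(spans)  # [(3, 8), (11, 16)]
```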
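import re
regex = r"Proof"         # placeholder pattern
content = "Proof. Done."  # placeholder text
if re.match(regex, content):
    print("matched")  # replace with whatever should happen on a match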
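You could also use re.search, depending on how you want it to match: re.match only matches at the start of the string, while re.search finds the pattern anywhere in it.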
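A minimal sketch of that difference, with a made-up string. The pattern is case-sensitive, so the lowercase "proof" near the start does not count.

```python
import re

content = "The proof follows. Proof. Done."

# re.match only tries position 0, where "The" sits, so it finds nothing.
at_start = re.match(r'Proof', content)
# re.search scans forward and returns the first place the pattern matches.
anywhere = re.search(r'Proof', content)
# Without re.MULTILINE, a leading caret makes re.search behave like re.match.
anchored = re.search(r'^Proof', content)

print(at_start)          # None
print(anywhere.start())  # 19
print(anchored)          # None
```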
You can run this example:
"""
very nice interface to try regexes: https://regex101.com/
"""
# %%
"""Simple if statement with a regex"""
import re
regex = r"\s*Proof\.\s*"  # \. is a literal period; a bare . would match any character
contents = ['Proof.\n', '\nProof.\n']
for content in contents:
    assert re.match(regex, content), f'Failed on {content=} with {regex=}'
    if re.match(regex, content):
        print(content)
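In a regex, an unescaped . matches any single character, while \. matches only a literal period. A quick check with one extra, made-up input shows where the two versions of the "Proof." pattern disagree:

```python
import re

loose = r"\s*Proof.\s*"    # bare '.': any single character after "Proof"
strict = r"\s*Proof\.\s*"  # escaped '\.': only a literal period

results = {}
for candidate in ["Proof.\n", "\nProof.\n", "Proofs\n"]:
    # Record (matches loose?, matches strict?) for each input.
    results[candidate] = (bool(re.match(loose, candidate)), bool(re.match(strict, candidate)))
    print(repr(candidate), results[candidate])
```

Both patterns accept the two original inputs, but only the loose one also accepts "Proofs".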
if re.search(r'pattern', string):
    ...  # runs only when pattern is found somewhere in string
Simple if-regex example:
import re
if re.search(r'ing\b', "seeking a great perhaps"):  # any words end with ing?
    print("yes")
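The if-statement above only answers yes or no. To see which words end with "ing", the same \b idea works with re.findall; a sketch with a made-up sentence:

```python
import re

sentence = "seeking a great perhaps, singing along"
# \w+ grabs word characters; ing\b then requires the word to end right after "ing".
ing_words = re.findall(r'\w+ing\b', sentence)
print(ing_words)  # ['seeking', 'singing']
```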
Complex if-regex example (pattern check, extract a substring, case insensitive):
search_object = re.search(r'^OUGHT (.*) BE$', "ought to be", flags=re.IGNORECASE)
if search_object:
    assert "to" == search_object.group(1)  # what's between ought and be?
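The question at the top of this page asked for a simple function that illustrates. Wrapping the complex example above in a small helper makes the None check explicit. The name between_ought_and_be is hypothetical, chosen only for this sketch.

```python
import re
from typing import Optional


def between_ought_and_be(text: str) -> Optional[str]:
    """Hypothetical helper: return what sits between 'ought' and 'be', or None."""
    search_object = re.search(r'^OUGHT (.*) BE$', text, flags=re.IGNORECASE)
    # re.search returns None on failure, so check it before calling .group().
    if search_object:
        return search_object.group(1)
    return None


print(between_ought_and_be("ought to be"))      # to
print(between_ought_and_be("ought not to be"))  # not to
print(between_ought_and_be("to be or not"))     # None
```

Calling .group() directly on a failed search would raise AttributeError, because None has no such method; the if keeps that from happening.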
Notes:
- Use re.search(), not re.match(). The match method restricts matching to the start of the string, a confusing convention. If you want that, search explicitly with a caret: re.search(r'^...', ...). (In re.MULTILINE mode, ^ also matches at the start of every line, so use \A instead.)
- Use raw string syntax r'pattern' for the first parameter. Otherwise you would need to double up backslashes, as in re.search('ing\\b', ...).
- In these examples, '\\b' or r'\b' is a special sequence meaning word boundary for regex purposes. Not to be confused with '\b' or '\x08', which is a backspace.
- re.search() returns None if it doesn't find anything, which is always falsy.
- re.search() returns a Match object if it finds anything, which is always truthy.
- Even though re.search() returns a Match object (type(search_object) is re.Match), I have taken to calling the return value a search_object. I keep returning to my own bloody answer here because I can't remember whether to use match or search. It's search, dammit.
- A group is what matched inside the pattern's parentheses.
- Group numbering starts at 1.
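A short sketch checking the raw-string and group-numbering notes above. The date-like string and the variable names are made up for illustration.

```python
import re

# r'\b' and '\\b' are the same two characters: a backslash followed by 'b'.
assert r'\b' == '\\b'
# A plain '\b' is one backspace character, identical to '\x08'.
assert '\b' == '\x08'

phrase = "seeking a great perhaps"
# Without the raw string, the regex looks for a literal backspace and fails.
wrong = re.search('ing\b', phrase)
# With the raw string, \b is a word boundary, so "seeking" matches.
right = re.search(r'ing\b', phrase)
print(wrong)          # None
print(right.group())  # ing

# Group 0 is the whole match; parenthesised groups are numbered from 1.
m = re.search(r'(\d{4})-(\d{2})', "released 2024-05")
print(m.group(0))  # 2024-05
print(m.group(1))  # 2024
print(m.group(2))  # 05
print(m.groups())  # ('2024', '05')
```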