From the documentation of re.findall:
If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
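To see the one-group case of that rule, here is a minimal sketch. The question itself is not shown, so this assumes its pattern was `\((.*?)\)|\w`, with one group that only the first alternative uses. A group that did not take part in a match comes back as an empty string:

```python
import re

# Assumed reconstruction of the question's pattern: one group,
# used only by the first alternative.
matches = re.findall(r'\((.*?)\)|\w', '(zyx)bc')
print(matches)  # ['zyx', '', '']
```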
While your regexp matches the string three times, the (.*?) group is empty for the last two matches, because those matches come from the other alternative. If you also want the text matched by the other half of the regexp, you can put it in a second group:
>>> re.findall(r'\((.*?)\)|(\w)', '(zyx)bc')
[('zyx', ''), ('', 'b'), ('', 'c')]
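Because only one of the two groups takes part in each match, the tuples can be collapsed back into a flat list. A minimal sketch, assuming the input contains no empty `()` pair (that would produce `('', '')` and therefore an empty string):

```python
import re

s = '(zyx)bc'
# In each tuple exactly one slot is filled, so `inner or char`
# picks whichever alternative produced the match.
merged = [inner or char for inner, char in re.findall(r'\((.*?)\)|(\w)', s)]
print(merged)  # ['zyx', 'b', 'c']
```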
Alternatively, you could remove all the groups to get a simple list of strings again:
>>> re.findall(r'\(.*?\)|\w', '(zyx)bc')
['(zyx)', 'b', 'c']
You would need to manually remove the parentheses though.
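One way to do that clean-up, as a sketch: slice off the first and last character of any match that starts with a parenthesis and leave single characters alone.

```python
import re

s = '(zyx)bc'
# Strip the surrounding parentheses only from the parenthesised matches.
cleaned = [m[1:-1] if m.startswith('(') else m
           for m in re.findall(r'\(.*?\)|\w', s)]
print(cleaned)  # ['zyx', 'b', 'c']
```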
Answer from James Henstridge on Stack Overflow
Other answers have shown you how to get the result you need, but with the extra step of manually removing the parentheses. If you use lookarounds in your regex, you won't need to strip the parentheses manually:
>>> import re
>>> s = '(zyx)bc'
>>> print(re.findall(r'(?<=\()\w+(?=\))|\w', s))
['zyx', 'b', 'c']
Explained:
(?<=\() // lookbehind for a left parenthesis (not part of the match)
\w+ // one or more word characters, followed by:
(?=\)) // lookahead for a right parenthesis (not part of the match)
| // OR
\w // any single word character
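The same pattern can also be written with `re.VERBOSE`, which lets those annotations live inside the regex as real comments. This is a sketch; the pattern is unchanged from the answer, only the layout differs:

```python
import re

pattern = re.compile(r"""
    (?<=\()   # preceded by a left parenthesis (not consumed)
    \w+       # one or more word characters
    (?=\))    # followed by a right parenthesis (not consumed)
    |         # OR
    \w        # a single word character
""", re.VERBOSE)

print(pattern.findall('(zyx)bc'))  # ['zyx', 'b', 'c']
```

Note that `\w+` only matches as a unit when the parentheses contain nothing but word characters: for `'(a b)'` the first branch fails and the result is `['a', 'b']`.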
I have a list of about 5000 schools. I want to iterate through this list and use each one as a regex search pattern. If the pattern is found when iterating through the DataFrame column, I want to put that value in a new column. The issue is that sometimes more than one pattern matches, and I want to see all of them in case one is a better match. My current code either overwrites the earlier match or keeps only the first one found.
posts['post_mod'] is the DataFrame column I am iterating over, and schools_list is the list of school names I use as patterns (schools itself is a DataFrame).
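import re

import pandas as pd
from tqdm import tqdm

# posts, schools and schools_list are defined earlier in the script.
index_posts = posts.columns.get_loc('post_mod')
index_schools = schools.columns.get_loc('Input')
for school in tqdm(schools_list):
    school_pattern = rf'{school}'
    for row in range(len(posts)):
        try:
            # re.search returns None when nothing matches, so .group() raises AttributeError.
            found = re.search(school_pattern, posts.iat[row, index_posts]).group()
        except (AttributeError, TypeError, re.error):
            continue
        # A later matching school overwrites whatever an earlier one stored for this row.
        posts.loc[row, index_posts] = found
final = pd.merge(posts, schools, how='left', left_on=6, right_on='Input')
final = final.drop_duplicates().reset_index(drop=True)
Hey guys, I would consider myself a beginner in regex and was given a challenge (by Codewars) that I would like to learn about (googling did not help):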
I have several (4) simple patterns that must all hold at the same time. To be specific, I receive a string and should check whether it:
- consists of at least 6 characters, all alphanumeric -> [a-zA-Z0-9]{6,}
- contains at least 1 upper case letter -> [A-Z]{1,}
- contains at least 1 lower case letter -> [a-z]{1,}
- contains at least 1 number -> [0-9]{1,}
However, all requirements need to be checked in a single pattern. Basically, I want to replace the logical or (|, as in [a-zA-Z0-9]{6,}|[A-Z]{1,}|[a-z]{1,}|[0-9]{1,}) with an and :)
Can you help me learn? :)
Examples:
fjd3IR9 -> true
ghdfj32 -> false (no upper case)
fjd3 IR9 -> false (whitespace is not alphanumeric)
djI38D55 -> true
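One common way to express "and" in a single pattern is to stack lookaheads, the same lookaround idea used in the earlier answer. Each `(?=...)` tests one requirement from the start of the string without consuming anything, and the final part checks the length and character set. This is a sketch of that general technique, not an answer from the original thread; `PASSWORD_RE` and `is_valid` are hypothetical names:

```python
import re

# Each lookahead checks one requirement without consuming characters;
# fullmatch then requires the whole string to be 6+ alphanumeric characters.
PASSWORD_RE = re.compile(r'(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])[a-zA-Z0-9]{6,}')

def is_valid(s):
    return PASSWORD_RE.fullmatch(s) is not None

for s in ['fjd3IR9', 'ghdfj32', 'fjd3 IR9', 'djI38D55']:
    print(s, is_valid(s))  # True, False, False, True respectively
```

`fullmatch` applies the pattern to the whole string; if the checker calls `re.search` or `re.match` instead, anchor the pattern yourself with `\A` at the start and `\Z` at the end.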
You can compile all your regexes into a single pattern by joining them with |. Check this example:
import re

re1 = r'\d+\.\d*[L][-]\d*\s[A-Z]*[/]\d*'
re2 = r'\d*[/]\d*[A-Z]*\d*\s[A-Z]*\d*[A-Z]*'
re3 = r'[A-Z]*\d+[/]\d+[A-Z]\d+'
re4 = r'\d+[/]\d+[A-Z]*\d+\s\d+[A-Z]\s[A-Z]*'

# Compile the combined alternation once, outside the loop.
generic_re = re.compile("(%s|%s|%s|%s)" % (re1, re2, re3, re4))

sentences = [string1, string2, string3, string4]  # input strings, defined elsewhere
for sentence in sentences:
    matches = generic_re.findall(sentence)
To run findall with an arbitrary series of REs, all you have to do is concatenate the lists of matches that each one returns:
re_list = [
    r'\d+\.\d*[L][-]\d*\s[A-Z]*[/]\d*',  # re1 in question
    ...
    r'\d+[/]\d+[A-Z]*\d+\s\d+[A-Z]\s[A-Z]*',  # re4 in question
]
matches = []
for r in re_list:
    matches += re.findall(r, string)
For efficiency it would be better to use a list of compiled REs.
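A minimal sketch of the compiled variant, using the four patterns from the question (with `[A-Z]` in re4, as in the question). The helper name `find_all` is hypothetical:

```python
import re

re_list = [
    r'\d+\.\d*[L][-]\d*\s[A-Z]*[/]\d*',          # re1
    r'\d*[/]\d*[A-Z]*\d*\s[A-Z]*\d*[A-Z]*',      # re2
    r'[A-Z]*\d+[/]\d+[A-Z]\d+',                  # re3
    r'\d+[/]\d+[A-Z]*\d+\s\d+[A-Z]\s[A-Z]*',     # re4
]
# Compile every pattern once, up front.
compiled = [re.compile(r) for r in re_list]

def find_all(text):
    # Concatenate the findall results of each compiled pattern in turn.
    matches = []
    for pattern in compiled:
        matches += pattern.findall(text)
    return matches

print(find_all('1.5L-2 AB/3'))  # ['1.5L-2 AB/3'] (only re1 matches)
```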
Alternatively, you could join the individual RE strings with |:
generic_re = re.compile('|'.join(re_list))
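The two approaches do not return the same list. A small sketch with two hypothetical patterns shows the difference in ordering:

```python
import re

patterns = [r'\d+', r'[a-z]+']
text = 'ab12cd'

# One findall per pattern: results are grouped by pattern.
per_pattern = []
for p in patterns:
    per_pattern += re.findall(p, text)

# One combined alternation: results come back in text order.
combined = re.findall('|'.join(patterns), text)

print(per_pattern)  # ['12', 'ab', 'cd']
print(combined)     # ['ab', '12', 'cd']
```

Overlap differs too: with the patterns `\d+` and `\d` on `'12'`, the loop returns `['12', '1', '2']`, while the joined pattern returns only `['12']`, because each character is consumed by at most one alternative.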
You can use the built-in functions any (or all, if every regex has to match) and a generator expression to iterate over all the regex objects:
any(regex.match(line) for regex in [regex1, regex2, regex3])
(or any(re.match(regex_str, line) for regex_str in [regex_str1, regex_str2, regex_str3]) if the regexes are plain strings rather than pre-compiled regex objects, of course)
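A runnable sketch of the any/all idea with three hypothetical compiled patterns. It uses `search` rather than `match` so each pattern may match anywhere in the line; swap in `match` to test only at the start of the line, as in the answer:

```python
import re

# Hypothetical patterns; substitute your own.
regexes = [re.compile(r'\d'), re.compile(r'[A-Z]'), re.compile(r'[a-z]')]

def matches_any(line):
    # Stops at the first pattern that matches.
    return any(regex.search(line) for regex in regexes)

def matches_all(line):
    # Stops at the first pattern that fails.
    return all(regex.search(line) for regex in regexes)

print(matches_any('abc'), matches_all('abc'))  # True False
print(matches_any('aB3'), matches_all('aB3'))  # True True
```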
However, that will be less efficient than combining your regexes into a single expression. If this code is time- or CPU-critical, you should instead try to compose a single regular expression that encompasses all your needs, using the special | regex operator to separate the original expressions. Note that | only replaces the any case; it cannot express all.
A simple way to combine all the regexes is to use the string join method:
re.match("|".join([regex_str1, regex_str2, regex_str3]), line)
A warning about combining the regexes in this way: It can result in wrong expressions if the original ones already do make use of the | operator.
Try this new regex: (regex1)|(regex2)|(regex3). This will match a line with any of the 3 regexes in it.
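A side benefit of wrapping each alternative in its own capturing group is that the match object records which one matched. A minimal sketch with hypothetical component patterns that contain no groups of their own:

```python
import re

# Hypothetical component patterns (no capturing groups of their own).
regex_strs = [r'\d+', r'[a-z]+', r'[A-Z]+']

# Wrap each one in a capturing group: (\d+)|([a-z]+)|([A-Z]+)
combined = re.compile('|'.join('(%s)' % r for r in regex_strs))

for line in ['42 apples', 'hello', 'WORLD', '!?']:
    m = combined.match(line)
    if m:
        # lastindex is 1-based: group 1 corresponds to regex_strs[0].
        print(line, '->', regex_strs[m.lastindex - 1])
    else:
        print(line, '-> no match')
```

If the component patterns contain groups of their own, the group numbers shift and this index mapping breaks; named groups such as `(?P<p0>...)` together with `m.lastgroup` avoid that.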