You need the first captured group:
a.group(1)
b.group(1)
...
without any captured group specification as argument to group(), it will show the full match, like what you're getting now.
Here's an example:
In [8]: string_one = 'file_record_transcript.pdf'
In [9]: re.search(r'^(file.*)\.pdf$', string_one).group()
Out[9]: 'file_record_transcript.pdf'
In [10]: re.search(r'^(file.*)\.pdf$', string_one).group(1)
Out[10]: 'file_record_transcript'
Answer from heemayl on Stack OverflowYou need the first captured group:
a.group(1)
b.group(1)
...
without any captured group specification as argument to group(), it will show the full match, like what you're getting now.
Here's an example:
In [8]: string_one = 'file_record_transcript.pdf'
In [9]: re.search(r'^(file.*)\.pdf$', string_one).group()
Out[9]: 'file_record_transcript.pdf'
In [10]: re.search(r'^(file.*)\.pdf$', string_one).group(1)
Out[10]: 'file_record_transcript'
you can also use match[index]
a[0] => Full match (file_record_transcript.pdf)
a[1] => First group (file_record_transcript)
a[2] => Second group (if any)
(Regex | Regular expression) match.group(1) = 'None' due to using appended patterns
Accessing a "symbolic group name" in Python regex
regex - Match groups in Python - Stack Overflow
Python Conditional Regex to Print Decimal Number
Videos
Hi there,
I am using regex to get 'pack size' (how a product is packaged) out of product descriptions. So 'Apple 1x300g' and taking out '1x300g'.
The issue is that there are lots of different variants a pack size can be, so I have been using the '|' between the regex expressions. However, when doing this it doesn't allow the match.group(n), where n >1 to work.
Quick example:
description = 'apple 1x300g'
UoM = r'(g|kg|ml) #Meaning Unit's of Measure
all_patterns = (
r'(\d+)\s*x\s*+(\d+)' + UoM + r'|'
r'(\d+)\s*' + UoM +r'\s*x(\d+)'
)
match = re.search(all_patterns, description)
However, match.group(0) will give '1x300g' and match.group(1) will be 'none' as all_patterns is just one big or. I am wanting it to be '1' and '300g'.
Is there a simple fix, other than looping through the patterns?
I am new to regex so appreciate any help.
PL :)
Hello all,
I'm trying to understand how to access a "symbolic group name" within regex groups.
https://docs.python.org/3.10/library/re.html?highlight=re#regular-expression-syntax
The documentation states:
"Similar to regular parentheses, but the substring matched by the group is accessible via the symbolic group name name. Group names must be valid Python identifiers, and each group name must be defined only once within a regular expression. A symbolic group is also a numbered group, just as if the group were not named.
Named groups can be referenced in three contexts. If the pattern is (?P<quote>['"]).*?(?P=quote) (i.e. matching a string quoted with either single or double quotes):"
I would have expected to do something like this:
import re
text_string = 'This is a nice string: we should use it some time\r\nThis is NOT a nice string: we should NEVER use it\r\n'
found = re.findall(r'(?P<name>.*?): (?P<value>.*?)\r\n', text_string)
print(f'Data Content:\t{found[0]("value")}')
If I'm correct and you can access a symbolic group name how do you do it? The documentation is not very clear on that section.
Kind regards
You could create a little class that returns the boolean result of calling match, and retains the matched groups for subsequent retrieval:
import re
class REMatcher(object):
def __init__(self, matchstring):
self.matchstring = matchstring
def match(self,regexp):
self.rematch = re.match(regexp, self.matchstring)
return bool(self.rematch)
def group(self,i):
return self.rematch.group(i)
for statement in ("I love Mary",
"Ich liebe Margot",
"Je t'aime Marie",
"Te amo Maria"):
m = REMatcher(statement)
if m.match(r"I love (\w+)"):
print "He loves",m.group(1)
elif m.match(r"Ich liebe (\w+)"):
print "Er liebt",m.group(1)
elif m.match(r"Je t'aime (\w+)"):
print "Il aime",m.group(1)
else:
print "???"
Update for Python 3 print as a function, and Python 3.8 assignment expressions - no need for a REMatcher class now:
import re
for statement in ("I love Mary",
"Ich liebe Margot",
"Je t'aime Marie",
"Te amo Maria"):
if m := re.match(r"I love (\w+)", statement):
print("He loves", m.group(1))
elif m := re.match(r"Ich liebe (\w+)", statement):
print("Er liebt", m.group(1))
elif m := re.match(r"Je t'aime (\w+)", statement):
print("Il aime", m.group(1))
else:
print()
Less efficient, but simpler-looking:
m0 = re.match("I love (\w+)", statement)
m1 = re.match("Ich liebe (\w+)", statement)
m2 = re.match("Je t'aime (\w+)", statement)
if m0:
print("He loves", m0.group(1))
elif m1:
print("Er liebt", m1.group(1))
elif m2:
print("Il aime", m2.group(1))
The problem with the Perl stuff is the implicit updating of some hidden variable. That's simply hard to achieve in Python because you need to have an assignment statement to actually update any variables.
The version with less repetition (and better efficiency) is this:
pats = [
("I love (\w+)", "He Loves {0}" ),
("Ich liebe (\w+)", "Er Liebe {0}" ),
("Je t'aime (\w+)", "Il aime {0}")
]
for p1, p3 in pats:
m = re.match(p1, statement)
if m:
print(p3.format(m.group(1)))
break
A minor variation that some Perl folk prefer:
pats = {
"I love (\w+)" : "He Loves {0}",
"Ich liebe (\w+)" : "Er Liebe {0}",
"Je t'aime (\w+)" : "Il aime {0}",
}
for p1 in pats:
m = re.match(p1, statement)
if m:
print(pats[p1].format(m.group(1)))
break
This is hardly worth mentioning except it does come up sometimes from Perl programmers.