I think you're misunderstanding the concept of a "non-capturing group". The text matched by a non-capturing group still becomes part of the overall regex match.
Both the regex (?:aaa)(_bbb) and the regex (aaa)(_bbb) return aaa_bbb as the overall match. The difference is that the first regex has one capturing group which returns _bbb as its match, while the second regex has two capturing groups that return aaa and _bbb as their respective matches. In your Python code, to get _bbb, you'd need to use group(1) with the first regex, and group(2) with the second regex.
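Here is a minimal Python 3 sketch of the two patterns side by side. It assumes the test string "aaa_bbb", because the asker's actual string1 isn't shown. Pattern.groups reports how many capturing groups a compiled pattern has.

```python
import re

s = "aaa_bbb"  # assumed test string

# (?:aaa) groups but does not capture, so there is one capturing group.
m1 = re.match(r"(?:aaa)(_bbb)", s)
# (aaa) is group 1 and (_bbb) is group 2.
m2 = re.match(r"(aaa)(_bbb)", s)

# The overall match is the same for both patterns.
print(m1.group(), m2.group())      # aaa_bbb aaa_bbb
# Only the group numbering differs.
print(m1.group(1))                 # _bbb
print(m2.group(1), m2.group(2))    # aaa _bbb
print(m1.re.groups, m2.re.groups)  # 1 2
```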
The main benefit of non-capturing groups is that you can add them to a regex without upsetting the numbering of the capturing groups in the regex. They also offer (slightly) better performance as the regex engine doesn't have to keep track of the text matched by non-capturing groups.
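The sketch below illustrates the numbering point with a hypothetical key/value pattern. Both the pattern and the input are made up for illustration. Once the separator can be either ":" or "=>", the alternation needs a group. The choice of group type then decides whether the value stays in group 2.

```python
import re

line = "color => red"  # hypothetical input

# Hypothetical original pattern: key in group 1, value in group 2,
# accepting only ":" as the separator.
orig = re.compile(r"(\w+)\s*:\s*(\w+)")

# Accepting "=>" as well needs a group around the alternation.
# A capturing group pushes the value from group 2 to group 3 ...
capturing = re.compile(r"(\w+)\s*(:|=>)\s*(\w+)")
# ... while a non-capturing group keeps the value in group 2.
noncapturing = re.compile(r"(\w+)\s*(?::|=>)\s*(\w+)")

print(orig.match("color: red").group(2))  # red
print(capturing.match(line).group(2))     # =>
print(capturing.match(line).group(3))     # red
print(noncapturing.match(line).group(2))  # red
```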
If you really want to exclude aaa from the overall regex match, then you need to use lookaround. In this case, positive lookbehind does the trick: (?<=aaa)_bbb. With this regex and re.search(), group() returns _bbb in Python. No capturing groups needed.
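Here is a quick check of the lookbehind version, again assuming the string "aaa_bbb". It uses re.search() on purpose. re.match() only tries position 0, where nothing precedes the cursor, so the lookbehind can't succeed there.

```python
import re

s = "aaa_bbb"  # assumed test string

# (?<=aaa) requires "aaa" before the current position but does not
# consume it, so "aaa" is not part of the overall match.
m = re.search(r"(?<=aaa)_bbb", s)
print(m.group())    # _bbb
print(m.re.groups)  # 0 (no capturing groups)
print(m.span())     # (3, 7): the match starts right after "aaa"

# re.match() anchors at position 0, where the lookbehind fails.
print(re.match(r"(?<=aaa)_bbb", s))  # None
```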
My recommendation: if you can use capturing groups to get the part of the regex match you need, use that method instead of lookaround.
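The answer doesn't mention this, but there is one practical reason to prefer the capturing-group approach in Python. The built-in re module only accepts fixed-width lookbehind, so a variable-length prefix such as a+ can't go inside (?<=...). The sketch below assumes a hypothetical input "aaaa_bbb".

```python
import re

s = "aaaa_bbb"  # hypothetical input with a variable-length prefix

# Capturing-group approach: match the prefix, then read only group 1.
m = re.search(r"a+(_bbb)", s)
print(m.group(1))  # _bbb

# Lookbehind approach: the re module rejects variable-width lookbehind.
try:
    re.compile(r"(?<=a+)_bbb")
except re.error as exc:
    print("re.error:", exc)
```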
Answer from Jan Goyvaerts on Stack Overflow
group() and group(0) both return the entire match. group(1), group(2), and so on return the actual capturing groups.
>>> import re
>>> string1 = "aaa_bbb"  # assumed test string
>>> print(re.match(r"(?:aaa)(_bbb)", string1).group(0))
aaa_bbb
>>> print(re.match(r"(?:aaa)(_bbb)", string1).group(1))
_bbb
>>> print(re.match(r"(?:aaa)(_bbb)", string1).group(2))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: no such group
If you want something that behaves like group() but returns only the text of the capturing groups:
" ".join(re.match(r"(?:aaa)(_bbb)", string1).groups())
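Here is a sketch of how groups() relates to group(), using a made-up three-part string. Joining with " " puts a space between captured groups, so "".join() is closer to what group() returns when there is more than one group. Also, groups(default="") replaces None (a group that didn't participate) with an empty string. Without that, str.join() would raise a TypeError.

```python
import re

s = "aaa_bbb_ccc"  # made-up input with two captured parts

m = re.match(r"(?:aaa)(_bbb)(_ccc)", s)
print(m.group())             # aaa_bbb_ccc (entire match, including aaa)
print(m.groups())            # ('_bbb', '_ccc')
print(" ".join(m.groups()))  # _bbb _ccc (a space between groups)
print("".join(m.groups()))   # _bbb_ccc (captured text only)

# An optional group that doesn't participate shows up as None in groups().
m2 = re.match(r"(?:aaa)(_bbb)(_ccc)?", "aaa_bbb")
print(m2.groups())                     # ('_bbb', None)
print("".join(m2.groups(default="")))  # _bbb
```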
Should I use non-capturing groups if I don't care whether they capture?
Regex capture / non-capture groups best practice
Can someone explain what are non-capture groups?
I'm sooo confused (non capturing groups)
Let's say I'm just doing string validation and I don't care whether my groups capture or not. Is it advisable to use regular groups so that my expression has fewer characters and is easier for people to understand? I like to use non-capturing groups by default because it feels polite to give the engine less work, but I don't think there's any appreciable impact on performance. Sorry if this is already covered in a list of best practices; I couldn't find one.
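For pure validation, the group type doesn't change which strings match. The sketch below uses a hypothetical dotted-version validator; the pattern and samples are illustrative only. It shows the two forms accepting exactly the same inputs, with only the group count differing. It says nothing about speed, which would need to be measured.

```python
import re

# Hypothetical validator for dotted version strings such as "1.2.10".
capturing = re.compile(r"\d+(\.\d+)*")
noncapturing = re.compile(r"\d+(?:\.\d+)*")

samples = ["1", "1.2.10", "1..2", "v1.0", ""]
results = []
for s in samples:
    a = capturing.fullmatch(s) is not None
    b = noncapturing.fullmatch(s) is not None
    results.append((s, a, b))
    print(f"{s!r:10} {a} {b}")  # both columns agree for every sample

# The only observable difference is the number of capturing groups.
print(capturing.groups, noncapturing.groups)  # 1 0
```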
Hi everyone!
I'm struggling to understand what non-capturing groups are.
My take, if I understood correctly:
- When you group, you're applying precedence in terms of evaluation, like parentheses work in a math expression.
- A normal group creates some sort of indexing that the regex engine can use for other checks later on, if it supports advanced features like tagging or recursion.
- When you use ?: (a non-capturing group), you're still grouping, but without any indexing.
Is this correct?
Would there be any difference between simple stuff like (^$)|(^(No|Yes)$) and (?:^$)|(?:^(?:No|Yes)$)?
Thank you in advance.
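The sketch below compares the two patterns from the question on a few assumed inputs. Both accept and reject the same strings. The capturing version records three groups, while the non-capturing one records none.

```python
import re

capturing = re.compile(r"(^$)|(^(No|Yes)$)")
noncapturing = re.compile(r"(?:^$)|(?:^(?:No|Yes)$)")

results = []
for s in ["", "No", "Yes", "Maybe", "Yes!"]:  # assumed sample inputs
    a = capturing.search(s) is not None
    b = noncapturing.search(s) is not None
    results.append((a, b))
    print(f"{s!r:8} {a} {b}")  # identical True/False results

print(capturing.groups, noncapturing.groups)  # 3 0
# Group 1 (^$) didn't participate for "Yes", so it is None.
print(capturing.search("Yes").groups())       # (None, 'Yes', 'Yes')
print(noncapturing.search("Yes").groups())    # ()
```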
So I'm slowly working through the tutorials here: https://regexlearn.com/learn/regex101. Why in the heck would he want to "exclude" (is that the right word?), i.e. use a non-capture group, to select all those haha characters? Like, what's the point of the question mark when
(ha)-\1,(haa)-\2
will achieve the same thing? I'm confused as to why it even needs to exist in the first place.
Thank you!
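Here is a sketch of the haha example. It assumes the input "ha-ha,haa-haa", and the tutorial's exact text and expected answer may differ. Both patterns match. With (?:ha), the first group isn't stored, so (haa) becomes group 1 and is referenced as \1. Note that (?:ha) around a plain literal behaves exactly like ha. A group only changes what matches once a quantifier or alternation applies to it.

```python
import re

text = "ha-ha,haa-haa"  # assumed input

# Both groups captured: (ha) is \1 and (haa) is \2.
both = re.compile(r"(ha)-\1,(haa)-\2")
# First group not captured: (haa) becomes group 1, so it is \1.
one = re.compile(r"(?:ha)-ha,(haa)-\1")

print(both.fullmatch(text) is not None)  # True
print(one.fullmatch(text) is not None)   # True
print(both.fullmatch(text).groups())     # ('ha', 'haa')
print(one.fullmatch(text).groups())      # ('haa',)
```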