It is easy enough to make the season group optional:
(^.*?)(?:\Ws(?:eason )?(\d{1,2}|[ivxlcdm]{1,5}))?\Wp(?:art )?(\d{1,2}|[ivxlcdm]{1,5})\W(.*$)
using a non-capturing group ((?:...)) plus the 0 or 1 quantifier (?). I did have to make the first group non-greedy to prevent it from matching the season section of the name.
I also made the optional "eason" and "art" strings into non-capturing optional groups instead of character classes.
Result:
>>> import re
>>> p=re.compile(r'(^.*?)(?:\Ws(?:eason )?(\d{1,2}|[ivxlcdm]{1,5}))?\Wp(?:art )?(\d{1,2}|[ivxlcdm]{1,5})\W(.*$)', re.I)
>>> p.search('miniseries.season 1.part 5.720p.avi').groups()
('miniseries', '1', '5', '720p.avi')
>>> p.search('miniseries.part 5.720p.avi').groups()
('miniseries', None, '5', '720p.avi')
>>> p.search('miniseries.part VII.720p.avi').groups()
('miniseries', None, 'VII', '720p.avi')
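To see why the first group has to be non-greedy, the sketch below compiles the same pattern twice with Python's re module: once with a greedy (^.*) and once with the non-greedy (^.*?) from the answer. The variable names are illustrative only.

```python
import re

name = 'miniseries.season 1.part 5.720p.avi'
# Everything after the first group, copied from the answer's pattern.
season_part = r'(?:\Ws(?:eason )?(\d{1,2}|[ivxlcdm]{1,5}))?\Wp(?:art )?(\d{1,2}|[ivxlcdm]{1,5})\W(.*$)'

# Greedy first group: (^.*) grabs as much as it can, so the optional season
# group is skipped and "season 1" ends up inside the series name.
greedy = re.compile(r'(^.*)' + season_part, re.I)
# Non-greedy first group: (^.*?) stops as early as possible, which lets the
# optional season group capture the season number.
lazy = re.compile(r'(^.*?)' + season_part, re.I)

greedy_groups = greedy.search(name).groups()
lazy_groups = lazy.search(name).groups()
print(greedy_groups)  # → ('miniseries.season 1', None, '5', '720p.avi')
print(lazy_groups)    # → ('miniseries', '1', '5', '720p.avi')
```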
Answer from Martijn Pieters on Stack Overflow
I need to capture a group which contains an optional part inside, but I can't manage to build it.
Example: iphone or iphone11
I need to capture iphone (if it's only iPhone) or iphone11 if it has the 11 together. This is just an example; the optional part isn't necessarily numbers.
Example 2: abcd or abcdef
I want to capture abcd or abcdef.
I was trying by using this:
(iphone(11)?) OR (abcd(ef)?)
But that gives me 2 results whenever the inner capturing group matches, and I need only 1 result.
It's more complex than simply listing alternatives like this:
(iphone|iphone11)
Regex flavor: PCRE.
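The extra result comes from the inner (11)? being a capturing group itself. A minimal sketch, using Python's re module for illustration (the (?:...) non-capturing syntax is the same in PCRE) and assuming one capture per match is all that is needed: make the inner group non-capturing, the same technique used in the season/part answer above.

```python
import re

text = "iphone, iphone11, abcd, abcdef"

# Original attempt: the inner (11)? is also a capturing group,
# so every match yields two captures (outer and inner).
two_groups = re.findall(r'(iphone(11)?)', text)
print(two_groups)  # → [('iphone', ''), ('iphone11', '11')]

# With (?:...) the inner group no longer captures: one result per match.
one_group = re.findall(r'(iphone(?:11)?)', text)
print(one_group)   # → ['iphone', 'iphone11']

abcd = re.findall(r'(abcd(?:ef)?)', text)
print(abcd)        # → ['abcd', 'abcdef']
```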
Hello, I'm trying to parse Python docstrings (NumPy format) with a regex that has 3 capture groups, but the last group (which is optional) ends up in the 2nd group. Can you help me get it to correctly assign ", optional" to the third group, if it exists in the string? (I don't actually need the third group, but I need the second group to not contain the ", optional" part.)
The issue is that ", optional" is captured as part of group 2; I would like it to be in a separate group.
Regex:
(\w+)\s*:\s*([\w\[\], \| \^\w]+)(, optional)?
Test cases:
a: int
a: Dict[str, Any]
a: str | any
a: int, optional
a: str | any, optional
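One possible fix, sketched with Python's re and not tested against real NumPy docstrings: keep the asker's character class but make it lazy (+?), and anchor each line with ^ and $ under re.MULTILINE. The greedy class also contains "," and space, so it swallows ", optional"; the lazy version stops as early as it can, and the $ anchor then forces the optional third group to consume ", optional" when it is present. This is the same greedy-versus-lazy issue as in the season/part answer.

```python
import re

# Asker's pattern with two assumed changes: the type class is lazy (+?)
# and the line is anchored with ^...$ (re.MULTILINE makes them per-line).
PATTERN = re.compile(r'^(\w+)\s*:\s*([\w\[\], \| \^\w]+?)(, optional)?$', re.MULTILINE)

test_cases = """\
a: int
a: Dict[str, Any]
a: str | any
a: int, optional
a: str | any, optional
"""

# findall returns one (name, type, optional) tuple per line;
# the third item is '' when ", optional" is absent.
results = PATTERN.findall(test_cases)
for name, type_, optional in results:
    print(repr(name), repr(type_), repr(optional))
```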
This works for me:
r'^list_cv/(?:(?P<category>[\w+])/)?$'
EDIT:
Compared to the original answer, the difference is where the ? quantifier is placed:
(?:(?P<category>[\w+])/)?$ versus the original (?:(?P<category>[\w+])?/)$.
The last slash should be part of the optional group, so the regex should look like this:
r'^list_cv/(?:(?P<category>[\w+])?/)$'
I didn't test it, though.
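A quick way to compare the two patterns is to run them with Python's re module; the paths below are made up for illustration. Note that [\w+] is a character class matching exactly one character (a word character or a literal +), so a multi-character category such as "books" only matches once a quantifier is added, e.g. [\w+]+. That variant is an assumption about the intent, not part of either answer.

```python
import re

# Pattern that "works": the whole "category/" segment is optional.
working = re.compile(r'^list_cv/(?:(?P<category>[\w+])/)?$')
# Original pattern: only the category is optional; the second slash is required.
original = re.compile(r'^list_cv/(?:(?P<category>[\w+])?/)$')
# Hypothetical variant allowing multi-character categories.
multi = re.compile(r'^list_cv/(?:(?P<category>[\w+]+)/)?$')

for path in ['list_cv/', 'list_cv//', 'list_cv/a/', 'list_cv/books/']:
    for label, pattern in [('working', working), ('original', original), ('multi', multi)]:
        m = pattern.match(path)
        # Print the captured category (None if the group did not take part),
        # or "no match" if the whole path was rejected.
        print(f'{path!r:18} {label:8} ->', m.group('category') if m else 'no match')
```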
If you want to keep the 2 capturing groups and you want to match a range from 1-24 followed by hour and optionally a space and 30 minutes, you might shorten the pattern to:
(\d{4})(?:.* ((?:[1-9]|1[0-9]|2[0-4]) hour(?: 30 minutes)?))?
In parts:
(\d{4}) Capture group 1, match 4 digits (you might prepend a word boundary \b)
(?: Non-capturing group
.* Match any char 0+ times, followed by a space (or use .*\b)
( Capture group 2
(?:[1-9]|1[0-9]|2[0-4]) hour Match a number in the range 1-24 followed by hour
(?: 30 minutes)? Optionally match 30 minutes
) Close group 2
)? Close the non-capturing group and make it optional
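A quick check of the shortened pattern with Python's re.findall, on sample lines assumed to look like the question's input. Because the pattern has two groups, findall returns (zone, duration) tuples, with '' as the duration when the optional part is absent.

```python
import re

# Sample input (assumed format): one entry per line.
text = """
(1) Pay for zone 1234 for 1 hour
(2) Pay for zone 4567
(3) Pay for zone 1234 for 1 hour 30 minutes
"""

pattern = r'(\d{4})(?:.* ((?:[1-9]|1[0-9]|2[0-4]) hour(?: 30 minutes)?))?'

# '.' does not match newlines, so '.*' cannot run past the end of a line.
matches = re.findall(pattern, text)
print(matches)
# → [('1234', '1 hour'), ('4567', ''), ('1234', '1 hour 30 minutes')]
```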
I didn't change your pattern a lot because you didn't explain what you want to extract exactly.
When you make the second group optional, everything is consumed by .* because it is greedy, so first change it to the lazy .*?.
Next, the second group should also be wrapped in a non-capturing alternation, so that the match ends either with something like "1 hour" or at the end of the line (\n).
Check this:
import re
text = """
(1) Pay for zone 1234 for 1 hour
(2) Pay for zone 4567
(3) Pay for zone 1234 for 1 hour 30 minutes
"""
RE = r'(\d{4}).*?(?:(30 minutes|1 hour(?: 30 minutes)?|(?:[2-9]|1[0-9]|2[0-4]) hour(?: 30 minutes)?)|\n)'
# same thing using compile with flags MULTILINE
# RE = re.compile(r'(\d{4}).*?(?:(30 minutes|1 hour(?: 30 minutes)?|(?:[2-9]|1[0-9]|2[0-4]) hour(?: 30 minutes)?)|$)', flags=re.MULTILINE)
print(re.findall(RE, text))
OUTPUT:
[('1234', '1 hour'), ('4567', ''), ('1234', '1 hour 30 minutes')]