The reason you do not get the optional cat after a reluctant (lazy) .+? is that cat is both optional and non-anchored, so the engine is not forced to match it: as soon as .+? has consumed a single character, (cat)? can succeed by matching nothing, the whole pattern is satisfied, and the match ends there with cat left unmatched.
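Here is a minimal Java sketch of that behavior using the unanchored pattern ^(dog).+?(cat)? (the class and helper names are illustrative, not from the question). It reports the whole match and what group 2 captured:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnanchoredCat {
    // Same pattern as in the question, but without a trailing $ anchor
    static final Pattern P = Pattern.compile("^(dog).+?(cat)?");

    // Whole matched text, or null if there is no match at all
    static String fullMatch(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group(0) : null;
    }

    // Text captured by (cat)?, or null if that group did not participate
    static String catGroup(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"dog, cat", "dog, dog, cat"}) {
            // .+? takes one character, then (cat)? succeeds by matching nothing
            System.out.println(fullMatch(s) + " / " + catGroup(s));
        }
    }
}
```

For both inputs it prints dog, / null: the reported match is only "dog," and group 2 is empty.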
If you anchor the pattern at the end of the string, i.e. use ^(dog).+?(cat)?$, the lazy .+? has to keep expanding until the rest of the pattern can reach the end, so a trailing cat does get captured:
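import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class AnchoredCat {
    public static void main(String[] args) {
        // The trailing $ forces .+? to keep expanding until the match can reach the end
        Pattern p = Pattern.compile("^(dog).+?(cat)?$");
        for (String s : new String[] {"dog, cat", "dog, dog, cat", "dog, dog, dog"}) {
            Matcher m = p.matcher(s);
            if (m.find()) {
                System.out.println(m.group(1) + " " + m.group(2));
            }
        }
    }
}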
This prints:
dog cat
dog cat
dog null
Do you happen to know how to deal with it in case there's something after cat?
You can deal with it by constructing a trickier expression that matches anything except cat, like this:
^(dog)(?:[^c]|c[^a]|ca[^t])+(cat)?
Now cat can appear anywhere after dog in the string, and no end anchor is needed.
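A minimal Java sketch of that expression, run against a few hypothetical inputs, including one with text after cat (class and method names are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NotCatLoop {
    // The repeated group consumes text that does not start with "cat",
    // so the greedy loop stops right before the first "cat" and (cat)? then matches it.
    static final Pattern P = Pattern.compile("^(dog)(?:[^c]|c[^a]|ca[^t])+(cat)?");

    // Text captured by (cat)?, or null if it did not participate (or nothing matched)
    static String catGroup(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // Hypothetical inputs; the second one has text after "cat"
        for (String s : new String[] {"dog, cat", "dog, dog, cat, bird", "dog, dog, dog"}) {
            System.out.println(s + " -> " + catGroup(s));
        }
    }
}
```

One caveat worth knowing: because the c[^a] branch consumes two characters, an input such as "dog, ccat" lets the loop swallow "cc" and then "at", so cat is not captured there.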
In no particular order, here are some other options for matching such patterns:
Method 1
With non-capturing groups:
^(?:dog(?:, |
Or with capturing groups:
^(dog(?:, |
Method 2
With lookarounds:
(?<=^|, )dog|cat(?=$|,)
With word boundaries:
(?<=^|, )\b(?:dog|cat)\b(?=$|,)
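Unlike the line-level patterns, these two lookaround patterns produce one match per word. A minimal Java sketch that collects every match from a single hypothetical line (the class and helper names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundTokens {
    // Collects every separate match of the given regex in the input
    static List<String> allMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String lookarounds = "(?<=^|, )dog|cat(?=$|,)";
        String boundaries = "(?<=^|, )\\b(?:dog|cat)\\b(?=$|,)";
        String input = "dog, dog, cat";
        // Both patterns return the individual words, not the whole line
        System.out.println(allMatches(lookarounds, input));
        System.out.println(allMatches(boundaries, input));
    }
}
```

Both lines print [dog, dog, cat]. Note that in the first pattern the alternation is top-level, so the lookbehind applies only to dog and the lookahead only to cat.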
Method 3
If a line could also consist of just a single cat with no dog, then
^(?:dog(?:, |
would be an option too.
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegularExpression {
    public static void main(String[] args) {
        final String regex = "^(?:dog(?:, |
";
        final String string = "cat\n"
                + "dog, cat\n"
                + "dog, dog, cat\n"
                + "dog, dog, dog\n"
                + "dog, dog, dog, cat\n"
                + "dog, dog, dog, dog, cat\n"
                + "dog, dog, dog, dog, dog\n"
                + "dog, dog, dog, dog, dog, cat\n"
                + "dog, dog, dog, dog, dog, dog, dog, cat\n"
                + "dog, dog, dog, dog, dog, dog, dog, dog, dog\n";
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}
Output
Full match: cat
Full match: dog, cat
Full match: dog, dog, cat
Full match: dog, dog, dog
Full match: dog, dog, dog, cat
Full match: dog, dog, dog, dog, cat
Full match: dog, dog, dog, dog, dog
Full match: dog, dog, dog, dog, dog, cat
Full match: dog, dog, dog, dog, dog, dog, dog, cat
Full match: dog, dog, dog, dog, dog, dog, dog, dog, dog
If you wish to simplify, modify, or explore the expression, regex101.com explains it in its top-right panel and lets you see how it matches against sample inputs.
I need to capture a group that contains an optional part, but I can't manage to build it.
Example: iphone or iphone11
I need to capture iphone (if it's only iphone) or iphone11 (if the 11 directly follows it). This is just an example; the optional part isn't necessarily a number.
Example 2: abcd or abcdef
I want to capture abcd or abcdef.
I was trying by using this:
(iphone(11)?) OR (abcd(ef)?)
But this obviously gives me two results when the inner capturing group matches, and I need only one.
It's more complex than simply listing plain alternatives like this:
(iphone|iphone11)
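One common way to get a single result is to keep one capturing group around the whole thing and turn the optional suffix into a non-capturing group (?:...). A minimal sketch in Java, where (?:...) behaves the same as in PCRE; the combined alternation and the class name are illustrative assumptions:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalSuffix {
    // One capturing group around the whole alternation; the optional
    // suffixes use (?:...) so they do not create extra groups.
    static final Pattern P = Pattern.compile("(iphone(?:11)?|abcd(?:ef)?)");

    // Text captured by group 1, or null if nothing matched
    static String capture(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println("groups: " + P.matcher("").groupCount()); // groups: 1
        for (String s : new String[] {"iphone", "iphone11", "abcd", "abcdef"}) {
            System.out.println(s + " -> " + capture(s));
        }
    }
}
```

Because ? is greedy, the longer form (iphone11, abcdef) is preferred whenever it is present.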
PCRE
Edit: clarifying
I'm a complete beginner to regex, and I'm basically only using it for one task in a program called Obsidian, which uses JavaScript. This problem seems easy to me, but regex is a different beast. I've been searching for an answer for a few hours now, but I'm out of luck. Thank you in advance for the help. I really appreciate your time!
Here is essentially my template:
#card
sample text
Extra:
sample text
---
I would like to match 2 capture groups:
The first capture group should include everything (even whitespace) after the "#card" keyword and before the optional "Extra: " keyword. However, if "Extra: " is not present, it should include everything before the "---" keyword.
The second optional capture group should include everything after the "Extra: " keyword and before the "---" keyword. It only matches if it comes after "#card". Again, it's optional, so if "Extra: " is not present, it only returns the first capture group (if matched).
The multiline flag is applied by the plugin as well.
Here are some examples:
Examples that should match:
1.
#card
(capture 1)
Extra:
(capture 2)
---
2.
#card
(capture 1)
---
3.
#card
(capture 1)
{white space}
---
Examples that shouldn't match:
1.
Extra:
(anything)
---
2.
#card
(anything)
Extra:
3.
#random
(anything)
Extra:
(anything)
---
Thank you again for the help! Please let me know if you have any questions.
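A minimal sketch of one possible pattern for these requirements, shown in Java for consistency with the rest of this page. It assumes lines end in \n and that "Extra:" and "---" each sit on their own line; the constructs used ([\s\S], lazy quantifiers, ^ and $ with the multiline flag) also exist in JavaScript, but this has not been checked inside Obsidian. The class and method names are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CardTemplate {
    // Group 1: text after "#card" up to an optional "Extra:" line or the "---" line.
    // Group 2: text between "Extra:" and "---" (null when there is no Extra: line).
    // [\s\S] matches any character, including newlines, without needing DOTALL.
    static final Pattern CARD = Pattern.compile(
            "^#card\\n([\\s\\S]*?)(?:^Extra:[ \\t]*\\n([\\s\\S]*?))?^---$",
            Pattern.MULTILINE);

    // Returns {group1, group2}, or null when the text does not match
    static String[] parse(String text) {
        Matcher m = CARD.matcher(text);
        return m.find() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String[] r = parse("#card\nfront text\nExtra: \nback text\n---");
        System.out.println("capture 1: " + r[0]);
        System.out.println("capture 2: " + r[1]);
    }
}
```

The lazy first group stops at the first line where either the optional Extra: part (followed eventually by ---) or --- itself can take over, and the optional group is tried before it is skipped, so Extra: content ends up in group 2 rather than group 1.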
bash understands standard extended regular expressions ("ERE"), not PCRE ("Perl-compatible regular expressions").
Your PCRE:
cell-(90|855|80|70)-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD)(?:-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD))-[a-z]
The (?:...) in a PCRE is a non-capturing group (not an optional group). There is no equivalent in an ERE and all groups are capturing.
To make an expression optional, you may qualify it with ?, as I have done below. The ? means that the previous expression should match one or zero times.
As an ERE:
cell-(90|855|80|70)-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD)(-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD))?-[a-z]
or, contracting (SIT[a-z]|SIT[1-9]) into SIT[a-z1-9],
cell-(90|855|80|70)-(DEV|DEVL|SANDP|CAT|SIT[a-z1-9]|TAT|PROD)(-(DEV|DEVL|SANDP|CAT|SIT[a-z1-9]|TAT|PROD))?-[a-z]
You may also want to add anchoring to this:
^cell-(90|855|80|70)-(DEV|DEVL|SANDP|CAT|SIT[a-z1-9]|TAT|PROD)(-(DEV|DEVL|SANDP|CAT|SIT[a-z1-9]|TAT|PROD))?-[a-z]$
...otherwise it would also match strings such as somethingcell-...-ablahblah.
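To see the effect of the anchors, here is a minimal check written in Java (the syntax of this particular expression is valid there too) rather than with bash's =~. The sample names are hypothetical. POSIX EREs use leftmost-longest matching, so captured groups can differ between engines in general, but a plain yes/no match result like this one should agree:

```java
import java.util.regex.Pattern;

public class CellNameCheck {
    // The anchored ERE from above
    static final String ANCHORED =
            "^cell-(90|855|80|70)-(DEV|DEVL|SANDP|CAT|SIT[a-z1-9]|TAT|PROD)"
            + "(-(DEV|DEVL|SANDP|CAT|SIT[a-z1-9]|TAT|PROD))?-[a-z]$";
    // The same expression with the leading ^ and trailing $ removed
    static final String UNANCHORED = ANCHORED.substring(1, ANCHORED.length() - 1);

    // True if the regex matches anywhere in the input
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String[] samples = {"cell-90-DEV-PROD-a", "cell-855-SITa-x", "somethingcell-90-DEV-ablahblah"};
        for (String s : samples) {
            System.out.println(s + " anchored=" + found(ANCHORED, s)
                    + " unanchored=" + found(UNANCHORED, s));
        }
    }
}
```

Only the last sample differs: without anchors, the substring cell-90-DEV-a is enough for a match.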
(?:...) is not an optional capture group but a non-capturing group, which, as far as I know, is not supported by bash at all. This should work:
cell-(90|855|80|70)-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD)(-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD))?-[a-z]