The issue you're having is with the type of quantifier. You're using a greedy quantifier in your first group (index 1 - index 0 represents the whole Pattern), which means it'll match as much as it can (and since it's any character, it'll match as many characters as there are in order to fulfill the condition for the next groups).
In short, your 1st group .* matches anything as long as the next group \\d+ can match something (in this case, the last digit).
As per the 3rd group, it will match anything after the last digit.
If you change it to a reluctant quantifier in your 1st group, you'll get the result I suppose you are expecting, that is, the 3000 part.
Note the question mark in the 1st group.
String line = "This order was placed for QT3000! OK?";
Pattern pattern = Pattern.compile("(.*?)(\\d+)(.*)");
Matcher matcher = pattern.matcher(line);
while (matcher.find()) {
System.out.println("group 1: " + matcher.group(1));
System.out.println("group 2: " + matcher.group(2));
System.out.println("group 3: " + matcher.group(3));
}
Output:
group 1: This order was placed for QT
group 2: 3000
group 3: ! OK?
More info on Java Pattern here.
Finally, the capturing groups are delimited by round brackets, and provide a very useful way to use back-references (amongst other things), once your Pattern is matched to the input.
In Java 6 groups can only be referenced by their order (beware of nested groups and the subtlety of ordering).
In Java 7 it's much easier, as you can use named groups.
Answer from Mena on Stack OverflowVideos
The issue you're having is with the type of quantifier. You're using a greedy quantifier in your first group (index 1 - index 0 represents the whole Pattern), which means it'll match as much as it can (and since it's any character, it'll match as many characters as there are in order to fulfill the condition for the next groups).
In short, your 1st group .* matches anything as long as the next group \\d+ can match something (in this case, the last digit).
As per the 3rd group, it will match anything after the last digit.
If you change it to a reluctant quantifier in your 1st group, you'll get the result I suppose you are expecting, that is, the 3000 part.
Note the question mark in the 1st group.
String line = "This order was placed for QT3000! OK?";
Pattern pattern = Pattern.compile("(.*?)(\\d+)(.*)");
Matcher matcher = pattern.matcher(line);
while (matcher.find()) {
System.out.println("group 1: " + matcher.group(1));
System.out.println("group 2: " + matcher.group(2));
System.out.println("group 3: " + matcher.group(3));
}
Output:
group 1: This order was placed for QT
group 2: 3000
group 3: ! OK?
More info on Java Pattern here.
Finally, the capturing groups are delimited by round brackets, and provide a very useful way to use back-references (amongst other things), once your Pattern is matched to the input.
In Java 6 groups can only be referenced by their order (beware of nested groups and the subtlety of ordering).
In Java 7 it's much easier, as you can use named groups.
This is totally OK.
- The first group (
m.group(0)) always captures the whole area that is covered by your regular expression. In this case, it's the whole string. - Regular expressions are greedy by default, meaning that the first group captures as much as possible without violating the regex. The
(.*)(\\d+)(the first part of your regex) covers the...QT300int the first group and the0in the second. - You can quickly fix this by making the first group non-greedy: change
(.*)to(.*?).
For more info on greedy vs. lazy, check this site.
Your pattern doesn't match because it requires an open curly brace at the end, but your input doesn't have one.
Ignoring that small problem, the main problem is the little + after your capture group (.*)+. The plus requires one or more matches of .* and the group returned is the last match of the many. The term .* is greedy, so it consumes everything up to the bracket. The only way to match again is to consume nothing. So the last match of group 2 is blank.
To fix it, remove the + after group 2:
Pattern PATTERN = Pattern.compile("\\s*temp\\s+([A-Za-z]+)\\s*[(]\\s*(.*)[)]\\s*");
Note also how I removed other unnecessary characters from your regex, eg the single-character character classes - ie [\\s] is identical to \s. And \\s+\\s* is identical to just \\s+, because + is greedy.
I also removed the trailing curly bracket, which you can restore if your input data actually has it (your question showed input of "temp name(this is the data)", which has no trailing curly bracket).
Your regex should be this:
Pattern pattern = Pattern.compile("\\s*temp\\s+([A-Za-z]+)\\s*[(]\\s*(.*)[)]\\s*");
You had (.*)+ which means one or more matches of .*. This results in nothing being captured.
Testing:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Example {
public static void main(String[] args) {
Pattern pattern = Pattern.compile("\\s*temp\\s+([A-Za-z]+)\\s*[(]\\s*(.*)[)]\\s*");
Matcher m = pattern.matcher("temp name(this is the data)");
if(m.matches()) {
System.out.println(m.group(1));
System.out.println(m.group(2));
}
}
}
Output:
name
this is the data