Use
[A-Z]?
to make the letter optional. {1} is redundant. (Of course you could also write [A-Z]{0,1} which would mean the same, but that's what the ? is there for.)
You could improve your regex to
^([0-9]{5})+\s+([A-Z]?)\s+([A-Z])([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3})\s+([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})
And, since in most regex dialects, \d is the same as [0-9]:
^(\d{5})+\s+([A-Z]?)\s+([A-Z])(\d{3})(\d{3})([A-Z]{3})([A-Z]{3})\s+([A-Z])\d{3}(\d{4})(\d{2})(\d{2})
But: do you really need 11 separate capturing groups? And if so, why don't you capture the fourth-to-last group of digits?
Answer from Tim Pietzcker on Stack Overflow
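To see the optional letter in action, here is a minimal Python sketch. The two sample lines are made up for illustration (the real input isn't shown in the question), and `optional_letter` is just a hypothetical helper name.

```python
import re

# Pattern copied from the answer above (the \d variant).
PATTERN = re.compile(
    r"^(\d{5})+\s+([A-Z]?)\s+([A-Z])(\d{3})(\d{3})([A-Z]{3})([A-Z]{3})"
    r"\s+([A-Z])\d{3}(\d{4})(\d{2})(\d{2})"
)

# Hypothetical sample lines; the second one has no letter in the second field.
WITH_LETTER = "12345 A B123456ABCDEF G99920110105"
WITHOUT_LETTER = "12345  B123456ABCDEF G99920110105"


def optional_letter(line):
    """Return group 2 (the optional letter), or None if the line does not match."""
    match = PATTERN.match(line)
    return match.group(2) if match else None


print(repr(optional_letter(WITH_LETTER)))     # 'A'
print(repr(optional_letter(WITHOUT_LETTER)))  # ''
```

One thing to watch: because the optional letter sits between two `\s+`, a line without the letter still needs at least two whitespace characters at that spot for this pattern to match.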
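You can make the single letter optional by adding a ? after it:
([A-Z]?)
The quantifier {1} is redundant, so you can drop it. (Leaving it in as ([A-Z]{1}?) would not help: in most regex flavors a ? directly after {1} makes that quantifier lazy rather than optional.)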
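A caveat worth checking before relying on this: in Python's `re` module (and most other flavors), a `?` placed directly after `{1}` turns it into a lazy "exactly one" quantifier, so the letter is still required. A quick sketch, with variable names chosen just for illustration:

```python
import re

# "{1}?" is a lazy "exactly one" quantifier: the letter is still mandatory.
lazy_exactly_one = re.compile(r"[A-Z]{1}?")
# A bare "?" means "zero or one": the letter is optional.
optional = re.compile(r"[A-Z]?")

print(lazy_exactly_one.fullmatch("") is None)  # True: empty input is rejected
print(optional.fullmatch("") is not None)      # True: empty input is accepted
```

So to make the letter optional, drop the `{1}` and keep only the `?`.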
Hey, I'm using Python and I'd like to match shop no. strings like:
SHOP NO. 3 AND 4 AT PLOT NO. 24 - string 1
FIRST FLOOR OF SHOP NO. 7 ATLANTIC - string 2
SHOP NO-33 - string 3
This is what I expect:
3 AND 4
7
33
I came up with this regex: shop no\W*([\w\-,]*)\b (and|to)? (\d*)\b It matches the second and third strings but not the first.
Essentially I want to match the third group only if the second group (and or to) exists. It would be even better if I could do this with only one group.
Any help is appreciated. Thanks
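One possible approach, as a rough sketch only: put the "and/to plus second number" part into a single optional non-capturing group nested inside one capturing group. This assumes shop numbers are plain digits (the original pattern also allowed letters, hyphens and commas), and `shop_number` is a hypothetical helper name.

```python
import re

# One capturing group: a number, optionally followed by "and"/"to" and another number.
SHOP_NO = re.compile(r"shop no\W*(\d+(?:\s+(?:and|to)\s+\d+)?)", re.IGNORECASE)


def shop_number(text):
    """Return the captured shop number text, or None if nothing matches."""
    match = SHOP_NO.search(text)
    return match.group(1) if match else None


samples = [
    "SHOP NO. 3 AND 4 AT PLOT NO. 24",
    "FIRST FLOOR OF SHOP NO. 7 ATLANTIC",
    "SHOP NO-33",
]
for sample in samples:
    print(shop_number(sample))
# 3 AND 4
# 7
# 33
```

Because the `and|to` part and the second number live in the same optional group, the second number is only picked up when the connecting word is present.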
I have strings that could be something like this, where "very" is an optional word that may or may not be present:
string1 = "It is very cold outside"
string2 = "It is cold outside"
I've tried the following regex with the "?:" optional group:
re.findall("It is (?:very) cold outside", string2)
But it doesn't find anything for string2. I thought it might have something to do with the spaces, but "Itiscoldoutside" still doesn't match if I adjust the pattern.
What's the issue here?
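A likely explanation, sketched in Python: `(?:...)` only makes a group non-capturing, it does not make it optional. The group needs its own `?` quantifier, and the space after "very" has to move inside the group so it disappears together with the word. `find_cold` is a hypothetical helper name.

```python
import re

# (?:very )? = a non-capturing group that may occur zero or one time.
# The trailing space is inside the group so it is skipped along with "very".
PATTERN = re.compile(r"It is (?:very )?cold outside")


def find_cold(text):
    """Return all matches of the pattern in text."""
    return PATTERN.findall(text)


print(find_cold("It is very cold outside"))  # ['It is very cold outside']
print(find_cold("It is cold outside"))       # ['It is cold outside']
```

Since the pattern has no capturing groups, `findall` returns the full matched text.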
(\d+)\s+(\(.*?\))?\s?Z
Note the escaped parentheses, and the ? (zero or once) quantifiers. Any of the groups you don't want to capture can be (?: non-capture groups).
I agree about the spaces. \s is a better option there. I also changed the quantifier to ensure there are digits at the beginning. As for newlines, that depends on context: if the file is parsed line by line it won't be a problem. Another option is to anchor the start and end of the line (add a ^ at the front and a $ at the end).
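For what it's worth, here is how the optional capturing group behaves in Python. The input strings are invented for illustration, since the original data isn't shown; `parse` is a hypothetical helper name.

```python
import re

# Pattern from the answer above: digits, whitespace, optional "(...)" part, then Z.
PATTERN = re.compile(r"(\d+)\s+(\(.*?\))?\s?Z")


def parse(text):
    """Return the tuple of captured groups, or None if there is no match."""
    match = PATTERN.search(text)
    return match.groups() if match else None


print(parse("123 (abc) Z"))  # ('123', '(abc)')
print(parse("123 Z"))        # ('123', None)
```

Note that when the optional group does not take part in the match, Python reports it as `None` rather than as an empty string.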
This ought to work:
^\d+\s?(\([^\)]+\)\s?)?Z$
I haven't tested it, but here is the breakdown, so any remaining bugs should be straightforward to find:
First the beginning:
^ = beginning of string
\d+ = one or more decimal digits
\s? = one optional whitespace
Then this part:
(\([^\)]+\)\s?)?
Is actually:
(.............)?
Which makes the following contents optional; they match only if they are present in full:
\([^\)]+\)\s?
\( = an opening bracket
[^\)]+ = a series of at least one character that is not a closing bracket
\) = followed by a closing bracket
\s? = followed by one optional whitespace
And the end is made up of
Z$
Where
Z = your constant string
$ = the end of the string
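The breakdown above maps fairly directly onto Python's `re.VERBOSE` mode, which allows whitespace and comments inside the pattern. This is only a sketch: the test strings are made up, and `is_valid` is a hypothetical helper name.

```python
import re

PATTERN = re.compile(
    r"""
    ^              # beginning of the string
    \d+            # one or more decimal digits
    \s?            # one optional whitespace character
    (              # start of the optional group
        \(         #   an opening bracket
        [^\)]+     #   at least one character that is not a closing bracket
        \)         #   a closing bracket
        \s?        #   one optional whitespace character
    )?             # the group as a whole may be absent
    Z              # the constant string
    $              # end of the string (Python also allows one trailing newline)
    """,
    re.VERBOSE,
)


def is_valid(text):
    """Return True if text matches the anchored pattern."""
    return PATTERN.match(text) is not None


print(is_valid("123 (abc) Z"))  # True
print(is_valid("123 Z"))        # True
print(is_valid("123 () Z"))     # False: the brackets must contain something
print(is_valid("123 (abc Z"))   # False: the group only matches if complete
```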
You have the #p outside of the capturing group, which makes it a required piece of the result. You are also using the dot character (.) improperly. Dot (in most reg-ex variants) will match any character. Change it to:
([0-9]*)(?:#p([0-9]*))?
The (?:) syntax is how you get a non-capturing group. We then capture just the digits that you're interested in. Finally, we make the whole thing optional.
Also, most reg-ex variants have a \d character class for digits. So you could simplify even further:
(\d*)(?:#p(\d*))?
As another person has pointed out, the * operator could potentially match zero digits. To prevent this, use the + operator instead:
(\d+)(?:#p(\d+))?
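A small Python check of the difference between the `*` and `+` versions. The inputs such as "12#p34" are hypothetical, since the original strings aren't shown.

```python
import re

STAR = re.compile(r"(\d*)(?:#p(\d*))?")  # digits are optional everywhere
PLUS = re.compile(r"(\d+)(?:#p(\d+))?")  # at least one digit is required

# With "+", the "#p..." suffix is optional but the digits are not.
print(PLUS.fullmatch("12#p34").groups())  # ('12', '34')
print(PLUS.fullmatch("12").groups())      # ('12', None)

# With "*", even an empty string is accepted.
print(STAR.fullmatch("") is not None)     # True
print(PLUS.fullmatch("") is None)         # True
```

The second group comes back as `None` when the `#p` part is missing, so check for that before using it.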
Your regex will actually match no digits, because you've used * instead of +.
This is what (I think) you want:
(\d+)(?:#p(\d+))?