It seems that this bug is related to backtracking. It occurs when a capture group is repeated, and the capture group matches but the pattern after the group doesn't.

An example:

>>> regex.sub(r'(?:(\d{1,3})x)+', r'\1', '123x5')
'5'

For reference, the expected output would be:

>>> re.sub(r'(?:(\d{1,3})x)+', r'\1', '123x5')
'1235'

In the first iteration, the capture group (\d{1,3}) consumes the first 3 digits, and x consumes the following "x" character. Then, because of the +, the match is attempted a 2nd time. This time, (\d{1,3}) matches "5", but the x fails to match. However, the capture group's value is now (re)set to the empty string instead of the expected 123.
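The iteration-by-iteration behavior described above can be checked against the standard re module, which serves as the reference here. This is only a minimal sketch; the variable names are illustrative and not taken from the original question.

```python
import re

# Same simplified pattern as above: a capture group inside a repeated group.
pattern = r'(?:(\d{1,3})x)+'
text = '123x5'

m = re.search(pattern, text)

# The second iteration fails (there is no "x" after "5"), so the overall
# match covers only the first iteration.
print(m.group(0))  # 123x

# Group 1 keeps the value captured in the last *successful* iteration.
print(m.group(1))  # 123
```

An affected regex version would presumably report an empty string for group 1 here, which is what produces '5' instead of '1235' in the substitution.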

As a workaround, we can prevent the capture group from matching. In this case, changing it to (\d{2,3}) is enough to bypass the bug (because it no longer matches "5"):

>>> regex.sub(r'(?:(\d{2,3})x)+', r'\1', '123x5')
'1235'

As for the pattern in question, we can use a lookahead assertion; we change (\w{1,3}) to (?=\w{1,3}(?:-|\.\.))(\w{1,3}):

>>> pattern = r"(?i)\b((?=\w{1,3}(?:-|\.\.))(\w{1,3})(-|\.{2,10})[\t ]?)+(\2\w{2,})"
>>> regex.sub(pattern, substitute, content)
'"Erm....yes. T-Thank you for that."'
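The same lookahead idea can be sketched on the simplified digits pattern from earlier. The guarded pattern below is an assumption made for illustration (it is not from the original answer), and the sketch runs on re, which is not affected by the bug, so it only shows that the guard leaves the reference result unchanged.

```python
import re

text = '123x5'

# Unguarded pattern from the example above.
plain = r'(?:(\d{1,3})x)+'

# Hypothetical guarded variant: the lookahead (?=\d{1,3}x) must succeed
# before the capture group is entered, so the group is never attempted on
# an iteration that cannot end in "x".
guarded = r'(?:(?=\d{1,3}x)(\d{1,3})x)+'

plain_result = re.sub(plain, r'\1', text)
guarded_result = re.sub(guarded, r'\1', text)

print(plain_result, guarded_result)  # 1235 1235
```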
Answer from Aran-Fey on Stack Overflow
2 of 2
1

Edit: the bug is now resolved in regex 2017.04.23.

I just tested in Python 3.6.1, and the original pattern now behaves the same in re and regex.


Original workaround: you can use the lazy quantifier +?. Note that this is a different regex, which behaves differently from the original pattern in edge cases such as T...Tha....Thank:

pattern = r"(?i)\b((\w{1,3})(-|\.{2,10})[\t ]?)+?(\2\w{2,})"
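To make the edge case concrete, here is a small sketch that compares the greedy and lazy patterns on T...Tha....Thank. It uses the standard re module only, and the variable names are illustrative.

```python
import re

text = 'T...Tha....Thank'

# Original (greedy) pattern and the lazy workaround.
greedy = r"(?i)\b((\w{1,3})(-|\.{2,10})[\t ]?)+(\2\w{2,})"
lazy = r"(?i)\b((\w{1,3})(-|\.{2,10})[\t ]?)+?(\2\w{2,})"

greedy_match = re.search(greedy, text)
lazy_match = re.search(lazy, text)

# Greedy: both stutter fragments are consumed before the final word.
print(greedy_match.group(0))  # T...Tha....Thank

# Lazy: the match stops as soon as \2\w{2,} can succeed, i.e. after "Tha".
print(lazy_match.group(0))    # T...Tha
```

The lazy version therefore leaves "....Thank" outside the match, which matters if the match is fed to a substitution.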


The bug in 2017.04.05 was due to backtracking, roughly as follows:

The unsuccessful longer match sets the nested group \2 to the empty string. Conceptually, this should trigger backtracking to the shorter match, where the nested group is not empty. However, regex seems to "optimize": instead of computing the shorter match from scratch, it reuses cached values and forgets to undo the update to the nested match groups.

For example, greedy matching of ((\w{1,3})(\.{2,10})){1,3} first attempts 3 repetitions, then backtracks to fewer:

import re
import regex

content = '"Erm....yes. T..T...Thank you for that."'
base_pattern_template = r'((\w{1,3})(\.{2,10})){%s}'
test_cases = ['1,3', '3', '2', '1']

for tc in test_cases:
    pattern = base_pattern_template % tc
    expected = re.findall(pattern, content)
    actual = regex.findall(pattern, content)
    # TODO: convert to test case, e.g. in pytest
    # assert str(expected) == str(actual), '{}\nexpected: {}\nactual: {}'.format(tc, expected, actual)
    print('expected:', tc, expected)
    print('actual:  ', tc, actual)

output:

expected: 1,3 [('Erm....', 'Erm', '....'), ('T...', 'T', '...')]
actual:   1,3 [('Erm....', '', '....'), ('T...', '', '...')]
expected: 3 []
actual:   3 []
expected: 2 [('T...', 'T', '...')]
actual:   2 [('T...', 'T', '...')]
expected: 1 [('Erm....', 'Erm', '....'), ('T..', 'T', '..'), ('T...', 'T', '...')]
actual:   1 [('Erm....', 'Erm', '....'), ('T..', 'T', '..'), ('T...', 'T', '...')]
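When reading the expected rows above, it may help to remember that findall reports only the groups, and a repeated group holds the text of its last successful iteration. The sketch below (re only, illustrative names) prints the full match next to the groups for the {1,3} case.

```python
import re

content = '"Erm....yes. T..T...Thank you for that."'
pattern = r'((\w{1,3})(\.{2,10})){1,3}'

# Pair each full match with its groups to see what findall leaves out.
matches = [(m.group(0), m.groups()) for m in re.finditer(pattern, content)]

for full, groups in matches:
    print(repr(full), groups)
# 'Erm....' ('Erm....', 'Erm', '....')
# 'T..T...' ('T...', 'T', '...')
```

The second match spans two repetitions (T..T...), but group 1 shows only the last one (T...), which is why the findall output never contains T..T... itself.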
r/Python on Reddit: I've created a Python module for constructing Regex patterns in a more computer programming-familiar way, so you don't have to re-learn Regex each time you use it!
July 20, 2022

There does not yet exist a separate documentation page with specific instructions on how to use each class of the module, though all classes are sufficiently documented. There also exists a small example within the repo's README file to get the hang of it.

Here is the link to the repo: https://github.com/manoss96/pregex

Any feedback is welcome!

UPDATE: Thank you all for your comments and feedback, I hope this package helps you get the job done faster! I've gotten a lot of comments mentioning that having to import every stuff is annoying, and I can understand that. However, I still think that all classes should remain separated into different modules, as each module expresses a different functionality, but at the same time I don't think that importing everything all at once is a good thing, so I tried a different approach. All the modules that you'll need are now imported within the package's "__init__.py" by using a short alias for each module. For instance, "quantifiers.py" is imported as "qu". Thus, you can simply write "from pregex import *" at the top of your .py script, and then just use these aliases. Just be careful, this can only be done in pregex version >=1.0.2.

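regular expressions - In what programming language is Python's regex module written in? - Software Engineering Stack Exchange
If I try to rewrite specific regex functionalities (e.g. substituting a string) in Python, a solution using the regex module is always faster. Is regex written in C?
Top answer
1 of 1
6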

Regex is a language. It doesn't look like much of one, but it is.

Like every language, it is not written in anything. A language is a set of mathematical rules and restrictions. If we can say that it is written in anything at all, we would probably say that it is written in English. (Or in a specific English-based jargon for specifying languages, enriched with graphical and mathematical tools for expressing language rules.)

A specific implementation of the language (regex) is of course written in a specific language, but the language itself isn't.

As an example, the implementation of the re module that ships as part of the CPython implementation of the Python programming language is called the Secret Labs' Regular Expression Engine (sre), and is written in Python and C. More precisely, it consists of a compiler written in Python that compiles re regexes into byte code for a virtual machine, and a VM written in C that interprets that byte code.
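One way to get a glimpse of the compile step described above is the public re.DEBUG flag, which makes CPython's re print debug information about the compiled expression. This is just a sketch: the pattern is an arbitrary example, and the exact debug output differs between Python versions, so none is shown here.

```python
import re

# re.DEBUG prints debug information about the pattern while it is compiled.
# The format of this output is version-dependent.
compiled = re.compile(r'(\d{1,3})x', re.DEBUG)

# The compiled pattern object is then used for matching as usual.
result = compiled.match('123x').group(1)
print(result)  # 123
```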

The implementation that ships with Jython uses the same Python code and byte code, but the byte code VM is written in Java, not C.

At first glance, IronPython looks similar: compiler in Python and VM in C#. However, if you look closer, the VM is actually a non-functional stub, and the real implementation is in C# and is based on System.Text.RegularExpressions from the CLI.

PyPy follows the standard pattern again: compiler in Python and the VM in RPython.

And of course other languages have completely different flavors of regex. E.g. Ruby's Regexp is quite different from Python's re. And in Ruby, we have similar diversity: YARV uses an engine called Onigmo to implement its Regexp class whereas JRuby uses joni.
