Brave Search

Python regex module vs re module - pattern mismatch

stackoverflow.com › questions › 43563819 › python-regex-module-vs-re-module-pattern-mismatch

It seems that this bug is related to backtracking. It occurs when a capture group is repeated, and the capture group matches but the pattern after the group doesn't.

An example:

>>> regex.sub(r'(?:(\d{1,3})x)+', r'\1', '123x5')
'5'

For reference, the expected output would be:

>>> re.sub(r'(?:(\d{1,3})x)+', r'\1', '123x5')
'1235'

In the first iteration, the capture group (\d{1,3}) consumes the first 3 digits, and x consumes the following "x" character. Then, because of the +, the match is attempted a 2nd time. This time, (\d{1,3}) matches "5", but the x fails to match. However, the capture group's value is now (re)set to the empty string instead of the expected 123.
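The backtracking step described above can be checked with the standard-library re module, which keeps the group value from the last successful iteration. This is a minimal sketch using only re, so it shows the expected behaviour and does not reproduce the regex bug; the variable names are illustrative.

```python
import re

# Same pattern and input as in the example above.
pattern = re.compile(r'(?:(\d{1,3})x)+')
m = pattern.match('123x5')

# The second iteration consumes "5" but then fails on "x", so the engine
# backtracks and the overall match ends right after the first "x".
matched_text = m.group(0)

# Group 1 keeps the value from the last *successful* iteration.
captured = m.group(1)

print(matched_text, captured)
```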

As a workaround, we can prevent the capture group from matching. In this case, changing it to (\d{2,3}) is enough to bypass the bug (because it no longer matches "5"):

>>> regex.sub(r'(?:(\d{2,3})x)+', r'\1', '123x5')
'1235'

As for the pattern in question, we can use a lookahead assertion; we change (\w{1,3}) to (?=\w{1,3}(?:-|\.\.))(\w{1,3}):

>>> pattern = r"(?i)\b((?=\w{1,3}(?:-|\.\.))(\w{1,3})(-|\.{2,10})[\t ]?)+(\2\w{2,})"
>>> regex.sub(pattern, substitute, content)
'"Erm....yes. T-Thank you for that."'
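The answer does not show the values of content or substitute, so the ones below are assumptions: content is the test string that appears in the other answer on this page, and substitute is a hypothetical replacement chosen because it reproduces the output shown. The sketch runs the lookahead-guarded pattern through the standard-library re module, so it only illustrates what the guard does, not the regex bug itself.

```python
import re

# Assumed inputs: the answer does not show `content` or `substitute`.
content = '"Erm....yes. T..T...Thank you for that."'
substitute = r'\2-\4'  # hypothetical: stutter prefix, a dash, then the full word

pattern = r"(?i)\b((?=\w{1,3}(?:-|\.\.))(\w{1,3})(-|\.{2,10})[\t ]?)+(\2\w{2,})"
result = re.sub(pattern, substitute, content)
print(result)

# The same guard applied to the reduced example: the lookahead rejects "5"
# before the capture group is entered, so group 1 is never overwritten.
simple = re.sub(r'(?:(?=\d{1,3}x)(\d{1,3})x)+', r'\1', '123x5')
print(simple)
```

The lookahead repeats the part of the pattern that must follow the group, so a failing iteration is rejected before any capture takes place.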

Answer from Aran-Fey on Stack Overflow

Python documentation

docs.python.org › 3 › library › re.html

re — Regular expression operations

4 days ago - Source code: Lib/re/ This module provides regular expression matching operations similar to those found in Perl. Both patterns and strings to be searched can be Unicode strings (str) as well as 8-...

Stack Overflow

stackoverflow.com › questions › 43563819 › python-regex-module-vs-re-module-pattern-mismatch

Python regex module vs re module - pattern mismatch - Stack Overflow

2 of 2

edit: the bug is now resolved in regex 2017.04.23

just tested in Python 3.6.1 and the original pattern works the same in re and regex

Original workaround: use the lazy quantifier +? (note that this is a different regex, and it behaves differently from the original pattern in edge cases such as T...Tha....Thank):

pattern = r"(?i)\b((\w{1,3})(-|\.{2,10})[\t ]?)+?(\2\w{2,})"
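To make the edge case concrete, here is a small sketch that runs the greedy and lazy variants on the string T...Tha....Thank with the standard-library re module. The variable names are illustrative; only the two patterns come from the answers.

```python
import re

greedy = r"(?i)\b((\w{1,3})(-|\.{2,10})[\t ]?)+(\2\w{2,})"
lazy = r"(?i)\b((\w{1,3})(-|\.{2,10})[\t ]?)+?(\2\w{2,})"
edge_case = "T...Tha....Thank"

# Greedy: repeats the stutter group as often as possible, so the last
# captured prefix is "Tha" and the match runs through "Thank".
greedy_match = re.search(greedy, edge_case).group(0)

# Lazy: stops after the first repetition as soon as the tail matches,
# so "Tha" is taken as the final word and the rest is left untouched.
lazy_match = re.search(lazy, edge_case).group(0)

print(repr(greedy_match))
print(repr(lazy_match))
```

The lazy form sidesteps the failing extra iteration, at the cost of ending the match early whenever an intermediate fragment already looks like the final word.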

The bug in 2017.04.05 was due to backtracking, something like this:

The unsuccessful longer match leaves group \2 empty. Conceptually, that failure should trigger backtracking to the shorter match, in which the nested group is not empty. However, regex seems to "optimize": it does not compute the shorter match from scratch but reuses some cached values, forgetting to undo the update to the nested match groups.
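The class of mistake described here can be illustrated with a toy, hand-written matcher for the reduced pattern (?:(\d{1,3})x)+. This is not how regex or re is implemented; match_repeated and its restore_on_failure flag are hypothetical names invented for this sketch. Note also that the stale value in the toy is "5", whereas the affected regex release reportedly produced an empty group; the point is only that a capture updated during a failed iteration must be undone.

```python
def match_repeated(text, restore_on_failure=True):
    # Toy matcher for the pattern (?:(\d{1,3})x)+ anchored at index 0.
    # Returns (end_index, capture), or None if no iteration matches.
    pos = 0
    capture = None
    iterations = 0
    while True:
        saved = capture  # snapshot of the group before this iteration
        # Greedy \d{1,3}: count up to three digits at the current position.
        n = 0
        while n < 3 and pos + n < len(text) and text[pos + n].isdigit():
            n += 1
        matched = False
        # Backtrack over the digit count, longest candidate first.
        for k in range(n, 0, -1):
            capture = text[pos:pos + k]  # group is updated speculatively
            if text[pos + k:pos + k + 1] == "x":
                pos += k + 1
                matched = True
                break
        if not matched:
            if restore_on_failure:
                capture = saved  # undo the speculative update
            break
        iterations += 1
    return (pos, capture) if iterations else None


correct = match_repeated("123x5")
stale = match_repeated("123x5", restore_on_failure=False)
print(correct, stale)
```

With the snapshot restored, the group reports the value from the last successful iteration; without it, the group reports whatever the failed iteration left behind.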

For example, greedy matching of ((\w{1,3})(\.{2,10})){1,3} first attempts 3 repetitions, then backtracks to fewer:

import re
import regex

content = '"Erm....yes. T..T...Thank you for that."'
base_pattern_template = r'((\w{1,3})(\.{2,10})){%s}'
test_cases = ['1,3', '3', '2', '1']

for tc in test_cases:
    pattern = base_pattern_template % tc
    expected = re.findall(pattern, content)
    actual = regex.findall(pattern, content)
    # TODO: convert to test case, e.g. in pytest
    # assert str(expected) == str(actual), '{}\nexpected: {}\nactual: {}'.format(tc, expected, actual)
    print('expected:', tc, expected)
    print('actual:  ', tc, actual)

output:

expected: 1,3 [('Erm....', 'Erm', '....'), ('T...', 'T', '...')]
actual:   1,3 [('Erm....', '', '....'), ('T...', '', '...')]
expected: 3 []
actual:   3 []
expected: 2 [('T...', 'T', '...')]
actual:   2 [('T...', 'T', '...')]
expected: 1 [('Erm....', 'Erm', '....'), ('T..', 'T', '..'), ('T...', 'T', '...')]
actual:   1 [('Erm....', 'Erm', '....'), ('T..', 'T', '..'), ('T...', 'T', '...')]
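The TODO in the script above suggests turning the comparison into a test. One possible shape is sketched below: a hypothetical helper, find_mismatches, that reports the quantifiers for which an engine disagrees with re. The third-party regex import is optional here so the sketch still runs where regex is not installed; the constant names are illustrative.

```python
import re

try:
    import regex  # third-party; optional for this sketch
except ImportError:
    regex = None

CONTENT = '"Erm....yes. T..T...Thank you for that."'
TEMPLATE = r'((\w{1,3})(\.{2,10})){%s}'
QUANTIFIERS = ['1,3', '3', '2', '1']


def find_mismatches(engine, quantifiers=QUANTIFIERS, text=CONTENT):
    # Return the quantifiers for which engine.findall differs from re.findall.
    mismatches = []
    for q in quantifiers:
        pattern = TEMPLATE % q
        if engine.findall(pattern, text) != re.findall(pattern, text):
            mismatches.append(q)
    return mismatches


# Sanity check against re itself; this is trivially empty.
baseline = find_mismatches(re)

if regex is not None:
    # The output listed above implies ['1,3'] on the affected release;
    # a release with the fix should give an empty list.
    print('regex mismatches:', find_mismatches(regex))
```

In a pytest suite, the same check could be written as an assertion that find_mismatches(regex) returns an empty list.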

Discussions

regex - An alternative regular expression module that is intended to eventually replace Python's re, with behavior more consistent with PCRE

Can we please have failed matches return something other than None so we can do re.match(...).group(1) without dying when it's not found? I'd like to be able to use regexes in lambdas and list comprehensions without writing a helper function to assign a variable. Just return an object that evaluates to false with __nonzero__/__bool__. More on reddit.com

r/Python

155

December 31, 2015

I've created a Python module for constructing Regex patterns in a more computer programming-familiar way, so you don't have to re-learn Regex each time you use it!

It seems pregnant with potential. More on reddit.com

r/Python

551

July 20, 2022

regular expressions - In what programming language is Python's regex module written in? - Software Engineering Stack Exchange

If I try to rewrite specific regex functionalities (e.g. substituting a string) in Python, a solution using the regex module is always faster. Is regex written in C? More on softwareengineering.stackexchange.com

softwareengineering.stackexchange.com

May 11, 2020

Regular Expressions (RE) Module - Search and Match Comparison

Hello, I have a question regarding the regular expression compile. I created a code snippet to compare the different search and match results using different strings and using different patterns. Here is the test code snippet: import re s1 = 'bob has a birthday on Feb 25th' s2 = 'sara has a ... More on discuss.python.org

discuss.python.org

October 26, 2023

W3Schools

w3schools.com › python › python_regex.asp

Python RegEx

RegEx can be used to check if a string contains the specified search pattern. Python has a built-in package called re, which can be used to work with Regular Expressions.

LearnByExample

learnbyexample.github.io › py_regular_expressions › regex-module.html

regex module - Understanding Python re(gex)?

The third-party regex module (https://pypi.org/project/regex/) offers advanced features like those found in the Perl language and other regular expression implementations. To install the module from the command line, you can use either of these depending on your usage: pip install regex (in a virtual environment) or python3.13 -m pip install --user regex (for normal environments).

PyPI

pypi.org › project › regex

regex · PyPI

Alternative regular expression module, to replace re.

pip install regex

Published Feb 28, 2026

Version 2026.2.28

Homepage https://github.com/mrabarnett/mrab-regex

freeCodeCamp

freecodecamp.org › news › how-to-import-a-regular-expression-in-python

Python RegEx – How to Import a Regular Expression in Python

March 1, 2023 - But using a flag with regular expressions in Python is different from how we use it in JavaScript. To use flags with regular expressions in Python, the re module provides the IGNORECASE, ASCII, MULTILINE, VERBOSE, DOTALL, and LOCALE options.

Rexegg

rexegg.com › regex-python.php

Python Regex Tutorial

You should know that re, the default regex engine for Python, is the second worst among all the major engines (granted, JavaScript wins the loser contest by a long margin.) You should also know that Python has an alternate regular expressions module called regex, which is possibly the very best engine available in the major languages.

YouTube

youtube.com › watch

Python Tutorial: re Module - How to Write and Match Regular Expressions (Regex) - YouTube

53:18

In this Python Programming Tutorial, we will be learning how to read, write, and match regular expressions with the re module. Regular expressions are extrem...

Published October 24, 2017

reddit.com › r/python › regex - an alternative regular expression module that is intended to eventually replace python's re, with behavior more consistent with pcre

r/Python on Reddit: regex - An alternative regular expression module that is intended to eventually replace Python's re, with behavior more consistent with PCRE

December 31, 2015 - It supports all the Unicode property tests that link is to a PCRE syntax reference. The only problem with regex is that the doc is scattered as hell: you start with the Python re doc but then fumble through the pypi page doc which is "organized", if you can call it that, by the sequence in which issues were raised, and then finally go to a

Regular-Expressions.info

regular-expressions.info › python.html

Python re Module - Use Regular Expressions with Python - Regex Support

Python is a high level open source scripting language. Python’s built-in “re” module provides excellent support for regular expressions, with a modern and complete regex flavor. Two significant missing features, atomic grouping and possessive quantifiers, were added in Python 3.11.

GeeksforGeeks

geeksforgeeks.org › python › regular-expression-python-examples

Python RegEx - GeeksforGeeks

August 14, 2025 - It can detect the presence or absence of a text by matching it with a particular pattern and also can split a pattern into one or more sub-patterns. Python has a built-in module named "re" that is used for regular expressions in Python.

reddit.com › r/python › i've created a python module for constructing regex patterns in a more computer programming-familiar way, so you don't have to re-learn regex each time you use it!

r/Python on Reddit: I've created a Python module for constructing Regex patterns in a more computer programming-familiar way, so you don't have to re-learn Regex each time you use it!

July 20, 2022

There does not yet exist a separate documentation page with specific instructions on how to use each class of the module, though all classes are sufficiently documented. There also exists a small example within the repo's README file to get the hang of it.

Here is the link to the repo: https://github.com/manoss96/pregex

Any feedback is welcome!

UPDATE: Thank you all for your comments and feedback, I hope this package helps you get the job done faster! I've gotten a lot of comments mentioning that having to import everything separately is annoying, and I can understand that. However, I still think that all classes should remain separated into different modules, as each module expresses a different functionality. At the same time, I don't think that importing everything all at once is a good thing, so I tried a different approach. All the modules that you'll need are now imported within the package's "__init__.py" by using a short alias for each module. For instance, "quantifiers.py" is imported as "qu". Thus, you can simply write "from pregex import *" at the top of your .py script, and then just use these aliases. Just be careful, this can only be done in pregex version >=1.0.2.

Top answer

1 of 1

Regex is a language. It doesn't look like much of one, but it is.

Like every language, it is not written in anything. A language is a set of mathematical rules and restrictions. If we can say that it is written in anything at all, we would probably say that it is written in English. (Or in a specific English-based jargon for specifying languages, enriched with graphical and mathematical tools for expressing language rules.)

A specific implementation of the language (regex) is of course written in a specific language, but the language itself isn't.

As an example, the implementation of the re module that ships as part of the CPython implementation of the Python programming language is called the Secret Labs' Regular Expression Engine (sre), and is written in Python and C. More precisely, it consists of a compiler written in Python that compiles re regexes into byte code for a virtual machine, and a VM written in C that interprets that byte code.

The implementation that ships with Jython uses the same Python code and byte code, but the byte code VM is written in Java, not C.

At first glance, IronPython looks similar: compiler in Python and VM in C#. However, if you look closer, the VM is actually a non-functional stub, and the real implementation is in C# and is based on System.Text.RegularExpressions from the CLI.

PyPy follows the standard pattern again: compiler in Python and the VM in RPython.

And of course other languages have completely different flavors of regex. E.g. Ruby's Regexp is quite different from Python's re. And in Ruby, we have similar diversity: YARV uses an engine called Onigmo to implement its Regexp class whereas JRuby uses joni.

Quora

quora.com › How-do-I-download-and-install-the-RE-module-for-Python-3-6-on-Windows-7

How to download and install the RE module for Python 3.6 on Windows 7 - Quora

Answer (1 of 3): The re module is spelled all lowercase. Python 3.6 already has re. If you set the environment variable PYTHONCASEOK to 1 in Windows you can import RE and Python will import the re module.

Google

developers.google.com › google for education › python › python regular expressions

Python Regular Expressions | Python Education | Google for Developers

This page gives a basic introduction to regular expressions themselves sufficient for our Python exercises and shows how regular expressions work in Python. The Python "re" module provides regular expression support.

Python documentation

docs.python.org › 3 › howto › regex.html

Regular Expression HOWTO — Python 3.14.3 documentation

This document is an introductory tutorial to using regular expressions in Python with the re module. It provides a gentler introduction than the corresponding section in the Library Reference.

Simplilearn

simplilearn.com › home › resources › software development › your ultimate python tutorial for beginners › python regular expression (regex)

Python Regular Expression (RegEX)

September 3, 2024 - Python has many powerful features, and Python regular expression (RegEx) is one of those used for data cleansing. So, read on to learn more!

Python.org

discuss.python.org › python help

Regular Expressions (RE) Module - Search and Match Comparison - Python Help - Discussions on Python.org

October 26, 2023 - Hello, I have a question regarding the regular expression compile. I created a code snippet to compare the different search and match results using different strings and using different patterns. Here is the test code snippet:

import re

s1 = 'bob has a birthday on Feb 25th'
s2 = 'sara has a birthday on March 3rd'
s3 = '12eup 586iu'
s4 = '0turt'

# '\w\w\w \d\d\w\w'
bday1_re = re.compile('\w+ \d+\w+')  # Also tried: '\w+ \d+\w+'
bday2_re = re.comp...

Medium

medium.com › @ebojacky › the-very-bare-minimum-essentials-for-regular-expressions-in-python-54e78c10b649

The Very Bare Minimum Essentials for Regular Expressions in Python | by Ebo Jackson | Medium

June 2, 2025 - Regular expressions (regex) in Python are a powerful tool for pattern matching, text manipulation, and data extraction. The re module provides a robust framework for working with regex, enabling developers to handle tasks like validation, parsing, ...

GitHub

github.com › mrabarnett › mrab-regex

GitHub - mrabarnett/mrab-regex

It expects that all codepoints ... UTF-8. The regex module releases the GIL during matching on instances of the built-in (immutable) string classes, enabling other Python threads to run concurrently....

Starred by 576 users

Forked by 70 users

Languages C 85.8% | Python 14.2%