I've had a lot of experience running a compiled regex 1000s of times versus compiling on-the-fly, and have not noticed any perceivable difference. Obviously, this is anecdotal, and certainly not a great argument against compiling, but I've found the difference to be negligible.
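You can test that observation on your own machine with a rough `timeit` comparison like the sketch below. The pattern, input text and helper names (`module_level`, `precompiled`, `compare`) are arbitrary placeholders. Timings depend on the Python version and hardware, so treat any gap as indicative only.

```python
import re
import timeit

# Placeholder pattern and input, chosen only for illustration.
PATTERN = r"(\d{4})-(\d{2})-(\d{2})"
TEXT = "released on 2023-05-30"
COMPILED = re.compile(PATTERN)

def module_level():
    # Passes the pattern string each time; re looks it up in its internal cache.
    return re.search(PATTERN, TEXT)

def precompiled():
    # Calls the method on an already-compiled Pattern object.
    return COMPILED.search(TEXT)

def compare(number=100_000):
    """Return (seconds for module-level calls, seconds for precompiled calls)."""
    t_module = timeit.timeit(module_level, number=number)
    t_compiled = timeit.timeit(precompiled, number=number)
    return t_module, t_compiled

if __name__ == "__main__":
    # Both styles must find the same match before timing means anything.
    assert module_level().groups() == precompiled().groups()
    t_module, t_compiled = compare()
    print(f"re.search(pattern_string, ...): {t_module:.3f}s")
    print(f"compiled.search(...):           {t_compiled:.3f}s")
```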

EDIT: After a quick glance at the actual Python 2.5 library code, I see that Python internally compiles AND CACHES regexes whenever you use them anyway (including calls to re.match()), so you're really only changing WHEN the regex gets compiled, and shouldn't be saving much time at all - only the time it takes to check the cache (a key lookup on an internal dict type).

From module re.py (comments are mine):

def match(pattern, string, flags=0):
    return _compile(pattern, flags).match(string)

def _compile(*key):

    # Does cache check at top of function
    cachekey = (type(key[0]),) + key
    p = _cache.get(cachekey)
    if p is not None: return p

    # ...
    # Does actual compilation on cache miss
    # ...

    # Caches compiled regex
    if len(_cache) >= _MAXCACHE:
        _cache.clear()
    _cache[cachekey] = p
    return p

I still often pre-compile regular expressions, but only to bind them to a nice, reusable name, not for any expected performance gain.

Answer from Kenan Banks on Stack Overflow
๐ŸŒ
Python documentation
docs.python.org โ€บ 3 โ€บ library โ€บ re.html
re โ€” Regular expression operations
4 days ago - Usually patterns will be expressed in Python code using this raw string notation. It is important to note that most regular expression operations are available as module-level functions and methods on compiled regular expressions. The functions are shortcuts that donโ€™t require you to compile a regex object first, but miss some fine-tuning parameters.
Another answer on the same Stack Overflow question:

For me, the biggest benefit to re.compile is being able to separate definition of the regex from its use.

Even a simple expression such as 0|[1-9][0-9]* (integer in base 10 without leading zeros) can be complex enough that you'd rather not have to retype it, check if you made any typos, and later have to recheck if there are typos when you start debugging. Plus, it's nicer to use a variable name such as num or num_b10 than 0|[1-9][0-9]*.

It's certainly possible to store strings and pass them to re.match; however, that's less readable:

num = "..."
# then, much later:
m = re.match(num, input)

Versus compiling:

num = re.compile("...")
# then, much later:
m = num.match(input)

Though it is fairly close, the last line of the second feels more natural and simpler when used repeatedly.
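Here is a sketch of the named-pattern style using the integer pattern from this answer. `NUM_B10` and `is_base10_int` are hypothetical names. The sketch uses `fullmatch` because `match` only anchors at the start of the string, so it would accept a valid prefix of an invalid number.

```python
import re

# Base-10 integer without leading zeros, bound once to a descriptive name.
NUM_B10 = re.compile(r"0|[1-9][0-9]*")

def is_base10_int(text):
    """True if the whole string is a base-10 integer without leading zeros."""
    return NUM_B10.fullmatch(text) is not None

print(is_base10_int("0"))     # True
print(is_base10_int("120"))   # True
print(is_base10_int("007"))   # False
# match() only anchors at the start, so it happily returns the prefix "0":
print(NUM_B10.match("0123").group())  # 0
```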

Discussions

Do you see any reason to use re.compile() when you are working with regular expressions? (r/learnpython, October 12, 2024)
There is supposed to be a speedup if you compile a heavily used regexp. You may not see any difference if you just use a non-compiled regexp a few times. If in doubt, compare execution times for the compiled and non-compiled cases in your particular situation.
Can regex be compiled into efficient machine code? (Programming Language Design and Implementation Stack Exchange, May 26, 2023)
The compilation happens at runtime, but each regex only needs to be compiled once and can be tested many times. For example, in Java you call Pattern.compile, in Python it's re.compile; but even APIs which accept regexes as strings often cache the compiled form of the regex so that it only ...
`re.compile` is compiling a invalid regex (GitHub, May 30, 2023)
Bug report: re.compile is compiling a invalid regex. Your environment: CPython versions tested on: 3.11.3; Operating system and architecture: Windows 10 x64. re.compile compiles the following expressio...
[Python] How to speed up thousands of re.subs() (r/regex, July 17, 2021)
Since you mentioned you are using complex regular expressions, it might be worth looking into optimizing some of them. Here are some resources that discuss common pitfalls that can cause regular expressions to take much longer than necessary to find matches: https://www.regular-expressions.info/catastrophic.html https://www.regular-expressions.info/toolong.html
re.compile() in Python - GeeksforGeeks (July 23, 2025)
The re.compile() method in Python is used to compile a regular expression pattern into a regex object.
Python Compile Regex Pattern using re.compile() (PYnative, April 2, 2021)
Python's re.compile() method is used to compile a regular expression pattern provided as a string into a regex pattern object (re.Pattern).
Regular Expression HOWTO — Python 3.14.3 documentation (Python documentation)
If the regex pattern is expressed in bytes, \w is equivalent to the class [a-zA-Z0-9_]. If the regex pattern is a string, \w will match all the characters marked as letters in the Unicode database provided by the unicodedata module. You can use the more restricted definition of \w in a string pattern by supplying the re.ASCII flag when compiling the regular expression.
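The three cases in that passage can be checked directly. The sketch compiles `\w+` as a Unicode str pattern, as a str pattern with `re.ASCII`, and as a bytes pattern, then matches each against "café". The variable names are illustrative.

```python
import re

WORD = "caf\u00e9"  # "café" with a precomposed é (U+00E9)

UNICODE_WORD = re.compile(r"\w+")          # str pattern: Unicode-aware \w
ASCII_WORD = re.compile(r"\w+", re.ASCII)  # str pattern restricted to [a-zA-Z0-9_]
BYTES_WORD = re.compile(rb"\w+")           # bytes pattern: always the ASCII class

print(UNICODE_WORD.match(WORD).group())                # café
print(ASCII_WORD.match(WORD).group())                  # caf
# In UTF-8, é is two bytes, neither of which is in [a-zA-Z0-9_].
print(BYTES_WORD.match(WORD.encode("utf-8")).group())  # b'caf'
```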
regex101: build, test, and debug regex (regex101.com)
Regular expression tester with syntax highlighting, explanation, cheat sheet for PHP/PCRE, Python, Go, JavaScript, Java, C#/.NET, Rust.
The benefits of using compiled regex in Python and Java | TheServerSide (December 28, 2023)
Compiled regular expressions, on the other hand, were breathtaking. They managed to process the 100,000 lines in just 52 ms. That's over three times faster than string primitives. Using the same methodology of 100,000 randomly generated input lines in Python showed some differences to Java.
Regular Expression in Python. Regex | by Anurag | Medium (January 11, 2023)
You can use the re module to work with regular expressions in Python. ... You first need to import the re module to use regular expressions in Python. Then you can use the re.compile() function to create a regular expression object.
re.compile | Interactive Chaos (May 3, 2021)
re.compile(pattern, flags=0): The re.compile function creates a regular expression object by compiling a regular expression pattern, which can be used as a matching pattern in the re.match, re.search, etc. functions.
Python RegEx (W3Schools)
RegEx can be used to check if a string contains the specified search pattern. Python has a built-in package called re, which can be used to work with Regular Expressions.
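Top answer to "Can regex be compiled into efficient machine code?" on Programming Language Design and Implementation Stack Exchange: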

Most languages that have regex have a regex parsing library that interprets the regex at runtime and matches them to strings.

This is mostly right, but "interpret" isn't really accurate. Regexes in most mainstream languages are compiled to a form that is efficient to check. The compilation happens at runtime, but each regex only needs to be compiled once and can be tested many times. For example, in Java you call Pattern.compile, in Python it's re.compile; but even APIs that accept regexes as strings often cache the compiled form of the regex so that it only needs to be compiled once for the whole run of the program.

That said, the compiled form of a regex is still generally not quite as fast as a direct compilation to machine code would be.

What would it take for a language to 'compile' regex at compile time to be as efficient as writing an equivalent string matching function by hand?

The classical way to compile a regular expression is to convert it to a deterministic finite automaton (DFA), i.e. a state machine. Compiling a regular expression to a DFA involves a few steps, but here's a typical procedure:

  • First build a non-deterministic finite automaton (NFA) using Thompson's construction,
  • Then convert the NFA to a DFA using the powerset construction,
  • Then, optionally, convert the DFA into a minimal equivalent DFA.

The resulting DFA can then evaluate the regular expression on a string of length n in O(n) time with a low coefficient (potentially just a single array lookup per character in the input string), regardless of how complicated the regular expression is. However, the DFA itself may take up a lot more memory for more complicated regular expressions.
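The procedure above can be sketched as a toy in Python. It builds an NFA with Thompson's construction (step 1), converts it with the powerset construction (step 2), and skips the optional minimization. It assumes a hand-built syntax tree encoded as tuples (`("char", c)`, `("cat", x, y)`, `("alt", x, y)`, `("star", x)`) instead of a parser. That encoding and all names are hypothetical. Matching costs one dict lookup per input character.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical AST encoding: ("char", c), ("cat", x, y), ("alt", x, y), ("star", x)

class NFA:
    """NFA with epsilon edges; states are consecutive integers."""
    def __init__(self) -> None:
        self.eps: List[Set[int]] = []               # epsilon successors per state
        self.moves: List[Dict[str, Set[int]]] = []  # labelled successors per state

    def new_state(self) -> int:
        self.eps.append(set())
        self.moves.append({})
        return len(self.eps) - 1

def thompson(node: tuple, nfa: NFA) -> Tuple[int, int]:
    """Step 1, Thompson's construction: return (start, accept) of a fragment."""
    kind = node[0]
    if kind == "cat":
        s1, t1 = thompson(node[1], nfa)
        s2, t2 = thompson(node[2], nfa)
        nfa.eps[t1].add(s2)
        return s1, t2
    s, t = nfa.new_state(), nfa.new_state()
    if kind == "char":
        nfa.moves[s].setdefault(node[1], set()).add(t)
    elif kind == "alt":
        for part in node[1:]:
            ps, pt = thompson(part, nfa)
            nfa.eps[s].add(ps)
            nfa.eps[pt].add(t)
    elif kind == "star":
        ps, pt = thompson(node[1], nfa)
        nfa.eps[s] |= {ps, t}   # enter the loop or skip it entirely
        nfa.eps[pt] |= {ps, t}  # go round again or leave
    else:
        raise ValueError(f"unknown node kind {kind!r}")
    return s, t

def closure(nfa: NFA, states: Set[int]) -> FrozenSet[int]:
    """All NFA states reachable from `states` via epsilon edges."""
    seen, stack = set(states), list(states)
    while stack:
        for n in nfa.eps[stack.pop()]:
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return frozenset(seen)

def to_dfa(node: tuple) -> Tuple[List[Dict[str, int]], Set[int]]:
    """Step 2, powerset construction. DFA state 0 is the start state."""
    nfa = NFA()
    start, accept = thompson(node, nfa)
    alphabet = {c for m in nfa.moves for c in m}
    first = closure(nfa, {start})
    ids: Dict[FrozenSet[int], int] = {first: 0}
    table: List[Dict[str, int]] = [{}]
    accepting: Set[int] = set()
    work = [first]
    while work:
        subset = work.pop()
        i = ids[subset]
        if accept in subset:
            accepting.add(i)
        for c in alphabet:
            targets = {t for s in subset for t in nfa.moves[s].get(c, ())}
            if not targets:
                continue  # no edge: the DFA rejects on this character
            nxt = closure(nfa, targets)
            if nxt not in ids:
                ids[nxt] = len(table)
                table.append({})
                work.append(nxt)
            table[i][c] = ids[nxt]
    return table, accepting

def dfa_fullmatch(dfa: Tuple[List[Dict[str, int]], Set[int]], text: str) -> bool:
    """Run the DFA over the whole string: one dict lookup per character."""
    table, accepting = dfa
    state = 0
    for ch in text:
        nxt = table[state].get(ch)
        if nxt is None:
            return False
        state = nxt
    return state in accepting

# (a|b)*abb, built by hand rather than parsed.
AB = ("alt", ("char", "a"), ("char", "b"))
ABB_DFA = to_dfa(("cat", ("cat", ("cat", ("star", AB), ("char", "a")), ("char", "b")), ("char", "b")))

if __name__ == "__main__":
    for s in ["abb", "aabb", "babb", "ab", "abba", ""]:
        print(repr(s), dfa_fullmatch(ABB_DFA, s))
```

Nothing here handles backreferences, which is one reason real engines don't work this way.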

Why do most languages not do this?

Because the classical approach has two really significant downsides that make it inapplicable to many real regexes. Firstly, it only works for regexes that are truly regular expressions in the formal sense, i.e. they recognise a regular language. But many features of modern regex engines, particularly backreferences, allow regexes which are not true regular expressions. Additionally, in the worst case the resulting DFA is exponentially large in the length of the regex, so while the regex is very efficient to execute, it is not at all efficient in terms of code size.
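A small illustration of the backreference point, using Python's `re`. The pattern `(a+)b\1` accepts strings of the form a^n b a^n, with the same number of a's on both sides. That language is not regular, so no DFA can recognise it, yet a backtracking engine handles it. The name `SAME_COUNT` is illustrative.

```python
import re

# (a+)b\1: the group's exact text must repeat after the b.
SAME_COUNT = re.compile(r"(a+)b\1")

for s in ["aba", "aaabaaa", "aabaaa", "abaa"]:
    print(s, SAME_COUNT.fullmatch(s) is not None)
# aba True, aaabaaa True, aabaaa False, abaa False
```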

That's not to say that there aren't other potential approaches to compiling regular expressions into machine code, without going via NFAs and DFAs. But

  • Those other approaches aren't nearly as well-known;
  • The current state of the art (i.e. compiling at runtime to a form which enables efficient execution) can be pretty efficient regardless, so there's probably not that much room for improvement;
  • Compiling regexes to machine code at runtime requires some way of hooking that machine code into the program while it's executing, and most language implementations don't have a mechanism for this;
  • On the other hand, compiling regexes to machine code at compile-time would require the regex implementation to be part of the compiler, rather than part of the standard library, and wouldn't support dynamic construction of regexes at runtime.
Another answer:

Can regex be compiled into efficient machine code?

In general and absent of any context about a specific language? Yes, it can. The deeper question is really about the context (design goals) and costs (working with tradeoffs).

At the library-level, one example is Hana Dusikova's compile-time regular expressions in C++. See also its WG21 proposal paper, GitHub repo, and website with links to various conference talks. It leverages C++'s compile-time facilities. I think it hits close to what you're talking about. From its proposal paper:

The current std::regex design and implementation are slow, mostly because the RE pattern is parsed and compiled at runtime. Users often don't need a runtime RE parser engine as the pattern is known during compilation in many common use cases. I think this breaks C++'s promise of "don't pay for what you don't use." If the RE is known at compile time, the pattern should be checked during the compilation. The design of std::regex doesn't allow for this as the RE input is a runtime string and syntax errors are reported as exceptions.
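Python has no compile-time regex facility. The closest everyday analogue is validating patterns as early as possible: compile them when the module is imported, so a typo fails at load time instead of on first use. This is only a sketch. The `PATTERNS` table and the `compile_all` helper are hypothetical. The `pos` and `msg` attributes used here are standard on `re.error`.

```python
import re

# Hypothetical table of patterns a module might define up front.
PATTERNS = {
    "int": r"0|[1-9][0-9]*",
    "word": r"\w+",
}

def compile_all(patterns):
    """Compile every pattern now, naming the first one that fails."""
    compiled = {}
    for name, source in patterns.items():
        try:
            compiled[name] = re.compile(source)
        except re.error as exc:
            raise ValueError(
                f"pattern {name!r} is invalid at position {exc.pos}: {exc.msg}"
            ) from exc
    return compiled

# Runs at import time, so a broken pattern stops the module from loading.
COMPILED = compile_all(PATTERNS)
```

Unlike the C++ approach, the check still happens at runtime. It just happens once, before any matching code runs.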

Notice a couple of things:

  • The proposal addresses a core design goal of the language.
  • The language had features that were able to support a non-standard library implementation.

What would it take for a language to 'compile' regex at compile time to be as efficient as writing an equivalent string matching function by hand?

The answer is not so black and white. There can be space and time efficiency tradeoffs. Now that you have contextual information at each site that can be used for specialized code generation, how much code do you inline? What do you inline and what don't you inline? What do you extract to common procedures, and what impact does that have on costs from function calls? Inlining isn't always best; there can be cases where more code sharing results in better instruction-cache usage. How much code gets generated? In what way does the runtime data cost relate to the inputs? How do all those costs compare to the costs of a generalized regex engine? At what point does generating optimized code for known-at-compile-time regexes become more costly in code size than just having a regex engine in the runtime environment?

If implemented at the library level (assuming you care), you'd need a way for the user to communicate what they want to optimize for, i.e. which tradeoff they want. The language would then need to provide facilities powerful enough for code living at the library level to estimate the space and time costs. Even then, if the language specification and language implementations are more separate, you'd hit some limit on how accurate those estimates can be with respect to what the implementation actually does (e.g. what it compiles).

Why do most languages not do this?

Not all languages prioritize efficiency of codegen enough as a design goal to want this. That's a perfectly valid design choice. You really can't be everything. In fact, the general trend in languages over time seems to be to move away from the hardware, losing some of its benefits, toward higher-level, more abstract languages and runtimes.

And a lot of runtime models are not really geared towards doing heavy compile-time optimization / specialization. For example, Java bytecode has a limited set of supported instructions, and that can further limit what kinds of code you can generate. At that point, it can become a conversation about adding deeper features to the language or its components (which has its own costs), or just shrugging your shoulders and leaving it to runtime implementations to detect patterns and optimize what they do under the hood (which has its limitations).

Sometimes it's just a matter of nobody caring enough or having enough time to put the work into implementing or specifying it yet. Things don't just magically appear, and a lot of people who work on language design are not doing that as their day-job. Life can get in the way. For languages that aren't designed by just one person, collaborating on things and making decisions as a group has its own challenges as well, such as just meeting at the same time (timezone things), agreeing on the same priorities, dealing with conflicts in effects on different use-cases, etc.

What would be the advantages or disadvantages of having regex as a core language feature as opposed to a library function?

As stated above, the people implementing the language compilers and interpreter optimizers could implement the time and space tradeoff optimizations themselves. The alternative is to have something at the library level attempt those optimizations within its limitations, without that information ever propagating to the compilers and interpreter optimizers.

RegExr: Learn, Build, & Test RegEx (regexr.com)
RegExr is an online tool to learn, build, & test Regular Expressions (RegEx / RegExp).
Re.compile in Python - Scaler Topics (March 12, 2024)
The re.compile() in Python is a powerful tool for regex pattern development, allowing you to pre-compile and save patterns for easy reuse. This function improves speed by preventing unnecessary recompilations.
Python RegEx (With Examples) (Programiz)
In this tutorial, you will learn about regular expressions (RegEx), and use Python's re module to work with RegEx (with the help of examples).
Pythex: a Python regular expression editor (pythex.org)
Pythex is a real-time regular expression editor for Python, a quick way to test your regular expressions.
Python RegEx - GeeksforGeeks (August 14, 2025)
The re module in Python provides various functions that help search, match, and manipulate strings using regular expressions. Below are main functions available in the re module: Let's see the working of these RegEx functions with definition and examples:
Regex Generator - Creating regex is easy again! (regex-generator.olafneumann.org)
A tool to generate simple regular expressions from sample text. Enable less experienced developers to create regex smoothly.
`re.compile` is compiling a invalid regex · Issue #105121 · python/cpython (GitHub, May 30, 2023)
Bug report: re.compile is compiling a invalid regex. Your environment: CPython versions tested on: 3.11.3; Operating system and architecture: Windows 10 x64. re.compile compiles the following expression: "[0-9]++" which should be invalid regex.
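Some context on that report: Python 3.11 added possessive quantifiers (`*+`, `++`, `?+`) and atomic groups to `re`. On 3.11 and later, "[0-9]++" therefore parses as a possessive `+` rather than a syntax error, while earlier versions reject it with `re.error`. The sketch checks this on whatever interpreter runs it. `compiles` is a hypothetical helper name.

```python
import re
import sys

def compiles(source):
    """Return True if `source` is accepted by this Python's re module."""
    try:
        re.compile(source)
    except re.error:
        return False
    return True

print(sys.version_info[:2], compiles(r"[0-9]++"))

if sys.version_info >= (3, 11):
    # Greedy + gives back the final digit so the literal 3 can match...
    print(re.fullmatch(r"[0-9]+3", "123") is not None)   # True
    # ...possessive ++ keeps every digit, so the trailing 3 never matches.
    print(re.fullmatch(r"[0-9]++3", "123") is not None)  # False
```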
Python Compile Regex Pattern using re.compile() (TechGeekBuzz)
... The re.compile() method compiles the string pattern to a regular expression object and returns it. ... Now let's write a Python script that demonstrates the typical example case of re.compile() method.