I've had a lot of experience running a compiled regex 1000s of times versus compiling on-the-fly, and have not noticed any perceivable difference. Obviously, this is anecdotal, and certainly not a great argument against compiling, but I've found the difference to be negligible.
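A quick way to sanity-check this yourself is a timing sketch like the one below (the pattern and iteration count are arbitrary choices of mine; exact numbers vary by machine and Python version):

```python
import re
import timeit

pattern = r"\d{3}-\d{4}"
text = "555-1234"

compiled = re.compile(pattern)  # compile once up front

# Pre-compiled pattern object, reused every iteration.
t_compiled = timeit.timeit(lambda: compiled.match(text), number=50_000)

# Module-level function: compiles on first use, then hits re's internal cache.
t_module = timeit.timeit(lambda: re.match(pattern, text), number=50_000)

print(f"pre-compiled: {t_compiled:.3f}s  re.match: {t_module:.3f}s")
```

The two timings are usually close, since the only extra work in the second case is the cache lookup.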
EDIT:
After a quick glance at the actual Python 2.5 library code, I see that Python internally compiles AND CACHES regexes whenever you use them anyway (including calls to re.match()), so you're really only changing WHEN the regex gets compiled, and shouldn't be saving much time at all - only the time it takes to check the cache (a key lookup on an internal dict type).
From module re.py (comments are mine):
def match(pattern, string, flags=0):
    return _compile(pattern, flags).match(string)

def _compile(*key):
    # Does cache check at top of function
    cachekey = (type(key[0]),) + key
    p = _cache.get(cachekey)
    if p is not None:
        return p
    # ...
    # Does actual compilation on cache miss
    # ...
    # Caches compiled regex
    if len(_cache) >= _MAXCACHE:
        _cache.clear()
    _cache[cachekey] = p
    return p
I still often pre-compile regular expressions, but only to bind them to a nice, reusable name, not for any expected performance gain.
Answer from Kenan Banks on Stack Overflow.
For me, the biggest benefit to re.compile is being able to separate definition of the regex from its use.
Even a simple expression such as 0|[1-9][0-9]* (integer in base 10 without leading zeros) can be complex enough that you'd rather not have to retype it, check if you made any typos, and later have to recheck if there are typos when you start debugging. Plus, it's nicer to use a variable name such as num or num_b10 than 0|[1-9][0-9]*.
It's certainly possible to store strings and pass them to re.match; however, that's less readable:
num = "..."
# then, much later:
m = re.match(num, input)
Versus compiling:
num = re.compile("...")
# then, much later:
m = num.match(input)
Though the two are fairly close, the last line of the second version reads more naturally and is simpler when the pattern is used repeatedly.
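Putting those two points together with the integer pattern from earlier (the use of fullmatch here is my addition):

```python
import re

# Base-10 integer without leading zeros, bound to a readable name.
num = re.compile(r"0|[1-9][0-9]*")

# fullmatch() succeeds only if the whole string matches the pattern.
print(num.fullmatch("0") is not None)    # True
print(num.fullmatch("42") is not None)   # True
print(num.fullmatch("007") is not None)  # False: leading zero
```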
Can regex be compiled into efficient machine code? (Programming Language Design and Implementation Stack Exchange)
Most languages that have regex have a regex parsing library that interprets the regex at runtime and matches them to strings.
This is mostly right, but "interpret" isn't really accurate. Regexes in most mainstream languages are compiled to a form which is efficient to check. The compilation happens at runtime, but each regex only needs to be compiled once and can be tested many times. For example, in Java you call Pattern.compile, and in Python it's re.compile; but even APIs which accept regexes as strings often cache the compiled form of the regex, so that it only needs to be compiled once for the whole run of the program.
That said, the compiled form of a regex is still generally not quite as fast as a direct compilation to machine code would be.
what would it take for a language to 'compile' regex at compile time to be as efficient as writing an equivalent string matching function by hand?
The classical way to compile a regular expression is to convert it to a deterministic finite automaton (DFA), i.e. a state machine. Compiling a regular expression to a DFA involves a few steps, but here's a typical procedure:
- First build a non-deterministic finite automaton (NFA) using Thompson's construction,
- Then convert the NFA to a DFA using the powerset construction,
- Then, optionally, convert the DFA into a minimal equivalent DFA.
The resulting DFA can then evaluate the regular expression on a string of length n in O(n) time with a low constant factor (potentially just a single array lookup per character in the input string), regardless of how complicated the regular expression is. However, the DFA itself may take up a lot more memory for more complicated regular expressions.
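To sketch what the end product looks like, here is a hand-derived DFA for the pattern 0|[1-9][0-9]* (a base-10 integer without leading zeros); the state numbering and table encoding are my own illustration, not the output of any particular tool:

```python
# Hand-derived DFA for the regex 0|[1-9][0-9]*.
# States: 0 = start, 1 = saw "0" (accepting),
# 2 = saw a non-zero-leading integer (accepting), 3 = dead state.
DEAD = 3
ACCEPTING = {1, 2}

# Transition table: (state, character class) -> next state.
# Any (state, class) pair not listed goes to the dead state.
TABLE = {
    (0, "zero"): 1,
    (0, "nonzero"): 2,
    (2, "zero"): 2,
    (2, "nonzero"): 2,
}

def classify(ch: str) -> str:
    if ch == "0":
        return "zero"
    if ch in "123456789":
        return "nonzero"
    return "other"

def matches(s: str) -> bool:
    state = 0
    for ch in s:  # one table lookup per input character: O(n) overall
        state = TABLE.get((state, classify(ch)), DEAD)
        if state == DEAD:
            return False  # a DFA never backtracks, so we can bail out early
    return state in ACCEPTING

print(matches("0"), matches("42"), matches("007"))  # True True False
```

Note there is no backtracking anywhere: each input character advances the machine exactly one step.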
Why do most languages not do this?
Because the classical approach has two really significant downsides which make it inapplicable for many real regexes. Firstly, it only works for regexes that are truly regular expressions in the formal sense, i.e. they recognise a regular language. But many features of modern regex engines allow regexes which are not true regular expressions, particularly backreferences. Additionally, in the worst case the resulting DFA is exponentially large in the length of the regex, so while the regex is very efficient to execute, it is not at all efficient in terms of code size.
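For instance, a backreference lets a pattern match "the same word twice", which is provably not a regular language, so no DFA can recognize it (the example regex is mine):

```python
import re

# \1 must match exactly the text that group 1 captured at runtime,
# which no fixed finite automaton can keep track of.
repeated = re.compile(r"^(\w+) \1$")

print(repeated.match("hello hello") is not None)  # True
print(repeated.match("hello world") is not None)  # False
```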
That's not to say that there aren't other potential approaches to compiling regular expressions into machine code, without going via NFAs and DFAs. But
- Those other approaches aren't nearly as well-known;
- The current state of the art (i.e. compiling at runtime to a form which enables efficient execution) can be pretty efficient regardless, so there's probably not that much room for improvement;
- Compiling regexes to machine code at runtime requires some way of hooking that machine code into the program while it's executing, and most language implementations don't have a mechanism for this;
- On the other hand, compiling regexes to machine code at compile-time would require the regex implementation to be part of the compiler, rather than part of the standard library, and wouldn't support dynamic construction of regexes at runtime.
Can regex be compiled into efficient machine code?
In general, and absent any context about a specific language? Yes, it can. The deeper question is really about the context (design goals) and costs (working with tradeoffs).
At the library-level, one example is Hana Dusikova's compile-time regular expressions in C++. See also its WG21 proposal paper, GitHub repo, and website with links to various conference talks. It leverages C++'s compile-time facilities. I think it hits close to what you're talking about. From its proposal paper:
The current std::regex design and implementation are slow, mostly because the RE pattern is parsed and compiled at runtime. Users often don't need a runtime RE parser engine as the pattern is known during compilation in many common use cases. I think this breaks C++'s promise of "don't pay for what you don't use." If the RE is known at compile time, the pattern should be checked during the compilation. The design of std::regex doesn't allow for this as the RE input is a runtime string and syntax errors are reported as exceptions.
Notice a couple of things:
- The proposal addresses a core design goal of the language.
- The language had features powerful enough to support an implementation outside the standard library.
What would it take for a language to 'compile' regex at compile time to be as efficient as writing an equivalent string matching function by hand?
The answer is not so black and white. There can be space and time efficiency tradeoffs. Now that you have contextual information at each site that can be used for specialized code generation, how much code do you inline? What do you inline and what don't you inline? What do you extract to common procedures, and what impact does that have on function-call costs? Inlining isn't always best: there can be cases where more code sharing results in better instruction-cache usage. How much code gets generated? In what way does the runtime data cost relate to the inputs? How do all those costs compare to the costs of a generalized regex engine? At what point does generating optimized code for known-at-compile-time regexes become more costly in code size than just having a regex engine in the runtime environment?
If implemented at the library level, (assuming that you care,) you'd need a way for the user to communicate at the library level what they want to optimize for / what tradeoff they want, and then for the language to provide powerful enough facilities for things living at the library level to estimate the space and time costs, and even at that point, if the language specification and language implementations are more separate, you'd hit some boundary in how accurate those estimates can be with respect to what the implementation actually does (Ex. what it compiles).
Why do most languages not do this?
Not all languages prioritize codegen efficiency as a design goal enough to want this. That's a perfectly valid design choice; you really can't be everything. In fact, the general overarching trend in languages over time seems to be to move away from the hardware, giving up some of its benefits in favor of higher-level, more abstract languages and runtimes.
And a lot of runtime models are not really geared towards doing heavy compile-time optimization / specialization. For example, Java bytecode has a limited set of supported instructions, and that can further limit what kinds of code you can generate. At that point, it can become a conversation about adding deeper features to the language or its components (which has its own costs), or just shrugging your shoulders and leaving it to runtime implementations to detect patterns and optimize what they do under the hood (which has its limitations).
Sometimes it's just a matter of nobody caring enough or having enough time to put the work into implementing or specifying it yet. Things don't just magically appear, and a lot of people who work on language design are not doing that as their day-job. Life can get in the way. For languages that aren't designed by just one person, collaborating on things and making decisions as a group has its own challenges as well, such as just meeting at the same time (timezone things), agreeing on the same priorities, dealing with conflicts in effects on different use-cases, etc.
What would be the advantages or disadvantages of having regex as a core language feature as opposed to a library function?
As stated above, you could have the people implementing the language's compilers / interpreter optimizers implement the time-and-space tradeoff optimizations, instead of having something at the library level try to do those optimizations within its limitations and then be unable to propagate that information to the compilers / interpreter optimizers.