🌐
Phoronix
phoronix.com › review › python-311-performance
Python 3.11 Performance Benchmarks Show Huge Improvement - Phoronix
As shown by benchmarking back to Python 3.8, there isn't normally too much variation in the performance department between CPython releases. But with Python 3.11 it's a big change for increasing the performance and making this de facto Python implementation more competitive to the likes of ...
🌐
Python documentation
docs.python.org › 3 › whatsnew › 3.11.html
What's New In Python 3.11 — Python 3.14.3 documentation
January 30, 2026 - In Python 3.11, the frame struct was reorganized to allow performance optimizations.
Discussions

Python 3.11 is faster than 3.8
    zig: Emulated 600 frames in 0.24s (2521fps)
    rs:  Emulated 600 frames in 0.37s (1626fps)
    cpp: Emulated 600 frames in 0.40s (1508fps)
    nim: Emulated 600 frames in 0.44s (1367fps)
    go:  Emulated 600 frames in 1.75s (342fps)
    php: Emulated 600 frames in 23.74s (25fps)
    py:  Emulated 600 frames in 26.16s ...
🌐 news.ycombinator.com
October 30, 2022
performance - Python 3.11 worse optimized than 3.10? - Stack Overflow
The older version executes in 187ms, Python 3.11 needs about 17000ms. Does 3.10 realize that only the first 5 chars of a are needed, whereas 3.11 executes the whole loop? I confirmed this performance difference on godbolt.
🌐 stackoverflow.com
Why does this specific code run faster in Python 3.11? - Stack Overflow
When I tried to run with multiple python versions I am seeing a drastic performance difference.
    C:\Users\Username\Desktop>py -3.10 benchmark.py
    16.79652149998583
    C:\Users\Username\Desktop>py -3.11 benchmark.py
    10.92280820000451
🌐 stackoverflow.com
Python 3.11 vs 3.10 performance

    12.4 ms
    6.35 ms: 1.96x faster

That's 1.96x as fast. Unless 1x faster means the exact same speed, and 0.5x faster is actually half the speed.

This is one of my biggest pet peeves in benchmarks.
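The arithmetic behind the complaint, as a quick sketch (using the 12.4 ms and 6.35 ms figures quoted above; the published 1.96x presumably comes from the unrounded timings):

```python
# "x as fast" vs "x faster", using the two timings quoted above
# (12.4 ms on 3.10 vs 6.35 ms on 3.11).
old_ms, new_ms = 12.4, 6.35

ratio = old_ms / new_ms   # "1.95x as fast"
faster_by = ratio - 1     # "0.95x faster", i.e. about 95% faster

print(f"{ratio:.2f}x as fast, which is {faster_by:.0%} faster")
# prints: 1.95x as fast, which is 95% faster
```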

🌐 r/programming
July 6, 2022
🌐
Reddit
reddit.com › r/python › how python 3.11 became so fast!!!
r/Python on Reddit: How Python 3.11 became so fast!!!
January 16, 2023 -

With Python 3.11 out, it's making quite some noise in Python circles: it has become almost 2x faster than its predecessor. But what's new in this version of Python?

New data structure: Removing the exception stack from the frame saves a large amount of memory, which is reused (and stays cache-friendly) when allocating newly created Python frame objects.

Specialized adaptive interpreter:

Each instruction is in one of two states:

  • General, with a warm-up counter: when the counter reaches zero, the instruction is specialized. (Until then it performs a general lookup.)

  • Specialized, with a miss counter: when the counter reaches zero, the instruction is de-optimized. (While specialized it looks up particular values or types of values.)

Specialized bytecode: specialization determines how memory is read (the access pattern) when a particular instruction runs. The same data can be reached in multiple ways; specialization optimizes the memory reads for that particular instruction.
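The two-state scheme above can be illustrated with a toy model in plain Python. This is an illustration only, not CPython's actual machinery; the class name, thresholds, and string labels are invented for the sketch:

```python
# Toy model of the adaptive interpreter's two instruction states:
# "general" with a warm-up counter that triggers specialization, and
# "specialized" with a miss counter that triggers de-optimization.
class AdaptiveInstruction:
    WARMUP = 2   # hypothetical thresholds, much simpler than CPython's
    MISSES = 2

    def __init__(self):
        self.state = "general"
        self.counter = self.WARMUP
        self.seen_type = None

    def execute(self, operand):
        if self.state == "general":
            self.counter -= 1
            if self.counter == 0:           # warm-up exhausted: specialize
                self.state = "specialized"
                self.seen_type = type(operand)
                self.counter = self.MISSES
            return f"general: {operand!r}"
        # Specialized path: a cheap type check guards the fast path.
        if type(operand) is self.seen_type:
            return f"specialized({self.seen_type.__name__}): {operand!r}"
        self.counter -= 1
        if self.counter == 0:               # too many misses: de-optimize
            self.state = "general"
            self.counter = self.WARMUP
        return f"miss: {operand!r}"

instr = AdaptiveInstruction()
for op in [1, 2, 3, "a", "b", 4]:
    print(instr.execute(op))
```

After two int operands the instruction specializes on int; two string operands then de-optimize it back to the general state.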

Read the full article here: https://medium.com/aiguys/how-python-3-11-is-becoming-faster-b2455c1bc555

🌐
Towards Data Science
towardsdatascience.com › home › latest › running faster than ever before
Running Faster than Ever Before | Towards Data Science
March 5, 2025 - However, with the ever increasing volumes of data on our hands, wouldn’t it be great to complete computations faster? The upcoming Python 3.11 release is highly anticipated for the expected 10–60% boost in performance in comparison to the ...
🌐
Medium
medium.com › aiguys › how-python-3-11-is-becoming-faster-b2455c1bc555
How Python 3.11 became so fast | AIGuys
November 7, 2022 - Python 3.11 became 2x faster than its predecessor. Architecture change of Python 3.11. Faster Python. Python speed comparison with other languages.
🌐
Kracekumar
kracekumar.com › post › micro-benchmark-python-311
Python 3.11 micro-benchmark · Technical Ramblings
Python 3.11 is faster than Python 3.10 by 2.89%. The median execution times: Python 3.9 - 11.46s, Python 3.10 - 11.35s, Python 3.11 - 11.13s. Since most of the code was making network calls, it's surprising to see a small performance improvement in Python 3.11.
🌐
Trendblog
trendblog.net › home › coding › python 3.11 performance benchmark show huge improvement
Python 3.11 Performance Benchmark Show Huge Improvement - Trendblog.net
December 8, 2025 - As shown by benchmarking back to Python 3.8, there isn’t normally too much variation in the performance department between CPython releases. But with Python 3.11 it’s a big change for increasing the performance and making this de facto Python implementation more competitive to the likes of Pyston and PyPy.
🌐
Andy Pearce
andy-pearce.com › blog › posts › 2022 › Dec › whats-new-in-python-311-performance-improvements
What’s New in Python 3.11 - Performance Improvements
December 16, 2022 - These changes are the work of the Faster CPython project, and they claim the changes in Python 3.11 make it around 25% faster on average, or anything from 10-60% depending on specific use-cases.
🌐
Towards Data Science
towardsdatascience.com › home › latest › python 3.11 is indeed faster than 3.10
Python 3.11 Is Indeed Faster Than 3.10 | Towards Data Science
March 5, 2025 - Python 3.11 took only 21 seconds to sort while the 3.10 counterpart took 39 seconds. An interesting performance challenge is how fast our program reads and writes information on the disk.
🌐
Europython
ep2022.europython.eu › session › how-we-are-making-python-3-11-faster
How we are making Python 3.11 faster - Mark Shannon - EuroPython 2022 | July 11th-17th 2022 | Dublin Ireland & Remote
August 14, 2022 - Python 3.11 is between 10% and 60% faster than Python 3.10, depending on the application. We have achieved this in a fully portable way by making the interpreter adapt to the program being run, and by streamlining key data structures.
🌐
Jott
jott.live › markdown › py3.11_vs_3.8
Python 3.11 is much faster than 3.8
([Python source code](https://...
    -0.169077842
    python3.11 sim.py 10000000  31.92s user 0.05s system 99% cpu 31.976 total
Python 3.11 took only **31.98 seconds**! That's 3x faster! ...
🌐
Medium
medium.com › @hieutrantrung.it › pythons-performance-revolution-how-3-11-made-speed-a-priority-4cdeee59c349
Python’s Performance Revolution: How 3.11+ Made Speed a Priority | by Trung Hiếu Trần | Medium
January 5, 2026 - ... Result: Up to 10–20% improvement in function-heavy code. ... While not strictly a performance feature, Python 3.11’s improved error messages help developers debug faster — an indirect but real performance win.
🌐
Lewoniewski
en.lewoniewski.info › 2023 › python-3-10-vs-python-3-11-performance-testing
Python 3.10 vs Python 3.11 – performance testing
October 17, 2023 - The result shows that Python 3.11 has the best performance results over Python 3.10 in the following tests: deltablue (1.63x faster), logging_silent (1.43x faster), richards (1.40x faster).
Top answer
1 of 2

TL;DR: you should not use such a loop in performance-critical code; use ''.join instead. The inefficient execution appears to be related to a regression in bytecode generation in CPython 3.11 (and missing optimizations during the evaluation of the binary add operation on Unicode strings).


General guidelines

This is an antipattern. You should not write such code if you want it to be fast. This is described in PEP 8:

Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such).
For example, do not rely on CPython’s efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn’t present at all in implementations that don’t use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.

Indeed, other implementations like PyPy do not perform efficient in-place string concatenation. A new, bigger string is created at every iteration (since strings are immutable, the previous one may still be referenced, and PyPy uses a garbage collector rather than reference counting). This results in a quadratic runtime, as opposed to a linear runtime in CPython (at least in past implementations).
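A minimal benchmark of the two forms from the PEP 8 advice above (the string length and repeat count are arbitrary choices for the sketch):

```python
# Compare += concatenation in a loop against the ''.join form.
import timeit

N = 10_000

def concat_plus():
    a = ""
    for _ in range(N):
        a += "a"        # relies on CPython's fragile in-place optimization
    return a

def concat_join():
    return "".join("a" for _ in range(N))   # portable linear time

assert concat_plus() == concat_join()
print("+=  :", timeit.timeit(concat_plus, number=50))
print("join:", timeit.timeit(concat_join, number=50))
```

On CPython the += form may look competitive thanks to the in-place optimization; on implementations without refcounting (or on a build hit by the 3.11.0 regression discussed below), the gap becomes dramatic.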


Deep Analysis

I can reproduce the problem on Windows 10 between the embedded (64-bit x86-64) version of CPython 3.10.8 and that of 3.11.0:

Timings:
 - CPython 3.10.8:    146.4 ms
 - CPython 3.11.0:  15186.8 ms

It turns out the code has not particularly changed between CPython 3.10 and 3.11 when it comes to Unicode string appending. See for example PyUnicode_Append: 3.10 and 3.11.

A low-level profiling analysis shows that nearly all the time is spent in one unnamed function called from another unnamed function, which is itself called by PyUnicode_Concat (also left unmodified between CPython 3.10.8 and 3.11.0). This slow unnamed function contains a fairly small set of assembly instructions, and nearly all the time is spent in one unique x86-64 instruction: rep movsb byte ptr [rdi], byte ptr [rsi]. This instruction copies a buffer pointed to by the rsi register into a buffer pointed to by the rdi register (the processor copies rcx bytes from the source buffer to the destination buffer, decrementing rcx for each byte until it reaches 0).

This shows that the unnamed function is actually memcpy from the standard MSVC C runtime (i.e. the CRT), which appears to be called by _copy_characters, itself called by _PyUnicode_FastCopyCharacters from PyUnicode_Concat (all these functions still belong to the same file). Again, these CPython functions are left unmodified between CPython 3.10.8 and 3.11.0. The non-negligible time spent in malloc/free (about 0.3 seconds) indicates that many new string objects are created, certainly at least one per iteration, matching the call to PyUnicode_New in the code of PyUnicode_Concat. All of this indicates that a new, bigger string is created and copied on every iteration, as described above.

The call to PyUnicode_Concat is thus certainly the root of the performance issue here, and I think CPython 3.10.8 is faster because it calls PyUnicode_Append instead. Both calls are performed directly by the main interpreter evaluation loop, and this loop is driven by the generated bytecode.

It turns out that the generated bytecode differs between the two versions, and this is the root of the performance issue. Indeed, CPython 3.10 generates an INPLACE_ADD bytecode instruction while CPython 3.11 generates a BINARY_OP instruction. Here is the bytecode for the loop in the two versions:

CPython 3.10 loop:

        >>   28 FOR_ITER                 6 (to 42)
             30 STORE_NAME               4 (_)
  6          32 LOAD_NAME                1 (a)
             34 LOAD_CONST               2 ('a')
             36 INPLACE_ADD                             <----------
             38 STORE_NAME               1 (a)
             40 JUMP_ABSOLUTE           14 (to 28)

CPython 3.11 loop:

        >>   66 FOR_ITER                 7 (to 82)
             68 STORE_NAME               4 (_)
  6          70 LOAD_NAME                1 (a)
             72 LOAD_CONST               2 ('a')
             74 BINARY_OP               13 (+=)         <----------
             78 STORE_NAME               1 (a)
             80 JUMP_BACKWARD            8 (to 66)
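The listings above can be reproduced with the dis module; which opcode appears depends on the interpreter running the snippet:

```python
# Disassemble the loop from the question: CPython 3.10 emits INPLACE_ADD
# for the `a += 'a'` line, while CPython 3.11+ emits BINARY_OP.
import dis

src = "a = ''\nfor _ in range(10000):\n    a += 'a'\n"
ops = {ins.opname for ins in dis.get_instructions(src)}
print(sorted(ops))
```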

This change appears to come from this issue. The code of the main interpreter loop (see ceval.c) differs between the two CPython versions. Here is the code executed by each version:

        // In CPython 3.10.8
        case TARGET(INPLACE_ADD): {
            PyObject *right = POP();
            PyObject *left = TOP();
            PyObject *sum;
            if (PyUnicode_CheckExact(left) && PyUnicode_CheckExact(right)) {
                sum = unicode_concatenate(tstate, left, right, f, next_instr); // <-----
                /* unicode_concatenate consumed the ref to left */
            }
            else {
                sum = PyNumber_InPlaceAdd(left, right);
                Py_DECREF(left);
            }
            Py_DECREF(right);
            SET_TOP(sum);
            if (sum == NULL)
                goto error;
            DISPATCH();
        }

//----------------------------------------------------------------------------

        // In CPython 3.11.0
        TARGET(BINARY_OP_ADD_UNICODE) {
            assert(cframe.use_tracing == 0);
            PyObject *left = SECOND();
            PyObject *right = TOP();
            DEOPT_IF(!PyUnicode_CheckExact(left), BINARY_OP);
            DEOPT_IF(Py_TYPE(right) != Py_TYPE(left), BINARY_OP);
            STAT_INC(BINARY_OP, hit);
            PyObject *res = PyUnicode_Concat(left, right); // <-----
            STACK_SHRINK(1);
            SET_TOP(res);
            _Py_DECREF_SPECIALIZED(left, _PyUnicode_ExactDealloc);
            _Py_DECREF_SPECIALIZED(right, _PyUnicode_ExactDealloc);
            if (TOP() == NULL) {
                goto error;
            }
            JUMPBY(INLINE_CACHE_ENTRIES_BINARY_OP);
            DISPATCH();
        }

Note that unicode_concatenate calls PyUnicode_Append (and does some reference-counting checks beforehand). In the end, CPython 3.10.8 calls PyUnicode_Append, which is fast (in-place), while CPython 3.11.0 calls PyUnicode_Concat, which is slow (out-of-place). It clearly looks like a regression to me.

People in the comments reported having no performance issue on Linux. However, experimental tests show that a BINARY_OP instruction is also generated on Linux, and so far I cannot find any Linux-specific optimization regarding string concatenation. Thus, the difference between the platforms is pretty surprising.


Update: towards a fix

I have opened an issue about this, available here. One should note that putting the code in a function is significantly faster because the variable is local (as pointed out by @Dennis in the comments).
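A minimal sketch of that workaround (loop size and repeat count are arbitrary): the same loop is timed once inside a function and once as top-level statements.

```python
# Moving the loop into a function makes `a` a local variable, which lets
# CPython take the fast in-place append path (single known reference).
import timeit

def build(n):
    a = ""
    for _ in range(n):
        a += "a"
    return a

# Same loop executed as module-level statements via a source string.
module_level = "a = ''\nfor _ in range(10000):\n    a += 'a'\n"

print("in a function  :", timeit.timeit(lambda: build(10000), number=20))
print("at module level:", timeit.timeit(module_level, number=20))
```

On an affected 3.11.0 build the module-level timing blows up while the function stays fast; on unaffected builds the two are close.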


Related posts:

  • How slow is Python's string concatenation vs. str.join?
  • Python string 'join' is faster (?) than '+', but what's wrong here?
  • Python string concatenation in for-loop in-place?
2 of 2

As mentioned in the other answer, this is indeed a regression, but it will NOT be fixed in Python 3.12. From the GitHub issue:

We aren't implementing a register VM, so the performance in 3.12+ will be like 3.11. Moving the iteration into a function will restore the n ln(n) performance.

🌐
Medium
medium.com › homelane-tech › how-python-became-60-faster-in-version-3-11-3d0d110fb87e
How Python Became 60% Faster in Version 3.11! | by Puneet Gupta | homelane-tech | Medium
November 28, 2022 - So, every frame object in python’s internal stack now consumes 160 bytes of memory compared to 240 bytes of the earlier version. There are similar improvements done across the core python codebase which resulted in better performance and speed in ...
🌐
GitHub
github.com › mandiant › capa › issues › 1846
evaluate python 3.8 vs python 3.11 performance for standalone builds · Issue #1846 · mandiant/capa
we've been using the oldest supported python build in order to support the widest array of operating systems (esp. linux with glibc). but, if newer pythons are faster, then maybe it makes sense to also build a standalone binary that makes use of the latest optimizations.
Top answer
1 of 1

There's a big section in the "what's new" page labeled "faster runtime". It looks like the most likely cause of the speedup here is PEP 659, which is a first step towards JIT optimization (perhaps not quite JIT compilation, but definitely JIT optimization).

Particularly, the lookup and call for len and str now bypass a lot of dynamic machinery in the overwhelmingly common case where the built-ins aren't shadowed or overridden. The global and builtin dict lookups to resolve the name get skipped in a fast path, and the underlying C routines for len and str are called directly, instead of going through the general-purpose function call handling.

You wanted source references, so here's one. The str call will get specialized in specialize_class_call:

    if (tp->tp_flags & Py_TPFLAGS_IMMUTABLETYPE) {
        if (nargs == 1 && kwnames == NULL && oparg == 1) {
            if (tp == &PyUnicode_Type) {
                _Py_SET_OPCODE(*instr, PRECALL_NO_KW_STR_1);
                return 0;
            }

where it detects that the call is a call to the str builtin with 1 positional argument and no keywords, and replaces the corresponding PRECALL opcode with PRECALL_NO_KW_STR_1. The handling for the PRECALL_NO_KW_STR_1 opcode in the bytecode evaluation loop looks like this:

        TARGET(PRECALL_NO_KW_STR_1) {
            assert(call_shape.kwnames == NULL);
            assert(cframe.use_tracing == 0);
            assert(oparg == 1);
            DEOPT_IF(is_method(stack_pointer, 1), PRECALL);
            PyObject *callable = PEEK(2);
            DEOPT_IF(callable != (PyObject *)&PyUnicode_Type, PRECALL);
            STAT_INC(PRECALL, hit);
            SKIP_CALL();
            PyObject *arg = TOP();
            PyObject *res = PyObject_Str(arg);
            Py_DECREF(arg);
            Py_DECREF(&PyUnicode_Type);
            STACK_SHRINK(2);
            SET_TOP(res);
            if (res == NULL) {
                goto error;
            }
            CHECK_EVAL_BREAKER();
            DISPATCH();
        }

which consists mostly of a bunch of safety prechecks and reference fiddling wrapped around a call to PyObject_Str, the C routine for calling str on an object.
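One way to watch this machinery from Python, assuming CPython 3.11+ (a sketch; the exact opcode names shown vary between 3.11 and later versions):

```python
# After a function has run enough times to pass the adaptive warm-up,
# dis.dis(..., adaptive=True) (added in 3.11) shows the quickened opcodes
# the specializing interpreter chose at run time.
import dis
import sys

def stringify(x):
    return str(x)

for i in range(64):  # warm up past the adaptive threshold
    stringify(i)

if sys.version_info >= (3, 11):
    dis.dis(stringify, adaptive=True)
else:
    dis.dis(stringify)  # older versions: plain, unspecialized bytecode
```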

Python 3.11 includes many other performance enhancements besides the above, including optimizations to stack frame creation, method lookup, common arithmetic operations, interpreter startup, and more. Most code should run much faster now, barring things like I/O-bound workloads and code that spends most of its time in C library code (like NumPy).