I am able to reproduce the observed behavior between CPython 3.11.2 and CPython 3.12.0rc2 on Debian Linux 6.1.0-6 using an Intel i5-9600KF CPU. I used a low-level profiling approach to find the differences. Put shortly: your benchmark is very specific, and CPython 3.12 is less optimized for this specific case. CPython 3.12 seems to manage object allocations, and more specifically range, a bit differently. CPython 3.12 appears to create a new object from the constant 2 for every iteration of the loop, as opposed to CPython 3.11. Moreover, the main evaluation function performs an indirect function pointer call which is particularly slow in this case. Anyway, you should not use (C)Python for such a use case (this is stated in the CPython docs).


Under the hood

Here are the results I get (pretty stable across multiple runs):

3.11: 2.026395082473755
3.12: 2.4122846126556396

Thus, CPython 3.12 is roughly 20% slower than CPython 3.11 on my machine.
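For reference, here is the kind of micro-benchmark being measured. The exact original code is not shown in this answer, so this sketch is inferred from the corrected version further down: the same loop, but with the literal constant 2 inline.

```python
import time

def calc():
    # Multiply the loop index by the literal constant 2 on every iteration;
    # this is the pattern CPython 3.12 handles less efficiently.
    for i in range(100_000_000):
        x = i * 2

t = time.time()
calc()
elapsed = time.time() - t
print(elapsed)
```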

Profiling results indicate that half the overhead comes from an indirect function pointer call in the main evaluation function of CPython 3.12 which was not present in CPython 3.11. Such a call is expensive on most modern processors. Here is the assembly code of the hot part:

       β”‚      ↓ je         4b8
  1,28 β”‚        mov        (%rdi),%rax
  0,33 β”‚        test       %eax,%eax
  0,61 β”‚      ↓ js         4b8
  0,02 β”‚        sub        $0x1,%rax
  2,80 β”‚        mov        %rax,(%rdi)
       β”‚      ↓ jne        4b8
  0,08 β”‚        mov        0x8(%rdi),%rax
 16,28 β”‚      β†’ call       *0x30(%rax)            <---------------- HERE
       β”‚        nop
  1,53 β”‚ 4b8:   mov        (%rsp),%rax
       β”‚        lea        0x4(%rax),%rcx
       β”‚        movzbl     0x3(%rax),%eax
  0,06 β”‚        mov        0x48(%r13,%rax,8),%rdx
  1,82 β”‚        mov        (%rdx),%eax
  0,04 β”‚        add        $0x1,%eax
       β”‚      ↓ je         8800

While the assembly code of the same function in CPython 3.11 is similar, it does not contain such an expensive call. Still, there are already many similar indirect function calls in CPython 3.11. My hypothesis is that such a call is more expensive in CPython 3.12 because it is less predictable by the hardware branch prediction unit (possibly because the same instruction calls multiple functions). For more information about that, please read this great post. I cannot say much more about this part since the assembly code is really HUGE (and it turns out the C code is also pretty big).

The rest of the overhead seems to come from the way objects, and more specifically constants, are managed in CPython 3.12. Indeed, in CPython 3.11, PyObject_Free calls are slow (because all the time is spent creating/deleting objects), while in CPython 3.12 such calls are not even visible in the profiler; instead, PyLong_FromLong is quite slow (and not visible in CPython 3.11). The rest of the (many other) functions only take less than 25~30% and look similar in the two versions. Based on that, we can conclude that CPython 3.12 creates a new object from the constant 2 for each iteration of the loop, as opposed to CPython 3.11. This is clearly not efficient in this case (though one should keep in mind that CPython is an interpreter and not a compiler, so it is not surprising that it does not perform such optimizations). There is a simple way to check that: store 2 in a variable before the loop and use this variable in the loop. Here is the corrected code:

import time

def calc():
    const = 2
    for i in range(100_000_000):
        x = i * const

t = time.time()
calc()
print(time.time() - t)
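At the bytecode level, the two variants differ only in how the multiplier is loaded: LOAD_CONST for the literal 2 versus LOAD_FAST for the local variable (newer CPython versions may use different opcode names, e.g. LOAD_SMALL_INT or fused LOAD_FAST variants). dis cannot show the interpreter-internal allocation behavior discussed above, but it does show the opcode the fix changes. A minimal sketch with illustrative function names:

```python
import dis

def with_literal(i):
    return i * 2        # the multiplier is a code-object constant

def with_variable(i, const=2):
    return i * const    # the multiplier is a local variable

# Collect the opcode names used by each variant.
lit_ops = [ins.opname for ins in dis.Bytecode(with_literal)]
var_ops = [ins.opname for ins in dis.Bytecode(with_variable)]
print("literal :", lit_ops)
print("variable:", var_ops)
```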

Here are the timings of the corrected code:

3.11: 2.045902967453003
3.12: 2.2230796813964844

The CPython 3.12 version is now substantially faster than before, while the other version is almost unaffected by the modification. At first glance, this tends to confirm the last hypothesis. That being said, the profiler still reports many calls to PyLong_FromLong in the modified code! It turns out this change removed the issue related to the indirect function pointer call discussed at the beginning of this section!
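As a side note on why these allocations matter: CPython caches small ints in the range -5..256 (an implementation detail, not a language guarantee), so only results outside that range force a fresh allocation like the PyLong_FromLong calls seen in the profile. A quick way to observe this, using an illustrative helper function so the multiplication happens at run time rather than being constant-folded by the compiler:

```python
def mul(x):
    # Computed at run time, so the result object comes from the allocator
    # (or the small-int cache), not from compile-time constant folding.
    return x * 2

# Results in -5..256 are served from CPython's small-int cache...
print(mul(100) is mul(100))    # True on CPython: 200 is a cached object
# ...while larger results are freshly allocated objects each time.
print(mul(1000) is mul(1000))  # False on CPython: two distinct 2000 objects
```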

My hypothesis is that the PyLong_FromLong calls come from a different way of managing the objects generated by range (i.e. i). The following code tends to confirm that (note the code requires ~4 GiB of RAM due to the list, so it should not be used in production but only for testing purposes):

import time

def calc():
    const = 2
    TMP = list(range(100_000_000))
    t = time.time()
    for i in TMP:
        x = i * const
    print(time.time() - t)

calc()

Here are results on my machine:

3.11: 1.6515681743621826
3.12: 1.7162528038024902

The gap is smaller than before, and the loop timings are lower since all the objects are pre-computed in the list beforehand. Profiling results confirm PyLong_FromLong is not called in the timed loop. Thus, range is slower in this case in CPython 3.12.
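The same comparison can be reproduced more conveniently with timeit. This sketch uses a reduced size (1_000_000 instead of 100_000_000) so it runs quickly; the absolute numbers are smaller, but the range-versus-list distinction is preserved:

```python
import timeit

N = 1_000_000            # reduced from 100_000_000 so the comparison runs quickly
ITEMS = list(range(N))   # pre-built list: the int objects already exist

def over_range(const=2):
    # The int objects bound to i must be produced by the range iterator.
    for i in range(N):
        x = i * const

def over_list(const=2):
    # The int objects bound to i are only fetched from the list.
    for i in ITEMS:
        x = i * const

t_range = min(timeit.repeat(over_range, number=1, repeat=5))
t_list = min(timeit.repeat(over_list, number=1, repeat=5))
print(f"range: {t_range:.4f} s, list: {t_list:.4f} s")
```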

The rest of the overhead is small (<4%). Such a performance gap can come from compiler optimizations or even very tiny changes in the CPython source code. For example, simple things like the addresses of conditional jumps can significantly impact performance on many Intel CPUs (see: the JCC erratum). Tiny details like this matter and compilers are not perfect. This is why such a variation in performance is common and rather expected, so it is not worth investigating.

By the way, if you care about performance, then please use Cython or PyPy for compute-heavy code like this.

Answer from JΓ©rΓ΄me Richard on Stack Overflow

