You can use the multiprocessing module. For this case I might use a processing pool:

from multiprocessing import Pool
pool = Pool()
result1 = pool.apply_async(solve1, [A])    # evaluate "solve1(A)" asynchronously
result2 = pool.apply_async(solve2, [B])    # evaluate "solve2(B)" asynchronously
answer1 = result1.get(timeout=10)
answer2 = result2.get(timeout=10)

This starts a pool of worker processes that can do generic work for you. Since we did not pass a processes argument, Pool() starts one worker per CPU on your machine (as reported by os.cpu_count()), and the operating system can run those workers on separate cores truly in parallel.

If you want to map a list to a single function you would do this:

args = [A, B]
results = pool.map(solve1, args)

Don't use threads for CPU-bound work like this: the GIL ensures that only one thread executes Python bytecode at a time, so threads only help when the work is I/O-bound or releases the GIL.

Answer from Matt Williamson on Stack Overflow
Reddit · r/Python on Reddit: Achieve true parallelism in Python 3.12 (April 18, 2024)

Article link: https://rishiraj.me/articles/2024-04/python_subinterpreter_parallelism

I have written an article, which should be helpful to folks at all experience levels, covering various multi-tasking paradigms in computers, and how they apply in CPython, with its unique limitations like the Global Interpreter Lock. Using this knowledge, we look at traditional ways to achieve "true parallelism" (i.e. multiple tasks running at the same time) in Python.

Finally, we build a solution utilizing newer concepts in Python 3.12 to run any arbitrary pure Python code in parallel across multiple threads. All the code used to achieve this, along with the benchmarking code are available in the repository linked in the blog-post.

This is my first time writing a technical post in Python. Any feedback would be really appreciated! 😊

Python · multiprocessing — Process-based parallelism — Python 3.14.3 documentation
February 23, 2026 - Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both POSIX and Windows. The multiprocessing module also introduces the Pool object which offers a convenient means of parallelizing the execution of a function across multiple input values, distributing the input data across processes (data parallelism).
Discussions

parallel processing - How do I parallelize a simple Python loop? - Stack Overflow
You can let joblib use multiple threads instead of multiple processes, but this (or using import threading directly) is only beneficial if the threads spend considerable time on I/O (e.g. read/write to disk, send an HTTP request). For I/O work, the GIL does not block the execution of another thread. Since Python 3.7, as an alternative to threading, you can parallelise work with ...
Converting for loops to run in parallel?
A few thoughts: I think you are maybe confused with starmap - as you have it written I don't think it would work. Each iterable needs to have the same number of elements. If you call pool.starmap(function, ([1,2,3], [4,5,6])) then it's going to call function(1, 4), function(2, 5), function(3, 6). It looks like instead you want all the possible combinations of i and j. Also, when you use multiprocessing, that "with" block needs to be under an "if __name__ == '__main__'" block.
r/learnpython · May 6, 2024
Python 3.12 speedup plan! Includes less RC overhead, compact objects, trace optimized interpreter and more!
I hope one day I learn enough to understand wth you guys are talking about
r/Python · September 20, 2022
What are the differences between Python 3.12 sub-interpreters and multithreading/multiprocessing?
Basically yes. You can think of it as a more performant multiprocessing. Although the Python interface you mention will only be available in 3.13; for now it's strictly for the C API.
r/Python · October 3, 2023
Top answer (1 of 11): quoted in full at the top of this page.
2 of 11

This can be done very elegantly with Ray.

To parallelize your example, you'd need to define your functions with the @ray.remote decorator, and then invoke them with .remote.

import ray

ray.init()

# Define the functions.

@ray.remote
def solve1(a):
    return 1

@ray.remote
def solve2(b):
    return 2

# Start two tasks in the background.
x_id = solve1.remote(0)
y_id = solve2.remote(1)

# Block until the tasks are done and get the results.
x, y = ray.get([x_id, y_id])

There are a number of advantages of this over the multiprocessing module.

  1. The same code will run on a multicore machine as well as a cluster of machines.

  2. Processes share data efficiently through shared memory and zero-copy serialization.

  3. Error messages are propagated nicely.

  4. These function calls can be composed together, e.g.,

     @ray.remote
     def f(x):
         return x + 1
    
     x_id = f.remote(1)
     y_id = f.remote(x_id)
     z_id = f.remote(y_id)
     ray.get(z_id)  # returns 4
    
  5. In addition to invoking functions remotely, classes can be instantiated remotely as actors.

Note that Ray is a framework I've been helping develop.

Daily Dose of DS · 4 Parallel Processing Techniques in Python - by Avi Chawla
February 4, 2026 - These are isolated execution environments within a single process. Each has its own memory space and GIL, enabling safe parallelism with less overhead than multiprocessing. They’re safer than threads because they don’t share global objects ...
Machine Learning Plus · Parallel Processing in Python - A Practical Guide with Examples | ML+
April 20, 2022 - Problem 2: Use Pool.map() to run the following python scripts in parallel. Script names: ‘script1.py’, ‘script2.py’, ‘script3.py’

import os
import multiprocessing as mp

processes = ('script1.py', 'script2.py', 'script3.py')

def run_python(process):
    os.system('python {}'.format(process))

pool = mp.Pool(processes=3)
pool.map(run_python, processes)

Problem 3: Normalize each row of 2d array (list) to vary between 0 and 1.

list_a = [[2, 3, 4, 5], [6, 9, 10, 12], [11, 12, 13, 14], [21, 24, 25, 26]]
Berkeley · Parallel processing in Python
Python provides a variety of functionality for parallelization, including threaded operations (in particular for linear algebra), parallel looping and map statements, and parallelization across multiple machines. For the CPU, this material focuses on Python’s ipyparallel package and JAX, ...
Anyscale · Guide to Parallelizing Python Code
September 2, 2021 - The first approach is to use process-based parallelism. With this approach, it is possible to start several processes at the same time (concurrently). This way, they can concurrently perform calculations.
Towards Data Science · Parallelize your python code to save time on data processing
January 28, 2025 - Now doing this process on each image is independent of each other, i.e., processing one image would not affect any other image in the folder. Hence multiprocessing could help us reduce the total time. Our total time will be reduced by a factor equal to the number of processors we use in parallel.
Top answer (1 of 15)

The CPython implementation currently has a global interpreter lock (GIL) that prevents threads of the same interpreter from concurrently executing Python code. This means CPython threads are useful for concurrent I/O-bound workloads, but usually not for CPU-bound workloads. The naming calc_stuff() indicates that your workload is CPU-bound, so you want to use multiple processes here (which is often the better solution for CPU-bound workloads anyway, regardless of the GIL).

There are two easy ways of creating a process pool in the Python standard library. The first one is the multiprocessing module, which can be used like this:

import multiprocessing

pool = multiprocessing.Pool(4)
out1, out2, out3 = zip(*pool.map(calc_stuff, range(0, 10 * offset, offset)))

Note that this won't work in the interactive interpreter due to the way multiprocessing is implemented.

The second way to create a process pool is concurrent.futures.ProcessPoolExecutor:

import concurrent.futures

with concurrent.futures.ProcessPoolExecutor() as pool:
    out1, out2, out3 = zip(*pool.map(calc_stuff, range(0, 10 * offset, offset)))

This uses the multiprocessing module under the hood, so it behaves identically to the first version.

2 of 15
from joblib import Parallel, delayed
def process(i):
    return i * i
    
results = Parallel(n_jobs=2)(delayed(process)(i) for i in range(10))
print(results)  # prints [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

The above works beautifully on my machine (Ubuntu, package joblib was pre-installed, but can be installed via pip install joblib).

Taken from https://blog.dominodatalab.com/simple-parallelization/


Edit on Mar 31, 2021: On joblib, multiprocessing, threading and asyncio

  • joblib in the above code uses import multiprocessing under the hood (and thus multiple processes, which is typically the best way to run CPU work across cores - because of the GIL)
  • You can let joblib use multiple threads instead of multiple processes, but this (or using import threading directly) is only beneficial if the threads spend considerable time on I/O (e.g. read/write to disk, send an HTTP request). For I/O work, the GIL does not block the execution of another thread
  • Since Python 3.7, as an alternative to threading, you can parallelise work with asyncio, but the same advice applies as for import threading (though, in contrast to the latter, only one thread is used; on the plus side, asyncio has a lot of nice features which are helpful for async programming)
  • Using multiple processes incurs overhead. Think about it: Typically, each process needs to initialise/load everything you need to run your calculation. You need to check yourself if the above code snippet improves your wall time. Here is another one, for which I confirmed that joblib produces better results:
import time
from joblib import Parallel, delayed

def countdown(n):
    while n>0:
        n -= 1
    return n


t = time.time()
for _ in range(20):
    print(countdown(10**7), end=" ")
print(time.time() - t)  
# takes ~10.5 seconds on medium sized Macbook Pro


t = time.time()
results = Parallel(n_jobs=2)(delayed(countdown)(10**7) for _ in range(20))
print(results)
print(time.time() - t)
# takes ~6.3 seconds on medium sized Macbook Pro
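The asyncio alternative mentioned in the bullet list above can be sketched like this; asyncio.sleep stands in for real I/O, and a single thread interleaves the waits (hypothetical example, not from the quoted answer):

```python
import asyncio

async def fetch(i):
    # pretend network/disk I/O; the event loop runs other tasks while we wait
    await asyncio.sleep(0.1)
    return i * i

async def main():
    # run all five "requests" concurrently on one thread; gather keeps order
    return await asyncio.gather(*(fetch(i) for i in range(5)))

results = asyncio.run(main())
print(results)  # [0, 1, 4, 9, 16]
```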
Berkeley · Parallel processing in Python, R, Julia, MATLAB, and C/C++
March 25, 2025 - Note that this is a different notion than a processor that is hyperthreaded. With hyperthreading a single core appears as two cores to the operating system. Threads generally do share objects in memory, thereby allowing us to have a single copy of objects instead of one per thread. One very common use of threading is for linear algebra, using threaded linear alebra packages accessed from Python, R, MATLAB, or C/C++. Parallel ...
Stack Abuse · Parallel Processing in Python
July 16, 2023 - The system() method is part of the os module, which allows to execute external command line programs in a separate process from your Python program. The system() method is a blocking call, and you have to wait until the call is finished and returns. As a UNIX/Linux fetishist you know that a ...
Reddit · r/learnpython on Reddit: Converting for loops to run in parallel?
May 6, 2024 -

I have some code that works really well, and does exactly what I want it to do - but runs much slower than I’d like. I am trying to explore parallel processing to speed it up since it feels like a perfect case for it, but I’ve been really struggling: all examples I’ve found feel convoluted and don’t seem to work.

Basically I have two for loops that call a function. The function does a bunch of math and returns a single value. My current code is as follows, where a,b,c are constant-value arguments that don’t change between iterations. b is a list that can be really large (between 20 to 1,000 values), so I don’t know if that will impact the parallel options. My base code is below.

a = 5
b = [5,7,1,9,4,0,3,2,6,7,8,9,3,7,0,7,5,4,6,8]
c = 2.5

List1 = np.linspace(0,1,10)
List2 = np.linspace(2,8,20)
Result = np.zeros([len(List1),len(List2)])

for i in enumerate(List1):
    for j in enumerate(List2):
        Result[i[0],j[0]] = Function(i[1],j[1],a,b,c)

I’ve tried a few things using multiprocessing.pool and starmap, but they don’t seem to work. It creates the subprocesses via task manager, but they never do anything. The code normally runs in 2 minutes, but I had the multiprocessing option running for 2 hours and it was still going with no end in sight. I have no idea what it’s doing or what it is stuck on: my usual troubleshooting approach of adding print statements doesn’t seem to work - probably because it is printing in a separate process.

Things I’ve tried are:

  • Putting all arguments into a single list, using multiprocessing.Pool.map, and unpacking the arguments inside of the function

  • Using multiprocessing.starmap

My starmap attempt that ran forever looked like:

with multiprocessing.Pool(processes=6) as pool:
    Results = pool.starmap(Function, (List1, List2, a, b, c))
Top answer (1 of 3)
A few thoughts:

1. I think you are maybe confused with starmap - as you have it written I don't think it would work. Each iterable needs to have the same number of elements. If you call pool.starmap(function, ([1,2,3], [4,5,6])) then it's going to call:

   function(1, 4)
   function(2, 5)
   function(3, 6)

   It looks like instead you want all the possible combinations of i and j.

2. When you use multiprocessing, that "with" block needs to be under an "if __name__ == '__main__'" block. More recent versions of python will warn you about this.

I'll see if I can put together an example to work from.
2 of 3
See code below, if you have any questions let me know. I made up some nonsense with a delay for "function()".

Results:

Starting single
Starting multi
Single: 5.734 Multi: 1.799

The code:

import itertools
import numpy as np
import multiprocessing
import time

a = 5
b = [5, 7, 1, 9, 4, 0, 3, 2, 6, 7, 8, 9, 3, 7, 0, 7, 5, 4, 6, 8]
c = 2.5
list1 = np.linspace(0, 1, 10)
list2 = np.linspace(2, 8, 20)

def function(i, j, a, b, c):
    time.sleep(.025)
    return i + j + a + b[0] + c

def single():
    print("Starting single")
    results = np.zeros([len(list1), len(list2)])
    for i in enumerate(list1):
        for j in enumerate(list2):
            r = function(i[1], j[1], a, b, c)
            results[i[0], j[0]] = r

def multi():
    print("Starting multi")
    results = np.zeros([len(list1), len(list2)])
    with multiprocessing.Pool(processes=6) as pool:
        list_args = itertools.product(list1, list2)
        # chunksize is a tunable parameter: it controls approximately how many
        # jobs each process "takes" at once
        map_results = pool.starmap(function, ((l1, l2, a, b, c) for (l1, l2) in list_args), chunksize=5)
        # map_results has a list of the answers in order. I'll leave it as an
        # exercise to create the results np array

if __name__ == '__main__':
    t1 = time.time()
    single()
    t2 = time.time()
    multi()
    t3 = time.time()
    print(f"Single: {t2-t1:0.3f} Multi: {t3-t2:0.3f}")
Aaltoscicomp · Parallel programming — Python for Scientific Computing documentation
Next we take advantage of the MPI parallelisation and run on 2 cores:

$ mpirun -n 2 python mpi_test.py
before gather: rank 0, n_inside_circle: 3926634
before gather: rank 1, n_inside_circle: 3925910
after gather: rank 1, n_inside_circle: None
after gather: rank 0, n_inside_circle: [3926634, 3925910]
number of darts: 10000000, estimate: 3.1410176, time spent: 1.3 seconds

Note that two MPI processes are now printing output.
Real Python · multiprocessing | Python Standard Library – Real Python
In this example, the multiprocessing package helps you distribute the workload across multiple processes, significantly reducing the time needed to process all images in the directory. ... In this tutorial, you'll explore concurrency in Python, including multi-threaded and asynchronous solutions for I/O-bound tasks, and multiprocessing for CPU-bound tasks.
Medium · 5-Steps Guide to Parallel Processing in Python | by Kurt F. | The Startup | Medium
September 10, 2020 - To basically understand parallel processing, this example can be given:

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))

output: [1, 4, 9]

If you wonder why the if __name__ == '__main__' part is necessary: "Yes, the 'entry point' of the program must be protected" (https://docs.python.org/3.8/library/multiprocessing.html#multiprocessing-programming)
Yale University · Parallel Programming with Python
Modules: pandas, numpy, multiprocessing, joblib, dask and dask-distributed, PIL (for image processing), matplotlib, cupy (for GPU parallelism) ... Clone or download the zip file that contains this notebook and required data. ... # Define an array of numbers foo = np.array([0, 1, 2, 3, 4, 5]) ...
SitePoint · A Guide to Python Multiprocessing and Parallel Programming — SitePoint
November 13, 2024 - Learn what Python multiprocessing is, its advantages, and how to improve the running time of Python programs by using parallel programming.
Real Python · Bypassing the GIL for Parallel Processing in Python – Real Python
September 6, 2023 -

(venv) $ time python numpy_threads.py
real    0m2.955s
user    0m7.415s
sys     0m1.126s
(venv) $ export OMP_NUM_THREADS=1
(venv) $ time python numpy_threads.py
real    0m5.887s
user    0m5.753s
sys     0m0.132s

In the first case, the real elapsed time is two and a half times shorter than the user CPU time, which is a clear giveaway that your code is running in parallel, taking advantage of multiple threads.
Joblib · Embarrassingly parallel for loops — joblib 1.6.dev0 documentation

from joblib import Parallel, delayed, parallel_config

with parallel_config(backend="loky", inner_max_num_threads=2):
    results = Parallel(n_jobs=4)(delayed(func)(x, y) for x, y in data)

In this example, 4 Python worker processes will be allowed to use 2 threads each, meaning that this program will be able to use up to 8 CPUs concurrently. Prior to version 0.12, joblib used the 'multiprocessing' backend as default backend instead of 'loky'.
Uppmax · Parallel computing with Python — Using Python in an HPC environment 2.0 documentation
By using this module, one can create several threads to do some work in parallel (in principle). For jobs dealing with files I/O one can observe some speedup by using the threading module. However, for CPU intensive jobs one would see a decrease in performance w.r.t. the serial code.