You can use the multiprocessing module. For this case I might use a processing pool:
from multiprocessing import Pool
pool = Pool()
result1 = pool.apply_async(solve1, [A]) # evaluate "solve1(A)" asynchronously
result2 = pool.apply_async(solve2, [B]) # evaluate "solve2(B)" asynchronously
answer1 = result1.get(timeout=10)
answer2 = result2.get(timeout=10)
This will spawn worker processes that can do generic work for you. Since we did not pass the processes argument, Pool() spawns one worker per CPU core on your machine, and each core can execute one process at a time.
If you want to map a list to a single function you would do this:
args = [A, B]
results = pool.map(solve1, args)
Don't use threads for CPU-bound work like this: the GIL (global interpreter lock) prevents more than one thread from executing Python bytecode at a time, so threads only help when the work is I/O-bound.
Answer from Matt Williamson on Stack Overflow

Article link: https://rishiraj.me/articles/2024-04/python_subinterpreter_parallelism
I have written an article, which should be helpful to folks at all experience levels, covering various multi-tasking paradigms in computers, and how they apply in CPython, with its unique limitations like the Global Interpreter Lock. Using this knowledge, we look at traditional ways to achieve "true parallelism" (i.e. multiple tasks running at the same time) in Python.
Finally, we build a solution utilizing newer concepts in Python 3.12 to run any arbitrary pure Python code in parallel across multiple threads. All the code used to achieve this, along with the benchmarking code are available in the repository linked in the blog-post.
This is my first time writing a technical post about Python. Any feedback would be really appreciated! 😊
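The multiprocessing snippet above isn't runnable on its own: solve1 and solve2 are undefined, and spawn-based platforms (Windows, macOS) need a __main__ guard so workers can re-import the module safely. A minimal self-contained sketch, with placeholder solve1/solve2 standing in for your real CPU-bound functions:

```python
from multiprocessing import Pool

def solve1(a):
    # Placeholder CPU-bound work
    return sum(i * i for i in range(a))

def solve2(b):
    # Another placeholder
    return sum(i * 3 for i in range(b))

if __name__ == "__main__":
    # The guard is required where worker processes are spawned rather than forked
    with Pool() as pool:
        result1 = pool.apply_async(solve1, [10_000])  # evaluate solve1(10_000) asynchronously
        result2 = pool.apply_async(solve2, [10_000])  # evaluate solve2(10_000) asynchronously
        answer1 = result1.get(timeout=10)
        answer2 = result2.get(timeout=10)
        # map applies one function across an iterable of arguments
        results = pool.map(solve1, [1_000, 2_000])
    print(answer1, answer2, results)
```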
This can be done very elegantly with Ray.
To parallelize your example, you'd need to define your functions with the @ray.remote decorator, and then invoke them with .remote.
import ray
ray.init()
# Define the functions.
@ray.remote
def solve1(a):
return 1
@ray.remote
def solve2(b):
return 2
# Start two tasks in the background.
x_id = solve1.remote(0)
y_id = solve2.remote(1)
# Block until the tasks are done and get the results.
x, y = ray.get([x_id, y_id])
There are a number of advantages of this over the multiprocessing module.
The same code will run on a multicore machine as well as a cluster of machines.
Processes share data efficiently through shared memory and zero-copy serialization.
Error messages are propagated nicely.
These function calls can be composed together, e.g.,
@ray.remote
def f(x):
    return x + 1

x_id = f.remote(1)
y_id = f.remote(x_id)
z_id = f.remote(y_id)
ray.get(z_id)  # returns 4
In addition to invoking functions remotely, classes can be instantiated remotely as actors.
Note that Ray is a framework I've been helping develop.
The CPython implementation currently has a global interpreter lock (GIL) that prevents threads of the same interpreter from concurrently executing Python code. This means CPython threads are useful for concurrent I/O-bound workloads, but usually not for CPU-bound workloads. The naming calc_stuff() indicates that your workload is CPU-bound, so you want to use multiple processes here (which is often the better solution for CPU-bound workloads anyway, regardless of the GIL).
There are two easy ways to create a process pool in the Python standard library. The first is the multiprocessing module, which can be used like this:
import multiprocessing

pool = multiprocessing.Pool(4)
out1, out2, out3 = zip(*pool.map(calc_stuff, range(0, 10 * offset, offset)))
Note that this won't work in the interactive interpreter due to the way multiprocessing is implemented.
The second way to create a process pool is concurrent.futures.ProcessPoolExecutor:
import concurrent.futures

with concurrent.futures.ProcessPoolExecutor() as pool:
out1, out2, out3 = zip(*pool.map(calc_stuff, range(0, 10 * offset, offset)))
This uses the multiprocessing module under the hood, so it behaves identically to the first version.
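For completeness, here is a self-contained version of the snippet above. calc_stuff is a placeholder for the question's CPU-bound function (it returns three values, as the unpacking implies), and offset is an assumed constant:

```python
import concurrent.futures

def calc_stuff(x):
    # Stand-in for the real CPU-bound computation; returns three values
    return x, x * x, x ** 3

if __name__ == "__main__":
    offset = 2  # assumed value for illustration
    with concurrent.futures.ProcessPoolExecutor() as pool:
        # pool.map yields (out1, out2, out3) tuples; zip(*) transposes
        # them into three sequences, one per output
        out1, out2, out3 = zip(*pool.map(calc_stuff, range(0, 10 * offset, offset)))
    print(out1, out2, out3)
```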
from joblib import Parallel, delayed
def process(i):
return i * i
results = Parallel(n_jobs=2)(delayed(process)(i) for i in range(10))
print(results) # prints [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
The above works beautifully on my machine (Ubuntu, package joblib was pre-installed, but can be installed via pip install joblib).
Taken from https://blog.dominodatalab.com/simple-parallelization/
Edit on Mar 31, 2021: On joblib, multiprocessing, threading and asyncio

- joblib in the above code uses import multiprocessing under the hood (and thus multiple processes, which is typically the best way to run CPU work across cores - because of the GIL)
- You can let joblib use multiple threads instead of multiple processes, but this (or using import threading directly) is only beneficial if the threads spend considerable time on I/O (e.g. read/write to disk, send an HTTP request). For I/O work, the GIL does not block the execution of another thread
- Since Python 3.7, as an alternative to threading, you can parallelise work with asyncio, but the same advice applies as for import threading (though, in contrast to the latter, only 1 thread will be used; on the plus side, asyncio has a lot of nice features which are helpful for async programming)
- Using multiple processes incurs overhead. Think about it: typically, each process needs to initialise/load everything you need to run your calculation. You need to check yourself whether the above code snippet improves your wall time. Here is another one, for which I confirmed that joblib produces better results:
import time
from joblib import Parallel, delayed
def countdown(n):
while n>0:
n -= 1
return n
t = time.time()
for _ in range(20):
print(countdown(10**7), end=" ")
print(time.time() - t)
# takes ~10.5 seconds on medium sized Macbook Pro
t = time.time()
results = Parallel(n_jobs=2)(delayed(countdown)(10**7) for _ in range(20))
print(results)
print(time.time() - t)
# takes ~6.3 seconds on medium sized Macbook Pro
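As a sketch of the threading backend mentioned in the bullet points above: joblib's Parallel accepts prefer="threads", which only pays off when the work releases the GIL (I/O waits, or C extensions such as NumPy). The fake_io function below is a placeholder for real I/O-bound work:

```python
import time
from joblib import Parallel, delayed

def fake_io(i):
    # time.sleep stands in for a disk/network wait, which releases the GIL
    time.sleep(0.01)
    return i * i

# prefer="threads" asks joblib to use a thread-based backend instead of
# spawning processes; results come back in input order
results = Parallel(n_jobs=4, prefer="threads")(delayed(fake_io)(i) for i in range(8))
print(results)
```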
I have some code that works really well, and does exactly what I want it to do - but runs much slower than I’d like. I am trying to explore parallel processing to speed it up since it feels like a perfect case for it, but I’ve been really struggling: all examples I’ve found feel convoluted and don’t seem to work.
Basically I have two for loops that call a function. The function does a bunch of math and returns a single value. In the code below, a, b, and c are constant-valued arguments that don't change between iterations. b is a list that can be really large (between 20 and 1,000 values), so I don't know if that will impact the parallel options. My base code is below.
a = 5.0
b = [5,7,1,9,4,0,3,2,6,7,8,9,3,7,0,7,5,4,6,8]
c = 2.5
List1 = np.linspace(0,1,10)
List2 = np.linspace(2,8,20)
Result = np.zeros([len(List1),len(List2)])
for i in enumerate(List1):
    for j in enumerate(List2):
        Result[i[0],j[0]] = Function(i[1],j[1],a,b,c)
I've tried a few things using multiprocessing.Pool and starmap, but they don't seem to work. It creates the subprocesses via Task Manager, but they never do anything. The code normally runs in 2 minutes, but I had the multiprocessing version running for 2 hours and it was still going with no end in sight. I have no idea what it's doing or what it's stuck on: my usual troubleshooting approach of adding print statements doesn't seem to work - probably because the printing happens in a separate process.
Things I’ve tried are:
-
Putting all arguments into a single list, using multiprocessing.Pool.map, and unpacking the arguments inside of the function
-
Using multiprocessing.starmap
My starmap attempt that ran forever looked like:
with multiprocessing.Pool(processes=6) as pool:
    results = pool.starmap(Function, (List1, List2, a, b, c))
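One possible fix, sketched here with a placeholder Function (the real one must be defined at module level so it can be pickled): starmap expects an iterable of argument tuples, one tuple per call, so the attempt above hands it the wrong shape. Building the (i, j) grid with itertools.product and attaching the constants to each tuple gives starmap what it wants:

```python
import itertools
import multiprocessing
import numpy as np

def Function(x, y, a, b, c):
    # Placeholder math standing in for the asker's real function
    return x * y + a + sum(b) + c

if __name__ == "__main__":
    a = 5.0
    b = [5, 7, 1, 9, 4, 0, 3, 2, 6, 7]  # shortened example list
    c = 2.5
    List1 = np.linspace(0, 1, 10)
    List2 = np.linspace(2, 8, 20)
    # One (x, y, a, b, c) tuple per grid point; starmap unpacks each tuple
    argument_tuples = [(i, j, a, b, c) for i, j in itertools.product(List1, List2)]
    with multiprocessing.Pool(processes=6) as pool:
        flat = pool.starmap(Function, argument_tuples)
    # product iterates List2 fastest, so reshaping recovers Result[i, j]
    Result = np.array(flat).reshape(len(List1), len(List2))
    print(Result.shape)
```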