A few points here. First, and most importantly: to time code on the GPU, you need to use either gputimeit, or you need to inject a call to wait(gpuDevice) before calling `toc`. Work is launched asynchronously on the GPU, and you only get accurate timings by waiting for it to finish. With those minor modifications, on my GPU, I see 0.09 seconds for the `gpuArray` method and 0.18 seconds for the `arrayfun` version.

Running a loop of GPU operations is generally inefficient, so the main gain you can get here is by pushing the loop inside the `arrayfun` function body so that the loop runs directly on the GPU. Like this:

%%% Function to operate on matrices %%%
function x = test_function(x, Nt)
    for ii = 1:Nt
        x = exp(-1i*(x + abs(x).^2));
    end
end

You'll need to invoke it like `A = arrayfun(@test_function, A, Nt)`. On my GPU, this brings the `arrayfun` time down to 0.05 seconds, so about twice as fast as the plain `gpuArray` version.

Answer from Edric Ellis on mathworks.com
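A minimal sketch of the timing approach the answer describes. The `test_function` body and the `arrayfun` invocation are taken from the answer; the 1000-by-1000 single-precision input and Nt = 100 are assumptions for illustration.

```matlab
% Sketch (assumed input size): time GPU work correctly, two ways.
A  = gpuArray(rand(1000, 'single'));
Nt = 100;

% gputimeit synchronizes with the GPU for you, so it needs no wait().
tArrayfun = gputimeit(@() arrayfun(@test_function, A, Nt));

% Equivalent manual timing: wait(gpuDevice) before toc, because the
% kernel launch returns before the GPU has finished the work.
tic;
B = arrayfun(@test_function, A, Nt);
wait(gpuDevice);
tManual = toc;

% The loop-on-GPU kernel from the answer (save on the path, or as a
% local function in R2016b or later).
function x = test_function(x, Nt)
    for ii = 1:Nt
        x = exp(-1i*(x + abs(x).^2));
    end
end
```

Note that `Nt` is passed as a plain scalar; GPU `arrayfun` expands scalar arguments across all elements.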
MathWorks
mathworks.com › parallel computing toolbox › gpu computing › gpu computing in matlab
arrayfun - Apply function to each element of array on GPU - MATLAB
This MATLAB function applies a function func to each element of a gpuArray A and then concatenates the outputs from func into output gpuArray B.
MathWorks
mathworks.com › parallel computing toolbox › gpu computing › gpu computing in matlab
Improve Performance of Element-Wise MATLAB Functions on the GPU Using arrayfun - MATLAB & Simulink Example
For more information see, Run MATLAB Functions on a GPU. Because lorentz contains individual element-wise operations, performing each operation one at a time on the GPU does not yield significant performance improvements. You can improve the performance by executing all of the operations in the lorentz function at once using arrayfun.
MathWorks
mathworks.com › parallel computing toolbox › gpu computing › gpu computing in matlab
Using GPU arrayfun for Monte-Carlo Simulations - MATLAB & Simulink Example
To run the simulations on the GPU, prepare the input data on the GPU by creating a gpuArray object. ... When you call arrayfun with a GPU array and a function handle as inputs, arrayfun applies the function you specify to each element of the array. This behavior means that looping over each ...
MathWorks
mathworks.com › parallel computing toolbox › gpu computing › gpu computing in matlab
gpuArray - Array stored on GPU - MATLAB
To precompile and run purely element-wise code on gpuArray objects, use the arrayfun function. To run C++ code containing CUDA® device code or library calls, use a MEX function. For more information, see Run MEX Functions Containing CUDA Code. To run existing GPU kernels written in CUDA C++, use the MATLAB CUDAKernel interface.
MathWorks
mathworks.com › matlabcentral › answers › 232855-arrayfun-on-gpu-with-each-call-working-from-common-block-of-data
arrayfun on GPU with each call working from common block of data - MATLAB Answers - MATLAB Central
August 8, 2015 - arrayfun on the GPU cannot access the parent workspace of anonymous functions, but it can access the parent workspace for nested function handles. There's a detailed example in the documentation.
MathWorks
mathworks.com › matlab › language fundamentals › data types › structures
arrayfun - Apply function to each element of array - MATLAB
If the execution of func causes ... from MATLAB results. If the input array to accumarray is empty, then the code generator can use zero-valued inputs to predetermine output types. func must not error when its inputs are zero, or the generated code can produce unexpected errors. Refer to the usage notes and limitations in the C/C++ Code Generation section. The same usage notes and limitations apply to GPU code generation. The arrayfun function fully ...
NVIDIA Developer
developer.nvidia.com › blog › high-performance-matlab-gpu-acceleration
High-Performance MATLAB with GPU Acceleration | NVIDIA Technical Blog
August 21, 2022 - Using arrayfun, custom kernels can be written in MATLAB to further optimize performance by minimizing kernel launch overhead and supporting scalar operations and standard MATLAB syntax.
MathWorks
mathworks.com › matlabcentral › answers › 403497-indexing-arrays-for-loops-in-a-gpuarray-arrayfun-called-function
Indexing Arrays for Loops in a gpuArray/arrayfun-called Function - MATLAB Answers - MATLAB Central
June 1, 2018 - So, I suspect you need to structure your code so that either you can operate in a completely element-wise manner on A and B - and then use arrayfun, or else you fully vectorise your code so that the whole of A and B can be passed in. Sign in to comment. Sign in to answer this question. Parallel Computing Parallel Computing Toolbox GPU Computing GPU Computing in MATLAB
MathWorks
mathworks.com › matlabcentral › answers › 282191-how-do-i-use-arrayfun-on-gpu-when-the-size-of-the-output-array-doesn-t-equal-the-size-of-some-inpu
How do I use arrayfun on GPU when the size of the output array does...
May 4, 2016 - Yes, arrayfun requires input matrices to be the same size. It seems pagefun will do the job. You can use functions like shiftdim to manage multiple dimensions of inputs. Please note I don't have much experience with GPU computing in MATLAB, so I may be wrong about this.
GitHub
github.com › NVIDIA-developer-blog › code-samples › blob › master › MATLAB_arrayfun › ArrayfunArticle.m
code-samples/MATLAB_arrayfun/ArrayfunArticle.m at master · NVIDIA-developer-blog/code-samples
% |arrayfun| - that take advantage of GPU hardware, yet require no
% specialist parallel programming skills. The most advanced function,
% |arrayfun|, allows you to write your own custom kernels in the MATLAB
% language.
%
% If these techniques do not provide the performance or flexibility you ...
Author   NVIDIA-developer-blog
Top answer
1 of 2
2

The reason that you don't get any speed increase by calling matlabpool before calling arrayfun is that just the act of creating multiple workers doesn't make all code utilize these workers to perform calculations. If you want to exploit the pool of workers, you need to explicitly parallelize your code with parfor (related info here).

parfor k = 1:10
    result{k} = sum(sum(a*b));
end 

In general, arrayfun does not do any parallelization or acceleration. In fact, it's often slower than simply writing out the for loop because the explicit for loop allows for better JIT acceleration.

for k = 1:10
    result(k) = sum(sum(a * b));
end

If you want to perform the operation you've shown on the GPU: when the input data to arrayfun is a gpuArray, the function will execute on the GPU (using the gpuArray overload of arrayfun). The catch is that anything run on the GPU through arrayfun must consist of element-wise operations only, so that the operation on each element is independent of the operations on all other elements (which is what makes it parallelizable). Your operation is not element-wise, so the GPU version of arrayfun cannot be used for it.
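To make the element-wise restriction concrete, here is a sketch of an operation that does qualify for GPU arrayfun (the array sizes and the anonymous function are illustrative, not from the question):

```matlab
% Element-wise example that GPU arrayfun accepts: each output element
% depends only on the corresponding elements of a and b.
a = gpuArray(rand(1000));
b = gpuArray(rand(1000));
c = arrayfun(@(x, y) x.^2 + sin(y), a, b);   % compiled to one GPU kernel

% By contrast, sum(sum(a*b)) involves a matrix product and a reduction
% across elements -- not element-wise -- so it cannot go through
% GPU arrayfun.
```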

As a side-note, you'll want to use parpool rather than matlabpool since the latter has been deprecated.

2 of 2
1

Core MATLAB does use threads and vector operations, but you have to vectorize the code yourself. For your example, for instance, you need to write

A = rand(1000, 1000, 100);
B = sum( sum( A, 1 ), 2 );

B is now a 1-by-1-by-100 array of the sums. I've used two sums to make it clear what's going on; if you actually wanted to sum every element of a matrix you could use sum(A(:)), or for this batched example, sum( reshape(A, [], 100) ).

For task parallelism rather than data parallelism use parfor, batch, parfeval or some other parallel instruction.
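As a sketch of the task-parallel route mentioned above, here is the same batched sum dispatched with parfeval. The array sizes match the answer's example; everything else is illustrative, and this assumes a parallel pool is available (parfeval will start the default pool if none is open).

```matlab
% Task parallelism with parfeval: each page sum runs as an independent
% task on a pool worker.
A = rand(1000, 1000, 100);
for k = 100:-1:1                       % count down to preallocate f
    f(k) = parfeval(@(M) sum(M(:)), 1, A(:,:,k));
end
B = fetchOutputs(f);                   % collect the 100 scalar sums
```

For this particular computation the vectorized sum above will be faster; parfeval pays off when each task is substantial or when tasks should run asynchronously.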

🌐
Stack Overflow
stackoverflow.com › questions › 58579941 › matlab-arrayfun-on-gpu-with-parfor-loop
MATLAB arrayfun on GPU with parfor loop - Stack Overflow
N = 2000; dp = 0.005;
p1 = [0:dp:1]; p2 = [0:dp:1]; pB = [0:dp:2];
[p1,p2,pB] = meshgrid(p1,p2,pB);
p1 = gpuArray(p1); p2 = gpuArray(p2); pB = gpuArray(pB);
A = zeros(N,1);
parfor i = 1:N
    A(i) = arrayfun(@MYFUN,p1,p2,pB);
end

First, I am surprised that for N=2000, the parfor takes almost the same time as the ordinary for loop (when using parfor, my MATLAB connects to 6 workers).
MathWorks
mathworks.com › parallel computing toolbox › gpu computing
Measure and Improve GPU Performance - MATLAB & Simulink
The arrayfun function on the GPU turns an element-wise MATLAB function into a custom CUDA kernel, which reduces the overhead of performing the operation. You can often use arrayfun with a subset of your code even if arrayfun does not support your entire code.
MathWorks
mathworks.com › help › parallel-computing › gpuarray.arrayfun_ja_JP.html
arrayfun - Apply function to each element of array on GPU (Japanese documentation page)
This function fully supports thread-based environments. For more information, see Run MATLAB Functions in Thread-Based Environment. The arrayfun function fully supports GPU arrays. To run the function on a GPU, specify the input data as a gpuArray.
Top answer
1 of 2
2

Unfortunately, this is not supported by Parallel Computing Toolbox in R2012b. The gpuArray version of arrayfun currently does not support binding in the constant data to an anonymous function handle. Arrayfun arguments must be passed directly, and must all either be scalar or the same size.

If you could bind in the constant arguments, you would next discover that you cannot currently index into them (or perform any non-scalar operations on them).

Perhaps you might be able to build up your algorithm using supported routines such as CONV2 or FILTER2.

2 of 2
0

This is a very old post, but since I was struggling with a similar issue, I wanted to share what I found out:

If you put your call to arrayfun within a function, you may be able to implement the analyze function as a nested function that has access to your constant arrays. This can require quite some effort in rewriting your code, because within the nested analyze function you cannot pass any full array to another function; you have to rewrite everything so that you use only individually indexed entries of your constant arrays, e.g. in a for loop over the array. Accordingly, calls to functions like size will not work and should be moved outside of analyze (at least this is the case for MATLAB R2015b, which I am using). Here is an example of how it can be done (not mine):

https://devblogs.nvidia.com/high-performance-matlab-gpu-acceleration/
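A minimal sketch of the nested-function pattern described above; the function names and the computation inside analyze are illustrative, not from the original question.

```matlab
% Sketch: analyze is nested inside runAnalysis, so GPU arrayfun can see
% the constant array C from the enclosing workspace. Inside analyze,
% C may only be indexed one element at a time, and size queries must
% happen outside.
function out = runAnalysis(A, C)
    n = numel(C);                  % computed outside the nested function
    out = arrayfun(@analyze, A);   % A is a gpuArray

    function y = analyze(x)
        y = x;
        for k = 1:n
            y = y + C(k) * x;      % scalar indexing into the constant
        end
    end
end
```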

Best,

Hans-Martin

MathWorks
mathworks.com › parallel computing toolbox › gpu computing › gpu computing in matlab
bsxfun - Binary singleton expansion function for gpuArray - MATLAB
The function arrayfun offers improved functionality compared to bsxfun. arrayfun is recommended. This function behaves similarly to the MATLAB® function bsxfun, except that the evaluation of the function happens on the GPU, not on the CPU. Any required data not already on the GPU is moved ...