Brave Search

Possible to speed up this gpuArray calculation with arrayfun() (or otherwise)?

mathworks.com › matlabcentral › answers › 720250-possible-to-speed-up-this-gpuarray-calculation-with-arrayfun-or-otherwise

A few points here. Firstly, (and most importantly), to time code on the GPU, you need to use either gputimeit, or you need to inject a call to wait(gpuDevice) before calling `toc`. That's because work is launched asynchronously on the GPU, and you only get accurate timings by waiting for it to finish. With those minor modifications, on my GPU, I see 0.09 seconds for the `gpuArray` method, and 0.18 seconds for the `arrayfun` version. Running a loop of GPU operations is generally inefficient, so the main gain you can get here is by pushing the loop inside the `arrayfun` function body so that that loop runs directly on the GPU. Like this:

%%% Function to operate on matrices %%%
function x = test_function(x,Nt)
for ii = 1:Nt
    x = exp(-1i*(x + abs(x).^2));
end
end

You'll need to invoke it like `A = arrayfun(@test_function, A, Nt)`. On my GPU, this brings the `arrayfun` time down to 0.05 seconds, so about twice as fast as the plain `gpuArray` version.

Answer from Edric Ellis on mathworks.com
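The two timing approaches described in the answer can be compared in one short script. This is a minimal sketch, assuming Parallel Computing Toolbox and a supported GPU; the array size `N`, the iteration count `Nt` and the complex test input are made-up values, and only `test_function` comes from the answer itself.

```matlab
% Sketch only: array size and iteration count below are made-up values.
N  = 1000;
Nt = 100;
A  = complex(rand(N, 'gpuArray'), rand(N, 'gpuArray'));

% tic/toc is only trustworthy with an explicit synchronization point.
dev = gpuDevice;
B = A;
tic
for ii = 1:Nt
    B = exp(-1i*(B + abs(B).^2));   % separate GPU work launched on every pass
end
wait(dev);                          % block until queued GPU work has finished
tLoop = toc;

% gputimeit synchronizes and repeats the measurement for you.
tFused = gputimeit(@() arrayfun(@test_function, A, Nt));

fprintf('gpuArray loop: %.3f s, fused arrayfun: %.3f s\n', tLoop, tFused);

function x = test_function(x, Nt)
% Same body as in the answer: the loop runs inside a single GPU kernel.
for ii = 1:Nt
    x = exp(-1i*(x + abs(x).^2));
end
end
```

Absolute timings depend on the GPU, so the figures quoted in the answer should not be expected to reproduce exactly.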

MathWorks

mathworks.com › parallel computing toolbox › gpu computing › gpu computing in matlab

arrayfun - Apply function to each element of array on GPU - MATLAB

This MATLAB function applies a function func to each element of a gpuArray A and then concatenates the outputs from func into output gpuArray B.

MathWorks

mathworks.com › parallel computing toolbox › gpu computing › gpu computing in matlab

Improve Performance of Element-Wise MATLAB Functions on the GPU Using arrayfun - MATLAB & Simulink Example

For more information see, Run MATLAB Functions on a GPU. Because lorentz contains individual element-wise operations, performing each operation one at a time on the GPU does not yield significant performance improvements. You can improve the performance by executing all of the operations in the lorentz function at once using arrayfun.

MathWorks

mathworks.com › parallel computing toolbox › gpu computing › gpu computing in matlab

Using GPU arrayfun for Monte-Carlo Simulations - MATLAB & Simulink Example

To run the simulations on the GPU, prepare the input data on the GPU by creating a gpuArray object. ... When you call arrayfun with a GPU array and a function handle as inputs, arrayfun applies the function you specify to each element of the array. This behavior means that looping over each ...

MathWorks

mathworks.com › parallel computing toolbox › gpu computing › gpu computing in matlab

gpuArray - Array stored on GPU - MATLAB

To precompile and run purely element-wise code on gpuArray objects, use the arrayfun function. To run C++ code containing CUDA® device code or library calls, use a MEX function. For more information, see Run MEX Functions Containing CUDA Code. To run existing GPU kernels written in CUDA C++, use the MATLAB CUDAKernel interface.

MathWorks

mathworks.com › matlabcentral › answers › 232855-arrayfun-on-gpu-with-each-call-working-from-common-block-of-data

arrayfun on GPU with each call working from common block of data - MATLAB Answers - MATLAB Central

August 8, 2015 - arrayfun on the GPU cannot access the parent workspace of anonymous functions, but it can access the parent workspace for nested function handles. There's a detailed example in the documentation.
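A nested function that reads a shared array from its parent workspace might look like the sketch below. The names `lookupDemo`, `tbl`, `idx` and `pick` are hypothetical and chosen only for illustration; the example assumes a supported GPU and reads the shared array one scalar element at a time.

```matlab
function y = lookupDemo()
% Sketch only: lookupDemo, tbl, idx and pick are hypothetical names.
tbl = gpuArray(linspace(0, 1, 256));    % common block of data on the GPU
idx = gpuArray(randi(256, 1000, 1));    % one table index per output element

    function v = pick(k)
        % Nested function: reads the up-level variable tbl by scalar index.
        v = tbl(k);
    end

% Each call to pick handles one element of idx and looks up one table entry.
y = arrayfun(@pick, idx);
end
```

An anonymous function such as `@(k) tbl(k)` would not work here, because, as the snippet says, GPU `arrayfun` cannot access the parent workspace of anonymous functions.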

MathWorks

mathworks.com › matlab › language fundamentals › data types › structures

arrayfun - Apply function to each element of array - MATLAB

If the execution of func causes ... from MATLAB results. If the input array to accumarray is empty, then the code generator can use zero-valued inputs to predetermine output types. func must not error when its inputs are zero, or the generated code can produce unexpected errors. Refer to the usage notes and limitations in the C/C++ Code Generation section. The same usage notes and limitations apply to GPU code generation. The arrayfun function fully ...

NVIDIA Developer

developer.nvidia.com › blog › high-performance-matlab-gpu-acceleration

High-Performance MATLAB with GPU Acceleration | NVIDIA Technical Blog

August 21, 2022 - Using arrayfun, custom kernels can be written in MATLAB to further optimize performance by minimizing kernel launch overhead and supporting scalar operations and standard MATLAB syntax.

MathWorks

mathworks.com › matlabcentral › answers › 403497-indexing-arrays-for-loops-in-a-gpuarray-arrayfun-called-function

Indexing Arrays for Loops in a gpuArray/arrayfun-called Function - MATLAB Answers - MATLAB Central

June 1, 2018 - So, I suspect you need to structure your code so that either you can operate in a completely element-wise manner on A and B - and then use arrayfun, or else you fully vectorise your code so that the whole of A and B can be passed in.

MathWorks

mathworks.com › matlabcentral › answers › 282191-how-do-i-use-arrayfun-on-gpu-when-the-size-of-the-output-array-doesn-t-equal-the-size-of-some-inpu

How do I use arrayfun on GPU when the size of the output array does...

May 4, 2016 - Yes, arrayfun requires input matrices to be the same size. It seems pagefun will do the job. You can use functions like shiftdim to manage multiple dimensions of inputs. Please note I don't have much experience with GPU computing in MATLAB, so I may be wrong about this.
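For reference, a typical `pagefun` call applies a matrix operation to every page of a 3-D `gpuArray`. The sketch below uses made-up sizes and assumes a supported GPU; whether it fits the asker's problem depends on details not shown in this snippet.

```matlab
% Sketch only: sizes are made up for illustration.
A = rand(3, 3, 100, 'gpuArray');   % 100 pages of 3-by-3 matrices
B = rand(3, 2, 100, 'gpuArray');   % 100 pages of 3-by-2 matrices

% Multiply matching pages: C(:,:,k) = A(:,:,k) * B(:,:,k)
C = pagefun(@mtimes, A, B);

disp(size(C))                      % 3 2 100
```

Unlike `arrayfun`, the function passed to `pagefun` operates on whole matrices (pages), so the output size can differ from the input sizes.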

GitHub

github.com › NVIDIA-developer-blog › code-samples › blob › master › MATLAB_arrayfun › ArrayfunArticle.m

code-samples/MATLAB_arrayfun/ArrayfunArticle.m at master · NVIDIA-developer-blog/code-samples

% |arrayfun| - that take advantage of GPU hardware, yet require no
% specialist parallel programming skills. The most advanced function,
% |arrayfun|, allows you to write your own custom kernels in the MATLAB
% language.
%
% If these techniques do not provide the performance or flexibility you

Author NVIDIA-developer-blog

Stack Overflow

stackoverflow.com › questions › 41916485 › why-does-arrayfun-only-use-a-single-core

matlab - Why does arrayfun only use a single core? - Stack Overflow

Top answer

1 of 2

2

The reason that you don't get any speed increase by calling matlabpool before calling arrayfun is that just the act of creating multiple workers doesn't make all code utilize these workers to perform calculations. If you want to exploit the pool of workers, you need to explicitly parallelize your code with parfor (related info here).

parfor k = 1:10
    result{k} = sum(sum(a*b));
end

In general, arrayfun does not do any parallelization or acceleration. In fact, it's often slower than simply writing out the for loop because the explicit for loop allows for better JIT acceleration.

for k = 1:10
    result(k) = sum(sum(a * b));
end

If you want to perform the operation you've shown using the GPU, if the input data to arrayfun is a gpuArray, then it will execute on the GPU (using the gpuArray version of arrayfun). The issue though is that anything performed on the GPU using arrayfun has to be element-wise operations only so that the operation on each element is independent of the operations on all other elements (making it parallelizable). In your case, it is not element-wise operations and therefore the GPU version of arrayfun cannot be used.

As a side-note, you'll want to use parpool rather than matlabpool since the latter has been deprecated.

2 of 2

1

Core MATLAB does use threads and vector operations, but you have to vectorize the code yourself. For your example, for instance, you need to write

A = rand(1000, 1000, 100);
B = sum( sum( A, 1 ), 2 );

B is now a 1-by-1-by-100 array of the sums. I've used two sums to help you understand what's going on, if you actually wanted to sum every number in a matrix you'd go sum(A(:)), or for this batch example, sum( reshape(A, [], 100) ).

For task parallelism rather than data parallelism use parfor, batch, parfeval or some other parallel instruction.
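The two vectorized forms mentioned in this answer can be checked against each other on the CPU. This is a small sketch with a reduced array size (the answer uses 1000-by-1000-by-100) so that it runs quickly.

```matlab
% Sketch only: smaller array than in the answer, to keep the check fast.
A  = rand(50, 50, 10);

B1 = sum(sum(A, 1), 2);          % 1-by-1-by-10: one sum per page
B2 = sum(reshape(A, [], 10));    % 1-by-10: each column holds one page

% Both forms give the same per-page sums up to rounding.
maxDiff = max(abs(B1(:) - B2(:)));
fprintf('largest difference: %g\n', maxDiff);
```

The `reshape` form flattens each page into a column first, so a single `sum` along the first dimension is enough.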

MathWorks

mathworks.com › matlabcentral › answers › 40104-how-to-run-arrayfun-with-a-gpu

How to run arrayfun with a GPU? - MATLAB Answers - MATLAB Central

Top answer

1 of 1

2

Inside arrayfun, one can only perform computations with scalars, so vector and matrix computations are not supported. In particular, one cannot create random matrices. Having said that, you can easily write loops, create multiple random numbers, etc., so the equivalent functionality of the example code would be achieved via:

function highvalue = getHighRandValue(outerbound)
highvalue = -inf;
for i = 1:outerbound
    highvalue = max(highvalue, randn());
end
end

Best, Narfi
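A call to this function through `arrayfun` might look like the following sketch. The input `counts` and its size are hypothetical; the example assumes a supported GPU and repeats the function from the answer so that it is self-contained.

```matlab
% Sketch only: counts is a hypothetical input, one loop bound per element.
counts = gpuArray(100 * ones(1000, 1));

% Each GPU thread draws counts(k) scalar random numbers and keeps the largest.
hv = arrayfun(@getHighRandValue, counts);

disp(size(hv))    % 1000 1

function highvalue = getHighRandValue(outerbound)
% Function from the answer: only scalar operations inside the body.
highvalue = -inf;
for i = 1:outerbound
    highvalue = max(highvalue, randn());
end
end
```

Because the input is a `gpuArray`, the whole loop runs on the GPU and the result `hv` is also a `gpuArray`.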

MathWorks

mathworks.com › matlabcentral › answers › 303822-parallel-matrix-operations-on-gpu-using-arrayfun-why-is-it-slower-than-looping-and-or-is-there-a-b

Parallel matrix operations on GPU using arrayfun: why is it slower than looping, and/or is there a better method for coding t... - MATLAB Answers - MATLAB Central

Top answer

1 of 1

1

`arrayfun` IS just a loop, unless the inputs are `gpuArray`s. Your input isn't a `gpuArray`, because you've passed all your `gpuArray`s as up-values. What you're doing is just simple algebra with scalar expansion:

corr1 = reshape(corr1, 1, 1, []);
corr2 = reshape(corr2, 1, 1, []);
imfinal = sum( arrayfun(@calculateCorrImage, im, corr1, corr2), 3 );

Stack Overflow

stackoverflow.com › questions › 58579941 › matlab-arrayfun-on-gpu-with-parfor-loop

MATLAB arrayfun on GPU with parfor loop - Stack Overflow

N = 2000;
dp = 0.005;
p1 = [0:dp:1];
p2 = [0:dp:1];
pB = [0:dp:2];
[p1,p2,pB] = meshgrid(p1,p2,pB);
p1 = gpuArray(p1);
p2 = gpuArray(p2);
pB = gpuArray(pB);
A = zeros(N,1);
parfor i = 1:N
    A(i) = arrayfun(@MYFUN,p1,p2,pB);
end

First, I am surprised that for N=2000, the parfor almost takes the same time as the ordinary for loop (when using parfor, it seems that my MATLAB connects to 6 workers).

MathWorks

mathworks.com › parallel computing toolbox › gpu computing

Measure and Improve GPU Performance - MATLAB & Simulink

The arrayfun function on the GPU turns an element-wise MATLAB function into a custom CUDA kernel, which reduces the overhead of performing the operation. You can often use arrayfun with a subset of your code even if arrayfun does not support your entire code.

MathWorks

mathworks.com › help › parallel-computing › gpuarray.arrayfun_ja_JP.html

arrayfun - Apply function to each element of array on GPU

This function fully supports thread-based environments. For more information, see Run MATLAB Functions in Thread-Based Environment. The arrayfun function fully supports GPU arrays. To run the function on a GPU, specify the input data as a gpuArray.

Stack Overflow

stackoverflow.com › questions › 13236509 › usage-of-anonymous-functions-in-arrayfun-with-gpu-acceleration-matlab

cuda - Usage of anonymous functions in arrayfun with GPU acceleration (Matlab) - Stack Overflow

Top answer

1 of 2

2

Unfortunately, this is not supported by Parallel Computing Toolbox in R2012b. The gpuArray version of arrayfun currently does not support binding in the constant data to an anonymous function handle. Arrayfun arguments must be passed directly, and must all either be scalar or the same size.

If you could bind in the constant arguments, you would next discover that you cannot currently index into them (or perform any non-scalar operations on them).

Perhaps you might be able to build up your algorithm using supported routines such as CONV2 or FILTER2.

2 of 2

0

This is a very old post, but since I was struggling with a similar issue, I wanted to share what I found out about this:

If you put your call of arrayfun within a function, you might be able to implement the analyze function as a nested function that has access to your constant arrays. However, this might require quite some effort in rewriting your code, because within the nested analyze function you cannot pass any full array to any other function, which means you have to rewrite everything so that you use only single indexed entries of your constant arrays, e.g. in a for loop over the array. Accordingly, all calls of functions like size etc. will not work and should be moved outside of analyze (at least this is the case for MATLAB R2015b, which I am using). Here is an example of how it can be done (not mine):

https://devblogs.nvidia.com/high-performance-matlab-gpu-acceleration/

Best,

Hans-Martin

MathWorks

mathworks.com › parallel computing toolbox › gpu computing › gpu computing in matlab

bsxfun - Binary singleton expansion function for gpuArray - MATLAB

The function arrayfun offers improved functionality compared to bsxfun. arrayfun is recommended. This function behaves similarly to the MATLAB® function bsxfun, except that the evaluation of the function happens on the GPU, not on the CPU. Any required data not already on the GPU is moved ...
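The recommendation can be illustrated by computing the same expanded sum both ways. This is a minimal sketch with made-up inputs `col` and `row`, assuming a supported GPU; it relies on GPU `arrayfun` expanding singleton dimensions of compatible inputs.

```matlab
% Sketch only: col and row are made-up inputs with compatible sizes.
col = gpuArray((1:4)');       % 4-by-1
row = gpuArray(10 * (1:3));   % 1-by-3

S1 = bsxfun(@plus, col, row);              % 4-by-3, older style
S2 = arrayfun(@(a, b) a + b, col, row);    % 4-by-3, recommended style

sameResult = isequal(gather(S1), gather(S2));
disp(sameResult)
```

The anonymous function here captures no variables from the workspace, which keeps it within what GPU `arrayfun` supports.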