matlab cellfun faster than for loop python

stackoverflow.com › questions › 18284027 › cellfun-versus-simple-matlab-loop-performance

Cellfun versus Simple Matlab Loop performance - Stack Overflow


1 of 4

If performance is a major factor you should avoid using cells, loops or cellfun/arrayfun. It's usually much quicker to use a vector operation (assuming this is possible).

The code below expands on Werner's add example with standard array loop and array operations.

The results are:

Cell Loop Time - 0.1679
Cellfun Time - 2.9973
Loop Array Time - 0.0465
Array Time - 0.0019

Code:

nTimes = 1000;
nValues = 1000;
myCell = repmat({0},1,nValues);
output = zeros(1,nValues);

% Basic operation
tic;
for k=1:nTimes
  for m=1:nValues
    output(m) = myCell{m} + 1;
  end
end
cell_loop_timeAdd=toc;    
fprintf(1,'Cell Loop Time %0.4f\n', cell_loop_timeAdd);

tic;        
for k=1:nTimes
  output = cellfun(@(in) in+1,myCell);
end
cellfun_timeAdd=toc;
fprintf(1,'Cellfun Time %0.4f\n', cellfun_timeAdd);


myData = zeros(1,nValues);
tic;
for k=1:nTimes
  for m=1:nValues
    output(m) = myData(m) + 1;
  end
end
loop_timeAdd=toc;
fprintf(1,'Loop Array Time %0.4f\n', loop_timeAdd);

tic;
for k=1:nTimes
    output = myData + 1;
end
array_timeAdd=toc;
fprintf(1,'Array Time %0.4f\n', array_timeAdd);

2 of 4

I will add an answer with the results of my own tests, but I would be glad if people contributed their knowledge; this is just a simple test I made.

I tested the following conditions with a cell array of 1000 elements and 1000 loop iterations. The results are total times. I would probably have to run more than 1000 iterations, because the results fluctuate a little, but this is not a scientific article:

Basic operation (sum)
- Simple for loop: 0.2663 s
- cellfun: 9.4612 s
String Operation (strcmp)
- Simple for loop: 1.3124 s
- cellfun: 11.8099 s
Built-in (isempty)
- Simple for loop: 8.9042 s
- cellfun (string input -> see this reference): 0.0105 s
- cellfun (fcn handle input -> see this reference): 0.9209 s
Non-uniform (regexp)
- Simple for loop: 24.2157 s
- cellfun (string input): 44.0424 s

So, it seems that cellfun with an anonymous function is slower than a simple for loop, but if you are calling a built-in MATLAB function, use cellfun and pass the function name as a quoted string. This is not necessarily true in all cases, but it holds at least for the functions tested.

The implemented test code (I am far from being an optimization specialist, so here is the code in case I did something wrong):

function ...
  [loop_timeAdd,cellfun_timeAdd,...
  loop_timeStr,cellfun_timeStr,...
  loop_timeBuiltIn,cellfun_timeBuiltInStrInput,...
  cellfun_timeBuiltyInFcnHandle,...
  loop_timeNonUniform,cellfun_timeNonUniform] ...
  = test_cellfun(nTimes,nCells)

myCell = repmat({0},1,nCells);
output = zeros(1,nCells);

% Basic operation
tic;
for k=1:nTimes
  for m=1:nCells
    output(m) = myCell{m} + 1;
  end
end
loop_timeAdd=toc;

tic;
for k=1:nTimes
  output = cellfun(@(in) in+1,myCell);
end
cellfun_timeAdd=toc;

% String operation
myCell = repmat({'matchStr'},1,nCells); % Add str that matches
myCell(1:2:end) = {'dontMatchStr'}; % Add another str that doesnt match
output = zeros(1,nCells);

tic;
for k=1:nTimes
  for m=1:nCells
    output(m) = strcmp(myCell{m},'matchStr');
  end
end
loop_timeStr=toc;

tic;
for k=1:nTimes
  output = cellfun(@(in) strcmp(in,'matchStr'),myCell);
end
cellfun_timeStr=toc;

% Builtin function (isempty)
myCell = cell(1,nCells); % Empty
myCell(1:2:end) = {0}; % not empty
output = zeros(1,nCells);

tic;
for k=1:nTimes
  for m=1:nCells
    output(m) = isempty(myCell{m});
  end
end
loop_timeBuiltIn=toc;

tic;
for k=1:nTimes
  output = cellfun(@isempty,myCell);
end
cellfun_timeBuiltyInFcnHandle=toc;

tic;
for k=1:nTimes
  output = cellfun('isempty',myCell);
end
cellfun_timeBuiltInStrInput=toc;

% Non-uniform output (regexp)
myCell = repmat({'John'},1,nCells);
myCell(1:2:end) = {'Doe'};
output = cell(1,nCells);

tic;
for k=1:nTimes
  for m=1:nCells
    output{m} = regexp(myCell{m},'John','match');
  end
end
loop_timeNonUniform=toc;

tic;
for k=1:nTimes
  output = cellfun(@(in) regexp(in,'John','match'),myCell,...
    'UniformOutput',false);
end
cellfun_timeNonUniform=toc;
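
The recommendation above about built-in functions can be sketched as follows. This is a minimal illustration with hypothetical sample data (the variable names are not from the answer); it only shows that the legacy quoted-name form and the function-handle form return the same result, not how fast either one is.

```matlab
% Hypothetical sample data: a mix of empty and non-empty cells.
c = {[], 1, '', 'abc', {}};

% Legacy form: the function name is passed as a character vector.
viaName = cellfun('isempty', c);

% Function-handle form: same result, different calling style.
viaHandle = cellfun(@isempty, c);

% Both calls return a 1x5 logical array marking the empty cells.
disp(isequal(viaName, viaHandle));
disp(viaName);
```

Timing the two forms on your own MATLAB release is the only way to know whether the gap reported above still applies.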

r/matlab on Reddit: Cellfun... shall I use it?

May 8, 2023 -

Hi everyone,

Lately I am using functions like cellfun, arrayfun, etc. all the time to avoid writing loops. I was wondering if this is a good practise.

Or is a simple loop, which is much easier to write and read, the better approach?

In addition a for loop can run in parallel later.

Cellfun and arrayfun are generally much slower than a simple loop. Their main use case is to simplify code where performance is not an issue. A better alternative is vectorization, which is both fast and concise. (For gpuArrays, arrayfun is faster, but that is a special case.)
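
As a rough sketch of what "vectorization" means here, assume a hypothetical cell array of numeric scalars. The data and variable names below are illustrative only; the point is that concatenating the cells once and applying a single array operation gives the same values as calling cellfun with an anonymous function.

```matlab
% Hypothetical cell array holding one numeric scalar per cell.
c = num2cell(1:5);

% Element-by-element version: one anonymous-function call per cell.
viaCellfun = cellfun(@(v) v + 1, c);

% Vectorized version: concatenate the cells once, then add in one operation.
viaVector = [c{:}] + 1;

% Both approaches produce the same 1x5 numeric array.
disp(isequal(viaCellfun, viaVector));
```

This only works when the cell contents can be concatenated into one regular array; mixed types or sizes still need a loop or cellfun.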

1 of 6

2 of 6

Like others have said, there's no huge performance difference between for loops and cellfun. Use whichever is clearer. I generally use them when I'm taking advantage of a simple thing like isnan, isempty, etc. When you have to start dealing with the anonymous functions in them, it can get a little messy, and that's typically where I start moving towards for loops. Sometimes I will break that rule and use cellfun purely for the purpose of not having to store intermediate values in a for loop.

mathworks.com › matlabcentral › answers › 42335-array-cellfun-vs-for-loop

array/cellfun vs. for loop - MATLAB Answers - MATLAB Central

For loops are usually faster than arrayfun or cellfun, as the for loop does not need to invoke the function handle each time. The for loop also has opportunities for optimizations between statements that the arrayfun or cellfun would not have. arrayfun() or cellfun() can be faster to write the code for, as they are a higher level concept. Not always, though: some of the twists one has to go through to create the behaviour as an anonymous function can be messy.

UBC Computer Science

cs.ubc.ca › ~murphyk › Software › matlabTutorial › html › speedup.html

Speedup tricks

In the last example, we used cellfun() function but there is a similar function arrayfun() that applies a function to every element of an array. When other vectorization techniques fail, this can be a better alternative than looping over every element yourself.
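
One case where arrayfun can stand in for a loop is a struct array, where there is no direct array operation over a field. The sketch below uses a hypothetical struct array with a field named v; it is an illustration of the idea, not code from the tutorial.

```matlab
% Hypothetical 1x3 struct array; each element holds a vector of different length.
s = struct('v', {1:3, 4:5, []});

% Apply a function to every struct element and collect one scalar per element.
counts = arrayfun(@(e) numel(e.v), s);

disp(counts);
```

An explicit for loop over s would do the same job; which one to prefer is the readability and speed trade-off discussed in the other results.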

reddit.com › r/matlab › investigating the speed of arrayfun vs alternatives

r/matlab on Reddit: Investigating the speed of arrayfun vs alternatives

November 10, 2016 -

I have been writing some matlab code and I've found myself using arrayfun to elegantly perform certain operations without a for loop. (similar to list comprehension in Python).

However, I started to think about its speed, so I made a contrived example of squaring a large matrix (my actual uses are a bit more fancy). See the code below if you're interested.

The final times were:

method	time (s)
array_fun	5.1231
array_fun_fixed	5.0522
pre_loop	0.0343
non_pre_loop	0.3276
pre_loop_fun	0.5520
pre_loop_fun_each	71.9136
non_pre_loop_fun	0.8565
direct	9.1100e-04

I expected the direct method to be the fastest, and I expected the preallocated loops to be faster than the dynamically sized ones. I also expected the one that creates the function each time to be the slowest.

I also kind of expected an anonymous function to be slower than direct manipulation. But what really surprised me was (a) the HUGE slowdown of arrayfun and (b) that it was (much) slower than a loop calling the same function. It seems that fixing the function does not matter, but arrayfun still underperforms a loop.

If anything, I would expect arrayfun to be on par with a preallocated loop. Assuming UniformOutput is set to true (the default), MATLAB knows the final size of the returned array. Furthermore, it should just be looping over it in a pretty clear manner.
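
A small sketch of the 'UniformOutput' behavior, using a hypothetical 3-by-3 input rather than the post's 1000-by-1000 matrix. In MATLAB the option defaults to true, in which case arrayfun returns an ordinary array the same size as its input; with false it returns a cell array of that size.

```matlab
% Small hypothetical input matrix.
R = magic(3);

% Default ('UniformOutput' true): each call must return a scalar,
% and the results are collected into a numeric array the size of R.
A = arrayfun(@(n) n^2, R);

% 'UniformOutput' false: results are collected into a cell array instead.
C = arrayfun(@(n) n^2, R, 'UniformOutput', false);

disp(class(A));   % double
disp(class(C));   % cell
disp(isequal(A, R.^2));
```

In both cases the output size is known from the input size before any element is processed.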

Any thoughts or insights into the overhead and methods of this?

Thanks

Code

The script I ran:

clear
R = rand(1000);

% arrayfun with anonymous function
clear BLA
tic
    BLA = arrayfun(@(n) n^2,R);
T.array_fun = toc;

clear BLA
tic
    FUN = @(n) n^2;
    BLA = arrayfun(FUN,R);
T.array_fun_fixed = toc;

% Preallocated loop
clear BLA
tic
    BLA = zeros(size(R));
    for ii = 1:size(R,1)
        for jj = 1:size(R,2)
            BLA(ii,jj) = R(ii,jj)^2;
        end
    end 
T.pre_loop = toc;

% Non-Preallocated loop
clear BLA
tic
    for ii = 1:size(R,1)
        for jj = 1:size(R,2)
            BLA(ii,jj) = R(ii,jj)^2;
        end
    end 
T.non_pre_loop = toc;

% Preallocated loop with anonymous function
clear BLA
tic
    fun = @(n) n^2;
    BLA = zeros(size(R));
    for ii = 1:size(R,1)
        for jj = 1:size(R,2)
            BLA(ii,jj) = fun(R(ii,jj));
        end
    end 
T.pre_loop_fun = toc;

% Preallocated loop with anonymous function created EACH TIME
clear BLA
tic
    BLA = zeros(size(R));
    for ii = 1:size(R,1)
        for jj = 1:size(R,2)
            fun = @(n) n^2;
            BLA(ii,jj) = fun(R(ii,jj));
        end
    end 
T.pre_loop_fun_each = toc;

% Non-Preallocated loop with anonymous function
clear BLA
tic
    fun = @(n) n^2;
    for ii = 1:size(R,1)
        for jj = 1:size(R,2)
            BLA(ii,jj) = fun(R(ii,jj));
        end
    end 
T.non_pre_loop_fun = toc;

% Direct
clear BLA
tic
    BLA = R.^2;
T.direct = toc;

1 of 3

Looking around on Stack Overflow, it seems like this is recognized behavior. But that doesn't mean there aren't cases where arrayfun can perform better.

On my machine, my times are roughly twice yours for your code. But consider this example instead:

x = gpuArray(rand(100,100,3));
tic
    z = arrayfun(@(x) x^2, x);
toc

z = gpuArray(zeros(length(x),1));
tic
    for i=1:length(x(:))
        z(i) = x(i).^2;
    end
toc

Elapsed time is 0.011370 seconds.
Elapsed time is 17.763806 seconds.

Arrayfun's assumptions that allow for parallel code lead to a huge jump in performance in that use case (mostly because the for loop approach isn't great for running on the GPU). Not sure if it's even a fair comparison, frankly.

On Matlab Central, one poster suggests avoiding arrayfun if you're not planning on using a GPU: https://www.mathworks.com/matlabcentral/answers/144344-in-my-code-arrayfun-slower-than-for-loop

With the exception of being on a GPU, arrayfun will most likely often be slower than a for-loop and harder to read. It's just a less flexible more complex for-loop.

Personally, I'd recommend against it at all cost unless you're targeting a GPU.

Obviously, my for loop on a GPU is much slower, and the GPU arrayfun call is roughly the same speed as your "direct" method on my machine. For more complex examples, I'd expect the GPU approach to really show its stuff.

2 of 3

I enjoy these types of experiments. Some comments on your benchmarking methodology:

- I would either use MATLAB's built-in 'timeit' function or take an average of many iterations of the same thing. Several results appear to be within the noise of a standard deviation.
- Since you're benchmarking, you want to ensure MATLAB is not getting in the way by trying to be helpful. It does all sorts of voodoo behind the scenes to anticipate and optimize execution. That is, use 'clear all' between each test.
- It goes without saying that your benchmarking is at the mercy of the OS, so the more you can do to reduce interrupts the better.

Last, I don't think 'arrayfun' is intended for the usage you demonstrated. Matrix operations make no sense for arrayfun. But imagine you had an array of 1s and 0s and you wanted to test whether a given 1 was surrounded by 0s. That's the kind of thing 'arrayfun' would make easier, albeit not necessarily faster.
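
The 'timeit' suggestion could look roughly like this. It is a sketch under assumptions: the matrix is a hypothetical 200-by-200 input (smaller than the original post's) so the arrayfun case finishes quickly, and the handle names are made up for illustration. timeit takes a function handle with no inputs, calls it several times, and returns a typical (median) run time in seconds.

```matlab
% Hypothetical test matrix, smaller than the 1000x1000 one in the post.
R = rand(200);

% Wrap each candidate in a zero-argument function handle for timeit.
fArrayfun = @() arrayfun(@(n) n^2, R);
fDirect   = @() R.^2;

% timeit runs each handle repeatedly and returns a median time in seconds.
tArrayfun = timeit(fArrayfun);
tDirect   = timeit(fDirect);

fprintf('arrayfun: %.6f s\n', tArrayfun);
fprintf('direct:   %.6f s\n', tDirect);
```

The absolute numbers depend on the machine and MATLAB release, so none are shown here.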

mathworks.com › matlabcentral › answers › 873328-speeding-up-using-cellfunction-and-arrayfun-versus-for-loop

speeding up using cellfunction and arrayfun versus for-loop - MATLAB Answers - MATLAB Central

July 6, 2021 - An exception are the commands provided ... cellfun and arrayfun are cool and allow a compact notation. But they are not designed to be faster than loops....

blogs.mathworks.com › loren › 2019 › 09 › 25 › which-way-to-compute-cellfun-or-for-loop

Which Way to Compute: cellfun or for-loop? » Loren on the Art of MATLAB - MATLAB & Simulink

September 25, 2019 - If you do use one of the sanctioned fix character arrays (and now scalar strings), however, you will get performance equivalent to the for-loop version. Here are the strings you can supply as the first input to cellfun and expect high performance ...

reddit.com › r/matlab › cellfun or for loop

r/matlab on Reddit: Cellfun or For loop

September 11, 2019 -

Hi, I'm currently doing an assignment and I'm just wondering which one would be quicker: cellfun or a for loop?

Tell us once you tried it :)

1 of 5

2 of 5

I’m predicting a minor speed difference, if any. The reason to sometimes prefer cellfun is that, when you are reading the code, it describes the intention of doing the same thing to each element of your cell array more concisely than a for loop does.

stackoverflow.com › questions › 76196836 › is-anything-faster-than-or-equally-as-fast-as-matlabs-cellfun-function-in-py

arrays - Is anything faster than or equally as fast as MATLAB's cellfun( ) function in Python for complex matrix operations? - Stack Overflow

May 8, 2023 - When I do it using MATLAB's cellfun( ) function, it takes 0.1 seconds. In Python, whether I use loop or map function or list comprehension, the same operation takes around 0.6 seconds.

mathworks.com › matlabcentral › answers › 340649-built-in-functions-vs-explicit-loops-which-is-faster-in-matlab

Built-in functions VS explicit loops: which is faster in MATLAB? - MATLAB Answers - MATLAB Central

May 17, 2017 - I did some speed tests with implicit functions and explicit loops, both repeat 1000 times and calculate average time. First one is eTe: ... In the first test, ete is faster than for loops; in the second test, for loops are faster than cellfun.

Google Groups

groups.google.com › g › comp.soft-sys.matlab › c › DCGTm-BhGIE

"cellfun" Is Slower Than "for" Loop

January 28, 2011 - % Function Handle for the cellfun operation fun = @(x) x .* 5; ... Results: Elapsed time is 1.598640 seconds. Elapsed time is 0.861648 seconds. Looping through the cell is faster by a factor of 2. How could that be?

mathworks.com › matlabcentral › answers › 362399-matlab-performance-question-nested-for-loops-vs-inbuilt-functions-cellfun-circshift

Matlab Performance Question (Nested for loops vs inbuilt functions (cellfun, circshift)) - MATLAB Answers - MATLAB Central

Well, I got a few minutes at the airport. Try this in your comparison: tic ; s = cellfun('length', t) ; v = cumsum(s) ; e_cw = double([t{:}]) ; e_cw = [e_cw; e_cw(2:end),0] ; e_cw(2,v) = e_cw(1,[0,v(1:end-1)]+1) ; toc (your move, Andrei ;) )

mathworks.com › matlabcentral › answers › 1449189-is-there-a-better-way-to-use-cellfun-with-arguments-and-is-it-better-than-for-loop

Is there a better way to use cellfun with arguments? and is it better than for-loop? - MATLAB Answers - MATLAB Central

sizes = cellfun('size', cellArr, 1); Using the CHAR vector arguments is mentioned in the documentation as "backward compatibility". This calling style is the only one, in which cellfun is faster than a simple loop.

Medium

medium.com › mathworks › which-way-to-compute-cellfun-or-for-loop-bfedfd4b46c0

Which Way to Compute: cellfun or for-loop? | by MathWorks Editor | MathWorks | Medium

November 27, 2019 - If you do use one of the sanctioned fix character arrays (and now scalar strings), however, you will get performance equivalent to the for-loop version. Here are the strings you can supply as the first input to cellfun and expect high performance ...

mathworks.com › matlabcentral › answers › 119498-use-cellfun-instead-of-for-loop

Use cellfun instead of for loop - MATLAB Answers - MATLAB Central

February 28, 2014 - Use cellfun instead of for loop. Learn more about cellfun

mathworks.com › matlabcentral › answers › 388205-if-function-with-cellfun-i-e-vectorized-code-instead-of-ridiculously-slow-loop

If function with cellfun (i.e. vectorized code instead of ridiculously slow loop). - MATLAB Answers - MATLAB Central

March 14, 2018 - https://www.mathworks.com/matlabcentral/answers/388205-if-function-with-cellfun-i-e-vectorized-code-instead-of-ridiculously-slow-loop#comment_545316 ... Using cellfun isn't the same as vectorizing and won't make things faster. It just hides ...
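
To make the distinction concrete, here is a sketch of an "if inside a loop" rewritten with logical indexing, which is what vectorizing usually means in MATLAB. The data and variable names are hypothetical and are not taken from the linked question.

```matlab
% Hypothetical input: replace every negative entry with zero.
x = [-2 3 -1 5];

% Loop version: test each element with an if statement.
yLoop = x;
for k = 1:numel(x)
    if x(k) < 0
        yLoop(k) = 0;
    end
end

% Vectorized version: build a logical mask and assign through it in one step.
yVec = x;
yVec(x < 0) = 0;

disp(isequal(yLoop, yVec));
```

Wrapping the same if test in cellfun or arrayfun would still evaluate it once per element, which is why it is not counted as vectorization.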

stackoverflow.com › questions › 16143314 › matlab-arrayfun-cellfun-spfun-and-structfun-vs-simple-for-loop

Matlab: arrayfun, cellfun, spfun and structfun vs. simple for-loop - Stack Overflow

It really depends on what you call 'performance' :)

If you mean minimum execution time, well, sometimes *fun are faster (for example, cellfun('isempty', ...); (yes, string argument!) for sure beats the loop version). Sometimes a loop is faster. If you're on a Matlab version < 2006, go for the *fun functions by default. If you're on anything more recent, go for the loops by default. You'll still always have to profile to find out which one's faster.

As noted by Amro, if you have a GPU capable of doing FP arithmetic, and a recent version of Matlab that supports GPGPU, then a call to arrayfun for gpuArray inputs will be massively parallelized. However, no general statements can be made regarding execution time; for smaller arrays, or absolutely humongous ones, the overhead of copying everything over to the GPU might undo any benefit of parallelizing the computations, so...profiling is really the only way to know for sure.

If you mean minimum coding time, then I'd say it's usually faster to code in terms of *fun as long as the operations are simple. For anything complex it's usually better to go for the loop.

If you mean optimum readability and thus minimum time required for maintenance and implementation of changes in a professional context, for sure, go for the loop.

At this point in time, there's not really a clear-cut simple answer to your question :)

mathworks.com › matlabcentral › answers › 718275-use-arrayfun-and-cellfun-to-avoid-for-loops

Use arrayfun and cellfun to avoid for-loops - MATLAB Answers - MATLAB Central

January 16, 2021 - Use arrayfun and cellfun to avoid for-loops. Learn more about cellfun, arrayfun, avoid loops, for loop, performance, runtime optimizing MATLAB

stackoverflow.com › questions › 12522888 › arrayfun-can-be-significantly-slower-than-an-explicit-loop-in-matlab-why

arrays - arrayfun can be significantly slower than an explicit loop in matlab. Why? - Stack Overflow

1 of 2

102

You can get the idea by running other versions of your code. Consider explicitly writing out the computations, instead of using a function in your loop

tic
Soln3 = ones(T, N);
for t = 1:T
    for n = 1:N
        Soln3(t, n) = 3*x(t, n)^2 + 2*x(t, n) - 1;
    end
end
toc

Time to compute on my computer:

Soln1  1.158446 seconds.
Soln2  10.392475 seconds.
Soln3  0.239023 seconds.
Oli    0.010672 seconds.

Now, while the fully 'vectorized' solution is clearly the fastest, you can see that defining a function to be called for every x entry is a huge overhead. Just explicitly writing out the computation gave us a factor of 5 speedup. I guess this shows that MATLAB's JIT compiler does not support inline functions. According to the answer by gnovice there, it is actually better to write a normal function rather than an anonymous one. Try it.

Next step - remove (vectorize) the inner loop:

tic
Soln4 = ones(T, N);
for t = 1:T
    Soln4(t, :) = 3*x(t, :).^2 + 2*x(t, :) - 1;
end
toc

Soln4  0.053926 seconds.

Another factor of 5 speedup: there is something in those statements saying you should avoid loops in MATLAB... Or is there really? Have a look at this then:

tic
Soln5 = ones(T, N);
for n = 1:N
    Soln5(:, n) = 3*x(:, n).^2 + 2*x(:, n) - 1;
end
toc

Soln5   0.013875 seconds.

Much closer to the 'fully' vectorized version. Matlab stores matrices column-wise. You should always (when possible) structure your computations to be vectorized 'column-wise'.
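
The column-wise storage can be seen on a tiny example. This sketch uses a hypothetical 2-by-3 matrix and only illustrates the memory order; it says nothing about timings.

```matlab
% Hypothetical 2x3 matrix filled in storage order: A = [1 3 5; 2 4 6].
A = reshape(1:6, 2, 3);

% Linear indexing walks down each column first, then moves to the next column.
storageOrder = A(:).';

% Elements of one column are neighbors in memory (linear indices 1 and 2) ...
firstColumn = A(:, 1).';

% ... while elements of one row are size(A,1) positions apart.
firstRow = A(1, :);

% Linear index of element (2,1) versus element (1,2).
idxDown   = sub2ind(size(A), 2, 1);
idxAcross = sub2ind(size(A), 1, 2);

disp(storageOrder);
disp([idxDown idxAcross]);
```

This is why the inner loop, or the vectorized slice, is usually best run over the first (row) index.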

We can go back to Soln3 now. The loop order there is 'row-wise'. Let's change it:

tic
Soln6 = ones(T, N);
for n = 1:N
    for t = 1:T
        Soln6(t, n) = 3*x(t, n)^2 + 2*x(t, n) - 1;
    end
end
toc

Soln6  0.201661 seconds.

Better, but still very bad. Single loop - good. Double loop - bad. I guess MATLAB did some decent work on improving the performance of loops, but the loop overhead is still there. If you had some heavier work inside, you would not notice. But since this computation is memory-bandwidth bound, you do see the loop overhead. And you will see the overhead of calling Func1 there even more clearly.

So what's up with arrayfun? No function inlining there either, so a lot of overhead. But why so much worse than a double nested loop? Actually, the topic of using cellfun/arrayfun has been extensively discussed many times (e.g. here, here, here and here). These functions are simply slow; you cannot use them for such fine-grained computations. You can use them for code brevity and fancy conversions between cells and arrays. But the function needs to be heavier than what you wrote:

tic
Soln7 = arrayfun(@(a)(3*x(:,a).^2 + 2*x(:,a) - 1), 1:N, 'UniformOutput', false);
toc

Soln7  0.016786 seconds.

Note that Soln7 is a cell now, which is sometimes useful. Code performance is quite good now, and if you need a cell as output, you do not need to convert your matrix after you have used the fully vectorized solution.

So why is arrayfun slower than a simple loop structure? Unfortunately, it is impossible for us to say for sure, since there is no source code available. You can only guess that since arrayfun is a general-purpose function, which handles all kinds of different data structures and arguments, it is not necessarily very fast in simple cases, which you can directly express as loop nests. We cannot know where the overhead comes from. Could the overhead be avoided by a better implementation? Maybe not. But unfortunately the only thing we can do is study the performance to identify the cases in which it works well and those where it doesn't.

Update: Since the execution time of this test is short, I added a loop around the tests to get reliable results:

for i=1:1000
   % compute
end

Some times given below:

Soln5   8.192912 seconds.
Soln7  13.419675 seconds.
Oli     8.089113 seconds.

You see that arrayfun is still bad, but at least not three orders of magnitude worse than the vectorized solution. On the other hand, a single loop with column-wise computations is as fast as the fully vectorized version... That was all done on a single CPU. Results for Soln5 and Soln7 do not change if I switch to 2 cores. In Soln5 I would have to use a parfor to get it parallelized. Forget about speedup... Soln7 does not run in parallel because arrayfun does not run in parallel. Oli's vectorized version, on the other hand:

Oli  5.508085 seconds.

2 of 2

-8

That is because

x = randn(T, N);

is not of type gpuArray.

All you need to do is

x = randn(T, N,'gpuArray');