If performance is a major factor you should avoid using cells, loops or cellfun/arrayfun. It's usually much quicker to use a vector operation (assuming this is possible).

The code below expands on Werner's add example with standard array loop and array operations.

The results (elapsed time in seconds, as measured by tic/toc) are:

  • Cell Loop Time - 0.1679
  • Cellfun Time - 2.9973
  • Loop Array Time - 0.0465
  • Array Time - 0.0019

Code:

nTimes = 1000;
nValues = 1000;
myCell = repmat({0},1,nValues);
output = zeros(1,nValues);

% Basic operation
tic;
for k=1:nTimes
  for m=1:nValues
    output(m) = myCell{m} + 1;
  end
end
cell_loop_timeAdd=toc;    
fprintf(1,'Cell Loop Time %0.4f\n', cell_loop_timeAdd);

tic;        
for k=1:nTimes
  output = cellfun(@(in) in+1,myCell);
end
cellfun_timeAdd=toc;
fprintf(1,'Cellfun Time %0.4f\n', cellfun_timeAdd);


myData = zeros(1,nValues);
tic;
for k=1:nTimes
  for m=1:nValues
    output(m) = myData(m) + 1;
  end
end
loop_timeAdd=toc;
fprintf(1,'Loop Array Time %0.4f\n', loop_timeAdd);

tic;
for k=1:nTimes
    output = myData + 1;
end
array_timeAdd=toc;
fprintf(1,'Array Time %0.4f\n', array_timeAdd);
Answer from grantnz on Stack Overflow
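
As a small follow-up to the advice above about avoiding cells: when a cell array only holds numeric scalars, one option is to unpack it into an ordinary array once and then use a vector operation. This is a minimal sketch under that assumption (every cell holds one numeric scalar); the variable name outputCell is introduced here for illustration only.

```matlab
% Sketch: unpack a cell array of numeric scalars once, then work on the array.
nValues = 1000;
myCell = repmat({0}, 1, nValues);   % same test data as in the answer above

% Comma-separated-list expansion concatenates all scalars into a 1-by-nValues row.
myData = [myCell{:}];

% One vectorized operation replaces the per-element loop.
output = myData + 1;

% If later code still expects a cell array, convert back once at the end.
outputCell = num2cell(output);
```

The conversion itself has a cost, so this only pays off when the array form is reused for more than a trivial amount of work.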
2 of 4
4

I'll add an answer with the results of my own tests, but I'd be glad if others contributed their knowledge; this is just a simple test.

I tested the following conditions with a cell size of 1000 and 1000 loops. The results are total times. I would probably have to run more than 1000 times, because the results fluctuate a little, but this is not a scientific article:

  • Basic operation (sum)
    • Simple for loop: 0.2663 s
    • cellfun: 9.4612 s
  • String Operation (strcmp)
    • Simple for loop: 1.3124 s
    • cellfun: 11.8099 s
  • Built-in (isempty)
    • Simple for loop: 8.9042 s
    • cellfun (string input -> see this reference): 0.0105 s
    • cellfun (fcn handle input -> see this reference): 0.9209 s
  • Non-uniform (regexp)
    • Simple for loop: 24.2157 s
    • cellfun (string input): 44.0424 s

So it seems that cellfun with an anonymous function is slower than a simple for loop, but if you are calling a built-in MATLAB function such as isempty, use cellfun and pass the function name as a quoted string ('isempty') rather than as a function handle. This is not necessarily true in all cases, but it held for the functions tested here.
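
To make the two calling styles concrete, here is a minimal sketch. The cell contents and the variable names emptyByName, emptyByHandle and lens are made up for illustration; 'isempty' and 'length' are two of the special names cellfun accepts as quoted strings.

```matlab
% Small mixed cell array used only for this illustration.
c = {[], 1:3, 'abc', {}, zeros(2,0)};

% Name passed as a quoted string: the form that was fastest in the timings above.
emptyByName = cellfun('isempty', c);

% Function-handle form: same result, but slower in the timings above.
emptyByHandle = cellfun(@isempty, c);

% 'length' is another name that can be passed as a quoted string.
lens = cellfun('length', c);
```

Both isempty calls return the same logical row vector; only the way cellfun dispatches the call differs.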

The implemented test code (I am far from being an optimization specialist, so here is the code in case I did something wrong):

function ...
  [loop_timeAdd,cellfun_timeAdd,...
  loop_timeStr,cellfun_timeStr,...
  loop_timeBuiltIn,cellfun_timeBuiltInStrInput,...
  cellfun_timeBuiltyInFcnHandle,...
  loop_timeNonUniform,cellfun_timeNonUniform] ...
  = test_cellfun(nTimes,nCells)

myCell = repmat({0},1,nCells);
output = zeros(1,nCells);

% Basic operation
tic;
for k=1:nTimes
  for m=1:nCells
    output(m) = myCell{m} + 1;
  end
end
loop_timeAdd=toc;

tic;
for k=1:nTimes
  output = cellfun(@(in) in+1,myCell);
end
cellfun_timeAdd=toc;

% String operation
myCell = repmat({'matchStr'},1,nCells); % Add str that matches
myCell(1:2:end) = {'dontMatchStr'}; % Add another str that doesnt match
output = zeros(1,nCells);

tic;
for k=1:nTimes
  for m=1:nCells
    output(m) = strcmp(myCell{m},'matchStr');
  end
end
loop_timeStr=toc;

tic;
for k=1:nTimes
  output = cellfun(@(in) strcmp(in,'matchStr'),myCell);
end
cellfun_timeStr=toc;

% Builtin function (isempty)
myCell = cell(1,nCells); % Empty
myCell(1:2:end) = {0}; % not empty
output = zeros(1,nCells);

tic;
for k=1:nTimes
  for m=1:nCells
    output(m) = isempty(myCell{m});
  end
end
loop_timeBuiltIn=toc;

tic;
for k=1:nTimes
  output = cellfun(@isempty,myCell);
end
cellfun_timeBuiltyInFcnHandle=toc;

tic;
for k=1:nTimes
  output = cellfun('isempty',myCell);
end
cellfun_timeBuiltInStrInput=toc;

% Non-uniform output (regexp)
myCell = repmat({'John'},1,nCells);
myCell(1:2:end) = {'Doe'};
output = cell(1,nCells);

tic;
for k=1:nTimes
  for m=1:nCells
    output{m} = regexp(myCell{m},'John','match');
  end
end
loop_timeNonUniform=toc;

tic;
for k=1:nTimes
  output = cellfun(@(in) regexp(in,'John','match'),myCell,...
    'UniformOutput',false);
end
cellfun_timeNonUniform=toc;
Discussions

speeding up using cellfunction and arrayfun versus for-loop
But it is expected that loops are faster: cellfun and arrayfun are mex functions, which have to call the MATLAB level for each element. An exception is the set of commands provided as char vectors to cellfun(): 'isempty', 'islogical', 'isreal', 'length', 'ndims', 'prodofsize', 'size', 'isclass'. They are processed at the mex level and are faster. cellfun and arrayfun are cool and allow a compact notation. But they are not designed to be faster than ...
🌐 mathworks.com
1
0
July 6, 2021
Matlab Performance Question (Nested for loops vs inbuilt functions (cellfun, circshift))
Even in cases where the cost of ... less than the "for" overhead, if you are low on memory then looping might turn out to be faster, as it can avoid pushing your memory use to the point where you are swapping. ... https://www.mathworks.com/matlabcentral/answers/362399-matlab-performance-question-nested-for-loops-vs-inbuilt-functions-cellfun-circshift... More on mathworks.com
🌐 mathworks.com
2
0
October 20, 2017
Cellfun or For loop
Tell us once you tried it :) More on reddit.com
🌐 r/matlab
6
3
September 11, 2019
🌐
MathWorks
mathworks.com › matlabcentral › answers › 873328-speeding-up-using-cellfunction-and-arrayfun-versus-for-loop
speeding up using cellfunction and arrayfun versus for-loop - MATLAB Answers - MATLAB Central
July 6, 2021 - An exception are the commands provided ... cellfun and arrayfun are cool and allow a compact notation. But they are not designed to be faster than loops....
🌐
MathWorks
blogs.mathworks.com › loren › 2019 › 09 › 25 › which-way-to-compute-cellfun-or-for-loop
Which Way to Compute: cellfun or for-loop? » Loren on the Art of MATLAB - MATLAB & Simulink
September 25, 2019 - If you do use one of the sanctioned fix character arrays (and now scalar strings), however, you will get performance equivalent to the for-loop version. Here are the strings you can supply as the first input to cellfun and expect high performance ...
🌐
UBC Computer Science
cs.ubc.ca › ~murphyk › Software › matlabTutorial › html › speedup.html
Speedup tricks
In the last example, we used cellfun() function but there is a similar function arrayfun() that applies a function to every element of an array. When other vectorization techniques fail, this can be a better alternative than looping over every element yourself.
🌐
MathWorks
mathworks.com › matlabcentral › answers › 340649-built-in-functions-vs-explicit-loops-which-is-faster-in-matlab
Built-in functions VS explicit loops: which is faster in MATLAB? - MATLAB Answers - MATLAB Central
May 17, 2017 - I did some speed tests with implicit functions and explicit loops, both repeat 1000 times and calculate average time. First one is eTe: ... In the first test, ete is faster than for loops; in the second test, for loops are faster than cellfun.
🌐
Google Groups
groups.google.com › g › comp.soft-sys.matlab › c › DCGTm-BhGIE
"cellfun" Is Slower Than "for" Loop
January 28, 2011 - % Function Handle for the cellfun operation fun = @(x) x .* 5; ... Results: Elapsed time is 1.598640 seconds. Elapsed time is 0.861648 seconds. Looping through the cell is faster by a factor of 2. How could that be?
🌐
MathWorks
mathworks.com › matlabcentral › answers › 388205-if-function-with-cellfun-i-e-vectorized-code-instead-of-ridiculously-slow-loop
If function with cellfun (i.e. vectorized code instead of ridiculously slow loop). - MATLAB Answers - MATLAB Central
March 14, 2018 - https://www.mathworks.com/matlabcentral/answers/388205-if-function-with-cellfun-i-e-vectorized-code-instead-of-ridiculously-slow-loop#comment_545316 ... Using cellfun isn't the same as vectorizing and won't make things faster. It just hides ...
🌐
Aalto
math.aalto.fi › ~apiola › matlab › opas › TUUTOR11 › html › speedup.html
Speedup tricks
In the last example, we used cellfun() function but there is a similar function arrayfun() that applies a function to every element of an array. When other vectorization techniques fail, this can be a better alternative than looping over every element yourself, but is not always much faster.
🌐
Medium
medium.com › mathworks › which-way-to-compute-cellfun-or-for-loop-bfedfd4b46c0
Which Way to Compute: cellfun or for-loop? | by MathWorks Editor | MathWorks | Medium
November 27, 2019 - If you do use one of the sanctioned fix character arrays (and now scalar strings), however, you will get performance equivalent to the for-loop version. Here are the strings you can supply as the first input to cellfun and expect high performance ...
🌐
Reddit
reddit.com › r/matlab › investigating the speed of arrayfun vs alternatives
r/matlab on Reddit: Investigating the speed of arrayfun vs alternatives
November 10, 2016 -

I have been writing some matlab code and I've found myself using arrayfun to elegantly perform certain operations without a for loop. (similar to list comprehension in Python).

However, I started to think about the speed of it, so I made a contrived example of squaring a large matrix (my actual uses are a bit more fancy). See the code below if you're interested.

The final times were:

method               time (s)
array_fun            5.1231
array_fun_fixed      5.0522
pre_loop             0.0343
non_pre_loop         0.3276
pre_loop_fun         0.5520
pre_loop_fun_each    71.9136
non_pre_loop_fun     0.8565
direct               9.1100e-04

I expected the direct version to be the fastest. And I expected the preallocated loops to be faster than the dynamically sized ones. I also expected the one where it creates the function each time to be the slowest.

And I kind of expected an anonymous function to be slower than direct manipulation. But what really surprised me was (a) the HUGE slowdown of arrayfun and (b) that it was (much) slower than a loop calling the same function. It seems as though fixing the function does not matter, but arrayfun still underperforms a loop.

If anything, I would expect arrayfun to be on par with a pre-allocated loop. Assuming UniformOutput is set to true (the default), MATLAB knows the final size of the returned array. Furthermore, it should just be looping over it in a pretty clear manner.
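
For reference, a minimal sketch of how the UniformOutput option changes what arrayfun returns. The small 3-by-4 input and the names sq and sqCell are chosen here for illustration and are not part of the original script.

```matlab
R = rand(3, 4);

% Default (UniformOutput = true): each call must return a scalar, and the
% results are collected into an array the same size as R.
sq = arrayfun(@(n) n^2, R);

% UniformOutput = false: results come back in a cell array the same size as R,
% which allows outputs of different sizes or types.
sqCell = arrayfun(@(n) n^2, R, 'UniformOutput', false);
```

In both cases the output dimensions follow the input, so the output size is known up front either way.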

Any thoughts or insights into the overhead and methods of this?

Thanks

Code

The script I ran:

clear
R = rand(1000);

% Arrayfun with anonymous function
clear BLA
tic
    BLA = arrayfun(@(n) n^2,R);
T.array_fun = toc;

clear BLA
tic
    FUN = @(n) n^2;
    BLA = arrayfun(FUN,R);
T.array_fun_fixed = toc;

% Preallocated loop
clear BLA
tic
    BLA = zeros(size(R));
    for ii = 1:size(R,1)
        for jj = 1:size(R,2)
            BLA(ii,jj) = R(ii,jj)^2;
        end
    end 
T.pre_loop = toc;

% Non-Preallocated loop
clear BLA
tic
    for ii = 1:size(R,1)
        for jj = 1:size(R,2)
            BLA(ii,jj) = R(ii,jj)^2;
        end
    end 
T.non_pre_loop = toc;

% Preallocated loop with anonymous function
clear BLA
tic
    fun = @(n) n^2;
    BLA = zeros(size(R));
    for ii = 1:size(R,1)
        for jj = 1:size(R,2)
            BLA(ii,jj) = fun(R(ii,jj));
        end
    end 
T.pre_loop_fun = toc;

% Preallocated loop with anonymous function created EACH TIME
clear BLA
tic
    BLA = zeros(size(R));
    for ii = 1:size(R,1)
        for jj = 1:size(R,2)
            fun = @(n) n^2;
            BLA(ii,jj) = fun(R(ii,jj));
        end
    end 
T.pre_loop_fun_each = toc;

% Non-Preallocated loop with anonymous function
clear BLA
tic
    fun = @(n) n^2;
    for ii = 1:size(R,1)
        for jj = 1:size(R,2)
            BLA(ii,jj) = fun(R(ii,jj));
        end
    end 
T.non_pre_loop_fun = toc;

% Direct
clear BLA
tic
    BLA = R.^2;
T.direct = toc;
Top answer
1 of 3
5

Looking around on StackOverflow, it seems like this is recognized behavior. But that doesn't mean there aren't cases where arrayfun can perform better.

On my machine, my times are roughly twice yours for your code. But consider this example instead:

x = gpuArray(rand(100,100,3));
tic
    z = arrayfun(@(x) x^2, x);
toc

z = gpuArray(zeros(size(x)));  % preallocate one output slot per element of x
tic
    for i=1:length(x(:))
        z(i) = x(i).^2;
    end
toc

Elapsed time is 0.011370 seconds.
Elapsed time is 17.763806 seconds.

Arrayfun's assumptions, which allow for parallel code, lead to a huge jump in performance in that use case (mostly because the for loop approach isn't great for running on the GPU). Not sure if it's even a fair comparison, frankly.

On Matlab Central, one poster suggests avoiding arrayfun if you're not planning on using a GPU: https://www.mathworks.com/matlabcentral/answers/144344-in-my-code-arrayfun-slower-than-for-loop

With the exception of being on a GPU, arrayfun will most likely often be slower than a for-loop and harder to read. It's just a less flexible more complex for-loop.

Personally, I'd recommend against it at all cost unless you're targeting a GPU.

Obviously, my for loop on a GPU is much slower, and the GPU arrayfun call is roughly the same speed as your "direct" method on my machine. For more complex examples, I'd expect the GPU approach to really show its stuff.

2 of 3
2

i enjoy these types of experiments. some comments on your benchmarking methodology.

  1. i would either use matlab's built-in 'timeit' function or take an average of many iterations of the same thing. several results appear in the noise of a standard deviation.

  2. since you're benchmarking, you want to ensure matlab's not getting in the way by trying to be helpful. it does all sorts of voodoo behind the scene to anticipate and optimize execution. that is, use 'clear all' between each test.

  3. goes without saying that your benchmarking is at the mercy of the os, so the more you can do to reduce interrupts the better.
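
the first point above could look roughly like this. it is only a sketch: the matrix is smaller than the 1000-by-1000 one in the post to keep the run short, and the handle names fArrayfun and fDirect are made up for the example.

```matlab
R = rand(200);   % smaller than the matrix in the post, to keep the run short

% Wrap each variant in a zero-argument function handle, as timeit expects.
fArrayfun = @() arrayfun(@(n) n^2, R);
fDirect   = @() R.^2;

% timeit runs each handle several times and returns a typical (median) time in seconds.
tArrayfun = timeit(fArrayfun);
tDirect   = timeit(fDirect);

fprintf('arrayfun: %.6f s\n', tArrayfun);
fprintf('direct:   %.6f s\n', tDirect);
```

because timeit repeats the call and handles warm-up itself, the numbers tend to be less noisy than a single tic/toc pair.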

last, i don't think 'arrayfun' is intended for the usage you demonstrated. matrix operations make no sense for arrayfun. but imagine you had an array of 1s and 0s and you wanted to test whether a given 1 was surrounded by 0s. that's the kind of thing 'arrayfun' would make easier, albeit not necessarily faster.
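
a rough sketch of the "1 surrounded by 0s" idea, under the assumption that "surrounded" means all eight neighbors (or fewer at the border) are 0. the matrix A and the helper name isIsolated are hypothetical, chosen only for this illustration.

```matlab
A = [1 0 0 0;
     0 1 0 0;
     0 0 0 1;
     0 0 0 0];

[nRows, nCols] = size(A);

% Hypothetical helper: true when A(r,c) is 1 and it is the only 1 in its
% 3-by-3 neighborhood (clipped at the matrix border).
isIsolated = @(r, c) A(r,c) == 1 && ...
    sum(sum(A(max(r-1,1):min(r+1,nRows), max(c-1,1):min(c+1,nCols)))) == 1;

% Apply the test at every (row, column) position.
[cc, rr] = meshgrid(1:nCols, 1:nRows);
isolated = arrayfun(isIsolated, rr, cc);
```

here only the 1 at position (3,4) is reported as isolated, since the 1s at (1,1) and (2,2) are diagonal neighbors of each other. the arrayfun call is compact, but as noted above it is not necessarily faster than the equivalent double loop.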

🌐
MathWorks
mathworks.com › matlabcentral › answers › 2093156-how-can-i-decrease-the-runtime-for-a-function-using-cellfun-or-parfor
How can I decrease the runtime for a function (using cellfun or parfor)? - MATLAB Answers - MATLAB Central
March 11, 2024 - One FOR-loop would be faster. Then you could also avoid things like NUM2CELL: rather than forcing MATLAB to duplicate the data in lots of separate arrays, just use a simple FOR-loop and indexing into some numeric arrays.
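
As an illustration of the suggestion in that snippet, here is a minimal sketch comparing a num2cell plus cellfun version with a plain for-loop that indexes into the numeric array. The example matrix and the names rowsAsCells, sumsCell and sumsLoop are assumptions made up for this example, not code from the linked thread.

```matlab
M = magic(4);

% Cell-based version: num2cell copies each row into its own cell first.
rowsAsCells = num2cell(M, 2);
sumsCell = cellfun(@(r) sum(r.^2), rowsAsCells);

% Loop version: index into the numeric array directly, with a preallocated result.
sumsLoop = zeros(size(M,1), 1);
for k = 1:size(M,1)
    sumsLoop(k) = sum(M(k,:).^2);
end
```

Both versions give the same column vector of row-wise sums of squares; for this particular computation a fully vectorized sum(M.^2, 2) would also work.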