You don't get any speed increase by calling matlabpool before arrayfun because creating a pool of workers does not, by itself, make existing code run on those workers. If you want to exploit the pool, you need to explicitly parallelize your code with parfor (related info here).
parfor k = 1:10
result{k} = sum(sum(a*b));
end
In general, arrayfun does not do any parallelization or acceleration. In fact, it's often slower than simply writing out the for loop because the explicit for loop allows for better JIT acceleration.
for k = 1:10
result(k) = sum(sum(a * b));
end
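To check the claim on your own machine, a minimal sketch like the following computes the same result both ways and times each. The matrix size and the variable names (loopResult, afResult) are assumptions for illustration, and the timings will vary with MATLAB release and hardware.

```matlab
a = rand(200);
b = rand(200);
n = 10;

% Explicit for loop with a preallocated output vector.
tic;
loopResult = zeros(1, n);
for k = 1:n
    loopResult(k) = sum(sum(a * b));
end
tLoop = toc;

% Same computation through arrayfun; the anonymous function ignores k.
tic;
afResult = arrayfun(@(k) sum(sum(a * b)), 1:n);
tArrayfun = toc;

fprintf('for loop: %.4f s, arrayfun: %.4f s\n', tLoop, tArrayfun);
```

Both versions run serially on the client; neither uses the worker pool.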
If you want to perform the operation you've shown on the GPU: when the input data to arrayfun is a gpuArray, arrayfun will execute on the GPU (using the gpuArray version of arrayfun). The issue, though, is that anything run on the GPU through arrayfun has to consist of element-wise operations only, so that the operation on each element is independent of the operations on all other elements (which is what makes it parallelizable). In your case, a*b is a matrix product rather than an element-wise operation, so the GPU version of arrayfun cannot be used.
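For contrast, here is a rough sketch of a computation that does fit the element-wise restriction. It assumes Parallel Computing Toolbox and a supported GPU; the variable names and the formula are made up for illustration.

```matlab
% Two gpuArray inputs of the same size.
x = gpuArray(rand(1000, 1));
y = gpuArray(rand(1000, 1));

% The function sees one scalar from x and one from y per call,
% so each output element is independent of all the others.
z = arrayfun(@(p, q) p * q + 1, x, y);

% Copy the result back to host memory.
zHost = gather(z);
```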
As a side-note, you'll want to use parpool rather than matlabpool since the latter has been deprecated.
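A minimal sketch of the parpool-based version, assuming Parallel Computing Toolbox in a release that has parpool and gcp; the matrix sizes are placeholders:

```matlab
% Start a pool only if one is not already running.
if isempty(gcp('nocreate'))
    parpool;
end

a = rand(200);
b = rand(200);

% result is a sliced output variable: each iteration writes its own element.
result = zeros(1, 10);
parfor k = 1:10
    result(k) = sum(sum(a * b));
end
```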
Core MATLAB does use threads and vector operations, but you have to vectorize the code yourself. For your example, you would write
A = rand(1000, 1000, 100);
B = sum( sum( A, 1 ), 2 );
B is now a 1-by-1-by-100 array of the sums. I've used two sums to help you understand what's going on. If you actually wanted to sum every number in a single matrix you'd write sum(A(:)), or for this batch example, sum( reshape(A, [], 100) ), which returns the sums as a 1-by-100 row vector.
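As a quick sanity check, the sketch below compares both vectorized forms against a plain loop over the pages. It uses a smaller array than the example above to keep memory use low; the names C and ref are introduced here for illustration only.

```matlab
A = rand(50, 50, 4);

% Sum down the rows, then across the columns: one value per page.
B = sum(sum(A, 1), 2);          % 1-by-1-by-4

% Flatten each page into a column, then sum each column.
C = sum(reshape(A, [], 4));     % 1-by-4

% Reference: loop over the pages one at a time.
ref = zeros(1, 4);
for k = 1:4
    page = A(:, :, k);
    ref(k) = sum(page(:));
end

% Largest absolute differences; both should be at rounding-error level.
disp(max(abs(squeeze(B).' - ref)));
disp(max(abs(C - ref)));
```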
For task parallelism rather than data parallelism, use parfor, batch, parfeval, or some other parallel construct.
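As a rough illustration of the parfeval route, the sketch below submits one function call to a worker and then collects its result. It assumes Parallel Computing Toolbox; magic(4) is just a stand-in for real work.

```matlab
% Get the current pool (gcp starts one if automatic pool creation is enabled).
pool = gcp();

% Run magic(4) asynchronously on a worker, requesting one output.
f = parfeval(pool, @magic, 1, 4);

% Block until the task finishes, then retrieve its output.
m = fetchOutputs(f);
```

The client is free to do other work between the parfeval call and fetchOutputs.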
Unfortunately, this is not supported by Parallel Computing Toolbox in R2012b. The gpuArray version of arrayfun currently does not support binding constant data into an anonymous function handle. Arguments to arrayfun must be passed directly, and must all be either scalar or the same size.
If you could bind in the constant arguments, you would next discover that you cannot currently index into them (or perform any non-scalar operations on them).
Perhaps you might be able to build up your algorithm using supported routines such as CONV2 or FILTER2.
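The original algorithm isn't shown here, so the following is only a guess at the kind of rewrite meant: a neighbourhood operation expressed with conv2 instead of per-element indexing. The 3-by-3 window and the magic(4) input are assumptions. The sketch runs on the CPU as written; conv2 also accepts gpuArray inputs.

```matlab
A = magic(4);

% Sum over each element's 3-by-3 neighbourhood, zero-padded at the borders.
% The kernel is symmetric, so convolution and correlation give the same result.
S = conv2(A, ones(3), 'same');

% Interior check: S(2,2) should equal the sum of the top-left 3-by-3 block.
disp(S(2, 2) == sum(sum(A(1:3, 1:3))));
```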
This is a very old post, but since I was struggling with a similar issue, I wanted to share what I found out about it:
If you put your call to arrayfun inside a function, you might be able to implement the analyze function as a nested function that has access to your constant arrays. However, this may take considerable effort in rewriting your code, because inside the nested analyze function you cannot pass a full array to any other function. You have to rewrite everything so that it only uses single indexed entries of your constant arrays, e.g. in a for loop over the array. Accordingly, calls to functions like size will not work there and should be moved outside of analyze (at least this is the case for MATLAB R2015b, which I am using).
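A minimal sketch of that pattern, assuming Parallel Computing Toolbox, a supported GPU, and a release that allows nested functions with gpuArray arrayfun. The function name scaleByLookup, its arguments, and the doubling are hypothetical and only there to show the structure; save it as scaleByLookup.m.

```matlab
function out = scaleByLookup(idx, lut)
% Hypothetical helper: out(i) = 2 * lut(idx(i)), evaluated on the GPU.
    lut = gpuArray(lut);   % constant array, visible to the nested function
    n = numel(lut);        % size queries stay outside the nested function

    out = arrayfun(@lookup, gpuArray(idx));

    function y = lookup(i)
        % Only scalar operations and scalar indexing are used in here.
        j = min(max(i, 1), n);   % clamp the index into the valid range
        y = 2 * lut(j);
    end
end
```

It could be called as, for example, gather(scaleByLookup([1 3 2], [10 20 30])).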
Here is an example of how it can be done (not mine):
https://devblogs.nvidia.com/high-performance-matlab-gpu-acceleration/
Best,
Hans-Martin