The space usage is at most $O(n^2)$ for all Strassen-like algorithms (i.e. those based on upper bounding the rank of matrix multiplication algebraically). See Space complexity of Coppersmith–Winograd algorithm
However, I realized in my previous answer that I did not explain why the space usage is $O(n^2)$... so here goes something hand-wavy. Consider what a Strassen-like algorithm does. It starts from a fixed algorithm for $n_0 \times n_0$ matrix multiplication that uses $r = n_0^c$ multiplications for some constant $c < 3$. In particular, this algorithm (whatever it is) can WLOG be written so that:

1. It computes $r$ different matrices $A_1, \dots, A_r$ which multiply entries of the first matrix $A$ by various scalars, and $r$ matrices $B_1, \dots, B_r$ from the second matrix $B$ of a similar form,
2. It multiplies those linear combinations, $C_i = A_i \cdot B_i$, then
3. It multiplies the entries of each $C_i$ by various scalars, then adds all these matrices up entrywise to obtain $C = A \cdot B$.

(This is a so-called "bilinear" algorithm, but it turns out that every "algebraic" matrix multiplication algorithm can be written in this way.) For each $i$, this algorithm only has to store the current product $C_i$ and the current value of $C$ (initially set to all-zeroes) in memory at any given point, so the space usage is $O(n_0^2)$.
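To make the bilinear form concrete, here is a minimal Python sketch (the function name `bilinear_mm` and the coefficient arrays `U`, `V`, `W` are my own illustrative encoding of the scalars from steps 1 and 3, not taken from any specific paper) that evaluates such an algorithm while keeping only the current product and the accumulator in memory:

```python
import numpy as np

def bilinear_mm(U, V, W, A, B):
    """Evaluate a bilinear matrix-multiplication algorithm.

    U, V have shape (r, n0*n0) and W has shape (n0*n0, r), so that
    vec(C) = sum_i W[:, i] * (U[i] . vec(A)) * (V[i] . vec(B)).
    Besides the inputs, only one product p and the accumulator c are
    alive at any point, matching the O(n0^2) space bound above.
    """
    a, b = A.ravel(), B.ravel()
    c = np.zeros(W.shape[0])
    for i in range(U.shape[0]):        # the r multiplications
        p = (U[i] @ a) * (V[i] @ b)    # product of two linear combinations
        c += W[:, i] * p               # scatter the i-th product into C
    return c.reshape(A.shape[0], B.shape[1])
```

With $r = 7$ and Strassen's coefficients for $n_0 = 2$ this reproduces his algorithm; the recursive sketch below writes those seven products out explicitly.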
Given this finite algorithm, it is then extended to arbitrary $n \times n$ matrices, by breaking the large matrices into $n_0 \times n_0$ blocks of dimensions $(n/n_0) \times (n/n_0)$, applying the finite $n_0 \times n_0$ algorithm to the block matrices, and recursively calling the algorithm whenever it needs to multiply two blocks. At each level of recursion, we need to keep only $O(n^2)$ field elements in memory (storing $O(1)$ different $n \times n$ matrices). Assuming the space usage for $(n/n_0) \times (n/n_0)$ matrix multiplication is $S(n/n_0)$, the space usage of this recursive algorithm satisfies $S(n) \le S(n/n_0) + O(n^2)$, which for $n_0 \ge 2$ solves to $S(n) = O(n^2)$.
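To make the recursion concrete, here is a runnable Python sketch with Strassen's base case ($n_0 = 2$, seven multiplications), assuming for simplicity that the dimension is a power of two; the function name is mine:

```python
import numpy as np

def strassen(A, B):
    """Multiply two n x n matrices, n a power of two, with 7 recursive
    block products. Each level allocates O(1) matrices of the current
    size, so the extra space follows S(n) <= S(n/2) + O(n^2) = O(n^2)."""
    n = A.shape[0]
    if n == 1:
        return A * B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22)   # the 7 products of
    M2 = strassen(A21 + A22, B11)         # linear combinations of blocks
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    C = np.empty((n, n))                  # reassemble C entrywise
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```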
More generally, fast matrix multiplication can be done on $p$ processors in $O(n^2/p)$ memory per processor. However, the communication between processors is then suboptimal. Optimal communication can be achieved by using more memory. As far as I know, it is not known whether optimal communication and optimal memory can be achieved simultaneously. Details are in http://dx.doi.org/10.1007/PL00008264
Using linear algebra, there exist algorithms that achieve better complexity than the naive O(n^3). Strassen's algorithm achieves a complexity of O(n^2.807) by reducing the number of multiplications required for each 2x2 sub-matrix from 8 to 7.
The fastest known matrix multiplication algorithm is the Coppersmith-Winograd algorithm, with a complexity of O(n^2.3737). Unless the matrices are huge, these algorithms do not result in a vast difference in computation time. In practice, it is easier and faster to use parallel algorithms for matrix multiplication.
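Those exponents fall out of the recursion tree: seven block multiplications instead of eight turn the cost recurrence T(n) = 8T(n/2) + O(n^2) into T(n) = 7T(n/2) + O(n^2), i.e. n^(log2 8) = n^3 becomes n^(log2 7). A quick sanity check:

```python
import math

print(math.log2(8))  # 3.0      -> naive O(n^3)
print(math.log2(7))  # 2.807... -> Strassen's O(n^2.807)
```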
The naive algorithm, which is what you've got once you correct it as noted in comments, is O(n^3).
There do exist algorithms that reduce this somewhat, but you're not likely to find an O(n^2) implementation. I believe the question of the most efficient implementation is still open.
See this Wikipedia article on Matrix Multiplication for more information.
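For reference, a minimal sketch of the naive algorithm (three nested loops, hence n^3 scalar multiplications for square inputs):

```python
def naive_matmul(A, B):
    """Textbook multiplication of an n x m matrix by an m x p matrix."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):  # n * p * m multiply-adds in total
                C[i][j] += A[i][k] * B[k][j]
    return C
```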
I can't imagine an argument that the space required for an algorithm is less than what is required to store the result; that should be the lower bound of the space required.
But apparently my imagination is not up to the task at hand: by convention, neither the space for the input parameters nor the space for the output/result is counted against the algorithm.
So (as the comments below have convinced me): no.
As other responses say, you must differentiate between the space taken by the matrix itself and the space taken by the multiplication algorithm.
As for a classic NxM matrix data structure, the space taken is O(NM).
As for the algorithm per se, it depends: the basic sequential multiplication algorithm takes O(1) extra space, since it multiplies and sums one element at a time.
In a parallel algorithm multiplying NxM and MxP matrices, each processor takes O(1) space, since each process calculates one multiplication value at a time, but the total is O(X) in space, where X is the number of parallel processes working on the solution.
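A rough Python sketch of that parallel scheme, with one task per output entry (the names `parallel_matmul` and `_entry`, and the process-pool split, are mine, not from the answer):

```python
from concurrent.futures import ProcessPoolExecutor

def _entry(args):
    # One output entry: the running sum is the only working state,
    # O(1) space beyond the row and column the task was handed.
    a_row, b_col = args
    return sum(x * y for x, y in zip(a_row, b_col))

def parallel_matmul(A, B, workers=4):
    # Run under `if __name__ == "__main__":` on platforms that spawn
    # worker processes (e.g. Windows, macOS).
    n, p = len(A), len(B[0])
    cols = list(zip(*B))                      # columns of B
    tasks = [(A[i], cols[j]) for i in range(n) for j in range(p)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        flat = list(pool.map(_entry, tasks))  # X = n*p independent tasks
    return [flat[i * p:(i + 1) * p] for i in range(n)]
```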