specialized notation for multivariable calculus
In mathematics, matrix calculus is a specialized notation for doing multivariable calculus, especially over spaces of matrices. It collects the various partial derivatives of a single function with respect to many variables, … Wikipedia
🌐
Wikipedia
en.wikipedia.org › wiki › Matrix_calculus
Matrix calculus - Wikipedia
October 9, 2025 - More complicated examples include the derivative of a scalar function with respect to a matrix, known as the gradient matrix, which collects the derivative with respect to each matrix element in the corresponding position in the resulting matrix. In that case the scalar must be a function of ...
🌐
Stanford University
web.stanford.edu › class › math114 › lecture_notes › gradients_involving_matrices.pdf pdf
CME 108/MATH 114 Introduction to Scientific Computing Summer 2019
and a symmetric n × n matrix A. In this case, the corresponding differential is ... Remark. The formula ∇f(x) = 2Ax, which interprets the gradient as a column
Top answer
1 of 3
6

$$f(x) = \frac{1}{2}x^\top Px + q^\top x + r$$ is a normal real-valued function. If you want, you can write it componentwise as

$$f(x) = {1\over 2}\sum_j\sum_k p_{jk}x_jx_k + \sum_j q_jx_j + r$$

Now the first double sum contains the $x_jx_k$ term twice if $j\neq k$, and if $j=k$ it becomes an $x_j^2$ term, so the derivative with respect to $x_j$ becomes:

$$\frac{\partial f}{\partial x_j} = \frac{1}{2}\sum_k p_{jk}x_k + \frac{1}{2}\sum_k p_{kj}x_k + q_j$$

Which in matrix notation becomes

$$\nabla f(x) = \frac{1}{2}\left(P + P^\top\right)x + q,$$

and for symmetric $P$ this simplifies to $\nabla f(x) = Px + q$.
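As a sanity check on the answer above, the closed-form gradient $\frac{1}{2}(P + P^\top)x + q$ of $f(x) = \frac{1}{2}x^\top Px + q^\top x + r$ can be compared against central finite differences. A minimal NumPy sketch with randomly chosen $P$, $q$, $r$ (my choice, purely for illustration):

```python
import numpy as np

# Toy instance of f(x) = 0.5 x^T P x + q^T x + r, with P not assumed symmetric.
rng = np.random.default_rng(0)
n = 4
P = rng.standard_normal((n, n))
q = rng.standard_normal(n)
r = 1.5
x = rng.standard_normal(n)

f = lambda x: 0.5 * x @ P @ x + q @ x + r

# Closed-form gradient derived above: 0.5 (P + P^T) x + q
grad_closed = 0.5 * (P + P.T) @ x + q

# Independent check: central finite differences, one coordinate at a time
eps = 1e-6
grad_fd = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(n)
])

print(np.allclose(grad_closed, grad_fd, atol=1e-5))  # True
```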

2 of 3
3

I would simply use the Gâteaux derivative. That derivative is the natural extension of the 1D derivative to higher dimensions. Since your function maps $f:ℝ^n→ℝ$, we need an arbitrary direction $δx∈ℝ^n$ and a small increment $ε∈ℝ$. Using that formulation, the Gâteaux derivative for your function reads \begin{align*} d(\|Ax-b\|²;[x,δx]) = (\frac{d}{dε}\|A(x+εδx) - b\|²)\big|_{ε=0} \end{align*}

First it is \begin{align*} \frac{d}{dε}\|A(x+εδx) - b\|² =& \frac{d}{dε}[(A(x+εδx) - b, A(x+εδx) - b)] \\ =&\frac{d}{dε}[\{(Ax, Ax)+ (Ax,Aεδx) + (Ax, -b)\} \\ &+ \{(Aεδx, Ax) + (Aεδx, Aεδx) + (Aεδx, -b)\} \\ &+ \{(-b, Ax) + (-b, Aεδx) + (-b, -b)\} ] \\ =¹&\frac{d}{dε}[\{\|Ax\|²+ \|b\|²+ 2(Ax, -b)\} \\ &+ ε\{2(Ax,Aδx) + 2(-b, Aδx)\} \\ &+ ε²\|Aδx\|² ]\\ =& \{2(Ax,Aδx) + 2(-b, Aδx)\} + 2ε\|Aδx\|². \end{align*} ¹Sorting by powers of ε.

Setting ε=0, yields \begin{align*} (\frac{d}{dε}\|A(x+εδx) - b\|²)\big|_{ε=0} &= 2(Ax,Aδx) + 2(-b, Aδx) \\ &= 2(Ax-b, Aδx)= (2A^\top[Ax-b], δx). \end{align*}

Hence, the derivative is $\nabla f(x) = 2A^\top[Ax-b]$.

That is because $∇f = (∂_{e_1}f, ∂_{e_2}f, …)^\top$. So replacing $δx$ with $e_i$ gives: $$∂_{e_i}f = \{2A^\top[Ax-b]\}_i.$$

Higher derivatives can be calculated in the same way: \begin{align*} \frac{d}{dε}\big(2A^\top[A(x+εδx)-b]\big)\big|_{ε=0} &= (2A^\top Aδx)\big|_{ε=0} \\ &= 2A^\top Aδx \end{align*} $⇒∇^2f(x) = 2A^\top A.$
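The Gâteaux-derivative results above (gradient $2A^\top[Ax-b]$ and Hessian $2A^\top A$ for $f(x)=\|Ax-b\|^2$) can be verified numerically with finite differences. A quick sketch on random data (shapes chosen arbitrarily):

```python
import numpy as np

# Check grad f(x) = 2 A^T (A x - b) for f(x) = ||Ax - b||^2 on random data.
rng = np.random.default_rng(1)
m, n = 5, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)

f = lambda x: np.sum((A @ x - b) ** 2)

grad = 2 * A.T @ (A @ x - b)   # closed form from the derivation
hess = 2 * A.T @ A             # constant Hessian of the quadratic

# Central finite differences as an independent check
eps = 1e-6
grad_fd = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(n)
])

print(np.allclose(grad, grad_fd, atol=1e-4))  # True
```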

🌐
Wikipedia
en.wikipedia.org › wiki › Gradient
Gradient - Wikipedia
2 weeks ago - In rectangular coordinates, the gradient of a vector field f = ( f1, f2, f3) is defined by: ... {\displaystyle \nabla \mathbf {f} =g^{jk}{\frac {\partial f^{i}}{\partial x^{j}}}\mathbf {e} _{i}\otimes \mathbf {e} _{k},} (where the Einstein summation notation is used and the tensor product of the vectors ei and ek is a dyadic tensor of type (2,0)). Overall, this expression equals the transpose of the Jacobian matrix:
🌐
MathWorks
mathworks.com › matlab › mathematics › numerical integration and differential equations › numerical integration and differentiation
gradient - Numerical gradient - MATLAB
[FX,FY] = gradient(F) returns the x and y components of the two-dimensional numerical gradient of matrix F. The additional output FY corresponds to ∂F/∂y, which are the differences in the y (vertical) direction.
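NumPy's `numpy.gradient` computes the same kind of numerical gradient as MATLAB's `gradient`, but note the axis order: it returns the axis-0 (row, i.e. vertical/y) differences first, the opposite of MATLAB's `[FX,FY]`. A small sketch (example matrix is my own):

```python
import numpy as np

# np.gradient uses central differences in the interior and one-sided
# differences at the edges, returning one array per axis (axis 0 first).
F = np.array([[1.0, 2.0, 4.0],
              [2.0, 4.0, 8.0]])

FY, FX = np.gradient(F)   # FY ~ dF/dy (rows), FX ~ dF/dx (columns)
```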
🌐
Explained
explained.ai › matrix-calculus
The Matrix Calculus You Need For Deep Learning
As another example, let's sum the result of multiplying a vector by a constant scalar. If then . The gradient is: The derivative with respect to scalar variable z is : We can't compute partial derivatives of very complicated functions using just the basic matrix calculus rules we've seen so far.
🌐
Math Insight
mathinsight.org › gradient_vector
The gradient vector - Math Insight
When we write vectors as matrices, we tend to write an $n$-dimensional vector as an $n \times 1$ column matrix. But, in this case, we'll make an exception, and view this derivative matrix as a vector, called the gradient of $f$ and denoted as $\nabla f$: $$\nabla f(\vc{x}) = \left(\pdiff{f}{x_1}(\vc{x}), \pdiff{f}{x_2}(\vc{x}), \cdots, \pdiff{f}{x_n}(\vc{x}) \right).$$
🌐
Stanford
web.stanford.edu › class › archive › cs › cs224n › cs224n.1184 › readings › gradient-notes.pdf pdf
Computing Neural Network Gradients Kevin Clark 1 Introduction
Since this matrix has the same shape as W, we could just subtract it (times the learning rate) from W when doing gradient descent. So (in a slight abuse of notation) let's find this matrix as ∂J/∂W instead. This way of arranging the gradients becomes complicated when computing ...
🌐
Edward Hu
edwardshu.com › posts › matrix-matrix-gradient
Edward Hu | Gradient of a Matrix Matrix multiplication
July 28, 2018 - We want to find $\frac{\partial L}{\partial W}$, so let’s start by looking at a specific weight $W_{dc}$. This way we can think more easily about the gradient of $L$ for a single weight and extrapolate for all weights $W$. \[\frac{\partial L}{\partial W_{dc}}=\sum_{i,j} \frac{\partial L}{\partial D_{ij}}\frac{\partial D_{ij}}{\partial W_{dc}}\]
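The elementwise chain-rule sum in the snippet collapses to a single matrix identity. Assuming the forward pass is $D = XW$ with a toy scalar loss $L$ (both my assumptions for illustration; the post's exact shapes may differ), the sum over all $D_{ij}$ gives $\partial L/\partial W = X^\top\,(\partial L/\partial D)$, which can be checked numerically:

```python
import numpy as np

# Assumed forward pass D = X @ W and toy loss L = 0.5 * sum(D**2).
rng = np.random.default_rng(2)
X = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 2))

L = lambda W: 0.5 * np.sum((X @ W) ** 2)

dL_dD = X @ W            # dL/dD for this particular loss
dL_dW = X.T @ dL_dD      # claimed identity: dL/dW = X^T (dL/dD)

# Finite-difference check, one weight W[d, c] at a time (as in the post)
eps = 1e-6
fd = np.zeros_like(W)
for d in range(W.shape[0]):
    for c in range(W.shape[1]):
        E = np.zeros_like(W)
        E[d, c] = eps
        fd[d, c] = (L(W + E) - L(W - E)) / (2 * eps)

print(np.allclose(dL_dW, fd, atol=1e-4))  # True
```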
🌐
Peter Frick
frickp.github.io › matrix-gradient-descent.html
Peter Frick – Gradient descent by matrix multiplication
February 23, 2017 -

%%time
# Initialize variables
learned_weights = orig_learned_weights.copy()
y_hat_minus_y = np.zeros((num_samples, 1))
gradient = np.zeros((X.shape[1], 1))
gradient_step = np.zeros((X.shape[1], 1))
for epoch in range(10000):
    for i in range(X.shape[0]):
        y_hat_minus_y[i] = (X[i,0] * learned_weights[0] + X[i,1] * learned_weights[1] + X[i,2] * learned_weights[2]) - y[i]
    for j in range(X.shape[1]):
        gradient[j] = np.sum(y_hat_minus_y * X[:,j].reshape(num_samples,1))
        gradient_step[j] = gradient[j]/num_samples*learning_rate
        learned_weights[j] = learned_weights[j] - gradient_step[j]

CPU times: user 6.5 s, sys: 8.09 ms, total: 6.51 s; Wall time: 6.51 s · The matrix implementation is about an order of magnitude faster (~0.7 s vs 7 s).
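The vectorized counterpart the timing comparison refers to replaces both inner loops with matrix products: the residual vector is `X @ w - y` and the gradient is `X.T @ residual / num_samples`. A minimal sketch on synthetic data (variable names follow the snippet; the data and shapes are my assumptions):

```python
import numpy as np

# Synthetic least-squares problem; in the post, X and y come from real data.
rng = np.random.default_rng(3)
num_samples, num_features = 100, 3
X = rng.standard_normal((num_samples, num_features))
y = rng.standard_normal((num_samples, 1))
learning_rate = 0.01

learned_weights = np.zeros((num_features, 1))
for epoch in range(10000):
    y_hat_minus_y = X @ learned_weights - y          # all residuals at once
    gradient = X.T @ y_hat_minus_y / num_samples     # replaces the j-loop
    learned_weights -= learning_rate * gradient
```

After enough iterations this converges to the ordinary least-squares solution, which is why the loop and matrix versions agree on the learned weights.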
🌐
Stanford CCRMA
ccrma.stanford.edu › ~dattorro › matrixcalc.pdf pdf
Appendix D Matrix Calculus
APPENDIX D. MATRIX CALCULUS · The gradient of vector-valued function v(x) : ℝ→ℝ^N on real domain is a row vector
🌐
Robot Chinwag
robotchinwag.com › posts › gradient-of-matrix-multiplicationin-deep-learning
Gradients of Matrix Multiplication in Deep Learning | Robot Chinwag
January 15, 2025 - Firstly, what does $\partial Y / \partial X$ mean? $Y$ is a function with a matrix output and $X$ is a matrix input to that function. The object $\partial Y / \partial X$ is the collection of gradients, and it has one gradient for each $Y$ component with respect to each $X$ component.
🌐
Reddit
reddit.com › r/mathhelp › how does the gradient operator work with a matrix?
r/MathHelp on Reddit: How does the gradient operator work with a matrix?
January 3, 2019 -

I'm doing some math warm-up questions for a fluids class I'm about to start and I'm stuck on a vector calculus question.

Matrix A is a 2x2 matrix with row 1 = [a b] and row 2 = [c d]. (Sorry, I can't figure out how to make a decent looking matrix, but I think you get the idea).

The exact question goes as follows: Show that ∇ · (∇ ∧ A) = 0. Here ∇ is the gradient operator in 2D and is given by ∇ = (∂/∂x)i + (∂/∂y)j.

I think the wedge operator is being used like a cross operator (×). Now, I understand that for a vector field, this is basically saying that the divergence of the curl is zero and that makes sense to me. Given a vector-valued function, I would just cross ∇ and the function and then take the dot product with what comes out of the cross product and find zero. What's throwing me off here is that A is a matrix and not a vector-valued function.

Could someone please help me understand what (∇ ∧ A) means? I've thought of just multiplying the ∇ vector with A and then dotting ∇ with the result, which gets me ∇A = [(∂/∂x)a + (∂/∂y)c]i + [(∂/∂x)b + (∂/∂y)d]j, and then ∇ · (∇A) = (∂/∂x)[(∂/∂x)a + (∂/∂y)c] + (∂/∂y)[(∂/∂x)b + (∂/∂y)d]. I don't think that's what I'm supposed to do, though. Help is much appreciated!

🌐
Stack Overflow
stackoverflow.com › questions › 57761618 › how-to-calculate-the-gradient-of-a-matrix
math - How to calculate the gradient of a matrix - Stack Overflow
let f(x) = [2x^2, 3y^5] I know how to calculate the derivative of f(x), which will be [d/dx 2x^2, d/dx 3y^5]. Is there a similar process being done when calculating the gradient of f(x)? If not, ...
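One common answer to this kind of question: for a vector-valued function, the gradient generalizes to the Jacobian matrix, with one row of partial derivatives per output component. A sketch for the asker's example $f(x, y) = [2x^2,\; 3y^5]$ (the helper names are my own):

```python
import numpy as np

def f(v):
    # Vector-valued function f(x, y) = [2x^2, 3y^5]
    x, y = v
    return np.array([2 * x**2, 3 * y**5])

def jacobian(v):
    # Row i holds the partials of output i: d(2x^2) = [4x, 0], d(3y^5) = [0, 15y^4]
    x, y = v
    return np.array([[4 * x, 0.0],
                     [0.0, 15 * y**4]])

v = np.array([1.0, 2.0])
print(jacobian(v))
# [[  4.   0.]
#  [  0. 240.]]
```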
🌐
UBC Computer Science
cs.ubc.ca › ~schmidtm › Courses › 340-F16 › linearQuadraticGradients.pdf pdf
Deriving the Gradient of Linear and Quadratic Functions in Matrix Notation
October 21, 2016 - Deriving the Gradient of Linear and Quadratic Functions ... for a scalar β. But in this case we still have ∇f(w) = a since the y-intercept β does not depend on w. ... where w is a length-d vector and A is a d by d matrix.
🌐
Scribd
scribd.com › document › 882562266 › Gradients-Involving-Matrices
Matrix Gradient Computation Methods | PDF | Gradient | Matrix (Mathematics)
The key difference between the "gradient" and the "differential" particularly in matrix functions lies in their representation: gradients are typically considered as column vectors, while differentials are represented as row vectors.
🌐
Springer
link.springer.com › home › indian journal of pure and applied mathematics › article
What is the gradient of a scalar function of a symmetric matrix? | Indian Journal of Pure and Applied Mathematics
August 15, 2022 - For a real valued function \(\phi \) of a matrix argument, the gradient \(\nabla \phi \) is calculated using a standard approach that follows from the definition of a Fréchet derivative for matrix functionals. In cases where the matrix argument is restricted to the space of symmetric matrices, the approach is easily modified to determine that the gradient ought to be \((\nabla \phi + \nabla \phi ^T)/2\).
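The symmetrization rule can be illustrated with the simple functional $\phi(X)=\operatorname{tr}(BX)$ (my choice of example, not taken from the paper): over symmetric directions $H$, the symmetrized matrix $(B+B^\top)/2$ reproduces the directional derivative exactly, because the skew part of $B$ is orthogonal to every symmetric $H$ under the Frobenius inner product.

```python
import numpy as np

# phi(X) = tr(B X): unconstrained Frobenius gradient is B^T, but over
# symmetric X the gradient is (B + B^T)/2. Check that both give the same
# directional derivative <G, H>_F for a symmetric direction H.
rng = np.random.default_rng(4)
n = 3
B = rng.standard_normal((n, n))
H = rng.standard_normal((n, n))
H = (H + H.T) / 2                    # symmetric direction

d_exact = np.trace(B @ H)            # d/dt tr(B (X + t H)) at t = 0
G_sym = (B + B.T) / 2                # symmetrized gradient

print(np.isclose(np.sum(G_sym * H), d_exact))  # True
```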
🌐
YouTube
youtube.com › watch
Example: Matrix Gradient