specialized notation for multivariable calculus
In mathematics, matrix calculus is a specialized notation for doing multivariable calculus, especially over spaces of matrices. It collects the various partial derivatives of a single function with respect to many variables, … Wikipedia
🌐
Wikipedia
en.wikipedia.org › wiki › Matrix_calculus
Matrix calculus - Wikipedia
October 9, 2025 - More complicated examples include the derivative of a scalar function with respect to a matrix, known as the gradient matrix, which collects the derivative with respect to each matrix element in the corresponding position in the resulting matrix. In that case the scalar must be a function of ...
🌐
Stanford University
web.stanford.edu › class › math114 › lecture_notes › gradients_involving_matrices.pdf pdf
CME 108/MATH 114 Introduction to Scientific Computing Summer 2019
and a symmetric n × n matrix A. In this case, the corresponding differential is ... Remark. The formula ∇f(x) = 2Ax, which interprets the gradient as a column
Top answer
1 of 3
6

$$f(x) = \frac{1}{2}x^\top Px + q^\top x + r$$ is a normal real-valued function. If you want, you can write it componentwise as

$$f(x) = {1\over 2}\sum_j\sum_k p_{jk}x_jx_k + \sum_j q_jx_j + r$$

Now the first double sum contains the $x_jx_k$ term twice if $j\neq k$, and if $j=k$ it becomes an $x_j^2$ term, so the derivative with respect to $x_j$ becomes:

$$\frac{\partial f}{\partial x_j} = \frac{1}{2}\sum_k p_{jk}x_k + \frac{1}{2}\sum_k p_{kj}x_k + q_j$$

Which in matrix notation becomes

$$\nabla f(x) = \frac{1}{2}\left(P + P^\top\right)x + q,$$

and for symmetric $P$ this simplifies to $\nabla f(x) = Px + q$.
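As a sanity check on the answer above, the closed-form gradient $\frac{1}{2}(P + P^\top)x + q$ of $f(x) = \frac{1}{2}x^\top Px + q^\top x + r$ can be compared against central finite differences. A minimal NumPy sketch with randomly chosen $P$, $q$, $r$ (my choice, purely for illustration):

```python
import numpy as np

# Toy instance of f(x) = 0.5 x^T P x + q^T x + r, with P not assumed symmetric.
rng = np.random.default_rng(0)
n = 4
P = rng.standard_normal((n, n))
q = rng.standard_normal(n)
r = 1.5
x = rng.standard_normal(n)

f = lambda x: 0.5 * x @ P @ x + q @ x + r

# Closed-form gradient derived above: 0.5 (P + P^T) x + q
grad_closed = 0.5 * (P + P.T) @ x + q

# Independent check: central finite differences, one coordinate at a time
eps = 1e-6
grad_fd = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(n)
])

print(np.allclose(grad_closed, grad_fd, atol=1e-5))  # True
```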

2 of 3
3

I would simply use the Gâteaux derivative. That derivative is the natural extension of the 1D derivative to higher dimensions. Since your function maps $f:ℝ^n→ℝ$, we need an arbitrary direction $δx∈ℝ^n$ and a small increment $ε∈ℝ$. Using that formulation, the Gâteaux derivative for your function reads \begin{align*} d(\|Ax-b\|²;[x,δx]) = (\frac{d}{dε}\|A(x+εδx) - b\|²)\big|_{ε=0} \end{align*}

First it is \begin{align*} \frac{d}{dε}\|A(x+εδx) - b\|² =& \frac{d}{dε}[(A(x+εδx) - b, A(x+εδx) - b)] \\ =&\frac{d}{dε}[\{(Ax, Ax)+ (Ax,Aεδx) + (Ax, -b)\} \\ &+ \{(Aεδx, Ax) + (Aεδx, Aεδx) + (Aεδx, -b)\} \\ &+ \{(-b, Ax) + (-b, Aεδx) + (-b, -b)\} ] \\ =¹&\frac{d}{dε}[\{\|Ax\|²+ \|b\|²+ 2(Ax, -b)\} \\ &+ ε\{2(Ax,Aδx) + 2(-b, Aδx)\} \\ &+ ε²\|Aδx\|² ]\\ =& \{2(Ax,Aδx) + 2(-b, Aδx)\} + 2ε\|Aδx\|². \end{align*} ¹Sorting by powers of ε.

Setting ε=0, yields \begin{align*} (\frac{d}{dε}\|A(x+εδx) - b\|²)\big|_{ε=0} &= 2(Ax,Aδx) + 2(-b, Aδx) \\ &= 2(Ax-b, Aδx)= (2A^\top[Ax-b], δx). \end{align*}

Hence, the derivative is $\nabla f(x) = 2A^\top[Ax-b]$.

That is because $∇f = (∂_{e_1}f, ∂_{e_2}f, …)^\top$. So replacing $δx$ with $e_i$ gives: $$∂_{e_i}f = \{2A^\top[Ax-b]\}_i.$$

Higher derivatives can be calculated in the same way: \begin{align*} \frac{d}{dε}\big(2A^\top[A(x+εδx)-b]\big)\big|_{ε=0} &= (2A^\top Aδx)\big|_{ε=0} \\ &= 2A^\top Aδx \end{align*} $⇒∇^2f(x) = 2A^\top A.$
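The Gâteaux-derivative results above (gradient $2A^\top[Ax-b]$ and Hessian $2A^\top A$ for $f(x)=\|Ax-b\|^2$) can be verified numerically with finite differences. A quick sketch on random data (shapes chosen arbitrarily):

```python
import numpy as np

# Check grad f(x) = 2 A^T (A x - b) for f(x) = ||Ax - b||^2 on random data.
rng = np.random.default_rng(1)
m, n = 5, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)

f = lambda x: np.sum((A @ x - b) ** 2)

grad = 2 * A.T @ (A @ x - b)   # closed form from the derivation
hess = 2 * A.T @ A             # constant Hessian of the quadratic

# Central finite differences as an independent check
eps = 1e-6
grad_fd = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(n)
])

print(np.allclose(grad, grad_fd, atol=1e-4))  # True
```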

🌐
Wikipedia
en.wikipedia.org › wiki › Gradient
Gradient - Wikipedia
2 weeks ago - In rectangular coordinates, the gradient of a vector field f = ( f1, f2, f3) is defined by: ... {\displaystyle \nabla \mathbf {f} =g^{jk}{\frac {\partial f^{i}}{\partial x^{j}}}\mathbf {e} _{i}\otimes \mathbf {e} _{k},} (where the Einstein summation notation is used and the tensor product of the vectors ei and ek is a dyadic tensor of type (2,0)). Overall, this expression equals the transpose of the Jacobian matrix:
🌐
MathWorks
mathworks.com › matlab › mathematics › numerical integration and differential equations › numerical integration and differentiation
gradient - Numerical gradient - MATLAB
[FX,FY] = gradient(F) returns the x and y components of the two-dimensional numerical gradient of matrix F. The additional output FY corresponds to ∂F/∂y, which are the differences in the y (vertical) direction.
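NumPy's `numpy.gradient` computes the same kind of numerical gradient as MATLAB's `gradient`, but note the axis order: it returns the axis-0 (row, i.e. vertical/y) differences first, the opposite of MATLAB's `[FX,FY]`. A small sketch (example matrix is my own):

```python
import numpy as np

# np.gradient uses central differences in the interior and one-sided
# differences at the edges, returning one array per axis (axis 0 first).
F = np.array([[1.0, 2.0, 4.0],
              [2.0, 4.0, 8.0]])

FY, FX = np.gradient(F)   # FY ~ dF/dy (rows), FX ~ dF/dx (columns)
```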
🌐
Explained
explained.ai › matrix-calculus
The Matrix Calculus You Need For Deep Learning
As another example, let's sum the result of multiplying a vector by a constant scalar. If then . The gradient is: The derivative with respect to scalar variable z is : We can't compute partial derivatives of very complicated functions using just the basic matrix calculus rules we've seen so far.
🌐
Math Insight
mathinsight.org › gradient_vector
The gradient vector - Math Insight
When we write vectors as matrices, we tend to write an $n$-dimensional vector as an $n \times 1$ column matrix. But, in this case, we'll make an exception, and view this derivative matrix as a vector, called the gradient of $f$ and denoted as $\nabla f$: $$\nabla f(\vc{x}) = \left(\pdiff{f}{x_1}(\vc{x}), \pdiff{f}{x_2}(\vc{x}), \cdots, \pdiff{f}{x_n}(\vc{x}) \right).$$
🌐
Stanford
web.stanford.edu › class › archive › cs › cs224n › cs224n.1184 › readings › gradient-notes.pdf pdf
Computing Neural Network Gradients Kevin Clark 1 Introduction
Since this matrix has the same shape as W, we could just subtract it (times the learning rate) from W when doing gradient descent. So (in a slight abuse of notation) let's find this matrix as ∂J/∂W instead. This way of arranging the gradients becomes complicated when computing ...
🌐
Edward Hu
edwardshu.com › posts › matrix-matrix-gradient
Edward Hu | Gradient of a Matrix Matrix multiplication
July 28, 2018 - We want to find $\frac{\partial L}{\partial W}$, so let’s start by looking at a specific weight $W_{dc}$. This way we can think more easily about the gradient of $L$ for a single weight and extrapolate for all weights $W$. \[\frac{\partial L}{\partial W_{dc}}=\sum_{i,j} \frac{\partial L}{\partial D_{ij}}\frac{\partial D_{ij}}{\partial W_{dc}}\]
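The elementwise chain-rule sum in the snippet collapses to a single matrix identity. Assuming the forward pass is $D = XW$ with a toy scalar loss $L$ (both my assumptions for illustration; the post's exact shapes may differ), the sum over all $D_{ij}$ gives $\partial L/\partial W = X^\top\,(\partial L/\partial D)$, which can be checked numerically:

```python
import numpy as np

# Assumed forward pass D = X @ W and toy loss L = 0.5 * sum(D**2).
rng = np.random.default_rng(2)
X = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 2))

L = lambda W: 0.5 * np.sum((X @ W) ** 2)

dL_dD = X @ W            # dL/dD for this particular loss
dL_dW = X.T @ dL_dD      # claimed identity: dL/dW = X^T (dL/dD)

# Finite-difference check, one weight W[d, c] at a time (as in the post)
eps = 1e-6
fd = np.zeros_like(W)
for d in range(W.shape[0]):
    for c in range(W.shape[1]):
        E = np.zeros_like(W)
        E[d, c] = eps
        fd[d, c] = (L(W + E) - L(W - E)) / (2 * eps)

print(np.allclose(dL_dW, fd, atol=1e-4))  # True
```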
🌐
Peter Frick
frickp.github.io › matrix-gradient-descent.html
Peter Frick – Gradient descent by matrix multiplication
February 23, 2017 -

%%time
# Initialize variables
learned_weights = orig_learned_weights.copy()
y_hat_minus_y = np.zeros((num_samples, 1))
gradient = np.zeros((X.shape[1], 1))
gradient_step = np.zeros((X.shape[1], 1))
for epoch in range(10000):
    for i in range(X.shape[0]):
        y_hat_minus_y[i] = (X[i,0] * learned_weights[0] + X[i,1] * learned_weights[1] + X[i,2] * learned_weights[2]) - y[i]
    for j in range(X.shape[1]):
        gradient[j] = np.sum(y_hat_minus_y * X[:,j].reshape(num_samples,1))
        gradient_step[j] = gradient[j]/num_samples*learning_rate
        learned_weights[j] = learned_weights[j] - gradient_step[j]

CPU times: user 6.5 s, sys: 8.09 ms, total: 6.51 s; Wall time: 6.51 s · The matrix implementation is about an order of magnitude faster (~0.7 s vs 7 s).
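The vectorized counterpart the timing comparison refers to replaces both inner loops with matrix products: the residual vector is `X @ w - y` and the gradient is `X.T @ residual / num_samples`. A minimal sketch on synthetic data (variable names follow the snippet; the data and shapes are my assumptions):

```python
import numpy as np

# Synthetic least-squares problem; in the post, X and y come from real data.
rng = np.random.default_rng(3)
num_samples, num_features = 100, 3
X = rng.standard_normal((num_samples, num_features))
y = rng.standard_normal((num_samples, 1))
learning_rate = 0.01

learned_weights = np.zeros((num_features, 1))
for epoch in range(10000):
    y_hat_minus_y = X @ learned_weights - y          # all residuals at once
    gradient = X.T @ y_hat_minus_y / num_samples     # replaces the j-loop
    learned_weights -= learning_rate * gradient
```

After enough iterations this converges to the ordinary least-squares solution, which is why the loop and matrix versions agree on the learned weights.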
🌐
Stanford CCRMA
ccrma.stanford.edu › ~dattorro › matrixcalc.pdf pdf
Appendix D Matrix Calculus
APPENDIX D. MATRIX CALCULUS · The gradient of vector-valued function v(x) : ℝ→ℝ^N on real domain is a row vector
🌐
Robot Chinwag
robotchinwag.com › posts › gradient-of-matrix-multiplicationin-deep-learning
Gradients of Matrix Multiplication in Deep Learning | Robot Chinwag
January 15, 2025 - Firstly, what does $\partial Y / \partial X$ mean? $Y$ is a function with a matrix output and $X$ is a matrix input to that function. The object $\partial Y / \partial X$ is the collection of gradients, and it has one gradient for each $Y$ component with respect to each $X$ component.
🌐
Reddit
reddit.com › r/mathhelp › how does the gradient operator work with a matrix?
r/MathHelp on Reddit: How does the gradient operator work with a matrix?
January 3, 2019 -

I'm doing some math warm-up questions for a fluids class I'm about to start and I'm stuck on a vector calculus question.

Matrix A is a 2x2 matrix with row 1 = [a b] and row 2 = [c d]. (Sorry, I can't figure out how to make a decent looking matrix, but I think you get the idea).

The exact question goes as follows: Show that ∇ · (∇ ∧ A) = 0. Here ∇ is the gradient operator in 2D and is given by ∇ = (∂/∂x)i + (∂/∂y)j.

I think the wedge operator is being used like a cross operator (×). Now, I understand that for a vector field, this is basically saying that the divergence of the curl is zero and that makes sense to me. Given a vector-valued function, I would just cross ∇ and the function and then take the dot product with what comes out of the cross product and find zero. What's throwing me off here is that A is a matrix and not a vector-valued function.

Could someone please help me understand what (∇ ∧ A) means? I've thought of just multiplying the ∇ vector with A and then dotting ∇ with the result, which gets me ∇A = [(∂/∂x)a + (∂/∂y)c]i + [(∂/∂x)b + (∂/∂y)d]j, and then ∇ · (∇A) = (∂/∂x)[(∂/∂x)a + (∂/∂y)c] + (∂/∂y)[(∂/∂x)b + (∂/∂y)d]. I don't think that's what I'm supposed to do, though. Help is much appreciated!

🌐
Stack Overflow
stackoverflow.com › questions › 57761618 › how-to-calculate-the-gradient-of-a-matrix
math - How to calculate the gradient of a matrix - Stack Overflow
let f(x) = [2x^2, 3y^5] I know how to calculate the derivative of f(x), which will be [d/dx 2x^2, d/dx 3y^5]. Is there a similar process being done when calculating the gradient of f(x)? If not, ...
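One common answer to this kind of question: for a vector-valued function, the gradient generalizes to the Jacobian matrix, with one row of partial derivatives per output component. A sketch for the asker's example $f(x, y) = [2x^2,\; 3y^5]$ (the helper names are my own):

```python
import numpy as np

def f(v):
    # Vector-valued function f(x, y) = [2x^2, 3y^5]
    x, y = v
    return np.array([2 * x**2, 3 * y**5])

def jacobian(v):
    # Row i holds the partials of output i: d(2x^2) = [4x, 0], d(3y^5) = [0, 15y^4]
    x, y = v
    return np.array([[4 * x, 0.0],
                     [0.0, 15 * y**4]])

v = np.array([1.0, 2.0])
print(jacobian(v))
# [[  4.   0.]
#  [  0. 240.]]
```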
🌐
UBC Computer Science
cs.ubc.ca › ~schmidtm › Courses › 340-F16 › linearQuadraticGradients.pdf pdf
Deriving the Gradient of Linear and Quadratic Functions in Matrix Notation
October 21, 2016 - Deriving the Gradient of Linear and Quadratic Functions ... for a scalar β. But in this case we still have ∇f(w) = a since the y-intercept β does not depend on w. ... where w is a length-d vector and A is a d by d matrix.
🌐
Scribd
scribd.com › document › 882562266 › Gradients-Involving-Matrices
Matrix Gradient Computation Methods | PDF | Gradient | Matrix (Mathematics)
The key difference between the "gradient" and the "differential" particularly in matrix functions lies in their representation: gradients are typically considered as column vectors, while differentials are represented as row vectors.
🌐
Springer
link.springer.com › home › indian journal of pure and applied mathematics › article
What is the gradient of a scalar function of a symmetric matrix? | Indian Journal of Pure and Applied Mathematics
August 15, 2022 - For a real valued function \(\phi \) of a matrix argument, the gradient \(\nabla \phi \) is calculated using a standard approach that follows from the definition of a Fréchet derivative for matrix functionals. In cases where the matrix argument is restricted to the space of symmetric matrices, the approach is easily modified to determine that the gradient ought to be \((\nabla \phi + \nabla \phi ^T)/2\).
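The symmetrization rule can be illustrated with the simple functional $\phi(X)=\operatorname{tr}(BX)$ (my choice of example, not taken from the paper): over symmetric directions $H$, the symmetrized matrix $(B+B^\top)/2$ reproduces the directional derivative exactly, because the skew part of $B$ is orthogonal to every symmetric $H$ under the Frobenius inner product.

```python
import numpy as np

# phi(X) = tr(B X): unconstrained Frobenius gradient is B^T, but over
# symmetric X the gradient is (B + B^T)/2. Check that both give the same
# directional derivative <G, H>_F for a symmetric direction H.
rng = np.random.default_rng(4)
n = 3
B = rng.standard_normal((n, n))
H = rng.standard_normal((n, n))
H = (H + H.T) / 2                    # symmetric direction

d_exact = np.trace(B @ H)            # d/dt tr(B (X + t H)) at t = 0
G_sym = (B + B.T) / 2                # symmetrized gradient

print(np.isclose(np.sum(G_sym * H), d_exact))  # True
```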
🌐
YouTube
youtube.com › watch
Example: Matrix Gradient