Answer from dexhunter on Stack Overflow:

The way to calculate the gradient in this case is calculus (analytically, NOT numerically!). So we differentiate the loss function with respect to $w_{y_i}$ like this:

$\nabla_{w_{y_i}} L_i = - \left( \sum_{j\neq y_i} \mathbb{1}(w_j^Tx_i - w_{y_i}^Tx_i + \Delta > 0) \right) x_i$

and with respect to $w_j$ when $j \neq y_i$ like this:

$\nabla_{w_j} L_i = \mathbb{1}(w_j^Tx_i - w_{y_i}^Tx_i + \Delta > 0) x_i$

The $\mathbb{1}(\cdot)$ is just an indicator function: the whole margin expression inside it collapses to $1$ when the condition is true and to $0$ otherwise. When you write this in code, the example you provided is exactly what it computes.
Since you are working from the cs231n example, you should definitely check the course notes and videos if needed.
Hope this helps!
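One way to convince yourself the analytic gradient described above is correct is to compare it against a numerical finite-difference estimate. Below is a minimal sketch with made-up toy numbers (3 classes, 4 features, one sample; `delta` is the margin):

```python
import numpy as np

# Toy setup (made-up numbers): 3 classes, 4 features, one sample.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # one weight row per class
x = rng.normal(size=4)
y = 1                         # ground-truth class index
delta = 1.0

def loss(W):
    scores = W.dot(x)
    margins = np.maximum(0.0, scores - scores[y] + delta)
    margins[y] = 0.0
    return margins.sum()

# Analytic gradient from the two formulas above.
scores = W.dot(x)
margins = scores - scores[y] + delta
active = margins > 0          # indicator per class
active[y] = False
dW = np.zeros_like(W)
dW[active] = x                # rows j != y with positive margin get +x
dW[y] = -active.sum() * x     # ground-truth row gets -(count) * x

# Numerical check via central differences.
num = np.zeros_like(W)
h = 1e-5
for idx in np.ndindex(W.shape):
    Wp = W.copy(); Wp[idx] += h
    Wm = W.copy(); Wm[idx] -= h
    num[idx] = (loss(Wp) - loss(Wm)) / (2 * h)

print(np.allclose(dW, num, atol=1e-4))  # → True
```

The numerical check only fails if a margin lands exactly on the hinge kink, which does not happen for generic random inputs.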
If the subtraction is less than zero, the loss is zero, so the gradient of $W$ is also zero. If the subtraction is larger than zero, the gradient of $W$ is the partial derivative of the loss.
Let's start with the basics. The so-called gradient is just the ordinary derivative, that is, a slope. For example, the slope of the linear function $f(w) = w \cdot x$ equals $x$, so its gradient w.r.t. $w$ equals $x$. If $w$ and $x$ are not numbers but vectors, then the gradient is also a vector.
Another piece of good news is that the gradient is a linear operator. It means you can add functions and multiply by constants before or after differentiation, and it doesn't make any difference.
Now take the definition of the SVM loss for a single $i$-th observation. Each of its terms is
$\max(0, \text{something} - w_{y_i} \cdot x_i)$
where $\text{something} = w_j \cdot x_i + \Delta$ does not depend on $w_{y_i}$. Thus, the term equals
$\text{something} - w_{y_i} \cdot x_i$, if the latter is non-negative, and $0$ otherwise.
In the first (non-negative) case the loss is linear in $w_{y_i}$, so the gradient is just the slope of this function of $w_{y_i}$, that is, $-x_i$.
In the second (negative) case the loss is constant, so its derivative is also $0$.
To write these cases in one equation, we invent a function (it is called an indicator) $I(\text{condition})$, which equals $1$ if the condition is true, and $0$ otherwise. With this function, we can write
$I(\text{something} - w_{y_i} \cdot x_i > 0) \cdot (-x_i)$
If $\text{something} - w_{y_i} \cdot x_i > 0$, the first multiplier equals 1, and the gradient equals $-x_i$. Otherwise, the first multiplier equals 0, and the gradient does as well. So I just rewrote the two cases in a single line.
Now let's turn from a single term to the whole loss for the $i$-th observation. The loss is a sum of such terms over $j \neq y_i$. Because differentiation is linear, the gradient of a sum equals the sum of gradients, so we can write
$\text{total derivative} = \sum(I(\text{something} - w_{y_i} \cdot x_i > 0) \cdot (-x_i))$
Now, move the multiplier $-x_i$ from inside the sum to the beginning of the formula, and you will get your expression.
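The indicator-sum formula above can be checked with concrete numbers. A tiny sketch (all values made up): four classes, the correct class is index 1, and we count which classes violate the margin.

```python
import numpy as np

# Toy numbers (made up): scores for 4 classes on one observation.
scores = np.array([3.2, 5.1, 1.7, 4.9])
y = 1                              # correct class
delta = 1.0
x_i = np.array([0.5, -1.0, 2.0])   # the observation's feature vector

# Indicator for each j != y: did class j violate the margin?
indicators = (scores - scores[y] + delta) > 0
indicators[y] = False

# Sum-of-indicators form of the gradient w.r.t. w_{y}:
grad_wy = -indicators.sum() * x_i
print(indicators.sum())  # 1: only class 3 violates the margin
print(grad_wy)           # one violating class, so the gradient is -1 * x_i
```

Here only class 3 has $4.9 - 5.1 + 1 > 0$, so the count is 1 and the gradient is $-x_i$.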
David has provided a good answer. But I would point out that the sum in David's answer,
total_derivative = sum(I(something - w_y*x[i] > 0) * (-x[i]))
is different from the one in the original Nikhil's question:
$\nabla_{w_{y_i}} L_i = - \left( \sum_{j\neq y_i} \mathbb{1}(w_j^Tx_i - w_{y_i}^Tx_i + \Delta > 0) \right) x_i$
The above equation is still the gradient due to the $i$-th observation, but for the weight of the ground-truth class, i.e. $w_{y_i}$. There is a summation $\sum_{j \neq y_i}$ because $w_{y_i}$ appears in every term of the SVM loss $L_i$:
$L_i = \sum_{j\neq y_i} \max(0, w_j^Tx_i - w_{y_i}^Tx_i + \Delta)$
For every non-zero term, i.e. every $j$ with $w_j^Tx_i - w_{y_i}^Tx_i + \Delta > 0$, you obtain the gradient $-x_i$. In total, the gradient $\nabla_{w_{y_i}} L_i$ is $\text{numOfNonZeroTerm} \times (- x_i)$, the same as the equation above.
The gradients of individual observations (computed above) are then averaged to obtain the gradient for the whole batch of observations.
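The averaging step is just a mean over the per-observation gradients. A trivial sketch with a hypothetical stack of three per-observation gradient vectors (numbers made up):

```python
import numpy as np

# Hypothetical per-observation gradients, one row per observation.
grads_per_obs = np.array([[ 1.0, -2.0],
                          [ 3.0,  0.0],
                          [-1.0,  2.0]])

# The batch gradient is the mean across observations.
batch_grad = grads_per_obs.mean(axis=0)
print(batch_grad)  # → [1. 0.]
```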
Let's use the example of the SVM loss function for a single datapoint:
$L_i = \sum_{j\neq y_i} \left[ \max(0, w_j^Tx_i - w_{y_i}^Tx_i + \Delta) \right]$
Where $\Delta$ is the desired margin.
We can differentiate the function with respect to the weights. For example, taking the gradient with respect to $w_{y_i}$ we obtain:
$\nabla_{w_{y_i}} L_i = - \left( \sum_{j\neq y_i} \mathbb{1}(w_j^Tx_i - w_{y_i}^Tx_i + \Delta > 0) \right) x_i$
Where $\mathbb{1}$ is the indicator function, which is one if the condition inside is true and zero otherwise. While the expression may look scary when written out, when you're implementing this in code you'd simply count the number of classes that didn't meet the desired margin (and hence contributed to the loss function); the data vector $x_i$ scaled by this number is the gradient. Notice that this is the gradient only with respect to the row of $W$ that corresponds to the correct class. For the other rows, where $j \neq y_i$, the gradient is:
$\nabla_{w_j} L_i = \mathbb{1}(w_j^Tx_i - w_{y_i}^Tx_i + \Delta > 0) x_i$
Once you derive the expression for the gradient, it is straightforward to implement the expressions and use them to perform the gradient update.
Taken from the Stanford CS231n optimization notes posted on GitHub.
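The two gradient expressions above can be sketched as a small loop over classes for a single datapoint. Shapes and numbers below are made up for illustration:

```python
import numpy as np

# Made-up single datapoint: 4 classes, 3 features, one row of W per class.
rng = np.random.default_rng(2)
W = rng.normal(size=(4, 3))
x_i = rng.normal(size=3)
y_i = 2
delta = 1.0

scores = W.dot(x_i)
dW = np.zeros_like(W)
count = 0
for j in range(W.shape[0]):
    if j == y_i:
        continue
    if scores[j] - scores[y_i] + delta > 0:  # margin violated
        dW[j] = x_i                          # second expression: +x_i
        count += 1
dW[y_i] = -count * x_i                       # first expression: -(count) * x_i
print(count, dW.shape)
```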
First of all, note that the multi-class hinge loss is a function of $W_r$:
\begin{equation}
l(W_r) = \max( 0, 1 + \underset{r \neq y_i}{ \max } W_r \cdot x_i - W_{y_i} \cdot x_i)
\end{equation}
Next, the max function is non-differentiable at $0$, so we need to calculate its subgradient.
\begin{equation}
\frac{\partial l(W_r)}{\partial W_r} =
\begin{cases}
\{0\}, & W_{y_i}\cdot x_i > 1 + \underset{r \neq y_i}{ \max } W_r \cdot x_i \\
\{x_i\}, & W_{y_i}\cdot x_i < 1 + \underset{r \neq y_i}{ \max } W_r \cdot x_i\\
\{\alpha x_i\}, & \alpha \in [0,1], W_{y_i}\cdot x_i = 1 + \underset{r \neq y_i}{ \max } W_r \cdot x_i
\end{cases}
\end{equation}
In the second case, $W_{y_i}$ is independent of $W_r$. The above definition of the subgradient of the multi-class hinge loss is analogous to the subgradient of the binary hinge loss.
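To see which of the three subgradient cases applies, here is a small numeric check (all numbers made up), taking the derivative with respect to the maximizing rival row $W_r$ as in the cases above:

```python
import numpy as np

# Made-up example: 3 classes, 2 features.
x_i = np.array([1.0, 2.0])
W = np.array([[0.2, 0.1],    # class 0 (the true class)
              [0.5, -0.3],   # class 1
              [0.0, 0.4]])   # class 2
y_i = 0

scores = W.dot(x_i)
rival = max(scores[r] for r in range(W.shape[0]) if r != y_i)
if scores[y_i] > 1 + rival:
    sub = np.zeros_like(x_i)   # case 1: zero loss, subgradient {0}
elif scores[y_i] < 1 + rival:
    sub = x_i                  # case 2: subgradient {x_i}
else:
    sub = 0.5 * x_i            # kink: any alpha in [0, 1] times x_i works
print(sub)  # margin violated here, so the subgradient is x_i
```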
Can anyone elaborate on the SVM gradient equation described here in practical terms? http://cs231n.github.io/optimization-1/#gradcompute
Can you be more specific about the part you didn't understand? Are you just asking about the numerical gradient?
import numpy as np

score = X.dot(W)
#predicted (correct-class) scores
y_pred = score[range(score.shape[0]),y]
#calculating loss: clamp margins at zero so negative margins do not contribute
margins = np.maximum(0, score - y_pred[:,None] + delta)
margins[range(score.shape[0]),y] = 0
loss = np.sum(margins,axis=1)
#indicator variable equation - number of classes with loss > 0
non_zeros_count = (margins > 0).sum(axis=1)
#scaling input values by the count of non_zero loss
grad = X * non_zeros_count[:,None]
#updating the actual class value by multiplying by -1 - have a look at the link posted above
grad[range(score.shape[0]),y] *= -1
Please look at the code. I am having a hard time calculating the gradients. How do I proceed from here? Is it right so far?
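One issue with the snippet above is that `grad = X * non_zeros_count[:,None]` only scales each input row; the gradient should instead be a matrix with the same shape as `W`, built by scattering each $x_i$ into the columns of the violating classes. A hedged sketch of the standard vectorized completion, with made-up shapes for `X`, `y`, `W` (`delta` is the margin):

```python
import numpy as np

# Hypothetical small inputs so the sketch runs end to end.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))        # 5 samples, 4 features
y = rng.integers(0, 3, size=5)     # 3 classes
W = rng.normal(size=(4, 3))
delta = 1.0

score = X.dot(W)
y_pred = score[range(score.shape[0]), y]
margins = np.maximum(0.0, score - y_pred[:, None] + delta)
margins[range(score.shape[0]), y] = 0
loss = margins.sum() / X.shape[0]

# Indicator matrix: 1 where a class contributed to the loss.
ind = (margins > 0).astype(float)
# The correct-class column gets minus the count of violating classes.
ind[range(score.shape[0]), y] = -ind.sum(axis=1)
# dW has the same shape as W: X.T.dot(ind) scatters each x_i correctly.
dW = X.T.dot(ind) / X.shape[0]
print(dW.shape)  # (4, 3), same shape as W
```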