In machine learning, a loss function used for maximum-margin classification
In machine learning, the hinge loss is a loss function used for training classifiers. The hinge loss is used for "maximum-margin" classification, most notably for support vector machines (SVMs). For an intended … Wikipedia
Wikipedia
en.wikipedia.org › wiki › Hinge_loss
Hinge loss - Wikipedia
January 26, 2026 - In structured prediction, the hinge loss can be further extended to structured output spaces. Structured SVMs with margin rescaling use the following variant, where w denotes the SVM's parameters, y the SVM's predictions, φ the joint feature function, and Δ the Hamming loss:
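The snippet cuts off at the colon before the variant itself. The standard margin-rescaling structured hinge loss it refers to, restated here from the usual structured-SVM formulation rather than from the article text, is

\begin{equation} \ell = \underset{y' \in \mathcal{Y}}{\max} \left[ \Delta(y, y') + w \cdot \phi(x, y') \right] - w \cdot \phi(x, y) \end{equation}

Taking $y' = y$ inside the max shows the loss is never negative, since $\Delta(y, y) = 0$.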
PyTorch
docs.pytorch.org › reference api › torch.nn › multimarginloss
MultiMarginLoss — PyTorch 2.11 documentation
January 1, 2023 -
>>> loss = nn.MultiMarginLoss()
>>> x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
>>> y = torch.tensor([3])
>>> # 0.25 * ((1-(0.8-0.1)) + (1-(0.8-0.2)) + (1-(0.8-0.4)))
>>> loss(x, y)
tensor(0.32...)
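As a sanity check, the hand computation in that comment can be reproduced without PyTorch; the sketch below mirrors MultiMarginLoss's defaults (margin 1, p = 1, averaging over the number of classes):

```python
# Reproduce the MultiMarginLoss doc example by hand (margin = 1, p = 1).
scores = [0.1, 0.2, 0.4, 0.8]  # one sample, four class scores
target = 3                     # correct class scores 0.8

# For each wrong class j, add max(0, margin - (score[target] - score[j])),
# then average over the number of classes (PyTorch's convention).
terms = [max(0.0, 1.0 - (scores[target] - s))
         for j, s in enumerate(scores) if j != target]
loss = sum(terms) / len(scores)
print(round(loss, 3))  # 0.325, which PyTorch prints as tensor(0.32...)
```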
scikit-learn
scikit-learn.org › stable › modules › generated › sklearn.metrics.hinge_loss.html
hinge_loss — scikit-learn 1.8.0 documentation
In binary class case, assuming labels in y_true are encoded with +1 and -1, when a prediction mistake is made, margin = y_true * pred_decision is always negative (since the signs disagree), implying 1 - margin is always greater than 1. The cumulated ...
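That description translates directly into code. The following is a minimal sketch of the binary case (plain Python, not scikit-learn's implementation; the example labels and decision values are made up):

```python
# Binary hinge loss: labels in {-1, +1}, pred_decision holds signed
# distances from a decision function. margin = y * d is negative on a
# mistake, so the corresponding term 1 - margin exceeds 1.
def binary_hinge_loss(y_true, pred_decision):
    return sum(max(0.0, 1.0 - y * d)
               for y, d in zip(y_true, pred_decision)) / len(y_true)

y_true = [-1, 1, 1, -1]
pred_decision = [-2.2, 2.4, 0.1, -0.3]
print(round(binary_hinge_loss(y_true, pred_decision), 3))  # 0.4
```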
PyImageSearch
pyimagesearch.com › home › blog › multi-class svm loss
Multi-class SVM Loss - PyImageSearch
April 17, 2021 - As for which loss function you should use, that is entirely dependent on your dataset. It’s typical to see the standard hinge loss function used more often, but on some datasets the squared variation might obtain better accuracy — overall, this is a hyperparameter that you should cross-validate.
GitHub
github.com › christianversloot › machine-learning-articles › blob › main › how-to-use-categorical-multiclass-hinge-with-keras.md
machine-learning-articles/how-to-use-categorical-multiclass-hinge-with-keras.md at main · christianversloot/machine-learning-articles
How tensorflow.keras.losses.CategoricalHinge can be used in your TensorFlow 2 based Keras model. ... Update 10/Feb/2021: ensure that article is up to date. Code examples now reflect TensorFlow 2 ecosystem and have been upgraded from TensorFlow/Keras 1.x. ... This code example demonstrates quickly how to use categorical (multiclass) hinge loss with TensorFlow 2 based Keras.
Author   christianversloot
Top answer
1 of 2
8

Let's use the example of the SVM loss function for a single datapoint:

$L_i = \sum_{j\neq y_i} \left[ \max(0, w_j^Tx_i - w_{y_i}^Tx_i + \Delta) \right]$

Where $\Delta$ is the desired margin.

We can differentiate the function with respect to the weights. For example, taking the gradient with respect to $w_{y_i}$, we obtain:

$\nabla_{w_{y_i}} L_i = - \left( \sum_{j\neq y_i} \mathbb{1}(w_j^Tx_i - w_{y_i}^Tx_i + \Delta > 0) \right) x_i$

Where $\mathbb{1}(\cdot)$ is the indicator function, equal to one if the condition inside is true and zero otherwise. While the expression may look scary when written out, in code you'd simply count the number of classes that didn't meet the desired margin (and hence contributed to the loss function); the gradient is then the data vector $x_i$ scaled by the negative of this count. Notice that this is the gradient only with respect to the row of $W$ that corresponds to the correct class. For the other rows, where $j \neq y_i$, the gradient is:

$\nabla_{w_j} L_i = \mathbb{1}(w_j^Tx_i - w_{y_i}^Tx_i + \Delta > 0) x_i$

Once you derive the expression for the gradient, it is straightforward to implement the expressions and use them to perform the gradient update.
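As a concrete illustration of the two gradient expressions above, here is a NumPy sketch for a single datapoint (the toy $W$ and $x$ values are made up; $\Delta = 1$):

```python
import numpy as np

def svm_loss_grad(W, x, y, delta=1.0):
    # Multiclass SVM loss and its gradient for one datapoint.
    # W: (C, D) weights, x: (D,) input, y: index of the correct class.
    scores = W @ x
    margins = scores - scores[y] + delta
    margins[y] = 0.0
    active = margins > 0                  # wrong classes violating the margin
    loss = margins[active].sum()

    dW = np.zeros_like(W)
    dW[active] = x                        # rows j != y_i with positive margin
    dW[y] = -active.sum() * x             # correct-class row: -count * x_i
    return loss, dW

W = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
x = np.array([0.5, 2.0])
loss, dW = svm_loss_grad(W, x, y=1)
print(loss)  # 0.25: only class 2 violates the margin
```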

Taken from the Stanford CS231n optimization notes posted on GitHub.

2 of 2
0

First of all, note that the multi-class hinge loss is a function of $W_r$:
\begin{equation} l(W_r) = \max( 0, 1 + \underset{r \neq y_i}{ \max } W_r \cdot x_i - W_{y_i} \cdot x_i) \end{equation}
Next, the max function is non-differentiable at $0$, so we need to calculate its subgradient:
\begin{equation} \frac{\partial l(W_r)}{\partial W_r} = \begin{cases} \{0\}, & W_{y_i}\cdot x_i > 1 + \underset{r \neq y_i}{ \max } W_r \cdot x_i \\ \{x_i\}, & W_{y_i}\cdot x_i < 1 + \underset{r \neq y_i}{ \max } W_r \cdot x_i\\ \{\alpha x_i\}, & \alpha \in [0,1], \; W_{y_i}\cdot x_i = 1 + \underset{r \neq y_i}{ \max } W_r \cdot x_i \end{cases} \end{equation}
In the second case, only the $W_r \cdot x_i$ term survives differentiation, since $W_{y_i}$ is independent of $W_r$. This definition of the subgradient of the multi-class hinge loss parallels the subgradient of the binary hinge loss.
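The cases above pick out at most one violating class: the maximizing wrong row. Below is a small NumPy sketch of one subgradient evaluation, using made-up $W$ and $x$ and also including the matching $-x_i$ term for the correct-class row $W_{y_i}$ (which follows from differentiating the same expression with respect to $W_{y_i}$):

```python
import numpy as np

def cs_hinge_subgrad(W, x, y):
    # One element of the subgradient of the Crammer-Singer hinge loss.
    scores = W @ x
    wrong = np.delete(np.arange(len(scores)), y)
    r = wrong[np.argmax(scores[wrong])]   # highest-scoring wrong class
    loss = max(0.0, 1.0 + scores[r] - scores[y])

    dW = np.zeros_like(W)
    if loss > 0.0:                        # active case: pick the {x_i} element
        dW[r] = x
        dW[y] = -x
    return loss, dW

W = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
x = np.array([0.5, 2.0])
loss, dW = cs_hinge_subgrad(W, x, y=1)
print(loss)  # 0.25
```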

Readthedocs
torchmetrics.readthedocs.io › en › v0.11.0 › classification › hinge_loss.html
Hinge Loss — PyTorch-Metrics 0.11.0 documentation
class torchmetrics.classification.MulticlassHingeLoss(num_classes, squared=False, multiclass_mode='crammer-singer', ignore_index=None, validate_args=True, **kwargs)[source] Computes the mean Hinge loss typically used for Support Vector Machines (SVMs) for multiclass tasks.
Gitbooks
sharad-s.gitbooks.io › cs231n › content › lecture_3_-_loss_functions_and_optimization › multiclass_svm_loss_deep_dive.html
Multiclass SVM Loss (Deep Dive) · CS231n
Q: If all of your scores are so small that they are approximately 0, what kind of loss would you expect? A: You would expect a loss of approximately (C-1) where C is the number of classes. This is because if you look at the equation for Multiclass SVM Loss, you will see that max(0, 0-0 + 1) ...
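That sanity check is easy to verify directly (a throwaway sketch; `multiclass_svm_loss` is just a name for the per-datapoint formula discussed above):

```python
# With all scores 0 and margin 1, each of the C-1 wrong classes
# contributes max(0, 0 - 0 + 1) = 1, so the total loss is C - 1.
def multiclass_svm_loss(scores, y, delta=1.0):
    return sum(max(0.0, s - scores[y] + delta)
               for j, s in enumerate(scores) if j != y)

C = 5
print(multiclass_svm_loss([0.0] * C, y=0))  # 4.0, i.e. C - 1
```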
TheCVF
openaccess.thecvf.com › content › WACV2021 › papers › Kavalerov_A_Multi-Class_Hinge_Loss_for_Conditional_GANs_WACV_2021_paper.pdf pdf
A Multi-Class Hinge Loss for Conditional GANs Ilya Kavalerov
matching IPM loss (McGAN) [24] following the empirical successes of the Maximum Mean Discrepancy objective [17] ... McGAN. When combined with spectral normalization of weights in D [22], the hinge loss greatly improves performance ...
Quora
quora.com › What-is-an-intuitive-explanation-of-the-multiclass-hinge-loss
What is an intuitive explanation of the multiclass hinge loss? - Quora
Answer (1 of 2): A2A. To me, sometimes stepping back from the problem helps. That is, take some time to look at motivation. And, actually, history can help there. Reducing the context is a good start. In this case, we have the hinge loss function. Then, it gets extended beyond simple. Simple, m...
YouTube
youtube.com › watch
4. Hinge Loss/Multi-class SVM Loss - YouTube
Hinge Loss/Multi-class SVM Loss is used for maximum-margin classification, especially for support vector machines or SVM. Hinge loss at value one is a safe m...
Published   July 2, 2022
Medium
medium.com › analytics-vidhya › loss-functions-multiclass-svm-loss-and-cross-entropy-loss-9190c68f13e0
Loss Functions — Multiclass SVM Loss and Cross Entropy Loss | by Ramji Balasubramanian | Analytics Vidhya | Medium
December 24, 2020 - Formula: Loss = max(0, predicted - original + 1). For the first image, the true label is dog, and it is predicted with the value of 4.26 as dog, 1.33 as cat and -1.01 as panda. image_1 = max(0, 1.33 - 4.26 + 1) + max(0, -1.01 - 4.26 + 1). Similarly calculate ...
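Plugging the snippet's numbers in confirms that the first image contributes no loss, since the true class outscores both wrong classes by more than the margin of 1:

```python
# True label dog scores 4.26; cat scores 1.33, panda scores -1.01.
# Both wrong-class margins are met, so each hinge term is 0.
image_1 = max(0, 1.33 - 4.26 + 1) + max(0, -1.01 - 4.26 + 1)
print(image_1)  # 0
```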
Readthedocs
torchmetrics.readthedocs.io › en › v0.9.2 › classification › hinge_loss.html
Hinge Loss — PyTorch-Metrics 0.9.2 documentation
In the multi-class case when multiclass_mode=MulticlassMode.ONE_VS_ALL or multiclass_mode='one-vs-all', this metric will use a one-vs-all approach to compute the hinge loss, giving a vector of C outputs where each entry pits that class against all remaining classes.
ICML
icml.cc › Conferences › 2011 › papers › 356_icmlpaper.pdf pdf
Multiclass Boosting with Hinge Loss based on Output Coding
The International Conference on Machine Learning (ICML) is the premier conference for machine learning research. It is organized by the International Machine Learning Society (IMLS) · The 28th ICML , was held in Bellevue, Washington, USA, from June 28 through July 2, 2011, with events beginning ...
Lightning AI
lightning.ai › docs › torchmetrics › stable › classification › hinge_loss.html
Hinge Loss — PyTorch-Metrics 1.9.0 documentation
class torchmetrics.classification.MulticlassHingeLoss(num_classes, squared=False, multiclass_mode='crammer-singer', ignore_index=None, validate_args=True, **kwargs)[source] Compute the mean Hinge loss typically used for Support Vector Machines (SVMs) for multiclass tasks.