In machine learning, a loss function used for maximum-margin classification
In machine learning, the hinge loss is a loss function used for training classifiers. The hinge loss is used for "maximum-margin" classification, most notably for support vector machines (SVMs). For an intended … Wikipedia
Wikipedia
en.wikipedia.org › wiki › Hinge_loss
Hinge loss - Wikipedia
January 26, 2026 - In structured prediction, the hinge loss can be further extended to structured output spaces. Structured SVMs with margin rescaling use the following variant, where w denotes the SVM's parameters, y the SVM's predictions, φ the joint feature function, and Δ the Hamming loss:
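The snippet cuts off at the colon before the variant itself. The standard margin-rescaling structured hinge loss it refers to, restated here from the usual structured-SVM formulation rather than from the article text, is

\begin{equation} \ell = \underset{y' \in \mathcal{Y}}{\max} \left[ \Delta(y, y') + w \cdot \phi(x, y') \right] - w \cdot \phi(x, y) \end{equation}

Taking $y' = y$ inside the max shows the loss is never negative, since $\Delta(y, y) = 0$.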
PyTorch
docs.pytorch.org › reference api › torch.nn › multimarginloss
MultiMarginLoss — PyTorch 2.11 documentation
January 1, 2023 -
>>> loss = nn.MultiMarginLoss()
>>> x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
>>> y = torch.tensor([3])
>>> # 0.25 * ((1-(0.8-0.1)) + (1-(0.8-0.2)) + (1-(0.8-0.4)))
>>> loss(x, y)
tensor(0.32...)
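As a sanity check, the hand computation in that comment can be reproduced without PyTorch; the sketch below mirrors MultiMarginLoss's defaults (margin 1, p = 1, averaging over the number of classes):

```python
# Reproduce the MultiMarginLoss doc example by hand (margin = 1, p = 1).
scores = [0.1, 0.2, 0.4, 0.8]  # one sample, four class scores
target = 3                     # correct class scores 0.8

# For each wrong class j, add max(0, margin - (score[target] - score[j])),
# then average over the number of classes (PyTorch's convention).
terms = [max(0.0, 1.0 - (scores[target] - s))
         for j, s in enumerate(scores) if j != target]
loss = sum(terms) / len(scores)
print(round(loss, 3))  # 0.325, which PyTorch prints as tensor(0.32...)
```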
scikit-learn
scikit-learn.org › stable › modules › generated › sklearn.metrics.hinge_loss.html
hinge_loss — scikit-learn 1.8.0 documentation
In binary class case, assuming labels in y_true are encoded with +1 and -1, when a prediction mistake is made, margin = y_true * pred_decision is always negative (since the signs disagree), implying 1 - margin is always greater than 1. The cumulated ...
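That description translates directly into code. The following is a minimal sketch of the binary case (plain Python, not scikit-learn's implementation; the example labels and decision values are made up):

```python
# Binary hinge loss: labels in {-1, +1}, pred_decision holds signed
# distances from a decision function. margin = y * d is negative on a
# mistake, so the corresponding term 1 - margin exceeds 1.
def binary_hinge_loss(y_true, pred_decision):
    return sum(max(0.0, 1.0 - y * d)
               for y, d in zip(y_true, pred_decision)) / len(y_true)

y_true = [-1, 1, 1, -1]
pred_decision = [-2.2, 2.4, 0.1, -0.3]
print(round(binary_hinge_loss(y_true, pred_decision), 3))  # 0.4
```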
PyImageSearch
pyimagesearch.com › home › blog › multi-class svm loss
Multi-class SVM Loss - PyImageSearch
April 17, 2021 - As for which loss function you should use, that is entirely dependent on your dataset. It’s typical to see the standard hinge loss function used more often, but on some datasets the squared variation might obtain better accuracy — overall, this is a hyperparameter that you should cross-validate.
GitHub
github.com › christianversloot › machine-learning-articles › blob › main › how-to-use-categorical-multiclass-hinge-with-keras.md
machine-learning-articles/how-to-use-categorical-multiclass-hinge-with-keras.md at main · christianversloot/machine-learning-articles
How tensorflow.keras.losses.CategoricalHinge can be used in your TensorFlow 2 based Keras model. ... Update 10/Feb/2021: ensure that article is up to date. Code examples now reflect TensorFlow 2 ecosystem and have been upgraded from TensorFlow/Keras 1.x. ... This code example demonstrates quickly how to use categorical (multiclass) hinge loss with TensorFlow 2 based Keras.
Author   christianversloot
Top answer
1 of 2
8

Let's use the example of the SVM loss function for a single datapoint:

$L_i = \sum_{j\neq y_i} \left[ \max(0, w_j^Tx_i - w_{y_i}^Tx_i + \Delta) \right]$

Where $\Delta$ is the desired margin.

We can differentiate the function with respect to the weights. For example, taking the gradient with respect to $w_{y_i}$, we obtain:

$\nabla_{w_{y_i}} L_i = - \left( \sum_{j\neq y_i} \mathbb{1}(w_j^Tx_i - w_{y_i}^Tx_i + \Delta > 0) \right) x_i$

Where $\mathbb{1}(\cdot)$ is the indicator function, equal to one if the condition inside is true and zero otherwise. While the expression may look scary when written out, in code you'd simply count the number of classes that didn't meet the desired margin (and hence contributed to the loss function); the gradient is then the data vector $x_i$ scaled by the negative of this count. Notice that this is the gradient only with respect to the row of $W$ that corresponds to the correct class. For the other rows, where $j \neq y_i$, the gradient is:

$\nabla_{w_j} L_i = \mathbb{1}(w_j^Tx_i - w_{y_i}^Tx_i + \Delta > 0) x_i$

Once you derive the expression for the gradient, it is straightforward to implement the expressions and use them to perform the gradient update.
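As a concrete illustration of the two gradient expressions above, here is a NumPy sketch for a single datapoint (the toy $W$ and $x$ values are made up; $\Delta = 1$):

```python
import numpy as np

def svm_loss_grad(W, x, y, delta=1.0):
    # Multiclass SVM loss and its gradient for one datapoint.
    # W: (C, D) weights, x: (D,) input, y: index of the correct class.
    scores = W @ x
    margins = scores - scores[y] + delta
    margins[y] = 0.0
    active = margins > 0                  # wrong classes violating the margin
    loss = margins[active].sum()

    dW = np.zeros_like(W)
    dW[active] = x                        # rows j != y_i with positive margin
    dW[y] = -active.sum() * x             # correct-class row: -count * x_i
    return loss, dW

W = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
x = np.array([0.5, 2.0])
loss, dW = svm_loss_grad(W, x, y=1)
print(loss)  # 0.25: only class 2 violates the margin
```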

Taken from the Stanford CS231n optimization notes posted on GitHub.

2 of 2
0

First of all, note that the multi-class hinge loss is a function of $W_r$:
\begin{equation} l(W_r) = \max( 0, 1 + \underset{r \neq y_i}{ \max } W_r \cdot x_i - W_{y_i} \cdot x_i) \end{equation}
Next, the max function is non-differentiable at $0$, so we need to calculate its subgradient:
\begin{equation} \frac{\partial l(W_r)}{\partial W_r} = \begin{cases} \{0\}, & W_{y_i}\cdot x_i > 1 + \underset{r \neq y_i}{ \max } W_r \cdot x_i \\ \{x_i\}, & W_{y_i}\cdot x_i < 1 + \underset{r \neq y_i}{ \max } W_r \cdot x_i\\ \{\alpha x_i\}, & \alpha \in [0,1], \; W_{y_i}\cdot x_i = 1 + \underset{r \neq y_i}{ \max } W_r \cdot x_i \end{cases} \end{equation}
In the second case, only the $W_r \cdot x_i$ term survives differentiation, since $W_{y_i}$ is independent of $W_r$. This definition of the subgradient of the multi-class hinge loss parallels the subgradient of the binary hinge loss.
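The cases above pick out at most one violating class: the maximizing wrong row. Below is a small NumPy sketch of one subgradient evaluation, using made-up $W$ and $x$ and also including the matching $-x_i$ term for the correct-class row $W_{y_i}$ (which follows from differentiating the same expression with respect to $W_{y_i}$):

```python
import numpy as np

def cs_hinge_subgrad(W, x, y):
    # One element of the subgradient of the Crammer-Singer hinge loss.
    scores = W @ x
    wrong = np.delete(np.arange(len(scores)), y)
    r = wrong[np.argmax(scores[wrong])]   # highest-scoring wrong class
    loss = max(0.0, 1.0 + scores[r] - scores[y])

    dW = np.zeros_like(W)
    if loss > 0.0:                        # active case: pick the {x_i} element
        dW[r] = x
        dW[y] = -x
    return loss, dW

W = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
x = np.array([0.5, 2.0])
loss, dW = cs_hinge_subgrad(W, x, y=1)
print(loss)  # 0.25
```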

Readthedocs
torchmetrics.readthedocs.io › en › v0.11.0 › classification › hinge_loss.html
Hinge Loss — PyTorch-Metrics 0.11.0 documentation
class torchmetrics.classification.MulticlassHingeLoss(num_classes, squared=False, multiclass_mode='crammer-singer', ignore_index=None, validate_args=True, **kwargs)[source] Computes the mean Hinge loss typically used for Support Vector Machines (SVMs) for multiclass tasks.
Gitbooks
sharad-s.gitbooks.io › cs231n › content › lecture_3_-_loss_functions_and_optimization › multiclass_svm_loss_deep_dive.html
Multiclass SVM Loss (Deep Dive) · CS231n
Q: If all of your scores are so small that they are approximately 0, what kind of loss would you expect? A: You would expect a loss of approximately (C-1) where C is the number of classes. This is because if you look at the equation for Multiclass SVM Loss, you will see that max(0, 0-0 + 1) ...
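That sanity check is easy to verify directly (a throwaway sketch; `multiclass_svm_loss` is just a name for the per-datapoint formula discussed above):

```python
# With all scores 0 and margin 1, each of the C-1 wrong classes
# contributes max(0, 0 - 0 + 1) = 1, so the total loss is C - 1.
def multiclass_svm_loss(scores, y, delta=1.0):
    return sum(max(0.0, s - scores[y] + delta)
               for j, s in enumerate(scores) if j != y)

C = 5
print(multiclass_svm_loss([0.0] * C, y=0))  # 4.0, i.e. C - 1
```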
TheCVF
openaccess.thecvf.com › content › WACV2021 › papers › Kavalerov_A_Multi-Class_Hinge_Loss_for_Conditional_GANs_WACV_2021_paper.pdf pdf
A Multi-Class Hinge Loss for Conditional GANs Ilya Kavalerov
matching IPM loss (McGAN) [24] following the empirical successes of the Maximum Mean Discrepancy objective [17] ... McGAN. When combined with spectral normalization of weights in D [22], the hinge loss greatly improves performance ...
Quora
quora.com › What-is-an-intuitive-explanation-of-the-multiclass-hinge-loss
What is an intuitive explanation of the multiclass hinge loss? - Quora
Answer (1 of 2): A2A. To me, sometimes stepping back from the problem helps. That is, take some time to look at motivation. And, actually, history can help there. Reducing the context is a good start. In this case, we have the hinge loss function. Then, it gets extended beyond simple. Simple, m...
YouTube
youtube.com › watch
4. Hinge Loss/Multi-class SVM Loss - YouTube
Hinge Loss/Multi-class SVM Loss is used for maximum-margin classification, especially for support vector machines or SVM. Hinge loss at value one is a safe m...
Published   July 2, 2022
Medium
medium.com › analytics-vidhya › loss-functions-multiclass-svm-loss-and-cross-entropy-loss-9190c68f13e0
Loss Functions — Multiclass SVM Loss and Cross Entropy Loss | by Ramji Balasubramanian | Analytics Vidhya | Medium
December 24, 2020 - Formula: Loss = max(0, predicted - original + 1). For the first image, the true label is dog, and it is predicted with the value of 4.26 as dog, 1.33 as cat and -1.01 as panda. image_1 = max(0, 1.33 - 4.26 + 1) + max(0, -1.01 - 4.26 + 1). Similarly calculate ...
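Plugging the snippet's numbers in confirms that the first image contributes no loss, since the true class outscores both wrong classes by more than the margin of 1:

```python
# True label dog scores 4.26; cat scores 1.33, panda scores -1.01.
# Both wrong-class margins are met, so each hinge term is 0.
image_1 = max(0, 1.33 - 4.26 + 1) + max(0, -1.01 - 4.26 + 1)
print(image_1)  # 0
```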
Readthedocs
torchmetrics.readthedocs.io › en › v0.9.2 › classification › hinge_loss.html
Hinge Loss — PyTorch-Metrics 0.9.2 documentation
In the multi-class case when multiclass_mode=MulticlassMode.ONE_VS_ALL or multiclass_mode='one-vs-all', this metric will use a one-vs-all approach to compute the hinge loss, giving a vector of C outputs where each entry pits that class against all remaining classes.
ICML
icml.cc › Conferences › 2011 › papers › 356_icmlpaper.pdf pdf
Multiclass Boosting with Hinge Loss based on Output Coding
The International Conference on Machine Learning (ICML) is the premier conference for machine learning research. It is organized by the International Machine Learning Society (IMLS) · The 28th ICML , was held in Bellevue, Washington, USA, from June 28 through July 2, 2011, with events beginning ...
Lightning AI
lightning.ai › docs › torchmetrics › stable › classification › hinge_loss.html
Hinge Loss — PyTorch-Metrics 1.9.0 documentation
class torchmetrics.classification.MulticlassHingeLoss(num_classes, squared=False, multiclass_mode='crammer-singer', ignore_index=None, validate_args=True, **kwargs)[source] Compute the mean Hinge loss typically used for Support Vector Machines (SVMs) for multiclass tasks.