Let's use the example of the SVM loss function for a single datapoint:
$L_i = \sum_{j\neq y_i} \left[ \max(0, w_j^Tx_i - w_{y_i}^Tx_i + \Delta) \right]$
where $\Delta$ is the desired margin.
We can differentiate the function with respect to the weights. For example, taking the gradient with respect to $w_{y_i}$ we obtain:
$\nabla_{w_{y_i}} L_i = - \left( \sum_{j\neq y_i} \mathbb{1}(w_j^Tx_i - w_{y_i}^Tx_i + \Delta > 0) \right) x_i$
where $\mathbb{1}$ is the indicator function, which is one if the condition inside is true and zero otherwise. While the expression may look scary when written out, when implementing this in code you simply count the number of classes that didn't meet the desired margin (and hence contributed to the loss), and the data vector scaled by this number is the gradient. Notice that this is the gradient only with respect to the row of $W$ that corresponds to the correct class. For the other rows, where $j \neq y_i$, the gradient is:
$\nabla_{w_j} L_i = \mathbb{1}(w_j^Tx_i - w_{y_i}^Tx_i + \Delta > 0) x_i$
Once you derive the expression for the gradient, it is straightforward to implement it and use it to perform the gradient update.
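As a concrete sketch (my own NumPy code, not from the notes), the loss and gradient for a single datapoint could look like this:

```python
import numpy as np

def svm_loss_single(W, x, y, delta=1.0):
    """SVM loss and gradient for one datapoint.

    W: (num_classes, D) weight matrix, x: (D,) datapoint,
    y: index of the correct class, delta: desired margin.
    """
    scores = W.dot(x)                      # (num_classes,)
    margins = scores - scores[y] + delta   # margin term for every class
    margins[y] = 0                         # the j == y term is excluded from the sum
    loss = np.sum(np.maximum(0, margins))

    dW = np.zeros_like(W)
    positive = margins > 0                 # classes that violated the margin
    dW[positive] = x                       # rows j != y_i: gradient is x
    dW[y] = -np.sum(positive) * x          # correct row: -(violation count) * x
    return loss, dW
```

The gradient rows match the two expressions above: each violating row gets $x_i$, and the correct row gets $-x_i$ times the number of violating classes.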
Taken from the Stanford CS231n optimization notes posted on GitHub.
First of all, note that the multi-class hinge loss is a function of $W_r$:
\begin{equation}
l(W_r) = \max( 0, 1 + \underset{r \neq y_i}{ \max } W_r \cdot x_i - W_{y_i} \cdot x_i)
\end{equation}
Next, the max function is non-differentiable at $0$, so we need to calculate its subgradient:
\begin{equation}
\frac{\partial l(W_r)}{\partial W_r} =
\begin{cases}
\{0\}, & W_{y_i}\cdot x_i > 1 + \underset{r \neq y_i}{ \max } W_r \cdot x_i \\
\{x_i\}, & W_{y_i}\cdot x_i < 1 + \underset{r \neq y_i}{ \max } W_r \cdot x_i\\
\{\alpha x_i\}, & \alpha \in [0,1], W_{y_i}\cdot x_i = 1 + \underset{r \neq y_i}{ \max } W_r \cdot x_i
\end{cases}
\end{equation}
In the second case, $W_{y_i} \cdot x_i$ is independent of $W_r$. The above definition of the subgradient of the multi-class hinge loss is similar to the subgradient of the binary-class hinge loss.
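A minimal NumPy sketch of this subgradient (names hypothetical; for the non-differentiable boundary case we simply pick $\alpha = 1$ from the allowed set $[0,1]$):

```python
import numpy as np

def hinge_subgradient(W, x, y):
    """Subgradient of l(W) = max(0, 1 + max_{r != y} W_r.x - W_y.x).

    W: (num_classes, D), x: (D,), y: correct class index.
    Returns (loss, dW). At the kink, any alpha * x with alpha in [0, 1]
    is a valid subgradient; we use alpha = 1 for simplicity.
    """
    scores = W.dot(x)
    scores_wrong = np.delete(scores, y)
    r = int(np.argmax(scores_wrong))       # most violated wrong class
    r = r if r < y else r + 1              # map back to an index over all classes
    margin = 1.0 + scores[r] - scores[y]
    loss = max(0.0, margin)

    dW = np.zeros_like(W)
    if margin >= 0:                        # margin active: {x} for W_r, {-x} for W_{y_i}
        dW[r] = x
        dW[y] = -x
    return loss, dW
```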
top_k has gradients, added in TensorFlow version 0.8.
Adding another implementation in three lines of code.

scores: unscaled scores, tensor, shape=(n_classes, batch_size), dtype=float32
classes: one-hot class labels, tensor, shape=(n_classes, batch_size), dtype=float32
This implements the loss above by choosing only the most violated class instead of considering all classes:
# H - hard negative (highest-scoring wrong class) for each sample
H = tf.reduce_max(scores * (1 - classes), 0)
L = tf.nn.relu((1 - scores + H) * classes)
final_loss = tf.reduce_mean(tf.reduce_max(L, 0))
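For reference, a NumPy re-implementation of the same max-violator loss (my own sketch, mirroring the TF lines above including their masking trick):

```python
import numpy as np

def max_violator_hinge(scores, classes):
    """NumPy equivalent of the three TF lines above.

    scores:  (n_classes, batch_size) unscaled scores
    classes: (n_classes, batch_size) one-hot labels
    """
    # H: highest score among the wrong classes, per sample
    # (classes masks the true-class entry out of the max)
    H = np.max(scores * (1 - classes), axis=0)
    # hinge on the true class only; multiplying by classes zeroes wrong rows
    L = np.maximum(0.0, (1.0 - scores + H) * classes)
    return np.mean(np.max(L, axis=0))
```

Note that `scores * (1 - classes)` implicitly clips $H$ at zero when every wrong-class score is negative, a quirk inherited from the original masking.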
Another implementation, where we sum over all negative classes:
# implements loss as sum_(j~=y) max(0, 1 - s(x, y) + s(x, j))
def multiclasshingeloss1(scores, classes):
    true_classes = tf.argmax(classes, 0)
    idx_flattened = tf.range(0, scores.get_shape()[1]) * scores.get_shape()[0] + \
                    tf.cast(true_classes, dtype=tf.int32)
    true_scores = tf.gather(tf.reshape(tf.transpose(scores), [-1]),
                            idx_flattened)
    L = tf.nn.relu((1 - true_scores + scores) * (1 - classes))
    final_loss = tf.reduce_mean(L)
    return final_loss
You can minimize the transposes here based on your implementation.
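As a sanity check on the gather/transpose logic, here is a NumPy version of the same sum-over-negatives loss (my own sketch; note that, like the TF code, it averages over all class-sample cells rather than summing over classes first):

```python
import numpy as np

def multiclass_hinge_sum(scores, classes):
    """NumPy version of multiclasshingeloss1 above.

    scores, classes: (n_classes, batch_size); classes is one-hot.
    Computes max(0, 1 - s(x, y) + s(x, j)) for each j != y
    (zero for j == y via the (1 - classes) mask), then means over all cells.
    """
    true_scores = np.sum(scores * classes, axis=0)   # s(x, y) per sample
    L = np.maximum(0.0, (1.0 - true_scores + scores) * (1 - classes))
    return np.mean(L)
```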