In machine learning, the hinge loss is a loss function used for training classifiers, most notably for "maximum-margin" classification with support vector machines (SVMs).
Wikipedia · en.wikipedia.org › wiki › Hinge_loss
Hinge loss - Wikipedia
January 26, 2026 - In machine learning, the hinge loss is a loss function used for training classifiers. The hinge loss is used for "maximum-margin" classification, most notably for support vector machines (SVMs). For an intended output t = ±1 and a classifier score y, the hinge loss of the prediction y is defined ...
Analytics Vidhya · analyticsvidhya.com › home › what is hinge loss in machine learning?
What is Hinge loss in Machine Learning?
December 23, 2024 - Hinge loss, pivotal in classification tasks and widely used in Support Vector Machines (SVMs), quantifies errors by penalizing predictions near or across decision boundaries. By promoting robust margins between classes, it enhances model ...
Medium · medium.com › analytics-vidhya › understanding-loss-functions-hinge-loss-a0ff112b40a1
Understanding loss functions : Hinge loss | by Kunal Chowdhury | Analytics Vidhya | Medium
January 18, 2024 - Looking at the graph for SVM in Fig 4, we can see that for yf(x) ≥ 1, hinge loss is ‘0’. However, when yf(x) < 1, then hinge loss increases massively.
GeeksforGeeks · geeksforgeeks.org › machine learning › hinge-loss-relationship-with-support-vector-machines
Hinge-loss & Relationship with Support Vector Machines - GeeksforGeeks
August 21, 2025 - Its purpose is to penalize predictions that are incorrect or insufficiently confident in the context of binary classification. It is used in binary classification problems where the objective is to separate the data points in two classes typically ...
Programmathically · programmathically.com › home › machine learning › classical machine learning › understanding hinge loss and the svm cost function
Understanding Hinge Loss and the SVM Cost Function - Programmathically
June 26, 2022 - The hinge loss function is most commonly employed to regularize soft margin support vector machines. The degree of regularization determines how aggressively the classifier tries to prevent misclassifications and can be controlled with an additional ...
NISER · niser.ac.in › ~smishra › teach › cs460 › 23cs460 › lectures › lec11.pdf (PDF)
HINGE LOSS IN SUPPORT VECTOR MACHINES Chandan Kumar Sahu and Maitrey Sharma
February 7, 2023 - For an intended output of t = ±1 and a classifier score y, the hinge loss of the prediction y is defined ... Note that y should be raw output of the classifier’s decision function, not the predicted class label. For instance, in linear SVMs, y = wT ·
Top answer (1 of 1, score 23)

I will answer one thing at a time.

Is an SVM as simple as saying it's a discriminative classifier that simply optimizes the hinge loss?

SVM is simply a linear classifier, optimizing hinge loss with L2 regularization.

Or is it more complex than that?

No, it is "just" that; however, there are different ways of looking at this model that lead to complex, interesting conclusions. In particular, this specific choice of loss function leads to extremely efficient kernelization, which is not true for log loss (logistic regression) or MSE (linear regression). Furthermore, you can show very important theoretical properties, such as those related to the Vapnik-Chervonenkis dimension, which lead to a smaller chance of overfitting.

Intuitively look at these three common losses:

  • hinge: max(0, 1 - py)
  • log: log(1 + exp(-py))
  • mse: (p - y)^2

Only the first one has the property that once something is classified correctly (with margin at least 1), it incurs 0 penalty. All the remaining ones still penalize your linear model even if it classifies samples correctly. Why? Because they are more closely related to regression than classification: they want a perfect prediction, not just a correct one.
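A minimal pure-Python sketch of the three losses above makes this concrete (here p is the raw score, y = ±1 the label, and the log loss is written in the margin-based form log(1 + e^(-py)); the example values are made up):

```python
import math

def hinge(p, y):
    # zero penalty once the margin p*y reaches 1
    return max(0.0, 1.0 - p * y)

def log_loss(p, y):
    # logistic loss in the same margin convention; strictly positive, never exactly zero
    return math.log(1.0 + math.exp(-p * y))

def mse(p, y):
    # squared error keeps pushing p toward exactly y
    return (p - y) ** 2

# a confidently correct prediction: true label y = +1, raw score p = 2
print(hinge(2, 1))     # 0.0   -> no penalty at all
print(log_loss(2, 1))  # ~0.127 -> still penalized
print(mse(2, 1))       # 1.0   -> penalized for "overshooting" the label
```

Only the hinge goes exactly to zero; the other two keep generating gradient even on correctly classified points.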

How do the support vectors come into play?

Support vectors are simply samples placed near the decision boundary (loosely speaking). In the linear case this does not change much, but as most of the power of SVM lies in its kernelization, support vectors become extremely important there. Once you introduce a kernel, the SVM solution can, thanks to the hinge loss, be obtained efficiently, and the support vectors are the only samples remembered from the training set, thus building a non-linear decision boundary with a subset of the training data.
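A hypothetical toy sketch (pure Python, one subgradient step on a 1-D linear model f(x) = w*x + b; the data and learning rate are invented for illustration) shows why: only samples violating the margin (y*f(x) < 1) contribute anything to the update, so the fitted model ends up depending only on the samples near the boundary.

```python
# (x, label) pairs: two points far from the boundary, two close to it
data = [(-3.0, -1), (-0.5, -1), (0.4, 1), (2.5, 1)]

w, b, lr, lam = 0.5, 0.0, 0.1, 0.01  # weight, bias, step size, L2 strength

def margin(x, y, w, b):
    return y * (w * x + b)

# samples with margin < 1 are the only ones with nonzero hinge subgradient
active = [(x, y) for x, y in data if margin(x, y, w, b) < 1]
print("margin-violating samples:", active)

# subgradient of  lam/2 * w^2 + mean(hinge)  w.r.t. w and b
gw = lam * w - sum(y * x for x, y in active) / len(data)
gb = -sum(y for x, y in active) / len(data)
w, b = w - lr * gw, b - lr * gb
print("updated w, b:", w, b)
```

The two far-away points never enter `gw` or `gb`; deleting them would leave this step unchanged, which is the linear-case intuition behind support vectors.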

What about the slack variables?

This is just another way of writing the hinge loss, more useful when you want to kernelize the solution and show its convexity.
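The equivalence between the slack formulation and the hinge loss can be made explicit (a standard derivation, sketched here). The soft-margin program is

$$\min_{w,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i}\xi_{i} \quad \text{s.t.} \quad y_{i}\,f(x_{i}) \ge 1-\xi_{i},\ \ \xi_{i}\ge 0.$$

At the optimum each slack variable takes its smallest feasible value, $\xi_{i} = \max\big(0,\; 1-y_{i}f(x_{i})\big)$, so the constrained program is equivalent to the unconstrained hinge-loss objective

$$\min_{w}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i}\max\big(0,\; 1-y_{i}f(x_{i})\big).$$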

Why can't you have deep SVMs, the way you can have a deep neural network with sigmoid activation functions?

You can; however, as SVM is not a probabilistic model, its training might be a bit tricky. Furthermore, the whole strength of SVM comes from its efficiency and global solution, both of which would be lost once you create a deep network. Such models do exist, though; in particular, SVM (with squared hinge loss) is nowadays often the choice for the topmost layer of deep networks, so the whole optimization is actually a deep SVM. Adding more layers in between has nothing to do with SVM or any other cost; those layers are defined completely by their activations. You could, for example, use an RBF activation function, but it has been shown numerous times that this leads to weak models (the detected features are too local).

To sum up:

  • There are deep SVMs; this is simply a typical deep neural network with an SVM layer on top.
  • There is no such thing as putting an SVM layer "in the middle", as the training criterion is only applied to the output of the network.
  • Using "typical" SVM kernels as activation functions is not popular in deep networks due to their locality (as opposed to the very global ReLU or sigmoid).
Medium · koshurai.medium.com › understanding-hinge-loss-in-machine-learning-a-comprehensive-guide-0a1c82478de4
Understanding Hinge Loss in Machine Learning: A Comprehensive Guide | by KoshurAI | Medium
January 12, 2024 - One common task in machine learning is classification, where the goal is to assign a label to a given input. To optimize the performance of these models, it is essential to choose an appropriate loss function. Hinge loss is one such function that is commonly used in classification problems, especially in the context of support vector machines (SVM...
OpenGenus · iq.opengenus.org › hinge-loss-for-svm
Hinge Loss for SVM
April 21, 2023 - Outliers can have a significant impact on the learned model and can cause overfitting, but hinge loss mitigates this effect by ignoring points that are correctly classified but are still close to the decision boundary. Sparsity: SVM with hinge loss can result in a sparse model, which means that many of the coefficients in the weight vector are set to zero.
Taylor & Francis · taylorandfrancis.com › knowledge › Engineering_and_technology › Engineering_support_and_special_topics › Hinge_loss
Hinge loss – Knowledge and References - Taylor & Francis
Hinge loss is a loss function used in training classifiers with large margins, such as support vector machines (SVM). It is designed to penalize negative margins that represent incorrect classifications.
Top answer (1 of 2, score 9)

Searching for the quoted text, it seems the book is Data Science for Business (Provost and Fawcett), and they're describing the soft-margin SVM. Their description of the hinge loss is wrong. The problem is that it doesn't penalize misclassified points that lie within the margin, as you mentioned.

In SVMs, smaller weights correspond to larger margins. So, using this "version" of the hinge loss would have pathological consequences: We could achieve the minimum possible loss (zero) simply by choosing weights small enough such that all points lie within the margin. Even if every single point is misclassified. Because the SVM optimization problem contains a regularization term that encourages small weights (i.e. large margins), the solution will always be the zero vector. This means the solution is completely independent of the data, and nothing is learned. Needless to say, this wouldn't make for a very good classifier.

The correct expression for the hinge loss for a soft-margin SVM is:

$$\max \Big( 0, 1 - y f(x) \Big)$$

where $f(x)$ is the output of the SVM given input $x$, and $y$ is the true class (-1 or 1). When the true class is $-1$ (as in your example), the loss reduces to $\max(0, 1 + f(x))$: zero for $f(x) \le -1$, and growing linearly once $f(x)$ exceeds $-1$.

Note that the loss is nonzero for misclassified points, as well as correctly classified points that fall within the margin.

For a proper description of soft-margin SVMs using the hinge loss formulation, see The Elements of Statistical Learning (section 12.3.2) or the Wikipedia article.
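To make the behavior described above concrete, here is a small numeric sketch of the correct formula (the score values are made up for illustration):

```python
def hinge(y, fx):
    # max(0, 1 - y*f(x)), with y in {-1, +1} and fx the raw SVM output
    return max(0.0, 1.0 - y * fx)

# true class y = -1, as in the example discussed above
print(hinge(-1, -2.0))  # 0.0: correctly classified, outside the margin
print(hinge(-1, -0.4))  # 0.6: correctly classified but inside the margin -> still penalized
print(hinge(-1,  1.5))  # 2.5: misclassified -> large penalty
```

The middle case is exactly the one the book's "version" gets wrong: the point is on the right side of the boundary, yet the loss is nonzero because it sits within the margin.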

Answer 2 of 2 (score 1)

A hinge function can be expressed as

$$y_{i} = \gamma \max{\left(x_{i}-\theta, 0\right)} + \varepsilon_{i},$$

where:

  • $\gamma$ is the change in slope after the hinge. In your example, this amounts to the slope following the hinge, since your hinge-only model (see below) assumes zero effect of $x$ on $y$ until the hinge.

  • $\theta$ is the point (in $\boldsymbol{x}$) at which the hinge is located, and is a parameter estimated for the model. I believe your question is answered by considering that the location of the hinge is informed by the loss function.

  • $\varepsilon_{i}$ is some error term with some distribution.

Hinge functions can also be useful in changing any line:

$$y_{i} = \alpha_{0} + \beta x_{i} + \gamma \max{\left(x_{i}-\theta, 0\right)} + \varepsilon_{i},$$

where:

  • $\alpha_{0}$ is the model constant, and the intercept of the curve before the hinge (i.e. for $x < \theta$). Of course, if $\theta < 0$, then the curve intersects the $y$-axis after the hinge, so $\alpha_{0}$ will not necessarily be the $y$-intercept of the bent line.
  • $\beta$ is the slope of the line relating $y$ to $x$
  • $\gamma$ is the change in slope after the hinge.

In addition, the hinge can be used to model how a functional relationship between $y$ and $x$ changes form, as in this model where the relationship becomes quadratic after the hinge:

$$y_{i} = \alpha_{0} + \beta x_{i} + \gamma \max{\left(x_{i}-\theta, 0\right)^{2}} + \varepsilon_{i},$$
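The three models above (ignoring the error term) can be evaluated directly; a small pure-Python sketch, with parameter values chosen purely for illustration rather than estimated from data:

```python
def hinge_basis(x, theta):
    # max(x - theta, 0): zero before the hinge, linear after it
    return max(x - theta, 0.0)

# illustrative parameter values (placeholders, not fitted estimates)
alpha0, beta, gamma, theta = 1.0, 0.5, 2.0, 3.0

def hinge_only(x):
    # flat at zero until the hinge, slope gamma afterwards
    return gamma * hinge_basis(x, theta)

def bent_line(x):
    # slope beta before the hinge, slope beta + gamma after it
    return alpha0 + beta * x + gamma * hinge_basis(x, theta)

def quadratic_after_hinge(x):
    # linear before the hinge, quadratic departure after it
    return alpha0 + beta * x + gamma * hinge_basis(x, theta) ** 2

print(bent_line(2.0))  # 2.0: before the hinge (x < theta), just alpha0 + beta*x
print(bent_line(5.0))  # 7.5: alpha0 + beta*5 + gamma*(5 - 3)
```

Note how $\theta$ enters nonlinearly through the `max`, which is why its location must be estimated via the loss function rather than read off a linear fit.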

Medium · medium.com › @jainilgosalia › hinge-loss-understanding-and-implementing-it-from-scratch-a273d786f8e6
Hinge Loss: Understanding and Implementing it from Scratch | by Jainil Gosalia | Medium
May 15, 2025 - Hinge loss is like this system — it encourages the model not only to be correct but to have a comfortable “passing score” margin. When training an SVM, your goal is to minimize two things:
Baeldung · baeldung.com › home › artificial intelligence › machine learning › differences between hinge loss and logistic loss
Differences Between Hinge Loss and Logistic Loss | Baeldung on Computer Science
February 28, 2025 - Another feature of hinge loss is that it leads to sparse models. The reason is that when solving the corresponding optimization problem, most of the training samples don’t play a role in the resulting discriminator. The ones that contribute to the margin between classes are called support vectors in the SVM model.
ScienceDirect · sciencedirect.com › science › article › abs › pii › S0925231224004405
Kernel support vector machine classifiers with ℓ0-norm hinge loss - ScienceDirect
April 16, 2024 - Their key idea is maximizing the margin from the data to the hyperplane subject to correct classification on training samples. In the SVM training model, hinge loss is sensitive to label noise and unstable for resampling.
Soulpageit · soulpageit.com › home
Hinge Loss
June 30, 2023 - Hinge loss is commonly used in SVMs, where the goal is to find the hyperplane that separates the classes with the maximum margin. SVMs aim to minimize this loss while also incorporating a regularization term to control the complexity of the model.
Quora · quora.com › Can-you-explain-why-SVMs-use-hinge-loss-function-in-a-simple-manner
Can you explain why SVMs use hinge loss function in a simple manner? - Quora
Answer: I will explain what I understand: from the diagram, we want to maximize the distance between positive and negative points (say the distance from the optimal hyperplane to both the positive and negative side is 1), so we maximize $\frac{2}{\lVert w \rVert}$. In other words, we can write...
Medium · medium.com › @vantakulasatyakiran › what-is-hinge-loss-that-is-used-in-svm-6b292fbbb48c
What is Hinge Loss that is used in SVM? | by Vantakula Satya kiran | Medium
January 28, 2025 - Hinge loss is a widely used loss function in machine learning, particularly for training classifiers like Support Vector Machines(SVMs).It plays a critical role in enforcing the margin-based optimization framework that defines SVMs.