The hinge loss term in soft margin SVM penalizes misclassifications. In hard margin SVM there are, by definition, no misclassifications.

This indeed means that hard margin SVM tries to minimize $\|\mathbf{w}\|^2$. Due to the formulation of the SVM problem, the margin is $2/\|\mathbf{w}\|$. As such, minimizing the norm of $\mathbf{w}$ is geometrically equivalent to maximizing the margin. Exactly what we want!

Regularization is a technique to avoid overfitting by penalizing large coefficients in the solution vector. In hard margin SVM, $\|\mathbf{w}\|^2$ is both the loss function and an $L_2$ regularizer.

In soft-margin SVM, the hinge loss term also acts like a regularizer, but on the slack variables instead of $\mathbf{w}$, and in $L_1$ rather than $L_2$. $L_1$ regularization induces sparsity, which is why standard SVM is sparse in terms of support vectors (in contrast to least-squares SVM).
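The sparsity claim can be seen directly from the shape of the hinge loss. A minimal sketch (the labels and decision values below are made up for illustration):

```python
def hinge_loss(y, score):
    """Hinge loss max(0, 1 - y*score) for a label y in {-1, +1}
    and a decision value score = w.x + b."""
    return max(0.0, 1.0 - y * score)

# Hypothetical decision values for four points.
examples = [(+1, 2.5), (+1, 0.4), (-1, -0.3), (-1, -4.0)]
losses = [hinge_loss(y, s) for y, s in examples]
# Points safely outside the margin (correct side, |score| >= 1) get
# exactly zero loss; only margin violators contribute, so only they
# can end up as support vectors.
```

Because the loss is exactly zero (not merely small) outside the margin, most training points drop out of the solution entirely, which is the sparsity the answer refers to.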

Answer from Marc Claesen on Stack Exchange
MIT CSAIL
people.csail.mit.edu › dsontag › courses › ml14 › slides › lecture2.pdf
Support vector machines (SVMs) Lecture 2 David Sontag New York University
Allowing for slack: "Soft margin SVM". For each data point: if the margin ≥ 1, don't care; if the margin < 1, pay a linear penalty. The objective gains a slack term + C Σj ξj, with "slack variables" ξj ≥ 0. Slack penalty C > 0: C = ∞ means the data must be separated; C = 0 ignores the data entirely; select C using cross-validation. Equivalent formulation using hinge loss.
Wikipedia
en.wikipedia.org › wiki › Hinge_loss
Hinge loss - Wikipedia
January 26, 2026 - In machine learning, the hinge loss is a loss function used for training classifiers. The hinge loss is used for "maximum-margin" classification, most notably for support vector machines (SVMs). For an intended output t = ±1 and a classifier score y, the hinge loss of the prediction y is defined ...
Medium
medium.com › data-science › optimization-loss-function-under-the-hood-part-iii-5dff33fa015d
Loss Function(Part III): Support Vector Machine | by Shuyu Luo | TDS Archive | Medium
October 17, 2018 - Remember putting the raw model output into Sigmoid Function gives us the Logistic Regression’s hypothesis. What is the hypothesis for SVM? It’s simple and straightforward. When θᵀx ≥ 0, predict 1, otherwise, predict 0. ... Then back to loss function plot, aka.
University of Oxford
robots.ox.ac.uk › ~az › lectures › ml › lect2.pdf
Lecture 2: The SVM classifier
Topics: Support Vector Machine (SVM) classifier; wide margin; cost function; slack variables; loss functions revisited; optimization. Binary classification: given training data (xi, yi) for i = 1 . . . N, with xi ∈ R^d and yi ∈ {−1, 1}, learn a classifier f(x) such that ...
CS231n
cs231n.github.io › linear-classify
CS231n Deep Learning for Computer Vision
The difference was only 2, which is why the loss comes out to 8 (i.e. how much higher the difference would have to be to meet the margin). In summary, the SVM loss function wants the score of the correct class \(y_i\) to be larger than the incorrect class scores by at least \(\Delta\) (delta).
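The arithmetic in this snippet is easy to check in a short sketch; the scores [13, -7, 11], correct class 0, and Δ = 10 are the example's own values:

```python
def multiclass_svm_loss(scores, correct, delta):
    """Sum max(0, s_j - s_correct + delta) over the incorrect classes j."""
    s_correct = scores[correct]
    return sum(max(0.0, s - s_correct + delta)
               for j, s in enumerate(scores) if j != correct)

# Correct-class score 13, other scores -7 and 11, margin delta = 10:
# max(0, -7 - 13 + 10) = 0 and max(0, 11 - 13 + 10) = 8, so loss = 8.
loss = multiclass_svm_loss([13, -7, 11], correct=0, delta=10)
```

The second class misses the required margin by 8, and that shortfall is exactly the loss.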
Gitbooks
sharad-s.gitbooks.io › cs231n › content › lecture_3_-_loss_functions_and_optimization › multiclass_svm_loss_deep_dive.html
Multiclass SVM Loss (Deep Dive) · CS231n
The SVM loss function wants the score of the correct class yi to be larger than the incorrect class scores by at least Δ (delta).
MathWorks
mathworks.com › statistics and machine learning toolbox › classification › support vector machine classification
loss - Find classification error for support vector machine (SVM) classifier - MATLAB
L = loss(SVMModel,Tbl,ResponseVarName) returns the classification error (see Classification Loss), a scalar representing how well the trained support vector machine (SVM) classifier (SVMModel) classifies the predictor data in table Tbl compared to the true class labels in Tbl.ResponseVarName.
Top answer
1 of 3
35

2 of 3
2

There is no explicit "loss function" in the hard-margin SVM formulation, but one emerges when we solve the soft-margin SVM.

Here is the detailed explanation:

When we talk about a loss function, what we really mean is a training objective that we want to minimize.

In the hard-margin SVM setting, the "objective" is to maximize the geometric margin subject to the constraint that each training example lies on or outside the margin, i.e. $$\begin{aligned} & \max_{w, b}\frac{1}{\Vert w \Vert} \\ &\text{s.t.}\quad y_i(w^Tx_i+b) \ge 1, \quad i = 1, \dots, N \end{aligned} $$ Note that this is a constrained (quadratic programming) problem, so we cannot solve it numerically with a direct gradient-descent approach; that is, there is no unconstrained analytic "loss function" for hard-margin SVMs.

However, in the soft-margin SVM setting, we add slack variables to allow our SVM to make mistakes. We now try to solve $$\begin{aligned} & \min_{w,b,\boldsymbol{\xi}}\frac{1}{2}\Vert w \Vert_2^2 + C\sum_i \xi_i \\ \text{s.t.}\quad &y_i(w^Tx_i+b) \ge 1-\xi_i \\ & \boldsymbol{\xi} \succeq \mathbf{0} \end{aligned} $$ This is the same as penalizing each margin violation by adding $C\sum_i \xi_i$ to the objective to be minimized. Recall the hinge loss $\max(0,\, 1-y_i(w^Tx_i+b))$: it is zero when the training example lies outside the margin, nonzero only when the example falls into the margin region, and always nonnegative. It follows that we can rephrase our problem as $$\min_{w,b}\ \frac{1}{2}\Vert w \Vert_2^2 + C\sum_i \max(0,\, 1-y_i(w^Tx_i+b))$$ We know that the hinge loss is convex and its (sub)derivative is known, so we can solve the soft-margin SVM directly by gradient descent.

So the slack variable is just the hinge loss in disguise, and the properties of the hinge loss happen to encode our optimization constraints (it is nonnegative, and it activates only when its input is less than 1).
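That gradient-descent claim can be sketched in a few lines of plain Python. This is a toy full-batch subgradient solver; the data and hyperparameters (learning rate, C, epoch count) are made up for illustration, not tuned:

```python
def train_soft_margin_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Minimize (1/2)||w||^2 + C * sum(max(0, 1 - y_i*(w.x_i + b)))
    by full-batch subgradient descent."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Gradient of the regularizer (1/2)||w||^2 is w itself.
        gw, gb = list(w), 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge term active: subgradient is -y_i * x_i
                gw = [gwj - C * yi * xj for gwj, xj in zip(gw, xi)]
                gb -= C * yi
        w = [wj - lr * gwj for wj, gwj in zip(w, gw)]
        b -= lr * gb
    return w, b

# Linearly separable toy data.
X = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -3.0)]
y = [1, 1, -1, -1]
w, b = train_soft_margin_svm(X, y)
```

Note how the slack variables never appear explicitly: each point either satisfies the margin (zero subgradient from the hinge term) or contributes $-y_i x_i$, exactly the unconstrained hinge-loss formulation derived above.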

Find elsewhere
Programmathically
programmathically.com › home › machine learning › classical machine learning › understanding hinge loss and the svm cost function
Understanding Hinge Loss and the SVM Cost Function - Programmathically
June 26, 2022 - The hinge loss function is most commonly employed to regularize soft margin support vector machines. The degree of regularization determines how aggressively the classifier tries to prevent misclassifications and can be controlled with an additional ...
Medium
medium.com › analytics-vidhya › loss-functions-multiclass-svm-loss-and-cross-entropy-loss-9190c68f13e0
Loss Functions — Multiclass SVM Loss and Cross Entropy Loss | by Ramji Balasubramanian | Analytics Vidhya | Medium
December 24, 2020 - Multi-class SVM Loss (as the name suggests) is inspired by (Linear) Support Vector Machines (SVMs), which uses a scoring function f to map our data points to numerical scores for each class labels.
NISER
niser.ac.in › ~smishra › teach › cs460 › 23cs460 › lectures › lec11.pdf
HINGE LOSS IN SUPPORT VECTOR MACHINES Chandan Kumar Sahu and Maitrey Sharma
February 7, 2023 - For an intended output of t = ±1 and a classifier score y, the hinge loss of the prediction y is defined ... Note that y should be raw output of the classifier’s decision function, not the predicted class label. For instance, in linear SVMs, y = wT ·
GeeksforGeeks
geeksforgeeks.org › machine learning › hinge-loss-relationship-with-support-vector-machines
Hinge-loss & Relationship with Support Vector Machines - GeeksforGeeks
August 21, 2025 - Hinge loss is a loss function widely used in machine learning for training classifiers such as support vector machines (SVMs). Its purpose is to penalize predictions that are incorrect or insufficiently confident in the context of binary ...
PyImageSearch
pyimagesearch.com › home › blog › multi-class svm loss
Multi-class SVM Loss - PyImageSearch
April 17, 2021 - Given a scoring function (which maps input data to output class labels), our loss function can be used to quantify how “good” or “bad” our scoring function is at predicting the correct class labels in our dataset.
arXiv
arxiv.org › abs › 2403.16654
[2403.16654] A Novel Loss Function-based Support Vector Machine for Binary Classification
March 25, 2024 - This oversight affects the generalization ability of the SVM classifier to some extent. To address this limitation, from the perspective of confidence margin, we propose a novel Slide loss function ($\ell_s$) to construct the support vector machine classifier($\ell_s$-SVM).
Stack Overflow
stackoverflow.com › questions › 66740435 › svm-loss-function
python - SVM Loss Function - Stack Overflow
def svm_loss_naive(W, X, y): """ SVM loss function, naive implementation calculating loss for each sample using loops. Inputs: - X: A numpy array of shape (n, m) containing data(samples).
ScienceDirect
sciencedirect.com › science › article › abs › pii › S0957417423026702
Support vector machine with eagle loss function - ScienceDirect
October 19, 2023 - SVM utilizes the hinge loss function and maximum margin to find the separating hyperplane. In SVM, only the boundary instances/support vectors confine…
Anna-Lena Popkes
alpopkes.com › posts › machine_learning › support_vector_machines
Support vector machines
April 13, 2021 - Theoretically, the primal SVM can be solved in multiple ways. The most well known way is to use the hinge loss function together with subgradient descent.