Answer from Marc Claesen on Stack Exchange
Top answer (1 of 3) · score 35

The hinge loss term $\sum_i \max(0,\, 1 - y_i(w^T x_i + b))$ in soft margin SVM penalizes misclassifications. In hard margin SVM there are, by definition, no misclassifications.

This indeed means that hard margin SVM tries to minimize $\Vert w \Vert^2$. Due to the formulation of the SVM problem, the margin is $\frac{2}{\Vert w \Vert}$. As such, minimizing the norm of $w$ is geometrically equivalent to maximizing the margin. Exactly what we want!
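As a quick numerical sanity check of the claim that the margin is $\frac{2}{\Vert w \Vert}$, one can place a point on each of the two margin hyperplanes and measure the gap between them. The vector $w$ and offset $b$ below are arbitrary made-up values:

```python
import math

# For the two margin hyperplanes w.x + b = +1 and w.x + b = -1,
# the gap between them along the normal direction should be 2/||w||.
# w and b here are arbitrary illustrative values.
w = [3.0, 4.0]
b = 1.0
norm = math.sqrt(sum(wi * wi for wi in w))  # ||w|| = 5

# A point x on the plane w.x + b = c, reached by moving along the
# normal direction, is x = ((c - b) / ||w||^2) * w.
x_plus = [(1.0 - b) / norm**2 * wi for wi in w]
x_minus = [(-1.0 - b) / norm**2 * wi for wi in w]

gap = math.sqrt(sum((p - m) ** 2 for p, m in zip(x_plus, x_minus)))
print(gap, 2 / norm)  # both print 0.4
```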

Regularization is a technique to avoid overfitting by penalizing large coefficients in the solution vector. In hard margin SVM, $\Vert w \Vert^2$ is both the loss function and an $L_2$ regularizer.

In soft-margin SVM, the hinge loss term also acts like a regularizer, but on the slack variables instead of $w$, and in $L_1$ rather than $L_2$. $L_1$ regularization induces sparsity, which is why standard SVM is sparse in terms of support vectors (in contrast to least-squares SVM).
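The loss-plus-regularizer reading of the soft-margin objective can be sketched in a few lines of Python. The toy data and the value of $C$ below are illustrative choices, not anything from the question:

```python
def hinge_loss(w, b, x, y):
    """Hinge loss max(0, 1 - y * (w.x + b)) for a single example."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)

def soft_margin_objective(w, b, data, C=1.0):
    """L2 regularizer 0.5 * ||w||^2 plus C times the summed hinge losses."""
    l2 = 0.5 * sum(wi * wi for wi in w)
    return l2 + C * sum(hinge_loss(w, b, x, y) for x, y in data)

# Points well outside the margin contribute exactly zero loss, which is
# the source of the sparsity in support vectors mentioned above:
data = [([2.0, 0.0], 1), ([-2.0, 0.0], -1), ([0.5, 0.0], 1)]
w, b = [1.0, 0.0], 0.0
print(hinge_loss(w, b, [2.0, 0.0], 1))    # 0.0 -> not a support vector
print(hinge_loss(w, b, [0.5, 0.0], 1))    # 0.5 -> inside the margin
print(soft_margin_objective(w, b, data))  # 1.0 = 0.5 (L2) + 0.5 (hinge)
```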

Answer 2 of 3 · score 2

There's no analytic "loss function" for hard-margin SVMs, but when we move to soft-margin SVMs, it turns out that a loss function does exist.

Here is the detailed explanation:

When we talk about a loss function, what we really mean is a training objective that we want to minimize.

In the hard-margin SVM setting, the "objective" is to maximize the geometric margin $\frac{2}{\Vert w \Vert_2}$ subject to every training example lying on the correct side of the margin, i.e. $$\begin{aligned} & \min_{w,b}\frac{1}{2}\Vert w \Vert_2^2 \\ s.t\quad & y_i(w^Tx_i+b) \ge 1 \end{aligned}$$ Note that this is a constrained quadratic programming problem, so we cannot solve it with a plain gradient descent approach; that is, there is no unconstrained analytic "loss function" for hard-margin SVMs.

However, in the soft-margin SVM setting, we add a slack variable $\xi_i$ to allow our SVM to make mistakes. We now try to solve $$\begin{aligned} & \min_{w,b,\boldsymbol{\xi}}\frac{1}{2}\Vert w \Vert_2^2 + C\sum \xi_i \\ s.t\quad &y_i(w^Tx_i+b) \ge 1-\xi_i \\ & \boldsymbol{\xi} \succeq \mathbf{0} \end{aligned} $$ This is the same as penalizing a misclassified training example by adding $C\xi_i$ to the objective to be minimized. Recall the hinge loss $\max(0,\, 1-y_i(w^Tx_i+b))$: since $\xi_i$ will be zero if the training example lies outside the margin and will only be nonzero when the training example falls into the margin region, and since the hinge loss is always nonnegative, at the optimum $\xi_i = \max(0,\, 1-y_i(w^Tx_i+b))$, so we can rephrase our problem as $$\min_{w,b}\frac{1}{2}\Vert w \Vert_2^2 + C\sum_i \max(0,\, 1-y_i(w^Tx_i+b))$$ We know that the hinge loss is convex and its subgradient is known, thus we can solve soft-margin SVM directly by gradient descent.

So the slack variable is just the hinge loss in disguise, and the properties of the hinge loss happen to absorb our optimization constraints (i.e. nonnegativity, and it activates only when its input $y_i(w^Tx_i+b)$ is less than 1).
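To make the gradient-descent claim concrete, here is a minimal (sub)gradient descent on the unconstrained hinge-loss formulation. The 1-D toy data, learning rate, and iteration count are all made-up illustrative choices:

```python
def subgradient_step(w, b, data, C, lr):
    """One subgradient step on 0.5*||w||^2 + C * sum_i max(0, 1 - y_i(w.x_i + b))."""
    gw = list(w)  # gradient of the 0.5*||w||^2 term is w itself
    gb = 0.0
    for x, y in data:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        if y * score < 1:  # hinge active: subgradient is -y*x (and -y for b)
            gw = [gwi - C * y * xi for gwi, xi in zip(gw, x)]
            gb -= C * y
    w = [wi - lr * gwi for wi, gwi in zip(w, gw)]
    b = b - lr * gb
    return w, b

# Toy 1-D data: positives on the right, negatives on the left.
data = [([1.5], 1), ([2.0], 1), ([-1.5], -1), ([-2.0], -1)]
w, b = [0.0], 0.0
for _ in range(200):
    w, b = subgradient_step(w, b, data, C=1.0, lr=0.05)

preds = [1 if w[0] * x[0] + b >= 0 else -1 for x, _ in data]
print(preds)  # [1, 1, -1, -1]: all four points classified correctly
```

With a constant learning rate the iterates oscillate around the optimum rather than converging exactly, which is why practical implementations typically decay the step size.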
