Hinge loss: in machine learning, a loss function used for maximum-margin classification (summary via Wikipedia).
Wikipedia
en.wikipedia.org › wiki › Hinge_loss
Hinge loss - Wikipedia
January 26, 2026 - In machine learning, the hinge loss is a loss function used for training classifiers. The hinge loss is used for "maximum-margin" classification, most notably for support vector machines (SVMs). For an intended output t = ±1 and a classifier score y, the hinge loss of the prediction y is defined ...
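For an intended output t = ±1 and a classifier score y, the definition this snippet trails off into is max(0, 1 − t·y). A minimal Python sketch of that definition (the function name is illustrative, not from any library):

```python
def hinge_loss(t, y):
    """Hinge loss for a true label t in {-1, +1} and a raw classifier score y."""
    return max(0.0, 1.0 - t * y)

# Correctly classified, outside the margin: zero loss.
print(hinge_loss(+1, 2.0))   # 0.0
# Correct side of the boundary but inside the margin: small positive loss.
print(hinge_loss(+1, 0.5))   # 0.5
# Misclassified: loss grows linearly with the score.
print(hinge_loss(+1, -2.0))  # 3.0
```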
Medium
medium.com › analytics-vidhya › understanding-loss-functions-hinge-loss-a0ff112b40a1
Understanding loss functions : Hinge loss | by Kunal Chowdhury | Analytics Vidhya | Medium
January 18, 2024 - Looking at the graph for SVM in Fig 4, we can see that for yf(x) ≥ 1, the hinge loss is 0. However, when yf(x) < 1, the hinge loss increases linearly as yf(x) decreases.
Discussions

neural networks - What is the definition of the hinge loss function? - Artificial Intelligence Stack Exchange
I came across the hinge loss function for training a neural network model, but I do not know its analytical form. I can write the mean squared error loss function (which is more often used for regression) as … More on ai.stackexchange.com
ai.stackexchange.com
February 11, 2021
machine learning - hinge loss vs logistic loss advantages and disadvantages/limitations - Cross Validated
+1. Minimizing logistic loss corresponds to maximizing binomial likelihood. Minimizing squared-error loss corresponds to maximizing Gaussian likelihood (it's just OLS regression; for 2-class classification it's actually equivalent to LDA). Do you know if minimizing hinge loss ... More on stats.stackexchange.com
stats.stackexchange.com
April 14, 2015
Why do we use log-loss in logistic regression instead of just taking the absolute difference between expected probability and actual value for each instance?
You can try it and see if it works 🤷‍♂️ Absolute error is usually avoided because it makes a "V"-shaped loss whose gradient has a sharp corner at the minimum. Sharp corners are bad in general for gradient-based optimization. Same reason we use MSE or RMSE instead of absolute error for regression tasks. More on reddit.com
r/learnmachinelearning
April 26, 2023
Is support vector machine just about simplifying logistic regression formula? If so, why this name?

No. The main difference between the cost functions is that cross-entropy loss (CEL) penalizes based on the prediction's distance from the answer. So if something is predicted using CEL as class 1 with probability 0.51, and it is actually class 1, it is penalized more strongly than if it had been predicted with probability 0.99. With the hinge loss used by SVMs, by contrast, a correct prediction is counted the same (zero loss) once it clears the margin, whether you barely predict the answer or predict it with high confidence. Both methods, however, are penalized by 'distance' when they predict the wrong answer.

More on reddit.com
r/learnmachinelearning
July 12, 2020
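The confidence contrast described in this answer is easy to check numerically. A small sketch, assuming the usual definitions (cross-entropy as the negative log of the probability assigned to the true class, hinge loss on a raw score); the probabilities and scores are made-up toy values:

```python
import math

def cross_entropy(p_true):
    # Negative log-probability assigned to the true class.
    return -math.log(p_true)

def hinge(t, y):
    # t is the true label in {-1, +1}, y is a raw classifier score.
    return max(0.0, 1.0 - t * y)

# Two correct class-1 predictions with different confidence:
# cross-entropy penalizes the barely-correct one much more heavily ...
print(cross_entropy(0.51))  # ~0.673
print(cross_entropy(0.99))  # ~0.010
# ... while hinge loss counts any score past the margin exactly the same.
print(hinge(+1, 1.1))       # 0.0
print(hinge(+1, 5.0))       # 0.0
```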
ScienceDirect
sciencedirect.com › topics › engineering › hinge-loss-function
Hinge Loss Function - an overview | ScienceDirect Topics
The loss is computed based on some predefined loss function (e.g., mean squared error [MSE] for regression or cross-entropy for classification tasks, discussed in Section 16.2.3), which measures the difference between the network's predictions and actual targets.
Analytics Vidhya
analyticsvidhya.com › home › what is hinge loss in machine learning?
What is Hinge loss in Machine Learning?
December 23, 2024 - Hinge loss in machine learning, a key loss function in SVMs, enhances model robustness by penalizing incorrect or marginal predictions.
arXiv
arxiv.org › pdf › 2103.00233 pdf
Learning with Smooth Hinge Losses Junru Luo ∗, Hong Qiao †, and Bo Zhang ‡
Replacing the hinge loss with these two smooth hinge losses, we obtain two smooth support vector machines (SSVMs) which can be solved with second-order methods. In particular, they can be solved by the inexact Newton method with a quadratic convergence rate, as conducted in [1, 20] for logistic regression.
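The exact smooth hinge constructions are defined in the paper itself; as a generic illustration of why smoothing matters for second-order solvers (this is the common squared hinge, not necessarily the paper's loss):

```python
def hinge(t, y):
    # Standard hinge: kinked (non-differentiable) at the margin t*y = 1.
    return max(0.0, 1.0 - t * y)

def squared_hinge(t, y):
    # Squaring smooths the kink: the derivative is continuous at t*y = 1,
    # which Newton-type (second-order) solvers rely on.
    return max(0.0, 1.0 - t * y) ** 2

# Both vanish once the margin is cleared, but they approach zero differently.
print(hinge(+1, 0.5), squared_hinge(+1, 0.5))  # 0.5 0.25
print(hinge(+1, 2.0), squared_hinge(+1, 2.0))  # 0.0 0.0
```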
GeeksforGeeks
geeksforgeeks.org › machine learning › hinge-loss-relationship-with-support-vector-machines
Hinge-loss & Relationship with Support Vector Machines - GeeksforGeeks
August 21, 2025 - Its purpose is to penalize predictions that are incorrect or insufficiently confident in the context of binary classification. It is used in binary classification problems where the objective is to separate the data points into two classes, typically ...
Medium
koshurai.medium.com › understanding-hinge-loss-in-machine-learning-a-comprehensive-guide-0a1c82478de4
Understanding Hinge Loss in Machine Learning: A Comprehensive Guide | by KoshurAI | Medium
January 12, 2024 - The key idea behind hinge loss is to penalize the model more when it misclassifies a sample that is closer to the decision boundary.
Baeldung
baeldung.com › home › artificial intelligence › machine learning › differences between hinge loss and logistic loss
Differences Between Hinge Loss and Logistic Loss | Baeldung on Computer Science
February 28, 2025 - Between the margins, however, even if a sample’s prediction is correct, there’s still a small loss. This is to penalize the model for making less certain predictions. ... One of the main characteristics of hinge loss is that it’s a convex function. This makes it different from other losses such as the 0-1 loss.
scikit-learn
scikit-learn.org › stable › modules › generated › sklearn.metrics.hinge_loss.html
hinge_loss — scikit-learn 1.8.0 documentation
In binary class case, assuming labels in y_true are encoded with +1 and -1, when a prediction mistake is made, margin = y_true * pred_decision is always negative (since the signs disagree), implying 1 - margin is always greater than 1. The cumulated hinge loss is therefore an upper bound of the number of mistakes made by the classifier.
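The upper-bound property described here can be verified directly. A plain-Python sketch of the same quantity (not a call to sklearn.metrics.hinge_loss, and with made-up toy labels and scores):

```python
def total_hinge(y_true, scores):
    # Sum of per-sample hinge losses, labels encoded as +1 / -1.
    return sum(max(0.0, 1.0 - t * y) for t, y in zip(y_true, scores))

y_true = [+1, +1, -1, -1]
scores = [2.0, -0.5, -3.0, 0.5]  # signs disagree at indices 1 and 3

# Every mistake has margin t*y < 0, hence per-sample loss 1 - margin > 1,
# so the cumulated hinge loss upper-bounds the mistake count.
mistakes = sum(1 for t, y in zip(y_true, scores) if t * y < 0)
print(mistakes)                     # 2
print(total_hinge(y_true, scores))  # 3.0
```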
Cornell Computer Science
cs.cornell.edu › courses › cs4780 › 2018sp › lectures › lecturenote10.html
10: Empirical Risk Minimization
Remember the unconstrained SVM formulation \[ \min_{\mathbf{w},b}\; C\sum_{i}\underset{\text{Hinge-Loss}}{\underbrace{\max\left[1-y_{i}\left(\mathbf{w}^{\top}\mathbf{x}_{i}+b\right),0\right]}}+\underset{l_{2}\text{-Regularizer}}{\underbrace{\left\Vert \mathbf{w}\right\Vert _{2}^{2}}} \] The hinge loss is the SVM's error function of choice, whereas the \(l_{2}\)-regularizer reflects the complexity of the solution and penalizes complex ...
David Rosenberg
davidrosenberg.github.io › ml2015 › docs › 3a.loss-functions.pdf pdf
Loss Functions for Regression and Classification David Rosenberg
Most classification losses depend only on the margin. ... Optimization is NP-Hard. ... Hinge is a convex, upper bound on 0−1 loss.
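The "convex upper bound on 0−1 loss" claim in this snippet can be checked pointwise; a quick sketch treating both losses as functions of the margin m = y·f(x):

```python
def hinge(m):
    # Hinge loss as a function of the margin m = y * f(x).
    return max(0.0, 1.0 - m)

def zero_one(m):
    # 0-1 loss: 1 for a misclassification (negative margin), else 0.
    return 1.0 if m < 0 else 0.0

# Hinge dominates the 0-1 loss at every margin, so minimizing it
# minimizes a convex upper bound on the training error.
margins = [x / 10 for x in range(-30, 31)]
assert all(hinge(m) >= zero_one(m) for m in margins)
```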
NISER
niser.ac.in › ~smishra › teach › cs460 › 23cs460 › lectures › lec11.pdf pdf
HINGE LOSS IN SUPPORT VECTOR MACHINES Chandan Kumar Sahu and Maitrey Sharma
February 7, 2023 - Figure: the support vector loss function (hinge loss), compared to the negative log-likelihood loss (binomial deviance) for logistic regression, squared-error loss, and a “Huberized” version of the squared hinge loss.
NeurIPS
papers.neurips.cc › paper › 1610-linear-hinge-loss-and-average-margin.pdf
Linear Hinge Loss and Average Margin
Towards Data Science
towardsdatascience.com › home › latest › a definitive explanation to hinge loss for support vector machines.
A definitive explanation to Hinge Loss for Support Vector Machines. | Towards Data Science
January 23, 2025 - We see that correctly classified points will have a small (or zero) loss, while incorrectly classified instances will have a high loss. A negative distance from the boundary incurs a high hinge loss.
Top answer (1 of 3)

Logarithmic loss minimization leads to well-behaved probabilistic outputs.

Hinge loss leads to some (not guaranteed) sparsity in the dual, but it doesn't help at probability estimation. Instead, it punishes misclassifications (that's why it's so useful for determining margins): diminishing hinge loss comes with diminishing margin violations and misclassifications.

So, summarizing:

  • Logarithmic loss ideally leads to better probability estimation at the cost of not actually optimizing for accuracy

  • Hinge loss ideally leads to better accuracy and some sparsity at the cost of not actually estimating probabilities

In ideal scenarios, each respective method would excel in their domain (accuracy vs probability estimation). However, due to the No-Free-Lunch Theorem, it is not possible to know, a priori, if the model choice is optimal.

Answer 2 of 3

@Firebug had a good answer (+1). In fact, I had a similar question here.

What are the impacts of choosing different loss functions in classification to approximate 0-1 loss

I just want to add more on another big advantage of the logistic loss: its probabilistic interpretation. An example can be found in UCLA - Advanced Research - Statistical Methods and Data Analysis - Computing Logit Regression | R Data Analysis Examples

Specifically, logistic regression is a classical model in the statistics literature. (See What does the name "Logistic Regression" mean? for the naming.) There are many important concepts related to the logistic loss, such as maximum likelihood estimation, likelihood ratio tests, and the assumptions on the binomial distribution. Here are some related discussions.

Likelihood ratio test in R

Why isn't Logistic Regression called Logistic Classification?

Is there i.i.d. assumption on logistic regression?

Difference between logit and probit models

Techkluster
techkluster.com › technology › hinge-loss-vs-logistic-loss
Differences Between Hinge Loss and Logistic Loss – TechKluster
Hinge loss increases linearly with the margin between the prediction and the true label, and it is zero when the prediction is on the correct side of the margin. Logistic loss, also known as cross-entropy loss or log loss, is commonly used in logistic regression and other probabilistic ...
TTIC
home.ttic.edu › ~nati › Publications › RennieSrebroIJCAI05.pdf pdf
Loss Functions for Preference Levels: Regression with Discrete Ordered Labels
The hinge loss, as well as the smoothed hinge, introduce a linear dependence on the magnitude of the error, but such a linear (at least) dependence is unavoidable in a convex loss function. The modified least squares goes beyond this necessary dependence on the magnitude of the error, and introduces an unnecessary (from the point of view of convexity) quadratic dependence, further deviating from the zero/one margin error. Logistic regression ...
ScienceDirect
sciencedirect.com › science › article › abs › pii › S0031320320301989
Robust twin support vector regression based on rescaled Hinge loss - ScienceDirect
April 28, 2020 - In this work, with the help of the rescaled Hinge loss, we propose a twin support vector regression (TSVR) model that is robust to noise. The corresponding optimization problem turns out to be non-convex with smooth l2 regularizer. To solve the problem efficiently, we convert it to its dual form, thereby transforming it into a convex optimization problem.