In machine learning, a loss function used for maximum-margin classification
In machine learning, the hinge loss is a loss function used for training classifiers. The hinge loss is used for "maximum-margin" classification, most notably for support vector machines (SVMs). For an intended …
Wikipedia
en.wikipedia.org › wiki › Hinge_loss
Hinge loss - Wikipedia
January 26, 2026 - The hinge loss is a convex function, so many of the usual convex optimizers used in machine learning can work with it.
Carnegie Mellon University
cs.cmu.edu › ~yandongl › loss.html
Loss Function
Square loss: $\min_\theta \sum_i||y^{(i)}-\theta^Tx^{(i)}||^2$ Fortunately, hinge loss, logistic loss and square loss are all convex functions.
Baeldung
baeldung.com › home › artificial intelligence › machine learning › differences between hinge loss and logistic loss
Differences Between Hinge Loss and Logistic Loss | Baeldung on Computer Science
February 28, 2025 - One of the main characteristics of hinge loss is that it’s a convex function. This makes it different from other losses such as the 0-1 loss. With convexity comes the existence of a global optimum.
ScienceDirect
sciencedirect.com › topics › engineering › hinge-loss-function
Hinge Loss Function - an overview | ScienceDirect Topics
Deep neural networks with several hidden layers and/or the inclusion of non-linear activation functions (discussed in Section 16.2.2) typically have loss landscapes (i.e., a surface in some high-dimensional space defined by the loss function) that are highly non-convex.
UBC Computer Science
cs.ubc.ca › ~schmidtm › Courses › 340-F17 › L21.pdf
CPSC 340: Machine Learning and Data Mining More Linear Classifiers Fall 2017
• This is called the hinge loss. – It's convex: max(constant, linear). – It's not degenerate: w=0 now gives an error of 1 instead of 0. Hinge Loss: Convex Approximation to 0-1 Loss
Davidrosenberg
davidrosenberg.github.io › mlcourse › Archive › 2016 › Homework › hw6-multiclass › hw6.pdf
DS-GA 1003: Machine Learning and Computational Statistics
But to solve our machine learning ... we’re talking about. ... we have a linear hypothesis space. We’ll start with a special case, that the hinge loss is a convex...
arXiv
arxiv.org › pdf › 2103.00233
Learning with Smooth Hinge Losses — Junru Luo, Hong Qiao, and Bo Zhang
SVMs with different convex loss functions and then introduce the smooth Hinge loss functions ψG(α; σ), ψM(α; σ). The general smooth convex loss function ψ(α) is then presented and discussed in Section 3. In Section 4, we give the smooth support vector machine by replacing the ...
Core
files01.core.ac.uk › download › pdf › 213011306.pdf
From Convex to Nonconvex: a Loss Function Analysis for Binary Classification
convex [8]. The main advantage of this type of loss function is its computational simplicity: complex global optimization approaches can be avoided. Square loss and hinge loss are the most commonly adopted loss functions in machine learning.
Gabormelli
gabormelli.com › RKB › Hinge-Loss_Function
Hinge-Loss Function - GM-RKB - Gabor Melli
The hinge loss function is defined as $V(f(\vec{x}), y) = \max(0, 1 - yf(\vec{x})) = |1 - yf(\vec{x})|_{+}$. The hinge loss provides a relatively tight, convex upper bound on the 0–1 indicator function. Specifically, the hinge loss equals the 0–1 indicator function ...
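Read literally, the definition above translates into a one-line function. A minimal sketch (the numeric examples are illustrative, not taken from the source):

```python
def hinge_loss(score, y):
    """Hinge loss V(f(x), y) = max(0, 1 - y*f(x)) for a raw score f(x)
    and a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)

# Correct and confident (margin >= 1): zero loss, matching the 0-1 indicator.
print(hinge_loss(2.0, +1))   # 0.0
# Correct but inside the margin: small positive loss.
print(hinge_loss(0.5, +1))   # 0.5
# Misclassified: loss grows linearly with the violation.
print(hinge_loss(-1.0, +1))  # 2.0
```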
Tel Aviv University
tau.ac.il › ~mansour › advanced-agt+ml › scribe2_covex_func.pdf
Advanced Topics in Machine Learning and Algorithmic Game Theory
November 7, 2011 - hyperplane (essentially, a hypothesis), x ∈ X and y ∈ [−1, 1]. The hinge loss is a maximum of linear functions and therefore convex.
Wikipedia
en.wikipedia.org › wiki › Loss_functions_for_classification
Loss functions for classification - Wikipedia
January 12, 2026 - The hinge loss provides a relatively tight, convex upper bound on the 0–1 indicator function. Specifically, the hinge loss equals the 0–1 indicator function when $|yf(\vec{x})| \geq 1$. In addition, the empirical risk ...
Shadecoder
shadecoder.com › topics › hinge-loss-a-comprehensive-guide-for-2025
Hinge Loss: A Comprehensive Guide for 2025 - Shadecoder - 100% Invisibile AI Coding Interview Copilot
• Focuses on hard examples: Because correctly classified examples with a sufficient margin receive zero loss, training emphasizes borderline or misclassified points, which can accelerate learning where it matters most. • Convexity for linear models: For linear classifiers, hinge loss is convex, ...
Quora
quora.com › What-is-a-rigorous-proof-that-the-hinge-loss-is-a-convex-loss-function
What is a rigorous proof that the hinge loss is a convex loss function? - Quora
Answer: The hinge loss is the maximum of two linear functions, so you can prove it in two steps: 1. Any linear function is convex. 2. The maximum of two convex functions is convex.
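That two-step argument can be spot-checked numerically by sampling the convexity inequality on random pairs of margins. This is only an illustrative check, not a proof:

```python
import random

def hinge(z):
    # Hinge loss as a function of the margin z = y * f(x):
    # the maximum of the two linear functions 0 and 1 - z.
    return max(0.0, 1.0 - z)

random.seed(0)
for _ in range(10_000):
    a, b = random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0)
    t = random.random()
    # Convexity: h(t*a + (1-t)*b) <= t*h(a) + (1-t)*h(b).
    assert hinge(t * a + (1 - t) * b) <= t * hinge(a) + (1 - t) * hinge(b) + 1e-12
print("convexity inequality held at every sampled point")
```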
Texas A&M University
people.tamu.edu › ~sji › classes › loss-slides.pdf
A Unified View of Loss Functions in Supervised Learning Shuiwang Ji
are convex. Note that the hinge loss and perceptron loss are not strictly convex. Comparison of different loss functions in a unified view.
Grokipedia
grokipedia.com › hinge loss
Hinge loss — Grokipedia
January 14, 2026 - Hinge loss is a convex loss function primarily used in binary classification tasks within machine learning to penalize predictions that are incorrect or positioned too close to the decision boundary, thereby promoting robust separation between ...
Top answer
1 of 3

Some of my thoughts; they may not be correct, though.

I understand the reason we have such a design (for hinge and logistic loss) is that we want the objective function to be convex.

Convexity is surely a nice property, but I think the most important reason is we want the objective function to have non-zero derivatives, so that we can make use of the derivatives to solve it. The objective function can be non-convex, in which case we often just stop at some local optima or saddle points.

And interestingly, it also penalizes correctly classified instances if they are weakly classified. It is a really strange design.

I think such a design sort of advises the model to not only make the right predictions, but also be confident about the predictions. If we don't want correctly classified instances to get punished, we can, for example, move the hinge loss (blue) to the left by 1, so that they no longer get any loss. But I believe this often leads to worse results in practice.

what are the prices we need to pay by using different "proxy loss functions", such as hinge loss and logistic loss?

IMO, by choosing different loss functions we are bringing different assumptions to the model. For example, the logistic regression loss (red) assumes a Bernoulli distribution, and the MSE loss (green) assumes Gaussian noise.


Following the least squares vs. logistic regression example in PRML, I added the hinge loss for comparison.

As shown in the figure, hinge loss and logistic regression / cross entropy / log-likelihood / softplus have very close results, because their objective functions are close (figure below), while MSE is generally more sensitive to outliers. Hinge loss does not always have a unique solution because it's not strictly convex.
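The closeness of hinge and logistic loss, and the outlier sensitivity of squared error, can be seen by tabulating all three as functions of the margin z = y·f(x). A small illustrative sketch (computed values, not the PRML figure itself):

```python
import math

def hinge(z):    return max(0.0, 1.0 - z)
def logistic(z): return math.log(1.0 + math.exp(-z))  # log loss, a.k.a. softplus(-z)
def square(z):   return (1.0 - z) ** 2                # squared error on the margin

for z in (-2.0, -1.0, 0.0, 1.0, 2.0, 5.0):
    print(f"z={z:+.0f}  hinge={hinge(z):6.3f}  logistic={logistic(z):6.3f}  square={square(z):6.3f}")
```

For negative margins, hinge and logistic both grow roughly linearly while square grows quadratically; square also keeps penalizing confidently correct points (z = 5 gives 16), which is one way to see its sensitivity to points far from the boundary.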

However one important property of hinge loss is, data points far away from the decision boundary contribute nothing to the loss, the solution will be the same with those points removed.

The remaining points are called support vectors in the context of SVM. SVM additionally uses a regularizer term to ensure the maximum-margin property and a unique solution.
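That zero-contribution property is easy to verify directly: points with margin ≥ 1 add nothing to either the hinge loss or its subgradient, so deleting them leaves both unchanged. A minimal sketch with made-up numbers:

```python
def hinge_objective(w, data):
    """Total hinge loss and one subgradient w.r.t. w, for a linear scorer w·x."""
    loss, grad = 0.0, [0.0] * len(w)
    for x, y in data:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        if margin < 1.0:  # only points inside the margin contribute
            loss += 1.0 - margin
            grad = [gi - y * xi for gi, xi in zip(grad, x)]
    return loss, grad

w = [1.0, -1.0]
near = [([0.5, 0.2], +1), ([-0.1, 0.4], -1)]    # margins 0.3 and 0.5: active
far  = [([5.0, -5.0], +1), ([-4.0, 6.0], -1)]   # margins of 10: inactive
print(hinge_objective(w, near) == hinge_objective(w, near + far))  # True
```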

2 of 3

Posting a late reply, since there is a very simple answer which has not been mentioned yet.

what are the prices we need to pay by using different "proxy loss functions", such as hinge loss and logistic loss?

When you replace the non-convex 0-1 loss function by a convex surrogate (e.g. the hinge loss), you are actually now solving a different problem than the one you intended to solve (which is to minimize the number of classification mistakes). So you gain computational tractability (the problem becomes convex, meaning you can solve it efficiently using tools of convex optimization), but in the general case there is actually no way to relate the error of the classifier that minimizes a "proxy" loss and the error of the classifier that minimizes the 0-1 loss. If what you truly cared about was minimizing the number of misclassifications, I argue that this really is a big price to pay.

I should mention that this statement is worst-case, in the sense that it holds for any distribution $\mathcal D$. For some "nice" distributions, there are exceptions to this rule. The key example is of data distributions that have large margins w.r.t the decision boundary - see Theorem 15.4 in Shalev-Shwartz, Shai, and Shai Ben-David. Understanding machine learning: From theory to algorithms. Cambridge university press, 2014.

Aiml
aiml.com › home › posts › machine learning interview questions › supervised learning › classification › support vector machine › how does hinge loss differ from logistic loss?
How does hinge loss differ from logistic loss? - AIML.com
March 26, 2023 - This property is one of the reasons SVM performs very well on many data sets, as it enables hyperplanes to find margins that result in the highest accuracy possible. As can be seen in the graphs above, hinge loss is non-differentiable at the hinge point, which means the optimization problem is no longer smooth (though it remains convex).
ScienceDirect
sciencedirect.com › science › article › abs › pii › S0925231221012509
Learning with smooth Hinge losses - ScienceDirect
August 18, 2021 - Although first-order methods are ... Motivated by the proposed smooth Hinge losses, we also propose a general smooth convex loss function...
Top answer
1 of 1

Here's my attempt to answer your questions:

  • Is an SVM as simple as saying it's a discriminative classifier that simply optimizes the hinge loss? Or is it more complex than that? Yes, you can say that. Also, don't forget that it regularizes the model too. I wouldn't say SVM is more complex than that, however, it is important to mention that all of those choices (e.g. hinge loss and $L_2$ regularization) have precise mathematical interpretations and are not arbitrary. That's what makes SVMs so popular and powerful. For example, hinge loss is a continuous and convex upper bound to the task loss which, for binary classification problems, is the $0/1$ loss. Note that $0/1$ loss is non-convex and discontinuous. Convexity of hinge loss makes the entire training objective of SVM convex. The fact that it is an upper bound to the task loss guarantees that the minimizer of the bound won't have a bad value on the task loss. $L_2$ regularization can be geometrically interpreted as the size of the margin.

  • How do the support vectors come into play? Support vectors play an important role in training SVMs. They identify the separating hyperplane. Let $D$ denote a training set and $SV(D) \subseteq D$ be the set of support vectors that you get by training an SVM on $D$ (assume all hyperparameters are fixed a priori). If we throw out all the non-SV samples from $D$ and train another SVM (with the same hyperparameter values) on the remaining samples (i.e. on $SV(D)$) we get the same exact classifier as before!

  • What about the slack variables? SVM was originally designed for problems where there exists a separating hyperplane (i.e. a hyperplane that perfectly separates the training samples from the two classes), and the goal was to find, among all separating hyperplanes, the hyperplane with the largest margin. The margin, denoted by $d(w, D)$, is defined for a classifier $w$ and a training set $D$. Assuming $w$ perfectly separates all the examples in $D$, we have $d(w, D) = \min_{(x, y) \in D} y \frac{w^Tx}{||w||_2}$, which is the distance of the closest training example from the separating hyperplane $w$. Note that $y \in \{+1, -1\}$ here. The introduction of slack variables made it possible to train SVMs on problems where either 1) a separating hyperplane does not exist (i.e. the training data is not linearly separable), or 2) you are happy to (or would like to) sacrifice making some error (higher bias) for better generalization (lower variance). However, this comes at the price of breaking some of the concrete mathematical and geometric interpretations of SVMs without slack variables (e.g. the geometrical interpretation of the margin).

  • Why can't you have deep SVM's? SVM objective is convex. More precisely, it is piecewise quadratic; that is because the $L_2$ regularizer is quadratic and the hinge loss is piecewise linear. The training objectives in deep hierarchical models, however, are much more complex. In particular, they are not convex. Of course, one can design a hierarchical discriminative model with hinge loss and $L_2$ regularization etc., but, it wouldn't be called an SVM. In fact, the hinge loss is commonly used in DNNs (Deep Neural Networks) for classification problems.
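The pieces discussed in the answer above — the convex training objective (L2 regularizer plus hinge loss) and the geometric margin — fit in a few lines of plain-Python subgradient descent. This is an illustrative sketch on made-up data, not a production SVM solver:

```python
import math

def train_linear_svm(X, Y, lam=0.01, lr=0.05, epochs=200):
    """Full-batch subgradient descent on the convex SVM objective
    (lam/2)*||w||^2 + (1/n) * sum_i max(0, 1 - y_i * w·x_i)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        g = [lam * wi for wi in w]  # gradient of the L2 regularizer
        for x, y in zip(X, Y):
            if y * sum(wi * xi for wi, xi in zip(w, x)) < 1.0:
                g = [gi - y * xi / n for gi, xi in zip(g, x)]
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

def margin(w, X, Y):
    """Geometric margin d(w, D) = min_i y_i * (w·x_i) / ||w||_2
    (positive iff w separates the data)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * sum(wi * xi for wi, xi in zip(w, x)) / norm
               for x, y in zip(X, Y))

# Tiny linearly separable toy set (hypothetical data).
X = [[2.0, 1.0], [1.5, 2.0], [-1.0, -2.0], [-2.0, -0.5]]
Y = [+1, +1, -1, -1]
w = train_linear_svm(X, Y)
assert all((sum(wi * xi for wi, xi in zip(w, x)) > 0) == (y > 0)
           for x, y in zip(X, Y))
print("training accuracy 100%, geometric margin:", margin(w, X, Y))
```

Because the objective is convex, this simple first-order method suffices; the same loss dropped into a deep network would, as noted above, make the objective non-convex even though each hinge term stays piecewise linear.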