In machine learning, the hinge loss is a loss function used for training classifiers. The hinge loss is used for "maximum-margin" classification, most notably for support vector machines (SVMs). For an intended … (Wikipedia)
Wikipedia
en.wikipedia.org › wiki › Hinge_loss
Hinge loss - Wikipedia
January 26, 2026 - The hinge loss is a convex function, so many of the usual convex optimizers used in machine learning can work with it.
Quora
quora.com › What-is-a-rigorous-proof-that-the-hinge-loss-is-a-convex-loss-function
What is a rigorous proof that the hinge loss is a convex loss function? - Quora
Answer: The hinge loss is the maximum of two linear functions, so you can prove it in two steps: 1. Any linear function is convex. 2. The maximum of two convex functions is convex.
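The two-step argument can be written out as a short derivation; the margin shorthand $m = y\,f(x)$ is my notation, not taken from the snippet:

```latex
\[
  H(m) = \max(0,\, 1 - m), \qquad m = y\,f(x).
\]
% Step 1: both pieces are affine, hence convex.
Both $g_1(m) = 0$ and $g_2(m) = 1 - m$ are affine, hence convex.
% Step 2: a pointwise maximum of convex functions is convex.
For $t \in [0,1]$ and any $m_1, m_2$, each $g_i$ satisfies
\[
  g_i\big(t m_1 + (1-t) m_2\big)
  \le t\, g_i(m_1) + (1-t)\, g_i(m_2)
  \le t \max_j g_j(m_1) + (1-t) \max_j g_j(m_2),
\]
and taking the maximum over $i$ on the left preserves the bound, so
$\max(g_1, g_2)$ is convex; in particular $H$ is convex.
```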
UBC Computer Science
cs.ubc.ca › ~schmidtm › Courses › 340-F17 › L21.pdf pdf
CPSC 340: Machine Learning and Data Mining More Linear Classifiers Fall 2017
• This is called the hinge loss. – It’s convex: max(constant, linear). – It’s not degenerate: w=0 now gives an error of 1 instead of 0. Hinge Loss: Convex Approximation to 0-1 Loss
Carnegie Mellon University
cs.cmu.edu › ~yandongl › loss.html
Loss Function
0/1 loss: $\min_\theta\sum_i L_{0/1}(\theta^Tx)$. We define $L_{0/1}(\theta^Tx) = 1$ if $y\cdot f \lt 0$, and $=0$ o.w. Non-convex and very hard to optimize. Hinge loss: approximate the 0/1 loss by $\min_\theta\sum_i H(\theta^Tx)$. We define $H(\theta^Tx) = \max(0, 1 - y\cdot f)$. Apparently $H$ ...
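The relationship between the two losses above can be checked directly; the function names here are illustrative, not from the sources:

```python
# Sketch: comparing the 0-1 loss with its hinge surrogate on a few
# margins m = y * f(x). Matches the definitions in the snippet above.

def zero_one_loss(margin):
    """1 if the prediction is wrong (margin < 0), else 0."""
    return 1.0 if margin < 0 else 0.0

def hinge_loss(margin):
    """max(0, 1 - margin): zero only once the margin reaches 1."""
    return max(0.0, 1.0 - margin)

# The hinge loss upper-bounds the 0-1 loss at every margin,
# which is what makes it a usable convex surrogate.
for m in [-2.0, -0.5, 0.0, 0.5, 1.0, 3.0]:
    assert hinge_loss(m) >= zero_one_loss(m)
```

Note that the hinge loss keeps penalizing correct-but-low-confidence predictions (0 < margin < 1), which is exactly where it differs from the 0/1 loss.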
arXiv
arxiv.org › pdf › 2103.00233 pdf
Learning with Smooth Hinge Losses Junru Luo ∗, Hong Qiao †, and Bo Zhang ‡
loss functions ψG(α; σ), ψM(α; σ). The general smooth convex loss function ψ(α) is then presented and discussed in Section 3. In Section 4, we give the smooth support vector machine by replacing the Hinge loss with the smooth Hinge
Davidrosenberg
davidrosenberg.github.io › mlcourse › Archive › 2016 › Homework › hw6-multiclass › hw6.pdf pdf
Generalized Hinge Loss and Multiclass SVM
we will eventually need our loss function to be a convex function of some w ∈ Rd that parameterizes our hypothesis space. It’ll be clear in what follows what we’re talking about. ... we have a linear hypothesis space. We’ll start with a special case, that the hinge loss is a convex
arXiv
arxiv.org › abs › 2103.00233
[2103.00233] Learning with Smooth Hinge Losses
March 15, 2021 - In this paper, we introduce two smooth Hinge losses $\psi_G(\alpha;\sigma)$ and $\psi_M(\alpha;\sigma)$ which are infinitely differentiable and converge to the Hinge loss uniformly in $\alpha$ as $\sigma$ tends to $0$. By replacing the Hinge loss with these two smooth Hinge losses, we obtain two smooth support vector machines (SSVMs), respectively. Solving the SSVMs with the Trust Region Newton method (TRON) leads to two quadratically convergent algorithms. Experiments in text classification tasks show that the proposed SSVMs are effective in real-world applications. We also introduce a general smooth convex loss function to unify several commonly-used convex loss functions in machine learning.
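The abstract does not reproduce the paper's exact $\psi_G$ and $\psi_M$; as a sketch of the general idea, here is one standard infinitely differentiable smoothing (a scaled softplus, my choice rather than the paper's definitions) that converges uniformly to the hinge loss as $\sigma \to 0$:

```python
import math

def hinge(margin):
    return max(0.0, 1.0 - margin)

def smooth_hinge(margin, sigma):
    """Softplus-smoothed hinge: sigma * log(1 + exp((1 - margin) / sigma)).

    Infinitely differentiable in the margin, and within sigma * log(2)
    of the true hinge loss everywhere, so it converges uniformly as
    sigma -> 0. (Illustrative smoothing only; not the paper's psi_G/psi_M.)
    """
    z = (1.0 - margin) / sigma
    # Stable evaluation of sigma * log(1 + exp(z)) for large |z|.
    if z > 0:
        return sigma * (z + math.log1p(math.exp(-z)))
    return sigma * math.log1p(math.exp(z))

# The approximation error shrinks with sigma, uniformly over margins.
for m in [-2.0, 0.0, 1.0, 4.0]:
    for s in [1.0, 0.1, 0.01]:
        assert abs(smooth_hinge(m, s) - hinge(m)) <= s * math.log(2) + 1e-12
```

Smoothness is what enables the second-order (Newton-type) solvers the abstract mentions: the plain hinge has no Hessian at the kink, while a smoothed version is twice differentiable everywhere.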
Davidrosenberg
davidrosenberg.github.io › mlcourse › Archive › 2017 › Homework › hw5.pdf pdf
Homework 5: Generalized Hinge Loss and Multiclass SVM
New homework on multiclass hinge loss and multiclass SVM · New homework on Bayesian methods, specifically the beta-binomial model, hierarchical models, empirical Bayes ML-II, MAP-II · New short lecture on correlated variables with L1, L2, and Elastic Net regularization · Added some details about subgradient methods, including a one-slide proof that subgradient descent moves us towards a minimizer of a convex ...
JMLR
jmlr.org › papers › v9 › bartlett08a.html
Classification with a Reject Option using a Hinge Loss
Just as in the conventional classification problem, minimization of the sample average of the cost is a difficult optimization problem. As an alternative, we propose the optimization of a certain convex loss function φ, analogous to the hinge loss used in support vector machines (SVMs).
arXiv
arxiv.org › abs › 1309.6813
[1309.6813] Hinge-loss Markov Random Fields: Convex Inference for Structured Prediction
September 26, 2013 - Graphical models for structured domains are powerful tools, but the computational complexities of combinatorial prediction spaces can force restrictions on models, or require approximate inference in order to be tractable. Instead of working in a combinatorial space, we use hinge-loss Markov random fields (HL-MRFs), an expressive class of graphical models with log-concave density functions over continuous variables, which can represent confidences in discrete predictions.
Gabormelli
gabormelli.com › RKB › Hinge-Loss_Function
Hinge-Loss Function - GM-RKB - Gabor Melli
The hinge loss function is defined as $V(f(\vec{x}),y) = \max(0, 1-yf(\vec{x})) = |1 - yf(\vec{x})|_{+}$. The hinge loss provides a relatively tight, convex upper bound on the 0–1 indicator function. Specifically, the hinge loss equals the 0–1 indicator function ...
ScienceDirect
sciencedirect.com › topics › engineering › hinge-loss-function
Hinge Loss Function - an overview | ScienceDirect Topics
The hinge loss encourages the network to maximize the margin around the decision boundary separating the two classes, which can lead to better generalization performance than using cross-entropy. Additionally, the hinge loss has sparse gradients, which can be useful for training large models with limited memory (unlike cross-entropy with dense gradients). A frequently used variant of the hinge loss is the squared hinge loss, given by
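The snippet's formula is cut off; the commonly used squared hinge, $\max(0, 1-yf)^2$, is presumably what it goes on to define. A minimal sketch, assuming that standard form:

```python
def hinge(margin):
    return max(0.0, 1.0 - margin)

def squared_hinge(margin):
    """Standard squared hinge, max(0, 1 - margin)**2 (assumed form; the
    source snippet is truncated before its definition).

    Same zero region as the hinge, but the kink at margin = 1 is squared
    away, making the loss differentiable everywhere, and large margin
    violations are penalized quadratically rather than linearly.
    """
    return hinge(margin) ** 2

# Differentiable at margin = 1: both one-sided slopes vanish there.
eps = 1e-6
left = (squared_hinge(1.0 - eps) - squared_hinge(1.0)) / eps
right = (squared_hinge(1.0) - squared_hinge(1.0 + eps)) / eps
assert abs(left) < 1e-5 and abs(right) < 1e-5
```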
arXiv
arxiv.org › abs › 1512.07797
[1512.07797] The Lovász Hinge: A Novel Convex Surrogate for Submodular Losses
May 15, 2017 - We propose instead a novel surrogate ... loss function to compute a gradient or cutting-plane. We prove that the Lovász hinge is convex and yields an extension....
Top answer

Here's my attempt to answer your questions:

  • Is an SVM as simple as saying it's a discriminative classifier that simply optimizes the hinge loss? Or is it more complex than that? Yes, you can say that. Also, don't forget that it regularizes the model. I wouldn't say SVM is more complex than that; however, it is important to mention that all of those choices (e.g. hinge loss and $L_2$ regularization) have precise mathematical interpretations and are not arbitrary. That's what makes SVMs so popular and powerful. For example, hinge loss is a continuous and convex upper bound on the task loss which, for binary classification problems, is the $0/1$ loss. Note that $0/1$ loss is non-convex and discontinuous. Convexity of hinge loss makes the entire training objective of SVM convex. The fact that it is an upper bound on the task loss guarantees that the minimizer of the bound also keeps the task loss under control: the task loss can never exceed the surrogate value achieved. $L_2$ regularization can be geometrically interpreted as controlling the size of the margin.

  • How do the support vectors come into play? Support vectors play an important role in training SVMs. They identify the separating hyperplane. Let $D$ denote a training set and $SV(D) \subseteq D$ be the set of support vectors that you get by training an SVM on $D$ (assume all hyperparameters are fixed a priori). If we throw out all the non-SV samples from $D$ and train another SVM (with the same hyperparameter values) on the remaining samples (i.e. on $SV(D)$) we get the same exact classifier as before!

  • What about the slack variables? SVM was originally designed for problems where there exists a separating hyperplane (i.e. a hyperplane that perfectly separates the training samples from the two classes), and the goal was to find, among all separating hyperplanes, the hyperplane with the largest margin. The margin, denoted by $d(w, D)$, is defined for a classifier $w$ and a training set $D$. Assuming $w$ perfectly separates all the examples in $D$, we have $d(w, D) = \min_{(x, y) \in D} y \frac{w^Tx}{||w||_2}$, which is the distance of the closest training example from the separating hyperplane $w$. Note that $y \in \{+1, -1\}$ here. The introduction of slack variables made it possible to train SVMs on problems where either 1) a separating hyperplane does not exist (i.e. the training data is not linearly separable), or 2) you are happy to (or would like to) sacrifice making some error (higher bias) for better generalization (lower variance). However, this comes at the price of breaking some of the concrete mathematical and geometric interpretations of SVMs without slack variables (e.g. the geometrical interpretation of the margin).

  • Why can't you have deep SVMs? The SVM objective is convex. More precisely, it is piecewise quadratic; that is because the $L_2$ regularizer is quadratic and the hinge loss is piecewise linear. The training objectives in deep hierarchical models, however, are much more complex. In particular, they are not convex. Of course, one can design a hierarchical discriminative model with hinge loss and $L_2$ regularization etc., but it wouldn't be called an SVM. In fact, the hinge loss is commonly used in DNNs (deep neural networks) for classification problems.
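The answer's description of the objective (quadratic $L_2$ regularizer plus piecewise-linear hinge loss, which subgradient methods can minimize) can be sketched as a tiny linear SVM. The toy data, learning rate, and regularization strength below are illustrative assumptions, not taken from any of the sources:

```python
# Minimal linear SVM trained by subgradient descent on the objective
#   (lam/2) * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * (w . x_i + b)).

def train_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0  # gradient of the L2 term
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge active: subgradient is -y_i * x_i
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Linearly separable toy data: +1 around (2, 2), -1 around (-2, -2).
X = [[2.0, 2.0], [3.0, 1.0], [2.5, 3.0],
     [-2.0, -2.0], [-3.0, -1.0], [-2.5, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_svm(X, y)
assert all(predict(w, b, xi) == yi for xi, yi in zip(X, y))
```

This is plain subgradient descent rather than a dedicated QP solver, so it recovers the classifier but not the dual variables that identify the support vectors; it is meant only to show that the convex hinge + $L_2$ objective is directly optimizable.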

Massachusetts Institute of Technology
mit.edu › ~9.520 › spring07 › Classes › svmwithfenchel.pdf pdf
Several Views of Support Vector Machines Ryan M. Rifkin
Unfortunately, the 0-1 loss is not convex. Therefore, we have little hope of being able to optimize this loss function in practice. (Note that the representer theorem does hold ... This is (basically) an SVM. So what? How will you solve this problem (find the minimizing y)? The hinge ...
Stack Exchange
stats.stackexchange.com › questions › 520792 › hinge-loss-is-the-tightest-convex-upper-bound-on-the-0-1-loss
svm - Hinge loss is the tightest convex upper bound on the 0-1 loss - Cross Validated
April 21, 2021 - I have read many times that the hinge loss is the tightest convex upper bound on the 0-1 loss (e.g. here, here and here). However, I have never seen a formal proof of this statement. How can we for...