🌐
Analytics Vidhya
analyticsvidhya.com › home › what is hinge loss in machine learning?
What is Hinge loss in Machine Learning?
December 23, 2024 - Hinge loss in machine learning, a key loss function in SVMs, enhances model robustness by penalizing incorrect or marginal predictions.
in machine learning, a loss function used for maximum‐margin classification
In machine learning, the hinge loss is a loss function used for training classifiers. The hinge loss is used for "maximum-margin" classification, most notably for support vector machines (SVMs). For an intended … Wikipedia
🌐
Wikipedia
en.wikipedia.org › wiki › Hinge_loss
Hinge loss - Wikipedia
January 26, 2026 - For an intended output t = ±1 and a classifier score y, the hinge loss of the prediction y is defined as ℓ(y) = max(0, 1 − t·y). Note that y should be the "raw" output of the classifier's decision function, not the predicted class label. For instance, in linear SVMs, y = wᵀx + b, where (w, b) are the hyperplane's parameters and x is the input. The loss is nonzero whenever |y| < 1, even if y has the same sign as t (a correct prediction, but not by enough margin). The hinge loss is not a proper scoring rule.
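In code, the definition above amounts to a one-liner; a minimal Python sketch (the helper name is hypothetical):

```python
def hinge_loss(t, y):
    """Hinge loss for a true label t in {-1, +1} and a raw classifier score y."""
    return max(0.0, 1.0 - t * y)

# Correct and confident (t*y >= 1): zero loss.
print(hinge_loss(+1, 2.0))   # 0.0
# Correct but inside the margin (0 < t*y < 1): small positive loss.
print(hinge_loss(+1, 0.4))   # 0.6
# Misclassified (t*y < 0): loss grows linearly with the violation.
print(hinge_loss(-1, 1.5))   # 2.5
```

Note that the middle case is exactly the "correct prediction, but not by enough margin" situation the snippet describes.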
🌐
Medium
koshurai.medium.com › understanding-hinge-loss-in-machine-learning-a-comprehensive-guide-0a1c82478de4
Understanding Hinge Loss in Machine Learning: A Comprehensive Guide | by KoshurAI | Medium
January 12, 2024 - One common task in machine learning is classification, where the goal is to assign a label to a given input. To optimize the performance of these models, it is essential to choose an appropriate loss function. Hinge loss is one such function that is commonly used in classification problems, especially in the context of support vector machines (SVM).
🌐
Medium
medium.com › analytics-vidhya › understanding-loss-functions-hinge-loss-a0ff112b40a1
Understanding loss functions : Hinge loss | by Kunal Chowdhury | Analytics Vidhya | Medium
January 18, 2024 - Looking at the graph for SVM in Fig 4, we can see that for yf(x) ≥ 1, hinge loss is ‘0’. However, when yf(x) < 1, then hinge loss increases massively.
🌐
Taylor & Francis
taylorandfrancis.com › knowledge › Engineering_and_technology › Engineering_support_and_special_topics › Hinge_loss
Hinge loss – Knowledge and References - Taylor & Francis
Hinge loss is typically non-differentiable and can be expressed as loss = max(0, 1 − y_true × y_pred), where y_true values are expected to be −1 or 1. From: Handbook of Big Data [2019], Effective Processing of Convolutional Neural Networks for Computer Vision: A Tutorial and Survey [2022], Statistical Learning with Sparsity [2019], High-Performance Medical Image Processing [2022]
🌐
arXiv
arxiv.org › abs › 2103.00233
[2103.00233] Learning with Smooth Hinge Losses
March 15, 2021 - In this paper, we introduce two smooth Hinge losses $\psi_G(\alpha;\sigma)$ and $\psi_M(\alpha;\sigma)$ which are infinitely differentiable and converge to the Hinge loss uniformly in $\alpha$ as $\sigma$ tends to $0$. By replacing the Hinge loss with these two smooth Hinge losses, we obtain two smooth support vector machines(SSVMs), respectively. Solving the SSVMs with the Trust Region Newton method (TRON) leads to two quadratically convergent algorithms. Experiments in text classification tasks show that the proposed SSVMs are effective in real-world applications. We also introduce a general smooth convex loss function to unify several commonly-used convex loss functions in machine learning.
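The paper's ψ_G and ψ_M losses are not reproduced here; as an illustration of the same idea, here is the classical quadratically smoothed ("Huberized") hinge, which is differentiable everywhere and converges to the hinge loss as σ → 0:

```python
def smooth_hinge(t, y, sigma=0.5):
    """Huberized hinge on the margin z = t*y: zero for z >= 1, linear for
    z <= 1 - sigma, and a quadratic blend in between. Recovers the plain
    hinge loss in the limit sigma -> 0."""
    z = t * y
    if z >= 1.0:
        return 0.0
    if z <= 1.0 - sigma:
        return (1.0 - z) - sigma / 2.0
    return (1.0 - z) ** 2 / (2.0 * sigma)
```

Unlike the paper's infinitely differentiable constructions, this variant is only once-differentiable, but it shows the common recipe: round off the kink at z = 1 while leaving the two linear regimes intact.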
🌐
GitHub
github.com › christianversloot › machine-learning-articles › blob › main › how-to-use-hinge-squared-hinge-loss-with-keras.md
machine-learning-articles/how-to-use-hinge-squared-hinge-loss-with-keras.md at main · christianversloot/machine-learning-articles
October 15, 2019 - In order to discover the ins and outs of the Keras deep learning framework, I'm writing blog posts about commonly used loss functions, subsequently implementing them with Keras to practice and to see how they behave. Today, we'll cover two closely related loss functions that can be used in neural networks - and hence in TensorFlow 2 based Keras - that behave similarly to how a Support Vector Machine generates a decision boundary for classification: the hinge ...
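A plain-Python sketch of what Keras-style `hinge` and `squared_hinge` compute for labels in {-1, +1}, assuming mean reduction over the batch (Keras also remaps 0/1 labels to -1/+1 first):

```python
def hinge(y_true, y_pred):
    """Mean hinge loss over a batch, with labels in {-1, +1}."""
    return sum(max(0.0, 1.0 - t * p) for t, p in zip(y_true, y_pred)) / len(y_true)

def squared_hinge(y_true, y_pred):
    """Mean squared hinge loss; squaring smooths the kink at the margin."""
    return sum(max(0.0, 1.0 - t * p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1, 1, -1, -1]
y_pred = [1.2, 0.3, -0.8, 0.1]
print(hinge(y_true, y_pred))          # mean of the per-sample losses 0, 0.7, 0.2, 1.1
print(squared_hinge(y_true, y_pred))  # mean of their squares
```

Squared hinge penalizes large margin violations more heavily and is differentiable at the hinge point, which is one reason it is sometimes preferred when training neural networks.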
Author   christianversloot
🌐
arXiv
arxiv.org › abs › 2202.02193
[2202.02193] Stochastic smoothing of the top-K calibrated hinge loss for deep imbalanced classification
July 18, 2022 - Yet, proposing top-K losses tailored for deep learning remains a challenge, both theoretically and practically. In this paper we introduce a stochastic top-K hinge loss inspired by recent developments on top-K calibrated losses. Our proposal is based on the smoothing of the top-K operator building ...
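The stochastic smoothing itself is the paper's contribution and is not reproduced here; the underlying (non-smooth) top-K hinge idea can be sketched as follows, with hypothetical names:

```python
def top_k_hinge(scores, correct, k=1, margin=1.0):
    """Non-smooth top-k hinge: penalize only if the correct class score does
    not beat the k-th largest competing score by the margin. (Illustrative
    sketch only; the paper replaces the top-k operator with a smoothed one.)"""
    rivals = sorted((s for i, s in enumerate(scores) if i != correct), reverse=True)
    kth_rival = rivals[k - 1]
    return max(0.0, margin + kth_rival - scores[correct])

scores = [2.0, 1.5, 0.2]                      # class 0 is the correct one
print(top_k_hinge(scores, correct=0, k=1))    # 0.5: best rival is within the margin
print(top_k_hinge(scores, correct=0, k=2))    # 0.0: correct class is in the top 2
```

With k = 1 this reduces to the usual multiclass (Crammer-Singer style) hinge; larger k only requires the true class to appear among the top-k predictions.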
🌐
ScienceDirect
sciencedirect.com › science › article › abs › pii › S0925231221012509
Learning with smooth Hinge losses - ScienceDirect
August 18, 2021 - On the other hand, zero–one loss and hinge loss that focus on the classification results are logical losses in classification tasks and have been widely applied in machine learning, as depicted in Fig. 1. Nevertheless, in deep learning, models with these loss functions are hard to optimize [11,12]. Therefore, although CE does not match the classification goal exactly in nature, it is still the most efficient loss function in neural network classification, yielding remarkable results.
🌐
Programmathically
programmathically.com › home › machine learning › classical machine learning › understanding hinge loss and the svm cost function
Understanding Hinge Loss and the SVM Cost Function - Programmathically
June 26, 2022 - The hinge loss is a specific type of cost function that incorporates a margin or distance from the classification boundary into the cost calculation. Even if new observations are classified correctly, they can incur a penalty if the margin from ...
🌐
Quora
quora.com › Why-wasnt-hinge-loss-commonly-used-in-a-neural-network
Why wasn't hinge loss commonly used in a neural network? - Quora
Answer (1 of 2): SVM classifiers use Hinge Loss. Softmax uses Cross-entropy loss. The differential comes to be one of generalized nature and differential in application of Interdimensional interplay in terms of Hyperdimensions. See, when we are to make clear and distinctive predictions in term...
🌐
ScienceDirect
sciencedirect.com › topics › engineering › hinge-loss-function
Hinge Loss Function - an overview | ScienceDirect Topics
Loss functions are chosen based on the nature of the learning/predictive task of interest, the characteristics of the training data available, the manner in which the target variables are represented/encoded, and whether it is necessary/desirable to constrain the optimization process in some way (i.e., regularization, discussed in Section 16.2.6). Estimation of the (at least locally) optimal parameters ... A brief overview of the most common types of loss functions used to train deep ...
🌐
Baeldung
baeldung.com › home › artificial intelligence › machine learning › differences between hinge loss and logistic loss
Differences Between Hinge Loss and Logistic Loss | Baeldung on Computer Science
February 28, 2025 - Between the margins, however, even if a sample’s prediction is correct, there’s still a small loss. This is to penalize the model for making less certain predictions. ... One of the main characteristics of hinge loss is that it’s a convex function. This makes it different from other losses such as the 0-1 loss.
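The contrast between the two losses is easy to see numerically; both are written below as functions of the signed margin m = t·y:

```python
import math

def hinge(m):
    """Hinge loss as a function of the signed margin m = t * y."""
    return max(0.0, 1.0 - m)

def logistic(m):
    """Logistic (log) loss on the same margin scale: log(1 + e^{-m})."""
    return math.log(1.0 + math.exp(-m))

for m in [-2.0, 0.0, 0.5, 1.0, 3.0]:
    print(f"margin={m:+.1f}  hinge={hinge(m):.3f}  logistic={logistic(m):.3f}")
```

The key difference: hinge loss is exactly zero for any margin m ≥ 1, while logistic loss is strictly positive everywhere and only approaches zero asymptotically.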
🌐
Towards Data Science
towardsdatascience.com › home › latest › a definitive explanation to hinge loss for support vector machines.
A definitive explanation to Hinge Loss for Support Vector Machines. | Towards Data Science
January 23, 2025 - We see that correctly classified points will have a small(or none) loss size, while incorrectly classified instances will have a high loss size. A negative distance from the boundary incurs a high hinge loss.
🌐
Reddit
reddit.com › r/machinelearning › why isn't cnns with hinge loss popular? can we call it a deep svm?
r/MachineLearning on Reddit: Why isn't CNNs with hinge loss popular? Can we call it a Deep SVM?
January 9, 2016 - CNNs with hinge loss are actually used sometimes; there are several papers about it. It's just that they are less "natural" for multiclass classification, as opposed to the 2-class case: you have to choose a strategy like one-vs-all or group-vs-group, without a clear indication of what's better. Even for 2 classes they are not overwhelmingly better. No, we can't call it deep ...
Top answer

I will answer one thing at a time.

Is an SVM as simple as saying it's a discriminative classifier that simply optimizes the hinge loss?

SVM is simply a linear classifier, optimizing hinge loss with L2 regularization.

Or is it more complex than that?

No, it is "just" that, however there are different ways of looking at this model leading to complex, interesting conclusions. In particular, this specific choice of loss function leads to extremely efficient kernelization, which is not true for log loss (logistic regression) nor mse (linear regression). Furthermore you can show very important theoretical properties, such as those related to Vapnik-Chervonenkis dimension reduction leading to smaller chance of overfitting.

Intuitively look at these three common losses:

  • hinge: max(0, 1-py)
  • log: -y log p
  • mse: (p-y)^2

Only the first one has the property that once something is classified correctly (with enough margin), it incurs 0 penalty. All the remaining ones still penalize your linear model even if it classifies samples correctly. Why? Because they are more related to regression than to classification: they want a perfect prediction, not just a correct one.
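This claim can be checked numerically for a confidently correct prediction (score p = 3 for true label y = 1), using the margin form of each loss; for MSE, p plays the role of a regression target:

```python
import math

y = 1      # true label
p = 3.0    # raw score: correct and well past the margin

print(max(0.0, 1.0 - p * y))              # hinge: exactly 0
print(math.log(1.0 + math.exp(-p * y)))   # log loss: small but still > 0
print((p - y) ** 2)                       # mse: penalizes the "overshoot"
```

Only the hinge loss is exactly zero here; the other two keep pulling the score toward a specific target even after the sample is safely classified.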

How do the support vectors come into play?

Support vectors are simply samples placed near the decision boundary (loosely speaking). For the linear case this does not change much, but since most of the power of the SVM lies in its kernelization, there the SVs are extremely important. Once you introduce a kernel, due to the hinge loss, the SVM solution can be obtained efficiently, and the support vectors are the only samples remembered from the training set, thus building a non-linear decision boundary with a subset of the training data.
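A toy sketch of the kernelized decision function described above, with hypothetical support vectors and dual coefficients (1-D inputs, RBF kernel):

```python
import math

def rbf(x, z, gamma=1.0):
    """RBF kernel between two 1-D points."""
    return math.exp(-gamma * (x - z) ** 2)

def decision(x, svs, coefs, b=0.0):
    """Kernel SVM decision function f(x) = sum_i (alpha_i * t_i) K(x_i, x) + b.
    Only the support vectors (svs) from the training set are needed."""
    return sum(c * rbf(sv, x) for sv, c in zip(svs, coefs)) + b

# Hypothetical support vectors and dual coefficients (alpha_i * t_i):
svs   = [-1.0, 1.0]
coefs = [-0.8, 0.8]
print(decision(0.9, svs, coefs))    # positive -> predict class +1
print(decision(-0.9, svs, coefs))   # negative -> predict class -1
```

The point of the sketch: prediction touches only the stored support vectors and their coefficients, never the rest of the training set.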

What about the slack variables?

This is just another formulation of the hinge loss, more useful when you want to kernelize the solution and show convexity.

Why can't you have deep SVMs the way you can have a deep neural network with sigmoid activation functions?

You can; however, as the SVM is not a probabilistic model, its training might be a bit tricky. Furthermore, the whole strength of the SVM comes from its efficiency and global solution, both of which would be lost once you create a deep network. There are such models, though; in particular, an SVM (with squared hinge loss) is nowadays often the choice for the topmost layer of deep networks, so the whole optimization is actually a deep SVM. Adding more layers in between has nothing to do with the SVM or any other cost: the layers are defined completely by their activations, and you could, for example, use an RBF activation function; it has simply been shown numerous times that this leads to weak models (the features detected are too local).

To sum up:

  • there are deep SVMs; this is simply a typical deep neural network with an SVM layer on top.
  • there is no such thing as putting an SVM layer "in the middle", as the training criterion is only applied to the output of the network.
  • using "typical" SVM kernels as activation functions is not popular in deep networks due to their locality (as opposed to the very global ReLU or sigmoid).
🌐
DataCamp
datacamp.com › tutorial › loss-function-in-machine-learning
Loss Functions in Machine Learning Explained | DataCamp
December 4, 2024 - To ensure the maximum margin between the data points and boundaries, hinge loss penalizes predictions from the machine learning model that are wrongly classified, which are predictions that fall on the wrong side of the margin boundary and also predictions that are correctly classified but are within close proximity to the decision boundary.
🌐
Number Analytics
numberanalytics.com › blog › hinge-loss-ultimate-guide-for-ml-practitioners
Hinge Loss: The Ultimate Guide for ML Practitioners
June 11, 2025 - Hinge loss can be used as a loss function in deep learning models, particularly in the context of binary classification problems.