neural networks - What is the definition of the hinge loss function? - Artificial Intelligence Stack Exchange
machine learning - hinge loss vs logistic loss advantages and disadvantages/limitations - Cross Validated
Why do we use log-loss in logistic regression instead of just taking the absolute difference between expected probability and actual value for each instance?
Is support vector machine just about simplifying logistic regression formula? If so, why this name?
No. The main difference between the cost functions is how they treat correct predictions. The cross entropy loss (CEL) penalizes based on how far the predicted probability is from the answer: if an instance of class 1 is predicted as class 1 with probability 0.51, it is penalized more strongly than if it had been predicted with probability 0.99. The hinge loss used by SVMs, by contrast, is zero for any correct prediction beyond the margin, so a barely-confident correct prediction and a very confident one are counted the same there (hinge only penalizes correct predictions that fall inside the margin). Both losses, however, penalize wrong predictions by 'distance': the more confidently wrong, the larger the penalty.
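A tiny numeric sketch of that asymmetry, using assumed illustrative values (a correct class-1 prediction at probability 0.51 vs 0.99 for CEL, and two assumed SVM scores beyond the margin for hinge):

```python
import math

# Cross entropy penalty for a *correct* prediction at two confidence levels
cel_barely = -math.log(0.51)     # predicted P(class 1) = 0.51, true class is 1
cel_confident = -math.log(0.99)  # predicted P(class 1) = 0.99, true class is 1

# Hinge penalty max(0, 1 - y*f) for two correctly classified points
# beyond the margin (y = +1; scores 1.1 and 5.0 are assumed examples)
hinge_barely = max(0.0, 1 - 1 * 1.1)
hinge_confident = max(0.0, 1 - 1 * 5.0)

print(round(cel_barely, 3), round(cel_confident, 3))  # 0.673 vs 0.01: CEL still discriminates
print(hinge_barely, hinge_confident)                  # 0.0 vs 0.0: hinge treats both the same
```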
Logarithmic loss minimization leads to well-behaved probabilistic outputs.
Hinge loss leads to some (not guaranteed) sparsity in the dual, but it doesn't help at probability estimation. Instead, it punishes misclassifications (that's why it's so useful for determining margins): a diminishing hinge loss comes with diminishing margin violations.
So, summarizing:
Logarithmic loss ideally leads to better probability estimation at the cost of not actually optimizing for accuracy
Hinge loss ideally leads to better accuracy and some sparsity at the cost of not actually estimating probabilities
In ideal scenarios, each method would excel in its own domain (accuracy vs probability estimation). However, due to the No-Free-Lunch Theorem, it is not possible to know, a priori, if the model choice is optimal.
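The probability-estimation asymmetry in the summary can be seen in a small sketch. Assuming a label with true P(y = +1) = 0.7 (a made-up value for illustration), we grid-search for the constant score f minimizing each expected loss: the log-loss minimizer recovers the true probability through the sigmoid, while the hinge minimizer saturates at the margin regardless of the exact probability.

```python
import math

def sigmoid(f):
    return 1.0 / (1.0 + math.exp(-f))

p = 0.7  # assumed true P(y = +1)

def expected_log_loss(f):
    # E over y in {+1, -1} of -log sigmoid(y * f)
    return -(p * math.log(sigmoid(f)) + (1 - p) * math.log(sigmoid(-f)))

def expected_hinge_loss(f):
    # E over y in {+1, -1} of max(0, 1 - y * f)
    return p * max(0.0, 1 - f) + (1 - p) * max(0.0, 1 + f)

grid = [i / 1000 for i in range(-3000, 3001)]
f_log = min(grid, key=expected_log_loss)
f_hinge = min(grid, key=expected_hinge_loss)

print(round(sigmoid(f_log), 2))  # 0.7: log loss recovers the true probability
print(f_hinge)                   # 1.0: hinge just pushes the score to the margin
```

Note the hinge minimizer lands at f = 1 for any p > 0.5, so nothing about the value 0.7 is recoverable from it; that's the sense in which hinge loss "doesn't help at probability estimation".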
@Firebug had a good answer (+1). In fact, I had a similar question here.
What are the impacts of choosing different loss functions in classification to approximate 0-1 loss
I just want to add one more big advantage of the logistic loss: its probabilistic interpretation. An example can be found in UCLA - Advanced Research - Statistical Methods and Data Analysis - Computing Logit Regression | R Data Analysis Examples
Specifically, logistic regression is a classical model in the statistics literature. (See What does the name "Logistic Regression" mean? for the naming.) There are many important concepts related to the logistic loss, such as maximum likelihood estimation, likelihood ratio tests, and the binomial assumption on the response. Here are some related discussions.
Likelihood ratio test in R
Why isn't Logistic Regression called Logistic Classification?
Is there i.i.d. assumption on logistic regression?
Difference between logit and probit models
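The probabilistic interpretation mentioned above boils down to the fact that total logistic (log) loss is exactly the negative Bernoulli log-likelihood of the data, so minimizing it is maximum likelihood estimation. A minimal sketch with made-up labels and predicted probabilities:

```python
import math

# Assumed toy data: binary labels and model-predicted P(y = 1)
ys = [1, 0, 1, 1]
ps = [0.8, 0.3, 0.6, 0.9]

# Bernoulli log-likelihood of the labels under the model
log_lik = sum(y * math.log(p) + (1 - y) * math.log(1 - p) for y, p in zip(ys, ps))

# Total logistic (log) loss over the same data
log_loss = -sum(math.log(p) if y == 1 else math.log(1 - p) for y, p in zip(ys, ps))

print(math.isclose(log_loss, -log_lik))  # True: minimizing log loss = maximizing likelihood
```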