Logarithmic loss minimization leads to well-behaved probabilistic outputs.
Hinge loss leads to some (not guaranteed) sparsity in the dual, but it doesn't help with probability estimation. Instead, it penalizes misclassifications (that's why it's so useful for determining margins): a diminishing hinge loss corresponds to fewer points falling on the wrong side of the margin.
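To make the contrast concrete, here is a minimal sketch (not from the original answer) of both losses written as functions of the margin y·f(x). Note that the hinge loss is exactly zero once the margin exceeds 1, while the logistic loss is never exactly zero:

```python
import numpy as np

def hinge(margin):
    # Hinge loss: max(0, 1 - y*f(x)); exactly zero once the margin exceeds 1,
    # which is what produces (some) sparsity in the SVM dual.
    return np.maximum(0.0, 1.0 - margin)

def logistic(margin):
    # Logistic loss in margin form: log(1 + exp(-y*f(x))); strictly positive
    # everywhere, so every point keeps contributing to the solution.
    return np.log1p(np.exp(-margin))

for m in [-2.0, 0.0, 0.5, 1.0, 2.0, 5.0]:
    print(f"margin={m:+.1f}  hinge={hinge(m):.4f}  logistic={logistic(m):.4f}")
```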
So, summarizing:
Logarithmic loss ideally leads to better probability estimation at the cost of not actually optimizing for accuracy
Hinge loss ideally leads to better accuracy and some sparsity at the cost of not actually estimating probabilities
In ideal scenarios, each respective method would excel in their domain (accuracy vs probability estimation). However, due to the No-Free-Lunch Theorem, it is not possible to know, a priori, if the model choice is optimal.
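The trade-off summarized above can be illustrated with scikit-learn on toy data (a sketch, not part of the original answer; the dataset and hyperparameters are arbitrary): logistic regression yields probabilities directly, while the hinge-loss SVM yields a solution supported by only a subset of the training points:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Log loss: the model is probabilistic, so predict_proba is meaningful.
logreg = LogisticRegression().fit(X, y)
probs = logreg.predict_proba(X)[:, 1]

# Hinge loss: the fitted decision function depends only on the support
# vectors, i.e. the dual solution is (somewhat) sparse.
svm = SVC(kernel="linear").fit(X, y)

print(f"logistic regression, first 3 probabilities: {probs[:3].round(3)}")
print(f"SVM: {len(svm.support_)} support vectors out of {len(X)} samples")
```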
Answer from Firebug on Cross Validated (Stack Exchange): "Hinge loss vs logistic loss advantages and disadvantages/limitations"
@Firebug had a good answer (+1). In fact, I had a similar question here.
What are the impacts of choosing different loss functions in classification to approximate 0-1 loss
I just want to add one more big advantage of logistic loss: its probabilistic interpretation. An example can be found in UCLA's Logit Regression | R Data Analysis Examples.
Specifically, logistic regression is a classical model in the statistics literature. (See What does the name "Logistic Regression" mean? for the naming.) There are many important concepts related to logistic loss, such as maximum likelihood estimation, likelihood ratio tests, and the binomial assumptions on the response. Here are some related discussions.
Likelihood ratio test in R
Why isn't Logistic Regression called Logistic Classification?
Is there i.i.d. assumption on logistic regression?
Difference between logit and probit models
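The probabilistic interpretation mentioned above comes down to one identity: for a Bernoulli label, the log loss of a predicted probability is exactly the negative log-likelihood of that observation, so minimizing log loss over the data is maximum likelihood estimation. A small sketch (the function name is illustrative):

```python
import numpy as np

def neg_log_likelihood(p, y):
    # Bernoulli negative log-likelihood: -[y*log(p) + (1-y)*log(1-p)].
    # This is identical to the per-sample log loss, so minimizing log loss
    # over a dataset maximizes the likelihood of the observed labels.
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Predicting p=0.8 for a positive label costs -log(0.8):
print(neg_log_likelihood(0.8, 1))  # ≈ 0.2231
```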