They are essentially the same; usually, we use the term log loss for binary classification problems and the more general cross-entropy (loss) for multi-class classification, but even this distinction is not consistent, and you'll often find the two terms used interchangeably as synonyms.

From the Wikipedia entry for cross-entropy:

The logistic loss is sometimes called cross-entropy loss. It is also known as log loss.

From the fast.ai wiki entry on log loss [link is now dead]:

Log loss and cross-entropy are slightly different depending on the context, but in machine learning when calculating error rates between 0 and 1 they resolve to the same thing.

From the ML Cheatsheet:

Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1.

Answer from desertnaut on Stack Overflow
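A quick numeric illustration of the point in the answer above (a minimal sketch assuming NumPy and scikit-learn are available; the labels and probabilities are made up): binary log loss written out by hand and the general multi-class cross-entropy over the equivalent two-class problem give the same number.

import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([1, 0, 1, 1])           # binary labels
p_hat  = np.array([0.9, 0.2, 0.7, 0.4])   # predicted P(y = 1)

# binary cross-entropy / log loss, written out explicitly
bce = -np.mean(y_true * np.log(p_hat) + (1 - y_true) * np.log(1 - p_hat))

# the same data as a 2-class problem: columns are P(class 0) and P(class 1)
probs = np.column_stack([1 - p_hat, p_hat])
ce = log_loss(y_true, probs)              # general (multi-class) cross-entropy

print(bce, ce)                            # both are about 0.4004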
Medium: medium.com › biased-algorithms › log-loss-vs-cross-entropy-740df12d7526
Log Loss vs Cross Entropy | by Amit Yadav | Biased-Algorithms | Medium
April 18, 2025 - At its core, cross entropy comes ... (what your model thinks will happen). If log loss is your go-to for binary classification, think of cross entropy as its bigger, multi-class sibling...
Discussions

probability distributions - How is logistic loss and cross-entropy related? - Mathematics Stack Exchange
I found that Kullback-Leibler loss, log-loss or cross-entropy is the same loss function. Is the logistic-loss function used in logistic regression equivalent to the cross-entropy function? If yes, ... More on math.stackexchange.com
math.stackexchange.com, December 19, 2014
Difference between Cross-Entropy Loss or Log Likelihood Loss?
I'm very confused about the difference between cross-entropy loss and log likelihood loss when dealing with multi-class classification (including binary classification) or multi-label classification. Could you explain the difference? Thanks. More on discuss.pytorch.org
discuss.pytorch.org, March 4, 2019
classification - Why we use log function for cross entropy? - Cross Validated
I'm learning about a binary classifier. It uses the cross-entropy function as its loss function. $y_i \log p_i + (1-y_i) \log(1-p_i)$ But why does it use the log function? How about just use linear More on stats.stackexchange.com
stats.stackexchange.com, September 11, 2018
Cross entropy and log loss question
The full cross entropy function relies on a Kronecker delta (or something similar); this is basically just an advanced math tool for zeroing out most elements in a summation. In the case of cross entropy, we basically just want to take the log of ONLY the probability associated with the correct class. The term "Actual" for cross entropy is shorthand for something like "1 if the current score's class is the target class, 0 otherwise". For log loss, that's actually just binary cross entropy. BCE will use 1 node for output, which is fine because if there were two the outputs would be A and (1-A) anyway, which is why those two terms show up in binary cross entropy. More on reddit.com
r/learnmachinelearning, December 22, 2018
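A tiny numeric sketch of the "zeroing out" described in the Reddit answer above (assuming NumPy; the class probabilities are made up): with a one-hot target, the cross-entropy sum keeps only the log of the probability assigned to the correct class.

import numpy as np

probs  = np.array([0.1, 0.7, 0.2])    # model's predicted class probabilities
target = np.array([0, 1, 0])          # one-hot "Actual": class 1 is the true class

ce = -np.sum(target * np.log(probs))  # every term except the true class is zeroed out
print(ce, -np.log(0.7))               # same value: only -log(p_true_class) survives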
ML Glossary: ml-cheatsheet.readthedocs.io › en › latest › loss_functions.html
Loss Functions — ML Glossary documentation - Read the Docs
Log loss penalizes both types of errors, but especially those predictions that are confident and wrong! Cross-entropy and log loss are slightly different depending on context, but in machine learning when calculating error rates between 0 and 1 they resolve to the same thing.
Last9: last9.io › blog › understanding-log-loss-and-cross-entropy
What is Log Loss and Cross-Entropy | Last9
July 10, 2025 - Cross-entropy loss represents the broader mathematical concept that works with any number of classes. When you have exactly two classes (binary classification), cross-entropy reduces to what we call log loss or binary cross-entropy.

Coralogix: coralogix.com › home › understanding binary cross-entropy and log loss for effective model monitoring
Understanding Binary Cross-Entropy and Log Loss for Effective Model Monitoring
June 3, 2025 - In binary classification, you have ... Cross Entropy/Log Loss measures the dissimilarity between the actual labels and the predicted probabilities of the data points being in the positive class....
Wikipedia: en.wikipedia.org › wiki › Cross-entropy
Cross-entropy - Wikipedia
2 weeks ago - Mao, Mohri, and Zhong (2023) give ... model. This is also known as the log loss (or logarithmic loss or logistic loss); the terms "log loss" and "cross-entropy loss" are used interchangeably....
Lei Mao's Log Book: leimao.github.io › blog › Conventional-Classification-Loss-Functions
Cross Entropy Loss VS Log Loss VS Sum of Log Loss - Lei Mao's Log Book
July 18, 2020 - If we have $n = 2$ for cross entropy loss and compare it with log loss, we would immediately see that the form of log loss is exactly the same to binary cross entropy loss, and log loss for logistic regression is a special case for cross entropy ...
Arize: arize.com › arize ai › courses › binary cross entropy: where to use log loss in model monitoring
Binary Cross Entropy: Where To Use Log Loss In Model Monitoring - Arize AI
October 12, 2025 - Binary cross entropy (also known as logarithmic loss or log loss) is a model metric that tracks incorrect labeling of the data class by a model, penalizing the model if deviations in probability occur into classifying the labels.
GeeksforGeeks: geeksforgeeks.org › deep learning › binary-cross-entropy-log-loss-for-binary-classification
Binary Cross Entropy/Log Loss for Binary Classification - GeeksforGeeks
July 23, 2025 - Binary cross-entropy (log loss) is a loss function used in binary classification problems. It quantifies the difference between the actual class labels (0 or 1) and the predicted probabilities output by the model.
Top answer (1 of 2, 51 votes):

The relationship between Cross-entropy, logistic loss and K-L divergence is quite natural and immersed in the definition itself.

Cross-entropy is defined as: \begin{equation} H(p, q) = \operatorname{E}_p[-\log q] = H(p) + D_{\mathrm{KL}}(p \| q)=-\sum_x p(x)\log q(x) \end{equation} where $p$ and $q$ are two distributions, using the definition of K-L divergence, and $H(p)$ is the entropy of $p$. Now if $p \in \{y, 1-y\}$ and $q \in \{\hat{y}, 1-\hat{y}\}$, we can re-write cross-entropy as: \begin{equation} H(p, q) = -\sum_x p_x \log q_x =-y\log \hat{y}-(1-y)\log (1-\hat{y}) \end{equation} which is nothing but logistic loss. Further, log loss is also related to logistic loss and cross-entropy as follows:

Expected log loss is defined as follows: \begin{equation} \operatorname{E}_p[-\log q] \end{equation} Note that the above loss function is used in logistic regression, where $q$ is a sigmoid function. Excess risk for the above loss function is defined as follows: \begin{equation} \operatorname{E}_p[\log p - \log q ]=\operatorname{E}_p\left[\log\frac{p}{q}\right]=D_{\mathrm{KL}}(p\|q) \end{equation} Notice that the K-L divergence is nothing but the excess risk of the log loss, and K-L differs from cross-entropy by a constant term, the entropy $H(p)$ (see the first definition). One important thing to remember is that we usually minimize the log loss instead of the cross-entropy in logistic regression, which is not perfectly OK in theory but is fine in practice.
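As a quick sanity check of the identity $H(p, q) = H(p) + D_{\mathrm{KL}}(p \| q)$ used above, here is a small numeric sketch (assuming NumPy; the two distributions are made up for illustration):

import numpy as np

p = np.array([0.7, 0.2, 0.1])   # "true" distribution
q = np.array([0.5, 0.3, 0.2])   # model distribution

cross_entropy = -np.sum(p * np.log(q))        # H(p, q)
entropy       = -np.sum(p * np.log(p))        # H(p)
kl_divergence =  np.sum(p * np.log(p / q))    # D_KL(p || q)

print(cross_entropy, entropy + kl_divergence) # equal up to float rounding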

Second answer (2 of 2, 4 votes):

Yes, they are related.
The cross-entropy used in logistic regression is derived from the Maximum Likelihood principle (or, equivalently, from minimising -log(likelihood)). See section 28.2.1, Kullback-Leibler divergence:

Suppose ν and µ are the distributions of two probability models, and ν << µ. Then the cross-entropy is the expected negative log-likelihood of the model corresponding to ν, when the actual distribution is µ
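To make the quoted statement concrete (a restatement in symbols, not part of the quoted source): for i.i.d. labels $y_1,\dots,y_n$ drawn from the actual distribution $p$ and a model $q$, the average negative log-likelihood \begin{equation} -\frac{1}{n}\sum_{i=1}^{n}\log q(y_i) \end{equation} converges to the cross-entropy $H(p, q) = \operatorname{E}_p[-\log q]$, so maximizing the likelihood and minimizing the cross-entropy select the same model.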

Towards Data Science: towardsdatascience.com › home › latest › understanding binary cross-entropy / log loss: a visual explanation
Understanding binary cross-entropy / log loss: a visual explanation | Towards Data Science
March 7, 2025 - It turns out, taking the (negative) log of the probability suits us well enough for this purpose (since the log of values between 0.0 and 1.0 is negative, we take the negative log to obtain a positive value for the loss). Actually, the reason we use log for this comes from the definition of cross-entropy, please check the “Show me the math” section below for more details.
MachineLearningMastery: machinelearningmastery.com › home › blog › a gentle introduction to cross-entropy for machine learning
A Gentle Introduction to Cross-Entropy for Machine Learning - MachineLearningMastery.com
December 22, 2020 - Entropy can be calculated for a probability distribution as the negative sum of the probability for each event multiplied by the log of the probability for the event, where log is base-2 to ensure the result is in bits. ... As we will see later, both cross-entropy and KL divergence calculate the same quantity when they are used as loss functions for optimizing a classification predictive model.
James D. McCaffrey: jamesmccaffreyblog.com › home › log loss and cross entropy are almost the same
Log Loss and Cross Entropy are Almost the Same - James D. McCaffrey
June 27, 2018 - In words, for log loss with binary prediction, you just take the negative log of your predicted probability of the true result. This is the same as cross entropy.
Towards Data Science: towardsdatascience.com › home › latest › understanding sigmoid, logistic, softmax functions, and cross-entropy loss (log loss)
Understanding Sigmoid, Logistic, Softmax Functions, and Cross-Entropy Loss (Log Loss) | Towards Data Science
January 23, 2025 - So where does the definition of log loss come from? Cross-Entropy is a concept derived from information theory that measures the difference between two probability distributions, and the definition of it between true probability distribution p and estimated probability q in the information theory is:
Top answer (1 of 3, 15 votes):

For binary classification, one way to encode the probability of an output is $p^y(1-p)^{1-y}$, if $y$ is encoded as 0 or 1. This is the likelihood function, and its meaning is that with probability $p$ the output is 1 and with probability $1-p$ the output is 0.

Now you have a sample and you want to find the $p$ which best fits your data. One way is to find the maximum likelihood estimator. If your observations are independent, the MLE is found by maximizing the likelihood over the whole sample. This is the product of individual likelihoods $\prod_{i=1}^n p^{y_i}(1-p)^{1-y_i}$. But this is hard to use directly. Because of that, one transforms the likelihood with logs. The transformation is monotonic, and you get rid of products and obtain sums, which are more tractable. Apply logs and you get your expression (the step is written out after this answer).

Why not use your encoding instead? I think there is no reason why not. The question is: what are the properties of your estimator? The first formulation uses the likelihood and the MLE, which has some theory behind it, including the fact that the estimator is efficient. The second formulation is not used often; I don't know of any example of encoding the probability like that, which does not exclude your approach.
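For concreteness, here is the log step from the answer above written out (notation mine, following the likelihood given there): \begin{equation} \log \prod_{i=1}^{n} p^{y_i}(1-p)^{1-y_i} = \sum_{i=1}^{n} \left[ y_i \log p + (1-y_i)\log(1-p) \right] \end{equation} Negating this sum (and averaging over the $n$ observations) gives exactly the binary cross-entropy / log loss discussed elsewhere on this page.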

Second answer (2 of 3, 8 votes):

I was also looking for an explanation and found one reason I find intuitive here:

It heavily penalizes predictions that are confident and wrong.

Check this graph; it shows the range of possible log-loss values given a true observation:

The log loss increases rapidly as the predicted probability approaches 0 (a wrong prediction).
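A small sketch of that behaviour (assuming NumPy; the probabilities are made up): with a true label of 1, the per-example loss is -log(p_hat), which grows without bound as the prediction becomes confidently wrong.

import numpy as np

for p_hat in [0.99, 0.9, 0.5, 0.1, 0.01, 0.001]:
    print(f"p_hat = {p_hat:<5}  loss = {-np.log(p_hat):.3f}")
# prints losses of roughly 0.010, 0.105, 0.693, 2.303, 4.605, 6.908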

Towards Data Science: towardsdatascience.com › home › latest › cross-entropy, negative log-likelihood, and all that jazz
Cross-Entropy, Negative Log-Likelihood, and All That Jazz | Towards Data Science
March 5, 2025 - Just to make things a little more complicated since "minimizing loss" makes more sense, we can instead take the negative of the log-likelihood and minimize that, resulting in the well known Negative Log-Likelihood Loss: To recap, our original goal was to maximize the likelihood of observing the data given some parametric settings theta. The minimizing negative log-likelihood objective is the "same" as our original objective in the sense that both should have the same optimal solution (in a convex optimization setting to be pedantic). In the discrete setting, given two probability distributions p and q, their cross-entropy is defined as
Medium: medium.com › ai-enthusiast › cross-entropy-and-log-loss-mathematical-foundations-and-their-use-in-classification-eb708f9f629f
Cross-Entropy and Log Loss: Mathematical Foundations and Their Use in Classification | by Deepankar Singh | AI-Enthusiast | Medium
April 12, 2025 - In the realm of machine learning, especially in classification problems, choosing the right loss function is pivotal to building accurate and reliable models. Among various loss functions, one that frequently emerges at the heart of both theory and practice is the Cross-Entropy Loss, also known in some contexts as Log Loss.