GeeksforGeeks
geeksforgeeks.org › deep learning › binary-cross-entropy-log-loss-for-binary-classification
Binary Cross Entropy/Log Loss for Binary Classification - GeeksforGeeks
July 23, 2025 - Binary cross-entropy (log loss) is a loss function used in binary classification problems. It quantifies the difference between the actual class labels (0 or 1) and the predicted probabilities output by the model.
What is the difference between Categorical cross entropy vs binary cross entropy with 2 classes in tf keras?
They are (mostly) equivalent. Binary cross entropy is equivalent to categorical cross entropy where the logit score for the second class is fixed to zero (which can be shown easily by plugging a zero into the cross-entropy formula). Fixing the last class to zero does not reduce the representational power of the model, because class probabilities will be normalized to one anyway. It is technically true that this will affect the learning dynamics during training, since cross-entropy with two classes has more parameters than the binary cross-entropy model, but generally that is not expected to make a difference in performance. More on reddit.com
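A quick way to see the equivalence described above is to compare a sigmoid on a single logit z against a softmax over the two logits [z, 0]. A minimal numpy sketch (the variable names and toy logit values are illustrative, not taken from the sources quoted here):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(logits):
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

z = 1.3          # logit for the positive class
y = 1            # true label

# Binary cross-entropy with a single sigmoid output.
p = sigmoid(z)
bce = -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Categorical cross-entropy with two logits, the second fixed to zero.
probs = softmax(np.array([z, 0.0]))              # probs[0] == sigmoid(z)
cce = -np.log(probs[0] if y == 1 else probs[1])

print(bce, cce)   # both print the same value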
[D] Is there any reason besides theory not to use binary cross-entropy for each class in a multi-class classification problem?
There are at least three reasons in favour of the multiclass cross-entropy: coherence, the weak likelihood principle, and the concept of proper scoring rules.

Coherence: when you encode a categorical outcome into multiple binary outcomes and model probabilities for each binary outcome, your model assigns probability to events that cannot happen (e.g. class 1 and class 2 being true at the same time). Such statements are incoherent because the model simultaneously believes things which cannot be true together.

Weak likelihood principle: this principle from statistics says that any conclusions obtained from data and a model must be obtained as a result of the likelihood function. There are good reasons for this principle; see e.g. Berger's book on the likelihood principle. When using a sum of binary cross-entropies for a categorical outcome, you do not use the likelihood of a single probabilistic model. The normal multiclass cross-entropy is a function of the likelihood.

Proper scoring rules: when estimating parameters of a probabilistic model by minimising a function, the function being minimised should be what is called a proper scoring rule. Proper scoring rules guarantee that you could recover the true probabilities with enough data and a sufficiently rich model; see Gneiting and Raftery, JASA 2007. The binary cross-entropy is indeed a proper scoring rule for binary outcomes, but the sum of binary cross-entropies is not a proper scoring rule for categorical outcomes that are merely encoded as one-hot vectors. More on reddit.com
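To make the coherence point concrete, here is a small sketch of how independent per-class sigmoids implicitly assign probability to impossible joint events, while a softmax cannot (the logit values are toy numbers assumed for illustration):

import numpy as np

logits = np.array([2.0, 1.5, -0.5])    # three mutually exclusive classes

# Independent sigmoids: each class gets its own Bernoulli probability.
sig = 1.0 / (1.0 + np.exp(-logits))
# Probability the model implicitly assigns to the impossible event
# "class 0 and class 1 are both true":
p_impossible = sig[0] * sig[1]
print(sig, p_impossible)               # roughly 0.88 * 0.82 ≈ 0.72

# Softmax: coherent probabilities that sum to one, so exactly one
# class can be true.
e = np.exp(logits - logits.max())
softmax = e / e.sum()
print(softmax, softmax.sum())          # sums to 1.0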
Why does combining the sigmoid with the binary cross entropy loss work so well?
This seems to be a good explanation. More on reddit.com
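One explanation commonly given is that the log in BCE cancels the exp in the sigmoid, so the gradient with respect to the logit collapses to sigmoid(z) - y, which stays well behaved even when the sigmoid saturates. A small numpy check of that identity (a sketch; the sample values are assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_from_logit(z, y):
    p = sigmoid(z)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

z, y = 2.5, 0.0
eps = 1e-6

# Numerical gradient of the composed loss with respect to the logit z.
num_grad = (bce_from_logit(z + eps, y) - bce_from_logit(z - eps, y)) / (2 * eps)

# Analytic gradient: the log and the exp cancel, leaving sigmoid(z) - y.
ana_grad = sigmoid(z) - y

print(num_grad, ana_grad)   # both ≈ 0.9241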
Why do we use accuracy instead of binary cross entropy for hyper parameter tuning?
BCE is a loss function, not an evaluation metric for the model. The loss function is used during training to tune parameters through gradient descent; minimizing BCE is how your model "learns", but you need something afterwards to evaluate how well it learned. Accuracy, on the other hand, is just the percentage of correct vs. incorrect predictions. It is much more intuitive when evaluating a model, but it is not the only option: precision, recall, and F1 score are other examples. More on reddit.com
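A sketch of why the two serve different roles, assuming scikit-learn is available: two models can make identical hard decisions (same accuracy) yet have very different BCE/log loss, because the loss also measures how well calibrated the probabilities are. The label and probability values below are illustrative:

import numpy as np
from sklearn.metrics import accuracy_score, log_loss

y_true = np.array([1, 0, 1, 1, 0])

# Two models that make identical hard decisions at threshold 0.5 ...
probs_confident = np.array([0.95, 0.05, 0.90, 0.85, 0.10])
probs_hesitant  = np.array([0.55, 0.45, 0.60, 0.55, 0.40])

for name, probs in [("confident", probs_confident), ("hesitant", probs_hesitant)]:
    preds = (probs >= 0.5).astype(int)
    print(name,
          "accuracy:", accuracy_score(y_true, preds),        # same for both
          "log loss:", round(log_loss(y_true, probs), 3))    # very different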
Videos
08:00
Binary Cross Entropy Explained With Examples - YouTube
05:21
Understanding Binary Cross-Entropy / Log Loss in 5 minutes: a visual ...
18:29
Tips Tricks 15 - Understanding Binary Cross-Entropy loss - YouTube
22:44
Binary Cross-Entropy Loss Explained: A Complete Visual Guide - YouTube
03:49
Understanding Binary Cross Entropy for Machine Learning | Loss ...
Binary Cross-Entropy
In information theory: given two probability distributions, the average number of bits needed to identify an event if the coding scheme is optimized for the ‘wrong’ probability distribution rather than the true distribution.
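A worked example of that "wrong coding scheme" reading, computing H(p, q) = -Σ p(x) log2 q(x) in bits for a toy pair of distributions (the numbers are assumed for illustration):

import numpy as np

# True distribution p over four symbols, and a mismatched model q.
p = np.array([0.5, 0.25, 0.125, 0.125])
q = np.array([0.25, 0.25, 0.25, 0.25])

entropy       = -(p * np.log2(p)).sum()   # 1.75 bits: best achievable code length
cross_entropy = -(p * np.log2(q)).sum()   # 2.00 bits: cost of coding with q
kl_divergence = cross_entropy - entropy   # 0.25 bits wasted by using the wrong code

print(entropy, cross_entropy, kl_divergence)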
Wikipedia
en.wikipedia.org › wiki › Cross-entropy
Cross-entropy - Wikipedia
2 weeks ago - The true probability ... q_i is the predicted value of the current model. This is also known as the log loss (or logarithmic loss or logistic loss); the terms "log loss" and "cross-entropy loss" are used interchangeably. More specifically, consider a binary regression model ...
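For reference, the per-example binary log loss the Wikipedia snippet refers to is conventionally written as follows (standard textbook form, not a verbatim quote from the article), where y is the true label in {0, 1} and \hat{p} is the predicted probability of the positive class:

L(y, \hat{p}) = -\left[\, y \log \hat{p} + (1 - y) \log (1 - \hat{p}) \,\right],
\qquad
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \left[\, y_i \log \hat{p}_i + (1 - y_i) \log (1 - \hat{p}_i) \,\right]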
Particle Filters
sassafras13.github.io › BiCE
Binary Cross-Entropy
July 2, 2020 - Binary cross-entropy is used in binary classification problems, where a particular data point can have one of two possible labels (this can be extended out to multiclass classification problems, but that is not important in this context) [2]. It makes sense to use binary cross-entropy here ...
DataCamp
datacamp.com › tutorial › the-cross-entropy-loss-function-in-machine-learning
Cross-Entropy Loss Function in Machine Learning: Enhancing Model Accuracy | DataCamp
February 27, 2026 - Cross-entropy is one of the most popular loss functions used to optimize classification models. ... Best practices. Cross-entropy loss measures the difference between predicted probability distributions and actual class labels in classification tasks · Use binary cross-entropy (nn.BCELoss ...
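Since the snippet mentions nn.BCELoss, here is a minimal PyTorch sketch of the two common variants: nn.BCELoss expects probabilities (after a sigmoid), while nn.BCEWithLogitsLoss takes raw logits and is the numerically safer default. The tensors below are toy values assumed for illustration:

import torch
import torch.nn as nn

logits = torch.tensor([1.2, -0.8, 0.3])
targets = torch.tensor([1.0, 0.0, 1.0])   # labels must be floats in {0, 1}

# Option 1: apply the sigmoid yourself, then nn.BCELoss on probabilities.
probs = torch.sigmoid(logits)
loss_bce = nn.BCELoss()(probs, targets)

# Option 2: feed raw logits to nn.BCEWithLogitsLoss (folds in the sigmoid,
# numerically more stable for large |logits|).
loss_bce_logits = nn.BCEWithLogitsLoss()(logits, targets)

print(loss_bce.item(), loss_bce_logits.item())   # effectively identical here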
TensorFlow
tensorflow.org › tensorflow v2.16.1 › tf.keras.losses.binarycrossentropy
tf.keras.losses.BinaryCrossentropy | TensorFlow v2.16.1
June 7, 2024 - Computes the cross-entropy loss between true labels and predicted labels.
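A short usage sketch for this Keras loss class (the values are illustrative; from_logits=True is appropriate when the model's last layer has no sigmoid):

import tensorflow as tf

y_true = [0.0, 1.0, 1.0, 0.0]
logits = [-1.5, 2.0, 0.3, -0.7]          # raw model outputs, no sigmoid applied

# from_logits=True tells the loss to apply the sigmoid internally.
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
print(float(bce(y_true, logits)))

# Typical use inside compile():
# model.compile(optimizer="adam",
#               loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
#               metrics=["accuracy"])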
Sparrow Computing
sparrow.dev › home › blog › binary cross entropy explained
Binary Cross Entropy Explained - Sparrow Computing
October 21, 2021 -

def binary_cross_entropy(yhat: np.ndarray, y: np.ndarray) -> float:
    """Compute binary cross-entropy loss for a vector of predictions

    Parameters
    ----------
    yhat
        An array with len(yhat) predictions between [0, 1]
    y
        An array with len(y) labels where each is one of {0, 1}
    """
    return -(y * np.log(yhat) + (1 - y) * np.log(1 - yhat)).mean()

Good question! The motivation for this loss function comes from information theory.
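One practical caveat when calling a function like the one above: np.log blows up when a prediction is exactly 0 or 1, so callers often clip the probabilities first. A sketch building on the snippet's function; the clipping epsilon is an assumption, not from the article:

import numpy as np

yhat = np.array([0.9, 0.1, 0.8, 1.0])   # note the exact 1.0
y    = np.array([1,   0,   1,   0  ])

eps = 1e-7
yhat_safe = np.clip(yhat, eps, 1 - eps)  # avoids log(0) -> -inf

loss = -(y * np.log(yhat_safe) + (1 - y) * np.log(1 - yhat_safe)).mean()
print(loss)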
Medium
medium.com › @vergotten › binary-cross-entropy-mathematical-insights-and-python-implementation-31e5a4df78f3
Binary Cross-Entropy: Mathematical Insights and Python Implementation | by Maxim Sorokin | Medium
January 17, 2024 -

    """
    def sigmoid(x):
        return 1 / (1 + torch.exp(-x))

    N = y_pred.shape[0]
    y_pred = sigmoid(y_pred)
    loss = -1 / N * torch.sum(y_true * torch.log(y_pred) + (1 - y_true) * torch.log(1 - y_pred))
    return loss

@staticmethod
def binary_cross_entropy_pytorch(y_pred, y_true):
    """
    Calculate Binary Cross-Entropy Loss using PyTorch's built-in function.
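The snippet cuts off mid-class, but the built-in it refers to is presumably torch.nn.functional.binary_cross_entropy (or binary_cross_entropy_with_logits for raw scores). A self-contained sketch comparing a manual computation against those built-ins; the tensor values are illustrative, not from the article:

import torch
import torch.nn.functional as F

y_pred_logits = torch.tensor([0.8, -1.2, 2.0])
y_true = torch.tensor([1.0, 0.0, 1.0])

# Manual: sigmoid, then the averaged negative log-likelihood.
p = torch.sigmoid(y_pred_logits)
manual = -(y_true * torch.log(p) + (1 - y_true) * torch.log(1 - p)).mean()

# Built-in equivalents.
builtin = F.binary_cross_entropy(p, y_true)
builtin_logits = F.binary_cross_entropy_with_logits(y_pred_logits, y_true)

print(manual.item(), builtin.item(), builtin_logits.item())  # all match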