Both categorical cross-entropy and sparse categorical cross-entropy have the same loss function, which you have mentioned above. The only difference is the format in which you provide the $Y_i$'s (i.e., the true labels).
If your $Y_i$'s are one-hot encoded, use categorical_crossentropy. Examples (for a 3-class classification): [1,0,0], [0,1,0], [0,0,1]
But if your $Y_i$'s are integers, use sparse_categorical_crossentropy. Examples for the above 3-class classification problem: [1], [2], [3]
The usage entirely depends on how you load your dataset. One advantage of using sparse categorical cross-entropy is that it saves memory as well as computation time, because it uses a single integer for a class rather than a whole one-hot vector.
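In Keras, for instance, the choice boils down to which loss string you pass to compile() and which label format you feed to fit(). A minimal sketch, where the layer sizes, label values, and the 4-feature dummy input are purely illustrative:

```python
import numpy as np
import tensorflow as tf

# The same five labels in both formats (3 classes, indices 0..2).
y_int = np.array([0, 2, 1, 1, 0])                                # integer class indices
y_onehot = tf.keras.utils.to_categorical(y_int, num_classes=3)   # one-hot vectors
x = np.random.rand(5, 4).astype("float32")                       # dummy features

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# One-hot labels -> categorical_crossentropy
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.fit(x, y_onehot, epochs=1, verbose=0)

# Integer labels -> sparse_categorical_crossentropy
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(x, y_int, epochs=1, verbose=0)
```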
The formula you posted in your question refers to binary_crossentropy, not categorical_crossentropy. The former is used when you have only two classes (a single binary output). The latter refers to the situation where you have multiple classes, and its formula looks like this:
$$J(\textbf{w}) = -\sum_{i=1}^{N} y_i \text{log}(\hat{y}_i).$$
As skadaver mentioned, this loss works on one-hot encoded values, e.g. [1,0,0], [0,1,0], [0,0,1].
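To make the one-hot case concrete, here is a small numerical check (the prediction vector is made up) showing that the sum collapses to the log of the predicted probability of the true class, and that it matches what Keras computes:

```python
import numpy as np
import tensorflow as tf

y_true = np.array([[0.0, 1.0, 0.0]])   # one-hot ground truth: class index 1
y_pred = np.array([[0.1, 0.7, 0.2]])   # softmax output of the model

# -sum_i y_i * log(y_hat_i): only the term for the true class survives.
manual = -np.sum(y_true * np.log(y_pred))
keras_val = tf.keras.losses.categorical_crossentropy(y_true, y_pred).numpy()

print(manual, keras_val)   # both ~0.357, i.e. -log(0.7)
```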
The sparse_categorical_crossentropy is a little bit different: it works on integers, that's true, but these integers must be the class indices, not actual values. This loss computes the logarithm only for the output index that the ground truth points to. So when the model output is, for example, [0.1, 0.2, 0.7] and the ground truth is 3 (if indexed from 1), the loss computes only the logarithm of 0.7. This doesn't change the final value, because in the regular version of categorical cross-entropy the other terms are immediately multiplied by zero (thanks to the one-hot encoding). As a result, it computes the logarithm once per instance and omits the summation, which leads to better performance. The formula might look like this:
$$J(\textbf{w}) = -\text{log}(\hat{y}_y).$$
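And here is the same made-up prediction evaluated both ways, to show that the sparse version gives an identical number (note that Keras indexes classes from 0, so the last class is index 2 in this sketch):

```python
import numpy as np
import tensorflow as tf

y_pred = np.array([[0.1, 0.2, 0.7]])    # softmax output
y_onehot = np.array([[0.0, 0.0, 1.0]])  # one-hot label for the last class
y_index = np.array([2])                 # the same label as a 0-based class index

cce = tf.keras.losses.categorical_crossentropy(y_onehot, y_pred).numpy()
scce = tf.keras.losses.sparse_categorical_crossentropy(y_index, y_pred).numpy()

print(cce, scce)   # both ~0.357, i.e. -log(0.7)
```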
