🌐
GeeksforGeeks
geeksforgeeks.org › deep learning › categorical-cross-entropy-in-multi-class-classification
Categorical Cross-Entropy in Multi-Class Classification - GeeksforGeeks
November 25, 2025 - Categorical Cross-Entropy measures the difference between the true labels and the predicted probabilities of a model. It penalizes the model when it assigns low confidence to the correct class.
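As a quick numeric illustration of that penalty (a minimal NumPy sketch of my own, not code from the article; the probabilities are made up): the per-sample loss is the negative log of the probability assigned to the correct class, so low confidence on the correct class inflates the loss.

import numpy as np

# Categorical cross-entropy for a single sample reduces to -log(p_correct).
true_onehot = np.array([1.0, 0.0, 0.0])     # the true class is class 0
p_confident = np.array([0.9, 0.05, 0.05])   # high confidence on the correct class
p_unsure    = np.array([0.2, 0.4, 0.4])     # low confidence on the correct class

print(-np.sum(true_onehot * np.log(p_confident)))  # ~0.105 (small loss)
print(-np.sum(true_onehot * np.log(p_unsure)))     # ~1.609 (large loss)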
🌐
Gombru
gombru.github.io › 2018 › 05 › 23 › cross_entropy_loss
Understanding Categorical Cross-Entropy Loss, Binary Cross-Entropy Loss, Softmax Loss, Logistic Loss, Focal Loss and all those confusing names
That's why it is used for multi-label classification, where the insight of an element belonging to a certain class should not influence the decision for another class. It's called Binary Cross-Entropy Loss because it sets up a binary classification problem between \(C' = 2\) classes for every class in \(C\), as explained above.
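A minimal Keras sketch of that multi-label setup (my own illustration, not code from the linked post; the layer sizes and random data are arbitrary assumptions): each class gets its own sigmoid output and its own binary cross-entropy term, so the classes are scored independently.

import numpy as np
import tensorflow as tf

# Hypothetical multi-label model: 4 classes, each predicted by an independent sigmoid.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(4, activation="sigmoid"),   # one probability per class, not a softmax
])
model.compile(optimizer="adam", loss="binary_crossentropy")

x = np.random.rand(32, 8).astype("float32")
y = np.random.randint(0, 2, size=(32, 4)).astype("float32")  # multi-hot labels, e.g. [1, 0, 1, 0]
model.fit(x, y, epochs=1, verbose=0)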
🌐
Medium
medium.com › @abhishekjainindore24 › binary-and-categorical-cross-entropy-loss-0274d2b978b9
Binary and categorical Cross Entropy Loss - Abhishek Jain - Medium
January 27, 2025 - Cross-entropy loss measures the difference between the predicted probability distribution output by a model and the true label distribution (ground truth).
🌐
PyTorch
docs.pytorch.org › reference api › torch.nn › crossentropyloss
CrossEntropyLoss — PyTorch 2.11 documentation
January 1, 2023 -

>>> # Example of target with class indices
>>> loss = nn.CrossEntropyLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(5)
>>> output = loss(input, target)
>>> output.backward()
>>>
>>> # Example of target with class probabilities
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5).softmax(dim=1)
>>> output = loss(input, target)
>>> output.backward()
🌐
Keras
keras.io › api › losses › probabilistic_losses
Keras documentation: Probabilistic losses
Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided as integers. If you want to provide labels using one-hot representation, please use CategoricalCrossentropy loss.
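In practice that choice only changes the loss handed to model.compile. A small sketch (my own, not from the Keras docs; the architecture and the 10-class setup are assumptions):

import tensorflow as tf

# Hypothetical 10-class classifier head with a softmax output.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Integer targets such as [3, 0, 7, ...]:
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=["accuracy"])

# One-hot targets such as [0, 0, 0, 1, 0, ...] would instead use:
# model.compile(optimizer="adam",
#               loss=tf.keras.losses.CategoricalCrossentropy(),
#               metrics=["accuracy"])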
🌐
Medium
fmorenovr.medium.com › sparse-categorical-cross-entropy-vs-categorical-cross-entropy-ea01d0392d28
Sparse Categorical Cross-Entropy vs Categorical Cross-Entropy | by Felipe A. Moreno | Medium
November 30, 2021 - It is a Sigmoid activation plus a Cross-Entropy loss. ... Difference between Multi-class and Multi-label: multi-class classification assigns exactly one class to each sample, while multi-label classification can assign several classes to one sample. In the multi-class case, the loss can be computed with two different methods: Categorical Cross-Entropy and Sparse Categorical Cross-Entropy.
Top answer
1 of 5
112

Both categorical cross-entropy and sparse categorical cross-entropy use the same loss function, which you have mentioned above. The only difference is the format in which you specify $Y_i$ (i.e., the true labels).

If your $Y_i$'s are one-hot encoded, use categorical_crossentropy. Examples (for a 3-class classification problem): [1,0,0], [0,1,0], [0,0,1]

But if your $Y_i$'s are integers, use sparse_categorical_crossentropy. Examples for the above 3-class classification problem: [1], [2], [3]

The usage entirely depends on how you load your dataset. One advantage of using sparse categorical cross-entropy is that it saves memory as well as computation time, because it uses a single integer for a class rather than a whole one-hot vector.

2 of 5
14

The formula which you posted in your question refers to binary_crossentropy, not categorical_crossentropy. The former is used for a binary problem, where a single output represents two classes. The latter refers to a situation when you have multiple classes, and its formula looks like below:

$$J(\textbf{w}) = -\sum_{i=1}^{N} y_i \text{log}(\hat{y}_i).$$

This loss works, as skadaver mentioned, on one-hot encoded values, e.g. [1,0,0], [0,1,0], [0,0,1]

The sparse_categorical_crossentropy is a little different: it does work on integers, but these integers must be the class indices, not actual values. This loss computes the logarithm only for the output index that the ground truth points to. So when the model output is, for example, [0.1, 0.3, 0.7] and the ground truth is 3 (if indexed from 1), the loss computes only the logarithm of 0.7. This doesn't change the final value, because in the regular version of categorical crossentropy the other terms are immediately multiplied by zero (due to the one-hot encoding). Thanks to that, it computes the logarithm once per instance and omits the summation, which leads to better performance. The formula might look like this:

$$J(\textbf{w}) = -\text{log}(\hat{y}_y).$$
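A quick NumPy check of that claim (my own sketch; I renormalize the answer's example vector so it sums to 1): picking only the logarithm at the true index gives the same value as the full one-hot sum.

import numpy as np

probs = np.array([0.1, 0.3, 0.7])
probs = probs / probs.sum()          # renormalize so the scores form a proper distribution
true_index = 2                       # the third class, 0-indexed
one_hot = np.eye(3)[true_index]      # [0., 0., 1.]

sparse_loss = -np.log(probs[true_index])        # a single logarithm, as in the sparse loss
full_loss   = -np.sum(one_hot * np.log(probs))  # the other terms vanish because of the zeros
print(np.isclose(sparse_loss, full_loss))       # True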

Top answer
1 of 1
12

SparseCategoricalCrossentropy is CategoricalCrossentropy that takes integer labels instead of one-hot labels. In the example below, the two are equivalent:

import tensorflow as tf

# reduction='none' returns one loss value per sample (the default would average them);
# from_logits=False because the preds below are already probability-like scores.
scce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False, reduction='none')
cce  = tf.keras.losses.CategoricalCrossentropy(from_logits=False, reduction='none')

labels_scce = tf.constant([0, 1, 2])                                   # integer class indices
labels_cce  = tf.constant([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])  # one-hot labels
preds       = tf.constant([[.90, .05, .05], [.50, .89, .60], [.05, .01, .94]])

loss_cce  = cce(labels_cce, preds)
loss_scce = scce(labels_scce, preds)

print(loss_cce.numpy())
print(loss_scce.numpy())
# [0.10536055  0.8046684  0.0618754]
# [0.10536055  0.8046684  0.0618754]

As to how to do it 'by hand', we can mirror the NumPy logic of the Keras backend:

import numpy as np

np_labels = labels_cce.numpy()   # one-hot labels from the example above
np_preds  = preds.numpy()

losses = []
for label, pred in zip(np_labels, np_preds):
    pred = pred / pred.sum(axis=-1, keepdims=True)          # renormalize the probabilities
    losses.append(np.sum(label * -np.log(pred), axis=-1))   # -log kept only where label == 1
print(losses)
# [0.10536055  0.8046684  0.0618754]
  • from_logits = True: preds is model output before passing it into softmax (so we pass it into softmax)
  • from_logits = False: preds is model output after passing it into softmax (so we skip this step)

So in summary, to compute it by hand (a runnable sketch follows the list below):

  1. Convert integer labels to one-hot labels
  2. If preds are model outputs before softmax, we compute their softmax
  3. pred /= ... normalizes predictions before computing the logs; this way, probability mass placed on zero-labels lowers the normalized probability of the one-label and so increases the loss. If from_logits = True, this step is skipped, since softmax already did the normalization.
  4. For each observation / sample, compute element-wise negative log (base e) only where label==1
  5. Take mean of losses for all the observations
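Putting those five steps together (a runnable sketch under the assumptions above; the function name categorical_ce_by_hand is mine, not a Keras API):

import numpy as np

def categorical_ce_by_hand(labels_int, preds, from_logits=False):
    # Step 1: integer labels -> one-hot rows
    one_hot = np.eye(preds.shape[-1])[labels_int]
    if from_logits:
        # Step 2: softmax over the logits (this already normalizes)
        e = np.exp(preds - preds.max(axis=-1, keepdims=True))
        preds = e / e.sum(axis=-1, keepdims=True)
    else:
        # Step 3: renormalize the given probabilities
        preds = preds / preds.sum(axis=-1, keepdims=True)
    # Step 4: per-sample negative log, kept only where label == 1
    per_sample = -np.sum(one_hot * np.log(preds), axis=-1)
    # Step 5: mean over all observations
    return per_sample.mean()

preds = np.array([[.90, .05, .05], [.50, .89, .60], [.05, .01, .94]])
print(categorical_ce_by_hand(np.array([0, 1, 2]), preds))  # ~0.324, the mean of the per-sample values above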

Lastly, the mathematical formula for categorical crossentropy is:

$$J(\textbf{w}) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} 1_{y_i \in C_c}\, \text{log}\, p_{\text{model}}[y_i \in C_c]$$

  • i iterates over the N observations
  • c iterates over the C classes
  • $1_{y_i \in C_c}$ is the indicator function - 1 if observation i belongs to class c, else 0 (like binary crossentropy, except it operates over length-C vectors)
  • $p_{\text{model}}[y_i \in C_c]$ - the predicted probability of observation i belonging to class c
🌐
V7 Labs
v7labs.com › home › blog › cross entropy loss: intro, applications, code
Cross Entropy Loss: Intro, Applications, Code
Binary cross entropy is calculated on top of sigmoid outputs, whereas Categorical cross-entropy is calculated over softmax activation outputs.
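A tiny NumPy sketch of that distinction (my own illustration; the logits are made up): each sigmoid squashes its logit independently, while softmax couples all logits into a single distribution.

import numpy as np

logits = np.array([2.0, -1.0, 0.5])            # hypothetical raw model outputs for 3 classes

sigmoid_probs = 1.0 / (1.0 + np.exp(-logits))  # per-class probabilities, as used by binary cross-entropy

exp = np.exp(logits - logits.max())
softmax_probs = exp / exp.sum()                # one distribution over classes, as used by categorical cross-entropy

print(sigmoid_probs.sum())   # generally not equal to 1
print(softmax_probs.sum())   # exactly 1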
🌐
Swebb
swebb.io › blog › interpreting-the-categorical-cross-entropy-loss-function
Interpreting the Categorical Cross-Entropy Loss Function — swebb.io
May 9, 2024 - We can actually do something similar ... The categorical cross-entropy is the information entropy associated with putting each marble into a bucket and sorting them in the right buckets....
🌐
Medium
medium.com › @shireenchand › choosing-between-cross-entropy-and-sparse-cross-entropy-the-only-guide-you-need-abea92c84662
Choosing between Cross Entropy and Sparse Cross Entropy — The Only Guide you Need! | by Shireen Chand | Medium
July 20, 2023 - In contrast to categorical cross-entropy loss, where the true labels are represented as one-hot encoded vectors, sparse categorical cross-entropy loss expects the target labels to be integers indicating the class indices directly.
🌐
ML Glossary
ml-cheatsheet.readthedocs.io › en › latest › loss_functions.html
Loss Functions — ML Glossary documentation
Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label. So predicting a probability of .012 when the actual observation ...
🌐
GitHub
github.com › christianversloot › machine-learning-articles › blob › main › how-to-use-sparse-categorical-crossentropy-in-keras.md
machine-learning-articles/how-to-use-sparse-categorical-crossentropy-in-keras.md at main · christianversloot/machine-learning-articles
This means that you'll have to convert these targets first. In Keras, this can be done with to_categorical, which essentially applies one-hot encoding to your training set's targets. When applied, you can start using categorical crossentropy.
Author christianversloot
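For reference, a minimal to_categorical example (my own sketch; the label values are arbitrary):

from tensorflow.keras.utils import to_categorical

y_int = [0, 2, 1, 2]                              # integer targets
y_onehot = to_categorical(y_int, num_classes=3)   # one-hot rows, ready for categorical crossentropy
print(y_onehot)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]]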