Both categorical cross entropy and sparse categorical cross entropy have the same loss function, which you have mentioned above.
The only difference is the format in which you provide the true labels.
If your y's are one-hot encoded, use categorical_crossentropy.
Examples (for a 3-class classification): [1,0,0], [0,1,0], [0,0,1]
But if your y's are integers, use sparse_categorical_crossentropy.
Examples for the above 3-class classification problem: [1], [2], [3]
The usage entirely depends on how you load your dataset. One advantage of using sparse categorical cross entropy is that it saves memory as well as computation time, because it uses a single integer for a class rather than a whole vector.
(Answer from skadaver on Stack Exchange.)
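To make the two label formats concrete, here is a minimal sketch (my own illustration, assuming TensorFlow/Keras; not part of the original answer). Integer labels can be converted to one-hot form with tf.keras.utils.to_categorical, and back with argmax:

import numpy as np
import tensorflow as tf

y_int = np.array([0, 1, 2])  # integer labels, for sparse_categorical_crossentropy (Keras class indices start at 0)
y_onehot = tf.keras.utils.to_categorical(y_int, num_classes=3)  # one-hot labels, for categorical_crossentropy
print(y_onehot)                      # [[1. 0. 0.] [0. 1. 0.] [0. 0. 1.]]
print(np.argmax(y_onehot, axis=1))   # back to the integer form: [0 1 2]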
The formula which you posted in your question refers to binary_crossentropy, not categorical_crossentropy. The former is used for binary problems with a single output; the latter refers to a situation when you have multiple classes, and its formula looks like below:
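$$\text{CCE} = -\sum_{c=1}^{C} y_c \log(\hat{y}_c)$$

where $C$ is the number of classes, $y_c$ is 1 for the true class and 0 otherwise, and $\hat{y}_c$ is the predicted probability for class $c$.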
This loss works, as skadaver mentioned, on one-hot encoded values, e.g. [1,0,0], [0,1,0], [0,0,1].
The sparse_categorical_crossentropy is a little bit different: it works on integers, that's true, but these integers must be the class indices, not actual values. This loss computes the logarithm only for the output index which the ground truth indicates. So when the model output is, for example, [0.1, 0.3, 0.7] and the ground truth is 3 (if indexed from 1), the loss computes only the logarithm of 0.7. This doesn't change the final value, because in the regular version of categorical crossentropy the other terms are immediately multiplied by zero (due to the one-hot encoding). Thanks to that, it computes the logarithm once per instance and omits the summation, which leads to better performance. The formula might look like this:
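$$\text{SCCE} = -\log(\hat{y}_t)$$

where $t$ is the integer index of the true class.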
Simply:
categorical_crossentropy (cce) produces a one-hot array containing the probable match for each category; sparse_categorical_crossentropy (scce) produces a category index of the most likely matching category.
Consider a classification problem with 5 categories (or classes).
- In the case of cce, the one-hot target may be [0, 1, 0, 0, 0] and the model may predict [.2, .5, .1, .1, .1] (probably right).
- In the case of scce, the target index may be [1] and the model may predict: [.5].
Consider now a classification problem with 3 classes.
- In the case of cce, the one-hot target might be [0, 0, 1] and the model may predict [.5, .1, .4] (probably inaccurate, given that it gives more probability to the first class).
- In the case of scce, the target index might be [0], and the model may predict [.5].
Many categorical models produce scce output because you save space, but you lose A LOT of information (for example, in the 2nd example, index 2 was also very close). I generally prefer cce output for model reliability.
There are a number of situations to use scce, including:
- when your classes are mutually exclusive, i.e. you don't care at all about other close-enough predictions,
- the number of categories is so large that the prediction output becomes overwhelming.
220405: response to "one-hot encoding" comments:
one-hot encoding is used for a categorical feature INPUT to select a specific category (e.g. male versus female). This encoding allows the model to train more efficiently: the training weight is a product of the category value, which is 0 for all categories except for the given one.
cce and scce are a model OUTPUT. cce is a probability array over the categories, totaling 1.0. scce shows the MOST LIKELY category, totaling 1.0.
scce is technically a one-hot array, just like a hammer used as a door stop is still a hammer, but its purpose is different. cce is NOT one-hot.
I was also confused by this one. Fortunately, the excellent Keras documentation came to the rescue. Both have the same loss function and ultimately do the same thing; the only difference is in the representation of the true labels.
- Categorical Cross Entropy [Doc]:
Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided in a one_hot representation.
>>> y_true = [[0, 1, 0], [0, 0, 1]]
>>> y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
>>> # Using 'auto'/'sum_over_batch_size' reduction type.
>>> cce = tf.keras.losses.CategoricalCrossentropy()
>>> cce(y_true, y_pred).numpy()
1.177
- Sparse Categorical Cross Entropy [Doc]:
Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided as integers.
>>> y_true = [1, 2]
>>> y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
>>> # Using 'auto'/'sum_over_batch_size' reduction type.
>>> scce = tf.keras.losses.SparseCategoricalCrossentropy()
>>> scce(y_true, y_pred).numpy()
1.177
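In both snippets above the predictions are probabilities (e.g. the output of a softmax layer), and the result is the same 1.177, confirming that only the label format differs. If the model outputs raw scores instead, both loss classes accept from_logits=True; a small sketch (the logit values here are made up for illustration):

>>> scce_logits = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
>>> logits = [[2.0, 1.0, 0.1], [0.5, 1.5, 0.2]]  # raw scores; softmax is applied internally
>>> scce_logits([0, 1], logits).numpy()  # ≈ 0.456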
One good example of sparse categorical cross entropy is the fashion-mnist dataset.
import tensorflow as tf
from tensorflow import keras
fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()
print(y_train_full.shape) # (60000,)
print(y_train_full.dtype) # uint8
y_train_full[:10]
# array([9, 0, 0, 3, 0, 2, 7, 2, 5, 5], dtype=uint8)
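Since y_train_full already holds integer class indices (0-9), a model for this dataset can be compiled with the sparse loss directly; no one-hot conversion is needed. A minimal sketch (the architecture below is just an illustration, not from the original post):

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),    # 28x28 grayscale images
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax")   # 10 clothing classes
])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])
model.fit(X_train_full / 255.0, y_train_full, epochs=1)  # integer labels used as-is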
Use sparse categorical crossentropy when your classes are mutually exclusive (e.g. when each sample belongs to exactly one class) and categorical crossentropy when one sample can have multiple classes or labels are soft probabilities (like [0.5, 0.3, 0.2]).
The formula for categorical crossentropy ($S$ - samples, $C$ - classes, $s \in c$ - sample belongs to class $c$) is:

$$-\frac{1}{N}\sum_{s \in S}\sum_{c \in C} 1_{s \in c}\,\log p(s \in c)$$

where $N$ is the number of samples.
For the case when classes are exclusive, you don't need to sum over them: for each sample, the only non-zero term is the one for its true class $c$.
This allows you to conserve time and memory. Consider the case of 10000 mutually exclusive classes: just 1 logarithm instead of summing up 10000 terms for each sample, and just one integer instead of 10000 floats.
The formula is the same in both cases, so there should be no impact on accuracy.
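To make the saving concrete, here is a small NumPy sketch (my own illustration, not from the answer above) showing that the single-logarithm shortcut gives exactly the same value as the full sum over a one-hot vector:

import numpy as np

probs = np.array([0.1, 0.3, 0.6])        # model output for one sample
label = 2                                # integer class index (sparse format)
one_hot = np.eye(3)[label]               # [0., 0., 1.] (one-hot format)

cce = -np.sum(one_hot * np.log(probs))   # sums over all 3 classes
scce = -np.log(probs[label])             # single log at the true index
print(np.isclose(cce, scce))             # True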
The answer, in a nutshell
If your targets are one-hot encoded, use categorical_crossentropy.
Examples of one-hot encodings:
[1,0,0]
[0,1,0]
[0,0,1]
But if your targets are integers, use sparse_categorical_crossentropy.
Examples of integer encodings (for the sake of completeness):
1
2
3
Sparse Categorical Cross Entropy (SCCE)
Categorical Cross Entropy (CCE)
A sparse tensor is one where most elements have 0 value, as in a one-hot encoding. So why is SCCE not used when the target label is one-hot encoded, but instead when an integer class like 1, 2, 3, or 4 is passed? I am confused because the definition says one thing and the implementation does another.
https://stats.stackexchange.com/a/420730