Answer from skadaver on Stack Exchange
Top answer
1 of 5
112

Both categorical cross entropy and sparse categorical cross entropy have the same loss function, the one you mention in your question. The only difference is the format in which you supply the true labels.

If your $y_i$'s are one-hot encoded, use categorical_crossentropy. Examples (for a 3-class classification): [1,0,0], [0,1,0], [0,0,1]

But if your $y_i$'s are integers, use sparse_categorical_crossentropy. Examples for the above 3-class classification problem: [0], [1], [2] (zero-based class indices).

The usage entirely depends on how you load your dataset. One advantage of using sparse categorical cross entropy is that it saves memory as well as computation time, because it uses a single integer for a class rather than a whole vector.
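
To see the two label formats side by side, here is a minimal sketch (assuming TensorFlow/Keras; not part of the original answer). keras.utils.to_categorical converts integer indices into one-hot vectors:

import numpy as np
from tensorflow import keras

# Integer class indices: the format sparse_categorical_crossentropy expects.
y_sparse = np.array([0, 1, 2])

# One-hot vectors: the format categorical_crossentropy expects.
y_onehot = keras.utils.to_categorical(y_sparse, num_classes=3)
print(y_onehot)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]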

2 of 5
14

The formula which you posted in your question refers to binary_crossentropy, not categorical_crossentropy. The former is used when you have only two classes and a single output. The latter refers to a situation when you have multiple classes, and its formula looks like below:

$$-\sum_{i=1}^{C} y_i \log(\hat{y}_i)$$

where $C$ is the number of classes, $y_i$ is the one-hot true label and $\hat{y}_i$ is the predicted probability for class $i$.

This loss works, as skadaver mentioned, on one-hot encoded targets, e.g. [1,0,0], [0,1,0], [0,0,1].

The sparse_categorical_crossentropy is a little bit different: it works on integers, that's true, but these integers must be the class indices, not arbitrary values. This loss computes the logarithm only for the output index that the ground truth points to. So when the model output is, for example, [0.1, 0.2, 0.7] and the ground truth is 3 (if indexed from 1), the loss computes only the logarithm of 0.7. This doesn't change the final value, because in the regular version of categorical crossentropy the other terms are immediately multiplied by zero (a consequence of the one-hot encoding). Thanks to that, it computes the logarithm once per instance and omits the summation, which leads to better performance. The formula might look like this:

$$-\log(\hat{y}_c)$$

where $c$ is the index of the ground-truth class.
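
To check that equivalence numerically, here is a small NumPy sketch (not from the original answer): the full one-hot sum and the single indexed logarithm give the same loss:

import numpy as np

# Softmax output of the model for one sample over 3 classes.
y_pred = np.array([0.1, 0.2, 0.7])

# categorical_crossentropy: the zeros of the one-hot target cancel
# every term except the one at the true class.
y_onehot = np.array([0.0, 0.0, 1.0])
cce = -np.sum(y_onehot * np.log(y_pred))

# sparse_categorical_crossentropy: index the predicted probability
# directly and take a single logarithm.
true_index = 2  # zero-based index of the true class
scce = -np.log(y_pred[true_index])

print(cce, scce)  # both ~0.356675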

Medium
Understanding Sparse Categorical Cross-Entropy and Binary Cross-Entropy | by Shivam Bomble | Medium
February 3, 2025 - Binary Cross-Entropy: For binary classification problems. Categorical Cross-Entropy: For multi-class classification problems. Sparse Categorical Cross-Entropy: A variant of categorical cross-entropy for integer-labeled classes.
Discussions

tensorflow - What is the difference of BinaryCrossentropy and SparseCategoricalCrossentropy? - Stack Overflow
I'm training a CNN for binary image classification and I'm at the point where I have to choose the loss function and searching for answers. At this point I'm getting confused because one half is saying that you should use BinaryCrossentropy...
stackoverflow.com
neural network - Sparse_categorical_crossentropy vs categorical_crossentropy (keras, accuracy) - Data Science Stack Exchange
Which is better for accuracy or are they the same? Of course, if you use categorical_crossentropy you use one hot encoding, and if you use sparse_categorical_crossentropy you encode as normal integers...
datascience.stackexchange.com
December 1, 2018
python - What is the difference between sparse_categorical_crossentropy and categorical_crossentropy? - Stack Overflow
One good example of sparse categorical cross entropy is the Fashion-MNIST dataset.
stackoverflow.com
tensorflow - Meaning of sparse in "sparse cross entropy loss"? - Stack Overflow
However, note that this sparse cross-entropy is only suitable for "sparse labels", where exactly one value is 1 and all others are 0 (if the labels were represented as a vector and not just an index). On the other hand, the general CategoricalCrossentropy also works with targets that are not ...
🌐 stackoverflow.com
GeeksforGeeks
Sparse Categorical Crossentropy vs. Categorical Crossentropy - GeeksforGeeks
July 26, 2025 - Sparse Categorical Crossentropy is functionally similar to Categorical Crossentropy but is designed for cases where the target labels are not one-hot encoded. Instead, the labels are represented as integers corresponding to the class indices.
Medium
Choosing between Cross Entropy and Sparse Cross Entropy — The Only Guide you Need! | by Shireen Chand | Medium
July 20, 2023 - In contrast to categorical cross-entropy loss, where the true labels are represented as one-hot encoded vectors, sparse categorical cross-entropy loss expects the target labels to be integers indicating the class indices directly.
Stack Overflow
tensorflow - What is the difference of BinaryCrossentropy and SparseCategoricalCrossentropy? - Stack Overflow
At this point I'm getting confused because one half is saying that you should use BinaryCrossentropy and use a Dense-layer at the end with the dimension (None, 1) and others say to use SparseCategoricalCrossentropy and a Dense-layer with dim (None, 2). Also I know that SparseCategoricalCrossentropy is meant to be used for a classification task with more than 2 categories.
PyTorch & Keras
How to choose cross-entropy loss function in Keras? - PyTorch & Keras
December 8, 2023 - This can mean that the target element ... memory. Sparse cross-entropy addresses this by performing the same cross-entropy calculation of error, without requiring that the target variable be one-hot encoded prior to training...
Keras
Keras documentation: Probabilistic losses
focal_factor = (1 - output) ** gamma  # for class 1
focal_factor = output ** gamma        # for class 0

where gamma is a focusing parameter. When gamma=0, this function is equivalent to the binary crossentropy loss.
V7 Labs
Cross Entropy Loss: Intro, Applications, Code
Binary cross entropy is calculated on top of sigmoid outputs, whereas Categorical cross-entropy is calculated over softmax activation outputs.
GitHub
machine-learning-articles/how-to-use-sparse-categorical-crossentropy-in-keras.md at main · christianversloot/machine-learning-articles
In Keras, this can be done with to_categorical, which essentially applies one-hot encoding to your training set's targets. When applied, you can start using categorical crossentropy. But did you know that there exists another type of loss - sparse categorical crossentropy - with which you can leave the integers as they are, yet benefit from crossentropy loss?
Author: christianversloot
Apache
Multi-hot Sparse Categorical Cross-entropy - MXNet - Apache Software Foundation
November 17, 2018 - The only difference between sparse categorical cross entropy and categorical cross entropy is the format of true labels. When we have a single-label, multi-class classification problem, the labels are mutually exclusive for each data, meaning each data entry can only belong to one class.
Top answer
1 of 3
97

Simply:

  • categorical_crossentropy (cce) produces a one-hot array containing the probable match for each category,
  • sparse_categorical_crossentropy (scce) produces a category index of the most likely matching category.

Consider a classification problem with 5 categories (or classes).

  • In the case of cce, the one-hot target may be [0, 1, 0, 0, 0] and the model may predict [.2, .5, .1, .1, .1] (probably right)

  • In the case of scce, the target index may be [1] and the model may predict [.5].

Consider now a classification problem with 3 classes.

  • In the case of cce, the one-hot target might be [0, 0, 1] and the model may predict [.5, .1, .4] (probably inaccurate, given that it gives more probability to the first class)
  • In the case of scce, the target index might be [0], and the model may predict [.5]

Many categorical models produce scce output because you save space, but lose A LOT of information (for example, in the 2nd example, index 2 was also very close). I generally prefer cce output for model reliability.

There are a number of situations to use scce, including:

  • when your classes are mutually exclusive, i.e. you don't care at all about other close-enough predictions,
  • the number of categories is so large that the prediction output becomes overwhelming.

Update 2022-04-05, in response to the "one-hot encoding" comments:

one-hot encoding is used for a categorical feature INPUT to select a specific category (e.g. male versus female). This encoding allows the model to train more efficiently: the training weight is multiplied by the category value, which is 0 for all categories except the given one.

cce and scce are a model OUTPUT. cce is a probability array over all categories, totalling 1.0. scce reports only the MOST LIKELY category.

scce is technically a one-hot array, just like a hammer used as a door stop is still a hammer, but its purpose is different. cce is NOT one-hot.
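
To make the practical difference concrete, here is a minimal sketch (assuming TensorFlow/Keras; the tiny architecture and random data are purely illustrative, not from the answer). The network is identical in both cases; only the loss and the label format change:

import numpy as np
from tensorflow import keras

def make_model():
    # Same softmax head in both cases: 5 output probabilities.
    return keras.Sequential([
        keras.layers.Input(shape=(4,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(5, activation="softmax"),
    ])

X = np.random.rand(8, 4).astype("float32")
y_int = np.random.randint(0, 5, size=(8,))                # integer indices
y_hot = keras.utils.to_categorical(y_int, num_classes=5)  # one-hot vectors

cce_model = make_model()
cce_model.compile(optimizer="adam", loss="categorical_crossentropy")
cce_model.fit(X, y_hot, epochs=1, verbose=0)

scce_model = make_model()
scce_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
scce_model.fit(X, y_int, epochs=1, verbose=0)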

2 of 3
77

I was also confused by this one. Fortunately, the excellent keras documentation came to the rescue. Both have the same loss function and are ultimately doing the same thing; the only difference is in the representation of the true labels.

  • Categorical Cross Entropy [Doc]:

Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided in a one_hot representation.

>>> import tensorflow as tf
>>> y_true = [[0, 1, 0], [0, 0, 1]]
>>> y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
>>> # Using 'auto'/'sum_over_batch_size' reduction type.  
>>> cce = tf.keras.losses.CategoricalCrossentropy()
>>> cce(y_true, y_pred).numpy()
1.177
  • Sparse Categorical Cross Entropy [Doc]:

Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided as integers.

>>> y_true = [1, 2]
>>> y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
>>> # Using 'auto'/'sum_over_batch_size' reduction type.  
>>> scce = tf.keras.losses.SparseCategoricalCrossentropy()
>>> scce(y_true, y_pred).numpy()
1.177
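
As a sanity check on that 1.177 (a short plain-Python sketch, not part of the Keras docs): the loss reads the predicted probability at each true class index and averages the negative logs:

import math

# y_true = [1, 2] picks out y_pred[0][1] = 0.95 and y_pred[1][2] = 0.1.
picked = [0.95, 0.1]
loss = -sum(math.log(p) for p in picked) / len(picked)
print(loss)  # 1.1769392...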

One good example of sparse categorical cross entropy is the Fashion-MNIST dataset, whose labels are already plain integers:

import tensorflow as tf
from tensorflow import keras

fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()

print(y_train_full.shape) # (60000,)
print(y_train_full.dtype) # uint8

y_train_full[:10]
# array([9, 0, 0, 3, 0, 2, 7, 2, 5, 5], dtype=uint8)
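
From there, the integer labels can be fed to the loss directly, with no to_categorical step. A minimal, hypothetical model (the layer sizes are illustrative, not from the original answer):

model = keras.Sequential([
    keras.layers.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),  # 10 Fashion-MNIST classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # integer labels as-is
              metrics=["accuracy"])
model.fit(X_train_full / 255.0, y_train_full, epochs=1)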