Answers from Stack Overflow

Top answer (1 of 2, score 20)

First, let me give some notes about numerical stability:

As mentioned in the comments section, the numerical instability when using from_logits=False comes from transforming the probability values back into logits, which involves a clipping operation (as discussed in this question and its answer). However, to the best of my knowledge, this does NOT create any serious issues for most practical applications (although there are some cases where applying the softmax/sigmoid function inside the loss function, i.e. using from_logits=True, is more numerically stable in terms of computing gradients; see this answer for a mathematical explanation).
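To make that clipping concrete, here is a minimal sketch in plain NumPy (not the actual Keras source, whose details vary across versions) of how probabilities are typically clipped and mapped back to logits when from_logits=False:

```python
import numpy as np

def probs_to_logits(p, eps=1e-7):
    """Rough sketch of the probability -> logit conversion done inside the
    loss when from_logits=False: clip away from 0/1, then invert the sigmoid."""
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p / (1.0 - p))

# Probabilities closer to 0 or 1 than ~1e-7 are clipped, so a tiny amount of
# precision is lost -- this is the (usually harmless) instability noted above.
print(probs_to_logits(np.array([1e-9, 0.5, 1.0 - 1e-9])))  # ~[-16.12, 0.0, 16.12]
```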

In other words, if you are not concerned with the precision of probability values below a sensitivity of about 1e-7, and you have not observed a related convergence issue in your experiments, then you should not worry too much: just use sigmoid and binary cross-entropy as before, i.e. model.compile(loss='binary_crossentropy', ...), and it will work fine.
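A minimal setup of that kind might look like the following (layer sizes and input shape are just placeholders):

```python
import tensorflow as tf

# Sigmoid on the last layer, so the model outputs probabilities in [0, 1].
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# The string loss is equivalent to BinaryCrossentropy(from_logits=False).
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```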

All in all, if you are really concerned with numerical stability, you can take the safest path and use from_logits=True without any activation function on the last layer of the model.
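A sketch of that variant (again with placeholder layer sizes): the last Dense layer has no activation, and the sigmoid is folded into the loss.

```python
import tensorflow as tf

# No activation on the last layer: the model outputs raw logits in (-inf, inf).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1),
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True))

# At inference time, apply the sigmoid yourself if you need probabilities:
# probs = tf.sigmoid(model(x))
```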


Now, to answer the original question: the true labels or target values (i.e. y_true) should still be only zeros or ones when using BinaryCrossentropy(from_logits=True). Rather, it is y_pred (i.e. the output of the model) that should not be a probability in this case (i.e. the sigmoid function should not be applied on the last layer when from_logits=True).
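A quick check with made-up values illustrates this: the labels are plain 0/1 in both cases, and from_logits=True on raw logits gives (up to clipping precision) the same loss as from_logits=False on sigmoid(logits).

```python
import tensorflow as tf

y_true = tf.constant([0., 1., 1., 0.])          # labels stay zeros and ones
logits = tf.constant([-2.3, 1.7, 0.4, -0.1])    # raw, unbounded model outputs

bce_logits = tf.keras.losses.BinaryCrossentropy(from_logits=True)
bce_probs = tf.keras.losses.BinaryCrossentropy(from_logits=False)

print(bce_logits(y_true, logits).numpy())            # ~0.355
print(bce_probs(y_true, tf.sigmoid(logits)).numpy())  # ~ the same value
```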

Answer 2 of 2 (score 3)

I tested a GAN on recovering a realistic image from a sketch, and the only difference between the two training runs was BinaryCrossentropy(from_logits=True/False). The last network layer is Conv2D with no activation, so the right choice should be from_logits=True, but for experimental purposes I tried both and found a huge difference in the generator and discriminator losses:

  • orange - from_logits=True,
  • blue - from_logits=False.

Here is the link to the Colab notebook. The exercise is based on the TensorFlow pix2pix tutorial.

According to the exercise description, if from_logits=True:

  • The value log(2) ≈ 0.69 is a good reference point for these losses, as it indicates a perplexity of 2: the discriminator is, on average, equally uncertain about the two options (a quick sanity check of this is sketched right after the list).
  • For the disc_loss, a value below 0.69 means the discriminator is doing better than random on the combined set of real + generated images.
  • For the gen_gan_loss, a value below 0.69 means the generator is doing better than random at fooling the discriminator.
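Here is a quick sanity check of that reference point (made-up tensors, assuming the loss is built with from_logits=True as in the tutorial): with all-zero logits the discriminator assigns probability 0.5 to everything, and the loss comes out at ln(2) ≈ 0.693 regardless of the labels.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

labels = tf.constant([1., 0., 1., 0.])
chance_logits = tf.zeros_like(labels)      # sigmoid(0) == 0.5 for every sample

print(bce(labels, chance_logits).numpy())  # ~0.6931 == ln(2)
```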

Otherwise (with from_logits=False), the losses are roughly twice as high for both the generator and the discriminator, and a similar interpretation no longer seems to hold.

Final images are also different:

  • In the case of from_logits=False, the image looks blurry and unrealistic.