nn.BCELoss() expects your model's output to be probabilities, i.e. after a sigmoid activation.
nn.BCEWithLogitsLoss() expects your model's output to be logits, i.e. before any sigmoid activation.
I think you may have calculated something wrong (such as the accuracy). Here is a simple example based on your code:
With probabilities:
import torch
import torch.nn as nn

dummy_x = torch.randn(1000, 1)
dummy_y = (dummy_x > 0).type(torch.float)

model1 = nn.Sequential(
    nn.Linear(1, 1),
    nn.Sigmoid()
)
criterion1 = nn.BCELoss()
optimizer = torch.optim.Adam(model1.parameters(), 0.001)
def binary_accuracy(preds, y, logits=False):
    if logits:
        rounded_preds = torch.round(torch.sigmoid(preds))
    else:
        rounded_preds = torch.round(preds)
    correct = (rounded_preds == y).float()
    accuracy = correct.sum() / len(y)
    return accuracy
for e in range(2000):
    y_hat = model1(dummy_x)
    loss = criterion1(y_hat, dummy_y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if e != 0 and e % 100 == 0:
        print(f"Epoch: {e}, Loss: {loss:.4f}")
        print(f"Epoch: {e}, Acc: {binary_accuracy(y_hat, dummy_y)}")
# Results:
Epoch: 100, Loss: 0.5840
Epoch: 100, Acc: 0.5839999914169312
Epoch: 200, Loss: 0.5423
Epoch: 200, Acc: 0.6499999761581421
...
Epoch: 1800, Loss: 0.2862
Epoch: 1800, Acc: 0.9950000047683716
Epoch: 1900, Loss: 0.2793
Epoch: 1900, Acc: 0.9929999709129333
Now with logits:
model2 = nn.Linear(1, 1)
criterion2 = nn.BCEWithLogitsLoss()
optimizer2 = torch.optim.Adam(model2.parameters(), 0.001)
for e in range(2000):
    y_hat = model2(dummy_x)
    loss = criterion2(y_hat, dummy_y)
    optimizer2.zero_grad()
    loss.backward()
    optimizer2.step()
    if e != 0 and e % 100 == 0:
        print(f"Epoch: {e}, Loss: {loss:.4f}")
        print(f"Epoch: {e}, Acc: {binary_accuracy(y_hat, dummy_y, logits=True)}")
# Results:
Epoch: 100, Loss: 1.1042
Epoch: 100, Acc: 0.007000000216066837
Epoch: 200, Loss: 1.0484
Epoch: 200, Acc: 0.01899999938905239
...
Epoch: 1800, Loss: 0.5019
Epoch: 1800, Acc: 0.9879999756813049
Epoch: 1900, Loss: 0.4844
Epoch: 1900, Acc: 0.9879999756813049
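The two criteria are mathematically equivalent once the sigmoid is accounted for: BCEWithLogitsLoss on raw logits gives the same value as BCELoss on sigmoid(logits). A quick numerical check (my addition, not part of the original answer):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(5, 1)                      # raw model outputs (pre-sigmoid)
targets = torch.randint(0, 2, (5, 1)).float()   # binary targets

# BCEWithLogitsLoss applied to the raw logits ...
loss_logits = nn.BCEWithLogitsLoss()(logits, targets)
# ... equals BCELoss applied to sigmoid(logits)
loss_probs = nn.BCELoss()(torch.sigmoid(logits), targets)

print(torch.allclose(loss_logits, loss_probs, atol=1e-6))  # True
```

So any difference in training curves comes from the model/metric code around the loss, not from the losses themselves.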
Answer from TheEngineerProgrammer on Stack Overflow.
You would need to modify the code according to the loss function (a.k.a. criterion) you are using.
For BCELoss: since your model ends with a sigmoid layer, its outputs are already probabilities between 0 and 1, so you can threshold them directly.
For BCEWithLogitsLoss: the output is a logit. A logit can be negative or positive; it is the raw linear score z, where
z = w1*x1 + w2*x2 + ... + wn*xn
So, to get predictions while using BCEWithLogitsLoss, you need to pass this output through a sigmoid, i.e.
1 / (1 + np.exp(-z))
and only then calculate the accuracy.
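A minimal sketch of that conversion, using torch.sigmoid rather than a hand-rolled NumPy function (the example logits and targets are illustrative):

```python
import torch

# Raw outputs of a model trained with nn.BCEWithLogitsLoss
logits = torch.tensor([[-2.0], [0.5], [3.0]])

probs = torch.sigmoid(logits)        # map logits to probabilities in (0, 1)
preds = (probs > 0.5).float()        # threshold at 0.5 to get class labels

targets = torch.tensor([[0.0], [1.0], [1.0]])
accuracy = (preds == targets).float().mean()
print(accuracy.item())  # 1.0
```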
Hope this helps!
Update
BCELoss was not numerically stable in early PyTorch versions. See this issue: https://github.com/pytorch/pytorch/issues/751. However, it was resolved by pull request #1792, so BCELoss is numerically stable now.
Old answer
If you build PyTorch from source, you can use the numerically stable function BCEWithLogitsLoss (contributed in https://github.com/pytorch/pytorch/pull/1792), which takes logits as input.
Otherwise, you can use the following function (contributed by yzgao in the issue above):
class StableBCELoss(nn.Module):
    def __init__(self):
        super(StableBCELoss, self).__init__()

    def forward(self, input, target):
        # Stable BCE on logits: max(x, 0) - x*t + log(1 + exp(-|x|))
        neg_abs = -input.abs()
        loss = input.clamp(min=0) - input * target + (1 + neg_abs.exp()).log()
        return loss.mean()
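As a sanity check (my addition, not from the original answer), this formulation should agree with today's built-in nn.BCEWithLogitsLoss, even for large-magnitude logits where a naive log(sigmoid(x)) would overflow:

```python
import torch
import torch.nn as nn

class StableBCELoss(nn.Module):
    def forward(self, input, target):
        # max(x, 0) - x*t + log(1 + exp(-|x|)), averaged over all elements
        neg_abs = -input.abs()
        return (input.clamp(min=0) - input * target + (1 + neg_abs.exp()).log()).mean()

torch.manual_seed(0)
logits = torch.randn(8, 1) * 10                 # include large-magnitude logits
targets = torch.randint(0, 2, (8, 1)).float()

same = torch.allclose(StableBCELoss()(logits, targets),
                      nn.BCEWithLogitsLoss()(logits, targets), atol=1e-6)
print(same)  # True
```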
You might want to add a sigmoid layer at the end of the network; that way the outputs represent probabilities. Also make sure that the targets are binary numbers. If you post your complete code, we can help more.