You are missing a minus sign before your binary cross-entropy loss function. The loss function you currently have becomes more negative as the predictions get worse (and larger as they get better), so if you minimize it the model will update its weights in the wrong direction and start performing worse. To make the model perform better, either maximize the loss function you currently have (i.e. use gradient ascent instead of gradient descent, as in your second example), or add the minus sign so that a decrease in the loss corresponds to a better prediction.

Answer from Oxbowerce on Stack Exchange
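A minimal sketch (not part of the original answer) illustrating the sign issue, using a hypothetical helper `bce()`:

```python
import numpy as np

def bce(y, z):
    # correct binary cross-entropy: non-negative, and smaller for better predictions
    return -y * np.log(z) - (1 - y) * np.log(1 - z)

y = 1.0
better, worse = 0.9, 0.6  # two predicted probabilities for a positive example
assert bce(y, better) < bce(y, worse)    # with the minus sign: lower loss = better prediction
assert -bce(y, better) > -bce(y, worse)  # without it, minimizing rewards worse predictions
```

With the minus sign dropped, the "loss" is largest for the best prediction, so gradient descent drives the model away from the ground truth.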
Answer 2 of 2

I think your implementation is correct and the answer provided is just wrong.

Just for reference, the below figure represents the theory / math we are using here to implement Logistic Regression with Gradient Descent:

Here, we have the learnable parameter vector $\theta = (b, a)$ and a single data point $(x, y)$, with the prediction $z = \sigma(ax + b) = 1 / (1 + e^{-(ax + b)})$, where $b$ corresponds to the intercept (bias) term.
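As a sanity check on this setup (a sketch, not from the original answer): the gradient of the BCE loss with respect to the weight a is (z − y) · x, which can be verified numerically with a central difference:

```python
import numpy as np

# Verify d(BCE)/da = (z - y) * x for z = sigmoid(a*x + b), at the example's values
def bce(a, b, x, y):
    z = 1.0 / (1.0 + np.exp(-a * x - b))
    return -y * np.log(z) - (1 - y) * np.log(1 - z)

a, b, x, y, eps = 1.0, 1.0, 1.0, 1.0, 1e-6
z = 1.0 / (1.0 + np.exp(-a * x - b))
analytic = (z - y) * x                                          # closed-form gradient
numeric = (bce(a + eps, b, x, y) - bce(a - eps, b, x, y)) / (2 * eps)  # finite difference
assert abs(analytic - numeric) < 1e-8
```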

Just making your implementation a little modular and increasing the number of epochs to 10 (instead of 1):

import numpy as np

def update_params(a, b, x, y, z, lr):
    # gradient-descent step: move *against* the gradient (z - y) * x of the BCE loss
    a = a - lr * x * (z-y)
    a = np.round(a, decimals=3)
    b = b - lr * (z-y)
    b = np.round(b, decimals=3)
    return a, b

def LogitRegression(arr):
    x, y, a, b = arr
    lr = 1.0
    num_epochs = 10
    # losses, preds = [], []
    for _ in range(num_epochs):
        z = 1.0 / (1.0 + np.exp(-a * x - b))
        bce = -y*np.log(z) - (1-y)*np.log(1-z)
        # losses.append(bce)
        # preds.append(z)
        print(bce, y, z, a, b)
        a, b = update_params(a, b, x, y, z, lr)

    return ", ".join([str(a), str(b)])

LogitRegression([1,1,1,1])
# 0.12692801104297263 1 0.8807970779778823 1 1
# 0.10135698320837692 1 0.9036104015620354 1.119 1.119 # values after 1 epoch
# 0.08437500133718023 1 0.9190865327845347 1.215 1.215
# 0.0721998635352405 1 0.9303449352007099 1.296 1.296
# 0.06305834631954188 1 0.9388886913913739 1.366 1.366
# 0.05601486564909184 1 0.9455250799418752 1.427 1.427
# 0.05042252914331105 1 0.9508275872468411 1.481 1.481
# 0.04582166273506799 1 0.9552122969502131 1.53 1.53
# 0.041959389233941616 1 0.958908721799535 1.575 1.575
# 0.03871910934525996 1 0.962020893877162 1.616 1.616

If you plot the BCE loss and the predicted y (i.e., z) over iterations, you get the following figure (as expected, BCE loss is monotonically decreasing and z is getting closer to ground truth y with increasing iterations, leading to convergence):
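The convergence claim can also be checked without the plot; a sketch (rounding omitted for brevity) re-running the same gradient-descent recurrence and asserting that the loss decreases every epoch:

```python
import numpy as np

# Re-run the gradient-descent recurrence on the single data point (x, y) = (1, 1)
# starting from a = b = 1, and check the BCE loss decreases monotonically
a = b = x = y = 1.0
lr = 1.0
losses = []
for _ in range(10):
    z = 1.0 / (1.0 + np.exp(-a * x - b))
    losses.append(-y * np.log(z) - (1 - y) * np.log(1 - z))
    a, b = a - lr * x * (z - y), b - lr * (z - y)
assert all(l1 > l2 for l1, l2 in zip(losses, losses[1:]))  # strictly decreasing
```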

Now, if you change your update_params() to the following:

def update_params(a, b, x, y, z, lr):
    # note the + sign: this steps *along* the gradient, i.e. gradient ascent on the loss
    a = a + lr * x * (z-y)
    a = np.round(a, decimals=3)
    b = b + lr * (z-y)
    b = np.round(b, decimals=3)
    return a, b

and call LogitRegression() with the same set of inputs:

LogitRegression([1,1,1,1])
# 0.12692801104297263 1 0.8807970779778823 1 1
# 0.15845663982299638 1 0.8534599691639768 0.881 0.881 # provided in the answer
# 0.2073277757451888 1 0.8127532055353431 0.734 0.734
# 0.28883714051459425 1 0.7491341990786479 0.547 0.547
# 0.4403300268044629 1 0.6438239068707556 0.296 0.296
# 0.7549461015956136 1 0.4700359482354282 -0.06 -0.06
# 1.4479476778575628 1 0.2350521962362353 -0.59 -0.59
# 2.774416770021533 1 0.0623858513799944 -1.355 -1.355
# 4.596141947283801 1 0.010090691161759239 -2.293 -2.293
# 6.56740642634977 1 0.0014054377957286094 -3.283 -3.283

then, plotting the BCE loss and z over iterations, you will end up with the following figure (clearly this is wrong, since the loss increases with every epoch and z moves further away from the ground-truth y, leading to divergence):

Also, the above implementation can easily be extended to multi-dimensional data containing many data points like the following:

def VanillaLogisticRegression(x, y): # LR without regularization
    m, n = x.shape
    w = np.zeros((n+1, 1))
    X = np.hstack((np.ones(m)[:,None],x)) # include the feature corresponding to the bias term
    num_epochs = 1000 # number of epochs to run gradient descent, tune this hyperparameter
    lr = 0.5 # learning rate, tune this hyperparameter
    losses = []
    for _ in range(num_epochs):
        y_hat = 1. / (1. + np.exp(-np.dot(X, w))) # predicted y by the LR model
        J = np.mean(-y*np.log2(y_hat) - (1-y)*np.log2(1-y_hat)) # BCE loss in bits; differs from the natural-log version only by a constant factor
        grad_J = np.mean((y_hat - y)*X, axis=0) # the gradient of the (natural-log) loss function
        w -= lr * grad_J[:, None] # the gradient descent step, update the parameter vector w
        losses.append(J)
        # test correctness of the implementation
        # loss J should monotonically decrease & y_hat should be closer to y, with increasing iterations
        # print(J)            
    return w

m, n = 1000, 5 # 1000 rows, 5 columns
# randomly generate dataset, note that y can have values as 0 and 1 only
x, y = np.random.random(m*n).reshape(m,n), np.random.randint(0,2,m).reshape(-1,1)
w = VanillaLogisticRegression(x, y)
w # learnt parameters
# array([[-0.0749518 ],
#   [ 0.28592107],
#   [ 0.15202566],
#   [-0.15020757],
#   [ 0.08147078],
#   [-0.18823631]])

If you plot the loss function value over iterations, you will get a plot like the following one, showing how it converges.
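To check this convergence claim without plotting, here is a hypothetical variant (`fit_with_losses`, a name introduced here) of the function above that also records the per-epoch loss so the monotone decrease can be asserted:

```python
import numpy as np

# Variant of VanillaLogisticRegression that also returns the per-epoch losses
def fit_with_losses(x, y, lr=0.5, num_epochs=1000):
    m, n = x.shape
    X = np.hstack((np.ones((m, 1)), x))  # design matrix with bias column
    w = np.zeros((n + 1, 1))
    losses = []
    for _ in range(num_epochs):
        y_hat = 1.0 / (1.0 + np.exp(-X @ w))
        losses.append(np.mean(-y * np.log2(y_hat) - (1 - y) * np.log2(1 - y_hat)))
        w -= lr * np.mean((y_hat - y) * X, axis=0)[:, None]
    return w, losses

rng = np.random.default_rng(0)
x = rng.random((1000, 5))
y = rng.integers(0, 2, (1000, 1))
_, losses = fit_with_losses(x, y)
assert all(l1 >= l2 for l1, l2 in zip(losses, losses[1:]))  # loss never increases
```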

Finally, let's compare the above implementation with sklearn's, which uses a more advanced optimization algorithm (lbfgs) by default and is hence likely to converge much faster. If our implementation is correct, both of them should converge to the same global minimum, since the loss function is convex. (Note that sklearn uses regularization by default; to have almost no regularization, we need to set the hyperparameter C very high.)

from sklearn.linear_model import LogisticRegression
clf = LogisticRegression(random_state=0, C=10**12).fit(x, y.ravel()) # ravel y to the 1-D shape sklearn expects
print(clf.coef_, clf.intercept_)
# [[ 0.28633262  0.15256914 -0.14975667  0.08192404 -0.18780851]] [-0.07612282]

Compare the parameter values obtained from the above implementation and the one obtained with sklearn's implementation: they are almost equal.
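The "almost equal" claim can be made precise with np.allclose; a self-contained sketch (the seed, iteration count, sklearn tolerance, and 1e-2 comparison tolerance are assumptions introduced here) refitting both models on fresh random data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
m, n = 1000, 5
x = rng.random((m, n))
y = rng.integers(0, 2, (m, 1))

# scratch gradient-descent fit, same recipe as VanillaLogisticRegression above
X = np.hstack((np.ones((m, 1)), x))
w = np.zeros((n + 1, 1))
for _ in range(5000):
    y_hat = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.5 * np.mean((y_hat - y) * X, axis=0)[:, None]

# sklearn fit with (almost) no regularization and a tight tolerance
clf = LogisticRegression(C=1e12, tol=1e-8, max_iter=1000).fit(x, y.ravel())
w_sklearn = np.concatenate([clf.intercept_, clf.coef_.ravel()])  # bias first, matching w
assert np.allclose(w.ravel(), w_sklearn, atol=1e-2)
```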

Also, let's compare the predicted probabilities obtained from these two different implementations of LR (one from scratch, the other from sklearn's library function); as can be seen from the following scatterplot, they are almost identical:

import matplotlib.pyplot as plt

X = np.hstack((np.ones(m)[:,None], x)) # rebuild the design matrix (with bias column) at top level
pred_probs = 1 / (1 + np.exp(-X@w))
plt.scatter(pred_probs, clf.predict_proba(x)[:,1])
plt.grid()
plt.xlabel('pred prob', size=20)
plt.ylabel('pred prob (sklearn)', size=20)
plt.show()

Finally, let's compute the accuracies obtained; they are identical too:

print(sum((pred_probs > 0.5) == y) / len(y)) 
# [0.527]
clf.score(x, y)   
# 0.527

This also shows the correctness of the implementation.
