My answer to my question: yes, it can be shown that the gradient of the logistic loss equals the difference between the true values and the predicted probabilities. A brief explanation was found here.

First, logistic loss is just the negative log-likelihood, so we can start with the expression for the log-likelihood (p. 74; this expression is the log-likelihood itself, not the negative log-likelihood):

$$\ell = y \log(p) + (1 - y) \log(1 - p)$$

Here $p$ is the logistic function: $p = \frac{1}{1 + e^{-\hat{y}}}$, where $\hat{y}$ is the predicted value before the logistic transformation (i.e., the log-odds).

The first derivative, obtained using Wolfram Alpha:

$$\frac{\partial \ell}{\partial p} = \frac{y}{p} - \frac{1 - y}{1 - p}$$

After multiplying by $\frac{\partial p}{\partial \hat{y}} = p(1 - p)$:

$$\frac{\partial \ell}{\partial \hat{y}} = y - p$$

After changing the sign we have the expression for the gradient of the logistic loss function:

$$\frac{\partial L}{\partial \hat{y}} = p - y$$

Answer from Ogurtsov on Stack Exchange
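A quick numerical sanity check of this result (a sketch added for illustration, not part of the original answer; `logistic` and `logistic_loss` are names I chose): compare the analytic gradient $p - y$ with a finite-difference derivative of the loss with respect to the log-odds.

```python
import math

def logistic(f):
    # logistic (sigmoid) transformation of the log-odds f
    return 1.0 / (1.0 + math.exp(-f))

def logistic_loss(f, y):
    # negative log-likelihood of a single observation
    p = logistic(f)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

f, y, eps = 0.7, 1.0, 1e-6
p = logistic(f)
analytic = p - y  # the gradient derived above
numeric = (logistic_loss(f + eps, y) - logistic_loss(f - eps, y)) / (2 * eps)
assert abs(analytic - numeric) < 1e-6
```

The same check passes for y = 0 and for any finite log-odds, since the derivation makes no assumption about the sign of $\hat{y}$.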
Medium
medium.com › @ilmunabid › beginners-guide-to-finding-gradient-derivative-of-log-loss-by-hand-detailed-steps-74a6cacfe5cf
Beginner’s Guide to Finding Gradient/Derivative of Log Loss by Hand (Detailed Steps) | by Abid Ilmun Fisabil | Medium
August 17, 2022 - For a quick reference to logistic regression: the cost function is used to evaluate our prediction, and the prediction (using a linear equation) is transformed into a probability using the sigmoid function before it can be used inside the cost function. We calculate the gradient of the cost function to know which direction our loss is moving, up or down.
Discussions

Confused in the gradient descent of the logistic log loss function
Let's keep the derivation part aside, it is too complicated for now. Why is y subtracted? In the previous lecture (simplified form), no matter what class we use, the y term was supposed to be multiplied with the ln part. … More on community.deeplearning.ai
January 11, 2023
numpy - How is the gradient and hessian of logarithmic loss computed in the custom objective function example script in xgboost's github repository? - Stack Overflow
The log loss function is the sum of $-y_i \log(p_i) - (1 - y_i) \log(1 - p_i)$, where $p_i = \frac{1}{1 + e^{-\hat{y}_i}}$. The gradient (with respect to $p$) is then $\frac{p - y}{p(1 - p)}$, however in the code it's $p - y$. More on stackoverflow.com
September 18, 2016
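The apparent mismatch in that question dissolves with the chain rule: XGBoost differentiates the loss with respect to the raw score, not the probability, so the $p(1-p)$ factor from the sigmoid cancels the denominator. A minimal sketch (variable names are mine, not from the xgboost script):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x, y, eps = -0.4, 1.0, 1e-6   # raw score, label
p = sigmoid(x)
grad_wrt_p = (p - y) / (p * (1 - p))   # gradient of log loss w.r.t. the probability p
dp_dx = p * (1 - p)                    # derivative of the sigmoid
grad_wrt_x = grad_wrt_p * dp_dx        # chain rule: collapses to p - y
assert abs(grad_wrt_x - (p - y)) < 1e-12
```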
sgd - Gradient for log regression loss - Stack Overflow
More on stackoverflow.com
reinforcement learning - Can we simply remove the log term for loss in policy gradient methods? - Artificial Intelligence Stack Exchange
2 Is the negative of the policy loss function in a simple policy gradient algorithm an estimator of expected returns? 2 What happens with policy gradient methods if rewards are differentiable? 3 What specifically is the gradient of the log of the probability in policy gradient methods? More on ai.stackexchange.com
TTIC
home.ttic.edu › ~suriya › website-intromlss2018 › course_material › Day3b.pdf pdf
On Logistic Regression: Gradients of the Log Loss, Multi- ...
June 20, 2018 - The probability of off is $p(0 \mid x, w) = 1 - \sigma(w \cdot x) = \sigma(-w \cdot x)$. Today's focus: 1. Optimizing the log loss by gradient descent; 2. Multi-class classification to handle more than two classes; 3. More on optimization: Newton's method, stochastic gradient descent.
Medium
medium.com › @sumbatilinda › deep-learning-part-2-loss-function-and-gradient-function-2f64c566a1d6
Deep Learning(Part 2). Loss Function and Gradient Function | by Sumbatilinda | Medium
April 9, 2024 - I hope that was a good explanation for loss functions: ... A gradient is nothing but a derivative that defines the effects on outputs of the function with a little bit of variation in inputs.
Medium
medium.com › analytics-vidhya › derivative-of-log-loss-function-for-logistic-regression-9b832f025c2d
Derivative of Log-Loss function for Logistic Regression
February 8, 2024 - In order to preserve the convex nature of the loss function, a log loss error function has been designed for logistic regression. The cost function is split for two cases, y=1 and y=0. For the case when we have y=1, we can observe that when the hypothesis function tends to 1 the error is minimized to zero, and when it tends to 0 the error is maximum. This exactly follows the criterion we wanted ... In order to optimize this convex function, we can go with either gradient descent or Newton's method.
Baeldung
baeldung.com › home › core concepts › math and logic › gradient descent equation in logistic regression
Gradient Descent Equation in Logistic Regression | Baeldung on Computer Science
February 13, 2025 - When dealing with a binary classification problem, the logarithmic cost of error depends on the value of $y$. We can define the cost for the two cases separately: $-\log(h_\theta(x))$ for $y = 1$ and $-\log(1 - h_\theta(x))$ for $y = 0$. Because when the actual outcome $y = 1$, the cost is $0$ for $h_\theta(x) = 1$ and takes its maximum value for $h_\theta(x) = 0$. Similarly, if $y = 0$, the cost is $0$ for $h_\theta(x) = 0$. As the output can either be $0$ or $1$, we can simplify the equation to $-y \log(h_\theta(x)) - (1 - y)\log(1 - h_\theta(x))$. Gradient descent is an iterative optimization algorithm, which finds the minimum of a differentiable function.
ML Glossary
ml-cheatsheet.readthedocs.io › en › latest › logistic_regression.html
Logistic Regression — ML Glossary documentation
def update_weights(features, labels, weights, lr):
    '''
    Vectorized Gradient Descent
    Features: (200, 3)
    Labels:   (200, 1)
    Weights:  (3, 1)
    '''
    N = len(features)
    # 1 - Get predictions
    predictions = predict(features, weights)
    # 2 - Transpose features from (200, 3) to (3, 200)
    # so we can multiply with the (200, 1) cost matrix.
    # Returns a (3, 1) matrix holding 3 partial derivatives --
    # one for each feature -- representing the aggregate
    # slope of the cost function across all observations
    gradient = np.dot(features.T, predictions - labels)
    # 3 - Take the average cost derivative for each feature
    gradient /= N
    # 4 - Multiply the gradient by our learning rate
    gradient *= lr
    # 5 - Subtract from our weights to minimize cost
    weights -= gradient
    return weights
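A hedged usage sketch for the ML Glossary function above; `predict` is not shown in the snippet, so the sigmoid-of-linear-combination definition below is my assumption (it matches the page's stated shapes), and the toy data is invented:

```python
import numpy as np

def predict(features, weights):
    # assumed: sigmoid of the linear combination (not shown in the snippet)
    return 1 / (1 + np.exp(-np.dot(features, weights)))

def update_weights(features, labels, weights, lr):
    # same update as the snippet above, written without in-place mutation
    N = len(features)
    predictions = predict(features, weights)
    gradient = np.dot(features.T, predictions - labels) / N
    return weights - lr * gradient

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, :1] > 0).astype(float)   # toy labels driven by feature 0, shape (200, 1)
w = np.zeros((3, 1))
for _ in range(100):
    w = update_weights(X, y, w, lr=0.5)
assert w.shape == (3, 1)
assert w[0, 0] > 0  # the weight on the informative feature grows positive
```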
High on Science
highonscience.com › blog › 2021 › 06 › 18 › ml-loss-function-cheat-sheet
Machine Learning Likelihood, Loss, Gradient, and Hessian Cheat Sheet - High on Science
June 18, 2021 - This is called the risk set, because they are the users at risk of canceling at the time user $i$ canceled. The risk set includes user $i$. In clinical studies, users are subjects and churn is non-survival, i.e. death. Loss \[\begin{equation} \ell_i = \delta_i \left[ - f_i + \log{\sum_{j:t_j \geq t_i} \exp{f_j}} \right] \end{equation}\] ... The efficient algorithm to compute the gradient and hessian involves ordering the $n$ survival data points, which are index by $i$, by time $t_i$. This turns $n^2$ time complexity into $n\log{n}$ for the sort followed by $n$ for the progressive total-loss compute (ref).
Buffalo
cedar.buffalo.edu › ~srihari › CSE676 › 18.1 Log-likelihood Gradient.pdf pdf
Deep Learning Srihari 1 The Log-likelihood Gradient Sargur N. Srihari
Determine parameters $\theta$ that maximize the log-likelihood (the negative loss): $\max_\theta L(\{x^{(1)}, \ldots, x^{(M)}\}; \theta) = \sum_m \log p(x^{(m)}; \theta)$. The partition function is intractable. For stochastic gradient ascent, take derivatives of the positive phase and the negative phase of the probability distribution of the undirected model (Gibbs), e.g. $\frac{\partial}{\partial W_{i,j}} E(v, h) = -v_i h_j$.
Tomasbeuzen
tomasbeuzen.com › deep-learning-with-pytorch › chapters › appendixB_logistic-loss.html
Appendix B: Logistic Loss — Deep Learning with PyTorch
y_hat = np.arange(0.01, 1.00, 0.01)
log_loss = pd.DataFrame({
    "y_hat": y_hat,
    "y=0": -np.log(1 - y_hat),
    "y=1": -np.log(y_hat),
}).melt(id_vars="y_hat", var_name="y", value_name="loss")
fig = px.line(log_loss, x="y_hat", y="loss", color="y")
fig.update_layout(width=500, height=400)

In Chapter 1 we used the gradient of the log loss to implement gradient descent.
CliffsNotes
cliffsnotes.com › home › computer science
Deriving Gradient in Logistic Regression & Sigmoid Function - CliffsNotes
November 22, 2024 - The derivative of $\sigma(z)$ with respect to $z$ is $\sigma(z)(1 - \sigma(z))$. This is an important result that will simplify the gradient calculation. Step 4: Gradient of the Negative Log-Likelihood. To minimize the negative log-likelihood and find the optimal weight vector $w$, we need to compute the gradient of the NLL with respect to each component of the weight vector.
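The quoted derivative identity is easy to verify numerically; this sketch (mine, not from the CliffsNotes page) checks $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ by central differences.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z, eps = 1.3, 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
analytic = sigmoid(z) * (1 - sigmoid(z))  # sigma'(z) = sigma(z)(1 - sigma(z))
assert abs(numeric - analytic) < 1e-9
```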
Transactions on Machine Learning Research
jmlr.csail.mit.edu › papers › volume22 › 20-1372 › 20-1372.pdf pdf
When Does Gradient Descent with Logistic Loss Find ...
… separation conditions, then the loss after a single step of gradient descent decreases by an amount that is exponential in $p^{1/2-\beta}$ with high probability. This result only requires the width $p$ to be poly-logarithmic in the number of samples, the input dimension, and $1/\delta$.
Google
developers.google.com › machine learning › logistic regression: loss and regularization
Logistic regression: Loss and regularization | Machine Learning | Google for Developers
Consequently, most logistic regression models use one of the following two strategies to decrease model complexity: L2 regularization, or early stopping (limiting the number of training steps to halt training while loss is still decreasing). Note: You'll learn more about regularization in the Datasets, Generalization, and Overfitting module of the course. Key terms: gradient descent.
Analytics Vidhya
analyticsvidhya.com › home › log loss vs. mean squared error: choosing the right metric for classification
Log Loss vs. Mean Squared Error: Choosing the Right Metric for Classification
May 1, 2025 - When we try to optimize values using gradient descent, it will create complications in finding the global minimum. Another reason is that in classification problems we have target values like 0/1, so (Ŷ − Y)² will always be between 0 and 1, which can make it very difficult to keep track of the errors, and it is difficult to store high-precision floating-point numbers. The cost function used in Logistic Regression is Log Loss...
Colorado
cmci.colorado.edu › classes › INFO-4604 › fa17 › files › slides-5_logistic.pdf pdf
Logistic Regression INFO-4604, Applied Machine Learning
We want to minimize (that's why it's called "loss"), but we want to maximize probability! So let's minimize the negative log-likelihood: $$L(w) = -\sum_{i=1}^{N} \log P(y_i \mid x_i) = -\sum_{i=1}^{N} \left[ y_i \log(\phi(w^T x_i)) + (1 - y_i) \log(1 - \phi(w^T x_i)) \right]$$ Learning: we can use gradient descent to minimize the negative log-likelihood $L(w)$. The partial derivative of $L$ with respect to $w_j$ is $$\frac{\partial L}{\partial w_j} = -\sum_{i=1}^{N} x_{ij}\,(y_i - \phi(w^T x_i))$$
Top answer

It's not advisable to remove the log term simply because it's monotone. Intuitively, as a score function, the log term in the policy gradient theorem (PGT) is not arbitrary but a necessary step to ensure proper scaling and direction of the gradients, especially for actions with small probabilities. A simple 1-d parameter-space example shows this.

Let's say we have two possible actions $a_1, a_2$ in a simple bandit problem, with the parameterized policy for both actions defined as $\pi_{\theta}(a_1)=\sigma(\theta), \pi_{\theta}(a_2)=1-\sigma(\theta)$, where $\sigma$ is the standard sigmoid function. Also suppose the rewards for the actions are $r(a_1)=1, r(a_2)=0$; since it's a bandit without state, the return $G_t$ is simply $r(a)$ for each action $a$. Then it's straightforward to compute the gradient as follows: $$\nabla_{\theta} J(\theta)=\mathbb{E}_{\pi_{\theta}}[\nabla_{\theta}\log\pi_{\theta}(a)G_t]=\pi_{\theta}(a_1)(1-\sigma(\theta))r(a_1)+\pi_{\theta}(a_2)(-\sigma(\theta))r(a_2)=\sigma(\theta)(1-\sigma(\theta))$$ Similarly, you can get the gradient for the case of removing the log: $$\nabla_{\theta} J'(\theta)=\sigma(\theta)^2(1-\sigma(\theta))$$

Therefore, clearly, even in the 1-d bandit case the gradient without the log becomes extremely small when $\sigma(\theta)$, i.e., $\pi_{\theta}(a_1)$, is very small. This means that actions that are initially very unlikely under the to-be-optimized policy cannot be updated efficiently, consistent with the intuition above, because the proper scaling effect of the log is lost. In the usual multi-dimensional policies for MDP cases, both the magnitude and the direction of the gradient are similarly negatively impacted.
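The two closed-form gradients above can be sanity-checked numerically; in this sketch (mine, not part of the original answer) $J(\theta)$ is the expected return of the bandit, and a finite difference confirms the with-log expression while showing how much smaller the log-free one is when $\pi_\theta(a_1)$ is small.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def J(theta):
    # expected return: pi(a1) * r(a1) + pi(a2) * r(a2) with rewards (1, 0)
    return sigmoid(theta) * 1.0 + (1.0 - sigmoid(theta)) * 0.0

theta, eps = -3.0, 1e-6          # pi(a1) = sigmoid(-3) is small, about 0.047
numeric = (J(theta + eps) - J(theta - eps)) / (2 * eps)
with_log = sigmoid(theta) * (1 - sigmoid(theta))          # PGT gradient
without_log = sigmoid(theta) ** 2 * (1 - sigmoid(theta))  # log term removed
assert abs(numeric - with_log) < 1e-6
assert without_log < 0.1 * with_log  # over 10x smaller for the unlikely action
```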

Finally, with the common log-gradient trick, the existing form in the PGT can be transformed into a quotient whose numerator is just the gradient of the policy without the log and whose denominator is the policy itself. From the exploitation-exploration balance point of view, this denominator is required for exploration: if the probability of taking a certain action in a state is small, the algorithm updates the parameters so that the probability of taking that action increases. The other term, $G_t \approx Q_t(s_t,a_t)$, reflects exploitation: if an action's value is large, the algorithm updates the parameters so that the probability of taking that action is enhanced. Therefore, if you remove the denominator policy, you are equivalently not exploring enough.