My answer to my own question: yes, it can be shown that the gradient of the logistic loss is equal to the difference between the predicted probabilities and the true values. A brief explanation was found here.

First, the logistic loss is just the negative log-likelihood, so we can start with the expression for the log-likelihood of a single observation (p. 74 — this expression is the log-likelihood itself, not the negative log-likelihood):

$$L(y, p) = y \log p + (1 - y) \log(1 - p)$$

Here $p = \sigma(z)$ is the logistic function:

$$\sigma(z) = \frac{1}{1 + e^{-z}},$$ where

$z$ is the predicted value before the logistic transformation (i.e., the log-odds):

$$z = \log\frac{p}{1 - p}$$

First derivative with respect to $p$, obtained using Wolfram Alpha:

$$\frac{\partial L}{\partial p} = \frac{y}{p} - \frac{1 - y}{1 - p} = \frac{y - p}{p(1 - p)}$$

After multiplying by $\frac{\partial p}{\partial z} = p(1 - p)$ (the chain rule, since $p$ is a function of $z$):

$$\frac{\partial L}{\partial z} = y - p$$

After changing the sign we have the expression for the gradient of the logistic loss function:

$$\frac{\partial (-L)}{\partial z} = p - y$$
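The result above can be sanity-checked numerically with a finite-difference approximation. A minimal Python sketch (the function names are mine, not from any library):

```python
import math

def sigmoid(z):
    # logistic function: p = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + math.exp(-z))

def logistic_loss(y, z):
    # negative log-likelihood of a single observation,
    # written in terms of the log-odds z
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def grad_logistic_loss(y, z):
    # closed form derived above: d(-L)/dz = p - y
    return sigmoid(z) - y

# finite-difference check of the closed form
eps = 1e-6
for y in (0.0, 1.0):
    for z in (-2.0, 0.5, 3.0):
        numeric = (logistic_loss(y, z + eps) - logistic_loss(y, z - eps)) / (2 * eps)
        assert abs(numeric - grad_logistic_loss(y, z)) < 1e-5
```

Note that the derivative is taken with respect to the log-odds $z$, not the probability $p$; that distinction is exactly what the other answer below is about.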
AdamO is correct: if you just want the gradient of the logistic loss (what the OP asked for in the title), then it needs a $\frac{1}{p(1-p)}$ factor. Unfortunately, people in the DL community often assume the logistic loss is always bundled with a sigmoid, pack the two gradients together, and call the result the logistic loss gradient (the internet is filled with posts asserting this). Since the gradient of the sigmoid happens to be $p(1-p)$, it cancels the $\frac{1}{p(1-p)}$ in the logistic loss gradient. But if you are implementing SGD (walking back through the layers) and applying the sigmoid gradient when you reach the sigmoid, then you need to start with the actual logistic loss gradient, which does have the $\frac{1}{p(1-p)}$.
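The cancellation described here can be checked numerically. A minimal sketch (function names are my own): the gradient with respect to $p$ carries the $\frac{1}{p(1-p)}$ factor, and multiplying by the sigmoid's derivative $p(1-p)$ recovers the familiar $p - y$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dloss_dp(y, p):
    # gradient of the logistic loss w.r.t. the probability p itself:
    # -(y/p - (1-y)/(1-p)) = (p - y) / (p * (1 - p))
    return (p - y) / (p * (1 - p))

def dsigmoid_dz(z):
    # derivative of the sigmoid: p * (1 - p)
    p = sigmoid(z)
    return p * (1 - p)

y, z = 1.0, 0.7
p = sigmoid(z)
# chain rule: dL/dz = dL/dp * dp/dz; the p(1-p) factors cancel
chained = dloss_dp(y, p) * dsigmoid_dz(z)
assert abs(chained - (p - y)) < 1e-12
```

In a layer-by-layer backprop implementation, `dloss_dp` is what the loss node should emit, and `dsigmoid_dz` is applied separately at the sigmoid node; only their product equals $p - y$.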


