
Derivation of the Binary Cross Entropy Loss Gradient

  • Writer: Gianluca Turcatel
  • Dec 23, 2021
  • 1 min read


The binary cross entropy loss function is the preferred loss function in binary classification tasks, and is utilized to estimate the value of the model's parameters through gradient descent. In order to apply gradient descent we must calculate the derivative (gradient) of the loss function w.r.t. the model's parameters. Deriving the gradient is usually the most tedious part of training a machine learning model.

In this article we will derive the derivative of the binary cross entropy loss function w.r.t. W, step by step.


The binary cross entropy loss is given by

$$L(W) = -\left[\, y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \,\right]$$

Here y is the observed class (0 or 1), ŷ is the predicted probability, and W is the model's parameters. For clarity we work with a single training example; for a batch, the per-example losses are averaged. Predictions are given by:

$$\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}$$

z is equal to:

$$z = W^{T}x + b$$
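Before differentiating anything, it helps to see the forward pass in code. This is a minimal sketch (the names `predict`, `w`, `x`, and `b`, and the numbers used, are illustrative, not from the article): compute z as a dot product plus bias, then squash it through the sigmoid.

```python
import math

def predict(w, x, b):
    """Forward pass: z = w . x + b, then the sigmoid maps z into (0, 1)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative values: z = 0.5*1.0 + (-0.25)*2.0 + 0.1 = 0.1
y_hat = predict([0.5, -0.25], [1.0, 2.0], 0.1)
```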

To calculate the gradient of L(W) w.r.t. W we will use the chain rule:

$$\frac{\partial L}{\partial W} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial W}$$

Let's derive the first term:

$$\frac{\partial L}{\partial \hat{y}} = -\frac{y}{\hat{y}} + \frac{1 - y}{1 - \hat{y}} = \frac{\hat{y} - y}{\hat{y}\,(1 - \hat{y})}$$
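The first term can be sanity-checked numerically. The sketch below (function names are illustrative) compares the closed form above against a central finite difference of the loss with respect to ŷ:

```python
import math

def loss(y, y_hat):
    """Binary cross entropy for a single example."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def dloss_dyhat(y, y_hat):
    """Closed form derived above: (y_hat - y) / (y_hat * (1 - y_hat))."""
    return (y_hat - y) / (y_hat * (1 - y_hat))

# Central finite difference at an arbitrary test point
y, y_hat, eps = 1.0, 0.7, 1e-6
numeric = (loss(y, y_hat + eps) - loss(y, y_hat - eps)) / (2 * eps)
analytic = dloss_dyhat(y, y_hat)
```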

The second term is a little more complicated:

$$\frac{\partial \hat{y}}{\partial z} = \frac{\partial}{\partial z}\left(\frac{1}{1 + e^{-z}}\right) = \frac{e^{-z}}{(1 + e^{-z})^{2}}$$

$$= \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} = \frac{1}{1 + e^{-z}} \cdot \frac{(1 + e^{-z}) - 1}{1 + e^{-z}}$$

$$= \frac{1}{1 + e^{-z}} \cdot \left(1 - \frac{1}{1 + e^{-z}}\right) = \hat{y}\,(1 - \hat{y})$$
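The well-known identity σ'(z) = σ(z)(1 − σ(z)) is easy to verify numerically as well. A minimal sketch (names are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dsigmoid_dz(z):
    """Closed form derived above: y_hat * (1 - y_hat)."""
    s = sigmoid(z)
    return s * (1 - s)

# Central finite difference at an arbitrary test point
z, eps = 0.3, 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
analytic = dsigmoid_dz(z)
```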

Done with the second term. The derivative of the third term is straightforward:

$$\frac{\partial z}{\partial W} = x$$

Now let's put everything together:

$$\frac{\partial L}{\partial W} = \frac{\hat{y} - y}{\hat{y}\,(1 - \hat{y})} \cdot \hat{y}\,(1 - \hat{y}) \cdot x = (\hat{y} - y)\,x$$

And there you have it: the derivative of the binary cross entropy loss function w.r.t. the model's parameters.
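As a final check, the full gradient (ŷ − y)·x can be compared against finite differences of the loss with respect to each weight. This sketch uses illustrative names and values, not code from the article:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Binary cross entropy for one example under z = w . x + b."""
    y_hat = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def grad_w(w, b, x, y):
    """The derived result: dL/dw_j = (y_hat - y) * x_j."""
    y_hat = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(y_hat - y) * xj for xj in x]

# Central finite difference on each weight at an arbitrary test point
w, b, x, y, eps = [0.4, -0.6], 0.2, [1.5, -2.0], 1.0, 1e-6
analytic = grad_w(w, b, x, y)
numeric = []
for j in range(len(w)):
    w_plus, w_minus = list(w), list(w)
    w_plus[j] += eps
    w_minus[j] -= eps
    numeric.append((loss(w_plus, b, x, y) - loss(w_minus, b, x, y)) / (2 * eps))
```

If the derivation is correct, `analytic` and `numeric` agree to several decimal places at any test point, which is exactly the gradient-check trick used when debugging hand-written backpropagation.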


Follow me on Twitter and Facebook to stay updated.


