Derivation of the Binary Cross Entropy Loss Gradient

Gianluca Turcatel
Dec 23, 2021

The binary cross entropy loss is the standard loss function for binary classification tasks. Minimizing it, typically with gradient descent, yields estimates of the model's parameters. To apply gradient descent, we must compute the derivative (gradient) of the loss function with respect to those parameters. Deriving this gradient by hand is often the most tedious part of working out how a machine learning model is trained.
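
To make the role of the gradient concrete, here is a minimal sketch of the gradient descent update rule on a toy one-parameter loss. The loss L(w) = (w - 3)^2, the function names, the learning rate, and the number of steps are all assumptions chosen for illustration; they are not part of the binary cross entropy derivation itself.

```python
def gradient_descent(grad_fn, w0, learning_rate=0.1, n_steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(n_steps):
        # Update rule: w <- w - learning_rate * dL/dw
        w = w - learning_rate * grad_fn(w)
    return w


# Toy loss (hypothetical, for illustration only): L(w) = (w - 3)^2,
# whose derivative is dL/dw = 2 * (w - 3). Its minimum is at w = 3.
def toy_grad(w):
    return 2.0 * (w - 3.0)


w_final = gradient_descent(toy_grad, w0=0.0)
print(w_final)  # approaches the minimizer w = 3
```

The update needs nothing from the loss except its gradient, which is why an explicit expression for the gradient is worth deriving.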

In this article we will derive, step by step, the gradient of the binary cross entropy loss function with respect to the model's weights W.

The binary cross entropy loss is given by