Derivation of the Binary Cross Entropy Loss Gradient
Binary cross entropy is the standard loss function for binary classification tasks, and its gradient drives the estimation of the model's parameters through gradient descent. To apply gradient descent we must compute the derivative (gradient) of the loss function w.r.t. the model's parameters. Deriving this gradient by hand is often the most tedious part of training a machine learning model.
In this article we will derive, step by step, the gradient of the binary cross entropy loss function w.r.t. the parameter vector W.
The binary cross entropy loss is given by
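For a single training example, the loss takes the standard form:

```latex
L(W) = -\left[\, y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \,\right]
```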
Here y ∈ {0, 1} is the observed class, y_hat the predicted probability, and W the model's parameter vector. Predictions are given by:
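The prediction is the sigmoid of the linear score z:

```latex
\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}
```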
z is equal to:
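Assuming a linear model with feature vector x (with any bias term folded into W), z is the dot product of the parameters and the inputs:

```latex
z = W^{T} x
```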
To calculate the gradient of L(W) w.r.t. W we will use the chain rule:
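The chain rule decomposes the gradient into three factors:

```latex
\frac{\partial L}{\partial W}
= \frac{\partial L}{\partial \hat{y}}
\cdot \frac{\partial \hat{y}}{\partial z}
\cdot \frac{\partial z}{\partial W}
```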
Let's derive the first term:
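Differentiating the loss w.r.t. y_hat, term by term:

```latex
\frac{\partial L}{\partial \hat{y}}
= -\frac{\partial}{\partial \hat{y}} \left[\, y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \,\right]
= -\frac{y}{\hat{y}} + \frac{1 - y}{1 - \hat{y}}
```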
The second term is a little more complicated:
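Differentiating the sigmoid and rearranging yields its well-known compact form:

```latex
\frac{\partial \hat{y}}{\partial z}
= \frac{\partial}{\partial z} \left( \frac{1}{1 + e^{-z}} \right)
= \frac{e^{-z}}{\left(1 + e^{-z}\right)^{2}}
= \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}}
= \hat{y}\,(1 - \hat{y})
```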
That completes the second term. The derivative of the third term is straightforward:
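Since z is linear in W, its derivative is just the input vector:

```latex
\frac{\partial z}{\partial W} = x
```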
Now let's put everything together:
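Multiplying the three factors and simplifying, the y_hat and (1 - y_hat) denominators cancel:

```latex
\frac{\partial L}{\partial W}
= \left( -\frac{y}{\hat{y}} + \frac{1 - y}{1 - \hat{y}} \right) \hat{y}\,(1 - \hat{y})\, x
= \bigl( -y\,(1 - \hat{y}) + (1 - y)\,\hat{y} \bigr)\, x
= (\hat{y} - y)\, x
```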
And there you have it: the derivative of the binary cross entropy loss function w.r.t. the model's parameters.
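As a sanity check, the closed-form gradient (y_hat - y) x can be compared against a finite-difference approximation of the loss. A minimal sketch in NumPy, using arbitrary example values for W, x, and y:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(W, x, y):
    # Binary cross entropy for a single example
    y_hat = sigmoid(W @ x)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def bce_grad(W, x, y):
    # Closed-form gradient derived above: (y_hat - y) * x
    y_hat = sigmoid(W @ x)
    return (y_hat - y) * x

# Arbitrary example: three features, positive class
W = np.array([0.5, -1.2, 0.3])
x = np.array([1.0, 2.0, -0.5])
y = 1.0

# Central finite-difference approximation, one coordinate at a time
eps = 1e-6
num_grad = np.array([
    (bce_loss(W + eps * e, x, y) - bce_loss(W - eps * e, x, y)) / (2 * eps)
    for e in np.eye(3)
])

print(np.allclose(bce_grad(W, x, y), num_grad, atol=1e-6))
```

If the derivation is correct, the two gradients agree to within numerical precision and the script prints True.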