I am trying to wrap my head around back-propagation in a neural network with a Softmax classifier, which uses the Softmax function:

$$p_j = \frac{e^{o_j}}{\sum_k e^{o_k}}$$

This is used in a loss function of the form

$$L = -\sum_j y_j \log p_j,$$

where $o$ is a vector. I need the derivative of $L$ with respect to $o$. Now if my derivatives are right,

$$\frac{\partial p_j}{\partial o_i} = p_i(1 - p_i), \quad i = j$$

and

$$\frac{\partial p_j}{\partial o_i} = -p_i p_j, \quad i \neq j.$$

Using this result we obtain

$$\frac{\partial L}{\partial o_i} = -\left(y_i(1 - p_i) + \sum_{k \neq i} -p_k y_k\right) = p_i y_i - y_i + \sum_{k \neq i} p_k y_k = \left(\sum_i p_i y_i\right) - y_i$$

According to slides I’m using, however, the result should be

$$\frac{\partial L}{\partial o_i} = p_i - y_i.$$

Can someone please tell me where I’m going wrong?

**Answer**

Your derivatives $\frac{\partial p_j}{\partial o_i}$ are indeed correct; however, there is an error when you differentiate the loss function $L$ with respect to $o_i$.
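(As a quick sanity check on the Jacobian itself, here is a minimal NumPy sketch; the example logits `o` are arbitrary and the `softmax` helper is just an illustrative implementation, not code from your network.)

```python
import numpy as np

def softmax(o):
    e = np.exp(o - np.max(o))      # shift by max for numerical stability
    return e / e.sum()

o = np.array([1.0, 2.0, 0.5])      # arbitrary example logits
p = softmax(o)
n = len(o)

# analytic Jacobian: dp_j/do_i = p_j(1 - p_j) if i == j, else -p_i p_j
J_analytic = np.diag(p) - np.outer(p, p)

# finite-difference Jacobian for comparison
eps = 1e-6
J_numeric = np.zeros((n, n))
for i in range(n):
    d = np.zeros(n)
    d[i] = eps
    J_numeric[:, i] = (softmax(o + d) - softmax(o - d)) / (2 * eps)

print(np.max(np.abs(J_analytic - J_numeric)))  # on the order of 1e-10
```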

We have the following (where I have highlighted in red where you have gone wrong)

$$\begin{aligned}
\frac{\partial L}{\partial o_i}
&= -\sum_k y_k \frac{\partial \log p_k}{\partial o_i}
 = -\sum_k y_k \color{red}{\frac{1}{p_k}} \frac{\partial p_k}{\partial o_i} \\
&= -y_i(1 - p_i) - \sum_{k \neq i} y_k \color{red}{\frac{1}{p_k}} (-p_k p_i) \\
&= -y_i(1 - p_i) + \sum_{k \neq i} y_k p_i \\
&= -y_i + y_i p_i + \sum_{k \neq i} y_k p_i \\
&= p_i\left(\sum_k y_k\right) - y_i \\
&= p_i - y_i,
\end{aligned}$$

given that $\sum_k y_k = 1$ from the slides (as $y$ is a vector with only one non-zero element, which is $1$).
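If you want to convince yourself numerically that $\frac{\partial L}{\partial o_i} = p_i - y_i$, you can compare it against a finite-difference gradient of the loss. A minimal sketch (again NumPy; the particular `o` and one-hot `y` below are arbitrary examples):

```python
import numpy as np

def softmax(o):
    e = np.exp(o - np.max(o))      # shift by max for numerical stability
    return e / e.sum()

def loss(o, y):
    return -np.sum(y * np.log(softmax(o)))   # L = -sum_j y_j log p_j

o = np.array([1.0, 2.0, 0.5])      # arbitrary example logits
y = np.array([0.0, 1.0, 0.0])      # one-hot target, so sum_k y_k = 1

grad_analytic = softmax(o) - y     # the result derived above: dL/do_i = p_i - y_i

# finite-difference gradient for comparison
eps = 1e-6
grad_numeric = np.zeros_like(o)
for i in range(len(o)):
    d = np.zeros_like(o)
    d[i] = eps
    grad_numeric[i] = (loss(o + d, y) - loss(o - d, y)) / (2 * eps)

print(np.max(np.abs(grad_analytic - grad_numeric)))  # on the order of 1e-10
```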

**Attribution**
*Source: Link, Question Author: Moos Hueting, Answer Author: Alijah Ahmed*