I am trying to wrap my head around back-propagation in a neural network with a Softmax classifier, which uses the Softmax function:
$$p_j = \frac{e^{o_j}}{\sum_k e^{o_k}}.$$
This is used in a loss function of the form
$$L = -\sum_k y_k \log p_k,$$
where $o$ is a vector. I need the derivative of $L$ with respect to $o$. Now if my derivatives are right,
$$\frac{\partial p_j}{\partial o_i} = p_i(1 - p_i), \quad i = j
\qquad \text{and} \qquad
\frac{\partial p_j}{\partial o_i} = -p_i\, p_j, \quad i \neq j.$$
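To double-check these formulas I also compared them against finite differences (a minimal numpy sketch I wrote; the test vector and step size are arbitrary choices):

```python
import numpy as np

def softmax(o):
    e = np.exp(o - np.max(o))  # max-subtraction for numerical stability
    return e / e.sum()

o = np.array([1.0, -0.5, 2.0])  # arbitrary test vector
p = softmax(o)

# Finite-difference Jacobian: J_num[j, i] ~ dp_j/do_i
eps = 1e-6
n = len(o)
J_num = np.zeros((n, n))
for i in range(n):
    step = np.zeros(n)
    step[i] = eps
    J_num[:, i] = (softmax(o + step) - softmax(o - step)) / (2 * eps)

# Analytic Jacobian from the formulas above:
# dp_j/do_i = p_i(1 - p_i) if i == j, and -p_i p_j otherwise
J_ana = np.diag(p) - np.outer(p, p)

print(np.max(np.abs(J_num - J_ana)))  # ~1e-10, so the entries agree
```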
Using this result we obtain
$$\frac{\partial L}{\partial o_i} = -\left(y_i(1 - p_i) + \sum_{k \neq i} -p_k\, y_k\right) = p_i y_i - y_i + \sum_{k \neq i} p_k y_k = \left(\sum_k p_k y_k\right) - y_i.$$
According to slides I'm using, however, the result should be
$$\frac{\partial L}{\partial o_i} = p_i - y_i.$$
Can someone please tell me where I’m going wrong?
Your derivatives $\frac{\partial p_j}{\partial o_i}$ are indeed correct; however, there is an error when you differentiate the loss function $L$ with respect to $o_i$.
We have the following (the step where you have gone wrong is the $\frac{1}{p_k}$ factor from differentiating $\log p_k$: it cancels against the $-p_k p_i$ terms to leave $-p_i$, not $-p_k$):
$$\begin{aligned}
\frac{\partial L}{\partial o_i} &= -\sum_k y_k \frac{\partial \log p_k}{\partial o_i} = -\sum_k y_k \frac{1}{p_k} \frac{\partial p_k}{\partial o_i} \\
&= -y_i(1 - p_i) - \sum_{k \neq i} y_k \frac{1}{p_k}(-p_k\, p_i) \\
&= -y_i(1 - p_i) + \sum_{k \neq i} y_k\, p_i \\
&= -y_i + y_i p_i + \sum_{k \neq i} y_k\, p_i \\
&= p_i\left(\sum_k y_k\right) - y_i = p_i - y_i,
\end{aligned}$$
given that $\sum_k y_k = 1$ from the slides (as $y$ is a one-hot vector with only one non-zero element, which is $1$).
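If it helps, the result is easy to verify numerically; here is a minimal numpy sketch (the logits, the target index and the step size are arbitrary choices of mine), which also shows that your expression $\left(\sum_k p_k y_k\right) - y_i$ does not match the finite-difference gradient:

```python
import numpy as np

def softmax(o):
    e = np.exp(o - np.max(o))  # max-subtraction for numerical stability
    return e / e.sum()

def loss(o, y):
    # Cross-entropy loss L = -sum_k y_k log p_k
    return -np.sum(y * np.log(softmax(o)))

o = np.array([0.3, -1.2, 2.0, 0.5])  # arbitrary logits
y = np.array([0.0, 0.0, 1.0, 0.0])   # one-hot target, so sum_k y_k = 1
p = softmax(o)

# Finite-difference gradient of L with respect to o
eps = 1e-6
g_num = np.array([
    (loss(o + eps * e, y) - loss(o - eps * e, y)) / (2 * eps)
    for e in np.eye(len(o))
])

print(g_num)               # numerical gradient
print(p - y)               # matches: dL/do_i = p_i - y_i
print(np.sum(p * y) - y)   # your expression: does not match
```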