I am trying to wrap my head around back-propagation in a neural network with a Softmax classifier, which uses the Softmax function:

$$p_j = \frac{e^{o_j}}{\sum_k e^{o_k}}$$

This is used in a loss function of the form

$$L = -\sum_j y_j \log p_j,$$

where $o$ is a vector. I need the derivative of $L$ with respect to $o$. Now if my derivatives are right,

$$\frac{\partial p_j}{\partial o_i} = p_i(1 - p_i), \quad i = j$$

and

$$\frac{\partial p_j}{\partial o_i} = -p_i p_j, \quad i \neq j.$$

Using this result we obtain

$$\frac{\partial L}{\partial o_i} = -\left(y_i(1 - p_i) + \sum_{k \neq i} -p_k y_k\right) = p_i y_i - y_i + \sum_{k \neq i} p_k y_k = \left(\sum_i p_i y_i\right) - y_i$$

According to slides I’m using, however, the result should be

$$\frac{\partial L}{\partial o_i} = p_i - y_i.$$

Can someone please tell me where I’m going wrong?

**Answer**

Your derivatives $\frac{\partial p_j}{\partial o_i}$ are indeed correct; however, there is an error when you differentiate the loss function $L$ with respect to $o_i$.
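(As a quick sanity check on the Jacobian itself, here is a minimal NumPy sketch; the example logits `o` are arbitrary and the `softmax` helper is just an illustrative implementation, not code from your network.)

```python
import numpy as np

def softmax(o):
    e = np.exp(o - np.max(o))      # shift by max for numerical stability
    return e / e.sum()

o = np.array([1.0, 2.0, 0.5])      # arbitrary example logits
p = softmax(o)
n = len(o)

# analytic Jacobian: dp_j/do_i = p_j(1 - p_j) if i == j, else -p_i p_j
J_analytic = np.diag(p) - np.outer(p, p)

# finite-difference Jacobian for comparison
eps = 1e-6
J_numeric = np.zeros((n, n))
for i in range(n):
    d = np.zeros(n)
    d[i] = eps
    J_numeric[:, i] = (softmax(o + d) - softmax(o - d)) / (2 * eps)

print(np.max(np.abs(J_analytic - J_numeric)))  # on the order of 1e-10
```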

We have the following (where I have highlighted in red where you have gone wrong)

$$\begin{aligned}
\frac{\partial L}{\partial o_i}
&= -\sum_k y_k \frac{\partial \log p_k}{\partial o_i}
 = -\sum_k y_k \color{red}{\frac{1}{p_k}} \frac{\partial p_k}{\partial o_i} \\
&= -y_i(1 - p_i) - \sum_{k \neq i} y_k \color{red}{\frac{1}{p_k}} (-p_k p_i) \\
&= -y_i(1 - p_i) + \sum_{k \neq i} y_k p_i \\
&= -y_i + y_i p_i + \sum_{k \neq i} y_k p_i \\
&= p_i\left(\sum_k y_k\right) - y_i \\
&= p_i - y_i,
\end{aligned}$$

given that $\sum_k y_k = 1$ from the slides (as $y$ is a vector with only one non-zero element, which is $1$).
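If you want to convince yourself numerically that $\frac{\partial L}{\partial o_i} = p_i - y_i$, you can compare it against a finite-difference gradient of the loss. A minimal sketch (again NumPy; the particular `o` and one-hot `y` below are arbitrary examples):

```python
import numpy as np

def softmax(o):
    e = np.exp(o - np.max(o))      # shift by max for numerical stability
    return e / e.sum()

def loss(o, y):
    return -np.sum(y * np.log(softmax(o)))   # L = -sum_j y_j log p_j

o = np.array([1.0, 2.0, 0.5])      # arbitrary example logits
y = np.array([0.0, 1.0, 0.0])      # one-hot target, so sum_k y_k = 1

grad_analytic = softmax(o) - y     # the result derived above: dL/do_i = p_i - y_i

# finite-difference gradient for comparison
eps = 1e-6
grad_numeric = np.zeros_like(o)
for i in range(len(o)):
    d = np.zeros_like(o)
    d[i] = eps
    grad_numeric[i] = (loss(o + d, y) - loss(o - d, y)) / (2 * eps)

print(np.max(np.abs(grad_analytic - grad_numeric)))  # on the order of 1e-10
```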

**Attribution**
*Source: Link, Question Author: Moos Hueting, Answer Author: Alijah Ahmed*