Derivative of Softmax loss function

I am trying to wrap my head around back-propagation in a neural network with a Softmax classifier, which uses the Softmax function:

$$p_j = \frac{e^{o_j}}{\sum_k e^{o_k}}$$

This is used in a loss function of the form

$$L = -\sum_j y_j \log p_j,$$

where $o$ is a vector. I need the derivative of $L$ with respect to $o$. Now if my derivatives are right,

$$\frac{\partial p_j}{\partial o_i} = p_i(1 - p_i), \qquad i = j$$

and

$$\frac{\partial p_j}{\partial o_i} = -p_i p_j, \qquad i \neq j.$$
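(As a quick sanity check of these two expressions, here is a minimal NumPy sketch; the test point `o` is arbitrary. It compares the analytic Jacobian built from the formulas above against a finite-difference Jacobian.)

```python
import numpy as np

def softmax(o):
    # p_j = exp(o_j) / sum_k exp(o_k); shifting by max(o) avoids overflow
    e = np.exp(o - np.max(o))
    return e / e.sum()

o = np.array([0.5, -1.2, 2.0])
p = softmax(o)

# Analytic Jacobian: dp_j/do_i = p_i*(1 - p_i) if i == j, and -p_i*p_j otherwise
analytic = np.diag(p) - np.outer(p, p)

# Finite-difference Jacobian, column i approximates dp/do_i
eps = 1e-6
numeric = np.column_stack([
    (softmax(o + eps * e_i) - softmax(o - eps * e_i)) / (2 * eps)
    for e_i in np.eye(3)
])

print(np.allclose(analytic, numeric, atol=1e-6))   # True
```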

Using this result we obtain

$$\frac{\partial L}{\partial o_i} = -\left(y_i(1 - p_i) + \sum_{k\neq i} (-p_k y_k)\right) = p_i y_i - y_i + \sum_{k\neq i} p_k y_k = \Big(\sum_k p_k y_k\Big) - y_i$$

According to the slides I'm using, however, the result should be

$$\frac{\partial L}{\partial o_i} = p_i - y_i.$$

Can someone please tell me where I’m going wrong?
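(For reference, a quick finite-difference check, sketched here with NumPy and arbitrary test values, confirms that the expression derived above, $\big(\sum_k p_k y_k\big) - y_i$, does not match the numerical gradient:)

```python
import numpy as np

def softmax(o):
    e = np.exp(o - np.max(o))
    return e / e.sum()

def loss(o, y):
    return -np.sum(y * np.log(softmax(o)))

o = np.array([0.5, -1.2, 2.0])
y = np.array([0.0, 1.0, 0.0])          # one-hot target
p = softmax(o)

# The expression derived above: (sum_k p_k y_k) - y_i for each component i
candidate = np.sum(p * y) - y

# Central finite-difference gradient of the loss
eps = 1e-6
numeric = np.array([
    (loss(o + eps * e_i, y) - loss(o - eps * e_i, y)) / (2 * eps)
    for e_i in np.eye(3)
])

print(np.allclose(candidate, numeric, atol=1e-6))   # False
```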

Answer

Your derivatives $\frac{\partial p_j}{\partial o_i}$ are indeed correct; however, there is an error when you differentiate the loss function $L$ with respect to $o_i$.

We have the following (compare the $k \neq i$ terms carefully with your working; that is where the slip is):
$$\begin{aligned}
\frac{\partial L}{\partial o_i} &= -\sum_k y_k \frac{\partial \log p_k}{\partial o_i} = -\sum_k y_k \frac{1}{p_k}\frac{\partial p_k}{\partial o_i} \\
&= -y_i(1 - p_i) - \sum_{k\neq i} y_k \frac{1}{p_k}(-p_k p_i) \\
&= -y_i(1 - p_i) + \sum_{k\neq i} y_k p_i \\
&= -y_i + y_i p_i + \sum_{k\neq i} y_k p_i \\
&= p_i\Big(\sum_k y_k\Big) - y_i \\
&= p_i - y_i,
\end{aligned}$$

given that $\sum_k y_k = 1$ from the slides (as $y$ is a vector with only one non-zero element, which is 1).
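As a quick numeric sanity check of this result (a sketch assuming NumPy; the test point and one-hot target below are arbitrary), the analytic gradient $p_i - y_i$ agrees with a finite-difference gradient of $L$:

```python
import numpy as np

def softmax(o):
    # shift by max(o) for numerical stability
    e = np.exp(o - np.max(o))
    return e / e.sum()

def loss(o, y):
    return -np.sum(y * np.log(softmax(o)))

o = np.array([0.5, -1.2, 2.0])
y = np.array([0.0, 1.0, 0.0])            # one-hot: sum(y) == 1

analytic = softmax(o) - y                 # dL/do_i = p_i - y_i

eps = 1e-6
numeric = np.array([
    (loss(o + eps * e_i, y) - loss(o - eps * e_i, y)) / (2 * eps)
    for e_i in np.eye(3)
])

print(np.allclose(analytic, numeric, atol=1e-6))   # True
```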

Attribution
Source: Link, Question Author: Moos Hueting, Answer Author: Alijah Ahmed
