derivative of cost function for Logistic Regression

I am going over the lectures on Machine Learning at Coursera.

I am struggling with the following. How can the partial derivative of

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[\,y_i \log\left(h_\theta(x_i)\right) + (1-y_i)\log\left(1-h_\theta(x_i)\right)\right]$$

where $h_\theta(x)$ is defined as follows:

$$h_\theta(x) = g(\theta^T x),$$
$$g(z) = \frac{1}{1+e^{-z}},$$

be

$$\frac{\partial}{\partial\theta_j}J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x_i) - y_i\right)x_{ij}\,?$$

In other words, how would we go about calculating the partial derivative with respect to θ of the cost function (the logs are natural logarithms):

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[\,y_i \log\left(h_\theta(x_i)\right) + (1-y_i)\log\left(1-h_\theta(x_i)\right)\right]$$
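For reference, here is how I would write these quantities in code — a minimal NumPy sketch; the function names and array shapes are mine, chosen purely for illustration:

    import numpy as np

    def sigmoid(z):
        # g(z) = 1 / (1 + e^{-z})
        return 1.0 / (1.0 + np.exp(-z))

    def cost(theta, X, y):
        # J(theta) = -(1/m) sum_i [ y_i log h(x_i) + (1 - y_i) log(1 - h(x_i)) ]
        m = len(y)
        h = sigmoid(X @ theta)
        return -(1.0 / m) * np.sum(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))

    def gradient(theta, X, y):
        # The claimed result: dJ/dtheta_j = (1/m) sum_i (h(x_i) - y_i) x_ij
        m = len(y)
        return (1.0 / m) * X.T @ (sigmoid(X @ theta) - y)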

Answer

The reason is the following. We use the notation:

$$\theta \cdot x_i := \theta_0 + \theta_1 x_{i1} + \cdots + \theta_p x_{ip}.$$

Then

$$\log h_\theta(x_i) = \log\frac{1}{1+e^{-\theta\cdot x_i}} = -\log\left(1+e^{-\theta\cdot x_i}\right),$$

$$\log\left(1-h_\theta(x_i)\right) = \log\left(1-\frac{1}{1+e^{-\theta\cdot x_i}}\right) = \log\left(e^{-\theta\cdot x_i}\right) - \log\left(1+e^{-\theta\cdot x_i}\right) = -\theta\cdot x_i - \log\left(1+e^{-\theta\cdot x_i}\right).$$

[This used $1 = \frac{1+e^{-\theta\cdot x_i}}{1+e^{-\theta\cdot x_i}}$, so the $1$'s in the numerator cancel, leaving $\frac{e^{-\theta\cdot x_i}}{1+e^{-\theta\cdot x_i}}$; then we used $\log(x/y) = \log(x) - \log(y)$.]
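(These two identities are easy to spot-check numerically; the sketch below uses arbitrary made-up values standing in for $\theta\cdot x_i$.)

    import numpy as np

    rng = np.random.default_rng(0)
    z = rng.normal(size=5)                # plays the role of theta . x_i
    h = 1.0 / (1.0 + np.exp(-z))          # h_theta(x_i)

    # log h_theta(x_i) = -log(1 + e^{-theta . x_i})
    print(np.allclose(np.log(h), -np.log1p(np.exp(-z))))            # True

    # log(1 - h_theta(x_i)) = -theta . x_i - log(1 + e^{-theta . x_i})
    print(np.allclose(np.log(1.0 - h), -z - np.log1p(np.exp(-z))))  # True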

Since our original cost function is of the form

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[\,y_i \log\left(h_\theta(x_i)\right) + (1-y_i)\log\left(1-h_\theta(x_i)\right)\right],$$

plugging in the two simplified expressions above, we obtain

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[\,y_i\left(-\log\left(1+e^{-\theta\cdot x_i}\right)\right) + (1-y_i)\left(-\theta\cdot x_i - \log\left(1+e^{-\theta\cdot x_i}\right)\right)\right],$$

which can be simplified to:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[\,y_i\,\theta\cdot x_i - \theta\cdot x_i - \log\left(1+e^{-\theta\cdot x_i}\right)\right] = -\frac{1}{m}\sum_{i=1}^{m}\left[\,y_i\,\theta\cdot x_i - \log\left(1+e^{\theta\cdot x_i}\right)\right], \qquad (*)$$

where the second equality follows from

$$-\theta\cdot x_i - \log\left(1+e^{-\theta\cdot x_i}\right) = -\left[\log e^{\theta\cdot x_i} + \log\left(1+e^{-\theta\cdot x_i}\right)\right] = -\log\left(1+e^{\theta\cdot x_i}\right).$$

[Here we used $\theta\cdot x_i = \log e^{\theta\cdot x_i}$ and $\log(x) + \log(y) = \log(xy)$.]
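(Again, $(*)$ can be spot-checked against the original form of $J(\theta)$; the toy data below is made up purely for illustration.)

    import numpy as np

    rng = np.random.default_rng(1)
    m, p = 20, 3                          # toy problem size (assumed)
    X = rng.normal(size=(m, p))
    y = rng.integers(0, 2, size=m).astype(float)
    theta = rng.normal(size=p)

    z = X @ theta                         # theta . x_i for every i
    h = 1.0 / (1.0 + np.exp(-z))

    # Original form of J(theta)
    J_orig = -(1.0 / m) * np.sum(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))

    # Simplified form (*)
    J_star = -(1.0 / m) * np.sum(y * z - np.log1p(np.exp(z)))

    print(np.allclose(J_orig, J_star))    # True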

All you need now is to compute the partial derivatives of $(*)$ w.r.t. $\theta_j$. As

$$\frac{\partial}{\partial\theta_j}\,y_i\,\theta\cdot x_i = y_i\,x_{ij},$$

$$\frac{\partial}{\partial\theta_j}\log\left(1+e^{\theta\cdot x_i}\right) = \frac{x_{ij}\,e^{\theta\cdot x_i}}{1+e^{\theta\cdot x_i}} = x_{ij}\,\frac{1}{1+e^{-\theta\cdot x_i}} = x_{ij}\,h_\theta(x_i),$$

the desired result follows:

$$\frac{\partial}{\partial\theta_j}J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[\,y_i\,x_{ij} - x_{ij}\,h_\theta(x_i)\right] = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x_i) - y_i\right)x_{ij}.$$
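(As a final sanity check, this formula can be compared against central finite differences of $J(\theta)$; the toy data and the tolerance below are assumptions, chosen only for the check.)

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def cost(theta, X, y):
        m = len(y)
        h = sigmoid(X @ theta)
        return -(1.0 / m) * np.sum(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))

    rng = np.random.default_rng(2)
    m, p = 20, 3                          # toy problem size (assumed)
    X = rng.normal(size=(m, p))
    y = rng.integers(0, 2, size=m).astype(float)
    theta = rng.normal(size=p)

    # Analytic gradient from the derivation: (1/m) sum_i (h(x_i) - y_i) x_ij
    grad = (1.0 / m) * X.T @ (sigmoid(X @ theta) - y)

    # Central finite differences, one coordinate at a time
    eps = 1e-6
    numeric = np.array([
        (cost(theta + eps * e, X, y) - cost(theta - eps * e, X, y)) / (2 * eps)
        for e in np.eye(p)
    ])
    print(np.allclose(grad, numeric, atol=1e-6))   # True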

Attribution
Source: Link, Question Author: dreamwalker, Answer Author: amWhy
