I am going over the lectures on Machine Learning at Coursera.

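I am struggling with the following. How can the partial derivative of

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{i}\log(h_\theta(x^{i}))+(1-y^{i})\log(1-h_\theta(x^{i}))\right]$$

where $h_\theta(x)$ is defined as follows

$$h_\theta(x)=g(\theta^{T}x)$$

$$g(z)=\frac{1}{1+e^{-z}}$$

be

$$\frac{\partial}{\partial\theta_j}J(\theta)=\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{i})-y^{i})x^{i}_j\,?$$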

In other words, how would we go about calculating the partial derivative with respect to θ of the cost function (the logs are natural logarithms):

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{i}\log(h_\theta(x^{i}))+(1-y^{i})\log(1-h_\theta(x^{i}))\right]$$

**Answer**

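The reason is the following. We use the notation

$$\theta x^{i}:=\theta_0+\theta_1 x^{i}_1+\dots+\theta_p x^{i}_p,$$

where the superscript $i$ indexes the training example (it is not an exponent) and the subscript indexes the feature.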

Then

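$$\log h_\theta(x^{i})=\log\frac{1}{1+e^{-\theta x^{i}}}=-\log(1+e^{-\theta x^{i}}),$$

$$\log(1-h_\theta(x^{i}))=\log\left(1-\frac{1}{1+e^{-\theta x^{i}}}\right)=\log(e^{-\theta x^{i}})-\log(1+e^{-\theta x^{i}})=-\theta x^{i}-\log(1+e^{-\theta x^{i}}).$$

[Here we wrote $1=\frac{1+e^{-\theta x^{i}}}{1+e^{-\theta x^{i}}}$, so that the 1's in the numerator cancel, and then used $\log(x/y)=\log(x)-\log(y)$.]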
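As a quick sanity check, the two simplified logarithms can be compared numerically against their direct definitions. This is only a sketch: the function names and the sample values of $z=\theta x^{i}$ are chosen here for illustration and are not part of the original answer.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

def log_h(z):
    """Simplified form of log(g(z)): -log(1 + e^{-z})."""
    return -math.log(1.0 + math.exp(-z))

def log_one_minus_h(z):
    """Simplified form of log(1 - g(z)): -z - log(1 + e^{-z})."""
    return -z - math.log(1.0 + math.exp(-z))

# Arbitrary illustrative values of z = theta x (an assumption, not from the source).
for z in (-3.0, -0.5, 0.0, 1.2, 4.0):
    assert math.isclose(math.log(sigmoid(z)), log_h(z), rel_tol=1e-9)
    assert math.isclose(math.log(1.0 - sigmoid(z)), log_one_minus_h(z), rel_tol=1e-9)
```

Both assertions hold for every sampled value, which is what the algebra above predicts.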

Recall that our original cost function has the form

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{i}\log(h_\theta(x^{i}))+(1-y^{i})\log(1-h_\theta(x^{i}))\right].$$

Plugging in the two simplified expressions above, we obtain

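$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[-y^{i}\log(1+e^{-\theta x^{i}})+(1-y^{i})\left(-\theta x^{i}-\log(1+e^{-\theta x^{i}})\right)\right],$$

which can be simplified to

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{i}\theta x^{i}-\theta x^{i}-\log(1+e^{-\theta x^{i}})\right]=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{i}\theta x^{i}-\log(1+e^{\theta x^{i}})\right],\qquad(\ast)$$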

where the second equality follows from

$$-\theta x^{i}-\log(1+e^{-\theta x^{i}})=-\left[\log e^{\theta x^{i}}+\log(1+e^{-\theta x^{i}})\right]=-\log(1+e^{\theta x^{i}}).$$

[We used $\log(x)+\log(y)=\log(xy)$.]
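To see that (∗) really is the same function as the original cost, one can evaluate both forms on a small data set. The sketch below uses made-up toy data and hypothetical helper names (`dot`, `cost_original`, `cost_simplified`); each feature vector is assumed to carry a leading 1 so that `theta[0]` plays the role of $\theta_0$.

```python
import math

# Hypothetical toy data (not from the source); the leading 1.0 is the intercept feature.
X = [[1.0, 0.5, -1.2], [1.0, -0.3, 0.8], [1.0, 1.5, 0.1], [1.0, -2.0, -0.7]]
y = [1, 0, 1, 0]
theta = [0.2, -0.4, 0.9]

def dot(theta, x):
    """theta x = theta_0 * x_0 + ... + theta_p * x_p."""
    return sum(t * v for t, v in zip(theta, x))

def cost_original(theta, X, y):
    """J(theta) written with log(h) and log(1 - h)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = 1.0 / (1.0 + math.exp(-dot(theta, x_i)))
        total += y_i * math.log(h) + (1 - y_i) * math.log(1.0 - h)
    return -total / len(X)

def cost_simplified(theta, X, y):
    """J(theta) in the simplified form (*): y * theta x - log(1 + e^{theta x})."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        z = dot(theta, x_i)
        total += y_i * z - math.log(1.0 + math.exp(z))
    return -total / len(X)

print(cost_original(theta, X, y), cost_simplified(theta, X, y))
```

The two printed values should agree up to floating-point rounding.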

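All you need now is to compute the partial derivatives of $(\ast)$ with respect to $\theta_j$. As

$$\frac{\partial}{\partial\theta_j}y^{i}\theta x^{i}=y^{i}x^{i}_j,$$

$$\frac{\partial}{\partial\theta_j}\log(1+e^{\theta x^{i}})=\frac{x^{i}_j e^{\theta x^{i}}}{1+e^{\theta x^{i}}}=x^{i}_j h_\theta(x^{i}),$$

where the last equality uses $\frac{e^{\theta x^{i}}}{1+e^{\theta x^{i}}}=\frac{1}{1+e^{-\theta x^{i}}}=h_\theta(x^{i})$, we get

$$\frac{\partial}{\partial\theta_j}J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{i}x^{i}_j-x^{i}_j h_\theta(x^{i})\right]=\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{i})-y^{i})x^{i}_j,$$

and the thesis follows.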
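The resulting gradient formula can also be checked numerically by comparing it with central finite differences of the cost. The following is a minimal sketch under stated assumptions: the toy data, the step size `eps`, and all function names are illustrative and do not come from the original post.

```python
import math

# Hypothetical toy data (not from the source); the leading 1.0 is the intercept feature.
X = [[1.0, 0.5, -1.2], [1.0, -0.3, 0.8], [1.0, 1.5, 0.1], [1.0, -2.0, -0.7]]
y = [1, 0, 1, 0]
theta = [0.2, -0.4, 0.9]

def dot(theta, x):
    return sum(t * v for t, v in zip(theta, x))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(theta, X, y):
    """Cross-entropy cost J(theta) in its original form."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = sigmoid(dot(theta, x_i))
        total += y_i * math.log(h) + (1 - y_i) * math.log(1.0 - h)
    return -total / len(X)

def analytic_grad(theta, X, y):
    """Gradient from the derived formula: (1/m) * sum_i (h(x_i) - y_i) * x_ij."""
    m = len(X)
    grad = [0.0] * len(theta)
    for x_i, y_i in zip(X, y):
        err = sigmoid(dot(theta, x_i)) - y_i
        for j, x_ij in enumerate(x_i):
            grad[j] += err * x_ij / m
    return grad

def numeric_grad(theta, X, y, eps=1e-6):
    """Central finite-difference approximation of each partial derivative."""
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += eps
        down[j] -= eps
        grad.append((cost(up, X, y) - cost(down, X, y)) / (2 * eps))
    return grad

for a, n in zip(analytic_grad(theta, X, y), numeric_grad(theta, X, y)):
    print(a, n)
```

Each printed pair should match to several decimal places, which supports the formula without replacing the derivation above.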

**Attribution**

*Source: Link, Question Author: dreamwalker, Answer Author: amWhy*