I am going over the lectures on Machine Learning at Coursera.
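I am struggling with the following. How can the partial derivative of

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(h_\theta(x^{(i)}))+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]$$

where $h_\theta(x)$ is defined as

$$h_\theta(x)=g(\theta^{T}x),\qquad g(z)=\frac{1}{1+e^{-z}}$$

be

$$\frac{\partial}{\partial\theta_j}J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}\,?$$

In other words, how would we go about calculating the partial derivative of this cost function with respect to $\theta_j$ (the logs are natural logarithms)?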
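The reason is the following. We use the notation

$$\theta x^{(i)}:=\theta^{T}x^{(i)}=\sum_{j}\theta_j x_j^{(i)}.$$

Then

$$\log h_\theta(x^{(i)})=\log\frac{1}{1+e^{-\theta x^{(i)}}}=-\log(1+e^{-\theta x^{(i)}}),$$

$$\log(1-h_\theta(x^{(i)}))=\log\left(1-\frac{1}{1+e^{-\theta x^{(i)}}}\right)=\log(e^{-\theta x^{(i)}})-\log(1+e^{-\theta x^{(i)}})=-\theta x^{(i)}-\log(1+e^{-\theta x^{(i)}}).$$

[This used $1=\frac{1+e^{-\theta x^{(i)}}}{1+e^{-\theta x^{(i)}}}$; the 1's in the numerator cancel, and then we used $\log(x/y)=\log(x)-\log(y)$.]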
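If you want to convince yourself of these two log identities numerically, here is a minimal sketch. The scalar `z` stands in for $\theta x^{(i)}$, and the sample values and function names are arbitrary choices made for illustration, not part of the course material.

```python
import math

def h(z):
    """Sigmoid 1 / (1 + e^{-z}); z plays the role of theta x^(i)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_h_simplified(z):
    """Right-hand side of the first identity: -log(1 + e^{-z})."""
    return -math.log(1.0 + math.exp(-z))

def log_one_minus_h_simplified(z):
    """Right-hand side of the second identity: -z - log(1 + e^{-z})."""
    return -z - math.log(1.0 + math.exp(-z))

# Arbitrary sample values of z (assumed for illustration only).
samples = [-3.0, -0.5, 0.0, 0.7, 2.5]

# Largest absolute gap between the direct and simplified expressions.
max_err = max(
    max(
        abs(math.log(h(z)) - log_h_simplified(z)),
        abs(math.log(1.0 - h(z)) - log_one_minus_h_simplified(z)),
    )
    for z in samples
)
print(max_err)  # should be at floating-point rounding level
```

Both sides agree up to rounding for moderate `z`; for very large `|z|` the direct form `log(1 - h(z))` loses precision, which is one practical reason to prefer the simplified expressions.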
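Our original cost function has the form

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(h_\theta(x^{(i)}))+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right].$$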
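Plugging in the two simplified expressions above, we obtain

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log(1+e^{-\theta x^{(i)}})+(1-y^{(i)})\left(-\theta x^{(i)}-\log(1+e^{-\theta x^{(i)}})\right)\right],$$

which can be simplified to

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\theta x^{(i)}-\theta x^{(i)}-\log(1+e^{-\theta x^{(i)}})\right]=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\theta x^{(i)}-\log(1+e^{\theta x^{(i)}})\right],\qquad(*)$$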
where the second equality follows from

$$-\theta x^{(i)}-\log(1+e^{-\theta x^{(i)}})=-\left[\log e^{\theta x^{(i)}}+\log(1+e^{-\theta x^{(i)}})\right]=-\log(1+e^{\theta x^{(i)}}).$$

[We used $\log(x)+\log(y)=\log(xy)$.]
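As a sanity check on the simplification, the sketch below compares the original cost with the simplified form $-\frac{1}{m}\sum_i\left[y^{(i)}\theta x^{(i)}-\log(1+e^{\theta x^{(i)}})\right]$ on a tiny made-up dataset. The data `X`, `y`, the parameter vector `theta`, and all function names are assumptions chosen only for illustration; the first column of `X` is set to 1 to play the role of the intercept feature.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(theta, x):
    """theta x^(i): inner product of parameters and one feature vector."""
    return sum(t * v for t, v in zip(theta, x))

def cost_original(theta, X, y):
    """Cost written with log(h) and log(1 - h)."""
    total = 0.0
    for xi, yi in zip(X, y):
        hi = sigmoid(dot(theta, xi))
        total += yi * math.log(hi) + (1 - yi) * math.log(1.0 - hi)
    return -total / len(X)

def cost_simplified(theta, X, y):
    """Cost written as y * (theta x) - log(1 + e^{theta x})."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = dot(theta, xi)
        total += yi * z - math.log(1.0 + math.exp(z))
    return -total / len(X)

# Hypothetical toy data: 4 examples, intercept column plus 2 features.
X = [[1.0, 0.5, -1.2], [1.0, -0.3, 0.8], [1.0, 1.5, 0.1], [1.0, -2.0, -0.4]]
y = [1, 0, 1, 0]
theta = [0.2, -0.5, 0.9]

diff = abs(cost_original(theta, X, y) - cost_simplified(theta, X, y))
print(diff)  # should be at floating-point rounding level
```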
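All you need now is to compute the partial derivatives of $(*)$ with respect to $\theta_j$. As

$$\frac{\partial}{\partial\theta_j}y^{(i)}\theta x^{(i)}=y^{(i)}x_j^{(i)},$$

$$\frac{\partial}{\partial\theta_j}\log(1+e^{\theta x^{(i)}})=\frac{x_j^{(i)}e^{\theta x^{(i)}}}{1+e^{\theta x^{(i)}}}=x_j^{(i)}h_\theta(x^{(i)}),$$

the thesis follows:

$$\frac{\partial}{\partial\theta_j}J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}x_j^{(i)}-x_j^{(i)}h_\theta(x^{(i)})\right]=\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}.$$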
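To double-check the result, one can compare the closed-form gradient $\frac{1}{m}\sum_i\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$ against a central finite-difference approximation of the cost. This is only a sketch under stated assumptions: the toy data, the step size `eps`, and the function names are all made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(theta, x):
    return sum(t * v for t, v in zip(theta, x))

def cost(theta, X, y):
    """Logistic regression cost J(theta) with natural logarithms."""
    total = 0.0
    for xi, yi in zip(X, y):
        hi = sigmoid(dot(theta, xi))
        total += yi * math.log(hi) + (1 - yi) * math.log(1.0 - hi)
    return -total / len(X)

def gradient(theta, X, y):
    """Closed form: (1/m) * sum_i (h(x_i) - y_i) * x_ij for each j."""
    m = len(X)
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        err = sigmoid(dot(theta, xi)) - yi
        for j, xij in enumerate(xi):
            grad[j] += err * xij / m
    return grad

def numerical_gradient(theta, X, y, eps=1e-6):
    """Central differences: (J(theta + eps e_j) - J(theta - eps e_j)) / (2 eps)."""
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += eps
        down[j] -= eps
        grad.append((cost(up, X, y) - cost(down, X, y)) / (2.0 * eps))
    return grad

# Hypothetical toy data: 4 examples, intercept column plus 2 features.
X = [[1.0, 0.5, -1.2], [1.0, -0.3, 0.8], [1.0, 1.5, 0.1], [1.0, -2.0, -0.4]]
y = [1, 0, 1, 0]
theta = [0.2, -0.5, 0.9]

analytic = gradient(theta, X, y)
numeric = numerical_gradient(theta, X, y)
max_gap = max(abs(a - n) for a, n in zip(analytic, numeric))
print(analytic)
print(numeric)
print(max_gap)  # should be tiny if the derivation is right
```

If the two gradients disagreed by more than rounding and truncation error, that would point to a sign or indexing mistake somewhere in the derivation.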