# derivative of cost function for Logistic Regression

I am going over the lectures on Machine Learning at Coursera.

I am struggling with the following. How can the partial derivative of

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{i}\log(h_\theta(x^{i}))+(1-y^{i})\log(1-h_\theta(x^{i}))\right]$$

where $$h_{\theta}(x)$$ is defined as follows

$$h_{\theta}(x)=g(\theta^{T}x)$$
$$g(z)=\frac{1}{1+e^{-z}}$$
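For concreteness, here is how I have the hypothesis written in code (a small sketch; the names `sigmoid` and `hypothesis` are mine):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x)."""
    return sigmoid(np.dot(theta, x))
```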

be $$\frac{\partial}{\partial\theta_{j}}J(\theta)=\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{i})-y^i)x_j^i$$?

In other words, how would we go about calculating the partial derivative with respect to $$\theta$$ of the cost function (the logs are natural logarithms):

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{i}\log(h_\theta(x^{i}))+(1-y^{i})\log(1-h_\theta(x^{i}))\right]$$

Here is the derivation. We use the notation:

$$\theta x^i:=\theta_0+\theta_1 x^i_1+\dots+\theta_p x^i_p.$$

Then

$$\log h_\theta(x^i)=\log\frac{1}{1+e^{-\theta x^i}}=-\log(1+e^{-\theta x^i}),$$
$$\log(1-h_\theta(x^i))=\log\left(1-\frac{1}{1+e^{-\theta x^i}}\right)=\log(e^{-\theta x^i})-\log(1+e^{-\theta x^i})=-\theta x^i-\log(1+e^{-\theta x^i}).$$
[For the second line, write $$1=\frac{1+e^{-\theta x^i}}{1+e^{-\theta x^i}},$$ so that $$1-\frac{1}{1+e^{-\theta x^i}}=\frac{e^{-\theta x^i}}{1+e^{-\theta x^i}},$$ and then apply $$\log(x/y)=\log(x)-\log(y).$$]
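As a quick numerical sanity check of these two identities (a sketch; `z` stands in for $$\theta x^i$$):

```python
import numpy as np

z = np.linspace(-10.0, 10.0, 201)   # z plays the role of theta·x^i
h = 1.0 / (1.0 + np.exp(-z))        # h_theta(x^i)

# log h = -log(1 + e^{-z})
assert np.allclose(np.log(h), -np.log1p(np.exp(-z)))
# log(1 - h) = -z - log(1 + e^{-z})
assert np.allclose(np.log(1.0 - h), -z - np.log1p(np.exp(-z)))
```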

Our original cost function has the form

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{i}\log(h_\theta(x^{i}))+(1-y^{i})\log(1-h_\theta(x^{i}))\right]$$

Plugging in the two simplified expressions above, we obtain
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^m \left[-y^i\log(1+e^{-\theta x^i})+(1-y^i)\left(-\theta x^i-\log(1+e^{-\theta x^i})\right)\right],$$
which can be simplified to:
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^m \left[y^i\theta x^i-\theta x^i-\log(1+e^{-\theta x^i})\right]=-\frac{1}{m}\sum_{i=1}^m \left[y^i\theta x^i-\log(1+e^{\theta x^i})\right],\qquad(*)$$

where the second equality follows from

$$-\theta x^i-\log(1+e^{-\theta x^i})=-\left[\log e^{\theta x^i}+\log(1+e^{-\theta x^i})\right]=-\log(1+e^{\theta x^i}).$$
[Here we used $$\log(x)+\log(y)=\log(xy).$$]

All you need now is to compute the partial derivatives of $$(*)$$ w.r.t. $$\theta_j$$. As
$$\frac{\partial}{\partial \theta_j}\,y^i\theta x^i=y^i x^i_j,$$
$$\frac{\partial}{\partial \theta_j}\log(1+e^{\theta x^i})=\frac{x^i_j e^{\theta x^i}}{1+e^{\theta x^i}}=x^i_j\,h_\theta(x^i)$$ (multiply numerator and denominator of the middle expression by $$e^{-\theta x^i}$$ to see the last equality),

the desired result follows:
$$\frac{\partial}{\partial\theta_j}J(\theta)=-\frac{1}{m}\sum_{i=1}^m\left[y^i x^i_j-x^i_j\,h_\theta(x^i)\right]=\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^i)-y^i\right)x^i_j.$$
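The whole derivation can be double-checked by comparing the analytic gradient against central finite differences of the cost (a sketch with made-up data; rows of `X` are the $$x^i$$, and all names are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """J(theta) = -(1/m) sum[ y log h + (1-y) log(1-h) ]."""
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))

def gradient(theta, X, y):
    """(1/m) sum_i (h_theta(x^i) - y^i) x^i  ==  X^T (h - y) / m."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
y = rng.integers(0, 2, size=30).astype(float)
theta = rng.normal(size=4)

# Central finite differences along each coordinate direction e_j.
eps = 1e-6
numeric = np.array([
    (cost(theta + eps * e, X, y) - cost(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(4)
])
assert np.allclose(numeric, gradient(theta, X, y), atol=1e-6)
```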