I’m taking a machine learning course. The professor has a model for linear regression, where $h_\theta$ is the hypothesis (the proposed model, linear regression in this case), $J(\theta_1)$ is the cost function, $m$ is the number of elements in the training set, and $x^{(i)}$ and $y^{(i)}$ are the variables of the training set element at $i$:

$$h_\theta(x) = \theta_1 x$$

$$J(\theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
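For concreteness, here is a minimal sketch of that cost function in plain Python. The helper name `cost` and the toy data are made up for illustration; the hypothesis is assumed to be exactly $h_\theta(x) = \theta_1 x$ with no intercept term.

```python
from typing import Sequence


def cost(theta1: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """J(theta1) = 1/(2m) * sum_i (h(x_i) - y_i)^2 with h(x) = theta1 * x."""
    m = len(xs)
    if m == 0 or m != len(ys):
        raise ValueError("xs and ys must be non-empty and the same length")
    # Squared error of the hypothesis on each training example.
    squared_errors = ((theta1 * x - y) ** 2 for x, y in zip(xs, ys))
    # Divide the summed squared error by 2m, as in the formula.
    return sum(squared_errors) / (2 * m)


# Hypothetical toy data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
perfect_fit = cost(2.0, xs, ys)  # every error is 0
poor_fit = cost(1.0, xs, ys)     # errors -1, -2, -3 -> (1 + 4 + 9) / 6
```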

What I don’t understand is why he is dividing the sum by $2m$.

**Answer**

The $\frac{1}{m}$ is there to “average” the squared error over the number of components, so that the number of components doesn’t affect the function (see John’s answer).
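A quick way to see the effect of the $\frac{1}{m}$: duplicate a (hypothetical) training set. The raw sum of squared errors doubles, while the averaged cost stays the same. The helper names below are illustrative only.

```python
def sum_squared_error(theta1, xs, ys):
    # Raw sum of squared errors for h(x) = theta1 * x; grows with the data size.
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys))


def averaged_cost(theta1, xs, ys):
    # Same sum divided by 2m, so it does not grow with the data size.
    return sum_squared_error(theta1, xs, ys) / (2 * len(xs))


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.5, 5.5]
xs_doubled, ys_doubled = xs * 2, ys * 2  # every example appears twice

raw_small = sum_squared_error(1.0, xs, ys)                 # 13.5
raw_large = sum_squared_error(1.0, xs_doubled, ys_doubled)  # 27.0
avg_small = averaged_cost(1.0, xs, ys)                      # 2.25
avg_large = averaged_cost(1.0, xs_doubled, ys_doubled)      # 2.25
```

Without the averaging, the cost (and its gradient) would scale with the size of the training set, which would also change how large a learning rate is reasonable.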

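So now the question is why there is an extra $\frac{1}{2}$. In short, it doesn’t matter. The solution that minimizes $J$ as you have written it will also minimize $2J = \frac{1}{m}\sum_i \left(h(x_i) - y_i\right)^2$, because multiplying a function by a positive constant does not change where its minimum is. The latter function, $2J$, may seem more “natural,” but the factor of 2 does not matter when optimizing.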
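As a sanity check, here is a small sketch that searches a grid of $\theta_1$ values for the minimizer of $J$ and of $2J$ on made-up data, and compares both with the closed-form least-squares slope $\theta_1^* = \sum_i x_i y_i / \sum_i x_i^2$ (which holds for this no-intercept model). The grid range and step are arbitrary choices.

```python
def J(theta1, xs, ys):
    # Cost with the 1/(2m) factor, for h(x) = theta1 * x.
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def two_J(theta1, xs, ys):
    # Same cost without the 1/2, i.e. the plain mean squared error.
    return 2 * J(theta1, xs, ys)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.5, 5.5]

grid = [i / 1000 for i in range(0, 4001)]  # theta1 from 0.0 to 4.0 in steps of 0.001
best_for_J = min(grid, key=lambda t: J(t, xs, ys))
best_for_two_J = min(grid, key=lambda t: two_J(t, xs, ys))

# Setting dJ/dtheta1 = 0 gives this slope; the constant factor drops out.
closed_form = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
```

Both searches land on the same grid point, within one grid step of `closed_form` (27.5 / 14 here).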

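The only reason some authors like to include it is that when you take the derivative with respect to the parameter $\theta$ (here $\theta_1$), the 2 brought down by the power rule cancels the $\frac{1}{2}$, leaving $\frac{\partial J}{\partial \theta_1} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)}$.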
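To illustrate the cancellation, this sketch writes out the gradient of $J$ with respect to $\theta_1$ (no leftover factor of 2), checks it against a central finite difference, and runs a few hundred plain gradient-descent steps. The learning rate, step count, and data are arbitrary assumptions chosen so that this toy example converges.

```python
def J(theta1, xs, ys):
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def grad_J(theta1, xs, ys):
    # d/dtheta1 of 1/(2m) * sum (theta1*x - y)^2:
    # the power rule gives 2 * (theta1*x - y) * x, and that 2 cancels the 1/2.
    m = len(xs)
    return sum((theta1 * x - y) * x for x, y in zip(xs, ys)) / m


def gradient_descent(xs, ys, alpha=0.05, steps=500, theta1=0.0):
    # Plain batch gradient descent on J; alpha is an assumed learning rate.
    for _ in range(steps):
        theta1 -= alpha * grad_J(theta1, xs, ys)
    return theta1


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.5, 5.5]

h = 1e-6
analytic = grad_J(1.0, xs, ys)
numeric = (J(1.0 + h, xs, ys) - J(1.0 - h, xs, ys)) / (2 * h)
fitted = gradient_descent(xs, ys)
```

Keeping the $\frac{1}{2}$ just makes the gradient expression a little tidier; dropping it would double the gradient, which gradient descent can absorb into the learning rate.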

**Attribution**: *Source: Link, Question Author: Daniel says Reinstate Monica, Answer Author: angryavian*