I verified experimentally that in Java the equality
Math.sqrt(x*x) = x
holds for all
long x
such thatx*x
doesn’t overflow. Here, Javalong
is a 64 bit signed type anddouble
is a IEEE binary floating point type with at least 53 bits mantissa and sufficiently long exponent.Mathematically, there are two imprecise functions involved:
- Conversion from
long
todouble
which loses precision due to the mantissa being only 53 bits where 63 bits would be needed. This operation is guaranteed to return the closest representable result.- Computing square root, which is also guaranteed to return the closest representable result.
Mathematically, this can be expressed like
∀x∈Nx≤3037000499round(√round(x2))=x
where round is the rounding function from R into the set of all numbers representable as
double
.I’m looking for a proof since no experiment can assure it works across all machines.
Answer
The idea is simple: Find upper and lower bounds for
X:=√round(x2)
and show that round(X)=x.
Let ulp(x) denote the unit of least precision at x
and let E(x) and M(x) denote the exponent and mantissa of x, i.e.,
x=M(x)⋅2E(x)
with 1≤M(x)<2 and E(x)∈Z. Define
Δ(x)=ulp(x)x=μ⋅2E(x)x=μM(x)
where μ=2−52 is the machine epsilon.
Expressing the rounding function by its relative error leads to
X=√(1+ϵ)⋅x2=√(1+ϵ)⋅x<(1+ϵ2)⋅x
We know that |ϵ|≤12Δ(x2) and get (ignoring the trivial case x=0)
Xx<1+Δ(x2)4=1+μ4M(x2)
By observing M(x) and M(x2) e.g. over the interval [1,4],
it can be easily be shown that M(x)M(x2)≤√2 which gives us
Xx<1+μ√24M(x)
and therefore
X<x+√24μM(x)⋅x<x+12ulp(x)
Analogously we get the corresponding lower bound. Just instead of
√(1+ϵ)<(1+ϵ2)
we use something like
√(1−ϵ)>(1−(1+ϵ)⋅ϵ2)
which suffices, since we used a very generous estimate (√2/4<12) in the last step.
Because of |X−x| being smaller than 12ulp(x), x is the double
closest to X, therefore round(X) must equal to x, q.e.d.
Attribution
Source : Link , Question Author : maaartinus , Answer Author : maaartinus