msun: fix cbrt iterations from Newton to Halley method
Since we're inverting a cube, we have:
f(Tₙ)=Tₙ³-x (1)
Its first and second derivatives are:
f'(Tₙ)=3Tₙ² (2)
f"(Tₙ)=6Tₙ (3)
Halley iteration[1] uses:
Tₙ₊₁=Tₙ-2f(Tₙ)f'(Tₙ)/(2f'(Tₙ)²-f(Tₙ)f"(Tₙ)) (4)
Replacing the terms of (4) using (1), (2) and (3):
Tₙ₊₁ = Tₙ-2f(Tₙ)f'(Tₙ)/(2f'(Tₙ)²-f(Tₙ)f"(Tₙ))
= Tₙ-2(Tₙ³-x)3Tₙ²/(2(3Tₙ²)²-(Tₙ³-x)6Tₙ)
= <snip, see WolframAlpha[2] alternate forms>
= Tₙ(2x+Tₙ³)/(x+2Tₙ³)
This formula corresponds to the exact expression used in the code.
Newton formula is Tₙ-f(Tₙ)/f'(Tₙ) which would have simplified to
(2Tₙ³+x)/(3Tₙ²) instead.
[1] https://en.wikipedia.org/wiki/Halley's_method
[2] https://www.wolframalpha.com/input?i=T-2%28T%5E3-x%293T%5E2%2F%282%283T%5E2%29%5E2-%28T%5E3-x%296T%29
Note: UTF8 in commit message due to the heavy math being hard to
recreate w/o it. -- imp
Signed-off-by: Clément Bœsch <u@pkh.me>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1684
This commit is contained in:
committed by
Warner Losh
parent
5b9660caff
commit
f887d0215f
@@ -90,7 +90,7 @@ cbrt(double x)
|
||||
* the result is larger in magnitude than cbrt(x) but not much more than
|
||||
* 2 23-bit ulps larger). With rounding towards zero, the error bound
|
||||
* would be ~5/6 instead of ~4/6. With a maximum error of 2 23-bit ulps
|
||||
* in the rounded t, the infinite-precision error in the Newton
|
||||
* in the rounded t, the infinite-precision error in the Halley
|
||||
* approximation barely affects third digit in the final error
|
||||
* 0.667; the error in the rounded t can be up to about 3 23-bit ulps
|
||||
* before the final error is larger than 0.667 ulps.
|
||||
@@ -99,7 +99,7 @@ cbrt(double x)
|
||||
u.bits=(u.bits+0x80000000)&0xffffffffc0000000ULL;
|
||||
t=u.value;
|
||||
|
||||
/* one step Newton iteration to 53 bits with error < 0.667 ulps */
|
||||
/* one step Halley iteration to 53 bits with error < 0.667 ulps */
|
||||
s=t*t; /* t*t is exact */
|
||||
r=x/s; /* error <= 0.5 ulps; |r| < |t| */
|
||||
w=t+t; /* t+t is exact */
|
||||
|
||||
@@ -50,7 +50,7 @@ cbrtf(float x)
|
||||
SET_FLOAT_WORD(t,sign|(hx/3+B1));
|
||||
|
||||
/*
|
||||
* First step Newton iteration (solving t*t-x/t == 0) to 16 bits. In
|
||||
* First step Halley iteration (solving t*t-x/t == 0) to 16 bits. In
|
||||
* double precision so that its terms can be arranged for efficiency
|
||||
* without causing overflow or underflow.
|
||||
*/
|
||||
@@ -59,7 +59,7 @@ cbrtf(float x)
|
||||
T=T*((double)x+x+r)/(x+r+r);
|
||||
|
||||
/*
|
||||
* Second step Newton iteration to 47 bits. In double precision for
|
||||
* Second step Halley iteration to 47 bits. In double precision for
|
||||
* efficiency and accuracy.
|
||||
*/
|
||||
r=T*T*T;
|
||||
|
||||
@@ -126,7 +126,7 @@ cbrtl(long double x)
|
||||
#endif
|
||||
|
||||
/*
|
||||
* Final step Newton iteration to 64 or 113 bits with
|
||||
* Final step Halley iteration to 64 or 113 bits with
|
||||
* error < 0.667 ulps
|
||||
*/
|
||||
s=t*t; /* t*t is exact */
|
||||
|
||||
Reference in New Issue
Block a user