http://www.codemaestro.com/reviews/review00000105.html
Any 3D engine draws it's power and speed from the mathematical models and implementations within, and trust John Carmack of ID software for using really good hacks. As it turns out, a very interesting hack is used in Quake III to calculate an inverse square root.
Preface
ID software has recently released the source code of Quake III engine with a GPL license. In this article we'll see Carmack work his black magic to calculate the square root of a floating point number blazingly fast.
Carmack's Unusual Inverse Square Root
A fast glance at the file game/code/q_math.c reveals many interesting performance hacks. The first thing that pops out is the use of the number 0x5f3759df in the function Q_rsqrt, which calculates the inverse square root of a float and why does this function actually work?
Observe the original function from q_math.c:
{
long i;
float x2, y;
const float threehalfs = 1.5F;
x2 = number * 0.5F;
y = number;
i = * ( long * ) &y; // evil floating point bit level hacking
i = 0x5f3759df - ( i >> 1 ); // what the fuck?
y = * ( float * ) &i;
y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
// y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
#ifndef Q3_VM
#ifdef __linux__
assert( !isnan(y) ); // bk010122 - FPE?
#endif
#endif
return y;
}
Not only does it work, on some CPU's Carmack's Q_rsqrt runs up to 4 times faster than (float)(1.0/sqrt(x), eventhough sqrt() is usually implemented using the FSQRT assembley instruction!
In another file, code/common/cm_trace.c , a neater implementation of the same hack can be found. This time it is used for calculating the square root of a float - sqrt(x). Note that the only real difference is in the return value ? instead of returning y, return number*y as the square root:
================
SquareRootFloat
================
*/
float SquareRootFloat(float number) {
long i;
float x, y;
const float f = 1.5F;
x = number * 0.5F;
y = number;
i = * ( long * ) &y;
i = 0x5f3759df - ( i >> 1 );
y = * ( float * ) &i;
y = y * ( f - ( x * y * y ) );
y = y * ( f - ( x * y * y ) );
return number * y;
}
Newton's Approximation of Roots
The code above implements the well known Newton Approximation of roots [3]. As in most other iterative approximation calculations, The Newton approximation is supposed to be ran in iterations; each iteration enhances the accuracy until enough iterations have been made for reaching the desired accuracy.
The general idea behind Newton's approximation is that whenever we have a guess y for the value of the square root of a number x, we can perform a simple manipulation to get a better guess (one closer to the actual square root) by averaging y with x/y. For example we can compute the square root of 2 as follows; Suppose the initial guess is 1:
2/1.5 = 1.3333; ( 1.5 + 1.3333 ) / 2 = 1.4167
2/1.4167 = 1.4117; ( 1.4167 + 1.4117 ) / 2 = 1.4142
And so on
As mentioned before, Newton's approximation is a well known method of fast root calculations. However, Carmack picked an unusual first guess for the first iteration that enhances the approximation drastically and lowers the number of required iterations in Quake III's calculations to only one iteration!
A Witchcraft Number
The really interesting aspect of this function is the magic constant 0x5f3759df, used to calculate the initial guess, in :
Hence, dividing the input by 2 and subtracting it from the magic constant. This constant works almost perfectly - Only one Newton approximation iteration is required for a low relative error of 10^-3. . As the code illustrates in the commented out second iteration, this approximation is good enough for the Quake III engine.
It turns out that the magic constant 0x5f3759df is a real mystery. In the article "Fast Inverse Square Root" [2] , mathematician Chris Lomont of Purdue University researches the meaning of this magic constant. Using several elaborate techniques, Lomont tries to mathematically calculate this constant himself. The results are very surprising ? Lomont's well calculated mathematically optimal constant turns out to be slighltly different (0x5f37642f) , and in spite of being theoretically better, it yields worse results then the original constant used in the source code!! Indeed, John Carmack must have used genuine black magic to find this number.
Only by numerically searching for another better constant, Lomont found one, which is slightly better that the original. However, in practice the two constants are likely to yield similar results. Lomont proposes this function with the better constant:
{
float xhalf = 0.5f*x;
int i = *(int*)&x; // get bits for floating value
i = 0x5f375a86- (i>>1); // gives initial guess y0
x = *(float*)&i; // convert bits back to float
x = x*(1.5f-xhalf*x*x); // Newton step, repeating increases accuracy
return x;
}