Style Guidelines
Do not use extended unless absolutely necessary
While the FPU performs calculations internally in extended (80 bit)
precision, it does not load and store in this format very efficiently.
Consequently, using the extended type can double the
overall execution time of the simpler arithmetic operations (+, -, *).
This is not due to additional time for actually performing the operation,
but rather due to the extra time needed to load and store these values.
Additionally, the size of extended variables is
awkward (10 bytes, or 12 bytes with doubleword alignment), which
increases the likelihood that a variable straddles a cache line
and causes a performance loss. Finally, in compiler versions 2 through
4, local extended variables are aligned in the last
10 bytes of the 12 bytes (3 dwords) allocated for them instead of
the first 10, so as local variables they are always misaligned.
This was fixed in version 5 and does not
apply to compiler-generated temporary variables in any version.
Avoid mixing floating point types
The basic problem is that you force an unnecessary type "conversion"
step in two cases: 1) assigning one variable to another, and 2) passing
a variable as a parameter. In these two instances, the variable has
to be loaded onto the FP stack and stored as the new type rather
than simply copied. This can take 3 or 4 times as long.
The floating point unit's register stack is only eight entries deep.
Consequently, to prevent the stack from overflowing, function calls
from within an expression require that the register stack be unloaded
prior to making the call. The only exception is the first function
call in an expression, which is free from this unloading because it can be
called just after its arguments are determined but before the rest
of the expression is evaluated. Delphi unloads the stack by saving any
stored values to temporary (and invisible) extended variables.
As already noted, extended is costly, so you should
make your own temporary variables and break up expressions so that
only one function call is made per variable assignment. This rule
also covers compiler "magic" functions found in the SYSTEM
unit, like Abs and Sqr. It does not include
"nested" calls, that is, function calls contained in the
parameter expression of another function call. Since floating point parameters
are always passed on the stack, each parameter expression represents
a separate expression.
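As a minimal sketch of the rule above, the following splits an expression containing two function calls into one call per assignment. Gain, Attenuation and Response are hypothetical names invented for this illustration; they do not come from any library.

```pascal
// Hypothetical helper functions, used only for illustration.
function Gain(f: Double): Double;
begin
  Result := 1.0 + 0.5 * f;
end;

function Attenuation(d: Double): Double;
begin
  Result := 1.0 / (1.0 + d * d);
end;

// Single-expression form, which needs a hidden temporary:
//   Result := Gain(f) * Attenuation(d) + Offset;
function Response(f, d, Offset: Double): Double;
var
  g, a: Double;  // explicit Double temporaries chosen by the programmer
begin
  g := Gain(f);         // one function call per assignment
  a := Attenuation(d);  // one function call per assignment
  Result := g * a + Offset;
end;
```

The temporaries are declared as Double on the assumption that the surrounding code works in Double throughout, which also avoids mixing floating point types.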
Floating point constants
Floating point constants must be saved in the executable as a specific
type (i.e. single, double or extended). Basically, whole number constants
are saved as single and fractional numbers are saved as extended.
As already mentioned, using extended incurs a high cost,
so you should force the constants to be of a given size (single
or double) by making them typed constants. Note
that this does not increase the overall executable size since the
value had to be included in the binary anyway. For example:
const
  e: Double = 2.71828; // Euler's number, stored as a Double
begin
  ...
  SomeVariable := e*sqr(r);
  ...
will be both faster (use of double) and smaller (double
only requiring 8 bytes) than the equivalent routine using the extended
type. Note, though, that a typed "constant" can be written
to when the $J+ directive is in effect.
Also, the compiler will combine constants at compile time if possible.
If the operation between two constants has a higher precedence than
any operation involving those constants and any variable or variable
expressions then the constants will be "folded" together.
Additionally, in Delphi 2 and 3 division by a constant would always
be converted into multiplication by its reciprocal. Unfortunately,
this was eliminated in version 4. So as an example, in D2 and D3 the
statement:
fp:=fp*3*4/5+3*4/2;
will actually be calculated as:
fp:=fp*3*4*0.2+6
In D4 the same statement will actually be calculated as:
fp:=fp*3*4/5+6
You can get better constant folding by placing the constants in front
of any variables:
fp:=3*4/5*fp+3*4/2;
will actually be calculated as:
fp:=2.4*fp+6
Set the control word precision to the appropriate level
Floating point division and square root instructions can take a substantial
amount of time. However, you can save some of that time if you do
not need maximum accuracy. You can modify the level of accuracy
by changing the FPU's control word. The default accuracy, as initialized
by the Delphi runtime library, is the slowest, but most precise one
(i.e. extended). Delphi supports direct modification
of the FPU's control word with the Set8087CW procedure
and the global variable Default8087CW. Use the following
lines to set the control word to different precision levels:
Single: Set8087CW(Default8087CW and $FCFF);
Double: Set8087CW((Default8087CW and $FCFF) or $0200);
Extended: Set8087CW(Default8087CW or $0300);
Note that changing this control word only changes the execution time
of division and, in the case of Pentium II and Pentium III processors,
square roots.
As of version 6 this has become easier, as you can simply call
SetPrecisionMode() with the proper precision level constant
(pmSingle, pmDouble, or pmExtended).
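A lowered precision setting stays in effect until it is changed again, and it also rounds the results of other FPU arithmetic, so it is usually safer to restore the previous control word afterwards. The following is a minimal sketch of that pattern using only Set8087CW and Default8087CW as described above. SumOfReciprocals is a hypothetical routine, and the sketch assumes that Default8087CW still reflects the control word currently in use.

```pascal
// Hypothetical example: a division-heavy loop run at single precision.
function SumOfReciprocals(const Data: array of Single): Single;
var
  SavedCW: Word;
  i: Integer;
begin
  SavedCW := Default8087CW;      // assumed to match the current control word
  Set8087CW(SavedCW and $FCFF);  // precision control bits cleared: single
  try
    Result := 0;
    for i := Low(Data) to High(Data) do
      Result := Result + 1.0 / Data[i];
  finally
    Set8087CW(SavedCW);          // restore the previous precision level
  end;
end;
```

The try..finally block is there so that the previous setting is restored even if one of the divisions raises an exception.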
Prefer Round over Trunc
Trunc reads and sets the FPU control word, which is
very costly. The Round function, on the other hand, does
not do this and therefore is about 2.5 times faster on a Pentium II.
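Keep in mind that Round and Trunc are not interchangeable: Trunc discards the fraction, while Round goes to the nearest integer. The small sketch below makes the difference checkable; RoundMatchesTrunc is a hypothetical helper written for this illustration.

```pascal
// Returns True when Round and Trunc give the same integer for x.
function RoundMatchesTrunc(x: Double): Boolean;
begin
  Result := Round(x) = Trunc(x);
end;
```

For 2.25 both functions give 2, but for 2.75 Trunc gives 2 while Round gives 3, and for -2.75 Trunc gives -2 while Round gives -3. The speed advantage therefore only applies where rounding to the nearest integer is acceptable.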
Favor procedures with var parameters over functions
This is an overhead management issue and hence comes into play more
with small functions where overhead is a greater percentage of the
total processing time. For example changing:
function Calc(a: Double): Double;
begin
  Result := a*1.1;
end;
to:
procedure Calc(var Result: Double; a: Double);
begin
  Result := a*1.1;
end;
cuts execution time in half (on a Pentium II). This is especially
true if you are mainly passing a value around (including simple assignment)
rather than actually using it. For instance:
function SetValue(NewValue: Double): Double;
begin
  Result := Value;
  Value := NewValue;
end;
results in a function composed almost entirely of overhead.
The downside of this technique is that you need to use var
instead of const for parameters that are not supposed
to change, because const does not really do anything
on floating point parameters except to force a compile-time check
that the parameter indeed is not changed.
Trapping Floating Point Exceptions
FP exceptions (such as divide by zero) aren't actually triggered
when the error occurs. Instead they are delayed until the next floating
point instruction. Presumably this implementation was used to allow
for testing and handling of the error locally. However, it can have
the rather odd effect of making the wrong code look guilty if and
when the exception is finally triggered. The solution to this is to
insert a wait or FWait instruction to
force the exception. This is just what the compiler does after each and
every floating point statement. Of course, executing all those waits
can be costly, so in a hand written floating point assembly routine
you may want to simply stick one in right at the end of the routine
once you have it debugged. This keeps the cost low, but still ensures
that any exception generated still points to at least the proper routine.
Of course, every rule needs an exception. One example where this
is not the case is Windows 95 (!) and this code (From Stefan Hoffmeister's
FPU Demo):
x := -1;
asm
  fld x
  // Generate an IEEE invalid operation:
  // sqrt(-1)
  fsqrt
  fwait
end;
Under NT, this (correctly) raises an FP exception. Not so on Win95.
Jam in an FXAM before the FWAIT in Win95 and get the exception.
Thank you, Microsoft.
Optimization Techniques
You need to do your own floating point optimization
Delphi does no floating point optimization. You are going to get
exactly what you ask for. Thus, do not assume that things like common subexpressions
are going to be combined. You need to do all this yourself.
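As a minimal sketch of combining a common subexpression by hand, the following computes a shared term once and reuses it. RadialPoly and its parameters are hypothetical names chosen for this illustration.

```pascal
// Unoptimized form, which evaluates x*x + y*y three times:
//   Result := A*(x*x + y*y) + B*(x*x + y*y)*(x*x + y*y);
function RadialPoly(x, y, A, B: Double): Double;
var
  r2: Double;
begin
  r2 := x*x + y*y;           // shared term, computed once
  Result := A*r2 + B*r2*r2;  // reused instead of being recomputed
end;
```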
Make great effort to reduce the number of divisions
Division is very expensive, taking about 20-40 times as long as multiplication,
addition, or subtraction. Move divisions outside of loops whenever
possible. Do not forget to convert a division by a constant into the
corresponding multiplication with its reciprocal.
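A minimal sketch of moving a division out of a loop: the reciprocal is computed once and each iteration multiplies by it. SumQuantized is a hypothetical routine invented for this illustration. Note that multiplying by a reciprocal can differ from a true division in the last bit, so this assumes that such a difference is acceptable.

```pascal
// Hypothetical example: convert samples to whole step counts and sum them.
function SumQuantized(const Data: array of Double; Step: Double): Int64;
var
  InvStep: Double;
  i: Integer;
begin
  InvStep := 1.0 / Step;  // the only division, done once outside the loop
  Result := 0;
  for i := Low(Data) to High(Data) do
    Result := Result + Round(Data[i] * InvStep);  // multiply, not divide
end;
```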
How to avoid floating point checks for zero
Under certain circumstances it can be beneficial to avoid a direct comparison
when checking a floating point variable for zero and instead use
typecasting to test the underlying representation of the variable.
This is because a floating point comparison requires a true floating
point based zero check, whereas the typecast takes advantage of the way zero is stored.
Considering the substantially reduced readability of this technique, it
should be used sparingly.
To check a single variable for zero use:
DWord(Pointer(SomeSingleVar)) shl 1 = 0
Checking a double variable is more complicated:
type
  PDoubleData = ^TDoubleData;
  TDoubleData = record lo, hi: DWord end;
// two possible ways
var
  DoubleData: PDoubleData;
...
DoubleData := @SomeDoubleVar;
// "or" rather than "+" so the sum cannot wrap around to zero
if (DoubleData.hi shl 1) or DoubleData.lo = 0 then
...
// or
var
  DoubleData: TDoubleData absolute SomeDoubleVar;
...
if (DoubleData.hi shl 1) or DoubleData.lo = 0 then
...
The above techniques can shave about 30-40% off the comparison time
on a Pentium II.
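To contain the readability cost, the bit tests can be wrapped in small helpers. The following is a sketch only: the type and function names are hypothetical, it assumes a little-endian layout as on x86, and it masks off the sign bit with "and" instead of shifting it out, which tests the same bits without depending on the width the compiler uses for the shift.

```pascal
type
  PRawBits32 = ^Cardinal;  // hypothetical alias for this sketch
  TRawBits64 = record Lo, Hi: Cardinal; end;
  PRawBits64 = ^TRawBits64;

function IsZeroSingle(Value: Single): Boolean;
begin
  // Ignore the sign bit so that both +0.0 and -0.0 count as zero.
  Result := (PRawBits32(@Value)^ and $7FFFFFFF) = 0;
end;

function IsZeroDouble(Value: Double): Boolean;
var
  Bits: PRawBits64;
begin
  Bits := PRawBits64(@Value);
  // Zero means every bit except the sign bit (top bit of Hi) is clear.
  Result := ((Bits^.Hi and $7FFFFFFF) or Bits^.Lo) = 0;
end;
```

The call overhead of such helpers may well outweigh the saving, so in a hot loop the test expression would probably be written in place, as in the fragments above.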
