High Performance Delphi


Floating Point Optimization Guidelines


Contents

Style Guide

Extended type

Mixing Types

Function Calls

Constants

FP Control Word

Round vs Trunc

function vs procedure

Trapping Exceptions

Optimize Guide

Compiler FP optimizations

division

Checking for Zero



Style Guidelines

Do not use extended unless absolutely necessary

While the FPU performs calculations internally in extended (80-bit) precision, it does not load and store values in this format very efficiently. Consequently, using the extended type can double the overall execution time of the simpler arithmetic operations (+, -, *). This is not due to additional time for actually performing the operation, but rather to the extra time needed to load and store the values. Additionally, the size of extended variables is awkward (10 bytes, or 12 bytes with doubleword alignment), leading to an increased likelihood that a variable straddles a cache line, which causes a performance loss. Finally, in compiler versions 2 through 4, local extended variables are aligned in the last 10 bytes of the 12 bytes (3 dwords) allocated for them, instead of the first 10. This means that they are always misaligned as local variables. This has been fixed in version 5, and it does not apply to compiler-generated temporary variables in any version.

Avoid mixing floating point types

The basic problem is that you force an unnecessary type "conversion" step in two cases: 1) assigning one variable to another, and 2) passing a variable as a parameter. In these two instances the variable has to be loaded onto the FP stack and stored as the new type, rather than simply copied. This can take 3 or 4 times as long.
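A minimal sketch of the two cases described above. The function Scale and all variable names are hypothetical and exist only for illustration; the comments restate the behavior described in the text rather than measured results.

```pascal
program MixedTypes;
{$APPTYPE CONSOLE}

// Hypothetical helper, for illustration only.
function Scale(x: Double): Double;
begin
  Result := x * 1.5;
end;

var
  s: Single;
  d, r: Double;
begin
  s := 2.0;
  d := 2.0;
  r := Scale(d);  // types match: no conversion step is needed
  r := Scale(s);  // Single passed as Double: converted via the FP stack
  d := s;         // the same conversion happens on a plain assignment
  WriteLn(r:0:2, ' ', d:0:2);
end.
```

Declaring s as Double as well would remove the conversion in both the call and the assignment.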

Strive to have one function call in each assignment expression

The floating point unit's register stack is only eight entries deep. Consequently, to prevent the stack from overflowing, function calls from within an expression require that the register stack be unloaded prior to making the call. The only exception is that the first function call in an expression is free from this unloading, because it can be called just after its arguments are determined but before the rest of the expression is evaluated. Delphi unloads the stack by saving any stored values to temporary (and invisible) extended variables. As was already noted, extended is bad, so you should make your own temporary variables and break up expressions so that only one function call is made per variable assignment. This rule also covers compiler "magic" functions found in the SYSTEM unit, like Abs and Sqr. It does not include "nested" calls, that is, function calls contained in the parameter expression of another function call. Since floating point parameters are always passed on the stack, each parameter expression represents a separate expression.
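The rule above can be sketched as follows. F and G are hypothetical functions invented for this example. The second form moves the extra call into its own assignment, so the temporary is an explicit Double rather than a hidden extended.

```pascal
program OneCallPerAssignment;
{$APPTYPE CONSOLE}

// Hypothetical functions, for illustration only.
function F(x: Double): Double;
begin
  Result := x * x + 1.0;
end;

function G(x: Double): Double;
begin
  Result := 2.0 * x - 3.0;
end;

var
  a, b, r1, r2, t: Double;
begin
  a := 1.5;
  b := 4.0;

  // Two calls in one expression: the partial result F(a) * b has to be
  // saved before G is called.
  r1 := F(a) * b + G(b);

  // One call per assignment: the temporary is an explicit Double.
  t := G(b);
  r2 := F(a) * b + t;

  WriteLn(r1:0:2, ' ', r2:0:2);
end.
```

Because t is rounded to Double instead of being held as extended, the two forms can differ in the last bits of the result for some inputs.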

Floating point constants

Floating point constants must be saved in the executable as a specific type (i.e. single, double or extended). Basically, whole number constants are saved as single and fractional numbers are saved as extended. As already mentioned, using extended incurs a high cost, so you should force the constants to be of a given size (single or double) by making them typed constants. Note that this does not increase the overall executable size since the value had to be included in the binary anyway. For example:

const
  e: Double = 2.71828; // Euler's number
begin
  ...
  SomeVariable := e*sqr(r);
  ...

will be both faster (use of double) and smaller (double only requiring 8 bytes) than the equivalent routine using the extended type. Note, though, that a typed "constant" can be written to with the $J+ directive.

Also, the compiler will combine constants at compile time if possible. If the operation between two constants has a higher precedence than any operation involving those constants and any variable or variable expressions then the constants will be "folded" together. Additionally, in Delphi 2 and 3 division by a constant would always be converted into multiplication by its reciprocal. Unfortunately, this was eliminated in version 4. So as an example, in D2 and D3 the statement:

fp:=fp*3*4/5+3*4/2;

will actually be calculated as:

fp:=fp*3*4*0.2+6

In D4 the same statement will actually be calculated as:

fp:=fp*3*4/5+6

You can get better constant folding by placing the constants in front of any variables:

fp:=3*4/5*fp+3*4/2;

will actually be calculated as:

fp:=2.4*fp+6

Set the control word precision to the appropriate level

Floating point division and square root instructions can take a substantial amount of time. However, you can save some of that time if you do not need maximum accuracy. You can modify the level of accuracy by changing the FPU's control word. The default accuracy, as initialized by the Delphi runtime library, is the slowest, but most precise one (i.e. extended). Delphi supports direct modification of the FPU's control word with the Set8087CW procedure and the global variable Default8087CW. Use the following lines to set the control word to different precision levels:

Single:   Set8087CW(Default8087CW and $FCFF); 
Double:   Set8087CW((Default8087CW and $FCFF) or $0200); 
Extended: Set8087CW(Default8087CW or $0300);

Note that changing this control word only changes the execution time of division and, in the case of Pentium II and Pentium III processors, square roots.

As of version 6 this has become easier: you can simply call SetPrecisionMode (declared in the Math unit) with the appropriate precision level constant (pmSingle, pmDouble, or pmExtended).
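Since the precision setting is global, it is usually safer to lower it only around the code that benefits and to restore it afterwards. The following is a minimal sketch of that pattern using Set8087CW and Default8087CW as described above. The routine DivideAll is hypothetical, and the sketch assumes a 32-bit x86 target where the control word currently in effect equals Default8087CW.

```pascal
program ScopedPrecision;
{$APPTYPE CONSOLE}

// Hypothetical routine: divides every element by Divisor with the FPU
// temporarily switched to single precision.
procedure DivideAll(var Data: array of Single; Divisor: Single);
var
  SavedCW: Word;
  i: Integer;
begin
  SavedCW := Default8087CW;         // remember the setting before changing it
  Set8087CW(SavedCW and $FCFF);     // precision control bits = single
  try
    for i := Low(Data) to High(Data) do
      Data[i] := Data[i] / Divisor;
  finally
    Set8087CW(SavedCW);             // put the saved control word back
  end;
end;

var
  Values: array[0..2] of Single;
begin
  Values[0] := 1.0;
  Values[1] := 2.0;
  Values[2] := 3.0;
  DivideAll(Values, 4.0);
  WriteLn(Values[0]:0:2, ' ', Values[1]:0:2, ' ', Values[2]:0:2);
end.
```

The old value is saved in a local variable rather than re-read from Default8087CW, so the restore does not depend on whether Set8087CW updates that global.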

Prefer Round over Trunc

Trunc reads and sets the FPU control word, which is very costly. The Round function, on the other hand, does not do this and therefore is about 2.5 times faster on a Pentium II.
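Keep in mind that the two functions do not return the same values, so Round is only a drop-in replacement where rounding to the nearest integer is acceptable. A short sketch of the difference, assuming the FPU is in its default round-to-nearest-even mode:

```pascal
program RoundVsTrunc;
{$APPTYPE CONSOLE}

var
  x: Double;
begin
  x := 2.7;
  WriteLn(Trunc(x), ' ', Round(x));   // 2 3

  x := -2.7;
  WriteLn(Trunc(x), ' ', Round(x));   // -2 -3

  x := 2.5;
  WriteLn(Trunc(x), ' ', Round(x));   // 2 2 (a tie rounds to the even neighbor)
end.
```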

Favor procedures with var parameters over functions

This is an overhead management issue and hence comes into play more with small functions where overhead is a greater percentage of the total processing time. For example changing:

function Calc(a: Double): Double;
begin
  result := a*1.1;
end;

to:

procedure Calc(var Result: Double; a: Double);
begin
  Result := a*1.1;
end;

cuts execution time in half (on a Pentium II). This is especially true if you are mainly passing a value around (including simple assignment) rather than actually using it. For instance:

function SetValue(NewValue: Double): Double;
begin
  Result := Value;
  Value := NewValue;
end;

results in a function composed almost entirely of overhead.

The downside of this technique is that you need to use var instead of const for parameters that are not supposed to change, because const does not really do anything on floating point parameters except to force a compile-time check that the parameter indeed is not changed.

Trapping Floating Point Exceptions

FP exceptions (such as divide by zero) aren't actually triggered when the error occurs. Instead they are delayed until the next floating point instruction. Presumably this implementation was used to allow for testing and handling of the error locally. However, it can have the rather odd effect of making the wrong code look guilty if and when the exception is finally triggered. The solution is to insert a wait or FWait instruction to force the exception. This is just what the compiler does after each and every floating point statement. Of course, executing all those waits can be costly, so in a hand-written floating point assembly routine you may want to simply place one right at the end of the routine once you have it debugged. This keeps the cost low, but still ensures that any exception generated points to at least the proper routine.

Of course, every rule needs an exception. One example where this is not the case is Windows 95 (!) and this code (From Stefan Hoffmeister's FPU Demo):

  x := -1;
  asm
      fld x
      // Generate an IEEE invalid operation:
      //   sqrt(-1)
      fsqrt


      fwait
  end;

Under NT, this (correctly) raises an FP exception. Not so on Win95. Jam in an FXAM before the FWAIT in Win95 - and get the exception. Thank you, Microsoft.

Optimization Techniques

You need to do your own floating point optimization

Delphi does no floating point optimization. You are going to get exactly what you ask for. Thus, do not assume things like common expressions are going to be combined. You need to do all this yourself.
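As a small sketch of doing this by hand, the routine below evaluates a shared sub-expression once and reuses it. Compute and its parameters are hypothetical names chosen for the example.

```pascal
program ManualCSE;
{$APPTYPE CONSOLE}

// Hypothetical example: the sub-expression (a + b) is needed twice.
procedure Compute(a, b, c, d: Double; var y1, y2: Double);
var
  Sum: Double;
begin
  // Instead of:
  //   y1 := (a + b) * c;
  //   y2 := (a + b) * d;
  Sum := a + b;      // evaluated once
  y1 := Sum * c;
  y2 := Sum * d;
end;

var
  p, q: Double;
begin
  Compute(1.0, 2.0, 3.0, 4.0, p, q);
  WriteLn(p:0:2, ' ', q:0:2);
end.
```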

Make great effort to reduce the number of divisions

Division is very expensive, taking about 20-40 times as long as multiplication, addition, or subtraction. Move divisions outside of loops whenever possible. Do not forget to convert a division by a constant into the corresponding multiplication with its reciprocal.
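A minimal sketch of moving a division out of a loop: the reciprocal is computed once and the loop body only multiplies. Normalize is a hypothetical routine written for this example.

```pascal
program HoistDivision;
{$APPTYPE CONSOLE}

// Hypothetical routine: scales every element of Data by 1 / Total.
procedure Normalize(var Data: array of Double; Total: Double);
var
  i: Integer;
  Scale: Double;
begin
  Scale := 1.0 / Total;             // the only division
  for i := Low(Data) to High(Data) do
    Data[i] := Data[i] * Scale;     // multiplication inside the loop
end;

var
  Values: array[0..2] of Double;
begin
  Values[0] := 1.0;
  Values[1] := 2.0;
  Values[2] := 3.0;
  Normalize(Values, 4.0);
  WriteLn(Values[0]:0:2, ' ', Values[1]:0:2, ' ', Values[2]:0:2);
end.
```

Multiplying by a rounded reciprocal is not always bit-identical to dividing, so the last digits of a result can differ from the division-based version.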

How to avoid floating point checks for zero

Under certain circumstances it can be beneficial to avoid a direct comparison when checking a floating point variable for zero, and instead use typecasting to test the underlying representation of the variable. A direct comparison requires a true floating point based zero check, which can be avoided by taking advantage of the way zero is stored. Considering the substantially reduced readability of this technique, it should be used sparingly.

To check a single variable for zero use: DWord(Pointer(SomeSingleVar)) shl 1 = 0 (the shift discards the sign bit, so negative zero also tests as zero)

Checking a double variable is more complicated:

type
  PDoubleData=^TDoubleData;
  TDoubleData=record lo,hi:DWord end;

// two possible ways

var
  DoubleData:PDoubleData;
...
  DoubleData:=@SomeDoubleVar;
  // 'or' cannot wrap around to zero the way a 32-bit sum can
  if (DoubleData.hi shl 1) or DoubleData.lo = 0 then
...

// or

var
  DoubleData:TDoubleData absolute SomeDoubleVar;
...
  if (DoubleData.hi shl 1) or DoubleData.lo = 0 then
...

The above techniques can shave about 30-40% off the comparison time on a Pentium II.
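For reference, here is a self-contained sketch of both checks. It is a variation on the snippets above rather than the exact code: it uses Cardinal so that no Windows unit is needed, and it masks off the sign bit instead of shifting, which tests the same bits. The helper names IsZeroSingle and IsZeroDouble are hypothetical, and the record layout assumes a little-endian target such as x86.

```pascal
program ZeroCheck;
{$APPTYPE CONSOLE}

type
  TDoubleData = record
    lo, hi: Cardinal;   // low dword first, as stored on x86
  end;

// True for +0.0 and -0.0: every bit except the sign bit must be clear.
function IsZeroSingle(Value: Single): Boolean;
var
  Bits: Cardinal absolute Value;
begin
  Result := (Bits and $7FFFFFFF) = 0;
end;

// Same test for a Double, split across two dwords.
function IsZeroDouble(Value: Double): Boolean;
var
  Data: TDoubleData absolute Value;
begin
  Result := ((Data.hi and $7FFFFFFF) or Data.lo) = 0;
end;

var
  s: Single;
  d: Double;
begin
  s := 0.0;
  d := 1.0e-300;                 // tiny, but not zero
  WriteLn(IsZeroSingle(s));
  WriteLn(IsZeroDouble(d));
end.
```

Wrapping the test in a function adds call overhead that would likely cancel the saving, so in time-critical code the expression is better written inline, as in the snippets above.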



Copyright © 2003 Robert Lee ([email protected])