A look at a marginally more complex case
What happens when the function is slightly more complex?
function Add(const a, b : Double) : Double;
begin
   Result := a+b;
end;
Well, the non-inlined form still compiles rather inefficiently in both XE and XE6:
Unit1.pas.45: begin
005D7354 55               push ebp
005D7355 8BEC             mov ebp,esp
005D7357 83C4F0           add esp,-$10
Unit1.pas.46: Result := a+b;
005D735A DD4510           fld qword ptr [ebp+$10]
005D735D DC4508           fadd qword ptr [ebp+$08]
005D7360 DD5DF0           fstp qword ptr [ebp-$10]
005D7363 9B               wait
005D7364 DD45F0           fld qword ptr [ebp-$10]
005D7367 DD5DF8           fstp qword ptr [ebp-$08]
005D736A 9B               wait
Unit1.pas.47: end;
005D736B DD45F8           fld qword ptr [ebp-$08]
005D736E 8BE5             mov esp,ebp
005D7370 5D               pop ebp
005D7371 C21000           ret $0010
For reference, the optimal form would involve just three instructions:
fld a
fadd b
ret
What about inlining? Well, things changed, but not all for the best…
Here is the inefficient inlining in Delphi XE:
Unit1.pas.53: f := Add(a, b);
004AB65B DD442408         fld qword ptr [esp+$08]
004AB65F DC442410         fadd qword ptr [esp+$10]
004AB663 DD5C2418         fstp qword ptr [esp+$18]
004AB667 9B               wait
004AB668 8B442418         mov eax,[esp+$18]
004AB66C 890424           mov [esp],eax
004AB66F 8B44241C         mov eax,[esp+$1c]
004AB673 89442404         mov [esp+$04],eax
and here is the inefficient inlining in Delphi XE6:
Unit1.pas.53: f := Add(a, b);
005D7357 DD442408         fld qword ptr [esp+$08]
005D735B DC442410         fadd qword ptr [esp+$10]
005D735F DD5C2418         fstp qword ptr [esp+$18]
005D7363 9B               wait
005D7364 DD442418         fld qword ptr [esp+$18]
005D7368 DD1C24           fstp qword ptr [esp]
005D736B 9B               wait
So the stack juggling is still there, except that instead of being handled by integer instructions, it’s now handled by FPU instructions, along with a pointless wait instruction.
If your code is already bottlenecked by the FPU, this just won’t help…
The new function inlining in XE6 can provide some improvements over XE, but it can also result in less efficient code in a floating-point heavy context.
It also means that the need for using procedure-with-var-for-result instead of functions has – alas – not been eliminated by XE6, and there may be just as many cases in which performance goes up as cases in which performance will go down.
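For readers unfamiliar with the procedure-with-var-for-result pattern mentioned above, here is a minimal sketch of the two forms side by side. The name AddTo and the parameter r are made up for illustration and are not from the original code. Whether the var form actually produces tighter code depends on the compiler version and the call site, so it is worth checking in the CPU view rather than assuming.

```pascal
program VarResultDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Function form, as discussed in the article
function Add(const a, b : Double) : Double; inline;
begin
   Result := a+b;
end;

// Procedure form: the caller supplies the destination through a var
// parameter (AddTo and r are hypothetical names used for illustration)
procedure AddTo(const a, b : Double; var r : Double); inline;
begin
   r := a+b;
end;

var
   f, g : Double;
begin
   f := Add(1.5, 2.25);      // result returned by the function
   AddTo(1.5, 2.25, g);      // result written directly into g
   WriteLn(f:0:2);           // prints 3.75
   WriteLn(g:0:2);           // prints 3.75
end.
```

Both calls compute the same value; the only difference is how the result reaches its destination, which is exactly where the extra store/load pairs show up in the listings above.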