Stalking the Prey
Looking at the above code, the first obvious optimization that developers suggest seems to be taking the conditional out of the loop, resulting in several case-specific loops. On small vectors, this nets about a 30% speedup. For further speedups, the suggestions are typically to go for loop unrolling, asm, and other heavy-handed solutions that come with a significant development time and code complexity increase.
Of course, readers of this website will know better than to jump straight into the code and apply optimization recipes: they would run the code through a profiler first. And since we’re dealing with a single procedure, an instrumenting profiler would be of little help, so they would run Sampling Profiler instead, and would get to see something like this:
In this run, only the dwInc case was stressed (line 37), and obviously the procedure spends less than 30% of its time doing what it was asked of, and most of its time (33%) on the “end“, ie. cleaning up, plus 8% setting up in “begin“. That’s 40%+ doing nothing but stack and setup/cleanup work!
The conditional in the loop that could have looked like the most worrying bit is eating a bit less than 20% of the time.
What is the source of all that begin/end work? Place a breakpoint on begin, run and hit Ctrl+Alt+C when your breakpoint is reached, go have a look at the CPU view, and you’ll see this:
This is a fairly significant stack setup for such a small procedure, and those instructions with “fs:” at the bottom are the setting up of an (implicit) exception frame. An exception frame for what? if you haven’t guessed already, navigate your CPU view near the “end” line.
No wonder “end” was a bottleneck! The call to UStrArrayClr indicates that the exception frame is here to cleanup several strings… these strings are those of the raise Exception, one is the string returned by GetEnumName, the other is the result of the concatenation passed to Exception.Create.
Isolate and Kill
How to get rid of that exception frame? One typical way is to use “Exception.CreateFmt”, and pass only constant strings to it, but that is not possible here with the call to GetEnumName, which returns a string. The other way is to isolate the exception to its own (nested) procedure:
procedure RaiseUnsupported(what : TDoWhat); begin raise Exception.Create('Unsupported: '+GetEnumName(TypeInfo(TDoWhat), Integer(what))); end;
and call RaiseUnsupported in the “case else“. Doing so will move the exception frame to the new procedure, where it’s irrelevant in terms of performance.
This simple change nets us a 33% speedup, ie. we reclaimed most of the lost time in begin/end! We also gained a bit from the UStrArrayClr, which did essentially nothing since those strings it was used to clear weren’t defined (as long as we did not hit the exception).
Note that if you use a nested procedure for RaiseUnsupported, you can be tempted not to pass it the “what” parameter, but use directly the “what” from its parent procedure. However by doing so, you’ll have the compiler use a special stack setup (so that the nested procedure can access the parent procedure’s variables). This setup will be faster than the exception frame it replaces, but with it, begin/end would still be taking about 18% of the CPU time spent in the procedure.