In Delphi you can initialize a dynamic array in two ways, either manually or via the Create magic constructor
type TIntegerArray = array of Integer; procedure Test; var a : TIntegerArray; begin // magic constructor a := TIntegerArray.Create(1, 2); // manual creation SetLength(a, 2); a := 1; a := 2; end;
The outcome in both cases is the same, are all things equal?
Some array initializations are more equal than others
The first method is less verbose in code, but quite a bit less efficient, if you check the CPU view, that becomes obvious
TestUnit.pas.32: a := TIntegerArray.Create(1, 2); 00511335 8D45F8 lea eax,[ebp-$08] 00511338 8B15F0125100 mov edx,[$005112f0] 0051133E E89576EFFF call @DynArrayClear // anybody knows why? 00511343 6A02 push $02 00511345 8D45F8 lea eax,[ebp-$08] 00511348 B901000000 mov ecx,$00000001 0051134D 8B15F0125100 mov edx,[$005112f0] 00511353 E87476EFFF call @DynArraySetLength 00511358 83C404 add esp,$04 0051135B 8B45F8 mov eax,[ebp-$08] 0051135E C70001000000 mov [eax],$00000001 00511364 8B45F8 mov eax,[ebp-$08] 00511367 C7400402000000 mov [eax+$04],$00000002 0051136E 8B55F8 mov edx,[ebp-$08] 00511371 8D45FC lea eax,[ebp-$04] 00511374 8B0DF0125100 mov ecx,[$005112f0] 0051137A E89576EFFF call @DynArrayAsg // Manual initialization TestUnit.pas.35: SetLength(a, 2); 0051137F 6A02 push $02 00511381 8D45FC lea eax,[ebp-$04] 00511384 B901000000 mov ecx,$00000001 00511389 8B15F0125100 mov edx,[$005112f0] 0051138F E83876EFFF call @DynArraySetLength 00511394 83C404 add esp,$04 TestUnit.pas.36: a := 1; 00511397 8B45FC mov eax,[ebp-$04] 0051139A C70001000000 mov [eax],$00000001 TestUnit.pas.37: a := 2; 005113A0 8B45FC mov eax,[ebp-$04] 005113A3 C7400402000000 mov [eax+$04],$00000002
Now before you complain on the compiler capability, you’ve got to realize that the two ways of initializing a dynamic arrays are not equivalent:
- the magic constructor creates an array, then assigns it, so the array variable is always in a well-defined state
- the manual initialization mutates the array in several steps, so the array during the intermediate state is in an unfinished step
Of course, in the limited Test procedure, the compiler could figure out the array isn’t visible from the outside, and thus use the shorter form, but that’s an optimization that would apply only to a local variables.
A more generic optimization would be to have the compiler waive the temporary array when the array that is initialized isn’t referenced anywhere else (so intermediate states don’t matter), that’s possible given that dynamic arrays are reference-counted.
Next: Overhead in detail. Building a better Array constructor.
8 thoughts on “Delphi array constructors performance (or lack of)”
I created the request to support the simpler initialization syntax some while ago: http://qc.embarcadero.com/wc/qcmain.aspx?d=93549
DynArrayClear call should be an optimization. Without it the subsequent DynArraySetLength call will copy old `a` contents to a new location if `a nil`.
Blog engine had eaten < and > chars
Take a look about how DynArraySetLength() in System.pas is implemented.
Checking Length(a)length(Value) will not be much faster, since you will just bypass simple RTTI lookup, some adds, one mul and one div, then call reallocmem which is a no op for FastMM4 if the length is the same. And in most cases, Length(a) is indeed not equal to length(Value), so your optimization will slow down everything.
Dynamic arrays are reference counted values, with “Copy On Write” pattern.
DynArrayClear is there to handle the reference counting of any previous dynamic array instance stored in the variable. Here, your local variable is nil, so it will return with no delay. But if a should be nil, it will release any previous instance (if not copied anywhere).
This is exactly the same for a string, a variant or an interface (or a record containing any of those types).
What is slow with dynamic arrays is not the initialization, but the change of size. If you use SetLength(a,length(a)+1) to add items, it will be dead slow.
This is why in our TDynArray wrapper, we allow to define an external “Count: integer” variable, so the array length is not the actual count number, but the array capacity.
In our enhanced RTL for Delphi 7 and 2007, we rewrote some part of the DynArraySetLength() low-level function. But speed increase is not huge.
Note that DynArraySetLength() current implementation in System.pas is not thread-safe about the reference counting (whereas it is thread safe e.g. for strings since Delphi 5).
What may render dynamic arrays faster in Delphi should be the emission of code instead of using the RTTI (similar for record copy). Up to now, optimization exists if there is no RTTI for records, but dynamic arrays or records containing reference counts uses RTTI, which is slower. By the way, this is a reason similar to why TValue/variant types are slower than some optimized versions (like TOmniValue) using proper inlining features, which allows some compile-time optimization, by passing RTTI.
This is not what benchmarking reports, the 10x speedup us easy to observe in that case.
A simple comparsion like that one that is correctly branch-predicted is very cheap, you’ll be hard-pressed to get measurable a difference in benchmarks.
And if it’s not correctly branch-predicted, then it means you’re in the same size-situation, and will avoid branch mispredicts that happen in DynSetArrayLength.
They’re not, Strings are copy-on-write, dynamic arrays are just reference-counted types, which happen to be duplicated by SetLength when there is more than one reference to them. They’re hybrids.
It doesn’t in the real world, the reason is that this test is hard to branch-predict, from the POV of the CPU it’s close to random as you have a mix of nil and non-nil arrays which all go through the same test. And a branch mispredict is very costly on modern architectures.
It’s dead-slow even when the size isn’t changed, benchmark it if you doubt it, the RTL source doesn’t give the whole story.
Btw, the gains in your DynArray wrapper with a capacity come from that aspect as well.
When rewriting DynArraySetLength, you don’t get around the branch mispredictions (since there is still only one routine, with only one location for the jumps), but with your wrapper, there are multiple source locations for the Count comparison, which means they are a bit more context-specific, and thus better branch predicted.
Branch misprediction can be spotted when profiling: if the most expensive line is an innocuous-looking trivial test, chances are this is a misprediction. Normal bottlenecks are arithmetic, read/write from memory, complex calls, etc. When a simple boolean test comes on top, it’s because it’s mispredicted.
Yes, this was reported way back in D5 or D7, alongside being able to have custom constructors/destructors for records, rather than the RTTI-based RTL ones (which are structurally slow).
You may want to check this 🙂
You are right: they are not COW values, just reference counted. But not 100% thread safe!
Of course, in your case, checking length() before SetLength() will be 10x faster, but I suspect it won’t exist in real world, unless your code is very bad written.
The “a” value is always initialized to nil, during the function/method local stack initialization, just like any reference-counted variable. Check the generated asm code by the compiler.
In all cases, branch mis-prediction is nothing in comparison to a ReallocMem(), a InitializeArray() or a move() – as run within DynArraySetLength(). TDynArray is not much faster because of branch prediction and CPU cache issues, but because it bypassed the reallocation. This is what a profiler (like your great sample profiler) shows on real code, not dedicated benchmarks like calling SetLength() in loop. Then accessing a dynamic array is a O(1) opcode, just like any array.
I know the link in cromis.net – I just did not want to report the whole story.
I’m monitoring progress with JS typed arrays, but at the moment they’re still slower than regular JS arrays, so they’re only beneficial for memory space (when large) or to talk with WebGL. Arrays benefit from quite a lot of optimizations in Chrome V8.
With the JS compiler you already avoid issues of mixed type arrays (integers & floats f.i.) which are detrimental to v8 optimizations, and the compiler also always issues variable declaration with an initialization (from which V8 infers typing), but there are some other things to be aware of, like try..except/finally which turns off some JITing features.
Thanks for the feedback about V8 inferring typing.
I did read this some time ago, but it was out of my mind.
Great work, Eric!
Comments are closed.