As a follow-up to the String Concatenation article, let’s take a look at a less trivial case: what if, instead of concatenating a couple of strings, you want to concatenate a few hundred?
That sounds like a task at which TStringBuilder should excel, but one should never assume, and always measure.
Eating Lots of Apples
While some drink bottles of beer, we will eat apples instead.
Here is the Trivial version:

```pascal
Result := '';
for i := 1 to NB do
  Result := Result + #13#10'Eating apple #' + IntToStr(i);
```
The TStringBuilder and other object versions are a bit longer, but to keep things short, I’m not reproducing the variable declarations or the try..finally for the constructor/destructor.
```pascal
for i := 1 to NB do
  sb.Append(#13#10'Eating apple #').Append(i);
Result := sb.ToString;
```
I made two variants: one without pre-allocation, and another with a buffer pre-allocated through the Capacity property.
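For reference, here is a sketch of what the full TStringBuilder contender might look like once the omitted declarations and try..finally are put back. The EatApplesBuilder name, the APreAllocate flag and the Capacity estimate are illustrative assumptions, not the benchmark's actual code; the unit-scoped names assume Delphi XE2 or later.

```pascal
program AppleBuilder;

{$APPTYPE CONSOLE}

uses
  System.SysUtils; // TStringBuilder

// Hypothetical full version of the TStringBuilder contender.
// APreAllocate selects the Capacity variant.
function EatApplesBuilder(NB: Integer; APreAllocate: Boolean): string;
var
  sb: TStringBuilder;
  i: Integer;
begin
  sb := TStringBuilder.Create;
  try
    if APreAllocate then
      // Rough estimate (an assumption): 16 chars for #13#10'Eating apple #'
      // plus up to 5 digits, enough for NB up to 99999.
      sb.Capacity := NB * 21;
    for i := 1 to NB do
      sb.Append(#13#10'Eating apple #').Append(i);
    Result := sb.ToString;
  finally
    sb.Free;
  end;
end;

begin
  Writeln(Length(EatApplesBuilder(10, False)));
  Writeln(EatApplesBuilder(10, True) = EatApplesBuilder(10, False));
end.
```

Both variants produce the same string; the flag only changes whether the internal buffer starts at its final size.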
Similarly, you can write a TStringStream version, using WriteString in place of Append, and DataString in place of ToString.
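A minimal sketch of that TStringStream variant, under the same assumptions as the builder sketch (hypothetical EatApplesStream name, a recent Delphi with the parameterless TStringStream constructor). WriteString has no Integer overload, so the number goes through IntToStr first.

```pascal
program AppleStream;

{$APPTYPE CONSOLE}

uses
  System.Classes, System.SysUtils;

// Hypothetical TStringStream contender: WriteString replaces Append,
// DataString replaces ToString.
function EatApplesStream(NB: Integer): string;
var
  ss: TStringStream;
  i: Integer;
begin
  ss := TStringStream.Create;
  try
    for i := 1 to NB do
    begin
      ss.WriteString(#13#10'Eating apple #');
      ss.WriteString(IntToStr(i)); // no Integer overload, convert first
    end;
    Result := ss.DataString;
  finally
    ss.Free;
  end;
end;

begin
  Writeln(EatApplesStream(3));
end.
```

In Unicode Delphi versions, TStringStream stores bytes encoded through its TEncoding, so each WriteString is an encoding pass and DataString decodes the whole buffer back; that is a plausible contributor to its cost.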
Finally, for lovers of the Format function, there is a Trivial Format version as well:

```pascal
Result := '';
for i := 1 to NB do
  Result := Result + Format(#13#10'Eating apple #%d', [i]);
```
And just for the fun of it, I made a version with DWScript’s TWriteOnlyBlockStream (yes, that is a mouthful), whose code is similar to that of the TStringBuilder and TStringStream contenders.
Okay, ladies and gentlemen, place your bets, let the drum roll end, and let’s see the benchmark results.
The Mighty Have Fallen
Here are the per-iteration times (lower is better) for 10, 100, 1000 and 10000 loop iterations. Loop execution times were normalized to per-iteration times, i.e. the run time of the 10-iteration loop was divided by 10, that of the 100-iteration loop by 100, etc.
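The harness itself isn't shown, so here is a minimal sketch of the normalization just described, assuming System.Diagnostics.TStopwatch (XE2 or later for the unit-scoped names). TContender, EatApplesTrivial and MicrosecondsPerIteration are hypothetical names; other contenders plug in as functions of the same signature.

```pascal
program PerIterationTiming;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Diagnostics;

type
  // Every contender builds the NB-apple string and returns it.
  TContender = function(NB: Integer): string;

function EatApplesTrivial(NB: Integer): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to NB do
    Result := Result + #13#10'Eating apple #' + IntToStr(i);
end;

// Times one run of NB iterations and divides by NB,
// giving microseconds per iteration.
function MicrosecondsPerIteration(Contender: TContender; NB: Integer): Double;
var
  sw: TStopwatch;
begin
  sw := TStopwatch.StartNew;
  Contender(NB);
  sw.Stop;
  Result := sw.ElapsedTicks * 1e6 / TStopwatch.Frequency / NB;
end;

const
  Sizes: array [0..3] of Integer = (10, 100, 1000, 10000);

var
  k: Integer;
begin
  for k := Low(Sizes) to High(Sizes) do
    Writeln(Sizes[k]:6, MicrosecondsPerIteration(EatApplesTrivial, Sizes[k]):12:4, ' us/iteration');
end.
```

A single run is noisy, especially for the 10-iteration loop; a more careful harness would repeat each size many times and keep the best or median time.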
1 – The first thing that is obvious is that if you’re using TStringStream, well, stop. You shouldn’t. Or you need a very, very good reason to do so.
2 – The second is that if you replaced your trivial, KISS string concatenations with TStringBuilder, well… take that hanky. Maybe you got lucky, and you’re in one of the cases where TStringBuilder is okay? Okay, maybe not.
3 – Trivial concatenation is simple, readable, and scales well. We’re not in .NET or Java La-La-Land, where a simple string concatenation gone wrong can throw you into deep pits of swapping hell. This is why I like the Delphi String type.
Format does not shine, but in addition to string concatenation and integer conversion, it also has to parse the format string; given that flexibility, its performance is not bad at all.
TWriteOnlyBlockStream is doing fine with a decent lead, but its real magic becomes visible in multi-threaded scenarios (which this benchmark isn’t).
Why are Plain Delphi Strings doing so Well?
Shouldn’t simple String concatenation fail like Schlemiel the Painter, re-copying the whole string on every append and ending up with a quadratic cost?
Two things: Copy-On-Write and FastMM, which work hand in hand here.
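Copy-On-Write is what makes Delphi Strings different: it gives them the advantages of both immutability and mutability, and both are leveraged here. A string with a single reference can be modified in place, while a shared one is copied before it is modified.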
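A small sketch of Copy-On-Write in action, using the RTL's StringRefCount (Delphi 2009 and later). The strings are local variables built at runtime, since string literals are not reference-counted; the Demo name is hypothetical.

```pascal
program CopyOnWriteDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

procedure Demo;
var
  a, b: string;
begin
  a := IntToStr(12345);  // built at runtime, so reference-counted
  b := a;                // no copy: b shares a's buffer
  Writeln(Pointer(a) = Pointer(b));  // TRUE
  Writeln(StringRefCount(a));        // 2

  b := b + '!';          // b is shared, so it gets a buffer of its own
  Writeln(Pointer(a) = Pointer(b));  // FALSE
  Writeln(a, ' ', b);                // 12345 12345!

  // a now has a single reference, so "a := a + ..." is free to grow
  // a's own buffer instead of building a brand new string.
  a := a + '6';
  Writeln(StringRefCount(a));        // 1
end;

begin
  Demo;
end.
```

The sharing is the immutable side (assignment is a pointer copy), the in-place append on a sole owner is the mutable side.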
FastMM, on the other hand, speculatively over-allocates buffers that grow, and a mutable string that is being appended to is just a buffer that grows.
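A back-of-the-envelope count of copied characters shows why this combination avoids the Schlemiel trap. Assume N appends of L characters each, and, as a simplifying assumption (FastMM's actual policy depends on block size, and it can sometimes grow a block without copying at all), that a full buffer is reallocated with its capacity multiplied by a factor r > 1. Each reallocation copies at most the current capacity, and those capacities are at most NL, NL/r, NL/r², and so on:

```latex
\begin{aligned}
C_{\text{naive}} &= \sum_{k=1}^{N} kL = L\,\frac{N(N+1)}{2} = O(N^2),\\
C_{\text{grow}}  &\le NL \sum_{j \ge 0} r^{-j} = NL\,\frac{r}{r-1} = O(N).
\end{aligned}
```

For N = 10000 and L ≈ 20, the naive count is about 1.0 × 10⁹ characters, while with r = 2 the growth bound is 4 × 10⁵.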
Why is TStringBuilder slow?
Some will say that it is because it was just ripped from .NET or Java, where constraints are different, and where it’s used as a trick to work around String immutability. While there is some truth in that, TWriteOnlyBlockStream uses a structurally similar approach, yet leads comfortably.
No, one of the main reasons is that the implementer of TStringBuilder ran afoul of implicit workloads:
- most of the Append overloads work by creating a small local string and appending it, which means an implicit exception frame plus a string allocation and deallocation on every call (in contrast, the trivial implementations reuse the same local variable on each iteration)
- Append(String) looks innocent, but is chock-full of implicit code; just look at it in the CPU view.
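One way to apply the "reuse the same local variable" idea to TStringBuilder is sketched below: convert the integer into a single local string owned by the caller and only call Append(string), so the implicit exception frame for that local is set up once per function rather than inside every Append(Integer) call. This is an untested hypothesis, not one of the benchmarked contenders; it still allocates a string per iteration and still pays Append(string)'s own overhead, so whether it helps is something to measure. EatApplesBuilderReuse is a hypothetical name.

```pascal
program AppleBuilderReuse;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical variant: the integer conversion happens here, into one
// managed local that lives for the whole loop.
function EatApplesBuilderReuse(NB: Integer): string;
var
  sb: TStringBuilder;
  digits: string; // single managed local, finalized once per call
  i: Integer;
begin
  sb := TStringBuilder.Create;
  try
    for i := 1 to NB do
    begin
      digits := IntToStr(i); // overwrites the same variable each iteration
      sb.Append(#13#10'Eating apple #').Append(digits);
    end;
    Result := sb.ToString;
  finally
    sb.Free;
  end;
end;

begin
  Writeln(EatApplesBuilderReuse(3));
end.
```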
Last but not least, the SetLength implementation, which is invoked from each Append call, is just not very efficient. For instance, it performs two checks on the value where a single guard check would suffice, and it systematically enters an exception block that is only useful when growing the buffer.
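To illustrate the alternative, here is a hypothetical minimal appender (neither TStringBuilder nor TWriteOnlyBlockStream, just a sketch) whose common path is one capacity comparison and a Move; SetLength is only reached on the rare growth path, and nothing on the fast path needs an exception frame. TAppender and its methods are illustrative names.

```pascal
program SingleGuardAppend;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Hypothetical appender for illustration only.
  // Call Clear before use: FLen is not auto-initialized in a local record.
  TAppender = record
  private
    FBuf: string;   // Length(FBuf) is the capacity
    FLen: Integer;  // characters actually used
    procedure Grow(MinCapacity: Integer);
  public
    procedure Clear;
    procedure Append(const S: string);
    function ToString: string;
  end;

procedure TAppender.Clear;
begin
  FBuf := '';
  FLen := 0;
end;

// Slow path, only reached when the buffer is full: double the capacity.
procedure TAppender.Grow(MinCapacity: Integer);
var
  NewCap: Integer;
begin
  NewCap := 2 * Length(FBuf);
  if NewCap < 16 then
    NewCap := 16;
  if NewCap < MinCapacity then
    NewCap := MinCapacity;
  SetLength(FBuf, NewCap); // keeps the existing characters
end;

procedure TAppender.Append(const S: string);
var
  N: Integer;
begin
  N := Length(S);
  if N = 0 then
    Exit; // nothing to copy
  if FLen + N > Length(FBuf) then // the single capacity guard
    Grow(FLen + N);
  Move(Pointer(S)^, FBuf[FLen + 1], N * SizeOf(Char));
  Inc(FLen, N);
end;

function TAppender.ToString: string;
begin
  Result := Copy(FBuf, 1, FLen);
end;

function EatApplesAppender(NB: Integer): string;
var
  App: TAppender;
  i: Integer;
begin
  App.Clear;
  for i := 1 to NB do
  begin
    App.Append(#13#10'Eating apple #');
    App.Append(IntToStr(i));
  end;
  Result := App.ToString;
end;

begin
  Writeln(Length(EatApplesAppender(10)));
end.
```

Writing through FBuf[FLen + 1] lets the compiler ensure the buffer is unique before the Move, which keeps the sketch safe even if a TAppender gets copied.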
So even if you pre-allocate the buffer, you still pay for most of the overhead of TStringBuilder, which is why pre-allocating doesn’t have a magical effect. Buffer growth isn’t the bottleneck (and would not be under FastMM anyway).
Check the follow-up article: Going Multi-Threaded.