Memory Managers and String Building

[This is a guest post, written by Primož Gabrijelčič]

One thing has interested me since I started reading Eric’s series on string concatenation performance – how different memory managers would compare in a multi-threaded scenario. Today I decided to spend an hour finding out…

Edit 18/11: the tests were run in debug mode, which affected TTextWriter very negatively (TWOBS is also affected negatively, but less so, and StringBuilder and Trivial aren’t affected much). I’ll be repeating the tests with more memory managers and in more stable conditions in the next few weeks.


Three memory managers were tested. In the first place, there was FastMM – not the built-in version but the latest SourceForge release (4.991).

Next I tested SynScaleMM, which is part of Synopse mORMot.

The last contender was a new memory manager which I’ll call NN. [I’m discussing the possibility of releasing it as open source with the authors, but for the moment this is a proprietary memory manager.]


I took Eric’s source code and wrote a simple console app that generates results for the four tests he used in the multithreaded test, for thread counts from 1 to 12. I have a computer with two Xeon E5-2620 processors, each containing six cores, so 12 threads was a natural limit.
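As a rough illustration of what such a console app can look like, here is a minimal sketch. It is not the code used for the results in this post: `UseTrivial` is a stand-in for one of the four tests, `NB = 100000` is an assumed iteration count, and `TWorker` and `TimeTest` are hypothetical names introduced only for this example.

```delphi
program MMBenchSketch;
{$APPTYPE CONSOLE}

uses
  // A replacement memory manager (e.g. FastMM4) has to be the first
  // unit in the project's uses clause; it is left out of this sketch.
  System.SysUtils, System.Classes, System.Diagnostics;

const
  NB = 100000; // assumed iteration count, not taken from the original tests

type
  TBuildFunc = function: string;

  // One worker thread runs one string-building test once.
  TWorker = class(TThread)
  private
    FFunc: TBuildFunc;
  protected
    procedure Execute; override;
  public
    constructor Create(AFunc: TBuildFunc);
  end;

constructor TWorker.Create(AFunc: TBuildFunc);
begin
  inherited Create(True); // created suspended, started later with Start
  FFunc := AFunc;
end;

procedure TWorker.Execute;
begin
  FFunc(); // the resulting string is discarded
end;

// Stand-in for the "trivial" test: plain string concatenation.
function UseTrivial: string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to NB do
    Result := Result + #13#10'Eating apple #' + IntToStr(i);
end;

// Runs AFunc on AThreadCount threads at once; returns elapsed milliseconds.
function TimeTest(AFunc: TBuildFunc; AThreadCount: Integer): Int64;
var
  workers: array of TWorker;
  sw: TStopwatch;
  i: Integer;
begin
  SetLength(workers, AThreadCount);
  for i := 0 to High(workers) do
    workers[i] := TWorker.Create(AFunc);
  sw := TStopwatch.StartNew;
  for i := 0 to High(workers) do
    workers[i].Start;
  for i := 0 to High(workers) do
    workers[i].WaitFor;
  sw.Stop;
  Result := sw.ElapsedMilliseconds;
  for i := 0 to High(workers) do
    workers[i].Free;
end;

var
  n: Integer;
begin
  for n := 1 to 12 do
    Writeln(n, ' threads: ', TimeTest(UseTrivial, n), ' ms');
end.
```

Every thread does the same amount of work, so with a memory manager that scales perfectly the elapsed time would stay roughly flat as the thread count grows (up to the number of physical cores).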

Programs were compiled in XE2 Update 4 Hotfix 1 (my natural habitat) and then I closed all programs running on Windows (7, 64-bit) and ran the tests from the command line.

Besides the time (which was calculated by Eric’s unit) I also measured peak working set usage for each memory manager. This gives us a good indication of how much memory each program was using.
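One way to read the peak working set from inside the process is the Windows API function GetProcessMemoryInfo. The sketch below is an assumption about how such a measurement can be done, not necessarily how the numbers in this post were collected; `PeakWorkingSetBytes` is a hypothetical helper name.

```delphi
program PeakWorkingSetSketch;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows, Winapi.PsAPI;

// Returns the peak working set of the current process in bytes,
// or 0 if the API call fails.
function PeakWorkingSetBytes: NativeUInt;
var
  pmc: TProcessMemoryCounters;
begin
  FillChar(pmc, SizeOf(pmc), 0);
  pmc.cb := SizeOf(pmc);
  if GetProcessMemoryInfo(GetCurrentProcess, @pmc, SizeOf(pmc)) then
    Result := pmc.PeakWorkingSetSize
  else
    Result := 0;
end;

begin
  // ... run the benchmark here, then report the peak at the very end ...
  Writeln('Peak working set: ', PeakWorkingSetBytes div 1024, ' KB');
end.
```

Because the value is a per-process maximum, each memory manager needs its own run of the program for the figures to be comparable.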

Memory Manager Performance


On my hardware (or maybe with my FastMM – I don’t know the exact version Eric was using for his tests), the trivial builder performs even better than TTextWriter and TWriteOnlyBlockStream. StringBuilder, however, is terribly slow.


With SynScaleMM, TWriteOnlyBlockStream performs the best. There’s actually almost no slowdown up to 7 cores (and even the result on 8 cores could be a measurement error). TTextWriter is the slowest, and the trivial string builder performs better than StringBuilder.


The NN memory manager is, interestingly, both the fastest and the slowest of the bunch. As you can see from the graph above, all object-oriented algorithms perform very well and do not slow down linearly as the number of threads increases. On the bad side, the trivial algorithm performed so badly that I couldn’t even plot it – it was literally 200 times slower than the FastMM4 version. I have notified the authors about that problem.

Next: Algorithm Performance and Memory Manager

9 thoughts on “Memory Managers and String Building”

  1. Some of the benchmark screens are missing the title so it is hard to tell what they are benchmarking…
    You could also include this memory manager in your test: (not the same as SynScaleMM).

    BTW: nice idea with Pierre on a Kickstarter!:)

  2. An important thing you need to know is that SynScaleMM is actually the same as ScaleMMv1, which runs on top of FastMM (which explains the similar performance). However, my initial Proof of Concept (v1) only contained a memory manager per thread for small memory (1mb) is always directly requested from Windows (same as FastMM).
    (I also tried to make ScaleMM3 with a very different approach, but it turned out to be much slower…)

    I was also busy testing on my quad core, and ScaleMM2 scaled (almost) linearly!
    I also tested it with Google’s TCmalloc, which has similar performance to SMM2 (but never releases its memory to Windows!)
    The MSVCRT MM (Win7) seems to scale fine with StringBuilder (while being slower than all the others). But with the trivial string it performs very badly! I think it has the same problem as NN: it does a full realloc every time. FastMM, ScaleMM and TCmalloc, on the other hand, do some kind of “smart capacity” expansion (e.g. they allocate 25% more space, so there is no need to do a full realloc+move for every byte!).

    I will mail you the necessary sources (and my results so far)

    At least it shows that strings are memory manager bound in Delphi: with the default MM (FastMM) it stays at 25% CPU on my quad core with 8 threads due to the global lock of FastMM. But with other MMs it will reach 100% CPU, making full use of all cores!

  3. Fascinating! I can make a guess about what NN is – a well-known memory manager that begins with N? 😉

    I have been working on a new memory manager myself for some time, although it’s been on the back-burner for a few months while traveling. It aims to have good multithreaded performance, i.e. it’s designed from the outset for a situation where many threads allocate and free at the same time. Unfortunately it’s not done yet, not even to a beta state. However, I will try to find time to continue working on it and to run your performance tests with it…

  4. Nice article!
    BUT there is another factor that is important: fragmentation. For long-running / memory-hungry applications this can also be a problem (it can cause OutOfMemory errors even when there is plenty of free memory).
    Unfortunately I don’t know how to measure it…

  5. Please do not use SynScaleMM, which is a Proof of Concept, never to be used in production.
    Try ScaleMM2, which is much more stable and also fast/tuned.

    I just checked the source code.

    I would have rather written in this case:

    function UseTextWriter : String;
    var
       i : Integer;
       tw : TTextWriter;
       st : TRawByteStringStream;
    begin
       st := TRawByteStringStream.Create;
       tw := TTextWriter.Create(st,65536);
       for i := 1 to NB do begin
          tw.AddString(#13#10'Eating apple #');
          ...

    Since the default buffer may be too small for such generation.

    What is pretty “unfair” in the comparison is that you include a UTF-8 to Unicode conversion during the test, only for TTextWriter!
    Perhaps using Ansi7ToString() may be a bit faster (even if our UTF-8/Unicode conversion is pretty optimized).
    But in all cases, other classes DID NOT do any such conversion.
    You are comparing apples with oranges, here.

    All those drawings are pretty nice, but…
    Which kind of program will do a fixed pattern of string + number concatenation in loop in all threads at once?
    A benchmark. Only a benchmark.

    More general tests as we use in our regression and performance tests (including JSON creation of several kinds of data, JSON parsing, HTTP client/server, RTTI access, caching, search, database backend with disk read/write, logging, with up to 50,000 concurrent clients, IOCP and a thread pool).
    What I like very much is feedback for mORMot users using it on production – like

    My current challenge is to provide some code to
    I’m adding MVC support to mORMot currently, using JavaScript BTW.
    Here we will see how it works. In the real world…


  6. Thanks for taking the time to do the comparisons and write them up. Unfortunately, using different scales on your graphs makes it difficult to appreciate the actual differences. This is a cardinal sin of visually representing quantitative information. If you’re interested in how to present information visually, then I highly recommend reading some of Edward Tufte’s books, such as “The Visual Display of Quantitative Information”. Link below.

  7. Interesting stuff. We use the NexusDB memory manager in FinalBuilder and Automise. We have a bunch of benchmark/test FinalBuilder projects (which exercise the stepping engine with multiple threads), and for those projects the Nexus memory manager typically performs twice as fast as FastMM4.

  8. I found a small bug in the test source: in function UseWOBS : String;, the line wobs.WriteString(i); won’t compile since i is an integer, and in what I think is the latest version of DWS, which I just downloaded, there is no overload for integers, only strings.

    I replaced it with wobs.WriteString(IntToStr(i)); instead.

  9. @A. Bouchez IMHO TTextWriter cannot be used because it’s not UTF-16 ready; therefore TStringBuilder is still the clear choice for us until Embarcadero improves it.
