Algorithm Performance and Memory Manager
We get an interesting insight into the speed of the memory managers if we compare the times for one algorithm running on different memory managers.
In the trivial case, both FastMM4 and SynScaleMM behave almost the same, while the NN was so slow I couldn’t even show it on the chart.
An interesting observation: in both algorithms two threads performed better than one, which really surprised me, as the two threads together also have to execute twice as much work. So in the two-thread case, each thread was working faster than in the single-thread case.
This is an interesting side effect of the two-processor setup: when a single thread is busy, the OS tends to move it between processors to spread the load. Each time the thread is moved to another processor, the contents of the L1 and L2 caches are lost and the new processor has to read everything from RAM again. When the single thread is pinned to one processor via its thread affinity, the issue goes away.
For two threads, the OS is apparently smart enough to keep one thread on each processor.
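The affinity effect is easy to try outside the benchmark. The following is a minimal sketch in Python, not the Delphi code used for these tests. It assumes a Linux system, where `os.sched_setaffinity` is available (on Windows the corresponding Win32 call is `SetThreadAffinityMask`). The helper name `pin_to_cpu` is made up for illustration.

```python
import os

def pin_to_cpu(cpu):
    """Restrict the caller to a single CPU and return the new affinity set.

    Returns None on platforms without sched_setaffinity (e.g. Windows, macOS).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)   # CPUs we are currently allowed to run on
    if cpu not in allowed:
        cpu = min(allowed)              # fall back to a CPU we may actually use
    os.sched_setaffinity(0, {cpu})      # pid 0 means "the caller"
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    print("now restricted to:", pin_to_cpu(0))
```

Calling `pin_to_cpu(0)` before a single-threaded measurement keeps the scheduler from moving the work between processors, so the caches stay warm for the whole run.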
We already know that StringBuilder performs poorly with FastMM4, but from this graph we also see that the problem lies not so much in the StringBuilder implementation as in FastMM4, where multiple threads fight over the same memory manager (because they allocate blocks of the same size). Both SynScaleMM and NN perform well, with NN taking a small lead on more than 8 cores.
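To see why same-sized allocations hurt, here is a toy model in Python. It is only a sketch of the contention described above and not FastMM4's actual implementation. The class name `ToySizeClassAllocator`, the method `lock_for`, and the size classes are all invented for illustration. The assumption is that each block-size class is guarded by a single lock shared by all threads.

```python
import threading

class ToySizeClassAllocator:
    """Toy model: one lock per block-size class, shared by all threads."""

    def __init__(self, class_sizes=(16, 32, 64, 128, 256)):
        self.class_sizes = class_sizes  # hypothetical size classes, in bytes
        self.locks = {size: threading.Lock() for size in class_sizes}

    def lock_for(self, nbytes):
        # Pick the lock of the smallest class that can hold the request.
        for size in self.class_sizes:
            if nbytes <= size:
                return self.locks[size]
        raise ValueError("request too large for this toy model")

    def allocate(self, nbytes):
        # Every thread asking for this size class has to pass through the same lock.
        with self.lock_for(nbytes):
            return bytearray(nbytes)

alloc = ToySizeClassAllocator()
# Requests of 24 and 30 bytes land in the same class and share a lock;
# a 100-byte request uses a different one.
print(alloc.lock_for(24) is alloc.lock_for(30))   # True
print(alloc.lock_for(24) is alloc.lock_for(100))  # False
```

In this model, threads that all request blocks of one size serialize on a single lock no matter how many cores are available, while threads that request different sizes never meet.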
This was, in my opinion, the most surprising result of the test: not the good performance of NN, but the practically identical performance of FastMM4 and SynScaleMM. It looks like FastMM4 performs quite well under some circumstances, even in multithreaded tests.
In the last comparison, TWriteOnlyBlockStream, SynScaleMM and NN perform about equally well, with SynScaleMM slightly faster, while FastMM4 is about 2.5 times slower. The weird spikes in the FastMM4 graph are probably measurement artifacts.