Memory Managers and String Building

synscalemm[This is a guest post, written by Primož Gabrijelčič]

One thing interested me since I started reading Eric’s series on string concatenation performance – how would different memory managers compare in a multi-threaded scenario. Today I decided to spend an hour finding out…

edit 18/11: the tests were run with debug mode, which affected TTextWriter very negatively (TWOBS is also affected a negatively, but less, and StringBuilder and Trivial aren’t affected much).  I’ll be repeating tests with more memory managers and in more stable conditions in the next few weeks.


Optimizing for memory: a tighter TList

One of the memory hogs when you have object trees or graphs can be good old TList and its many variants (TObjectList, TList<T>, etc.).

This is especially an issue when you have thousandths of lists that typically hold very few items, or are empty.
In the DWS expression tree f.i. there are quickly thousandths of such lists, for local symbol tables, parameter lists, etc.

How much does a TList cost in terms of memory?

A TList holding a single item already costs you:

  • 4 bytes for the field in the owner object
  • 20 bytes for the TList instance
    • 8 hidden bytes: Monitor + VMT pointer
    • 12 field bytes: data pointer + Count + Capacity
  • 4 bytes for the data

So that’s 28 bytes, with two dynamically allocated blocks which, and those dynamic allocations, depending on your allocation alignment settings, can cost you something like an extra dozen of bytes (even with FastMM).

What about the other TList variants?

  • TObjectList has an extra boolean field, which with alignment, costs an extra 4 bytes.
  • TList<T> has an instance size of 28 bytes, and a dynamic array storage with 8 hidden extra bytes (4 for length, 4 for the refcount).

Neither of these are better candidates for memory-efficient small lists.

Enter the TTightList

You can find TTightList in DWS’s dwsUtils unit.
For an empty list, the cost is 8 bytes, no dynamic memory, and for a list with a single item, 8 bytes still with no dynamic memory.
For a n-items list, the cost is 8 bytes plus one n*4 bytes dynamic block.

To achieve that, the TTightList makes use of two tricks:

  • it’s designed to be composed, and hosted as an object field
    • it’s a record-with-methods, not a class, but retains classic-looking use semantics (.Add, .IndexOf, .Clear etc.).
    • eliminates the need for a pointer to the instance in the host object
    • eliminates the dynamically allocated storage for the TTightList itself
  • if the Count is one, the array pointer itself points to the only item, rather than to a dynamically allocated block holding a pointer to the item.

The second trick is where we sacrifice a bit of performance, to save one dynamic allocation for lists holding a single item. Though if you benchmark the TTightList, you’ll see it holds its own fairly well against TList for the smaller item counts, which is what it was designed for.
That’s partly thanks to TList‘s own inefficiencies, and FastMM’s in-place reallocation (on which TTightLight relies, since it doesn’t maintain a capacity field).