TMonitor vs TRTLCriticalSection

In this new episode of the TMonitor saga is a comparison of its locking features vs TRTLCriticalSection, aka the OS-supported critical section mechanism. Is it fair? is it fast?

Fairness means that if you have several threads attempting to lock the same resources, they’ll all get a chance and none of them gets the lion’s share. In other words, that there are no race conditions.

Speed is obvious, it’s about the overhead the locking mechanism incurs.

Edit 2013-08-24: looks like the issue discussed here should be resolved in XE5, cf Monitoring the Monitor by Allen Bauer.


A Fistful of TMonitors

…or why you can’t hide under the complexity carpet 😉

As uncovered in previous episodes, one of the keys behind TMonitor performance issues is that it allocates a dynamic block of memory for its locking purposes, and when those blocks end up allocated on the same CPU cache line, the two TMonitor on the same cache line will end up fighting for the cache line, resulting in a drastic drop of performance and thread contention. The technical term for that behavior is false sharing.

Once upon a time in a thread…

Last episode in the TMonitor saga. In the previous episode, Chris Rolliston posted a more complete test case, for which he got surprising results (including that a Critical Section approach wouldn’t scale with the thread count). Starting from  his code I initially also got similar surprising results.

edit: apparently the “crash” part of the TMonitor issues have been acknowledged by the powers that be, and a hotfix could be on the way, though it points back to QC 78415, an issue reported in 2009, ouch. Guess those 4 bytes per instance haven’t seen much use…

TMonitor woes

Primoz Gabrijelcic recently reported a possible bug with TMonitor, in the more advanced side of TMonitor.

However, when experimenting with it for DWS, I bumped on issues in the basic usage scenarios too, and reverted to using critical sections. It seems that as of Delphi XE, short of a patch, TMonitor is just a waste of 4 bytes per object instance.