Slim Reader/Writer Locks Rock!

padlockI recently posted abut the new Slim R/W Locks introduced with Vista, and how they were vastly more efficient than TMREWS.

Apparently, they’re also more efficient than Critical Sections…

Following Up

If you follow the comments in the previous article, some extra benchmarks where published, including “degenerate” cases where the Slim R/W Locks are used as exclusive locks.

If you only use the Write lock portion of a R/W Lock, it will behave like an exclusive lock, aka Critical Section.

It appeared that Slim R/W Locks are 1.5 to 2 times faster than Critical Sections.

Wrapping Up

So the advantages of Critical Sections that remain are:

  • re-entrance, though code that relies on re-entrance will often be because of bugs/shortcuts, and may be vulnerable to deadlocks as well
  • deadlock detection, though this is an IDE debugger feature, and not really automatic
  • available on Windows XP and Windows 2003

Both Critical Sections and Slim R/W Locks share the following advantages over custom implementations (like TMonitor)

  • efficient relinquishing of CPU time when lock can’t be acquired (rather than hard-coded and hand-tuned spin-lock limits)
  • future-proof for past and upcoming CPU architectures (will be updated with the rest of the OS)
  • non-Delphi debugging tools support
  • low memory footprint
  • efficient native implementations exist on other platforms and architectures (cf POSIX)

11 thoughts on “Slim Reader/Writer Locks Rock!

  1. +1
    Since Windows Vista it is my favorite, along with Condition Variable.
    Both implemented in

    Worth writing an article about Condition Variables also.
    I suspect combinations of SRW and CV may be more efficient than IOCP (on which you recently wrote about).

    Have implemented thread-safe Queue,Stack,LinkedList based on SRW and CV.
    If you find this useful, I may contribute sources for testing.

  2. Did you try the optional “spin” feature of critical sections, in your tests? AFAIK since Windows 2000 you can call InitializeCriticalSectionAndSpinCount(), and have the cs “spin” for the given count instead of putting the thread to sleep. Of course the spin count has to be carefully determined.
    Unluckily Delphi often still thinks NT4 is the last OS released by MS 🙂

  3. @LDS That function was introduced in Windows 2000 times, changes in 2003, 2008 & 2012 have made it largely irrelevant. Also hard-coding a spin count value is little different from ancient era programs that used loops as a form of timers: changes in CPU and OS architecture are bound to make you choices incorrect (even on current server hardware, you have to contend with Xeon, Atom and AMD, which have quite different performance behaviors).

  4. That is great to hear. When I read your first article I looked at your code and did my own implementation (more a wrapper as all there is to do i call OS APIs) that falls back to critical section if APIs are not available.

    I am currently playing with slim locking ad lock free structures. The goal is to do a lock free hash table. I already did one with micro locking, where each bucket in the table has its own SRW lock. This is possible due to minimalistic implementation of SRW and I think it does scale well, so you can have a lot of SRW locks with no hit on OS. I will blog about it but I need time to research all of this.

  5. @Iztok For micro-locking, you ideally need to make sure every bucket gets its own cache line, otherwise the SRW locks can interfere with each other.

    That said, IME SRW locks for a whole hashlist is often more than good enough, provided you compute the hash before acquiring the lock.

  6. @Eric. That is exactly what I found out in my tests. Hash tables are specific because the access is mostly so fast that there is no contention between threads. I was just curious what results will micro-locking give me. It confirmed my speculations that single SRW lock for hash table is more then enough, because reads and writes in hash table are basically direct access to a bucked if you calculated the hash in advance as you said. Where it becomes interesting is when you have to do a lot of enumeration (like enumeration for dead session in web server). Here a good SRW lock or lock free should shine. Micro-locking interestingly fails here because it has to lock each bucket when traversing the table. In the end this is more costly then locking globally and then just traversiong all the buckets.

    Things change if the table is to small and you have collisions as then you get linked lists for each bucket. But in such case the table already failed its purpose.

  7. @Iztok Yes, collisions are the real enemy of hash tables. The hash is one of the reason why I’m not using TDictionary btw, the “BobJenkins hash” it uses is slow and got abnormal collision rates in some tests.

  8. @Eric. Exactly, that hash is outdated. And there is no way to replace it. On top of that, enumeration on their old dictionaries was a pain (have no idea how it is now) and you could only store pointers so if you wanted to store simple data types you had to write a wrapper for each value, no thank you :). I know with generics that changed but I am still forced to use ANSI Delphi for some of our code.

    That is why I did my own version:

    I can store anything (both as key and as value), you can easily override the hashing function and it uses a very fast hashing function with good distribution. There was a competition once on the borland forums and hashes like murmur hash were ported to pascal and tested. I took one that Davy Landman ported. I still need to implement some features but it is good as it is.

Comments are closed.