Performance issue in NextGen ARC model


Weak references and their zeroing are managed in Delphi NextGen through a global hash table, with a global lock; see Arnaud Bouchez’s take on it:


Apart from the global locking, which is bad enough by itself, there are a couple of further performance issues:

  • If the locks are really through TMonitor, rather than through OS-supported critical sections, that induces an extra slowdown. Under Windows 32 & 64, TMonitor.Enter/Exit is IME about 2 to 2.5 times slower than Enter/LeaveCriticalSection in low-contention situations, and 10 to 20 times slower in cases of high contention (weak references in a multi-threaded application would thus be hit particularly hard)
  • Since all references are managed in a single hash list for the whole application, you’re bound to take a noteworthy hit from the hash list itself (especially if it is based on TDictionary<>, rather than being a dedicated implementation)
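Assuming the design described above — one process-wide table, one process-wide lock — a minimal C++ sketch makes the serialization point obvious (names and layout are illustrative, not Delphi's actual internals):

```cpp
#include <mutex>
#include <unordered_map>
#include <vector>

// Minimal sketch of a *global* zeroing-weak-reference registry as described
// above: one process-wide hash table, guarded by one process-wide lock.
struct WeakRegistry {
    std::mutex lock;                                       // the single global lock
    std::unordered_map<void*, std::vector<void**>> slots;  // object -> weak slots

    void registerWeak(void* obj, void** slot) {
        std::lock_guard<std::mutex> g(lock);   // every thread serializes here
        slots[obj].push_back(slot);
    }

    void zeroAll(void* obj) {                  // called when obj is destroyed
        std::lock_guard<std::mutex> g(lock);
        auto it = slots.find(obj);
        if (it != slots.end()) {
            for (void** s : it->second) *s = nullptr;  // zero every weak reference
            slots.erase(it);
        }
    }
};
```

Every thread that registers, clears, or zeroes a weak reference anywhere in the application funnels through that single mutex, on top of the hash-table maintenance itself.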

There are several ways around that situation. The first is to maintain a list per class, as is done in mORMot, which spreads the locking and reduces the list-maintenance overhead. Another is to do away with weak references entirely and collect reference cycles with a dedicated GC, which is what Python does, and what DWScript does (more modestly) as well. This has the benefit of removing the error-prone weak-reference qualification from the code, without incurring the stalls of a full GC (see the Wikipedia article links), and it is concurrency-friendly.
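For comparison, C++'s std::shared_ptr/std::weak_ptr illustrates yet another approach: the strong and weak counts live in a small control block allocated per object, so no application-wide registry (and no global lock) is involved, and a weak reference simply observes expiry locally:

```cpp
#include <memory>

// For contrast: C++ needs no application-wide weak-reference table.
// shared_ptr/weak_ptr keep strong and weak counts in a per-object
// control block, so observing destruction is a local atomic check.
bool weakDemo() {
    std::weak_ptr<int> weak;
    {
        auto strong = std::make_shared<int>(42);
        weak = strong;             // weak count lives in the control block
    }                              // last strong reference dropped here
    return weak.expired();        // true, and no global registry was consulted
}
```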

8 thoughts on “Performance issue in NextGen ARC model”

  1. Embarcadero people should also start looking at MSDN to see what has been added since Vista (e.g. SRW locks). The compiler should become able to target a minimum release of Windows and take advantage of new features when they exist. For some, XP/2003 support can still be important, but when your code will only run on newer releases, why should it be stuck with the old calls?

  2. What a stupid idea to zero weak references! Weak references should behave like pointers to objects in current Delphi. If you access a member of an already destroyed or uninitialized object reference, it’s your fault!

    What a stupid idea to automatically reference-count all objects!
    IME, even the automatic AddRef/Release calls for interface variables, with InterlockedIncrement/InterlockedDecrement involved, can reduce the performance of parallel code significantly. When you call an atomic operation on a memory location, not only is that memory location locked, the whole memory bus is locked (at least on x86 CPUs)!
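The AddRef/Release cost the commenter describes can be illustrated with std::atomic: each increment is an interlocked read-modify-write, and when many threads hammer one shared counter its cache line bounces between cores (a hypothetical micro-demo, not a benchmark):

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Every "interface assignment" in the commenter's scenario does an
// interlocked increment on a shared counter. With several threads,
// each fetch_add contends for exclusive ownership of the cache line.
long contendedCount(int threads, int iters) {
    std::atomic<long> refCount{0};     // shared: every add is an atomic RMW
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i)
                refCount.fetch_add(1, std::memory_order_relaxed);  // "AddRef"
        });
    for (auto& th : pool) th.join();
    return refCount.load();
}
```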

  3. @Vitali

    I suspect you have not read the technical documentation of Intel/AMD CPUs for years.

    AFAIR the “lock” prefix has not locked the memory bus for a long time.
    And the latest versions are able to avoid locking even the L1, L2 or L3 cache, doing so only when needed.

  4. @A. Bouchez
    That would be great, but on my developer PC atomic operations are very costly, and the CPU is not that outdated; it’s not an i5, but it is still a quad core. For some reason all my attempts to gain a 4x speed-up from parallelizing algorithms have failed, and believe me, I tried hard. The maximum I usually get is 3x. And the main bottlenecks are memory writes and atomic operations.
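One common pattern for getting closer to a linear speed-up is to keep all accumulation in thread-local state and merge once at the end, so the hot loop performs no atomic operations and no shared writes (a hypothetical example, not the commenter's actual workload):

```cpp
#include <algorithm>
#include <numeric>
#include <thread>
#include <vector>

// Parallel sum without shared atomics: each thread accumulates into a
// local variable and writes its partial result exactly once, so the hot
// loop never contends for a shared memory location.
long parallelSum(const std::vector<int>& data, int threads) {
    std::vector<long> partial(threads, 0);   // one result slot per thread
    std::vector<std::thread> pool;
    const std::size_t chunk = (data.size() + threads - 1) / threads;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&, t] {
            std::size_t begin = t * chunk;
            std::size_t end = std::min(data.size(), begin + chunk);
            long local = 0;                  // thread-local: no contention
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            partial[t] = local;              // single write at the end
        });
    for (auto& th : pool) th.join();
    return std::accumulate(partial.begin(), partial.end(), 0L);
}
```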
