Here I will share my current workflow for low-level optimization, which these days is basically a roundtrip between Delphi, SamplingProfiler, ChatGPT and Godbolt.
This can let you produce code that runs faster than the output of any single C compiler, while staying with Delphi code.
Just created a new repository with a “LibCBLAS” unit meant to use the OpenBLAS library in its Windows 64-bit incarnation from Delphi 10.3+.
OpenBLAS is an optimized BLAS library (Basic Linear Algebra Subprograms). The DLL itself can be obtained from the “xianyi” repository, where pre-compiled Windows DLLs are maintained.
It occurred to me that SHA-3, being a cryptographic hash, is one of those peculiar bits of code that are fully self-testing: any bug in a cryptographic hash will quickly cascade to a different result, no matter the bug or the input.
This means the ad-hoc-compiler-monkey can be unleashed “safely”, and can be allowed to try “improper changes.”
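To illustrate the self-testing property, here is a minimal Python sketch (not the DWScript code). It checks a hash function against the published SHA3-256 digest of the empty message. The `buggy` function is a hypothetical stand-in for a faulty kernel: it simply absorbs one stray byte, which is enough to produce a completely different digest.

```python
import hashlib

# Published SHA3-256 digest of the empty message (standard known-answer vector).
EMPTY_SHA3_256 = "a7ffc6f8bf1ed76651c14756a061d662f580ff4de43b49fa82d80a4b80f8434a"

def passes_known_answer(hash_fn):
    """Return True if hash_fn(b"") produces the published digest."""
    return hash_fn(b"").hex() == EMPTY_SHA3_256

def reference(data):
    # Reference implementation from the Python standard library.
    return hashlib.sha3_256(data).digest()

def buggy(data):
    # Hypothetical bug, for illustration only: one extra byte gets absorbed.
    return hashlib.sha3_256(data + b"\x00").digest()

def differing_bits(a, b):
    """Count the bits that differ between two equal-length digests."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))
```

A single known-answer check is enough to reject the faulty variant, and `differing_bits(reference(b""), buggy(b""))` shows how far apart the two digests end up.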
A new kernel for SHA-3 (Keccak) cryptographic hashing has been committed to the DWScript repository.
It is almost 3 times faster than the Pascal version, makes use of MMX asm, and involved an “ad hoc compiler”.
A trivial way to turn a case-sensitive String hash function into a case-insensitive one is to pass a lower-case (or upper-case) version of the String.
However, in our days of Unicode strings, this is not innocuous…
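A short Python sketch of why lower-casing is not innocuous with Unicode (Python is used here for brevity; Delphi's own case functions may behave differently). The helper names are hypothetical, and CRC32 over UTF-16 merely stands in for whatever case-sensitive hash is being wrapped.

```python
import zlib

def hash_cs(s):
    """Case-sensitive hash: CRC32 of the UTF-16LE bytes (stand-in hash)."""
    return zlib.crc32(s.encode("utf-16-le"))

def hash_ci_lower(s):
    """Naive case-insensitive hash: lower-case first, then hash."""
    return hash_cs(s.lower())

def hash_ci_casefold(s):
    """Variant using full Unicode case folding instead of lower-casing."""
    return hash_cs(s.casefold())

# "straße" upper-cases to "STRASSE", yet lower-casing "STRASSE" gives
# "strasse", so the two case variants lower-case to different strings.
lower_matches = "STRASSE".lower() == "straße".lower()
fold_matches = "STRASSE".casefold() == "straße".casefold()

# Lower-casing can also change the length: U+0130 becomes "i" + U+0307.
dotted_i_lower_len = len("\u0130".lower())
```

Here `lower_matches` is false while `fold_matches` is true, so the naive wrapper hashes two case variants of the same word differently, and the length change means buffers sized from the original string cannot be assumed to fit.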
Following a recent post by A. Bouchez about an optimized CRC32 hash, I took it as an opportunity to re-run a small String Hashing Shootout on the worst hash function collision torture test I know: ZIP codes in UTF-16 (Delphi’s default String format).
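As a rough idea of what such a collision test measures, here is a minimal Python sketch. It is not the shootout's code: the key set (every five-digit string rather than a real ZIP code list), the bucket count, and the byte-sum hash used as a weak baseline are all assumptions made for illustration.

```python
import zlib
from collections import Counter

def bucket_collisions(keys, hash_fn, buckets):
    """Count keys that land in a bucket already used by an earlier key."""
    counts = Counter(hash_fn(k) % buckets for k in keys)
    return sum(c - 1 for c in counts.values())

def byte_sum(data):
    # Deliberately weak baseline hash: sum of all bytes.
    return sum(data)

# Assumed key set: all five-digit strings, encoded as UTF-16LE
# (the in-memory layout of a Delphi UnicodeString).
keys = [f"{n:05d}".encode("utf-16-le") for n in range(100_000)]
BUCKETS = 1 << 17  # assumed table size

crc_collisions = bucket_collisions(keys, zlib.crc32, BUCKETS)
sum_collisions = bucket_collisions(keys, byte_sum, BUCKETS)
```

Digit-only keys in UTF-16 vary in very few bits (every other byte is zero), which is what makes this kind of input hard on weak hash functions: the byte-sum baseline can only produce 46 distinct values here.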
In a Google+ comment to my recent article about inlining in XE6, Leif Uneus posted results from Scimark.
It appears that XE6 is about 30% slower than previous versions (at least XE through XE5) for 32-bit floating point.
Note that Scimark does not make use of inlining, but does make heavy use of floating-point computations, loops and arrays.
Edit: issue discussed here was reported in QC 124652 (now marked as resolved)
First noticed by dewresearch, Delphi XE6 introduced a new optimization for inlined functions that return a floating-point value.
Here is an exploration of what was improved… and what was not improved.
I recently posted about the new Slim R/W Locks introduced with Vista, and how they were vastly more efficient than TMREWS.
Apparently, they’re also more efficient than Critical Sections…
Here are a few findings on Multi-Read Exclusive-Write Synchronizer from a recent upgrade of DWScript’s GlobalVar functions.
I ran some comparisons between a plain Critical Section, Delphi’s TMultiReadExclusiveWriteSynchronizer and Windows Slim Reader/Writer Lock, of which an implementation was added to the dwsXPlatform unit.