- DelphiTools - https://www.delphitools.info -

Optimistic Unicode case-insensitive CompareText

In Unicode Delphi (Delphi 2009 and later), there are two ways of making case-insensitive string comparisons: CompareText [1], which is only case-insensitive in the ASCII range (unaccented characters), and the judiciously misnamed AnsiCompareText [2], which works on the whole Unicode range by calling into the Windows API.
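To make the difference concrete, here is a minimal console sketch, assuming a Windows target and a Unicode Delphi compiler. The program name and variables are only illustrative; the accented letters are written as character codes so the example does not depend on the source file encoding.

```pascal
program CompareTextDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  Upper, Lower: string;
begin
  // "ECOLE" / "ecole" with an accented first letter (U+00C9 / U+00E9)
  Upper := #$00C9 + 'COLE';
  Lower := #$00E9 + 'cole';

  // CompareText only folds 'a'..'z', so the accented letters differ
  Writeln('CompareText equal:     ', CompareText(Upper, Lower) = 0);

  // AnsiCompareText goes through the OS and folds the accented letters too
  Writeln('AnsiCompareText equal: ', AnsiCompareText(Upper, Lower) = 0);
end.
```

The two calls should disagree on this pair of strings, while agreeing on any pure ASCII pair.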

Alas, AnsiCompareText is slow, very slow, as illustrated by TStringList in this post [3] for instance, not just because Microsoft didn’t do a good job of implementing it, but also because the Unicode Consortium [4] did its very best to scatter cased characters across the whole set… In addition, you have locale-specific collation rules to deal with, which have to be applied on top of Unicode.

However, if you’re dealing with primarily Latin-based strings, and looking at case-insensitive matching (rather than ordering), it’s possible to cut down significantly on the cost of AnsiCompareText by replacing it with an “Optimistic UnicodeCompareText”, which uses an ASCII-based strategy like CompareText does, and falls back to the Windows API function when, and only when, a non-ASCII character is encountered.
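The strategy can be sketched as follows. This is not the DWScript implementation: OptimisticCompareText and AsciiUpper are hypothetical names chosen for the example, and the sketch assumes 1-based strings on a Windows target. It is meant for equality tests; when the result is non-zero, its sign follows plain ASCII ordering and may not match the collation order AnsiCompareText would give.

```pascal
program OptimisticCompareDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: upper-cases 'a'..'z' only, leaves everything else alone
function AsciiUpper(C: Char): Char;
begin
  if (C >= 'a') and (C <= 'z') then
    Result := Char(Ord(C) - 32)
  else
    Result := C;
end;

// Hypothetical sketch of an optimistic comparison, returns 0 on a match
function OptimisticCompareText(const S1, S2: string): Integer;
var
  I, N: Integer;
  C1, C2: Char;
begin
  // Only the common length can be compared character by character
  N := Length(S1);
  if Length(S2) < N then
    N := Length(S2);

  for I := 1 to N do
  begin
    C1 := S1[I];
    C2 := S2[I];
    // Non-ASCII character: abandon the fast path and let the
    // Unicode-aware routine compare the two strings
    if (Ord(C1) > 127) or (Ord(C2) > 127) then
      Exit(AnsiCompareText(S1, S2));
    if C1 <> C2 then
    begin
      // ASCII-only case folding, as CompareText does
      C1 := AsciiUpper(C1);
      C2 := AsciiUpper(C2);
      if C1 <> C2 then
        Exit(Ord(C1) - Ord(C2));
    end;
  end;

  // Common prefix matched: the shorter string sorts first
  Result := Length(S1) - Length(S2);
end;

begin
  // Pure ASCII: handled entirely by the fast path
  Writeln(OptimisticCompareText('Hello', 'HELLO') = 0);
  // Accented first letter (U+00C9 vs U+00E9): goes through AnsiCompareText
  Writeln(OptimisticCompareText(#$00C9 + 'te', #$00E9 + 'TE') = 0);
end.
```

For simplicity the sketch hands the whole strings to AnsiCompareText on fallback; passing only the remaining portions would avoid rescanning the ASCII prefix.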

Of course, it won’t help with Cyrillic or Greek, but it will work well for most Western languages, where accented and special characters are infrequent (such as German or French).

You can find such a UnicodeCompareText function in DWScript [5]’s dwsUtils.pas [6] unit. It is used there to add support for Unicode identifiers in DWS without incurring a performance penalty for all the pure ASCII identifiers.