- DelphiTools - https://www.delphitools.info -

Zero-based string indexes?

In a now infamous and enormous thread I won’t name, Allen Bauer [1] dropped a bomb:

<bomb>Oh, and strings may become immutable and 0-based ;-)…</bomb>

Oxygene [2] currently has zero-based strings. I was considering them for DWScript [3] too, but the backward-compatibility implications are a bit too large (yes, we and our customers have many years of accumulated DWS code), and the kind of issues such a change triggers are hard to track, fix or warn about… or are they?

One evolution that is looming (at least for DWS, can’t speak for Delphi) is having methods on base types too. Since these would be new methods with no legacy, a zero-based convention could be introduced there, for instance:

sub := Copy(str, 1, 10); // legacy, 1-based
sub := str.Copy(0, 10); // new, 0-based

As time passes, the functions would be marked as “deprecated”, and code would be migrated over to methods incrementally. The interim period would of course be a messy mix of zero-based and 1-based conventions… not very desirable, but certainly preferable to breaking code in non-obvious ways.
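To make the migration path a bit more concrete, here is a minimal sketch in plain Delphi/Free Pascal. Both LegacyCopy and CopyZeroBased are hypothetical names of mine, standing in for the legacy function and the proposed str.Copy method; nothing here is an actual DWScript API. The “deprecated” directive is what would make the compiler flag the remaining 1-based call sites.

```pascal
program CopyMigrationDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical stand-in for the legacy 1-based function. The directive
// makes the compiler warn at every call site that still uses it.
function LegacyCopy(const s: string; index, count: Integer): string;
  deprecated;
begin
  Result := Copy(s, index, count);
end;

// Hypothetical stand-in for the proposed 0-based str.Copy method.
function CopyZeroBased(const s: string; index, count: Integer): string;
begin
  Result := Copy(s, index + 1, count); // shift to the native 1-based index
end;

begin
  WriteLn(LegacyCopy('hello world', 1, 5));    // hello (with a compiler warning)
  WriteLn(CopyZeroBased('hello world', 0, 5)); // hello
  WriteLn(CopyZeroBased('hello world', 6, 5)); // world
end.
```

The point is that the warning is attached to the old name rather than to the index value, so the compiler can point at every call site that needs a look.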

One hard case (with no easy compiler warning) would remain: indexed character access, like “str[i]”. I can think of only one safe way around that one: not having a default array property. That could however be leveraged to gain something, for instance:

char16 := str.Char16[i]; // equivalent to old str[i]
code := str.Code16[i]; // equivalent to old Ord(str[i])
charStr := str.Char[i]; // new, retrieves the whole character (one or more Char16)
codePoint := str.Code[i]; // new, retrieves the whole unicode codepoint

The Xxx16 versions would return a WordChar, equivalent to a current Char, and only capable of holding a character from the BMP [4]. The Xxx versions would return a String (a whole Unicode character/codepoint) or a UTF-32 code.
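To illustrate the difference between the two flavors, here is a rough sketch of what a Code[i] accessor could do on top of UTF-16 storage. CodeAt is a hypothetical helper name, the 0-based index is the convention assumed above, and the handling of unpaired surrogates (returned as-is) is just one possible choice.

```pascal
program CodePointDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical stand-in for the proposed str.Code[i]: returns the whole
// code point that starts at 0-based UTF-16 index i.
function CodeAt(const s: UnicodeString; i: Integer): Cardinal;
var
  lead, trail: Cardinal;
begin
  lead := Ord(s[i + 1]); // native strings are still 1-based here
  Result := lead;        // BMP character or unpaired surrogate: returned as-is
  if (lead >= $D800) and (lead <= $DBFF) and (i + 2 <= Length(s)) then
  begin
    trail := Ord(s[i + 2]);
    if (trail >= $DC00) and (trail <= $DFFF) then
      Result := $10000 + ((lead - $D800) shl 10) + (trail - $DC00);
  end;
end;

var
  s: UnicodeString;
begin
  // 'a' followed by U+1D11E (musical G clef), stored as a surrogate pair
  s := 'a';
  s := s + WideChar($D834) + WideChar($DD1E);
  WriteLn(Length(s));    // 3 UTF-16 code units
  WriteLn(Ord(s[2]));    // 55348 = $D834, what Code16[1] would return
  WriteLn(CodeAt(s, 1)); // 119070 = $1D11E, what Code[1] would return
end.
```

Note that the index is still counted in UTF-16 units in this sketch; whether Char[i] and Code[i] should index by code unit or by code point is a separate design question.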

Comments? Other Ideas?