- DelphiTools - https://www.delphitools.info -

UTF-8, UTF-16 or both? (poll)

The FreePascal [1] version of DWScript [2] has been stalled for a little while on the incomplete UnicodeString (utf-16) support, among other things.

It’s hard to blame the FreePascal team for that, given that Linux is primarily utf-8, and that utf-8 has quite a few advantages [3] over utf-16.

utf-16 is an historical quirk

In short, utf-16 was designed in an era when 65536 characters were thought to be good enough for everyone, but it didn’t quite turn out that way [4], as Verity Stob recounts. utf-16 is just as variable-length as utf-8 in the modern world, where a fair share of people [5] use alphabets with many glyphs. On the other hand, utf-16 became a de-facto standard in many languages and platforms (Java, .Net, JavaScript, Delphi since 2009, etc.) despite its many quirks, from giving a false sense of security with characters to being exposed to endianness issues [6]. And utf-16 isn’t even saved by non-latin content: just go to any Chinese text-heavy webpage and compare the utf-8 and utf-16 sizes. Punctuation and markup put utf-8 ahead.
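To make the size and variable-length points concrete, here is a minimal sketch in Python (not DWScript or Pascal, and not taken from the DWScript sources). The helper name `encoded_sizes` and the sample strings are made up for illustration; only the standard `str.encode` call is used.

```python
# Illustrative sketch: compare the storage a string needs in utf-8 and utf-16.
# encoded_sizes and the samples below are hypothetical, chosen for this example.

def encoded_sizes(text: str) -> dict:
    """Return the code point count plus utf-8 and utf-16 sizes for text."""
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")  # explicit byte order, so no BOM is added
    return {
        "code_points": len(text),
        "utf8_bytes": len(utf8),
        "utf16_bytes": len(utf16),
        "utf16_units": len(utf16) // 2,  # one utf-16 code unit is 2 bytes
    }

samples = {
    "latin letter": "A",
    "CJK ideograph": "\u4e2d",                 # U+4E2D
    "emoji beyond the BMP": "\U0001F600",      # U+1F600, needs a surrogate pair
    "Chinese text in markup": "<p>\u4e2d\u6587\u3002</p>",
}

for label, text in samples.items():
    print(label, encoded_sizes(text))

# Endianness: the same code point serializes to different byte sequences.
print("A".encode("utf-16-le"), "A".encode("utf-16-be"))
```

The emoji is a single code point but takes two utf-16 code units, so indexing by utf-16 unit does not index by character. The CJK ideograph alone is smaller in utf-16 (2 bytes against 3), but once it sits inside ASCII markup, as in the last toy sample, utf-8 comes out smaller (16 bytes against 20).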

DWScript situation

DWScript’s String type is currently utf-16, in Windows and in Smart [7], but I’m wondering whether to allow it to be utf-8 instead on some targets (for FreePascal & Linux). DWScript doesn’t have a distinct utf-16 character type; characters in DWScript are either Unicode code-points (utf-32) or small strings (to accommodate Chinese characters). While this would “fork” the language, the effects would be restricted to code that

All those cases are probably quite low-level. The rest of the code would remain utf-8/utf-16 agnostic.

So what do you think?

DWScript utf-8 and/or utf-16 Strings?

  • Stick to utf-16 everywhere (lower performance in Linux) (29%, 36 Votes)
  • Fork the language (use what's best on each platform) (56%, 71 Votes)
  • I have no idea what this all means (15%, 19 Votes)

Total Voters: 126


Delphi mobile compilers

Note that I am aware that the new Delphi mobile compilers dropped UTF8String [8] support (leading to ugly marshalling and performance issues [9]), but they have priced themselves out of the market as far as I’m concerned, and the reliance on FMX is either problematic or an extra cost [10].

So non-HTML5 mobile support for DWScript is more likely to come through FreePascal than Delphi at the moment.
