iif… anonymous expression parameters?

While making the rounds of “compiler magic” functions, I bumped on iif, the ternary operator, which f.i. Prism, VB and other support. Which looks like:

function iif (boolean; expressionIfTrue; expressionIfFalse) : value;

One part of the magic goes to the type-checking, the other part which interests me here, is that in a regular function call, all parameters are evaluated once before the call is made.
For iif, either expressionIfTrue or expressionIfFalse is evaluated, but not both, this means you can have such code as:

v := iif( Assigned(myObject); myObject.VirtualGetMethod; zero );

While with a regular function (without compiler magic), if myObject isn’t assigned, you would get an exception, as myObject.VirtualGetMethod would be invoked regardless. There are obviously countless other uses when the expressions have side-effects.

It occurred to me that in DWS, that “magic” is not only already available to internal “magic functions”, but that it could also be easily offered in the language and made no longer magic at all. It could just be another call convention, in which you wouldn’t pass a value or a reference, but something akin to a “light” anonymous expression parameter.

Would it be worth it?

Such a parameter could be qualified with a uses for instance (to reuse the keyword) rather than a var or const.

function iif( bool : Boolean; uses ifTrue, ifFalse : Variant) : Variant;
begin
   if bool then
      Result := ifTrue
   else Result := ifFalse;
end;

Would declare the iif “magic” function on variants.

Nothing would limit you to invoke a uses expression only once, so f.i.

procedure PrintNFloats(n : Integer; uses needFloat : Float);
begin
   while n>0 do begin
      Print(needFloat);
      Dec(n);
   end;
end;

PrintNFloats(10, Random); // will print 10 different random numbers

And you could use the caller’s capture for side-effects, f.i. by combining a var parameter and a uses expression parameter making use of that variable.

procedure SkipEmpty(var iter: Integer; maxIter: Integer; uses needStr: String);
begin
   while (iterator<=maxIterator) and (needString='') do
      Inc(iterator);
end;
...
SkipEmpty(iter, sl.Count-1, sl[iter]);  // with a TStrings
SkipEmpty(iter, High(tbl), tbl[iter]);  // with an array

Contrary to anonymous functions, the capture is thus explicitly declarative, and also hierarchical only (it’s only valid in functions directly called from your functions). That’s a drastic limitation, so such a syntax isn’t intended for out-of-context tasks (like closures), but for local sub-tasks, which you also guarantee will be local (something that anonymous methods can’t guarantee).

And as a final sample, in the exemple above if you want to equate the ‘hello’ and ‘world’ strings to an empty string for SkipEmpty, you could use:

SkipEmpty(iter, sl.Count-1,
          iif(sl[iter] in ['hello', 'world'], '', sl[iter])
          );

You could thus chain the expression parameters to introduce some non-traditional (for Delphi code) behaviors.

All in all, this could cover a variety of scenarios for default values, conditional alternatives, iterators, with a much restricted capability compared to full-blown anonymous methods, but with hopefully less scope for confusion than anonymous methods offer. But still, it would introduce the possibility of complex side-effects.

Any opinions? Should the possibility be surfaced or be kept only as an internal magic?

Post Scriptum:

As Craig Stuntz & APZ noted in the comments, this is similar to Digital Mars D’s lazy parameters, and both suggested using the “lazy” keyword in place of “uses” to match. However, lazy evokes more delayed evaluation, but evaluated once (as in “lazy binding”, etc.), something D doesn’t seem to support AFAICT with the lazy keyword (every use of a parameter leads to an evaluationif I’m not mistaken). While “uses” was to indicate you could “use” the parmeter’s underlying expression, as many times as you wanted to. More input welcome 🙂

Goodies from the SVN side for DWScript

Here is a summary of the recent changes:

  • support for step in the for loop structure (step value must be >=1, enforced at compile time for constants, at runtime for dynamic values). Note that step is only a contextual keyword, and not a reserved word (you can still have variables named “step” f.i.).
    for i := 0 to 10 step 2 do ...
    for i := 100 downto 1 step 3 do ...
  • support Prism-like exit syntax, ie. the following statements are now equivalent:
    Result := value; exit;
    Exit( value );
    exit value;
  • support not in form for the in operator, both following expressions are equivalent:
    value not in [...alternatives...]
    not (value in [...alternatives...])
  • break/continue outside of a loop are now detected at compile-time.
  • improved infinite loop detection: compiler will now warn about more cases of while/repeat with constant conditions and no inner break/exit.

The first elements to support language extensions are also in, more details coming in a future post!

All the Delphi arrays…

Delphi arrays have a few quirks (as mentionned here on TURBU f.i.), which arise from there being actually four different types arrays in Delphi, with limited interoperability:

  • array [low..high] of TSomeType: the bounded array, a value type, useful for structures, fixed-size vectors & matrices, and also piggybacked by the constant arrays.
  • array of TSomeType: the dynamic array, a reference type, almost all the time, but not strictly usable as a reference type (see below).
  • array of const: the open array, actually an array of TVarRec managed with compiler magic.
  • String/AnsiString: arrays of Char/AnsiChar, behaves like a value type, implemented as a reference with copy-on-write.

Dyanamic array half-bakedness

Thing is, all of these arrays have pitfalls of their own, the dynamic array f.i. is a classical trap, and you don’t really want to use it as anything else but a field or local variable. Why? Because it is a reference type which the RTL treats as a value type, f.i.

    var a, b : array of Integer;
    SetLength(a, 1);
    a[0] := 1;
    b := a;
    b[0] := 2; // after that a[0] is 2
    SetLength(a, 2); // that effectively decouples a and b
    a[0] := 3; // after that b[0] is still 2

In other words, the dynamic arrays stays a reference as long as you don’t change its size… If you wonder why using TBytes for holding binary buffer is frowned upon, that is one of the reasons. Things quickly turn to mush if pass a dynamic array around functions (think Java-like deep vs shallow copy mess).

This could all be solved and improved by providing RTL functions that would treat dynamic arrays as a reference type, and ideally by having a new flavor of dynamic arrays with copy-on-write semantics (like what String does for Char, but that would allow any element type).
Though arguably, once you have that new dynamic array, the old dynamic array as a reference type only value would be useful for optimization (to bypass the reference count check when writing to the elements of the array).

Array interoperability

Another issue, is that there is little to no cross-array type support, f.i. you can’t initialize a dynamic arrays by assigning to it a constant array f.i., or have inline or default values for typed array parameters, etc.

Why am I mentioning all this? Because I’m currently investigating array support in DWS. Currently DWS doesn’t have dynamic arrays (apart from COM variant arrays), that isn’t too problematic in a scripting language, as you usually don’t code basic containers script-side, and once you have various base TXxxList, you don’t really “need” dynamic arrays, but still, they would be convenient is a variety of situations.

Having more “cooperative” array types is high on the DWS priority list, as uncooperative array types are a recurring PITA in Delphi. Ideally, this cooperativeness would extend to containers (being able to initialize a dynamic array from an enumerator, etc.). But I also don’t want to fork a whole new incompatible Pascal language branch on that aspect, just for DWS.
Hopefully, there will be some debate on this issue, and good ideas may be flown around. ^_^

DWScript conditional compilation directives

Current SVN version of Delphi Web Script now supports the following directives (all new but include and filter):

  • $I, $INCLUDE, $F, $FILTER: include specified file (which can be on disk or come through the virtual file system), the FILTER variants will include the file after filtering it.
    {$INCLUDE 'mysource.inc'}
  • $DEFINE, $UNDEF: define and un-define a conditional.
    {$DEFINE SPECIAL_CODE}
  • $IFDEF, $IFNDEF, $ELSE, $ENDIF: allow specifying conditionally compiled blocks.
    {$IFDEF SPECIAL_CODE}
    ...special code here...
    {$ELSE}
    ...not so special code here...
    {$ENDIF}
  • $HINT, $WARNING, $ERROR, $FATAL: output a custom compiler hint, warning or error message.
    {$HINT 'That is not wise...'}
    {$WARNING 'You should NOT be doing that!'}
    {$ERROR 'Do not do that. Period.'}
    {$FATAL 'After that, I won''t even try to compile your code!'}

Conditionals are case-insensitive names, conditional blocks can be nested.
Default conditionals when compiling can be adjusted via a new property in the TdwsConfiguration.

edit: added $FATAL

Faster, smaller, safer

Here is a summary of recent changes for DWScript, available in the SVN version:

  • faster: compilation is now about 30% faster in situations like that benchmark, thanks to a few bug fixes (typechecks were performed multiple times) and a couple tokenizer enhancements.
  • smaller: reduced memory usage for compiled scripts (about 15% in the infamous benchmark, which translates in an execution speedup of around 5% once a few hundred lines are involved).
  • safer: fixed an old issue with object reference cycles, which weren’t covered by the reference counting and thus could be leaked.
  • basic support for “for..in”, at the moment limited to the case of enumerations (for <element> in <enumeration>).
  • fixed compile error source locations for some issues, plus various minor fixes and enhancements.

DWScript preview 4

Last DWS preview zip was already a while back, so I posted a new preview zip of Delphi Web Script for the SVN-averse.
For the changes since the previous zip, you may want to check here, here and here), as while you’re at it, you may as well check the following:

  • reactivated the COM Connector, which allows to connect to arbitrary COM/OLE objects from within a script. More tests are required.
  • introduced initial support for the “in” operator, at the moment, it merely allows use of the “case..of” syntax in boolean expressions, with no specific optimizations f.i.
    if (n in [4, 6..8, 10]) then ...
    if (s in ['abc', 'def']) then ...
  • open arrays can now be declared in script functions, as “const someName : array of const” parameters. They behave like arrays of variants from a type-checking point of view.
  • added basic support for default values in the shorthand notation used to register internal functions, this is used for Inc and Dec at the moment.
  • introduced constant unification, this is both a memory and performance optimization, it’s still largely experimental at the moment.
  • various other optimizations and bug fixes.

Thanks go to Alexey Kazantsev for the testing efforts!

Optimizing for memory: a tighter TList

One of the memory hogs when you have object trees or graphs can be good old TList and its many variants (TObjectList, TList<T>, etc.).

This is especially an issue when you have thousandths of lists that typically hold very few items, or are empty.
In the DWS expression tree f.i. there are quickly thousandths of such lists, for local symbol tables, parameter lists, etc.

How much does a TList cost in terms of memory?

A TList holding a single item already costs you:

  • 4 bytes for the field in the owner object
  • 20 bytes for the TList instance
    • 8 hidden bytes: Monitor + VMT pointer
    • 12 field bytes: data pointer + Count + Capacity
  • 4 bytes for the data

So that’s 28 bytes, with two dynamically allocated blocks which, and those dynamic allocations, depending on your allocation alignment settings, can cost you something like an extra dozen of bytes (even with FastMM).

What about the other TList variants?

  • TObjectList has an extra boolean field, which with alignment, costs an extra 4 bytes.
  • TList<T> has an instance size of 28 bytes, and a dynamic array storage with 8 hidden extra bytes (4 for length, 4 for the refcount).

Neither of these are better candidates for memory-efficient small lists.

Enter the TTightList

You can find TTightList in DWS’s dwsUtils unit.
For an empty list, the cost is 8 bytes, no dynamic memory, and for a list with a single item, 8 bytes still with no dynamic memory.
For a n-items list, the cost is 8 bytes plus one n*4 bytes dynamic block.

To achieve that, the TTightList makes use of two tricks:

  • it’s designed to be composed, and hosted as an object field
    • it’s a record-with-methods, not a class, but retains classic-looking use semantics (.Add, .IndexOf, .Clear etc.).
    • eliminates the need for a pointer to the instance in the host object
    • eliminates the dynamically allocated storage for the TTightList itself
  • if the Count is one, the array pointer itself points to the only item, rather than to a dynamically allocated block holding a pointer to the item.

The second trick is where we sacrifice a bit of performance, to save one dynamic allocation for lists holding a single item. Though if you benchmark the TTightList, you’ll see it holds its own fairly well against TList for the smaller item counts, which is what it was designed for.
That’s partly thanks to TList‘s own inefficiencies, and FastMM’s in-place reallocation (on which TTightLight relies, since it doesn’t maintain a capacity field).