Last update September 23, 2012

DIP19: Remove comma operator from D and provision better syntactic support for tuples

DIP: 19
Title: Remove comma operator from D and provision better syntactic support for tuples
Version: 1
Status: Draft
Created: 2012-09-23
Last Modified: 2012-09-23
Author: Andrei Alexandrescu
Links: Archive?

Abstract

The comma operator that D inherited from C reserves a highly convenient syntactic construct for a feature (sequence expressions) that is of obscure utility and precludes more useful semantics such as value tuple literals and types. This DIP proposes removal of the comma expression while keeping its more frequent uses (such as in a for statement) intact by modifying the grammar.

Rationale

Virtually all of the rationale for removing the comma operator is tied to freeing comma-separated lists for better syntactic integration of tuples.

Currently, support for value tuples in D is provided via the library type constructor Tuple and the helper function tuple, both defined in std.typecons. For example, to define a function that both accepts and returns a tuple, one would write:

Tuple!(double, double) fun(Tuple!(int, int) a, double b) {
    return tuple(a[0] / b, a[1] / b);
}

To invoke the function above, user code would need to create a tuple, possibly by using the tuple function itself:

auto result = fun(tuple(1, 2), 2.0);

To pass the result of fun to another function that accepts two doubles, one would need to deconstruct the tuple by using the expand property like this:

void gun(double x, double y) { ... }
gun(fun(tuple(1, 2), 2.0).expand);
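The snippets above can be assembled into one small program. This is only a sketch: the body of gun (a plain sum) and the divisor 2.0 are assumptions chosen so that the results are easy to check, and they are not part of the proposal.

```d
import std.typecons : Tuple, tuple;

// Divides both fields of the incoming tuple by b and returns the quotients.
Tuple!(double, double) fun(Tuple!(int, int) a, double b) {
    return tuple(a[0] / b, a[1] / b);
}

// Hypothetical consumer of two doubles; the sum is only for illustration.
double gun(double x, double y) {
    return x + y;
}

void main() {
    // The tuple is built explicitly with the library helper.
    auto result = fun(tuple(1, 2), 2.0);
    assert(result[0] == 0.5 && result[1] == 1.0);

    // .expand turns the tuple into two separate arguments.
    assert(gun(result.expand) == 1.5);
}
```

Without the trailing .expand, the call to gun would not compile, because gun takes two doubles rather than one Tuple!(double, double).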

Requiring explicit deconstruction sets D apart from languages in which deconstruction is implicit. This is arguably a good decision for D because implicit deconstruction would make it difficult to analyze which functions get called in which contexts, given overloading. Essentially, with implicit deconstruction a call such as gun(fun(tuple(1, 2), 2.0)) may call a function whose arity cannot be determined without knowing all overloads of fun and gun. With explicit deconstruction, tuples are trafficked as first-class values all the time, and "explode" into their constituents only when a trailing .expand is present.
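To make the overloading argument concrete, here is a minimal sketch with two hypothetical overloads of gun, one taking a whole tuple and one taking two doubles. The overloads and their return strings are assumptions made for illustration. With explicit deconstruction, the call site alone determines which overload is chosen.

```d
import std.typecons : Tuple, tuple;

// Hypothetical overload that receives the tuple as a single value.
string gun(Tuple!(double, double) t) {
    return "one tuple";
}

// Hypothetical overload that receives the two fields separately.
string gun(double x, double y) {
    return "two doubles";
}

void main() {
    auto t = tuple(0.5, 1.0);

    // No .expand: the tuple travels as one first-class value.
    assert(gun(t) == "one tuple");

    // Trailing .expand: the tuple is split into two arguments.
    assert(gun(t.expand) == "two doubles");
}
```

If deconstruction were implicit, the first call could plausibly match either overload, which is the analysis problem described above.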

The library type Tuple obeys the normal rules of D structs. This has been a mixed blessing. On the plus side, implementing a useful abstraction with nothing but the ordinary power of the language is a healthy case of eating one's own dog food. On the negative side, new users often start with the expectation that tuples are supposed to be somehow special in providing magic not afforded to library constructs, such as special syntax for tuple types and tuple literals.

Inventing new tuple-related syntax and capabilities has been a popular topic on D forums. Most often, the syntax (T1, T2, T3) is suggested as most natural for describing a tuple type containing one value for each of the types T1, T2, and T3. Also, the syntax (expr1, expr2, expr3) is suggested for representing tuple literals. These constructs are made inaccessible by the comma operator, which already assigns to (expr1, expr2, expr3) the meaning "evaluate each expression in sequence and yield the result of expr3 as the expression's result".

This DIP mainly intends to deprecate this arguably useless meaning of a comma-separated expression list, without prescribing a specific line of action for tuples. Nevertheless, the design space opened by the deprecation of comma expressions is worth a thorough look. To explore a step further, assuming the syntactic constructs just mentioned, the code above could be written as:

(double, double) fun((int, int) a, double b) {
    return (a[0] / b, a[1] / b);
}
...
auto result = fun((1, 2), 2.0);
...
void gun(double x, double y) { ... }
gun(fun((1, 2), 2.0).expand);

(Explicit expansion has been kept, for it is arguably useful.) The code is shorter thanks to the new types and literals. It could be argued, however, that the new notations introduce their own issues. For example, tuples of one element are problematic. If (1, 2) is a tuple with type (int, int), then it may be expected that (1) should be a tuple with type (int) (i.e. a tuple with only one int), and by simple composition ((1)) should have type ((int)) (i.e. a tuple with only one element, which in turn is a tuple with one int). Ascribing this meaning to parentheses is problematic because parentheses already serve for grouping in expressions. To fix that, a possible solution would be to require a trailing comma in tuples with only one element. As such, (1, ) would be a tuple containing only one int, and ((1, ), ) would be a tuple containing a tuple of one int.

Similarly, empty tuples occur naturally in code that handles tuples of arbitrary size by recursively reducing their size. The empty tuple could not be () because of various potential confusions, so a possibility would be to use the comma again for the syntax (,).

At this point, it is worth critically reevaluating the entire approach to tuple literals as a significant piece of syntactic machinery that adds a variety of new constructs and rules for an increasingly doubtful benefit. Revisiting the existing syntax makes it shine by comparison: tuple(1), tuple(tuple(1)), and tuple() are simple and self-explanatory constructs that don't need any special syntactic or semantic support.
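The three library spellings mentioned above can be checked directly. The sketch below only assumes the std.typecons helpers already discussed; the variable names are illustrative.

```d
import std.typecons : Tuple, tuple;

void main() {
    auto one    = tuple(1);         // tuple with a single int
    auto nested = tuple(tuple(1));  // tuple holding a one-element tuple
    auto empty  = tuple();          // tuple with no fields

    // Each spelling maps to an ordinary instantiation of Tuple.
    static assert(is(typeof(one) == Tuple!int));
    static assert(is(typeof(nested) == Tuple!(Tuple!int)));
    static assert(is(typeof(empty) == Tuple!()));

    assert(one[0] == 1);
    assert(nested[0][0] == 1);
}
```

None of these cases needs a trailing comma or a dedicated empty-tuple token, which is the point of the comparison.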

Similar syntactic issues affect writing tuple types. There must be a way to express tuple types with only one type. In that space, arguably (int) and ((int)) may be more palatable syntax because extra parens are traditionally not ignored in types as they are in expressions. Still, using () for the empty tuple type may be ambiguous or at least confusing.

Getting back to the current state of the art, there are a few imperfections of the current library implementation that ask for special language support. One is slicing. If t has type Tuple!(int, string, double), then t[0 .. 2] is supposed to yield Tuple!(int, string). However, that's not the case. To illustrate, the code below compiles but its assertion fails:

import std.typecons;
void main() {
    Tuple!(int, string, double) t1;
    Tuple!(int, string) t2 = t1[0 .. 2];
    auto t3 = t1[0 .. 2];
    assert(is(typeof(t2) == typeof(t3)));
}

To investigate further, code that prints the types of t2 and t3 could be added to main (together with import std.stdio; for writeln):

    writeln(typeid(t2));
    writeln(typeid(t3));

Surprisingly, the first line compiles but the second doesn't, with the error message "Error: no type for typeid(t3)". This is highly irregular, because t3 is a value and all values are supposed to have a type and allow typeid applied to them. To investigate further, the code above could be replaced with

    writeln(typeof(t2).stringof);
    writeln(typeof(t3).stringof);

This code compiles and runs, printing Tuple!(int,string) for t2 and then (int, string) for t3. The latter type name is synthesized by the compiler but does not have a correspondent in D source code. Similarly, typeof(t1[0 .. 1]).stringof is (int) and typeof(t1[0 .. 0]).stringof is ().

This anomaly arises because the slicing operator applies not to the Tuple!(int, string, double) structure, but to its internally held built-in tuple, which is exposed through alias this. There are no simple and obvious ways to resolve this anomaly. One option would be to forgo the slicing operator and use a template method instead, replacing t1[0 .. 2] with t1.slice!(0, 2). This is entirely possible as an engineering tradeoff, but hardly a step forward toward improving the syntactic convenience of tuples.

It should be mentioned that this irregularity with slicing is rarely noticeable and easy to work around. The result of a slice is readily accepted by tuple and converted to the expected tuple; writing tuple(t1[0 .. 2]) yields an object with type Tuple!(int, string) initialized appropriately. Also, this definition works:

Tuple!(int, string) t2 = t1[0 .. 2];

although why that works is unclear, since substituting t1[0 .. 2] with 1, "a" or (1, "a") does not compile.
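The workaround of passing the slice back through tuple can be sketched as follows. The helper name firstTwo and the sample values are assumptions made for illustration; the sketch relies only on the slicing behavior described above.

```d
import std.typecons : Tuple, tuple;

// Hypothetical helper: rewraps the built-in slice into a library Tuple.
Tuple!(int, string) firstTwo(Tuple!(int, string, double) t) {
    return tuple(t[0 .. 2]);
}

void main() {
    Tuple!(int, string, double) t1 = tuple(1, "a", 2.5);

    // Wrapping the slice with tuple() yields a regular Tuple again.
    auto t2 = tuple(t1[0 .. 2]);
    static assert(is(typeof(t2) == Tuple!(int, string)));
    assert(t2[0] == 1 && t2[1] == "a");

    assert(firstTwo(t1) == t2);
}
```

The extra call is small, but it has to be remembered at every slice whose result is stored with auto.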

Description

This proposal eliminates the grammar for expr1, expr2 [ , expr3 ]*. However, it does not eliminate, at least for the time being, the syntax used inside the for statement. This is because the use of comma in the increment expression of the for statement is the most frequent use of the comma operator. It is also confined enough to be converted safely into punctuation specific to that statement. Under this DIP, the grammar of for changes from:

for ( for-init-statement ; expr ; expr ) statement

to

for ( for-init-statement ; expr ; expr [ , expr ]*) statement
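As an illustration of the use this grammar is meant to preserve, here is a sketch of a loop that advances two indices in its increment part. The function name reverseInPlace is hypothetical. Note that the initializer is a declaration of two variables, not a comma expression, so the only comma-separated expressions are in the increment position.

```d
// Hypothetical example: reverses an array by walking inward from both ends.
void reverseInPlace(int[] a) {
    if (a.length < 2) return;  // nothing to swap; also avoids length - 1 on empty input
    for (size_t i = 0, j = a.length - 1; i < j; ++i, --j) {
        immutable tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}

void main() {
    auto a = [1, 2, 3, 4, 5];
    reverseInPlace(a);
    assert(a == [5, 4, 3, 2, 1]);
}
```

Under the proposed grammar the comma in ++i, --j would be punctuation belonging to the for statement rather than an operator, and the loop would keep its current meaning.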

Backward compatibility

This DIP breaks existing code that currently uses the comma operator (except for code in for statements as noted). It could also potentially change the semantics of code silently. Consider:

auto fun(int i, int j) {
    return (++i, j + 2);
}

Under the current rules fun returns int, but under the new rules it returns a tuple containing two ints.
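Code affected in this way can be migrated ahead of time by splitting the sequence into statements. The following is one possible rewrite of fun, given as a sketch; it assumes the intent was to keep the int result, and it spells out the return type so that a silent change in meaning becomes impossible.

```d
// Same observable result as the comma-expression version under the current
// rules, written without the comma operator and with an explicit return type.
int fun(int i, int j) {
    ++i;           // formerly the left operand of the comma expression
    return j + 2;  // formerly the right operand, i.e. the value of the expression
}

void main() {
    assert(fun(1, 2) == 4);
}
```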

Usage

Usage of this DIP entails forgoing use of the comma operator. There is arguably no case in which the comma operator is vital to some coding idiom or style.

Copyright

This document has been placed in the Public Domain.
