Last update January 19, 2007

Bill Baxter



I'm mostly a C++ guy, though I've recently started doing some Python too, mostly because I've grown weary of programming in C++. But Python isn't fast enough for all my needs. That's why I'm excited about D: the speed of C++ with something like the elegance of Python.

One thing that surprised me going to Python: I thought being freed from the compile-debug cycle would be liberating. It's not. Running Python programs under a debugger can take as long as compiling and running a smallish C++ program, and the biggest problem is that with dynamic typing you don't find out about even the dumbest of programming errors until you actually run the code. You spend 5 minutes getting your program to the state in which it used to crash to check your fix and then -- what? What do you mean there's no variable named "xCoord"? Oh wait, I wrote "xCooord", duh. Maybe there are some lint-like tools to help with this stuff, but it's not the default.

With a statically typed language like C++ or D, you catch most of the stupid bugs at compile time. If it compiles, it actually has a chance of being correct. With Python you really have to write unit tests aggressively, because if you haven't exercised a particular code path, there's a very good chance it contains an error as basic as a misspelled name. It's good practice to write unit tests for compiled languages too, but they just aren't as critical as they are with a language like Python. D also makes writing unit tests very easy.
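
As a rough illustration of how little ceremony a D unit test needs, here is a minimal sketch. The function clampCoord is a hypothetical helper invented for this example. A misspelling such as xCooord inside it would be rejected by the compiler before any test ran.

```d
// Build and run with: dmd -unittest -main -run example.d
// clampCoord is a hypothetical helper, used only for illustration.
int clampCoord(int xCoord, int lo, int hi)
{
    if (xCoord < lo) return lo;
    if (xCoord > hi) return hi;
    return xCoord;
}

// The test sits right next to the code it checks.
unittest
{
    assert(clampCoord(5, 0, 10) == 5);
    assert(clampCoord(-3, 0, 10) == 0);
    assert(clampCoord(42, 0, 10) == 10);
}
```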

D's Biggest Triumphs

This is a list of what I see as D's best features. These are things that either I have frequently wished that C++ had, or things about C++ that annoy me and which D gets right.

  • Elimination of header files (unless you want them)
  • Delegates (both versions)
    • block-of-code delegates and code-block literals (essentially anonymous function literals, aka lambda expressions)
    • member function pointer + bound object instance delegates
  • Inner functions
  • Built-in dynamic arrays (a la std::vector)
  • Built-in associative arrays (a la hash_map, which isn't actually standard yet)
  • Memory management (a.k.a. built-in garbage collection)
  • 'static if' (for template metaprogramming)

D Inconsistencies

This is a collection of some inconsistencies, unnecessary special-cases, or missed opportunities in D. Eliminating these inconsistencies would make D a more beautiful, easy-to-learn, and easy-to-use language. Each is rated with my subjective opinion of how big a deal it is.

[major] Array [0..length] / [0..$] syntax

This is special-cased for arrays. Doesn't work for classes with opSlice/opSliceAssign overloads. It should work for any class that has both an opSlice and a length property/method. In other words, identifier[x..length] and identifier[x..$] should behave identically in all situations to identifier[x..identifier.length].

Andrei Alexandrescu started a thread about this on the D.announce mailing list recently (digitalmars.D.announce:6412). Basically he said the same thing. Additionally, he has a nice idea to make $ into a general nullary or unary operator, so you can also say $id to mean id.length. That's nice, but I think it would be nicer to make $ a postfix unary operator rather than a prefix one, so that id$ means id.length. Then you can really think of the '$' as just a textual substitute for '.length'. I think it also generalizes better if .length is overloaded to return an array or a function: id$(3) or id$[2] look less ambiguous to me than $id(3) or $id[2], even if the precedence of $ is set higher than that of [] or ().
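
To make the asymmetry concrete, here is a minimal sketch of a user type that offers both opSlice and a length method. IntList is a hypothetical wrapper invented for this example, and the code is written for a current D compiler. The user type spells out its own length when slicing, while the built-in array underneath can use the shorthand.

```d
// Build and run with: dmd -unittest -main -run example.d
// IntList is a hypothetical wrapper type, used only for illustration.
struct IntList
{
    int[] data;

    size_t length() const { return data.length; }

    // Invoked for list[lo .. hi].
    int[] opSlice(size_t lo, size_t hi) { return data[lo .. hi]; }
}

unittest
{
    auto list = IntList([10, 20, 30, 40]);

    // The user type names itself a second time to reach its length...
    assert(list[1 .. list.length] == [20, 30, 40]);

    // ...while the built-in array gets the shorthand.
    assert(list.data[1 .. $] == [20, 30, 40]);
}
```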

[semi-major] Length vs Size

In D, the .length property is associated with the number of elements in an array. This is great for linear containers like single-dimensional arrays, but it doesn't generalize well. For instance, associative arrays also have a .length property giving the number of elements, but the way AAs are stored is non-linear. A "length" is something you can measure going from point A to point B. For an array or a string that makes sense, because all the elements are lined up in memory one after the other. But for an associative array, where items are scattered semi-randomly in a hash table/binary tree data structure, the term 'length' doesn't make much sense. Using length is fine for regular linear arrays, but it should be an alias for a more general term like .size. C++ and STL got that one right. 'Size' makes sense for just about any container you could imagine. 'Length' does not.

[major] Arrays and toString

Arrays are recognized as a special case by std.format right now, making writefln(anArray) work. But arrays should provide a ".toString()" method just like classes do instead of relying on a special-case means of conversion.

Consider a class that serves as a light wrapper around an array. It wants to have a toString method whose output is formatted the same as the array's. It should be able to just forward its toString method to the array:

  char[] toString() { return m_array.toString(); }
But instead the user has to be aware that the formatting of arrays is something special built into std.format, not the array itself. Arrays could present themselves to users like normal objects, but they don't, meaning special cases are necessary when handling arrays together with objects that represent collections.
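
A minimal sketch of what the wrapper ends up doing instead: it routes through the formatter rather than forwarding to the array. IntBag is a hypothetical type invented for this example, and the code assumes a current D compiler, where toString returns string rather than char[].

```d
// Build and run with: dmd -unittest -main -run example.d
import std.format : format;

// IntBag is a hypothetical thin wrapper around an array.
struct IntBag
{
    int[] m_array;

    // The array has no toString to forward to, so ask the
    // formatter to apply its built-in array formatting instead.
    string toString() const
    {
        return format("%s", m_array);
    }
}

unittest
{
    auto bag = IntBag([1, 2, 3]);
    // The wrapper prints exactly as the bare array would.
    assert(bag.toString() == format("%s", [1, 2, 3]));
}
```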

[minor] Delegate literals not interchangeable with delegate variables

(Taken from this posting by BCS.)

Calling a free function on a static array variable, using member-call syntax, is allowed:

  void foo(int[] ar){}
  ...
  static int[] i = [1,2,3];
  i.foo();
And so is making a delegate variable out of an object/method pair:
  struct I
  {
    int[] v;
    void foo(){}
  }
  ...
  I i;
  auto dg = &i.foo;
But it is not possible to make a delegate variable out of a static variable/function pair.
  void foo(int[] ar){}
  ...
  static int[] i = [1,2,3];
  auto dg = &i.foo; 
Also, setting a variable to a function literal then calling that function is ok,
   static int[] i = [1,2,3];
   void function(int[]) fp = function void(int[] j){return;};
   i.fp();
But it doesn't work if the function is an anonymous literal:
   static int[] i = [1,2,3];
   i.function void(int[] j){return;}(); 

[major] Lack of Struct Constructors

Structs should at least be allowed to have constructors, if not destructors as well. Everyone uses static opCall functions to get this effect, so why not just let people call it what it is: a constructor? If for no other reason than because opCall-constructors allow silly stuff like:

   Foo a;
   Foo b = a(1,2,3);
   Foo c = b(3,2,1);
D should have struct constructors. The usage syntax can be just like static opCall, except it won't be callable through an instance:
   Foo c = Foo(3,2,1);  // ok -- calls this(int,int,int)
   Foo d = c(1,2,3);    // not ok unless you have static opCall defined.
If both a constructor and a static opCall with the same arguments are defined, it should generate an "ambiguous overload" error.

Destructors would be useful for implementing lightweight scoped settings (though the need for this has mostly been eliminated now that scope classes are allocated on the stack). See this posting:

   struct TemporaryRoundingMode
   {
      RoundingMode oldmode;
      this(RoundingMode newmode) { /*change mode*/ oldmode = setRoundingMode(newmode); }
      ~this() { /*put mode back to what it was*/ setRoundingMode(oldmode); }
   };

I think this one can be done like this now:

   scope class TemporaryRoundingMode
   {
      RoundingMode oldmode;
      this(RoundingMode newmode) { /*change mode*/ oldmode = setRoundingMode(newmode); }
      ~this() { /*put mode back to what it was*/ setRoundingMode(oldmode); }
   };
   // for using:
   { 
      scope TemporaryRoundingMode s = new TemporaryRoundingMode(themode);
      // do stuff with temp mode....
   }

[semi-major] Struct initializers

Only static structs can be initialized? There seems to be no good reason for this limitation since this:

MyStruct x = {a:expr1, b:expr2, c:expr3};

should really just be syntactic sugar for

MyStruct x;
x.a = expr1;
x.b = expr2;
x.c = expr3;

with the built-in initializations of MyStruct's a,b, and c members suppressed.

[major] No associative array literals

If they existed it would make sense for them to look like struct initializers (although see above for doubts about struct initializers).

   int[char[]] dict = { "foo" : 3; "bar" : 7; };
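
Without a literal, the table has to be filled in one element at a time. A minimal sketch of that workaround follows. makeDict is a hypothetical helper invented for this example, and the code assumes a current D compiler, so the key type is written as string rather than char[].

```d
// Build and run with: dmd -unittest -main -run example.d
// makeDict is a hypothetical helper, used only for illustration.
int[string] makeDict()
{
    int[string] dict;   // starts out empty
    dict["foo"] = 3;    // one statement per entry
    dict["bar"] = 7;
    return dict;
}

unittest
{
    auto dict = makeDict();
    assert(dict.length == 2);
    assert(dict["foo"] == 3);
    assert("bar" in dict);
}
```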

[major] methods as properties on classes

    class Foo
    {
       int foo() { return m_foo; }
       int m_foo;
    }
obj.foo returns m_foo, but it cannot be the lvalue of op=, ++, or -- operators. Thus method-based properties cannot be used interchangeably with real fields. The behavior of obj.foo is very similar to that of obj.m_foo, but not the same.

This type of issue (not being able to return an lvalue) has been brought up by Andrei Alexandrescu recently (digitalmars.D.announce:6320) and apparently there's interest on Walter's part in fixing it. Here's hoping.
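
A minimal sketch of the difference, assuming a current D compiler. The setter overload is an addition made for this example and is not part of the class shown in this section. With it, plain assignment through the method works, but an increment still has to be written out as a read followed by a write.

```d
// Build and run with: dmd -unittest -main -run example.d
class Foo
{
    int m_foo;

    int foo() { return m_foo; }      // getter, callable as obj.foo
    void foo(int v) { m_foo = v; }   // setter (added for this sketch), callable as obj.foo = v
}

unittest
{
    auto obj = new Foo;

    obj.m_foo++;             // a real field is an lvalue
    assert(obj.m_foo == 1);

    obj.foo = 5;             // rewritten to obj.foo(5)
    obj.foo = obj.foo + 1;   // the increment, spelled out by hand
    assert(obj.foo == 6);
}
```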

[major] opSlice / opSliceAssign indices

The opSlice methods get triggered by the presence of a '..' in an index, and the values on either side of the '..' are passed as parameters. That's fine for 1D arrays, but for more complex containers like multi-dimensional arrays, one slice range is not really adequate. You'd like to be able to slice on each index like: A[1..3,0..1,4], but this syntax generates a compiler error now. Note also that you might not want all of the indices to be slices. The only way I see to fix it is to make a Slice type and pass that as a parameter to opSlice methods that take slices. Basically it would just be a 'pair' type. Something like:

struct slice(T) 
{
   T lo, hi;
}
Then the opSlice could be defined like
    MySlice opSlice(slice!(int)[] s...)

which would allow the class to accept any number of slices. If any argument contains a '..', probably all arguments should be converted to slices.
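
The pair type and the variadic signature can already be tried out with an ordinary method call standing in for the proposed index syntax. This is only a sketch under that assumption: Slice follows the proposal above, while Grid and its extents method are hypothetical names invented for this example.

```d
// Build and run with: dmd -unittest -main -run example.d
// Slice is the pair type proposed above.
struct Slice(T)
{
    T lo, hi;
}

// Grid and extents are hypothetical, used only for illustration.
struct Grid
{
    // Typesafe variadic parameter: accepts any number of Slice!(int) values.
    int[] extents(Slice!(int)[] s...)
    {
        int[] result;
        foreach (r; s)
            result ~= r.hi - r.lo;   // width of each half-open range
        return result;
    }
}

unittest
{
    Grid g;
    // Stand-in for the proposed g[1..3, 0..1, 4..5].
    auto e = g.extents(Slice!(int)(1, 3), Slice!(int)(0, 1), Slice!(int)(4, 5));
    assert(e == [2, 1, 1]);
}
```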

[semi-major] Slice syntax only works inside []

There are a number of ways to generalize this special case. One is to say that 1..3 is really equivalent to the array literal: [1,2], or say it's a literal equivalent to a particular kind of 'slice' object. Or say it is equivalent to an iterator. Then you could do things like

   foreach(i; 0..10) { writefln(i); }
or
   int[] x = 0..10;

[major] Varargs: _argptr and _arguments

From the Regular Expressions article "Implicit naming of match variables - this causes problems with name collisions, and just doesn't fit with the rest of the way D works."

Yet implicit naming of magic variables is exactly the approach taken for D's varargs support. These are special magic variables that simply wouldn't be necessary if the vararg "..." argument could be named. Likely syntax would be:

   void func(T1 a, T2 b, ... args) { }
Here the '...' serves the same role as the type of the other arguments. The name 'args' would then automatically be set up as a local variable of type std.stdarg.varargs. This varargs struct should basically be like Chris Miller's VArgs struct (though I propose the slightly modified version attached).

This change also need not break backwards compatibility. If the user doesn't provide a varargs argument name, then the current behavior can remain.

Links:

Chris Miller's variadic.d
Bill Baxter's modification (adds a get() in addition to next())

[major] foreach/opApply/foreach_reverse/opApplyReverse

One could write a small book about this one. Certainly a small book's worth of postings about it has been made to the D newsgroups.

Here's the way I understand it. First there was foreach. There are special cases built into the compiler for built-in array types. For user types, one overrides the opApply method, and the block following the foreach is passed to opApply as a delegate parameter. For the built-in types, foreach basically compiles down to something equivalent to a regular for loop. But for user classes, all that delegate calling is not as efficient.

Then in comes the desire to iterate backwards over containers. foreach can take an arbitrary delegate instead of a container, so it is quite possible for classes to have their own reverse_iter() method, and then just do backwards iteration using

   foreach(x; &myclass.reverse_iter) { ...}
According to Walter, however, despite being very general, this "looks hackish". And perhaps more importantly, since it uses the delegate code path, it can't be as efficient as a for loop for built-in array types. And if it isn't as efficient as 'for' then people just won't use it.
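
For reference, the delegate-based approach looks roughly like this in full. It is a sketch written for a current D compiler, where ref takes the place of the older inout on the delegate parameter. Stack and reverseIter are hypothetical names invented for this example.

```d
// Build and run with: dmd -unittest -main -run example.d
// Stack is a hypothetical container, used only for illustration.
struct Stack
{
    int[] items;

    // Same shape as opApply, but walks the items back to front.
    int reverseIter(int delegate(ref int) dg)
    {
        for (size_t i = items.length; i > 0; i--)
        {
            int result = dg(items[i - 1]);
            if (result)
                return result;   // pass the loop body's result code through
        }
        return 0;
    }
}

unittest
{
    auto s = Stack([1, 2, 3]);
    int[] seen;

    // foreach accepts a delegate in place of a container.
    foreach (int x; &s.reverseIter)
        seen ~= x;

    assert(seen == [3, 2, 1]);
}
```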

The solution? Introduce a new 15-letter keyword 'foreach_reverse', which basically does the exact same thing as foreach, except it uses a backwards for-loop for built-in arrays, and calls opApplyReverse instead of opApply for user types. Quoth Walter "It was trivial to implement". Naturally so, because it's the same damn thing as 'foreach' with a few names changed!

Now what about other kinds of iteration? For trees do we need foreach_inorder / foreach_preorder / foreach_postorder? Apparently not.

The main issue seems to be that opApply just isn't as efficient as a for loop. But if Walter's not satisfied with its performance for the built-in arrays, why should writers of user types be satisfied with its performance on their own data? The real issue is that the performance difference with using opApply needs to be eliminated. Then built-in arrays could use it, too, and there would be no real need for a foreach_reverse.

[semi-major] Generality of opApply

Other operators give classes the ability to behave identically to built-ins with respect to syntax supported by the built-ins. For instance opIndex allows a class to support array's [] syntax. Thus for any given object that supports indexing you can always use [] to access that functionality.

On the other hand opApply doesn't have an associated syntax. It is a function that takes a block of code (a delegate) and controls how it gets run. This is used by foreach, but is more general than foreach.

If I want to write a function that implements some sort of custom foreach behavior (say a foreach function that takes an optional delegate to do something different on the first iteration), then I can't just treat objects and arrays the same, because arrays are special. They have no opApply, and there is no syntax associated with opApply's functionality. There is a workaround using template constructs to recognize the special case of arrays, but that shouldn't be necessary.

[major] The int in opApply

A typical opApply function looks like this:

    int opApply(int delegate(inout uint) dg)
    {
        int result = 0;

        for (int i = 0; i < array.length; i++)
        {
            result = dg(array[i]);
            if (result)
                break;
        }
        return result;
    }

Ok, what is all that result stuff about? What does it mean? It's actually a very implementation-specific detail that the writer of an opApply is expected to deal with without really understanding its meaning. The spec which explains foreach statements and opApply implementations doesn't even explain this. But in a nutshell, when you write this:

    foreach(x; foo) 
    {
        /* some code */
    }
the compiler transforms the block after the foreach() into a delegate. And it is that delegate that is passed to your opApply as 'dg'. When you call dg() in your opApply you are executing one iteration of the loop body. Ok, so what's this 'result' value? You probably don't have any return statements in your loop body, so why does dg have an int return value? What happens is that the compiler turns things like gotos, labeled breaks, labeled continues, actual return statements, etc. all into return statements. And each such return statement gets a different return code. Then the driver (the thing that calls your opApply) looks at the return value from your opApply and then decides if it needs to do a goto or break, etc., or just keep going.

So basically the int result code is a low-level foreach implementation detail. If you return the wrong value from your opApply, then your code will fail in certain circumstances, but you probably won't notice for a long time, until that one day when you decide you really need a goto inside your foreach body. Then your program will inexplicably not goto the place you told it to goto.

Making users handle this value is akin to making them do pointer arithmetic on the vtable to call methods. It's just not done. It's unsafe for one, and for two it's an implementation detail that users shouldn't have to worry about.
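
To see how quietly this goes wrong, consider an opApply that throws the delegate's result away. The following is a minimal sketch for a current D compiler (ref in place of inout). Careless and iterationsUntilBreak are hypothetical names invented for this example. The code compiles without complaint, and a plain break in the loop body is simply ignored.

```d
// Build and run with: dmd -unittest -main -run example.d
// Careless is a hypothetical container with a deliberately wrong opApply.
struct Careless
{
    int[] array;

    int opApply(int delegate(ref int) dg)
    {
        foreach (ref x; array)
            dg(x);      // BUG (deliberate): result code discarded
        return 0;
    }
}

// Counts how many times the loop body runs when it tries to
// break out on its very first iteration.
size_t iterationsUntilBreak(int[] values)
{
    size_t count;
    foreach (int x; Careless(values))
    {
        count++;
        break;          // intended to stop after the first element
    }
    return count;
}

unittest
{
    // The body ran once per element: the break never took effect.
    assert(iterationsUntilBreak([1, 2, 3]) == 3);
}
```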

Instead the foreach loops could easily be rewritten like:

     { int _result_code=0;
       auto loop_body = { /* some code */ };
       foo.opApply( loop_body );
       switch (_result_code) {
         ...
       }
     }
Instead of the goto's and breaks inside the loop body being rewritten as 'return some_code;' they would be rewritten as '_result_code = some_code; return true;'.

Given that, then users could write their opApply's without worrying about the return code, thus both simplifying their lives, and making their code more robust (since there's no way to mess up the return code, except by failing to return):

    void opApply(bool delegate(inout uint) dg)
    {
        for (int i = 0; i < array.length; i++)
        {
            if (dg(array[i])) { break; }
        }
    }

The changes above should be fairly trivial to implement, and are in my opinion no-brainers.

The next step would be to figure out how to make the rewritten loop body be able to automatically return to opApply's parent, but I think that would require more significant changes (like adding coroutines). The exception mechanism is a possibility, but that would likely be too slow to be acceptable.

[minor] Cast expression

We can all rejoice that D got rid of C's cast syntax and replaced it with something that stands out a little more. But at the same time, D kept C's special unary operator grammar for the construct. Instead of that, D could have just made it appear to be a normal template function. The syntax would then become:

   cast!(float)(expression)
instead of
   cast(float) expression
There seems no real need for a special unary operator syntax here. Using standard template syntax only requires typing 3 more characters in the general case, and only one more character if 'expression' already needs to be parenthesized for precedence.

[normal/enhancement] foreach and in

The token 'in' is already a keyword (used to check for the presence of a key in an associative array, and in function preconditions). I think D missed out on an opportunity to make foreach statements more readable by not using this keyword. Current legal foreach expressions include things like:

    foreach(var; aggregate) { ... }
    foreach(i,var; aggregate) { ... }
    foreach(uint i, float var; aggregate) { ... }
    foreach(char var; "hello" in aggregate) { ... }
I propose the ';' be replaced with 'in'. Then we have
    foreach(var in aggregate) { ... }
    foreach(i,var in aggregate) { ... }
    foreach(uint i, float var in aggregate) { ... }
    foreach(char var in "hello" in aggregate) { ... }
The last one is the tricky case. But I don't think it's ambiguous if the part inside the parens is just parsed greedily from left to right. As soon as you hit an 'in' or a ';' you know you're past the variable declarations, because 'in' cannot be part of such a declaration.

Note that this change need not break backwards compatibility. Both the ';' and the 'in' can continue to be recognized as valid syntax.

[minor] Array literals not interchangeable with array variables

You can't index an array literal like you can an array variable.

    int i = [1,2,3][0];
gives 'unexpected ['. With parentheses it seems to work:
    int i = ([1,2,3])[0];
but slicing the result
    int[] ia = ([1,2,3])[0..2];
gives an access violation if you use ia.

[minor] auto for automatic type inference

I used to be unhappy with 'auto' being used for automatic type inference (ATI). But then Don Clugston explained to me by way of Sean Kelly that it's really a storage class that means 'regular old automatic variable'. And it's not the 'auto' that makes the type inference happen, but rather the lack of a type.

Still, I think I'd be happier with any of

   var foo = expression;
   val foo = expression;
   def foo = expression;
   let foo = expression;
or even with special syntax:
   @foo = expression;

[minor] Built-in init properties

It should be possible to set the init property for a typedef'ed or other user defined type. E.g.

   typedef int myint;
   myint.init = -1;
Actually it turns out there is a way to do this, just the syntax isn't quite as obvious as setting .init:
   typedef int myint = -1;
Thanks to Kirk McDonald for the tip!

