C# 4.0 goes dynamic - a step too far?

 

By Mike James, published on 12 Feb 2009

 

With C# 3.0 still so new that many are only just beginning to appreciate, let alone use, its new features, it might seem premature to be discussing the next version of the most popular .NET language. Microsoft, however, has its plans for C# 4.0 well advanced, and the changes are so important that you might not recognise your favourite language after the upgrade. Now is the time to look over the horizon in the hope that end users can influence the outcome.

The first thing to say is that C# occupies a very special niche in the panoply of .NET languages. When it was introduced there were essentially two classes of Windows programmers corresponding to mastery and use of either VB or C++. The split was fairly clear-cut as VB was easy to learn and easy to use but limited, and C++ was difficult to fathom but could do anything. You can even think of C++ as an object-oriented machine-independent assembly language if you want to, but VB, being interpreted and well removed from machine constructs, was no such thing. There was also the aesthetic distinction to take into account – VB being messy and pragmatic and C++ pure and logical. Of course none of these characterisations is 100% true; they are just approximations to an average truth that is at least recognised by most programmers even if they’d argue over the fine details.

Not quite C++ nor VB

Currently we still have VB and C++ so where does C# fit into today’s landscape? Again there is no absolute truth to be found, but a reasonable answer is that C# was originally designed to be an easier-to-use C++. It aspired to be more logical and cleaner, yet still easy (or at least easier) to use. This ease of use derives not only from the language structure but also from the simplicity of the Framework and the IDE (i.e. Visual Studio). The Framework simplifies everything by the simple act of dumping COM as a component object technology. The IDE makes a C# program much easier to create than a C++ program simply by providing a drag-and-drop editor. User interface construction was never very complicated in C++, but equally its IDE support never really rose much above the simple dialog box designer. (The fact that today you can use C++ as a standard .NET language complete with drag-and-drop interface designers is something of an anomaly that doesn’t fit into the overall picture – a fact I and most other programmers choose to ignore in favour of C#.)

So C# is an “easy” C++ complete with all its advantages? Not really. The problem is that C# isn’t as powerful as C++ and this is presented as a design decision. C# creates safe, managed applications and as such simply cannot provide the programmer with all of the dangerous facilities of C++ such as pointers and direct memory access. (Of course if you go out of your way to implement them then many of the dangerous mechanisms in C++ can be yours in C#, but that’s another story.) The result is that C# programmers migrating from C++ might be a little disappointed that their dangerous tools have been confiscated, but for the VB programmer it looks like a very sharp knife indeed. The only downside for the migrating VB programmer is that C# isn’t as easy to use. It’s a bit too logical and a lot too restrictive. It tends to insist on doing things “right”.

You can’t duck out of object-oriented programming in C# and doing it right is a big learning task. However, even after the ideas have been learned and object-oriented programming in C# seems natural, most VB programmers still can’t quite understand why C# is strongly typed. Strong typing is designed to make it possible to catch stupid programming errors at design time rather than run time. The argument goes that it automatically picks up any attempt to store apples in a container designed to store oranges. This sounds good and there is a lot of logic in the argument, but consider what happens when a VB programmer first encounters a problem similar to:

string i;
i = 1;    // compile-time error: cannot implicitly convert type 'int' to 'string'

Of course you can’t assign an integer to a string – that’s just plain silly. However, a VB user (or any user of a dynamic language) will find it far from silly and wonder why they have to go to so much trouble to do an explicit type conversion when they’ve been accustomed to having the compiler/language do all the work for them. Put simply, a dynamic language doesn’t care much about typing and attempts to make things “just work”.
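
To be fair, the extra work demanded of the programmer is small. As a minimal sketch, the explicit conversion that C# expects looks something like:

int n = 1;
string i;
i = n.ToString();            // make the conversion explicit
// or: i = Convert.ToString(n);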

A dynamic language isn’t strongly typed because data typing is considered to be a matter of the low-level mechanics of the hardware. An integer is just a way of storing a number that makes some types of arithmetic easy but display more difficult. In a dynamic language data does what it is told to, and if what you tell it to do isn’t reasonable then an exception is thrown, which needs to be handled properly to create a robust program.

On being dynamic

You can see that strong typing isn’t quite the clear-cut good idea it first seems to be. It is also true that it is the biggest current divide between programmers and programming languages. C# started out as a young, clean, strongly-typed language but as it reaches its middle age it is displaying signs of being envious of the currently popular dynamic languages. This is made worse by the desire to meet the demand to be simpler, easier to use and more productive. The evidence of this envy is the simple fact that C# 3.0 introduced many new features that weakened type enforcement while still trying to play within the rules. Now in C# 4.0 the rule book has been thrown away with the introduction of the dynamic type – but first anonymous typing.

Anonymous typing

First recall that in C# 2.0 you always had to determine the type of the result of an expression and use an appropriate type or super type to store it. The most type-free statement you could write in C# 2.0 is something like:

object x = MyFunc();

…which works no matter what type MyFunc returns. However you can’t do anything with the returned object unless you cast it to a more appropriate type.
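
For example, assuming MyFunc actually returns a boxed int, you need a cast before the result can be used in arithmetic:

object x = MyFunc();
int y = (int)x + 1;    // without the cast this line wouldn't compile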

In C# 3.0 the anonymous type was introduced and the confusion began. An anonymous type is strongly typed in that the compiler works out its type at compile time and then applies all the usual strong typing rules. The only complication is that if the type implied by the code doesn’t actually exist as a named type the compiler will also generate a suitable name and type. Let’s start off with the simplest case, where the compiler merely infers an existing type. For example, given the function:

public int MyFunc() { return 1; }

…the statement:

var x = MyFunc();

…allows the compiler to deduce that the variable must be an int. So after this you can use:

x = 2;

…but the statements:

string i;
i = x;

…will still result in a compile time type error as x is an int and i is a string.

A slightly more complicated situation is where the type is created “on the fly”:

var x = new { name = "Mike", Address = "AnyTown" };

In this case the new type doesn’t have a name and so the compiler creates one, something along the lines of “AnonymousType#1”. Now you can use statements like:

string i;
i = x.name;

…as not only has the type of the structure been determined, so has the type of each of its fields. A subtle point that is important not to miss is that an anonymous type created in this way is read-only, so you can’t assign to any of its fields and any attempt to do so will be picked up by the compiler. Notice that this is different from the behaviour when the inferred type already exists as a named type, in which case it is, unless otherwise restricted, read/write. For example, if you first declare the structure previously created on the fly:

public struct MyAdd
{
    public string name;
    public string Address;
}

…and change the function to read:

public MyAdd MyFunc()
{
    return new MyAdd { name = "Mike", Address = "MyTown" };
}

…then you can write:

string i;
var x = MyFunc();
x.name = "new name";
i = x.name;

If you need even more proof that an anonymous type is strongly typed, just notice that the type and its fields are included in the Intellisense prompting as you work with the code at design time! It really is that simple. If you declare two anonymous types that have an identical field structure then the compiler is smart enough to notice that it needs to create only a single new type, but as the resulting objects are read-only you still can’t assign to their fields.
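
A minimal sketch of both points – the shared compiler-generated type and the read-only fields:

var a = new { name = "Mike", Address = "AnyTown" };
var b = new { name = "Sue", Address = "OtherTown" };
a = b;              // legal – both variables share the same compiler-generated type
// a.name = "Fred"; // illegal – anonymous type properties are read-only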

There are some restrictions on how you can use anonymously typed variables but they are all fairly obvious and reasonable. In particular the anonymous variable has to be local, i.e. have method scope. This means that you can still use them within for, foreach and using statements. Of course it has to be mentioned that anonymous types were introduced mainly to make LINQ look easier to use – but you can still use LINQ, and indeed everything in C#, without ever using the var keyword. Anonymous types are entirely a convenience and in no way a necessity.
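
To see the LINQ connection, here is a sketch (assuming using System and System.Linq) of the sort of query anonymous types were designed for – the projection produces a type with no name, so var is the natural way to hold the results:

var people = new[] {
    new { Name = "Mike", Town = "AnyTown" },
    new { Name = "Sue", Town = "OtherTown" }
};
var query = from p in people
            where p.Town == "AnyTown"
            select new { p.Name };   // yet another anonymous type inferred by the compiler
foreach (var item in query)
    Console.WriteLine(item.Name);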

Dynamic typing

The other end of the spectrum from strong or static typing is dynamic typing. This isn’t the abandonment of type, just a move to type checking at run time rather than compile time. However, this switch in emphasis tends to make programmers using a dynamic language far less aware of type than their strongly typed colleagues. Put simply, they can write almost any assignment or method call and everything will work – if it can work. The compiler’s job is to make the best sense that it can, using implicit type conversions where necessary.

It’s important to realise that there are many shades of dynamic typing and many different twists on its implementation. In many ways it is better to just regard dynamic typing as a move away from the more easy-to-define, strict, static typing. Languages such as JavaScript, Ruby, Lisp, Perl, PHP, Prolog, Python and Basic are all dynamically typed, even if many programmers are currently behaving as if it were a new idea. Perhaps this is just a reaction to the recent dominance of static typing and the falling out of favour of RAD – Rapid Application Development – in preference for up-front design and testing.

Whatever the reason, it now seems that C# has to go dynamic. New in C# 4.0 is the dynamic type. The best way of thinking about the new dynamic type is as an object type which gives you access to the methods and properties of the actual data type in use.

For example, suppose we have a class:

public class MyClass
{
    public int MyProperty { get; set; }
    public int MyMethod() { return 1; }
}

If we now create an instance using an object reference type:

object MyObject = new MyClass();

…then trying to access any of the methods or properties of the object will fail for obvious reasons – the object type doesn’t have the methods and properties of MyClass. However, if you use a cast to MyClass then you can access all of the properties and methods as if MyObject were of type MyClass, that is:

((MyClass) MyObject).MyMethod();

…works perfectly. In this sense using object and a cast has long been the C# programmer’s way of implementing dynamic typing. However, you must have had the thought “why do I need to cast the object to the correct type? – either the method call works at run time or it doesn’t”. Apart from making the programmer’s intention easier to discover, the cast does nothing to protect you at compile time – any problems only become apparent at run time. With the new dynamic type you can indeed “drop the cast”. The same code in C# 4.0 can be written:

dynamic MyObject = new MyClass();
MyObject.MyMethod();

The dynamic type only resolves to a method or property at run time. Interestingly you can mix dynamic and anonymous as in:

dynamic MyObject = new MyClass();
var i = MyObject.MyMethod();

…and, because the result of a dynamically dispatched call is itself dynamic, the compiler gives i the static type dynamic – its actual type, an int in this case, is only discovered at run time. Notice that, as dynamic is a valid static type name, it is perfectly possible for an implicitly typed variable to resolve to dynamic in this way.

You can swap from dynamic to fully typed simply by making appropriate assignments. For example:

dynamic j = 1;
int i = j;

…first creates a dynamic variable holding an int, which is then assigned to a statically typed int. You can also force a conversion using a cast but, as always, if it can’t be done the result is a run time exception.
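
For example, a conversion that can’t possibly succeed still compiles happily but fails when the assignment is executed – in the framework as released the failure surfaces as a RuntimeBinderException:

dynamic j = "not a number";
int i = (int)j;   // compiles, but throws at run time – a string can't be converted to an int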

Whenever you change something in a language, no matter how small or innocent the change is, the ripples spread out and reach parts of the language that you might never have guessed at. For example, with dynamic typing late binding is the rule even if the method in use isn’t virtual. Consider the following class with two overloaded versions of the same method:

public class MyClass
{
    public string MyMethod(int i) { return "Integer"; }
    public string MyMethod(double f) { return "Double"; }
}

If we now call the method with an argument whose type is chosen at random, something that was difficult to arrange before the introduction of dynamic, as in:

Random R = new Random();
MyClass MyObject = new MyClass();
dynamic i;
if (R.NextDouble() < 0.5)
{
    i = 1;
}
else
{
    i = 1.0;
}
string result = MyObject.MyMethod(i);

…then which MyMethod is actually called is determined at run time according to the type of i, which is discovered using reflection. This isn’t a bad “side-effect”; in fact, if things didn’t work like this you might well feel short-changed. However, some effects are less desirable and much more difficult to explain logically.

The guiding principle is that what happens should correspond to what would happen if the dynamic type were known at compile time rather than run time. This innocent principle can create situations that run counter to what you might expect. The key difficulty is caused by the simple rule that all statically defined types use their compile time type, while which method to use is resolved at run time. For example, suppose we have a derived class which hides, rather than overrides, a method of the original MyClass:

public class MyClass2 : MyClass
{
    // note: this hides, rather than overrides, MyClass.MyMethod(double)
    public new string MyMethod(double f) { return "Class2 Double"; }
}

If we now change the creation of MyObject in the previous example to read:

MyClass MyObject = new MyClass2();

…which method will be used for a double? At compile time the type of MyObject is MyClass even if at run time it actually refers to a MyClass2 object. Applying the previous rule, this means that the MyClass method is used at run time when i turns out to be a double. Not everything about dynamic is dynamic! However, if the compile time behaviour would be different then the run time behaviour is different too. For example, if you change the method declarations to virtual and override:

public class MyClass
{
    public string MyMethod(int i) { return "Integer"; }
    public virtual string MyMethod(double f) { return "Double"; }
}

public class MyClass2 : MyClass
{
    public override string MyMethod(double f) { return "Class2 Double"; }
}

…then:

MyClass MyObject = new MyClass2();
string result = MyObject.MyMethod(i);

…will call MyClass2’s double method and not MyClass’s when i resolves to a double at run time. It does this for the simple reason that this is what would happen if i were typed as a double at compile time.

There are a number of other interesting but fairly esoteric “features” of dynamic, but one final one worth mentioning is accessibility. Currently all methods and properties have to be public to be dynamically accessible. This isn’t a huge problem, but it means that you can’t call private methods from within a class using dynamic parameters, even though without the dynamic parameters the call would be perfectly legal. Similarly, you can’t use extension methods dynamically – the information needed to resolve them isn’t available at run time. Anonymous functions can’t appear as arguments to a dynamic method call for the same reason. This makes it difficult to use LINQ queries over dynamic objects, which is strange given that LINQ was, and is, a major motivation for C# becoming more dynamic.
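
A small sketch of the extension method restriction (assuming using System.Collections.Generic and System.Linq) – the usual workaround is to fall back to calling the extension method as the ordinary static method it really is:

dynamic list = new List<int> { 1, 2, 3 };
// var first = list.First();                   // fails at run time – extension methods aren't visible to dynamic dispatch
var first = Enumerable.First((List<int>)list); // works – call the extension method as a plain static method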

Beyond plain .NET objects

How the method invocation or property access is handled depends on the type of object that the dynamic type references. You might think that the only possibility is the plain old .NET object, but part of the reason for introducing dynamic is to make externally derived objects easier to work with. In the case of a standard .NET object, reflection is used to dispatch the operation. This is more sophisticated than you might imagine because any dynamic arguments are first resolved to their run time types, and the resulting signature is then used, again via reflection, to select and call the appropriate method.
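
A very rough sketch of the idea follows – the real binder is considerably more sophisticated and caches its decision at each call site:

object target = new MyClass();
object arg = 1.0;
var method = target.GetType().GetMethod("MyMethod", new[] { arg.GetType() });
string result = (string)method.Invoke(target, new[] { arg });  // selects MyMethod(double) at run time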

Moving beyond plain .NET objects, a new class of dynamic objects can customise how they behave by implementing the IDynamicObject interface (surfaced as IDynamicMetaObjectProvider in the framework as released). In this case the task of working out which method or property is needed is handed off to the object itself, which can use any mechanism that suits. This is the key to building truly dynamic object models that can respond in sophisticated ways.
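
As a hedged sketch, using the DynamicObject helper class that the released framework layers on top of this interface, a property bag that answers for any member name at run time looks something like this:

using System.Collections.Generic;
using System.Dynamic;

public class PropertyBag : DynamicObject
{
    private readonly Dictionary<string, object> values =
        new Dictionary<string, object>();

    // called when code reads bag.SomeName on a dynamic reference
    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        return values.TryGetValue(binder.Name, out result);
    }

    // called when code assigns bag.SomeName = value
    public override bool TrySetMember(SetMemberBinder binder, object value)
    {
        values[binder.Name] = value;
        return true;
    }
}

// usage:
// dynamic bag = new PropertyBag();
// bag.Anything = 42;   // the Anything member only comes into existence at run time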

A very big advantage of dynamic types comes when you start to think about C#’s relationship with external and non-native objects – COM objects in particular. In this case a dynamic type is resolved using the COM IDispatch interface and this in turn means that you can use COM objects “raw”, i.e. without a Primary Interop Assembly (PIA). As many COM objects make extensive use of the variant type, which can store any of a number of standard data types, being able to use dynamic types in place of variants is a big simplification.

For example, consider the standard difficulty encountered in using the Office COM object model:

((Excel.Range)excel.Cells[1, 1]).Value = "some string";

The cast has to be included because the PIA uses object types to represent variants. Using dynamic types in place of objects makes it possible to dispense with the cast and allow the run time system to work out what is legal:

excel.Cells[1, 1].Value = "some string";

Not using the PIA and driving the COM interface raw also means that you can hope to achieve a more efficient and lightweight program. There is a range of other minor enhancements to the way COM objects are dealt with in C# 4.0 that go together with dynamic types to make the whole thing easier to use. For example, COM objects often pass parameters using pointers, which results in the use of ref parameters in the corresponding C# methods. This can force you to create temporary variables to avoid any changes to a variable that you regard as logically being passed by value. Now the compiler will do the job for you by converting the value parameter to a temporary copy and passing this by reference. The overall result is pass-by-value semantics for parameters that are passed as pointers.

Conclusion

As languages mature they tend to develop to meet every need and requirement. C# is in very real danger of becoming an over-stuffed language in which it is possible to program in any style, including no style at all. Looked at individually, it is difficult to be too critical of any new feature. After all, who would now be without generics or even extension methods? However, a language shouldn’t just be a pragmatic collection of features – it should embody a vision of how things are to be done. Currently C# seems to be moving towards VB.NET – let’s hope it doesn’t overshoot and turn into VB 6 with classes.
