New version of Data Structures and Algorithms book now online

Some of you may remember an earlier post about a free Data Structures and Algorithms book which I’m occasionally helping out with in terms of editing.

I’ve just been told that a new version has recently been uploaded – please check it out. I have to admit that I haven’t actually done any work on this for a while… maybe I’ll get round to it in the New Year. Happy Christmas reading :)

Value types and parameterless constructors

There have been a couple of questions on StackOverflow about value types and parameterless constructors:

I learned quite a bit when answering both of these. When a further question about the default value of a type (particularly with respect to generics) came up, I thought it would be worth delving into a bit more depth. Very little of this is actually relevant most of the time, but it’s interesting nonetheless.

I won’t go over most of the details I discovered in my answer to the first question,  but if you’re interested in the IL generated by the statement “x = new Guid();” then have a look there for more details.

Let’s start off with the first and most important thing I’ve learned about value types recently:

Yes, you can write a parameterless constructor for a value type in .NET

I very carefully wrote “in .NET” there – “in C#” would have been incorrect. I had always believed that the CLI spec prohibited value types from having parameterless constructors. (The C# spec used the terminology in a slightly different way – it treats all value types as having a parameterless constructor. This makes the language more consistent for the most part, but it does give rise to some interesting behaviour which we’ll see later on.)

It turns out that if you write your value type in IL, you can provide your own parameterless constructor with custom code without ilasm complaining at all. It’s possible that other languages targeting the CLI allow you to do this as well, but as I don’t know any, I’ll stick to IL. Unfortunately I don’t know IL terribly well, so I thought I’d just start off with some C# and go from there:

public struct Oddity
{
    public Oddity(int removeMe)
    {
        System.Console.WriteLine(“Oddity constructor called”);
    }
}

I compiled that into its own class library, and then disassembled it with ildasm /out:Oddity.il Oddity.dll. After changing the constructor to be parameterless, removing a few comments, and removing some compiler-generated assembly attributes) I ended up with this IL:

.assembly extern mscorlib
{
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )  
  .ver 2:0:0:0
}
.assembly Oddity
{
  .hash algorithm 0x00008004
  .ver 0:0:0:0
}
.module Oddity.dll
.imagebase 0x00400000
.file alignment 0x00000200
.stackreserve 0x00100000
.subsystem 0x0003
.corflags 0x00000001

.class public sequential ansi sealed beforefieldinit Oddity
       extends [mscorlib]System.ValueType
{
  .pack 0
  .size 1
  .method public hidebysig specialname rtspecialname 
          instance void  .ctor() cil managed
  {
    .maxstack  8
    IL_0000:  nop
    IL_0001:  ldstr      "Oddity constructor called"
    IL_0006:  call       void [mscorlib]System.Console::WriteLine(string)
    IL_000b:  nop
    IL_000c:  ret
  }
}

I reassembled this with ilasm /dll /out:Oddity.dll Oddity.il. So far, so good. We have a value type with a custom constructor in a class library. It doesn’t do anything particularly clever – it just logs that it’s been called. That’s enough for our test program.

When does the parameterless constructor get called?

There are various things one could investigate about parameterless constructors, but I’m mostly interested in when they get called. The test application is reasonably simple, but contains lots of cases – each writes to the console what it’s about to do, then does something which might call the constructor. Without further ado:

using System;
using System.Runtime.CompilerServices;

class Test
{
    static Oddity staticField;
    Oddity instanceField;
   
    static void Main()
    {
        Report(“Declaring local variable”);
        Oddity localDeclarationOnly;
        // No variables within the value, so we can use it
        // without inializing anything
        Report(“Boxing”);
        object o = localDeclarationOnly;
        // Just make sure it’s really done it
        Report(o.ToString());
        Report(“new Oddity() – set local variable”);
        Oddity local = new Oddity();
        Report(“Create instance of Test – contains instance variable”);
        Test t = new Test();
        Report(“new Oddity() – set instance field”);
        t.instanceField = new Oddity();
        Report(“new Oddity() – set static field”);
        staticField = new Oddity();
        Report(“new Oddity[10]”);
        o = new Oddity[10];
        Report(“Passing argument to method”);
        MethodWithParameter(local);
        GenericMethod<Oddity>();
        GenericMethod2<Oddity>();
        Report(“Activator.CreateInstance(typeof(Oddity))”);
        Activator.CreateInstance(typeof(Oddity));
        Report(“Activator.CreateInstance<Oddity>()”);
        Activator.CreateInstance<Oddity>();
    }
   
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void MethodWithParameter(Oddity oddity)
    {
        // No need to do anything
    }
   
    static void GenericMethod<T>() where T : new()
    {
        Report(“default(T) in generic method with new() constraint”);
        T t = default(T);
        Report(“new T() in generic method with new() constraint”);
        t = new T();
    }
   
    static void GenericMethod2<T>() where T : struct
    {
        Report(“default(T) in generic method with struct constraint”);
        T t = default(T);
        Report(“new T() in generic method with struct constraint”);
        t = new T();
    }

    static void Report(string text)
    {
        Console.WriteLine(text);
    }
}

And here are the results:

Declaring local variable
Boxing
Oddity
new Oddity() – set local variable
Oddity constructor called
Create instance of Test – contains instance variable
new Oddity() – set instance field
Oddity constructor called
new Oddity() – set static field
Oddity constructor called
new Oddity[10]
Passing argument to method
default(T) in generic method with new() constraint
new T() in generic method with new() constraint
default(T) in generic method with struct constraint
new T() in generic method with struct constraint
Activator.CreateInstance(typeof(Oddity))
Oddity constructor called
Activator.CreateInstance<Oddity>()
Oddity constructor called

So, to split these out:

Operations which do call the constructor

  • new Oddity() – whatever we’re storing the result in. This isn’t much of a surprise. What may surprise you is that it gets called even if you compile Test.cs against the original Oddity.dll (without the custom parameterless constructor) and then just rebuild Oddity.dll.
  • Activator.CreateInstance<T>() and Activator.CreateInstance(Type). I wouldn’t be particular surprised by this either way.

Operations which don’t call the constructor

  • Just declaring a variable, whether local, static or instance
  • Boxing
  • Creating an array – good job, as this could be a real performance killer
  • Using default(T) in a generic method - this one didn't surprise me
  • Using new T() in a generic method – this one really did surprise me. Not only is it counterintuitive, but in IL it just calls Activator.CreateInstance<T>(). What’s the difference between this and calling Activator.CreateInstance<Oddity>()? I really don’t understand.

Conclusions

Well, I’m still glad that C# doesn’t let us define our own parameterless constructors for value types, given the behaviour. The main reason for using it – as far as I’ve seen – it to make sure that the “default value” for a type is sensible. Given that it’s possible to get a usable value of the type without the constructor being called, this wouldn’t work anyway. Writing such a constructor would be like making a value type mutable – almost always a bad idea.

However, it’s nice to know it’s possible, just on the grounds that learning new things is always a good thing. And at least next time someone asks a similar question, I’ll have somewhere to point them…

Redesigning System.Object/java.lang.Object

I’ve had quite a few discussions with a colleague about some failures of Java and .NET. The issue we keep coming back to is the root of the inheritance tree. There’s no doubt in my mind that having a tree with a single top-level class is a good thing, but it’s grown a bit too big for its boots.

Pretty much everything in this post applies to both .NET and Java, sometimes with a few small changes. Where it might be unclear, I’ll point out the changes explicitly – otherwise I’ll just use the .NET terminology.

What’s in System.Object?

Before we work out what we might be able to change, let’s look at what we’ve got. I’m only talking about instance methods. At the moment:

Life-cycle and type identity

There are three members which I believe really need to be left alone.

We need a parameterless constructor because (at least with the current system of chaining constructors to each other) we have to have some constructor, and I can’t imagine what parameter we might want to give it. I certainly find it hard to believe there’s a particular piece of state which really deserves to be a part of every object but which we’re currently missing.

I really don’t care that much about finalizers. Should the finalizer be part of Object itself, or should it just get handled automatically by the CLR if and only if it’s defined somewhere in the inheritance chain? Frankly, who cares. No doubt it makes a big difference to the implementation somewhere, but that’s not my problem. All I care about when it comes to finalizers is that when I have to write them it’s as easy as possible to do it properly, and that I don’t have to write them very often in the first place. (With SafeHandle, it should be a pretty rare occurrence in .NET, even when you’re dealing directly with unmanaged resources.)

GetType() or (getClass() in Java) is pretty important. I can’t see any particular alternative to having this within Object, unless you make it a static method somewhere else with an Object parameter. In fact, that would have the advantage of freeing up the name for use within your own classes. The functionality is sufficiently important (and really does apply to every object) that I think it’s worth keeping.

Comparison methods

Okay, time to get controversial. I don’t think every object should have to be able to compare itself with another object. Of course, most types don’t really support this anyway – we just end up with reference equality by default.

The trouble with comparisons is that everyone’s got a different idea of what makes something equal. There are some types where it really is obvious – there’s only one natural comparison. Integers spring to mind. There are other types which have multiple natural equality comparisons – floating point numbers (exact, within an absolute epsilon, and within a relative epsilon) and strings (ordinal, culture sensitive and/or case sensitive) are examples of this. Then there are composite types where you may or may not care about certain aspects – when comparing URLs, do I care about case? Do I care about fragments? For http, if the port number is explicitly specified as 80, is that different to a URL which is still http but leaves the port number implicit?

.NET represents these reasonably well already, with the IEquatable<T> interface saying “I know how to compare myself with an instance of type T, and how to produce a hashcode for myself” and IEqualityComparer<T> interface saying “I know how to compare two instances of T, and how to produce a hashcode for one instance of T.” Now suppose we didn’t have the (nongeneric!) Equals() method and GetHashCode() in System.Object. Any type which had a natural equality comparison would still let you compare it for equality by implementing IEquatable<T>.Equals – but anything else would either force you to use reference equality or an implementation of IEqualityComparer<T>.

Some of the principle consumers of equality comparisons are collections – particularly dictionaries (which is why it’s so important that the interfaces should include hashcode generation). With the current way that .NET generics work, it would be tricky to have a constraint on a constructor such that if you only specified the types, it would only work if the key type implemented IEquatable<T>, but it’s easy enough to do with static methods (on a non-generic type). Alternatively you could specify any type and an appropriate IEqualityComparer<T> to use for the keys. We’d need an IdentityComparer<T> to work just with references (and provide the equivalent functionaliy to Object.GetHashCode) but that’s not hard – and it would be absolutely obvious what the comparison was when you built the dictionary.

Monitors and threading

This is possibly my biggest gripe. The fact that every object has a monitor associated with it was a mistake in Java, and was unfortunately copied in .NET. This promotes the bad practice of locking on “this” and on types – both of which are typically publicly accessible references. I believe that unless a reference is exposed explicitly for the purpose of locking (like ICollection.SyncRoot) then you should avoid locking on any reference which other code knows about. I typically have a private read-only variable for locking purposes. If you’re following these guidelines, it makes no sense to be able to lock on absolutely any reference – it would be better to make the Monitor class instantiable, and make Wait/Pulse/PulseAll instance members. (In Java this would mean creating a new class and moving Object.wait/notify/notifyAll members to that class.)

This would lead to cleaner, more readable code in my view. I’d also do away with the “lock” statement in C#, making Monitor.Enter return a token implementing IDisposable – so “using” statements would replace locks, freeing up a keyword and giving the flexibility of having multiple overloads of Monitor.Enter. Arguably if one were redesigning all of this anyway, it would be worth looking at whether or not monitors should really be reentrant. Any time you use lock reentrancy, you’re probably not thinking hard enough about the design. Now there’s a nice overgeneralisation with which to end this section…

String representations

This is an interesting one. I’m genuinely on the fence here. I find ToString() (and the fact that it’s called implicitly in many circumstances) hugely useful, but it feels like it’s attempting to satisfy three different goals:

  • Giving a developer-readable representation when logging and debugging
  • Giving a user-readable representation as part of a formatted message in a UI
  • Giving a machine-readable format (although this is relatively rare for anything other than numeric types)

It’s interesting to note that Java and .NET differ as to which of these to use for numbers – Java plumps for “machine-readable” and .NET goes for “human-readable in the current thread’s culture”. Of course it’s clearer to explicitly specify the culture on both platforms.

The trouble is that very often, it’s not immediately clear which of these has been implemented. This leads to guidelines such as “don’t use ToString() other than for logging” on the grounds that at least if it’s implemented inappropriately, it’ll only be a log file which ends up with difficult-to-understand data.

Should this usage be explicitly stated – perhaps even codified in the name: “ToDebugString” or something similar? I will leave this for smarter minds to think about, but I think there’s enough value in the method to make it worth keeping.

MemberwiseClone

Again, I’m not sure on this one. It would perhaps be better as a static (generic!) method somewhere in a class whose name indicated “this is for sneaky runtime stuff”. After all, it constructs a new object without calling a constructor, and other funkiness. I’m less bothered by this than the other items though.

Conclusion

To summarise, in an ideal world:

  • Equals and GetHashCode would disappear from Object. Types would have to explicitly say that they could be compared
  • Wait/Pulse/PulseAll would become instance methods in Monitor, which would be instantiated every time you want a lock.
  • ToString might be renamed to give clearer usage guidance.
  • MemberwiseClone might be moved to a different class.

Obviously it’s far too late for either Java or .NET to make these changes, but it’s always interesting to dream. Any more changes you’d like to see? Or violent disagreements with any of the above?

List of talks now up on C# in Depth site

I think I’ve now done enough public speaking to make it worth having a page with resources etc, so I’ve added it to the C# in Depth site. Where feasible, it will include slides, code and videos (and anything else suitable, really) from various events.

I haven’t got any more booked at the moment, although I’ve had a couple of invitations I really ought to sort out at some point. A couple of points of interest:

  • I’ve included “before, during and after” code for the “LINQ to Objects in 60 minutes” presentation at DDD. The “before” code is just unit tests and shells of signatures etc. The “during” code is what I was able to get done during the event. The “after” code is a bit more complete – basically I implemented everything I already had unit tests for.
  • I’ve included a brief summary of each of the Copenhagen videos. The last two talks are the ones with the most “new” material for C# in Depth readers, although there are a few other nuggets scattered around the place. (Fun tip: if you download the wmv files, you can watch at 1.5x speed. It makes it look like I can type really fast!)

The video from DDD isn’t up yet, but when it is I’ll edit the page.