Don’t let this get away

Josh Twist asked me this via Twitter:

is it possible to invoke a member before a ctor is finished (eg maybe using threaded IL trickery) or is this forbidden somehow? :D

Now I don’t know why everyone seems to think I enjoy writing code which could have bizarre effects on either you, the compiler, the resulting execution or your co-workers… but it’s an interesting topic to look at, anyway.

The perils of partially constructed objects

Hopefully it’s reasonably obvious why it’s dangerous to access a member of an object before that object has been properly constructed – but it may be worse than you’ve considered.

In particular, immutable types are only immutable after they’re fully constructed. It’s entirely reasonable for an immutable type to change read-only fields several times during the course of initialization. The fields can only be set in the constructor itself (or a variable initializer for the field) but this can occur several times. If the constructor for the immutable type exposes the instance it’s constructing to other code, all the immutability guarantees go out of the window.
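As a rough sketch of how that can go wrong (the Interval and Observer types here are invented purely for illustration, not taken from any real library), consider a constructor which normalizes its read-only fields after it has already handed this to some other code:

```csharp
using System;

// Hypothetical observer: records what it sees when handed an instance.
public static class Observer
{
    public static int StartSeenDuringConstruction;

    public static void Peek(Interval interval)
    {
        StartSeenDuringConstruction = interval.Start;
    }
}

// Looks immutable from the outside: both fields are readonly.
public sealed class Interval
{
    private readonly int start;
    private readonly int end;

    public Interval(int a, int b)
    {
        start = a;
        end = b;

        // 'this' escapes before initialization has finished.
        Observer.Peek(this);

        // Legal: readonly fields may be assigned repeatedly inside the constructor.
        if (start > end)
        {
            int tmp = start;
            start = end;
            end = tmp;
        }
    }

    public int Start => start;
    public int End => end;
}

public static class LeakDemo
{
    public static void Main()
    {
        var interval = new Interval(10, 3);
        Console.WriteLine(Observer.StartSeenDuringConstruction); // 10
        Console.WriteLine(interval.Start);                       // 3
    }
}
```

The observer saw a Start of 10 on an object which, once fully constructed, will only ever report 3. Moving the Peek call to the very end of the constructor would avoid the problem in this particular case.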

Even in mostly-mutable types, code may well assume that it’s dealing with some fixed aspects. For example, you may have some database entity type which is either freshly created with a random GUID, or created from an existing record with an ID from the database. In either case, code consuming this type wouldn’t expect to see an ID of Guid.Empty, or for the ID to change after it’s been observed… even if other properties of the object can be changed later.

What C# does to protect you

C# as a language (plus conforming compiler, of course) protects you from some of this.

When you chain to another constructor, you can’t use this to calculate any arguments you want to pass to the other constructor. The compiler knows it’s dealing with a partially constructed object at this point – none of your constructor body has been executed – so it’s protecting you from harm. Unfortunately this means you can’t even call this.GetType(), which can make it tricky to write objects which populate themselves using reflection.
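Here’s a small sketch of that restriction, using a made-up Report class. The rejected constructors are left as comments so that the rest compiles:

```csharp
using System;

public class Report
{
    private readonly string title;

    public Report(string title)
    {
        this.title = title;
    }

    // Fine: a static helper doesn't need 'this'.
    public Report() : this(DefaultTitle())
    {
    }

    // Rejected by the compiler: 'this' isn't available in a constructor initializer.
    // public Report(int ignored) : this(this.GetType().Name) { }

    // Also rejected: calling an instance method would implicitly use 'this'.
    // public Report(bool ignored) : this(InstanceTitle()) { }

    private static string DefaultTitle() => "Untitled";

    private string InstanceTitle() => GetType().Name;

    public string Title => title;
}

public static class ChainingDemo
{
    public static void Main()
    {
        Console.WriteLine(new Report().Title); // Untitled
    }
}
```

Anything which genuinely needs the instance (such as GetType()) has to move into a constructor body, where this is available.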

During the constructor body, you have complete access to this of course – you have to, in order to set any state within the object. This is where things can get nasty.

Virtual methods

One way in which Java, C# and C++ diverge in their constructor behaviour is with regard to virtual methods:

  • In C++, the object only really "becomes" an instance of the subclass when the subclass constructor has been executed, so calling a virtual method will only execute the override in the "current" type hierarchy.
  • In Java, the object is of the final type from the start, so the most deeply overridden implementation of the virtual method is called – but this will occur without any initialization having taken place. All fields will still have their default values (null, 0 etc).
  • C# is like Java, except that variable initializers will have been executed (as they’re executed before the base class constructor is called). In other words, initialization within the constructor body won’t have taken place, but any fields which are initialized as part of the declaration will have their appropriate values.
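The C# behaviour can be seen in a minimal sketch like this one (Base and Derived are placeholder names), where the base class constructor calls a virtual method which is overridden in the derived class:

```csharp
using System;

public class Base
{
    public string SeenByBaseConstructor { get; }

    public Base()
    {
        // Virtual call during construction: dispatches to Derived.Describe().
        SeenByBaseConstructor = Describe();
    }

    protected virtual string Describe() => "base";
}

public class Derived : Base
{
    // Variable initializer: runs before the Base constructor body.
    private readonly string fromInitializer = "set";

    // Assigned in the constructor body: still null while Base() is running.
    private readonly string fromConstructor;

    public Derived()
    {
        fromConstructor = "set";
    }

    protected override string Describe()
    {
        return "initializer=" + (fromInitializer ?? "null")
            + ", constructor=" + (fromConstructor ?? "null");
    }
}

public static class VirtualCallDemo
{
    public static void Main()
    {
        // Prints: initializer=set, constructor=null
        Console.WriteLine(new Derived().SeenByBaseConstructor);
    }
}
```

The override runs against an object whose variable initializers have executed but whose own constructor body hasn’t. The equivalent Java code would see null for both fields.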

This is really dangerous if you’re not aware of it. In particular, any time you override a virtual method in Java or C#, you need to know whether it might be called in a partially-initialized state.

Wherever possible, try not to call virtual methods from the constructor for precisely this reason. I would advise that if you absolutely have to do it (I failed to remove this behaviour when porting Joda Time to Noda Time, for example) you document that fact very heavily and make sure that you don’t call the method in any other place. Make it protected, too. Basically it should only be part of initialization. If you need similar behaviour at other times, create another method. This allows derived classes to tailor their implementation to the expected state at the time of invocation.

Callbacks

You may be thinking that this is all easy: just avoid virtual methods, don’t do anything stupid like setting a static variable to this during a constructor (making it visible from other threads before initialization is completed) and you’ll be fine.

Well, I suspect that almost every Windows Forms app in existence publishes this during the constructor. Any time you have an event handler, that’s effectively providing a callback… and if that’s an instance method, it’s tied to the relevant instance, usually this.

How sure are you – really, really sure – that none of those event handlers will fire as part of the rest of the initialization? For example, if you use Visual Studio to hook up the ControlAdded event for a WinForms form, and also add a bunch of controls to the form… when is that event going to fire? Will the autogenerated code add the event handler after it adds the controls, or before? If it adds the handler at the start, then clearly the method handling the event will be called before your constructor finishes… so you need to be ready for that.
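To keep things self-contained, here’s a sketch which doesn’t use Windows Forms at all: ItemHost is a hypothetical stand-in for a control which raises an event whenever something is added to it, and Screen plays the part of the form. It assumes the handler is subscribed before the items are added, which may or may not match what the designer generates for you:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a control that raises an event on Add.
public class ItemHost
{
    private readonly List<string> items = new List<string>();

    public event Action<string> ItemAdded;

    public void Add(string item)
    {
        items.Add(item);
        ItemAdded?.Invoke(item); // runs handlers synchronously, on the calling thread
    }
}

public class Screen : ItemHost
{
    private readonly string title;

    public List<string> Log { get; } = new List<string>();

    public Screen()
    {
        ItemAdded += OnItemAdded; // the delegate is bound to 'this'
        Add("button");            // raises the event right now
        title = "Main";           // too late for the handler call above
    }

    private void OnItemAdded(string item)
    {
        Log.Add(item + " added; title=" + (title ?? "null"));
    }
}

public static class CallbackDemo
{
    public static void Main()
    {
        // Prints: button added; title=null
        Console.WriteLine(new Screen().Log[0]);
    }
}
```

No extra threads are involved here: the handler simply runs part-way through the constructor, before title has been assigned.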

How much of a problem is this really?

Like many matters of purity, I suspect this is usually more of a theoretical issue than a practical one. In complicated situations like the Windows Forms one above, most event handlers are likely to be fired after initialization… and there’s typically not as strong a sense of invariants being set up as there would be in an immutable data type, for example.

Immutable data types, in turn, are less likely to accidentally let this escape during construction… but the consequences of them doing so are much more severe, of course.

Conclusion

To answer Josh’s question: Yes. At least on the simplest reading of the question: members can certainly end up being invoked on an object during its construction. They can potentially end up being invoked on multiple threads during construction. This is basically under the control of the constructors in the type hierarchy though.

In particular, I believe that the .NET memory model is stricter than the ECMA specification in terms of threading: I believe a constructor will have completed (and all its writes retired) before the reference returned by the constructor can be published to another variable, which was a concern in double-checked locking. It’s a valid concern to consider though.

Alternative conclusion: almost nothing is really as simple as it appears to be.

13 thoughts on “Don’t let this get away”

  1. You missed two: pass a delegate into the ctor; or call a method on a singleton/static. This allows you to circumvent the second paragraph in your conclusion. I do agree with you wholly though, the idea of side-effects during the ctor is just asking for trouble.

    I have done experimentation though. All field initializers are done before the first line of your ctor (but remember, `this` is not a valid field initializer); and your object is a valid instance when the first line of your ctor code is executed (most possibly not verifiable).

    I have used this behavior once before for legitimate reasons: a rather fancy game networking library I wrote. It was a hairy experience and I wouldn’t wish it on my worst enemy. Don’t do it.

  2. @Jonathan: Importantly, all field initializers are executed before your *base* constructor is called, too. Basically all field initializers occur before any constructor body code, if everything’s in C#. That’s different to Java. As for an object being a valid instance… it’s certainly valid as far as the CLR is concerned, but it may well not be valid in terms of object invariants :)

  3. On the memory model:
    If you are storing the reference to the new object in a volatile field, then it’s safe because the memory writes to the object cannot be moved below the volatile write (write barrier).
    The above is valid for the ECMA memory model – the MS .NET Framework implementation has a much nicer guarantee: EVERY write works as if it was volatile. (see http://msdn.microsoft.com/en-us/magazine/cc163715.aspx#S5)

  4. “Will the autogenerated code add the event handler after it adds the controls, or before?”

    Aren’t the event handlers executed on the same thread that’s running the constructor? You could have some exotic setup that marshals the event to another thread, but it seems for the designer-generated code even if the event is raised while you’re in the constructor, the event handler won’t execute until the constructor is finished. Is this correct?

  5. @Owen: Nope. Even though it’s on the same thread, why would that mean it waits until the constructor has run? If you’ve subscribed to the ControlAdded event, then use Controls.Add(…), why would it wait until after your constructor call had finished before calling your event handler?

    It’s all in the same thread, but that doesn’t mean it’ll all happen after the constructor has finished…

  6. “Even though it’s on the same thread, why would that mean it waits until the constructor has run?”

    I think Owen’s point (he can correct me if I’m wrong :) ) is that the _threading_ issues don’t come up for the event handlers in Forms objects.

    Which is not to say there aren’t other issues. It’s not uncommon for the various “…Changed” events to get fired during construction, for example, even before interesting/useful members have been initialized. Gotta watch out for those divide-by-zeroes when trying to handle control size changes, for example. :)

    In fact, it’s a reasonably common problem in Forms programming, with one common solution being to suppress the code in event handlers until the constructor is finished or the Load or Shown events have been raised (via some flag, or just checking for default values in the class members of interest).

    But the _threading_ issues can generally be safely ignored for Forms objects.

  7. @Peter: I’m not sure that was Owen’s point – or at least not all of it… as otherwise he wouldn’t have said that “the event handler won’t execute until the constructor is finished”.

    But you’re right – threading isn’t one of the issues here.

  8. So, how would you categorise the Windows Forms code that hooks up event handlers and does other sorts of work in the constructor? Slightly awkward? A necessary evil? Just plain bad? Is it a sign of a bad API if you’re encouraged or required to do this sort of thing in a constructor?

    Like

  9. Ah, you’re right. I glossed over the exact text, and inferred a meaning that made more sense than what was actually written. :)

  10. @Weeble: I think “slightly awkward” is probably a reasonable description. Worth avoiding where possible, worth being aware of where necessary.

  11. @Weeble: for better or worse, hooking up event handlers in a constructor is not uncommon at all. Forms makes it a bit worse by doing a lot of the work in the Designer-generated code, both event handler subscriptions _and_ property initialization, which is what leads to the events that have been subscribed getting raised.

    On the other hand, because C#’s design ensures that fields (and thus properties dependent on fields) are all initialized to _something_ before any constructor gets to run, at least event handlers will see valid data, even if just initialized to the default.

    Ideally, event subscriptions would always be done last in the constructor. But IMHO for this particular subset of “problems”, it’s just one of those things you have to learn to deal with when writing Forms code, and it becomes reasonably second-nature to just make sure the code works with default-initialized values.

    Finally note that a lot of the event subscription that goes on in the constructor of a Forms object is to subscribe to _other_ object events, not the “this” object. You still have the issue when dealing with custom Control sub-classes (including Form sub-classes), and of course with “On…()” method overrides. But the problem is mitigated at least somewhat in scope.

  12. “note that a lot of the event subscription that goes on in the constructor of a Forms object is to subscribe to _other_ object events, not the “this” object.”

    Clarification for the above: I don’t mean to say subscribing to other controls’ events eliminates problems. It’s just that they are less likely to cause problems. Obviously, if your event handler is looking at “this” while it’s handling an event for some other object, lack of full initialization can still come up.
