Redesigning System.Object/java.lang.Object

I’ve had quite a few discussions with a colleague about some failures of Java and .NET. The issue we keep coming back to is the root of the inheritance tree. There’s no doubt in my mind that having a tree with a single top-level class is a good thing, but it’s grown a bit too big for its boots.

Pretty much everything in this post applies to both .NET and Java, sometimes with a few small changes. Where it might be unclear, I’ll point out the changes explicitly – otherwise I’ll just use the .NET terminology.

What’s in System.Object?

Before we work out what we might be able to change, let’s look at what we’ve got. I’m only talking about instance methods, grouped below by what they’re for.

Life-cycle and type identity

There are three members which I believe really need to be left alone.

We need a parameterless constructor because (at least with the current system of chaining constructors to each other) we have to have some constructor, and I can’t imagine what parameter we might want to give it. I certainly find it hard to believe there’s a particular piece of state which really deserves to be a part of every object but which we’re currently missing.

I really don’t care that much about finalizers. Should the finalizer be part of Object itself, or should it just get handled automatically by the CLR if and only if it’s defined somewhere in the inheritance chain? Frankly, who cares. No doubt it makes a big difference to the implementation somewhere, but that’s not my problem. All I care about when it comes to finalizers is that when I have to write them it’s as easy as possible to do it properly, and that I don’t have to write them very often in the first place. (With SafeHandle, it should be a pretty rare occurrence in .NET, even when you’re dealing directly with unmanaged resources.)

GetType() (or getClass() in Java) is pretty important. I can’t see any particular alternative to having this within Object, unless you make it a static method somewhere else with an Object parameter. In fact, that would have the advantage of freeing up the name for use within your own classes. The functionality is sufficiently important (and really does apply to every object) that I think it’s worth keeping.

Comparison methods

Okay, time to get controversial. I don’t think every object should have to be able to compare itself with another object. Of course, most types don’t really support this anyway – we just end up with reference equality by default.

The trouble with comparisons is that everyone’s got a different idea of what makes something equal. There are some types where it really is obvious – there’s only one natural comparison. Integers spring to mind. There are other types which have multiple natural equality comparisons – floating point numbers (exact, within an absolute epsilon, and within a relative epsilon) and strings (ordinal, culture sensitive and/or case sensitive) are examples of this. Then there are composite types where you may or may not care about certain aspects – when comparing URLs, do I care about case? Do I care about fragments? For http, if the port number is explicitly specified as 80, is that different to a URL which is still http but leaves the port number implicit?

.NET represents these reasonably well already, with the IEquatable<T> interface saying “I know how to compare myself with an instance of type T, and how to produce a hashcode for myself” and IEqualityComparer<T> interface saying “I know how to compare two instances of T, and how to produce a hashcode for one instance of T.” Now suppose we didn’t have the (nongeneric!) Equals() method and GetHashCode() in System.Object. Any type which had a natural equality comparison would still let you compare it for equality by implementing IEquatable<T>.Equals – but anything else would either force you to use reference equality or an implementation of IEqualityComparer<T>.

Some of the principal consumers of equality comparisons are collections – particularly dictionaries (which is why it’s so important that the interfaces should include hashcode generation). With the current way that .NET generics work, it would be tricky to have a constraint on a constructor such that if you only specified the types, it would only work if the key type implemented IEquatable<T>, but it’s easy enough to do with static methods (on a non-generic type). Alternatively you could specify any type and an appropriate IEqualityComparer<T> to use for the keys. We’d need an IdentityComparer<T> to work just with references (and provide the equivalent functionality to Object.GetHashCode) but that’s not hard – and it would be absolutely obvious what the comparison was when you built the dictionary.
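
To make that concrete, here is a rough Java sketch of how a dictionary might look if equality always had to be stated explicitly. Every name in it (Equatable, EqualityComparer, IdentityComparer, ExplicitDictionary, Point) is hypothetical and invented for illustration; none of them is an existing API, and the dictionary is deliberately minimal (fixed bucket count, no removal).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for IEquatable<T>: the type declares its own natural equality.
interface Equatable<T> {
    boolean equalTo(T other);
    int hash();
}

// Hypothetical stand-in for IEqualityComparer<T>: equality defined outside the type.
interface EqualityComparer<T> {
    boolean areEqual(T x, T y);
    int hash(T value);
}

// Reference equality, chosen explicitly instead of being inherited from the root class.
final class IdentityComparer<T> implements EqualityComparer<T> {
    public boolean areEqual(T x, T y) { return x == y; }
    public int hash(T value) { return System.identityHashCode(value); }
}

// A tiny bucketed dictionary that never calls Object.equals or Object.hashCode.
final class ExplicitDictionary<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final EqualityComparer<K> comparer;
    private final List<List<Entry<K, V>>> buckets = new ArrayList<>();

    // Any key type is allowed, as long as the caller supplies a comparer.
    ExplicitDictionary(EqualityComparer<K> comparer) {
        this.comparer = comparer;
        for (int i = 0; i < 16; i++) buckets.add(new ArrayList<>());
    }

    // Static factory: only compiles when the key type has a natural equality.
    static <K extends Equatable<K>, V> ExplicitDictionary<K, V> create() {
        return new ExplicitDictionary<>(new EqualityComparer<K>() {
            public boolean areEqual(K x, K y) { return x.equalTo(y); }
            public int hash(K value) { return value.hash(); }
        });
    }

    private List<Entry<K, V>> bucketFor(K key) {
        return buckets.get(Math.floorMod(comparer.hash(key), buckets.size()));
    }

    void put(K key, V value) {
        List<Entry<K, V>> bucket = bucketFor(key);
        for (Entry<K, V> e : bucket) {
            if (comparer.areEqual(e.key, key)) { e.value = value; return; }
        }
        bucket.add(new Entry<>(key, value));
    }

    V get(K key) {
        for (Entry<K, V> e : bucketFor(key)) {
            if (comparer.areEqual(e.key, key)) return e.value;
        }
        return null;
    }
}

// Example key type with one obvious natural comparison.
final class Point implements Equatable<Point> {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
    public boolean equalTo(Point other) { return other != null && x == other.x && y == other.y; }
    public int hash() { return 31 * x + y; }
}

class EqualityDemo {
    public static void main(String[] args) {
        ExplicitDictionary<Point, String> byValue = ExplicitDictionary.create();
        byValue.put(new Point(1, 2), "a");
        System.out.println(byValue.get(new Point(1, 2)));    // a (value equality)

        ExplicitDictionary<Point, String> byIdentity =
            new ExplicitDictionary<>(new IdentityComparer<Point>());
        byIdentity.put(new Point(1, 2), "a");
        System.out.println(byIdentity.get(new Point(1, 2))); // null (different instance)
    }
}
```

The point of the sketch is the call site: whoever builds the dictionary either gets the key type's natural equality through the factory, or has to name the comparison (identity included) in the constructor call.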

Monitors and threading

This is possibly my biggest gripe. The fact that every object has a monitor associated with it was a mistake in Java, and was unfortunately copied in .NET. This promotes the bad practice of locking on “this” and on types – both of which are typically publicly accessible references. I believe that unless a reference is exposed explicitly for the purpose of locking (like ICollection.SyncRoot) then you should avoid locking on any reference which other code knows about. I typically have a private read-only variable for locking purposes. If you’re following these guidelines, it makes no sense to be able to lock on absolutely any reference – it would be better to make the Monitor class instantiable, and make Wait/Pulse/PulseAll instance members. (In Java this would mean creating a new class and moving Object.wait/notify/notifyAll members to that class.)
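
As a minimal illustration of the “private read-only variable” habit in Java terms (the Counter class is a made-up example, not from any library):

```java
final class Counter {
    // Private and final: no code outside this class can ever contend for this monitor.
    private final Object lock = new Object();
    private int count;

    void increment() {
        synchronized (lock) {
            count++;
        }
    }

    int current() {
        synchronized (lock) {
            return count;
        }
    }
}
```

Had the methods synchronized on “this” instead, any caller holding a reference to the Counter could take the same lock and interfere with it.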

This would lead to cleaner, more readable code in my view. I’d also do away with the “lock” statement in C#, making Monitor.Enter return a token implementing IDisposable – so “using” statements would replace locks, freeing up a keyword and giving the flexibility of having multiple overloads of Monitor.Enter. Arguably if one were redesigning all of this anyway, it would be worth looking at whether or not monitors should really be reentrant. Any time you use lock reentrancy, you’re probably not thinking hard enough about the design. Now there’s a nice overgeneralisation with which to end this section…
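
Here is a sketch of what that might look like in Java, where try-with-resources plays the role of C#’s “using” statement. The Monitor, Token and Mailbox classes are hypothetical names for illustration; the sketch is built on java.util.concurrent.locks.ReentrantLock, so unlike the non-reentrant design mused about above it is still reentrant.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical instantiable monitor: you create one only where you need a lock.
final class Monitor {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition condition = lock.newCondition();

    // Returned by enter(); closing it releases the monitor.
    final class Token implements AutoCloseable {
        private Token() {}

        @Override
        public void close() {
            lock.unlock();
        }
    }

    Token enter() {
        lock.lock();
        return new Token();
    }

    // Instance equivalents of Wait/Pulse/PulseAll; call them only while holding a Token.
    void await() throws InterruptedException { condition.await(); }
    void pulse() { condition.signal(); }
    void pulseAll() { condition.signalAll(); }
}

// Example use: a small blocking mailbox guarded by its own private monitor.
final class Mailbox {
    private final Monitor monitor = new Monitor();
    private final Queue<String> messages = new ArrayDeque<>();

    void post(String message) {
        try (Monitor.Token token = monitor.enter()) {
            messages.add(message);
            monitor.pulse();
        }
    }

    String take() throws InterruptedException {
        try (Monitor.Token token = monitor.enter()) {
            while (messages.isEmpty()) {
                monitor.await();
            }
            return messages.remove();
        }
    }
}
```

Because enter() is an ordinary method, overloads such as a timed variant could be added without any new language syntax, which is the flexibility argued for above.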

String representations

This is an interesting one. I’m genuinely on the fence here. I find ToString() (and the fact that it’s called implicitly in many circumstances) hugely useful, but it feels like it’s attempting to satisfy three different goals:

  • Giving a developer-readable representation when logging and debugging
  • Giving a user-readable representation as part of a formatted message in a UI
  • Giving a machine-readable format (although this is relatively rare for anything other than numeric types)

It’s interesting to note that Java and .NET differ as to which of these to use for numbers – Java plumps for “machine-readable” and .NET goes for “human-readable in the current thread’s culture”. Of course it’s clearer to explicitly specify the culture on both platforms.
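
A small Java example of the difference, using only standard library calls; the class and method names (NumberTextDemo, machineReadable, forLocale) are made up for the example:

```java
import java.text.NumberFormat;
import java.util.Locale;

class NumberTextDemo {
    // Culture-independent text, the kind Java's toString gives for numbers.
    static String machineReadable(double value) {
        return Double.toString(value);
    }

    // Human-readable text with the culture stated explicitly rather than taken from a default.
    static String forLocale(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        double value = 1234.5;
        System.out.println(machineReadable(value));            // 1234.5
        System.out.println(forLocale(value, Locale.US));       // 1,234.5
        System.out.println(forLocale(value, Locale.GERMANY));  // 1.234,5
    }
}
```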

The trouble is that very often, it’s not immediately clear which of these has been implemented. This leads to guidelines such as “don’t use ToString() other than for logging” on the grounds that at least if it’s implemented inappropriately, it’ll only be a log file which ends up with difficult-to-understand data.

Should this usage be explicitly stated – perhaps even codified in the name: “ToDebugString” or something similar? I will leave this for smarter minds to think about, but I think there’s enough value in the method to make it worth keeping.

MemberwiseClone

Again, I’m not sure on this one. It would perhaps be better as a static (generic!) method somewhere in a class whose name indicated “this is for sneaky runtime stuff”. After all, it constructs a new object without calling a constructor, and other funkiness. I’m less bothered by this than the other items though.

Conclusion

To summarise, in an ideal world:

  • Equals and GetHashCode would disappear from Object. Types would have to explicitly say that they could be compared.
  • Wait/Pulse/PulseAll would become instance methods in Monitor, which would be instantiated every time you want a lock.
  • ToString might be renamed to give clearer usage guidance.
  • MemberwiseClone might be moved to a different class.

Obviously it’s far too late for either Java or .NET to make these changes, but it’s always interesting to dream. Any more changes you’d like to see? Or violent disagreements with any of the above?

29 thoughts on “Redesigning System.Object/java.lang.Object”

  1. Another benefit of removing Equals and GetHashCode from Object would be that value type implementers would be forced to write them explicitly, rather than relying on the runtime’s rather inefficient default implementation; it would also avoid boxing when calling Object.Equals (Framework Design Guidelines devotes several sections to these points.)

    Like you, I always lock on a private object. Having Monitor.Enter being an instance method (that returns IDisposable) would allow classes like TimedMonitor (http://code.logos.com/blog/2008/12/profiling_lock_contention.html) to be much easier drop-in replacements.

  2. Great post! I agree with pretty much everything you say.

    Instead of ToString or to_string, Python has __str__ and __repr__.

    __repr__ returns a machine readable value which OFTEN can be evaluated back into that object by executing that string at the interactive prompt. Super useful, especially when used with the totally awesome doctest module.

  3. I completely agree that GetHashCode() is better off not being on object. There are an incredible number of implicit contracts that come with using GetHashCode(). The vast majority of users are completely unaware of these contracts and consequently implement GetHashCode() incorrectly. What’s even worse is that the contracts are so implicit that even when confronted with the bugs, users are quite simply apathetic. “Well, I’ve been doing it that way for X years so it can’t really be wrong.”

  4. @ICR: What else do you think GetHashCode is used for other than equality? Even if it *is* used for something else, you could still use it via an IEqualityComparer.

  5. Actually, even Microsoft’s _default_ implementation of GetHashCode is incorrect, by the definition of a hash code.

    Hash codes for two objects that might be equal must always be equal. But the default implementation returns a value based on the object’s memory address, which means that the hash codes returned by two different objects are always different, which means that GetHashCode is by default wrong for any possible override of Equals.

    Thus, an endless stream of obscure bugs when random BCL collections turn out to call GetHashCode which people didn’t override because they didn’t intend to use their classes in a hashtable…

    The default implementation of GetHashCode should return constant zero — that would be a correct (if inefficient) value for any possible Equals override. Any address-based value that the runtime needs should have been hidden in the private implementation of the runtime, not published in GetHashCode.

    Or just not have GetHashCode in Object in the first place, as Jon suggested. That’s arguably an even better solution.

  6. @Chris: That’s why when you override Equals without overriding GetHashCode, the C# compiler issues a warning. I don’t have much sympathy for people who ignore warnings. Returning 0 would make hashtables basically useless for types which don’t override Equals/GetHashCode.

    But yes, all of this would be clearer if they were part of a separate interface :)

  7. The way that you can lock on any arbitrary object in .net seems unusual to me and reminds me a bit of the way that C++ allows you to throw any arbitrary data type as an “exception”. The .net designers certainly had no problems seeing that design decision in C++ as an error, I wonder why they didn’t recognize the same sort of problems with the design of multi-thread locking on objects in .net.

  8. @skeet well, of course, the fact that it’s necessary to hack the compiler to issue a warning about failing to implement an implicit interface (e.g. “IHashable” or some such) is very much a “smell” (I suppose there can be syntax smells and compiler smells as well as code smells) that they made a design error.

  9. Jon, the problem is that the BCL uses GetHashCode all over the place as a shortcut for equality comparisons, not just in hashtables. If GetHashCode returned zero by default the performance degradation when only overriding Equals should be negligible in many cases. That’s what I think would warrant a mere compiler warning.

    As it stands, not overriding GetHashCode should be a fatal error, not just a warning, because an Equals implementation without GetHashCode will cause outright incorrect results in many unexpected places within the BCL.

    Understandably, users don’t expect this and will ignore a GetHashCode warning if they don’t intend to use their type in a hashtable. More so since implementing a correct hash code that does not just return zero is indeed quite difficult. That’s just a bad situation that the System.Object designers have placed their users in.

  10. The performance degradation when only overriding Equals would be negligible in some cases, but *terrible* in others.

    I agree with wedge that it’s a design smell that the compiler needs to know to warn about it – and maybe the warning should be stronger – but *given* that it’s part of Object, I think the current implementation is okay.

    With the current implementation, if developers want to implement GetHashCode in a lazy way they can always just return 0 – whereas if Object.GetHashCode returned 0, getting to the current (identity) situation would be impossible without extra support.

  11. Wouldn’t the removal of GetHashCode require you to create your own anytime you wanted to add your object to a Dictionary? It is pretty nice to not really have to work hard to get a free hash code implementation to throw your object into hash tables.

    I think (and thanks to Bob Jenkins, I know) writing a correct hash code implementation is harder than writing a correct Equals/GetHashCode contract.

  12. @C.Watford – if you want to provide your own implementation, you could either do it in the class (and implement IEquatable) or do it in a separate class and implement IEqualityComparer. If you wanted to use the “equality is identity” comparison you’d use a new IdentityComparer class, and not write your own code.

    In other words, you wouldn’t have to “work hard” at all. You’d never have to write more code than you do now – but you’d be explicitly saying, “Build a dictionary using identity comparisons” if you were just using the stock implementation. I believe being explicit about that is a good thing.

  13. object: if ever anything was `abstract`, then surely this is it? The parameterless constructor should then be protected. There are a few uses of raw object – often as sync locks (which you’ve already covered), but also sometimes for keys (for example, EventHandlerList) – this would be trivial to fix with a more suitable (concrete) key object.

  14. An awesome admission of all the design mistakes and surprisingly frank Java copying argument, directly from MS.. wow, this is worth reading.

    It is never too late to fix VM deficiencies though, it should be a priority. Quite obviously, and in dramatic fashion, the object bloat is spreading far far beyond any hardware that can handle it (see WPF and any other high-level lib Java 6.0 performance; it is awful).

    Multiple inheritance, I won’t go into, just like you, but an elegant design should enable it: mix-ins included.

  15. Another point to the above wow, and although it might seem a rant, it really doesn’t take much of a glasses/lens fix to read your entry from an angle that solves all the problems:

    Refactor the entire show without System.Object and you’re done.

    Naturally, you have to stop adding further to the bloat, freeze base APIs.. even then, however, efficient translation is the most beautiful feature any compiler does, no matter what the language or typeless ‘void’/’object’ code base is.

    M _

  16. @Wowow ow: I don’t work for MS. I never have done. I’m an MVP, but that doesn’t mean I have to toe any party lines, or anything like that.

    @MK: Removing System.Object – no, I don’t buy it. The need for it goes down significantly with generics, but every so often it really is handy to have a reference to “something, but I’ve no idea what and don’t really care”.

  17. Enjoyed the post.

    Historically, I’ve been opposed to an “object” type in any language, since I prefer enforcing a very strict, strong type safety. However, this causes problems when you *do* need a variant, and solutions like boost::any are not pretty to implement.

    Regarding your article, I completely agree with the comparison and threading. Especially the threading. I’ve done multithreaded programming and synchronization from device drivers to web services over a period of 13 years. Every so often someone comes up with the idea that an implicit or very simplistic monitor (e.g., C#’s lock) will solve all multithread contention woes. But no, it doesn’t. Multithreaded synchronization is hard, and creating a simplistic sync scheme like this only makes it appear easy. For simple problems, it works; and then for more complex problems, some programmers try to solve it using the simplistic sync scheme (which they never really understood in the first place), and suddenly realize that “lock(this)” isn’t a magic bullet… (and I’m not even touching all the problems that can come up when locking a non-private object)

    Anyway… I also agree with your last two points. ToString() could have a better name. And does anyone even use MemberwiseClone? I’ve always used a private “copy constructor” instead.

    At this point, every member of Object could easily be moved to static members on more appropriate classes (Type.GetType, etc). So, do we really need Object? Just for bypassing the strict type system… so, will we really need it in the near future?

    I’m very excited about the forthcoming C#4 “dynamic”. It seems to me that this type will address several current shortcomings:
    1) Lack of duck typing. I do really miss the duck typing made possible by C++ templates, or “static polymorphism” as we called it.
    2) Awkward syntax for runtime binding. This would help clean up the syntax for LINQ to XML and other hot areas as well.

    So, now here’s a question: when “dynamic” is introduced, will there be any more need for “object”?

    P.S. I finally bought C# in Depth this weekend, and I’ve started my way through it. So far, so good. :) Thanks for having Herb Sutter review – he’s a top mind in language design, even though he’s C++.

  18. @Stephen: Not sure where you got the Herb Sutter bit from. Eric Lippert was the technical reviewer.

    I’m on the fence when it comes to whether “object” is still needed. It’ll be interesting to see what happens in the *next* platform…

  19. I agree with everything that you said.

    Currently I have implemented a static ‘Hash’ class that supports generating several different correct hash functions given various inputs or byte[]s. If this were in the BCL it would solve a lot of issues.

    I also have a static ‘Clone’ class that implements Shallow and Deep by using MemberwiseClone via reflection, or binary serialization, or xml serialization. This would be a better BCL class instead of the current rarely used implementation.

    Finally, as for an ‘object’ base class. Really this should have been a marker interface with no functionality. It keeps one reference type hierarchy, and allows you to handle having an instance handle where you do not know/care what type it is. As you said the other functions could (should) be re/moved to where they make more sense.

  20. “GetType() or (getClass() in Java) is pretty important. I can’t see any particular alternative to having this within Object”

    C# already has “typeof()” which only applies to types; it could have been changed to allow instances, having the same behavior as “.GetType()” does today, freeing up the name while not introducing another language element.

    RE: ToString() – it is out of balance — where is FromString() to go with it? Why isn’t there ToXml() and ToBinary(), too? I think my questions and yours have the same answer.

  21. I agree that Equals and GetHashCode should be moved somewhere else, as for some objects it does not make sense to have comparison.

    However, the default implementation could have been implemented as an overridable shallow memberwise comparison instead of a reference comparison.
    If objects can be cloned this way in the object implementation, why can they not be compared this way?

  22. I read over this article and I think this is a great analysis of the root object in any hierarchy. It just seems to me that they put too much into .NET’s System.Object. I find it annoying to have to override Equals and GetHashCode in objects that are going to be put in collections and then I have to implement IEquatable and IComparable (including the non-generic one explicitly). To top that off, in the overridden Equals method, you have to check the type of the parameter. If it doesn’t match the current object type, what do you do? You can’t throw an exception as that is bad practice in such a method (at least that’s what I’ve read), but returning false doesn’t seem to fit the desired result either. After all, it isn’t a false equality, you’re comparing apples to oranges.
