You are all individuals! (I’m not…)

I’ve been meaning to post this for a while, but recently a couple of events have coincided, reminding me about the issue.

First, Joe Duffy blogged in defence of premature optimization. Second, I started reading Bill Wagner’s Effective C#, 2nd edition, which contains advice such as "make almost all your types serializable". Now, let’s be clear: I have a great deal of respect for both of these gentlemen… but in both cases I think there’s a problem: to some extent they’re assuming a certain type of development.

In some cases, you really, really want to understand the nuts and bolts of every bit of performance – if, for example, you’re writing a parallelization library to be part of the .NET framework. For Noda Time I’m pretty obsessed with performance, too – I really want it to be very fast indeed. And to be clear, Joe does give a certain amount of balance in the article… but I think it’s probably still biased due to his background of working on libraries where it really, really matters. For many developers, it’s vastly preferable to have the in-house HR web app used by 100 people take a little bit more time to process each request than to spend an extra few days of developer work (cumulative) making sure that every little bit of it is as fast as possible. And many of the questions I’ve seen on Stack Overflow are asking for micro-optimizations which are really, really unlikely to matter. (EDIT: Just to be clear, there’s a lot of stuff I agree with in Joe’s post, but I think enough of us find correctness hard enough to start with, without having to consider every possible performance hit of every statement. At least some of the time.)

Likewise for a certain class of development, it probably does make sense to make most types serializable. If most of your objects are modelling data, serialization really will be a major factor. For other people, it won’t be. Most of my working life has been spent writing code which really doesn’t need to serialize anything… or which uses Protocol Buffers for serialization, in order to preserve portability, compactness and flexible versioning. Very few of my types should really be serializable using the platform-default binary serialization (whether in Java or .NET). Relatively few of them need to be serializable at all.
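
To make that concrete, here’s a minimal sketch (the types are hypothetical) of what opting into the platform-default binary serialization looks like in .NET – the point being that it’s an explicit, per-type decision, and a type which hasn’t opted in fails at run time rather than at compile time:

    using System;
    using System.IO;
    using System.Runtime.Serialization;
    using System.Runtime.Serialization.Formatters.Binary;

    [Serializable]                 // opted in: BinaryFormatter will handle it
    class OrderSnapshot
    {
        public int Id;
        public string Customer;
    }

    class PlainOrder               // not opted in
    {
        public int Id;
    }

    class Demo
    {
        static void Main()
        {
            var formatter = new BinaryFormatter();
            using (var stream = new MemoryStream())
            {
                formatter.Serialize(stream, new OrderSnapshot { Id = 1, Customer = "Acme" });
                try
                {
                    formatter.Serialize(stream, new PlainOrder { Id = 2 });
                }
                catch (SerializationException e)
                {
                    // Fails at run time: PlainOrder is not marked as serializable.
                    Console.WriteLine(e.Message);
                }
            }
        }
    }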

Finally, I’ll mention another example I’ve probably been guilty of: the assumption that a "public API" really can’t be changed without major hassle. An example of this is making a "public const" in C#, and later wanting to change the value of it. "No," I hear you cry… "Make it a public static readonly field instead, to avoid callers baking the value into their compiled code." Absolutely. If you’re in a situation where you may well not know all of your callers, or can’t recompile them all on every deployment, that’s great advice. But I suspect a lot of developers work in environments where they can recompile everything – where the only code which calls their code is written within the same company, and deployed all in one go.
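
To make the difference concrete, here’s a minimal sketch (the names are hypothetical): the const’s value is compiled into the calling assembly, whereas the static readonly field is read from the declaring assembly at run time.

    // Library assembly, version 1:
    public static class ApiLimits
    {
        public const int MaxRetries = 3;             // baked into callers at compile time
        public static readonly int MaxAttempts = 3;  // read from this assembly at run time
    }

    // Calling assembly, compiled against version 1:
    class Caller
    {
        static void Main()
        {
            System.Console.WriteLine(ApiLimits.MaxRetries);   // the literal 3 is embedded in Caller's IL
            System.Console.WriteLine(ApiLimits.MaxAttempts);  // an ordinary field load

            // If the library changes both values to 5 and only the library DLL is redeployed,
            // this prints 3 and then 5: the const keeps its old value until Caller is
            // recompiled, while the readonly field picks up the new one.
        }
    }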

In short, we’re not all writing system libraries. We’re not all writing data-driven business apps. We’re not all writing the same kind of code at all. Good "one size fits all" advice is pretty rare, and "we" (the community preaching best practices etc) should take that into account more often. I absolutely include myself in that chastisement, too.

15 thoughts on “You are all individuals! (I’m not…)”

  1. You get similar problems simply renaming private fields in Serializable classes. The serialized data uses the private field name, so when you try to reload old data after the field has been renamed, it won’t load…
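
    A minimal sketch of that pitfall (hypothetical type – BinaryFormatter records the private field names in the stream, so renaming one breaks round-tripping of previously saved data):

        using System;
        using System.IO;
        using System.Runtime.Serialization.Formatters.Binary;

        [Serializable]
        class Person
        {
            private string name;   // renaming this field later breaks old serialized data
            public Person(string name) { this.name = name; }
            public override string ToString() { return name; }
        }

        class Demo
        {
            static void Main()
            {
                var formatter = new BinaryFormatter();

                using (var stream = File.Create("person.bin"))
                {
                    // The stream records the field name "name" alongside the value.
                    formatter.Serialize(stream, new Person("Ada"));
                }

                // Rename the field to "fullName", rebuild, and run only the block below:
                // Deserialize then throws a SerializationException, because the stream has
                // no data for "fullName" (unless the field is marked [OptionalField]).
                using (var stream = File.OpenRead("person.bin"))
                {
                    Console.WriteLine(formatter.Deserialize(stream));
                }
            }
        }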

  2. I agree, speaking from a data-driven business apps point of view. The only time we really make serializable types is when we need to use out-of-proc session state in SQL Server for our web apps.

  3. I mostly agree with you, but can you give me an example where using public const rather than readonly actually matters? I’d regard public const as either a premature micro-optimization or a sign that the public interface could use some redesign.

  4. Making classes serializable is still good practice IMHO, even if the classes are not really meant to be persisted.

    As soon as you cross context boundaries, for instance by using a separate AppDomain, you need to deal with types which are either serializable or descendants of MarshalByRefObject.

    Let’s assume you write some code which loads add-ins and have it load and run the add-ins in a separate AppDomain for security reasons and also because there is no other way assemblies can be unloaded. All the (data) objects which are to be passed across the boundary need to be serializable. Let me tell you that you’re going to be unhappy if your code fails at runtime because something is not serializable, especially if you don’t have control over the code involved.
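
    Here’s a minimal sketch of that add-in scenario (the types are hypothetical): the host object crosses the AppDomain boundary by reference via MarshalByRefObject, the request crosses by value, and removing [Serializable] from the request turns a compile-time non-issue into a run-time SerializationException.

        using System;

        [Serializable]
        public class PluginRequest                    // crosses the boundary by value (copied)
        {
            public string Command;
        }

        public class PluginHost : MarshalByRefObject  // crosses the boundary by reference (proxied)
        {
            public string Run(PluginRequest request)
            {
                return "handled " + request.Command;
            }
        }

        class Demo
        {
            static void Main()   // .NET Framework only: AppDomains cannot be created on .NET Core
            {
                AppDomain sandbox = AppDomain.CreateDomain("sandbox");
                var host = (PluginHost) sandbox.CreateInstanceAndUnwrap(
                    typeof(PluginHost).Assembly.FullName,
                    typeof(PluginHost).FullName);

                // If PluginRequest were not [Serializable], this call would fail here,
                // at run time, with a SerializationException.
                Console.WriteLine(host.Run(new PluginRequest { Command = "ping" }));

                AppDomain.Unload(sandbox);
            }
        }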

  5. @Sune: Well, if const *feels* more natural, I’d argue it’s fine. But yes, there aren’t many downsides to using static readonly.

    @Lucero: “Let’s assume you write some code which loads add-ins and have it load and run the add-ins in a separate AppDomain for security reasons” – this is precisely the point of my post. Why should we assume that? What proportion of developers are *actually* in that position? Why go to extra work (and handling serialization really well *is* extra work) when there’s no indication that you need that ability?

  6. It’s as if you’ve read my mind, Jon :).

    A related rant: I just don’t understand why it is that interviewers *love* to ask about serialization in particular. I always end up discovering that they hardly (if ever) use the platform-default serialization mechanism anyway.
    What ORM do they use for persistence? None – low-level ADO.NET. Instant job turn-down.

  7. Thank you for saying something that I have felt but have not been able to articulate. The “external library” vs. “internal app” difference is a valid one.

    Coding situations are not created equal.

  8. @Lucero and @skeet:

    My two cents about serialization ..

    In my humble opinion, .NET’s BinaryFormatter is the one to blame, tar and feather. If BinaryFormatter encounters some awfully simple, yet not serializable type, it just chokes and throws (up).

    Instead it should use reflection on that type and simply serialize field by field. That would cover almost all use cases except the really weird – and then you can still write your own serialization code.

    BinaryFormatter itself is an example where ‘being obsessed with performance’ would have been nice. Its runtime performance scales really badly, and the serialized stream is much bigger than it needs to be.

    So a while ago, I sat down and wrote my own replacement Formatter and now I’m living happily ever after .. :)

  9. @MillKa: I don’t know that I’d want my types to be serialized without me saying so. Otherwise I have no idea to what extent I should be considering them for versioning etc.

    However, all this is of course away from the topic of the blog post, which was merely using serialization as an example…

  10. @skeet:

    ‘.. merely an example ..’: Yep, I’m kinda off topic, but I think the laziness in BinaryFormatter is the main reason why Wagner recommends that you ‘make almost all of your types serializable’. It’s almost always painful, and sometimes even impossible, to add serialization code to types that are not our own.

    And since I am already off topic ..

    ‘.. my types to be serialized without me saying so ..’: Either decorate them with [NotSerializable] or implement ISerializable and my formatter won’t do any naughty stuff behind your back. And for those cases where you can’t add your serialization code to someone else’s 3rd-party type, my formatter happily accepts a register-this-serialization-code-for-that-type call.

    ‘.. versioning ..’: Of course, everyone’s mileage varies, but I very often have to pump stuff through the wire, to/from disk or to another AppDomain, while I’ve very seldom had to bother with versioning. So I would be very happy if BinaryFormatter had a faster and smaller sibling without versioning, with performance (in memory space as well as runtime) much closer to C(++)’s fwrite (memory to stream) and fread (stream to memory).

  11. The piece of this conversation that’s missing for me is context. That is the medium in which these decisions are made. Without it, there are no criteria for choosing approach A instead of B or C – how do you know which is the optimal choice if you don’t know what problem you’re solving?

    Saying “you should do this in every situation” explicitly eschews context, and removes decision-making from the process of coding. While that may seem attractive on the surface, it also removes the ability to adjust technique to match input.

    This is akin to a physical architect saying “Well of course every building should have a lobby! All of the buildings I have ever designed had lobbies!” He just hasn’t encountered an exception to his rule.

    Jimmy Bogard has a great post on this:

    http://www.lostechies.com/blogs/jimmy_bogard/archive/2010/01/26/context-and-best-practices.aspx

  12. There is one thing Joe Duffy says that resonates deeply: “Rather than making one routine 1000% slower, you may have made your entire program 3% slower. Make enough of these sorts of decisions, and you will have dug yourself a hole deep enough to take a considerable percentage of the original development time just digging out.”

    I’ve seen programs that were just badly written. The architecture was relatively okay; there was no real tight-loop-calling-ridiculous-functions problem; it’s just that everything they did was slow, because whenever there were two ways to do something, they’d choose the slower one. How are you going to find with a profiler that in thousands of locations they perform a list scan instead of using a set, for example? And then you’d have to fix them one by one. Because it’s not only code that runs again and again that’s the problem. It’s the coding style that ignores performance altogether.
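
    A minimal sketch (hypothetical names) of the kind of choice being made in thousands of places:

        using System;
        using System.Collections.Generic;
        using System.Linq;

        class Demo
        {
            static void Main()
            {
                List<int> activeIds = Enumerable.Range(0, 100000).ToList();

                // O(n) per lookup: harmless once, painful when it is the habitual choice everywhere.
                bool slow = activeIds.Contains(99999);

                // O(1) per lookup after a one-off conversion to a hash-based set.
                var activeIdSet = new HashSet<int>(activeIds);
                bool fast = activeIdSet.Contains(99999);

                Console.WriteLine(slow && fast);
            }
        }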

  13. That said, Joe’s following example is ridiculous. He’s looking at code that goes over an array using LINQ, and complains about the allocation of delegate objects. And then: “In all likelihood, the Where and Select operators are going to allocate new IEnumerable and new IEnumerator objects.”

  14. I often hit this problem: in most jobs I have had, we have had a build system that “recompiles everything”. However, you still get lots of developers quoting the “.NET framework coding standards” on items like not changing the value of consts.
