What’s in a name?

T.S. Eliot had the right idea when he wrote “The Naming of Cats”:

The Naming of Cats is a difficult matter,
It isn’t just one of your holiday games

When you notice a cat in profound meditation,
The reason, I tell you, is always the same:
His mind is engaged in a rapt contemplation
Of the thought, of the thought, of the thought of his name:
His ineffable effable
Effanineffable
Deep and inscrutable singular Name.

Okay, so developers may not contemplate their own names much, but I know I’ve certainly spent a significant amount of time recently trying to work out the right name for various types and methods.  It always feels like it’s just out of reach; tauntingly, tantalisingly close.

Recently I’ve been thinking a bit about what the goals might be in coming up with a good name. In particular, I seem to have been plagued with the naming problem more than usual in the last few weeks.

Operations on immutable types

A while ago I asked a question on Stack Overflow about naming a method which “adds” an item to an immutable collection. Of course, when I say “adds” I mean “returns a new collection whose contents are the old collection and the new item.” There’s a really wide range of answers (currently 38 of them) which mostly seem to fall into four categories:

  • Use Add because it’s idiomatic for .NET collections. Developers should know that the type is immutable and act accordingly.
  • Use Cons because that’s the term functional programming has used for this exact operation for decades.
  • Use a new method name (Plus being my favourite at the moment) which will be obvious to non-functional developers, but without being so familiar that it suggests mutability.
  • Use a constructor taking the old collection and the new item.

Part of the reasoning for Add being okay is that I originally posted the question purely about “an immutable collection” – e.g. a type which would have a name like ImmutableList<T>. I then revealed my true intention (which I should have done from the start) – to use this in MiniBench, where the “collection” would actually be a TestSuite. Everything in MiniBench is immutable (it’s partly an exploration in functional programming, as it seems to fit very nicely) but I don’t want to have to name every single type as Immutable[Whatever]. There’s the argument that a developer should know at least a little bit about any API they’re using, and the immutability aspect is one of the first things they should know. However, MiniBench is arguably an extreme case, because it’s designed for sharing test code with people who’ve never seen it before.

I’m pretty sure I’m going to go with Plus in the end:

  • It’s close enough to Add to be familiar
  • It’s different enough from Add to suggest that it’s not quite the same thing as adding to a normal collection
  • It sounds like it returns something – a statement which just calls Plus without using the result sounds like it’s wrong (and indeed it would be)
  • It’s meaningful to everyone
  • I have a precedent in the Joda Time API
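
To make the “sounds like it returns something” point concrete, here’s a rough sketch in Java (the language of the House example later on). The TestSuite below is a hypothetical stand-in which just holds test names; it isn’t the real MiniBench type, and only the name Plus comes from the discussion above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an immutable suite of tests (not the real MiniBench type).
final class TestSuite {
    private final List<String> tests;

    private TestSuite(List<String> tests) {
        this.tests = Collections.unmodifiableList(tests);
    }

    static TestSuite empty() {
        return new TestSuite(new ArrayList<>());
    }

    // Returns a new suite containing the existing tests followed by the new one.
    // The suite this is called on is left untouched.
    TestSuite plus(String test) {
        List<String> copy = new ArrayList<>(tests);
        copy.add(test);
        return new TestSuite(copy);
    }

    List<String> tests() {
        return tests;
    }
}

final class PlusDemo {
    public static void main(String[] args) {
        TestSuite suite = TestSuite.empty();

        // Result discarded: this reads as if something is missing, and it is.
        suite.plus("ignored");

        // Result used: this reads correctly, and it is correct.
        suite = suite.plus("first").plus("second");
        System.out.println(suite.tests());
    }
}
```

With a method called add, the first call in the demo would look entirely innocent. With plus, it at least looks suspicious.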

Another option is to overload the + operator, but I’m not really sure I’m ready to do that just yet. It would certainly leave brief code, but is that really the most important thing?

Let’s look at a situation with some of the same issues…

LINQ operators

Work on MoreLINQ has progressed faster than expected, mostly because the project now has four members, and they’ve been expending quite a bit of energy on it. (I must do a proper consistency review at some point – in particular it would be nice to have the docs refer to the same concepts in the same way each time. I digress…)

Most of the discussion in the project hasn’t been about functionality – it’s been about naming. In fact, LINQ is particularly odd in this respect. If I had to guess at how the time has been spent (at least for the operators I’ve implemented) I’d go for:

  • 15% designing the behaviour
  • 20% writing the tests
  • 10% implementation
  • 5% writing the documentation (just XML docs)
  • 50% figuring out the best name

It really is that brutal – and for a lot of the operators we still haven’t got the “right” name yet, in my view. There’s generally too much we want to convey in a word or two. As an example, we’ve got an operator similar to the oft-implemented ForEach one, but which yields the input sequence back out again. Basically it takes an action, and for each element it calls the action and then yields the element. The use case is something like logging. We’ve gone through several names, such as Pipe, Tee, Via… and just this morning I asked a colleague who suggested Apply, just off the top of his head. It’s better than anything we’d previously thought of, but does it convey both the “apply an action” and “still yield the original sequence” aspects?
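
For anyone who finds the operator easier to picture in code, here’s a minimal sketch of the behaviour in Java. It isn’t the MoreLINQ implementation (which is C#). The names Sequences and pipe are placeholders, with pipe being just one of the candidate names mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

final class Sequences {
    private Sequences() {}

    // Lazily wraps the source: as each element is requested, it is passed to
    // the action and then yielded unchanged. Nothing happens until iteration.
    static <T> Iterable<T> pipe(Iterable<T> source, Consumer<? super T> action) {
        return () -> {
            Iterator<T> iterator = source.iterator();
            return new Iterator<T>() {
                @Override
                public boolean hasNext() {
                    return iterator.hasNext();
                }

                @Override
                public T next() {
                    T item = iterator.next();
                    action.accept(item);
                    return item;
                }
            };
        };
    }
}

final class PipeDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        List<String> names = Arrays.asList("Pipe", "Tee", "Via");

        // The action records each element; the loop still sees the original elements.
        for (String name : Sequences.pipe(names, item -> log.add("saw " + item))) {
            System.out.println(name);
        }
        System.out.println(log);
    }
}
```

Even in a sketch this small, the method does two things (run the action, pass the element on), which is exactly why a one-word name struggles.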

The old advice of “each method should only do one thing” is all very well, and it clearly helps to make naming simpler, but with situations like this one there are just naturally more concepts which you want to get across in the name.

Let’s stay on the LINQ topic, but stray a bit further from the well-trodden path…

The heart of Push LINQ: IDataProducer

I’ve probably bored most of you with Push LINQ by now, and I’m not actively developing it at the moment, but there’s still one aspect which I’m deeply uncomfortable with: the core interface. IDataProducer represents a stream of data which can be observed. Basically clients subscribe to events, and their event handlers will be called when data is “produced” and when the stream ends.
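
As a very rough illustration of the shape of the thing, here’s a sketch in Java. The real interface is C# and uses events; Java has no direct equivalent, so this version assumes handler-registration methods instead. Every name in it (DataProducer, onDataProduced, onEndOfData, SimpleProducer) is made up for the example rather than taken from Push LINQ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical Java rendering of an observable stream of data.
interface DataProducer<T> {
    // Registers a handler to be called each time an item is produced.
    void onDataProduced(Consumer<? super T> handler);

    // Registers a handler to be called once when the stream ends.
    void onEndOfData(Runnable handler);
}

// Simple single-threaded implementation, for illustration only.
final class SimpleProducer<T> implements DataProducer<T> {
    private final List<Consumer<? super T>> dataHandlers = new ArrayList<>();
    private final List<Runnable> endHandlers = new ArrayList<>();

    @Override
    public void onDataProduced(Consumer<? super T> handler) {
        dataHandlers.add(handler);
    }

    @Override
    public void onEndOfData(Runnable handler) {
        endHandlers.add(handler);
    }

    // Pushes one item to every subscribed handler.
    void produce(T item) {
        for (Consumer<? super T> handler : dataHandlers) {
            handler.accept(item);
        }
    }

    // Signals the end of the stream to every subscribed handler.
    void end() {
        for (Runnable handler : endHandlers) {
            handler.run();
        }
    }
}

final class ProducerDemo {
    public static void main(String[] args) {
        SimpleProducer<Integer> numbers = new SimpleProducer<>();
        int[] sum = {0};

        // The client never iterates; it just reacts to what it's given.
        numbers.onDataProduced(n -> sum[0] += n);
        numbers.onEndOfData(() -> System.out.println("sum = " + sum[0]));

        numbers.produce(1);
        numbers.produce(2);
        numbers.produce(3);
        numbers.end();
    }
}
```

Note that the client has no way to pull the next item, which is why a name suggesting iteration would be misleading.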

I know IDataProducer is an awful name – but so far I haven’t found anything better. IObservable? Ick. Overused and isn’t descriptive. IPushEnumerable? Sounds like the client can iterate over the data, which they can’t. The actual event names (DataProduced/EndOfData) are okay but there must be something better than IDataProducer. (Various options have been suggested in the past – none of them have been so obviously “right” as to stick in my head…)

This situation is slightly different to the previous ones, however, simply because it’s such a pivotal type. You would think that the more important the type, the more important the name would be – but in some ways the reverse is true. You see, Push LINQ isn’t a terribly “obvious” framework. I say that without shame – it’s great at what it does, but it takes a few mental leaps before you really grok it. You’re really going to have to read some documentation or examples before you write your own queries.

Given that constraint, it doesn’t matter too much what the interface is called – it’s going to be explained to you before you need it. It doesn’t need to be discoverable – whereas when you’re picking method names to pop up in Intellisense, you really want the developer to be able to guess its purpose even before they hover over it and check the documentation.

I haven’t given up on IDataProducer (and I hope to be moving Push LINQ into MoreLINQ, by the way – working out a better name is one of the blockers) but it doesn’t feel like quite as much of a problem.

Read-only or not read-only?

This final example came up at work, just yesterday – after I’d started writing this post. I wanted to refactor some code to emphasize which methods only use the read-only side of an interface. This was purely for the sake of readability – I wanted to make it easier to reason about which areas of the code modified an object and which didn’t. It’s a custom collection – the details don’t matter, but for the sake of discussion let’s call it House and pretend we’re modelling the various things which might be in a house. (This is Java, hence House rather than IHouse.)

I’m explicitly not doing this for safety – I don’t mind the fact that the reference could be cast to a mutable interface. The point is just to make it self-documenting that if a method only has a parameter in the non-mutating form, it’s not going to change the contents of the house.

So, we have two interfaces, like this:

public interface NameMePlease
{
    Color getDoorColor();
    int getWindowCount();

    // This already returned a read-only collection
    Set<Furniture> getFurniture();
}

public interface House extends NameMePlease
{
    void setDoorColor(Color doorColor);
    void setWindowCount(int windows);
    void addFurniture(Furniture item);
}

Obviously the challenge is to find a name for NameMePlease. One option is to use something like ImmutableHouse or ReadOnlyHouse – but the inheritance hierarchy makes liars of both of those names. How can it be a ReadOnlyHouse if there are methods in an implementation which change it? The interface should say what you can do with the type, rather than specifying what you can’t do – unless part of the contract of the interface is that the implementation will genuinely prohibit changes.

Thinking of this “positive” aspect led me to ReadableHouse, which is what I’ve gone with for the moment. It states what you can do with it – read information. Again, this is a concept which Joda Time uses.
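
Here’s a sketch of how that reads at the call site. To keep it self-contained I’ve trimmed the interfaces down and used a String for the door colour; SimpleHouse and HouseOperations are hypothetical names invented for the example.

```java
// Trimmed-down read-only side of the interface.
interface ReadableHouse {
    String getDoorColor();
    int getWindowCount();
}

// The full interface adds the mutating operations.
interface House extends ReadableHouse {
    void setDoorColor(String doorColor);
    void setWindowCount(int windows);
}

// Hypothetical implementation, for illustration only.
final class SimpleHouse implements House {
    private String doorColor;
    private int windowCount;

    SimpleHouse(String doorColor, int windowCount) {
        this.doorColor = doorColor;
        this.windowCount = windowCount;
    }

    @Override public String getDoorColor() { return doorColor; }
    @Override public int getWindowCount() { return windowCount; }
    @Override public void setDoorColor(String doorColor) { this.doorColor = doorColor; }
    @Override public void setWindowCount(int windows) { this.windowCount = windows; }
}

final class HouseOperations {
    private HouseOperations() {}

    // The parameter type documents that this method only reads from the house.
    static String describe(ReadableHouse house) {
        return house.getDoorColor() + " door, " + house.getWindowCount() + " windows";
    }

    // The parameter type documents that this method may change the house.
    static void repaint(House house, String newColor) {
        house.setDoorColor(newColor);
    }
}
```

Nothing stops describe from casting its parameter back to House, of course. As I said, this is about documentation rather than safety.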

Another option is to make it just House, and change the mutable interface to MutableHouse or something similar. In this particular situation the refactoring involved would have been enormous. Simple to automate, but causing a huge check-in for relatively little benefit. Almost all uses are actually mutating ones. The consensus within the Google Java mailing list seems to be that this would have been the preferred option, all things being equal. One interesting data point was that although Joda Time uses ReadableInstant etc, the current proposals for the new date/time API which will be included in Java 7, designed by the author of Joda Time, don’t use this convention. Presumably the author found it didn’t work quite as well as he’d hoped, although I don’t know of any specific problems.

Conclusion

You’ll probably be unsurprised to hear that I don’t have a recipe for coming up with good names. However, in thinking about naming I’ve at least worked out a few points to think about:

  • Context is important: how discoverable does this need to be? Is accuracy more important than brevity? Do you have any example uses (e.g. through tests) which can help to see whether the code feels right or not?
  • Think of your audience. How familiar will they be with the rest of the code you’re writing? Are they likely to have a background in other areas of computer science where you could steal terminology? Can you make your name consistent with other common frameworks they’re likely to use? The reverse is true too: are you reusing a familiar name for a different concept, which could confuse readers?
  • Work out the information the name is trying to convey. For types, this includes working out how it participates in inheritance. Is it trying to advertise capabilities or restrictions?
  • Is it possible to make correct code look correct, and buggy code look wrong? This is rarely feasible, but it’s one of the main attractions of “Plus” in the benchmark case. (I believe this is one of the main selling points of true Hungarian Notation for variable naming, by the way. I’m not generally a fan, but I like this aspect.)

I may expand this list over time…

I think it’s fitting to close with a quote from Phil Karlton:

There are only two hard things in Computer Science: cache invalidation and naming things.

Almost all of us have to handle naming things. Let’s hope most of us don’t have to mess with cache invalidation as well.

21 thoughts on “What’s in a name?”

  1. Naming things is not only hard, but it’s incredibly important, more so than most developers realize I think. I like Plus, and IDataProducer isn’t bad really, especially given your point about being introduced to it beforehand. The last case is the really interesting one. Take an example from the framework: streams. All of the Stream APIs use suffixes instead of prefixes: StreamReader/Writer, TextReader/Writer, etc. Note that the underlying stream may or may not be read- or write-only; the Reader suffix merely states that all you can do with that particular interface is read.

    The question is which is better: the Readable prefix or the Reader suffix. My tendency is to say that the suffix is better, only because it will cause the class and its associated readers/writers to be grouped near each other in any alphabetical listing. But it wouldn’t surprise me if someone could come up with a more compelling argument that would change my mind.

    Nitpick: I think you mean “have precedent”, not “have precedence”.

  2. @David: Fixed the precedent bit, thanks.

    While the suffix form is indeed useful, I think it serves a different purpose here. A StreamReader reads *from* a stream; it isn’t a readable stream in itself. A TextReader reads text from anywhere, it isn’t the text itself. In this case it really *is* a House, but just from a reading viewpoint.

    Definitely needs further consideration though.

    I was originally going to comment on the difference in choice between “map” and “dictionary” (Java and .NET respectively) but I felt the post was getting too long already…

  3. I couldn’t agree more. Making readable code is just as important as making it bug free. Interfaces that don’t convey the meaning and use of a class might take less time to write. But in reality they simply transfer the time needed to name things right to the point in time when you actually need to use the interface. It’s technological debt wearing a new disguise.

  4. @Keith: That implies (to me) that it’s not really an instance of a House, it’s a view onto another House. In particular, it would be pretty odd (IMO) to have “House extends HouseView”. I don’t want anything to imply that this isn’t the actual object containing the data. It’s the actual object, just seen from a particular point of view.

  5. Not that I think there’s anything wrong with “Plus”, but using the most popular .NET immutable class as a precedent, you could have gone with “Concat”. Personally, I prefer “Append”, because it implies ordering too (in most collections, the order is important, whereas “Plus” is technically a commutative operation).

    As far as MutableHouse goes, note that that’s the approach used in OpenStep/NextStep/Cocoa. I hear you on the refactoring pain, but for a ground-up design or something intended for widespread public consumption it might be worth using that paradigm. One particular reason I like this naming approach is that read-only uses are generally more common than writeable uses, and so that optimizes typing for the common case. :) More seriously though, having the writeable case be the explicitly-named case means that the more “dangerous” usage is the one that requires more forethought.

    All that said, while I agree that in many cases having a good name is very helpful, I think it’s also important to have some perspective. The very most important thing about any code is that it works right. The next most important thing is that the implementation is understandable and maintainable.

    Naming is a very useful aspect of an API, but if you get the first two things wrong but the name right, you’re in a much worse position than if you get the first two things right but the name wrong. :)

    I have found myself sometimes very much regretting the time I spent trying to come up with a good name. In the end, I find not everyone appreciates the effort or, worse, thinks I did it all wrong. It’s one of the reasons I like the Hungarian naming convention: it provides specific guidance for naming that takes much of the linguistic pitfalls out of the equation, leaving me more time to write code rather than spending it naming things. :)

    (And before anyone goes complaining about Hungarian, read up on the difference between “apps” and “systems” Hungarian, and take note that the “systems” Hungarian that everyone disparages — and rightly so — isn’t really the actual Hungarian naming convention.) (There used to be a good Wikipedia article on the topic, but for some reason I can’t find it right now, otherwise I’d provide a link here).

  6. @Peter,

    You seem to imply that there is a difference between “understandable and maintainable” and “getting the name right”. Naming is the single most important factor in making sure that a library is understandable, and it is a significant factor in maintainability. Most people don’t recognize this about the .NET framework, but a major reason why it is so easy to just pick up and use is that the framework team spends a HUGE amount of time on naming.

    Moreover, if your library is not easy for other developers to integrate into their applications, they won’t use it, regardless of its functionality. If you spend all your time focusing on correctness and neglect naming, then all of your effort has been in vain.

  7. You seem to imply that there is a difference
    between “understandable and maintainable”
    and “getting the name right”.

    Not at all. You’re conflating two different things. I’m talking about the code _inside_ the library, not the public API. Getting the name right has nothing to do with the understandability of the code inside the library.

    Naming is the single most important factor
    in making sure that a library is understandable

    I will respectfully disagree. Not that naming isn’t important, but there are lots of other ways to make a library understandable. I’ve used too many libraries with poor naming that are still reasonably understandable to think that naming “is the single most important factor”. Is it _a_ factor? Certainly. But I don’t think you can go farther than that with the claim.

    Moreover, if your library is not easy for other
    developers to integrate into their applications,
    they won’t use it, regardless of
    its functionality.

    Sorry, there are too many counter-examples for that to be true. Again, there are lots of other things that factor into whether someone uses a library. A trivial example is Java. The core JDK has all sorts of poorly named things, all over the place, and yet it’s so popular that Microsoft responded in kind with their own language and framework.

    Even .NET isn’t free of poorly-named things. I’ve seen too many people asking perfectly reasonable questions about what a method or property really does for me to think .NET’s primary strength is naming.

    If you want to make naming your very highest priority, by all means…I am the last to want to dissuade you from that. But I can’t say it’s how I want to spend 50% of my development time, nor do I think it’s necessary to do so.

  8. @Peter,

    The fact that neither Java nor .NET (nor any other library or framework in existence for that matter) is perfectly named in no way negates the importance of naming. It merely means that everyone has some room to improve.

    “I’ve used too many libraries with poor naming that are still reasonably understandable to think that naming ‘is the single most important factor'”.

    Obviously “poor naming” is not a binary value, it is a multi-dimensional continuum; many libraries may be well named in some areas and poorly named in others (Java and .NET could both be taken as examples). But in my experience, libraries which have not bothered to come up with decent names are generally incomprehensible without reading reams of documentation and holding a matrix of incongruities in your head. There are simply too many technologies out there to learn for me to waste my time trying to understand a library which has not bothered to spend the time to make itself understandable.

  9. > Let’s hope most of us don’t have to mess with cache invalidation as well.

    You are messing it up, not messing with it, every single time you start a CLR or C# project. Cache gets blasted to oblivion; it’s poor tech.

    Regarding PushLINQ, already done, see CLINQ on cplex..

    Anyway, the only good point was on naming.

  10. >> As an example, we’ve got an operator similar to the oft-implemented ForEach one, but which yields the input sequence back out again

    I used a name “Pass” for this in Reactive LINQ (http://tomasp.net/blog/reactive-iv-reactivegame.aspx). But that’s because my terminology came from F#, so I also used “Listen” instead of “ForEach”.

    >> Regarding PushLINQ, already done, see CLINQ on cplex..
    CLINQ isn’t the first incarnation of that idea either (even if it was done earlier than PushLINQ). See Functional Reactive Programming in Haskell. Every attempt to do something like this is quite different and I’m sure PushLINQ has quite a lot to add as well…

  11. @Tomas: Absolutely. Even if there’s nothing new in Push LINQ in itself, it was certainly new to *me* – I got a lot of “value” just out of investigating the idea and implementing it.

    I suspect very few people get to invent anything *truly* new :)

  12. You can take it as joke if you wish, but most C# or any Java users are blissfully unaware what runtimes do to the CPU and overall hardware architecture.

    If you’re planning to call yourself an engineer, I’d suggest you fire up VTune. Learn it for about a month, and then start watching what kind of damage CLR or JVM does. You will laugh how inefficient the ‘tech’ really is, and the data will not be funny.

  13. @Jedi: If you weren’t joking, then you misunderstood the quote. The point is that designing a cache (not just a CPU level cache, but a general cache – think authentication tokens, security certificates etc) with invalidating in mind is a hard problem.

  14. For your apply-and-return, how about “AfterApply” or “Applied”?

    It tells you an action will be applied, and tells you what you’re getting back is the affected thing.

  15. @Strilanc: Possibly. The point is that usually – unless you write an action which mutates the items, which IMO should be the exception rather than the rule – what you’re getting back is the original results. The primary purpose is to have side-effects elsewhere (e.g. logging) rather than to change the results (where a projection would be more appropriate).
