June | 2006 | Jon Skeet's coding blog

Introduction

Recently, Eric Gunnerson made a post
in his blog with the idea of “the seven deadly sins of
programmers”. Eric is posting his ideas for such a list one at a time, periodically. He invited others to write
their own lists, however, and it was such an intriguing idea that I couldn’t resist. The list is in descending
order of importance (I figure not everyone will make it to the bottom of this post) but the order is fairly
arbitrary anyway. Hopefully none of this will be a surprise to most of my readership, but it’s nice to write
as a sort of manifesto anyway. I’ve included a “personal guilt rating” out of 10 for each of these sins. I’m
very willing to change these in the light of feedback from those with experience of working with me :)

#1 – Overengineering (in complexity and/or performance)

Personal guilt rating: historically 8, currently 3

It’s amazing how much people care about performance. They care about the smallest things, even
if there’s not a chance that those things will have a significant impact in reality. A good example
of this is implementing a singleton. I’ve seen
the double-checked lock algorithm (as described on the page, except often broken) repeatedly brought
forward as the best way to go. There’s rarely any mention of the fact that you have to get it just right
(in terms of making the variable volatile or using explicit memory barriers) in order for it to be properly
thread-safe. There’s no way of making it work in Java, so anyone porting code later will end up with a bug
they’re very unlikely to be aware of. Yet people use it over simply locking every time on the grounds of
performance. Now, acquiring an uncontested lock is very, very cheap in .NET. Yes, it’ll be slightly more expensive
on multi-processor boxes, but it’s still mind-bogglingly quick. The time taken to lock, check a variable for nullity
and then unlock is unlikely to cause a significant issue in almost any real world application. There may be a very,
very few where it would become a problem – but developers of those applications can profile and change the code
when they’ve proved that it’s a problem. Until that time, using double-checked locking just adds complexity
for no tangible benefit.

That’s just a single example. People are always asking performance questions on the newsgroups which just won’t
make a difference. For example, if you have a string reference in an Object expression, is it faster
to cast it to String or to call the toString method? Now, don’t get me wrong – I find
it very interesting to investigate this kind of thing, but it’s only as a matter of interest – not because I think
it should affect what code should be written. Whatever the result, the simplest code which achieves
the result in the most readable fashion is the best code until performance has proved to be an issue.

I should stress that this doesn’t extend to being stupid about performance. If I’m going to concatenate an unknown
number of strings together, I’ll use StringBuilder
of course – that’s what it’s designed for, and I’ve seen that it can make a huge difference in real world situations.
That’s the key though – it’s evidence based optimisation. In this case it’s general past evidence, whereas for many other
situations I’d use application-specific evidence, applying an optimisation which may make the code harder to read only
when I’ve proved that the particular application in question is suffering to a significant extent.

The other side of this issue is complexity in terms of what the solution can achieve. These days,
I mostly code for testability, knowing that if I can test that my code does what it wants to, chances are
it’ll be flexible enough to meet my needs without being overly complicated. I look for complexity all over the place,
particularly in terms of trying to anticipate a complicated requirement without knowing that the requirement will
actually be used. For instance, if I’m writing some kind of collection (not that that happens often), I won’t add sorting capabilities
until I know they’re needed. It just creates more work if you write unnecessary code. Of course, when writing libraries for
public consumption, things become much trickier – you basically can’t tell how a library will be used ahead of time,
so you may well end up adding features which are rarely used. The closer the communication between the code and its clients,
the better. (This is also relevant in terms of performance. One area I did spend time optimising was the
enhanced locking capabilities part of my miscellaneous
utility library. Locking is cheap, so any replacement for “standard” locking should also be cheap, otherwise it discourages
its own use.)

When I look at a design and see that it’s simple, I’m happy. When I look at implementation and see that it’s simple, I’m happy.
It’s not “clever” to write complicated code. Anyone can write complicated code – particularly if they don’t check that it works.
What takes time is writing simple code which is still powerful.

#2 – Not considering the code’s readership

Personal guilt rating: 4

This is actually closely linked to the first sin, but with a more human face. We know that code is generally in
maintenance for longer than it’s under initial development. We know that companies often (not always, thankfully) put their
“top developers” (however they decide to judge that) onto feature development rather than maintenance. These make it
essential that we consider “the next guy” when writing our code. This doesn’t necessarily mean reams of documentation –
indeed, too much documentation is as bad as too little documentation, as it can make the code harder to read (if it’s
inline code documentation) or be hard to find your way around (if it’s external documents). One project I was working on
decided to extract one document describing data formats from the main system architecture document, and we found that the
extracted document became absolutely crucial to both testers and coders. That document was kept accurate, and was short enough
to be easy to follow. The document from which it was extracted was rarely used.

Of course, the simpler the code, the less documentation is required. Likewise, well-written unit tests can often express the
correct behaviour (and expected use) of a class more succinctly than lots of documentation – as well as being kept accurate
automatically.

There are times when writing good documentation is very difficult indeed. Recently I wrote a class which did exactly the right
thing, and did it in an elegant manner. However, explaining what its purpose was even in person was difficult. Understanding
it from just the documentation would be almost impossible – unless the reader looked at what problem was being solved and worked
through what was required, which would lead fairly naturally to the same kind of solution. I’m not proud of this. I’m proud of the
class itself, but I don’t like finding myself stuck for words.

Sometimes, simple code (in terms of number of characters) is quite complicated. At one point on a project I had to find
whether only a single bit was set in a long. I’m no good at remembering the little tricks involved for this kind of thing,
but I’m aware they exist, and using them can be a lot more reliable than writing bit-twiddling loops to do things in a
long-winded fashion. In this case, we found the appropriate trick on the web, and included a link in the code. Without the link
(or at least a description in a comment) the code would have been effectively incomprehensible to anyone who didn’t recognise the
trick.

Code reviews definitely help readability – when they’re done properly. In some ways, they can be better than pair programming
for this, particularly if the original developer of the code doesn’t try to explain anything to the reviewer. At this point
the reviewer is able to take on the role of the first person who has to maintain the code – if anything isn’t clear just from
what’s available in terms of code and documentation (as opposed to human intervention) then that probably needs a bit of work.
Not verbally explaining what’s happening when you’re the author is incredibly difficult to do, and I’m very bad at it. It adds
time to the review, and when you’re under a lot of pressure (and who isn’t?) it can be very frustrating to watch someone
painstakingly understanding your code line by line. This is not to say that code reviews shouldn’t be the source of discussion
as well, of course – when I’m working with people I respect, I rarely come through a review without some changes to my code,
and the reverse is true too. After all, what are the chances that anyone gets something absolutely right to start with?

#3 – Assuming your code works

Personal guilt rating: historically 9, currently 3

Ever since I heard about unit testing I’ve seen the appeal, but I didn’t start to use
it regularly until 2005 when I joined Clearswift and met Stuart. Until then, I hadn’t heard about mock objects
which I find absolutely crucial in unit testing. I had read articles and seen examples of unit tests, but everything seemed to fall
apart when I tried to write my own – everything seemed to need something else. Of course, taken to extremes this is often a fault
of the code itself, where some classes require a complete system to be up and running before they can do anything. Unit testing
such a beast is difficult to say the least. In other situations you merely need to be able to specify a collaborator, which is where
mock objects come in.

So, since finding out about mock objects I’ve been more and more keen on unit testing. I know it’s not a silver bullet – I know that more
testing is required, both at an integration level between components, and at a system level, sometimes including manual tests. However,
once I started unit testing regularly, I got a sense of just how often code which I’d assumed would work simply wouldn’t. I’ve seen examples
of code which could never have possibly worked, with any input – so they can’t possibly have been used, or the results weren’t actually
important in the first place.

These days, I get frustrated when I either have to work with code which isn’t under test and can’t be easily put under
test due to the design (see point 7 on Eric’s list, excessive coupling) or when I’m writing code which is necessarily difficult to test.
Obviously I try to minimise the amount of the code which really can’t be tested – but sometimes it’s a significant amount. Urgh.

I currently don’t have many tests around my Miscellaneous Utility Library. I’ve
resolved to add tests for any new features I add or bugs I find though. Someone mailed me about a bug within the Utf32String
class. In the process of writing a unit test to demonstrate that bug I found at least two or three others – and that wasn’t even
trying to exercise the whole code, just enough to get to the original bug. I only mention this as a defence against the “I don’t need
unit tests – my code doesn’t have bugs in it” mentality. I would really like to think that I’m a pretty good coder – but everyone mistakes.
Unit tests won’t catch all of them, but it gets an awful lot. It also acts as documentation and gives you a much better base on which to
refactor code into the right shape, of course…

#4 – Using the wrong tool for the job

Personal guilt rating: 3

“When all you have is a hammer, everything looks like a nail.” That’s the typical quote used when tackling this topic. I hope this
sin is fairly self-explanatory. If all you know is Java and C#, you’ll find it a lot harder to solve problems which are best solved
with scripts, for instance. If all you know is C, you’ll find writing a complex web app a lot harder than you would with Java or
.NET. (I know it’s doable, and I used to do it – but it was really painful.) If you’re writing a device driver, you’ll find life
sucks pretty hard if you don’t know C or C++, I suspect.

Using the wrong tool for the job can be really damaging in terms of maintenance. While bad code can often be refactored into
good code over time (with a lot of effort) there are often significant implications in changing implementation language/technology.

This is a really good reason to make sure you keep yourself educated. You don’t need to necessarily keep up to date with all
the buzzword technologies – and indeed you’d find you did nothing else if you tried to keep up with everything – but there’s
always plenty to learn. Recently I’ve been looking at
Windows PowerShell
and Groovy. Next on my list is Squeak (Smalltalk).
I’ve been promising myself that I’d learn Smalltalk for years – at a recent Scrum
training course I met yet another Smalltalk evangelist, who had come from the Java side of things. It’s got to be worth trying…

#5 – Excessive code pride

Personal guilt rating: 2

I was recently pair programming and looking at some code Stuart had written a couple of weeks before. There were various bits
I wasn’t sure about, but thought were probably a bit smelly, and I asked my pairing partner to include a lot of comments
such as // TODO: Stuart to justify this code and
// TODO: Stuart to explain why this test is useful in the slightest. It’s worth bearing in mind at this point that
Stuart is significantly senior to me. With some people, comments like this would have been a career-limiting move. Stuart,
however, is a professional. He knows that code can usually be improved – and that however hard we try, we sometimes take
our eyes off the ball. Stuart had enough pride in his code to feel a need to fix it once the flaws had been pointed out,
but not enough pride to blind him to the flaws in the first place. This appropriate level of pride is vital when you’re working
with others, in my view. I don’t mind if people change my code – assuming they improve it. I expect that to happen
at a good code review, if I haven’t been pairing. A code review which is more of a rubber stamp than anything else is just a
waste of time.

This doesn’t mean I will always agree with others, of course. If I think my design/code is better than their suggested
(or even committed!) change I’m happy to put my case robustly (and with no deference to seniority) – but usually if someone’s
put in sufficient effort to understand and want to change my code in the first place, chances are they’ll think of something
I haven’t.

Write code you can be proud of – but don’t be proud to the point of stubbornness. Be prepared to take ownership of code you write
in terms of being responsible for your own problems – but don’t try to own it in terms of keeping other people out of it.

#6 – Failing to acknowledge weaknesses

Personal guilt rating: 2

ASP.NET, JSP, SQL, Perl, COBOL, Ruby, security, encryption, UML, VB… what do they all have in common? I wouldn’t claim
to “know” any of them, even though I use some on a regular basis. They are just some of my many technological weak spots.
Ask me a question about C# or Java in terms of the languages and I’ll be fairly confident about my chances of
knowing the answer (less so with generics). For lots of other stuff, I can get by. For the rest, I would need to immediately
turn to a book or a colleague who knows the subject. It’s important to know what you know and what you don’t.

The result of believing you know more than you actually do is usually writing code which just about works, at least in
your situation, but which is unidiomatic, probably non-performant, and quite possibly fails on anything other than the
data you’ve given it.

Ask for help when you need it, and don’t be afraid to admit to not being an expert on everything. No-one is an expert
at everything, even though Don Box does a pretty good impression of such a person…

#7 – Speaking with an accent

Personal guilt rating: 6 (but conscious and deliberate in some places)

Some of the worst Java code I’ve seen has come from C++ developers who then learned Java. This code typically
brings idioms of C/C++ such as tests like if (0==x) (which is safer than if (x==0) in C as missing
out an equals sign would just cause an accidental assignement rather than a compiler error. Similarly, Java code which
assumes that double and Double mean the same thing (as they do in C#) can end up behaving
contrary to expectations.

This is related to sin #6, in terms of people’s natural reaction to their own ignorance: find something similar that
they’re not ignorant about, and hope/assume that the similarities are enough to carry them through. In a way, this can
make it a better idea to learn VB.NET if you know Java, and C# if you know VB. (There are other pros and cons, of course –
this is only one tiny aspect.)

One way of trying to make yourself think in the language you’re actually writing in rather than the similar language you’re
more familiar with is to use the naming conventions of the target language. For instance, .NET methods are conventionally
PascalCased whereas Java methods are conventionally camelCased. If you see Pascal casing in Java code, look for other C#
idioms. Likewise, if you see method_names_with_underscores in either language, look for C/C++ idioms. (The most obvious C
idiom is likely to be checking return codes instead of using exceptions.)

Naming conventions are the most obvious “tell” of an accent, but sometimes it may be worth going against them deliberately.
For instance, I like the .NET conventions of prefixing interfaces with I, and of using Pascal casing for constants instead of
JAVA_SHOUTY_CONSTANTS. It’s important when you favour this sort of “breakaway” behaviour that you consult with the rest of
the team. The default should always be the conventions of the language you’re working in, but if the whole team decides that
using parts of a different convention helps more than it reduces consistency with other libraries, that’s reasonable.

What isn’t so reasonable is breaking the coding idioms of a language – simply because they other languages’ idioms
tend not to work. For example, RAII just doesn’t work in Java, and isn’t
automatic in C# either (you can “fake it” with a using statement, but I’m not sure that really counts as RAII).
Idiomatic code tends to be easier to read (particularly for those who are genuinely familiar with the language to start with)
and less semantically troublesome than “imported” styles from other languages.

Conclusion

So, that’s the lot. Ask me in a year and I may well have a different list, but this seems like a good starting point.
I’ve committed every sin in here at least once, so don’t feel too guilty if you have too. If you agree with them
and are still breaking them, that’s worth feeling a bit guilty about. If you disagree with them to start with,
that’s fair enough – add a comment to say why :)

Jon Skeet's coding blog

Monthly Archives: June 2006

The 7 Deadly Sins of Software Development