Current book project: Real World Functional Programming

February 27, 2009 jonskeet 14 Comments

Now that it’s all official, I can reveal that since the tail end of last year I’ve been helping out with Real World Functional Programming by Tomáš Petříček (and me, ish).

I’m doing the same kinds of things I did with Groovy in Action, coming to the text with fresh eyes: I’m technically competent but don’t have any expertise in the subject matter. This means I can see it as a “real” reader would, challenge the assumptions, point out places which need more explanation etc. I’m also polishing the language a bit and adding a few informative notes. I suspect by the end there won’t be a single paragraph that I’ve written in its entirety, but there’ll also be few paragraphs which I haven’t touched at all.

It’s exciting to get a chance to work on the book, and I hope I’m improving it. Check it out, join the MEAP, and give us feedback :)

I have a couple of other books on the back-burner at the moment too; hopefully I’ll be able to give more details reasonably soon.

C#, General, Java, LINQ

What’s in a name?

February 27, 2009 jonskeet 21 Comments

T.S. Eliot had the right idea when he wrote “The naming of cats”:

The Naming of Cats is a difficult matter,
It isn’t just one of your holiday games
…
When you notice a cat in profound meditation,
The reason, I tell you, is always the same:
His mind is engaged in a rapt contemplation
Of the thought, of the thought, of the thought of his name:
His ineffable effable
Effanineffable
Deep and inscrutable singular Name.

Okay, so developers may not contemplate their own names much, but I know I’ve certainly spent a significant amount of time recently trying to work out the right name for various types and methods. It always feels like it’s just out of reach; tauntingly, tantalisingly close.

Recently I’ve been thinking a bit about what the goals might be in coming up with a good name. In particular, I seem to have been plagued with the naming problem more than usual in the last few weeks.

Operations on immutable types

A while ago I asked a question on Stack Overflow about naming a method which “adds” an item to an immutable collection. Of course, when I say “adds” I mean “returns a new collection whose contents is the old collection and the new item.” There’s a really wide range of answers (currently 38 of them) which mostly seem to fall into four categories:

Use Add because it’s idiomatic for .NET collections. Developers should know that the type is immutable and act accordingly.
Use Cons because that’s the term functional programming has used for this exact operation for decades.
Use a new method name (Plus being my favourite at the moment) which will be obvious to non-functional developers, but without being so familiar that it suggests mutability.
Use a constructor taking the old collection and the new item.

Part of the reasoning for Add being okay is that I originally posted the question purely about “an immutable collection” – e.g. a type which would have a name like ImmutableList<T>. I then revealed my true intention (which I should have done from the start) – to use this in MiniBench, where the “collection” would actually be a TestSuite. Everything in MiniBench is immutable (it’s partly an exploration in functional programming, as it seems to fit very nicely) but I don’t want to have to name every single type as Immutable[Whatever]. There’s the argument that a developer should know at least a little bit about any API they’re using, and the immutability aspect is one of the first things they should know. However, MiniBench is arguably an extreme case, because it’s designed for sharing test code with people who’ve never seen it before.

I’m pretty sure I’m going to go with Plus in the end:

It’s close enough to Add to be familiar
It’s different enough to Add to suggest that it’s not quite the same thing as adding to a normal collection
It sounds like it returns something – a statement which just calls Plus without using the result sounds like it’s wrong (and indeed it would be)
It’s meaningful to everyone
I have a precedent in the Joda Time API

Another option is to overload the + operator, but I’m not really sure I’m ready to do that just yet. It would certainly leave brief code, but is that really the most important thing?

Let’s look at a situation with some of the same issues…

LINQ operators

Work on MoreLINQ has progressed faster than expected, mostly because the project now has four members, and they’ve been expending quite a bit of energy on it. (I must do a proper consistency review at some point – in particular it would be nice to have the docs refer to the same concepts in the same way each time. I digress…)

Most of the discussion in the project hasn’t been about functionality – it’s been about naming. In fact, LINQ is particularly odd in this respect. If I had to guess at how the time has been spent (at least for the operators I’ve implemented) I’d go for:

15% designing the behaviour
20% writing the tests
10% implementation
5% writing the documentation (just XML docs)
50% figuring out the best name

It really is that brutal – and for a lot of the operators we still haven’t got the “right” name yet, in my view. There’s generally too much we want to convey in a word or two. As an example, we’ve got an operator similar to the oft-implemented ForEach one, but which yields the input sequence back out again. Basically it takes an action, and for each element it calls the action and then yields the element. The use case is something like logging. We’ve gone through several names, such as Pipe, Tee, Via… and just this morning I asked a colleague who suggested Apply, just off the top of his head. It’s better than anything we’d previously thought of, but does it convey both the “apply an action” and “still yield the original sequence” aspects?

The old advice of “each method should only do one thing” is all very well, and it clearly helps to make naming simpler, but with situations like this one there are just naturally more concepts which you want to get across in the name.

Let’s stay on the LINQ topic, but stray a bit further from the well-trodden path…

The heart of Push LINQ: IDataProducer

I’ve probably bored most of you with Push LINQ by now, and I’m not actively developing it at the moment, but there’s still one aspect which I’m deeply uncomfortable with: the core interface. IDataProducer represents a stream of data which can be observed. Basically clients subscribe to events, and their event handlers will be called when data is “produced” and when the stream ends.

I know IDataProducer is an awful name – but so far I haven’t found anything better. IObservable? Ick. Overused and isn’t descriptive. IPushEnumerable? Sounds like the client can iterate over the data, which they can’t. The actual event names (DataProduced/EndOfData) are okay but there must be something better than IDataProducer. (Various options have been suggested in the past – none of them have been so obviously “right” as to stick in my head…)

This situation is slightly different to the previous ones, however, simply because it’s such a pivotal type. You would think that the more important the type, the more important the name would be – but in some ways the reverse is true. You see, Push LINQ isn’t a terribly “obvious” framework. I say that without shame – it’s great at what it does, but it takes a few mental leaps before you really grok it. You’re really going to have to read some documentation or examples before you write your own queries.

Given that constraint, it doesn’t matter too much what the interface is called – it’s going to be explained to you before you need it. It doesn’t need to be discoverable – whereas when you’re picking method names to pop up in Intellisense, you really want the developer to be able to guess its purpose even before they hover over it and check the documentation.

I haven’t given up on IDataProducer (and I hope to be moving Push LINQ into MoreLINQ, by the way – working out a better name is one of the blockers) but it doesn’t feel like quite as much of a problem.

Read-only or not read-only?

This final example came up at work, just yesterday – after I’d started writing this post. I wanted to refactor some code to emphasize which methods only use the read-only side of an interface. This was purely for the sake of readability – I wanted to make it easier to reason about which areas of the code modified an object and which didn’t. It’s a custom collection – the details don’t matter, but for the sake of discussion let’s call it House and pretend we’re modelling the various things which might be in a house. (This is Java, hence House rather than IHouse.)

I’m explicitly not doing this for safety – I don’t mind the fact that the reference could be cast to a mutable interface. The point is just to make it self-documenting that if a method only has a parameter in the non-mutating form, it’s not going to change the contents of the house.

So, we have two interfaces, like this:

public interface NameMePlease
{
Color getDoorColor();
int getWindowCount();

// This already returned a read-only collection
Set<Furniture> getFurniture();
}

public interface House extends NameMePlease
{
    void setDoorColor(Color doorColor);
    void setWindowCount(int windows);
    void addFurniture(Furniture item);
}

Obviously the challenge is to find a name for NameMePlease. One option is to use something like ImmutableHouse or ReadOnlyHouse – but the inheritance hierarchy makes liars of both of those names. How can it be a ReadOnlyHouse if there are methods in an implementation which change it? The interface should say what you can do with the type, rather than specifying what you can’t do – unless part of the contract of the interface is that the implementation will genunely prohibit changes.

Thinking of this “positive” aspect led me to ReadableHouse, which is what I’ve gone with for the moment. It states what you can do with it – read information. Again, this is a concept which Joda Time uses.

Another option is to make it just House, and change the mutable interface to MutableHouse or something similar. In this particular situation the refactoring involved would have been enormous. Simple to automate, but causing a huge check-in for relatively little benefit. Almost all uses are actually mutating ones. The consensus within the Google Java mailing list seems to be that this would have been the preferred option, all things being equal. One interesting data point was that although Joda Time uses ReadableInstant etc, the current proposals for the new date/time API which will be included in Java 7, designed by the author of Joda Time, don’t use this convention. Presumably the author found it didn’t work quite as well as he’d hoped, although I don’t have know of any specific problems.

Conclusion

You’ll probably be unsurprised to hear that I don’t have a recipe for coming up with good names. However, in thinking about naming I’ve at least worked out a few points to think about:

Context is important: how discoverable does this need to be? Is accuracy more important than brevity? Do you have any example uses (e.g. through tests) which can help to see whether the code feels right or not?
Think of your audience. How familiar will they be with the rest of the code you’re writing? Are they likely to have a background in other areas of computer science where you could steal terminology? Can you make your name consistent with other common frameworks they’re likely to use? The reverse is true too: are you reusing a familiar name for a different concept, which could confuse readers?
Work out the information the name is trying to convey. For types, this includes working out how it participates in inheritance. Is it trying to advertise capabilities or restrictions?
Is it possible to make correct code look correct, and buggy code look wrong? This is rarely feasible, but it’s one of the main attractions of “Plus” in the benchmark case. (I believe this is one of the main selling points of true Hungarian Notation for variable naming, by the way. I’m not generally a fan, but I like this aspect.)

I may expand this list over time…

I think it’s fitting to close with a quote from Phil Karlton:

There are only two hard things in Computer Science: cache invalidation and naming things.

Almost all of us have to handle naming things. Let’s hope most of us don’t have to mess with cache invalidation as well.

General, Stack Overflow

Answering technical questions helpfully

February 17, 2009 jonskeet 19 Comments

I’m unsure of whether this should be a blog post or an article, so I’ll probably make it both. I’ve probably written most of it before in Stack Overflow answers, but as I couldn’t find anything when I was looking earlier (to answer a question about Stack Overflow answers – now deleted, so only available for those over 10k rep) I figured it was time to write something in a medium I had more control over.

This is not a guide to getting huge amounts of reputation on Stack Overflow. As it happens, following the guidelines here is likely to result in decent rep, but I’m sure there are various somewhat underhand ways of gaining reputation without actually being helpful. In other words, I’m not going to explain how you might game the system, but just share my views on how to work with the system for the benefit of all involved. A lot of this isn’t specific to Stack Overflow, but some of the advice isn’t applicable to some forums.

Hypocrisy warning: I’m not going to claim I always follow everything in this list. I try, but that’s not the same thing – and quite often time pressures will compromise the quality of an answer. Oh, and don’t read anything into the ordering here. It’s how it happened to come out, that’s all.

Read the question

All too often I’ve written what I thought was a great answer… only to reread the question and find out that it wasn’t going to help the questioner at all.

Now, questions are often written pretty badly, leaving out vital information, being vague about the problem, including lots of irrelevant code but leaving out the crucial bit which probably contains the bug, etc. It’s at least worth commenting on the question to ask for more information, but just occasionally you can apply psychic debugging to answer the question anyway. If you do have to make assumptions when writing an answer, state them explicitly. That will not only reduce the possibility of miscommunication, but will also point out the areas which need further clarification.

Code is king

I can’t believe I nearly posted this article without mentioning sample code.

Answers with sample code are gold… if the code is appropriate. A few rules of thumb:

Make sure it compiles, assuming it’s meant to. This isn’t always possible – for example, I often post at work from a machine without .NET on it, or on the train from a laptop without Java. If you can’t get a compiler to make sure your code is valid, be extra careful in terms of human inspection.
Snippets are okay, but complete programs rock. If you’re not already comfortable with writing short console apps, practise it. Often you can write a complete app in about 15 lines which doesn’t just give the solution, but shows it working. Imagine the level of confidence that gives to anyone reading your answer. Get rid of anything you don’t need – brevity is really helpful. (In C# for example that usually means getting rid of [STAThread], namespace declarations and unused using directives.)
Take a bit of care over formatting. If possible, try to prevent the code wrapping. That’s not always realistically feasible, but it’s a nice ideal. Make sure the spacing at least looks okay – you may want to use a 2-space indent if your code involves a lot of nesting.
If you skip best practices, add a comment. For example, you might include // Insert error handling here to indicate that production code really should check the return value etc. This doesn’t include omitting the STAThread attribute and working without a namespace – those are just reasonable assumptions and unlikely to be copied wholesale, whereas the main body of the code may well be. If your code leaks resources, someone’s production code may do so as well…

Code without an explanation is rarely useful, however. At least provide a sentence or two to explain what’s going on.

Answer the question and highlight side-issues

Other developers don’t always do things the way we’d like them to. Questions often reflect this, basically asking how to do something which (in our view) shouldn’t be attempted in the first place. It may completely infeasible, or it may just be a really bad idea.

Occasionally, the idea is so awful – and possibly harmful to users, especially when it comes to security questions – that the best response is just to explain (carefully and politely) why this is a really bad thing to do. Usually, however, it’s better to answer the question and give details of better alternatives at the same time. Personally I prefer to give these alternatives before the answer to the question asked, as I suspect it makes it more likely that the questioner will read the advice and take it on board. Don’t forget that the more persuasive you can be, the more likely it is they’ll abandon their original plans. In other words, “Don’t do this!” isn’t nearly as useful as “Don’t do this because…”

It’s okay to guess, but be honest

This may be controversial. I’ve certainly been downvoted twice on SO for having the temerity to post an answer without being 100% sure that it’s the right one – and (worse?) for admitting as much.

Sometimes there are questions which are slightly outside your own area of expertise, but they feel an awful lot like a situation you’ve been in before. In this kind of case, you can often write an answer which may well help – and would at least suggest something for the questioner to investigate a a possible answer to their problem. Sometimes you may be way off base, which is why it’s worth explaining in your answer that you are applying a bit of educated guesswork. If another answer is posted by an expert in the topic, it may well be worth the questioner trying their solution first… but at least you’re providing an alternative if they run out of other possibilities.

Now, somewhat contradictory…

Raise the overall accuracy level

It should go without saying that a correct answer is more helpful than an incorrect one. There are plenty of entirely inaccurate answers on Stack Overflow – and on newsgroups, and basically every online community-based resource I’ve ever seen. This isn’t surprising, but the best ways to counter it are:

Challenge inaccurate information
Provide accurate information yourself

One of the key aspects of this is to provide evidence. If you make an objective statement without any sort of disclaimer about your uncertainty, that should mean you’ve got good reason to believe you’re correct. That doesn’t necessarily mean you need to provide evidence, but as soon as there’s disagreement, evidence is king. If you want to assert that your code is faster than some other code, write a benchmark (carefully!). If you want to prove that an object can be collected at a certain point in time, write a test to show it. Short but complete programs are great for this, and can stop an argument dead in its tracks.

Another source of evidence is documentation and specifications. Be aware that they’re not always accurate, but I generally believe documentation unless I have a specific reason to doubt it.

Provide links to related resources

There have been a few questions on Stack Overflow as to whether it’s appropriate to link to other resources on the web. My own opinion is that it’s absolutely appropriate and can add a lot of value to an answer. In particular, I like to link to:

MSDN and JavaDoc documentation, or the equivalent for other platforms. With MSDN URLs, if they end in something like http://msdn.microsoft.om/foo(VS80).aspx, take the bit in brackets out of the URL (leaving http://msdn.microsoft.om/foo.aspx in this case). That way the link will always be to the most recent version of the documentation, and it doesn’t give WMD as many problems either.
Language specifications, in particular those for C# and Java. I generally link to the Word document for the C# spec, which has the disadvantage that I can’t link to a specific section, and it won’t just open in the browser. On the other hand, I find it easier to navigate than the MSDN version, and I’ve seen inaccuracies in MSDN.
My own articles and blog posts (unsurprisingly :)
Wikipedia
Other resources which are unlikely to become unavailable

The point about the resources not becoming unavailable is important – one of the main arguments against linking is that the page might go away in the future. That’s not a particularly compelling argument for most reference material IMO, but it is relevant in other cases. Either way, it’s worth including some sort of summary of what you’re linking to – a link on its own doesn’t really invite the reader to follow it, whereas a quick description of what they’ll find there provides more incentive.

One quick tip: In Chrome I have an MSDN “search engine” set up and in Firefox a keyword bookmark. In both cases the URL is http://msdn.microsoft.com/en-us/library/%s.aspx – this makes it easier to get to the MSDN page for a particular namespace, type or member. For example, by typing “msdn system.io.fileinfo” I get straight to the FileInfo page. It doesn’t work for generics, however. At some point I’d like to make this simpler somehow…

Care about your reader: spelling, grammar and style matter

I’m lucky: I’m a native English speaker, and I have a reasonably good natural command of English. Having said that, I still take a certain amount of care when writing answers: I’ll often rewrite a sentence several times until I feel it works. I’ve noticed that answers with correct spelling and grammar are generally upvoted more than ones with essentially the same content but less careful presentation.

I’m not recommending style over substance; I’m saying that both are important. You could be putting forward the most insightful comment in the world, but if you can’t communicate it effectively to your readers, it’s not going to help anyone. Having said that…

A time-limited answer may be better than no answer at all

I answer Stack Overflow questions in whatever spare time I have: waiting for a build, on the train, taking a break from editing etc. I frequently see a question which would take a good 15 minutes or more to answer properly – but I only have 30 seconds. If the question already has answers, there’s probably no sensible contribution I can make, but if the question is completely unanswered, I sometimes add a very short answer with the most important points I can think of at the time.

Ideally, I’d then go back later and edit the answer to make it more complete – but at least I may have given the questioner something to think about or an option to try. Usually these “quickies” are relatively fruitless, but occasionally I’ll come back to find that the answer was accepted: the slight nudge in the answer was all that was required. If I’m feeling diligent at that point I’ll still complete the answer, but the point is that a brief answer is usually better than nothing.

Don’t be afraid to delete (or edit heavily) useless answers

It’s almost inevitable that if you post enough answers, one of them will be less than helpful. It may start off being a good one, but if a later answer includes all the information from your answer and more, or explains it in a better way, it’s just clutter.

Likewise if you make a wild stab in the dark about the cause of a problem, and that guess turns out to be wrong, your answer could become actively unhelpful. Usually the community will let you know this in comments or voting, but sometimes you have to recognise it on your own.

Be polite

It’s a shame that I have to include this, and it’s even more of a shame that I need to take better notice of it myself. However boneheaded a question is, there’s no need to be rude. You can express your dismay at a question, and even express dismay at someone failing to read your answer properly, without resorting to inflammatory language. In the end, no-one really wins from a flame war. Remember that there’s a real person at the other end of the network connection, and they’re probably just as frustrated with you as you are with them. If things are getting out of hand, just write a polite note explaining that you don’t think it’s productive to discuss the topic any more, and walk away. (This can be surprisingly difficult advice to heed.)

Next time I get too “involved” in a question, could someone please direct me to this point? And don’t take “But I’m right, darnit!” as an excuse.

Don’t “answer and run”

Sometimes an answer is very, very nearly spot on – but that final 1% is all the difference between the reader understanding fully and having a dangerous misunderstanding of the topic.

Stack Overflow makes it pretty easy to see comments to your answers: monitor this carefully so you can respond and clarify where appropriate. Many web forums have an “email me if this thread is updated” option – again, this is useful. It’s really frustrating (as a questioner) to be left hanging with an answer which looks like it might solve a real headache, if only you could just get the ear of the author for a moment.

Having said all of this:

Have fun

In my experience the most useful users are the ones who are obviously passionate about helping others. Don’t do it out of some misguided sense of “duty” – no-one’s paying you to do this (I assume) and no-one can reasonably complain if you just haven’t got the time or energy to answer their question. Save yourself for a time when you can be more enthusiastic and enjoy what you’re doing. Go ahead, have a cup of coffee and watch a Youtube video or something instead. You don’t have to check whether there are any new questions!

Benchmarking, C#, General, Stack Overflow

Benchmarking: designing an API with unusual goals

February 2, 2009 jonskeet 8 Comments

In a couple of recent posts I’ve written about a benchmarking framework and the results it produced for using for vs foreach in loops. I’m pleased with what I’ve done so far, but I don’t think I’ve gone far enough yet. In particular, while it’s good at testing multiple algorithms against a single input, it’s not good at trying several different inputs to demonstrate the complexity vs input size. I wanted to rethink the design at three levels – what the framework would be capable of, how developers would use it, and then the fine-grained level of what the API would look like in terms of types, methods etc. These may all sound quite similar on the face of it, but this project is somewhat different to a lot of other coding I’ve done, mostly because I want to lower the barrier to entry as far as humanly possible.

Before any of this is meaningful, however, I really needed an idea of the fundamental goal. Why was I writing yet another benchmarking framework anyway? While I normally cringe at mission statements because they’re so badly formulated and used, I figured this time it would be helpful.

Minibench makes it easy for developers to write and share tests to investigate and measure code performance.

The words in bold (or for the semantically inclined, the strong words) are the real meat of it. It’s quite scary that even within a single sentence there are seven key points to address. Some are quite simple, others cause grief. Now let’s look at each of the areas of design in turn.

Each element of the design should either clearly contribute to the mission statement or help in a non-functional way (e.g. make the project feasible in a reasonable timeframe, avoid legal issues etc). I’m aware that with the length of this post, it sounds like I’m engaging in "big upfront design" but I’d like to think that it’s at least informed by my recent attempt, and that the design criteria here are statements of intent rather than implementation commitments. (Aargh, buzzword bingo… please persevere!)

What can it do?

As we’ve already said, it’s got to be able to measure code performance. That’s a pretty vague definition, however, so I’m going to restrict it a bit – the design is as much about saying what isn’t included as what is.

Each test will take the form of a single piece of code which is executed many times by the framework. It will have an input and an expected output. (Operations with no natural output can return a constant; I’m not going to make any special allowance for them.)
The framework should take the tedium out of testing. In particular I don’t want to have to run it several times to get a reasonable number of iterations. I suspect it won’t be feasible to get the framework to guess appropriate inputs, but that would be lovely if possible.
Only wall time is measured. There are loads of different metrics which could be applied: CPU execution time, memory usage, IO usage, lock contention – all kinds of things. Wall time (i.e. actual time elapsed, as measured by a clock on the wall) is by far the simplest to understand and capture, and it’s the one most frequently cited in newsgroup and forum questions in my experience.
The benchmark is uninstrumented. I’m not going to start rewriting your code dynamically. Frankly this is for reasons of laziness. A really professional benchmarking system might take your IL and wrap it in a timing loop within a single method, somehow enforcing that the result of each iteration is used. I don’t believe that’s worth my time and energy, as well as quite possibly being beyond my capabilities.
As a result of the previous bullet, the piece of code to be run lots of times needs to be non-trivial. The reality is that it’ll end up being called as a delegate. This is pretty quick, but if you’re just testing "is adding two doubles faster or slower than adding two floats" then you’ll need to put a bit more work in (e.g. having a loop in your own code as well).
As well as the use case of "which of these algorithms performs the best with this input?" I want to support "how does the performance vary as a function of the input?" This should support multiple algorithms at the same time as multiple inputs.
The output should be flexible but easy to describe in code. For single-input tests simple text output is fine (although the exact figures to produce can be interesting); for multiple inputs against multiple tests a graph would often be ideal. If I don’t have the energy to write a graphing output I should at least support writing to CSV or TSV so that a spreadsheet or graphing tool can do the heavy lifting.
The output should be useful – it should make it easy to compare the performance of different algorithms and/or inputs. It’s clear from the previous post here that just including the scaled score doesn’t give an obvious meaning. Some careful wording in the output, as well as labeled columns, may be required. This is emphatically not a dig at anyone confused by the last post – any confusion was my own fault.

Okay, that doesn’t sound too unreasonable. The next area is much harder, in my view.

How does a developer use it?

Possibly the most important word in the mission statement is share. The reason I started this project at all is that I was fed up with spending ages writing timing loops for benchmarks which I’d then post on newsgroups or Stack Overflow. That means there are two (overlapping) categories of user:

A developer writing a test. This needs to be easy, but that’s an aspect of design that I’m reasonably familiar with. I’m not saying I’m good at it, but at least I have some prior experience.
A developer reading a newsgroup/forum post, and wanting to run the benchark for themselves. This distribution aspect is the hard bit – or at least the bit requiring imagination. I want the barrier to running the code to be really, really low. I suspect that there’ll be a "fear of the unknown" to start with which is hard to conquer, but if the framework becomes widely used I want the reader’s reaction to be: "Ah, there’s a MiniBench for this. I’m confident that I can download and run this code with almost no effort."

This second bullet is the one that my friend Douglas and I have been discussing over the weekend, in some ways playing a game of one-upmanship: "I can think of an idea which is even easier than yours." It’s a really fun game to play. Things we’ve thought about so far:

A web page which lets you upload a full program (without the framework) and spits out a URL which can be posted onto Stack Overflow etc. The user would then choose from the following formats:

Single .cs file containing the whole program – just compile and run. (This would also be shown on the download page.)
Test code only – for those whole already have the framework
Batch file – just run it to extract/build/run the C# code.
NAnt project file containing the C# code embedded in it – just run NAnt
MSBuild project file – ditto but with msbuild.
Zipped project – open the project to load the test in one file and the framework code in other (possibly separate) .cs files
Zipped solution – open to load two projects: the test code in one and the framework in the other

A web page which which lets you upload your results and browse the results of others

Nothing’s finalised here, but I like the general idea. I’ve managed (fairly easily) to write a "self-building" batch file, but I haven’t tried with NAnt/MSBuild yet. I can’t imagine it’s that hard – but then I’m not sure how much value there is either. What I do want to try to aim for is users running the tests properly, first time, without much effort. Again, looking back at the last post, I want to make it obvious to users if they’re running under a debugger, which is almost always the wrong thing to be doing. (I’m pretty sure there’s an API for this somewhere, and if there’s not I’m sure I can work out an evil way of detecting it anyway.)

The main thing is the ease of downloading and running the benchmark. I can’t see how it could be much easier than "follow link; choose format and download; run batch file" – unless the link itself was to the batch file, of course. (That would make it harder to use for people who wanted to get the source in a more normal format, of course.)

Going back to the point of view of the developer writing the test, I need to make sure it’s easy enough for me to use from home, work and on the train. That may mean a web page where I can just type in code, the input and expected output, and let it fill in the rest of the code for me. It may mean compiling a source file against a library from the command line. It may mean compiling a source file against the source code of the framework from the command line, with the framework code all in one file. It may mean building in Visual Studio. I’d like to make all of these cases as simple as possible – which is likely to make it simple for other developers as well. I’m not planning on optimising the experience when it comes to writing a benchmark on my mobile though – that might be a step too far!

What should the API look like?

When we get down to the nitty-gritty of types and methods, I think what I’ve got is a good starting point. There are still a few things to think about though:

We nearly have the functionality required for running a suite with different inputs already – the only problem is that we’re specifying the input (and expected output) in the constructor rather than as parameters to the RunTests method. I could change that… but then we lose the benefit of type inference when creating the suite. I haven’t resolved this to my satisfaction yet :(
The idea of having the suite automatically set up using attributed methods appeals, although we’d still need a Main method to create the suite and format the output. The suite creation can be simplified, but the chances of magically picking the most appropriate output are fairly slim. I suppose it could go for the "scale to best by number of iterations and show all columns" option by default… that still leaves the input and expected output, of course. I’m sure I’ll have something like this as an option, but I don’t know how far it will go.
The "configuration" side of it is expressed as a couple of constants at the moment. These control the minimum amount of time to run tests for before we believe we’ll be able to guess how many iterations we’ll need to get close to the target time, and the target time itself. These are currently set at 2 seconds and 30 seconds respectively – but when running tests just to check that you’ve got the right output format etc, that’s far too long. I suspect I should make a test suite have a configuration, and default to those constants but allow them to be specified on the command line as well, or explicitly in code.
Why do we need to set the expected output? In many cases you can be pretty confident that at least one of the test cases will be correct – so it’s probably simpler just to run each test once and check that the results are the same for all of them, and take that as the expected output. If you don’t have to specify the expected output, it becomes easier to specify a sequence of inputs to test.
Currently BenchmarkResult is nongeneric. This makes things simpler internally – but should a result know the input that it was derived from? Or should the ResultSuite (which is also nongeneric) know the input that has been applied to all its functions? The information will certainly need to be somewhere so that it can be output appropriately in the multiple input case.

My main points of design focus around three areas:

Is it easy to pick up? The more flexible it is, with lots of options, the more daunting it may seem.
Is it flexible enough to be useful in a variety of situations? I don’t know what users will want to benchmark – and I don’t build the right tool, it will be worthless to them.
Is the resulting test code easy and brief enough to include in a forum post, with a link to the full program? Will readers understand it?

As you can see, these are aimed at three slightly different people: the first time test writer, the veteran test writer, and the first time test reader. Getting the balance between the three is tricky.

What’s next?

I haven’t started rewriting the framework yet, but will probably do so soon. This time I hope to do it in a rather more test-driven way, although of course the timing-specific elements will be tricky unless I start using a programmatic clock etc. I’d really like comments around this whole process:

Is this worth doing?
Am I asking the right questions?
Are my answers so far headed in the right direction?
What else haven’t I thought of?

Jon Skeet's coding blog

Monthly Archives: February 2009