Category Archives: Stack Overflow

Books, C#, C# 4, General, Parallelization, Speaking engagements, Stack Overflow

Recent activities

September 3, 2009 jonskeet 15 Comments

It’s been a little while since I’ve blogged, and quite a lot has been going on. In fact, there are a few things I’d have blogged about already if it weren’t for “things” getting in the way.

Rather than writing a whole series of very short blog posts, I thought I’d wrap them all up here…

C# in Depth: next MEAP drop available soon – Code Contracts

Thanks to everyone who gave feedback on my writing dilemma. For the moment, the plan is to have a whole chapter about Code Contracts, but not include a chapter about Parallel Extensions. My argument for making this decision is that Code Contracts really change the feel of the code, making it almost like a language feature – and its applicability is almost ubiquitous, unlike PFX.

I may write a PFX chapter as a separate download, but I’m sensitive to those who (like me) appreciate slim books. I don’t want to “bulk out” the book with extra topics.

The Code Contracts chapter is in the final stages before becoming available to MEAP subscribers. (It’s been “nearly ready” for a couple of weeks, but I’ve been on holiday, amongst other things.) After that, I’m going back to the existing chapters and revising them.

Talking in Dublin – C# 4 and Parallel Extensions

Last week I gave two talks in Dublin at Epicenter. One was on C# 4, and the other on Code Contracts and Parallel Extensions. Both are now available in a slightly odd form on the Talks page of the C# in Depth web site. I no longer write “formal” PowerPoint slides, so the downloads are for simple bullet points of text, along with silly hand-drawn slides. No code yet – I want to tidy it up a bit before including it.

Podcasting with The Connected Show

I recently recorded a podcast episode with The Connected Show. I’m “on” for the second 2/3 of the show – about an hour of me blathering on about the new features of C# 4. If you can understand generic variance just by listening to me talking about it, you’re a smart cookie ;)

(Oh, and if you like it, please express your amusement on Digg / DZone / Shout / Kicks.)

Finishing up with Functional Programming for the Real World

Well, this hasn’t been taking much of my time recently (I bowed out of all the indexing etc!) but Functional Programming for the Real World is nearly ready to go. Hard copy should be available in the next couple of months… it’ll be really nice to see how it fares. Much kudos to Tomas for all his hard work – I’ve really just been helping out a little.

Starting on Groovy in Action, 2nd edition

No sooner does one book finish than another one starts. The second edition of Groovy in Action is in the works, which should prove interesting. To be honest, I haven’t played with Groovy much since the first edition of the book was finished, so it’ll be interesting to see what’s happened to the language in the meantime. I’ll be applying the same sort of spit and polish that I did in the first edition, and asking appropriately ignorant questions of the other authors.

Tech Reviewing C# 4.0 in a Nutshell

I liked C# 3.0 in a Nutshell, and I feel honoured that Joe asked me to be a tech reviewer for the next edition, which promises to be even better. There’s not a lot more I can say about it at the moment, other than it’ll be out in 2010 – and I still feel that C# in Depth is a good companion book.

MoreLINQ now at 1.0 beta

A while ago I started the MoreLINQ project, and it gained some developers with more time than I’ve got available :) Basically the idea is to add some more useful LINQ extension methods to LINQ to Object. Thanks to Atif Aziz, the first beta version has been released. This doesn’t mean we’re “done” though – just that we think we’ve got something useful. Any suggestions for other operators would be welcome.

Manning Pop Quiz and discounts

While I’m plugging books etc, it’s worth mentioning the Manning Pop Quiz – multiple choice questions on a wide variety of topics. Fabulous prizes available, as well as one-day discounts:

Monday, Sept 7th: 50% of all print books (code: pop0907)
Monday, Sept 14: 50% off all ebooks (code: pop0914)
Thursday, Sept 17: $25 for C# in Depth, 2nd Edition MEAP print version (code: pop0917) + C# Pop Quiz question
Monday, Sept 21: 50% off all books (code: pop0921)
Thursday, Sept 24: $12 for C# in Depth, 2nd Edition MEAP ebook (code: pop0924) + another C# Pop Quiz question

Future speaking engagements

On September 16th I’m going to be speaking to Edge UG (formerly Vista Squad) in London about Code Contracts and Parallel Extensions. I’m already very much looking forward to the Stack Overflow DevDays London conference on October 28th, at which I’ll be talking about how humanity has screwed up computing.

Future potential blog posts

Some day I may get round to writing about:

Revisiting StaticRandom with ThreadLocal<T>
Volatile doesn’t mean what I thought it did

There’s a lot more writing than coding in that list… I’d like to spend some more time on MiniBench at some point, but you know what deadlines are like.

Anyway, that’s what I’ve been up to and what I’ll be doing for a little while…

Books, C#, C# 4, Stack Overflow, Wacky Ideas

Faking COM to fool the C# compiler

July 7, 2009 jonskeet 3 Comments

C# 4 has some great features to make programming against COM components ~~bearable~~ fun and exciting. In particular:

PIA linking allows you to embed just the relevant bits of the Primary Interop Assembly into your own assembly, so the PIA isn’t actually required at execution time
Named arguments and optional parameters make life much simpler for APIs like Office which are full of methods with gazillions of parameters
"ref" removal allows you to pass an argument by value even though the parameter is a by-reference parameter (COM only, folks – don’t worry!)
Dynamic typing allows you to remove a load of casts by converting every parameter and return type of "object" into "dynamic" (if you’re using PIA linking)

I’m currently writing about these features for the book (don’t forget to buy it cheap on Friday) but I’m not really a COM person. I want to be able to see these compiler features at work against a really simple type. Unfortunately, these really are COM-specific features… so we’re going to have to persuade COM that the type really is a COM type.

I got slightly stuck on this first, but thanks to the power of Stack Overflow, I now have a reasonably complete demo "fake" COM type. It doesn’t do a lot, and in particular it doesn’t have any events, but it’s enough to show the compiler features:

using System;
using System.Runtime.InteropServices;

// Required for linking into another assembly (C# 4)
[assembly:Guid("86ca55e4-9d4b-462b-8ec8-b62e993aeb64")]
[assembly:ImportedFromTypeLib("fake.tlb")]

namespace FakeCom
{
    [Guid("c3cb8098-0b8f-4a9a-9772-788d340d6ae0")]
    [ComImport, CoClass(typeof(FakeImpl))]
    public interface FakeComponent
    {
        object MakeMeDynamic(object arg);

        void Foo([Optional] ref int x,
                 [Optional] ref string y);
    }

    [Guid("734e6105-a20f-4748-a7de-2c83d7e91b04")]
    public class FakeImpl {}
}

We have an interface representing our COM type, and a class which the interface claims will implement it. Fortunately the compiler doesn’t actually check that, so we can get away with leaving it entirely unimplemented. It’s also worth noting that our optional parameters can be by-reference parameters (which you can’t normally do in C# 4) and we haven’t given them any default values (as those are ignored for COM anyway).

This is compiled just like any other assembly:

csc /target:library FakeCom.cs

Then we get to use it with a test program:

using FakeCom;

class Test
{
    static void Main()
    {
        // Yes, that is calling a "constructor" on an interface
        FakeComponent com = new FakeComponent();

        // The boring old fashioned way of calling a method
        int i = 0;
        string j = null;
        com.Foo(ref i, ref j);

        // Look ma, no ref!
        com.Foo(10, "Wow!");

        // Who cares about parameter ordering?
        com.Foo(y: "Not me", x: 0);

        // And the parameters are optional too
        com.Foo();

        // The line below only works when linked rather than
        // referenced, as otherwise you need a cast.
        // The compiler treats it as if it both takes and
        // returns a dynamic value.
        string value = com.MakeMeDynamic(10);
    }
}

This is compiled either in the old "deploy the PIA as well" way (after adding a cast in the last line):

csc /r:FakeCom.dll Test.cs

… or by linking the PIA instead:

csc /l:FakeCom.dll Test.cs

(The difference is just using /l instead of /r.)

When the test code is compiled as a reference, it decompiles in Reflector to this (I’ve added whitespace for clarity):

private static void Main()
{
FakeComponent component = (FakeComponent) new FakeImpl();

    int x = 0;
    string y = null;
    component.Foo(ref x, ref y);

    int num2 = 10;
    string str3 = "Wow!";
    component.Foo(ref num2, ref str3);

    string str4 = "Not me";
    int num3 = 0;
    component.Foo(ref num3, ref str4);

    int num4 = 0;
    string str5 = null;
    component.Foo(ref num4, ref str5);

string str2 = (string) component.MakeMeDynamic(10);
}

Note how the compiler has created local variables to pass by reference; any changes to the parameter are ignored when the method returns. (If you actually pass a variable by reference, the compiler won’t take that away, however.)

When the code is linked instead, the middle section is the same, but the construction and the line calling MakeMeDynamic are very different:

private static void Main()
{
FakeComponent component = (FakeComponent) Activator.CreateInstance(Type.GetTypeFromCLSID
(new Guid("734E6105-A20F-4748-A7DE-2C83D7E91B04")));

// Middle bit as before

    if (<Main>o__SiteContainer6.<>p__Site7 == null)
    {
        <Main>o__SiteContainer6.<>p__Site7 = CallSite<Func<CallSite, object, string>>
            .Create(new CSharpConvertBinder
                       (typeof(string),
                        CSharpConversionKind.ImplicitConversion, false));
    }
    string str2 = <Main>o__SiteContainer6.<>p__Site7.Target.Invoke
        (<Main>o__SiteContainer6.<>p__Site7, component.MakeMeDynamic(10));
}

The interface is embedded in the generated assembly, but with a slightly different set of attributes:

[ComImport, CompilerGenerated]
[Guid("C3CB8098-0B8F-4A9A-9772-788D340D6AE0"), TypeIdentifier]
public interface FakeComponent
{
object MakeMeDynamic(object arg);
void Foo([Optional] ref int x, [Optional] ref string y);
}

The class isn’t present at all.

I should point out that doing this has no practical benefit in real code – but the ability to mess around with a pseudo-COM type rather than having to find a real one with the exact members I want will make it a lot easier to try a few corner cases for the book.

So, not a terribly productive evening in terms of getting actual writing done, but interesting nonetheless…

Stack Overflow

Reasons for voting on questions and answers

May 20, 2009 jonskeet 23 Comments

I’ve recently been involved in a few discussions around voting on Stack Overflow, and I think my own "policy" around it may be different to that of others. I thought it would be worth sharing why I personally vote items up or down, and hear your thoughts too. This blog may not be the ideal venue for such a post, but until such time as we have a real "meta" site for Stack Overflow (such as Stack Overflow Overflow) I can’t think of anywhere better to write about it. Readers who are only interested in coding should move on; I promise not to include anything about code in the rest of this post.

I’m going to assume that anyone who’s read this far is at least somewhat familiar with the logistics of Stack Overflow – in particular, how one votes and the effects on reputation. I’ll use the word "post" here to mean either a question or an answer.

I’d like to stress that this is in no way meant to be seen as an "official" voting guide – just how I happen to think.

Why vote?

There are two "audiences" for a vote in my view: the author of the post and the community who is reading the post and looking at the vote tally. The author can tell what votes they’ve received using the reputation tab of their "recent activity" page, whereas the readers can only tell what the overall tally is. Obviously the author also receives or loses reputation, too. This means the effect on the author and the audience are slightly different.

For the author, the immediate reward or punishment aspect may sound like the most important aspect: but I’d argue that for many users (particularly those with high reputation) the reputation for a single vote isn’t as important as the effect on the vote tally and what it communicates about your post. There’s usually a positive feedback effect on Stack Overflow: if one answer has a couple of votes and another has none, then the higher voted one is likely to get read more and thus garner more votes. The opposite can happen: an answer with a negative score will sometimes receive "sympathy" votes from users who think, "This answer isn’t brilliant, but it’s not bad enough to deserve downvotes."

For me, the important point about a downvote is that it may indicate I’ve got something wrong in an answer. I may have missed the point of the question in the first place, or simply provided a technically incorrect or unhelpful answer. I want to know about that, so I can fix my answer or delete it if I can’t actually provide any more useful information than is contained in other answers.

For the reader, the information communicated by the score can be as simple as "this question is interesting/this answer is helpful" vs "this is a poor question/this answer is harmful". I would hope that non-regular visitors will quickly get the idea that the highest voted answers are likely to be the best ones, and that answers with a negative score really shouldn’t be trusted. It doesn’t always work that way, but it’s a reasonable rule of thumb.

So, how do I vote?

How I react to questions

I generally upvote a question if it’s been well written and I think the problem is sufficiently common that it’s going to help someone else searching for it. I’ll also upvote it if it’s particularly interesting, even if it’s not a very general problem. I suspect I should upvote questions more often – and I also suspect that’s true of many users.

I very rarely downvote a question, however. If the question is inappropriate, I’ll usually vote to close it. If it’s badly written but intelligible, I’ll edit it. If it isn’t precise enough (just lacks information) I’ll ask for more information in a comment. I don’t see much use in downvoting. Now there are various users who don’t support the idea of closing a question at all, of course – and if closing weren’t an option then I would downvote instead. However, I personally support question closing (when appropriate, of course – I’m not saying every closed question deserved it).

How I react to answers

I will generally upvote an answer if I feel it’s correct and helpful. If there are multiple posts which effectively answer a question, I will usually upvote the best one, but others which provide other bits of relevant information may get a vote too. Again, I probably don’t upvote as often as I should.

I downvote if I see an answer as actively unhelpful: this is usually if it’s technically inaccurate, or suggests something which I think is a really bad idea (such as string concatenation in a loop without a small, known limit). I don’t downvote an answer just for being "not as helpful as it could be" or "not as helpful as another answer". I believe that behaviour discourages people from contributing in the first place, and an extra answer has relatively little cost associated with it. If it contains no information which isn’t present in another answer then I’d prefer it to be deleted, but I’d leave that suggestion as a comment rather than a downvote.

Speaking of comments, I practically always leave a comment when I downvote. A downvote means I believe something is really wrong with the answer, and it should be fixed: leaving it as it is makes the world a worse place than if it didn’t exist. A downvote without a comment is fairly pointless – it doesn’t help the poster to fix the answer, because they don’t know what they did wrong in the first place. I find it intensely frustrating when someone downvotes one of my answers without giving any reason: apart from anything else, that downvote could be made on the basis of a mistaken belief, but without that belief being expressed I have no way of correcting it. I always take another look at an answer which has been downvoted, but I’m much more likely to edit it if I’m given a specific reason.

Note that the reputation loss due to a downvote is almost insignificant – particularly if it’s early in the day (i.e. before very active users hit the 200 cap) – the idea that I’ve written something unhelpful is far more disconcerting to me than the loss of a tiny amount of rep.

I can see why users aren’t prompted for a comment on a downvote – it would be very easy to just type garbage, and that would be worse than no comment at all. It would also require the comment to be anonymised in order to keep the vote itself anonymous. Even so, I’d ask courteous readers to add a comment when you downvote one of my answers: I promise not to "retaliate" with a spate of downvotes, but I’d really like to be able to fix the answer!

In terms of editing, I will often edit an answer for formatting reasons or to correct a small typo, but I edit answers (from other people) less often than I edit questions.

How about you?

Enough about me – how do you vote? If you downvote without comments, what effect are you trying to achieve? When would you downvote a question rather than editing it, voting to close or leaving a comment? How do you react to downvotes to your own posts?

Stack Overflow, Wacky Ideas

Go on, ask me anything

March 24, 2009 jonskeet 28 Comments

This afternoon, I found a comment which had been trapped in the spam bin for this blog. It was from Andrew Rimmer, in reply to my “micro-celebrity” post, pointing me at http://askjonskeet.com

The world has officially become extremely silly. The surprising thing is, it’s actually useful – at least for me. A number of times I’ve wanted to find my old answers to questions, so I can either just refer to them in a new answer or mark the new question as a dupe. You might have thought that a simple search such as

exceptions site:stackoverflow.com “jon skeet”

would suffice – but that finds what other people have said about exceptions in the same questions that I’ve been active in, and it also picks up any questions which happened to get one of the FinalBuilder adverts when the spider fetched them. The equivalent search on AskJonSkeet.com gets good results.

I’ve no idea how useful it will be for anyone else, but personally I love it. Ego? What ego?

Side note to self, puncturing ego slightly: don’t blog on the tube. It’s way too easy to miss your stop…

General, Stack Overflow

Answering technical questions helpfully

February 17, 2009 jonskeet 19 Comments

I’m unsure of whether this should be a blog post or an article, so I’ll probably make it both. I’ve probably written most of it before in Stack Overflow answers, but as I couldn’t find anything when I was looking earlier (to answer a question about Stack Overflow answers – now deleted, so only available for those over 10k rep) I figured it was time to write something in a medium I had more control over.

This is not a guide to getting huge amounts of reputation on Stack Overflow. As it happens, following the guidelines here is likely to result in decent rep, but I’m sure there are various somewhat underhand ways of gaining reputation without actually being helpful. In other words, I’m not going to explain how you might game the system, but just share my views on how to work with the system for the benefit of all involved. A lot of this isn’t specific to Stack Overflow, but some of the advice isn’t applicable to some forums.

Hypocrisy warning: I’m not going to claim I always follow everything in this list. I try, but that’s not the same thing – and quite often time pressures will compromise the quality of an answer. Oh, and don’t read anything into the ordering here. It’s how it happened to come out, that’s all.

Read the question

All too often I’ve written what I thought was a great answer… only to reread the question and find out that it wasn’t going to help the questioner at all.

Now, questions are often written pretty badly, leaving out vital information, being vague about the problem, including lots of irrelevant code but leaving out the crucial bit which probably contains the bug, etc. It’s at least worth commenting on the question to ask for more information, but just occasionally you can apply psychic debugging to answer the question anyway. If you do have to make assumptions when writing an answer, state them explicitly. That will not only reduce the possibility of miscommunication, but will also point out the areas which need further clarification.

Code is king

I can’t believe I nearly posted this article without mentioning sample code.

Answers with sample code are gold… if the code is appropriate. A few rules of thumb:

Make sure it compiles, assuming it’s meant to. This isn’t always possible – for example, I often post at work from a machine without .NET on it, or on the train from a laptop without Java. If you can’t get a compiler to make sure your code is valid, be extra careful in terms of human inspection.
Snippets are okay, but complete programs rock. If you’re not already comfortable with writing short console apps, practise it. Often you can write a complete app in about 15 lines which doesn’t just give the solution, but shows it working. Imagine the level of confidence that gives to anyone reading your answer. Get rid of anything you don’t need – brevity is really helpful. (In C# for example that usually means getting rid of [STAThread], namespace declarations and unused using directives.)
Take a bit of care over formatting. If possible, try to prevent the code wrapping. That’s not always realistically feasible, but it’s a nice ideal. Make sure the spacing at least looks okay – you may want to use a 2-space indent if your code involves a lot of nesting.
If you skip best practices, add a comment. For example, you might include // Insert error handling here to indicate that production code really should check the return value etc. This doesn’t include omitting the STAThread attribute and working without a namespace – those are just reasonable assumptions and unlikely to be copied wholesale, whereas the main body of the code may well be. If your code leaks resources, someone’s production code may do so as well…

Code without an explanation is rarely useful, however. At least provide a sentence or two to explain what’s going on.

Answer the question and highlight side-issues

Other developers don’t always do things the way we’d like them to. Questions often reflect this, basically asking how to do something which (in our view) shouldn’t be attempted in the first place. It may completely infeasible, or it may just be a really bad idea.

Occasionally, the idea is so awful – and possibly harmful to users, especially when it comes to security questions – that the best response is just to explain (carefully and politely) why this is a really bad thing to do. Usually, however, it’s better to answer the question and give details of better alternatives at the same time. Personally I prefer to give these alternatives before the answer to the question asked, as I suspect it makes it more likely that the questioner will read the advice and take it on board. Don’t forget that the more persuasive you can be, the more likely it is they’ll abandon their original plans. In other words, “Don’t do this!” isn’t nearly as useful as “Don’t do this because…”

It’s okay to guess, but be honest

This may be controversial. I’ve certainly been downvoted twice on SO for having the temerity to post an answer without being 100% sure that it’s the right one – and (worse?) for admitting as much.

Sometimes there are questions which are slightly outside your own area of expertise, but they feel an awful lot like a situation you’ve been in before. In this kind of case, you can often write an answer which may well help – and would at least suggest something for the questioner to investigate a a possible answer to their problem. Sometimes you may be way off base, which is why it’s worth explaining in your answer that you are applying a bit of educated guesswork. If another answer is posted by an expert in the topic, it may well be worth the questioner trying their solution first… but at least you’re providing an alternative if they run out of other possibilities.

Now, somewhat contradictory…

Raise the overall accuracy level

It should go without saying that a correct answer is more helpful than an incorrect one. There are plenty of entirely inaccurate answers on Stack Overflow – and on newsgroups, and basically every online community-based resource I’ve ever seen. This isn’t surprising, but the best ways to counter it are:

Challenge inaccurate information
Provide accurate information yourself

One of the key aspects of this is to provide evidence. If you make an objective statement without any sort of disclaimer about your uncertainty, that should mean you’ve got good reason to believe you’re correct. That doesn’t necessarily mean you need to provide evidence, but as soon as there’s disagreement, evidence is king. If you want to assert that your code is faster than some other code, write a benchmark (carefully!). If you want to prove that an object can be collected at a certain point in time, write a test to show it. Short but complete programs are great for this, and can stop an argument dead in its tracks.

Another source of evidence is documentation and specifications. Be aware that they’re not always accurate, but I generally believe documentation unless I have a specific reason to doubt it.

Provide links to related resources

There have been a few questions on Stack Overflow as to whether it’s appropriate to link to other resources on the web. My own opinion is that it’s absolutely appropriate and can add a lot of value to an answer. In particular, I like to link to:

MSDN and JavaDoc documentation, or the equivalent for other platforms. With MSDN URLs, if they end in something like http://msdn.microsoft.om/foo(VS80).aspx, take the bit in brackets out of the URL (leaving http://msdn.microsoft.om/foo.aspx in this case). That way the link will always be to the most recent version of the documentation, and it doesn’t give WMD as many problems either.
Language specifications, in particular those for C# and Java. I generally link to the Word document for the C# spec, which has the disadvantage that I can’t link to a specific section, and it won’t just open in the browser. On the other hand, I find it easier to navigate than the MSDN version, and I’ve seen inaccuracies in MSDN.
My own articles and blog posts (unsurprisingly :)
Wikipedia
Other resources which are unlikely to become unavailable

The point about the resources not becoming unavailable is important – one of the main arguments against linking is that the page might go away in the future. That’s not a particularly compelling argument for most reference material IMO, but it is relevant in other cases. Either way, it’s worth including some sort of summary of what you’re linking to – a link on its own doesn’t really invite the reader to follow it, whereas a quick description of what they’ll find there provides more incentive.

One quick tip: In Chrome I have an MSDN “search engine” set up and in Firefox a keyword bookmark. In both cases the URL is http://msdn.microsoft.com/en-us/library/%s.aspx – this makes it easier to get to the MSDN page for a particular namespace, type or member. For example, by typing “msdn system.io.fileinfo” I get straight to the FileInfo page. It doesn’t work for generics, however. At some point I’d like to make this simpler somehow…

Care about your reader: spelling, grammar and style matter

I’m lucky: I’m a native English speaker, and I have a reasonably good natural command of English. Having said that, I still take a certain amount of care when writing answers: I’ll often rewrite a sentence several times until I feel it works. I’ve noticed that answers with correct spelling and grammar are generally upvoted more than ones with essentially the same content but less careful presentation.

I’m not recommending style over substance; I’m saying that both are important. You could be putting forward the most insightful comment in the world, but if you can’t communicate it effectively to your readers, it’s not going to help anyone. Having said that…

A time-limited answer may be better than no answer at all

I answer Stack Overflow questions in whatever spare time I have: waiting for a build, on the train, taking a break from editing etc. I frequently see a question which would take a good 15 minutes or more to answer properly – but I only have 30 seconds. If the question already has answers, there’s probably no sensible contribution I can make, but if the question is completely unanswered, I sometimes add a very short answer with the most important points I can think of at the time.

Ideally, I’d then go back later and edit the answer to make it more complete – but at least I may have given the questioner something to think about or an option to try. Usually these “quickies” are relatively fruitless, but occasionally I’ll come back to find that the answer was accepted: the slight nudge in the answer was all that was required. If I’m feeling diligent at that point I’ll still complete the answer, but the point is that a brief answer is usually better than nothing.

Don’t be afraid to delete (or edit heavily) useless answers

It’s almost inevitable that if you post enough answers, one of them will be less than helpful. It may start off being a good one, but if a later answer includes all the information from your answer and more, or explains it in a better way, it’s just clutter.

Likewise if you make a wild stab in the dark about the cause of a problem, and that guess turns out to be wrong, your answer could become actively unhelpful. Usually the community will let you know this in comments or voting, but sometimes you have to recognise it on your own.

Be polite

It’s a shame that I have to include this, and it’s even more of a shame that I need to take better notice of it myself. However boneheaded a question is, there’s no need to be rude. You can express your dismay at a question, and even express dismay at someone failing to read your answer properly, without resorting to inflammatory language. In the end, no-one really wins from a flame war. Remember that there’s a real person at the other end of the network connection, and they’re probably just as frustrated with you as you are with them. If things are getting out of hand, just write a polite note explaining that you don’t think it’s productive to discuss the topic any more, and walk away. (This can be surprisingly difficult advice to heed.)

Next time I get too “involved” in a question, could someone please direct me to this point? And don’t take “But I’m right, darnit!” as an excuse.

Don’t “answer and run”

Sometimes an answer is very, very nearly spot on – but that final 1% is all the difference between the reader understanding fully and having a dangerous misunderstanding of the topic.

Stack Overflow makes it pretty easy to see comments to your answers: monitor this carefully so you can respond and clarify where appropriate. Many web forums have an “email me if this thread is updated” option – again, this is useful. It’s really frustrating (as a questioner) to be left hanging with an answer which looks like it might solve a real headache, if only you could just get the ear of the author for a moment.

Having said all of this:

Have fun

In my experience the most useful users are the ones who are obviously passionate about helping others. Don’t do it out of some misguided sense of “duty” – no-one’s paying you to do this (I assume) and no-one can reasonably complain if you just haven’t got the time or energy to answer their question. Save yourself for a time when you can be more enthusiastic and enjoy what you’re doing. Go ahead, have a cup of coffee and watch a Youtube video or something instead. You don’t have to check whether there are any new questions!

Benchmarking, C#, General, Stack Overflow

Benchmarking: designing an API with unusual goals

February 2, 2009 jonskeet 8 Comments

In a couple of recent posts I’ve written about a benchmarking framework and the results it produced for using for vs foreach in loops. I’m pleased with what I’ve done so far, but I don’t think I’ve gone far enough yet. In particular, while it’s good at testing multiple algorithms against a single input, it’s not good at trying several different inputs to demonstrate the complexity vs input size. I wanted to rethink the design at three levels – what the framework would be capable of, how developers would use it, and then the fine-grained level of what the API would look like in terms of types, methods etc. These may all sound quite similar on the face of it, but this project is somewhat different to a lot of other coding I’ve done, mostly because I want to lower the barrier to entry as far as humanly possible.

Before any of this is meaningful, however, I really needed an idea of the fundamental goal. Why was I writing yet another benchmarking framework anyway? While I normally cringe at mission statements because they’re so badly formulated and used, I figured this time it would be helpful.

Minibench makes it easy for developers to write and share tests to investigate and measure code performance.

The words in bold (or for the semantically inclined, the strong words) are the real meat of it. It’s quite scary that even within a single sentence there are seven key points to address. Some are quite simple, others cause grief. Now let’s look at each of the areas of design in turn.

Each element of the design should either clearly contribute to the mission statement or help in a non-functional way (e.g. make the project feasible in a reasonable timeframe, avoid legal issues etc). I’m aware that with the length of this post, it sounds like I’m engaging in "big upfront design" but I’d like to think that it’s at least informed by my recent attempt, and that the design criteria here are statements of intent rather than implementation commitments. (Aargh, buzzword bingo… please persevere!)

What can it do?

As we’ve already said, it’s got to be able to measure code performance. That’s a pretty vague definition, however, so I’m going to restrict it a bit – the design is as much about saying what isn’t included as what is.

Each test will take the form of a single piece of code which is executed many times by the framework. It will have an input and an expected output. (Operations with no natural output can return a constant; I’m not going to make any special allowance for them.)
The framework should take the tedium out of testing. In particular I don’t want to have to run it several times to get a reasonable number of iterations. I suspect it won’t be feasible to get the framework to guess appropriate inputs, but that would be lovely if possible.
Only wall time is measured. There are loads of different metrics which could be applied: CPU execution time, memory usage, IO usage, lock contention – all kinds of things. Wall time (i.e. actual time elapsed, as measured by a clock on the wall) is by far the simplest to understand and capture, and it’s the one most frequently cited in newsgroup and forum questions in my experience.
The benchmark is uninstrumented. I’m not going to start rewriting your code dynamically. Frankly this is for reasons of laziness. A really professional benchmarking system might take your IL and wrap it in a timing loop within a single method, somehow enforcing that the result of each iteration is used. I don’t believe that’s worth my time and energy, as well as quite possibly being beyond my capabilities.
As a result of the previous bullet, the piece of code to be run lots of times needs to be non-trivial. The reality is that it’ll end up being called as a delegate. This is pretty quick, but if you’re just testing "is adding two doubles faster or slower than adding two floats" then you’ll need to put a bit more work in (e.g. having a loop in your own code as well).
As well as the use case of "which of these algorithms performs the best with this input?" I want to support "how does the performance vary as a function of the input?" This should support multiple algorithms at the same time as multiple inputs.
The output should be flexible but easy to describe in code. For single-input tests simple text output is fine (although the exact figures to produce can be interesting); for multiple inputs against multiple tests a graph would often be ideal. If I don’t have the energy to write a graphing output I should at least support writing to CSV or TSV so that a spreadsheet or graphing tool can do the heavy lifting.
The output should be useful – it should make it easy to compare the performance of different algorithms and/or inputs. It’s clear from the previous post here that just including the scaled score doesn’t give an obvious meaning. Some careful wording in the output, as well as labeled columns, may be required. This is emphatically not a dig at anyone confused by the last post – any confusion was my own fault.

Okay, that doesn’t sound too unreasonable. The next area is much harder, in my view.

How does a developer use it?

Possibly the most important word in the mission statement is share. The reason I started this project at all is that I was fed up with spending ages writing timing loops for benchmarks which I’d then post on newsgroups or Stack Overflow. That means there are two (overlapping) categories of user:

A developer writing a test. This needs to be easy, but that’s an aspect of design that I’m reasonably familiar with. I’m not saying I’m good at it, but at least I have some prior experience.
A developer reading a newsgroup/forum post, and wanting to run the benchark for themselves. This distribution aspect is the hard bit – or at least the bit requiring imagination. I want the barrier to running the code to be really, really low. I suspect that there’ll be a "fear of the unknown" to start with which is hard to conquer, but if the framework becomes widely used I want the reader’s reaction to be: "Ah, there’s a MiniBench for this. I’m confident that I can download and run this code with almost no effort."

This second bullet is the one that my friend Douglas and I have been discussing over the weekend, in some ways playing a game of one-upmanship: "I can think of an idea which is even easier than yours." It’s a really fun game to play. Things we’ve thought about so far:

A web page which lets you upload a full program (without the framework) and spits out a URL which can be posted onto Stack Overflow etc. The user would then choose from the following formats:

Single .cs file containing the whole program – just compile and run. (This would also be shown on the download page.)
Test code only – for those whole already have the framework
Batch file – just run it to extract/build/run the C# code.
NAnt project file containing the C# code embedded in it – just run NAnt
MSBuild project file – ditto but with msbuild.
Zipped project – open the project to load the test in one file and the framework code in other (possibly separate) .cs files
Zipped solution – open to load two projects: the test code in one and the framework in the other

A web page which which lets you upload your results and browse the results of others

Nothing’s finalised here, but I like the general idea. I’ve managed (fairly easily) to write a "self-building" batch file, but I haven’t tried with NAnt/MSBuild yet. I can’t imagine it’s that hard – but then I’m not sure how much value there is either. What I do want to try to aim for is users running the tests properly, first time, without much effort. Again, looking back at the last post, I want to make it obvious to users if they’re running under a debugger, which is almost always the wrong thing to be doing. (I’m pretty sure there’s an API for this somewhere, and if there’s not I’m sure I can work out an evil way of detecting it anyway.)

The main thing is the ease of downloading and running the benchmark. I can’t see how it could be much easier than "follow link; choose format and download; run batch file" – unless the link itself was to the batch file, of course. (That would make it harder to use for people who wanted to get the source in a more normal format, of course.)

Going back to the point of view of the developer writing the test, I need to make sure it’s easy enough for me to use from home, work and on the train. That may mean a web page where I can just type in code, the input and expected output, and let it fill in the rest of the code for me. It may mean compiling a source file against a library from the command line. It may mean compiling a source file against the source code of the framework from the command line, with the framework code all in one file. It may mean building in Visual Studio. I’d like to make all of these cases as simple as possible – which is likely to make it simple for other developers as well. I’m not planning on optimising the experience when it comes to writing a benchmark on my mobile though – that might be a step too far!

What should the API look like?

When we get down to the nitty-gritty of types and methods, I think what I’ve got is a good starting point. There are still a few things to think about though:

We nearly have the functionality required for running a suite with different inputs already – the only problem is that we’re specifying the input (and expected output) in the constructor rather than as parameters to the RunTests method. I could change that… but then we lose the benefit of type inference when creating the suite. I haven’t resolved this to my satisfaction yet :(
The idea of having the suite automatically set up using attributed methods appeals, although we’d still need a Main method to create the suite and format the output. The suite creation can be simplified, but the chances of magically picking the most appropriate output are fairly slim. I suppose it could go for the "scale to best by number of iterations and show all columns" option by default… that still leaves the input and expected output, of course. I’m sure I’ll have something like this as an option, but I don’t know how far it will go.
The "configuration" side of it is expressed as a couple of constants at the moment. These control the minimum amount of time to run tests for before we believe we’ll be able to guess how many iterations we’ll need to get close to the target time, and the target time itself. These are currently set at 2 seconds and 30 seconds respectively – but when running tests just to check that you’ve got the right output format etc, that’s far too long. I suspect I should make a test suite have a configuration, and default to those constants but allow them to be specified on the command line as well, or explicitly in code.
Why do we need to set the expected output? In many cases you can be pretty confident that at least one of the test cases will be correct – so it’s probably simpler just to run each test once and check that the results are the same for all of them, and take that as the expected output. If you don’t have to specify the expected output, it becomes easier to specify a sequence of inputs to test.
Currently BenchmarkResult is nongeneric. This makes things simpler internally – but should a result know the input that it was derived from? Or should the ResultSuite (which is also nongeneric) know the input that has been applied to all its functions? The information will certainly need to be somewhere so that it can be output appropriately in the multiple input case.

My main points of design focus around three areas:

Is it easy to pick up? The more flexible it is, with lots of options, the more daunting it may seem.
Is it flexible enough to be useful in a variety of situations? I don’t know what users will want to benchmark – and I don’t build the right tool, it will be worthless to them.
Is the resulting test code easy and brief enough to include in a forum post, with a link to the full program? Will readers understand it?

As you can see, these are aimed at three slightly different people: the first time test writer, the veteran test writer, and the first time test reader. Getting the balance between the three is tricky.

What’s next?

I haven’t started rewriting the framework yet, but will probably do so soon. This time I hope to do it in a rather more test-driven way, although of course the timing-specific elements will be tricky unless I start using a programmatic clock etc. I’d really like comments around this whole process:

Is this worth doing?
Am I asking the right questions?
Are my answers so far headed in the right direction?
What else haven’t I thought of?

General, Stack Overflow

Programming is hard

January 29, 2009 jonskeet 22 Comments

One of the answers to my “controversial opinions” question on Stack Overflow claims that “programming is so easy a five year old can do it.“

I’m sure there are some aspects of programming which a five year old can do. Other parts are apparently very hard though. Today I came to the following conclusion:

If your code deals with arbitrary human text, it’s probably broken. (Have you taken the Turkey test recently?)
If your code deals with floating point numbers, it’s probably broken.
If your code deals with concurrency (whether that means database transactions, threading, whatever), it’s probably broken.
If your code deals with dates, times and time zones, it’s probably broken. (Time zones in particular are hideous.)
If your code has a user interface with anything other than a fixed size and a fixed set of labels (no i18n), it’s probably broken.

You know what I like working with? Integers. They’re nice and predictable. Give me integers, and I can pretty much predict how they’ll behave. So long as they don’t overflow. Of have an architecture-and-processor-dependent size.

Maybe you think I’m too cynical. I think I’m rather bullish, actually. After all, I used the word “probably” on all of those bullet points.

The thing that amazes me is that despite all this hardness, despite us never really achieving perfection, programs seem to work well enough most of the time. It’s a bit like a bicycle – it really shouldn’t work. I mean, if you’d never seen one working, and someone told you that:

You can’t really balance when it’s stationary. You have to go at a reasonable speed to stay stable.
Turning is a lot easier if you lean towards the ground.
The bit of the bike which is actually in contact with the ground is always stationary, even when the bike itself is moving.
You stop by squeezing bits of rubber onto the wheels.

Would you not be a touch skeptical? Likewise when I see the complexity of software and our collective failure to cope with it, I’m frankly astonished that I can even write this blog post.

It’s been a hard day. I achieved a small victory over one of the bullet points in the first list today. It took hours, and I pity my colleague who’s going to code review it (I’ve made it as clear as I can, but some things are just designed to mess with your head) – but it’s done. I feel simultaneously satisfied in a useful day’s work, and depressed at the need for it.

C#, General, Stack Overflow, Wacky Ideas

Quick rant: why isn’t there an Exception(string, params object[]) constructor?

January 23, 2009 jonskeet 25 Comments

This Stack Overflow question has reminded me of something I often wish existed in common exception constructors – an overload taking a format string and values. For instance, it would be really nice to be able to write:

throw new IOException(“Expected to read {0} bytes but only {1} were available”,
requiredSize, bytesRead);

Of course, with no way of explicitly inheriting constructors (which I almost always want for exceptions, and almost never want for anything else) it would mean yet another overload to copy and paste from another exception, but the times when I’ve actually written it in my own exceptions it’s been hugely handy, particularly for tricky cases where you’ve got a lot of data to include in the message. (You’d also want an overload taking a nested exception first as well, adding to the baggage…)

General, Stack Overflow

Stack Overflow reputation and being a micro-celebrity

January 15, 2009 jonskeet 44 Comments

I’ve considered writing a bit about this before, but not done so for fear of looking like a jerk. I still think I may well end up looking like a jerk, but this is all stuff I’m interested in and I’ll enjoy writing about it, so on we go. Much of this is based on experiences at and around Stack Overflow, and it’s more likely to be interesting to you if you’re a regular there or at least know the basic premises and mechanics. Even then you may well not be particularly interested – as much as anything, this post is to try to get some thoughts out of my system so I can stop thinking about how I would blog about it. If you don’t want the introspection, but want to know how to judge my egotism, skipping to the summary is probably a good plan. If you really don’t care at all, that’s probably a healthy sign. Quit now while you’re ahead.

What is a micro-celebrity?

A couple of minutes ago, I thought I might have been original with the term “micro-celebrity” but I’m clearly not. I may well not use the term the same way other people do, however, so here’s my rough definition solely for the purposes of this post:

A micro-celebrity is someone who gains a significant level of notoriety within a relatively limited community on the internet, usually with a positive feedback loop.

Yes, it’s woolly. Not to worry.

I would consider myself to have been a micro-celebrity in five distinct communities over the course of the last 14 years:

The alt.books.stephen-king newsgroup
The mostly web-based community around Team17’s series of “Worms” games (well, the first few, on the PC only)
The comp.lang.java.* newsgroups
The microsoft.public.dotnet.languages.csharp newsgroup
Stack Overflow

The last has been far and away the most blatant case. This is roughly how it goes – or at least how it’s gone in each of the above cases:

Spend some time in the community, post quite a lot. Shouting loudly works remarkably well on the internet – if you’re among the most prolific writers in a group, you will get noticed. Admittedly it helps to try hard to post well-written and interesting thoughts.
After a while, a few people will refer to you in their other conversations. For instance, if someone in the Java newsgroup was talking about “objects being passed by reference”, another poster might say something like “Don’t let Jon Skeet hear you talking like that.”
Play along with it, just a bit. Don’t blow your own trumpet, but equally don’t discourage it. A few wry comments to show that you don’t mind often go down well.
Sooner or later, you will find yourself not just mentioned in another topic, but being the topic of conversation yourself. At this point, it’s no longer an inside joke that just the core members of the group “get” – you’re now communal property, and almost any regular will be part of the joke.

One interesting thing you might have noticed about the above is that it doesn’t really take very much skill. It takes a fair amount of time, and ideally you should have some reasonable thoughts and the ability to express yourself clearly, but you certainly don’t have to be a genius. Good job, really.

How much do you care?

This is obviously very personal, and I’m only speaking for myself (as ever).

It’s undeniably an ego boost. Just about every day there’s something on Stack Overflow to laugh about in reference to me. How could I not enjoy that? How could it not inflate my sense of self-worth just a little bit? I could dismiss it as being entirely silly and meaningless – which it is, ultimately – but it’s still fun and I get a kick out of it. And yes, I’m sorry to say I bore/annoy my colleagues and wife with the latest Stack Overflow news, because I’ve always been the selfish kind of person who wants to talk about what they’re up to instead of asking the other person about their interests. This is an unfortunate trait which has relatively little to do with the micro-celebrity business.

One very good thing for keeping my ego in check is that at Google, I converse with people who are smarter than me every day, whether at coffee, breakfast, lunch or just while coding. There’s no sense of anyone trying to make anyone else feel small, but it’s pretty obvious that I’m nothing special when it comes to Google. Now, I don’t want to put on too much false modesty – I know I have a reasonable amount of experience, and I happen to know two very popular platforms reasonably well (which really helps on Stack Overflow- being the world’s greatest guru on MUMPS isn’t going to get you much love), and perhaps most importantly I can communicate pretty well. All of these are good things, and I’m proud of my career and particularly C# in Depth…

… but let’s get real here. The Jon Skeet Facts page isn’t really about me. It’s about making geek jokes where the exact subject is largely irrelevant. It could very easily have been about someone else with little change in the humour. Admittedly the funniest answers (to my mind) are the ones which do have some bearing on me (particularly the one about having written a book on C# 5.0 already) – but that doesn’t mean there’s anything really serious in it. I hope it’s pretty obvious to everyone that I’m not a genius programmer. I’d like to think I’m pretty good, but I’m not off-the-charts awesome by any means. (In terms of computer science, I’m nothing special at all and I have a really limited range of languages/paradigms. I’m trying to do something about those, but it’s hard when there’s always another question to answer.)

It’s worth bearing in mind the “micro” part of micro-celebrity. I suspect that if we somehow got all the C# developers in the world together and asked them whether they’d heard of Jon Skeet, fewer than 0.1% would say yes. (That’s a complete guess, by the way. I have really no idea. The point is I’m pretty sure it’s a small number.) Compared with the sea of developers, the set of Stack Overflow regulars is a very small pond.

What I care far more about than praise and fandom is the idea of actually helping people and making a difference. A couple of days ago I had an email from someone saying that C# in Depth had helped them in an interview: they were able to write more elegant code because now they grok lambda expressions. How cool is that? Yes, I know it’s all slightly sickening in a “you do a lot of good work for charity” kind of way – but I suspect it’s what drives most Stack Overflow regulars. Which leads me on to reputation…

What does Stack Overflow reputation mean to you?

In virtually every discussion about the Stack Overflow reputation system and its caps, I try to drop in the question of “what’s the point of reputation? What does it mean to you?” It’s one of those questions which everyone needs to answer for themselves. Jeff Atwood’s answer is that reputation is how much the system trusts you. My own answers:

It’s a daily goal. Making sure I always get to 200 is a fun little task, and then trying to get accepted answers is a challenge.
It’s measurable data, and you can play with graphs and stats. Hey we’re geeks – it’s fun to play with numbers, however irrelevant they are.
It’s an indication of helpfulness to some extent. It plays to my ego in terms of both pride of knowledge and the fulfillment of helping people.
It’s useful as an indicator of community trust for the system to use, which is probably more important to Jeff than it is to me.
It’s a game. This is the most important aspect. I love games. I’m fiercely competitive, and will always try to work out all the corners of a game’s system – things like it being actually somewhat useless getting accepted answers before you’ve reached the 200 limit. I don’t necessarily play to the corners of the game (I would rather post a useful but unpopular answer than a popular but harmful one, for serious questions) but I enjoy working them out. I would be interested to measure my levels of testosterone when typing furiously away at an answer, hoping to craft something useful before anyone else does. I’m never going to be “macho” physically, but I can certainly be an alpha geek. So long as it doesn’t go too far, I think it’s a positive thing.

I sometimes sense (perhaps inaccurately) that Jeff and Joel are frustrated with people getting too hung up about reputation. It’s really unimportant in the grand scheme of things – rep in itself isn’t as much of a net contribution to the world’s happiness as the way that Stack Overflow connects people with questions to people with relevant answers really, really quickly. But rep is one of the things that makes Stack Overflow so “sticky” as a website. It’s not that I wouldn’t answer questions if the reputation system went down – after all, I’ve been answering questions on newsgroups for years, for the other reasons mentioned – but the reputation system certainly helps. Yes, it’s probably taking advantage of a competitive streak which is in some ways ugly… but the result is a good one.

One downside of the whole micro-celebrity thing – and in particular of being the top rep earner – is that various suggestions (such as changing the rep limit algorithm and introducing a monthly league) make me look really selfish. It’s undeniable that both of the suggestions work in my favour. I happen to believe that both work in the community’s favour too, but I can certainly see why people might get the wrong idea about my motivation. I don’t remember thinking of any suggestions which would work against my personal interests but in the interests of the community. If I do, I’m pretty sure I’ll post them with no hesitation.

Summary

Yes, I like the attention of being a micro-celebrity. It would be ridiculous to deny it, and I don’t think it says much more about me than the fact that I’m human.

Yes, I like competing for reputation, even though it’s blatantly obvious that the figure doesn’t reflect programming prowess. It’s part of the fuel for my addiction to Stack Overflow.

With this out of the way, I hope to return to more technical blog posts. If anything interesting comes up in the comments, I’ll probably edit this post rather than writing a new one.

Stack Overflow

Stack Overflow Reputation Tool now online

January 15, 2009 jonskeet 7 Comments

Update:

Now that the “recent activity” page is working, the feed that the tool was using has been removed. However, the new page offers pretty much everything that the tool did, and a lot more besides. I’ve updated the tool to just redirect to the relevant page, so your bookmarks should still work.

Original post:

This is the micro-web-app that my recent ASP.NET question was about. It’s very simple – it shows you the reputation gained or lost by a specified user (typically you) for either today or yesterday. Note that these are Stack Overflow “today” and “yesterday” – i.e. they’re in UTC. That happens to be convenient for me as I’m in the UK, but more importantly it’s in tune with the reputation limits. It does mean that if you’re in a different time zone you’ll see the date changing at potentially unexpected times.

There’s an option for including a record of questions/answers which have received an upvote during the day but which haven’t generated any reputation – this happens if you’ve already hit the reputation limit for the day before the vote.

The worst part about the user interface (to my mind) is that you have to know the ID of the user whose reputation you want to check. This isn’t exactly hard, but it’s slightly annoying. Basically you need to go to the user’s profile page on Stack Overflow and look at the URL. It will be of the form http://stackoverflow.com/users/%5Buser-id%5D/%5Buser-name%5D – take the user ID from that, and put it into the tool. I may be able to have a browsing mode just like that on SO at some point, but it will take at least some work. I’ve been concentrating on the data retrieved rather than the presentation, as you’ll no doubt be able to tell at a glance :)

All the options are specified on the URL, so you can bookmark your own user results very easily. For example:

My reputation for today: http://csharpindepth.com/StackOverflow/ReputationTracker.aspx?user=22656&mode=today
My reputation for yesterday, including non-scoring items: http://csharpindepth.com/StackOverflow/ReputationTracker.aspx?user=22656&mode=yesterday&showzero=true

(If anyone has any better ideas for the URL parameter than “showzero” I’m very much open to suggestions. I can keep backward compatibility for the sake of bookmarks really easily.)

At the moment it’s showing pretty much all the information it receives. I’m hoping that I may be able to work with the Stack Overflow team to make it easy (and importantly, cheap for the SO server) to show a whole date range (e.g. “what happened in the last week?”) and also give details of the number of votes up and down, and when an answer is accepted (or unaccepted).

Enjoy, and share with friends. Feedback welcome. Many thanks to Geoff Dalgas for working with me to limit the impact on the server.