For vs Foreach on arrays and lists

As promised earlier in the week, here are the results of benchmarking for and foreach.

For each of int and double, I created an array and a List<T>, filled them with random data (the same for the list as for the array) and ran each of the following ways of summing the collection:

  • A simple for loop, testing the index against the array length / list count each time
  • A for loop with an initial variable remembering the length of the array, then comparing the index against the variable each time
  • A foreach loop against the collection with the type known at compile and JIT time
  • A foreach loop against the collection as IEnumerable<T>
  • Enumerable.Sum

I won’t show the complete code in this post, but you can download it and build it against the benchmarking framework. Here’s a taste of what it looks like – the code for a list instead of an array, or for double instead of int, is pretty similar:

List<int> intList = Enumerable.Range(0, Size)
                              .Select(x => rng.Next(100))
                              .ToList();
int[] intArray = intList.ToArray();

var intArraySuite = TestSuite.Create("int[]", intArray, intArray.Sum())
    .Add(input => { int sum = 0;
        for (int i = 0; i < input.Length; i++) sum += input[i];
        return sum;
    }, "For")
    .Add(input => { int sum = 0; int length = input.Length;
        for (int i = 0; i < length; i++) sum += input[i];
        return sum;
    }, "ForHoistLength")
    .Add(input => { int sum = 0;
        foreach (int d in input) sum += d;
        return sum;
    }, "ForEach")
    .Add(IEnumerableForEach)
    .Add(Enumerable.Sum, "Enumerable.Sum")
    .RunTests();

static int IEnumerableForEach(IEnumerable<int> input)
{
    int sum = 0;
    foreach (int d in input)
    {
        sum += d;
    }
    return sum;
}

(I don’t normally format code quite like that, and wouldn’t even use a lambda for that sort of code – but it shows everything quite compactly for the sake of blogging.)

Before I present the results, a little explanation:

  • I considered int and double entirely separately – so I’m not comparing the int[] results against the double[] results for example.
  • I considered array and list results together – so I am comparing iterating over an int[] with iterating over a List<int>.
  • The result for each test is a normalized score, where 1.0 means “the best of the int summers” or “the best of the double summers” and other scores for that type of summation show how much slower that test ran (i.e. a score of 2.0 would mean it was twice as slow – it got through half as many iterations).
  • I’m not currently writing out the number of iterations each one completes – that might be interesting to see how much faster it is to sum ints than doubles.

Happy with that? Here are the results…

-------------------- Doubles --------------------
============ double[] ============
For                 1.00
ForHoistLength      1.00
ForEach             1.00
IEnumerableForEach 11.47
Enumerable.Sum     11.57

============ List<double> ============
For                 1.99
ForHoistLength      1.44
ForEach             3.19
IEnumerableForEach 18.78
Enumerable.Sum     18.61

-------------------- Ints --------------------
============ int[] ============
For                 1.00
ForHoistLength      2.03
ForEach             1.36
IEnumerableForEach 15.22
Enumerable.Sum     15.73

============ List<int> ============
For                 2.82
ForHoistLength      3.49
ForEach             4.78
IEnumerableForEach 25.71
Enumerable.Sum     26.03

I found the results interesting to say the least. Observations:

  • When summing a double[] any of the obvious ways are good.
  • When summing an int[] there’s a slight benefit to using a for loop, but don’t try to optimise it yourself – I believe the JIT recognizes the for loop pattern and removes array bounds checking, but not when the length is hoisted. Note the lack of difference when summing doubles – I suspect this is because the iteration part is more significant when summing ints, as integer addition is blindingly fast. This is important – adding integers is about as little work as you’re likely to do in a loop; if you’re doing any real work (even something as trivial as adding two doubles together) the difference between for and foreach is negligible.
  • Our IEnumerableForEach method has pretty much the same performance as Enumerable.Sum – which isn’t really surprising, as it’s basically the same code. (At some point I might include Marc Gravell’s generic operators to see how they do.)
  • Using a general IEnumerable<T> instead of the specific List<T> makes a pretty huge difference to the performance – I assume this is because the JIT inlines the List<T> code, and it doesn’t need to create an object because List<T>.Enumerator is a value type. (The enumerator will get boxed in the general version, I believe – see the snippet after this list.)
  • When using a for loop over a list, hoisting the length in the for loop helped in the double version, but hindered in the int version. I’ve no idea why this happens.
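
As an illustration of the boxing point in the list above (my own snippet, not part of the benchmark):

List<int> list = new List<int> { 1, 2, 3 };

// Direct call: returns the struct List<int>.Enumerator – no heap allocation,
// and the JIT can see the concrete type.
List<int>.Enumerator direct = list.GetEnumerator();

// Via the interface: the struct has to be boxed to be used as an
// IEnumerator<int>, which is the cost the IEnumerable<T> versions pay.
IEnumerator<int> boxed = ((IEnumerable<int>) list).GetEnumerator();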

If anyone fancies running this on their own box and letting me know if they get very different results, that would be really interesting. Likewise let me know if you want me to add any more tests into the mix.

Programming is hard

One of the answers to my “controversial opinions” question on Stack Overflow claims that “programming is so easy a five year old can do it”.

I’m sure there are some aspects of programming which a five year old can do. Other parts are apparently very hard, though. Today I came to the following conclusions:

  • If your code deals with arbitrary human text, it’s probably broken. (Have you taken the Turkey test recently? See the snippet after this list.)
  • If your code deals with floating point numbers, it’s probably broken.
  • If your code deals with concurrency (whether that means database transactions, threading, whatever), it’s probably broken.
  • If your code deals with dates, times and time zones, it’s probably broken. (Time zones in particular are hideous.)
  • If your code has a user interface with anything other than a fixed size and a fixed set of labels (no i18n), it’s probably broken.
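
As a concrete illustration of the first bullet (my own snippet, not from the original list): in Turkish, the upper-case form of “i” is “İ” (dotted capital I), so culture-sensitive case mappings can quietly break string handling.

using System;
using System.Globalization;
using System.Threading;

class TurkeyTest
{
    static void Main()
    {
        Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");
        // Culture-sensitive: "file".ToUpper() is "FİLE" in Turkish,
        // so this prints False.
        Console.WriteLine("file".ToUpper() == "FILE");
        // Culture-insensitive: behaves the same everywhere, so this prints True.
        Console.WriteLine("file".ToUpperInvariant() == "FILE");
    }
}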

You know what I like working with? Integers. They’re nice and predictable. Give me integers, and I can pretty much predict how they’ll behave. So long as they don’t overflow. Or have an architecture-and-processor-dependent size.

Maybe you think I’m too cynical. I think I’m rather bullish, actually. After all, I used the word “probably” on all of those bullet points.

The thing that amazes me is that despite all this hardness, despite us never really achieving perfection, programs seem to work well enough most of the time. It’s a bit like a bicycle – it really shouldn’t work. I mean, if you’d never seen one working, and someone told you that:

  • You can’t really balance when it’s stationary. You have to go at a reasonable speed to stay stable.
  • Turning is a lot easier if you lean towards the ground.
  • The bit of the bike which is actually in contact with the ground is always stationary, even when the bike itself is moving.
  • You stop by squeezing bits of rubber onto the wheels.

Would you not be a touch skeptical? Likewise when I see the complexity of software and our collective failure to cope with it, I’m frankly astonished that I can even write this blog post.

It’s been a hard day. I achieved a small victory over one of the bullet points in the first list today. It took hours, and I pity my colleague who’s going to code review it (I’ve made it as clear as I can, but some things are just designed to mess with your head) – but it’s done. I feel simultaneously satisfied in a useful day’s work, and depressed at the need for it.

Benchmarking made easy

While I was answering a Stack Overflow question on the performance implications of using a for loop instead of a foreach loop (or vice versa) I promised to blog about the results – particularly as I was getting different results to some other posters.

On Saturday I started writing the bigger benchmark (which I will post about in the fullness of time) and used a technique that I’d used when answering a different question: have a single timing method and pass it a delegate, expected input and expected output. You can ask a delegate for the associated method and thus find out its name (for normal methods, anyway – anonymous functions won’t give you anything useful, of course) so that’s all the information you really need to run the test.
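
For example (ArrayFor here is just a stand-in method name):

Func<int[], int> test = ArrayFor;
// Delegate.Method exposes the target MethodInfo, so the name comes for free.
string name = test.Method.Name; // "ArrayFor"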

I’ve often shied away from using delegates for benchmarking on the grounds that they interfere with the results – including the code inline with the iteration and timing obviously has a bit less overhead. However, the CLR is so fast at delegate invocation these days that it’s really not an issue for benchmarks where each iteration does any real work at all.

It’s still a pain to have to write that testing infrastructure each time, however. A very long time ago I wrote a small attribute-based framework. It worked well enough, but I’ve found myself ignoring it – I’ve barely used it despite writing many, many benchmarks (mostly for newsgroup, blog and Stack Overflow posts) over the course of the years. I’m hoping that the new framework will prove more practical.

There are a few core concepts and (as always) a few assumptions:

  • A benchmark test is a function which takes a single value and returns a single value. This is expressed generically, of course, so you can make the input and output whatever type you like. A test also has a descriptive name, although this can often be inferred as the name of the function itself. The function will be called many times, the exact number being automatically determined.
  • A test suite is a collection of benchmark tests and another descriptive name, as well as the input to supply to each test and the expected output.
  • A benchmark result has a duration and an iteration count, as well as the descriptive name of the test which was run to produce the result. Results can be scaled so that either the duration or the iteration count matches another result. Likewise a result has a score, which is simply the duration (in ticks, but it’s pretty arbitrary) divided by the iteration count. Again, the score can be retrieved in a scaled fashion, using a specified result as a “standard” with a scaled score of 1.0. Lower is always better.
  • A result suite is simply the result of running a test suite. A result suite can be scaled, which is equivalent to building a new result suite with scaled copies of each original result. The result suite contains the smarts to display the results.
  • Running a test consists of two phases. First, we guess roughly how fast the function is. We run 1 iteration, then 2, then 4, then 8 etc – until it takes at least 3 seconds. At that point we scale up the number of iterations so that the real test will last around 30 seconds. This is the one we record. The final iteration of each set is tested for correctness based on the expected output. Currently the 3 and 30 second targets are hard-coded; I could perhaps make them parameters somewhere, but I don’t want to overcomplicate things. A rough sketch of this two-phase approach follows the list.
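
Here’s that two-phase timing as a rough sketch – illustrative names only (RunIterations isn’t the real framework method), and assuming the obvious usings (System, System.Collections.Generic, System.Diagnostics):

static TimeSpan RunIterations<TInput, TOutput>(Func<TInput, TOutput> test,
    TInput input, TOutput expectedOutput, long iterations)
{
    TOutput result = default(TOutput);
    Stopwatch stopwatch = Stopwatch.StartNew();
    for (long i = 0; i < iterations; i++)
    {
        result = test(input);
    }
    stopwatch.Stop();
    // Check the final iteration for correctness against the expected output.
    if (!EqualityComparer<TOutput>.Default.Equals(result, expectedOutput))
    {
        throw new InvalidOperationException("Test returned an incorrect result");
    }
    return stopwatch.Elapsed;
}

// Phase 1: guess the speed by doubling the iteration count
// until a run takes at least 3 seconds.
long iterations = 1;
TimeSpan duration = RunIterations(test, input, expectedOutput, iterations);
while (duration < TimeSpan.FromSeconds(3))
{
    iterations *= 2;
    duration = RunIterations(test, input, expectedOutput, iterations);
}
// Phase 2: scale up so the recorded run lasts around 30 seconds.
iterations = (long) (iterations * 30 / duration.TotalSeconds);
duration = RunIterations(test, input, expectedOutput, iterations);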

Now for the interesting bit (from my point of view, anyway): I decided that this would be a perfect situation to try playing with a functional style. As a result, everything in the framework is immutable. When you “add” a test to a test suite, it returns a new test suite with the extra test. Running the test suite returns the result suite; scaling the result suite returns a new result suite; scaling a result returns a new result etc.

The one downside of this (beyond a bit of inefficiency in list copying) is that C# collection initializers only work with mutable collections. They also only work with direct constructor calls, and generic type inference doesn’t apply to constructors. In the end, the “static generic factory method” combined with simple Add method calls yields quite nice results, even though I can’t use a collection initializer:

double[] array = …; // Code to generate random array of doubles

var results = TestSuite.Create("Array", array, array.Sum())
                       .Add(ArrayFor)
                       .Add(ArrayForEach)
                       .Add(Enumerable.Sum, "LINQ Enumerable.Sum")
                       .RunTests()
                       .ScaleByBest(ScalingMode.VaryDuration);

results.Display(ResultColumns.NameAndDuration);

This is a pretty small amount of extra code to write, beyond the code we actually want to benchmark (the ArrayFor and ArrayForEach methods in particular). No looping by iteration count, no guessing at the number of iterations and rerunning it until it lasts a reasonable amount of time, etc.

My only regret is that I haven’t written this in a test-driven way. There are currently no unit tests at all. Such is the way of projects that start off as “let’s just knock something together” and end up being rather bigger than originally intended.

At some point I’ll make it all downloadable from my main C# page, in normal source form, binary form, and also a “single source file” form so you can compile your benchmark with just csc /o+ /debug- Bench*.cs to avoid checking the assembly filename each time you use it. For the moment, here’s a zip of the source code and a short sample program, should you find them useful. Obviously it’s early days – there’s a lot more that I could add. Feedback would help!

Next time (hopefully fairly soon) I’ll post the for/foreach benchmark and results.

RFID: What I really want it for

This isn’t really coding related, but it’s technology related at least. There’s been a lot of fuss made about how great or awful RFID is and will be in the future, in terms of usefulness and privacy invasion respectively. There’s one use which I haven’t seen discussed, but which seems pretty obvious to me – with further enhancements available too.

Basically, I want RFID on clothes to tell me useful stuff. Suppose each item of clothing were uniquely tagged, and you had a bunch of scanners in your home linked up to one system which stored metadata about the clothes. Suddenly the following tasks become easier:

  • Working out which wash cycle to use for a whole washing basket. Can’t see the dark sock hidden in the white wash? The RFID scanner could alert you to it. Likewise tumble-drying – no need to check each item separately, just attach the scanner over the tumble dryer door and wait for it to beep as you try to put something in which shouldn’t be tumbled.
  • Separating clothes to put them away after they’re clean and dry. Admittedly this is more of a chore for our household than others (with twin boys and an older brother, where some items of clothing could belong to any of them) but it would be really, really useful for us.
  • Remembering who you’ve borrowed which clothes from. Kids grow out of clothes very quickly; we have a number of friends who lend us clothes from their children, and likewise we lend plenty of clothes to them and others. You’ve then got to remember what you borrowed from who, which can become tricky when you’ve got a loft full of bags of clothing. Wouldn’t it be nice to just “label” all the clothes in a bag with the owner’s name when you first receive them, and then just pass a scanner near everything quickly later on?

The privacy issues of all of this would have to be worked out carefully (as the simplest solution would allow your movements to be traced just by which clothes you’re wearing) but if it does end up being possible, I’ll be hugely grateful for this.

(Next up, benchmarking of for vs foreach in various situations. In other words, back to our regular schedule :)

Designing LINQ operators

I’ve started a small project (I’ll post a link when I’ve actually got something worthwhile to show) with some extra LINQ operators in – things which I think are missing from LINQ to Objects, basically. (I hope to include many of the ideas from an earlier blog post.) That, and a few Stack Overflow questions where I’ve effectively written extra LINQ operators and compared them with other solutions, have made me think about the desirable properties of a LINQ operator – or at least the things you should think about when implementing one. My thoughts so far:

Lazy/eager execution

If you’re returning a sequence (i.e. another IEnumerable<T> or similar) the execution should almost certainly be lazy, but the parameter checking should be eager. Unfortunately with the limitations of the (otherwise wonderful) C# iterator blocks, this usually means breaking the method into two, like this:

public static IEnumerable<T> Where<T>(this IEnumerable<T> source,
                                      Func<T, bool> predicate)
{
    // Eagerly executed
    if (source == null)
    {
        throw new ArgumentNullException("source");
    }
    if (predicate == null)
    {
        throw new ArgumentNullException("predicate");
    }
    return WhereImpl(source, predicate);
}

private static IEnumerable<T> WhereImpl<T>(IEnumerable<T> source,
                                           Func<T, bool> predicate)
{
    // Lazily executed
    foreach (T element in source)
    {
        if (predicate(element))
        {
            yield return element;
        }
    }
}

Obviously aggregates and conversions (Max, ToList etc) are generally eager anyway, within normal LINQ to Objects. (Just about everything in Push LINQ is lazy. They say pets look like their owners…)

Streaming/buffering

One of my favourite features of LINQ to Objects (and one which doesn’t get nearly the publicity of deferred execution) is that many of the operators stream the data. In other words, they only consume data when they absolutely have to, and they yield data as soon as they can. This means you can process vast amounts of data with very little memory usage, so long as you use the right operators. Of course, not every operator can stream (reversing requires buffering, for example) but where it’s possible, it’s really handy.
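
A quick way to see streaming in action (my own example, not from the framework): the source below is infinite, yet the query terminates, because Where and Take only pull elements on demand.

static IEnumerable<int> Naturals()
{
    for (int i = 0; ; i++)
    {
        yield return i;
    }
}

// Pulls just enough elements to find five even numbers, then stops.
foreach (int even in Naturals().Where(x => x % 2 == 0).Take(5))
{
    Console.WriteLine(even); // 0, 2, 4, 6, 8
}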

Unfortunately, the streaming/buffering nature of operators isn’t well documented in MSDN – and sometimes it’s completely wrong. As I’ve noted before, the docs for Enumerable.Intersect claim that it reads the whole of both sequences (first then second) before yielding any data. In fact it reads and buffers the whole of second, then streams first, yielding intersecting elements as it goes. I strongly encourage new LINQ operators to document their streaming/buffering behaviour (accurately!). This will limit future changes in the implementation admittedly (Intersect can be implemented in a manner where both inputs are streamed, for example) but in this case I think the extra guarantees provided by the documentation make up for that restriction.
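
For reference, here’s a sketch of that behaviour (my approximation, not the real implementation – error checking and the eager/lazy method split from earlier omitted):

private static IEnumerable<T> IntersectImpl<T>(IEnumerable<T> first,
                                               IEnumerable<T> second)
{
    // Buffer the whole of second up front...
    HashSet<T> potentialElements = new HashSet<T>(second);
    // ... then stream first, yielding each intersecting element once.
    foreach (T element in first)
    {
        if (potentialElements.Remove(element))
        {
            yield return element;
        }
    }
}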

Once-only evaluation

When I said that reversing requires buffering earlier on, I was sort of lying. Here’s an implementation of Reverse which doesn’t buffer any data anywhere:

public static IEnumerable<T> StreamingReverse<T>(this IEnumerable<T> source)
{
    // Error checking omitted for brevity
    int count = source.Count();
    for (int i = count - 1; i >= 0; i--)
    {
        yield return source.ElementAt(i);
    }
}

If we assume we can read the sequence as often as we like, then we never need to buffer anything – just treat it as a random-access list. I hope I don’t have to tell you that’s a really, really bad idea. Leaving aside the blatant inefficiency even for sequences like lists which are cheap to iterate over, some sequences are inherently once-only (think about reading from a network stream) and some are inherently costly to iterate over (think about lines in a big log file – or the result of an ordering).

I suspect that developers using LINQ operators assume that they’ll only read the input data once. That’s a good assumption – wherever possible, we ought to make sure that it’s correct, and if we absolutely can’t help evaluating a sequence twice (and I can’t remember any times when I’ve really wanted to do that) we should document it in large, friendly letters.

Mind your complexity

In some ways, this falls out of “try to stream, and try to only read once” – if you’re not storing any data and you’re only reading each item once, it’s quite hard to come up with an operator which isn’t just O(n) for a single sequence. It is worth thinking about though – particularly as most of the LINQ operators can work with large amounts of data. For example, to find the smallest element in a sequence you can either sort the whole sequence and take the first element of the result or you can keep track of a “current minimum” and iterate through the whole sequence. Clearly the latter saves a lot of complexity (and doesn’t require buffering) – so don’t just take the first idea that comes into your head. (Or at least, start with that and then think how you could improve it.)
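
Here’s a sketch of the “current minimum” approach (mine, with minimal error handling): a single pass, O(n) time, O(1) extra space, no buffering.

public static T StreamingMin<T>(this IEnumerable<T> source)
{
    Comparer<T> comparer = Comparer<T>.Default;
    using (IEnumerator<T> iterator = source.GetEnumerator())
    {
        if (!iterator.MoveNext())
        {
            throw new InvalidOperationException("Sequence was empty");
        }
        T min = iterator.Current;
        // Keep track of the smallest element seen so far.
        while (iterator.MoveNext())
        {
            if (comparer.Compare(iterator.Current, min) < 0)
            {
                min = iterator.Current;
            }
        }
        return min;
    }
}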

Again, documenting the complexity of the operator is a good idea, and call particular attention to anything which is unintuitively expensive.

Conclusion

Okay, so there’s nothing earth-shattering here. But the more I use LINQ to answer Stack Overflow questions, and the more I invent new operators in the spirit of the existing ones, the more powerful I think it is. It’s amazing how powerful it can be, and how ridiculously simple the code (sometimes) looks afterwards. It’s not like the operator implementation is usually hard, either – it’s just a matter of thinking of the right concepts. I’m going to try to follow these principles when I implement my extra operator library, and I hope you’ll bear them in mind too, should you ever feel that LINQ to Objects doesn’t have quite the extension method you need…

Quick rant: why isn’t there an Exception(string, params object[]) constructor?

This Stack Overflow question has reminded me of something I often wish existed in common exception constructors – an overload taking a format string and values. For instance, it would be really nice to be able to write:

throw new IOException("Expected to read {0} bytes but only {1} were available",
                      requiredSize, bytesRead);

Of course, with no way of explicitly inheriting constructors (which I almost always want for exceptions, and almost never want for anything else) it would mean yet another overload to copy and paste from another exception, but the times when I’ve actually written it in my own exceptions it’s been hugely handy, particularly for tricky cases where you’ve got a lot of data to include in the message. (You’d also want an overload taking a nested exception first as well, adding to the baggage…)
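
To show the pattern I mean, here’s roughly what such an exception ends up looking like (the exception name here is invented for the sake of example):

public class DataIntegrityException : Exception
{
    public DataIntegrityException(string message)
        : base(message)
    {
    }

    // The overload I keep wishing the framework exceptions had...
    public DataIntegrityException(string format, params object[] args)
        : base(string.Format(format, args))
    {
    }

    // ... plus the variant taking the nested exception first.
    public DataIntegrityException(Exception inner, string format, params object[] args)
        : base(string.Format(format, args), inner)
    {
    }
}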

Stack Overflow reputation and being a micro-celebrity

I’ve considered writing a bit about this before, but not done so for fear of looking like a jerk. I still think I may well end up looking like a jerk, but this is all stuff I’m interested in and I’ll enjoy writing about it, so on we go. Much of this is based on experiences at and around Stack Overflow, and it’s more likely to be interesting to you if you’re a regular there or at least know the basic premises and mechanics. Even then you may well not be particularly interested – as much as anything, this post is to try to get some thoughts out of my system so I can stop thinking about how I would blog about it. If you don’t want the introspection, but want to know how to judge my egotism, skipping to the summary is probably a good plan. If you really don’t care at all, that’s probably a healthy sign. Quit now while you’re ahead.

What is a micro-celebrity?

A couple of minutes ago, I thought I might have been original with the term “micro-celebrity” but I’m clearly not. I may well not use the term the same way other people do, however, so here’s my rough definition solely for the purposes of this post:

A micro-celebrity is someone who gains a significant level of notoriety within a relatively limited community on the internet, usually with a positive feedback loop.

Yes, it’s woolly. Not to worry.

I would consider myself to have been a micro-celebrity in five distinct communities over the course of the last 14 years:

  • The alt.books.stephen-king newsgroup
  • The mostly web-based community around Team17’s series of “Worms” games (well, the first few, on the PC only)
  • The comp.lang.java.* newsgroups
  • The microsoft.public.dotnet.languages.csharp newsgroup
  • Stack Overflow

The last has been far and away the most blatant case. This is roughly how it goes – or at least how it’s gone in each of the above cases:

  • Spend some time in the community, post quite a lot. Shouting loudly works remarkably well on the internet – if you’re among the most prolific writers in a group, you will get noticed. Admittedly it helps to try hard to post well-written and interesting thoughts.
  • After a while, a few people will refer to you in their other conversations. For instance, if someone in the Java newsgroup was talking about “objects being passed by reference”, another poster might say something like “Don’t let Jon Skeet hear you talking like that.”
  • Play along with it, just a bit. Don’t blow your own trumpet, but equally don’t discourage it. A few wry comments to show that you don’t mind often go down well.
  • Sooner or later, you will find yourself not just mentioned in another topic, but being the topic of conversation yourself. At this point, it’s no longer an inside joke that just the core members of the group “get” – you’re now communal property, and almost any regular will be part of the joke.

One interesting thing you might have noticed about the above is that it doesn’t really take very much skill. It takes a fair amount of time, and ideally you should have some reasonable thoughts and the ability to express yourself clearly, but you certainly don’t have to be a genius. Good job, really.

How much do you care?

This is obviously very personal, and I’m only speaking for myself (as ever).

It’s undeniably an ego boost. Just about every day there’s something on Stack Overflow to laugh about in reference to me. How could I not enjoy that? How could it not inflate my sense of self-worth just a little bit? I could dismiss it as being entirely silly and meaningless – which it is, ultimately – but it’s still fun and I get a kick out of it. And yes, I’m sorry to say I bore/annoy my colleagues and wife with the latest Stack Overflow news, because I’ve always been the selfish kind of person who wants to talk about what they’re up to instead of asking the other person about their interests. This is an unfortunate trait which has relatively little to do with the micro-celebrity business.

One very good thing for keeping my ego in check is that at Google, I converse with people who are smarter than me every day, whether at coffee, breakfast, lunch or just while coding. There’s no sense of anyone trying to make anyone else feel small, but it’s pretty obvious that I’m nothing special when it comes to Google. Now, I don’t want to put on too much false modesty – I know I have a reasonable amount of experience, and I happen to know two very popular platforms reasonably well (which really helps on Stack Overflow – being the world’s greatest guru on MUMPS isn’t going to get you much love), and perhaps most importantly I can communicate pretty well. All of these are good things, and I’m proud of my career and particularly C# in Depth…

… but let’s get real here. The Jon Skeet Facts page isn’t really about me. It’s about making geek jokes where the exact subject is largely irrelevant. It could very easily have been about someone else with little change in the humour. Admittedly the funniest answers (to my mind) are the ones which do have some bearing on me (particularly the one about having written a book on C# 5.0 already) – but that doesn’t mean there’s anything really serious in it. I hope it’s pretty obvious to everyone that I’m not a genius programmer. I’d like to think I’m pretty good, but I’m not off-the-charts awesome by any means. (In terms of computer science, I’m nothing special at all and I have a really limited range of languages/paradigms. I’m trying to do something about those, but it’s hard when there’s always another question to answer.)

It’s worth bearing in mind the “micro” part of micro-celebrity. I suspect that if we somehow got all the C# developers in the world together and asked them whether they’d heard of Jon Skeet, fewer than 0.1% would say yes. (That’s a complete guess, by the way. I have really no idea. The point is I’m pretty sure it’s a small number.) Compared with the sea of developers, the set of Stack Overflow regulars is a very small pond.

What I care far more about than praise and fandom is the idea of actually helping people and making a difference. A couple of days ago I had an email from someone saying that C# in Depth had helped them in an interview: they were able to write more elegant code because now they grok lambda expressions. How cool is that? Yes, I know it’s all slightly sickening in a “you do a lot of good work for charity” kind of way – but I suspect it’s what drives most Stack Overflow regulars. Which leads me on to reputation…

What does Stack Overflow reputation mean to you?

In virtually every discussion about the Stack Overflow reputation system and its caps, I try to drop in the question of “what’s the point of reputation? What does it mean to you?” It’s one of those questions which everyone needs to answer for themselves. Jeff Atwood’s answer is that reputation is how much the system trusts you. My own answers:

  • It’s a daily goal. Making sure I always get to 200 is a fun little task, and then trying to get accepted answers is a challenge.
  • It’s measurable data, and you can play with graphs and stats. Hey we’re geeks – it’s fun to play with numbers, however irrelevant they are.
  • It’s an indication of helpfulness to some extent. It plays to my ego in terms of both pride of knowledge and the fulfillment of helping people.
  • It’s useful as an indicator of community trust for the system to use, which is probably more important to Jeff than it is to me.
  • It’s a game. This is the most important aspect. I love games. I’m fiercely competitive, and will always try to work out all the corners of a game’s system – things like it being actually somewhat useless getting accepted answers before you’ve reached the 200 limit. I don’t necessarily play to the corners of the game (I would rather post a useful but unpopular answer than a popular but harmful one, for serious questions) but I enjoy working them out. I would be interested to measure my levels of testosterone when typing furiously away at an answer, hoping to craft something useful before anyone else does. I’m never going to be “macho” physically, but I can certainly be an alpha geek. So long as it doesn’t go too far, I think it’s a positive thing.

I sometimes sense (perhaps inaccurately) that Jeff and Joel are frustrated with people getting too hung up about reputation. It’s really unimportant in the grand scheme of things – rep in itself isn’t as much of a net contribution to the world’s happiness as the way that Stack Overflow connects people with questions to people with relevant answers really, really quickly. But rep is one of the things that makes Stack Overflow so “sticky” as a website. It’s not that I wouldn’t answer questions if the reputation system went down – after all, I’ve been answering questions on newsgroups for years, for the other reasons mentioned – but the reputation system certainly helps. Yes, it’s probably taking advantage of a competitive streak which is in some ways ugly… but the result is a good one.

One downside of the whole micro-celebrity thing – and in particular of being the top rep earner – is that various suggestions (such as changing the rep limit algorithm and introducing a monthly league) make me look really selfish. It’s undeniable that both of the suggestions work in my favour. I happen to believe that both work in the community’s favour too, but I can certainly see why people might get the wrong idea about my motivation. I don’t remember thinking of any suggestions which would work against my personal interests but in the interests of the community. If I do, I’m pretty sure I’ll post them with no hesitation.

Summary

Yes, I like the attention of being a micro-celebrity. It would be ridiculous to deny it, and I don’t think it says much more about me than the fact that I’m human.

Yes, I like competing for reputation, even though it’s blatantly obvious that the figure doesn’t reflect programming prowess. It’s part of the fuel for my addiction to Stack Overflow.

With this out of the way, I hope to return to more technical blog posts. If anything interesting comes up in the comments, I’ll probably edit this post rather than writing a new one.

Stack Overflow Reputation Tool now online

Update:

Now that the “recent activity” page is working, the feed that the tool was using has been removed. However, the new page offers pretty much everything that the tool did, and a lot more besides. I’ve updated the tool to just redirect to the relevant page, so your bookmarks should still work.

Original post:

This is the micro-web-app that my recent ASP.NET question was about. It’s very simple – it shows you the reputation gained or lost by a specified user (typically you) for either today or yesterday. Note that these are Stack Overflow “today” and “yesterday” – i.e. they’re in UTC. That happens to be convenient for me as I’m in the UK, but more importantly it’s in tune with the reputation limits. It does mean that if you’re in a different time zone you’ll see the date changing at potentially unexpected times.

There’s an option for including a record of questions/answers which have received an upvote during the day but which haven’t generated any reputation – this happens if you’ve already hit the reputation limit for the day before the vote.

The worst part about the user interface (to my mind) is that you have to know the ID of the user whose reputation you want to check. This isn’t exactly hard, but it’s slightly annoying. Basically you need to go to the user’s profile page on Stack Overflow and look at the URL. It will be of the form http://stackoverflow.com/users/[user-id]/[user-name] – take the user ID from that, and put it into the tool. I may be able to have a browsing mode just like that on SO at some point, but it will take at least some work. I’ve been concentrating on the data retrieved rather than the presentation, as you’ll no doubt be able to tell at a glance :)

All the options are specified on the URL, so you can bookmark your own user results very easily. For example:

(If anyone has any better ideas for the URL parameter than “showzero” I’m very much open to suggestions. I can keep backward compatibility for the sake of bookmarks really easily.)

At the moment it’s showing pretty much all the information it receives. I’m hoping that I may be able to work with the Stack Overflow team to make it easy (and importantly, cheap for the SO server) to show a whole date range (e.g. “what happened in the last week?”) and also give details of the number of votes up and down, and when an answer is accepted (or unaccepted).

Enjoy, and share with friends. Feedback welcome. Many thanks to Geoff Dalgas for working with me to limit the impact on the server.

Horrible grotty hack: returning an anonymous type instance

One of the reasons I don’t view anonymous types as being too bad is that they’re nicely confined to methods. You can’t declare the type that you’re returning from a method if it’s anonymous (or if it involves an anonymous type, e.g. a List<T> where T is an anonymous type and T isn’t a type parameter to the method itself). However, you can get around this if you’re sneaky.

I’ve always known that it’s perfectly easy to return an instance of an anonymous type by declaring that the method will return object. However, it hadn’t occurred to me before today that you can actually cast back to that type afterwards. Of course, you can’t just use a normal cast expression – that requires the name of the type to be known at compile-time. But you can do a cast in a generic method… and you can use type inference to supply a type argument… and two anonymous type instance creation expressions will use the same type within the same assembly if the order, names and types of the properties are the same.

Behold the evil cheesecake factory!

using System;

static class GrottyHacks
{
    internal static T Cast<T>(object target, T example)
    {
        return (T) target;
    }
}

class CheesecakeFactory
{
    static object CreateCheesecake()
    {
        return new { Fruit="Strawberry", Topping="Chocolate" };
    }
   
    static void Main()
    {
        object weaklyTyped = CreateCheesecake();
        var stronglyTyped = GrottyHacks.Cast(weaklyTyped,
            new { Fruit="", Topping="" });
       
        Console.WriteLine("Cheesecake: {0} ({1})",
            stronglyTyped.Fruit, stronglyTyped.Topping);           
    }
}

The important thing to note here is that the stronglyTyped variable really is of the same anonymous type as the one used in the CreateCheesecake method. When we use the Fruit and Topping properties in the last statement, that’s checked at compile-time.

Of course, it all goes pear-shaped if you make the slightest of errors when giving the Cast method an example of what you want to cast to – if you got the order of the properties wrong, for example, the code would still compile, but the cast would throw an exception at execution time.

How useful is this? Ooh, probably not at all. Please do not use this “technique” in your code. If you do, at least don’t mention my name anywhere near it. It’s all fun though, isn’t it?

You don’t have to use query expressions to use LINQ

LINQ is clearly gaining a fair amount of traction, given the number of posts I see about it on Stack Overflow. However, I’ve noticed an interesting piece of coding style: a lot of developers are using query expressions for every bit of LINQ they write, however trivial.

Now, don’t get the wrong idea – I love query expressions as a helpful piece of syntactic sugar. For instance, I’d always pick the query expression form over the “dot notation” form for something like this:

var query = from file in Directory.GetFiles(logDirectory, "*.log")
            from line in new LineReader(file)
            let entry = new LogEntry(line)
            where entry.Severity == Severity.Error
            select file + ": " + entry.Message;

(Yes, it’s yet another log entry example – it’s one of my favourite demos of LINQ, and particularly Push LINQ.) The equivalent code using just the extension methods would be pretty ugly, especially given the various range variables and transparent identifiers involved.

However, look at these two queries instead:

var query = from person in people
            where person.Salary > 10000m
            select person;

var dotNotation = people.Where(person => person.Salary > 10000m);

In this case, we’re just making a single method call. Why bother with three lines of query expression? If the query becomes more complicated later, it can easily be converted into a query expression at that point. The two queries are exactly the same, even though the syntax is different.

My guess is that there’s a “black magic” fear of LINQ – many developers know how to write query expressions, but aren’t confident about what they’re converted into (or even the basics of what the translation process is like in the first place). Most of the C# 3.0 and LINQ books that I’ve read do cover query expression translation to a greater or lesser extent, but it’s rarely given much prominence.

I suspect the black magic element is reinforced by the inherent “will it work?” factor of LINQ to SQL – you get to write the query in your favourite language, but you may well not be confident in it working until you’ve tried it; there will always be plenty of little gotchas which can’t be picked up at compile time. With LINQ to Objects, there’s a lot more certainty (at least in my experience). However, the query expression translation shouldn’t be part of what developers are wary of. It’s clearly defined in the spec (not that I’m suggesting that all developers should learn it via the spec) and benefits from being relatively dumb and therefore easy to predict.

So next time you’re writing a query expression, take a look at it afterwards – if it’s simple, try writing it without the extra syntactic sugar. It may just be sweet enough on its own.