An object lesson in blogging and accuracy; was: Efficient “vote counting” with LINQ to Objects – and the value of nothing

Well, this is embarrassing.

Yesterday evening, I excitedly wrote a blog post about an interesting little idea for making a particular type of LINQ query (basically vote counting) efficient. It was an idea that had occurred to me a few months back, but I hadn’t got round to blogging about it.

The basic idea was to take a completely empty struct, and use that as the element type in the results of a grouping query – as the struct was empty, it would take no space, therefore "huge" arrays could be created for no cost beyond the fixed array overhead, etc. I carefully checked that the type used for grouping did in fact implement ICollection<T> so that the Count method would be efficient; I wrote sample code which made sure my queries were valid… but I failed to check that the empty struct really took up no memory.

Fortunately, I have smart readers, a number of whom pointed out my mistake in very kind terms: an empty struct still has a size of 1 byte, not 0.

Ben Voigt gave the reason for the size being 1 in a comment:

The object identity rules require a unique address for each instance… identity can be shared with super- or sub- class objects (Empty Base Optimization) but the total size of the instance has to be at least 1.

This makes perfect sense – it’s just a shame I didn’t realise it before.
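
For what it’s worth, a check along the following lines would have caught the problem. This is only a minimal sketch: Empty is a hypothetical stand-in for the struct from the original post, and it assumes a runtime where System.Runtime.CompilerServices.Unsafe is available (which was not the case when the post was written).

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical empty struct, standing in for the one in the original post.
public struct Empty { }

public static class EmptyStructSize
{
    // Size in bytes that the runtime uses for a single Empty value.
    public static int SizeOfEmpty()
    {
        return Unsafe.SizeOf<Empty>();
    }

    // Rough payload size of an Empty[] with the given length,
    // ignoring the fixed array overhead.
    public static long ApproximateArrayPayload(int length)
    {
        return (long) length * SizeOfEmpty();
    }
}
```

Calling EmptyStructSize.SizeOfEmpty() reports 1 rather than 0, so a million-element Empty[] costs roughly a megabyte instead of being free.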

Live and learn, I guess – but apologies for the poorly researched post. I’ll attempt to be more careful next time.

API design: choosing between non-ideal options

So, UnconstrainedMelody is coming on quite nicely. It now has quite a few useful options for flags enums, "normal enums" and delegates. However, there are two conflicting limitations which leave a couple of options. (Related answers on Stack Overflow have suggested alternative approaches.)

Currently, most of the enums code is in two classes: Flags and Enums. Both are non-generic: the methods within them are generic methods, so they have type parameters (and constraints). The main benefit of this is that generic type inference only applies to generic methods, and I definitely want that for extension methods and anywhere else it makes sense.

The drawback is that properties can’t be generic. That means my API is entirely expressed in terms of methods, which can be a pain. The workaround is to have a generic type with properties in it, but that adds confusion and guesswork – which call lives where?

To recap, the options are:

// Option 1 (current): all methods in a nongeneric class:
// Some calls which are logically properties end up
// as methods…
IList<Foo> foos = Enums.GetValues<Foo>();
// Type inference for extension methods
// Note that we couldn’t have a Description property
// as we don’t have extension properties
string firstDescription = foos[0].GetDescription();
        
// Option 2: Use just a generic type:
// Now we can use a property…
IList<Foo> foos = Enums<Foo>.Values;
// But we can’t use type inference
string firstDescription = Enums<Foo>.GetDescription(foos[0]);
        
// Option 3: Use a mixture (Enums and Enums<T>):
IList<Foo> foos = Enums<Foo>.Values;
// All looks good…
string firstDescription = foos[0].GetDescription();
// … but the user has to know when to use which class

All of these are somewhat annoying. If we only put extension methods into the nongeneric class, then I guess users would never need to really think about that – they’d pretty much always be calling the methods via the extension method syntactic sugar anyway. It still feels like a pretty arbitrary split though.
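
To make that split concrete, here is a rough sketch of option 3 with only extension methods in the non-generic class. It is not the UnconstrainedMelody code: Foo is a hypothetical enum, the plain struct constraint stands in for a real enum constraint, and it assumes that a description comes from System.ComponentModel.DescriptionAttribute.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Reflection;

// Hypothetical enum used only for this illustration.
public enum Foo
{
    [Description("The first value")]
    First,
    Second
}

// Generic class: it can expose properties, but callers must name T explicitly.
public static class Enums<T> where T : struct
{
    // Throws (inside the type initializer) if T is not an enum type.
    private static readonly IList<T> values =
        Array.AsReadOnly((T[]) Enum.GetValues(typeof(T)));

    public static IList<T> Values { get { return values; } }
}

// Non-generic class: only generic extension methods, so T is inferred.
public static class Enums
{
    // Returns the DescriptionAttribute text for a named value, or null if
    // the value has no matching field or no description.
    public static string GetDescription<T>(this T value) where T : struct
    {
        FieldInfo field = typeof(T).GetField(value.ToString());
        if (field == null)
        {
            return null;
        }
        DescriptionAttribute attribute = (DescriptionAttribute)
            Attribute.GetCustomAttribute(field, typeof(DescriptionAttribute));
        return attribute == null ? null : attribute.Description;
    }
}
```

With this in place, Enums<Foo>.Values[0].GetDescription() reads naturally, and the only time a caller names the non-generic Enums class is when skipping the extension method syntax. The downside of the bare struct constraint is that GetDescription shows up on every value type, which is exactly what a proper enum constraint would avoid.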

Any thoughts? Which is more important – conceptual complexity, or the idiomatic client code you end up with once that complexity has been mastered? Is it reasonable to make design decisions like this around what is essentially a single piece of syntactic sugar (extension methods)?

(By the way, if anyone ever wanted justification for extension properties, I think this is a good example… Description feels like it really should be a property.)

Generic constraints for enums and delegates

As most readers probably know, C# prohibits generic type constraints from referring to System.Object, System.Enum, System.Array, System.Delegate and System.ValueType. In other words, this method declaration is illegal:

public static T[] GetValues<T>() where T : struct, System.Enum
{
    return (T[]) Enum.GetValues(typeof(T));
}

This is a pity, as such a method could be useful. (In fact there are better things we can do… such as returning a read-only collection. That way we don’t have to create a new array each time the method is called.) As far as I can tell, there is no reason why this should be prohibited. Eric Lippert has stated that he believes the CLR doesn’t support this – but I think he’s wrong. I can’t remember the last time I had cause to believe Eric to be wrong about something, and I’m somewhat nervous of even mentioning it, but section 10.1.7 of the CLI spec (ECMA-335) partition II (p40) specifically gives examples of type parameter constraints involving System.Delegate and System.Enum. It introduces the table with "The following table shows the valid combinations of type and special constraints for a representative set of types." It was only due to reading this table that I realized that the value type constraint on the above is required (or a constructor constraint would do equally well) – otherwise System.Enum itself satisfies the constraint, which would be a Bad Thing.
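
As a sketch of the read-only collection idea, the values can be built once per enum type and the same list returned on every call. The names EnumValues and Cache are hypothetical, and because this version has to compile without the System.Enum constraint, it uses only the struct constraint and checks for an enum at execution time.

```csharp
using System;
using System.Collections.Generic;

public static class EnumValues
{
    // Returns the same read-only list on every call for a given T,
    // so no new array is created per call.
    public static IList<T> GetValues<T>() where T : struct
    {
        return Cache<T>.Values;
    }

    // One static field per closed generic type; the CLR runs the
    // initializer the first time Cache<T> is used for a particular T.
    private static class Cache<T> where T : struct
    {
        internal static readonly IList<T> Values = Build();

        private static IList<T> Build()
        {
            // Without a real enum constraint this can only be checked at
            // execution time; a failure here surfaces to the caller wrapped
            // in a TypeInitializationException.
            if (!typeof(T).IsEnum)
            {
                throw new ArgumentException(typeof(T) + " is not an enum type");
            }
            return Array.AsReadOnly((T[]) Enum.GetValues(typeof(T)));
        }
    }
}
```

For example, two calls to EnumValues.GetValues<DayOfWeek>() return the same list instance, and attempts to modify it fail because the wrapper is read-only.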

It’s possible (but unlikely) that the CLR doesn’t fully implement this part of the CLI spec. I’m hoping that Eric’s just wrong on this occasion, and that actually there’s nothing to stop the C# language from allowing such constraints in the future. (It would be nice to get keyword support, such that a constraint of "T : enum" would be equivalent to the above, but hey…)

The good news is that ilasm/ildasm have no problem with this. The better news is that if you add a reference to a library which uses those constraints, the C# compiler applies them sensibly, as far as I can tell…

Introducing UnconstrainedMelody

(Okay, the name will almost surely have to change. But I like the idea of it removing the constraints of C# around which constraints are valid… and yet still being in the key of C#. Better suggestions welcome.)

I have a plan – I want to write a utility library which does useful things for enums and delegates (and arrays if I can think of anything sensible to do with them). It will be written in C#, with methods like this:

public static T[] GetValues<T>() where T : struct, IEnumConstraint
{
    return (T[]) Enum.GetValues(typeof(T));
}

(IEnumConstraint has to be an interface of course, as otherwise the constraint would be invalid.)

As a post-build step, I will:

  • Run ildasm on the resulting binary
  • Replace every constraint using IEnumConstraint with System.Enum
  • Run ilasm to build the binary again

If anyone has a simple binary rewriter (I’ve looked at PostSharp and CCI; both look way more complicated than the above) which would do this, that would be great. Otherwise ildasm/ilasm will be fine. It’s not like consumers will need to perform this step.
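
The middle step of that post-build process could be as simple as a text substitution on the disassembled IL. The following is only a sketch of that idea: ConstraintRewriter is a hypothetical helper, and the caller supplies both the placeholder and the replacement text, because the exact spelling of a constraint in ildasm output is not something assumed here.

```csharp
using System;

public static class ConstraintRewriter
{
    // Hypothetical helper: swaps the placeholder constraint text for the
    // real constraint text in IL produced by ildasm. Both strings come from
    // the caller, since their exact form depends on what ildasm emits.
    public static string Rewrite(string il, string placeholder, string replacement)
    {
        if (il == null) throw new ArgumentNullException(nameof(il));
        if (string.IsNullOrEmpty(placeholder))
        {
            throw new ArgumentException("Placeholder must be non-empty", nameof(placeholder));
        }
        if (replacement == null) throw new ArgumentNullException(nameof(replacement));

        // Plain ordinal text replacement; no parsing of the IL is attempted.
        return il.Replace(placeholder, replacement);
    }
}
```

A build script would read the .il file written by ildasm, pass its contents through Rewrite, write the result back, and then hand the file to ilasm. A blunt replacement like this is only safe if the placeholder text cannot appear anywhere else in the IL, which is one reason to give the placeholder interface a distinctive name.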

As soon as the name is finalized I’ll add a project on Google Code. Once the infrastructure is in place, adding utility methods should be very straightforward. Suggestions for utility methods would be useful, or just join the project when it’s up and running.

Am I being silly? Have I overlooked something?

A couple of hours later…

Okay, I decided not to wait for a better name. The first cut – which does basically nothing but validate the idea, and the fact that I can still unit test it – is in. The UnconstrainedMelody Google Code project is live!

Recent activities

It’s been a little while since I’ve blogged, and quite a lot has been going on. In fact, there are a few things I’d have blogged about already if it weren’t for “things” getting in the way.

Rather than writing a whole series of very short blog posts, I thought I’d wrap them all up here…

C# in Depth: next MEAP drop available soon – Code Contracts

Thanks to everyone who gave feedback on my writing dilemma. For the moment, the plan is to have a whole chapter about Code Contracts, but not include a chapter about Parallel Extensions. My argument for making this decision is that Code Contracts really change the feel of the code, making it almost like a language feature – and its applicability is almost ubiquitous, unlike PFX.

I may write a PFX chapter as a separate download, but I’m sensitive to those who (like me) appreciate slim books. I don’t want to “bulk out” the book with extra topics.

The Code Contracts chapter is in the final stages before becoming available to MEAP subscribers. (It’s been “nearly ready” for a couple of weeks, but I’ve been on holiday, amongst other things.) After that, I’m going back to the existing chapters and revising them.

Talking in Dublin – C# 4 and Parallel Extensions

Last week I gave two talks in Dublin at Epicenter. One was on C# 4, and the other on Code Contracts and Parallel Extensions. Both are now available in a slightly odd form on the Talks page of the C# in Depth web site. I no longer write “formal” PowerPoint slides, so the downloads are for simple bullet points of text, along with silly hand-drawn slides. No code yet – I want to tidy it up a bit before including it.

Podcasting with The Connected Show

I recently recorded a podcast episode with The Connected Show. I’m “on” for the last two-thirds of the show – about an hour of me blathering on about the new features of C# 4. If you can understand generic variance just by listening to me talking about it, you’re a smart cookie ;)

(Oh, and if you like it, please express your amusement on Digg / DZone / Shout / Kicks.)

Finishing up with Functional Programming for the Real World

Well, this hasn’t been taking much of my time recently (I bowed out of all the indexing etc!) but Functional Programming for the Real World is nearly ready to go. Hard copy should be available in the next couple of months… it’ll be really nice to see how it fares. Much kudos to Tomas for all his hard work – I’ve really just been helping out a little.

Starting on Groovy in Action, 2nd edition

No sooner does one book finish than another one starts. The second edition of Groovy in Action is in the works, which should prove interesting. To be honest, I haven’t played with Groovy much since the first edition of the book was finished, so it’ll be interesting to see what’s happened to the language in the meantime. I’ll be applying the same sort of spit and polish that I did in the first edition, and asking appropriately ignorant questions of the other authors.

Tech Reviewing C# 4.0 in a Nutshell

I liked C# 3.0 in a Nutshell, and I feel honoured that Joe asked me to be a tech reviewer for the next edition, which promises to be even better. There’s not a lot more I can say about it at the moment, other than it’ll be out in 2010 – and I still feel that C# in Depth is a good companion book.

MoreLINQ now at 1.0 beta

A while ago I started the MoreLINQ project, and it gained some developers with more time than I’ve got available :) Basically the idea is to add some more useful LINQ extension methods to LINQ to Objects. Thanks to Atif Aziz, the first beta version has been released. This doesn’t mean we’re “done” though – just that we think we’ve got something useful. Any suggestions for other operators would be welcome.

Manning Pop Quiz and discounts

While I’m plugging books etc, it’s worth mentioning the Manning Pop Quiz – multiple choice questions on a wide variety of topics. Fabulous prizes available, as well as one-day discounts:

  • Monday, Sept 7: 50% off all print books (code: pop0907)
  • Monday, Sept 14: 50% off all ebooks (code: pop0914)
  • Thursday, Sept 17: $25 for C# in Depth, 2nd Edition MEAP print version (code: pop0917) + C# Pop Quiz question
  • Monday, Sept 21: 50% off all books  (code: pop0921)
  • Thursday, Sept 24: $12 for C# in Depth, 2nd Edition MEAP ebook (code: pop0924) + another C# Pop Quiz question

Future speaking engagements

On September 16th I’m going to be speaking to Edge UG (formerly Vista Squad) in London about Code Contracts and Parallel Extensions. I’m already very much looking forward to the Stack Overflow DevDays London conference on October 28th, at which I’ll be talking about how humanity has screwed up computing.

Future potential blog posts

Some day I may get round to writing about:

  • Revisiting StaticRandom with ThreadLocal<T>
  • Volatile doesn’t mean what I thought it did

There’s a lot more writing than coding in that list… I’d like to spend some more time on MiniBench at some point, but you know what deadlines are like.

Anyway, that’s what I’ve been up to and what I’ll be doing for a little while…