Category Archives: C#

Visual Studio vs Eclipse

I often see people in newsgroups saying how wonderful Visual Studio is, and they often claim it’s the “best IDE in the world”. Strangely enough, most go silent when I ask how many other IDEs they’ve used for a significant amount of time. I’m not going to make any claims as to which IDE is “the best” – I haven’t used all the IDEs available, and I know full well that one (IDEA) is often regarded as superior to Eclipse. However, here are a few reasons I prefer Eclipse to Visual Studio (even bearing in mind VS 2005, which is a great improvement). Visual Studio has much more of a focus on designers (which I don’t
tend to use, for reasons given elsewhere) and much less of a focus on making actual coding as easy as possible.

Note that this isn’t a comparison of Java and C# (although those are the languages I use in Eclipse and VS respectively). For the most part, I believe C# is an improvement on Java, and the .NET framework is an improvement on the Java standard library. It’s just a shame the tools aren’t as good. For reference, I’m comparing VS2005 and Eclipse 3.1.1. There are new features being introduced to Eclipse all the time (as I write, 3.2M4 is out, with some nice looking things) and obviously MS is working on improving VS as well. So, without further ado (and in no particular order):

Open Type/Resource

When I hit Ctrl-Shift-T in Eclipse, an “Open Type” dialog comes up. I can then type in the name of any type (whether it’s my code, 3rd party library code, or the Java standard library code) and the type is opened. If the source is available (which it generally is – I’ve used very few closed source 3rd party Java components, and the source for the Java standard library is available) the source opens up; otherwise a list of members is displayed.

In large solutions, this is an enormous productivity gain. I regularly work with solutions with thousands of classes – remembering where each one is in VS is a bit of a nightmare. Non-Java resources can also be opened in the same way in Eclipse, using Ctrl-Shift-R instead. One neat feature is that Eclipse knows the Java naming conventions, and lets you type just the initial letters instead of the type name itself. (You only ever need to type as much as you want in order to find the type you’re after anyway, of course.) So for example, if I type “NPE”, I’m offered NullPointerException and NoPermissionException.

Note that this isn’t the same as the “Find Symbol” search offered by VS 2005. Instead, it’s a live updating search – as you type, the list is updated. This is very handy if you can’t remember whether it’s ArgumentNullException or NullArgumentException and the like – it’s very fast to experiment with.

There’s good news here: Visual Studio users have a saviour in the form of a free add-in called DPack, by USysWare. This offers dialogs
for opening types, members (like the Outline dialog, Ctrl-O, in Eclipse), and files. I’ve only just heard about it, and haven’t tried it on a large solution yet, but I have high hopes for it.

Sensible overload intellisense

(I’m using the word intellisense for what Eclipse calls Code Assist – I’m sure you know what I mean.) For some reason, although Visual Studio is perfectly capable of displaying the choice of multiple methods within a drop-down list, when it comes to overloads it prefers a spinner. Here’s what you get if you type sb.Append( into Visual Studio, where sb is a StringBuilder

Here’s what happens if you do the equivalent in Eclipse:

Look ma, I can see more than one option at once!

Organise imports

For those of you who aren’t Java programmers, import statements are the equivalent to using directives in C# – they basically import a type or namespace so that it can be used without the namespace being specified. In Visual Studio, you either have to manually type the using directives in (which can be a distraction, as you have to go to the top of the file and then back to where you were) or (with 2005) you can hit Shift-Alt-F10 after typing the name ofthe type, and it will give you the option of adding a using statement, or filling in the namespace for you. Now, as far as I’m aware, you have to do that manually for each type. With Eclipse, I can write a load of code which won’t currently compile, then hit Ctrl-Shift-O and the imports are added. I’m only prompted if there are multiple types available from different namespaces with the same name. Not only that, but I can get intellisense for the type name while I’m typing it even before I’ve added the import – and picking the type adds the import automatically. In addition, organise imports removes import statements which aren’t needed – so if you’ve added something but then gone back and removed it, you don’t have misleading/distracting lines at the top of your file. A feature which isn’t relevant to C# anyway but which is quite neat is that Eclipse allows you to specify how many individual type imports you want before it imports the whole package (e.g. import java.util.*). This allows people to code in whatever style they want, and still get
plenty of assistance from Eclipse.

Great JUnit integration

I confess I’ve barely tried the unit testing available in Team System, but it seems to be a bit of a pain in the neck to use. In Eclipse, having written a test class, I can launch it with a simple (okay, a slightly complicated – you learn to be a bit of a spider) key combination. Similarly I can select a package or a whole source directory and run all the unit tests within it. Oh, and it’s got a red/green bar, unlike Team System (from what I’ve seen). It may sound like a trivial thing, but having a big red/green bar in your face is a great motivator in test driven development. Numbers take more time to process – and really, the most important thing you need to know is whether all the tests have passed or not. Now, Jamie Cansdale has done a great job with TestDriven.NET, and I’m hoping that he’ll integrate it with VS2005 even better, but Eclipse is still in the lead at this point for me. Of course, it helps that it just comes with all this stuff, without extra downloads (although there are plenty of plugins available). Oh, and just in case anyone at Microsoft thinks I’ve forgotten: no, unit testing still doesn’t belong in just Team System. It should be in the Express editions, in my view…

Better refactoring

MS has made no secret of the fact that it doesn’t have many refactorings available out of the box. Apparently they’re hoping 3rd parties will add their own – and I’m sure they will, at a cost. It’s a shame that you have to buy two products in 2005 before you can get the same level of refactoring that has been available in Eclipse (and other IDEs) for years. (I know I was using Eclipse in 2001, and possibly earlier.)

Not only does Eclipse have rather more refactorings available, but they’re smarter, too. Here’s some sample code in C#:

public void DoSomething()
    string x = "Hello";
    byte[] b = Encoding.UTF8.GetBytes(x);
    byte[] firstHalf = new byte[b.Length / 2];
    Array.Copy(b, firstHalf, firstHalf.Length);

public void DoSomethingElse()
    string x = "Hello there";
    byte[] b = Encoding.UTF8.GetBytes(x);
    byte[] firstHalf = new byte[b.Length / 2];
    Array.Copy(b, firstHalf, firstHalf.Length);

If I select the last middle lines of the first method, and use the ExtractMethod refactoring, here’s what I get:

public void DoSomething()
    string x = "Hello";
    byte[] firstHalf = GetFirstHalf(x);

private static byte[] GetFirstHalf(string x)
    byte[] b = Encoding.UTF8.GetBytes(x);
    byte[] firstHalf = new byte[b.Length / 2];
    Array.Copy(b, firstHalf, firstHalf.Length);
    return firstHalf;

public void DoSomethingElse()
    string x = "Hello there";
    byte[] b = Encoding.UTF8.GetBytes(x);
    byte[] firstHalf = new byte[b.Length / 2];
    Array.Copy(b, firstHalf, firstHalf.Length);

Note that second method is left entirely alone. In Eclipse, if I have some similar Java code:

public void doSomething() throws UnsupportedEncodingException
    String x = "hello";        
    byte[] b = x.getBytes("UTF-8");
    byte[] firstHalf = new byte[b.length/2];
    System.arraycopy(b, 0, firstHalf, 0, firstHalf.length);
    System.out.println (firstHalf[0]);

public void doSomethingElse() throws UnsupportedEncodingException
    String y = "hello there";        
    byte[] bytes = y.getBytes("UTF-8");
    byte[] firstHalfOfArray = new byte[bytes.length/2];
    System.arraycopy(bytes, 0, firstHalfOfArray, 0, firstHalfOfArray.length);
    System.out.println (firstHalfOfArray[0]);

and again select Extract Method, then the dialog not only gives me rather more options, but one of them is whether to replace the duplicate code snippet elsewhere (along with a preview). Here’s the result:

public void doSomething() throws UnsupportedEncodingException
    String x = "hello";        
    byte[] firstHalf = getFirstHalf(x);
    System.out.println (firstHalf[0]);

private byte[] getFirstHalf(String x) throws UnsupportedEncodingException
    byte[] b = x.getBytes("UTF-8");
    byte[] firstHalf = new byte[b.length/2];
    System.arraycopy(b, 0, firstHalf, 0, firstHalf.length);
    return firstHalf;

public void doSomethingElse() throws UnsupportedEncodingException
    String y = "hello there";        
    byte[] firstHalfOfArray = getFirstHalf(y);
    System.out.println (firstHalfOfArray[0]);

Note the change to doSomethingElse. I’d even tried to be nasty to Eclipse, making the variable names different in the second method. It still does the business.

Navigational Hyperlinks

If I hold down Ctrl and hover over something in Eclipse (e.g. a variable, method or type name), it becomes a hyperlink. Click on the link, and it takes you to the declaration. Much simpler than right-clicking and hunting for “Go to definition”. Mind you, even that much
isn’t necessary in Eclipse with the Declaration view. If you leave your cursor in a variable, method or type name for a second, the Declaration view shows the appropriate code – the line
declaring the variable, the code for the method, or the code for the whole type. Very handy if you just want to check something quickly, without even changing which editor you’re using. (For those of you who haven’t used Eclipse, a view is a window like the Output window in VS.NET. Pretty much any window which isn’t an editor or a dialog is a view.)

Update! VS 2005 has these features too!
F12 is used to go to a definition (there may be a shortcut key in Eclipse as well to avoid having to use the mouse – I’m not sure).
VS 2005 also has the Code Definition window which is pretty much identical to the Declaration view. (Thanks for the tips, guys :)

Better SourceSafe integration

The source control integration in Eclipse is generally pretty well thought through, but what often amuses me is that it’s easier to use Visual SourceSafe (if you really have to – if you have a choice, avoid it) through Eclipse (using the free plug-in) than through Visual Studio. The whole binding business is much more easily set up. It’s a bit more
manual, but much harder to get wrong.

Structural differences

IDEs understand code – so why do most of them not allow you to see differences in code terms? Eclipse does. I can ask it to compare two files, or compare my workspace version with the previous (or any other) version in source control, and it shows me not just the textual
differences but the differences in terms of code – which methods have been changed, which have been added, which have been removed. Also, when going through the differences, it shows blocks at a time and then what’s changed within the block – i.e. down to individual words, not just lines. This is very handy when comparing resources in foreign languages!

Compile on save

The incremental Java compiler in Eclipse is fast. Very, very fast. And it compiles in the background now, too – but even when it didn’t, it rarely caused any bother. That’s why it’s perfectly acceptable for it to compile (by default – you can change it of course) whenever you save. C# compiles a lot faster than C/C++, but I still have to wait a little while for a build to finish, which means that I don’t do it as often as I save in Eclipse. That in turn means I see some problems later than I would otherwise.

Combined file and class browser

The package explorer in Eclipse is aware that Java files contain classes. So it makes sense to allow you to expand a file to see the types within it:

That’s it – for now…

There are plenty of other features I’d like to mention, but I’ll leave it there just for now. Expect this blog entry to grow over time…

New (to me) threading paradigms

In the last couple of days, I’ve been reading up on CSPs (Communicating Sequential Processes) and the Microsoft Research project CCR (Concurrency and Coordination Runtime). I suspect that the latter is really a new look at the former, but I don’t have enough experience with either of them to tell. Now, I know a number of my readers are smart folks who have probably lived and breathed these things for a while – so, is my hunch right, or are they fundamentally different models?

A few links for further reading:

System.Random (and java.util.Random)

This is as much a “before you forget about it” post as anything else.

Both Java and .NET have Random classes, which allow you to get random numbers within a certain range etc. Unless you specify otherwise, both are seeded with the current time. Neither class claims to be thread-safe. This presents a problem – typically you effectively just want one source of random numbers, but you want to be able to access it from multiple threads. You certainly don’t want to create a new instance of the Random class every time you want a random number – due to the granularity of the system clock, that commonly gives repeated sequences of random numbers (i.e. multiple instances are created with the same seed, so each instance gives the same sequence).

I suggest that very few people really need to specify seeds – which means they don’t really need instance methods in the first place. A class with static methods (matching the interface of Random) would be perfectly adequate. This could be implemented using a single Random instance and locking to get thread safety, but a slightly more elegant (or at least more interesting) way would be to use thread-local static variables. In other words, each thread gets its own instance of Random. Now, that introduces the problem of which seed to use for each of these instances. That’s pretty easily solved though – either you take some combination of the thread ID and the current time, or you create a new instance of Random the first time any thread accesses the class, and use that Random as the source of seeds for future Randoms. The only time locking is needed is to access that Random, which occurs once per thread.

This does, of course, break my “keep it as simple as possible” rule – the simplest solution would certainly be to lock each time. In this case though, as this will end up as a library class which may well be used by many threads simultaneously in situations where a lot of random numbers are being generated, I think it’s worth the extra effort. I’ll probably write this up as an article when I’ve written the tests and implementation…

Code formatter online

I’ve finally got round to doing it… the code I use for posting code for articles etc has now been transformed into a small ASP.NET app. It’s based on a VB.NET article. I converted it from VB.NET to C# using Instant C# (which worked very well – just a few gotchas, far fewer than last time I tried it) and then refactored it into a more object-oriented and cleanly layered structure. There’s now a WinForms application, and a separate class library, which the ASP.NET app uses.

If you want to use the results yourself, you’ll need the stylesheet I use. Unfortunately, between this blog, my home page, and the home for the application, I’m starting to get far too many copies floating around – I may need to rationalise at some stage, as making a change is becoming painful. Anyway, just reference the stylesheet in your page header, cut and paste whatever the app provides for you (between the textbox and the sample output) and you’re away.

The site is hosted by AspSpider.NET – free ASP.NET hosting. I only signed up with them yesterday, and everything seems to work pretty seamlessly, so it seems reasonable to acknowledge them in this post :)

The code can still be made a lot prettier, but it’s getting there. I was pleased with the ASP.NET side of things – obviously (being me) I did it all in a plain text editor, and the resulting .aspx page is 29 lines, with the code-behind weighing in at 60 lines. Nice.

Future plans for it:

  • A drop down list of languages (easy) (done, 17th Nov 2005)
  • A radio-button for the format type (css, html – easy)
  • Adding Java to the list of languages (hopefully fairly easy) (done, 17th Nov 2005)
  • Giving the code back to the Darren Neimke

CLI spec mistake with unboxing and enums

When looking over someone’s test code the other day, I happened to notice he was unboxing a boxed enum to an int. I was mildly surprised that he was able to do so – I thought you could only ever unbox to the exact value type that was “in the box”. Naturally, I consulted the spec. The C# spec is relatively woolly on the subject, unfortunately, but the CLI spec (partition three, end of chapter 4 for those who wish to look it up) is very clear. Here’s what it states under the Exceptions section, which is the interesting part – obj is the value to unbox, valuetype is the type we’re tring to unbox to:

InvalidCastException is thrown if obj is not a boxed valuetype (or if obj is a boxed enum and valuetype is not its underlying type)

That’s from the first edition of the spec. The third edition (still under consideration for approval by ISO/IEC) has this:

System.InvalidCastException is thrown if obj is not a boxed valuetype, or if obj is a boxed enum and valuetype is not its underlying type.

Seems harmless enough, right? Well, consider a situation where you have two enums, FirstEnum and SecondEnum. Then consider this (the code is in C#, but the generated IL is a direct translation – in particular, the types used for unboxing are preserved):

// Keep declarations out of the way to focus on the conversions.
object o;
int i;
FirstEnum x;
SecondEnum y;

o = 1; // The value of o is a boxed System.Int32
i = (int) o; // Conversion 1
x = (FirstEnum) o; // Conversion 2
y = (SecondEnum) o; // Conversion 3

o = (FirstEnum)1; // The value of o is a boxed FirstEnum
i = (int) o; // Conversion 4
x = (FirstEnum) o; // Conversion 5
y = (SecondEnum) o; // Conversion 6

Now, which of those versions should succeed? Reading absolutely literally, only conversion 1 should work. The others should all throw InvalidCastException. In all but conversions 1 and 4, the first part of the “or” clause in the spec is true (i.e. we’re trying to unbox to a different type) and in conversion 5, the second part of the “or” clause is true (the value is a boxed enum, and the type we’re trying to unbox it to isn’t the underlying type – it’s the real type!).

I know that’s reading the spec very literally, rather than taking the obviously intended meaning – but specifications should be precise documents. In fact, I’m not sure the intended meaning is so obvious anyway. Clearly conversions 1 and 5 should work (an “exact match” should always be valid) but what about 4? Should I be able to unbox an enum to an int? The way the spec is worded suggests that probably I should. What about the other way round (conversions 2 and 3), unboxing an int to an arbitrary enum which has int as its underlying type? Not so sure about that. As for conversion 6, unboxing one enum to a completely different one (which happens to share the same underlying type) – I think I’d actually rather that failed.

So, what happens? Under the current .NET implementations, all the conversions succeed. That pretty much means that’s the way the behaviour is going to have to stay, meaning I won’t be able to get my wish about conversion 6. Still, never mind – what’s important is that the spec matches reality. I believe it’s very hard to word this in the “single sentence” style they’ve tried for. I think I’d say:

System.InvalidCastException is thrown if none of the following valid conversions are applicable:

  • obj is a boxed valuetype (covers 1 and 5)
  • valuetype is an enum, and obj is a boxed value of the underlying type of valuetype (covers 2 and 3)
  • obj is a boxed enum with underlying type valuetype (covers 4)
  • valuetype is an enum, and obj is a boxed value of an enum with the same underlying type as valuetype (covers 5
    and 6)

Book idea

Having just glanced at the clock, now is the ideal time to post about an idea I had a little while ago – a book (or blog, or something) about C# (or maybe C# and Java) which I’d only write between midnight and one in the morning.

It would contain only those things which seemed like really good ideas at the time – but which might seem insane at other times. Most of these ideas are probably useless, but may contain a germ of interest. While I don’t always have those ideas between midnight and one, that’s the time of night when they seem most potent, and when I’d be most likely to be ready to write enthusiastically about them. The coding equivalent of “beer goggles” if you will.

A couple ideas I’ve had which would probably qualify:

Extension interfaces

If C# 3.0 is going to allow us to pretend to add methods to classes, which shouldn’t it allow us to pretend that classes implement interfaces they don’t? My original reason for wanting this is to get rid of some of the ugliness in the suggesting new XML APIs: there’s a method which takes an array of objects, even though only a handful of types are catered for. Unfortunately, those types don’t have an interface in common, so all the checking has to be done at runtime. If you could pretend that they all implement the same interface, just for the purposes of the API, you could declare the method as taking an array of the interface type. Of course, this is much less straightforward than converting what looks like an instance method call into a static method call…

Conditional returns

This came up when implementing Equals for several types in quick succession. All of them followed a very similar pattern, and there were similar things needed at the start of each implementation – simple checks for nullity, reference identity etc. It would be interesting to have a sort of “nullable return” for methods which had a non-nullable value type return type – I could write return? expression; where the expression was a nullable form of the return type, and it would only return if the expression was non-null. There are bits of this which appeal, and bits which seem horrible – but the main problem I have with it is that I suspect would rarely use it outside Equals implementations. (If this isn’t a clear enough description, I’m happy to write an example – just not right now.)

Corner cases in Java and C#

Every language has a few interesting corner cases – bits of surprising behaviour
which can catch you out if you’re unlucky. I’m not talking about the kind of thing
that all developers should really be aware of – the inefficiencies of repeatedly
concatenating strings, etc. I’m talking about things which you would never suspect
until you bump into them. Both C#/.NET and Java have some oddities in this respect,
and as most are understandable even to a developer who is used to the other, I thought
I’d lump them together.

Interned boxing – Java 1.5

Java 1.5 introduced autoboxing of primitive types – something .NET has had
from the start. In Java, however, there’s a slight difference – the boxed
types have been available for a long time, and are proper named reference
types just as you’d write elsewhere. In this example, we’ll look at
int boxing to java.lang.Integer. What would you
expect the results of the following operation to be?

Object x = 5;
Object y = 5;
boolean equality = (x==y);

Personally, I’d expect the answer to be false. We’re testing for reference equality
here, after all – and when you box two values, they’ll end up in different boxes,
even if the values are the same, right? Wrong. Java 1.5 (or rather, Sun’s current
implementation of Java 1.5) has a sort of cache of interned values between -128 and
127 inclusive. The language specification explicitly states that programmers shouldn’t
rely on two boxed values of the same original value being different (or being the
same, of course). Goodness only knows whether or not this actually yields performance
improvements in real life, but it can certainly cause confusion. I only ran into it
when I had a unit test which incorrectly asserted reference equality rather than
value equality between two boxed values. The tests worked for ages, until I added
something which took the value I needed to test against above 127.

Lazy initialisation and the static constructor – C#

One of the things which is sometimes important about the pattern I usually use when
implementing a singleton
is that it’s only initialised when it’s first used – or is it? After a newsgroup
question asked why the supposedly lazy pattern wasn’t working, I investigated a little,
finding out that there’s a big difference between using an initialiser directly on
the static field declaration, and creating a static constructor which assigns the value.
Full details on my beforefieldinit

The old new object – .NET

I always believed that using new with a reference type would give me
a reference to a brand new object. Not quite so – the overload for the String
constructor which takes a char[] as its single parameter will return
String.Empty if you pass it an empty array. Strange but true.

When is == not reflexive? – .NET

Floating point numbers have been
the cause of many headaches over the years. It’s relatively well known that “not a number” is not equal
to itself (i.e. if x=double.NaN, then x==x is false).

It’s slightly more surprising when two values which look like they really, really should be equal just
aren’t. Here are a couple of sample programs:

using System;
public class Oddity1
    public static void Main()
        double two = double.Parse("2");
        double a = double.Epsilon/two;
        double b = 0;
        Console.WriteLine(Math.Abs(b-a) < double.Epsilon);

On my computer, the above (compiled and run from the command line) prints out True twice.
If you comment out the last line, however, it prints False – but only under .NET 1.1.
Here’s another:

using System;

class Oddity2
    static float member;

    static void Main()
        member = Calc();
        float local = Calc();
        member = local;

    static float Calc()
        float f1 = 2.82323f;
        float f2 = 2.3f;
        return f1*f2;

This time it prints out True until you comment out the last
line, which changes the result to False. This occurs on both .NET 1.1 and 2.0.

The reason for these problems is really the same – it’s a case of when the JIT decides to
truncate the result down to the right number of bits. Most CPUs work on 80-bit floating point
values natively, and provide ways of converting to and from 32 and 64 bit values. Now, if you
compare a value which has been calculated in 80 bits without truncation with a value which has
been calculated in 80 bits, truncated to 32 or 64, and then expanded to 80 again, you can run
into problems. The act of commenting or uncommenting the extra lines in the above changes what
the JIT is allowed to do at what point, hence the change in behaviour. Hopefully this will
persuade you that comparing floating point values directly isn’t a good idea, even in cases
which look safe.

That’s all I can think of for the moment, but I’ll blog some more examples as and when I see/remember
them. If you enjoy this kind of thing, you’d probably like
Java Puzzlers
– whether or not you use Java itself. (A lot of the puzzles there map directly to C#, and even those which
don’t are worth looking at just for getting into the mindset which spots that kind of thing.)

A short case study in LINQ efficiency

I’ve been thinking a bit about how I’d use LINQ in real life (leaving DLinq and XLinq alone for the moment). One of the examples I came up with is a fairly common one – trying to find the element in a collection which has the maximum value for a certain property. Note that quite often I don’t just need to know the maximum value of the property itself – I need to know which element had that value. Now, it’s not at all hard to implement that in “normal” code, but using LINQ could potentially make the intention clearer. So, I tried to work out what the appropriate LINQ expression should look like.

I’ve come up with three ways of expressing what I’m interested in in LINQ. For these examples, I’ve created a type NameAndSize which has (unsurprisingly) properties Name (a string) and Size (an int). For testing purposes, I’ve created a list of these items, with a variable list storing a reference to the list. All samples assume that the list is non-empty.

Sort, and take first element

(from item in list
orderby item.Size descending
select item).First();

This orders by size descending, and takes the first element. The obvious disadvantage of this is that before finding the first element (which is the only one we care about) we have to sort all the elements – nasty! Assuming a reasonable sort, this operation is likely to be O(n log (n))

Subselect in where clause

(from item in list
where item.Size==list.Max(x=>x.Size)
select item).First();

This goes through the list, finding every element whose size is equal to the maximum one, and then takes the first of those elements. Unfortunately, the comparison calculates the maximum size on every iteration This makes it an O(n^2) operation.

Two selects

int maxSize = list.Max(x=>x.Size);
NameAndSize max = (from item in list
where item.Size==maxSize
select item).First();

This is similar to the previous version, but solves the problem of the repeated calculation of the maximum size by doing it before anything else. This makes the whole operation O(n), but it’s still somewhat dissatisfying, as we’re having to iterate through the list twice.

The non-LINQ way

NameAndSize max = list[0];
foreach (NameAndSize item in list)
if (item.Size > max.Size)
max = item;

This keeps a reference to the “maximum element so far”. It only iterates through the list once, and is still O(n).


Now, I’ve written a little benchmark which runs all of these except the “embedded select” version which was just too slow to run sensibly by the time I’d made the list large enough to get sensible results for the other versions. Here are the results using a list of a million elements, averaged over five runs:
Sorting: 437ms
Two queries: 109ms
Non-LINQ: 38ms

After tripling the size of the list, the results were:
Sorting: 1437ms
Two queries: 324ms
Non-LINQ: 117ms

These results show the complexities being roughly as predicted above, and in particular show that it’s definitely cheaper to only iterate through the collection once than to iterate through it twice.

Now, this query is a fairly simple one, conceptually – it would be a shame if LINQ couldn’t cope with it efficiently. I suspect it could be solved by giving the Max operator another parameter, which specified what should be selected, as well as what should be used for comparisons. Then I could just use list.Max(item => item.Size, item=>item). At that stage, the only loss in efficiency would be through invoking the delegates, which is a second-order problem (and one which is inherent in LINQ). Fortunately, the way LINQ works makes this really easy to try out – just write an extension class:

static class Extensions
    public static V Max<T,V>(this IEnumerable<T> source, 
                             Func<T,int> comparisonMap, 
                             Func<T,V> selectMap)
        int maxValue=0;
        V maxElement=default(V);
        bool gotAny = false;
        using (IEnumerator<T> enumerator = source.GetEnumerator())
            while (enumerator.MoveNext())
                T sourceValue = enumerator.Current;
                int value = comparisonMap(sourceValue);
                if (!gotAny || value > maxValue)
                    maxValue = value;
                    maxElement = selectMap(sourceValue);
                    gotAny = true;
        if (!gotAny)
            throw new EmptySequenceException();
        return maxElement;

This gave results of 57ms and 169ms for the two list sizes used earlier – not quite as fast as the non-LINQ way, but much better than any of the others – and by far the simplest to express, too.

Lessons learned

  • You really need to think about the complexity of LINQ queries, and know where they will be executed (I suspect that DLinq would have coped with the “subselect” version admirably).
  • There’s still some more work to be done on the standard query operators to find efficient solutions to common use cases.
  • Even if the standard query operators don’t quite do what you want, it can be worthwhile to implement your own – and it’s not that hard to do so!

Nasty generics restrictions

So, I caved and finally downloaded the LINQ preview. Obviously it’s fairly heavily genericised (if that’s even a word) and I decided to push it a little. Nothing particularly heavy – just an interesting bit of functional programming.

It’s easy to do a query which returns a string property from an object:

var items = new[] 
    new {Data = "First"},
    new {Data = "Second"},
    new {Data = "Third"}

IEnumerable<string> query = from item in items
                            select item.Data;

foreach (string s in query)
    Console.WriteLine (s);

The above prints out:


It’s not particularly hard to return a string from an object, having performed an operation on that string, where the operation is also specified by
the object:

var items = new[] 
    new {Data = "First", Operation = (Func<string,string>) (s => s+s)},
    new {Data = "Second", Operation = (Func<string,string>) (s => s.Substring(1))},
    new {Data = "Third", Operation = (Func<string,string>) (s => s)}

IEnumerable<string> query = from item in items
                            select item.Operation(item.Data);

foreach (string s in query)
    Console.WriteLine (s);

The first operation is plain concatenation; the second takes the first letter off the string, and the third is the identity operator. The above prints out:


So, the next thing I wanted to try was performing the operation twice. using the output of the first as the input of the second. I could have just used item.Operation(item.Operation(item.Data)) but where would the fun be in that? Instead, I wanted to have an operator whose parameters were a transformation from one type to the same type and an initial value, returning the same type. I’d hoped it would be as simple as Func<Func<T,T>,T,T> doItTwice = (op, input) => op(op(input));. After all, the implementation we’ve given doesn’t care in the slightest what the type involved is. Unfortunately, .NET generics don’t allow that kind of thing as far as I can tell – because it doesn’t know whether T is going to be a reference type or a value type, it can’t create the appropriate concrete implementation. Changing the declaration to use string instead of T everywhere works fine, but it’s much less elegant.

The interesting thing is that (lambda functions aside) I believe that would be possible in Java. Java generics are much weaker in terms of implementation, but allow some things to be expressed which you just can’t do in .NET. (The reverse is true too, of course, partly because .NET generics can involve value types and Java generics can’t. So, to put on my best whiny voice – why can’t we have the best of both worlds? I understand that it probably makes things a bit harder in terms of implementation, but come on, that ‘s the kind of thing which only has to be worked out once, by one team – whereas hundreds of thousands of developers are going to be actually using it.

The good news is that playing with lambda functions is still fun, just like I expected it to be.

Improvement to extension method syntax

I’m blogging this before my sleep-deprived mind (3 hours in the last 33, and I’m getting up for the day in an hour or so) loses it.

One of the things I don’t like about the proposed extension methods is the way the compiler is made aware of them – on a namespace basis. “Using” directives are very common to add for any namespace used in a class, and quite often I wouldn’t want the extension methods within that namespace to be applied. I propose using using static System.Query.Sequence; instead, in a way that is analogous to the static imports of Java 1.5 (except without importing single members or requiring the “.*” part. This would make it clearer what you were actually trying to do.