All posts by jonskeet

Bringing Subversion and FitNesse together

I’ve recently started working with Subversion (a version control system) and FitNesse (the Fit acceptance testing framework based in a wiki). FitNesse has a primitive version control system built into it, where it builds zip files of previous versions of pages. It’s all a bit messy though, and it’s not likely to be the version control system used by the rest of your source code. Why wouldn’t you want your acceptance tests in the same repository you use for the rest of your source and tests?

So, arming myself with JavaSVN (a pure Java Subversion client library) I went looking at the FitNesse source code. I’m sorry to say it’s not everything I’d hoped for – lots of methods declared to just throw Exception, using streams with no try/finally blocks and (I suspect) a rather gaping potential for things to go seriously wrong if someone commits a page at the same time as someone else deletes it. However, life goes on – fortunately I was able to find the entry point I needed fairly quickly.

In this case, it was fitnesse.wiki.FileSystemPage, which dealt with both the writing of the “plain” contents/metadata files and the versioning. It was only a matter of a few hours to refactor that to allow the versioning piece to be pluggable. Adding Subversion support took another few hours, and the result works reasonably well. A few things to note:

  • I could possibly have used an existing plugin point instead of creating a versioning system off FileSystemPage.
    I didn’t know that at the time, and I’m not sure how much it would have helped me. I’m not sure whether JavaSVN would have
    let me get away with making changes to the repository without having a working copy at all, but if so that would have been
    quite a nice solution. There’s no real need for a directory hierarchy – just a file per page, and Subversion properties to
    store the FitNesse metadata. With the sort of load I’m expecting the server at work to have, performance wouldn’t have been
    an issue, and it would quite possibly have simplified things a bit. On the other hand, what I’ve got works and was probably
    a bit simpler to implement. The downside is that it means changing FitNesse itself :(
  • I’ve currently implemented the new interface in the fitnesse.wiki package, and I build it within the same
    Eclipse project as the rest of the FitNesse code. It should really be in its own separate jar file, but that seemed overkill for what I was doing at the moment (especially as it’s only one source file).
  • I’ve only done manual testing on this. I don’t know enough about either FitNesse or JavaSVN to sanely test what I’ve done.
    I’m sure it’s possible, and I hope that if others find this hack useful, they could help me to test it properly. I’m somewhat ashamed of this situation, given my firm belief in TDD – it’s due to a lack of understanding of where to go, not a belief that I’ll have magically got the code right. On the plus side, all the built-in FitNesse tests still pass, so I’m reasonably confident that if you run it without actually using the Subversion code, it’ll still work.
  • I’m really worried about threading. It’s unlikely to be a problem unless you happen to get two users doing things to the same pages at the same time, but the level of locking present in FileSystemPage doesn’t really cut it. That level is too low to be particularly useful, as one responder may need to change several pages on disk, and that should be done reasonably atomically. (I don’t even try to do it in an atomic way in terms of Subversion, but stopping other disk activity from interfering would be helpful.) Of course, you should never end up with a hosed Subversion repository (I really hope the server just wouldn’t let you do that) but it may be possible to get into a situation where you need to either do some delicate work updating the working copy manually, or just check the whole tree out again.
  • Currently, files (the ones under the files directory) aren’t versioned. I’m not sure how easy that will be to fix, but it’s
    obviously something which is needed before it’s really production-ready. Hooks are needed for upload, delete and rename. Creating a directory probably doesn’t need to be versioned, so long as the directory is put under version control when the first file is created.
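
To make the "pluggable versioning" and coarse-locking points above a little more concrete, here is a rough sketch of the shape such a refactoring might take. FitNesse itself is Java; the sketch is in C# only to match the other listings on this page. Every name in it (IPageVersioner, RecordingVersioner, PageStore) is hypothetical and is not taken from FitNesse or JavaSVN, and the "disk" is just an in-memory dictionary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical plug-in point: the page store reports each change to it.
public interface IPageVersioner
{
    void PageSaved(string pagePath);
    void PageDeleted(string pagePath);
}

// Stand-in for a Subversion-backed implementation. It only records what it
// would have done, so the sketch runs without a repository.
public sealed class RecordingVersioner : IPageVersioner
{
    public readonly List<string> Log = new List<string>();

    public void PageSaved(string pagePath) { Log.Add("commit " + pagePath); }
    public void PageDeleted(string pagePath) { Log.Add("delete " + pagePath); }
}

public sealed class PageStore
{
    // One coarse lock for the whole store, so that a change touching several
    // pages is not interleaved with other activity on the same store.
    private readonly object storeLock = new object();
    private readonly Dictionary<string, string> pages = new Dictionary<string, string>();
    private readonly IPageVersioner versioner;

    public PageStore(IPageVersioner versioner)
    {
        this.versioner = versioner;
    }

    public void SaveAll(IDictionary<string, string> changes)
    {
        lock (storeLock)
        {
            foreach (KeyValuePair<string, string> change in changes)
            {
                pages[change.Key] = change.Value; // in-memory stand-in for the disk write
                versioner.PageSaved(change.Key);
            }
        }
    }

    public void Delete(string pagePath)
    {
        lock (storeLock)
        {
            // Only report deletions of pages that actually existed.
            if (pages.Remove(pagePath))
            {
                versioner.PageDeleted(pagePath);
            }
        }
    }
}
```

The lock here only keeps in-process activity apart; it does nothing to make the Subversion side atomic, which matches the limitation described above.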

The whole change (a total of seven files – it’s a relatively small code change, all things considered) is available along with installation instructions on my main web site. It’s pretty basic at the moment, but if it all takes off, who knows what could happen?

List.ForEach vs foreach(…)

A thread came up yesterday on the C# newsgroup about when to use the “normal” foreach
and when to use List<T>.ForEach (assuming, of course, that one is dealing with a
List<T> in the first place). Obviously there are readability issues, but we ended up focusing
on performance. (Isn’t that telling in its own right? How often is the iteration part rather than
the body going to dominate and be a significant bottleneck? Anyway, I digress.)

So, I wrote a small benchmark, and Patrick asked me to blog about it. I’ve refactored the test I posted on the newsgroup and added a couple more tests as suggested by Willy Denoyette. The source code is a little bit unwieldy (and frankly tedious) to include in this blog post – download it if you’re interested.

The test basically creates a list of strings, each being “x”. Each test case iterates through the
list a fixed number of times, keeping a running total of the lengths of strings it sees. The result
is checked and the time taken is reported. This is what the individual tests do:

  • LanguageForEach just uses foreach (string x in list) in the obvious way.
  • NewDelegateEachTime uses an anonymous method as the parameter to List<T>.ForEach, where that method captures a different variable each “outer” iteration. That means a new delegate has to be created each time.
  • CachedDelegate creates a single delegate and uses that for all calls to List<T>.ForEach.
  • LanguageForEachWithCopy1 copies the list to an array each “outer” iteration, and then uses foreach over that array.
  • LanguageForEachWithCopy2 copies the list to an array once at the start of the test, and then uses foreach over that array.
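
For readers who don't want to download the full source, the five strategies listed above might look roughly like this. This is a reconstruction from the descriptions only, not the original benchmark code: the method bodies, the Run and Time helpers and the use of Stopwatch are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class ForEachBenchmark
{
    public static long LanguageForEach(List<string> list, int iterations)
    {
        long total = 0;
        for (int i = 0; i < iterations; i++)
        {
            foreach (string x in list) { total += x.Length; }
        }
        return total;
    }

    public static long NewDelegateEachTime(List<string> list, int iterations)
    {
        long total = 0;
        for (int i = 0; i < iterations; i++)
        {
            // subtotal is a fresh captured variable on every outer iteration,
            // so a new delegate has to be created each time round.
            long subtotal = 0;
            list.ForEach(delegate(string x) { subtotal += x.Length; });
            total += subtotal;
        }
        return total;
    }

    public static long CachedDelegate(List<string> list, int iterations)
    {
        long total = 0;
        // Created once, reused for every call to ForEach.
        Action<string> action = delegate(string x) { total += x.Length; };
        for (int i = 0; i < iterations; i++) { list.ForEach(action); }
        return total;
    }

    public static long LanguageForEachWithCopy1(List<string> list, int iterations)
    {
        long total = 0;
        for (int i = 0; i < iterations; i++)
        {
            string[] copy = list.ToArray(); // copied on every outer iteration
            foreach (string x in copy) { total += x.Length; }
        }
        return total;
    }

    public static long LanguageForEachWithCopy2(List<string> list, int iterations)
    {
        long total = 0;
        string[] copy = list.ToArray(); // copied once, up front
        for (int i = 0; i < iterations; i++)
        {
            foreach (string x in copy) { total += x.Length; }
        }
        return total;
    }

    // Builds the list of "x" strings and times each strategy in turn.
    public static void Run(int size, int iterations)
    {
        List<string> list = new List<string>(size);
        for (int i = 0; i < size; i++) { list.Add("x"); }
        Time("LanguageForEach", LanguageForEach, list, iterations);
        Time("NewDelegateEachTime", NewDelegateEachTime, list, iterations);
        Time("CachedDelegate", CachedDelegate, list, iterations);
        Time("LanguageForEachWithCopy1", LanguageForEachWithCopy1, list, iterations);
        Time("LanguageForEachWithCopy2", LanguageForEachWithCopy2, list, iterations);
    }

    static void Time(string name, Func<List<string>, int, long> test,
                     List<string> list, int iterations)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        long total = test(list, iterations);
        stopwatch.Stop();
        // Every string has length 1, so the total is known in advance.
        if (total != (long)list.Count * iterations)
        {
            throw new InvalidOperationException(name + " gave the wrong total");
        }
        Console.WriteLine("Test {0}: {1}", stopwatch.Elapsed, name);
    }
}
```

Calling ForEachBenchmark.Run(100000, 10000) would correspond to one of the parameter sets below; the timings will of course depend on the machine and runtime.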

Here are the results, with a few different test cases (all doing the same amount of work overall). I shall attempt to tabulate them a bit better when I get some time :)

Test parameters: Size=10000000; Iterations=100
Test 00:00:11.8251914: LanguageForEach
Test 00:00:05.3463387: NewDelegateEachTime
Test 00:00:05.3238162: CachedDelegate
Test 00:00:22.1342570: LanguageForEachWithCopy1
Test 00:00:03.7493164: LanguageForEachWithCopy2

Test parameters: Size=1000000; Iterations=1000
Test 00:00:11.8163135: LanguageForEach
Test 00:00:05.3392333: NewDelegateEachTime
Test 00:00:05.3334596: CachedDelegate
Test 00:00:26.9471681: LanguageForEachWithCopy1
Test 00:00:03.5251209: LanguageForEachWithCopy2

Test parameters: Size=100000; Iterations=10000
Test 00:00:11.6576344: LanguageForEach
Test 00:00:05.2225531: NewDelegateEachTime
Test 00:00:05.2066938: CachedDelegate
Test 00:00:16.2563401: LanguageForEachWithCopy1
Test 00:00:03.0949064: LanguageForEachWithCopy2

Test parameters: Size=100; Iterations=10000000
Test 00:00:12.2547105: LanguageForEach
Test 00:00:04.9791093: NewDelegateEachTime
Test 00:00:04.6191521: CachedDelegate
Test 00:00:06.0731525: LanguageForEachWithCopy1
Test 00:00:02.8182444: LanguageForEachWithCopy2

The LanguageForEachWithCopy1 results surprised me, as I’d really expected the
time taken to go up as the number of iterations went up. It seems it’s cheaper to copy
a short list many times than a long list a few times…

Singletons and inheritance

For a long time, when people have asked about having inheritance and singletons, I’ve stated flatly that a singleton can’t be derived from. It stops being a singleton at that point. Even if the class is internal and you can prove that no other classes in the assembly do derive from the singleton and break the pattern’s effect, it’s still not a genuine singleton.

It was only when I was thinking about one of the comments about my enhanced enums proposal that I realised there’s an alternative approach. You can derive from a singleton as nested types of the Singleton itself and still keep the private constructor. That means that the singleton nature is still contained within the body of the text which declares the singleton class, even though that text actually declares more classes. Indeed, the singleton itself can even be abstract. Here’s an example:

using System;

public abstract class DataProvider
{
    static DataProvider instance = CreateDataProvider();
    
    public static DataProvider Instance
    {
        get { return instance; }
    }
    
    // Note that nested classes can call private members
    private DataProvider() {}

    public abstract void Connect();
    
    static DataProvider CreateDataProvider()
    {
        // Use Oracle on a Sunday, SQL Server otherwise
        if (DateTime.Now.DayOfWeek==DayOfWeek.Sunday)
        {
            return new OracleProvider();
        }
        else
        {
            return new SqlServerProvider();
        }
    }
    
    // Note that there’s no need to make the constructors
    // for the nested types non-public, as the classes
    // themselves are private to DataProvider.
    
    class OracleProvider : DataProvider
    {
        public override void Connect()
        {
            Console.WriteLine ("Connecting to Oracle");
        }
    }

    class SqlServerProvider : DataProvider
    {
        public override void Connect()
        {
            Console.WriteLine ("Connecting to SQL Server");
        }
    }
}

I’m not suggesting you should actually use a singleton for a data provider like this –
it just seemed like a simple example to demonstrate the point.

It is easy to validate that the singleton only allows one instance of DataProvider
to ever be constructed. (This version isn’t fully lazy, but that could be added if desired.)
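
As a sketch of how the laziness might be added, one option is to move the instance into a nested holder type so that nothing is constructed until Instance is first read. The names LazyDataProvider, Holder and InstancesCreated are made up for this illustration (the counter exists only so that the "one instance" claim can be checked), and Connect returns a string here rather than writing to the console.

```csharp
using System;

public abstract class LazyDataProvider
{
    // Counts constructor calls so the "only one instance" claim can be checked.
    public static int InstancesCreated { get; private set; }

    public static LazyDataProvider Instance
    {
        get { return Holder.Value; }
    }

    // Private, so only the nested types below can derive from this class.
    private LazyDataProvider()
    {
        InstancesCreated++;
    }

    public abstract string Connect();

    static LazyDataProvider CreateDataProvider()
    {
        // Use Oracle on a Sunday, SQL Server otherwise
        if (DateTime.Now.DayOfWeek == DayOfWeek.Sunday)
        {
            return new OracleProvider();
        }
        return new SqlServerProvider();
    }

    // The instance is only created when Holder is first used, which is
    // the first time Instance is read.
    private static class Holder
    {
        internal static readonly LazyDataProvider Value = CreateDataProvider();

        // The explicit static constructor stops the compiler marking the
        // type beforefieldinit, keeping initialization as late as possible.
        static Holder() {}
    }

    private sealed class OracleProvider : LazyDataProvider
    {
        public override string Connect() { return "Connecting to Oracle"; }
    }

    private sealed class SqlServerProvider : LazyDataProvider
    {
        public override string Connect() { return "Connecting to SQL Server"; }
    }
}
```

Trying to declare a class deriving from LazyDataProvider anywhere outside its body fails to compile, because the only constructor is private.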

It looks like I’ll have to revise my statement about inheritance from now on…

ICloneable? Not quite…

I don’t usually post personal news on my blog, but this is fairly major. Some of you may know that my wife, Holly, is pregnant. What you won’t know – and what we didn’t know until today – was that we’re having twins. Eek! A lovely surprise, if somewhat scary. For those of you who like piccies, the scans are on my web site…

Now if you’ll excuse me, I think I’ll go and lie down. And to think I was going to write up C# 2.0 features tonight…

Nice doc comment idea

I’ve just been reading the transcript of a whiteboard session with Anders Hejlsberg and one of the questions is really, really good:

Question: My problem is I’ve got these XML doc comments
that are duplicated. I just strip off one. I guess it would be a neat
language feature to be able to somehow indicate this is my primary- my
big method, right? With all the parameters. Then the other ones are
just going to borrow that XML doc comment.

Hejlsberg: Yes, okay. Now that I think is- that’s not a bad idea.
That yes, they should be able to share the documentation. I can sympathize
with that.

That’s not just “not a bad idea”. That’s a fantastic idea. When I was
writing the BitConverter and BinaryReader etc equivalents in my miscellaneous utility library,
the doc comments for the overloads took significantly longer to write than the actual code.
(Most of the code was just each overload calling a “master” routine.) Now, sometimes
that comment won’t be exactly the same for each overload; sometimes there’ll effectively
be placeholders: “Converts the specified ${type} value into ${n} bytes” or whatever. I don’t
know exactly how this could be done elegantly (and I’m not actually suggesting the ${token} syntax!), but it’s something that should be strongly
considered for a later version of C#. It could make life a lot simpler in some cases.
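
To show the kind of duplication being described, here is a small made-up example. LittleEndianBytes is a hypothetical class written for this illustration, not the code from the utility library. Three overloads delegate to one "master" routine, and the three comments differ only in the type description and the byte count.

```csharp
using System;

public static class LittleEndianBytes
{
    /// <summary>
    /// Converts the specified 16-bit signed integer value into 2 bytes.
    /// </summary>
    public static byte[] GetBytes(short value) { return GetBytes(value, 2); }

    /// <summary>
    /// Converts the specified 32-bit signed integer value into 4 bytes.
    /// </summary>
    public static byte[] GetBytes(int value) { return GetBytes(value, 4); }

    /// <summary>
    /// Converts the specified 64-bit signed integer value into 8 bytes.
    /// </summary>
    public static byte[] GetBytes(long value) { return GetBytes(value, 8); }

    // The "master" routine that all the public overloads call.
    // Writes the least significant byte first.
    private static byte[] GetBytes(long value, int byteCount)
    {
        byte[] buffer = new byte[byteCount];
        for (int i = 0; i < byteCount; i++)
        {
            buffer[i] = unchecked((byte)(value >> (8 * i)));
        }
        return buffer;
    }
}
```

The code for each overload is a single line; the comment above it is three, and all of it has to be kept in step by hand.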

Enhanced enums in C#

This will be an evolving post, hopefully. (If no-one comments on it, it probably
won’t change unless I come up with better ideas myself.) Since working on a Java
project last year, I’ve been increasingly fed up with C#’s enums. They’re really
not very object oriented: they’re not type-safe (you can cast from one enum to
another via a cast to their common underlying type), they don’t allow any
behaviour to be specified, etc. They’re just named constant integral values.
Until I played with Java 1.5’s enum support, that wouldn’t have struck me as being
a problem, but (at least in some cases) enums can give you so much more. This post
is a feature proposal for C# 4.0. (I suspect the lid for this kind of thing is closed
on 3.0.)
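
A quick illustration of the type-safety complaint, using two made-up enums (Coin and Planet are hypothetical names for this sketch). The cast through int compiles without complaint and produces a Planet "value" that no Planet member actually has.

```csharp
using System;

public enum Coin { Cent = 1, Nickel = 5 }
public enum Planet { Mercury = 1, Venus = 2 }

public static class EnumCastDemo
{
    // Converts via the common underlying type. Nothing checks that the
    // result is one of the declared Planet values.
    public static Planet Convert(Coin coin)
    {
        return (Planet)(int)coin;
    }

    public static void Run()
    {
        Planet planet = Convert(Coin.Nickel);
        Console.WriteLine(planet);                                 // prints 5
        Console.WriteLine(Enum.IsDefined(typeof(Planet), planet)); // prints False
    }
}
```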

What’s the basic idea?

Instead of being a fixed set of integral values, imagine if enums were a fixed set of
objects. They could have behaviour (and state of sorts), just like other objects – the
only difference would be that there’d only ever be one instance for each value. When
I say they could have “state of sorts” I mean that two values of an enum could differ
in just what they represented. For instance, imagine an enumeration of coins – I’ll use
US coins for convenience for most readers. Each value in the enum would have a name
(Cent, Nickel, Dime, Quarter, Dollar) and a monetary value in cents (1, 5, 10, 25, 100
respectively). Each might have a colour property too, or the metal they’re made of. That’s
the kind of state I mean. In fact, there’s nothing in the proposal below to say that the
state within an enum has to stay the same. I’d recommend that it did stay the same,
but maybe someone’s got an interesting use case where mutable enums would be useful.
(Actually, it’s not terribly hard to think of an example where the underlying state mutates
even if it doesn’t appear to from a public property point of view – some properties could
be lazily initialised if they might take time to compute.)

As well as properties, the enum type could have methods, just as other types do. For instance,
an enumeration of available encryption types may have methods to encrypt data. (I’m giving
fairly general examples here – in my experience, the actual enums you might use tend to be
very domain-specific, and as such don’t make good examples. I’m also trying to steer well
clear of risking giving away any intellectual property owned by my employer.)

Now, consider the possibilities available when you bring polymorphism into the picture. Not
every implementation of a method has to be the same. Some enum values may be instances of
a type derived from the top-most one. This would be limited at compile-time to make enums
fixed – you couldn’t derive from an enum type in the normal way, so you’d
always know that if you had a reference to an instance of the enum type, you’d got one of
the values specified by the enum.

What’s the syntax?

I propose a syntax which for the simplest of cases looks very much like normal enumerations,
just with class enum instead of enum:

public class enum Example
{
    FirstValue,
    SecondValue;
}

Note the semi-colon at the end of the list of values. This could perhaps be optional,
but when using the power of “class enums” (as I’ll call them for now) in a non-trivial way,
you’d need it anyway to tell the compiler you’d reached the end of the list of values,
and other type members were on the way. The next step is to introduce a constructor and
a property:

public class enum Coins
{
    Cent(1),
    Nickel(5),
    Dime(10),
    Quarter(25),
    Dollar(100);
    
    // Instance variable representing the monetary value
    readonly int valueInCents;
    
    // Constructor - all enum constructors are private
    // or protected
    Coins(int valueInCents)
    {
        this.valueInCents = valueInCents;
    }
    
    public int ValueInCents
    {
        get { return valueInCents; }
    }
}

Now, there would actually be some significant compiler magic going on at this point. Each of
the generated constructor calls would actually have an extra parameter – the name of the value. That
parameter would be present in every generated constructor, and if no constructors were declared, a protected
constructor would be generated which took just that parameter. Each constructor would implicitly
call the base constructor which would stash the name away and make it available through a Name
property (and no doubt through calls to ToString() too). What’s the base class in this case?
System.ClassEnum or some such. This could be an ordinary type as far as the CLR is concerned,
although language compilers would be as well to prevent direct derivation from it. This leaves room for
some compilers to allow types which aren’t really enums to derive from it, but does have the advantage
of not requiring a CLR change. Whenever you use someone else’s code you’re always taking a certain amount on
trust anyway, so arguably it’s not an awful risk. More about the services of System.ClassEnum
later…
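
To give a feel for what that compiler magic might expand to, here is a hand-written approximation in today's C#. It is only a guess at the shape of the generated code: the ClassEnum base class below is a local stand-in for the proposed System.ClassEnum, and the extra name parameter that the proposal would hide is written out explicitly.

```csharp
using System;

// Local stand-in for the proposed System.ClassEnum base type.
public abstract class ClassEnum
{
    private readonly string name;

    protected ClassEnum(string name)
    {
        this.name = name;
    }

    public string Name
    {
        get { return name; }
    }

    public override string ToString()
    {
        return name;
    }
}

public sealed class Coins : ClassEnum
{
    // One instance per value. The name is passed as the extra first argument.
    public static readonly Coins Cent = new Coins("Cent", 1);
    public static readonly Coins Nickel = new Coins("Nickel", 5);
    public static readonly Coins Dime = new Coins("Dime", 10);
    public static readonly Coins Quarter = new Coins("Quarter", 25);
    public static readonly Coins Dollar = new Coins("Dollar", 100);

    // Instance variable representing the monetary value
    private readonly int valueInCents;

    // Private constructor, so no other instances can be created
    private Coins(string name, int valueInCents) : base(name)
    {
        this.valueInCents = valueInCents;
    }

    public int ValueInCents
    {
        get { return valueInCents; }
    }
}
```

With this in place, Coins.Dime.Name gives "Dime" and Coins.Dime.ValueInCents gives 10, with no way for outside code to create a sixth coin.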

The next piece of functionality is to have some members with overridden behaviour. The canonical example
of this is simple arithmetic operations – addition, subtraction, division and multiplication. The enumeration
contains a value for each operation, and has an Eval method to perform the operation. Here’s
what it would look like in C#:

public class enum ArithmeticOperation
{
    Addition
    {
        public override int Eval(int x, int y)
        {
            return x+y;
        }
    },
    
    Subtraction
    {
        public override int Eval(int x, int y)
        {
            return x-y;
        }
    },
    
    Multiplication
    {
        public override int Eval(int x, int y)
        {
            return x*y;
        }
    },
    
    Division
    {
        public override int Eval(int x, int y)
        {
            return x/y;
        }
    };
    
    public abstract int Eval(int x, int y);
}

Sometimes, you may wish to save a bit of space and specify the implementation of
a method as a delegate – especially if the method would otherwise be abstract (i.e.
there was no “common” implementation which most values would use). Here, C#’s
anonymous method syntax helps:

public class enum ArithmeticOperation
{    
    delegate int Int32Operation (int x, int y);
    
    Addition (delegate (int x, int y) { return x+y; }),
    Subtraction (delegate (int x, int y) { return x-y; }),
    Multiplication (delegate (int x, int y) { return x*y; }),
    Division (delegate (int x, int y) { return x/y; });
        
    Int32Operation op;
    
    ArithmeticOperation (Int32Operation op)
    {
        this.op = op;
    }
    
    public int Eval (int x, int y)
    {
        return op(x, y);
    }
}

That’s still a bit clumsy, of course – let’s try with lambda function syntax instead:

public class enum ArithmeticOperation
{    
    Addition ( (x,y) => x+y),
    Subtraction ( (x,y) => x-y),
    Multiplication ( (x,y) => x*y),
    Division ( (x,y) => x/y);
        
    Func<int, int, int> op;
    
    ArithmeticOperation (Func<int, int, int> op)
    {
        this.op = op;
    }
    
    public int Eval (int x, int y)
    {
        return op(x, y);
    }
}
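
For comparison, the nearest equivalent that compiles today is an ordinary sealed class with a private constructor and one static field per value. This is a sketch of that pattern rather than of the proposed feature, and the Name property is an addition for the illustration. Note that a delegate taking two ints and returning an int is Func<int, int, int>.

```csharp
using System;

public sealed class ArithmeticOperation
{
    public static readonly ArithmeticOperation Addition =
        new ArithmeticOperation("Addition", (x, y) => x + y);
    public static readonly ArithmeticOperation Subtraction =
        new ArithmeticOperation("Subtraction", (x, y) => x - y);
    public static readonly ArithmeticOperation Multiplication =
        new ArithmeticOperation("Multiplication", (x, y) => x * y);
    public static readonly ArithmeticOperation Division =
        new ArithmeticOperation("Division", (x, y) => x / y);

    private readonly string name;
    private readonly Func<int, int, int> op;

    // Private constructor: the four fields above are the only instances.
    private ArithmeticOperation(string name, Func<int, int, int> op)
    {
        this.name = name;
        this.op = op;
    }

    public string Name
    {
        get { return name; }
    }

    public int Eval(int x, int y)
    {
        return op(x, y);
    }
}
```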

Now we’re really cooking! Of course, some of the time you’ll be able to provide a single implementation for most values,
which only some values will want to override. Something like:

public class enum InputType
{
    Integer,
    String,
    Date
    {
        // Default implementation isn't quite good enough for us
        public override string Format(object o)
        {
            return ((DateTime)o).ToString("yyyyMMdd");
        }
    };
    
    // Default implementation of formatting
    public virtual string Format(object o)
    {
        return o.ToString();
    }
}

So far, this is all quite similar to Java’s enums in appearance. Java’s enums also come with an ordinal
(the position of declaration within the enum) automatically, but in my experience this is as much of a
pain as it is a blessing. In particular, as that ordinal can’t be specified in the source code, if you
have other code relying on specific values (e.g. to pass data across a web-service) you have to leave
bogus values in the list in order to keep the ordinals of the later values the same. Java also only
allows you to specify the constructors in the “top-most” enum. This can occasionally be a nuisance. Let’s extend
the enum above to include DateTime and Time – both of which need the same kind of
“special” formatting. In Java, you’d have to override Format in three different places, like
this:

public class enum InputType
{
    Integer,
    String,
    DateTime
    {
        public override string Format(object o)
        {
            return ((System.DateTime)o).ToString("yyyyMMdd HH:mm:ss");
        }
    },
    Date
    {
        public override string Format(object o)
        {
            return ((System.DateTime)o).ToString("yyyyMMdd");
        }
    },
    Time
    {
        public override string Format(object o)
        {
            return ((System.DateTime)o).ToString("HH:mm:ss");
        }
    };
    
    public virtual string Format(object o)
    {
        return o.ToString();
    }
}

I would propose that enum values could reuse each other’s implementations, possibly
parameterising them via constructors. The above could be rewritten as:

public class enum InputType
{
    Integer,
    String,
    DateTime("yyyyMMdd HH:mm:ss")
    {
        string formatSpecifier;
        
        protected DateTime(string formatSpecifier)
        {
            this.formatSpecifier = formatSpecifier;
        }
        
        public override string Format(object o)
        {
            return ((System.DateTime)o).ToString(formatSpecifier);
        }
    },
    Date : DateTime("yyyyMMdd"),
    Time : DateTime("HH:mm:ss");
    
    public virtual string Format(object o)
    {
        return o.ToString();
    }
}

If Date wanted to further specialise the class (e.g. if another method needed overriding),
it could add implementation there too. Note that the DateTime constructor does not explicitly
call any constructor. In this case, an implicit call to the InputType constructor which took
only the name parameter would be made. Explicit calls to base class constructors could be included in the normal
way – the extra parameter would be entirely hidden from the source code. Only protected constructors could
be called by derived types in the normal way.
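
The same sharing can be approximated by hand today with a nested private subclass, in much the same way as the nested-singleton trick. The sketch below is one possible expansion, not a definition of what a compiler would generate: DateTimeType is a made-up name, the name parameter is written out explicitly, and the invariant culture is used so that the output doesn't depend on the machine's settings.

```csharp
using System;
using System.Globalization;

public class InputType
{
    public static readonly InputType Integer = new InputType("Integer");
    public static readonly InputType String = new InputType("String");
    // Three values sharing one implementation, parameterised by format.
    public static readonly InputType DateTime = new DateTimeType("DateTime", "yyyyMMdd HH:mm:ss");
    public static readonly InputType Date = new DateTimeType("Date", "yyyyMMdd");
    public static readonly InputType Time = new DateTimeType("Time", "HH:mm:ss");

    private readonly string name;

    // Private, so only nested types can derive from InputType.
    private InputType(string name)
    {
        this.name = name;
    }

    public string Name
    {
        get { return name; }
    }

    // Default implementation of formatting
    public virtual string Format(object o)
    {
        return o.ToString();
    }

    private sealed class DateTimeType : InputType
    {
        private readonly string formatSpecifier;

        internal DateTimeType(string name, string formatSpecifier) : base(name)
        {
            this.formatSpecifier = formatSpecifier;
        }

        public override string Format(object o)
        {
            // System.DateTime is fully qualified because DateTime on its
            // own refers to the static field declared above.
            return ((System.DateTime)o).ToString(formatSpecifier, CultureInfo.InvariantCulture);
        }
    }
}
```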

Switch

Switch statements would appear in exactly the same way they do now, possibly without the qualification (e.g.
case Time: instead of case InputType.Time:). The code is more readable without the
qualification, and is unambiguous, but it would be inconsistent with the current handling of switch cases
for “normal” enums. The implementation would work a lot like strings – either using the equivalent of
a sequence of if statements or building a map behind the scenes. This is where keeping something
like Java’s ordinals would speed things up, but then I would at least want something like an attribute to be able to
specify the value in source code to avoid the problems with Java’s ordinals described earlier. Note that only reference
equality needs to be checked in any of these cases, as only one instance of any value would be created.
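
The two implementation strategies mentioned here can be written out by hand against a hand-rolled "class enum". This is only a sketch of what a compiler might generate for such a switch; Shape and SwitchLowering are hypothetical names. The dictionary version relies on the fact that a class which doesn't override Equals is compared by reference.

```csharp
using System;
using System.Collections.Generic;

public sealed class Shape
{
    public static readonly Shape Triangle = new Shape("Triangle");
    public static readonly Shape Square = new Shape("Square");
    public static readonly Shape Pentagon = new Shape("Pentagon");

    private readonly string name;

    private Shape(string name)
    {
        this.name = name;
    }

    public override string ToString()
    {
        return name;
    }
}

public static class SwitchLowering
{
    // Strategy 1: the equivalent of a sequence of if statements,
    // using reference equality only.
    public static int SidesByIfChain(Shape shape)
    {
        if (object.ReferenceEquals(shape, Shape.Triangle)) { return 3; }
        if (object.ReferenceEquals(shape, Shape.Square)) { return 4; }
        if (object.ReferenceEquals(shape, Shape.Pentagon)) { return 5; }
        throw new ArgumentOutOfRangeException("shape");
    }

    // Strategy 2: a map built once behind the scenes. Shape does not
    // override Equals, so the keys are compared by reference.
    private static readonly Dictionary<Shape, int> sides = new Dictionary<Shape, int>
    {
        { Shape.Triangle, 3 },
        { Shape.Square, 4 },
        { Shape.Pentagon, 5 }
    };

    public static int SidesByMap(Shape shape)
    {
        int result;
        if (sides.TryGetValue(shape, out result)) { return result; }
        throw new ArgumentOutOfRangeException("shape");
    }
}
```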

Static field initializers and static constructors

Java has restrictions on where you can use static fields in enums, because the static field initializers
are executed after the enum values have been created. Static fields are useful in a surprising number
of circumstances, mostly getting at an enum value dynamically by something other than name. The rules
for this would need to be considered carefully – sometimes it’s useful to have code which will be run
after the enums have all been set up; other times you want it beforehand (so you can use the static fields
during initialization, e.g. to add a value to a map).
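
The hand-written pattern in C# shows why the ordering matters. Static field initializers in a single class declaration run in textual order, so in the sketch below the map has to be declared before the values that register themselves in it. Currency and FromCode are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public sealed class Currency
{
    // Declared first on purpose: static field initializers run in textual
    // order, so the map exists before any value's constructor uses it.
    // Moving this line below the values would leave byCode null when the
    // first constructor runs.
    private static readonly Dictionary<string, Currency> byCode =
        new Dictionary<string, Currency>();

    public static readonly Currency Usd = new Currency("USD");
    public static readonly Currency Gbp = new Currency("GBP");

    private readonly string code;

    private Currency(string code)
    {
        this.code = code;
        byCode.Add(code, this); // each value registers itself in the map
    }

    public string Code
    {
        get { return code; }
    }

    // Finds a value by something other than its name in source code.
    // Returns null if there is no such value.
    public static Currency FromCode(string code)
    {
        Currency result;
        return byCode.TryGetValue(code, out result) ? result : null;
    }
}
```

A language feature would need to either pin this order down or give the programmer a way to say which static code runs before the values are created and which runs after.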

Other features

Like Java, there could be an EnumSet<T> type which would be the equivalent of using
FlagsAttribute on normal enums. Indeed, the compiler could even generate operator overloads
to make this look nicer.

In Java, some enums with many values overriding many methods can end up being pretty large. In
C#, of course, we can use partial types to split the whole enum definition over several
files. (Some may object to this, others not – it would be available if you wanted it.)

Open questions

  • The potential abuse problem mentioned earlier
  • Serialization/deserialization would need to know to use the previously set up values
  • Should identifying values (like Java ordinals) be present, if only for switch performance?

I suspect there are other things I haven’t thought of, but with any luck this will be food for thought.

Deadlock detection – finally released

I’d hoped to be able to make this post a week ago, but adding extra unit tests, performance tests and documentation took longer than expected. (Doesn’t it always?)

I’ve now refactored the previous incarnation of SyncLock in my Miscellaneous Utility Library (snappy title, huh?) and added a new type of lock which can detect deadlocks (throwing an exception instead of entering the deadlock-prone state). There’s also a usage page explaining how to use the locks, and what the performance impact is. (Well, what it is on my box, anyway.) Any further suggestions now it’s concrete are welcome, of course.
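
As a rough idea of what "throwing an exception instead of entering the deadlock-prone state" can mean, here is a minimal sketch of one possible scheme: every lock has a rank, and a thread may only acquire locks in strictly increasing rank order. This is not the library's API. RankedLock and LockOrderViolationException are invented for the illustration, and the sketch assumes locks are released in the reverse of the order they were acquired (as nested using statements do) and are not re-entered.

```csharp
using System;
using System.Threading;

public sealed class LockOrderViolationException : Exception
{
    public LockOrderViolationException(string message) : base(message) {}
}

public sealed class RankedLock
{
    // Highest rank currently held by this thread; 0 means none.
    [ThreadStatic]
    private static int highestRankHeld;

    private readonly object monitor = new object();
    private readonly int rank;

    public RankedLock(int rank)
    {
        if (rank <= 0) { throw new ArgumentOutOfRangeException("rank"); }
        this.rank = rank;
    }

    // Acquires the lock, or throws before blocking if doing so would
    // break the ordering rule.
    public IDisposable Acquire()
    {
        if (rank <= highestRankHeld)
        {
            throw new LockOrderViolationException(
                "Cannot take lock of rank " + rank +
                " while holding lock of rank " + highestRankHeld);
        }
        Monitor.Enter(monitor);
        int previous = highestRankHeld;
        highestRankHeld = rank;
        return new Releaser(this, previous);
    }

    private sealed class Releaser : IDisposable
    {
        private readonly RankedLock owner;
        private readonly int previousRank;

        internal Releaser(RankedLock owner, int previousRank)
        {
            this.owner = owner;
            this.previousRank = previousRank;
        }

        public void Dispose()
        {
            // Restore the rank that was in force before this lock was taken.
            highestRankHeld = previousRank;
            Monitor.Exit(owner.monitor);
        }
    }
}
```

Typical use would be using (outer.Acquire()) { using (inner.Acquire()) { ... } }, where inner has a higher rank than outer. Taking them the other way round throws instead of risking a deadlock with another thread.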

Pedantry – how much is too much?

I’m a pedant, there’s no doubt about it. I’m particularly pedantic when it comes to terminology in computing discussions – at least where I see value in being precise about what is meant. So, when discussing static constructors in a mailing list thread recently, I’ve been very carefully distinguishing between a static constructor (which is a C# term) and a type initializer (which is a CLI term). This hasn’t been met terribly favourably by those who wish to use the term “static constructor” to mean both the .cctor member in a (compiled) type and the C# static constructor, despite them being slightly different in semantics and belonging to different domains. Now, I don’t wish to spill that discussion over onto my blog, but it has made me think about the general issue of pedantry when it comes to terminology.

Pedantry is rarely popular, but I believe it does bring value to a discussion, especially when some subtleties are involved. I generally assume a specification to be the authoritative source of information on terms related to the topic covered by the specification, as it’s a piece of common ground on which to base discussions. (The exception to this is if the spec is generally agreed to be incorrect in a particular regard.) If I talk about something being a variable and you understand “variable” in a completely different way to me, it’s a potential source of great confusion. I’m not pedantic to gain a feeling of superiority – I’m pedantic to try to make sure everyone’s effectively speaking the same language.

Of course, you don’t need to be absolutely precise all the time. If I were discussing an ASP.NET problem, for instance, I probably wouldn’t feel too bad about a sentence such as “x is now a string of length 5”. However, if I were discussing variables, reference types etc, I’d probably try to be more precise: “The value of x is now a reference to a string of length 5.” Writing (or reading) the second style for prolonged periods gets quite tedious, but I believe it’s important to be able to move into that mode when the need arises.

So, the question is: am I the only one who feels this way? I would expect most of the readers of this blog to be people who’ve read either my newsgroup posts, mailing list posts, or C# articles, so you probably have a fair idea of what I’m like. Do I go over the top, or do you find it useful? Is there a way of bringing precision to a discussion without irritating people (as I tend to, unfortunately)? Just to possibly remind you of things I’m often pedantic about, here’s a brief list of “pet peeves” which tend to involve people cutting fast and loose with terminology:

  • Value types “always being on the stack”
  • “Objects are passed by reference by default”
  • “C# supports two floating-point types: float and double.” (That one’s in the C# spec, unfortunately – decimal is also a floating point type.)
  • “I’m having trouble with ASCII characters above 127…” (along with its side-kick “I’m using extended ASCII”)
  • Volatility and atomicity being mixed up
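
The second peeve is the easiest to demonstrate in code. In the sketch below (ParameterPassingDemo is just a name for the illustration), the reference is passed by value unless ref is used: changing the object is visible to the caller, but changing the parameter to refer to a different object is not.

```csharp
using System;
using System.Text;

public static class ParameterPassingDemo
{
    // Changes only the local copy of the reference. The caller's
    // variable still refers to the original object.
    public static void Reassign(StringBuilder builder)
    {
        builder = new StringBuilder("changed");
    }

    // Changes the object that both the caller and the parameter refer to.
    public static void Mutate(StringBuilder builder)
    {
        builder.Append(" world");
    }

    // Genuinely passes the variable by reference, so the caller's
    // variable ends up referring to the new object.
    public static void ReassignByRef(ref StringBuilder builder)
    {
        builder = new StringBuilder("changed");
    }

    public static void Run()
    {
        StringBuilder builder = new StringBuilder("hello");

        Reassign(builder);
        Console.WriteLine(builder); // prints hello

        Mutate(builder);
        Console.WriteLine(builder); // prints hello world

        ReassignByRef(ref builder);
        Console.WriteLine(builder); // prints changed
    }
}
```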

What kind of deadlock prevention do you want?

Okay, so I’m having another look at the alternative threading ideas
which are part of my threading article. (They’re not that big an
alternative really – not compared with CSP etc – they’d just make
things more pleasant.) I want to add deadlock prevention to my locks,
making it impossible to lock things incorrectly (so long as you’re
locking the simple way – if you lock the associated monitor
independently, that’s your own lookout). Obviously this requires you to
set up what’s correct and what’s incorrect to start with. My question
to you all is: how do you want to be able to set up those
rules? What kind of rules do you need? Do they need to be extensible
somehow? Some desirable things may be impossible, but I’d like to know
what the ideal would look like before working out the realistic. I have
a couple of pretty simple ideas, but I won’t taint your own views by
mentioning them yet…

Visual Studio vs Eclipse

I often see people in newsgroups saying how wonderful Visual Studio is, and they often claim it’s the “best IDE in the world”. Strangely enough, most go silent when I ask how many other IDEs they’ve used for a significant amount of time. I’m not going to make any claims as to which IDE is “the best” – I haven’t used all the IDEs available, and I know full well that one (IDEA) is often regarded as superior to Eclipse. However, here are a few reasons I prefer Eclipse to Visual Studio (even bearing in mind VS 2005, which is a great improvement). Visual Studio has much more of a focus on designers (which I don’t tend to use, for reasons given elsewhere) and much less of a focus on making actual coding as easy as possible.

Note that this isn’t a comparison of Java and C# (although those are the languages I use in Eclipse and VS respectively). For the most part, I believe C# is an improvement on Java, and the .NET framework is an improvement on the Java standard library. It’s just a shame the tools aren’t as good. For reference, I’m comparing VS2005 and Eclipse 3.1.1. There are new features being introduced to Eclipse all the time (as I write, 3.2M4 is out, with some nice looking things) and obviously MS is working on improving VS as well. So, without further ado (and in no particular order):

Open Type/Resource

When I hit Ctrl-Shift-T in Eclipse, an “Open Type” dialog comes up. I can then type in the name of any type (whether it’s my code, 3rd party library code, or the Java standard library code) and the type is opened. If the source is available (which it generally is – I’ve used very few closed source 3rd party Java components, and the source for the Java standard library is available) the source opens up; otherwise a list of members is displayed.

In large solutions, this is an enormous productivity gain. I regularly work with solutions with thousands of classes – remembering where each one is in VS is a bit of a nightmare. Non-Java resources can also be opened in the same way in Eclipse, using Ctrl-Shift-R instead. One neat feature is that Eclipse knows the Java naming conventions, and lets you type just the initial letters instead of the type name itself. (You only ever need to type as much as you want in order to find the type you’re after anyway, of course.) So for example, if I type “NPE”, I’m offered NullPointerException and NoPermissionException.
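
The initial-letter matching is simple enough to sketch. The code below is a guess at the idea, not a description of how Eclipse implements it: it takes the capital letters of each type name and checks whether they start with what has been typed. InitialsMatcher and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class InitialsMatcher
{
    // "NullPointerException" gives "NPE".
    public static string Initials(string typeName)
    {
        StringBuilder initials = new StringBuilder();
        foreach (char c in typeName)
        {
            if (char.IsUpper(c)) { initials.Append(c); }
        }
        return initials.ToString();
    }

    // True if the capital letters of typeName start with the pattern.
    public static bool Matches(string pattern, string typeName)
    {
        return Initials(typeName).StartsWith(pattern, StringComparison.Ordinal);
    }

    // Returns the type names matching the pattern, in their original order.
    public static List<string> Filter(string pattern, IEnumerable<string> typeNames)
    {
        List<string> matches = new List<string>();
        foreach (string typeName in typeNames)
        {
            if (Matches(pattern, typeName)) { matches.Add(typeName); }
        }
        return matches;
    }
}
```

Given the names NullPointerException, NoPermissionException and ArgumentNullException, filtering on "NPE" keeps the first two and drops the third.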

Note that this isn’t the same as the “Find Symbol” search offered by VS 2005. Instead, it’s a live updating search – as you type, the list is updated. This is very handy if you can’t remember whether it’s ArgumentNullException or NullArgumentException and the like – it’s very fast to experiment with.

There’s good news here: Visual Studio users have a saviour in the form of a free add-in called DPack, by USysWare. This offers dialogs for opening types, members (like the Outline dialog, Ctrl-O, in Eclipse), and files. I’ve only just heard about it, and haven’t tried it on a large solution yet, but I have high hopes for it.

Sensible overload intellisense

(I’m using the word intellisense for what Eclipse calls Code Assist – I’m sure you know what I mean.) For some reason, although Visual Studio is perfectly capable of displaying the choice of multiple methods within a drop-down list, when it comes to overloads it prefers a spinner. Here’s what you get if you type sb.Append( into Visual Studio, where sb is a StringBuilder variable:

Here’s what happens if you do the equivalent in Eclipse:

Look ma, I can see more than one option at once!

Organise imports

For those of you who aren’t Java programmers, import statements are the equivalent to using directives in C# – they basically import a type or namespace so that it can be used without the namespace being specified. In Visual Studio, you either have to manually type the using directives in (which can be a distraction, as you have to go to the top of the file and then back to where you were) or (with 2005) you can hit Shift-Alt-F10 after typing the name of the type, and it will give you the option of adding a using directive, or filling in the namespace for you. Now, as far as I’m aware, you have to do that manually for each type.

With Eclipse, I can write a load of code which won’t currently compile, then hit Ctrl-Shift-O and the imports are added. I’m only prompted if there are multiple types available from different namespaces with the same name. Not only that, but I can get intellisense for the type name while I’m typing it even before I’ve added the import – and picking the type adds the import automatically. In addition, organise imports removes import statements which aren’t needed – so if you’ve added something but then gone back and removed it, you don’t have misleading/distracting lines at the top of your file.

A feature which isn’t relevant to C# anyway but which is quite neat is that Eclipse allows you to specify how many individual type imports you want before it imports the whole package (e.g. import java.util.*). This allows people to code in whatever style they want, and still get plenty of assistance from Eclipse.

Great JUnit integration

I confess I’ve barely tried the unit testing available in Team System, but it seems to be a bit of a pain in the neck to use. In Eclipse, having written a test class, I can launch it with a simple (okay, a slightly complicated – you learn to be a bit of a spider) key combination. Similarly I can select a package or a whole source directory and run all the unit tests within it. Oh, and it’s got a red/green bar, unlike Team System (from what I’ve seen). It may sound like a trivial thing, but having a big red/green bar in your face is a great motivator in test driven development. Numbers take more time to process – and really, the most important thing you need to know is whether all the tests have passed or not. Now, Jamie Cansdale has done a great job with TestDriven.NET, and I’m hoping that he’ll integrate it with VS2005 even better, but Eclipse is still in the lead at this point for me. Of course, it helps that it just comes with all this stuff, without extra downloads (although there are plenty of plugins available). Oh, and just in case anyone at Microsoft thinks I’ve forgotten: no, unit testing still doesn’t belong in just Team System. It should be in the Express editions, in my view…

Better refactoring

MS has made no secret of the fact that it doesn’t have many refactorings available out of the box. Apparently they’re hoping 3rd parties will add their own – and I’m sure they will, at a cost. It’s a shame that you have to buy two products in 2005 before you can get the same level of refactoring that has been available in Eclipse (and other IDEs) for years. (I know I was using Eclipse in 2001, and possibly earlier.)

Not only does Eclipse have rather more refactorings available, but they’re smarter, too. Here’s some sample code in C#:

public void DoSomething()
{
    string x = "Hello";
    byte[] b = Encoding.UTF8.GetBytes(x);
    byte[] firstHalf = new byte[b.Length / 2];
    Array.Copy(b, firstHalf, firstHalf.Length);
    Console.WriteLine(firstHalf[0]);
}

public void DoSomethingElse()
{
    string x = "Hello there";
    byte[] b = Encoding.UTF8.GetBytes(x);
    byte[] firstHalf = new byte[b.Length / 2];
    Array.Copy(b, firstHalf, firstHalf.Length);
    Console.WriteLine(firstHalf[0]);
}

If I select the middle three lines of the first method, and use the Extract Method refactoring, here’s what I get:

public void DoSomething()
{
    string x = "Hello";
    byte[] firstHalf = GetFirstHalf(x);
    Console.WriteLine(firstHalf[0]);
}

private static byte[] GetFirstHalf(string x)
{
    byte[] b = Encoding.UTF8.GetBytes(x);
    byte[] firstHalf = new byte[b.Length / 2];
    Array.Copy(b, firstHalf, firstHalf.Length);
    return firstHalf;
}

public void DoSomethingElse()
{
    string x = "Hello there";
    byte[] b = Encoding.UTF8.GetBytes(x);
    byte[] firstHalf = new byte[b.Length / 2];
    Array.Copy(b, firstHalf, firstHalf.Length);
    Console.WriteLine(firstHalf[0]);
}

Note that the second method is left entirely alone. In Eclipse, if I have some similar Java code:

public void doSomething() throws UnsupportedEncodingException
{
    String x = "hello";        
    byte[] b = x.getBytes("UTF-8");
    byte[] firstHalf = new byte[b.length/2];
    System.arraycopy(b, 0, firstHalf, 0, firstHalf.length);
    System.out.println (firstHalf[0]);
}

public void doSomethingElse() throws UnsupportedEncodingException
{
    String y = "hello there";        
    byte[] bytes = y.getBytes("UTF-8");
    byte[] firstHalfOfArray = new byte[bytes.length/2];
    System.arraycopy(bytes, 0, firstHalfOfArray, 0, firstHalfOfArray.length);
    System.out.println (firstHalfOfArray[0]);
}

and again select Extract Method, then the dialog not only gives me rather more options, but one of them is whether to replace the duplicate code snippet elsewhere (along with a preview). Here’s the result:

public void doSomething() throws UnsupportedEncodingException
{
    String x = "hello";        
    byte[] firstHalf = getFirstHalf(x);
    System.out.println (firstHalf[0]);
}

private byte[] getFirstHalf(String x) throws UnsupportedEncodingException
{
    byte[] b = x.getBytes("UTF-8");
    byte[] firstHalf = new byte[b.length/2];
    System.arraycopy(b, 0, firstHalf, 0, firstHalf.length);
    return firstHalf;
}

public void doSomethingElse() throws UnsupportedEncodingException
{
    String y = "hello there";        
    byte[] firstHalfOfArray = getFirstHalf(y);
    System.out.println (firstHalfOfArray[0]);
}

Note the change to doSomethingElse. I’d even tried to be nasty to Eclipse, making the variable names different in the second method. It still does the business.
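
How can two snippets with different variable names count as duplicates? One plausible approach, sketched below, is to replace each local variable with a numbered placeholder in order of first appearance and then compare what’s left. This is only an illustration and makes no claim about what Eclipse does internally (a real IDE would work on the parsed syntax tree rather than on text). The DuplicateSnippets class and its methods are hypothetical, and the caller is assumed to supply the local variable names for each snippet.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical text-based sketch; real refactoring tools compare syntax trees instead.
final class DuplicateSnippets {

    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Replaces each local variable name with $1, $2, ... in order of first appearance.
    // Assumption: 'locals' lists the local variable names used in the snippet, and
    // none of them also appears as a word inside a string literal.
    static String normalise(String code, Set<String> locals) {
        Map<String, String> placeholders = new HashMap<>();
        Matcher m = IDENTIFIER.matcher(code);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String word = m.group();
            String replacement = word;
            if (locals.contains(word)) {
                replacement = placeholders.get(word);
                if (replacement == null) {
                    replacement = "$" + (placeholders.size() + 1);
                    placeholders.put(word, replacement);
                }
            }
            // quoteReplacement stops "$1" being treated as a group reference.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        // Collapse whitespace so layout differences don't matter.
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    // Two snippets have the same shape if they are identical once locals are renamed.
    static boolean sameShape(String a, Set<String> localsA, String b, Set<String> localsB) {
        return normalise(a, localsA).equals(normalise(b, localsB));
    }

    public static void main(String[] args) {
        String first = "byte[] b = x.getBytes(\"UTF-8\");\n"
                + "byte[] firstHalf = new byte[b.length/2];\n"
                + "System.arraycopy(b, 0, firstHalf, 0, firstHalf.length);";
        String second = "byte[] bytes = y.getBytes(\"UTF-8\");\n"
                + "byte[] firstHalfOfArray = new byte[bytes.length/2];\n"
                + "System.arraycopy(bytes, 0, firstHalfOfArray, 0, firstHalfOfArray.length);";
        Set<String> firstLocals = new HashSet<>(Arrays.asList("x", "b", "firstHalf"));
        Set<String> secondLocals = new HashSet<>(Arrays.asList("y", "bytes", "firstHalfOfArray"));
        // prints true
        System.out.println(sameShape(first, firstLocals, second, secondLocals));
    }
}
```

The two snippets passed in are the bodies extracted from doSomething and doSomethingElse above. Once the locals are renamed they are character-for-character identical, so the second can safely be replaced with a call to the extracted method.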

Navigational Hyperlinks

If I hold down Ctrl and hover over something in Eclipse (e.g. a variable, method or type name), it becomes a hyperlink. Click on the link, and it takes you to the declaration. Much simpler than right-clicking and hunting for “Go to definition”. Mind you, even that much isn’t necessary in Eclipse with the Declaration view. If you leave your cursor in a variable, method or type name for a second, the Declaration view shows the appropriate code – the line declaring the variable, the code for the method, or the code for the whole type. Very handy if you just want to check something quickly, without even changing which editor you’re using. (For those of you who haven’t used Eclipse, a view is a window like the Output window in VS.NET. Pretty much any window which isn’t an editor or a dialog is a view.)

Update! VS 2005 has these features too!
F12 is used to go to a definition (there may be a shortcut key in Eclipse as well to avoid having to use the mouse – I’m not sure).
VS 2005 also has the Code Definition window which is pretty much identical to the Declaration view. (Thanks for the tips, guys :)

Better SourceSafe integration

The source control integration in Eclipse is generally pretty well thought through, but what often amuses me is that it’s easier to use Visual SourceSafe (if you really have to – if you have a choice, avoid it) through Eclipse (using the free plug-in) than through Visual Studio. The whole binding business is much more easily set up. It’s a bit more manual, but much harder to get wrong.

Structural differences

IDEs understand code – so why do most of them not allow you to see differences in code terms? Eclipse does. I can ask it to compare two files, or compare my workspace version with the previous (or any other) version in source control, and it shows me not just the textual differences but the differences in terms of code – which methods have been changed, which have been added, which have been removed. Also, when going through the differences, it shows blocks at a time and then what’s changed within the block – i.e. down to individual words, not just lines. This is very handy when comparing resources in foreign languages!
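
As a rough illustration of what “differences in terms of code” means, here’s a small Java sketch that classifies methods as added, removed or changed between two versions of a class. It assumes each version has already been parsed into a map from method signature to body text, which is the hard part and is not shown. The StructuralDiff class is hypothetical and has nothing to do with Eclipse’s actual compare implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch: compares two versions of a class, method by method.
final class StructuralDiff {

    // Each input maps a method signature to its body text (assumed non-null).
    // The result maps each differing signature to "added", "removed" or "changed";
    // methods which are identical in both versions are left out.
    static SortedMap<String, String> compare(Map<String, String> before,
                                             Map<String, String> after) {
        SortedMap<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> entry : before.entrySet()) {
            String newBody = after.get(entry.getKey());
            if (newBody == null) {
                result.put(entry.getKey(), "removed");
            } else if (!newBody.equals(entry.getValue())) {
                result.put(entry.getKey(), "changed");
            }
        }
        for (String signature : after.keySet()) {
            if (!before.containsKey(signature)) {
                result.put(signature, "added");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> before = new HashMap<>();
        before.put("doSomething()", "inline byte copying");
        before.put("doSomethingElse()", "inline byte copying");
        before.put("toString()", "return name;");

        Map<String, String> after = new HashMap<>();
        after.put("doSomething()", "calls getFirstHalf(x)");
        after.put("doSomethingElse()", "calls getFirstHalf(y)");
        after.put("getFirstHalf(String)", "byte copying");
        after.put("toString()", "return name;");

        // Reports the two rewritten methods as changed and getFirstHalf(String) as added.
        System.out.println(compare(before, after));
    }
}
```

The sample data mirrors the Extract Method example earlier: a textual diff would show a scattering of changed lines, whereas this view says directly that two methods changed and one was added.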

Compile on save

The incremental Java compiler in Eclipse is fast. Very, very fast. And it compiles in the background now, too – but even when it didn’t, it rarely caused any bother. That’s why it’s perfectly acceptable for it to compile (by default – you can change it of course) whenever you save. C# compiles a lot faster than C/C++, but I still have to wait a little while for a build to finish, which means that I don’t do it as often as I save in Eclipse. That in turn means I see some problems later than I would otherwise.

Combined file and class browser

The package explorer in Eclipse is aware that Java files contain classes. So it makes sense to allow you to expand a file to see the types within it:

That’s it – for now…

There are plenty of other features I’d like to mention, but I’ll leave it there just for now. Expect this blog entry to grow over time…