Returning from Copenhagen

October 30, 2008 jonskeet 14 Comments

I’m currently sitting in Copenhagen airport, with two hours to wait before my flight boards. It’s been a tiring day – my feet in particular are aching somewhat, and my throat is quite sort – but I’ve thoroughly enjoyed it. We started the actual talk at about 10am and finished at 4pm, with three breaks totalling about an hour – so I was speaking for five hours. You might have thought that with all that time, it would be easy to go into a lot of detail about C# 2 and 3, but there’s really too much to cover everything. Topics I remember talking about:

Generics (assuming everyone knew the basics – talked about static members, constraints)
Iterator blocks
Anonymous methods
Automatic properties
Anonymous types
Object/collection initializers (really briefly)
Lambda expressions
Extension methods
Query translation (at a very broad level)
The basics of LINQ to Objects, including streaming/buffering and deferred/immediate execution
Push LINQ and Protocol Buffers
C# 4
What I’d like to see in C# 5

The whole session was recorded – I’ll post a link when it’s ready, along with slides and code – if you really want to sit through 5 hours of me talking :)

Anyway, I’ve had a great time, and massive thanks to Brian Rasmussen for organising it and letting me stay in his beautiful home, as well as to Microsoft for hosting the event and Google for giving me the go-ahead to do it.

Next stop: London .NET User Group on November 19th, talking about Push LINQ (in more detail).

C#, C# 4, CSharpDev, CSharpDevCenter, Wacky Ideas

C# 4.0: dynamic ?

October 30, 2008 jonskeet 27 Comments

I’ve not played with the VS2010 CTP much yet, and I’ve only looked briefly at the documentation and blogs about the new C# 4.0 dynamic type, but a thought occurred to me: why not have the option of making it generic as a way of saying “I will dynamically support this set of operations”?

As an example of what I mean, suppose you have an interface IMessageRouter like this:

public interface IMessageRouter
{
void Send(string message, string destination);
}

(This is an arbitrary example, by the way. The idea isn’t specifically more suitable for message routing than anything else.)

I may have various implementations, written in various languages (or COM) which support the Send method with those parameters. Some of those implementations actually implement IMessageRouter but some don’t. I’d like to be able to do the following:

dynamic<IMessageRouter> router = GetRouter();

// This is fine (but still invoked dynamically)
router.Send(“message”, “skeet@pobox.com”);
// Compilation error: no such overload
router.Send(“message”, “skeet@pobox.com”, 20);

Intellisense would work, and we’d still have some of the benefits of static typing but without the implementations having to know about your interface. Of course, it would be quite easy to create an implementation of the interface which did exactly this – but now imagine that instead of IMessageRouter we had MessageRouter – a concrete class. In this case the compiler would still restrict the caller to the public API of the class, but it wouldn’t have to be the real class. No checking would be performed by the compiler that your dynamic type actually supported the operations – given that we’re talking about dynamic invocation, that would be impossible to do. It would instead be an “opt-in” restriction the client places on themselves. It could also potentially help with performance – if the binding involved realised that the actual type of the dynamic object natively implemented the interface or was/derived from the class, then no real dynamic calls need be made; just route all directly.

This may all sound a bit fuzzy – I’m extremely sleepy, to be honest – but I think it’s a potentially interesting idea. Thoughts?

Update

Apparently this post wasn’t as clear as it might be. I’m quite happy to keep the currently proposed dynamic type idea as well – I’d like this as an additional way of using dynamic objects.

C#, LINQ

What other Enumerable extension methods would you like to see?

October 23, 2008 jonskeet 23 Comments

A few questions on Stack Overflow have suggested to me that there might be some bits missing from LINQ to Objects. There’s the idea of a “zip” operator, which pairs up two sequences, for instance. Or the ability to apply the set operators with projections, e.g. DistinctBy, IntersectBy etc (mirroring OrderBy). They’re easy to implement, but it would be nice to get a list of what people would like to see.

So, what’s missing?

C#, CSharpDev, CSharpDevCenter, LINQ

DeveloperDeveloperDeveloper: Registration now open (hurry!)

October 22, 2008 jonskeet 3 Comments

The registration page for DeveloperDeveloperDeveloper Day 2008 (Reading, November 22nd) is now open. In the past this has been heavily oversubscribed, so if you want to come you’ll need to register quickly.

I’ve got a speaking slot in the afternoon: implementing LINQ to Objects in 60 minutes. As always, I’m very much looking forward to it.

C#, CSharpDev, CSharpDevCenter, Protocol Buffers

Mapping from a type to an instance of that type

October 8, 2008 jonskeet 18 Comments

A question came up on Stack Overflow yesterday which I’ve had to deal with myself before now. There are times when it’s helpful to have one value per type, and that value should be an instance of that type. To express it in pseudo-code, you want an IDictionary<typeof(T), T> except with T varying across all possible types. Indeed, this came up in Protocol Buffers at least once, I believe.

.NET generics don’t have any way of expressing this, and you end up with boxing and a cast. I decided to encapsulate this (in MiscUtil of course, although it’s not in a released version yet) so that I could have all the nastiness in a single place, leaving the client code relatively clean. The client code makes calls to generic methods which either take an instance of the type argument or return one. It’s a really simple class, but a potentially useful one:

/// <summary>
/// Map from types to instances of those types, e.g. int to 10 and
/// string to “hi” within the same dictionary. This cannot be done
/// without casting (and boxing for value types) as .NET cannot
/// represent this relationship with generics in their current form.
/// This class encapsulates the nastiness in a single place.
/// </summary>
public class DictionaryByType
{
private readonly IDictionary<Type, object> dictionary = new Dictionary<Type, object>();

    /// <summary>
    /// Maps the specified type argument to the given value. If
    /// the type argument already has a value within the dictionary,
    /// ArgumentException is thrown.
    /// </summary>
    public void Add<T>(T value)
    {
        dictionary.Add(typeof(T), value);
    }

    /// <summary>
    /// Maps the specified type argument to the given value. If
    /// the type argument already has a value within the dictionary, it
    /// is overwritten.
    /// </summary>
    public void Put<T>(T value)
    {
        dictionary[typeof(T)] = value;
    }

    /// <summary>
    /// Attempts to fetch a value from the dictionary, throwing a
    /// KeyNotFoundException if the specified type argument has no
    /// entry in the dictionary.
    /// </summary>
    public T Get<T>()
    {
        return (T) dictionary[typeof(T)];
    }

    /// <summary>
    /// Attempts to fetch a value from the dictionary, returning false and
    /// setting the output parameter to the default value for T if it
    /// fails, or returning true and setting the output parameter to the
    /// fetched value if it succeeds.
    /// </summary>
    public bool TryGet<T>(out T value)
    {
        object tmp;
        if (dictionary.TryGetValue(typeof(T), out tmp))
        {
            value = (T) tmp;
            return true;
        }
        value = default(T);
        return false;
    }
}

It doesn’t implement any of the common collection interfaces, because it would have to do so in a way which exposed the nastiness. I’m tempted to make it implement IEnumerable<KeyValuePair<Type, object>> but even that’s somewhat unpleasant and unlikely to be useful. Easy to add at a later date if necessary.

(I know the XML documentation leaves something to be desired. One day I’ll learn how to really do it properly – currently I fumble around if I’m trying to refer to other types etc within the docs.)

Books, C#, CSharpDev, CSharpDevCenter

Why boxing doesn’t keep me awake at nights

October 8, 2008 jonskeet 26 Comments

I’m currently reading the (generally excellent) CLR via C#, and I’ve recently hit the section on boxing. Why is it that authors feel they have to scaremonger about the effects boxing can have on performance?

Here’s a piece of code from the book:

using System;

public sealed class Program {
public static void Main() {
Int32 v = 5; // Create an unboxed value type variable.

#if INEFFICIENT
      // When compiling the following line, v is boxed
      // three times, wasting time and memory
      Console.WriteLine(“{0}, {1}, {2}”, v, v, v);
#else
      // The lines below have the same result, execute
      // much faster, and use less memory
      Object o = v;

      // No boxing occurs to compile the following line.
      Console.WriteLine(“{0}, {1}, {2}”, o, o, o);
#endif
   }
}

In the text afterwards, he reiterates the point:

This second version executes much faster and allocates less memory from the heap.

This seemed like an overstatement to me, so I thought I’d try it out. Here’s my test application:

using System;
using System.Diagnostics;

public class Test
{
    const int Iterations = 10000000;

    public static void Main()
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i=0; i < Iterations; i++)
        {
#if CONSOLE_WITH_BOXING
            Console.WriteLine(“{0} {1} {2}”, i, i, i);
#elif CONSOLE_NO_BOXING
            object o = i;
            Console.WriteLine(“{0} {1} {2}”, o, o, o);
#elif CONSOLE_STRINGS
            string s = i.ToString();
            Console.WriteLine(“{0} {1} {2}”, s, s, s);
#elif FORMAT_WITH_BOXING
            string.Format(“{0} {1} {2}”, i, i, i);
#elif FORMAT_NO_BOXING
            object o = i;
            string.Format(“{0} {1} {2}”, o, o, o);
#elif FORMAT_STRINGS
            string s = i.ToString();
            string.Format(“{0} {1} {2}”, s, s, s);
#elif CONCAT_WITH_BOXING
            string.Concat(i, ” “, i, ” “, i);
#elif CONCAT_NO_BOXING
            object o = i;
            string.Concat(o, ” “, o, ” “, o);
#elif CONCAT_STRINGS
            string s = i.ToString();
            string.Concat(s, ” “, s, ” “, s);
#endif
        }
        sw.Stop();
        Console.Error.WriteLine(“{0}ms”, sw.ElapsedMilliseconds);
    }
}

I compiled the code with one symbol defined each time, with optimisations and without debug information, and ran it from a command line, writing to nul (i.e. no disk or actual console activity). Here are the results:

Symbol	Results (ms)	Average (ms)
CONSOLE_WITH_BOXING	33054	33444
	33898
	33381
CONSOLE_NO_BOXING	34638	33451
	32423
	33294
CONSOLE_STRINGS	29259	28337
	29071
	26683
FORMAT_WITH_BOXING	17143	17210
	18100
	16389
FORMAT_NO_BOXING	15814	15657
	15936
	15222
FORMAT_STRINGS	9178	8999
	9077
	8742
CONCAT_WITH_BOXING	12056	12563
	14304
	11329
CONCAT_NO_BOXING	11949	12240
	13145
	11628
CONCAT_STRINGS	5833	5936
	6263
	5713

So, what do we learn from this? Well, a number of things:

As ever, microbenchmarks like this are pretty variable. I tried to do this on a “quiet” machine, but as you can see the results varied quite a lot. (Over two seconds between best and worst for a particular configuration at times!)
The difference due to boxing with the original code in the book is basically inside the “noise”
The dominant factor of the statement is writing to the console, even when it’s not actually writing to anything real
The next most important factor is whether we convert to string once or three times
The next most important factor is whether we use String.Format or Concat
The least important factor is boxing

Now I don’t want anyone to misunderstand me – I agree that boxing is less efficient than not boxing, where there’s a choice. Sometimes (as here, in my view) the “more efficient” code is slightly less readable – and the efficiency benefit is often negligible compared with other factors. Exactly the same thing happened in Accelerated C# 2008, where a call to Math.Pow(x, 2) was the dominant factor in a program again designed to show the efficiency of avoiding boxing.

The performance scare of boxing is akin to that of exceptions, although I suppose it’s more likely that boxing could cause a real performance concern in an otherwise-well-designed program. It used to be a much more common issue, of course, before generics gave us collections which don’t require boxing/unboxing to add/fetch data.

In short: yes, boxing has a cost. But please look at it in context, and if you’re going to start making claims about how much faster code will run when it avoids boxing, at least provide an example where it actually contributes significantly to the overall execution cost.

C#, C# 4, General, Google, LINQ, Parallelization, Protocol Buffers

DotNetRocks interview

October 7, 2008 jonskeet 5 Comments

Last Monday evening I had a chat with the guys from DotNetRocks, and today the show has gone live.

I wouldn’t claim to have said anything particularly earth-shattering, and regular readers will probably be familiar with many of the themes anyway, but I thoroughly enjoyed it and hope you will too. Amongst other things, we talked about:

Protocol buffers
Implicit typing and anonymous types
Why it doesn’t bother me that Office hasn’t been ported to .NET
C# 4
My wishlist for C#
Threading and Parallel Extensions
Working for Google
How to learn LINQ
C# in Depth

Feedback welcome. And yes, I know I sound somewhat like a stereotypical upper-class idiot at times. Unfortunately there’s not a lot I can do about that. Only the “idiot” part is accurate :)

C#, CSharpDev, CSharpDevCenter

Non-nullable reference types

October 6, 2008 jonskeet 25 Comments

I suspect this has been done several times before, but on my way home this evening I considered what would be needed for the reverse of nullable value types – non-nullable reference types. I’ve had a play with an initial implementation in MiscUtil (not in the currently released version) and it basically boils down (after removing comments, equality etc) to this:

public struct NonNullable<T> where T : class
{
private readonly T value;

    public NonNullable(T value)
    {
        if (value == null)
        {
            throw new ArgumentNullException(“value”);
        }
        this.value = value;
    }

    public T Value
    {
        get
        {
            if (value == null)
            {
                throw new NullReferenceException();
            }
            return value;
        }
    }

    public static implicit operator NonNullable<T>(T value)
    {
        return new NonNullable<T>(value);
    }

    public static implicit operator T(NonNullable<T> wrapper)
    {
        return wrapper.Value;
    }
}

Currently I’ve got both conversion operators as implicit, which isn’t quite as safe/obvious as it is with nullable value types, but it does make life simpler. Obviously this is experimental code more than anything else, so I may change my mind later.

The simple use case is something like this (along with testing error cases):

[Test]
public void Demo()
{
    SampleMethod(“hello”); // No problems
    try
    {
        SampleMethod(null);
        Assert.Fail(“Expected exception”);
    }
    catch (ArgumentNullException)
    {
        // Expected
    }
    try
    {
        SampleMethod(new NonNullable<string>());
        Assert.Fail(“Expected exception”);
    }
    catch (NullReferenceException)
    {
        // Expected
    }
}

private static void SampleMethod(NonNullable<string> text)
{
    // Shouldn’t get here with usual conversions, but could do
    // through default construction. The conversion to string
    // will throw an exception anyway, so we’re guaranteed
    // that foo is non-null afterwards.
    string foo = text;
    Assert.IsNotNull(foo);
}

I don’t intend to use this within MiscUtil at the moment. Note that it’s a struct, which has a few implications:

It has the same memory footprint as the normal reference type. No GC pressure and no extra dereferencing. This is basically the main reason for making it a struct.
It ends up being boxed to a new object. Boo hiss – ideally the boxing conversion would box it to the wrapped reference, and you could “unbox” (not really unboxing, of course) from a non-null reference to an instance of the struct.
It’s possible to create instances wrapping null references, by calling the parameterless constructor or using the default value of fields etc. This is very annoying.

The good news is that it’s self-documenting and easy to use – the conversions will happen where required, doing appropriate checking (reasonably cheaply). The bad news is the lack of support from either the CLR (boxing behaviour above) or language (I’d love to specify string! text as the parameter in the sample method above.) I suspect that if Microsoft were to do this properly, they’d want to avoid all of the problems above – but I’m not sure how…

Anyway, just a little experiment, but an interesting one – any thoughts?

C#, CSharpDev, CSharpDevCenter

Formatting strings

October 6, 2008 jonskeet 16 Comments

A while ago I wrote an article about StringBuilder and a reader mailed me to ask about the efficiency of using String.Format instead. This reminded me of a bone I have to pick with the BCL.

Whenever we make a call to String.Format, it has to parse the format string. That doesn’t sound too bad, but string formatting can be used a heck of a lot – and the format is almost always hard-coded in some way. It may be loaded from a resource file instead of being embedded directly in the source code, but it’s not going to change after the application has started.

I put together a very crude benchmark which joins two strings together, separating them with just a space. The test uses String.Format first, and then concatenation. (I’ve tried it both ways round, however, and the results are the same.)

using System;
using System.Diagnostics;

public static class Test
{
    const int Iterations=10000000;
    const int PieceSize=10;

    static void Main()
    {
        string first = GenerateRandomString();
        string second = GenerateRandomString();
        int total=0;

        Stopwatch sw = Stopwatch.StartNew();
        for (int i=0; i < Iterations; i++)
        {
            string x = String.Format(“{0} {1}”, first, second);
            total += x.Length;
        }
        sw.Stop();
        Console.WriteLine(“Format: {0}”, sw.ElapsedMilliseconds);
        GC.Collect();

        sw = Stopwatch.StartNew();
        for (int i=0; i < Iterations; i++)
        {
            // Equivalent to first + ” ” + second
            string x = String.Concat(first, ” “, second);
            total += x.Length;
        }
        sw.Stop();
        Console.WriteLine(“Concat: {0}”, sw.ElapsedMilliseconds);
        if (total != Iterations * 2 * (PieceSize*2 + 1))
        {
            Console.WriteLine(“Incorrect total: {0}”, total);
        }
    }

    private static readonly Random rng = new Random();
    private static string GenerateRandomString()
    {
        char[] ret = new char[PieceSize];
        for (int j=0; j < ret.Length; j++)
        {
            ret[j] = (char) (‘A’ + rng.Next(26));
        }
        return new string(ret);
    }
}

And the results (on my very slow Eee)…

Format: 14807
Concat: 3567

As you can see, Format takes significantly longer than Concat. I strongly suspect that this is largely due to having to parse the format string on each iteration. That won’t be the whole of the cost – String needs to examine the format specifier for each string as well, in case there’s padding, etc – but again that could potentially be optimised.

I propose a FormatString class with a pair of Format methods, one of which takes a culture and one of which doesn’t. We could then hoist our format strings out of the code itself and make them static readonly variables referring to format strings. I’m not saying it would do a huge amount to aid performance, but it could shave off a little time here and there, as well as making it even more obvious what the format string is used for.

Jon Skeet's coding blog

Monthly Archives: October 2008