Reimplementing LINQ to Objects: Part 25 – ToDictionary

This one ended up being genuinely easy to implement, although with lots of tests for different situations.

What is it?

ToDictionary has four overloads which look remarkably similar to the ones we used for ToLookup:

public static Dictionary<TKey, TSource> ToDictionary<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector)

public static Dictionary<TKey, TElement> ToDictionary<TSource, TKey, TElement>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    Func<TSource, TElement> elementSelector)

public static Dictionary<TKey, TSource> ToDictionary<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IEqualityComparer<TKey> comparer)
        
public static Dictionary<TKey, TElement> ToDictionary<TSource, TKey, TElement>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    Func<TSource, TElement> elementSelector,
    IEqualityComparer<TKey> comparer)

Yes, it’s the ever-popular combination of an optional key comparer and an optional element selector when mapping values.

ToDictionary does exactly what it says on the tin: it converts the source sequence to a dictionary. Each source item is mapped to a key, and then either the source item itself is used as the value or the element selector is applied to obtain one. A custom equality comparer can be used to create a dictionary which is (for example) case-insensitive when fetching keys.

The main difference between ToDictionary and ToLookup is that a dictionary can only store a single value per key, rather than a whole sequence of values. Of course, another difference is that Dictionary<TKey, TValue> is mutable concrete class, whereas ILookup<TKey, TElement> is an interface usually implemented in an immutable fashion.

The normal sort of note:

  • ToDictionary uses immediate execution: it reads the entire source sequence, and once it has returned, the dictionary is independent of the source
  • source, keySelector and elementSelector mustn’t be null
  • comparer can be null, in which case the default equality comparer for the key type will be used
  • No key can appear more than once (including non-identical but equal occurrences). This will cause an ArgumentException
  • Null keys are prohibited, causing an ArgumentNullException. (This is slightly odd, as the ToDictionary method itself doesn’t have a null argument at this point… really it should be a NullValueDerivedFromAnArgumentSomehowException, but that doesn’t exist)
  • Null values are allowed

Just a word on usage: one place I’ve seen ToDictionary used to reasonable effect is when the source is an IEnumerable<KeyValuePair<TKey, TValue>>, often from another dictionary. This lets you project all the values of a dictionary, or perhaps "trim" the dictionary in some ways… creating a new dictionary rather than modifying the existing one, of course. For example, here’s a way of creating a new Dictionary<string, Person> from an existing one, but only keeping the entries which are adults:

var adults = nameToPersonMap.Where(pair => pair.Value.Age >= 18)
                            .ToDictionary(pair => pair.Key, pair => pair.Value);

Of course, this is just one use – it can be a pretty handy operator, to be frank.

What are we going to test?

Just the "null argument" validation tests gives us ten tests to start with, then:

  • A test for each overload, always mapping a sequence of strings to a map using the first letter as the key; sometimes as a char and sometimes as a string (so we can easily use a case-insensitive comparer)
  • A test for null keys (ArgumentNullException)
  • A test for null values (no exception)
  • A test for duplicate keys (ArgumentException)
  • A test for a null comparer (uses the default)

Let’s implement it!

In my implementation, the first three overloads all call the last – just using a default equality comparer or an identity element selector where necessary. Then after the argument validation, the implementation is simple:

public static Dictionary<TKey, TElement> ToDictionary<TSource, TKey, TElement>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    Func<TSource, TElement> elementSelector,
    IEqualityComparer<TKey> comparer)
{
    // Argument validation elided
    Dictionary<TKey, TElement> ret = new Dictionary<TKey, TElement>(comparer ?? EqualityComparer<TKey>.Default);
    foreach (TSource item in source)
    {
        ret.Add(keySelector(item), elementSelector(item));
    }
    return ret;
}

There’s just one thing to be careful of – we have to use the Add method instead of the indexer to add entries to the dictionary. The Add method deals with duplicate and null keys in exactly the way we want, whereas the indexer would overwrite the existing entry if we had a duplicate key.

There’s one thing my implementation doesn’t do which it could if we really wanted – it doesn’t optimize the case where we know the size beforehand, because the source implements ICollection<T>. We could use the Dictionary constructor which takes a capacity in that case, like this:

comparer = comparer ?? EqualityComparer<TKey>.Default;
ICollection<TSource> list = source as ICollection<TSource>;
var ret = list == null ? new Dictionary<TKey, TElement>(comparer)
                       : new Dictionary<TKey, TElement>(list.Count, comparer);

After all, we know the eventual size of the dictionary will be exactly the count of the input sequence, in the success case.

Is this worth it? I’m not sure. It’s not a huge change, admittedly… I’m quite tempted to put it in. Again, we’d have to benchmark it to see what difference it made (and I’d benchmark the .NET implementation too, of course) but a unit test won’t show it up. Thoughts welcome.

EDIT: I’ve now edited the code to include this optimization. I haven’t benchmarked it though…

Conclusion

Gah – even when I thought I had this wrapped up before starting to write the post, that final bit of potential optimization still appeared out of nowhere.

Idle thought: if we all had to write up our code in blog posts as we went along, we might discover some interesting alternative designs and implementations.

That’s it for tonight, and I may not have any time for posts tomorrow – we’ll see.

9 thoughts on “Reimplementing LINQ to Objects: Part 25 – ToDictionary”

  1. Actually, you don’t need to explicitly pass the default comparer when the comparer is null: according to the documentation, the constructor of Dictionary accepts a null comparer and uses the default one in that case.

    Also, I was wondering: why are ToList and ToDictionary declared to return concrete classes, rather than interfaces? It’s against the usual guidelines, so I guess there must be a good reason for this?

    Like

  2. @Thomas: Yes, I suspected I could have used null all over the place here. I’ve explicitly used the default comparer as I think it’s clearer when reading the code.

    As for why concrete implementations are returned – I’m not sure why that is, to be honest. It does make things easier in various situations – for example, it’s only due to ToList returning List that we can call List.ToArray in Enumerable.ToArray :)

    Like

  3. @Thomas, to me, these two extension methods as well as ToArray have the specific purpose of returning a new populated instance of the required collection type (in an undeferred fashion) and therefore they can also return the concrete type they’re creating.

    They aren’t in the family of the “pipeline”-type methods, for which I do expect interface return types.

    Like

  4. That optimization looks important; why optimize it in ToList and not in ToDictionary? Don’t forget that dictionaries have more work to be done to resize themselves than lists; they have to recompute hashes and redistribute values.

    As for ToList and ToDictionary’s return type, considering their actual purpose is to return a List and a Dictionary, it would be quite odd if they returned interfaces instead…

    Like

  5. @configurator: Okay, I’ll add in the optimization :) I don’t believe that resizing actually recomputes the hash though – there’s no reason it would have to, as it should have retained the hash already.

    As for the result type of ToList and ToDictionary, your argument doesn’t hold for ToLookup which returns an interface…

    Like

  6. @jon: I think what configurator meant was that it has to recompute the *buckets*. I think the bucket number is sometimes called hash too, but it’s not the number returned by GetHashCode().

    Like

  7. @Douglas: Yes, that was just a typo. Fixed.

    As for throwing a derived exception – I think that would be acceptable, but I’m not sure whether it’s *really* applicable here. Will think about it.

    Like

Leave a comment