Formatting strings

A while ago I wrote an article about StringBuilder and a reader mailed me to ask about the efficiency of using String.Format instead. This reminded me of a bone I have to pick with the BCL.

Whenever we make a call to String.Format, it has to parse the format string. That doesn’t sound too bad, but string formatting can be used a heck of a lot – and the format is almost always hard-coded in some way. It may be loaded from a resource file instead of being embedded directly in the source code, but it’s not going to change after the application has started.

I put together a very crude benchmark which joins two strings together, separating them with just a space. The test uses String.Format first, and then concatenation. (I’ve tried it both ways round, however, and the results are the same.)

using System;
using System.Diagnostics;

public static class Test
{
    const int Iterations=10000000;
    const int PieceSize=10;
   
    static void Main()
    {
        string first = GenerateRandomString();
        string second = GenerateRandomString();
        int total=0;
   
        Stopwatch sw = Stopwatch.StartNew();
        for (int i=0; i < Iterations; i++)
        {
            string x = String.Format("{0} {1}", first, second);
            total += x.Length;
        }
        sw.Stop();
        Console.WriteLine("Format: {0}", sw.ElapsedMilliseconds);
        GC.Collect();
       
        sw = Stopwatch.StartNew();
        for (int i=0; i < Iterations; i++)
        {
            // Equivalent to first + " " + second
            string x = String.Concat(first, " ", second);
            total += x.Length;
        }
        sw.Stop();
        Console.WriteLine("Concat: {0}", sw.ElapsedMilliseconds);
        if (total != Iterations * 2 * (PieceSize*2 + 1))
        {
            Console.WriteLine("Incorrect total: {0}", total);
        }
    }
   
    private static readonly Random rng = new Random();
    private static string GenerateRandomString()
    {
        char[] ret = new char[PieceSize];
        for (int j=0; j < ret.Length; j++)
        {
            ret[j] = (char) ('A' + rng.Next(26));
        }
        return new string(ret);
    }
}

And the results (on my very slow Eee)…

Format: 14807
Concat: 3567

As you can see, Format takes just over four times as long as Concat. I strongly suspect that this is largely due to having to parse the format string on each iteration. That won’t be the whole of the cost, as String needs to examine the format specifier for each string as well (in case there’s padding, etc.), but again that could potentially be optimised.

I propose a FormatString class which parses its format once, when it’s constructed, and has a pair of Format methods, one of which takes a culture and one of which doesn’t. We could then hoist our format strings out of the code itself and into static readonly variables referring to FormatString instances. I’m not saying it would do a huge amount to aid performance, but it could shave off a little time here and there, as well as making it even more obvious what the format string is used for.
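
To make the idea a bit more concrete, here’s a rough sketch of what a “parse once, use often” class might look like. Nothing like this exists in the BCL: the FormatString name, its members and the Demo class are all hypothetical, and the sketch makes simplifying assumptions. It handles only {index} and {index:format} items plus escaped braces, so alignment (such as {0,10}) and ICustomFormatter are not supported, and I haven’t benchmarked it.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical sketch only: not a BCL type.
public sealed class FormatString
{
    // One parsed piece of the format: literal text or a placeholder.
    private struct Segment
    {
        public string Literal;  // non-null for literal text, null for a placeholder
        public int Index;       // argument index (placeholders only)
        public string Format;   // per-item format such as "0.00", or null
    }

    private readonly List<Segment> segments = new List<Segment>();

    // All the parsing happens here, once per format string.
    public FormatString(string format)
    {
        if (format == null) throw new ArgumentNullException("format");
        StringBuilder literal = new StringBuilder();
        int i = 0;
        while (i < format.Length)
        {
            char c = format[i];
            if ((c == '{' || c == '}') && i + 1 < format.Length && format[i + 1] == c)
            {
                literal.Append(c);  // "{{" or "}}" is an escaped brace
                i += 2;
            }
            else if (c == '{')
            {
                int close = format.IndexOf('}', i);
                if (close < 0) throw new FormatException("Unterminated format item");
                string item = format.Substring(i + 1, close - i - 1);
                int colon = item.IndexOf(':');
                string indexText = colon < 0 ? item : item.Substring(0, colon);
                Flush(literal);
                Segment s = new Segment();
                // NumberStyles.None accepts digits only; anything else throws FormatException.
                s.Index = int.Parse(indexText, NumberStyles.None, CultureInfo.InvariantCulture);
                s.Format = colon < 0 ? null : item.Substring(colon + 1);
                segments.Add(s);
                i = close + 1;
            }
            else if (c == '}')
            {
                throw new FormatException("Unexpected '}' in format string");
            }
            else
            {
                literal.Append(c);
                i++;
            }
        }
        Flush(literal);
    }

    // Stores any pending literal text as a segment and resets the buffer.
    private void Flush(StringBuilder literal)
    {
        if (literal.Length == 0) return;
        Segment s = new Segment();
        s.Literal = literal.ToString();
        segments.Add(s);
        literal.Length = 0;
    }

    public string Format(params object[] args)
    {
        return Format(CultureInfo.CurrentCulture, args);
    }

    // No parsing here: just walk the segments built by the constructor.
    public string Format(IFormatProvider provider, params object[] args)
    {
        if (args == null) throw new ArgumentNullException("args");
        StringBuilder sb = new StringBuilder();
        foreach (Segment s in segments)
        {
            if (s.Literal != null) { sb.Append(s.Literal); continue; }
            if (s.Index >= args.Length) throw new FormatException("Not enough arguments");
            object arg = args[s.Index];
            IFormattable formattable = arg as IFormattable;
            if (formattable != null) sb.Append(formattable.ToString(s.Format, provider));
            else if (arg != null) sb.Append(arg.ToString());
        }
        return sb.ToString();
    }
}

public static class Demo
{
    // The format is hoisted into a static readonly field and parsed once.
    private static readonly FormatString NameAndValue = new FormatString("{0} ({1:0.00})");

    public static void Main()
    {
        // Prints "Hello (3.14)"
        Console.WriteLine(NameAndValue.Format(CultureInfo.InvariantCulture, "Hello", Math.PI));
    }
}
```

The overloads mirror the String.Format ones, so a call site changes from String.Format(format, x, y) to SomeFormat.Format(x, y). Whether this actually beats String.Format would need measuring; the point is only that the parsing cost moves from every call to the constructor.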

16 thoughts on “Formatting strings”

  1. Surprisingly, the code doesn’t compile.

    Line: ret = (char) ('A' + rng.Next(26));
    Cannot implicitly convert type 'char' to 'char[]'

    The point here is that one should use Concat rather than Format (in this case). Format is useful where it does some real formatting operation (e.g. {0:c}).

    BTW, I am waiting to hear your DNR show ;)

  2. That’s very strange. It should, of course, be “ret[j] = (char) ('A' + rng.Next(26));”. I’ll fix it in a minute. I wonder how that got past the formatter though. Very strange. It could be yet another case of the blog engine “fixing” things for me :(

    The performance issue is a general one though – you can always rewrite a call to String.Format as a (potentially large) call to Concat, but that’s not as readable. It would be nice if the framework provided a way to have our cake and eat it – in the form of a “parse once, use often” object.

  3. It certainly makes sense to me. The Regex class has static methods for performing regular expression matching (similar to String.Format), but you can also create an instance of the Regex class with your regular expression and the options you want to use, which can be optimized for reuse. I don’t see any reason not to have something similar for String.Format.

    Have you made this suggestion to the BCL team directly?

  4. As I suspected, it was the blog engine.

    Oh, and I’m looking forward to the DNR show tomorrow too. It’ll be interesting to see if they’ve managed to edit my drivel into something which makes me sound intelligent!

  5. How about strongly-typing compiled format strings as delegates, in the manner of F#? Maybe not going so far as to have the compiler actually parse the format string and infer the types, but how about:

    var format = Formatter.Compile("{0} ({1:0.00})");

    string formatted = format("Hello", Math.PI); // "Hello (3.14)"

    Gives you the opportunity to fail at Formatter.Compile if the format string tries to use a format item that isn’t specified or with an incompatible format string for a given type.

  6. @James: Gosh, that’s an interesting idea. I like that. I think it would require some extra support above and beyond what the current interfaces express, but it’s still a neat plan.

  7. I would like to see the format string compiled like Regex can be; possibly broken into an expression tree so that it only needs to get parsed once and is thereafter faster to run. I’ve often wondered why we don’t have the concept of a compiled format string so that calls like string.Format will be more efficient in repeated calls.

  8. As for the relative efficiency of String.Format versus StringBuilder.AppendFormat, have a look at the code for the String.Format(IFormatProvider, string, params object[]) method:

    public static string Format(…)
    {

    StringBuilder builder = new StringBuilder(format.Length + (args.Length * 8));
    builder.AppendFormat(provider, format, args);
    return builder.ToString();
    }

    Clearly, if you already have a StringBuilder, it’s more efficient to call AppendFormat than Append(String.Format).

    The only caveat is if the format string is invalid: AppendFormat will append the start of the format string, and then throw an exception, whereas Append(String.Format) will throw an exception without modifying the StringBuilder instance.

  9. @Richard: I certainly wouldn’t suggest doing that. But I’d suggest doing:

    sb.Append(x);
    sb.Append(":");
    sb.Append(y);

    instead of

    sb.AppendFormat("{0}:{1}", x, y);

    (If performance were an issue, of course.)

  10. Interestingly, that’s one case where F#/ML curried functions allow for smoother syntax. You could curry Format with the format string, and if the implementation is smart, it will parse it at that point (or, alternatively, parse it the first time it’s needed, but then memoize it). Then you save the function that is the result of that currying, and use it. I.e.:

    // Core functionality
    type ParsedFormatString = …
    let ParseFormatString s = …
    let FormatPreparsed parsedFormatString args = …

    // BCL-style Format on top of that, with currying and memoization
    let Format formatString = FormatPreparsed (ParseFormatString formatString)

    // Parse as usual
    for x = 1 to 10 do
        for y = 1 to 10 do
            Format "{0} {1}" x y

    // Memoize and reuse
    let FormatPair = Format "{0} {1}"
    for x = 1 to 10 do
        for y = 1 to 10 do
            FormatPair x y

  11. The problem with that article is it’s focused solely on the performance. Sure, concatenating 5 strings in one go may be slightly slower than keeping them within blocks of four – but it’s more readable.

    Readability is king unless there are big-Oh reasons to change…

  12. I don’t see why you say the dotnetperls.com article is solely focused on performance. It has many more examples than this blog post. Obviously you want to write clear code.

    In my opinion this post is focused solely on performance.

    Sam

  13. @Sam: Look at the advice you give at the end. It’s about performance, and not about readability:

    “However, When you need to concat 5 or more strings, use multiple statements of 4 strings at once. This is appropriate for when you have a known number of strings.”

    I would always go for the simpler approach until the performance proves to be a bottleneck.

    As for this blog post: it’s about performance without giving advice to readers (seeing as the request I’ve made hasn’t come to fruition yet). However, look at the last bit:

    “I’m not saying it would do a huge amount to aid performance, but it could shave off a little time here and there, as well as making it even more obvious what the format string is used for.”

    See – it’s about making it more readable *as well* as performing better.

  14. Readability is king unless there are big-Oh reasons to change…

    This could also be one of my mottos.
    Yet, even though str1 + str2 is sometimes more readable than string.Format(…, I’ve stopped using it for performance reasons.

    Anders Hejlsberg used it in his PDC presentation and I thought it was very appropriate in this context (i.e. the context in which someone other than you has to read your code).

    Jon, glad to see you also own an Eee. I bought it for reading ebooks… but was surprised I could actually develop WCF services and Silverlight frontends with it (OK, the performance is not optimal, but it’s still usable).
