When is a constant not a constant? When it’s a decimal…

August 22, 2014 jonskeet 18 Comments

A comment on a Stack Overflow post recently got me delving into constants a bit more thoroughly than I have done before.

Const fields

I’ve been aware for a while that although you can specify decimal field as a const in C#, it’s not really const as far as the CLR is concerned. Let’s consider this class to start with:

class Test
{
    const int ConstInt32 = 5;
    const decimal ConstDecimal = 5;
}

Firstly, csc gives us a warning about ConstDecimal but not about ConstInt32:

Test.cs(4,19): warning CS0414: The field ‘Test.ConstDecimal’ is assigned but its value is never used

The Roslyn compiler (rcsc) doesn’t warn about either of the fields. This is just a curiosity, really – but it already shows that there’s some difference in how they’re compiled. When we delve into the IL, the difference is much more pronounced:

.class private auto ansi beforefieldinit Test
       extends [mscorlib]System.Object
{
  .field private static literal int32 ConstInt32 = int32(0x00000005)
  .field private static initonly valuetype [mscorlib]System.Decimal ConstDecimal
  .custom instance void [mscorlib]DecimalConstantAttribute::.ctor
      (uint8, uint8, uint32, uint32, uint32) = 
      ( 01 00 00 00 00 00 00 00 00 00 00 00 05 00 00 00 00 00 ) 

  // Skip the parameterless constructor

  .method private hidebysig specialname rtspecialname static 
          void  .cctor() cil managed
  {
    // Code size       12 (0xc)
    .maxstack  8
    IL_0000:  ldc.i4.5
    IL_0001:  newobj     instance void [mscorlib]System.Decimal::.ctor(int32)
    IL_0006:  stsfld     valuetype [mscorlib]System.Decimal Test::ConstDecimal
    IL_000b:  ret
  } // end of method Test::.cctor

} // end of class Test

First things to note:

ConstInt32 has the literal constraint in IL. From ECMA 335, I.8.6.1.2:

The literal constraint promises that the value of the location is actually a fixed value
of a built-in type. The value is specified as part of the constraint. Compilers are
required to replace all references to the location with its value, and the VES therefore
need not allocate space for the location. This constraint, while logically applicable to
any location, shall only be placed on static fields of compound types. Fields that are
so marked are not permitted to be referenced from CIL (they shall be in-lined to their
constant value at compile time), but are available using reflection and tools that
directly deal with the metadata.

Whereas ConstDecimal only has the initonly constraint. Again, from ECMA 335:

The init-only constraint promises (hence, requires) that once the location has been
initialized, its contents never change. Namely, the contents are initialized before any
access, and after initialization, no value can be stored in the location. The contents are
always identical to the initialized value (see §I.8.2.3). This constraint, while logically
applicable to any location, shall only be placed on fields (static or instance) of
compound types.

(Here “compound type” just means “not an array type” – although in quite a confusing manner. I would ignore it if I were you.)

Next, note that there’s an attribute (System.Runtime.CompilerServices.DecimalConstantAttribute – I’ve taken the namespace off in the listing for the sake of readability) applied to the field, which tells anything consuming the assembly that it should be a constant, and what its value is. Indeed, if you’re very careful, you can create your own “constants” like this:

[DecimalConstant((byte)0, (byte)0, (uint)0, (uint)0, (uint) 5)]
public static readonly decimal ConstDecimal;

That field declaration will be treated as a constant in other assembles – but not within the same assembly. So printing ConstDecimal within the same assembly will result in 0 (unless you change it to another value in the static initializer) whereas printing Test.ConstDecimal in a different assembly will result in 5. (It won’t even touch the field at execution time.) I’m sure I can work out some nasty ways of abusing that, if I try hard enough.

Note that the casts to uint are important – if you accidentally call the attribute constructor with a (byte, byte, int, int, int), the compiler doesn’t recognize it. (Interesting, the latter was only introduced in .NET 2.0. I’ve no idea why.)

Amusingly, you can combine the two:

[DecimalConstant((byte)0, (byte)0, (uint)0, (uint)0, (uint) 5)]
public const decimal ConstDecimal = 10;

In this case, the IL contains both DecimalConstant attributes, despite the fact that it’s only legal to apply one. (AllowMultiple is false on its AttributeUsage.) The compiler appears to pick the one specified by the value rather than manually applied, which is slightly disappointing, but of no real importance.

Optional parameters

In the case of const fields, we’ve really only cared about what the compiler does. Let’s try something where both the compiler and the framework can get involved: optional parameters.

Again, let’s write a little class to demonstrate how the default values of optional parameters are encoded in IL:

public class Test
{
    public void PrintInt32(int x = 10)
    {
        Console.WriteLine(x);
    }

    public void PrintDecimal(decimal d = 10m)
    {
        Console.WriteLine(d);
    }
}

The important bits of the generated IL are:

.method public hidebysig instance void PrintInt32([opt] int32 x) cil managed
{
    .param [1] = int32(0x0000000A)
    ...
} // end of method Test::PrintInt32

.method public hidebysig instance void PrintDecimal(
    [opt] valuetype [mscorlib]System.Decimal d) cil managed
{
    .param [1]
    .custom instance void [mscorlib] DecimalConstantAttribute::.ctor
    (uint8, uint8, uint32, uint32, uint32) =
    ( 01 00 00 00 00 00 00 00 00 00 00 00 0A 00 00 00 00 00 ) 
    ...
} // end of method Test::PrintDecimal

Again, we have a DecimalConstantAttribute to describe the default value for the decimal parameter, whereas . If you call the method but don’t specify an argument, the compiler notes the DecimalConstantAttribute applied to the method parameter, and constructs the value in the calling code. That’s not the only way of observing the default value, however – you can do it with reflection, too:

public static void Main()
{
    var method = typeof(Test).GetMethod("PrintDecimal");
    var parameter = method.GetParameters()[0];
    Console.WriteLine("Default value: {0}", parameter.DefaultValue);
}

As you’d expect, that prints a default value of 10. There’s no direct equivalent of DefaultValue for FieldInfo – there’s GetRawConstantValue() but that only works for constants that the CLR is directly aware of – it fails for a field like const decimal Foo = 10m , with an InvalidOperationException. I’ll talk more about CLR constants later.

Now let’s try something a bit more tricksy though… C# doesn’t support DateTime literals, unlike VB – but there’s a DateTimeConstantAttribute – what happens if we try to apply that ourselves? Let’s see…

public void PrintDateTime
    ([Optional, DateTimeConstant(635443315962469079L)] DateTime date)
{
    Console.WriteLine(date);
}

So if we call PrintDateTime(), what does that print? Well (leaving aside the formatting – the examples below use the UK default formatting):

With csc.exe (the “old” C# compiler), with the call in the same assembly as the method declaration, it prints 01/01/0001 00:00:00
With csc.exe, with the call in a different assembly to the method declaration, it prints 22/08/2014 19:13:16
With rcsc.exe (Roslyn), it prints 22/08/2014 19:13:16 regardless of where it’s called from
If you call it dynamically (dynamic d = new Test(); d.PrintDateTime();) it prints 22/08/2014 19:13:16 regardless of the compiler – at least with the version of .NET I’m using. It may well vary by version.

In every case, printing out the ParameterInfo.DefaultValue gives the right answer: the framework doesn’t care whether or not the compiler understands the attribute.

In case you’re wondering why I didn’t mention this possibility for constant fields – I tried it, and it didn’t work, even in Roslyn. For some reason optional parameters are treated as more “special” than constant fields.

Having got this far, why stop with DateTime? The DateTimeConstantAttribute class derives from CustomConstantAttribute (whereas DecimalConstantAttribute just derives from Attribute). So can I introduce my own constant attributes? Noda Time seems to be an obvious candidate for this – let’s try for a LocalDateConstantAttribute:

[AttributeUsage(AttributeTargets.Parameter | AttributeTargets.Field)]
public class LocalDateConstantAttribute : CustomConstantAttribute
{
    private readonly LocalDate value;

    public LocalDateConstantAttribute(int year, int month, int day)
    {
        value = new LocalDate(year, month, day);
    }

    public override object Value { get { return value; } }
}
...
public void PrintLocalDate(
    [Optional, LocalDateConstant(2014, 8, 22)] LocalDate date)
{
    Console.WriteLine(date);
}

How does this fare? Not so well, unfortunately:
– With a normal method call (regardless of assembly), it prints 01 January 1970 (the default value for a LocalDate in Noda Time 1.3)
– With a dynamic method call it prints 01 January 1970
– With reflection (i.e. using ParameterInfo.DefaultValue) the framework does construct the appropriate LocalDate, which seems logical as it’s presumably just using the Value property of the attribute

So, there’s still work to be done there. I think it would be feasible to do this in a general way, if it’s acceptable for an exception to be thrown if the Value property returns an incompatible type of object to the parameter type. The great thing is that Roslyn is open source, so I should be able to spike this myself, right? How hard can it be? (Cue howls of derisive laughter from the Roslyn team, who will know much better than I how hard it’s likely to really be.)

CLR constants and attribute arguments

So with constant fields, it was all down to the compiler, really. For optional parameters, it’s mostly still down to the compiler, but with framework support for reflection. What about attribute arguments? They’re the other notable aspect of C# which requires compile-time constants. For example, this is fine:

[Description("This is a constant")]

But this is not:

[Description("Initialized at " + DateTime.Now)]

Intriguingly, this is fine too:

[ContractClass(typeof(Foo))]

… despite the fact that

const Type ConstType = typeof(Foo);

isn’t valid. The only constant expression which is valid for type Type is null. So in section 17.2 of the C# 5 specification, Type is explicitly called out:

An expression E is an attribute-argument-expression if all of the following statements are true:

The type of E is an attribute parameter type (§17.1.3).

At compile-time, the value of E can be resolved to one of the following:

A constant value.

A System.Type object.

A one-dimensional array of attribute-argument-expressions.

(Interestingly, there’s no indication that I can see that the value of E has to be obtained via typeof, in the spec – clearly [ContractClass(Type.GetType("TodayIs" + DateTime.Today.Day))] should be invalid, but I can’t currently see which bit of the spec that violates. Something to be tightened up, potentially.)

And the “attribute parameter type” part – section 17.1.3 – looks like this:

The types of positional and named parameters for an attribute class are limited to the attribute parameter types, which are:

One of the following types: bool, byte, char, double, float, int, long, sbyte, short, string, uint, ulong, ushort.

The type object.

The type System.Type.

An enum type, provided it has public accessibility and the types in which it is nested (if any) also have public accessibility (§17.2).

Single-dimensional arrays of the above types.

Oops – no decimal. Note that the type of E that has to be one of those types, as well as the parameter type… so it doesn’t help to have a parameter of type object and then try to pass a constant decimal value as the argument.

Basically, attributes arguments are sufficiently low-level that the CLR itself wants to know about them – and while it has a deep knowledge of Type, string and the primitive value types, it doesn’t have much knowledge about decimal (or at least, it isn’t required to). Attribute arguments can’t use funky “custom constant” attributes to specify values like decimal or DateTime – you really are limited to the types that the CLR knows about. In a future version it’s not inconceivable that this could be broadened, but at the moment it’s pretty strict.

Conclusion

So, it turns out the idea of a constant isn’t terribly constant in itself. We have:

Constant fields, which are primarily a compile-time concern, and therefore language-specific.
Optional parameter default values, which feel like they ought to be just like constant fields (in that a value specified in one place is substituted in another) but apparently have a bit more support in the C# compiler… and more reflection support too.
Attribute arguments, which are the strictest form of constant I’ve found so far, in that they have to correspond to a small set of CLR “special” types.

I didn’t even talk about constant expressions (section 7.19 of the C# 5 spec) much in this post – maybe I’ll delve into those in more detail in another post.

Unlike my normal day-dreaming about changing the compiler, I think I really might have a crack at Roslyn for supporting arbitrary optional parameters – it feels like it could potentially be genuinely useful (which is also unlike most of my idle speculation).

So next time you ask yourself whether something is a constant, you should start off by asking yourself what you mean by “constant” in the first place.

C#, Evil code, Stack Overflow

The BobbyTables culture

August 8, 2014 jonskeet 34 Comments

I started writing a post like this a long time ago, but somehow never finished it.

Countless posts on Stack Overflow are vulnerable to SQL injection attacks. Along with several other users, I always raise this when it shows up – this is something that really just shouldn’t happen these days. It’s a well-understood issue,and parameterized SQL is a great solution in almost all cases. (No, it doesn’t work if you want to specify an column or table name dynamically. Yes, whitelisting is the solution there.)

The response usually falls into one of three camps:

Ah – I didn’t know about that. Great, I’ll fix it now. Thanks!
This is just a prototype. I’ll fix it for the real thing. (Ha! Like that ever happens.)
Well yes, in theory – but I’m just using numbers. That’s not a problem, is it?

Now personally I feel that you should just get the habit of using parameterized queries all the time, even when you could get away without it. This post is a somewhat tongue-in-cheek counterargument to the last of these responses. If you haven’t seen Bobby Tables, you really should. It’s the best 10-second explanation of SQL injection that I’ve ever seen, and I almost always drop a link to it when I’m adding a comment on a vulnerable query on Stack Overflow.

So in honour of Bobby, here’s a little program. See if you can predict the output.

using System;
using System.Globalization;
using System.Threading;

class Test
{
    static void Main()
    {
        string sql = "SELECT * FROM Foo WHERE BarDate > '" + DateTime.Today + "'";
        // Imagine you're executing the query here...
        Console.WriteLine(sql);

        int bar = -10;
        sql = "SELECT * FROM Foo WHERE BarValue = " + bar;
        // Imagine you're executing the query here...
        Console.WriteLine(sql);
    }

    // Some other code here...
}

Does that look okay? Not great, admittedly – but not too bad, right? Well, the output of the program is:

SELECT * FROM Foo WHERE BarDate > '2014-08-08' OR ' '=' '
SELECT * FROM Foo WHERE BarValue = 1 OR 1=1 OR 1=10

Yikes! Our queries aren’t filtering out anything!

Of course, the black magic is in “Some other code here” part:

static Test()
{
    InstallBobbyTablesCulture();
}

static void InstallBobbyTablesCulture()
{
    CultureInfo bobby = (CultureInfo) CultureInfo.InvariantCulture.Clone();
    bobby.DateTimeFormat.ShortDatePattern = @"yyyy-MM-dd'' OR ' '=''";
    bobby.DateTimeFormat.LongTimePattern = "";
    bobby.NumberFormat.NegativeSign = "1 OR 1=1 OR 1=";
    Thread.CurrentThread.CurrentCulture = bobby;
}

Neither numbers (well, negative numbers in this case) nor dates are safe. And of course if your database permissions aren’t set correctly, the queries could do a lot more than just remove any filtering. For extra fun, you can subvert some custom format strings – by changing the DateSeparator property, for example.

Even in sensible cultures, if the database expects you to use . for the decimal separator and you’re in a European culture that uses , instead, do you know how your database will behave? If you sanitize your input based on the numeric value, but then that isn’t the value that the database sees due to a string conversion, how comfortable are you that your application is still safe? It may not allow direct damage, but it could potentially reveal more data than you originally expected – which is definitely a vulnerability in a form.

Now the chances of me getting onto your system and installing the Bobby Tables culture – let alone making it the system default – are pretty slim, and if that happens you’ve probably got bigger problems anyway… but it’s the principle of the thing. You don’t care about a text representation of your values: you just want to get them to the database intact.

Parameterized SQL: just say yes.

Performance

Object pooling and thread safety

August 1, 2014 jonskeet 52 Comments

A few days ago, I was watching Dustin Campell’s excellent talk from TechEd 2013, “Essential truths everyone should know about performance in a large managed code base”. I highly recommend it, and it made me think a bit more about Noda Time’s performance. (It doesn’t take much to make me think about Noda Time’s performance, admittedly.)

Base situation: no pooling

Most of our formatting goes through something called SteppedPattern – that’s basically one multicast delegate (with lots of actions) for formatting, and a list of delegates for parsing. The Format method was really simple before this week:

public string Format(TResult value)
{
    StringBuilder builder = new StringBuilder();
    // This will call all the actions in the multicast delegate.
    formatActions(value, builder);
    return builder.ToString();
}

Very simple – but it allocates a new StringBuilder on each call, which is somewhat unnecessary. Allocation is cheap, but not free – and string formatting is something which could reasonably happen an awful lot.

In the talk, Dustin showed an example where introducing a single object pool – also for StringBuilder – had made marked difference. Unfortunately, in my case it’s not as simple as we might like. Or at least, I’m not confident that it is. Let’s look at the options I’ve considered so far – in no particular order.

Option 1: Lock-free using Interlocked

After construction, a formatter can be used on multiple threads, so some case is needed to make sure we don’t end up using one StringBuilder from multiple threads at a time. Interlocked.Exchange makes this reasonably straightforward:

private StringBuilder cachedBuilder = new StringBuilder();
...
public string Format(TResult value)
{
    // Try to borrow the cached builder, in a thread-safe way.
    // (If another thread is using the builder, this will return null.)
    StringBuilder builder = Interlocked.Exchange(ref cachedBuilder, null);
    if (builder == null)
    {
        builder = new StringBuilder();
    }
    else
    {
        builder.Length = 0;
    }
    // This will call all the actions in the multicast delegate.
    formatActions(value, builder);
    string ret = builder.ToString();
    // Whether we managed to borrow the cached builder or not,
    // we can replace it with the one we've used, now that we
    // know what to return.
    Volatile.Write(ref cachedBuilder, builder);
    return ret;
}

(See comments for discussion on the Volatile.Write call, which was previously just an assignment statement, along with the use of Interlocked.Exchange compared with an earlier version using Interlocked.CompareExchange.)
I’m pretty confident that this does achieve the goal of only using the builder in one thread at a time. If an exception is thrown while formatting, that’s fine – we don’t end up putting anything back in the cache, and the next caller will allocate a new one.
I don’t believe we need to use Exchange a second time when writing to cachedBuilder – we’re certainly fine if it takes a while for that write to be seen by other threads, as the only downside would be not using the cached builder when we might want to.

But is that enough to make this code thread-safe? Is it possible that a write from one thread would only be seen in another thread after the second thread has borrowed the builder? Is it possible that the write to cachedBuilder actually occurs before the call to builder.ToString() completes, allowing another thread to start messing with the builder before we’ve got our result?

My guess is that it’s actually okay – but lock-free code fundamentally scares me, and I’m very cautious about writing it. (Elsewhere in Noda Time we do have some lock-free mutable data – but usually in the form of array-based caches where the element type is guaranteed to have atomic writes, and is immutable.)

So while this is “clever” and may be fine, it makes me nervous, which is not a good place to be.

Option 2: Locking

I’m much more confident about using threads when locking is involved. I find it easier to reason about what’s allowed and what’s not.

private readonly StringBuilder pooledStringBuilder = new StringBuilder();
private bool pooledBuilderInUse = false;
...

public string Format(TResult value)
{
    StringBuilder builder = null;
    lock (pooledStringBuilder)
    {
        if (!pooledBuilderInUse)
        {
            pooledBuilderInUse = true;
            builder = pooledStringBuilder;
            builder.Length = 0;
        }
    }
    try
    {
        builder = builder ?? new StringBuilder();
        // This will call all the actions in the multicast delegate.
        formatActions(value, builder);
        return builder.ToString();
    }
    finally
    {
        if (builder == pooledStringBuilder)
        {
            lock (pooledStringBuilder)
            {
                pooledBuilderInUse = false;
            }
        }
    }
}

Unlike the previous solution, this time we always know what the builder is (and never change it) – that gives us something to lock on. What we change within the lock is whether or not the builder is in use. This should be safe across multiple threads, but it’s ever so verbose – and in my benchmarks, the results weren’t terribly clear.

Option 3: ThreadStatic

The obvious alternative to sharing one builder between multiple threads is to have a
single builder per thread. There are two approaches to this in .NET:
[ThreadStatic] and ThreadLocal.

ThreadStatic is the older mechanism, and is simply a matter of decorating a static field with the [ThreadStatic] attribute. Each thread has a separate variable, as if by magic. So in principle, the implementation just looks like this:

[ThreadStatic]
private static StringBuilder cachedBuilder;

public string Format(TResult value)
{
    StringBuilder builder = cachedBuilder;
    if (cachedBuilder == null)
    {
        builder = new StringBuilder();
        cachedBuilder = builder;
    }
    else
    {
        builder.Length = 0;
    }
    // This will call all the actions in the multicast delegate.
    formatActions(value, builder);
    return builder.ToString();
}

Unfortunately we still need to check for nullity on every call, as any initialization
expression for the variable will only be executed by the first thread which uses the class. Still, the code is reasonably simple.

There’s a big “gotcha” with this though: our code is now not re-entrant. If one SteppedPattern has a format action which calls format on another SteppedPattern, we’ll end up trying to use the same builder for both operations, resetting the builder in the middle.

Fortunately, SteppedPattern already has a method (FormatPartial) designed for calling one formatter within another. In the very few places where this is required, I’m confident that it’s being called correctly – but I’m not keen on making my code fragile like this. It’s fairly safe from third party intervention – there are very few places (if any) where formatting code calls out to any other code which could include a format call. Just writing this paragraph, I’ve come up with a couple of worrying parts (fetching values from a CultureInfo, and fetching the ID of a time zone) but I think they’re safe… the question is whether I’ve missed something similar in another case. Either way, it would have to be pretty pathological code to cause a problem, and I consider the chances of this being a problem in the real world to be essentially zero.

However, I still want to guard against accidental re-entrancy within Noda Time itself – so I’ve got a debug-only version of Format which performs a crude re-entrancy check using another [ThreadStatic] variable. (And I’ve got a debug-only test to check that the check works, of course.) So long as my unit tests all still pass in a debug configuration, I’m reasonably confident. (You can see the check and the test in the revision that added this code. It may be gone in the latest version, of course – I’m still on the fence about all of this.)

There’s a second issue with [ThreadStatic] which is its level of support in the PCL. We just recently updated our PCL profiles, and I think using [ThreadStatic] would have been a problem if we’d continued to support Windows Phone 7, as that didn’t include [ThreadStatic] as far as I can tell. We’re fine in the current profile, but this is the first time I’ve knowing added code which prevents us from going back to a WP7-enabled profile.

Option 4: ThreadLocal

ThreadLocal is a more recent solution to the thread-local-storage problem. It’s generally a cleaner solution:

The variable doesn’t have to be static
You can specify an initialization delegate to be executed on first use in any particular thread
It’s clear from the type itself what you’re doing, rather than using an attribute which makes a variable act completely differently from normal

That means we can have one StringBuilder per formatter per thread (if we want; we could still make it a static variable if we wanted exactly one StringBuilder for each thread), and simplify the code within the Format method:

private ThreadLocal<StringBuilder> cachedBuilder =
    new ThreadLocal<StringBuilder>(() => new StringBuilder());

public string Format(TResult value)
{
    StringBuilder builder = cachedBuilder.Value;
    builder.Length = 0;
    // This will call all the actions in the multicast delegate.
    formatActions(value, builder);
    return builder.ToString();
}

However, I haven’t tested this code, as ThreadLocal required .NET 4.0 – whereas Noda Time targets .NET 3.5.

Conclusion

You may have noticed that I haven’t provided any performance figures in this post. That’s deliberate, because the results I’ve seen have had too much variance so far. The basic upshot of this is that I don’t yet have enough evidence of the benefit to justify the complexity – it’s just one StringBuilder per format operation, after all, and there’s plenty of other work going on at the same time. I believe there is a benefit, but it’s pretty marginal – and the level of benefit depends on which of the above options is taken.

I’m intrigued by the first option though, and whether experts would consider it to be safe. I still want to see a really good specification for the .NET memory model (which I know to be stronger than the one in the ECMA CLI specification) – I know it’s something that Microsoft has wanted to get round to writing up, but I don’t know if it’s actually happened… I fully expect to see contradictory opinions in comments :)

Jon Skeet's coding blog

Monthly Archives: August 2014

When is a constant not a constant? When it’s a decimal…

Const fields

Optional parameters

CLR constants and attribute arguments

Conclusion

The BobbyTables culture

Object pooling and thread safety

Base situation: no pooling

Option 1: Lock-free using Interlocked

Option 2: Locking

Option 3: ThreadStatic

Option 4: ThreadLocal

Conclusion