The BobbyTables culture

I started writing a post like this a long time ago, but somehow never finished it.

Countless posts on Stack Overflow are vulnerable to SQL injection attacks. Along with several other users, I always raise this when it shows up – this is something that really just shouldn’t happen these days. It’s a well-understood issue,and parameterized SQL is a great solution in almost all cases. (No, it doesn’t work if you want to specify an column or table name dynamically. Yes, whitelisting is the solution there.)

The response usually falls into one of three camps:

  • Ah – I didn’t know about that. Great, I’ll fix it now. Thanks!
  • This is just a prototype. I’ll fix it for the real thing. (Ha! Like that ever happens.)
  • Well yes, in theory – but I’m just using numbers. That’s not a problem, is it?

Now personally I feel that you should just get the habit of using parameterized queries all the time, even when you could get away without it. This post is a somewhat tongue-in-cheek counterargument to the last of these responses. If you haven’t seen Bobby Tables, you really should. It’s the best 10-second explanation of SQL injection that I’ve ever seen, and I almost always drop a link to it when I’m adding a comment on a vulnerable query on Stack Overflow.

So in honour of Bobby, here’s a little program. See if you can predict the output.

using System;
using System.Globalization;
using System.Threading;

class Test
{
    static void Main()
    {
        string sql = "SELECT * FROM Foo WHERE BarDate > '" + DateTime.Today + "'";
        // Imagine you're executing the query here...
        Console.WriteLine(sql);

        int bar = -10;
        sql = "SELECT * FROM Foo WHERE BarValue = " + bar;
        // Imagine you're executing the query here...
        Console.WriteLine(sql);
    }

    // Some other code here...
}

Does that look okay? Not great, admittedly – but not too bad, right? Well, the output of the program is:

SELECT * FROM Foo WHERE BarDate > '2014-08-08' OR ' '=' '
SELECT * FROM Foo WHERE BarValue = 1 OR 1=1 OR 1=10

Yikes! Our queries aren’t filtering out anything!

Of course, the black magic is in “Some other code here” part:

static Test()
{
    InstallBobbyTablesCulture();
}

static void InstallBobbyTablesCulture()
{
    CultureInfo bobby = (CultureInfo) CultureInfo.InvariantCulture.Clone();
    bobby.DateTimeFormat.ShortDatePattern = @"yyyy-MM-dd'' OR ' '=''";
    bobby.DateTimeFormat.LongTimePattern = "";
    bobby.NumberFormat.NegativeSign = "1 OR 1=1 OR 1=";
    Thread.CurrentThread.CurrentCulture = bobby;
}

Neither numbers (well, negative numbers in this case) nor dates are safe. And of course if your database permissions aren’t set correctly, the queries could do a lot more than just remove any filtering. For extra fun, you can subvert some custom format strings – by changing the DateSeparator property, for example.

Even in sensible cultures, if the database expects you to use . for the decimal separator and you’re in a European culture that uses , instead, do you know how your database will behave? If you sanitize your input based on the numeric value, but then that isn’t the value that the database sees due to a string conversion, how comfortable are you that your application is still safe? It may not allow direct damage, but it could potentially reveal more data than you originally expected – which is definitely a vulnerability in a form.

Now the chances of me getting onto your system and installing the Bobby Tables culture – let alone making it the system default – are pretty slim, and if that happens you’ve probably got bigger problems anyway… but it’s the principle of the thing. You don’t care about a text representation of your values: you just want to get them to the database intact.

Parameterized SQL: just say yes.

34 thoughts on “The BobbyTables culture”

  1. I would say it’s also important to remember that this doesn’t just apply to SQL based DBs, I know they’re the biggest target, but switching to a Graph DB (for example) doesn’t mean you can get away with not using parameters, you are just as vulnerable.

    Liked by 1 person

    1. True. Basically anywhere that you want to send multiple pieces of information (in this case a query and some values, but there are plenty of other examples), smashing a bunch of string representations together in your code is a bad approach.

      Like

    1. In that case you’re not performing any execution-time concatenation – you’re not using the values from a user. For IN with user-specified values, there are various options, including dynamically generating the SQL using parameters – and then populating those parameters as normal. (There are various other options, some database-specific.)

      For LIKE, it’s trickier. You can use a parameter, and then build the parameter with concatenation – but LIKE is a little language of its own, for escaping percents etc. There may well be clean ways around that, but I don’t know of them, and I suspect they’re db-specific.

      Liked by 1 person

      1. The problem I’ve ran into when building SQL statements dynamically is that there is a limit to the number of parameters you can specify. If I remember correctly it’s around 2000 for MSSQL, which seems like a big number, but when you start concatenating insert & update statements you can ran into that limit.
        I’ve also ran into that limit the first time I’ve played with Linq2SQL :)
        The code was basically:
        String[] files=Directory.GetFiles(dir)
        results=(from f in DB.Files where files.Contains(f.Name) select f)

        Like

    2. The “IN” list can work with a little elbow grease. Spend some time on Google and you’ll find quite a few great solutions – I use a function that converts a delimited list of values into a table that can be used for the comparison.

      The “LIKE” example would require you to concatenate the % onto the sanitized inbound parameter.

      WHERE Name LIKE ‘%’ + @Parameter + ‘%’

      Like

  2. This is why using Fluent Linq is so nice. Internally the values are substituted server side through parameters. I wish more people would switch to using linq and away from “magic string” queries.

    Like

  3. I think one of the main reasons people don’t use parameters is that they’re painfully akward to use if you do vanilla ADO.NET. Now I mostly use Dapper, which makes it much easier (and has the added benefit of automatically materializing entities from query results)

    Like

  4. Dapper has an excellent solution for IN: It replaces any IEnumerable parameters to the query with multiple parameters and rewrites the IN part of the query.

    Like

  5. I’m not sure I buy “painfully awkward to use”. Here’s the basic SQL command pattern in raw ADO.NET:

    using (SqlCommand cmd = conn.CreateCommand())
    {
        cmd.CommandText = "SELECT * FROM Employees WHERE Name = @name AND Company = @company";
        cmd.Parameters.AddWithValue("@name", Request.QueryString["name"]);
        cmd.Parameters.AddWithValue("@company", Request.QueryString["company"]);
    
        // execute query here
    }
    

    Okay, I buy that those cmd.Parameters.AddWithValue calls are verbose and repetitive, but they’re not that bad. If your code smell sense is tingling and you really want to clean it up, you can add a tiny extension method to SqlCommand to make it nicer. Here’s one I’ll give you for free:

    public static void Parameterize(string sql, this SqlCommand cmd, object parameters)
    {
        cmd.CommandText = sql;
        foreach (var prop in parameters.GetType().GetProperties())
        {
            cmd.Parameters.Add(new SqlParameter("@" + prop.Name, prop.GetValue(parameters)));
        }
    }
    

    Now our Command code reads:

    using (SqlCommand cmd = conn.CreateCommand())
    {
        cmd.Parameterize(
            "SELECT * FROM Employees WHERE Name = @name AND Company = @company", 
             new { 
                @name = Request.QueryString["name"], 
                @company = Request.QueryString["company"]
             });
    
        // Execute query here
    }
    

    (Wouldn’t be worth adding a comment to a Jon Skeet blogpost if it didn’t make gratuitous misuse of an obscure C# syntax feature, after all)

    Liked by 1 person

    1. It’s not too bad when you use a specific ADO.NET provider, but look at the equivalent DBMS-agnostic code:

      IDbConnection conn = ...
      using (IDbCommand cmd = conn.CreateCommand())
      {
          cmd.CommandText = "SELECT * FROM Employees WHERE Name = @name AND Company = @company";
      
          IDbDataParameter nameParam = cmd.CreateParameter();
          nameParam.ParameterName = "@name";
          nameParam.DbType = DbType.String;
          nameParam.Value = Request.QueryString["name"];
      
      
          IDbDataParameter companyParam = cmd.CreateParameter();
          companyParam.ParameterName = "@company";
          companyParam.DbType = DbType.String;
          companyParam.Value = Request.QueryString["company"];
      
          cmd.Parameters.Add(nameParam);
          cmd.Parameters.Add(companyParam);
      
          // execute query here
      }
      

      Obviously you can create an extension method to make it easier, but the standard API is really clumsy…

      Like

  6. We occasionally forgo use of parameters due to the problems with parameter sniffing. In certain situations big performance gains can be made. Definitely wouldn’t make a habit out of it, but it’s amazing the difference that can be made to query plans.

    Like

  7. The only time parameters seem onerous is for a 50 row insert. Do you really parameterize all 500 values? How dangerous is AppendSqlLiteral and AppendFormat for numbers and dates with the numeric format specified? Also what about performance? I’d be willing to be convinced.

    Like

    1. For a 50 row insert, wouldn’t you just batch 50 single-row inserts, in which case you’re only parameterizing 10 values? I’d need to look at specific examples – and bear in mind that I’m far from a SQL expert – but I’d generally be far more comfortable in assuming that everything is safe if I’m using parameters than using something like AppendSqlLiteral.

      Like

      1. As far as pure performance, these benchmarks cover a lot. (Oddly, substringing from one long parameter for each row gives the best and truly ugliest performance benefit.)
        Basically, if I can prove that AppendSqlLiteral is incapable of outputting a vulnerable value, and I can abstract away the ugliness, I’ll be tempted to stay with it for the performance improvements.

        Like

        1. The “if I can prove” part is the tricky bit there. It means making sure you’re aware of every weird thing that can be done in SQL. As someone with a relatively poor knowledge of SQL, it would take a lot to convince me of the safety – and I’d rather give up a bit of performance for the guarantee of safety. But different developers will have different tolerance levels for that, and in different contexts.

          Now if framework provided something to do it – the SQL equivalent of Regex.Escape – that would give me a lot more confidence.

          Liked by 1 person

  8. We’re not helping beginners by calling it “sanitizing”. It leads to bad mental models such as you’ve demonstrated. Almost every use of “sanitization” (of user inputs) should be replaced with “containment”.

    Sanitizing leads people to think that strings need cleaning up, to the chagrin of anyone with an apostrophe in their name.

    Like

  9. Hey Jon, I love your new blog, It’s very clean and easy on the eyes. I have a bit of catching up to do.

    As for the article, I do see a lot of cases like this in SO, sometimes when I point it out I get a bit of a “this-is-not-what-I-asked” attitude. You see, I HAVE to point it out :) It’s like when I see someone querying the DB fetching all the records and THEN filter them in memory, very often I see people doing pagination this way (for example), arghhh!

    Like

    1. It’s a problem that is pervasive in most companies I work with. Most developers, especially for smaller companies, just don’t understand injection security or (in some cases) the need for security at all. I had one business owner tell me we didn’t need to spend time securing the web service for a Flex app – security through obscurity was just fine. When I explained that he’d have penis pictures in his church-oriented application, it was just a matter of time, suddenly he decided that securing the web service might be a good idea.

      Like

  10. This would also be an attack vector for an app that lets the user adjust their output date/time format by a custom string. I’ve seen a few of those. :)

    Also, a point that a lot of folks forget about paramaterization is that it also improves performance. Most SQL engines will cache the execution plan of the SQL strings they are given, but they do the cache lookup with a hash of the entire string. So if your parameters are inlined the string, then you get a different hash code and a cache-miss. So if the parameters vary often, then the SQL string will have to be recompiled on the fly instead of using the cached execution plan. Paramaterization ensures that the same hash is computed regardless of the parameter values, as the string only contains the placeholder or name of the parameter.

    Like

  11. This won’t be among the great comments in this post and comment section; I just wanted to say that Bobby Tables is one of my favorite Cartoons.

    Like

  12. I think the example is interesting but not really relevant. I mean if somebody can access my current process and modify the CultureInfo of the current thread, they can also call Environment.Exit() or make a recursive call and provoke a StackOverflowException. But when concatenating a query, one is anyway forced to use InvariantCulture with format parameters. All other cultures are variant, and believe me, I know what I am talking about, I’m Swiss where Microsoft changed – based on a crackpot idea of our federal chancellery – the default number and currency format of our culture some years ago (and then back again after all the complaints), so that depending on what server the program is running on, the default culture (not the user overrides!) are either with a decimal symbole of a “.” or a “,” and a digit grouping symbol of an apostrophe or a space (that even topped it, a space! Totally practical for OCR, no differentiation anymore whether this are two numbers or a single number with digit grouping…). Did you ever try do the same on a parametrized query? They probably also use ToString on a Guid or the like, and if you change the InvariantCulture through reflection, I would not be surprise to see them behave exactly the same…

    Like

    1. Parameterized queries don’t typically convert the values to strings at all – they transmit the values separately from the query, using whatever the DB-specific protocol uses for that purpose. (So yes, potentially as strings, but I wouldn’t expect so… I’d expect a suitable binary representation.)

      Like

Leave a comment