The BobbyTables culture

August 8, 2014 jonskeet 34 Comments

I started writing a post like this a long time ago, but somehow never finished it.

Countless posts on Stack Overflow are vulnerable to SQL injection attacks. Along with several other users, I always raise this when it shows up – this is something that really just shouldn’t happen these days. It’s a well-understood issue,and parameterized SQL is a great solution in almost all cases. (No, it doesn’t work if you want to specify an column or table name dynamically. Yes, whitelisting is the solution there.)

The response usually falls into one of three camps:

Ah – I didn’t know about that. Great, I’ll fix it now. Thanks!
This is just a prototype. I’ll fix it for the real thing. (Ha! Like that ever happens.)
Well yes, in theory – but I’m just using numbers. That’s not a problem, is it?

Now personally I feel that you should just get the habit of using parameterized queries all the time, even when you could get away without it. This post is a somewhat tongue-in-cheek counterargument to the last of these responses. If you haven’t seen Bobby Tables, you really should. It’s the best 10-second explanation of SQL injection that I’ve ever seen, and I almost always drop a link to it when I’m adding a comment on a vulnerable query on Stack Overflow.

So in honour of Bobby, here’s a little program. See if you can predict the output.

using System;
using System.Globalization;
using System.Threading;

class Test
{
    static void Main()
    {
        string sql = "SELECT * FROM Foo WHERE BarDate > '" + DateTime.Today + "'";
        // Imagine you're executing the query here...
        Console.WriteLine(sql);

        int bar = -10;
        sql = "SELECT * FROM Foo WHERE BarValue = " + bar;
        // Imagine you're executing the query here...
        Console.WriteLine(sql);
    }

    // Some other code here...
}

Does that look okay? Not great, admittedly – but not too bad, right? Well, the output of the program is:

SELECT * FROM Foo WHERE BarDate > '2014-08-08' OR ' '=' '
SELECT * FROM Foo WHERE BarValue = 1 OR 1=1 OR 1=10

Yikes! Our queries aren’t filtering out anything!

Of course, the black magic is in “Some other code here” part:

static Test()
{
    InstallBobbyTablesCulture();
}

static void InstallBobbyTablesCulture()
{
    CultureInfo bobby = (CultureInfo) CultureInfo.InvariantCulture.Clone();
    bobby.DateTimeFormat.ShortDatePattern = @"yyyy-MM-dd'' OR ' '=''";
    bobby.DateTimeFormat.LongTimePattern = "";
    bobby.NumberFormat.NegativeSign = "1 OR 1=1 OR 1=";
    Thread.CurrentThread.CurrentCulture = bobby;
}

Neither numbers (well, negative numbers in this case) nor dates are safe. And of course if your database permissions aren’t set correctly, the queries could do a lot more than just remove any filtering. For extra fun, you can subvert some custom format strings – by changing the DateSeparator property, for example.

Even in sensible cultures, if the database expects you to use . for the decimal separator and you’re in a European culture that uses , instead, do you know how your database will behave? If you sanitize your input based on the numeric value, but then that isn’t the value that the database sees due to a string conversion, how comfortable are you that your application is still safe? It may not allow direct damage, but it could potentially reveal more data than you originally expected – which is definitely a vulnerability in a form.

Now the chances of me getting onto your system and installing the Bobby Tables culture – let alone making it the system default – are pretty slim, and if that happens you’ve probably got bigger problems anyway… but it’s the principle of the thing. You don’t care about a text representation of your values: you just want to get them to the database intact.

Parameterized SQL: just say yes.

34 thoughts on “The BobbyTables culture”

cskardon says:

August 8, 2014 at 3:00 pm

I would say it’s also important to remember that this doesn’t just apply to SQL based DBs, I know they’re the biggest target, but switching to a Graph DB (for example) doesn’t mean you can get away with not using parameters, you are just as vulnerable.

LikeLiked by 1 person

Reply
1. jonskeet says:
  
  August 8, 2014 at 3:02 pm
  
  True. Basically anywhere that you want to send multiple pieces of information (in this case a query and some values, but there are plenty of other examples), smashing a bunch of string representations together in your code is a bad approach.
  
  LikeLike
  
  Reply
Vitor Canova (@vitor_canova) says:

August 8, 2014 at 3:28 pm

I always use parameters. But how about something with an “IN” or “LIKE”. Does it work?

I mean:

Select * from Users where state in (1,2,3,4)

Or

Select * from Users where name like ‘%skeet%’

LikeLiked by 1 person

Reply
1. jonskeet says:
  
  August 8, 2014 at 3:32 pm
  
  In that case you’re not performing any execution-time concatenation – you’re not using the values from a user. For IN with user-specified values, there are various options, including dynamically generating the SQL using parameters – and then populating those parameters as normal. (There are various other options, some database-specific.)
  
  For LIKE, it’s trickier. You can use a parameter, and then build the parameter with concatenation – but LIKE is a little language of its own, for escaping percents etc. There may well be clean ways around that, but I don’t know of them, and I suspect they’re db-specific.
  
  LikeLiked by 1 person
  
  Reply
  1. Vitor Canova (@vitor_canova) says:
    
    August 10, 2014 at 12:22 pm
    
    Poor example. I mean exactly what you said, with values from user. Thanks
    
    LikeLike
    
    Reply
  2. Nadav says:
    
    August 11, 2014 at 6:34 am
    
    The problem I’ve ran into when building SQL statements dynamically is that there is a limit to the number of parameters you can specify. If I remember correctly it’s around 2000 for MSSQL, which seems like a big number, but when you start concatenating insert & update statements you can ran into that limit.
    I’ve also ran into that limit the first time I’ve played with Linq2SQL :)
    The code was basically:
    String[] files=Directory.GetFiles(dir)
    results=(from f in DB.Files where files.Contains(f.Name) select f)
    
    LikeLike
    
    Reply
2. triptyx says:
  
  August 13, 2014 at 9:31 pm
  
  The “IN” list can work with a little elbow grease. Spend some time on Google and you’ll find quite a few great solutions – I use a function that converts a delimited list of values into a table that can be used for the comparison.
  
  The “LIKE” example would require you to concatenate the % onto the sanitized inbound parameter.
  
  WHERE Name LIKE ‘%’ + @Parameter + ‘%’
  
  LikeLike
  
  Reply
travisj says:

August 8, 2014 at 3:31 pm

This is why using Fluent Linq is so nice. Internally the values are substituted server side through parameters. I wish more people would switch to using linq and away from “magic string” queries.

LikeLike

Reply
Thomas Levesque says:

August 8, 2014 at 5:05 pm

I think one of the main reasons people don’t use parameters is that they’re painfully akward to use if you do vanilla ADO.NET. Now I mostly use Dapper, which makes it much easier (and has the added benefit of automatically materializing entities from query results)

LikeLike

Reply
Albin Sunnanbo (@AlbinSunnanbo) says:

August 8, 2014 at 5:11 pm

Nice twist with the culture ;-)
Caught me off guard.

LikeLike

Reply
Julian says:

August 8, 2014 at 5:41 pm

Dapper has an excellent solution for IN: It replaces any IEnumerable parameters to the query with multiple parameters and rewrites the IN part of the query.

LikeLike

Reply

I’m not sure I buy “painfully awkward to use”. Here’s the basic SQL command pattern in raw ADO.NET:

using (SqlCommand cmd = conn.CreateCommand())
{
    cmd.CommandText = "SELECT * FROM Employees WHERE Name = @name AND Company = @company";
    cmd.Parameters.AddWithValue("@name", Request.QueryString["name"]);
    cmd.Parameters.AddWithValue("@company", Request.QueryString["company"]);

    // execute query here
}

Okay, I buy that those cmd.Parameters.AddWithValue calls are verbose and repetitive, but they’re not that bad. If your code smell sense is tingling and you really want to clean it up, you can add a tiny extension method to SqlCommand to make it nicer. Here’s one I’ll give you for free:

public static void Parameterize(string sql, this SqlCommand cmd, object parameters)
{
    cmd.CommandText = sql;
    foreach (var prop in parameters.GetType().GetProperties())
    {
        cmd.Parameters.Add(new SqlParameter("@" + prop.Name, prop.GetValue(parameters)));
    }
}

Now our Command code reads:

using (SqlCommand cmd = conn.CreateCommand())
{
    cmd.Parameterize(
        "SELECT * FROM Employees WHERE Name = @name AND Company = @company", 
         new { 
            @name = Request.QueryString["name"], 
            @company = Request.QueryString["company"]
         });

    // Execute query here
}

(Wouldn’t be worth adding a comment to a Jon Skeet blogpost if it didn’t make gratuitous misuse of an obscure C# syntax feature, after all)

LikeLiked by 1 person

It’s not too bad when you use a specific ADO.NET provider, but look at the equivalent DBMS-agnostic code:

IDbConnection conn = ...
using (IDbCommand cmd = conn.CreateCommand())
{
    cmd.CommandText = "SELECT * FROM Employees WHERE Name = @name AND Company = @company";

    IDbDataParameter nameParam = cmd.CreateParameter();
    nameParam.ParameterName = "@name";
    nameParam.DbType = DbType.String;
    nameParam.Value = Request.QueryString["name"];


    IDbDataParameter companyParam = cmd.CreateParameter();
    companyParam.ParameterName = "@company";
    companyParam.DbType = DbType.String;
    companyParam.Value = Request.QueryString["company"];

    cmd.Parameters.Add(nameParam);
    cmd.Parameters.Add(companyParam);

    // execute query here
}

Obviously you can create an extension method to make it easier, but the standard API is really clumsy…

LikeLike

sonergonul says:

August 9, 2014 at 4:18 pm

Actually, using AddWithValue is kind of.. evil. http://blogs.msmvps.com/jcoehoorn/blog/2014/05/12/can-we-stop-using-addwithvalue-already/

LikeLike

Reply
1. jonskeet says:
  
  August 9, 2014 at 4:19 pm
  
  Yup, that describes my preferred approach, too. Not that I ever use SQL from C# in my normal life, to be honest…
  
  LikeLike
  
  Reply

cbp says:

August 10, 2014 at 10:46 pm

We occasionally forgo use of parameters due to the problems with parameter sniffing. In certain situations big performance gains can be made. Definitely wouldn’t make a habit out of it, but it’s amazing the difference that can be made to query plans.

LikeLike

Reply
Pingback: The Morning Brew - Chris Alcock » The Morning Brew #1670
Joseph says:

August 11, 2014 at 12:35 pm

The only time parameters seem onerous is for a 50 row insert. Do you really parameterize all 500 values? How dangerous is AppendSqlLiteral and AppendFormat for numbers and dates with the numeric format specified? Also what about performance? I’d be willing to be convinced.

LikeLike

Reply
1. jonskeet says:
  
  August 11, 2014 at 12:37 pm
  
  For a 50 row insert, wouldn’t you just batch 50 single-row inserts, in which case you’re only parameterizing 10 values? I’d need to look at specific examples – and bear in mind that I’m far from a SQL expert – but I’d generally be far more comfortable in assuming that everything is safe if I’m using parameters than using something like AppendSqlLiteral.
  
  LikeLike
  
  Reply
  1. Joseph says:
    
    August 11, 2014 at 12:50 pm
    
    As far as pure performance, these benchmarks cover a lot. (Oddly, substringing from one long parameter for each row gives the best and truly ugliest performance benefit.)
    Basically, if I can prove that AppendSqlLiteral is incapable of outputting a vulnerable value, and I can abstract away the ugliness, I’ll be tempted to stay with it for the performance improvements.
    
    LikeLike
    
    Reply
    1. jonskeet says:
      
      August 11, 2014 at 12:53 pm
      
      The “if I can prove” part is the tricky bit there. It means making sure you’re aware of every weird thing that can be done in SQL. As someone with a relatively poor knowledge of SQL, it would take a lot to convince me of the safety – and I’d rather give up a bit of performance for the guarantee of safety. But different developers will have different tolerance levels for that, and in different contexts.
      
      Now if framework provided something to do it – the SQL equivalent of Regex.Escape – that would give me a lot more confidence.
      
      LikeLiked by 1 person
      
      Reply
Pingback: BobbyTables Culture - The Daily Six Pack: August 12, 2014
Bill P. Godfrey says:

August 12, 2014 at 3:18 pm

We’re not helping beginners by calling it “sanitizing”. It leads to bad mental models such as you’ve demonstrated. Almost every use of “sanitization” (of user inputs) should be replaced with “containment”.

Sanitizing leads people to think that strings need cleaning up, to the chagrin of anyone with an apostrophe in their name.

LikeLike

Reply
Paulo Morgado says:

August 12, 2014 at 3:25 pm

Jon, you have a typo on the output of the program: > instead of >

LikeLike

Reply
1. jonskeet says:
  
  August 12, 2014 at 9:13 pm
  
  Thanks, fixed.
  
  LikeLike
  
  Reply
Pingback: Les liens de la semaine – Édition #93 | French Coding
Dimitar Dimitrov says:

August 19, 2014 at 10:22 am

Hey Jon, I love your new blog, It’s very clean and easy on the eyes. I have a bit of catching up to do.

As for the article, I do see a lot of cases like this in SO, sometimes when I point it out I get a bit of a “this-is-not-what-I-asked” attitude. You see, I HAVE to point it out :) It’s like when I see someone querying the DB fetching all the records and THEN filter them in memory, very often I see people doing pagination this way (for example), arghhh!

LikeLike

Reply
1. triptyx says:
  
  August 19, 2014 at 6:18 pm
  
  It’s a problem that is pervasive in most companies I work with. Most developers, especially for smaller companies, just don’t understand injection security or (in some cases) the need for security at all. I had one business owner tell me we didn’t need to spend time securing the web service for a Flex app – security through obscurity was just fine. When I explained that he’d have penis pictures in his church-oriented application, it was just a matter of time, suddenly he decided that securing the web service might be a good idea.
  
  LikeLike
  
  Reply
Matt Johnson says:

September 5, 2014 at 5:30 pm

This would also be an attack vector for an app that lets the user adjust their output date/time format by a custom string. I’ve seen a few of those. :)

Also, a point that a lot of folks forget about paramaterization is that it also improves performance. Most SQL engines will cache the execution plan of the SQL strings they are given, but they do the cache lookup with a hash of the entire string. So if your parameters are inlined the string, then you get a different hash code and a cache-miss. So if the parameters vary often, then the SQL string will have to be recompiled on the fly instead of using the cached execution plan. Paramaterization ensures that the same hash is computed regardless of the parameter values, as the string only contains the placeholder or name of the parameter.

LikeLike

Reply
Pingback: Эти занимательные региональные настройки | Zit@i0
matt says:

March 25, 2015 at 4:54 pm

This won’t be among the great comments in this post and comment section; I just wanted to say that Bobby Tables is one of my favorite Cartoons.

LikeLike

Reply
Christoph says:

March 26, 2021 at 12:16 pm

I think the example is interesting but not really relevant. I mean if somebody can access my current process and modify the CultureInfo of the current thread, they can also call Environment.Exit() or make a recursive call and provoke a StackOverflowException. But when concatenating a query, one is anyway forced to use InvariantCulture with format parameters. All other cultures are variant, and believe me, I know what I am talking about, I’m Swiss where Microsoft changed – based on a crackpot idea of our federal chancellery – the default number and currency format of our culture some years ago (and then back again after all the complaints), so that depending on what server the program is running on, the default culture (not the user overrides!) are either with a decimal symbole of a “.” or a “,” and a digit grouping symbol of an apostrophe or a space (that even topped it, a space! Totally practical for OCR, no differentiation anymore whether this are two numbers or a single number with digit grouping…). Did you ever try do the same on a parametrized query? They probably also use ToString on a Guid or the like, and if you change the InvariantCulture through reflection, I would not be surprise to see them behave exactly the same…

LikeLike

Reply
1. jonskeet says:
  
  March 26, 2021 at 12:22 pm
  
  Parameterized queries don’t typically convert the values to strings at all – they transmit the values separately from the query, using whatever the DB-specific protocol uses for that purpose. (So yes, potentially as strings, but I wouldn’t expect so… I’d expect a suitable binary representation.)
  
  LikeLike
  
  Reply
Pingback: Is it safe to not parameterize an SQL query when the parameter is not a string?