Code and data

August 29, 2010 jonskeet 13 Comments

In a recent Stack Overflow question, I answered a question which started off with a broken XPath expression by suggesting that that poster might be better off using LINQ to XML instead. The discussion which followed in the comments (around whether or not this was an appropriate answer) led me to think about the nature of code and data, and how important context is.

I don’t think there’s any particularly deep insight in this post – so I’ll attempt to keep it relatively short. However, you might like to think about how code and data interact in your own experience, and what the effects of this can be.

Code is data

Okay, so let’s start off with the obvious: all code is data, at some level. If it’s compiled code, it’s just binary data which a machine can execute. Put it on another machine with no VM, and there’s nothing remarkable about it. It’s just a load of 1s and 0s. As source code, most languages are just plain text. Open up some source code written in C#, Ruby, Python, Java, C++ etc in Notepad and it’ll be readable. You may miss the syntax highlighting and so forth, but it’s still just text.

Code in the right context is more than just data

So what makes this data different to (say) a CSV file or a plain text story? It’s all in the context. When you load it into the right editor, or pass it to the right compiler, you get more information: in an editor you may see the aforementioned syntax highlighting, autocompletion, documentation for members you’re using; a compiler will either produce errors or a binary file. For something like Python or Ruby, you may want to feed the source into an interpreter instead of a compiler, but the principle is the same: the data takes on more meaning.

Code in the wrong code-related context is just data again

Now let’s think about typical places where you might put code (or something with similar characteristics) into the "wrong" context:

SQL statements
XSLT transformations
XPath expressions
XML or HTML text
Regular expressions

All of these languages have editors which understand them, and will help you avoid problems. All of these are also possible to embed in other code – C#, for example. Indeed, almost all the regular expressions I’ve personally written have ended up in Java or C# code. At that point, there are two problems:

You may want to include text which doesn’t embed easily within the "host" language’s string literals (particularly double quotes, backslashes and newlines)
The code editor doesn’t understand the additional meaning to the text

The first problem is at least somewhat mitigated by C#’s support for verbatim string literals – only double quotes remain as a problem. But the second problem is the really big one. Visual Studio isn’t going to check that your regular expression or XPath expression looks valid. It’s not going to give you syntax highlighting for your SQL statement, much less IntelliSense on the columns present in your database. Admittedly such a thing might be possible, if the IDE looked ahead to find out where the text was going to be used – but I haven’t seen any IDE that advanced yet. (The closest I’ve seen is ReSharper noticing when you’re using a format string with the wrong number of parameters – that’s primitive but still really useful.)

Of course, you could write your SQL (or XPath etc) in a dedicated editor, and then either copy and paste it into your code or embed it into your eventual binary and load it at execution time. Neither of these is particularly appealing. Copy and paste works well once, but then when you’re reading or modifying the code you lose the advantages you had unless you copy and paste it again. Embedding the file can work well in some cases – I use it liberally for test data in unit tests, for example – but I wouldn’t want it all over production code. It means that when reading the code, you have to refer to the external resource to work out what’s going to happen. In some cases that’s not too bad – it’s only like opening another class or method, I guess – but in other cases the shift of gears is too distracting.

When code is data, it’s easy to mix it with other data – badly

Within C# code, it’s easy to see the bits of data which sometimes occur in your code: string or numeric literals, typically. Maybe you subscribe to the "no magic values" philosophy, and only ever have literals (other than 0 or 1, typically) as values for constants. Well, that’s just a level of indirection – which in some ways hides the fact that you’ve still got magic values. If you’re only going to use a piece of data once, including it directly in-place actually adds to readability in my view. Anyway, without wishing to dive into that particular debate too deeply, the point is that the compiler (or whatever) will typically stop you from using that data as code – at least without being explicit about it. It will make sure that if you’re using a value, it really is a value. If you’re trying to use a variable, it had better be a variable. Putting a variable name in quotes means it’s just text, and using a word without the quotes will make the compiler complain unless you happen to have a variable with the right name.

Now compare that with embedding XPath within C#, where you might have:

var node = doc.SelectSingleNode("//foo/bar[@baz=xyz]");

Now it may be obvious to you that "xyz" is meant to be a value here, not the name of an attribute, an element, a function or anything like that… but it’s not obvious to Visual Studio, which won’t give you any warnings. This is only a special case of the previous issue of invalid code, of course, but it does lead onto a related issue… SQL injection attacks.

When you’ve already got your "code" as a simple text value – a string literal containing your SQL statement, as an obvious example – it’s all too easy to start mixing that code/data with genuine data data: a value entered by a user, for example. Hey, let’s just concatenate the two together. Or maybe use a format string, effectively mixing three languages (C#, SQL, the primitive string formatting "language" of string.Format) into a single statement. We all know the results, of course: nothing differentiates between the code/data and the genuine data, so if the user-entered value happens to look like SQL to drop a database table, we end up with Little Bobby Tables.

I’m sure 99% of my blog readers know the way to avoid SQL injection attacks: use parameterized SQL statements. Keep the data and the code separate, basically.

Expressing the same ideas, but back in the "native" language

Going back to the start of all this, the above is why I like LINQ to XML. When I express a query using LINQ to XML, it’s often a lot longer than it would have been in the equivalent XPath – but I can tell where the data goes. I know where I’m using an element name, where I’m using an attribute name, and where I’m comparing or extracting values. If I miss out some quotes, chances are pretty high that the resulting code will be invalid, and it’ll be obvious where the problem is. I’m prepared to sacrifice brevity for the fact that I only work in a single language + library, instead of trying to embed one language within another.

Likewise building XML using LINQ to XML is much better than concatenating strings – I don’t need to worry about any nasty escaping issues, for example. LINQ to XML has been so nicely design, it makes all kinds of things incredibly easy.

Regular expressions can sometimes be replaced by simple string operations. Where they can, I will often do so. I’d rather use a few IndexOf and Substring calls over a regular expression in general – but where the patterns I need get too tricky, I will currently fall back to regular expressions. I’m aware of ReadableRex but I haven’t looked at it in enough detail to say whether it can take the place of "normal" regular expressions in the way that LINQ to XML can so often take the place of XPath.

Of course, LINQ to SQL (and the Entity Framework) do something similar for SQL… although that’s slightly different, and has its own issues around predictability.

In all of these cases, however, the point is that by falling back to more verbose but more native-feeling code, some of the issues of embedding one language within another are removed. Code is still code, data is data again, and the two don’t get mixed up with each other.

Conclusion

If I ever manage to organize these thoughts in a more lucid way, I will probably just rewrite them as another (shorter) post. In the meantime, I’d urge you to think about where your code and data get uncomfortably close.

General, Stack Overflow

Writing the perfect question

August 29, 2010 jonskeet 102 Comments

Please note: this blog post is not a suitable place to post a development question. You should probably be doing that on Stack Overflow – after reading this post, of course. I’ve received a truly astonishing number of comments on this post which are basically misplaced questions. Such comments are simply marked as spam – they don’t belong here.

TL;DR: If you don’t have time to read this post fully, I have a checklist version. I’d strongly encourage you to read this post fully though, when you have time.

I’ve added a tinyurl to this post for easy reference:
http://tinyurl.com/stack-hints. Nice and easy to remember :)

A while ago, I wrote a blog entry on how to answer questions helpfully on sites like Stack Overflow. Recently I saw a meta question about bad questions and thought it would be worth following up with another blog post on asking questions. For the sake of convenience – and as Stack Overflow is so popular – I will assume the question is going to be asked on Stack Overflow or a similar Stack Exchange site. Most of the post doesn’t actually depend on that, but if you’re asking elsewhere you may need to tweak the advice a little.

There are plenty of similar resources around, of course – in particular, How to Ask Questions the Smart Way by Eric Raymond and Rick Moen is a perennial favourite. Still, I think I can bring something to the table.

The Golden Rule: Imagine You’re Trying To Answer The Question

If you don’t remember anything else from this post, remember this bit. Everything else follows from here. (And yes, this does smack somewhat of Matthew 7:12.)

Once you’ve finished writing your question, read it through. Imagine you were coming to it fresh, with no context other than what’s on the screen. Does it make sense? Is it clear what’s being asked? Is it easy to read and understand? Are there any obvious areas you’d need to ask about before providing an answer? You can usually do this pretty well however stuck you are on the actual question. Just apply common sense. If there’s anything wrong with the question when you’re reading it, obviously that will be a problem for whoever’s actually trying to answer it. So fix the problems. Improve the question until you can read it and think, “If I only knew the answer to the question, it would be a pleasure to provide that answer.” At that point, post and wait for the answers to come rolling in.

Obviously this is somewhat easier to do if you have a certain amount of experience answering questions, particularly on the forum where you’re about to post. So, what should you be looking out for?

Question title

When a reader first sees your question, they’re likely to be scrolling down a list of snippets. The most eye-catching part of the snippet will be the title – so use that text wisely. While you can include language or platform information, you should only do so naturally – not as a sort of “header”. For example, this is bad:

Java: Why are bytes signed?

But this is okay:

Why are bytes signed in Java?

Of course, you should also include this information in tags, as it will help people who pay particular attention to specific tags.

Ideally, a question title should be a question – but frankly that’s not always feasible. I would recommend favouring a short, descriptive title which captures the theme of the question without actually being a question instead of really trying to crowbar it into the form of a question when it really doesn’t want to be. That’s not an excuse for laziness though – it’s usually possible to come up with a good title which is genuinely a question.

It’s important that the question title is specific though, and has at least some meaning with no other information. A question such as “Why doesn’t this work?” makes absolutely no sense without the rest of the question. Likewise a “question” title of “Please help” is unlikely to do well.

Context

In most cases, anyone answering the question will need to know what language and platform you’re using. The basics should usually be communicated through tags, but it may very well be worth providing more information:

Language version (e.g. C# 4)
Platform version (e.g. .NET 3.5; note that this isn’t always implicit from the language version, or vice versa)
Operating system, if it could be relevant (e.g. particular permissions issues)
Any other relevant software (e.g. database type and version, the IDE you’re using, web server you’re connecting to)
Any other constraints. This is particularly important. It’s really annoying to give a perfectly good answer to a question, only to be told that you’re not allowed to use feature X or Y which provide the obvious solution.
- If you have unusual constraints, it’s worth explaining why. Not only does this answer the obvious follow-up comment, but it also gives more information about what other solutions may not be applicable.

Describe what you’ve already tried and the results of any research. (You have searched for a solution to your problem before asking it, haven’t you? Stack Overflow isn’t meant to replace basic search skills.) Often there will be other people in a similar situation, but the answers didn’t quite match your situation. Just like the above point about unusual constraints, it saves time if you can point out differences between your situation and other common ones. It’s even worth referring to other related questions explicitly – particularly if they’re on the same forum. Aside from anything else, this shows a certain amount of “due diligence” – people are generally more willing to help you if can show you’ve already put some effort in.

You should absolutely make sure that you tag the question appropriately. If you’re not sure which exact tags are appropriate, see what gets auto-suggested and look at samples for each one. If that sounds like a lot of work, just remember how much time you may be able to save yourself in the long run. It gets easier over time, of course.

Problem statement

Make sure it’s obvious what you’re trying to get out of the question. Too many “questions” are actually just statements: when I do X, something goes wrong.

Well, what did you expect it to do? What are you trying to accomplish? What have you already tried? What happened in those attempts? Be detailed: in particular, if something didn’t work, don’t just state that: tell us how it failed. If it threw an exception, what was the exception? (Don’t just give the type – give the error message and say which line threw it. If there’s a nested exception, post that too.)

If at all possible, write a sort of “executive summary” at the start of your question, followed by a more detailed description. Remember that on the list of questions, the first few sentences will appear as a snippet. If you can get a sense of the question across in that snippet, you’re more likely to attract views from people who can answer the question.

One trap that many posters fall into is to ask how to achieve some “small” aim, but never say what the larger aim is. Often the smaller aim is either impossible or rarely a good idea – instead, a different approach is needed. Again, if you provide more context when writing your problem statement, we can suggest better designs. It’s fine to specify how you’re currently trying to solve your bigger problem, of course – that’s likely to be necessary detail – but include the bigger goal too.

Sample code and data

I may be biased on this one. I’m a huge believer in sample code, both for questions and answers… and I probably use it in an unconventional way. I usually paste it into a text editor, and try to compile it from the command line. If that’s not likely to work (and the problem isn’t obvious by inspection), I’m unlikely to bother too much with it. Firing up Eclipse or Visual Studio and finding an existing project I don’t care about or starting a new one is going to take much more time.

That means if you want me to look at code, it should:

Be standalone. Don’t try to talk to a database unless you really have to. (Obviously for database questions, that’s kinda hard :) If you use sample XML, provide a short but complete XML file for us to reproduce the issue with. (And the same for other file types, obviously.)
Be complete. If there are missing imports or using directives, that’s really annoying
Actually compile (unless the compilation error is the reason for the question). Don’t give me code which is “something like” the real code but which clearly isn’t the real code, and may well not exhibit the same symptoms by the time I’ve fixed it so that it compiles.
Ideally not bring up a UI. Unless your code is about a UI issue, don’t bring one up. Console apps are simpler, and simplicity is a huge benefit when trying to hunt down a problem.
Demonstrate the problem. You should be able to say, “I expected the result to be X, it’s actually Y.” (You should actually say that too, so that we can check that we get the same results.)
Be as short as possible. If I have to wade through hundreds of lines of code to find the problem, I’m doing work that you should be doing. Often if you work hard to reduce the problem to a short but complete program, you’ll find the issue yourself. You can absolutely do this without knowing what the problem is; you should be looking to the community for their expertise, not their willingness to spend time on your problem doing the work that you can do yourself.

Yes, this is a relatively onerous list. It doesn’t all apply to every problem, but it does apply in a great many situations. While I get put off by reams of irrelevant, badly formatted code, some of which clearly won’t compile, the inverse is true as well: if I can tell by looking at the question that the code can go through a copy/paste/compile/run cycle really quickly, I’m much more likely to pay the question significant attention.

In data-oriented questions, it’s very often helpful to give some sample data. Cut out anything irrelevant (if your real table has 50 columns, you only need to include relevant ones) but make sure that you give enough sample input for it to be meaningful. For example, if you’re trying to group some data by a PersonID column, it’s pretty useless if there’s only one PersonID given, or if each PersonID only appears once. If you are giving examples of expected input and corresponding output, make sure it’s clear why that’s the expected output. Often I see questions which give a small number of samples, and there are various ways they could be interpreted. This is one area where it’s particularly important to reread the question from a stranger’s point of view: while a brief summary of the desired results may well make sense to someone who already knows what your app is trying to achieve, it may be gobbledygook to those trying to answer your question.

Spelling, grammar and formatting

I know not everyone speaks English natively. My own command of non-English languages is lamentably poor – I’m incredibly lucky that my native tongue happens to be the lingua franca of the technical world. However, if you’re trying to communicate on an English-language forum, you owe it to yourself to make an effort to write at least reasonably correct English.

Please use capital letters where appropriate. It really can make a big difference in the readability of text.
Please split your text into paragraphs. Imagine this blog post as one big paragraph – it would be almost impossible to read.
Please write actual words. There are undoubtedly some abbreviations which are acceptable to most readers – IMO, IIRC etc – there’s no reason to switch into text-speak with “gr8”, “bcoz”, “u” and so forth. It’s unlikely that you’re actually writing your question on a phone with only a primitive keyboard; show your readers respect by writing properly. It may take you a few more seconds, but if it means you get an answer quicker, it’s surely worth the extra effort.
Most browsers have built-in spelling checkers these days, or at least have plug-ins or extensions available to check your text. Technical text often creates a lot of false positives for checkers, but if your spelling isn’t generally great, it’s worth looking at the suggestions.

Having said all of this, you’re not trying to create a literary masterpiece. You’re trying to communicate your question as effectively as possible. If you’re faced with the choice between an unambiguous but ugly sentence, or a phrase which stirs the soul but leaves the reader confused about exactly what you mean, go for the unambiguous option every time.

One way a huge number of questions can be improved with very little effort is simply formatting them properly. Stack Overflow’s markdown editor is very good – the preview below your input box is almost always accurate in terms of the eventual result, and you can always edit the question later if anything doesn’t quite work. The exact details of the markdown is beyond the scope of this article – Stack Overflow has a detailed guide though – if you’re new to the site, I’d recommend you at least skim through it.

By far the most important kind of formatting is making code look like code. Within a text paragraph, simply surround the code with backticks `like this`. For blocks of code, just indent everything by four spaces. If you’re cutting and pasting code, it may already be indented (for example if you’re copying code within a class) but if not, the easiest way to indent everything is to paste it, select the whole code block, and then press Ctrl-K or the “{ }” button just above the editor.

One of the important things about code formatting is that it means angle brackets (and some other symbols) are preserved instead of being swallowed by the markdown formatter. In some cases this can mean all the difference between a question which is easy to answer and one which doesn’t make any sense, particularly in terms of generics in Java and C# or templates in C++. For example, like this

Why can’t I convert an expression of type List<string> to List<object>?

makes no sense at all if the type arguments are removed:

Why can’t I convert an expression of type List to List?

Often experienced members of the site will recognise what’s going on and edit your question for you, but obviously it’s better if they don’t have to.

Making a good impression

Leaving aside the main body of the question, there are a few simple ways to get the community “on your side” and therefore more likely to give you a useful answer quickly.

Register as a user and give yourself a meaningful name. It doesn’t have to be your real name, but frankly names like “Top Coder” or “Coding Guru” look pretty silly when you’re asking a question which others find simple. That’s still better than leaving yourself as “user154232” or whatever identifier is assigned to you by default though. Aside from anything else, it shows a certain amount of commitment to the question and/or site: if you’ve bothered to give yourself a name, you’re less likely to be an “ask-and-run” questioner.
Keep an eye on your question. There may well be requests for clarification – and of course, answers! If you receive an answer which wasn’t quite what you were looking for, explain carefully (and politely) why it’s not suitable for your purposes. Consider going back and editing your question to make it clearer for subsequent users.
Don’t add your own answer unless it really is an answer. Often users add extra details in an “answer” when they should really have just edited their question. Likewise editing your question is generally a better idea than adding a long comment to an existing answer – particularly if that comment contains a block of code (which won’t work well in a comment). If you do change the question in response to an answer though, it’s worth adding a comment to the answer just to let the user know that you’ve updated it though… you may well find they quickly edit their answer to match the revised question.
There’s no need to include greetings and sign-offs such as “Hi everyone!” and “Thanks – hope to get an answer soon” in the question. These will often be edited out by other users, as they’re basically a distraction. Greetings at the start of a question are particularly useless as they can take up valuable space in the snippet displayed in the question list.
Above all, be polite. Remember that no-one is getting paid to answer your question. Users are giving up their time to help you – so please be appreciative of that. If you’re asking a homework question, explain why you’re asking for help with something that traditionally you’d have to answer all by yourself. If a user suggests that your general approach is wrong and that there’s a better way of doing things, don’t take it personally: they’re trying to help you improve your code. By all means disagree robustly, but don’t start into ad hominem arguments. (This advice applies to answerers as well, of course.)
(Somewhat specific to Stack Overflow.) If an answer is particularly helpful or solves your problem, accept it by clicking on the tick mark by it. This gives extra credit to the person who provided that answer, as well as giving more information to future readers.

Conclusion and feedback

Stack Overflow is an amazing resource (along with other Q&A sites, of course). The idea that you can get a good answer to a wide range of questions within minutes is pretty staggering… but there’s an obvious correlation between the quality of a question and the likelihood that you’ll get quick, helpful answers. Put that extra bit of effort in yourself, and it will probably pay for itself very quickly.

I’m hoping to keep this blog post up to date with suggestions received – if I’ve missed out anything, over- or under-emphasized a specific point, or generally gone off track, let me know either in the comments here or mail me (skeet@pobox.com). If this document ends up elsewhere, then that copy may end up being the “canonical” one which is edited over time – in which case I’ll indicate that here.

C#, Speaking engagements

Reflecting on presentation skills: The Guathon, August 13th 2010

August 14, 2010 jonskeet 4 Comments

(I apologise for the unstructured nature of this post. I honestly don’t know how to structure it. I’ve thought of a few ways of breaking it up by heading, and none of them really work. Particular apologies to Simon Stewart, who has requested more brevity in my blog. Just for Simon, the executive summary is: Scott Guthrie is a really good speaker. I want to be more like him.)

Yesterday I had the good fortune (well, good friends – thanks Phil!) to attend the Guathon in London. This was a free, day-long event with Scott Guthrie and Mike Ormond, talking about MVC 2 and 3, Visual Studio 2010, Windows Phone 7 and more. This was my encounter with Scott – and indeed the first time I’d seen him present. (I value videos of presentations, but rarely find time to actually watch them, more’s the pity.)

Obviously I was interested in hearing about the technologies they were talking about, but I confess I was more interested in watching how Scott went about presenting. (I’ve seen Mike present before – but clearly Scott was the "big name" here. No offence meant to Mike whatsoever, who did a great job talking about Windows Phone 7.) Scott is a legend in the industry, and as I’m very interested in improving my public speaking skills, I felt I had at least as much to learn in that area as anything else.

I was really impressed. In some ways, Scott didn’t present in a way I’d expected him to… but what he did was so much better. Not having seen him before, I’d sort of expected an utterly polished sort of talk – almost like a Steve Jobs presentation. I was hoping to get some insight into what sort of polish I could add to my presentations: where does it make sense to have photos, where do simpler visuals work, where are words important? How do you present against an enormous screen without losing the audience’s focus? Do jokes enhance a presentation or detract from it?

In retrospect, this was hopelessly naïve. I think Scott’s secret sauce is actually pretty simple: he knows what he’s talking about, and talks about it honestly and openly. He’s completely authentic, obviously passionate about what he does, good humoured (we had a few bits of mild Google/Bing banter), and interested in the audience.

At almost every turn, Scott asked the audience how many of us had used a certain feature, or developed in a certain way. This was then reflected in the level at which he pitched the next section, as well as giving a few opportunities for jokes. There were questions throughout – particularly in Mike’s talk, actually – to the extent that I’d say a good quarter to a third of the time was spent answering the audience. This was a very good thing, in my view – I can’t remember finding any of the questions irrelevant or obvious (I should state for the record that I probably asked more questions than anyone else; apologies if other attendees found my questions to be boring). Questions from the audience are always a good reality check, because they’re clearly addressing real concerns rather than the ones in the speaker’s imagination. But the best thing about the questions was Scott’s way of answering, which could broadly be divided into three types of answer:

A known answer: "Yes, you can do X – and you can do Y as well. But you can’t do Z."
An unknown answer which was easily testable: "Hmm. I’m not sure. Let’s try it. Ah yes, the code does X." (There were fewer of these, just due to the nature of the questions.)
An answer which was unknown but needed further investigation: "Send me a mail and I’ll get back to you about it."

The last one is most interesting – because I have absolutely no doubt that Scott will get back to anyone who sent him a mail. (I’ve sent him two.) Now don’t forget that Scott is a Corporate Vice President (Dev Div). He’s clearly a busy man… but his openness and passion make an enormously positive impression, suggesting that he’s the kind of guy who doesn’t think of himself as being above such questions. Assuming this is what he says at all his presentations (and I suspect it is), I dread to think how much time he spends every day answering emails… but I also suspect that it’s of enormous benefit to the products for which he’s responsible, by keeping the executive level in touch with grass-roots developers.

So, what have I learned from the whole experience, in terms of presentation skills?

You can definitely give awesome presentations without fancy graphics. Content is king.
There’s no substitute for knowing your stuff, and being honest about when you don’t know the answer.
Interaction with the audience is beneficial to everyone.
Sitting down and just writing out code – particularly with audience participation to make the demo "belong" to them – is absolutely fine.
Scott’s an incredibly nice guy, and it shines through very clearly. I really hope to see him again soon.
If you speak clearly, speed doesn’t matter too much: Scott talks really fast, but is very easy to listen to.
If you lose a vital file in the middle of a presentation, check the recycle bin. It’s the virtual equivalent of checking down the back of the sofa.
Don’t worry if you have more material than you have time to present, particularly if that’s due to audience questions.

Whether I’ll be able to apply this myself remains to be seen… although I’ve already been acutely aware of how much more comfortable I am when presenting on "home topics" (e.g. C# language features) than areas where I have a lot less expertise (e.g. Reactive Extensions).

Jon Skeet's coding blog

Monthly Archives: August 2010