Stack Overflow Culture

This blog post was most directly provoked by this tweet from my friend Rob Conery, explaining why he’s giving up contributing on Stack Overflow.

However, it’s been a long time coming. A while ago I started writing a similar post, but it got longer and longer without coming to any conclusion. I’m writing this one with a timebox of one hour, and then I’ll post whatever I’ve got. (I may then reformat it later.)

I’m aware of the mixed feelings many people have about Stack Overflow. Some consider it to be completely worthless, but I think more people view it as “a valuable resource, but a scary place to contribute due to potential hostility.” Others contribute on a regular basis, occasionally experiencing or witnessing hostility, but generally having a reasonable time.

This post talks about my experiences and my thoughts on where Stack Overflow has a problem, where I disagree with some of the perceived problems, and what can be done to improve the situation. This is a topic I wish I’d had time to talk about in more detail with the Stack Overflow team when I visited them in New York in February, but we were too busy discussing other important issues.

For a lot of this post I’ll talk about “askers” and “answerers”. This is a deliberate simplification for the sake of, well, simplicity. Many users are both askers and answerers, and a lot of the time I’ll write comments with a view to being an answerer, but without necessarily ending up writing an answer. Although any given user may take on different roles even in the course of an hour, for a single post each person usually has a single role. There are other roles of course “commenter on someone else’s answer” for example – I’m not trying to be exhaustive here.

Differences in goals and expectations

Like most things in life, Stack Overflow works best when everyone has the same goal. We can all take steps towards that goal together. Conversely, when people in a single situation have different goals, that’s when trouble often starts.

On Stack Overflow, the most common disconnect is between these two goals:

  • Asker: minimize the time before I’m unblocked on the problem I’m facing
  • Answerer: maximize the value to the site of any given post, treating the site as a long-lasting resource

In my case, I have often have a sub-goal of “try to help improve the diagnostic skill of software engineers so that they’re in a better position to solve their own problems.”

As an example, consider this question – invented, but not far-fetched:

Random keeps giving me the same numbers. Is it broken?

This is a low-quality question, in my view. (I’ll talk more about that later.) I know what the problem is likely to be, but to work towards my goal I want the asker to improve the question – I want to see their code, the results etc. If I’m right about the problem (creating multiple instances of System.Random in quick succession, which will also use the same system-time-based seed), I’d then almost certainly be able to close the question as a duplicate, and it could potentially be deleted. In its current form, it provides no benefit to the site. I don’t want to close the question as a duplicate without seeing that it really is a duplicate though.

Now from the asker’s perspective, none of that is important. If they know that I have an idea what the problem might be, their perspective is probably that I should just tell them so they can be unblocked. Why take another 10 minutes to reproduce the problem in a good question, if I can just give them the answer now? Worse, if they do take the time to do that and then I promptly close their question as a duplicate, it feels like wasted time.

Now if I ignored emotions, I’d argue that the time wasn’t wasted:

  • The asker learned that when they ask a clearer question, they get to their answer more quickly. (Assuming they follow the link to the duplicate and apply it.)
  • The asker learned that it’s worth searching for duplicate questions in their research phase, as that may mean they don’t need to ask the question at all.

But ignoring emotions is a really bad idea, because we’re all human. What may well happen in that situation – even if I’ve been polite throughout – is that the asker will decide that Stack Overflow is full of “traffic cop” moderators who only care about wielding power. I could certainly argue that that’s unfair – perhaps highlighting my actual goals – but that may not change anyone’s mind.

So that’s one problem. How does the Stack Overflow community agree what the goal of site is, and then make that clearer to users when they ask a question? It’s worth noting that the tour page (which curiously doesn’t seem to be linked from the front page of the site any more) does include this text:

With your help, we’re working together to build a library of detailed answers to every question about programming.

I tend to put it slightly differently:

The goal of Stack Overflow is to create a repository of high-quality questions, and high-quality answers to those questions.

Is that actually a shared vision? If askers were aware of it, would that help? I’d like to hope so, although I doubt that it would completely stop all problems. (I don’t think anything would. The world isn’t a perfect place.)

Let’s move onto another topic where I disagree with some people: low-quality questions.

Yes, there are low-quality questions

I assert that even if it can’t be measured in a totally objective manner, there are high-quality questions and low-quality questions (and lots in between).

I view a high-quality question in the context of Stack Overflow as one which:

  • Asks a question, and is clear in what it’s asking for. It should be reasonably obvious whether any given attempted answer does answer the question. (That’s separate to whether the answer is correct.)
  • Avoids irrelevancies. This can be hard, but I view it as part of due diligence: if you’re encountering a problem as part of writing a web-app, you should at least try to determine whether the context of a web-app is relevant to the problem.
  • Is potentially useful to other people. This is where avoiding irrelevant aspects is important. Lots of people need to parse strings as dates; relatively few will need to parse strings as dates using framework X version Y in conjunction with a client written in COBOL, over a custom and proprietary network protocol.
  • Explains what the asker has already tried or researched, and where they’ve become stuck.
  • Where appropriate (which is often the case) contains a minimal example demonstrating the problem.
  • Is formatted appropriately. No whole-page paragraphs, no code that’s not formatted as code, etc.

There are lots of questions which meet all those requirements, or at least most of them.

I think it’s reasonable to assert that such a question is of higher quality than a question which literally consists of a link to a photo of a homework assignment, and that’s it. Yes, I’ve seen questions like that. They’re not often quite that bad, but if we really can’t agree that that is a low-quality question, I don’t know what we can agree on.

Of course, there’s a huge spectrum in between – but I think it’s important to accept that there are such things as low-quality questions, or at least to debate it and find out where we disagree.

Experience helps write good questions, but isn’t absolutely required

I’ve seen a lot of Meta posts complaining that Stack Overflow is too hard on newcomers, who can’t be expected to write a good question.

I would suggest that a newcomer who accepts the premise of the site and is willing to put in effort is likely to be able to come up with at least a reasonable question. It may take them longer to perform the research and write the question, and the question may well not be as crisp as one written by a more experienced developer in the same situation, but I believe that on the whole, newcomers are capable of writing questions of sufficient quality for Stack Overflow. They may not be aware of what they need to do or why, but that’s a problem with a different solution than just “we should answer awful questions which show no effort because the asker may be new to tech”.

One slightly separate issue is whether people have the diagnostic skills required to write genuinely good questions. This is a topic dear to my heart, and I really wish I had a good solution, but I don’t. I firmly believe that if we can help programmers become better at diagnostics, then that will be of huge benefit to them well beyond asking better Stack Overflow questions.

Some regular users behave like jerks on Stack Overflow, but most don’t

I’m certainly not going to claim that the Stack Overflow community is perfect. I have seen people being rude to people asking bad questions – and I’m not going to excuse that. If you catch me being rude, call me out on it. I don’t believe that requesting improvements to a question is rude in and of itself though. It can be done nicely, or it can be done meanly. I’m all for raising the level of civility on Stack Overflow, but I don’t think that has to be done at the expense of site quality.

I’d also say that I’ve experienced plenty of askers who react very rudely to being asked for more information. It’s far from one way traffic. I think I’ve probably seen more rudeness in this direction than from answerers, in fact – although the questions usually end up being closed and deleted, so anyone just browsing the site casually is unlikely to see that.

My timebox is rapidly diminishing, so let me get to the most important point. We need to be nicer to each other.

Jon’s Stack Overflow Covenant

I’ve deliberately called this my covenant, because it’s not my place to try to impose it on anyone else. If you think it’s something you could get behind (maybe with modifications), that’s great. If Stack Overflow decides to adopt it somewhere in the site guidelines, they’re very welcome to take it and change it however they see fit.

Essentially, I see many questions as a sort of transaction between askers and answerers. As such, it makes sense to have a kind of contract – but that sounds more like business, so I’d prefer to think of a covenant of good faith.

As an answerer, I will…

  • Not be a jerk.
  • Remember that the person I’m responding to is a human being, with feelings.
  • Assume that the person I’m responding to is acting in good faith and wants to be helped.
  • Be clear that a comment on the quality of a question is not a value judgement on the person asking it.
  • Remember that sometimes, the person I’m responding to may feel they’re being judged, even if I don’t think I’m doing that.
  • Be clear in my comments about how a question can be improved, giving concrete suggestions for positive changes rather than emphasizing the negative aspects of the current state.
  • Be clear in my answers, remembering that not everyone has the same technical context that I do (so some terms may need links etc).
  • Take the time to present my answer well, formatting it as readably as I can.

As an asker, I will…

  • Not be a jerk.
  • Remember that anyone who responds to me is a human being, with feelings.
  • Assume that any person who responds to me is acting in good faith and trying to help me.
  • Remember that I’m asking people to give up their time, for free, to help me with a problem.
  • Respect the time of others by researching my question before asking, narrowing it down as far as I can, and then presenting as much information as I think may be relevant.
  • Take the time to present my question well, formatting it as readably as I can.

I hope that most of the time, I’ve already been following that. Sometimes I suspect I’ve fallen down. Hopefully by writing it out explicitly, and then reading it, I’ll become a better community member.

I think if everyone fully took something like this on board before posting anything on Stack Overflow, we’d be in a better place.

Implementing IXmlSerializable in readonly structs

Background

There are three things you need to know to start with:

Operations on read-only variables which are value types copy the variable value first. I’ve written about this before on this blog. C# 7.2 addresses this by introducing the readonly modifier for structs. See the language proposal for more details. I was touched to see that the proposal references my blog post :)

The ref readonly local variables and in parameter features in C# 7.2 mean that “read-only variables” are likely to be more common in C# than they have been in the past.

Noda Time includes many value types which implement IXmlSerializable. Noda Time implements IXmlSerializable.ReadXml by assigning to this: fundamentally IXmlSerializable assumes a mutable type. I use explicit interface implementation to make this less likely to be used directly on an unboxed variable. With a generic method using an interface constraint you can observe a simple method call mutating a non-read-only variable, but that’s generally harmless.

Adding the readonly modifier to Noda Time structs

I relished news of the readonly modifier for struct declarations. At last, I can remove my ReadWriteForEfficiency attribute! Hmm. Not so much. To be fair, some of the structs (7 out of 18) were fine. But every struct that implements IXmlSerializable gave me this error:

Cannot assign to ‘this’ because it is read-only

That’s reasonable: in members of a readonly struct, this effectively becomes an in parameter instead of a ref parameter. But how can we fix that? Like any sensible developer, I turned to Stack Overflow, which has precisely the question I’m interested in. It even has an answer! Unfortunately, it amounts to a workaround: assigning to this via unsafe code.

Violating readonly with unsafe code

To give a concrete example of the answer from Stack Overflow, here’s my current LocalTime code:

void IXmlSerializable.ReadXml([NotNull] XmlReader reader)
{
    Preconditions.CheckNotNull(reader, nameof(reader));
    var pattern = LocalTimePattern.ExtendedIso;
    string text = reader.ReadElementContentAsString();
    this = pattern.Parse(text).Value;
}

Here’s the code that compiles when LocalTime is marked as readonly, after enabling unsafe blocks:

unsafe void IXmlSerializable.ReadXml([NotNull] XmlReader reader)
{
    Preconditions.CheckNotNull(reader, nameof(reader));
    var pattern = LocalTimePattern.ExtendedIso;
    string text = reader.ReadElementContentAsString();
    fixed (LocalTime* thisAddr = &this)
    {
        *thisAddr = pattern.Parse(text).Value;
    }
}

Essentially, the unsafe code is bypassing the controls around read-only structs. Just for kicks, let’s apply the same change
throughout Noda Time, and think about what would happen…

As it happens, that fix doesn’t work universally: ZonedDateTime is a “managed type” because it contains a reference (to a DateTimeZone) which means you can’t create a pointer to it. That’s a pity, but if we can make everything else readonly, that’s a good start. Now let’s look at the knock-on effects…

Safe code trying to abuse the unsafe code

Let’s try to abuse our “slightly dodgy” implementations. Here’s a class we’ll put in a “friendly” assembly which is trying to be as helpful as possible:

using NodaTime;

public class AccessToken
{
    private readonly Instant expiry;
    public ref readonly Instant Expiry => ref expiry;

    public AccessToken(Instant expiry) => this.expiry = expiry;
}

Great – it lets you get at the expiry time of an access token without even copying the value.

The big test is: can we break this friendly’s code’s assumptions about expiry really not changing its value?

Here’s code I expected to mutate the access token:

static void MutateAccessToken(AccessToken accessToken)
{
    ref readonly Instant expiry = ref accessToken.Expiry;
    string xml = "<evil>2100-01-01T00:00:00Z</evil>";
    EvilReadXml(in expiry, xml);
}

static void EvilReadXml<T>(in T value, string xml) where T : IXmlSerializable
{
    var reader = XmlReader.Create(new StringReader(xml));
    reader.Read();
    value.ReadXml(reader);
}

We have an in parameter in EvilReadXml, so the expiry variable is being passed by reference, and then we’re calling ReadXml on that parameter… so doesn’t that mean we’ll modify the parameter, and thus the underlying expiry field in the object?

Nope. Thinking about it, the compiler doesn’t know when it compiles EvilReadXml that T is a readonly struct – it could be a regular struct. So it has to create a copy of the value before calling ReadXml.

Looking the the spec proposal, there’s one interesting point in the section on ref extension methods:

However any use of an in T parameter will have to be done through an interface member. Since all interface members are considered mutating, any such use would require a copy.

Hooray! That suggests it’s safe – at least for now. I’m still worried though: what if the C# team adds a readonly struct generic constraint in C# 8? Would that allow a small modification of the above code to break? And if interface methods are considered mutating anyway, why doesn’t the language know that when I’m trying to implement the interface?

But hey, now that we know unsafe code is able to work around this in our XML implementation, what’s to stop nasty code from using the same trick directly?

Unsafe code abusing safe code

Imagine we didn’t support XML serialization at all. Could unsafe code mutate AccessToken? It turns out it can, very easily:

static void MutateAccessToken(AccessToken accessToken)
{
    ref readonly Instant expiry = ref accessToken.Expiry;
    unsafe
    {
        fixed (Instant* expiryAddr = &expiry)
        {
            *expiryAddr = Instant.FromUtc(2100, 1, 1, 0, 0);
        }
    }
}

This isn’t too surprising – unsafe code is, after all, unsafe. I readily admit I’m not an expert on .NET security, and I know the
landscape has changed quite a bit over time. These days I believe the idea of a sandbox has been somewhat abandoned – if you care about executing code you don’t really trust, you do it in a container and use that as your security boundary. That’s about the limit of my knowledge though, which could be incorrect anyway.

Where do we go from here?

At this point, I’m stuck making a choice between several unpleasant ones:

  • Leave Noda Time public structs as non-read-only. That prevents users from writing efficient code to use them without copying.
  • Remove XML serialization from Noda Time 3.0. We’re probably going to remove binary serialization anyway on the grounds that Microsoft is somewhat-discouraging it going forward, and it’s generally considered a security problem. However, I don’t think XML serialization is considered to be as bad, so long as you use it carefully, preventing arbitrary type loading. (See this paper for more information.)
  • Implement XML serialization using unsafe code as shown above. Even though my attempt at abusing it failed, that doesn’t provide very much comfort. I don’t know what other ramifications there might be for including unsafe code. Does that limit where the code can be deployed? It also doesn’t help with my concern about a future readonly struct constraint.

Thoughts very welcome. Reassurances from the C# team that “readonly struct” constraints won’t happen particularly welcome… along with alternative ways of implementing XML serialization.