Breaking Liskov

March 16, 2009 jonskeet 31 Comments

Very recently, Barbara Liskov won the Turing award, which makes it a highly appropriate time to ponder when it’s reasonable to ignore her most famous piece of work, the Liskov Substitution (or Substitutability) Principle. This is not idle speculation: I’ve had a feature request for MiscUtil. The request makes sense, simplifies the code, and is good all round – but it breaks substitutability and documented APIs.

The substitutability principle is in some ways just common sense. It says (in paraphrase) that if your code works for some base type T, it should be able to work with subtype of T, S. If it doesn’t, S is breaking substitutability. This principle is at the heart of inheritance and polymorphism – I should be able to use a Stream without knowing the details of what its underlying storage is, for example.

Liskov’s formulation is:

Let q(x) be a property provable about objects x of type T. Then q(y) should be true for objects y of type S where S is a subtype of T.

So, that’s the rule. Sounds like a good idea, right?

Breaking BinaryReader’s contract

My case in point is EndianBinaryReader (and EndianBinaryWriter, but the arguments will all be the same – it’s better to focus on a single type). This is simply an equivalent to System.IO.BinaryReader, but it lets you specify the endianness to use when converting values.

Currently, EndianBinaryReader is a completely separate class to BinaryReader. They have no inheritance relationship. However, as it happens, BinaryReader isn’t sealed, and all of the appropriate methods are virtual. So, can we make EndianBinaryReader derive from BinaryReader and use it as a drop-in replacement? Well… that’s where the trouble starts.

There’s no difficulty technically in doing it. The implementation is fairly straightforward – indeed, it means we can drop a bunch of methods from EndianBinaryReader and let BinaryReader handle it instead. (This is particularly handy for text, which is fiddly to get right.) I currently have the code in another branch, and it works fine.

And I would have gotten away with it if it weren’t for that pesky inheritance…

The problem is whether or not it’s the right thing to do. To start with, it breaks Liskov’s substitutability principle, if the “property” we consider is “the result of calling ReadInt32 when the next four bytes of the underlying stream are 00, 00, 00, 01” for example. Not having read Liskov’s paper for myself (I really should, some time) I’m not sure whether this is the intended kind of use or not. More on that later.

The second problem is that it contradicts the documentation for BinaryReader. For example, the docs for ReadInt32 state: “BinaryReader reads this data type in little-endian format.” That’s a tricky bit of documentation to understand precisely – it’s correct for BinaryReader itself, but does that mean it should be true for all subclasses too?

When I’ve written in various places about the problems of inheritance, and why if you design a class to be unsealed that means doing more design work, this is the kind of thing I’ve been talking about. How much detail does it make sense to specify here? How much leeway is there for classes overriding ReadInt32? Could a different implementation read a “compressed” Int32 instead of always reading four bytes, for example? Should the client care, if they make sure they’ve obtained an appropriate BinaryReader for their data source in the first place? This is basically the same as asking how strictly we should apply Liskov’s substitutability principle. If two types are the same in every property, surely we can’t distinguish between them at all.

I wonder whether most design questions of inheritance basically boil down to defining which properties should obey Liskov’s substitutability principle and which needn’t, for the type you’re designing. Of course, it’s not just black and white – there will always be exceptions and awkward points. Programming is often about nuance, even if we might wish that not to be the case.

Blow it, let’s do it anyway…

Coming back to BinaryReader, I think (unless I can be persuaded otherwise) that the benefits from going against the documentation (and strict substitutability) outweigh the downsides. In particular, BinaryReaders don’t tend to be passed around in my experience – the code which creates it is usually the code which uses it too, or it’s at least closely related. The risk of breaking code by passing it a BinaryReader using an unexpected endianness is therefore quite low, even though it’s theoretically possible.

So, am I miles off track? This is for a class library, after all – should I be more punctilious about playing by the rules? Or is pragmatism the more important principle here?

31 thoughts on “Breaking Liskov”

duniho says:

March 16, 2009 at 1:39 pm

Well, to me at least part of the question is unanswered in your article. What _are_ the benefits? You’ve only mentioned the advantage of having to define fewer members in your EndianBinaryReader. Is that really that huge an advantage? It’s not like you have to _implement_ those members…you can just delegate to a contained BinaryReader.

Often another benefit of sub-class is so that you _can_ pass around the instance as a base class instead of the more-derived type. But is that a benefit here? After all, it seems to me a primary argument proposed here in favor of “Breaking Liskov” is that you _don’t_ expect that to happen. So presumably that advantage doesn’t apply.

I think there’s a good argument that the basic design of BinaryReader (and BinaryWriter) is flawed from the outset, that it should _not_ have been endian-specific to start with. After all, in some sense it’s at least as similar to TextReader as it is to StreamReader or StringReader, and TextReader is storage/encoding-agnostic even though that’s exactly what the sub-classes are handling.

But does that justify breaking it now?

I’m not one to say you have to follow a given rule 100% of the time. Part of programming (or any endeavor) is knowing when the rules don’t apply and need to be broken. But, there should also be a strong win in doing so. I’m not sure you’ve made your case here, focusing almost exclusively on why you think the disadvantages are mitigated, rather than on what the advantages are, never mind why they might be compelling.

LikeLike

Reply
skeet says:

March 16, 2009 at 2:00 pm

Thanks for the comments Peter, and I do take your point. While I’d expect the “using” code to be *near* the “creating” code, they don’t have to be in *exactly* the same place – and allowing something to be passed as just a BinaryReader could simplify things, so long as the user didn’t expect the *exact* behaviour of BinaryReader.

I’d be interested to see the main benefits myself – this all came about because of a “customer” for MiscUtil who requested the change to simplify *their* codebase…

(And yes, I’d agree that BinaryReader/Writer is flawed in its design…)

Jon

LikeLike

Reply
Abraham Pinzur says:

March 16, 2009 at 2:07 pm

Jon, I think your case at hand may actually be beside the point. In my mind, because Liskov’s principle focuses on *objects*, the construction mechanism would be outside its purview. And since the BinaryReader’s input is provided through its constructor, your derived type’s differing behavior is not actually observable to clients of the *object* itself.

Does that make any sense?

LikeLike

Reply
Duncan Godwin says:

March 16, 2009 at 2:17 pm

Is what you are suggesting really that different from the issues surrounding the steam class where you some times can’t set position or seek? The stream class in this case has CanSeek etc to try and help with the violations.

I seem to remember reading an article with Joshua Bloch discussing the design of the collection classes in Java and how some method occasionally had to throw a not implemented exception. The idea as to avoid combinational explosion on interfaces etc. As with all design there are tradeoffs.

LikeLike

Reply
skeet says:

March 16, 2009 at 2:25 pm

@Abraham: I see what you mean, but the observable behaviour *is* different – because you have access to the base stream. You can read from the stream, rewind the stream to the point you started at, read with BinaryReader and observe the result.

In other words, when the underlying stream is seekable you can (just with public members) determine the endianness.

LikeLike

Reply
skeet says:

March 16, 2009 at 2:26 pm

@Duncan: Yes, I totally agree there are trade-offs. I think this is one of those areas where computer science purity and pragmatism sometime clash… I just thought it might be interesting to look at one such case in a small amount of detail :)

LikeLike

Reply
C. Watford says:

March 16, 2009 at 2:49 pm

I think you *may* be ok in this regard as MS seems to have left BinaryReade/Writer unsealed. I would take this as an invitation to overload the Binary* classes. Why would you want a downstream library which takes a BinaryReader to have to care about how the BinaryReader gets the Int32LE? All your library has to do is provide the Int32LE like the original BinaryReader specifies.

An example that I have dealt with is a downstream library wants a BinaryReader but is unaware the floating point data is VAX Floats with PDP Endianess. I should be allowed to overload BinaryReader to provide a seamless experience for the downstream library I may not have the ability to change!

Now, whether or not this will cause headaches with a downstream .Net version I couldn’t say. Just that by not sealing the classes MS has left them open to subclassing, meaning they at the very least invited somebody to subclass.

LikeLike

Reply
Abraham Pinzur says:

March 16, 2009 at 2:59 pm

@Jon: You’re right – that BaseStream property changes everything; but on the other hand, could you not override BaseStream to return a read-only wrapper Stream (CanSeek = false)?
=)

BTW, I’m nit-picking your example, but I agree with your main point. Mistakes in the design of unsealed classes can most certainly force unfortunate trade-offs between OO principles and economic considerations.

LikeLike

Reply
Ed Blackburn says:

March 16, 2009 at 3:00 pm

Forgive me if I’ve misunderstood, but can’t you have both?

Keep your current Reader, which complies with LSP, also have your different implementation.

EndianBinaryReader and EndianBinaryReaderDecorator or EndianBinaryReaderImpl

You only really need one “real” implementation, the other will simply be a decorator. You could possibly introduce a further abstraction / interface for your own Reader implementation to enforce Liskov for your own footprint? If you see a flaw fix it, this isn’t Java, you won’t be putting MiscUtil for .Net 4.0 – will you(!) :o

Ultimately it’s a principle not a commandment :D

LikeLike

Reply
David Nelson says:

March 16, 2009 at 3:06 pm

The endian-ness of the BinaryReader really has to be specified in the documentation, otherwise it would useless. I agree with Peter, there should have been an implementation-agnostic base class analogous to TextReader. Unfortunately that ship has sailed. (Interestingly, TextReader is part of the ECMA specification for the BCL, while BinaryReader is not. I wonder if endian-ness had something to do with that?)

While I can see certain theoretical advantages of being able to treat EndianBinaryReader like BinaryReader, I also think the LSP exists for a good reason. Someone somewhere has written code that depends on the endian-ness of BinaryReader, and someone somewhere is going to use EndianBinaryReader instead (because after all it inherits from BinaryReader, so it is substitutable, right?), and something is going to break, and someone is probably going to blame you :) I would have to be convinced that there were significant advantages to the inheritance relationship before I would approve such a change, and even then I wouldn’t feel comfortable with it.

If you do decide to do it, make sure you document it in BIG BOLD LETTERS, so at least you have some CYA.

LikeLike

Reply
James Curran says:

March 16, 2009 at 3:42 pm

I think the real problem is trying for theortic perfection, given an imperfect starting position.
In a perfect world, BinaryWriter & EndianBinaryWriter would be sibling leaf nodes, both inheriting for a “missing link”, let’s call it “IBinaryWriter”. The fact that IBinaryWriter doesn’t exist is MSFT’s error, not yours.

Second, the fact that you make use of some of the internals of BinaryWriter is an irrelevant implementation detail. The fact that it can then be used in place of BinaryWriter, is a hack (by the users of EndianBinaryWriter, not you), made necessary by the lack of IBinaryWriter, which that should have been using in place of BinaryWriter in the first place.

On a more practical level, BinaryWriter & EndianBinaryWriter appear to have exact the same relationship as TextWriter & HtmlTextWriter.

BTW, on a completely different topic, this past Saturday (3/14), The high IQ crowd here in the USA celebrated “Pi Day”. Did the UK as well, or do European geeks hold out until the 3rd day of the 14th months?

LikeLike

Reply
Rik Hemsley says:

March 16, 2009 at 4:05 pm

Your EndianDataReader is not a DataReader. It works similarly to a DataReader, but it is not a subclass. As you have said, it breaks the substitutability principle.

It is very easy to fix the model. Either put EndianDataReader alongside DataReader, so it performs a similar function, or turn it into an adapter, which contains a DataReader.

Also, everything Peter said!

LikeLike

Reply
ShuggyCoUk says:

March 16, 2009 at 6:16 pm

The existing API is screwed by not supplying IReader which doesn’t care about what underlying format is but has a complementary IWriter interface implementation whose output it will correctly read.

Seeing this you can then see the problem with extending any implementation however. The *pairing* no longer holds true, something writer by a BinaryWriter is NOT readabe correctly by an EndianBinaryReader…
This is more of a serious flaw.

If methods are coded to take the interfaces then they make it clear that they delegate responsibility for maintaining the pairing to someone else.

This screams out for a utility class which can deal with arbitrary byte order work as a private field (static if it’s tread safe) in each class to hold the common code but having each writer a separate (and sealed) class.

Obviously you have no interfaces to work with here so you’re basically shafted in terms of doing it cleanly…

LikeLike

Reply
skeet says:

March 16, 2009 at 6:55 pm

Thanks for all the comments folks – I’ll look in detail tomorrow morning. Two very quick comments:

1) Pairing Reader/Writer: that doesn’t mean that Encoder and Decoder are flawed classes, does it? You have to use the same encoding for both of them for it to make sense. I have no problem with that.

2) Endianness having to be specified for the class to make sense: better tell that to the designers of BitConverter, which has the IsLittleEndian property to say which endianness it’s using :) (So you can’t use BitConverter to encode on one computer and *assume* that you can decode on another.)

Will look at the rest tomorrow. Boy it’s good to have smart readers who care enough to write detailed comments like this.

LikeLike

Reply
Steven A. Lowe says:

March 16, 2009 at 7:57 pm

Hi Jon,

The spirit of Liskov subsitution is de facto preserved by the inheritance of the superclass’ protocol.

Think about it this way: if subclasses were required to preserve every detail of the operation of their superclass, then inheritance would be strictly additive, i.e. overrides would be forbidden. Taken to the extreme, it would not even be additive, for the fact that class A has 10 methods is a “provable property” so subclassing it to add an 11th method arguably violates this – which is of course silly…

What you’re describing is an override. Reading on a bit in the wikipedia article we see “if S is a subtype of T, then objects of type T in a program may be replaced with objects of type S without altering any of the desirable properties of that program.” The key word here is DESIRABLE.

If someone wants little-endian behavior, they use the superclass. If they want big-endian behavior they use your new subclass. The old behavior is not desirable, the new behavior is desirable.

Implementing an entirely new class without inheriting from BinaryReader is wasteful and redundant and unnecessary. And also redundant and wasteful. And unnecessary. :-P

If this makes you nervous, perhaps you should email Ms. Liskov and get her permission ;-)

http://www.pmg.lcs.mit.edu/~liskov/

What’s the big deal?

LikeLike

Reply
configurator says:

March 16, 2009 at 9:45 pm

Here are my thoughts after reading this post:
“No! What the @#$% are you doing! BinaryReader and EndianBinaryReader should share a base, and be siblings in the inheritance tree, both inheriting from… Wait a second, BinaryReader doesn’t have a base class, nor an interface. Hmm.
The only way to make it interchangeable with anything is if that thing (EndianBinaryReader) inherits from it. Making it interchangeable is very recommended. The problem here is that the framework doesn’t supply an abstract base class or an interface which you can inherit, so you must inherit BinaryReader. But you should definitely do it, otherwise all code that could accept both would have to be duplicated, which is definitely a code smell. A code smell generated by your class library.”

LikeLike

Reply
Jeff Yates says:

March 17, 2009 at 7:37 am

If you’re really concerned about this, you could create your own BinaryReader and BinaryWriter base classes that don’t specify endianness and then wrap the BCL BinaryReader.

LikeLike

Reply
Mark Brackett says:

March 17, 2009 at 8:37 am

The *remarks* state that it’s read in little-endian format. To me, that’s an implementation detail.

The contract says that it reads an Int32, and my internal least-surprise-o-meter says that it should read that Int32 in whatever endianness it was written in.

AFAIC, no LSP violation to see here…move along.

LikeLike

Reply
Jorge says:

March 17, 2009 at 9:13 am

The problem here, I think, comes from the fact that BinaryReader is NOT a BinaryReader, the whole name is missleading.

If the class name was LittleEndianBinaryReader then you won’t even be thinking about inheriting from it, you’ll have, as some have suggested a common base class.

But the lack of a proper name gives the impression that BinaryReader provides an abstraction to read binary data disregarding the byte ordering… Well it doesn’t.

LikeLike

Reply
David Nelson says:

March 17, 2009 at 9:58 am

“Endianness having to be specified for the class to make sense: better tell that to the designers of BitConverter, which has the IsLittleEndian property to say which endianness it’s using :) (So you can’t use BitConverter to encode on one computer and *assume* that you can decode on another.)”

The endianness of BitConverter is still documented, it is just documented at runtime instead of design time, which I would argue is actually less useful. What are you going to do if you check IsLittleEndian at runtime and it doesn’t match the endianness of your bits? Its not like you have the option of changing the endianness to suit your needs. At least with BinaryReader, you know ahead of time which way it goes, and can make decisions appropriately.

My point was, leaving the endianness unspecified for the sake of having a looser contract would make the class useless, since you would never know if it matches your stream. Obviously if they had built the ability to switch endianness into the BinaryReader in the first place, then they wouldn’t have to document a particular endianness; but then we wouldn’t need your EndianBinaryReader, and we wouldn’t be having this conversation :)

On the other hand, I can see Stephen A. Lowe’s point (which I think is also Mark Brackett’s point in a roundabout way). While the little-endianness of BinaryReader is documented, that doesn’t necessarily mean that it is part of the contract of the type as a superclass. The question is, are libraries which consume BinaryReader likely to rely on its endianness? If they do, and you have a big-endian stream, then there is already a mismatch. On the other hand, if a library doesn’t rely on the endianness of the reader, then providing an EndianBinaryReader makes it possible to use that library with a big-endian stream, where it wasn’t possible before. Based on that, I am now inclined to say that deriving EndianBinaryReader from BinaryReader is the right thing to do (although I still say that explicit documentation is necessary).

LikeLike

Reply
skeet says:

March 17, 2009 at 10:04 am

I still haven’t got round to reading and fully absorbing these comments, but the *rate* of them is incredible. I nearly didn’t write this post – I wasn’t sure anyone would be interested. Golly.

LikeLike

Reply
Einar W. Høst says:

March 17, 2009 at 10:24 am

I agree with Peter and Rik.

LSP is really an assertment that what we in our class-based OO languages somewhat inappropriately call “inheritance” really is a logical relation between categories. I do not “inherit” anything from man, I _am_ a man. (For an interesting discussion, see Derek Rayside’s paper “An Aristotelian Understanding of Object Oriented Programming”.)

Note that Steven’s amusing example of the “has 10 methods” property could (and should) be considered an accidental property in Aristotelian terms. It is the essential properties we need to worry about. Since the docs explicitly state that BinaryReader reads little-endian data, I would say it is an essential property.

“Inheritance” may be a more appropriate metaphor with respect to prototype-based OO languages?

LikeLike

Reply
Daniel Earwicker says:

March 17, 2009 at 11:24 am

I’ve always interpreted Liskov’s reference to “property provable about objects” to mean a set of provable properties that the software designer has to define for themselves, based on what they believe to be necessary, in whatever contexts are applicable. It can’t possibly mean every possible provable property – should GetType().Name return the same thing on the subclass as it does on the superclass?

LikeLike

Reply
Kristian Freed says:

March 17, 2009 at 12:23 pm

Apologies if I duplicate someones answer, I think most views have been expressed. People (rightly) complain about the BinaryReader both declaring an interface (non-sealed with virtual members that very well might be implemented in more than one way) and implementing a specific behaviour. You can’t change that fact, but you can roll your own IBinaryReader with the same methods, have your new reader class implement it and let LittleEndianBinaryReader : BinaryReader, IBinaryReader, with no overrides.

Code that is using BinaryReader now can be easily be refactored to use the interface instead, third party libraries that take a BinaryReader should not be trusted to do the right thing if given an implementation different from the document behaviour, so not being able to pass yours to them is no loss. The only case not covered as nicely as could possibly be is when you receive a BinaryReader from third party code. If this really bothers you, I guess you should let LittleEndianBinaryReader wrap a BinaryReader instead of deriving from it, with an implicit cast in either direction.

LikeLike

Reply
David Nelson says:

March 17, 2009 at 1:09 pm

@Einar

“The docs explicitly state” can’t be used as the test for what is and isn’t an accidental property. After all, the docs explicitly state that there are X properties and X methods, too! Not to mention that much of what is documented in MSDN are implementation details, not contract requirements. Unfortunately I think this comes down to a judgment call.

@Kristian,

“third party libraries that take a BinaryReader should not be trusted to do the right thing if given an implementation different from the document behaviour, so not being able to pass yours to them is no loss”

This is precisely the case that is most interesting when discussing LSP and inherited behavior. If you toss this case out the window, none of the rest of it really matters. The question is, should a library X define a class which may break the contract of the supertype, but which otherwise has significant benefits, given that the user of X may want to pass an instance of that class to library Y?

LikeLike

Reply
Einar W. Høst says:

March 17, 2009 at 2:08 pm

@David

I agree that it can’t be used as a test, but surely it is something to inform that judgment call you mention? (Also, are you sure the number of methods and properties are explicitly stated, not implicitly (as in: you do the addition)? If so, then the docs are of questionable quality indeed!)

LikeLike

Reply
Olof Bjarnason says:

March 17, 2009 at 4:22 pm

Is it just me or wouldn’t Liskovs formulation, taken literally, mean that the base type and the inherited type are identical..? It looks like the definition of mathematical/logical equality to me.

I guess it would be fruitful to know the exact details surrounding that formulation, for example what “provable property” means.

LikeLike

Reply
Wilka says:

March 27, 2009 at 11:58 am

Instead of creating a EndianBinaryReader, why not make a new Stream derived type (EndianStream) to use with the existing BinaryReader?

LikeLike

Reply
skeet says:

March 27, 2009 at 12:19 pm

@Wilka: No, not reliably.

For instance, suppose BinaryReader decided to read an Int32 by reading two bytes twice – how would I know that was happening and do the appropriate reversal?

A stream is just a sequence of bytes – it only has endianness when you impose some sort of structure on it from above. That’s what BinaryReader does, but that means anything at a lower level than BinaryReader can’t guess at that structure.

LikeLike

Reply
Mats Helander says:

May 12, 2009 at 6:44 am

Kristian Freed’s comment FTW!

@David, Re: your objections to Kristian,

You are of course right in the abstract case, but Kristian’s comment is correct in relation to this concrete case when the designers of the .NET BCL *have* screwed up.

Since MS only made it possible for 3-rd party developers to bind their code directly to an endian-specific implementation, that’s what has to be assumed they have done (and that’s why MS should have provided an IBinaryReader interface and called their BinaryReader LittleEndianBinaryReader).

As it is, any 3-rd party developer not wishing to bind thus poorly would have to follow Kristian’s suggestions and export their own IBinaryReader interface.

What Kristian describes is the right workaround for when somebody forgets to deliver an abstract type for their implementation. You put a type that should have been further down in a type hierarchy at its proper place in a new, correct type hierarchy.

LikeLike

Reply
Trap says:

May 20, 2009 at 9:04 am

Hi Jon.

I like EndianBinaryReader not being inherited from BinaryReader, because imho and as someone else said the latter is a poorly named LittleEndianBinaryReader. To my eyes it’d be like inheriting Cat from Dog instead of Animal, and try to implement Cat by tweaking a Dog’s behaviour. (I don’t know if this makes any sense). If this were the case, poor naming, I wouldn’t have bothered implementing an EndianBinaryReader, just its complementary class, a well named BigEndianBinaryReader.

LikeLike

Reply

Jon Skeet's coding blog

Breaking Liskov

Breaking BinaryReader’s contract

And I would have gotten away with it if it weren’t for that pesky inheritance…

Blow it, let’s do it anyway…

31 thoughts on “Breaking Liskov”

Leave a comment Cancel reply

Breaking BinaryReader’s contract

And I would have gotten away with it if it weren’t for that pesky inheritance…

Blow it, let’s do it anyway…

Share this:

Related

31 thoughts on “Breaking Liskov”

Leave a comment Cancel reply