Wacky Ideas 3: Object life-cycle support

No, don’t leave yet! This isn’t another article about non-deterministic finalization, RAII etc. That’s what we almost always think of when someone mentions the object life-cycle, but I’m actually interested in the other end of the cycle – the “near birth” end.

We often take it as read that when an object’s constructor has completed successfully, the object should be ready to use. However, frameworks and technologies like Spring and XAML often make it easier to create an object and then populate it with dependencies, configuration etc. Yes, in some cases it’s more appropriate to have a separate configuration class which is used for nothing but a bunch of properties, and then the configuration can be passed into the “real” constructor in one go, with none of the readability problems of constructors taking loads of parameters. It’s all a bit unsatisfactory though.

What we most naturally want is to say, “Create me an empty X. Now configure it. Now use it.” (Okay, and as an obligatory mention, potentially “Now make it clean up after itself.”)

While configuring the object, we don’t want to call any of the “real” methods which are likely to want to do things. We may want to be able to fetch some of the configuration back again, e.g. so that some values can be relative to others easily, but we don’t want the main business to take place. Likewise, when we’ve finished configuring the object, we generally want to validate the configuration, and after that we don’t want anyone to be able to change the configuration. Sometimes there’s even a third phase, where we’ve cleaned up and want to still be able to get some calculated results (the byte array backing a MemoryStream, for instance) but not call any of the “main” methods any more.

I’d really like some platform support for this. None of it’s actually that hard to do – just a case of keeping track of which phase you’re in, and then adding a check to the start of each method. Wouldn’t it be nicer to have it available as attributes though? Specify the “default phase” for any undecorated members, and specify which phases are valid for other members – so configuration setters would only be valid in the configuration phase, for instance. Another attribute could dictate the phase transition – so the ValidateAndInitialize method (or whatever you’d call it) would have an attribute stating that on successful completion (no exceptions thrown) the phase would move from “configure” to “use”.

Here’s a short code sample. The names and uses of the attributes could no doubt be improved, and if there were only a few phases which were actually useful, they could be named in an enum instead, which would be neat.

[Phased(defaultRequirement=2, initial=1)]
class Sample
{
    IAuthenticator authenticator;
    
    public IAuthenticator Authenticator
    {
        [Phase(1)]
        [Phase(2)]
        get
        {
            return authenticator;
        }
        [Phase(1)]
        set
        {
            authenticator = value;
        }
    }
    
    [Phase(1)]
    [PhaseTransition(2)]
    public void ValidateAndInitialize()
    {
        if (authenticator==null)
        {
            throw new InvalidConfigurationException("I need an authenticator");
        }
    }
    
    // No attributes here, so the default requirement (phase 2) applies
    public void DoSomething()
    {
        // Use authenticator, assuming it's valid
    }
    
    public void DoSomethingElse()
    {
        // Use authenticator, assuming it's valid
    }
}

Hopefully it’s obvious what you could and couldn’t do at what point.

This looks to me like a clear example of where AOP should get involved. I believe that Anders isn’t particularly keen on it, and when abused it’s clearly nightmarish – but for certain common things, it just makes life easier. The declarative nature of the above is simpler to read (IMO – particularly if names were used instead of numbers) than manually checking the state at the start of each method. I don’t know if any AOP support is on the slate for Java 7 – I believe things have been made easier for AOP frameworks by Java 6, although I doubt that any target just Java 6 yet. We shall have to see.
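
Just to make the comparison concrete, here’s a rough sketch of the same sample with hand-rolled state checking instead of attributes. (The phase names and the RequirePhase helper are invented; IAuthenticator and InvalidConfigurationException are the same hypothetical types as in the sample above.)

using System;

class ManualSample
{
    private enum Phase { Configuring, Ready }

    private Phase phase = Phase.Configuring;
    private IAuthenticator authenticator;

    public IAuthenticator Authenticator
    {
        // Reading the configuration is allowed in either phase
        get { return authenticator; }
        set
        {
            RequirePhase(Phase.Configuring);
            authenticator = value;
        }
    }

    public void ValidateAndInitialize()
    {
        RequirePhase(Phase.Configuring);
        if (authenticator == null)
        {
            throw new InvalidConfigurationException("I need an authenticator");
        }
        phase = Phase.Ready;
    }

    public void DoSomething()
    {
        RequirePhase(Phase.Ready);
        // Use authenticator, knowing it's valid
    }

    private void RequirePhase(Phase required)
    {
        if (phase != required)
        {
            throw new InvalidOperationException("Not valid in the current phase");
        }
    }
}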

One interesting question is whether you’d unit test that all the attributes were there appropriately. I guess it depends on the nature of the project, and just how thoroughly you want to unit test. It wouldn’t add any coverage, and would be hard to exhaustively test in real life, but the tests would be proving something…
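
For what it’s worth, such a test might look something like this – an NUnit-flavoured sketch which assumes the hypothetical PhaseAttribute from the sample exposes its phase number as a Phase property:

using System.Reflection;
using NUnit.Framework;

[TestFixture]
public class SamplePhaseTests
{
    [Test]
    public void AuthenticatorSetterIsConfigurationPhaseOnly()
    {
        MethodInfo setter = typeof(Sample).GetProperty("Authenticator").GetSetMethod();
        object[] attributes = setter.GetCustomAttributes(typeof(PhaseAttribute), false);

        // Exactly one [Phase] attribute on the setter, naming the
        // configuration phase (1 in the sample above)
        Assert.AreEqual(1, attributes.Length);
        Assert.AreEqual(1, ((PhaseAttribute) attributes[0]).Phase);
    }
}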

Wacky Ideas 2: Class interfaces

(Disclaimer: I’m 99% sure I’ve heard someone smarter than me talking about this before, so it’s definitely not original. I thought it worth pursuing though.)

One of the things I love about Java and C# over C/C++ is the lack of .h files. Getting everything in the right place, only doing the right things in the right files, and coping with bits being included twice etc is a complete pain, particularly if you only do it every so often rather than it being part of your everyday life.

Unfortunately, as I’ve become more interface-based, I’ve often found myself doing effectively the same thing. Java and C# make life a lot easier than C in this respect, of course, but it still means duplicating the method signatures etc. Often there’s only one implementation of the interface – or at least one initial implementation – but separating it out as an interface gives a warm fuzzy feeling and makes stubbing/mocking easier for testing.

So, the basic idea here is to extract an interface from a class definition. In the most basic form:

class interface Sample
{
    public void ThisIsPartOfTheInterface()
    {
    }
    
    public void SoIsThis()
    {
    }
    
    protected void NotIncluded()
    {
    }
    
    private void AlsoNotIncluded()
    {
    }
}

So the interface Sample just has ThisIsPartOfTheInterface and SoIsThis even though the class Sample has the extra methods.

Now, I can see a lot of cases where you would only want part of the public API of the class to contribute to the interface – particularly if you’ve got properties etc which are meant to be used from an Inversion of Control framework. This could either be done with cunning keyword use, or (to make fewer syntax changes) a new attribute could be introduced which could decorate each member you wanted to exclude (or possibly include, if you could make the default “exclude” with a class-level attribute).
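
For example – both the class interface syntax and the attribute name here are entirely made up – it might look something like this:

class interface Sample
{
    // Included in the generated interface, as before
    public void ThisIsPartOfTheInterface()
    {
    }

    // Public on the class, but kept out of the interface - perhaps a
    // setter which only an IoC framework is meant to call
    [NotPartOfInterface]
    public void SetConnectionString(string connectionString)
    {
    }
}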

So far, so good – but now we’ve got two types with the same name. What happens when the compiler runs across one of the types? Well, here’s the list of uses I can think of, and what they should do:

  • Variable declaration: Use the interface
  • Construction: Use the class
  • Array declaration/construction: Use the interface (I think)
  • typeof: Tricky. Not sure. (Note that in Java, we could use Sample.class and Sample.interface to differentiate.)
  • Type derivation: Not sure. Possibly make it explicit: “DerivedSample : class Sample” or “DerivedSample : interface Sample”
  • Generics: I think this would depend on the earlier “not sure” answers, and would almost certainly be complicated

As an example, the line of code “Sample x = new Sample();” would declare a variable x of the interface type, but create an instance of the concrete class to be its initial value.

So, it’s not exactly straightforward. It would also violate .NET naming conventions. Would it be worth it, over just using an “Extract Interface” refactoring? My gut feeling is that there’s something genuinely useful in here, but the complications do seem to overwhelm the advantages.

Perhaps the answer is not to try to have two types with the same name (which is where the complications arise) but to be able to explicitly say “I’m declaring interface ISample and implementing it in Sample” both within the same file. At that point it may be unintuitive to get to the declaration of ISample, and seeing just the members of it isn’t straightforward either.

Is this a case where repeating yourself is fundamentally necessary, or is there yet another way of getting round things that I’m missing?

Wacky Ideas 1: Inheritance is dead, long live mix-ins!

(Warning: I’ve just looked up “mix-in” on Wikipedia and their definition isn’t quite what I’m used to. Apologies if I’m using the wrong terminology. What I think of as a mix-in is a proxy object which is used to do a lot of the work the class doing the mixing says it does, but preferably with language/platform support.)

I’ve blogged before about my mixed feelings about inheritance. It’s very useful at times, but the penalty is usually very high, and if you’re going to write a class to be derived from, you need to think about (and document) an awful lot of things. So, how about this: we kill off inheritance, but make mix-ins really easy to write. Oh, and I’ll assume good support for closures as well, as a lot can be done with the Strategy Pattern via closures which would otherwise often be done with inheritance.

So, let’s make up some syntax, and start off with an example from the newsgroups. The poster wanted to derive from Dictionary<K,V> and override the Add method to do something else as well as the normal behaviour. Unfortunately, the Add method isn’t virtual. One poster suggested hiding the Add method with a new one – a solution I don’t like, because it’s so easy for someone to break encapsulation by using an instance as a plain Dictionary<K,V>. I suggested re-implementing IDictionary<K,V>, having a private instance of Dictionary<K,V> and making each method just call the corresponding one on that, doing extra work where necessary.
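
In other words, something along these lines – just a sketch, with an invented class name and only a couple of members shown; the real class would declare that it implements IDictionary<K,V> and forward every single member:

using System.Collections.Generic;

class AddAwareDictionary<K,V>
{
    private readonly IDictionary<K,V> dictionary = new Dictionary<K,V>();

    public void Add(K key, V value)
    {
        // Do some other work here
        dictionary.Add(key, value);
        // And possibly some other work here too
    }

    public V this[K key]
    {
        get { return dictionary[key]; }
        set { dictionary[key] = value; }
    }

    // ...and so on for Remove, ContainsKey, TryGetValue, Count, Keys,
    // Values, GetEnumerator, CopyTo, etc.
}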

Unfortunately, that’s a bit ugly, and for interfaces with lots of methods it can get terribly tedious. Instead, suppose we could do this:

using System.Collections.Generic;

class FunkyDictionary<K,V> : IDictionary<K,V>
{
    IDictionary<K,V> proxyDictionary proxies IDictionary<K,V>;

    void IDictionary<K,V>.Add(K key, V value)
    {
        // Do some other work here

        proxyDictionary.Add(key, value);

        // And possibly some other work here too
    }
}

Now, that’s a bit simpler. To be honest, that kind of thing would cover most of what I use inheritance for. (Memo to self: write a tool which actually finds out how often I do use inheritance, and where, rather than relying on memory and gut feelings.) The equivalent of having an abstract base class and overriding a single method would be fine, with a bit of care. The abstract class could still exist and claim to implement the interface – you just implement the “missing” method in the class which proxies all the rest of the calls.

The reason it’s important to have closures (or at least delegates with strong language support) is that sometimes you want a base class to be able to very deliberately call into the derived class, just for a few things. For those situations, delegates can be provided. It achieves the same kind of specialization as inheritance, but it makes it much clearer (in both the base class and the “derived” one) where the interactions are.
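
To make that concrete, here’s a rough sketch (all the names are invented) of a class which takes its single point of specialization as a delegate supplied up front, rather than exposing a protected virtual method to override:

using System.Collections.Generic;
using System.IO;

delegate string LineFormatter(string rawLine);

class ReportWriter
{
    private readonly LineFormatter formatter;

    public ReportWriter(LineFormatter formatter)
    {
        this.formatter = formatter;
    }

    public void Write(IEnumerable<string> lines, TextWriter output)
    {
        foreach (string line in lines)
        {
            // The "call into the derived class" is now an explicit,
            // documented callback
            output.WriteLine(formatter(line));
        }
    }
}

// Usage - the specialization is just a delegate (or, with language support, a closure):
// ReportWriter writer = new ReportWriter(delegate(string line) { return line.ToUpperInvariant(); });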

One point of interest is that without any inheritance, we lose the benefits of a single inheritance tree – unless object becomes a general “any reference”, which is mostly what it’s used for. Of course, there are a few methods on System.Object itself which we’d lose. Let’s look at them. (Java equivalents aren’t specified, but Java-only ones are):

  • ToString: Not often terribly useful unless it’s been overridden anyway
  • GetHashCode/Equals: Over time I’ve been considering that it may have been a mistake to make these generally available anyway; when they’re not overridden they tend to behave very differently to when they are. Wrapping the existing behaviour wouldn’t be too hard when wanted, but otherwise make people use IEquatable<T> or the like
  • GetType: This is trickier. It’s clearly a pretty fundamental kind of call which the CLR will have to deal with itself – would making it a static (natively implemented) method which took an object argument be much worse?
  • MemberwiseClone: This feels “systemy” in the same way as GetType. Could something be done such that you could only pass in “this”? Not a terribly easy one, unless I’m missing something.
  • finalize (Java): This could easily be handled in a different manner, similar to the way .NET does it.
  • wait/notify/notifyAll (Java): These should never have been methods on java.lang.Object in the first place. .NET is a bit better with the static methods on the Monitor class, but we should have specific classes to use for synchronization. Anyway, that’s a matter I’ve ranted about elsewhere.


What are the performance penalties of all of this? No idea. Because we’d be using interfaces instead of concrete classes a lot of the time, there’d still be member lookup even if there aren’t any virtual methods within the classes themselves. Somehow I don’t think that performance will be the reason this idea is viewed as a non-starter!

Of course, all of this mix-in business relies on having an interface for everything you want to use polymorphically. That can be a bit of a pain, and it’s the subject of the next article in this series.

Wacky Ideas – Introduction

I’ve been having a few wacky ideas recently, and I think it’s time to put them to virtual paper. They’re mostly around how we think about OO, and how future languages and platforms could do things. I very much doubt that any of them are new. I suspect they’ve been mulled over by people who really know how to think about these things, and then write papers about them. Probably using TeX. I’m not going to that much effort, so there will be several things I haven’t thought through at all. I won’t go so far as to say that’s your job, but knowing my readership you’re likely to come up with loads of things I’d never considered anyway.

Most are likely to be phrased in C#/.NET terms, but they’re likely to apply to Java as well. Some may fit one language slightly better than the other – I’ll point that out when I think of it.

I don’t necessarily think these are good ideas. Some are probably stinkers. Some may well be useful. Some may even occur one day. Some are bound to exist already in languages I don’t know, of which there are many. Almost all of them are likely to introduce new syntax (or take some away) which makes them non-starters for many scenarios. Don’t take it all too seriously, but I hope you have fun.

What would make a good Java book?

So, Groovy in Action has been out for a little while, and I’m missing it – or rather, book writing. I’d like my next project to be a solo effort, almost certainly on Java. However, I’m interested in hearing what you good folks think would make a good Java book. I’ve got some ideas myself, but I’d rather hear unprejudiced opinions first. (I may be soliciting more feedback at a later date, of course.) So, shoot – what would you like me to write about?

Build and config friendliness counts

Yesterday, I bought a Toppy PVR when my Tivo died. The details of what it does are irrelevant (although quite fun). The important thing is that it’s very hackable – so there are lots of extensions and access programs available. While the Windows ones are typically in binary form, the Linux ones aren’t. The Toppy gives access via a USB port, and programs either access that directly or use FTP to transfer files to it via an intermediate server which basically converts FTP requests into whatever protocol the USB connection uses.

Now, I have a Linkstation with the OpenLink firmware installed on it – a hard disk running a very cut down Linux on a fairly pitiful processor. I had a few bits and bobs on it already, notably TwonkyMedia and Subversion. While TwonkyMedia was a binary installation, Subversion was built from scratch, which took a little bit of doing, mostly because the configure script required sort, which wasn’t provided in the tools for the Linkstation. Doesn’t sound too bad – you just need to download the right package to build and install sort, right? Guess what the configure script for sort requires? Yup – sort. Fortunately a friend helped me out, battling with makefiles until we had a version of sort which worked well enough to rebuild it properly.

All of this is partly background, but partly the whole point of this blog – build annoyances.

Anyway, back to the Toppy. Over the course of the last 24 hours or so, I’ve fetched/built the following packages on the Linkstation:

  • puppy – a “direct connection” client to fetch/store files
  • ftpd-topfield – an FTP/USB proxying server
  • toppy-web – a web application allowing (limited) remote control of the Toppy
  • lighttpd – a light-weight web server to host toppy-web
  • php – PHP, required as a fastcgi module for lighttpd to run toppy-web
  • libxml2 – an XML library required for some PHP features
  • byacc – Berkeley yacc; libxml2 needs yacc
  • ncftp – a Linux CLI FTP client

(Apologies if any of the dependencies in here are wrong. It was getting pretty late by the time I’d got a php-enabled webserver…)

The build/install procedure of all of these varied immensely, and the impression I gained of the quality of the software reflects this. For those of you who don’t do Linux builds regularly, the normal procedure is to run ./configure, then make, then make install. Sounds simple, right? Well…

puppy was straightforward, although it didn’t have a make install – it just built a binary. Still, it was simple enough, and worked first time. Running it without specifying any arguments gave a useful help message, and it was easy to get to grips with.

ftpd-topfield wasn’t bad either. This time there was a make install, and even a make test. On the other hand, running it without any arguments just returned to the console with no hint of what was going on. Using --help produced a reasonable help message, but it’s still not clear to me what it does when you don’t specify -D for standalone mode. It could be for running from inetd, but I don’t know. Anyway, it worked pretty well, and I don’t remember any build problems, so it can’t have been that bad. I had to write my own rc.init script, but a simple one suffices for the moment.

toppy-web was where the problems began. Now, it doesn’t need to be built itself, but it requires a PHP-capable web server. It’s not quite “onto the next item” though, because this is the one thing I still haven’t got running – due to the configuration aspect. It comes with a bunch of sample configuration files, but you have to hunt around the web for documentation, which still seems to be flaky at best. Now, I can entirely sympathise with the developers, but the point of this blog post is the comparison of build/configure procedures. I’ll get it going at some point, I’m sure (given the effort I’ve put into the rest) but I’m not sure I’ve got the energy just yet. This is a web application – why can’t it be configured with a web page?

lighttpd – the web server itself. This wasn’t too bad, if I remember rightly. The docs have fairly good descriptions of how to configure it appropriately, including what’s required from a php build. Which brings us to…

php – oh dear. It’s never a good sign when the configure script complains about syntax errors to start with. After googling these errors and finding that other people who had received them were told that basically they could be ignored, I let the script continue. After quite a few minutes, it decided that I needed libxml2. At first, I tried to disable this – at which point (well, after starting the script again and letting it run for a few minutes) it complained that without libxml2 it wouldn’t be able to have any DOM support. I don’t know whether toppy-web requires DOM support, but it seemed like an important thing to be without. So, I decided to download libxml2 and build it. In retrospect, I should probably have looked through the toppy-web source to see whether I really needed it…

libxml2 fairly quickly announced that it needed yacc. That’s not particularly unreasonable, and I was slightly surprised not to have it. However, it was yet another step. Fortunately, byacc built and installed easily enough not to deserve its own paragraph here. Hurrah – finally I could configure and build libxml2. You wouldn’t expect that to take too long, would you? XML isn’t that hard. Hmm. It wasn’t actually difficult, but I felt very sorry for the Linkstation afterwards. Most of the C files had to be compiled twice for some reason (I didn’t look into the depths of why – something to do with how the libraries are deployed, I believe) and some of them are large. Very large. xmlschemas.c is over 28,000 lines long, and the (fortunately auto-generated) testapi.c is over 50,000 lines long. C compilers are slow beasts at the best of times, and this is on a box which was really only meant to run samba. It took ages. Not only that, but the first time I tried to link php with it, it failed. No idea whose fault that was (mine, php or libxml2), but it was frustrating. Anyway, back to php…

With libxml2 built, I set php configuring and building – with the three features enabled which I knew I had to have (thanks to the lighttpd docs) and which all sound like they should be enabled by default. It got there in the end, so by the time I went to bed (very late) I had a working php-enabled web server.

Tonight, I decided to find a command-line FTP client for Linux. After finding a very useful Linux FTP client comparison page I decided to plump for ncftp. After the previous night’s frustrations, I was somewhat nervous. Fortunately, my fears were unwarranted. More than that – the authors of ncftp deserve awards.

The readme file is suitably undaunting. The configure script runs reasonably quickly. So far, so unremarkable… but the makefile. Oooh… instead of showing you the exact command it’s running at any time (which in some cases for the previous packages was over a screenful for a single link, and frequently 5 or 6 lines per compile), it just tells you what it’s doing (e.g. Compiling c_utime.c...) and has a colour-coded [OK] (in the same way that most Linux distributions do for starting and stopping services) for each build step. Green means a clean build, yellow means there were warnings (which are printed). At the end of the build, it shows you the files it’s built. At the end of make install it shows you the files you’ve installed. The difference in professionalism, and the impression you’re left with, is marked. I’ve no idea whether the ncftp guys wrote the makefile themselves or whether it’s a framework which is generally available – but I’ve never seen anything as clean when it comes to make. Indeed, it leaves me wishing that Ant builds were logged as cleanly (when they’re successful).

So, the moral of this post (which is rather longer than I’d anticipated)? Builds matter. Configuration matters. Documentation matters. There’s more to a build being good than just it working first time: giving feedback and a warm fuzzy feeling are important too. Making it look simple even if there are complex things going on behind the scenes makes life feel smoother. (I’m sure the ncftp makefile has options for seeing the commands themselves if you really need to.) I’ve understood the importance of a good build system before, but usually in terms of a developer having all the options they need. In the open source world, particularly on Linux where in many cases the end user will have to build the package in order to run it on their custom devices, the build is just as much a part of “ease of use” as the product itself – and if a user falls at the first hurdle, they’ll never see your pretty UI.