Category Archives: Java

The Beauty of Closures

Fairly soon I’m going to write a blog post comparing the different proposals under consideration for Java 7 when it comes to closures. I thought it would be worth writing some background material on it first though, so I’ve put an article on the C# in Depth site.

I’m not entirely comfortable that I’ve captured why they’re important, but the article can mature over time like a good wine. The idea is to counter the Blub Paradox (which I hadn’t come across before, but I agree with completely – it’s part of the effect I believe Steve McConnell was fighting against when talking about programming in a language).

Programming “in” a language vs programming “into” a language

I’m currently reading Steve McConnell’s Code Complete (for the first time – yes, I know that’s somewhat worrying) and there was one section which disturbed me a little. For those of you with a copy to hand, it’s in section 4.3, discussing the difference between programming in a language and programming into a language:

Programmers who program “in” a language limit their thoughts to constructs that the language directly supports. If the language tools are primitive, the programmer’s thoughts will also be primitive.

Programmers who program “into” a language first decide what thoughts they want to express, and then they determine how to express those thoughts using the tools provided by their specific language.

Now don’t get me wrong – I can see where he’s coming from, and the example he then provides (Visual Basic – keeping the forms simple and separating them from business logic) is fine, but he only seems to give one side of the coin. Here’s a different – and equally one-sided – way of expressing the same terms:

Programmers who program “in” a language understand that language’s conventions and idioms. They write code which integrates well with other libraries, and which can be easily understood and maintained by other developers who are familiar with the language. They benefit from tools which have been specifically designed to aid coding in the supported idioms.

Programmers who program “into” a language will use the same ideas regardless of their target language. If their style does not mesh well with the language, they will find themselves fighting against it every step of the way. It will be harder to find libraries supporting their way of working, and tools may well prove annoying. Other developers who come onto the project later and who have experience in the language but not the codebase will find it hard to navigate and may well accidentally break the code when changing it.

There is a happy medium to be achieved, clearly. You certainly shouldn’t restrict your thinking to techniques which are entirely idiomatic, but if you find yourself wanting to code in a radically different style to that encouraged by the language, consider changing language if possible!

If I were attacking the same problem in C# 1 and C# 3, I could easily end up with radically different solutions. Some data extraction using LINQ in a fairly functional way in C# 3 would probably be better solved in C# 1 by losing some of the functional goodness than by trying to back-port LINQ and then use it without the benefit of lambda expressions or even anonymous methods.

Accents and Conventions

That’s just between different versions of the same language. Between different actual languages, it can get much worse. If you’ve ever seen Java code written in a C++ style or vice versa, you’ll know what I mean. I’ve previously referred to this in terms of speaking a language with an accent – you can speak C# with a Java accent just as you can speak French with an English accent. Neither is pleasant.

At the lowest level, this is likely to be about conventions – and I’m pretty sure that when Steve writes “Invent your own coding conventions, standards, class libraries, and other augmentations” he doesn’t actually mean us to do it in a gratuitous fashion. It can be worth deviating from the “platform favoured” conventions sometimes, particularly if those differences are invisible to clients, but it should always be done with careful consideration. In a Java project I worked on a few years ago, we took the .NET naming conventions for interfaces (an I prefix) and constants (CamelCasing instead of SHOUTY_CAPS). Both of these made the codebase feel slightly odd, particularly where Java constants were used near our constants – but I personally found the benefits to be worth the costs. Importantly, the whole team discussed it before making any decisions.

Design Patterns

At a slightly higher level, many design patterns are just supported much, much better by some languages than others. The iterator pattern is a classic example. Compare the support for it from Java 6 and C# 2. On the “client” side, both languages have specific syntax: the enhanced for loop in Java and the foreach loop in C#. However, there is one important difference: if the iterator returned by GetEnumerator implements IDisposable (which the generic form demands, in fact) C# will call Dispose at the end of the loop, no matter how that occurs (reaching the end of the sequence, breaking early, an exception being thrown, etc). Java has no equivalent of this. Imagine that you want to write a class to iterate over the lines in a file. In Java, there’s just no safe way of representing it: you can make your iterator implement Closeable but then callers can’t (safely) use the enhanced for loop. You can make your code close the file handle when it reaches the end, but there’s no guarantee that will happen.

Then consider the “server” side of the iterator – the code actually providing the data. Java is like C# 1 – there’s no specific support for implementing an iterator. In C# 2 and above, iterator blocks (i.e. methods with yield statements) make life much, much easier. Writing iterators by hand can be a real pain. Reading a file line by line isn’t too bad, leaving aside the resource lifetime issue – but the complexity can balloon very quickly. Off-by-one errors are really easy to introduce.
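
To see why, here’s a sketch of what a hand-written line iterator looks like in Java (the class name is my own, and it reads from an arbitrary Reader to sidestep the file-handle question). The single line of lookahead needed to answer hasNext is exactly where the off-by-one errors creep in:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// A hand-written iterator over the lines of a reader. hasNext has to
// read ahead by one line, and next has to hand that buffered line back -
// get the bookkeeping slightly wrong and you skip or repeat a line.
class LineIterator implements Iterator<String> {
    private final BufferedReader reader;
    private String nextLine;  // the buffered lookahead line
    private boolean fetched;  // have we already read ahead?

    LineIterator(BufferedReader reader) {
        this.reader = reader;
    }

    public boolean hasNext() {
        if (!fetched) {
            try {
                nextLine = reader.readLine();
            } catch (IOException e) {
                // Iterator.next can't throw checked exceptions, so we
                // have to wrap - another awkward corner of the pattern.
                throw new RuntimeException(e);
            }
            fetched = true;
        }
        return nextLine != null;
    }

    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        fetched = false;
        return nextLine;
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }
}
```

In C# 2 the whole class collapses into a few lines inside an iterator block – the compiler writes the state machine for you.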

So, if I were tackling a project which required reading text files line by line in various places, what would I do? In Java, I would take the reasonably small hit of a while loop in each place I needed it. In C# I’d write a LineReader class (if I didn’t already have one!) and use a more readable foreach loop. The contortions involved in introducing that idea into Java just wouldn’t be worth the effort.
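
For the record, the “reasonably small hit” looks something like this at each call site (a sketch – the method and class names are invented, and it takes a Reader rather than opening the file itself so the per-line logic stands out):

```java
import java.io.BufferedReader;
import java.io.IOException;

// The idiomatic, if slightly tedious, Java approach: an explicit while
// loop with try/finally so the underlying handle is always closed -
// whether we reach the end, break early, or an exception is thrown.
class LineCounter {
    static int countLines(java.io.Reader source) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        try {
            int count = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                // Process the line - here we just count it
                count++;
            }
            return count;
        } finally {
            reader.close();
        }
    }
}
```

It’s safe and everyone understands it – it’s just boilerplate you end up repeating wherever you need it.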

At a much higher level, we get into whole programming styles and paradigms. If your natural inclination is to write imperative code, you’re likely to create a mess (or get very frustrated) in a functional language. If the problem really does call for a functional language, find someone else to help you think in a more functional way. If the problem suits imperative programming just as well as it does functional programming, see if you can change the environment to something more familiar.

Conclusion

I’m not suggesting that Steve’s point isn’t valid – but he’s done his readers a disservice by only presenting one side of the matter. Fortunately, the rest of the book (so far) is excellent and humbling – to such a degree that this minor quibble stuck out like a sore thumb. In a book which had more problems, I would probably barely have even noticed this one.

There’s another possibility, of course – I could be completely wrong; maybe I’ve been approaching problems from a restrictive viewpoint all this time. How about you?

Macros, and languages within languages

Ian Griffiths mailed me about macros, and explained how LISP macros were very different to C/C++ macros, working at a language level instead of at a text level. I won’t pretend to understand all about what would be possible and what wouldn’t, but Ian gave a good example: query expressions in C# 3. Instead of being part of the language itself, they could apparently have been written as macros, if C# supported them. Then if you wanted to have similar support for different forms of expression, you could just write your own macro library.

Assuming that’s what people are actually requesting, I can certainly see the attraction – but I’d still prefer it if C# didn’t go down that route. I’ll go back to C++ for the guts of the reason why, but it’s not really about macros at this point. It’s about building your own language. Once, someone told me that C++ wasn’t a language – it was a meta-language; no-one used “bare” C++, they worked out their own language made from the building blocks of normal C++, and then used that.

That may or may not be true – or more likely, it’s true in some places but not others – but it scares me as an idea. I’m not going to claim I know every nuance of C#, but it’s pretty rare that you’d throw a line at me without it being reasonably clear what’s going to happen and why, at the language level. Extension methods might mean a bit more information is required as to where a particular method comes from, but it doesn’t take a lot of digging to see what’s going on.

Now imagine that C# 3 didn’t include query expressions, but that someone had come up with them as a macro library. It’s not an insignificant amount of effort to learn what’s going on there, and how it all maps to normal method calls, potentially with expression trees as arguments instead of delegates. Until you understand what’s going on at a reasonably deep level, you can’t really make any firm decisions as to what code including a query expression will do. (Heck, that’s one of the premises of the book: you should really know this stuff, or at least be aware of it.)

That’s fine when there’s a single macro library used globally, but now imagine every company has their own – or worse still, has a bunch of them grabbed from Code Project, possibly including a load of bugs. Most of us aren’t accomplished language designers, and I suspect there’d be an awful lot of macro libraries out there which weren’t quite thought through enough – but were still useful enough to be attractive. They’d become magnets for code warts.

It’s hard enough when you change company to work out what 3rd party libraries are in use, how they’re being used, what the coding conventions are etc. It would be much worse if I had to learn another flavour of C# itself each time. I’m already worried that developers are picking up C# 3 without having a firm enough grasp of C# 2 – and that’s when there’s just a progression within a single language.

I know this all sounds patronising and/or elitist and/or “nanny state” applied to programming languages – but it’s how I feel nonetheless. I just don’t think we (as a development community) are mature enough to handle that sort of power without it turning into a blood bath. This sort of thing sounds fabulous for hobby and research development, and would probably be great in the hands of the best few companies in the world – but I don’t think it’s a good idea for the mainstream.

Okay – time to hear why I’m wrong :)

Java isn’t an acronym

Just a quickie while I remember. A pet peeve of mine has surfaced again recently, while reading some CVs.

Java, the programming language, is just written “Java”. It’s not an acronym. There’s no need to write it as “JAVA”. That just looks shouty and somewhat silly. Why do so many people get it wrong? While we’re at it, why does it irritate me so much to see it written the wrong way?

Why hasn’t Microsoft bought JetBrains yet?

For those of you who aren’t aware, JetBrains is the company behind IntelliJ IDEA, the Java IDE which I’ve heard amazing things about (I’ve tried it a couple of times but never got into it – I think I need an expert sitting beside me to point out the cool stuff as I go) and ReSharper, the incredibly useful (although somewhat resource hungry) add-in to Visual Studio that turns it into a respectable IDE.

What would happen if Microsoft bought JetBrains?

I’m sure that killing off the reportedly best Java IDE would do .NET no harm (even if it would be a fairly cruel thing to do, and still leave other perfectly good IDEs in the Java space), and surely they could use the ideas and experience of the company to improve Visual Studio significantly. I strongly suspect that tighter integration could make all the ReSharper goodness available with less performance overhead, and while it’s no doubt too late now, wouldn’t it have been wonderful for all of those features to be available in Orcas?

Anyway, just a thought.

Sheer Evil: Rethrowing exceptions in Java

This morning, I was looking through some code and I was annoyed (yet again) at Java’s exception hierarchy, particularly when it comes to checked exceptions. Just as a reminder, everything that can be thrown in Java derives from Throwable. The predefined direct subclasses are Error and Exception. (You can derive from Throwable directly yourself, but I’ve never seen anyone do it, thank goodness.) Exception, and any class deriving from it, counts as a checked exception – one that you have to declare if your method might throw it. Oh, except for RuntimeException, and its descendants such as NullPointerException. Blech.

This is really painful in some situations. In particular, the code I was looking at wanted to catch everything, act on it, and then rethrow it. My method was declared to throw IOException, and without the catch block everything compiled fine, so I knew that nothing I was calling should throw any checked exceptions other than IOException. However, rethrowing the exception is tricky – because the compiler doesn’t know what you’re up to. I ended up with this foul code:

try
{
    // Stuff here
}
catch (Throwable t)
{
    // Log the error (or whatever)
            
    // Now rethrow
    if (t instanceof IOException)
    {
        throw (IOException) t;
    }
    if (t instanceof RuntimeException)
    {
        throw (RuntimeException) t;
    }
    if (t instanceof Error)
    {
        throw (Error) t;
    }
    // Very unlikely to happen
    throw new RuntimeException(t);
}
finally
{
    // More stuff here
}

Nasty, isn’t it? It would be lovely to somehow tell the compiler that you know there won’t be any other kinds of checked exceptions thrown, just rethrow the original, it’s all right guv, you can trust me, honest.

Well, apparently you can’t really trust me. Not since the hack I worked out this morning. You see, exception checking only occurs at compile time. So, let’s define a really harmless little class called ExceptionHelper:

public class ExceptionHelper
{
    /** Private constructor to prevent instantiation */
    private ExceptionHelper()
    {
    }
    
    public static void rethrow (Throwable t)
    {
    }
}

Nothing nasty going on, is there? So the compiler won’t mind at all if I change the original code to:

try
{
    // Stuff here
}
catch (Throwable t)
{
    // Log the error (or whatever)
            
    // Now rethrow
    ExceptionHelper.rethrow(t);
}
finally
{
    // More stuff here
}

The only trouble is, it doesn’t rethrow the exception any more, regardless of the name of the method. But as I suspect you’ve guessed by now, once we’ve satisfied the compiler, we can change ExceptionHelper.rethrow slightly:

public static void rethrow (Throwable t) throws Throwable
{
    throw t;
}

Recompile ExceptionHelper but not the calling code and we achieve exactly what we want – it will rethrow whatever exception we ask it to, and we’ve fooled the compiler into not worrying about the potential consequences. Of course, this means we could change the code in the try block to something which throws a completely different checked exception, and we’d never know until it happened – the compiler couldn’t help us. The workaround for this is to temporarily remove the catch block and see whether or not the compiler complains.
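
As an aside, the same effect can be achieved in a single compilation pass by abusing generic type erasure – the so-called “sneaky throw” trick (the names below are my own invention). The compiler checks a throws clause of T, but erasure means no check actually happens at execution time:

```java
// Evil in one step: the throws clause mentions a type parameter, and
// erasure turns "throw (T) t" into "throw (Throwable) t" - so the
// original exception escapes unwrapped, checked or not.
class SneakyHelper {
    /** Private constructor to prevent instantiation */
    private SneakyHelper() {
    }

    @SuppressWarnings("unchecked")
    static <T extends Throwable> RuntimeException rethrow(Throwable t) throws T {
        throw (T) t;
    }
}
```

Calling SneakyHelper.&lt;RuntimeException&gt;rethrow(t) compiles with no throws declaration needed, yet rethrows the original t unchanged. The RuntimeException return type is just a convenience: callers can write “throw SneakyHelper.rethrow(t);” so the compiler knows the statement never completes normally.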

I’m not actually suggesting anyone should do this, despite a certain appeal in terms of simpler, more readable code in the catch block. A hack like this is horrible, evil, awful. Which is why I had to share it, of course.

Wacky Ideas 3: Object life-cycle support

No, don’t leave yet! This isn’t another article about non-deterministic finalization, RAII etc. That’s what we almost always think of when someone mentions the object life-cycle, but I’m actually interested in the other end of the cycle – the “near birth” end.

We often take it as read that when an object’s constructor has completed successfully, the object should be ready to use. However, frameworks and technologies like Spring and XAML often make it easier to create an object and then populate it with dependencies, configuration etc. Yes, in some cases it’s more appropriate to have a separate configuration class which is used for nothing but a bunch of properties, and then the configuration can be passed into the “real” constructor in one go, with none of the readability problems of constructors taking loads of parameters. It’s all a bit unsatisfactory though.

What we most naturally want is to say, “Create me an empty X. Now configure it. Now use it.” (Okay, and as an obligatory mention, potentially “Now make it clean up after itself.”)

While configuring the object, we don’t want to call any of the “real” methods which are likely to want to do things. We may want to be able to fetch some of the configuration back again, e.g. so that some values can be relative to others easily, but we don’t want the main business to take place. Likewise, when we’ve finished configuring the object, we generally want to validate the configuration, and after that we don’t want anyone to be able to change the configuration. Sometimes there’s even a third phase, where we’ve cleaned up and want to still be able to get some calculated results (the byte array backing a MemoryStream, for instance) but not call any of the “main” methods any more.

I’d really like some platform support for this. None of it’s actually that hard to do – just a case of keeping track of which phase you’re in, and then adding a check to the start of each method. Wouldn’t it be nicer to have it available as attributes though? Specify the “default phase” for any undecorated members, and specify which phases are valid for other members – so configuration setters would only be valid in the configuration phase, for instance. Another attribute could dictate the phase transition – so the ValidateAndInitialize method (or whatever you’d call it) would have an attribute stating that on successful completion (no exceptions thrown) the phase would move from “configure” to “use”.
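
For comparison, here’s the manual version the attributes would replace – mechanical but noisy, and every name here is invented for illustration:

```java
// Hand-rolled phase checking: track the current phase, and guard the
// start of each method. This is exactly the boilerplate that
// declarative platform support would remove.
class PhasedSample {
    private enum Phase { CONFIGURE, USE }

    private Phase phase = Phase.CONFIGURE;
    private String authenticator;  // stand-in for a real dependency

    private void requirePhase(Phase required) {
        if (phase != required) {
            throw new IllegalStateException(
                "Expected phase " + required + " but was " + phase);
        }
    }

    public void setAuthenticator(String authenticator) {
        requirePhase(Phase.CONFIGURE);
        this.authenticator = authenticator;
    }

    public void validateAndInitialize() {
        requirePhase(Phase.CONFIGURE);
        if (authenticator == null) {
            throw new IllegalStateException("I need an authenticator");
        }
        phase = Phase.USE;  // transition only on successful validation
    }

    public void doSomething() {
        requirePhase(Phase.USE);
        // Use authenticator, knowing it's valid
    }
}
```
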

Here’s a short code sample. The names and uses of the attributes could no doubt be improved, and if there were only a few phases which were actually useful, they could be named in an enum instead, which would be neat.

[Phased(defaultRequirement=2, initial=1)]
class Sample
{
    IAuthenticator authenticator;
    
    public IAuthenticator Authenticator
    {
        [Phase(1)]
        [Phase(2)]
        get
        {
            return authenticator;
        }
        [Phase(1)]
        set
        {
            authenticator = value;
        }
    }
    
    [Phase(1)]
    [PhaseTransition(2)]
    public void ValidateAndInitialize()
    {
        if (authenticator==null)
        {
            throw new InvalidConfigurationException("I need an authenticator");
        }
    }
    
    public void DoSomething()
    {
        // Use authenticator, assuming it's valid
    }
    
    public void DoSomethingElse()
    {
        // Use authenticator, assuming it's valid
    }
}

Hopefully it’s obvious what you could and couldn’t do at what point.

This looks to me like a clear example of where AOP should get involved. I believe that Anders isn’t particularly keen on it, and when abused it’s clearly nightmarish – but for certain common things, it just makes life easier. The declarative nature of the above is simpler to read (IMO – particularly if names were used instead of numbers) than manually checking the state at the start of each method. I don’t know if any AOP support is on the slate for Java 7 – I believe things have been made easier for AOP frameworks by Java 6, although I doubt that any target just Java 6 yet. We shall have to see.

One interesting question is whether you’d unit test that all the attributes were there appropriately. I guess it depends on the nature of the project, and just how thoroughly you want to unit test. It wouldn’t add any coverage, and would be hard to exhaustively test in real life, but the tests would be proving something…

Wacky Ideas 2: Class interfaces

(Disclaimer: I’m 99% sure I’ve heard someone smarter than me talking about this before, so it’s definitely not original. I thought it worth pursuing though.)

One of the things I love about Java and C# over C/C++ is the lack of .h files. Getting everything in the right place, only doing the right things in the right files, and coping with bits being included twice etc is a complete pain, particularly if you only do it every so often rather than it being part of your everyday life.

Unfortunately, as I’ve become more interface-based, I’ve often found myself doing effectively the same thing. Java and C# make life a lot easier than C in this respect, of course, but it still means duplicating the method signatures etc. Often there’s only one implementation of the interface – or at least one initial implementation – but separating it out as an interface gives a warm fuzzy feeling and makes stubbing/mocking easier for testing.

So, the basic idea here is to extract an interface from a class definition. In the most basic form:

class interface Sample
{
    public void ThisIsPartOfTheInterface()
    {
    }
    
    public void SoIsThis()
    {
    }
    
    protected void NotIncluded()
    {
    }
    
    private void AlsoNotIncluded()
    {
    }
}

So the interface Sample just has ThisIsPartOfTheInterface and SoIsThis even though the class Sample has the extra methods.
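
Today, getting the same result means spelling the contract out twice – once in the interface and once in the class (the names below are adjusted because plain Java can’t give both types the same name):

```java
// What the "class interface" proposal would generate automatically:
// the public members repeated in a separate interface declaration.
interface SampleContract {
    void thisIsPartOfTheInterface();
    void soIsThis();
}

class SampleImpl implements SampleContract {
    public void thisIsPartOfTheInterface() {
    }

    public void soIsThis() {
    }

    protected void notIncluded() {
    }

    private void alsoNotIncluded() {
    }
}
```

Every signature change then has to be made in both places – which is precisely the .h-file feeling described above.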

Now, I can see a lot of cases where you would only want part of the public API of the class to contribute to the interface – particularly if you’ve got properties etc which are meant to be used from an Inversion of Control framework. This could either be done with cunning keyword use, or (to make fewer syntax changes) a new attribute could be introduced which could decorate each member you wanted to exclude (or possibly include, if you could make the default “exclude” with a class-level attribute).

So far, so good – but now we’ve got two types with the same name. What happens when the compiler runs across one of the types? Well, here’s the list of uses I can think of, and what they should do:

  • Variable declaration: Use the interface
  • Construction: Use the class
  • Array declaration/construction: Use the interface (I think)
  • typeof: Tricky. Not sure. (Note that in Java, we could use Sample.class and Sample.interface to differentiate.)
  • Type derivation: Not sure. Possibly make it explicit: “DerivedSample : class Sample” or “DerivedSample : interface Sample”
  • Generics: I think this would depend on the earlier “not sure” answers, and would almost certainly be complicated

As an example, the line of code “Sample x = new Sample();” would declare a variable x of the interface type, but create an instance of the concrete class to be its initial value.

So, it’s not exactly straightforward. It would also violate .NET naming conventions. Would it be worth it, over just using an “Extract Interface” refactoring? My gut feeling is that there’s something genuinely useful in here, but the complications do seem to overwhelm the advantages.

Perhaps the answer is not to try to have two types with the same name (which is where the complications arise) but to be able to explicitly say “I’m declaring interface ISample and implementing it in Sample” both within the same file. At that point it may be unintuitive to get to the declaration of ISample, and seeing just the members of it isn’t straightforward either.

Is this a case where repeating yourself is fundamentally necessary, or is there yet another way of getting round things that I’m missing?

Wacky Ideas 1: Inheritance is dead, long live mix-ins!

(Warning: I’ve just looked up “mix-in” on Wikipedia and their definition isn’t quite what I’m used to. Apologies if I’m using the wrong terminology. What I think of as a mix-in is a proxy object which is used to do a lot of the work the class doing the mixing says it does, but preferably with language/platform support.)

I’ve blogged before about my mixed feelings about inheritance. It’s very useful at times, but the penalty is usually very high, and if you’re going to write a class to be derived from, you need to think about (and document) an awful lot of things. So, how about this: we kill off inheritance, but make mix-ins really easy to write. Oh, and I’ll assume good support for closures as well, as a lot can be done with the Strategy Pattern via closures which would otherwise often be done with inheritance.

So, let’s make up some syntax, and start off with an example from the newsgroups. The poster wanted to derive from Dictionary<K,V> and override the Add method to do something else as well as the normal behaviour. Unfortunately, the Add method isn’t virtual. One poster suggested hiding the Add method with a new one – a solution I don’t like, because it’s so easy for someone to break encapsulation by using an instance as a plain Dictionary<K,V>. I suggested re-implementing IDictionary<K,V>, having a private instance of Dictionary<K,V> and making each method just call the corresponding one on that, doing extra work where necessary.

Unfortunately, that’s a bit ugly, and for interfaces with lots of methods it can get terribly tedious. Instead, suppose we could do this:

using System.Collections.Generic;

class FunkyDictionary<K,V> : IDictionary<K,V>
{
    IDictionary<K,V> proxyDictionary proxies IDictionary<K,V>;

    void IDictionary<K,V>.Add(K key, V value)
    {
        // Do some other work here

        proxyDictionary.Add(key, value);

        // And possibly some other work here too
    }
}

Now, that’s a bit simpler. To be honest, that kind of thing would cover most of what I use inheritance for. (Memo to self: write a tool which actually finds out how often I do use inheritance, and where, rather than relying on memory and gut feelings.) The equivalent of having an abstract base class and overriding a single method would be fine, with a bit of care. The abstract class could still exist and claim to implement the interface – you just implement the “missing” method in the class which proxies all the rest of the calls.
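
For what it’s worth, Java does already ship one general-purpose tool along these lines: java.lang.reflect.Proxy can delegate an entire interface at runtime and intercept just the interesting calls. It’s reflective rather than language-level (and the names below are invented), but it shows the shape of the idea without writing out every method of the interface:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Delegates every Map method to a backing HashMap, but records each key
// passed to put() - without having to write out all of Map's methods.
class FunkyMaps {
    static Map<String, String> loggingMap(final List<String> log) {
        final Map<String, String> backing = new HashMap<String, String>();
        InvocationHandler handler = new InvocationHandler() {
            public Object invoke(Object proxy, Method method, Object[] args)
                    throws Throwable {
                if (method.getName().equals("put")) {
                    log.add(String.valueOf(args[0]));  // the "extra work"
                }
                // Everything else just passes straight through
                return method.invoke(backing, args);
            }
        };
        @SuppressWarnings("unchecked")
        Map<String, String> result = (Map<String, String>) Proxy.newProxyInstance(
            FunkyMaps.class.getClassLoader(), new Class<?>[] { Map.class }, handler);
        return result;
    }
}
```

The cost is reflection on every call and the loss of compile-time checking inside the handler – which is why proper language support, as in the made-up syntax above, would be so much nicer.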

The reason it’s important to have closures (or at least delegates with strong language support) is that sometimes you want a base class to be able to very deliberately call into the derived class, just for a few things. For those situations, delegates can be provided. It achieves the same kind of specialization as inheritance, but it makes it much clearer (in both the base class and the “derived” one) where the interactions are.

One point of interest is that without any inheritance, we lose the benefits of a single inheritance tree – unless object becomes a general “any reference”, which is mostly what it’s used for. Of course, there are a few methods on System.Object itself which we’d lose. Let’s look at them. (Java equivalents of the .NET methods aren’t listed separately, but Java-only members are marked):

  • ToString: Not often terribly useful unless it’s been overridden anyway
  • GetHashCode/Equals: Over time I’ve been considering that it may have been a mistake to make these generally available anyway; when they’re not overridden they tend to behave very differently to when they are. Wrapping the existing behaviour wouldn’t be too hard when wanted, but otherwise make people use IEquatable<T> or the like
  • GetType: This is trickier. It’s clearly a pretty fundamental kind of call which the CLR will have to deal with itself – would making it a static (natively implemented) method which took an object argument be much worse?
  • MemberwiseClone: This feels “systemy” in the same way as GetType. Could something be done such that you could only pass in “this“? Not a terribly easy one, unless I’m missing something.
  • finalize (Java): This could easily be handled in a different manner, similar to the way .NET does it.
  • wait/notify/notifyAll (Java): These should never have been methods on java.lang.Object in the first place. .NET is a bit better with the static methods on the Monitor class, but we should have specific classes to use for synchronization. Anyway, that’s a matter I’ve ranted about elsewhere.

What are the performance penalties of all of this? No idea. Because we’d be using interfaces instead of concrete classes a lot of the time, there’d still be member lookup even if there aren’t any virtual methods within the classes themselves. Somehow I don’t think that performance will be the reason this idea is viewed as a non-starter!

Of course, all of this mix-in business relies on having an interface for everything you want to use polymorphically. That can be a bit of a pain, and it’s the subject of the next article in this series.

What would make a good Java book?

So, Groovy in Action has been out for a little while, and I’m missing it – or rather, book writing. I’d like my next project to be a solo effort, almost certainly on Java. However, I’m interested in hearing what you good folks think would make a good Java book. I’ve got some ideas myself, but I’d rather hear unprejudiced opinions first. (I may be soliciting more feedback at a later date, of course.) So, shoot – what would you like me to write about?