Some readers may already be aware of Project LINQ – .NET Language Integrated Query. There have been various posts about it (and in particular the effect it has on C# itself) on the C# newsgroup, and many of those have involved a certain amount of speculation. This is understandable, as it’s still very much under development, and although a sample initial implementation has been available since the PDC, I’d be surprised if things didn’t change.
I haven’t downloaded the PDC implementation, and don’t intend to for a while. I have, however, finally had a chance to read the specs and overviews of LINQ, DLinq, XLinq and C# 3.0. I haven’t pored over them, and again don’t intend to just yet – I believe that features like these are best investigated in anger. Once I have a use for them, I’ll probably look more closely at them. However, here is my (literally, as I’m typing this on a plane) 34,000-foot view of what’s going on. I won’t be copying any code from the documentation, so you may well find it useful to read the Microsoft documents before proceeding much further.
The basic premise
Imperative programming languages such as C, C++, Java, C#, VB and VB.NET have traditionally made it hard to work with relational databases, and possibly even harder to work with XML. In a way, the latter should come as a surprise: XML was designed much later than SQL, and should have benefitted from a lot of hindsight in terms of technology design. Don’t get me wrong – I’m not anti-XML per se, but the APIs for working with it have generally sucked, particularly the W3C ones. There are half-decent APIs available in .NET, and various open source libraries in Java-land such as dom4j and JDOM which improve the situation, but even those don’t feel as if they’ve quite cracked it.
Both XML and relational databases have what is commonly called an impedance mismatch with object-oriented languages – they just don’t think about things in the same way. Database tables are related to each other rather than entities being composed of each other, and object identity and equality pose really significant problems when trying to map data from one paradigm to the other. Just as with XML manipulation, there have been various attempts to solve (or at least greatly ease) the mismatch problem, often with libraries or tools called Object Relational Mappers (ORMs). There are many different ORM tools available – probably more for Java than .NET, possibly because Java has been around for longer. Beyond the sphere of Java and .NET, I only know of one other ORM tool: ActiveRecord for Ruby, usually used with the Rails web application framework. I’m sure there are plenty of others available though – no need to comment on my ignorance here!
The Powers-That-Be in the Java world are trying to semi-standardise ORM with the EJB 3.0 specification, which I believe is currently at the public review stage. In theory, this should mean that marking up some Java classes with annotations (attributes to .NET folks) will allow the same classes to be used with multiple ORMs. I suspect that this facility won’t be used for multiple-ORM support very often, as you tend to need to know which ORM you’re targeting for other reasons, but it does mean that the bigwigs of the ORM world have got together to try to work out at least a common way of talking about ORM. I should say at this point that the only ORM I’ve had any real experience with is Hibernate, which is generally excellent, although rough around the edges in a few places. It has a .NET equivalent, NHibernate, which I haven’t used.
So, what does this have to do with LINQ? Well, all of the above projects and tools have a problem – you don’t tend to get much language support for them, as they’ve had to work with the language features available to them, rather than directing the future of the languages they target. LINQ, on the other hand, has added many new features to C# (and VB 9.0) which should greatly add to the potential safety and efficiency of the solution. LINQ is neither XML-specific nor SQL-specific, but instead introduces various language features which target general querying, with DLinq and XLinq hooking those up to SQL and XML respectively. It is worth noting at this point that LINQ itself doesn’t try to be an ORM system at all – only half of DLinq is particularly LINQ-related, and that’s (naturally) the querying side. The update side of things requires no new language features, and looks somewhat like using Hibernate with annotations, at least at first glance.
The language features
So, what new language features does LINQ effectively require? I’m only going to cover the C# side of things here. VB 9.0 has gained some features supporting XLinq (of somewhat dubious value at first glance, though I’ll let VB fans work out whether they’re actually good or not), but I won’t address how the new VB looks.
- Lambda expressions: these look pretty powerful and (more importantly) useful – and not just as part of LINQ. Whether expression trees (where a lambda expression ends up as data rather than compiled code) will be good or not (at least outside LINQ, which I suspect pretty much requires them) remains to be seen. Either way, there’s a lot of potential.
- Extension methods: ouch. I can see why they’re being added, and I can see why they’ll potentially be very useful, but I suspect they’ll be abused hideously. The worst thing is, I can see times when I might well abuse them and like it at the time – System.String doesn’t have a method which would be handy? Heck, make it an extension method. Fine at write-time, but not so fun at maintenance time. This is just a gut feeling at this stage, but I’m frankly a little worried. If my team were using C# 3.0, I’d want any extension methods to be code-reviewed by two people (other than the original developer) rather than just one (or, if pair programming were going on, by at least one extra pair of eyes).
- Object initializers: yes, yes, yes. Great stuff.
- Anonymous types: these could be interesting. They certainly make sense in LINQ, but they potentially have use outside it too. How many times have you had local variables which were essentially linked to each other, but that relationship could only be expressed through naming? I’m thinking of situations like searching for the minimum element in a collection (which admittedly LINQ would make a piece of cake anyway) – you typically want to remember the minimum value and the index of the element which had that value. Coupling the two into a real type is too much effort, but with an anonymous type? There are possibilities there. I’ll have to see how it reads and how maintainable it is before I can really pass judgement.
- Implicitly typed arrays: yes and no. Part of me screams that they’ll make things easier, and another part screams that they’ll make things harder to read at a glance. I think “use with care” is the watchword here.
- Implicitly typed local variables: Hmm. Very much “use with care”. I don’t want to see variables being implicitly typed all over the place, as they are in the specifications. They’re obviously necessary for anonymous types, but I’m not sure about their use outside that situation. There’s been a fair amount of discussion about this on the newsgroups, with no clear “winning” side. I think we’re all guessing really – we need to use this in anger, and wait for a year or two to see what the maintenance implications are.
- Query expressions themselves: not sure. I can see why it’s nice to have them in the language – particularly having gone through an HQL (Hibernate Query Language) run/see error/debug/repeat cycle a few times, but at the same time it feels like it’s going a bit overboard. I think I’ll need to see real examples from real code in both query syntax and “dot notation” (calling normal methods) before making up my mind on this. I should note that this attitude is significantly more “friendly” towards query expressions than my first reactions, which were along the lines of “no, no, no, get that SQL out of my C# code.” That suggests I may well come to like it over time – but I’m not quite there yet.
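To make the delegate versus expression-tree distinction from the lambda bullet concrete, here’s a minimal sketch, assuming the lambda and extension method syntax in the draft spec survives into the final compiler. `StringExtensions.Truncate` is a made-up example of the “System.String doesn’t have a handy method” temptation, not anything in the framework.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical extension method: the "String lacks a handy method, so bolt
// one on" case. Not part of the framework.
public static class StringExtensions
{
    // The 'this' modifier on the first parameter lets callers write
    // text.Truncate(5) as if Truncate were an instance method of string.
    public static string Truncate(this string text, int maxLength)
    {
        if (text == null)
        {
            throw new ArgumentNullException("text");
        }
        if (maxLength < 0)
        {
            throw new ArgumentOutOfRangeException("maxLength");
        }
        return text.Length <= maxLength ? text : text.Substring(0, maxLength);
    }
}

public static class LambdaDemo
{
    // Assigned to a delegate type: the lambda is compiled to ordinary code.
    public static readonly Func<int, bool> IsEven = x => x % 2 == 0;

    // Assigned to Expression<TDelegate>: the compiler instead builds a tree of
    // Expression objects describing the lambda, which a query provider could
    // inspect (say, to translate into SQL) rather than execute.
    public static readonly Expression<Func<int, bool>> IsEvenTree = x => x % 2 == 0;

    static void Main()
    {
        Console.WriteLine(IsEven(4));                  // True
        Console.WriteLine(IsEvenTree.Body.NodeType);   // Equal
        // The tree can still be compiled into a delegate on demand.
        Func<int, bool> compiled = IsEvenTree.Compile();
        Console.WriteLine(compiled(3));                // False
        Console.WriteLine("Hello, world".Truncate(5)); // Hello
    }
}
```

The same lambda text produces executable code or inspectable data purely depending on the target type, which is why the call site alone doesn’t tell you which one you’ve got.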
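The minimum-element case from the anonymous types bullet can be sketched as follows, assuming the draft syntax for anonymous types, implicitly typed locals and implicitly typed arrays. `MinimumDemo` and `DescribeMinimum` are hypothetical names; the point is that the value and its index live in one variable instead of two that are linked only by naming.

```csharp
using System;

public static class MinimumDemo
{
    // Finds the smallest element and its index, returning them as "value@index".
    // Ties keep the earliest index, because the comparison is strict.
    public static string DescribeMinimum(int[] values)
    {
        if (values == null || values.Length == 0)
        {
            throw new ArgumentException("values must be non-empty", "values");
        }

        // 'var' is required here: the anonymous type has no name we could write.
        var best = new { Value = values[0], Index = 0 };
        for (int i = 1; i < values.Length; i++)
        {
            if (values[i] < best.Value)
            {
                // Anonymous types are immutable, so create a new instance. The same
                // property names, types and order give the same compiler-generated type.
                best = new { Value = values[i], Index = i };
            }
        }
        return best.Value + "@" + best.Index;
    }

    static void Main()
    {
        // new[] { ... } is an implicitly typed array (int[] here).
        Console.WriteLine(DescribeMinimum(new[] { 5, 3, 8, 1, 9 })); // 1@3
    }
}
```

Because the anonymous type can’t be named, it can’t escape the method as itself; converting it to a string at the end is one way of keeping its use strictly local.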
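For a feel of query syntax next to “dot notation”, here’s a small side-by-side, assuming the `from`/`where`/`orderby`/`select` keywords and the `Where`/`OrderBy`/`Select` methods described in the draft spec make it to the final version. The list of names is invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryDemo
{
    // Query expression syntax.
    public static List<string> ShortNamesQuery(IEnumerable<string> names)
    {
        var query = from name in names
                    where name.Length <= 4
                    orderby name
                    select name.ToUpperInvariant();
        return query.ToList();
    }

    // "Dot notation": the method calls the query expression above is
    // translated into by the compiler.
    public static List<string> ShortNamesDot(IEnumerable<string> names)
    {
        return names.Where(name => name.Length <= 4)
                    .OrderBy(name => name)
                    .Select(name => name.ToUpperInvariant())
                    .ToList();
    }

    static void Main()
    {
        string[] names = { "Jon", "Anders", "Erik", "Bob", "Eric" };
        Console.WriteLine(string.Join(",", ShortNamesQuery(names).ToArray())); // BOB,ERIC,ERIK,JON
        Console.WriteLine(string.Join(",", ShortNamesDot(names).ToArray()));   // BOB,ERIC,ERIK,JON
    }
}
```

Since the query form is just syntactic sugar over method calls, the choice between the two is about readability rather than behaviour.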
Concerns
I’m worried about C# expanding this quickly. I don’t know what the C# 3.0 timeframe is, but C# 2.0 isn’t out of the door yet, and we don’t know how developers are going to cope with generics in the real world yet. Introducing several new and pretty major language features at this stage seems premature – although of course they’re not really being introduced just yet. Java has tended to go in the opposite direction, only allowing change very slowly. This hasn’t always had positive results (some aspects of generics in Java are truly awful compared with .NET generics, although Java’s approach has its advantages), but it has generally given people a lot of time to think about things and give feedback. Hopefully the reason for the C# 3.0 draft specs being made available at this stage is to get as much feedback as possible as early as possible.
Having said I’m worried about C# growing too quickly, there are ways in which I wish it would grow which don’t seem to be addressed at all – including ones which are present in prototype form in MS research labs (Spec# in particular). Things I’d like to see:
- Simpler property definition, for properties which really do just set a field and return it – making thread-safety available by specifying a lock to apply would be nice too, if it didn’t interfere with the simplicity too much.
- Design-by-contract – not so much for the “provability” side of things (which I’m sure would be great too, but which I have no direct experience of) but more for getting rid of all the irritating code I need to write just to verify arguments etc. This is an ideal target for machine-generated code. Proving the result side of the contract would be great too – not just “this parameter must not be null” but “the result of this operation is never null”.
- Aspect-oriented programming – in a very similar vein to design-by-contract, I’m sure that AOP could have great benefits for cross-cutting concerns, and would work much better as a language construct than in libraries which need to do nasty code manipulation.
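To make the first two wishes concrete, here’s a sketch of the kind of code that currently has to be written by hand. `Account` and its members are purely illustrative; what matters is the repetitive lock-and-field property and the argument checks. The `Main` method also uses a C# 3.0 object initializer, assuming the draft syntax holds.

```csharp
using System;

// Hypothetical class showing hand-written boilerplate.
public class Account
{
    private readonly object padlock = new object();
    private readonly string owner;
    private decimal balance;

    public Account(string owner, decimal openingBalance)
    {
        // Argument checks: exactly the code design-by-contract could generate.
        if (owner == null)
        {
            throw new ArgumentNullException("owner");
        }
        if (openingBalance < 0)
        {
            throw new ArgumentOutOfRangeException("openingBalance", "Must not be negative");
        }
        this.owner = owner;
        balance = openingBalance;
    }

    // Never null, but only because of the constructor check; nothing in the
    // signature tells callers (or the compiler) about that guarantee.
    public string Owner
    {
        get { return owner; }
    }

    // A property that just gets and sets a field, plus a lock for thread-safety.
    public decimal Balance
    {
        get { lock (padlock) { return balance; } }
        set { lock (padlock) { balance = value; } }
    }
}

public static class Program
{
    static void Main()
    {
        // Object initializer: Balance is set straight after the constructor runs.
        var account = new Account("Jon", 10m) { Balance = 25m };
        Console.WriteLine(account.Owner + ": " + account.Balance); // Jon: 25
    }
}
```

Roughly half of the class is plumbing that says nothing about what an account actually is, which is the kind of code that language support for simple properties and contracts would remove.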
I’m sure there are more that I’ve thought of over the years, but these are my biggest gripes at the moment. Compared with the changes which are being made, they’re possibly relatively small, too. You can bet that I’ll be asking the C# team about the possibility of their inclusion while I’m at the MVP summit! (Don’t expect any results to be posted here though – I’m afraid it’ll almost certainly all be under NDA.)
Conclusion
However far away C# 3.0 may be, it has great promise – as well as a few big holes which the over-zealous developer wishing to use new features wherever possible may end up falling into. We’ll see how things shape up over time. My battery is running low, so until I’m near power again, goodbye…