Macros, and languages within languages

February 8, 2008 jonskeet 12 Comments

Ian Griffiths mailed me about macros, and explained how LISP macros were very different to C/C++ macros, working at a language level instead of at a text level. I won’t pretend to understand all about what would be possible and what wouldn’t, but Ian gave a good example: query expressions in C# 3. Instead of being part of the language itself, they could apparently have been written as macros, if C# supported them. Then if you wanted to have similar support for different forms of expression, you could just write your own macro library.

Assuming that’s what people are actually requesting, I can certainly see the attraction – but I’d still prefer it if C# didn’t go down that route. I’ll go back to C++ for the guts of the reason why, but it’s not really about macros at this point. It’s about building your own language. Once, someone told me that C++ wasn’t a language – it was a meta-language; no-one used “bare” C++, they worked out their own language made from the building blocks of normal C++, and then used that.

That may or may not be true – or more likely, it’s true in some places but not others – but it scares me as an idea. I’m not going to claim I know every nuance of C#, but it’s pretty rare that you’d throw a line at me without it being reasonably clear what’s going to happen and why, at the language level. Extension methods might mean a bit more information is required as to where a particular method comes from, but it doesn’t take a lot of digging to see what’s going on.

Now imagine that C# 3 didn’t include query expressions, but that someone had come up with them as a macro library. It’s not an insignificant amount of effort to learn what’s going on there, and how it all maps to normal method calls, potentially with expression trees as arguments instead of delegates. Until you understand what’s going on at a reasonably deep level, you can’t really make any firm decisions as to what code including a query expression will do. (Heck, that’s one of the premises of the book: you should really know this stuff, or at least be aware of it.)

That’s fine when there’s a single macro library used globally, but now imagine every company has their own – or worse still, has a bunch of them grabbed from Code Project, possibly including a load of bugs. Most of us aren’t accomplished language designers, and I suspect there’d be an awful lot of macro libraries out there which weren’t quite thought through enough – but were still useful enough to be attractive. They’d become magnets for code warts.

It’s hard enough when you change company to work out what 3rd party libraries are in use, how they’re being used, what the coding conventions are etc. It would be much worse if I had to learn another flavour of C# itself each time. I’m already worried that developers are picking up C# 3 without having a firm enough grasp of C# 2 – and that’s when there’s just a progression within a single language.

I know this all sounds patronising and/or elitist and/or “nanny state” applied to programming languages – but it’s how I feel nonetheless. I just don’t think we (as a development community) are mature enough to handle that sort of power without it turning into a blood bath. This sort of thing sounds fabulous for hobby and research development, and would probably be great in the hands of the best few companies in the world – but I don’t think it’s a good idea for the mainstream.

Okay – time to hear why I’m wrong :)

12 thoughts on “Macros, and languages within languages”

Bruce Wood says:

February 8, 2008 at 7:25 pm

One of the things I love about C# is that it does NOT have macros. I spent 15 years programming in C, and most of those 15 years hating macros.

I didn’t hate them for any intrinsic quality they had; I hated them because they were endlessly abused. Even the innocuous-looking little MIN and MAX macros, which pretty much everyone wrote at every company, had their little trap for the unwary.
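For anyone who hasn’t been bitten by it, here is a minimal sketch of the MIN/MAX trap in C. The `MAX` definition is the common textbook form rather than any particular company’s version, and the function names are hypothetical, chosen just for this illustration.

```c
/* The classic, innocuous-looking macro: an argument may be evaluated twice. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* A real function has no such trap: each argument is evaluated exactly once. */
static inline int max_int(int a, int b) { return a > b ? a : b; }

/* Returns MAX(i++, j) and reports the final value of i through i_after. */
int macro_max_with_side_effect(int i, int j, int *i_after)
{
    /* Expands to ((i++) > (j) ? (i++) : (j)): i++ can run twice. */
    int result = MAX(i++, j);
    *i_after = i;
    return result;
}

/* Same call shape, but through the function: i++ runs exactly once. */
int function_max_with_side_effect(int i, int j, int *i_after)
{
    int result = max_int(i++, j);
    *i_after = i;
    return result;
}
```

With `i = 5` and `j = 3`, the macro version returns 6 and leaves `i` at 7, because `i++` is evaluated once in the comparison and again in the chosen branch. The function version returns 5 and increments `i` only once.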

If C macros are annoying, then C++ is macro hell. People ten times more clever than I’ll ever be designed libraries full of devilishly complicated macros, which were great when they worked, and impossible to debug when they didn’t.

Figuring out how someone else’s regular, object-oriented code works is difficult enough, thanks. When text substitution gets involved, I just turn out the lights and go home.

I don’t think it’s elitist or patronizing to dumb down a language. There’s a place for that, and C# is the place. It’s the C-style language for the masses, for business, where what counts is not how clever your code is but getting the job done, and leaving something that the next guy can maintain. Too many programmers can’t resist the temptation to be clever (“Look what I made!”), and the last thing I want when I have business deadlines on my back is to run across someone else’s extreme cleverness.

As I said, one of the things I like about C# is that it gets in the way of that kind of cleverness, both on the part of others and on my part (when I occasionally lose sight of what really matters).

If you want a language for clever, tricky, tight code, then there’s always C++… fill yer boots!

Barry Kelly says:

February 8, 2008 at 8:31 pm

If you replace the phrase ‘macro library’ with ‘library’, ‘language designers’ with ‘library designers’, etc., one could make a similar argument. So what refutes that argument for libraries but not for macros?

I do agree that you can make a bigger fool out of yourself with more powerful tools. That brings in another point: it’s a *lot* harder to build macro support with good tooling, such as intellisense and, in particular, debugging.

My instinct is that the language that defines the macros needs to be a separate language, or at least a very distinct module, from the main code that the macro modules parse and translate. Writing macros shouldn’t be done idly. Also, they ought to be written with some kind of pattern-matching language, or other declarative scheme, such that they can be strongly machine-checked for self-consistency: for example, they could be modeled as a tuple of a parser->tree function and a tree->tree function, and type-checked such that the transformations contain no typing violations.

As you may or may not be able to tell, I’ve been thinking about this for some time. I’ve tried to get the ideas worked out in code a few times before, but the size of incidental tasks has kept me back. With the DLR and composable parsers (especially parser combinators) being available these days, I can focus on the core ideas more. All I need now is more spare time :)

skeet says:

February 8, 2008 at 11:56 pm

Barry: while I can see where you’re going comparing “different language” with “different libraries” I believe there *is* a fundamental difference.

If you don’t know what a method does, it’s reasonably easy to look that up in isolation, knowing the manner in which it’s called. Whole new language constructs are harder to cope with.

Compare it with natural language – if I don’t know the meaning of a word, I can look it up, but if someone starts using a completely different grammar that’s a lot harder to understand, in my view.

A lot of this *is* just gut feeling, I admit – but it’s a pretty strong feeling.

Eric Lippert says:

February 9, 2008 at 9:38 am

It is instructive to think about why macros are so bad. It’s not just because they can be abused and the tooling support tends to be dismal, though that is certainly true.

But at a deeper level there is a fundamental problem. The macro language of C++ is the metalanguage of C++. The metalanguage is (a) a weak language, and (b) has almost nothing in common with the language. The metalanguage is basically a search-and-replace string matcher, the language is a relatively modern OO language, they have nothing in common so there is a huge mismatch between them.

It is nigh impossible to write a C++ program that manipulates C++ programs. If you want to write a program that manipulates C++ programs, in practice you can only write a C++ macro-language program, and that language is not really much of a programming language compared to C++.
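A minimal C sketch of that mismatch, using hypothetical `SQUARE` macros: the preprocessor pastes the argument text in verbatim, with no notion of the operator precedence of the language it is supposedly extending.

```c
/* Pure text substitution: the argument is pasted in unchanged. */
#define SQUARE_NAIVE(x) x * x

/* Defensive parentheses fix the result, but the preprocessor still
   understands nothing about the expression it is rewriting. */
#define SQUARE(x) ((x) * (x))

/* Expands to: a + b * a + b */
int square_naive_of_sum(int a, int b) { return SQUARE_NAIVE(a + b); }

/* Expands to: ((a + b) * (a + b)) */
int square_of_sum(int a, int b) { return SQUARE(a + b); }
```

With `a = 1` and `b = 2`, the naive version computes `1 + 2 * 1 + 2`, which is 5 rather than 9. The parenthesized version only gets it right because the macro author anticipated the problem, not because the preprocessor knows anything about C.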

Now switch gears and think about expression trees for a bit.

Expression trees allow you to take a chunk of C# code — a small chunk, just an expression — and have the compiler automatically give you an object which represents that expression.

That object can then be examined, manipulated, transformed, BY A C# PROGRAM. The resulting object can then be turned into a delegate which does not just _represent_ that functionality, it _implements_ it.

In short, the metalanguage for C# _expressions_ is C# and the form that the metalanguage takes is programs that create expression tree lambdas and then visit them.

Now think about what we could do if we had not just expression trees but statement trees.

Now think about what you could do with type declaration trees.

Now think about what you could do if you had programmatic access (programming, as always, in C# itself) to more than just the finished expression/statement/declaration tree provided by the semantic analyzer. Suppose you could manipulate all stages of compilation: lexing, parsing, multi-phase semantic analysis and code generation.

At this point, indeed, you are potentially no longer analyzing anything that even looks like C#. But that’s not necessarily the negative you describe. Rather, what I am describing is an engine for quickly and efficiently writing your own domain-specific languages, which are transformed into C# programmatically. You get your own language, but it is backed by the power of the C# semantic analyzer, code generator, etc.

Now think about what you could do if C#, VB, F#, JScript, Python and Ruby all had the same APIs to do all of the above and you could freely mix them at will.

To start with, the entirely artificial notion of “C# project” and “VB project” goes away forever. You just have projects with source code, and whatever compilers (and metacompilers!) you need all work together. But that’s trivial compared to the kind of work you could do with such a system.

I think I’ve just laid out about two decades of work there, and I’m certainly not guaranteeing that any of it will ever be done. We have lots of priorities for the language(s), and metaprogramming is just one of them, and not even THAT high on the list. But this is the direction that I would like to go as we explore metaprogramming in C#.

Ollie Riches says:

February 10, 2008 at 7:35 am

I remember having to debug and investigate problems related to macros in C/C++ ten years ago, and it wasn’t pleasant…

IMO macros wouldn’t be used as a tool to help develop DSLs; they would be overused by programmers who prefer a procedural style. I also believe that fluent interfaces, mature class design and design by contract can lead to better DSL implementations.

We all try and apply KISS, YAGNI, DRY etc to software design and build, why not apply these to language design?

JaredPar says:

February 14, 2008 at 10:37 pm

I have mixed opinions on macros. On even days I feel they’re an elegant abstraction that adds powerful debuggability and increases code readability. On odd days I feel they are pure evil and should be abolished.

Macros are powerful and useful when used correctly and responsibly. Unfortunately they’re just as powerful in the hands of a novice as that of an expert. They take little effort to write but a lot of effort to write correctly.

Then again you could argue the same about most of C++.

Ian Griffiths says:

February 18, 2008 at 2:41 am

This makes me think of the history of string classes in C++. Back in the days before the C++ standard included a string class, almost every C++ project had its own string class. The only exceptions were the projects which had several of their own string classes.

Writing your own string class was practically a rite of passage for C++ developers in the 1990s. These days I think using your own would look highly suspect. Some projects still end up using several, but usually only for historical reasons – e.g., the need to deal with APIs that expect COM strings or C-style strings, or whatever.

The situation is still unsatisfactory, because multiple string types have always existed in C++ and always will. But the libraries available are now sufficiently good that most people don’t tend to write their own any more. (At least, not in my experience…) And the history behind the mess is that for many years we didn’t have one good library.

And more generally, considering things like collection classes, to me it looks like the tendency for people to write their own versions of common utility classes is in inverse proportion to the quality of the available standard libraries.

I would expect macro libraries to fall into the same pattern.

(And yes, I know I was the one who pointed out that LISP-style macros are massively better than C/C++-style macros. But that doesn’t mean I have concluded that I necessarily want them in C# just yet…or ever. I just felt you had given them a slightly harsh write-up. And while it’s true that I find query expressions unsatisfactory (it *does* look to me like macros might have been a more elegant solution to that problem), I’m not arguing that macros should have been added to C# 3.0. I think there are two sides to the argument, and the reasons not to add macros probably outweigh the reasons to add them at this stage.)

Yes, I think you’re absolutely right – the world and his or her dog will invent a million really awful macro libraries, some of which will inexplicably become recommended articles on Codeproject. (It scares me to think what the articles that *don’t* get recommended must be like.)

But does that mean everyone would build their own query expression languages?

I think that would depend on how good the ones built into the libraries are. If you get things right first time, people are less likely to build their own – I don’t think I’ve ever come across a custom string class in real Java or C# projects, for example.

However…the problem with LISP-style macros is that we have a lot less experience with them than we do with Java/.NET-style class libraries. So I would expect the degree of maturity in a set of macro libraries written today to be more MFC than .NET Framework.

And if I had to hazard a guess about where it would go wrong, I’d say: extensibility. It’s one thing to design a macro library to support a DSL that handles exactly the scenarios you intend it to support. It’s quite another to build one that does all that, and which can also be extended elegantly by its users.

The design of just the fixed DSL is challenging enough – LINQ query expressions must have taken a lot of effort as it is. And yet I’ve run into problems with it already. (Composing expression trees using query expressions seems to be broken for no obvious reason, for example.)

I suspect the majority of experience that *does* exist with macros is at the one-person-project scale. LISP really shines here – it’s great at letting an individual be fantastically productive. But it also acquired a bit of a reputation for “write-only code”… I’m unlikely to be very productive trying to work with code written in your private LISP world and vice versa.

Moreover, I’m not aware that anyone has convincingly demonstrated viable techniques for using LISP-like macros in larger-scale projects. And I think if you track the point where OO programming started to move out of the research labs and into the mainstream, even from that point it took almost 20 years to get from the level where individual developers could see clear benefits from objects and make effective use of them to the point where you could write something like the .NET Framework Class Library Guidelines. If anyone thinks macros will be any quicker to mature, I think the onus is on them to explain why…

So what we really need is for some other language to implement this first, go mainstream, then discover all the problems, so we can see how it should have been done. :)

skeet says:

February 18, 2008 at 7:15 am

Ian: So if I understand you right, you want Java 7 to include macros? :)

(Seriously – thanks to Eric and Ian for putting meat on the bones of my “based on inexperience” post :)

Jon

Kaveh Shahbazian says:

March 14, 2008 at 7:07 pm

Type-safe syntactic extension macros have already been developed successfully in .NET languages like Boo and Nemerle, and F# has a nice version of them too! These macros have nothing to do with C macros, which are just a kind of automated string replacement in code.
These projects have already provided enough background for actually implementing syntax extension macros for C#.
I think it is possible to have them without giving up the type safety and debugging support provided by the .NET Framework.

Cheers!

Keith Hill says:

October 5, 2008 at 3:14 pm

I’m with Eric on this. I think the idea warrants further investigation. The following podcast is on Converge, a language built specifically to see what a modern language that supports compile-time meta-programming might look like:

http://se-radio.net/tags/converge

Keith Hill says:

October 14, 2008 at 10:49 am

FWIW the main problem I would like to see solved is the elimination of boilerplate code. You know this stuff: it is the code you copy/paste because it is too much of a hassle to type it all in again. Take a class with 20 properties that all raise the INotifyPropertyChanged.PropertyChanged event. Each property implementation follows the same pattern, which lends itself to copy/paste implementation, which unfortunately also leads to bugs (e.g. you forget to change the name of the property when raising the event). This is an area where C/C++ macros helped, but I hate that they are ignorant of the language, which is why I think the metaprogramming approach is promising and worth investigating.
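C has no INotifyPropertyChanged, but the kind of boilerplate Keith describes is easy to show there. A minimal sketch, assuming a hypothetical `raise_property_changed` function standing in for raising the .NET event; the struct and macro names are also made up for the illustration.

```c
#include <string.h>

/* Hypothetical stand-in for raising PropertyChanged:
   it simply records the name of the last property that changed. */
static const char *last_changed = NULL;

static void raise_property_changed(const char *property_name)
{
    last_changed = property_name;
}

/* Generates one setter per field. #name stringizes the field name, so the
   notified name always matches the field that was actually assigned. */
#define DEFINE_NOTIFYING_SETTER(owner, type, name)              \
    void owner##_set_##name(struct owner *self, type value)    \
    {                                                          \
        if (self->name != value) {                             \
            self->name = value;                                \
            raise_property_changed(#name);                     \
        }                                                      \
    }

struct person {
    int age;
    double height;
};

/* Expands to person_set_age and person_set_height. */
DEFINE_NOTIFYING_SETTER(person, int, age)
DEFINE_NOTIFYING_SETTER(person, double, height)

const char *last_changed_property(void) { return last_changed; }
```

Because the notified name is derived from the field name itself, the copy/paste bug of raising the event with the wrong property name can’t happen. The flip side is exactly Keith’s complaint about macros being ignorant of the language: the macro applies `!=` to whatever type it is given, so for a `char *` field it would compare pointers rather than string contents, and nothing would warn you.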

Jason Y says:

March 19, 2009 at 12:24 pm

Where do you draw the line between macros and other forms of compile-time programming?

The purpose of macros is compile-time programming. Compile-time programming is good. Abolishing type safety in the name of compile-time programming is bad. Adding a completely new language to an existing language in the name of compile-time programming is bad (just use a separate tool altogether!).

Generics in C# are compile-time programming with tolerably few special / new language features, and which maintain type-safety.

Why not run transitively immutable methods at compile time for optimization when all the input required is available at compile time? Why not extend such a feature to take in parse trees and return parse trees (in a type-safe way), so that you have macros in the original language, built into the compiler? Probably because this has a myriad of implications I have not thought of yet, and would be difficult to implement; I don’t know. Naturally, you would want a language with a minimalist syntax when implementing such a feature; C# is _not_ a good candidate, imo. But that’s the idea behind good macros, I think – programmatic control at compile time, not necessarily DSLs.

Imho, DSLs are facilitated more by unifying the ideas of operator and method _a la_ Scala and Smalltalk, and by extension methods _a la_ C# 3 (which also facilitate fluent APIs).


Jon Skeet's coding blog
