A Macro Story

This is a story about a real macro used by a number of programmers at the Institute for the Learning Sciences for years. It is an example of how not to design and implement a macro.

The creator of the macro was an adept Lisp programmer. He understood how macros worked, how to use backquote to define them, and so on.

The Problem

A demo of a fairly mature program crashed. Nothing major had been added to it, but a programmer had finished cleaning up its code. After a fair amount of bug hunting, the programmer found the change that broke the code. In essence, he had replaced

            (defmacro my-wait-for (n)
              `(wait-for ,n))

with the seemingly more direct

            (defun my-wait-for (n)
              (wait-for n))

But, with the new version, (my-wait-for 12) never waited! Why not?

The Cause of the Problem

The problem lay not with my-wait-for but with the macro it called, wait-for. There were three ways to use it:

(wait-for 12): waits for 12 seconds
(wait-for #'foo): calls (foo) until it returns a non-NIL value
(wait-for (baz x y)): evaluates (baz x y) until it returns a non-NIL value
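The story does not show wait-for's source. The following is only a hypothetical reconstruction that reproduces the three calling formats above, to make the dispatch visible. The variable *poll-interval* and its value are assumptions. The key point is that the macro chooses a branch by inspecting the unevaluated argument form, before anything is evaluated.

```lisp
;;; Hypothetical reconstruction of WAIT-FOR. It mimics the three calling
;;; formats described in the story; it is not the original code.

(defparameter *poll-interval* 0.1
  "Seconds to sleep between tests of the waiting condition (assumed).")

(defmacro wait-for (form)
  (cond ((numberp form)
         ;; (wait-for 12): only a literal number in the source reaches
         ;; this branch. It means "sleep that many seconds".
         `(sleep ,form))
        ((and (consp form) (eq (first form) 'function))
         ;; (wait-for #'foo): the reader turns #'foo into (FUNCTION FOO),
         ;; so call that function until it returns non-NIL.
         `(loop until (funcall ,form) do (sleep *poll-interval*)))
        (t
         ;; Anything else, including a variable name, is treated as an
         ;; expression to re-evaluate until it returns non-NIL.
         `(loop until ,form do (sleep *poll-interval*)))))
```

Note that (wait-for *default-wait-period*) falls into the last branch, because the macro sees a symbol, not a number.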

The macro version of my-wait-for expanded (my-wait-for 12) into (wait-for 12), which fits the first calling format.

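The function version of my-wait-for, on the other hand, calls (wait-for n). Because wait-for is a macro, it receives the unevaluated form n, a symbol, not the number 12 that n will later hold. A symbol fits the third calling format, so the call is interpreted as "wait until the expression n is non-NIL." When n is 12, it is already non-NIL, so no waiting occurs.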
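The difference can be demonstrated without any waiting at all. This sketch uses a hypothetical macro, show-wait-branch, that inspects its argument form the same way wait-for is described as doing, but expands into a keyword naming the branch it would take instead of waiting. The names my-wait-for-macro and my-wait-for-function are likewise hypothetical labels for the two versions of my-wait-for.

```lisp
;;; SHOW-WAIT-BRANCH (hypothetical) applies WAIT-FOR's test to its
;;; argument form and expands into a keyword naming the chosen branch.
(defmacro show-wait-branch (form)
  (cond ((numberp form) :seconds)
        ((and (consp form) (eq (first form) 'function)) :function)
        (t :expression)))

;; Macro version: (my-wait-for-macro 12) expands into
;; (show-wait-branch 12), so the inner macro sees the literal 12.
(defmacro my-wait-for-macro (n)
  `(show-wait-branch ,n))

;; Function version: the inner macro only ever sees the symbol N.
;; Its expansion, :EXPRESSION, does not even refer to N's value.
(defun my-wait-for-function (n)
  (declare (ignorable n))
  (show-wait-branch n))
```

Here (my-wait-for-macro 12) returns :SECONDS, while (my-wait-for-function 12) returns :EXPRESSION.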

The Cause of the Cause of the Problem

(wait-for 12) looks like a function call. It looks like it evaluates its argument. But it doesn't. Instead, it examines the unevaluated argument form to decide what to do.

wait-for is a macro because its designer violated the principle "one function to a function." He tried to do three waiting tasks with one interface.

Unfortunately, the first task, "wait for number seconds," shares a common calling format with the third task, "wait for expression to be non-NIL." The rule "do the first task only if the number is explicitly given" is unintuitive, easy to forget, and a pain to live with. For example, it makes it impossible to wait for the number of seconds stored in a variable by saying

        (wait-for *default-wait-period*)

The Fix

The fix is not to write a comment that says "use only literal numbers for wait times." Where would the comment go? If we put it on the definition, it won't be seen by someone modifying a call to wait-for. And it's pretty unlikely that programmers using wait-for will remember to put a comment on every call to wait-for that says "don't replace this number with a variable!"

Nor is it to change wait-for to evaluate its argument and, if the result is a number, do the first task, otherwise do the third. First, this rule counts on programmers knowing and remembering that, in wait-for, unlike elsewhere in Lisp, a number is not treated like other non-NIL values. Second, exactly what is the rule? If exp in (wait-for exp) returns NIL the first few times, and then returns 3, is that a signal to stop waiting or a signal to wait three more seconds?

The appropriate fix is to have two waiting forms, e.g., (wait-for number), which is a function, and (wait-until expression), which is a macro. Each waiting form can then have simple unsurprising semantics.
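A minimal sketch of that split, assuming that polling with sleep is acceptable for the expression case. The variable *poll-interval* and its value are assumptions, not part of the original code. wait-for is an ordinary function, so its argument is evaluated like any other; wait-until is a macro only because it must re-evaluate its test form.

```lisp
(defparameter *poll-interval* 0.1
  "Seconds to sleep between tests in WAIT-UNTIL (an assumed value).")

(defun wait-for (seconds)
  "Wait SECONDS seconds. SECONDS is evaluated like any function argument."
  (check-type seconds (real 0))
  (sleep seconds))

(defmacro wait-until (test)
  "Re-evaluate TEST until it returns non-NIL, then return that value."
  (let ((result (gensym "RESULT")))
    ;; FOR var = form (with no THEN) re-evaluates TEST on every iteration.
    `(loop for ,result = ,test
           until ,result
           do (sleep *poll-interval*)
           finally (return ,result))))
```

With this split, (wait-for *default-wait-period*) behaves like any other function call, and the old (wait-for #'foo) becomes (wait-until (foo)).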