1997 Qual

Programming Question

Analysis


How to Grade Your Answer

I graded on the following items:

These items and possible answers are discussed in detail below. The best possible score is 10 points. I graded the 5 answers submitted this year as follows:

In other words, a recursive function that simply calculated maximum list length and nesting depth and scaled the results using user-adjustable weights for length and depth got 5 points out of 10, assuming the Lisp code was competent otherwise. There needed to be more flexibility than that.

Analysis Point 1: Flexibility

Program for flexibility. It should be easy to change the scoring algorithm and for a Lisp programmer like me to specify context-specific rules, such as "deprecate nesting more when it occurs in the tests of cond's and if's and do's.

There were two techniques used to get flexibility:

Another viable option would be


Analysis Point 2: Metrics

difference-one-do is significantly worse than replace-hex-escape even though it only has a few more atoms and doesn't nest as deeply.

There was several hours of empirical experience hidden in this one sentence. All but one qual-taker used the following obvious metrics:

Unfortunately, they're both terrible, as hinted at by "doesn't nest as deeply". If you look at the table below, you'll see that

Multiplying unreliable values with weights only makes things worse.

Two other simple metrics do a lot better:

The first metric was hinted at by " has a few more atoms" and was mentioned by one qual-taker. The table below shows a pretty good correlation with reasonable spread for both of these. difference-one-do does indeed have fewer atoms than replace-hex-escape, but if you don't count the parameter list, which isn't really the function definition's fault, then difference-one-do has a few more atoms than replace-hex-escape.

The second metric penalizes parentheses, e.g.,

These differences add up fast.


Baseline Metric Results

Here's a table showing how the above metrics score the example pieces of code.

MAX-LENGTH

MAX-DEPTH

ATOM-COUNT

LIST-COUNT

REPLACE-HEX-ESC

5

10

36

58

DIFFERENCE-ONE-DO

4

8

42

71

RIEGER-SIMPLE

5

12

58

96

INTERSECTP

6

9

111

175

STREAM-SUBST

5

8

89

138

As a further test, when I applied list-count to several of my Lisp files that I use a "model" code in CS C25, most functions scored only 20 to 60. Since the scores for the example "too long" functions start at 70 and quickly pass 100, list-count seems like the very reasonable baseline metric.


Baseline Metric Code

Lisp code to do the above metrics is pretty simple. Most of the qual-takers wrote pretty clumsy recursive code.

(defun max-length (form)
  (cond ((atom form) 1)
        (t (reduce #'max form
                   :key #'max-length
                   :initial-value (length form)))))
 
 
(defun max-depth (form &optional (depth 0))
  (cond ((atom form) depth)
        (t (max (max-depth (car form) (1+ depth))
                (max-depth (cdr form) depth)))))
 
(defun atom-count (form)
  (cond ((atom form) 1)
        (t (reduce #'+ form :key #'atom-count))))
 
(defun list-count (form)
  (cond ((atom form) 0)
        (t (reduce #'+ form
                   :key #'list-count
                   :initial-value (length form)))))

 


Handling the special cases

A Data-Driven Approach

My original answer for this question was data-driven. It had a table associating a scoring function with Lisp function names. For example, the following scoring function(s) for cond (one of the more complex cases), penalizes forms in the tests of each branch nested over a certain amount and penalizes long branches.

(setf (score-fn 'cond)
      #'(lambda (form)
          (reduce #'max (rest form) :key 'score-branch)))
 
(defun score-branch (branch)
  (max (penalize (max-depth (first branch)) 1 2)
       (penalize (length (rest branch)) 1 2)))
 
(defun penalize (n &optional (threshold 1) (factor 2))
  (+ 1 (* (max 0 (- n threshold)) factor)))

Implementing a data-driven solution was quite easy. Here was the top-level scoring function. Note that implemeting function-too-long-p simply means "max-score > threshold."

(defun max-score (defn)
  (cond ((atom defn) 1)
        (t (reduce #'max defn
                   :initial-value (score-form defn)
                   :key #'max-score))))
 
(defun score-form (form)
  (cond ((atom form) 1)
        (t (funcall (get-scorer (car form)) form))))

However, I liked the pattern-directed rules two qual-takers used, and especially liked the suggestion by one to use a syntax similar to Scheme's define-syntax special form. Patterns are certainly easier to maintain than Lisp code. Furthermore, I've never tried implementing define-syntax style of patterns, so that's what I've implemented here. The approach is inspired by one of the qual answers, though the code is totally different.

You can test my answer by calling score-file on a file of Lisp code. The examples in the original question are here.

A Pattern-Driven Approach

A rule pattern is basically a template for a Lisp expression, with numbers for subexpressions. The numbers are penalty weights. Whatever score the matching code is given will be multiplied by the corresponding weight.

The special symbol etc in a pattern means "repeat the previous pattern until the end of the list." (This is equivalent to ... in define-syntax in Scheme, but Common Lisp doesn't allow ... as a symbol name.) This kind of syntax is not as general as segment variables in a full pattern matcher, but it's simpler to use and implement, and provides all the generality we need here.

Here's three rules, to show the format:

(defparameter *score-pats* 
  '((if 2 1 etc)
    (cond (2 1 etc) etc) 
    (let ((1 2) etc) 1 etc)
    ))

The first rule says that when scoring IF forms, double the score in the test position. The second rule says to double the score for the test in each branch of a COND form. The third rule says to double the score of the value forms in LET variable specifications.

Matching

The matcher for these patterns is similar to standard pattern matchers, where

Life is simpler in that we never need to check the bindings during the match, just save them for later. The only complexity is handling etc and this is still simpler than handling full-fledged segment variables.

As usual, we need to make sure that we can distinguish a successful match with no bindings from a failed match. Here, we'll use a binding list of the form (pat binding binding ...) where pat is the top-level pattern that matched. Each binding will be a pair whose car is some penalty weight from the pattern and whose cdr is the subexpression that the weight matched.

Here's an example result:

? (match-score-pat '(if 2 1 etc) '(if (null x) t (car x)))
((IF 2 1 ETC) (1 . T) (1 CAR X) (2 NULL X))
 

This says that

Notice how the weight 1 was assigned to the two subexpressions following the test.

Adding Up the Scores

The algorithm for combining scores as implemented by the code is as follows.

It assumes that there are two user-changeable scoring functions:

The algorithm to score a piece of code, code, is:

Notice that if no patterns apply to any part of the code, then the above algorithm does

Notice that patterns with internalize parentheses don't penalize those parentheses, in this algorithm. This could be considered a bug or a feature. The binding penalizer partially compensates for this.

Testing the Extensions

So, how much better do those patterns make our Lisp scoring function? Here's what I get for the example functions, along side the original values returned by list-count:

LIST-COUNT

SCORE-CODE

REPLACE-HEX-ESC

58

62

DIFFERENCE-ONE-DO

71

99

RIEGER-SIMPLE

96

115

INTERSECTP

175

192

STREAM-SUBST

138

160

Are these better? They're spread out more but I leave it as an exercise for the reader to decide if anything substantial has been gained (beyond learning, of course).