Simple Pattern Matching
Consider a simple pattern matcher, such as the one defined in pat-match.lisp. This is basically the unifier in Artificial Intelligence Programming, simplified to handle only patterns against constant forms, rather than patterns against patterns. Some example calls:
> (pat-match '(?x b ?y) '(a b c)) (((?Y C) (?X A))) > (pat-match '(?x b ?x) '(a b c)) NIL > (pat-match '(?x b ?x) '(a b a)) (((?X A)))
Pattern matching is the basis of a great deal of AI programs. Patterns are easy to write, compact, easily extended, and so on.
A serious AI program will have hundreds or thousands of patterns, and may apply a significant fraction of those patterns while solving a problem. As a result, time spent matching patterns may dominate the run-time of an AI program.
There are two approaches to reducing this cost:
- use good indexing techniques, such as discrimination trees, to reduce the number of patterns matched
- compile patterns to speed up the matching process
These approaches are independent. It's usually best to do the first one first. Then, if pattern matching is still the bottleneck in your application, even with the reduced number of patterns to be matched, compile them.
Patterns versus Code
To see the magnitude of the costs involved in pattern matching, consider these two functions. The first calls the pattern matcher:
(defun using-pat (input) (pat-match '(?x b ?x) input))
The second produces the same results for any given input, but is coded quite differently:
(defun using-code (input) (and (consp input) (let ((blists (match-variable '?x (car input) '(nil)))) (and (consp (cdr input)) (eql 'b (cadr input)) (consp (cddr input)) (null (cdddr input)) (match-variable '?x (caddr input) blists)))))
using-pat is clearly more compact, readable, and
using-code however can be significantly more
efficient. Macintosh Common Lisp (version 3) gives the following
> (time (dotimes (i 100) (using-pat '(a b c)))) (DOTIMES (I 100) (USING-PAT '(A B C))) took 164 milliseconds (0.164 seconds) to run. Of that, 9 milliseconds (0.009 seconds) were spent in The Cooperative Multitasking Experience. 6,400 bytes of memory allocated. NIL > (time (dotimes (i 100) (using-code '(a b c)))) (DOTIMES (I 100) (USING-CODE '(A B C))) took 13 milliseconds (0.013 seconds) to run. 3,200 bytes of memory allocated. NIL
For this simple 3-part pattern,
over 10 times faster than
Why is using-code so much faster?
Much of the work that
pat-match does is looking
at the pattern. Effectively, pattern matching involves these
- check to see if the pattern is a variable
- if so, check to see if it can be bound to the input
- else, check to see if the pattern is an atom
- if so, check to see if it's equal to the input
- else, check to see if the pattern is a list
- if so, check to see if the input is a list
- if so, recursively apply the above steps to the
cdrof the pattern and input
The hand-written code in
using-code omits the
run-time checks in Steps 1, 2, and 3. Every call to
match-variable is known in
Furthermore, there's no recursive calls to
because it's already known when the pattern is a list.
As a result, there are fewer conditional checks, fewer function calls, and the function calls are all explicit and compilable into machine code before the code is run.
Getting the best of both worlds
Looking at the run-times for
using-code, it's clear which code we'd rather run.
On the other hand, looking at their definitions, it's also clear
which code we'd rather write.
Since we clearly don't want to have to write code like that in
using-code, we get the computer to do it for us.
The idea behind compiling patterns is simple. In most cases, the patterns in a knowledge base are static, i.e., known in advance. Therefore, we can
- take those patterns,
- generate Lisp functions that would produce the same binding lists that the patterns would generate when applied to an input,
- compile the functions, and
- call the functions instead of the pattern matcher
We'll first look at how
expand-pat works, then at
several ways in which it can be used.
How to Compile Patterns
Compiling a pattern turns out to be quite simple. It's very much like defining a macro expansion.
As with macros, we need to keep track of two types of variables:
- compile-time variables, whose values are known when the
pattern is being compiled; we have one here:
pat, the pattern being compiled.
- run-time variables, whose values won't be known until the
expanded code is applied to some input; we have two here:
blists, bound to the input form and current list of binding lists, respectively.
pat-match had the same three
variables. The difference is that
pat is now a
compile-time variable. It won't exist at run-time.
Now we need to figure out what Lisp code to generate for each kind of pattern elements. Lisp s-expression patterns have only three elements:
The top-level compiler function,
simply calls functions for each of these cases. ("One function to
a function" -- the function of
expand-pat is to
select what other function to run.)
Variables are easy to compile. They just become calls to
> (expand-pat '?x) (match-variable '?x input blists)
Atoms are equally easy to compile. They just become calls to
eql that return
> (expand-pat 'a) (and (eql 'a input) blists)
The only tricky bit is with patterns that are lists.
Obviously, we can recursively expand the
cdr of the pattern into Lisp code, but how to we
combine the two pieces of code that result?
What we want is that, at run-time,
- the code from the
carof the pattern will be executed with
inputbound to the
carof the input, and
blistsis bound to the current list of binding lists;
- the code from the
cdrof the pattern will be executed with
inputbound to the
cdrof the input, and
blistsis bound to the list of binding lists produced by the code for the
carof the pattern;
The following Lisp template achieves these two goals:
(let ((blists (let ((input (car input))) -- put pattern CAR code here --)) (input (cdr input))) -- put pattern CDR code here --)
This will evaluate the code from the two parts of the pattern
in contexts with the desired values for
Normally, it's a bad idea to reuse variables like this, but
the generated code is not meant for human eyes, and doing it this
way means the pattern expander can always count on
blists being bound to the
How to get compiled patterns into code
We can now see how to turn patterns into Lisp code. One big
problem remains, however. How do we turn Lisp code into callable
Lisp functions? In particular,
generate a function, it generates a list, e.g.,
> (expand-pat '?x) (MATCH-VARIABLE '?X INPUT BLISTS)
It's not in function form, and
blists free variables.
compile-pat does a little better:
> (compile-pat '?x) #'(LAMBDA (INPUT &OPTIONAL (BLISTS '(NIL))) (MATCH-VARIABLE '?X INPUT BLISTS))
but this is still just a list, not a function.
It can be converted into a function using
> (eval (compile-pat '?x)) #<Anonymous Function #xB22986>
evalis evil, and
- this won't work in many Lisp's that build stand-alone
applications, because the first thing they delete is
The solution is to look at where our patterns are being defined. For example, suppose we have a file of "if--then" rules, where we link patterns to code for a robot to execute if some input command matches the pattern, e.g.,
(defrule put-rule (put-in ?x ?y) => (move-hand-to ?x) (grasp) (move-hand-into ?y) (ungrasp))
Without compiling, we'd have a robot command interpreter that would look something like this:
(defun run-robot (input) (dolist (rule *rules*) (let ((blists (pat-match (rule-test rule) input))) (unless (null blists) (do-actions (instantiate (rule-actions rule) blists))))))
defrule would simply be
(defmacro defrule (name test => &rest actions) `(add-rule :test ',test :actions ',actions))
To use a pattern compiler, we'd redefine
(defmacro defrule (name test => &rest actions) `(add-rule :test ,(compile-pat test) :actions ',actions))
Now, when our rule file is compiled, all the patterns become compiled Lisp functions.
Note: we also have to redefine the robot interpreter:
(defun run-robot (input) (dolist (rule *rules*) (let ((blists (funcall (rule-test rule) input))) (unless (null blists) (do-actions (instantiate (rule-actions rule) blists))))))
Increasing maintainability of compiled patterns
One problem with compiling patterns is that rules become hard to read and trace. We no longer see anything that tells us what the rule is looking for, because the patterns are now anonymous functions.
Therefore, it's usually a good idea to modify our
defrule to store both the pattern and the function.
The former is for debugging, the latter is for efficiency.
(defmacro defrule (test => &rest actions) `(add-rule :test ,(compile-pat test) :pattern ',test :actions ',actions))