Introduction

Deductive retrieval is a generalization of simple data retrieval. A data retriever has a database of assertions, e.g., (married john mary), (in chicago illinois), and so on. The main operations are

The retriever looks through the database and returns all the assertions that match the input query pattern. Assertions are usually indexed and organized for fast retrieval, but the semantics is as if all assertions are matched against the query.

A deductive retriever adds the following operations:

A simple deductive retriever in Lisp is ddr.lisp. The general concepts are described in this document, and functions for using it are described here.

Deductive retrieval has these major processes:

Pattern Matching

A pattern can be any form, e.g., (in tiger savannah), but usually a pattern will have variables, e.g., (in ?x savannah). A form without variables is sometimes called a constant form.

A form matches a pattern all corresponding elements of the form and pattern can be matched. Constant elements match each other if they are equal. Variable elements can match anything, but if multiple occurrences of the same variable must match the same thing.

The result of a match is a list of binding lists. A binding list is a list of (variable value) pairs, recording each pattern variable and the corresponding form element.

For example, the result of matching (in ?x savannah) with (in tiger savannah) is (((?x tiger))).

With simple patterns, the result of a match will be either NIL or a list of one binding list. It is possible to extend patterns in such a way that multiple bindings can occur, i.e., there is more than one way the pattern can match the form.

In particular, matching (in tiger savannah) with (in tiger savannah) will produce (nil), while matching (in tiger savannah) with (in tiger swamp) will produce nil.

Unification

Unification introduces one additional feature to normal pattern matching. It allows a pattern to match a pattern. This raises a number of complexities. For example, a variable can bind to another variable, or even to itself. Fortunately, these issues have been worked out long ago, and the code for implementing unification, while subtle and subject to pitfalls, is not all that complicated. It can be found in the unify.lisp file.

Patterns matching against patterns is important for backward chaining.

The Knowledge Base

The knowledge base for our deductive retriever contains:

Assertions and rules are stored using (tell form), e.g.,

> (tell '(at tiger savannah))
> (tell '(<- (near ?x grass) (at ?x savannah)))
> (tell '(-> (predator-of ?x ?y) (prey-of ?y ?x)))

Answering Queries

A query is asked by typing (ask query). The query can be any form.

If the query is a constant form, then we're effectively asking a yes or no question. For example, to ask "Is the tiger near grass?" we type:

> (ask '(at tiger savannah))
((AT TIGER SAVANNAH))

The result is a list of answers to the query found in the KB. There's just one answer to this query.

Backward Chaining

Now consider a query very similar to the previous one:

> (ask '(near tiger grass)
((NEAR TIGER GRASS))

The query is still a "yes/no" questions, and the answer still says "yes" but (near tiger grass) was not explicitly stored in the knowledge base. It was inferred using the backward chaining rule "something is near grass if it is in a savannah."

In this case, only one rule was needed to derive the answer. In general, though, many rules may be chained together.

For example, let's add a rule that says that if you're near something, you can bite it:

> (tell '(<- (can-bite ?x ?y) (near ?x ?y)))

Now let's ask if the tiger can bite grass:

> (ask '(can-bite tiger grass))
((CAN-BITE TIGER GRASS))

The answer is obviously yes, but it required two rules and an assertion to prove it. The retriever found a rule that said the tiger could bite grass if it was near grass. It then found a rule that said a tiger would be near grass if it was in a savannah. Finally, it found the assertion that the tiger was indeed in a savannah.

The two rules and the assertion form a chain, or proof, of the assertion that the tiger can bite grass. The retriever will find all the chains that it can. Chains always terminate in assertions.

Deductive Retrieval versus Logic

Deductive retrieval is based on logical principles, but a deductive retriever is different from a theorem prover in a number of significant ways.

First, deductive retrievers are very limited in what they can deduce, for two reasons:

Deductive retrieval also uses techniques not available in logic. Prolog, for example, applies rules in the order in which they were defined, so that different proofs are possible, depending on the order in which rules are specified. Prolog also has a "cut" operation that allows a rule to eliminate other proof paths from the search process. Our retriever has a mechanism for allowing arbitrary Lisp code to be executed during retrieval.

Representing Knowledge in Rules

There are many ways to represent a rule in English as deductive retrieval rules. Just as with the frame system, you have to develop a coherent ontology, i.e., a set of predicates and objects.

For example, consider the rule "something can satisfy its hunger with something if it's near something it eats." There are many parts to be represented in this rule:

There's no single right answer to these questions, though some answers are better than others, depending how sophisticated you need the retriever to be, how efficient it should, how easy you want it to be for others to add rules, and so on.

We've already chosen a representation for "near," namely (near x y). "Can" is a complex concept that still puzzles philosophers. "Hunger" is an interesting property, because it keeps coming back. "Satisfying hunger" is not an all-or-none result, but a matter of degree, ranging from "hungry but surviving" to "stuffed to the gills."

However, for our purposes, we don't care about any of these issues. We just want to know whether something will starve or not. Therefore, we'll merge all these concepts into one predicate. (can-satisfy-hunger-with x y) that represents the claim that x could satisfy the problem of hunger by eating y.

For "food it eats" we'll use the predicate (eats x y) to say that y is a food that x eats.

Our rule then becomes something like this:

(<- (can-satisfy-hunger-with ?x ?y)
    (eats ?x ?y)
    (near ?x ?y))

The difference between the above and

(<- (can-satisfy-hunger ?x)
    (near ?x ?y)
    (eats ?x ?y))

is purely one of possible efficiency. The predicate that generally has fewer answers should be tested first. In the extreme case, a predicate that is often false should be tested as early as possible in a rule, to avoid needless testing of other parts of the rule. The same answers would be returned by either form of the rule.

Since we already have a predator-of predicate, let's link that to eats:

> (tell '(<- (eats ?y ?x) (predator-of ?x ?y)))

The important point is that every one of these choices for predicates, include the near predicate, could have been made differently. The odds are great that the rules that result from the choices used here will not work well for a system reasoning about spatial relationships, detailed animal behavior, and so on.

Backtracking

Assume we've loaded the assertions in *tiger-kb* in ddr-tests.lisp.

> (load "ddr-tests.lisp")
...
> (in-package :ddr-tests)
...
> (init-kb *tiger-kb*)
...

Now, let's ask how our tiger can satisfy its hunger:

> (ask '(can-satisfy-hunger-with tiger ?x))

The retriever will find that it has to choose something the tiger eats. To do that, it has to choose something a tiger is a predator of. Suppose it picks zebra. Now, the retriever will ask (near tiger zebra). This query will fail, because there is no zebra in the same place as the tiger.

The fact that this query has failed doesn't mean that the tiger can't eat something. It means that the retriever has to "back up" and look for another answer to (predator-of tiger ...). The other possible answer is antelope. This leads to the query (near tiger antelope) which eventually succeeds.

The backing up process, undoing one possible match when it doesn't pan out, is called backtracking. It's one of the classic weak methods of AI, so called because it always works, even when the system is knowledge poor, but it may not work very efficiently. Basically, backtracking means writing your code to try all possibilities, because your code isn't smart enough to know the right thing to do.

Extending the Retriever

Many predicates are very difficult or inefficient to do with logical rules. For example, (< x y) should return true if x is less than y, but try defining this with pure logic. It can be done, but you have to axomiatize arithmetic first.

On the other hand, our retriever is running on top of Lisp, and it's trivial to tell if x is less than y in Lisp. So, why not allow our retriever to call Lisp functions?

We do this in our retriever by defining retrieval methods. A retrieval method for a predicate P is basically a function to call when the retriever sees anything of the form (P ...). Retrieval methods are defined using

(define-retrieval-method name (var)
  body)

name is the name of the predicate for which the method is being defined, e.g., <.var will be bound to the pattern beginning with that predicate, that the retriever has come across.

The retrieval method has to be prepared for variables in its arguments. For example, if (< ?x ?y) appears in a rule, then ?x and ?y will be replaced by their bindings, if any, but any unbound variable will still be in the pattern passed to the retrieval method.

In the case of <, we'll assume that it's OK to simply fail if there are any unbound, or otherwise non-numeric, arguments.

(define-retrieval-method < (pat)
  (destructuring-bind (x y) (pattern-args pat)
    (and (numberp x)
         (numberp y)
         (< x y)
         (make-empty-bindings)))

A retrieval method should always return a list of bindings lists, or NIL, so that the predicate will act like every other predicate in the system.


Comments? Send mail to Chris Riesbeck.