Deductive Planning

Planning is one of the classic problems of Artificial Intelligence. Here we just touch on the most basic concepts in the simplest way, and show how planning can be approached using a deductive reasoner.

The Planning Problem

Here's a simplified statement of the planning problem: given a current state of some world, a desired state for that world, and a set of actions that change the state of the world, find a sequence of actions (a plan) that transforms the current state into the desired state.

A planner is a program that can solve planning problems.

Military planning is an obvious, and long-studied, example of planning. It's particularly challenging because it involves not only planning, which is hard enough, but also game playing against an uncontrollable opponent.

To keep things simple, early work in AI focused on planning where no opponent is involved, the world is fairly simple, and things don't change unless the actor, e.g., a robot, causes them to change when executing the plan. We'll make those assumptions here.

In real planning research, the goal is to find very general rules for selecting steps, rules that can be used in any problem. These rules look at the states and the plan so far. Here, we're going to write specific deductive rules to solve each problem. Each set of rules will handle many variations of its problem, but we won't have a general planner.

Deductive Rules for Planning

Using just our simple backward chainer, we can define a general planning framework that involves a few basic concepts and two core deductive rules.

There are three core predicates that we need to define rules for:

(PLAN plan current goal) -- plan is a sequence of actions that transforms state current into state goal
(STEP action current goal) -- action is a reasonable next action to take in state current to reach state goal
(RESULTS action current result) -- doing action in state current produces state result

Actions and states are things you need to design when creating a planner. Logically, they will be functional terms. You can define them any way you want, but there are some design constraints.

States

States should be designed so that there is just one way to describe any given state of the world for some problem domain. If there are multiple ways to describe the same state, backward chaining will have to explore all of them, even though the answer will be the same in every case.

For example, in Blocks World, we could represent a stack of blocks with the term (and (on a b) (on b c) (on c table)). This is a bad representation, because the conjuncts can be ordered in six different ways, so there are six different ways to describe the same state. If we represent the stack of blocks with the term (cons a (cons b (cons c nil))) -- remember, cons here is NOT the Lisp function -- then there is only one way to describe that state.

Actions and results

Actions are relatively simple to design. They can be simple names, like JUMP, if the actor is implicitly understood to be the robot, or they can be terms specifying parameters, e.g., (PUT-ON x y).

For each action, you have to define RESULTS rules that say how that action changes the state of the world. In simple domains, these rules are normally very simple. See the examples for, um, examples.

Step selection

Action result rules are not enough. The planner can't just try every action and see what happens; that will usually lead to endless loops. For example, a basic action in the Monkey and Bananas problem is (MOVE loc), which means "monkey moves to location loc." Suppose the monkey begins by the door and the planner generates the action (MOVE WINDOW) for "monkey moves to window." One of the next possible actions is (MOVE DOOR). Without some rules to guide it, the planner will generate, among other things, a plan that goes back and forth forever.

STEP rules are needed to pick the best action to take in a given state of the world. This is where the intelligence of your planner resides. At the least, these rules must avoid endless loops while still finding solutions, if any exist.

Plan construction

Finally there are PLAN rules. These say how to use the STEP and RESULTS rules to find plans. Surprisingly, just two general rules can do basic planning. You normally don't need to define any more PLAN rules.

The first rule is trivial. It says "the empty plan works if the current state is the desired state." To represent the empty plan, we'll use the symbol NIL. We could have picked DONE or something else.

(<- (plan nil ?goal ?goal))

The second rule is where most of the work occurs. It needs to say "a plan from current state to the desired state is one that has an action for current state, followed by a plan that gets from the result of that action to the desired state."

To represent a sequence of actions, we'll use Lisp list notation. (CONS action1 (CONS action2 ... NIL)) represents the plan "first do action1 then do action2 then ...."

(<- (plan (cons ?action ?actions) ?current ?goal)
    (step ?action ?current ?goal)
    (results ?action ?current ?result)
    (plan ?actions ?result ?goal))

This may take a bit of study. In a typical planning query, ?current and ?goal will be constant state descriptions, but the first argument, the plan, will be an unbound variable. The goal of the deductive planner is to prove the query by constructing a plan that makes it true, if possible.

Examples

The following two examples are classic early toy examples in AI planning. Cordell Green demonstrated how deductive retrieval could solve such problems in his technical report Application of Theorem Proving to Problem Solving. It turned out to be a much harder problem than first realized, and much logical study resulted.

The Blocks World Stacking Problem

At MIT, a lot of early work used the blocks world domain as a testbed for planning ideas. In the blocks world, a robot with a camera and one arm with a grasper is given the task of stacking blocks on a table in some specified arrangement. Since the robot can only move one block at a time, and blocks may already be in stacks, the problem is one of unstacking and stacking. Ideally the planner should create efficient plans with no wasted moves.

We'll make this problem as simple as possible. The robot has just one stack of blocks to worry about. Blocks can be stacked or put in a box.

We'll represent the stack of blocks as a logical list term. E.g., the term (CONS A (CONS B (CONS C NIL))) represents the stack with A on B on C. So a problem to create a stack can be represented as a query with an initial stack and a desired stack, and a variable for the plan.

(plan ?plan (cons a (cons b (cons c nil)))
            (cons a (cons b (cons d (cons c nil)))))

Two actions are sufficient for this problem: POP, which removes the top block from the stack and puts it in the box, and (PUSH x), which takes block x from the box and puts it on top of the stack.

The action result rules are easy to write:

(<- (results pop (cons ?x ?stack) ?stack))

(<- (results (push ?x) ?stack (cons ?x ?stack)))
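
As a quick sanity check -- these queries are hypothetical, for illustration -- here are two RESULTS queries and the bindings they should produce:

(results pop (cons a (cons b nil)) ?s)        ; ?s = (cons b nil)
(results (push d) (cons c nil) ?s)            ; ?s = (cons d (cons c nil))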

Now comes the hard part -- writing the rules to pick whether to POP or PUSH. It would be easy to get into an endless loop here. Plus the problem in the example query introduces a new block, D -- how can the planner know what blocks there are to push onto the stack?

At this point, it pays to do a number of examples, from one or two blocks to three or four, in different arrangements, to get a feel for what the plans look like. Eventually, the following rule for POP may occur to you: if the current stack is not a substack of the goal stack, pop it.

The rule for (PUSH x) is a bit more of a leap of faith: if pushing x onto the current stack yields a substack of the goal stack, push x.

These two rules are both based on the concept of the current stack being a substack of the desired stack. So let's assume we have such a predicate and write the above step selection rules.

(<- (step pop ?current ?goal)
    (not (substack ?current ?goal)))
    
(<- (step (push ?x) ?current ?goal)
    (substack (cons ?x ?current) ?goal))

Study these for a while. The first makes use of the NOT logical operator. The second uses variable binding and backward chaining to find the binding for ?x that we need.

All that remains is to define that predicate SUBSTACK. Fortunately, it's fairly intuitive, once you understand backward chaining.

(<- (substack ?stack ?stack))

(<- (substack ?s1 (cons ?y ?s2))
    (substack ?s1 ?s2))
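
To see how these chain, trace the query (substack (cons c nil) (cons b (cons c nil))). The first rule fails to unify the two stacks, so the second rule strips the top of the larger stack and recurses:

(substack (cons c nil) (cons b (cons c nil)))   ; second rule, ?y = b
(substack (cons c nil) (cons c nil))            ; first rule -- proved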

These are the rules we need for the one-stack blocks world planner, along with the two PLAN rules. You can try them out by loading sddr-plan.lisp.
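
For example, assuming the rules above are collected into one knowledge base -- *blocks-kb* is a hypothetical name here; check sddr-plan.lisp for the actual one -- the example query and the plan it should produce are:

(ask '(plan ?plan (cons a (cons b (cons c nil)))
            (cons a (cons b (cons d (cons c nil)))))
     *blocks-kb*)

;; expected: ?plan = (cons pop (cons pop (cons (push d)
;;                     (cons (push b) (cons (push a) nil)))))
;; i.e., pop A and B, then push D, B, and A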

The Monkey and Bananas Problem

While MIT had Blocks World, John McCarthy at Stanford had the Monkey and Bananas problem. A monkey enters a room. Hanging from the ceiling is a bunch of bananas, just out of reach. Over by the window is a box. The planning problem is to generate the plan that gets the bananas by moving the box under the bananas and climbing on top of the box.

The only thing we really care about in this problem are the locations of the monkey, box, and bananas. To make things really simple, we'll put the bananas in the center of the room. So a simple state representation that has only one way to describe any state is

(mb-state monkey-location box-location)

The box can be at the DOOR, WINDOW, or CENTER. The monkey can be at those places or on BOX-TOP.

If the monkey starts at the door and the box is by the window, then the basic problem query is

(plan ?plan (mb-state door window)
            (mb-state box-top center))

I.e., the box has to end up under the bananas and the monkey has to be on the box.

There are three actions the monkey can do. It can walk to a location, push the box to a location, or climb on the box. The action result rules are pretty simple, but unlike the Blocks World, there are some preconditions that have to be true for some actions to be possible. These preconditions are not the same as the step selection rules.

The monkey can climb on the box if the monkey is in the same location as the box.

(<- (results climb-box
             (mb-state ?bloc ?bloc)
             (mb-state box-top ?bloc)))

Notice how we use repeated variables to require the monkey and box to be in the same location.
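
For example, here are two hypothetical queries against this rule:

(results climb-box (mb-state center center) ?s)   ; ?s = (mb-state box-top center)
(results climb-box (mb-state door window) ?s)     ; fails: ?bloc can't be both door and window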

The monkey can push the box to a location if the monkey is in the same location as the box. The location of both changes.

(<- (results (push-box ?bloc1 ?bloc2)
             (mb-state ?bloc1 ?bloc1)
             (mb-state ?bloc2 ?bloc2)))

The monkey can walk to any location.

(<- (results (walk-to ?mloc2)
             (mb-state ?mloc1 ?bloc)
             (mb-state ?mloc2 ?bloc)))

Now we need to guide the monkey to choose actions that lead toward getting the bananas, and never fall into an endless loop.

For example, if the monkey is not where the box is, then the monkey should walk to the box. To say this, we'll need a predicate for saying "location1 is different than location2." Simply using different variables doesn't do that.

(<- (step (walk-to ?bloc)
          (mb-state ?mloc ?bloc)
          (mb-state ?mloc-2 ?gloc))
    (different ?mloc ?bloc))

We also need facts asserting which locations are different from each other. We assert both orders explicitly because a symmetry rule like (<- (different ?x ?y) (different ?y ?x)) would send the backward chainer into an endless loop.

(<- (different window center))
(<- (different center window))
(<- (different window door))
(<- (different door window))
(<- (different center door))
(<- (different door center))

Similarly, if the box is not where the bananas are -- the goal location -- then the monkey should push the box to that location.

(<- (step (push-box ?bloc ?gloc)
          (mb-state ?bloc ?bloc)
          (mb-state ?mloc ?gloc))
    (different ?bloc ?gloc))

Finally, if the box and monkey are at the desired location, the monkey should climb on the box.

(<- (step climb-box
          (mb-state ?gloc ?gloc)
          (mb-state box-top ?gloc)))

These rules, plus the two general PLAN rules, will solve the problem. You can try them out by loading sddr-plan.lisp.
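
For example, assuming the two general PLAN rules are in a knowledge base -- *plan-kb* is a hypothetical name; *monkey-kb* and *room-kb* follow the naming used in the next section -- the basic problem query and the answer it should produce are:

(ask '(plan ?plan (mb-state door window)
            (mb-state box-top center))
     (append *plan-kb* *monkey-kb* *room-kb*))

;; expected: ?plan = (cons (walk-to window)
;;                     (cons (push-box window center)
;;                       (cons climb-box nil)))
;; i.e., walk to the window, push the box to the center, climb on it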

A Universal Planner

Well, not really, but certainly more general than what we've written so far.

Right now, our planners need two sets of rules: RESULTS rules that say what each action does to the state of the world, and STEP rules that say which action to choose in a given state.

What most AI researchers wanted to find was a planner that needs only the first kind of rule. A universal planner, AKA general problem solver, should be able to figure out for itself what actions to do in what order, using just knowledge about what actions there are and what they do.

Let's take our basic PLAN rules and remove the part that uses the STEP predicate to find the best action.

(defparameter *universal-plan-kb*
  '((<- (plan nil ?goal ?goal))
    
    (<- (plan (cons ?action ?actions) ?current ?goal)
        (results ?action ?current ?result)
        (plan ?actions ?result ?goal))))

Try a Monkey and Bananas problem with these new rules:

(ask '(plan ?plan
            (mb-state center center)
            (mb-state box-top center))
     (append *universal-plan-kb* *monkey-kb* *room-kb*))

You should get a stack overflow, not NIL. The rules are running -- but they get into an endless loop trying all possible sequences of actions. Even though they may immediately find the action of climbing on the box, they also look for other plans, such as walking somewhere else, or pushing the box somewhere else, and then coming back later, and so on.

This doesn't have to be. It should be possible to write rules that only allow an action to be selected if it results in a new state, i.e., one not already in the current plan being developed. To do that, we would need to change the PLAN rules to keep a list of the result states and check that the result state of an action is not a member of that list before trying to plan with that action.

Doing this is left as an exercise.
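
If you want a starting point, here is one possible shape for the modified rules. This is a sketch only, untested against the sddr code: PLAN-SAFE and MEMBER are names invented here, the initial query has to seed the visited list with the starting state, and NOT is assumed to work as in the step rules earlier. It also assumes action results come out fully bound; for an action like (walk-to ?loc) with a free variable in its result, you'd need to enumerate the possible bindings (say, with location facts) before the member check.

(<- (plan-safe nil ?goal ?goal ?visited))

(<- (plan-safe (cons ?action ?actions) ?current ?goal ?visited)
    (results ?action ?current ?result)
    (not (member ?result ?visited))       ; skip states already reached
    (plan-safe ?actions ?result ?goal (cons ?result ?visited)))

(<- (member ?x (cons ?x ?rest)))
(<- (member ?x (cons ?y ?rest))
    (member ?x ?rest))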

Note that this will lead to a more general planner, but also a much less efficient one. Also, it will only work if there are a finite number of possible states. That's true for the two example problems we have here, but it would not be true if, for example, something like counting were allowed, e.g., Peano axiom rules that can generate "successor of successor of ..." forever.
