CS 337 -- Intro to Semantic Information Processing -- L. Birnbaum

 

 

LECTURE 7:  CAUSAL CHAINS AND INFERENCE, II

 

 

Last time, we concluded that the representation of a coherent

paragraph describing some event consists of a CAUSAL CHAIN that links

the individual actions and states involved.  It seems intuitively

clear that understanding involves building such a representation,

since it is a direct outcome of, and reflects, our ability to explain

the event.

 

Behaviorally, what can we use causal chains for?

 

To answer questions:

 

    Who gave the slave girl the necklace?

 

To summarize stories:

 

    Jack was thirsty.  He went to the kitchen and took a beer from the

    refrigerator.  His wife asked him to do the dishes.  He told her

    he would do them when he finished the beer.  He went to the den.

    He turned on the radio.  He heard that the weatherman predicted it

    would rain tomorrow.  He went to a chair and sat down.  The chair

    fell over and Jack landed on the floor.  The beer spilled all over

    the chair.  When Jack's wife saw the mess, she was angry.

 

    Summarization rules (Schank 1975):

 

                1. Deadend chains will be forgotten.

                2. Sequential chains may be shortened.

                3. Disconnected pieces will be connected or forgotten.

                4. Pieces that have many causal connections are crucial.

 

    Summary: Jack was thirsty.  He got a beer.  He went into the den.

    He sat down on a chair.  The chair fell over.  Jack fell on the

    floor.  The beer spilled on the chair.  Jack's wife was angry.
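
As a rough illustration of rule 1 (and the "forgotten" half of rule 3),
here is a minimal sketch in Python.  The causal links below are
hand-coded assumptions -- the lecture gives the story and the summary,
not this graph -- and the function name `summarize` is hypothetical.
An event is kept only if the final outcome can be reached from it by
following causal links.

```python
# Hypothetical causal links (cause, effect) for the Jack story.
# Kitchen and refrigerator are already shortened to "He got a beer" (rule 2).
LINKS = [
    ("Jack was thirsty", "He got a beer"),
    ("He got a beer", "He went to the den"),
    ("His wife asked him to do the dishes", "He said he would do them later"),
    ("He went to the den", "He turned on the radio"),
    ("He turned on the radio", "He heard the weather forecast"),
    ("He went to the den", "He sat down on a chair"),
    ("He sat down on a chair", "The chair fell over"),
    ("The chair fell over", "Jack fell on the floor"),
    ("Jack fell on the floor", "The beer spilled on the chair"),
    ("The beer spilled on the chair", "Jack's wife was angry"),
]


def summarize(links, outcome):
    """Return, in story order, the events that causally lead to `outcome`."""
    causes_of = {}   # effect -> list of its direct causes
    order = []       # events in order of first appearance
    for cause, effect in links:
        causes_of.setdefault(effect, []).append(cause)
        for event in (cause, effect):
            if event not in order:
                order.append(event)

    # Walk backwards from the outcome; anything never reached is a
    # dead-end chain or a disconnected piece and is forgotten.
    kept, frontier = {outcome}, [outcome]
    while frontier:
        for cause in causes_of.get(frontier.pop(), []):
            if cause not in kept:
                kept.add(cause)
                frontier.append(cause)
    return [event for event in order if event in kept]


for sentence in summarize(LINKS, "Jack's wife was angry"):
    print(sentence)
```

Under these assumed links, the dishes exchange and the radio/weather
chain are dropped because nothing downstream depends on them.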

 

Let's turn to Charniak's (1972) theory.  Unlike Rieger and Schank, who

propose theories of both content and process, Charniak's theory is

basically a process theory -- it makes no commitment to the kinds of

inferences that will be drawn.

 

Charniak doesn't make all inferences all the time.  Some inferences

are made right away, others are deferred.

 

    BASE ROUTINES attached to concepts make immediate inferences, and

    activate DEMONS.

 

    DEMONS represent expectations -- they draw inferences only upon

    the mention of certain other concepts -- i.e., there is a notion

    of RELEVANCE.

 

Example: The base routine for "piggy bank" will activate the following

demons:

 

    1 If you want money, then you can get it from the piggy bank if

      you get the piggy bank.

 

    2 If you want to get money out of the piggy bank, shake it.

 

    3 If you shake a piggy bank, and you don't hear anything, then

      there is no money in it.

 

    So consider the following story:

 

                Janet wanted to get some money.  She found her piggy bank and

                started to shake it.  She didn't hear anything.

 

    Demon 1 links sentence 1 to sentence 2, demon 2 operates within

    sentence 2, and demon 3 links sentence 2 to sentence 3.
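
The piggy bank example can be sketched in Python as follows.  This is
an illustration under stated assumptions, not Charniak's program:
sentences are hand-coded as lists of concept labels, each demon fires
once all of its trigger concepts have been mentioned, and a newly
activated demon is allowed to look back at what has already been said
(which is how demon 1 can link sentence 2 back to sentence 1).  The
names `Demon`, `BASE_ROUTINES`, and `understand` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Demon:
    name: str
    triggers: frozenset   # concepts that must all have been mentioned
    inference: str        # what the demon asserts when it fires


# Base routine for a concept = the demons it activates (assumed encoding).
BASE_ROUTINES = {
    "piggy bank": [
        Demon("demon 1", frozenset({"want money"}),
              "she got the piggy bank in order to get money from it"),
        Demon("demon 2", frozenset({"shake"}),
              "she is shaking it to get the money out"),
        Demon("demon 3", frozenset({"shake", "hear nothing"}),
              "there is no money in the piggy bank"),
    ],
}

# Hand-coded concepts for the three sentences of the Janet story.
STORY = [["want money"], ["piggy bank", "shake"], ["hear nothing"]]


def understand(story):
    seen, active, inferences = set(), [], []

    def try_fire(demon, sentence_no):
        if demon.triggers <= seen:
            inferences.append((sentence_no, demon.name, demon.inference))
            return True
        return False

    for sentence_no, concepts in enumerate(story, start=1):
        for concept in concepts:
            seen.add(concept)
            # Apply the demons that are already active to the new concept.
            active = [d for d in active if not try_fire(d, sentence_no)]
            # Apply the base routine; new demons first look back.
            for demon in BASE_ROUTINES.get(concept, []):
                if not try_fire(demon, sentence_no):
                    active.append(demon)
    return inferences


for sentence_no, name, inference in understand(STORY):
    print(f"sentence {sentence_no}: {name} infers that {inference}")
```

Note that nothing is inferred from sentence 1 alone: without an active
demon or a base routine for "want money", the inference is deferred
until "piggy bank" makes it relevant.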

 

Basic model:

 

                                       activate demons

                                       |-------<---------|

                                       |                 |

                                       |                 |

    Incoming       Apply             Apply

    assertions --> active demons --> base routines

                |              |                 |   \

                |              |                 |    \

                |--------<-----|--------<--------|     \

                                   inferences                   clean-up

 

Compare with Rieger:

 

    1 No specific inference types -- i.e., no content theory of causal

      inference.

 

    2 Deferral of inference.

 

    3 Demons are more goal-directed.

 

    4 Handles inferences from complex conjunctions of features much

      better -- i.e., the inferences are more SPECIFIC.

 

Now let's consider the following story:

 

    John went to a restaurant.

    He asked for a hamburger.

    He paid and left.

 

How could we do this with Rieger inference?

 

    From the first sentence, we can infer that John was at the restaurant

    (resultative), from which we can infer that John wanted to be at

    the restaurant (motivational), from which we can infer that John

    wanted to eat something (functional).  This comes along with many

    other, mostly irrelevant inferences, e.g., that John was somewhere

    else before he went to the restaurant (enablement).

 

    It's hard to know how much we can expect to get out of the second

    sentence "bottom-up", looking only at the words it contains.

    Giving Rieger the benefit of the doubt, we'll say it means "John

    told someone that he wanted someone to give him a hamburger." From

    this we can infer that John wanted someone to give him a hamburger

    (enablement), from which we can infer (maybe) that John wanted to

    have a hamburger (what inference type would this be?), from which

    we can infer that John wanted to eat a hamburger (functional).

 

    This matches the inference from sentence 1 that John wanted to eat

    something.  So we know that he asked for the hamburger for the

    same reason that he went to the restaurant -- in order to eat.

    This hardly captures what we know here -- which is that you ALWAYS

    ask for what you want to eat at a restaurant.
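
The matching step just described can be sketched in Python.  This is a
toy illustration, not Rieger's program: the propositions are hand-coded
triples, the rule table contains only the inferences mentioned above,
and the names `RULES`, `expand`, `matches`, and `contact_points` are
hypothetical.  The inference whose type the text leaves open is tagged
"unclassified".

```python
from collections import deque

S1 = ("John", "go-to", "restaurant")
S2 = ("John", "ask-for", "hamburger")   # "John told someone that he wanted ..."

# premise -> list of (inference type, conclusion); assumed for illustration.
RULES = {
    S1: [("resultative", ("John", "be-at", "restaurant")),
         ("enablement", ("John", "be-at", "elsewhere"))],
    ("John", "be-at", "restaurant"):
        [("motivational", ("John", "want-be-at", "restaurant"))],
    ("John", "want-be-at", "restaurant"):
        [("functional", ("John", "want-eat", "something"))],
    S2: [("enablement", ("John", "want-given", "hamburger"))],
    ("John", "want-given", "hamburger"):
        [("unclassified", ("John", "want-have", "hamburger"))],
    ("John", "want-have", "hamburger"):
        [("functional", ("John", "want-eat", "hamburger"))],
}


def expand(start):
    """Breadth-first closure of every inference reachable from `start`."""
    found, queue = {}, deque([start])   # conclusion -> (type, premise)
    while queue:
        premise = queue.popleft()
        for kind, conclusion in RULES.get(premise, []):
            if conclusion not in found:
                found[conclusion] = (kind, premise)
                queue.append(conclusion)
    return found


def matches(a, b):
    """Two propositions match if they agree, with "something" as a wildcard."""
    return len(a) == len(b) and all(
        x == y or "something" in (x, y) for x, y in zip(a, b))


def contact_points(first, second):
    """Pairs of inferences from two sentences that match each other."""
    return [(a, b) for a in expand(first) for b in expand(second)
            if matches(a, b)]


print(contact_points(S1, S2))
print(sorted(expand(S1)))   # includes the irrelevant "elsewhere" inference
```

Even in this tiny rule table, the expansion of sentence 1 drags along
the "somewhere else" inference, and the only link found between the two
sentences is the shared goal of eating.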

 

    "He paid" means that he gave someone some money in exchange for

    that someone giving something to him.  Through some kludging, we

    might be able to get this to match the fact that John wanted

    someone to give him a hamburger.

 

Thus, with a little juggling, a program using Rieger-style inference

could be made to realize that John went to the restaurant because he

wanted to eat, that he asked for the hamburger for the same reason,

and, with a little more juggling, maybe even that he ate the

hamburger, and that he paid for the hamburger.

 

This is rather unsatisfactory for a number of reasons:

 

    PROCESS: Too many irrelevant inferences.

 

                It would also infer such true but irrelevant gems as the fact

                that John was somewhere else before he went to the restaurant,

                that he had gone to that somewhere else, that someone gave him

                the money that he used to pay, that someone gave that person

                the money, etc.

 

                Furthermore, of course, we can write restaurant stories that

                leave even more implicit:

 

                    John went to a restaurant.

                    The salad bar looked good.

                    He left 5 dollars on the table.

 

    CONTENT: We should be using more specific knowledge.

 

                It leaves many questions unanswered: DID John eat the

                hamburger or not?  It also doesn't realize:

 

                    That John probably sat down.

                    That he may have looked at a menu.

                    That a waiter took his order.

                    That the waiter told the cook his order.

                    That the waiter brought the hamburger.

                    That the waiter brought the check.

 

                We also know that people ask for the food they want in

                restaurants -- this is not an accidental relation.  The Rieger

                program would fail to recognize that John's going to the

                restaurant to eat, and his asking someone to bring him a

                hamburger so that he could eat, are JOINTLY responsible for

                his getting something to eat.  That is, they are RELATED

                elements in a single plan.  As far as Rieger's program would

                know, the above restaurant story is no different from the

                following:

 

                    John went to the refrigerator.

                    He asked for a hamburger.

                    He paid and left.