Format your assignment as a single PDF file. It is due Thursday, June 12 at 11:59 PM; submit it under "Problem Set 5" in Blackboard. Late assignments are penalized 10% per day.

  1. Read "On our best behavior" by Hector J. Levesque.
    1. (2 points) In a paragraph (3-5 sentences), argue why the Winograd Schema Test (WST) described in Section 3 is preferable to the Turing Test. Be specific.
    2. (2 points) In a paragraph (3-5 sentences), describe three key limitations of the WST (it's okay if some or all of these limitations are shared by the Turing Test).
    3. (1/2 point) Below are three candidate WST questions; only one of them is a valid, well-formed WST question. For each, state whether it is a good WST question and, if not, explain why.
      1. Who was the mayor of Pickerington, Ohio in 2008?
        • Dave Shaver
        • Yi Yang
      2. Steve was wronged by Paul, but he got even. Who got revenge?
        • Steve
        • Paul
      3. The skyscraper towered over the house because it was so tall. What was tall?
        • the skyscraper
        • the house
    4. (1/2 point) For the valid WST question above, suggest a good choice for the "special word" and "other word."
    5. (1/2 point) Write your own WST question, specifying the special word and other word. In a sentence, what background knowledge does answering your question require?
  2. Experiment with analogies using this demo of neural-net word representations. The system gets some analogies correct, such as "man:woman::king:queen," but gets others wrong (try, e.g., "clown:funny::magician:___"). Based on your experiments and your knowledge of how neural nets (NNs) work, find a family of analogies that the system consistently gets wrong. One such family will be discussed in class; you should find a different one.
    1. (3/2 points) In a short paragraph, describe your family of analogies and why it makes sense that the NNs would not succeed for it.
    2. (1 point) Give five distinct examples from your family of analogies. For each, list both the correct answer and the erroneous top-ranking output from the demo.
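
As background for Problem 2, here is a minimal sketch of how analogy queries over word vectors are commonly answered: the vector-offset method, which scores each candidate d for "a:b::c:d" by its cosine similarity to b - a + c. This is an assumption about the general technique, not a description of the demo's actual implementation. The vectors in `TOY_VECTORS` are hand-made, hypothetical values chosen for illustration, and the names `cosine` and `rank_analogy` are likewise invented for this sketch.

```python
import math

# Hand-made toy vectors (hypothetical values, NOT from any trained model).
# Roughly: dimension 0 ~ "royalty", dimension 1 ~ "gender", dimension 2 ~ other.
TOY_VECTORS = {
    "man":   [0.0,  1.0, 0.1],
    "woman": [0.0, -1.0, 0.1],
    "king":  [1.0,  1.0, 0.2],
    "queen": [1.0, -1.0, 0.2],
    "apple": [0.1,  0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def rank_analogy(a, b, c, vectors):
    """Rank candidates d for 'a:b::c:d' by similarity to b - a + c.

    The three query words are excluded from the candidates, as is
    conventional for this method.
    """
    target = [
        vb - va + vc
        for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])
    ]
    candidates = [w for w in vectors if w not in (a, b, c)]
    scored = [(w, cosine(vectors[w], target)) for w in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for word, score in rank_analogy("man", "woman", "king", TOY_VECTORS):
        print(f"{word}: {score:.3f}")
```

Under this method, an analogy can only succeed if the relation between a and b shows up as a roughly constant offset between vectors. That is one useful lens for thinking about which families of analogies might fail, although the demo may rank candidates differently.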