Due Monday, October 24 at 11:59 PM via e-mail to ddowney at eecs.northwestern.edu. Use EECS 395/495 Homework 3 as the e-mail subject line. PDF format is preferred, though plain text, Word, and HTML are also acceptable.

  1. In this part of the assignment, you'll experiment with designing, learning, and evaluating Bayes Nets. Start by downloading the Weka machine learning package. It works across platforms and has the functionality required for the homework. We also highly recommend that you consult these helpful tips, which run through a couple of examples similar to, but not exactly the same as, the homework exercises.
    Complete the steps below and answer the following questions:
    1. (1 point) In the Bayes Net Editor, load the car data set. You can learn more about this data set from its UCI Repository page. Accepting the defaults, learn a Bayes Net for this data set, and save it. The learned Bayes Net will have a specific structure we've talked about in class. What's the name for this kind of Bayes Net?
    2. (1 point) Now, use your domain intuition to design a change to the learned Bayes Net that you think is reasonable (i.e., adding or deleting edges). Save this network, and include a snapshot and a brief (1-2 sentence) justification for your changes.
    3. (1 point) Now change the defaults in the "K2" search algorithm for Bayes Net learning in order to learn a different Bayes Net structure. Include a snapshot of your learned structure with the assignment, and identify an independence relation that holds in this Bayes Net but not in the net from part 1.
  2. Now you'll try learning parameters for your networks and evaluating them.
    1. (3 points) In the Weka Explorer, open the car.arff data set. Using Weka's defaults of 10-fold cross-validation and the SimpleEstimator, learn CPT parameters for each of your Bayes Nets from question 1. Include a table that lists the accuracy of each of your Bayes Nets along with a "ZeroR" baseline (this classifier comes up as the default when you navigate to the "Classify" tab, and simply predicts the majority class for all test examples).
    2. (1 point) The SimpleEstimator has a default alpha value of 0.5. What value of alpha corresponds to maximum likelihood estimation?
    3. (1 point) You likely found that the Bayes Net learned in question 1, part 3 was particularly effective. However, this network has an unfair advantage. What is it?
  3. Gibbs sampling questions.
    1. (1 point) Consider a small Bayes Net of binary variables with A->B->C. Let P(A=1)=P(B=1 | A=0)=P(B=1 | A=1)=0.5, and P(C=B)=1.0. Say you initialize a Gibbs Sampler with A=1, B=0, C=0, and choose to re-sample variable B. What is the probability distribution from which you sample new values for B? In the long run, what probability will your sampler initialized in this way assign to P(B)?
    2. (1 point) The sampler in part 1 has an undesirable behavior. How could you solve this?
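
For reference, the "ZeroR" baseline mentioned in question 2 simply predicts the majority class for every test example. The following is a minimal Python sketch of that idea, not Weka's implementation; the function names and the toy label lists are hypothetical and are used only for illustration.

```python
from collections import Counter


def zero_r_fit(train_labels):
    """Return the most frequent label in the training data (hypothetical helper)."""
    return Counter(train_labels).most_common(1)[0][0]


def zero_r_accuracy(train_labels, test_labels):
    """Fraction of test labels that equal the training majority class."""
    majority = zero_r_fit(train_labels)
    correct = sum(1 for label in test_labels if label == majority)
    return correct / len(test_labels)


# Toy labels made up for this sketch; they are not drawn from car.arff.
train = ["unacc", "unacc", "acc", "unacc", "good"]
test = ["unacc", "acc", "unacc", "vgood"]

print(zero_r_fit(train))             # majority class of the toy training labels
print(zero_r_accuracy(train, test))  # accuracy of always predicting that class
```

Because the baseline ignores every attribute, its accuracy depends only on how often the majority class appears, which is why it is a useful floor when comparing your Bayes Nets.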