Homework 3

Due Monday, October 30 at 11:59 PM via Canvas. PDF format required.

In this part of the assignment, you'll experiment with designing, learning, and evaluating Bayes Nets. Start by downloading the Weka machine learning package. It works across platforms and has the functionality required for the homework. We also highly recommend you consult these helpful tips which run through a couple of examples similar to, but not exactly the same as, the homework exercises.
Complete the steps below and answer the following questions:

(1 point) In the Bayes Net Editor, load the car data set. You can learn more about this data set from its UCI Repository page. Accepting the defaults, learn a Bayes Net for this data set, and save it. The learned Bayes Net will have a specific structure we've talked about in class. What's the name for this kind of Bayes Net?
(1 point) Now, use your domain intuition to design some change to the Bayes Net that you think is reasonable (i.e., adding or deleting edges). Save this network, and include a snapshot and a brief (1-2 sentence) justification for your changes.
(1 point) Now change the defaults in the "K2" search algorithm for Bayes Net Learning in order to learn a different Bayes Net structure. Include a snapshot of your learned structure with the assignment, and identify at least one independence assertion that holds for the net in part (a) but not in your learned structure -- or vice versa.

Now you'll try learning parameters for your networks and evaluating them.

(3 points) In the Weka Explorer, open the car.arff data set. Using Weka's defaults of 10-fold cross-validation and the SimpleEstimator, learn CPT parameters and test each of your Bayes Nets from question 1. Include a table that lists the accuracy of each of your Bayes Nets (you don't need to include any CPTs). Compare the accuracy of your nets with that of a "ZeroR" baseline (this classifier comes up as the default when you navigate to the "Classify" tab, and simply predicts the majority class for all test examples).
(1 point) The SimpleEstimator has a default alpha value of 0.5. What value of alpha corresponds to maximum likelihood estimation?
(1 point) You may have found that the Bayes Net learned in part (c) had higher accuracy than the others. However, this network has an "unfair" advantage -- what is it?
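
If it helps to see what the "ZeroR" baseline and k-fold cross-validation are doing, here is a minimal Python sketch. It is not Weka's implementation: the function names and the toy labels are made up for illustration, and unlike Weka it uses contiguous folds without shuffling or stratifying the data.

```python
from collections import Counter

def zero_r_predict(train_labels):
    """Return the majority class of the training labels (what ZeroR predicts)."""
    return Counter(train_labels).most_common(1)[0][0]

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def zero_r_cv_accuracy(labels, k=10):
    """Accuracy of the majority-class baseline under k-fold cross-validation."""
    n = len(labels)
    correct = 0
    for test_idx in k_fold_indices(n, k):
        test_set = set(test_idx)
        # Train on everything outside the current fold.
        train = [labels[i] for i in range(n) if i not in test_set]
        guess = zero_r_predict(train)
        # Score the single majority-class guess on the held-out fold.
        correct += sum(1 for i in test_idx if labels[i] == guess)
    return correct / n

# Hypothetical toy labels, not the car data: 14 "no" and 6 "yes".
toy_labels = ["no", "yes", "no", "no", "yes", "no", "no", "yes", "no", "no"] * 2
print(zero_r_cv_accuracy(toy_labels, k=10))
```

Because the baseline ignores every attribute, its cross-validated accuracy is close to the share of the majority class, which is the number your Bayes Nets need to beat.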

Gibbs sampling questions.

(1 point) Consider a small Bayes Net of binary variables with A->B->C. Let P(A=1)=P(B=1 | A=0)=P(B=1 | A=1)=0.5, and P(C=1 | B=1)=P(C=0 | B=0)=1.0. Say you initialize a Gibbs sampler with A=1, B=0, C=0, and choose to re-sample variable B. What is the probability distribution from which you sample new values for B? Now, from this starting point, say you perform Gibbs sampling on the network in order to estimate P(B) (i.e., there is no evidence). In the long run, what estimate of P(B) will your Gibbs sampler produce?
(1 point) The sampler in part (a) has an undesirable behavior. How could you solve this?
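
As a reminder of the mechanics, here is a minimal sketch of Gibbs sampling on a three-variable chain A->B->C. The CPT values are hypothetical and deliberately different from those in the question, and the function names are made up for illustration. Each variable is re-sampled from its distribution given all the others, which is proportional to the joint probability with that variable set to 0 or 1.

```python
import random

# Hypothetical CPTs for a chain A -> B -> C (NOT the values from the question).
P_A1 = 0.3                         # P(A=1)
P_B1_GIVEN_A = {0: 0.2, 1: 0.7}    # P(B=1 | A=a)
P_C1_GIVEN_B = {0: 0.1, 1: 0.8}    # P(C=1 | B=b)

def bern(p1, value):
    """Probability that a binary variable takes `value` when P(var=1) = p1."""
    return p1 if value == 1 else 1.0 - p1

def joint(a, b, c):
    """P(A=a, B=b, C=c) from the chain factorization."""
    return bern(P_A1, a) * bern(P_B1_GIVEN_A[a], b) * bern(P_C1_GIVEN_B[b], c)

def resample(state, var, rng):
    """Draw a new value for `var` conditioned on all other variables."""
    weights = []
    for v in (0, 1):
        candidate = dict(state, **{var: v})
        weights.append(joint(candidate["A"], candidate["B"], candidate["C"]))
    # Normalize the two joint probabilities to get P(var=1 | everything else).
    p1 = weights[1] / (weights[0] + weights[1])
    state[var] = 1 if rng.random() < p1 else 0

def gibbs_estimate_b(n_sweeps=20000, burn_in=1000, seed=0):
    """Estimate P(B=1) with no evidence by sweeping over A, B, C in turn."""
    rng = random.Random(seed)
    state = {"A": 1, "B": 0, "C": 0}
    hits = 0
    for sweep in range(n_sweeps):
        for var in ("A", "B", "C"):
            resample(state, var, rng)
        if sweep >= burn_in:
            hits += state["B"]
    return hits / (n_sweeps - burn_in)

# Exact marginal by summing the joint over A and C, for comparison.
exact = sum(joint(a, 1, c) for a in (0, 1) for c in (0, 1))
print(exact, gibbs_estimate_b())
```

With these made-up CPTs every conditional probability lies strictly between 0 and 1, so the sampler can reach every state. It is worth thinking about what changes when some CPT entries are exactly 0 or 1.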