EECS 349 Problem Set 2

Due 11:59PM Friday Apr 14

Updated Apr 12 17:25:00 CDT 2017


In this assignment, you will work in teams of 2 or 3 to implement decision trees. You should collaborate on the code, but each student must turn in an individual homework write-up.

The algorithm you should implement is the same as the one given in the decision tree lecture slides (slide 24, the "ID3" algorithm), except that (a) our "default" is a class value to output, rather than a Node as in the pseudocode, and (b) we will not use the "attributes" parameter. Instead, you should terminate tree-building when the example set is empty, OR all the examples have the same class value, OR no non-trivial split of the examples is possible (that is, no split partitions the data into more than one non-empty set, which happens when all examples have the same attribute vector). In the last case, the node should assign the mode class value of the examples (breaking ties arbitrarily). [Note: to prevent infinite recursion in certain corner cases, you should explicitly avoid making trivial splits -- i.e. splits that result in all the examples sorting to the same child branch. However, since this note was only added on Wednesday April 12, solutions that do not handle these corner cases correctly will not be penalized.]
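The three stopping conditions above might be checked along the following lines. This is only a sketch under stated assumptions, not the pseudocode from the slides: the function names (mode_class, is_trivial_split, stopping_value) are hypothetical, every example is assumed to be a dictionary with the same keys and hashable values, and the target is assumed to sit under the key "Class" as produced by parse.py.

```python
from collections import Counter


def mode_class(examples):
    """Most common "Class" value among the examples (ties broken arbitrarily)."""
    counts = Counter(ex["Class"] for ex in examples)
    return counts.most_common(1)[0][0]


def is_trivial_split(examples, attribute):
    """True if splitting on `attribute` would send every example down one branch."""
    return len({ex[attribute] for ex in examples}) <= 1


def stopping_value(examples, default):
    """Return (True, class_value) if tree-building should stop here, else (False, None).

    Hypothetical helper: assumes all examples share the same attribute keys.
    """
    # Case 1: empty example set, so output the default class value.
    if not examples:
        return (True, default)
    # Case 2: every example has the same class value.
    classes = {ex["Class"] for ex in examples}
    if len(classes) == 1:
        return (True, classes.pop())
    # Case 3: no attribute yields a non-trivial split, so output the mode class.
    attributes = [a for a in examples[0] if a != "Class"]
    if all(is_trivial_split(examples, a) for a in attributes):
        return (True, mode_class(examples))
    return (False, None)
```

A recursive tree builder could call stopping_value first and create a leaf whenever the first element of the result is True. When choosing an attribute to split on, skipping any attribute for which is_trivial_split returns True avoids the infinite recursion mentioned in the note.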

We have written code to read in the data for you (parse.py). It represents each example as a dictionary, with attributes stored as key:value pairs. The target output is stored as an attribute with the key "Class".
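To make the representation concrete, here is a small illustration of what a list of parsed examples could look like and how it might be partitioned on one attribute. The attribute names and values ("outlook", "windy", and so on) and the partition helper are made up for illustration; the actual keys and value types depend on the data file and on parse.py.

```python
from collections import defaultdict

# Hypothetical examples in the dictionary format described above.
# Only the "Class" key is given by the assignment; the rest is invented.
examples = [
    {"outlook": "sunny", "windy": "false", "Class": "no"},
    {"outlook": "sunny", "windy": "true", "Class": "no"},
    {"outlook": "rainy", "windy": "false", "Class": "yes"},
]


def partition(examples, attribute):
    """Group examples by their value for `attribute` (hypothetical helper)."""
    groups = defaultdict(list)
    for ex in examples:
        groups[ex[attribute]].append(ex)
    return dict(groups)


# One subset of examples per observed value of the chosen attribute.
by_outlook = partition(examples, "outlook")
# The target labels are read with the "Class" key.
labels = [ex["Class"] for ex in examples]
```

Each value in the returned dictionary is the subset of examples that would be passed to the corresponding child branch.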

Guidelines

Steps to complete the homework

The correct functionality of your code is then worth five points, making a total of ten points for the assignment.

One last suggestion: You may find it helpful to consult the starter code from last year's decision tree homework for reference, but be aware that last year's assignment involved continuous attributes and used a much more complex design than you will need for this homework.

Submission Instructions

You'll turn in your homework as a single zip file, in Canvas. Specifically:

  1. Create a single PDF file PS2.pdf with the answers to the questions above, and your graphs.
  2. Create a single ZIP file containing:
  3. Turn the zip file in under Problem Set 2 in Canvas.