- Your boss sends you this e-mail: "I want a model that can predict when users are likely to return to our Website
,
based on which pages they visit. Mail me something by tomorrow morning." Answer the following three questions
in 2-4 sentences each.
- Assume the dataset you have easy access to is a log of <user ID, url, timestamp> triples compiled over several weeks.
How would you transform the data for use by a machine learning algorithm? Be specific. (2 points)
- Given the choice between neural networks and decision trees for this task, which would you
choose? Give two reasons why. (2 points)
- What statistics would you measure to convince your boss that your model was successful at
the task? (1 point)
- Consider a pair of perceptrons defined over continuous-valued ordered pairs (x1, x2). Perceptron P has weights (w0, w1, w2) and perceptron
P' has weights (w0', w1', w2'). If the biases w0=w0'=1, and the weights w1=1 and w1'=1/2, under
what conditions on w2 and w2' is P' more_general_than P? The more_general_than concept was
defined in Mitchell, Chapter 2. (2 points)
- Consider a sequential covering algorithm (such as CN2) and a simultaneous covering algorithm (such as your decision
tree learner).
- Describe a data set for which you would expect a sequential covering algorithm to outperform
a simultaneous covering algorithm.
- Describe a different data set for which you would expect the opposite to be true.
.
(4 points, +2 potential
points of extra credit for particularly insightful or clear answers).
|
|