# Self Assessment

Assess your answer to the programming question using the following critiques. Figure approximately 2 points for each serious class of failure. The question as a whole was worth 10 points. The sample answer is probably worth 20, since it's beyond what I could do in an hour.

## Incorrect Singularity Assumptions

One of the main points of this question was dealing with several many-to-many relationships. Words and phrases can have multiple meanings, target concepts can have multiple index sets, the same concepts may appear in the input multiple times, and so on.

Subtract points if your code assumes only one

• word per concept, no multiword phrases
• word sense (concept) per input word or phrase
• index set for each target concept
• target concept for each index set

Extra credit if you noted that ambiguous inputs might better be done by separately scoring each parse, rather than simply lumping all word and phrase concepts together.

## Egregious Inefficiencies

The question explicitly said not to trade clarity for efficiency, but some inefficiencies were unnecessary and no clearer than more efficient alternatives.

• builds lists for the three sets p+s , p-s , and s-p ; all you need to do is to count
• computes the three sets using three scans over the sets; only two scans are needed, once over S and once over P; the third value can be calculated by simple subtraction
• computes the three sets using one scan over one of the two sets; you can't calculate p-s , for example, if you've only scanned S

## No Hope for the Future

The other main point of the question was to see if you had some idea for seriously improving the efficiency of the method so that it would scale to large memories. Set operations like intersection are N-squared. Simply speeding up the operations, e.g., by using caching, isn't enough. You have to remove the N-squared factor, or drastically control N.

The example solution points to a common approach -- building a discrimination tree of index sets. There are also methods for reducing the code of set operations from N times M to N plus M.

Subtract points if you had no suggestions for improving the code beyond caching or replacing list building with counting.

## Miscellaneous

• scores and sorts all index sets without first making sure an index set is relevant, i.e., has at least one concept in p+s ; since p-s , and s-p subtract from the index set's score, many obvious formulae for `score-match` may generate higher scores for irrelevant index sets with only a few elements, than for relevant index sets with many elements