Here's one solution to the problem. The key was to be careful about not losing the first row of a new group - we have to save the row that ends a group for use in the next group.
/* State variables - would be dynamically allocated in a real system */
boolean atEOF;
Row nextRow;
int nextGroupByVal;
void open(void)
{
/* in real code, allocate state variables */
inputRelation.open();
nextRow = inputRelation.getNext();
atEOF = (nextRow == NULL);
if (!atEOF)
nextGroupByVal = getAttributeFromRow(nextRow,groupByColumnNum);
}
Row getNext(void)
{
int count=1;
int total;
int thisGroupByValue = nextGroupByVal;
if (atEOF)
return NULL;
total = getAttributeFromRow(nextRow,averageColumnNum);
while (true) {
nextRow = inputRelation.getNext();
if (nextRow == NULL) {
atEOF = true;
break;
}
nextGroupByValue = getAttributeFromRow(nextRow,groupByColumnNum);
if (nextGroupByValue == thisGroupByVal) {
count++;
total += getAttributeFromRow(nextRow,averageColumnNum);
}
else
break;
}
return(composeRow((float)total/count,thisGroupByValue));
}
void close(void)
{
/* in real code, deallocate state variables */
inputRelation.close();
}
A few values we'll need later on:
To perform sort-merge join, we need both relations to be smaller than M2. In this case, the Student table is too large. If we perform the selection before sorting, however, it will be only 125 pages long which is small enough. Thus the cost of the sort-merge is 3*B(Dorm)+B(Student)+2*B(Student)/4. This computes to 740 I/Os.
To perform a hash join, we need at least one of the relations to be smaller than M2 which Dorm is. We can also push the selection down on Student and so our cost is the same as with sort-merge join (740 I/Os).
We can use the index on Dorm.name to perform an index join with Dorm as the inner relation. Each lookup in the Dorm table will require an index lookup plus a read of the actual record. If we push the selection on Student, we will only have to look in the Dorm table for 2500 of the rows in the Student table. The total cost will be B(Student) + 2500 * (index lookup + 1). How deep is the Dorm.name index? We have to solve our favorite equation:
12*(n+1)+100*n <= 4096
which tells us n=36 (at least in the leaves). At 80% occupancy we need roughly 11 leaves and 1 root. Assuming the root is cached, each lookup is only 1 I/O so the total cost of the index join is 5500 I/Os.
Of the 20 buffers we have to work with, we can use 19 to read blocks from Dorm and the remaining buffer to scan through Student. This will require two passes through Student and so the cost is B(Dorm) + 2*B(Student) = 1030 I/Os.
To perform sort-merge join, we need both relations to be smaller than M2 which they are. If they were both less than M we could use the single pass algorithm. Using the two pass algorithm, we get the same result as in part 1.
Since B(Dorm) is smaller than M, we can use the single pass algorithm. The cost is therefore just B(Student)+B(Dorm) = 530 I/Os.
This is the same as in part 1. The extra memory does not help.
Since Dorm fits completely in memory, we only need one pass through Student so the cost is 530 I/Os.