D64 Assignment #4
Due Wednesday, March 1, 2000 at 4:00pm.
The assignment should be turned in before class on Wednesday.
- (30 pts.) It's your job to implement an aggregation module for a query evaluator. The module will
be used to compute an average of one column with GROUP BY on another column. The output of the module will
be rows with two columns, the average value and the group by value. You may assume that the input relation
is already sorted in ascending order of the group by column.
You must implement three functions:
You can assume that the following three global variable are available to you and are set correctly:
- void open(void)
- Row getNext(void) - Row is a new type that is already defined. You do not need to know how it's defined,
though, since there are access functions provided (below).
- void close(void)
- inputRelation - the module that will serve as input to this module. Thus inputRelation.getNext() gets
the next from the input module of your module.
- groupByColumnNum - the column that the aggregation should be grouped by
- averageColumnNum - the column that should be averaged
You also have the following functions that you can call (in addition to the iterator calls on the input):
- int getAttributeFromRow(Row,columnNum) - gets the value of a particular column from a row. Assume that
all columns are integers.
- Row composeRow(value1,value2,...) - creates a new row with the given column values.
For example, if the user asks the query:
GROUP BY class
your module would be the root of the query plan. groupByColumnNum would be the column number of the class column
and averageColumnNum the column number of the gpa column. Show the psuedo-code for each of the three routines you
have to write.
- (80 pts.) This question is about evaluating following SQL query:
WHERE year = 1
AND Student.dorm = Dorm.name
Here is some info you may need for this problem:
- Pages are 4kb.
- Records of Student(id,name,year,dorm) are 200 bytes long - id is 4 bytes.
- Records of Dorm(name,address) are 400 bytes long - name is 100 bytes.
- T(Student) = 10000, T(Dorm) = 300
- V(Student,year) = 4
- Block ptrs and 10 bytes and dbptrs are 12 bytes.
- All B Trees have 80% occupancy.
- There is an unclustered B Tree index on year and on Dorm.name.
Answer the following questions
- (40 pts.) Determine the I/O cost of computing the answer to the SQL query using four different join
algorithms: sort-merge join, hash join, index join, and nested loop join. Show all of your work. You may assume
that you have 20 free buffers.
- (40 pts.) Repeat question 1 assuming that you have 100 free buffers.