D64 Assignment #3 Solutions

1. 4.2.6

b. Use studioName index to fetch rows

The index does not use pointer buckets so each the bucket will contain 10 key-pointer pairs. Since there are 51 Disney movies, they will be spread over 6 blocks. Assuming the worse case that each record is on a different block, we will need to read 51 blocks to get the records themselves. Checking whether the movie was made in 1995 will require no additional I/O once the block containing the record has been read so the total I/O requirement is 57 I/O's.

c. Use year index to fetch rows

The index does not use pointer buckets so each the bucket will contain 10 key-pointer pairs. Since there are 101 movies from 1995, they will be spread over 11 blocks. Assuming the worse case that each record is on a different block, we will need to read 101 blocks to get the records themselves. Checking whether the movie was made by Disney will require no additional I/O once the block containing the record has been read so the total I/O requirement is 112 I/O's.

2. 4.3.1

b. Dense B-Tree on a heap

(i) Each leaf will hold 69 keys so the tree will need 14493 leaf nodes. Each interior node points to 70 children so the tree will need 208 nodes above the leaves. This level will have 3 parent nodes. Finally, the tree will have a root node. The total space for the tree is therefore 14705 blocks.
(ii) To retrieve a record given its search key will require reading the root, two interior nodes and a leaf node. We must then follow the pointer in the leaf to read the block itself. The total number of I/O's is therefore 5.

c. Sparse B-Tree on a sequential file

(i) Each leaf will hold 69 keys so the tree will need 1450 leaf nodes. Each interior node points to 70 children so the tree will need 21 nodes above the leaves. Finally, the tree will have a root node. The total space for the tree is therefore 1472 blocks.
(ii) To retrieve a record given its search key will require reading the root, one interior node and a leaf node. We must then follow the pointer in the leaf to read the block itself. The total number of I/O's is therefore 4.

3. B+ Tree

Here is the tree at the three points. Note that record pointers are not shown.

4. Hash Indexes

1. Each entry in a bucket will require 16 bytes (key+rid).  A block can thus hold 256 entries.  If we had a totally packed index, we could point to all of the rows with ceil(12,000,000/256)=46875 blocks.  To compute the size of the directory, we note that a directory entry will be 10 bytes (block ptr) so a directory block will hold 409 entries.  The directory will then take ceil(46875/409) = 115 blocks.  The total size of the index is 46990 blocks.  Lookups will take 2 accesses if we assume that the directory is in memory, 3 if not.

The likelihood of really packing an extensible hash index this tightly is slim.  Remember that we pick some initial directory size and then double it every time we'd need an overflow block.  Suppose we started with a directory with 1024 entries.  How large would it be when it was fully populated?  It would be 65536 entries large, the smallest power of two that's larger than our minimum value of 46875.  This requires 161 blocks.  We would likewise have 65536 blocks instead of 46875 blocks for the hash buckets.  Our revised storage total is 65536+161 = 65697 blocks.  Note that our occupancy of this index is 46875/65536 = 71.5%
.

2. From the second part of the previous question, it's clear that using only 15 bits of the hash value (0-32767) is inadequate, thus we are between 15 and 16 bits.  If all of the buckets were equally full we'd need ceil(12,000,000/floor(256*0.8))= 58824 blocks.  All of the blocks will not be equally full, though, for two reasons.  First, every time a bucket is split, we end up with 2 buckets that are 1/2 the size roughly of the average bucket.  Second, we may need overflow blocks for some of the buckets.  These will be barely fully at all.  Thus on average we will need more than 36810 blocks.  In this particular case, however, since the distribution is perfect, none of the buckets will overflow until we reach 65536*256=16,777,216 rows which doesn't happen.  We can also ignore the smaller split bucket phenomenon (we'll be off by at most 1).  So after all of this work, we decide that there will be in fact 58824 blocks. This means our directory only needs 58824 entries which will require ceil(58824/400)=148 blocks. The total storage is therefore 58972 blocks.

A lookup takes 2 accesses if we assume that the directory is in memory, 3 if not.