D64 Assignment #2 Solutions

1. Record layout

Part A:

Here's one possible solution:

Offset

Length

Contents

Example record

0

1

Null flags

00000100 (binary)

1

40

title attribute

See spot run

41

40

author attribute

 Dick & Jane

81

10

isbn attribute

0123456789

91

1

hardcover attribute

0x00

92

4

numSold attribute

35256

96

256

length + description attribute

n/a

352

4

price attribute (as int)

1995

356

4

wasted space to align next field

 

360

8

averageRating attribute

 8.234

Notes:

  1. We need some way to store null info. We can't store it in the fields themselves for the most part (what is the floating point representation of null) so we need flags in the header.
  2. The default value isn't stored in the record. It's stored with the schema info in the data dictionary.
  3. The length of the record is not needed since it is fixed length.

Part B:

Here's one possible answer. For the layout, I assume all of the fields are non-null. The example record shows what happens when a field is null.

Offset

Length

Contents

0

2

record length

2

8

array of field start offsets (only numSold on require start offsets)

10

40

title attribute

50

40

author attribute

90

10

isbn attribute

100

1

hardcover attribute

?

4

numSold attribute

?

?

length + description attribute

?

4

price attribute

?

8

averageRating attribute

Here's what the example record would look like:

Offset

Length

Contents

0

2

120

2

8

[104,108,108,112]

10

40

See spot run

50

40

Dick & Jane

90

10

0123456789

100

1

0x00

101

3

wasted space to align next field

104

4

35256

108

4

1995

112

8

8.234

Notes:

  1. Null attributes are represented by having their start offset be the same as the next attribute's start offset (or the record length for the last attribute).
  2. Note that the offset of an attribute is not just the previous attribute's start offset plus its length. This would cause alignment problems.

2. Swizzling

No Swizzling

In this case, we don't need the table at all - the buffer manager will be sufficient.

char *convert_pointer(DBPtr ptr) {
	char *block_start = map_block(BLOCK_ID(ptr));
	return (block_start + BLOCK_OFFSET(ptr));
}

Swizzling on demand

char *convert_pointer(DBPtr ptr) {
	char *swizzled_ptr,*new_block_ptr;
	if (IS_SWIZZLED(ptr))
		return ptr;
	// This is a call to the mapping table.  We should also be passing the location
	// of the pointer we're swizzling in case we have to deswizzle later.  Oh well.
	swizzled_ptr = find_in_table(ptr);	
	if (swizzled_ptr == NULL) {
		new_block_ptr = map_block(BLOCK_ID(ptr));
		swizzled_ptr = new_block_ptr + BLOCK_OFFSET(ptr);
		add_entries_to_table(ptr);
	}
	return swizzled_ptr;
}

void add_entries_to_table(DBPtr ptr) {
	char *new_block_ptr = map_block(BLOCK_ID(ptr));	// The block will already be loaded
	DBPtr entries_in_block[] = extract_entries_from_block(new_block_ptr);
	DBPtr cur_ptr;
	int i;
	for (i=0;i<entries_in_block.length;i++) {
		cur_ptr = entries_in_block[i];
		// This is a call to the mapping table
		add_entry_to_table(cur_ptr,new_block_ptr+BLOCK_OFFSET(cur_ptr));
	}
}

Automatic swizzling

char *convert_pointer(DBPtr ptr) {
	char *swizzled_ptr,*new_block_ptr;
	if (IS_SWIZZLED(ptr))
		return ptr;
	// So the block we need must not be in memory yet.
	new_block_ptr = map_block(BLOCK_ID(ptr));
	swizzle_new_block(BLOCK_ID(ptr));
	return new_block_ptr + BLOCK_OFFSET(ptr);
}

void swizzle_new_block(int block_id) {
	char *new_block_ptr = map_block(block_id);	// The block will already be loaded
		// The following function is provided by the mapping table manager
	DBPtr entries_in_block[] = extract_entries_from_block(new_block_ptr);
	DBPtr cur_ptr;
	int i,j;
	char *referencers[];
	for (i=0;i<entries_in_block.length;i++) {
		cur_ptr = entries_in_block[i];
		// This is a call to the mapping table
		add_entry_to_table(cur_ptr,new_block_ptr+BLOCK_OFFSET(cur_ptr));
		// So is this.  We need to fix all the ptrs that reference entries in the new block
		referencers = get_referencing_ptrs(cur_ptr);
		for (j=0;j<referencers.length;j++)
			*referencers[j] = new_block_ptr+BLOCK_OFFSET(cur_ptr);
	}
}

3. Block packing

Since the records on the block are a fixed size, we know exactly how many will fit on the block and can compute the offset of each block without needing any additional info. The only extra info we need in the header is to tell us which record slots are free. If we cannot reuse records once deleted, which is the case if we're using pure physical addressing, then we only have to keep count of how many record slots we've used. The count is incremented everytime a record is added to the page and it's never decremented (since a deleted record's space is lost - see below).


If we're using physical addressing, we need to leave a tombstone at the location the record used to be. Even if the tombstone is just 1 byte, we lose the full storage slot since records are stored at fixed offsets in the block and cannot be moved (physical addressing pins the records down). Thus we never reclaim any space used by a deleted record. Contrast this to the slot directory approach for packing a block in which we only lose the directory slot to a tombstone, not the entire space the record took.