EECS 311: DATA STRUCTURES

Programming Assignment 2

The exact test cases for this assignment are still being worked out.

Goal

The goals of this assignment are:

implement a non-binary search tree suitable for storing 3-dimensional numeric data and supporting range queries
analyze and empirically evaluate the performance of the tree for moderately large amounts of data.

Be sure to follow the 311 coding standards.

Task

There are three parts to this assignment:

Implement an octree data structure for 3-dimensional data.
Test your octree on both real and randomly generated data set.
Analyze the mathematical complexity of your algorithm and compare to your empirical results.

Pair-programming is allowed if it is true pair programming, i.e., coding done by both partners together, on one computer, switching every 20 minutes or so. The analyses should be done individually.

Implement an Octree

An octree is a variation of an quad-tree, suitable for storing three-dimensional data. They are very simple data structures. No rotations to worry about, just pointers, points, and some way to keep track of what region of space any node in the tree represents.

You'll obviously need a class for octrees, but equally important to keep things simple will be at least two more classes:

A point3d class for three-dimensional coordinate points
A region3d class for three-dimensional regions

The points are obvious because that's what we're storing in the octree. The regions are what we'll use to query our octree.

Your point3d can be a simple struct.

Your region3d can also be a struct, with a 3-dimensional point P and a size N, representing a cube with width, height and depth N and center at P. We'll use a cube to keep the arithmetic simple. A more typical solution would support general cuboids or even spheres.

Your octree should support at least the following public member functions:

a constructor to create an empty octree; the constructor will need to know:
- the minimum and maximum values for X, Y, and Z
- a maximum depth N
void insert(point3d point) that adds the given point to the tree, following the rules for octrees. Duplicate points should be allowed.
template <Iter> int query(region3d region, Iter iter) that takes a region and an insert iterator, and stores in iter all points, if any, that are in the given region. query should return how many points were found.

Each node in the octree needs to hold eight pointers to the eight octants, in some standard order. In addition, you'll need to decide how you want to store the actual points in the tree, and how you'll keep track of the dimensions each node represents. Note that if you know the dimensions of the root node, all the other dimensions are directly determined by their location in the tree.

Resources

Octrees are quad-trees extended to three dimensions. Quad-trees are described in exercises 12.39 and 12.40 on page 562. An example of quad-trees applied to two dimensional video games is given here.

A good description of octrees is here.

An excellent Powerpoint presentation with an empirical and analytical comparison of octrees and a number of other approaches to storing three-dimensional points is here.

Octants (and quadrants in quad-trees) are formed repeatedly until points fall into separate bins. A maximum depth is necessary to avoid very close points creating arbitrarily deep trees.

Analyze your algorithm

Do as formal an analysis as you can of the upper and lower complexity bounds of your algorithm for space and time. The number of points stored N is clearly important, but not the only relevant factor. Think about what else affects the space and time requirements. Include and refer to the key lines of your code that you believe dominate the complexity. Don't counting reading and printing, but definitely include the time to insert a point. Don't worry about deletion.

Input

Run your program on some real data and a larger set of randomly generated data.

Use PhotoObjects.zip for the real data set. Each line of the file gives galactic coordinates for a galactic object in the Sloan Digital Sky Survey. I found the file here.

The coordinates in this data fall within the following bounds:

Axis	Minimum	Maximum
X	-0.998	-0.994
Y	-0.105	-0.069
Z	-0.022	-0.003

For your second run, generate and store 100,000 random points in the same ranges.

Output

Your program should run standalone, without user input. It should do two runs, one with real data, one with the randomly generated data. For each run it should

print the source of the data and the number of items, appropriately labelled
all stored values that fall withing the following regions:
- TBD
how long the entire run took, not counting reading or printing, in seconds, to 4 decimal places
the maximum depth of the tree created, and the number of nodes in the tree; a tree with one root node and one leaf would be depth 2

Additional Requirements

You can organize your code into multiple .cpp and .h files, if that makes sense, but be sure to define main() in a file called main.cpp.

Note the use of a templated query() member function. This is so you can pass an insert operator to any container you create to collect results. The simplest thing is to pass a back_inserter for a vector.

Use Weiss' Random class to create the random data. This class is described in Section 10.4.1, with comments on why it might be better than what your system comes with. I put the code in Weiss-Random.zip. It includes a test program you should compile and run to make sure it works. Be sure to include the Random.cpp in your project, and put Random.h in your project directory.

What to Hand in

There are two separate attachments to email: your code, in a compressed archive, and your algorithmic analysis, in a text document. These are described below. Both should be emailed to me with the Subject line EECS 311 PA2.

Code

Submit a zipped or tar-gzipped archive of one directory called PA2 containing:

all your source files
your output in a plain text file called output.txt

If you worked alone, call the archive last-name_first-initial_pa2.zip (use .tgz for tar-gzipped files). Example: riesbeck_c_pa2.zip.

If you pair-programmed, call the archive last-name1_last_name2_pa2.zip (use .tgz for tar-gzipped files). Example: kornhauser_tarzia_pa2.zip.

Make sure your code is compile-and-run.. That means

all source code files must be included
only relative local file paths are used, like "PhotoObjects.txt", not absolute file paths, like "c:/eecs311/projects/pa2/PhotoObjects.txt"

Do not include PhotoObjects.txt! My email account is overfull as it is.

Every source file should have an opening header comment of the following form:

// Filename: main.c (or whatever)
// EECS311 PA2 solution: Simple octree
// Date: April 3, 2007
// Authors: Dan Kornhauser, Stephen Tarzia
// Programming sessions: 4/1/07: 4-6pm; 4/2/07: 3-6pm; 4/3/07: 10am-noon
// 
// Except for code provided by the instructor or textbook, this code was written
// solely by the authors listed above, and not copied from another student
// or external source.

Analysis

Write an algorithmic analysis of the parts of the code that dominate the running time. The key independent variable is the number of points. Use "big-Oh" notation, as shown in the book. Include the key lines of code that you're analyzing. Compare your theoretical results to your empirical results. Ideally they should match, but don't claim a match that isn't there. It's better to recognize a problem with the analysis or empirical measurements than to fudge or misread the data.