EECS 311: DATA STRUCTURES Home Course Info Links Grades Lectures Newsgroup Homework Exams

## Goal

The goals of this assignment are:

• learn how to design and implement an efficient tree-based data structure for two-dimensional data
• that supports rectangular range queries

Two-dimensional data arises in many contexts, particularly problems involving spatial information, such as geographical information systems (GIS) and video games. A common task is finding all points within a given region or near a given point. The techniques that work with scalar data don't do so well here. Red-black trees don't work because 2D points can't be given an ordering (`operator<`) that keeps nearby points together. Hashtables don't work because they don't keep similar data close together at all. And of course regular vectors have poor O(N) search times.

One solution is quad trees. These are described in Exercises 12.38 through 12.40 in Chapter 12, and online in many places. (For 3D data, a similar data structure is the octree.)

Your job is to implement quad trees and range queries, and test their performance on a moderately sized dataset.

Go slowly. Be sure the first subtask is working before going to the next. Get code for the first subtask approved in the Code Critic before submitting the code for the next subtask.

The Debugger is your friend. You will almost certainly have bugs in your initial coding. Use the debugger to trace through your code to uncover why points get lost or your program crashes.

But don't just write code, watch it break, and try tracing each step with the debugger. Work out yourself, by hand, in detail, exactly how the quad trees should grow with the test cases I've given below. When will splitting occur? What quadrant will a point go into? Once you do that, you'll quickly see in the debugger when the code is going down the wrong track. I speak from experience here.

### Set up

Create a new project directory called quadtrees. Copy the Magic Makefile into it.

### PA3 Task 1: Create a range class

Define a Range2D structure in the files range2d.h and range2d.cpp. This will be used to represent a range of 2-dimensional data, e.g., all points whose X value is between 1 and 10 and all Y values between 25 and 35. This structure is worth developing and testing independently.

Here's the public API:

• Constructor: Range2D::Range2D( double xmin, double ymin, double xmax, double ymax )
Creates a range object, with the public data members xmin, ymin, xmax, and ymax.
• bool Range2D::contains( double x, double y ) const
Returns true if the point (x, y) is within the range, i.e., xmin ≤ x ≤ xmax and ymin ≤ y ≤ ymax.
• template <typename InputIterator, typename InsertIterator>
void Range2D::collect( InputIterator begin, InputIterator end, InsertIterator out ) const

Copies into out all points from begin to end that are inside the range. (The template typenames refers to iterator concepts described here.)
• bool Range2D::intersects( const Range2D & range ) const
Returns true if the two ranges overlap, i.e., there are any points that both contain.
• void Range2D::resize( double factor )
Resizes the range. The new sides are factor times the old sides. The center of the range remains the same.

For testing purposes:

• create a tests.cpp UnitTest++ file
• include datum.h in tests.cpp

Put the following tests (and others, if you want) in tests.cpp. Note the use of back_inserter to get a suitable iterator for the vector.

```TEST(rangeConstructor)
{
Range2D range( 3, 2, 6, 4 );
CHECK_EQUAL( 3, range.xmin );
CHECK_EQUAL( 2, range.ymin );
CHECK_EQUAL( 6, range.xmax );
CHECK_EQUAL( 4, range.ymax );
}

TEST(rangeGrow)
{
Range2D range( 3, 2, 6, 4 );
range.resize( 2 );
CHECK_EQUAL( 1.5, range.xmin );
CHECK_EQUAL( 1, range.ymin );
CHECK_EQUAL( 7.5, range.xmax );
CHECK_EQUAL( 5, range.ymax );
}

TEST(rangeShrink)
{
Range2D range( 3, 2, 6, 4 );
range.resize( 0.5 );
CHECK_EQUAL( 3.75, range.xmin );
CHECK_EQUAL( 2.5, range.ymin );
CHECK_EQUAL( 5.25, range.xmax );
CHECK_EQUAL( 3.5, range.ymax );
}

TEST(rangeContains)
{
Range2D range( 3, 1, 5, 3 );
CHECK( range.contains( 4, 2 ) );
CHECK( range.contains( 3, 1 ) );
CHECK( range.contains( 5, 3 ) );
CHECK( !range.contains( 2, 5 ) );
CHECK( !range.contains( 3, 0 ) );
CHECK( !range.contains( 6, 2 ) );
CHECK( !range.contains( 3, 4 ) );
}

TEST(rangeCollect)
{
Range2D range( 3, 1, 5, 3 );
IntData data;
data.push_back( IntDatum( 2, 5, 1 ) );
data.push_back( IntDatum( 4, 2, 2 ) );
data.push_back( IntDatum( 3, 0, 3 ) );
data.push_back( IntDatum( 3, 1, 4 ) );
data.push_back( IntDatum( 6, 2, 5 ) );
data.push_back( IntDatum( 5, 3, 6 ) );
data.push_back( IntDatum( 3, 4, 7  ) );

IntData results;
range.collect( data.begin(), data.end(), back_inserter( results ) );

CHECK_EQUAL( 3, results.size() );
CHECK_EQUAL( 2, results[0].value );
CHECK_EQUAL( 4, results[1].value );
CHECK_EQUAL( 6, results[2].value );
}

TEST(rangeIntersects)
{
Range2D range( 3, 1, 6, 4 );
CHECK( range.intersects ( range ) );
CHECK( range.intersects( Range2D( 4, 2, 7, 5 ) ) );
CHECK( range.intersects( Range2D( 2, 0, 4, 2 ) ) );
CHECK( range.intersects( Range2D( 2, 0, 7, 5 ) ) );
CHECK( range.intersects( Range2D( 4, 2, 5, 3 ) ) );
CHECK( !range.intersects( Range2D( 4, 5, 6, 7 ) ) );
CHECK( !range.intersects( Range2D( 8, 3, 10, 6 ) ) );
}```

Submit to Code Critic: your range2d.h and range2d.cpp, clearly marking which is which. Don't send test code, unless you added any new tests. If so, send just the new tests.

Create a file quadtree.h. In it, define a generic templated QuadTree structure. This will be used to store data associated with 2-dimensional coordinates. The data might be anything, from a string or number to an arbitrary data structure.

Unlike a 2-d tree (chapter 12, section 12.6), a quad tree needs to know what range of points it will contain. The range can be as large as we need, though it's best to keep it as small as possible.

Exercises 12.38 through 12.40 and Figure 12.53 show the basic idea. Initially the tree contains one node that's empty. Adding one point to a node is easy. If another point is added to the node, however, 4 child nodes are created, one for each quadrants (hence the name), the old point is moved to itse appropriate subquadrant, and then, recursively, the new point is added to its appropriate subquadrant.

Note that if the new point is very close to the old point, it will probably go to the same subquadrant, and the splitting process will repeat, until subquadrants become small enough that the points go into different ones.

This raises the issue of what to do with equal points. If we just let splitting occur, we'd have infinite recursion. Instead, use this variation of the basic quad tree algorithm. Each node has

• the range it covers, represented with a Range2D
• a count of the total number of data items stored within this range
• a vector, possibly empty, of data items stored in this node
• an array of pointers to the 4 subquadrants

When an attempt is made to add a new item to a node:

• if the count is zero, add the item to this node's vector;
• otherwise, if the vector is empty, then recursively add the item to the appropriate subquadrant;
• otherwise, if the item's coordinates match those in the vector, add it to the vector;
• otherwise, create nodes for the 4 subquadrants, move all the data in the vector to the appropriate child, and recursively add the item to the appropriate subquadrant.

Here's the public API:

• Constructor: QuadTree<T>::QuadTree( double xmin, double ymin, double xmax, double ymax )
Creates a quad tree object that will contain data of type T associated with points inside the range specified by the four given values. For simplicity:
• The tree should have a public data member range, of type Range2D, with the size given by xmin, ymin, xmax, and ymax.
• The data type T must have public data members x and y for the coordinate point.
Adds a data item to the tree.
• template <typename InsertIterator>
size_t QuadTree<T>::get( double x, double y, InsertIterator out) const

Copies all values stored for the given (x, y) to out, and returns the number of values copied.
Returns how many points are stored in the tree.

Then add the following tests. Make sure you understand what these tests are checking for. Post a question if you don't.

```TEST(emptyTree)
{
QuadTree<IntDatum> tree( 0, 0, 10, 10 );
CHECK_EQUAL( 0, tree.size() );
}

{
QuadTree<IntDatum> tree( 0, 0, 10, 10 );
IntDatum datum( 4, 2, 1 );
CHECK_EQUAL( 1, tree.size() );

IntData results;
CHECK_EQUAL( 1, tree.get( 4, 2, back_inserter( results ) ) );
CHECK_EQUAL( datum.value, results[0].value );
}

{
QuadTree<IntDatum> tree( 0, 0, 20, 20 );
tree.add( IntDatum( 4, 2, 1 ) );
tree.add( IntDatum( 12, 8, 2 ) );
tree.add( IntDatum( 1, 3, 3 ) );
tree.add( IntDatum( 2, 1, 4 ) );
tree.add( IntDatum( 10, 10, 5 ) );
CHECK_EQUAL( 5, tree.size() );

IntData results;
IntInserter inserter = back_inserter( results );

CHECK_EQUAL( 1, tree.get( 4, 2, inserter ) );
CHECK_EQUAL( 1, tree.get( 12, 8, inserter ) );
CHECK_EQUAL( 1, tree.get( 1, 3, inserter ) );
CHECK_EQUAL( 1, tree.get( 2, 1, inserter ) );
CHECK_EQUAL( 1, tree.get( 10, 10, inserter ) );

sort( results.begin(), results.end() );
IntData::iterator iter = results.begin();
CHECK_EQUAL( 1, (iter++)->value );
CHECK_EQUAL( 2, (iter++)->value );
CHECK_EQUAL( 3, (iter++)->value );
CHECK_EQUAL( 4, (iter++)->value );
CHECK_EQUAL( 5, (iter++)->value );
}

{
Tree tree( 0, 0, 10, 10 );
tree.add( IntDatum( 4, 2, 1 ) );
tree.add( IntDatum( 4, 2, 2 ) );
tree.add( IntDatum( 4, 2, 3 ) );
CHECK_EQUAL( 3, tree.size() );

IntData results;
CHECK_EQUAL( 3, tree.get( 4, 2, back_inserter( results ) ) );
CHECK_EQUAL( 1, results[0].value );
CHECK_EQUAL( 2, results[1].value );
CHECK_EQUAL( 3, results[2].value );
}```

Submit to Code Critic: your quadtree.h file. If you created any new new test code, include that too.

### PA3 Task 3: Support range queries

Add the ability to do range queries, i.e., to ask for all points in the quad tree within some rectangular range. A big advantage of quad trees is that only subtrees whose quadrants intersect with the range need to be checked. So if the query has a small range, the tree search will behave somewhat like binary search, rapidly eliminating and never inspect irrelevant distant points.

• template <typename InsertIterator>
void QuadTree<T>::query( const Range2D &range, InsertIterator out ) const

This copies into out all data items that are inside the range, including points on the edge of the range.
```TEST(rangeQuery)
{
QuadTree<IntDatum> tree( 0, 0, 20, 20 );
tree.add( IntDatum( 4, 2, 1 ) );
tree.add( IntDatum( 12, 8, 2 ) );
tree.add( IntDatum( 1, 3, 3 ) );
tree.add( IntDatum( 2, 1, 4 ) );
tree.add( IntDatum( 10, 10, 5 ) );

CHECK_EQUAL( 5, tree.size() );

IntData results;
IntInserter inserter = back_inserter( results );
tree.query( Range2D( 0, 0, 20, 20 ), inserter );
CHECK_EQUAL( 5, results.size() );

results.clear();
tree.query( Range2D( 0, 0, 8, 8 ), inserter );
CHECK_EQUAL( 3, results.size() );

results.clear();
tree.query( Range2D( 10, 10, 20, 20 ), inserter );
CHECK_EQUAL( 1, results.size() );

CHECK_EQUAL( 5, results[ 0 ].value );

results.clear();
tree.query( Range2D( 10, 10, 10, 10 ), inserter  );
CHECK_EQUAL( 1, results.size() );
}```

Submit to Code Critic: whatever code you added or changed. Don't send the entire files or the test code.

Now it's time to see how a quad tree compares to scanning a vector.

The plain text file zipcodes.txt contains the location of over 43,000 US Zip Codes. A sample line is:

`01543	42.380877	-71.96427	MA	Rutland`

There's one zip code per line, and the elements are tab-separated. The above says that the center of the area with Zip Code 01543 in Rutland, Massachusetts is latitude 42.380877 and longitude -71.96427.

In the files zipcode.h and zipcode.cpp define a ZipCode structure with:

• the public double data members x and y for the latitude and longitude, respectively
• the public C++ string data members city, state and zip for the city, state and zip code information
• template <typename InsertIterator>
static void ZipCode::readFile( std::string file, InsertIterator out )

This creates ZipCode data from the given file and copies them into out.

Then put the following timing code in your tests.cpp. This code loads the zip codes into a simple vector, then stores them in your quad tree, then compares querying with several different ranges.

At the start of your tests file, after the includes but before the actual tests, put

```typedef std::vector<ZipCode> ZipCodes;
typedef std::back_insert_iterator< ZipCodes > ZipInserter;

// Prints the time to retrieve all zip codes inside a range from a
// vector and from a quad tree.
void timeRetrieval( Range2D range, const ZipCodes &zipCodes, const ZipTree &zipTree )
{
std::clock_t start, end;
ZipCodes results;
ZipInserter inserter = back_inserter( results );
unsigned int count;

start = std::clock();
for (int i = 0; i < 1000; ++i) {
results.clear();
range.collect( zipCodes.begin(), zipCodes.end(), inserter );
}
end = std::clock();
std::cout << "1000 vector scans for " << results.size() << " Chicago codes: "
<< (end - start) << " ticks" << std::endl;
count = results.size();

start = std::clock();
for (int i = 0; i < 1000; ++i) {
results.clear();
zipTree.query( range, inserter );
}
end = std::clock();
std::cout << "1000 quad tree queries for " << results.size() << " Chicago codes: "
<< (end - start) << " ticks" << std::endl;

CHECK_EQUAL( count, results.size() );
}```

```TEST(timing)
{
ZipCodes zipCodes;
std::clock_t start, end;

start = std::clock();
ZipCode::readFile( "zipcodes.txt", back_inserter( zipCodes ) );
end = std::clock();

std::cout << "Time to load " << zipCodes.size() << " zip codes: "
<< (end - start) << " ticks" << std::endl;

CHECK_EQUAL( 43191, zipCodes.size() );

ZipTree zipTree( -10, -180, 75, -60 );
start = std::clock();
for ( ZipCodes::iterator iter = zipCodes.begin();
iter != zipCodes.end();
++iter ) {
}
end = clock();
std::cout << "Time to build tree: " << (end - start) << " ticks" << std::endl;

CHECK_EQUAL( zipCodes.size(), zipTree.size() );

Range2D range( 41.34,	-87.89, 42.1,	-87.55 );
timeRetrieval( range, zipCodes, zipTree );

range.resize( 0.5 );
timeRetrieval( range, zipCodes, zipTree );
}```

Add a few other ranges, including looking for just one specific point, using a range with whose lower and upper bounds are equal. Collect the timings on all of these. Run 1000 times to get measurable results.

Submit to Code Critic: your zipcode.h and zipcode.cpp files, and any new tests you wrote.

## Wrap Up

When all of the above has been completed, and approved as necessary via the Code Critic, check over your code and send it in, along with a text or Word file reporting on the times from the last task.

Make sure your program runs standalone, with no user input of any kind. That includes not stopping at the end with a "hit any key to continue" message. If I can't run your program automatically from a script, it will be returned.

Your code should compile without errors or warnings.

Be sure your test code file has an opening header comment that identifies the authorship of the code, as follows:

```// EECS311 PA3 solution: Quad trees
// Date: November 12, 2009
// Authors: T.H.E.Hacker
//
// Except for code provided by the instructor or textbook, this code was written
// solely by the authors listed above, and not copied from another student
// or external source.
```

Run make handin to generate a Zip archive of all your source code. By design, the Magic Makefile does not include the word list file. Do not include the word list file.

Email your analysis document and the Zip archive produced by the Magic Makefile to me. Use the Subject EECS 311 PA 3 Quad Tree.