EECS 311: DATA STRUCTURES |
Home Course Info Links Grades |
Lectures Newsgroup Homework Exams |

The goals of this assignment are:

- learn how to design and implement an efficient tree-based data structure for two-dimensional data
- that supports rectangular range queries

Two-dimensional data arises in many contexts, particularly problems
involving spatial information, such as
geographical information systems (GIS) and video games. A common task is
finding all points
within a given region or near a given point. The techniques
that work with scalar data don't
do so well here. Red-black trees don't work because
2D points can't be given an ordering (`operator<`

) that
keeps nearby points together. Hashtables don't work because they don't keep similar
data close together at all. And of course regular vectors
have poor O(N) search times.

One solution is **quad trees**. These
are described in Exercises 12.38 through 12.40 in Chapter 12,
and online in many places. (For 3D data, a similar
data structure is the octree.)

Your job is to implement quad trees and range queries, and test their performance on a moderately sized dataset.

Go slowly. Be sure the first subtask is working before going to the next. Get code for the first subtask approved in the Code Critic before submitting the code for the next subtask.

**The Debugger is your friend.** You will almost
certainly have bugs in your initial coding. Use the debugger
to trace through your code to uncover why points get lost or
your program crashes.

But don't just write code, watch it break, and try tracing each step with the debugger. Work out yourself, by hand, in detail, exactly how the quad trees should grow with the test cases I've given below. When will splitting occur? What quadrant will a point go into? Once you do that, you'll quickly see in the debugger when the code is going down the wrong track. I speak from experience here.

Create a new project directory called `quadtrees`. Copy
the Magic Makefile
into it.

Define a `Range2D` structure in the
files `range2d.h` and `range2d.cpp`. This
will be used to represent a range of 2-dimensional data, e.g.,
all points whose X value is between 1 and 10 and all Y values between
25 and 35. This structure is worth developing and testing
independently.

Here's the public API:

- Constructor:
`Range2D::Range2D( double xmin, double ymin, double xmax, double ymax )`

Creates a range object, with the**public**data members`xmin`,`ymin`,`xmax`, and`ymax`. `bool Range2D::contains( double x, double y ) const`

Returns true if the point (x, y) is within the range, i.e.,`xmin ≤ x ≤ xmax`and`ymin ≤ y ≤ ymax`.`template <typename InputIterator, typename InsertIterator>`

void Range2D::collect( InputIterator*begin*, InputIterator*end*, InsertIterator*out*) const

Copies into*out*all points from*begin*to*end*that are inside the range. (The template typenames refers to iterator concepts described here.)`bool Range2D::intersects( const Range2D &`*range*) const

Returns true if the two ranges overlap, i.e., there are any points that both contain.`void Range2D::resize( double`*factor*)

Resizes the range. The new sides are*factor*times the old sides. The center of the range remains the same.

For testing purposes:

- add the file datum.h to your project
- create a
`tests.cpp`UnitTest++ file - include
`datum.h`in`tests.cpp`

Put the following tests (and others, if you want) in
`tests.cpp`.
Note the use
of `back_inserter` to get a suitable iterator for the
vector.

TEST(rangeConstructor) { Range2D range( 3, 2, 6, 4 ); CHECK_EQUAL( 3, range.xmin ); CHECK_EQUAL( 2, range.ymin ); CHECK_EQUAL( 6, range.xmax ); CHECK_EQUAL( 4, range.ymax ); } TEST(rangeGrow) { Range2D range( 3, 2, 6, 4 ); range.resize( 2 ); CHECK_EQUAL( 1.5, range.xmin ); CHECK_EQUAL( 1, range.ymin ); CHECK_EQUAL( 7.5, range.xmax ); CHECK_EQUAL( 5, range.ymax ); } TEST(rangeShrink) { Range2D range( 3, 2, 6, 4 ); range.resize( 0.5 ); CHECK_EQUAL( 3.75, range.xmin ); CHECK_EQUAL( 2.5, range.ymin ); CHECK_EQUAL( 5.25, range.xmax ); CHECK_EQUAL( 3.5, range.ymax ); } TEST(rangeContains) { Range2D range( 3, 1, 5, 3 ); CHECK( range.contains( 4, 2 ) ); CHECK( range.contains( 3, 1 ) ); CHECK( range.contains( 5, 3 ) ); CHECK( !range.contains( 2, 5 ) ); CHECK( !range.contains( 3, 0 ) ); CHECK( !range.contains( 6, 2 ) ); CHECK( !range.contains( 3, 4 ) ); } TEST(rangeCollect) { Range2D range( 3, 1, 5, 3 ); IntData data; data.push_back( IntDatum( 2, 5, 1 ) ); data.push_back( IntDatum( 4, 2, 2 ) ); data.push_back( IntDatum( 3, 0, 3 ) ); data.push_back( IntDatum( 3, 1, 4 ) ); data.push_back( IntDatum( 6, 2, 5 ) ); data.push_back( IntDatum( 5, 3, 6 ) ); data.push_back( IntDatum( 3, 4, 7 ) ); IntData results; range.collect( data.begin(), data.end(), back_inserter( results ) ); CHECK_EQUAL( 3, results.size() ); CHECK_EQUAL( 2, results[0].value ); CHECK_EQUAL( 4, results[1].value ); CHECK_EQUAL( 6, results[2].value ); } TEST(rangeIntersects) { Range2D range( 3, 1, 6, 4 ); CHECK( range.intersects ( range ) ); CHECK( range.intersects( Range2D( 4, 2, 7, 5 ) ) ); CHECK( range.intersects( Range2D( 2, 0, 4, 2 ) ) ); CHECK( range.intersects( Range2D( 2, 0, 7, 5 ) ) ); CHECK( range.intersects( Range2D( 4, 2, 5, 3 ) ) ); CHECK( !range.intersects( Range2D( 4, 5, 6, 7 ) ) ); CHECK( !range.intersects( Range2D( 8, 3, 10, 6 ) ) ); }

**Submit to Code Critic:** your `range2d.h`
and `range2d.cpp`, clearly marking which is which.
Don't
send test code, unless you added any new tests. If so, send just
the new tests.

Create a file `quadtree.h`. In it,
define a generic templated `QuadTree` structure. This
will be used to store data associated with 2-dimensional coordinates. The data might
be anything, from a string or number to an arbitrary data structure.

Unlike a 2-d tree (chapter 12, section 12.6), a quad tree needs to know what range of points it will contain. The range can be as large as we need, though it's best to keep it as small as possible.

Exercises 12.38 through 12.40 and Figure 12.53 show the basic idea. Initially the tree contains one node that's empty. Adding one point to a node is easy. If another point is added to the node, however, 4 child nodes are created, one for each quadrants (hence the name), the old point is moved to itse appropriate subquadrant, and then, recursively, the new point is added to its appropriate subquadrant.

Note that if the new point is very close to the old point, it will probably go to the same subquadrant, and the splitting process will repeat, until subquadrants become small enough that the points go into different ones.

This raises the issue of what to do with equal points. If we just let splitting occur, we'd have infinite recursion. Instead, use this variation of the basic quad tree algorithm. Each node has

- the range it covers, represented with a
`Range2D` - a count of the total number of data items stored within this range
- a vector, possibly empty, of data items stored in this node
- an array of pointers to the 4 subquadrants

When an attempt is made to add a new item to a node:

- if the count is zero, add the item to this node's vector;
- otherwise, if the vector is empty, then recursively add the item to the appropriate subquadrant;
- otherwise, if the item's coordinates match those in the vector, add it to the vector;
- otherwise, create nodes for the 4 subquadrants, move all the data in the vector to the appropriate child, and recursively add the item to the appropriate subquadrant.

Here's the public API:

- Constructor:
`QuadTree<T>::QuadTree( double xmin, double ymin, double xmax, double ymax )`

Creates a quad tree object that will contain data of type`T`associated with points inside the range specified by the four given values. For simplicity:- The tree should have a
**public**data member`range`, of type`Range2D`, with the size given by`xmin`,`ymin`,`xmax`, and`ymax`. - The data type
`T`must have public data members`x`and`y`for the coordinate point.

- The tree should have a
`void QuadTree<T>::add( T datum )`

Adds a data item to the tree.`template <typename InsertIterator>`

size_t QuadTree<T>::get( double x, double y, InsertIterator out) const

Copies all values stored for the given (x, y) to*out*, and returns the number of values copied.`size_t QuadTree<T>::size( ) const`

Returns how many points are stored in the tree.

Then add the following tests. Make sure you understand what these tests are checking for. Post a question if you don't.

TEST(emptyTree) { QuadTree<IntDatum> tree( 0, 0, 10, 10 ); CHECK_EQUAL( 0, tree.size() ); } TEST(addOnePoint) { QuadTree<IntDatum> tree( 0, 0, 10, 10 ); IntDatum datum( 4, 2, 1 ); tree.add( datum ); CHECK_EQUAL( 1, tree.size() ); IntData results; CHECK_EQUAL( 1, tree.get( 4, 2, back_inserter( results ) ) ); CHECK_EQUAL( datum.value, results[0].value ); } TEST(addSeveralPoints) { QuadTree<IntDatum> tree( 0, 0, 20, 20 ); tree.add( IntDatum( 4, 2, 1 ) ); tree.add( IntDatum( 12, 8, 2 ) ); tree.add( IntDatum( 1, 3, 3 ) ); tree.add( IntDatum( 2, 1, 4 ) ); tree.add( IntDatum( 10, 10, 5 ) ); CHECK_EQUAL( 5, tree.size() ); IntData results; IntInserter inserter = back_inserter( results ); CHECK_EQUAL( 1, tree.get( 4, 2, inserter ) ); CHECK_EQUAL( 1, tree.get( 12, 8, inserter ) ); CHECK_EQUAL( 1, tree.get( 1, 3, inserter ) ); CHECK_EQUAL( 1, tree.get( 2, 1, inserter ) ); CHECK_EQUAL( 1, tree.get( 10, 10, inserter ) ); sort( results.begin(), results.end() ); IntData::iterator iter = results.begin(); CHECK_EQUAL( 1, (iter++)->value ); CHECK_EQUAL( 2, (iter++)->value ); CHECK_EQUAL( 3, (iter++)->value ); CHECK_EQUAL( 4, (iter++)->value ); CHECK_EQUAL( 5, (iter++)->value ); } TEST(addDuplicates) { typedef QuadTree<IntDatum> Tree; Tree tree( 0, 0, 10, 10 ); tree.add( IntDatum( 4, 2, 1 ) ); tree.add( IntDatum( 4, 2, 2 ) ); tree.add( IntDatum( 4, 2, 3 ) ); CHECK_EQUAL( 3, tree.size() ); IntData results; CHECK_EQUAL( 3, tree.get( 4, 2, back_inserter( results ) ) ); CHECK_EQUAL( 1, results[0].value ); CHECK_EQUAL( 2, results[1].value ); CHECK_EQUAL( 3, results[2].value ); }

**Submit to Code Critic:** your `quadtree.h`
file. If you created any **new** new test code, include that too.

Add the ability to do range queries, i.e., to ask for all points in the quad tree within some rectangular range. A big advantage of quad trees is that only subtrees whose quadrants intersect with the range need to be checked. So if the query has a small range, the tree search will behave somewhat like binary search, rapidly eliminating and never inspect irrelevant distant points.

Add the following to the quad tree API:

`template <typename InsertIterator>`

void QuadTree<T>::query( const Range2D &range, InsertIterator out ) const

This copies into`out`all data items that are inside the range, including points on the edge of the range.

TEST(rangeQuery) { QuadTree<IntDatum> tree( 0, 0, 20, 20 ); tree.add( IntDatum( 4, 2, 1 ) ); tree.add( IntDatum( 12, 8, 2 ) ); tree.add( IntDatum( 1, 3, 3 ) ); tree.add( IntDatum( 2, 1, 4 ) ); tree.add( IntDatum( 10, 10, 5 ) ); CHECK_EQUAL( 5, tree.size() ); IntData results; IntInserter inserter = back_inserter( results ); tree.query( Range2D( 0, 0, 20, 20 ), inserter ); CHECK_EQUAL( 5, results.size() ); results.clear(); tree.query( Range2D( 0, 0, 8, 8 ), inserter ); CHECK_EQUAL( 3, results.size() ); results.clear(); tree.query( Range2D( 10, 10, 20, 20 ), inserter ); CHECK_EQUAL( 1, results.size() ); CHECK_EQUAL( 5, results[ 0 ].value ); results.clear(); tree.query( Range2D( 10, 10, 10, 10 ), inserter ); CHECK_EQUAL( 1, results.size() ); }

**Submit to Code Critic:** whatever code you
*added or changed*. Don't send the entire files or the test code.

Now it's time to see how a quad tree compares to scanning a vector.

The plain text file zipcodes.txt contains the location of over 43,000 US Zip Codes. A sample line is:

01543 42.380877 -71.96427 MA Rutland

There's one zip code per line, and the elements are tab-separated. The above says that the center of the area with Zip Code 01543 in Rutland, Massachusetts is latitude 42.380877 and longitude -71.96427.

Download this file and put it in your project directory.

In the files `zipcode.h` and `zipcode.cpp` define
a `ZipCode` structure with:

- the public double data members
`x`and`y`for the latitude and`longitude`, respectively - the public C++ string data members
`city`,`state`and`zip`for the city, state and zip code information `template <typename InsertIterator>`

static void ZipCode::readFile( std::string file, InsertIterator out )

This creates`ZipCode`data from the given file and copies them into`out`.

Then put the following timing code in your `tests.cpp`. This
code loads the zip codes into a simple vector, then stores them in
your quad tree, then compares querying with several different ranges.

At the start of your tests file, after the includes but before the actual tests, put

typedef std::vector<ZipCode> ZipCodes; typedef QuadTree<ZipCode> ZipTree; typedef std::back_insert_iterator< ZipCodes > ZipInserter; // Prints the time to retrieve all zip codes inside a range from a // vector and from a quad tree. void timeRetrieval( Range2D range, const ZipCodes &zipCodes, const ZipTree &zipTree ) { std::clock_t start, end; ZipCodes results; ZipInserter inserter = back_inserter( results ); unsigned int count; start = std::clock(); for (int i = 0; i < 1000; ++i) { results.clear(); range.collect( zipCodes.begin(), zipCodes.end(), inserter ); } end = std::clock(); std::cout << "1000 vector scans for " << results.size() << " Chicago codes: " << (end - start) << " ticks" << std::endl; count = results.size(); start = std::clock(); for (int i = 0; i < 1000; ++i) { results.clear(); zipTree.query( range, inserter ); } end = std::clock(); std::cout << "1000 quad tree queries for " << results.size() << " Chicago codes: " << (end - start) << " ticks" << std::endl; CHECK_EQUAL( count, results.size() ); }

Then add this to your tests:

TEST(timing) { ZipCodes zipCodes; std::clock_t start, end; start = std::clock(); ZipCode::readFile( "zipcodes.txt", back_inserter( zipCodes ) ); end = std::clock(); std::cout << "Time to load " << zipCodes.size() << " zip codes: " << (end - start) << " ticks" << std::endl; CHECK_EQUAL( 43191, zipCodes.size() ); ZipTree zipTree( -10, -180, 75, -60 ); start = std::clock(); for ( ZipCodes::iterator iter = zipCodes.begin(); iter != zipCodes.end(); ++iter ) { zipTree.add( *iter ); } end = clock(); std::cout << "Time to build tree: " << (end - start) << " ticks" << std::endl; CHECK_EQUAL( zipCodes.size(), zipTree.size() ); Range2D range( 41.34, -87.89, 42.1, -87.55 ); timeRetrieval( range, zipCodes, zipTree ); range.resize( 0.5 ); timeRetrieval( range, zipCodes, zipTree ); }

Add a few other ranges, including looking for just one specific point, using a range with whose lower and upper bounds are equal. Collect the timings on all of these. Run 1000 times to get measurable results.

**Submit to Code Critic:** your `zipcode.h`
and `zipcode.cpp` files, and any *new* tests you wrote.

When all of the above has been completed, and approved as necessary via the Code Critic, check over your code and send it in, along with a text or Word file reporting on the times from the last task.

Make sure your program runs standalone, with no user input of any kind.
That includes *not* stopping at the end with a "hit any key to continue"
message. If I can't run your program automatically from a script, it will
be returned.

Your code should compile without errors or warnings.

Be sure your test code file has an opening header comment that identifies the authorship of the code, as follows:

// EECS311 PA3 solution: Quad trees // Date: November 12, 2009 // Authors: T.H.E.Hacker // // Except for code provided by the instructor or textbook, this code was written // solely by the authors listed above, and not copied from another student // or external source.

Run make handin to generate a Zip archive of all your source code. By design, the Magic Makefile does not include the word list file. Do not include the word list file.

**Email** your analysis document and the Zip archive produced by
the Magic Makefile to me. Use
the **Subject** EECS 311 PA 3 Quad Tree.