Last updated: July 3, 2009
Testing and test-driven development form a large and important area of software development that is usually overlooked in computer science courses. In addition to these notes, you should also read:
The typical approach to testing code is to write it, compile it, run it, and see if it seems to work. This is barely better than not testing at all. Doing serious testing this way is tedious and way too dependent on the patience, energy, skills, and fortitude of the programmer. Even worse, it depends on the coding being done on schedule -- and that never happens. As a result, there's just no time or incentive to do serious testing.
Many companies hire people to do nothing but run tests. These are the Quality Assurance (QA) teams. It's a job often given to interns and first-year co-op students. Scripts are written telling the QA team what to do and what to look for. This approach makes sense for things like checking the look and feel of an interface, but it's still inconsistent, expensive, and prone to error.
There's a better way, one that turns the "code, then test" process on its head.
Test-Driven Development (TDD) means that you write the tests before you write the code to pass the tests. By reversing the traditional order of code then test, you write code that is:
For testing code thoroughly, quickly and consistently, the best approach is unit testing. A unit test is code that tests other code. Instead of running your program yourself, you run the unit test code. The test code will report any problems.
To make it easy to write and run unit tests, programmers have developed libraries for testing. While there have been many such libraries for decades, the JUnit library for Java has become the model for unit testing packages in a number of different languages.
In this course, we'll use UnitTest++, one of several C++ unit testing packages inspired by JUnit. For a list of many other options for C and C++ unit testing, see here.
The set of tests used to test an application is commonly called a test suite. Any one test is usually neither good nor bad. What's important is the entire suite of tests.
What makes a test suite good? Two things: coverage and clarity. Many books and websites talk about test coverage. Clarity is less discussed.
In general, every test should have a reason for being. Writing tests for 20 randomly picked pieces of data is not a good way to go. Such tests are redundant, inefficient, and incomplete.
Test coverage is a common metric for a test suite. There are formal definitions of different kinds of test coverage, but we'll use it here informally to mean the thoroughness of the test suite. Does the suite test for all possible things that might go wrong? You want a test suite that has good coverage and few if any redundant tests.
For example, the rules for leap years are a little more complex than just "divisible by 4." Years divisible by 100 are not leap years, but years divisible by 400 are. A good test suite would check at least one year for each of the following cases:
- not divisible by 4
- divisible by 4 but not 100
- divisible by 100 but not 400
- divisible by 400
Another important thing to write tests for is boundary cases. A boundary case is a situation at or very near the edge of some change in program behavior. "Off-by-one" errors are very common in programming.
For example, a common task is to convert numeric scores to letter grades, with A for 90 to 100, B for 80 to 89, and so on. Tests should be written for
- scores right on the boundaries, i.e., 100, 90, 89, 80, ...
- scores next to the boundaries, i.e., 88, 91, ...
- special cases, like 0
- scores past boundaries, like 101
Unit tests are code. All code should be clear about what it is doing. It should be easy for a programmer reading your test suite to know what things it checks for, and easy to know where to add new tests.
The best way to communicate what's being tested is through well-named, highly focused test functions. You don't want one big test function with line after line of assertions. Each test function should test for a particular situation that the application you're building needs to handle. The name of the function should communicate what situation it checks.
Some situations may affect several different aspects of an application. For example, when testing for an empty grade book, you'd probably want to separate out things like
As an example of what experienced test-driven developers do, here are the function names one programmer used for an employee application using PyUnit for Python. Note how even without reading the code, you get a good idea of what's being checked for, and how focused each test is:
- test_accurately_report_hours_worked
- test_reports_active_status_once_hired
- test_reports_inactive_status_when_terminated
- test_refuses_invalid_phone_numbers
- test_allows_nonwestern_names
- test_accepts_long_names
- test_accepts_unicode_name
- test_accepts_unicode_in_addresses
- test_should_report_subordinate_employees
- test_should_report_supervisors_correctly
- test_accepts_valid_birthdate
- test_refuses_invalid_date_as_birthdate
- test_refuses_future_birthdate
- test_reports_years_of_service
- test_rounds_years_of_service_downward
- test_rounds_months_of_service_downwards
- test_reports_vesting_based_on_months_of_service
- test_reports_role
- test_access_to_PII_requires_admin_access
- test_allows_variations_on_email_addresses
- test_allows_mutiple_phone_numbers
Don't create tests like testConstructor or testNegatives. These aren't very helpful when a maintainer is trying to see what's been tested for so far.
Instead, create tests to reflect various "business-logic" rules, such as testInitialBalanceIsZero and testNegativeLengthThrows. Breaking down tests like these keeps them small, and leads to names that help a maintainer see very quickly what's been tested for.
It's better to have many separate, well-named tests than a few very big ones. This will happen automatically if each test is focused on one simple rule.
One danger with big tests is that side effects start accumulating in class instances. You may have an assertion that works when it follows several other operations, but would fail if called on a fresh instance.
Another problem with big tests is that many unit testing frameworks stop executing a test as soon as an assertion in it fails. With one big test, you don't get to see what else might have worked or failed. With many small tests, you get more data about what's working and what's not in a single test run.
A good rule of thumb is: if you have to make a new class instance, you probably want a new test function.
This makes it easy to see what's being tested. The worst -- but distressingly common -- form of test code is this:
    Account a1;
    Account a2( 1000 );
    Account a3;
    a3.setBalance( 5000 );
    CHECK_EQUAL( 0, a1.getBalance() );
    CHECK_EQUAL( 1000, a2.getBalance() );
    CHECK_EQUAL( 5000, a3.getBalance() );
With this code, you can't tell at a glance why the numbers expected are the right ones.
A better way to write the above would be
    CHECK_EQUAL( 0, Account().getBalance() );
    CHECK_EQUAL( 1000, Account( 1000 ).getBalance() );
    Account acct;
    acct.setBalance( 5000 );
    CHECK_EQUAL( 5000, acct.getBalance() );
If something is not specified, it shouldn't be tested for. For example, if you're implementing a square root function, and the result of square root for negative numbers is undefined, but your code happens to return -1 in such cases, you should not put in a test for -1. That locks in undefined behavior, restricting later implementations.
Tests should make sure that specifications are met, and nothing more.
Students often write test functions with values that are completely unlike what the function might actually get, e.g., 1, 2, and 3 for grades or bank balances, or "dummy" for the name of a course.
Testing with unrealistic values has two problems. First, it confuses maintainers who have to wonder if the variables are holding something different than what they claim to hold. Second, your tests may be missing problems that only arise with the kinds of values that normally occur, e.g., course names like "EECS 211: Introduction to C++".
A fairly common question in unit testing is "How do I unit test private functions?" A fairly standard response is "Don't do that." Opinions differ on this, but I think there are some good reasons why not to do it.
The first argument against unit testing private functions is that private functions are implementation decisions. As such, they should not be locked down. They should be open to change. Unit testing supports change. If your code has hundreds or thousands of tests, that helps ensure that a change you make doesn't break anything.
Unit testing private functions makes them harder to change. You not only have to change the code but most likely you have to change the tests. For example, a private function to parse arithmetic expressions might have used characters to represent operators. Then later you decide it's better to use enumerated constants. Bam! Tests break, even though the public functions still pass all their tests.
The second argument is that unit testing private functions puts the focus on the wrong place. The question is not "does this private function do the right thing?" The real question is how it affects the public API. For example, suppose we had a rational number class. A common internal function for such a package is a greatest common divisor (GCD) function, to help reduce fractions to simplest terms. Mathematically, the GCD should always be positive, even if the arguments are negative. Suppose our GCD function gets that wrong. Does it matter? Only if the tests for the public rational number functions are wrong. Otherwise, it doesn't and shouldn't matter. We shouldn't be spending time to make the GCD function pass those tests.
The third argument is that unit testing private functions can, in many languages, force you to make functions public (or protected) so that they're available for testing. That's clearly bad. Depending on the language being used, there may be alternatives for this, but most require some extra effort or change in normal programming practice. This is not a zero-cost option, and can easily make maintenance more problematic.
Comments? Let me know!