Using Neural Nets to Recognize Handwriting
Jeff Hentschel

Northwestern University
EECS 395: Machine Learning
Bryan Pardo

Justin Li
Abstract

Converting printed text into a digital format can have many benefits. This paper will discuss the benefits and shortcomings of using neural nets to classify individual letters stored as images.


Inspiration

People are constantly writing things down. These notes range from meeting minutes, short reminders, and to-do lists to long-winded letters to friends. Yet in this digital age, some may wonder why people still use paper. The answer is simple: an effective substitute for paper has not been invented. Paper is still the champion when it comes to efficiency, affordability, usability, and mobility. However, having a digital copy of these notes also presents many benefits. Most notable is the ability to search and organize the notes in myriad ways instantaneously. Being able to search a huge database of written notes saves time and money, two things that are important in the business world. People need a way to convert handwritten text to digital text, which can then be searched and organized however the user wants.


Solution: Neural Nets

To solve this problem, we created a system that can recognize handwritten letters. Our system uses neural nets to classify a given image as 1 of 52 distinct characters (the upper- and lowercase alphabet). A neural network consists of interconnected nodes that work together to produce an output. Neural nets are well suited to handwriting recognition because they tolerate variation in the data; images of the same character may not look exactly alike. This imprecise nature of the data makes neural nets a natural choice. The input to our neural net consisted of 2600 black-and-white GIF images, each of a single letter. The letters were separated before being fed into the algorithm so we could focus on the efficiency of the recognition algorithm rather than on the separation of letters.
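To make the architecture concrete, here is a minimal sketch of the kind of network described above: a single hidden layer mapping a flattened 100x70 image to 52 class scores. The hidden-layer width, random weights, and activation function are illustrative assumptions, not the values used in the actual project.

```python
import numpy as np

rng = np.random.default_rng(0)

n_input = 100 * 70   # flattened 100x70 black-and-white image
n_hidden = 64        # assumed hidden-layer size (illustrative)
n_classes = 52       # upper- and lowercase alphabet

# Randomly initialized weight matrices (a trained network would learn these).
W1 = rng.normal(0, 0.01, (n_hidden, n_input))
W2 = rng.normal(0, 0.01, (n_classes, n_hidden))

def classify(image):
    """Forward pass: 100x70 array of 0/1 pixels -> predicted class index."""
    x = image.reshape(-1)            # flatten pixels into one input vector
    h = np.tanh(W1 @ x)              # hidden-layer activations
    scores = W2 @ h                  # one score per character class
    return int(np.argmax(scores))    # the top response counts as the answer

prediction = classify(rng.integers(0, 2, (100, 70)))
```

With untrained weights this network, like the one in our results, performs at chance; training adjusts W1 and W2 so that the correct class tends to receive the highest score.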


Training and Testing: N-Fold Cross-validation

Our data set consists of 50 complete sets (52 characters each) obtained from a variety of people aged 18-25. The letters were scanned in and converted to 100x70 pixel black-and-white GIF images. To evaluate our neural net, we used N-fold cross-validation, which splits the data set into N sections: training is done on N-1 sections, with the remaining section used as the testing and validation sets, and the sections are then rotated so each serves as the held-out section once. With 2600 images to use, we trained on 70%, tested on 15%, and used the remaining 15% for validation. The network was trained to minimize the mean squared error on the training set. The learner is said to get the correct answer only if its top response is the correct letter.
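The 70/15/15 split described above can be sketched as follows, assuming the 2600 images are indexed 0-2599 and shuffled before splitting (the shuffling and index scheme are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_total = 2600
indices = rng.permutation(n_total)   # shuffle image indices before splitting

n_train = n_total * 70 // 100        # 1820 training images (70%)
n_test = n_total * 15 // 100         # 390 testing images (15%)

train_idx = indices[:n_train]
test_idx = indices[n_train:n_train + n_test]
valid_idx = indices[n_train + n_test:]   # remaining 390 images (15%) for validation
```

In full N-fold cross-validation, this split would be repeated with the held-out section rotated, so every image is used for testing exactly once.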

If the learner was successful, it should be able to classify letters at a rate better than chance, which is 1/52 (one letter out of the combined upper- and lowercase alphabet).


Results

The results were disappointing: the network did no better than chance. After training the network for 1000 epochs using the scaled conjugate gradient method, it correctly identified only 9 out of 260 letters. This is roughly 1/52, the same as chance performance. We looked at the weights from the input layer to the hidden layer and obtained this image:

This image shows how much importance the network placed on each pixel. Darker pixels were mostly discarded, while lighter pixels contributed (positively or negatively) to a hidden node. The network learned that the more important pixels lie near the center of the image, as indicated by the relative lightness there. However, this distinction between important and unimportant pixels was not enough for the network to differentiate between letters.
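A visualization like the one above can be produced by reshaping each hidden node's input weights back onto the 100x70 pixel grid and mapping weight magnitudes to brightness. The weight matrix below is randomly generated as a stand-in; its shape (hidden nodes by 7000 input pixels) is an assumption matching the sketch earlier in this write-up.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(0, 1, (64, 100 * 70))   # stand-in input-to-hidden weights

# Take the absolute weight per pixel for one hidden node and reshape it
# onto the image grid, then scale to [0, 255] so that light pixels mark
# large (positive or negative) contributions and dark pixels mark near-zero ones.
mag = np.abs(W1[0]).reshape(100, 70)
img = (255 * mag / mag.max()).astype(np.uint8)
```

Averaging the magnitudes across all hidden nodes instead of using a single node would produce a summary picture of which pixels the network relies on overall.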


Final Paper

The final paper is available for download as a Word document.

© 2007 Jeff Hentschel and Justin Li
Please also visit jeffaudio