The Work done so far...

(PAGE -2-)



What the future holds... (12-May-99)

Based on the current status of the system, it is possible to make predictions about its extensibility and usefulness.
The flexibility of the gesture recogniser system is limited by the fact that it can inherently recognise only a small number of different gestures (from a given gesture set). This is a direct consequence of the feature-based approach that the system takes. It was specifically designed to be "easily" implemented. While much work had already been done on using neural networks for recognition, this approach was meant to test the viability of a simpler system that would ultimately be easy to realise and would not rely on heavy computing or memory resources. Some work-arounds for the limitations the gesture recogniser faces are listed here:

context-switching:
the number of recognisable gestures could be virtually extended within an application by varying the set of recognisable gestures at run-time. Gestures to drive a File-Manager would most likely be different from those that drive Editing-Commands in the same application. So for each context in a program a different set of gestures could be loaded (or different gesture recognisers be instantiated) to accommodate the change in context. Even though this would dramatically increase the number of recognisable gestures, each set of gestures would of course have to be evaluated and validated individually before the application could run. (This approach should be considered even for command-systems that could recognise several hundred different gestures at a time, since it decreases the chance of misinterpretation among the gestures applicable in a given context.)
Obviously the approach fails as soon as too many gestures have to be recognisable at any given time.
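
The context-switching idea can be illustrated with a minimal sketch. The class, method and gesture names are hypothetical and not taken from the actual system; the sketch only shows that one validated gesture set is kept per context and that only the set of the current context is exposed to recognition.

```python
class ContextSwitchingRecogniser:
    """Keeps one gesture set per application context (all names hypothetical)."""

    def __init__(self, gesture_sets):
        # gesture_sets: {context name: list of gesture names validated for it}
        self.gesture_sets = gesture_sets
        self.context = None

    def switch_context(self, context):
        # Only contexts with a previously validated gesture set are accepted.
        if context not in self.gesture_sets:
            raise KeyError(f"no validated gesture set for context {context!r}")
        self.context = context

    def recognisable(self):
        """Gestures that may be reported in the current context."""
        if self.context is None:
            return []
        return list(self.gesture_sets[self.context])


recogniser = ContextSwitchingRecogniser({
    "file-manager": ["open", "delete", "copy"],
    "editing": ["cut", "paste", "undo"],
})
recogniser.switch_context("editing")
```

Each context can reuse simple, well-separated gestures, because gestures from other contexts never compete with them.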

extending the feature-diversity:
Since it is the limited number of available features that dictates the number of differentiable gestures, the set of features could be extended to increase the recognition potential of the system. This is theoretically feasible, and the system has in fact been tailored to be easily extendable in this manner, as new features can be plugged in without difficulty. There are, however, certain practical aspects to consider. The current system uses recogniser 3, which is a voting system. It depends on the majority of features voting for a particular gesture in order to recognise it. The more features are introduced into the system, the more this voting system will have to be altered. As it has been shown that not all features are equally good at recognising certain gestures, it is foreseeable that the principle of a voting majority may no longer apply. This will happen if, for a certain gesture, only very few (even if extremely reliable) features are eligible to vote. The current system demands that a minimal percentage of the available features be eligible to vote, a requirement that may no longer be met once a large number of new features is introduced. This reasoning is highly theoretical and depends heavily on the distribution of eligible features among the gestures to be recognised. A counter-argument would be that a good feature is able to recognise a wide range of gestures anyway, and that only such features should be chosen.
One more problem has to be addressed in this context: the availability of features. Finding new and usable features is a somewhat creative process and thus might pose difficulty. On the other hand, literature is readily available that suggests distinctive features of curves and splines. Once those are identified, the problem reduces to implementing them within the given system and probing them with test gestures for reliability.
In conclusion, adding new features to the system will increase the potential number of recognisable gestures, but care has to be taken not to underestimate the changes in recognition behaviour this will bring about.
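
To make the voting problem more concrete, here is a minimal sketch of a majority vote with an eligibility threshold. It is not the code of recogniser 3. The data layout, the rule that a feature votes when the measured value lies within the recorded range, and the parameter `min_eligible_share` are all assumptions made for illustration.

```python
def recognise(sample, gestures, min_eligible_share=0.5):
    """Return (gesture name, share of eligible features that voted for it).

    sample:   {feature name: measured value}
    gestures: {gesture name: {feature name: (low, high, eligible)}}
    Assumed rules: a feature votes for a gesture if it is eligible and the
    measured value lies in [low, high]; a gesture takes part only if enough
    of its features are eligible; the winner needs a majority of its
    eligible features.
    """
    best_name, best_votes, best_eligible = None, 0, 0
    for name, features in gestures.items():
        eligible = [f for f, (_, _, ok) in features.items() if ok]
        if len(eligible) < min_eligible_share * len(features):
            continue  # too few eligible features: gesture cannot be voted for
        votes = sum(1 for f in eligible
                    if features[f][0] <= sample[f] <= features[f][1])
        if votes > best_votes:
            best_name, best_votes, best_eligible = name, votes, len(eligible)
    if best_name is None or 2 * best_votes <= best_eligible:
        return None, 0.0  # no gesture reached a majority
    return best_name, best_votes / best_eligible


gestures = {
    "circle": {"length": (9, 11, True), "angle": (350, 370, True),
               "speed": (0, 100, False)},
    "line": {"length": (4, 6, True), "angle": (-5, 5, True),
             "speed": (0, 100, False)},
}
result = recognise({"length": 10, "angle": 360, "speed": 50}, gestures)
```

With this layout, a gesture for which only one of three features is eligible is skipped entirely, however reliable that single feature may be. This is the situation the paragraph above warns about.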

All in all it is questionable whether the presented gesture recognition system has a future as such. The rapidly falling prices of computer hardware make system resources so readily available that more complex and computationally expensive, and thus more reliable and diverse, gesture recognition systems become affordable. What the system can be seen as, though, is an implementation of a feature-based decision-making system, which could in itself be of use in numerous other applications where the decisions to be taken depend on a variety of factors and the answer-space is more complex than just 'yes' and 'no'. One of the major strengths of the system is that it not only gives results in the form of recognised gestures, but also feedback on the recognition process (i.e. the quality and reliability of the given result). This information can in turn be used as input to further decision-making steps.



What the GestureRecognizer can do for you
[and what it can't] (12-May-99)

The gesture recognition system can, with a good deal of accuracy, recognise a default set of gestures. This was the main feature of the old system. It was further extended to be able to learn new gestures or 'train' existing ones (the formal difference being that new gestures have to be invented, while training existing ones merely requires repeated iterations of a given set). This capability is not unlimited, though. Given the feature-based structure of the system, not all sets of gestures are usable for the recognition process. Several aspects have to be taken into consideration:

With this in mind it is obvious that the system cannot guarantee a given set of gestures to work satisfactorily. What it can do is evaluate the gestures automatically. The way it does this is as follows:

A set of gestures is recorded and saved. This means that for each performance of a gesture the feature-values returned by the system are stored in a file. One file is recorded for each gesture. These files are then evaluated. A statistics package works on the data, eliminating outliers and calculating the minimum, maximum, average and normalised standard deviation for each feature. These are the values that the gesture recogniser system will use to recognise the gestures. Already at this stage, the evaluation program gives feedback about the quality of the gesture, based on the value of the normalised standard deviation. The present system is tailored to judge any feature returning a norm. std. dev. of 10% or more as unsuitable for recognising this gesture reliably. Recogniser 3 makes use of this information by not allowing a feature to vote for a gesture if it is considered to be a poor recogniser of this gesture.
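
The per-feature evaluation can be sketched as follows. This is an illustration, not the actual statistics package. It assumes that the normalised standard deviation is the standard deviation divided by the absolute mean, and that outliers are values further than two standard deviations from the mean. Neither detail is specified above, and the function and parameter names are hypothetical.

```python
from statistics import mean, pstdev


def feature_stats(values, outlier_sigmas=2.0, max_norm_std=0.10):
    """Summarise the recorded values of one feature for one gesture."""
    m, s = mean(values), pstdev(values)
    # Assumed outlier rule: drop values more than `outlier_sigmas` std. dev.
    # away from the mean of the raw data.
    kept = [v for v in values if s == 0 or abs(v - m) <= outlier_sigmas * s]
    m, s = mean(kept), pstdev(kept)
    # Assumed normalisation: std. dev. relative to the absolute mean.
    norm_std = s / abs(m) if m != 0 else float("inf")
    return {
        "min": min(kept),
        "max": max(kept),
        "mean": m,
        "norm_std": norm_std,
        # 10% or more marks the feature as unsuitable for this gesture.
        "eligible": norm_std < max_norm_std,
    }


stats = feature_stats([10.0, 10.2, 9.8, 10.1, 9.9, 30.0])
```

In this example the value 30.0 is discarded as an outlier, and the remaining values vary by well under 10% of their mean, so the feature stays eligible to vote.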
The next step is to assemble the whole set of gestures in a directory and link the files with a gesture.dat file, which lists the number of gestures along with the names of the gestures in the set. Then the set as a whole is validated, which means cross-checked for consistency. If the real-line spacing between any two gestures for a given feature is too small, then the feature is disallowed for both of them (in practice this means that their norm. std. dev. is raised beyond the 10% barrier), because it is assumed that the feature will not be able to reliably distinguish the two gestures. After this has been done, the validator reports how many features are suitable for recognising the gestures and thus allows an estimate of how well a given set of gestures will work. This point has to be stressed.

The gesture recogniser system will not work satisfactorily with just any set of gestures, but only with those that it itself approves of.
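
The cross-check of a whole gesture set could look roughly like the sketch below. It assumes that "real-line spacing" means the gap between the [min, max] ranges of a feature for two gestures, and it uses an `eligible` flag instead of raising the norm. std. dev. The threshold `min_gap` and all names are hypothetical.

```python
from itertools import combinations


def validate(gesture_set, min_gap=0.0):
    """Disallow features that cannot separate two gestures.

    gesture_set: {gesture: {feature: {"min": .., "max": .., "eligible": ..}}}
    Returns the number of features still eligible for each gesture.
    """
    for a, b in combinations(gesture_set, 2):
        for f in gesture_set[a].keys() & gesture_set[b].keys():
            sa, sb = gesture_set[a][f], gesture_set[b][f]
            # Gap between the two ranges; zero or negative if they overlap.
            gap = max(sa["min"], sb["min"]) - min(sa["max"], sb["max"])
            if gap <= min_gap:
                sa["eligible"] = sb["eligible"] = False
    return {g: sum(1 for s in feats.values() if s["eligible"])
            for g, feats in gesture_set.items()}


gesture_set = {
    "circle": {"length": {"min": 9, "max": 11, "eligible": True},
               "angle": {"min": 350, "max": 370, "eligible": True}},
    "line": {"length": {"min": 10, "max": 12, "eligible": True},
             "angle": {"min": -5, "max": 5, "eligible": True}},
}
report = validate(gesture_set)
```

Here the length ranges of the two gestures overlap, so length is disallowed for both, while angle separates them clearly and remains usable.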



Selection of vertices in two dimensional space (27-May-99)

The need to come up with a method to select vertices in two dimensional space made me invent an algorithm to dissect a closed curve of arbitrary form into convex subcurves. The problem of determining whether a point is inside or outside a convex curve is relatively easy. Nonetheless the solution still seemed awkward, and so after some thinking I came up with another method (actually it is so easy that I'm sure it has been around for a long time. Log (27.05.199): Re-invented the wheel again ;-) ). While still thinking about the first solution I noticed that a ray starting at a point inside the curve always crosses the boundary an odd number of times, while a ray starting outside always crosses it an even number of times (if at all). While in my first attempt I saw this only as a side-effect, my second solution is solely based on this property...

Figure 1

As can be seen in figure 1, the number of times that a ray shot off from a vertex in an arbitrary direction (for simplicity let's take the positive vertical direction, or "upwards") crosses the curve gives a direct indication of whether the vertex is inside the curve or not (this even works with self-intersecting curves, as can easily be shown). In the figure, example a shows no intersection and vertex a is therefore outside the curve, example b has one (odd) intersection and is inside, example c has two (even) and is outside (and so on).

The solution to the selection problem is simple. Assuming that a curve is made up of a number of line-segments with their respective starting- and ending-points (the ending-point of one being the starting-point of another), we simply traverse the line-segments and count the number of segments that would be intersected by an imaginary ray emitted from the vertex under consideration. Since this test has to be performed for all vertices to be examined, optimisations are welcome, if not necessary. To find out whether the imaginary ray intersects a line-segment or not, we look at figure 2. The segment of concern is black, while the adjoining ones are light gray. Several areas of interest can be identified. Should the vertex be contained within the areas 'A' or 'C', an intersection with the segment is impossible because the vertex's x-coordinate is either too big or too small (remembering that we chose the arbitrary ray direction to be upwards). The same holds for vertices in area 'D': rays emitted upwards from here are not able to intersect the segment lying below. The opposite, but equally definite, case holds in area 'B'. All upward rays emitted here necessarily intersect the segment (vertical lines are a special case here and are dealt with later). The area left in the middle needs further calculation, as example a does intersect the segment, while example b does not. Here a simple calculation is required to determine the position of the vertex relative to the segment. The ray originating from vertex (x,y) meets the segment ((x1,y1) -> (x2,y2)) at the height y1+(x-x1)*(y2-y1)/(x2-x1). If y is smaller than or equal to this value, then an intersection occurs, otherwise not.

Figure 2

A special case arises when the segment is parallel to the imaginary ray. Theoretically the ray and the segment would have infinitely many intersections on a real number scale. Since we only look at the end-points of segments here, this is where great care has to be taken. When looking at a line-segment, only one of its end-points may be included. Should both be included, the crossing would be counted twice for vertices lying directly under one of the end-points (thus changing the inside/outside polarity). This specification deals not only with vertices situated underneath end-points, but subsequently also with vertical segments (mentioned earlier). Vertical segments are defined as having end-points with the same x-value. A segment is not considered if the x-value of the vertex under examination is greater than that of one end-point and not smaller than that of the other; in effect, the end-point with the larger x-value is excluded. For a vertical segment this leaves no admissible x-value, which means that vertical segments are never considered. This does not constitute a problem, because in a closed curve all vertical segments must be joined by non-vertical segments. So the whole of the vertical segment is accounted for by the end-points of the segments connecting to the vertical one.
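
The test described above can be written down as a short sketch. The function name and the representation of the curve as a list of points are assumptions. The area tests, the intersection formula and the exclusion of one end-point follow the description, with the end-point of larger x-value being the excluded one.

```python
def is_inside(x, y, curve):
    """Count crossings of an upward ray from (x, y) with a closed curve.

    curve: list of (x, y) points; the last point is joined to the first.
    """
    crossings = 0
    n = len(curve)
    for i in range(n):
        x1, y1 = curve[i]
        x2, y2 = curve[(i + 1) % n]
        if x1 > x2:  # order the end-points by x-value
            x1, y1, x2, y2 = x2, y2, x1, y1
        # Areas 'A' and 'C': x is outside the segment's x-range. The range
        # is half-open, so shared end-points are counted once and vertical
        # segments (x1 == x2) are never considered.
        if not (x1 <= x < x2):
            continue
        if y > max(y1, y2):  # area 'D': the segment lies below the vertex
            continue
        if y <= min(y1, y2):  # area 'B': the ray must hit the segment
            crossings += 1
            continue
        # Middle area: compare y with the height of the segment at x.
        y_hit = y1 + (x - x1) * (y2 - y1) / (x2 - x1)
        if y <= y_hit:
            crossings += 1
    return crossings % 2 == 1


# A U-shaped (non-convex) curve with a notch between x = 2 and x = 4.
u_shape = [(0, 0), (6, 0), (6, 4), (4, 4), (4, 2), (2, 2), (2, 4), (0, 4)]
inside_arm = is_inside(1, 3, u_shape)
inside_notch = is_inside(3, 3, u_shape)
```

The first point lies in the left arm of the U and is reported as inside; the second lies in the notch and is reported as outside. The cheap area tests come first, so the division is only carried out for vertices in the middle area.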



 

Selection of vertices in three dimensional space (27-July-99)

Seeing as the selection of vertices in 2D space was successfully implemented, one of the next challenges was to implement the 3D equivalent.


Figure 1

In practice this meant projecting points from three dimensional space onto a plane and then performing the two-dimensional selection test described earlier. To understand how this is done, we look at figure 1, which illustrates the general selection process. In the application I created, a drawing pointer shows the position where the selectionspline will be drawn. This works like a three dimensional pen, which is activated by any button on the new improved Stick II. The pointer itself is controlled with a Polhemus tracker.
The viewer is assumed to be positioned somewhere in space with an arbitrary orientation. A spline plane is defined normal to the viewvector of the viewer (the direction he/she is looking at) and at a distance away from the viewer such that the vertices comprising the selectionspline are as close to it as possible. In this case this is achieved by restricting the plane to contain the "centre of mass" of the spline (i.e. the average of all vertices of the spline).


Figure 2
Vertices of objects to be selected can be anywhere in space. The main problem here is to map the object- and spline-vertices in a consistent manner and such that they end up on a plane in a distribution that resembles what the viewer saw when he/she selected the vertices.
Several transformations have to be performed in order to implement this mapping. This can be seen in figures 2 through 5.
Figure 2 shows the above set-up again in a top-down view (looking down the y-axis of a right-handed co-ordinate system).
The first transformation necessary is a translation of the viewpoint onto the origin of the co-ordinate system. The resulting picture can be seen in Figure 3.
What is not obvious from this diagram is that in the general case the viewvector will not be aligned with the negative Z-axis (this is defined as the starting position for transformations in CoRgi and OpenGL – the viewer is initially positioned somewhere on the positive z-axis looking down the negative z-axis).

Figure 3

Figure 4
This means that the next step in our transformations has to be that of rotating the world in such a manner that the viewvector becomes aligned with the negative z-axis. Making use of the above-mentioned convention simplifies matters here, since the direction that the viewer looks in is readily available and its inverse is the rotation we need to apply to arrive at Figure 4.

From here we need to translate objects and spline up the positive Z-axis so that the spline-plane coincides with the XY-plane. The amount of translation is the distance from the viewer to the spline-plane. In turn this distance can be calculated by finding the dot-product between the normalised viewvector and a vector from the viewer to a point in the plane (for convenience we take again the "centre of mass" found earlier as the point in the plane).
After this translation we end up with the diagram depicted in Figure 5, which, as mentioned earlier, is the starting-position of standard packages.

For this reason the perspective projection that needs to be performed for mapping vertices onto the XY-plane (in the general case this would be the viewplane) is well understood and documented. If the eye is at position (0,0,c) [where c is the distance of the viewer to the splineplane, as found earlier] looking down the negative Z-axis and an arbitrary vertex is at position (x,y,z), then by similar triangles the mapped vertex can be found at:

(x*c/(c-z), y*c/(c-z), 0)

These new co-ordinates can be used to do the 2D selection as described previously.
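
The whole chain of transformations can be sketched in a few lines. This is an illustration under stated assumptions rather than the CoRgi implementation. The rotation is built from the viewvector and a hypothetical `up_hint` vector (a look-at style construction) instead of inverting the viewer's orientation. The projection uses the similar-triangles mapping x*c/(c-z), y*c/(c-z) for an eye at (0,0,c). All function names are made up for this example.

```python
import math


def sub(a, b):
    return tuple(p - q for p, q in zip(a, b))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalise(a):
    length = math.sqrt(dot(a, a))
    return tuple(p / length for p in a)


def make_projector(eye, view, up_hint, spline):
    """Return a function mapping 3D points onto the spline plane in 2D."""
    view = normalise(view)
    right = normalise(cross(view, up_hint))  # viewer's x-axis
    up = cross(right, view)                  # viewer's y-axis
    # "Centre of mass" of the spline: the average of its vertices.
    centre = tuple(sum(coords) / len(spline) for coords in zip(*spline))
    # Distance from viewer to spline plane: projection onto the viewvector.
    c = dot(view, sub(centre, eye))

    def project(p):
        q = sub(p, eye)                      # viewpoint moved to the origin
        # Rotation: the viewvector becomes the negative z-axis.
        x, y, z = dot(q, right), dot(q, up), -dot(q, view)
        z += c                               # spline plane moved onto XY
        if c - z <= 0:
            return None                      # at or behind the viewer
        scale = c / (c - z)                  # perspective from (0, 0, c)
        return (x * scale, y * scale)

    return project


square = [(-1, -1, 0), (1, -1, 0), (1, 1, 0), (-1, 1, 0)]
project = make_projector((0, 0, 5), (0, 0, -1), (0, 1, 0), square)
near = project((1, 1, 0))
far = project((2, 2, -5))
```

Both points lie on the same line of sight, so they map to the same 2D position. The projected object vertices and the projected spline vertices can then be fed to the two-dimensional selection test.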


Figure 5
