The Abstract Datatype "Sequence"


A study by Bruno Haible

University of Karlsruhe, 1988
(Translated to English, 2004)

The goal of this study is to check whether the concept of abstract datatypes works, by looking at the example of "sequence".

In many domains of practical programming, elements of the same kind have to be manipulated the same way, simultaneously or sequentially. If the amount of available memory allows it, the clearest way to achieve this is to combine the elements to a sequence; the sequence is then manipulated in a single step. Often one is then burdened with routine programming tasks, such as iterating through a sequence, reversing a sequence, concatenating several sequence, sorting a sequence. Once these routines are written, a change of the data structure - e.g. from a list to an array, for the purpose of saving memory, or to a binary tree, for faster lookup and insert operations - is very difficult.

A possible way out is a library of standard functions, with a separate implementation for each data type (array, string, list, binary tree, ...) and each standard function. An orthogonal design and careful testing of this library are necessary, before the user can really work with it.

We have chosen a different way: Sequences are seen as an abstract data type, with a set of operations that can be applied to every kind of sequence. The user of this library can also declare his own data structures as sequences, and instantly all the operations are at his disposal.

We now report about the implementation of this abstract data type. The central ideas were:
Since the implementation language should work efficiently with both lists and arrays and should allow iteration macros that are independent of the data structure, Common Lisp was chosen for this task.

The set of implemented functions contains all operations from [Guy L. Steele: Common Lisp, The Language, Chapter 14]:
For defining new types of sequences, the user can use the following macro:

(define-sequence name
    [:init init-fun]
    :upd update-fun
    :endtest endtest-fun
    [:fe-init from-end-init-fun]
    :fe-upd from-end-update-fun
    :fe-endtest from-end-endtest-fun
    :access access-fun
    :access-set access-set-fun
    [:copy copy-fun]
    [:length length-fun]
    :make make-fun
    [:elt elt-fun]
    [:set-elt set-elt-fun]
    [:init-start init-start-fun]
    [:fe-init-end from-end-init-end-fun])

(Here init-fun and init-start-fun are optional, but at least one of them must be specified. Also, at lease one of from-end-init-fun and from-end-init-end-fun must be specified.)
The sequence type name should be a defstruct type defined by the user which shall be interpreted as sequence. The further arguments must be Lisp functions. During their execution, they can access the sequence through the global variable SEQ. In detail they have the following meaning:
It turns out that these 15 basic operations on sequences are sufficient to execute all of the complex sequence operations (subseq, sort, etc.) efficiently.

This way, the following have been defined as sequences so far:
This list of types can be extended arbitrarily.

Finally we list the advantages and drawbacks of this approach:

Conclusion

Viewing a sequence as abstract data type is not only a theoretical consideration, it is also practically valuable. This view represents a good compromise between efficiency on one side and power, flexibility, orthogonality on the other side.