The Abstract Datatype "Sequence"
A study by Bruno Haible
University of Karlsruhe, 1988
(Translated to English, 2004)
The goal of this study is to check whether the concept of abstract
datatypes works, by looking at the example of "sequence".
In many domains of practical programming, elements of the same kind
have to be manipulated the same way, simultaneously or sequentially. If
the amount of available memory allows it, the clearest way to achieve
this is to combine the elements to a sequence; the sequence is then
manipulated in a single step. Often one is then burdened with routine
programming tasks, such as iterating through a sequence, reversing a
sequence, concatenating several sequence, sorting a sequence. Once
these routines are written, a change of the data structure - e.g. from
a list to an array, for the purpose of saving memory, or to a binary
tree, for faster lookup and insert operations - is very difficult.
A possible way out is a library of standard functions, with a separate
implementation for each data type (array, string, list, binary tree,
...) and each standard function. An orthogonal design and careful
testing of this library are necessary, before the user can really work
with it.
We have chosen a different way: Sequences are seen as an abstract data
type, with a set of operations that can be applied to every kind of
sequence. The user of this library can also declare his own data
structures as sequences, and instantly all the operations are at his
disposal.
We now report about the implementation of this abstract data type. The
central ideas were:
- Polymorphic functions which can be called for each kind of
sequence with the same syntax and semantic,
- User-definable sequences,
- No loss of efficiency worth mentioning, i.e. every operation runs
with optimal speed - up to a constant factor - on every kind of
sequence.
Since the implementation language should work efficiently with both
lists and arrays and should allow iteration macros that are independent
of the data structure, Common Lisp was chosen for this task.
The set of implemented functions contains all operations from [Guy L.
Steele: Common Lisp, The Language, Chapter 14]:
(type-of sequence)
returns the type
of sequence: a user-defined
symbol or LIST
, VECTOR
, STRING
, ...
(elt sequence index)
returns the element
number index in sequence.
(subseq sequence start
[end
])
returns a copy of the
sequence part from the element number start
(inclusive) to the element number end
(exclusive).
(copy-seq sequence)
copies sequence.
(length sequence)
returns the number
of elements in sequence.
(reverse sequence)
reverses the order
of sequence, nondestructively.
(nreverse sequence)
reverses the order
of sequence, by swapping the
first and the last element, the second and the second-to-last element,
etc.
(make-sequence size
[:initial-element init
]
[:update fun
])
creates a sequence of
specified type and length, whereby the first element and the rule for
creating the series of elements can be specified.
(concatenate result-type
{sequence
}*)
concatenate several sequences
(of potentially different types), forming a sequence of specified type.
(map nil function
{sequence
}+)
applies a given function to
elements of the given sequences (consecutively, each time to one
element of each sequence).
(do-sequence (var sequence
[result
])
{declaration
}*
{tag
|statement
}*)
iterates over sequence from left to right,
whereby each time an element of sequence
is assigned to the variable var
and then the remaining statements
are executed. The loop can be terminated through return. Upon normal
loop termination, result is
the result.
(map result-type function
{sequence
}+)
likewise, and constructs a
new sequence of specified type from the results of the function calls.
(some predicate
{sequence
}+)
determines whether the
predicate, applied element-wise to the specified sequences, is
fulfilled at least once.
(every predicate
{sequence
}+)
determines whether the
predicate, applied
element-wise to the specified sequences, is fulfilled each time.
(notany predicate
{sequence
}+)
determines whether the
predicate, applied
element-wise to the specified sequences, is never fulfilled.
(notevery predicate
{sequence
}+)
determines whether the
predicate, applied
element-wise to the specified sequences, is not fulfilled each time.
(reduce function sequence
[:start start-index
]
[:end end-index
]
[:from-end from-end-flag
]
[:initial-value init
])
combines all elements in sequence (more precisely: in the
specified index range of sequence)
through the two-ary function function
from left to right (or from right to left, if specified), whereby init, if specified, is the start
value.
(fill sequence item
[:start start-index
]
[:end end-index
])
fills the specified index
range of sequence with the
element item.
(replace sequence1 sequence2
[:start1 start-index-1
]
[:end1 end-index
-1]
[:start2 start-index-2
]
[:end2 end-index-2
])
replaces a index range of sequence1 with the contents of an
index range of sequence2.
(remove item sequence
[:test testfun
]
[:test-not testfun
]
[:from-end from-end-flag
]
[:start start-index
]
[:end end-index
]
[:count max-count
]
[:key keyfun
])
,
(remove-if test sequence
[:from-end from-end-flag
]
[:start start-index
]
[:end end-index
]
[:count max-count
]
[:key keyfun
])
,
(remove-if-not test sequence
[:from-end from-end-flag
]
[:start start-index
]
[:end end-index
]
[:count max-count
]
[:key keyfun
])
removes (nondestructively) all elements of sequence that satisfy a certain
test. More precisely, from the given index range of sequence those elements are
removed, whose test key (specified through keyfun) satisfy a given test: For remove
, if it is equal to item (in the sense of the specified
testfun); for remove-if
, if test is fulfilled; for remove-if-not
, if test is not fulfilled. If max-count is specified, only the
first max-count elements that
satisfy the test (from the left, or from the right if from-end-flag
specifies so) are removed.
(delete item sequence
[:test testfun
]
[:test-not testfun
]
[:from-end from-end-flag
]
[:start start-index
]
[:end end-index
]
[:count max-count
]
[:key keyfun
])
,
(delete-if test sequence
[:from-end from-end-flag
]
[:start start-index
]
[:end end-index
]
[:count max-count
]
[:key keyfun
])
,
(delete-if-not test sequence
[:from-end from-end-flag
]
[:start start-index
]
[:end end-index
]
[:count max-count
]
[:key keyfun
])
like remove
, remove-if
, remove-if-not
, respectively;
however the operation on sequence
is destructive.
(remove-duplicates sequence
[:test testfun
]
[:test-not testfun
]
[:from-end from-end-flag
]
[:start start-index
]
[:end end-index
]
[:key keyfun
])
removes all duplicate
occurrences (equality of the test keys in the sense of testfun) from the specified index
range of sequence. This is
done by removing all elements in the index range which are equal to
another element to the right of it (resp. to the left of it, if from-end-flag). Nondestructive.
(delete-duplicates sequence
[:test testfun
]
[:test-not testfun
]
[:from-end from-end-flag
]
[:start start-index
]
[:end end-index
]
[:key keyfun
])
like remove-duplicates
; however the
operation on sequence is
destructive.
(substitute newitem olditem sequence
[:test testfun
]
[:test-not testfun
]
[:from-end from-end-flag
]
[:start start-index
]
[:end end-index
]
[:count max-count
]
[:key keyfun
])
,
(substitute-if newitem test sequence
[:from-end from-end-flag
]
[:start start-index
]
[:end end-index
]
[:count max-count
]
[:key keyfun
])
,
(substitute-if-not newitem test sequence
[:from-end from-end-flag
]
[:start start-index
]
[:end end-index
]
[:count max-count
]
[:key keyfun
])
replaces in the index range of sequence
all elements that fulfill a certain test with newitem. Nondestructive.
(nsubstitute newitem olditem sequence
[:test testfun
]
[:test-not testfun
]
[:from-end from-end-flag
]
[:start start-index
]
[:end end-index
]
[:count max-count
]
[:key keyfun
])
,
(nsubstitute-if newitem test sequence
[:from-end from-end-flag
]
[:start start-index
]
[:end end-index
]
[:count max-count
]
[:key keyfun
])
,
(nsubstitute-if-not newitem test sequence
[:from-end from-end-flag
]
[:start start-index
]
[:end end-index
]
[:count max-count
]
[:key keyfun
])
likewise, however the operation on sequence
is destructive.
(find item sequence
[:test testfun
]
[:test-not testfun
]
[:from-end from-end-flag
]
[:start start-index
]
[:end end-index
]
[:key keyfun
])
,
(find-if test sequence
[:from-end from-end-flag
]
[:start start-index
]
[:end end-index
]
[:key keyfun
])
,
(find-if-not test sequence
[:from-end from-end-flag
]
[:start start-index
]
[:end end-index
]
[:key keyfun
])
searches in the specified direction for the first element that fulfills
a certain test and returns it.
(position item sequence
[:test testfun
]
[:test-not testfun
]
[:from-end from-end-flag
]
[:start start-index
]
[:end end-index
]
[:key keyfun
])
,
(position-if test sequence
[:from-end from-end-flag
]
[:start start-index
]
[:end end-index
]
[:key keyfun
])
,
(position-if-not test sequence
[:from-end from-end-flag
]
[:start start-index
]
[:end end-index
]
[:key keyfun
])
searches in the specified direction for the first element that fulfills
a certain test and returns its index.
(count item sequence
[:test testfun
]
[:test-not testfun
]
[:from-end from-end-flag
]
[:start start-index
]
[:end end-index
]
[:key keyfun
])
,
(count-if test sequence
[:from-end from-end-flag
]
[:start start-index
]
[:end end-index
]
[:key keyfun
])
,
(count-if-not test sequence
[:from-end from-end-flag
]
[:start start-index
]
[:end end-index
]
[:key keyfun
])
counts (in the specified direction) the elements that fulfill a certain
test and returns their number.
(mismatch sequence1 sequence2
[:from-end from-end-flag
]
[:test testfun
]
[:test-not testfun
]
[:key keyfun
]
[:start1 start-index-1
]
[:end1 end-index
-1]
[:start2 start-index-2
]
[:end2 end-index-2
])
compares the specified index
ranges in sequence1 and sequence2 (which don't need to be
of the same length) element by element, in the specified direction,
using the specified test, and returns the index at which the longest
subsequence of sequence1 ends
(respectively, begins) that coincides with its corresponding
subsequence of sequence2.
(search sequence1 sequence2
[:from-end from-end-flag
]
[:test testfun
]
[:test-not testfun
]
[:key keyfun
]
[:start1 start-index-1
]
[:end1 end-index
-1]
[:start2 start-index-2
]
[:end2 end-index-2
])
searches in the specified
index range of sequence1
(from left or right, respectively) for a complete copy of the specified
index range of sequence2 and
returns the index (into sequence1)
where this part occurs for the first time.
(sort sequence predicate
[:start start-index
]
[:end end-index
]
[:key keyfun
])
sorts the specified part of sequence, whereby for any two
elements the test keys (specified through keyfun) are compared through predicate.
(stable-sort sequence predicate
[:start start-index
]
[:end end-index
]
[:key keyfun
])
likewise, except that any two
elements of sequence that are
not put into an order by the predicate
are not swapped.
(merge result-type sequence1 sequence2 predicate
[:key keyfun
])
combines the sequences sequence1 and sequence2, returning a sequence of
type result-type. Hereby the
elements of sequence1 and sequence2 are copied into the
result sequence in the order dictated by predicate.
For defining new types of sequences, the user can use the following
macro:
(define-sequence
name
[:init init-fun
]
:upd update-fun
:endtest endtest-fun
[:fe-init from-end-init-fun
]
:fe-upd from-end-update-fun
:fe-endtest from-end-endtest-fun
:access access-fun
:access-set access-set-fun
[:copy copy-fun
]
[:length length-fun
]
:make make-fun
[:elt elt-fun
]
[:set-elt set-elt-fun
]
[:init-start init-start-fun
]
[:fe-init-end from-end-init-end-fun
])
(Here init-fun and init-start-fun are optional, but at
least one of them must be specified. Also, at lease one of from-end-init-fun and from-end-init-end-fun must be
specified.)
The sequence type name should
be a defstruct
type
defined by the user which shall be interpreted as sequence. The further
arguments must be Lisp functions. During their execution, they can
access the sequence through the global variable SEQ
. In detail they have the
following meaning:
- init-fun returns a
"pointer" - the precise data structure is up to the user - which can
traverse the sequence from left to right (called "LR pointer") and is
positioned at the leftmost position, i.e. the beginning of the sequence.
- update-fun moves an LR
pointer, that has not yet reached its rightmost position, to the right
by one position.
- endtest-fun tests
whether an LR pointer has reached its rightmost position, i.e. the end
of the sequence.
- from-end-init-fun
returns a
"pointer" - the precise data structure is again up to the user - which
can
traverse the sequence from right to left (called "RL pointer") and is
positioned at the rightmost position, i.e. the end of the sequence.
- from-end-update-fun
moves an RL pointer, that has not yet reached its leftmost position, to
the left by one position.
- from-end-endtest-fun
tests whether an RL pointer has reached its leftmost position, i.e. the
start of the sequence.
- access-fun returns the
element to which an LR pointer or RL pointer, that has not yet reached
its lastmost position, points to.
- access-set-fun sets the
element of the sequence
SEQ
,
to which a given LR pointer is pointing, to a given value
(destructively).
- copy-fun copies an LR
pointer or RL pointer. (This needs only to be specified if update-fun or from-end-update-fun destructively
modify their input pointer argument. If the pointers are for example
integers, it is not needed.)
- length-fun return the
length of the sequence
SEQ
,
an integer ≥0.
- make-fun returns a fresh
empty sequence of type name
with a given length.
- elt-fun returns the
element at the given index in
SEQ
.
- set-elt-fun sets the
element at the given index in
SEQ
to a given value.
- init-start-fun returns
for a given index an LR pointer that points to the element with this
index. (Same as init-fun
followed by index times update-fun.)
- from-end-init-end-fun
returns for a given index an RL pointer that points before the element
with this index (therefore after exactly index times from-end-update-fun it will have
reached its leftmost position).
It turns out that these 15 basic operations on sequences are sufficient
to execute all of the complex sequence operations (subseq
, sort
, etc.) efficiently.
This way, the following have been defined as sequences so far:
LIST
: lists,
VECTOR
, STRING
, BIT-VECTOR
: general arrays,
character arrays, bit arrays,
AVL-TREE
: AVL trees,
a kind of balanced binary trees.
This list of types can be extended arbitrarily.
Finally we list the advantages and drawbacks of this approach:
- Small loss of efficiency (compared to a particular implementation
for each type) through type checking at runtime and a function call for
each basic operation.
- The library of functions is small. Therefore it is easily
testable, and the user can remember the names of the functions easily.
- Consistent error checking and error reporting in the entire
library is possible and worth it, since it has to be written only once.
- It is worth using more complicated special algorithms (like for
example, Knuth's fast search algorithm for search), because the
functions are used more than once. This reduces the aforementioned loss
of efficiency.
- On user-defined data structures that possess very efficient
special operations (e.g. BIT-AND, BIT-OR on bit vectors, or fast
INSERT, DELETE on AVL trees) or that are appropriate due to their low
consumption of memory (e.g. bit vectors, strings or sparsely populated
vectors), with very small expenses - 15 elementary basic operations -
the common sequence functions can be made available.
Conclusion
Viewing a sequence as abstract data type is not only a theoretical
consideration, it is also practically valuable. This view represents a
good compromise between efficiency on one side and power, flexibility,
orthogonality on the other side.