-23-
locking to avoid deadlocks are well understood (see, for example, [3]). The locking approach
taken in C
0
is dependent on the data structure used. In the case of a (2-3)-tree, for example,
we could write lock a subtree falling below a single (2-3)-directory node that contains all
entries in the range affected during a merge to a node of C
1
; simultaneously, find operations
would lock all (2-3)-nodes on their access path in read mode so that one type of access will
exclude another. Note that we are only considering concurrency at the lowest physical level of
multi-level locking, in the sense of [28]. We leave to others the question of more abstract
locks, such as key range locking to preserve transactional isolation, and avoid for now the
problem of phantom updates; see [4], [14] for a discussion. Thus read-locks are released as
soon as the entries being sought at the leaf level have been scanned. Write locks for (all) nodes
under the cursor are released following each node merged from the larger component. This
gives an opportunity for a long range find or for a faster cursor to pass a relatively slower
cursor position, and thus addresses point (iii) above..
Now assume we are performing a rolling merge between two disk based components, migrating
entries from C
i-1
, which we refer to as the inner component of this rolling merge, out to C
i
,
which we refer to as the outer component. The cursor always has a well-defined inner com-
ponent position within a leaf-level node of C
i-1
, pointing to the next entry it is about to migrate
out to C
i
, and simultaneously a position in each of the higher directory levels of C
i-1
along the
path of access to the leaf level node position. The cursor also has an outer component position in
C
i
, both at the leaf level and at upper levels along the path of access, corresponding to an entry it
is about to consider in the merge process. As the merge cursor progresses through successive
entries of the inner and outer components, new leaf nodes of C
i
created by the merge are im-
mediately placed in left-to-right sequence in a new buffer resident multi-page block. Thus the
nodes of the C
i
component surrounding the current cursor position will in general be split into
two partially full multi-page block buffers in memory: the "emptying" block whose entries
have been depleted but which retains information not yet reached by the merge cursor, and the
"filling" block which reflects the result of the merge up to this moment but is not yet full
enough to write on disk. For concurrent access purposes, both the emptying block and the
filling block contain an integral number of page-sized nodes of the C
1
tree which simply happen
to be buffer resident. During merge step operations restructuring individual nodes, the nodes
involved are locked in write mode, blocking other types of concurrent access to the entries.
In the most general approach to a rolling merge, we may wish to retain certain entries in the
component C
i-1
rather than migrating all entries out to C
i
as the cursor passes over them. In
this case, the nodes in the C
i-1
component surrounding the merge cursor will also be split into
two buffer resident multi-page blocks, the "emptying" block that contains nodes of C
i-1
that the
merge cursor has not yet reached, and the "filling" block with nodes, placed left-to-right, that
contain entries recently passed over by the merge cursor and retained in component C
i-1
. In
this most general case then, the merge cursor position is affecting four different nodes at any
one time: the inner and outer component nodes in the emptying blocks where the merge is about
to occur and the inner and outer component nodes in the filling blocks where new information is
being written as the cursor progresses. Clearly these four nodes may all be less than com-
pletely full at any moment, and the same is true of the containing blocks. We take write locks on
all four nodes during the time the merge is actually modifying the node structures and release
these locks at quantized instants to allow a faster cursor to pass by; we choose to release locks
each time a node in the emptying block in the outer component has been completely depleted, but
the other three nodes will generally be less than full at that time. This is all right, since we can
perform all operations of access on a tree with nodes that are less than completely full as well
as blocks that are less than completely full with nodes. The case where one cursor passes an-
other requires particularly careful thought, because in general the cursor position of the
rolling merge being bypassed will be invalidated on its inner component, and provision must be