5.5 Selection
Selection refers to a class of problems that are easily reduced to sorting but do
not require the full power of sorting. Let s = ⟨e_1, . . . , e_n⟩ be a sequence and let
s′ = ⟨e′_1, . . . , e′_n⟩ be the sorted version of it. Selection of the smallest element
requires determining e′_1, selection of the smallest and the largest requires determining
e′_1 and e′_n, and selection of the k-th smallest requires determining e′_k. Selection of the
median refers to selecting the ⌊n/2⌋-th smallest element. Selecting the median and
the quartiles is a basic problem in statistics. It is easy to determine the smallest, or
the smallest and the largest, element by a single scan of a sequence in linear time.
We show that the k-th smallest element can also be determined in expected linear time. The
following simple recursive procedure solves the problem.
// Find an element with rank k
Function select(s : Sequence of Element; k : ℕ) : Element
    assert |s| ≥ k
    pick p ∈ s uniformly at random                    // pivot key
    a := ⟨e ∈ s : e < p⟩
    if |a| ≥ k then return select(a, k)
    b := ⟨e ∈ s : e = p⟩
    if |a| + |b| ≥ k then return p
    c := ⟨e ∈ s : e > p⟩
    return select(c, k − |a| − |b|)
Fig. 5.9. Quickselect
The procedure is akin to quicksort and is therefore called quickselect. The key
insight is that it suffices to follow one of the recursive calls (see Figure 5.9). As before,
a pivot is chosen and the input sequence s is partitioned into subsequences a, b, and
c containing the elements smaller than the pivot, equal to the pivot, and larger than
the pivot, respectively. If |a| ≥ k, we recurse on a, and if k > |a| + |b|, we recurse on
c with k reduced by |a| + |b|. If |a| < k ≤ |a| + |b|, the task is solved: the
pivot has rank k and we return it. Observe that the last case also covers the situation
|s| = k = 1, and hence no special base case is needed. Figure 5.10 illustrates the
execution of quickselect.
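The procedure of Figure 5.9 can be transcribed almost line by line. The following is a minimal Python sketch rather than a tuned implementation. Ranks are 1-based as in the text, and the `rng` parameter is an assumption added here so that the random source can be swapped out; it is not part of the pseudocode.

```python
import random


def select(s, k, rng=random):
    """Return the element of rank k (1-based) in the list s."""
    assert 1 <= k <= len(s)
    p = rng.choice(s)                  # pivot key, picked uniformly at random
    a = [e for e in s if e < p]        # elements smaller than the pivot
    if len(a) >= k:
        return select(a, k, rng)       # rank k lies inside a
    b = [e for e in s if e == p]       # elements equal to the pivot
    if len(a) + len(b) >= k:
        return p                       # the pivot itself has rank k
    c = [e for e in s if e > p]        # elements larger than the pivot
    return select(c, k - len(a) - len(b), rng)


if __name__ == "__main__":
    # The input used in Figure 5.10; the result does not depend on the pivots.
    print(select([3, 1, 4, 5, 9, 2, 6, 5, 3, 5, 8], 6))  # prints 5
```

The sketch builds new lists for a, b, and c in every call, which keeps it close to the pseudocode at the price of extra memory traffic.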
s                                   k   p   a                    b           c
⟨3, 1, 4, 5, 9, 2, 6, 5, 3, 5, 8⟩   6   2   ⟨1⟩                  ⟨2⟩         ⟨3, 4, 5, 9, 6, 5, 3, 5, 8⟩
⟨3, 4, 5, 9, 6, 5, 3, 5, 8⟩         4   6   ⟨3, 4, 5, 5, 3, 5⟩   ⟨6⟩         ⟨9, 8⟩
⟨3, 4, 5, 5, 3, 5⟩                  4   5   ⟨3, 4, 3⟩            ⟨5, 5, 5⟩   ⟨⟩
Fig. 5.10. The execution of select(⟨3, 1, 4, 5, 9, 2, 6, 5, 3, 5, 8⟩, 6). The middle
element of the current s is used as the pivot p.
As with quicksort, the worst-case execution time of quickselect is quadratic. But
the expected execution time is linear and hence a logarithmic factor faster than
quicksort.
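The gap between the two bounds can be observed experimentally. The sketch below is only an illustration under stated assumptions. It measures cost as the total number of elements scanned, that is, the sum of |s| over all calls. The names `scanned` and `pick` are hypothetical helpers introduced here. The adversarial run uses a first-element pivot on sorted input, which is not the random pivot rule of Figure 5.9.

```python
import random


def scanned(s, k, pick):
    """Run quickselect iteratively and return the sum of |s| over all rounds."""
    total = 0
    while True:
        total += len(s)                # one linear scan per round
        p = pick(s)                    # pivot chosen by the supplied rule
        a = [e for e in s if e < p]
        if len(a) >= k:
            s = a                      # continue in a
            continue
        b_size = sum(1 for e in s if e == p)
        if len(a) + b_size >= k:
            return total               # pivot has rank k, done
        s = [e for e in s if e > p]    # continue in c
        k -= len(a) + b_size


if __name__ == "__main__":
    rng = random.Random(1)             # fixed seed, chosen arbitrarily
    for n in (1000, 2000, 4000):
        data = list(range(n))
        rng.shuffle(data)
        runs = [scanned(data, n // 2, rng.choice) for _ in range(50)]
        # Adversarial case: sorted input, first element as pivot, k = n.
        bad = scanned(list(range(n)), n, lambda seq: seq[0])
        print(n, sum(runs) / len(runs) / n, bad / n)
```

With random pivots the printed cost per element should stay roughly constant as n grows, whereas the adversarial run scans exactly n(n + 1)/2 elements, so its cost per element grows linearly in n.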
Theorem 19. Algorithm quickselect runs in expected time O(n) on an input of size
n.
Proof. We give an analysis that is simple and shows a linear expected execution time. It does
not give the smallest constant possible. Let T(n) denote the expected execution time
of quickselect. Call a pivot good if neither |a| nor |c| is larger than 2n/3. Let γ
denote the probability that the pivot is good. Then γ ≥ 1/3. We now make the
conservative assumption that the problem size in the recursive call is only reduced
for good pivots and that even then it is only reduced by a factor of 2/3. Since the
work outside the recursive call is linear in n, there is an appropriate constant c such
that
\[ T(n) \le cn + \gamma\, T\!\left(\frac{2n}{3}\right) + (1 - \gamma)\, T(n) \]
or, equivalently,
\[ T(n) \le \frac{cn}{\gamma} + T\!\left(\frac{2n}{3}\right) \le 3cn + T\!\left(\frac{2n}{3}\right) \le 3c\left(n + \frac{2n}{3} + \frac{4n}{9} + \cdots\right) \le 3cn \sum_{i \ge 0} \left(\frac{2}{3}\right)^{i} \le 3cn \cdot \frac{1}{1 - 2/3} = 9cn . \]
Exercise 97. Modify quickselect so that it returns the k smallest elements.
Exercise 98. Give a selection algorithm that permutes an array in such a way that
the k smallest elements are in entries a[1], . . . , a[k]. No further ordering is required
except that a[k] should have rank k. Adapt the implementation tricks from array-based
quicksort to obtain a nonrecursive algorithm with fast inner loops.
Exercise 99 (Streaming selection).
a) Develop an algorithm that finds the k-th smallest element of a sequence that
is presented to you one element at a time in an order you cannot control. You
have only space O(k) available. This models a situation where voluminous data
arrives over a network or at a sensor.
b) Refine your algorithm so that it achieves running time O(n log k). You may want
to read some of Chapter 6 first.
*c) Refine the algorithm and its analysis further so that your algorithm runs in
average-case time O(n) if k = O(n/ log n). Here, average means that all presentation
orders of elements in the sequence are equally likely.