Lecture 5: Linear Sorting#
Review#
Direct access array is fast, but may use a lot of space
Solve the space problem by mapping (hashing) the key space $u$ down to $m = \Theta(n)$
A hash table gives expected $O(1)$ time operations, amortized if dynamic
Merge sort runs in $O(n \log n)$ time, but can we have an even faster sorting algorithm?
Comparison Model based Sorting - Lower Bound Analysis#
Comparison sorts: the sorted order they determine is based only on comparisons between the input elements.
insertion sort
selection sort
merge sort
heap sort
quick sort
Comparison model implies that algorithm decision tree is binary (constant branching factor)
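Requires # leaves $L \ge$ # possible outputs
Tree height is lower bounded by $\Omega(\log L)$, so worst-case running time is $\Omega(\log L)$
To sort an array of $n$ elements, # outputs is $n!$ permutations
Thus height is lower bounded by $\log(n!) \ge \log\left((n/2)^{n/2}\right) = \Omega(n \log n)$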
So merge sort is optimal in comparison model
Can we exploit a direct access array to sort faster?
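To get a feel for the lower bound, the sketch below compares the minimum decision-tree height, $\lceil \log_2(n!) \rceil$, with $n \log_2 n$ for a few values of $n$. The function name `min_comparisons` is an illustrative choice, not part of the lecture.

```python
import math

def min_comparisons(n):
    """Lower bound on worst-case comparisons needed to sort n distinct items.

    A binary decision tree with at least n! leaves has height >= ceil(log2(n!)).
    """
    return math.ceil(math.log2(math.factorial(n)))

for n in (4, 8, 16, 32):
    # Columns: n, lower bound on comparisons, n * log2(n) for reference
    print(n, min_comparisons(n), round(n * math.log2(n)))
```

The two columns grow at the same rate, which is the content of the $\Omega(n \log n)$ bound.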
Direct Access Array (DAA) Sorting#
Make a DAA: suppose all keys are unique non-negative integers in the range $\{0, 1, 2, \ldots, u-1\}$, so $n \le u$
Store each item x at index x.key, using the set data structure -> $O(n)$
Walk down the DAA and return the items seen, in order -> $O(u)$
def direct_access_sort(A):
    "Sort A assuming items have distinct non-negative keys"
    u = 1 + max([x.key for x in A])  # O(n) find maximum key
    D = [None] * u                   # O(u) direct access array
    for x in A:                      # O(n) insert items
        D[x.key] = x
    i = 0
    for key in range(u):             # O(u) read out items in order
        if D[key] is not None:
            A[i] = D[key]
            i += 1
# Note: A is a Python list of item objects, each exposing a .key attribute
# (a list of dicts would not work, since dicts are indexed as x["key"]).
What if keys are in a larger range, like $u < n^2$?
Represent each key $k$ by a tuple $(a, b)$ where $k = an + b$ and $0 \le b < n$.
One way is to use Python's built-in divmod function: (a, b) = divmod(k, n)
Example: [17, 3, 24, 22, 12] -> [(3,2), (0,3), (4,4), (4,2), (2,2)] -> [32, 03, 44, 42, 22] when $n = 5$
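The decomposition can be checked directly. This is a minimal sketch using the built-in `divmod`; the helper name `to_tuple` is hypothetical and used only for illustration.

```python
def to_tuple(k, n):
    """Split key k into two base-n digits (a, b) such that k == a * n + b."""
    return divmod(k, n)

keys = [17, 3, 24, 22, 12]
n = len(keys)  # n = 5 for this example
pairs = [to_tuple(k, n) for k in keys]
print(pairs)  # [(3, 2), (0, 3), (4, 4), (4, 2), (2, 2)]

# Each pair reconstructs its original key, and b is always a valid base-n digit.
assert all(a * n + b == k and 0 <= b < n for (a, b), k in zip(pairs, keys))
```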
How can we sort tuples?
Tuple Sort#
Item keys are tuples of equal length, i.e., item x.key = (x.k1, x.k2, ...)
The first key k1 is the most significant.
How to sort? -> Sort by each key separately
But in what order?
Most significant to least significant, first $a$ then $b$: [32, 03, 44, 42, 22] -> [03, 22, 32, 44, 42] -> [22, 32, 42, 03, 44]. Too bad: the second sort totally ruins the previous sort.
Least significant to most significant, first $b$ then $a$: [32, 03, 44, 42, 22] -> [32, 42, 22, 03, 44] -> [03, 22, 32, 42, 44]. Good. But duplicated keys may still cause a problem: the last two elements could come out as [44, 42], because a sorting algorithm may change the relative order of items whose keys are equal.
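The two orderings can be compared with a short experiment. This sketch uses Python's built-in `sorted`, which is guaranteed to be stable, as a stand-in for the per-key sort; the variable names are illustrative.

```python
from operator import itemgetter

pairs = [(3, 2), (0, 3), (4, 4), (4, 2), (2, 2)]
by_a = itemgetter(0)  # most significant key
by_b = itemgetter(1)  # least significant key

# Most significant first: the later sort by b destroys the order by a.
msd_first = sorted(sorted(pairs, key=by_a), key=by_b)
print(msd_first)  # [(2, 2), (3, 2), (4, 2), (0, 3), (4, 4)]

# Least significant first: the stable sort by a keeps ties ordered by b.
lsd_first = sorted(sorted(pairs, key=by_b), key=by_a)
print(lsd_first)  # [(0, 3), (2, 2), (3, 2), (4, 2), (4, 4)]
```

The least-significant-first result is correct here only because `sorted` keeps (4, 2) ahead of (4, 4) when their first components tie.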
Idea: use tuple sort with auxiliary DAA sort to sort tuples (a, b).
Problem! Many integers could have the same a or b value, even if the input keys are distinct
Need a sort that allows repeated keys and preserves input order
Want sort to be stable: repeated keys appear in output in same order as input
Direct access array sort cannot even sort arrays having repeated keys!
Can we modify direct access array sort to admit multiple keys in a way that is stable?
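The failure on repeated keys is easy to reproduce. The sketch below is a simplified variant that works on bare integers and returns a new list (both are assumptions made for brevity; the name `direct_access_sort_keys` is hypothetical).

```python
def direct_access_sort_keys(keys):
    """Direct access array sort on bare non-negative integer keys."""
    u = 1 + max(keys)
    D = [None] * u
    for k in keys:
        D[k] = k  # a repeated key overwrites the item already stored here
    return [k for k in D if k is not None]

print(direct_access_sort_keys([5, 2, 7, 0, 4]))  # [0, 2, 4, 5, 7]
print(direct_access_sort_keys([2, 0, 2, 1]))     # [0, 1, 2]  (one 2 is lost)
```

With one slot per key, a second item with the same key has nowhere to go, which is why each slot needs to hold more than one item.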
Counting Sort#
Instead of storing a single item at each array index, store a chain, just like hashing.
For stability, the chain data structure should remember the order in which items were added
Use a sequence data structure which maintains insertion order
To insert item x, insert_last to the end of the chain at index x.key
Then to sort, read through all chains in sequence order, return items one by one
def counting_sort(A):
    "Sort A assuming items have non-negative keys"
    # Keys must be non-negative integers, since they are used as indices into D
    u = 1 + max([x.key for x in A])  # O(n) find maximum key
    D = [[] for i in range(u)]       # O(u) direct access array of chains
    # Each chain is a sequence (here a list), so it maintains insertion order
    for x in A:                      # O(n) insert into chain at x.key
        D[x.key].append(x)
    i = 0
    for chain in D:                  # O(u) read out items in order
        for x in chain:
            A[i] = x
            i += 1
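To see stability in action, here is a usage sketch of a slight variant. It takes a key function instead of reading `x.key` and returns a new list instead of writing back into `A`; both changes, and the name `counting_sort_by`, are assumptions made so the example runs on plain tuples.

```python
def counting_sort_by(A, key):
    """Stable sort of A by key(x), where key(x) is a non-negative integer."""
    if not A:
        return []
    u = 1 + max(key(x) for x in A)   # O(n) find maximum key
    chains = [[] for _ in range(u)]  # O(u) one chain per possible key
    for x in A:                      # O(n) append keeps insertion order
        chains[key(x)].append(x)
    return [x for chain in chains for x in chain]  # O(n + u) read out

records = [(2, "c"), (0, "a"), (2, "b"), (1, "d"), (0, "e")]
result = counting_sort_by(records, key=lambda r: r[0])
print(result)  # [(0, 'a'), (0, 'e'), (1, 'd'), (2, 'c'), (2, 'b')]
```

Items with equal keys, such as (2, "c") and (2, "b"), come out in the same relative order they went in.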
Radix Sort#
Idea! If $u < n^2$, use tuple sort with auxiliary counting sort to sort tuples (a, b)
Sort least significant key b, then most significant key a
Stability ensures previous sorts stay sorted
Running time for this algorithm is $O(2n) = O(n)$.
If every key is $< n^c$ for some positive $c = \log_n(u)$, every key has at most $c$ digits base $n$
A $c$-digit number can be written as a $c$-element tuple in $O(c)$ time
We sort each of the $c$ base-$n$ digits in $O(n)$ time
So tuple sort with auxiliary counting sort runs in $O(cn)$ time in total
If $c$ is constant, so each key is $\le n^c$, this sort is linear, $O(n)$!
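Putting the pieces together, a minimal sketch of radix sort on bare non-negative integers might look like the following. It is an illustration under stated assumptions rather than the lecture's implementation: it sorts integers instead of items with a `.key`, extracts digits arithmetically instead of building tuples, and returns a new list.

```python
def radix_sort(keys):
    """Sort non-negative integers by base-n digits, least significant first."""
    n = len(keys)
    if n < 2:
        return list(keys)
    # c = number of base-n digits needed to write the largest key
    largest = max(keys)
    c = 1
    while n ** c <= largest:
        c += 1
    out = list(keys)
    for d in range(c):  # one stable counting sort per digit, O(n) each
        chains = [[] for _ in range(n)]
        for k in out:
            digit = (k // n ** d) % n  # d-th base-n digit of k
            chains[digit].append(k)    # append preserves the previous order
        out = [k for chain in chains for k in chain]
    return out

print(radix_sort([17, 3, 24, 22, 12]))  # [3, 12, 17, 22, 24]
```

Each pass costs $O(n)$ because there are exactly $n$ chains, so the total is $O(cn)$ for $c$ passes.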