Lecture 5: Linear Sorting

Review

  • Direct access array is fast, but may use a lot of space O(u)

  • Solve space problem by mapping (hashing) key space u down to m=O(n)

  • Hash table gives expected O(1) time operations, amortized if dynamic

  • Merge sort runs in O(n log n) time, but can we sort even faster?

Comparison Model based Sorting - Lower Bound Analysis

  • Comparison sorts: the sorted order they determine is based only on comparisons between the input elements.

    • insertion sort

    • selection sort

    • merge sort

    • heap sort

    • quick sort

  • Comparison model implies that algorithm decision tree is binary (constant branching factor)

  • Requires # leaves L ≥ # possible outputs

  • Tree height lower bounded by Ω(logL), so worst-case running time is Ω(logL)

  • To sort array of n elements, # outputs is n! permutations

  • Thus height is lower bounded by log(n!) ≥ log((n/2)^(n/2)) = Ω(n log n)

  • So merge sort is optimal in comparison model

  • Can we exploit a direct access array to sort faster?
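As a numeric sanity check of the lower bound above, the following sketch compares log2(n!) with the weaker bound log2((n/2)^(n/2)) and with n log2(n) for a few values of n. The helper name `min_comparisons` is illustrative and not part of the lecture.

```python
import math

def min_comparisons(n):
    """Decision-tree lower bound on worst-case comparisons: ceil(log2(n!))."""
    return math.ceil(math.log2(math.factorial(n)))

for n in (4, 16, 64, 256):
    lower = min_comparisons(n)              # height needed for n! leaves
    weaker = (n / 2) * math.log2(n / 2)     # log2((n/2)^(n/2)), the bound used above
    upper = n * math.log2(n)                # n log2(n), for comparison
    print(n, round(weaker), lower, round(upper))
```

For each n the three printed values are in increasing order, which is consistent with log(n!) being sandwiched between two quantities that are both Θ(n log n).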

Direct Access Array (DAA) Sorting

  • make DAA, suppose all keys are unique non-negative integers in range {0, 1, 2, ..., u-1}, so n ≤ u

  • store each item x at index x.key, as in the direct access array set data structure -> n · O(1) = O(n)

  • walk down DAA and return items seen in order -> O(u)

def direct_access_sort(A):
    "Sort A assuming items have distinct non-negative keys"
    u = 1 + max([x.key for x in A])              # O(n) find maximum key
    D = [None] * u                               # O(u) direct access array
    for x in A:                                 # O(n) insert items
        D[x.key] = x 
    i = 0 
    for key in range(u):                        # O(u) read out items in order
        if D[key] is not None:
            A[i] = D[key]
            i += 1
# Note: A is a Python list of item objects, each with an integer attribute `key` (x.key is attribute access, so the items are not plain dictionaries).
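A minimal usage sketch, assuming items are small objects with an integer `key` attribute. The `Item` dataclass and its `val` field are hypothetical and only serve as an example input; the function is repeated so that the block runs by itself.

```python
from dataclasses import dataclass

@dataclass
class Item:
    key: int   # distinct non-negative integer key
    val: str   # hypothetical payload carried along with the key

def direct_access_sort(A):
    "Sort A in place, assuming items have distinct non-negative integer keys"
    u = 1 + max(x.key for x in A)     # O(n) find maximum key
    D = [None] * u                    # O(u) direct access array
    for x in A:                       # O(n) insert items
        D[x.key] = x
    i = 0
    for key in range(u):              # O(u) read out items in order
        if D[key] is not None:
            A[i] = D[key]
            i += 1

A = [Item(5, "e"), Item(2, "b"), Item(7, "g"), Item(0, "a")]
direct_access_sort(A)
print([x.key for x in A])   # [0, 2, 5, 7]
```

Here u = 8 while n = 4, so half of the direct access array stays empty; the gap grows quickly when the largest key is much bigger than n.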
  • what if keys are in a larger range, like u < n^2?

  • represent each key k by tuple (a, b) where k = an + b and 0 ≤ b < n.

  • one way is to use Python's built-in divmod: (a, b) = divmod(k, n)

  • Examples: [17, 3, 24, 22, 12] -> [(3,2), (0,3), (4,4), (4,2), (2,2)] -> [32, 03, 44, 42, 22] when n=5

  • How can we sort tuples?
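The decomposition above can be sketched with Python's built-in `divmod`, using the example keys from this section. The helper name `to_digits` is an assumption made for illustration.

```python
def to_digits(k, n):
    """Split a key k < n*n into (a, b) with k == a*n + b and 0 <= b < n."""
    return divmod(k, n)

keys = [17, 3, 24, 22, 12]
n = len(keys)                       # n = 5, and every key is below n*n = 25
pairs = [to_digits(k, n) for k in keys]
print(pairs)   # [(3, 2), (0, 3), (4, 4), (4, 2), (2, 2)]

# Each pair reconstructs its original key.
assert all(a * n + b == k for (a, b), k in zip(pairs, keys))
```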

Tuple Sort

  • Item keys are tuples of equal length, i.e., item x.key = (x.k1, x.k2, ...)

  • The first key k1 is most significant.

  • How to sort? -> sort separately each key

  • But in what order?

    • most significant to least significant, first k1 then k2: [32, 03, 44, 42, 22] -> [03, 22, 32, 44, 42] -> [22, 32, 42, 03, 44] -> Too bad. The second sort completely undoes the first.

    • least significant to most significant, first k2 then k1: [32, 03, 44, 42, 22] -> [32, 42, 22, 03, 44] -> [03, 22, 32, 42, 44] -> Good. But repeated keys can still cause a problem: the last two elements could come out as 44, 42, because the second sort may reorder items whose k1 values are equal.

  • Idea: use tuple sort with auxiliary DAA sort to sort tuple (a, b).

  • Problem! Many integers could have the same a or b value, even if input keys distinct

  • Need sort allowing repeated keys which preserves input order

  • Want sort to be stable: repeated keys appear in output in same order as input

  • Direct access array sort cannot even sort arrays having repeated keys!

  • Can we modify direct access array sort to admit multiple keys in a way that is stable?
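The effect of sort order and stability can be seen with Python's built-in `sorted`, which is documented to be stable. This is only an illustration that uses `sorted` as a stand-in for the auxiliary sort; it is not the linear-time method the lecture is building toward.

```python
from operator import itemgetter

pairs = [(3, 2), (0, 3), (4, 4), (4, 2), (2, 2)]

# Least significant first: sort by b, then stably by a.
lsd = sorted(sorted(pairs, key=itemgetter(1)), key=itemgetter(0))

# Most significant first: sort by a, then by b; the second pass undoes the first.
msd = sorted(sorted(pairs, key=itemgetter(0)), key=itemgetter(1))

print(lsd)   # [(0, 3), (2, 2), (3, 2), (4, 2), (4, 4)]
print(msd)   # [(2, 2), (3, 2), (4, 2), (0, 3), (4, 4)]
```

Because the second pass is stable, (4, 2) stays ahead of (4, 4) in `lsd`: the first pass already placed it earlier, and ties on a keep that order.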

Counting Sort

  • Instead of storing a single item at each array index, store a chain, just like hashing.

  • For stability, the chain data structure should remember the order in which items were added

  • Use a sequence data structure which maintains insertion order

  • To insert item x, insert_last to end of the chain at index x.key

  • Then to sort, read through all chains in sequence order, return items one by one


def counting_sort(A):
    "Sort A assuming items have non-negative keys"
    u = 1 + max([x.key for x in A])             # O(n) find maximum key (keys must be non-negative integers)
    D = [[] for i in range(u)]                  # O(u) direct access array of chains
    for x in A:                                 # O(n) insert into chain at x.key
        D[x.key].append(x)                      # list append keeps insertion order, so the sort is stable
    i = 0
    for chain in D:                             # O(u) read out items in order
        for x in chain:
            A[i] = x
            i += 1
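For comparison, here is a sketch of a different but common formulation of counting sort that avoids chains: it counts how often each key occurs, turns the counts into starting positions, and then places items left to right. This is an alternative to the chained version above, not the code from the lecture; `counting_sort_prefix` and the `Item` type are names assumed for illustration.

```python
from collections import namedtuple

Item = namedtuple("Item", ["key", "val"])   # hypothetical item type

def counting_sort_prefix(A):
    "Stable in-place sort of A, assuming items have non-negative integer keys"
    if not A:
        return
    u = 1 + max(x.key for x in A)           # O(n) find maximum key
    count = [0] * u
    for x in A:                             # O(n) count occurrences of each key
        count[x.key] += 1
    start = 0
    for k in range(u):                      # O(u) turn counts into start positions
        count[k], start = start, start + count[k]
    out = [None] * len(A)
    for x in A:                             # O(n) scanning input in order keeps ties stable
        out[count[x.key]] = x
        count[x.key] += 1
    A[:] = out

A = [Item(2, "a"), Item(0, "b"), Item(2, "c"), Item(1, "d"), Item(0, "e")]
counting_sort_prefix(A)
print([(x.key, x.val) for x in A])
# [(0, 'b'), (0, 'e'), (1, 'd'), (2, 'a'), (2, 'c')]
```

Items with equal keys ("b" before "e", "a" before "c") keep their input order, which is the stability property tuple sort relies on.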

Radix Sort

  • Idea! If u < n^2, use tuple sort with auxiliary counting sort to sort tuples (a, b)

  • Sort least significant key b, then most significant key a

  • Stability ensures previous sorts stay sorted

  • Running time for this algorithm is O(2n)=O(n).

  • If every key < n^c for some positive c = log_n(u), every key has at most c digits base n

  • A c-digit number can be written as a c-element tuple in O(c) time

  • We sort each of the c base-n digits in O(n) time

  • So tuple sort with auxiliary counting sort runs in O(cn) time in total

  • If c is constant, so each key is ≤ n^c, this sort is linear O(n)!
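Putting the pieces together, a radix sort over plain non-negative integers might look like the sketch below. It works on bare integers rather than items with a `.key` attribute, and the helper `counting_sort_by` and its `digit_of` parameter are assumptions made to keep the example short.

```python
def counting_sort_by(A, digit_of, base):
    "Stable in-place sort of list A by digit_of(x), which must lie in range(base)"
    chains = [[] for _ in range(base)]      # O(base) array of chains
    for x in A:                             # O(n) append keeps insertion order
        chains[digit_of(x)].append(x)
    A[:] = [x for chain in chains for x in chain]

def radix_sort(keys):
    "Sort a list of non-negative integers in place using base-n digits"
    n = len(keys)
    if n < 2:
        return                              # nothing to sort, and base n needs n >= 2
    u = 1 + max(keys)
    c = 1                                   # smallest c with n**c >= u
    while n ** c < u:
        c += 1
    for d in range(c):                      # least significant digit first
        counting_sort_by(keys, lambda k, d=d: (k // n ** d) % n, n)

keys = [17, 3, 24, 22, 12]
radix_sort(keys)
print(keys)   # [3, 12, 17, 22, 24]
```

With n = 5 and u = 25 the loop finds c = 2, so the sort makes two counting-sort passes, one per base-5 digit, matching the O(cn) bound above.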