Lecture 3: Set and Sorting

Set Interface

| Type      | Function     | Note                                                        |
|-----------|--------------|-------------------------------------------------------------|
| Container | build(x)     | given an iterable \(x\), build set from items in \(x\)      |
|           | len(x)       | return the number of stored items                           |
| Static    | find(k)      | return the stored item with key \(k\)                       |
| Dynamic   | insert(x)    | add \(x\) to set                                            |
|           | delete(k)    | remove and return the stored item with key \(k\)            |
| Order     | iter_ord()   | return the stored items one-by-one in key order             |
|           | find_min()   | return the stored item with smallest key                    |
|           | find_max()   | return the stored item with largest key                     |
|           | find_next(k) | return the stored item with smallest key larger than \(k\)  |
|           | find_prev(k) | return the stored item with largest key smaller than \(k\)  |

  • storing items in an array in arbitrary order can implement a (not so efficient) set

  • storing the items sorted in increasing key order allows:

    • faster find min/max (at first/last index of array)

    • faster find via binary search: \(O(\log n)\)

  • how to construct a sorted array?
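As a rough illustration of the points above, a sorted-array set can be sketched with Python's standard bisect module. The class name SortedArraySet and the choice to store bare keys as items are assumptions made for this sketch, not something defined in the lecture; only the static and order queries are shown.

```python
from bisect import bisect_left, bisect_right

class SortedArraySet:
    """Hypothetical sketch: a set stored as a sorted Python list of keys."""

    def __init__(self, x=()):
        # build(x): sorting dominates the cost, O(n log n)
        self.A = sorted(set(x))

    def __len__(self):
        return len(self.A)

    def find(self, k):
        # binary search, O(log n)
        i = bisect_left(self.A, k)
        if i < len(self.A) and self.A[i] == k:
            return self.A[i]
        return None

    def find_min(self):
        return self.A[0] if self.A else None       # first index, O(1)

    def find_max(self):
        return self.A[-1] if self.A else None      # last index, O(1)

    def find_next(self, k):
        # smallest key strictly larger than k
        i = bisect_right(self.A, k)
        return self.A[i] if i < len(self.A) else None

    def find_prev(self, k):
        # largest key strictly smaller than k
        i = bisect_left(self.A, k)
        return self.A[i - 1] if i > 0 else None

S = SortedArraySet([8, 2, 4, 9, 3])
print(S.find(4), S.find(5), S.find_min(), S.find_max(), S.find_next(4), S.find_prev(4))
# 4 None 2 9 8 3
```

Insert and delete are left out on purpose: on a sorted array they need to shift items, which takes \(O(n)\) time.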

! Confused: set and sequence are two types of interface. So array as a data structure can have both interfaces?

Sorting

  • given a sorted array, we can leverage binary search to make an efficient set data structure

  • input: static array \(A\) of \(n\) numbers

  • output: static array \(B\) which is a sorted permutation of \(A\)

  • example: [8,2,4,9,3] -> [2,3,4,8,9]

Permutation Sort

  • there are \(n!\) permutations of \(A\), at least one of which is sorted

  • for each permutation, check whether it is sorted, in \(\Theta(n)\) time

  • permutation_sort analysis:

    • try all possibilities - brute force

    • running time \(O(n! \cdot n)\), which is at least exponential in \(n\)

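from itertools import permutations

def is_sorted(B):
    '''Return True if B is in non-decreasing order'''
    return all(B[i] <= B[i+1] for i in range(len(B) - 1))   # O(n)

def permutation_sort(A):
    '''Sort A by trying every permutation'''
    for B in permutations(A):               # O(n!) permutations
        if is_sorted(B):                    # O(n)
            return list(B)                  # O(n) to copy the tuple into a list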
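To get a feel for how quickly this bound grows, the following sketch evaluates \(n! \cdot n\) for a few small \(n\). The helper name permutation_sort_bound is hypothetical, and constant factors are ignored.

```python
from math import factorial

def permutation_sort_bound(n):
    """Hypothetical helper: the n! * n bound on steps, ignoring constants."""
    return factorial(n) * n

for n in (5, 10, 15):
    print(n, permutation_sort_bound(n))
# 5 600
# 10 36288000
# 15 19615115520000
```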

Recurrences

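  • substitution: guess a solution and verify it by substituting it into the recurrence

  • recurrence tree: draw a tree representing the recursive calls and sum the computation at the nodes

  • master theorem: a formula for solving recurrences of the form \(T(n) = aT(n/b) + f(n)\)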
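As a small worked example of the first two methods, assume a constant \(c > 0\) standing in for the \(O(\cdot)\) terms (the constant is an assumption made for illustration). For \(S(1) = c\) and \(S(n) = S(n-1) + c\), guess \(S(n) = cn\) and substitute:

\[ S(n) = S(n-1) + c = c(n-1) + c = cn \]

For \(T(1) = c\) and \(T(n) = T(n-1) + cn\), the recurrence tree is a chain of \(n\) nodes doing \(cn, c(n-1), \ldots, c\) work, so summing over the nodes gives:

\[ T(n) = \sum_{k=1}^{n} ck = c\,\frac{n(n+1)}{2} = \Theta(n^2) \]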

Selection Sort

  • select the smallest (forward, from index 0 to \(n-1\)) or the greatest (reverse, from index \(n-1\) to 0) element from the remaining elements and place it at its correct position

  • forward

    • find the smallest number in the remaining array \(A[i:]\) and swap it to \(A[i]\)

    • example: [8,2,4,9,3] -> [2,8,4,9,3] -> [2,3,4,9,8] -> [2,3,4,9,8] -> [2,3,4,8,9]

  • reverse

    • find the largest number in prefix \(A[:i+1]\) and swap it to \(A[i]\)

    • recursively sort prefix \(A[:i]\)

    • example: [8,2,4,9,3] -> [8,2,4,3,9] -> [3,2,4,8,9] -> [3,2,4,8,9] -> [2,3,4,8,9]
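The forward variant described above can be sketched iteratively as follows. The function name selection_sort_forward is hypothetical, chosen for this sketch; the recursive code further down implements the reverse variant.

```python
def selection_sort_forward(A):
    """Sort A in place by repeatedly swapping the minimum of A[i:] into A[i]."""
    n = len(A)
    for i in range(n - 1):
        # index of the smallest element in the suffix A[i:]
        j = min(range(i, n), key=A.__getitem__)
        A[i], A[j] = A[j], A[i]
    return A

print(selection_sort_forward([8, 2, 4, 9, 3]))
# [2, 3, 4, 8, 9]
```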

def selection_sort(A, i):                           # T(i)
    "Sort A[:i+1] in place"
    if i > 0:                                       # O(1)
        # find index j of the maximum in A[:i+1]
        j = find_max_prefix(A, i)                   # S(i) = O(i)
        # swap the maximum into its final position A[i]
        A[i], A[j] = A[j], A[i]                     # O(1)
        # recursively sort the remaining prefix A[:i]
        selection_sort(A, i-1)                      # T(i-1)

    return A

def find_max_prefix(A, i):                          # S(i)
    "Return the index of a maximum in prefix A[:i+1]"
    if i == 0:                                      # O(1)
        return i                                    # O(1)
    if i > 0:                                       # O(1)
        j = find_max_prefix(A, i-1)                 # S(i-1)
        if A[j] > A[i]:                             # O(1)
            return j                                # O(1)
        else:
            return i                                # O(1)

A=[6,2]
print(selection_sort(A, len(A)-1))

A=[2,6,3,9,8]
print(selection_sort(A, len(A)-1))
[2, 6]
[2, 3, 6, 8, 9]
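  • find_max_prefix analysis:

    • idea:

      • the maximum of the prefix \(A[:i+1]\) is either \(A[i]\) or the maximum of \(A[:i]\)

      • so find the maximum of \(A[:i]\) recursively and compare it with \(A[i]\)

    • base case: for \(i=0\), the prefix has one element, so index 0 is correct.

    • induction: assume correct for \(i-1\). The maximum of \(A[:i+1]\) is either \(A[i]\) or the maximum of \(A[:i]\), and the algorithm returns the correct index in either case.

    • \(S(1) = O(1), S(n) = S(n-1) + O(1)\)

      • substitution: \(S(n) = O(n)\)

      • recurrence tree: a chain of \(n\) nodes with \(O(1)\) work each, so \(O(n)\) in total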

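  • selection_sort analysis:

    • idea:

      • recursively find the largest number in the prefix \(A[:i+1]\) for \(i = n-1, n-2, ..., 1\) and swap it to \(A[i]\)

    • base case: for \(i=0\), the array has one element so is sorted.

    • induction: assume correct for \(i-1\). After the swap, \(A[i]\) holds the maximum of \(A[:i+1]\), and the prefix \(A[:i]\) is sorted by induction.

    • \(T(1)=O(1), T(n) = T(n-1) + S(n) = T(n-1) + O(n) \Rightarrow T(n)=O(n^2)\)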
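The quadratic cost of selection sort can also be checked empirically. The sketch below is a loop version of the prefix-maximum idea that counts key comparisons; the function name count_selection_comparisons is hypothetical. The count should match \(n(n-1)/2\) regardless of the input order.

```python
def count_selection_comparisons(A):
    """Run prefix-maximum selection sort on a copy of A and count key comparisons."""
    A = list(A)
    comparisons = 0
    for i in range(len(A) - 1, 0, -1):
        j = 0
        for k in range(1, i + 1):       # scan prefix A[:i+1] for its maximum
            comparisons += 1
            if A[k] >= A[j]:
                j = k
        A[i], A[j] = A[j], A[i]         # move the maximum to position i
    return A, comparisons

for n in (4, 8, 16):
    _, c = count_selection_comparisons(range(n, 0, -1))
    print(n, c, n * (n - 1) // 2)
# 4 6 6
# 8 28 28
# 16 120 120
```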

Insertion Sort

  • build the final sorted array one item at a time by inserting each element into its correct position and shifting the remaining elements. A good animation can be found at https://en.wikipedia.org/wiki/Insertion_sort.

  • procedure using recursion

    • assume we have an array \(A[:i+1]\) whose prefix \(A[:i]\) is already sorted, plus the new element \(A[i]\)

    • compare \(A[i-1]\) and \(A[i]\).

      • if \(A[i] \ge A[i-1]\), then \(A[:i+1]\) is already sorted, so move on to the next element \(A[i+1]\)

      • else swap \(A[i]\) with \(A[i-1]\), and apply the same insert_last procedure to \(A[:i]\). This is necessary because after the swap the prefix \(A[:i]\) may no longer be sorted. See the last step in the following example.

        [2, 9, 8, 4] -> [2, 9, 8, 4] -> [2, 8, 9, 4] -> [2, 4, 8, 9] 
        

        The last step internally proceeds as follows:

        [2, 8, 9, 4] -> [2, 8, 4, 9] -> [2, 4, 8, 9]
        
def insert_last(A, i):
    '''Sort A[:i+1] assuming sorted A[:i]'''
    if i > 0 and A[i] < A[i-1]:
        A[i-1], A[i] = A[i], A[i-1]
        insert_last(A, i-1)
    return A

def insert_sort(A, i):
    '''Sort A[:i+1]'''
    if i > 0:
        insert_sort(A, i-1)
        insert_last(A, i)
    return A

print(insert_last([2, 8, 9, 3], 3))
print(insert_sort([2, 8, 9, 3], 3))

print(insert_last([2, 9, 8, 3], 3)) 
print(insert_sort([2, 9, 8, 3], 3))
[2, 3, 8, 9]
[2, 3, 8, 9]
[2, 3, 9, 8]
[2, 3, 8, 9]
  • insert_last analysis

    • base case: for \(i=0\), the array has one element so is sorted.

    • induction: assume correct for indices smaller than \(i\). If \(A[i] \ge A[i-1]\), the array is sorted. Otherwise, swapping the last two elements allows us to sort \(A[:i]\) by induction.

    • \(S(1) = O(1), S(n)=S(n-1)+O(1) \Rightarrow S(n)=O(n)\)

  • insert_sort analysis

    • base case: for \(i=0\), the array has one element so is sorted.

    • induction: assume correct for indices smaller than \(i\). The algorithm sorts \(A[:i]\) by induction, and then insert_last correctly sorts the rest as proved above.

    • \(T(1)=O(1), T(n) = T(n-1) + S(n) = T(n-1) + O(n) \Rightarrow T(n)=O(n^2)\)
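The recursive insert_sort and insert_last above can also be written with two nested loops, which avoids Python's recursion limit on long arrays. This is a minimal sketch under that assumption; the function name insertion_sort_iterative is hypothetical.

```python
def insertion_sort_iterative(A):
    """Sort A in place; loop version of insert_sort / insert_last."""
    for i in range(1, len(A)):
        j = i
        # swap A[j] leftwards until the prefix A[:i+1] is sorted
        while j > 0 and A[j] < A[j - 1]:
            A[j - 1], A[j] = A[j], A[j - 1]
            j -= 1
    return A

print(insertion_sort_iterative([2, 9, 8, 4]))
# [2, 4, 8, 9]
```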

Merge Sort

  • recursively sort first half and second half

  • merge sorted halves into one sorted list (two-pointer algorithm)

  • example: \([7,1,5,6,2,4,9,3] \Rightarrow [1,7,5,6,2,4,3,9] \Rightarrow [1,5,6,7,2,3,4,9] \Rightarrow [1,2,3,4,5,6,7,9]\)

def merge(L, R, A, i, j, a, b):
    '''Merge sorted L[:i] and R[:j] into A[a:b] using two-pointer algorithm'''
    if a < b:
        if (j <= 0) or (i > 0 and L[i-1] > R[j-1]):
            A[b-1] = L[i-1]
            i = i - 1
        else:
            A[b-1] = R[j-1]
            j = j -1
        merge(L, R, A, i, j, a, b-1)

def merge_sort(A, a , b):
    '''Sort A[a:b]'''
    if 1 < b - a:
        c = (a + b + 1) // 2
        merge_sort(A, a, c)
        merge_sort(A, c, b)
        L, R = A[a:c], A[c:b]
        merge(L, R, A, len(L), len(R), a, b)
        
    return A

A=[2,6,3,9,8]
print(merge_sort(A, 0, len(A)))
[2, 3, 6, 8, 9]
def merge(L, R):
    """Merge two sorted arrays L and R into a new sorted array A"""
    A = []
    l, r = 0, 0

    # two-pointer scan: repeatedly take the smaller front item
    while l < len(L) and r < len(R):
        if L[l] <= R[r]:                # ties take from L first (keeps the merge stable)
            A.append(L[l])
            l += 1
        else:
            A.append(R[r])
            r += 1
    # one array has reached its end; append the remainder of the other
    A += L[l:]
    A += R[r:]

    return A

def sort(A):
    """Sort A by merge sorting"""
    if len(A) <= 1:
        return A

    # split point: the left half gets the extra item when len(A) is odd,
    # and for length 2 the split is 1, so both halves are non-empty
    m = (len(A) + 1) // 2
    
    L = A[:m]
    R = A[m:]
    LS = sort(L)
    RS = sort(R)

    AS = merge(LS, RS)

    return AS

L = [1,4,6,7]
R = [2,3,4]
print(merge(L, R))

A = L + R
print(sort(A))
[1, 2, 3, 4, 4, 6, 7]
[1, 2, 3, 4, 4, 6, 7]
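The running time of merge sort follows from a recurrence in the same way as the earlier sorts. As a sketch, assume \(n\) is a power of two and that merging two halves of total size \(n\) costs \(cn\) for some constant \(c > 0\), with \(T(1) = c\) (these are simplifying assumptions). Then \(T(n) = 2T(n/2) + cn\), and the recurrence tree has \(\log_2 n\) levels of internal nodes, each doing \(cn\) work in total, plus \(n\) leaves:

\[ T(n) = \sum_{k=0}^{\log_2 n - 1} 2^k \cdot c\,\frac{n}{2^k} + n\,T(1) = c\,n\log_2 n + c\,n = \Theta(n \log n) \]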

Summary

  • insertion_sort, selection_sort, and merge_sort require \(O(n^2), O(n^2), O(n \log n)\) time on arrays, respectively. They may have different time complexities on other data structures, depending on the cost of the get, set, and compare operations.

    • insertion_sort performs \(O(n^2)\) get, set, and compare operations. Static operations on arrays take \(O(1)\) time, so insertion sort on arrays takes \(O(n^2)\) time.

    • selection_sort performs \(O(n^2)\) get and compare operations, and \(O(n)\) set operations. So selection sort on arrays takes \(O(n^2)\) time.

    • merge_sort performs \(O(n \log n)\) get, set, and compare operations. So merge sort on arrays takes \(O(n \log n)\) time.

    • insert_sort and selection_sort sort in place: apart from the recursion stack of the versions above, they need only \(O(1)\) additional space. merge_sort requires \(O(n)\) additional space to merge two sorted sub-arrays.
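To make the gap between \(O(n^2)\) and \(O(n \log n)\) concrete, the following sketch counts key comparisons for insertion sort and merge sort on a reversed array of 64 items. Both functions are hypothetical counting variants written for this illustration, not the implementations above.

```python
def insertion_comparisons(A):
    """Count key comparisons made by insertion sort on a copy of A."""
    A, count = list(A), 0
    for i in range(1, len(A)):
        j = i
        while j > 0:
            count += 1
            if A[j] >= A[j - 1]:
                break
            A[j - 1], A[j] = A[j], A[j - 1]
            j -= 1
    return count

def merge_comparisons(A):
    """Return (sorted copy of A, number of key comparisons made by merge sort)."""
    if len(A) <= 1:
        return list(A), 0
    m = len(A) // 2
    L, count_l = merge_comparisons(A[:m])
    R, count_r = merge_comparisons(A[m:])
    out, l, r, count = [], 0, 0, count_l + count_r
    while l < len(L) and r < len(R):
        count += 1
        if L[l] <= R[r]:
            out.append(L[l])
            l += 1
        else:
            out.append(R[r])
            r += 1
    out += L[l:] + R[r:]
    return out, count

A = list(range(64, 0, -1))      # reversed input, the worst case for insertion sort
print(insertion_comparisons(A), merge_comparisons(A)[1])
# 2016 192
```

Here 2016 equals \(64 \cdot 63 / 2\) and 192 equals \(64 \log_2 64 / 2\), in line with the bounds in the summary.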