Lecture 3: Set and Sorting#

Set Interface#

| Type      | Function     | Note                                                      |
|-----------|--------------|-----------------------------------------------------------|
| Container | build(x)     | given an iterable x, build set from items in x            |
|           | len(x)       | return the number of stored values                        |
| Static    | find(k)      | return the stored item with key k                         |
| Dynamic   | insert(x)    | add x to set                                              |
|           | delete(k)    | remove and return the stored item with key k              |
| Order     | iter_ord()   | return the stored items one-by-one in key order           |
|           | find_min()   | return the stored item with smallest key                  |
|           | find_max()   | return the stored item with largest key                   |
|           | find_next(k) | return the stored item with smallest key larger than k    |
|           | find_prev(k) | return the stored item with largest key smaller than k    |

  • storing items in an array in arbitrary order can implement a (not so efficient) set

  • storing the items sorted in increasing key order allows:

    • faster find min/max (at first/last index of array)

    • faster find via binary search: O(log n)

  • how to construct a sorted array?
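To make the sorted-array idea concrete, here is a minimal sketch of the static and order operations on top of a sorted Python list. The class name `SortedArraySet` is hypothetical, items are assumed to be their own keys, and the standard `bisect` module stands in for a hand-written binary search.

```python
from bisect import bisect_left, bisect_right

class SortedArraySet:
    """Hypothetical sorted-array set; each item is its own key."""

    def __init__(self, items=()):
        # build(x): sorting dominates, O(n log n); duplicate keys are dropped
        self.A = sorted(set(items))

    def __len__(self):
        return len(self.A)

    def find(self, k):
        # binary search, O(log n)
        i = bisect_left(self.A, k)
        if i < len(self.A) and self.A[i] == k:
            return self.A[i]
        return None

    def find_min(self):
        return self.A[0] if self.A else None      # first index, O(1)

    def find_max(self):
        return self.A[-1] if self.A else None     # last index, O(1)

    def find_next(self, k):
        # smallest key strictly larger than k, O(log n)
        i = bisect_right(self.A, k)
        return self.A[i] if i < len(self.A) else None

    def find_prev(self, k):
        # largest key strictly smaller than k, O(log n)
        i = bisect_left(self.A, k)
        return self.A[i - 1] if i > 0 else None

    def iter_ord(self):
        yield from self.A


S = SortedArraySet([8, 2, 4, 9, 3])
print(S.find(4), S.find(5))            # 4 None
print(S.find_min(), S.find_max())      # 2 9
print(S.find_next(4), S.find_prev(4))  # 8 3
```

The dynamic operations insert and delete are left out on purpose: on a sorted array they would need to shift items, which costs O(n) per call.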

! Confused: set and sequence are two kinds of interface. So can an array, as a single data structure, implement both interfaces?

Sorting#

  • given a sorted array, we can leverage binary search to make an efficient set data structure

  • input: static array A of n numbers

  • output: static array B which is a sorted permutation of A

  • example: [8,2,4,9,3] -> [2,3,4,8,9]
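The output condition has two parts: B must contain exactly the items of A, and B must be in order. A small checker makes this explicit. The function name `is_sorted_permutation` is an assumption for illustration, and the sketch assumes hashable items such as numbers.

```python
from collections import Counter

def is_sorted_permutation(A, B):
    """Return True if B is a sorted permutation of A."""
    same_items = Counter(A) == Counter(B)                        # same multiset of items
    in_order = all(B[i] <= B[i + 1] for i in range(len(B) - 1))  # non-decreasing order
    return same_items and in_order

print(is_sorted_permutation([8, 2, 4, 9, 3], [2, 3, 4, 8, 9]))  # True
print(is_sorted_permutation([8, 2, 4, 9, 3], [2, 3, 4, 8]))     # False: 9 is missing
```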

Permutation Sort#

  • there are n! permutations of A, at least one of which is sorted

  • for each permutation, check whether it is sorted, in Θ(n) time

  • permutation_sort analysis:

    • try all possibilities - brute force

    • running time O(n! · n); since n! grows faster than any exponential c^n, this is impractical beyond tiny inputs

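from itertools import permutations

def is_sorted(B):
    '''Return True if B is in non-decreasing order'''
    return all(B[i] <= B[i + 1] for i in range(len(B) - 1))   # O(n)

def permutation_sort(A):
    '''Sort A by trying every permutation'''
    for B in permutations(A):               # O(n!) permutations
        if is_sorted(B):                    # O(n) check
            return B                        # O(1); note that B is a tuple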
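To see the factorial growth directly, the sketch below counts how many permutations are examined before a sorted one turns up. The helper name `count_checks` is hypothetical. It relies on `itertools.permutations` emitting tuples in lexicographic order of input positions, so a reverse-sorted input with distinct items reaches its sorted permutation last.

```python
from itertools import permutations
from math import factorial

def count_checks(A):
    """Count permutations examined until the first sorted one is found."""
    checks = 0
    for B in permutations(A):
        checks += 1
        if all(B[i] <= B[i + 1] for i in range(len(B) - 1)):
            return checks
    return checks

for n in range(1, 8):
    worst = list(range(n, 0, -1))       # reverse-sorted input, distinct items
    print(n, count_checks(worst), factorial(n))
```

On these inputs the second and third columns agree, while an already sorted input needs a single check.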

Recurrences#

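  • substitution: guess a solution, substitute it into the recurrence, and check that the recurrence holds

  • recurrence tree: draw a tree representing the recursive calls and sum computation at nodes

  • master theorem: a formula that solves many recurrences of the form T(n) = a T(n/b) + f(n)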
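As a worked illustration of substitution, take the recurrence that shows up for the quadratic sorts later in these notes, with the O(n) term written as c·n for an assumed constant c. Guess a closed form, plug it in, and check that both the base case and the recurrence hold:

```latex
\begin{align*}
T(1) &= c, \qquad T(n) = T(n-1) + c\,n \\
\text{guess: } T(n) &= \tfrac{c}{2}\,n(n+1) \\
\text{base: } T(1) &= \tfrac{c}{2}\cdot 1 \cdot 2 = c \\
\text{step: } T(n-1) + c\,n &= \tfrac{c}{2}(n-1)n + c\,n
  = \tfrac{c}{2}\,n\bigl((n-1)+2\bigr) = \tfrac{c}{2}\,n(n+1)
\end{align*}
```

The guess satisfies the recurrence, so T(n) = Θ(n^2).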

Selection Sort#

  • select the smallest (forward, i from 0 to n-1) or the largest (reverse, i from n-1 to 0) of the remaining elements and place it at its correct position

  • forward

    • find the smallest number in the remaining suffix A[i:] and swap it into A[i]

    • example: [8,2,4,9,3] -> [2,8,4,9,3] -> [2,3,4,9,8] -> [2,3,4,9,8] -> [2,3,4,8,9]

  • reverse

    • find the largest number in prefix A[:i+1] and swap it to A[i]

    • recursively sort prefix A[:i]

    • example: [8,2,4,9,3] -> [8,2,4,3,9] -> [3,2,4,8,9] -> [3,2,4,8,9] -> [2,3,4,8,9]

def selection_sort(A, i):                           # T(i)
    "Sort A[:i+1]"                                  
    if i > 0:                                       # O(1)
        # find index j of the maximum in A[:i+1]
        j = find_max_prefix(A, i)                   # S(i)
        # swap A[j] with A[i]
        A[i], A[j] = A[j], A[i]                     # O(1)
        selection_sort(A, i-1)                      # T(i-1)

    return A

def find_max_prefix(A, i):                          # S(i)
    "Return the index of the maximum in prefix A[:i+1]"
    if i == 0:                                      # O(1)
        return i                                    # O(1)
    if i > 0:                                       # O(1)
        j = find_max_prefix(A, i-1)                 # S(i-1)
        if A[j] > A[i]:                             # O(1)
            return j                                # O(1)
        else:
            return i                                # O(1)  

A=[6,2]
print(selection_sort(A, len(A)-1))

A=[2,6,3,9,8]
print(selection_sort(A, len(A)-1))
[2, 6]
[2, 3, 6, 8, 9]
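  • find_max_prefix analysis:

    • idea:

      • for an array A and index i, the maximum of the prefix A[:i+1] is either A[i] or the maximum of A[:i]

      • so find the maximum of A[:i] recursively and compare it with A[i]

    • base case: for i = 0, the prefix has one element, so the index of the maximum is 0

    • induction: assume correct for all smaller indices; the maximum of A[:i+1] is either A[i] or the maximum of A[:i], and the correct index is returned in either case

    • S(1) = O(1), S(n) = S(n-1) + O(1)

      • substitution: S(n) = O(n)

      • recurrence tree: a chain of n nodes with O(1) work each, so O(n) in total

  • selection_sort analysis:

    • idea:

      • recursively find the largest number in the prefix A[:i+1] for i = n-1, n-2, ..., 1 and swap it into A[i]

    • T(1) = O(1), T(n) = T(n-1) + S(n) = T(n-1) + O(n) ⇒ T(n) = O(n^2)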
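The code in this section implements the reverse variant. For comparison, here is a minimal iterative sketch of the forward variant described earlier; the name `selection_sort_forward` is an assumption, and the `print` call is only there to show the array after each pass.

```python
def selection_sort_forward(A):
    """Sort A in place by repeatedly moving the minimum of A[i:] to A[i]."""
    n = len(A)
    for i in range(n - 1):
        # find index j of a smallest item in the suffix A[i:]
        j = i
        for k in range(i + 1, n):
            if A[k] < A[j]:
                j = k
        A[i], A[j] = A[j], A[i]   # one swap per position, O(n) swaps overall
        print(A)                  # show the array after each pass
    return A

selection_sort_forward([8, 2, 4, 9, 3])
```

The printed arrays reproduce the forward example: [2, 8, 4, 9, 3], [2, 3, 4, 9, 8], [2, 3, 4, 9, 8], [2, 3, 4, 8, 9].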

Insertion Sort#

  • build the final sorted array one item at a time by inserting each element into its position in the sorted prefix and shifting the larger elements to the right. A good animation can be found at https://en.wikipedia.org/wiki/Insertion_sort.

  • procedure using recursion

    • assume we have an array A[:i+1] whose prefix A[:i] is sorted, plus the new element A[i]

    • compare A[i-1] and A[i]

      • if A[i] >= A[i-1], then A[:i+1] is already sorted; advance i by 1 and repeat

      • else swap A[i] with A[i-1], and apply the same insert_last procedure to the prefix A[:i]. This is necessary because the swap may leave A[:i] unsorted; see the last step in the following example.

        [2, 9, 8, 4] -> [2, 9, 8, 4] -> [2, 8, 9, 4] -> [2, 4, 8, 9] 
        

        The last step internally proceeds as follows:

        [2, 8, 9, 4] -> [2, 8, 4, 9] -> [2, 4, 8, 9]
        
def insert_last(A, i):
    '''Sort A[:i+1] assuming sorted A[:i]'''
    if i > 0 and A[i] < A[i-1]:
        A[i-1], A[i] = A[i], A[i-1]
        insert_last(A, i-1)
    return A

def insert_sort(A, i):
    '''Sort A[:i+1]'''
    if i > 0:
        insert_sort(A, i-1)
        insert_last(A, i)
    return A

print(insert_last([2, 8, 9, 3], 3))
print(insert_sort([2, 8, 9, 3], 3))

print(insert_last([2, 9, 8, 3], 3)) 
print(insert_sort([2, 9, 8, 3], 3))
[2, 3, 8, 9]
[2, 3, 8, 9]
[2, 3, 9, 8]
[2, 3, 8, 9]
  • insert_last analysis

    • base case: for i = 0, the array has one element, so it is sorted.

    • induction: assume correct for all smaller indices. If A[i] >= A[i-1], the array A[:i+1] is already sorted. Otherwise, swapping the last two elements puts the largest element at A[i] and allows us to sort A[:i] by induction.

    • S(1) = O(1), S(n) = S(n-1) + O(1) ⇒ S(n) = O(n)

  • insert_sort analysis

    • base case: for i = 0, the array has one element, so it is sorted

    • induction: assume correct for all smaller indices; the algorithm sorts A[:i] by induction, and then insert_last correctly sorts A[:i+1], as proved above

    • T(1) = O(1), T(n) = T(n-1) + S(n) = T(n-1) + O(n) ⇒ T(n) = O(n^2)
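The recursive versions above nest one call per array index, so on long arrays they would likely hit Python's default recursion limit (about 1000 frames). A loop-based sketch that performs the same adjacent swaps avoids this; the name `insertion_sort_iterative` is an assumption for illustration.

```python
def insertion_sort_iterative(A):
    """Sort A in place with the same adjacent swaps as insert_sort, using loops."""
    for i in range(1, len(A)):
        j = i
        # swap A[j] leftward until the prefix A[:i+1] is sorted
        while j > 0 and A[j] < A[j - 1]:
            A[j - 1], A[j] = A[j], A[j - 1]
            j -= 1
    return A

print(insertion_sort_iterative([2, 9, 8, 4]))  # [2, 4, 8, 9]
```

The outer loop plays the role of insert_sort and the inner loop the role of insert_last, so the O(n^2) bound carries over unchanged.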

Merge Sort#

  • recursively sort first half and second half

  • merge sorted halves into one sorted list (two-pointer algorithm)

  • example: [7,1,5,6,2,4,9,3] -> [1,7,5,6,2,4,3,9] -> [1,5,6,7,2,3,4,9] -> [1,2,3,4,5,6,7,9]

def merge(L, R, A, i, j, a, b):
    '''Merge sorted L[:i] and R[:j] into A[a:b] using two-pointer algorithm'''
    if a < b:
        if (j <= 0) or (i > 0 and L[i-1] > R[j-1]):
            A[b-1] = L[i-1]
            i = i - 1
        else:
            A[b-1] = R[j-1]
            j = j -1
        merge(L, R, A, i, j, a, b-1)

def merge_sort(A, a , b):
    '''Sort A[a:b]'''
    if 1 < b - a:
        c = (a + b + 1) // 2
        merge_sort(A, a, c)
        merge_sort(A, c, b)
        L, R = A[a:c], A[c:b]
        merge(L, R, A, len(L), len(R), a, b)
        
    return A

A=[2,6,3,9,8]
print(merge_sort(A, 0, len(A)))
[2, 3, 6, 8, 9]
def merge(L, R):
    """Merge two sorted arrays L and R into a sorted array A"""
    A = []
    l, r = 0, 0

    while l <= len(L) - 1 and r <= len(R) - 1:
        if L[l] < R[r]:
            A.append(L[l])
            l += 1
        elif L[l] > R[r]:
            A.append(R[r])
            r += 1
        else:
            A.append(L[l])
            A.append(R[r])
            l += 1
            r += 1
    # if one of the arrays reaches its end, append the rest of the other
    if l == len(L):
        A += R[r:]
    elif r == len(R):
        A += L[l:] 

    return A    

def sort(A):
    """Sort A by merge sorting"""
    if len(A) <= 1:
        return A

    l, r = 0, len(A) - 1
    # be careful of mid pointer. special case is length of 2, m should be 1, not 0
    m = (r + l) // 2 + 1 
    
    L = A[:m]
    R = A[m:]
    LS = sort(L)
    RS = sort(R)

    AS = merge(LS, RS)

    return AS

L = [1,4,6,7]
R = [2,3,4]
print(merge(L, R))

A = L + R
print(sort(A))
[1, 2, 3, 4, 4, 6, 7]
[1, 2, 3, 4, 4, 6, 7]
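Both merge sort versions split the array in half, recurse on each half, and spend linear time merging. A recurrence-tree sketch of the running time, assuming n is a power of two and a constant c bounds the merge cost per item: level k of the tree has 2^k subproblems of size n/2^k, so every level costs about c·n, and there are log_2 n + 1 levels.

```latex
\begin{align*}
T(1) &= \Theta(1), \qquad T(n) = 2\,T(n/2) + \Theta(n) \\
T(n) &= \sum_{k=0}^{\log_2 n} 2^{k} \cdot c\,\frac{n}{2^{k}}
      = c\,n\,(\log_2 n + 1) = \Theta(n \log n)
\end{align*}
```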

Summary#

  • insertion_sort, selection_sort, and merge_sort require O(n^2), O(n^2), and O(n log n) time on arrays, respectively. They may have different time complexities on other data structures, depending on the cost of the get, set, and compare operations.

    • insertion_sort performs O(n^2) gets, sets, and compares. These static operations take O(1) time on arrays, so insertion sort on arrays uses O(n^2) time.

    • selection_sort performs O(n^2) gets and compares, and O(n) sets. So selection sort on arrays uses O(n^2) time.

    • merge_sort performs O(n log n) gets, sets, and compares. So merge sort on arrays uses O(n log n) time.

    • insertion_sort and selection_sort sort in place: they need only O(1) extra space besides the array (not counting the call stack of the recursive versions above). merge_sort needs O(n) additional space to merge two sorted sub-arrays.
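The operation counts can be observed by instrumenting the sorts. The sketch below counts only compares, on reverse-sorted inputs, for loop-based variants of insertion sort and merge sort; the function names are hypothetical and the code is a rough experiment, not a benchmark.

```python
def insertion_sort_count(A):
    """Return the number of compares insertion sort makes on a copy of A."""
    A, count = list(A), 0
    for i in range(1, len(A)):
        j = i
        while j > 0:
            count += 1                    # one compare of A[j] with A[j-1]
            if A[j] >= A[j - 1]:
                break
            A[j - 1], A[j] = A[j], A[j - 1]
            j -= 1
    return count

def merge_sort_count(A):
    """Return (sorted copy of A, number of compares made while merging)."""
    if len(A) <= 1:
        return list(A), 0
    m = len(A) // 2
    L, count_l = merge_sort_count(A[:m])
    R, count_r = merge_sort_count(A[m:])
    out, i, j, count = [], 0, 0, count_l + count_r
    while i < len(L) and j < len(R):
        count += 1                        # one compare of L[i] with R[j]
        if L[i] <= R[j]:
            out.append(L[i])
            i += 1
        else:
            out.append(R[j])
            j += 1
    out += L[i:] + R[j:]                  # at most one of these is non-empty
    return out, count

for n in (16, 64, 256):
    worst = list(range(n, 0, -1))         # reverse-sorted input
    print(n, insertion_sort_count(worst), merge_sort_count(worst)[1])
```

On reverse-sorted input the insertion sort count is n(n-1)/2, while the merge sort count stays within a constant factor of n log n, which matches the bounds in the summary.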