Lecture 3: Set and Sorting#
Set Interface#
Type | Function | Note |
---|---|---|
Container | | given an iterable \(x\), build set from items in \(x\) |
– | | return the number of stored values |
Static | | return the stored item with key \(k\) |
Dynamic | | add \(x\) to set |
– | | remove and return the stored item with key \(k\) |
Order | | return the stored items one-by-one in key order |
– | | return the item with smallest key |
– | | return the item with largest key |
– | | return the item with smallest key larger than \(k\) |
– | | return the stored item with largest key smaller than \(k\) |
storing items in an array in arbitrary order can implement a (not so efficient) set: find has to scan all \(n\) items, \(O(n)\)
storing items sorted increasing by key allows:
faster find min/max (at first/last index of array)
faster find via binary search: \(O(\log n)\)
how to construct a sorted array?
! Confused: set and sequence are two types of interface. So can an array, as a data structure, implement both interfaces?
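The sorted-array idea above can be sketched in a few lines. This is only an illustration, not code from the lecture: the class name `SortedArraySet` and its method names are hypothetical, keys are assumed to be distinct and comparable, and the binary search is delegated to Python's standard `bisect` module.

```python
from bisect import bisect_left, bisect_right

class SortedArraySet:
    """Hypothetical sketch: a set stored as a list of keys sorted increasing."""

    def __init__(self, items=()):
        self.keys = sorted(items)                     # build the sorted array once

    def find(self, k):
        """Return k if it is stored, else None; binary search, O(log n)."""
        i = bisect_left(self.keys, k)
        if i < len(self.keys) and self.keys[i] == k:
            return self.keys[i]
        return None

    def find_min(self):
        return self.keys[0] if self.keys else None    # first index, O(1)

    def find_max(self):
        return self.keys[-1] if self.keys else None   # last index, O(1)

    def find_next(self, k):
        """Smallest key larger than k, or None; O(log n)."""
        i = bisect_right(self.keys, k)
        return self.keys[i] if i < len(self.keys) else None

    def find_prev(self, k):
        """Largest key smaller than k, or None; O(log n)."""
        i = bisect_left(self.keys, k)
        return self.keys[i - 1] if i > 0 else None

S = SortedArraySet([8, 2, 4, 9, 3])
print(S.find(4), S.find(5), S.find_min(), S.find_max())  # 4 None 2 9
print(S.find_next(4), S.find_prev(4))                    # 8 3
```

Inserting or deleting a key would still cost \(O(n)\) here, because the items after the affected position have to be shifted.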
Sorting#
given a sorted array, we can leverage binary search to make an efficient set data structure
input: static array \(A\) of \(n\) numbers
output: static array \(B\) which is a sorted permutation of \(A\)
example: [8,2,4,9,3] -> [2,3,4,8,9]
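The input/output specification above can be turned into a small checker, which is handy for testing any of the sorting routines in these notes. The helper name `is_sorted_permutation` is an assumption made for this sketch; it compares multisets with `collections.Counter` and then checks the order.

```python
from collections import Counter

def is_sorted_permutation(A, B):
    """Return True if B contains exactly the items of A, in non-decreasing order."""
    same_items = Counter(A) == Counter(B)                          # B is a permutation of A
    in_order = all(B[k] <= B[k + 1] for k in range(len(B) - 1))    # B is sorted
    return same_items and in_order

print(is_sorted_permutation([8, 2, 4, 9, 3], [2, 3, 4, 8, 9]))  # True
print(is_sorted_permutation([8, 2, 4, 9, 3], [2, 4, 3, 8, 9]))  # False: not sorted
```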
Permutation Sort#
there are \(n!\) permutations of \(A\), at least one of which is sorted
for each permutation, check whether it is sorted, \(\Theta(n)\) time
permutation_sort analysis:
try all possibilities - brute force
running time \(O(n! \cdot n)\), which is at least exponential (\(n!\) grows faster than \(2^n\))
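from itertools import permutations

def is_sorted(B):
    '''Return True if B is in non-decreasing order'''
    return all(B[k] <= B[k + 1] for k in range(len(B) - 1))   # O(n)

def permutation_sort(A):
    '''Sort A'''
    for B in permutations(A):      # O(n!)
        if is_sorted(B):           # O(n)
            return list(B)         # O(n), permutations() yields tuples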
Recurrences#
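substitution: guess a solution, substitute it into the recurrence, and check that it holds
recurrence tree: draw a tree representing the recursive calls and sum the computation at the nodes
master theorem: a general formula that solves many common divide-and-conquer recurrences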
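As a worked illustration of the first two methods (added here as an example, with \(c\) a hypothetical constant standing in for the \(O(\cdot)\) terms), take the recurrence \(T(1) = c\), \(T(n) = T(n-1) + cn\). Its recurrence tree is a chain with \(ck\) work at the node for size \(k\), so summing over the nodes gives:

\[
T(n) = cn + T(n-1) = cn + c(n-1) + T(n-2) = \dots = c\sum_{k=1}^{n} k = \frac{c\,n(n+1)}{2} = O(n^2)
\]

Substituting the guess \(T(n) = \frac{c\,n(n+1)}{2}\) back into the recurrence confirms it:

\[
T(n-1) + cn = \frac{c\,(n-1)n}{2} + cn = \frac{c\,n(n+1)}{2} = T(n)
\]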
Selection Sort#
select the smallest (forward, \(i\) from \(0\) to \(n-1\)) or largest (reverse, \(i\) from \(n-1\) down to \(0\)) element from the remaining elements and place it at its correct position
forward
find a smallest number in the remaining array \(A[i:]\) and swap it to \(A[i]\)
example: [8,2,4,9,3] -> [2,8,4,9,3] -> [2,3,4,9,8] -> [2,3,4,9,8] -> [2,3,4,8,9]
reverse
find the largest number in prefix \(A[:i+1]\) and swap it to \(A[i]\)
recursively sort prefix \(A[:i]\)
example: [8,2,4,9,3] -> [8,2,4,3,9] -> [3,2,4,8,9] -> [3,2,4,8,9] -> [2,3,4,8,9]
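The forward variant described above can be sketched with plain loops. The function name `selection_sort_forward` is an assumption made for this example; the recursive code in these notes implements the reverse variant instead.

```python
def selection_sort_forward(A):
    """Sort list A in place by repeatedly moving the minimum of A[i:] to A[i]."""
    n = len(A)
    for i in range(n - 1):
        j = i                          # index of the smallest item seen so far in A[i:]
        for k in range(i + 1, n):
            if A[k] < A[j]:
                j = k
        A[i], A[j] = A[j], A[i]        # place the minimum at position i
    return A

print(selection_sort_forward([8, 2, 4, 9, 3]))  # [2, 3, 4, 8, 9]
```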
def selection_sort(A, i):                 # T(i)
    "Sort A[:i+1]"
    if i > 0:                             # O(1)
        # find the index j of a maximum in A[:i+1]
        j = find_max_prefix(A, i)         # S(i)
        # swap A[j] with A[i]
        A[i], A[j] = A[j], A[i]           # O(1)
        selection_sort(A, i-1)            # T(i-1)
    return A

def find_max_prefix(A, i):                # S(i)
    "Return the index of a maximum in prefix A[:i+1]"
    if i == 0:                            # O(1)
        return i                          # O(1)
    if i > 0:                             # O(1)
        j = find_max_prefix(A, i-1)       # S(i-1)
        if A[j] > A[i]:                   # O(1)
            return j                      # O(1)
        else:
            return i                      # O(1)
A=[6,2]
print(selection_sort(A, len(A)-1))
A=[2,6,3,9,8]
print(selection_sort(A, len(A)-1))
[2, 6]
[2, 3, 6, 8, 9]
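find_max_prefix analysis:
idea: for an array \(A\) and index \(i\), the maximum of \(A[:i+1]\) is either \(A[i]\) or the maximum of \(A[:i]\).
base case: for \(i=0\), the prefix has one element, so the index of the maximum is \(0\).
induction: assume correct for \(i-1\); the maximum of \(A[:i+1]\) is either \(A[i]\) or the maximum of \(A[:i]\), and the correct index is returned in either case.
\(S(1) = O(1), S(n) = S(n-1) + O(1)\)
substitution: \(S(n) = O(n)\)
recurrence tree: a chain of \(n\) nodes with \(O(1)\) work at each node, so \(O(n)\) in total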
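selection_sort analysis:
idea: recursively find the largest number in the prefix \(A[:i+1]\) for \(i = n-1, n-2, ..., 1\) and swap it to \(A[i]\)
\(T(1) = O(1), T(n) = T(n-1) + S(n) = T(n-1) + O(n) \Rightarrow T(n) = O(n^2)\)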
Insertion Sort#
build the final sorted array one item at a time by inserting each element into its position and shifting the remaining elements. A good animation can be found at https://en.wikipedia.org/wiki/Insertion_sort.
procedure using recursion:
assume we have an array \(A[:i+1]\) with the sorted prefix \(A[:i]\) and the new element \(A[i]\)
compare \(A[i-1]\) and \(A[i]\)
if \(A[i] \ge A[i-1]\), then \(A[:i+1]\) is already sorted; move the pointer \(i\) forward by 1 and repeat
else swap \(A[i]\) with \(A[i-1]\), and use the same insert_last procedure to re-sort \(A[:i]\). This is necessary because the swap may leave \(A[:i]\) unsorted. See the last step in the following example.
[2, 9, 8, 4] -> [2, 9, 8, 4] -> [2, 8, 9, 4] -> [2, 4, 8, 9]
The last step internally proceeds as follows:
[2, 8, 9, 4] -> [2, 8, 4, 9] -> [2, 4, 8, 9]
def insert_last(A, i):
    '''Sort A[:i+1] assuming sorted A[:i]'''
    if i > 0 and A[i] < A[i-1]:
        A[i-1], A[i] = A[i], A[i-1]
        insert_last(A, i-1)
    return A

def insert_sort(A, i):
    '''Sort A[:i+1]'''
    if i > 0:
        insert_sort(A, i-1)
        insert_last(A, i)
    return A
print(insert_last([2, 8, 9, 3], 3))
print(insert_sort([2, 8, 9, 3], 3))
print(insert_last([2, 9, 8, 3], 3))
print(insert_sort([2, 9, 8, 3], 3))
[2, 3, 8, 9]
[2, 3, 8, 9]
[2, 3, 9, 8]
[2, 3, 8, 9]
insert_last analysis:
base case: for \(i=0\), array has one element so is sorted.
induction: assume correct for \(i-1\); if \(A[i] \ge A[i-1]\), the array is sorted. Otherwise, swapping the last two elements allows us to sort \(A[:i]\) by induction.
\(S(1) = O(1), S(n)=S(n-1)+O(1) \Rightarrow S(n)=O(n)\)
insert_sort analysis:
base case: for \(i=0\), array has one element so is sorted.
induction: assume correct for \(i-1\); the algorithm sorts \(A[:i]\) by induction, and then insert_last correctly sorts the rest as proved above.
\(T(1)=O(1), T(n) = T(n-1) + S(n) = T(n-1) + O(n) \Rightarrow T(n)=O(n^2)\)
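The recursive versions above recurse to depth \(n\), so in CPython they are bounded by the interpreter's recursion limit (1000 by default). A loop-based sketch of the same procedure avoids that; the name `insertion_sort_iterative` is an assumption made for this example, and the inner loop plays the role of insert_last.

```python
def insertion_sort_iterative(A):
    """Sort list A in place; same swaps as insert_sort, written with loops."""
    for i in range(1, len(A)):
        j = i
        # walk A[i] leftwards while it is smaller than its left neighbour
        while j > 0 and A[j] < A[j - 1]:
            A[j - 1], A[j] = A[j], A[j - 1]
            j -= 1
    return A

print(insertion_sort_iterative([2, 9, 8, 4]))  # [2, 4, 8, 9]
```

The running time is still \(O(n^2)\) in the worst case; only the call stack usage changes.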
Merge Sort#
recursively sort first half and second half
merge sorted halves into one sorted list (two-pointer algorithm)
example: \([7,1,5,6,2,4,9,3] \Rightarrow [1,7,5,6,2,4,3,9] \Rightarrow [1,5,6,7,2,3,4,9] \Rightarrow [1,2,3,4,5,6,7,9]\)
def merge(L, R, A, i, j, a, b):
    '''Merge sorted L[:i] and R[:j] into A[a:b] using two-pointer algorithm'''
    if a < b:
        if (j <= 0) or (i > 0 and L[i-1] > R[j-1]):
            A[b-1] = L[i-1]
            i = i - 1
        else:
            A[b-1] = R[j-1]
            j = j - 1
        merge(L, R, A, i, j, a, b-1)

def merge_sort(A, a, b):
    '''Sort A[a:b]'''
    if 1 < b - a:
        c = (a + b + 1) // 2
        merge_sort(A, a, c)
        merge_sort(A, c, b)
        L, R = A[a:c], A[c:b]
        merge(L, R, A, len(L), len(R), a, b)
    return A
A=[2,6,3,9,8]
print(merge_sort(A, 0, len(A)))
[2, 3, 6, 8, 9]
def merge(L, R):
    """Merge two sorted arrays L and R into a new sorted array A"""
    A = []
    l, r = 0, 0
    while l <= len(L) - 1 and r <= len(R) - 1:
        if L[l] < R[r]:
            A.append(L[l])
            l += 1
        elif L[l] > R[r]:
            A.append(R[r])
            r += 1
        else:
            A.append(L[l])
            A.append(R[r])
            l += 1
            r += 1
    # if one of the arrays reaches the end, append the rest of the other
    if l == len(L):
        A += R[r:]
    elif r == len(R):
        A += L[l:]
    return A

def sort(A):
    """Sort A by merge sort and return a new sorted array"""
    if len(A) <= 1:
        return A
    l, r = 0, len(A) - 1
    # be careful of mid pointer. special case is length of 2, m should be 1, not 0
    m = (r + l) // 2 + 1
    L = A[:m]
    R = A[m:]
    LS = sort(L)
    RS = sort(R)
    AS = merge(LS, RS)
    return AS
L = [1,4,6,7]
R = [2,3,4]
print(merge(L, R))
A = L + R
print(sort(A))
[1, 2, 3, 4, 4, 6, 7]
[1, 2, 3, 4, 4, 6, 7]
Summary#
insert_sort, selection_sort and merge_sort require \(O(n^2)\), \(O(n^2)\) and \(O(n \log n)\) time on arrays, respectively. They may have different time complexity on other data structures, depending on the cost of the operations get, set and compare.
insert_sort performs \(O(n^2)\) get, set and compare operations. Static operations on arrays take \(O(1)\) time, so this sort on arrays uses \(O(n^2)\) time.
selection_sort performs \(O(n^2)\) get and compare operations, and \(O(n)\) set operations. So this sort on arrays uses \(O(n^2)\) time.
merge_sort performs \(O(n \log n)\) get, set and compare operations. So this sort on arrays uses \(O(n \log n)\) time.
insert_sort and selection_sort sort in place, so they do not require an additional array. merge_sort requires additional \(O(n)\) space to merge two sorted sub-arrays.
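To make the operation counts above tangible, the following sketch counts key comparisons for an insertion sort and a merge sort on the same random inputs. It is an illustration under assumptions, not code from the lecture: both functions are loop-based rewrites with hypothetical names, and they work on copies so that the input is left unchanged.

```python
import random

def insertion_sort_comparisons(A):
    """Sort a copy of A by insertion and return the number of key comparisons."""
    A, count = list(A), 0
    for i in range(1, len(A)):
        j = i
        while j > 0:
            count += 1                      # one compare of A[j] with A[j-1]
            if A[j] >= A[j - 1]:
                break
            A[j - 1], A[j] = A[j], A[j - 1]
            j -= 1
    return count

def merge_sort_comparisons(A):
    """Return (sorted copy of A, number of key comparisons)."""
    if len(A) <= 1:
        return list(A), 0
    m = len(A) // 2
    L, count_left = merge_sort_comparisons(A[:m])
    R, count_right = merge_sort_comparisons(A[m:])
    out, l, r = [], 0, 0
    count = count_left + count_right
    while l < len(L) and r < len(R):
        count += 1                          # one compare of L[l] with R[r]
        if L[l] <= R[r]:
            out.append(L[l])
            l += 1
        else:
            out.append(R[r])
            r += 1
    out += L[l:] + R[r:]                    # one side is already empty here
    return out, count

random.seed(0)
for n in (100, 200, 400):
    A = [random.random() for _ in range(n)]
    print(n, insertion_sort_comparisons(A), merge_sort_comparisons(A)[1])
```

Doubling \(n\) should roughly quadruple the insertion sort count and slightly more than double the merge sort count, in line with \(O(n^2)\) versus \(O(n \log n)\).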