Lecture 2: Data Structures and Dynamic Arrays#
Data Structure vs Interfaces#
data structure
A data structure is a way to store data, together with algorithms that support operations on that data
2 main kinds of data structures: array-based and pointer-based
interfaces
A collection of supported operations is called an interface (also API or ADT)
An interface is a specification: it says what operations are supported (the problem!)
2 main interfaces: sequence and set
Interfaces#
Sequence interface -> maintain a sequence of items \(x_0, x_1, ..., x_{n-1}\) subject to these operations:
Container
- build(X): make a new data structure from the items in X
- len(): return n
Static
- iter_seq(): output \(x_0, x_1, ..., x_{n-1}\) in sequence order
- get_at(i): return \(x_i\)
- set_at(i, x): set \(x_i\) to x
Dynamic
- insert_at(i, x): insert x so that it becomes \(x_i\)
- delete_at(i): delete and return \(x_i\)
- insert_first(x) / insert_last(x)
- delete_first() / delete_last()
Set interface -> future lectures
Static Array Sequence#
Static array -> Python has no static arrays, only dynamic arrays.
word RAM -> model of computation
- memory: an array of \(w\)-bit words
- array: a consecutive chunk of memory
- array[i] = memory[address(array) + i]
- array access in \(O(1)\) time
- the size of a static array is fixed
\(O(1)\) per get_at(i), set_at(i, x), len()
\(O(n)\) per build(X), iter_seq()
Memory allocation model: allocate an array of size n in \(\Theta(n)\) time
The constant-time operations above assume \(w \ge \log n\) (a word is large enough to hold an index)
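To make the word-RAM picture concrete, here is a toy simulation of my own (not from the lecture) in which memory is one flat list of words and an "array" is just a base address plus an offset:

```python
# Toy word-RAM: memory is one flat array of words, and an "array"
# is nothing more than a base address into that memory.
memory = [0] * 64

def array_get(address, i):
    # array[i] = memory[address(array) + i]: a single O(1) access
    return memory[address + i]

def array_set(address, i, x):
    memory[address + i] = x

base = 8  # pretend an array of size 4 was allocated at address 8
for i in range(4):
    array_set(base, i, 10 * i)
```

Accessing any slot is constant time precisely because the address arithmetic `address + i` is a single word operation.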
Apply the dynamic sequence interface to a static array:
insert_at(i, x) / delete_at(i) cost \(\Theta(n)\) time, and the same holds for insert_last / delete_last:
- shift all items after the modified item
- allocate a new array and throw away the old one
=> not efficient for dynamic operations
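As a sketch of why static-array insertion costs \(\Theta(n)\), here is a hypothetical insert_at that allocates a fresh array and shifts the suffix (the class and method names are mine, not from the lecture):

```python
class StaticArraySequence:
    """Hypothetical sequence backed by a fixed-size array.

    A Python list stands in for the fixed memory block; every insert
    allocates a brand-new block and copies, so insert_at is Theta(n).
    """

    def __init__(self, items=()):
        self.A = list(items)  # fixed-size block holding n items

    def get_at(self, i):
        return self.A[i]  # O(1) word-RAM array access

    def set_at(self, i, x):
        self.A[i] = x  # O(1)

    def insert_at(self, i, x):
        # Allocate a new array of size n + 1 (Theta(n) allocation),
        # copy the prefix, place x, then shift the suffix right by one.
        B = [None] * (len(self.A) + 1)
        B[:i] = self.A[:i]
        B[i] = x
        B[i + 1:] = self.A[i:]
        self.A = B
```

Even insert_at(len, x) pays the full \(\Theta(n)\) copy here, because the old allocation cannot grow in place.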
Dynamic Array Sequence#
Dynamic arrays; e.g., list in Python
Goal: make an array efficient for dynamic operations, especially insert_last / delete_last
Idea: allocate extra space so reallocation doesn't occur with every dynamic operation
- relax the constraint size(array) = n, where n = # of items in the sequence
- enforce size(array) = \(\Theta(n)\) and size \(\ge n\), i.e., the array may be larger than the number of items it holds
- maintain \(A[i] = x_i\) for \(0 \le i < n\)
insert_last(x) runs in \(O(1)\) time while extra space remains:
A[len] = x; len += 1
fill ratio: \(0 \le r \le 1\), the ratio of items to allocated space. With fill ratio r, storing n items takes \(n/r\) space instead of exactly \(n\) as in a static array.
whenever the array is full (\(r = 1\)), allocate \(\Theta(n)\) extra space at the end to bring the fill ratio down to some \(r_i\) (e.g., 1/2)
then \(\Theta(n)\) items must be inserted before the array is full again and the next reallocation occurs
example: n calls to insert_last starting from an empty array
- whenever the array is full, allocate a new array of size 2 * size
- resizes occur at n = 1, 2, 4, 8, 16, ...
- total resize cost: \(\Theta(1 + 2 + 4 + 8 + \cdots + 2^{\log n}) = \Theta(2^{\log n}) = \Theta(n)\)
- why not size + 5? then a \(\Theta(n)\)-cost copy would occur every 5 insertions, for \(\Theta(n^2)\) total
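The doubling scheme above can be sketched as follows (a minimal illustration with names of my own choosing; CPython's real list uses a similar but more finely tuned growth policy):

```python
class DynamicArray:
    """Dynamic array with table doubling (sketch)."""

    def __init__(self):
        self.n = 0                 # number of items stored
        self.size = 1              # allocated capacity, kept Theta(n)
        self.A = [None] * self.size

    def insert_last(self, x):
        if self.n == self.size:           # full: fill ratio r = 1
            self._resize(2 * self.size)   # double; r drops to 1/2
        self.A[self.n] = x                # A[len] = x; len += 1
        self.n += 1

    def _resize(self, new_size):
        # Theta(n) allocation + copy, but it only happens when full
        B = [None] * new_size
        B[:self.n] = self.A[:self.n]
        self.A, self.size = B, new_size
```

Between resizes, insert_last is a plain \(O(1)\) write into the reserved slack.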
time complexity
- a single operation can take \(\Theta(n)\) time for reallocation
- however, any sequence of \(n\) operations takes \(\Theta(n)\) time in total
- so each operation takes \(\Theta(1)\) time "on average"
- see "Amortized Analysis"
If a user continues to append elements to a dynamic array, any reserved capacity will eventually be exhausted.
In that case, the class requests a new, larger array from the system, and initializes the new array so that its prefix matches the contents of the existing smaller array.
At that point, the old array is no longer needed, so it is reclaimed by the system.
Dynamic array deletion
- delete from the back (delete_last)? \(\Theta(1)\) time
- however, wasteful in space: we want the size of the data structure to stay \(\Theta(n)\)
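One common way (not spelled out above) to keep the allocation at \(\Theta(n)\) under deletions is to shrink as well, e.g. halve the capacity when the array drops to a quarter full; a sketch, with all names my own:

```python
class ShrinkingArray:
    """Dynamic array whose capacity stays Theta(n) under deletions.

    Halving only at quarter-full (rather than half-full) avoids
    thrashing between grow and shrink near the boundary.
    """

    def __init__(self):
        self.n, self.size = 0, 1
        self.A = [None]

    def insert_last(self, x):
        if self.n == self.size:
            self._resize(2 * self.size)
        self.A[self.n] = x
        self.n += 1

    def delete_last(self):
        x = self.A[self.n - 1]
        self.A[self.n - 1] = None         # clear the vacated slot
        self.n -= 1
        if 0 < self.n <= self.size // 4:  # fill ratio r <= 1/4
            self._resize(self.size // 2)  # shrink back to r = 1/2
        return x

    def _resize(self, new_size):
        B = [None] * new_size
        B[:self.n] = self.A[:self.n]
        self.A, self.size = B, new_size
```

With this rule, the capacity is always within a constant factor of n, and both operations remain \(O(1)\) amortized.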
Linked List Sequence#
Linked List
pointer-based: a pointer is an index into the memory array
Each item is stored in a node, which contains a pointer to the next node in the sequence
Each node has two fields: node.item and node.next
Can manipulate nodes simply by relinking pointers!
Maintain a pointer to the first node in the sequence (called the head)
Apply the dynamic sequence interface to a linked list:
- insert_first / delete_first: very efficient, \(\Theta(1)\) time, just relink pointers
- accessing the ith item requires following \(i\) next pointers
- get_at(i) or set_at(i, x): \(\Theta(i)\) time, so the worst case is \(\Theta(n)\)
- insert_last: \(O(n)\)
- how to improve: also maintain a tail pointer to the last node -> insert_last in \(O(1)\); see also doubly linked lists
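The pointer relinking can be sketched as follows (hypothetical class names, not from the lecture):

```python
class Node:
    """One node: an item plus a pointer to the next node."""

    def __init__(self, item, nxt=None):
        self.item = item   # node.item: the stored value
        self.next = nxt    # node.next: pointer to the next node


class LinkedListSeq:
    """Singly linked list with a head pointer (sketch)."""

    def __init__(self):
        self.head = None

    def insert_first(self, x):
        # Theta(1): just relink pointers at the head
        self.head = Node(x, self.head)

    def delete_first(self):
        # Theta(1): advance the head pointer
        x = self.head.item
        self.head = self.head.next
        return x

    def get_at(self, i):
        # Theta(i): follow i next pointers from the head
        node = self.head
        for _ in range(i):
            node = node.next
        return node.item
```

Note how insert_first never touches any node but the new one, which is exactly why it beats the array's \(\Theta(n)\) shift.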
Amortized Analysis#
Data structure analysis technique that distributes cost over many operations
An operation has amortized cost \(T(n)\) if any \(k\) operations cost at most \(k \cdot T(n)\) in total
"\(T(n)\) amortized" roughly means \(T(n)\) "on average" over many operations
Inserting into a dynamic array takes \(Θ(1)\) amortized time
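A quick way to check that bound: count the element copies table doubling performs over n appends; the total is \(1 + 2 + 4 + \cdots < 2n\), i.e. \(O(1)\) per append. A small counter of my own devising:

```python
def doubling_copy_count(n):
    """Count element copies done by table doubling over n appends."""
    size, copies = 1, 0
    for length in range(n):   # 'length' items are stored before this append
        if length == size:    # array is full: double capacity, copy everything
            copies += length
            size *= 2
    return copies
```

For n = 1024 this returns 1 + 2 + 4 + ... + 512 = 1023, comfortably below 2n = 2048.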