public int height() {
  // post: returns height of this tree
  if (this.isEmpty()) return 0;
  BinaryTreeInterface left = this.detachLeftSubtree();
  BinaryTreeInterface right = this.detachRightSubtree();
  int totalHeight = 1 + Math.max(left.height(), right.height());
  this.attachLeftSubtree(left);
  this.attachRightSubtree(right);
  return totalHeight;
}
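The version above temporarily dismantles the tree and reattaches its subtrees. On a node-based representation the same recurrence needs no mutation; a minimal sketch, where the Node class is a hypothetical stand-in, not the course's BinaryTreeInterface:

```java
public class TreeHeight {
    // Hypothetical minimal binary tree node (illustration only).
    static class Node {
        Node left, right;
        Node(Node l, Node r) { left = l; right = r; }
    }

    // Height of the tree rooted at n: 0 for an empty tree, otherwise
    // 1 + the larger of the two subtree heights. No detaching needed.
    static int height(Node n) {
        if (n == null) return 0;
        return 1 + Math.max(height(n.left), height(n.right));
    }

    public static void main(String[] args) {
        Node leaf = new Node(null, null);
        Node root = new Node(new Node(leaf, null), null);
        System.out.println(height(root)); // 3: a chain of three nodes
    }
}
```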
Properties of trees, cont'd

The height of a nonempty binary tree, T, is the maximum depth taken over all of its nodes,
i.e. max { depth(x) | x is a node of T }

A binary tree of height h has at most 2^{h} − 1 nodes.

A binary tree with n nodes has height at least log_{2}(n+1).

Define a full node to be a node with exactly two children.
N.B. A tree of height h having the maximum number of nodes (2^{h} − 1) is called a full tree.

In any nonempty binary tree, the number of leaves is one more than the number of full nodes.

In any binary tree, the number of empty subtrees is one more than the number of nonempty subtrees.

Most of these properties can be proved by mathematical induction.
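The leaves-versus-full-nodes property can also be checked directly on small trees. A sketch that counts both on a hand-built tree; the Node class here is a hypothetical illustration:

```java
public class TreeCounts {
    // Hypothetical minimal binary tree node (illustration only).
    static class Node {
        Node left, right;
        Node(Node l, Node r) { left = l; right = r; }
    }

    // Number of leaves (nodes with no children) in the tree rooted at n.
    static int leaves(Node n) {
        if (n == null) return 0;
        if (n.left == null && n.right == null) return 1;
        return leaves(n.left) + leaves(n.right);
    }

    // Number of full nodes (nodes with exactly two children).
    static int fullNodes(Node n) {
        if (n == null) return 0;
        int self = (n.left != null && n.right != null) ? 1 : 0;
        return self + fullNodes(n.left) + fullNodes(n.right);
    }

    public static void main(String[] args) {
        Node t = new Node(new Node(null, new Node(null, null)),
                          new Node(null, null));
        // leaves = full nodes + 1 in any nonempty binary tree
        System.out.println(leaves(t) == fullNodes(t) + 1); // true
    }
}
```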
ADT Table

Components: associations from keys (from some domain space) to values

simple (partial) functions: values are atomic

databases: values are records of field–value pairs (often including the key–value pair too)

sets: values are empty; the table acts as a characteristic function

mapping from student ID to name

mapping from student ID to student record

set of student IDs for students in CS 134 (mapping from ID to “taking CS 134”)

look up given key: tableRetrieve

insert a new association: tableInsert

delete association for a given key: tableDelete

at most one value per key (although that value can be a collection for some tables)

does not support inverse mapping except through enumeration

Iterators do not necessarily encounter keys in order.
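The three operations can be sketched over a pair of parallel lists. All class and method names below are illustrative assumptions, not the course's actual Table interface:

```java
import java.util.ArrayList;

// Hypothetical minimal Table ADT: at most one value per key.
public class SimpleTable<K extends Comparable<K>, V> {
    private final ArrayList<K> keys = new ArrayList<>();
    private final ArrayList<V> values = new ArrayList<>();

    // look up given key (tableRetrieve); null if absent
    public V retrieve(K key) {
        int i = position(key);
        return (i < 0) ? null : values.get(i);
    }

    // insert a new association (tableInsert); replaces any old value
    public void insert(K key, V value) {
        int i = position(key);
        if (i < 0) { keys.add(key); values.add(value); }
        else { values.set(i, value); }
    }

    // delete association for a given key (tableDelete)
    public boolean delete(K key) {
        int i = position(key);
        if (i < 0) return false;
        keys.remove(i);
        values.remove(i);
        return true;
    }

    // single private helper to locate a key, as suggested later
    private int position(K key) {
        for (int i = 0; i < keys.size(); i++)
            if (keys.get(i).compareTo(key) == 0) return i;
        return -1;
    }
}
```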
Simple representations

Keep table as a sequence of associations in no particular order, with no keys repeated.

using a vector

using a linked list

efficiency of operations:
Consider a vector representation.
Look at number of comparisons of two keys and number of elements whose locations in the vector change:

e.g., how many comparisons are executed during a call to “retrieve” in the worst case? or in the best case? how many moves are needed during a call to “delete”?
Similar analyses can be applied to reason about linked list representations.

implementation convenience: use a single private (non-ADT) method "position" to find the location of the association matching a given key, if it exists
Ordering collections
Why does a vocabulary dictionary keep words in alphabetical order?

To find a word, we could start at the beginning and scan every word in the dictionary.

We can do better if the dictionary is sorted by key.

Idea: open a dictionary near the middle, and then determine whether to search in the first or second half
a sorted dictionary is either empty or the concatenation of two sorted subdictionaries of (approx.) half the size, where every word in the first is smaller than every word in the second

requires that keys be “comparable”
Binary search
Given a sequence of comparable objects in ascending order, find the index matching a target key if it is present, or otherwise return the index of the slot where it would be inserted.
returns index such that 0 ≤ index ≤ data.size(),
data[0..index−1] < target, and
data[index..data.size()−1] ≥ target
int position(Comparable target) {
  // pre: target is non-null and data values ascending
  // post: returns ideal position of target in data vector
  return search(0, data.size(), target);
}

int search(int lo, int hi, Comparable key) {
  // pre: 0 <= lo <= hi <= data.size(); key not null
  // post: returns ideal position of key in data[lo..hi]
  int mid = (lo + hi) / 2;
  if (lo == hi) return lo;
  else {
    Comparable m = (Comparable) data.elementAt(mid);
    if (m.compareTo(key) < 0)       // m < key
      return search(mid + 1, hi, key);
    else
      return search(lo, mid, key);
  }
}
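The same algorithm, made self-contained over a plain int array so the postcondition can be checked directly; the class and method names here are for illustration only:

```java
public class IdealPosition {
    // Returns index in [lo, hi] such that everything before it is < key
    // and everything from it on is >= key (the "ideal position").
    static int search(int[] data, int lo, int hi, int key) {
        if (lo == hi) return lo;
        int mid = (lo + hi) / 2;
        if (data[mid] < key) return search(data, mid + 1, hi, key);
        else return search(data, lo, mid, key);
    }

    static int position(int[] data, int key) {
        return search(data, 0, data.length, key);
    }

    public static void main(String[] args) {
        int[] data = {2, 4, 4, 7, 9};
        System.out.println(position(data, 4)); // 1: first index holding 4
        System.out.println(position(data, 5)); // 3: slot where 5 would go
    }
}
```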

How efficient is binary search?
Bounding efficiency

Running time of a program is a function of the “size” of the problem being solved

for collections: size = number of elements

for binary search: size = hi – lo
Consider solution to a problem of size n

Running time using one compiler and one machine might be
0.33365n^{2} − 0.43n + 3.4 sec

Another compiler and another machine might take 4.5n^{2} + 17n msec

In either case, doubling the size of a large problem means that the solution takes about 4 times as long to run

simplify both to "order n-squared", written O(n^{2})
Big-oh notation

keep dominant term,

remove leading constant,

put O(..) around it

Informally: f(n) is O(g(n)) if f(n) is at most a constant factor larger than g(n) for sufficiently large n.

Formally: f(n) is O(g(n)) if there exist two positive constants, c and n_{0}, such that f(n) ≤ c·g(n) for all n ≥ n_{0}

Algorithm A is O(g(n)) if for any reasonable implementation of the algorithm on any reasonable computer, the time required by A to solve a problem of size n is O(g(n)).
(1/2 n^{2} + 1/2 n) is O(n^{2})
(13.3 n + 4 n^{3} + 3/4 n^{2}) is O(n^{3})
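As a worked check of the formal definition on the first example, the constants c = 1 and n_{0} = 1 serve as witnesses:

```latex
\frac{1}{2}n^2 + \frac{1}{2}n
  \;\le\; \frac{1}{2}n^2 + \frac{1}{2}n^2
  \;=\; 1 \cdot n^2
  \qquad \text{for all } n \ge 1,
```

so f(n) ≤ c·g(n) holds for all n ≥ n_{0}, confirming that (1/2 n^{2} + 1/2 n) is O(n^{2}).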
O-notation (intuition)
Algorithm A_{i} has running time O(g_{i}(n))
Use of Onotation

Common classes of functions

constant: O(1)

logarithmic: O(log n)

linear: O(n)

quadratic: O(n^{2})

cubic: O(n^{3})

exponential: O(2^{n})

We don’t need an exact analysis of every operation; constants can be accumulated

Examples:

popping an element from a stack: O(1)

removing an element from a list: O(n)

calculating the size of a binary tree: O(n)
Comparing algorithms
[Jon Bentley, "Programming Pearls: Algorithm Design Techniques," Comm. ACM 27, 9 (Sept. 1984), pp. 865–871]

problem: given an integer array A, find the values i and j that maximize A[i] + A[i+1] + … + A[j]

O(n^{3}) algorithm: try all possible values of i and j

How many choices for i?

How many choices for j, given i?

Cost of figuring out the value for a given i and j?
Alternative approach

in single pass over array, keep track of best range so far as well as best starting point for a range ending at current index
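The single-pass idea above can be sketched as follows; this is the scheme usually credited to Kadane, and the class name and sample array are for illustration:

```java
public class MaxSubarray {
    // O(n) maximum-subarray sum: track the best sum of a range ending
    // at the current index, and the best sum seen anywhere so far.
    // The empty range (sum 0) is allowed.
    static int maxSum(int[] a) {
        int bestEndingHere = 0; // best range ending at current index
        int bestSoFar = 0;      // best range seen so far
        for (int x : a) {
            bestEndingHere = Math.max(0, bestEndingHere + x);
            bestSoFar = Math.max(bestSoFar, bestEndingHere);
        }
        return bestSoFar;
    }

    public static void main(String[] args) {
        int[] a = {31, -41, 59, 26, -53, 58, 97, -93, -23, 84};
        System.out.println(maxSum(a)); // 187 (range 59..97)
    }
}
```

Compare with the O(n^{3}) approach: instead of recomputing each range's sum from scratch, the best range ending at index i is derived in O(1) from the best range ending at index i − 1.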

Bentley’s implementations:

O(n^{3}) algorithm in finely-tuned FORTRAN on a Cray-1 supercomputer

O(n) algorithm in interpreted BASIC on a Radio Shack TRS-80 Model III microcomputer

3.0n^{3} nanoseconds on Cray computer

19.5n milliseconds (19,500,000n nanoseconds) on Radio Shack computer
Bentley’s results (times computed from the formulas above)

n         Cray (3.0n^{3} ns)    TRS-80 (19.5n ms)
10        3 μs                  .195 s
100       .003 s                1.95 s
1000      3 s                   19.5 s
2500      ≈47 s                 ≈49 s
10^{4}    ≈50 min               195 s
10^{5}    ≈35 days              ≈33 min
10^{6}    ≈95 years             ≈5.4 hours
Faster hardware isn’t good enough!
Efficiency of binary search

asymptotic analysis: interested in behaviour for large vectors
int search(int lo, int hi, Comparable key) {
int mid = (lo + hi)/2;
if (lo == hi) return lo;
else ... // compare key to value of data.elementAt(mid)
return search(mid+1, hi, key);
or return search(lo, mid, key);
}

Each recursive call halves vector:
n → n/2 → n/4 → n/8 → n/16 → …
after i comparisons, hi – lo ≈ n/2^{i}
but search ends when hi – lo < 1
and there is O(1) work between calls
⇒ time for binary search is O(log_{2}n)

Doubling the size of the vector requires only one more call to search!

We usually write O(log n) with no subscript.
Efficiency of implementing Table using Vector
Re-examine the worst case (n associations stored):

method                                unordered vector      ordered vector
                                      comps      moves      comps      moves
tableRetrieve                         O(n)       0          O(log n)   0
tableDelete                           O(n)       O(n)       O(log n)   O(n)
tableInsert (array large enough)      O(n)       O(1)       O(log n)   O(n)
tableInsert (array fully occupied)    O(n)       O(n)       O(log n)   O(n)

(Unordered insertion still costs O(n) comparisons because duplicate keys must be ruled out; a fully occupied array forces all n elements to be copied to a larger array.)
Binary search trees

A binary search tree is an empty binary tree or a labelled binary tree such that:

The labels can be compared.

The label of the root of a binary search tree is greater than all labels in its left subtree.

The label of the root of a binary search tree is less than all labels in its right subtree.

The left and right subtrees are also binary search trees.

Note: The binary search tree for a given set of labels is not unique.
Binary search trees as implementations of tables

labels represent associations (or just keys if no associated values)

simple code for retrieving an item:
protected static KeyedItem retrieveItem(TreeNode tNode, Comparable searchKey) {
  KeyedItem treeItem;
  if (tNode == null) {
    treeItem = null;
  } else {
    KeyedItem nodeItem = (KeyedItem) tNode.getItem();
    int searchResult = searchKey.compareTo(nodeItem.getKey());
    if (searchResult == 0) {
      treeItem = (KeyedItem) tNode.getItem();
    } else if (searchResult < 0) {
      treeItem = retrieveItem(tNode.getLeft(), searchKey);
    } else {
      treeItem = retrieveItem(tNode.getRight(), searchKey);
    }
  }
  return treeItem;
}

Note: Inorder traversal encounters values in increasing order.
Maintaining a binary search tree
Insertion: the postcondition can be expressed recursively.

Empty tree: replaced by a leaf node containing the new value

Otherwise: if the new value is less than the root's value, it is inserted in the left subtree; else it is inserted in the right subtree
What should be done for duplicate keys?
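The recursive insertion just described can be sketched as below; here duplicates are sent to the right subtree, which is one common answer to the duplicate-key question. The Node class and method names are illustrative:

```java
public class BstInsert {
    // Hypothetical minimal BST node (illustration only).
    static class Node {
        int key; Node left, right;
        Node(int k) { key = k; }
    }

    // Empty tree: replaced by a leaf holding the new value.
    // Otherwise: recurse left if smaller, right if greater or equal.
    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else root.right = insert(root.right, key);
        return root;
    }

    // In-order traversal visits keys in increasing order (sanity check).
    static void inorder(Node n, StringBuilder out) {
        if (n == null) return;
        inorder(n.left, out);
        out.append(n.key).append(' ');
        inorder(n.right, out);
    }

    public static void main(String[] args) {
        Node root = null;
        for (int k : new int[]{5, 2, 8, 1, 3}) root = insert(root, k);
        StringBuilder sb = new StringBuilder();
        inorder(root, sb);
        System.out.println(sb.toString().trim()); // 1 2 3 5 8
    }
}
```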
Deletion: the postcondition can be expressed by cases.

Not present: no change to the tree

Else value to delete found in a leaf: that leaf is deleted

Else value to delete found in a node having one empty subtree: that node is deleted and its other subtree attached to its parent

Else value to delete found in a node having two nonempty subtrees: that node receives the value previously held by its predecessor (or successor) node, and that other node is then deleted (using case 2 or 3 as appropriate)
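The deletion cases can be sketched as below, using the successor (the minimum of the right subtree) for the two-subtree case. Names are illustrative, not the textbook's code:

```java
public class BstDelete {
    // Hypothetical minimal BST node (illustration only).
    static class Node {
        int key; Node left, right;
        Node(int k) { key = k; }
        Node(int k, Node l, Node r) { key = k; left = l; right = r; }
    }

    static Node delete(Node root, int key) {
        if (root == null) return null;                 // not present
        if (key < root.key) root.left = delete(root.left, key);
        else if (key > root.key) root.right = delete(root.right, key);
        else if (root.left == null) return root.right; // leaf or one subtree
        else if (root.right == null) return root.left;
        else {                                         // two subtrees:
            root.key = min(root.right);                // copy successor's value
            root.right = delete(root.right, root.key); // then remove successor
        }
        return root;
    }

    static int min(Node n) {
        while (n.left != null) n = n.left;
        return n.key;
    }

    // In-order traversal for checking the result.
    static void inorder(Node n, StringBuilder out) {
        if (n == null) return;
        inorder(n.left, out);
        out.append(n.key).append(' ');
        inorder(n.right, out);
    }

    public static void main(String[] args) {
        Node root = new Node(5, new Node(2, new Node(1), new Node(3)),
                             new Node(8));
        root = delete(root, 2);   // node 2 has two subtrees
        StringBuilder sb = new StringBuilder();
        inorder(root, sb);
        System.out.println(sb.toString().trim()); // 1 3 5 8
    }
}
```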

tableRetrieve, tableInsert, and tableDelete use O(h) time, where h is the height of the tree; in the best case and on average, h is O(log n)
Sorting

goals for studying sorting:

“common knowledge” in computer science

wide variety of possible approaches

practice in the design and analysis of algorithms

practice in assertions and verification

All the data can fit in memory.

Data is all comparable.

Data is stored in an array of size n.

2 important operations affecting time:

comparisons: comparing values of two data items

data movements: moving or copying a data item

Ignore operations on index values, etc.

Rationale:

data access and manipulation may be expensive for large objects

number of other operations executed between comparisons and data movements is bounded by a constant

1 important factor affecting space: amount of auxiliary storage

Always need O(n) space to hold the data itself, but how much other space is needed?
Selection sort

Idea: repeatedly extract maximal element from among those still unsorted

How much auxiliary space is needed?

What is the running time?

look at the structure of the code:
for (int last = n-1; last >= 1; last--) {
  ... indexOfLargest(...) ...
}
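Filling in that skeleton, a complete selection sort over an int array; a sketch with illustrative names, not the textbook's exact code:

```java
public class SelectionSort {
    // Repeatedly move the largest unsorted element to the end of the
    // unsorted region a[0..last].
    static void selectionSort(int[] a) {
        for (int last = a.length - 1; last >= 1; last--) {
            int largest = indexOfLargest(a, last);
            int tmp = a[largest];       // swap largest into position last
            a[largest] = a[last];
            a[last] = tmp;
        }
    }

    // Index of the largest element among a[0..last].
    static int indexOfLargest(int[] a, int last) {
        int best = 0;
        for (int i = 1; i <= last; i++)
            if (a[i] > a[best]) best = i;
        return best;
    }

    public static void main(String[] args) {
        int[] a = {3, 1, 4, 1, 5, 9, 2, 6};
        selectionSort(a);
        System.out.println(java.util.Arrays.toString(a)); // [1, 1, 2, 3, 4, 5, 6, 9]
    }
}
```

Each pass does one swap, so the number of data movements is O(n) while comparisons total O(n^{2}).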
Linear insertion sort
Idea: repeatedly insert the element that happens to be next into the proper place among those elements already sorted.
see Carrano & Prichard, pp. 390–393

How much auxiliary space is needed?

What is the running time?
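A sketch of linear insertion sort matching the idea above; names are illustrative, not the textbook's exact code:

```java
public class InsertionSort {
    // a[0..i-1] is sorted; insert a[i] into its proper place among them.
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int next = a[i];
            int j = i;
            while (j > 0 && a[j - 1] > next) {
                a[j] = a[j - 1];  // shift larger elements one slot right
                j--;
            }
            a[j] = next;          // drop the element into the gap
        }
    }

    public static void main(String[] args) {
        int[] a = {3, 1, 4, 1, 5};
        insertionSort(a);
        System.out.println(java.util.Arrays.toString(a)); // [1, 1, 3, 4, 5]
    }
}
```

On nearly sorted input the inner while loop rarely runs, which is why insertion sort reappears later as a finishing pass for quicksort.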
A recursive sorting algorithm: mergesort
see Carrano & Prichard, pp. 393–398

Important subroutine: merge

Input is two sorted ranges in an array.

Identify candidate at start of each input range.

Repeatedly copy the smaller of the two candidates to the temporary array.

When one input range is exhausted, simply copy the rest of the other one to the temporary array.

Copy the temporary array back to the input array.
Mergesort itself

Mergesort uses “divide and conquer”

divide a large problem into smaller problems

solve the smaller problems

put the solutions together to form an answer to the larger problem

smaller problems: sorting two half arrays

put the solutions of those two problems together using merge
void mergesort(d[0..n−1]) {
  if (n > 1) {
    mergesort(d[0..n/2−1]);
    mergesort(d[n/2..n−1]);
    merge(d[0..n/2−1], d[n/2..n−1]);
  }
}
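The pseudocode above, made concrete for an int array together with the merge subroutine described earlier; a sketch with illustrative names:

```java
public class MergeSort {
    static void mergesort(int[] d) { mergesort(d, 0, d.length - 1); }

    // Divide: sort the two halves, then merge their solutions.
    static void mergesort(int[] d, int lo, int hi) {
        if (lo >= hi) return;          // 0 or 1 element: already sorted
        int mid = (lo + hi) / 2;
        mergesort(d, lo, mid);
        mergesort(d, mid + 1, hi);
        merge(d, lo, mid, hi);
    }

    // Merge sorted ranges d[lo..mid] and d[mid+1..hi] via a temp array.
    static void merge(int[] d, int lo, int mid, int hi) {
        int[] tmp = new int[hi - lo + 1];
        int i = lo, j = mid + 1, k = 0;
        while (i <= mid && j <= hi)            // copy smaller candidate
            tmp[k++] = (d[i] <= d[j]) ? d[i++] : d[j++];
        while (i <= mid) tmp[k++] = d[i++];    // copy leftovers
        while (j <= hi) tmp[k++] = d[j++];
        System.arraycopy(tmp, 0, d, lo, tmp.length); // copy back
    }

    public static void main(String[] args) {
        int[] a = {5, 3, 8, 1, 9, 2};
        mergesort(a);
        System.out.println(java.util.Arrays.toString(a)); // [1, 2, 3, 5, 8, 9]
    }
}
```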
Analysis of mergesort

If n is a power of 2, the "tree of problems to solve" is a complete binary tree: n at the root, two subproblems of size n/2, four of size n/4, and so on down to n problems of size 1 (labels represent the sizes of the problems to solve).

Runtime = O(n log n): the tree has about log_{2}n + 1 levels, with O(n) merge work per level

Auxiliary space = O(n), for the temporary array used by merge
Quicksort

Idea: pick some pivot element and place it where it belongs;

sort all elements less than the pivot;

sort all elements greater than the pivot

Quicksort also uses divide and conquer, but the sizes of the two problems depend on the input.
see Carrano & Prichard, pp. 398–410

Important subroutine: partition
int partition(Comparable theArray[], int first, int last);
// pre: 0 <= first <= last < theArray.length
// post: returns split s.t. first <= split <= last
//   and permutes theArray s.t.
//   theArray[i] <= theArray[split] for first <= i < split
//   theArray[i] >= theArray[split] for split < i <= last
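One way to meet this postcondition is the Lomuto-style scheme below, pivoting on the first element; the textbook's partition may differ in its details:

```java
public class Partition {
    // Places the pivot (a[first]) into its final slot and returns that
    // index: everything before it is < pivot, everything after is >= pivot.
    static int partition(int[] a, int first, int last) {
        int pivot = a[first];
        int split = first;                  // a[first+1..split] < pivot
        for (int i = first + 1; i <= last; i++) {
            if (a[i] < pivot) {
                split++;
                swap(a, split, i);          // grow the "< pivot" region
            }
        }
        swap(a, first, split);              // move pivot into place
        return split;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = {3, 7, 1, 2, 9};
        int s = partition(a, 0, 4);
        System.out.println(s + " " + a[s]); // pivot 3 lands at index 2
    }
}
```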
Quicksort itself

parameters are needed to indicate the subarray being sorted by a given recursive call
void quickSortRec(Comparable theArray[], int first, int last) {
  // pre: (0 <= first <= last < theArray.length) or (last < first)
  // post: values in theArray[first..last] are permuted into ascending order
  if (first < last) {
    int pivot = partition(theArray, first, last);
    quickSortRec(theArray, first, pivot-1);
    quickSortRec(theArray, pivot+1, last);
  }
}

a non-recursive wrapper method quickSort(Comparable theArray[], int n) calls
quickSortRec(theArray, 0, n-1);
Analysis of quicksort

each segment split exactly in two ⇒ O(n log n) [similar "tree of problems" to that of mergesort]

but splits having n/2 elements in each part not likely

each segment split on its first element ⇒ O(n^{2})

but how likely are splits into 0 and n−1 elements?

So what can we expect on average?

assume all initial permutations of elements equally likely

splits leaving n/4 elements in one of the parts still yield a "logarithmic height tree" (the base of the logarithm is smaller, so the value is larger by a constant factor), and the probability of a split at least this good is 0.5

similarly for splits having n/k elements in one of the parts, for any constant k

Intuitively: average case is like best case but with larger constant factor.

O(n log n) average case can be proven using material from Stats 230.

N.B. Auxiliary space = O(log n) with clever reprogramming
Speeding up quicksort

Try to avoid bad splits (e.g., do not just choose first elements as pivots).

randomization works well

worst case still O(n^{2}), but less likely in practice to have “bad” inputs

Speed up the algorithm in practice by using linear insertion sort on small segments (e.g., < 10 elements).

Even better: Stop recursion without sorting small segments (e.g., < 10 elements) at all:

every element within 10 of its final position

final single call to linear insertion sort finishes overall sort quickly
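The two speedups above, a recursion cutoff plus one final insertion-sort pass, can be combined as in the sketch below; the cutoff value and all names are illustrative:

```java
public class HybridSort {
    static final int CUTOFF = 10;   // segments smaller than this are skipped

    // Quicksort down to small segments, then finish with one
    // insertion-sort pass: every element is already near its final slot.
    static void sort(int[] a) {
        quickSortRec(a, 0, a.length - 1);
        insertionSort(a);
    }

    static void quickSortRec(int[] a, int first, int last) {
        if (last - first + 1 < CUTOFF) return;  // leave small segments unsorted
        int pivot = partition(a, first, last);
        quickSortRec(a, first, pivot - 1);
        quickSortRec(a, pivot + 1, last);
    }

    // Lomuto-style partition pivoting on the first element.
    static int partition(int[] a, int first, int last) {
        int pivot = a[first], split = first;
        for (int i = first + 1; i <= last; i++)
            if (a[i] < pivot) swap(a, ++split, i);
        swap(a, first, split);
        return split;
    }

    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int next = a[i], j = i;
            while (j > 0 && a[j - 1] > next) { a[j] = a[j - 1]; j--; }
            a[j] = next;
        }
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = {12, 4, 19, 3, 8, 15, 1, 7, 20, 11,
                   2, 17, 6, 14, 9, 5, 18, 13, 10, 16};
        sort(a);
        System.out.println(a[0] + " .. " + a[19]); // smallest .. largest
    }
}
```

Because partitioning stops at the cutoff, each element finishes within CUTOFF positions of its final slot, so the closing insertion sort runs in O(n · CUTOFF) = O(n) time for a fixed cutoff.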