What is computer science

Yüklə 1,3 Mb.

səhifə	3/4
tarix	08.08.2018
ölçüsü	1,3 Mb.
	#61229

1 2 3 4

public int height () {

// post: returns height of this tree

if (this.isEmpty()) return 0;

BinaryTreeInterface left =

this.detachLeftSubtree();

BinaryTreeInterface right =

this.detachRightSubtree();

int totalHeight = 1 +

Math.max(left.height(), right.height());

this.attachLeftSubtree(left);

this.attachRightSubtree(right);

return totalHeight;

}

Properties about trees cont'd

The height of a non-empty binary tree, T, the maximum depth taken over all of its nodes.

A binary tree of height h has at most 2^h-1 nodes.

A binary tree with n nodes has height at least log₂(n+1).
Define a full node to be a node with exactly two children.

N.B. A tree of height h having the maximum number of nodes (2^h-1) is called a full tree.

In any non-empty binary tree, the number of leaves is one more than the number of full nodes.
In any binary tree, the number of empty subtrees is one more than the number of non-empty subtrees.
Most of these properties can be proved by mathematical induction.

ADT Table

Components: associations from keys (from some domain space) to values

simple (partial) functions: values are atomic
databases: values are records of field-value pairs (often including the key-value pair too)
sets: values are empty; ~ characteristic function

Examples:

mapping from student ID to name
mapping from student ID to student record
set of student IDs for students in CS 134 (mapping from ID to “taking CS 134”)

Intuitive operations:

look up given key  tableRetrieve
insert a new association  tableInsert
delete association for a given key  tableDelete

Keys are unique; values need not be.

at most one value per key (although that value can be a collection for some tables)
does not support inverse mapping except through enumeration

Iterators do not necessarily encounter keys in order.

Simple representations

Keep table as a sequence of associations in no particular order, with no keys repeated.

using a vector
using a linked list

efficiency of operations:

Consider a vector representation.

Look at number of comparisons of two keys and number of elements whose locations in the vector change:

e.g., how many comparisons are executed during a call to “retrieve” in the worst case? or in the best case? how many moves are needed during a call to “delete”?

Similar analyses can be applied to reason about linked list representations.

implementation convenience: use a single private (non-ADT) method “position” to find the location of association matching a given key, if it exists

Ordering collections

Why does a vocabulary dictionary keep words in alphabetical order?

To find a word, we could start at the beginning and scan every word in the dictionary.
We can do better if the dictionary is sorted by key.
Idea: open a dictionary near the middle, and then determine whether to search in the first or second half

a sorted dictionary is either empty or the concatenation of two sorted sub-dictionaries of (approx.) half the size for which every word in the first is smaller than every word in the second

requires that keys be “comparable”

Binary search

Given a sequence of comparable objects in ascending order, find the index matching a target key if it is present, or otherwise return the index of the slot where it would be inserted.

returns index such that 0  index  data.size(),

data[0..index-1] < target, and

data[index..data.size()-1]  target
int position(Comparable target){

// pre: target is non-null and data values ascending

// post: returns ideal position of target in data vector

return search(0,data.size(),target);

}
int search(int lo, int hi, Comparable key) {

// pre: 0  lo  hi  data.size( ); key not null

// post: returns ideal position of key in data[lo..hi]

Comparable m;

int mid = (lo + hi)/2;

if (lo == hi) return lo;

else {

m = (Comparable)data.elementAt(mid);

if (m.compareTo(key) < 0) // m < key

return search(mid+1, hi, key);

else return search(lo, mid, key);

}

How efficient is binary search?

Bounding efficiency

Running time of a program is a function of the “size” of the problem being solved

for collections: size = number of elements
for binary search: size = hi – lo

Consider solution to a problem of size n

Running time using one compiler and one machine might be

.33365n² - .43n + 3.4 sec

Another compiler and another machine might take 4.5n² + 17n msec
In either case, doubling the size of a large problem means that the solution takes about 4 times as long to run
simplify both to "order n-squared", written O(n²)

Big-oh notation

Intuitively:

keep dominant term,
remove leading constant,
put O(..) around it

Informally: f(n) is O(g(n)) if f(n) and g(n) differ by at most a fixed constant for sufficiently large n.
Formally: f(n) is O(g(n)) if there exist two positive constants, c and n₀, such that f(n)  c*g(n) for all n  n₀
Algorithm A is O(g(n)) if for any reasonable implementation of the algorithm on any reasonable computer, the time required by A to solve a problem of size n is O(g(n)).

(1/2 n² + 1/2 n) is O(n²)

(13.3 n + 4 n³ + 3/4 n²) is O(n³)

O-notation (intuition)

Algorithm Ai has running time O(g_i(n))

Use of O-notation

Common classes of functions

constant: O(1)
logarithmic: O(log n)
linear: O(n)
quadratic: O(n²)
cubic: O(n³)
exponential: O(2ⁿ)

We don’t need an exact analysis of every operation; constants can be accumulated
Examples:

popping an element from a stack:
removing an element from a list:
calculating the size of a binary tree:

Comparing algorithms

[Jon Bentley, “Programming Pearls: Algorithm Design Techniques,” Comm. ACM 27, 9 (Sept. 1984) pp. 865-871]

problem: given an integer array A, find the values i and j which maximize

O(n³) algorithm: try all possible values of i and j

How many choices for i?
How many choices for j, given i?
Cost of figuring out the value for a given i and j?

Alternative approach

O(n) algorithm: possible through more clever analysis

in single pass over array, keep track of best range so far as well as best starting point for a range ending at current index

Bentley’s implementations:

O(n³) algorithm in finely-tuned FORTRAN on a Cray-1 supercomputer
O(n) algorithm in interpreted BASIC on a Radio Shack TRS-80 Model III microcomputer

estimated running times:

3.0n³ nanoseconds on Cray computer
19.5n milliseconds (19500000n nanoseconds) on Radio Shack computer

Bentley’s results

	Cray	TRS-80
n	3.0n³ ns	19.5n ms
10	3 s	.195 s
100	.003 s	1.95 s
1000
2500
10⁴
10⁵
10⁶

. . .

Faster hardware isn’t good enough!

Efficiency of binary search

asymptotic analysis: interested in behaviour for large vectors

int search(int lo, int hi, Comparable key) {

int mid = (lo + hi)/2;

if (lo == hi) return lo;

else ... // compare key to value of data.elementAt(mid)

return search(mid+1, hi, key);

or return search(lo, mid, key);

}

Each recursive call halves vector:

n n/2 n/4 n/8 n/16 …

after i comparisons, hi-lo = n/2ⁱ

but search ends when hi-lo < 1

and there is O(1) work between calls

 time for binary search is O(log₂n)

Doubling the size of the vector requires only one more call to search!
We usually write O(log n) with no subscript.

Efficiency of implementing Table using Vector

Re-examine worst case:

method	unordered vector		ordered vector
method	comps	moves	comps	moves
tableRetrieve
tableDelete
tableInsert (with array large enough)
tableInsert (with array fully occupied)

Binary search trees

A binary search tree is an empty binary tree or a labelled binary tree such that:

The labels can be compared.
The label of the root of a binary search tree is greater than all labels in its left subtree.
The label of the root of a binary search tree is less than all labels in its right subtree.
The left and right subtrees are also binary search trees.

Note: Binary search tree for a given set not unique

Binary search trees as implementations of tables

labels represent associations (or just keys if no associated values)
simple code for retrieving an item:

protected static KeyedItem retrieveItem

(TreeNode tNode, Comparable searchKey) {

KeyedItem treeItem;

if (tNode == null) { treeItem = null; }

else {

KeyedItem nodeItem =

(KeyedItem)tNode.getItem();

int searchResult =

searchKey.compareTo(nodeItem.getKey());

if (searchResult == 0) {

treeItem = (KeyedItem)tNode.getItem();

} else if (searchResult < 0) {

treeItem =

retrieveItem(tNode.getLeft(), searchKey);

} else {

treeItem = retrieveItem(tNode.getRight(),

searchKey);

}

}

return treeItem;

}

efficiency?

Note: Inorder traversal encounters values in increasing order.

Maintaining a binary search tree

Insertion

Postcondition can be expressed recursively.

Empty tree: replaced by a leaf node containing the new value
Otherwise: if the new value is less than the root’s value, inserted in the left subtree; else inserted in the right subtree

What should be done for duplicate keys?

Deletion

Postcondition can be expressed by cases.

Not present: no change to the tree
Else value to delete found in leaf: that leaf deleted
Else value to delete found in node having one empty subtree: that node deleted and other subtree attached to the parent
Else value to delete found in node having two non-empty subtrees: that node contains the value previously found in its predecessor (or successor) node and that other node deleted (using either case 2 or 3 as appropriate)

tableRetrieve, tableInsert, and tableDelete use O(h) time, where h is the height of the tree  in the best case and on average, h is O(log n)

Sorting

goals for studying sorting:

“common knowledge” in computer science
wide variety of possible approaches
practice in the design and analysis of algorithms
practice in assertions and verification

assumptions:

All the data can fit in memory.
Data is all comparable.
Data is stored in an array of size n.

2 important operations affecting time:

comparisons: comparing values of two data items
data movements: moving or copying a data item
Ignore operations on index values, etc.
Rationale:

data access and manipulation may be expensive for large objects
number of other operations executed between comparisons and data movements is bounded by a constant

1 important factor affecting space: amount of auxiliary storage

Always need O(n) space to hold the data itself, but how much other space is needed?

Selection sort

Idea: repeatedly extract maximal element from among those still unsorted
How much auxiliary space is needed?
What is the running time?

look at the structure of the code:

for (int last=n-1; last >= 1; last--)

... indexOfLargest(...)

}

Linear insertion sort

Idea: repeatedly insert the element that happens to be next into the proper place among those elements already sorted.

see Carrano & Prichard, pp 390-393

How much auxiliary space is needed?
What is the running time?

A recursive sorting algorithm: mergesort

Idea: merge results of applying mergesort to both halves of the data

see Carrano & Prichard, pp 393-398

Important subroutine: merge

Input is two sorted ranges in an array.
Identify candidate at start of each input range.
Repeatedly copy the smaller of the two candidates to the temporary array.
When one input range is exhausted, simply copy the rest of the other one to the temporary array.
Copy the temporary array back to the input array.

time =
space =

Mergesort itself

Mergesort uses “divide and conquer”

divide a large problem into smaller problems
solve the smaller problems
put the solutions together to form an answer to the larger problem

In mergesort,

smaller problems: sorting two half arrays

to be solved recursively

put the solutions of those two problems together using merge

Code outline:

void mergesort(d[0..n-1]) {
if (n>1) {
mergesort(d[0..n/2-1]);
mergesort(d[n/2..n-1]);
merge(d[0..n/2-1], d[n/2..n-1]);

}

Analysis of mergesort

If n is a power of 2, the “tree of problems to solve” looks like:

where labels represent sizes of the problems to solve

Runtime =
Auxiliary space =

Quicksort

Idea: pick some pivot element and place it where it belongs;
sort all elements less than the pivot;
sort all elements greater than the pivot
Quicksort also uses divide and conquer, but the sizes of the two problems depend on the input.

see Carrano & Prichard, pp 398-410

Important subroutine: partition

int partition(Comparable

theArray[], int first, int last);

// pre: 0  first  last < theArray.length

// post: returns split s.t. first  split  last

// and permutes theArray s.t

// theArray [i]  theArray [split] for first  i < split

// theArray [i]  theArray [split] for split < i  last

Quicksort itself

parameters are needed to indicate the subarray being sorted by a given recursive call

void quickSortRec(Comparable theArray[],

int first, int last) {

// pre: (0  first  last < theArray.length) or (last < first)

// post: values in theArray[first..last] are permuted into ascending order

int pivot;

if (first < last) {

pivot = partition(theArray, first, last);

quickSortRec(theArray, first, pivot-1);

quickSortRec(theArray, pivot+1, last);

}

a non-recursive wrapper method quickSort(Comparable theArray[], int n) calls

quickSortRec(theArray, 0, n-1);

Analysis of quicksort

Best case

each segment split exactly in two  O(n log n) [similar “tree of problems” to that of mergesort]
but splits having n/2 elements in each part not likely

Worst case

each segment split on its first element  O(n²)
but how likely are splits into 0 and n-1 elements?

So what can we expect on average?

assume all initial permutations of elements equally likely
splits having n/4 elements in one of the parts still yield “logarithmic height tree” (base of logarithm is smaller so value is larger by constant factor) and probability of split at least this good is 0.5
similarly for splits having n/k elements in one of the parts, for any constant k
Intuitively: average case is like best case but with larger constant factor.
O(n log n) average case can be proven using material from Stats 230.

N.B. Auxiliary space = O(log n) with clever re-programming

Speeding up quicksort

Try to avoid bad splits (e.g., do not just choose first elements as pivots).

randomization works well
worst case still O(n²), but less likely in practice to have “bad” inputs

Speed up the algorithm in practice by using linear insertion sort on small segments (e.g., < 10 elements).

increases range of base cases
also appropriate for speeding up mergesort

Even better: Stop recursion without sorting small segments (e.g., < 10 elements) at all:

every element within 10 of its final position
final single call to linear insertion sort finishes overall sort quickly

Yüklə 1,3 Mb.

Dostları ilə paylaş:

1 2 3 4