 # What is computer science


public int height() {
// post: returns height of this tree
  if (this.isEmpty()) return 0;
  BinaryTreeInterface left = this.detachLeftSubtree();
  BinaryTreeInterface right = this.detachRightSubtree();
  int totalHeight = 1 + Math.max(left.height(), right.height());
  this.attachLeftSubtree(left);
  this.attachRightSubtree(right);
  return totalHeight;  // return the computed height
}

Properties of trees (cont'd)

• The height of a non-empty binary tree, T, is the maximum depth taken over all of its nodes,

i.e. max {depth(x) | x is a node of T}

• A binary tree of height h has at most 2^h - 1 nodes.

• A binary tree with n nodes has height at least log₂(n+1).

• Define a full node to be a node with exactly two children.

N.B. A tree of height h having the maximum number of nodes (2^h - 1) is called a full tree.

• In any non-empty binary tree, the number of leaves is one more than the number of full nodes.

• In any binary tree, the number of empty subtrees is one more than the number of non-empty subtrees.

• Most of these properties can be proved by mathematical induction.

Tables

• Components: associations from keys (from some domain space) to values

• simple (partial) functions: values are atomic

• databases: values are records of field-value pairs (often including the key-value pair too)

• sets: values are empty; such a table is essentially a characteristic function

• Examples:

• mapping from student ID to name

• mapping from student ID to student record

• set of student IDs for students in CS 134 (mapping from ID to “taking CS 134”)

• Intuitive operations:

• look up given key → tableRetrieve

• insert a new association → tableInsert

• delete association for a given key → tableDelete

• at most one value per key (although that value can be a collection for some tables)

• does not support inverse mapping except through enumeration

• Iterators do not necessarily encounter keys in order.

Simple representations

• Keep table as a sequence of associations in no particular order, with no keys repeated.

• using a vector

• using a linked list

• efficiency of operations:

Consider a vector representation.

Look at number of comparisons of two keys and number of elements whose locations in the vector change:

• e.g., how many comparisons are executed during a call to “retrieve” in the worst case? or in the best case? how many moves are needed during a call to “delete”?

Similar analyses can be applied to reason about linked list representations.

• implementation convenience: use a single private (non-ADT) method “position” to find the location of association matching a given key, if it exists
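As a sketch of this representation (class and field names here are illustrative, not the course's actual code), an unordered-vector table with the private `position` helper might look like:

```java
import java.util.ArrayList;

// Illustrative sketch: a table stored as an unordered sequence of
// key/value associations, with a private position() helper shared
// by all three operations.
class VectorTable<K, V> {
    private static class Entry<EK, EV> {
        final EK key;
        EV value;
        Entry(EK key, EV value) { this.key = key; this.value = value; }
    }

    private final ArrayList<Entry<K, V>> data = new ArrayList<>();

    // Linear scan for the index of the matching key; -1 if absent.
    private int position(K key) {
        for (int i = 0; i < data.size(); i++)
            if (data.get(i).key.equals(key)) return i;  // one key comparison per element
        return -1;
    }

    public V tableRetrieve(K key) {            // worst case: n comparisons
        int i = position(key);
        return (i < 0) ? null : data.get(i).value;
    }

    public void tableInsert(K key, V value) {  // at most one value per key
        int i = position(key);
        if (i < 0) data.add(new Entry<>(key, value));
        else data.get(i).value = value;        // replace existing association
    }

    public void tableDelete(K key) {           // O(1) moves: overwrite with last
        int i = position(key);
        if (i < 0) return;
        data.set(i, data.get(data.size() - 1));
        data.remove(data.size() - 1);
    }
}
```

Because order is irrelevant in this representation, delete can fill the hole with the last element instead of shifting, which is exactly the kind of move-count difference the worst-case analysis above asks about.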

Ordering collections

Why does a vocabulary dictionary keep words in alphabetical order?

• To find a word, we could start at the beginning and scan every word in the dictionary.

• We can do better if the dictionary is sorted by key.

• Idea: open a dictionary near the middle, and then determine whether to search in the first or second half

• A sorted dictionary is either empty or the concatenation of two sorted sub-dictionaries of (approx.) half the size, for which every word in the first is smaller than every word in the second.

• requires that keys be “comparable”

Binary search

Given a sequence of comparable objects in ascending order, find the index matching a target key if it is present, or otherwise return the index of the slot where it would be inserted.

returns index such that 0 ≤ index ≤ data.size(),

data[0..index-1] < target, and

data[index..data.size()-1] ≥ target
int position(Comparable target) {
// pre: target is non-null and data values ascending
// post: returns ideal position of target in data vector
  return search(0, data.size(), target);
}
int search(int lo, int hi, Comparable key) {
// pre: 0 ≤ lo ≤ hi ≤ data.size(); key not null
// post: returns ideal position of key in data[lo..hi]
  if (lo == hi) return lo;
  int mid = (lo + hi) / 2;
  Comparable m = (Comparable)data.elementAt(mid);
  if (m.compareTo(key) < 0)  // m < key
    return search(mid + 1, hi, key);
  else
    return search(lo, mid, key);
}

• How efficient is binary search?

Bounding efficiency

• Running time of a program is a function of the “size” of the problem being solved

• for collections: size = number of elements

• for binary search: size = hi – lo

Consider solution to a problem of size n

• Running time using one compiler and one machine might be

0.33365n² - 0.43n + 3.4 sec

• Another compiler and another machine might take 4.5n² + 17n msec

• In either case, doubling the size of a large problem means that the solution takes about 4 times as long to run

• simplify both to "order n-squared", written O(n²)

Big-oh notation

• Intuitively:

• keep dominant term,

• remove leading constant,

• put O(..) around it

• Informally: f(n) is O(g(n)) if f(n) is at most a fixed constant times g(n) for sufficiently large n.

• Formally: f(n) is O(g(n)) if there exist two positive constants, c and n₀, such that f(n) ≤ c·g(n) for all n ≥ n₀

• Algorithm A is O(g(n)) if for any reasonable implementation of the algorithm on any reasonable computer, the time required by A to solve a problem of size n is O(g(n)).

(1/2 n² + 1/2 n) is O(n²)

(13.3n + 4n³ + 3/4 n²) is O(n³)
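The formal definition can be checked directly for the first example; c = 1 and n₀ = 1 are one workable choice:

```latex
\tfrac{1}{2}n^2 + \tfrac{1}{2}n
  \;\le\; \tfrac{1}{2}n^2 + \tfrac{1}{2}n^2
  \;=\; n^2
  \quad\text{for all } n \ge 1,
\qquad\text{so } \tfrac{1}{2}n^2 + \tfrac{1}{2}n \text{ is } O(n^2)
  \text{ with } c = 1,\; n_0 = 1.
```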

O-notation (intuition)

Algorithm A_i has running time O(g_i(n))

Use of O-notation

• Common classes of functions

• constant: O(1)

• logarithmic: O(log n)

• linear: O(n)

• cubic: O(n³)

• exponential: O(2ⁿ)

• We don’t need an exact analysis of every operation; constants can be accumulated

• Examples:

• popping an element from a stack: O(1)

• removing an element from a list: O(n) (finding the element dominates; unlinking it is constant work)

• calculating the size of a binary tree: O(n) (every node is visited once)

Comparing algorithms

[Jon Bentley, “Programming Pearls: Algorithm Design Techniques,” Comm. ACM 27, 9 (Sept. 1984) pp. 865-871]

• problem: given an integer array A, find the values i and j which maximize the sum A[i] + A[i+1] + … + A[j]

• O(n³) algorithm: try all possible values of i and j

• How many choices for i?

• How many choices for j, given i?

• Cost of figuring out the value for a given i and j?

Alternative approach

• in single pass over array, keep track of best range so far as well as best starting point for a range ending at current index
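The single-pass idea can be sketched as follows (often called Kadane's algorithm; this version assumes a non-empty array and a non-empty range):

```java
// Single-pass maximum-subarray scan: track the best range seen so far
// and the best range ending at the current index. O(n) time, O(1) space.
class MaxSubarray {
    static int maxSum(int[] a) {
        int best = a[0];         // best sum over any range seen so far
        int endingHere = a[0];   // best sum over ranges ending at current index
        for (int k = 1; k < a.length; k++) {
            // either extend the previous range or start fresh at a[k]
            endingHere = Math.max(a[k], endingHere + a[k]);
            best = Math.max(best, endingHere);
        }
        return best;
    }
}
```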

• Bentley’s implementations:

• O(n3) algorithm in finely-tuned FORTRAN on a Cray-1 supercomputer

• O(n) algorithm in interpreted BASIC on a Radio Shack TRS-80 Model III microcomputer

• estimated running times:

• 3.0n³ nanoseconds on Cray computer

• 19.5n milliseconds (19,500,000n nanoseconds) on Radio Shack computer

Bentley’s results

n       Cray (3.0n³ ns)   TRS-80 (19.5n ms)
10      3.0 μs            0.195 s
100     3.0 ms            1.95 s
1000    3.0 s             19.5 s
10⁴     50 min            3.25 min
10⁵     35 days           32.5 min
10⁶     95 years          5.4 hours

Faster hardware isn’t good enough!

Efficiency of binary search

• asymptotic analysis: interested in behaviour for large vectors

int search(int lo, int hi, Comparable key) {
  int mid = (lo + hi) / 2;
  if (lo == hi) return lo;
  else ...  // compare key to value of data.elementAt(mid)
    return search(mid + 1, hi, key);
    or return search(lo, mid, key);
}

• Each recursive call halves vector:

n n/2 n/4 n/8 n/16 …

after i comparisons, hi - lo = n/2^i

but search ends when hi - lo < 1

and there is O(1) work between calls

⇒ time for binary search is O(log₂ n)

• Doubling the size of the vector requires only one more call to search!

• We usually write O(log n) with no subscript.

Efficiency of implementing Table using Vector

Re-examine worst case:

method                                   unordered vector     ordered vector
                                         comps     moves      comps     moves
tableRetrieve
tableDelete
tableInsert (with array large enough)
tableInsert (with array fully occupied)

Binary search trees

• A binary search tree is an empty binary tree or a labelled binary tree such that:

• The labels can be compared.

• The label of the root of a binary search tree is greater than all labels in its left subtree.

• The label of the root of a binary search tree is less than all labels in its right subtree.

• The left and right subtrees are also binary search trees.

• Note: The binary search tree for a given set of labels is not unique.

Binary search trees as implementations of tables

• labels represent associations (or just keys if no associated values)

• simple code for retrieving an item:

protected static KeyedItem retrieveItem(TreeNode tNode, Comparable searchKey) {
  KeyedItem treeItem;
  if (tNode == null) {
    treeItem = null;
  } else {
    KeyedItem nodeItem = (KeyedItem)tNode.getItem();
    int searchResult = searchKey.compareTo(nodeItem.getKey());
    if (searchResult == 0) {
      treeItem = (KeyedItem)tNode.getItem();
    } else if (searchResult < 0) {
      treeItem = retrieveItem(tNode.getLeft(), searchKey);
    } else {
      treeItem = retrieveItem(tNode.getRight(), searchKey);
    }
  }
  return treeItem;
}

• efficiency?

• Note: Inorder traversal encounters values in increasing order.

Maintaining a binary search tree

• Insertion

Postcondition can be expressed recursively.

• Empty tree: replaced by a leaf node containing the new value

• Otherwise: if the new value is less than the root’s value, inserted in the left subtree; else inserted in the right subtree

What should be done for duplicate keys?
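A minimal sketch of recursive insertion, with an inorder traversal to confirm the ordering property noted above (the `Node` class and the choice to ignore duplicates are assumptions, not the text's code):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal BST sketch: recursive insertion following the postcondition
// above, plus an inorder traversal (left, root, right), which visits
// keys in increasing order.
class Bst {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Empty tree: replaced by a new leaf. Otherwise recurse left or
    // right according to the comparison with the root's key.
    static Node insert(Node t, int key) {
        if (t == null) return new Node(key);
        if (key < t.key) t.left = insert(t.left, key);
        else if (key > t.key) t.right = insert(t.right, key);
        // duplicate keys: ignored here (one possible policy)
        return t;
    }

    static void inorder(Node t, List<Integer> out) {
        if (t == null) return;
        inorder(t.left, out);
        out.add(t.key);
        inorder(t.right, out);
    }
}
```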

• Deletion

Postcondition can be expressed by cases.

1. Not present: no change to the tree

2. Else value to delete found in leaf: that leaf deleted

3. Else value to delete found in node having one empty subtree: that node deleted and other subtree attached to the parent

4. Else value to delete found in node having two non-empty subtrees: that node contains the value previously found in its predecessor (or successor) node and that other node deleted (using either case 2 or 3 as appropriate)

• tableRetrieve, tableInsert, and tableDelete use O(h) time, where h is the height of the tree ⇒ in the best case and on average, h is O(log n)

Sorting

• goals for studying sorting:

• “common knowledge” in computer science

• wide variety of possible approaches

• practice in the design and analysis of algorithms

• practice in assertions and verification

• assumptions:

• All the data can fit in memory.

• Data is all comparable.

• Data is stored in an array of size n.

• 2 important operations affecting time:

• comparisons: comparing values of two data items

• data movements: moving or copying a data item

• Ignore operations on index values, etc.

• Rationale:

• data access and manipulation may be expensive for large objects

• number of other operations executed between comparisons and data movements is bounded by a constant

• 1 important factor affecting space: amount of auxiliary storage

• Always need O(n) space to hold the data itself, but how much other space is needed?

Selection sort

• Idea: repeatedly extract maximal element from among those still unsorted

• How much auxiliary space is needed?

• What is the running time?

• look at the structure of the code:

for (int last = n-1; last >= 1; last--) {
  ... indexOfLargest(...)  // find the largest element in a[0..last], then swap it into a[last]
}
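Filled out as one possible sketch (not the course's exact code), the loop structure becomes:

```java
// Selection sort: on each pass, swap the largest element among
// a[0..last] into position last. O(n^2) comparisons, O(n) swaps,
// O(1) auxiliary space.
class SelectionSort {
    static void sort(int[] a) {
        for (int last = a.length - 1; last >= 1; last--) {
            int largest = 0;                     // index of largest in a[0..last]
            for (int i = 1; i <= last; i++)
                if (a[i] > a[largest]) largest = i;
            int tmp = a[largest];                // swap largest into place
            a[largest] = a[last];
            a[last] = tmp;
        }
    }
}
```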

Linear insertion sort

Idea: repeatedly insert the element that happens to be next into the proper place among those elements already sorted.

see Carrano & Prichard, pp 390-393

• How much auxiliary space is needed?

• What is the running time?
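A sketch of the idea, following the description above rather than Carrano & Prichard's exact code:

```java
// Linear insertion sort: a[0..i-1] is already sorted; slide a[i]
// left past larger elements into its proper place. O(n^2) worst
// case, O(n) on already-sorted input, O(1) auxiliary space.
class InsertionSort {
    static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int item = a[i];
            int j = i;
            while (j > 0 && a[j - 1] > item) {  // shift larger elements right
                a[j] = a[j - 1];
                j--;
            }
            a[j] = item;
        }
    }
}
```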

A recursive sorting algorithm: mergesort

see Carrano & Prichard, pp 393-398

• Important subroutine: merge

• Input is two sorted ranges in an array.

• Identify candidate at start of each input range.

• Repeatedly copy the smaller of the two candidates to the temporary array.

• When one input range is exhausted, simply copy the rest of the other one to the temporary array.

• Copy the temporary array back to the input array.

• time =

• space =
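The merge steps above can be sketched as follows (the index convention is an assumption: two adjacent sorted ranges a[lo..mid-1] and a[mid..hi-1]):

```java
// Merge two adjacent sorted ranges a[lo..mid-1] and a[mid..hi-1]
// using a temporary array, then copy the result back.
class Merge {
    static void merge(int[] a, int lo, int mid, int hi) {
        int[] tmp = new int[hi - lo];
        int i = lo, j = mid, k = 0;
        while (i < mid && j < hi)               // copy the smaller candidate
            tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i < mid) tmp[k++] = a[i++];      // one range exhausted:
        while (j < hi) tmp[k++] = a[j++];       //   copy the rest of the other
        System.arraycopy(tmp, 0, a, lo, tmp.length);
    }
}
```

Each element is copied a constant number of times, so merging n elements takes O(n) time and O(n) auxiliary space for the temporary array.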

Mergesort itself

• Mergesort uses “divide and conquer”

• divide a large problem into smaller problems

• solve the smaller problems

• put the solutions together to form an answer to the larger problem

• In mergesort,

• smaller problems: sorting two half arrays

• to be solved recursively

• put the solutions of those two problems together using merge

• Code outline:

void mergesort(d[0..n-1]) {
  if (n > 1) {
    mergesort(d[0..n/2-1]);
    mergesort(d[n/2..n-1]);
    merge(d[0..n/2-1], d[n/2..n-1]);
  }
}

Analysis of mergesort

• If n is a power of 2, the “tree of problems to solve” looks like: where labels represent sizes of the problems to solve

• Runtime =

• Auxiliary space =

Quicksort

• Idea: pick some pivot element and place it where it belongs;

• sort all elements less than the pivot;

• sort all elements greater than the pivot

• Quicksort also uses divide and conquer, but the sizes of the two problems depend on the input.

see Carrano & Prichard, pp 398-410

• Important subroutine: partition

int partition(Comparable theArray[], int first, int last);
// pre: 0 ≤ first ≤ last < theArray.length
// post: returns split s.t. first ≤ split ≤ last
//       and permutes theArray s.t.
//       theArray[i] ≤ theArray[split] for first ≤ i < split
//       theArray[i] ≥ theArray[split] for split < i ≤ last

Quicksort itself

• parameters are needed to indicate the subarray being sorted by a given recursive call

void quickSortRec(Comparable theArray[], int first, int last) {
// pre: (0 ≤ first ≤ last < theArray.length) or (last < first)
// post: values in theArray[first..last] are permuted into ascending order
  if (first < last) {
    int pivot = partition(theArray, first, last);
    quickSortRec(theArray, first, pivot-1);
    quickSortRec(theArray, pivot+1, last);
  }
}

• a non-recursive wrapper method quickSort(Comparable theArray[], int n) calls

quickSortRec(theArray, 0, n-1);

Analysis of quicksort

• Best case

• each segment split exactly in two ⇒ O(n log n) [similar “tree of problems” to that of mergesort]

• but splits having n/2 elements in each part not likely

• Worst case

• each segment split on its first element ⇒ O(n²)

• but how likely are splits into 0 and n-1 elements?

• So what can we expect on average?

• assume all initial permutations of elements equally likely

• splits having n/4 elements in one of the parts still yield a tree of logarithmic height (the base of the logarithm is smaller, so the value is larger by a constant factor), and the probability of a split at least this good is 0.5

• similarly for splits having n/k elements in one of the parts, for any constant k

• Intuitively: average case is like best case but with larger constant factor.

• O(n log n) average case can be proven using material from Stats 230.

• N.B. Auxiliary space = O(log n) with clever re-programming

Speeding up quicksort

• Try to avoid bad splits (e.g., do not just choose first elements as pivots).

• randomization works well

• worst case still O(n2), but less likely in practice to have “bad” inputs

• Speed up the algorithm in practice by using linear insertion sort on small segments (e.g., < 10 elements).

• Even better: Stop recursion without sorting small segments (e.g., < 10 elements) at all:

• every element within 10 of its final position

• final single call to linear insertion sort finishes overall sort quickly
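Combining two of these speed-ups, randomized pivots plus the leave-small-segments-for-one-final-insertion-sort trick, might look like the sketch below; the cutoff of 10 comes from the example above, while the class name and fixed random seed are illustrative:

```java
import java.util.Random;

// Hybrid quicksort sketch: random pivot choice avoids predictable
// worst-case splits; segments of <= CUTOFF elements are left unsorted,
// so every element ends within CUTOFF of its final position, and a
// single final insertion-sort pass finishes the job quickly.
class HybridQuickSort {
    static final int CUTOFF = 10;
    static final Random RNG = new Random(42);  // fixed seed for reproducibility

    static void sort(int[] a) {
        quick(a, 0, a.length - 1);
        insertion(a);                           // near-linear on almost-sorted data
    }

    static void quick(int[] a, int first, int last) {
        if (last - first + 1 <= CUTOFF) return; // leave small segments alone
        int p = first + RNG.nextInt(last - first + 1);
        int t = a[first]; a[first] = a[p]; a[p] = t;  // random pivot to front
        int pivot = a[first], split = first;          // Lomuto-style partition
        for (int i = first + 1; i <= last; i++)
            if (a[i] < pivot) { split++; t = a[split]; a[split] = a[i]; a[i] = t; }
        t = a[first]; a[first] = a[split]; a[split] = t;
        quick(a, first, split - 1);
        quick(a, split + 1, last);
    }

    static void insertion(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int item = a[i], j = i;
            while (j > 0 && a[j - 1] > item) { a[j] = a[j - 1]; j--; }
            a[j] = item;
        }
    }
}
```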
