Notes on Recurrence Relations and Algorithm Analysis:

The analysis of algorithms is mathematically intensive. Even a casual understanding requires familiarity with some mathematical concepts. Let's start with a (review of the) definition of what complexity classes such as n, log n, etc. mean.

****** Definition of big-O and big-Theta classes ******

Usually, "complexity class" refers to the "big-Theta" notation, though often we abuse definitions and call it "big-O". Big-O is an upper bound. Big-Theta is both an upper and a lower bound. Specifically, a function g(n) is in big-Theta(f(n)) if

   limit    g(n)
   n->inf   ---- = K > 0    // K is a positive constant.
            f(n)

It's big-O if K can be 0, so n is technically in O(n*n). Often people will say O(n) when they really mean big-Theta(n). For example, if g(n) = 2*n-1, then g(n) is big-Theta(n) because the limit (as n approaches infinity) of (2*n-1)/n is 2 (apply L'Hopital's Rule).

To be complete, for the running-time functions T(n) used below, we can also assume that T(1) = 1. Another assumption that we can make is that if m<n, then T(m) <= T(n).

////// I. Analysis of loops. Consider the following version of bubblesort:

   for(int i=0;i<A.length-1;i++)
      for(int k=0;k<A.length-1;k++)
         if (A[k]>A[k+1]) //swap ("bubble to the top")
         {
            double tmp=A[k]; A[k]=A[k+1]; A[k+1]=tmp;
         }

If n==A.length, then the outer loop runs n-1 times and during each iteration the inner loop runs n-1 times, so the if-statement (which is bounded by a constant time) runs (n-1)*(n-1) = n*n - 2*n + 1 times. This function is O(n*n) (technically Theta(n*n)) because

   limit    n*n - 2*n + 1
   n->inf   ------------- = 1, which is a positive constant.
                 n*n

We can write a slightly optimized version of bubblesort by changing the inner loop to

   for(int k=0;k<A.length-1-i;k++)

Now the inner loop runs n-1-i times for each value of i, so the if-statement runs (n-1) + (n-2) + ... + 1 = n*(n-1)/2 times. This version is still Theta(n*n): comparing the unoptimized count to the optimized one,

   limit    n*n - 2*n + 1
   n->inf   ------------- = 2
              n*(n-1)/2

This limit is also a positive constant. Furthermore, the inverse of this limit is 1/2, which is also a positive constant. The limit of 2 tells us that, for large enough values of n, the optimized version runs twice as fast as the unoptimized one.
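To see these counts concretely, here is a small test harness. This is our own sketch, not part of the original notes (the class name BubbleCount and its counting methods are invented for illustration); it counts how many times the if-statement runs in each version:

   public class BubbleCount
   {
      // Unoptimized version: the inner loop always runs n-1 times,
      // giving (n-1)*(n-1) = n*n - 2*n + 1 if-tests in total.
      static long countUnoptimized(int n)
      {
         long count = 0;
         for(int i=0;i<n-1;i++)
            for(int k=0;k<n-1;k++)
               count++;   // one if-test per inner iteration
         return count;
      }

      // Optimized version: the inner loop shrinks by one each pass,
      // giving (n-1) + (n-2) + ... + 1 = n*(n-1)/2 if-tests in total.
      static long countOptimized(int n)
      {
         long count = 0;
         for(int i=0;i<n-1;i++)
            for(int k=0;k<n-1-i;k++)
               count++;
         return count;
      }

      public static void main(String[] args)
      {
         int n = 1000;
         System.out.println(countUnoptimized(n));   // 998001 = 999*999
         System.out.println(countOptimized(n));     // 499500 = 1000*999/2
         // The ratio approaches the limit of 2 computed above:
         System.out.println((double)countUnoptimized(n)/countOptimized(n));
      }
   }

For n = 1000 the printed ratio is about 1.998, already close to the limit of 2 computed above: both versions are Theta(n*n), but the optimized one does about half the work.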
////// II. Memorize the solutions of the following six recurrence relations, and be able to see how they reflect the structure of algorithms. Sometimes these equations are written with a constant C, but here we will just assume that C = 1.

   Recurrence Formula            Complexity Class
   1. T(n) = T(n/2) + 1          log(n)      (logarithmic)
   2. T(n) = T(n-1) + 1          n           (linear)
   3. T(n) = 2*T(n/2) + 1        n           (linear)
   4. T(n) = 2*T(n/2) + n        n*log(n)    ("n log n")
   5. T(n) = T(n-1) + n          n*n         (n squared)
   6. T(n) = 2*T(n-1) + 1        2**n        (exponential)

You don't have to be able to derive these forms yourself, but you should at least memorize them, for these are common forms that can be given to many algorithms. Note that the differences between these formulas can be subtle in appearance.

////// III. You must understand how these relations represent the average- and worst-case complexities of typical algorithms. Here are examples of algorithms for each of the six forms, respectively:

1. log n. Binary search (worst case for sorted arrays, average case for unbalanced search trees). Recurrence relations do NOT necessarily only represent recursive programs. For example, binary search is an algorithm that can be implemented with a while loop just as easily as with recursion, but the complexity classification is the same.

2. n. Any iterative algorithm (a common loop) that goes through a linear sequence of values one by one, such as printing the contents of a linked list or array.

3. n. Summing the numbers in a binary tree (average case). Many algorithms on binary trees can be written with two recursive calls, on the left and right subtrees. Many of these algorithms will have a recurrence relation of this form.

4. n*log(n). Worst case heapsort, mergesort, average case quicksort. This very important relation can be given to many of the most important algorithms in computer science, including the "fast Fourier transform", which is "fast" because the regular version is n*n instead of n*log(n).

5. n*n. Worst case quicksort, bubblesort. Most things that involve a nested loop of the form for(i=1..n){for(j=1..n){..}}.

6. 2**n. Naive fibonacci (fib(n) = fib(n-1) + fib(n-2)). This function's behavior is caused by redundant computation, not by "making too many recursive calls"; most of the algorithms of form 4 (n*log(n)) also make multiple recursive calls. Also, algorithms that must exhaustively enumerate all possible solutions often have this form. A complexity class that's also exponential is n! (n-factorial). An algorithm that has to search through all possible permutations of some string, for example, will take O(n!) time and thus is also exponential.

Note that the naive fibonacci function doesn't exactly match the recurrence relation T(n) = 2*T(n-1) + 1. It's more like T(n) = T(n-1) + T(n-2) + 1. However, we can assume that T(n-2) <= T(n-1), so the time complexity of naive fibonacci actually falls between T(n) = 2*T(n-2) + 1 and T(n) = 2*T(n-1) + 1, and O(2**n) represents a reasonable upper bound. The solution to T(n) = 2*T(n-2) + 1 is 2 to the n/2 power, or sqrt(2) to the nth power, since the argument decreases twice as fast (see formula 6G below). Thus we can more precisely say that the naive fibonacci function falls between sqrt(2)**n and 2**n in time complexity, but in any case that's exponential time.
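The exponential growth of formula 6 is easy to observe by instrumenting the naive fibonacci function with a call counter. This harness is our own sketch (the class FibCount and the calls counter are not part of the notes):

   public class FibCount
   {
      static long calls = 0;

      // Naive fibonacci: T(n) = T(n-1) + T(n-2) + 1,
      // with the standard base cases fib(0)=0, fib(1)=1.
      static long fib(int n)
      {
         calls++;
         if (n < 2) return n;
         return fib(n-1) + fib(n-2);
      }

      public static void main(String[] args)
      {
         for(int n = 10; n <= 30; n += 5)
         {
            calls = 0;
            fib(n);
            System.out.println("fib(" + n + ") made " + calls + " calls");
         }
      }
   }

Each increase of n by 1 multiplies the number of calls by roughly 1.618 (the golden ratio), which indeed lies between sqrt(2) (about 1.414) and 2, matching the bounds derived above.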
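Since the cost comes from redundant computation, a useful contrast is to cache each result so that no value is ever computed twice. This memoized version is our own addition (memoization is not covered in these notes); with the cache, each value up to n is computed only once, so the running time drops to linear, as if the recurrence were T(n) = T(n-1) + 1:

   import java.util.HashMap;

   public class FibMemo
   {
      static HashMap<Integer,Long> cache = new HashMap<>();

      static long fib(int n)
      {
         if (n < 2) return n;
         if (cache.containsKey(n)) return cache.get(n);  // reuse, no recomputation
         long result = fib(n-1) + fib(n-2);
         cache.put(n, result);
         return result;
      }

      public static void main(String[] args)
      {
         // Instant; the naive version would make billions of calls for n=50.
         System.out.println(fib(50));
      }
   }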
Additional hints: What recurrence relation best describes the following function (be careful)?

   int f(int n)
   {
      if (n<1) return 1;
      else return 2*f(n-1);
   }

This function computes 2 to the nth power, and the correct recurrence relation is T(n) = T(n-1) + 1, which means it's O(n), linear time. It makes only ONE recursive call, not two. The recurrence relation is an abstract description of the *structure* of the program. Don't let the syntax trick you.

How about this one:

   double g(int x, int y)
   {
      if (x==0 && y==0) return 1;
      if (x>0) return g(x-1,y);
      else return g(x,y-1);
   }

Here, we can let n = x+y, so decreasing either x or y by 1 also decreases n by 1. The relation is still T(n) = T(n-1) + 1: at most one recursive call is made.

*** Generalizations ***

The six forms above represent many common algorithms, but they can be generalized. In particular, formula 5 can be generalized to:

   5G. T(n) = T(n-1) + f(n)      solution O(n * f(n))

Here, f(n) is any function of n. For example, if f(n) = log(n), then the solution is n*log(n). If f(n) = n, then the solution is n*n. This formula actually subsumes formula no. 2, where f(n) = 1.

Formula 6 generalizes in the same way:

   6G. T(n) = 2*T(n-b) + 1      solution O(2**(n/b))

When the argument decreases by b instead of 1, the exponent grows b times more slowly. This is the form used in the fibonacci discussion above, with b = 2.

The recurrence relations can also be used to analyze loops. Mathematically, all computer programs fall under the category of "partially recursive functions": "partially" refers to the fact that some programs don't terminate. A for-loop is just a special, commonly used form of recursion. How do we execute a loop that runs n times, such as

   for(int i=0;i<n;i++) { loop body }

We execute the loop body once, then execute a loop that runs n-1 times. If the body takes f(n) time, this is exactly formula 5G: T(n) = T(n-1) + f(n). With a constant-time body it is formula 2, and the loop takes linear time.

*** The Master Theorem ***

Many divide-and-conquer recurrences, including formulas 1, 3, and 4 above, are solved by the Master Theorem. It applies to recurrences of the form

   T(n) = a*T(n/b) + f(n)    for n>1, a>=1, b>=2, and f(n) in order(n**d)
   T(1) = k                  for constant k>0

Then T(n) has the following complexity classifications (Theta, not just big-O):

   T(n) is order n**d (n to the dth power)    if a < b**d
   T(n) is order n**d * log(n)                if a = b**d
   T(n) is order n**(log(a)/log(b))           if a > b**d

So for T(n) = 2*T(n/2) + n, the second case applies because a = b = 2 and d = 1, so this relation has a solution in order n*log(n). For T(n) = 2*T(n/2) + 1, the THIRD case applies, with a = b = 2 and d = 0, because f(n) is in order(n**0) = order(1). So a > b**d, and the solution is in order n**(log(a)/log(b)) = n**1 = n. But the Master Theorem cannot solve T(n) = T(n-1) + 1, for example, since b must be at least 2.
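The Master Theorem also confirms formula 1: for T(n) = T(n/2) + 1 we have a = 1, b = 2, d = 0, so a = b**d and the second case gives order n**0 * log(n) = log(n). That is the binary search recurrence, and as noted in section III the classification is the same whether the code recurses or loops. Here is a minimal iterative sketch of our own (the class SearchDemo is invented for illustration, not from the notes):

   public class SearchDemo
   {
      // Each pass halves the remaining range: T(n) = T(n/2) + 1, hence log(n).
      static int binarySearch(double[] A, double target)
      {
         int lo = 0, hi = A.length - 1;
         while (lo <= hi)
         {
            int mid = lo + (hi - lo) / 2;            // midpoint of current range
            if (A[mid] == target) return mid;        // found
            else if (A[mid] < target) lo = mid + 1;  // discard left half
            else hi = mid - 1;                       // discard right half
         }
         return -1;                                  // not found
      }

      public static void main(String[] args)
      {
         double[] A = {1.5, 2.5, 4.0, 8.0, 16.0};
         System.out.println(binarySearch(A, 8.0));   // prints 3
         System.out.println(binarySearch(A, 3.0));   // prints -1
      }
   }

The midpoint is written as lo + (hi - lo)/2 rather than (lo + hi)/2 to avoid integer overflow on very large arrays; either form halves the range, which is what gives the log(n) bound.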