Notes on Recurrence Relations and Algorithm Analysis:

The analysis of algorithms is mathematically intensive. Even a casual understanding requires familiarity with some mathematical concepts. Let's start with a (review of the) definition of what complexity classes such as n, log n, etc. mean.

****** Definition of big-O and big-Theta classes ******

Usually, "complexity class" refers to the "big-Theta" notation, though often we abuse definitions and call it "big-O". Big-O is an upper bound. Big-Theta is both an upper and a lower bound. Specifically, a function g(n) is in big-Theta(f(n)) if

   limit    g(n)
   n->inf   ---- = K > 0    // K is a positive constant.
            f(n)

It's big-O if K can be 0, so n is technically in O(n*n). Often people will say O(n) when they really mean big-Theta(n). For example, if g(n) = 2*n-1, then g(n) is big-Theta(n) because the limit (as n approaches infinity) of (2*n-1)/n is 2 (apply L'Hopital's Rule).

To be complete, for the running-time functions T(n) used below, we can also assume that T(1) = 1. Another assumption that we can make is that if m<n, then T(m) <= T(n).

////// I. Analysis of loops. Consider the following version of bubblesort:

   for(int i=0;i<A.length-1;i++)
      for(int k=0;k<A.length-1;k++)
         if (A[k]>A[k+1]) //swap ("bubble to the top")
         {
            double tmp=A[k]; A[k]=A[k+1]; A[k+1]=tmp;
         }

If n==A.length, then the outer loop runs n-1 times and during each iteration the inner loop runs n-1 times, so the if-statement (which is bounded by a constant time) runs (n-1)*(n-1) = n*n - 2*n + 1 times. This function is O(n*n) (technically Theta(n*n)) because

   limit    n*n - 2*n + 1
   n->inf   ------------- = 1, which is a positive constant.
                 n*n

We can write a slightly optimized version of bubblesort by changing the inner loop to

   for(int k=0;k<A.length-1-i;k++)

Now the inner loop runs n-1-i times for each value of i, so the if-statement runs (n-1) + (n-2) + ... + 1 = n*(n-1)/2 times. This version is still Theta(n*n): comparing the unoptimized count to the optimized one,

   limit    n*n - 2*n + 1
   n->inf   ------------- = 2
              n*(n-1)/2

This limit is also a positive constant. Furthermore, the inverse of this limit is 1/2, which is also a positive constant. The limit of 2 tells us that, for large enough values of n, the optimized version runs twice as fast as the unoptimized one.
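To see these counts concretely, here is a small test harness. This is our own sketch, not part of the original notes (the class name BubbleCount and its counting methods are invented for illustration); it counts how many times the if-statement runs in each version:

   public class BubbleCount
   {
      // Unoptimized version: the inner loop always runs n-1 times,
      // giving (n-1)*(n-1) = n*n - 2*n + 1 if-tests in total.
      static long countUnoptimized(int n)
      {
         long count = 0;
         for(int i=0;i<n-1;i++)
            for(int k=0;k<n-1;k++)
               count++;   // one if-test per inner iteration
         return count;
      }

      // Optimized version: the inner loop shrinks by one each pass,
      // giving (n-1) + (n-2) + ... + 1 = n*(n-1)/2 if-tests in total.
      static long countOptimized(int n)
      {
         long count = 0;
         for(int i=0;i<n-1;i++)
            for(int k=0;k<n-1-i;k++)
               count++;
         return count;
      }

      public static void main(String[] args)
      {
         int n = 1000;
         System.out.println(countUnoptimized(n));   // 998001 = 999*999
         System.out.println(countOptimized(n));     // 499500 = 1000*999/2
         // The ratio approaches the limit of 2 computed above:
         System.out.println((double)countUnoptimized(n)/countOptimized(n));
      }
   }

For n = 1000 the printed ratio is about 1.998, already close to the limit of 2 computed above: both versions are Theta(n*n), but the optimized one does about half the work.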
////// II. Memorize the solutions of the following six recurrence relations, and be able to see how they reflect the structure of algorithms. Sometimes these equations are written with a constant C, but here we will just assume that C = 1.

   Recurrence Formula            Complexity Class
   1. T(n) = T(n/2) + 1          log(n)      (logarithmic)
   2. T(n) = T(n-1) + 1          n           (linear)
   3. T(n) = 2*T(n/2) + 1        n           (linear)
   4. T(n) = 2*T(n/2) + n        n*log(n)    ("n log n")
   5. T(n) = T(n-1) + n          n*n         (n squared)
   6. T(n) = 2*T(n-1) + 1        2**n        (exponential)

You don't have to be able to derive these forms yourself, but you should at least memorize them, for these are common forms that can be given to many algorithms. Note that the differences between these formulas can be subtle in appearance.

////// III. You must understand how these relations represent the average- and worst-case complexities of typical algorithms. Here are examples of algorithms for each of the six forms, respectively:

1. log n. Binary search (worst case for sorted arrays, average case for unbalanced search trees). Recurrence relations do NOT necessarily only represent recursive programs. For example, binary search is an algorithm that can be implemented with a while loop just as easily as with recursion, but the complexity classification is the same.

2. n. Any iterative algorithm (a common loop) that goes through a linear sequence of values one by one, such as printing the contents of a linked list or array.

3. n. Summing the numbers in a binary tree (average case). Many algorithms on binary trees can be written with two recursive calls, on the left and right subtrees. Many of these algorithms will have a recurrence relation of this form.

4. n*log(n). Worst case heapsort, mergesort, average case quicksort. This very important relation can be given to many of the most important algorithms in computer science, including the "fast Fourier transform", which is "fast" because the regular version is n*n instead of n*log(n).

5. n*n. Worst case quicksort, bubblesort. Most things that involve a nested loop of the form for(i=1..n){for(j=1..n){..}}.

6. 2**n. Naive fibonacci (fib(n) = fib(n-1) + fib(n-2)). This function's behavior is caused by redundant computation, not by "making too many recursive calls"; most of the algorithms of form 4 (n*log(n)) also make multiple recursive calls. Also, algorithms that must exhaustively enumerate all possible solutions often have this form. A complexity class that's also exponential is n! (n-factorial). An algorithm that has to search through all possible permutations of some string, for example, will take O(n!) time and thus is also exponential.

Note that the naive fibonacci function doesn't exactly match the recurrence relation T(n) = 2*T(n-1) + 1. It's more like T(n) = T(n-1) + T(n-2) + 1. However, we can assume that T(n-2) <= T(n-1), so the time complexity of naive fibonacci actually falls between T(n) = 2*T(n-2) + 1 and T(n) = 2*T(n-1) + 1, and O(2**n) represents a reasonable upper bound. The solution to T(n) = 2*T(n-2) + 1 is 2 to the n/2 power, or sqrt(2) to the nth power, since the argument decreases twice as fast (see formula 6G below). Thus we can more precisely say that the naive fibonacci function falls between sqrt(2)**n and 2**n in time complexity, but in any case that's exponential time.
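The exponential growth of formula 6 is easy to observe by instrumenting the naive fibonacci function with a call counter. This harness is our own sketch (the class FibCount and the calls counter are not part of the notes):

   public class FibCount
   {
      static long calls = 0;

      // Naive fibonacci: T(n) = T(n-1) + T(n-2) + 1,
      // with the standard base cases fib(0)=0, fib(1)=1.
      static long fib(int n)
      {
         calls++;
         if (n < 2) return n;
         return fib(n-1) + fib(n-2);
      }

      public static void main(String[] args)
      {
         for(int n = 10; n <= 30; n += 5)
         {
            calls = 0;
            fib(n);
            System.out.println("fib(" + n + ") made " + calls + " calls");
         }
      }
   }

Each increase of n by 1 multiplies the number of calls by roughly 1.618 (the golden ratio), which indeed lies between sqrt(2) (about 1.414) and 2, matching the bounds derived above.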
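Since the cost comes from redundant computation, a useful contrast is to cache each result so that no value is ever computed twice. This memoized version is our own addition (memoization is not covered in these notes); with the cache, each value up to n is computed only once, so the running time drops to linear, as if the recurrence were T(n) = T(n-1) + 1:

   import java.util.HashMap;

   public class FibMemo
   {
      static HashMap<Integer,Long> cache = new HashMap<>();

      static long fib(int n)
      {
         if (n < 2) return n;
         if (cache.containsKey(n)) return cache.get(n);  // reuse, no recomputation
         long result = fib(n-1) + fib(n-2);
         cache.put(n, result);
         return result;
      }

      public static void main(String[] args)
      {
         // Instant; the naive version would make billions of calls for n=50.
         System.out.println(fib(50));
      }
   }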
Additional hints: What recurrence relation best describes the following function (be careful)?

   int f(int n)
   {
      if (n<1) return 1;
      else return 2*f(n-1);
   }

This function computes 2 to the nth power, and the correct recurrence relation is T(n) = T(n-1) + 1, which means it's O(n), linear time. It makes only ONE recursive call, not two. The recurrence relation is an abstract description of the *structure* of the program. Don't let the syntax trick you.

How about this one:

   double g(int x, int y)
   {
      if (x==0 && y==0) return 1;
      if (x>0) return g(x-1,y);
      else return g(x,y-1);
   }

Here, we can let n = x+y, so decreasing either x or y by 1 also decreases n by 1. The relation is still T(n) = T(n-1) + 1: at most one recursive call is made.

*** Generalizations ***

The six forms above represent many common algorithms, but they can be generalized. In particular, formula 5 can be generalized to:

   5G. T(n) = T(n-1) + f(n)      solution O(n * f(n))

Here, f(n) is any function of n. For example, if f(n) = log(n), then the solution is n*log(n). If f(n) = n, then the solution is n*n. This formula actually subsumes formula no. 2, where f(n) = 1.

Formula 6 generalizes in the same way:

   6G. T(n) = 2*T(n-b) + 1      solution O(2**(n/b))

When the argument decreases by b instead of 1, the exponent grows b times more slowly. This is the form used in the fibonacci discussion above, with b = 2.

The recurrence relations can also be used to analyze loops. Mathematically, all computer programs fall under the category of "partially recursive functions": "partially" refers to the fact that some programs don't terminate. A for-loop is just a special, commonly used form of recursion. How do we execute a loop that runs n times, such as

   for(int i=0;i<n;i++) { loop body }

We execute the loop body once, then execute a loop that runs n-1 times. If the body takes f(n) time, this is exactly formula 5G: T(n) = T(n-1) + f(n). With a constant-time body it is formula 2, and the loop takes linear time.

*** The Master Theorem ***

Many divide-and-conquer recurrences, including formulas 1, 3, and 4 above, are solved by the Master Theorem. It applies to recurrences of the form

   T(n) = a*T(n/b) + f(n)    for n>1, a>=1, b>=2, and f(n) in order(n**d)
   T(1) = k                  for constant k>0

Then T(n) has the following complexity classifications (Theta, not just big-O):

   T(n) is order n**d (n to the dth power)    if a < b**d
   T(n) is order n**d * log(n)                if a = b**d
   T(n) is order n**(log(a)/log(b))           if a > b**d

So for T(n) = 2*T(n/2) + n, the second case applies because a = b = 2 and d = 1, so this relation has a solution in order n*log(n). For T(n) = 2*T(n/2) + 1, the THIRD case applies, with a = b = 2 and d = 0, because f(n) is in order(n**0) = order(1). So a > b**d, and the solution is in order n**(log(a)/log(b)) = n**1 = n. But the Master Theorem cannot solve T(n) = T(n-1) + 1, for example, since b must be at least 2.
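The Master Theorem also confirms formula 1: for T(n) = T(n/2) + 1 we have a = 1, b = 2, d = 0, so a = b**d and the second case gives order n**0 * log(n) = log(n). That is the binary search recurrence, and as noted in section III the classification is the same whether the code recurses or loops. Here is a minimal iterative sketch of our own (the class SearchDemo is invented for illustration, not from the notes):

   public class SearchDemo
   {
      // Each pass halves the remaining range: T(n) = T(n/2) + 1, hence log(n).
      static int binarySearch(double[] A, double target)
      {
         int lo = 0, hi = A.length - 1;
         while (lo <= hi)
         {
            int mid = lo + (hi - lo) / 2;            // midpoint of current range
            if (A[mid] == target) return mid;        // found
            else if (A[mid] < target) lo = mid + 1;  // discard left half
            else hi = mid - 1;                       // discard right half
         }
         return -1;                                  // not found
      }

      public static void main(String[] args)
      {
         double[] A = {1.5, 2.5, 4.0, 8.0, 16.0};
         System.out.println(binarySearch(A, 8.0));   // prints 3
         System.out.println(binarySearch(A, 3.0));   // prints -1
      }
   }

The midpoint is written as lo + (hi - lo)/2 rather than (lo + hi)/2 to avoid integer overflow on very large arrays; either form halves the range, which is what gives the log(n) bound.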