CSC 17 Assignment: DNA Sequence Alignment with Dynamic Programming Part II: Implementation and Testing Due Thursday 4/3 You MUST implement BOTH the "simple" and "advanced" scoring schemes as described in the handout, and do it in a modular manner (such as using inheritance). I will NOT accept two different versions of the program, one for each scoring scheme. I will also be looking at your code, and it should follow the outline you've agreed upon, unless there was a significant problem no one could have forseen. To test your program, you need to: 1. Print out the alignment and alignment score, such as: CATTAATTACACTCTCGCACTCAC-CACCAAACATCCTA-AACCCAGACAGGCCTCGACTCC | ||| | | ||||| || | || | | || || | ||| | ||| | -ACTAAACA-AGACTCGC-CTGTCTAACTAGGGAGTTTATAATGAACCGTGGCGTAGAC-CA Alignment score is 27 (sample was produced using the advanced scoring scheme) 2. Use small, hand crafted examples, such as (but not limited to): AATCG ATCAG (checks if it figures out where to put the gaps) ********* For at least one example, you need to show how your program generated output that corresponds to what you have worked out on paper. You need to be able to print the matrix generated by your program, and show that it corresponds to what you have done by hand. ********* 3. Use large, randomly generated sequences. Test your program several times before declaring that "it works". Be sure to use examples where the lengths of the sequences are not the same. Here's a function that will generate a random sequence of length n public static String randseq(int n) { String S = ""; for(int i=1;i<=n;i++) { int r = (int)(Math.random()*4); if (r==0) S += 'A'; else if (r==1) S += 'C'; else if (r==2) S += 'G'; else S += 'T'; } return S; } // randseq However, the above function assumes you're going to represent the sequence as a String. But you need to remember that the matrix actually contains one more row and one more column, and you must adjust your string and array indices accordingly. Alternatively, you can use a char[] array to represent the matrix, and align the coordinates to begin with: public static char[] randseq(int n) { char[] S = new char[n+1]; // note one extra space added. for(int i=1;i<=n;i++) { int r = (int)(Math.random()*4); if (r==0) S[i] = 'A'; else if (r==1) S[i] = 'C'; else if (r==2) S[i] = 'G'; else S[i] = 'T'; } S[0] = '-'; // not really used return S; } // randseq Notice that in the function above, the loop started at index 1 (which means you need to allocate at least n+1 bytes to the array). This is to make the array coorespond to the matrix, which also must have an extra row and column. -------- String manipulations. You will of course want to print the sequence as a string. Alternatively, you may choose to use a string to begin with instead of a char[]. Consult the Java API documentation for the String class, where you'll find many useful functions built-in. For example, given char[] A: String s= new String(A); will transform the array of chars to a string, and s.toCharArray() will convert it back. Strings have the advantage in that they can be easily modified. For example, s = 'A'+s; will add a new character to the front of the string. ------- Using command-line arguments. You'll notice that my program accepts both numeric and char-based command-line arguments. To tell the difference, simply check the ascii value of the first character of the first argument. That is, if args[0].charAt(0) > (int)'9' then the first argument is not numeric. ------ Here's a problem that have caused me trouble in the past: when dealing with a coordinate system, we usually think in terms of x, then y, or column then row (this is how the tutorial refers to the coordinates). But in the C/C++ (and Java) programming language, 2D arrays are stored in row-major order, and we typically give the row (y) coordinate first. Thus your program may behave in a way that's different from how you envision it. Be very careful, and consistent, with how you deal with array coordinates. ------ Extra Credit: There is a flaw in the tutorial's "advanced scoring scheme" section. Find and correct it. Hint: during traceback, consider what happens when one of x,y gets to 0, but the other one does not.