Dynamic Programming Exercises. BME100 Paper Homework #1

Fall 2001
Due 29 Oct 2001
(Last Update: 12:07 PST 5 November 2001 )

There is a nice review of dynamic programming on the web at http://www.blc.arizona.edu/courses/bioinformatics/salign.html . Also at that site is a copy of the BLOSUM62 matrix, though you might prefer the output from the program that created blast, as it is machine-readable and makes it clear that 1/2 bit units were used.

There is a nice tutorial on dynamic programming at http://www.sbc.su.se/~per/molbioinfo2001/seqali-dyn.html with pointers to an example at http://www.sbc.su.se/~per/molbioinfo2001/dynprog/dynamic.html. It uses a more expensive alignment algorithm that allows arbitrary gap costs for different length gaps, though only an affine gap cost is used. If you want a textbook presentation, one of the best I know is in Chapter 2 of Durbin, Eddy, Krogh, and Mitchison's book Biological Sequence Analysis.

The version of Needleman-Wunsch we presented in class used only the nearest neighbors, prohibited insert->delete and delete->insert transitions, but needed 3 matrices:

Mi, j = max { Di - 1, j - 1 + subst(Ai, Bj)
Mi - 1, j - 1 + subst(Ai, Bj)
Ii - 1, j - 1 + subst(Ai, Bj)

Di, j = max { Di , j - 1 - extend
Mi , j - 1 - open

Ii, j = max { Mi-1 , j - open
Ii-1 , j - extend

Q1:
Compute the score for the following global alignment using the BLOSUM62 scoring matrix with a gap-opening cost of 11 and a gap-extension cost of 1 (both are in 1/2 bit units to be compatible with the BLOSUM62 matrix). Show what you added up to get the score, since the final number alone could easily be incorrect due to a transcription or addition error.
    ----LADEIAQ--LASEVAQ-
    AAAAVAEDVARSTIA-ELSRT
    
Q2:
Use the Needleman-Wunsch (global) alignment algorithm to find the optimal alignment between LADEIAQ and AAVAEDRTI, using BLOSUM62 and gap costs 11 and 1. Provide an alignment matrix showing the match, delete, and insert scores for the best path up to that point in the matrix. Connect the points for the best path and provide the alignment in the same format as in Question 1. Report the score for the best path.
Q3:
Repeat Question 2, but use the Smith-Waterman algorithm to find the optimal local alignment. Again, show the matrix, the path, the alignment, and the score.
Q4:
Write a short program (in any programming language) that takes two protein sequences and aligns them with the Smith-Waterman algorithm. You can hardwire the BLSOUM62 matrix into your code---this is intended to be a check that you understand the algorithm thoroughly, not a full-featured, user-friendly program. Please do Question 3 by hand before writing the program---that will give you an almost independent check of your understanding.

Turn in source code and the output for running the program on the sequences of Questions 1, 2, and 3. It might be a good idea to use some longer sequences as test cases also. You could try some of the BLAST results from previous homework.


What was learned after the assignment.

slug icon to go to School of Engineering home page SoE home     UCSC Bioinformatics Home Page     BME 100 home page    

Questions about page content should be directed to

Kevin Karplus
Computer Engineering
University of California, Santa Cruz
Santa Cruz, CA 95064
USA
karplus@soe.ucsc.edu
1-831-459-4250