Dynamic Programming Exercises. BME100 Paper Homework #1
Fall 2001
Due 29 Oct 2001
(Last Update:
12:07 PST 5 November 2001
)
There is a nice review of dynamic programming on the web at
http://www.blc.arizona.edu/courses/bioinformatics/salign.html
.
Also at that site is a copy of the
BLOSUM62 matrix, though you might prefer the output from
the program that created blast, as it is machine-readable and makes it
clear that 1/2 bit units were used.
There is a nice tutorial on dynamic programming at
http://www.sbc.su.se/~per/molbioinfo2001/seqali-dyn.html
with pointers to an example at
http://www.sbc.su.se/~per/molbioinfo2001/dynprog/dynamic.html.
It uses a more expensive alignment algorithm that allows arbitrary gap
costs for different length gaps, though only an affine gap cost is used.
If you want a textbook presentation, one of the best I know is in
Chapter 2 of Durbin, Eddy, Krogh, and Mitchison's book Biological
Sequence Analysis.
The version of Needleman-Wunsch we presented in class used only the
nearest neighbors, prohibited insert->delete and delete->insert
transitions, but needed 3 matrices:
| Mi, j = max { |
Di - 1, j - 1 + subst(Ai, Bj)
|
|
Mi - 1, j - 1 + subst(Ai, Bj)
|
|
Ii - 1, j - 1 + subst(Ai, Bj)
|
| Di, j = max { |
Di , j - 1 - extend
|
|
Mi , j - 1 - open
|
| Ii, j = max { |
Mi-1 , j - open
|
|
Ii-1 , j - extend
|
- Q1:
- Compute the score for the following global alignment using
the BLOSUM62 scoring matrix with a gap-opening cost of 11 and a
gap-extension cost of 1 (both are in 1/2 bit units to be
compatible with the BLOSUM62 matrix). Show what you added up to
get the score, since the final number alone could easily be
incorrect due to a transcription or addition error.
----LADEIAQ--LASEVAQ-
AAAAVAEDVARSTIA-ELSRT
- Q2:
- Use the Needleman-Wunsch (global) alignment algorithm to find
the optimal alignment between LADEIAQ and
AAVAEDRTI, using BLOSUM62 and gap costs 11 and 1.
Provide an alignment matrix showing the match, delete, and insert
scores for the best path up to that point in the matrix.
Connect the points for the best path and provide the alignment in
the same format as in Question 1. Report the score for the best path.
- Q3:
- Repeat Question 2, but use the Smith-Waterman algorithm to
find the optimal local alignment. Again, show the matrix, the
path, the alignment, and the score.
- Q4:
- Write a short program (in any programming language) that
takes two protein sequences and aligns them with the
Smith-Waterman algorithm. You can hardwire the BLSOUM62 matrix
into your code---this is intended to be a check that you
understand the algorithm thoroughly, not a full-featured,
user-friendly program. Please do Question 3 by hand before
writing the program---that will give you an almost independent
check of your understanding.
Turn in source code and the output for running the program on the
sequences of Questions 1, 2, and 3. It might be a good idea to
use some longer sequences as test cases also. You could try some
of the BLAST results from previous homework.
What was learned after the assignment.
- About half the students had the boundary conditions for Insert
and Delete swapped. We probably need to give these explicitly on the
web site and not just in class.
- One student misunderstood where the 0 comes in Smith-Waterman,
and omitted the match score for the initial match of an alignment.
Again, the recurrence and initial conditions should be given
explicitly on the web, and not just in class.
- Two students misunderstood the traceback algorithm somewhat. They tried
implementing a record of the "from" location, rather than
recomputing, but recorded just which of (M,I,D) is largest.
Since we were not allowing I->D and D->I transitions, this is
not enough information to correctly reconstruct the path in
general.
SoE home
UCSC Bioinformatics Home Page
BME 100 home page
Questions about page content should be directed to
Kevin Karplus
Computer Engineering
University of California, Santa Cruz
Santa Cruz, CA 95064
USA
karplus@soe.ucsc.edu
1-831-459-4250