For catalog copy and prerequisites, see the main page for BME205.
Lectures: MWF 2-3:10 PSB 305
Online discussion forum: BME 205 forum
This book is a tutorial introduction to the use of hidden Markov models and other probabilistic models for sequence analysis problems in computational molecular biology, but is aimed mainly at a graduate-student audience. We've been using it for years in this class, and have not yet found as detailed a text.
This is a text and reference book that every bioinformatics programmer should have. I don't follow the book very closely, so you will have to figure out for yourself when it is appropriate to read various sections.
Get the second edition, if you can, which has made corrections as indicated on the errata page.
BME 205 will be using Python rather than Perl this year. Since I am just learning Python myself, I have not had time to review all the available books. I initially told the bookstore to order Programming Python by Mark Lutz, based on the parallelism with the Programming Perl book we have used in the past, but this was a mistake: it is a sequel to Learning Python by Mark Lutz, and both are really terrible for this class. It takes Lutz forever to get to the point of anything, and material is scattered in random order. I read over a hundred pages of Learning Python, was still not prepared to write even a short Python program, and was heartily sick of the Python boosterism. Python in a Nutshell is much better organized and gets to the point immediately, but is short on examples.
The best source I've found is the online documentation at http://docs.python.org/tutorial/, http://docs.python.org/reference/, and http://docs.python.org/library/. One Python user highly recommends the index http://docs.python.org/genindex.html, but I've done just as well using Google with python and the subject I'm interested in.
If you need a more tutorial introduction, I see that the undergrad course CMPS 5P has used Python for Software Design: How to think like a computer scientist by Allen Downey (Green Tea Press) as a text. An earlier manuscript of the book is available for free.
Lynn Olson, a new PhD student, recommends Dive into Python, a free on-line book for experienced programmers.
This book came out in summer 2004. It looks like it may be a valuable supplementary text, as it seems to be easier to read and at a slightly less advanced level than the Durbin et al. book. The description of sequence-sequence alignment and HMMs does not seem quite detailed enough for this class, though.
Note: the science library now has Darling model kits that you can check out! Also, several of the more advanced graduate students may be willing to lend their model kits.
Some initial instructions for building a protein backbone with this model kit are available.
| Date | Have read these sections |
|---|---|
| 2 Oct 2009 | 1.1–1.4 |
| 9 Oct 2009 | 11.1–11.6 |
| 15 Oct 2009 | 2.1–2.9 |
| 23 Oct 2009 | 3.1–3.6 |
| 30 Oct 2009 | 5.1–5.8 |
| 6 Nov 2009 | 4.1–4.5 |
| 13 Nov 2009 | 6.1–6.5 |
| 20 Nov 2009 | 7.1–7.6 |
| Date (to be) released | Assignment | Date Due |
|---|---|---|
| 31 Aug 2009 | prereq survey | 25 Sept 2009 |
| 1 Sept 2009 | python1 (FASTA parsing) | 9 Oct 2009 |
| 18 Sept 2009 | fellowship application | 16 Oct 2009 |
| 4 Sept 2009 | python2 (Markov chains) | 23 Oct 2009 |
| 20 Oct 2009 | Darling models | 30 Oct 2009 |
| 10 Sept 2009 | python3 (finding palindromes) | 6 Nov 2009 |
| 14 Sept 2009 | python4 (null models) | 13 Nov 2009 |
| 3 Nov 2009 | web and literature search | 20 Nov 2009 |
| 16 Sept 2009 | python5 (affine-gap alignment) | Mon 30 Nov 2009 |
| 6 Sept 2009 | python6 (cassette mutagenesis) | Mon 7 Dec 2009 |
Every student in the class will need a School of Engineering computer account. I will want assignments turned in by providing me with a publicly readable file (PDF for written assignments) or directory (for multi-file assignments) containing the assignment on the SoE machines. All Python programs must execute correctly on the SoE machines, without needing to install additional Python modules. I would prefer to get paper copies of assignments in addition to the electronic ones (to save me the time of printing them), but I will accept electronic-only submissions from those who are too ill to attend class.
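For those unsure how to make a file or directory publicly readable, here is a minimal sketch in Python using only the standard library. The function name `make_publicly_readable` and the example directory name are hypothetical, and this is one possible way to set the permissions, not a required procedure. It assumes a Unix-style file system like the one on the SoE machines.

```python
import os
import stat


def make_publicly_readable(path):
    """Add read permission for everyone to path.

    Directories also get search (execute) permission, and their
    contents are handled recursively. Existing permission bits are
    kept; bits are only added, never removed.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    read_bits = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
    if os.path.isdir(path):
        # A directory must be searchable as well as readable
        # before others can open the files inside it.
        search_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
        os.chmod(path, mode | read_bits | search_bits)
        for name in os.listdir(path):
            make_publicly_readable(os.path.join(path, name))
    else:
        os.chmod(path, mode | read_bits)


if __name__ == "__main__":
    # Hypothetical directory name; substitute your own.
    example_dir = "python1"
    if os.path.exists(example_dir):
        make_publicly_readable(example_dir)
```

Note that every directory above the one you submit must also be searchable by others, or the file will still be inaccessible; the sketch does not change parent directories.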
To get an SoE computer account see http://support.soe.ucsc.edu/new-accounts
As has been my practice since Fall 2001, there will be no exams, and we will not meet during the final exam period (Mon 7 Dec 2009, 7:30 p.m.). It turns out to be very difficult to make up small enough problems for examination: almost all the homework exercises are much larger problems than could reasonably be given on a timed exam.
The assignments will be distributed on the web.
The relative weights of the different types of assignment in the evaluation have not been determined yet; they should be roughly proportional to how much time the different assignments take to do well. I will try to assign points to each assignment as it is given, but the total number of points won't be known until I've tweaked all the assignments. I expect that most of the assignments will be similar to ones given in previous years, with a few parts tweaked to update them, but I may replace one or more assignments with new ones, if I can think of new problems at the appropriate level of difficulty. Note: since we are switching from Perl to Python this year, the assignments may need more tweaking than usual.
Collaboration without explicit written acknowledgment will be considered cheating. Collaboration on lab assignments with explicit written acknowledgment is encouraged—guidelines for the extent of reasonable collaboration will be given in class.
documentation on MUSCLE:
http://www.drive5.com/muscle/docs.htm
Refereed paper:
Edgar, Robert C. (2004), MUSCLE: multiple sequence alignment with
high accuracy and high throughput, Nucleic Acids Research 32(5),
1792-97.
PROBCONS web site (including overview of algorithm): http://probcons.stanford.edu
AMAP http://bio.math.berkeley.edu/amap
Ariel S. Schwartz and Lior Pachter
Multiple alignment by sequence annealing
Bioinformatics 2007 23(2):e24-e29;
doi:10.1093/bioinformatics/btl311
Other multiple alignment programs:
Paper on T-Coffee:
T-Coffee: A novel method for fast and accurate multiple sequence alignment.
Notredame C, Higgins DG, Heringa J.
J Mol Biol 2000 Sep 8;302(1):205-17
doi:10.1006/jmbi.2000.4042
Rachel Karchin, Melissa Cline, Yael Mandel-Gutfreund, and Kevin Karplus. Hidden Markov models that use predicted local structure for fold recognition: alphabets of backbone geometry. Proteins: Structure, Function, and Genetics, 51(4):504–514, June 2003. doi:10.1002/prot.10369 reprint
Rachel Karchin, Melissa Cline, and Kevin Karplus. Evaluation of local structure alphabets based on residue burial. Proteins: Structure, Function, and Genetics, 55(3):508–518, 5 March 2004. doi:10.1002/prot.20008 reprint
| BME 205 home page | BME 205 discussion forum | Karplus's lab page | UCSC Bioinformatics research |
Questions about page content should be directed to
Kevin Karplus
Biomolecular Engineering
University of California, Santa Cruz
Santa Cruz, CA 95064
USA
karplus@soe.ucsc.edu
1-831-459-4250
318 Physical Sciences Building