Schedule UCSC BME 205 Fall 2013

Bioinformatics: models and algorithms

(Last Update: 16:42 PST 6 December 2013)

Lecture and Homework Schedule

The lecture schedule reflects the material actually delivered. To get an idea of what is coming up, see the Fall 2012 schedule.

Date Lecture Topic(s) Due
Fri 27 Sept 2013 Administrivia, texts, assignments, structure of course. Fundamental dogma and sequencing as a major source of bioinformatic data.
Error rates of information copying (polymerases):
DNA->DNA: Taq 1/125,000; Pfu 1/2,300,000; mitochondrial pol gamma 1/300,000-1/500,000; pol eta 1/18-1/38
RNA->DNA: HIV 1/1,700-1/8,000; SIV 1/19,000; Accuscript RT 1/62,000
RNA->RNA: flu 1/10,000; Qbeta 1/3,000.
Ribosome: E. coli 1/3,000; generally 1/2,000-1/20,000.
Due: intake survey
Mon 30 Sept 2013 Python tutorial
Wed 2 Oct 2013 DNA sequencing (went over the table from the Wikipedia DNA Sequencing article, except SOLiD):

Method: Single-molecule real-time sequencing (Pacific Bio)
  Read length: 5,000 bp average; maximum read length ~22,000 bases
  Accuracy: 99.999% consensus accuracy; 87% single-read accuracy
  Reads per run: 50,000 per SMRT cell, or ~400 megabases
  Time per run: 30 minutes to 2 hours
  Cost per 1 million bases (US$): $0.75-$1.50
  Advantages: Longest read length. Fast. Detects 4mC, 5mC, 6mA.
  Disadvantages: Moderate throughput. Equipment can be very expensive.

Method: Ion semiconductor (Ion Torrent sequencing)
  Read length: up to 400 bp
  Accuracy: 98%
  Reads per run: up to 80 million
  Time per run: 2 hours
  Cost per 1 million bases (US$): $1
  Advantages: Less expensive equipment. Fast.
  Disadvantages: Homopolymer errors.

Method: Pyrosequencing (454)
  Read length: 700 bp
  Accuracy: 99.9%
  Reads per run: 1 million
  Time per run: 24 hours
  Cost per 1 million bases (US$): $10
  Advantages: Long read size. Fast.
  Disadvantages: Runs are expensive. Homopolymer errors.

Method: Sequencing by synthesis (Illumina)
  Read length: 50 to 250 bp
  Accuracy: 98%
  Reads per run: up to 3 billion
  Time per run: 1 to 10 days, depending upon sequencer and specified read length
  Cost per 1 million bases (US$): $0.05 to $0.15
  Advantages: Potential for high sequence yield, depending upon sequencer model and desired application.
  Disadvantages: Equipment can be very expensive.

Method: Sequencing by ligation (SOLiD sequencing)
  Read length: 50+35 or 50+50 bp
  Accuracy: 99.9%
  Reads per run: 1.2 to 1.4 billion
  Time per run: 1 to 2 weeks
  Cost per 1 million bases (US$): $0.13
  Advantages: Low cost per base.
  Disadvantages: Slower than other methods.

Method: Chain termination (Sanger sequencing)
  Read length: 400 to 900 bp
  Accuracy: 99.9%
  Reads per run: N/A
  Time per run: 20 minutes to 3 hours
  Cost per 1 million bases (US$): $2400
  Advantages: Long individual reads. Useful for many applications.
  Disadvantages: More expensive and impractical for larger sequencing projects.

Discussed different error types for different sequencing methods. Also discussed paired-end and mate-pair library construction. Challenge: how is error rate measured for polymerases?
Fri 4 Oct 2013 What is a probability function? (domain=event space, range=[0,1], sums to 1). A stochastic model is a computable probability function. Developed uniform probability for strings of a given length. Challenge: extend to strings of any length.
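A minimal sketch of the uniform model for fixed-length strings, assuming a DNA alphabet and length 3 purely for illustration (the function name is hypothetical, not from the lecture). It checks the "sums to 1" property by enumerating the whole event space.

```python
from itertools import product

def uniform_string_prob(s, alphabet, length):
    """P(s) under a uniform model over all strings of exactly `length` symbols."""
    if len(s) != length or any(c not in alphabet for c in s):
        return 0.0  # s lies outside the event space
    return len(alphabet) ** -length

# Illustrative check: probabilities over every length-3 DNA string sum to 1.
ALPHABET = "ACGT"
N = 3
total = sum(uniform_string_prob("".join(t), ALPHABET, N)
            for t in product(ALPHABET, repeat=N))
print(total)
```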
Mon 7 Oct 2013 Discussed bugs in HW 1 (forgetting to include apostrophe in letters, wrong sort order). Also covered tips on various ways to split a line into words, to specify sort order, and to do mutual exclusion of options. Covered stop character as one way to get a stochastic model for all strings (gave terms "i.i.d." and "zero-order Markov").
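One possible way to handle the three Python points above, as a sketch only. The option names `--alpha` and `--descend` and the function names are assumptions for illustration, not necessarily those required by HW 1.

```python
import argparse
import re
from collections import Counter

# Treat the apostrophe as a letter so that "don't" stays one word.
WORD_RE = re.compile(r"[A-Za-z']+")

def count_words(lines):
    """Count words in an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(WORD_RE.findall(line))
    return counts

def sorted_counts(counts, by_count=True):
    """Sort by descending count (ties alphabetical), or purely alphabetically."""
    if by_count:
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return sorted(counts.items())

def build_parser():
    """Hypothetical options; argparse rejects using both at once."""
    parser = argparse.ArgumentParser(description="Count words (illustrative).")
    order = parser.add_mutually_exclusive_group()
    order.add_argument("--alpha", action="store_true",
                       help="sort alphabetically")
    order.add_argument("--descend", action="store_true",
                       help="sort by descending count")
    return parser

demo_args = build_parser().parse_args(["--descend"])
demo = sorted_counts(count_words(["don't stop", "stop it, don't"]),
                     by_count=demo_args.descend)
print(demo)
```

Sorting on the tuple `(-count, word)` gives the secondary sort order in one pass, which avoids the wrong-order bug that comes from sorting twice in the wrong sequence.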
Wed 9 Oct 2013 Answered questions about FASTA/FASTQ assignment. Discussed ways to measure DNA polymerase accuracy (using lacZ reporter). Reviewed i.i.d. with stop character and got length distribution by marginalizing. Used Socratic dialog to develop factored length and i.i.d. model.
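A sketch of the i.i.d.-with-stop-character model, under the assumption that at each position the model stops with probability `p_stop` and otherwise emits a letter from `letter_probs` (both names are illustrative). Marginalizing over all strings of length n should then give the geometric length distribution (1 - p_stop)^n * p_stop, which is the factored length-times-letters form.

```python
from itertools import product

def string_prob(s, letter_probs, p_stop):
    """P(s): each letter costs (1 - p_stop) * p(letter); then the stop character."""
    prob = p_stop
    for c in s:
        prob *= (1.0 - p_stop) * letter_probs[c]
    return prob

def length_prob_by_marginalizing(n, letter_probs, p_stop):
    """P(length = n), summing P(s) over every string s of length n."""
    return sum(string_prob(t, letter_probs, p_stop)
               for t in product(letter_probs, repeat=n))

def length_prob_closed_form(n, p_stop):
    """Geometric length distribution implied by the stop character."""
    return (1.0 - p_stop) ** n * p_stop

# Illustrative parameters, not from the lecture.
LETTERS = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
for n in range(4):
    print(n, length_prob_by_marginalizing(n, LETTERS, 0.1),
          length_prob_closed_form(n, 0.1))
```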
Fri 11 Oct 2013 Discussed fellowship applications.
Mon 14 Oct 2013 Discussed Python problems seen in the 2nd homework. Covered a couple more Python idioms (including izip).
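A small illustration of the pairing idiom. `itertools.izip` is the lazy pairing function in Python 2; in Python 3 the built-in `zip` is already lazy and `izip` no longer exists, so this sketch uses `zip`. The Hamming-distance example is an assumption chosen for illustration, not necessarily the use shown in class.

```python
def hamming_distance(a, b):
    """Count mismatched positions, walking both sequences in lockstep."""
    # zip pairs corresponding characters without building an intermediate list.
    return sum(x != y for x, y in zip(a, b))

print(hamming_distance("GATTACA", "GACTATA"))
```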
Wed 16 Oct 2013 Markov chains.
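A minimal first-order Markov chain sketch, assuming "^" and "$" as start and stop characters (an illustrative choice) and maximum-likelihood estimates with no pseudocounts, so unseen transitions raise a KeyError when scoring.

```python
from collections import defaultdict
from math import log2

def train_first_order(seqs, start="^", stop="$"):
    """Estimate P(next | previous) from transition counts."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in seqs:
        padded = start + seq + stop
        for prev, cur in zip(padded, padded[1:]):
            counts[prev][cur] += 1
    probs = {}
    for prev, row in counts.items():
        total = sum(row.values())
        probs[prev] = {cur: c / total for cur, c in row.items()}
    return probs

def log2_prob(seq, probs, start="^", stop="$"):
    """log2 P(seq), including the start and stop transitions."""
    padded = start + seq + stop
    return sum(log2(probs[prev][cur]) for prev, cur in zip(padded, padded[1:]))

model = train_first_order(["AB", "AB", "AA"])
print(log2_prob("AB", model))
```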
Fri 18 Oct 2013 Coding cost, information gain, entropy, relative entropy. Counts to probabilities to log-P. Pseudocounts (MLE and MAP).
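A sketch of the counts-to-probabilities-to-bits pipeline, with function names chosen for illustration. A pseudocount of 0 gives the maximum-likelihood estimate; a pseudocount z per symbol matches the MAP estimate under a symmetric Dirichlet prior with parameter z + 1. Which prior the lecture used is not stated here, so treat the prior as an assumption.

```python
from math import log2

def counts_to_probs(counts, alphabet, pseudocount=0.0):
    """Normalize counts, optionally adding the same pseudocount to every symbol."""
    total = sum(counts.get(a, 0) for a in alphabet) + pseudocount * len(alphabet)
    return {a: (counts.get(a, 0) + pseudocount) / total for a in alphabet}

def entropy(p):
    """Average coding cost in bits under distribution p."""
    return -sum(x * log2(x) for x in p.values() if x > 0)

def relative_entropy(p, q):
    """Extra bits per symbol paid for coding p-distributed data with model q."""
    return sum(p[a] * log2(p[a] / q[a]) for a in p if p[a] > 0)

def coding_cost(seq, p):
    """Bits needed to encode seq with model p (infinite if p[c] were 0)."""
    return -sum(log2(p[c]) for c in seq)

observed = {"A": 2, "C": 2}
mle = counts_to_probs(observed, "ACGT")
smoothed = counts_to_probs(observed, "ACGT", pseudocount=1.0)
print(entropy(mle), relative_entropy(mle, smoothed), coding_cost("ACGT", smoothed))
```

The smoothed model can encode "G" and "T" at finite cost, while the MLE model assigns them probability 0.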
Mon 21 Oct 2013 Writing problems (research overview first, specific aims, topic sentences, avoid "this" as a pronoun, old info->new info, comma splices, "however" is not a conjunction, "would" used primarily for contrary-to-fact statements in tech writing). Questions about Markov chain assignment. How to sum probabilities in log-prob representation.
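One common way to sum two probabilities that are stored as logs, sketched here with natural logs (the function name is illustrative). Factoring out the larger term keeps the exponential from underflowing.

```python
from math import exp, inf, log, log1p

def log_add(log_a, log_b):
    """Return log(a + b) given log(a) and log(b), without leaving log space."""
    if log_a < log_b:
        log_a, log_b = log_b, log_a  # make log_a the larger term
    if log_b == -inf:
        return log_a  # adding probability zero changes nothing
    # log(a + b) = log(a) + log(1 + b/a), and b/a = exp(log_b - log_a) <= 1
    return log_a + log1p(exp(log_b - log_a))

print(exp(log_add(log(0.25), log(0.5))))
```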
Wed 23 Oct 2013 Answered questions on Markov chain assignment. First part of "better than chance" lecture on null models. P-values and E-values.
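A rough sketch of the two quantities, assuming the null model is represented by a sample of scores drawn from it. The add-one correction in the empirical P-value is an assumption made here to avoid reporting exactly zero; the E-value is the P-value times the number of comparisons made.

```python
def empirical_p_value(observed, null_scores):
    """Estimated probability that a null score is at least as good as observed."""
    hits = sum(s >= observed for s in null_scores)
    return (hits + 1) / (len(null_scores) + 1)  # add-one correction (assumption)

def e_value(p_value, num_tests):
    """Expected number of null hits this good over num_tests comparisons."""
    return p_value * num_tests

print(empirical_p_value(5, [1, 2, 3, 4, 5, 6, 7, 8, 9]))
print(e_value(0.001, 5000))
```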
Fri 25 Oct 2013 Second part of "better than chance".
Mon 28 Oct 2013 Finished "better than chance"; introduced substitution matrices as log P(aligned)/P(independent) and described how BLOSUM matrices are created. Talked little about PAM matrices. Lecture may have been rough due to fatigue.
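A simplified sketch of log P(aligned)/P(independent) scores computed from gapless alignment columns. This is not the full BLOSUM procedure: it omits the clustering of similar sequences, the scaling, and the rounding to integers, and the function name is illustrative.

```python
from collections import Counter
from math import log2

def log_odds_scores(columns):
    """columns: iterable of strings, each one gap-free alignment column."""
    pair_counts = Counter()
    for col in columns:
        for i in range(len(col)):
            for j in range(len(col)):
                if i != j:
                    # Count ordered pairs both ways so the table is symmetric.
                    pair_counts[(col[i], col[j])] += 1
    total = sum(pair_counts.values())
    q = {pair: c / total for pair, c in pair_counts.items()}  # P(aligned)
    p = Counter()
    for (a, _), prob in q.items():
        p[a] += prob  # background frequency as the marginal of q
    return {(a, b): log2(q_ab / (p[a] * p[b])) for (a, b), q_ab in q.items()}

scores = log_odds_scores(["AA", "AA", "AB", "BB"])
print(scores)
```

Pairs seen more often than independence predicts get positive scores, and pairs seen less often get negative scores.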
Wed 30 Oct 2013 Python problems and idioms that came up in Markov-model assignment. A lot of time spent on what documentation needs to contain and why.
Fri 1 Nov 2013 Presentation by Ann Hubble in Science Library.
Mon 4 Nov 2013 Feedback on programming for palindrome assignment. Started on alignment, covering gapless alignment.
Wed 6 Nov 2013 Guest lecture by Dent Earl. Google doc presentation
Fri 8 Nov 2013 Substitution matrices, PAM, BLOSUM. Gapless alignment, dynamic programming, traceback
Mon 11 Nov 2013 Veterans Day (no class)
Wed 13 Nov 2013 Feedback on homework. Alignment with arbitrary gap costs. Memoized vs dynamic programming. BLAST and prefiltering. Dot plots.
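A memoized sketch of global alignment with an arbitrary gap-cost function, assuming a user-supplied `subst(a, b)` score and `gap_cost(k)` for a gap of length k (both names are illustrative). A bottom-up dynamic-programming version would fill the same (i, j) table in row order instead of recursing. Recursion depth limits this version to short sequences.

```python
from functools import lru_cache

def global_score(x, y, subst, gap_cost):
    """Best global alignment score of x and y; cubic time for general gap costs."""
    @lru_cache(maxsize=None)
    def best(i, j):
        # best(i, j) = best score for aligning x[:i] with y[:j]
        if i == 0 and j == 0:
            return 0
        candidates = []
        if i > 0 and j > 0:
            candidates.append(best(i - 1, j - 1) + subst(x[i - 1], y[j - 1]))
        for k in range(1, i + 1):  # last k letters of x[:i] face a gap
            candidates.append(best(i - k, j) - gap_cost(k))
        for k in range(1, j + 1):  # last k letters of y[:j] face a gap
            candidates.append(best(i, j - k) - gap_cost(k))
        return max(candidates)
    return best(len(x), len(y))

def simple_subst(a, b):
    return 1 if a == b else -1

print(global_score("ACGT", "AGT", simple_subst, lambda k: 2 + k))
```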
Fri 15 Nov 2013 Arbitrary gap costs. Global alignment and local, recurrence relations, matrices.
Mon 18 Nov 2013 Linear gap costs. Global and local. Traceback.
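A sketch of global alignment with linear gap costs and traceback, assuming simple match/mismatch scores rather than a substitution matrix (the parameter values are illustrative). Integer scores keep the equality tests in the traceback exact. A local version would clamp each cell at 0 and start the traceback from the highest-scoring cell.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_x, aligned_y) for one optimal global alignment."""
    n, m = len(x), len(y)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback: re-derive which move produced each cell, preferring diagonal.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if x[i - 1] == y[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return score[n][m], "".join(reversed(ax)), "".join(reversed(ay))

best_score, row_x, row_y = needleman_wunsch("GATTACA", "GATACA")
print(best_score)
print(row_x)
print(row_y)
```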
Wed 20 Nov 2013 Building locked. No class.
Fri 22 Nov 2013 Affine gap alignment (global and local, traceback). A2M format.
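A score-only sketch of global alignment with affine gap costs, using three tables for the three alignment states. The convention assumed here is that the first gap position costs `gap_open` and each further position costs `gap_extend`; other conventions exist. Traceback is omitted to keep the sketch short.

```python
NEG_INF = float("-inf")

def affine_global_score(x, y, match=1, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best global alignment score with affine gap costs."""
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]   # x[i-1] aligned to y[j-1]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # x[i-1] aligned to a gap
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # y[j-1] aligned to a gap
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Iy[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            Ix[i][j] = max(M[i - 1][j] + gap_open,
                           Ix[i - 1][j] + gap_extend,
                           Iy[i - 1][j] + gap_open)
            Iy[i][j] = max(M[i][j - 1] + gap_open,
                           Iy[i][j - 1] + gap_extend,
                           Ix[i][j - 1] + gap_open)
    return max(M[n][m], Ix[n][m], Iy[n][m])

print(affine_global_score("ACGTTTTACGT", "ACGACGT"))
```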
Mon 25 Nov 2013 Feedback on paper; some further discussion of affine-gap alignment traceback.
Wed 27 Nov 2013 Intro to HMMs.
Fri 29 Nov 2013 Thanksgiving (no class)
Mon 2 Dec 2013 HMM review (forward and Viterbi), local alignment by changing HMM, building HMM from multiple sequence alignment (thinning, Henikoff sequence weighting, brief mention of Dirichlet mixtures), EM and Lagrangian multipliers
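A small sketch of the forward and Viterbi recurrences on a toy two-state HMM. The fair/loaded coin model and all parameter values are assumptions for illustration, not the profile HMMs from the lecture. The code works in plain probabilities, which underflow on long sequences; a practical version would use logs or scaling.

```python
def joint_prob(path, obs, start_p, trans_p, emit_p):
    """P(path, obs) for one specific state path."""
    prob = start_p[path[0]] * emit_p[path[0]][obs[0]]
    for t in range(1, len(obs)):
        prob *= trans_p[path[t - 1]][path[t]] * emit_p[path[t]][obs[t]]
    return prob

def forward(obs, states, start_p, trans_p, emit_p):
    """P(obs), summed over all state paths."""
    f = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        f = {s: emit_p[s][o] * sum(f[r] * trans_p[r][s] for r in states)
             for s in states}
    return sum(f.values())

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Probability and state list of the single most probable path."""
    v = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    back = []
    for o in obs[1:]:
        ptr, nxt = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: v[r] * trans_p[r][s])
            ptr[s] = best_prev
            nxt[s] = v[best_prev] * trans_p[best_prev][s] * emit_p[s][o]
        back.append(ptr)
        v = nxt
    last = max(states, key=lambda s: v[s])
    path = [last]
    for ptr in reversed(back):  # follow back-pointers from the end
        path.append(ptr[path[-1]])
    path.reverse()
    return v[last], path

# Toy model (illustrative values): F = fair coin, L = loaded coin.
STATES = ("F", "L")
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.9, "L": 0.1}, "L": {"F": 0.2, "L": 0.8}}
EMIT = {"F": {"H": 0.5, "T": 0.5}, "L": {"H": 0.9, "T": 0.1}}
print(forward("HHTH", STATES, START, TRANS, EMIT))
print(viterbi("HHTH", STATES, START, TRANS, EMIT))
```

The two recurrences differ only in replacing the sum over previous states with a max plus a back-pointer.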
Wed 4 Dec 2013 EM & Lagrangian multiplier review, Baum-Welch training, expected value of other functions of state, HMM for base-caller on nanopores
Fri 6 Dec 2013 Overview of read types (including paired-end and mate pair), genome assembly, overlap-consensus graph, de Bruijn graph, k-mer cleaning, coverage needed
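A toy sketch of k-mer counting, k-mer cleaning, and de Bruijn graph construction. It assumes error-containing k-mers are rare and can be dropped with a count threshold (`min_count` is an illustrative parameter), and it ignores reverse complements, which a real assembler must handle. Coverage is computed as total read bases divided by genome length.

```python
from collections import Counter, defaultdict

def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def clean_kmers(counts, min_count=2):
    """Drop rare k-mers, which are more likely to contain sequencing errors."""
    return {kmer: c for kmer, c in counts.items() if c >= min_count}

def de_bruijn_edges(kmers):
    """Nodes are (k-1)-mers; each k-mer is an edge from its prefix to its suffix."""
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)

def coverage(num_reads, read_length, genome_length):
    """Average number of reads covering each base."""
    return num_reads * read_length / genome_length

reads = ["ACGT", "ACGT", "CGTA", "CGTA", "ACGG"]  # last read has an error
graph = de_bruijn_edges(clean_kmers(kmer_counts(reads, 3)))
print(graph)
print(coverage(1000, 100, 10000))
```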
Thurs 12 Dec 2013 8 a.m.–11 a.m. exam slot, not used


Questions about page content should be directed to Kevin Karplus
Biomolecular Engineering
University of California, Santa Cruz
Santa Cruz, CA 95064
318 Physical Sciences Building
