This track shows predictions of conserved elements produced by the phastCons program. They are based on a phylogenetic hidden Markov model (phylo-HMM), a type of probabilistic model that describes both the process of DNA substitution at each site in a genome and the way this process changes from one site to the next.
Best-in-genome pairwise alignments were generated for each species using blastz, followed by chaining and netting. A multiple alignment was then constructed from these pairwise alignments using multiz. Predictions of conserved elements were then obtained by running phastCons on the multiple alignments with the --viterbi option.
PhastCons constructs a two-state phylo-HMM with a state for conserved regions and a state for non-conserved regions. The two states share a single phylogenetic model, except that the branch lengths of the tree associated with the conserved state are multiplied by a constant scaling factor ρ (0 ≤ ρ ≤ 1). The phylogenetic model and the scaling factor ρ are estimated from the data by maximum likelihood using an EM algorithm. The state-transition probabilities of the phylo-HMM may also be estimated by maximum likelihood, but we find that more biologically useful results are obtained by treating them as tuning parameters. The transition probabilities are set such that the "coverage" of the track (portion of sites in conserved elements) is approximately equal to the share of the reference genome believed to be under purifying selection (5% for mammalian genomes) and the prior expected length of a conserved element is 12 bp. To ensure that the scores in different regions of the genome are directly comparable, we first obtain global estimates of all parameters, by averaging estimates obtained from different regions of the genome, and then, in a second pass, we produce predictions across the genome using the same parameters everywhere. The predicted elements are segments of the alignment that are likely to have been "generated" by the conserved state of the phylo-HMM; that is, maximal segments in which the maximum-likelihood (Viterbi) path completely resides in the conserved state.
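The relationship between the tuning parameters and the predicted elements can be illustrated with a minimal Python sketch. It is not the phastCons implementation. It assumes that the per-column log-likelihoods under the conserved and non-conserved models have already been computed (in a real phylo-HMM these come from the phylogenetic models), that the expected element length is the reciprocal of the probability of leaving the conserved state, and that the coverage is the stationary probability of the conserved state. The function names and the toy input are hypothetical.

```python
import math

def transition_probs(target_coverage, expected_length):
    """Return (mu, nu) for a two-state chain (assumed parameterization).

    mu = P(conserved -> non-conserved), so the expected length is 1 / mu.
    nu = P(non-conserved -> conserved), chosen so that the stationary
    probability of the conserved state, nu / (mu + nu), equals the coverage.
    """
    mu = 1.0 / expected_length
    nu = mu * target_coverage / (1.0 - target_coverage)
    return mu, nu

def viterbi_conserved_segments(loglik_cons, loglik_non, mu, nu):
    """Return half-open (start, end) runs where the Viterbi path is conserved."""
    n = len(loglik_cons)
    if n == 0:
        return []
    emit = (loglik_cons, loglik_non)  # state 0 = conserved, 1 = non-conserved
    log_trans = [[math.log(1.0 - mu), math.log(mu)],
                 [math.log(nu), math.log(1.0 - nu)]]
    coverage = nu / (mu + nu)
    score = [math.log(coverage) + emit[0][0],
             math.log(1.0 - coverage) + emit[1][0]]
    back = []  # back[i - 1][s] = best predecessor of state s at column i
    for i in range(1, n):
        new_score, ptr = [], []
        for s in (0, 1):
            prev = max((0, 1), key=lambda p: score[p] + log_trans[p][s])
            ptr.append(prev)
            new_score.append(score[prev] + log_trans[prev][s] + emit[s][i])
        score = new_score
        back.append(ptr)
    # Trace back from the best final state to recover the full path.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    # Collect maximal runs of the conserved state.
    segments, start = [], None
    for i, s in enumerate(path):
        if s == 0 and start is None:
            start = i
        elif s != 0 and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, n))
    return segments

# Toy input: 20 fast-evolving columns, 15 slow-evolving columns, 20 fast again.
toy_cons = [-3.0] * 20 + [-1.0] * 15 + [-3.0] * 20
toy_non = [-1.0] * 20 + [-3.0] * 15 + [-1.0] * 20
mu, nu = transition_probs(target_coverage=0.05, expected_length=12)
print(viterbi_conserved_segments(toy_cons, toy_non, mu, nu))
```

With the 5% coverage and 12 bp expected length quoted above, entering the conserved state is much less likely than leaving it, so in this sketch a short run of weakly conserved columns would not be called as an element.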
Each predicted conserved element is assigned a log-odds score equal to its log probability under the conserved model minus its log probability under the non-conserved model. The "score" field associated with this track contains transformed log-odds scores, taking values between 0 and 1000. (The scores are transformed using a monotonic function of the form a * log(x) + b.) The transformed scores determine how darkly each element is shaded in the browser, with higher scores resulting in darker shading and lower scores resulting in lighter shading. The raw log-odds scores are retained in the "name" field and can be seen on the details page or in the browser when the track's display mode is set to "pack" or "full".
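The scoring described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used to build the track: the constants a and b are not given here, so the values below are placeholders, and the rounding and clamping to the range 0 to 1000 are assumed rather than documented. The function names are hypothetical.

```python
import math

def element_log_odds(loglik_cons, loglik_non, start, end):
    """Raw log-odds of columns [start, end): conserved minus non-conserved."""
    return sum(loglik_cons[start:end]) - sum(loglik_non[start:end])

def browser_score(raw_log_odds, a, b):
    """Map a raw log-odds score to 0..1000 with a * log(x) + b.

    a and b are placeholders; rounding and clamping are assumptions.
    Non-positive raw scores are mapped to 0 because log(x) is undefined there.
    """
    if raw_log_odds <= 0:
        return 0
    transformed = a * math.log(raw_log_odds) + b
    return int(max(0, min(1000, round(transformed))))

# Toy element: 15 columns that each favor the conserved model by 2 log units.
toy_cons = [-1.0] * 15
toy_non = [-3.0] * 15
raw = element_log_odds(toy_cons, toy_non, 0, 15)
print(raw, browser_score(raw, a=150.0, b=100.0))  # a, b are hypothetical
```

Because the transform is monotonic for positive a, ranking elements by the transformed score gives the same order as ranking them by the raw log-odds score, up to ties introduced by rounding and clamping.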
This track was created at UCSC using the following programs:
Felsenstein J and Churchill GA (1996). A hidden Markov model approach to variation among sites in rate of evolution. Mol Biol Evol 13:93-104.
Siepel A and Haussler D (2003). Combining phylogenetic and hidden Markov models in biosequence analysis. In Proceedings of the Seventh Annual International Conference on Computational Molecular Biology (RECOMB 2003), pp. 277-286.
Siepel A and Haussler D (2004). Phylogenetic hidden Markov models. In R. Nielsen, ed., Statistical Methods in Molecular Evolution, Springer (in press).
Yang Z (1995). A space-time process model for the evolution of DNA sequences. Genetics, 139:993-1005.
Kent WJ, Baertsch R, Hinrichs A, Miller W, and Haussler D (2003). Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci USA 100(20):11484-11489.
Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AFA, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, Haussler D, Miller W. (2004). Aligning Multiple Genomic Sequences with the Threaded Blockset Aligner. Genome Res. 14(4):708-15.
Chiaromonte F, Yap VB, and Miller W (2002). Scoring pairwise genomic sequence alignments. Pac Symp Biocomput 2002:115-26.
Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison R, Haussler D, and Miller W. (2003). Human-Mouse Alignments with BLASTZ. Genome Res. 13(1):103-7.