This track shows a measure of evolutionary conservation in human, chimp, mouse, rat, and chicken based on a phylogenetic hidden Markov model (phylo-HMM). The multiz alignments of the human Jul. 2003 (hg16), chimpanzee Nov. 2003 (panTro1), mouse Feb. 2003 (mm3), rat Jun. 2003 (rn3), and chicken Feb. 2004 (galGal2) assemblies were used to generate the annotation.
In "full" visibility mode, this track displays pairwise alignments of chimp, mouse, rat, and chicken, each aligned to the human genome. The pairwise alignments are displayed in the standard UCSC browser "dense" mode using a greyscale density gradient. The checkboxes in the track configuration section allow the exclusion of species from the pairwise display; however, this does not remove them from the conservation score display.
When zoomed in to the base display level, the track shows the base composition of each alignment. The numbers and symbols on the "human gap" line indicate the lengths of gaps in the human sequence at those alignment positions; gaps longer than 9 bases are shown with the "+" symbol. To view detailed information about the alignments at a specific position, zoom the display in to 30,000 or fewer bases, then click on the alignment.
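The labelling rule for the "human gap" line can be sketched as a small function. This is an illustration only, not the browser's own code: the name gap_label is hypothetical, and it covers only positive gap lengths, since the description does not say how positions without a gap are drawn.

```python
def gap_label(gap_length: int) -> str:
    """Return the one-character label for a human-sequence gap of this length."""
    if gap_length < 1:
        # Positions without a gap are outside what this sketch models.
        raise ValueError("gap_length must be a positive integer")
    # Lengths 1-9 fit in a single digit; anything longer is shown as "+".
    return str(gap_length) if gap_length <= 9 else "+"
```

For example, a gap of 4 bases would be labelled "4", while a gap of 25 bases would be labelled "+".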
This track may be configured in a variety of ways to highlight different aspects of the displayed information. Click the "Graph configuration help" link for an explanation of the configuration options.
Best-in-genome blastz pairwise alignments of human-mouse and human-rat were multiply aligned using a program called humor (HUman-MOuse-Rat), which is a special variant of the Multiz program. Multiz was used first to align the humor results with reciprocal best human-chimp alignments, and then to align the human-chimp-mouse-rat multiple alignment with best-in-genome blastz human-chicken alignments. The resulting human-chimp-mouse-rat-chicken multiple alignments were then assigned conservation scores by phylo-HMM.
A phylo-HMM is a probabilistic model that describes both the process of DNA substitution at each site in a genome, and the way this process changes from one site to the next (Felsenstein and Churchill 1996, Yang 1995, Siepel and Haussler 2003, Siepel and Haussler 2004). A phylo-HMM can be thought of as a machine that generates a multiple alignment, in the same way that an ordinary hidden Markov model (HMM) generates an individual sequence. While the states of an ordinary HMM are associated with simple multinomial probability distributions, the states of a phylo-HMM are associated with more complex distributions defined by probabilistic phylogenetic models. These distributions can capture differences in the rates and patterns of nucleotide substitution observed in different types of genomic regions (e.g., coding or noncoding regions, conserved or nonconserved regions).
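The "machine that generates a multiple alignment" view can be illustrated with a toy generator. This is a minimal sketch under strong simplifying assumptions that are not taken from the track itself: a star-shaped tree, the Jukes-Cantor substitution model, and a hidden state that is kept with probability stay_prob and otherwise redrawn uniformly. All function and parameter names are hypothetical.

```python
import math
import random

def jc_same_prob(rate, branch_length):
    """Jukes-Cantor probability that a base is unchanged along one branch."""
    return 0.25 + 0.75 * math.exp(-4.0 / 3.0 * rate * branch_length)

def emit_column(rate, branch_lengths, rng):
    """Draw one alignment column on a star tree (toy phylogenetic model)."""
    root = rng.choice("ACGT")
    column = []
    for t in branch_lengths:
        if rng.random() < jc_same_prob(rate, t):
            column.append(root)  # no net change on this branch
        else:
            column.append(rng.choice([b for b in "ACGT" if b != root]))
    return "".join(column)

def generate_alignment(n_sites, rates, stay_prob, branch_lengths, seed=0):
    """Walk a hidden state path and emit one alignment column per site."""
    rng = random.Random(seed)
    state = rng.randrange(len(rates))
    path, columns = [], []
    for _ in range(n_sites):
        path.append(state)
        # Each state emits a whole column, not a single symbol as in an ordinary HMM.
        columns.append(emit_column(rates[state], branch_lengths, rng))
        # Assumed transition rule: keep the state, or redraw it uniformly.
        if rng.random() >= stay_prob:
            state = rng.randrange(len(rates))
    return path, columns

path, columns = generate_alignment(20, [0.1, 1.0, 3.0], 0.9, [0.1, 0.3, 0.3, 0.8])
```

Each entry of columns holds one base per species, and path records which state generated it; runs of slowly evolving states tend to produce runs of columns in which all species agree.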
To compute a conservation score, we use a k-state phylo-HMM whose k associated phylogenetic models differ only in overall evolutionary rate (Felsenstein and Churchill 1996, Yang 1995). The image at right shows three states (k = 3), S1, S2, and S3, but in practice we use k = 10. A phylogenetic model is estimated globally, using the discrete gamma model for rate variation (Yang 1994), and a scaled version of the estimated model is then associated with each state of the phylo-HMM. Each state i has a separate "rate constant", r_i, by which all branch lengths in the globally estimated model are multiplied. The transition probabilities between states allow for autocorrelation of substitution rates, i.e., for adjacent sites to tend to exhibit similar overall substitution rates. A single parameter, lambda, describes the degree of autocorrelation and defines all transition probabilities. Here, we have estimated the rate constants from the data, similarly to Yang (1995) (see Siepel and Haussler 2003), but have treated lambda as a tuning parameter. For the conservation score, we use the posterior probability that each site was "generated" by the state having the smallest rate constant. Because of the way the rate categories are defined, the plotted values can be thought of as approximately representing the posterior probability that each site is among the 10% most conserved sites in the data set (allowing for autocorrelation of substitution rates).
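The score computation can be sketched with a standard forward-backward pass. This is a simplified illustration, not the code used to build the track. It assumes the transition form a[i][j] = lambda * [i = j] + (1 - lambda) / k, in the spirit of Felsenstein and Churchill (1996), and it replaces the phylogenetic likelihoods with a toy emission in which a site is either conserved ("C") or variable ("V"). All names are hypothetical.

```python
import math

def transition_matrix(k, lam):
    """Assumed form: stay with weight lam, else move to a uniformly drawn state."""
    return [[lam * (1.0 if i == j else 0.0) + (1.0 - lam) / k for j in range(k)]
            for i in range(k)]

def posterior_slowest(emissions, lam):
    """Posterior probability of state 0 (smallest rate) at each site.

    emissions[t][i] is the likelihood of site t under state i.
    """
    n, k = len(emissions), len(emissions[0])
    a = transition_matrix(k, lam)
    # Forward pass, rescaled at every site to avoid underflow.
    fwd, prev = [], None
    for t in range(n):
        if prev is None:
            cur = [emissions[t][j] / k for j in range(k)]  # uniform start
        else:
            cur = [emissions[t][j] * sum(prev[i] * a[i][j] for i in range(k))
                   for j in range(k)]
        total = sum(cur)
        prev = [c / total for c in cur]
        fwd.append(prev)
    # Backward pass, also rescaled.
    bwd = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        cur = [sum(a[i][j] * emissions[t + 1][j] * bwd[t + 1][j] for j in range(k))
               for i in range(k)]
        total = sum(cur)
        bwd[t] = [c / total for c in cur]
    # Combine and normalise per site; the scaling constants cancel.
    scores = []
    for t in range(n):
        w = [fwd[t][i] * bwd[t][i] for i in range(k)]
        scores.append(w[0] / sum(w))
    return scores

def toy_emission(site, rates):
    """Toy stand-in for a phylogenetic likelihood scaled by each rate constant."""
    return [math.exp(-r) if site == "C" else 1.0 - math.exp(-r) for r in rates]

rates = [0.2, 1.0, 3.0]          # r_0 is the smallest rate constant
sites = "CCCCCVVVVV"             # five conserved sites, then five variable ones
scores = posterior_slowest([toy_emission(s, rates) for s in sites], lam=0.9)
```

With uninformative emissions and k = 10, every site receives a score of 1/10, which matches the "10% most conserved" reading of the plotted values. Raising lam smooths the scores across neighbouring sites.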
In this case, the general reversible (REV) substitution model was used in parameter estimation, and lambda was set to 0.9. Alignment gaps were treated as missing data, which sometimes has the effect of producing undesirably high posterior probabilities in gappy regions of the alignment. We are looking at several possible ways of improving the handling of alignment gaps.
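Treating gaps as missing data means that a gapped species contributes a factor of 1 to the likelihood of an alignment column. The sketch below shows that convention on a toy model; it assumes a star-shaped tree and the Jukes-Cantor model rather than the REV model actually used, and all names are hypothetical.

```python
import math

def jc_same_prob(rate, branch_length):
    """Jukes-Cantor probability that a base is unchanged along one branch."""
    return 0.25 + 0.75 * math.exp(-4.0 / 3.0 * rate * branch_length)

def column_likelihood(column, rate, branch_lengths):
    """Likelihood of one column on a star tree, with "-" treated as missing data."""
    total = 0.0
    for root in "ACGT":
        p = 0.25  # uniform root distribution under Jukes-Cantor
        for base, t in zip(column, branch_lengths):
            if base == "-":
                continue  # missing data: sums over all bases, i.e. a factor of 1
            same = jc_same_prob(rate, t)
            p *= same if base == root else (1.0 - same) / 3.0
        total += p
    return total

branches = [0.5, 0.5, 0.5, 0.5]
slow_full = column_likelihood("AAAA", 0.1, branches)
fast_full = column_likelihood("AAAA", 2.0, branches)
slow_gappy = column_likelihood("A---", 0.1, branches)
fast_gappy = column_likelihood("A---", 2.0, branches)
```

In this toy setting, a fully aligned, identical column is much more likely under the slow rate than the fast one, whereas a column with only one species present has the same likelihood under every rate. Such a column carries no information of its own, so its posterior is determined by neighbouring sites through the autocorrelation. This may be one contributor to the behaviour in gappy regions described above.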
This track was created at UCSC using the following programs:
Felsenstein J and Churchill GA (1996). A hidden Markov model approach to variation among sites in rate of evolution. Mol Biol Evol 13:93-104.
Siepel A and Haussler D (2003). Combining phylogenetic and hidden Markov models in biosequence analysis. In Proceedings of the Seventh Annual International Conference on Computational Molecular Biology (RECOMB 2003), pp. 277-286.
Siepel A and Haussler D (2004). Phylogenetic hidden Markov models. In R. Nielsen, ed., Statistical Methods in Molecular Evolution, Springer (in press).
Yang Z (1994). Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J Mol Evol 39:306-314.
Yang Z (1995). A space-time process model for the evolution of DNA sequences. Genetics 139:993-1005.
Kent WJ, Baertsch R, Hinrichs A, Miller W, and Haussler D (2003). Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci USA 100(20):11484-11489.
Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AFA, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, Haussler D, and Miller W (2004). Aligning Multiple Genomic Sequences with the Threaded Blockset Aligner. Genome Res 14(4):708-715.
Chiaromonte F, Yap VB, and Miller W (2002). Scoring pairwise genomic sequence alignments. Pac Symp Biocomput 2002:115-126.
Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison R, Haussler D, and Miller W (2003). Human-Mouse Alignments with BLASTZ. Genome Res 13(1):103-107.