Description

This track shows alignments of $o_organism ($o_db, $o_date) to the $organism genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. The scoring system also tolerates gaps in both $o_organism and $organism simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The $o_organism genomic sequence is from the 13 Nov. 2003 Arachne draft assembly.

The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the $o_organism assembly or an insertion in the $organism assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the $organism genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.
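The single-line versus double-line distinction can be illustrated with a minimal Python sketch. It assumes each aligning box is described by start and end coordinates in both genomes; the Block type, its field names, and the strict zero-length test are hypothetical simplifications for illustration, not the browser's actual drawing code.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """One gapless aligning region (hypothetical layout, half-open coordinates)."""
    t_start: int  # start in the $organism (target) genome
    t_end: int    # end in the $organism (target) genome
    q_start: int  # start in the $o_organism (query) scaffold
    q_end: int    # end in the $o_organism (query) scaffold


def gap_style(prev: Block, nxt: Block) -> str:
    """Classify the gap between two consecutive blocks of one chain."""
    t_gap = nxt.t_start - prev.t_end
    q_gap = nxt.q_start - prev.q_end
    if t_gap < 0 or q_gap < 0:
        raise ValueError("blocks overlap or are out of order")
    if t_gap > 0 and q_gap > 0:
        # Unaligned sequence in both species: the more complex case.
        return "double"
    if t_gap > 0:
        # Unaligned sequence in the target only: consistent with a deletion
        # in the query or an insertion in the target.
        return "single"
    # No unaligned target bases, so nothing to draw along the target genome.
    return "adjacent"


a = Block(0, 100, 0, 100)
print(gap_style(a, Block(150, 250, 100, 200)))  # target-only gap
print(gap_style(a, Block(150, 250, 400, 500)))  # gap in both species
```

The text above says single-line gaps are "largely" one-sided and double-line gaps involve "substantial" sequence in both species, so the real display may apply a looser criterion than the exact zero test used here.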

In the "pack" and "full" display modes, each feature name indicates the scaffold, strand, and location (in thousands) of the match.

Methods

The alignments were generated by blastz on repeat-masked sequence using the following $organism/$o_organism scoring matrix:

          A    C    G    T
     A   100 -300 -150 -300
     C  -300  100 -300 -150
     G  -150 -300  100 -300
     T  -300 -150 -300  100

     K = 4500, L = 3000,  Y = 3400, H = 2000
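As a rough illustration of how the substitution matrix above is applied, the following Python sketch sums the per-column scores of a gapless alignment. The function names are hypothetical, the K, L, Y, and H parameters are not modeled, and ambiguous bases such as N are not handled; this is not the blastz implementation.

```python
BASES = "ACGT"

# Rows and columns are in A, C, G, T order, copied from the matrix above.
MATRIX = [
    [100, -300, -150, -300],
    [-300, 100, -300, -150],
    [-150, -300, 100, -300],
    [-300, -150, -300, 100],
]


def pair_score(a: str, b: str) -> int:
    """Score one aligned pair of bases (raises ValueError for non-ACGT input)."""
    return MATRIX[BASES.index(a.upper())][BASES.index(b.upper())]


def block_score(seq1: str, seq2: str) -> int:
    """Score a gapless alignment of two equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("a gapless block needs sequences of equal length")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


print(block_score("ACGT", "ACGT"))  # four matches
print(block_score("ACGT", "GCGT"))  # one A/G transition, three matches
print(block_score("ACGT", "CCGT"))  # one A/C transversion, three matches
```

In this matrix, transitions (A/G and C/T) cost 150 while transversions cost 300, so a single transversion cancels three matches.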

The resulting alignments were passed to axtChain, which groups all alignments between a single $o_organism scaffold and a single $organism chromosome and builds a kd-tree from the gapless subsections (blocks) of those alignments. A dynamic program was then run over the kd-trees to find the highest-scoring chains of these blocks. Chains scoring below a threshold were discarded.
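The chaining step can be sketched in Python as a dynamic program over blocks. This is a simplified illustration, not the axtChain code: it searches for predecessors with a plain quadratic loop instead of a kd-tree, and the Block type, the gap_cost function, and the min_score parameter are hypothetical stand-ins, since the actual gap costs and threshold are not given here.

```python
from typing import List, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    """One gapless aligned block (hypothetical layout, half-open coordinates)."""
    t_start: int
    t_end: int
    q_start: int
    q_end: int
    score: int


def gap_cost(dt: int, dq: int) -> int:
    """Placeholder linear gap penalty; the real scoring scheme is not modeled."""
    return dt + dq


def best_chain(blocks: List[Block], min_score: int = 0) -> Optional[Tuple[int, List[Block]]]:
    """Return (score, blocks) for the highest-scoring chain, or None if it is below min_score."""
    order = sorted(blocks, key=lambda b: (b.t_start, b.q_start))
    if not order:
        return None
    best = [b.score for b in order]  # best chain score ending at each block
    prev = [-1] * len(order)         # back-pointer to the preceding block
    for i, cur in enumerate(order):
        for j in range(i):
            p = order[j]
            # A predecessor must end before the current block in both genomes.
            if p.t_end <= cur.t_start and p.q_end <= cur.q_start:
                cand = best[j] + cur.score - gap_cost(cur.t_start - p.t_end,
                                                      cur.q_start - p.q_end)
                if cand > best[i]:
                    best[i], prev[i] = cand, j
    end = max(range(len(order)), key=best.__getitem__)
    if best[end] < min_score:
        return None  # chains scoring below the threshold are discarded
    total, chain = best[end], []
    while end != -1:
        chain.append(order[end])
        end = prev[end]
    chain.reverse()
    return total, chain


blocks = [
    Block(0, 100, 0, 100, 1000),
    Block(150, 250, 150, 250, 1000),
    Block(120, 140, 500, 520, 200),
]
result = best_chain(blocks, min_score=500)
print(result)
```

The kd-tree described above serves to find candidate predecessor blocks quickly; the quadratic loop here trades that speed for brevity.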

To place additional $o_organism scaffolds that were not initially aligned by blastz, a DNA blat of the unmasked sequence was performed. The resulting blat alignments were also chained and then merged with the blastz-based chains produced in the previous step to produce the chains displayed in this track.

Credits

The $o_organism sequence used in this track was obtained from the 13 Nov. 2003 Arachne assembly. We'd like to thank the National Human Genome Research Institute (NHGRI), the Broad Institute at MIT/Harvard, and Washington University School of Medicine for providing this sequence.

Blastz was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

The browser display and database storage of the chains were generated by Robert Baertsch and Jim Kent.

References

Chiaromonte, F., Yap, V.B., and Miller, W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput 2002, 115-26 (2002).

Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., Haussler, D., and Miller, W. Human-Mouse Alignments with BLASTZ. Genome Res. 13(1), 103-7 (2003).