Description

This track displays "reciprocal best" human/chimpanzee genomic alignment chains. The chains were built from blastz and blat alignments of chimpanzee genomic sequence from the 13 Nov. 2003 ARACHNE chimpanzee draft assembly.

Alignments were performed using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both chimp and $organism simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species.
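
As a rough illustration of why a non-affine scheme tolerates longer gaps, the sketch below compares a traditional affine cost with a hypothetical cost that grows logarithmically with gap length. The function names and every numeric parameter are assumptions chosen for illustration; they are not the gap costs actually used for this track.

```python
import math

def affine_cost(length, gap_open=400, gap_extend=30):
    """Traditional affine cost: fixed opening charge plus a per-base charge.
    The parameter values are illustrative assumptions."""
    return 0 if length == 0 else gap_open + gap_extend * length

def relaxed_cost(human_gap, chimp_gap, gap_open=400, scale=300):
    """Hypothetical cost that grows with the log of the total gap size.
    Accepting a gap on both sides at once models a "double-sided" gap."""
    total = human_gap + chimp_gap
    if total == 0:
        return 0
    return gap_open + int(scale * math.log1p(total))

# A 10,000-base one-sided gap is far cheaper under the relaxed scheme.
long_affine = affine_cost(10_000)
long_relaxed = relaxed_cost(10_000, 0)

# A double-sided gap (unaligned sequence in both species) is still scorable.
double_sided = relaxed_cost(50, 70)
```

Under the affine scheme the penalty for a long gap quickly exceeds what the flanking blocks can earn, so the chain breaks; a slowly growing cost lets the chain continue across the gap.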

The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are most often caused by unsequenced portions of the chimp genome, but that occasionally represent a deletion in the chimp or, equivalently, an insertion in the human relative to the chimp. Double lines represent gaps in the chain where sequence is present in both species but does not align, or cases where there is an unsequenced gap in one species. Where multiple chains cover the same portion of the $organism genome, chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the fuller display modes, the individual feature names indicate the scaffold name, strand, and location (in thousands) of the match for each matching alignment.

Methods

The alignments were generated by blastz on repeatmasked sequence using the following chimp/human scoring matrix:

           A    C    G    T
     A   100 -300 -150 -300
     C  -300  100 -300 -150
     G  -150 -300  100 -300
     T  -300 -150 -300  100

     K = 4500, L = 3000, Y = 3400, H = 2000
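
To make the matrix concrete, here is a minimal sketch that scores a gapless block of aligned bases with the values above. The function name `block_score` is an assumption for illustration, and the K, L, Y and H parameters are not modeled.

```python
# Substitution scores copied from the matrix above.
BASES = "ACGT"
MATRIX = [
    [100, -300, -150, -300],
    [-300, 100, -300, -150],
    [-150, -300, 100, -300],
    [-300, -150, -300, 100],
]
SCORE = {
    (a, b): MATRIX[i][j]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
}

def block_score(human: str, chimp: str) -> int:
    """Sum the substitution scores over a gapless block (equal-length strings)."""
    if len(human) != len(chimp):
        raise ValueError("a gapless block needs sequences of equal length")
    return sum(SCORE[(h, c)] for h, c in zip(human.upper(), chimp.upper()))

identical = block_score("ACGT", "ACGT")   # four matches
transition = block_score("ACGT", "GCGT")  # one A/G transition, three matches
```

Transitions (A/G, C/T) cost -150 while transversions cost -300, so a single mismatch cancels one and a half or three matches respectively, which suits two closely related genomes.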
 

The resulting alignments were processed by the axtChain program. axtChain organizes all the alignments between a single chimp scaffold and a single $organism chromosome into a group and builds a kd-tree from all the gapless subsections (blocks) of the alignments. The maximally scoring chains of these blocks were found by running a dynamic program over the kd-tree. Chains scoring below a certain threshold were discarded.
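
The chaining idea can be sketched with a plain quadratic dynamic program. This is a simplified stand-in, not axtChain itself: it compares every pair of blocks instead of using a kd-tree, and the `Block` type, the `gap_cost` function and the `min_score` threshold are assumptions introduced for illustration.

```python
from typing import List, NamedTuple, Tuple

class Block(NamedTuple):
    t_start: int  # start on the reference chromosome
    q_start: int  # start on the chimp scaffold
    size: int     # gapless length (assumed > 0)
    score: int    # substitution score of the block

def gap_cost(dt: int, dq: int) -> int:
    """Hypothetical linear penalty; the real gap costs are not reproduced here."""
    return 10 * (dt + dq)

def best_chain(blocks: List[Block], min_score: int = 0) -> Tuple[int, List[Block]]:
    """Return the highest-scoring colinear chain, or (0, []) if below min_score."""
    order = sorted(blocks)  # by t_start, so any predecessor comes earlier
    if not order:
        return 0, []
    best = [b.score for b in order]  # best chain score ending at each block
    prev = [-1] * len(order)         # back-pointers for the traceback
    for j, bj in enumerate(order):
        for i in range(j):
            bi = order[i]
            dt = bj.t_start - (bi.t_start + bi.size)
            dq = bj.q_start - (bi.q_start + bi.size)
            if dt < 0 or dq < 0:
                continue  # blocks overlap or are out of order on one sequence
            cand = best[i] + bj.score - gap_cost(dt, dq)
            if cand > best[j]:
                best[j], prev[j] = cand, i
    end = max(range(len(order)), key=best.__getitem__)
    if best[end] < min_score:
        return 0, []
    chain, k = [], end
    while k != -1:
        chain.append(order[k])
        k = prev[k]
    return best[end], chain[::-1]

blocks = [Block(0, 0, 10, 1000), Block(20, 15, 10, 1000), Block(5, 100, 10, 500)]
score, chain = best_chain(blocks)
```

In this example the first two blocks are colinear in both sequences and join into one chain, while the third block cannot follow the first on the reference and is left out. A kd-tree serves the same purpose as the inner loop but limits the search to candidate predecessors that lie before the current block in both coordinates.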

To place additional chimp scaffolds that weren't initially aligned by blastz, a DNA blat of the unmasked sequence was performed. The resulting blat alignments were also chained, and then merged with the blastz-based chains produced in the previous step, to produce "all chains".

Due to the draft nature of this initial genome assembly, this chain (and the companion net) track was generated using a "reciprocal best" strategy. This strategy attempts to minimize paralog fill-in for missing orthologous chimp sequence by filtering out of the human net all sequences not in the chimp side of the net.

First, the merged blastz and blat chains were used to generate an alignment net, using the program chainNet (described in the Chimp Net track description page). Next, the subset of chains in the chimp-reference net was extracted and used for an additional netting step. The resulting human-reference net was used to generate the reciprocal best "Chimp Net" browser track. Non-syntenic sequences smaller than 50 bases were filtered out. Finally, chains extracted from this net are displayed on this track as the reciprocal best "Chimp Chains".
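
The two netting passes can be sketched as follows. This is a deliberately crude model under stated assumptions: `simple_net` keeps or drops whole chains greedily by score, whereas chainNet trims chains and fills gaps hierarchically, and the 50-base non-syntenic filter is not modeled. The `Chain` type and all names and coordinates are hypothetical.

```python
from typing import List, NamedTuple, Tuple

Interval = Tuple[str, int, int]  # (sequence name, start, end)

class Chain(NamedTuple):
    name: str
    score: int
    human: Interval
    chimp: Interval

def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def simple_net(chains: List[Chain], side: str) -> List[Chain]:
    """Greedy stand-in for netting: on the given side ("human" or "chimp"),
    keep a chain only if no higher-scoring kept chain already covers it."""
    kept: List[Chain] = []
    for c in sorted(chains, key=lambda c: c.score, reverse=True):
        if not any(overlaps(getattr(c, side), getattr(k, side)) for k in kept):
            kept.append(c)
    return kept

def reciprocal_best(chains: List[Chain]) -> List[Chain]:
    """Net with chimp as reference first, then net the survivors on human."""
    return simple_net(simple_net(chains, "chimp"), "human")

chains = [
    Chain("ortholog", 9000, ("chr1", 100, 200), ("scaffold1", 0, 100)),
    # Same chimp sequence aligned to a second human region (paralog fill-in).
    Chain("paralog", 5000, ("chr1", 500, 600), ("scaffold1", 20, 80)),
    Chain("weaker", 3000, ("chr1", 150, 250), ("scaffold2", 0, 100)),
]
human_only = [c.name for c in simple_net(chains, "human")]
recip = [c.name for c in reciprocal_best(chains)]
```

With a human-only net the paralogous chain survives because nothing else covers that human region. The chimp-reference pass removes it, since the same chimp sequence already has a better placement, which is the fill-in the reciprocal best strategy is meant to suppress.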

Credits

The chimp sequence used in this track was obtained from the 13 Nov. 2003 Arachne assembly. We'd like to thank the National Human Genome Research Institute (NHGRI), the Eli & Edythe L. Broad Institute at MIT/Harvard, and Washington University School of Medicine for providing this sequence.

Blastz was developed at Pennsylvania State University by Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

The chaining and netting programs were developed at the University of California at Santa Cruz by Jim Kent.

The browser display and database storage of the chains were made by Robert Baertsch and Jim Kent.

References

Human-Mouse Alignments with BLASTZ. Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison R, Haussler D, and Miller W. Genome Research 2003 Jan;13(1):103-7.

Scoring pairwise genomic sequence alignments. Chiaromonte F, Yap VB, and Miller W. Pac Symp Biocomput 2002:115-26.