How the Alt-Splicing Table Was Generated

To generate the alternative splicing table I started with the cDNA alignments. From these I generated an intron list. The intron list was sorted, and distinct introns (defined as differing by at least 2 bases from each other) that overlapped were put on an intron/intron overlap list. I also generated a list of exons from the cDNA alignments by merging together blocks separated by no more than one or two bases of noise in the cDNA alignments. The exon list was sorted and checked for overlaps against the intron list. To reduce the amount of noise, overlaps between introns and exons were required to be at least 3 bases long in general, and a bit longer than that if there appeared to be noise in the sequence near the overlap. The intron/intron overlap and intron/exon overlap lists were merged and sorted by the name of the ORF or nameless cluster closest to the overlap.

This resulted in a list of 873 possible alternatively spliced genes. I inspected this list by hand and removed items that were clearly the result of noise or of overlapping transcription on opposite strands. The 677 remaining items are in the Alt-Splicing Table. Though it is likely that some noise remains in the table, I am confident that the vast majority of the items in the table represent alternative splicing or overlapping transcription awaiting better characterization. The web-based cDNA alignment viewer hyperlinked to the table makes it an easy matter to judge for yourself how strong the evidence for a particular gene is.
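The overlap detection described above can be sketched roughly as follows. This is an illustrative Python sketch, not the code actually used to build the table. It assumes each cDNA alignment is a list of half-open (start, end) blocks in genomic coordinates on one strand. The function names, the constant names, and the reading of "differing by at least 2 bases" as "start or end differs by at least 2" are all assumptions made for the example. The extra margin applied near noisy sequence, the sorting by ORF or cluster name, and the hand inspection are not modeled.

```python
from itertools import combinations

# Thresholds taken from the description above; the names are hypothetical.
MAX_NOISE_GAP = 2     # blocks separated by <= this many bases are merged
MIN_DISTINCT = 2      # introns must differ by >= this many bases to count as distinct
MIN_EXON_OVERLAP = 3  # minimum intron/exon overlap, in bases


def merge_blocks(blocks, max_gap=MAX_NOISE_GAP):
    """Merge alignment blocks separated by at most max_gap bases into exons."""
    exons = []
    for start, end in sorted(blocks):
        if exons and start - exons[-1][1] <= max_gap:
            # Small gap: treat it as alignment noise and extend the last exon.
            exons[-1] = (exons[-1][0], max(exons[-1][1], end))
        else:
            exons.append((start, end))
    return exons


def introns_from_exons(exons):
    """Introns are the gaps between consecutive exons of one alignment."""
    return [(a[1], b[0]) for a, b in zip(exons, exons[1:])]


def overlap(a, b):
    """Number of bases shared by two half-open intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def distinct(a, b, min_diff=MIN_DISTINCT):
    """Assumed definition: start or end differs by at least min_diff bases."""
    return abs(a[0] - b[0]) >= min_diff or abs(a[1] - b[1]) >= min_diff


def find_candidates(alignments):
    """Return (intron/intron overlaps, intron/exon overlaps) for all alignments."""
    exons, introns = set(), set()
    for blocks in alignments:
        merged = merge_blocks(blocks)
        exons.update(merged)
        introns.update(introns_from_exons(merged))
    intron_list = sorted(introns)
    exon_list = sorted(exons)
    # Pairwise comparison is quadratic; fine for a sketch, slow for a genome.
    intron_intron = [(a, b) for a, b in combinations(intron_list, 2)
                     if distinct(a, b) and overlap(a, b) > 0]
    intron_exon = [(i, e) for i in intron_list for e in exon_list
                   if overlap(i, e) >= MIN_EXON_OVERLAP]
    return intron_intron, intron_exon
```

For example, passing one alignment with blocks (100, 200), (300, 400), (500, 600) and a second with blocks (100, 200), (500, 600) flags the long intron of the second alignment as overlapping both introns and the middle exon of the first, which is the pattern expected for a skipped exon.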

Jim Kent, August 1999