The Spliced EST track displays Expressed Sequence Tags (ESTs) from GenBank that show signs of splicing when aligned against the genome. To be considered spliced, an EST must show evidence of at least one canonical intron, i.e., one that is at least 32 bases in length and has GT/AG ends. By requiring splicing, the level of contamination in the EST databases is drastically reduced at the expense of eliminating many genuine 3' ESTs. For a display of all ESTs (including unspliced), see the $Organism EST track.
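The splicing criterion above can be sketched in a few lines of Python. This is an illustration only, not the code used to build the track: the function names are hypothetical, and accepting CT/AC ends (the reverse complement of GT/AG, as seen on the forward genomic strand for a minus-strand gene) is an assumption.

```python
MIN_INTRON_LEN = 32  # minimum intron length given in the track description


def is_canonical_intron(gap_seq):
    """Return True if a genomic gap between aligned blocks looks canonical.

    gap_seq is the genomic sequence of the gap, read on the forward strand.
    Assumption: CT...AC is accepted as the minus-strand form of GT...AG.
    """
    seq = gap_seq.upper()
    if len(seq) < MIN_INTRON_LEN:
        return False
    ends = (seq[:2], seq[-2:])
    return ends in (("GT", "AG"), ("CT", "AC"))


def is_spliced(gap_seqs):
    """An EST counts as spliced if at least one gap is a canonical intron."""
    return any(is_canonical_intron(gap) for gap in gap_seqs)
```

A 34-base gap such as `"GT" + "A" * 30 + "AG"` passes the test, while a 24-base gap with the same ends does not, because it falls below the length threshold.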
Expressed sequence tags are single-read (typically approximately 500 base) sequences which usually represent fragments of transcribed genes. Aligning regions (usually exons) are shown as black boxes connected by lines for gaps (usually spliced-out introns). In full display mode, arrows on the introns indicate the direction of transcription, which is determined by looking at the splice sites.
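As a rough illustration of how splice sites can indicate the direction of transcription, the sketch below votes over the intron ends of a single EST alignment. The function name and the majority-vote rule are assumptions made for this example; the browser's actual procedure is not described here.

```python
def transcription_strand(intron_seqs):
    """Guess the transcribed strand from intron ends.

    intron_seqs holds genomic intron sequences read on the forward strand.
    GT...AG supports the + strand; CT...AC (its reverse complement)
    supports the - strand. Returns "+", "-", or None when undecided.
    """
    plus = minus = 0
    for seq in intron_seqs:
        ends = (seq[:2].upper(), seq[-2:].upper())
        if ends == ("GT", "AG"):
            plus += 1
        elif ends == ("CT", "AC"):
            minus += 1
    if plus > minus:
        return "+"
    if minus > plus:
        return "-"
    return None  # no canonical introns, or conflicting evidence
```

Returning `None` for ties keeps the example from asserting a direction that the splice sites do not support.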
Strand information provided for ESTs (+/-) indicates the direction of the match between the EST and the matching genomic sequence. It bears no relationship to the direction of transcription of the RNA with which it might be associated.
To make an EST, RNA is isolated from cells and reverse transcribed into cDNA. Typically, the cDNA is cloned into a plasmid vector, and a read is taken from the 5' and/or 3' primer. For most, but not all, ESTs, the reverse transcription is primed by an oligo-dT, which hybridizes with the poly-A tail of mature mRNA. The reverse transcriptase may or may not make it to the 5' end of the mRNA, which may or may not be degraded.
In general, the 3' ESTs mark the end of transcription reasonably well, but the 5' ESTs may end at any point within the transcript. Some of the newer cap-selected libraries are starting to capture the transcription start site reasonably well. Before the cap-selection techniques emerged, some projects used random rather than poly-A priming in an attempt to get sequence distant from the 3' end. These projects succeeded, but as a side effect also deposited sequences from unprocessed mRNA and perhaps even genomic sequences into the EST databases. (Even outside of the random-primed projects, there is a degree of non-mRNA contamination.) Because of this, a single unspliced EST should be viewed with considerable skepticism. However, because the $Organism 3' UTRs are quite long, the splicing requirement does eliminate many genuine 3' ESTs.
To generate this track, $Organism ESTs from GenBank are aligned against the genome using the blat program. Note that the maximum intron length allowed by blat is 500,000 bases, which may eliminate some ESTs with very long introns that might otherwise align. When a single EST aligns in multiple places, the alignment having the highest base identity is identified. Only alignments that have a base identity within 1% of the best, and at least 93% base identity overall, are kept.
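The two identity thresholds can be sketched as a simple filter over the alignments of one EST. This is a minimal illustration, not the pipeline's code: the function name and the (label, percent identity) input format are hypothetical, and "within 1% of the best" is interpreted here as within one percentage point.

```python
MIN_IDENTITY = 93.0      # absolute floor on percent base identity
NEAR_BEST_WINDOW = 1.0   # assumption: "within 1%" means one percentage point


def filter_alignments(alignments):
    """Keep near-best alignments for a single EST.

    alignments is a list of (label, percent_identity) pairs.
    An alignment is kept only if it is at least MIN_IDENTITY and
    no more than NEAR_BEST_WINDOW below the best alignment.
    """
    if not alignments:
        return []
    best = max(identity for _, identity in alignments)
    return [
        (label, identity)
        for label, identity in alignments
        if identity >= MIN_IDENTITY and best - identity <= NEAR_BEST_WINDOW
    ]
```

For example, given alignments at 99.2%, 98.5%, and 97.0% identity, only the first two survive: the third clears the 93% floor but is more than one point below the best.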
The track filter can be used to change the color or include/exclude a subset of individual items within a track. This is helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter:
When you have finished configuring the filter, click the Submit button.
Credits
The Spliced EST track is produced at UCSC from EST sequence data submitted to the international public sequence databases by scientists worldwide.
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. (2004) GenBank: update. Nucleic Acids Res. 32 Database issue:D23-6.