This track shows alignments between $organism Expressed Sequence Tags (ESTs) in GenBank and the genome.
Expressed sequence tags are single-read sequences (typically about 500 bases long) that usually represent fragments of transcribed genes. Aligning regions (usually exons) are shown as black boxes connected by lines for gaps (usually spliced-out introns). In full display mode, arrows on the introns indicate the direction of transcription. In the December 2001 assembly and later, this direction is determined from the splice sites. In previous assemblies, the direction of transcription was taken from the GenBank annotations, which were frequently inaccurate.
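As an illustration of how a direction can be read from splice sites, here is a minimal sketch. It assumes the canonical GT...AG intron consensus (which appears as CT...AC on the genomic plus strand when the gene is on the minus strand) and a simple majority vote across introns. The function names and the voting rule are hypothetical, and this is not necessarily how the track itself is computed.

```python
def intron_strand(intron_seq):
    """Guess the transcription strand from one intron's plus-strand genomic sequence.

    Assumption: only the canonical GT...AG consensus is considered.
    Returns "+", "-", or None when the splice sites are uninformative.
    """
    s = intron_seq.upper()
    donor, acceptor = s[:2], s[-2:]
    if donor == "GT" and acceptor == "AG":
        return "+"
    # A minus-strand GT...AG intron reads CT...AC on the plus strand.
    if donor == "CT" and acceptor == "AC":
        return "-"
    return None


def est_direction(introns):
    """Majority vote over all introns of one EST alignment (hypothetical rule)."""
    votes = [intron_strand(seq) for seq in introns]
    plus, minus = votes.count("+"), votes.count("-")
    if plus > minus:
        return "+"
    if minus > plus:
        return "-"
    return None  # no introns, or a tie: direction cannot be inferred
```

An unspliced EST has no introns, so this approach gives no direction for it at all.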
Strand information provided for ESTs (+/-) indicates the direction of the match between the EST and the matching genomic sequence. It bears no relationship to the direction of transcription of the RNA with which it might be associated.
To make an EST, RNA is isolated from cells and reverse transcribed into cDNA. Typically, the cDNA is cloned into a plasmid vector, and a read is taken from the 5' and/or 3' primer. For most (but not all) ESTs, the reverse transcription is primed by an oligo-dT, which hybridizes with the poly-A tail of mature mRNA. The reverse transcriptase may or may not make it to the 5' end of the mRNA, which may or may not be degraded.
In general, the 3' ESTs mark the end of transcription reasonably well, but the 5' ESTs may end at any point within the transcript. Some of the newer cap-selected libraries are starting to hit transcription start reasonably well. Before the cap-selection techniques emerged, some projects used random rather than poly-A priming in an attempt to get sequence distant from the 3' end. These projects were successful at this, but as a side effect also deposited sequences from unprocessed mRNA and perhaps even genomic sequences into the EST databases. (Even outside of the random-primed projects, there is a degree of non-mRNA contamination.) Because of this, a single unspliced EST should be viewed with considerable skepticism. However, because the $organism 3' UTRs are quite long, the splicing requirement does eliminate many genuine 3' ESTs.
To generate this track, $organism ESTs from GenBank are aligned against the genome using the blat program. Note that the maximum intron length allowed by blat is 500,000 bases, which may eliminate some ESTs with very long introns that might otherwise align. When a single EST aligns in multiple places, the alignment with the highest base identity is identified. Only alignments that have a base identity level within 1% of the best, and at least 93% base identity overall, are kept.
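The filtering rule above can be sketched as follows. This is an illustrative sketch only: the tuple layout, the function name, and the reading of "within 1%" as one percentage point of identity are assumptions, not the actual pipeline code.

```python
from collections import defaultdict

MIN_IDENTITY = 93.0  # percent; threshold stated in the text
NEAR_BEST = 1.0      # assumption: "within 1%" means within 1 percentage point


def filter_alignments(alignments):
    """Keep near-best, sufficiently identical alignments for each EST.

    alignments: iterable of (est_id, locus, identity_percent) tuples
    (a hypothetical layout chosen for this example).
    """
    by_est = defaultdict(list)
    for est_id, locus, identity in alignments:
        by_est[est_id].append((locus, identity))

    kept = []
    for est_id, hits in by_est.items():
        best = max(identity for _, identity in hits)
        for locus, identity in hits:
            if identity >= MIN_IDENTITY and best - identity <= NEAR_BEST:
                kept.append((est_id, locus, identity))
    return kept


example = [
    ("est1", "chr1:100", 99.0),  # best hit for est1
    ("est1", "chr2:500", 98.5),  # within 1 point of the best: kept
    ("est1", "chr3:900", 95.0),  # too far below the best: dropped
    ("est2", "chr4:10", 92.0),   # best for est2 but below 93%: dropped
]
kept = filter_alignments(example)
```

With this rule an EST can appear at more than one genomic location when its top alignments are nearly tied, and an EST with no alignment at or above 93% identity does not appear at all.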
The track filter can be used to change the color or include/exclude a subset of individual items within a track. This is helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter:
When you have finished configuring the filter, click the Submit button.
The $Organism EST track is produced at UCSC from EST sequence data submitted to the international public sequence databases by scientists worldwide.