Description
The Known Genes track shows known protein-coding genes based on
proteins from SWISS-PROT, TrEMBL, and TrEMBL-NEW and their
corresponding mRNAs from GenBank.
Coding exons are displayed
taller than 5' and 3' untranslated regions (UTR). Connecting introns
are one-pixel lines with hatch marks indicating direction of transcription.
Entries which have corresponding entries in PDB are colored black.
Entries which either have corresponding proteins in SWISS-PROT or mRNAs that are
NCBI Reference Sequences with a "Reviewed" status are colored dark blue.
Entries which have mRNAs that are
NCBI Reference Sequences with a "Provisional" status are colored lighter blue.
All other entries are colored the lightest blue.
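The coloring rules above can be read as a simple decision cascade. The sketch below is illustrative only and is not the browser's actual code: the Entry class, its field names, and the color labels are hypothetical, and it assumes the rules are applied in the order listed, so that an entry with a PDB match is black even if it also has a SWISS-PROT protein.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical summary of the evidence behind one Known Genes entry."""
    in_pdb: bool = False                  # has a corresponding PDB entry
    in_swissprot: bool = False            # has a corresponding SWISS-PROT protein
    refseq_status: Optional[str] = None   # "Reviewed", "Provisional", or None


def track_color(entry: Entry) -> str:
    """Return the display color, checking rules in the order they are listed."""
    if entry.in_pdb:
        return "black"
    if entry.in_swissprot or entry.refseq_status == "Reviewed":
        return "dark blue"
    if entry.refseq_status == "Provisional":
        return "lighter blue"
    return "lightest blue"
```

For example, track_color(Entry(refseq_status="Provisional")) gives "lighter blue", while an entry with no supporting evidence of these kinds falls through to "lightest blue".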
Method
All mRNAs of a species are aligned against the genome using the BLAT
program. When a single mRNA aligns in multiple places, only
the best alignments are kept. The alignments must also have
at least 98% sequence identity to be kept.
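The alignment filter just described could be sketched roughly as follows. This is an assumption-laden illustration, not the UCSC pipeline: the Alignment record and its fields are hypothetical, "best" is taken to mean the highest alignment score for a given mRNA (ties are all kept), and the 98% identity requirement is applied to those best alignments.

```python
from typing import List, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record for one mRNA-to-genome alignment."""
    mrna: str        # mRNA accession
    chrom: str       # target chromosome
    start: int       # target start position
    score: int       # alignment score (assumed: higher is better)
    identity: float  # fraction of matching bases, 0.0 to 1.0


def filter_alignments(alignments: List[Alignment],
                      min_identity: float = 0.98) -> List[Alignment]:
    """Keep each mRNA's top-scoring alignments that also meet min_identity."""
    # First pass: find the best score seen for each mRNA.
    best_score = {}
    for a in alignments:
        best_score[a.mrna] = max(best_score.get(a.mrna, a.score), a.score)
    # Second pass: keep only best-scoring alignments above the identity cutoff.
    return [a for a in alignments
            if a.score == best_score[a.mrna] and a.identity >= min_identity]
```

Under these assumptions, an mRNA whose best alignment falls below 98% identity is dropped entirely rather than replaced by a weaker alignment.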
This set of mRNA alignments is further reduced by keeping only those mRNAs that
are referenced by a protein in SWISS-PROT, TrEMBL, or TrEMBL-NEW.
Among multiple mRNAs referenced by a single protein, the best mRNA is chosen based on
a quality score, which depends on its length, how well its translation matches
the protein sequence, and its release date.
The list of mRNA and protein pairs is further cleaned up by removing
short invalid entries and consolidating entries with identical CDS regions.
Finally, RefSeq entries which are derived from DNA sequences instead of
mRNA sequences are added.
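The selection and consolidation steps of the method can be sketched as below. The description does not give the quality score formula, so the ranking used here (translation match first, then length, then a later release date) is purely an illustrative assumption, as are the Candidate record, its field names, and the choice to keep the first pair seen for each distinct CDS.

```python
from datetime import date
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    """Hypothetical mRNA/protein pair with the properties the score depends on."""
    protein: str
    mrna: str
    length: int                       # mRNA length in bases
    match_fraction: float             # how well the translation matches the protein
    release: date                     # mRNA release date
    chrom: str
    strand: str
    cds: Tuple[Tuple[int, int], ...]  # CDS blocks as (start, end) pairs


def quality_key(c: Candidate):
    """Assumed ranking; the real quality score is not specified in the text."""
    return (c.match_fraction, c.length, c.release)


def pick_best_mrna(candidates: List[Candidate]) -> List[Candidate]:
    """Keep one mRNA per protein: the one with the highest quality key."""
    best = {}
    for c in candidates:
        current = best.get(c.protein)
        if current is None or quality_key(c) > quality_key(current):
            best[c.protein] = c
    return list(best.values())


def consolidate_identical_cds(pairs: List[Candidate]) -> List[Candidate]:
    """Collapse pairs that share chromosome, strand, and CDS blocks."""
    seen = {}
    for c in pairs:
        seen.setdefault((c.chrom, c.strand, c.cds), c)  # first pair wins
    return list(seen.values())
```

A typical call would be consolidate_identical_cds(pick_best_mrna(candidates)), applied to mRNAs that have already passed the alignment filter. Removal of short invalid entries and the addition of DNA-derived RefSeq entries are not covered by this sketch.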
Credits
The Known Genes track is produced at UCSC, based mainly on cross-references
between proteins from SWISS-PROT (also including TrEMBL and TrEMBL-NEW) and
mRNAs from GenBank generated by scientists worldwide. Part of the NCBI RefSeq
data is also included in this track.