Description

The Known Genes track shows known protein-coding genes based on proteins from SWISS-PROT, TrEMBL, and TrEMBL-NEW and their corresponding mRNAs from GenBank. Coding exons are displayed taller than the 5' and 3' untranslated regions (UTRs). Introns are drawn as one-pixel connecting lines with hatch marks indicating the direction of transcription. Entries that have corresponding entries in PDB are colored black. Entries that either have corresponding proteins in SWISS-PROT or have mRNAs that are NCBI Reference Sequences with a "Reviewed" status are colored dark blue. Entries that have mRNAs that are NCBI Reference Sequences with a "Provisional" status are colored lighter blue. All other entries are colored the lightest blue.
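The coloring rules can be read as a small decision function. The following Python sketch is illustrative only: the `Entry` fields and the `track_color` name are hypothetical, and checking the rules in the order they are listed (PDB first) is an assumption, since the description does not say which rule wins when several apply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical summary of the evidence attached to one track entry."""
    in_pdb: bool = False                  # has a corresponding PDB entry
    in_swissprot: bool = False            # has a corresponding SWISS-PROT protein
    refseq_status: Optional[str] = None   # "Reviewed", "Provisional", or None


def track_color(entry: Entry) -> str:
    """Return the display color, testing rules in the order they are listed.

    The precedence order is an assumption, not stated in the description.
    """
    if entry.in_pdb:
        return "black"
    if entry.in_swissprot or entry.refseq_status == "Reviewed":
        return "dark blue"
    if entry.refseq_status == "Provisional":
        return "lighter blue"
    return "lightest blue"
```

For example, an entry with only a "Provisional" RefSeq mRNA would be drawn in lighter blue, while an entry with no supporting evidence of these kinds falls through to the lightest blue.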

Method

All mRNAs of a species are aligned against the genome using the BLAT program. When a single mRNA aligns in multiple places, only the best alignments are kept. The alignments must also have at least 98% sequence identity to be kept. This set of mRNA alignments is further reduced by keeping only those mRNAs that are referenced by a protein in SWISS-PROT, TrEMBL, or TrEMBL-NEW.
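The filtering steps above might be sketched as follows. This is a minimal illustration, not the UCSC pipeline: the `Alignment` record, the `filter_alignments` function, and the use of a single top score per mRNA to define "best" are all assumptions. Only the 98% identity threshold and the requirement that the mRNA be referenced by a protein come from the description.

```python
from typing import Iterable, List, NamedTuple, Set


class Alignment(NamedTuple):
    """Hypothetical record for one mRNA-to-genome alignment."""
    mrna: str        # mRNA accession
    chrom: str       # chromosome of the aligned location
    start: int       # start coordinate of the alignment
    score: int       # alignment score (higher is better)
    identity: float  # sequence identity as a fraction, 0.0 to 1.0


def filter_alignments(
    alignments: Iterable[Alignment],
    referenced_mrnas: Set[str],
    min_identity: float = 0.98,
) -> List[Alignment]:
    """Keep top-scoring, high-identity alignments of protein-referenced mRNAs."""
    alignments = list(alignments)

    # Step 1: find the best score seen for each mRNA.
    # Assumption: "best" means the highest score; ties are all kept.
    best_score = {}
    for a in alignments:
        if a.score > best_score.get(a.mrna, float("-inf")):
            best_score[a.mrna] = a.score

    # Steps 2 and 3: apply the identity threshold, then keep only mRNAs
    # that some SWISS-PROT/TrEMBL/TrEMBL-NEW protein references.
    return [
        a for a in alignments
        if a.score == best_score[a.mrna]
        and a.identity >= min_identity
        and a.mrna in referenced_mrnas
    ]
```

Here `referenced_mrnas` stands in for the set of mRNA accessions cited by protein records; how that set is built from the protein databases is outside this sketch.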

Among multiple mRNAs referenced by a single protein, the best mRNA is chosen based on a quality score, which depends on its length, how well its translation matches the protein sequence, and its release date. The list of mRNA and protein pairs is further cleaned up by removing short invalid entries and consolidating entries with identical CDS regions.
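A rough sketch of the selection and consolidation steps is shown below. The actual quality score formula is not given here, so `example_score` is purely a placeholder: its weights, and its slight preference for newer releases, are assumptions. The `Candidate` fields and function names are likewise hypothetical, and the removal of short invalid entries is omitted because no criterion is stated.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical mRNA/protein pair with the inputs to the quality score."""
    protein: str
    mrna: str
    length: int               # mRNA length in bases
    match_fraction: float     # how well the translation matches the protein
    release: date             # mRNA release date
    chrom: str
    cds: Tuple[Tuple[int, int], ...]  # CDS exon (start, end) coordinates


def example_score(c: Candidate) -> float:
    # Placeholder only: the real weighting is not described in this document.
    return c.match_fraction * 1000 + c.length / 100 + c.release.year / 1000


def best_pairs(
    candidates: Iterable[Candidate],
    score: Callable[[Candidate], float] = example_score,
) -> Dict[str, Candidate]:
    """Pick the highest-scoring mRNA for each protein."""
    best: Dict[str, Candidate] = {}
    for c in candidates:
        current = best.get(c.protein)
        if current is None or score(c) > score(current):
            best[c.protein] = c
    return best


def consolidate_by_cds(pairs: Iterable[Candidate]) -> List[Candidate]:
    """Keep the first entry seen for each distinct (chrom, CDS) region."""
    seen: Dict[tuple, Candidate] = {}
    for c in pairs:
        seen.setdefault((c.chrom, c.cds), c)
    return list(seen.values())
```

Passing the score as a parameter keeps the selection logic separate from the scoring formula, so a different weighting can be substituted without changing `best_pairs`.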

Finally, RefSeq entries that are derived from DNA sequences rather than mRNA sequences are added.

Credits

The Known Genes track is produced at UCSC, based mainly on cross-references between proteins from SWISS-PROT (including TrEMBL and TrEMBL-NEW) and mRNAs from GenBank generated by scientists worldwide. A portion of the NCBI RefSeq data is also included in this track.