Description
The Known Genes track shows known protein-coding genes based on
proteins from SWISS-PROT, TrEMBL, and TrEMBL-NEW and their
corresponding mRNAs from GenBank.
Coding exons are displayed
taller than 5' and 3' untranslated regions (UTR). Connecting introns
are one-pixel lines with hatch marks indicating direction of transcription.
Entries which have corresponding entries in PDB are colored black.
Entries which either have corresponding proteins in SWISS-PROT or mRNAs that are
NCBI Reference Sequences with a "Reviewed" status are colored dark blue.
Entries which have mRNAs that are
NCBI Reference Sequences with a "Provisional" status are colored lighter blue.
All other entries are colored lightest blue.
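The coloring rules above can be read as a simple priority list. The following is a minimal sketch, not the browser's actual code. It assumes the rules apply in the order they are listed, so an entry with a PDB match is black even if it also has a SWISS-PROT protein. The `GeneEntry` class, its field names, and the `track_color` function are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneEntry:
    """Hypothetical record holding only the evidence that affects color."""
    has_pdb: bool = False                 # a corresponding PDB entry exists
    in_swissprot: bool = False            # the protein is in SWISS-PROT
    refseq_status: Optional[str] = None   # "Reviewed", "Provisional", or None


def track_color(entry: GeneEntry) -> str:
    """Return the display color, checking rules in the order listed above.

    The precedence (first matching rule wins) is an assumption.
    """
    if entry.has_pdb:
        return "black"
    if entry.in_swissprot or entry.refseq_status == "Reviewed":
        return "dark blue"
    if entry.refseq_status == "Provisional":
        return "lighter blue"
    return "lightest blue"
```

For example, `track_color(GeneEntry(refseq_status="Provisional"))` falls through the first two rules and is assigned the lighter blue.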
Method
All mRNAs of a species are aligned against the genome using the BLAT
program. When a single mRNA aligns in multiple places, only
the best alignments are kept. The alignments must also have
at least 98% sequence identity to be kept.
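As an illustration of this filtering step, the sketch below keeps the top-scoring alignment(s) of each mRNA and then applies the 98% identity cutoff. The text does not define "best", so ranking by a single numeric `score` is an assumption, as is applying the identity cutoff after the best-alignment selection. The `Alignment` type and `filter_alignments` function are hypothetical and do not reflect BLAT's output format.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical, simplified alignment record."""
    mrna_id: str
    chrom: str
    start: int
    score: int        # assumed ranking measure; higher is better
    identity: float   # fraction of identical bases, 0.0 to 1.0


MIN_IDENTITY = 0.98   # the 98% threshold stated in the text


def filter_alignments(alignments: Iterable[Alignment]) -> List[Alignment]:
    """Keep each mRNA's top-scoring alignment(s) that meet the identity cutoff."""
    by_mrna = defaultdict(list)
    for aln in alignments:
        by_mrna[aln.mrna_id].append(aln)

    kept = []
    for group in by_mrna.values():
        top_score = max(a.score for a in group)
        for aln in group:
            # Ties at the top score are all kept (an assumption).
            if aln.score == top_score and aln.identity >= MIN_IDENTITY:
                kept.append(aln)
    return kept
```

Under these assumptions, an mRNA whose best alignment falls below 98% identity contributes nothing to the track, even if a lower-scoring alignment would have passed the cutoff.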
This set of mRNA alignments is further reduced by keeping only those mRNAs that
are referenced by a protein in SWISS-PROT, TrEMBL, or TrEMBL-NEW.
Among multiple mRNAs referenced by a single protein, the best mRNA is chosen based on
a quality score, which depends on its length, how well its translation matches
the protein sequence, and its release date.
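The selection of one mRNA per protein might look like the sketch below. The text names the three inputs to the quality score but not how they are combined, so the ordering used here (translation match first, then length, then a more recent release date) is purely an assumption. `MrnaCandidate`, `quality_key`, and `pick_best_mrna` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MrnaCandidate:
    """Hypothetical record for an mRNA referenced by a protein."""
    accession: str
    length: int               # mRNA length in bases
    translation_match: float  # 0.0 to 1.0 agreement with the protein sequence
    release_date: date


def quality_key(c: MrnaCandidate) -> Tuple[float, int, date]:
    """Assumed ranking: better translation match, then longer, then newer."""
    return (c.translation_match, c.length, c.release_date)


def pick_best_mrna(
    candidates_by_protein: Dict[str, List[MrnaCandidate]]
) -> Dict[str, MrnaCandidate]:
    """Choose the highest-ranked mRNA for each protein that has candidates."""
    return {
        protein: max(candidates, key=quality_key)
        for protein, candidates in candidates_by_protein.items()
        if candidates
    }
```

A weighted numeric score would be an equally plausible reading of the text; the tuple ordering is used here only because it needs no invented weights.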
The list of mRNA and protein pairs is further cleaned up by removing
short invalid entries and consolidating entries with identical CDS regions.
Finally, RefSeq entries that are derived from DNA sequences instead of
mRNA sequences are added.
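The clean-up of mRNA and protein pairs described above can be sketched as follows. The text does not say what makes an entry "short" or how entries are consolidated, so both the `MIN_CDS_BASES` threshold and the choice to keep the first entry seen for each CDS are assumptions. `GenePair` and `clean_pairs` are hypothetical names.

```python
from typing import Iterable, List, NamedTuple, Tuple


class GenePair(NamedTuple):
    """Hypothetical mRNA/protein pair with its coding exon coordinates."""
    mrna_id: str
    protein_id: str
    chrom: str
    strand: str
    cds_exons: Tuple[Tuple[int, int], ...]  # (start, end) pairs, end exclusive


MIN_CDS_BASES = 30  # assumed cutoff; the text gives no value


def cds_length(pair: GenePair) -> int:
    """Total number of coding bases across all CDS exons."""
    return sum(end - start for start, end in pair.cds_exons)


def clean_pairs(pairs: Iterable[GenePair]) -> List[GenePair]:
    """Drop short entries, then keep one entry per identical CDS region."""
    seen = set()
    kept = []
    for pair in pairs:
        if cds_length(pair) < MIN_CDS_BASES:
            continue  # treated as a short, invalid entry
        key = (pair.chrom, pair.strand, pair.cds_exons)
        if key in seen:
            continue  # same CDS already represented by an earlier entry
        seen.add(key)
        kept.append(pair)
    return kept
```

Two entries count as identical here only when chromosome, strand, and every CDS exon boundary match, so entries that differ only in their UTRs are merged.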
Credits
The Known Genes track is produced at UCSC based primarily on cross-references
between proteins from SWISS-PROT (also including TrEMBL and TrEMBL-NEW) and
mRNAs from GenBank generated by scientists worldwide. Some NCBI RefSeq data
are also included in this track.
Data Use Restrictions
The SWISS-PROT entries in this annotation track are copyrighted. They are
produced through a collaboration
between the Swiss Institute of Bioinformatics and the EMBL Outstation - the
European Bioinformatics Institute. There are no restrictions on their use by
non-profit institutions as long as their content is in no way modified and this
statement is not removed. Usage by and for commercial entities requires a
license agreement (see
http://www.isb-sib.ch/announce/ or send an email to
license@isb-sib.ch).