The Known Genes II track is based on the UCSC Known Genes data set for hg17 (Human May 2004 Assembly). Clicking the "Outside Link" entry above opens the gene details page for hg17. The original "Known Genes" track for hg16 (built in March 2004) is somewhat outdated but still available.
The hg17 UCSC Known Genes set was built by a new process, KG II, described below.
UniProt protein sequences (including alternative splicing isoforms) and mRNA sequences from RefSeq and GenBank were aligned against the base genome using BLAT. RefSeq alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept. GenBank mRNA alignments having a base identity level within 0.2% of the best and at least 97% base identity with the genomic sequence were kept. Protein alignments having a base identity level within 0.2% of the best and at least 80% base identity with the genomic sequence were kept.
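The filtering rules above can be sketched in Python. This is a minimal illustration, not the KG II pipeline code. It assumes that "best" means the highest identity among alignments of the same query sequence, and that "within 0.1%" is a difference in percentage points. The function name and the tuple layout are hypothetical.

```python
from collections import defaultdict

# From the text: (allowed gap below the best identity, minimum identity), in percent.
THRESHOLDS = {
    "refseq": (0.1, 96.0),
    "genbank": (0.2, 97.0),
    "protein": (0.2, 80.0),
}


def filter_alignments(alignments, source):
    """Keep alignments that pass both identity rules for the given source.

    `alignments` is a list of (query_id, percent_identity) tuples
    (a hypothetical layout chosen for this sketch).
    """
    near_best, min_identity = THRESHOLDS[source]

    # Assumption: "best" is the top identity among alignments of the same query.
    best = defaultdict(float)
    for query_id, identity in alignments:
        best[query_id] = max(best[query_id], identity)

    return [
        (query_id, identity)
        for query_id, identity in alignments
        if identity >= min_identity and identity >= best[query_id] - near_best
    ]
```

An alignment has to pass both tests, so a query whose best alignment falls below the minimum identity contributes nothing to the kept set.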
The genomic mRNA and protein alignments were then compared, and protein-mRNA pairings were determined from their overlaps. mRNA CDS data were obtained from RefSeq and GenBank and supplemented by CDS structures derived from UCSC protein-mRNA BLAT alignments. The initial set of UCSC Known Genes candidates consisted of all protein-mRNA pairs with valid mRNA CDS structures. A gene-check program (similar to the one used for the Consensus CDS (CCDS) project) was used to remove questionable candidates, such as those with in-frame stop codons or missing start or stop codons.
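The kinds of problems named above can be illustrated with a small check on a CDS nucleotide string. This is a sketch only, not the gene-check program itself. It assumes the standard genetic code (start codon ATG, stop codons TAA, TAG, and TGA), and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def check_cds(cds):
    """Return a list of problems found in a CDS nucleotide string."""
    cds = cds.upper()
    # Split into complete codons; a trailing partial codon is ignored here.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]

    problems = []
    if not codons or codons[0] != "ATG":
        problems.append("missing start codon")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing stop codon")
    # Any stop codon before the final codon truncates the reading frame.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("in-frame stop codon")
    return problems
```

A candidate would be dropped when the returned list is non-empty. The real program applies further checks that are not described here.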
From each group of gene candidates that shared the same CDS structure, the protein-mRNA pair with the best ranking and protein-mRNA alignment score was selected as a UCSC Known Gene. The ranking of a gene candidate depended on its gene-check quality measures. When all else was equal, preference was given first to RefSeq mRNAs and then to MGC mRNAs. Similarly, preference was given to gene candidates represented by Swiss-Prot proteins. The protein-mRNA alignment score was calculated from a protein-to-mRNA alignment using TBLASTN, plus weighted sub-scores based on the date and length of the mRNA.
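The selection step can be sketched as a grouped minimum over a sort key. This is an illustration under stated assumptions, not the KG II implementation. The text does not say how the criteria are combined, so the order used here (quality rank, then mRNA source, then Swiss-Prot status, then alignment score) is an assumption, and all dictionary keys are hypothetical.

```python
from collections import defaultdict

# RefSeq first, then MGC; every other mRNA source ranks after these.
MRNA_SOURCE_PREFERENCE = {"refseq": 0, "mgc": 1}


def select_known_genes(candidates):
    """Pick one candidate per CDS structure.

    Each candidate is a dict with hypothetical keys: cds_key, mrna_id,
    quality_rank (lower is better), mrna_source, is_swissprot, align_score.
    """
    groups = defaultdict(list)
    for cand in candidates:
        groups[cand["cds_key"]].append(cand)

    def sort_key(cand):
        # Assumed order of criteria; smaller tuples win.
        return (
            cand["quality_rank"],
            MRNA_SOURCE_PREFERENCE.get(cand["mrna_source"], 2),
            0 if cand["is_swissprot"] else 1,
            -cand["align_score"],  # higher alignment score is better
        )

    return {key: min(group, key=sort_key) for key, group in groups.items()}
```

Comparing tuples keeps each tie-breaker explicit, so a different order of criteria only requires reordering the tuple.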
The UCSC Known Genes track was produced using protein data from UniProt and mRNA data from NCBI RefSeq and GenBank.
The UniProt entries in this annotation track are copyrighted. They are produced through a collaboration between the Swiss Institute of Bioinformatics and the EMBL Outstation - the European Bioinformatics Institute. There are no restrictions on their use by non-profit institutions as long as their content is in no way modified and this statement is not removed. Usage by and for commercial entities requires a license agreement (see http://www.isb-sib.ch/announce/ or send an email to license@isb-sib.ch).
Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., and Wheeler, D.L. GenBank: update. Nucleic Acids Res. 32, D23-D26 (2004).
Kent, W.J. BLAT - the BLAST-like alignment tool. Genome Res. 12(4), 656-664 (2002).