Description

CpG islands are associated with genes, particularly housekeeping genes, in vertebrates. They are common near transcription start sites and may be associated with promoter regions. Normally a C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the C's in such an arrangement tend to be methylated. This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over evolutionary time, methylated C's tend to turn into T's through spontaneous deamination. As a result, CpGs are relatively rare unless there is selective pressure to keep them or a region is not methylated for some reason, perhaps having to do with the regulation of gene expression. CpG islands are regions where CpGs are present at significantly higher levels than is typical for the genome as a whole.

Method

CpG islands are predicted by scanning the sequence one base at a time, scoring each dinucleotide (+17 for CG and -1 for any other dinucleotide), and identifying maximally scoring segments. Each segment is then evaluated for GC content (>= 50%), length (> 200 bases), and the ratio of the observed number of CG dinucleotides to the number expected from the counts of G's and C's in the segment (> 0.6).
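The two steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the program credited below: the function names are hypothetical, the input is assumed to be a plain A/C/G/T string, and the segment search is a simple greedy scan (reset the running score whenever it falls to zero or below, and keep the best-scoring stretch seen before the reset) rather than an exact reproduction of the original maximal-segment logic.

```python
def cpg_island_candidates(seq, cg_score=17, other_score=-1):
    """Scan dinucleotides and return (start, end) half-open base intervals
    of high-scoring segments, using a greedy reset-at-zero scan."""
    seq = seq.upper()
    segments = []
    running = 0      # running score of the current segment
    best = 0         # highest running score reached in the current segment
    best_end = -1    # dinucleotide index where `best` was reached
    start = 0        # dinucleotide index where the current segment began
    for i in range(len(seq) - 1):
        running += cg_score if seq[i:i + 2] == "CG" else other_score
        if running > best:
            best, best_end = running, i
        if running <= 0:
            # Score has fallen back to zero: emit the best-scoring part, if any.
            if best > 0:
                segments.append((start, best_end + 2))
            running, best, best_end = 0, 0, -1
            start = i + 1
    if best > 0:
        segments.append((start, best_end + 2))
    return segments


def passes_island_criteria(segment, min_gc=0.5, min_length=200, min_obs_exp=0.6):
    """Apply the GC-content (>=), length (>) and obs/exp CpG (>) thresholds."""
    s = segment.upper()
    n = len(s)
    num_c, num_g = s.count("C"), s.count("G")
    if n == 0 or num_c == 0 or num_g == 0:
        return False
    num_cpg = s.count("CG")  # CG occurrences cannot overlap, so count() is exact
    gc_content = (num_c + num_g) / n
    obs_exp = num_cpg * n / (num_c * num_g)
    return gc_content >= min_gc and n > min_length and obs_exp > min_obs_exp


def find_cpg_islands(seq):
    """Return the candidate segments that satisfy all three criteria."""
    seq = seq.upper()
    return [(start, end) for start, end in cpg_island_candidates(seq)
            if passes_island_criteria(seq[start:end])]


# A 300-base CG repeat flanked by AT repeats is reported as one island.
print(find_cpg_islands("AT" * 100 + "CG" * 150 + "AT" * 100))  # [(200, 500)]
```

Because the scan only resets once the score returns to zero, a long run of CGs can absorb a stretch of other dinucleotides before the island ends; the three thresholds are what reject short or CG-poor candidates.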

The CpG count is the number of CG dinucleotides in the island. The Percentage CpG is the ratio of CpG nucleotide bases (twice the CpG count) to the length. The ratio of observed to expected CpG is calculated according to the formula in Gardiner-Garden and Frommer, J. Mol. Biol. (1987) 196 (2), 261-282:
Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
where N is the length of the island in bases.
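As a small worked example of these definitions, the Python sketch below computes all three values for one island sequence. The function name island_stats is hypothetical, N is taken to be the island length, and the Percentage CpG is scaled by 100 on the assumption that it is reported as a percentage.

```python
def island_stats(island):
    """Compute CpG count, Percentage CpG and Obs/Exp CpG for one island."""
    s = island.upper()
    n = len(s)                        # N: island length in bases
    cpg_count = s.count("CG")         # CG dinucleotides never overlap
    num_c, num_g = s.count("C"), s.count("G")
    percent_cpg = 100.0 * 2 * cpg_count / n if n else 0.0
    obs_exp = cpg_count * n / (num_c * num_g) if num_c and num_g else 0.0
    return {"cpg_count": cpg_count, "percent_cpg": percent_cpg, "obs_exp": obs_exp}


stats = island_stats("ACGTCGGA")
# "ACGTCGGA": N = 8, 2 CpGs, 2 C's, 3 G's
# percent_cpg = 100 * 4 / 8 = 50.0; obs_exp = 2 * 8 / (2 * 3), about 2.67
print(stats)
```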

Credits

This track was generated using a modification of a program developed by G. Miklem and L. Hillier.