Candidate Oligos for every Stanford Oligo Chip track
Oligos were chosen for every Sanger22 annotation on chr22 as
well as about 2000 other genes. Two oligos were chosen with
a 3' bias, two with a 5' bias, and two with no bias. For this
purpose exons are defined to include 3' and 5' UTRs.
The strategy
These oligo selections are based on the following ideas:
- Oligos should have minimal secondary structure, since
they must be available for hybridization.
- Oligos should be unique in the genome if possible: no
repeats, and no BLAT or BLAST hits elsewhere in the genome.
- If oligo-dT is used for RT priming, oligos should lie near
the 3' end of the gene transcript (including the UTR).
- Oligos should have a uniform hybridization temperature if
possible. All oligos must be hybridized at the same temperature,
so the aim is to minimize cross-hybridization while maximizing
signal.
Currently we don't have data to identify which parameters
are more important than others. Also, some of these criteria
overlap (e.g., if Tm is limited, then high secondary
structure is less likely). See below for histograms of these
criteria.
The Details:
The Algorithm
- Step through each exon, at a step size proportional to the
size of the exon, examining possible oligos and excluding areas
that are RepeatMasked.
- Score each oligo for: Tm difference, distance from 3' end,
secondary structure, and an Affymetrix heuristic.
- Look through the candidate probes, recording the maximum
value observed for each type of score.
- Normalize each score by dividing it by that maximum, combine
the normalized scores as an average, and sort the oligos
to find the best overall score.
- Oligos with the best combined normalized scores are BLATed
until one is found that has a BLAT score below a given
threshold.
- As oligos are chosen, candidate oligos that overlap those
already chosen are discarded.
- If no oligos pass the BLAT threshold, or not enough oligos have
been chosen, just pick the oligos that have the best combined score.
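
The normalization and selection steps above can be sketched roughly as follows. This is a minimal illustration, not the code used to build the track. It assumes every raw score is a non-negative penalty where lower is better, that the fallback step still discards overlapping oligos, and that BLAT results come from a caller-supplied function. The names Candidate, combine_scores, pick_oligos, blat_score and blat_max are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Candidate:
    start: int                # 0-based start of the oligo
    end: int                  # end of the oligo (exclusive)
    scores: Dict[str, float]  # raw penalties, e.g. {"tm_diff": 2.0}
    combined: float = 0.0     # filled in by combine_scores()


def combine_scores(cands: List[Candidate]) -> None:
    """Divide each score by its maximum, then average per candidate."""
    names = list(cands[0].scores) if cands else []
    if not names:
        return
    maxima = {n: max(c.scores[n] for c in cands) for n in names}
    for c in cands:
        normalized = [c.scores[n] / maxima[n] if maxima[n] else 0.0
                      for n in names]
        c.combined = sum(normalized) / len(normalized)


def overlaps(a: Candidate, b: Candidate) -> bool:
    return a.start < b.end and b.start < a.end


def pick_oligos(cands: List[Candidate],
                blat_score: Callable[[Candidate], float],
                blat_max: float,
                wanted: int) -> List[Candidate]:
    combine_scores(cands)
    # Assumption: a lower combined score is better.
    ranked = sorted(cands, key=lambda c: c.combined)
    chosen: List[Candidate] = []
    # First pass: best-ranked, non-overlapping oligos under the BLAT threshold.
    for c in ranked:
        if len(chosen) == wanted:
            break
        if any(overlaps(c, p) for p in chosen):
            continue
        if blat_score(c) < blat_max:
            chosen.append(c)
    # Fallback: fill remaining slots by combined score alone.
    for c in ranked:
        if len(chosen) == wanted:
            break
        if not any(overlaps(c, p) for p in chosen):
            chosen.append(c)
    return chosen
```

Passing the BLAT lookup in as a function keeps the sketch independent of how BLAT is actually run.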
About the scores:
Histograms of Scores
Histograms are from the Stanford picked gene set.

Secondary structure, measured in Gibbs free energy; higher scores are better.

BLAT (similar to BLAST) histogram; lower scores are better.

Melting temperatures; values over 100 °C do occur in the algorithm.

Percentage GC; not used in the algorithm but presented anyway.
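
For reference, percentage GC is simply the share of G and C bases in the oligo. The helper below is an illustrative sketch; the name gc_percent is hypothetical and is not taken from the track's own code.

```python
def gc_percent(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)
```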
Please note that all coordinates are relative to the '+' strand,
while all oligo sequences are written 5'->3'. This means that all
sequences displayed are part of the sense strand. For example, if an
oligo is represented in the database as being on the '-' strand,
starting at 1 and ending at 5 of 'atgcatgc' (counting from 0,
inclusive), the '+' strand sequence of the probe is 'tgcat'. That
reads 3'->5' on the '-' strand, so the oligo sequence shown is
the reverse complement, 'atgca'.
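
The strand convention can be illustrated with a short sketch. It assumes coordinates are 0-based and inclusive, which is the reading under which positions 1 to 5 of 'atgcatgc' give 'tgcat'. The function names are hypothetical.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def oligo_sequence(plus_seq: str, start: int, end: int, strand: str) -> str:
    """Return the 5'->3' oligo for '+' strand coordinates.

    Assumption: start and end are 0-based and inclusive.
    """
    region = plus_seq[start:end + 1]
    return reverse_complement(region) if strand == "-" else region


# oligo_sequence("atgcatgc", 1, 5, "+") gives "tgcat"
# oligo_sequence("atgcatgc", 1, 5, "-") gives "atgca"
```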