From SMian@lbl.gov  Fri May 19 09:33:15 2000
Return-Path: <SMian@lbl.gov>
Sender: saira@lbl.gov
Date: Fri, 19 May 2000 09:33:11 -0700
From: Saira Mian <SMian@lbl.gov>
X-Accept-Language: en
To: karplus@cse.ucsc.edu
Subject: General comments
Content-Type: text/plain; charset=us-ascii

Dear Kevin,
  A general strategy I've found works well with large numbers of fairly
related sequences (as in T90, for example) is to start with an HMM that
has a very high value for the cutmatch parameter (>0.95). This picks out
highly conserved "motif blocks". Once these are defined, their match
states are fixed (type definition A) and the model is retrained with a
smaller cutmatch to tidy up the intervening regions.
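
To illustrate why a high cutoff isolates motif blocks, here is a toy
sketch in Python. It is not SAM's surgery code. It assumes that the
cutoff acts as a threshold on the fraction of sequences occupying an
alignment column, and the function name conserved_blocks and the toy
alignment are made up for the example.

```python
def conserved_blocks(rows, cutoff=0.95):
    """Return half-open (start, end) column ranges whose occupancy exceeds cutoff.

    rows: equal-length aligned strings, with '-' marking a gap.
    Assumption: occupancy = fraction of rows with a residue in the column.
    """
    n = len(rows)
    width = len(rows[0])
    keep = []
    for col in range(width):
        used = sum(1 for r in rows if r[col] != "-")
        keep.append(used / n > cutoff)

    # Group consecutive kept columns into blocks.
    blocks = []
    start = None
    for col, kept in enumerate(keep):
        if kept and start is None:
            start = col
        elif not kept and start is not None:
            blocks.append((start, col))
            start = None
    if start is not None:
        blocks.append((start, width))
    return blocks


# Hypothetical toy alignment, for illustration only.
toy = ["ACD-EFG", "ACDKEFG", "AC--EFG", "ACD-E-G"]
blocks_strict = conserved_blocks(toy, cutoff=0.95)  # conserved blocks only
blocks_loose = conserved_blocks(toy, cutoff=0.5)    # blocks plus looser columns
```

With the strict cutoff only the fully occupied columns survive as short
blocks; relaxing the cutoff lets the partly occupied columns between
them back in, which corresponds to the second, tidying-up pass.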
  One fairly easy approach for improving models/alignments is to use a
pseudo-Gibbs sampling method. Given a model, align all but the
worst-scoring sequence to the model, then use the resulting a2m file
and modelfromalign to estimate a new HMM. Align all the sequences to
the new HMM; this usually fixes some of the problems with the poor
sequence. Repeat with other problematic sequences. Although the
likelihood of the model may not change significantly, the alignment
often does improve.
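
The loop described above can be sketched in Python as follows. This is
only an outline under stated assumptions: score, align and estimate are
hypothetical callables standing in for whatever scoring, alignment and
modelfromalign-style estimation steps are actually used, and higher
scores are assumed to mean a better fit. No real command-line options
are shown.

```python
def refine_leave_worst_out(seqs, model, score, align, estimate, rounds=1):
    """Leave-worst-out refinement of a model and its alignment.

    Hypothetical callables (assumptions, not a real tool's API):
      score(model, seq)   -> float, higher means a better fit
      align(model, seqs)  -> alignment of seqs to model
      estimate(alignment) -> new model built from an alignment
    """
    held_out = set()  # indices already treated as the problem sequence
    alignment = align(model, seqs)
    for _ in range(rounds):
        candidates = [i for i in range(len(seqs)) if i not in held_out]
        if not candidates:
            break
        # Pick the worst-scoring sequence not yet held out.
        worst = min(candidates, key=lambda i: score(model, seqs[i]))
        held_out.add(worst)
        # Re-estimate the model from everything except that sequence.
        rest = [s for i, s in enumerate(seqs) if i != worst]
        model = estimate(align(model, rest))
        # Realign all sequences, including the held-out one, to the new model.
        alignment = align(model, seqs)
    return model, alignment


# Toy stand-ins: "sequences" are numbers and the "model" is their mean.
def toy_score(m, x):
    return -abs(x - m)

def toy_align(m, xs):
    return list(xs)

def toy_estimate(a):
    return sum(a) / len(a)

toy_seqs = [1.0, 1.2, 0.9, 5.0]
toy_model, toy_alignment = refine_leave_worst_out(
    toy_seqs, toy_estimate(toy_seqs), toy_score, toy_align, toy_estimate
)
```

In the toy run the outlier 5.0 is held out, so the re-estimated model is
no longer pulled toward it, yet it is still included in the final
alignment step.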
	-saira
-- 
I. Saira Mian
Life Sciences Division (Mail Stop 74-197)  E-mail: SMian@lbl.gov
Lawrence Berkeley National Laboratory      Tel:    (510) 486-6216
1 Cyclotron Road                           Fax:    (510) 486-6949
Berkeley, California 94720


