From SMian@lbl.gov  Mon May 15 14:36:27 2000
Return-Path: <SMian@lbl.gov>
Sender: saira@lbl.gov
Date: Mon, 15 May 2000 14:36:07 -0700
From: Saira Mian <SMian@lbl.gov>
X-Accept-Language: en
To: Kevin Karplus <karplus@cse.ucsc.edu>
Subject: Different libraries/T87
Content-Type: text/plain; charset=us-ascii

Dear Kevin,

  Have you considered making sets of HMM libraries derived from the
current one that differ only in terms of which columns of the alignment
are designated as match states? This would change the likelihoods of the
different models given the same data, but I think it may allow (more
automated) detection of more remote homologues. My intuition is that by
making the models shorter (fewer nodes), structural homologues that are
considerably shorter/longer than the training set or those at the
borderline may move to being significant matches. The sets of HMMs I had
in mind are the following:

(i) secondary structure: only residues (i.e. columns) that are part of a
helix or strand are treated as match states (everything else is insert).

(ii) physical core: only residues that are in the physical core of the
structure are match states (this is akin to scoring against a set of
"threading-like" models).

(iii) structural core: only residues in helices/strands that are in the
structural core of the fold are match states. In a TIM barrel, for
example, this would correspond to the 8 strands and 8 helices.

  I had a quick look at T87: my suggestions are in casp4/t87/saira/README.
I didn't know whether everyone is free to add to files in the main
directory so I made a work space for myself.

	-saira
 
-- 
I. Saira Mian
Life Sciences Division (Mail Stop 74-197)  E-mail: SMian@lbl.gov
Lawrence Berkeley National Laboratory      Tel:    (510) 486-6216
1 Cyclotron Road                           Fax:    (510) 486-6949
Berkeley, California 94720


From karplus@cse.ucsc.edu  Mon May 15 15:51:01 2000
Return-Path: <karplus@cse.ucsc.edu>
Date: Mon, 15 May 2000 15:50:59 -0700
From: Kevin Karplus <karplus@cse.ucsc.edu>
To: SMian@lbl.gov
CC: karplus@cse.ucsc.edu
In-reply-to: <39206DC7.C023090@lbl.gov> (message from Saira Mian on Mon, 15
	May 2000 14:36:07 -0700)
Subject: Re: Different libraries/T87

I have not experimented with building HMMs for reduced numbers of
columns, though I have thought about it occasionally.

My intuition is that there is no advantage on target models (where
structure is unknown), but there may be some advantage for template
models, where one could restrict the model to core regions, and use
FIMs for large inserts.  The hard part is in figuring out which
columns to keep and which to discard in building the model.  For a
small number of hand-built models, expert intuition is usable, but for
the 2500 or more needed for a template library, it has to be
automatic.  The closest I've come is for some of the SCOP domains,
which are non-contiguous, using a FIM for the inserted other domain.

I'm having enough trouble figuring out how to set parameters for the
whole-chain library---adding extra complexity in the form of choosing
the right subset of the columns would probably overwhelm me right now.