18 May 2000 Kevin Karplus

	ADP-ribose pyrophosphatase seems to be similar to 1mut and 1tum
	nucleoside triphosphate pyrophosphohydrolase (mutt), based on
	double-blast results.  The FSSP representative is 1tum.
	
	The 1MUT and 1TUM files are included in the T2k alignment, so
	prediction should be relatively straightforward.  We may want
	to try doing some alignment trimming or adjustment.
	
	With a quick eyeballing of the results using see-a2m, I think
	I like the 1tum/1tum-T0090-fssp-global.pw.a2m alignment best,
	but there are some suspect chunks of it (the insertion in the
	beta strand, for example).
	
	After we get a good alignment, we should probably run it through
	SCWRL to set the sidechains.  Questions: should we keep the
	ligand there?  Will SCWRL pay attention to ligands?

	The target-model search found 1tum and 1mut, of course, but
	also added a very weak hit for 1lvl.  I see no reason to
	pursue the 1lvl hit.

	The template model search also scored 1tum and 1mut at the
	top, with 1tum slightly better, and added three weak hits
	1puc, 1sceA, 4xis, all down at the noise level.


19 May 2000 Kevin Karplus
	Saira made a recommendation for how to get a better multiple
	alignment for T0090 (and similar large sets of related sequences).
	See "mail" for details.  The basic idea is to use model
	surgery with stringent cutoffs to get the core region, then
	freeze the core region (she suggests changing the model, but I
	think that using constraints may be more principled), and redo
	the model surgery with lower thresholds to get the somewhat
	conserved non-core regions.

	I have not tried it on T0090 yet, but the T0090.t2k alignment
	seems to have a pretty solid core from about 65 through 180.
	Model surgery is not very good about adding things back on at
	the end, so I'm not sure how well Saira's method would work on this 
	multiple alignment.
	
	We could try doing a tuneup on the alignment (which allows
	model surgery).
	I tried doing "tuneup" and "noseed" alignments for T0090, and
	creating the pairwise alignments with 1TUM.  The alignment 
	1tum/noseed-1tum-T0090-local.pw.a2m.gz looks pretty good, getting
	a conserved glutamic acid (E) that interacts with the ligand
	(GLU98) that is not conserved in the T0090-1tum-global alignment.
	The 1tum-T0090-fssp-global alignment still looks much better
	to me, as it not only conserves GLU98, but also several other
	residues in that neighborhood.

22 May 2000 Kevin Karplus
	Looking at the evolutionary tree, we can see that the
	MUTT_ECOLI sequence (the 1tum template) is almost an out-group member
	of the tree.  It is only added in t2k_4 (not t2k_3), consistent with
	this view of it as a fairly distant homolog.
	
	We may want to make a family alignment (excluding subfamilies
	33 and 34, the MUTT_HAEIN and MUTT_ECOLI sequences), to get a
	more precise secondary-structure prediction.  We may even want
	to narrow it to subfamilies 9 through 25, though that is
	less clear.

Tue Jun  6 13:35:30 PDT 2000
Redid 2ry prediction with new neural net.

Fri Jun 23 10:56:10 PDT 2000 Melissa Cline

	Okay, let's see about finalizing one of these alignments!
	First off, which alignments are starting with a good score?  
		1tum-T0090-fssp-global.pw      -9.20  weak
		1tum-T0090-global.pw          -37.00  promising
		1tum-T0090-local.pw           -40.50  also promising
		1tum-T0090-vit.pw             -40.89  also promising
		T0090-*                       < -150  not credible: did 1tum get into
		                                      the training set?
		noseed-1tum-T0090-global.pw   -82.04  looks better...
		noseed-1tum-T0090-local.pw    -83.14
		tuneup-1tum-T0090-global.pw  -114.76
		tuneup-1tum-T0090-local.pw   -112.46
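
A quick way to rank the scores in the table, as a Python sketch.  It
assumes that more negative scores are stronger (which is how the notes
in the table read) and treats -150 as a rough "too good to be true"
cutoff taken from the T0090-* row, not a calibrated threshold.  The
T0090-* row itself is left out because only a bound is given for it.

```python
# Scores copied from the table (T0090-* omitted: only "< -150" is known).
scores = {
    "1tum-T0090-fssp-global.pw": -9.20,
    "1tum-T0090-global.pw": -37.00,
    "1tum-T0090-local.pw": -40.50,
    "1tum-T0090-vit.pw": -40.89,
    "noseed-1tum-T0090-global.pw": -82.04,
    "noseed-1tum-T0090-local.pw": -83.14,
    "tuneup-1tum-T0090-global.pw": -114.76,
    "tuneup-1tum-T0090-local.pw": -112.46,
}

# Assumption: rule of thumb read off the T0090-* row, not a calibrated value.
SUSPICIOUS_BELOW = -150.0


def rank(score_table):
    """Return (name, score) pairs, most negative (assumed strongest) first."""
    return sorted(score_table.items(), key=lambda item: item[1])


ranked = rank(scores)
for name, score in ranked:
    flag = "  (suspiciously strong)" if score < SUSPICIOUS_BELOW else ""
    print(f"{score:8.2f}  {name}{flag}")
```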


Mon Jun 26 09:52:02 PDT 2000
Remade 2ry predictions


Mon Jun 26 16:40:44 PDT 2000 cline
Okay, continuing from campus, where I can *run* all of the tools...
Notes on the alignments:
	1tum-T0090-vit.pw 
		The alignment starts with an n-terminus insert of about 50
		residues.  Then, there's a stretch of 25 residues with two
		identical columns, covering the beta strands.  There's an insert
		of 5-6 residues; we're at the surface of the structure here,
		and there seems to be plenty of room.  Then, there's a 
		high-identity stretch of about 60 residues, covering a helix
		and binding with a metal ion.  Another insert, again at an exposed
		portion of the structure.  Then, there's a short aligned region
		(20 columns, 2 identical), followed by a large c-terminus insert.
		The last helix in the structure is not aligned to: the alignment
		ends with a gap of about 20 residues.

	1tum-T0090-local.pw
		same as above, except for two minor changes.  The C-terminal
		gap starts about four residues later, at the start of the helix
		rather than just before the start.  At the n-terminal, the first 
		residue or two is not aligned.  Other than that, the alignments
		are identical.

	1tum-T0090-global.pw
		Most of the alignment is the same as the top one.  Differences
		are as follows.  The c-terminal helix is aligned to, though the
		alignment of the helix shows no identical columns and several
		unlikely substitutions.  I'd be inclined to go with the local
		alignment, or check the posterior decoding cost in the helix.
		The other difference is that rather than starting with a long
		n-terminal gap, the alignment starts with one residue
		aligned followed by a long interior gap.  Again, the viterbi
		alignment is probably right.

	1tum-T0090-fssp-global.pw
		Quite a bit different from the top alignment, but looks to be
		much higher in identity.  However, it's also got a lot more
		internal gaps, which is probably how it has such high identity
		and such a low score.  Anyway, here are the differences.

		At the N-terminal, it starts with 5 columns aligned (2 identical),
		then has an insert of 10 columns.  Then, there's about 10
		columns aligned (4 identical), followed by an insert of about 30
		characters.  Then, there's another 10 aligned (5 identical), followed
		by an insert of about 30 characters.  These inserts are at the
		surface but very close to secondary structural elements, which makes
		this whole alignment region a bit shady.

		From residue 26 or 27 in 1tum, they align the same for a while,
		up to guide sequence residue 70.  Then, the fssp alignment
		inserts 5 residues right before the start of a strand (right on
		the surface, plenty of room to grow), aligns 5 residues with
		one identical, inserts 20 residues (again, plenty of room there
		on the surface), and then aligns everything except for the first
		turn in the last helix.  The region covering that last alignment
		segment seems high in identity, and could be a good alternative to
		the end of the other alignment - if its posterior decoding cost
		is reasonable.

	noseed-1tum-T0090-global.pw
		Starts off like the viterbi alignment.  Very minor differences
		up to about template column 75.  Then, there's a two-residue insert
		where a P and G are stuck into the middle of a beta strand.  Since
		there's a beta turn right after this, right where the other alignment
		had them, I like the other alignment better here.  Then, there's
		about 8 residues and a 4-residue insert at a surface loop.  The 
		viterbi alignment puts a big insert about three positions later.  
		After that, they agree.  

		Based on what I see, the viterbi still looks better.

	noseed-1tum-T0090-local.pw

		Just like the previous alignment, except it stops the alignment
		not at the c-terminal helix but at the segment of loop immediately
		adjacent to that helix.  Before I say if this is a good thing, I'd
		like to see the posterior decoding cost in this region.

	tuneup-1tum-T0090-global.pw
		Only minor differences from the viterbi alignment until about template
		column 95.  Then, where the viterbi has a short insert, this one
		keeps going for a while, with a small insert in a position where 
		there's a surface beta turn in the structure - and plenty of room to 
		grow.  To the end of the alignment, it's shifted from the viterbi
		alignment by 4 residues.  Neither alignment looks clearly better.

	tuneup-1tum-T0090-local.pw
		Almost identical to the global version (above).  Just missing a
		couple positions at the very end.
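
The eyeball comparisons above (aligned columns, identical columns, and
how far one alignment is shifted from another) can be sketched in
Python.  This assumes each pairwise alignment is available as two rows
of a dotted A2M file (uppercase = residue in a match column, "-" =
deletion, lowercase = insertion, "." = padding), so that both rows have
the same length.  It is not the measure_shift algorithm, just a
per-residue difference in template index.  The toy rows are
hypothetical and are not taken from any T0090 file.

```python
def aligned_columns(target_row, template_row):
    """List (target_idx, template_idx, target_res, template_res) pairs.

    Both rows must come from the same dotted A2M alignment, so they have
    equal length.  Indices are 0-based within each ungapped sequence.
    """
    if len(target_row) != len(template_row):
        raise ValueError("rows must have equal length")
    columns = []
    t_idx = s_idx = 0
    for t_char, s_char in zip(target_row, template_row):
        # Only columns with an uppercase residue in both rows are aligned.
        if t_char.isupper() and s_char.isupper():
            columns.append((t_idx, s_idx, t_char, s_char))
        if t_char.isalpha():
            t_idx += 1
        if s_char.isalpha():
            s_idx += 1
    return columns


def identity(columns):
    """Return (identical, aligned) column counts."""
    return sum(t == s for _, _, t, s in columns), len(columns)


def shifts(columns_a, columns_b):
    """Template-index difference (b - a) for target residues aligned in both."""
    map_a = {t: s for t, s, _, _ in columns_a}
    map_b = {t: s for t, s, _, _ in columns_b}
    return {t: map_b[t] - map_a[t] for t in map_a if t in map_b}


# Hypothetical toy rows: the same two sequences aligned with and without gaps.
gapped = aligned_columns("ACDEFG-HIK", "ACD-FGWHLK")
ungapped = aligned_columns("ACDEFGHIK", "ACDFGWHLK")
nonzero_shifts = {t: d for t, d in shifts(gapped, ungapped).items() if d}
print(identity(gapped), nonzero_shifts)
```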

Wed Jun 28 08:48:08 PDT 2000	cline
	Now, I'm setting out to take a look at some of the alignments from above
	in terms of their posterior decoding column cost.  The commands executed
	are shown, and all commands were executed in the directory 
	pce/casp4/1tum/cline

	1tum-T0090-vit.pw 
		rebuild-align ../../T0090.seq 1tum/nostruct-align/1tum.t2k-w0.5.mod \
					1tum/nostruct-align/1tum.t2k.a2m.gz test -viterbi 1 -sw 2
		gzip test.a2m
		(note: here I used measure_shift to verify that in test.a2m.gz I
		 reproduced the alignment of 1tum-T0090-vit.pw.a2m.gz.  I did.)
		build-trimming-info -align test.a2m.gz -target T0090 \
				> 1tum-T0090-vit.pw.pdoccost
	
		On inspection of 1tum-T0090-vit.pw.pdoccost, all posteriors are very
		strong.  The only two that are marginal are for the first two residues
		aligned.

	1tum-T0090-local.pw
		rebuild-align ../../T0090.seq 1tum/nostruct-align/1tum.t2k-w0.5.mod \
				1tum/nostruct-align/1tum.t2k.a2m.gz test -adpstyle 5 -sw 2
		gzip test.a2m
		build-trimming-info -align test.a2m.gz -target T0090 \
				> 1tum-T0090-local.pw.pdoccost
		(used measure_shift to test that we rebuilt the proper alignment)

		The only position that's possibly questionable is the first aligned,
		with a cost of 1.08.  Second and third are slightly on the high side.
		All this means is that if there's an unreliable piece of this
		alignment, that's it.

	1tum-T0090-global.pw
		rebuild-align ../../T0090.seq 1tum/nostruct-align/1tum.t2k-w0.5.mod \
			1tum/nostruct-align/1tum.t2k.a2m.gz test -adpstyle 5 -sw 0
		gzip test.a2m
		measure_shift -r test.a2m.gz -c ../1tum-T0090-global.pw.a2m.gz \
			-ta T0090 -te 1tum 
		(note: checked out fine)
		build-trimming-info -align test.a2m.gz -target T0090 \
				> 1tum-T0090-global.pw.pdoccost

		Here, the shakiest parts of the alignment are the first 2-3 positions
		at the beginning and the last 2-3 positions at the end.  The rest looks
		excellent.

	1tum-T0090-fssp-global.pw
		rebuild-align ../../T0090.seq 1tum/struct-align/1tum.fssp-w0.5.mod \
			1tum/struct-align/1tum.fssp.a2m.gz test -adpstyle 5 -sw 0
		gzip test.a2m
		measure_shift -r test.a2m.gz -c ../1tum-T0090-fssp-global.pw.a2m.gz \
			-ta T0090 -te 1tum
		(checks out)
		build-trimming-info -align test.a2m.gz -target T0090 \
			> 1tum-T0090-fssp-global.pw.pdoccost
	
		This one is quite interesting!  There are many positions with 
		suspiciously high pdoc costs.  The beginning looks mostly like 
		crap.  In the middle of the alignment, where it looks okay, it's
		basically the same alignment as all of the above.  Then, there's a
		four-position gap, and a region that definitely looks like crap.
		Then, at the end of the alignment, the region scores pretty well with
		the following exceptions: the first position aligned doesn't look
		great, and the first 5 of the last 10 positions look shaky (the last
		5 positions look much better).

	tried out a new alignment: 1tum-T0090-fssp-global-fw0.5.a2m.gz
		rebuild-align ../../T0090.seq 1tum/struct-align/1tum.fssp-fw0.5.mod \
			1tum/struct-align/1tum.fssp.a2m.gz 1tum-T0090-fssp-global-fw0.5 \
			-adpstyle 5 -sw 0
		build-trimming-info -align 1tum-T0090-fssp-global-fw0.5.a2m.gz \
			-target T0090 > 1tum-T0090-fssp-global-fw0.5.pdoccost

		The alignment is similar to, and not obviously better than the last 
		one.  
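
The inspections above boil down to finding runs of aligned positions
whose posterior decoding cost is high.  A minimal Python sketch,
assuming the per-position costs have already been read into a list
(the .pdoccost file layout is not described here, so no parser is
attempted).  The threshold of 1.0 is a hypothetical value, chosen only
because a cost of 1.08 was called "possibly questionable" above.

```python
def shaky_runs(costs, threshold=1.0):
    """Return (first, last) index pairs for runs with cost >= threshold.

    `costs` is one posterior-decoding cost per aligned position, in
    alignment order.  `threshold` is a hypothetical cutoff.
    """
    runs = []
    start = None
    for i, cost in enumerate(costs):
        if cost >= threshold:
            if start is None:
                start = i  # a new shaky run begins here
        elif start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(costs) - 1))
    return runs


# Hypothetical costs, not copied from any .pdoccost file.
example_costs = [1.08, 0.7, 0.1, 0.05, 0.1, 1.4, 1.9, 0.2]
print(shaky_runs(example_costs))
```

Runs that touch either end of the alignment are the natural candidates
for trimming; interior runs point at regions to realign by hand.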

A look at the fssp alignment makes it clear why the sequence-based
alignments for this template are looking so much better than the
structure-based alignment.  The FSSP alignment for 1tum contains only
two sequences, 1tum and 1mut, and these sequences are nearly
identical.  Their structural alignment is no different than their
sequence alignment.  So, the fssp alignment really has only one
sequence.  The sequence-based alignment has a lot more information: it
has 1mut, plus it has many more homologs.  So, even though the idea of
aligning to that last helix (as shown in the fssp-based alignments) is
interesting, it's really not founded on much information.  
		
	
Wed Jun 28 15:00:06 PDT 2000 cline

Here's the gist of what Kevin, Christian, and I discussed about this
target.  We like 1tum as the fold (so far so good).  The non-fssp
alignment looks good for the beginning, but we don't like that there are
two beta strands in the second half where it doesn't get anything good.

The fssp-based alignment, even though it's not based on many
sequences, has some good features: greater identity around the active
site, stronger signal in the second half of the alignment.  However,
the first half of the alignment doesn't look so good.

In terms of consistency with the secondary structure prediction:
	There's a strong prediction of a strand from residues 59 to 66.
	In the FSSP alignment (for which we like the second half but not
	the first), it ends up in a loop.  In the 1tum-T0090-local alignment,
	(for which we like the first half but not the second), it looks great;
	it covers a loop that's next to a beta strand, and looks credibly like
	it might be a strand itself.  The 1tum secondary structure string
	shows a short strand there.

	Another beta strand is predicted with very high probability from 
	residues 72 to 77.  This is in the segment of the sequence following
	shortly after the last predicted strand.  The fssp alignment doesn't 
	align these residues. The t2k alignment aligns it to a part of the 
	structure where the rasmol window shows a loop but the secondary
	structure string shows a short strand.  The structure even puts something
	in between these two segments that could plausibly pose as a beta turn.

	Very strong helix prediction from residues 105 to 117.  Both alignments
	put the helix in this region.

	Strong strand prediction from residues 141 to 147. FSSP does not align
	this region.  T2K aligns it to a strand.

All this supports the T2K alignment at the beginning (up to about 147), FSSP
at the end.  In addition, both alignments have an insert at around residue
150; FSSP puts that in the middle of a strand, while T2K puts it at a beta
turn (much better spot).
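
The idea of taking one alignment up to a crossover and the other after
it can be sketched in Python.  This assumes each alignment has been
reduced to a map from target residue number to template residue number.
The crossover of 147 comes from the paragraph above; the toy maps are
hypothetical.  The order check matters because a splice can pair a
later target residue with an earlier template residue.

```python
def splice(first, second, last_from_first):
    """Use `first` for target residues <= last_from_first, `second` after."""
    hybrid = {t: s for t, s in first.items() if t <= last_from_first}
    hybrid.update({t: s for t, s in second.items() if t > last_from_first})
    return hybrid


def is_order_preserving(pairs):
    """True if template indices strictly increase with target indices."""
    template = [pairs[t] for t in sorted(pairs)]
    return all(a < b for a, b in zip(template, template[1:]))


# Hypothetical residue maps (target residue -> template residue).
t2k_like = {145: 100, 146: 101, 147: 102, 148: 103}
fssp_like = {146: 99, 148: 104, 149: 105}
fssp_conflict = {148: 101}

good = splice(t2k_like, fssp_like, 147)
bad = splice(t2k_like, fssp_conflict, 147)
print(good, is_order_preserving(good))
print(bad, is_order_preserving(bad))
```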

To try to combine the best features of the fssp and t2k alignment, we created
1tum-T0090-edited.a2m from 1tum-T0090-fssp-global.a2m.  Next, the goal is to
make sure it's consistent with the secondary structure prediction.

	Contradictions and consistencies:
		strand prediction in residues 59-66: mostly not aligned.
		But, the insert comes at a good spot in the structure.

		Strand prediction in residues 72-77: not aligned.

		Perhaps the other alignment is better for this section.  Saira?

		Strong helix prediction from 105-117 where all alignments put 
		a helix.  This region is nailed down.

		strand prediction at 141-148: covers a strand.
		
		helix prediction at 189-199: not aligned.  Had been aligned in
		the automated alignment, but we had a few datapoints against this
		region and figured it was an artifact of global alignment.
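
The checklist above can be tabulated with a small Python sketch.  The
predicted segments are the ones listed above.  The residue map and the
template secondary-structure string are hypothetical stand-ins, since
neither the edited alignment nor the 1tum string is reproduced here.
Target residues use the numbering of these notes; template positions
are 0-based indices into the string.

```python
# Predicted segments from the list above: (first, last, type).
PREDICTED = [
    (59, 66, "E"),
    (72, 77, "E"),
    (105, 117, "H"),
    (141, 148, "E"),
    (189, 199, "H"),
]


def segment_report(segments, pairs, template_ss):
    """For each segment return (first, last, type, n_aligned, n_agree).

    n_aligned counts residues present in `pairs`; n_agree counts those
    whose template position has the predicted secondary-structure type.
    """
    report = []
    for first, last, kind in segments:
        aligned = [r for r in range(first, last + 1) if r in pairs]
        agree = [r for r in aligned if template_ss[pairs[r]] == kind]
        report.append((first, last, kind, len(aligned), len(agree)))
    return report


# Hypothetical inputs: only residues 59-66 are aligned, all onto a strand.
toy_pairs = {r: r - 50 for r in range(59, 67)}
toy_template_ss = "C" * 9 + "E" * 8 + "C" * 13

report = segment_report(PREDICTED, toy_pairs, toy_template_ss)
for row in report:
    print(row)
```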


Wed Jun 28 22:03:19 PDT 2000 Kevin Karplus

The region we are uncertain about is

T0090	VKRTKPVLSFLASPGGTSERSSIMVGEVDATTASGIHGLA	
	CCEEEEEEEEECCCCCCCCEEEEEEEEECCCCCCCCCCCC
which we want to align with 
1tum		PQHFSLFEKLEYEFP
STRIDE		CCEEEEEEECCEETT
DSSP		LBLLEEEEEELLBLS
	
We can try to align with the first strand, leaving no gap at the beginning

T0090	VKRTKPVLSFLASPGGTSERSSIMVGEVDATTASGIHGLA	
	CCEEEEEEEEECCCCCCCCEEEEEEEEECCCCCCCCCCCC
1tum	PQHFSLFEKLEYEFP
STRIDE	CCEEEEEEECCEETT
DSSP	LBLLEEEEEELLBLS

or we can try to align the second strand

T0090	VKRTKPVLSFLASPGGTSERSSIMVGEVDATTASGIHGLA	
	CCEEEEEEEEECCCCCCCCEEEEEEEEECCCCCCCCCCCC
1tum			  PQHFSLFEKLEYEFP
STRIDE		          CCEEEEEEECCEETT
DSSP			  LBLLEEEEEELLBLS

Neither alignment strikes me immediately as more correct.
The second-strand alignment could be moved back 1-3 positions without
being any worse, so these possible alignments should be looked at.

Thu Jun 29 10:22:48 PDT 2000 Kevin Karplus

The alignment in 1tum/1tum-T0090-edited is probably the best for the
second-strand alignment:

T0090	VKRTKPVLSFLASPGGTSERSSIMVGEVDATTASGIHGLA	
	CCEEEEEEEEECCCCCCCCEEEEEEEEECCCCCCCCCCCC
			    : .   :..
1tum			PQHFSLFEKLEYEFP
STRIDE		        CCEEEEEEECCEETT
DSSP			LBLLEEEEEELLBLS

since it conserves two residues and has some other reasonable substitutions.

Aligning to the first strand gets no conserved residues and only one
or two reasonable substitutions.
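
The identity counts for the candidate placements can be checked with a
short Python sketch that slides the 1tum segment along the T0090 region
without gaps.  The two sequences are copied from the layouts above.
Reading the layouts with 8-column tabs (an assumption), offset 0 is the
first-strand placement and offset 16 is the 1tum-T0090-edited
placement.  Only exact identities are counted; no substitution matrix
is applied, so "reasonable substitutions" are not scored.

```python
T0090_REGION = "VKRTKPVLSFLASPGGTSERSSIMVGEVDATTASGIHGLA"
TUM_SEGMENT = "PQHFSLFEKLEYEFP"


def identities_at(offset, target=T0090_REGION, template=TUM_SEGMENT):
    """Count identical pairs when the template starts at target[offset]."""
    window = target[offset:offset + len(template)]
    return sum(a == b for a, b in zip(window, template))


# Every ungapped placement that keeps the whole 1tum segment inside the region.
offsets = range(len(T0090_REGION) - len(TUM_SEGMENT) + 1)
identity_by_offset = {off: identities_at(off) for off in offsets}

for off, count in identity_by_offset.items():
    print(f"offset {off:2d}: {count} identical")
```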