Thu Jun 22 11:40:02 PDT 2006
T0343
Make started Thu Jun 22 11:41:02 PDT 2006
Running on camano.cse.ucsc.edu

Thu Jun 22 11:48:57 PDT 2006 Kevin Karplus

Crystallographers tell us this forms a DIMER and that there are no SS
bonds.
We need to do multimer submission as well as monomer.

BLAST gets no good hits in PDB, but there are some long fragments.

ORFan!!

Thu Jun 22 15:36:30 PDT 2006 Kevin Karplus

No good hits with the HMMs either.  Best hit is no better than chance:
1p90A with E-value 6.

Fri Jun 23 05:25:45 PDT 2006 Kevin Karplus

The top alignments are not in complete agreement about the structure,
though all have a helix against a small sheet.

The try1-opt2 model seems to be based on alignment to 1p90A (judging
from the log file), but the superposition seems best with the
alignment to 2bkrA---this may be a difference in which alignment was
chosen, more than a difference in the templates.

The try1-opt2 model is plausible, but I think that this target will
need more work.  Its being an ORFan makes all the usual predictions
(local structure, residue-residue contacts, HMM search, ...) rather
suspect.

Fri Jun 30 16:29:15 PDT 2006 George Shackelford

I'm going for the Usual Suspects from 'alphabetmatch' looking at both
str2 and near-backbone. Since the local structure predictions are
suspect, these selections are likely wrong. But it's a start.

id      score   per residue
5S      10N     10N
1irw    155.103 1.49137
1ycc    154.794 1.4884
2b4zA   152.795 1.46918
1ccr    152.642 1.46772
2b10B   152.167 1.46315
1ytc    152.005 1.46159
2b11B   151.925 1.46081
5cytR   151.61  1.45779
1ql3A   149.436 1.43688
1b0nA   149.166 1.43429
1co6A   149.085 1.43351
1ppjF   148.806 1.43082
1bkrA   148.526 1.42814
1aa2    148.249 1.42547
2b0zB   148.232 1.4253
1i8oA   148.176 1.42477
1k3sA   147.69  1.4201
3c2c    147.557 1.41882
1bccF   147.497 1.41824
1qn2A   147.265 1.41601

I've run extra-alignments and read-alignments.
The usual approach is to split these ids into two groups of ten and
build two tries with them as included read-alignments-scwrl.
I have done so with try2 and try3.
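
A minimal sketch of the ten-and-ten split, assuming the groups are taken in
rank order (best ten, then the next ten). The helper name split_into_groups
is hypothetical and is not part of alphabetmatch or the make targets.

```python
# Hypothetical helper, not part of alphabetmatch: split ranked template ids
# into consecutive groups, one group per undertaker run.
def split_into_groups(ids, size=10):
    """Return consecutive slices of `ids`, each at most `size` long."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]

# Top-20 ids from the str2/near-backbone search above, in rank order.
str2_hits = """1irw 1ycc 2b4zA 1ccr 2b10B 1ytc 2b11B 5cytR 1ql3A 1b0nA
1co6A 1ppjF 1bkrA 1aa2 2b0zB 1i8oA 1k3sA 3c2c 1bccF 1qn2A""".split()

first_ten, second_ten = split_into_groups(str2_hits)
print("group 1:", " ".join(first_ten))
print("group 2:", " ".join(second_ten))
```

Each group would then go into its own list of included read-alignments-scwrl
files.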

I made an error in try2 that included reading all fragments. If the
results are useful then so be it. I have corrected the issue and try4
represents the best ten above.

Sat Jul  1 17:44:13 PDT 2006 George Shackelford

I looked over the results this morning and found that try2 with its
faulty include and the initial try1 scored better than try3 and try4.
This is not encouraging but I did do my search using str2,near for
matching. I've decided to do a new search using ehl2 as before. This
did require some software changes to my search code. I have completed
them and the results are as follows:
# program: alphabetmatch
# George Shackelford
#
# Target: T0343
# length: 104
# length range: 96 to 114
# alphabets used:
#   ehl2
#
id      score   per residue
5S      10N     10N
1k3sA   183.671 1.76607
2irfG   178.011 1.71165
1kte    177.686 1.70852
1awcA   177.241 1.70424
1kafA   176.444 1.69657
1gmxA   176.313 1.69532
1ihnA   175.811 1.69049
1d0qA   175.474 1.68725
1dcoA   174.269 1.67567
1dw0A   173.71  1.67029
1f9mA   173.503 1.6683
1iibA   173.441 1.6677
1ytc    173.418 1.66748
1l8rA   173.412 1.66742
1ccr    173.118 1.66459
1vcbB   173.032 1.66377
1g2rA   172.703 1.66061
1mfiA   172.482 1.65848
1gxqA   172.423 1.65791
1i8oA   172.326 1.65698
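
The "per residue" column appears to be the raw score divided by the target
length (104, from the header). A quick check on a few rows copied from the
table, allowing for the raw score having been rounded before printing:

```python
TARGET_LENGTH = 104  # "length: 104" in the alphabetmatch header

# (id, score, listed per-residue value) copied from the ehl2 table above.
rows = [
    ("1k3sA", 183.671, 1.76607),
    ("2irfG", 178.011, 1.71165),
    ("1kte", 177.686, 1.70852),
    ("1i8oA", 172.326, 1.65698),
]

def per_residue(score, length=TARGET_LENGTH):
    """Score normalized by target length (assumed meaning of the column)."""
    return score / length

for ident, score, listed in rows:
    computed = per_residue(score)
    print(f"{ident}: computed {computed:.5f}, listed {listed}")
```

Since every row is divided by the same length, the per-residue ranking is the
same as the raw-score ranking for a single target.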

 1k3sA 2irfG 1awcA 1d0qA 1mslA 1ytc 1g2rA 1gh6A 1hulA 1dcoA 1kte 1ccr 1ycc
1l8rA 1gxqA 1f9mA 1irw 1gmxA 1dw0A 1co6A

Ok, we repeat our usual approach of ten and ten. Now we do
try5 and try6.

Sat Jul  1 22:30:10 PDT 2006 George Shackelford

try5 looks unreal, like natural bridges. The weights for wet/dry need to
be reset. Try6 is better but way too foamy. I'm going to take out 1kafA,
the basis for try5 and rerun with normal wet/dry weights as try7. I want
something more packed. I'm going to do the same for try6, removing 1mfiA,
to get try8.

Essentially try7 and try8 are fresh.

try7 and try8 running on shaw

Sun Jul  2 09:37:22 PDT 2006 George Shackelford

Try7 and try8 look better. Try8 scores well. I note that the target is a
dimer - that makes a difference with burial exposure.

That makes me wonder if the current definition of "burial" needs to take
the dimer configuration into account. The interface appears on the surface
but its composition seems to match the interior. This can confound the
predictions. Perhaps we need a different region representing the interface
which would help the cost and prediction of other dimer interfaces.

With burial in mind, I may try to do a match using ehl2 and burial. The
burial alphabet represents a scalar. There may be a better way to
calculate the cost that incorporates that. I don't have any insight other
than mean and variance. Furthermore the sequence representation of the
actual proteins only shows one exact value.
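
One way to use a mean and variance, offered only as a sketch: treat the
predicted burial at each position as a Gaussian and charge the negative
log-likelihood of the single observed value from the template. This is not
how alphabetmatch scores anything; the function names and the numbers below
are made up for illustration.

```python
import math

def gaussian_cost(observed, mean, variance):
    """Negative log-likelihood (in nats) of `observed` under N(mean, variance)."""
    return 0.5 * math.log(2 * math.pi * variance) + (observed - mean) ** 2 / (2 * variance)

def burial_cost(observed_values, predicted):
    """Sum per-position costs; `predicted` is a list of (mean, variance) pairs."""
    if len(observed_values) != len(predicted):
        raise ValueError("observed and predicted must be aligned position by position")
    return sum(gaussian_cost(x, m, v) for x, (m, v) in zip(observed_values, predicted))

# Made-up predictions for three positions: (mean burial, variance).
predicted = [(10.0, 4.0), (25.0, 9.0), (5.0, 1.0)]

close = burial_cost([10.0, 25.0, 5.0], predicted)   # template matches the means
far = burial_cost([20.0, 10.0, 12.0], predicted)    # template far from the means
print(f"close match cost {close:.3f}, poor match cost {far:.3f}")
```

Note the sign convention: this is a cost (lower is better), whereas the
alphabetmatch scores in the tables are higher-is-better, so it would need
negating before being combined with them.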

# program: alphabetmatch
# George Shackelford
#
# Target: T0343
# length: 104
# length range: 96 to 114
# alphabets used:
#   ehl2 burial
#
id      score   per residue
5S      10N     10N
1k3sA   211.902 2.03752
2irfG   210.828 2.02719
1awcA   205.825 1.97908
1d0qA   205.1   1.97212
1mslA   204.607 1.96737
1ytc    203.683 1.95849
1g2rA   203.385 1.95562
1gh6A   203.224 1.95408
1hulA   202.849 1.95047
1dcoA   202.575 1.94784
1kte    202.372 1.94588
1ccr    201.998 1.94229
1ycc    201.988 1.94219
1l8rA   201.292 1.9355
1gxqA   200.293 1.92589
1f9mA   200.1   1.92404
1irw    199.806 1.92121
1gmxA   199.422 1.91752
1dw0A   199.4   1.91731
1co6A   199.184 1.91523

Interesting. The scores are higher with the new value included. I see many
repeats (including the top two) and three from the first list using str2
(1irw, 1ycc, 1co6A) that did not appear on the ehl2 list.
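
The comparison between the three top-20 lists can be checked with plain set
arithmetic. The ids are copied from the tables in this file; the variable
names are only for illustration.

```python
# Top-20 ids from each alphabetmatch run, copied from the tables above.
str2 = set("""1irw 1ycc 2b4zA 1ccr 2b10B 1ytc 2b11B 5cytR 1ql3A 1b0nA
1co6A 1ppjF 1bkrA 1aa2 2b0zB 1i8oA 1k3sA 3c2c 1bccF 1qn2A""".split())

ehl2 = set("""1k3sA 2irfG 1kte 1awcA 1kafA 1gmxA 1ihnA 1d0qA 1dcoA 1dw0A
1f9mA 1iibA 1ytc 1l8rA 1ccr 1vcbB 1g2rA 1mfiA 1gxqA 1i8oA""".split())

ehl2_burial = set("""1k3sA 2irfG 1awcA 1d0qA 1mslA 1ytc 1g2rA 1gh6A 1hulA 1dcoA
1kte 1ccr 1ycc 1l8rA 1gxqA 1f9mA 1irw 1gmxA 1dw0A 1co6A""".split())

repeats = ehl2 & ehl2_burial          # on both ehl2-based lists
new_with_burial = ehl2_burial - ehl2  # only appear once burial is added
back_from_str2 = new_with_burial & str2

print(len(repeats), "repeats")
print("new with burial:", sorted(new_with_burial))
print("of those, also on the str2 list:", sorted(back_from_str2))
```

On the lists as transcribed here, 14 ids repeat, and the ids that return from
the str2 list are 1co6A, 1irw, and 1ycc.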

I am wondering why the top two score well but don't figure as templates
for models. I need to look at the actual structures.

I took a look at 1k3sA. I like it. I'm going to do a run with just 1k3sA
included and see what I get. I'm going to turn constraints back down to 10
so they don't interfere too much. This is a test.

try9 running on shaw

I'm monitoring the best scores. Breaks started out quite bad ~.5 but
dropped quickly to almost zero.

Sun Jul  2 22:42:00 PDT 2006 George Shackelford

try9 seems to be trying to form a sheet but fails. If it had succeeded in
forming the sheet, the score would have been the best. I followed the pool
backwards and found (as far as I can understand) that the basis was a
small local alignment with 1k3sA or that and a global alignment. I could
try and force a sheet between the two strands (starting with 42-46 rather
than the 42-47 which makes a bad hairpin), but it appears that I may as
well enforce a sheet that involves a third strand and see if I can get
some kind of match with something in this batch. If I can, I'm going to
work on that.

I'd like to see if these matches come from some common family ala SCOP or
CATH.

Wed Jul 12 13:01:01 PDT 2006 George Shackelford

Request Id:  1093721343867021157,   	 chain A 
We need to form dimers from what we have. I took the try1-opt2.pdb and
used VAST to find the following neighbors:
1xebA -- dimer
1gheB
2yreB
2yk3F
2vhsA
2vkcB

Looking at 1xebA, it forms a sandwich. It makes me think that the other
models might actually form decent dimers as well. I need to do some more
searches.

using try5 as well.
Request Id:  307886608391903087,   	 chain A

Wed Jul 12 20:25:19 PDT 2006 George Shackelford

So far I have:
dimer-try1-1xeb.pdb
dimer-try5-1kaf.pdb

need to put in super-impose.

let's get try2
Your VAST Search job was submitted at 07/12/2006 23:33:23(EDT). Request
ID: 1106464373513935690
Damn. 4 hits and none are dimers.
I'm going to retry searching all PDB.
Your VAST Search job was submitted at 07/12/2006 23:42:57(EDT). Request
ID: 1094361438954204740
Expected to take 30 minutes.
I think I'll not get this one in...

Approaching 9pm. I'm going to work some more on try2, but only two
submissions for now

Wed Jul 12 21:12:00 PDT 2006 Kevin Karplus

I don't see any list of monomers to submit here with notes about where
they came from!

superimpose-best.under has only the dimers (which belong in
dimer/superimpose-best.under, not in the monomer directory)

I'm going to have to guess that George intended the commented-out
order for the monomers:
 ReadConformPDB T0343.try2-opt2.pdb
 ReadConformPDB T0343.try5-opt2.pdb
 ReadConformPDB T0343.try1-opt2.pdb
 ReadConformPDB T0343.try8-opt2.pdb
 ReadConformPDB T0343.try7-opt2.pdb

I'll move try1 ahead of try5

For some reason, George has not been generating the
gromacs0.repack-nonPC files (related to his not generating
grep-best-rosetta files?). He has one for try10, but not for earlier
runs.

Wed Jul 12 21:29:42 PDT 2006 Kevin Karplus

I did make T0343.do$x for 1 through 9, and looked at grep-best-rosetta.
Rosetta likes best try3, try2, try5, try6, try1, try4, try8, try7 (all
gromacs0.repack-nonPC models).

The try1 costfcn likes best try2, try1, try5, try8, try10, try7
Unconstrained likes best try2, try5, try1, try8, try7 (the source for
George's order?)

I made a "secondary.costfcn" which is like unconstrained, but has
constraints from just the dssp-ehl2 constraints.
It orders the models try2, try8, try5, try7, try1, try3, try6

I don't have time to look at all the models now, but it seems like the
5 George left in the comments are a reasonable set, so I'll content
myself with moving try1 up one notch.

The dimers George made have not been optimized in a dimer context, but
they hardly interact, so are unlikely to be correct dimers from the
crystal.
I'll submit them anyway, though I think they are still junk.


Wed Jul 12 21:47:59 PDT 2006 Kevin Karplus

Monomers
 ReadConformPDB T0343.try2-opt2.pdb
 ReadConformPDB T0343.try1-opt2.pdb
 ReadConformPDB T0343.try5-opt2.pdb
 ReadConformPDB T0343.try8-opt2.pdb
 ReadConformPDB T0343.try7-opt2.pdb
submitted with comment


    For this preliminary submission, I do not really know how the models
    were generated or selected, as no concise summary was generated, and
    the notes in the lab notebook README file were insufficient for me to
    figure out what was done.

    George apparently used an undocumented program "alphabetmatch" that
    does something to generate more distant fold recognition templates.
    Other than that, I can't figure out what he did.

    Model 1 is try2-opt2

    Model 2 is try1-opt2, the fully automatic method.

    Model 3 is try5-opt2

    Model 4 is try8-opt2

    Model 5 is try7-opt2

--------------------------------------------------
Wed Jul 12 21:56:39 PDT 2006 Kevin Karplus

The two initial dimers submitted.

Thu Jul 13 00:25:18 PDT 2006 George Shackelford

Unfortunately I did not know that submission of the monomers was required
along with the dimers. As a result, I made no list of the five best
monomers. Kevin correctly guessed that the five monomers that were
commented out were the ones I would have suggested as monomer submissions.
He also guessed correctly that they were based on unconstrained scoring. I
agree to his moving try1 up a notch; it looks better than those below.

Had I known to submit the monomers as well, I would have also provided a
history and explanation of their source.

Fri Jul 14 12:12:36 PDT 2006 George Shackelford

This is due on Sunday noon. Time to make the rest of the dimers. I
have dimers for Model 2 / try1 and Model 3 / try5.

For try2, I am using VAST results 1094361438954204740. I have found that
1e80 chains B and D may dimerize. I'm going to try them and see what I
get.

The result is a dimer but the chains are too far apart. I am going to try
1vm6 which is further down the list of the p-value sorted VAST hits. 

Ouch! Too close. 1vm6A covers only the first domain in T0343. I think I
need one that touches on both. I'm going to return to this later when I've
been briefed on using Protein Shop to align for dimers. The 1vm6 alignment
is a good place to start.

Focusing on the last two. try8 is next.
Your VAST Search job was submitted at 07/14/2006 17:00:28(EDT). Request
ID: 77855718860279234

May as well start a job for try7.
Your VAST Search job was submitted at 07/14/2006 17:03:12(EDT). Request
ID: 457155349145852532

Fri Jul 14 23:31:05 PDT 2006 George Shackelford

Earlier this afternoon Chris helped me get a handle on doing dimers.
One interesting note: he had stated that you couldn't load the two
chains from one file but had to load two different files. I found that
the aligned version of try2 could be loaded and the two chains could
be manipulated separately without the troubles he had experienced. I
think that Chris has been working with those dimer files that treat
the dimer as one long chain; I had two chains already. When you have
two chains, you find you have to use the selection popup to save to
two different files. The two files can be combined but simply
renumbering won't get the job done. I think I'll write a renumber and
label chains program so as to bypass having to use undertaker and
specify where the break takes place (remember I actually have two
chains). This may only be useful where we're not doing a lot of
polishing of placement; frankly I have no idea if my dimer alignments
are even close.

[Sat Jul 15 09:25:15 PDT 2006 Kevin Karplus
   There is already a script for making a single chain into multiple
   chains and a make target for using it.
   
   If you have a multimer foo.pdb.gz, then "make foo.unpack.pdb.gz"
   will make the multi-chain version, assuming that you have
   MONOMER_LENGTH defined in your Makefile.
   
   You can also call unpack-multimer directly if you need to.
]
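
For reference, a sketch of the renumber-and-relabel idea in Python. This is
NOT the unpack-multimer script (its source is not reproduced here); it only
illustrates turning one long chain into chains A, B, ... given the monomer
length. It assumes residues are numbered consecutively from 1 across all
copies and uses the fixed PDB columns (chain id in column 22, residue number
in columns 23-26). TER records are not handled.

```python
import string

def unpack_multimer(pdb_lines, monomer_length):
    """Relabel a single-chain multimer as chains A, B, ... and renumber
    residues 1..monomer_length within each chain (illustrative only)."""
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            resnum = int(line[22:26])                  # PDB columns 23-26
            copy, offset = divmod(resnum - 1, monomer_length)
            chain = string.ascii_uppercase[copy]       # 0 -> A, 1 -> B, ...
            line = f"{line[:21]}{chain}{offset + 1:4d}{line[26:]}"
        out.append(line)
    return out

def atom_line(serial, resnum, chain="A"):
    """Build a CA-only ATOM record with dummy coordinates for the demo."""
    return (f"ATOM  {serial:5d}  CA  GLY {chain}{resnum:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")

# A two-copy "multimer" of a 104-residue monomer, showing only the chain ends.
packed = [atom_line(i, i) for i in (1, 104, 105, 208)]
for line in unpack_multimer(packed, monomer_length=104):
    print(line)
```

With monomer_length=104, residues 105-208 come out as chain B, residues 1-104.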


the two chains I have aligned in ProteinShop are currently saved as
try2A.pdb and try2B.pdb. I'll be finishing with them on Saturday.

This evening I've been using the VAST matches and trying to solve
try8. I have found a few possible dimers that may work:

1f1c - alignment has way too much space. I could use ProteinShop to close
the gap.
1nnq - alignment is way too close. It has the two overlapping a bit.


Sat Jul 15 09:27:47 PDT 2006 Kevin Karplus

Please have a list of predictions (one for monomers, another for
dimers) by early evening, so that I can do the submissions---I had to
stay up way too late last night waiting for people to write up their
notes, and I can't stay awake that late tonight.

Fix your .cshrc file so that it doesn't use "setpaths"
(use an explicit list: /bin /usr/bin /usr/X11R6/bin /usr/local/bin)
and you can make decoys/grep-best-rosetta without messing up your KDE
initialization. 

Of course, the model rosetta likes best
(decoys/T0343.try3-opt2.gromacs0.repack-nonPC.pdb.gz) is truly
horrible, so this might not be that worthwhile.

I'm assuming that the monomers we submitted as the preliminary tries
are still the ones we want, and that only dimer optimization is being
done now?

Sat Jul 15 15:15:15 PDT 2006 George Shackelford

Yes, I'm just doing dimer optimization now, if you could call it that.
Mostly I'm just trying to get the monomers in reasonable positions.

Continuing to work on try8 I find that:
1pzw - actually consists of one chain though it seems to dimerize! I can't
recall the name of the database that has extra dimerizations. pqb?? No,
PQS. Found it...
I was able to get the dimerized version of 1pzw from PQS, and I gzipped it
as zpzw.pdb.gz and added it to our PDB database. It did succeed in making
try8 into a dimer but the parts overlapped.
1pzw - too close together

[Sat Jul 15 20:13:03 PDT 2006 Kevin Karplus
We would normally put 1pzw.mmol from pqs in the 1pzwA subdirectory
]

Gotta try these next two:
1y1a - not good, but close at least.
2pol - useless

Sat Jul 15 18:20:37 PDT 2006 George Shackelford

I finally decided to use ProteinShop and see if I could adjust
try8-1fic.pdb. The program works on my home computer! Yes! I generated
try8.pdb using it. Not perfect but about as decent as I can get it. I
don't think I'll spend as much time with try7 as I have on try8.

So far I have:
dimer-try1-1xeb.pdb
dimer-try5-1kaf.pdb
try2.pdb
try8.pdb

Now for try7. First I'll get a decent copy by using VAST
457155349145852532.

There are only five matches for the whole chain! hope I can get a dimer to
start with.
2b99 is part of a pentamer. We start with that.

That worked pretty good. The two chains are close (actually a bit too
close, I think) and I can use ProteinShop to finish the positioning.

Totally wrong I think, but try7.pdb is ready. Let's turn it over to Kevin.

The monomers:

 ReadConformPDB T0343.try2-opt2.pdb
 ReadConformPDB T0343.try1-opt2.pdb
 ReadConformPDB T0343.try5-opt2.pdb
 ReadConformPDB T0343.try8-opt2.pdb
 ReadConformPDB T0343.try7-opt2.pdb

The dimers:
I changed the order to try and reflect my opinion of the best dimers:

dimer-try1-1xeb.pdb
try2.pdb
dimer-try5-1kaf.pdb
try8.pdb
try7.pdb

The derivations remain the same.


Sat Jul 15 20:14:35 PDT 2006 Kevin Karplus

OK, the monomers are what we already submitted, so nothing new to do there.

Sat Jul 15 20:23:17 PDT 2006 Kevin Karplus

Dimers submitted with comment

    Dimer models have not been optimized in the dimer context, just
    monomers superimposed on a dimeric PDB file.

    Dimer 1 dimer-try1-1xeb, which superimposes try1-opt2 on 1xeb.

    Dimer 2 is try2, which superimposes monomer try2-opt2 on 1vm6, then
	    adjusted with ProteinShop.

    Dimer 3 dimer-try5-1kaf, which superimposes try5-opt2 on 1kaf.

    Dimer 4 is try8, monomer try8-opt2 superimposed on 1fic then adjusted
	    with ProteinShop.

    Dimer 5 is try7, monomer try7-opt2 superimposed on 2b99, then adjusted
	    with ProteinShop.

    Note: no optimization was done in the dimer context, though it would
    have been had we had more time and more confidence in the monomer
    predictions. 

Sat Jul 15 20:27:24 PDT 2006 Kevin Karplus

I looked at the dimer models and they are all far too loose to have any
hope of being right---slight clashes would have been preferable to
such loose associations.
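
A rough way to put numbers on "too loose" versus "clashing", as a sketch
only: count CA-CA pairs between the two chains within a contact cutoff and
below a clash cutoff. The 8.0 and 3.0 Angstrom cutoffs and the function names
are assumptions for illustration, not anything undertaker or ProteinShop
reports.

```python
import math

def parse_ca(pdb_lines):
    """Collect CA coordinates per chain id from ATOM records."""
    chains = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            chains.setdefault(line[21], []).append(xyz)
    return chains

def interface_summary(chain_a, chain_b, contact=8.0, clash=3.0):
    """Minimum CA-CA distance plus counts of contacts and clashes."""
    dists = [math.dist(p, q) for p in chain_a for q in chain_b]
    return {
        "min": min(dists),
        "contacts": sum(d <= contact for d in dists),
        "clashes": sum(d < clash for d in dists),
    }

def ca_line(serial, chain, resnum, x, y, z):
    """Build a CA-only ATOM record for the synthetic demo below."""
    return (f"ATOM  {serial:5d}  CA  GLY {chain}{resnum:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

def toy_dimer(gap):
    """Two 2-residue chains separated by `gap` Angstroms along y."""
    return [ca_line(1, "A", 1, 0.0, 0.0, 0.0), ca_line(2, "A", 2, 3.8, 0.0, 0.0),
            ca_line(3, "B", 1, 0.0, gap, 0.0), ca_line(4, "B", 2, 3.8, gap, 0.0)]

for gap in (20.0, 6.0):
    chains = parse_ca(toy_dimer(gap))
    print(gap, interface_summary(chains["A"], chains["B"]))
```

A dimer model with zero or very few contacts across the interface would be
flagged as loose before submission.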

Oh well, there isn't time to do any better, and we still have 45
targets to predict.  Let's hope we do better on another target.