15 June 1998 Kevin Karplus

Best score 1plq/1plr (FSSP rep 1plq) based on the target model, -5.65
Based on library model, best score is 1eaf -4.37

Protein is a translation initiation factor, so the fact that 1plq is
DNA-binding is encouraging.  Another high-scorer (1gotG) is also
nucleotide-binding.

25 June 1998

Summing both ways, the best scores are

chain   score    fssp rep   other names
1plq    -5.650   1plq       1plr
1gotG   -5.360   1tbgE
1eaf    -4.370   1eaf
1pmaA   -3.860   1pmaA      1pma[CDEFGHIJKLMNO]
2pcdM   -3.790   3pchM
1pud    -3.420   1pud       1wke, 1wkd, 1wkf
1pex    -3.310   1gen
2pcdA   -3.300   3pchM
1ah6    -3.270   1ah6       1ah8A, 1ah8B
3ladA   -3.070   3ladA      3ladB

The alignment t63-1plq-global.pw is not compact in 3D.  It mainly
matches one extended loop on the surface of 1plq, together with one
strand of a sheet, so it is not a very promising alignment for fold
recognition.

The 1plq-t63.pw alignment matches almost nothing.

The most promising alignment to 1plq is 1plq-t63-global, which is
moderately compact.  Unfortunately, some of the holes in the alignment
are big chunks in the middle of beta sheets.

9 July 1998 Christian Barrett

K42 is posttranslationally modified to the unique amino acid hypusine.
It is the lysine in the sequence "T GKHGSA".  Thus this position in a
template structure should be on the surface.

Ref: J Biol Chem 1986 Nov 5;261(31):14515-14519
     Eukaryotic initiation factor 4D.  Purification from human red blood
     cells and the sequence of amino acids around its single hypusine
     residue.
     Park MH, Liu TY, Neece SH, Swiggard WJ

10 July 1998 Kevin Karplus

Ran blast and double-blast a few days ago.

Blast finds 1myn, though the E-value is so large that this is almost
certainly not an interesting match.

Double-blast finds 1pex through intermediate GP:S72024_1_20:133, though
the score (-4.4) is in a range where false positives/true positives > 25
(hard to pin down, since the scop-thresh-rdb curve wasn't taken out
that far).
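The "other names" column of the 25 June score table uses bracket
shorthand for groups of chains, so that 1pma[CDEFGHIJKLMNO] stands for
1pmaC through 1pmaO.  A minimal Python sketch of expanding that
shorthand is given here.  The function name expand_ids is hypothetical
(it is not one of the tools used in these notes), and the sketch
assumes each bracket holds a plain set of single characters, with no
ranges or negation.

```python
import itertools
import re


def expand_ids(pattern):
    """Expand bracket shorthand such as '1wk[def]' into a list of IDs.

    Hypothetical helper.  Assumes each [...] holds a plain set of single
    characters (no ranges like a-z, no negation).
    """
    # Split into literal pieces and bracketed character classes;
    # the capture group keeps the bracketed pieces in the result.
    parts = re.split(r"(\[[^\]]*\])", pattern)
    choices = []
    for part in parts:
        if part.startswith("[") and part.endswith("]"):
            choices.append(list(part[1:-1]))  # one option per character
        else:
            choices.append([part])            # literal text, single option
    # Every combination of one option per piece, in left-to-right order.
    return ["".join(combo) for combo in itertools.product(*choices)]


pma_chains = expand_ids("1pma[CDEFGHIJKLMNO]")
```

Shorthand with more than one bracket expands to every combination, so
a pattern with two brackets of two characters each gives four IDs.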
A remote search (with MUCH looser thresholds) finds

chain         score    fssp rep
1pud          -4.520   1pud
1wk[def]      -4.520   1pud
1bak          -3.350   1bak
1acc          -3.070   1acc
[12]tmy       -3.000   1tmy
[34]tmy[AB]   -3.000   1tmy
1gotG         -2.560   1tbgE
1cg2[ABCD]    -2.460   1cg2A
1pex          -2.290   1gen

Investigating joint models for
    1pud, 1bak, 1acc, 1plq, 1tbgE, 1eaf, 1pmaA, 1pex

1pud is a transferase (trna-guanine transglycosylase (tgt))
    (and the 1wk* are mutants of it)
1bak is a transferase (g-protein coupled receptor kinase 2 fragment
    (grk-2, beta-adrenergic receptor kinase 1, beta-ark 1) mutant)
1acc is a toxin (anthrax protective antigen (pa))
1plq is DNA-binding (proliferating cell nuclear antigen (pcna))
1tbgE is GTP-binding/transducer (transducin fragment
    (guanine nucleotide-binding protein g))
1eaf is dihydrolipoamide acetyltransferase.
1pmaA is a protease.
1pex is collagenase-3 (MMP-13)

10 July 1998

The best scores for the non-self sequence in pairwise alignments are

1plq/t63-1plq-global.pw.dist:1plq     258  -8.91  -9.18
1pex/t63-1pex-global.pw.dist:1pex     192  -5.91  -8.30
1pmaA/t63-1pmaA-global.pw.dist:1pmaA  221  -6.44  -7.55
1pud/t63-1pud-global.pw.dist:1pud     372  -5.46  -6.85

The t63-1plq-global.pw alignment is not compact in 3D, and has few
conserved residues.

The t63-1pex-global.pw alignment looks more promising.  There are some
holes in beta sheets, but there are a fair number of conserved
residues, and some of the conserved residues cluster.  The alignment
may need some editing to look more believable.

The t63-1pmaA-global.pw alignment has a lot of residue identities, but
is only moderately compact.  Some of the gaps occur in the middles of
helices.

The t63-1pud-global.pw alignment is quite compact, and matches the
C-terminal domain of 1pud.  There are not many residue identities, but
the ones that are there cluster nicely.  This looks quite promising,
despite the low score.  Unfortunately, it has all helices, and we're
predicting mostly strands.
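The pairwise score lines in the 10 July entry above each consist of a
file path, a colon, a sequence ID, and three numbers.  A minimal Python
sketch of parsing such lines and ranking them by the last column is
given here.  The column meanings (a sequence length followed by two
scores) are an assumption based on how the table is ordered, and the
names ScoreLine, parse_score_line, and rank_scores are hypothetical.

```python
from typing import NamedTuple


class ScoreLine(NamedTuple):
    path: str       # file the line came from
    seq_id: str     # ID of the scored (non-self) sequence
    length: int     # assumed: sequence length
    score_a: float  # assumed: first score column
    score_b: float  # assumed: second score column (the table sorts on it)


def parse_score_line(line):
    """Parse 'path:seq_id length score score' into a ScoreLine."""
    location, length, score_a, score_b = line.split()
    path, seq_id = location.rsplit(":", 1)
    return ScoreLine(path, seq_id, int(length), float(score_a), float(score_b))


def rank_scores(lines):
    """Return parsed lines sorted by the last column, most negative first."""
    parsed = (parse_score_line(line) for line in lines if line.strip())
    return sorted(parsed, key=lambda s: s.score_b)


# The four score lines from the 10 July table, deliberately out of order.
example_lines = [
    "1pud/t63-1pud-global.pw.dist:1pud 372 -5.46 -6.85",
    "1plq/t63-1plq-global.pw.dist:1plq 258 -8.91 -9.18",
    "1pmaA/t63-1pmaA-global.pw.dist:1pmaA 221 -6.44 -7.55",
    "1pex/t63-1pex-global.pw.dist:1pex 192 -5.91 -8.30",
]
ranked = rank_scores(example_lines)
```

Sorting on the last column reproduces the order of the table as
written in the notes, with the most negative score first.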
10 July 1998

I'm going to discard (move to old/) all the t63-... alignments, and
remake them using the t63.remote_4 alignment as the base, since it
seems to be fairly clean, without junk sequences.

Redoing with the remote alignment changes the top-non-self-scoring
alignments:

1pex/t63-1pex-global.pw.dist:1pex     192  -6.28  -9.72
1pud/t63-1pud-global.pw.dist:1pud     372  -5.92  -8.63
1bak/t63-1bak-global.pw.dist:1bak     119  -3.02  -5.60
1pmaA/t63-1pmaA-global.pw.dist:1pmaA  221  -3.29  -5.45
1acc/t63-1acc-global.pw.dist:1acc     665  -6.01  -4.79
1plq/t63-1plq-global.pw.dist:1plq     258  -5.28  -4.48
1eaf/t63-1eaf-global.pw.dist:1eaf     243  -0.43  -2.71

The new t63-1pex-global alignment looks pretty good: compact and with
gaps in places that aren't too ridiculous, with reasonable residue
identity.  Secondary structure prediction matches moderately well.
The K42 residue is on the surface (and is conserved).  1pex is one of
4 instances of the 4-bladed beta propeller, and this alignment covers
two of the blades.
Question: how likely are we to get this unusual arrangement of beta
sheets as the result of dimerizing two halves??

There is a modest structural relation between 1gen (the 1pex fssp-rep)
and 1tbgA, another part of the transducin complex that we matched
weakly with 1gotG.  1tbgA is a 7-bladed beta propeller, but 1tbgE,
which matches 1gotG, is an all-helical protein that wraps around 1tbgA.

The t63-1pud-global alignment is also compact, but has all helices,
even though mostly strands are predicted.

The t63-1bak-global alignment is compact, and gets nice alignment of
the beta strands to the predictions, except for the long helix in
1bak.  K42 is not aligned in this prediction.

The t63-1pmaA-global alignment, though moderately compact, has a very
poor alignment of secondary structure elements.

The t63-1acc-global alignment is a bit spread out, and has poor
alignment of secondary structure elements.

The t63-1plq-global alignment is very spread out.
The t63-1eaf-global alignment is only moderately compact, and has
fairly poor alignment of secondary structures.

Right now, the 1pex and 1bak alignments look most promising.

I did a little editing of the 1bak alignment to create
t63-1bak-hand.a2m, which looks pretty good to me, but does it have any
real significance??

I looked for "hypusine" in Entrez, and all the hits seemed to be
Initiation Factor 5A or deoxyhypusine synthase.  There were more
initiation factors than were found by t98_6 or even remote_4, so it
may be worthwhile to try an alignment of the Entrez neighbors.

10 July 1998 Christian

1pex
----
1pex is one of the leading candidates and it contains hemopexin
domains.  In 1pex/cbarrett I gathered all of the 124 known
hemopexin-domain sequences from
http://www.infobiogen.fr/srs5bin/cgi-bin/getDomo.pl?P02790
and built a weighted model from these sequences.  t63 gets a very low
score (Simple -3.87, Reverse 0.13) with this model.  This indicates to
me that t63 is probably not related to the hemopexin domain.  The
caveat here is that the 124 domain sequences were found automatically
by BLAST and BLAST2, so it's conceivable that searching nrp with this
model may bring in more remote, bridging homologs.

Other structures that contain this domain are:
3D 1CGE; 1CGF; 1CGL; 2TCL; 1FBL; 1RTG; 1GEN; 2SRT; 1SLM; 1SLN; 1UMS; 1UMT;
3D 1JAN; 1JAO; 1JAP; 1JAQ; 1KBC; 1MMB; 1MNC; 1PEX; 1HXN;

Not in our library:
    1cge 1cgf 1cgl 2tcl 1fbl 1rtg 1gen 2srt 1sln 1ums 1umt
    1jan 1jao 1jap 1jaq 1kbc 1mmb 1mnc

[CORRECTION: 13 July 1998.
1gen is in our library.
Most of these do NOT have the HEMOPEXIN domain, but are just the
CATALYTIC domain of collagenase.
]

1bak
----
1bak is the Pleckstrin homology domain of ARK1_HUMAN

Other structures containing this domain:
3D 1BTK; 1DYN; 2DYN; 1IRS; 1PLS*; 2HSP*; 1HSQ*; 1DJG; 1DJH; 1DJI; 1DJW;
3D 1DJX; 1DJY; 1DJZ; 2ISD; 1MAI*; 1QAS; 1QAT; 1DRO*; 1BTN*; 1MPH
(* indicates in our library)

Not in our library:
    1btk 1dyn 2dyn 1irs 1dj[ghiwxyz] 2isd 1qas 1qat 1mph

11 July 1998 Kevin Karplus

Did an Entrez search yesterday for hypusine-containing proteins (based
on keyword "hypusine" and protein neighbors information).  Put the
results in hypusine-initiation.seqs.

I've made an alignment containing all these plus the sequences in
t63.remote_4, using target98-with-param to realign them.  This
alignment seems to classify the sequences better, separating
IF5A_SCHMA from the EIP family (which does not share the conserved
site where the hypusine modification occurs).

I took the resulting alignment and trimmed off the subfamilies that
did not have the conserved hypusine site.  I also removed a lot of the
repetition, and made t0063 be complete.  The resulting alignment is
hypusine-trimmed.a2m.

Retraining this (which introduces sequence duplication!) gives
hypusine-trimmed-retrain.a2m, in which the first 12 residues of t0016
are again unaligned.  I put back the whole t0063 sequence, so that the
alignment could be used for secondary structure prediction.  This
prediction has shorter helices than the one from t63.remote_4.

13 July 1998 Christian

(PMID is the PubMed ID from Entrez)

From the abstract of:
PMID: 7578077
...several experiments confirm this protein to be monomeric.  It is
further shown that eIF-5A have well-defined secondary structure.  Both
the far-UV circular dichroism spectrum as well as secondary structure
predictions using different algorithms suggest this protein to have
predominantly beta-sheet structure.  Two plausible models for the
packing of the secondary structure elements are presented.
In contrast to the main form, all three minor isoforms of eIF-5A are
characterized by acetylation of the epsilon-amino group of lysine at
position 47.  The minor isoforms are distinguishable by their state of
modification of the lysine residue at position 50...

(I'll pick up this paper to look at the proposed packing models).

Here is another paper that defines the minimum domain of the eIF-5A
precursor protein required for enzymatic deoxyhypusine synthesis as
Phe30-Asp80:
PMID: 7929297
"A series of truncated forms of the eIF-5A precursor protein generated
by expression in E. coli of recombinant deletion constructs from the
human eIF-5A cDNA were tested.  Truncation of up to 9 amino acid
residues (Met1-Thr9) from the NH2 terminus or 64 amino acid residues
(Leu91-Lys154) from the COOH terminus did not significantly decrease
the substrate reactivity, but removal of an additional 10 amino acids
from either side did.  Deletion of 34 amino acid residues (Met1-Lys34)
from the NH2 terminus or of 84 amino acid residues (Asp71-Lys154) from
the carboxyl terminus caused complete loss of substrate property."

While previous evidence suggests that t63 exists as a monomer, this
paper reports that it exists as a dimer and may be able to form
higher-order polymers:
PMID: 1900436
Eukaryotic initiation factor 5A (eIF-5A, formerly known as eIF-4D)
purified from human erythrocytes has been found to have a monomeric
molecular weight between 17,500 and 18,000.  In this study, using
exclusion chromatography and analytical ultracentrifugation, we
demonstrate that eIF-5A normally exists as a dimer in solution and
appears to be capable of undergoing reversible association to form
higher polymers.
This paper has determined that the hypusine posttranslational
modification is essential for t63 to bind RNA:
PMID: 9285100

I'll go take a look at this paper, which reports the crystallization
of eIF5A:
PMID: 9336851

Mon Jul 13 13:32:02 PDT 1998 Kevin Karplus

The prediction hypusine-trimmed-retrain.2d is primarily beta, as
suggested in the paper 7578077 (we need the authors' names for the
comments on the prediction!).  This pretty much rules out the 1pud
prediction.

The existence of t63 as a dimer makes the 1pex prediction a little
less frightening, since dimerization could complete the 4-bladed
propeller.

Looking at 1pex more:
The "hemopexin-like domains" in SCOP are 1hxn, 1gen, 1rtg, 1fbl, and
1pex.
FSSP has 1gen, 1rtg, 1pex, and 1hxn in the 1gen.fssp file