6 August 1998

blast gets no hits (to 1c53 +1.03)
double blast gets modest hit:
    1bta, 1btb   -5.878
t79.t98_6 gets a few good hits:
    1smt[AB]     -7.250
    1ne[qr]      -6.930
    1af7         -6.630
    1l[cd]A      -6.290
    1yge         -6.240
    2sblB        -6.240
    1lqc         -6.220
    1mek         -6.160
None of the template models are great:
    1stmA        -5.400
    1adr         -5.170
    1chmA_2      -4.980
    1r69         -4.870
    1glqA_2      -4.730
    1mla_1       -4.420
    2polA_1      -4.270
    1pil         -4.260

Summing both ways pulls a few up:
    1neq         -9.7    1neq   DNA-binding
    1ftt         -9.12   1fjlA  thyroid transcription factor
    1yrnA        -7.83   1yrnA  mating-type protein
    1lccA        -7.53   1pru   Lac repressor
    1r69         -7.3    1r69   434 repressor
    1smt[AB]     -7.250  1smtA  transcriptional repressor

There seems to be a "repressor" theme here.

FSSP neighbors of 1neq:
    1lmb3   Z=4.7
    1r69    Z=4.5
    1au7A   Z=3.4
FSSP neighbors of 1fjlA (1ftt):
    1au7A   10.2
    1enh     9.8
    1san     9.4
    1yrnA    9.1
    2hoa     9.1
    1hom     9.0
    1ahdP    8.8
    1ftt     8.0
    1lfb     8.0
    ...
FSSP neighbors of 1yrnA:
    1fjlA    9.1
    1au7A    8.7
    1lfb     7.8
    1mnmC    7.6

It looks like we're finding a shared domain---we should build the
constrained alignments for these.

Fri Aug 7 17:59:25 PDT 1998

The t79-1neq-hand1.a2m alignment conserves 19 residues with 3 small
insertions.

The t79-1smtA-global alignment looks ok, and can be slightly improved
to t79-1smtA-hand, but we have a big insertion in the end of an exposed
hairpin--and there are only 12 conserved residues.  We can find other
alignments (playing around by hand) that look just as reasonable and
save 14 residues.

The 1r69-t79-global alignment can be modified to have 10 conserved
residues with 3 insertions (1r69-t79-hand1.a2m)---this alignment covers
about the first half of t79.

Since t79 is a transcription regulatory protein, it seems reasonable
for it to be homologous to lambda repressor-like DNA-binding domains,
but it is very hard to choose between the families:

SCOP:
1.lambda repressor-like DNA-binding domains (3)
    1.Oct-1 POU-specific domain (1)
        1oct, 1pou
    2.Phage repressors (6)
        1.lambda C1 repressor, DNA-binding domain
            1.Escherichia coli bacteriophage Lambda (3)
                1lmb,1lrp,1lli[AB]
        2.434 C1 repressor, DNA-binding domain
            1.Escherichia coli phage 434 (7)
                contains a short additional helix at C-terminus
                1r69, 1per[LR], 2orl[LR], 1rpe[LR], 1pra, 1r63, 2r63
        3.cro 434
            1.bacteriophage 434 (3)
                contains a short additional helix at C-terminus
                2cro, 3cro[LR], 1zug
        4.p22 C2 repressor, DNA-binding domain
            1.Salmonella bacteriophage P22 (1)
                contains a short additional helix at C-terminus
                1adr
        5.cro lambda repressor
            the fourth helix is replaced with a beta hairpin
            3 helices; folded leaf, opened;
            1.Escherichia coli bacteriophage Lambda (4)
                4cro[ABCDEF], 1cro[OABC], 1cop[DE], 1orc
        6.NER
            1.Bacteriophage mu (2)
                1ner, 1neq
    3.Bacterial repressors (3)
        lacks the first helix of canonical fold
        3 helices; bundle, partly opened, right-ha
        1.Purine repressor (PurR), N-terminal domain
            1.Escherichia coli (5)
                1bdi, 1pnr, 1pru, 1prv, 1bdh
        2.Lac repressor (LacR), N-terminal domain
            1.Escherichia coli, strain bmh 74-12 (4)
                1lccA, 1lcdA, 1lbg, 1lqc
        3.Fructose repressor (FruR), N-terminal domain
            1.Escherichia coli (2)
                1uxd, 1uxc

Other hits:
Homeodomain (DNA-binding 3-helical bundle)
    1.Homeodomain (11)
        1.engrailed Homeodomain
            1.Drosophila melanogaster (2)
                1enh, 1hdd[CD]
        2.mat A1/alpha2 Homeodomain
            1.brewer's yeast (Saccharomyces cerevisiae) (2)
                1yrn[AB], 1ap1[CD]
        3.Transcription factor LFB1
            atypical Homeodomain with a large insertion into HTH motif
            1.rat (Rattus rattus) (2)
                1lfb, 2lfb
        4.Oct-1 POU Homeodomain
            1.human (Homo sapiens) (2)
                1oct, 1pog
        5.Thyroid transcription factor 1 homeodomain
            1.rat (Rattus rattus) (1)
                1ftt
        6.Oct-2 POU Homeodomain
            1.human (Homo sapiens) (1)
                1hdp
        7.Oct-3 POU Homeodomain
            1.mouse (Mus musculus) (1)
                1ocp
        8.antennapedia Homeodomain
            1.Drosophila melanogaster (4)
                1hom, 2hoa, 1ahdP, 1san
        9.Fushi Tarazu protein
            1.fruit fly (Drosophila melanogaster) (1)
                1ftz
        10.VND/NK-2 protein
            1.fruit fly (Drosophila melanogaster) (1)
                1vnd
        11.Paired protein
            1.fruit fly (Drosophila melanogaster) (1)
                1fjl[ABC]
    2.Recombinase DNA-binding domain (2)
        1hcrA, 1res, 1ret, 1gdt
    3.c-Myb, DNA-binding domain (2)
        s006. 1mbe, 1mbf, 1mbg, 1mbh, 1mbj, 1mbk, 1mse, 1msf, 1idy, 1idz
    4.Paired domain (1)
        duplication: consists of two domains of this fold
        1pdnC
    5.DNA-binding domain of rap1 (1)
        duplication: consist of two domains of this
        1ign

We also have some hits in the winged DNA-binding domain:
    1.Biotin repressor, N-terminal domain (1)
        1bia, 1bib
    2.LexA repressor, N-terminal DNA-binding domain (1)
        1lea, 1leb
    3.Arginine repressor (ArgR), N-terminal DNA-binding domain (1)
        1aoy
    4.Catabolite gene activator protein (CAP), C-terminal domain (1)
        1cgp, 1ber, 3gap, 1run, 1ruo
    5.OMPR C-terminal DNA-binding domain (1)
        1opc, 1odd
    6.Replication terminator protein (RTP) (1)
        s007
    7.Histone H1/H5 (2)
        1hst[AB], 1ghc
    8.HN-3/fork head DNA-binding domain (1)
        s008
    9.ets domain (4)
        1fliA, 1etc, 1etd, 2stwA, 2sttA, 1pue[EF]
    10.Heat-shock transcription factor (2)
        1hks, 1hkt, 2hts, 3hsf
    11.iron-dependent repressor protein (1)
        1dpr, 2dtr, 1tdx
    12.Methionine aminopeptidase, insert domain (1)
        1xgs

1yge is Lipoxygenase, C-terminal domain,
    multihelical, large nearly all-alpha domain,
    so is 2sblB, 1lnh

1mek is Thioredoxin fold
    core: 3 layers, a/b/a; mixed beta-sheet of 4 strands, order 4312;
    strand 3 is antiparallel to the rest
    An unlikely hit given the all-helical prediction.

1stmA is a viral coat and capsid protein

1chmA_2 is mainly beta, so unlikely to be a good hit.

We also got some hits in TetR/NARL DNA-binding domain
    1. Tetracyclin repressor (Tet-repressor, TetR), N-terminal domain
        2tct, 2trt
    2. Nitrate/nitrite response regulator (NARL), receiver domain
        1rnl

Unfortunately, we have LOTS of DIFFERENT DNA-binding domains that we
hit.  Choosing the right one will be very difficult.

8 Aug 1998

Here are the best scores for t79 with various models:

1r69/1r69-t79-global          T0079  -10.35  -10.34
1r69/1r69-t79-post            T0079  -10.35  -10.34
1adr/1adr-t79-global          T0079   -8.76   -9.38
1adr/1adr-t79-post            T0079   -8.76   -9.38
1lea/1lea-t79-const-global    T0079   -9.44   -8.56
1lea/1lea-t79-fssp-global     T0079   -9.03   -8.51
1pru/1pru-t79-const-global    T0079   -7.32   -7.20
1yrnA/1yrnA-t79-const-global  T0079   -6.96   -7.20
1fjlA/1fjlA-t79-const-global  T0079   -8.02   -6.93
1pru/1pru-t79-fssp-global     T0079   -8.14   -6.71
1lea/1lea-t79-global          T0079   -6.66   -6.46
1lea/1lea-t79-post            T0079   -6.66   -6.46
2tct/2tct-t79-fssp-global     T0079   -6.38   -6.19
1adr/1adr-t79-vit             T0079   -6.64   -5.73
1yrnA/1yrnA-t79-fssp-global   T0079   -6.71   -5.07
2tct/2tct-t79-global          T0079   -5.74   -4.83
2tct/2tct-t79-post            T0079   -5.74   -4.83
1fjlA/1fjlA-t79-global        T0079   -5.60   -4.60
1fjlA/1fjlA-t79-post          T0079   -5.60   -4.60
1neq/1neq-t79-global          T0079   -4.66   -4.59
1neq/1neq-t79-post            T0079   -4.66   -4.59
1r69/1r69-t79-vit             T0079   -5.78   -4.58
1yrnA/1yrnA-t79-global        T0079   -5.51   -4.56
1yrnA/1yrnA-t79-post          T0079   -5.51   -4.56
1fjlA/1fjlA-t79-fssp-global   T0079   -6.33   -4.37
1neq/1neq-t79-const-global    T0079   -3.52   -4.35
1neq/1neq-t79-fssp-global     T0079   -3.68   -4.32
1fjlA/1fjlA-t79-vit           T0079   -4.00   -4.31
1yrnA/1yrnA-t79-vit           T0079   -3.85   -4.00
1neq/1neq-t79-vit             T0079   -2.91   -3.76
1pru/1pru-t79-global          T0079   -4.55   -3.55
1pru/1pru-t79-post            T0079   -4.55   -3.55
2tct/2tct-t79-vit             T0079   -4.17   -3.50
1lea/1lea-t79-vit             T0079   -3.38   -3.14
1r69/1r69-t79-const-global    T0079   -5.21   -2.28
1r69/1r69-t79-fssp-global     T0079   -4.51   -1.63
1pru/1pru-t79-vit             T0079   -1.51   -1.00
2tct/2tct-t79-const-global    T0079   -0.60   -0.95
1smtA/1smtA-t79-const-global  T0079   -4.14   -0.85
1smtA/1smtA-t79-fssp-global   T0079   -4.77   -0.38
1smtA/1smtA-t79-global        T0079   -2.36    1.02
1smtA/1smtA-t79-post          T0079   -2.36    1.02
1smtA/1smtA-t79-vit           T0079   -0.98    1.06

Here are the best with the t79 model.  It is interesting that 1smtA
does the best with this model, but the worst in the other direction!

1smtA/t79-1smtA-global   98   -8.60      -10.76
1smtA/t79-1smtA-post     98   -8.60      -10.76
1neq/t79-1neq-vit        74   -8.69       -9.56
1lcdA/t79-1lcdA-global   51   -2.28       -9.50
1lcdA/t79-1lcdA-post     51   -2.28       -9.50
1neq/t79-1neq-global     74   -5.03       -8.96
1neq/t79-1neq-post       74   -5.03       -8.96
1smtA/t79-1smtA-vit      98   -9.05       -7.68
1ftt/t79-1ftt-global     68   -3.33       -7.64
1ftt/t79-1ftt-post       68   -3.33       -7.64
1ftt/t79-1ftt-vit        68   -6.66       -7.25
1lcdA/t79-1lcdA-vit      51   -9.09       -7.11
1lea/t79-1lea-global     72   -2.26       -7.09
1lea/t79-1lea-post       72   -2.26       -7.09
1fjlA/t79-1fjlA-vit      65   -4.47       -4.89
1yrnA/t79-1yrnA-vit      49   -4.55       -4.65
1pru/t79-1pru-vit        56   -4.33       -4.30
1r69/t79-1r69-vit        63   -3.20       -4.08
1fjlA/t79-1fjlA-global   65   -0.81       -4.05
1fjlA/t79-1fjlA-post     65   -0.81       -4.05
1lea/t79-1lea-vit        72   -5.88       -2.85
2tct/t79-2tct-vit       198   -3.60       -2.57
2tct/t79-2tct-global    198   -6.08       -2.18
2tct/t79-2tct-post      198   -6.08       -2.18
1adr/t79-1adr-vit        76   -1.83       -1.85
1enh/t79-1enh-vit        54   -1.86       -1.09
1adr/t79-1adr-global     76    3.30  1000000.00
1adr/t79-1adr-post       76    3.30  1000000.00
1enh/t79-1enh-global     54    3.18  1000000.00
1enh/t79-1enh-post       54    3.18  1000000.00
1pru/t79-1pru-global     56    2.82  1000000.00
1pru/t79-1pru-post       56    2.82  1000000.00
1r69/t79-1r69-global     63    4.05  1000000.00
1r69/t79-1r69-post       63    4.05  1000000.00
1yrnA/t79-1yrnA-global   49    3.20  1000000.00
1yrnA/t79-1yrnA-post     49    3.20  1000000.00

Best hits---alignment, score, best score in reverse direction,
domain type (? indicates no library model yet):

1smtA/t79-1smtA-global        -10.76  -0.85  winged? (transcriptional repressor)
1r69/1r69-t79-global          -10.34  -4.08  lambda-like,phage,434 repressor
1neq/t79-1neq-vit              -9.56  -4.59  lambda-like,phage,NER
1lcdA/t79-1lcdA-global         -9.50  -8.90  lambda-like,bacterial,LacR
1adr/1adr-t79-global           -9.38  -1.85  lambda-like,phage,p22
1lea/1lea-t79-const-global     -8.56  -7.09  winged,LexA
1ftt/t79-1ftt-global           -7.64  -4.28  homeodomain,thyroid transcription
1yrnA/1yrnA-t79-const-global   -7.20  -4.65  homeodomain, matA1/alpha2
1pru/1pru-t79-const-global     -7.20  -4.30  lambda-like,bacterial,purR
1fjlA/1fjlA-t79-const-global   -6.93  -4.89  homeodomain, paired protein
2tct/2tct-t79-fssp-global      -6.19  -2.57  tetR DNA-binding
1enh/t79-1enh-vit              -1.09  -4.52  homeodomain, engrailed

I have to build the target98 models for 1lcdA, 1ftt, and 1enh, to see
if they improve things.  They might---1lcdA is represented by 1pru in
FSSP, which did quite well with the constrained model, and 1ftt and
1enh are represented by 1fjlA, which also did fairly well with the
constrained model.

Right now the most promising is the 1lea constrained alignment, but
1lcdA could pull ahead if its target98 model is good.
[NOTE: after filling in the missing target98 models, 1lcdA DOES move to
being the most promising.]

T0079 m15hs----------ILDWIEDNLES---PLSLEKVSERSGY-S----KWHLQRMFKKETGHSL
                       D I D       P        R G  S      HL     K
1lea  .....MKALTARQQEVFDLIRDHISQTGMPPTRAEIAQRLGFRSPNAAEEHLKALARKGVIEIV

T0079 GQYIRSRKMTEIA-q60ns.
          R
1lea  SGASRGIRLLQEE-......

This is 11 residue identities out of 72 possible, with 3 gaps---about
as good as the other alignments I've looked at.  The conserved
hydrophobicity pattern in the first 7 or 8 residues is also a help in
getting good scores.

Question: which are the active residues involved in the DNA binding?
Conserving these is more important than other conservation.
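
The identity count quoted for the 1lea alignment can be checked
mechanically.  The following is only a minimal Python sketch, not part
of the tools used for these notes: the function name count_identities
is hypothetical, and the two aligned blocks are copied from the
alignment above with the shorthand end columns (m15hs / ....., and
q60ns. / ......) stripped by hand.

```python
def count_identities(row_a: str, row_b: str) -> int:
    """Count alignment columns where both rows hold the same uppercase residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same number of columns")
    return sum(
        1
        for a, b in zip(row_a, row_b)
        if a == b and a.isalpha() and a.isupper()  # skips '-' and '.' gap columns
    )


# Aligned blocks of the T0079/1lea constrained alignment, as shown above.
T79_BLOCKS = [
    "----------ILDWIEDNLES---PLSLEKVSERSGY-S----KWHLQRMFKKETGHSL",
    "GQYIRSRKMTEIA-",
]
LEA_BLOCKS = [
    "MKALTARQQEVFDLIRDHISQTGMPPTRAEIAQRLGFRSPNAAEEHLKALARKGVIEIV",
    "SGASRGIRLLQEE-",
]

total = sum(count_identities(a, b) for a, b in zip(T79_BLOCKS, LEA_BLOCKS))
print(total)  # 11, matching the count given in the text
```

The same function can be pointed at any of the hand alignments, as long
as the two rows are given column for column.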

Hmm---the t79-1lcdA-global alignment has a rather LARGE insertion:

              10        20        30        40        50        60
               |         |         |         |         |         |
T0079 ..MTMSRRNTDAITIHSILDWIEDNLESPLSLEKVSERSGYSKWHLQRMFKKETGHSLGQYIRS
                                  P  L  V E  G S     R
1lcdA mk--------------------------PVTLYDVAEYAGVSYQTVSRVVNQA-----------

          70        80        90       100       110       120       130
           |         |         |         |         |         |         |
T0079 RKMTEIAQKLKESNEPILYLAERYGFESQQTLTRTFKNYFDVPPHKYRMTNMQGESRFLHPLNHYNS.
                                                                   LN
1lcdA --------------------------------------------SHVSAKTREKVEAAMAELNYIPNr

The first part looks good, but the second part looks
atrocious---probably a result of global alignment hating end gaps too
much.  The 1lcdA-t79-global alignment looks more sensible:

T0079 m24nlESPLSLEKVSERSGYSKWHLQRMFKKET-----------------------gh76ns.
             P  L  V E  G S     R
1lcdA .....MKPVTLYDVAEYAGVSYQTVSRVVNQASHVSAKTREKVEAAMAELNYIPNR.......

but still leaves a lot of t79 unaccounted for.

Wed Aug 12 16:59:06 PDT 1998

Tried editing the 1lcdA alignments, and found 2 reasonable ways to
extend the part they agree on:
    t79-1lcdA-hand1
    t79-1lcdA-hand2
Hand1 has a 7-residue insert and 11 conserved residues, and hand2 has
4-residue insert and 10 conserved residues.  I think I like
t79-1lcdA-hand1 slightly better, though neither describes a large part
of t79.  (Note: 1lccA is identical.)

I edited an alignment to get t79-1smtA-hand1.a2m, with 10 conserved
residues and 1 1-residue insert.  t79-1smtA-hand2.a2m has 12 conserved
residues, 1 1-residue gap and a 4-residue insert.  (I can mix and match
a bit, as the middle part is identical in these two alignments, and I
can move the final section to several places equally well.
Unfortunately, there is a proline in the middle of the final helix on
these alignments, which is NOT reassuring.)

1r69-t79-hand1 has a 5-residue insert and a 4-residue insert and 14
conserved residues.

1neq-t79-hand2 has a 1-residue insert and a 9-residue insert and 18 conserved residues.
The first part agrees with 1neq-t79-hand1, including the 1-residue
insert, but 1neq-t79-hand1 had a 2-residue insert and a 4-residue
insert to get 19 conserved residues.  I think I like 1neq-t79-hand2
best so far.

t79-1adr-gapless1.a2m has 11 conserved residues with no gaps.

t79-1lea-gapless1.a2m has 9 conserved residues, and has nice pairing of
conserved residues on the beta hairpin, though the neural net predicts
an all-helical structure.

1ftt has several gapless alignments with 7-10 residues conserved---none
do a great job of matching predicted secondary structure.

12 August 1998 Christian

T79 is MARA_ECOLI.  It is a member of the HTH (helix-turn-helix) ARAC
family (Prosite PS00041), which are characterized by the presence of a
HTH DNA binding motif.  This motif is thought to occur in the first
third of T0079 (Leu30 - Lys49).  See the Swissprot entry.

What's interesting is that most of the hits for T79 are to short (about
55 aa) transcriptional regulators that basically only contain the HTH
motif.  They appear to be trying to align to the first portion of T79.
This leaves the portion of T79 after its HTH motif basically unaligned
to any structure, as far as I can tell through the noise of all the
structural hits.

To this end, I created the directories casp3/t79-1st and casp3/t79-2nd,
splitting T79 after the HTH (with a few overlapping residues for slop).
For now, the results from this splitting are in the READMEs for those
directories.

13 August 1998 Kevin Karplus

Moved the t79-1st and t79-2nd directories to be subdirectories of t79.
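
The split into t79-1st and t79-2nd described in the 12 August entry can
be illustrated with a short Python sketch.  This is an illustration
only: the exact cut point and the amount of overlap actually used are
not recorded in this file, so cut=49 (the end of the proposed HTH,
Lys49) and overlap=5 are assumptions, and split_with_overlap is a
hypothetical helper.  The T0079 sequence is copied from the alignments
in this file.

```python
# T0079 sequence (129 residues), as it appears in the alignments in this README.
T0079 = (
    "MTMSRRNTDAITIHSILDWIEDNLESPLSLEKVSERSGYSKWHLQRMFKKETGHSLGQYIRSRK"
    "MTEIAQKLKESNEPILYLAERYGFESQQTLTRTFKNYFDVPPHKYRMTNMQGESRFLHPLNHYNS"
)


def split_with_overlap(seq: str, cut: int, overlap: int) -> tuple[str, str]:
    """Split seq after residue `cut` (1-based), sharing `overlap` residues each way."""
    if not 0 < cut < len(seq):
        raise ValueError("cut must fall inside the sequence")
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    first = seq[: min(len(seq), cut + overlap)]   # residues 1 .. cut+overlap
    second = seq[max(0, cut - overlap):]          # residues cut-overlap+1 .. end
    return first, second


# Assumed values: cut after Lys49, 5 residues of slop on each side.
first_half, second_half = split_with_overlap(T0079, cut=49, overlap=5)
print(len(first_half), len(second_half))
```

With these assumed values the two pieces share the 10 residues around
the cut, so an alignment to either half can cover the junction.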

Here is the alignment I was favoring:

              10        20        30        40        50        60
               |         |         |         |         |         |
T0079 MTMSRRNTDAITIHSILDWIEDNLESPLSLEKVSERSGYSKWHLQRMFKKETGHSLGQYIRSRK
            N  A   H   D I       LSL   S   GY    L
1neq  ----CSNEKARDWH-RADVIAGLKKRKLSLSALSRQFGYAPTTLANALERH---------WPKG

          70        80        90       100       110       120
           |         |         |         |         |         |
T0079 MTEIAQKLKESNEPILYLAERYGFESQQTLTRTFKNYFDVPPHKYRMTNMQGESRFLHPLNHYNS
         IA  L    E I       G
1neq  EQIIANALETKPEVIWPSRYQAGE-----------------------------------------

From karplus@cse.ucsc.edu Thu Aug 13 09:58:25 1998
Return-Path: karplus@cse.ucsc.edu
Date: Thu, 13 Aug 1998 09:58:24 -0700
From: Kevin Karplus
To: cbarrett@cse.ucsc.edu
Cc: karplus@cse.ucsc.edu
Subject: t79

In t79/README, you suggested looking at the Swissprot entry, but
neglected to save it to the t79 directory.  I don't have web access
when logging in from home, so accessing Swissprot and other web data
sources is a nuisance.  Could you save the swissprot entry in the t79
directory?

Is the location of the HTH motif determined by experiment or by
similarity?  There are lots of possible alignments of repressors to
t79, and I'm not convinced that someone else's similarity search is any
more informative than our own.

I've added 1fjlA (the 3-5-98 fssp representative for the homeodomains)
to the Makefile for t79-2nd.  It does not come out as a superhit.  The
best hit for the second domain (based on scores) looks like 1lea,
though 1san and 1ftz do ok.  I'm not convinced that there are 2 HTH
motifs in the protein---every example I've seen has the multiple
motifs in separate chains, with the interesting part of the protein
attached to the individual motifs.

From cbarrett@cse.ucsc.edu Thu Aug 13 10:19:07 1998
Return-Path: cbarrett@cse.ucsc.edu
X-Authentication-Warning: beta.cse.ucsc.edu: cbarrett owned process doing -bs
Date: Thu, 13 Aug 1998 10:19:06 -0700 (PDT)
From: Christian Barrett
X-Sender: cbarrett@beta
To: Kevin Karplus
Subject: Re: t79
In-Reply-To: <199808131658.JAA30210@purr.cse.ucsc.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Look for t79/MARA_ECOLI.{html, txt}.

Yes, the HTH in this protein is by similarity.  After looking at the
1neq match, I feel encouraged that it is a template for the HTH motif
posited to exist in MARA_ECOLI.  I base this on the discussion of the
HTH motif in Branden and Tooze pg. 101-103.

> I've added 1fjlA (the 3-5-98 fssp representative for the homeodomains)
> to the Makefile for t79-2nd.  It does not come out as a superhit.  The
> best hit for the second domain (based on scores) looks like 1lea,
> though 1san and 1ftz do ok.  I'm not convinced that there are 2 HTH
> motifs in the protein---every example I've seen has the multiple
> motifs in separate chains, with the interesting part of the protein
> attached to the individual motifs.

I also was doubtful of the claim that t79 contains two HTH motifs --
for the very same reason.  That's why it took me quite some time to
come around to this hypothesis.  Even though proteins using the HTH
motif use two copies of it, usually through dimerization, MarA (T79) is
known to function as a monomer.  (PMID:8636021, PMID:8626315).  Thus it
has got to possess the two all by itself, if it indeed uses the motif
at all.  I think this claim is strengthened by the observation that
homeodomain/HTH proteins are matching to both halves of T79.

Christian

17 August 1998 Christian

I copied the new Makefile from t78 and remade joints.
Here's the .dist summary of the higher scores:

1adr/1adr-t79-global.pw.dist:T0079          129   -8.76   -9.38
1fjlA/1fjlA-t79-const-global.pw.dist:T0079  129   -8.02   -6.93
1ftt/t79-1ftt-global.pw.dist:1ftt            68   -3.33   -7.64
1lccA/t79-1lccA-vit.pw.dist:1lccA            51   -9.09   -7.11
1lcdA/1lcdA-t79-post.pw.dist:T0079          129   -8.90   -8.39
1lea/1lea-t79-fsspt98-global.pw.dist:T0079  129   -9.67   -9.80
1neq/t79-1neq-vit.pw.dist:1neq               74   -8.69   -9.56
1pru/1pru-t79-const-global.pw.dist:T0079    129   -7.32   -7.20
1r69/1r69-t79-global.pw.dist:T0079          129  -10.35  -10.34
1smtA/t79-1smtA-global.pw.dist:1smtA         98   -8.60  -10.76

Still need to remake the entire directory when load subsides.

Here is what I favor for the t79 prediction: Kevin's 1neq alignment for
the first HTH motif, and
t79/t97-2nd/1san/t79-2nd-1san-global.cbarrett1.a2m for the second half.
What needs to be done is to determine how the two alignments will
overlap; that is, what border residues in each alignment to keep for
just one structure and which t79 residues to keep out of both
structures.

18 August 1998 Christian

I edited Kevin's 1neq alignment to basically unalign a piece at the
end.  I have saved it as 1neq/t79-1neq-cbarrett1.a2m.  The final
structural alignment for the second half is
1san/1san-t79-global.cbarrett1.a2m.  The alignments are such that the
end of the 1st-half alignment and beginning of the second half
alignment are helical, implying that a helix links the two.  It is easy
to visualize this by opening Rasmol for both halves side-by-side.

19 August 1998 Kevin Karplus

I edited Christian's 1san alignment slightly to get 1san-t79-hand2,
which I think is slightly better.

I have much more confidence in the 1neq alignment than the 1san
alignment, but I think we are ready to submit both.

Top hits with t79-sum98:
t79 1neq    -9.7
t79 1ftt    -9.12
t79 1yrnA   -7.83
t79 1lccA   -7.53
t79 1r69    -7.3
t79 1smtA   -7.250
t79 1smtB   -7.250
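
The .dist summary in the 17 August entry can be ranked mechanically.
The sketch below is a minimal Python illustration, not a description of
the scripts actually used.  It assumes that each line has the form
path:id length score score, that the last column is the one to rank on
(it is the value quoted in the 8 Aug "best hits" table), and that more
negative is better.  The names parse_dist_line and rank_hits are
hypothetical.

```python
# Lines copied from the 17 August .dist summary above.
DIST_SUMMARY = """\
1adr/1adr-t79-global.pw.dist:T0079 129 -8.76 -9.38
1fjlA/1fjlA-t79-const-global.pw.dist:T0079 129 -8.02 -6.93
1ftt/t79-1ftt-global.pw.dist:1ftt 68 -3.33 -7.64
1lccA/t79-1lccA-vit.pw.dist:1lccA 51 -9.09 -7.11
1lcdA/1lcdA-t79-post.pw.dist:T0079 129 -8.90 -8.39
1lea/1lea-t79-fsspt98-global.pw.dist:T0079 129 -9.67 -9.80
1neq/t79-1neq-vit.pw.dist:1neq 74 -8.69 -9.56
1pru/1pru-t79-const-global.pw.dist:T0079 129 -7.32 -7.20
1r69/1r69-t79-global.pw.dist:T0079 129 -10.35 -10.34
1smtA/t79-1smtA-global.pw.dist:1smtA 98 -8.60 -10.76
"""

SUFFIX = ".pw.dist"


def parse_dist_line(line: str) -> dict:
    """Parse one 'path:id length score score' summary line into a dict."""
    path, rest = line.split(":", 1)
    seq_id, length, first_score, last_score = rest.split()
    if path.endswith(SUFFIX):
        path = path[: -len(SUFFIX)]
    return {
        "alignment": path,
        "template": path.split("/", 1)[0],
        "seq_id": seq_id,
        "length": int(length),
        "first_score": float(first_score),
        "score": float(last_score),  # assumed ranking column
    }


def rank_hits(text: str) -> list:
    """Return parsed hits sorted with the most negative (assumed best) score first."""
    hits = [parse_dist_line(line) for line in text.splitlines() if line.strip()]
    return sorted(hits, key=lambda hit: hit["score"])


ranked = rank_hits(DIST_SUMMARY)
for hit in ranked:
    print(f'{hit["alignment"]:32s} {hit["score"]:8.2f}')
```

Ranking on a single column ignores the disagreement between the two
directions noted for 1smtA, so the sorted list is only a starting point
for choosing among alignments.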