From SMian@lbl.gov  Sun May 14 18:35:32 2000
Return-Path: <SMian@lbl.gov>
Sender: saira@lbl.gov
Date: Sun, 14 May 2000 18:35:12 -0700
From: Saira Mian <SMian@lbl.gov>
X-Accept-Language: en
To: Kevin Karplus <karplus@cse.ucsc.edu>
Subject: Re: CASP/t86
Content-Type: multipart/mixed;  boundary="------------08F2AE88E2CBC5CD7735F39B"

This is a multi-part message in MIME format.
--------------08F2AE88E2CBC5CD7735F39B
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Dear Kevin,
  Great! I don't have permission to access
/projects/compbio/experiments/casp4. I have some thoughts about the HMM
library which I will send tomorrow. Since T86 is due soon, I thought I'd
send you an HMM-generated alignment of t86 derived from an initial
PSI-BLAST analysis. You might want to take a look at Xue & Lipscomb PNAS
92:10595 (1995) in terms of the types of chorismate binding residues one
might expect to be conserved.
	-saira

Kevin Karplus wrote:
> 
> Saira,
> 
> I would be willing to include you on the SAM_T2K team (the SAM_T99
> team is the automatic server).
> 
> The best role for you would be to look through various "hit" lists and
> give us some hints for functional screening.  Because we don't usually
> understand the names of the proteins, we often miss relationships that
> would be obvious to a real protein person.
> 
> You can also look at alignment predictions and give comments on how
> good or bad they look.  For example, you are more likely to figure out
> whether residues at a binding site are being properly matched than we
> are.
> 
> The work will all be done in /projects/compbio/experiments/casp4
> with a subdirectory for each target.  There are three targets there
> already (t86, t87, t88).  None of these have obvious (or even
> sam-t99-detectable) homologies, so we are going to be reduced to
> secondary structure matching and functional matching.
> 
> Take a look at the files in each of those directories and let me know
> if you have any questions about what they mean.  As I get a chance to
> look at the results, I'll add my working notes as a README file for
> each target.
> 
> Kevin

-- 
I. Saira Mian
Life Sciences Division (Mail Stop 74-197)  E-mail: SMian@lbl.gov
Lawrence Berkeley National Laboratory      Tel:    (510) 486-6216
1 Cyclotron Road                           Fax:    (510) 486-6949
Berkeley, California 94720
--------------08F2AE88E2CBC5CD7735F39B
Content-Type: text/plain; charset=us-ascii;
 name="t00861.out"
Content-Disposition: inline;
 filename="t00861.out"
Content-Transfer-Encoding: 7bit

;  SAM: prettyalign v2.2.1 (October 5, 1998) compiled 10/06/98_16:11:20.
;  SAM:  Sequence Alignment and Modeling Software System
;  (c) 1992-1998 Regents of the University of California, Santa Cruz
;  http://www.cse.ucsc.edu/research/compbio/sam.html
;
; ------ Citations (HMMs, SAM) ------
; A. Krogh et al., Hidden Markov models in computational biology:
;   Applications to protein modeling, JMB 235:1501-1531, Feb 1994.
; R. Hughey, A. Krogh, Hidden Markov models for sequence analysis:
;   Extension and analysis of the basic method, CABIOS 12:95-107, 1996.
; -----------------------
                        10        20        30        40        50            60            70        80      
                         |         |         |         |         |             |             |         |      
T0086      ....SHPALT.QLRALRYCKEIPALDPQLLDWLLLEDSMTKRF-EQQGKTVSVTMIREGFv9eeLPLLPKESR....YWLREILLCADGEPWLAGRTVV
AF187880_4 mev.AYRFSQpHLEWNSYGHWRSSIAATQREWLFDRSSLTRRLRTLSDNEFEVIPLREAA....GPMLPEECRv9vtGWIREVYLAGFGRPWVYARSVI
T36851     113aFDAGAQ.ALRDRRDSRLRVAASMTIAEYLLPGWLVALRA-QLPDTAVSLLAGNSAAva..ERLLADDADl18iGHDRLIVVTAPGHPWARRRRPL


             90       100       110       120           130       140       150            
              |         |         |         |             |         |         |            
T0086      PVSTLSGPELALQKLGKTPLGRYLFTSSTLTRDFIEIGRD....AGLWGRRSRLRLSGKPLLLTELFLPASPLY-.....
AF187880_4 SHCDVEGSDSALLQLGNIPLGSLLFGENPYKRSEIEVCRYp11aYPLWARRSVFSRRQSRVLVHEMFLPALWEE-ls...
T36851     EAAELAATPLILREKGSGTRQVLDAALGGLARPLIELSSTt22gEELTTRRLVSVPVADVVLARD--LRAVWPT-g17a.


--------------08F2AE88E2CBC5CD7735F39B
Content-Type: text/plain; charset=us-ascii;
 name="t0086.pep"
Content-Disposition: inline;
 filename="t0086.pep"
Content-Transfer-Encoding: 7bit

;
; T0086 Chorismate lyase, E. coli
T0086
SHPALTQLRA LRYCKEIPAL DPQLLDWLLL EDSMTKRFEQ QGKTVSVTMI REGFVEQNEI
PEELPLLPKE SRYWLREILL CADGEPWLAG RTVVPVSTLS GPELALQKLG KTPLGRYLFT
SSTLTRDFIE IGRDAGLWGR RSRLRLSGKP LLLTELFLPA SPLY
; [AF187880_4]
; LOCUS       AF187880_4    182 aa                    BCT       07-OCT-1999
; DEFINITION  unknown [Pseudomonas sp. YH102].
; ACCESSION   AAF01449
; PID         g6014666
; VERSION     AAF01449.1  GI:6014666
; DBSOURCE    locus AF187880 accession AF187880.1
; KEYWORDS    .
; SOURCE      Pseudomonas sp. YH102.
;   ORGANISM  Pseudomonas sp. YH102
;             Bacteria; Proteobacteria; gamma subdivision; Pseudomonas group;
;             Pseudomonas.
; REFERENCE   1  (residues 1 to 182)
;   AUTHORS   Newman,L.M. and Zylstra,G.J.
;   TITLE     Analysis of genes for p-nitrobenzoate degradation from Pseudomonas
;             sp. strain YH102
;   JOURNAL   Unpublished
; REFERENCE   2  (residues 1 to 182)
;   AUTHORS   Newman,L.M. and Zylstra,G.J.
;   TITLE     Direct Submission
;   JOURNAL   Submitted (16-SEP-1999) Biotech Center for Agriculture & the
;             Environment, Cook College, Rutgers University, 59 Dudley Road, New
;             Brunswick, NJ 08901-8520, USA
; COMMENT     Method: conceptual translation supplied by author.
; FEATURES             Location/Qualifiers
;      source          1..182
;                      /organism="Pseudomonas sp. YH102"
;                      /strain="YH102"
;                      /db_xref="taxon:104926"
;      Protein         1..182
;                      /product="unknown"
;                      /name="orf2"
;      CDS             1..182
;                      /coded_by="AF187880.1:2680..3228"
;                      /transl_table=11
; ORIGIN      
AF187880_4
MEVAYRFSQPHLEWNSYGHWRSSIAATQREWLFDRSSLTRRLRTLSDNEFEVIPLREAAGPMLPEECRVL
GLQPGVTGWIREVYLAGFGRPWVYARSVISHCDVEGSDSALLQLGNIPLGSLLFGENPYKRSEIEVCRYP
DACNASSRPAYPLWARRSVFSRRQSRVLVHEMFLPALWEELS
; [T36851]
; LOCUS       T36851        325 aa                    BCT       07-DEC-1999
; DEFINITION  probable transcription regulator - Streptomyces coelicolor.
; ACCESSION   T36851
; PID         g7481539
; VERSION     T36851  GI:7481539
; DBSOURCE    pir: locus T36851;
;             summary: #length 325 #molecular-weight 33489 #checksum 5285;
;             genetic: #gene SCOEDB:SCI35.38c;
;             PIR dates: 03-Dec-1999 #sequence_revision 03-Dec-1999 #text_change
;             07-Dec-1999.
; KEYWORDS    .
; SOURCE      Streptomyces coelicolor.
;   ORGANISM  Streptomyces coelicolor
;             Bacteria; Firmicutes; Actinobacteria; Actinobacteridae;
;             Actinomycetales; Streptomycineae; Streptomycetaceae; Streptomyces.
; REFERENCE   1  (residues 1 to 325)
;   AUTHORS   Oliver,K., Harris,D., Parkhill,J., Barrell,B.G. and Rajandream,M.A.
;   TITLE     Direct Submission
;   JOURNAL   Submitted (??-SEP-1998) to the EMBL Data Library
; FEATURES             Location/Qualifiers
;      source          1..325
;                      /organism="Streptomyces coelicolor"
;                      /db_xref="taxon:1902"
;      Protein         1..325
;                      /product="probable transcription regulator"
; ORIGIN      
T36851
MGSGAGSSTNGGTGGGTEGGHDTRARQVAGSLAHRVPDLGAMELLLAVARLGSLGGAARELGITQPAASS
RIRSMERQLGVALVDRSPRGSRLTDAGALVTDWARRIVEAAEA
   FDAGAQALRDRRDSRLRVAASMTIAEY
   LLPGWLVALRAQLPDTAVSLLAGNSAAVAERLLADDADLGFVEGVSVPTGLDSAVIGHDRLIVVTAPGHP
   WARRRRPLEAAELAATPLILREKGSGTRQVLDAALGGLARPLIELSSTTAVKAAAVGGAGPSVLSELAVG
   EELTTRRLVSVPVADVVLARDLRAVWPT
GHRPTGPARQLLSLTRA
; [S72976]
; LOCUS       S72976        353 aa                    BCT       22-OCT-1999
; DEFINITION  hypothetical protein B229_C1_169 - Mycobacterium leprae.
; ACCESSION   S72976
; PID         g2145797
; VERSION     S72976  GI:2145797
; DBSOURCE    pir: locus S72976;
;             summary: #length 353 #molecular-weight 39114 #checksum 3066;
;             PIR dates: 19-Mar-1997 #sequence_revision 25-Apr-1997 #text_change
;             22-Oct-1999.
; KEYWORDS    .
; SOURCE      Mycobacterium leprae.
;   ORGANISM  Mycobacterium leprae
;             Bacteria; Firmicutes; Actinobacteria; Actinobacteridae;
;             Actinomycetales; Corynebacterineae; Mycobacteriaceae;
;             Mycobacterium.
; REFERENCE   1  (residues 1 to 353)
;   AUTHORS   Smith,D.R. and Robison,K.
;   TITLE     Direct Submission
;   JOURNAL   Submitted (??-NOV-1993) to the EMBL Data Library
; FEATURES             Location/Qualifiers
;      source          1..353
;                      /organism="Mycobacterium leprae"
;                      /db_xref="taxon:1769"
;      Protein         1..353
;                      /product="hypothetical protein B229_C1_169"
; ORIGIN      
; Saira: may be missing N-terminal 50 residues
;; S72976*
;; MKWSPVYAGGSPERTARPKPLTDDEIRRRRNGRPWLAGGTGLVPVATIVGALSLRSIFERDNACRDPYVD
;; RDFEKLGDERRCWVTISGGMALVVREEGAVKAPVAMVFAYGFYLRMDSFHFQRKRFGKRWGPQVRMVFYD
;OMIT; HCGHVQSSEVALDTYTLTQLGQDLRTVLQTVTPHGMIVLVGHSMEGILKSPALEAVRLTSRSASKLMHRG
;OMIT; SIASQSLIGPILRAASYSDLRVSRGLDAFSQRIMNDTLIAILVSFLHALELHEETAGLWPLLRVPALIAC
;OMIT; GDHDLLTSDERSRGMAAVLPLLALVIVSGASRLALLDKPGAINDGLVRLVNRAVPGKAALRYRRFKERLQ
;OMIT; RHG
; [CAB59810]
; LOCUS       CAB59810      441 aa                    BCT       29-OCT-1999
; DEFINITION  putative aminotransferase [Streptomyces coelicolor A3(2)].
; ACCESSION   CAB59810
; PID         g6165436
; VERSION     CAB59810.1  GI:6165436
; DBSOURCE    embl locus SCF62, accession AL121855.2
; KEYWORDS    .
; SOURCE      Streptomyces coelicolor A3(2).
;   ORGANISM  Streptomyces coelicolor A3(2)
;             Bacteria; Firmicutes; Actinobacteria; Actinobacteridae;
;             Actinomycetales; Streptomycineae; Streptomycetaceae; Streptomyces.
; REFERENCE   1  (residues 1 to 441)
;   AUTHORS   Redenbach,M., Kieser,H.M., Denapaite,D., Eichner,A., Cullum,J.,
;             Kinashi,H. and Hopwood,D.A.
;   TITLE     A set of ordered cosmids and a detailed genetic and physical map
;             for the 8 Mb Streptomyces coelicolor A3(2) chromosome
;   JOURNAL   Mol. Microbiol. 21 (1), 77-96 (1996)
;   MEDLINE   97000351
; REFERENCE   2  (residues 1 to 441)
;   AUTHORS   Murphy,L. and Harris,D.
;   JOURNAL   Unpublished
; REFERENCE   3  (residues 1 to 441)
;   AUTHORS   Thomson,N.R., Parkhill,J., Barrell,B.G. and Rajandream,M.A.
;   TITLE     Direct Submission
;   JOURNAL   Submitted (29-OCT-1999) Streptomyces coelicolor sequencing project,
;             Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge
;             CB10 1SA E-mail: barrell@sanger.ac.uk Cosmids supplied by Prof.
;             David A. Hopwood, [3] John Innes Centre, Norwich Research Park,
;             Colney, Norwich, Norfolk NR4 7UH, UK
; COMMENT     Notes:
;             Streptomyces coelicolor sequencing at The Sanger Centre is funded
;             by the BBSRC and Beowulf Genomics
;             Details of S. coelicolor sequencing at the Sanger Centre are
;             available on the World Wide Web.
;             (URL; http://www.sanger.ac.uk/Projects/S_coelicolor/) 
;             CDS are numbered using the following system eg SC7B7.01c. SC (S.
;             coelicolor), 7B7 (cosmid name), .01 (first CDS), c (complementary
;             strand).
;             The more significant matches with motifs in the PROSITE database
;             are also included but some of these may be fortuitous. 
;             The length in codons is given for each CDS.
;             Usually the highest scoring match found by fasta -o is given for
;             CDS which show significant similarity to other CDS in the database.
;             The position of possible ribosome binding site sequences are given
;             where these have been used to deduce the initiation codon. 
;             Gene prediction is based on positional base preference in codons
;             using a specially developed Hidden Markov Model (Krogh et al.,
;             Nucleic Acids Research, 22(22):4768-4778(1994)) and the FramePlot
;             program of Bibb et al., Gene 30:157-66(1984) as implemented at
;             http://www.nih.go.jp/
;             jun/cgi-bin/frameplot.pl. CAUTION:  We may not have predicted the
;             correct initiation codon.  Where possible we choose an initiation
;             codon (atg, gtg, ttg or (att)) which is preceded by an upstream
;             ribosome binding site sequence (optimally 5-13bp before the
;             initiation codon).  If this cannot be identified we choose the most
;             upstream initiation codon.
;             IMPORTANT: This sequence MAY NOT be the entire insert of the
;             sequenced clone.  It may be shorter because we only sequence
;             overlapping sections once, or longer, because we arrange for a
;             small overlap between neighbouring submissions. 
;             Cosmid F62 Lies on the AseI-F genomic restriction fragment.
; FEATURES             Location/Qualifiers
;      source          1..441
;                      /organism="Streptomyces coelicolor A3(2)"
;                      /strain="A3(2)"
;                      /db_xref="taxon:100226"
;                      /clone="cosmid F62"
;      Protein         1..441
;                      /product="putative aminotransferase"
;      CDS             1..441
;                      /gene="SCF62.27"
;                      /coded_by="AL121855.2:32366..33691"
;                      /transl_table=11
;                      /note="SCF62.27, possible aminotransferase, len: 441 aa.
;                      Similar to many including Pseudomonas aeruginosa
;                      SW:GSA_PSEAE (EMBL:X82072) glutamate-1-semialdehyde
;                      2,1-aminomutase (EC 5.4.3.8) (glutamate-1-semialdehyde
;                      aminotransferase) (427 aa), fasta scores opt:  475
;                      z-score: 540.8 E(): 9.5e-23 34.8% identity in 434 aa
;                      overlap and Streptomyces coelicolor TR:CAB39702 (EMBL:
;                      AL049485) probable aminotransferase SC6A5.18 (461 aa),
;                      fasta scores opt:  464 z-score: 527.9 E(): 4.9e-22 32.4%
;                      identity in 407 aa overlap. Contains 2xPfam matches to
;                      entry PF00202 aminotran_3, Aminotransferases class-III
;                      pyridoxal-phosphate."
; ORIGIN      
; Saira: may be missing C-terminal residues
;OMIT; CAB59810*
;OMIT; MNAEELGLPRSRQANERLHALVPGGAHTYAKGDDQYPENLAPVISHGRGAHVWDVDGNRYVEYGSGLRSV
;OMIT; SLGHAHPRVTEAVRRELDRGSNFVRPSIVEVDAAERFLATVPTAEMVKFAKNGSDATTAAVRLARAATGR
;OMIT; PRVAVCADHPFFSVDDWFIGTTPMSAGIPAATNELTVAFPYGDLAATEDLLARHEGEVACLILEPATHTE
;OMIT; PPPGYLAGLRELADRHGCVLVFDEMITGFRWSEAGAQGLYGVVPDLSTFGKALGNGFAVAALAGRRELME
;OMIT; LGGLRHSGDRVFLLSTTHGAETHALAAAMAVQGTYVEEGVTARLHALGDRLAAGVREAAASMGVGDHVVV
;; CAB59810*
;; RGRASNLVFATLDENGQPSQRYRTLFLRQLLAGGVLAPSFVVSSALGDADLDHTVDVVAEACAVYRKALD
;; AADPTPWMAGRPVKPVFRRLV

--------------08F2AE88E2CBC5CD7735F39B
Content-Type: text/plain; charset=us-ascii;
 name="t00861.a2m"
Content-Disposition: inline;
 filename="t00861.a2m"
Content-Transfer-Encoding: 7bit

>T0086, 337 bases, D5F233E3 checksum.
..................................................
..................................................
.............SHPALT.QLRALRYCKEIPALDPQLLDWLLLEDSMTK
RF-EQQGKTVSVTMIREGFveqneipeeLPLLPKESR.............
.....YWLREILLCADGEPWLAGRTVVPVSTLSGPELALQKLGKTPLGRY
LFTSSTLTRDFIEIGRD......................AGLWGRRSRLR
LSGKPLLLTELFLPASPLY-.................
>AF187880_4, 337 bases, D7CC0B61 checksum.
mev...............................................
..................................................
.............AYRFSQpHLEWNSYGHWRSSIAATQREWLFDRSSLTR
RLRTLSDNEFEVIPLREAA.........GPMLPEECRvlglqpgvt....
.....GWIREVYLAGFGRPWVYARSVISHCDVEGSDSALLQLGNIPLGSL
LFGENPYKRSEIEVCRYpdacnassrpa...........YPLWARRSVFS
RRQSRVLVHEMFLPALWEE-ls...............
>T36851, 337 bases, 49E624D5 checksum.
mgsgagsstnggtgggtegghdtrarqvagslahrvpdlgamelllavar
lgslggaarelgitqpaassrirsmerqlgvalvdrsprgsrltdagalv
tdwarriveaaeaFDAGAQ.ALRDRRDSRLRVAASMTIAEYLLPGWLVAL
RA-QLPDTAVSLLAGNSAAva.......ERLLADDADlgfvegvsvptgl
dsaviGHDRLIVVTAPGHPWARRRRPLEAAELAATPLILREKGSGTRQVL
DAALGGLARPLIELSSTtavkaaavggagpsvlselavgEELTTRRLVSV
PVADVVLARD--LRAVWPT-ghrptgparqllsltra

--------------08F2AE88E2CBC5CD7735F39B--
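
For reading the t00861.a2m attachment: in SAM's A2M convention, uppercase
letters and '-' occupy match columns, lowercase letters are insertions, and
'.' is padding that keeps the rows the same width. A minimal Python sketch of
pulling those apart follows. The function names are illustrative only and are
not part of SAM.

```python
def read_a2m(text):
    """Parse A2M text into {sequence_id: aligned_row}, joining wrapped lines."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # Header looks like ">T0086, 337 bases, D5F233E3 checksum."
            name = line[1:].split(",")[0].strip()
            chunks = []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def match_columns(row):
    """Keep only match-state characters: uppercase residues and '-' deletions."""
    return "".join(c for c in row if c.isupper() or c == "-")


def insert_count(row):
    """Count inserted residues (lowercase); '.' padding is not a residue."""
    return sum(1 for c in row if c.islower())
```

Every row of a well-formed A2M alignment should give a match_columns()
string of the same length, which makes a quick sanity check before comparing
residues column by column.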


From karplus@cse.ucsc.edu  Mon May 15 08:23:07 2000
Return-Path: <karplus@cse.ucsc.edu>
Date: Mon, 15 May 2000 08:23:05 -0700
From: Kevin Karplus <karplus@cse.ucsc.edu>
To: SMian@lbl.gov
CC: karplus@cse.ucsc.edu
In-reply-to: <391F5450.BE756769@lbl.gov> (message from Saira Mian on Sun, 14
	May 2000 18:35:12 -0700)
Subject: Re: CASP/t86


Saira, 

You should have access to all the /projects/compbio/experiments/casp4/t*
directories now.  You are in the protein group, and the directories
are now group-writable (they were only group-readable/executable before).
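
If it still fails, the group permission bits on a directory can be checked
with a short Python sketch like the one below. It assumes a Unix filesystem,
and group_permissions is just an illustrative helper name.

```python
import os
import stat


def group_permissions(path):
    """Report the group read/write/execute bits for a file or directory."""
    mode = os.stat(path).st_mode
    return {
        "read": bool(mode & stat.S_IRGRP),
        "write": bool(mode & stat.S_IWGRP),
        "execute": bool(mode & stat.S_IXGRP),
    }
```

For a directory, "execute" means permission to enter it and "write" means
permission to create or remove files in it.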

Let me know if you have any further problems.

The alignment you got from PSI-BLAST is a bit different than the one
from T2K, though both have essentially 3 sequences (T0086+two
others---the T2K alignment has some close duplicates).  Both include
AF187880_4, but the third sequence is different.
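
A minimal sketch of that kind of comparison, assuming both alignments are
available as A2M text with headers like ">T0086, 337 bases, ...". The helper
names are hypothetical, and any sequence IDs fed in for the T2K side would
have to come from the actual T2K file.

```python
def read_a2m_ids(text):
    """Return the sequence IDs from the '>' header lines of A2M text."""
    ids = []
    for line in text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split(",")[0].split()
            if fields:  # skip a bare '>' with no identifier
                ids.append(fields[0])
    return ids


def compare_id_sets(ids_a, ids_b):
    """Summarize which sequence IDs two alignments share or lack."""
    a, b = set(ids_a), set(ids_b)
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }
```

This only compares which sequences are present. It says nothing about
whether the shared sequences are aligned the same way.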

I'll look at the article you mention.


From karplus@cse.ucsc.edu  Wed May 17 12:25:48 2000
Return-Path: <karplus@cse.ucsc.edu>
Date: Wed, 17 May 2000 12:25:45 -0700
From: Kevin Karplus <karplus@cse.ucsc.edu>
To: saira@sanger.ac.uk
Cc: karplus@cse.ucsc.edu
Subject: T0086-2chsA


What do you think of the alignment of T0086 to 2chsA (see the notes in
pce/casp4/t86/README)?


From SMian@lbl.gov  Thu May 18 10:41:54 2000
Return-Path: <SMian@lbl.gov>
Sender: saira@lbl.gov
Date: Thu, 18 May 2000 10:41:51 -0700
From: Saira Mian <SMian@lbl.gov>
X-Accept-Language: en
To: karplus@cse.ucsc.edu
Subject: T0086: T0089 & T0090
Content-Type: text/plain; charset=us-ascii

Dear Kevin,

  Since T0089 and T0090 are each listed as having a "Homologous sequence
of known structure", I assume that both will be picked up by the
automated scanning of the library. Unless I hear otherwise, I won't
think about these two targets.

  I've created casp4/t86/saira which contains an archaeal sequence
(B69085) that could be a structural (not biochemical) homologue.

  Rather than adding to the README files (I don't know what the
etiquette for writing to them is and I'm not present during your
discussions with everyone else on the team), I'll send you e-mail.

  A reasonable rule of thumb is to look at structures/active sites, if
known, for enzymes that precede and follow the one of interest in the
pathway under consideration. In terms of T0086, the flanking enzymes
should recognise the substrate (chorismate) using similar types of
residues.

	-saira
-- 
I. Saira Mian
Life Sciences Division (Mail Stop 74-197)  E-mail: SMian@lbl.gov
Lawrence Berkeley National Laboratory      Tel:    (510) 486-6216
1 Cyclotron Road                           Fax:    (510) 486-6949
Berkeley, California 94720


From cline@cse.ucsc.edu  Thu May 18 13:47:59 2000
Return-Path: <cline@cse.ucsc.edu>
From: Melissa Cline <cline@cse.ucsc.edu>
Content-Type: text/plain; charset=us-ascii
Date: Thu, 18 May 2000 13:47:57 -0700 (PDT)
To: Kevin Karplus <karplus@cse.ucsc.edu>
Subject: T0086 tasks
In-Reply-To: <200005180028.RAA21406@purr.cse.ucsc.edu>


 > I think it would be valuable to build an active-site model of
 > chorismate mutase by running T2K on a structural alignment of 1ecmA
 > and 5csmA.  This alignment may have to be done with the Yale aligner,
 > not DALI or VAST, since the active sites are alignable, but the
 > scaffolding that supports them has different secondary structure elements.
 > 
 > Melissa, could you produce this structural alignment?
The Yale aligner seems to be blowing up on 1ecmA and 5csmA.  Would you 
like me to keep hunting for a different tool to align them?

Melissa


From karplus@cse.ucsc.edu  Thu May 18 13:53:06 2000
Return-Path: <karplus@cse.ucsc.edu>
Date: Thu, 18 May 2000 13:53:05 -0700
From: Kevin Karplus <karplus@cse.ucsc.edu>
To: cline@cse.ucsc.edu
Cc: karplus@cse.ucsc.edu
In-reply-to: <14628.22166.387111.409716@oink> (message from Melissa Cline on
	Thu, 18 May 2000 13:47:57 -0700 (PDT))
Subject: Re: T0086 tasks


Sigh, maybe we should have created our own iterated superposition
aligner. 

The problem may be coming from the interrupted chain in 5csmA.
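
One rough way to test that guess is to look for jumps in the residue
numbering of the CA atoms in the PDB file. The Python sketch below reads the
fixed ATOM-record columns directly. It is only a proxy: a numbering gap
suggests missing residues but does not prove a break, and insertion codes are
ignored. The function names and make_ca_line (which just builds demo input)
are illustrative.

```python
def ca_residue_numbers(pdb_lines, chain_id):
    """Return residue numbers of CA atoms for one chain, in file order."""
    numbers = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        # Fixed PDB columns: atom name 13-16, chain ID 22, residue number 23-26.
        if line[12:16].strip() == "CA" and line[21] == chain_id:
            resnum = int(line[22:26])
            # Skip repeats of the same residue (e.g. alternate locations).
            if not numbers or numbers[-1] != resnum:
                numbers.append(resnum)
    return numbers


def numbering_gaps(numbers):
    """Return (before, after) pairs where consecutive residue numbers jump."""
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b - a > 1]


def make_ca_line(serial, chain_id, resnum):
    """Build a minimal CA ATOM record for demonstration purposes only."""
    return "ATOM  %5d  CA  ALA %s%4d       0.000   0.000   0.000" % (
        serial, chain_id, resnum)
```

Running numbering_gaps(ca_residue_numbers(lines, "A")) on the lines of the
5csm file would list any such jumps in chain A.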

We have an fssp alignment, plus an extra helix from the Xue and
Lipscomb paper, so the Yale alignment is not critical.

Perhaps you should focus on T89 and T90 for a while, where the
probable templates are clearer.