The following is from: /projects/worm/idb.future/build.doc
This describes how this directory was made. This directory contains
a version of the Intronerator database updated with the sequence data
available as of March 16, 2001.

0) Set up the environment variable:
     setenv JKWEB ~zahler/.html/cgi-bin
   (Change this back to ~/.html/cgi-bin when all done.)
1) Create the basic directory structure:
     cd /projects/worm
     mkdir idb.future
     cd idb.future
     mkdir cDNA
     mkdir ea
     mkdir nt4
     mkdir xeno
     mkdir features
     mkdir features/sanger
     mkdir ra
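The same layout can be created in one call with mkdir -p, which also creates features as the parent of features/sanger and does not fail if a directory already exists, so it is safe to rerun. This is a minimal POSIX sh sketch; make_idb_dirs is a hypothetical helper, not one of the Intronerator tools:

```shell
# Hypothetical helper (not an Intronerator tool): build the idb.future
# directory skeleton from step 1 under the directory given as $1.
make_idb_dirs() {
    root=$1
    # mkdir -p creates missing parents (features for features/sanger)
    # and succeeds if the directory already exists.
    for d in cDNA ea nt4 xeno features/sanger ra; do
        mkdir -p "$root/$d" || return 1
    done
}
```

For example: make_idb_dirs /projects/worm/idb.future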
2) Download the Sanger chromosomes and annotations:
     cd /projects/worm
     mkdir sanger_2000_03_26
     cd sanger_2000_03_26
     ftp ftp.sanger.ac.uk
        ftp> cd pub/C.elegans_sequences/CHROMOSOMES/CURRENT_RELEASE
        ftp> prompt
        ftp> mget *
        ftp> quit
     ln -s ../sanger_2000_03_26 ../idb.future/sanger
     gunzip *.gz

3) Go to http://www.ncbi.nlm.nih.gov/entrez and enter the following in
   the search box:
	"Caenorhabditis elegans" [org] AND "mRNA" [mol]
   Choose "GenBank" as the display format and save the results.  Put the
   resulting file in /projects/worm/idb.future/cDNA/allcdna.gb.
   Then convert it to Intronerator format with
        cd /projects/worm/idb.future/cDNA
        gb2cdi allcdna.gb allcdna.fa allcdna.cdi
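A quick sanity check after the conversion is to compare record counts: every GenBank flat-file entry starts with a LOCUS line and every FASTA record starts with a '>' header. This POSIX sh sketch assumes (not confirmed by these notes) that gb2cdi writes one FASTA record per GenBank entry; the helper names are hypothetical:

```shell
# Hypothetical helpers for checking the gb2cdi conversion.
# Assumption: gb2cdi emits one FASTA record per GenBank entry.
count_gb_records() {
    # Each GenBank flat-file entry begins with a LOCUS line.
    grep -c '^LOCUS' "$1"
}

count_fa_records() {
    # Each FASTA record begins with a '>' header line.
    grep -c '^>' "$1"
}

# Print both counts; succeed only if they match.
check_cdna_counts() {
    gb=$(count_gb_records "$1")
    fa=$(count_fa_records "$2")
    echo "GenBank entries: $gb  FASTA records: $fa"
    [ "$gb" -eq "$fa" ]
}
```

For example: check_cdna_counts allcdna.gb allcdna.fa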

4) Create NT4 files in Linux/Alpha format by logging into a Linux
   or Alpha machine and running:
      cd /projects/worm/sanger_2000_03_26
      fatont4 CHROMOSOME_I.dna ../idb.future/nt4/i.nt4
      fatont4 CHROMOSOME_II.dna ../idb.future/nt4/ii.nt4
      fatont4 CHROMOSOME_III.dna ../idb.future/nt4/iii.nt4
      fatont4 CHROMOSOME_IV.dna ../idb.future/nt4/iv.nt4
      fatont4 CHROMOSOME_V.dna ../idb.future/nt4/v.nt4
      fatont4 CHROMOSOME_X.dna ../idb.future/nt4/x.nt4
      cp ../idb/nt4/M.dna.gz .
      gunzip M.dna.gz
      fatont4 M.dna ../idb.future/nt4/m.nt4
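The seven fatont4 calls follow one naming pattern (CHROMOSOME_III.dna becomes iii.nt4, and the mitochondrial M.dna becomes m.nt4), so they can be generated with a loop. A minimal POSIX sh sketch; nt4_commands is a hypothetical helper that only prints the commands (a dry run), and the destination directory is an optional argument:

```shell
# Hypothetical helper: print the fatont4 commands from step 4.
# It does not run them; pipe the output to sh to execute.
nt4_commands() {
    dest=${1:-../idb.future/nt4}
    for chrom in I II III IV V X; do
        # Output names are lower case: CHROMOSOME_III.dna -> iii.nt4
        lower=$(printf '%s' "$chrom" | tr '[:upper:]' '[:lower:]')
        echo "fatont4 CHROMOSOME_$chrom.dna $dest/$lower.nt4"
    done
    # The mitochondrial sequence file is M.dna, not CHROMOSOME_M.dna.
    echo "fatont4 M.dna $dest/m.nt4"
}
```

For example, after copying and unzipping M.dna as above: cd /projects/worm/sanger_2000_03_26; nt4_commands | sh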

5) Start the cDNA alignments:
       log onto cc80
       cd /projects/worm/idb.future/ea
       exonAli starting 0.out ../cDNA/allmrna.fa ../nt4 0 10000
       log onto cc81
       cd /projects/worm/idb.future/ea
       exonAli starting 10000.out ../cDNA/allmrna.fa ../nt4 10000 20000
       log onto cc82
       cd /projects/worm/idb.future/ea
       exonAli starting 30000.out ../cDNA/allmrna.fa ../nt4 30000 20000
       log onto cc83
       cd /projects/worm/idb.future/ea
       exonAli starting 50000.out ../cDNA/allmrna.fa ../nt4 50000 20000
       log onto cc84
       cd /projects/worm/idb.future/ea
       exonAli starting 70000.out ../cDNA/allmrna.fa ../nt4 70000 20000
       log onto cc85
       cd /projects/worm/idb.future/ea
       exonAli starting 90000.out ../cDNA/allmrna.fa ../nt4 90000 30000
   Wait for them all to finish, then
       cat 0.out ??000.out > all.out

   Alternatively, using Codine:
       log onto cc00
       cd /projects/worm/idb.future/ea
       cp /projects/worm/idb/ea/*.sh .
       source qsubEa.sh
   Wait for them all to finish, then
       cat ??000.out > all.out
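In the manual split above, each job's start equals the previous start plus the previous count, which suggests (an inference, not stated in these notes) that the last two exonAli arguments are a starting cDNA index and a number of cDNAs. Under that assumption, this POSIX sh sketch prints the job commands from a list of counts, plus a concatenation that names every output file explicitly in start order; a glob such as ??000.out does not match 0.out. exonali_jobs is a hypothetical helper and only prints commands:

```shell
# Hypothetical helper. Assumption: exonAli's last two arguments are
# (starting cDNA index, number of cDNAs). Prints one exonAli command
# per count given, then a cat that merges all outputs in start order.
exonali_jobs() {
    start=0
    outs=""
    for count in "$@"; do
        echo "exonAli starting $start.out ../cDNA/allmrna.fa ../nt4 $start $count"
        outs="$outs $start.out"
        start=$((start + count))
    done
    # List every output explicitly so none is skipped (e.g. 0.out).
    echo "cat$outs > all.out"
}
```

For example, exonali_jobs 10000 20000 20000 20000 20000 30000 reproduces the six commands run on cc80 through cc85, followed by the cat line.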

6) Start the cross-species alignments:
       cd /projects/worm/idb.future/xeno
       ls -1 /projects/worm/sanger_2000_03_26/*.dna > elegans.lst
       ls -1 /projects/worm/cbriggsae/*/*/*.seq > briggsae.lst
       waba all briggsae.lst elegans.lst cbVsCe.wab
   Come back in about two weeks.  (Or split the job
   across many machines using hg/conJobs/wabaJobs, whose
   source has been lost, argh!  You'll have to tweak it
   to run *all* elegans *at once* against a few
   briggsae cosmids in each job.)

   When done, do:
       cd /projects/worm/idb.future/xeno
       mkdir cbriggsae
       wabToSt cbriggsae/all.st wabaCon/wab/*
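Since the wabaJobs source is lost, here is one possible stand-in, offered only as a sketch: it splits briggsae.lst into chunks of N files with split -l and prints one waba command per chunk, each aligning that chunk against the whole of elegans.lst, reusing the argument order of the single waba command above. waba_jobs, the lists/ subdirectory, and the chunk*.wab output names are all hypothetical; submitting the printed commands to machines is left to whatever job system is at hand:

```shell
# Hypothetical replacement for the lost wabaJobs generator.
# $1 = list of briggsae files, $2 = files per job, $3 = output directory.
waba_jobs() {
    list=$1 per_job=$2 outdir=$3
    mkdir -p "$outdir/lists" "$outdir/wab" || return 1
    # split -l writes chunkaa, chunkab, ... with $per_job lines each.
    split -l "$per_job" "$list" "$outdir/lists/chunk" || return 1
    for chunk in "$outdir"/lists/chunk*; do
        [ -e "$chunk" ] || continue    # empty input: no chunks
        name=$(basename "$chunk")
        # Each job aligns one briggsae chunk against *all* of elegans.
        echo "waba all $chunk elegans.lst $outdir/wab/$name.wab"
    done
}
```

For example, run from the xeno directory, waba_jobs briggsae.lst 50 wabaCon would place the .wab outputs under wabaCon/wab/, where the wabToSt command above reads them.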

7) Get the latest gene name/ORF name mapping from Lincoln Stein.  Put
   his table in /projects/worm/idb.future/features/orf2gene.txt.  Then run
       cd /projects/worm/idb.future/features
       makeOrf2gene orf2gene.txt orf2gene sanger/syn
   to create the Intronerator version.

8) Make Sun-format NT4 files:
      cd /projects/worm/idb.future
      mv nt4 alphaNt4
      mkdir nt4
      ssh apache
      cd /projects/worm/idb.future/sanger
      fatont4 CHROMOSOME_I.dna ../idb.future/nt4/i.nt4
      fatont4 CHROMOSOME_II.dna ../idb.future/nt4/ii.nt4
      fatont4 CHROMOSOME_III.dna ../idb.future/nt4/iii.nt4
      fatont4 CHROMOSOME_IV.dna ../idb.future/nt4/iv.nt4
      fatont4 CHROMOSOME_V.dna ../idb.future/nt4/v.nt4
      fatont4 CHROMOSOME_X.dna ../idb.future/nt4/x.nt4
      fatont4 M.dna ../idb.future/nt4/m.nt4
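Before moving on, it can help to confirm that all seven .nt4 files were written. This POSIX sh sketch only checks that each file exists and is non-empty; it cannot tell Sun-format from Alpha-format files. check_nt4 is a hypothetical helper, and the file names are the ones used in steps 4 and 8:

```shell
# Hypothetical helper: report any .nt4 file that is missing or empty.
# Checks presence and size only, not byte order or format.
check_nt4() {
    dir=${1:-/projects/worm/idb.future/nt4}
    status=0
    for f in i ii iii iv v x m; do
        if [ ! -s "$dir/$f.nt4" ]; then
            echo "missing or empty: $dir/$f.nt4" >&2
            status=1
        fi
    done
    return $status
}
```

For example: check_nt4 /projects/worm/idb.future/nt4 && echo all present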

9) Process the big Sanger GFF files into a form we can use:
       ssh apache
       cd /projects/worm/idb.future
       makec2c sanger features/c2c /projects/worm/idb/features/c2c
       cd sanger
       gffgenes /projects/worm/idb.future/features/sanger/c2g /projects/worm/idb.future/features/sanger/genes.gdf
       cd /projects/worm/idb.future
       cp features/sanger/c2g features

10) Finish it up with
       ssh apache
       cd /projects/worm/idb.future
       make


This page last updated: Tuesday, 30-Mar-2010 12:06:25 PDT.