The following is from: /projects/worm/idb.future/build.doc
This describes how this directory was made. This directory contains
a version of the Intronerator database updated with available sequence
data as of March 16, 2001.
0) Set up environment variable:
setenv JKWEB ~zahler/.html/cgi-bin
(You'll change this back to ~/.html/cgi-bin when all done.)
1) Make some of the basic directory structure as so:
cd /projects/worm
mkdir idb.future
cd idb.future
mkdir cDNA
mkdir ea
mkdir nt4
mkdir xeno
mkdir features
mkdir features/sanger
mkdir ra
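The directory layout above could also be created in one pass. What follows is a minimal sketch for a Bourne-style shell (sh or bash, not the csh used for setenv in step 0). The function name make_idb_dirs and its root argument are illustrative and not part of the original build.

```sh
# Create the Intronerator directory skeleton under a given root.
# make_idb_dirs is a hypothetical helper; the directory names come from step 1.
make_idb_dirs() {
    root=$1
    for d in cDNA ea nt4 xeno features features/sanger ra; do
        # -p creates missing parents and succeeds if the directory already exists
        mkdir -p "$root/$d" || return 1
    done
}

# Example: make_idb_dirs /projects/worm/idb.future
```

Because mkdir -p does not fail on existing directories, the helper can be rerun safely after a partial setup.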
2) Download Sanger chromosomes and annotations by:
cd /projects/worm
mkdir sanger_2000_03_26
cd sanger_2000_03_26
ftp ftp.sanger.ac.uk
ftp> cd pub/C.elegans_sequences/CHROMOSOMES/CURRENT_RELEASE
ftp> prompt
ftp> mget *
ftp> quit
ln -s ../sanger_2000_03_26 ../idb.future/sanger
gunzip *.gz
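Before moving on, it may be worth confirming that the download and gunzip produced every chromosome file that later steps read. A small sketch in Bourne-style shell follows. The helper name missing_chromosomes is hypothetical, and the file names are taken from the fatont4 commands in step 4 rather than from the FTP listing itself.

```sh
# List any expected chromosome sequence files that are missing or empty.
# missing_chromosomes is a hypothetical helper; the CHROMOSOME_*.dna names
# are the ones used by the fatont4 commands in step 4.
missing_chromosomes() {
    dir=$1
    for c in I II III IV V X; do
        f="$dir/CHROMOSOME_$c.dna"
        # -s is true only for files that exist and have non-zero size
        [ -s "$f" ] || echo "$f"
    done
}

# Example: missing_chromosomes /projects/worm/sanger_2000_03_26
# Prints nothing when all six files are present.
```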
3) Go to http://www.ncbi.nlm.nih.gov/entrez and enter the following in
the search box:
"Caenorhabditis elegans" [org] AND "mRNA" [mol]
Choose "genbank" as the display format and click "save". Put the
resulting file in /projects/worm/idb.future/cDNA/allcdna.gb
Then convert it to Intronerator format with
gb2cdi allcdna.gb allcdna.fa allcdna.cdi
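A quick sanity check after the conversion is to compare record counts in the input and output. The sketch below (Bourne-style shell) rests on two assumptions that this file does not confirm: that allcdna.fa is a FASTA file with one ">" header per sequence, and that gb2cdi is expected to emit one sequence per GenBank record. The helper name check_cdna_counts is hypothetical.

```sh
# Compare the number of GenBank records with the number of FASTA records.
# check_cdna_counts is a hypothetical helper. GenBank flat-file records each
# begin with a LOCUS line; the one-to-one mapping to ">" headers is an assumption.
check_cdna_counts() {
    gb=$1
    fa=$2
    n_gb=$(grep -c '^LOCUS' "$gb")
    n_fa=$(grep -c '^>' "$fa")
    echo "genbank=$n_gb fasta=$n_fa"
    # Exit status is 0 only when the two counts match
    [ "$n_gb" -eq "$n_fa" ]
}

# Example: check_cdna_counts allcdna.gb allcdna.fa || echo "counts differ"
```

A mismatch is not necessarily an error (the converter may skip some records on purpose), so treat it as a prompt to look closer rather than a hard failure.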
4) Create NT4 files in linux/alpha format by logging into a linux
or alpha machine and doing:
cd /projects/worm/sanger_2000_03_26
fatont4 CHROMOSOME_I.dna ../idb.future/nt4/i.nt4
fatont4 CHROMOSOME_II.dna ../idb.future/nt4/ii.nt4
fatont4 CHROMOSOME_III.dna ../idb.future/nt4/iii.nt4
fatont4 CHROMOSOME_IV.dna ../idb.future/nt4/iv.nt4
fatont4 CHROMOSOME_V.dna ../idb.future/nt4/v.nt4
fatont4 CHROMOSOME_X.dna ../idb.future/nt4/x.nt4
cp ../idb/nt4/M.dna.gz .
gunzip M.dna.gz
fatont4 M.dna ../idb.future/nt4/m.nt4
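The repeated fatont4 lines follow one pattern: the chromosome name is lowercased to form the output name. As a sketch (Bourne-style shell), the commands can be generated rather than typed. The helper fatont4_commands is hypothetical and only prints the commands, so nothing runs until its output is piped to sh.

```sh
# Print the fatont4 command for each chromosome (dry run).
# fatont4_commands is a hypothetical helper; pipe its output to sh to execute.
# M.dna must already have been copied and gunzipped as shown in step 4.
fatont4_commands() {
    out_dir=$1
    for c in I II III IV V X; do
        lower=$(echo "$c" | tr '[:upper:]' '[:lower:]')
        echo "fatont4 CHROMOSOME_$c.dna $out_dir/$lower.nt4"
    done
    echo "fatont4 M.dna $out_dir/m.nt4"
}

# Example:
#   cd /projects/worm/sanger_2000_03_26
#   fatont4_commands ../idb.future/nt4 | sh
```

Printing first makes it easy to review the seven commands before running them on the target machine.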
5) Start the cDNA alignments as so:
log onto cc80
cd /projects/worm/idb.future/ea
exonAli starting 0.out ../cDNA/allmrna.fa ../nt4 0 10000
log onto cc81
cd /projects/worm/idb.future/ea
exonAli starting 10000.out ../cDNA/allmrna.fa ../nt4 10000 20000
log onto cc82
cd /projects/worm/idb.future/ea
exonAli starting 30000.out ../cDNA/allmrna.fa ../nt4 30000 20000
log onto cc83
cd /projects/worm/idb.future/ea
exonAli starting 50000.out ../cDNA/allmrna.fa ../nt4 50000 20000
log onto cc84
cd /projects/worm/idb.future/ea
exonAli starting 70000.out ../cDNA/allmrna.fa ../nt4 70000 20000
log onto cc85
cd /projects/worm/idb.future/ea
exonAli starting 90000.out ../cDNA/allmrna.fa ../nt4 90000 30000
Wait for them all to finish and then
cat 0.out ??000.out > all.out
Alternatively, using codine, do:
log onto cc00
cd /projects/worm/idb.future/ea
cp /projects/worm/idb/ea/*.sh .
source qsubEa.sh
wait for them all to finish and then
cat ??000.out > all.out
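The per-machine commands in step 5 split the cDNA set into ranges, and a gap or overlap between ranges is easy to introduce by hand. The sketch below (Bourne-style shell) prints the commands from a list of START:COUNT pairs and refuses to continue if the ranges are not contiguous. It assumes the last two exonAli arguments mean "start index" and "number of cDNAs", which matches the numbers above but is not stated in this file. The helper name exonali_commands is hypothetical.

```sh
# Print one exonAli command per chunk and check that the chunks are contiguous.
# exonali_commands is a hypothetical helper. Each argument is START:COUNT,
# an assumed reading of the last two exonAli arguments in step 5.
exonali_commands() {
    expected=0
    for chunk in "$@"; do
        start=${chunk%%:*}
        count=${chunk##*:}
        if [ "$start" -ne "$expected" ]; then
            echo "gap or overlap: chunk starts at $start, expected $expected" >&2
            return 1
        fi
        echo "exonAli starting $start.out ../cDNA/allmrna.fa ../nt4 $start $count"
        expected=$((start + count))
    done
    # Final line reports where the last chunk ends
    echo "# total cDNAs covered: $expected"
}

# The ranges used in step 5:
#   exonali_commands 0:10000 10000:20000 30000:20000 50000:20000 70000:20000 90000:30000
```

With the ranges from step 5 the chunks run from 0 up to 120000, so any cDNA set larger than that would need an additional chunk.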
6) Start cross-species alignments as so:
cd /projects/worm/idb.future/xeno
ls -1 /projects/worm/sanger_2000_03_26/*.dna > elegans.lst
ls -1 /projects/worm/cbriggsae/*/*/*.seq > briggsae.lst
waba all briggsae.lst elegans.lst cbVsCe.wab
Come back in about 2 weeks. (Or split the job
across many machines using hg/conJobs/wabaJobs, for
which the source is lost, argh! You'll have to
tweak it to run *all*elegans*at*once* against some
briggsae cosmids in each job.)
When done do:
cd /projects/worm/idb.future/xeno
mkdir cbriggsae
wabToSt cbriggsae/all.st wabaCon/wab/*
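Since the wabaJobs source is lost, one way to approximate the split described above is to cut briggsae.lst into chunks and run each chunk against the full elegans list. The sketch below (Bourne-style shell) is not the original wabaJobs program. The helper name waba_split_commands, the chunk prefix, and the output file names are all hypothetical, and it only prints the waba commands so they can be handed to whatever job system is available.

```sh
# Split a list of briggsae sequence files into fixed-size chunks and print
# one waba command per chunk, each run against the full elegans list.
# waba_split_commands and the chunk/output names are hypothetical.
waba_split_commands() {
    list=$1          # e.g. briggsae.lst
    per_job=$2       # number of briggsae files per job
    prefix=$3        # e.g. chunk.
    # split writes prefix + two-letter suffix (aa, ab, ...) by default
    split -l "$per_job" "$list" "$prefix" || return 1
    for part in "$prefix"??; do
        echo "waba all $part elegans.lst $part.wab"
    done
}

# Example: waba_split_commands briggsae.lst 50 chunk.
```

The resulting .wab files would then need to be collected wherever the wabToSt command expects them (wabaCon/wab/ in the command above).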
7) Get latest gene name/ORF name mapping info from Lincoln Stein. Put
his table in /projects/worm/idb.future/features/orf2gene.txt. Then
makeOrf2gene orf2gene.txt orf2gene sanger/syn
to create Intronerator version.
8) Make Sun format NT4 files:
cd /projects/worm/idb.future
mv nt4 alphaNt4
mkdir nt4
ssh apache
cd /projects/worm/idb.future/sanger
fatont4 CHROMOSOME_I.dna ../idb.future/nt4/i.nt4
fatont4 CHROMOSOME_II.dna ../idb.future/nt4/ii.nt4
fatont4 CHROMOSOME_III.dna ../idb.future/nt4/iii.nt4
fatont4 CHROMOSOME_IV.dna ../idb.future/nt4/iv.nt4
fatont4 CHROMOSOME_V.dna ../idb.future/nt4/v.nt4
fatont4 CHROMOSOME_X.dna ../idb.future/nt4/x.nt4
fatont4 M.dna ../idb.future/nt4/m.nt4
9) Process the big Sanger GFFs into a form we can use as follows:
ssh apache
cd /projects/worm/idb.future
makec2c sanger features/c2c /projects/worm/idb/features/c2c
cd sanger
gffgenes /projects/worm/idb.future/features/sanger/c2g /projects/worm/idb.future/features/sanger/genes.gdf
cd /projects/worm/idb.future
cp features/sanger/c2g features
10) Finish it up with
ssh apache
cd /projects/worm/idb.future
make
This page last updated: Tuesday, 30-Mar-2010 12:06:25 PDT.