
Summary of the different words found in the fields of the *.gff files from Sanger (Feb 2003), with a count of how many times each word was found
2nd field                    3rd field                        9th field, first word
45 different words found     67 different words found         19 different words found
Source               Count   Feature                  Count   Attribute             Count
""                 2441518   similarity             4675061   Target              4712848
wublastx           1535655   HOMOL_GAP              2369975   Sequence             415487
BLATX_NEMATODE      820422   repeat                  166584   Note                 149021
waba_weak           734467   exon                    137776   PCR_product           27281
BLAT_EST_BEST       561143   CDS                     135780   Homol_data            26819
curated             407317   intron                  117813   RNAi                  22570
waba_strong         321334   structural               26842   Microarray_aff        18454
BLAT_EST_OTHER      303997   Homol_data               26819   Expr_profile          17360
waba_coding         284716   experimental             25225   UTR                   16795
inverted             83592   Sequence                 25127   Allele                15374
hmmfs.3              62275   Microarray_aff           18454   Clone                  7715
tandem               54068   Expression               17360   Feature_data           3042
RepeatMasker         28347   UTR                      16795   Confirmed_by_EST       2557
GenePair_STS         26842   transcription            10354   Transcript             1837
BLAT_EMBL_OTHER      26819   ALLELE                    7687   Operon                  876
RNAi                 22570   SNP                       5921   Confirmed_by_cDNA       104
Expr_profile         17360   Clone_left_end            4358   Transposon               95
UTR                  16795   OLIGO                     4298   Confirmed_in_UTR         76
BLAT_mRNA_BEST       13454   Clone                     3472   Confirmed_as_FALSE       12
assembly_tag         10473   Clone_right_end           3357   -
BLASTN_TC1            9440   Feature_data              3042   -
Allele                7687   annotation                2830   -
BLAT_mRNA_OTHER       6087   Finished                  2516   -
Genomic_canonical     3266   Allele                    1536   -
cDNA_for_RNAi         2655   trans-splice_acceptor     1278   -
hmmfs                 2270   Transcript                 888   -
Pseudogene            2092   PCR_product                439   -
tRNAscan-SE-1.11      1538   oligo                      268   -
scan                  1380   Knockout_allele            230   -
BLAT_EMBL_BEST        1042   comment                    214   -
SL1                    988   compression                196   -
operon                 876   repeat_region              195   -
SL2                    290   cosmid                     138   -
RNA                    156   ambiguous                   90   -
snRNA                  116   miscellaneous               75   -
Transposon             100   misc_feature                47   -
miRNA                   76   LTR                         42   -
possible_error          64   Conflict                    38   -
mRNA                    30   Inverted                    35   -
hand_built              24   Comment                     25   -
Link                    24   Direct                      24   -
rRNA                    12   stop                        21   -
misc_feature            11   ""                          18   -
possible_exon            8   ignore                      17   -
TSL_site                 5   Alu                         15   -
-                            Polymorphism                14   -
-                            repeat_unit                 12   -
-                            TeamLeader                  11   -
-                            Stolen                       9   -
-                            CpG                          9   -
-                            resolved                     9   -
-                            Warning                      9   -
-                            Finisher                     8   -
-                            polyA_site                   8   -
-                            misc_signal                  7   -
-                            DSTM                         7   -
-                            unknown                      6   -
-                            Consensus                    6   -
-                            final                        3   -
-                            Operon                       1   -
-                            Masked                       1   -
-                            sequencing                   1   -
-                            Clooe                        1   -
-                            polymorphism                 1   -
-                            Possible_frameshift          1   -
-                            polyA_signal                 1   -
-                            misc_structure               1   -

The entries in the above table were created by the Perl script gff.pl. It is pretty slow (~20 minutes on a 400 MHz PII), but it gets the job done.
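The gff.pl script itself is not reproduced on this page. As a rough illustration of the kind of tally the table reports, here is a minimal Python sketch (not the original Perl). It assumes the GFF lines are tab-separated, that lines starting with "#" are comments, and that the "first word" of the 9th field is whatever precedes the first whitespace. The function name count_gff_words is hypothetical.

```python
from collections import Counter


def count_gff_words(lines):
    """Tally the 2nd field, the 3rd field, and the first word of the 9th field.

    `lines` is any iterable of GFF text lines (for example an open file).
    Returns three Counters: (source, feature, attribute).
    """
    source, feature, attribute = Counter(), Counter(), Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        # Assumption: blank lines and '#' comment lines carry no features.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Assumption: a usable line has at least the 8 mandatory GFF columns.
        if len(fields) < 8:
            continue
        source[fields[1]] += 1
        feature[fields[2]] += 1
        # The 9th field is optional; count its first word only when present.
        if len(fields) >= 9:
            words = fields[8].split()
            attribute[words[0] if words else ""] += 1
    return source, feature, attribute


def format_counts(counter):
    """Return 'word<TAB>count' lines, most frequent word first."""
    return [f"{word}\t{count}" for word, count in counter.most_common()]
```

Calling count_gff_words on each open *.gff file and adding the returned Counters together would give per-column totals, and format_counts turns each total into rows ordered by descending count, as in the table above. The sketch counts whole field values, so its numbers need not match those produced by gff.pl.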
This page last updated: Tuesday, 30-Mar-2010 12:06:43 PDT.