Running Improbizer and Motif Matcher from the Command Line

Though Improbizer was originally designed as a Web CGI program, on large data sets it takes long enough to run that it is sometimes necessary to run it from the command line. The Improbizer executable is called ameme, and the same executable serves both the Improbizer and the Motif Matcher web pages. The format of the command line reflects Improbizer's CGI origins. In general, command line options are of the form:

cgiVar=someValue

The program writes out an html file to standard output, and also creates a gif file and a temporary file with the suffix .pfl. Normally you'll want to redirect the output to a file, and then view that file using the "Open File" or "Open" option of your web browser. A simple example of a command line would be:

ameme good=foreground.fa bad=background.fa numMotifs=4 >motifs.html
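
For scripted or repeated runs, the same command line can be assembled programmatically. The following is a minimal Python sketch, not part of Improbizer itself. The helper names ameme_args and run_ameme are hypothetical, and the sketch assumes that ameme is on your PATH and that it exits with status 0 on success.

```python
import subprocess


def ameme_args(options, program="ameme"):
    """Turn a dict of options into ['ameme', 'name=value', ...].

    Hypothetical helper: True becomes 'on' (for switches such as rcToo),
    while False or None drops the option so the program default applies.
    """
    args = [program]
    for name, value in options.items():
        if value is True:
            value = "on"
        elif value is False or value is None:
            continue
        args.append(f"{name}={value}")
    return args


def run_ameme(options, html_path):
    """Run ameme and save the HTML it writes to standard output.

    Assumes 'ameme' is on PATH and returns exit status 0 on success;
    check=True raises CalledProcessError otherwise.
    """
    with open(html_path, "w") as out:
        subprocess.run(ameme_args(options), stdout=out, check=True)
```

Calling run_ameme({"good": "foreground.fa", "bad": "background.fa", "numMotifs": 4}, "motifs.html") would be equivalent to the command line above. Keeping the options in a dictionary makes it easy to vary one setting at a time between runs.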

To invoke Motif Matcher rather than Improbizer include

motifMatcher=on

in the command line. A simple example of a Motif Matcher command line would be:

ameme motifMatcher=on seqFile=data.fa motifs=motifFile maxOcc=3 >some.html

The order of arguments in the command line does not matter. Below is a table which lists all of the current command line options for Improbizer, whether they are required, and their default values. Following this is the corresponding table for Motif Matcher.

Improbizer Command Line Arguments

Var. Name Description
good Name of foreground sequence file. Required. A fasta (.fa) format file containing DNA or RNA sequences that you suspect share a motif or three. Example:
good=immunePromoters.fa
bad Background sequence file name. Highly recommended, especially for 1st and 2nd order Markov background models. This file should ideally contain a large number of sequences that are in most ways like the "good" sequences, but that lack the motif you're looking for (or at least do not have high levels of it). If you don't supply this file, the background model will be created from the foreground sequences. Example:
bad=mousePromoters.fa
ignoreLocation Controls whether the position of a motif is considered important. By default position is considered. To change this include in the command line: ignoreLocation=on
numMotifs The number of motifs to look for. By default this is 2. To look for only one, use: numMotifs=1
maxOcc The maximum number of times you expect a single motif to occur in a sequence. Default is 1.
rcToo Set rcToo=on if you want to search both strands for motifs.
tileSize This sets the initial size (in nucleotides) of a motif. Generally motifs will grow and shrink to fit the data, but if you have some idea of the size you expect it can help to set this explicitly with something like: tileSize=13. By default tileSize is 7.
constrainer This controls the tendency of the motif size to grow. Set constrainer=1000 if you wish the motif to stay at tileSize. Set to zero for unconstrained growth (which is often not a bad thing on large data sets). The default value of 1.0 mildly constrains motif size.
leftAlign If your sequences aren't all the same size, the shorter ones are padded so that the right ends all line up. If you set leftAlign=on then instead they'll be padded so that the left ends all line up.
startScanLimit This sets how many sequences are scanned for initial motifs. By default it is 20. Doubling this to 40 will make the program take nearly twice as long to run, but occasionally will result in a better motif.
background This controls the background (null) model. Possible values are:
even - each base has a 25% chance
m0 - (Markov 0) Base probability depends on how many of that base are in background.
m1 - (Markov 1) Base probability depends on base before.
m2 - (Markov 2) Base probability depends on previous two bases.
coding - Three interleaved Markov 2 models, one for each frame of codon.
By default background=m0
motifOutput If this is set, then in addition to the .html and .gif files the program will create a simple text file containing the motifs. Example:
motifOutput=splicingMotifs.txt
controlRun If you include controlRun=on in the command line, a random set of sequences will be generated that match your foreground data set in size, and your background data set in nucleotide probabilities. The program will then look for motifs in this random set. If the scores you get in a real run are about the same as those you get in a control run, then the motifs Improbizer has found are probably not significant.
html Where to put html output (by default goes to standard output). Example:
html=run1.html
gif Where to put gif output (by default goes to a cryptically named file). Example:
gif=run1.gif
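
To make the m0 background model and the controlRun option more concrete, here is a minimal Python sketch of the idea as described in the table above. It is an illustration only and not Improbizer's actual implementation. The function names base_frequencies and control_sequences are hypothetical, and the sketch assumes sequences are already loaded as plain strings and that any character other than A, C, G, or T is ignored when counting.

```python
import random
from collections import Counter


def base_frequencies(seqs):
    """Markov 0 style estimate: fraction of each base among a, c, g, t.

    Assumption: characters other than a/c/g/t (such as n) are skipped.
    """
    counts = Counter()
    for s in seqs:
        counts.update(b for b in s.lower() if b in "acgt")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    return {b: counts[b] / total for b in "acgt"}


def control_sequences(foreground, background, seed=None):
    """Random sequences with the foreground's lengths and the
    background's base frequencies (an illustrative control set)."""
    rng = random.Random(seed)
    freqs = base_frequencies(background)
    bases = list(freqs)
    weights = [freqs[b] for b in bases]
    return [
        "".join(rng.choices(bases, weights=weights, k=len(s)))
        for s in foreground
    ]
```

A control set built this way contains no real motif by construction, which is why scores from a control run give a useful baseline for judging scores from a real run.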

Motif Matcher Command Line Arguments

Var. Name Description
motifMatcher Tells the program to search for predefined motifs in the input sequences rather than to find new motifs. Required for the program to behave as Motif Matcher rather than as Improbizer. Example:
motifMatcher=on
motifs A file containing the motifs. This can be either a file you've gotten from using the motifOutput option with Improbizer, or any file containing one or more motifs as described in the Motif Matcher help.
hits Where to put motif hits in a simple tab-delimited format. The columns are: motif# score sequence position
Example: hits=sl1.txt
good Fasta format file containing sequences to scan for motifs. Example: good=immunePromoters.fa
bad Background sequence file name. Highly recommended, especially for 1st and 2nd order Markov background models. This file should ideally contain a large number of sequences that are in most ways like the "good" sequences, but that lack the motif you're looking for (or at least do not have high levels of it). If you don't supply this file, the background model will be created from the foreground sequences. Example:
bad=mousePromoters.fa
background This controls the background (null) model. Possible values are:
even - each base has a 25% chance
m0 - (Markov 0) Base probability depends on how many of that base are in background.
m1 - (Markov 1) Base probability depends on base before.
m2 - (Markov 2) Base probability depends on previous two bases.
coding - Three interleaved Markov 2 models, one for each frame of codon.
By default background=m0
ignoreLocation Controls whether the position of a motif is considered important. By default position is considered. To change this include in the command line: ignoreLocation=on
maxOcc The maximum number of times you expect a single motif to occur in a sequence. Default is 1.
rcToo Set rcToo=on if you want to search both strands for motifs.
html Where to put html output (by default goes to standard output). Example:
html=run1.html
gif Where to put gif output (by default goes to a cryptically named file). Example:
gif=run1.gif
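
The tab-delimited file written by the hits option is convenient to post-process. Below is a minimal Python sketch of a reader for it. The column order (motif#, score, sequence, position) comes from the table above; everything else is an assumption, in particular that the file has no header row, that the motif number and position are integers, and that the score is a decimal number. The names Hit and parse_hits are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    motif: int       # motif number (assumed to be an integer)
    score: float     # hit score (assumed to be a decimal number)
    sequence: str    # contents of the sequence column, kept as text
    position: int    # position of the hit (assumed to be an integer)


def parse_hits(lines):
    """Parse lines of 'motif# score sequence position' separated by tabs.

    Assumes there is no header row; blank lines are skipped.
    """
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(
                f"expected 4 tab-separated columns, got {len(fields)}: {line!r}"
            )
        motif, score, sequence, position = fields
        hits.append(Hit(int(motif), float(score), sequence, int(position)))
    return hits
```

With a file produced by hits=sl1.txt, the hits could be loaded with parse_hits(open("sl1.txt")) and then sorted or filtered by score. If your file turns out to have a header row or a different column layout, adjust the parser accordingly.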