gen_sequence
a library of routines for generating random sequences with compositions
distributed according to a mixture of Dirichlet densities

Copyright (C) 2000 Kevin Karplus

This library is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 2.1 of the License.

This library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser
General Public License for more details.

See http://www.gnu.org/copyleft/lesser.html for the license details, or
write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330,
Boston, MA 02111-1307 USA

------------------------------

This directory contains code for a random protein-sequence generator and
four auxiliary random generators for normal, beta, Dirichlet, and
mixture-of-Dirichlet distributions.

This code is distributed through two mechanisms, with different
licensing restrictions.

The source code is distributed under version 2.1 of the GNU Lesser
General Public License (http://www.gnu.org/copyleft/lesser.html). This
open-source agreement has very mild restrictions on use and
distribution.

The library routines are also used in the SAM suite of tools, which is a
proprietary system. The SAM distribution is not open-source and is
covered by licensing agreements with the UC Regents. See
http://www.cse.ucsc.edu/research/compbio/sam-lic/obj.0 for information
about SAM licensing.

------------------------------

The random-variate algorithms in this library were selected more for
robustness and simplicity of implementation than for raw speed. Even so,
generation seems to be quite efficient, taking about 1 microsecond per
beta variate and 0.6 microseconds per normal variate on a DEC Alpha
XP1000.
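
To make the description of the generators above concrete, here is a
minimal sketch of drawing one sequence whose composition follows a
mixture of Dirichlet densities. It is not this library's code or
interface: the names DirichletComponent, sample_dirichlet, and
sample_sequence are hypothetical, the Dirichlet draw uses normalized
gamma variates from the C++11 <random> header rather than the
generators in this directory, and it assumes that the letters of a
sequence are drawn independently from the sampled composition.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical types and names for illustration only;
// this is not the interface of the gen_sequence library.
struct DirichletComponent {
    double weight;              // mixture coefficient (need not sum to 1)
    std::vector<double> alpha;  // one positive parameter per letter
};

// Draw a probability vector from Dirichlet(alpha) by normalizing
// independent Gamma(alpha[i], 1) variates.
std::vector<double> sample_dirichlet(const std::vector<double>& alpha,
                                     std::mt19937& rng) {
    std::vector<double> p(alpha.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < alpha.size(); ++i) {
        assert(alpha[i] > 0.0);
        std::gamma_distribution<double> gamma(alpha[i], 1.0);
        p[i] = gamma(rng);
        sum += p[i];
    }
    for (double& x : p) x /= sum;  // normalize so the entries sum to 1
    return p;
}

// Pick a mixture component by weight, draw a composition from it,
// then draw each letter independently from that composition
// (the independence of letters is an assumption of this sketch).
std::string sample_sequence(const std::vector<DirichletComponent>& mixture,
                            const std::string& alphabet,
                            std::size_t length,
                            std::mt19937& rng) {
    std::vector<double> weights;
    for (const DirichletComponent& c : mixture) weights.push_back(c.weight);
    std::discrete_distribution<int> pick_component(weights.begin(),
                                                   weights.end());
    const DirichletComponent& comp =
        mixture[static_cast<std::size_t>(pick_component(rng))];
    assert(comp.alpha.size() == alphabet.size());

    const std::vector<double> composition = sample_dirichlet(comp.alpha, rng);
    std::discrete_distribution<int> pick_letter(composition.begin(),
                                                composition.end());

    std::string seq;
    seq.reserve(length);
    for (std::size_t i = 0; i < length; ++i) {
        seq.push_back(alphabet[static_cast<std::size_t>(pick_letter(rng))]);
    }
    return seq;
}
```

A caller would seed a std::mt19937, build a vector of components whose
alpha vectors have one entry per letter of the alphabet, and call
sample_sequence once per sequence wanted. Each call draws a fresh
composition, so sequences differ in composition as well as in order.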

The random number generator can be changed by editing the DRAND macro
definitions in the .c files. Since all the generators rely on successive
pairs of uniformly distributed random numbers, a high-quality generator
should be used. The additive random number generator "random" in the
standard UNIX libraries is such a generator, so it was chosen for this
application.

Test programs are provided for each of the generators. The tests are far
from exhaustive, checking only the first two moments for a few parameter
values (covering each of the different algorithms for gen_beta). The
test programs do not decide whether the generators are working or not;
they simply report the first and second moments of the sample alongside
their analytic values. It is up to the user to decide whether the match
is adequate. Although the test programs were written for debugging the
random-variate generators, their main function now is to determine the
speed of the generators.

28 April 2003
Bug found and corrected in gen_beta, which failed for a=b=0.5.

Thu Jul 21 07:19:21 PDT 2005 Kevin Karplus
The methods were changed from C to C++, and the tests were modified to
provide a crude test for whether the methods were working.

Thu Jul 21 10:53:00 PDT 2005 Kevin Karplus
GenSequence.h and GenSequence.cc added to provide an easy interface for
sequence generation.
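
The kind of moment comparison that the test programs report can be
sketched as follows. This is an illustration under stated assumptions,
not the test code shipped here: Moments, beta_moments_exact,
sample_beta, and beta_moments_sample are hypothetical names, the second
moment is taken to be the raw moment E[X^2], and sample_beta is a
stand-in built from two gamma variates in the C++11 <random> header
rather than this library's gen_beta.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>

// Hypothetical names for illustration; not the library's test programs.
struct Moments {
    double mean;    // first moment E[X]
    double second;  // second raw moment E[X^2]
};

// Analytic moments of a Beta(a, b) variate:
//   E[X]   = a / (a + b)
//   E[X^2] = a (a + 1) / ((a + b) (a + b + 1))
Moments beta_moments_exact(double a, double b) {
    assert(a > 0.0 && b > 0.0);
    const double s = a + b;
    return Moments{a / s, a * (a + 1.0) / (s * (s + 1.0))};
}

// Stand-in beta generator (an assumption, not gen_beta):
// X / (X + Y) with X ~ Gamma(a, 1) and Y ~ Gamma(b, 1).
double sample_beta(double a, double b, std::mt19937& rng) {
    std::gamma_distribution<double> ga(a, 1.0);
    std::gamma_distribution<double> gb(b, 1.0);
    const double x = ga(rng);
    const double y = gb(rng);
    return x / (x + y);
}

// Estimate the first two moments from n samples.
Moments beta_moments_sample(double a, double b, std::size_t n,
                            std::mt19937& rng) {
    double sum = 0.0;
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double v = sample_beta(a, b, rng);
        sum += v;
        sum_sq += v * v;
    }
    const double count = static_cast<double>(n);
    return Moments{sum / count, sum_sq / count};
}
```

Printing the two Moments values side by side for a few (a, b) pairs,
including a=b=0.5, reproduces the style of report described above and
leaves the judgment about adequacy to the reader. Wrapping the sampling
loop in a timer would give a per-variate cost estimate.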