Data Analysis, Modeling and Visualization for Bioinformatics
X445.1 Computer Science (3), UCSC Extension
Next offering
I am teaching one section in Spring 2002 and another in Summer 2002. Visit
UCSC Extension to enroll.
Course Description
The current explosion of biological data has created the need for
mathematical and computational methods to analyze these data and turn
them into biological insights.
This course presents the main methods used in the analysis of
biological information, with emphasis on statistical methods (multinomial
and extreme-value distributions), information-theoretic methods
(entropy, etc.), unsupervised methods (clustering), and supervised
methods (neural networks, decision trees). Examples and applications of
each covered method to bioinformatics are stressed throughout.
Covered applications include database searches, classification of
protein sequences, classification of protein structures,
identification of domains, phylogenetic tree construction,
gene finding, coding region determination, and analysis of gene
expression data.
Also included are probabilistic modeling and its applications in
bioinformatics, particularly in sequence analysis, and an overview of
the visualization principles, formats, and methods commonly used to
display large volumes of complex biological data.
Required Texts
When I teach this course, the required text is Data Analysis and
Classification for Bioinformatics, by Arun Jagota. I provide a complimentary
copy of this text to each enrolled participant during the first class.
Prerequisites
Some familiarity with probability and statistics, as acquired from
a first college course on these topics, is necessary.