Research Question: Is it useful to compare raw chromatogram data directly with one another (rather than comparing the DNA sequences inferred automatically from the chromatograms)?
Biological Interest: Facilitation of biological research; detecting polymorphic differences (e.g. between human individuals), also known as SNPs; detecting allelic differences, also known as heterozygous differences or haplotypes.
Computational/Technical Interest: Peer-to-peer architecture for bioinformatics research. Devise a mini-scenario for OpenKnowledge in which scientists share sequencing chromatograms (i.e. allow each other to look into their trash, so to speak).
Programming languages: Perl/CGI; possibility to learn LCC at a later stage
Materials:
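A minimal sketch of what comparing raw traces directly could look like for the chromatogram project above. It is written in Python for illustration, not in the Perl named for the project, and it rests on loud assumptions: each trace is a hypothetical dictionary mapping the four channels A, C, G and T to equally long lists of intensities, and the two traces are already aligned sample by sample. The per-channel Pearson correlation is only one possible similarity measure, not a method taken from the project.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long intensity lists."""
    if len(x) != len(y) or not x:
        raise ValueError("traces must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A flat channel carries no shape information; report no correlation.
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def compare_traces(trace_a, trace_b):
    """Correlate two aligned four-channel traces channel by channel.

    trace_a, trace_b: dict mapping "A", "C", "G", "T" to intensity lists
    (a hypothetical in-memory layout, not a real chromatogram file format).
    """
    return {base: pearson(trace_a[base], trace_b[base]) for base in "ACGT"}
```

Because the correlation ignores overall signal strength, two reads of the same template at different intensities still score close to 1 in every channel, while a secondary peak present in only one read lowers the score of the affected channel.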
Research Question: Are there additional proteins that contain the recurrent domain found so far in Toxoplasma gondii and Plasmodium falciparum? Can we produce models for them and speculate what their functions and evolution may be?
Biological Interest: Some of the PGSH proteins are candidates for transmission-blocking vaccines against malaria. These proteins are difficult to express, meaning that the models will remain the only structural information available for a long time.
Computational/Technical Interest: Straightforward use of profile-HMMs for detecting remote homologs.
Programming languages: Perl
Materials:
Research Question: Can side-chain modeling correct for the overfitting of electrostatic interactions between side chains that has become apparent in complexes generated by a standard MODELLER protocol? (Extension: Validate predicted CDK-cyclin combinations by a different, docking-based approach using RosettaDock.)
Biological Interest: The modeled complexes are to be evaluated for their plausibility (only about 10-15% would be expected to occur in nature). One of the criteria for this should be electrostatic complementarity (molecular recognition), which is not a valid criterion unless the overfitting can be corrected.
Computational/Technical Interest: Reasonably straightforward use of academic structural biology/bioinformatics software (Rosetta) (Extension: RosettaDock)
Programming languages: Perl, possibly a bit of C (Rosetta)
Materials:
Research Question: Is it possible to implement the oldest way people have predicted which cysteine residue is linked to which other cysteine residue in proteins with disulfide bonds? Does it work better than existing methods, and what is it good for?
Biological Interest: Predicting disulfide bond connectivity from sequence provides important clues for tertiary structure modeling.
Computational/Technical Interest: Software development (for academic use) and refinement of the existing components, including the heuristic scoring function, of a rough prototype of the method.
Programming languages: Perl; possibly Perl-CGI (for a WWW-server implementation)
Materials:
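To illustrate the search space behind the disulfide connectivity project above, the following Python sketch enumerates every way of pairing up an even number of cysteines and picks the pairing with the best total score. It does not implement the prototype's method: the function names are hypothetical, and `score_pair` is a placeholder for whatever heuristic scoring function is actually used.

```python
def connectivity_patterns(cysteines):
    """Yield every pairing of an even-length list of cysteine positions."""
    if len(cysteines) % 2:
        raise ValueError("need an even number of bonded cysteines")
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    # Pair the first cysteine with each possible partner, then recurse
    # on the cysteines that are still unpaired.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for pattern in connectivity_patterns(remaining):
            yield [(first, partner)] + pattern


def best_pattern(cysteines, score_pair):
    """Return the pairing with the highest summed pair score.

    score_pair(a, b) is a placeholder for a heuristic scoring function.
    """
    return max(
        connectivity_patterns(cysteines),
        key=lambda pattern: sum(score_pair(a, b) for a, b in pattern),
    )
```

For 2n bonded cysteines there are (2n-1)!! candidate patterns (3 for four cysteines, 15 for six, 105 for eight), so exhaustive enumeration is only practical for proteins with a modest number of disulfide bonds.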
Research Question: How accurate have high-confidence automatically generated models been for the set of malaria proteins for which experimental structures have appeared within the past two years?
Biological Interest: Databases of automatically generated comparative (= homology) models are very important but tend to present too many low-confidence models, which are interesting to expert users but not to the typical (biologist) user.
Computational/Technical Interest: Straightforward use of Kevin Karplus' implementation of the evaluation function GDT used to evaluate modeled protein structures in CASP.
Programming languages: Perl
Materials:
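For orientation on the GDT evaluation named in the project above, here is a simplified Python sketch of a GDT_TS-style score. It is not Kevin Karplus' implementation. It assumes the model and the experimental structure have already been superimposed and that their C-alpha coordinates are given as two equally long lists of (x, y, z) tuples in residue order, whereas the full GDT procedure searches over many superpositions to maximize the residue count at each cutoff.

```python
import math


def gdt_ts(model_ca, reference_ca, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS-style score (0 to 100) for one fixed superposition.

    model_ca, reference_ca: equally long lists of (x, y, z) C-alpha
    coordinates in Angstrom, matched residue by residue.
    """
    if len(model_ca) != len(reference_ca) or not reference_ca:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    distances = [math.dist(m, r) for m, r in zip(model_ca, reference_ca)]
    # Fraction of residues within each distance cutoff, then the average.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(fractions)
```

Averaging over several cutoffs makes the score less sensitive to a few badly placed residues than a single RMSD value would be.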
Research Question: A number of peculiarities have been observed in the few currently known structures of coiled-coil protein regions; for example, opposing residues of the same charge do not seem to pose as much of a problem as one would think. How are these sequence peculiarities accommodated at the detailed structural level (side-chain conformation), and how can they be exploited or taken into account if a detection method for irregular coiled-coil regions is to be developed?
Biological Interest: It is becoming more and more obvious that coiled-coil regions are often sites of important protein-protein interactions, i.e. they play more than merely structural roles. Irregularities in such regions in particular should provide important clues to interaction sites. (Preliminary data for a grant application.)
Computational/Technical Interest: Straightforward structural biology programming (extracting relevant information from PDB files) and use of protein visualization/manipulation software (e.g. Chimera, PyMOL)
Programming languages: Perl; possibly other languages to extract contact angles
Materials:
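As a small example of the kind of PDB file extraction mentioned in the project above, this Python sketch pulls the C-alpha atoms out of ATOM records using the fixed column positions of the PDB format. The function name and the returned dictionary layout are illustrative assumptions; alternate locations, insertion codes and HETATM records are deliberately ignored.

```python
def parse_ca_atoms(pdb_lines):
    """Return residue name, chain, residue number and coordinates of CA atoms.

    pdb_lines: iterable of text lines from a PDB file.
    """
    atoms = []
    for line in pdb_lines:
        # Only standard ATOM records; HETATM and all other records are skipped.
        if not line.startswith("ATOM"):
            continue
        # Columns 13-16 hold the atom name (0-based slice 12:16).
        if line[12:16].strip() != "CA":
            continue
        atoms.append({
            "res_name": line[17:20].strip(),   # columns 18-20
            "chain": line[21],                 # column 22
            "res_seq": int(line[22:26]),       # columns 23-26
            "xyz": (
                float(line[30:38]),            # columns 31-38
                float(line[38:46]),            # columns 39-46
                float(line[46:54]),            # columns 47-54
            ),
        })
    return atoms
```

With `open("structure.pdb")` passed in as `pdb_lines` (the file name is a placeholder), the result can be fed into distance or angle calculations between residues of the two helices.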
Research Question: It is generally assumed that protein domains that interact specifically with DNA or RNA sites will bear largely positive surface charge in proximity of the phosphate backbone. We noticed that this is not generally true - can we understand when it is not true, and why?
Biological Interest: Electrostatic surface potential is an important property for function prediction. Accurate prediction of DNA or RNA binding using modeled protein structures would provide important clues for experimental biologists.
Computational/Technical Interest: Use of a previously developed 1-D abstraction of the electrostatic potential surface of 3-D structural models for part of the work. Dealing with academic third-party software for electrostatic potential calculations, which is not always straightforward, and with multiple sequence alignments.
Programming languages: mostly Perl for utility scripting
Materials: