CMP 261 Project Proposal

Phil Heller

Spring 2010

 

 

 

My project is a feature-oriented visualization of the gp-120 protein of HIV. This protein protrudes from the viral envelope and helps the virus enter cells in infected patients. Gp-120 mutates quickly, so while it is theoretically an excellent vaccine target, its structure changes before vaccine-induced antibodies can neutralize the virus. It is exactly this mutability which makes HIV so deadly. Advances in understanding gp-120 will enhance our ability to craft effective vaccines.

 

Protein analysis begins with genome sequencing, which yields a 1-dimensional primary structure. This is simply a string on the 20-char alphabet {ACDEFGHIKLMNPQRSTVWY}, where each character represents an amino acid in the linear protein sequence. Shortly after synthesis (typically within microseconds to seconds), a nascent protein folds into its 3D tertiary structure. The protein's functionality derives from the shape of this structure and from the distribution of electric charge over the molecule's surface. As the protein folds up, amino acids which are distant in the primary structure can be brought into contact in important ways. Thus a 3D visualization of tertiary structure is greatly superior to a listing of the primary structure. In the case of HIV and gp-120, it's important to understand how the elusive mutating sites relate to one another and to the overall protein structure. That is, it's important to be able to visualize the feature differences between a reference version of the protein and the many mutations that can be isolated from infected patients.
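
As a minimal sketch of comparing a reference primary structure against a patient isolate, the mutated positions could be listed as below. It assumes the two sequences have already been aligned to equal length, which real isolates may not be. The class name, method names, and the six-residue toy sequences are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class PrimaryStructureDiff {
    // The 20-character amino acid alphabet from the text.
    static final String AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY";

    // True if every character of seq is one of the 20 amino acid codes.
    static boolean isValid(String seq) {
        for (int i = 0; i < seq.length(); i++) {
            if (AMINO_ACIDS.indexOf(seq.charAt(i)) < 0) {
                return false;
            }
        }
        return true;
    }

    // Returns the 1-based positions at which variant differs from reference.
    // Assumption: both sequences are pre-aligned and of equal length.
    static List<Integer> mutatedPositions(String reference, String variant) {
        if (reference.length() != variant.length()) {
            throw new IllegalArgumentException("sequences must be aligned to equal length");
        }
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < reference.length(); i++) {
            if (reference.charAt(i) != variant.charAt(i)) {
                positions.add(i + 1);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Toy sequences, not real gp-120 data.
        System.out.println(mutatedPositions("MRVKEK", "MRVTEK")); // prints [4]
    }
}
```

A list like this says nothing about where the mutated residues sit in the folded molecule, which is why the 3D view matters.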

 

The state of the art involves annotating a RasMol, PyMol, or JMol image. RasMol, PyMol, and JMol are 3D molecular viewers written in C (with an X11 front end), Python (with a C core), and Java, respectively. Core functionality is nearly identical across all 3 programs. Rendering is quite fast, because the code gets to assume that all objects are either spherical atoms or cylindrical bonds. On my laptop, real-time rotation and zooming can be done with no perceptible delay. The programs allow users to annotate images by editing the appearance of designated bonds or atoms. Editing can be done by typing text commands or with the mouse. This approach is good for single instances, and marked-up *Mol images often appear in journal articles.

 

The state of the art has two drawbacks. First, any location on the protein can bear more than one feature of interest, and a single marked-up image cannot show overlapping features clearly. Five features of interest are:

1)   Positive selection

2)   Affected by neutralizing antibodies

3)   Not affected by neutralizing antibodies

4)   Receptor binding site

5)   Protease binding site

The second drawback is that markup must be performed manually, which makes inspection of large data sets impractical.
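
Because one location can bear several features, a per-location set of features is a natural representation. The sketch below is one possible data structure, not part of JMol. The `Feature` enum mirrors the five features listed above. `FeatureMap` and its methods are hypothetical names, and location 200 is made up for the example. Querying with a predicate also permits Boolean combinations of features.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Predicate;

// The five features of interest listed in the text.
enum Feature {
    POSITIVE_SELECTION,
    AFFECTED_BY_NEUTRALIZING_ANTIBODIES,
    NOT_AFFECTED_BY_NEUTRALIZING_ANTIBODIES,
    RECEPTOR_BINDING_SITE,
    PROTEASE_BINDING_SITE
}

class FeatureMap {
    // Location in the sequence -> every feature observed at that location.
    private final Map<Integer, EnumSet<Feature>> byLocation = new TreeMap<>();

    // Records that the given location bears the given feature.
    void mark(int location, Feature feature) {
        byLocation.computeIfAbsent(location, k -> EnumSet.noneOf(Feature.class)).add(feature);
    }

    // Returns, in ascending order, the locations whose feature set passes the test.
    SortedSet<Integer> locationsMatching(Predicate<Set<Feature>> test) {
        SortedSet<Integer> hits = new TreeSet<>();
        for (Map.Entry<Integer, EnumSet<Feature>> entry : byLocation.entrySet()) {
            if (test.test(entry.getValue())) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        FeatureMap map = new FeatureMap();
        map.mark(123, Feature.POSITIVE_SELECTION);
        map.mark(123, Feature.RECEPTOR_BINDING_SITE); // same location, second feature
        map.mark(200, Feature.POSITIVE_SELECTION);
        // Locations under positive selection that are also receptor binding sites.
        System.out.println(map.locationsMatching(
                s -> s.contains(Feature.POSITIVE_SELECTION)
                        && s.contains(Feature.RECEPTOR_BINDING_SITE))); // prints [123]
    }
}
```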

 

I propose to modify JMol so that it displays 6 images simultaneously. Each of the first 5 images will highlight one of the 5 features above; the 6th will show a user-definable Boolean combination of the features. I hope to be able to "hijack" the mouse events that control image orientation, broadcasting events to all windows so that all images appear identically oriented at all times. (I think this feature is essential to the usefulness of the tool.)
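
The event broadcasting could be sketched as a simple observer arrangement: one object receives each drag and forwards the same rotation to every registered view. Nothing below is JMol's actual API. `OrientationListener`, `OrientationBroadcaster`, and `RecordingView` are hypothetical stand-ins, and the degrees-per-pixel factor is arbitrary. How JMol really dispatches mouse events is something I still have to learn from its source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one molecule view; not a JMol interface.
interface OrientationListener {
    void rotateBy(double degreesX, double degreesY);
}

class OrientationBroadcaster {
    private static final double DEGREES_PER_PIXEL = 0.5; // arbitrary sensitivity

    private final List<OrientationListener> views = new ArrayList<>();

    void register(OrientationListener view) {
        views.add(view);
    }

    // Called once per mouse drag, whichever window the drag occurred in.
    // A vertical drag rotates about the X axis, a horizontal drag about the Y axis.
    void mouseDragged(int dxPixels, int dyPixels) {
        double degreesX = dyPixels * DEGREES_PER_PIXEL;
        double degreesY = dxPixels * DEGREES_PER_PIXEL;
        for (OrientationListener view : views) {
            view.rotateBy(degreesX, degreesY); // every view gets the same rotation
        }
    }
}

// Test double that only accumulates the rotation it has been told to apply.
class RecordingView implements OrientationListener {
    double angleX;
    double angleY;

    @Override
    public void rotateBy(double degreesX, double degreesY) {
        angleX += degreesX;
        angleY += degreesY;
    }
}
```

Since every view starts in the same orientation and receives identical rotations, the six images stay aligned as long as no view handles a mouse event on its own.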

 

If time allows, I also propose to create a markup editor to let researchers enter feature information as it is experimentally acquired, in its most natural format. For example, a researcher might observe that a certain location 123 of sample 45678 is affected by neutralizing antibodies. This researcher shouldn't be required to do more than enter the triple {123, 45678, "affected by neutralizing antibodies"}; when my application displays the "neutralizing antibodies" instance of the molecule for sample 45678, location 123 should be appropriately marked. This requires storing observations in a database (ideally relational, but there's no time for that, so I'll use a flat file), converting the observations to JMol's command syntax, and injecting the commands into JMol's input parser.
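
A rough sketch of the observation pipeline follows: one triple per line in a tab-separated flat file, converted to a command string. The file layout, the `Observation` class, and the choice of color are all assumptions. The generated command uses RasMol-style `select` and `color` syntax, which should be checked against the JMol scripting documentation before relying on it. Injecting the string into JMol's parser is not shown.

```java
class Observation {
    final int location;   // position in the protein sequence
    final int sampleId;   // identifier of the patient sample
    final String feature; // free-text feature name, may contain spaces

    Observation(int location, int sampleId, String feature) {
        this.location = location;
        this.sampleId = sampleId;
        this.feature = feature;
    }

    // Parses one flat-file line of the assumed form: location<TAB>sample<TAB>feature.
    static Observation parse(String line) {
        String[] fields = line.split("\t", 3);
        if (fields.length != 3) {
            throw new IllegalArgumentException("expected 3 tab-separated fields: " + line);
        }
        return new Observation(
                Integer.parseInt(fields[0].trim()),
                Integer.parseInt(fields[1].trim()),
                fields[2].trim());
    }

    // Serializes the observation back to the same one-line format.
    String toFlatFileLine() {
        return location + "\t" + sampleId + "\t" + feature;
    }

    // Builds a RasMol-style command that selects the residue and colors it.
    // Assumption: the viewer accepts "select <residue number>" and "color <name>".
    String toScript(String colorName) {
        return "select " + location + "; color " + colorName + ";";
    }

    public static void main(String[] args) {
        Observation o = Observation.parse("123\t45678\taffected by neutralizing antibodies");
        System.out.println(o.toScript("red")); // prints select 123; color red;
    }
}
```

Tabs are used as the separator because feature names contain spaces.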

 

A successful implementation will let users visually correlate protein features with the same or other features in nearby locations in the tertiary structure. It may be that otherwise hidden relationships present themselves, and it may be that such relationships inform development of more effective HIV vaccines.

 

Timeline (week numbers are relative to Spring Quarter):

 

Week 3: Download JMol, get it working under Eclipse (DONE)

Week 4: Event hijacking

Week 5: 6-instance display

Week 6: Understand command parser, inject one markup command

Week 7: Design & implement feature database

Week 8: Inject features from database

Week 9: Cleanup, demo at Bioinformatics Seminar

Week 10: Document, shakedown, fix bugs, present to CMP 261