Philip Heller's CMPS 261 Project

Spring 2010

Visualizing the HIV gp120 Envelope Protein


Description:

This project was developed in association with the Phillip Berman lab, which works on HIV vaccine development. Thanks to Prof. Berman and to Sara O'Rourke for their guidance and assistance.
I extended the standard JMol program to support visualization of 5 biochemical features of the gp120 envelope protein of the HIV virus. These features are:
  • Receptor binding sites: sites where the virus makes initial contact with a host cell
  • Conservation: sites where mutation is low
  • Neutralizing antibodies: sites where the protein is vulnerable to attack by the host's immune system
  • Glycosylation: sites where the protein protects itself with a carbohydrate shield
  • Protease binding sites: sites where the protein can be cut by an enzyme

    Standard JMol can draw a 3D image of proteins (including gp120), and supports mouse-driven rotation and zooming. A scripting language supports annotation of the image, usually by adjusting the size and color of individual amino acids of interest. Scripts can be written by hand or automatically generated from data.

    A single JMol image cannot usefully carry information about all 5 features at once. For example, color coding is not an option because JMol only allows single-color markup of amino acids. One molecule view per feature is much easier to understand, but two problems must be overcome:
  • Users will be trying to address questions such as "At what sites can either receptors or proteases bind?" or "Are there any sites where antibodies can attack despite glycosylation?" For such inquiry, a single view with multiple features is preferable, but only features of ad-hoc interest should be shown.
  • With multiple views, after only a few mouse gestures the 5 molecules can be oriented and zoomed arbitrarily with respect to one another. At this point, correlating features in the mind's eye is difficult or impossible.

    My work consists of 3 pieces of functionality. The first generates markup scripts from lab data; the other 2 address the two issues above. For ad-hoc combination of features, I implemented not 5 but 6 JMol views. The 6th, called the "Predicate View", marks sites that satisfy a boolean predicate. A dialog box allows easy specification of the operators AND, AND NOT, OR, and OR NOT. To maintain consistent orientation among all 6 views, I implemented an event echoing scheme that ensures identical roll/pitch/yaw/zoom of all molecules. Implementation details are in the "Implementation" section below.


    Screenshots:

    All 6 images.



    Predicate Configuration Dialog.



    Glycosylation AND Antibodies (white clump in upper-left).



    Glycosylation AND Antibodies AND Conservation.


    Implementation:

    JMol is an open-source Java application consisting of nearly 500 source files and 500,000 lines. I wanted to modify this base as little as possible, to avoid the risk of breaking interactions among subsystems. I was able to implement the functionality I wanted in original code, with 2 small exceptions. First, it was necessary to change a number of access modifiers to "public", as the JMol development team hadn't anticipated my needing the (previously non-public) data and methods. Second, I needed to add 2 lines of logic to the existing mouse motion event handlers. The code submitted with this project is just my own; the JMol sources require an extensive installation process, and for best results should be used under Eclipse. (Google "JMol source download" for instructions, and be prepared to spend a little time.)

    My project consists of 13 Java source files (~1700 lines). The top-level application class creates 6 instances of the JMol display panel, and installs each in its own custom frame. The custom frame applies the event echoing scheme to the display panels.

    To support event echoing, I extended Java's MouseEvent class. The new class is called MouseEventEcho. A central dispatcher adds itself to all 6 JMol displays as a mouse motion listener. On receipt of a MouseEventEcho the dispatcher does nothing; on receipt of an original MouseEvent, the dispatcher uses the event as a template for creating a corresponding MouseEventEcho, which is sent to all JMol displays except the one that reported the original event. The upshot is that when a mouse gesture happens in any one view, it's as if the gesture had happened simultaneously in all the views. The mechanism succeeds in maintaining identical orientation of all 6 molecules.
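    The echoing scheme described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual source: only the name MouseEventEcho comes from the text, while EchoDispatcher, EchoDemo, and the use of plain java.awt.Container objects as stand-ins for the JMol display panels are assumptions made for the example. Delivering the echo with Component.dispatchEvent is likewise an assumed choice.

```java
import java.awt.Component;
import java.awt.Container;
import java.awt.event.InputEvent;
import java.awt.event.MouseEvent;
import java.awt.event.MouseMotionAdapter;
import java.awt.event.MouseMotionListener;
import java.util.ArrayList;
import java.util.List;

/** Marks an event as a copy synthesized by the dispatcher. */
class MouseEventEcho extends MouseEvent {
    MouseEventEcho(Component target, MouseEvent original) {
        // The original event is the template; only the source component changes.
        super(target, original.getID(), original.getWhen(),
              original.getModifiersEx(), original.getX(), original.getY(),
              original.getClickCount(), original.isPopupTrigger(),
              original.getButton());
    }
}

/** Hypothetical central dispatcher: listens on every view, echoes to the others. */
class EchoDispatcher implements MouseMotionListener {
    private final List<Component> views = new ArrayList<>();

    void register(Component view) {
        views.add(view);
        view.addMouseMotionListener(this);
    }

    @Override public void mouseDragged(MouseEvent e) { echo(e); }
    @Override public void mouseMoved(MouseEvent e) { echo(e); }

    private void echo(MouseEvent e) {
        if (e instanceof MouseEventEcho) {
            return; // Echoes are never re-echoed, so there is no feedback loop.
        }
        for (Component view : views) {
            if (view != e.getSource()) {
                view.dispatchEvent(new MouseEventEcho(view, e));
            }
        }
    }
}

/** Stand-in demo: counts the drag events each view's own handler receives. */
class EchoDemo {
    static int[] dragOnFirstView(int viewCount) {
        EchoDispatcher dispatcher = new EchoDispatcher();
        List<Component> views = new ArrayList<>();
        int[] seen = new int[viewCount];
        for (int i = 0; i < viewCount; i++) {
            final int index = i;
            Component view = new Container(); // placeholder for a JMol display panel
            view.addMouseMotionListener(new MouseMotionAdapter() {
                @Override public void mouseDragged(MouseEvent e) { seen[index]++; }
            });
            dispatcher.register(view);
            views.add(view);
        }
        Component first = views.get(0);
        first.dispatchEvent(new MouseEvent(first, MouseEvent.MOUSE_DRAGGED,
                System.currentTimeMillis(), InputEvent.BUTTON1_DOWN_MASK,
                10, 20, 0, false));
        return seen;
    }
}
```

    In this sketch, a single drag on the first view reaches each view's handler exactly once: the originating view sees the original event, and every other view sees one echo.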

    I had two data sources. One source was tabular data from an article [1] produced by the Berman Lab describing locations of receptor binding, antibodies, glycosylation, and protease binding; protease binding was reported as prevalence percentage, while the other 3 were reported as present or absent. The second source was an Excel spreadsheet listing all observed single-point mutations at all sites. I wrote a script generator that parses the data and emits one script for each of the 5 features. Each script marks its feature with a unique color. Sphere size ranges from a minimum to a maximum. For receptor binding, antibodies, and glycosylation, sites bearing the feature are rendered at maximum size. For protease binding, prevalence (0% to 100%) is mapped to the size range. For conservation, the number of mutation types can range from 1 to 20, and this is mapped to the size range (so small spheres are less subject to mutation than large ones).
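    The size mapping described above could look something like the following sketch. The class and method names, the linear interpolation, and the concrete radius range (0.5 to 2.0) are all assumptions for illustration; the text does not state the actual range or the exact script lines the generator emits. The emitted line uses the Jmol script commands select, spacefill, and color.

```java
import java.util.Locale;

/** Hypothetical helper that turns per-site data into Jmol markup script lines. */
class MarkupScripts {
    // Assumed sphere radius range; the real minimum and maximum are not given.
    static final double MIN_SIZE = 0.5;
    static final double MAX_SIZE = 2.0;

    /** Linearly maps a fraction in [0, 1] onto the size range. */
    static double sizeFor(double fraction) {
        double clamped = Math.max(0.0, Math.min(1.0, fraction));
        return MIN_SIZE + clamped * (MAX_SIZE - MIN_SIZE);
    }

    /** Protease binding: prevalence of 0% to 100% becomes 0.0 to 1.0. */
    static double proteaseFraction(double prevalencePercent) {
        return prevalencePercent / 100.0;
    }

    /** Conservation: 1 to 20 mutation types becomes 0.0 to 1.0. */
    static double mutationFraction(int mutationTypes) {
        return (mutationTypes - 1) / 19.0;
    }

    /** Emits one script line that sizes and colors a single residue. */
    static String markSite(int residue, double fraction, String color) {
        return String.format(Locale.ROOT, "select %d; spacefill %.2f; color %s;",
                residue, sizeFor(fraction), color);
    }
}
```

    With these assumed constants, a site with 50% protease prevalence lands halfway through the size range, and present/absent features would simply pass a fraction of 1.0 for sites bearing the feature.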

    The predicate view is configured by a dialog which allows 2 to 5 features to be connected by the operators AND, AND NOT, OR, and OR NOT. Data is considered numerical rather than boolean, since protease binding and conservation are prevalences. Operators are computed with fuzzy logic. So for example the predicate "glycosylation AND conservation" will show glycosylation sites, with atoms sized according to conservation. When the user chooses a new predicate, a new predicate script is generated and applied to the predicate view.
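    A minimal sketch of the fuzzy evaluation follows. The text says only that operators are computed with fuzzy logic, so the specific choices here are assumptions: the common min/max operators (AND as minimum, OR as maximum, NOT as 1 minus the value), strict left-to-right evaluation, and the names FuzzyOp and FuzzyPredicate. Feature values are taken to be in the range 0.0 to 1.0, with present/absent features as 1.0 or 0.0.

```java
/** The four predicate operators, computed with assumed min/max fuzzy logic. */
enum FuzzyOp {
    AND, AND_NOT, OR, OR_NOT;

    double apply(double left, double right) {
        switch (this) {
            case AND:     return Math.min(left, right);
            case AND_NOT: return Math.min(left, 1.0 - right);
            case OR:      return Math.max(left, right);
            default:      return Math.max(left, 1.0 - right); // OR_NOT
        }
    }
}

/** Hypothetical evaluator for one site: folds the features left to right. */
class FuzzyPredicate {
    static double evaluate(double[] values, FuzzyOp[] ops) {
        if (values.length == 0 || ops.length != values.length - 1) {
            throw new IllegalArgumentException("need one operator between each pair of features");
        }
        double result = values[0];
        for (int i = 0; i < ops.length; i++) {
            result = ops[i].apply(result, values[i + 1]);
        }
        return result;
    }
}
```

    Under these assumptions, "glycosylation AND conservation" yields the conservation value at glycosylated sites (min of 1.0 and the value) and 0.0 elsewhere, which matches the behavior described above.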

    User's Guide:

    The 6 molecule views and the predicate editor are always visible. To manipulate all molecules, use the mouse on any individual molecule and execute the standard JMol gestures:
  • Pitch forward/back = Drag up/down.
  • Yaw left/right = Drag left/right.
  • Roll left/right = Shift-drag left/right.
  • Zoom out/in = Shift-drag up/down.


    To edit the predicate, just change the combo-box selections for features and operators. After the last (RHS) feature is an operator combo set to "X--X--X". If you want a longer predicate, set this combo to the desired operator; the following feature combo will become enabled. Do the reverse if you want a shorter predicate: set the operator combo following the last desired feature to "X--X--X"; all subsequent combos will disable.

    Links:

  • There is no link to the executable because it's mostly a huge 3rd-party class structure that requires non-trivial installation.
  • Source code: Jarfile containing the 13 Java sources. Click below for individual source files.
  • Data: Here is the molecule description file (PDB format), and here is the mutation spreadsheet (exported as tab-delimited text).
  • Progress Report (Powerpoint)
  • Final Presentation (Powerpoint)
  • Written Report