Chuck Sugnet's Home Base
Stuff for work and play...

Proteins:

Proteins are the macromolecules that perform almost all of the cell's work. They are used for energy, enzymatic activity, communication, transport, and many other things. Proteins are very complex and versatile macromolecules: they are involved in almost every cellular process and come in many shapes and sizes. However, all proteins are built from the same set of twenty amino acids. From a chemical point of view, proteins are just heterogeneous polymers composed of these twenty amino acids.

Each of the amino acids has a different chemical structure and different characteristics. By combining the properties of these different amino acids, proteins can perform a diverse range of functions. Amino acids have two forms of shorthand: a one-character code and a three-character code. Table 1 lists the amino acids in alphabetical order along with the corresponding shorthand and chemical characteristics. For more information, see the Introduction to Biomolecular Modeling at Tufts.

Table 1: Amino acids and their abbreviations
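Amino Acid    | 3-Letter | 1-Letter | Hydrophobic | Polar (charge) | Aromatic | Aliphatic | Size
--------------|----------|----------|-------------|----------------|----------|-----------|------
Alanine       | Ala      | A        | y           |                |          |           | tiny
Arginine      | Arg      | R        |             | y+             |          |           |
Asparagine    | Asn      | N        |             | y              |          |           | small
Aspartic Acid | Asp      | D        |             | y-             |          |           | small
Cysteine      | Cys      | C        | y           |                |          |           | small
Glutamic Acid | Glu      | E        |             | y-             |          |           |
Glutamine     | Gln      | Q        |             | y              |          |           |
Glycine       | Gly      | G        | y           |                |          |           | tiny
Histidine     | His      | H        | y           | y+             | y        |           |
Isoleucine    | Ile      | I        | y           |                |          | y         |
Leucine       | Leu      | L        | y           |                |          | y         |
Lysine        | Lys      | K        | y           | y+             |          |           |
Methionine    | Met      | M        | y           |                |          |           |
Phenylalanine | Phe      | F        | y           |                | y        |           |
Proline       | Pro      | P        |             |                |          |           | small
Serine        | Ser      | S        |             | y              |          |           | tiny
Threonine     | Thr      | T        | y           | y              |          |           | small
Tryptophan    | Trp      | W        | y           | y              | y        |           |
Tyrosine      | Tyr      | Y        | y           | y              | y        |           |
Valine        | Val      | V        | y           |                |          | y         | small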
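The two shorthand forms in Table 1 map one-to-one, so converting between them is a simple lookup. Here is a minimal Python sketch of that idea. The dictionary is transcribed from Table 1. The function names `to_one_letter` and `to_three_letter` are made up for this illustration and do not come from any particular library.

```python
# Three-letter to one-letter amino acid codes, transcribed from Table 1.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Glu": "E", "Gln": "Q", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Reverse lookup, built from the table above.
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(residues):
    """Convert an iterable of three-letter codes to a one-letter string.

    Input case is normalized, so "ALA", "ala", and "Ala" are all accepted.
    An unknown code raises KeyError.
    """
    return "".join(THREE_TO_ONE[r.capitalize()] for r in residues)


def to_three_letter(sequence, sep="-"):
    """Convert a one-letter sequence string to three-letter codes joined by sep."""
    return sep.join(ONE_TO_THREE[c] for c in sequence.upper())
```

For example, `to_one_letter(["Met", "Ala", "Gly"])` gives `"MAG"`, and `to_three_letter("MAG")` gives `"Met-Ala-Gly"`.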


These long chains of amino acids fold into very specific shapes in solution in order to function properly. Scientists have defined four levels of structure in proteins:
  1. Primary Structure: The sequence of amino acids in the protein. By convention, sequences are written starting at the N-terminus (amino terminus) and ending at the C-terminus (carboxyl terminus).
  2. Secondary Structure: How local areas of the protein fold. Examples of this are alpha helices and beta sheets.
  3. Tertiary Structure: The folding of the complete protein, that is, how the secondary structures fold to form the protein in solution.
  4. Quaternary Structure: How multiple proteins interact to form one large functional complex. An example of this would be members of the spliceosome interacting to form a functional splicing complex.

One of the big challenges in bioinformatics today is to predict the secondary and tertiary structure of a protein given its primary sequence. Right now it is practically impossible to predict the tertiary structure of a protein given only the primary sequence of amino acids.

Another challenge in bioinformatics is to predict the function of a protein given the primary sequence. It is possible to do this even though we cannot predict the tertiary structure of a protein, and it is that tertiary structure that determines the function. The trick is to compare the new sequence to other sequences whose function we already know. It turns out that nature is very conservative: once something works, nature may tinker with it, but often the basic functionality will remain the same. An example of this is hemoglobin, the protein that carries oxygen in blood. Many different forms of this protein exist in different species, but they all share a core group of amino acids that are the same. These homologous regions give rise to subsequences of proteins that have specific functionality and are conserved in nature; these are called motifs. By finding known motifs in a novel protein, we can predict the function of the novel protein even though we don't know its 3-dimensional structure. Other proteins are not so obviously related at the primary structure (amino acid) level but are related at the secondary and tertiary structure levels. For example, myoglobin and hemoglobin have structural similarities even though only about 20% of their amino acids are the same. (thanks to Dr. "hank" for the correction)
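The two ideas in this paragraph, scanning a sequence for a known motif and measuring how many amino acids two proteins share, can be sketched in a few lines of Python. This is only a rough illustration under some assumptions: the motif is written as a regular expression, the two sequences passed to `percent_identity` are assumed to be already aligned and of equal length (no gap handling), and the function names and the toy sequences are invented for this example. The pattern `N[^P][ST][^P]` is the commonly cited N-glycosylation site pattern, used here purely as a familiar example of a motif.

```python
import re


def find_motif(sequence, pattern):
    """Return (start, matched_text) for each non-overlapping match.

    `pattern` is a regular expression over one-letter amino acid codes;
    `start` is a 0-based index into `sequence`.
    """
    return [(m.start(), m.group()) for m in re.finditer(pattern, sequence)]


def percent_identity(seq_a, seq_b):
    """Percent of positions at which two aligned sequences have the same residue.

    Assumes the sequences are already aligned and have equal length;
    real tools align the sequences first and treat gaps specially.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        return 0.0
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# Toy sequences, made up for illustration only.
toy_protein = "MKNGSAPLNPTQ"
hits = find_motif(toy_protein, r"N[^P][ST][^P]")   # [(2, "NGSA")]
identity = percent_identity("MKLV", "MRLI")        # 50.0
```

In the toy protein, the second N is followed by a proline, so only the first site matches. Real motif searches rely on curated motif databases and proper sequence alignment rather than a single hand-written pattern.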