Understanding the Determining Factors and Their Cooperative Effects on Protein Folding
Li, Yue, 1978 April- (author)
Tyson, Gary (professor co-directing dissertation)
Zhang, Jinfeng (professor co-directing dissertation)
Berg, Bernd (university representative)
Liu, Xiuwen (committee member)
Kumar, Piyush (committee member)
Department of Computer Science (degree granting department)
Florida State University (degree granting institution)
A protein is a linear chain of amino acids joined by peptide bonds. Proteins play a vital role in almost every biological process. Most proteins must fold into a stable three-dimensional (3D) structure, the native state, in order to function properly. The process by which a protein goes from its amino acid sequence to its 3D structure is known as protein folding. The relations among protein sequence, structure, and function have been studied for many years in the laboratory with expensive experimental methods such as X-ray crystallography and NMR spectroscopy. Emerging high-performance microprocessors enable efficient and accurate simulation of protein structure. Despite extensive previous study, the factors behind protein folding and their cooperative effects are still not completely understood. This dissertation presents methods and results for investigating the determining factors of protein folding and their cooperative effects. We start with a simplified protein folding model, the three-dimensional hydrophobic-polar (HP) model, which considers only the hydrophobic interaction, because the hydrophobic interaction between residues is believed to be the driving force for protein folding. Although the HP model is simple, it provides a useful platform for measuring protein stability, which ultimately benefits the design of protein sequences. The funnel-shaped energy landscape theory states that a protein can fold into its specific three-dimensional structure through multiple pathways. There are many local minima along the folding process. A protein can be trapped in these local minima temporarily, but it is eventually guided toward the native state. We therefore propose energy landscape entropy (ELE) as a measure of the foldability of a protein sequence. Our results indicate that the proportion of hydrophobic residues, defined as the hydrophobic content (HC), plays a determining role in the foldability of a protein sequence.
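To make the two quantities above concrete, the following is a minimal sketch of how hydrophobic content and the energy of an HP conformation might be computed. It assumes the conventional HP scoring on a cubic lattice, in which each pair of H residues that are lattice neighbours but not adjacent in the chain contributes -1; the abstract does not spell out the exact energy function used in the dissertation, and the function names `hydrophobic_content` and `hp_energy` are hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def manhattan(a: Coord, b: Coord) -> int:
    """Lattice distance between two cubic-lattice sites."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hydrophobic_content(sequence: str) -> float:
    """Fraction of hydrophobic (H) residues in an HP sequence."""
    if not sequence:
        raise ValueError("sequence must be non-empty")
    return sequence.count("H") / len(sequence)


def hp_energy(sequence: str, coords: List[Coord]) -> int:
    """Energy of a conformation under the assumed HP convention:
    -1 for every H-H pair on neighbouring lattice sites that is not
    a pair of consecutive residues in the chain."""
    if len(sequence) != len(coords):
        raise ValueError("one lattice site is required per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for a, b in zip(coords, coords[1:]):
        if manhattan(a, b) != 1:
            raise ValueError("consecutive residues must be lattice neighbours")

    energy = 0
    for i in range(len(sequence)):
        # Start at i + 2 so that chain-adjacent residues are skipped.
        for j in range(i + 2, len(sequence)):
            if sequence[i] == "H" and sequence[j] == "H":
                if manhattan(coords[i], coords[j]) == 1:
                    energy -= 1
    return energy


# Four residues folded into a square in the z = 0 plane.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(hydrophobic_content("HPPH"))  # 0.5
print(hp_energy("HPPH", square))    # -1
```

In this toy case the two H residues at the chain ends meet across the square, giving one contact; the same sequence laid out in a straight line would have energy 0.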
A specific range of HC is key for HP sequences to remain stable. To compare our simulation results with real protein sequences, we measured the HC values of a set of non-homologous globular proteins. We also examined the HC values of intrinsically disordered proteins (IDPs) from the DisProt database and compared the hydrophobic content of the two groups. We found that IDPs tend to occupy the lower range of the HC spectrum, which is consistent with the fact that IDPs have marginally stable states in their energy landscapes. The research on HC sheds light on protein design because it prunes the astronomical space of protein sequences. Previously, one had to enumerate every possible protein sequence in order to design a protein with a specified function. Knowing the admissible range of HC, we can exclude a large number of sequences whose HC falls outside that range. However, protein folding is a complicated process affected by multiple factors. To investigate these factors and their cooperative effects, an optimized energy function on a more realistic protein model is required. The quality of the energy function is key to success in protein structure prediction, and optimizing it is a non-trivial problem. We propose to optimize the weights associated with the energy terms by using near native structures (NNS). In this dissertation, the NNS is defined as the set of conformations whose side-chain dihedral angles are within a certain number of degrees of the native structure. The NNS gives us a way to assess the quality of an energy function. Correspondingly, we propose the probability of the near native structure for a single type of residue.
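The abstract does not give the formula for the probability of the near native structure, so the following is only an illustrative sketch under stated assumptions: the energy is a weighted sum of energy terms, sampled conformations are weighted by Boltzmann factors, and a conformation counts as near native when every side-chain dihedral angle is within a threshold of the native value. The names `angular_difference`, `is_near_native`, and `near_native_probability`, the `kT` parameter, and the toy numbers are all hypothetical.

```python
import math
from typing import Sequence


def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def is_near_native(chi: Sequence[float], native_chi: Sequence[float],
                   threshold_deg: float) -> bool:
    """True if every side-chain dihedral is within the threshold of native."""
    return all(angular_difference(a, b) <= threshold_deg
               for a, b in zip(chi, native_chi))


def near_native_probability(term_values: Sequence[Sequence[float]],
                            weights: Sequence[float],
                            near_native_flags: Sequence[bool],
                            kT: float = 1.0) -> float:
    """Assumed definition: Boltzmann weight of the near native
    conformations divided by the Boltzmann weight of all sampled ones."""
    energies = [sum(w * t for w, t in zip(weights, terms))
                for terms in term_values]
    e_min = min(energies)  # shift energies for numerical stability
    factors = [math.exp(-(e - e_min) / kT) for e in energies]
    near = sum(f for f, flag in zip(factors, near_native_flags) if flag)
    return near / sum(factors)


# Two sampled conformations, two hypothetical energy terms each.
terms = [[1.0, 0.0], [0.0, 1.0]]
flags = [True, False]  # only the first conformation is near native
print(near_native_probability(terms, [0.0, 1.0], flags))
print(near_native_probability(terms, [1.0, 0.0], flags))
```

Under these assumptions the first weight setting assigns the near native conformation the lower energy and therefore the higher probability, which is the kind of comparison a weight optimization could be scored on.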
The optimized energy function will maximize the probability of the near native structure. The results indicate that our method invariably improves the average probability of the near native structure. We also find that similarity in side-chain topology implies similar weights. However, there is no general configuration of weights for all twenty types of amino acids under the current energy function. Evidently, not all the determining factors are captured by the current model, which consists of only five energy terms. The current energy function is too coarse-grained to reflect the cooperative effects of all determining factors. Energy functions designed for protein structure prediction should be fine-grained enough to capture all the determining factors behind protein folding.
clustering analysis, conformation sampling, energy function, energy landscape, near native structure, protein folding
June 12, 2012.
A Dissertation submitted to the Department of Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy.
Includes bibliographical references.
Gary Tyson, Professor Co-Directing Dissertation; Jinfeng Zhang, Professor Co-Directing Dissertation; Bernd Berg, University Representative; Xiuwen Liu, Committee Member; Piyush Kumar, Committee Member.