The inherent structure landscape of a protein
- *Department of Mathematical Sciences, Ibaraki University, Mito, Ibaraki 310-8512, Japan; and
- †Laboratoire de Physique, Ecole Normale Supérieure de Lyon, 46 Allée d'Italie, 69364 Lyon Cedex 07, France
-
Communicated by Hans Frauenfelder, Los Alamos National Laboratory, Los Alamos, NM, January 8, 2006 (received for review April 14, 2005)
Abstract
Using the Gō model of a real protein, we explore the landscape of its metastable structures. First, we show how the inherent structure energy density can be obtained from the probability density determined by sampling molecular dynamics trajectories and quenching. The analysis of the inherent structure landscape can characterize the folding transition. Then we show how thermodynamics of the inherent states can be established to study the equilibrium properties of proteins. Our work brings some elements into the current discussion about the protein dynamical transition. The study uses a simplified model to illustrate the ideas, but, as the inherent structure landscape is much simpler than the free energy surface of the protein, it appears to be accessible for an all-atom model of a small protein, at the expense of much longer calculations.
The concept of energy landscape (1, 2) is very important for proteins because it provides a unifying viewpoint on the structure of the proteins, their ability to fold, and their fluctuations, which are essential to function. The energy landscape of a protein is itself a very complex object so that, except for simple lattice models of proteins (3), it cannot be explicitly obtained. But, do we really need to know this landscape? In this article we argue that the answer is no, or more precisely that a partial view of this landscape can already tell us a lot about a protein. The example of glass-forming materials, which are also very complex, shows that many structural and thermodynamic properties can be deduced from the knowledge of a very small subset of the configuration space, the inherent structures, i.e., the local minima of the energy landscape (4, 5). Here we show how the inherent structure landscape (ISL) can be determined for the model of a real protein, and that it provides a fairly complete description of the thermodynamics of the protein.
The key result needed to establish reduced thermodynamics from the ISL is the corresponding density of states. Previous investigations have pointed out the importance of the general features of the density of energy states in determining the properties of proteins (6, 7). These studies show that the most significant properties of proteins can be obtained from a suitably built random energy model, based on a stochastic Hamiltonian. In this context protein folding has been formulated in terms of a discrete set of microstates, which could be taken as the minima of a continuously defined potential energy surface (7). The spirit of ISL-based thermodynamics is similar, but, in addition we show here how to build a density of microstates that is based on the structure of an actual protein. The framework that we introduce with a simple model to reduce computation time could be used to analyze any model, including models with an atomic description. The calculations would, of course, become much heavier, but they appear feasible with the computing facilities available today.
Exploring the ISL of a protein is not a new idea, but, to our knowledge, up to now this exploration has been restricted to its structural properties, for a model heteropolymer with 46 beads (8). We consider here a 56-aa real protein, and even with a very simple model, it implies that the number of inherent structures is increased by more than four orders of magnitude. An exhaustive search is not possible, but we show that a statistical physics analysis can be used to extract the necessary information from the phase space trajectories of the thermalized protein. Then we compare the thermodynamics over the ISL to the full thermodynamics of the protein.
We selected protein G (9) for this study because it is rather small (56 residues) but nevertheless includes typical secondary structure elements of proteins, one α-helix and two β-strands. We use molecular dynamics at constrained temperature to let the protein explore its free energy landscape, and applying rapid quenchings, i.e., calculating a steepest descent path from different points of this trajectory, we sample the underlying inherent structures. Although the characterization of the ISL is much simpler than a full determination of the energy landscape, calculations with an all-atom model of the protein would require a large computer power. Here, to illustrate the ideas, we use an off-lattice Gō model (10), which is rather simple but nevertheless accurate enough to reproduce experimental properties of the transition state for several proteins (11).
Density of States for the Inherent Structures
Molecular dynamics simulations show that the model has the expected qualitative properties of a good folder: at low temperature it folds to the global minimum of the potential energy, whereas at high temperature it denaturates. If the model is cooled down quickly from a high-temperature unfolded state to a temperature well below its folding temperature (T f), the folding process is, however, very slow, and may never reach a shape close to the native shape within the time scale accessible in a simulation. This finding suggests that this model protein exhibits the glassy properties observed in actual proteins. The folding transition can be detected from the temperature variation of the specific heat C v. As shown in Fig. 1, a characteristic peak of C v appears at a temperature that we define as T f. The ISL is determined by the basins of the potential energy surface, labeled by an index α, which are defined as the set of conformations that are connected to the same local energy minimum, called the inherent structure, which has an energy e α.
Temperature dependence of specific heat C v and probability distribution of the inherent-state energies, P IS(e α , T) at three temperatures in the vicinity of the T f.
The rapid change of the shape of the curve showing the probability density of inherent structures P
IS versus their energy e
α in the vicinity of T
f confirms this interpretation: below T
f, P
IS(e
α, T) is higher for the low values of e
α. At T
f it shows two maxima, one for low energies corresponding to folded states, and one for high-energy, unfolded states. Above
Tf, P
IS(e
α, T) is dominated by the high-energy, unfolded states. To characterize the ISL we need not only the values of the energies e
α but also their density of states Ω
IS(e
α). A realistic protein has a huge number of inherent states so that Ω
IS(e
α) cannot be determined from a systematic search. Instead it can be deduced from the sampling of the inherent structures that
gives P
IS
. In the basin of attraction of inherent structure α, the potential energy of the protein can be written as V(r) = e
α + ΔV(r), where r designates all of the degrees of freedom of the protein. Let us assume that the inherent structures can be split into a set
of discrete states α0, α1, …, α
K (which includes at least one state, the ground state) and a continuum of higher-energy states. With this notation, the configurational
contribution of the partition function of the protein is
where e
max is the highest inherent structure energy. Moreover we have introduced
which is the vibrational free energy in the basin of attraction B(α) of the inherent structure α. The probability to be in the basin of attraction of a discrete inherent structure is
and, for the continuum of inherent structures, the probability density of inherent structures is
If we can assume that the free energy F
v(α, T) does not depend on the inherent structure, in Z the term exp[−βF
v(T)] can be factorized so that p
αi and P
IS simplify into
with
ZIS can be viewed as an inherent structure partition function. It can be expressed in terms of the probability p
α0(T) that the protein is in the basin of attraction of the ground state, which is henceforth denoted as p
0(T). This allows us to rewrite P
IS(e
α, T) as
if we chose the ground-state energy as the reference state of the energies (e
α0 = 0). Eq. 7 provides a method to compute ΩIS(e
α) from the probability density P
IS(e
α, T) deduced from the sampling of molecular dynamics trajectories. The validity of the method relies on the assumption that the
vibrational free energy in a basin of attraction F
v(α, T) does not vary significantly from a basin to another. For harmonic basins of attraction, we have
where the values ωq are the frequencies of the vibrational modes of the protein when it is in the configuration corresponding to the inherent
structure α. Of course, these frequencies change individually from one inherent state to another, but they keep the same order
of magnitude, and the sum has an averaging effect that keeps F
v(α, T) only weakly dependent of the conformation.
This conjecture is verified by our calculations because Ω IS(e α) deduced from P IS(e α, T) at different temperatures gives the same result in all of the ranges of inherent structure energies that are accessed with a sufficient probability to allow a correct sampling at the temperature considered. Fig. 2shows the density of states Ω IS(e α) computed from the sampling of molecular dynamics trajectories at three temperatures. Fig. 2's results are remarkably similar, showing that the assumption that F v(α, T) can be eliminated in the calculation is a good approximation; however, the results are obviously not identical. Fig. 2 Right, deduced from the calculation at the lowest temperature, shows some values of e α where the density of states appears to vanish, whereas it does not vanish if Ω IS(e α) is computed from a trajectory at higher T. Such inconsistency is particularly noticeable for high values of e α, and it points out the limit of the numerical calculation. As shown from Eq. 7, the probability to be in the basin of attraction of an inherent structure decreases as exp(−e α/k B T). Therefore, if we use a molecular dynamics trajectory at a low temperature to sample the basins of attraction of the inherent structures, it may happen that we completely miss some inherent structures because of insufficient sampling. This outcome is even more likely if the basin of attraction of a given inherent structure is narrow because, besides the temperature factor, its small size also reduces the chance to sample it properly. Studies performed at higher temperatures reduce this problem and T = T f is particularly appropriate because we expect a molecular dynamics trajectory to sample both structures corresponding to folded and unfolded states. Once Ω IS(e α) has been determined from Eq. 7 at a temperature where the sampling is particularly efficient, Z IS(T) can be computed at any temperature with Eq. 6.
Comparison of the density of states Ω IS(e α) computed from the probability densities P IS(e α, T) deduced from molecular dynamics trajectories at three different temperatures: T = T f (Left), T = 0.83 T f (Center), and T = 0.69 T f (Right).
Fig. 3displays Ω IS(e α) in a large range of energies. It shows an isolated minimum energy state, followed by a few states very close in energy and then a large gap, of the order of k B T f, before the next state. A plot of the structure shows that the ground state and states immediately above it in energy have very similar shapes. Thus the few states near the minimum can be viewed as conformational substates (12) of the ground state. The gap between these states and the next states indicates that the model protein is a good folder (3), as observed in molecular dynamics simulations.
For larger energies Ω IS(e α) shows an exponential growth, with two different slopes depending on the energy range:
with
where Ω0 and A are positive constants, and T s = 1.1 T f and T u = 0.9 T f. It is interesting to notice that a similar exponential dependence of the density of metastable state was found for spin glasses (13) although the energy dependence of g(e α) is more complex for such systems.
Thermodynamics in the Inherent Structure Space
The density of states in the ISL can be used to analyze the folding of the protein. Substituting the expression of Ω IS(e α) into Eq. 7 gives
But Eq. 11 is also the expression that gives the behavior of P IS(e α, T) at low temperature (T < T u) because, in this temperature range, the equations giving P IS(e α , T) show that low-energy inherent structures dominate. On the contrary, Eq. 12 should be used at high temperature (T > T s), because in this range high-energy inherent structures dominate. As shown in Fig. 1, in the range T u < T ≈ T f < T s none of the energy ranges dominate, and P IS(e α, T) has two humps, one of them at low e α corresponding to folded structures, the other one at high e α, corresponding to unfolded structures. Therefore, when we take into account the expression of Ω IS(e α), we find that the shape of P IS(e α , T) changes qualitatively in a narrow temperature range around T f. This picture is consistent with a first-order transition, T u < T < T s being the coexistence region, so that the two temperatures exhibited by the analysis of Ω IS(e α) appear as the limit of the spinodals. These characteristics are consistent to the behavior of specific heat C v. As is indicated in Fig. 1, T s and T u are close to the temperatures at the start and the end of the peak for C v.
As noticed above, the exponential growth for the density of state Ω IS of the inherent structures is not a special feature of the protein model but was also obtained in spin glasses (13). However, in the present case, the exponent g(e α) is connected to the folding transition, which is specific to proteins. We also observed a similar behavior in a standard Gō model, without interaction through dihedral angles.
The density of inherent structures energies and the associated partition function Z IS(T) are not only useful for characterizing the folding transition. They can also tell us a lot about the thermodynamics of the protein. We can build thermodynamics based only on the configurations of the inherent structures by the usual scheme of statistical mechanics,
where U IS, F IS, S IS, or C IS are the internal energy, Helmholtz free energy, entropy, or specific heat, respectively, deduced only from the thermal fluctuations between conformations, so that they can be called conformational energy, conformational free energy, etc. They are shown in Fig. 4.
Thermodynamic quantities defined from the inherent structures. (Top) Conformational specific heat C IS/k B. (Middle) Magnification of the variation of the specific heat in the low-temperature range, also showing the contribution C W = C v − C IS, which is expected to reflect the properties of thermalization inside each well. Notice that the value shown is C W/N r, i.e., the vibrational specific heat per residue, while C IS, which is expected to reflect a global property of the protein, is shown for the full molecule. The dashed line indicates the harmonic limit C W/N r = 3 k B/2. (Bottom) Conformation entropy S IS and conformation energy U IS. (Inset) Conformational entropy in logarithmic scale.
The conformational specific heat is remarkably similar to the full specific heat of Fig. 1. The detailed analysis of Fig. 4 Middle, which displays the vibrational specific heat per residue C W/N r = (C v − C IS)/N r and conformational specific heat C IS, shows that the vibrational contribution of the specific heat is very close to the value 3 k B/2 expected for harmonic oscillators. This finding suggests that the thermalization in the basin of attraction of each inherent structure allows a good separation between a vibrational part and a conformational part of the specific heat. When comparing the values of C W and C IS one should notice an essential difference between the two. C W is a vibrational contribution, which comes from all of the residues and, as it is well approximated by the harmonic value, indicates that the contributions of all vibrational degrees of freedom are independent, so that C W/N r has an actual physical meaning. On the contrary, the structural fluctuations entering in C IS may be either a flip of a residue, for instance, and thus can be very local, or a slow structural change extending to a significant part of the protein, such as the motion of the whole α-helix. This is why the quantity C IS/N r would not have a physical meaning because a protein is not a self-similar structure. A meaningful comparison must compare C IS with C W/N r: structural fluctuation will start to play a role in the physics of the protein when C IS > C W/N r. As shown in Fig. 4, C IS starts to rise significantly ≈0.3 T f and quickly takes over for T > 0.4 T f, whereas the fluctuations are dominated by the vibrational contribution in the low temperature range.
The behaviors of the conformational entropy and energy (Fig. 4 Bottom) show a similar rise clearly visible for T > 0.4 T f. The increase of entropy that appears to accelerate and become exponential for T > 0.4 T f (see Fig. 4 Bottom Inset) can be understood as coming from the structural variety observed in the thermal fluctuations. This idea would be consistent with the behavior of C IS and C W.
Discussion
The results of the thermodynamics of the inherent structure states shed an interesting light on the current debate about the existence and observation of a protein dynamical transition (14). For our model protein two temperatures appear to play a particular role. The lowest one is 0.05 T f, above which energy and entropy show a rise (Fig. 4). With the resolution of our calculation,§
The results from thermodynamics in the inherent structure space are not affected by a possible lack of ergodicity at low temperature because they are established from a density of states deduced from simulations around the T f. Their accuracy is determined by the quality of the sampling of the phase space in the vicinity of the T f.
<0.05 T f, fluctuations among inherent states seem completely suppressed, so that this temperature could be viewed as the Kauzmann temperature for this model (15). One should, however, notice that Fig. 4 shows a gradual decrease of the entropy ≈0.05 T f, suggesting that, as previously noticed for another off-lattice protein model (16), a true thermodynamics glass transition does not exist for a finite system such as a protein. The other characteristic temperature is 0.4 T f. It should correspond to a qualitative change in some observations on the protein because, around this temperature, conformational changes among the different inherent structures start to dominate the fluctuations, as shown by the specific heat. Thus a glassy behavior could be expected when the temperature is lowered to values in the range 0.05 T f < T < 0.4 T f or below.However, as far as we deal with equilibrium aspects, the characteristics of a glass phase are not detected in our study because they only show up in out-of-equilibrium situations. From the viewpoint of equilibrium thermodynamics, we cannot find a characteristic change caused by the “glass transition.” In the temperature range 0.05 T f < T < 0.4 T f, in which we could expect the glass transition, the specific heat C v only shows a smooth evolution without any drop when temperature is lowered. In the thermodynamic sense there is no transition. However, the picture could become different if nonequilibrium aspects were probed. For instance, when cooling the protein rapidly from high temperature, a drop of C v might be observed and a transition temperature identified in the range 0.05 T f < T < 0.4 T f, but it should depend on the cooling rate.
Our results raise a question about the nature of the fluctuations that we observe: do they correspond to the α or β relaxations observed in proteins (14)? Experiments (17, 18) and molecular dynamics simulations in which the temperature of the solvent and the protein are constrained to be different (19) show that the solvent mobility is an important factor in determining the large-scale fluctuations of a protein. The α relaxation is “slaved” to the solvent fluctuations, which is not the case for the glassy aspects of our results because the solvent only enters into our study through its contribution to the interaction energies not through its dynamics because the model uses effective potentials to include hydrophobic effects. The fluctuations must be attributed to the protein energy landscape, which suggests that the fluctuations that are observed in our simulations correspond to the β relaxation, i.e., to fluctuations that do not involve large conformational changes that could be hindered by a frozen solvent in an actual experiment. This makes sense if we think of the peculiarities of the Gō model, which strongly favors the native structure by the selection of the interactions that are introduced in the model. It is likely that such a bias toward the native shape prevents such a simplified model from exhibiting an α relaxation.
In the studies of glassy systems with inherent structures, it is customary to introduce the configurational entropy
In the context of a thermodynamic description based on the inherent structures, s c can be viewed as the microcanonical entropy, whereas S IS(T) defined by Eq. 15 is the canonical entropy. Therefore, in the thermodynamic limit, we could expect that s c(ẽα) = S IS(T), where the most probable value ẽ α of the inherent structure energy would also tend toward its mean value 〈e α〉. However, this is not true for a protein, because of the finiteness and heterogeneity of the system, but also because, below T f, the most probable value of e α is the ground state of the protein. In this regime the density of states Ω IS(e α) is dominated by a few isolated states, and one cannot build thermodynamics on the configurational entropy, whereas S IS(T) defined by Eq. 15 has the expected properties of a thermodynamic entropy, such as dS IS(T)/dU IS(T) = 1/T.
In this article we have shown that the ISL of a protein can be analyzed because its density of states can be deduced from the probability density of inherent states extracted from molecular dynamics simulations. The ISL contains considerably less information than the free-energy landscape of the protein, and this is what makes it accessible. At the expense of much longer calculations, the ISL appears to be accessible for an all-atom model of a simple protein because our scheme to obtain it does not require the full determination of every local minimum because it takes advantage of the usual ability of statistical physics to provide very accurate results from a simplified description of the system. An important aspect is that, in this framework, all thermodynamics properties are deduced from the density of inherent states that is obtained from molecular dynamics simulations in the vicinity of T f to sample folded and unfolded states. As a result the difficult issue of the ergodicity of simulations at very low temperature can be avoided.
When a complicated model, such as an all-atom description is considered, the landscape becomes immense and one may be concerned by the validity of the sampling of the phase space to get the inherent structure density of states. But it should be noted that, even for a frustrated Gō model, the landscape is rugged and complex. For our protein model, it is multifunneled. Kinetic studies of folding detect the existence of a second folding funnel, leading to a kinetic trap (unpublished work). The density of inherent states in the kinetic trap can be obtained and it is very low. Therefore, although the second funnel plays a significant role in the kinetics of the folding, it does not show up in the thermodynamics of the protein. This is an interesting aspect of the canonical sampling that we use. It introduces some self-consistency in the method because states that are important for the reduced thermodynamics are also those that are more likely to be sampled. Another test of the validity of this approach is provided by the possibility to get Ω IS(e α) by sampling at different temperatures in the vicinity of T f. They should lead to the same structure for the density of states. This is why, despite the obvious difficulty of carrying long molecular dynamics trajectories at different temperatures for a complex protein model, the thermodynamics in the ISL seems reachable in this case.
The Gō model of a real protein is sufficient to allow the investigation of the properties of the ISL. It shows that, despite the loss of information with respect to the full free energy landscape, the ISL still contains a lot of data on the protein. It can be used to study the folding transition because P IS exhibits a qualitative change across T f and can even detect the associated spinodals. The density of inherent states shows an exponential scaling, which can be related to the folding transition. This finding suggests that it could be a general property of proteins, and it will be interesting to test this hypothesis on other protein models, including all-atom descriptions. The ISL can also be used to analyze the thermodynamics of the protein. It cannot observe the vibrational contributions, but detects the most important features of proteins, their conformational fluctuations that are responsible for their function.
Materials and Methods
The potential energy of our Gōmodel of protein G is
where N r is the number of residue, and b i, θ i, and φ i are the ith bond length, ith bond angle, and ith dihedral angle, respectively. r ij is the distance between ith and jth residues. The index 0 denotes the parameters determined from the native structure. Such a model includes local and nonlocal interactions. The local interactions are the bonds that define the Cα chain through stiff elastic potentials describing the covalent bonds, angular potentials between adjacent bonds, and dihedral potentials. The natural bond length or bond angles are determined from the native structure. The nonlocal bonds connect Cα carbons that are not adjacent along the peptide chain, but nevertheless close in space in the folded geometry of the protein. Such carbons are said to form a “native contact” and interact with a Lennard-Jones potential in the model. They are determined from the experimental structure of the protein in its folded state: two Cα carbons are said to form a native contact if the distance between them or between atoms of the side chains attached to them is less than a critical distance, here chosen as 5.5 Å. A repulsive interaction is introduced between all Cα carbons that do not form a native contact to describe the steric repulsion. Such a Gō model is specifically designed to ensure that the system reaches the correct folded structure and these models are generally good folders, i.e., their folding transition is sharp and leads to a well defined structure, because they are minimally frustrated. A comparison of the folding of an off-lattice Gō model and a frustrated model based on hydrophobic (B), hydrophilic (P), and neutral (N) residues (BPN model) showed that the Gō model leads to a sharp transition where collapse and folding occur simultaneously, whereas the highly frustrated model shows a very broad collapse and even at the lowest temperature the number of contacts is far from maximal (20), pointing out that a realistic model of protein should be minimally frustrated (2). In our model a slight frustration is introduced (10) by assuming that the dihedral angles have the ideal values 45° or 135° of the α-helices or β-sheets instead of the angles deduced from the native structure. The native structure of protein G has been obtained from the Protein Data Bank (21). The dimensionless parameters of the model are K b = 200.0, K a = 40.0, K d = 0.3, ε = 0.18, and C = 4.0. The experimental structure lies in the basin of the global minimum of the model but does not correspond exactly to the ground-state structure of our model because of the frustration introduced on the dihedral angles.
Thermalized phase space trajectories are generated for this model, using underdamped Langevin simulations, where the masses of all of the residues are assumed to be equal. Our goal here is not to explore out-of-equilibrium states, but rather to analyze the equilibrium properties of a protein. Therefore, to avoid long nonequilibrium transients that occur on cooling, except for the general tests of the model properties, all of our simulations started from the ground state and heated up to the temperature of interest. To collect statistics over a given equilibrium trajectory, a typical simulation included 3 × 108 steps after the desired temperature was reached. For each temperature 20 simulations were performed with different sets of random numbers in the Langevin equations.
Equilibrium trajectories were used, on one hand, to analyze the thermodynamic properties of the model, and, on the other hand, to characterize its ISL. The basins of the potential energy surface were determined by quenching 6 × 104 instantaneous configurations separated by 105 steps, i.e., calculating a steepest-descent path from these points of the trajectory. A similar algorithm had been used earlier for the BPN model of a four-helix bundle protein (22) to detect states that could be long-lived intermediates in the folding.
Acknowledgments
We thank Fumiko Takagi for helpful advice. This work was supported by the Japan Society for the Promotion of Science and the Centre National de la Recherche Scientifique.
Footnotes
- ‡To whom correspondence should be addressed. E-mail: nah{at}mx.ibaraki.ac.jp
-
Author contributions: N.N. and M.P. performed research and wrote the paper.
-
↵ § The results from thermodynamics in the inherent structure space are not affected by a possible lack of ergodicity at low temperature because they are established from a density of states deduced from simulations around the T f. Their accuracy is determined by the quality of the sampling of the phase space in the vicinity of the T f.
-
Conflict of interest statement: No conflicts declared.
- Abbreviations:
- ISL,
- inherent structure landscape;
- Tf,
- folding temperature
Abbreviations:
- © 2006 by The National Academy of Sciences of the USA









