Shen et al. 10.1073/pnas.0800256105.

Fig. 5. Plots of fragment accuracy for GB3. For each specific GB3 segment, 200 fragment candidates were selected using either the standard ROSETTA procedure (filled triangles) or an MFR search of the 5665-protein structural database, with chemical shifts assigned by the program DC (filled circles) or SPARTA (filled diamonds). Like SPARTA, DC can readily assign chemical shifts to a large database of protein structures, but its error in predicted chemical shift is on average slightly worse than that of SHIFTX and about 17% worse than that of SPARTA. For all panels, coordinate rmsds (N, Cα, and C') between the query segment and the selected fragments are normalized with respect to randomly selected fragments (i.e., divided by the average rmsd between this target fragment and 1200 randomly selected fragments of the same length). The averaged rmsd of the 200 selected fragments is plotted as a solid line; dotted lines represent the lowest rmsd (best fragment out of 200). (A and B) Average (A) and lowest (B) rmsd of the 200 selected fragments, as a function of fragment size, relative to the NMR coordinates of the corresponding GB3 segment, averaged over all (overlapped) consecutive segments. (C and D) Average rmsd of 200 nine-residue (C) and three-residue (D) fragments relative to the x-ray coordinates, as a function of position in the GB3 sequence. (E and F) Lowest rmsd of any of these selected nine-residue (E) or three-residue (F) fragments.
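
The normalization described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the rmsd values (in Å) between the query segment and each fragment have already been computed, and the function name `normalized_rmsds` and the example numbers are hypothetical.

```python
from statistics import mean


def normalized_rmsds(selected_rmsds, random_rmsds):
    """Return (average, lowest) rmsd of the selected fragments, each divided
    by the mean rmsd of randomly chosen fragments of the same length."""
    if not selected_rmsds or not random_rmsds:
        raise ValueError("both rmsd lists must be non-empty")
    baseline = mean(random_rmsds)  # average rmsd to the random fragments
    return mean(selected_rmsds) / baseline, min(selected_rmsds) / baseline


# Hypothetical rmsd values in Å, for illustration only
selected = [0.8, 1.2, 1.0, 0.6]      # stands in for the 200 selected fragments
random_pool = [2.0, 3.0, 2.5, 2.5]   # stands in for the 1200 random fragments
avg_norm, best_norm = normalized_rmsds(selected, random_pool)
```

A normalized value below 1 means the selected fragments match the query segment better than random fragments of the same length do.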

Fig. 6. Comparison of results obtained with standard ROSETTA and CS-ROSETTA for ubiquitin and GB3. All-atom energy versus Cα rmsd of the ROSETTA models obtained using standard sequence-based ROSETTA-selected fragments (Upper) and chemical-shift-based MFR-selected fragments (Lower) for ubiquitin (Left) and GB3 (Right). All-atom energies correspond to the raw ROSETTA energy score, before rescoring using experimental chemical shifts.

Fig. 7. Plots of ROSETTA all-atom energy versus Cα rmsd relative to the experimental structures for the proteins of Table 1 not presented in Fig. 2. For each of these proteins, the upper plots show the standard ROSETTA all-atom energy versus Cα rmsd from the experimental structures (see SI Table 3), and the lower plots show the ROSETTA all-atom energy rescored by using the experimental chemical shifts (cf. Eq. 1). The model with the lowest energy, marked by an arrow, is shown in either Fig. 3 or SI Fig. 9.

Fig. 8. Plot of the χ²cs score (Eq. 1b) of CS-ROSETTA models versus Cα rmsd relative to the experimental structures for the proteins listed in Table 1.

Fig. 9. Backbone ribbon representations of the lowest-energy CS-ROSETTA model (red), superimposed on the experimental x-ray/NMR structures (blue) for the proteins listed in Table 1, with superposition optimized for ordered residues, as defined in the footnote to SI Table 3. Overlays of the 6 remaining structures are shown in Fig. 3.

Fig. 10. Plots of ROSETTA all-atom energy versus Cα rmsd of CS-ROSETTA models relative to the lowest-energy models for each of the 16 test proteins of Table 1.

Fig. 11. Plots of ROSETTA all-atom energy versus Cα rmsd of CS-ROSETTA models for the 7 proteins of SI Table 4, for which no convergence was obtained. For each protein, the upper panel presents the chemical-shift-rescored ROSETTA all-atom energy versus the Cα rmsd from the experimental structure; in the lower panels, the Cα rmsd is calculated versus the coordinates of the lowest-energy model, whose energy is marked as a bold dot on the y axis. For the nsp1 protein, the lowest-energy model is the only one out of 12,000 generated models that has the same topology as the experimental NMR structure, and even then it deviates considerably (backbone rmsd of 5.1 Å) from the experimental NMR structure.

Fig. 12. CS-ROSETTA structures generated for five structural genomics targets (Table 2). The remaining four are shown in Fig. 4. (A-E) Superposition of lowest-energy CS-ROSETTA models (red) with experimental NMR structures (blue), with superposition optimized for ordered residues, as defined in the footnote to SI Table 5. (A'-E') Plots of rescored (Eq. 1) ROSETTA all-atom energy versus Cα rmsd, calculated relative to the lowest-energy model (bold dot on y axis). (A and A') TR80; (B and B') RhR95; (C and C') PsR211; (D and D') AtR23; (E and E') NeR45A.

Fig. 13. Accuracy of the models in subsets randomly selected from the final ROSETTA all-atom models. For each protein (Table 1 and SI Table 3), the Cα rmsd values (relative to the experimentally determined reference structure) of the lowest-energy models in 100 randomly selected subsets of 5, 50, 100, 1,000, 5,000, and 10,000 models drawn from the final ROSETTA all-atom models were calculated, and the averaged values are plotted against subset size. The figure shows that for 13 of the 16 proteins, generating 5,000 ROSETTA full-atom models suffices to yield a lowest-energy model that differs by ≤0.2 Å from the lowest-energy models obtained by using 10,000-20,000 ROSETTA predictions (Table 1).
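
The subset analysis described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: each model is represented as an `(energy, rmsd)` pair, the function name `subset_accuracy` is hypothetical, and the energies and rmsd values are synthetic.

```python
import random
from statistics import mean


def subset_accuracy(models, subset_size, n_trials=100, rng=None):
    """Average, over n_trials random subsets, the rmsd of the
    lowest-energy model found in each subset.

    models: list of (energy, rmsd) pairs.
    """
    rng = rng or random.Random()
    if not 0 < subset_size <= len(models):
        raise ValueError("subset_size must be between 1 and len(models)")
    rmsds = []
    for _ in range(n_trials):
        subset = rng.sample(models, subset_size)   # draw without replacement
        lowest = min(subset, key=lambda m: m[0])   # lowest-energy model
        rmsds.append(lowest[1])
    return mean(rmsds)


# Synthetic (energy, rmsd) pairs, for illustration only: energy loosely
# increases with rmsd, plus Gaussian noise.
rng = random.Random(0)
models = []
for _ in range(2000):
    r = rng.uniform(1.0, 10.0)
    models.append((2.0 * r + rng.gauss(0.0, 1.0), r))

curve = {n: subset_accuracy(models, n, rng=rng) for n in (5, 50, 100, 1000, 2000)}
```

When the subset size equals the full pool, every trial picks the same lowest-energy model, so the curve ends at that model's rmsd; how quickly smaller subsets approach this value indicates how many models are needed.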

Table 3. Full survey of converged protein structures generated by CS-ROSETTA

| Protein name | PDB*/BMRB ID | Nα/Nβ† | Nall‡ | Ncs§ | RMSDmean¶ backbone [Å] | RMSDmean¶ all [Å] | RMSDexp║ backbone [Å] | RMSDexp║ all [Å] |
|---|---|---|---|---|---|---|---|---|
| GB3 | 2OED | 14/26 | 56(1-55) | 332 | 0.25±0.08 | 0.48±0.11 | 0.74±0.05 (0.69) | 1.43±0.05 (1.34) |
| CspA | 1MJC/4296 | 0/33 | 70(4-70) | 405 | 0.96±0.23 | 1.44±0.19 | 1.43±0.29 (1.08) | 2.25±0.33 (1.74) |
| Calbindin | 4ICB/390 | 47/0 | 75(3-74) | 435 | 0.68±0.23 | 0.90±0.21 | 1.39±0.11 (1.20) | 2.13±0.07 (1.92) |
| Ubiquitin | 1D3Z | 18/25 | 76(2-72) | 426 | 0.34±0.11 | 0.76±0.12 | 0.82±0.06 (0.75) | 1.59±0.14 (1.40) |
| XcR50 | 1TTZ/6363 | 28/16 | 76(3-73) | 352 | 0.98±0.32 | 1.37±0.39 | 1.67±0.27 (1.34) | 2.13±0.50 (2.06) |
| DinI | 1GHH | 36/21 | 81(1-77) | 463 | 0.90±0.24 | 1.16±0.25 | 1.73±0.25 (1.54) | 2.38±0.14 (2.07) |
| HPr | 1POH | 29/23 | 85(2-83) | 419 | 0.95±0.32 | 1.28±0.35 | 1.30±0.43 (0.93) | 1.99±0.37 (1.54) |
| MrR16 | 1YWX/6799 | 23/35 | 88(2-81) | 514 | 0.73±0.18 | 1.03±0.19 | 1.77±0.22 (1.61) | 2.40±0.21 (2.17) |
| TM1112 | 1O5U/5357 | 10/52 | 89(4-88) | 524 | 1.06±0.26 | 1.55±0.22 | 1.58±0.16 (1.16) | 2.30±0.14 (1.70) |
| PHS018 | 2GLW/7116 | 20/41 | 92(6-88) | 531 | 1.12±0.31 | 1.51±0.28 | 1.56±0.26 (1.08) | 2.27±0.20 (1.69) |
| HR2106** | 2HZ5/6210 | 37/25 | 96(2-92) | 470 | 0.80±0.26 | 1.10±0.22 | 1.85±0.27 (1.47) | 2.58±0.23 (2.14) |
| TM1442 | 1SBO/5921 | 41/23 | 110(5-109) | 647 | 0.66±0.31 | 1.02±0.29 | 1.22±0.27 (1.01) | 1.90±0.20 (1.60) |
| Vc0424 | 1NXI/5589 | 55/25 | 114(2-112) | 679 | 0.88±0.16 | 1.34±0.17 | 1.74±0.09 (1.35) | 2.53±0.11 (2.04) |
| Spo0F | 1SRR/5899 | 55/25 | 121(2-115) | 590 | 1.09±0.21 | 1.41±0.22 | 1.67±0.19 (1.26) | 2.30±0.13 (1.80) |
| Profilin | 1PRQ | 41/41 | 125(2-123) | 595 | 1.04±0.31 | 1.46±0.35 | 2.26±0.35 (2.02) | 2.88±0.34 (2.49) |
| Apo_lfabp | 1LFO/4098 | 15/70 | 129(5-126) | 688 | 1.36±0.35 | 1.64±0.30 | 1.72±0.55 (1.12) | 2.33±0.43 (1.68) |

* Proteins for which experimental structures were obtained by X-ray diffraction are in italic; for proteins solved by NMR the first model of the NMR ensemble is used as the experimental reference structure.
† Number of residues in α-helix and β-strand.
‡ Total number of residues. The numbers of the first and last residues involved in secondary structure are listed in parentheses; these and all intervening residues were used to superimpose structures and to calculate the RMSD values of the predicted models relative to experimental structures. For CspA, residues 39 to 46 in the flexible loop are excluded from the RMSD calculation.
§ Total number of backbone chemical shifts used for the structure prediction; no δ13C' available for XcR50, HR2106, and Spo0F; no δ1HN available for Profilin.
¶ RMSD between the 10 lowest-energy models and the mean coordinates for all backbone Cα, C', and N atoms (referred to as "Backbone") and all non-hydrogen atoms ("All").
║ RMSD between the 10 lowest-energy models and the experimental structure. The RMSDs between the mean coordinates of the 10 lowest-energy models and the experimental structures are listed in parentheses.
** Protein HR2106 is a homodimer; only the monomer conformation is predicted by CS-ROSETTA and used for comparisons.
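
The quantities defined in footnotes ¶ and ║ can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes all models and the reference are already superimposed on the selected residues (no fitting is done here), it treats the ± value as a sample standard deviation (the source does not say which spread is reported), and all function names and coordinates are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) coordinates
    that are assumed to be superimposed already."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return sqrt(sq / len(a))


def mean_coordinates(models):
    """Atom-by-atom average of the coordinates over all models."""
    n = len(models)
    return [tuple(sum(m[i][k] for m in models) / n for k in range(3))
            for i in range(len(models[0]))]


def summarize(models, reference):
    """Return mean and spread of the model-to-mean and model-to-reference
    RMSDs, plus the RMSD of the mean coordinates to the reference."""
    avg = mean_coordinates(models)
    to_mean = [rmsd(m, avg) for m in models]
    to_exp = [rmsd(m, reference) for m in models]
    return {
        "rmsd_mean": (mean(to_mean), stdev(to_mean)),
        "rmsd_exp": (mean(to_exp), stdev(to_exp)),
        "rmsd_of_mean_to_exp": rmsd(avg, reference),
    }


# Two toy two-atom "models" and a reference, for illustration only
toy_models = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
              [(-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]]
toy_reference = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
summary = summarize(toy_models, toy_reference)
```

In the toy case the mean coordinates coincide with the reference, so the RMSD of the mean is zero even though each model deviates by 1 Å. The same effect explains why the parenthesized values in the table are smaller than the per-model averages.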

Table 4. Survey of proteins for which CS-ROSETTA did not meet convergence criteria

| Protein name | PDB*/BMRB code | Nα/Nβ† | Nall‡ | Nshifts§ | Cα rmsd¶ [Å], lowest RMSD | Cα rmsd¶ [Å], lowest energy |
|---|---|---|---|---|---|---|
| HI0719 | 1J7H/5606 | 40/30 | 130 (3-129) | 733 | 4.50║ | 14.31║ |
| MTH1598 | 1JW3/5165 | 32/47 | 140 (4-139) | 830 | 3.65** | 12.17** |
| HR1958 | 1TVG/6344 | 8/73 | 140 (4-139) | 829 | 9.37†† | 16.29†† |
| CcR19 | 1T17/6120 | 37/59 | 148 (2-144) | 842 | 3.67 | 7.09 |
| YwIE | 1ZGG/6460 | 68/21 | 150 (2-145) | 851 | 3.72 | 9.37 |
| Flua | 1N0S/5756 | 26/83 | 173 (2-163) | 1022 | 5.54 | 15.57 |
| nsp1 | 2GDT/7014 | 17/33 | 116 (2-112) | 609 | 5.16‡‡ | 5.16‡‡ |

* Proteins with reference X-ray structures are in italic; for proteins solved by NMR the first model of the NMR ensemble is used as the reference structure.
† Number of residues in α-helix and β-strand.
‡ Total number of residues. The numbers of the first and last residues involved in secondary structure are listed in parentheses; these and all intervening residues were used to calculate the RMSD values of the predicted models relative to experimental structures.
§ Total number of backbone chemical shifts.
¶ Cα RMSD (relative to the experimental reference structures) for the models with the lowest RMSD and lowest energy.
║ Residues 7 to 20 and 31 to 45, which are in flexible loops, are excluded for the RMSD calculation.
** Residues 39 to 47 and 104 to 123, in flexible loops, are excluded for the RMSD calculation.
†† Flexible loop residues 17-38 are excluded for the RMSD calculation.
‡‡ Flexible loop residues 63-73 are excluded for the RMSD calculation.
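
The residue selection described in these footnotes (the span in parentheses, minus any excluded flexible loops) can be sketched as follows. The helper name `rmsd_residues` is hypothetical; it only parses the "N (first-last)" notation used in the table and is not part of any CS-ROSETTA tool.

```python
import re


def rmsd_residues(size_field, excluded=()):
    """Parse a field such as '130 (3-129)' and return the residue numbers
    from first to last (inclusive), minus any excluded inclusive ranges."""
    m = re.fullmatch(r"\s*(\d+)\s*\((\d+)-(\d+)\)\s*", size_field)
    if not m:
        raise ValueError(f"unrecognized size field: {size_field!r}")
    _total, first, last = map(int, m.groups())
    if first > last:
        raise ValueError("first residue must not exceed last residue")
    skip = set()
    for lo, hi in excluded:
        skip.update(range(lo, hi + 1))   # ranges are inclusive
    return [r for r in range(first, last + 1) if r not in skip]


# HI0719: span 3-129, with loop residues 7-20 and 31-45 excluded (footnote ║)
residues = rmsd_residues("130 (3-129)", [(7, 20), (31, 45)])
```

For HI0719 this leaves 98 of the 127 residues in the 3-129 span for the Cα RMSD calculation.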

Table 5. Survey of protein structures generated by CS-ROSETTA and independently by the NESG consortium

| Protein name | RpT7 | StR82 | RhR95 | NeT4 | TR80 | VfR117 | PsR211 | AtR23 | NeR45A‡‡ |
|---|---|---|---|---|---|---|---|---|---|
| UniProt ID | Q6N4D8_RHOPA | Q04822_SALTY | Q3IZ23_RHOS4 | Q82V59_NITEU | RLX_METTH | Q5E7H1_VIBF1 | Q885L4_PSESM | Q8UEE9_AGRT5 | Q82VF2_NITEU |
| PDB/BMRB ID | 2jtv | 2jt1 | 2jvm | 2jv8 | 2jxt | 2jvw | 2jva | 2yja | 2jxn |
| Protein size* | 65(2-63) | 69(5-69) | 72(22-68) | 73(3-66) | 78(5-77) | 80(15-75) | 100(2-100) | 101(2-78) | 147(16-143) |
| M.W. [kDa]* | 7.8 | 8.0 | 8.5 | 8.7 | 9.8 | 10.2 | 11.6 | 10.8 | 15.4 |
| Nα/Nβ† | 38/15 | 36/10 | 4/19 | 11/18 | 23/31 | 43/0 | 29/21 | 11/25 | 41/52 |
| NCS | 345 | 400 | 405 | 429 | 357 | 468 | 589 | 569 | 765 |
| Predicted models‡ | | | | | | | | | |
| RMSDbb/RMSDall§ [Å] | 0.73±0.10 / 1.25±0.18 | 0.24±0.09 / 0.53±0.13 | 0.68±0.26 / 1.26±0.26 | 0.47±0.15 / 1.05±0.15 | 0.44±0.11 / 0.84±0.11 | 0.68±0.16 / 1.15±0.22 | 1.34±0.27 / 1.72±0.24 | 1.19±0.67 / 1.73±0.65 | 0.83±0.17 / 1.29±0.14 |
| Ramachandran plot¶,§ [%] | 98/2/0/0 | 98/2/0/0 | 95/5/0/0 | 90/10/0/0 | 96/4/0/0 | 96/4/0/0 | 95/5/0/0 | 96/4/0/0 | 95/5/0/0 |
| Procheck G-factor§, φ&ψ/All | 0.20/0.38 | 0.47/0.56 | -0.26/0.11 | -0.13/0.21 | -0.1/0.16 | 0.50/0.56 | 0.11/0.27 | -0.12/0.20 | -0.01/0.21 |
| MOLPROBITY clash score§ | 6.71 | 7.28 | 4.40 | 1.98 | 3.62 | 4.50 | 6.38 | 4.41 | 3.34 |
| DP score§ [%] | 69 | 65 | 55 | 57 | 67 | 37 | 57 | 60 | 53 |
| NMR ensembles | | | | | | | | | |
| RMSDbb/RMSDall§ [Å] | 0.32±0.05 / 0.97±0.09 | 0.50±0.09 / 1.02±0.10 | 0.50±0.11 / 0.91±0.11 | 0.42±0.07 / 0.94±0.09 | 0.42±0.08 / 0.87±0.08 | 0.59±0.10 / 1.17±0.11 | 0.58±0.10 / 0.96±0.10 | 0.42±0.08 | 0.70±0.08 / 1.22±0.07 |
| Ramachandran plot¶,§ [%] | 97/3/0/0 | 97/3/0/0 | 92/7/1/0 | 85/13/1/1 | 92/8/0/0 | 94/6/0/0 | 93/7/0/0 | 90/10/0/0 | 90/10/0/0 |
| Procheck G-factor§, φ&ψ/All | 0.20/0.07 | 0.14/0.12 | -0.44/-0.31 | -0.31/-0.32 | -0.31/-0.20 | 0.17/0.19 | -0.09/-0.16 | -0.32/-0.32 | -0.34/-0.35 |
| MOLPROBITY clash score§ | 20.89 | 19.20 | 12.73 | 29.01 | 19.80 | 14.65 | 16.64 | 11.2 | 20.44 |
| DP score§ [%] | 72 | 78 | 80 | 70 | 85 | 81 | 80 | 76 | 71 |
| Expert time║ [days] | 15 | 15 | 17 | 12 | 15 | 20 | 14 | 25 | 35 |
| RMSDbb** [Å] | 0.64 | 0.57 | 0.66 | 0.70 | 0.69 | 0.60 | 2.07 | 1.10 | 2.03‡‡ |

|
RMSDall†† [Å] |
1.29 |
1.14 |