electronic reprint Journal of

Applied Crystallography ISSN 0021-8898

Editor: Anke R. Pyzalla

Optimal side-chain packing in proteins and crystallographic refinement Swanand Gore and Tom Blundell

J. Appl. Cryst. (2008). 41, 319–328

Copyright © International Union of Crystallography. Author(s) of this paper may load this reprint on their own web site or institutional repository provided that this cover page is retained. Republication of this article or its storage in electronic databases other than as specified above is not permitted without prior permission in writing from the IUCr. For further information see http://journals.iucr.org/services/authorrights.html



Swanand Gore* and Tom Blundell

Received 17 December 2007
Accepted 16 January 2008

© 2008 International Union of Crystallography. Printed in Singapore – all rights reserved

Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK. Correspondence e-mail: [email protected]

Amino acid side chains often adopt one of a few distinct, physicochemically favourable conformational states called rotamers. Rotameric preferences and compact packing are sufficient to estimate the conformations of most side chains, as demonstrated by approaches such as SCWRL in homology modelling, but such algorithms have not yet been applied to protein crystallographic refinement. SCWRL’s combinatorial optimization algorithm was adapted for assigning side-chain rotameric states that maximize the electron density map occupation while minimizing steric clashes. Our program (OPSAX) was tested on five proteins by introducing error in main chains and comparing the subsequent CNS-only and CNS/OPSAX refinements. The latter refinement was also extended to multiconformer models. A sequence-assignment exercise examined whether CNS/OPSAX refinement can discriminate between correct and incorrect assignments at various artificially lowered resolutions. The composite CNS/OPSAX refinement yielded better Rfree values than CNS-only refinement. A further drop in Rfree was observed with multiconformer refinement. For complete main chains, the correct sequence could be discriminated efficiently in most cases even for a low observation-to-parameter ratio of 4, indicating that the OPSAX approach should find useful applications in protein X-ray refinement.

doi:10.1107/S0021889808001672

1. Introduction

Amino acid side chains are vital to the folding, function and dynamics of proteins because they are the main source of physicochemical diversity therein. Two salient features of side chains are their rotamericity and their compact packing in globular protein domains. Rotamericity is the tendency to be in one of a few distinct, locally optimal conformations and was hypothesized from small-molecule analysis (Sasisekharan & Ponnuswamy, 1971). With the increasing size of the Protein Data Bank (PDB), this hypothesis was statistically confirmed by many studies, leading to the compilation of rotamer libraries for use in homology modelling (Tuffery et al., 1997; De Maeyer et al., 1997; Dunbrack & Cohen, 1997; Lovell et al., 2000). Rotamer libraries describe a set of discrete states for side chains and their propensities, either dependent on or independent of the backbone conformation. Packing of side chains in globular protein domains is complementary and compact, like a jigsaw puzzle (Richards, 1973; Levitt et al., 1997).

Taken together, packing and rotamericity can determine the conformations adopted by side chains. This has been demonstrated by side-chain placement programs (De Maeyer et al., 1997; Canutescu et al., 2003; Xu, 2005; Smith et al., 2007), which find the optimal balance between rotamer propensity score, van der Waals energy and sometimes evolutionary information. These methods are capable of finding the global minimum of a composite function consisting of the self-energy of rotameric states (rotameric propensity, conservation scores from homologous templates) and pairwise energies (van der Waals energy between close rotamers). They rely on techniques such as dead-end elimination, branch-and-bound and graph decomposition to reduce the search space and avoid futile energy calculations.

Side-chain optimization programs are valuable in homology modelling because main-chain conformers can be predicted from templates more accurately than side chains, and because side chains are densely packed. For the same reasons, they could also be useful in protein crystallography, for example in the following scenarios. Firstly, refinement of a model with an approximately known main chain can be automated with an optimal side-chain assignment procedure. Secondly, side-chain heterogeneity can be gauged by running the procedure several times and performing a multiconformer refinement. Thirdly, hypotheses about the sequence registry of a full or fragmented main chain can be quickly tested.

Here we assess the value of optimal side-chain assignment in these crystallographic scenarios. We begin by describing the side-chain placement procedure (termed OPSAX, for optimal side-chain assignment for crystallography) and its implementation within the RapperTK framework (Gore & Blundell, 2007). Then we describe and compare the composite (CNS/OPSAX) refinement protocol against the CNS-only protocol (Brünger et al., 1998) and show that the former generally outperforms the latter when refinement statistics are compared. Multiconformer refinement with two and four copies is performed to observe a further improvement in refinement. We then ask whether the correct sequence registry can be efficiently discriminated with simulated low-resolution data sets for complete and fragmented main chains.

Figure 1 A side-chain graph (left) is prepared by checking which side chains have clashing rotamers. This graph is decomposed into biconnected subgraphs and the articulation vertices are identified. The biconnected subgraphs can be represented as a graph (right); two subgraphs are connected if they share an articulation vertex. The biconnected subgraphs are ordered using a topological sort and solved in that order. The final rotamer assignment is found by traversing the subgraphs in the reverse order.
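The decomposition summarized in the Fig. 1 caption can be illustrated with a short sketch. The paper itself uses the Boost Graph Library for this step; the pure-Python function below is only a stand-in based on the classical depth-first (Hopcroft–Tarjan) algorithm, and the name `biconnected_components` and the toy edge list are hypothetical. Following the definition used in this paper, a vertex is reported as an articulation vertex when it belongs to more than one biconnected subgraph.

```python
from collections import defaultdict


def biconnected_components(edges):
    """Return (components, articulation) for an undirected simple graph.

    edges: iterable of (u, v) pairs, e.g. pairs of side chains whose
    rotamers can clash. Recursive DFS, so intended for small graphs only.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    depth, low = {}, {}
    edge_stack, components = [], []

    def dfs(u, parent, d):
        depth[u] = low[u] = d
        for v in adj[u]:
            if v == parent:
                continue
            if v not in depth:                 # tree edge
                edge_stack.append((u, v))
                dfs(v, u, d + 1)
                low[u] = min(low[u], low[v])
                if low[v] >= depth[u]:         # u cuts off v's subtree
                    comp = set()
                    while True:
                        a, b = edge_stack.pop()
                        comp.update((a, b))
                        if (a, b) == (u, v):
                            break
                    components.append(comp)
            elif depth[v] < depth[u]:          # back edge to an ancestor
                edge_stack.append((u, v))
                low[u] = min(low[u], depth[v])

    for root in list(adj):
        if root not in depth:
            dfs(root, None, 0)

    membership = defaultdict(int)
    for comp in components:
        for v in comp:
            membership[v] += 1
    articulation = {v for v, count in membership.items() if count > 1}
    return components, articulation


# Toy side-chain graph: two triangles sharing vertex 2, plus a pendant edge.
toy_edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2), (4, 5)]
components, articulation = biconnected_components(toy_edges)
```

In this toy graph, vertices 2 and 4 are the articulation vertices, so the three subgraphs could be solved one after another, each conditioned on the rotamer of the vertex it shares with its neighbour.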

2. Side-chain reassignment procedure

This procedure is inspired by SCWRL (Canutescu et al., 2003), a popular and very effective approach to optimal side-chain reassignment, which we find difficult to better without the use of evolutionary information. For clarity and completeness, we describe our implementation, which is a new module written in our conformation-sampling software RapperTK.

(1) We use rotamers from the Penultimate Rotamer Library (Lovell et al., 2000) in this work. For each side chain to be assigned, steric clashes of all of its rotamers with the rest of the system are checked. All clashing rotamers are removed.

(2) The self-energy of a rotamer is the average value of the electron density in the 2Fo − Fc map over grid points that lie within 1 Å of atoms in the rotamer. A 1 Å neighbourhood is chosen because it seems to be large enough to allow averaging over the surrounding density while avoiding density that belongs to atoms not in the rotamer. The pairwise energy of rotamers of two different side chains is defined as 1 or 0 depending on whether they sterically clash or not. The van der Waals radii of rotamer atoms are reduced by 50% while calculating the steric clashes, following previous work (DePristo et al., 2003). This is necessary because our experience shows that using the full radius of side-chain rotamers leads to excessive steric clashing, sometimes preventing all-atom protein modelling.

(3) Rotamers are further removed using dead-end elimination (Goldstein DEE; Goldstein, 1994). For a side chain, a set of neighbouring side chains which may sterically clash with it are identified (denoted Nbrs). Then two rotamers $a$, $b$ of this side chain are compared by checking whether

$$\epsilon(a, i) - \epsilon(b, i) + \sum_{j \neq i}^{\mathrm{Nbrs}} \min_{c} \left[ \epsilon(a, i, c, j) - \epsilon(b, i, c, j) \right] > 0, \qquad (1)$$

where $\epsilon(a, i)$ is the self-energy of the rotamer $a$ of residue $i$, and $\epsilon(a, i, c, j)$ is the pairwise energy of rotamer $a$ of residue $i$ and rotamer $c$ of residue $j$. If this relation holds, then rotamer $a$ is said to be dominated by rotamer $b$ and is removed from further calculations.

(4) Side-chain pairs with the possibility of steric clash are connected to form a side-chain graph (Fig. 1). A connected graph is biconnected when removal of any one vertex does not disconnect the graph. Articulation vertices are those that belong to more than one biconnected subgraph in a graph.




Figure 2 Branch-and-bound in OPSAX. For each rotameric state of the articulation vertex of a biconnected subgraph with N + 1 nodes, there are N assignments to be decided in order to solve the component. This can be viewed as an N-level tree, each level corresponding to an assignment. A search of this tree can be pruned to make it more efficient. The best assignment in the tree seen so far is remembered (Ebest). At the current stage of a search, the best possible energy from the subtree below is computed and added to the present energy. If this sum is worse than Ebest then the subtree is not searched further.
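The pruning idea in Fig. 2, preceded by the Goldstein dominance test of equation (1), can be sketched in a few lines of Python. This is an illustrative toy, not the OPSAX implementation. It treats one small component without an articulation vertex, assumes that energies are to be minimized and that pairwise energies are non-negative (as with 0/1 clash terms), and takes every other residue as a potential neighbour. The function names and the numerical values are hypothetical.

```python
def make_pair(pair_e):
    """Symmetric lookup into a sparse dict {(i, a, j, b): energy}; absent pairs cost 0."""
    def pair(i, a, j, b):
        return pair_e.get((i, a, j, b), pair_e.get((j, b, i, a), 0.0))
    return pair


def goldstein_dee(self_e, pair):
    """Remove rotamer a of residue i whenever some rotamer b dominates it."""
    n = len(self_e)
    alive = [list(range(len(rots))) for rots in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for a in list(alive[i]):
                for b in alive[i]:
                    if b == a:
                        continue
                    gap = self_e[i][a] - self_e[i][b] + sum(
                        min(pair(i, a, j, c) - pair(i, b, j, c) for c in alive[j])
                        for j in range(n) if j != i
                    )
                    if gap > 0:              # a can never beat b: discard it
                        alive[i].remove(a)
                        changed = True
                        break
    return alive


def branch_and_bound(self_e, pair, alive):
    """Depth-first search over rotamer assignments with lower-bound pruning."""
    n = len(self_e)
    best = {"energy": float("inf"), "assign": None}

    def bound(level, assign):
        # Optimistic cost of the unassigned residues given the fixed ones.
        total = 0.0
        for i in range(level, n):
            total += min(self_e[i][a] for a in alive[i])
            for k in range(level):
                total += min(pair(i, a, k, assign[k]) for a in alive[i])
        return total

    def search(level, assign, energy):
        if level == n:
            if energy < best["energy"]:
                best["energy"], best["assign"] = energy, tuple(assign)
            return
        if energy + bound(level, assign) > best["energy"]:
            return                           # subtree cannot improve on the best leaf
        for a in alive[level]:
            step = self_e[level][a] + sum(
                pair(level, a, k, assign[k]) for k in range(level))
            search(level + 1, assign + [a], energy + step)

    search(0, [], 0.0)
    return best["energy"], best["assign"]


# Toy component: three residues; a pair energy of 1 marks a steric clash.
self_e = [[-1.0, -0.8], [-0.9, -0.2], [-0.5, -0.6, 0.3]]
pair = make_pair({(0, 0, 1, 0): 1.0, (1, 0, 2, 1): 1.0})
alive = goldstein_dee(self_e, pair)
energy, assignment = branch_and_bound(self_e, pair, alive)
```

Here the dominance test removes the third rotamer of the last residue before the search starts, and the bound then cuts one of the four second-level branches. A production solver would repeat the search once per rotamer of the articulation vertex, as the caption describes.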


Such biconnected subgraphs and articulation vertices are identified by using the Boost Graph Library (Siek et al., 2002). The biconnected subgraphs and articulation points can be viewed as vertices and edges in a complementary graph, which is then topologically sorted to order the biconnected subgraphs. These subgraphs are then solved in that order.

(5) A biconnected subgraph has an articulation vertex assigned to it. Solving a subgraph means finding the best rotamer assignment for the subgraph for each rotamer of the articulation vertex. A biconnected subgraph is solved with a branch-and-bound technique. In principle, the number of possible rotamer assignments in a subgraph grows exponentially with the number of vertices. This can be viewed as a tree (Fig. 2), but most of the leaves of the tree are not feasible because of steric clashes. In order to avoid visiting all possible rotamer assignments for the components, at each stage of the search, the best possible energy from unassigned nodes in the component is calculated in the context of currently assigned nodes. This is the bounding energy:

$$E_{\mathrm{bound}} = \sum_{i>j} \min_{a} \epsilon(i, a) + \sum_{i>j} \sum_{k \le j} \min_{a,b} \epsilon(i, a, k, b). \qquad (2)$$

If $E_{\mathrm{bound}} + E_j > E_{\mathrm{best}}$, where $E_j$ is the energy of the assignments made so far, then the current subtree of the search is abandoned, as it is not possible to obtain a solution better than that already known from that subtree. When a subgraph is solved, the self-energy of each rotamer of its articulation vertex is replaced by the minimum energy of the subgraph with the articulation vertex in that rotameric state, and the corresponding subgraph rotamer assignment is remembered.

(6) After all subgraphs have been solved in this way, the biconnected subgraphs are visited in reverse order. For each articulation vertex assigned to the subgraph, the least-energy rotamer is now known. This process is used to assign rotamers to all vertices; this is the global minimum assignment for this energy function.

(7) Unconnected side chains are assigned the least-energy rotamer.

3. Single- and multiconformer refinement

3.1. Refinement protocols

The goal of this exercise is to compare the CNS-only and CNS/OPSAX refinement procedures to assess the value of the side-chain placement procedure (Fig. 3). The deposited structure is perturbed by carrying out an all-atom Cα trace (DePristo et al., 2003; Gore & Blundell, 2007) on the structure. Cα tracing is a procedure that samples a protein chain sequentially from the N to the C terminus using a combined genetic-algorithm and branch-and-bound (GABB) technique. It employs a residue-specific fine-grained φ–ψ library and a rotamer library for sampling the main chain and side chains, respectively. Every Cα atom is restrained positionally within a sphere centred on the given Cα coordinates and having a radius called the Cα restraint radius or the trace radius. Side chains can also be positionally restrained in this procedure, but that has not been done here. After tracing, B factors of all protein atoms are reset to 30, a typical B-factor value, as in DePristo et al. (2004).

In CNS-only refinement, the trace is refined with ten rounds of CNS, each round consisting of three cycles of molecular dynamics/simulated annealing (MDSA) starting from 5000 K. At the start of CNS/OPSAX composite refinement, Cβ and side-chain atoms are removed and the deposited structure factors are phased with the main chain. Side chains are reassigned within this map after calculating the Cβ atom positions. This is followed by the same CNS refinement as in the CNS-only protocol. In subsequent CNS/OPSAX rounds, all side chains are first scored by the correlation coefficient between the Fc and 2Fo − Fc maps on grid points within about 1 Å of the side-chain atoms. Only the side chains with a correlation lower than 0.9 are reassigned. Whenever a rotamer is reassigned, its B factors are set to 30; otherwise they are copied from the previous round. In both protocols, the structures with the best Rfree are considered for further analysis.

Figure 3 Single-conformer refinement flowchart. The difference between the CNS-only (left) and CNS/OPSAX (right) protocols is that the ill-fitting side chains are identified and reassigned in the latter.

The CNS/OPSAX protocol is extended to multiconformer models with two and four copies. In the two-copy case, two Cα traces are generated and stripped of their side chains and Cβ atoms. They are assigned a partial occupancy of 0.5 and the deposited structure factors are phased with this model. Cβ and side-chain atoms are built for both traces with this map. All-atom models with partial occupancy are refined with CNS. In subsequent rounds, in a way similar to single-conformer models, each copy is assessed to detect ill-fitting side chains and rebuilt independently of the others, but the copies are subjected to CNS refinement together as a multiconformer model. This is depicted in Fig. 4. A similar protocol is used for the four-copy case.

Figure 4 Multiconformer refinement. 2Fo − Fc maps are calculated using the composite model and equal partial occupancies. A composite model at the ith stage is split into single conformers. Ill-fitting side chains in single conformers are identified and reassigned using the composite map. Rebuilt conformers are combined into a multiconformer model and refined with CNS in three cycles of MDSA starting at 5000 K.

Single- and multiconformer refinements are carried out over a range of Cα trace radii. Refinement with each radius is repeated five times. The data set used in this work is shown in Table 1.
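The side-chain selection rule of Section 3.1 (reassign a side chain when the correlation between calculated and observed density near its atoms falls below 0.9) can be sketched as follows. This is a minimal illustration under stated assumptions. The maps are represented as plain dictionaries from grid-point coordinates to density values rather than as real crystallographic maps, the correlation is taken to be the Pearson coefficient, and the helper names and toy densities are hypothetical.

```python
import math


def points_near(atoms, grid_points, radius=1.0):
    """Grid points lying within `radius` (in Å) of any atom of the side chain."""
    r2 = radius * radius
    return [p for p in grid_points
            if any(sum((pc - ac) ** 2 for pc, ac in zip(p, atom)) <= r2
                   for atom in atoms)]


def correlation(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(xs)
    if n < 2:
        return 0.0                      # too few points: treat as a poor fit
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0                      # flat density: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def side_chains_to_rebuild(side_chains, calc_map, obs_map, cutoff=0.9, radius=1.0):
    """Names of side chains whose local map correlation is below `cutoff`."""
    flagged = []
    for name, atoms in side_chains.items():
        pts = points_near(atoms, calc_map, radius)
        cc = correlation([calc_map[p] for p in pts], [obs_map[p] for p in pts])
        if cc < cutoff:
            flagged.append(name)
    return flagged


# Toy one-dimensional "maps" sampled every 0.5 Å along x.
xs = [i * 0.5 for i in range(-2, 13)]
calc_map = {(x, 0.0, 0.0): max(0.0, 1 - abs(x), 1 - abs(x - 5)) for x in xs}
# Side chain A sits in matching density; side chain B sits in a density hole.
obs_map = {p: (2 * v if p[0] < 2.5 else 1 - v) for p, v in calc_map.items()}
side_chains = {"A": [(0.0, 0.0, 0.0)], "B": [(5.0, 0.0, 0.0)]}
flagged = side_chains_to_rebuild(side_chains, calc_map, obs_map)
```

Only side chain B is flagged here, because its calculated density peaks where the observed density is lowest. The same `points_near` selection could also be reused to average the 2Fo − Fc density for the self-energy term of Section 2.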

Figure 5 The improvement in Rfree calculated with the side-chain placement protocol as compared with a CNS-only protocol for (a) 9ilb, (b) 1aac, (c) 1kx8, (d) 1g35 and (e) 8cho. Each CNS run consists of two cycles of MDSA starting at 5000 K with isotropic B factor refinement and a maximum likelihood target.


3.2. Single-conformer refinement

Fig. 5 shows the mean Rfree value plotted against the trace radius for all proteins. For the single-conformer case, we observe that the Rfree values are consistently better in CNS/OPSAX refinement than in CNS-only refinement over all trace radii, with the improvements more noticeable in 1aac, 1g35 and 8cho.

Fig. 6 shows the comparison of rotameric states between the deposited structure and single-conformer models. The χ1 and χ1,2 values of the structure and a model are compared by defining χ1 to be similar for two corresponding residues if they are within 40° of one another, and by defining χ1,2 to be similar if both χ1 and χ2 for two corresponding residues are within 40° of one another. Clearly, the rotamer accuracies depend on the quality of the main chain. Fig. 6 shows that the CNS/OPSAX procedure models rotamers similar to those in the deposited structure when the resolution is high, but this is not so as the resolution decreases; when compared with the rotamer statistics of the CNS-only protocol, the rotamer statistics of CNS/OPSAX are better for 1aac and 1g35, comparable for 1kx8, and worse for 9ilb and 8cho. This is perhaps expected because the rotamericity of low-resolution structures deposited in the PDB tends to be lower than that of high-resolution structures, and OPSAX uses only rotameric side chains in the reassignment procedure. Fig. 7 shows the reduction in rotamer accuracies as resolution drops. The HIV protease structure (1g35) does not follow this trend, but this can be explained by the existence of a large ligand (inhibitor AHA024). In the CNS/OPSAX refinement protocol, the ligand is copied from the deposited structure at the beginning and then copied from the previous to the present iteration. This indirectly informs CNS about incorrectly placed rotamers. With the exception of this outlier, a lowering of the mean rotamer accuracy and an increase in its standard deviation is a clear tendency, indicating that lower-resolution data accommodate greater variability.

The improvement in the CNS refinement as a result of using the OPSAX procedure is primarily due to the fact that CNS alone rarely relocates bulky side chains into better density. Because of the gradient-driven nature of CNS, relocation into better density is likely only if there is a continuous favourable gradient towards a correct solution, but this is sometimes not the case because of intermediate regions of low density. OPSAX relocates such side chains whenever permitted by rotamericity and excluded-volume restraints. An ill-fitting side chain will be moved out of a patch of density if there is another side chain which fits better. This process enriches the model with more well placed rotamers and leads to better refinement.

However, the best Rfree statistics do not indicate native-like refinement, but rather show a dependence on the quality of the main chain provided at the start of refinement. This is because OPSAX does not modify incorrectly placed parts of the main chain. For this reason, refinement statistics rarely become comparable to those of the native, but in the rare instances when they do, the corresponding models can indicate structural heterogeneity. For instance, in the 1kx8 case (see Fig. 8), we observed that three of the 15 models generated had Rfree values (0.287, 0.287, 0.284) comparable to that of the native (0.283), yet each model differed from the deposited structure in different regions. This example suggests that OPSAX can be a quick way to identify regions of structural heterogeneity.

3.3. Multiconformer refinement

Previous work (e.g. Wilson & Brunger, 2000) has identified two ways to describe structural heterogeneity better than with a single-conformer isotropic B factor model: a single-conformer anisotropic B factor model or a multiconformer isotropic B factor model with partial occupancies on each conformer. The previous section suggested that a collection of single-conformer isotropic B factor models generated with the CNS/OPSAX protocol can give clues about structural heterogeneity. This encouraged us to extend the protocol to multiconformer refinement for deriving an ensemble.

Multiconformer refinement is generally expected to yield better refinement statistics than its single-conformer counterpart because the refinement process has more parameters to fit to the experimental data. Fig. 5 shows that this is true in all five cases for the two-copy refinement, but contrary to expectations, four-copy refinement is not consistently better than two-copy refinement. This can be explained as follows: (a) all conformers have different and erroneous main chains, (b) owing to the nature of the derivative calculation procedure, errors in a conformation are not corrected as vigorously as in a single-conformer case, (c) over-fitting may occur as a result of there being too many parameters, and (d) the relationship between multiple conformers and their combined density is not as clear as that between a single conformer and its density.

In spite of these difficulties, sometimes the refinement yields Rfree values slightly lower than that of the native model, and the resulting ensemble may be considered as good an interpretation of the diffraction data as the deposited model. We find four such cases, two each for 1kx8 and 1aac; the two cases for each protein consist of a two- and a four-conformer refinement (Fig. 9). Both ensembles show variability in the same regions, lending credibility to the interpretations. The four-conformer ensemble shows more variation than the two-conformer ensemble. An interesting case of concerted conformational change is observed in the four-conformer 1kx8 case (Fig. 10). Residues Tyr98, Trp81, Leu73 and His72 are systematically different in one of the copies, while the other copies are very similar to each other. It is unclear whether this is an artefact of the CNS/OPSAX protocol or a genuine concerted movement. In any case, the observed ensemble refinements suggest that the protocol developed here is useful for deriving multiconformer interpretations.

Table 1 The five-protein set used in this work. #AA denotes the number of amino acid residues. CDPI is the Cruickshank DPI value [Murshudov & Dodson, 1997, equation (3)]. The number and percentage of loop residues are also listed.

PDB id   Name                         #AA   No. (%) of loop AA   Resolution (Å)   No. of reflections   CDPI (Å)
1aac     Amicyanin                    105   64 (61)              1.31             21199                0.002
1g35     HIV protease                 198   95 (48)              1.80             19059                0.022
8cho     Δ5-3-Ketosteroid isomerase   174   40 (23)              1.47             7439                 0.114
9ilb     Interleukin-1β               153   82 (54)              2.28             9535                 0.036
1kx8     Chemosensory protein A6      109   20 (18)              2.80             5096                 0.261

Figure 6 Side-chain (χ1, χ1,2) accuracy statistics for CNS/OPSAX and CNS-only protocols for (a) 9ilb, (b) 1aac, (c) 1kx8, (d) 1g35 and (e) 8cho.

Figure 7 Accuracies of (a) χ1 and (b) χ1,2 as a function of resolution. The accuracy generally drops with resolution and the sizes of the error bars increase, indicating greater variability accommodated by low-resolution data. Note that the increasing order of resolution is 1aac, 8cho, 1g35, 9ilb and 1kx8. The bump for 1g35 is due to a large ligand.
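The χ-angle agreement criterion behind the accuracy statistics of Figs. 6 and 7 (angles within 40° of one another) can be written down compactly. The sketch below is an assumption-laden illustration rather than the authors' script. Side-chain angles are supplied as a dictionary from a residue label to a (χ1, χ2) pair in degrees, χ2 is `None` for residues that lack it, the χ1,2 accuracy is computed only over residues with both angles defined, and no allowance is made for symmetric side chains. All names and values are hypothetical.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def chi_accuracy(model, reference, tol=40.0):
    """Fractions of residues whose chi1, and chi1 plus chi2, agree within `tol` degrees."""
    n1 = ok1 = n12 = ok12 = 0
    for res, (ref1, ref2) in reference.items():
        if res not in model:
            continue
        mod1, mod2 = model[res]
        same1 = angle_diff(mod1, ref1) <= tol
        n1 += 1
        ok1 += same1
        if ref2 is not None and mod2 is not None:
            n12 += 1
            ok12 += same1 and angle_diff(mod2, ref2) <= tol
    return (ok1 / n1 if n1 else 0.0, ok12 / n12 if n12 else 0.0)


# Hypothetical angles for three residues of a deposited structure and a model.
reference = {"L10": (-60.0, 180.0), "S11": (60.0, None), "F12": (180.0, 90.0)}
model = {"L10": (-70.0, 170.0), "S11": (-60.0, None), "F12": (-175.0, -80.0)}
chi1_acc, chi12_acc = chi_accuracy(model, reference)
```

In this example two of three residues agree in χ1 and one of the two residues with a χ2 angle agrees in both. The wrap-around handling in `angle_diff` matters for pairs such as 180° and −175°, which differ by only 5°.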

Figure 8 Structural heterogeneity identified for 1kx8. (a) The residue-wise all-atom r.m.s. deviation in the three models, with Rfree comparable to the deposited value. (b) The deposited model as green sticks and the three models as lines. Heterogeneity occurs at different residues in different models.

Figure 9 Two-conformer (left) and four-conformer (right) ensemble interpretations of 1aac (top) and 1kx8 (bottom), having a lower Rfree value than the deposited structure.

Figure 10 Conformational variability in the 1kx8 multiconformer refined model with four copies. The Rfree values are comparable for the multiconformer and deposited models. Tyr98, Trp81, Leu73 and His72 are significantly different in one of the copies (sticks) from the other three (lines) and the deposited model (red lines). This can be interpreted as a concerted motion that occurs rarely.

4. Sequence assignment

Typical protein crystallographic refinement begins with approximate identification of a complete or fragmented main chain, followed by sequence identification and detailed all-atom iterative refinement. At medium and low resolutions, sequence assignment on an approximate main chain becomes difficult because electron density contours may not contain much shape information and side-chain densities may be absent or merged. Pattern-matching techniques, exploitation of knowledge of the structures of homologues and trial and error all play a part in addressing this problem. In theory, all possible assignments can be made and refinement carried out to find the correct one, but it is easy to see that even five main-chain fragments create a combinatorial explosion which cannot be addressed with brute force.

Here we investigate whether efficient testing of a sequence assignment is possible at low resolution when the main chain is known only approximately. The same five proteins used in previous sections are used here, but with artificially lowered resolution such that the observation-to-parameter ratio drops to three. The observation-to-parameter ratio is calculated as the ratio of the number of observed X-ray reflections to the number of protein atoms. Resolution is lowered by gradually removing the high-resolution reflections until the desired observation-to-parameter ratio is obtained. On a 2 Å Cα-traced main chain, 21 different sequences are assigned and refined. The 21 sequences consist of the correct sequence and the correct one offset by up to ten residues in either direction. The terminal region left vacant by offsetting is assigned a random sequence. The CNS refinement carried out here is shorter than the previous protocol: a round consists of two cycles of MDSA starting at 3000 K. After initial sequence assignment, five rounds of such CNS/OPSAX refinement are carried out. A sequence discrimination exercise consists of 21 sequence trials and it is repeated 25 times to obtain a statistical estimate of the success rate in identification of the correct sequence.

An exercise is successful if Rfree for the correct sequence is lower than those of the other 20 incorrect sequences. Fig. 11 shows the success rate (i.e. discriminating power) in identifying correct sequences as a function of the X-ray observation-to-parameter ratio. Even for a small X-ray observation-to-parameter ratio of 4, the discriminating power is more than 80% for four of the five proteins. In other words, in four of five attempts, the correct sequence refines to a smaller Rfree than other sequences. Thus the CNS/OPSAX protocol is highly effective in identifying a correct sequence.

As expected, the discriminatory power increases with increasing X-ray observation-to-parameter ratio. The time required to refine a combination of sequence and main chain for an X-ray observation-to-parameter ratio of 4 is around 20 min, i.e. around 7 h for 21 sequence/main-chain combinations, suggesting that the protocol is also efficient. Investigation of the 1aac case, where the protocol does not work well, revealed that too many X-ray observations were removed in the artificial lowering of resolution, and the protocol worked satisfactorily when more data were used, e.g. the correct registry was identified 60 and 80% of the time for X-ray observation-to-parameter ratios of 10 and 20, respectively.

Figure 11 (a) Success rates of registry recognition and (b) average time required for refining a main chain for a registry, for various X-ray observation-to-parameter ratios. Observations are removed gradually, those in higher-resolution shells before those in lower-resolution shells. All main chains were generated with 2 Å restraints. For a main chain, 20 incorrect registries are tried in addition to the correct one. Each CNS run consists of two cycles of MDSA starting at 3000 K with isotropic B factor refinement with a standard residual target.

Figure 12 Success rates of registry recognition in the fragmented main-chain case. Only the residues in secondary structure elements are used in the refinement. Discrimination of the correct sequence is correlated with the fraction of secondary structure residues.

We attempted to extend this exercise further to fragmented main chains. We assumed that secondary structure elements were approximately known and we repeated the same procedure as before to estimate the discriminatory power. Loop residues were not included in the model during any round of refinement. Our results confirm our expectation that this case is more challenging than the complete main chain (Fig. 12). For a low-resolution scenario, the success rate is better than 50% only for 1kx8. Even when all reflections are used, only 1kx8 and 8cho have satisfactory success rates. This observation is consistent with the secondary structure content: both 1kx8 and 8cho have nearly 80% of their residues in secondary structure compared with around 50% for the others (Table 1). This suggests that in the fragmented main-chain case, the packing restraint is lost because of the exclusion of side chains and main chains of loop residues, and that the packing restraint is important to drive refinement to a nonrandom Rfree. Perhaps rotameric propensities would be helpful here to compensate for loss of packing, and this approach needs to be further investigated.
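The bookkeeping of the registry exercise in Section 4 can be sketched as follows. The code builds the 21 trial sequences (the correct one plus offsets of up to ten residues in either direction, with the vacated terminus filled randomly), computes how many reflections to retain for a target observation-to-parameter ratio, and applies the success criterion to a set of Rfree values. The refinement itself is not modelled; the function names, the example sequence and the Rfree numbers are hypothetical placeholders.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def registry_trials(sequence, max_offset=10, rng=None):
    """Map each offset in [-max_offset, max_offset] to a shifted trial sequence.

    Assumes len(sequence) > max_offset. Residues pushed past a terminus are
    dropped and the vacated end is filled with random amino acids.
    """
    rng = rng or random.Random(0)
    n = len(sequence)
    trials = {}
    for k in range(-max_offset, max_offset + 1):
        filler = "".join(rng.choice(AMINO_ACIDS) for _ in range(abs(k)))
        if k >= 0:
            trials[k] = filler + sequence[:n - k]   # shift towards the C terminus
        else:
            trials[k] = sequence[-k:] + filler      # shift towards the N terminus
    return trials


def reflections_to_keep(n_atoms, ratio):
    """Number of lowest-resolution reflections retained for a target ratio."""
    return int(ratio * n_atoms)


def is_success(rfree_by_offset):
    """True if the correct registry (offset 0) has the strictly lowest Rfree."""
    wrong = [v for k, v in rfree_by_offset.items() if k != 0]
    return rfree_by_offset[0] < min(wrong)


def success_rate(exercises):
    """Fraction of repeated exercises in which the correct registry wins."""
    return sum(is_success(e) for e in exercises) / len(exercises)


# Hypothetical 30-residue sequence and made-up Rfree values for one exercise.
sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLI"
trials = registry_trials(sequence)
rfree = {k: 0.40 + 0.001 * abs(k) for k in trials}
rfree[0] = 0.30
```

With 25 repeated exercises per protein, as in the text, `success_rate` would give the discriminating power plotted in Fig. 11(a). For example, a model with 1000 atoms and a target ratio of 4 would retain 4000 reflections.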

5. Conclusions One of the motivations for this work was to experiment with orderless sampling. In Rapper’s GABB algorithm (DePristo et al., 2003), there is a fixed N-to-C order on sampling. In RapperTK, this order can be suitably changed but not eliminated. Indeed the existence of sampling order is essential to keep the sampling tractable. However, some scenarios where sets of conformational samples of residues are independent of one another allow orderless sampling. Side-chain sampling is a classic example of this. The limitations of this approach are computational in nature; for instance, we noted that it takes a long time to solve large biconnected components, as observed by Canutescu et al. (2003). The availability of an orderless sampling algorithm makes RapperTK a more useful software package and complements the GABB algorithm, extending its applicability. OPSAX can be extended beyond amino acid side chains, to entities such as flexible ligands or main-chain fragments adopting a limited set of conformations. We have shown that orderless sampling as implemented in OPSAX is useful for protein crystallography: (a) side-chain J. Appl. Cryst. (2008). 41, 319–328

optimization can be used in conjunction with CNS to carry out automated refinement starting from approximate main chains; (b) multiconformer refinement improves refinement statistics and may help in speculating concerted conformational changes; (c) efficient discrimination of a correct sequence assignment is possible at low resolution with an approximate main chain. Where Rfree of a single- or multiconformer model is comparable to Rfree of the deposited structure, our approach is useful to assess the structural heterogeneity inherent in the structure afforded by the diffraction data. Single-conformer refinements lead to collections and multiconformer refinements lead to ensembles. A simple application of the CNS/ OPSAX protocol might be to obtain both collections and ensembles. Another interesting avenue would be to detect rotameric heterogeneity and build multiple side chains instead of one. Efficient discrimination of a correct sequence, given an approximate main chain and low-resolution data, is a promising result. However, we also realize that the same procedure was not so promising for fragmented main chains owing to loss of packing. Further work will need to find ways to simulate the packing effect that missing loops would provide, e.g. building missing loops first before sequence assignment, use of rotameric propensities in the energy function, more realistic treatment of van der Waals interactions etc. From the computational perspective, the OPSAX algorithm can be improved by identifying articulation edges and components, especially when the biconnected subgraphs are large. Large subgraphs can be divided into more manageable subgraphs this way and many interesting scenarios can be addressed, which would otherwise be computationally prohibitive, e.g. taking into account long-range correlations between side-chain rotamers such as NMR NOEs and electrostatic energy. 
To conclude, this work demonstrates that optimal side-chain assignment is useful for crystallographic refinement and points to ways to adapt it better to multiconformer and low-resolution applications. This approach and the applications described here will be made freely available online in the near future.

We thank Dr Nick Furnham and Anjum Karmali for helpful discussions. SG thanks Cambridge Commonwealth Trust and Universities UK for funding his PhD studentship, and Mr Anand Chitipothu for technical help.

References

Brünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Acta Cryst. D54, 905–921.
Canutescu, A. A., Shelenkov, A. A. & Dunbrack, R. L. Jr (2003). Protein Sci. 12, 2001–2014.
De Maeyer, M., Desmet, J. & Lasters, I. (1997). Fold. Des. 2, 53–66.
DePristo, M. A., de Bakker, P. I. W. & Blundell, T. L. (2004). Structure, 12, 831–838.
DePristo, M. A., de Bakker, P. I. W., Lovell, S. C. & Blundell, T. L. (2003). Proteins Struct. Funct. Genet. 51, 41–55.
Dunbrack, R. L. & Cohen, F. E. (1997). Protein Sci. 6, 1661–1681.
Goldstein, R. F. (1994). Biophys. J. 66, 1335–1340.
Gore, S. & Blundell, T. (2007). arXiv:0710.3948v1 [q-bio.BM].
Levitt, M., Gerstein, M., Huang, E., Subbiah, S. & Tsai, J. (1997). Annu. Rev. Biochem. 66, 549–579.
Lovell, S. C., Word, J. M., Richardson, J. S. & Richardson, D. C. (2000). Proteins Struct. Funct. Genet. 40, 389–408.
Murshudov, G. N. & Dodson, E. J. (1997). CCP4 Newsletter on Protein Crystallography, No. 33.
Richards, F. M. (1973). J. Mol. Biol. 82, 1–14.
Sasisekharan, V. & Ponnuswamy, P. K. (1971). Biopolymers, 10, 583–592.
Siek, J. G., Lee, L.-Q. & Lumsdaine, A. (2002). The Boost Graph Library User Guide and Reference Manual. Boston: Addison-Wesley Longman.
Smith, R. E., Lovell, S. C., Burke, D. F., Montalvao, R. W. & Blundell, T. L. (2007). Bioinformatics, 23, 1099–1105.
Tuffery, P., Etchebest, C. & Hazout, S. (1997). Protein Eng. 10, 361–372.
Wilson, M. A. & Brunger, A. T. (2000). J. Mol. Biol. 301, 1237–1256.
Xu, J. (2005). Lecture Notes Comput. Sci. 3500, 423–439.
