This is the on-line version of our paper:
Nature Methods, 2006, 3:605-607

Removal of a time barrier for high-resolution multi-dimensional NMR spectroscopy

Victor Jaravine1, Ilgis Ibraghimov2, Vladislav Yu. Orekhov1*

1 The Swedish NMR Centre at Gothenburg University, Box 465, 40530 Gothenburg, Sweden
2 Saarbrücken University, Mathematical Department, Saarbrücken, D-66041, Germany

Keywords: MDD, fast methods, high-resolution, missing data, sparse, non-uniform sampling, PARAFAC

Abbreviations: ppm, parts per million; FID, free induction decay; DFT, discrete Fourier transform; nD, n-dimensional; MDD, multidimensional decomposition.

* corresponding author, [email protected]
We introduce a recursive multi-dimensional decomposition (R-MDD) method to speed up recording of high-resolution NMR spectra. This method has a logarithmic dependence of measurement time on the size of indirect spectral dimensions, enjoys the sensitivity and resolution advantages of optimized non-uniform acquisition schemes, and is applicable to all types of spectra of biomolecules. We demonstrate performance for several triple resonance experiments recorded on three globular proteins with molecular weights of 8-22 kDa.

Since the definitive introduction by Richard Ernst of Fourier transformation and pulsed NMR spectroscopy1, molecular biologists have benefited from a dramatic increase in sensitivity and resolution. Currently NMR spectroscopy plays an essential role in structural genomics as a tool for determining the spatial structure of proteins with atomic resolution2,3. However, throughput has been limited by long measurement times4, as conventional multidimensional NMR can take two to six weeks of measurements per structure. With modern sensitive instrumentation, this time is mostly defined by uniform sampling of the data points needed to fill the huge space of multidimensional spectra. The total time of the traditional M-dimensional experiment with N points in all indirect dimensions is estimated as:

$$T_{DFT}(M, N) = N^{M-1}\, 2^{M-1}\, T_{FID} \qquad (1)$$

where $T_{FID}$ is the time for a single data point. It is clear (Fig. 1) that for dimensionality higher than two it is practically impossible to sample more than 100 points using the conventional approach. This situation is known as the “sampling barrier”, which limits typical dimension sizes for 3D and 4D spectra to within 20-50 points and prohibits practical use of higher dimensional spectroscopy. In most cases, the information content of spectra could be substantially higher were larger sizes feasible. For example, for a deuterated 30 kDa protein measured at a spectrometer magnetic field of 21.1 Tesla, the optimal sizes for 15Nx-TROSY and 13Cx2Hz coherences were estimated to be 554 and 1395, respectively5. For 10 kDa and 100 kDa proteins the numbers are approximately twice as large and twice as small, respectively.
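Equation 1 is simple enough to evaluate directly. The following Python sketch shows how quickly the conventional measurement time grows with dimensionality. The function name `t_dft` is a hypothetical helper introduced here, and the value of 4 s per data point is only an illustrative assumption taken from the Figure 1 caption.

```python
def t_dft(m: int, n: int, t_fid: float) -> float:
    """Eq. 1: time in seconds of a uniformly sampled M-dimensional experiment.

    m     -- spectrum dimensionality M
    n     -- number of points N in each indirect dimension
    t_fid -- time per single data point T_FID, in seconds
    """
    if m < 1 or n < 1:
        raise ValueError("m and n must be positive")
    # N^(M-1) grid points times 2^(M-1) FIDs for phase discrimination
    return n ** (m - 1) * 2 ** (m - 1) * t_fid


if __name__ == "__main__":
    # Illustrative values only: N = 100 points, T_FID = 4 s (assumed).
    for m in (2, 3, 4):
        hours = t_dft(m, 100, 4.0) / 3600.0
        print(f"{m}D, N=100: {hours:.1f} hours")
```

Under these assumed values the 2D experiment takes a fraction of an hour, the 3D experiment about 44 hours, and the 4D experiment roughly a year, which is the barrier sketched in Figure 1.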
A general solution for overcoming the sampling barrier is seen in unconventional sampling and signal processing schemes, several of which were recently demonstrated6-8. In an approach referred to as non-uniform or sparse sampling, data points are measured here and there at non-equal pseudo-random intervals. By employing ideas of matched acquisition9, this sparse sampling schedule can be optimized (Supplementary Figure 1 online) to provide resolution and sensitivity that are superior to uniform-sampling Fourier transform based methods. Until now, however, Maximum Entropy (ME)10 and multidimensional decomposition (MDD)11,12, which are the two methods capable of dealing with sparse sampling, both required at least 20% of the full data set for successful spectra reconstruction. The main result of this work is a substantial improvement in the MDD analysis, which allows spectra reconstruction with a drastically smaller proportion of the data.

Reconstruction of a spectrum from a non-uniformly sampled time signal is equivalent to prediction of data lacking actual measurements. The only way to solve this problem is to introduce assumptions or, in other words, employ prior knowledge about the spectrum. The mathematical model of the MDD, which was introduced three decades ago, assumes that all essential features of an M-dimensional matrix can be described as the sum of a small number of tensor products of one-dimensional vectors13. When MDD is applied to NMR, the matrix corresponds to an input spectrum, whereas the vectors represent output line-shapes. The original MDD approach does not apply any constraints on the appearance of the line-shapes. However, using a model with a smaller number of adjustable parameters has the advantage that a bigger fraction of missing data can be predicted. In most practical cases, the NMR line-shapes are known. Theory predicts the autoregressive property of the NMR time domain signal, exemplified by a superposition of sine waves decaying with time. This behavior of the NMR signal is exploited, for example, in linear prediction9, the method routinely applied to rescue digital resolution from short time series.

In this work, we show that the autoregressive assumption can be incorporated into the MDD processing. In particular, each of the time domain shapes of length N in the original MDD model is recursively subjected to an additional MDD decomposition into a product of K vectors of length d so that $N = d^K$. This converts the original M-dimensional decomposition to an (M-1+K)-dimensional one, which we entitle R-MDD. The aim of the second MDD decomposition is to reduce the number of unknowns in the least-square minimization in the MDD model. For small d values (we use d = 2) the original vectors with N unknown elements are defined by a much smaller number of parameters, $(d-1)\log_d(2N)$. This provides the basis for further timesaving, considering that the number of measurements should at least be equal to the number of adjustable parameters. Assuming that not more than half of the signals (total $N_c$) have the same position in the directly detected dimension, we make an estimate of the measurement time needed for an R-MDD reconstruction:

$$T_{R\text{-}MDD}(M, N) = N_c\, (M-1)\, (d-1)\, \log_d(2N)\; T_{FID} \qquad (2)$$
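Equation 2 can be evaluated in the same way as Equation 1. The Python sketch below is an illustration only: the function names `t_rmdd` and `mdd_over_rmdd` are hypothetical helpers introduced here, and the second function assumes the ratio $N / ((d-1)\log_d(2N))$ given as Eq. S7 in the Supplementary Methods for the gain over the original MDD.

```python
import math


def t_rmdd(m: int, n: int, n_c: int, t_fid: float, d: int = 2) -> float:
    """Eq. 2: estimated R-MDD measurement time in seconds.

    m     -- spectrum dimensionality M
    n     -- number of points N in each indirect dimension
    n_c   -- number of spectral signals N_c
    t_fid -- time per single data point T_FID, in seconds
    d     -- length of the sub-vectors in the recursive decomposition
    """
    return n_c * (m - 1) * (d - 1) * math.log(2 * n, d) * t_fid


def mdd_over_rmdd(n: int, d: int = 2) -> float:
    """Time ratio of original MDD to R-MDD (Eq. S7 of the Supplementary Methods)."""
    return n / ((d - 1) * math.log(2 * n, d))


if __name__ == "__main__":
    # 5D case with N = 100, N_c = 70, T_FID = 4 s (values from the Figure 1 caption).
    print(f"5D R-MDD: {t_rmdd(5, 100, 70, 4.0) / 3600.0:.2f} hours")
    # Ubiquitin HNCO parameters as listed in Table 1.
    print(f"ubiquitin HNCO: {t_rmdd(3, 32, 70, 2.4) / 3600.0:.2f} hours")
    print(f"MDD / R-MDD for N = 100: {mdd_over_rmdd(100):.1f}")
```

With these inputs the sketch gives about 2.4 hours for the 5D case and about 0.56 hours for the ubiquitin HNCO, close to the 0.5 hours listed in Table 1. The time grows only with the logarithm of N and linearly with M.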
While equation 1 gives a clear illustration of the sampling barrier in conventional multidimensional NMR, equation 2 gives a solution to this problem. The logarithmic dependence on the dimension sizes and the linear dependence on spectrum dimensionality essentially eliminate any practical restrictions on these parameters. For example, a 5D experiment with all indirect dimension sizes of 100 points would take 101 years of measurement time using the conventional uniform sampling scheme. R-MDD using matched non-uniform sampling obtains such a spectrum in 2.4 hours, which is comparable with the time offered by the methods based on 2D projections7,8 and 13 times faster than the original MDD (Supplementary Methods online).

Precise spectral R-MDD reconstructions with 91-94% missing data for the systems with molecular weights ranging from 8 to 22 kDa (Table 1) demonstrate the general applicability of the method. We verify the accuracy of the reconstruction by comparing it with the conventional “full” spectrum, referred to hereafter as the “reference”. The time estimates from Eq. 2 for each protein compare well with the actual minimal measurement times required for obtaining all backbone resonances, and are related to the number of peaks in the spectra. Below we focus on describing the most demanding case: the 3D HNCA spectrum of the 14 kDa protein azurin, recorded with a large number of increments to obtain high resolution in the 13Cα dimension. The reconstructions (Fig. 2c) exhibit higher noise compared to the reference (Fig. 2b), since sensitivity intrinsically decreases with reduction of measurement time. Analysis of this spectrum has additional complexity because of the resolved J(Cα-Cβ) coupling, which almost doubles the number of peaks. A total of 246 signals (singlet or duplet) corresponding to the published backbone assignment of 123 residues of azurin14 were observed in the region from 5.75 to 10.75 ppm in the amide proton dimension of the reference 3D HNCA spectrum. All of these signals (their line-shapes, intensities and positions) are correctly reproduced in the reconstructed azurin spectrum (Supplementary Figure 2a online), with the exception of the three weakest inter-residue sequential correlations (Gly45, Gly88, and Phe114). It should be emphasized that the distribution of the peak intensities spans two orders of magnitude, including a large number of weak signals that define sensitivity and final time allocation. The signal intensities in the reconstruction and the reference are highly correlated both for weak and strong peaks (Supplementary Figure 3a online), with the correlation factor R of 0.9984. The accuracy of the peak positions in the reconstructed spectrum with respect to the reference is 0.0056 ppm. The value should be compared with the five times poorer value (0.0270 ppm) obtained for the conventional (“truncated”) spectrum (Fig. 2a) that was recorded during the same measurement time. The accuracy in 13Cα frequencies can be important, for example, during sequential assignment, as it defines how many sequential connectivity links need to be considered for a particular resonance. For the “truncated” experiment 43% of all resonances have ambiguous connectivities within the range defined by the accuracy, while for the R-MDD reconstruction only 9% of resonances have non-unique connectivity. In general, precise data allows for a high degree of automation and is vital for obtaining accurate and unambiguous spectral NMR information15.

To summarize, we present a new method that solves the sampling limit problem. This opens an avenue for routine usage of higher spectroscopic dimensions in biomolecular NMR spectroscopy. Thus, we expect it to become a “single-stop solution” for highly automated resonance assignment and structure determination. We implemented non-uniform sampling for R-MDD processing on commercially available NMR spectrometers (Varian). The processing package will be available from the authors.

ACKNOWLEDGEMENTS

This work was supported by grants from the Swedish Foundation for Strategic Research (A3 04:160d), the Swedish National Allocation Committee (SNIC 3/0444), the Swedish Research Council (621-2005-2951), and the Wenner-Gren Foundation. The authors are grateful to G. Karlsson and S. Grzesiek for the azurin and ubiquitin samples.
REFERENCES

1. Ernst, R.R. Angew. Chem. Int. Edit. 31, 805-823 (1992).
2. Montelione, G.T., Zheng, D.Y., Huang, Y.P.J., Gunsalus, K.C. & Szyperski, T. Nat. Struct. Biol. 7, 982-985 (2000).
3. Yee, A.A. et al. J. Am. Chem. Soc. 127, 16512-16517 (2005).
4. Liu, G. et al. Proc. Natl. Acad. Sci. USA 102, 10487-10492 (2005).
5. Rovnyak, D., Hoch, J.C., Stern, A.S. & Wagner, G. J. Biomol. NMR 30, 1-10 (2004).
6. Mandelshtam, V.A., Taylor, H.S. & Shaka, A.J. J. Magn. Reson. 133, 304-312 (1998).
7. Atreya, H.S. & Szyperski, T. Proc. Natl. Acad. Sci. USA 101, 9642-9647 (2004).
8. Freeman, R. & Kupce, E. 23A, 63-75 (2004).
9. Hoch, J.C. & Stern, A.S. NMR data processing, xi, 196 p. (Wiley-Liss, New York, 1996).
10. Stern, A.S. & Hoch, A.S. Maximum Entropy Reconstruction in NMR. in Encyclopedia of NMR, Vol. 8 (ed. Harris, D. & Grant, R.) (John Wiley & Sons, Chichester, 1996).
11. Orekhov, V.Y., Ibraghimov, I. & Billeter, M. J. Biomol. NMR 27, 165-173 (2003).
12. Tugarinov, V., Kay, L.E., Ibraghimov, I. & Orekhov, V.Y. J. Am. Chem. Soc. 127, 2767-2775 (2005).
13. Kruskal, J.B. Linear Algebra Appl. 18, 95-138 (1977).
14. van de Kamp, M. et al. Biochemistry 31, 10194-10207 (1992).
15. Altieri, A.S. & Byrd, R.A. Curr. Opin. Struct. Biol. 14, 547-553 (2004).
Tables & Figures
Table 1. Results of R-MDD analysis of the protein spectra.

                                                                            Ubiquitin   Ubiquitin   Azurin     Barstar-barnase complex
Spectrum type                                                               HNCO        HNcoCA      HNCA       HNCO
Sample concentration, mM                                                    1.7         1.7         1.0        1.2
Time per one real data point, TFID, sec                                     2.4         4           4          2.8
Mol. weight, kDa                                                            8           8           14         22
Dimension sizes, 13C, 15N, complex points                                   32, 32      64, 64      400, 62    64, 64
Number of spectral signals                                                  70          70          481        101
All-peak 13C position r.m.s.d., R-MDD vs. reference, ppm                    0.0015      0.010       0.006      0.005
Time estimates from Eq. 2, hours                                            0.5         1.1         9.2        1.1
Actual time needed to acquire all backbone correlations, hours (% sparse)   0.2 (7%)    0.6 (6%)    8.7 (6%)   1.5 (9%)
Figure 1. Theoretically estimated times Texp required for experimental measurement of 2D, 3D, 4D and 5D spectra using R-MDD reconstruction (solid lines, Eq. 2 for d = 2, Nc = 70, TFID = 4 sec) and the conventional method (dashed lines, Eq. 1) as a function of the indirect dimension size N.
Figure 2. High resolution in the R-MDD reconstructions. 13Cα-1HN regions (top) and 13Cα 1D cross-sections (bottom) are compared for the signals of Gly123, Val31 in (a) “truncated”, i.e. conventional 3D HNCA with 40(13Cα)x40(15N) points, experimental time 8.9 hours; (b) “reference”, i.e. conventional spectrum with 400(13Cα)x64(15N) points, 138 hours; (c) R-MDD reconstruction from 6% of data, 8.9 hours.
Suppl. Tables & Figures
Supplementary Methods – R-MDD theory and experimental details

Calculation of measurement time for the case of uniformly sampled experiment. The total duration of an M-dimensional NMR experiment is defined by three factors: (i) $T_{FID}$, the time required per single transient of a one-dimensional spectrum (or FID), times the number of transients in a phase cycle used for eliminating spurious signals; (ii) $N_m$, the number of sample points in each of the indirect dimensions; (iii) $N_p = 2^{M-1}$, the number of FIDs required for phase discrimination in the indirect dimensions. Without losing generality of this presentation we assume an equal number of points $N_m = N$ in all indirect dimensions. Thus, the time in seconds of a conventional experiment with uniformly sampled points can be expressed as follows:

$$T_{DFT}(M, N_m) = N^{M-1}\, 2^{M-1}\, T_{FID} \qquad (S1)$$

The general MDD model. The multidimensional decomposition (MDD) model assumes that all essential features of an M-dimensional matrix can be described as a sum of a small number of tensor products of one-dimensional vectors1,2. MDD has been used in different fields as a tool for data analysis and signal processing since the early seventies under various names such as parallel factor analysis, canonical decomposition, and three-way decomposition. When applied to NMR spectra the MDD can be formulated as follows. Given a matrix S with sizes $N_m$ of its M dimensions (m = 1…M) and elements $S_{n_1,n_2,\dots,n_M}$, find scalar numbers ${}^{\beta}a$ and normalized vectors ${}^{\beta}F^{m}$, with elements ${}^{\beta}F^{m}(n_m)$ ($n_m = 1 \dots N_m$), such that the following norm becomes minimal:

$$\Big| G \bullet \Big[ S - \sum_{\beta} {}^{\beta}a \; {}^{\beta}F^{1} \otimes {}^{\beta}F^{2} \otimes \dots \otimes {}^{\beta}F^{M} \Big] \Big|^{2} + \lambda \sum_{\beta} \big({}^{\beta}a\big)^{2} \qquad (S2)$$

Here, the symbol ⊗ denotes the tensor product operation; the matrix S corresponds to the experimental M-dimensional NMR spectrum in time and/or frequency domain representation. In the case of sparse sampling only a fraction of elements in S is measured and the matrix G, which contains elements $g_{n_1,n_2,\dots,n_M} \in \{0,1\}$, indicates the absence or presence of a particular data point. Accordingly, the symbol • describes element-wise multiplication of matrices. The last term represents a Tikhonov regularization3, which is parameterized with the factor λ and may be used for improving the convergence of the MDD algorithm4. The summation index β runs over the number of components used for the decomposition. The range for this index depends on the type of spectrum. For example, for a 3D HNCO spectrum it is roughly equal to the number of protein amide groups.

Each vector ${}^{\beta}F^{m}$ is defined by $N_m$ unknown elements. We make an estimate of the measurement time needed for MDD reconstruction from the notion that the number of measurements should exceed the number of adjustable parameters with a certain redundancy. We also need to make an assumption about the maximal number of peaks that have the same position in the directly detected dimension. Altogether, the redundancy and signal overlap give a factor, which is approximately equal to half of the number of peaks in the spectrum. Finally, the measurement time is estimated as:

$$T_{MDD}(M, N_m) = N_c\, (M-1)\, N_m\, T_{FID} \qquad (S3)$$

where $N_c$ is the number of spectral signals. For example, for the HNCO spectrum of ubiquitin $N_c$ is ca 70.

The Recursive MDD model. Each of the time domain shapes ${}^{\beta}F^{m}$ of length $N_m$ in Eq. S2 is represented as a product of $K_m$ vectors of length $d_m$ so that $N_m = (d_m)^{K_m}$:

$${}^{\beta}F^{m} = {}^{\beta}V^{1} \otimes {}^{\beta}V^{2} \otimes \dots \otimes {}^{\beta}V^{K} \qquad (S4)$$

This can be interpreted as applying a second decomposition to the shapes obtained in the first decomposition, thus the name “recursive”. To rationalize the additional decomposition of each vector F into a set of subvectors $V^{k}$ (we omit the indices m for simplicity), we shall illustrate the recursive decomposition for d = 2. The points of the exponential decay function E(t) = exp(-t) sampled at uniformly spaced time points t = nΔt (n = 0,…,N-1) constitute the vector $E = \{\exp(-\Delta t)^{0}, \exp(-\Delta t)^{1}, \dots, \exp(-\Delta t)^{N-1}\}$ of length N. The vector can be “folded” into a hypercube with N elements, where each side has length 2. The hypercube can be decomposed into $K = \log_2 N$ vectors $V^{k} = \{1, \exp(-\Delta t)^{2^{k-1}}\}$ of length two:

$$E = \{1, \exp(-\Delta t)^{1}\} \otimes \{1, \exp(-\Delta t)^{2}\} \otimes \{1, \exp(-\Delta t)^{4}\} \otimes \dots \qquad (S5)$$

Note that Eq. S4 is a generalization of Eq. S5, when d is any integer number and vector elements are not restricted to integer powers of exp(-Δt). The aim of the second MDD decomposition is to reduce the number of unknowns in the least-square minimization of Eq. S2. This is achieved by replacing the vectors ${}^{\beta}F^{m}$ for all β in Eq. S2 by the right side of Eq. S4. This merely converts the M-dimensional decomposition to an $(M-1+K_m)$-dimensional one. In the case of phase sensitive detection, the complex vector ${}^{\beta}F^{m}$ with $2N_m$ unknown elements is defined by a much smaller number of real parameters, $2(d-1)\log_d(2N_m)$. Finally, using the same arguments as for Eq. S3, the total measurement time needed for robust R-MDD reconstruction of the spectrum can be estimated as

$$T_{RMDD}(M, N_m) = N_c\, (M-1)\, (d-1)\, \log_d(2N_m)\; T_{FID} \qquad (S6)$$

This equation with $N_m = N$ is plotted for M = 2, 3, 4, 5 as solid curves in Figure 1 of the paper using d = 2, $T_{FID}$ = 4 sec, and $N_c$ = 70 (the number of backbone amides in ubiquitin). The curves exhibit logarithmic dependences on the number of increments in the indirect dimensions, $N_m$. The exponential dependence $(N_m)^{M-1}$ on the number of dimensions M in Eq. S1 is reduced to the linear term (M-1) in Eq. S6. From the plot it is clear that all of the experimental times needed for the MDD 2D-5D spectral processing are below 5 hours, which is generally the 2D range of standard Fourier spectroscopy. Eq. S6 provides the following enhancement in time saving relative to Eq. S3:

$$T_{MDD}(N_m) / T_{RMDD}(N_m) = N_m / \{(d-1)\, \log_d(2N_m)\} \qquad (S7)$$

The ratio increases with larger $N_m$ (e.g. it equals 13 for $N_m$ = 100). It does not depend on spectra dimensionality and number of components.

Equations S4-S7 were derived under the assumption that the recursion, and correspondingly the autoregressive constraint on the time domain line shapes, is applied in all indirect dimensions. However, the constraint can be used in a subset of dimensions. An example is the 3D 15N NOESY experiment, where for reducing the number of components and preserving sensitivity, all peaks belonging to one amide group are collected in one MDD component5. In this case, the recursion can be applied only to the 15N dimension. Thus, although NOESY type spectra will also profit from R-MDD, the expected timesaving relative to the original MDD in this case is moderate.

MDD is a true multidimensional signal processing method, as it analyses the entire multidimensional data array simultaneously. An additional potential offered by the method is in its ability to process non-uniform data, when valuable spectrometer time is preferentially spent on obtaining information on signals of interest rather than noise. This ensures the highest sensitivity for a given digital resolution and total measurement time. The high sensitivity delivered by the MDD processing has been specifically addressed6 and demonstrated in practical cases7. A related technique8, which is designed for dealing with GFT projections, also proved successful in identifying weak signals. Likewise, the sensitivity of the R-MDD is reflected in the demonstrated high accuracy of signal amplitudes, positions, and line shapes, as well as in the ability of the method to deal with signals in a wide range of intensities.

Materials and Methods

NMR spectroscopy. All spectra were recorded using room temperature triple resonance probes with pulse field gradients. The proteins were 13C,15N labeled (except for barstar): ubiquitin14; barstar-barnase complex12,13; reduced azurin11. The key experimental parameters are listed in Supplementary Table 1.

Sampling schedules. Optimal non-linear sampling schedules are devised to match the envelopes of signal coherences in all indirect dimensions for a particular NMR experiment9. Since the 15N chemical shift evolves in all our experiments during a constant delay, the envelope in this dimension is a constant function. The envelope of the signals in the 13C dimension is the exponential decay with transverse relaxation time T2 (Supplementary Table 1). The convolution of the two envelopes produces a two-dimensional probability density function in the two acquisition dimensions t1 and t2. The sampling was randomly generated in accordance with the density function. For example, in the sampling schedule optimized for the 3D HNCA of azurin (Supplementary Figure 1), in addition to the T2-decay, the sampling density is modulated by a cosine function due to the J(13Cα-13Cβ)-coupling of 35 Hz.
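The schedule generation described under "Sampling schedules" can be illustrated with a short Python sketch. It is not the authors' implementation. It assumes that the two-dimensional density is the product of an exponentially decaying, cosine-modulated 13C envelope and a flat 15N envelope, that the modulation has the form |cos(πJt1)|, and that the 13C dwell time is the inverse of the 6800 Hz spectral width from Supplementary Table 1. The function name `matched_schedule` is hypothetical. Weighted drawing without replacement uses the Efraimidis-Spirakis random-key method, chosen here only because it needs nothing beyond the standard library.

```python
import math
import random


def matched_schedule(n1=400, n2=64, sw1_hz=6800.0, t2_s=0.050, j_hz=35.0,
                     n_points=1600, seed=0):
    """Draw n_points distinct (t1, t2) grid indices with probability
    proportional to an assumed signal envelope."""
    rng = random.Random(seed)
    keyed = []
    for i1 in range(n1):
        t1 = i1 / sw1_hz  # 13C evolution time in seconds (assumed dwell = 1/SW)
        # Assumed 13C envelope: T2 decay times |cos| modulation from the J coupling.
        weight = math.exp(-t1 / t2_s) * abs(math.cos(math.pi * j_hz * t1))
        if weight <= 0.0:
            continue  # zero-density points are never sampled
        for i2 in range(n2):
            # Constant-time 15N evolution: flat envelope, so the weight
            # depends on i1 only.
            u = 1.0 - rng.random()  # uniform in (0, 1]
            # Efraimidis-Spirakis key log(u)/w; larger weights favour larger keys.
            keyed.append((math.log(u) / weight, i1, i2))
    keyed.sort(reverse=True)
    return [(i1, i2) for _, i1, i2 in keyed[:n_points]]


if __name__ == "__main__":
    schedule = matched_schedule()
    early = sum(1 for i1, _ in schedule if i1 < 200)
    print(len(schedule), "points,", early, "of them in the first half of t1")
```

The default arguments correspond to the azurin HNCA case (400 x 64 grid, 1600 points, T2 = 50 ms, J = 35 Hz). Because the density decays along t1, most of the drawn points fall at short 13C evolution times.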
Data processing and reconstruction for all spectra were done using the same procedure. It is exemplified below for the azurin case. Initially, time domain data along the directly detected dimension, t3, of the reference spectrum were multiplied with a squared cosine-bell window function, zero-filled to 2K points, and Fourier transformed with the nmrPipe package10. The “truncated” data set was derived from the “reference” by truncation after the initial 40 (t1), 40 (t2) points. The “sparse” data set was extracted from the “reference” according to the sampling schedule (Supplementary Figure 1) containing 1600 points (t1,t2). A complete time domain signal containing $2^9 = 512$ (t1), $2^6 = 64$ (t2) complex points was reconstructed from the sparse data set using the R-MDD procedure (Eqs. S2, S4) implemented in the homebuilt software mddNMR. Note that the final size in the 13C dimension (t1) in the reconstruction is 25% larger than that in the reference spectrum. This shows that R-MDD does not only fill gaps between the measured points, but also can effectively extrapolate time domain data. Application of the recursion (Eq. S4) to dimensions t1 and t2 but not to t3 resulted finally in a 9+6+1 = 16-dimensional decomposition in Eq. S2.

The R-MDD calculation and reconstruction were performed individually for 48 overlapped 3D segments covering the range from 5.75 to 10.75 ppm (w3, direct 1H), each corresponding to 32 frequency-domain points. The segmentation does not affect the algorithm convergence and quality of reconstruction. It was done only to save computational time by parallel processing, since segments can be processed independently. To account for possible incomplete reconstruction of peak tails on the segment borders (in w3), only the central 16-point parts were merged to obtain the whole spectrum reconstruction. The number of MDD components for each of the 48 segments was estimated as follows: the number of 1HN-15N correlations in the corresponding 1H strip of the 15N HSQC spectrum was multiplied by 4 to account for the intra- and inter-residue duplets in the HNCA spectrum; the number was further increased by 30% to account for the presence of possible minor peaks and some of the large noise features or spectral artefacts.

After the reconstruction, which produced a regular 3D data set in the format of nmrPipe, the time domain signal was multiplied with a squared cosine-bell window function, zero-filled to double size and Fourier transformed. This gave a digital resolution of 0.029 ppm (5.17 Hz) for the 13Cα dimension. The peak positions in the reference spectrum were determined using the ‘pkFindROI’ subroutine of the nmrPipe package10. The exact peak positions in all peak lists were refined by three-point interpolation. In the analysis the parameters of duplets were determined as the average of two singlets.

To check the stability of the algorithm convergence we repeated the R-MDD calculation with the same experimental input but a different initial approximation for the shapes ${}^{\beta}V^{K}$ (Eq. S4). The result was essentially the same as for the first run described in the main text: the all-signal r.m.s.d. to reference was 0.0062 ppm in the CA dimension and the correlation of intensities was 0.998. These values for the difference between the two solutions were 0.0036 ppm and 0.9985, respectively.
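The count of 9 factors for the 512-point t1 dimension rests on the folding of Eq. S5, which can be checked numerically. The Python sketch below is an illustration under stated assumptions, not part of mddNMR: the helper names `kron` and `folded_decay` are hypothetical, the value of Δt is arbitrary, and the Kronecker product is written with the convention that the last factor varies fastest, so the factors of Eq. S5 are multiplied in reverse order.

```python
import math


def kron(a, b):
    """Kronecker product of two 1D sequences (elements of b vary fastest)."""
    return [x * y for x in a for y in b]


def folded_decay(dt: float, k_factors: int):
    """Build the sampled decay exp(-n*dt), n = 0 .. 2**k_factors - 1,
    as a product of k_factors two-element vectors (Eq. S5)."""
    x = math.exp(-dt)
    result = [1.0]
    # Eq. S5 lists the factors with exponents 1, 2, 4, ...; with the
    # "last factor fastest" convention of kron() the largest exponent
    # has to be applied first.
    for k in reversed(range(k_factors)):
        result = kron(result, [1.0, x ** (2 ** k)])
    return result


if __name__ == "__main__":
    dt = 0.01  # arbitrary illustrative time step
    folded = folded_decay(dt, 9)  # 2**9 = 512 points, as for t1 above
    direct = [math.exp(-n * dt) for n in range(512)]
    worst = max(abs(f - e) for f, e in zip(folded, direct))
    print(len(folded), "points, max deviation", worst)
```

The 512 samples of the decay are thus reproduced by 9 vectors of length two, which is why constraining the shapes in this way leaves far fewer unknowns than the unconstrained MDD model.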
References

1. Kruskal, J.B. Linear Algebra Appl. 18, 95-138 (1977).
2. Beylkin, G. & Mohlenkamp, M.J. Proc. Natl. Acad. Sci. USA 99, 10246-10251 (2002).
3. Tikhonov, A.N.a.S., A.A. Equations of mathematical physics, (Dover Publ., New York, 1990).
4. Ibraghimov, I. Num. Linear Algebra Appl. 9, 551-565 (2002).
5. Orekhov, V.Y., Ibraghimov, I. & Billeter, M. J. Biomol. NMR 27, 165-173 (2003).
6. Luan, T., Jaravine, V., Yee, A., Arrowsmith, C.H. & Orekhov, V.Y. J. Biomol. NMR 33, 1-14 (2005).
7. Tugarinov, V., Kay, L.E., Ibraghimov, I. & Orekhov, V.Y. J. Am. Chem. Soc. 127, 2767-2775 (2005).
8. Malmodin, D. & Billeter, M. J. Am. Chem. Soc. 127, 13486-7 (2005).
9. Hoch, J.C. & Stern, A.S. NMR data processing, xi, 196 p. (Wiley-Liss, New York, 1996).
10. Delaglio, F., Grzesiek, S., Vuister, G.W., Zhu, G., Pfeifer, J. & Bax, A. J. Biomol. NMR 6, 277-93 (1995).
11. Leckner, J. Folding and structure of azurin the influence of a metal PhD thesis, Chalmers Technical University (2001).
12. Korzhnev, D.M. et al. Appl. Magn. Reson. 21, 195-201 (2001).
13. Zhuravleva, A.V., Korzhnev, D.M., Nolde, S.B., Kay, L.E., Arseniev, A.S., Billeter, M. & Orekhov, V.Y. manuscript in preparation (2006).
14. Wang, A.C., Grzesiek, S., Tschudin, R., Lodi, P.J. & Bax, A. J. Biomol. NMR 5, 376-382 (1995).
Supplementary Table 1. The samples and spectral parameters of the 3D experiments.

                                                 ubiquitin     ubiquitin     barstar-barnase complex   azurin
Experiment type                                  HNCO          HNcoCA        HNCO                      HNCA
1H spectrometer frequency, MHz                   800           600           800                       900
Number of transients                             2             4             4                         4
Interscan delay, sec                             1.2           1             0.75                      1.25
Sample concentration, mM                         1.7           1.4           1.2                       1.0
Mol. weight, kDa                                 8             8             22                        14
Dimensions, complex points, 13C, 15N             32, 32        64, 64        64, 64                    400, 64
Spectral width, Hz, 13C, 15N                     2000, 2000    4500, 1500    2350, 2800                6800, 2500
Bias in sparse sampling, 13C                     T2=50 ms      T2=50 ms      T2=30 ms                  T2=50 ms, 1J=35 Hz
Number of points in the sparse sampling schedule 72            246           369                       1600
Full experiment measurement time, hours          3.0           10            16.8                      138
Digital resolution 13C, ppm/pnt                  0.16          0.23          0.092                     0.029

Limits 1H, ppm (one entry per protein): ubiquitin 6.047 9.702; barstar-barnase complex 6.742 10.397; azurin 5.75 10.75.
Supplementary Figure 1 Optimized sampling schedule for acquisition of high-resolution 3D HNCA spectrum of azurin. 1600 points are distributed non-linearly to match optimal sensitivity. Uniform random distribution of sampled points is modulated by homo-nuclear 13Cα-13Cβ coupling of 35 Hz and exponential decay of 50 ms in 13Cα dimension; constant amplitude signal is assumed in 15N dimension.
Supplementary Figure 2 Comparison with the reference of individual resonance reconstructions for 3D HNCA of azurin. All peaks with reported 15N-1HN backbone assignment are shown: the 1st column of each cell is the reference spectrum; the 2nd column is the R-MDD reconstruction from the sparse data. 13C-1HN regions for the amides of the backbone resonances are shown above the 13C 1D cross-sections, taken at the peak centers. The 15N, 1H, and 13C peak positions are indicated in ppm; the cell sizes in ppm are indicated on the 1st page. The plots have the same intensity threshold and contour multiplication factor of 1.3. For better visibility, the 1D cross-sections of weak signals were scaled up. The corresponding scaling factors (1x or 3x) are indicated for all peaks.
Supplementary Figure 3 Comparison with the reference of individual resonance reconstructions for 3D HNCO of barnase in complex with barstar. All peaks with reported 15N-1HN backbone assignment are shown: the 1st column of each cell is the reference spectrum; the 2nd column is the R-MDD reconstruction from the sparse data. 13C-1HN regions for the amides of the backbone resonances are shown above the 13C 1D cross-sections, taken at the peak centers. The 15N, 1H, and 13C peak positions are indicated in ppm; the cell sizes in ppm are indicated on the 1st page. The plots have the same intensity threshold and contour multiplication factor of 1.3. For better visibility, the 1D cross-sections of weak signals were scaled up. The corresponding scaling factors (1x or 3x) are indicated for all peaks.
Supplementary Figure 4 Comparison with the reference of individual resonance reconstructions for (a) 3D HNCO and (b) 3D HNcoCA of ubiquitin. All peaks with reported 15N-1HN backbone assignment are shown: the 1st column of each cell is the reference spectrum; the 2nd column is the R-MDD reconstruction from the sparse data. 13C-1HN regions for the amides of the backbone resonances are shown above the 13C 1D cross-sections, taken at the peak centers. The 15N, 1H, and 13C peak positions are indicated in ppm; the cell sizes in ppm are indicated on the 1st page. The plots have the same intensity threshold and contour multiplication factor of 1.3. For better visibility, the 1D cross-sections of weak signals were scaled up. The corresponding scaling factors (1x or 3x) are indicated for all peaks.
Supplementary Figure 5 Accuracy of intensities and peak positions (a-b) between the 6% data reconstruction and reference spectra for all backbone 3D HNCA correlations of the protein azurin. (a) High level of “reconstruction-reference” correlation of intensities (normalized to the max intensity). (b) All-peak histograms of differences of 13Cα chemical shifts are plotted for “reconstruction-reference” (filled bars, “6% sparse”) and for “truncated-reference” (gray bars, “40x40 truncated”). The reconstruction and truncated spectra had a total of 1600 points in the two indirect dimensions, with equal experimental times of ca 8.9 hours each. The reference spectrum was recorded with 400x64 points with an experimental time of 138 h.