Applied Mathematical Modelling 30 (2006) 688–701 www.elsevier.com/locate/apm

Parallelization of reconstruction algorithms in three-dimensional electron microscopy

J.R. Bilbao-Castro b, J.M. Carazo b, I. García a, J.J. Fernández a,*

a Dept. Arquitectura de Computadores, Universidad de Almería, 04120 Almería, Spain
b Biocomputing Unit, Centro Nacional de Biotecnología, Universidad Autónoma, 28049 Madrid, Spain

Received 20 March 2005; received in revised form 10 May 2005; accepted 16 May 2005
Available online 19 December 2005

Abstract

Structure determination of biological specimens is one of the key issues addressed in the biosciences, because the functionality of biological machinery is strongly linked to its spatial structure. Although microscopes present limitations in terms of scale, indirect approaches for electron microscopy have been developed that allow visualization of biological macromolecules at nearly atomic resolution. Powerful mathematical algorithms allow researchers to obtain three-dimensional models of biological specimens. Nevertheless, those algorithms imply huge computational costs, which prevent them from being broadly used to study the structure of biological specimens. In this work we compare some of those algorithms and propose a parallelization strategy to help overcome their demands for computational power.
© 2005 Elsevier Inc. All rights reserved.

Keywords: 3D electron microscopy; Reconstruction algorithms; Parallel computing; Distributed computing; High performance computing

1. Introduction

Electron microscopy (EM) is central to the study of many structural problems in the biosciences. Availability of structural information is critical to understanding biological function at all levels of detail [1]. Combined with image processing and three-dimensional (3D) reconstruction techniques, EM yields quantitative information about the 3D structure of biological specimens [2,3]. Consequently, this discipline is commonly known as three-dimensional electron microscopy (3DEM) [2]. This work focuses on the macromolecular domain (specimens with sizes around 100–200 Å, such as proteins) [4].

Structural analyses of macromolecules involve processing thousands of EM projection images taken from the specimen at very different orientations. In order to preserve as much detail as possible, the specimens are imaged at very low electron doses, which makes the EM images extremely noisy (SNR of the order of 0.1). The EM images are combined together through reconstruction algorithms to derive the 3D structure at enough resolution (typically in the range of 6–20 Å) to discern important structural features [2,3].

* Corresponding author. Tel.: +34 950015711; fax: +34 950015486. E-mail address: [email protected] (J.J. Fernández).

0307-904X/$ - see front matter © 2005 Elsevier Inc. All rights reserved. doi:10.1016/j.apm.2005.05.019


State-of-the-art structural analyses involve numbers of images around 10,000 [5], 20,000 [6–8], 35,000 [9–11], 50,000 [12] and 75,000 [13] to determine the 3D structure of biological macromolecules at resolutions in the range 6–10 Å. With the advent of computational methodologies for automatic detection and extraction of EM images [14], the involvement of hundreds of thousands of images in such structural studies is expected in the short term. The ultimate goal pursued by 3DEM is structure determination up to atomic detail (better than 3–4 Å resolution) [15,16]. The development of the technology in this field will make such a goal affordable in the near future. Such levels of resolution require the combination of million(s) of EM images [17,16], and hence huge computational demands.

Rigorous structural analyses require that image reconstruction introduces as little noise and artifact as possible at the spatial scales of interest, for a proper interpretation of the structure. Weighted backprojection (WBP) [18] is the standard reconstruction method, whose relevance stems mainly from its simplicity. Series expansion reconstruction methods constitute one of the main alternatives to WBP for image reconstruction and are attracting increasing interest in this field [19–22]. In general, these methods (i) yield smoother solutions under extremely noisy conditions and (ii) exhibit better behaviour under limited-angle conditions than WBP. They are thus better suited for high-resolution structure determination of macromolecules [19]. Nevertheless, these methods have still not been extensively used, mainly due to their computational cost. This cost mainly comes from the fact that they proceed iteratively until convergence (i.e. until a solution coherent with the experimental data is obtained).

This work addresses the parallelization of several 3D reconstruction algorithms in the field of 3DEM of macromolecules. As stated above, huge amounts of EM images have to be combined in order to reach high resolution in structure determination. Parallel and distributed computing [23] thus emerges as a crucial tool to make this problem affordable. Although this point was already raised some time ago in this field [15], only recently has it received attention [24–26]. In this work, both types of reconstruction algorithms, WBP and series expansion methods, have been parallelized and their performance has been evaluated in terms of speedups and computation versus communication times. With regard to series expansion methods, this study has included traditional techniques [27], such as algebraic reconstruction techniques (ART), as well as recently developed methods characterized by better convergence, such as averaging sequential strings (ASS) [28] and component averaging methods (CAV) [29]. In these methods, overlapping spherically symmetric volume elements (blobs) with smooth transition to zero have been used as basis functions, since they are better suited for representing natural structures [30,31,19].

2. 3D reconstruction algorithms

2.1. Weighted backprojection

Currently, the standard tomographic reconstruction method in the field is weighted backprojection (WBP) [18]. The method simply distributes the known specimen mass present in the projection images evenly over computed backprojection rays. In this way, specimen mass is projected back into a reconstruction volume (i.e., backprojected). When this process is repeated for a series of projection images recorded from different tilt angles, backprojection rays from the different images intersect and reinforce each other at the points where mass is found in the original structure. In this way, the 3D mass of the specimen is reconstructed (Fig. 1, left).
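To make the idea concrete, here is a minimal NumPy sketch of unweighted backprojection of 1D projections into a single 2D slice. It is not the authors' implementation: the function name, the nearest-neighbour interpolation and the final normalization are simplifications of ours, and actual WBP additionally applies a weighting filter to the projections before backprojecting.

```python
import numpy as np

def backproject_2d(projections, angles_rad, size):
    """Unweighted backprojection of 1D projections into a 2D slice.
    Each projection value is smeared evenly back along its ray; real WBP
    filters (weights) the projections first."""
    recon = np.zeros((size, size))
    centre = (size - 1) / 2.0
    ys, xs = np.mgrid[0:size, 0:size] - centre
    for proj, theta in zip(projections, angles_rad):
        # Detector coordinate hit by each pixel's ray at this tilt angle.
        t = xs * np.cos(theta) + ys * np.sin(theta) + centre
        idx = np.clip(np.round(t).astype(int), 0, size - 1)
        recon += proj[idx]                 # accumulate backprojected mass
    return recon / len(angles_rad)

# Example: backproject eight flat projections of a 64x64 slice.
slice_est = backproject_2d([np.ones(64)] * 8,
                           np.linspace(0.0, np.pi, 8, endpoint=False), 64)
```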
The relevance of WBP in 3DEM stems mainly from its linearity and its computational simplicity. Its disadvantages, however, are (i) its sensitivity to the limited tilt-angle conditions found in EM, and (ii) the fact that it does not implicitly take into account either the noise conditions or the transfer function of the electron microscope.

2.2. Series expansion methods

Series expansion reconstruction methods assume that the 3D object or function f to be reconstructed can be approximated by a linear combination of a finite set of known and fixed basis functions b_j:


Fig. 1. 3D reconstruction algorithms. Left: WBP. Right: Series expansion methods.

$$ f(r, \phi_1, \phi_2) \approx \sum_{j=1}^{J} x_j\, b_j(r, \phi_1, \phi_2), \qquad (1) $$

(where (r, φ1, φ2) are spherical coordinates), and that the aim is to estimate the unknowns x_j. These methods also assume an image formation model in which the measurements depend linearly on the object, in such a way that

$$ y_i \approx \sum_{j=1}^{J} l_{i,j}\, x_j, \qquad (2) $$

where y_i denotes the ith measurement of f and l_{i,j} the value of the ith projection of the jth basis function. Under those assumptions, the image reconstruction problem can be modeled as the inverse problem of estimating the x_j from the y_i by solving the system of linear equations given by Eq. (2). Such systems of equations are typically solved by means of iterative methods. ART methods constitute one of the best known families of iterative algorithms to solve such systems [27]. CAV methods have recently arisen [29] as efficient iterative algorithms for solving large and sparse systems of linear equations. These methods have been derived from ART, with the important innovation of a weighting related to the sparsity of the system. This component-related weighting yields a convergence rate that may be far superior to that of ART methods, especially at the early iteration steps.

Assuming that the whole set of equations in the linear system (Eq. (2)) may be subdivided into B blocks, each of size S, a generalized version of these iterative methods can be described via its iterative step from the kth estimate to the (k+1)th estimate by

$$ x_j^{k+1} = x_j^k + \lambda_k \sum_{s=1}^{S} \frac{y_i - \sum_{v=1}^{J} l_{i,v}\, x_v^k}{\sum_{v=1}^{J} w_v^b\, (l_{i,v})^2}\; l_{i,j}, \qquad (3) $$

where λ_k denotes the relaxation parameter, b = (k mod B) is the index of the block, i = bS + s is the index of the ith equation of the system, and w_v^b denotes the weighting factor. The weighting factor is set to S in the case of ART methods. In CAV methods, the factor w_v^b is set to the number of times that the component x_v of the volume contributes with a nonzero value to the equations in the bth block. The processing of all the equations in one of the blocks produces a new estimate; all blocks are processed in one iteration of the algorithm. This technique produces iterates that converge to a least squares solution of the system, provided that the relaxation parameter is optimized [27,29].
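As an illustration, the following Python sketch applies the iterative step of Eq. (3) once to a block of equations. A dense coefficient matrix is used for clarity (real implementations exploit sparsity and the blob footprint of Section 2.4); the function names and the zero-denominator guard are our own.

```python
import numpy as np

def block_iterative_step(x, L, y, lam, w):
    """One application of Eq. (3) to a block of S equations.
    x   : current estimate x^k, shape (J,)
    L   : coefficients l_{i,v} of the block, shape (S, J)
    y   : measurements y_i of the block, shape (S,)
    lam : relaxation parameter lambda_k
    w   : weights w_v^b, shape (J,): the constant S for ART-type methods,
          per-component nonzero counts for CAV-type methods
    """
    residual = y - L @ x                       # y_i - sum_v l_{i,v} x_v^k
    denom = (L ** 2) @ w                       # sum_v w_v^b (l_{i,v})^2
    denom[denom == 0.0] = np.inf               # rays touching no component
    return x + lam * (L.T @ (residual / denom))

def cav_weights(L):
    """CAV weighting: number of equations of the block touching each x_v."""
    return (L != 0).sum(axis=0).astype(float)
```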


ART and CAV methods can be classified into the following three categories as a function of the number of blocks:

Sequential. This version cycles through the equations one by one, producing consecutive estimates (S = 1). It exactly matches the well-known row-action ART method [27] and is characterized by fast convergence as long as the relaxation factors are optimized.

Simultaneous. This version uses only one block (B = 1), considering all the equations in the system in every iterative step. It is inherently parallel, in the sense that every equation can be processed independently from the others. Simultaneous methods are characterized by better robustness under noise, but a slow convergence rate. SIRT (simultaneous iterative reconstruction technique) [32] is the simultaneous version of ART. For CAV methods, that version is simply known as CAV [33].

Block-iterative. The block-iterative (BI) version represents the general case. In essence, this version cycles sequentially block by block, and every block is processed in a simultaneous way. BI methods (with S > 1) also exhibit an inherently parallel nature. Their convergence rate lies in between those of the sequential and simultaneous versions, and is higher as the block size decreases. SART (simultaneous algebraic reconstruction technique) [34] and BICAV [29] are the BI versions of ART and CAV, respectively.

This work focuses on the simultaneous and block-iterative versions because of their inherently parallel nature. In this work, the blocks are chosen so that the number of equations is a multiple of the number of pixels in the EM projection images. In this way, all the equations involved in an EM image belong to the same block.

Conceptually, iterative methods (either CAV or ART) proceed in the following way (Fig. 1, right): they start from an initial model, which is progressively refined as the iterations evolve. For every iteration and every block: (1) the corresponding projections are computed from the current model; (2) the error between the experimental projections and those computed from the model is calculated; (3) the model is refined so that the error is minimized. In essence, this process is the one analytically expressed by Eq. (3).

2.3. Averaging sequential strings

Recently, another iterative algorithmic scheme to solve linear systems has arisen: averaging sequential strings (ASS) [28]. This is also a block-iterative approach. The important difference with respect to the previous ones is that the equations in every block are processed sequentially, or following a BI strategy, yielding a partial result. The final result of the iteration is then computed as the weighted sum of the partial results. ASS is also inherently parallel, in the sense that the blocks can be processed independently. Its convergence rate lies in between those of the sequential and simultaneous versions of ART and CAV. The specific implementation of ASS in this work makes the blocks of equations composed of subsets of EM images (i.e., the number of equations in a block is a multiple of the number of pixels in the EM images). Every block of equations is then processed according to a BI scheme that proceeds sequentially image by image, in such a way that all the equations involved in an image are processed simultaneously. The string-averaging idea is sketched below.
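The sketch below reuses block_iterative_step from the Eq. (3) example above. It is a deliberate simplification: each string is reduced to a single simultaneous step and equal string weights are assumed, whereas the implementation in this work processes the images within a string one by one.

```python
import numpy as np

def ass_iteration(x, strings, lam):
    """One ASS iteration: every string (block) of equations is processed
    independently from the same starting estimate x, and the partial
    results are then combined.  `strings` is a list of
    (L_block, y_block, w_block) tuples."""
    partials = [block_iterative_step(x, L, y, lam, w)
                for (L, y, w) in strings]
    return np.mean(partials, axis=0)   # weighted sum with equal weights
```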
2.4. Basis functions and projections

In 3DEM, the choice of the set of basis functions used to represent the object to be reconstructed greatly influences the result of the algorithm [19]. Overlapping spherically symmetric volume elements (blobs) with smooth transition to zero have been thoroughly investigated [30,31] as alternatives to voxels for image representation, concluding that the properties of blobs make them well suited for representing natural structures. The use of blob basis functions provides the reconstruction algorithm with an implicit regularization mechanism, which is especially appropriate for working under noisy conditions like those found in 3DEM. The basis functions in Eq. (1) are shifted versions of the blob function. The arrangement of overlapping blobs covers the space with a pseudo-continuous density distribution, which is very suitable for representing 3D biological structures [19]. Fig. 2(a) shows the arrangement of voxels (left) versus that of blobs (right). Voxels are nonoverlapping cubes that may produce strong density discontinuities in space, whereas the overlapping nature of blobs produces smooth density variations.

The projection process essentially consists of computing a line integral of the volume; the direction of the projection is the direction along which the line integral is taken. The use of blobs allows an efficient computation of the forward- and backward-projection stages in any iterative reconstruction method. The spherical symmetry of blobs makes the projection of the blob the same along any direction. Consequently, it is possible to pre-compute the projection of the generic blob and store it in a look-up table [31], the so-called footprint. As a result, the computation of the l_{i,j} terms (Eq. (3)) in the forward- and backward-projection stages is transformed into simple references to the look-up table, as sketched below. Consequently, the use of blobs as basis functions enables substantial computational savings compared to the use of voxels.
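The following sketch illustrates the pre-computation of such a footprint table. A Gaussian profile is used as a stand-in for the Kaiser–Bessel-type blobs studied in [30,31], and the support radius and sampling densities are arbitrary illustrative values.

```python
import numpy as np

def make_footprint(blob, radius=2.0, n_s=256, n_z=1001):
    """Tabulate the line integral of a spherically symmetric blob as a
    function of the distance s between a ray and the blob centre."""
    s = np.linspace(0.0, radius, n_s)             # ray-to-centre distances
    z = np.linspace(-radius, radius, n_z)         # position along the ray
    r = np.sqrt(s[:, None] ** 2 + z[None, :] ** 2)
    values = np.where(r <= radius, blob(r), 0.0)  # density along the ray
    return s, values.sum(axis=1) * (z[1] - z[0])  # numerical line integral

# Gaussian stand-in for a smooth, compactly supported blob profile.
s_tab, footprint = make_footprint(lambda r: np.exp(-4.0 * r ** 2))

# A coefficient l_ij then reduces to a table look-up at the distance
# between ray i and the centre of blob j (0.7 grid units here).
l_ij = np.interp(0.7, s_tab, footprint)
```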


Fig. 2. Basis functions and their projections. (a) Arrangement of voxels versus that of blobs. (b) The footprint is the projection of a single blob. The projection of a structure is made up of the accumulation of the corresponding footprints.

Fig. 2(b) shows a scheme of the projection of a single blob, i.e. the footprint (left), and the projection of a set of blobs (right).

3. Parallelization

The characteristics of this particular problem allow the use of parallel strategies based on domain decomposition and the SPMD model (single program, multiple data), in which every node in the system carries out essentially the same task over its own data domain. The parallelization has been focused on distributed-memory computers, specifically clusters of workstations, using the message-passing paradigm. In three-dimensional electron microscopy, the SPMD model consists of splitting the set of projection images into groups and distributing the groups across the computing nodes. Fig. 3(a) shows a conceptual scheme of this approach. The nodes then process the images in parallel and, once or several times per iteration, communicate their results to obtain the solution at the current iteration. The data distribution and the processing of the groups depend on the iterative method.
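A minimal skeleton of this SPMD scheme, written with mpi4py for brevity (the evaluation in Section 4 used the MPICH library from C-style MPI), could look as follows; the interleaved image distribution and the variable names are illustrative assumptions.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

n_images, vol_shape = 12288, (100, 100, 100)
# Interleaved distribution of the projection images across the nodes.
my_images = range(rank, n_images, nprocs)

partial = np.zeros(vol_shape)
for i in my_images:
    # Method-dependent work: forward-project the current model, compare
    # with experimental image i, and backproject the correction.
    pass

# One global reduction combines the partial results on every node.
result = np.empty(vol_shape)
comm.Allreduce(partial, result, op=MPI.SUM)
```

Such a script would be launched on all nodes at once, e.g. with `mpirun -np 32 python spmd.py`.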


Fig. 3. Conceptual parallelization scheme. (a) General scheme, (b) scheme for simultaneous methods, CAV and SIRT, (c) scheme for block-iterative methods, SART and BICAV and (d) scheme for ASS.


The parallel strategy for WBP is straightforward. Essentially, the set of EM images is distributed across the computing nodes, and every node computes a partial reconstruction through WBP with its own subset of images. Those partial reconstructions are finally summed together via a global reduction operation to yield the definite reconstruction. The conceptual scheme shown in Fig. 3(a) is also valid for WBP, except that no iterations are needed.

Fig. 3(b)–(d) illustrate how the parallel implementations of the iterative methods work (see the detailed description in the following paragraphs). Those figures show an example with eight projection images and two computing nodes. In the case of block-iterative methods, two blocks are considered. In those figures, projection images are represented by squares and 3D reconstructions by cubes. All the equations involved in an image are processed according to a simultaneous strategy, which is denoted by solid arrows. Dotted arrows represent the fact that the 3D model obtained from a previous iteration is refined in the current iteration. Dashed arrows represent the combination of the partial results from the different nodes to yield the definite 3D model in the current iteration or block. Note that the combination of dotted and dashed arrows represents the global reduction operation that yields the definite reconstruction and also distributes it to all the nodes.

The parallelization of the simultaneous and block-iterative (BI) versions of ART/CAV consists of distributing the EM images of the current block of equations across the computing nodes. Every node then computes its partial result following a simultaneous scheme with its equations. Finally, the nodes communicate with each other to yield the final result through a weighted sum of the partial results. Then a new block is processed, and so forth. Note that simultaneous algorithms consider all the images as included in only one block. Therefore, simultaneous algorithms require only one global reduction operation per iteration, whereas BI methods require a number of communications per iteration that is inversely proportional to the block size (see Fig. 3(b) and (c)); a sketch of this per-block communication pattern closes this section. With this parallelization approach, the parallel simultaneous and BI methods are guaranteed to yield the same results as the corresponding sequential versions. In other words, the convergence rate of the method is preserved.

The parallelization of ASS follows a different approach (see Fig. 3(d)). In each iteration, the blocks of equations are distributed across the nodes, i.e. each node receives a subset of images. Every node then computes its partial result from its images following the BI strategy described above. Finally, the nodes communicate with each other to yield the final result by a global reduction operation. This strategy is interesting to analyze from the mathematical point of view because of its implications for the convergence rate of the algorithm. Parallelization reduces the computation time per iteration; however, the way the images are processed may reduce the convergence rate of the algorithm. A heuristic rule over the relaxation parameter of the algorithm has been formulated so that the sequential convergence rate is maintained [24]:

$$ \lambda_{N_p} = \lambda_{Seq} \cdot N_p, $$

where N_p denotes the number of processors, λ_Seq is the relaxation parameter used in the sequential version, and λ_{N_p} is the value that should be used in the parallel version to obtain a similar convergence rate.
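The per-block communication pattern of the BI methods can be sketched as follows; the block size, the λ_Seq value and the variable names are assumptions for illustration, and the trailing comment notes how parallel ASS differs.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

n_images, block_size = 12288, 256
n_blocks = n_images // block_size      # 48 global reductions per iteration

x = np.zeros(100 ** 3)                 # current 3D estimate (flattened)
for b in range(n_blocks):              # SART/BICAV: one reduction per block
    local = np.zeros_like(x)
    # ... each node applies Eq. (3) to its images of block b, into `local` ...
    update = np.empty_like(x)
    comm.Allreduce(local, update, op=MPI.SUM)   # weighted sum of partials
    x += update

# Parallel ASS instead performs a single reduction per iteration and
# rescales the relaxation parameter: lambda_Np = lambda_Seq * Np
# (a lambda_Seq of 0.05 is assumed here purely for illustration).
lam_par = 0.05 * comm.Get_size()
```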
4. Results

This section presents the performance evaluation of the parallel implementations of the reconstruction methods described above. This evaluation has been carried out in terms of speedups and the percentage of the turnaround time dedicated to communication versus computation. A model of Bacteriorhodopsin, a macromolecule whose structure was solved by electron microscopy [35] and which is very well known in this field, has been used to perform the reconstructions and evaluate the parallel algorithms [36]. A set of 12,288 projections, randomly distributed in projection space and contaminated with Gaussian noise at SNR = 1/3, was calculated from the model. This dataset is representative of current structural studies by 3DEM. The set of projections was then used to compute reconstructions with the different parallel reconstruction methods. In order to obtain more accurate measurements, 50 iterations of each method were used to calculate average values of speedups and times. The reconstructions and the projections were 100 × 100 × 100 voxels and 100 × 100 pixels in size, respectively.
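Contaminating projections at a prescribed SNR can be sketched as follows; the SNR convention (ratio of signal variance to noise variance) and the generator seed are our assumptions, since the paper does not spell them out.

```python
import numpy as np

def add_noise(projection, snr=1.0 / 3.0, rng=None):
    """Add Gaussian noise to a projection at a target SNR, taken here as
    the signal-to-noise variance ratio (one common convention)."""
    if rng is None:
        rng = np.random.default_rng(0)
    sigma = np.sqrt(projection.var() / snr)
    return projection + rng.normal(0.0, sigma, projection.shape)

noisy = add_noise(np.random.rand(100, 100))   # one 100 x 100 projection
```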


Six different reconstruction algorithms have been tested: WBP, ASS, SIRT, CAV, SART and BICAV. For the BI algorithms (SART and BICAV), three different block sizes have been tested: 64, 128 and 256 images (denoted in this paper by SART-64, SART-128, SART-256, BICAV-64, BICAV-128 and BICAV-256). The values of the basic parameters of the algorithms (the relaxation parameter and the blob parameters) were taken from previous studies [20], since their settings do not influence the performance of the parallel implementation. The parallel performance of the algorithms was measured on a 16-node (dual-processor) Linux cluster of workstations with the following characteristics:

• 16 computing nodes: 2 × 3.06 GHz Pentium Xeon, 2 GB RAM, 70 GB HDD, 2 × 1 Gb Ethernet NICs. Cross-mounted /home filesystem.
• 1 server node: 2 × 3.2 GHz Pentium Xeon, 4 GB RAM, 6 × 100 GB SCSI-320 HDD, 2 × 1 Gb Ethernet NICs.

MPICH 1.2.6 was installed as the Message-Passing Interface (MPI) library [37]. To ensure exclusive access to the nodes, so that times could not be altered by other users, a batch scheduling policy was implemented: jobs remained queued until the requested resources were available, so that they could be launched in exclusive mode. OpenPBS was used as the batch scheduling system.

4.1. Speedups and execution times

Speedup is the metric most often used to assess parallel performance [38]. It is defined as the ratio between the turnaround times of the sequential and the parallel versions of the program, as computed in the sketch below. In this work, speedups have been computed to assess the effectiveness of the parallel approaches. Table 1 presents the speedups obtained for the different algorithms tested, using numbers of processors that are powers of 2. Fig. 4 shows graphs with the speedup curves of the different methods versus the number of processors. It is clearly observed that WBP, ASS and the simultaneous algorithms, SIRT and CAV, exhibit the best speedup curves (see Fig. 4, top). In particular, ASS, SIRT and CAV approach linear behaviour, while WBP exhibits a slight decay of the slope at a high number of processors (32 processors). This behaviour may be caused by the fact that WBP is computationally simple and, with 32 processors, the computational burden is so light that the time for communications contributes significantly to the execution time. As far as the BI methods SART and BICAV are concerned, they exhibit much poorer speedups, and the behaviour worsens as the block size decreases (see Fig. 4, bottom). This is justified by the fact that BI algorithms require a number of global reduction operations per iteration that is proportional to the number of blocks. To quantify the cost of these operations, the following section presents the time devoted to communications for the different algorithms.
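For reference, speedup and the derived parallel efficiency can be computed as follows; the efficiency helper is our own addition, and the sample figures are taken from Table 1.

```python
def speedup(t_seq, t_par):
    """Ratio of sequential to parallel turnaround time."""
    return t_seq / t_par

def efficiency(t_seq, t_par, nprocs):
    """Speedup normalized by the number of processors (1.0 is ideal)."""
    return speedup(t_seq, t_par) / nprocs

# With the ASS entry of Table 1 (speedup 31.59 on 32 processors):
print(31.59 / 32.0)   # parallel efficiency of about 0.99
```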

Table 1
Speedups for the different reconstruction methods and numbers of processors

Reconstruction method   2 procs.   4 procs.   8 procs.   16 procs.   32 procs.
WBP                     1.95       3.88       7.71       15.22       28.76
ASS                     1.99       3.95       7.96       15.93       31.59
CAV                     1.97       3.98       7.96       14.39       31.38
SIRT                    1.99       3.97       7.94       15.85       31.10
SART-64                 1.97       3.81       7.33       13.30       21.76
SART-128                1.97       3.87       7.52       14.40       25.47
SART-256                1.98       3.80       7.60       14.85       27.97
BICAV-64                1.96       3.84       7.38       13.00       21.47
BICAV-128               1.96       3.84       7.52       13.74       24.56
BICAV-256               1.98       3.91       7.62       14.09       26.44

Fig. 4. Graphs of speedup versus number of processors. Top: WBP, ASS, CAV and SIRT. Bottom: WBP, SART-64/128/256 and BICAV-64/128/256.

Fig. 5. Average execution time per iteration versus number of processors. Top: WBP, ASS, CAV and SIRT. Bottom: WBP, SART-64/128/256 and BICAV-64/128/256.

In order to compare the relative computation costs of the different reconstruction algorithms, the average execution time (including computation and communication) per iteration was computed. Fig. 5 shows the average turnaround time for the different algorithms. In the case of WBP, the time included in the graph represents the time of the whole reconstruction process. With regard to the iterative algorithms, it is clearly seen that ASS, SIRT and CAV behave similarly. BI algorithms, however, are more expensive, and their demands depend inversely on the block size.


Remarkably, WBP is by far the least computationally demanding method: its execution time is much less than the time required for a single iteration of any of the iterative methods.

4.2. Computation versus communication

Communication is usually one of the factors that influence the degree of parallelism an application can reach (i.e. its scalability). In this section this factor is analyzed in order to discern which algorithms are more suitable for parallelization and which ones scale badly. The percentage of the execution time used for computation and for communications was measured for the different reconstruction methods and numbers of processors. Table 2 presents those percentages, and Fig. 6 shows a graph corresponding to the data in the table. In this table, the data for every reconstruction method are represented by means of five column groups that correspond to 2, 4, 8, 16 and 32 processors, respectively.

Fig. 6 clearly shows that, independently of the reconstruction method, the higher the number of processors, the more communication time is required, because of the network overload. Consequently, those reconstruction methods that rely strongly on communication (such as BICAV and SART) are expected to be seriously penalized. As a matter of fact, those algorithms turn out to be severely affected by communications, and this effect grows stronger as the number of processors increases. Moreover, BICAV and SART carry out as many global reduction operations as there are blocks. Consequently, the smaller the block size, the more communications are established. It is thus observed in Fig. 6 that, for those algorithms, the percentage of communication increases for smaller block sizes.

Table 2
Percentage of the execution time used for computation (CPU) and for communications (Comm)

Reconstruction   2 procs.       4 procs.       8 procs.       16 procs.      32 procs.
method           CPU    Comm    CPU    Comm    CPU    Comm    CPU    Comm    CPU    Comm
WBP              99.99  0.01    99.97  0.03    99.84  0.16    99.30  0.70    96.18  3.82
ASS              100.0  0.00    99.99  0.01    99.98  0.02    99.96  0.04    99.91  0.09
CAV              100.0  0.00    99.99  0.01    99.98  0.02    99.96  0.04    99.91  0.09
SIRT             100.0  0.00    99.99  0.01    99.98  0.02    99.96  0.04    99.91  0.09
SART-64          99.71  0.29    98.63  1.37    96.64  3.36    93.07  6.93    84.44  15.56
SART-128         99.86  0.14    99.30  0.70    98.27  1.73    96.28  3.72    90.69  9.31
SART-256         99.92  0.08    99.66  0.34    99.12  0.88    98.07  1.93    95.11  4.89
BICAV-64         99.69  0.31    98.51  1.49    96.39  3.61    92.17  7.83    80.99  19.01
BICAV-128        99.84  0.16    99.25  0.75    98.14  1.86    95.89  4.11    89.93  10.07
BICAV-256        99.92  0.08    99.62  0.38    99.06  0.94    97.93  2.07    94.50  5.50

Fig. 6. Percentage of the execution time used for communications, represented for 1, 2, 4, 8, 16 and 32 processors (WBP, SART-64/128/256, BICAV-64/128/256).


For instance, no less than 15–20% of the total execution time is dedicated to communication in SART and BICAV when the block size is equal to or less than 64 images and 32 processors are used. This behaviour makes those BI algorithms not very suitable for parallelization. With regard to the other iterative algorithms (ASS, CAV and SIRT), the communication time is not expected to be a serious limiting factor. These algorithms perform just one communication per iteration to share the partial results and, consequently, the communication time should be significantly less than the computation time, as illustrated in Fig. 3. The maximum percentage of communication time occurs when 32 processors are involved, and it is always below 1%. Therefore, these algorithms are not strongly affected by communications and can thus achieve good scalability.

Finally, WBP exhibits a behaviour similar to ASS, CAV and SIRT. However, it is noteworthy that its percentage of communication time is slightly greater than for those iterative algorithms. WBP requires only one global reduction operation to yield the definite reconstruction, i.e. the same amount of communications per iteration as ASS, CAV and SIRT.

Fig. 7. Three-dimensional reconstruction results. From top to bottom: a set of five noisy projection images; frontal and lateral views of the model of Bacteriorhodopsin from which the projection images were computed; frontal and lateral views of the reconstruction obtained with WBP; frontal and lateral views of the reconstruction obtained with ASS. The reconstruction results were obtained after processing 12,288 different projection images such as those shown at the top.


However, WBP performs approximately half the computation: it only backprojects the image density, whereas every iteration of ASS, CAV and SIRT takes projections from the model, computes the error with respect to the experimental projections, and refines the model according to the errors. Therefore, WBP exhibits less scalability than ASS, CAV and SIRT because of its lower computation-to-communication ratio.

4.3. Reconstruction results

To illustrate the final solutions obtained with the numerical algorithms analyzed in this work, Fig. 7 shows some views of the 3D reconstructions obtained with these methods. At the top of the figure, a set of five projection images is shown to illustrate the level of SNR used here. Then the model of Bacteriorhodopsin from which the projection images were computed is shown. Finally, the 3D reconstruction results obtained after processing 12,288 different projection images are presented. Only the results obtained with WBP and ASS are shown: WBP is the standard method in the field [18], and ASS is the best iterative reconstruction method in terms of stability, convergence speed and quality of the results [24]. In this figure it is possible to see how the iterative method is far superior to WBP, as its result better resembles the model and the resolution is higher, especially in the vertical direction. Assessment of the reconstructions and evaluation of the reconstruction methods in terms of quality is outside the scope of this work; other works already deal with such comparisons [19–21].

5. Conclusion

This work has described and analyzed parallel approaches for reconstruction methods used in three-dimensional electron microscopy. Several types of algorithms have been parallelized and analyzed: WBP, simultaneous iterative methods (SIRT, CAV) and block-iterative methods (SART, BICAV, ASS). Datasets resembling current structural studies by 3DEM have been used for the evaluation. The results show that ASS, SIRT and CAV exhibit similar behaviour in terms of scalability, with nearly linear speedup. WBP is the next best method in terms of scalability. All of them can be considered well suited for parallelization. However, the block-iterative algorithms BICAV and SART exhibit poor scalability as the block size is reduced. This is a strong disadvantage, as small block sizes are desirable in order to obtain fast convergence rates in reconstruction quality [27,24].

WBP is currently the standard method for high-resolution structure determination of biological specimens in 3DEM [3,15]. However, it has been shown that regularized iterative reconstruction methods are better suited for this problem because of their robustness to noise and their lower sensitivity to limited-tilt-angle conditions [19]. Despite their potential, iterative methods are not extensively used in this field because of their computational costs. On the other hand, a previous work [24] showed that ASS exhibits the highest convergence rate in terms of quality of the reconstructions. The present work has shown that ASS is also one of the methods with the highest scalability. The conjunction of these two facts makes ASS the best iterative method for large-scale three-dimensional reconstruction from projections.
This work has shown that parallel versions of WBP and of the iterative reconstruction algorithms (ASS in particular) are effective enough to replace the sequential versions without loss of quality and with significant speedup. Previously unaffordable large reconstructions can now be faced and carried out on a cluster of workstations in a relatively short computation time. The use of parallel computing is becoming essential to afford "grand challenge" problems currently unapproachable in structural biology [25,39,26] in particular, and in biomedical research in general [40]. High-resolution structure determination of biological specimens by electron microscopy aims to accurately identify their molecular components in order to suggest structural mechanisms related to their biological functions. This discipline is currently trying to reach close-to-atomic resolution by combining up to one million images of the specimen under study [17,15,3,16]. The computational costs that this challenge involves make high-performance computing central to this field. This work has shown that parallel and distributed computing can be effectively used in 3DEM and will play a key role in the short term in affording those high-resolution structural studies.


Acknowledgments

This work has been partially supported by the Spanish CICYT (TIC2002-00228, TIN2005-00447, BIO2001-1237 and BFU2004-00217/BMC), by the NIH (grant 1R01HL67465-01), by Fundación BBVA (2004X578), by CAM (GR/SAL/0342/2004), by the EU (FP6-502828: The European Network of Excellence on 3DEM) and by the EU TRACS Programme at EPCC (HPRI-CT-1999-00026). J.R. Bilbao-Castro holds a fellowship from the Spanish National Research Council (CSIC).

References

[1] A. Sali, R. Glaeser, T. Earnest, W. Baumeister, From words to literature in structural proteomics, Nature 422 (2003) 216–225.
[2] J. Frank, Three Dimensional Electron Microscopy of Macromolecular Assemblies, Oxford University Press, 2005.
[3] J. Frank, Single-particle imaging of macromolecules by cryo-electron microscopy, Annu. Rev. Biophys. Biomol. Struct. 31 (2002) 303–319.
[4] E.V. Orlova, H.R. Saibil, Structure determination of macromolecular assemblies by single-particle analysis of cryo-electron micrographs, Curr. Opin. Struct. Biol. 14 (2004) 584–590.
[5] A. Fotin, Y. Cheng, N. Grigorieff, T. Walz, S.C. Harrison, T. Kirchhausen, Structure of an auxilin-bound clathrin coat and its implications for the mechanism of uncoating, Nature 432 (2004) 649–653.
[6] M.S. Jurica, D. Sousa, M.J. Moore, N. Grigorieff, Three-dimensional structure of C complex spliceosomes by electron microscopy, Nat. Struct. Biol. 11 (2004) 265–269.
[7] B.P. Klaholz, A.G. Myasnikov, M.V. Heel, Visualization of release factor 3 on the ribosome during termination of protein synthesis, Nature 427 (2004) 862–865.
[8] X. Agirrezabala, J. Martín-Benito, M. Valle, J.M. González, A. Valencia, J.M. Valpuesta, J.L. Carrascosa, Structure of the connector of bacteriophage T7 at 8 angstroms resolution: structural homologies of a basic component of a DNA translocating machinery, J. Mol. Biol. 347 (2005) 895–902.
[9] M.M. Golas, B. Sander, C.L. Will, R. Luhrmann, H. Stark, Molecular architecture of the multiprotein splicing factor SF3b, Science 300 (2003) 980–984.
[10] Y. Cheng, O. Zak, P. Aisen, S. Harrison, T. Walz, Structure of the human transferrin receptor–transferrin complex, Cell 116 (2004) 565–576.
[11] M. Halic, T. Becker, M.R. Pool, C.M. Spahn, R.A. Grassucci, J. Frank, R. Beckmann, Structure of the signal recognition particle interacting with the elongation-arrested ribosome, Nature 427 (2004) 808–814.
[12] H. Gao, J. Sengupta, M. Valle, A. Korostelev, N. Eswar, S.M. Stagg, P.V. Roey, R.K. Agrawal, S.C. Harvey, A. Sali, M.S. Chapman, J. Frank, Study of the structural dynamics of the E. coli 70S ribosome using real-space refinement, Cell 113 (2003) 789–801.
[13] M. Valle, A. Zavialov, W. Li, S.M. Stagg, J. Sengupta, R.C. Nielsen, P. Nissen, S.C. Harvey, M. Ehrenberg, J. Frank, Incorporation of aminoacyl-tRNA into the ribosome as seen by cryo-electron microscopy, Nat. Struct. Biol. 10 (2003) 899–906.
[14] C.S. Potter, Y. Zhu, B. Carragher (Eds.), Special Issue on "Automated particle selection for cryo-electron microscopy", J. Struct. Biol. 145 (2004) 1–180.
[15] M. van Heel, B. Gowen, R. Matadeen, E.V. Orlova, R. Finn, T. Pape, D. Cohen, H. Stark, R. Schmidt, M. Schatz, A. Patwardhan, Single-particle electron cryo-microscopy: towards atomic resolution, Quart. Rev. Biophys. 33 (2000) 307–369.
[16] R. Henderson, Realising the potential of electron cryomicroscopy, Quart. Rev. Biophys. 37 (2004) 3–13.
[17] R. Henderson, The potential and limitations of neutrons, electrons and X-rays for atomic resolution microscopy of unstained biological molecules, Quart. Rev. Biophys.
28 (1995) 171–193.
[18] M. Radermacher, Weighted back-projection methods, in: J. Frank (Ed.), Electron Tomography, Plenum Press, 1992, pp. 91–115.
[19] R. Marabini, G.T. Herman, J.M. Carazo, 3D reconstruction in electron microscopy using ART with smooth spherically symmetric volume elements (blobs), Ultramicroscopy 72 (1998) 53–56.
[20] C.O.S. Sorzano, R. Marabini, N. Boisset, E. Rietzel, R. Schroder, G.T. Herman, J.M. Carazo, The effect of overabundant projection directions on 3D reconstruction algorithms, J. Struct. Biol. 133 (2001) 108–118.
[21] J.J. Fernández, A.F. Lawrence, J. Roca, I. García, M.H. Ellisman, J.M. Carazo, High performance electron tomography of complex biological specimens, J. Struct. Biol. 138 (2002) 6–20.
[22] J.J. Fernández, J.M. Carazo, I. García, Three-dimensional reconstruction of cellular structures by electron microscope tomography and parallel computing, J. Parallel Distrib. Comput. 64 (2004) 285–300.
[23] D. Clery, D. Voss, All for one and one for all. Distributed computing: introduction to special issue, Science 308 (2005) 809–824.
[24] J.R. Bilbao-Castro, J.M. Carazo, I. García, J.J. Fernández, Parallel iterative reconstruction methods for structure determination of biological specimens by electron microscopy, in: Proc. IEEE Intl. Conf. Image Processing (ICIP), vol. 1, 2003, pp. 565–568.
[25] J.R. Bilbao-Castro, R. Marabini, J.M. Carazo, I. García, J.J. Fernández, The potential of grid computing in 3D electron microscopy, Parallel Process. Lett. 14 (2004) 151–162.
[26] P.A. Leong, J.B. Heymann, G.J. Jensen, Peach: a simple Perl-based system for distributed computation and its application to cryo-EM data processing, Structure 13 (2005) 505–511.
[27] G.T. Herman, Algebraic reconstruction techniques in medical imaging, in: C. Leondes (Ed.), Medical Imaging. Systems, Techniques and Applications, Gordon and Breach Science, 1998, pp. 1–42.


[28] Y. Censor, T. Elfving, G.T. Herman, Averaging strings of sequential iterations for convex feasibility problems, in: Inherently Parallel Algorithms in Feasibility and Optimization and Their Applications, Elsevier Science, 2001, pp. 101–114.
[29] Y. Censor, D. Gordon, R. Gordon, BICAV: a block-iterative, parallel algorithm for sparse systems with pixel-related weighting, IEEE Trans. Med. Imag. 20 (2001) 1050–1060.
[30] R.M. Lewitt, Alternatives to voxels for image representation in iterative reconstruction algorithms, Phys. Med. Biol. 37 (1992) 705–716.
[31] S. Matej, R.M. Lewitt, G.T. Herman, Practical considerations for 3D image reconstruction using spherically symmetric volume elements, IEEE Trans. Med. Imag. 15 (1996) 68–78.
[32] P. Gilbert, Iterative methods for the 3D reconstruction of an object from projections, J. Theor. Biol. 36 (1972) 105–117.
[33] Y. Censor, D. Gordon, R. Gordon, Component averaging: an efficient iterative parallel algorithm for large and sparse unstructured problems, Parallel Comput. 27 (2001) 777–808.
[34] A.H. Andersen, A.C. Kak, Simultaneous algebraic reconstruction technique (SART): a superior implementation of the ART algorithm, Ultrason. Imaging 6 (1984) 81–94.
[35] R. Henderson, J.M. Baldwin, T.A. Ceska, F. Zemlin, E. Beckmann, K.H. Downing, Model for the structure of bacteriorhodopsin based on high-resolution electron cryo-microscopy, J. Mol. Biol. 213 (1990) 899–929.
[36] J.R. Bilbao-Castro, C.O.S. Sorzano, I. García, J.J. Fernández, Phan3D: design of biological phantoms in 3D electron microscopy, Bioinformatics 20 (2004) 3286–3288.
[37] W. Gropp, E. Lusk, A. Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, MIT Press, 1994.
[38] B. Wilkinson, M. Allen, Parallel Programming, second ed., Prentice-Hall, 2004.
[39] J.J. Fernández, J.R. Bilbao-Castro, R. Marabini, J.M. Carazo, I. García, On the suitability of biological structure determination by electron microscopy to grid computing, New Generation Comput. 23 (2005) 101–112.
[40] K.H. Buetow, Cyberinfrastructure: empowering a 'third way' in biomedical research, Science 308 (2005) 821–824.
