A Modified SSOR Preconditioner for Sparse Symmetric Indefinite Linear Systems of Equations Chen, X., Toh, K. C. & Phoon, K. K. Abstract: The standard SSOR preconditioner is ineffective for the iterative solution of the symmetric indefinite linear systems arising from finite element discretization of the Biot’s consolidation equations. In this paper, a modified block SSOR preconditioner combined with the Eisenstat-trick implementation is proposed.

For actual

implementation, a pointwise variant of this modified block SSOR preconditioner is highly recommended to obtain a compromise between simplicity and effectiveness. Numerical experiments show that the proposed modified SSOR preconditioned symmetric QMR solver can achieve faster convergence than several effective preconditioners published in the recent literature in terms of total runtime. Moreover, the proposed modified SSOR preconditioners can be generalized to nonsymmetric Biot’s systems.

1.

Introduction

The finite element discretization of time-dependent Biot’s coupled consolidation equations typically give rises to large symmetric indefinite linear systems with different right-hand side corresponding to different time step [e.g. 1].

At each time

step, the linear system has the form K Ax  b, where A   T B

B   C 

(1)

Here A  NN is a sparse 2×2 block symmetric indefinite matrix; x, b N; K  mm is a symmetric positive definite matrix corresponding to soil stiffness; C  nn is a symmetric positive semidefinite matrix corresponding to fluid stiffness; and B  mn is a full column rank connection matrix. A more detailed description of Biot’s consolidation equations is given elsewhere [2].

1

The linear system (1) is typically sparse because each node in the discretization by finite elements, finite difference, or finite volumes has only a few neighbors. In other words, the number of nonzeros generated in each row is rather small regardless of the matrix dimension (e.g. Reference [3]).

To solve a very large linear system of the form given in (1), especially those derived from 3D soil-structure interaction problems, direct solution methods such as sparse LU factorization are impractical from the perspective of computational time and memory requirement. Krylov subspace iterative methods on the other hand, may overcome these difficulties and they are commonly used for solving large-scale linear systems. Three of the most popular Krylov subspace iterative methods for solving the system (1) are SYMMLQ, MINRES and Symmetric QMR (SQMR) method. The SYMMLQ and MINRES methods, developed by Paige and Saunders in 1975 [4], could only be used in conjunction with symmetric positive definite preconditioners. Unlike the optimal SYMMLQ and MINRES methods, the SQMR method proposed by Freund and Nachtigal [5] can be used in conjunction with a symmetric indefinite preconditioner. Because of this flexibility, we choose to use SQMR throughout this paper. There are also recent numerical results showing that SQMR combined with a symmetric indefinite preconditioner is more effective than a positive definite preconditioner (e.g. References [2, 5–8]).

Generally, to achieve fast convergence, an iterative method should be used with a preconditioner. To incorporate a preconditioner, we can either use left, right, or left-right preconditioning. In this paper, we choose the left-right preconditioning for two reasons. First, it is desirable to make the preconditioned matrix as symmetric as possible. Second, the Eisenstat trick (e.g. Reference [9]) that we will apply to the proposed modified SSOR preconditioner in this paper is applicable only under the left-right preconditioning framework for SQMR. Given a preconditioner of the form P = PLPR, the left-right preconditioned system is as follows:

2

~ ~ A~ x b

(2)

~ ~ x  PR x and b  PL1b . where A  PL1 APR1 , ~

In the last few years, several effective symmetric indefinite preconditioners such as MJ [6], GJ [2] and block constrained preconditioner Pc [7][8] have been proposed in conjunction with the SQMR method for solving (1). The preconditioners, GJ and Pc, were developed based on the block structure of A. The heuristic preconditioner MJ was developed based on the observation that the standard Jacobi preconditioned SQMR actually performed worse than the unpreconditioned version when the diagonal elements of the (2, 2) block of A is close to zero. It had been shown that GJ is an improvement over MJ from both theoretical and numerical perspectives. Thus we shall not concern ourselves with MJ further in this paper. A brief description of the GJ and Pc preconditioners is given at the end of this section.

Classical standard SSOR (Symmetric Successive Over-Relaxation) preconditioner has been used by Mroueh and Shahrour [10] to solve soil-structure interaction problems and they recommended left SSOR preconditioning. However, nonsymmetric iterative methods such as Bi-CGSTAB and QMR-CGSTAB were used for elastic-perfectly plastic problems with associated flow rule, even though SQMR method would be more efficient (in terms of CPU time) based on our experience [11]. Therefore, not only the preconditioning methods but also the iterative methods themselves have not been exploited to the greatest efficiency. For linear systems arising from soil-structure interaction problems, the behavior of the standard SSOR preconditioner may be seriously affected by small negative diagonal elements corresponding to the pore pressure DOFs and as a result, the convergence of the standard SSOR preconditioned iterative method may be exceedingly slow. In this paper, we first propose a new modified block SSOR preconditioner which is based on an exact factorization form of the coefficient matrix A. The modified block SSOR preconditioner is expensive to construct. However, it serves as a useful theoretical basis to develop a simpler

3

pointwise variant (named MSSOR from hereon) that can exploit the so-called Eisenstat trick (e.g., References [9][1213]) for computational efficiency. Numerical experiments demonstrate that the MSSOR preconditioned SQMR can converge faster than the GJ or Pc preconditioned SQMR method. We note that our modified SSOR preconditioner can readily be extended to nonsymmetric 22 block systems.

We end this section by briefly describing the GJ and Pc preconditioners that we will use to benchmark our proposed modified SSOR preconditioner. 1.1 The GJ preconditioner The GJ preconditioner developed by Phoon et al. [2] has the form: 0  diag ( K ) PGJ   diag Sˆ   0



(3)

where Sˆ  C  B T diag ( K ) 1 B is an approximate Schur complement matrix; α is a real scalar, and in [2], it is recommended to choose   1 for a reasonable convergence rate. 1.2 The Pc preconditioner The block constrained preconditioner Pc was derived from A by replacing K by diag(K) in [7] as: diag ( K ) B  Pc   T  C   B

(4)

Although Pc is significantly more complicated than PGJ, each preconditioning step Pc1 u ; v  can be computed efficiently by solving 2 triangular linear systems of equations whose coefficient matrices involve the sparse Cholesky factor of the approximate Schur complement matrix Sˆ . The detailed pseudocode for constructing Sˆ and the implementation procedure for sparse Cholesky factorization have been

given in [7]. It has been demonstrated that Pc preconditioned SQMR method can result in faster convergence rate than the GJ preconditioned counterpart. On some fairly large linear systems arising from FE discretization of Biot’s equations, the 4

saving in CPU time can be up to 40%. 2.

Modified SSOR preconditioner

The SSOR iteration is a symmetric version of the well-known SOR iteration. When the SSOR method is used as a preconditioning technique, it is in fact equivalent to a single iteration of the standard SSOR method with zero initial guess (e.g. References [10][15]). In addition, the SSOR preconditioner, like the Jacobi preconditioner, can be constructed from the coefficient matrix without any extra work.

For a symmetric matrix A with the decomposition, A = L + D + LT, where L and D are the strictly lower triangular and diagonal parts of A (pointwise factorization), or the corresponding block parts if A is a block matrix (block factorization), the standard SSOR preconditioner has the following factorized form (e.g. Reference [916]) PSSOR  L  D D 1 LT  D 

(5)

It is obvious that the factorized form of (5) follows the stencils for LDU, so SSOR can also be regarded as an incomplete factorization preconditioner. For a symmetric indefinite linear system with a zero diagonal sub-block, direct application of (5) is impossible. To avoid this difficulty, Freund et al. (e.g. Reference [12]) proposed to apply a permutation  to the original matrix A, that is, A = TA = L  D  LT where L and D are the resultant strictly lower triangular and block diagonal parts of A , respectively. The objective of the permutation  is to obtain a trivially invertible and in some sense “as large as possible” block diagonal D with a series of 11 and 22 blocks. Therefore, the corresponding modified SSOR preconditioner to combine with the SQMR solver can be obtained from the permuted matrix as





P SSOR  L  D D

1

L

T

D



(6)

However, it may be difficult to find a permutation that totally avoids zero or small diagonal elements in the reordered matrix.

In addition, incomplete factorization

methods such as the SSOR preconditioner are sensitive to permutations, and thus may incur a larger number of iterations than the same incomplete factorization applied to 5

the original matrix (e.g. References [1719]).

2.1 Derivation of a new modified block SSOR preconditioner For the 22 block symmetric indefinite linear system (1) arising from Biot’s consolidation problem, the standard pointwise SSOR preconditioner (5) has small negative diagonal elements corresponding to pore pressure DOFs, and they may cause the iterative solver to stagnate, diverge or even breakdown. The permutation approach has limitations as noted above. This motivated us to consider a different approach involving the modification of D without the need for choosing a permutation. First, observe that the 22 block matrix of Equation (1) can be factorized as K  BT 

B  K   C   BT

0  K  S   0

1

0  K  S   0

B  S 

(7)

Here S  C  B T K 1B is the exact Schur complement matrix. The factorization shown in Eq. (7) has been mentioned as block LDU factorization in Reference [13], or generalized SSOR form in Reference [14]. In this paper, we propose a new modified block SSOR (MBSSOR) preconditioner which is derived from (7), that is, PMBSSOR

 Kˆ  T B

0   Kˆ   Sˆ   0

1

0   Kˆ    Sˆ   0

B   Kˆ   Sˆ   B T

 B  B Kˆ B  Sˆ  T

1

(8)

Here, Kˆ and Sˆ are the approximations of K and S, respectively, and the approximate matrix Kˆ may differ from the approximation of K appearing in Sˆ . Many variants can be obtained from this modified block SSOR preconditioner. It is interesting to note that the factorized form in Equations (8) contains all three block preconditioners discussed by Toh et al. [7], and the Pc preconditioner in (4) is the exact product form of Equation (8) with Kˆ  diag ( K ) and Sˆ  C  B T Kˆ 1 B . The eigenvalue analysis of PMBSSOR preconditioned matrix is similar to that for the Pc preconditioner given in [7], and thus shall not be repeated here.

2.2 A new modified pointwise SSOR preconditioner

6

A complicated near-exact preconditioner is not always the most efficient, because the cost of applying the preconditioner at each iteration could obviate the reduction in iteration count. The pragmatic goal is to reduce total runtime and simpler variants could potentially achieve this even though iteration counts might be higher. With this pragmatic goal in mind, we proposed a parameterized pointwise variant of PMBSSOR as PMSSOR

 Dˆ  Dˆ    L        

1

 T Dˆ  ~ ~ L    L  D D    



  L 1

T

~ D



(9)

where L is the strictly lower triangular part of A, Dˆ is a diagonal matrix that is to be ~ chosen, D  Dˆ  , and   [1, 2] is a relaxation parameter. The choice of Dˆ is crucial.

Pommerell and Fichtner (e.g., References [16][20]) proposed a D-ILU

preconditioner, for which the diagonal matrix Dˆ = diag(ãii) is chosen from the ILU(0) factorization procedure as follows: (1)

Set  as the nonzero sparsity structure of coefficient matrix A

(2) (3) (4) (5) (6)

For i = 1 : N Set ãii = aii End for For i = 1 : N For j = i+1 : N

(7) (8) (9) (10) (11)

If (i, j)   and (j, i)   Set ãjj = ãjj –ajiãii-1aij End End End

However, our numerical experiences indicate that SQMR is unable to converge when the linear systems stemming from Biot’s consolidation equations are preconditioned by D-ILU. The GJ preconditioner is a more natural candidate for Dˆ . Phoon et al. [2] has already demonstrated that PGJ is a good approximation to Murphy’s preconditioner [21], which is the un-inverted block diagonal matrix in Eq. (7). Hence, Eq. (9)

7

would be studied based on choosing Dˆ  PGJ from hereon.

Another important

parameter in Eq. (9) is the relaxation parameter  . The optimal choice of  is usually expensive to estimate, but practical experiences suggest that choosing  slightly greater than 1 usually result in faster convergence than the choice  =1 (where SSOR preconditioner reduces to the symmetric Gauss-Seidel preconditioner). We should mention that picking  too far away from 1 can sometimes result in significant deterioration in the convergence rate. Thus, it is advisable not to pick  larger than 1.5 unless the optimal choice of  is known. Rewriting the expression



 



~ ~ ~ ~ ~ for PMSSOR in (9) as PMSSOR  L  D D 1 LT  D  A  LD 1 LT  D  D , it is quite obvious

that

PMSSOR

can

better

approximate

A

when

the

error

matrix

~ ~ E  LD 1 LT  D  D is small.

Note that for the matrix A in (1), it is easy to see that the (1,1) block of PMSSOR is given by

LK  DK  DK  1 LK  DK  T where LK and D K are the strictly lower and diagonal parts of K. Thus the (1,1) block K in A is approximated by its SSOR matrix. 2.3 Combining with Eisenstat trick Eisenstat [9] exploited the fact that some preconditioners such as generalized SSOR [22], ICCG(0) [23] and MICCG(0) [24] contain the same off-diagonal parts of the original coefficient matrix to combine the preconditioning and matrix-vector multiplication step into one single efficient step. This trick is obviously applicable to any preconditioner of the form shown in Eq. (9).

The pseudo-code for the SQMR algorithm coupled with the MSSOR preconditioner is provided in Appendix A (e.g., Reference [12]), in which the Eisenstat trick is applied to the left-right preconditioned matrix of the form: ~ ~ ~ ~ A  ( L  D ) 1 A( LT  D ) 1 D

(10)

8

Following Chan’s implementation (e.g. [9][13]), the preconditioned matrix can be written as





~ ~ ~ ~ ~ ~ ~ A  ( L  D ) 1 ( L  D )  ( D  2 D )  ( LT  D ) ( LT  D ) 1 D

(11)

~ Thus, given a vector vk 1 , the product t k 1  A v k 1 in the SQMR algorithm can be

computed from the following procedure: Procedure PMatvec:

~ ~ (1) f  ( LT  D ) 1 wk 1 where wk 1  Dvk 1 ~ (2) g  ( D  2 D ) f  wk 1 ~ (3) h  ( L  D ) 1 g (4) tk 1  f  h ~ When applying the above PMatvec procedure to the Pc preconditioned matrix A , we get the resultant vector,

t 1  t k 1   k21   t k 1 

1  Kˆ 1 Kv 1  Kˆ 1 KKˆ 1 Bv k21  Kˆ 1 Bvk21 ~ v   A  k21    1 T 1 1 k 1 1           1 2  1 2 1 2 v k 1   Sˆ B Kˆ Kvk 1  Kˆ KKˆ Bvk 1  Kˆ Bvk 1  v k 1  v k 1 





(12) This computation can be carried out by the following steps: Compute u  Kˆ 1Bvk21 Compute s  Kˆ 1K vk11  u   u

(13)



Compute and set tk 1  s ; Sˆ 1B T s  vk11   vk21



Thus the computational cost of each Pc preconditioned SQMR step combined with the Eisenstat trick can broadly be summarized as: 1 K; 1 B; 1 BT; 1 Sˆ 1 ; 2 Kˆ 1 .

On the other hand, for Pc preconditioned SQMR method without applying the Eisenstat trick, the matrix-vector product and preconditioning steps at each iteration are separately carried out as follows (e.g., Reference [8]): Procedure Matvec:

9

Given [u; v] Compute z1 = Bv, z2 = BTu, z3 = Cv

(14)

Compute w = Ku Compute and set A[u; v] = [w + z1; z2  z3]

Procedure Pvec:

Given [u; v]; Compute w  Kˆ 1u ;

(15)

Compute z  Sˆ 1 B T w  v  ;



Compute Pc1 u; v   Kˆ 1 u  Bz ; z



Thus the main computational cost for each Pc preconditioned SQMR iteration without using the Eisenstat trick can be summarized as: 1 K; 1 C; 2 B; 2 BT; 1 Sˆ 1 ; 2 Kˆ 1 . Since the number of nonzero elements of B is only about 12% that of K , it is clear that the implementation of Pc combined with the Eisenstat trick is only marginally cheaper than the efficient implementation described in [8] for Pc without using the Eisenstat trick.

In contrast to the modified block SSOR-type preconditioners such as factorized Pc, the pointwise MSSOR preconditioner proposed in Equation (9) is very promising in that it heavily exploits the Eisenstat trick. Each step of the MSSOR-preconditioned iterative method is only marginally more expensive than that of the original matrix-vector product because the strictly off-diagonal parts of PMSSOR and A cancelled one another and only two triangular solves are involved at each iteration. Equation (10) or (11) demonstrated the Eisenstat trick in left-right form; in fact, it is also applicable in the left or right form for nonsymmetric solvers (e.g., Reference [20]). 2.4 Other implementation issues of GJ, Pc and PMSSOR In practical finite element programming, the displacement unknowns and pore 10

pressure unknowns can be arranged in different order. In Reference [25], Gambolati et al. have studied the effect of three different nodal orderings on ILU-type preconditioned Bi-CGSTAB method, and these nodal orderings can be expressed as: (1) Natural ordering: iord1 = [x1, y1, z1, p1, x2, y2, z2, p2, …, xN, yN, zN, pN]; (2) Block ordering: iord2= [x1, y1, z1, x2, y2, z2, …, xN, yN, zN, p1, p2,…]; (3) Block ordering: iord3= [x1, x2, ..., y1, y2, …, z1, z2, …, p1, p2,…]. where xi, yi, zi and pi are the displacement unknowns at the x,y,z directions and excess pore pressure at node i, respectively; N = m + n is the dimension of A. In this study, GJ and MSSOR preconditioners are applied to linear systems based on iord1, while Pc is applied to linear systems based on iord2. Numerical results discussed in the next section support the above choices.

Because of the symmetry in A, only the upper triangular part of A needs to be stored. In our implementation of GJ or MSSOR-preconditioned SQMR methods, the upper triangular part of A is stored in the CSC (Compressed Sparse Column) format (icsc, jcsc, csca) (Refer to [3] [16] for sparse storage formats). Clearly, the CSC storage of

the upper triangular part of A is also the CSR (Compressed Sparse Row) storage of the lower triangular part for A. Both the “forward solve step” and “backward solve step” at each iteration of the SQMR method can be executed quite rapidly from the CSC storage. The pseudo-code of the forward and backward solves are provided in Appendix A. Other than the CSC storage for the upper triangular part of A, only a few additional vectors are required to store the original and modified diagonal elements in MSSOR. In the case of Pc, it is natural to store K, B and C separately in order to obtain the approximate Schur complement matrix and its sparse Cholesky factor. The detailed implementation is described in Reference [8].

3

Numerical Experiments

3.1 Convergence Criteria An iterative solver typically produces increasingly accurate solutions with iteration count, and one can terminate the solver when the approximate solution is deemed 11

sufficiently accurate.

A standard measure of accuracy is based on the relative

residual norm. Suppose xi is the approximate solution at the i-th iterative step, then ri = b  Axi is the corresponding residual vector.

In Algorithm 2 presented in

Appendix A, the residual is the preconditioned residual.

For the purpose of

comparing with other preconditioned SQMR methods, the true relative residual for the SQMR method preconditioned by MSSOR or Pc combined with the Eisenstat trick is returned at every fifth iteration.

Note that the true relative residual (modulo

~ ~ rounding errors) is easily computed from the equation b  Axi = PL (b  A~ xi ) .

Given an initial guess x0 (usually zero initial guess), an accuracy tolerance stoptol, and the maximum number (maxit) of iterative steps allowed, we stop the iterative process if i  max it or

ri r0

2 2



b  Axi b  Ax0

2

 stoptol

(13)

2

Here  2 denotes the 2-norm. In this paper, the initial guess x0 is taken to be the zero vector, stoptol = 10-6, and maxit = 5000. More details on various stopping criteria can be found in [16][26].

3.2 Problem descriptions Figure 1 shows a 7×7×7 finite element mesh for a flexible square footing resting on homogeneous soil subjected to a uniform vertical pressure of 0.1 MPa. The sparsity pattern of the global matrix A with natural ordering obtained from the 777 meshed footing problem is also shown in the figure. In figure 2, a 20×20×20 mesh for the same square footing problem resting on a layered soil is plotted. Symmetry consideration allows a quadrant of the footing to be analyzed. Mesh sizes ranging from 8×8×8 to 20×20×20 were studied. These meshes result in linear systems of equations with DOFs ranging from about 7,000 to 100,000, respectively. In our FE discretization of the time-dependent Biot’s consolidation problems, the twenty-node hexahedral solid elements coupled with eight-node fluid elements were used.

12

Therefore, each hexahedral element consists of 60 displacement degrees of freedom (8 corner nodes and 12 mid-side nodes with 3 spatial degrees of freedom per node) and 8 excess pore pressure degrees of freedom (8 corner nodes with 1 degree of freedom per node). Thus the total number of DOFs of displacements is much larger than that of excess pore pressure, usually, the ratio is larger than ten. Details of these 3-D finite element meshes are provided in Table 1.

The ground water table is assumed to be at the ground surface and is in hydrostatic condition at the initial stage. The base of the mesh is assumed to be fixed in all directions and impermeable, side face boundaries are constrained in the transverse direction, but free in in-plane directions (both displacement and water flux). The top surface is free in all direction and free-draining with pore pressures assumed to be zero. The soil material is assumed to be isotropic and linear elastic with constant effective Poisson’s ratio (’) of 0.3.

The effective Young’s modulus (E’) and

coefficient of permeability (k) depend on the soil type. A uniform footing load of 0.1 MPa is applied “instantaneously” over the first time step of 1 s.

Subsequent

dissipation of the pore water pressure and settlement beneath the footing are studied by using a backward difference time discretization scheme with t = 1 s.

In

summary, we study the footing problem resting on the following 3 soil profiles: Soil profile 1: homogeneous soft clay with E = 1 MPa, k = 10-9 m/s. Soil profile 2: homogeneous dense sand with E = 100 MPa, k = 10-5 m/s; Soil profile 3: heterogeneous soil consisting of alternate soft clay and dense sand soil layers with parameters E = 1 MPa, k = 10-9 m/s and E = 100 MPa, k = 10-5 m/s, respectively. All the numerical studies in this paper are conducted using a Pentium IV, 2.4 GHz desktop PC with a physical memory of 1 GB, and the programs are written in FORTRAN.

3.3 Choice of parameters in GJ(MSSOR) and eigenvalue distributions of GJ (MSSOR) preconditioned matrices 13

Figure 3 shows the eigenvalue distributions of two GJ preconditioned matrices, for the 7×7×7 meshed problem on soil profile 1, corresponding to the parameters   1 , and   20 , respectively. It is interesting to observe that the “imaginary” wing is considerably compressed when 

is larger. Thus one would expect the GJ

preconditioned matrix associated with a larger 

to have a faster asymptotic

convergence rate. Unfortunately, the minimum real part of the eigenvalues also decreases with 

and the resultant effect is to slow down the asymptotic

convergence rate. Therefore, to obtain the optimal asymptotic convergence rate, a balance has to be maintained between the above two opposing effects: reducing the span of the imaginary wing, and having eigenvalues closer to the origin. With the help of Schwarz-Christoffel mapping for polygonal region, the asymptotic convergence rates for both eigenvalue distributions are estimated to be 0.98 for   1 and 0.95 for   20 . These estimated convergence rates indicate that it is beneficial to use GJ with   1 .

For completeness, the eigenvalue distribution of the Pc

preconditioned matrix is also shown in Figure 3, and the asymptotic convergence rate associated with the spectrum is 0.92. Interestingly, the eigenvalue distribution of the Pc preconditioned matrix (which can be proven to have only real eigenvalues) is

almost the same as the real part of the eigenvalue distribution of the GJ preconditioned matrix with   1 . This could partially be attributed to the fact that both GJ and Pc preconditioners use the same diagonal approximation to the (1, 1) block K.

In Section 2.4 it has been mentioned that the performance of MSSOR preconditioner can be influenced by different ordering scheme. This is shown by the results in Table 2. In this table, the performance of MSSOR with   1.0 and   4 is evaluated under three different ordering schemes (iord1, iord2 and iord3). Homogeneous and layered soil profiles are considered. The performance indicators are: (1) iteration count (iters), (2) overhead time (to) which covers formation of sparse coefficient matrix and construction of the preconditioner, (3) iteration time (ti), and (4) 14

total runtime for one time step (tt). Regardless of the mesh size and soil profile, it can be seen that the natural ordering scheme always lead to less iteration count and less total runtime. Hence, we only implement MSSOR for the natural ordering scheme in the numerical examples discussed below.

It is interesting to also investigate how the choice of  and  would affect the asymptotic convergence rate of the MSSOR preconditioned matrix. We must emphasize that our intention here is not to search for the optimal pair of parameters but to use the example problems to investigate the benefit of taking Dˆ to be the GJ preconditioner with  different from 4 recommended in Reference [27].

For

MSSOR(   1.0 ) preconditioned SQMR solver, the effect of varying  over the range from −1 to −80 is shown in Table 3 for the 7×7×7 and 16×16×16 meshed problems over different soil conditions. It is clear that the optimal  value is problem-dependent. It appears that  should be larger for a larger mesh. The iteration count under   4 is at most 1.5 times the iteration count under the optimal α in the problems studied.

Next, we investigate the effect of varying the relaxation parameter  in the MSSOR preconditioner for   50 . The results for  ranging from 1.0 to 1.8 for the 7×7×7 and 16×16×16 meshed problems under different soil conditions are shown in Table 4. It is obvious that there exists optimal values of  for different soil conditions, and appropriate choices of  may lead to smaller iteration counts and shorter total runtimes. However, as we have mentioned previously, the optimal value of  is expensive to determine and usually can only be obtained through numerical experiments. It seems that the performance of MSSOR preconditioned SQMR method is not very sensitive to  compared with the effect of varying  . Based on the numerical results provided in Table 4, a reasonably good choice for  is located in the interval [1.2, 1.4].

15

Table 4 also gives the performance of standard SSOR preconditioned SQMR method, it is clear that for soil profile 1 and soil profile 3, standard SSOR preconditioner may breakdown (more exactly near-breakdown). The breakdown is likely to be the consequence of unstable triangular solves produced by the irregular distribution of small negative entries in the leading diagonal under natural ordering. To avoid this breakdown, block ordering strategy can be used; moreover, block ordering can significantly improve the performance of standard SSOR preconditioner for soil profile 2, but the performance of block ordering for SSOR preconditioned SQMR method is still inferior to that achieved by using the MSSOR preconditioned version.

Figure 4 shows the eigenvalue distributions of three MSSOR preconditioned matrices corresponding to the pair of parameters (   4,   1.0 ), (   20,   1.0 ) and (   20,   1.3 ), respectively. Again, these estimated convergence rates indicate that it is beneficial to use   1 . A larger  value has a compression effect on the imaginary part of the eigenvalues. By taking   1.0 , the interval containing the real part of the eigenvalues may be enlarged, but the eigenvalues with nonzero imaginary parts are moved further way from the origin. Observe that the eigenvalue distributions of the MSSOR preconditioned matrices have more compressed real parts compared to the GJ preconditioned counterparts (Fig. 3b).

The MSSOR

preconditioner contracts the range of the real part of eigenvalues by adopting the SSOR approximation to the (1,1) block. From the compactness of the eigenvalue distributions, it is quite clear that the MSSOR preconditioned matrices would have smaller convergence rates than the GJ preconditioned ones.

3.4 Performance of MSSOR versus GJ and Pc Tables 5 and 6 give the numerical results of the SQMR method preconditioned by GJ, Pc and MSSOR in homogeneous soft clay and dense sand, respectively. Numerical experiments to compare the performance of these preconditioners in the layered soil profile are also performed, and the results are tabulated in Table 7. 16

The Pc

preconditioner is implemented with and without the Eisenstat trick. It is clear that the Eisenstat trick works in principle on block SSOR preconditioners, but has little practical impact on the runtime. In contrast, the simple pointwise variant [Equation (9)] is able to exploit the Eisenstat trick intensively in terms of runtime.

Figure 5 (a) displays the convergence histories of SQMR solver preconditioned by GJ, Pc and MSSOR preconditioners, respectively for the 7×7×7 meshed footing problem on homogeneous soil profiles. It can be seen that the MSSOR preconditioner leads to faster convergence rate than GJ and Pc. The convergence histories of these preconditioned methods for the layered soil profile problem are shown in Fig. 5b. The iteration counts are larger, but the MSSOR preconditioner still out-performs the others.

Figure 6 shows the iteration counts versus DOFs. The basic trend for all the 3 preconditioned SQMR methods is that the iteration counts increase sub-linearly, but each with a different growth rate with GJ preconditioned method being the fastest and MSSOR preconditioned method being the slowest. In fact, for the harder layered soil problem, the iteration count for the MSSOR preconditioned method only increases 2.1 times when the DOFs increases 15 times from 7160 to 107180 (corresponding to the 8×8×8 and 20×20×20 problems). Figure 7 compares the cost of each iteration of the various preconditioned methods.

MSSOR is only marginally more expensive than

GJ. This result is very significant because the cost per iteration in GJ is almost minimal (it is just a simple diagonal preconditioner). It is obvious that the Eisenstat trick is very effective when exploited in the right way.

4.

Conclusions

The main contribution in this work is to propose a cheap modified SSOR preconditioner (MSSOR). The modification is carefully selected to exploit the Eisenstat trick intensively in the terms of runtime.

Specifically, each MSSOR

preconditioned SQMR step is only marginally more expensive than a single 17

matrix-vector product step. This is close to the minimum achievable from simple Jacobi type preconditioners, where the preconditioner application step is almost cost free.

For the diagonal matrix forming part of the MSSOR preconditioner, we adopt

the GJ preconditioner to overcome the numerical difficulties encountered by the standard SSOR preconditioner for matrices having very small (or even zero) diagonal entries. These matrices occur in many geomechanics applications involving solution of large-scale symmetric indefinite linear systems arising from FE discretized Biot’s consolidation equations. We analyzed the eigenvalue distributions of our MSSOR preconditioned matrices empirically and concluded that the associated asymptotic convergence rates to be better than those of GJ or Pc preconditioned matrices. Numerical experiments based on several footing problems including the more difficult layered soil profile problem show that our MSSOR preconditioner not only perform (in terms of total runtime) far better than the standard SSOR preconditioner, but it is also much better than some of the effective preconditioners such as GJ and Pc published very recently. Moreover, with increasing problem size, the performance of our MSSOR preconditioner against GJ or Pc is expected to get better. Another major advantage is that with a better approximation to the (1,1) K block, iteration count increases more moderately from homogeneous to layered soil profiles.

Nevertheless, it is acknowledged that

improvements are possible and should be further studied because one should ideally construct a preconditioner that is not affected by the type of soil profile or mesh size. It is worth noting that the proposed MSSOR preconditioners can be generalized readily to nonsymmetric cases or problems with a zero (2, 2) block.

APPENDIX A Algorithm 1: The symmetric QMR algorithm [12] for the symmetric indefinite system Ax = b, using the modified SSOR preconditioner. 0) Start: choose an initial guess x0, and z0 = 0, then set ~ ~ s0  ( L  D ) 1 (b  Ax0 ) , v0  s0 , w0  Dv0 ,  0  s0T s0 ,  0  s0T w0 ,

d 0  0  n and 0  0   .

18

For k = 1 to maxit Do: 1) Compute

 ~ t k 1  A v k 1 ,  k 1  wkT1tk 1  k 1  k 1 and sk  sk 1   k 1tk 1  k 1

2) Compute

k 

skT sk

 k 1

, ck 

1 ,  k   k 1k ck and d k  ckk 1d k 1  ck k 1vk 1 1  k

3) Set zk  zk 1  d k 4) Check convergence ~ ~ ~ rk  ( L  D ) sk . If converged, then set x k  x0  ( LT  D ) 1 Dz k , and stop. 5) Compute ~

 k  skT Dsk ,  k 

k ~ , vk  sk   k vk 1 and wk  Dv k  k 1

End Algorithm 2: Forward substitution (1  N) using the CSC storage (icsc, jcsc, csca) of the upper triangular part of A. The modified diagonal is stored in a vector {ãjj}. ~ Compute x  ( L  D) 1 y as follows:

(1) (2) (3) (4) (5) (6) (7) (8)

x(1) = y(1)/ã11 For j = 2 : N k1 = jcsc(j); k2 = jcsc(j+1) – 1; For i = k1 : k2 – 1 y(j) = y(j) – csca(i)* x(icsc(i)) End x(j) = y(j)/ ãjj End

Algorithm 3: Backward substitution (N  1) using the CSC storage (icsc, jcsc, csca) of the upper triangular part of A. The modified diagonal is stored in a vector {ãjj}. ~ Computer x  ( LT  D) 1 y as follows:

(1) (2) (3) (4) (5) (6) (7)

For j = N : 2 x(j) = y(j)/ ãjj k1 = jcsc(j); k2 = jcsc(j+1) – 2; For i = k1 : k2 y(icsc(i)) = y(icsc(i)) – csca(i)* x(j) End End

19

(8)

x(1) = y(1)/ ã11

~ Algorithm 4: Compute r  ( L  D) s given the CSC storage (icsc, jcsc, csca) of the upper triangular of A, and the modified diagonal vector { ãjj }. (1) r = zeros(N, 1) (2) r(1) = r(1)+ ã11*s(1) (3) For j = 2 : N (4) k1 = jcsc(j); k2 = jcsc(j+1) – 1; (5) For k = k1: k2-1 (6) r(j) = r(j)+ csca(k) * s(icsc(k)) (7) End (8) r(j) = r(j)+ ãjj *s(j) (9) End

References

[1]. Britto AM and Gunn MJ. Critical State Soil Mechanics via Finite Elements. Ellis Horwood Ltd.: Chichester, West Sussex, 1952.

[2]. Phoon KK, Toh KC, Chan SH and Lee FH. An Efficient Diagonal Preconditioner for Finite Element Solution of Biot’s Consolidation Equations. International Journal for Numerical Methods in Engineering 2002; 55(4), pp. 377-400.

[3]. Saad Y. Iterative methods for sparse linear systems. PWS Publishing Company, Boston, 1996. [4]. Paige CC, Saunders MA. Solution of Sparse indefinite systems of linear equations. SIAM Journal on Numerical Analysis 1975; 12:617629. [5]. Freund RW, Nachtigal NM. A new Krylov-subspace method for symmetric indefinite linear system. In Proceedings of the 14th IMACS World Congress on Computational and Applied Mathematics, Ames WF, (ed.). IMACS: New Brunswick, NJ, 1994; 12531256. [6]. Chan SH, Phoon KK and Lee FH. A modified Jacobi preconditioner for solving ill-conditioned Biot’s consolidation equations using symmetric quasi-minimal residual method. International Journal for Numerical and Analytical Methods in Geomechanics 2001; 25:10011025.

20

[7]. Toh KC, Phoon KK, Chan SH. Block preconditioners for symmetric indefinite linear systems. International Journal for Numerical Methods in Engineering, 2004; 60: 1361-1381. [8]. Phoon KK, Toh KC, Chen X. Block constrained versus generalized Jacobi preconditioners for iterative solution of large-scale Biot’s FEM equations. Computers & Structures, 2004; 82: 24012411. [9]. Eisenstat S.C. Efficient implementation of a class of preconditioned conjugate gradient methods, SIAM Journal on Scientific and Statistical Computing, 1981; 2: 1–4.

[10]. Mroueh H, Shahrour I. Use of sparse iterative methods for the resolution of three-dimensional soil/structure interaction problems. International Journal for Numerical and Analytical Methods in Geomechanics, 1999; 23: 19611975. [11]. Toh KC, Phoon KK, Further observations on generalized Jacobi preconditioning for iterative solution of Biot’s FEM equations, working paper, Department of Mathematics, National University of Singapore, 2005. [12]. Freund RW, Jarre F. A QMR-based interior-point algorithm for solving linear programs. Mathematical Programming, 1997; 76: 183210. [13]. Chan TF, Van der Vorst HA. Approximate and incomplete factorizations. Technical Report 871, Department of Mathematics, University of Utrecht, 1994. [14]. Chow E, Heroux MA. An object-oriented framework for block preconditioning. ACM Transactions on Mathematical Software (TOMS), 1998; 24: 159183. [15]. Axelsson O. Iterative solution methods, Cambridge University Press, New York, NY, 1995. [16]. Barrett R, Berry M, Chan TF, Demmel J, Donato J, Dongarra J, Eijkhout V, Pozo R, Romine C, Van der Vorst HA. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. Philadelphia: SIAM Press, 1994. [17]. Duff IS, Meurant GA. The effect of ordering on preconditioned conjugate gradients. BIT, 1989; 29: 635657.

21

[18]. Eijkhout V. On the existence problem of incomplete factorisation methods, Lapack Working Note 144, UT-CS-99-435, 1999. [19]. Chow E, Saad Y. Experimental study of ILU preconditioners for indefinite matrices, Journal of Computational and Applied Mathematics, 1997; 86: 387414. [20]. Pommerell C, Fichtner W. PILS: an iterative linear solver package for ill-conditioned systems, Proceedings of the 1991 ACM/IEEE conference on Supercomputing, Albuquerque, New Mexico, United States, 1991; 588-599. [21]. Murphy MF, Golub GH, Wathen AJ. A note on preconditioning for indefinite linear systems, SIAM Journal of Scientific Computing, 2000; 21: 1969–1972. [22]. Axelsson O. A generalized SSOR method, BIT, 1972; 13: 443467. [23]. Meijerink JA, Van der Vorst HA. An iterative solution method for linear systems of which the coefficient matrix is a symmetric M-matrix, Mathematics of Computation, 1977; 31: 148162. [24]. Gustafsson I. A class of first order factorization methods, BIT, 1978; 18: 142156. [25]. Gambolati G, Pini G and Ferronato M. Numerical performance of projection methods in finite element consolidation models. International Journal for Numerical and Analytical Methods in Geomechanics, 2001; 25: 14291447. [26]. Van der Vorst HA. Iterative methods for large linear systems. Cambridge University Press, Cambridge, 2003. [27]. Toh KC. Solving large scale semidefinite programs via an iterative solver on the augmented systems. SIAM Journal on Optimization, 2003; 14:670698.

22

Figure 1. 7×7×7 finite element mesh and correspondent sparsity pattern of coefficient matrix of Biot’s linear system for simple footing problem.

Figure 2.

20×20×20 finite element mesh and soil layer profile of a quadrant symmetric shallow foundation problem.

23

1.0

imag(eig)

0.5 0.0 |real()|min=8.546 |real()|max=5.327

-0.5 -1.0 0.0 (a)

0.5

1.0

1.5

2.0

2.5

3.0

3.5

4.0

4.5

5.0

5.5

0.3

imag(eig)

0.2 0.1 0.0 -0.1

|real()|min=2.933 |real()|max=5.344

-0.2 -0.3 0.0

(b)

0.5

1.0

1.5

2.0

2.5

3.0

3.5

4.0

4.5

5.0

5.5

0.3

imag(eig)

0.2 0.1 0.0 -0.1

|real()|min=8.541 |real()|max=5.141

-0.2 -0.3 0.0 (c)

0.5

1.0

1.5

2.0

2.5 3.0 real(eig)

3.5

4.0

4.5

5.0

5.5

Figure 3. Eigenvalue distributions in complex plane (a) of GJ (   4 ) preconditioned matrix; (b) of GJ (   20 ) preconditioned matrix; (c) of Pc preconditioned matrix, for 7×7×7 FE mesh with soil profile 1.

24

0.3

imag(eig)

0.2 0.1 0.0 -0.1

|real()|min=1.464 |real()|max=1.0

-0.2 -0.3 0.0 (a)

0.5

1.0

1.5

0.3

imag(eig)

0.2 0.1 0.0 -0.1

|real()|min=2.912 |real()|max=1.0

-0.2 -0.3 0.0

(b)

0.5

1.0

1.5

0.3

imag(eig)

0.2 0.1 0.0 -0.1

|real()|min=3.484 |real()|max=1.429

-0.2 -0.3 0.0 (c)

0.5

1.0

1.5

real(eig)

Figure 4. Eigenvalue distributions in complex plane of MSSOR preconditioned matrices (a) MSSOR (   4 ,  = 1.0); (b) MSSOR (   20 ,  = 1.0); (c) MSSOR (   20 ,  = 1.3).

25

Relative residual norm

1E+003 1E+002 1E+001 1E+000 1E-001 1E-002 1E-003 1E-004 1E-005 1E-006 1E-007

Relative residual norm

(a)

Pc

0

1E+003 1E+002 1E+001 1E+000 1E-001 1E-002 1E-003 1E-004 1E-005 1E-006 1E-007 (b)

GJ

SSOR

MSSOR 100

200

MSSOR

0

100

300

400 500 Iteration count

Pc

200

300

600

700

800

600

700

800

GJ

400 500 Iteration count

Figure 5. Convergence history of SQMR method preconditioned by GJ (   4 ), Pc, SSOR (  = 1.0) and MSSOR (   4 ,  = 1.0), respectively, for 777 finite element mesh (a) with soil profile 1 (solid line) and with soil profile 2 (dashed line), respectively; (b) with soil profile 3.

26

4500

4500 GJ-SP 1 GJ-SP 2 Pc-SP 1 Pc-SP 2 MSSOR-SP 1 MSSOR-SP 2

4000 3500

Iteration count

3000

3500 3000

2500

2500

2000

2000

1500

1500

1000

1000

500

500

0

0 0

(a)

GJ-SP 3 Pc-SP 3 MSSOR-SP 3

4000

2

4

6 DOFs

8

10

12 

0 (b)

2

4

6 DOFs

8

10

12 

Figure 6. Iteration count versus DOFs for SQMR method preconditioned by GJ (   4 ), Pc and MSSOR (   4 ,  = 1.0), respectively for three soil profile (SP) cases.

27

1.5

1.5

1.4

1.4

Soil Profile 1 & 2

1.3

1.3

1.2

1.2

Time per iteration (s)

1.1

1.1

Pc

1.0

Pc

1.0

0.9

0.9

MSSOR

0.8

0.7

0.6

0.6

0.5

MSSOR

0.8

0.7

0.5

GJ

0.4

0.4

0.3

0.3

0.2

0.2

0.1

0.1

0.0

GJ

0.0 0

(a)

Soil profile 3

2

4

6 DOFs

8

10

12 

0 (b)

2

4

6 DOFs

8

10

12 

Figure 7. Time per iteration versus DOFs for SQMR method preconditioned by GJ (   4 ), Pc and MSSOR (   4 ,  = 1.0) for three soil profile cases.

28

Table 1. 3-D finite element meshes Mesh size 7×7×7 Number of elements (ne) 343 Number of nodes 1856 DOFs Displacement (m) 4396 Pore pressure (n) 448 Total (N = m + n) 4844 No. non-zeros (nnz) nnz(triu(K)) 289386 nnz (B) 67817 nnz( triu©) 9196

8×8×8 512 2673

12×12×12 1728 8281

16×16×16 4096 18785

20×20×20 8000 35721

6512 648 7160

21576 2028 23604

50656 4624 55280

98360 8820 107180

443290 1606475 103965 375140 7199 24287

3920747 907608 57535

7836115 1812664 112319

nnz (Sˆ )

33524

51714

187974

461834

921294

nnz(A)/(N 2 ) (%)

3.10

2.15

0.72

0.32

0.17

29

Table 2. Effect of ordering on MSSOR(  =1,  = 4) preconditioned SQMR method. Mesh

Soil Profile 1

202020

161616

121212

888

size

Soil Profile 2

Soil Profile 3

iord1

iord2

iord3

iord1

iord2

iord3

iord1

iord2

iord3

iters

100

110

105

95

105

105

270

300

310

to(s)

10.9

11.0

11.2

10.8

11.1

11.3

11.1

11.3

11.5

ti(s)

4.5

5.0

4.8

4.3

4.8

4.8

12.2

13.4

13.8

tt(s)

15.5

16.2

16.1

15.3

16.0

16.2

23.8

25.4

25.9

iters

160

175

165

155

165

170

470

520

500

to(s)

37.0

38.0

38.7

37.0

38.0

38.6

37.5

38.4

39.1

ti(s)

25.7

28.1

26.4

24.8

26.4

27.1

74.7

82.7

79.1

tt(s)

63.0

66.3

65.3

62.1

64.7

65.9

114.2

123.1

120.2

iters

225

270

265

220

250

240

725

790

740

to(s)

88.3

90.7

92.5

88.3

90.7

92.5

89.4

91.9

93.6

ti(s)

87.6

105.5

103.5

85.5

97.6

93.8

280.7

307.5

288.0

tt(s)

176.2

196.6

196.3

174.1

188.6

186.7

374.5

403.7

386.0

iters

330

365

360

290

315

330

965

1045

1170

to(s)

173.7

178.8

182.5

173.6

178.7

182.4

177.8

181.4

185.0

ti(s)

258.0

286.2

283.1

226.6

247.4

259.6

755.9

814.0

914.1

tt(s)

432.3

465.6

466.1

400.9

426.7

442.6

942.2

1003.9

1107.6

Table 3. Effect of α on iterative count of MSSOR(  =1.0) preconditioned SQMR method 

1

4

Soil profile 1 Soil profile 2 Soil profile 3

70 65 245

65 65 175

Soil profile 1 Soil profile 2 Soil profile 3

405 215 1540

225 220 725

10

20

30

40

Mesh size: 777 90 120 145 145 65 85 95 100 150 140 155 165 Mesh size: 161616 215 205 200 210 200 180 165 155 600 550 505 500

30

50

60

70

It  4 It min

165 100 180

165 110 180

165 115 210

1.00 1.00 1.25

210 155 490

225 165 485

235 170 495

1.13 1.42 1.49

Table 4. Effect of  on iterative count of standard SSOR and preconditioned SQMR methods, respectively. (maxit = 5000)  1.0 1.1 1.2 1.3 1.4 1.5 Mesh size: 777 Soil SSOR * * * * * * profile 1 MSSOR 165 155 145 155 160 165 Soil SSOR 745 1830     profile 2 MSSOR 100 105 110 110 115 120 Soil SSOR * * * * * * profile 3 MSSOR 180 175 185 190 195 220 Mesh size: 161616 Soil SSOR * * * * * * profile 1 MSSOR 210 205 185 190 190 210 Soil SSOR 380 480 720 3600   profile 2 MSSOR 155 155 145 140 155 165 Soil SSOR * * * * * * profile 3 MSSOR 490 460 445 420 415 415 (*) means ‘Breakdown’; () means fail to converge.

31

MSSOR(  = 50) 1.6

1.7

1.8

* 180  135 * 240

* 205  140 * 270

* 245  180 * 285

* 220  175 * 440

* 260  200 * 485

* 335  240 * 570

Table 5. Performance of several preconditioners over different mesh sizes for soil profile 1 with homogeneous soft clay, E’ = 1 MPa, v’ = 0.3, k = 10-9 m/s. Mesh Size

8×8×8

RAM(MB)

26.5

Iteration count Overhead (s) Iteration time (s) Total runtime (s)

378 11.1 14.0 25.7

RAM(MB)

27.5 220 11.4 11.4 23.4

Iteration count Overhead (s) Iteration time (s) Total runtime (s) RAM(MB) Iteration count Overhead (s) Iteration time (s) Total runtime (s) RAM(MB) Iteration count Overhead (s) Iteration time (s) Total runtime (s) RAM(MB) Iteration count Overhead (s) Iteration time (s) Total runtime (s)

12×12×12 GJ (  = 4) 84 654 37.8 87.5 127.3 Pc

84.5 333 41.0 66.9 109.8 Pc with Eisenstat trick

27.5 84.5 220 335 11.1 39.5 11.0 65.6 22.7 107.0 MSSOR (  =1.0,  = 4) 26.5 84.5 100 160 10.9 37.0 4.5 25.7 15.5 63.0 MSSOR (  =1.3,  = 50) 26.5 84.5 205 185 10.9 37.1 9.2 29.6 20.1 66.9

32

16×16×16

20×20×20

194.5 1062 90.1 347.5 442.0

386 1448 177.2 951.2 1136.9

196 444 106.7 232.9 343.9

387 554 247.5 636.8 892.8

196 445 102.3 227.3 334.0

387 560 238.1 629.5 876.1

195.0 225 88.3 87.6 176.2

387 330 173.7 258.0 432.3

195.0 190 88.3 73.8 162.5

387 215 173.6 168.3 342.5

Table 6. Performance of several preconditioners over different mesh sizes for soil profile 2 with homogeneous dense sand, E’ = 100 MPa, v’ = 0.3, k = 10-5 m/s. Mesh Size

8×8×8

RAM(MB)

26.5

Iteration count Overhead (s) Iteration time (s) Total runtime (s)

346 10.9 12.7 24.2

RAM (MB) Iteration count Overhead (s) Iteration time (s) Total runtime (s)

27.5 215 11.4 11.1 23.1

RAM (MB) Iteration count Overhead (s) Iteration time (s) Total runtime (s)

27.5 84.5 215 325 11.0 39.5 10.8 63.5 22.4 105.0 MSSOR (  =1.0,  = 4) 26.5 84.0 95 155 10.8 37.0 4.3 24.8 15.3 62.1 MSSOR (  =1.3,  = 50) 26.5 84.0 115 115 10.9 37.1 5.2 18.4 16.2 55.8

RAM (MB) Iteration count Overhead (s) Iteration time (s) Total runtime (s) RAM (MB) Iteration count Overhead (s) Iteration time (s) Total runtime (s)

12×12×12 GJ (  = 4) 84.0 578 37.3 76.3 115.5 Pc

84.5 322 40.9 64.4 107.3 Pc with Eisenstat trick

33

16×16×16

20×20×20

194.5

386

866 89.0 278.7 371.9

1292 174.9 837.4 1020.8

196 432 106.7 226.5 337.5

387 540 247.1 620.8 876.3

196 435 102.3 222.5 329.1

387 540 237.5 606.6 852.7

194.5 220 88.3 85.5 174.1

386 290 173.6 226.6 400.9

194.5 140 88.3 54.5 143.2

386 185 173.6 144.7 319.0

Table 7. Performance of several preconditioners over different mesh sizes for soil profile 3 with, E = 100 MPa, k = 10-5 m/s; Material 2: E = 1 MPa, k = 10-9 m/s. Mesh Size

8×8×8

RAM(MB)

26.5

Iteration count Overhead (s) Iteration time (s) Total runtime (s)

1143 10.9 41.6 53.0

RAM(MB)

27.5 572 11.4 29.5 41.6

Iteration count Overhead (s) Iteration time (s) Total runtime (s) RAM(MB) Iteration count Overhead (s) Iteration time (s) Total runtime (s) RAM(MB) Iteration count Overhead (s) Iteration time (s) Total runtime (s) RAM(MB) Iteration count Overhead (s) Iteration time (s) Total runtime (s)

12×12×12 GJ (  = 4) 84.0 2023 37.0 266.2 305.0 Pc

84.5 883 41.1 176.7 219.7 Pc with Eisenstat trick

27.5 84.5 575 880 11.0 39.2 28.8 172.1 40.4 213.2 MSSOR (  =1.0,  = 4) 26.5 84.0 270 470 11.1 37.5 12.2 74.7 23.8 114.2 MSSOR (  =1.3,  = 50) 26.5 84.0 240 330 10.9 37.3 10.6 52.4 22.2 91.7

34

16×16×16

20×20×20

194.5

386

2994 88.6 961.3 1054.2

4318 174.2 2779.5 2962.1

196 1186 107.1 620.6 732.0

387 1477 247.2 1696.3 1952.0

196 1190 101.8 607.9 714.1

387 1480 236.8 1662.0 1907.2

194.5 725 89.4 280.7 374.5

386 965 177.8 755.9 942.2

194.5 420 88.9 162.3 255.5

386 515 174.9 397.8 581.2

1 A Modified SSOR Preconditioner for Sparse ...

left-right preconditioning framework for SQMR. Given a .... indefinite linear system with a zero diagonal sub-block, direct application of (5) is impossible. ..... desktop PC with a physical memory of 1 GB, and the programs are written in .... numerical results provided in Table 4, a reasonably good choice for ω is located in.

3MB Sizes 3 Downloads 171 Views

Recommend Documents

A modified training scheme for SOFM to cluster ...
the University of Mysore and Master's in Electrical Engineering at Indian Institute of Science. He obtained his PhD Degree from Indian Institute of Science in the area of constructive learning RBF networks. He is the chairman of Information Science a

A Convex Hull Approach to Sparse Representations for ...
noise. A good classification model is one that best represents ...... Approximate Bayesian Compressed Sensing,” Human Language Tech- nologies, IBM, Tech.

A New Push-Relabel Algorithm for Sparse Networks 1 ...
Jul 4, 2014 - computer science and operations research, as well as practical ... algorithm for bounded-degree networks; that is, when there is some .... A distance labeling is a function d : V → Z≥0 that associates each vertex with a positive.

A New Estimate of Restricted Isometry Constants for Sparse Solutions
Jan 12, 2011 - q < hTc. 0 q. (11) for all nonzero vector h in the null space of Φ. It is called the null ... find two solutions hT0 and −hTc. 0 ... First of all, we have. 6 ...

A Convex Hull Approach to Sparse Representations for ...
1IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA. ... data or representations derived from the training data, such as prototypes or ...

pdf-2351\mitam-a-modified-ict-adoption-model-for-developing ...
... loading more pages. Retrying... pdf-2351\mitam-a-modified-ict-adoption-model-for-dev ... ing-in-a-developing-country-by-mohamed-elsaadani.pdf.

A modified approach for aggregation technique in WSN - IJRIT
IJRIT International Journal of Research in Information Technology, Volume 3, Issue 7, ... Wireless sensor network are used in many application such as military, ...

A modified approach for aggregation technique in WSN - IJRIT
In computer science and telecommunications, wireless sensor networks are an ... and conferences arranged each year. ... Figure 1: Types of grids (a) Triangular lattice (b) Square grid (c) Hexagonal grid ... Degree-degree Correlations”, in Proc.

A Modified Complex Permittivity Measurement ...
Frequency-domain, MathCADTM, Microwave ... [100]. Some of the microwave measurement techniques are simple and .... using the microwave free-space.

A modified method for detecting incipient bifurcations in ...
Nov 6, 2006 - the calculation of explicit analytical solutions and their bifurcations ..... it towards c using polynomial regression and call it the DFA- propagator .... Abshagen, J., and A. Timmermann (2004), An organizing center for ther- mohaline

A WIDEBAND DOUBLY-SPARSE APPROACH ... - Infoscience - EPFL
a convolutive mixture of sources, exploiting the time-domain spar- sity of the mixing filters and the sparsity of the sources in the time- frequency (TF) domain.

A greedy algorithm for sparse recovery using precise ...
The problem of recovering ,however, is NP-hard since it requires searching ..... The top K absolute values of this vector is same as minimizing the .... In this section, we investigate the procedure we presented in Algorithm 1 on synthetic data.

A Sparse Structured Shrinkage Estimator for ...
Jan 11, 2011 - on model selection consistency and estimation bounds are derived. ..... The gradient and Jacobian of the objective function in (7) are, respectively,. Grg ...... the SSS procedure can indeed recover the motifs related to the cell ...

A fast convex conjugated algorithm for sparse recovery
of l1 minimization and run very fast on small dataset, they are still computationally expensive for large-scale ... quadratic constraint problem and make use of alternate minimiza- tion to solve it. At each iteration, we compute the ..... Windows XP

A Sparse Structured Shrinkage Estimator for ...
Jan 11, 2011 - for high-dimensional nonparametric varying-coefficient models and ... University of Pennsylvania School of Medicine, Blockley Hall, 423 .... the Appendix, available as online supplemental materials. 3 ...... Discovery and Genome-Wide E

A New Estimate of Restricted Isometry Constants for Sparse Solutions
Jan 12, 2011 - where ˜x1 is the standard l1 norm of vector ˜x. Suppose that x∗. 0 = k. Let T0 ⊂ {1,2,ททท ,n} be the subset of indices for the k largest entries of ...

Sparse Representations for Text Categorization
statistical algorithm to model the data and the type of feature selection algorithm used ... This is the first piece of work that explores the use of these SRs for text ...

A WIDEBAND DOUBLY-SPARSE APPROACH ... - Infoscience - EPFL
Page 1 .... build the matrices BΩj such that BΩj · a(j) ≈ 0, by selecting rows of Bnb indexed ... To build the narrowband CR [6] we first performed a STFT and then.

f) l / MODIFIED
(56). References Clted an audio signal provided by the microphone sound energy .... tions in alternative forms, speci?c embodiments thereof have been shoWn ...

Maharashtra Civil Services (conduct) Rules 1979 (Modified upto 1-3 ...
Maharashtra Civil Services (conduct) Rules 1979 (Modified upto 1-3-2014).pdf. Maharashtra Civil Services (conduct) Rules 1979 (Modified upto 1-3-2014).pdf.

BAYESIAN PURSUIT ALGORITHM FOR SPARSE ...
We show that using the Bayesian Hypothesis testing to de- termine the active ... suggested algorithm has the best performance among the al- gorithms which are ...

Deformation techniques for sparse systems
Deformation methods for computing all solutions of a given zero-dimensional ...... Denote by IK the ideal in K[X1,...,Xn] which is the extension of the ideal I :=.