Reliability of global sensitivity indices

Chonggang Xu (1), George Z. Gertner (2)

1 - Department of Entomology and Center for Quantitative Sciences in Biomedicine, North Carolina State University
2 - Department of Natural Resources and Environmental Sciences, University of Illinois at Urbana-Champaign

In Press. doi:10.1080/00949655.2010.509317
Abstract: Uncertainty and sensitivity analysis is an essential ingredient of model development and applications. For many uncertainty and sensitivity analysis techniques, sensitivity indices are calculated based on a relatively large sample to measure the importance of parameters in their contributions to uncertainties in model outputs. To statistically compare their importance, it is necessary that uncertainty and sensitivity analysis techniques provide standard errors of estimated sensitivity indices. In this paper, a delta method is used to analytically approximate standard errors of estimated sensitivity indices for a popular sensitivity analysis method, the Fourier Amplitude Sensitivity Test (FAST). Standard errors estimated based on the delta method were compared to those estimated based on 20 sample replicates. We found that the delta method can provide a good approximation of the standard errors of both first-order and higher-order sensitivity indices. Finally, based on the standard error approximation, we also propose a method to determine the minimum sample size needed to achieve a desired estimation precision for a specified sensitivity index. The standard error estimation method presented in this paper can make FAST analysis computationally much more efficient for complex models.

Keywords: Fourier Amplitude Sensitivity Test, Sensitivity analysis, Uncertainty analysis, Standard error, Simple random sampling, Random balance design sampling
1 Introduction

Uncertainty and sensitivity analysis is an essential ingredient of numerical model development and applications [1]. It can help in model application by providing model users with information on the reliability of model predictions, and can help in model development by providing judging criteria for model identification, calibration and corroboration. Uncertainty and sensitivity analysis attempts to quantify the uncertainties in model outputs (generally resulting from uncertainties in model parameters) and the importance of parameters in their contributions to those uncertainties. In accordance with the tradition within the uncertainty and sensitivity analysis community, the parameters in this paper refer to the numerical model input variables (e.g., the daily survival rate describing mosquito survival in a mosquito population model), instead of the quantities describing a probability distribution as used in statistics (e.g., the mean and standard deviation of a normal distribution). Many uncertainty and sensitivity analysis techniques are now available [2,3]. A large group of them are variance-based methods, which attempt to decompose the variance of a model output y into partial variances contributed by individual model parameters. Their main concern is to calculate the total variance of a model output y [i.e., V(y)] and the conditional variance of the expected value of the model output y given a specific set of parameters [4]. The ratios of the conditional variances (also termed partial variances) to the total variance of the model output (referred to as sensitivity indices) are used to measure the importance of parameters in their contributions to the uncertainty in model outputs. Since variance-based sensitivity indices are generally calculated by incorporating uncertainties in all parameters simultaneously, they are also referred to as global sensitivity indices in the literature [3].
One of the most popular variance-based uncertainty and sensitivity analysis techniques is the Fourier Amplitude Sensitivity Test (FAST) [5,6,7,8]. For a review of other variance-based methods, please refer to Borgonovo [9]. FAST uses a periodic sampling approach to introduce signals for all parameters of interest and a Fourier transformation to decompose the variance of a model output into partial variances contributed by different model parameters. The ratios of the partial variances to the total variance (referred to as sensitivity indices) are used to measure the importance of the main effects of parameters (or the interaction effects among parameters) in their contributions to the variance of the model output. Different sampling approaches can be used to estimate the partial variances in FAST, including the traditional sampling by a search curve in the parameter space using an auxiliary variable [5,6,8], random balance design sampling [10], and simple random sampling [11]. Xu and Gertner [11] compared the performance of different sampling algorithms for FAST analysis. FAST was originally developed for models with independent parameters. In order to extend FAST to models with dependent parameters, Xu and Gertner [12,13] introduced a random reordering approach to account for rank correlations among parameters. Until now, FAST analysis has mainly been used to estimate the partial variances contributed by the main effects of parameters. For the traditional sampling based on the search curve, it has been heuristically shown or realized that FAST can be used to estimate partial variances contributed by interactions among parameters using linear combinations of the characteristic frequencies assigned to individual parameters [14]. However, there has been a lack of theoretical proof and understanding for the calculation of partial variances contributed by interactions among parameters. Xu and Gertner [11] theoretically showed that FAST analysis can be used to estimate partial variances contributed by both main effects and interaction effects of model parameters for different sampling approaches, including sampling based on the search curve, simple random sampling, and random balance design sampling.
Sensitivity indices, determined by the ratios of partial variances to the total variance of a model output in FAST, can be used to measure the importance of parameters in their contributions to uncertainty in the model output. However, due to the random errors introduced by sampling, it could be misleading to judge the relative importance of parameters based only on the estimated sensitivity indices, especially if the sensitivity indices of two parameters are close. In view of the random errors introduced by sampling, it is important that we know the reliability of estimated sensitivity indices. In statistics, the reliability of a statistic (i.e., a sensitivity index in this paper) is generally measured by its standard error. A common approach for standard error estimation is to conduct the FAST analysis using many (e.g., m) replicates of its periodic sample. For each replicate, a new periodic sample with the relatively large sample size N (N > 1000) required by FAST is drawn. We then need to run the model on each newly drawn periodic sample and calculate the sensitivity indices for the different model parameters. Finally, the standard error of an estimated sensitivity index can be calculated based on the standard deviation of the m estimated sensitivity indices. This method is effective and simple to implement for a simple model (which takes a very short time to run a single simulation). However, for a complex model which takes a long time (e.g., hours) to run a single simulation, the above replicated sample method for standard error estimation could be computationally expensive or infeasible.
Nowadays more complex models (especially spatial models) have been developed to simulate complicated systems. For example, many forest landscape models have been developed to predict potential forest landscape changes under future global warming (generally including inter-species competition, seed dispersal and seedling recruitment, tree growth, and fire disturbances or harvesting), which may take a number of hours to run a single simulation for a large study area (e.g., millions of hectares) [15,16,17]. For vector-borne diseases (e.g., dengue fever or malaria), population dynamics models of disease vectors (e.g., mosquitoes) and human movement are important for a better understanding of vector dynamics and more efficient disease control. In view of the importance of mosquito dispersal to population dynamics in heterogeneous spatial domains and to the success of vector control for dengue fever [18], new spatial models have been developed to account for mosquito dispersal [19,20]. Since such a spatial model needs to simulate biological development within water containers (e.g., weight growth, physiological development, pupation), effects of environmental factors (e.g., temperature and moisture), spatial dispersal of individual mosquito cohorts, and even gene flow for the release of genetically modified mosquitoes to reduce the virus-borne mosquito population, it can take the model several hours to run a single simulation for a single city with thousands of houses [20]. Uncertainty and sensitivity analysis is an important tool to help us understand the reliability of landscape or population dynamics predicted by those complex models. However, for complex models, the replicate-based method of standard error estimation for sensitivity indices could be difficult or infeasible given the large number (m × N) of model runs required.
Two potentially efficient approaches available to calculate standard errors of sensitivity indices are the bootstrap method and the delta method. For the bootstrap method, m replicates of samples drawn from the original periodic sample can be used to estimate the standard errors of sensitivity indices calculated by FAST. As such, running the model for each of the m replicates is not required with the bootstrap method. However, since the FAST analysis depends on periodic samples to form signals for parameters (see eq. (2)), the resampling can potentially destroy the periodic signal, which would lead to an unreliable estimate of the standard error. Therefore, in this paper, the delta method is used to analytically calculate the standard errors of sensitivity indices based on a single periodic sample of the size N required by FAST. Based on the derived standard error calculation formula, we also propose a method of determining the minimum sample size required to achieve a desired estimation precision for a specified sensitivity index.
The paper is organized as follows: Section 2.1 gives a general background on FAST analysis; Section 2.2 derives the standard errors of estimated sensitivity indices for simple random sampling; Section 2.3 derives the standard errors of estimated sensitivity indices for random balance design sampling; and Section 2.4 provides a summary of FAST analysis with standard error estimation by the delta method. Section 3 provides two test models. Section 4 presents their results. Section 5 discusses the potential use of the derived standard errors for the determination of a minimum sample size to achieve a desired estimation precision, and extends the proposed method to models with correlated parameters.
2 Method

2.1 Review of FAST

Consider a computer model $y = g(x_1, \dots, x_n)$, where $x_i$ ($i = 1, \dots, n$) is a model parameter with probability density function $f_i(x_i)$ and $n$ is the total number of parameters of interest. The parameter space is defined as

$K_x^n = \{x_1, \dots, x_n \mid x_i \sim f_i(x_i),\ i = 1, \dots, n\}$.   (1)

The traditional version of FAST assumes independence among model parameters. The main idea of FAST is to use a periodic sampling approach to introduce a signal for each parameter and then use a Fourier transformation to decompose the total variance of the model output (y) into partial variances contributed by different model parameters. In order to generate periodic samples for a specific parameter $x_i$, FAST uses a periodic search function (a function to search or explore the parameter space) as follows [11,14,21,22],

$x_i = G(\theta_i) = F_i^{-1}\!\left(\frac{1}{2} + \frac{1}{\pi}\arcsin(\sin(\theta_i))\right)$   (2)

where $\theta_i$ is a random variable uniformly distributed between 0 and $2\pi$, and $F_i^{-1}(\cdot)$ is the inverse cumulative distribution function of parameter $x_i$. For the sampled values of $\theta_i$, $G(\theta_i)$ is a function used to generate the corresponding samples for parameter $x_i$, which will follow the pre-specified probability density function $f_i(x_i)$. Using eq. (2), the parameter space can now be explored by samples in the $\theta$-space defined as follows,

$K_\theta^n = \{\theta_1, \dots, \theta_n \mid 0 \le \theta_i \le 2\pi,\ i = 1, \dots, n\}$.   (3)
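As a concrete illustration of eq. (2), the following Python sketch draws $\theta_i$ uniformly on $[0, 2\pi)$ and pushes it through the search function. The choice of a standard normal parameter, the sample size, and all function names here are illustrative assumptions, not part of the FAST specification.

```python
import numpy as np
from statistics import NormalDist

def search_function(theta, inv_cdf):
    """Search function of eq. (2): x = F^{-1}(1/2 + arcsin(sin(theta))/pi).
    theta uniform on [0, 2*pi) yields x distributed according to F."""
    u = 0.5 + np.arcsin(np.sin(theta)) / np.pi   # uniform on (0, 1)
    return np.array([inv_cdf(v) for v in u])

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=20000)
# Standard normal parameter: F^{-1} is the normal quantile function.
x = search_function(theta, NormalDist(mu=0.0, sigma=1.0).inv_cdf)
print(round(float(x.mean()), 2), round(float(x.std()), 2))
```

The sample mean and standard deviation should be close to 0 and 1, confirming that the angles are mapped onto the pre-specified distribution.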
A main concern of variance-based uncertainty and sensitivity analysis is to calculate the total variance of the model output y [i.e., V(y)] and the conditional variance of the expected value of y given a specific set of parameters [4]. The conditional variances are used to measure the importance of parameters in their contributions to uncertainty in the model output. For example, a relatively large conditional variance of the expected value of y given $x_i$ [i.e., $V(E(y \mid x_i))$] will indicate that a relatively high amount of uncertainty in the model output is contributed by this parameter $x_i$. Similarly, a relatively large conditional variance of the expected value of y given a specific set of parameters $x_{sub}$ [i.e., $V(E(y \mid x_{sub}))$] will indicate that a relatively large amount of uncertainty in the model output is contributed by this set of parameters. Using the Strong Law of Large Numbers, it can be shown that the conditional variance of the expected value of y in the parameter space can be calculated in the $\theta$-space (see Xu and Gertner [11] for details). Namely,

$V^{(x_i)} = V_x(E_x(y \mid x_i)) = V_\theta(E_\theta(y \mid \theta_i))$
$V^{(x_i, x_j)} = V_x(E_x(y \mid x_i, x_j)) = V_\theta(E_\theta(y \mid \theta_i, \theta_j)),\ i \ne j$
$V^{(x_i, x_j, x_k)} = V_x(E_x(y \mid x_i, x_j, x_k)) = V_\theta(E_\theta(y \mid \theta_i, \theta_j, \theta_k)),\ i \ne j \ne k$
$\dots$
$V^{(\{x_1, \dots, x_n\} \setminus x_i)} = V_x(E_x(y \mid \{x_1, \dots, x_n\} \setminus x_i)) = V_\theta(E_\theta(y \mid \{\theta_1, \dots, \theta_n\} \setminus \theta_i))$
$V^{(x_1, \dots, x_n)} = V_x(g(x_1, \dots, x_n)) = V_\theta(g(G(\theta_1), \dots, G(\theta_n)))$   (4)

where $V_x(\cdot)$ and $V_\theta(\cdot)$ are the conditional variances calculated in the parameter space and $\theta$-space, respectively; $E_x(\cdot)$ and $E_\theta(\cdot)$ are the expected values calculated in the parameter space and $\theta$-space, respectively; and $\{x_1, \dots, x_n\} \setminus x_i$ represents all parameters except $x_i$. For a subset $x_{sub}$ of all parameters $\{x_1, \dots, x_n\}$, $V^{(x_{sub})}$ represents the partial variance in the model output y due to the uncertainty in the subset parameters $x_{sub}$. $V^{(x_1, \dots, x_n)}$ represents the variance of the model output y resulting from uncertainties in all model parameters. Namely,

$V^{(x_1, \dots, x_n)} = V(y)$.   (5)
Following the decomposition of variance in analysis of variance (ANOVA), and assuming independence among parameters [4], we define

$V_{x_i} = V^{(x_i)}$
$V_{x_i x_j} = V^{(x_i, x_j)} - V_{x_i} - V_{x_j},\ i \ne j$
$\dots$
$V_{x_1, \dots, x_n} = V^{(x_1, \dots, x_n)} - \sum_{s=1}^{n-1} \sum_{1 \le i_1 < i_2 < \dots < i_s \le n} V_{x_{i_1} \dots x_{i_s}}$   (6)

as the partial variances of the model output contributed by the first-order (or main) effects, the second-order interaction effects, the third-order interaction effects, and so on, until the nth-order interaction effects of the model parameters. Summing all the left and right terms in eq. (6), we get the variance decomposition as follows,

$V(y) = \sum_{i=1}^{n} V_{x_i} + \sum_{i < j} V_{x_i x_j} + \sum_{i < j < k} V_{x_i x_j x_k} + \dots + V_{x_1, \dots, x_n}$   (7)

which suggests that the total variance resulting from parameter uncertainties can be decomposed into partial variances contributed by the first-order effects, the second-order interaction effects, the third-order interaction effects, and so on, until the nth-order interaction effects of parameters. Dividing both sides of eq. (7) by $V^{(x_1, \dots, x_n)}$, we get

$1 = \sum_{i=1}^{n} S_{x_i} + \sum_{i < j} S_{x_i x_j} + \sum_{i < j < k} S_{x_i x_j x_k} + \dots + S_{x_1, \dots, x_n}$   (8)

where

$S_{x_i} = \dfrac{V_{x_i}}{V^{(x_1, \dots, x_n)}}$, $S_{x_i x_j} = \dfrac{V_{x_i x_j}}{V^{(x_1, \dots, x_n)}}$, $\dots$, $S_{x_1 \dots x_n} = \dfrac{V_{x_1 \dots x_n}}{V^{(x_1, \dots, x_n)}}$   (9)

represent the first-order, second-order, and so on, until the nth-order sensitivity indices, respectively.
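The meaning of the conditional-variance ratios in eqs. (8)-(9) can be checked numerically. The brute-force sketch below uses a toy additive model of our own choosing (not from this paper) and estimates $V(E(y \mid x_i))$ by a double Monte Carlo loop; because the model has no interactions, the two first-order indices sum to one.

```python
import numpy as np

rng = np.random.default_rng(1)

def g(x1, x2):
    return x1 + 2.0 * x2   # additive toy model: no interaction effects

n_outer, n_inner = 2000, 2000
x1 = rng.standard_normal(n_outer)
x2 = rng.standard_normal(n_outer)

# E(y | x_i) for each fixed value, averaging over fresh draws of the other input
cond_mean_1 = np.array([g(v, rng.standard_normal(n_inner)).mean() for v in x1])
cond_mean_2 = np.array([g(rng.standard_normal(n_inner), v).mean() for v in x2])

vy = 1.0 + 4.0                 # analytic V(y) for this model
s1 = cond_mean_1.var() / vy    # first-order index of x1 (analytic value 0.2)
s2 = cond_mean_2.var() / vy    # first-order index of x2 (analytic value 0.8)
print(round(float(s1), 2), round(float(s2), 2))
```

This double-loop estimator is far too expensive for a real model; FAST replaces it with the single periodic sample described above.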
Using search functions of the form of eq. (2), the model output becomes a multiple periodic function of ($\theta_1, \dots, \theta_n$). Thus, we can apply a multiple Fourier transformation to the model $y = g(G(\theta_1), \dots, G(\theta_n))$ over all variables in the $\theta$-space. Namely, we have

$g(G(\theta_1), \dots, G(\theta_n)) = \sum_{r_1, \dots, r_n} C^{(\theta)}_{r_1 \dots r_n} e^{i(r_1 \theta_1 + \dots + r_n \theta_n)}$   (10)

where

$C^{(\theta)}_{r_1 \dots r_n} = \left(\dfrac{1}{2\pi}\right)^n \int_0^{2\pi} \dots \int_0^{2\pi} g(G(\theta_1), \dots, G(\theta_n)) e^{-i(r_1 \theta_1 + \dots + r_n \theta_n)} d\theta_1 \dots d\theta_n$.   (11)

It is notable that $C^{(\theta)}_{r_1 \dots r_n}$ is the expected value of $g(G(\theta_1), \dots, G(\theta_n)) e^{-i(r_1 \theta_1 + \dots + r_n \theta_n)}$, in view that $\theta_1, \dots, \theta_n$ are independently and uniformly distributed in the $\theta$-space. Namely,

$C^{(\theta)}_{r_1 \dots r_n} = E_\theta\left(g(G(\theta_1), \dots, G(\theta_n)) e^{-i(r_1 \theta_1 + \dots + r_n \theta_n)}\right)$   (12)

which can be estimated based on N samples of $\theta_1, \dots, \theta_n$ as follows,

$\hat{C}^{(\theta)}_{r_1 \dots r_n} = \dfrac{1}{N} \sum_{j=1}^{N} g(G(\theta_1^{(j)}), \dots, G(\theta_n^{(j)})) e^{-i(r_1 \theta_1^{(j)} + \dots + r_n \theta_n^{(j)})}$   (13)

where $\theta_i^{(j)}$ represents the jth sample for $\theta_i$. For notational convenience, we define the cosine Fourier coefficient $a_{r_1 \dots r_n}$ and sine Fourier coefficient $b_{r_1 \dots r_n}$ as follows,

$a_{r_1 \dots r_n} = E_\theta[g(G(\theta_1), \dots, G(\theta_n)) \cos(r_1 \theta_1 + \dots + r_n \theta_n)]$,
$b_{r_1 \dots r_n} = E_\theta[g(G(\theta_1), \dots, G(\theta_n)) \sin(r_1 \theta_1 + \dots + r_n \theta_n)]$,   (14)

with

$C^{(\theta)}_{r_1 \dots r_n} = a_{r_1 \dots r_n} - i b_{r_1 \dots r_n}$.   (15)

Based on eq. (14), the cosine Fourier coefficient $a_r$ and sine Fourier coefficient $b_r$ can be estimated as follows,

$\hat{a}_r = \dfrac{1}{N} \sum_{j=1}^{N} g(\theta_1^{(j)}, \dots, \theta_n^{(j)}) \cos(r_1 \theta_1^{(j)} + \dots + r_n \theta_n^{(j)})$,
$\hat{b}_r = \dfrac{1}{N} \sum_{j=1}^{N} g(\theta_1^{(j)}, \dots, \theta_n^{(j)}) \sin(r_1 \theta_1^{(j)} + \dots + r_n \theta_n^{(j)})$,   (16)

where $\theta_i^{(j)}$ represents the jth sample for $\theta_i$. Using the multiple Fourier transformation, the partial variances in eq. (6) can be estimated by the sums of the Fourier amplitudes (i.e., $|C_{r_1 \dots r_n}|^2$) at different frequencies (see Xu and Gertner [11] for a proof),

$V_{x_i} = \sum_{|r_i| \ge 1} |C^{(\theta)}_{0 \dots 0 r_i 0 \dots 0}|^2$
$V_{x_i x_j} = \sum_{|r_i|, |r_j| \ge 1} |C^{(\theta)}_{0 \dots 0 r_i \dots r_j \dots 0}|^2$
$V_{x_i x_j x_k} = \sum_{|r_i|, |r_j|, |r_k| \ge 1} |C^{(\theta)}_{0 \dots 0 r_i \dots r_j \dots r_k \dots 0}|^2$
$\dots$
$V_{x_1 \dots x_n} = \sum_{|r_1|, \dots, |r_n| \ge 1} |C^{(\theta)}_{r_1 \dots r_n}|^2$.   (17)

Eq. (17) shows that the Fourier amplitude $|C^{(\theta)}_{0 \dots 0 r_i 0 \dots 0}|^2$ results from the main effects of parameters, the Fourier amplitude $|C^{(\theta)}_{0 \dots 0 r_i \dots r_j \dots 0}|^2$ results from the second-order interaction effects, the Fourier amplitude $|C^{(\theta)}_{0 \dots 0 r_i \dots r_j \dots r_k \dots 0}|^2$ results from the third-order interaction effects, and the Fourier amplitude $|C^{(\theta)}_{r_1 \dots r_n}|^2$ results from the nth-order interaction effects. Thus, to calculate the partial variances, we only need to estimate the Fourier coefficients based on eq. (13). Summing all the terms in eq. (17) based on eq. (7), it is easy to show that

$V^{(x_1, \dots, x_n)} = \sum_{(r_1, \dots, r_n) \ne (0, \dots, 0)} |C^{(\theta)}_{r_1 \dots r_n}|^2$.   (18)
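To make eqs. (13)-(17) concrete, the sketch below uses an illustrative toy model (our own assumption: $g(x_1, x_2) = x_1$ with uniform parameters, so the first-order partial variance of $x_1$ equals $V(y) = 1/12$). It estimates the cosine and sine coefficients from a simple random $\theta$-sample and sums the squared amplitudes over harmonics $(r_1, 0)$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 50000, 6
theta = rng.uniform(0.0, 2.0 * np.pi, size=(N, 2))   # simple random theta-sample
x = 0.5 + np.arcsin(np.sin(theta)) / np.pi           # search function, eq. (2)
y = x[:, 0]                                          # toy model g(x1, x2) = x1

def fourier_coeffs(y, theta, r):
    """Sample estimates of the cosine/sine coefficients a_r, b_r of eq. (16)
    for a harmonic vector r = (r_1, ..., r_n)."""
    phase = theta @ np.asarray(r, dtype=float)
    return float(y @ np.cos(phase)) / len(y), float(y @ np.sin(phase)) / len(y)

# Partial variance of x1: sum a_r^2 + b_r^2 over harmonics (r1, 0)
# with |r1| = 1..M, counting both +r1 and -r1.
v_x1 = 0.0
for r1 in range(1, M + 1):
    for sign in (+1, -1):
        a, b = fourier_coeffs(y, theta, (sign * r1, 0))
        v_x1 += a * a + b * b
print(round(v_x1, 4), round(float(y.var()), 4))   # both should approach 1/12
```

For this model the triangle-wave signal of $x_1$ decays quickly with harmonic order, so a small M already captures essentially all of the partial variance.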
It is notable that there is only one period for the sampled parameter values using search functions of the form of eq. (2). Thus, there are strong signals for parameters in the Fourier amplitudes (i.e., $|C^{(\theta)}_{r_1 \dots r_n}|^2$) when the fundamental frequency is equal to one (i.e., $r_1 = 1$, or $r_2 = 1$, ..., or $r_n = 1$). The signals decrease at higher harmonics, which are integer (termed the harmonic order) multiples of the fundamental frequency [6]. The signals in the Fourier amplitudes become close to zero if any of $r_1, \dots, r_n$ are relatively large (i.e., at a relatively high harmonic order). The harmonic order at which the Fourier amplitudes become close to zero is defined as the maximum harmonic order, which is commonly four or six in practice and could be greater for highly nonlinear models [6]. With a specified maximum harmonic order M (please see the end of Section 2.2 for a description of the selection of M), the partial variances contributed by the main effects and interaction effects can be calculated as

$V_{x_i} = \sum_{|r_i| = 1}^{M} |C^{(\theta)}_{0 \dots 0 r_i 0 \dots 0}|^2$
$V_{x_i x_j} = \sum_{|r_i|, |r_j| = 1}^{M} |C^{(\theta)}_{0 \dots 0 r_i \dots r_j \dots 0}|^2$
$V_{x_i x_j x_k} = \sum_{|r_i|, |r_j|, |r_k| = 1}^{M} |C^{(\theta)}_{0 \dots 0 r_i \dots r_j \dots r_k \dots 0}|^2$
$\dots$
$V_{x_1 \dots x_n} = \sum_{|r_1|, \dots, |r_n| = 1}^{M} |C^{(\theta)}_{r_1 \dots r_n}|^2$.   (19)
Different sampling methods can be used to estimate the partial variances contributed by different model parameters [11]. The sampling methods include the traditional sampling by a search curve using an auxiliary variable s [5,6,8], random balance design sampling [10], and simple random sampling [11]. For the traditional sampling using an auxiliary variable s, in addition to sampling errors, there are also interference errors resulting from the selection of frequencies for the different model parameters (see Xu and Gertner [11] for details). The sampling errors are substantially smaller than the interference errors. In addition, for the traditional search-curve based sampling, it is difficult or infeasible to conduct FAST analysis for models with many parameters (e.g., >20 parameters) due to the large sample size required [10]. Therefore, in this paper, we are mainly concerned with simple random sampling and random balance design sampling.
2.2 Standard errors of sensitivity indices for simple random sampling

For simple random sampling, we draw a random sample of size N in the $\theta$-space, $\{\theta^{(j)} = (\theta_1^{(j)}, \dots, \theta_n^{(j)}),\ j = 1, \dots, N\}$, to estimate the Fourier coefficients using eq. (13). The first-order or higher-order sensitivity indices can then be estimated, using eq. (9) and eq. (19), based on a sample in the $\theta$-space as follows,

$\hat{S} = \dfrac{\sum_{r \in H(M)} |\hat{C}^{(\theta)}(r)|^2}{\hat{\sigma}^2}$   (20)

where $r \in H(M)(x_i) = \{(r_1, \dots, r_n),\ |r_i| = 1, \dots, M,\ r_j = 0 \text{ for } j \in \{1, \dots, n\} \setminus i\}$ for the calculation of the first-order sensitivity index of a specific parameter $x_i$ [i.e., $S_{x_i}$ in eq. (9)]; $r \in H(M)(x_i, x_j) = \{(r_1, \dots, r_n),\ |r_i|, |r_j| = 1, \dots, M,\ r_k = 0 \text{ for } k \in \{1, \dots, n\} \setminus \{i, j\}\}$ for the calculation of the second-order sensitivity index resulting from interactions of two parameters $x_i$ and $x_j$ [i.e., $S_{x_i x_j}$ in eq. (9)]; and $r \in H(M)(x_1, \dots, x_n) = \{(r_1, \dots, r_n),\ |r_1|, \dots, |r_n| = 1, \dots, M\}$ for the calculation of the nth-order sensitivity index resulting from interactions of all parameters [i.e., $S_{x_1 \dots x_n}$ in eq. (9)]. We define $H(M)$ as a general set representing one of the harmonic sets $H(M)(x_i)$, $H(M)(x_i, x_j)$, ..., or $H(M)(x_1, \dots, x_n)$ for the calculation of the first-order, second-order, and so on, until the nth-order sensitivity indices. M is a user-selected maximum harmonic order (generally four or six; we will discuss selection details at the end of this section after we introduce the necessary basics). $\hat{\sigma}^2$ is the sample variance of the model output y,

$\hat{\sigma}^2 = \dfrac{1}{N-1} \sum_{j=1}^{N} [g(\theta_1^{(j)}, \dots, \theta_n^{(j)}) - \bar{g}(\theta_1, \dots, \theta_n)]^2$   (21)

where $\bar{g}(\theta_1, \dots, \theta_n)$ is the sample mean of the model output y. Replacing $\hat{C}^{(\theta)}(r)$ in eq. (20) by the estimated cosine and sine Fourier coefficients in eq. (15), the sensitivity indices can be calculated as follows,

$\hat{S} = \dfrac{\sum_{r \in H(M)} (\hat{a}_r^2 + \hat{b}_r^2)}{\hat{\sigma}^2}$.   (22)
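The estimator of eqs. (20)-(22) can be sketched end to end for the first-order case. The toy additive model and all names below are illustrative assumptions; the estimator sums squared Fourier coefficients over the harmonic set $H(M)(x_i)$ and divides by the sample variance of y.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 50000, 6
theta = rng.uniform(0.0, 2.0 * np.pi, size=(N, 2))
x = 0.5 + np.arcsin(np.sin(theta)) / np.pi       # search function, eq. (2)
y = x[:, 0] + 0.5 * x[:, 1]                      # toy additive model

def first_order_index(y, theta, i, M):
    """Eqs. (20)-(22) for H(M)(x_i): sum a_r^2 + b_r^2 over |r_i| = 1..M
    (all other harmonics zero), divided by the sample variance of y."""
    total = 0.0
    for r in range(1, M + 1):
        phase = r * theta[:, i]
        a = float(y @ np.cos(phase)) / len(y)
        b = float(y @ np.sin(phase)) / len(y)
        total += 2.0 * (a * a + b * b)   # harmonics +r and -r contribute equally
    return total / float(np.var(y, ddof=1))

s1 = first_order_index(y, theta, 0, M)   # analytic value 0.8 for this model
s2 = first_order_index(y, theta, 1, M)   # analytic value 0.2
print(round(s1, 2), round(s2, 2))
```

A single $\theta$-sample thus yields all first-order indices at once; the standard error question addressed next is how reliable each of these single-sample estimates is.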
Based on eq. (22), we can easily see that both the estimation errors for the Fourier coefficients $\hat{a}_r$ and $\hat{b}_r$, and the estimation error for the sample variance $\hat{\sigma}^2$ of the model output, lead to estimation error in a sensitivity index $\hat{S}$. If we can estimate the standard errors for $\hat{a}_r$, $\hat{b}_r$, and $\hat{\sigma}^2$, we are able to approximate the standard error of the sensitivity index $\hat{S}$ using the delta method (see Appendix A for details of the delta method). For a large sample size N, we assume that $\hat{a}_r$, $\hat{b}_r$, and $\hat{\sigma}^2$ approximately follow a multivariate normal distribution with mean vector $(\{a_r\}|_{r \in H(M)}, \{b_r\}|_{r \in H(M)}, \sigma^2)$ and covariance matrix $\Sigma$, where $\{a_r\}|_{r \in H(M)}$ represents the list of cosine Fourier coefficients and $\{b_r\}|_{r \in H(M)}$ represents the list of sine Fourier coefficients for all $r \in H(M)$. Let

$\Lambda = \dfrac{\partial S}{\partial (\{a_r\}|_{r \in H(M)}, \{b_r\}|_{r \in H(M)}, \sigma^2)} = \left(\left\{\dfrac{2 a_r}{\sigma^2}\right\}\Big|_{r \in H(M)}, \left\{\dfrac{2 b_r}{\sigma^2}\right\}\Big|_{r \in H(M)}, -\dfrac{\sum_{r \in H(M)} (a_r^2 + b_r^2)}{(\sigma^2)^2}\right)$;   (23)

based on the delta method (see Appendix A for details), $\hat{S}$ will approximately follow a normal distribution as follows,

$\hat{S} \sim N(S, \Lambda \Sigma \Lambda^T)$   (24)

for a large sample size N. In order to estimate the variance of $\hat{S}$, we need to estimate the covariance matrix $\Sigma$ (i.e., the variances of and covariances among $\hat{a}_r$, $\hat{b}_r$, and $\hat{\sigma}^2$).
Based on the Central Limit Theorem, Xu and Gertner [11] showed that the estimated cosine Fourier coefficient $\hat{a}_r$ and sine Fourier coefficient $\hat{b}_r$ in eq. (16) approximately follow normal distributions for a large sample size N. Namely,

$\hat{a}_r \sim N(a_r, V(\hat{a}_r))$,
$\hat{b}_r \sim N(b_r, V(\hat{b}_r))$,   (25)

where $V(\hat{a}_r)$ and $V(\hat{b}_r)$ are the variances of $\hat{a}_r$ and $\hat{b}_r$, respectively. In view that $\{\theta_1^{(j)}, \dots, \theta_n^{(j)},\ j = 1, \dots, N\}$ are random samples drawn in the $\theta$-space, $V(\hat{a}_r)$ and $V(\hat{b}_r)$ can be empirically estimated based on eq. (16) as follows,

$\hat{V}(\hat{a}_r) = \dfrac{1}{N^2} \sum_{j=1}^{N} [g^2(G(\theta_1^{(j)}), \dots, G(\theta_n^{(j)})) \cos^2(r_1 \theta_1^{(j)} + \dots + r_n \theta_n^{(j)})] - \dfrac{1}{N} \hat{a}_r^2$,
$\hat{V}(\hat{b}_r) = \dfrac{1}{N^2} \sum_{j=1}^{N} [g^2(G(\theta_1^{(j)}), \dots, G(\theta_n^{(j)})) \sin^2(r_1 \theta_1^{(j)} + \dots + r_n \theta_n^{(j)})] - \dfrac{1}{N} \hat{b}_r^2$.   (26)

In order to estimate the standard errors of sensitivity indices, we also need to estimate the covariance between any two Fourier coefficients. Using eq. (16), we can estimate the covariance between a cosine Fourier coefficient $\hat{a}_r$ and a sine Fourier coefficient $\hat{b}_{r'}$ (where $r = (r_1, \dots, r_n)$ and $r'$ is another vector of harmonics, which could be different from or the same as r) as follows (see Appendix B for details),

$Cov(\hat{a}_r, \hat{b}_{r'}) = \dfrac{1}{N} E[g^2(\theta_1, \dots, \theta_n) \cos(r \theta^T) \sin(r' \theta^T)] - \dfrac{1}{N} a_r b_{r'}$.   (27)

Based on the above equation, an empirical estimate of the covariance between the cosine Fourier coefficient $\hat{a}_r$ and the sine Fourier coefficient $\hat{b}_{r'}$ can be derived,

$\widehat{Cov}(\hat{a}_r, \hat{b}_{r'}) = \dfrac{1}{N^2} \sum_{j=1}^{N} g^2(\theta_1^{(j)}, \dots, \theta_n^{(j)}) \cos(r (\theta^{(j)})^T) \sin(r' (\theta^{(j)})^T) - \dfrac{1}{N} \hat{a}_r \hat{b}_{r'}$.   (28)
Similarly, the covariance between two cosine Fourier coefficients $\hat{a}_r$ and $\hat{a}_{r'}$ ($r \ne r'$) and the covariance between two sine Fourier coefficients $\hat{b}_r$ and $\hat{b}_{r'}$ ($r \ne r'$) can be calculated by the following two equations,

$\widehat{Cov}(\hat{a}_r, \hat{a}_{r'}) = \dfrac{1}{N^2} \sum_{j=1}^{N} g^2(\theta_1^{(j)}, \dots, \theta_n^{(j)}) \cos(r (\theta^{(j)})^T) \cos(r' (\theta^{(j)})^T) - \dfrac{1}{N} \hat{a}_r \hat{a}_{r'}$,   (29)

$\widehat{Cov}(\hat{b}_r, \hat{b}_{r'}) = \dfrac{1}{N^2} \sum_{j=1}^{N} g^2(\theta_1^{(j)}, \dots, \theta_n^{(j)}) \sin(r (\theta^{(j)})^T) \sin(r' (\theta^{(j)})^T) - \dfrac{1}{N} \hat{b}_r \hat{b}_{r'}$,   (30)
respectively. For the standard error of $\hat{\sigma}^2$: based on eq. (21) and the Central Limit Theorem, $\hat{\sigma}^2$ approximately follows a normal distribution for a large sample size N. Namely,

$\hat{\sigma}^2 \sim N\left(\sigma^2, \dfrac{1}{N-1} V\{[g(\theta_1, \dots, \theta_n) - \bar{g}(\theta_1, \dots, \theta_n)]^2\}\right)$.   (31)

Therefore, an empirical estimate of the variance of $\hat{\sigma}^2$ [i.e., $V(\hat{\sigma}^2)$] can be calculated as

$\hat{V}(\hat{\sigma}^2) = \dfrac{1}{N-1} \hat{V}\{[g(\theta_1, \dots, \theta_n) - \bar{g}(\theta_1, \dots, \theta_n)]^2\} = \dfrac{1}{N-1} \left\{\dfrac{1}{N} \sum_{j=1}^{N} [g(\theta_1^{(j)}, \dots, \theta_n^{(j)}) - \bar{g}(\theta_1, \dots, \theta_n)]^4 - (\hat{\sigma}^2)^2\right\} = \dfrac{1}{N-1} [\hat{u}^{(4)} - (\hat{\sigma}^2)^2]$,   (32)

where $\hat{u}^{(4)}$ is the estimated fourth central moment of $g(\theta_1, \dots, \theta_n)$. Finally, the covariance between a cosine Fourier coefficient and $\hat{\sigma}^2$ can be estimated as follows (see Appendix C for details),

$Cov(\hat{a}_r, \hat{\sigma}^2) = \dfrac{1}{N-1} E\{g(\theta_1, \dots, \theta_n) \cos(r \theta^T) [g(\theta_1, \dots, \theta_n) - \bar{g}(\theta_1, \dots, \theta_n)]^2\} - \dfrac{a_r \sigma^2}{N}$.   (33)

Thus, we can get an empirical estimate of the covariance between a cosine Fourier coefficient and $\hat{\sigma}^2$ with the following equation,

$\widehat{Cov}(\hat{a}_r, \hat{\sigma}^2) = \dfrac{1}{N(N-1)} \sum_{j=1}^{N} \{g(\theta_1^{(j)}, \dots, \theta_n^{(j)}) \cos(r (\theta^{(j)})^T) [g(\theta_1^{(j)}, \dots, \theta_n^{(j)}) - \bar{g}(\theta_1, \dots, \theta_n)]^2\} - \dfrac{\hat{a}_r \hat{\sigma}^2}{N}$.   (34)

Similarly, we can derive an empirical estimate of the covariance between a sine Fourier coefficient and $\hat{\sigma}^2$ as follows,

$\widehat{Cov}(\hat{b}_r, \hat{\sigma}^2) = \dfrac{1}{N(N-1)} \sum_{j=1}^{N} \{g(\theta_1^{(j)}, \dots, \theta_n^{(j)}) \sin(r (\theta^{(j)})^T) [g(\theta_1^{(j)}, \dots, \theta_n^{(j)}) - \bar{g}(\theta_1, \dots, \theta_n)]^2\} - \dfrac{\hat{b}_r \hat{\sigma}^2}{N}$.   (35)
With the estimated covariance matrix $\hat{\Sigma}$ among the Fourier coefficients and the sample variance of the output y, based on eq. (27), eq. (28), eq. (29), eq. (30), eq. (32), eq. (34), and eq. (35), we have an empirical estimate of the standard error of the sensitivity index $\hat{S}$ as follows,

$\hat{se}(\hat{S}) = \sqrt{\hat{\Lambda} \hat{\Sigma} \hat{\Lambda}^T}$   (36)

where

$\hat{\Lambda} = \left(\left\{\dfrac{2 \hat{a}_r}{\hat{\sigma}^2}\right\}\Big|_{r \in H(M)}, \left\{\dfrac{2 \hat{b}_r}{\hat{\sigma}^2}\right\}\Big|_{r \in H(M)}, -\dfrac{\sum_{r \in H(M)} (\hat{a}_r^2 + \hat{b}_r^2)}{(\hat{\sigma}^2)^2}\right)$.

Based on eq. (36), we can see that the standard errors of sensitivity indices depend on three components: 1) the magnitude of the Fourier amplitudes, which depends on parametric importance in terms of uncertainty contribution (i.e., the true values of the sensitivity indices); 2) the magnitude of the model outputs and the variance of the model outputs; and 3) the random error measured by $\hat{\Sigma}$, which depends on the sample size N.
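The delta-method computation of eqs. (23), (24) and (36) can be sketched as follows. This is an illustrative implementation with a toy model (all names are ours): for compactness it estimates the full covariance matrix $\hat{\Sigma}$ of the sample means with np.cov, which agrees with the element-wise formulas of eqs. (26)-(35) up to O(1/N) factors.

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 20000, 4
theta = rng.uniform(0.0, 2.0 * np.pi, size=(N, 2))
x = 0.5 + np.arcsin(np.sin(theta)) / np.pi
y = x[:, 0] + 0.5 * x[:, 1]                    # toy additive model

# Per-sample terms whose means are a_r and b_r for r in H(M)(x1), plus the
# squared deviations whose mean approximates sigma^2.  The covariance of
# these sample means plays the role of Sigma in eq. (24).
harmonics = [s * r for s in (+1, -1) for r in range(1, M + 1)]
cols = [y * np.cos(r * theta[:, 0]) for r in harmonics]        # means -> a_r
cols += [y * np.sin(r * theta[:, 0]) for r in harmonics]       # means -> b_r
cols.append((y - y.mean()) ** 2)                               # mean  -> sigma^2
Z = np.column_stack(cols)

est = Z.mean(axis=0)
cov = np.cov(Z, rowvar=False) / N              # covariance matrix of the means
k = len(harmonics)
a, b, s2 = est[:k], est[k:2 * k], est[-1]

s_hat = float((a @ a + b @ b) / s2)            # sensitivity index, eq. (22)
grad = np.concatenate([2.0 * a / s2, 2.0 * b / s2,
                       [-(a @ a + b @ b) / s2**2]])    # Lambda, eq. (23)
se = float(np.sqrt(grad @ cov @ grad))         # delta-method SE, eq. (36)
print(f"S_x1 = {s_hat:.3f} +/- {se:.4f}")
```

The standard error comes from the same single sample used to estimate the index itself, which is what makes the approach attractive for expensive models.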
The FAST method gives a biased estimator of the partial variances contributed by parameters, due to the random noise introduced by sampling [11]. Thus, the selection of the maximum harmonic order (M) in eq. (19) is very important for the estimation of sensitivity indices. If we select a very large M, it may cause overestimation by incorporating much noise from the Fourier amplitudes at high harmonics. If we select a low M, it may cause too much underestimation if the model is highly nonlinear (in which case there will still be signals at high harmonics). Xu and Gertner [11] recommended using a reasonably large M (specifically, M < 50 for first-order sensitivity indices, M < 5 for second-order sensitivity indices, and M < 3 for third-order sensitivity indices) and estimating the Fourier amplitudes (i.e., $|\hat{C}^{(\theta)}_{r_1 \dots r_n}|^2 = \hat{a}_{r_1 \dots r_n}^2 + \hat{b}_{r_1 \dots r_n}^2$) using only those sine and cosine Fourier coefficients significantly larger or smaller than zero based on hypothesis tests. The hypothesis tests are based on z-score statistics for $\hat{a}_r$ and $\hat{b}_r$ as follows,

$Z_a = \dfrac{\hat{a}_r}{\sqrt{\hat{V}(\hat{a}_r)}}$,
$Z_b = \dfrac{\hat{b}_r}{\sqrt{\hat{V}(\hat{b}_r)}}$,   (37)

which will follow standard normal distributions (assuming a relatively large sample size N). The incorporation of only those Fourier coefficients (i.e., $\hat{a}_r$ and $\hat{b}_r$) significantly larger or smaller than zero will not affect the proposed method of standard error estimation. Namely, we only need to calculate the standard errors based on those Fourier coefficients significantly larger or smaller than zero.
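The screening of eq. (37) can be sketched as follows. The toy model, the 1.96 cutoff, and the use of harmonics up to M = 49 for a first-order index are our illustrative assumptions following the recommendation quoted above; coefficients whose z-scores do not pass the test are simply dropped from the sum in eq. (22).

```python
import numpy as np

rng = np.random.default_rng(5)
N = 20000
theta = rng.uniform(0.0, 2.0 * np.pi, size=N)
x = 0.5 + np.arcsin(np.sin(theta)) / np.pi     # search function, eq. (2)
y = x + rng.normal(0.0, 0.1, size=N)           # one signal parameter plus noise

kept = 0.0
for r in range(1, 50):                         # first-order harmonics, M < 50
    c, s = np.cos(r * theta), np.sin(r * theta)
    a, b = float(y @ c) / N, float(y @ s) / N
    # eq. (26): empirical variances of the estimated coefficients
    va = float(y**2 @ c**2) / N**2 - a * a / N
    vb = float(y**2 @ s**2) / N**2 - b * b / N
    # eq. (37): keep only coefficients significantly different from zero
    if abs(a) / np.sqrt(va) > 1.96:
        kept += a * a
    if abs(b) / np.sqrt(vb) > 1.96:
        kept += b * b

s1 = 2.0 * kept / float(np.var(y, ddof=1))     # factor 2: harmonics +r and -r
print(round(s1, 2))
```

Only the few low-order harmonics that carry real signal survive the test, so the large M costs little extra noise.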
2.3 Standard errors of sensitivity indices for random balance design sampling

For random balance design sampling, we first draw N grid samples for each parameter in the $\theta$-space,

$\{\theta_i^{(1)}, \dots, \theta_i^{(N)} \mid \theta_i^{(j)} = \theta_0 + \dfrac{2\pi}{N}(j - 1),\ j = 1, \dots, N\}$   (38)

where $\theta_0$ is randomly drawn from $[0, \frac{2\pi}{N}]$ so that the grid sample for $\theta_i$ can ergodically explore between 0 and $2\pi$. Then the samples are randomly permuted to form N samples in the $\theta$-space. In this way, random balance design sampling draws random samples in the high-dimensional $\theta$-space similar to simple random sampling. Therefore, estimation errors for Fourier coefficients using random balance design will be similar to those based on simple random sampling. However, by using grid samples for each individual parameter, the estimation errors for the Fourier coefficients used to calculate the partial variances contributed by main effects are different from those based on simple random sampling. Based on the definition of the Fourier coefficients $C^{(\theta)}_{0 \dots 0 r_i 0 \dots 0}$ in eq. (11) and eq. (12), we have

$C^{(\theta)}_{0 \dots 0 r_i 0 \dots 0} = E_\theta(g(G(\theta_1), \dots, G(\theta_n)) e^{-i r_i \theta_i}) = \left(\dfrac{1}{2\pi}\right)^n \int_0^{2\pi} \dots \int_0^{2\pi} g(G(\theta_1), \dots, G(\theta_n)) e^{-i r_i \theta_i} d\theta_1 \dots d\theta_n = \dfrac{1}{2\pi} \int_0^{2\pi} E(g(G(\theta_1), \dots, G(\theta_n)) \mid \theta_i) e^{-i r_i \theta_i} d\theta_i = \dfrac{1}{2\pi} \int_0^{2\pi} u(\theta_i) e^{-i r_i \theta_i} d\theta_i = E_{\theta_i}(u(\theta_i) e^{-i r_i \theta_i})$   (39)

where $u(\theta_i) = E(g(G(\theta_1), \dots, G(\theta_n)) \mid \theta_i)$.
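The grid-and-permute construction of eq. (38) can be sketched as follows. The toy model, the per-parameter random offsets, and all names are our illustrative assumptions; the point is that each column of the design still holds exactly one full period, so first-order signals sit at harmonics of the corresponding angle.

```python
import numpy as np

rng = np.random.default_rng(6)
N, n = 4096, 3

# Random balance design, eq. (38): for each parameter, a grid of N equally
# spaced angles with a random offset in [0, 2*pi/N], independently permuted.
theta0 = rng.uniform(0.0, 2.0 * np.pi / N, size=n)
grid = theta0 + (2.0 * np.pi / N) * np.arange(N)[:, None]    # shape (N, n)
theta = np.column_stack([rng.permutation(grid[:, i]) for i in range(n)])

x = 0.5 + np.arcsin(np.sin(theta)) / np.pi    # search function, eq. (2)
y = x[:, 0] + 0.5 * x[:, 1] + 0.25 * x[:, 2]  # toy additive model

# Each column of theta still covers exactly one full period, so the
# first-order signal of x1 sits at harmonics r_1 = 1, 2, ... of theta_1.
a = float(y @ np.cos(theta[:, 0])) / N
b = float(y @ np.sin(theta[:, 0])) / N
amp = 2.0 * (a * a + b * b)                   # |C|^2 summed over r_1 = +1, -1
print(round(amp, 4))
```

The independent permutations destroy any joint structure among columns, which is why the interaction terms behave like simple random sampling while the main-effect terms benefit from the grid.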
Let the model output be

$y = u(\theta_i) + \varepsilon_i$   (40)

where $\varepsilon_i$ is a random error independent of $\theta_i$ with $E(\varepsilon_i) = 0$. Using eq. (40), it can easily be shown that

$V(\varepsilon_i) = V(y) - V\{E[g(G(\theta_1), \dots, G(\theta_n)) \mid \theta_i]\} = V(y) - V_{x_i}$.   (41)

Based on eq. (40), the Fourier coefficients can be estimated as

$\hat{C}^{(\theta)}_{0 \dots r_i \dots 0} = \dfrac{1}{N} \sum_{j=1}^{N} g(G(\theta_1^{(j)}), \dots, G(\theta_n^{(j)})) e^{-i r_i \theta_i^{(j)}} = \dfrac{1}{N} \sum_{j=1}^{N} u(\theta_i^{(j)}) e^{-i r_i \theta_i^{(j)}} + \dfrac{1}{N} \sum_{j=1}^{N} \varepsilon_i^{(j)} e^{-i r_i \theta_i^{(j)}} = \hat{C}^{(\theta, u)}_{0 \dots r_i \dots 0} + \hat{C}^{(\theta, \varepsilon)}_{0 \dots r_i \dots 0}$   (42)

where

$\hat{C}^{(\theta, u)}_{0 \dots r_i \dots 0} = \dfrac{1}{N} \sum_{j=1}^{N} u(\theta_i^{(j)}) e^{-i r_i \theta_i^{(j)}}$,
$\hat{C}^{(\theta, \varepsilon)}_{0 \dots r_i \dots 0} = \dfrac{1}{N} \sum_{j=1}^{N} \varepsilon_i^{(j)} e^{-i r_i \theta_i^{(j)}}$.
Eq. (42) indicates that the standard error of the Fourier coefficient \hat{C}^{(\theta)}_{(0\dots r_i\dots 0)} results from the estimation errors of both C^{(\theta,u)}_{(0\dots r_i\dots 0)} and C^{(\theta,\varepsilon)}_{(0\dots r_i\dots 0)}. For \hat{C}^{(\theta,\varepsilon)}_{(0\dots r_i\dots 0)} = \hat{a}^{(\theta,\varepsilon)}_{(0\dots r_i\dots 0)} - i\,\hat{b}^{(\theta,\varepsilon)}_{(0\dots r_i\dots 0)} with

\hat{a}^{(\theta,\varepsilon)}_{(0\dots r_i\dots 0)} = \frac{1}{N}\sum_{j=1}^{N} \varepsilon_i^{(j)}\cos(r_i\theta_i^{(j)}), \qquad
\hat{b}^{(\theta,\varepsilon)}_{(0\dots r_i\dots 0)} = \frac{1}{N}\sum_{j=1}^{N} \varepsilon_i^{(j)}\sin(r_i\theta_i^{(j)}),

based on the Central Limit Theorem we have

\hat{a}^{(\theta,\varepsilon)}_{(0\dots r_i\dots 0)} \sim N\Big(0,\; \frac{V(\varepsilon_i)}{2N}\Big), \qquad
\hat{b}^{(\theta,\varepsilon)}_{(0\dots r_i\dots 0)} \sim N\Big(0,\; \frac{V(\varepsilon_i)}{2N}\Big),   (43)

for a relatively large sample size N. This indicates that the estimation error of C^{(\theta,\varepsilon)}_{(0\dots r_i\dots 0)} decreases at an order of 1/\sqrt{N} with an increasing sample size N. For the estimation of C^{(\theta,u)}_{(0\dots r_i\dots 0)} (where C^{(\theta,u)}_{(0\dots r_i\dots 0)} = E[u(\theta_i)e^{-i r_i \theta_i}]), since we use grid samples for each individual parameter, the estimation error decreases at an order of 1/N^2 with an increasing sample size N (see Xu and Gertner [11] for details). This suggests that the estimation error for C^{(\theta,\varepsilon)}_{(0\dots r_i\dots 0)} is substantially larger than that for C^{(\theta,u)}_{(0\dots r_i\dots 0)} for a relatively large sample size N. Thus, we can ignore the estimation error for C^{(\theta,u)}_{(0\dots r_i\dots 0)}. Finally, for \hat{C}^{(\theta)}_{(0\dots r_i\dots 0)} = \hat{a}_{(0\dots r_i\dots 0)} - i\,\hat{b}_{(0\dots r_i\dots 0)}, we have

\hat{a}_{(0\dots r_i\dots 0)} \sim N\Big(a_{(0\dots r_i\dots 0)},\; \frac{V(\varepsilon_i)}{2N}\Big), \qquad
\hat{b}_{(0\dots r_i\dots 0)} \sim N\Big(b_{(0\dots r_i\dots 0)},\; \frac{V(\varepsilon_i)}{2N}\Big),   (44)
in view that C^{(\theta,u)}_{(0\dots r_i\dots 0)} = C^{(\theta)}_{(0\dots r_i\dots 0)} by eq. (39). To estimate V(\varepsilon_i) based on eq. (41), we first need to estimate V_{x_i}. Following Xu and Gertner [11], a conservative preliminary estimate of V_{x_i} can be calculated as

V_{x_i} \approx \sum_{|r_i|=1}^{M} \big|\hat{C}^{(\theta)}_{(0\dots r_i\dots 0)}\big|^2   (45)

where M is a relatively small harmonic order (e.g., 4), under which the Fourier amplitudes have relatively small proportions of errors. In order to improve the estimation accuracy of V_{x_i}, the statistical tests with eq. (37) for simple random sampling can be used to select those cosine and sine Fourier coefficients significantly larger or smaller than zero for the estimation of the Fourier amplitudes (i.e., |\hat{C}^{(\theta)}_{(0\dots r_i\dots 0)}|^2 = (a_{(0\dots r_i\dots 0)})^2 + (b_{(0\dots r_i\dots 0)})^2). Finally, V(\varepsilon_i) can be estimated as

\hat{V}(\varepsilon_i) = \hat{V}(y) - V_{x_i}.   (46)
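As an illustration, eq. (45) and eq. (46) can be written directly as discrete Fourier sums over the sampled θ_i values. This is a minimal sketch under our own naming, with the output centered by its sample mean and the significance screening of eq. (37) omitted for brevity.

```python
import numpy as np

def partial_variance(y, theta_i, M=4):
    """V_xi per eq. (45): sum of squared Fourier amplitudes over
    harmonics |r_i| = 1..M along the sample theta_i."""
    yc = y - y.mean()                                  # center the output
    v = 0.0
    for r in range(1, M + 1):
        c = np.mean(yc * np.exp(-1j * r * theta_i))    # coefficient at harmonic r
        v += 2.0 * np.abs(c) ** 2                      # harmonics +r and -r contribute equally
    return v

def residual_variance(y, theta_i, M=4):
    """V(eps_i) per eq. (46): total output variance minus V_xi."""
    return np.var(y) - partial_variance(y, theta_i, M)
```

For y = cos(θ_i) on a full-period grid, the main effect captures the entire variance of 1/2, so the residual variance estimate is essentially zero.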
It can be shown that the covariance between the cosine Fourier coefficient a_{(0\dots r_i\dots 0)} and the sine Fourier coefficient b_{(0\dots r_i'\dots 0)} is zero (where the harmonic r_i can be the same as or different from r_i', with |r_i|, |r_i'| = 1,\dots,M; see Appendix D for details). We can also show that the covariance between a_{(0\dots r_i\dots 0)} and a_{(0\dots r_i'\dots 0)}, and the covariance between b_{(0\dots r_i\dots 0)} and b_{(0\dots r_i'\dots 0)}, are both zero (where r_i \neq r_i', |r_i|, |r_i'| = 1,\dots,M; see Appendix D for details).

For the covariance between the cosine Fourier coefficient \hat{a}_{(0\dots r_i\dots 0)} and the sample variance \hat{\sigma}^2, as in simple random sampling for first-order sensitivity indices, based on eq. (42) we have

Cov(\hat{a}_{(0\dots r_i\dots 0)}, \hat{\sigma}^2) = Cov(\hat{a}^{(\theta,u)}_{(0\dots r_i\dots 0)}, \hat{\sigma}^2) + Cov(\hat{a}^{(\theta,\varepsilon)}_{(0\dots r_i\dots 0)}, \hat{\sigma}^2).   (47)

Notice that the covariance estimate Cov(\hat{a}_{(0\dots r_i\dots 0)}, \hat{\sigma}^2) based on eq. (33) depends mainly on the signal of the model output y for \theta_i at harmonic r = (0,\dots,0,r_i,0,\dots,0) with r_i = 1,\dots,M. Since \hat{a}^{(\theta,\varepsilon)}_{(0\dots r_i\dots 0)} is estimated from the part of the model output y with no signal for \theta_i (i.e., \varepsilon_i, which is independent of \theta_i; see eq. (40) for details), the second term on the right-hand side of eq. (47) [i.e., Cov(\hat{a}^{(\theta,\varepsilon)}_{(0\dots r_i\dots 0)}, \hat{\sigma}^2)] will be close to zero. Therefore, we have

Cov(\hat{a}_{(0\dots r_i\dots 0)}, \hat{\sigma}^2) \approx Cov(\hat{a}^{(\theta,u)}_{(0\dots r_i\dots 0)}, \hat{\sigma}^2).

At the same time, for a grid sample of parameter x_i in random balance design sampling, \hat{a}^{(\theta,u)}_{(0\dots r_i\dots 0)} has a substantially lower estimation error than \hat{a}^{(\theta,\varepsilon)}_{(0\dots r_i\dots 0)} [11]. Thus, we can treat \hat{a}^{(\theta,u)}_{(0\dots r_i\dots 0)} as a constant for a relatively large sample size N, and we have

Cov(\hat{a}_{(0\dots r_i\dots 0)}, \hat{\sigma}^2) \approx Cov(\hat{a}^{(\theta,u)}_{(0\dots r_i\dots 0)}, \hat{\sigma}^2) \approx 0.

Please see Appendix E for a proof of the above equation. Similarly, we can also show that Cov(\hat{b}_{(0\dots r_i\dots 0)}, \hat{\sigma}^2) \approx 0.

Finally, the covariance matrix \hat{\Sigma} becomes a diagonal matrix, and the standard error of the first-order sensitivity index \hat{S}_{x_i} can be estimated based on the delta method as follows (see Appendix F for details),

\hat{\sigma}^2(\hat{S}_{x_i}) = \frac{2}{N}(1-\hat{S}_{x_i})\hat{S}_{x_i} + \frac{\hat{S}_{x_i}^2}{N(\hat{\sigma}^2)^2}\big[\hat{u}^{(4)} - (\hat{\sigma}^2)^2\big].   (48)
Eq. (48) shows that the standard errors of sensitivity indices calculated by FAST analysis depend on four factors: 1) the magnitude of the sensitivity index, 2) the sample variance (\hat{\sigma}^2) of the model output, 3) the fourth central moment (\hat{u}^{(4)}) of the model output, and 4) the sample size N.
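A direct reading of eq. (48) gives a one-function standard error estimate from a single sample. The sketch below uses our own naming and estimates the two moments from the same model outputs used for the index.

```python
import numpy as np

def delta_stderr_first_order(S_i, y):
    """Delta-method standard error of a first-order index under random
    balance design sampling, per eq. (48): the square root of
    2*(1-S)*S/N + S^2 * (u4 - var^2) / (N * var^2)."""
    N = len(y)
    var = np.var(y)                      # sample variance of the output
    u4 = np.mean((y - y.mean()) ** 4)    # fourth central moment of the output
    v = 2.0 * (1.0 - S_i) * S_i / N + S_i ** 2 * (u4 - var ** 2) / (N * var ** 2)
    return np.sqrt(v)
```

For a roughly Gaussian output, u4 is close to 3 var^2, so the second term reduces to about 2 S^2 / N and the variance of the index estimate is approximately 2 S / N.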
2.4 Procedures

In this section, we summarize the procedure for FAST analysis and the corresponding standard error estimation for the calculated sensitivity indices.

FAST procedure using simple random sampling
a. Draw independent random samples in the θ-space (K^n = \{\theta_1,\dots,\theta_n \mid 0 \le \theta_i \le 2\pi,\; i=1,\dots,n\}).
b. Generate corresponding random samples in the parameter space using the search functions in eq. (2) and the sample in the θ-space.
c. Run the model using the parameter samples from b.
d. Calculate the Fourier coefficients based on eq. (16) and the corresponding sensitivity indices with eq. (22). Statistical tests with eq. (37) are used to select Fourier coefficients significantly larger or smaller than zero.
e. Calculate the covariance matrix \hat{\Sigma} among the Fourier coefficients and the sample variance of output y based on eq. (27), eq. (28), eq. (29), eq. (30), eq. (32), eq. (34), and eq. (35).
f. Estimate standard errors of sensitivity indices using eq. (36).

FAST procedure using random balance design sampling
a. Draw N grid samples for \{\theta_i\} using eq. (38), which are then randomly permuted to form a random sample in the θ-space.
b. Generate a corresponding random sample in the parameter space using the search functions in eq. (2) and the sample in the θ-space.
c. Run the model using the parameter samples from b.
d. Calculate the Fourier coefficients based on eq. (16) and the corresponding sensitivity indices with eq. (22). Statistical tests with eq. (44) are used to select Fourier coefficients significantly larger or smaller than zero for the first-order sensitivity index calculation, and statistical tests with eq. (37) are used to select Fourier coefficients significantly larger or smaller than zero for the higher-order sensitivity index calculation.
e. Calculate the covariance matrix \hat{\Sigma} among the Fourier coefficients and the sample variance of output y: for first-order sensitivity indices use eq. (44) and eq. (32), with \hat{\Sigma} equal to a diagonal matrix; for higher-order sensitivity indices use eq. (27), eq. (28), eq. (29), eq. (30), eq. (32), eq. (34), and eq. (35).
f. Estimate standard errors of first-order and higher-order sensitivity indices using eq. (48) and eq. (36), respectively.
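The random balance design procedure (steps a to d, with the first-order index taken as the ratio of the partial variance to the output variance) can be prototyped end to end for a toy model. This is an illustrative sketch under our own assumptions: uniform parameters on [0, 1] via the common RBD search function x = 1/2 + arcsin(sin θ)/π, harmonic order M = 4, and no significance screening; all names are ours, and the toy model is not from the paper.

```python
import numpy as np

def rbd_first_order(model, n_params, N=4096, M=4, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # a. grid samples in the theta-space (eq. (38)), permuted per parameter
    theta = np.empty((N, n_params))
    for i in range(n_params):
        grid = rng.uniform(0.0, 2 * np.pi / N) + 2 * np.pi * np.arange(N) / N
        theta[:, i] = rng.permutation(grid)
    # b. search function: theta -> Uniform(0, 1) samples (a common RBD choice)
    x = 0.5 + np.arcsin(np.sin(theta)) / np.pi
    # c. run the model on the parameter sample
    y = model(x)
    yc = y - y.mean()
    V = yc.var()
    # d. Fourier coefficients along each theta_i; index = V_xi / V(y)
    S = np.empty(n_params)
    for i in range(n_params):
        amps = np.array([np.mean(yc * np.exp(-1j * r * theta[:, i]))
                         for r in range(1, M + 1)])
        S[i] = 2.0 * np.sum(np.abs(amps) ** 2) / V
    return S

# additive toy model y = x1 + 2*x2: analytic first-order indices are 0.2 and 0.8
S = rbd_first_order(lambda x: x[:, 0] + 2.0 * x[:, 1], n_params=2,
                    rng=np.random.default_rng(2))
```

Because the model is additive, the two first-order indices should sum to roughly one; the small shortfall comes from truncating the Fourier series at M harmonics and from the residual-noise terms discussed above.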
3 Applications

In order to test how well the delta method approximates the standard errors of sensitivity indices, we use a simple test model and a realistic complex model. The simple test model is

y = \sum_{i=1}^{3} (i\,x_i)^2 + x_1 x_2 + 2 x_2 x_3 + 3 x_1 x_3 + x_1 x_2 x_3   (49)

where x_1, x_2, and x_3 are three independent parameters of the model. We assume all parameters follow standard normal distributions. Although the model is simple, it is representative since it is nonlinear and non-monotonic. The analytical sensitivity indices for this model have been derived by Xu and Gertner [11].
For the complex model, we applied the delta method to a world population model, World 3 [23]. The World 3 model is a computer program for simulating the interactions among population, industrial growth, food production, and limits in the ecosystems of the Earth. The model consists of six main systems: the food system, agriculture system, industrial system, population system, nonrenewable resources system, and pollution system. In this paper, we are concerned with the industrial system, which provides products for the world population [23]. At the same time, this industrial system creates pollution, which reduces land productivity. For details, please refer to [23]. The seven parameters of interest in the model are shown in Table 1. The output of interest is the world human population on a 5-year basis. For real application models, the ranges, distributions, and correlations among parameters can be derived from empirical or historical data [24,25]. However, to simplify our test case, we assume a uniform distribution for each parameter. The bounds for each parameter are assumed to be a 10% deviation from the central value (Table 1). Since FAST analysis does not allow the calculation of higher-order sensitivity indices for correlated parameters, we assume independence among the parameters in this paper.
Xu and Gertner [11] showed that, for partial variances contributed by the main effects of parameters in simple random sampling, and for partial variances contributed by higher-order interactions in random balance design sampling, the estimation bias for the Fourier amplitude |C_{(r_1\dots r_n)}|^2 is related to the expected square of the model output, E[g(G(\theta_1),\dots,G(\theta_n))]^2 / N. Since a sensitivity index is calculated as the ratio of a sum of Fourier amplitudes |C_{(r_1\dots r_n)}|^2 to the variance of the model output \sigma^2 (see eq. (9) for details), it is possible for the calculated sensitivity index to be larger than one if E[g(G(\theta_1),\dots,G(\theta_n))]^2 / N is much larger than \sigma^2 in a model. To overcome this problem, Xu and Gertner [11] recommended that the model output be centered by its sample mean and/or scaled by its standard deviation, so that E[g(G(\theta_1),\dots,G(\theta_n))]^2 / N is generally smaller than the variance \sigma^2. For the World 3 model, the variance of the model output is much smaller than E[g(G(\theta_1),\dots,G(\theta_n))]^2 / N, especially at the beginning of the model simulation. In order to overcome this potentially large estimation bias, we subtracted the sample mean from the model output.
In order to test the reliability of the standard errors derived with the delta method, we used the standard deviations of the sensitivity indices based on 20 sample replicates as references. In our preliminary analysis, we found that 20 replicates are enough for a reasonable approximation of the standard errors (i.e., increasing the number of replicates does not substantially affect the standard error estimates).
4 Results

For the simple test model, our results showed that the delta method provides a good approximation of the standard errors of both first-order and higher-order sensitivity indices in simple random sampling (Figure 1) as well as random balance design sampling (Figure 2). The approximation is better for larger sample sizes, since the delta method relies on a large-sample assumption. We can also see that the standard errors of first-order sensitivity indices for random balance design sampling are substantially reduced compared to those based on simple random sampling. Finally, the standard errors are generally higher for larger sensitivity indices. However, the corresponding coefficients of variation (i.e., the precision relative to the mean values of the sensitivity indices) are much lower (Figure 1a and Figure 3).
For the World 3 model, parameters x2, x3, and x5 have relatively high first-order sensitivity indices before year 2000. After that, the interaction between x2 and x3 and the interaction between x3 and x5 become important (Figure 4). We derived the sensitivity indices for all first-order, second-order, and third-order interactions based on a single sample of size N (N = 1000 or N = 5000). For convenience, we did not plot sensitivity indices less than 0.05. The standard errors of first-order and higher-order sensitivity indices estimated with the delta method reasonably approximate those based on 20 sample replicates for both simple random sampling (Figure 4 and Figure 5) and random balance design sampling (Figure 6 and Figure 7). Increasing the sample size from 1000 to 5000 decreases the estimated standard errors for both simple random sampling (Figure 4 and Figure 5) and random balance design sampling (Figure 6 and Figure 7). However, the difference in standard errors for first-order sensitivity indices between random balance design sampling and simple random sampling is less evident than for the test model of eq. (49). There are two reasons for this. First, the centered output values reduce the magnitude of possible random errors by reducing the bias term E[g(G(\theta_1),\dots,G(\theta_n))]^2 / N. Second, there is only a small amount of variation in the model output at the beginning of the simulation, so that random balance design sampling does not gain much sampling efficiency over simple random sampling. After that (about 100 years after the initial simulation year), random balance design sampling does slightly increase the precision (i.e., reduce the standard errors) of the first-order sensitivity indices of parameters x2 and x3 (compare Figure 4 with Figure 6, and Figure 5 with Figure 7).
5 Discussion

Due to the random errors introduced by sampling, it is important that uncertainty and sensitivity analysis techniques provide a measure of the reliability of estimated sensitivity indices (commonly the standard errors of the estimated indices). For complex models that take a long time to run a single simulation, it is computationally expensive or infeasible to estimate standard errors of sensitivity indices based on replicates of a relatively large sample (e.g., 10 or 20 replicates). The delta method can provide a good approximation of the standard errors of sensitivity indices at no additional cost in model simulations. This can substantially increase the efficiency of FAST analysis for complex
models. In addition, the delta method provides the user with a general approach to obtain the minimum sample size needed to achieve a desired precision level for a specified sensitivity index using random balance design sampling. For a model with sample variance of output y (\hat{\sigma}^2) and fourth central moment (\hat{u}^{(4)}) estimated from a preliminary sample, if we want a desired standard error \hat{\sigma}(\hat{S}_{x_i}) for a specified sensitivity index (e.g., \hat{\sigma}(\hat{S}_{x_i}) less than 50% of a sensitivity index of 0.02), the minimum sample size \hat{N}_{\min} can be approximately estimated from eq. (48) as

\hat{N}_{\min} = \frac{2}{\hat{\sigma}^2(\hat{S}_{x_i})}(1-\hat{S}_{x_i})\hat{S}_{x_i} + \frac{\hat{S}_{x_i}^2}{\hat{\sigma}^2(\hat{S}_{x_i})(\hat{\sigma}^2)^2}\big[\hat{u}^{(4)} - (\hat{\sigma}^2)^2\big].   (50)

Eq. (50) gives the model user an operational approach to determine the minimum sample size needed to achieve a desired precision level for a specified sensitivity index. Namely, the model user can first obtain preliminary estimates of \hat{\sigma}^2 and \hat{u}^{(4)} based on a relatively small sample (its size dependent on the simulation time of the model), which takes little computational time. Then, the user can define a desired standard error for a sensitivity index of interest. For example, for the last simulation year in the World 3 model, we have \hat{\sigma}^2 = 1.605 \times 10^{18} and \hat{u}^{(4)} = 6.518 \times 10^{36}. In order to achieve a standard error of 0.03 for a sensitivity index of 0.5, a minimum sample size of 981 is required based on eq. (50) (see Figure 8). As far as we know, this is the first statistical approach proposed to determine the sample size for an uncertainty and sensitivity analysis method based on random balance design sampling.
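Eq. (50) is simple enough to evaluate directly. The sketch below (the function name is ours) reproduces the worked example from the text: with a variance of 1.605e18, a fourth central moment of 6.518e36, and a target standard error of 0.03 for an index of 0.5, the minimum sample size comes out to 981.

```python
import math

def min_sample_size(S_i, target_se, var, u4):
    """Minimum sample size per eq. (50):
    N_min = [2*(1-S)*S + S^2 * (u4 - var^2) / var^2] / target_se^2."""
    v_per_n = 2.0 * (1.0 - S_i) * S_i + S_i ** 2 * (u4 - var ** 2) / var ** 2
    return math.ceil(v_per_n / target_se ** 2)

# worked example from the text (World 3, last simulation year)
n_min = min_sample_size(0.5, 0.03, 1.605e18, 6.518e36)  # 981
```

Tightening the target standard error increases the required sample size quadratically, since the per-sample variance term is divided by the squared target.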
We need to point out that the sample size from eq. (50) is only a minimum sample size from the standard error perspective. One key assumption in FAST is that the model output y becomes a multiply periodic function of the parameters in the θ-space. For models with many parameters, if a parameter has a relatively low contribution to the uncertainty in a model output, its periodic signal may easily be contaminated by the random noise (e.g., \varepsilon_i in eq. (40)) due to other parameters. In order to detect and distinguish the signals of different parameters, a relatively large sample size is required. Currently we do not have an analytical form to calculate the minimum sample size required for that. Based on our experience, a sample size of 1000 is generally good for models with fewer than 20 parameters. A statistically robust method to achieve a desired precision level for estimated sensitivity indices is especially important for complex models, since trial-and-error approaches (i.e., trying many replicates at different sample sizes to reach the desired precision) are computationally inefficient. Thus, our proposed method is a first step toward a precise determination of sample size for FAST applications in complex models.
The proposed standard error estimation for the sensitivity indices is mainly based on the assumption that the model parameters are independent. In practice, it is common for parameters to be correlated. Xu and Gertner [12,13] proposed a simple random reordering approach to generate samples for parameters with a specified rank correlation structure. The main idea is that, after generating samples for (\theta_1,\dots,\theta_n) [using the simple random sampling approach or the random balance design sampling approach] and the corresponding samples for the parameters (x_1,\dots,x_n) with the search function of eq. (2), the sampled parameter values are reordered to honor a specified correlation structure using Iman and Conover's method [26]. Samples of (\theta_1,\dots,\theta_n) are reordered correspondingly based on the order of the parameter samples. For first-order sensitivity indices, it is straightforward to extend our proposed approaches for standard error estimation to models with correlated parameters. For a model y = g(x_1,\dots,x_n) with dependent parameters, the model output can be decomposed as follows for a specific model parameter x_i,

y = u(x_i) + \varepsilon_{(x_{-i})}   (51)

where

u(x_i) = E(g(x_1,\dots,x_n) \mid x_i)   (52)

and \varepsilon_{(x_{-i})} is the random error that arises from the uncertainty in the parameters other than x_i, with

E(\varepsilon_{(x_{-i})}) = 0, \qquad V(\varepsilon_{(x_{-i})}) = V(y) - V\big(E(g(x_1,\dots,x_n) \mid x_i)\big).   (53)

With the search function of eq. (2), we can see from eq. (51) that the model output y becomes a periodic function of \theta_i. Based on a Fourier transformation over \theta_i, the partial variance contributed by parameter x_i can be calculated as [11]

V_{x_i} = \sum_{|r_i|=1}^{M} \big|C^{(u_i)}_{r_i}\big|^2,   (54)

where \hat{C}^{(u_i)}_{r_i} = \frac{1}{N}\sum_{j=1}^{N} g(G(\theta_1^{(j)}),\dots,G(\theta_n^{(j)}))\,e^{-i r_i \theta_i^{(j)}}. With the sine and cosine Fourier coefficients of \hat{C}^{(u_i)}_{r_i}, we can estimate the first-order sensitivity indices of all parameters using eq. (22). It is noteworthy that the decomposition of y in the parameter space with eq. (51) is equivalent to the decomposition in the θ-space with eq. (40). Therefore, we can still estimate the standard errors of first-order sensitivity indices using eq. (48) for models with correlated parameters based on random balance design sampling. The decomposition in the θ-space with eq. (40) is a general decomposition and is also valid for simple random sampling for first-order sensitivity indices. Therefore, eq. (36) can still be used to estimate the standard errors of first-order sensitivity indices for models with correlated parameters based on simple random sampling. Due to the difficulty of distinguishing correlated effects from interaction effects, FAST analysis has not yet been proposed for the calculation of higher-order sensitivity indices in models with correlated parameters.
6 Acknowledgements

This study was supported by U.S. Department of Agriculture McIntire-Stennis funds (MS 875-359) and NIH grant R01-AI54954-0IA2. We thank two anonymous reviewers for their very helpful comments, which substantially improved this paper.
References:
1. A. Saltelli, S. Tarantola, and F. Campolongo, Sensitivity analysis as an ingredient of modeling. Statistical Science 15 (2000), pp. 377-395.
2. A. Saltelli, M. Ratto, S. Tarantola, and F. Campolongo, Sensitivity analysis for chemical models. Chem Rev 105 (2005), pp. 2811-2826.
3. A. Saltelli, K. Chan, and M. Scott, Sensitivity Analysis. John Wiley and Sons, West Sussex, 2000.
4. A. Saltelli and S. Tarantola, On the relative importance of input factors in mathematical models: safety assessment for nuclear waste disposal. J Am Stat Assoc 97 (2002), pp. 702-709.
5. R.I. Cukier, J.H. Schaibly, and K.E. Shuler, Study of the sensitivity of coupled reaction systems to uncertainties in rate coefficients. III. Analysis of the approximations. J Chem Phys 63 (1975), pp. 1140-1149.
6. R.I. Cukier, H.B. Levine, and K.E. Shuler, Nonlinear sensitivity analysis of multiparameter model systems. Journal of Computational Physics 26 (1978), pp. 1-42.
7. J.H. Schaibly and K.E. Shuler, Study of the sensitivity of coupled reaction systems to uncertainties in rate coefficients. II. Applications. J Chem Phys 59 (1973), pp. 3879-3888.
8. R.I. Cukier, C.M. Fortuin, K.E. Shuler, A.G. Petschek, and J.H. Schaibly, Study of the sensitivity of coupled reaction systems to uncertainties in rate coefficients. I. Theory. J Chem Phys 59 (1973), pp. 3873-3878.
9. E. Borgonovo, Measuring uncertainty importance: investigation and comparison of alternative approaches. Risk Anal 26 (2006), pp. 1349-1361.
10. S. Tarantola, D. Gatelli, and T.A. Mara, Random balance designs for the estimation of first order global sensitivity indices. Reliab Eng Syst Safe 91 (2006), pp. 717-727.
11. C. Xu and G.Z. Gertner, Understanding and comparisons of different sampling approaches for the Fourier Amplitudes Sensitivity Test (FAST). Comput Stat Data An (2010), in press.
12. C. Xu and G.Z. Gertner, Extending a global sensitivity analysis technique to models with correlated parameters. Comput Stat Data An 51 (2007), pp. 5579-5590.
13. C. Xu and G.Z. Gertner, A general first-order global sensitivity analysis method. Reliab Eng Syst Safe 93 (2008), pp. 1060-1071.
14. A. Saltelli, S. Tarantola, and K.P.S. Chan, A quantitative model-independent method for global sensitivity analysis of model output. Technometrics 41 (1999), pp. 39-56.
15. R. Scheller and D. Mladenoff, An ecological classification of forest landscape simulation models: tools and strategies for understanding broad-scale forested ecosystems. Landsc Ecol 22 (2007), pp. 491-505.
16. H.S. He, Forest landscape models: definitions, characterization, and classification. For Ecol Manag 254 (2008), pp. 484-498.
17. C. Xu, G.Z. Gertner, and R.M. Scheller, Uncertainties in the response of a forest landscape to global climatic change. Glob Change Biol 15 (2009), pp. 116-131.
18. P. Reiter, Oviposition, dispersal, and survival in Aedes aegypti: implications for the efficacy of control strategies. Vector Borne Zoonot Dis 7 (2007), pp. 261-274.
19. M. Otero, N. Schweigmann, and H.G. Solari, A stochastic spatial dynamical model for Aedes aegypti. Bull Math Biol 70 (2008), pp. 1297-1325.
20. K. Magori, M. Legros, M.E. Puente, D.A. Focks, T.W. Scott, et al., Skeeter Buster: a stochastic, spatially-explicit modeling tool for studying Aedes aegypti population replacement and population suppression strategies. Plos Neglect Trop Dis 3 (2009), p. e508.
21. S.F. Fang, G.Z. Gertner, S. Shinkareva, G.X. Wang, and A. Anderson, Improved generalized Fourier Amplitude Sensitivity Test (FAST) for model assessment. Stat Comput 13 (2003), pp. 221-226.
22. Y. Lu and S. Mohanty, Sensitivity analysis of a complex, proposed geologic waste disposal system using the Fourier Amplitude Sensitivity Test method. Reliab Eng Syst Safe 72 (2001), pp. 275-291.
23. D.H. Meadows, D.L. Meadows, and J. Randers, Beyond the Limits. Chelsea Green Publishing Company, Post Mills, Vermont, 1992.
24. R.L. Iman, M.E. Johnson, and T.A. Schroeder, Assessing hurricane effects. Part 1. Sensitivity analysis. Reliab Eng Syst Safe 78 (2002), pp. 131-145.
25. A. Kanso, G. Chebbo, and B. Tassin, Application of MCMC-GSA model calibration method to urban runoff quality modeling. Reliab Eng Syst Safe 91 (2006), pp. 1398-1405.
26. R.L. Iman and W.J. Conover, A distribution-free approach to inducing rank correlation among input variables. Commun Stat-Simul C 11 (1982), pp. 311-334.
27. E.L. Lehmann, Elements of Large-Sample Theory. Springer, New York, 1999.
Tables and Figures

Table 1 Parameter specifications of uniform distributions for the World 3 model

Parameter  Label                                                             Lower bound   Upper bound
x1         industrial output per capita desired                              315           385
x2         industrial capital output ratio before 1995                       2.7           3.3
x3         fraction of industrial output allocated to consumption
           before 1995                                                       0.387         0.473
x4         fraction of industrial output allocated to consumption
           after 1995                                                        0.387         0.473
x5         average life of industrial capital before 1995                    12.6          15.4
x6         average life of industrial capital after 1995                     16.2          19.8
x7         initial industrial capital                                        1.89 x 10^11  2.31 x 10^11
[Figure 1: bar charts of sensitivity indices (panels a and c) and standard errors (panels b and d) for parameters x1, x2, x3 and their interactions.]

Figure 1 Estimated sensitivity indices and their standard errors based on simple random sampling. The left column [panel (a) for a sample size of 1000 and panel (c) for a sample size of 5000] shows the sensitivity indices, and the right column [panel (b) for a sample size of 1000 and panel (d) for a sample size of 5000] shows the standard errors. For the sensitivity indices in (a) and (c), the white bars represent the FAST-based sensitivity indices using a randomly selected replicate, while the gray bars represent the analytically derived sensitivity indices. For the standard errors in (b) and (d), the white bars represent estimated standard errors based on the delta method. The gray bars represent the standard errors of sensitivity indices calculated based on 20 sample replicates.
[Figure 2: bar charts of sensitivity indices (panels a and c) and standard errors (panels b and d) for parameters x1, x2, x3 and their interactions.]

Figure 2 Estimated sensitivity indices and their standard errors based on random balance design sampling. The left column [panel (a) for a sample size of 1000 and panel (c) for a sample size of 5000] shows the sensitivity indices, and the right column [panel (b) for a sample size of 1000 and panel (d) for a sample size of 5000] shows the standard errors. For the sensitivity indices in (a) and (c), the white bars represent the FAST-based sensitivity indices using a randomly selected replicate, while the gray bars represent the analytically derived sensitivity indices. For the standard errors in (b) and (d), the white bars represent estimated standard errors using the delta method. The gray bars represent the standard errors of sensitivity indices calculated based on 20 sample replicates.
[Figure 3: bar chart of coefficients of variation for parameters x1, x2, x3 and their interactions.]

Figure 3 Coefficients of variation for the test model in eq. (49) based on simple random sampling with a sample size of 5000.
[Figure 4: time series (years 1850 to 2150) of sensitivity indices (left column) and their standard errors (right column) for x2, x3, x5, (x2, x3), and (x3, x5).]

Figure 4 Estimated sensitivity indices (left column) and their standard errors (right column) using simple random sampling with a sample size of 1000. The solid lines indicate sensitivity indices and standard errors estimated based on 20 sample replicates. The dotted lines indicate sensitivity indices and standard errors estimated using a single sample of size 1000. The standard errors are estimated using the delta method.
[Figure 5: time series (years 1850 to 2150) of sensitivity indices (left column) and their standard errors (right column) for x2, x3, x5, (x2, x3), and (x3, x5).]

Figure 5 Estimated sensitivity indices (left column) and their standard errors (right column) using simple random sampling with a sample size of 5000. The solid lines indicate sensitivity indices and standard errors estimated based on 20 sample replicates. The dotted lines indicate sensitivity indices and standard errors estimated using a single sample of size 5000. The standard errors are estimated using the delta method.
[Figure 6 graphic: paired panels for x2, x3, x5, (x2, x3) and (x3, x5); x-axes: Year (1850-2150); y-axes: sensitivity index (left column) and standard error (right column).]
Figure 6 Estimated sensitivity indices (left column) and their standard errors (right column) using random balance design sampling with a sample size of 1000. The solid lines indicate sensitivity indices and standard errors estimated based on 20 sample replicates. The dotted lines indicate sensitivity indices and standard errors estimated based on a single sample of size 1000. The standard errors are estimated using the delta method.
[Figure 7 graphic: paired panels for x2, x3, x5, (x2, x3) and (x3, x5); x-axes: Year (1850-2150); y-axes: sensitivity index (left column) and standard error (right column).]
Figure 7 Estimated sensitivity indices (left column) and their standard errors (right column) using random balance design sampling with a sample size of 5000. The solid lines indicate sensitivity indices and standard errors estimated based on 20 sample replicates. The dotted lines indicate sensitivity indices and standard errors estimated based on a single sample. The standard errors are estimated using the delta method.
[Figure 8 graphic: minimum sample size (y-axis, 0-10000) versus standard error (x-axis, 0.00-0.06).]
Figure 8 Minimum sample size required for specified standard errors for a sensitivity index of 0.5 for the last simulation year of the World3 model.
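The shape of the curve in Figure 8 can be reproduced approximately by inverting the standard-error expression derived in Appendix F. A minimal sketch (the function name is ours, and we assume the simplified form $\hat\sigma(\hat S_{x_i}) \approx \sqrt{2S(1-S)/N}$, dropping the variance-of-the-variance term, which slightly underestimates the required sample size):

```python
import math

def min_sample_size(s_index, target_se):
    """Smallest N such that sqrt(2 * S * (1 - S) / N) <= target_se.

    Simplified Appendix F approximation: the variance-of-the-variance
    term is dropped, so this is a slight underestimate.
    """
    return math.ceil(2.0 * s_index * (1.0 - s_index) / target_se ** 2)

# For a sensitivity index of 0.5, as in Figure 8:
print(min_sample_size(0.5, 0.01))  # 5000 model runs
print(min_sample_size(0.5, 0.05))  # 200 model runs
```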
Appendix A: Delta method

For a statistic $T_N$ with $E(T_N)=\theta$ and $V(T_N)=\sigma^2$,
$$\sqrt{N}\,(T_N-\theta)\ \xrightarrow{D}\ N(0,\sigma^2),$$
where $\xrightarrow{D}$ indicates convergence in distribution as $N\to\infty$, and $N(0,\sigma^2)$ indicates a normal distribution with mean zero and variance $\sigma^2$. Then, for a function $g(\theta)$ whose derivative $g'(\theta)$ is not zero, based on the delta method we have
$$\sqrt{N}\,\big(g(T_N)-g(\theta)\big)\ \xrightarrow{D}\ N\big(0,[g'(\theta)]^2\sigma^2\big).$$
If $T_N$ is a multivariate vector with covariance matrix $\Sigma$, and
$$\sqrt{N}\,(T_N-\theta)\ \xrightarrow{D}\ N(0,\Sigma),$$
then for a function $g(\theta)$ with $\theta=(\theta_1,\ldots,\theta_m)$, the delta method states that
$$\sqrt{N}\,\big(g(T_N)-g(\theta)\big)\ \xrightarrow{D}\ N\big(0,\nabla^{T}\Sigma\,\nabla\big),$$
where
$$\nabla=\left(\frac{\partial g(\theta)}{\partial\theta_1},\ldots,\frac{\partial g(\theta)}{\partial\theta_m}\right)\neq 0.$$
The delta method is based on a first-order Taylor series expansion. For a detailed proof, please refer to Lehmann [27].
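The univariate statement above is easy to check numerically. A minimal sketch (the choice of statistic and transformation is ours, for illustration only): with $T_N$ the mean of $N$ Exp(1) draws (so $\theta=\sigma^2=1$) and $g(t)=t^2$, the delta method predicts $V(g(T_N))\approx[g'(\theta)]^2\sigma^2/N$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, reps = 1000, 4000

# Replicate estimates of T_N (the sample mean), then apply g(t) = t^2.
T = rng.exponential(scale=1.0, size=(reps, N)).mean(axis=1)
empirical_var = np.square(T).var()

# Delta-method approximation [g'(theta)]^2 * sigma^2 / N with g'(t) = 2t.
delta_var = (2.0 * 1.0) ** 2 * 1.0 / N

print(empirical_var, delta_var)  # the two values agree to within a few percent
```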
Appendix B: Covariance between estimated Fourier coefficients for simple random sampling

The covariance between the cosine Fourier coefficient $\hat a_r^{(\omega)}$ and the sine Fourier coefficient $\hat b_{r'}^{(\omega)}$ (where $r=(r_1,\ldots,r_n)$ is a vector of harmonics and $r'$ is another vector of harmonics, which may be the same as or different from $r$) can be calculated as
$$
\begin{aligned}
\operatorname{Cov}(\hat a_r^{(\omega)},\hat b_{r'}^{(\omega)})
&=E\Big[\frac{1}{N}\sum_{j=1}^{N}g(\varphi_1^{(j)},\ldots,\varphi_n^{(j)})\cos\big(r(\omega)^T\varphi^{(j)}\big)\cdot\frac{1}{N}\sum_{j=1}^{N}g(\varphi_1^{(j)},\ldots,\varphi_n^{(j)})\sin\big(r'(\omega)^T\varphi^{(j)}\big)\Big]-a_r^{(\omega)}b_{r'}^{(\omega)}\\
&=\frac{1}{N^2}\sum_{j=1}^{N}E\big[g^2(\varphi_1^{(j)},\ldots,\varphi_n^{(j)})\cos\big(r(\omega)^T\varphi^{(j)}\big)\sin\big(r'(\omega)^T\varphi^{(j)}\big)\big]\\
&\quad+\frac{1}{N^2}\sum_{j,k=1,\,j\neq k}^{N}E\big[g(\varphi_1^{(j)},\ldots,\varphi_n^{(j)})\cos\big(r(\omega)^T\varphi^{(j)}\big)\,g(\varphi_1^{(k)},\ldots,\varphi_n^{(k)})\sin\big(r'(\omega)^T\varphi^{(k)}\big)\big]-a_r^{(\omega)}b_{r'}^{(\omega)},
\end{aligned}
\qquad\text{(B1)}
$$
where $\varphi^{(j)}=(\varphi_1^{(j)},\ldots,\varphi_n^{(j)})$ and $(\cdot)^T$ represents the transpose of a vector. The first term of the above equation can be simplified as
$$
\frac{1}{N^2}\sum_{j=1}^{N}E\big[g^2(\varphi^{(j)})\cos\big(r(\omega)^T\varphi^{(j)}\big)\sin\big(r'(\omega)^T\varphi^{(j)}\big)\big]
=\frac{1}{N}E\big[g^2(\varphi_1,\ldots,\varphi_n)\cos\big(r(\omega)^T\varphi\big)\sin\big(r'(\omega)^T\varphi\big)\big].
\qquad\text{(B2)}
$$
Since $\varphi^{(j)}=(\varphi_1^{(j)},\ldots,\varphi_n^{(j)})$ is drawn independently of $\varphi^{(k)}=(\varphi_1^{(k)},\ldots,\varphi_n^{(k)})$, the second term of eq. (B1) can be simplified as follows:
$$
\begin{aligned}
\frac{1}{N^2}\sum_{j,k=1,\,j\neq k}^{N}E\big[g(\varphi^{(j)})\cos\big(r(\omega)^T\varphi^{(j)}\big)\,g(\varphi^{(k)})\sin\big(r'(\omega)^T\varphi^{(k)}\big)\big]
&=\frac{1}{N^2}\sum_{j,k=1,\,j\neq k}^{N}E\big[g(\varphi^{(j)})\cos\big(r(\omega)^T\varphi^{(j)}\big)\big]\,E\big[g(\varphi^{(k)})\sin\big(r'(\omega)^T\varphi^{(k)}\big)\big]\\
&=\frac{1}{N^2}\,N(N-1)\,a_r^{(\omega)}b_{r'}^{(\omega)}
=\frac{N-1}{N}\,a_r^{(\omega)}b_{r'}^{(\omega)}.
\end{aligned}
\qquad\text{(B3)}
$$
Finally, based on eqs. (B1), (B2) and (B3), we have
$$
\begin{aligned}
\operatorname{Cov}(\hat a_r^{(\omega)},\hat b_{r'}^{(\omega)})
&=\frac{1}{N}E\big[g^2(\varphi)\cos\big(r(\omega)^T\varphi\big)\sin\big(r'(\omega)^T\varphi\big)\big]+\frac{N-1}{N}a_r^{(\omega)}b_{r'}^{(\omega)}-a_r^{(\omega)}b_{r'}^{(\omega)}\\
&=\frac{1}{N}\Big\{E\big[g^2(\varphi)\cos\big(r(\omega)^T\varphi\big)\sin\big(r'(\omega)^T\varphi\big)\big]-a_r^{(\omega)}b_{r'}^{(\omega)}\Big\}.
\end{aligned}
$$
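The covariance formula derived in this appendix is easy to verify by Monte Carlo in one dimension. A sketch (the test function $g$ and the harmonics are arbitrary choices of ours, not the World3 model): with $\varphi$ uniform on $[0,2\pi)$, the covariance of the replicate estimates of $\hat a_r$ and $\hat b_{r'}$ should match $\frac{1}{N}\{E[g^2\cos(r\varphi)\sin(r'\varphi)]-a_r b_{r'}\}$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, reps, r, rp = 200, 20000, 1, 2
g = lambda phi: np.sin(phi) + 0.5 * np.cos(2.0 * phi) ** 2  # arbitrary test g

# Empirical covariance of the two coefficient estimators over many replicates.
phi = rng.uniform(0.0, 2.0 * np.pi, size=(reps, N))
a_hat = (g(phi) * np.cos(r * phi)).mean(axis=1)
b_hat = (g(phi) * np.sin(rp * phi)).mean(axis=1)
empirical_cov = np.cov(a_hat, b_hat)[0, 1]

# Analytical value, with the expectations evaluated on one very large sample.
big = rng.uniform(0.0, 2.0 * np.pi, size=2_000_000)
a_r = (g(big) * np.cos(r * big)).mean()
b_rp = (g(big) * np.sin(rp * big)).mean()
cross = (g(big) ** 2 * np.cos(r * big) * np.sin(rp * big)).mean()
analytic_cov = (cross - a_r * b_rp) / N

print(empirical_cov, analytic_cov)  # both close to (1/16)/N ≈ 3.1e-4
```

For this particular $g$, the cross expectation works out to $1/16$ and $a_r=b_{r'}=0$, so the two estimates should agree to within a few percent.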
Appendix C: Covariance between estimated Fourier coefficients and sample variance for simple random sampling

The covariance between the estimated cosine Fourier coefficient $\hat a_r$ and the sample variance $\hat\sigma^2$ can be estimated as
$$
\begin{aligned}
\operatorname{Cov}(\hat a_r,\hat\sigma^2)
&=E\Big[\frac{1}{N}\sum_{j=1}^{N}g(\varphi_1^{(j)},\ldots,\varphi_n^{(j)})\cos\big(r(\omega)^T\varphi^{(j)}\big)\cdot\frac{1}{N-1}\sum_{j=1}^{N}\big[g(\varphi_1^{(j)},\ldots,\varphi_n^{(j)})-\bar g\big]^2\Big]-a_r\sigma^2\\
&=\frac{1}{N(N-1)}\sum_{j=1}^{N}E\Big\{g(\varphi_1^{(j)},\ldots,\varphi_n^{(j)})\cos\big(r(\omega)^T\varphi^{(j)}\big)\big[g(\varphi_1^{(j)},\ldots,\varphi_n^{(j)})-\bar g\big]^2\Big\}\\
&\quad+\frac{1}{N(N-1)}\sum_{j,k=1,\,j\neq k}^{N}E\Big\{g(\varphi_1^{(j)},\ldots,\varphi_n^{(j)})\cos\big(r(\omega)^T\varphi^{(j)}\big)\big[g(\varphi_1^{(k)},\ldots,\varphi_n^{(k)})-\bar g\big]^2\Big\}-a_r\sigma^2,
\end{aligned}
\qquad\text{(C1)}
$$
where $\bar g$ denotes the sample mean $\frac{1}{N}\sum_{j=1}^{N}g(\varphi_1^{(j)},\ldots,\varphi_n^{(j)})$. The first term of the above equation can be simplified as
$$
\frac{1}{N(N-1)}\sum_{j=1}^{N}E\Big\{g(\varphi^{(j)})\cos\big(r(\omega)^T\varphi^{(j)}\big)\big[g(\varphi^{(j)})-\bar g\big]^2\Big\}
=\frac{1}{N-1}E\Big\{g(\varphi_1,\ldots,\varphi_n)\cos\big(r(\omega)^T\varphi\big)\big[g(\varphi_1,\ldots,\varphi_n)-\bar g\big]^2\Big\}.
$$
Let $u=E[g(\varphi_1,\ldots,\varphi_n)]$. Based on the Central Limit Theorem, for a large sample size $N$ we have
$$\bar g\sim N\Big(u,\frac{\sigma^2}{N}\Big)\quad\text{and}\quad E(\bar g-u)^2=\frac{\sigma^2}{N}.$$
The second term of eq. (C1) can be simplified as
$$
\begin{aligned}
\frac{1}{N(N-1)}\sum_{j,k=1,\,j\neq k}^{N}E\Big\{g(\varphi^{(j)})\cos\big(r(\omega)^T\varphi^{(j)}\big)\big[g(\varphi^{(k)})-\bar g\big]^2\Big\}
&\approx a_r\,E\big[g(\varphi_1,\ldots,\varphi_n)-\bar g\big]^2\\
&=a_r\,E\big[(g(\varphi_1,\ldots,\varphi_n)-u)-(\bar g-u)\big]^2\\
&=a_r\Big\{E\big[g(\varphi)-u\big]^2+E\big[\bar g-u\big]^2-2E\big[(g(\varphi)-u)(\bar g-u)\big]\Big\}\\
&=a_r\Big(\sigma^2+\frac{\sigma^2}{N}-\frac{2\sigma^2}{N}\Big)
=a_r\sigma^2-\frac{a_r\sigma^2}{N}.
\end{aligned}
$$
Finally, we have
$$
\begin{aligned}
\operatorname{Cov}(\hat a_r,\hat\sigma^2)
&=\frac{1}{N-1}E\Big\{g(\varphi)\cos\big(r(\omega)^T\varphi\big)\big[g(\varphi)-\bar g\big]^2\Big\}+a_r\sigma^2-\frac{a_r\sigma^2}{N}-a_r\sigma^2\\
&=\frac{1}{N-1}E\Big\{g(\varphi)\cos\big(r(\omega)^T\varphi\big)\big[g(\varphi)-\bar g\big]^2\Big\}-\frac{a_r\sigma^2}{N}.
\end{aligned}
$$
Similarly, we can show that
$$
\operatorname{Cov}(\hat b_r,\hat\sigma^2)
=\frac{1}{N-1}E\Big\{g(\varphi)\sin\big(r(\omega)^T\varphi\big)\big[g(\varphi)-\bar g\big]^2\Big\}-\frac{b_r\sigma^2}{N}.
$$
Appendix D: Covariance between estimated Fourier coefficients for random balance design sampling

The covariance between the cosine Fourier coefficient $\hat a_{00..r_i..00}$ and the sine Fourier coefficient $\hat b_{00..r_i'..00}$ (where the harmonic $r_i$ can be the same as or different from $r_i'$, with $|r_i|,|r_i'|=1,\ldots,M$) can be calculated as
$$
\begin{aligned}
\operatorname{Cov}(\hat a_{00..r_i..00},\hat b_{00..r_i'..00})
&=\operatorname{Cov}\Big[\frac{1}{N}\sum_{j=1}^{N}g(\varphi_1^{(j)},\ldots,\varphi_n^{(j)})\cos\big(r_i\varphi_i^{(j)}\big),\ \frac{1}{N}\sum_{j=1}^{N}g(\varphi_1^{(j)},\ldots,\varphi_n^{(j)})\sin\big(r_i'\varphi_i^{(j)}\big)\Big]\\
&=\operatorname{Cov}\Big[\frac{1}{N}\sum_{j=1}^{N}\big(u(\varphi_i^{(j)})+\varepsilon_i^{(j)}\big)\cos\big(r_i\varphi_i^{(j)}\big),\ \frac{1}{N}\sum_{j=1}^{N}\big(u(\varphi_i^{(j)})+\varepsilon_i^{(j)}\big)\sin\big(r_i'\varphi_i^{(j)}\big)\Big]\\
&=\operatorname{Cov}\Big[\frac{1}{N}\sum_{j=1}^{N}u(\varphi_i^{(j)})\cos\big(r_i\varphi_i^{(j)}\big)+\frac{1}{N}\sum_{j=1}^{N}\varepsilon_i^{(j)}\cos\big(r_i\varphi_i^{(j)}\big),\\
&\qquad\quad\ \frac{1}{N}\sum_{j=1}^{N}u(\varphi_i^{(j)})\sin\big(r_i'\varphi_i^{(j)}\big)+\frac{1}{N}\sum_{j=1}^{N}\varepsilon_i^{(j)}\sin\big(r_i'\varphi_i^{(j)}\big)\Big],
\end{aligned}
$$
where $u(\varphi_i^{(j)})=E\big[g(\varphi_1^{(j)},\ldots,\varphi_n^{(j)})\mid\varphi_i^{(j)}\big]$ and $\varepsilon_i^{(j)}$ is the residual, which is independent of $\varphi_i^{(j)}$ (see Section 2.2 for details).

By using a grid sampling for $\varphi_i$, the estimation error of $\frac{1}{N}\sum_{j=1}^{N}u(\varphi_i^{(j)})\cos(r_i\varphi_i^{(j)})$ is substantially smaller than that of $\frac{1}{N}\sum_{j=1}^{N}\varepsilon_i^{(j)}\cos(r_i\varphi_i^{(j)})$ for a relatively large sample size (see Section 2.2 for details). Similarly, the estimation error of $\frac{1}{N}\sum_{j=1}^{N}u(\varphi_i^{(j)})\sin(r_i'\varphi_i^{(j)})$ is substantially smaller than that of $\frac{1}{N}\sum_{j=1}^{N}\varepsilon_i^{(j)}\sin(r_i'\varphi_i^{(j)})$. Thus, we treat $\frac{1}{N}\sum_{j=1}^{N}u(\varphi_i^{(j)})\cos(r_i\varphi_i^{(j)})$ and $\frac{1}{N}\sum_{j=1}^{N}u(\varphi_i^{(j)})\sin(r_i'\varphi_i^{(j)})$ as constants. In addition, based on eq. (44), we have
$$E\big[\varepsilon_i\cos(r_i\varphi_i)\big]=E\big[\hat a^{(\omega,\varepsilon)}_{00..r_i..00}\big]=0,\qquad E\big[\varepsilon_i\sin(r_i'\varphi_i)\big]=E\big[\hat b^{(\omega,\varepsilon)}_{00..r_i'..00}\big]=0.$$
Thus, $\operatorname{Cov}(\hat a_{00..r_i..00},\hat b_{00..r_i'..00})$ is simplified as
$$
\begin{aligned}
\operatorname{Cov}(\hat a_{00..r_i..00},\hat b_{00..r_i'..00})
&=E\Big[\frac{1}{N}\sum_{j=1}^{N}\varepsilon_i^{(j)}\cos\big(r_i\varphi_i^{(j)}\big)\cdot\frac{1}{N}\sum_{j=1}^{N}\varepsilon_i^{(j)}\sin\big(r_i'\varphi_i^{(j)}\big)\Big]\\
&=\frac{1}{N^2}\sum_{j=1}^{N}E\big[(\varepsilon_i^{(j)})^2\cos\big(r_i\varphi_i^{(j)}\big)\sin\big(r_i'\varphi_i^{(j)}\big)\big]
+\frac{1}{N^2}\sum_{j,k=1,\,j\neq k}^{N}E\big[\varepsilon_i^{(j)}\cos\big(r_i\varphi_i^{(j)}\big)\,\varepsilon_i^{(k)}\sin\big(r_i'\varphi_i^{(k)}\big)\big]\\
&=\frac{1}{N}E\big[\varepsilon_i^2\cos(r_i\varphi_i)\sin(r_i'\varphi_i)\big]+\frac{N-1}{N}E\big[\varepsilon_i\cos(r_i\varphi_i)\big]\,E\big[\varepsilon_i\sin(r_i'\varphi_i)\big].
\end{aligned}
$$
Finally, in view that $\varepsilon_i$ is independent of $\varphi_i$, we have
$$
\operatorname{Cov}(\hat a_{00..r_i..00},\hat b_{00..r_i'..00})
=\frac{1}{N}E[\varepsilon_i^2]\,E\big[\cos(r_i\varphi_i)\sin(r_i'\varphi_i)\big]+\frac{N-1}{N}E^2[\varepsilon_i]\,E\big[\cos(r_i\varphi_i)\big]\,E\big[\sin(r_i'\varphi_i)\big].
$$
Using the fact that
$$E\big[\cos(r_i\varphi_i)\sin(r_i'\varphi_i)\big]=\frac{1}{2\pi}\int_0^{2\pi}\cos(r_i\varphi_i)\sin(r_i'\varphi_i)\,d\varphi_i=0$$
and
$$E[\varepsilon_i]=0,$$
we have
$$\operatorname{Cov}(\hat a_{00..r_i..00},\hat b_{00..r_i'..00})=0.$$
Similarly, we can show that the covariance between $\hat a_{00..r_i..00}$ and $\hat a_{00..r_i'..00}$ ($r_i\neq r_i'$) is
$$\operatorname{Cov}(\hat a_{00..r_i..00},\hat a_{00..r_i'..00})=\frac{1}{N}E[\varepsilon_i^2]\,E\big[\cos(r_i\varphi_i)\cos(r_i'\varphi_i)\big]=0,$$
in view that
$$E\big[\cos(r_i\varphi_i)\cos(r_i'\varphi_i)\big]=\frac{1}{2\pi}\int_0^{2\pi}\cos(r_i\varphi_i)\cos(r_i'\varphi_i)\,d\varphi_i=0,\quad\text{for }r_i\neq r_i'.$$
The covariance between $\hat b_{00..r_i..00}$ and $\hat b_{00..r_i'..00}$ can be calculated as
$$\operatorname{Cov}(\hat b_{00..r_i..00},\hat b_{00..r_i'..00})=\frac{1}{N}E[\varepsilon_i^2]\,E\big[\sin(r_i\varphi_i)\sin(r_i'\varphi_i)\big]=0,$$
in view that
$$E\big[\sin(r_i\varphi_i)\sin(r_i'\varphi_i)\big]=\frac{1}{2\pi}\int_0^{2\pi}\sin(r_i\varphi_i)\sin(r_i'\varphi_i)\,d\varphi_i=0,\quad\text{for }r_i\neq r_i'.$$
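The trigonometric orthogonality facts used in this appendix are easy to confirm numerically; a quick sketch over one full period:

```python
import numpy as np

# Averages of the cross products over a uniform grid on [0, 2*pi);
# all three vanish for distinct harmonics r != r'.
phi = np.linspace(0.0, 2.0 * np.pi, 100_000, endpoint=False)
r, rp = 2, 3
cos_sin = np.mean(np.cos(r * phi) * np.sin(rp * phi))
cos_cos = np.mean(np.cos(r * phi) * np.cos(rp * phi))
sin_sin = np.mean(np.sin(r * phi) * np.sin(rp * phi))

print(cos_sin, cos_cos, sin_sin)  # all ~ 0 (up to floating-point error)
```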
Appendix E: Covariance between estimated Fourier coefficients and sample variance of model output for random balance design sampling

Using eq. (42), the covariance between the cosine Fourier coefficient $\hat a_{00..r_i..00}$ and the sample variance $\hat\sigma^2$ can be calculated as
$$\operatorname{Cov}(\hat a_{00..r_i..00},\hat\sigma^2)=\operatorname{Cov}\big(\hat a^{(\omega,u)}_{00..r_i..00}+\hat a^{(\omega,\varepsilon)}_{00..r_i..00},\ \hat\sigma^2\big).\qquad\text{(E1)}$$
For random balance design sampling, the cosine coefficient
$$\hat a^{(\omega,u)}_{00..r_i..00}=\frac{1}{N}\sum_{j=1}^{N}\cos\big(r_i\varphi_i^{(j)}\big)\,u(\varphi_i^{(j)})$$
is estimated based on a grid sample of $\varphi_i$ (see eq. (38) for details), which has a much lower estimation error (the estimation error decreases at an order of $1/N^2$ with increasing sample size $N$) compared to that of $\hat a^{(\omega,\varepsilon)}_{00..r_i..00}$, which is based on simple random sampling (the estimation error decreases at an order of $1/N$ with increasing sample size $N$). Therefore, we treat $\hat a^{(\omega,u)}_{00..r_i..00}$ as a constant, and we have
$$\operatorname{Cov}(\hat a_{00..r_i..00},\hat\sigma^2)=\operatorname{Cov}\big(\hat a^{(\omega,\varepsilon)}_{00..r_i..00},\hat\sigma^2\big).$$
Using eq. (21) and eq. (42), we have
$$
\begin{aligned}
\operatorname{Cov}\big(\hat a^{(\omega,\varepsilon)}_{00..r_i..00},\hat\sigma^2\big)
&=E\Big\{\frac{1}{N}\sum_{j=1}^{N}\varepsilon_i^{(j)}\cos\big(r_i\varphi_i^{(j)}\big)\cdot\frac{1}{N}\sum_{j=1}^{N}\big[g\big(G(\varphi_1^{(j)}),\ldots,G(\varphi_n^{(j)})\big)-\bar g\big]^2\Big\}\\
&=\frac{1}{N}E\Big\{\varepsilon_i\cos(r_i\varphi_i)\big[g\big(G(\varphi_1),\ldots,G(\varphi_n)\big)-\bar g\big]^2\Big\}\\
&\quad+\frac{N(N-1)}{N^2}\,E\big(\varepsilon_i^{(j)}\cos\big(r_i\varphi_i^{(j)}\big)\big)\,E\big[g\big(G(\varphi_1^{(k)}),\ldots,G(\varphi_n^{(k)})\big)-\bar g\big]^2,
\end{aligned}
$$
where $\bar g$ denotes the sample mean of the model outputs $g\big(G(\varphi_1^{(j)}),\ldots,G(\varphi_n^{(j)})\big)$. Since $E\big(\varepsilon_i^{(j)}\cos(r_i\varphi_i^{(j)})\big)=E\big(\hat a^{(\omega,\varepsilon)}_{00..r_i..00}\big)=0$ (see eq. (43) for details), we have
$$\operatorname{Cov}\big(\hat a^{(\omega,\varepsilon)}_{00..r_i..00},\hat\sigma^2\big)=\frac{1}{N}E\Big\{\varepsilon_i\cos(r_i\varphi_i)\big[g\big(G(\varphi_1),\ldots,G(\varphi_n)\big)-\bar g\big]^2\Big\}.\qquad\text{(E2)}$$
Let
$$\zeta(\varphi_1,\ldots,\varphi_n)=\big[g\big(G(\varphi_1),\ldots,G(\varphi_n)\big)-\bar g\big]^2$$
and
$$u^{(\zeta)}(\varphi_i)=E\big[\zeta(\varphi_1,\ldots,\varphi_n)\mid\varphi_i\big];$$
we have
$$\zeta(\varphi_1,\ldots,\varphi_n)=u^{(\zeta)}(\varphi_i)+\varepsilon^{(\zeta)}_i,\qquad\text{(E3)}$$
where $\varepsilon^{(\zeta)}_i$ is a random error independent of $\varphi_i$ with $E\big(\varepsilon^{(\zeta)}_i\big)=0$. Using eq. (E3), the covariance in eq. (E2) can be simplified as follows:
$$
\begin{aligned}
\operatorname{Cov}\big(\hat a^{(\omega,\varepsilon)}_{00..r_i..00},\hat\sigma^2\big)
&=\frac{1}{N}E\big[\varepsilon_i\cos(r_i\varphi_i)\,u^{(\zeta)}(\varphi_i)\big]+\frac{1}{N}E\big[\varepsilon_i\cos(r_i\varphi_i)\,\varepsilon^{(\zeta)}_i\big]\\
&=\frac{1}{N}E[\varepsilon_i]\,E\big[\cos(r_i\varphi_i)\,u^{(\zeta)}(\varphi_i)\big]+\frac{1}{N}E\big[\varepsilon_i\varepsilon^{(\zeta)}_i\big]\,E\big[\cos(r_i\varphi_i)\big].
\end{aligned}
$$
In view that $E[\varepsilon_i]=0$ and $E\big[\cos(r_i\varphi_i)\big]=\frac{1}{2\pi}\int_0^{2\pi}\cos(r_i\varphi_i)\,d\varphi_i=0$, we have
$$\operatorname{Cov}(\hat a_{00..r_i..00},\hat\sigma^2)=\operatorname{Cov}\big(\hat a^{(\omega,\varepsilon)}_{00..r_i..00},\hat\sigma^2\big)=0.$$
Similarly, we can show that
$$\operatorname{Cov}(\hat b_{00..r_i..00},\hat\sigma^2)=\operatorname{Cov}\big(\hat b^{(\omega,\varepsilon)}_{00..r_i..00},\hat\sigma^2\big)=0.$$
Appendix F: Standard errors of first-order sensitivity indices for random balance design sampling

Standard errors of first-order sensitivity indices can be estimated based on the delta method and eq. (44) as follows:
$$
\hat\sigma(\hat S_{x_i})=\sqrt{\hat\nabla^{T}\hat\Sigma\,\hat\nabla}
=\sqrt{\sum_{\substack{|r_i|=1\\ \hat a_{00..r_i..00}\neq0}}^{M}\frac{4\hat a_{00..r_i..00}^{2}}{(\hat\sigma^{2})^{2}}\cdot\frac{\hat V(\varepsilon_i)}{2N}
+\sum_{\substack{|r_i|=1\\ \hat b_{00..r_i..00}\neq0}}^{M}\frac{4\hat b_{00..r_i..00}^{2}}{(\hat\sigma^{2})^{2}}\cdot\frac{\hat V(\varepsilon_i)}{2N}
+\Bigg[\frac{\sum_{\substack{|r_i|=1\\ \hat a_{00..r_i..00}\neq0,\,\hat b_{00..r_i..00}\neq0}}^{M}\big(\hat a_{00..r_i..00}^{2}+\hat b_{00..r_i..00}^{2}\big)}{(\hat\sigma^{2})^{2}}\Bigg]^{2}\hat V[\hat\sigma^{2}]},
$$
where $\hat a_{00..r_i..00}\neq0$ and $\hat b_{00..r_i..00}\neq0$ indicate that the cosine Fourier coefficient $\hat a_{00..r_i..00}$ and the sine Fourier coefficient $\hat b_{00..r_i..00}$, respectively, are significantly larger or smaller than zero based on the hypothesis test in eq. (37). Since we have
$$\hat V(\varepsilon_i)=\hat V(y)-\hat V_{x_i}=\hat\sigma^{2}-\hat S_{x_i}\hat\sigma^{2}=\hat\sigma^{2}(1-\hat S_{x_i}),$$
the estimated standard error $\hat\sigma(\hat S_{x_i})$ can be simplified, using $\sum\big(\hat a_{00..r_i..00}^{2}+\hat b_{00..r_i..00}^{2}\big)=\hat V_{x_i}=\hat S_{x_i}\hat\sigma^{2}$, as
$$
\begin{aligned}
\hat\sigma(\hat S_{x_i})
&=\sqrt{\frac{2(1-\hat S_{x_i})}{N\hat\sigma^{2}}\Big[\sum_{\hat a_{00..r_i..00}\neq0}\hat a_{00..r_i..00}^{2}+\sum_{\hat b_{00..r_i..00}\neq0}\hat b_{00..r_i..00}^{2}\Big]+\Big(\frac{\hat S_{x_i}}{\hat\sigma^{2}}\Big)^{2}\hat V[\hat\sigma^{2}]}\\
&=\sqrt{\frac{2}{N}(1-\hat S_{x_i})\hat S_{x_i}+\Big(\frac{\hat S_{x_i}}{\hat\sigma^{2}}\Big)^{2}\hat V[\hat\sigma^{2}]}.
\end{aligned}
$$
Based on the estimation of $\hat V[\hat\sigma^{2}]$ by eq. (32), we have
$$
\hat\sigma(\hat S_{x_i})=\sqrt{\frac{2}{N}(1-\hat S_{x_i})\hat S_{x_i}+\frac{\hat S_{x_i}^{2}}{\hat\sigma^{4}}\cdot\frac{1}{N-1}\big[\hat u^{(4)}-(\hat\sigma^{2})^{2}\big]}.
$$
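The closing expression of this appendix translates directly into code. A sketch (the function name and interface are ours; `y` is the vector of model outputs from a single random balance design sample and `s_hat` is the estimated first-order index):

```python
import numpy as np

def rbd_index_se(s_hat, y):
    """Delta-method standard error of a first-order sensitivity index,
    sqrt( 2*S*(1-S)/N + (S/sigma2)^2 * (u4 - sigma2^2) / (N - 1) ),
    where sigma2 is the sample variance and u4 the fourth central moment.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    sigma2 = y.var(ddof=1)                     # sigma_hat^2
    u4 = np.mean((y - y.mean()) ** 4)          # u_hat^(4)
    var_sigma2 = (u4 - sigma2 ** 2) / (n - 1)  # V_hat[sigma_hat^2]
    return float(np.sqrt(2.0 * s_hat * (1.0 - s_hat) / n
                         + (s_hat / sigma2) ** 2 * var_sigma2))

# Example: for Gaussian outputs u4 ≈ 3*sigma2^2, so with S = 0.5 and
# N = 1000 the standard error is roughly sqrt(1/N) ≈ 0.032.
y = np.random.default_rng(2).normal(size=1000)
print(rbd_index_se(0.5, y))
```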