Acta Mathematicae Applicatae Sinica, English Series Vol. 19, No. 1 (2003) 13–18

Comparison of MINQUE and Simple Estimate of the Error Variance in the General Linear Models

Song-gui Wang1, Mi-xia Wu2, Wei-qing Ma3

1,2 Department of Applied Mathematics, Beijing Polytechnic University, Beijing 100022, China (1 Email: [email protected])

3 Department of Probability and Statistics, Peking University, Beijing 100871, China

Abstract Comparison is made between the MINQUE and the simple estimate of the error variance in the normal linear model under the mean square error criterion, where the model matrix need not have full rank and the dispersion matrix can be singular. Our results show that neither estimate is always superior to the other. Some sufficient criteria for either one to be better than the other are established. Some interesting relations between these two estimates are also given.

Keywords General linear model, MINQUE, mean square error

2000 MR Subject Classification 62J05

1 Introduction

We consider the general linear model

$$y = X\beta + e, \qquad E(e) = 0, \qquad \operatorname{Cov}(e) = \sigma^2 V, \tag{1}$$

where $y$ is an $n \times 1$ observable random vector, the $n \times p$ matrix $X$ and the $n \times n$ nonnegative definite matrix $V$ are known, $\beta$ is a $p \times 1$ vector of unknown parameters, and the positive scalar $\sigma^2$ is also unknown. The error vector $e$ has the normal distribution $N(0, \sigma^2 V)$. The matrices $X$ and $V$ are both allowed to be of arbitrary rank. Throughout the paper, it is assumed that the model is consistent [5], i.e., $y \in \mathcal{M}(X \,\vdots\, V)$, where $\mathcal{M}(A)$ stands for the range of a matrix $A$ and $(A \,\vdots\, B)$ denotes the partitioned matrix with $A$ and $B$ placed adjacent to each other.

In the literature, there are two important estimates of $\sigma^2$. One of them is the MINQUE (Minimum Norm Quadratic Unbiased Estimate)

$$\hat\sigma_m^2 = y' M (M V M)^+ M y / k, \tag{2}$$

suggested by Rao [6], where $M = I - X(X'X)^+X'$, $A^+$ stands for the Moore-Penrose inverse of a matrix $A$, and $k = \operatorname{rank}(X \,\vdots\, V) - \operatorname{rank}(X)$. According to [6, Theorem 3.4], the MINQUE can be represented in several different forms. In fact, $\hat\sigma_m^2$ is the estimate of $\sigma^2$ based on the generalized least squares residuals, that is, $\hat\sigma_m^2 = (y - X\beta^*)' T^- (y - X\beta^*)/k$, where $T = V + XX'$, $A^-$ denotes a generalized inverse, and $\beta^* = (X'T^-X)^- X'T^- y$. Another estimate of $\sigma^2$ is given by

$$\hat\sigma_s^2 = y' M y / k, \tag{3}$$

Manuscript received September 18, 2000. Revised April 11, 2002. Partially supported by the National Natural Science Foundation of China (No.10271010), the Natural Science Foundation of Beijing and a Project of Science and Technology of Beijing Education Committee.

S.G. Wang, M.X. Wu, W.Q. Ma


which is obtained simply by replacing $V$ by $I$ in (2), and is called the simple estimate or the ordinary least squares estimate. Some authors have studied statistical properties of $\hat\sigma_s^2$ when $V$ has some special structures; see, for example, [2,4]. Groß [3] established some necessary and sufficient conditions for the equality $\hat\sigma_m^2 = \hat\sigma_s^2$ when $X$ and $V$ can be deficient in rank, without the normality assumption on the error distribution. The object of the present note is to make a further comparison of these two estimates. Obviously, in the general case $\hat\sigma_s^2$ need not even be unbiased. Thus the mean square error (MSE) criterion is adopted, where the mean square error of an estimate $\hat\theta$ of a scalar parameter $\theta$ is defined by $\operatorname{MSE}(\hat\theta) = E(\hat\theta - \theta)^2$. Some sufficient conditions are obtained for the inequality

$$\operatorname{MSE}(\hat\sigma_m^2) \le \operatorname{MSE}(\hat\sigma_s^2). \tag{4}$$

The reverse of (4), however, can also hold in some cases. Some interesting relations between these two estimates are also obtained. To illustrate the theoretical results, two examples are given.
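For readers who wish to experiment, the two estimates in (2) and (3) can be computed directly with NumPy. The following is a minimal sketch, not code from the paper: the function name `error_variance_estimates` and the toy data are illustrative assumptions, and the Moore-Penrose inverse is taken with `numpy.linalg.pinv`.

```python
import numpy as np


def error_variance_estimates(y, X, V):
    """Return (MINQUE, simple estimate) of sigma^2, following (2) and (3)."""
    n = X.shape[0]
    # M = I - X (X'X)^+ X' projects onto the orthogonal complement of M(X).
    M = np.eye(n) - X @ np.linalg.pinv(X.T @ X) @ X.T
    # k = rank(X : V) - rank(X)
    k = np.linalg.matrix_rank(np.hstack([X, V])) - np.linalg.matrix_rank(X)
    My = M @ y
    # y' M (MVM)^+ M y / k, equation (2)
    sigma2_m = My @ np.linalg.pinv(M @ V @ M) @ My / k
    # y' M y / k, equation (3); M is symmetric and idempotent
    sigma2_s = My @ My / k
    return sigma2_m, sigma2_s


# Toy data (illustrative only): a rank-deficient V = AA' and a consistent y.
rng = np.random.default_rng(0)
n = 8
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
A = rng.standard_normal((n, 5))          # V has rank 5 < n, so it is singular
V = A @ A.T
y = X @ np.array([1.0, 0.5]) + A @ rng.standard_normal(5)  # y lies in M(X : V)

sigma2_m, sigma2_s = error_variance_estimates(y, X, V)
print(sigma2_m, sigma2_s)
```

When $V = I$ the two returned values coincide, and when $V = tI$ the second is $t$ times the first, which gives a quick sanity check of the implementation.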

2 Comparison of the Estimates

The following lemmas are necessary for the proof of our main theorem.

Lemma 1. Let $\Sigma$ be a $p \times p$ nonnegative definite matrix with rank $r$. A random vector $X \sim N_p(\mu, \Sigma)$ if and only if $X = \mu + AU$, where $A$ is a $p \times r$ matrix with rank $r$ and $AA' = \Sigma$, and $U \sim N_r(0, I_r)$.

A proof can be found in [5].

Lemma 2. Let $X$ be an $n \times p$ matrix and $V$ an $n \times n$ nonnegative definite matrix. Then $\operatorname{rank}(VM) = \operatorname{rank}(V \,\vdots\, X) - \operatorname{rank}(X)$, where $M = I - X(X'X)^+X'$.

Proof. Denote by $\dim(S)$ the dimension of a linear space $S$. We have

$$\operatorname{rank}(VM) = \dim \mathcal{M}(VM) = \dim\{VMt : t \in \mathbb{R}^n\} = \dim\{Vu : X'u = 0\} = \operatorname{rank}(V \,\vdots\, X) - \operatorname{rank}(X).$$

The last equality follows from Theorem 2.1.4 of [11].

Lemma 3.

$$\hat\sigma_m^2 = \sum_{i=1}^{k} u_i^2 / k, \tag{5}$$

$$\hat\sigma_s^2 = \sum_{i=1}^{k} \lambda_i u_i^2 / k, \tag{6}$$

where $u_i \sim N(0, \sigma^2)$, $i = 1, \cdots, k$, are independent and $\lambda_1 \ge \cdots \ge \lambda_k > 0$ are the positive eigenvalues of $MV$.

Proof. Since $MX = 0$, $\hat\sigma_m^2$ and $\hat\sigma_s^2$ can be rewritten as $\hat\sigma_m^2 = e'M(MVM)^+Me/k$ and $\hat\sigma_s^2 = e'Me/k$. In view of Lemma 1 and $e \sim N(0, \sigma^2 V)$, with $r = \operatorname{rank}(V)$, there is an $n \times r$ matrix $A$ such that $e = A\varepsilon$, $\varepsilon \sim N(0, \sigma^2 I_r)$, $V = AA'$; thus

$$\hat\sigma_m^2 = \varepsilon' Q_1 \varepsilon / k, \tag{7}$$

$$\hat\sigma_s^2 = \varepsilon' Q_2 \varepsilon / k, \tag{8}$$

where $Q_1 = A'M(MAA'M)^+MA$ and $Q_2 = A'MA$. It is easy to verify that $Q_1Q_2 = Q_2Q_1$, which implies (see, for example, [5]) that there is an $r \times r$ orthogonal matrix $T$ such that both $T'Q_1T$


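and $T'Q_2T$ are diagonal. By using Lemma 2, it can be shown that

$$\operatorname{rank}(Q_1) = \operatorname{rank}(A'M) = \operatorname{rank}(A'MA) = \operatorname{rank}(VM) = \operatorname{rank}(V \,\vdots\, X) - \operatorname{rank}(X) = k. \tag{9}$$

We note that $Q_1$ is a projection matrix, thus

$$T'Q_1T = \operatorname{diag}(I_k, 0), \tag{10}$$

$$T'Q_2T = \operatorname{diag}(\Lambda_k, 0), \tag{11}$$

where $\Lambda_k = \operatorname{diag}(\lambda_1, \cdots, \lambda_k)$. Denote $u = T'\varepsilon$; then

$$u \sim N_r(0, \sigma^2 I_r). \tag{12}$$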

Substituting (10), (11) and (12) in (7) and (8) yields (5) and (6). The proof of Lemma 3 is completed.

Denote $r_0 = \operatorname{rank}(X)$, which implies $\operatorname{rank}(M) = n - r_0$. Thus $k \le \min\{n - r_0, \operatorname{rank}(V)\}$. In particular, when $V > 0$, that is, $V$ is a positive definite matrix, we have $k = n - r_0$, which follows from Lemma 2. By using the Poincaré theorem (see, for example, [11]), we obtain

$$\alpha_{r_0+i} \le \lambda_i \le \alpha_i, \qquad i = 1, \ldots, k, \tag{13}$$

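where $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_n$ are the eigenvalues of $V$. From Lemma 3, it is easy to show the following fact.

Theorem 1.

$$\alpha_1 \hat\sigma_m^2 \ge \lambda_1 \hat\sigma_m^2 \ge \hat\sigma_s^2 \ge \lambda_k \hat\sigma_m^2 \ge \alpha_{r_0+k} \hat\sigma_m^2. \tag{14}$$

From (14) we have $\alpha_{r_0+k} \le \hat\sigma_s^2 / \hat\sigma_m^2 \le \alpha_1$. The results above show that if the eigenvalues $\alpha_1$ and $\alpha_{r_0+k}$ are very close, then so are the estimates $\hat\sigma_s^2$ and $\hat\sigma_m^2$. Denote

$$f(MV, k) = \frac{\operatorname{tr}\,(MV)^2}{k} - \operatorname{tr}(MV) + \frac{[\operatorname{tr}(MV)]^2}{2k} + \frac{k}{2}, \tag{15}$$

where $k$ is defined in Lemma 3 as the number of the nonzero eigenvalues of $MV$, and $\operatorname{tr}(A)$ denotes the trace of a matrix $A$.

Theorem 2.
(a) If $f(MV, k) > 1$, then $\operatorname{MSE}(\hat\sigma_m^2) < \operatorname{MSE}(\hat\sigma_s^2)$;
(b) If $f(MV, k) = 1$, then $\operatorname{MSE}(\hat\sigma_m^2) = \operatorname{MSE}(\hat\sigma_s^2)$;
(c) If $f(MV, k) < 1$, then $\operatorname{MSE}(\hat\sigma_m^2) > \operatorname{MSE}(\hat\sigma_s^2)$.

Proof.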

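It follows from (5) that

$$\operatorname{MSE}(\hat\sigma_m^2) = \operatorname{Var}(\hat\sigma_m^2) = \operatorname{Var}\Big(\sum_{i=1}^{k} u_i^2 / k\Big) = \frac{2\sigma^4}{k}.$$

On the other hand, from (6) we have

$$E(\hat\sigma_s^2) = \frac{\sigma^2 \sum \lambda_i}{k}, \qquad \operatorname{Var}(\hat\sigma_s^2) = \operatorname{Var}\Big(\sum_{i=1}^{k} \lambda_i u_i^2 / k\Big) = 2 \sum \lambda_i^2 \sigma^4 / k^2,$$

thus

$$\begin{aligned}
\operatorname{MSE}(\hat\sigma_s^2) &= E(\hat\sigma_s^2 - \sigma^2)^2 = E[(\hat\sigma_s^2)^2] - 2\sigma^2 E(\hat\sigma_s^2) + \sigma^4 \\
&= \operatorname{Var}(\hat\sigma_s^2) + (E\hat\sigma_s^2)^2 - 2\sigma^2 E(\hat\sigma_s^2) + \sigma^4 \\
&= \frac{2\sigma^4 (\sum \lambda_i^2)}{k^2} + \frac{\sigma^4 (\sum \lambda_i)^2}{k^2} - \frac{2\sigma^4 \sum \lambda_i}{k} + \sigma^4 \\
&= \frac{2\sigma^4}{k} \Big[ \frac{\sum \lambda_i^2}{k} + \frac{(\sum \lambda_i)^2}{2k} - \sum \lambda_i + \frac{k}{2} \Big].
\end{aligned}$$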


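Note that $\operatorname{tr}(MV) = \sum_{i=1}^{k} \lambda_i$ and $\operatorname{tr}\,(MV)^2 = \sum_{i=1}^{k} \lambda_i^2$, so that $\operatorname{MSE}(\hat\sigma_s^2) = f(MV, k)\,\operatorname{MSE}(\hat\sigma_m^2)$. The proof of Theorem 2 is completed.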
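The criterion of Theorem 2 is easy to evaluate numerically. The sketch below, an illustration rather than part of the paper, computes $f(MV, k)$ from (15) using the nonzero eigenvalues of $MVM$, which coincide with those of $MV$. The function name `mse_ratio_f` and the tolerance `tol` used to decide which eigenvalues count as nonzero are assumptions of this sketch.

```python
import numpy as np


def mse_ratio_f(X, V, tol=1e-10):
    """Return (f(MV, k), k) as in (15); f is MSE(simple) / MSE(MINQUE)."""
    n = X.shape[0]
    M = np.eye(n) - X @ np.linalg.pinv(X.T @ X) @ X.T
    S = M @ V @ M
    S = (S + S.T) / 2.0                 # symmetrize against rounding error
    lam = np.linalg.eigvalsh(S)         # eigenvalues of the symmetric matrix MVM
    lam = lam[lam > tol]                # keep the positive eigenvalues lambda_1..lambda_k
    k = lam.size
    f = np.sum(lam**2) / k - np.sum(lam) + np.sum(lam) ** 2 / (2 * k) + k / 2
    return f, k


def preferred_estimate(f):
    """Apply Theorem 2: f > 1 favours MINQUE, f < 1 favours the simple estimate."""
    if np.isclose(f, 1.0):
        return "tie"
    return "MINQUE" if f > 1.0 else "simple"


# Illustrative use: a common-mean model with V = 2I, so every lambda_i equals 2.
X = np.ones((5, 1))
f, k = mse_ratio_f(X, 2.0 * np.eye(5))
print(f, k, preferred_estimate(f))
```

Working with $MVM$ instead of $MV$ keeps the eigenvalue problem symmetric, so `numpy.linalg.eigvalsh` can be used and the computed eigenvalues are real.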

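Theorem 2 involves the design matrix $X$, which enters through $M$, and this is not convenient for applications. However, it follows from (13) that

$$\sum_{i=1}^{k} \alpha_{r_0+i} \le \operatorname{tr}(MV) \le \sum_{i=1}^{k} \alpha_i,$$

$$\sum_{i=1}^{k} \alpha_{r_0+i}^2 \le \operatorname{tr}\,(MV)^2 \le \sum_{i=1}^{k} \alpha_i^2,$$

$$\Big(\sum_{i=1}^{k} \alpha_{r_0+i}\Big)^2 \le [\operatorname{tr}(MV)]^2 \le \Big(\sum_{i=1}^{k} \alpha_i\Big)^2.$$

Thus

$$\frac{1}{k}\sum_{i=1}^{k} \alpha_{r_0+i}^2 - \sum_{i=1}^{k} \alpha_i + \frac{1}{2k}\Big(\sum_{i=1}^{k} \alpha_{r_0+i}\Big)^2 + \frac{k}{2} \le f(MV, k) \le \frac{1}{k}\sum_{i=1}^{k} \alpha_i^2 - \sum_{i=1}^{k} \alpha_{r_0+i} + \frac{1}{2k}\Big(\sum_{i=1}^{k} \alpha_i\Big)^2 + \frac{k}{2}.$$

Denote

$$l = \frac{1}{k}\sum_{i=1}^{k} \alpha_{r_0+i}^2 - \sum_{i=1}^{k} \alpha_i + \frac{1}{2k}\Big(\sum_{i=1}^{k} \alpha_{r_0+i}\Big)^2 + \frac{k}{2},$$

$$u = \frac{1}{k}\sum_{i=1}^{k} \alpha_i^2 - \sum_{i=1}^{k} \alpha_{r_0+i} + \frac{1}{2k}\Big(\sum_{i=1}^{k} \alpha_i\Big)^2 + \frac{k}{2},$$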

according to Theorem 2, we easily obtain the following corollary.

Corollary 1.
(a) If $l > 1$, then $\operatorname{MSE}(\hat\sigma_m^2) < \operatorname{MSE}(\hat\sigma_s^2)$;
(b) If $0 < u < 1$, then $\operatorname{MSE}(\hat\sigma_m^2) > \operatorname{MSE}(\hat\sigma_s^2)$.

It is clear that $l$ and $u$ depend only on $V$ and $\operatorname{rank}(MV)$; therefore Corollary 1 is more convenient than Theorem 2 in applications. For example, consider the model (1) with $\operatorname{rank}(X) = 1$ and $V = \operatorname{diag}(\lambda, \lambda, \lambda, \alpha\lambda)$, where $\lambda > 0$ and $\alpha > 0$. It is easy to see that $k = 3$. When we take $\lambda = 2$ and $\alpha = 2$, then $l = 3.5 > 1$ and, according to (a), $\hat\sigma_m^2$ is the better estimate of $\sigma^2$. When we take $\lambda = 1/2$ and $\alpha = 1.1$, then $u \approx 0.67 < 1$ and, according to (b), $\hat\sigma_s^2$ is better. However, Corollary 1 does not always work. For example, when we take $\lambda = 4/5$ and $\alpha = 2$, then $l = -0.1 < 1$ and $u \approx 2.1 > 1$; we cannot make any decision by Corollary 1, so we must return to Theorem 2.

We note that in many situations, such as sample surveys, animal genetic selection, economic panel data and longitudinal data, $X$ and $V$ may satisfy the condition $MVM = tP_{MV^{1/2}}$ for some $t > 0$, where $P_A = A(A'A)^-A'$. The condition implies that the nonzero eigenvalues of $MVM$ are $\lambda_i = t$, $i = 1, \cdots, k$. By using this special information about $X$ and $V$, we obtain another result.

Theorem 3. Suppose that $MVM = tP_{MV^{1/2}}$ for some $t > 0$ and $k \ge 2$.
(a) When $\frac{k-2}{k+2} < t < 1$, $\operatorname{MSE}(\hat\sigma_m^2) > \operatorname{MSE}(\hat\sigma_s^2)$;
(b) when $t = \frac{k-2}{k+2}$ or $t = 1$, $\operatorname{MSE}(\hat\sigma_m^2) = \operatorname{MSE}(\hat\sigma_s^2)$;
(c) in all other cases, $\operatorname{MSE}(\hat\sigma_m^2) < \operatorname{MSE}(\hat\sigma_s^2)$.

Proof. Note that $MV$ and $MVM$ have the same nonzero eigenvalues. If $MVM = tP_{MV^{1/2}}$, then the nonzero eigenvalues of $MV$ are $\lambda_i = t$, $i = 1, \cdots, k$; hence

$$f(MV, k) = t^2 + \frac{k}{2}t^2 - kt + \frac{k}{2}.$$


The conclusions follow from straightforward discussion.

Theorem 4. If $MVM = tP_{MV^{1/2}}$, or $V > 0$ and $MVM = tM$, then $\hat\sigma_s^2 = t\hat\sigma_m^2$ with probability one.

Proof. It is easy to see that the hypothesis $MVM = tP_{MV^{1/2}}$ implies $VMVMV = tVMV$. Let $V_0 = V/t$; then we have $V_0MV_0MV_0 = V_0MV_0$, and in view of [3, Proposition 1], Theorem 4 is proved.

Remark 1. Under the condition of Theorem 3, obviously when $t = 1$ we have $\hat\sigma_s^2 = \hat\sigma_m^2$; when $t = (k-2)/(k+2)$, $\hat\sigma_s^2 < \hat\sigma_m^2$, but their MSEs are equal.

Further, when $0 < t < 1$, $\hat\sigma_s^2$ is a shrinkage estimate of $\hat\sigma_m^2$, but when $t > 1$ we have $\hat\sigma_s^2 > \hat\sigma_m^2$ and $\operatorname{MSE}(\hat\sigma_s^2) > \operatorname{MSE}(\hat\sigma_m^2)$, so if $t > 1$ we should choose $\hat\sigma_m^2$ as the estimate of $\sigma^2$.
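The "straightforward discussion" invoked in the proof of Theorem 3 can be spelled out as follows. This is a sketch worked out from the expression for $f(MV, k)$ when all $\lambda_i = t$, not text from the original proof. Subtracting 1 and factoring the quadratic in $t$ gives

$$f(MV, k) - 1 = \frac{k+2}{2}\,t^2 - kt + \frac{k-2}{2} = \frac{k+2}{2}\,(t - 1)\Big(t - \frac{k-2}{k+2}\Big).$$

Since $k \ge 2$, the smaller root $(k-2)/(k+2)$ is nonnegative and less than 1. The product is negative exactly when $t$ lies strictly between the two roots, zero at the roots, and positive otherwise, which by Theorem 2 yields cases (a), (b) and (c) of Theorem 3 respectively.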

3 Examples
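As a preliminary numerical illustration, the bounds $l$ and $u$ of Corollary 1 can be evaluated for the diagonal matrix $V = \operatorname{diag}(\lambda, \lambda, \lambda, \alpha\lambda)$ with $\operatorname{rank}(X) = 1$ discussed in Section 2. The sketch below is not from the paper; the function name `corollary1_bounds` is an assumption, and the function takes the eigenvalues of $V$ together with $r_0 = \operatorname{rank}(X)$ and $k$ as inputs.

```python
import numpy as np


def corollary1_bounds(alpha, r0, k):
    """Return (l, u), the lower and upper bounds on f(MV, k) from Corollary 1.

    alpha : eigenvalues of V (any order); r0 = rank(X); requires r0 + k <= len(alpha).
    """
    a = np.sort(np.asarray(alpha, dtype=float))[::-1]   # alpha_1 >= ... >= alpha_n
    top = a[:k]                 # alpha_1, ..., alpha_k
    low = a[r0:r0 + k]          # alpha_{r0+1}, ..., alpha_{r0+k}
    l = np.sum(low**2) / k - np.sum(top) + np.sum(low) ** 2 / (2 * k) + k / 2
    u = np.sum(top**2) / k - np.sum(low) + np.sum(top) ** 2 / (2 * k) + k / 2
    return l, u


def diag_example(lam, alpha):
    """Bounds for V = diag(lam, lam, lam, alpha*lam) with rank(X) = 1, so k = 3."""
    return corollary1_bounds([lam, lam, lam, alpha * lam], r0=1, k=3)


for lam, alpha in [(2.0, 2.0), (0.5, 1.1), (0.8, 2.0)]:
    l, u = diag_example(lam, alpha)
    print(f"lambda={lam}, alpha={alpha}: l={l:.4f}, u={u:.4f}")
```

For the three parameter choices in the loop this reproduces the values quoted in Section 2: $l = 3.5$ for $(\lambda, \alpha) = (2, 2)$, $u \approx 0.67$ for $(1/2, 1.1)$, and $l = -0.1$, $u \approx 2.1$ for $(4/5, 2)$.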

Estimates of $\sigma^2$ are often used in the estimation of variances of estimable functions. In what follows we give two simple examples to illustrate applications of the results obtained in this paper.

Example 1. Consider the following linear model

$$y = \mu 1_n + e, \qquad E(e) = 0, \qquad \operatorname{Cov}(e) = \sigma^2 V. \tag{16}$$

This model has been found useful in certain statistical inference problems on the mean $\mu$ of a population when the observations $y_1, \cdots, y_n$ are not independent. For some examples of applications in medical data and animal genetic selection, the reader is referred to [7–9]. For the model (16), if the matrix $V$ has the following form

$$V = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}, \tag{17}$$

where $\rho$ is known and satisfies $0 < \rho < 1$, then $MVM = (1 - \rho)M$ and $k = n - 1$, which is clear by noting the fact that $V = (1 - \rho)I + \rho 1_n 1_n'$. According to Theorems 3 and 4, we have the following statements:

(a) if $0 < \rho < \frac{4}{n+1}$, then $\operatorname{MSE}(\hat\sigma_s^2) < \operatorname{MSE}(\hat\sigma_m^2)$;
(b) if $\frac{4}{n+1} < \rho < 1$, then $\operatorname{MSE}(\hat\sigma_s^2) > \operatorname{MSE}(\hat\sigma_m^2)$;
(c) if $\rho = \frac{4}{n+1}$, then $\operatorname{MSE}(\hat\sigma_s^2) = \operatorname{MSE}(\hat\sigma_m^2)$, thus $\hat\sigma_m^2$ and $\hat\sigma_s^2$ cannot be distinguished by the mean square error criterion;
(d) $\hat\sigma_s^2 = (1 - \rho)\hat\sigma_m^2 < \hat\sigma_m^2$.

In practice, $\rho$ is usually unknown, and we can use any estimate $\hat\rho$ in place of its true value. We can then easily choose the better estimate from $\hat\sigma_m^2$ and $\hat\sigma_s^2$ according to the above statements, based on $\hat\rho$ and the sample size $n$.

Although the least squares estimate (LSE) $\hat\mu = \bar y$ coincides with the best linear unbiased estimate (BLUE) $\mu^*$ of $\mu$ under model (16) (see [11]), its variance depends on $V$. For a general matrix $V$, Tong [10] established the following lower and upper bounds on the variance of the generalized least squares estimate $\tilde\mu = (1_n'V^{-1}1_n)^{-1}1_n'V^{-1}y$ for all $V$ with eigenvalues $\alpha_1 \ge \cdots \ge \alpha_n > 0$:

$$\frac{\alpha_n \sigma^2}{n} \le \operatorname{Var}(\tilde\mu) \le \frac{\alpha_1 \sigma^2}{n}. \tag{18}$$

To obtain better estimated bounds of $\operatorname{Var}(\tilde\mu)$ in (18), we can replace $\sigma^2$ by $\hat\sigma_s^2$ or $\hat\sigma_m^2$, choosing between them by using Corollary 1.
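The claims of Example 1 can be checked numerically. The following sketch is illustrative only: the values `n = 6` and `rho = 0.3`, the simulated data, and the helper `better_estimate` are assumptions, not part of the paper. It verifies $MVM = (1 - \rho)M$ and statement (d) on one simulated sample, and encodes the threshold $4/(n+1)$ from statements (a) to (c).

```python
import numpy as np

n, rho = 6, 0.3                                       # illustrative values
ones = np.ones((n, 1))
V = (1 - rho) * np.eye(n) + rho * (ones @ ones.T)     # intraclass structure (17)
M = np.eye(n) - ones @ ones.T / n                     # M for X = 1_n
k = n - 1

# M V M = (1 - rho) M, because M 1_n = 0.
assert np.allclose(M @ V @ M, (1 - rho) * M)

# One simulated sample from model (16) with mu = 5 and sigma^2 = 1.
rng = np.random.default_rng(0)
y = 5.0 + np.linalg.cholesky(V) @ rng.standard_normal(n)
My = M @ y
sigma2_m = My @ np.linalg.pinv(M @ V @ M) @ My / k    # MINQUE, equation (2)
sigma2_s = My @ My / k                                # simple estimate, equation (3)

# Statement (d): the simple estimate equals (1 - rho) times the MINQUE.
print(np.isclose(sigma2_s, (1 - rho) * sigma2_m))


def better_estimate(rho, n):
    """Statements (a)-(c): compare rho with the threshold 4 / (n + 1)."""
    threshold = 4.0 / (n + 1)
    if np.isclose(rho, threshold):
        return "tie"
    return "simple" if rho < threshold else "MINQUE"


print(better_estimate(rho, n))
```

With an estimated $\hat\rho$ in place of `rho`, the same helper gives the plug-in choice described in the text.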


Example 2. Consider the following linear model for longitudinal data

$$y_{ij} = x_{ij}'\beta + \alpha_i + e_{ij}, \qquad i = 1, \cdots, m, \quad j = 1, \cdots, n, \tag{19}$$

where $y_{ij}$ denotes the $j$th observation of the response variable on the $i$th individual, $x_{ij}$ is a $p \times 1$ vector of known explanatory variables, $\beta$ is a $p \times 1$ vector of fixed effects, the $\alpha_i$ are random individual effects, and the $e_{ij}$ are random errors. Assume that the $\alpha_i$ are mutually independent $N(0, \sigma_\alpha^2)$, the $e_{ij}$ are mutually independent $N(0, \sigma_e^2)$, and the $\alpha_i$ and $e_{ij}$ are independent of one another (see, for example, [1]). After introducing the following matrix notations

$$y = (y_1', \cdots, y_m')', \quad y_i = (y_{i1}, \cdots, y_{in})', \quad e = (e_1', \cdots, e_m')', \quad e_i = (e_{i1}, \cdots, e_{in})',$$

$$X_i = (x_{i1}, \cdots, x_{in})', \quad \alpha = (\alpha_1, \cdots, \alpha_m)', \quad X' = (X_1', \cdots, X_m'),$$

the model (19) can be rewritten as

$$y = X\beta + (I_m \otimes 1_n)\alpha + e,$$

where $\alpha \sim N(0, \sigma_\alpha^2 I_m)$, $e \sim N(0, \sigma_e^2 I_{mn})$, and $\otimes$ denotes the Kronecker product of matrices. Hence

$$\operatorname{Cov}(y) = \sigma_e^2 \big( I_{mn} + (I_m \otimes \theta 1_n 1_n') \big),$$

where $\theta = \sigma_\alpha^2 / \sigma_e^2 > 0$. Denoting $V(\theta) = I_{mn} + (I_m \otimes \theta 1_n 1_n')$, we have $V(\theta) > 0$, and the eigenvalues of $V(\theta)$ are $1 + n\theta$ and $1$ with multiplicities $m$ and $m(n-1)$, respectively. For the special case $m = 2$, $n = 5$, the eigenvalues are therefore $1 + 5\theta$ (with multiplicity 2) and $1$ (with multiplicity 8). Let $\operatorname{rank}(X) = 2$. Then $k = mn - \operatorname{rank}(X) = 8$. Because $l = 1 - 10\theta < 1$, (a) of Corollary 1 fails to work, but $u = (25\theta^2 + 15\theta - 2)/2$, and it is easy to see that if $1.425 < \theta < 1.6$, then $0 < u < 1$. According to Corollary 1, $\hat\sigma_s^2$ is the better estimate of $\sigma_e^2$. However, let $\operatorname{rank}(X) = 1$; then $k = 9$ and $l = 1 + (65/9)\theta + (100/9)\theta^2 > 1$ for any $\theta > 0$, which shows that $\hat\sigma_m^2$ is better than $\hat\sigma_s^2$.

References

[1] Diggle, P.J., Liang, K., Zeger, S.L. Analysis of longitudinal data. Oxford, New York, 1994
[2] Dufour, J. Bias of $S^2$ in linear regressions with dependent errors. The Amer. Statist., 40: 284–285 (1996)
[3] Groß, J. A note on equality of MINQUE and simple estimator in the general Gauss-Markov model. Statistics & Probability Letters, 35: 335–339 (1997)
[4] Neudecker, H. Bounds for the bias of the least squares estimator of $\sigma^2$ in the case of a first-order autoregressive process. Econometrica, 45: 1257–1262 (1977)
[5] Rao, C.R. Linear statistical inference and its applications. Wiley, New York, 1973
[6] Rao, C.R. Projectors, generalized inverses and the BLUE's. J. Roy. Statist. Soc. (Series B), 36: 442–448 (1974)
[7] Rawlings, J.O. Order statistics for a special case of unequally correlated multinormal variables. Biometrics, 32: 875–887 (1976)
[8] Shaked, M., Tong, Y.L. Comparison of experiments via dependence of normal variables with a common marginal distribution. Ann. Statist., 20: 614–618 (1992)
[9] Shoukri, M.M., Lathrop, G.M. Statistical testing of genetic linkage under heterogeneity. Biometrics, 49: 151–161 (1993)
[10] Tong, Y.L. The role of the covariance matrix in the least squares estimation for a common mean. Linear Algebra and Its Applications, 264: 313–323 (1997)
[11] Wang, S.G., Chow, S.C. Advanced linear models. Marcel Dekker Inc., New York, 1994
