Testing for the maximum cell probabilities in multinomial distributions

XIONG Shifeng & LI Guoying
Academy of Mathematics and System Sciences, Chinese Academy of Sciences, Beijing 100080, China
Correspondence should be addressed to Li Guoying (email: [email protected])
Abstract: This paper investigates one-sided hypothesis testing for p[1], the largest cell probability of a multinomial distribution. A small sample test of Ethier (1982) is extended to general cases. Based on an estimator of p[1], a class of large sample tests is proposed. The asymptotic power of the above tests under local alternatives is derived. An example is presented at the end of the paper.

Keywords: multinomial distribution, hypothesis testing, maximum multinomial probability, order statistics, asymptotic distribution, local alternatives.
1  Introduction
Let Xn = (Xn1, …, Xnm)' be a random m-vector from the multinomial distribution mult(n; p), where Σ_{i=1}^m Xni = n, p = (p1, …, pm)' and Σ_{i=1}^m pi = 1. Inference for p plays an important role in many areas of statistical application. In many cases we are interested in whether every pi, i = 1, …, m, is less than a given number q. This leads to the two testing problems

    H1: p[1] ≤ q  versus  K1: p[1] > q;   (1.1)

    H2: p[1] ≥ q  versus  K2: p[1] < q,   (1.2)
where p[1] = max_{1≤i≤m} pi and 1/m ≤ q < 1.

There have been several studies of these problems in the literature. In recent years, Dykstra et al.[1] and Lee and Yan[2] discussed the usual stochastic order restricted inference and the uniform stochastic order restricted inference for multinomial parameters, respectively. For the maximum cell probability p[1], Gelfand et al.[3] investigated point and interval estimation, discussing bootstrap and Bayesian approaches to interval estimation. Glaz and Sison[4] considered approximate parametric bootstrap confidence intervals for p[1] without resampling. For the testing problems (1.1) and (1.2), only some very special cases have been discussed (cf. [5-7]). In [5] and [6], the case q = 1/m was studied; in this situation, test (1.1) is equivalent to the goodness-of-fit test of the null hypothesis H0: p1 = ··· = pm = 1/m. The power-divergence statistics[5], the range of the relative frequencies and the mean deviation[6] were considered as test statistics. For q > 1/m, Ethier[7] studied testing (1.1) when q = 1/m0, where m0 is an integer with 2 ≤ m0 < m. In this paper, we study the two testing problems (1.1) and (1.2) in the general case, that is, q can be any real number in (1/m, 1).

The rest of the paper is organized as follows. In Section 2, we give several preliminary large sample results which are used to obtain the asymptotic distributions of our test statistics. In Section 3, by applying a theorem from the theory of majorization, we obtain small sample tests for (1.1) and (1.2), which extend a result of ref. [7]; the asymptotic power of these tests under local alternatives is also presented. Section 4 discusses large sample tests. The likelihood ratio statistics and their asymptotic distributions for (1.1) and (1.2) are derived. However, the likelihood ratio tests are difficult to implement, so two tests based on an estimator of p[1] are proposed and their local asymptotic efficiency is investigated. We recommend the latter tests for (1.1) and (1.2) because they are simple to calculate and have high local asymptotic efficiency. In Section 5, a set of RNA sequence data is analyzed to illustrate our methods.

Throughout, for any x = (x1, …, xm)' ∈ Rm, the nonincreasing rearrangement of x1, …, xm is denoted by x[1] ≥ ··· ≥ x[m].
2  Preliminary results
The test statistics for (1.1) and (1.2) studied in this paper are all related to the order statistics Xn[1], …, Xn[m] of Xn. In this section, we derive the asymptotic distributions of these order statistics from the asymptotic normality of the multinomial distribution.

Theorem 2.1. Let Yn = (Yn1, …, Ynm)' and η = (η1, …, ηm)' be random m-vectors. If there exist an ∈ R, an → +∞, and b = (b1, …, bm)' ∈ Rm such that an(Yn − b) →d η as n → ∞, where '→d' denotes convergence in distribution, then

    an( Yn[1] − b[1], Yn[m] − b[m] )' →d ( max_{i: bi=b[1]} ηi, min_{i: bi=b[m]} ηi )'.

To prove this theorem and Theorem 2.2 below, we need the following simple lemma, whose proof is omitted.

Lemma 2.1. Let Xn, X be random variables with Xn →d X, and let an, bn ∈ R with an → +∞ and bn → −∞. Then
(1) P(Xn ≤ an) → 1 and P(Xn ≤ bn) → 0;
(2) P(Xn = an) → 0 and P(Xn = bn) → 0.

Theorem 2.2. Assume the conditions of Theorem 2.1 hold and that P(ηj = ηk) = 0 for any j ≠ k. Suppose

    b_1 = ··· = b_{k_1} > b_{k_1+1} = ··· = b_{k_2} > ··· > b_{k_{l−1}+1} = ··· = b_m,

where l = #{b1, …, bm} and k_l = m. Let the random m-vector In = (In1, …, Inm)' be a permutation of (1, …, m)' such that, for every i = 0, …, l − 1, I_{n,k_i+1} < ··· < I_{n,k_{i+1}} and {Y_{nI_{n,k_i+1}}, …, Y_{nI_{n,k_{i+1}}}} = {Yn[k_i+1], …, Yn[k_{i+1}]} (k_0 = 0). Then
(1) P(In = (1, 2, …, m)') → 1;
(2) an(YnIn − b) →d η, where YnIn = (YnIn1, …, YnInm)'.

The proofs of Theorems 2.1 and 2.2 can be found in the Appendix.

Clearly, for any b ∈ Rm we may rearrange the components of Yn, η and b so that b_1 = ··· = b_{k_1} > b_{k_1+1} = ··· = b_{k_2} > ··· > b_{k_{l−1}+1} = ··· = b_m. For any Yn, η and b satisfying the conditions of Theorem 2.2, define h: Rm → Rm by h(x1, …, xm) = (x'_1, …, x'_l)', where x_i is the vector whose components are the nonincreasing rearrangement of x_{k_{i−1}+1}, …, x_{k_i}, i = 1, …, l. Because h is continuous, we have the following corollaries.

Corollary 2.1. Under the conditions of Theorem 2.2,

    an( Yn[1] − b[1], Yn[2] − b[2], …, Yn[m] − b[m] )' →d h(η).

Corollary 2.2. Under the conditions of Theorem 2.2, for each continuous symmetric function t defined on R^{k_1} ('symmetric' means that t(x_{i_1}, …, x_{i_{k_1}}) = t(x_1, …, x_{k_1}) for any permutation {i_1, …, i_{k_1}} of {1, …, k_1}), we have

    t(an(Yn[1] − b[1]), …, an(Yn[k_1] − b[1])) →d t(η1, …, η_{k_1}).

Proof. t(an(Yn[1] − b[1]), …, an(Yn[k_1] − b[1])) = t(an(YnIn1 − b1), …, an(YnIn_{k_1} − b_{k_1})) →d t(η1, …, η_{k_1}). ¤

These results apply to multinomial distributions through the fact that if Xn ~ mult(n; p), then √n(Xn/n − p) →d N(0, Σ), where the (i, j)th entry of Σ is pi(δij − pj) for i, j = 1, …, m, with δij = 1 for i = j and δij = 0 for i ≠ j. For example, by Theorem 2.1 or Corollary 2.1 we have

Corollary 2.3.

    √n( Xn[1]/n − p[1] ) →d ξ[1],

where (ξ1, …, ξk)' ~ N(0, (p[1](δij − p[1]))k×k) and k = #{i: pi = p[1]}.
3  Small sample tests
In this section, we extend the test based on Xn[1] in ref. [7] to general cases and discuss its consistency and local asymptotic efficiency.

First, we need several concepts from the theory of majorization. For x = (x1, …, xn)', y = (y1, …, yn)' ∈ Rn with Σ_{i=1}^n xi = Σ_{i=1}^n yi, we write x ≺ y if Σ_{i=1}^k x[i] ≤ Σ_{i=1}^k y[i], k = 1, …, n − 1. A real valued function φ is said to be a symmetric Schur-convex function if φ is symmetric and φ(x) ≤ φ(y) whenever x ≺ y.

Lemma 3.1.[8] If Xn ~ mult(n; p), then for each c, ψ(p) = P(Xn[1] > c | p) and ψ*(p) = −P(Xn[1] < c | p) are both symmetric Schur-convex functions.

Now consider the hypotheses (1.1) and (1.2). Let q ∈ (1/m, 1). It is clear that

    p[1] ≤ q  iff  p ≺ p1 := (q, …, q, r, 0, …, 0)',   (3.1)

where 0 ≤ r < q, and

    p[1] ≥ q  iff  p2 := (q, (1 − q)/(m − 1), …, (1 − q)/(m − 1))' ≺ p.   (3.2)

Dykstra et al.[1] discussed the likelihood ratio test for p ≺ q = (q1, …, qm)'. However, they require qi > 0 for all i, which is not applicable here.
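For concreteness, the relation x ≺ y used in (3.1) and (3.2) can be checked by comparing partial sums of the nonincreasing rearrangements. A minimal Python sketch (ours; the function name is hypothetical):

```python
def majorized_by(x, y, tol=1e-12):
    """Return True if x ≺ y: every partial sum of the nonincreasingly
    sorted x is at most the corresponding partial sum of sorted y
    (the total sums are assumed equal)."""
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    sx = sy = 0.0
    for a, b in zip(xs[:-1], ys[:-1]):   # k = 1, ..., n - 1
        sx, sy = sx + a, sy + b
        if sx > sy + tol:
            return False
    return True

# (3.1) with m = 4 and q = 0.3: p1 = (q, q, q, r)' with r = 1 - 3q = 0.1.
p1 = [0.3, 0.3, 0.3, 0.1]
print(majorized_by([0.25, 0.25, 0.25, 0.25], p1))  # True:  p[1] = 0.25 <= q
print(majorized_by([0.40, 0.20, 0.20, 0.20], p1))  # False: p[1] = 0.40 >  q
```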
Now we present the small sample tests. From (3.1), (3.2) and Lemma 3.1, we have the following theorem, which contains Lemma 1 of ref. [7] as a special case.

Theorem 3.1. For each c > 0,

    sup{P(Xn[1] > c | p): p ∈ H1} = P(Xn[1] > c | p1),
    sup{P(Xn[1] < c | p): p ∈ H2} = P(Xn[1] < c | p2).

From Theorem 3.1, two tests for (1.1) and (1.2) are obtained as follows. For the desired significance level α, we reject H1 if Xn[1] > un(α), and reject H2 if Xn[1] < vn(α), where un(α) and vn(α) satisfy

    P(Xn[1] > un(α) | p1) ≈ α,    P(Xn[1] < vn(α) | p2) ≈ α.

Denote the two tests for (1.1) and (1.2) by φ1 and φ2, respectively.

To determine un(α) and vn(α), we need the distribution of Xn[1] under p = p1 and p = p2. Notice that P(Xn[1] ≤ x | p) = P(Xn1 ≤ x, …, Xnm ≤ x | p). When n and m are small, this probability can be computed exactly without difficulty. When n is sufficiently large, we can apply the asymptotic distribution of Xn[1] given in Corollary 2.3. In particular, if p = (q, …, q, 0, …, 0)', approximations to P(Xn[1] ≤ x | p) can be found in refs. [7, 9, 10].

Now we show that the two tests are consistent. Denote the cdf of ξ[1] by Fs, where (ξ1, …, ξs)' ~ N(0, (q(δij − q))s×s), and let cs(α) be the upper α quantile of Fs, i.e., Fs(cs(α)) = 1 − α. Given the level α, by Corollary 2.3, the asymptotic rejection region of φ1 is {√n(Xn[1]/n − q) > c_[1/q](α)}. If K1 holds, then Xn[1]/n →a.s. p[1] > q, and therefore lim_{n→∞} P(√n(Xn[1]/n − q) > c_[1/q](α) | K1) = 1. The same property holds for φ2.

Furthermore, we consider their local asymptotic efficiency. The following result is useful.

Lemma 3.2.[11] Let Xn ~ mult(n; p1 + δ1/√n, …, pm + δm/√n) with Σ_{i=1}^m δi = 0, and set δ = (δ1, …, δm)'. Then √n(Xn/n − p) →d N(δ, Σ), where Σ is given in Section 2.

Given level α, let k = #{i: pi = q} ≤ [1/q]. Then from Corollary 2.3, under the local alternatives

    K1,n: Xn ~ mult(n; p1 + δ1/√n, …, pm + δm/√n), Σ_{i=1}^m δi = 0, p[1] = q, δi = δ > 0 for pi = q,

we have √n(Xn[1]/n − q) →d max_{1≤i≤k}(ξi + δ). Therefore the asymptotic power of φ1 under K1,n is

    β1(q, α, δ, k) = lim_{n→∞} P( √n(Xn[1]/n − q) > c_[1/q](α) | K1,n )
                   = P( max_{1≤i≤k}(ξi + δ) > c_[1/q](α) ) = 1 − Fk(c_[1/q](α) − δ).

Similarly, under the local alternatives

    K2,n: Xn ~ mult(n; p1 + δ1/√n, …, pm + δm/√n), Σ_{i=1}^m δi = 0, p[1] = q, δi = −δ < 0 for pi = q,
Fig. 1. Curves of β1(0.3, 0.05, δ, k) and β2(0.3, 0.05, δ, k), k = 1, 2, 3.
the asymptotic power of φ2 is

    β2(q, α, δ, k) = lim_{n→∞} P( √n(Xn[1]/n − q) < c1(1 − α) | K2,n ) = Fk(c1(1 − α) + δ).

Fig. 1 shows the curves of β1(0.3, 0.05, δ, k) and β2(0.3, 0.05, δ, k) for k = 1, 2, 3 (m ≥ 4). We can see from the figure that β1 and β2 are not always greater than α. Furthermore, β2 is quite small when k > 1.
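Fk and its quantiles cs(α) have no simple closed form, but β1 is straightforward to approximate by simulating the limiting normal vector. A Monte Carlo sketch in Python (our illustration, assuming numpy is available; when δ = 0 and k = [1/q], β1 reduces to α, which serves as a sanity check):

```python
import numpy as np

def beta1(q, alpha, delta, k, reps=200_000, seed=0):
    """Monte Carlo approximation of beta1(q, alpha, delta, k)
    = 1 - F_k(c_[1/q](alpha) - delta), by sampling the limiting vector
    (xi_1, ..., xi_s)' ~ N(0, (q(delta_ij - q))_{s x s})."""
    rng = np.random.default_rng(seed)
    cov = lambda d: q * np.eye(d) - q**2 * np.ones((d, d))
    s = int(1 / q)                       # [1/q] cells equal to q under p1
    # c_[1/q](alpha): upper-alpha quantile of the max of s components
    c = np.quantile(
        rng.multivariate_normal(np.zeros(s), cov(s), size=reps).max(axis=1),
        1 - alpha)
    # 1 - F_k(c - delta), estimated from fresh k-component samples
    xik = rng.multivariate_normal(np.zeros(k), cov(k), size=reps).max(axis=1)
    return np.mean(xik + delta > c)

print(beta1(0.3, 0.05, 0.0, 3))   # close to alpha = 0.05 (sanity check)
print(beta1(0.3, 0.05, 0.5, 1))   # local power at delta = 0.5 when k = 1
```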
4  Large sample tests
We have seen that the local asymptotic efficiency of φ1 and φ2 given in the last section is not high. The reason may be that only Xn[1] is used as the test statistic. In fact, in large sample cases, another parameter, k = #{i: pi = p[1]}, plays an important role in inference about p[1]. If k > 1, the results of Section 2 suggest that Xn[1], …, Xn[k] should all be used on an equal footing.

Another natural test for (1.1) is the likelihood ratio test. Ethier[7] presented the likelihood ratio statistic for (1.1) when q = 1/m0, where m0 is an integer with 2 ≤ m0 < m, but did not provide its asymptotic distribution. Theorem 4.1 below treats the likelihood ratio test for (1.1). The following lemma is the key to deriving the likelihood ratio statistic.

Lemma 4.1. (1) Let x1, …, xn, y1, …, yn ∈ R. Then for any two permutations, {j1, …, jn} and {k1, …, kn}, of {1, …, n},

    Σ_{i=1}^n x_{j_i} y_{k_i} ≤ Σ_{i=1}^n x[i] y[i].

(2) Let a1, …, an, x1, …, xn > 0. Then

    Σ_{i=1}^n ai ln xi ≤ Σ_{i=1}^n ai ln( ai Σ_{j=1}^n xj / Σ_{j=1}^n aj ).

Proof. Conclusion (1) can be found in ref. [12]; conclusion (2) follows from the Lagrange multiplier method. The details are omitted. ¤
Theorem 4.1. Let Θ = {p = (p1, …, pm)' ∈ (0, 1)^m: Σ_{i=1}^m pi = 1}.
(1) Denote the log-likelihood function of p by ln(p), and let

    Tn = sup{ln(p): p ∈ Θ} − sup{ln(p): p ∈ H1}.

Then Tn = 0 if J = 0, and

    Tn = Σ_{i=1}^J Xn[i] ln( Xn[i]/(nq) ) − ( Σ_{i=J+1}^m Xn[i] ) ln( n(1 − Jq) / Σ_{i=J+1}^m Xn[i] ),  J ≥ 1,

where J = min{ j = 0, …, [1/q]: q Σ_{i=1}^j Xn[i] + (1 − jq) Xn[j+1] ≤ nq }.

(2) If p[1] < q, then Tn →a.s. 0. If p[1] = q, then for each x ∈ R,

    lim_{n→∞} P(Tn ≤ x) = Σ_{i=0}^k P( Si ≤ x, V0 > 0, …, V_{i−1} > 0, Vi ≤ 0 ),   (4.1)

where k = #{i: pi = q}; S0 = 0, Sj = Σ_{i=1}^j ξ[i]²/(2q) + (Σ_{i=1}^j ξ[i])²/(2(1 − jq)), j = 1, …, k; Vj = q Σ_{i=1}^j ξ[i] + (1 − jq) ξ[j+1], j = 0, …, k − 1, Vk = 0; and (ξ1, …, ξk)' ~ N(0, (q(δij − q))k×k).

Proof. (1) One can easily verify that q Σ_{i=1}^{[1/q]} Xn[i] + (1 − [1/q]q) Xn[[1/q]+1] ≤ nq, so the set {j = 0, …, [1/q]: q Σ_{i=1}^j Xn[i] + (1 − jq) Xn[j+1] ≤ nq} is nonempty and J is well defined.

It is clear that over p ∈ Θ, ln(p) attains its maximum Σ_{i=1}^m Xni ln(Xni/n) + C at pi = Xni/n, where C = ln( n!/(Xn1! ··· Xnm!) ).

When p ∈ H1, there is a permutation j1, …, jm of 1, …, m such that Σ_{i=1}^m Xni ln pi = Σ_{i=1}^m Xn_{j_i} ln p[i]. By conclusion (1) of Lemma 4.1, ln(p) ≤ Σ_{i=1}^m Xn[i] ln p[i] + C.

If J = 0, i.e., Xn[1] ≤ nq, then ln(p) attains its maximum over p ∈ H1 at p[i] = Xn[i]/n, i = 1, …, m, namely Σ_{i=1}^m Xn[i] ln(Xn[i]/n) + C = Σ_{i=1}^m Xni ln(Xni/n) + C. Hence Tn = 0.

If J ≥ 1, i.e., Xn[1] > nq, q Xn[1] + (1 − q) Xn[2] > nq, …, q(Xn[1] + ··· + Xn[J−1]) + (1 − (J − 1)q) Xn[J] > nq and q(Xn[1] + ··· + Xn[J]) + (1 − Jq) Xn[J+1] ≤ nq, then by conclusion (2) of Lemma 4.1,

    ln(p) ≤ Σ_{i=1}^J Xn[i] ln p[i] + Σ_{i=J+1}^m Xn[i] ln( (1 − (p[1] + ··· + p[J])) Xn[i] / (Xn[J+1] + ··· + Xn[m]) ) + C.   (4.2)

Considering the right side of (4.2) as a function of p[1], …, p[J], and noticing that the inequalities above imply (1 − (J − 1)q) Xn[J] / (Xn[J] + Xn[J+1] + ··· + Xn[m]) > q, etc., we have

    ln(p) ≤ ( Σ_{i=1}^J Xn[i] ) ln q + Σ_{i=J+1}^m Xn[i] ln( (1 − Jq) Xn[i] / (Xn[J+1] + ··· + Xn[m]) ) + C.   (4.3)

Since q ≥ (1 − Jq) Xn[J+1] / (Xn[J+1] + ··· + Xn[m]), (4.3) shows that ln(p) attains its maximum over p ∈ H1 at p[i] = q, i = 1, …, J, and p[i] = (1 − Jq) Xn[i] / (Xn[J+1] + ··· + Xn[m]), i = J + 1, …, m. The expression for Tn follows.

(2) If p[1] < q, then for n sufficiently large J = 0 a.s., and thus Tn →a.s. 0. If p[1] = q, then for n sufficiently large J ≤ k a.s. Let Zni = √n( Xn[i]/n − q ), i = 1, …, k + 1. We have

    lim_{n→∞} P(Tn ≤ x) = Σ_{i=0}^k lim_{n→∞} P(Tn ≤ x, J = i)
    = lim_{n→∞} P( Tn0 ≤ x, Zn1 ≤ 0 ) + lim_{n→∞} P( Tn1 ≤ x, Zn1 > 0, qZn1 + (1 − q)Zn2 ≤ 0 ) + ···
      + lim_{n→∞} P( Tnk ≤ x, Zn1 > 0, …, q Σ_{i=1}^{k−1} Zni + (1 − (k − 1)q) Znk > 0, q Σ_{i=1}^k Zni + (1 − kq) Zn,k+1 ≤ 0 ),   (4.4)

where Tn0 = 0 and

    Tnj = Σ_{i=1}^j Xn[i] ln( Xn[i]/(nq) ) − ( Σ_{i=j+1}^m Xn[i] ) ln( n(1 − jq) / Σ_{i=j+1}^m Xn[i] )
        = Σ_{i=1}^j Zni²/(2q) + ( Σ_{i=1}^j Zni )²/(2(1 − jq)) + op(1),  j = 1, …, k.

By Corollary 2.1, (4.1) follows from (4.4). ¤

The likelihood ratio statistic and its asymptotic distribution for (1.2) can be derived similarly.

Although the likelihood ratio test for (1.1) fully utilizes Xn[1], …, Xn[k], Theorem 4.1 shows that it is complicated and depends on the unknown parameter k. We now consider other tests that fully utilize Xn[1], …, Xn[k]. We first need to estimate k. Assume that k̂n is a consistent estimator of k, i.e., P(k̂n = k) → 1. Then

    p̂[1]n = Σ_{i=1}^{k̂n} Xn[i] / (n k̂n)

is an estimator of p[1]. We adopt p̂[1]n as the test statistic for (1.1) and (1.2) on the basis of the following theorem.

Theorem 4.2.

    √n( p̂[1]n − p[1] ) →d N( 0, (p[1] − k p[1]²)/k ).

Proof. By Corollary 2.2,

    Qn := √n( (1/k) Σ_{i=1}^k Xn[i]/n − p[1] ) →d (1/k) Σ_{i=1}^k ξi ~ N( 0, (p[1] − k p[1]²)/k ),

where (ξ1, …, ξk)' ~ N(0, (p[1](δij − p[1]))k×k). For each x ∈ R,

    P( √n(p̂[1]n − p[1]) ≤ x ) = P(Qn ≤ x) − P(Qn ≤ x, k̂n ≠ k) + P( √n(p̂[1]n − p[1]) ≤ x, k̂n ≠ k ) = P(Qn ≤ x) + o(1).

The theorem follows. ¤

From the theorem above, two large sample tests for (1.1) and (1.2) can be established. Given level α, when n is sufficiently large, we reject H1 if √n(p̂[1]n − q) > uα √((q − k̂n q²)/k̂n), and reject H2 if √n(p̂[1]n − q) < −uα √((q − k̂n q²)/k̂n), where uα is the upper α quantile of the standard normal distribution Φ. Denote these two tests for (1.1) and (1.2) by φ'1 and φ'2, respectively.

We now study their local asymptotic efficiency. Similarly to the proof of Theorem 4.2, under the local alternatives K1,n, √n(p̂[1]n − q) →d N(δ, (q − kq²)/k). Hence the asymptotic power of φ'1 is

    β(q, α, δ, k) = lim_{n→∞} P( √n(p̂[1]n − q) > uα √((q − k̂n q²)/k̂n) | K1,n ) = Φ( −uα + δ √(k/(q − kq²)) ).

Similarly, the asymptotic power of φ'2 under K2,n is

    lim_{n→∞} P( √n(p̂[1]n − q) < −uα √((q − k̂n q²)/k̂n) | K2,n ) = β(q, α, δ, k).
Fig. 2. Curves of β(0.3, 0.05, δ, k), k = 1, 2, 3.
The curves of β for k = 1, 2, 3 with q = 0.3 and α = 0.05 are shown in Fig. 2. All the curves lie above the line β = α and increase much faster than β1 or β2 as δ increases. In the sense of asymptotic power under local alternatives, φ'1 and φ'2 are therefore superior to φ1 and φ2, respectively. Based on the above discussion, we conclude that when n is small, φ1 and φ2 can be used as exact tests; when n is large, because φ1 and φ2 are not easy to calculate and their local asymptotic efficiency is low, we recommend φ'1 and φ'2.

Now we give a consistent estimator of k. A natural candidate is the MLE, but from the proof of Theorem 4.1 the MLE of k can be any element of {1, …, #{i: Xni = Xn[1]}}, and P(#{i: Xni = Xn[1]} = 1) → 1 as n → ∞. Hence the MLE of k is not reasonable. Instead we adopt the estimator

    k̂n = arg max_{k=1,…,m−1} ( (Xn[k] − Xn[k+1])/n − λn (k Xn[1] − Σ_{i=1}^k Xn[i])/n ),   (4.5)

where {λn} is a sequence satisfying λn → ∞ and λn/√n → 0.

Theorem 4.3. If the true value of k is k0 with k0 < m, then P(k̂n = k0) → 1.

Proof. Let Un(k) = (Xn[k] − Xn[k+1])/n − λn (k Xn[1] − Σ_{i=1}^k Xn[i])/n. If k ≤ k0, then from Corollary 2.1, √n (k Xn[1] − Σ_{i=1}^k Xn[i])/n = Op(1), so λn (k Xn[1] − Σ_{i=1}^k Xn[i])/n →P 0, where '→P' denotes convergence in probability. Therefore Un(k0) →P p[1] − p[k0+1] > 0, while Un(k) →P 0 for k < k0. If k > k0, then (k Xn[1] − Σ_{i=1}^k Xn[i])/n →a.s. (k − k0) p[1] − (p[k0+1] + ··· + p[k]) > 0, so Un(k) →a.s. −∞. Hence P(Un(k0) > Un(k)) → 1 for all k ≠ k0, which completes the proof. ¤

From the proof above, Un(k) → 0 for k < k0 while Un(k) → −∞ for k > k0, so underestimating k is the more likely error. We should therefore choose {λn} in (4.5) tending to infinity very slowly; in our experience, λn = log10 log10 n may be a good choice.

A remaining question is how large n needs to be before φ'1 and φ'2 can be used. We present numerical results evaluating the performance of φ'1 (the case of φ'2 is similar). Given n and p, we generate 10,000 random samples from mult(n; p) and set q = p[1]. For each random multinomial sample, we test (1.1) using φ'1 (with λn = log10 log10 n) at level α = 0.05. Table 1 shows the frequencies of acceptance of H1.

Table 1  Frequencies of acceptance of H1 with α = 0.05

  n\p    (0.4,0.3,0.2,0.1)'    (0.4,0.2,0.2,0.2)'    (0.4,0.4,0.15,0.05)'   (0.4,0.4,0.1,0.1)'
  100    0.960                 0.959                 0.909                  0.912
  200    0.952                 0.951                 0.933                  0.944
  300    0.945                 0.946                 0.942                  0.951
  400    0.955                 0.953                 0.954                  0.952

  n\p    (0.3,0.25,0.25,0.2)'  (0.3,0.3,0.25,0.15)'  (0.3,0.3,0.2,0.2)'     (0.3,0.3,0.3,0.1)'
  300    0.955                 0.886                 0.887                  0.922
  500    0.944                 0.900                 0.905                  0.957
  1000   0.944                 0.915                 0.922                  0.949
  1200   0.952                 0.921                 0.934                  0.955
  1500   0.952                 0.936                 0.937                  0.955
  1800   0.943                 0.942                 0.943                  0.949

It follows from Table 1 that for k = 1, n ∈ (200, 300) is enough; for k > 1 with p[k+1] not close to p[1], n ∈ (300, 500) is enough; otherwise, n should be at least greater than 1,000. How to find better tests, or how to improve the performance of the large sample tests given in this paper, is a valuable problem and will be considered in future work.
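The complete large sample procedure of Section 4 (order the counts, estimate k by (4.5) with λn = log10 log10 n, form p̂[1]n, and compare √n(p̂[1]n − q) with the normal critical value) can be sketched in Python as follows (our illustration; the function names are not from the paper, and only numpy and the standard library are assumed):

```python
import numpy as np
from math import log10, sqrt
from statistics import NormalDist

def k_hat(counts, lam):
    """Estimator (4.5) of k = #{i: p_i = p_[1]}."""
    x = np.sort(np.asarray(counts))[::-1]    # Xn[1] >= ... >= Xn[m]
    n = x.sum()
    u = [(x[k - 1] - x[k]) / n - lam * (k * x[0] - x[:k].sum()) / n
         for k in range(1, len(x))]          # Un(k), k = 1, ..., m-1
    return int(np.argmax(u)) + 1

def phi1_prime(counts, q, alpha=0.05):
    """Large sample test phi'_1 of H1: p_[1] <= q. Returns (reject, k, p_hat)."""
    x = np.sort(np.asarray(counts))[::-1]
    n = int(x.sum())
    k = k_hat(counts, log10(log10(n)))       # lambda_n = log10 log10 n
    p_hat = x[:k].sum() / (n * k)            # estimator of p_[1]
    z = sqrt(n) * (p_hat - q)
    crit = NormalDist().inv_cdf(1 - alpha) * sqrt((q - k * q * q) / k)
    return bool(z > crit), k, float(p_hat)

# RNA donor site counts from Section 5: 614 A's, 38 C's, 564 G's, 38 U's
print(phi1_prime([614, 38, 564, 38], q=0.45))   # k = 2, p_hat ~ 0.4697, H1 rejected
```

On the data of Section 5 this reproduces k̂n = 2 and p̂[1]n ≈ 0.4697.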
5  An example
It is well known that the four possible bases in an RNA sequence are A, C, G and U. We are interested in the base distribution at each position of an aligned RNA donor signal¹. Take position +3 as an example. Among 1254 donor site regions, there are 614 A's, 38 C's, 564 G's and 38 U's. Let n = 1254 and (Xn1, Xn2, Xn3, Xn4)' ~ mult(n; p), where Xni, i = 1, 2, 3, 4, are the numbers of A, C, G and U at this position, respectively. Using these data, we test

    H1: p[1] ≤ 0.45  versus  K1: p[1] > 0.45.

(i) We adopt the test φ1 defined in Section 3. Let p1 = (0.45, 0.45, 0.1, 0)'; then

    p-value = P(Xn[1] ≥ xn[1] | p1) = P(Xn[1] ≥ 614 | p1).

Since n is large, it is difficult to calculate the p-value exactly, so we use two approximate methods. First, the Monte Carlo method: generating 50,000 random samples from mult(1254; p1) gives p-value ≈ 0.0053. Second, the asymptotic distribution of Xn[1]: let (ξ1, ξ2)' ~ N(0, Σ1), where

    Σ1 = ( 0.45 × 0.55    −0.45²      )
         ( −0.45²         0.45 × 0.55 ).

By Corollary 2.3, P(Xn[1] ≥ xn[1] | p1) ≈ P( max(ξ1, ξ2) ≥ √n(xn[1]/n − 0.45) ). Computing the right side by the Monte Carlo method with 500,000 random samples from N(0, Σ1) gives p-value ≈ 0.0048. The second method is computationally faster than the first.

(ii) Because n = 1254 is large, we can also apply the large sample test φ'1. Taking λn = log10 log10 n in (4.5), we get k̂n = 2 and p̂[1]n = (614 + 564)/(2n) = 0.4697, so

    p-value = 1 − Φ( √n(p̂[1]n − 0.45) / √((0.45 − 2 × 0.45²)/2) ) ≈ 1.66 × 10⁻⁶.

The p-values derived in (i) and (ii) are both small enough to provide strong evidence against H1. That the p-value in (i) is greater than the p-value in (ii) accords with the conclusion in Section 4 that the local asymptotic efficiency of φ1 is lower than that of φ'1.

¹Burge, C., Identification of genes in human genomic DNA, Ph.D. dissertation, Department of Mathematics, Stanford University, 1990, p.76.
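Both p-value computations above can be reproduced with a short script (our cross-check, not the authors' code; numpy is assumed to be available):

```python
import numpy as np
from math import sqrt
from statistics import NormalDist

rng = np.random.default_rng(1)
n, x_max = 1254, 614

# (i) Monte Carlo p-value under the least favorable p1 = (0.45, 0.45, 0.1, 0)'
p1 = [0.45, 0.45, 0.1, 0.0]
sims = rng.multinomial(n, p1, size=50_000).max(axis=1)
p_mc = np.mean(sims >= x_max)               # roughly 0.005

# (ii) Normal approximation used by phi'_1 with k_hat = 2
p_hat = (614 + 564) / (2 * n)               # 0.4697
z = sqrt(n) * (p_hat - 0.45) / sqrt((0.45 - 2 * 0.45**2) / 2)
p_asym = 1 - NormalDist().cdf(z)            # about 1.7e-6

print(p_mc, p_asym)
```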
Appendix

Proof of Theorem 2.1. Let (x, y)' be a continuity point of the cdf of ( max_{i: bi=b[1]} ηi, min_{i: bi=b[m]} ηi )'. Then

    P( an(Yn[1] − b[1]) ≤ x, an(Yn[m] − b[m]) ≤ y )
    = P( ∩_{i=1}^m { an(Yni − bi) ≤ x + an(b[1] − bi) } )
      − P( ∩_{i=1}^m { y + an(b[m] − bi) < an(Yni − bi) ≤ x + an(b[1] − bi) } ).

Without loss of generality, assume b[1] = b1 = ··· = bk > bk+1 ≥ ··· ≥ b_{m−l−1} > b_{m−l} = ··· = bm = b[m]. Then

    P( ∩_{i=1}^k { an(Yni − bi) ≤ x } )
    ≥ P( ∩_{i=1}^m { an(Yni − bi) ≤ x + an(b[1] − bi) } )
    ≥ P( ∩_{i=1}^k { an(Yni − bi) ≤ x } ) − P( ∪_{i=k+1}^m [ { an(Yni − bi) ≤ x + an(b[1] − bi) } ]^c )
    ≥ P( ∩_{i=1}^k { an(Yni − bi) ≤ x } ) − Σ_{i=k+1}^m P( an(Yni − bi) > x + an(b[1] − bi) ).

By Lemma 2.1(1), Σ_{i=k+1}^m P( an(Yni − bi) > x + an(b[1] − bi) ) → 0. Hence

    P( ∩_{i=1}^m { an(Yni − bi) ≤ x + an(b[1] − bi) } ) = P( ∩_{i=1}^k { an(Yni − bi) ≤ x } ) + o(1).

Similarly,

    P( ∩_{i=1}^m { y + an(b[m] − bi) < an(Yni − bi) ≤ x + an(b[1] − bi) } )
    = P( [ ∩_{i=1}^k { an(Yni − bi) ≤ x } ] ∩ [ ∩_{i=m−l}^m { an(Yni − bi) > y } ] ) + o(1).

Finally,

    P( an(Yn[1] − b[1]) ≤ x, an(Yn[m] − b[m]) ≤ y )
    = P( ∩_{i=1}^k { an(Yni − bi) ≤ x } ) − P( [ ∩_{i=1}^k { an(Yni − bi) ≤ x } ] ∩ [ ∩_{i=m−l}^m { an(Yni − bi) > y } ] ) + o(1)
    = P( max_{i: bi=b[1]} an(Yni − bi) ≤ x, min_{i: bi=b[m]} an(Yni − bi) ≤ y ) + o(1)
    → P( max_{i: bi=b[1]} ηi ≤ x, min_{i: bi=b[m]} ηi ≤ y ). ¤
Proof of Theorem 2.2. (1) For every i = 1, …, m, Yni →P bi. So P(Ynj > Ynk) → 1 for all j ∈ {1, …, k_1} and k ∈ {k_1+1, …, m}. Thus P( (In1, …, In_{k_1})' = (1, …, k_1)' ) → 1. Similarly, for every i = 0, …, l − 1, P( (I_{n,k_i+1}, …, I_{n,k_{i+1}})' = (k_i+1, …, k_{i+1})' ) → 1. Finally, P( In = (1, 2, …, m)' ) → 1.

(2) Let x = (x1, …, xm)' be a continuity point of the cdf of η. We have

    Σ_{j1,…,jm} P( [ ∩_{i=1}^m { an(YnIni − bi) ≤ xi } ] ∩ { Ynj1 > ··· > Ynjm } )
    ≤ P( ∩_{i=1}^m { an(YnIni − bi) ≤ xi } )
    ≤ Σ_{j1,…,jm} P( [ ∩_{i=1}^m { an(YnIni − bi) ≤ xi } ] ∩ { Ynj1 > ··· > Ynjm } ) + Σ_{j≠k} P( Ynj = Ynk ),

where Σ_{j1,…,jm} denotes the sum over all permutations j1, …, jm of 1, …, m. Since 0 is a continuity point of the cdf of ηj − ηk, Lemma 2.1(2) gives P(Ynj = Ynk) = P( an(Ynj − bj) − an(Ynk − bk) = an(bk − bj) ) → 0. Therefore

    P( ∩_{i=1}^m { an(YnIni − bi) ≤ xi } )
    = Σ_{j1,…,jm} P( [ ∩_{i=1}^m { an(YnIni − bi) ≤ xi } ] ∩ { Ynj1 > ··· > Ynjm } ) + o(1)
    = Σ_{j1,…,jm} P( [ ∩_{i=0}^{l−1} { an(YnI_{n,k_i+1} − b_{k_i+1}) ≤ x_{k_i+1}, …, an(YnI_{n,k_{i+1}} − b_{k_{i+1}}) ≤ x_{k_{i+1}} } ]
        ∩ { an(Ynj1 − bj1) − an(Ynj2 − bj2) > an(bj2 − bj1), …, an(Ynj_{m−1} − bj_{m−1}) − an(Ynjm − bjm) > an(bjm − bj_{m−1}) } ) + o(1).

Similarly to the proof of Theorem 2.1, considering the three cases b_{j_{i+1}} − b_{j_i} > 0, < 0 and = 0, only the permutations with {j1, …, j_{k_1}} = {1, …, k_1}, …, {j_{k_{l−1}+1}, …, jm} = {k_{l−1}+1, …, m} contribute in the limit. Hence

    P( an(YnIn1 − b1) ≤ x1, …, an(YnInm − bm) ≤ xm )
    = Σ_{{j1,…,j_{k_1}}={1,…,k_1}} ··· Σ_{{j_{k_{l−1}+1},…,jm}={k_{l−1}+1,…,m}} P( [ ∩_{i=1}^m { an(Yni − bi) ≤ xi } ]
        ∩ [ ∩_{i=0}^{l−1} { an(Ynj_{k_i+1} − b_{k_i+1}) > ··· > an(Ynj_{k_{i+1}} − b_{k_{i+1}}) } ] ) + o(1)
    = P( ∩_{i=1}^m { an(Yni − bi) ≤ xi } ) + o(1) → P( ∩_{i=1}^m { ηi ≤ xi } ). ¤
Acknowledgements  This work was supported in part by the National Natural Science Foundation of China (Grant No. 10371126).
References
1. Dykstra, R.L., Lang, J.B., Myongsik, O., et al., Order restricted inference for hypotheses concerning qualitative dispersion, J. Statist. Plann. Inference, 2002, 107: 249–265.
2. Lee, C-L.C., Yan, X., Chi-squared tests for and against uniform stochastic ordering on multinomial parameters, J. Statist. Plann. Inference, 2002, 107: 267–280.
3. Gelfand, A.E., Glaz, J., Kuo, L., et al., Inference for the maximum cell probability under multinomial sampling, Naval Res. Logist., 1992, 39: 97–114.
4. Glaz, J., Sison, C.P., Simultaneous confidence intervals for multinomial proportions, J. Statist. Plann. Inference, 1999, 82: 251–262.
5. Read, T.R.C., Cressie, N., Goodness-of-fit Statistics for Discrete Multivariate Data, New York: Springer-Verlag, 1988.
6. Young, D.H., Two alternatives to the standard χ²-test of the hypothesis of equal cell frequencies, Biometrika, 1962, 49: 107–116.
7. Ethier, S.N., Testing for favorable numbers on a roulette wheel, J. Amer. Statist. Assoc., 1982, 77: 660–665.
8. Marshall, A.W., Olkin, I., Inequalities: Theory of Majorization and Its Applications, New York: Academic Press, 1979.
9. Johnson, N.L., Young, D.H., Some applications of two approximations to the multinomial distribution, Biometrika, 1960, 47: 463–469.
10. Johnson, N.L., Young, D.H., Kotz, S., Discrete Multivariate Distributions, New York: John Wiley & Sons, Inc., 1997.
11. Mitra, S.K., On the limiting power function of the frequency chi-square test, Ann. Math. Statist., 1958, 29: 1221–1233.
12. Hardy, G.H., Littlewood, J.E., Polya, G., Inequalities, 2nd ed., Cambridge: Cambridge University Press, 1952.