Shrinkage Estimation of High Dimensional Covariance Matrices

Yilun Chen, Ami Wiesel, Alfred O. Hero III
Department of EECS, University of Michigan

April 22, 2009

Outline

Introduction
The Rao-Blackwell Ledoit-Wolf estimator
The Oracle Approximating Shrinkage estimator
Simulation
Conclusion

Introduction

Covariance estimation

Importance: PCA, factor analysis, regression, ...

Growing interest in the "large p, small n" regime: beamforming (Stoica 2008, 2009), DOA estimation (Abramovich 2008), gene expression arrays (Schäfer 2005, 2006), hyperspectral remote sensing, image classification, ... Classical estimation methods do not perform well in this regime.

Previous work: Stein (1961, 1975), Haff (1980, 1991), Dey-Srinivasan (1985), Ledoit-Wolf (2003, 2004), ...


Problem formulation

$x$: a $p$-dimensional, zero-mean, real Gaussian random vector
$\{x_i\}_{i=1}^n$: $n$ independent realizations of $x$
Use $\{x_i\}_{i=1}^n$ to estimate $\Sigma = E[xx^T]$
Seek the estimate $\hat\Sigma(\{x_i\}_{i=1}^n)$ with minimum MSE:
$$\min_{\hat\Sigma}\ E\big[\|\hat\Sigma(\{x_i\}_{i=1}^n) - \Sigma\|_F^2\big]$$


Shrinkage estimators

Sample covariance:
$$\hat S = \frac{1}{n}\sum_{i=1}^n x_i x_i^T$$
Unbiased and the ML estimate (for $n > p$). Ill-conditioned and performs poorly for large $p$.

Shrinkage estimator:
$$\hat\Sigma = \rho\,\hat F + (1-\rho)\,\hat S, \qquad \hat F = \frac{\mathrm{tr}(\hat S)}{p}\, I$$
a form of diagonal loading; $\rho$ controls a bias-variance tradeoff.
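To make these objects concrete, here is a minimal NumPy sketch (my own illustration, not code from the talk) of the two building blocks every estimator below uses; it assumes the $n$ zero-mean samples are stored as the rows of an $n \times p$ array `X`.

```python
import numpy as np

def sample_cov(X):
    """Sample covariance S = (1/n) * sum_i x_i x_i^T for zero-mean rows of X."""
    n = X.shape[0]
    return X.T @ X / n

def shrinkage_target(S):
    """Scaled-identity target F = tr(S)/p * I."""
    p = S.shape[0]
    return np.trace(S) / p * np.eye(p)
```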

Shrinkage Estimation of High Dimensional Covariance Matrices Introduction

The Ledoit-Wolf estimator

Optimal shrinkage coefficient:
$$\rho = \frac{E\big[\mathrm{tr}\big((\Sigma - \hat S)(\hat F - \hat S)\big)\big]}{E\big[\|\hat S - \hat F\|_F^2\big]}$$
$\rho$ depends on the unknown $\Sigma$ and cannot be implemented; Ledoit and Wolf use $\hat\rho_{LW}$ as a consistent estimate of $\rho$ (under any distribution):
$$\hat\rho_{LW} = \frac{\sum_{i=1}^n \big\|x_i x_i^T - \hat S\big\|_F^2}{n^2\big[\mathrm{tr}(\hat S^2) - \mathrm{tr}^2(\hat S)/p\big]}$$
$\hat\rho_{LW} \to \rho$ in probability as $n, p \to \infty$ with $p/n$ fixed.
Can be improved in the Gaussian setting.

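A hedged NumPy sketch of the LW coefficient and plug-in estimator (my own reading of the formulas above, not the authors' code); as is common, the coefficient is clipped to $[0, 1]$.

```python
import numpy as np

def lw_estimator(X):
    """Ledoit-Wolf: rho = sum_i ||x_i x_i^T - S||_F^2 / (n^2 [tr(S^2) - tr(S)^2 / p])."""
    n, p = X.shape
    S = X.T @ X / n                                  # zero-mean data assumed
    num = sum(np.linalg.norm(np.outer(x, x) - S, 'fro')**2 for x in X)
    den = n**2 * (np.trace(S @ S) - np.trace(S)**2 / p)
    rho = min(num / den, 1.0)                        # clip, as with the RBLW/OAS below
    F = np.trace(S) / p * np.eye(p)
    return rho * F + (1 - rho) * S, rho
```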


The Rao-Blackwell Ledoit-Wolf (RBLW) estimator

The Rao-Blackwell theorem
If $\hat\theta$ is an estimator of $\theta$ and $T$ is a sufficient statistic, then $E[\hat\theta \mid T]$ is typically a better estimator than $\hat\theta$, and is never worse.
$\hat S$ is the minimal sufficient statistic for $\Sigma$.
The RBLW estimator, $E[\hat\Sigma_{LW} \mid \hat S]$, is thus at least as good as $\hat\Sigma_{LW}$.


Theorem. The RBLW estimator also has a shrinkage form,
$$\hat\Sigma_{RBLW} = \hat\rho_{RBLW}\,\hat F + (1 - \hat\rho_{RBLW})\,\hat S,$$
where
$$\hat\rho_{RBLW} = \frac{\tfrac{n-2}{n}\,\mathrm{tr}(\hat S^2) + \mathrm{tr}^2(\hat S)}{(n+2)\big[\mathrm{tr}(\hat S^2) - \mathrm{tr}^2(\hat S)/p\big]}.$$

Proof: Wishart/singular Wishart distributions and Haar integrals such as
$$\int_{X_i X_i^T \prec M} \|X_i\|^4\, \frac{\big|M - X_i X_i^T\big|^{(m-1)/2}}{|M|^{(m+1)/2}}\, dX_i = \frac{\pi^{p/2}\,\Gamma(m/2 + 1)}{4\,\Gamma\big((m+p)/2 + 3\big)}\,\big[2\,\mathrm{tr}(M^2) + \mathrm{tr}^2(M)\big]$$


Dominance
By the Rao-Blackwell theorem, this estimator satisfies
$$E\big[\|\hat\Sigma_{RBLW} - \Sigma\|_F^2\big] \le E\big[\|\hat\Sigma_{LW} - \Sigma\|_F^2\big]$$
for any $\Sigma \succ 0$. In practice we use $\hat\rho^*_{RBLW} = \min(\hat\rho_{RBLW}, 1)$.

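A corresponding sketch of the RBLW estimator with the clipped coefficient $\hat\rho^*_{RBLW}$ (again my own illustration, under the same conventions as the earlier snippets):

```python
import numpy as np

def rblw_estimator(X):
    """RBLW: closed-form rho from the theorem, clipped at 1."""
    n, p = X.shape
    S = X.T @ X / n                                  # zero-mean data assumed
    tr_S2, tr2_S = np.trace(S @ S), np.trace(S)**2
    rho = ((n - 2) / n * tr_S2 + tr2_S) / ((n + 2) * (tr_S2 - tr2_S / p))
    rho = min(rho, 1.0)                              # rho*_RBLW = min(rho_RBLW, 1)
    F = np.trace(S) / p * np.eye(p)
    return rho * F + (1 - rho) * S, rho
```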

The Oracle Approximating Shrinkage (OAS) estimator

The Oracle estimator

Optimal shrinkage in the Gaussian setting:
$$\min_{\rho}\ E\big[\|\hat\Sigma_O - \Sigma\|_F^2\big] \quad \text{s.t.}\quad \hat\Sigma_O = \rho\,\hat F + (1-\rho)\,\hat S$$

Theorem. If $\{x_i\}_{i=1}^n$ are i.i.d. Gaussian vectors drawn from $N(0, \Sigma)$, the optimal solution is
$$\rho^* = \frac{(1 - 2/p)\,\mathrm{tr}(\Sigma^2) + \mathrm{tr}^2(\Sigma)}{(n + 1 - 2/p)\,\mathrm{tr}(\Sigma^2) + (1 - n/p)\,\mathrm{tr}^2(\Sigma)}.$$
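The oracle coefficient is easy to evaluate when the true $\Sigma$ is known, which is exactly how the simulations later use it as a benchmark. A short sketch (my own notation):

```python
import numpy as np

def oracle_rho(Sigma, n):
    """Oracle shrinkage coefficient rho* for Gaussian data; needs the true Sigma."""
    p = Sigma.shape[0]
    tr_S2, tr2_S = np.trace(Sigma @ Sigma), np.trace(Sigma)**2
    return ((1 - 2/p) * tr_S2 + tr2_S) / \
           ((n + 1 - 2/p) * tr_S2 + (1 - n/p) * tr2_S)
```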


Approximating the Oracle

$\rho^*$ is a function of $\Sigma$ and cannot be implemented directly. Instead, approximate $\rho^*$ by an iterative process.

Iteration scheme
Start from an initial guess $\hat\Sigma_0$, say $\hat\Sigma_0 = \hat S$. For $j = 0, 1, 2, \dots$, update $\hat\rho_{j+1}$:
$$\hat\rho_{j+1} = \frac{(1 - 2/p)\,\mathrm{tr}(\hat\Sigma_j \hat S) + \mathrm{tr}^2(\hat\Sigma_j)}{(n + 1 - 2/p)\,\mathrm{tr}(\hat\Sigma_j \hat S) + (1 - n/p)\,\mathrm{tr}^2(\hat\Sigma_j)}$$
then update $\hat\Sigma_{j+1}$:
$$\hat\Sigma_{j+1} = \hat\rho_{j+1}\,\hat F + (1 - \hat\rho_{j+1})\,\hat S.$$
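The iteration is a plug-in fixed-point loop: substitute the current estimate $\hat\Sigma_j$ for $\Sigma$ in the oracle formula, then re-shrink. A sketch (mine; the fixed iteration count is an arbitrary illustrative choice):

```python
import numpy as np

def oas_iterate(X, n_iter=50):
    """Approximate rho* by plugging the current Sigma_j into the oracle formula."""
    n, p = X.shape
    S = X.T @ X / n                                  # zero-mean data assumed
    F = np.trace(S) / p * np.eye(p)
    Sigma_j, rho = S.copy(), 0.0                     # initial guess Sigma_0 = S
    for _ in range(n_iter):
        t = np.trace(Sigma_j @ S)
        t2 = np.trace(Sigma_j)**2
        rho = ((1 - 2/p) * t + t2) / ((n + 1 - 2/p) * t + (1 - n/p) * t2)
        Sigma_j = rho * F + (1 - rho) * S
    return Sigma_j, rho
```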


Theorem. The iterative process converges to the limit
$$\hat\Sigma_{OAS} = \hat\rho^*_{OAS}\,\hat F + (1 - \hat\rho^*_{OAS})\,\hat S, \qquad \hat\rho^*_{OAS} = \min\left(\frac{(1 - 2/p)\,\mathrm{tr}(\hat S^2) + \mathrm{tr}^2(\hat S)}{(n + 1 - 2/p)\big[\mathrm{tr}(\hat S^2) - \mathrm{tr}^2(\hat S)/p\big]},\ 1\right).$$
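So in practice no iteration is needed; the limit can be computed directly. A sketch of the closed-form OAS estimator (my own, same conventions as above):

```python
import numpy as np

def oas_estimator(X):
    """OAS: closed-form limit of the plug-in iteration, clipped at 1."""
    n, p = X.shape
    S = X.T @ X / n                                  # zero-mean data assumed
    tr_S2, tr2_S = np.trace(S @ S), np.trace(S)**2
    rho = ((1 - 2/p) * tr_S2 + tr2_S) / ((n + 1 - 2/p) * (tr_S2 - tr2_S / p))
    rho = min(rho, 1.0)
    F = np.trace(S) / p * np.eye(p)
    return rho * F + (1 - rho) * S, rho
```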


Comparison between the RBLW and the OAS

RBLW:
$$\hat\rho^*_{RBLW} = \min\left(\frac{\tfrac{n-2}{n}\,\mathrm{tr}(\hat S^2) + \mathrm{tr}^2(\hat S)}{(n+2)\big[\mathrm{tr}(\hat S^2) - \mathrm{tr}^2(\hat S)/p\big]},\ 1\right)$$
OAS:
$$\hat\rho^*_{OAS} = \min\left(\frac{(1 - 2/p)\,\mathrm{tr}(\hat S^2) + \mathrm{tr}^2(\hat S)}{(n + 1 - 2/p)\big[\mathrm{tr}(\hat S^2) - \mathrm{tr}^2(\hat S)/p\big]},\ 1\right)$$
Derived from different approaches, yet they share similar forms.


The statistic
$$\hat U = \frac{1}{p-1}\left(\frac{p\,\mathrm{tr}(\hat S^2)}{\mathrm{tr}^2(\hat S)} - 1\right)$$
is the locally most powerful invariant test statistic for sphericity (John, 1971). Both coefficients can be written in terms of $\hat U$:
$$\hat\rho^*_{RBLW} = \min\left(\alpha_{RBLW} + \frac{\beta_{RBLW}}{\hat U},\ 1\right), \qquad \hat\rho^*_{OAS} = \min\left(\alpha_{OAS} + \frac{\beta_{OAS}}{\hat U},\ 1\right)$$
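A sketch of $\hat U$ and of both coefficients in the $\alpha + \beta/\hat U$ form. The slide does not spell out $\alpha$ and $\beta$; the constants below are my own algebra from the closed forms (writing each coefficient as $(a\,\mathrm{tr}(\hat S^2) + \mathrm{tr}^2(\hat S)) / (c\,[\mathrm{tr}(\hat S^2) - \mathrm{tr}^2(\hat S)/p])$ gives $\alpha = a/c$ and $\beta = (a+p)/(c(p-1))$), so treat them as an assumption:

```python
import numpy as np

def u_stat(S):
    """John's sphericity statistic U = (1/(p-1)) * (p*tr(S^2)/tr(S)^2 - 1)."""
    p = S.shape[0]
    return (p * np.trace(S @ S) / np.trace(S)**2 - 1) / (p - 1)

def rho_from_u(U, n, p, which='oas'):
    """Both shrinkage coefficients as alpha + beta/U, clipped at 1."""
    a, c = ((n - 2) / n, n + 2) if which == 'rblw' else (1 - 2/p, n + 1 - 2/p)
    alpha, beta = a / c, (a + p) / (c * (p - 1))     # my own derivation, see lead-in
    return min(alpha + beta / U, 1.0)
```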


Simulation

Simulation settings

Two classes of covariance matrices are studied:
1st-order autoregressive model (AR(1)): $\Sigma_{ij} = r^{|i-j|}$
Incremental fractional Brownian motion (IFBM) process:
$$\Sigma_{ij} = \frac{1}{2}\big[(|i-j| + 1)^{2H} - 2|i-j|^{2H} + (|i-j| - 1)^{2H}\big]$$
$p = 100$; $n$ varies from 5 to 120; each simulation is repeated 100 times.
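For reproducibility, a sketch of the two covariance generators (my own code; note the $|i-j|-1$ term is handled via an absolute value so the $i = j$ diagonal is well defined):

```python
import numpy as np

def ar1_cov(p, r):
    """AR(1): Sigma_ij = r^|i-j|."""
    idx = np.arange(p)
    return r ** np.abs(idx[:, None] - idx[None, :])

def ifbm_cov(p, H):
    """IFBM: Sigma_ij = 0.5*((d+1)^{2H} - 2*d^{2H} + |d-1|^{2H}), d = |i-j|."""
    d = np.abs(np.arange(p)[:, None] - np.arange(p)[None, :]).astype(float)
    return 0.5 * ((d + 1)**(2*H) - 2 * d**(2*H) + np.abs(d - 1)**(2*H))
```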


Comparison of MSE
MSE(Oracle) < MSE(OAS) < MSE(RBLW) < MSE(LW)
Substantial improvement from the OAS when n is small.
[Figure: MSE vs. n for AR(1), r = 0.5, p = 100, n = 5 to 120, 100 repetitions]


Comparison of shrinkage coefficients
The coefficients differ when n is small.
[Figure: shrinkage coefficients vs. n for AR(1), r = 0.5, p = 100, n = 5 to 120, 100 repetitions]


Simulation results across iterations
Performance improves as the iterations proceed.
[Figure: MSE across OAS iterations for AR(1), r = 0.5, p = 100, n = 5 to 120, 100 repetitions]


Comparison of MSE
MSE(Oracle) < MSE(OAS) < MSE(RBLW) < MSE(LW)
Substantial improvement from the OAS when n is small.
[Figure: MSE vs. n for IFBM, H = 0.7, p = 100, n = 5 to 120, 100 repetitions]


Comparison of shrinkage coefficients
The coefficients differ when n is small.
[Figure: shrinkage coefficients vs. n for IFBM, H = 0.7, p = 100, n = 5 to 120, 100 repetitions]


An example where the RBLW outperforms the OAS

Table: IFBM process — comparison of MSE and shrinkage coefficients for H = 0.9, n = 20, p = 100, repeated 100 times.

          MSE        Shrinkage coefficient
Oracle    428.9972   0.2675
OAS       475.2691   0.3043
RBLW      472.8206   0.2856
LW        475.5840   0.2867


Conclusion

Conclusion and future work

Improving the LW estimator in the Gaussian case:
RBLW: provable dominance
OAS: numerically good performance

Future work:
Theoretical analysis of the OAS; sufficient conditions for dominance
Choice of the shrinkage target $\hat F$

Thanks to Dr. Yonina Eldar for her valuable suggestions!


Thank you!


Back-up slides


Some references

C. Stein, "Inadmissibility of the usual estimator for the mean of a multivariate distribution," Proc. Third Berkeley Symp. Math. Statist. Prob., vol. 1, pp. 197-206, 1956.
O. Ledoit and M. Wolf, "A well-conditioned estimator for large-dimensional covariance matrices," Journal of Multivariate Analysis, vol. 88, no. 2, pp. 365-411, February 2004.
C. Stein, "Estimation of a covariance matrix," Rietz Lecture, 39th Annual Meeting of the IMS, Atlanta, GA, 1975.
L. R. Haff, "Empirical Bayes estimation of the multivariate normal covariance matrix," Annals of Statistics, vol. 8, no. 3, pp. 586-597, 1980.
D. K. Dey and C. Srinivasan, "Estimation of a covariance matrix under Stein's loss," Annals of Statistics, vol. 13, pp. 1581-1591, 1985.


Comparison including $\hat S$
[Figure: AR(1), r = 0.5, p = 100, 100 repetitions]


Comparison including $\hat S$
[Figure: AR(1), r = 0.9, p = 100, 100 repetitions]


Loss functions

Stein’s loss function h   i   ˆ −1 − log det ΣΣ ˆ −1 − p E tr ΣΣ Quadratic loss function h i ˆ −1 − I k2 E kΣΣ F Mean-squared error h i ˆ − Σk2 E kΣ F


Stein-Haff estimator (1975, 1991)

Keep the eigenvectors of $\hat S$ and replace its eigenvalues $l_1, \dots, l_p$ by
$$\frac{n\, l_i}{\,n - p + 1 + 2 l_i \sum_{j=1,\, j\ne i}^{p} \frac{1}{l_i - l_j}\,}$$
Shrinks the eigenvalues. Assumes Gaussian data; requires $n > p$.
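A literal sketch of the eigenvalue modification (mine; note the original method also includes an isotonizing step to keep the modified eigenvalues ordered and positive, omitted here):

```python
import numpy as np

def stein_haff(S, n):
    """Keep eigenvectors of S; map l_i -> n*l_i / (n - p + 1 + 2*l_i*sum_{j!=i} 1/(l_i - l_j))."""
    p = S.shape[0]
    l, V = np.linalg.eigh(S)
    new_l = np.empty(p)
    for i in range(p):
        s = sum(1.0 / (l[i] - l[j]) for j in range(p) if j != i)
        new_l[i] = n * l[i] / (n - p + 1 + 2 * l[i] * s)
    return V @ np.diag(new_l) @ V.T
```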


Empirical Bayes estimator [Haff, 1980]

Combination of two raw estimators:
$$\hat\Sigma_{EB} = \frac{pn - 2n - 2}{p n^2}\,\big[\det(\hat S)\big]^{1/p} I + \frac{n}{n+1}\,\hat S$$
Shrinks toward the identity matrix. Assumes Gaussian data; requires $n > p$.
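A direct transcription of the reconstructed formula into NumPy, as a sketch only; the scaling assumes $\hat S$ is the $1/n$-normalized sample covariance, which may differ from Haff's original conventions:

```python
import numpy as np

def haff_eb(S, n):
    """Empirical Bayes: [(pn-2n-2)/(p*n^2)] * det(S)^(1/p) * I + [n/(n+1)] * S."""
    p = S.shape[0]
    scale = (p * n - 2 * n - 2) / (p * n**2) * np.linalg.det(S) ** (1 / p)
    return scale * np.eye(p) + n / (n + 1) * S
```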


Dey-Srinivasan Minimax estimator (1985)

Keep the eigenvectors of $\hat S$ and replace its (descending) eigenvalues $l_1, \dots, l_p$ by
$$\frac{n\, l_i}{n + p + 1 - 2i}$$
Shrinks the eigenvalues. Assumes Gaussian data; requires $n > p$.
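A sketch of the minimax eigenvalue shrinkage (mine; it assumes $l_1 \ge \dots \ge l_p$, so the eigenvalues are sorted in descending order first):

```python
import numpy as np

def dey_srinivasan(S, n):
    """Keep eigenvectors of S; map the i-th largest eigenvalue l_i -> n*l_i / (n+p+1-2i)."""
    p = S.shape[0]
    l, V = np.linalg.eigh(S)                         # eigh returns ascending order
    order = np.argsort(l)[::-1]                      # sort descending: i = 1 is largest
    l, V = l[order], V[:, order]
    i = np.arange(1, p + 1)
    new_l = n * l / (n + p + 1 - 2 * i)
    return V @ np.diag(new_l) @ V.T
```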
