Ensemble Estimation of Multivariate f-Divergence
Kevin Moon, Alfred O. Hero III
University of Michigan, Department of Electrical Engineering and Computer Science

Introduction

Application to Divergence Estimation

Proofs of Theorems 2 and 3

๏‚ง ๐‘“-divergence measures the difference between distributions

▪ Setup: X_1, …, X_N, X_{N+1}, …, X_{N+M_2} are i.i.d. samples from f_2 and Y_1, …, Y_{M_1} are i.i.d. samples from f_1; T = N + M_2.
▪ Plug-in estimators: the k-nn density estimate is

▪ Principal tools are concentration inequalities and moment bounds applied to a higher-order Taylor expansion of G_k.
▪ Due to the dependence of G_k on the likelihood ratio, bounds on covariances of products of the density estimates are derived.

• Has the form G(f_1, f_2) = ∫ g(f_1(x)/f_2(x)) f_2(x) dx, where f_1 and f_2 are densities and g is a smooth convex function
• Includes the KL and Rényi-α divergences
• Important in statistics, machine learning, and information theory
▪ Convergence rates are unknown for most divergence estimators
▪ We derive the MSE convergence rates for kernel density plug-in f-divergence estimators using the k-nn kernel
• Assume the densities are smooth and two finite populations of i.i.d. samples are available

f_{i,k}(X_j) = k / (M_i c ρ_{i,k}^d(j)).

Weighted Ensemble Estimation

Let l̄ = {l_1, …, l_L} be a set of index values and {E_l}_{l∈l̄} be an ensemble of estimates of E. The weighted ensemble estimator is

E_w = Σ_{l∈l̄} w(l) E_l,

where ๐‘™โˆˆ๐‘™ ๐‘ค ๐‘™ = 1. Consider the following conditions on ๐‘ฌ๐‘™ ๏‚ง ๐ถ. 1 The bias is given by Bias(๐‘ฌ๐‘™ ) =

๐‘‡ โˆ’๐‘–/2๐‘‘

๐‘๐‘– ๐œ“๐‘– ๐‘™

+๐‘‚

๐‘–โˆˆ๐ฝ

1 ๐‘‡

: ๐‘™โˆˆ๐‘™

,

where ๐‘๐‘– are constants depending on the density, ๐ฝ is a finite index set with length< ๐ฟ, min J > 0 and max ๐ฝ โ‰ค ๐‘‘, ๐‘‘ is the dimension of the data, and ๐œ“๐‘– ๐‘™ are basis functions depending only on the parameter ๐‘™. ๏‚ง ๐ถ. 2 The variance is given by Var ๐‘ฌ๐‘™ = ๐‘๐‘ฃ

,

Experiments

G_k = (1/N) Σ_{i=1}^{N} g( f_{1,k}(X_i) / f_{2,k}(X_i) ).
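A minimal runnable sketch of this plug-in construction, using brute-force nearest-neighbor search in NumPy. The sample sizes, Gaussian stand-ins for f_1 and f_2, and the value of k are illustrative choices, not the poster's settings:

```python
import numpy as np
from math import gamma, pi

def knn_density(points, data, k):
    """k-nn density estimate f_{i,k}: k / (M * c * rho_k^d) at each query point."""
    m, d = data.shape
    c = pi ** (d / 2) / gamma(d / 2 + 1)                  # volume of the d-dim unit ball
    dists = np.linalg.norm(points[:, None, :] - data[None, :, :], axis=2)
    rho = np.sort(dists, axis=1)[:, k - 1]                # distance to k-th neighbor
    return k / (m * c * rho ** d)

def divergence_plugin(g, x, y, k):
    """G_k = (1/N) sum_i g(f_{1,k}(X_i) / f_{2,k}(X_i)); x is split into
    X_1..X_N (evaluation points) and X_{N+1}..X_{N+M2} (used to estimate f_2)."""
    n = x.shape[0] // 2
    x_eval, x_rest = x[:n], x[n:]
    return np.mean(g(knn_density(x_eval, y, k) / knn_density(x_eval, x_rest, k)))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(1000, 2))    # samples from f_2 = N(0, I)
y = rng.normal(0.5, 1.0, size=(500, 2))     # samples from f_1 = N(0.5*1, I)
est = divergence_plugin(lambda t: t * np.log(t), x, y, k=15)
print(est)  # rough (biased) plug-in estimate of KL(f_1 || f_2)
```

In practice a KD-tree (e.g. scipy.spatial.cKDTree) would replace the O(N·M) brute-force distance matrix.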

▪ Assumptions:
1. f_1, f_2, and g are smooth (d times differentiable)
2. f_1 and f_2 have bounded support
3. f_1 and f_2 are strictly lower bounded
4. M_1 = O(M_2) and k = k_0 √M_2
▪ MSE convergence rates: under the above assumptions,

Theorem 2. The bias of the plug-in estimator G_k is

Bias(G_k) = Σ_{j=1}^{d} O((k/M_2)^(j/d)) + O(1/k) + o(k/M_2 + 1/k).


We apply Theorem 1 to f-divergence estimators to obtain the parametric rate of O(1/T). To do this, we verify conditions C.1 and C.2.

Truncated Gaussians, ๐‘‘ = 5


Theorem 3. The variance of the plug-in estimator G_k is

Var(G_k) = O(1/N + 1/M_2) + o(1/N + 1/M_2 + 1/k²).

Truncated Gaussians, ๐‘‡ = 3000

[Figure: non-truncated Gaussians, d = 5]

Conclusions
▪ Derived MSE convergence rates for a plug-in estimator of f-divergence
▪ Obtained an estimator with a convergence rate of O(1/T) by applying the theory of optimally weighted ensemble estimation
• Simple and performs well in higher dimensions
• Performs well for densities with unbounded support


where ๐‘ค0 is the solution to the following convex optimization problem: min ๐‘ค 2

▪ Truncated Gaussian pdfs with different means and covariances
▪ Estimate the Rényi-α divergence with α = 0.8 over 100 trials
▪ Two experiments: fixed d with increasing T; fixed T with increasing d
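One simple way to generate such truncated samples is rejection sampling. This sketch truncates a Gaussian to the unit cube; the mean and covariance are illustrative, not necessarily the poster's exact settings:

```python
import numpy as np

def truncated_gaussian(rng, mean, cov, n, d):
    """Rejection-sample N(mean, cov) restricted to [0, 1]^d."""
    out = np.empty((0, d))
    while out.shape[0] < n:
        cand = rng.multivariate_normal(mean, cov, size=4 * n)
        keep = np.all((cand >= 0.0) & (cand <= 1.0), axis=1)  # inside the cube
        out = np.vstack([out, cand[keep]])
    return out[:n]

rng = np.random.default_rng(1)
d = 5
samples = truncated_gaussian(rng, 0.3 * np.ones(d), 0.1 * np.eye(d), n=500, d=d)
print(samples.shape)  # (500, 5)
```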


Theorem 1 [2]. Under the conditions C.1 and C.2, there exists a weight vector w_0 such that

E[(E_{w_0} − E)²] = O(1/T).



where ๐‘˜ โ‰ค ๐‘€๐‘– , ๐‘ is the volume of a ๐‘‘-dimensional unit ball, and ๐‘‘ ๐œŒ๐‘–,๐‘˜ ๐‘— is the distance to the ๐‘˜ th nearest neighbor of ๐‘‹๐‘— in ๐‘Œ1 , โ€ฆ , ๐‘Œ๐‘€1 or ๐‘‹๐‘+1 , โ€ฆ , ๐‘‹๐‘+๐‘€2 for ๐‘– = 1,2, respectively. The divergence estimate is

1 ๐‘‡

๏‚ง We obtain an estimator with rate ๐‘‚ (๐‘‡ =sample size) by applying the theory of optimally weighted ensemble estimation โ€ข More computationally tractable than competing estimators


Acknowledgments

[Figure: heat map of the predicted bias of the non-averaged f-divergence estimator, based on Theorem 2, as a function of dimension and sample size]

▪ Weighted ensemble divergence estimator: choose L > d − 1 and an index set l̄ = {l_1, …, l_L}. Let k(l) = l √M_2 and

G_w = Σ_{l∈l̄} w(l) G_{k(l)}.

From Theorem 2, the bias of {G_{k(l)}}_{l∈l̄} satisfies C.1 with ψ_i(l) = l^(i/d) and J = {1, …, d − 1}. From Theorem 3, the variance of G_{k(l)} also has the general form required by C.2. Theorem 1 then gives the optimal weight w_0, yielding a convergence rate of O(1/T).
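Putting the pieces together, a compact end-to-end sketch of G_w: compute G_{k(l)} for k(l) = l √M_2, solve for the minimum-norm weights cancelling the ψ_i(l) = l^(i/d) bias terms, and combine. Brute-force k-nn search and Gaussian stand-ins for f_1 and f_2; all sizes are illustrative:

```python
import numpy as np
from math import gamma, pi

def knn_density(points, data, k):
    """k-nn density estimate k / (M * c * rho_k^d) at each query point."""
    m, d = data.shape
    c = pi ** (d / 2) / gamma(d / 2 + 1)              # unit-ball volume
    dists = np.linalg.norm(points[:, None] - data[None], axis=2)
    rho = np.sort(dists, axis=1)[:, k - 1]
    return k / (m * c * rho ** d)

def g_k(g, x_eval, x_rest, y1, k):
    """Plug-in estimate G_k at a single k."""
    return np.mean(g(knn_density(x_eval, y1, k) / knn_density(x_eval, x_rest, k)))

rng = np.random.default_rng(2)
d = 2
x = rng.normal(0.0, 1.0, size=(1200, d))              # samples from f_2
x_eval, x_rest = x[:600], x[600:]                     # X_1..X_N | X_{N+1}..X_{N+M2}
y1 = rng.normal(0.5, 1.0, size=(600, d))              # samples from f_1
M2 = x_rest.shape[0]

l_vals = np.arange(2.0, 2.0 + d + 1)                  # L = d + 1 > d - 1 index values
A = np.vstack([np.ones(len(l_vals))] +
              [l_vals ** (i / d) for i in range(1, d)])
b = np.zeros(d)
b[0] = 1.0
w0, *_ = np.linalg.lstsq(A, b, rcond=None)            # minimum-norm weights

ks = np.maximum(1, (l_vals * np.sqrt(M2)).astype(int))  # k(l) = l * sqrt(M2)
g_fun = lambda t: t * np.log(t)                         # KL choice of g
G_w = sum(w * g_k(g_fun, x_eval, x_rest, y1, k) for w, k in zip(w0, ks))
print(G_w)  # weighted ensemble divergence estimate
```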

This work was partially supported by NSF grant CCF-1217880 and an NSF Graduate Research Fellowship to the first author under Grant No. F031543.

References
[1] K. Moon and A. Hero, "Ensemble estimation of multivariate f-divergence," submitted to ISIT 2014.
[2] K. Sricharan, D. Wei, and A. Hero, "Ensemble estimators for multivariate entropy estimation," IEEE Trans. on Information Theory, vol. 59, no. 7, pp. 4374–4388, 2013.
