On Speeding Up Computation In Information Theoretic Learning
Sohan Seth and José C. Príncipe
Computational NeuroEngineering Lab, University of Florida, Gainesville

Introduction

With the recent progress in kernel based learning methods, computation with Gram matrices has gained considerable attention. Given n samples \{x_i\}_{i=1}^{n} and a positive definite function \kappa(x, y), the Gram matrix K_{XX} is defined as

K_{XX} = \begin{bmatrix} \kappa(x_1, x_1) & \cdots & \kappa(x_1, x_n) \\ \vdots & \ddots & \vdots \\ \kappa(x_n, x_1) & \cdots & \kappa(x_n, x_n) \end{bmatrix}.

However, the complexity of computing the entire Gram matrix is quadratic in n. Therefore, a considerable amount of work has focused on extracting the relevant information from the Gram matrix without accessing all of its elements [1, 2]. Although information theoretic learning (ITL) is conceptually different from kernel based learning, several ITL estimators can be written in terms of Gram matrices [4]. For example, the estimator of Rényi's quadratic entropy is given by

\hat{H}_2(X) = \frac{1}{n^2} \mathbf{1}^\top K_{XX} \mathbf{1}.

However, the difference between ITL and kernel based methods is that ITL estimators might involve a different type of matrix, one which is neither positive definite nor symmetric. Given samples \{x_i\}_{i=1}^{n} and \{y_i\}_{i=1}^{n} and a positive definite function \kappa, this matrix K_{XY} is defined as

K_{XY} = \begin{bmatrix} \kappa(x_1, y_1) & \cdots & \kappa(x_1, y_n) \\ \vdots & \ddots & \vdots \\ \kappa(x_n, y_1) & \cdots & \kappa(x_n, y_n) \end{bmatrix}.

This type of matrix appears in several ITL estimators, for example in the estimator of the cross-information potential (CIP), defined as

\widehat{\mathrm{CIP}}(X, Y) = \frac{1}{n^2} \mathbf{1}^\top K_{XY} \mathbf{1}.
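For concreteness, the two direct estimators above can be written in a few lines. The following is a minimal sketch (the names gram, quadratic_entropy_direct and cip_direct are ours, not part of the original formulation), using the Gaussian kernel \kappa(x, y) = \frac{1}{\sqrt{\pi}} e^{-(x-y)^2} that is also used in the simulations below; both routines touch every entry of the Gram matrix, hence the quadratic cost in n.

    import numpy as np

    def gram(x, y):
        # K[i, j] = kappa(x_i, y_j) with kappa(x, y) = exp(-(x - y)^2) / sqrt(pi)
        d = x[:, None] - y[None, :]
        return np.exp(-d ** 2) / np.sqrt(np.pi)

    def quadratic_entropy_direct(x):
        # the quantity the poster denotes H2_hat(X) = (1/n^2) 1^T K_XX 1, computed in O(n^2)
        n = len(x)
        return gram(x, x).sum() / n ** 2

    def cip_direct(x, y):
        # CIP_hat(X, Y) = (1/n^2) 1^T K_XY 1, also O(n^2)
        n = len(x)
        return gram(x, y).sum() / n ** 2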

Incomplete Cholesky decomposition

Any n × n symmetric positive definite matrix K can be expressed as

K = G G^\top,

where G is an n × n lower triangular matrix with positive diagonal entries. This decomposition is known as the Cholesky decomposition. However, if the eigenvalues of K drop rapidly, then the matrix can be approximated with arbitrary accuracy by an n × d (d ≤ n) lower triangular matrix G, i.e.,

\| K - G G^\top \| < \epsilon,

where \epsilon is a small positive number of choice and \| \cdot \| is a suitable matrix norm. This decomposition is called the incomplete Cholesky decomposition (ICD). The complexity of computing G is O(nd^2) [2].
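The factor G can be computed greedily with symmetric pivoting, in the spirit of the low-rank approximations used in [1, 2]. The following is a rough sketch under our own naming (icd, kernel, eps); it evaluates only O(nd) kernel values and stops once the trace of the residual K - G G^\top falls below \epsilon.

    import numpy as np

    def icd(x, kernel, eps=1e-6):
        # Incomplete Cholesky of K[i, j] = kernel(x[i], x[j]); returns G with K ~ G @ G.T
        n = len(x)
        diag = np.array([kernel(x[i], x[i]) for i in range(n)], dtype=float)  # residual diagonal
        G = np.zeros((n, n))
        d = 0
        while d < n and diag.sum() > eps:          # stop when the residual trace is below eps
            i = int(np.argmax(diag))               # pivot: largest residual diagonal entry
            piv = np.sqrt(diag[i])
            col = np.array([kernel(x[j], x[i]) for j in range(n)])
            G[:, d] = (col - G[:, :d] @ G[i, :d]) / piv
            diag = np.maximum(diag - G[:, d] ** 2, 0.0)
            d += 1
        return G[:, :d]

With the kernel used in the simulations one would call, for example, G = icd(x, lambda a, b: np.exp(-(a - b) ** 2) / np.sqrt(np.pi), eps=1e-6).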



Evaluation

Using G, \hat{H}_2(X) can be written as

\hat{H}_2(X) \approx \frac{1}{n^2} \mathbf{1}^\top G_{XX} G_{XX}^\top \mathbf{1} = \frac{1}{n^2} \| \mathbf{1}^\top G_{XX} \|_2^2.

Thus, the complexity of computing \hat{H}_2(X) reduces from O(n^2) to O(nd^2 + nd + d) = O(nd^2). However, a similar trick cannot be applied directly to \widehat{\mathrm{CIP}}, since K_{XY} is in general neither symmetric nor positive definite.
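With the icd sketch above this estimate is a one-liner; the O(nd^2) term comes from the decomposition itself, the O(nd) term from the column sums, and the O(d) term from the final inner product (the names are again ours).

    def quadratic_entropy_icd(x, kernel, eps=1e-6):
        # H2_hat(X) ~ (1/n^2) ||1^T G_XX||_2^2 with G_XX from icd()
        G = icd(x, kernel, eps)               # O(n d^2)
        s = G.sum(axis=0)                     # 1^T G_XX, a length-d vector, O(n d)
        return float(s @ s) / len(x) ** 2     # O(d)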





However, consider the matrix

K_{ZZ} = \begin{bmatrix} K_{XX} & K_{XY} \\ K_{YX} & K_{YY} \end{bmatrix},

where K_{YX} = K_{XY}^\top. This 2n × 2n matrix can also be generated by the samples \{z_1, \ldots, z_n, z_{n+1}, \ldots, z_{2n}\} = \{x_1, \ldots, x_n, y_1, \ldots, y_n\}, such that

K_{ZZ} = \begin{bmatrix} \kappa(z_1, z_1) & \cdots & \kappa(z_1, z_{2n}) \\ \vdots & \ddots & \vdots \\ \kappa(z_{2n}, z_1) & \cdots & \kappa(z_{2n}, z_{2n}) \end{bmatrix}.

Therefore, this matrix is again symmetric positive definite and we can perform ICD.

Define

I_1 = \begin{bmatrix} I \\ 0 \end{bmatrix} \quad \text{and} \quad I_2 = \begin{bmatrix} 0 \\ I \end{bmatrix},

where I and 0 denote the n × n identity and zero matrices, respectively. Then

K_{XY} = I_1^\top \begin{bmatrix} K_{XX} & K_{XY} \\ K_{YX} & K_{YY} \end{bmatrix} I_2 = I_1^\top K_{ZZ} I_2,

and, using the ICD factor G_{ZZ} of K_{ZZ}, \widehat{\mathrm{CIP}} can be written as

\widehat{\mathrm{CIP}} = \frac{1}{n^2} \mathbf{1}^\top I_1^\top G_{ZZ} G_{ZZ}^\top I_2 \mathbf{1} = \frac{1}{n^2} (e_1^\top G_{ZZ})(G_{ZZ}^\top e_2),

where e_1 = \{\underbrace{1, \ldots, 1}_{n}, \underbrace{0, \ldots, 0}_{n}\}^\top and e_2 = \{\underbrace{0, \ldots, 0}_{n}, \underbrace{1, \ldots, 1}_{n}\}^\top.

Therefore, in the same way, the complexity of computing \widehat{\mathrm{CIP}} reduces from O(n^2) to O(2nd_z^2 + 2nd_z + d_z) \approx O(2nd_z^2). A similar approach can be extended to other estimators, such as the estimators of divergence, mutual information and centered correntropy [4]. This approach is particularly useful when an estimator requires K_{XX}, K_{YY} and K_{XY} at the same time, as does the estimator of the correntropy coefficient [3]. In such cases we use

K_{XX} = \begin{bmatrix} I & 0 \end{bmatrix} K_{ZZ} \begin{bmatrix} I \\ 0 \end{bmatrix}, \quad K_{XY} = \begin{bmatrix} I & 0 \end{bmatrix} K_{ZZ} \begin{bmatrix} 0 \\ I \end{bmatrix}, \quad K_{YY} = \begin{bmatrix} 0 & I \end{bmatrix} K_{ZZ} \begin{bmatrix} 0 \\ I \end{bmatrix},

and apply the same approach, replacing K_{ZZ} by its incomplete Cholesky factor.
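A sketch of the stacked-sample computation, reusing the icd function from above (again with our own names). Slicing G_ZZ into its X rows and Y rows plays the role of e_1 and e_2, and the same factor also yields low-rank approximations of K_XX, K_XY and K_YY simultaneously, which is exactly what an estimator such as the correntropy coefficient requires.

    import numpy as np

    def cip_icd(x, y, kernel, eps=1e-6):
        # CIP_hat(X, Y) ~ (1/n^2) (e1^T G_ZZ)(G_ZZ^T e2)
        n = len(x)
        z = np.concatenate([x, y])     # z = (x_1, ..., x_n, y_1, ..., y_n); K_ZZ is 2n x 2n and PSD
        G = icd(z, kernel, eps)        # G_ZZ, shape (2n, d_z)
        g1 = G[:n].sum(axis=0)         # e1^T G_ZZ  (rows coming from X)
        g2 = G[n:].sum(axis=0)         # e2^T G_ZZ  (rows coming from Y)
        # the same factor also gives K_XX ~ G[:n] @ G[:n].T, K_XY ~ G[:n] @ G[n:].T, K_YY ~ G[n:] @ G[n:].T
        return float(g1 @ g2) / n ** 2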





Simulation

Parameters: \kappa(x, y) = \frac{1}{\sqrt{\pi}} \exp(-(x - y)^2), \epsilon = 10^{-6}.

Table 1: Description of the datasets

              IRIS    WINE    CANCER    YEAST    ABALONE
    Features  4       13      32        8        8
    Samples   150     178     198       1484     4177

Table 2: Total time of computing the correntropy coefficient between all possible pairs of variables

              Direct method           Optimized method
    Dataset   Value       Time (s)    Value       Time (s)
    IRIS      1.5685      0.67        1.5685      0.04
    CANCER    3.8530      12.15       3.8530      0.6
    WINE      76.2108     95.0        76.2108     4.4
    YEAST     0.3031      301.9       0.3031      3.19
    ABALONE   19.0452     2447.2      19.0452     12.7

Table 3: Total time of computing the Cauchy-Schwarz quadratic mutual information between all possible pairs of variables

              Direct method           Optimized method
    Dataset   Value       Time (s)    Value       Time (s)
    IRIS      1.5109      0.36        1.5109      0.04
    CANCER    5.5022      6.76        5.5022      0.7
    WINE      20.0637     53.7        20.0637     4.8
    YEAST     0.1142      201.2       0.1142      1.65
    ABALONE   6.711       2162.4      6.711       8.5
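The comparison in Tables 2 and 3 can be mimicked on synthetic scalar data with the sketches above. The datasets themselves are the standard UCI sets and are not reproduced here, and the pure-Python loop inside icd is not tuned, so absolute timings will not match the tables; the point is that the direct and ICD-based values agree closely, as the tables report to four decimal places.

    import time
    import numpy as np

    kernel = lambda a, b: np.exp(-(a - b) ** 2) / np.sqrt(np.pi)   # kernel used in the poster
    rng = np.random.default_rng(0)
    x, y = rng.standard_normal(1484), rng.standard_normal(1484)    # YEAST-sized sample

    t0 = time.time(); v_dir = cip_direct(x, y);             t_dir = time.time() - t0
    t0 = time.time(); v_icd = cip_icd(x, y, kernel, 1e-6);  t_icd = time.time() - t0
    print(v_dir, t_dir)   # direct O(n^2) estimate and its wall-clock time
    print(v_icd, t_icd)   # ICD-based estimate and its wall-clock time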

Summary

We suggest the use of the incomplete Cholesky decomposition to reduce the computational cost of ITL estimators. We verify experimentally that the proposed approach reduces the computational cost drastically. However, it should be noted that we assume the existence of a low rank approximation of the Gram matrix, which might not always be available in practice. Finally, a bound on the absolute difference between the actual and the estimated statistic in terms of the precision parameter \epsilon would be interesting to see.

References

[1] F. R. Bach and M. I. Jordan. Kernel independent component analysis. Journal of Machine Learning Research, 3:1-48, 2002.
[2] S. Fine and K. Scheinberg. Efficient SVM training using low-rank kernel representations. Journal of Machine Learning Research, 2:243-264, 2001.
[3] J.-W. Xu, H. Bakardjian, A. Cichocki, and J. C. Principe. A new nonlinear similarity measure for multichannel signals. Neural Networks, 21:222-231, 2008.
[4] J.-W. Xu, A. R. C. Paiva, I. Park, and J. C. Principe. A reproducing kernel Hilbert space framework for information-theoretic learning. IEEE Transactions on Signal Processing, 56(12):5891-5902, 2008.
