Non-Gaussian Methods for Learning Linear Structural Equation Models
Shohei Shimizu and Yoshinobu Kawahara
August 2, 2010

Slide 8: [1, 2]. Slide 10: [2]. Slide 11: [3]. Slide 15: [4]. Slide 16: [5, 6]. Slide 19: [7, 8, 9, 10, 11, 12, 13, 14]. Slide 21: [14]. Slide 22: [7]. Slide 23: [15, 16]. Slide 24: [7, 17, 18]. Slide 27: [19]. Slide 32: [19]. Slide 35: [14, 20]. Slide 36: [21, 14, 22]. Slide 37: [14]. Slide 38: [17, 23]. Slide 39: [24]. Slide 42: [20, 21, 14, 22]. Slide 43: [2]. Slide 46-2: [25, 26]. Slide 48: [27, 28, 29, 27, 8]. Slide 50: [14, 30]. Slide 51: [31]. Slide 53: [32]. Slide 54: [33, 22]. Slide 56: [34, 35, 22, 36, 37, 38, 39, 40, 41]. Slide 58: [22, 8, 42, 10, 43, 44, 36, 38]. Slide 61: [45]. Slide 62: [3]. Slide 63: [46, 22]. Slide 64: [47]. Slide 65: [48, 49, 22, 21, 50, 51, 52]. Slide 66: [53, 39, 40].

Slide 67: [54, 41, 46, 40, 53]. Slide 68: [5, 55]. Slide 69: [56]. Slides 71-73: [57]. Slide 74: [58, 57].

References

[1] Wright, S.: Correlation and causation. J. Agricultural Research 20 (1921) 557–585
[2] Bollen, K.A.: Structural Equations with Latent Variables. John Wiley & Sons (1989)
[3] Spirtes, P., Glymour, C., Scheines, R.: Causation, Prediction, and Search. Springer Verlag (1993) (2nd ed. MIT Press 2000).
[4] Pearl, J., Verma, T.: A theory of inferred causation. In Allen, J., Fikes, R., Sandewall, E., eds.: Principles of Knowledge Representation and Reasoning: Proc. 2nd Int. Conf., Morgan Kaufmann, San Mateo, CA (1991) 441–452
[5] Spirtes, P., Glymour, C.: An algorithm for fast recovery of sparse causal graphs. Social Science Computer Review 9 (1991) 67–72
[6] Chickering, D.: Optimal structure identification with greedy search. J. Machine Learning Research 3 (2002) 507–554
[7] Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. Wiley, New York (2001)
[8] Sogawa, Y., Shimizu, S., Hyvärinen, A., Washio, T., Shimamura, T., Imoto, S.: Discovery of exogenous variables in data with more variables than observations. In: Proc. Int. Conf. on Artificial Neural Networks (ICANN2010). (2010) In press.
[9] Micceri, T.: The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin 105(1) (1989) 156–166
[10] Moneta, A., Entner, D., Hoyer, P.O., Coad, A.: Causal inference by independent component analysis with applications to micro- and macroeconomic data. Jena Economic Research Papers in Economics 2010031, Friedrich-Schiller-University Jena, Max-Planck-Institute of Economics (May 2010)
[11] Bentler, P.M.: Some contributions to efficient statistics in structural models: specification and estimation of moment structures. Psychometrika 48 (1983) 493–517


[12] Mooijaart, A.: Factor analysis for non-normal variables. Psychometrika 50 (1985) 323–342
[13] Dodge, Y., Rousson, V.: On asymmetric properties of the correlation coefficient in the regression setting. The American Statistician 55(1) (2001) 51–54
[14] Shimizu, S., Hoyer, P.O., Hyvärinen, A., Kerminen, A.: A linear non-Gaussian acyclic model for causal discovery. J. Machine Learning Research 7 (2006) 2003–2030
[15] Jutten, C., Hérault, J.: Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture. Signal Processing 24(1) (1991) 1–10
[16] Comon, P.: Independent component analysis, a new concept? Signal Processing 36 (1994) 62–83

[17] Hyvärinen, A.: Fast and robust fixed-point algorithms for independent component analysis. IEEE Trans. on Neural Networks 10 (1999) 626–634
[18] Amari, S.: Natural gradient works efficiently in learning. Neural Computation 10 (1998) 251–276
[19] Shimizu, S., Hyvärinen, A., Kano, Y., Hoyer, P.O.: Discovery of non-Gaussian linear causal models using ICA. In: Proc. 21st Conf. on Uncertainty in Artificial Intelligence (UAI2005), Arlington, Virginia, AUAI Press (2005) 526–533
[20] Shimizu, S., Hyvärinen, A., Kawahara, Y., Washio, T.: A direct method for estimating a causal ordering in a linear non-Gaussian acyclic model. In: Proc. 25th Conf. on Uncertainty in Artificial Intelligence (UAI2009), Montreal, Canada, AUAI Press (2009) 506–513
[21] Zou, H.: The adaptive Lasso and its oracle properties. Journal of the American Statistical Association 101 (2006) 1418–1429
[22] Hyvärinen, A., Zhang, K., Shimizu, S., Hoyer, P.O.: Estimation of a structural vector autoregressive model using non-Gaussianity. J. Machine Learning Research 11 (May 2010) 1709–1731
[23] Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2 (1955) 83–97
[24] Hoyer, P.O., Shimizu, S., Hyvärinen, A., Kano, Y., Kerminen, A.: New permutation algorithms for causal discovery using ICA. In: Proc. Int. Conf. on Independent Component Analysis and Blind Signal Separation, Charleston, SC, USA. (2006) 115–122
[25] Darmois, G.: Analyse générale des liaisons stochastiques. Rev. Inst. Internat. Stat. 21 (1953) 2–8

[26] Skitovitch, W.: On a property of the normal distribution. Doklady Akademii Nauk SSSR 89 (1953) 217–219

[27] Bach, F.R., Jordan, M.I.: Kernel independent component analysis. J. Machine Learning Research 3 (2002) 1–48
[28] Gretton, A., Bousquet, O., Smola, A.J., Schölkopf, B.: Measuring statistical dependence with Hilbert-Schmidt norms. In: Algorithmic Learning Theory: 16th Int. Conf. (ALT2005). (2005) 63–77
[29] Kraskov, A., Stögbauer, H., Grassberger, P.: Estimating mutual information. Physical Review E 69(6) (2004) 066138
[30] Hoyer, P.O.: DirectLiNGAM does not need faithfulness. Personal communication (July 2010)
[31] Sogawa, Y., Shimizu, S., Kawahara, Y., Washio, T.: An experimental comparison of linear non-Gaussian causal discovery methods and their variants. In: Proc. Int. Joint Conference on Neural Networks (IJCNN2010). (2010) In press.
[32] Shimizu, S., Kano, Y.: Use of non-normality in structural equation modeling: Application to direction of causation. Journal of Statistical Planning and Inference 138 (2008) 3483–3491
[33] Komatsu, Y., Shimizu, S., Shimodaira, H.: Assessing statistical reliability of LiNGAM via multiscale bootstrap. In: Proc. Int. Conf. on Artificial Neural Networks (ICANN2010). (2010) In press.
[34] Lacerda, G., Spirtes, P., Ramsey, J., Hoyer, P.O.: Discovering cyclic causal models by independent components analysis. In: Proc. Uncertainty in Artificial Intelligence (UAI2008). (2008) 366–374
[35] Shimizu, S., Hyvärinen, A.: Discovery of linear non-Gaussian acyclic models in the presence of latent classes. In: Proc. 14th Int. Conf. on Neural Information Processing (ICONIP2007), Kitakyushu, Japan. (2008) 752–761
[36] Kawahara, Y., Shimizu, S., Washio, T.: Analyzing relationships between ARMA processes based on non-Gaussianity of external influences. (2010) Submitted manuscript.
[37] Hoyer, P.O., Shimizu, S., Kerminen, A., Palviainen, M.: Estimation of causal effects using linear non-Gaussian causal models with hidden variables. Int. J. Approximate Reasoning 49(2) (2008) 362–378
[38] Kawahara, Y., Bollen, K., Shimizu, S., Washio, T.: GroupLiNGAM: Linear non-Gaussian acyclic models for sets of variables. arXiv:1006.5041 (June 2010)


[39] Hoyer, P.O., Janzing, D., Mooij, J.M., Peters, J., Schölkopf, B.: Nonlinear causal discovery with additive noise models. In: Advances in Neural Information Processing Systems 21 (NIPS2008). (2009) 689–696
[40] Zhang, K., Hyvärinen, A.: On the identifiability of the post-nonlinear causal model. In: Proc. 25th Conference in Uncertainty in Artificial Intelligence (UAI2009). (2009) 647–655
[41] Tillman, R.E., Gretton, A., Spirtes, P.: Nonlinear directed acyclic structure learning with weakly additive noise models. In: Advances in Neural Information Processing Systems (NIPS2009). (2010) 1847–1855
[42] Wang, Z., Tan, S.: Automatic linear causal relationship identification for financial factor modeling. Expert Systems with Applications 36(10) (2009) 12441–12445
[43] Ozaki, K., Ando, J.: Direction of causation between shared and non-shared environmental factors. Behavior Genetics 39(3) (2009) 321–336
[44] Niyogi, D., Kishtawal, C., Tripathi, S., Govindaraju, R.S.: Observational evidence that agricultural intensification and land use change may be reducing the Indian summer monsoon rainfall. Water Resources Research 46 (2010) W03533
[45] Hoyer, P.O., Hyvärinen, A., Scheines, R., Spirtes, P., Ramsey, J., Lacerda, G., Shimizu, S.: Causal discovery of linear acyclic models with arbitrary distributions. In: Proc. 24th Conf. on Uncertainty in Artificial Intelligence (UAI2008). (2008) 282–289
[46] Zhang, K., Hyvärinen, A.: Causality discovery with additive disturbances: an information-theoretical perspective. In: Proc. European Conference on Machine Learning (ECML2009). (2009) 570–585
[47] Shimizu, S., Hoyer, P.O., Hyvärinen, A.: Estimation of linear non-Gaussian acyclic models for latent factors. Neurocomputing 72 (2009) 2024–2027
[48] Inazumi, T., Shimizu, S., Washio, T.: Use of prior knowledge in a non-Gaussian method for learning linear structural equation models. In: Proc. 9th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA2010). (2010) In press.
[49] Zhang, K., Peng, H., Chan, L., Hyvärinen, A.: ICA with sparse connections: Revisited. In: Proc. 8th Int. Conf. on Independent Component Analysis and Signal Separation (ICA2009). (2009) 195–202
[50] Hoyer, P.O., Hyttinen, A.: Bayesian discovery of linear acyclic causal models. In: Proc. 25th Conf. on Uncertainty in Artificial Intelligence (UAI2009). (2009) 240–248


[51] Henao, R., Winther, O.: Bayesian sparse factor models and DAGs inference and comparison. In: Advances in Neural Information Processing Systems 22 (NIPS2009). (2010) 736–744
[52] Peters, J., Janzing, D., Schölkopf, B.: Identifying cause and effect on discrete data using additive noise models. In: JMLR Workshop and Conference Proceedings, AISTATS 2010 (Proc. 13th Int. Conf. on Artificial Intelligence and Statistics). Volume 9. (2010) 597–604
[53] Imoto, S., Kim, S., Goto, T., Aburatani, S., Tashiro, K., Kuhara, S., Miyano, S.: Bayesian network and nonparametric heteroscedastic regression for nonlinear modeling of genetic network. In: Proc. 1st IEEE Computer Society Bioinformatics Conference. (2002) 219–227
[54] Mooij, J., Janzing, D., Peters, J., Schölkopf, B.: Regression by dependence minimization and its application to causal inference in additive noise models. In: Proc. 26th Int. Conf. on Machine Learning (ICML2009). (2009) 745–752
[55] Spirtes, P., Meek, C., Richardson, T.: Causal inference in the presence of latent variables and selection bias. In: Proc. 11th Annual Conf. on Uncertainty in Artificial Intelligence. (1995) 491–506
[56] Daniusis, P., Janzing, D., Mooij, J., Zscheischler, J., Steudel, B., Zhang, K., Schölkopf, B.: Inferring deterministic causal relations. In: Proc. 26th Conference in Uncertainty in Artificial Intelligence (UAI2010). (2010)
[57] Pearl, J.: Causality: Models, Reasoning, and Inference. Cambridge University Press (2000) (2nd ed. 2009).
[58] Rubin, D.B.: Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66 (1974) 688–701

