Core Technology Center

Tokyo Institute of Technology

Implicit Regularization in Variational Bayesian Matrix Factorization
Shinichi Nakajima (Nikon), Masashi Sugiyama (Tokyo Tech.)
June 22, 2010

NIKON CORPORATION Core Technology Center June 22, 2010

Contents
- Introduction
  - Bayesian matrix factorization
  - Approximation methods
- Theoretical results
  - VBMF has implicit regularization
- Discussion
  - Mechanism of implicit regularization
- Conclusion


Matrix factorization

[Figure: rating matrix with rows User 1, User 2, …, User L and columns Movie 1, Movie 2, …, Movie M; axes also labeled Input and Output]

Applications:
- Multivariate analysis (RRR, CCA, oPLS)
- Missing entries prediction (CF)


Bayesian matrix factorization model
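The slide's model equations did not survive extraction. The following is a minimal sketch of the generative model, assuming the standard Gaussian formulation used in this line of work. The observation is V = B A^T + E, where E is i.i.d. Gaussian noise with variance σ². The columns of A (M × H) and B (L × H) have independent zero-mean Gaussian priors with standard deviations c_a[h] and c_b[h]. The dimension convention and the function name `sample_vbmf_model` are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def sample_vbmf_model(L, M, H, sigma, c_a, c_b, rng):
    """Draw (V, A, B) from the assumed Bayesian MF model V = B A^T + E.

    c_a, c_b: length-H arrays of prior standard deviations for the columns
    of A (M x H) and B (L x H); sigma: noise standard deviation.
    """
    A = rng.normal(size=(M, H)) * c_a    # column h of A ~ N(0, c_a[h]^2 I_M)
    B = rng.normal(size=(L, H)) * c_b    # column h of B ~ N(0, c_b[h]^2 I_L)
    E = sigma * rng.normal(size=(L, M))  # i.i.d. Gaussian observation noise
    V = B @ A.T + E                      # observed L x M matrix
    return V, A, B

rng = np.random.default_rng(0)
V, A, B = sample_vbmf_model(L=5, M=4, H=2, sigma=0.1,
                            c_a=np.ones(2), c_b=np.ones(2), rng=rng)
print(V.shape)  # (5, 4)
```

Only the product U = B A^T enters the likelihood. This is the source of the non-identifiability that the later slides exploit.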


Bayesian estimation

The Bayes posterior is not easy to calculate, so approximation methods are needed [Salakhutdinov&Mnih2008].
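For reference, assume the likelihood p(V | A, B) and the priors p(A), p(B) of the Bayesian MF model (the slide's own formulas were lost). The Bayes posterior then follows from Bayes' rule. Its normalizer Z(V) is an integral over all (L + M)H entries of A and B and has no closed form in general, which is what makes exact computation hard:

```latex
p(A, B \mid V) = \frac{p(V \mid A, B)\, p(A)\, p(B)}{Z(V)},
\qquad
Z(V) = \int p(V \mid A, B)\, p(A)\, p(B)\, \mathrm{d}A\, \mathrm{d}B .
```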


Free energy minimization

[Lim&Teh2007, Raiko et al.2007]

The independence constraint r(A, B) = r_A(A) r_B(B) on the approximate posterior makes the optimization much easier [Lim&Teh2007, Raiko et al.2007].
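The block below gives the textbook form of the free energy for an approximate posterior r(A, B), not a reproduction of the slide's lost equations. Here Z(V) = p(V) is the marginal likelihood. The free energy equals the KL divergence to the Bayes posterior minus a constant that does not depend on r. Without the constraint its minimizer is therefore the exact posterior; VB minimizes it over factorized r only:

```latex
F(r) = \left\langle \log \frac{r(A, B)}{p(V \mid A, B)\, p(A)\, p(B)} \right\rangle_{r(A, B)}
     = \mathrm{KL}\bigl(r(A, B) \,\big\|\, p(A, B \mid V)\bigr) - \log Z(V),
\qquad
\text{VB: } \min_{r} F(r) \ \text{ s.t. } \ r(A, B) = r_A(A)\, r_B(B).
```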


VB hardly overfits

[Figure: MAP/ML test error and VB test error; results from [Raiko et al.2007] and [Lim&Teh2007]]

Why?

Because it’s Bayesian!

Implicit regularization exists.

In this paper, we
- show the implicit regularization of VBMF, and
- explain its mechanism.

Notes: we assume
- no missing entries.

[Raiko et al.2007]


Maximum likelihood (ML) estimator
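A minimal sketch of the ML estimator, assuming Gaussian noise and no missing entries (as stated in the notes). Maximizing the likelihood over rank-H products U = B A^T is then an ordinary least-squares problem. By the Eckart-Young theorem its solution keeps the H largest singular components of V unchanged. The function name `ml_estimate` is hypothetical.

```python
import numpy as np

def ml_estimate(V, H):
    """Rank-H maximum-likelihood estimate of U = B A^T (no missing entries).

    Keeps the H largest singular components of V; the kept singular values
    are not shrunk at all.
    """
    left, gamma, right_t = np.linalg.svd(V, full_matrices=False)
    return (left[:, :H] * gamma[:H]) @ right_t[:H, :]

rng = np.random.default_rng(0)
V = rng.normal(size=(6, 4))
U_hat = ml_estimate(V, H=2)
print(np.linalg.matrix_rank(U_hat))  # 2
```

ML applies no shrinkage to the retained components. This is consistent with its tendency to overfit as H grows.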


MAP estimator (Theorem 1)
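The sketch below follows from our own derivation under simplifying assumptions. It has not been checked against the exact statement of Theorem 1. We assume Gaussian priors with a common scale c_a for every column of A and c_b for every column of B. For a component with product b a = d, the prior penalty a²/(2c_a²) + b²/(2c_b²) is at least |d|/(c_a c_b), with equality for a suitable rescaling. MAP therefore reduces to soft-thresholding each kept singular value of V by σ²/(c_a c_b). The function name `map_estimate` is hypothetical.

```python
import numpy as np

def map_estimate(V, H, sigma, c_a, c_b):
    """MAP estimate of U = B A^T, assuming one prior scale pair (c_a, c_b) for all components.

    Per singular component, minimizing (gamma - d)^2 / (2 sigma^2) + d / (c_a c_b)
    over d >= 0 gives soft-thresholding at sigma^2 / (c_a c_b).
    """
    left, gamma, right_t = np.linalg.svd(V, full_matrices=False)
    shrunk = np.maximum(0.0, gamma[:H] - sigma**2 / (c_a * c_b))
    return (left[:, :H] * shrunk) @ right_t[:H, :]

V = np.diag([3.0, 1.0, 0.2])
# threshold = 1 / (1 * 2) = 0.5, so the diagonal becomes 2.5, 0.5, 0.0
print(map_estimate(V, H=3, sigma=1.0, c_a=1.0, c_b=2.0))
```

As c_a c_b grows the threshold vanishes and the estimate approaches the ML solution.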


VB estimator (Theorem 2, 3)


Implicit regularization

(Singular-component-wise) positive-part James-Stein (PJS) estimator [James&Stein1961, Efron&Morris1973].
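The following is a sketch of singular-component-wise positive-part James-Stein shrinkage. Each singular value γ_h of V is multiplied by max(0, 1 − κσ²/γ_h²). The default κ = max(L, M) is our assumption for illustration; the exact bounds of Theorems 2 and 3 are not reproduced here. The function name `pjs_estimate` is hypothetical.

```python
import numpy as np

def pjs_estimate(V, sigma, kappa=None):
    """Singular-component-wise positive-part James-Stein shrinkage (sketch).

    kappa defaults to max(L, M), an assumed shrinkage constant.
    """
    L, M = V.shape
    if kappa is None:
        kappa = max(L, M)
    left, gamma, right_t = np.linalg.svd(V, full_matrices=False)
    # Components with gamma^2 <= kappa * sigma^2 get factor 0 (switched off).
    with np.errstate(divide="ignore"):
        factor = np.maximum(0.0, 1.0 - kappa * sigma**2 / gamma**2)
    return (left * (factor * gamma)) @ right_t

V = np.diag([4.0, 1.0])
# kappa = 2: 4 -> 4 - 2/4 = 3.5, while 1 -> 0
print(pjs_estimate(V, sigma=1.0))
```

Soft-thresholding subtracts the same constant from every singular value. PJS instead subtracts κσ²/γ_h, so large singular values are shrunk less in absolute terms, while small ones are still set exactly to zero.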


Simplest case

Equivalence classes
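Assume the simplest case is L = M = H = 1, so that A = a and B = b are scalars. The model and its equivalence classes can then be written as follows. Every pair (a, b) on the hyperbola ba = U gives the same likelihood, so only the prior distinguishes points within one class:

```latex
p(V \mid a, b) = \mathcal{N}\bigl(V;\, ba,\, \sigma^2\bigr), \qquad U = ba,
\qquad (a,\, b) \sim (s\,a,\; b/s) \quad \text{for any } s \neq 0 .
```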


Bayes posterior

[Figure: Bayes posterior in the simplest case, with the ML estimators and the MAP estimators marked]

Bayes posterior has two modes!
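The following is a small numerical check of the two modes. It assumes the scalar model with Gaussian priors and illustrative values σ = c_a = c_b = 1 and V = 2. The unnormalized log posterior is evaluated on a grid. Since it is symmetric under (a, b) → (−a, −b), one maximum is expected for a > 0 and another for a < 0. Our own calculation puts them at (1, 1) and (−1, −1), where ba = 1 = V − σ²/(c_a c_b). The origin is a saddle point between them. The function name `log_posterior` is hypothetical.

```python
import numpy as np

def log_posterior(a, b, V, sigma=1.0, c_a=1.0, c_b=1.0):
    """Unnormalized log Bayes posterior for the scalar model V = b a + noise."""
    return (-(V - b * a) ** 2 / (2 * sigma**2)
            - a**2 / (2 * c_a**2) - b**2 / (2 * c_b**2))

grid = np.linspace(-3.0, 3.0, 601)             # step 0.01, includes 0 and +-1
a, b = np.meshgrid(grid, grid, indexing="ij")  # a varies along axis 0
logp = log_posterior(a, b, V=2.0)

# Best grid point separately in the half-planes a > 0 and a < 0.
i_pos = np.unravel_index(np.argmax(np.where(a > 0, logp, -np.inf)), logp.shape)
i_neg = np.unravel_index(np.argmax(np.where(a < 0, logp, -np.inf)), logp.shape)
print(a[i_pos], b[i_pos])  # approximately (1, 1)
print(a[i_neg], b[i_neg])  # approximately (-1, -1)
```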


VB posterior: approximation of the Bayes posterior

[Figures: VB posterior compared with the Bayes posterior]

Both modes are approximated.

One of the modes is approximated.

VB posterior

The ridge between the peaks attracts the VB posterior; this is the source of the implicit regularization.
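Here is a minimal sketch of this effect in the simplest case L = M = H = 1. It runs coordinate-ascent VB with Gaussian factors r_a(a) = N(μ_a, var_a) and r_b(b) = N(μ_b, var_b). The update equations are our own derivation for this scalar illustration, and c_a = c_b = 10^6 stands in for an almost flat prior. The prior contributes almost no explicit regularization, yet the VB estimate μ_a μ_b still collapses to 0 when |V| < σ. When |V| > σ, its fixed point satisfies μ_a μ_b = (1 − σ²/V²)V, whereas the ML estimate would simply be V. The function name `vb_scalar` is hypothetical.

```python
import numpy as np

def vb_scalar(V, sigma=1.0, c_a=1e6, c_b=1e6, n_iter=2000):
    """Coordinate-ascent VB for V = b a + N(0, sigma^2), a ~ N(0, c_a^2), b ~ N(0, c_b^2).

    Assumes r(a, b) = r_a(a) r_b(b) with Gaussian factors; returns mu_a * mu_b,
    the VB estimate of U = b a.
    """
    mu_a, mu_b, var_a, var_b = 1.0, 1.0, 1.0, 1.0  # arbitrary initialization
    for _ in range(n_iter):
        # r_a update: precision E[b^2]/sigma^2 + 1/c_a^2, mean from the V*b cross term
        var_a = sigma**2 / (mu_b**2 + var_b + sigma**2 / c_a**2)
        mu_a = V * mu_b * var_a / sigma**2
        # r_b update: same form with the roles of a and b swapped
        var_b = sigma**2 / (mu_a**2 + var_a + sigma**2 / c_b**2)
        mu_b = V * mu_a * var_b / sigma**2
    return mu_a * mu_b

for V in (0.5, 3.0):
    # VB estimate next to the positive-part shrinkage max(0, 1 - sigma^2/V^2) V
    print(V, vb_scalar(V), max(0.0, 1.0 - 1.0 / V**2) * V)
```

With |V| < σ, the mean update contracts by roughly V²/σ² per sweep. The posterior therefore stays centered at the origin, between the two modes, and the estimate is exactly the zero solution.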


Conclusion
- Derived bounds of the VBMF solution.
- Explained the mechanism of implicit regularization.
- Omitted from this talk (please visit our poster!):
  - Explanation with the Jeffreys prior.
  - Bounds of the empirical VBMF (EVBMF) solution.
- Future work:
  - Tighter bounds.
  - Analysis of imputation cases.


Empirical VBMF (EVBMF)


EVB estimator (Theorem 4, 5)


EVB posterior

EVB gives stronger regularization than VB with a flat prior.


Jeffreys prior

[Jeffreys1946]: parameterization invariant.

[Figure: panels labeled "Equivalent" and "No regularization"]

A flat prior in (A, B) distributes more mass around the origin than a flat prior in U = BA^T.
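The following is a rough numerical illustration of the last point. It replaces the improper flat priors with bounded stand-ins (our simplification): a and b are drawn uniformly on [−1, 1], and we look at the induced distribution of U = ba. A flat prior on U ∈ [−1, 1] puts probability 0.1 on |U| < 0.1. For the product of two independent uniforms, the exact probability is t(1 − ln t), which is about 0.330 at t = 0.1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
a = rng.uniform(-1.0, 1.0, n)  # flat (bounded) prior on a
b = rng.uniform(-1.0, 1.0, n)  # flat (bounded) prior on b
u = b * a                      # induced prior samples of U = ba

t = 0.1
mass_flat_ab = np.mean(np.abs(u) < t)  # P(|U| < t) induced by flat prior on (a, b)
mass_flat_u = t                        # P(|U| < t) under flat prior on U in [-1, 1]
exact = t * (1 - np.log(t))            # closed form for a, b ~ U[-1, 1]
print(mass_flat_ab, exact, mass_flat_u)
```

The induced prior puts roughly three times as much mass near the origin. This is the sense in which a flat prior on the factors already favors small U.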


Appendix: Negative log likelihood
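The slide's own formula was lost in extraction. Assuming i.i.d. Gaussian noise of variance σ² on the L × M entries of V = BA^T + E, the negative log likelihood takes the standard form:

```latex
-\log p(V \mid A, B)
  = \frac{LM}{2}\log\bigl(2\pi\sigma^2\bigr)
  + \frac{1}{2\sigma^2}\,\bigl\| V - B A^{\top} \bigr\|_{\mathrm{Fro}}^{2} .
```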


Appendix: VB Free energy
