
Implicit Regularization in Variational Bayesian Matrix Factorization

Shinichi Nakajima (Nikon, Core Technology Center)
Masashi Sugiyama (Tokyo Institute of Technology)

June 22, 2010

NIKON CORPORATION Core Technology Center June 22, 2010

Contents
- Introduction
  - Bayesian matrix factorization
  - Approximation methods
- Theoretical results
  - VBMF has implicit regularization
- Discussion
  - Mechanism of implicit regularization
- Conclusion


Matrix factorization

Applications:
- Multivariate analysis (RRR, CCA, oPLS)
- Missing entries prediction (CF)

[Figure: a matrix indexed by User 1, User 2, User 3, ..., User L and Movie 1, Movie 2, Movie 3, ..., Movie M; additional labels: Input, Output]


Bayesian matrix factorization model
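
The equations of this slide are not in the transcript. As a point of reference only, a Gaussian matrix factorization model commonly used in this setting can be written as below. The symbols $V$, $E$, $\sigma^2$, $H$, $c_{a_h}$ and $c_{b_h}$ are assumptions, chosen to agree with the $(A, B)$ and $U$ that appear on the later Jeffreys prior slide; $a_h$ and $b_h$ denote the $h$-th columns of $A$ and $B$.

```latex
\begin{align}
V &= U + E, \qquad U = B A^\top, \qquad
V \in \mathbb{R}^{L \times M},\; A \in \mathbb{R}^{M \times H},\; B \in \mathbb{R}^{L \times H}, \\
p(V \mid A, B) &\propto \exp\!\left( -\frac{1}{2\sigma^2} \lVert V - B A^\top \rVert_{\mathrm{Fro}}^2 \right), \\
\phi(A) &\propto \exp\!\left( -\sum_{h=1}^{H} \frac{\lVert a_h \rVert^2}{2 c_{a_h}^2} \right), \qquad
\phi(B) \propto \exp\!\left( -\sum_{h=1}^{H} \frac{\lVert b_h \rVert^2}{2 c_{b_h}^2} \right).
\end{align}
```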


Bayesian estimation

Not easy to calculate, so approximation methods are needed. [Salakhutdinov&Mnih2008]


Free energy minimization

[Lim&Teh2007, Raiko et al.2007]

Constraint makes optimization much easier. [Lim&Teh2007] [Raiko et al.2007]
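
As a sketch of what is being minimized (the slide's own formulas are not in the transcript): writing $p(V \mid A, B)$ for the likelihood and $\phi(A)$, $\phi(B)$ for the priors, all of which is assumed notation, the free energy of a trial distribution $r(A, B)$ and a typical independence constraint read as follows.

```latex
\begin{align}
F(r) &= \left\langle \log \frac{r(A, B)}{p(V \mid A, B)\, \phi(A)\, \phi(B)} \right\rangle_{r(A, B)}
      = \mathrm{KL}\bigl( r(A, B) \,\Vert\, p(A, B \mid V) \bigr) - \log p(V), \\
r(A, B) &= r_A(A)\, r_B(B).
\end{align}
```

Because $\log p(V)$ does not depend on $r$, minimizing $F$ over the constrained family picks the member closest to the Bayes posterior in KL divergence. The exact factorization assumed in each cited work may differ from the one written here.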


VB hardly overfits

[Figures from [Raiko et al.2007] and [Lim&Teh2007], annotated with "MAP/ML test error" and "VB test error"]

Why? Because it’s Bayesian! Implicit regularization exists.

In this paper, we
- show implicit regularization of VBMF.
- explain its mechanism.

Note: we assume no missing entries. [Raiko et al.2007]


Maximum likelihood (ML) estimator
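
The formula on this slide is not in the transcript. With no missing entries and Gaussian noise (the noise model is an assumption here), the rank-H maximum likelihood estimate is the best rank-H least-squares fit, which by the Eckart-Young theorem is the truncated SVD. A minimal NumPy sketch, where `ml_low_rank` is a hypothetical helper name:

```python
import numpy as np

def ml_low_rank(V, H):
    """Best rank-H least-squares approximation of V via truncated SVD."""
    left, gamma, right_t = np.linalg.svd(V, full_matrices=False)
    # Keep the H largest singular components and discard the rest.
    return (left[:, :H] * gamma[:H]) @ right_t[:H]

rng = np.random.default_rng(0)
V = rng.normal(size=(6, 4))   # toy fully observed matrix
U_ml = ml_low_rank(V, H=2)    # rank-2 estimate of the signal
```

No shrinkage is applied to the retained singular values, which is the point of contrast with the MAP and VB estimators on the following slides.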


MAP estimator (Theorem 1)


VB estimator (Theorems 2, 3)


Implicit regularization

(Singular component-wise) positive-part James-Stein (PJS) estimator. [James&Stein1961, Efron&Morris1973]
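
For readers unfamiliar with the PJS form, the sketch below applies positive-part James-Stein shrinkage to each singular value of a matrix. The shrinkage constant `chi`, the noise variance `sigma2` and both function names are hypothetical; the transcript does not give the constants that appear in the talk's theorems.

```python
import numpy as np

def pjs_shrink(gamma, sigma2, chi):
    """Positive-part James-Stein shrinkage of non-negative values gamma.

    chi is a hypothetical shrinkage constant; the slide does not give it.
    """
    gamma = np.atleast_1d(np.asarray(gamma, dtype=float))
    out = np.zeros_like(gamma)
    pos = gamma > 0
    # Shrink by the factor (1 - chi * sigma2 / gamma^2), clipped at zero.
    out[pos] = np.maximum(0.0, 1.0 - chi * sigma2 / gamma[pos] ** 2) * gamma[pos]
    return out

def pjs_matrix(V, sigma2, chi):
    """Apply the shrinkage to each singular component of V separately."""
    left, gamma, right_t = np.linalg.svd(V, full_matrices=False)
    return (left * pjs_shrink(gamma, sigma2, chi)) @ right_t

shrunk = pjs_shrink([3.0, 0.5], sigma2=1.0, chi=1.0)
```

Large singular values are kept almost unchanged, while those whose square is at most `chi * sigma2` are set exactly to zero.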


Simplest case

Equivalence classes


Bayes posterior

[Figure legend: ML estimators; MAP estimators]

Bayes posterior has two modes!
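
To make the two modes concrete, here is a small numerical sketch of the simplest case, assuming L = M = H = 1 so that U = AB is a scalar, with Gaussian noise variance `sigma2` and Gaussian prior variances `ca2` and `cb2`. These modelling choices and all names are assumptions, since the slide's formulas and figures are not in the transcript.

```python
import numpy as np

def log_posterior(A, B, V, sigma2=1.0, ca2=1.0, cb2=1.0):
    """Unnormalized log Bayes posterior for the scalar model V = A*B + noise."""
    return (-(V - A * B) ** 2 / (2 * sigma2)
            - A ** 2 / (2 * ca2)
            - B ** 2 / (2 * cb2))

V = 2.0                                   # toy observation
grid = np.linspace(-3.0, 3.0, 601)
A, B = np.meshgrid(grid, grid, indexing="ij")
logp = log_posterior(A, B, V)

# Location of one maximizer on the grid.
i, j = np.unravel_index(np.argmax(logp), logp.shape)
A_map, B_map = grid[i], grid[j]

# The posterior is unchanged under (A, B) -> (-A, -B), so modes come in pairs.
symmetric = np.allclose(logp, logp[::-1, ::-1])
```

Whichever maximizer the grid search returns, its mirror image through the origin has the same posterior value, and the origin itself lies lower, between the two peaks.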


VB posterior

Approximation of Bayes posterior:
- Both modes are approximated.
- One of the modes is approximated.

The ridge between the peaks attracts the VB posterior: implicit regularization.
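
The effect can be reproduced in a toy setting. The sketch below runs coordinate-wise mean-field updates for the scalar case (L = M = H = 1) under an assumed Gaussian likelihood with variance `sigma2` and nearly flat Gaussian priors with variances `ca2` and `cb2`. The updates are derived for this assumed model only and are not the bounds stated in the talk's theorems; `vb_scalar` is a hypothetical name.

```python
def vb_scalar(V, sigma2=1.0, ca2=1e6, cb2=1e6, n_iter=200):
    """Mean-field VB for V = A*B + noise with r(A, B) = N(A; ma, sa2) N(B; mb, sb2)."""
    ma, mb, sa2, sb2 = 1.0, 1.0, 1.0, 1.0  # arbitrary positive starting point
    for _ in range(n_iter):
        # Update r(A) given the current first and second moments of B.
        sa2 = 1.0 / ((mb ** 2 + sb2) / sigma2 + 1.0 / ca2)
        ma = sa2 * V * mb / sigma2
        # Update r(B) given the refreshed moments of A.
        sb2 = 1.0 / ((ma ** 2 + sa2) / sigma2 + 1.0 / cb2)
        mb = sb2 * V * ma / sigma2
    return ma * mb  # VB estimate of U = A*B

u_small = vb_scalar(0.5)  # observation below the noise level
u_large = vb_scalar(3.0)  # observation well above the noise level
```

In this toy setting the small observation is driven to zero and the large one is pulled slightly toward zero, even though the priors are almost flat.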


Conclusion
- Derived bounds of VBMF solution.
- Explained mechanism of implicit regularization.
- What was omitted (please visit our poster!):
  - Explanation with Jeffreys prior.
  - Bounds of empirical VBMF (EVBMF) solution.
- Future work:
  - Tighter bounds.
  - Analyze imputation cases.


Empirical VBMF (EVBMF)


EVB estimator (Theorems 4, 5)


EVB posterior

Stronger regularization than VB with flat prior.


Jeffreys prior [Jeffreys1946]

Parameterization invariant.

[Figure labels: Equivalent; No regularization]

A flat prior in (A, B) distributes more mass around the origin than a flat prior in U.
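
A quick Monte Carlo check illustrates this last statement in the scalar case U = AB. Since flat priors are improper, the sketch truncates them to [-1, 1]; the truncation, the threshold `eps` and the sample size are assumptions made only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
eps = 0.1  # "around the origin" means |U| < eps

# Flat prior in (A, B), truncated to the box [-1, 1]^2 (an assumption).
A = rng.uniform(-1.0, 1.0, size=n)
B = rng.uniform(-1.0, 1.0, size=n)
mass_flat_AB = np.mean(np.abs(A * B) < eps)

# Flat prior in U = A*B, truncated to [-1, 1].
U = rng.uniform(-1.0, 1.0, size=n)
mass_flat_U = np.mean(np.abs(U) < eps)
```

Comparing `mass_flat_AB` with `mass_flat_U` shows how much more probability the product parameterization places on small values of U.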


Appendix: Negative log likelihood


Appendix: VB Free energy
