Smoothness Maximization via Gradient Descents
Bin Zhao, Fei Wang, Changshui Zhang
Department of Automation, Tsinghua University, Beijing 100084, P.R. China
Email: [email protected], [email protected], [email protected]

Semi-Supervised Learning

Iterative Smoothness Maximization

Learning from partially labeled data

Convergence Study

• Transductive Learning

• Inductive Learning

• Smoothness maximization: with σ fixed, solve for F by setting the gradient to zero:

  ∂Q/∂F = 0  ⇒  F = (1 − α)(I − αS)^(−1) T   (3)

  where S = D^(−1/2) W D^(−1/2) and α = 1/(1 + µ)

• Graph reconstruction: optimize the objective function Q w.r.t. σ via gradient descent
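The closed-form solution of Eq. (3) can be sketched in a few lines. This is a minimal illustration, not the authors' code; the function and variable names are my own:

```python
import numpy as np

def smoothness_maximization(S, T, mu):
    """Minimize Q w.r.t. F with sigma fixed, via the closed form
    F = (1 - alpha) (I - alpha S)^(-1) T, with alpha = 1/(1 + mu)."""
    n = S.shape[0]
    alpha = 1.0 / (1.0 + mu)
    # Solve the linear system instead of forming the inverse explicitly
    F = (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, T)
    return F
```

Solving the linear system is preferred over computing the matrix inverse; both yield the same F.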

Graph Based Semi-Supervised Learning

The Hessian of Q w.r.t. F is

  H = ∂²Q/∂F² = I − S + µI   (7)

For any x ∈ R^n, x ≠ 0,

  x^T H x = x^T (I − S) x + µ x^T x
          = (1/2) Σ_{i,j=1}^n W_ij (x_i/√D_ii − x_j/√D_jj)² + µ Σ_{i=1}^n x_i² > 0   (8)
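A quick numeric sanity check of Eqs. (7)-(8) on a random Gaussian-weighted graph (an illustrative sketch; the data, σ, and µ values are arbitrary, and self-loops are zeroed by assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
sigma, mu = 1.0, 0.5

# Gaussian weights W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), no self-loops
sq = (X**2).sum(axis=1)
d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
W = np.exp(-d2 / (2.0 * sigma**2))
np.fill_diagonal(W, 0.0)

# S = D^{-1/2} W D^{-1/2}, then the Hessian of Eq. (7)
Dis = 1.0 / np.sqrt(W.sum(axis=1))
S = Dis[:, None] * W * Dis[None, :]
H = np.eye(8) - S + mu * np.eye(8)

eigs = np.linalg.eigvalsh(H)   # all eigenvalues positive, per Eq. (8)
```

Since I − S is the normalized graph Laplacian (positive semi-definite), adding µI makes H strictly positive definite.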

• Iterate between the above two steps until convergence

• E: edge set, each edge e_ij ∈ E associated with a weight w_ij = exp(−‖x_i − x_j‖²/(2σ²))
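Given this weight definition, the graph quantities W, D, and S = D^(−1/2) W D^(−1/2) can be built as follows. This is an illustrative sketch (names are mine); zeroing the diagonal of W is an assumption the poster does not state explicitly:

```python
import numpy as np

def build_graph(X, sigma):
    """Build the weighted graph for GBSSL from an (n, d) data matrix X:
    W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), D = diag of row sums,
    S = D^{-1/2} W D^{-1/2}."""
    sq = (X**2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    np.fill_diagonal(d2, 0.0)
    W = np.exp(-d2 / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)                          # assumed: no self-loops
    D = W.sum(axis=1)
    Dis = 1.0 / np.sqrt(D)
    S = Dis[:, None] * W * Dis[None, :]
    return W, D, S
```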

[Figure: Toy data (two-moon). Panel (a): unlabeled points and points labeled +1/−1.]

Gradient Computing

Compute the gradient of Q(F, σ) w.r.t. σ with F fixed at F* = arg min_F Q:

  ∂Q(F, σ)/∂σ = Σ_{i,j=1}^n { ∂W_ij/∂σ ‖F_i/√D_ii − F_j/√D_jj‖²
                − W_ij (F_i/√D_ii − F_j/√D_jj) · (F_i/√D_ii³ · ∂D_ii/∂σ − F_j/√D_jj³ · ∂D_jj/∂σ) }   (4)

where, with d_ij = ‖x_i − x_j‖,

  ∂W_ij/∂σ = ∂ exp(−d_ij²/(2σ²))/∂σ = (d_ij²/σ³) exp(−d_ij²/(2σ²))   (5)

  ∂D_ii/∂σ = Σ_j ∂W_ij/∂σ = Σ_j d_ij² W_ij / σ³   (6)
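The building blocks (5)-(6) are easy to vectorize. A sketch (illustrative names; `d2` holds the pairwise squared distances d_ij²), verifiable against finite differences:

```python
import numpy as np

def graph_gradients(d2, sigma):
    """Gradients of the graph w.r.t. sigma, per Eqs. (5)-(6):
    dW_ij/dsigma = (d_ij^2 / sigma^3) * exp(-d_ij^2 / (2 sigma^2)),
    dD_ii/dsigma = sum_j d_ij^2 W_ij / sigma^3."""
    W = np.exp(-d2 / (2.0 * sigma**2))
    dW = d2 * W / sigma**3
    dD = dW.sum(axis=1)
    return dW, dD
```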

[Figure (b): Classification result with σ = 0.15.]

Hence, the Hessian matrix is positive definite; moreover, f* is the unique zero point of ∂Q/∂f. Therefore, f* is the global minimum of Q with σ fixed, so

  Q(f_{n+1}, σ_n) < Q(f_n, σ_n)   (9)

The Influence of σ


Represent the dataset as a weighted undirected graph G = <V, E>
• V: node set corresponding to the labeled and unlabeled examples


• Graph Reconstruction: the learning rate η in gradient descent guarantees

  Q(f_{n+1}, σ_{n+1}) < Q(f_{n+1}, σ_n)   (10)

Hence, the objective function Q decreases monotonically and is lower bounded by 0, so the algorithm is guaranteed to converge.

Digit Recognition


[Figure (c): Classification result with σ = 0.4.]

[Figure: (a) σ vs. iteration step; (b) digit recognition accuracy on USPS (ISM, NN, LLGC, SVM).]

Learning Rate Selection

• σ is updated as σ_new = σ_old − η ∂Q/∂σ |_{σ=σ_old}

• The learning rate η strongly affects the performance of gradient descent

• In ISM, η is selected dynamically to accelerate convergence: the objective function must decrease monotonically to guarantee convergence, and, to keep the method simple, σ should avoid oscillation

Graph Construction

Object Recognition

[Figure: (a) σ vs. iteration step.]

• Graph construction is at the heart of GBSSL; the final classification result is significantly affected by σ
• So far, no reliable approach can determine an optimal σ automatically

Implementation of Iterative Smoothness Maximization

Objective Function

• T_ij = 1 if x_i is labeled as t_i = j, and T_ij = 0 otherwise
• Classification result represented as F = [F_1^T, ..., F_n^T]^T, which determines the label of x_i by t_i = arg max_{1≤j≤c} F_ij
• W is the weight matrix, and D_ii = Σ_j W_ij
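The label matrix T and the decision rule t_i = arg max_j F_ij translate directly to code. A sketch with illustrative names; marking unlabeled points with y[i] = −1 is my convention, not the poster's:

```python
import numpy as np

def labels_to_T(y, n_classes):
    """Build the n x c matrix T: T_ij = 1 if x_i is labeled as class j,
    0 otherwise. Unlabeled points are flagged with y[i] = -1 (assumed)."""
    T = np.zeros((len(y), n_classes))
    for i, yi in enumerate(y):
        if yi >= 0:
            T[i, yi] = 1.0
    return T

def predict(F):
    """Assign each point the class with the largest score: t_i = argmax_j F_ij."""
    return np.argmax(F, axis=1)
```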

  Q = (1/2) Σ_{i,j=1}^n W_ij ‖F_i/√D_ii − F_j/√D_jj‖² + µ Σ_{i=1}^n ‖F_i − T_i‖²   (2)

• The first term measures label smoothness

• The second term measures label fitness

• µ > 0 adjusts the tradeoff between these two terms
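The objective in Eq. (2) can be evaluated directly; the smoothness term also equals tr(F^T (I − S) F), which gives a convenient cross-check. A sketch (names are illustrative):

```python
import numpy as np

def objective_Q(W, F, T, mu):
    """Evaluate Q of Eq. (2): 0.5 * sum_ij W_ij ||F_i/sqrt(D_ii) -
    F_j/sqrt(D_jj)||^2 + mu * sum_i ||F_i - T_i||^2."""
    D = W.sum(axis=1)
    Fn = F / np.sqrt(D)[:, None]                       # rows F_i / sqrt(D_ii)
    diff2 = ((Fn[:, None, :] - Fn[None, :, :])**2).sum(axis=2)
    smooth = 0.5 * (W * diff2).sum()                   # label smoothness
    fit = mu * ((F - T)**2).sum()                      # label fitness
    return smooth + fit
```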


1. Initialization: σ = σ_0, total iteration steps N_0, initial learning rate η_0, and small learning rate η_s
2. Calculate the optimal F.


[Figure: (b) Object recognition accuracy on COIL-20 (ISM, NN, LLGC, SVM).]

3. Update σ with gradient descent and adjust the learning rate η: σ̂_1 = σ_n − η ∂Q/∂σ |_{σ=σ_n}
   (a) If Q(F_{n+1}, σ̂_1) < Q(F_{n+1}, σ_n) and sgn(∂Q/∂σ |_{σ=σ̂_1}) = sgn(∂Q/∂σ |_{σ=σ_n}), then η = 2η and σ̂_2 = σ_n − η ∂Q/∂σ |_{σ=σ_n}.
       i. If Q(F_{n+1}, σ̂_2) < Q(F_{n+1}, σ_n) and sgn(∂Q/∂σ |_{σ=σ̂_2}) = sgn(∂Q/∂σ |_{σ=σ_n}), then σ_{n+1} = σ̂_2
       ii. Else σ_{n+1} = σ̂_1 and η = η/2
   (b) Else η = η_s and σ_{n+1} = σ_n
4. If n > N_0, quit the iteration and output the classification result; else, go to step 2.
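Step 3 above can be sketched as a single function. This is an illustrative reading of the rule, not the authors' code: `Q` and `grad_Q` are callables in σ with F held fixed at F_{n+1}, and a step is accepted only if Q decreases and the gradient keeps its sign (to avoid oscillation of σ):

```python
import numpy as np

def adjust_sigma(Q, grad_Q, sigma_n, eta, eta_s):
    """One pass of step 3: try sigma_hat1 = sigma_n - eta * dQ/dsigma;
    if accepted, try a doubled step sigma_hat2; otherwise fall back to
    the small rate eta_s and keep sigma unchanged.
    Returns (sigma_{n+1}, updated eta)."""
    g = grad_Q(sigma_n)
    s1 = sigma_n - eta * g
    if Q(s1) < Q(sigma_n) and np.sign(grad_Q(s1)) == np.sign(g):
        s2 = sigma_n - 2.0 * eta * g           # step with eta doubled
        if Q(s2) < Q(sigma_n) and np.sign(grad_Q(s2)) == np.sign(g):
            return s2, 2.0 * eta               # keep the doubled rate
        return s1, eta                          # halve the doubled rate back
    return sigma_n, eta_s                       # reject; restart with eta_s
```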

References

• D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf. Learning with local and global consistency. Advances in Neural Information Processing Systems, 2003.
• M. Belkin, P. Niyogi, and V. Sindhwani. On manifold regularization. Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics, 2005.
• X. Zhu. Semi-supervised learning literature survey. Computer Sciences Technical Report 1530, University of Wisconsin-Madison, 2006.
• L. Zelnik-Manor and P. Perona. Self-tuning spectral clustering. Advances in Neural Information Processing Systems, 2004.
