A Regularized Line Search Tunneling for Efficient Neural Network Learning

Dae-Won Lee, Hyung-Jun Choi, and Jaewook Lee

Department of Industrial Engineering, Pohang University of Science and Technology, Pohang, Kyungbuk 790-784, Korea.
{woosuhan,chj,jaewookl}@postech.ac.kr

Abstract. A novel two-phase training algorithm with regularization is proposed for multilayer perceptrons, to address the local minima problem in network training and to enhance the generalization ability of the trained networks. The first phase is a trust region-based local search for fast training of the network. The second phase is a regularized line search tunneling for escaping local minima and moving toward a weight vector of next descent. These two phases are repeated alternately in the weight space until a goal training error is achieved. Benchmark results demonstrate a significant performance improvement of the proposed algorithm over other existing training algorithms.

1 Introduction

Many supervised learning algorithms for multilayer perceptrons (MLPs, for short) find their roots in nonlinear minimization algorithms. For example, error back-propagation, conjugate gradient, and Levenberg-Marquardt methods have been widely used and applied successfully to diverse problems such as pattern recognition, classification, robotics and automation, financial engineering, and so on [4]. These methods, however, have difficulty finding a good solution when the error surface is very rugged, since they often get trapped in poor sub-optimal solutions. To overcome the problem of local minima and to enhance generalization capability, in this letter we present a new efficient regularized method for MLPs and demonstrate its superior performance on some difficult benchmark neural network learning problems.

2 Proposed Method

The proposed method is based on viewing the supervised learning of an MLP as an unconstrained minimization problem with a regularization term:

    min_w  Eλ(w) = Etrain(w) + λ Ereg(w)                                   (1)




Fig. 1. Basic Idea of Tunneling Scheme

where Etrain(·) is a training error cost function averaged over the training samples, which is a highly nonlinear function of the synaptic weight vector w, and Ereg(·) is a regularization term to smooth the network (for example, Ereg(w) = ‖w‖² is a weight-decay term; see [4] for other regularization terms). The proposed training algorithm consists of two phases. The first phase employs a trust region-based local search to retain the rapid convergence rate of second-order methods in addition to the globally convergent property of gradient descent methods. The second phase employs a regularized line search tunneling to generate a sequence of weight vectors converging to a new weight vector with a lower mean squared error (MSE), Eλ. Alternating these two phases repeatedly forms a new training procedure that converges quickly to a goal error in the weight space. (See Figure 1.)
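The overall procedure can be summarized as a simple alternation of the two phases. The following Python sketch illustrates this loop under the assumption of a weight-decay regularizer; the helper functions trust_region_phase and tunneling_phase are illustrative names and are sketched in Sections 2.1 and 2.2 below.

```python
import numpy as np

def E_lambda(w, E_train, lam):
    # Regularized objective of Eq. (1) with the weight-decay term E_reg(w) = ||w||^2.
    return E_train(w) + lam * np.dot(w, w)

def two_phase_training(w, E, grad, hess, grad_reg, goal_error, max_rounds=50):
    # Alternate Phase I (trust region-based local search, Sect. 2.1) and
    # Phase II (regularized line search tunneling, Sect. 2.2) until the
    # goal training error is reached.
    for _ in range(max_rounds):
        w = trust_region_phase(w, E, grad, hess)   # Phase I: descend to a local minimum of E_lambda
        if E(w) <= goal_error:                     # goal error reached -> stop
            break
        w = tunneling_phase(w, E, grad_reg)        # Phase II: escape the local minimum
    return w
```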

2.1 Phase I (Trust Region-Based Local Search)

The basic procedure of a trust region-based local search ([3]) adapted to Eq. (1) is as follows. For a given weight vector w(n), the quadratic approximation Ê is defined by the first two terms of the Taylor approximation to Eλ at w(n):

    Ê(s) = Eλ(w(n)) + g(n)ᵀ s + (1/2) sᵀ H(n) s                            (2)

where g(n) is the local gradient vector and H(n) is the local Hessian matrix. A trial step s(n) is then computed by minimizing (or approximately minimizing) the trust-region subproblem

    min_s  Ê(s)   subject to  ‖s‖₂ ≤ ∆n                                    (3)

where ∆n > 0 is a trust-region parameter. According to the agreement between the predicted and actual reduction in the function Eλ, as measured by the ratio

    ρn = [Eλ(w(n)) − Eλ(w(n) + s(n))] / [Ê(0) − Ê(s(n))]                   (4)


∆n is adjusted between iterations as follows:

    ∆n+1 = ‖s(n)‖₂ / 4   if ρn < 0.25
           2∆n           if ρn > 0.75 and ∆n = ‖s(n)‖₂                      (5)
           ∆n            otherwise

The decision to accept the step is then given by

    w(n+1) = w(n) + s(n)   if ρn ≥ 0
             w(n)          otherwise                                        (6)

which means that the current weight vector is updated to w(n) + s(n) if Eλ(w(n) + s(n)) < Eλ(w(n)); otherwise, it remains unchanged, the trust-region parameter ∆n is shrunk, and the trial step computation is repeated.
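As an illustration, the sketch below implements one version of this Phase I loop in Python. It solves the subproblem (3) only approximately, via the Cauchy point (the model minimizer along the steepest-descent direction, as in [3]); the function signature, stopping tolerance, and iteration cap are assumptions for the sketch, not part of the original algorithm.

```python
import numpy as np

def trust_region_phase(w, E, grad, hess, delta=1.0, tol=1e-6, max_iter=200):
    # Phase I sketch: trust region-based local search on E = E_lambda.
    # grad(w) returns g(n) and hess(w) returns H(n) of Eq. (2).
    for _ in range(max_iter):
        g, H = grad(w), hess(w)
        gnorm = np.linalg.norm(g)
        if gnorm < tol:
            break
        # Approximate solution of the subproblem (3): the Cauchy point.
        gHg = g @ H @ g
        tau = 1.0 if gHg <= 0 else min(1.0, gnorm ** 3 / (delta * gHg))
        s = -(tau * delta / gnorm) * g
        # Ratio of actual to predicted reduction, Eq. (4).
        predicted = -(g @ s + 0.5 * s @ H @ s)
        rho = (E(w) - E(w + s)) / predicted if predicted > 0 else -1.0
        # Trust-region radius update, Eq. (5).
        if rho < 0.25:
            delta = np.linalg.norm(s) / 4.0
        elif rho > 0.75 and np.isclose(np.linalg.norm(s), delta):
            delta = 2.0 * delta
        # Step acceptance, Eq. (6).
        if rho >= 0:
            w = w + s
    return w
```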

2.2 Phase II (Regularized Line Search Tunneling)

Despite its rapid and global convergence properties ([3]), the trust region-based local search may get trapped at a local minimum, say w*. To escape from this local minimum, our proposed method, which we call regularized line search tunneling, attempts to compute a weight vector of next descent, say ŵ, by minimizing the subproblem

    min_{t>0}  Eλ(w(t))                                                     (7)

where {w(t) : t > 0} is the solution trajectory of the tunneling dynamics

    dw(t)/dt = −∇Ereg(w(t)),    w(0) = w*                                   (8)

One distinguishing feature of the proposed tunneling technique is that the obtained weight vector ŵ normally lies outside the convergence region of w* with respect to the trust-region method of Phase I, so that applying a trust-region local search to ŵ leads to another locally optimal weight vector. Another feature of the proposed method is that the value of Ereg becomes relatively small during the regularized line search tunneling in Eq. (8). Consequently, these features make it easier to find a new weight vector of next descent with a lower MSE, thereby enhancing generalization ability.
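A minimal sketch of Phase II is given below, assuming NumPy weight vectors and a generic regularizer gradient grad_reg; the integration horizon and step count are illustrative choices. Note that for the weight-decay term Ereg(w) = ‖w‖², ∇Ereg(w) = 2w and the trajectory of Eq. (8) has the closed form w(t) = w* exp(−2t), so the line search simply shrinks the weights toward the origin while monitoring Eλ.

```python
import numpy as np

def tunneling_phase(w_star, E, grad_reg, t_max=5.0, n_steps=200):
    # Phase II sketch: integrate the tunneling dynamics of Eq. (8) with
    # explicit Euler steps and return the trajectory point with the lowest
    # regularized error E_lambda, i.e. an approximate minimizer of Eq. (7).
    dt = t_max / n_steps
    w = np.array(w_star, dtype=float, copy=True)
    best_w, best_E = None, np.inf
    for _ in range(n_steps):
        w = w - dt * grad_reg(w)       # dw/dt = -grad E_reg(w), w(0) = w*
        Ew = E(w)
        if Ew < best_E:                # line search over t > 0 in Eq. (7)
            best_w, best_E = w.copy(), Ew
    return best_w
```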

3 Simulation Results

To evaluate the performance of the proposed algorithm, we conducted experiments on several benchmark problems from the literature. The neural network models for the benchmark problems (Iris, Sonar, 2D-sinc, and Mackey-Glass) are 4-6-3-1, 60-5-1, 2-15-1, and 2-20-1, respectively. Table 1 shows the performance of our proposed algorithm compared to error back-propagation based regularization (EBPR) [4], dynamic tunneling based regularization (DTR) [5], Bayesian regularization (BR) [2], Levenberg-Marquardt based regularization (LMR) [4], genetic algorithm based network training (GA), and simulated annealing based network training (SA).


Table 1. Experimental results for the benchmark data

Benchmark      Metric   EBPR     DTR      BR       LMR      GA       SA       Proposed
Iris           T        235      263      1.7      1.4      347      499      18.2
               E1       1%       1%       1.2%     1.7%     4.9%     5.8%     1.1%
               E2       4%       4%       6%       4.6%     8.2%     11.2%    4%
Sonar          T        273      265      -        16.5     182.9    224.6    43.4
               E1       1.0%     0.0%     -        0.0%     7.95%    13.1%    0.0%
               E2       27.8%    27.8%    -        29.8%    36.2%    41.5%    25.0%
2D-sinc        T        2652     2843     11.8     10.0     1824     3966     12.9
               E1       0.0055   0.0052   0.0092   0.0054   0.0112   0.0287   0.0052
               E2       0.0078   0.0074   0.0120   0.0087   0.0142   0.0308   0.0073
Mackey-Glass   T        671      6365.3   156.2    3.6      4765.6   9454     15.0
               E1       0.5640   0.0664   0.483    0.6109   0.7313   0.7919   0.0418
               E2       0.5643   0.0668   0.597    0.6147   0.7235   0.7730   0.0428

Experiments were repeated one hundred times for every algorithm in order to reduce the effect of the randomly chosen initial weight vector. The criteria for comparison are the average training time (T), the mean squared (or misclassification) training error (E1), and the test error (E2). The results demonstrate that the new algorithm not only achieves the goal training error and a smaller test error for all of these benchmark problems but is also substantially faster than these state-of-the-art methods.

Fig. 2. Convergence curve for Mackey-Glass problem


4 Conclusion

In this paper, a new deterministic method for training an MLP has been developed. The method consists of two phases: Phase I approaches a new local minimum by means of a trust region-based local search, and Phase II escapes from this local minimum by means of line search tunneling. Benchmark results demonstrate that the proposed method not only successfully achieves the goal training error but is also substantially faster than other existing training algorithms. The proposed method has several features. First, it does not require a good initial guess. Second, even for complex network architectures it can provide appropriate tunneling directions to escape a trapped local minimum. Finally, the weights converge to relatively small values, which reduces the regularization error term. The robust and stable nature of the proposed method makes it applicable to various supervised learning problems. Application of the method to larger-scale benchmark problems remains to be investigated.

Acknowledgement. This work was supported by the Korea Research Foundation under grant number KRF-2003-041-D00608.

References

1. Barhen, J., Protopopescu, V., Reister, D.: TRUST: A Deterministic Algorithm for Global Optimization. Science, Vol. 276 (1997) 1094-1097
2. Foresee, F.D., Hagan, M.T.: Gauss-Newton Approximation to Bayesian Regularization. In: Proceedings, International Joint Conference on Neural Networks (1997) 1930-1935
3. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, New York (1999)
4. Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice-Hall, New York (1999)
5. Singh, Y.P., Roychowdhury, P.: Dynamic Tunneling Based Regularization in Feedforward Neural Networks. Artificial Intelligence, Vol. 131 (2001) 55-71
