Improvement of Learning Algorithms for RBF Neural Networks in a Helicopter Sound Identification System

Gh. A. Montazer* (Assistant Professor of IT Engineering)
Reza Sabzevari** (M.Sc. Student of Mechatronics Engineering)
H. Gh. Khatir* (M.Sc. Student of Electrical Engineering)

*School of Engineering, Tarbiat Modares University, P. O. Box 14115-179, Tehran, Iran
**School of Engineering, Islamic Azad University of Qazvin, Member of YRC
e-mail: [email protected]

Abstract: This paper presents a set of optimizations of the learning algorithms commonly used to train radial basis function (RBF) neural networks. These optimizations are applied to an RBF neural network that identifies helicopter types by processing their rotor sounds. The first method uses an optimum learning rate in each iteration of the training process, which increases the speed of learning and achieves absolute stability of the network response. A second modification generalizes the Quick Propagation (QP) method and attains still higher learning speed. Finally, we introduce the General Optimum Steepest Descent (GOSD) method, which combines both improvements. All modified methods are employed to train a system that recognizes helicopter rotor sounds using an RBF neural network; comparing the results of these learning methods with the previous ones yields interesting outcomes.

Keywords: Steepest Descent Method, Radial Basis Functions, Artificial Neural Networks, Audio Processing, Helicopter Type Identification

I. INTRODUCTION

The design of a supervised neural network may be pursued in a variety of ways. The Back-Propagation (BP) algorithm for training Multi-Layer Perceptrons (MLPs) may be considered a practical application of an optimization method known in statistics as stochastic approximation [9]. Radial Basis Function (RBF) neural networks take a different approach, treating the design of a neural network as a curve-fitting (approximation) problem in a high-dimensional space [9]. From this viewpoint, the learning process amounts to finding a surface in a multidimensional space that fits the training data, with the "best-fitting surface" chosen according to a statistical criterion. Broomhead and Lowe [4] were the first to exploit radial basis functions in the design of neural networks. Other major contributions to the theory, design and application of RBF networks can be found in the work of Moody and Darken [13], Renals [15] and Poggio and Girosi [14]. Poggio and Girosi considered the learning process of RBF networks as an ill-posed problem: the training data may not be rich enough to determine a unique surface in regions where no data are available. From this point of view, learning is closely related to classical approximation techniques, such as generalized splines and regularization theory [14]. In RBF neural networks the transformation from the input space to the space constructed by the hidden layer is nonlinear, whereas the transformation from the hidden-layer space to the output space is linear. The justification for this structure is based on Cover's theorem on the separability of patterns [6], which states that a complex pattern-classification problem cast nonlinearly into a high-dimensional space is more likely to be linearly separable than in a low-dimensional space.
Once the patterns are linearly separable, the classification problem is relatively easy to solve. Thus, besides their curve-fitting capabilities, RBF networks are also well suited to pattern-classification applications. One of the major differences between RBF networks and MLP networks is that RBFs have localized characteristics; that is, they provide a nonzero output only for the portion of the input space surrounding the


center of the RBF, which is not the case in MLP networks [11, 2]. Owing to this localized characteristic, the nonlinear transformations used in these networks are commonly taken to be Gaussian functions (i.e., they resemble multidimensional Gaussian probability density functions). In fact, Poggio and Girosi [14] proved that Gaussian functions minimize the estimation error ε(F) = ε_s(F) + λ ε_c(F), where ε_s(F) is the standard error term, F is the approximating (Gaussian) function, ε_c(F) is the regularization term and λ is the regularization parameter of regularization theory [16].

A wide variety of learning strategies have been proposed in the literature for changing the parameters of an RBF network. They fall into two main categories. The first contains strategies in which the centers and variances of the network are changed, including [9]:
1) fixed centers selected at random;
2) self-organized selection of centers, via a) the K-means clustering procedure or b) the self-organizing feature map clustering procedure;
3) supervised selection of centers;
4) supervised selection of centers and variances.
The second category contains strategies in which the weights of the network are changed [9]:
1) the pseudoinverse (minimum-norm) method [4];
2) the Least-Mean-Square (LMS) method [17];
3) the Steepest Descent (SD) method;
4) the Quick Propagation (QP) method.
The pseudoinverse method is the most direct learning algorithm for RBF neural networks: it finds the minimum-norm weights for the given data. A variation of this method, which can be considered a regularization, is that of Hennessey et al. [9, 14]. The only disadvantage of this method is the need to invert an N × N matrix, where N is the number of nodes in the RBF network; since N is often large in practice, the method has high computational and memory costs. The LMS method does not need such vast memory, but its most noticeable disadvantage is that convergence is not guaranteed, and when the network does converge, the process is very slow.
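As a sketch of the pseudoinverse approach (the function names and the single shared width are our assumptions; the paper's Equation 4 allows per-pair widths δij):

```python
import numpy as np

def gaussian_design_matrix(X, centers, width):
    """Phi[i, j] = exp(-||x_i - c_j||^2 / width^2), cf. Equation 4."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / width ** 2)

def train_pseudoinverse(X, yd, centers, width):
    """Minimum-norm weights W solving yd = W Phi^T (Equation 5)."""
    Phi = gaussian_design_matrix(X, centers, width)  # M x Nh
    W = yd @ np.linalg.pinv(Phi).T                   # 1 x Nh
    return W, Phi
```

Placing one center on each training sample makes Phi square and, for distinct samples, invertible, so the fit is exact; the cost is the inversion of the large matrix noted above.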
The Steepest Descent (SD) method uses the gradient of the error function at each stage of the learning process to produce the weights of the next stage. Its disadvantage is likewise a slow rate of convergence. The Quick Propagation (QP) method is an optimization of the SD method that uses two gradients of the error function in the learning process: the current gradient and the former one. This accelerates the learning procedure, although it is still slow. This paper presents a set of modified steepest descent methods that improve the classical ones and make them suitable for the problem of identifying helicopter types [1, 7]. The paper is organized as follows: sections two, three and four introduce the Optimum Steepest Descent (OSD), Optimum Quick Propagation (OQP) and General Optimum Steepest Descent (GOSD) methods, respectively. Section five presents the application of the proposed methods to a helicopter sound identification system and compares them with the previous methods. Finally, section six concludes the paper.

II. OPTIMIZATION OF LEARNING RATE FOR SD METHOD

Let us consider the following definitions:

Yd = [ydi],   i = 1, ..., M        (1)

where Yd is the data sample vector and M is the number of samples.

W = [wj],   j = 1, ..., Nh        (2)

where W is the weight vector and Nh is the number of hidden neurons.

Φ = [φj(xi)],   i = 1, ..., M,   j = 1, ..., Nh        (3)

where Φ is the matrix of radial basis function values; for Gaussian RBFs we have

φj(xi) = exp[ −(xi − cj)² / δij² ]        (4)


In a RBF neural network we have

Y = [yi] = W Φᵀ,   i = 1, ..., M        (5)

where Y is the estimated output vector. The error vector is then

E = Yd − Y = Yd − W Φᵀ        (6)

and the sum of squared errors, which should be minimized through the learning process, is

J = (1/2) E Eᵀ        (7)

In the conventional SD method, the new weights are computed using the gradient of J in the W space:

∇J = ∂J/∂W = (1/2) ∂(E Eᵀ)/∂W = E ∂Eᵀ/∂W = −E Φ        (8)

so the steepest-descent direction is

ΔW = −∇J = E Φ        (9)

and the update rule is

Wnew = Wold + λ ΔW        (10)

where the coefficient λ, called the learning rate, remains constant throughout the learning process. Although Equation 9 gives the optimum direction of the delta-weight vector, in the sense of a first-order estimation, it does not specify the optimum length of this vector, and therefore the Optimum Learning Rate (OLR). To obtain the OLR, the sum of squared errors for the new weights is computed employing Equations 6, 7 and 10:

J(W + λΔW) = (1/2) (Yd − (W + λΔW) Φᵀ)(Yd − (W + λΔW) Φᵀ)ᵀ
           = (1/2) (E − λ ΔW Φᵀ)(E − λ ΔW Φᵀ)ᵀ
           = (1/2) E Eᵀ − λ E Φ ΔWᵀ + (1/2) λ² ΔW Φᵀ Φ ΔWᵀ
           = A + Bλ + Cλ²        (11)

where

A = (1/2) E Eᵀ        (12)
B = −E Φ ΔWᵀ        (13)
C = (1/2) ΔW Φᵀ Φ ΔWᵀ        (14)

are scalar constants. Thus J(W + λΔW) is a quadratic function of λ with constant coefficients A, B and C. Considering these coefficients in detail, and substituting ΔW = EΦ from Equation 9:

A = (1/2) E Eᵀ = (1/2) Σᵢ Eᵢ² > 0        (15)
B = −E Φ ΔWᵀ = −E Φ Φᵀ Eᵀ = −(EΦ)(EΦ)ᵀ ≤ 0        (16)
C = (1/2) E Φ Φᵀ Φ Φᵀ Eᵀ = (1/2) (EΦΦᵀ)(EΦΦᵀ)ᵀ ≥ 0        (17)

Thus J(λ) is a quadratic function of λ whose second-order coefficient C is positive, so it has a minimum, which is found by setting the derivative of J(λ) to zero:

∂J/∂λ = ∂(A + Bλ + Cλ²)/∂λ = B + 2λC = 0

hence

λmin = −B/(2C) = (EΦ)(EΦ)ᵀ / [(EΦΦᵀ)(EΦΦᵀ)ᵀ]        (18)

This learning rate minimizes J(λ), so we call it the Optimum Learning Rate (OLR):

λopt = (EΦ)(EΦ)ᵀ / [(EΦΦᵀ)(EΦΦᵀ)ᵀ] ≥ 0        (19)


Now the Optimum Delta Weight Vector (ODWV) can be determined as

ΔWopt = λopt ΔW = [(EΦ)(EΦ)ᵀ / ((EΦΦᵀ)(EΦΦᵀ)ᵀ)] E Φ        (20)

To show the absolute stability of the OSD method, let us evaluate J(W + λΔW) at λopt using Equation 18:

J(W + λopt ΔW) = A + Bλopt + Cλopt² = A − B²/(4C) = J(W) − B²/(4C)        (21)

It is known from Equation 17 that C ≥ 0, so the term B²/(4C) is always positive except when B = 0, that is, when

B = −(EΦ)(EΦ)ᵀ = 0  ⇒  EΦ = 0  ⇒  ΔW = 0        (22)

i.e., when the gradient vanishes and the weights are already at a stationary point of J (in particular, when E = 0 and J = 0). Apart from this case, the sum of squared errors strictly decreases at every step, so the method attains absolute stability.
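A minimal sketch of one OSD update (the naming is ours; E, Φ and W follow Equations 6-10, and the rate follows Equation 19):

```python
import numpy as np

def osd_step(W, Phi, yd):
    """One steepest-descent update with the optimum learning rate."""
    E = yd - W @ Phi.T               # error vector, Equation 6
    dW = E @ Phi                     # descent direction, Equation 9
    h = dW @ Phi.T                   # = E Phi Phi^T
    denom = float(h @ h.T)
    if denom == 0.0:                 # gradient is zero: stationary point
        return W
    lam = float(dW @ dW.T) / denom   # optimum learning rate, Equation 19
    return W + lam * dW              # Equation 10
```

By Equation 21 each step can only decrease J, which is the absolute-stability property shown above.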

III. OPTIMIZATION OF LEARNING RATE IN QP METHOD

If we define the new weight vector as

Wnew = Wold + λ1 ΔW1 + λ2 ΔW2        (23)

then, using Equation 7, the sum of squared errors J is

J(Wnew) = (1/2) E(Wnew) E(Wnew)ᵀ
        = (1/2) [E − (λ1 ΔW1 + λ2 ΔW2) Φᵀ][E − (λ1 ΔW1 + λ2 ΔW2) Φᵀ]ᵀ
        = (1/2) (E Eᵀ − 2λ1 E Φ ΔW1ᵀ − 2λ2 E Φ ΔW2ᵀ + λ1² ΔW1 Φᵀ Φ ΔW1ᵀ
                 + λ2² ΔW2 Φᵀ Φ ΔW2ᵀ + 2λ1λ2 ΔW1 Φᵀ Φ ΔW2ᵀ)        (24)

where E = E(Wold) is the current error vector. Let us define the scalar terms

A = (1/2) E Eᵀ        (25-a)
B = −E Φ ΔW1ᵀ        (25-b)
C = −E Φ ΔW2ᵀ        (25-c)
D = (1/2) ΔW1 Φᵀ Φ ΔW1ᵀ        (25-d)
E = (1/2) ΔW2 Φᵀ Φ ΔW2ᵀ        (25-e)
F = (1/2) ΔW1 Φᵀ Φ ΔW2ᵀ        (25-f)

(in 25-e and below, the scalar E denotes this coefficient rather than the error vector) and consider J as a function of λ1 and λ2, so that

J(λ1, λ2) = A + B λ1 + C λ2 + D λ1² + E λ2² + 2F λ1λ2        (26)

If we define

U = [ D  F ]
    [ F  E ]        (27)

V = [ B ]
    [ C ]        (28)

λ = [ λ1 ]
    [ λ2 ]        (29)


then J, its gradient vector and its Hessian matrix are, respectively:

Jnew = λᵀ U λ + Vᵀ λ + A        (30)

∇J(λ) = 2 U λ + V        (31)

∇²J(λ) = 2 U        (32)

It is well known [9] that if the Hessian matrix of a function is positive definite, the function has a unique minimum, which can be found from its gradient. We have proved that U is positive semi-definite, which means that its eigenvalues are all nonnegative (see Appendix). In addition, we have proved that U is positive definite except when DE − F² = 0, which corresponds to ΔW1 = α ΔW2, i.e., the case in which ΔW1 contributes no new direction. The condition for using the Optimum Quick Propagation (OQP) method is therefore DE − F² ≠ 0. Two states are thus possible. In the first state, DE − F² ≠ 0, so U is positive definite and invertible, and J has a minimum. We use the gradient to find the minimum point (λ1, λ2):

∇J(λ) = 2 U λ + V = 0        (33)

or

λopt = −(2U)⁻¹ V        (34)

Using Equations 27-29 we have

(λ1, λ2)ᵀ = −1/(2(DE − F²)) · (EB − FC, DC − FB)ᵀ        (35)

In the second state, DE − F² = 0, there is no need for the OQP method and the former OSD method can be used. It is clear that OQP is a generalization of OSD that reduces to it in this degenerate case. It is also clear that OQP converges faster than OSD, because it decreases J along two directions. This observation leads to the General Optimum Steepest Descent method, which is the subject of the next section.
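The 2×2 solve of Equation 35 can be sketched as follows (our naming; the scalar `Ecoef` is the coefficient E of Equation 25-e, renamed to avoid clashing with the error vector):

```python
import numpy as np

def oqp_rates(E, Phi, dW1, dW2):
    """Optimum (lambda1, lambda2) from Equation 35; E is the current
    error vector, dW1 and dW2 the two search directions."""
    B = -float(E @ Phi @ dW1.T)                     # Equation 25-b
    C = -float(E @ Phi @ dW2.T)                     # Equation 25-c
    D = 0.5 * float(dW1 @ Phi.T @ Phi @ dW1.T)      # Equation 25-d
    Ecoef = 0.5 * float(dW2 @ Phi.T @ Phi @ dW2.T)  # Equation 25-e
    F = 0.5 * float(dW1 @ Phi.T @ Phi @ dW2.T)      # Equation 25-f
    det = D * Ecoef - F * F
    if det == 0.0:                  # parallel directions: fall back to OSD
        return None
    lam1 = -(Ecoef * B - F * C) / (2.0 * det)
    lam2 = -(D * C - F * B) / (2.0 * det)
    return lam1, lam2
```

At the returned rates the gradient of Equation 31 vanishes, so the updated error is orthogonal (through Φ) to both search directions.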

IV. THE GENERAL OPTIMUM STEEPEST DESCENT METHOD

In the General Optimum Steepest Descent (GOSD) method, we use the following new weight vector:

Wnew = Wold + Σ_{i=1}^{K} λi ΔWi        (36)

where K is the number of previous stages used to calculate the learning rates for the current stage. The error at the next step is then

J(Wnew) = (1/2) E(Wnew) E(Wnew)ᵀ
        = (1/2) [E − (Σ_{i=1}^{K} λi ΔWi) Φᵀ][E − (Σ_{i=1}^{K} λi ΔWi) Φᵀ]ᵀ
        = (1/2) (E Eᵀ − 2 Σ_{i=1}^{K} λi E Φ ΔWiᵀ + Σ_{i=1}^{K} Σ_{j=1}^{K} λi λj ΔWi Φᵀ Φ ΔWjᵀ)        (37)

where E = E(Wold) is the current error vector. If we assume

ΔW = [ΔW1, ..., ΔWK]ᵀ        (38)

and

Λ = [λ1, ..., λK]ᵀ        (39)

then, with the definitions

U = (1/2) ΔW Φᵀ Φ ΔWᵀ        (40)

V = −ΔW Φᵀ Eᵀ        (41)

A = (1/2) E Eᵀ        (42)

we have

Jnew = Λᵀ U Λ + Vᵀ Λ + A        (43)


∇J(Λ) = 2 U Λ + V        (44)

∇²J(Λ) = 2 U        (45)

As in the OQP method, we obtain the optimum learning rates as

Λopt = −(2U)⁻¹ V        (46)

and

Wnew = Wold + Λoptᵀ ΔW        (47)
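A sketch of one GOSD update for general K (our naming; U and V follow Equations 40-41, and `lstsq` stands in for the inverse of Equation 46 so that linearly dependent directions are handled gracefully):

```python
import numpy as np

def gosd_step(W, Phi, yd, prev_dWs):
    """One GOSD update: the current gradient direction plus the
    previous directions, weighted by the rates of Equation 46."""
    E = yd - W @ Phi.T                    # current error vector
    dirs = list(prev_dWs) + [E @ Phi]     # K directions, Equation 38
    DW = np.vstack(dirs)                  # K x Nh
    G = DW @ Phi.T                        # K x M
    U = 0.5 * (G @ G.T)                   # Equation 40
    V = -(G @ E.T)                        # Equation 41
    Lam = np.linalg.lstsq(2.0 * U, -V, rcond=None)[0]   # Equation 46
    return W + Lam.T @ DW                 # Equation 47
```

With `prev_dWs` empty this is exactly one OSD step; with one previous direction it reproduces OQP.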

V. EXPERIMENTAL RESULTS

We used samples of aerodynamically generated sounds of helicopters' main rotors to train an RBF neural network as the core of a helicopter type identification system [3]. Different helicopter designs cause variations in aerodynamic behavior, which result in variations in the wave shapes obtained from their sound patterns [5, 12]. Using this fact, our system recognizes the type of a helicopter from a sample sound generated by its rotors, provided the system has been trained for that type. Technologies used in military systems are classified as strategic information, so some helicopters may not even be seen until they are used on a battlefield. For this reason, we could not provide sound patterns for all helicopter types in advance. We therefore used the available rotor-sound patterns to train the network, and designed the system so that it can be retrained quickly and easily when new helicopter types are encountered, in order to support real-time air defense systems. This motivated us to modify the learning algorithms to achieve the desired results in fewer training iterations [10]. For each sound pattern, we provided a vector of signal amplitudes sampled over 10 signal cycles as the network input. Two sample signals are shown in Figure 1. The size of these vectors is allowed to vary so that the system works at different sampling frequencies. Owing to limitations in obtaining sample rotor sounds for the training data, the system is initially trained with 12 cleaned sample sounds from different types of military helicopters, categorized by their rotor configurations: conventional, tandem, coaxial and synchropter helicopters. In the following we compare the results obtained using these algorithms with those of the traditional ones.

Figure 1. Sound patterns for two types of helicopters over 10 cycles, sampled at 8.0 kHz; top: Bell-206B, bottom: Bell-206L.


The GOSD method is a generalization of the SD methods that uses the K most recent learning stages to calculate the new learning rate. With K = 1 it reduces to the OSD method, and with K = 2 to the OQP method. Experimental results show that, in most cases, K ≈ Nh/5 is a good choice, where Nh is the number of neurons in the hidden layer of the neural network. Lower values of K reduce the convergence rate, while higher values increase the computational cost, both in memory usage and in time. We trained the RBF network with the OSD, OQP and GOSD (K = 3) methods; Figure 2 shows the error of each method over the training iterations. It is evident from Figure 2 that OQP is faster than OSD and GOSD is faster than OQP; in general, the larger K is, the faster the network trains.

Figure 2. Comparison among OSD, OQP, GOSD methods

To compare these methods with the previous ones, we first compare the OSD method with the conventional SD method. Figure 3 shows the performance of OSD against conventional SD; the OSD method clearly attains a lower error in fewer iterations.

Figure 3. Comparison between SD and OSD methods


Also, Figure 4 shows the comparison between conventional QP method and OQP method. It is obvious that OQP is much faster than conventional QP method.

Figure 4. Comparison between QP and OQP methods

Using the different learning methods (Steepest Descent, Optimum Steepest Descent, Quick Propagation, Optimum Quick Propagation and General Optimum Steepest Descent), we trained our system for 90 epochs. To examine the system, we sampled sounds in a real noisy environment and passed the filtered signals to the neural network as inputs. Finally, the results obtained with the different learning methods are compared, showing the better performance of our proposed methods, as presented in Table 1.

Table 1. Comparing results obtained from conventional and modified training methods.

Input      Identified Helicopter Type                              Real Signal
Signal     SD        OSD       QP        OQP       GOSD            Type
Signal1    Type4     Type4     Type4     Type4     Type4           Type4
Signal2    Type9     Type9     Type9     Type9     Type9           Type9
Signal3    -         Type11    -         Type7     Type7           Type7
Signal4    -         Type6     Type6     Type8     Type8           Type8
Signal5    Type2     Type2     Type2     Type1     Type1           Type1
Signal6    Type3     Type3     Type3     Type3     Type10          Type10
Signal7    Type3     Type3     Type3     Type3     Type5           Type5

In the examination phase, as is apparent in Table 1, we tested the performance of the network trained with the different methods, employing 7 sample signals from 7 different helicopter types. According to the diagrams of network error over consecutive training epochs (Figures 2-4), the Steepest Descent and Quick Propagation methods reach a higher error after 90 training epochs than the optimized ones. This causes classification mistakes on our input data, which typically have many similarities and are also noise-dependent. As shown in Table 1, our proposed methods give satisfactory results in this case.


VI. CONCLUSION

This paper proposed a set of optimum steepest descent methods, improving the previous ones, for training RBF neural networks. In these methods an optimum learning rate was derived for the conventional SD and QP methods; improving the QP method further, we proposed the GOSD method, which is much faster than QP. These improvements to the learning algorithms of RBF neural networks were applied to a helicopter sound identifier that forms part of an intrusion-detection structure in an air defense system. As mentioned in the previous section, such a system should be retrainable rapidly and easily in order to act satisfactorily when facing new types of helicopters. To reach this goal we applied a customized learning method to train our system. Experimental results show that the proposed modifications to the learning methods lead to the desired classification accuracy in fewer training iterations, which consequently yields better classification outcomes.

Gholam Ali Montazer received his B.Sc. degree in Electrical Engineering from Kh. N. Toosi University of Technology, Tehran, Iran, in 1991, the M.Sc. degree in Electrical Engineering from Tarbiat Modares University, Tehran, Iran, in 1994, and the Ph.D. degree in Electrical Engineering from the same university in 1998. He is an Assistant Professor in the Department of Information Engineering at Tarbiat Modares University, Tehran, Iran. His areas of research include information engineering, knowledge discovery, intelligent methods, system modeling, e-learning and image mining.

Reza Sabzevari received his B.Sc. degree in Computer Hardware Engineering from the Islamic Azad University of Qazvin, Iran, in 2003 and his M.Sc. degree in Mechatronics Engineering from the same university in 2007. His research interests center around machine vision, machine learning, artificial intelligence, information engineering and data mining. He is a member of the Young Researchers' Club.

Hassan Gholipour Khatir received his B.S. degree in Electrical Engineering (Electronics) from the Petroleum University of Technology in 2001, and the M.S. degree in Control Engineering from Tarbiat Modares University in 2004. In 2005 he joined the Iran Telecommunication Research Center (ITRC) to perform research on wireline networks as an access network for the Iran Next Generation Network (NGN). He is currently the manager of www.Ghatreh.com (a Persian news search engine).


APPENDIX

From Equation 27, the eigenvalues λ of U satisfy

|U − λI| = 0        (48)

or

λ² − tr(U) λ + |U| = 0        (49)

or

λ² − (D + E) λ + (DE − F²) = 0        (50)

so

λ1,2 = (1/2) [ (D + E) ± √( (D + E)² − 4(DE − F²) ) ]        (51)

From Equations 25-d and 25-e we can write

D = (1/2) ΔW1 Φᵀ Φ ΔW1ᵀ = (1/2) (ΔW1Φᵀ)(ΔW1Φᵀ)ᵀ ≥ 0        (52)

and

E = (1/2) ΔW2 Φᵀ Φ ΔW2ᵀ = (1/2) (ΔW2Φᵀ)(ΔW2Φᵀ)ᵀ ≥ 0        (53)

From Equations 51 to 53 it is clear that λ1 ≥ 0. On the other hand, for λ2 ≥ 0 to hold we need

(D + E) ≥ √( (D + E)² − 4(DE − F²) )        (54)

or

(D + E)² ≥ (D + E)² − 4(DE − F²)        (55)

and consequently

DE − F² ≥ 0        (56)

so if (56) holds, then λ2 ≥ 0. Now consider

DE − F² = (1/4) ΔW1ΦᵀΦΔW1ᵀ · ΔW2ΦᵀΦΔW2ᵀ − (1/4) (ΔW1ΦᵀΦΔW2ᵀ)²        (57)

If we define

Ψ1 = ΔW1 Φᵀ        (58)

Ψ2 = ΔW2 Φᵀ        (59)

then we have

DE − F² = (1/4) (Ψ1Ψ1ᵀ Ψ2Ψ2ᵀ − (Ψ2Ψ1ᵀ)²) = (1/4) ( |Ψ1|² |Ψ2|² − |Ψ2Ψ1ᵀ|² ) ≥ 0        (60)

where the last inequality is the Cauchy-Schwarz inequality. Now consider the situation when DE − F² = 0. Then

|Ψ1|² |Ψ2|² = |Ψ2Ψ1ᵀ|²        (61)

which by the equality case of Cauchy-Schwarz implies

Ψ1 = α Ψ2        (62)

or

ΔW1Φᵀ = α ΔW2Φᵀ  ⇒  (ΔW1 − α ΔW2) Φᵀ = 0        (63)

Multiplying both sides of (63) by Φ gives

(ΔW1 − α ΔW2) Φᵀ Φ = 0        (64)

and hence (assuming ΦᵀΦ is nonsingular)

ΔW1 = α ΔW2        (65)
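A quick numeric check of the inequality in Equation 60 and of the parallel case of Equation 65 (the function name and test values are ours):

```python
import numpy as np

def de_minus_f2(Phi, dW1, dW2):
    """DE - F^2 from Equations 25-d/e/f, via Psi_1, Psi_2 (Equations 58-59)."""
    p1 = dW1 @ Phi.T
    p2 = dW2 @ Phi.T
    D = 0.5 * float(p1 @ p1.T)
    E = 0.5 * float(p2 @ p2.T)
    F = 0.5 * float(p1 @ p2.T)
    return D * E - F * F

rng = np.random.default_rng(4)
Phi = rng.normal(size=(6, 3))
dW1 = rng.normal(size=(1, 3))
dW2 = rng.normal(size=(1, 3))
assert de_minus_f2(Phi, dW1, dW2) >= 0.0             # Cauchy-Schwarz, Equation 60
assert abs(de_minus_f2(Phi, dW1, 2.5 * dW1)) < 1e-9  # parallel case, Equation 65
```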

REFERENCES

[1] S. Akhtar, M. Elshafei-Ahmed and M.S. Ahmed, Detection of helicopters using neural nets, IEEE Transactions on Instrumentation and Measurement, Vol. 50, Issue 3 (2001) 749.
[2] N. Benoudjit and M. Verleysen, On the kernel widths in radial-basis function networks, Neural Processing Letters, Vol. 18, Issue 2 (2003) 139-154.
[3] K.S. Brentner and F. Farassat, Modeling aerodynamically generated sound of helicopter rotors, Progress in Aerospace Sciences, Vol. 39, Issue 2 (2003) 83-120.
[4] D.S. Broomhead and D. Lowe, Multivariable functional interpolation and adaptive networks, Complex Systems, Vol. 2 (1988) 321-355.
[5] A.T. Conlisk, Modern helicopter rotor aerodynamics, Progress in Aerospace Sciences, Vol. 37, Issue 5 (2001) 419-476.
[6] T.M. Cover, Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition, IEEE Transactions on Electronic Computers, Vol. EC-14 (1965) 326-334.
[7] M. Elshafei, S. Akhtar and M.S. Ahmed, Parametric models for helicopter identification using ANN, IEEE Transactions on Aerospace and Electronic Systems, Vol. 36, Issue 4 (2000) 1242-1252.
[8] M.A. Fischler and O. Firschein, Parallel guessing: a strategy for high-speed computation, Pattern Recognition, Vol. 20, Issue 2 (1987) 257-263.
[9] S. Haykin, Neural Networks: A Comprehensive Foundation (Prentice Hall, USA, 1998).
[10] M. Lawrence, T. Trappenberg and A. Fine, Rapid learning and robust recall of long sequences in modular associator networks, Neurocomputing, Vol. 69 (2006) 634-641.
[11] M. Lázaro, I. Santamaría and C. Pantaleón, A new EM-based training algorithm for RBF networks, Neural Networks, Vol. 16, Issue 1 (2003) 69-77.
[12] J.G. Leishman, Principles of Helicopter Aerodynamics (Cambridge University Press, 2006).
[13] J. Moody and C. Darken, Fast learning in networks of locally-tuned processing units, Neural Computation, Vol. 1, Issue 2 (1989) 281-294.
[14] T. Poggio and F. Girosi, Networks for approximation and learning, Proceedings of the IEEE, Vol. 78, Issue 9 (1990) 1481-1497.
[15] S. Renals, Radial basis function network for speech pattern classification, Electronics Letters, Vol. 25, Issue 7 (1989) 437-439.
[16] A.N. Tikhonov, Solution of incorrectly formulated problems and the regularization method, Soviet Math. Dokl., Vol. 4 (1963) 1035-1038.
[17] B. Widrow and M.E. Hoff, Adaptive switching circuits, in Neurocomputing: Foundations of Research (The MIT Press, Cambridge, 1988) 126-134.

