Single-Step Prediction of Chaotic Time Series Using Wavelet-Networks

E. S. Garcia-Treviño, V. Alarcon-Aquino

Universidad de las Américas Puebla (UDLA), Departamento de Ingeniería Electrónica, Cholula, Puebla, México, C.P. 72820. [email protected], [email protected]

Abstract. This paper presents a wavelet neural network for chaotic time series prediction. Wavelet-networks are inspired by both the feed-forward neural network and the theory underlying wavelet decompositions. They are a class of neural networks that take advantage of the good localization properties of multiresolution analysis and combine them with the approximation abilities of neural networks. This kind of network uses wavelets as activation functions in the hidden layer, and a backpropagation-type algorithm is used for its learning. Comparisons are made between a wavelet-network and the typical feed-forward network trained with the backpropagation algorithm. The results reported in this paper show that wavelet-networks have better prediction properties than comparable backpropagation networks.

Keywords: wavelet networks, wavelets, backpropagation networks, approximation theory, time series prediction, multiresolution analysis.

1. Introduction

Wavelet neural networks are a powerful class of neural networks that incorporate the most important advantages of the multiresolution analysis introduced by Mallat in 1989 [4]. Zhang and Benveniste [1] established a link between wavelet decomposition theory and neural networks. These networks preserve all the features of common neural networks, such as universal approximation properties, but in addition present an explicit link between the network coefficients and an appropriate transform. In neural networks two types of activation functions are commonly used: global, as in Backpropagation Networks (BPN), and local, as in
Radial Basis Function Networks (RBFN). Both networks have different approximation properties and, given enough nodes, both are capable of approximating any continuous function with arbitrary accuracy [5]. With global activation functions, adaptation and incremental learning are slow due to the interaction of many nodes, and convergence is not guaranteed. Also, global functions do not allow local learning or local manipulation of the network. These problems are overcome in neural networks with local activation functions. In recent years, several researchers have been looking for better ways to design neural networks, analyzing the relationship between neural networks, approximation theory, and functional analysis. In functional analysis any continuous function can be represented as a weighted sum of orthogonal basis functions. Such expansions can be easily represented as neural networks, which can be designed for a desired error rate using the properties of orthonormal expansions [5]. Unfortunately, most orthogonal functions are global approximators and suffer from the disadvantages mentioned previously. In order to take full advantage of the orthonormality of basis functions and of localized learning, we need a set of basis functions which are both local and orthogonal. Wavelets are functions with these features. In wavelet theory we can build simple orthonormal bases with good localization properties. Wavelets are a family of basis functions that combine powerful properties such as orthogonality, compact support, localization in time and frequency, and fast algorithms. Wavelets have generated tremendous interest in both theoretical and applied areas, especially over the past few years [6]. Wavelet networks are a class of neural networks that employ wavelets as activation functions. They have recently been investigated as
an alternative approach to the traditional neural networks with sigmoidal activation functions. Wavelet networks have attracted great interest because of their advantages over other network schemes (see e.g., [10]-[14]). The main line of research combining wavelet theory and neural networks is based on the work of Zhang and Benveniste [1], which introduces a (1+1/2)-layer neural network based on wavelets. This approach uses simple wavelets, and wavelet network learning is performed by a standard backpropagation-type algorithm, as in traditional neural networks. Time series prediction is a very important problem in many applications, including physical science, control systems, engineering processes, bioengineering, environmental systems, and business [3]. Recently, feed-forward neural networks such as multilayer perceptrons and radial basis function networks have been widely used as an alternative approach to time series prediction, since they provide a generic black-box functional representation. Basically, the goal of time series prediction can be stated as follows: given a finite sequence $y(1), y(2), \ldots, y(N)$, find the continuation $y(N+1), y(N+2), \ldots$ [15]. The main goal of this paper is to show the general performance of wavelet networks in a typical application of neural networks. In particular, we present wavelet networks applied to chaotic time series prediction. For this purpose comparisons are made between a wavelet network, tested with the first and second derivatives of the Gaussian wavelet and trained with a stochastic gradient type algorithm, and the typical feed-forward network trained with the backpropagation algorithm. The results reported in this work show clearly that wavelet networks have better prediction properties than comparable backpropagation networks. The reason for this is the firm theoretical foundation of wavelet networks, which combines the mathematical methods and tools of multiresolution analysis with the neural network framework. The remainder of this paper is organized as follows. Section 2 briefly reviews wavelet theory. Section 3 describes Zhang-Benveniste's wavelet network structure. In Section 4 the chaotic time series are presented. In Section 5 comparisons between wavelet networks and backpropagation networks are made and discussed. Finally, Section 6 presents the conclusions of this work.

2. Review of wavelet theory

Wavelet transforms involve representing a general function in terms of simple, fixed building blocks at different scales and positions. These building blocks are generated from a single fixed function, called the mother wavelet, by translation and dilation operations. The continuous wavelet transform considers the family

$\psi_{a,b}(x) = \frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{x-b}{a}\right)$   (1)

where $a \in \mathbb{R}$, $b \in \mathbb{R}$ with $a \neq 0$, and $\psi(\cdot)$ satisfies the admissibility condition. For discrete wavelets the scale (or dilation) and translation parameters in Eq. (1) are chosen such that at level $m$ the wavelet $a_0^{-m/2}\,\psi(a_0^{-m}x)$ is $a_0^{m}$ times the width of $\psi(x)$. That is, the scale parameter is $\{a = a_0^{m} : m \in \mathbb{Z}\}$ and the translation parameter is $\{b = k\,b_0\,a_0^{m} : m, k \in \mathbb{Z}\}$. This family of wavelets is thus given by

$\psi_{m,k}(x) = a_0^{-m/2}\,\psi(a_0^{-m}x - k\,b_0)$   (2)

so the discrete version of the wavelet transform is

$d_{m,k} = \langle g(x), \psi_{m,k}(x)\rangle = a_0^{-m/2}\int_{-\infty}^{\infty} g(x)\,\psi(a_0^{-m}x - k\,b_0)\,dx$   (3)

where $\langle\cdot,\cdot\rangle$ denotes the $L^2$-inner product. To recover $g(x)$ from the coefficients $\{d_{m,k}\}$, the following stability condition should hold [7]:

$A\,\|g(x)\|^2 \;\leq\; \sum_{m\in\mathbb{Z}}\sum_{k\in\mathbb{Z}} \left|\langle g(x), \psi_{m,k}(x)\rangle\right|^2 \;\leq\; B\,\|g(x)\|^2$   (4)

with $A > 0$ and $B < \infty$ for all signals $g(x)$ in $L^2(\mathbb{R})$, where $A$ and $B$ denote the frame bounds. These frame bounds can be computed from $a_0$, $b_0$ and $\psi(x)$ [7]. The reconstruction formula is thus given by

$g(x) \;\approx\; \frac{2}{A+B}\sum_{m\in\mathbb{Z}}\sum_{k\in\mathbb{Z}} \langle g(x), \psi_{m,k}(x)\rangle\,\psi_{m,k}(x)$   (5)

Note that the closer $A$ and $B$, the more accurate the reconstruction. When $A = B = 1$, the family of wavelets forms an orthonormal basis [7].
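To make Eqs. (2)-(3) concrete, the short Python sketch below (an illustration added here, not part of the original formulation) builds the discrete wavelet family for the second derivative of the Gaussian and evaluates coefficients $d_{m,k}$ by numerical integration; the choices $a_0 = 2$, $b_0 = 1$ and the test signal are assumptions for the example only.

import numpy as np

def mexican_hat(x):
    # Second derivative of the Gaussian (up to sign and normalization),
    # used here as the mother wavelet psi(x).
    return (x**2 - 1.0) * np.exp(-0.5 * x**2)

def psi_mk(x, m, k, a0=2.0, b0=1.0, psi=mexican_hat):
    # Discrete wavelet family of Eq. (2): a0^(-m/2) * psi(a0^(-m) * x - k * b0)
    return a0**(-m / 2.0) * psi(a0**(-m) * x - k * b0)

def wavelet_coefficient(g, m, k, x, a0=2.0, b0=1.0):
    # Eq. (3): d_{m,k} = <g, psi_{m,k}>, approximated on the sampled grid x
    # by a simple rectangle rule.
    dx = x[1] - x[0]
    return float(np.sum(g(x) * psi_mk(x, m, k, a0, b0)) * dx)

if __name__ == "__main__":
    x = np.linspace(-10.0, 10.0, 4001)           # dense grid for the integral
    g = lambda t: np.exp(-t**2) * np.cos(3 * t)  # arbitrary test signal
    for m in range(-2, 3):
        print(f"d_({m},0) = {wavelet_coefficient(g, m, 0, x):+.4f}")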

2.1 Orthonormal bases and multiresolution analysis

The mother wavelet function $\psi(x)$ and the scaling $a_0$ and translation $b_0$ parameters are specifically chosen such that the $\psi_{m,k}(x)$ constitute orthonormal bases for $L^2(\mathbb{R})$ [4], [7]. To form orthonormal bases with good time-frequency localisation properties, the time-scale parameters $(b, a)$ are sampled on a so-called dyadic grid in the time-scale plane, namely $a_0 = 2$ and $b_0 = 1$ [4], [7]. Thus, substituting these values in Eq. (2), we have a family of orthonormal bases,

$\psi_{m,k}(x) = 2^{-m/2}\,\psi(2^{-m}x - k)$   (6)

Using Eq. (3), the orthonormal wavelet transform is thus given by

$d_{m,k} = \langle g(x), \psi_{m,k}(x)\rangle = 2^{-m/2}\int_{-\infty}^{\infty} g(x)\,\psi(2^{-m}x - k)\,dx$   (7)

and the reconstruction formula is obtained from Eq. (5). A formal approach to constructing orthonormal bases is provided by multiresolution analysis (MRA) [4]. The idea of MRA is to write a function $g(x)$ as a limit of successive approximations, each of which is a smoother version of $g(x)$. The successive approximations thus correspond to different resolutions [4]. The differences between two successive smooth approximations, at resolutions $2^{m-1}$ and $2^{m}$, give the detail signal at resolution $2^{m}$. In other words, after choosing an initial resolution $L$, any signal $g(x) \in L^2(\mathbb{R})$ can be expressed as [4], [7]:

$g(x) = \sum_{k\in\mathbb{Z}} c_{L,k}\,\phi_{L,k}(x) + \sum_{m=L}^{\infty}\sum_{k\in\mathbb{Z}} d_{m,k}\,\psi_{m,k}(x)$   (8)

where the detail or wavelet coefficients $\{d_{m,k}\}$ are given by Eq. (7), while the approximation or scaling coefficients $\{c_{L,k}\}$ are defined by

$c_{L,k} = 2^{-L/2}\int_{-\infty}^{\infty} g(x)\,\phi(2^{-L}x - k)\,dx$   (9)

Equations (7) and (9) express that a signal $g(x)$ is decomposed in details $\{d_{m,k}\}$ and approximations $\{c_{L,k}\}$ to form a multiresolution analysis of the signal [4].

3. Description of wavelet-networks

Based on the so-called (1+1/2)-layer neural network, Zhang and Benveniste introduced the general wavelet network structure [1]. In [2] Zhang presented a modified version of this network, in which a parallel linear term $c$ is introduced to help the learning of the linear relation between the input and output signals. The wavelet network architecture improved in [2] has the form shown in Figure 1. The equation that defines the network is given by

$g(x) = \sum_{i=1}^{N} \omega_i\,\psi[d_i(x - t_i)] + c\,x + \bar{g}$   (10)

where $x$ is the input, $g$ is the output, the $t_i$ are the translations (biases) of the neurons, $\psi$ are the activation functions, and the $d_i$ and $\omega_i$ are the first-layer and second-layer (1/2-layer) coefficients, respectively. It is important to note that $\bar{g}$ is an additional, redundant parameter introduced to make the learning of nonzero-mean functions easier, since the wavelet $\psi(x)$ is zero mean. (A minimal code sketch of this forward pass is given below.)
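The following Python sketch (an illustration, not code from the paper) implements the forward pass of Eq. (10) with the first derivative of the Gaussian as the wavelet; the parameter values in the example are placeholders.

import numpy as np

def gauss_deriv1(x):
    # First derivative of the Gaussian (up to sign): psi(x) = x * exp(-x^2 / 2)
    return x * np.exp(-0.5 * x**2)

def wavenet_forward(x, omega, d, t, c, g_bar, psi=gauss_deriv1):
    """Wavelet network of Eq. (10): g(x) = sum_i omega_i psi(d_i (x - t_i)) + c x + g_bar.

    x        : scalar or 1-D array of inputs
    omega    : (N,) second-layer (1/2-layer) weights
    d, t     : (N,) dilations and translations of the N wavelet neurons
    c, g_bar : linear term and mean offset
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    z = d[None, :] * (x[:, None] - t[None, :])   # hidden pre-activations, shape (len(x), N)
    return psi(z) @ omega + c * x + g_bar

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N = 15                                  # number of wavelet neurons (as in the paper)
    omega = rng.normal(scale=0.1, size=N)
    d = np.full(N, 2.0)                     # placeholder dilations
    t = np.linspace(0.0, 1.0, N)            # placeholder translations in the input domain
    c, g_bar = 0.0, 0.5
    print(wavenet_forward([0.2, 0.5, 0.8], omega, d, t, c, g_bar))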

[Figure 1. Modified (1+1/2)-layer wavelet neural network: the input $x$ feeds $N$ wavelet neurons $\psi$ with dilations $d_i$ and translations $t_i$; their outputs are weighted by $\omega_i$ and summed together with the linear term $c\,x$ and the offset $\bar{g}$ to produce the output $g$.]

In this kind of neural network each neuron is replaced by a wavelet, and the translation and dilation parameters are iteratively adjusted according to the given function to be approximated.

3.2 Training algorithm

As mentioned above, wavelet networks use a stochastic gradient type algorithm to adjust the network parameters. If all the parameters of the network ($c$, $\bar{g}$, the $d_i$, $t_i$ and $\omega_i$) are collected in a vector $\theta$, and using $y_k$ to denote the original output signal and $g_\theta$ the output of the network with parameter vector $\theta$, then the error function (cost function $C$) to be minimized is given by

$C(\theta) = \frac{1}{2}\,E\{[g_\theta(x) - y]^2\}$   (11)

The stochastic gradient algorithm recursively minimizes the criterion (11) using input/output observations. It modifies the vector $\theta$ after each measurement $(x_k, y_k)$ in the direction opposite to the gradient of

$c(\theta, x_k, y_k) = \frac{1}{2}\,[g_\theta(x_k) - y_k]^2$   (12)

Because this procedure is based on the gradient of the error function with respect to each parameter of the network, differentiable activation functions are necessary. This is the main reason for the use of the first derivative of the Gaussian wavelet in [1] and [2]: Gaussian wavelets are continuous and differentiable with respect to their dilation and translation parameters. Note that in wavelet networks additional processing is needed to avoid divergence or poor convergence. This processing involves constraints that are applied at each iteration, after the parameter vector has been modified with the stochastic gradient, to project the parameter vector back onto the restricted domain of the signal. (A minimal code sketch of one such update is given at the end of this section.)

3.3 Wavelet Networks Initialisation

In the works of Zhang [1] and [2], two different initialisation procedures, based on an idea somewhat similar to the wavelet decomposition, are proposed. Both divide the input domain of the signal following a dyadic grid of the form shown in Figure 2. This grid has its foundations in the use of the first derivative of the Gaussian wavelet, and it is not an orthogonal grid because the wavelet support at a given dilation is larger than the translation step at that dilation. The main difference between the two initialisation approaches presented in [1] and [2] is the way the wavelets are selected. In the first method, the wavelets at higher dilations are chosen until the number of neurons of the network has been reached. In the particular case that the number of wavelet candidates at a given dilation exceeds the number of remaining neurons, they are randomly selected from the dyadic grid shown in Figure 2. It is important to note that this method does not take the input signal into consideration when selecting the wavelets. In the second approach, on the other hand, an iterative elimination procedure is used [2]. It is based on the least squares error between the output observations and the output of the network containing all the wavelets of the initialisation grid. At each iteration of the process, the wavelet contributing least to the least squares error is eliminated from the network, until the expected number of wavelets is left in the network. In both cases the number of levels (different dilations) depends on the number of wavelets available for the network.

[Figure 2. Dyadic Grid for Wavelet Network Initialisation (translation on the horizontal axis from -1 to 1; dilation on the vertical axis from 0 to 1).]
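Bringing Sections 3.2 and 3.3 together, the Python sketch below (illustrative only; the initialisation, learning rate and projection step are simplified assumptions rather than the exact procedure of [1], [2]) performs one stochastic-gradient update of the parameters of Eq. (10) using the instantaneous cost of Eq. (12) and the first derivative of the Gaussian wavelet.

import numpy as np

def psi(z):                 # first derivative of the Gaussian (up to sign)
    return z * np.exp(-0.5 * z**2)

def dpsi(z):                # derivative of psi, needed for the chain rule
    return (1.0 - z**2) * np.exp(-0.5 * z**2)

def sgd_step(params, x_k, y_k, lr=0.01):
    """One stochastic-gradient step on c(theta, x_k, y_k) = 0.5 * (g_theta(x_k) - y_k)^2."""
    omega, d, t, c, g_bar = params
    z = d * (x_k - t)                      # hidden pre-activations
    g = np.dot(omega, psi(z)) + c * x_k + g_bar
    e = g - y_k                            # prediction error
    # Gradients of Eq. (12) with respect to each parameter group
    grad_omega = e * psi(z)
    grad_d     = e * omega * dpsi(z) * (x_k - t)
    grad_t     = e * omega * dpsi(z) * (-d)
    grad_c     = e * x_k
    grad_gbar  = e
    # Gradient descent update
    omega -= lr * grad_omega
    d     -= lr * grad_d
    t     -= lr * grad_t
    c     -= lr * grad_c
    g_bar -= lr * grad_gbar
    # Simplified projection step: keep translations inside the signal domain [0, 1]
    np.clip(t, 0.0, 1.0, out=t)
    return (omega, d, t, c, g_bar), 0.5 * e**2

if __name__ == "__main__":
    N = 15
    # Simplified dyadic-grid-like initialisation over the normalized input domain [0, 1]
    t = np.concatenate([np.linspace(0, 1, n, endpoint=False) + 0.5 / n for n in (1, 2, 4, 8)])[:N]
    d = np.repeat([1.0, 2.0, 4.0, 8.0], [1, 2, 4, 8])[:N]
    params = (np.zeros(N), d, t.copy(), 0.0, 0.5)
    params, cost = sgd_step(params, x_k=0.3, y_k=0.7)
    print("instantaneous cost:", cost)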

4. Chaotic time series

Chaos is the mathematical term for the behavior of a system that is inherently unpredictable. Unpredictable phenomena are readily apparent in all areas of life [8]. Many systems in the natural world are now known to exhibit chaos or non-linear behavior, the complexity of which is so great that they were previously considered random. The unraveling of these systems has been aided by the discovery, mostly in the twentieth century, of mathematical expressions that exhibit similar tendencies [9]. Chaos is part of an even grander subject known as dynamics. Whenever dynamical chaos is found, it is accompanied by nonlinearity. Naturally, an uncountable variety of nonlinear relations is possible, depending perhaps on a multitude of parameters. These nonlinear relations are frequently encountered in the form of difference equations, mappings, differential equations, partial differential equations, integral equations, or sometimes combinations of these. We note that, for each differential equation, the specific parameters were selected because they are the most representative values for chaotic behavior and are also the most commonly used in the literature [8], [9]. In this section we briefly present the two chaotic time series used in this work.

4.1 Lorenz equation

It was introduced by E. N. Lorenz in 1963. It was derived from a simplified model of atmospheric interactions. The system is most commonly expressed as three coupled non-linear differential equations:

$\frac{dx}{dt} = a(y - x)$
$\frac{dy}{dt} = x(b - z) - y$   (13)
$\frac{dz}{dt} = xy - cz$

where $a = 10$, $b = 28$, $c = 8/3$.
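As an illustration (not taken from the paper), the x-component series used in the experiments can be generated by numerically integrating Eq. (13); the sketch below uses a classical fourth-order Runge-Kutta step with the 0.001 time step mentioned in Section 5, while the initial condition is an assumption.

import numpy as np

def lorenz_rhs(state, a=10.0, b=28.0, c=8.0 / 3.0):
    # Right-hand side of Eq. (13)
    x, y, z = state
    return np.array([a * (y - x), x * (b - z) - y, x * y - c * z])

def generate_lorenz(n_samples, dt=0.001, state0=(1.0, 1.0, 1.0)):
    """Integrate the Lorenz system with classical RK4 and return the x-component."""
    state = np.array(state0, dtype=float)
    xs = np.empty(n_samples)
    for n in range(n_samples):
        k1 = lorenz_rhs(state)
        k2 = lorenz_rhs(state + 0.5 * dt * k1)
        k3 = lorenz_rhs(state + 0.5 * dt * k2)
        k4 = lorenz_rhs(state + dt * k3)
        state = state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        xs[n] = state[0]
    return xs

if __name__ == "__main__":
    series = generate_lorenz(200)
    # Normalize to the range 0-1, as done for both series in Section 5
    series = (series - series.min()) / (series.max() - series.min())
    print(series[:5])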

4.2 Mackey-Glass equation

It was first advanced as a model of white blood cell production. It is a time-delay differential equation described by

$\frac{dx}{dt} = \frac{a\,x(t-T)}{1 + x^{c}(t-T)} - b\,x(t)$   (14)

where $a = 0.2$, $b = 0.1$, $c = 10$.
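A possible way to generate this series (an illustrative sketch, not the authors' code) is to integrate Eq. (14) with a simple Euler scheme and a delay buffer; the sampling step of 1.0 and $T = 17$ follow Section 5, while the internal integration step and the constant initial history are assumptions.

import numpy as np

def generate_mackey_glass(n_samples, T=17.0, a=0.2, b=0.1, c=10.0,
                          dt=0.1, sample_every=10, x0=1.2):
    """Euler integration of Eq. (14) with a circular delay buffer.

    The series is sampled every sample_every * dt time units (1.0 here),
    matching the sampling step quoted in Section 5. The constant initial
    history x(t) = x0 for t <= 0 is an assumption for this sketch.
    """
    delay_steps = int(round(T / dt))
    history = np.full(delay_steps, x0)      # buffer holding x(t - T)
    x = x0
    out = []
    idx = 0
    for step in range(n_samples * sample_every):
        x_delayed = history[idx]
        dx = a * x_delayed / (1.0 + x_delayed**c) - b * x
        history[idx] = x                    # store current value before advancing
        idx = (idx + 1) % delay_steps
        x = x + dt * dx
        if (step + 1) % sample_every == 0:
            out.append(x)
    return np.array(out)

if __name__ == "__main__":
    series = generate_mackey_glass(200)
    series = (series - series.min()) / (series.max() - series.min())  # normalize to 0-1
    print(series[:5])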

5. Simulation results


The work reported in this paper applies wavelet networks to the prediction of two of the best known chaotic time series: the Lorenz and Mackey-Glass equations. In particular, this section presents comparisons between wavelet networks tested with the first and the second derivative of the Gaussian wavelet (wavenet A and wavenet B, respectively), and the typical feed-forward network trained with the backpropagation algorithm (BPN). The Lorenz equation was sampled every 0.001 time steps. For the Mackey-Glass equation $T = 17$ with a step size of 1.0. Both series were normalized to the range 0-1. For all networks, and for both series, the first 100 samples were used for training and the next 100 samples for single-step prediction. Figure 3 shows the results obtained for the Lorenz equation. The wavelet used for wavenet A was the first derivative of the Gaussian wavelet, $\psi(x) = x\,e^{-\frac{1}{2}x^2}$, and for wavenet B the second derivative of the same function, $\psi(x) = (x^2 - 1)\,e^{-\frac{1}{2}x^2}$. In both cases the initialisation grid proposed in [2] is used. For the BPN a log-sigmoid activation function was used, and the network was trained with a learning rate of 0.05 and a momentum term of 0.5. The architecture used for the BPN was 1-15-1 (one input unit, fifteen hidden units and one output unit). As can be seen in Table 1, wavelet networks outperform the BPN in terms of MSE (Mean Square Error). This is due to the fact that neural networks based on conventional single-resolution schemes cannot learn difficult time series with abrupt changes in their behavior; consequently, the training process often does not converge, or converges very slowly, and the trained network may not generalize well. It is also clear from Table 1 that wavelet networks require a smaller number of iterations to perform time series prediction.
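As a sketch of the evaluation protocol described above (illustrative; the exact training loop of the paper is not reproduced), single-step prediction uses pairs (y(n), y(n+1)), the first 100 samples for training and the next 100 for testing, with the MSE as the figure of merit.

import numpy as np

def make_single_step_pairs(series):
    # Inputs are y(n), targets are y(n+1)
    return series[:-1], series[1:]

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def evaluate_single_step(series, predict_fn, n_train=100, n_test=100):
    """Train/test split used in Section 5: first n_train samples for training,
    the following n_test samples for single-step prediction.
    predict_fn(x_train, y_train, x_test) can be any one-step predictor."""
    x, y = make_single_step_pairs(series[: n_train + n_test + 1])
    x_train, y_train = x[:n_train], y[:n_train]
    x_test, y_test = x[n_train:], y[n_train:]
    y_hat = predict_fn(x_train, y_train, x_test)
    return mse(y_test, y_hat)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    series = rng.random(201)                  # placeholder series normalized to 0-1
    persistence = lambda xtr, ytr, xte: xte   # naive baseline: y(n+1) ~ y(n)
    print("baseline MSE:", evaluate_single_step(series, persistence))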

[Figure 3. Single-step prediction of the Lorenz equation (samples 100-200, normalized output): (a) BPN, (b) wavenet A, (c) wavenet B; each panel shows the original series and the network prediction.]
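For reference, a minimal sketch of the BPN baseline described above (1-15-1 architecture, log-sigmoid hidden units, learning rate 0.05, momentum 0.5) is given below; it is an illustration under those stated settings, not the authors' implementation, and the linear output unit, weight initialisation and training data are assumptions.

import numpy as np

def logsig(z):
    return 1.0 / (1.0 + np.exp(-z))

class BPN:
    """1-15-1 feed-forward network trained with backpropagation plus momentum."""

    def __init__(self, n_hidden=15, lr=0.05, momentum=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.5, size=n_hidden)   # input -> hidden weights
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(scale=0.5, size=n_hidden)   # hidden -> output weights
        self.b2 = 0.0
        self.lr, self.mu = lr, momentum
        self.vel = [np.zeros_like(self.w1), np.zeros_like(self.b1),
                    np.zeros_like(self.w2), 0.0]          # momentum buffers

    def forward(self, x):
        h = logsig(self.w1 * x + self.b1)                 # hidden activations
        return h, np.dot(self.w2, h) + self.b2            # linear output unit (assumption)

    def train_step(self, x, y):
        h, y_hat = self.forward(x)
        e = y_hat - y
        grads = [e * self.w2 * h * (1 - h) * x,           # dE/dw1
                 e * self.w2 * h * (1 - h),               # dE/db1
                 e * h,                                    # dE/dw2
                 e]                                        # dE/db2
        params = [self.w1, self.b1, self.w2, self.b2]
        for i, (p, g) in enumerate(zip(params, grads)):
            self.vel[i] = self.mu * self.vel[i] - self.lr * g
            params[i] = p + self.vel[i]
        self.w1, self.b1, self.w2, self.b2 = params
        return 0.5 * e**2

if __name__ == "__main__":
    net = BPN()
    x_train = np.linspace(0.0, 1.0, 100)
    y_train = np.roll(x_train, -1)          # placeholder one-step-ahead targets
    for epoch in range(500):                # 500 iterations, as in Table 1
        for x, y in zip(x_train[:-1], y_train[:-1]):
            net.train_step(x, y)
    print("final sample prediction:", net.forward(0.5)[1])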

On the other hand, it is important to underline that for the Lorenz series wavenet B has better prediction properties than wavenet A, whereas for the Mackey-Glass series wavenet A behaves better than wavenet B. The main reason for these dissimilar results is the direct relation between the shape of the wavelet used to perform the analysis and the shape of the chaotic series analyzed. The first derivative of the Gaussian wavelet is a simpler wavelet than its corresponding second derivative, so we expect that for simpler chaotic series, such as the Mackey-Glass series, the first derivative of the Gaussian wavelet performs a better analysis.

Table 1. Simulation results

Lorenz equation
method       neurons   iterations   MSE
wavenet A    15        40           0.1934
wavenet B    15        40           0.0164
BPN          15        500          0.0335

Mackey-Glass equation
method       neurons   iterations   MSE
wavenet A    15        40           0.0008
wavenet B    15        40           0.0019
BPN          15        500          0.0359

6. Conclusions

The wavelet network described in this paper can be used for the prediction of general non-linear systems. The method was inspired by both neural networks and the wavelet decomposition. The basic idea is to replace the neurons by more powerful computing units obtained by cascading an affine transform with a wavelet. The results reported in this paper show clearly that wavelet networks have better prediction properties than comparable backpropagation networks. The reason for this is that wavelets, in addition to forming an orthogonal basis, have the capability to explicitly represent the behavior of a function at different resolutions of the input variables. A more detailed study of the chaotic features of the series presented here, and an analysis of the performance of this kind of network applied to real data, will be reported in a forthcoming paper.

7. References

[1] Q. Zhang and A. Benveniste, "Wavelet Networks", IEEE Transactions on Neural Networks, Vol. 3, No. 6, 1992.

[2] Q. Zhang, "Wavelet network: the radial structure and an efficient initialisation procedure", European Control Conference (ECC), Groningen, The Netherlands, 1993.
[3] S. Sitharama Iyengar, E. C. Cho, and Vir V. Phoha, "Foundations of Wavelet Networks and Applications", Chapman & Hall/CRC, U.S.A., 2002.
[4] S. G. Mallat, "A Theory for Multiresolution Signal Decomposition: The Wavelet Representation", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, No. 7, July 1989.
[5] B. R. Bakshi and G. Stephanopoulos, "Wavelets as Basis Functions for Localized Learning in a Multiresolution Hierarchy", Laboratory for Intelligent Systems in Process Engineering, Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, 1992.
[6] B. Jawerth and W. Sweldens, "An Overview of Wavelet Based Multiresolution Analyses", SIAM, 1994.
[7] I. Daubechies, "Ten Lectures on Wavelets", SIAM, 1992.
[8] R. L. Devaney, "Chaotic Explosions in Simple Dynamical Systems", in The Ubiquity of Chaos, edited by Saul Krasner, American Association for the Advancement of Science, Washington D.C., U.S.A., 1990.
[9] J. Pritchard, "The Chaos CookBook: A Practical Programming Guide", Reed International Books, Oxford, Great Britain, 1992.
[10] E. A. Rying, Griff L. Bilbro, and Jye-Chyi Lu, "Focused Local Learning With Wavelet Neural Networks", IEEE Transactions on Neural Networks, Vol. 13, No. 2, March 2002.
[11] Xieping Gao, Fen Xiao, Jun Zhang, and Chunhong Cao, "Short-term Prediction of Chaotic Time Series by Wavelet Networks", Fifth World Congress on Intelligent Control and Automation (WCICA), 2004.
[12] Li Deqiang, Shi Zelin, and Huang Shabai, "A Wavelet Network Based Classifier", IEEE ICSP Proceedings, 2004.
[13] M. Yeginer and Y. P. Kahya, "Modeling of Pulmonary Crackles Using Wavelet Networks", Proceedings of the IEEE Engineering in Medicine and Biology 27th Annual Conference, Shanghai, China, September 1-4, 2005.
[14] C. M. Chang and T. S. Liu, "A Wavelet Network Control Method for Disk Drives", IEEE Transactions on Control Systems Technology, Vol. 14, No. 1, January 2006.
[15] V. Alarcon-Aquino and J. A. Barria, "Multiresolution FIR Neural-Network-Based Learning Algorithm Applied to Network Traffic Prediction", IEEE Transactions on Systems, Man, and Cybernetics, Part C, Vol. 36, No. 2, March 2006.
