Neurocomputing 110 (2013) 18–28


Data driven modeling based on dynamic parsimonious fuzzy neural network

Mahardhika Pratama a,*, Meng Joo Er b, Xiang Li c, Richard J. Oentaryo b, Edwin Lughofer d, Imam Arifin e

a The University of New South Wales, Northcott Drive, Canberra, ACT 2600, Australia
b Nanyang Technological University, Nanyang Avenue, Singapore 639798, Singapore
c Singapore Institute of Manufacturing Technology, Nanyang Drive 78, Singapore 638075, Singapore
d Department of Knowledge-Based Mathematical Systems, Johannes Kepler University, Linz A-4040, Austria
e Institut Teknologi Sepuluh Nopember, Campus ITS Sukolilo, Surabaya 60111, Indonesia


Abstract

Article history: Received 1 November 2011; received in revised form 13 November 2012; accepted 17 November 2012; available online 2 January 2013. Communicated by D. Wang.

In this paper, a novel fuzzy neural network termed the dynamic parsimonious fuzzy neural network (DPFNN) is proposed. DPFNN is a four-layer network that couples the TSK (Takagi–Sugeno–Kang) fuzzy architecture with multivariate Gaussian kernels as membership functions. The training procedure is characterized by four aspects: (1) DPFNN may evolve fuzzy rules as new training data arrive, which enables it to cope with non-stationary processes. We propose two criteria for rule generation, the system error and ε-completeness, reflecting the performance and the sample coverage of the existing rule base, respectively. (2) Fuzzy rules found insignificant over time, judged by their statistical contributions, are pruned to curb the complexity and redundancy of the rule base. (3) The extended self-organizing map (ESOM) theory is employed to dynamically update the centers of the ellipsoidal basis functions in accordance with the input training samples. (4) The fuzzy consequent parameters are updated by the time localized least squares (TLLS) method, which exploits a sliding window in order to reduce the computational burden of the least squares (LS) method. The viability of the new method is investigated intensively on real-world and artificial problems; our method not only delivers more compact and parsimonious network structures, but also achieves lower predictive errors than state-of-the-art approaches.

Keywords: dynamic parsimonious fuzzy neural network (DPFNN); radial basis function (RBF); self organizing map (SOM); rule growing; rule pruning

1. Introduction

1.1. Preliminary

In modern industries, analytical or physics-based models are difficult or sometimes impossible to deploy owing to high system complexity and non-stationary characteristics. As an alternative, data-driven models, which perform system optimization, control and/or modeling directly from input–output observations collected from real-world processes, are increasingly in demand. In general, the development of data-driven modeling tools involves two key objectives: high modeling accuracy (e.g., low approximation error or misclassification rate) and low model complexity (e.g., a small number of nodes or rules).


* Corresponding author. Tel.: +6289676186386; fax: +62315939177. E-mail addresses: [email protected], [email protected] (M. Pratama), [email protected] (M.J. Er), [email protected] (X. Li), [email protected] (R.J. Oentaryo), [email protected] (E. Lughofer), arifi[email protected] (I. Arifin). http://dx.doi.org/10.1016/j.neucom.2012.11.013

The structural complexity of a model can be pinpointed by the number of network parameters stored in memory, which hinges on the number of rules, the number of input features and the network type. The majority of conventional modeling techniques, however, focus largely on modeling accuracy without paying attention to the risk of over-complex model representations. As a consequence, such models may prevent users from understanding the system being modeled; a frugal rule base, by contrast, goes hand in hand with a high level of interpretability of the rule base. Over-complex models also impose a high structural and computational burden, and are therefore hampered in online applications that entail rapid model updates without the danger of over-fitting. The computational burden is determined by the combined cost of the learning modules, while the memory demand has an affinity with the computational cost in that it can be perceived from the total number of items stored in the repository (i.e., the number of training data and network parameters). Many methodologies for modeling unknown system dynamics from input–output observations have been developed. One well-known approach is the artificial neural network (ANN) [68], which encodes associative representations of the input and output


examples in terms of synaptic weights between various layers. A major shortcoming of classical NNs, however, is that they cannot perform automatic knowledge discovery: the network structure has to be fixed a priori, so they lack the ability to deal with time-variant systems. This makes them impractical for complex online real-world engineering problems, as they offer too little flexibility. To correct this limitation, whenever a new state/condition of the system arises, the existing network should automatically re-organize or even expand its structure so as to accommodate the new knowledge. In other words, NNs should be equipped with a structural learning strategy, an ad-hoc leverage to automate the growth and movement of the hidden nodes, in order to guarantee completeness in capturing the available training stimuli flexibly and on-the-fly. Traditional batch learning approaches (e.g., FAOS–PFNN [20] and MRAN [4]) are inapplicable for this task, as they (1) iterate over the complete data set multiple times and (2) cannot assimilate new knowledge on demand. This aggravates the computational burden and is incompatible with online life-long learning. Another major bottleneck of ANNs lies in their fundamental architecture mimicking neurons and connections in the human brain, which is generally opaque and un-interpretable for users, as is inherent to black boxes. Conversely, fuzzy (logic) systems based on the concept of fuzzy sets [8] are able to explain implicit system relations using linguistic fuzzy rules, and realize approximate reasoning that copes with imprecision and uncertainty in decision-making processes in a possibilistic manner [28,42]; this can be achieved while preserving the generalization performance of fuzzy rules [66]. This has led to the development of fuzzy neural networks (FNNs), a powerful hybrid modeling approach that integrates the learning abilities, parallelism and robustness of neural networks with the human-like linguistic and approximate reasoning concepts of fuzzy systems.

1.2. Review of state-of-the-art algorithms

Historically, pioneering work on FNNs was initiated by Jang [1,9] with the so-called adaptive network-based fuzzy inference system (ANFIS). The network structure of ANFIS, however, has to be predefined and cannot adapt dynamically in response to a changing (evolving) data stream [5]. Another prominent (static, non-dynamic) FNN approach is presented in [64], where a hybrid TS-neural network architecture is proposed that is capable of refining the weights of the structural components (fuzzy rules). To introduce more flexible fuzzy neural networks, several semi-online FNNs were devised, namely the dynamic fuzzy neural network (DFNN) [10,11], its successor the generalized dynamic fuzzy neural network (GDFNN) [13,14], and the self-organizing fuzzy neural network (SOFNN) [17,18]. All of these learning machines possess the ability to automate fuzzy rule evolution and pruning simultaneously. Nevertheless, they completely revisit past training signals, intensifying the computational burden over time, and are unable to handle vast amounts of data. Research on data-driven fuzzy modeling culminated in the prominent proposal of Angelov and Filev, the evolving Takagi–Sugeno (eTS) system [36]. eTS conveys a new concept of cluster potential in order to augment its rule base in a sample-wise manner. Another approach was proposed by Lughofer with the so-called flexible fuzzy inference system (FLEXFIS) [38], which poses an incremental evolving version of the vector quantization algorithm (eVQ) [61] and integrates some


advanced concepts for better robustness and reliability (resulting in FLEXFIS++ [67]). Furthermore, other approaches were presented in the literature [20,39], termed the fast and accurate self-organizing scheme for parsimonious fuzzy neural network (FAOS–PFNN) and the parsimonious and accurate fuzzy neural network (PAFNN), respectively. A foremost ingredient of these algorithms is the error reduction ratio (ERR) method [12] as an effective criterion for admitting a new rule. As opposed to eTS and FLEXFIS, a major drawback of these methods is the need to gather all presented training data during the execution of the teaching mechanism. Moreover, none of the approaches listed in this paragraph integrates a rule pruning mechanism, which would be beneficial to endow a compact and parsimonious rule base while retaining predictive accuracy. For rule base simplification, several sequential pruning mechanisms have been proposed, for instance [3,6,7,31], which exploit concepts from singular value decomposition (SVD) and from statistical methods. The approach in [6,7] differs from approaches that eliminate fuzzy rules in an offline or post-processing manner, such as the ERR method [12] in DFNN and GDFNN [13,14], in that it quantifies the importance of a fuzzy rule using only the newest training datum. The concepts in [6,7] are plugged in as one important component of the incremental learning machine termed the sequential adaptive fuzzy inference system (SAFIS) [21]. A main deficiency of SAFIS, however, is that it utilizes a singleton (Sugeno) instead of a TSK (Takagi–Sugeno–Kang) fuzzy system; it is well known that the TSK fuzzy system allows a better generalization capability than singleton models [41,58]. All the state-of-the-art methods mentioned above exploit uni-dimensional membership functions, which induce hyper-spherical regions. This representation stores only two parameters per fuzzy rule and evokes the same fuzzy region for all input attributes, which does not necessarily coincide with the actually occurring data distribution. A plausible remedy is to adopt the paradigm of multi-dimensional membership functions [54–56], whose axes are not necessarily parallel to the input variable axes. This method excels the uni-dimensional type in that it captures input variable interactions in the form of partial local correlations. A comprehensive survey of state-of-the-art evolving neuro-fuzzy approaches can be found in [52].

1.3. Our approach

In this article, a novel fuzzy neural network, the dynamic parsimonious fuzzy neural network (DPFNN), is devised, which features a synergy between high predictive accuracy and low structural complexity. As opposed to the aforementioned methods, our new approach grants a coherent methodology for incremental learning of fuzzy neural networks, which integrates (1) a rule evolution strategy that synchronously assures ε-completeness of the fuzzy rule base, thus establishing an important interpretability aspect, (2) a rule pruning mechanism based on rule significance, thus mitigating the complexity and online training time of the evolved models, and (3) ellipsoidal clusters in arbitrary position, thus being able to model local correlations between inputs and outputs.
More specifically, DPFNN is a four-layer network where each layer undertakes a specific operation to realize TSK fuzzy decision making. On the one hand, the premise parts of DPFNN constitute multi-dimensional membership functions achieving ellipsoidal regions in the input space. On the other hand, the consequents consist of first-order polynomials fusing the input variables and a few constants. In the first stage, DPFNN


commences its learning process from scratch with an empty fuzzy rule base. New rules are then added parsimoniously in accordance with two rule-growing criteria, namely the system error and ε-completeness criteria. The second learning stage involves the tuning of the input and output parameters of the fuzzy rules. For the antecedent parts, extended self-organizing map (ESOM) theory is applied to dynamically adjust the centers of the fuzzy rules so as to better suit the input data distribution. Meanwhile, time localized least squares (TLLS) is used to derive an optimal set of fuzzy consequent parameters. The TLLS method performs an LS adaptation relying on the data points contained in a sliding window [41], thus avoiding the reuse of all already seen data as required by the plain LS method. To circumvent an unbounded rule base, the last stage within the incremental DPFNN learning cycle removes inconsequential fuzzy rules. To this end, DPFNN adopts the rule pruning module of SAFIS, extending it to hyper-plane consequents and multivariate kernels. The merits of DPFNN have been experimentally validated on various artificial and real-world datasets and benchmarked against miscellaneous state-of-the-art methods. The results show that DPFNN outperforms state-of-the-art works in terms of predictive fidelity and structural burden, achieving a balance between predictive accuracy and structural cost. The remainder of this paper is organized as follows: Section 2 elaborates the network architecture of DPFNN. Section 3 explores the holistic working principle of DPFNN. Section 4 outlines experimental results on various benchmark problems including artificial and real-world datasets. Conclusions are drawn in the last section of this paper.

2. Architecture of DPFNN

Various methodologies for fuzzy identification models have been published extensively [30,43,44]. The TSK fuzzy system [30] has been adopted in a much broader scope than relational fuzzy models [44]. The TSK fuzzy system possesses the notable property that any real-occurring nonlinear relationship can be approximated to a certain degree of accuracy, thus yielding models with high predictive accuracy [25]. Moreover, a TSK fuzzy system is one step toward rapprochement between a conventional precise mathematical model and human-like decision making, as it characterizes linguistic and mathematical rules in the antecedent and consequent parts, respectively. Several techniques to boost the generalization of the TSK fuzzy system have been proposed, capitalizing on a maximization of uncertainty or a combination with rough sets [62–64]. Against this background, DPFNN is delineated as a four-layer network in which each layer performs a particular operation in tandem so as to enforce the TSK fuzzy mechanism. For the sake of flexibility, the antecedent part of DPFNN is composed of multi-dimensional membership functions triggering ellipsoidal rules, thus taking input variable interactions into account, since the rule axes are not necessarily parallel to the input variable axes. At any time t, the input and output signals are given by crisp variables xt and yt. In the sequel, the operative procedures of each layer are detailed. Fig. 1 visualizes the proposed network architecture.

Input layer: Each node in this layer represents an input feature of interest and feeds the input signals to the next layer. These nodes interface with the external environment and are activated when they capture external stimuli.

Fig. 1. Architecture of DPFNN.

Hidden/rule layer: Each node of this layer constitutes the premise part of a fuzzy rule and serves to transform a crisp value into a particular value in the fuzzy domain, where a Gaussian function realizes the input transformation/fuzzification. The use of the Gaussian law is motivated by its smooth approximation of a local data space and by the avoidance of undefined input space (the case where the normalized basis function in Eq. (2) becomes 0/0) [53]. The product of this layer is termed the rule firing strength, which can be expressed as

R_i = \exp\left(-\frac{1}{2}\sqrt{(X - C_i)\,\Sigma_i^{-1}\,(X - C_i)^T}\right)    (1)

where C_i \in R^{1\times u} is the center or template vector of the ith fuzzy rule and X \in R^{1\times u} is the input vector of interest. \Sigma_i \in R^{u\times u} is the data covariance (dispersion) matrix of the samples falling into R_i, whose main diagonal consists of \sigma_{ki}, k = 1, 2, \ldots, u and i = 1, 2, \ldots, r.

Normalization layer: Each node in this layer normalizes a rule firing strength into the range [0,1]. Accordingly, the number of nodes in this layer equals the number of nodes in the rule layer.

\varphi_i = \frac{R_i}{\sum_{i=1}^{r} R_i}    (2)

Output layer: The center of gravity method [9] is used to perform the back-transformation/defuzzification of the final system output into a crisp variable, which constitutes the action to the external environment. That is, the output induced by the fuzzy consequents is inferred as the weighted sum of the incoming signals:

y = \sum_{i=1}^{r} W_i \varphi_i = W\Phi = \frac{\sum_{i=1}^{r} R_i W_i}{\sum_{i=1}^{r} R_i}    (3)

In the TSK fuzzy system, W_i = k_{0i} + k_{1i}x_1 + \cdots + k_{ui}x_u, i = 1, 2, \ldots, r (i.e., a first-order polynomial), where W \in R^{1\times(u+1)r} and \Phi \in R^{(u+1)r\times 1}. For notational simplicity, DPFNN is pictorially shown as a multi-input–single-output (MISO) system; however, it can easily be extended to the multi-input–multi-output (MIMO) case.
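For concreteness, the following minimal sketch shows how Eqs. (1)–(3) compose into one inference pass. It is an illustration under our own naming conventions (not the authors' code), assuming NumPy arrays for the centers, inverse dispersion matrices and TSK weights.

```python
import numpy as np

def dpfnn_forward(x, centers, inv_covs, weights):
    """One DPFNN inference pass over r rules (Eqs. (1)-(3)).

    x        : (u,)       crisp input vector
    centers  : (r, u)     rule centers C_i
    inv_covs : (r, u, u)  inverse dispersion matrices Sigma_i^{-1}
    weights  : (r, u+1)   TSK consequents [k_0i, k_1i, ..., k_ui] per rule
    """
    r = centers.shape[0]
    R = np.empty(r)
    for i in range(r):
        d = x - centers[i]
        # Eq. (1): multivariate Gaussian firing strength of rule i
        R[i] = np.exp(-0.5 * np.sqrt(d @ inv_covs[i] @ d))
    phi = R / R.sum()                    # Eq. (2): normalization layer
    x_ext = np.concatenate(([1.0], x))   # extended input [1, x_1, ..., x_u]
    W = weights @ x_ext                  # first-order TSK outputs W_i
    return float(phi @ W)                # Eq. (3): weighted-sum defuzzification
```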

3. Learning algorithm for DPFNN

This section explores the overall learning procedure of the proposed DPFNN. The learning framework consists of four phases: rule updating based on ESOM, allocation of rule premise parameters, determination of rule consequent parameters, and pruning of inconsequential rules.


Algorithm 1 gives a holistic overview of the DPFNN learning scenario. This section also elaborates the computational and structural costs of the proposed algorithm.

ALGORITHM 1. DPFNN LEARNING PROCEDURE
1. if the rule base is empty
2.   create a new rule whose center and width are set to the data point and σ0, respectively, where σ0 is a predefined constant
3.   initialize the consequent weights of the new rule using LS
4. else
5.   undertake the learning procedures of Section 3.2
6.   prune inconsequential rules (Section 3.4)
7. end if
8. if a new sample comes in
9.   go back to step 5
10. end if

3.1. Rule updating based on the extended self-organizing map (ESOM)

The self-organizing map (SOM) method originates from Kohonen [26]. The versatility of the SOM method has prompted many researchers [22,23,27] to embed it in their neural and fuzzy systems. Typically, SOM theory is employed to update the focal points of the fuzzy rules so as to track the input distribution closely. However, the traditional SOM approach is deficient in that it involves only the Euclidean distance between the rule (node) and the current datum, excluding the zone of influence of the Gaussian membership functions. To this end, the extended self-organizing map method was proposed in [24], whereby the winning rule is elicited via both the distance and the membership function width, thus producing a more representative winner. For each training episode (x^n, t^n), the ESOM method seeks the winning rule C_v using Eq. (2) and adjusts all centers of the ellipsoidal units as follows:

C_i^n = C_i^{n-1} + \beta^n R_v^n h_i^n (X^n - C_i^{n-1})    (4)

where \beta^n is a learning rate, R_v^n denotes the firing strength of the winner, and h_i^n is a neighborhood function defined as

h_i^n = \exp\left(-\frac{1}{2}\sqrt{(C_v - C_i)\,\Sigma_i^{-1}\,(C_v - C_i)^T}\right)    (5)

\beta^n = 0.1\exp(-n)    (6)

The neighborhood function hi approximates a matching factor of two neighboring rules, whose values epitomize a distance between the winner and another rule. One may conceive that a priority of adaptation is granted to a rule which lies in the adjacent proximity to the winner bearing a larger value of neighborhood function. Whereas a less value of the neighborhood function subsumes that an adapted rule is inadequately similar to the winner, so that it is solely moved slightly from its original position. Conversely, the learning rate bn decays exponentially as more examples have been traversed by DPFNN assuming a more refined rule base has been crafted in the late of training process. 3.2. Criteria for rule generation by dynamic rule generation thresholds The constructions of fuzzy rules in DPFNN are orchestrated according to two criteria, which are applicable to be cursors of a
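A compact sketch of the ESOM update of Eqs. (4)–(6) follows. It is our own illustrative rendering (array names are assumptions), reusing the Eq. (1) firing strength to elect the winner.

```python
import numpy as np

def esom_update(x, centers, inv_covs, n):
    """One ESOM center-update step (Eqs. (4)-(6)); centers is modified in place.

    x        : (u,)      current training input X^n
    centers  : (r, u)    rule centers C_i
    inv_covs : (r, u, u) inverse dispersion matrices Sigma_i^{-1}
    n        : int       index of the current training episode
    """
    def firing(c, inv_s, z):
        d = z - c
        return np.exp(-0.5 * np.sqrt(d @ inv_s @ d))   # Eq. (1)

    R = np.array([firing(c, s, x) for c, s in zip(centers, inv_covs)])
    v = int(np.argmax(R))              # winning rule: highest firing strength
    beta = 0.1 * np.exp(-n)            # Eq. (6): exponentially decaying rate
    for i in range(len(centers)):
        # Eq. (5): neighborhood (matching factor) between winner and rule i
        h = firing(centers[i], inv_covs[i], centers[v])
        # Eq. (4): pull each center toward the datum, scaled by the winner
        centers[i] += beta * R[v] * h * (x - centers[i])
```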


3.2. Criteria for rule generation by dynamic rule generation thresholds

The construction of fuzzy rules in DPFNN is orchestrated according to two criteria, which serve as cursors of rule base expansion: the system error and ε-completeness. In the sequel, we elaborate these two criteria in detail.

3.2.1. System error
This criterion was originally put forward by Platt [2] and has been adopted widely in the literature [2,3,13–15,17–19,20,22,23]. Its aim is to assess the DPFNN performance in covering the given training datum, referring to the error incurred on the newly fed datum. More specifically, the system error is defined as

e^n = \lVert t^n - y^n \rVert    (7)

where t^n is the measured output and y^n the predicted output at time instance n. If the nth training episode satisfies e^n \geq k_e, i.e., the system error is considered sufficiently large, where k_e is a dynamic rule generation threshold, the existing rules are considered inadequate to cover the new data point. Accordingly, a new rule ought to be crafted and appended to the rule base in order to fill the gap in the rule base coverage. The threshold k_e is initially set to a large value and gradually decreases over time. The large threshold value at the beginning of learning is intended to construct a coarse rule base covering the most troublesome positions in the underlying training patterns. As k_e decays exponentially, a more precise fuzzy rule base is formed, which allows the crisp variables to be accurately transformed into specific values in the fuzzy domain:

k_e = \max\left(e_{\max}\exp\left(-\frac{n}{q}\right),\ e_{\min}\right)    (8)

where e_{\min} and e_{\max} are predefined constants. As foreshadowed earlier, DPFNN adopts the time localized least squares (TLLS), which solicits only the most recent q < n data points, whereas the remaining samples are discarded.

3.2.2. ε-Completeness
This criterion estimates the compatibility of the newest datum, i.e., whether it is a candidate supplementary focal point fostering the rule base; in other words, it gauges whether the novelty of the injected training datum should be admitted in order to seize the overall input distribution. To facilitate this criterion, the ε-completeness proposed by Lee [34] is exploited.

Definition 1 (ε-completeness of fuzzy rules [34]). For any input in the operating range, there exists at least one fuzzy rule such that the match degree (or firing strength) is no less than ε.

Consider the nth input–output pair (x^n, t^n); the firing strength of each rule is computed via Eq. (2), and the winning rule is the rule exhibiting the highest firing strength in the nth observation:

R_v^n = \max_{i=1,\ldots,r} R_i^n    (9)

The datum is deemed uncovered if R_v^n < \varepsilon, with

\varepsilon = \min\left(\varepsilon_{\min}\left(\frac{\varepsilon_{\max}}{\varepsilon_{\min}}\right)^{n-1},\ \varepsilon_{\max}\right)    (10)
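Both schedules can be written down directly. The snippet below is a sketch in which the functional form of Eq. (10) follows our reconstruction above, and the system-error bounds are example values (both thresholds are dataset-dependent design parameters).

```python
import numpy as np

E_MIN, E_MAX = 0.1354, 0.3679   # epsilon bounds motivated in Section 3.2.2
ERR_MAX, ERR_MIN = 0.5, 0.05    # example system-error bounds (design choices)

def error_threshold(n, q):
    """Eq. (8): k_e decays from e_max toward e_min as episodes n accumulate."""
    return max(ERR_MAX * np.exp(-n / q), ERR_MIN)

def completeness_threshold(n):
    """Eq. (10): epsilon grows from E_MIN toward E_MAX and is capped there."""
    return min(E_MIN * (E_MAX / E_MIN) ** (n - 1), E_MAX)
```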

Logically, the condition in Eq. (10) may occur especially in non-stationary data streams, when the model traverses a datum lying distant from the positions of the present focal points. In that case, DPFNN will either tailor a new rule or let the existing rules adjust their positions, thereby enriching the DPFNN rule base (see the four cases below).


On the contrary, if R_v^n < \varepsilon does not hold, the newest datum occupies a space within the reach of the recent rule base. At the beginning of the training process, \varepsilon \approx \varepsilon_{\min} (i.e., the winning rules are assumed to have a small firing strength of roughly \varepsilon_{\min}, as the learning mechanism has just commenced), thus stimulating the system to create a coarse covering of mainly the most troublesome data regions. Conversely, \varepsilon \approx \varepsilon_{\max} towards the end of the training process, which fosters the system to craft a more refined rule base that captures the data samples more accurately.

Remark. Suppose Gaussian membership functions are used and the newest datum x^n has been properly accommodated by existing fuzzy rules within the range [C_{ji} \pm 2\sigma_{ji}]. Such a training episode meets the ε-completeness criterion proposed by Lee [34] with the threshold \varepsilon = 0.1354 at the beginning of the rehearsal process. The threshold \varepsilon then increases exponentially as more examples have taught the DPFNN, demanding a higher similarity degree. At the end of the learning process, input data x^n lying in the range [C_{ji} \pm \sigma_{ji}] meet the ε-completeness criterion with \varepsilon = 0.3679; no individual input then attains a matching factor higher than \varepsilon. Hence, DPFNN sets \varepsilon_{\min} = 0.1354 and \varepsilon_{\max} = 0.3679. The dynamic rule generation threshold \varepsilon thus serves to establish a coarse rule base capturing the most troublesome regime first; later, \varepsilon attains a larger value, leading DPFNN to tailor a more fine-grained rule base.

In the case that the newest data sample complies with neither the system error nor the ε-completeness criterion, the existing rule base is no longer representative enough to cluster the datum, or the datum conveys a significant impact fostering the coverage of the rule base, so it is recruited as a complementary rule. If

R_v^n < \varepsilon \quad \text{and} \quad \lvert e^n \rvert \geq k_e    (11)

then the data sample cannot be managed by the existing rule base, and a new rule is evolved to build a new ellipsoid capable of precisely capturing the data samples located near its region:

C_{r+1} = X^n, \quad \Sigma_{r+1} = d_{\min}    (12)

where

d_{\min,i} = \arg\min \lvert x_k^n - c_{ki} \rvert, \quad k = 1,\ldots,u, \; i = 1,\ldots,r    (13)
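Initialization of a freshly grown rule per Eqs. (12) and (13) can be sketched as follows; reading the scalar d_min of Eq. (12) as an isotropic width (a diagonal dispersion matrix) is our interpretation, not spelled out in the text.

```python
import numpy as np

def init_new_rule(x, centers):
    """Create a new ellipsoidal rule at the datum x (Eqs. (12)-(13)).

    x       : (u,)   the uncovered training input X^n
    centers : (r, u) centers of the existing rules (assumed non-empty)
    """
    # Eq. (13): smallest per-dimension distance to any existing center
    d_min = min(np.min(np.abs(x - c)) for c in centers)
    new_center = x.copy()                    # Eq. (12): C_{r+1} = X^n
    # Eq. (12), interpreted: isotropic dispersion built from d_min
    new_cov = (d_min ** 2) * np.eye(len(x))
    return new_center, new_cov
```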

The consequent parameters of the new rule are crafted via the time localized least squares (TLLS) method explored in Section 3.3. By performing fuzzy rule discovery autonomously, DPFNN can manage its completeness in accordance with the training data fed, which is a judicious way to cope with possibly time-varying characteristics of the training data. In addition to the case R_v^n < \varepsilon and \lvert e^n \rvert \geq k_e, a few other situations emerge while the learning engine of DPFNN is switched on, detailed as follows:

Case 2. \lvert e^n \rvert \geq k_e, R_v^n \geq \varepsilon

The newly injected datum x^n can be clustered to the existing fuzzy rules; however, the predictive accuracy of DPFNN is deemed inadequate. Hence, only the consequent parameters of DPFNN are polished up using the TLLS method, which is explored in the next section.

Case 3. \lvert e^n \rvert < k_e, \max_j R_j^n \leq \varepsilon

This case implies that DPFNN already produces a desirable output accuracy, but the input datum x^n is out of reach of the existing rule base and in turn cannot be segmented. Hence, the existing rules are adjusted by means of the ESOM theory.

Case 4. \lvert e^n \rvert < k_e, \max_j R_j^n > \varepsilon

DPFNN already yields a convincing performance satisfying both criteria. Hence, no action is carried out, sustaining the current formation of the rule base.

3.3. Determination of consequent parameters

To derive optimal consequent parameters, DPFNN uses the least squares (LS) method, as this algorithm is simple and agile in delivering globally optimal solutions, which are generally appealing to track the footprints of most real-world problems. In contrast, the back-propagation (BP) method, another popular method in the neural network field, suffers from serious convergence problems: BP is usually slow in attaining global minima and can easily be trapped in local minima [59]. Yet the major bottleneck of the traditional LS method is the severe computational effort induced by the involvement of a high-dimensional matrix exploiting all gathered training data, some of which are probably outdated. The computational complexity increases with every sample increment, obviously retarding the training process, and it may also engender system memory overflow. Accordingly, the LS method should be endowed with a forgetting mechanism, so that it constrains the otherwise unaffordable computational expense as abundant data points embark on the model. To remedy this shortcoming, the moving window or time localized technique introduced by [41] is a sensible alternative. The foremost constituent of the time localized least squares (TLLS) method is that only the last q data points are conserved for a model update, while the other data samples are discarded. Hence, this concept is capable of a more flexible update, steeply reducing the computational burden and memory demand of the classical LS method. The TLLS method thus confers an instantaneous adaptation depending on the sliding window size q, which, in our rigorous experimentation, can be chosen relatively small compared with the size of the training data:

W = (\varphi^T \varphi)^{-1} \varphi^T T    (14)

where T = (t^1, t^2, \ldots, t^q) \in R^q is the localized target data and \varphi \in R^{q \times r(u+1)} is the regressor matrix whose elements are x_k^n \varphi_i^n; the expression (\varphi^T\varphi)^{-1}\varphi^T is the pseudoinverse of the matrix \varphi. The weight vector W \in R^{r(u+1)} contains the consequent parameters of DPFNN in the TSK form. Although only sub-optimal solutions of the full LS problem are sought over the window, we observe that these solutions are arguably sufficient to replicate the target vector T. Moreover, past data may no longer reflect, or may be obsolete in describing, the current data trend (consider the case where the footprint of the process smoothly or abruptly changes over time from one operating point to another, which is dubbed a drift situation).
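A minimal sketch of the TLLS update follows: a plain pseudoinverse solve over a deque-backed sliding window. The class and argument names are ours; in particular, building the regressor vector from the normalized firing strengths and the extended input is assumed from the definitions above.

```python
import numpy as np
from collections import deque

class TLLS:
    """Time-localized least squares (Section 3.3): batch LS over the last q samples."""

    def __init__(self, q):
        self.window = deque(maxlen=q)   # old samples fall out automatically

    def update(self, phi_ext, target):
        """phi_ext : (M,) regressor, e.g. built from phi_i and [1, x], M = r(u+1)
        target  : scalar t^n; returns the refreshed weight vector W."""
        self.window.append((np.asarray(phi_ext), float(target)))
        Phi = np.stack([p for p, _ in self.window])    # (<=q, M)
        T = np.array([t for _, t in self.window])      # (<=q,)
        # Eq. (14): W = (Phi^T Phi)^{-1} Phi^T T, computed via the pseudoinverse
        return np.linalg.pinv(Phi) @ T
```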


3.4. Pruning of inconsequential rules

In the area of neural networks and fuzzy neural networks, it is always desirable to achieve a coherent trade-off between predictive accuracy and model simplicity. Loosely speaking, a convoluted network structure is inherently prone to over-fitting, which is undesired in most circumstances. Apart from that, it inhibits the users from understanding the system being modeled and suppresses the interpretability of the explanatory module of the fuzzy neural network owing to deteriorating rule semantics; consider, for example, a neural network with a few hundred rules for a single task. To ensure a concise network structure, it is important for every model to be endowed with an ad-hoc rule base simplification mechanism. On the one hand, the rule pruning leverage points out obsolete fuzzy rules, which contributed little during their lifespan or are no longer informative in delineating the recently observed data trend. On the other hand, this mechanism is also efficacious in removing rules that may be outliers. In a noisy environment, a model may wrongly synthesize a new rule around an outlier; this mismatched rule recruitment can be corrected by rule pruning, since such rules or clusters are populated with few or even no data points and are thereby classed as inactive fuzzy rules. Hence, they can be evicted from the rule base without significant loss of the predictive accuracy of the model.

Numerous variants of rule pruning strategies have been put forward. On average, most of them are infeasible to embed here for several reasons. First, some variants impose expensive computational cost, since they solicit all past data and process them jointly (e.g., ERR [12], OBS [57]). Second, several types estimate the rule significance only at the present time (e.g., using density to determine rule significance [38,60]), ignoring the fact that a rule may become an indispensable constituent at a future time; this may jeopardize the stability of the rule base coverage. Third, some merely approximate the rule sensitivity with respect to the input parameters, regardless of the importance of the output parameters. In fact, the output parameters play a crucial role in diminishing the training errors, since they reflect the system behavior in the specific operating region of a cluster [60].

For these reasons, DPFNN inherits the rule pruning strategy of [6,9], which forecasts the rule significance based on approximations of statistical contributions as the number of training episodes approaches infinity. The peculiar characteristics of this method are that it gauges the fuzzy rule contributions based on the significance of the input and output parameters, is sequential in nature (invoking only the most recent training datum and expelling already learned data), and takes the future contributions of the fuzzy rules into account. Although well suited to DPFNN, its original version faces, to the best of our knowledge, a cul-de-sac when mounted directly in the DPFNN learning engine, as it was designed only for singleton fuzzy systems with uni-dimensional membership functions, whereas the DPFNN learning framework exploits multi-dimensional membership functions and TSK-type consequents. To this end, we extend the original version of [6,9] to the architecture used in DPFNN, finally arriving at Eq. (15) (the proof is left to the reader):

E_{inf}(i) = \lvert \delta_i \rvert \left(\frac{\prod_{k=1}^{u}\sigma_{ki}}{\sum_{j=1}^{r}\prod_{k=1}^{u}\sigma_{kj}}\right)^{u} = \lvert \delta_i \rvert E_i    (15)

E_{inf}(i) \leq k_{err}    (16)

where \delta_i = w_{1i} + w_{2i}x_1 + \cdots + w_{u+1,i}x_u outlines the output contribution of the ith fuzzy rule, whereas E_i denotes its input contribution. The threshold k_{err} is set according to prior knowledge, or should be selected around 10% of e_{\min}. The threshold k_{err} plays a vital role in triggering rule base simplification: a larger value of k_{err} may induce a worse modeling output, but it retains fewer rules, and vice versa. In line with the foregoing exposition, the threshold k_{err} is thus a proxy regulating a plausible trade-off between predictive accuracy and model simplicity. If a rule matches the condition in Eq. (16), it is categorized as an obsolete, inactive or needless fuzzy rule; accordingly, it can be evicted from the rule base, decreasing the structural burden of DPFNN (see the sketch below).
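The test of Eqs. (15) and (16) reduces to a few vectorized lines; the sketch below follows our reconstruction of Eq. (15) and uses illustrative argument names.

```python
import numpy as np

def rules_to_prune(deltas, sigmas, k_err):
    """Flag inconsequential rules per Eqs. (15)-(16).

    deltas : (r,)   output contributions delta_i of the rules
    sigmas : (r, u) per-dimension widths sigma_ki of each rule
    k_err  : float  pruning threshold (around 10% of e_min)
    """
    u = sigmas.shape[1]
    vol = np.prod(sigmas, axis=1)        # prod_k sigma_ki for each rule
    E = (vol / vol.sum()) ** u           # input contribution E_i
    E_inf = np.abs(deltas) * E           # Eq. (15): statistical significance
    return E_inf <= k_err                # Eq. (16): True marks a rule for eviction
```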


3.5. Analysis of computational expense

The mechanisms for extracting fuzzy rules, adapting the premise parameters, adjusting the consequent parameters and pruning inactive fuzzy rules determine the computational cost of the algorithm. More specifically, the number of data points, input features and rules, as well as the type of algorithm (iterative or recursive, batch learning or single-pass learning) used in the learning process, affect the overall computational effort. The computational overhead of the algorithm is the resultant cost of the learning pillars (fuzzy rule recruitment, rule pruning strategy, adaptations of the input and output parameters), as each learning constituent endures a standalone computational burden depending on the number of data, rules and input attributes involved; the resultant cost is their summation. The most influential contribution to the computational burden of DPFNN arises in the adjustment of the consequent parameters, since DPFNN requires a snapshot of the last q data samples. The computational complexity arising in this learning step is O(UqM), where M = (u+1) \times r, u and r are the numbers of input features and rules, respectively, and U is the number of times the rule base expansion is executed. Here O(\cdot) is the big-O notation ubiquitous in analyses of computational complexity. This expense can be considered manageable, as it does not grow with the number of episodes or data samples n; the number of episodes n is usually much larger than q (n \gg q), as justified by our empirical study. The computational expense of the other learning modules, such as fuzzy rule extraction, pruning of inconsequential rules and adjustment of the input parameters, is much lower than that of the adaptation of the output parameters, since these modules are executed on-the-fly, forgoing past training stimuli. The adaptation of the input parameters, fuzzy rule growing and pruning incur computational expense in the order of O(r), O(2) and O(r), respectively. Accordingly, DPFNN bears the total computational load

O(UqM + 2r + 2)    (17)

In comparison with approaches like DFNN, GDFNN, SOFNN and FAOS–PFNN, the computational cost of our algorithm is more economical, as the aforesaid algorithms completely revisit the past data points in order to learn the newest datum. The computational complexities of DFNN and GDFNN are tantamount, as they comprise the same learning components; the major contributors in the DFNN and GDFNN learning algorithms are the LS method crafting the output parameters and the ERR method overseeing superfluous fuzzy rules, both of which are quite demanding. These two methods lead to computational complexities in the order of O(2n^2M^2). Apart from that, the computational cost incurred in quantifying the potential of the training data is equivalent to that of DPFNN, which is O(2). Hence, we deduce that the resultant computational costs of DFNN and GDFNN


Table 1. Computational load, memory requirement and structural cost.

Algorithm  | Computational load | Memory requirement | Structural cost
DPFNN      | O(UqM^2 + 2r + 2)  | O(q + M + 2u·r)    | O((2u·r) + (u+1)·r)
DFNN       | O(2M^2n^2 + 2)     | O(n + M + u·r + r) | O((u+1)·2r)
GDFNN      | O(2M^2n^2 + 2)     | O(n + M + u·r + r) | O((2u·r) + (u+1)·r)
SOFNN      | O(2M^2nU + 2)      | O(n + M + u·r + r) | O((2u·r) + (u+1)·r)
FAOS–PFNN  | O(r^2n^2 + 2 + r)  | O(n + u·r + 2r)    | O((u+1)·r + r)

are O(2n^2M^2 + 2). Conversely, the computational expense of SOFNN is more affordable than the former two, as it employs the recursive least squares (RLS) method to polish the output parameters and the optimal brain surgeon (OBS) method as rule pruning ingredient, heading to O(2M^2rU). As opposed to DPFNN, U in SOFNN is the number of times a fuzzy rule is recruited or evicted. Indeed, the term U renders SOFNN more expensive than DPFNN, since every time the rule base amends its size (expansion or simplification), the adaptation of the consequent parameters exploiting all collected data points has to be enforced. Meanwhile, the rule growing procedure of SOFNN is the same as in DFNN and GDFNN, conveying a total computational burden of O(2M^2nU + 2). In contrast, FAOS–PFNN is not endowed with a rule pruning adornment; instead, this algorithm refurbishes the traditional ERR method as another cursor of rule base augmentation. As with DFNN and GDFNN, the ERR in FAOS–PFNN invokes severe computational effort, O(n^2r^2), as it collects the already learned training signals. In this view, the resultant computational complexity is O(n^2r^2 + 2 + r), as FAOS–PFNN utilizes two other criteria in addition to ERR and the EKF method for adapting the output parameters.

In addition to the computational complexity, the memory requirement of the model plays a noteworthy role in the viability of the algorithms. The memory requirement of DPFNN is in the order of O(q + M + 2u·r). Arguably, the memory demand of DPFNN is lighter than that of DFNN, GDFNN and FAOS–PFNN, as the use of past training data over time is needless. In essence, the memory requirement of DFNN is in the order of O(n + M + u·r + r), and GDFNN and SOFNN land on the memory requirement O(n + M + u·r + r); the distinction between DFNN and GDFNN stems from DFNN being wrapped by uni-dimensional membership functions generating the same fuzzy region per input attribute. Meanwhile, the memory requirement of FAOS–PFNN is in the order of O(n + u·r + 2r), which is not tantamount to DPFNN, as FAOS–PFNN benefits from singleton-type consequents.

3.6. Analysis of structural complexity

The structural cost of an FNN emanates from the total number of network parameters (input and output parameters) stored in the memory; the number of rules and the network specification are decisive for the level of complexity of the model. In a nutshell, DPFNN enrobes multi-dimensional membership functions in the premise part and first-order polynomials in the output parameters. Therefore, we deduce the structural load of DPFNN to be O((2u·r) + (u+1)·r), which is comparable with the structural cost of FLEXFIS, GDFNN and SOFNN. Although DPFNN consumes a lighter memory requirement and computational load than GDFNN and SOFNN, it labors under an equivalent structural complexity, since it employs a commensurate network type drawing comparable network parameters. It is worth stressing that we do not reckon the number of data points used during the training process when gauging the structural complexity of the model.

Conversely, the rule base complexity of eTS, simp_eTS and DFNN is dissimilar from those foreshadowed, as all of these algorithms exploit uni-dimensional membership functions; the structural cost of these algorithms is O((u+1)·2r). In contrast, FAOS–PFNN and SAFIS harness uni-dimensional membership functions and constant-type consequents, where the structural burden is O((u+1)·r + r). Table 1 consolidates the computational complexities, memory requirements and structural burdens of the aforementioned algorithms; the sketch below restates the corresponding parameter counts.
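The structural costs in this section are simple closed forms; the helper below restates them for the network types compared in Table 1 (a convenience sketch, not part of the original algorithms).

```python
def param_count(u, r, network="dpfnn"):
    """Number of stored network parameters for u inputs and r rules (Section 3.6)."""
    if network == "dpfnn":          # multi-dimensional premises + TSK consequents
        return 2 * u * r + (u + 1) * r
    if network == "uni_tsk":        # e.g. eTS, simp_eTS, DFNN
        return (u + 1) * 2 * r
    if network == "uni_singleton":  # e.g. FAOS-PFNN, SAFIS
        return (u + 1) * r + r
    raise ValueError(network)
```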

4. Simulation studies

The viability of the DPFNN prototype as a novel contribution to the field of data-driven modeling is experimentally validated on miscellaneous benchmark problems employing synthetic and real-world datasets. The problems consolidated herein not only feature nonlinear and uncertain properties, but also contain non-stationary ingredients, which are generally challenging for classical models. The synthetic problem encompasses time series prediction of the Mackey–Glass function, whereas the real-world problems comprise tool wear forecasting in a ball-nose end-milling process and the Auto MPG problem. DPFNN is also benchmarked against state-of-the-art algorithms in order to assess its efficacy against its counterparts. Our experiments show that DPFNN not only emulates the versatility of its counterparts, but also surpasses the already published works.

4.1. Chaotic Mackey–Glass (MG) time series prediction

This study case explores a classical benchmark problem introduced by [35], originally proposed as a model of the production of white blood cells. This problem is employed in many works [10–12,14–16,20,21,30] owing to its chaotic time series, whose nonlinear oscillations are universally endorsed as a representation of various physiological processes. The series is governed by the following mathematical model:

\frac{dx(t)}{dt} = -a\,x(t) + \frac{b\,x(t-\tau)}{1 + x^{10}(t-\tau)}    (18)

where a = 0.1, b = 0.2 and \tau = 17. The task is to forecast future values x(t+P) from past values. The parameters of the underlying function are assigned as P = \Delta t = 85 and n = 4, so the nonlinear dependence of this time series problem is regularized in the following form:

x(t+85) = f[x(t), x(t-6), x(t-12), x(t-18)]    (19)

A total of 3000 training data are drawn from the time interval t = 201 to t = 3200 and fed to the DPFNN algorithm. After the evolution of the resultant rule base has settled, the rule base is tested on 500 unseen data points from t = 5001 to t = 5500, where all data patterns are produced by a fourth-order Runge–Kutta approximation of Eq. (18).
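The benchmark series can be regenerated with a fourth-order Runge–Kutta integration of Eq. (18); the sketch below holds the delayed term constant within each step, a common simplification for this benchmark, and its parameter defaults mirror the setup above (the initial value x0 is an assumption).

```python
import numpy as np

def mackey_glass(n_steps, tau=17, a=0.1, b=0.2, dt=1.0, x0=1.2):
    """Generate the Mackey-Glass series of Eq. (18) via fourth-order Runge-Kutta."""
    hist = int(tau / dt)                  # length of the delay buffer
    x = np.full(n_steps + hist, x0)       # constant initial history

    def f(xt, xtau):                      # right-hand side of Eq. (18)
        return -a * xt + b * xtau / (1.0 + xtau ** 10)

    for t in range(hist, n_steps + hist - 1):
        xtau = x[t - hist]                # delayed state, frozen over the step
        k1 = f(x[t], xtau)
        k2 = f(x[t] + 0.5 * dt * k1, xtau)
        k3 = f(x[t] + 0.5 * dt * k2, xtau)
        k4 = f(x[t] + dt * k3, xtau)
        x[t + 1] = x[t] + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x[hist:]
```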


Table 2. Mackey–Glass problem, 3000 training samples.

Algorithm   | Rules | Testing NDEI | Time (s) | Network parameters
DENFIS      | 58    | 0.278        | 0.233    | 754
SAFIS       | 21    | 0.380        | a        | 126
FLEXFIS     | 89    | 0.157        | 0.221    | 1157
eTS         | 99    | 0.356        | 0.255    | 1287
Simple eTS  | 21    | 0.378        | 0.264    | 210
eMG         | 58    | 0.139        | 4.1      | 754
BARTFIS     | 24    | 0.301        | 0.244    | 312
FAOS–PFNN   | 44    | 0.1685       | 0.876    | 264
DFNN        | 20    | 0.1345       | 0.980    | 200
GDFNN       | 18    | 0.1445       | 0.566    | 234
DPFNN       | 11    | 0.0959       | 0.2713   | 143

a The result is not listed in the original paper.

Table 3. The Auto MPG task.

Algorithm  | RMSE test (std) | RMSE test (mean) | Rules | Time (s) | Network parameters
GAP RBF    | 0.1144          | 0.1404           | 3.12  | a        | 31.96
eTS        | 0.011           | 0.088            | 3.8   | 4.01     | 86.4
Simp_eTS   | 0.024           | 0.09             | 7.1   | 3.8      | 156.2
BARTFIS    | 0.032           | 0.085            | 6.6   | 3.54     | 116.96
DENFIS     | 0.031           | 0.073            | 5.1   | a        | 112.2
FLEXFIS    | 0.013           | 0.056            | 4.6   | 3.66     | 101.2
DFNN       | 0.0879          | 0.0890           | 3.5   | 129.89   | 56
GDFNN      | 0.0676          | 0.0789           | 4.0   | 200.56   | 88
FAOS–PFNN  | 0.0321          | 0.0775           | 2.9   | 190.67   | 26.1
DPFNN      | 0.0579          | 0.0500           | 2.66  | 4.28     | 58.52

a The result is not listed in the original paper.

To properly accomplish this study case, the design parameters of DPFNN are assigned as follows: e_max = 0.5, e_min = 0.05, \sigma_0 = a_0 = 0.4, k_err = 0.005, k = 1.1, k_w = 2, q = 30. The design parameters can also be elicited via optimization techniques like the grid-search method [40]. Moreover, DPFNN is benchmarked against RAN [2], SAFIS [21], FAOS–PFNN [20], DENFIS [16], eTS [36], Simple_eTS [37], FLEXFIS [38], eMG [65] and BARTFIS [66] on the withheld assessment set of 500 samples in order to campaign the superiority of DPFNN. Table 2 summarizes the consolidated results of all benchmarked systems in generalizing the validation data. DPFNN outperforms the other models: it not only showcases higher modeling accuracy, but also lands on a more economical rule base than the other approaches, breeding the smallest number of rules. In addition, the training speed of DPFNN surpasses that of FAOS–PFNN; this is mainly because FAOS–PFNN subscribes to all past data, incurring a more expensive computational cost than DPFNN. In line with our computational complexity analysis in the previous section, DPFNN undergoes faster training episodes than DFNN and GDFNN. Both DFNN and GDFNN are even slower than FAOS–PFNN, as they wield the original LS and ERR methods, hinging on all training samples in every training cycle. Although eTS, simp_eTS, FLEXFIS, DENFIS and BARTFIS experience more rapid training processes, eTS, simp_eTS, DENFIS and FLEXFIS are sketchy in not including a rule base simplification technology; moreover, these algorithms generate less accurate predictions than DPFNN. Unfortunately, comparative training-speed results for some other methods are unavailable by virtue of the different computing environments exploited in their original publications (although we provided analyses of the computational burden of DPFNN and the other models in the previous section); hence, their results are not comparable with DPFNN in our article. In conjunction with the number of network parameters conserved in the memory, DPFNN suppresses the memory demand to the lowest level.

4.2. Fuel consumption prediction of automobiles

The Auto MPG problem aims to foresee the fuel consumption of automobiles (in miles per gallon) based on 392 training patterns. The goal is to promote the versatility of DPFNN in addressing a real-world engineering problem. Seven input variables are transmitted to DPFNN (displacement, horsepower, weight, acceleration, cylinders, model year and origin). A total of 320 training data and 72 testing data are permuted from

the Auto MPG database. To favor representative experimental results, the simulation is repeated 50 times, owing to the shuffled nature of the training and testing data points (every trial may showcase different results), and the eventual result is the average over the 50 trials. Furthermore, DPFNN is contrasted against the RAN [2], MRAN [2], GAP RBF [6], eTS [36], simp_eTS [37], BARTFIS [66], FLEXFIS [38] and DENFIS [16] algorithms in order to gauge the robustness of DPFNN against its counterparts. The average results of the 50 trials on the withheld evaluation set of 72 data points are tabulated in Table 3. Table 3 shows that the other models are inferior to DPFNN in terms of predictive accuracy. On the one hand, DPFNN is able to eradicate a convoluted rule base, conferring the most compact and parsimonious structure with the smallest number of rules; on the other hand, its resultant fuzzy rules also deliver the best predictive fidelity. Albeit GAP–RBF, FAOS–PFNN and DFNN gain smaller numbers of network parameters than DPFNN, the modeling accuracy of these approaches is much worse than that of DPFNN. This is occasioned by the fact that GAP–RBF constitutes a plain neural network, whereas DFNN and FAOS–PFNN convey uni-dimensional membership functions bestowing the same fuzzy regions per input attribute; this phenomenon is also supported by their natures, which exclude rule/hidden node simplification. In contrast, eTS, simp_eTS, DENFIS, FLEXFIS and BARTFIS hold a milder computational burden than DPFNN; however, they are inferior to DPFNN in terms of structural burden and predictive quality.

4.3. Tool wear prediction of a ball-nose end-milling process

Tool condition monitoring and prediction play a vital role in high speed machining processes [33]. Undetected or premature tool failures often lead to costly scrap or rework arising from damaged surface finishing and loss of dimensional accuracy, or to possible damage to the work piece and machine [50,51]. More specifically, in the high precision machining industry, the development of a self-adjusting, integrated system capable of monitoring performance degradation and work piece integrity under various operational conditions with minimum operator supervision is desirable [46]. However, producing accurate tool wear predictions is quite challenging by virtue of the nonlinear and uncertain nature of machining processes [34]. New theories in machine learning have shed some light on these issues. The principal constituents required to concurrently address such issues include the use of fuzzy logic reasoning to handle imprecise data and elevate the level of human interpretability [48], and the learning ability of neural networks [49] in associating the input and target data.


Table 4. Tool-wear prediction for the ball-nose end-milling process.

Algorithm  | APE test (std) (%) | APE test (mean) (%) | Rules | Time (s) | Network parameters
GDFNN      | 0.179              | 10.75               | 4     | 3.59     | 52
DFNN       | 0.098              | 4.43                | 8.9   | 7.83     | 89
FAOS–PFNN  | 0.084              | 20.81               | 8.2   | 0.61     | 49.2
BARTFIS    | 0.008              | 5.01                | 9.2   | 0.45     | 119.6
eTS        | 0.012              | 5.1                 | 10    | 0.41     | 130
Simp_eTS   | 0.02               | 6.67                | 12.5  | 0.4      | 162.5
FLEXFIS    | 0.009              | 5.05                | 8.5   | 0.44     | 110.5
DENFIS     | 0.03               | 6.23                | 9.5   | 0.55     | 123.5
DPFNN      | 0.079              | 4.77                | 8.4   | 0.60     | 87.2

A CNC milling machine (Röders Tech RFM760) with a spindle rate of up to 42,000 RPM is selected for the experiments. At the beginning of the empirical study, the raw signal is gathered by a seven-channel DAQ: the first three channels provide the force signals in the three cutting axes (X, Y, Z), measured by a dynamometer; the next three channels constitute vibration signals captured by an accelerometer; and the last one yields the acoustic emission (AE) signal received by an AE sensor. In the machining process, many parameters and variables affect the work piece integrity as well as the tool performance over the production regime. Consequently, researchers typically install a suite of accelerometers, dynamometers and acoustic sensors at critical locations to allow in-situ signals to be captured, processed, analyzed and transformed into useful reference models for condition and performance monitoring [45,47,48]. A total of sixteen features of the force signal were extracted; as pointed out by [29,32], it is recommended to merely encompass the four features that correlate most with the tool wear: maximum absolute force, amplitude of force, average force, and amplitude ratio. The datasets of two different cutter profiles are collected, normalized and permuted. The number of data points is 630 pairs, and the 10-fold cross-validation technique introduced by [40] is exploited in order to assess the performance of DPFNN under the shuffled nature of the injected training patterns. In cross-validation (CV), the data set is first shuffled and partitioned into ten mutually exclusive bins, labeled CV1–CV10. In the first trial, CV1 is used as the testing set whereas CV2–CV10 constitute the training set; in the second trial, CV2 is the testing set while CV1 and CV3–CV10 are the training set, and so on. The average results across the 10-fold cross-validation are tabulated in Table 4. To yield a robust tool wear predictor exemplifying a synergy between economical model complexity and high predictive accuracy, the predefined parameters of DPFNN are allocated as follows: e_max = 1, e_min = 0.01, \sigma_0 = a_0 = 2, k_err = 0.01, k = 1.1, k_w = 1.12 and q = 30. In this experiment, DPFNN is benchmarked against DFNN [10,11], FAOS–PFNN [20], GDFNN [13], BARTFIS [66], eTS [36], simp_eTS [37], FLEXFIS [38] and DENFIS [16], respectively. In this empirical study, DPFNN delivers a highly competitive training speed. Although FAOS–PFNN yields a comparable training speed and the lowest memory requirement, it is featureless in that it is unfurnished with a rule pruning adornment and engages uni-dimensional membership functions and constant output parameters.

It is well known that the multi-dimensional membership function confers the more appealing property that its axes are not necessarily parallel to the input variable axes, and that the TSK fuzzy system confers a higher degree of freedom than the singleton fuzzy system. By extension, the modeling accuracy of FAOS–PFNN is worse than that of DPFNN. Conversely, GDFNN confers the simplest rule base, proliferating the smallest number of rules on this occasion; however, its predictive accuracy and training speed are obviously among the worst results. Despite the best modeling accuracy being produced by DFNN, it suffers from a high computational burden owing to its collection of all embarked data, which is tangibly one of the problematic natures of DFNN; this is corroborated by the training speed of DFNN, which yields the slowest execution time. Moreover, DFNN wraps a complex model representation in terms of the numbers of generated rules and network parameters conserved in the repository, which is usually unacceptable in the data-driven modeling field. In comparison with BARTFIS, eTS, simp_eTS, DENFIS and FLEXFIS, our algorithm excels these methods in terms of structural complexity and predictive quality; yet DPFNN is inferior to those algorithms in terms of training speed.

5. Conclusion

A novel fuzzy neural network, namely the dynamic parsimonious fuzzy neural network (DPFNN), has been elaborated in this paper as a promising candidate among data-driven modeling tools. The viability and efficacy of DPFNN are exemplified by encouraging experimental results on various real-world and synthetic datasets, especially from the viewpoints of structural complexity and predictive accuracy. As a downside, DPFNN relies on a sliding window-based least squares (SWLS), or time localized least squares (TLLS), method to derive its consequent parameters, which necessitates buffering a number of training patterns in the sliding window. In our future work, we will focus on enhancing DPFNN so that it can run without storing data points in a moving window, thereby expediting the training process. All training procedures would then be executed without a priori domain knowledge of subsequent data blocks and strictly without looking back at already seen training stimuli.
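A minimal sketch of the sliding-window least squares idea referred to above is given below, assuming the window length q = 30 used in the experiments. The variable names and the regularized normal-equation solve are assumptions for illustration, not the paper's exact TLLS formulation.

```python
import numpy as np
from collections import deque

def tlls_update(window, x_new, y_new, reg=1e-6):
    """One step of a sliding-window least squares estimate.

    window : deque of (regressor, target) pairs with maxlen = q, so the
             oldest sample drops out automatically when a new one arrives
    x_new  : regressor vector of the newest sample (e.g., rule firing
             strengths extended by the inputs in a TSK consequent)
    y_new  : target value of the newest sample
    Returns the consequent parameter vector fitted on the last q samples.
    """
    window.append((x_new, y_new))
    X = np.vstack([x for x, _ in window])   # q x p regressor matrix
    y = np.array([t for _, t in window])    # q targets
    p = X.shape[1]
    # Regularized normal equations solved over the window only, so the
    # cost per update stays bounded regardless of the stream length.
    return np.linalg.solve(X.T @ X + reg * np.eye(p), X.T @ y)

# usage: window = deque(maxlen=30); theta = tlls_update(window, x_t, y_t)
```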

Acknowledgements

This research is supported by the A*STAR Science and Engineering Research Council Singapore-Poland Grant. The authors would like to thank the Singapore Institute of Manufacturing Technology for kindly providing the tool wear data. The fifth author acknowledges the Austrian fund for promoting scientific research (FWF, contract number I328-N23, acronym IREFS).

References

[1] L.X. Wang, J.M. Mendel, Fuzzy basis functions, universal approximation, and orthogonal least-squares learning, IEEE Trans. Neural Networks 3 (1992) 807–814.
[2] J. Platt, A resource allocating network for function interpolation, Neural Comput. 3 (1991) 213–225.
[3] M. Salmerón, J. Ortega, C.G. Puntonet, A. Prieto, Improved RAN sequential prediction using orthogonal techniques, Neurocomputing 41 (2001) 153–172.
[4] L. Yingwei, N. Sundararajan, P. Saratchandran, Performance evaluation of a sequential minimal radial basis function (RBF) neural network learning algorithm, IEEE Trans. Neural Networks 9 (1998) 308–318.
[5] H.K. Saman, Self evolving neural network for rule base data processing, IEEE Trans. Signal Process. 45 (1997) 2766–2773.


[6] G.-B. Huang, P. Saratchandran, N. Sundararajan, An efficient sequential learning algorithm for growing and pruning RBF (GAP-RBF) networks, IEEE Trans. Syst. Man Cybern. Part B Cybern. 34 (2004) 2284–2292.
[7] G.-B. Huang, P. Saratchandran, N. Sundararajan, A generalized growing and pruning RBF (GGAP-RBF) neural network for function approximation, IEEE Trans. Neural Networks 16 (2005) 57–67.
[8] L.A. Zadeh, Soft computing and fuzzy logic, IEEE Softw. 11 (1994) 48–56.
[9] J.-S.R. Jang, ANFIS: adaptive-network-based fuzzy inference system, IEEE Trans. Syst. Man Cybern. 23 (1993) 665–685.
[10] S. Wu, M.J. Er, Dynamic fuzzy neural networks—a novel approach to function approximation, IEEE Trans. Syst. Man Cybern. Part B Cybern. 30 (2000) 358–364.
[11] M.J. Er, S. Wu, A fast learning algorithm for parsimonious fuzzy neural networks, Fuzzy Sets Syst. 126 (2002) 337–351.
[12] S. Chen, C.F.N. Cowan, P.M. Grant, Orthogonal least squares learning algorithm for radial basis function networks, IEEE Trans. Neural Networks 2 (1991) 302–309.
[13] S.-Q. Wu, M.J. Er, Y. Gao, A fast approach for automatic generation of fuzzy rules by generalized dynamic fuzzy neural networks, IEEE Trans. Fuzzy Syst. 9 (2001) 578–594.
[14] Y. Gao, M.J. Er, NARMAX time series model prediction: feedforward and recurrent fuzzy neural network approaches, Fuzzy Sets Syst. 150 (2005) 331–350.
[15] C.-F. Juang, C.-T. Lin, An on-line self-constructing neural fuzzy inference network and its applications, IEEE Trans. Fuzzy Syst. 6 (1998) 12–32.
[16] N. Kasabov, Q. Song, DENFIS: dynamic evolving neural-fuzzy inference system and its application for time series prediction, IEEE Trans. Fuzzy Syst. 10 (2002) 144–154.
[17] G. Leng, T.M. McGinnity, G. Prasad, An approach for on-line extraction of fuzzy rules using a self-organising fuzzy neural network, Fuzzy Sets Syst. 150 (2005) 211–243.
[18] G. Leng, G. Prasad, T.M. McGinnity, An on-line algorithm for creating self-organizing fuzzy neural networks, Neural Networks 17 (2004) 1477–1493.
[19] G. Leng, T.M. McGinnity, G. Prasad, Design for self-organizing fuzzy neural networks based on genetic algorithms, IEEE Trans. Fuzzy Syst. 14 (2006) 755–766.
[20] N. Wang, M.J. Er, X. Meng, A fast and accurate online self-organizing scheme for parsimonious fuzzy neural networks, Neurocomputing 72 (2009) 3818–3829.
[21] H.-J. Rong, N. Sundararajan, G.-B. Huang, P. Saratchandran, Sequential adaptive fuzzy inference system (SAFIS) for nonlinear system identification and time series prediction, Fuzzy Sets Syst. 157 (2006) 1260–1275.
[22] Y. Zhou, M.J. Er, A novel approach for generation of fuzzy neural networks, Int. J. Fuzzy Syst. 7 (2007) 8–13.
[23] M.J. Er, Y. Zhou, Automatic generation of fuzzy inference systems via unsupervised learning, Neural Networks 21 (2008) 1556–1566.
[24] M.J. Er, S. Wu, Y. Gao, Dynamic Fuzzy Neural Networks: Architectures, Algorithms and Applications, McGraw-Hill, NY, USA, 2003.
[25] L. Wang, Fuzzy systems are universal approximators, in: Proc. IEEE International Conference on Fuzzy Systems, 1992, pp. 1163–1169.
[26] T. Kohonen, Self-organized formation of topologically correct feature maps, Biol. Cybern. 43 (1982) 59–69.
[27] M.J. Er, et al., Adaptive noise cancellation using enhanced dynamic fuzzy neural networks, IEEE Trans. Fuzzy Syst. 13 (2005) 331–342.
[28] W.L. Tung, C. Quek, eFSM—a novel online neural-fuzzy semantic memory model, IEEE Trans. Neural Networks 21 (2010) 136–157.
[29] S. Huang, X. Li, O.P. Gan, Tool wear estimation using SVM in ball nose end milling, in: IEEE Annual Conference of the Prognostics and Health Management Society, 2010.
[30] M. Sugeno, G.T. Kang, Structure identification of fuzzy model, Fuzzy Sets Syst. 28 (1988) 15–33.
[31] C.S. Leung, K.W. Wong, P.F. Sum, L.W. Chan, A pruning method for the recursive least squared algorithm, Neural Networks 14 (2001) 147–174.
[32] J.H. Zhou, C.K. Pang, F.L. Lewis, Z.W. Zhong, Intelligent diagnosis and prognosis of tool wear using dominant feature identification, IEEE Trans. Ind. Inf. 5 (2009) 454–464.
[33] A.G. Rehorn, J. Jiang, P.E. Orban, State-of-the-art methods and results in tool condition monitoring: a review, Int. J. Adv. Manuf. Technol. 26 (2005) 693–710.
[34] C.C. Lee, Fuzzy logic in control systems: fuzzy logic controller, IEEE Trans. Syst. Man Cybern. 20 (1990) 404–436.
[35] M.C. Mackey, L. Glass, Oscillation and chaos in physiological control systems, Science 197 (1977) 287–289.
[36] P. Angelov, D. Filev, An approach to online identification of Takagi–Sugeno fuzzy models, IEEE Trans. Syst. Man Cybern. Part B Cybern. 34 (2004) 484–498.
[37] P. Angelov, D. Filev, Simpl_eTS: a simplified method for learning evolving Takagi–Sugeno fuzzy models, in: IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2005, pp. 1068–1073.
[38] E. Lughofer, FLEXFIS: a robust incremental learning approach for evolving Takagi–Sugeno fuzzy models, IEEE Trans. Fuzzy Syst. 16 (2008) 1393–1410.
[39] W. Ning, M.J. Er, M. Xian-Yao, X. Li, An online self-organizing scheme for parsimonious and accurate fuzzy neural networks, Int. J. Neural Syst. 20 (2010) 389–403.
[40] M. Stone, Cross-validatory choice and assessment of statistical predictions, J. R. Stat. Soc. Ser. B 36 (1974) 111–147.


[41] P. Angelov, Evolving Rule-Based Models: A Tool for Design of Flexible Adaptive Systems, Studies in Fuzziness and Soft Computing, vol. 92, Springer Physica-Verlag, Heidelberg, Germany, 2002.
[42] J.-S.R. Jang, C.-T. Sun, Functional equivalence between radial basis function networks and fuzzy inference systems, IEEE Trans. Neural Networks 4 (1993) 156–159.
[43] W. Pedrycz, An identification algorithm in fuzzy relational systems, Fuzzy Sets Syst. 13 (1984) 153–167.
[44] T. Takagi, M. Sugeno, Fuzzy identification of systems and its application to modeling and control, IEEE Trans. Syst. Man Cybern. 15 (1985) 116–132.
[45] P. Angelov, R. Buswell, Identification of evolving fuzzy rule-based models, IEEE Trans. Fuzzy Syst. 10 (2002) 667–676.
[46] X. Li, M.J. Er, B.S. Lim, J.H. Zhou, O.P. Gan, L. Rutkowski, Fuzzy regression modeling for tool performance prediction and degradation detection, Int. J. Neural Syst. 20 (2010) 405–419.
[47] B.Y. Lee, H.S. Liu, Y.S. Tarng, Modeling and optimization of drilling process, J. Mater. Process. Technol. 74 (1998) 149–157.
[48] K. Perusich, Using fuzzy cognitive maps to identify multiple causes in troubleshooting systems, Integr. Comput. Aided Eng. 15 (2008) 197–206.
[49] J.M. Zurada, Introduction to Artificial Neural Systems, West Publishing Company, USA, 1992.
[50] E. Haddadi, M.R. Shabghard, M.M. Ettefagh, Effect of different tool edge conditions on wear detection by vibration spectrum analysis in turning operation, J. Appl. Sci. 8 (2008) 3879–3886.
[51] L. Wang, M.G. Mehrabi, E. Kannatey-Asibu Jr., Tool wear monitoring in reconfigurable machining systems through wavelet analysis, Trans. NAMRI 3 (2001) 399–406.
[52] E. Lughofer, Evolving Fuzzy Systems—Methodologies, Advanced Concepts and Applications, Springer, Berlin Heidelberg, 2011.
[53] E.P. Klement, R. Mesiar, E. Pap, Triangular Norms, Kluwer Academic Publishers, Dordrecht, 2000.
[54] J.A. Dickerson, B. Kosko, Fuzzy function approximation with ellipsoidal rules, IEEE Trans. Syst. Man Cybern. Part B Cybern. 26 (1996) 542–560.
[55] S. Abe, Fuzzy function approximators with ellipsoidal regions, IEEE Trans. Syst. Man Cybern. Part B Cybern. 29 (1999) 654–661.
[56] A. Lemos, W. Caminhas, F. Gomide, Multivariable Gaussian evolving fuzzy modeling system, IEEE Trans. Fuzzy Syst. 19 (2011) 91–104.
[57] C.S. Leung, K.W. Wong, P.F. Sum, L.W. Chan, A pruning method for the recursive least squared algorithm, Neural Networks 14 (2001) 147–174.
[58] J.D.J. Rubio, SOFMLS: online self-organizing fuzzy modified least-squares network, IEEE Trans. Fuzzy Syst. 17 (2009) 1296–1309.
[59] P.J. Werbos, The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting, Wiley, Hoboken, NJ, 1994.
[60] J. Abonyi, R. Babuska, F. Szeifert, Modified Gath–Geva fuzzy clustering for identification of Takagi–Sugeno fuzzy models, IEEE Trans. Syst. Man Cybern. Part B Cybern. 32 (2002) 612–621.
[61] E. Lughofer, Extensions of vector quantization for incremental clustering, Pattern Recognit. 41 (2008) 995–1011.
[62] Xi-Zhao Wang, Chun-Ru Dong, Improving generalization of fuzzy if-then rules by maximizing fuzzy entropy, IEEE Trans. Fuzzy Syst. 17 (2009) 556–567.
[63] Xi-Zhao Wang, Jun-Hai Zhai, Shu-Xia Lu, Induction of multiple fuzzy decision trees based on rough set technique, Inf. Sci. 178 (2008) 3188–3202.
[64] Xi-Zhao Wang, Chun-Ru Dong, Tie-Gang Fan, Training T-S norm neural networks to refine weights for fuzzy if-then rules, Neurocomputing 70 (2007) 2581–2587.
[65] A. Lemos, W. Caminhas, F. Gomide, Multivariable Gaussian evolving fuzzy modelling system, IEEE Trans. Fuzzy Syst. 19 (2011) 91–104.
[66] R.J. Oentaryo, M.J. Er, L. San, L.-Y. Zhai, X. Li, Bayesian ART-based fuzzy inference system: a new approach to prognosis of machining processes, in: IEEE Annual Conference of the Prognostics and Health Management Society, 2011.
[67] E. Lughofer, Flexible evolving fuzzy inference systems from data streams (FLEXFIS++), in: M. Sayed-Mouchaweh, E. Lughofer (Eds.), Learning in Non-Stationary Environments: Methods and Applications, Springer, New York, 2012, pp. 205–246.
[68] S. Haykin, Neural Networks: A Comprehensive Foundation, second ed., Prentice Hall, Upper Saddle River, NJ, 1999.

Mahardhika Pratama was born in Surabaya, Indonesia. He received the B.E. degree (First Class Honors) in Electrical Engineering from the Sepuluh Nopember Institute of Technology, Indonesia, in 2010, where he was also awarded the best and most favorite final project prize. He received the M.Sc. degree in Computer Control and Automation (CCA) from Nanyang Technological University, Singapore, in 2011. He is currently pursuing a Ph.D. at the University of New South Wales, Australia. Mr. Pratama is a member of the IEEE, the IEEE Computational Intelligence Society (CIS), the IEEE Systems, Man and Cybernetics Society (SMCS), and the Indonesian Soft Computing Society (ISC-INA). His research interests involve machine learning, computational intelligence, evolutionary computation, fuzzy logic, neural networks and evolving adaptive systems.



Meng Joo Er is currently a Professor with the Division of Control and Instrumentation, School of Electrical and Electronic Engineering (EEE), NTU. His research interests include control theory and applications, fuzzy logic and neural networks, computational intelligence, cognitive systems, robotics and automation, sensor networks and biomedical engineering. He has authored 5 books, 16 book chapters and more than 400 refereed journal and conference papers in his research areas of interest. He served as the Editor of IES Journal on Electronics and Computer Engineering from 1995 to 2004. Currently, he serves as the Editor-in-Chief of the International Journal of Electrical and Electronic Engineering and Telecommunications, an Area Editor of International Journal of Intelligent Systems Science and an Associate Editor of 11 refereed international journals, namely International Journal of Fuzzy Systems, Neurocomputing, International Journal of Humanoid Robots, Journal of Robotics, International Journal of Mathematical Control Science and Applications, International Journal of Applied Computational Intelligence and Soft Computing, International Journal of Fuzzy and Uncertain Systems, International Journal of Automation and Smart Technology, International Journal of Modelling, Simulation and Scientific Computing, International Journal of Intelligent Information Processing and the Open Electrical and Electronic Engineering Journal. Furthermore, he served as an Associate Editor of IEEE Transactions on Fuzzy Systems from 2006 to 2011 and a Guest Editor of International Journal of Neural Systems from 2009 to 2010.

Xiang Li received her Ph.D. degree from Nanyang Technological University, Singapore in 2000, as well as M.E. and B.E. degrees from Northeastern University, China, in 1987 and 1982, respectively. She has more than 15 years of experience in research and applications of data mining, artificial intelligence and statistical analysis, such as neural networks, fuzzy logic systems, data clustering and multiple regression modeling.

Richard J. Oentaryo is currently a Research Fellow at the Living Analytics Research Centre, Singapore Management University (SMU). Prior to joining SMU, he was a Research Fellow at the School of Electrical and Electronic Engineering, Nanyang Technological University (NTU), where he worked as part of the team that clinched the IES Prestigious Engineering Achievement Award 2011. He received his Ph.D. and B.E. (First Class Honors) from the School of Computer Engineering, NTU, in 2011 and 2004, respectively. Upon his B.E. graduation, he was awarded the Information Technology Management Association Gold Medal cum Book Prize for the best Final Year Project of the 2004 cohort. Dr. Oentaryo is a member of the Institute of Electrical and Electronics Engineers (IEEE), the IEEE Computational Intelligence Society (IEEE-CIS), and the Pattern Recognition and Machine Intelligence Association (PREMIA), Singapore. His research interests span neuro-fuzzy systems, social network mining, and brain-inspired architectures. He has published over 15 international journal and conference papers, and received several awards such as the IEEE-CIS Outstanding Student Paper Travel Grant in 2006 and 2009.

Edwin Lughofer received his Ph.D. degree from the Department of Knowledge-Based Mathematical Systems, University Linz, where he is now employed as post-doctoral fellow. During the past 10 years, he has participated in several international research projects, such as the EU-projects DynaVis: www.dynavis.org, AMPA and Syntex (www.syntex.or.at). In this period, he has published around 70 journal and conference papers in the fields of evolving fuzzy systems, machine learning and vision, clustering, fault detection, image processing and human–machine interaction, including a monograph on ‘Evolving Fuzzy Systems’ (Springer, Heidelberg) and an edited book on ‘Learning in Nonstationary Environments’ (Springer, New York). He is associate editor of the international journals Evolving Systems (Springer) and Information Fusion (Elsevier), and organized various special sessions and issues in the field of evolving systems, incremental machine learning and on-line modeling. He served as programme committee member of several international conferences and is currently a member of the ‘ETTC task force on Machine Learning’, and of the ‘EUSFLAT Working Group on Learning and Data Mining’. In 2010 he initiated the bilateral FWF/DFG Project ‘Interpretable and Reliable Evolving Fuzzy Systems’ and is currently key researcher in the national K-Project ‘Process Analytical Chemistry (PAC)’ (18 partners) as well as in the long-term strategic research projects ‘Condition Monitoring with Data-Driven Models’ and ‘Performance Optimization of Electrical Drives’ within the Austrian Competence Center of Mechatronics.

Imam Arifin graduated in electronic engineering from the Electronic Engineering Polytechnic Institute—ITS Surabaya in 1994. He received the bachelor degree in Control System Engineering—Electrical Engineering from the Sepuluh Nopember Institute of Technology, Surabaya, in 2000 and the master degree in Intelligent Systems and Control from the School of Electrical Engineering and Informatics, Bandung Institute of Technology, in 2008. From 1994 to 1996, he was with the Auto Insertion Division, PT Sony Electronics Indonesia, as a programmer for Numerical Control machines. He is presently a lecturer in Control System Engineering at the Electrical Engineering Department, Sepuluh Nopember Institute of Technology.

Sep 4, 2008 - This keeps the test code short and makes it easy to add new tests but makes it hard to ... As your code grows the test data tends to grow faster.