Bayesian ART-Based Fuzzy Inference System: A New Approach to Prognosis of Machining Processes Richard J. Oentaryo, Meng Joo Er, Linn San, and Lianyin Zhai

Xiang Li

School of Electrical and Electronic Engineering Nanyang Technological University Nanyang Avenue, Singapore 639798 Email: {rjoentaryo, emjer, sanlinn, lyzhai}@ntu.edu.sg

Manufacturing Execution and Control Group Singapore Institute of Manufacturing Technology Nanyang Avenue, Singapore 638075 Email: [email protected]

Abstract—Modeling of machining processes plays a crucial role in manufacturing operations, in view of its substantial impact on overall cost effectiveness and productivity. To this end, computational intelligence approaches, such as neural networks, fuzzy systems, and hybrid fuzzy neural networks, have increasingly been employed in recent years. However, most existing approaches are based on a batch learning procedure, in which all machining data are assumed to be available and can be accessed repeatedly. Such an approach is impractical in the face of a large data stream, and is not suitable for dynamic, time-varying tasks. In this light, this paper proposes a novel fuzzy neural network called the Bayesian Adaptive Resonance Theory (BART)-Based Fuzzy Inference System, which features a fully online learning scheme employing the BART algorithm and the decoupled extended Kalman filter (DEKF) procedure for the construction and parameter optimization of its rule base, respectively. Together, the BART and DEKF mechanisms endow the proposed system with computational efficiency and a strong statistical foundation, both desirable in modeling and prognosis tasks. To further simplify its structure, the system also incorporates a pruning procedure to remove inconsequential rules. Experimental studies on tool wear prognosis and chaotic time series prediction tasks have verified the efficacy of the proposed system as an online modeling tool.

NOMENCLATURE

I, M, K : Total number of inputs, outputs, and rules
x_i, y_m, t_m : ith input, mth output, and mth target features
IV_i, OV_m : ith input variable and mth output variable
R_k : kth (fuzzy) rule
A_{i,k}, C_{k,m} : ith antecedent label and mth consequent function belonging to rule R_k
c_{i,k}, σ_{i,k} : ith center and width of the (Gaussian) membership function belonging to rule R_k
w_{i,k,m} : Consequent weight of rule R_k corresponding to the ith input and mth output
μ_{A_{i,k}} : Membership degree of A_{i,k} given input x_i
μ_{R_k}, λ_{R_k} : Firing strength and normalized firing strength of rule R_k
ρ, α, β : Vigilance parameter, width scale, and pruning threshold
N_k : Number of times rule R_k wins the competition
V̂_k, V̂_max : Hypervolume of data space covered by rule R_k and maximum hypervolume allowed for R_k
ϕ_k : Contribution/influence degree of rule R_k
G_k, P_k, H_k : Kalman gain, error covariance matrix, and Jacobian matrix of the parameters of rule R_k

I. INTRODUCTION

Modeling and prognosis of machining processes constitute a crucial facet of manufacturing operations, and have gained increasing attention in recent years due to their substantial contributions to overall cost effectiveness and productivity. In general, machining refers to the process in which metal material is removed (in the form of chips) using single or multiple wedge-shaped cutting tools [1]. Among contemporary machining processes, high-speed milling (HSM) is regarded as the most sophisticated and challenging, as it involves the use of a multi-point cutting tool rotating at high speed to remove material from the surface of a workpiece. Many HSM centers are available today to meet the ever-growing demands for the production of vital pieces for various industries.

A fundamental objective of the HSM process is to achieve better surface finishing, thus improving the quality of the completed workpiece. To this end, the fast, in-situ detection of the wear state of the cutting tools and the recognition of their breakage play a crucial role. Tool breakage has been regarded as the major cause of unforeseen machine tool downtime/failure that continues to plague the manufacturing industries today. Even when the tool does not break during machining, the use of dull or damaged cutters can impose heavy burdens on the machine tool system, and consequently cause a loss of quality in the final workpiece. In light of this issue, extensive research has been conducted to develop accurate and reliable tool condition monitoring (TCM) systems [2]. Such a system would help substantially expedite the production processes, minimize machining tool failures, and reduce the overall cost.

With the recent advances in computer-based technologies, computational intelligence (CI) is increasingly being employed for modeling and prognosis of machining operations [1].
Established CI methodologies such as neural networks and fuzzy logic systems are particularly attractive, due to their ability to gracefully cope with highly nonlinear, multidimensional, and ill-defined engineering tasks. For instance, a neural network-based TCM system was developed in [3] to approximate tool wear during the end-milling process. In [4], feed-forward neural networks were used to predict the life span of brittle tools. On the other hand, a fuzzy logic-based in-process TCM system was proposed in [5] to forecast flank wear of the cutters. More

recent work presented in [6] used fuzzy expert systems and fuzzy pattern recognition, respectively, for monitoring tool wear over a limited range of cutting conditions. However, several deficiencies remain in these approaches. On one hand, while neural networks provide a powerful means for TCM, their operations are generally opaque to human operators. Yet it is highly desirable to have a TCM system whose operations are tractable, in a way akin to human logical reasoning. On the other hand, fuzzy logic systems are able to explain their operations using fuzzy linguistic rules, and realize approximate reasoning to cope with imprecision and uncertainty in decision-making. Unfortunately, their design traditionally involves extensive manual intervention, resulting in static rules that cannot be tuned further after their initial setup. This has led to the development of fuzzy neural networks (FNNs), a powerful hybrid modeling approach that integrates the learning abilities, parallelism, and robustness of neural networks with the human-like linguistic and approximate reasoning traits of fuzzy logic systems [7].

Several applications of FNNs in TCM and machining operations have been reported in the literature. For instance, an adaptive network-based fuzzy inference system (ANFIS) was proposed in [8] for estimation of the flank wear rate based on cutting force measurements. In [9], an ANFIS-based classification system was developed to determine the tool wear condition based on drilling forces. More recently, the work in [10] introduced a new ANFIS-based method for prediction of flank wear based on cutting force signals during the end-milling process, and its prediction performance under varying cutting conditions was found satisfactory. Nevertheless, the FNN models used in these works employ a batch learning procedure, which assumes that all data patterns to be learned are always available (or stored in memory) and can be accessed repeatedly.
Such a procedure imposes high storage and time requirements when dealing with a large machining data stream, and is not suitable for modeling dynamic, time-varying environments. A plausible approach to addressing this issue is to develop an FNN model that can learn in a fully online manner. To this end, four discriminating features of online learning given by [11] are considered in this paper:

• All training observations are sequentially (i.e., one-by-one) presented to the system
• At any time, only one training observation is seen/learned
• A training observation is discarded once the learning procedure for that particular observation is completed
• The learning system has no prior knowledge of how many training observations will be presented in total

In addition to performing online learning, it is imperative for the model to attain a good balance between approximation accuracy (e.g., prediction error) and model simplicity (e.g., the number of rules/nodes). The latter is particularly important for providing an intuitive, tractable model to human operators, as well as scalability in the face of a large data stream. To fulfill the above requirements, this paper proposes a novel FNN termed the Bayesian Adaptive Resonance Theory-Based Fuzzy Inference System (BARTFIS) for modeling and

prognosis of machining processes. The proposed system employs an extended version of the online Bayesian Adaptive Resonance Theory (BART) algorithm [12] to dynamically construct and adapt its fuzzy rule base structure. The BART algorithm features a synergy between fast online ART learning [13] and Bayes' rule [14], making the proposed BARTFIS model both efficient and statistically sound. Adjustment of the rule base parameters in the model is then achieved via the decoupled extended Kalman filter (DEKF) algorithm [15], which provides an efficient second-order recursive parameter optimization method, also conceptually rooted in Bayes' rule. Lastly, a simple pruning procedure is performed to remove inconsequential rules that contribute little over their lifespan, thus reducing the complexity of the BARTFIS model without significantly degrading its prediction accuracy.

The remainder of this paper is organized as follows. Section II describes the architecture of the proposed BARTFIS system. The detailed learning procedure of the proposed system is subsequently described in Section III. Section IV provides experimental results and analysis of the proposed system on tool wear prognosis and chaotic time series prediction tasks. Section V concludes the paper.

II. BARTFIS ARCHITECTURE

A. System Structure

The proposed BARTFIS system, as illustrated in Fig. 1, consists of a five-layer, multi-input-multi-output (MIMO) connectionist structure. Nodes in the input layer, termed input variable nodes IV_i, capture the ith input features of interest x_i. The antecedent layer consists of rule antecedent nodes A_{i,k}, each representing a fuzzy linguistic label/concept. Each node R_k in the rule layer represents a fuzzy If-Then associative rule. Nodes C_{k,m} in the consequent layer correspond to the rule consequent functions. Lastly, each output variable node OV_m in the output layer represents the mth output feature of interest y_m.
The total numbers of inputs, outputs, and rules are denoted as I, M, and K, respectively. Essentially, the aforementioned structure realizes the first-order Takagi-Sugeno-Kang (TSK) fuzzy inference system [16], which comprises fuzzy rules of the form (1):

If x_1 is A_{1,k} and ... x_i is A_{i,k} and ... x_I is A_{I,k}
Then y_1 = C_{k,1} and ... y_m = C_{k,m} and ... y_M = C_{k,M}    (1)

Here, the antecedent label A_{i,k} of rule R_k is defined using a Gaussian membership function as per (2), while the consequent function C_{k,m} is computed using the linear equation (3):

\mu_{A_{i,k}} = \exp\left( -\frac{(x_i - c_{i,k})^2}{2\sigma_{i,k}^2} \right)    (2)

C_{k,m} = \sum_{i=0}^{I} w_{i,k,m} x_i    (3)

where c_{i,k} and σ_{i,k} are the center and width of the Gaussian membership function, respectively, w_{i,k,m} is the consequent weight parameter, and x_0 = 1.
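The antecedent and consequent computations in (2) and (3) can be sketched in a few lines. The following is a minimal NumPy illustration with made-up values; the function and variable names are ours, not the paper's:

```python
import numpy as np

def membership(x, c, sigma):
    """Gaussian membership degrees mu_{A_{i,k}} of one rule, per Eq. (2)."""
    return np.exp(-(x - c) ** 2 / (2.0 * sigma ** 2))

def consequent(x, w):
    """First-order TSK consequent C_{k,m} = sum_i w_{i,k,m} x_i, per Eq. (3),
    with the bias input x_0 = 1 prepended."""
    return float(w @ np.concatenate(([1.0], x)))

x = np.array([0.5, -1.2])                        # I = 2 inputs (made-up values)
mu = membership(x, np.array([0.0, -1.0]), np.array([1.0, 0.5]))
y = consequent(x, np.array([0.1, 0.4, -0.2]))    # weights w_0, w_1, w_2
```

Each rule thus contributes a local linear model whose validity region is shaped by its Gaussian antecedents.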

Fig. 1: Structure of the proposed BARTFIS system

B. Inference Scheme

The decision-making process of the proposed BARTFIS system involves the following inference scheme, in which the system outputs y_m are computed from the given inputs x_i. First, each input layer node IV_i simply captures x_i and directly propagates it to the next (antecedent) layer. Then, the antecedent layer computes the Gaussian membership degrees μ_{A_{i,k}} of the rule antecedents based on (2). The firing strength of each rule R_k in the rule layer is subsequently computed using the product (fuzzy) T-norm of μ_{A_{i,k}}, as per (4):

\mu_{R_k} = \prod_{i=1}^{I} \mu_{A_{i,k}} = \exp\left( -\frac{1}{2} \sum_{i=1}^{I} \frac{(x_i - c_{i,k})^2}{\sigma_{i,k}^2} \right)    (4)

Next, the consequent outputs C_{k,m} are calculated in the consequent layer via (3), and the normalized firing strength of each rule R_k is calculated using (5):

\lambda_{R_k} = \frac{\mu_{R_k}}{\sum_{l=1}^{K} \mu_{R_l}}    (5)

Based on (3) and (5), the overall system outputs y_m are finally inferred using (6):

y_m = \sum_{k=1}^{K} \lambda_{R_k} C_{k,m} = \sum_{k=1}^{K} \lambda_{R_k} \sum_{i=0}^{I} w_{i,k,m} x_i    (6)
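The whole inference scheme (4)-(6) amounts to a short vectorized forward pass. The sketch below assumes arrays of shape (K, I) for the rule centers/widths and (K, M, I+1) for the consequent weights; this layout is our own choice for illustration, not something specified by the paper:

```python
import numpy as np

def bartfis_infer(x, centers, sigmas, weights):
    """TSK forward pass per Eqs. (4)-(6). centers, sigmas: (K, I);
    weights: (K, M, I+1), index 0 of the last axis being the bias weight."""
    # Eq. (4): rule firing strengths via product of Gaussian memberships
    mu = np.exp(-0.5 * np.sum(((x - centers) / sigmas) ** 2, axis=1))
    # Eq. (5): normalized firing strengths
    lam = mu / mu.sum()
    # Eq. (3): consequent outputs C_{k,m}, shape (K, M)
    C = weights @ np.concatenate(([1.0], x))
    # Eq. (6): firing-strength-weighted sum over rules, shape (M,)
    return lam @ C

rng = np.random.default_rng(0)
K, I, M = 3, 2, 1
y = bartfis_infer(rng.standard_normal(I), rng.standard_normal((K, I)),
                  np.ones((K, I)), rng.standard_normal((K, M, I + 1)))
```

Since λ is a convex weighting, each output is a blend of the K local linear models.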

III. BARTFIS LEARNING PROCEDURE

The online learning procedure of the proposed BARTFIS system consists of three phases: rule construction, parameter adjustment, and rule pruning, all of which are carried out on a single data sample at a time. Algorithm 1 gives an overview of the learning procedure. The three phases are elaborated in Sections III-A to III-C, respectively. A complexity analysis of the procedure is presented in Section III-D.

Algorithm 1: BARTFIS Learning Procedure

Define: input-target pair (x, t) = ([x_1 ... x_i ... x_I]^T, [t_1 ... t_m ... t_M]^T), vigilance parameter ρ ∈ (0, 1], width scale α ∈ (0, 1], and pruning threshold β ∈ [0, 1]

/* Phase 1: Rule construction (BART procedure) */
Compute the posteriors P̂(R_k|x) of all rules R_k using (7)
Compute the hypervolumes V̂_k of all rules R_k using (10)
V̂_max ← ρ Σ_{k=1..K} V̂_k
K_tmp ← K
for j = 1 to K do
    k_p ← arg max_k P̂(R_k|x)
    if V̂_{k_p} ≤ V̂_max then
        Perform learning on rule R_{k_p} using x via (15)-(17)
        break  /* exit the for loop */
    else
        P̂(R_{k_p}|x) ← 0  /* remove from competition */
    end if
end for
if P̂(R_{k_p}|x) = 0 then  /* all rules failed the vigilance test */
    Create a new rule R_{K+1} based on x and α via (12)-(14)
    K ← K + 1
end if

/* Phase 2: Parameter adjustment (DEKF procedure) */
if K_tmp = K then  /* no new rule was created */
    Compute the firing strengths μ_{R_k} of all rules R_k using (4)
    k_w ← arg max_k μ_{R_k}
    Run the DEKF procedure on rule R_{k_w} using t via (18)-(20)
else
    Initialize the parameters of the new rule R_K via (32)-(33)
    Reset the covariance matrices of rules R_1 to R_{K-1} via (34)
end if

/* Phase 3: Rule pruning */
Compute the influences ϕ_k of all rules R_k using (35)
k_p ← arg min_k ϕ_k
if ϕ_{k_p} < β then
    Prune rule R_{k_p}  /* prune the least influential rule */
    Delete the covariance matrix of R_{k_p}
    K ← K − 1
end if
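Phase 1 of the learning procedure (category choice, vigilance test, and category learning, per Eqs. (7)-(17)) can be sketched as follows. This is an illustrative reading of the procedure under our own array layout and names, not the authors' implementation:

```python
import numpy as np

def construct_rule(x, centers, sigmas, counts, rho=0.1, alpha=0.7):
    """One pass of the rule-construction phase (cf. Eqs. (7)-(17)), sketched.

    centers, sigmas: (K, I) arrays; counts: (K,) winning counts N_k.
    Returns the (possibly grown) centers, sigmas, and counts.
    """
    K, I = centers.shape
    V = np.prod(sigmas ** 2, axis=1)                    # hypervolumes, Eq. (10)
    prior = counts / counts.sum()                       # Eq. (8)
    like = (np.exp(-0.5 * np.sum(((x - centers) / sigmas) ** 2, axis=1))
            / ((2 * np.pi) ** (I / 2) * np.sqrt(V)))    # Eq. (9)
    post = prior * like
    post = post / post.sum()                            # posterior, Eq. (7)
    V_max = rho * V.sum()                               # vigilance bound
    for _ in range(K):
        kp = int(np.argmax(post))                       # MAP category choice
        if V[kp] <= V_max:                              # vigilance test, Eq. (11)
            c_old = centers[kp].copy()
            counts[kp] += 1                             # Eq. (17)
            centers[kp] = c_old + (x - c_old) / counts[kp]           # Eq. (15)
            sigmas[kp] = np.sqrt(sigmas[kp] ** 2
                + ((x - c_old) * (x - centers[kp]) - sigmas[kp] ** 2)
                / counts[kp])                                        # Eq. (16)
            return centers, sigmas, counts
        post[kp] = 0.0                                  # remove from competition
    # All rules failed the vigilance test: create a new rule, Eqs. (12)-(14)
    width = alpha * np.min(np.linalg.norm(x - centers, axis=1))
    centers = np.vstack([centers, x])
    sigmas = np.vstack([sigmas, np.full(I, width)])
    counts = np.append(counts, 1.0)
    return centers, sigmas, counts

centers = np.array([[0.0, 0.0]])
sigmas = np.ones((1, 2))
counts = np.array([5.0])
# A far-away sample with a tight vigilance bound triggers a new rule
centers, sigmas, counts = construct_rule(np.array([3.0, 3.0]),
                                         centers, sigmas, counts, rho=0.1)
```

Note how the vigilance bound V_max scales with the current total hypervolume, so "too large" is judged relative to the existing rule base.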

A. Rule Construction

In this phase, a modified version of the BART algorithm [12] is carried out to construct rules (also referred to as categories in the context of ART) in an online manner. The procedure can be decomposed into three steps: category choice, vigilance test, and category learning, as follows:

1) Category Choice: In this step, all existing rules compete to represent the current input pattern. The posterior probability of rule R_k given input pattern x = [x_1, ..., x_i, ..., x_I]^T is computed using Bayes' theorem [14], as defined in (7):

\hat{P}(R_k | x) = \frac{\hat{P}(x | R_k)\,\hat{P}(R_k)}{\sum_{l=1}^{K} \hat{P}(x | R_l)\,\hat{P}(R_l)}    (7)

where K is the total number of rules, \hat{P}(R_k) is the estimated prior probability of rule R_k, and \hat{P}(x|R_k) is the estimated likelihood of R_k with respect to x. The prior probability and likelihood are defined in (8) and (9), respectively:

\hat{P}(R_k) = \frac{N_k}{\sum_{l=1}^{K} N_l}    (8)

\hat{P}(x | R_k) = \frac{1}{(2\pi)^{I/2}\,\hat{V}_k^{1/2}} \exp\left( -\frac{1}{2} \sum_{i=1}^{I} \frac{(x_i - c_{i,k})^2}{\sigma_{i,k}^2} \right)    (9)

where N_k is the winning count of rule R_k, and \hat{V}_k is the hypervolume of data space covered by R_k, as given by (10):

\hat{V}_k = \prod_{i=1}^{I} \sigma_{i,k}^2    (10)

The chosen (i.e., winning) rule R_{k_p} is subsequently defined as the one having the maximum a posteriori (MAP) probability: k_p = arg max_k \hat{P}(R_k|x). That is, the rule R_{k_p} is either more populated than other rules (i.e., it has a high \hat{P}(R_k)), or more likely to be the true rule (i.e., it has a high \hat{P}(x|R_k), for instance because it is the closest to x), or both. Grounded in Bayes' theorem [14], the MAP criterion is expected to select a winning rule more accurately than using either probability alone. For example, the MAP criterion may prefer a rule with a higher prior probability over another rule, even though the normalized distance (i.e., the argument of the exponential term in (9)) of the former to x is larger.

2) Vigilance Test: The goal of this test is to ensure that the chosen rule R_{k_p} is limited in size. That is, the test restricts the hypervolume (coverage) \hat{V}_{k_p} of the chosen rule to the maximal hypervolume \hat{V}_{max} allowed for a rule, as per (11):

\hat{V}_{k_p} \leq \hat{V}_{max}    (11)

where \hat{V}_{max} = \rho \sum_{k=1}^{K} \hat{V}_k and ρ ∈ (0, 1] is a user-defined vigilance parameter. If R_{k_p} satisfies (11), category learning is performed (as described shortly). Otherwise, the rule is removed from the competition for the current pattern x (e.g., by resetting its posterior probability \hat{P}(R_{k_p}|x) = 0), and a search is conducted for another rule with a high posterior probability that complies with (11). If all existing rules fail the vigilance test, a new rule R_{K+1} is created, and its center vector c_{K+1} = [c_{1,K+1}, ..., c_{i,K+1}, ..., c_{I,K+1}]^T, width vector σ_{K+1} = [σ_{1,K+1}, ..., σ_{i,K+1}, ..., σ_{I,K+1}]^T, and winning count N_{K+1} are initialized using (12)-(14), respectively:

c_{K+1} = x    (12)

\sigma_{K+1} = \alpha \times \min_{k=1,\ldots,K} \| x - c_k \|    (13)

N_{K+1} = 1    (14)

where \|\cdot\| is the Euclidean norm and α ∈ (0, 1] is the user-defined width scale parameter.

3) Category Learning: When a chosen rule R_{k_p} passes the vigilance test, the parameters c_{k_p}, σ_{k_p}, N_{k_p} of that rule are updated using (15)-(17), respectively:

c_{k_p}^{new} = c_{k_p}^{old} + \frac{x - c_{k_p}^{old}}{N_{k_p}^{old} + 1}    (15)

(\sigma_{k_p}^{new})^2 = (\sigma_{k_p}^{old})^2 + \frac{(x - c_{k_p}^{old}) \odot (x - c_{k_p}^{new}) - (\sigma_{k_p}^{old})^2}{N_{k_p}^{old} + 1}    (16)

N_{k_p}^{new} = N_{k_p}^{old} + 1    (17)

where ⊙ denotes the element-wise product. The update formulae in (15) and (16) essentially extend sequential maximum likelihood estimation for a single Gaussian to the multidimensional case [12].

B. Parameter Adjustment

This phase comprises two alternative scenarios. The first takes place when no new rule is created in the rule construction phase (i.e., when K_tmp = K in Algorithm 1) and involves updating the parameters of the winning rule that passes the vigilance test. The second involves initializing the consequent parameters of the new rule that is formed when all existing rules fail the test. The following subsections describe the two scenarios in detail:

1) Winning Rule Adjustment: In this scenario, the winner-takes-all learning strategy is adopted to adjust all parameters of the winning rule R_{k_w}, which has the highest membership degree such that k_w = arg max_k μ_{R_k}. Here, the adjustment is made by means of the decoupled extended Kalman filter (DEKF) procedure [15], as given by (18)-(20):

G_{k_w}(t) = P_{k_w}(t-1)\, H_{k_w}(t) \left[ R(t) + H_{k_w}^T(t)\, P_{k_w}(t-1)\, H_{k_w}(t) \right]^{-1}    (18)

P_{k_w}(t) = \left[ I_{Z \times Z} - G_{k_w}(t)\, H_{k_w}^T(t) \right] P_{k_w}(t-1)    (19)

\theta_{k_w}(t) = \theta_{k_w}(t-1) + G_{k_w}(t)\, (t - y)    (20)

where G_{k_w}(t), P_{k_w}(t), H_{k_w}(t), and θ_{k_w}(t) are the Kalman gain, covariance matrix, Jacobian matrix, and parameter vector belonging to R_{k_w} at time instance t, respectively, R(t) is the observation noise variance, I_{Z×Z} is an identity matrix such that Z is the length of θ_{k_w}(t), and t = [t_1, ..., t_m, ..., t_M]^T and y = [y_1, ..., y_m, ..., y_M]^T are the target and system output vectors, respectively. For simplicity, and to avoid introducing an extra free parameter, R(t) is set to the identity matrix in this work (i.e., R(t) = I_{M×M}). The parameter vector θ_{k_w}(t) can be written as in (21):

\theta_{k_w}(t) = [w_{k_w}^T, c_{k_w}^T, \sigma_{k_w}^T]^T    (21)

which may be further decomposed into (22)-(24):

w_{k_w} = [w_{0,k_w,1} \ldots w_{I,k_w,1} \ldots w_{0,k_w,m} \ldots w_{I,k_w,m} \ldots w_{0,k_w,M} \ldots w_{I,k_w,M}]^T    (22)

c_{k_w} = [c_{1,k_w} \ldots c_{i,k_w} \ldots c_{I,k_w}]^T    (23)

\sigma_{k_w} = [\sigma_{1,k_w} \ldots \sigma_{i,k_w} \ldots \sigma_{I,k_w}]^T    (24)

Meanwhile, the Jacobian matrix H_{k_w}(t) is given by (25):

H_{k_w}(t) = \begin{bmatrix}
\partial C_{k_w,1}/\partial w_{k_w,1} & \cdots & 0 & \cdots & 0 \\
0 & \cdots & \partial C_{k_w,m}/\partial w_{k_w,m} & \cdots & 0 \\
0 & \cdots & 0 & \cdots & \partial C_{k_w,M}/\partial w_{k_w,M} \\
\partial C_{k_w,1}/\partial c_{k_w} & \cdots & \partial C_{k_w,m}/\partial c_{k_w} & \cdots & \partial C_{k_w,M}/\partial c_{k_w} \\
\partial C_{k_w,1}/\partial \sigma_{k_w} & \cdots & \partial C_{k_w,m}/\partial \sigma_{k_w} & \cdots & \partial C_{k_w,M}/\partial \sigma_{k_w}
\end{bmatrix}    (25)

where the gradient vectors are defined in (26)-(28):

\partial C_{k_w,m}/\partial w_{k_w,m} = [\partial C_{k_w,m}/\partial w_{0,k_w,m} \ldots \partial C_{k_w,m}/\partial w_{i,k_w,m} \ldots \partial C_{k_w,m}/\partial w_{I,k_w,m}]^T    (26)

\partial C_{k_w,m}/\partial c_{k_w} = [\partial C_{k_w,m}/\partial c_{1,k_w} \ldots \partial C_{k_w,m}/\partial c_{i,k_w} \ldots \partial C_{k_w,m}/\partial c_{I,k_w}]^T    (27)

\partial C_{k_w,m}/\partial \sigma_{k_w} = [\partial C_{k_w,m}/\partial \sigma_{1,k_w} \ldots \partial C_{k_w,m}/\partial \sigma_{i,k_w} \ldots \partial C_{k_w,m}/\partial \sigma_{I,k_w}]^T    (28)

and each element in the vectors is computed using (29)-(31):

\partial C_{k_w,m}/\partial w_{i,k_w,m} = \lambda_{R_{k_w}} x_i    (29)

\partial C_{k_w,m}/\partial c_{i,k_w} = \lambda_{R_{k_w}} (C_{k_w,m} - t_m) \frac{x_i - c_{i,k_w}}{\sigma_{i,k_w}^2}    (30)

\partial C_{k_w,m}/\partial \sigma_{i,k_w} = \lambda_{R_{k_w}} (C_{k_w,m} - t_m) \frac{(x_i - c_{i,k_w})^2}{\sigma_{i,k_w}^3}    (31)
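For the single-output case (M = 1), one DEKF step (18)-(20) with the Jacobian elements (29)-(31) can be sketched as follows. This is an illustrative reading with our own names; for the gradient terms, the winning rule's contribution λ·C stands in for the full system output y:

```python
import numpy as np

def dekf_update(x, t_m, lam, C, c, sigma, w, P):
    """One DEKF step, Eqs. (18)-(20), for a winning rule in the
    single-output case (M = 1). c, sigma: (I,); w: (I+1,); P: (Z, Z)
    with Z = (I+1) + 2I."""
    x_aug = np.concatenate(([1.0], x))                   # bias input x_0 = 1
    # Jacobian column H (Z, 1), elements per Eqs. (29)-(31)
    dw = lam * x_aug                                     # Eq. (29)
    dc = lam * (C - t_m) * (x - c) / sigma ** 2          # Eq. (30)
    ds = lam * (C - t_m) * (x - c) ** 2 / sigma ** 3     # Eq. (31)
    H = np.concatenate([dw, dc, ds])[:, None]
    R = np.eye(1)                                        # R(t) = I, as in the paper
    G = P @ H @ np.linalg.inv(R + H.T @ P @ H)           # Kalman gain, Eq. (18)
    P_new = (np.eye(P.shape[0]) - G @ H.T) @ P           # covariance, Eq. (19)
    theta = np.concatenate([w, c, sigma])                # parameter vector, Eq. (21)
    y = lam * C           # winning rule's output contribution (stand-in for y_m)
    theta_new = theta + (G * (t_m - y)).ravel()          # Eq. (20)
    n = len(x)
    return (theta_new[:n + 1], theta_new[n + 1:2 * n + 1],
            theta_new[2 * n + 1:], P_new)

x = np.array([0.4, -0.3])
c, sigma, w = np.zeros(2), np.ones(2), np.zeros(3)
P = np.eye(7)                                            # Z = 3 + 2 + 2
C = float(w @ np.concatenate(([1.0], x)))                # consequent output, Eq. (3)
w, c, sigma, P = dekf_update(x, 0.8, 1.0, C, c, sigma, w, P)
```

Because each rule keeps its own small P matrix, the filter is "decoupled": the cost grows with the per-rule parameter count Z rather than with the full system.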

2) New Rule Initialization: In the case where a new rule is created, the consequent parameters of that rule are computed as the weighted average of the parameters of the other existing rules. Formally, the consequent parameters of the new rule R_{K+1} are initialized using (32):

w_{i,K+1,m} = \sum_{k=1}^{K} \lambda_{R_k} w_{i,k,m}    (32)

and its covariance matrix is in turn set as per (33):

P_{K+1}(t) = I_{Z \times Z}    (33)

At the same time, the covariance matrices of all the other rules R_k, where k ∈ {1, ..., K}, are reset using (34):

P_k(t) = \frac{(K+1)^2}{K^2}\, P_k(t-1)    (34)

C. Rule Pruning

A simple and intuitive procedure for rule pruning is carried out in this phase. The key idea is to remove the rule that has contributed the least since it was first created. The contribution ϕ_k of each rule R_k at time instance t is estimated using (35):

\phi_k = \frac{\sum_{t'=t_k}^{t} \mu_{R_k}(t')}{t - t_k}    (35)

where the numerator denotes the cumulative firing strength of rule R_k (see (4)) and t_k is the time instance at which R_k was created. The cumulative strength is updated recursively, with the initial condition μ_{R_k}(t_k) = 1. Next, the least influential rule R_{k_p} is identified, such that k_p = arg min_k ϕ_k, and is

TABLE I: Time complexity of the BARTFIS learning phases

Learning phase       | Time complexity | Description
Rule construction    | O(K^2 + I × K)  | Find a winning rule and adjust its parameters only if it passes the vigilance test. If no rule passes the test, a new rule is created.
Parameter adjustment | O(I^2 × M^3)    | Carry out the DEKF procedure to fine-tune the parameters of the best-fit rule, or otherwise initialize those of the newly created rule.
Rule pruning         | O(K + I × M)    | Prune the least influential rule if the ratio of its cumulative firing strength to its life span falls below a specified threshold.

pruned if ϕ_{k_p} < β, where β ∈ [0, 1] is a user-defined pruning threshold. When pruning happens, the covariance matrix P_{k_p} of the obsolete rule R_{k_p} is deleted accordingly.

D. Complexity Analysis

Using the notations in the previous sections, Table I summarizes the worst-case time complexity of the above three learning phases for a single data sample. It can be seen that the computational load of the BARTFIS system lies in the rule construction and parameter adjustment steps, especially when the number of inputs I and the number of rules K are large. For a training data stream comprising N samples, the total time complexity of the learning process is given by (36):

O\left( \left( K^2 + I \times K + I^2 \times M^3 \right) \times N \right)    (36)

Such complexity is nonetheless fairly low, as in most cases K is much smaller than N (i.e., K ≪ N). This may be attributed to the vigilance test (11) and pruning procedure (35), which together restrict the size of the rule base. Also, since the BARTFIS learning procedure assumes no prior domain knowledge and each training sample is fed into the system only once (i.e., no data revisits), its speed compares favorably to that of other conventional fuzzy neural approaches (e.g., [17], [18], [19], [20], [21], [22]). This benefit is evident in our simulation studies (see, e.g., Section IV-B). On the other hand, the overall space complexity of the learning procedure is given by (37):

O\left( K \times I^2 \times M^2 \right)    (37)

which can be largely attributed to the size of the covariance matrix P_k(t), as given by (19) and (33). This requirement is nevertheless reasonable, recalling that K ≪ N. It is also an order of magnitude lower than that of the original (i.e., global) EKF algorithm [23], which requires storing the full covariance matrix with a total complexity of O(K^2 × I^2 × M^2).

IV. EXPERIMENTAL RESULTS AND ANALYSIS

A. Simulation Setup

In order to validate the efficacy of the proposed BARTFIS system, simulation studies on tool wear prognosis and chaotic time series prediction tasks are reported in this paper. The experiments were performed under the MATLAB R2010b environment, running on an Intel Core i5 processor with 4 GB of memory. The configuration of the system's three parameters, i.e., vigilance parameter ρ, width scale α, and pruning threshold β, is determined empirically for each case study. Detailed results and analysis are provided in Sections IV-B and IV-C.



Fig. 2: Experimental setup for tool wear prognosis task

TABLE II: Description of the tool wear input features

Cutting axis | Feature | Description
x-axis | Ma_x | Maximum force in the x-direction
x-axis | Am_x | Amplitude of force in the x-direction
x-axis | Av_x | Average force in the x-direction
x-axis | Ra_x | Ratio of force amplitude in the x-direction
y-axis | Ma_y | Maximum force in the y-direction
y-axis | Am_y | Amplitude of force in the y-direction
y-axis | Av_y | Average force in the y-direction
y-axis | Ra_y | Ratio of force amplitude in the y-direction
z-axis | Ma_z | Maximum force in the z-direction
z-axis | Am_z | Amplitude of force in the z-direction
z-axis | Av_z | Average force in the z-direction
z-axis | Ra_z | Ratio of force amplitude in the z-direction

B. Tool Wear Condition Prognosis

An experimental study on tool wear prediction in the high-speed end-milling process is presented in this section. For this experiment, a set of 6 mm ball-nose cutters is engaged in the end-milling of the surface finishing of an Inconel 718 workpiece. Inconel 718 is a hard-to-cut superalloy material widely used in the aerospace and gas turbine industries [24]. At certain intervals, the cutting process is stopped to measure the tool wear. Interpolation of the tool wear measurements is then achieved via a nonlinear curve fitting method. For data acquisition purposes, a vibration sensor, a force sensor, and an acoustic sensor are attached to the workpiece. The force sensor consists of a Kistler 9257BA dynamometer with a built-in 3-channel charge amplifier. As for the vibration sensor, a Kistler 8762A50 ceramic shear triaxial accelerometer is employed, while a Kistler 8152B121 acoustic emission sensor is used to capture the acoustic signal.

In this experiment, three-dimensional force signals of the cutting processes are used for prognosis of the tool wear. For each cutting axis, four features, i.e., maximum (peak) force, force amplitude (peak-to-peak), average force, and ratio of force amplitude, are fed as inputs to the proposed BARTFIS system. Hence, there are a total of 12 input features and a single output feature (i.e., the tool wear measurement). Table II summarizes the input features considered. In total, the dataset

consists of 635 observations taken from two ball-nose cutters. Evaluation of the BARTFIS system is subsequently done using a 10-fold cross-validation (CV) procedure, whereby the dataset is first shuffled and then partitioned into 10 mutually exclusive bins, labeled CV1-CV10. In the first trial, CV1 is used as the testing set, whereas CV2-CV10 constitute the training set; in the second trial, CV2 is the testing set while CV1 and CV3-CV10 are the training set, and so on. This process is repeated 10 times, and the performance results are then averaged. For this experiment, the user parameters of the BARTFIS system are set as follows: ρ = 0.1, α = 0.7, and β = 0.009.

Fig. 3 plots the final center locations of the rules (clusters) generated by the BARTFIS system, projected onto some of the training input features (for CV10). These locations signify that some local regions within the input (force) space are indicative of the tool wear. It is also shown that there are occasional rules placed in remote areas (e.g., for Input 3), which can be largely attributed to the parameter adjustments made by the DEKF algorithm after rule construction. Regardless, such placement of rules is acceptable, as the primary goal is to minimize the prediction error rather than to accurately represent the input space. That is, the DEKF algorithm indicates that the placement helps obtain good tool wear prediction. Fig. 4 illustrates some of the final input membership functions crafted by the system. As seen, the membership functions are distributed over their own ranges.

The training traces of the BARTFIS system are subsequently presented in Fig. 5. Fig. 5(a) depicts the evolution of the system's rule base size during the course of training (CV10). It is shown that the rule base grows at the early stage of training, and then shrinks as more samples are presented. In the latter case, two rules are deleted since they lose their influence over time, as per the pruning criterion in (35). On the other hand, Fig. 5(b) shows the trace of the actual output error (i.e., the difference between target and predicted outputs) during training (CV10). It can be observed that the output error tends to decrease over time, demonstrating that the proposed system is able to learn the data more accurately as training proceeds.

For comparison purposes, experiments were performed using other prominent TSK-type fuzzy neural methods: the adaptive network-based fuzzy inference system (ANFIS) [17], dynamic fuzzy neural network (DFNN) [19], generalized DFNN (GDFNN) [20], and the fast and accurate online self-organizing scheme for parsimonious fuzzy neural network (FAOS-PFNN) [22], based on the same 10-fold CV procedure. The benchmark results are summarized in Table III. As shown, the proposed BARTFIS system yields the most compact rule base (i.e., 3.7 rules on average) and the shortest training time (i.e., 0.35 seconds on average), while giving competitive generalization performance in terms of testing root mean square error (RMSE). These results are also consistent across different CV trials, as reflected by the low standard deviations of the rule base size, training time, and testing RMSE. Although FAOS-PFNN produces superior generalization performance here, its rule base size is larger than that of the proposed system. It must also be noted that, based on the definition in [11]





















Fig. 3: Projections of BARTFIS cluster centers for tool wear data (CV10) 























Fig. 4: Fuzzy membership functions crafted by BARTFIS for tool wear data (CV10)

(see Section I), ANFIS is inherently a batch learning system, which requires seeing all training samples for its learning to take place. As such, its usage is limited to small datasets (with few samples) and time-invariant environments. On the other hand, while DFNN, GDFNN, and FAOS-PFNN are dynamic and can cope with time-varying domains, they are not strictly online (hence denoted in Table III as semi-online), due to the use of the error reduction ratio procedure [19], which requires a complete revisit of all training samples seen so far. This would in turn impose high memory and time requirements in the face of a long data stream. The BARTFIS system, by contrast, is a fast, fully online learning system, whereby a training sample is presented to the system only once and can be discarded as

soon as the learning for that sample is completed. Such a merit is particularly appealing for dynamic, real-time prognosis tasks.

C. Chaotic Time Series Prediction

As a preliminary study to verify the efficacy of the proposed BARTFIS model in online modeling of more complex, volatile machining data, we conducted experiments using the chaotic Mackey-Glass time series [25], which is widely used in the literature to evaluate various neural and fuzzy neural techniques [26], [21], [27], [11]. The time series is generated using the differential delay equation in (38):

dy(t)/dt = 0.2 y(t − τ) / (1 + y^10(t − τ)) − 0.1 y(t)    (38)
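As an illustration (this is not the paper's code), the series in (38) can be generated with a fourth-order Runge-Kutta scheme: the history y(t − τ) is kept on the integration grid, and the delayed value needed at the half-step stages is linearly interpolated. The function name, the constant initial history, and the interpolation choice are our own assumptions.

```python
import numpy as np

def mackey_glass(n_samples=6000, tau=17.0, h=0.1, y0=1.2):
    """Integrate dy/dt = 0.2*y(t-tau)/(1 + y(t-tau)**10) - 0.1*y(t)
    with fourth-order Runge-Kutta. The delayed term is read off a
    stored history grid; the half-step delayed value is linearly
    interpolated. Initial history on [-tau, 0] is held constant at y0
    (an assumption; the paper does not state its initial condition)."""
    d = int(round(tau / h))              # delay in grid steps (170 here)
    y = np.empty(n_samples + d + 1)
    y[:d + 1] = y0                       # constant initial history

    def f(yt, ylag):
        return 0.2 * ylag / (1.0 + ylag ** 10) - 0.1 * yt

    for i in range(d, n_samples + d):
        ylag = y[i - d]                              # y(t - tau)
        ylag_mid = 0.5 * (y[i - d] + y[i - d + 1])   # y(t + h/2 - tau), interpolated
        k1 = f(y[i], ylag)
        k2 = f(y[i] + 0.5 * h * k1, ylag_mid)
        k3 = f(y[i] + 0.5 * h * k2, ylag_mid)
        k4 = f(y[i] + h * k3, y[i - d + 1])          # y(t + h - tau)
        y[i + 1] = y[i] + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return y[d:]                                     # drop the artificial history
```

With τ = 17 the trajectory is chaotic and, after transients, stays within roughly the [0.4, 1.4] band mentioned below.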

Fig. 5: Training traces for tool wear dataset (CV10): (a) rule size trace, (b) output error trace

TABLE III: Benchmark results on tool wear data

Method    | Type  | Learning mode | No. of rules | Training time (sec) | Testing RMSE
ANFIS     | TSK-1 | Offline       | 10.70 ± 0.67 | 2.3418 ± 0.2218     | 0.0811 ± 0.1286
DFNN      | TSK-1 | Semi-online   | 12.20 ± 1.75 | 24.6964 ± 2.9012    | 0.1087 ± 0.2469
GDFNN     | TSK-1 | Semi-online   | 17.30 ± 2.98 | 48.5295 ± 10.2815   | 0.0818 ± 0.1197
FAOS-PFNN | TSK-0 | Semi-online   | 13.40 ± 1.58 | 1.6689 ± 0.1496     | 0.0377 ± 0.0059
BARTFIS   | TSK-1 | Online        | 3.70 ± 0.67  | 0.3488 ± 0.0118     | 0.0524 ± 0.0117

TSK-0/1: zero/first-order Takagi-Sugeno-Kang fuzzy system

Fig. 6: Projections of BARTFIS cluster centers for Mackey-Glass data

Following the setting adopted in [27], 6000 observations are produced by means of the fourth-order Runge-Kutta method using a step size of 0.1 and a delay coefficient τ = 17. The observations for 201 ≤ t ≤ 3200 and 5001 ≤ t ≤ 5500 are uniformly distributed in the range [0.4, 1.4], and are used as the training and testing sets, respectively. The objective is to forecast the signal several steps ahead of the current time t. For this experiment, the (user) parameters of the BARTFIS system are configured as ρ = 0.009, α = 0.5, and β = 0.005. Four input features are used, namely y(t), y(t − 6), y(t − 12), and y(t − 18), while the output to be forecasted is y(t + 85).

Fig. 7: Training traces for Mackey-Glass data: (a) rule trace, (b) error trace

Projections of the centers of the identified rules (clusters) onto the input data space are shown in Fig. 6. From the figure, it can be seen that the rules are fairly well scattered, thus providing good coverage of the input data. The training traces of the BARTFIS system are summarized in Fig. 7. Fig. 7(a) depicts the evolution of the rule base, which grows and shrinks at the beginning of the training process and stabilizes toward the end. Fig. 7(b), on the other hand, shows the trace of the output error during the course of training. Again, the output error tends to decrease as more data points are presented to the system. Fig. 8 subsequently compares the testing output forecasted by the BARTFIS system with the target (testing) series. As shown, the system manages to approximate the target series rather well.

The consolidated results of the BARTFIS system are presented in Table IV, where the proposed system is benchmarked against various renowned neural and fuzzy neural approaches: the resource-allocating network (RAN) [26], evolving self-organizing map (ESOM) [28], evolving fuzzy neural network (EFuNN) [18], dynamic evolving neural-fuzzy inference system (DENFIS) [21], evolving Takagi-Sugeno (eTS) [27], simplified eTS (Simpl eTS) [29], and the sequential adaptive fuzzy inference system (SAFIS) [11]. The generalization performance of the system on the testing set is evaluated based on the non-dimensional error index (NDEI), defined as the (testing) root mean squared error divided by the standard deviation of the target series [21].
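For concreteness, the lag-based feature construction described above (inputs y(t), y(t − 6), y(t − 12), y(t − 18); target y(t + 85)) and the NDEI metric can be sketched as follows. This is our own illustrative code, not the paper's implementation, and the function names are ours.

```python
import numpy as np

def make_lag_dataset(series, lags=(0, 6, 12, 18), horizon=85):
    """Build input rows [y(t), y(t-6), y(t-12), y(t-18)] paired
    with targets y(t+85), for all t where both are available."""
    series = np.asarray(series)
    start = max(lags)                 # earliest t with all lagged values
    stop = len(series) - horizon      # latest t whose target exists
    X = np.column_stack([series[start - l:stop - l] for l in lags])
    y = series[start + horizon:stop + horizon]
    return X, y

def ndei(y_true, y_pred):
    """Non-dimensional error index: testing RMSE divided by the
    standard deviation of the target series."""
    err = np.asarray(y_true) - np.asarray(y_pred)
    return np.sqrt(np.mean(err ** 2)) / np.std(y_true)
```

For example, on a ramp series np.arange(200.0) the first input row is [18, 12, 6, 0] with target 103, i.e. y(18 + 85); a perfect forecaster attains NDEI = 0.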
Fig. 8: Desired and predicted outputs for Mackey-Glass data

TABLE IV: Benchmark results on Mackey-Glass data

Method    | Type  | Learning mode | No. of rules/nodes | Testing NDEI
RAN       | NN    | Online        | 113 | 0.373
ESOM      | NN    | Offline       | 114 | 0.320
EFuNN     | TSK-1 | Semi-online   | 193 | 0.401
DENFIS    | TSK-1 | Semi-online   | 58  | 0.276
eTS       | TSK-1 | Online        | 99  | 0.356
Simpl eTS | TSK-1 | Online        | 21  | 0.376
SAFIS     | TSK-0 | Online        | 21  | 0.380
BARTFIS   | TSK-1 | Online        | 30  | 0.279

NN: neural network, TSK-0/1: zero/first-order Takagi-Sugeno-Kang fuzzy system

The results in Table IV show that the proposed system is able to produce competitive performance. In particular, the

system achieves a relatively low NDEI of 0.279, and performs well with a moderate 30 rules. Note that, while DENFIS gives a comparable NDEI in this case, its rule base size is nearly twice that of BARTFIS, implying a higher storage requirement. In addition, the DENFIS (multi-pass) learning procedure normalizes the data before training, which implicitly assumes that the upper

and lower bounds of the data (and thus all training samples) are known a priori. Therefore, DENFIS is not a fully online system. On the other hand, the proposed system yields slightly more rules than SAFIS and Simpl eTS. Nonetheless, the approximation accuracies of the latter are found to be poorer than that of BARTFIS. In summary, these results demonstrate that the proposed system can effectively balance approximation accuracy and model simplicity.

V. CONCLUSION

A novel approach to modeling and prognosis of machining processes, termed the Bayesian Adaptive Resonance Theory-Based Fuzzy Inference System (BARTFIS), is presented in this paper. In the proposed system, rule construction is achieved via the online BART algorithm, which makes it both efficient and statistically sound. Refinement of the rule base parameters is subsequently carried out using the DEKF algorithm, which offers an efficient recursive parameter optimization method. Finally, inconsequential rules are identified and pruned to simplify the system's structure without substantially degrading its prediction performance. The effectiveness of the proposed approach has been validated through experimental studies on tool wear prognosis and chaotic time series prediction tasks. The results demonstrate that the online learning procedure of BARTFIS is not only efficient, but also yields both competitive prediction accuracy and a compact, intuitive structure. By extension, these findings suggest that the proposed system can be scaled up and applied to more complex machining operations as well as other real-world problems. As future work, investigations of advanced methods for rule base reduction and interpretability enhancement (see [30], for instance) will be carried out.

ACKNOWLEDGMENT

This research is supported by the A*STAR Science and Engineering Research Council Singapore-Poland Programme.
The authors also thank the Singapore Institute of Manufacturing Technology for kindly providing the tool wear data.

REFERENCES

[1] M. Chandrasekaran, M. Muralidhar, C. Murali Krishna, and U. S. Dixit, "Application of soft computing techniques in machining performance prediction and optimization," International Journal of Advanced Manufacturing Technology, vol. 46, pp. 445–464, 2010.
[2] A. G. Rehorn, J. Jiang, and P. E. Orban, "State-of-the-art methods and results in tool condition monitoring: a review," International Journal of Advanced Manufacturing Technology, vol. 26, no. 7, pp. 693–710, 2005.
[3] P. Huang and J. C. Chen, "Neural network-based tool breakage monitoring system for end milling operations," Journal of Industrial Technology, vol. 16, no. 2, pp. 2–7, 2000.
[4] S. Elanayar and Y. Shin, "Robust tool wear estimation via radial basis function neural networks," Journal of Dynamic Systems, Measurement, and Control, vol. 117, pp. 459–467, 2001.
[5] V. Susanto and J. C. Chen, "Fuzzy logic based in-process tool-wear monitoring system in face milling operations," International Journal of Advanced Manufacturing Technology, vol. 21, pp. 1433–3015, 2003.
[6] P. Bhattacharyya, D. Sengupta, and S. Mukhopadhyay, "Cutting force-based real-time estimation of tool wear in face milling using a combination of signal processing techniques," Mechanical Systems and Signal Processing, vol. 21, pp. 2665–2683, 2007.
[7] C. T. Lin and C. S. G. Lee, Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent Systems. Upper Saddle River, NJ: Prentice Hall, 1996.
[8] R. J. Kuo, "Multi-sensor integration for on-line tool wear estimation through artificial neural networks and fuzzy neural network," Engineering Applications of Artificial Intelligence, vol. 13, no. 3, pp. 249–261, 2000.
[9] P. Fu and A. D. Hope, "Intelligent classification of cutting tool wear states," in Advances in Neural Networks, Lecture Notes in Computer Science, vol. 3973, 2006, pp. 964–969.
[10] Z. Uros, C. Franc, and K. Edi, "Adaptive network-based inference system for estimation of flank wear in end-milling," Journal of Materials Processing Technology, vol. 209, pp. 1501–1511, 2009.
[11] H. Rong, N. Sundararajan, G. Huang, and P. Saratchandran, "Sequential adaptive fuzzy inference system (SAFIS) for nonlinear system identification and prediction," Fuzzy Sets and Systems, vol. 157, no. 9, pp. 1260–1275, 2006.
[12] B. Vigdor and B. Lerner, "The Bayesian ARTMAP," IEEE Transactions on Neural Networks, vol. 18, no. 6, pp. 1628–1644, 2007.
[13] S. Grossberg, "Adaptive pattern recognition and universal encoding II: Feedback, expectation, olfaction, and illusions," Biological Cybernetics, vol. 23, pp. 187–202, 1976.
[14] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. New York: Wiley, 2001.
[15] G. V. Puskorius and L. A. Feldkamp, "Decoupled extended Kalman filter training of feedforward layered networks," in Proceedings of the International Joint Conference on Neural Networks, vol. 1, Seattle, 1991, pp. 771–777.
[16] T. Takagi and M. Sugeno, "Fuzzy identification of systems and its applications to modeling and control," IEEE Transactions on Systems, Man and Cybernetics, vol. 15, no. 1, pp. 116–132, 1985.
[17] J.-S. R. Jang, "ANFIS: Adaptive-network-based fuzzy inference system," IEEE Transactions on Systems, Man and Cybernetics, vol. 23, no. 3, pp. 665–685, 1993.
[18] N. Kasabov, "Evolving fuzzy neural networks: Algorithms, applications and biological motivation," in Methodologies for the Conception, Design and Application of Soft Computing, T. Yamakawa and G. Matsumoto, Eds. Singapore: World Scientific, 1998, pp. 271–274.
[19] S. Wu and M. J. Er, "Dynamic fuzzy neural networks: A novel approach to function approximation," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 30, no. 2, pp. 358–364, 2000.
[20] S. Wu, M. J. Er, and Y. Gao, "A fast approach for automatic generation of fuzzy rules by generalized dynamic fuzzy neural networks," IEEE Transactions on Fuzzy Systems, vol. 9, no. 4, pp. 578–594, 2001.
[21] N. K. Kasabov and Q. Song, "DENFIS: Dynamic evolving neural-fuzzy inference system and its application for time-series prediction," IEEE Transactions on Fuzzy Systems, vol. 10, no. 2, pp. 144–154, 2002.
[22] N. Wang, M. J. Er, and X.-Y. Meng, "A fast and accurate online self-organizing scheme for parsimonious fuzzy neural networks," Neurocomputing, vol. 72, no. 16-18, pp. 3818–3829, 2009.
[23] V. Kadirkamanathan and M. Niranjan, "A function estimation approach to sequential learning with neural networks," Neural Computation, vol. 5, no. 6, pp. 954–975, 1993.
[24] E. G. Ng, D. W. Lee, R. C. Dewes, and D. K. Aspinwall, "Experimental evaluation of cutter orientation when ball nose end milling Inconel 718," Journal of Manufacturing Processes, vol. 2, no. 2, pp. 108–115, 2000.
[25] M. C. Mackey and L. Glass, "Oscillation and chaos in physiological control systems," Science, vol. 197, no. 4300, pp. 287–289, 1977.
[26] J. Platt, "A resource-allocating network for function interpolation," Neural Computation, vol. 3, no. 2, pp. 213–225, 1991.
[27] P. Angelov and D. Filev, "An approach to online identification of Takagi-Sugeno fuzzy models," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 34, no. 1, pp. 484–498, 2004.
[28] D. Deng and N. Kasabov, "Evolving self-organizing maps for on-line learning, data analysis and modeling," in Proceedings of the IEEE International Joint Conference on Neural Networks, 2000, pp. 3–8.
[29] P. Angelov and D. Filev, "Simpl_eTS: A simplified method for learning evolving Takagi-Sugeno fuzzy models," in Proceedings of the IEEE International Conference on Fuzzy Systems, 2005, pp. 1068–1073.
[30] Y. Jin, "Fuzzy modeling of high-dimensional systems: Complexity reduction and interpretability improvement," IEEE Transactions on Fuzzy Systems, vol. 8, no. 2, pp. 212–221, 2000.
