
An Anomaly Detection and Isolation Scheme with Instance-Based Learning and Sequential Analysis

Tae-Sic Yoo and Humberto E. Garcia
Sensors and Decision Systems Group
Idaho National Laboratory, P.O. Box 1625
Idaho Falls, ID 83415-6180 U.S.A.
{Tae-Sic.Yoo,Humberto.Garcia}@inl.gov

Abstract: This paper presents an online anomaly detection and isolation (FDI) technique using an instance-based learning method combined with a sequential change detection and isolation algorithm. The proposed method uses kernel density estimation to build a statistical model of the given empirical data (the null hypothesis). The null hypothesis is paired with a set of alternative hypotheses that model abnormal conditions of the system. The decision procedure is a sequential change detection and isolation algorithm. Notably, the proposed method is asymptotically optimal, because the applied change detection and isolation algorithm minimizes the worst mean detection/isolation delay for a given mean time before a false alarm or a false isolation. The applicability of this methodology and its performance are illustrated with a redundant sensor data set.

I. INTRODUCTION

Technologies enabling condition-based maintenance have drawn attention as a way to move from traditional time-directed, schedule-based (preventive) maintenance to the more effective predictive maintenance strategy. To meet this challenge, the nuclear engineering community has developed novelty detection techniques [1, 3–5, 11]. These techniques involve statistical learning methods and sequential analysis.

In this paper, we propose an alternative, simple FDI approach that complements existing FDI techniques. Our approach involves an instance-based learning method combined with a sequential change detection algorithm. The proposed method uses kernel density estimation to build a statistical model of the given empirical data (the null hypothesis). The null hypothesis is paired with a set of alternative hypotheses that model abnormal conditions of the system. The proposed approach then incorporates multiple simple alternative hypotheses and applies an optimal change detection and isolation algorithm. Figure 1 depicts our FDI scheme as a diagram.

Fig. 1. A simple FDI architecture

Our approach has many conceptual commonalities with novelty detection techniques such as the Multivariate State Estimation Technique (MSET) [11] and its variations (e.g., [3]), in that these procedures involve statistical learning techniques and sequential analysis. However, our approach is direct and simple because it does not estimate the measurement values in order to generate residual signals. Instead of examining the statistical properties of residual signals, our procedure examines the probability of the measurements directly, and therefore skips the step of computing estimates of the sampled signals. Also, the novelty detection logic uses a change detection and isolation procedure that is asymptotically optimal in minimizing the worst mean detection/isolation delay for a given mean time before a false alarm or a false isolation. This should improve on the prevalent practice of applying sequential analysis techniques heuristically. The applicability of this methodology is illustrated with empirical data collected from redundant sensors, one of which drifts.

This paper is organized as follows. In Section II, we describe the procedures of density estimation and hypothesis construction. Section III gives an optimal change detection and isolation algorithm. In Section IV, we apply the algorithmic procedure given in the previous sections to a redundant sensor data set from nuclear systems [2]. Section V concludes the paper with some remarks.

II. HYPOTHESIS CONSTRUCTION

Let $X = (x_i : 1 \le i \le p)$ be $p$ independent samples of a $d$-dimensional random vector with density function $f$ that represents the nominal operation of the system. We denote the $j$th sample by $x_j := (x_{1,j}, x_{2,j}, \ldots, x_{d,j})^T$. Different methods may be used to estimate the probability density from the given samples drawn from $f$, including parametric and nonparametric methods. Parametric methods assume a functional form for the probability density and optimize the free parameters of the density function. A typical example is the estimation of the mean and variance of a Gaussian distribution. Unlike parametric methods, nonparametric methods such as histograms do not assume a functional form. A widely used nonparametric density estimation technique is Kernel Density Estimation (KDE), and we adopt this technique in this paper to estimate multivariate probability density functions. The density estimate at a given query vector $x$ is

$$\hat{f}_H(x) = \frac{1}{p} \sum_{j=1}^{p} K_H(x - x_j)$$

where $K_H(x) = |H|^{-1/2} K(H^{-1/2} x)$, $K$ is a multivariate kernel function, and $H$ is a symmetric positive definite $d \times d$ bandwidth matrix. It is widely recognized that the choice of bandwidth is crucial for an accurate KDE, while the choice of kernel itself is largely irrelevant to the performance of KDE. We will use the Gaussian kernel in the applications in the later part of this paper. That is,

$$K(x) = \frac{1}{(\sqrt{2\pi})^d} \exp\left(-\frac{\|x\|^2}{2}\right).$$

An active area of research regarding KDE, especially in the multivariate case, is finding the optimal bandwidth matrix. The treatment of bandwidth selection is beyond the scope of this paper. We will use some conventional heuristics [10] for selecting bandwidths to show the benefits of our FDI scheme. The estimated density $\hat{f}$ represents the operation of the nominal system.

Now let us construct the set of alternative hypotheses, $\{f_1, f_2, \ldots, f_M\}$, representing the various abnormal conditions of the system. If experimentation and sampling under abnormal system conditions is feasible, we may directly sample from systems experiencing abnormal conditions. Otherwise, we may need to construct the alternative hypotheses from analytical assumptions about the abnormalities. A good example of such an assumption is a deviation of the mean value. This assumption may work well in applications such as sensor drift detection.

III. CHANGE DETECTION AND ISOLATION

We are interested in detecting and isolating anomalies in a sequential manner. Wald's Sequential Probability Ratio Test (SPRT) [12] has been used widely in designing detection systems. Wald's test is an optimal sequential hypothesis test satisfying given detection probability and false alarm constraints. When designing detection systems, however, we are interested not in testing a hypothesis but in detecting a change of the true hypothesis from one to another. These are two different problems, and there is a body of literature devoted to each. Page's CUSUM algorithm [9] provides an optimal procedure for the sequential change detection problem [6], just as SPRT does for sequential hypothesis testing.

Let us recall Page's CUSUM algorithm. Consider a sequence of independent random variables $X_1, X_2, \ldots$. Let $X_1, X_2, \ldots, X_{v-1}$ be an i.i.d. sequence with probability density $f_0$, and $X_v, X_{v+1}, \ldots$ be an i.i.d. sequence with probability density $f_1$. The time of change $v$ is not known. Many sequential analysis techniques, including Page's CUSUM test, involve the following sum of log-likelihood ratios of probability densities:

$$G_n^i = \sum_{m=n}^{i} \left[ \log f_1(X_m) - \log f_0(X_m) \right]$$

where the CUSUM statistic finds the subsequence that favors density $f_1$ over $f_0$ the most. That is,

$$Q_i = \max_{1 \le n \le i} G_n^i.$$
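As a minimal sketch of the statistic above (not the authors' implementation), the following Python snippet computes $Q_i$ directly from its definition for two Gaussian densities and compares it with the usual zero-clipped running recursion. The densities, the change time, the threshold, and all function names are assumptions chosen purely for illustration.

```python
import math
import random

def log_gauss(x, mean, std):
    """Log-density of a univariate Gaussian (illustrative choice for f0 and f1)."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - 0.5 * ((x - mean) / std) ** 2

def cusum_direct(llr):
    """Q_i = max over 1 <= n <= i of sum_{m=n}^{i} llr[m], straight from the definition."""
    stats = []
    for i in range(len(llr)):
        best = -math.inf
        tail = 0.0
        for n in range(i, -1, -1):   # grow the window backwards from sample i
            tail += llr[n]
            best = max(best, tail)
        stats.append(best)
    return stats

def cusum_recursive(llr):
    """Zero-clipped recursion: Q_i = max(Q_{i-1} + llr_i, 0) with Q_0 = 0."""
    stats, q = [], 0.0
    for step in llr:
        q = max(q + step, 0.0)
        stats.append(q)
    return stats

def first_crossing(stats, h):
    """First 1-based index at which the statistic reaches h, or None."""
    for i, q in enumerate(stats, start=1):
        if q >= h:
            return i
    return None

# Hypothetical example: the mean shifts from 0 to 1 at sample 101.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(1.0, 1.0) for _ in range(100)]
llr = [log_gauss(x, 1.0, 1.0) - log_gauss(x, 0.0, 1.0) for x in samples]

h = 5.0  # assumed threshold
direct = cusum_direct(llr)
recursive = cusum_recursive(llr)
print("stop (direct):   ", first_crossing(direct, h))
print("stop (recursive):", first_crossing(recursive, h))
```

The direct form costs O(i) work for each new sample, whereas the recursion costs O(1). Because the recursion only replaces negative values by zero, both forms reach any threshold h > 0 at the same sample.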

Given a threshold $h > 0$, Page's CUSUM test simply determines whether $Q_i \ge h$. The recursive form of this test is

$$Q_i = \max\left( Q_{i-1} + \log f_1(X_i) - \log f_0(X_i),\ 0 \right) = \max_{1 \le n \le i} G_n^i$$

with $Q_0 = 0$. (The recursion clips the statistic at zero, so the two expressions agree whenever the maximum is nonnegative, which is the only case that matters for a threshold $h > 0$.) The stopping time $N$ of CUSUM is the first time when $Q_i \ge h$. That is, $N = \inf\{i \ge 1 : Q_i \ge h\}$. The stopping time $N$ is asymptotically optimal in the following sense [6]:

$$E_0(N) \ge e^h \quad \text{and} \quad \tau_1(N) = E_1(N) \sim \frac{h}{\rho} \quad \text{as } h \to \infty,$$

where $E_j(N)$ denotes the mean time before a false alarm or a false isolation of $j$-type, $\tau_1(N)$ denotes the worst mean detection/isolation delay of a 1-type change, and $\rho$ is the Kullback-Leibler distance between the two densities.

For multiple alternatives, consider a sequence of independent random variables $X_1, X_2, \ldots$. Let $X_1, X_2, \ldots, X_{v-1}$ be an i.i.d. sequence with probability density $f_0$, and $X_v, X_{v+1}, \ldots$ be an i.i.d. sequence with probability density $f_j \in \{f_1, \ldots, f_M\}$. The time of change $v$ to one of the alternative hypotheses is not known, and we are interested in detecting and isolating the change from $f_0$ to $f_j$ in a sequential manner. Nikiforov's treatment in [7] generalizes the CUSUM procedure to accommodate multiple alternatives and shows asymptotic optimality of the algorithm in minimizing the worst mean detection/isolation delay for a given mean time before a false alarm or a false isolation. A drawback of Nikiforov's procedure is that it is not recursive. Subsequent developments in this line of research include [8], which provides a recursive procedure while preserving the asymptotic optimality of Nikiforov's procedure. We recall the procedure in [8] below and use it for the detection and isolation logic of our proposed FDI scheme.

Given $h > 0$, the pair $(\hat{N}, \hat{\delta})$ is defined by

$$\hat{N} = \min\{\hat{N}_1, \hat{N}_2, \ldots, \hat{N}_M\} \quad \text{and} \quad \hat{\delta} = \arg\min_{1 \le j \le M} \hat{N}_j$$

where

$$\hat{N}_j = \inf\left\{ i \ge 1 : \min_{0 \le k \ne j \le M} \ \max_{1 \le n \le i} G_n^i(j, k) \ge h \right\}$$

and

$$G_n^i(j, k) = \sum_{m=n}^{i} \left[ \log f_j(X_m) - \log f_k(X_m) \right].$$

Note that the CUSUM statistic of the $j$th hypothesis against the $k$th hypothesis, $\max_{1 \le n \le i} G_n^i(j, k)$, is recursively computable. Therefore, the minimum CUSUM statistic of the $j$th hypothesis against the other hypotheses, $\min_{0 \le k \ne j \le M} \max_{1 \le n \le i} G_n^i(j, k)$, is also recursively computable. The stopping time $\hat{N}$ is the instant when the minimum CUSUM statistic of one of the hypotheses, say $\hat{\delta}$, reaches the given threshold $h$. This is the moment when the procedure declares that the true hypothesis has changed from the null hypothesis to $\hat{\delta}$.

IV. APPLICATIONS

A. Redundant Sensor

Figure 2 shows a redundant sensor data set (9 sensors) that was collected from an operating nuclear reactor [2]. We note that there is a downward drift in one of the redundant sensors in Fig. 2 (indicated as the fifth sensor in the figure). We used the first 200 vector data points ($\{x_1, \ldots, x_{200}\}$) to build the null hypothesis, where $x_j := (x_{1,j}, x_{2,j}, \ldots, x_{9,j})^T$. The remaining 600 vectors were used for testing the detection and isolation performance. Assuming that the measurement noise characteristics would not change, we built 18 alternative hypotheses representing upward and downward drifts of size $\sigma_i$ in the $i$th sensor. In this example, we obtained $\sigma_i$ from the variance of the first 200 data points of the $i$th sensor:

$$\sigma_i^2 := \mathrm{Var}(\{x_{i,1}, \ldots, x_{i,200}\}).$$

The Gaussian kernel is used, with the bandwidth selected using a rule of thumb described in [10]. With the above procedure, we built 18 alternative hypotheses from the null hypothesis.

Fig. 2. Redundant Sensor Data. [Plot: "Training (o) & Test (x) Data for Redundant Sensors"; x-axis: Observation Number (0 to 800); y-axis: Sensor Output (60 to 64); annotation: "The 5th sensor drifts down".]

For the CUSUM test with multiple alternatives described in Section III, we used the threshold $h = 20$. Figure 3 shows the minimum CUSUM statistic of each hypothesis against the other hypotheses. For the sake of visibility, we plotted only every 10th minimum CUSUM statistic of each hypothesis. To visualize the moment of decision, Figure 4 shows the minimum CUSUM statistic of each hypothesis against the other hypotheses, saturated at 20. This shows that the procedure detects and isolates the change from the null hypothesis to the 9th hypothesis (lower drift of sensor 5) at the 79th test sample, because the minimum CUSUM statistic of the 9th hypothesis against the other hypotheses exceeds the threshold $h$ at that moment. Figure 5 shows the complete minimum CUSUM statistic of the 9th hypothesis against the other hypotheses. Note that there is no mis-isolation, because only the minimum CUSUM statistic of the 9th hypothesis against the other hypotheses exceeds the threshold $h$ over the entire run.
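To make the procedure of Sections II to IV concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. It uses synthetic data rather than the data set from [2]. It assumes a diagonal bandwidth matrix chosen by a normal-reference rule of thumb, and it assumes that each alternative density is the null density estimate shifted by plus or minus one standard deviation in a single sensor. The function names (`silverman_bandwidths`, `kde_logpdf`, `matrix_cusum`), the three-sensor setup, and the threshold value are illustrative choices.

```python
import math
import random

def silverman_bandwidths(train):
    """Per-sensor bandwidths from a normal-reference rule of thumb (assumed form)."""
    p, d = len(train), len(train[0])
    factor = (4.0 / ((d + 2) * p)) ** (1.0 / (d + 4))
    stds = []
    for i in range(d):
        column = [row[i] for row in train]
        mean = sum(column) / p
        stds.append(math.sqrt(sum((v - mean) ** 2 for v in column) / (p - 1)))
    return [factor * s for s in stds], stds

def kde_logpdf(x, train, bw):
    """Log of the Gaussian-kernel density estimate with H = diag(bw_i^2)."""
    d = len(x)
    log_norm = -0.5 * d * math.log(2.0 * math.pi) - sum(math.log(b) for b in bw)
    exponents = [-0.5 * sum(((x[i] - xj[i]) / bw[i]) ** 2 for i in range(d)) for xj in train]
    top = max(exponents)  # log-sum-exp guards against underflow
    return log_norm - math.log(len(train)) + top + math.log(sum(math.exp(e - top) for e in exponents))

def matrix_cusum(loglik_rows, h):
    """Recursive multi-hypothesis CUSUM; loglik_rows[i][k] = log f_k(X_i), k = 0..M.

    Returns (stopping time, isolated hypothesis), or None if h is never reached.
    """
    m = len(loglik_rows[0]) - 1
    # s[j][k] holds the zero-clipped CUSUM of hypothesis j against hypothesis k.
    s = [[0.0] * (m + 1) for _ in range(m + 1)]
    for i, ll in enumerate(loglik_rows, start=1):
        for j in range(1, m + 1):
            for k in range(m + 1):
                if k != j:
                    s[j][k] = max(s[j][k] + ll[j] - ll[k], 0.0)
        minima = {j: min(s[j][k] for k in range(m + 1) if k != j) for j in range(1, m + 1)}
        best = max(minima, key=minima.get)  # tie-break by largest statistic (assumption)
        if minima[best] >= h:
            return i, best
    return None

# Synthetic stand-in for the sensor data (NOT the data set from [2]).
rng = random.Random(1)
d, noise = 3, 0.1
train = [[62.0 + rng.gauss(0.0, noise) for _ in range(d)] for _ in range(200)]
test = [[62.0 + rng.gauss(0.0, noise) for _ in range(d)] for _ in range(150)]
for row in test[50:]:
    row[1] -= noise  # second sensor steps down by about one sigma after sample 50

bw, stds = silverman_bandwidths(train)
# Hypothesis 0 is the null; for 0-based sensor i, hypotheses 2i+1 / 2i+2 shift it up / down.
shifts = [[0.0] * d]
for i in range(d):
    for sign in (1.0, -1.0):
        shift = [0.0] * d
        shift[i] = sign * stds[i]
        shifts.append(shift)

# f_k(x) = f_0(x - shift_k): evaluate the null estimate at the un-shifted point.
loglik_rows = [
    [kde_logpdf([x[i] - shift_k[i] for i in range(d)], train, bw) for shift_k in shifts]
    for x in test
]
print(matrix_cusum(loglik_rows, h=10.0))
```

In this setup the downward shift of the second sensor corresponds to hypothesis index 4; the exact stopping time depends on the random draw. Each sample updates the pairwise statistics in O(M^2) time, which is what makes the procedure recursive, while the dominant cost in this sketch is evaluating the density estimate against every training sample.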

Fig. 3. CUSUM with multiple alternatives. [3-D plot; axes: Hypothesis, Time (× 10), Minimum CUSUM statistics.]

Fig. 4. Matrix CUSUM evolution with threshold at 20. [3-D plot; axes: Hypothesis, Time (× 10), Thresholded Minimum CUSUM statistics.]

Fig. 5. Minimum CUSUM evolution of 9th hypothesis with threshold at 20. [x-axis: Time (0 to 600).]

V. CONCLUDING REMARKS

We proposed a novel procedure for designing simple but very effective detection systems that takes advantage of two mature disciplines: 1) density estimation and 2) change detection and isolation. Our two-step procedure constructs hypotheses via density estimation techniques and then applies a change detection and isolation algorithm. Performance comparisons against other comparable detection techniques such as MSET [11] and the Auto-Associative Kernel Regression method [3] are in progress.

VI. ACKNOWLEDGEMENT

The research reported in this paper was supported by the U.S. Department of Energy contract DE-AC07-05ID14517. We thank Prof. Wesley Hines and Mr. Dustin Garvey for sharing with us the PEM toolbox and the experimental data used in this paper.

REFERENCES

[1] P. Fantoni. Experiences and applications of PEANO for on-line monitoring in power plants. Progress in Nuclear Energy, 46(3-4):206–225, 2005.
[2] D. Garvey and J. W. Hines. Process & Equipment Monitoring Toolbox: User's Guide. Nuclear Engineering Department, The University of Tennessee.
[3] J. W. Hines, D. Garvey, J. Garvey, and R. Seibert. Nuclear application of on-line sensor calibration monitoring for safety critical sensors. In First World Congress on Engineering Asset Management, 2006.
[4] J. W. Hines and E. Davis. Implementation of on-line monitoring programs at nuclear power plants. In Proc. of 6th International Conference on Fuzzy Logic and Intelligent Technologies in Nuclear Science (FLINS), 2004.
[5] K. C. Gross and K. E. Humenik. Sequential probability ratio tests for reactor signal validation and sensor surveillance applications. Nucl. Sci. and Eng., 105:383–390, 1990.
[6] G. Lorden. Procedures for reacting to a change in distribution. Ann. Math. Statist., 42(6):1897–1908, 1971.
[7] I. V. Nikiforov. A generalized change detection problem. IEEE Transactions on Information Theory, 41(1):171–187, 1995.
[8] T. Oskiper and H. V. Poor. Online activity detection in a multiuser environment using the matrix CUSUM algorithm. IEEE Transactions on Information Theory, 48(2):477–493, 2002.
[9] E. S. Page. Continuous inspection schemes. Biometrika, 41:100–115, 1954.
[10] B. W. Silverman. Density Estimation for Statistics and Data Analysis. Chapman and Hall: London, 1986.
[11] R. M. Singer, K. C. Gross, J. P. Herzog, R. W. King, and S. W. Wegerich. Model-based nuclear power plant monitoring and fault detection: Theoretical foundations. In Proc. 9th Intl. Conf. on Intelligent Systems Applications to Power Systems, 1996.
[12] A. Wald. Sequential Analysis. John Wiley & Sons, 1947.
