SPARSE REPRESENTATION BASED ANOMALY DETECTION WITH ENHANCED LOCAL DICTIONARIES

Sovan Biswas, R. Venkatesh Babu

Video Analytics Laboratory, Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore, India

ABSTRACT

In this paper, we propose a novel approach for anomaly detection that models usual behaviour with an enhanced dictionary. The corresponding sparse reconstruction error indicates the anomaly. We compute the dictionaries, for each local region, from feature descriptors obtained from usual behavior. The novelty of the proposed work is in enhancing the local dictionaries based on the similarity of usual behavior with its spatial neighbors. Dictionary enhancement is achieved by appending a ‘transformed dictionary’ to the ‘local dictionary’. This ‘transformed dictionary’ is learned from the transformations of behavior patterns across two neighboring regions. We conduct experiments on the widely used UCSD Ped1 and Ped2 datasets to compare with existing algorithms, and demonstrate the improvement in anomaly detection with enhanced dictionaries over a typically learned local dictionary.

Index Terms: Anomaly Detection, Sparse Reconstruction, Dictionary Enhancement.

1. INTRODUCTION

Today’s world is filled with many potentially dangerous situations, such as bombings, and surveillance plays a critical role in averting them. Recent advances in hardware have made cameras and surveillance equipment cheaper, leading to huge volumes of video data. This is both a boon and a bane for security personnel: a boon, because it helps prevent potential catastrophes; a bane, because the huge amount of video that must be ‘mined’ to achieve even a simple objective poses a major challenge. Thus, an intelligent system is the need of the hour. One of the primary goals of intelligent surveillance is anomaly detection, since anomalies can have catastrophic results if not addressed in time. The problem is complicated by the diversity of possible scenarios, such as crowd density and type of anomaly.
Among these, anomaly detection in a high-density crowd is particularly challenging (Fig. 1) for two reasons: a) anomalous candidates can be very small, and b) moving objects interact with one another. Furthermore, the lack of training samples with appropriate statistics at all surveilled regions makes the problem difficult to solve. Many algorithms have been proposed in the recent past to address it, with reasonable success. Recently, sparse l1 solvers have been used in many computer vision areas, such as face recognition and tracking. Cong et al. [1] first introduced the sparse reconstruction concept to anomaly detection, using a dictionary learned over usual patterns. Their unique way of forming the bases from the spatio-temporal structure enabled them to achieve high accuracy for anomaly detection. However, the approach is restricted in scenarios where normal patterns are limited in a given spatial region. This issue has not been addressed adequately in the literature. Though the underlying training data is significantly sparse, there exists high correlation among spatially neighboring regions. This can be exploited to enhance the information at any given location. In this paper, we supplement the local dictionary by appropriately transforming the data from neighboring regions.

Fig. 1. A few samples of anomalous behavior (marked in red).

The rest of the paper is divided into five sections. Section 2 discusses recent work in anomaly detection. Section 3 briefly defines the proposed framework, followed by the detailed algorithm in Section 4. Experiments are discussed in Section 5, and Section 6 concludes the paper.

2. RELATED WORK

Many algorithms have been proposed recently for anomaly detection. According to Li et al. [2], anomaly detection algorithms can be coarsely divided into two major categories: trajectory based analysis [2, 3, 4] and feature based anomaly detection [5, 6, 7, 8]. Trajectory analysis learns normal behavior patterns by tracking normal objects/persons and their interactions, whereas video feature based analysis detects anomalies from low-level features extracted from space-time cubes. The proposed algorithm computes descriptors using optical flow at foreground regions to detect anomalies.

Video features were first used for anomaly detection by Itti and Baldi [9], who proposed Poisson modeling of descriptors computed at every location to detect surprising events. This was followed by Adam et al. [7], who used histograms of optical flow as local monitors to detect anomalies. Kim et al. [6] modeled normal patterns using a mixture of probabilistic principal component analyzers (MPPCA) at each node, followed by a space-time Markov random field to detect anomalies. Kratz et al. [10] harnessed spatio-temporal gradients modeled through a hidden Markov model to detect anomalies in a heavily crowded scene. Mahadevan et al. [5] successfully used a mixture of dynamic textures (MDT) to model normal crowd behavior. Varadarajan et al. [11] explored topic models for anomaly detection by modeling recurrent activities in long video sequences. More recently, Saligrama et al. [8] assumed that anomalies have significant local spatio-temporal signatures that occur over a very small interval, and developed a probabilistic framework to detect them.

Fig. 2. Frame blocks: the statistics within a block are similar and can be accumulated to form a local dictionary for subsequent sparse solving.

3. PROPOSED FRAMEWORK

An anomaly is defined as a departure from usual statistics. In other terms, given a set of usual d-dimensional data ($X = \{x \mid x \in \mathbb{R}^{d \times 1}\}$), a candidate ($y \in \mathbb{R}^{d \times 1}$) is abnormal if it deviates from the usual pattern ($X$). The amount of abnormality ($e$) can be defined in terms of a measure of deviation/distance. Mathematically,

$$ e \propto \mathrm{dist}(X, y) $$

where $\mathrm{dist}(\cdot, \cdot)$ indicates the deviation/distance between the test data ($y$) and the usual pattern ($X$). The deviation can be represented in many forms. One recent approach represents the given candidate as a sparse linear combination of an optimal subset of usual features, referred to as a ‘dictionary’ ($D \in \mathbb{R}^{d \times m}$), where $m$ is the number of atoms in the dictionary:

$$ D = f(X) \qquad (1) $$

where $f(\cdot)$ is a dictionary learning function on the usual data $X$. The anomaly can then be defined in terms of the reconstruction error, which serves as the measure of deviation. Reconstructing an abnormal test sample ($y$) with the corresponding dictionary results in a large error compared to that of usual test samples:

$$ e \propto \| y - D\alpha \|_2 \qquad (2) $$

where $\alpha \in \mathbb{R}^{m \times 1}$ denotes the sparse coefficients and $e$ is the reconstruction error, which indicates the measure of anomaly. Even though the solution is simple, it is limited by the learning of an appropriate dictionary (Eq. (1)), which depends on the amount of data available for learning. In this paper, we propose to enhance the currently learned dictionary by appending dictionaries learned from another set of similar data. Typically, different sets of data are related under some constrained transformation such as in Eq. (3):

$$ \tilde{X}_j = \Psi_{ji} X_i \quad \text{subject to} \quad X_j X_j^T = \tilde{X}_j \tilde{X}_j^T \qquad (3) $$

where $\Psi_{ji}$ is the transformation from data $X_i$ to data $\tilde{X}_j$. Each candidate sample is a linear combination of the atoms of the corresponding dictionary. Thus, for a given distribution of coefficients, we can consider the dictionaries and the underlying data to be related by the same transformation (Eq. (4)):

$$ X_i \approx D_i \alpha, \qquad \Psi_{ji} X_i \approx \Psi_{ji} D_i \alpha, \qquad \tilde{X}_j \approx \tilde{D}_j \alpha $$

$$ \tilde{D}_j = \Psi_{ji} D_i \qquad (4) $$

Thus, the transformed dictionary ($\tilde{D}$) enhances the existing dictionary ($D$) and helps in better representation of the normal pattern. Subsequently, the reconstruction error ($e$) can be expressed as

$$ e \propto w_1 \| y - D\alpha \|_2 + w_2 \| y - [D, \tilde{D}]\alpha \|_2 \qquad (5) $$

where $w_1$ and $w_2$ are the weights associated with the reconstruction error from the original dictionary and from the enhanced dictionary (the concatenation of the original and transformed dictionaries, i.e. $[D, \tilde{D}]$) respectively, with the constraint $w_1 \le w_2$. The reconstruction error is thus a weighted combination of the errors from the original and the enhanced dictionary, with the error from the enhanced dictionary penalized more heavily than that from the original dictionary.

4. DETAILED ALGORITHM

The proposed approach detects anomalies based on the reconstruction error obtained by solving an l1 minimization that sparsely represents a candidate using a dictionary of usual behavior. Typically, behavior varies widely with factors such as the layout of the scene, distance from the camera, topology of paths, etc. For instance, the flow magnitude near the camera differs from that far from the camera. Hence, a global dictionary for the entire scene is not suitable. The issue can be resolved by dividing the space into homogeneous motion regions and then solving the l1 minimization problem independently, based on dictionaries learned at each location. However, this makes the algorithm highly sensitive to location information, so we instead propose a local dictionary for each block of size $k \times k$, representing a larger region of the frame (Fig. 2). The proposed approach consists of four main stages: a) descriptor extraction, b) learning data transformation, c) local dictionary formation and enhancement, and d) anomaly detection.

4.1. Descriptor extraction

Descriptors play an important role in anomaly detection, and choosing descriptors suited to the type of anomaly makes detection more effective. For our experiments, we use histogram of optical flow (HOF), motion magnitude and foreground pixel occupancy for a ‘dense’ space-time cube of size $m \times m \times n$. The descriptors are explained in detail below.

4.1.1. Foreground pixel occupancy

Foreground pixels are obtained by background subtraction using the method of [12]. An anomaly is considered to be one that occupies the majority of a space-time cube. The foreground pixel occupancy of a space-time cube therefore indicates how important that cube is in capturing the foreground object.
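As a rough illustration of the foreground pixel occupancy described above, the following NumPy sketch computes the fraction of foreground pixels in each space-time cube of a binary foreground mask. The function name `foreground_occupancy`, the use of non-overlapping cubes, and the cropping of partial cubes at the borders are assumptions made for this example; the text does not specify how the ‘dense’ cubes are laid out.

```python
import numpy as np

def foreground_occupancy(mask, m=10, n=7):
    """Fraction of foreground pixels in each non-overlapping m x m x n cube.

    mask: boolean array of shape (frames, height, width), True = foreground.
    Returns an array of shape (frames // n, height // m, width // m).
    Assumption: cubes do not overlap and partial cubes at the borders are dropped.
    """
    t, h, w = mask.shape
    # Crop so that every axis is an exact multiple of the cube size.
    mask = mask[: t - t % n, : h - h % m, : w - w % m]
    # Split each axis into (number of cubes, cube extent) and average per cube.
    cubes = mask.reshape(t // n, n, h // m, m, w // m, m)
    return cubes.mean(axis=(1, 3, 5))

# Toy usage: a 14-frame, 20 x 20 mask with one fully occupied cube.
toy_mask = np.zeros((14, 20, 20), dtype=bool)
toy_mask[:7, :10, :10] = True
occupancy = foreground_occupancy(toy_mask, m=10, n=7)
```

Cubes with low occupancy could then be skipped before computing the remaining descriptors, although that filtering rule is likewise an assumption rather than something stated in the paper.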

4.1.2. Histogram of optical flow (HOF)

The histogram of optical flow provides an important indication of the motion pattern in a space-time cube. Thus, we use an l1-normalized HOF descriptor in which flow information is quantized into p bins (p = 10). The algorithm proposed by Liu [13] is used to estimate the optical flow. Each bin contains flow information corresponding to the foreground region only.

4.1.3. Motion magnitude

Motion magnitude can directly indicate an anomaly. For example, even though the flow direction of a skater and a walking person at a location may be similar, they differ drastically in flow magnitude. Thus, an additional bin is introduced for the mean flow magnitude of the foreground region in a space-time cube. The mean magnitude is further normalized using the median statistics of the $k \times k$ block (Fig. 2). Thus, we have a 12-dimensional descriptor for each space-time cube. Subsequently, we l2-normalize the descriptor prior to transformation learning and dictionary enhancement.

4.2. Learning data transformation across neighboring blocks

The local usual dictionary is learned on the data samples/descriptors at each block. However, the data samples at the current block are related to the data samples of other blocks under some constrained transformation such as Eq. (3). Consider two blocks i and j with corresponding l2-normalized data $X_i \in \mathbb{R}^{d \times m}$ and $X_j \in \mathbb{R}^{d \times n}$. Since the data samples of both blocks are l2-normalized, the transformation must preserve the dot products among data samples, so the transformation matrix $\Psi_{ji} \in \mathbb{R}^{d \times d}$ should have the following properties:

• $\det(\Psi_{ji}) = 1$ or $-1$

• $\Psi_{ij} = \Psi_{ji}^{-1}$

where d = 12 is the dimension of the features as described earlier. Since $X_j X_j^T = \tilde{X}_j \tilde{X}_j^T$,

$$ X_j X_j^T = (\Psi_{ji} X_i)(\Psi_{ji} X_i)^T $$

On eigenvalue decomposition,

$$ Q_j \Lambda_j Q_j^T = \Psi_{ji} Q_i \Lambda_i Q_i^T \Psi_{ji}^T \qquad (6) $$

where $Q_i$ and $Q_j$ are the eigenvectors of $X_i X_i^T$ and $X_j X_j^T$ respectively. If two blocks i and j have similar variation in their data $X_i$ and $X_j$, then the corresponding eigenvalues $\Lambda_i$ and $\Lambda_j$ are highly similar. Thus,

$$ Q_j = \Psi_{ji} Q_i \;\Rightarrow\; \Psi_{ji} = Q_j Q_i^{-1} \quad \text{subject to} \quad \| \Lambda_i - \Lambda_j \|_2 \le \epsilon \qquad (7) $$

So, the best transformation between two blocks is achieved when $\| \Lambda_i - \Lambda_j \|_2$ is minimum and below some threshold $\epsilon$. We compute $\| \Lambda_i - \Lambda_j \|_2$ between block j and neighboring block i, where j = i + h and h denotes the spatial neighborhood. Typically, $\| \Lambda_i - \Lambda_j \|_2$ is very small within a neighborhood and increases for blocks placed far apart.

4.3. Dictionary formation and enhancement

For optimal dictionary creation, we use the dictionary learning algorithm of Mairal et al. [14], which learns the dictionary by optimizing Eq. (8), although the proposed approach is suitable for any dictionary learning algorithm with l2-normalized atoms.

$$ \min_{D \in C,\; \alpha \in \mathbb{R}^{m \times n}} \; \frac{1}{n} \sum_{i=1}^{n} \left( \frac{1}{2} \| x_i - D\alpha_i \|_2^2 + \lambda \| \alpha_i \|_1 \right) \qquad (8) $$

where $D$ is the optimal dictionary learned over the set of admissible matrices $C$, $x_i$ is the l2-normalized i-th data sample in $X$, and $\alpha_i$ are its sparse coefficients over the dictionary $D$. The learned dictionary represents the underlying data in an optimal way, but fails to cater for unknown normal variation that is absent from the training samples of the current block yet occurs in the neighboring blocks. We enhance the current dictionary by learning the transformation ($\Psi$) based on the similarity of the underlying data. The transformed dictionary, defined as $\tilde{D}_j = \Psi_{ji} D_i$ in Eq. (4), is also used to find the reconstruction error.

4.4. Anomaly detection

The local dictionary and the enhanced dictionary both contain the normal behavior for a localized block. The anomaly is now defined using the reconstruction error ($e$) of a new candidate feature ($y$) in a given block with respect to both dictionaries. We compute the sparse coefficients by solving

$$ \min_{\alpha} \; \frac{1}{2} \| y - D\alpha \|_2^2 + \lambda_1 \| \alpha \|_1 + \frac{\lambda_2}{2} \| \alpha \|_2^2 \qquad (9) $$

where $\lambda_1$ and $\lambda_2$ are two regularization parameters, and the dictionary used in Eq. (9) is the concatenation of the $D$, $\tilde{D}$ and $T$ dictionaries. $T$ is the trivial dictionary consisting of $[I, -I]$. Extending the reconstruction error defined in Eq. (5), we compute the final reconstruction error ($e$) as:

$$ e = w_1 \| y - D\alpha \|_2 + w_2 \| y - [D, \tilde{D}]\alpha \|_2 + w_3 \| \alpha_T \|_1 \qquad (10) $$

where $w_1$, $w_2$ and $w_3$ are weights such that $w_3 \le w_1 \le w_2$, and $\alpha_T$ denotes the coefficients corresponding to the trivial dictionary only.

5. EXPERIMENTS

The focus of the proposed approach is to detect anomalies in a video by enhancing the local dictionary through transformations learned across neighbors. We demonstrate the effectiveness of the approach by evaluating the algorithm on two widely used anomaly datasets, UCSD Ped1 and Ped2 [5]. Ped1 consists of 34 training clips containing only usual behavior, whereas its testing set has 36 clips with anomalies in the form of cyclists, vehicles, etc. All the clips of Ped1 have 200 frames of size 238 × 158. Similarly, Ped2 consists of surveillance videos with a different topology, where training has 12 clips along with 16 clips for testing. Each clip has 120 to 180 frames of size 360 × 240. We performed all the experiments in MATLAB (with a mex implementation for optical flow computation) on a single core of a 3.4 GHz Intel i7 processor with 8 GB RAM.

5.1. Implementation details

For the experiments, the space-time cube for feature extraction is empirically set to a size of 10 × 10 × 7 for Ped1 and 15 × 15 × 7 for Ped2. We keep the number of atoms in the dictionary at 50 for a localized block of 10 × 10 for both datasets.
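To make the steps of Sections 4.2 to 4.4 more concrete, here is a minimal NumPy sketch under stated assumptions; it is not the authors' MATLAB implementation. It estimates a transformation between two blocks from the eigenvectors of their data (Eq. (7), using the fact that an orthogonal eigenvector matrix satisfies $Q_i^{-1} = Q_i^T$), solves the elastic-net problem of Eq. (9) with a plain ISTA loop (an assumed solver, since the paper does not name one), and evaluates Eq. (10). The function names, the values of `eps`, `lam1`, `lam2` and the weights, and the way the coefficient vector is split between the two error terms are all hypothetical choices for illustration.

```python
import numpy as np

def learn_transformation(X_i, X_j, eps=0.1):
    """Eq. (6)-(7): return Psi_ji = Q_j Q_i^T if the eigenvalue spectra agree, else None."""
    lam_i, Q_i = np.linalg.eigh(X_i @ X_i.T)   # eigenvalues in ascending order
    lam_j, Q_j = np.linalg.eigh(X_j @ X_j.T)
    if np.linalg.norm(lam_i - lam_j) > eps:
        return None                             # blocks too dissimilar to relate
    # Q_i is orthogonal, so Q_i^{-1} = Q_i^T. Eigenvector signs are arbitrary,
    # hence this is one valid orthogonal solution, not a unique one.
    return Q_j @ Q_i.T

def sparse_code(y, B, lam1=0.1, lam2=0.01, n_iter=500):
    """Approximately solve Eq. (9) over dictionary B with ISTA (assumed solver)."""
    step = 1.0 / (np.linalg.norm(B, 2) ** 2 + lam2)   # 1 / Lipschitz constant
    alpha = np.zeros(B.shape[1])
    for _ in range(n_iter):
        grad = B.T @ (B @ alpha - y) + lam2 * alpha    # gradient of the smooth part
        z = alpha - step * grad
        alpha = np.sign(z) * np.maximum(np.abs(z) - step * lam1, 0.0)  # soft threshold
    return alpha

def anomaly_score(y, D, D_tilde, w=(0.3, 0.5, 0.2)):
    """Eq. (10) with hypothetical weights (w1, w2, w3) satisfying w3 <= w1 <= w2."""
    d, m = D.shape
    k = D_tilde.shape[1]
    T = np.hstack([np.eye(d), -np.eye(d)])             # trivial dictionary [I, -I]
    alpha = sparse_code(y, np.hstack([D, D_tilde, T]))
    # Assumed reading of Eq. (10): each term uses the matching slice of alpha.
    e_local = np.linalg.norm(y - D @ alpha[:m])
    e_enhanced = np.linalg.norm(y - np.hstack([D, D_tilde]) @ alpha[: m + k])
    return w[0] * e_local + w[1] * e_enhanced + w[2] * np.abs(alpha[m + k:]).sum()

# Toy usage with d = 12: block j is an orthogonally transformed copy of block i.
rng = np.random.default_rng(0)
R, _ = np.linalg.qr(rng.standard_normal((12, 12)))     # random orthogonal matrix
X_i = rng.standard_normal((12, 40))
X_i /= np.linalg.norm(X_i, axis=0)                     # l2-normalized descriptors
Psi_ji = learn_transformation(X_i, R @ X_i)
D_i = X_i[:, :5]                                       # stand-in for a learned dictionary
D_tilde_j = Psi_ji @ D_i                               # transformed dictionary, Eq. (4)
```

In this toy setup the stand-in dictionary is simply a handful of data columns; in the paper the dictionary comes from the learning algorithm of [14], and the threshold and weights would have to be tuned on real data.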

Fig. 3. Representative results for different video sequences, where each row is from a single sequence: (a) video sequence Ped1 Test 001, (b) video sequence Ped1 Test 019. The anomalies are marked in red. (Best viewed in color.) (More results available at http://val.serc.iisc.ernet.in/AnomalyResultsICIP14)

Fig. 4. ROC curves (false positive rate vs. true positive rate/recall). a, b) Performance of the different approaches tested for frame-level anomaly detection on Ped1 (AUC: 85.85%, EER: 19.22%; compared with Sparse, LSA, MDT, MPPCA and Social Force) and Ped2 (AUC: 85.95%, EER: 20.38%; compared with MDT, MPPCA and Social Force) respectively. c) Frame-level comparison with and without dictionary enhancement on Ped2. d) Pixel-level comparison on Ped1 (AUC: 50.63%, EER: 48.92%; compared with Sparse, MDT and MPPCA).

Table 1. Frame-level comparison: equal error rate (EER) on the Ped datasets

Approaches                 Ped1      Ped2
SF [15, 5]                 31%       42%
MPPCA [6, 5]               40%       30%
SF-MPPCA                   32%       36%
MDT [5]                    25%       25%
Sparse [1]                 19%       -
LSA [8]                    16%       -
Ours (No Enhancement)      19.53%    21.26%
Ours (With Enhancement)    19.22%    20.38%

Table 2. Pixel-level comparison on Ped1: rate of detection (RD), area under the curve (AUC) and detection speed (frames per second)

Approaches                 RD        AUC       Detection speed
SF [15, 5]                 21%       17.9%     -
MPPCA [6, 5]               18%       20.5%     -
MDT [5]                    45%       44%       0.04 fps
Sparse [1]                 46%       46.1%     0.25 fps
Ours (With Enhancement)    51.02%    50.63%    ∼3 fps

5.2. Quantitative evaluation

The proposed algorithm achieves frame-level anomaly detection with an area under the curve (AUC) of 85.85% and an equal error rate (EER) of 19.22% on Ped1 (Fig. 4a and Tab. 1). Even without employing a spatio-temporal basis to detect anomalies, the proposed algorithm performs competitively with local statistical aggregates based anomaly detection (LSA) [8] and Sparse [1]. To quantify the localization of anomalies, Mahadevan et al. [5] proposed that a frame be considered correctly detected as abnormal only if at least 40% of the anomalous region is detected. Under this criterion, the proposed algorithm achieves 50.63% AUC and a 51.02% rate of detection (RD), outperforming other recent anomaly detection algorithms in pixel-level accuracy, as shown in Fig. 4d and Tab. 2.

On Ped2, the proposed approach achieves an EER of 20.38% and an area under the ROC curve of 85.95%, as depicted in Fig. 4b and Tab. 1, outperforming the other algorithms evaluated on Ped2. The proposed dictionary enhancement is also compared with a regularly learned local dictionary without changing the parameters (Fig. 4c).
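For readers unfamiliar with the two frame-level metrics reported above, the following pure-Python sketch shows one common way to compute the AUC and EER from per-frame anomaly scores and ground-truth labels. How the per-frame score is derived from the block-level reconstruction errors is not covered here, and the function names and the linear interpolation used for the EER are assumptions of this example rather than the evaluation code used in the paper.

```python
def roc_curve(scores, labels):
    """Return (fpr, tpr) lists obtained by sweeping a threshold over the scores.

    labels: 1 for an anomalous frame, 0 for a usual frame.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for k, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after the last frame of a group of tied scores.
        if k + 1 == len(ranked) or ranked[k + 1][0] != score:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr

def area_under_curve(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))

def equal_error_rate(fpr, tpr):
    """Error rate at which the false positive rate equals the miss rate (1 - TPR)."""
    for k in range(1, len(fpr)):
        d0 = fpr[k - 1] - (1 - tpr[k - 1])
        d1 = fpr[k] - (1 - tpr[k])
        if d0 <= 0 <= d1:
            if d1 == d0:
                return fpr[k]
            t = -d0 / (d1 - d0)          # linear interpolation along the segment
            return fpr[k - 1] + t * (fpr[k] - fpr[k - 1])
    return 1.0

# Toy usage: four frames, two of them anomalous.
fpr, tpr = roc_curve([0.9, 0.6, 0.7, 0.1], [1, 1, 0, 0])
auc_value = area_under_curve(fpr, tpr)
eer_value = equal_error_rate(fpr, tpr)
```

A lower EER and a higher AUC both indicate better separation between usual and anomalous frames, which is how the entries of Tab. 1 should be read.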

6. CONCLUSION

In this paper, we have used an enhanced dictionary of usual behavior descriptors to define the normal pattern, and the sparse reconstruction error to measure abnormality. The local dictionaries were enhanced by appending to the regular local dictionary a transformed dictionary obtained from the underlying transformation between sets of usual data/descriptors across neighboring regions. Experiments conducted on the widely used UCSD Ped1 and Ped2 datasets demonstrate better anomaly detection compared to a regularly learned local dictionary.

7. REFERENCES

[1] Yang Cong, Junsong Yuan, and Ji Liu, “Sparse reconstruction cost for abnormal event detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 3449–3456.

[2] Ce Li, Zhenjun Han, Qixiang Ye, and Jianbin Jiao, “Visual abnormal behavior detection based on trajectory sparse reconstruction analysis,” Neurocomputing, vol. 119, no. 0, pp. 94–100, 2013.

[3] Chris Stauffer and W. Eric L. Grimson, “Learning patterns of activity using real-time tracking,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 747–757, 2000.

[4] Claudio Piciarelli, Christian Micheloni, and Gian Luca Foresti, “Trajectory-based anomalous event detection,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 11, pp. 1544–1554, 2008.

[5] V. Mahadevan, W. Li, V. Bhalodia, and N. Vasconcelos, “Anomaly detection in crowded scenes,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 1975–1981.

[6] J. Kim and K. Grauman, “Observe locally, infer globally: a space-time MRF for detecting abnormal activities with incremental updates,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 2921–2928.

[7] A. Adam, E. Rivlin, I. Shimshoni, and D. Reinitz, “Robust real-time unusual event detection using multiple fixed-location monitors,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 3, pp. 555–560, 2008.

[8] Venkatesh Saligrama and Zhu Chen, “Video anomaly detection based on local statistical aggregates,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 2112–2119.

[9] L. Itti and P. Baldi, “A principled approach to detecting surprising events in video,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005, vol. 1, pp. 631–637.

[10] L. Kratz and K. Nishino, “Anomaly detection in extremely crowded scenes using spatio-temporal motion pattern models,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 1446–1453.

[11] Jagannadan Varadarajan, Sequential Topic Models for Mining Recurrent Activities and their Relationships: Application to long term video recordings, Ph.D. thesis, École Polytechnique Fédérale de Lausanne, 2012.

[12] “http://clickdamage.com/sourcecode/index.php”.

[13] Ce Liu, Beyond pixels: exploring new representations and applications for motion analysis, Ph.D. thesis, Massachusetts Institute of Technology, 2009.

[14] Julien Mairal, Francis Bach, Jean Ponce, and Guillermo Sapiro, “Online dictionary learning for sparse coding,” in Proceedings of the Annual International Conference on Machine Learning, 2009.

[15] R. Mehran, A. Oyama, and M. Shah, “Abnormal crowd behavior detection using social force model,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 935–942.
