Automated indicators for behavior interpretation

G. J. Burghouts*, B. van den Broek, B. G. Alefs, E. den Breejen, K. Schutte
*TNO Defence, Security and Safety, The Netherlands ([email protected])

Keywords: Monitoring of people, behavior interpretation, machine learning, expert knowledge, sensory feature extraction.

Abstract

While sensors are being deployed at an unprecedented scale, their use in the monitoring of hostile activities is still very limited. Monitoring by humans is demanding and expensive. To aid in the complex and semantic task of deciding whether a situation implies hostile intent, the authors describe a set of automated behavioral indicators based on camera and radar data.

1 Problem

This article considers the task of facilitating safety in public areas, for instance at demonstrations or in public transport. One of the foremost problems that the responsible parties face is the lack of affordable and well-functioning means to signal potential threats to public safety. One recent example is the 'MP3 murder' in Belgium, where a child was killed in a railway station after a quarrel about an audio player. The problem arises partly from limited human and technological resources for the complex task of interpreting the behavior of people or crowds.

2 Our approach

Our study aims at improving the technological resources that aid in the monitoring of people. Improving public safety requires safety professionals to take adequate actions, for which the relevant information must be available to them. We consider the delivery of sensory information: sensory data is filtered by a model of expert knowledge to retain relevant information about the perceived behavior of people. Our major innovation is intelligent software that connects (i) expert knowledge about human behavior and suspect activities, (ii) sensory perception, and (iii) action-oriented information delivery. The focus of this article is on indicating the behavior of persons and groups of people. The projected application is to deduce potential hostilities. Examples provided in this article are: detection of people, behavior of groups of people (group formation and splitting), persons moving to a suspicious spot, and detection of carried objects. The contribution of this article is two-fold:

• Trajectory-based behavioral indicators. We automatically deduce group trajectories and interactions between individuals.

• Appearance-based behavioral indicators. We deduce behavioral indicators from examples that are considered relevant by an expert.

The article contains experimental results illustrating how behavioral indicators may contribute to situation understanding.

3 Localization and visual feeds

In a first stage, we localize objects and retrieve their visual appearance. The sensors considered in this study are:

• Stereo radar. Used to detect, track and measure the speed of objects. Coverage: a range of 200 m with a field of view of 40 degrees.

• Cameras. Used to capture the appearance of objects. The wide field-of-view (FOV) camera captures the whole scene, whereas the narrow-FOV camera monitors objects in detail.

3.1 Localization on non-flat terrains

Radar measurements are particularly useful for discriminating moving objects from the stationary background. Hence this sensor is used to detect, track and measure the speed of objects. Moving objects are detected automatically by range-Doppler processing of the reflections, where clutter is removed by a local threshold [5]. Stereo processing is used to obtain the angle of the detected moving objects: the angle of a reflection is established from phase differences between the left and right radar. We thus obtain location and velocity information for a person. A more complete description of the radar processing can be found in [9]. The objective is to relate the detections to world coordinates, such that we know where persons are; see the example in Figure 1 (a). For instance, we are interested in the case where a person is getting close to a particular building that is to be monitored. The problem is that the terrain is not flat. For this purpose we exploit a digital elevation map (DEM), see Figure 1 (b). To relate the view from the sensors to the DEM, the geometry of the sensors is calibrated once by comparing a known track to a track obtained from the radar. We calculate the intersection of the radar detection with the DEM to obtain accurate object locations. Localization results are shown in Figure 1 (c). Note that, based on the visual information in Figure 1 (a), one might wrongly conclude that all detections (in the three circles) are equidistant. They are not: the upper-right detection (dashed green circle) is further away, as it is on the back side of a hill. The DEM-based localization, Figure 1 (c), correctly indicates that the detection behind the hill is further away.
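As an illustration of this localization step, the sketch below marches a radar detection (azimuth plus measured range) over a DEM height grid until the 3D distance to the sensor matches the measured range. This is a minimal sketch, not the deployed implementation; the grid layout, step size and the assumption that the detection lies on the terrain surface are ours.

```python
import numpy as np

def localize_on_dem(sensor_pos, azimuth, slant_range, dem, cell_size, step=0.5):
    """Place a radar detection (azimuth + slant range) on non-flat terrain.

    Marches outward along the azimuth over the DEM and returns the first
    terrain point whose 3D distance to the sensor reaches the measured
    slant range; None if the ray leaves the mapped area.
    """
    direction = np.array([np.cos(azimuth), np.sin(azimuth)])
    ground_range = step
    while ground_range <= slant_range:  # slant range bounds the ground range
        x, y = sensor_pos[:2] + ground_range * direction
        ix, iy = int(round(x / cell_size)), int(round(y / cell_size))
        if not (0 <= iy < dem.shape[0] and 0 <= ix < dem.shape[1]):
            return None  # detection outside the elevation map
        point = np.array([x, y, dem[iy, ix]])
        if np.linalg.norm(point - sensor_pos) >= slant_range:
            return point  # 3D distance now matches the radar measurement
        ground_range += step
    return None
```

On a hilly DEM, the same slant range yields a shorter ground range behind a hill crest, which is exactly the effect visible in Figure 1 (c).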


Figure 1. People are detected by radar (a). For localization the digital elevation map (DEM) is used (b), resulting in world coordinates (c). Locations are coupled to terrain type (c, inset), which provides an additional information source for behavioral analysis. Tracking requires calibration to world coordinates (d) and yields segmentations (e, green boxes) from which the visual appearance is extracted (f).

Figure 2. Automatic analysis of trajectories for two sequences: (a) persons crossing and (b) group splitting. Trajectories consist of connected locations (dots). These dots are the output of the localization procedure of Figure 1. To connect the dots, partial trajectories (in different colors) are constructed by local principal curves (black lines). If the local curves satisfy good continuation they are connected (red dashed lines) and split otherwise.

3.2 Retrieving the visual feeds

In addition to the radar, the wide-FOV camera tracks moving objects. The objective here is to attach to a localized object its visual appearance. To associate the radar detections and camera tracks, the object is also localized from the camera feeds. In this way, the radar and camera share a reference frame, i.e., the world coordinates. The relation between camera pixels and world coordinates is established by a calibration procedure that has to be performed once and does not require knowledge of the camera parameters [4], see Figure 1 (d). In a second step, we retrieve the visuals of objects by tracking them. We apply the foreground vs. background segmentation proposed in [7]. This tracker effectively clusters parts of the image based on homogeneous motion features and color features that stand out from the background. Results are depicted by green boxes in Figure 1 (e). From each track, we extract the segmented shape, as illustrated in Figure 1 (f).
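For a locally flat ground patch, the pixel-to-world relation reduces to a homography that can be estimated from a few known correspondences, e.g., points of the calibration track. The sketch below is a simplified planar stand-in for the self-calibration procedure of [4]; the point coordinates are made up for illustration.

```python
import numpy as np
import cv2

# Four (or more) image points with known world coordinates, e.g. taken
# from the calibration track. Values are illustrative only.
pixel_pts = np.float32([[312, 410], [705, 398], [640, 220], [180, 245]])
world_pts = np.float32([[0, 0], [10, 0], [10, 30], [0, 30]])  # meters

H, _ = cv2.findHomography(pixel_pts, world_pts)

def pixel_to_world(u, v):
    """Map a ground-plane pixel to world coordinates (meters)."""
    p = H @ np.array([u, v, 1.0])
    return p[:2] / p[2]  # dehomogenize

# A tracked object at pixel (500, 350) is placed in world coordinates,
# where it can be associated with the nearest radar detection.
print(pixel_to_world(500, 350))
```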

4 Behavioral indicators from trajectories

Tracking people over a longer period of time provides interesting information on whether people interact with each other. We aim to determine, among others, group formations, meetings, and the more suspicious group break-ups where everybody suddenly walks in a different direction. We have developed automated detectors for such events based on principal curve analysis [2]. Given a starting point, principal curve analysis finds a path through the point data that minimizes deviations along the principal line, as given by the first component of a local principal component analysis (PCA). A curve is found that, in a purely local manner, describes the strongest movement given a set of (noisy) object observations. Although principal curve analysis is robust to noisy observations, it does not naturally deal with occlusions. Our method identifies curve ambiguities and reconstructs trajectories after occlusion, group formation and splitting based on a good-continuation criterion. This criterion is quantified in terms of spatio-temporal angles (speed) and curve length (duration). Figure 2 shows two examples of our method, for sequences denoted as persons crossing and group splitting. Each left panel shows the clustered points (different colors) of a 3D trajectory. The black crosses indicate for each cluster the principal components in 2D, where the 3D-to-2D projection is based on optimal alignment between the clusters. The cluster alignment is based on the angle of intersection. Our method connects the clusters that satisfy the good-continuation criterion. The right panel of Figure 2 depicts the connection schemes, with the connections indicated by the red dashed lines. Connected clusters are correctly identified as continuous trajectories.
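The following sketch shows the two core ingredients in stripped-down form: the local principal direction of a point cluster and an angle-only version of the good-continuation test. The curve-length term of the criterion is omitted and the threshold value is our assumption.

```python
import numpy as np

def principal_direction(points):
    """First principal component of a cluster of (x, y, t) observations:
    the direction of strongest local movement."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

def good_continuation(cluster_a, cluster_b, max_angle_deg=30.0):
    """Angle-only continuation test: connect two partial trajectories if
    their principal directions are nearly aligned. The sign of a PCA
    direction is arbitrary, hence the absolute dot product."""
    da, db = principal_direction(cluster_a), principal_direction(cluster_b)
    cos_angle = np.clip(abs(np.dot(da, db)), 0.0, 1.0)
    return np.degrees(np.arccos(cos_angle)) <= max_angle_deg
```

Two partial curves interrupted by an occlusion typically keep nearly the same spatio-temporal direction, so the test above reconnects them; after a group split, the directions diverge and the clusters remain separate trajectories.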

5 Generic video features for individual behavioral indicators

Visual details are very informative of a person's behavior. For instance, one may be interested in whether a person is carrying an object, or in the more suspicious counterpart where the person is getting rid of it.

5.1 Dynamic, non-rigid objects as collections of parts

The rationale is to describe dynamic, non-rigid objects as collections of parts. Hence, in a video feed, we collect image patches from the object. The patches are detected from strong motions, as in Figure 3 (a). Motion is detected by an online Gabor filter [1], see Figure 3 (b). Its local maxima provide us with detections of moving video patches, see Figure 3 (c). In a small region around each detection, grayscale Hu-features [3] are computed, also for the frames before and after the detection, to obtain a spatio-temporal feature vector.
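A sketch of the feature extraction for a single detected patch is given below, assuming OpenCV for the moment computation. The patch size and the log-scaling of the Hu moments are our choices, not taken from the paper.

```python
import numpy as np
import cv2

def patch_features(frames, center, size=16):
    """Spatio-temporal feature vector for one detected moving patch:
    grayscale Hu moments of the patch in the current frame and its two
    temporal neighbours, concatenated (cf. Figure 3 (d)-(f)).

    `frames` are three consecutive grayscale images (uint8); `center`
    is the (x, y) patch detection from the motion filter."""
    x, y = center
    h = size // 2
    features = []
    for frame in frames:
        patch = frame[y - h:y + h, x - h:x + h]
        hu = cv2.HuMoments(cv2.moments(patch)).ravel()
        # Log-scale the Hu moments: their raw magnitudes span many orders.
        features.append(-np.sign(hu) * np.log10(np.abs(hu) + 1e-30))
    return np.concatenate(features)  # 3 frames x 7 Hu moments = 21 values
```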

5.2 Comparing moving objects, or sets of moving parts

To compare moving objects to each other, a similarity measure is needed between one set of Hu-feature vectors and another. In our application, the number of patches may vary, and thus the number of features. Hence the similarity function should be able to compare two sets of patches that are not of the same size. The Earth Mover's Distance (EMD) is able to express the dissimilarity between two sets of different sizes, based on individual feature-to-feature dissimilarities [6]. The EMD dissimilarity is the result of a minimization of the feature-to-feature assignment, resulting in larger dissimilarities if the features from the two sets are less similar. An application of the EMD dissimilarity measure is shown in Figure 4. The persons are reasonably similar (small EMD), whereas the person has a large dissimilarity to the dog (large EMD; see the figure caption for an elaboration). Interestingly, the persons are concluded to be similar even though their postures, poses and the colors of their clothes are very different. The EMD measure thus provides us a means to compare objects and their behavior.
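To make the EMD concrete, the sketch below solves the underlying transportation problem with a generic linear-programming routine, assuming uniform weights per set; a dedicated EMD solver as in [6] would be used in practice.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist

def emd(features_a, features_b):
    """Earth Mover's Distance between two sets of feature vectors of
    possibly different sizes, with uniform weights per set. A minimal
    transportation-LP sketch; since the total mass on each side is 1,
    the optimal transport cost equals the EMD."""
    m, n = len(features_a), len(features_b)
    cost = cdist(features_a, features_b)   # feature-to-feature distances
    A_eq = np.zeros((m + n, m * n))        # flow variables f[i, j], row-major
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0   # row i ships out exactly 1/m
    for j in range(n):
        A_eq[m + j, j::n] = 1.0            # column j receives exactly 1/n
    b_eq = np.concatenate([np.full(m, 1.0 / m), np.full(n, 1.0 / n)])
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq)  # flows are nonnegative
    return res.fun
```

With the patch features of section 5.1, `emd` between two persons yields a small value and between a person and the dog a large one, reproducing the behavior illustrated in Figure 4.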

6 Learning from expert knowledge

In this section, experts are enabled to specify the behavioral indicators that are relevant for a given application or scenario. The first issue is to design a selection tool. The second issue is to deduce an automatic indicator from the resulting selection. The selection tool enables the expert to select fragments of video that contain a particular behavior of interest (about 2-3 seconds).


Figure 3. Motion of a person (a) is measured by a temporal Gabor filter (b), yielding detections of moving patches (c). Around the image patches (d, left), Hu-features (d, right) are computed for three time samples (e) and accumulated into a feature vector (f). Patch locations are given in brackets.

Figure 4. The matches between the left (person A) and middle (person B) feature vectors are on average much better (small EMD dissimilarity) than the matches between the left (person A) and right (dog) feature vectors (large EMD dissimilarity).

Next, the expert needs to indicate four fundamental issues. The first issue is whether:

• The expert is interested solely in the case he indicated. This means that the behavioral indicator to be learned becomes a one-class classifier [8]. An example of one-class classification is person vs. non-person, where person is the class of interest.

• The expert is interested in two cases between which a distinction has to be made. The behavioral indicator then constitutes a two-class classifier. Two-class classification implies that the expert has to provide examples of both classes. An example is to decide whether a person carries an object or not: we learn the dynamics of both cases and search for the differences (see the third experiment).

Second, the expert has to indicate in which class he is interested; that is, the constructed behavioral indicator should know when to signal the user or system. For instance, the expert may only be interested in cases where a person is present that carries an object. Third, the object part of interest has to be indicated, and fourth, which type of feature is expected to be discriminative. A minimal sketch of the two classifier options follows below.
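The sketch assumes scikit-learn and hypothetical feature data; the paper does not specify the classifiers beyond the one-class/two-class distinction, so the SVM variants below are our stand-ins.

```python
import numpy as np
from sklearn.svm import OneClassSVM, SVC

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the expert-selected fragments: one feature
# vector per fragment (e.g., the accumulated patch features of section 5).
interest = rng.normal(0.0, 1.0, (40, 21))   # class of interest
counter = rng.normal(2.0, 1.0, (40, 21))    # counter-examples

# Option 1: one-class indicator, learned from the class of interest only,
# in the spirit of one-class classification [8] (person vs. non-person).
one_class = OneClassSVM(nu=0.1).fit(interest)
print(one_class.predict(interest[:3]))      # +1 = class of interest

# Option 2: two-class indicator; the expert provides examples of both
# cases (carrying an object or not) and the classifier learns the difference.
X = np.vstack([interest, counter])
y = np.array([1] * len(interest) + [0] * len(counter))
two_class = SVC(kernel="rbf").fit(X, y)
print(two_class.predict(counter[:3]))       # 0 = counter-class
```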

7 Appearance-based behavioral indicators

Experiments have been carried out to evaluate the appearance-based indicators as modeled by the video features and learned from expert input. For each behavioral indicator, Table 1 summarizes the expert input.

Supplemental materials. The results discussed below are examples taken from processed movies, which serve as supplemental material to this article and will be shown at the conference.

7.1 Is it a person in front of my camera?

Both the observation and the dynamics of the whole object are exploited to distinguish persons from non-persons. Figures 5 (a) and (b) show classified examples. Note that the classified persons have significantly different appearances, which makes the classification challenging. Furthermore, the non-person is a dog, which has moving legs just like the persons. Yet the indicator is able to discriminate the persons from the non-persons.

7.2 Is the person leaving the road?

To distinguish persons that are either on or off the road, again both the observation and the dynamics are exploited. The only difference with the previous experiment is the part on which the behavioral indicator focuses, i.e., patches beside the lower part of the person. In search of specific evidence, like in this case the detection of the road, the expert has an additional option in the selection tool to tune the classifier with an example image, in this case a small fragment that contains the road. This explains the preference of detected image patches

to be located on the road, see Figures 5 (c) and (d). The figure shows that the additional expert input enables the classifier to tune the patches of interest such that they are optimized to model the road. The behavioral indicator is able to discriminate whether persons are on or off the road, such that persons leaving the road can be identified.

7.3 Is the person carrying something?

To distinguish whether persons are carrying an object or not, the behavioral indicator focuses on the dynamics only. Dynamics are derived from the image patches beside the lower part of the person. The rationale is that persons who carry an object show less dynamics around the carried object. Figures 5 (e) and (f) show classified examples. The behavioral indicator is able to discriminate whether persons carry an object, such that persons getting rid of an object can be identified. Note that the behavioral indicator is able to specify the position of the carried object, as can be concluded from the red and orange patches. Interestingly, the high contrast of the carried object is not an informative detail: the hands of the person who is not carrying an object also display a high contrast.

8 Conclusions

In this article, two types of behavioral indicators have been proposed: trajectory-based and appearance-based indicators. Trajectories of moving people have been shown to provide information about group formation and interaction. For the appearance-based behavioral indicators, sensory features have been exploited to describe human behavior. From the sensory features, the behavioral indicators are learned automatically from expert input. The experimental results comprise three examples of the expressive power of the proposed framework. The combination of generic video features and only four parameters defined by the expert is powerful: we have illustrated that very specific problems in the field of situation awareness can be solved with a minimum of adjustment or tailoring.

Acknowledgements The authors kindly thank Laura Anitori and Rob Boekema for implementing the radar clustering and localization. Arthur Smith is acknowledged for specifying the required functionality of the proposed method. We are grateful to John Schavemaker, Jan-Willem Marck, Leon Kester and Albert Huizing for discussions about the proposed algorithms.

Table 1. Expert input for the behavioral indicators discussed in the experiments. The expert input concerns the machine learning parameters.

Behavioral indicator | One/two-class | Class of interest | Object part        | Properties
Person?              | One           | 1                 | whole              | observation and dynamics
On the road?         | One           | 2 (off the road)  | besides lower part | observation and dynamics
Carrying an object?  | Two           | 1                 | middle part        | dynamics only

Figure 5. The behavioral indicators discriminate between persons (a) and non-persons (b), whether a person is on the road (c) or not (d), and whether persons carry an object (e) or not (f). Image patches that are descriptive are colored red to yellow; for non-descriptive patches the coloring is towards blue.

References

[1] G. J. Burghouts and J. M. Geusebroek. Quasi-periodic spatiotemporal filtering. IEEE Transactions on Image Processing, 15(6):1572-1582, 2006.
[2] J. Einbeck, G. Tutz, and L. Evers. Local principal curves. Statistics and Computing, 15(4), 2005.
[3] M. K. Hu. Visual pattern recognition by moment invariants. IRE Transactions on Information Theory, 8:179-187, 1962.
[4] E. Malis and R. Cipolla. Multi-view constraints between collineations: application to self-calibration from unknown planar structures. In European Conference on Computer Vision, volume 2, pages 610-624, 2000.
[5] R. Nitzberg. Clutter map CFAR analysis. IEEE Transactions on Aerospace and Electronic Systems, 22(4):419-421, 1986.
[6] Y. Rubner, C. Tomasi, and L. J. Guibas. A metric for distributions with applications to image databases. In IEEE Conference on Computer Vision and Pattern Recognition, 1998.
[7] C. Stauffer and W. E. L. Grimson. Adaptive background mixture models for real-time tracking. In IEEE Conference on Computer Vision and Pattern Recognition, 1999.
[8] D. M. J. Tax. One-Class Classification: Concept-Learning in the Absence of Counter-Examples. PhD thesis, Delft University of Technology, June 2001.
[9] B. van den Broek, G. J. Burghouts, S. P. van den Broek, R. Hagen, L. Anitori, W. van Rossum, and A. Smith. Automatic detection of hostile behaviour. In SPIE Defence Europe, 2009.
