Curvature Scale Space Based Affine-Invariant Trajectory Retrieval

Faisal Bashir, Ashfaq Khokhar

Department of Electrical and Computer Engineering, University of Illinois at Chicago, 851 S. Morgan St., Chicago, IL 60607. {fbashi1, ashfaq}@uic.edu

Abstract

In this paper, we outline a novel technique for robust indexing and retrieval of video object trajectories. Our approach uses spatio-temporal curvature to represent the trajectories. Dominant inflection points of the curvature are extracted at multiple levels of scale simultaneously, revealing structure at varying levels of detail. These inflection points, the maxima of the Curvature Scale Space (CSS), are then used for indexing and retrieval. Experiments on a database containing trajectories of high jumps and other motions show the affine-invariant nature of our approach.

1. Introduction

Object motion provides an important clue when dealing with the problem of content-based indexing and retrieval of video data. The first stage of this analysis requires spatio-temporal segmentation and tracking of the object of interest appearing in a video clip. Estimating motion content and tracking objects has been an active area of research over the past few decades. The drive towards it has mainly come from standardization efforts in video compression, viz. MPEG-1, MPEG-2 and MPEG-4. Motion estimation improves the compression efficiency of MPEG-1 and MPEG-2 coders, while object segmentation and tracking make object-based video coding possible in MPEG-4 [1]. Once object tracking has been performed, the rich motion content of the object, represented as a trajectory, can help mine more information about the video data than would otherwise be available.

Along these lines of object trajectory based video retrieval, Chen et al. [3] proposed a segmented trajectory based approach. The trajectories are segmented based on extrema in acceleration measured by high frequency wavelet coefficients. Each subtrajectory is then represented by a feature vector containing acceleration, velocity, and arc length. In [2], Bashir et al. segment the trajectories based on dominant curvature zero-crossings and represent each subtrajectory by using Principal Component Analysis (PCA).

One important aspect that many trajectory based indexing and retrieval schemes seem to overlook is the view-dependence arising from the multi-view nature of the problem. The same object trajectory captured from different viewpoints leads to entirely different representations in most of the raw trajectory based schemes. In this paper, we represent the object trajectory as an open-ended, 2-D planar curve. This allows us to cast the trajectory indexing and retrieval problem as one of shape representation and matching. Building on recent advances in affine-invariant shape matching, we address the problem of trajectory representation and retrieval in a view-independent setting.

Extending earlier work on shape representation based on Fourier descriptors and moment invariants, Hu [4] proposed a set of seven invariant moments derived from a set of second and third order moments (variance and skewness). This set of moments is invariant to translation, rotation and scale changes. Finite element methods (FEM) [9] have also been used as a shape representation tool. FEM defines a stiffness matrix that describes how each point on the object is connected to other points. The eigenvectors of the stiffness matrix are called eigen-modes and span the feature space. All the shapes are first mapped into this space, and similarity is then computed based on the eigenvalues. Mokhtarian et al. [6] proposed Curvature Scale Space, a multiscale curvature based shape representation technique for planar curves. The representation is computed by convolving a path-based parametric representation of the curve with a Gaussian function, as the standard deviation of the Gaussian varies from a small to a large value. At each level, the curvature zero-crossing points of the resulting curve are extracted and a binary image is made out of them, called the CSS image of the shape. In [7], the robustness of the CSS based representation under affine transformation is considered. Their results suggest that the maxima of the CSS image can be used for shape representation in similarity retrieval applications even under affine transformations.

The rest of the paper is organized as follows: Section 2 gives an overview of the CSS theory. Section 3 describes our approach to generating and indexing the CSS image to represent object trajectories, as well as the retrieval process. Section 4 presents the results and Section 5 presents the conclusions.

2. Curvature Scale Space

Over the last decade, scale-space based methods have gained popularity in computer vision [5]. The notion of scale in measured data is handled by representing the measured signal at multiple levels of detail, from the finest (the original signal) to the coarsest (the most-smoothed version). Witkin [12] first proposed this methodology to obtain a multi-scale representation of a measured signal by embedding the signal into a one-parameter family of derived signals, the scale-space. Given any 1-D signal f, the scale-space $L : Z \times R_+ \to R$, where $R_+$ denotes the set of non-negative real numbers, is generated by convolution with a one-parameter family of kernels, i.e.,

$$L(t;0) = f(t), \qquad L(t;\sigma) = \int_{\xi=-\infty}^{\infty} g(\xi;\sigma)\, f(t-\xi)\, d\xi \qquad (1)$$

The kernel used in a scale-space representation must satisfy the monotone property of the amount of structure in the signal. Specifically, for any $\sigma_2 > \sigma_1$ the number of local extrema in $L(t;\sigma_2)$ must not exceed the number of local extrema in $L(t;\sigma_1)$. The kernel g is a one-parameter family of symmetric functions satisfying the semi-group property under convolution:

$$g(\cdot;\sigma_1) \otimes g(\cdot;\sigma_2) = g(\cdot;\sigma_1+\sigma_2) \qquad (2)$$

and the normalization criterion:

$$\int_{\xi=-\infty}^{\infty} g(\xi;\sigma)\, d\xi = 1 \qquad (3)$$

In this case, it is necessary and sufficient that the kernel be of the form:

$$g(\xi;\sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\xi^2 / (2\sigma^2)} \qquad (4)$$

which is a zero-mean, symmetric Gaussian kernel of standard deviation σ. For this kernel, the additive scale parameter in Eq. (2) is to be read as the variance: convolving Gaussians of standard deviations $\sigma_1$ and $\sigma_2$ yields a Gaussian of standard deviation $\sqrt{\sigma_1^2+\sigma_2^2}$. The original signal is smoothed by a Gaussian kernel of varying width at each scale. The scale parameter $\sigma \in R_+$ represents the current level of scale, based on the amount of smoothing performed on the signal.

An object trajectory is essentially a 2-D signal made up of the x- and y-projections of the object centroid at each successive frame of the video clip. Hence, a trajectory is a parametric curve represented by the object's x- and y-locations over successive frames as

$$r(t) = \{x(t),\, y(t)\} \qquad (5)$$

The curvature κ(t) can be expressed as:

$$\kappa(t) = \frac{x'(t)\, y''(t) - y'(t)\, x''(t)}{\left[x'(t)^2 + y'(t)^2\right]^{3/2}} \qquad (6)$$

Curvature of a curve has desirable computational and perceptual characteristics. One such property is that it is invariant under planar rotation and translation of the curve. Thus, if $M : R^3 \to R^3$ is a rigid transformation, then

$$\int_a^b \kappa(r(t))\, dt = \int_a^b \kappa(M \circ r(t))\, dt \qquad (7)$$
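Eqs. (6) and (7) can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the trajectory is sampled once per frame, approximates the derivatives by finite differences, and uses a hypothetical S-shaped test trajectory. The helper names (`derivative`, `curvature`, `rigid_transform`) are illustrative only.

```python
import math

def derivative(values):
    """Central differences in the interior, one-sided at the two ends."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        out.append((values[hi] - values[lo]) / (hi - lo))
    return out

def curvature(xs, ys):
    """Eq. (6) evaluated at every sample of the trajectory."""
    dx, dy = derivative(xs), derivative(ys)
    ddx, ddy = derivative(dx), derivative(dy)
    kappa = []
    for x1, y1, x2, y2 in zip(dx, dy, ddx, ddy):
        speed_cubed = (x1 * x1 + y1 * y1) ** 1.5
        # Guard against a stationary object (zero speed); an assumption of this sketch.
        kappa.append((x1 * y2 - y1 * x2) / speed_cubed if speed_cubed > 1e-12 else 0.0)
    return kappa

def rigid_transform(xs, ys, degrees, shift_x, shift_y):
    """Rotate the curve about the origin, then translate it."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    new_x = [c * x - s * y + shift_x for x, y in zip(xs, ys)]
    new_y = [s * x + c * y + shift_y for x, y in zip(xs, ys)]
    return new_x, new_y

# Hypothetical test trajectory: an S-shaped curve sampled over 200 frames.
ts = [i / 199 for i in range(200)]
xs = [100 * t for t in ts]
ys = [20 * math.sin(2 * math.pi * t) for t in ts]

moved_x, moved_y = rigid_transform(xs, ys, 40.0, 30.0, -15.0)
largest_gap = max(abs(a - b) for a, b in zip(curvature(xs, ys), curvature(moved_x, moved_y)))
print("largest curvature difference after the rigid motion:", largest_gap)
```

Because the finite differences are linear in the coordinates, the curvature values agree pointwise up to floating-point rounding, which is a stronger statement than the integral identity of Eq. (7).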

Curvature is computed from dot and cross products of parametric derivatives, and these are purely local quantities, hence independent of rotations and translations. The dot and cross products are based only on the lengths of, and angles between, vectors. Hence, these are also independent of rigid transformations of the coordinate system. Given a trajectory as in Eq. (5), the evolved version of the trajectory in terms of scale-space is defined by

$$r_\sigma(t) = \{X(t;\sigma),\, Y(t;\sigma)\} \qquad (8)$$

where

$$X(t;\sigma) = x(t) \otimes g(t;\sigma), \qquad Y(t;\sigma) = y(t) \otimes g(t;\sigma) \qquad (9)$$
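The smoothing of Eqs. (1), (4), (8) and (9) can be sketched for a sampled trajectory as follows. This is an illustrative discretization under stated assumptions, not the authors' code: the kernel is sampled at integer offsets, truncated at four standard deviations and renormalized so that Eq. (3) holds exactly, and the trajectory is extended at its ends by repeating the first and last samples. The `truncate` parameter and all function names are hypothetical.

```python
import math

def gaussian_kernel(sigma, truncate=4.0):
    """Sampled version of Eq. (4), renormalized so the weights sum to 1."""
    radius = max(1, int(truncate * sigma + 0.5))
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, sigma):
    """Discrete analogue of Eq. (1); sigma = 0 returns the signal unchanged."""
    if sigma <= 0:
        return list(signal)
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    last = len(signal) - 1
    # Border samples are replicated (an assumption; the paper does not specify this).
    return [
        sum(w * signal[min(max(t + k - radius, 0), last)] for k, w in enumerate(kernel))
        for t in range(len(signal))
    ]

def evolve(xs, ys, sigma):
    """Eqs. (8) and (9): smooth both coordinate signals with the same kernel."""
    return smooth(xs, sigma), smooth(ys, sigma)

def count_local_extrema(signal, eps=1e-9):
    """Number of sign changes in the first difference of the signal."""
    steps = [b - a for a, b in zip(signal, signal[1:]) if abs(b - a) > eps]
    return sum(1 for a, b in zip(steps, steps[1:]) if a * b < 0)

# Hypothetical jittery coordinate signal: a slow oscillation plus fast jitter.
noisy = [math.sin(2 * math.pi * t / 64) + 0.3 * math.sin(2 * math.pi * t / 5) for t in range(256)]
for sigma in (0.0, 1.0, 3.0):
    print(sigma, count_local_extrema(smooth(noisy, sigma)))
```

On this test signal the number of local extrema drops sharply once the jitter is smoothed away, which is the behaviour the monotone property describes.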

At each level of scale, governed by the standard deviation of the Gaussian kernel, the curvature of the evolved trajectory is computed. Then the function implicitly defined by

$$\kappa(t;\sigma) = 0 \qquad (10)$$

is the curvature scale space image of the trajectory. It is defined as a binary image with a value of 1 assigned to the points of curvature zero crossing at each scale level. Figure 1 depicts an example trajectory plotted in 3-D against frame number, x-coordinate of the object location and the corresponding y-coordinate. Note that each of the arch-shaped contours in the CSS image corresponds to a convexity or concavity on the original contour, with the size of the arch being proportional to the size of the corresponding feature. The CSS image is a very robust representation in the presence of noise in the trajectory data due to small camera motions and minor jitters in tracking. Noise amplifies only the small peaks, with no effect on the location or scale of the feature contour maxima [8].
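The construction of the binary CSS image from Eq. (10) can be sketched as below. The sketch assumes the curvature of the evolved trajectory has already been computed at each scale level and is passed in as one list per level; the function names and the hand-made curvature values are hypothetical and serve only to show how an arch closes as the scale grows.

```python
def zero_crossings(kappa):
    """Indices t where the curvature changes sign between samples t and t + 1.

    Samples that are exactly zero are not counted, a simplification of this sketch.
    """
    return [t for t in range(len(kappa) - 1) if kappa[t] * kappa[t + 1] < 0]

def css_image(curvature_by_scale):
    """Binary image with one row per scale level and one column per trajectory sample."""
    image = []
    for kappa in curvature_by_scale:
        row = [0] * len(kappa)
        for t in zero_crossings(kappa):
            row[t] = 1
        image.append(row)
    return image

# Hand-made curvature rows for three increasing scale levels (illustrative values only).
rows = [
    [0.2, -0.1, -0.3, 0.4, 0.5],    # fine scale: two zero crossings, far apart
    [0.1, 0.05, -0.02, 0.1, 0.2],   # coarser: the two crossings move together
    [0.1, 0.1, 0.05, 0.1, 0.2],     # coarsest: the concavity has vanished
]
for row in css_image(rows):
    print(row)
```

Reading the printed rows from the first to the last shows the two zero crossings approaching each other and then disappearing, which is what produces an arch-shaped contour in the CSS image.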

Figure 1: An example trajectory of a high jump sequence, and its corresponding CSS image.

3. Indexing and Matching

The set of trajectories is processed using the above method to generate the CSS image of each trajectory. Indexing is then performed based on the maxima of the CSS image. When the user submits a query trajectory, a CSS image of this trajectory is generated. Next, matching is performed between the CSS maxima representation of the query and the maxima representations of all trajectories in the database. Section 3.1 outlines how the CSS maxima based representation is generated, while Section 3.2 describes the matching process used at query time.

3.1. Indexing

The CSS image is a binary image with as many columns as there are points on the original trajectory and as many rows as the scale of the strongest concavity in the trajectory. The height of the CSS image depends on the starting value of σ as well as the amount of increment at each level. In our implementation, we start the process with σ = 1 and increment it by 0.1 at each level. For the example trajectory shown in Figure 1, the CSS image is 163 × 256 in size. Representing a trajectory of size 2 × 256 by such an image wastes storage space and increases the complexity of the matching process. In [11], two diffused CSS images, corresponding to the direction and the speed computed from the trajectory data, are stored for trajectory representation. Matching is then performed between the two CSS images of the query trajectory and those of the trajectories in the database.

For the purpose of indexing, we use the CSS image generated from the spatio-temporal curvature of the trajectory. The indexing is performed based on the maxima of this CSS image. During this process, noisy and spurious open contours in the CSS image are first pruned away. The process starts by performing the morphological operation of 'closing' on the binary CSS image in order to close the small gaps in contours. This step is necessary for the ensuing maxima extraction, which depends on the contours being fully intact. A small iteration value of 2 is used for the closing operator. Next, connected components labeling is performed to extract the total number of contours in the CSS image and to locate their corresponding peaks. The contour regions which are open from the top are then discarded. To remove noisy peaks, the global maximum CSS peak is extracted and a threshold of 90% is set on the peaks to be indexed. Any peak less than 0.9 times the height of the global maximum is discarded as noise. The remaining peaks are then sorted according to their σ coordinate value. Only the time value of each peak's occurrence on the trajectory and its height in terms of the σ coordinate value are stored.
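The final pruning and sorting step described above can be sketched as follows. The sketch assumes the contour peaks have already been extracted (after closing and connected components labeling) as pairs of time value and σ height; the function name, the tuple layout and the descending sort order are assumptions, since the text does not state the sort direction.

```python
def build_index(peaks, keep_ratio=0.9):
    """Keep peaks at least keep_ratio times as high as the global maximum.

    `peaks` is a list of (time_value, sigma_height) pairs. The surviving
    peaks are returned sorted by height, highest first (an assumed order).
    """
    if not peaks:
        return []
    global_max = max(height for _, height in peaks)
    kept = [(t, height) for t, height in peaks if height >= keep_ratio * global_max]
    return sorted(kept, key=lambda peak: peak[1], reverse=True)

# Hypothetical peaks read off a CSS image: (frame number, sigma at the top of the arch).
raw_peaks = [(200, 5.0), (120, 11.2), (40, 12.0), (75, 2.5)]
print(build_index(raw_peaks))  # [(40, 12.0), (120, 11.2)]
```

With these illustrative values the threshold is 0.9 × 12.0 = 10.8, so only the two highest arches survive and the index holds two (time, height) pairs instead of a full binary image.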

3.2. Retrieval

The retrieval stage first processes the input query to create the index structure of the query trajectory, as outlined in the previous section. The maxima of the query trajectory's CSS image are then matched against the corresponding maxima of each of the indexed trajectories in the database. The matching process assigns a distance measure to each indexed trajectory according to how close it is to the query trajectory. At the end of the retrieval process, a ranked list of trajectories is generated based on the sorted distance measure. From this ranked list, the top few matching trajectories are displayed to the user as the final retrieval results. The problem effectively boils down to computing the distance between two sets of maxima, one from the query trajectory and the other from a trajectory in the database. The distance computation takes into account the difference between the numbers of peaks of the two sets of maxima as well as their mutual differences in height and temporal location. The difference in height between the global peaks of the two sets of maxima has not proved very important in our experiments, so we first scale the database trajectory's set of maxima to make it as high as the query trajectory's set of maxima. Next, the distance measure based on the difference between the numbers of peaks in the query trajectory and the ith trajectory in the database is computed as:

$$D_{np}(i) = \alpha\, \lvert NP_q - NP_{db}(i) \rvert \qquad (11)$$

Here α is a scale factor set to 100, $NP_q$ represents the number of peaks in the query trajectory's CSS index, and $NP_{db}(i)$ represents the number of peaks in the ith database trajectory's CSS index. The second component of the distance measure, the peak location distance, is computed as follows. For each peak of the query set, the nearest database peak in terms of temporal location is sought within a small window of 10 frames. If such a peak is found, the distance between the jth query peak and the kth peak of the database trajectory is computed by:

$$D_{pl}(j,k) = P_q(j)\, \lvert t_q(j) - t_{db}(k) \rvert \qquad (12)$$

The cost of matching the jth peak in the query trajectory's CSS index to the peaks of the database trajectory is then computed by taking the minimum of Eq. (12) over k. Next, the peak location distance of the ith set of database CSS maxima from the query set of maxima is computed by summing the match costs of all peaks in the query set. Finally, the overall distance between the two sets of maxima is the sum of the number-of-peaks distance and the peak location distance.

4. Results

We examined the performance of our system on a database containing trajectories of high jumpers, slalom skiers, and some random object trajectories. Since the main emphasis of this presentation is on affine-invariant retrieval of trajectories, we populate our database with rotated versions of one class of trajectories. The database contains 20 original high jump trajectories with 5 rotated versions of each one at equal intervals in the -60 to +60 degrees range. Figure 2 displays one such trajectory and its rotated versions. The database consists of 100 trajectories from the high jump class, 68 trajectories from the slalom skiing class and 300 random trajectories. A query-by-example paradigm is used, whereby the user submits a trajectory for which similar trajectories are to be searched in the database.

Figure 2: Sample trajectory along with its rotated versions.

Figure 3 shows the top 10 retrieved trajectories for a query trajectory that is the same as the first retrieved trajectory. Note that all the transformed versions of the trajectory in the database are retrieved as top entries in the ranked list.

Figure 3: Retrieval results.
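The ranking behind such retrieval results follows the peak-matching distance of Section 3.2 (Eqs. 11 and 12). A minimal sketch is given below under several assumptions that the text leaves open: $P_q(j)$ is taken to be the height of the jth query peak, a query peak with no database peak inside the window is charged as if the gap equalled the window size, and the rescaling of database peak heights is omitted because, under this reading, database heights do not enter either equation. All names and the toy database are hypothetical.

```python
ALPHA = 100.0   # scale factor of Eq. (11)
WINDOW = 10     # search window in frames

def peak_count_distance(query, candidate):
    """Eq. (11): penalize a difference in the number of indexed peaks."""
    return ALPHA * abs(len(query) - len(candidate))

def peak_location_distance(query, candidate, unmatched_gap=WINDOW):
    """Sum over query peaks of the minimum of Eq. (12) inside the window."""
    total = 0.0
    for t_q, height_q in query:
        gaps = [abs(t_q - t_db) for t_db, _ in candidate if abs(t_q - t_db) <= WINDOW]
        # Assumption: an unmatched query peak costs as much as the widest allowed gap.
        gap = min(gaps) if gaps else unmatched_gap
        total += height_q * gap
    return total

def distance(query, candidate):
    """Overall distance: number-of-peaks term plus peak-location term."""
    return peak_count_distance(query, candidate) + peak_location_distance(query, candidate)

def rank(query, database):
    """Names of database entries sorted from closest to farthest."""
    return sorted(database, key=lambda name: distance(query, database[name]))

# Each index is a list of (time_value, sigma_height) peaks; values are illustrative.
query_index = [(40, 12.0), (120, 11.2)]
toy_database = {
    "fewer": [(40, 12.0)],
    "shifted": [(43, 12.0), (120, 11.2)],
    "same": [(40, 12.0), (120, 11.2)],
}
print(rank(query_index, toy_database))  # ['same', 'shifted', 'fewer']
```

An index identical to the query gets distance zero, a three-frame shift of one peak costs 12.0 × 3 = 36, and a missing peak incurs the much larger α penalty, so the number-of-peaks term dominates the ranking with α = 100.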

For the quantitative evaluation of information retrieval systems, the Precision $P_P$ and Recall $P_R$ rates have been widely used. In the information retrieval community, Recall of a system is defined as “proportion of relevant material actually retrieved in answer to a search request”, while Precision is defined as “proportion of retrieved material that is actually relevant” [10]. These metrics have been defined as:

$$P_R = \frac{N - F}{|T|}, \qquad P_P = \frac{N - F}{N} \qquad (13)$$

Here the quantity N − F denotes the number of correct detections, with F the number of false alarms. The value of N denotes the number of elements in the ranked list L (i.e., the number of both true and false alarms). Finally, the term |T| denotes the number of elements in the target set T (i.e., the ground truth). The Precision-Recall metric is then commonly plotted in a two-dimensional graph, the precision-recall curve (related to, but distinct from, the Receiver Operating Characteristic (ROC) curve), with Recall on the x-axis and Precision on the y-axis.

Figure 4 shows the precision vs. recall metric for a query posed to the above-mentioned database. The trajectory used in this query is a rotated version of one of the trajectories present in the database. The system retrieves all the transformed versions of this query present in the database with a distance measure of zero, as shown in Figures 3 and 4. There is a direct correspondence between the results shown in Figure 3 and the precision-recall metrics shown in Figure 4, viz. the first 5 trajectories in Figure 3 are true positives, resulting in a precision of 100% in the low-recall section. The sixth trajectory in the ranked list is a false positive, which lowers the value of precision from 100% to 85%. The next results are true positives and result in a higher value of precision at those values of recall.
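The computation of Eq. (13) along a ranked list can be sketched as follows. The function name and the relevance flags are hypothetical; the flags mark, for each position in the ranked list, whether the retrieved trajectory belongs to the target set.

```python
def precision_recall(relevance, target_size):
    """Return (recall, precision) after each position of the ranked list.

    `relevance[n - 1]` is True when the nth retrieved item is in the target set;
    `target_size` is |T|, the number of relevant items in the database.
    """
    points = []
    correct = 0  # running value of N - F in Eq. (13)
    for n, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            correct += 1
        points.append((correct / target_size, correct / n))
    return points

# Illustrative ranked list: two hits, one false alarm, then another hit; |T| = 4.
for recall, precision in precision_recall([True, True, False, True], target_size=4):
    print(f"recall={recall:.2f}  precision={precision:.2f}")
```

Each false alarm lowers precision while leaving recall unchanged, and each subsequent hit raises both, which is why a precision-recall curve can dip and then partially recover.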

Figure 4: Precision-Recall metric for a query from high jump class.

These results illustrate the affine-invariant nature of our CSS based descriptor used for indexing: all rotated versions of the query trajectory are retrieved at the top of the ranked list. At high values of recall, precision drops significantly. This indicates that CSS based descriptors work well in a pattern recognition setting, where the goal is to locate exact matches in a database containing many scaled and rotated versions. The descriptor loses its edge when retrieving similar trajectories within a class at high values of recall, where it performs poorly compared to other approaches [2]. In our future work, we are exploring ways to devise a hybrid system based on PCA and CSS that performs better across all values of recall.

5. Conclusions

An affine-invariant technique for object trajectory indexing and retrieval, inspired by shape matching in image analysis, is presented here. Our representation is based on the spatio-temporal curvature of the trajectory data, which is robust under affine transformations. Assuming no a priori knowledge about the preferred scale at which to analyze the data, we take a scale-space approach applied to the curvature of trajectories. The resulting CSS image is compactly indexed by the location of its peaks. Our peak matching algorithm then computes the distance between the query trajectory's set of maxima and those of all the trajectories in the database. A ranked list of trajectories, sorted by distance from the query trajectory, is then produced as the result. The results show the affine-invariant nature of the CSS based method, an issue overlooked by many motion based indexing and retrieval systems in the past.

6. Acknowledgements

The authors would like to thank William Chen and Prof. Shi-Fu Chang from Columbia University, NY for providing the database of trajectories.

7. References

[1] Bashir F., Khanvilkar S., Schonfeld D., Khokhar A., “Multimedia systems: content-based indexing and retrieval”, Sec. 4, Chap. 6, in “Electrical Engineer’s Handbook” (in press).
[2] Bashir F., Schonfeld D., Khokhar A., “Segmented trajectory based indexing and retrieval of video data”, IEEE International Conference on Image Processing, ICIP 2003, Barcelona, Spain.
[3] Chen W., Chang S. F., “Motion trajectory matching of video objects”, IS&T/SPIE, San Jose, CA, Jan. 2000.
[4] Hu M. K., “Visual pattern recognition by moment invariants, computer methods in image analysis”, IEEE Transactions on Information Theory, Vol. 8, No. 2, Feb. 1962, pages 179-187.
[5] Lindeberg T., “Scale-space theory in computer vision”, Kluwer Academic Publishers, Netherlands, 1994.
[6] Mokhtarian F., Mackworth A., “A theory of multiscale, curvature-based shape representation for planar curves”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 14, No. 8, August 1992, pages 789-805.
[7] Mokhtarian F., Abbasi S., “Retrieval of similar shapes under affine transformation”, Proc. International Conference on Visual Information Systems, Amsterdam, The Netherlands, 1999, pages 566-574.
[8] Mokhtarian F., Bober M., “Curvature scale space representation: theory, applications, and MPEG-7 standardization”, Kluwer Academic Publishers, Netherlands, 2003.
[9] Pentland A., Sclaroff S., “Modal matching for correspondence and recognition”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 17, No. 6, June 1995, pages 545-561.
[10] Raghavan V. V., Bollman P., “A critical investigation of Recall and Precision as a measure of retrieval system performance”, ACM Transactions on Information Systems, 7(3):205-229, 1989.
[11] Rangarajan K., Allen W., Shah M., “Matching motion trajectories using scale-space”, Pattern Recognition, Vol. 26, No. 4, 1993, pages 595-610.
[12] Witkin A. P., “Scale space filtering”, Proc. 8th International Joint Conference on AI, Karlsruhe, West Germany, Aug. 1983, pages 1019-1022.
