A Time Warping Based Approach for Video Copy Detection
Chih-Yi Chiu, Cheng-Hung Li, Hsiang-An Wang, Chu-Song Chen, Lee-Feng Chien
Institute of Information Science, Academia Sinica
{cychiu, chli, sawang, song, lfchien}@iis.sinica.edu.tw

Abstract
The proliferation of digital video urges the need for video copy detection for content and rights management. An efficient video copy detection technique should be able to deal with spatiotemporal variations (e.g., changes in brightness or frame rates) at a low computation cost. While most studies emphasize spatial variations, less effort has been devoted to temporal variations and computation cost. To address these issues, we propose a time warping based approach for video copy detection. A time warping matching algorithm is used to deal with temporal variations, and a fast filtering method that generates key frames and selects candidate clips from the video reduces the number of matchings. Our experiments demonstrate promising results for the proposed approach.

1. Introduction
As multimedia capturing and authoring technologies advance, video has become ubiquitous over the Internet. Since digital video can be easily duplicated, edited, and redistributed, an efficient information tracking technique is helpful for managing video content and rights. There are two general techniques for video information tracking: watermarking and content-based copy detection. Watermarking embeds watermarks in the video, while content-based copy detection takes a different approach: it employs perceptual features of the video content as a unique signature, and by comparing the signatures of video clips we can determine whether one is a copy of another. The major advantage of content-based copy detection is that it does not embed watermarks in the video, so video quality is not degraded. Moreover, content-based copy detection can serve as a complementary technique to watermarking.

There are two inherent challenges in content-based copy detection of video. The first comes from the video structure, which consists of complex spatiotemporal signals. An effective video copy detector should be able to overcome the effects caused by spatiotemporal variations. While most studies emphasize spatial variations (e.g., histogram equalization, noise, frame resizing), little effort has been devoted to temporal variations (e.g., fast and slow motion, frame-rate changes). In fact, temporal variations occur frequently in various scenarios: slow motion is widely used to replay key moments in sports video, and many video databases provide fast-motion and low-frame-rate clips for users to skim the content. To tackle these temporal variations, we propose a time warping matching algorithm that compensates for the temporal discrepancy. The other challenge is the computation cost of video copy detection. There is undoubtedly a huge amount of video data on the Internet, and exhaustive search over it consumes enormous computation resources. In this study, we propose a fast filtering method that extracts key frames and then selects candidate clips from the video content. Only these candidate clips are matched by the time warping algorithm, so a considerable amount of computation is saved. Our experimental results show that the proposed method reduces the computation cost more than tenfold compared with exhaustive matching. This paper is organized as follows. Section 2 discusses related work. Section 3 describes the proposed time warping based approach. Experimental results are shown in Section 4. Conclusions and future work are given in Section 5.

2. Related work
For image copy detection, Kim [1] used the ordinal measure of DCT coefficients as the image signature to resist spatial variations. Qamra et al. [2] proposed an enhanced dynamic partial function to measure perceptual similarity between images for replica detection.

For video copy detection, Naphade et al. [3] presented a fast algorithm that uses the DCT histogram of compressed video and a polynomial approximation technique for matching. Hampapur et al. [4] designed several sequence matching methods based on motion, ordinal, and color signatures. Yuan et al. [5] employed a coarse-to-fine strategy, which uses visual ordinal signatures for coarse search and audio features for fine matching. Hua et al. [6] proposed an ordinal signature generated by resampling video frames, which can cope with frame-rate changes. Kim and Vasudev [7] proposed ordinal and temporal signature matching, which outperforms the original ordinal matching. A number of studies [4-7] employ an exhaustive search over a video collection, where a fixed-size window slides frame by frame to locate copies. This approach has two drawbacks. First, the fixed-size sliding window cannot deal with some temporal variations, e.g., fast and slow motion. Second, the exhaustive search incurs a high computation cost. The first drawback is solved effectively by the proposed time warping matching algorithm. For the second drawback, we propose key frame generation and candidate clip selection to cull appropriate frames for matching; this culling process reduces a considerable amount of computation.

3. The time warping based approach
The proposed approach consists of three steps: key frame extraction, candidate clip selection, and sequence matching. Before detailing each step, we define some notation. Suppose that T and Q are the target and query video clips, respectively. T is represented as {T_j | j = 1, 2, ..., M} and Q as {Q_i | i = 1, 2, ..., N}, where M and N are the numbers of frames, M >> N, and T_j and Q_i are the ordinal signatures of the corresponding frames. In this study, a video frame is partitioned into 3×3 blocks, so its ordinal signature is a 3×3 matrix; we then reshape the matrix into a 9×1 vector. For details, please refer to [1, 4-7]. Based on this notation, the task of copy detection is to find in T the subsequences (i.e., copies) whose signature series are similar to Q's.
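As a concrete illustration, the ordinal signature described above can be sketched as follows. This is a minimal example assuming a grayscale frame stored as a NumPy array; the function name `ordinal_signature` is ours, not from the paper.

```python
import numpy as np

def ordinal_signature(frame, grid=3):
    """Compute the ordinal signature of a grayscale frame.

    The frame is partitioned into grid x grid blocks; each block's mean
    intensity is replaced by its rank among all blocks, and the rank
    matrix is reshaped into a (grid*grid)-dimensional vector.
    """
    h, w = frame.shape
    means = np.empty(grid * grid)
    for r in range(grid):
        for c in range(grid):
            block = frame[r * h // grid:(r + 1) * h // grid,
                          c * w // grid:(c + 1) * w // grid]
            means[r * grid + c] = block.mean()
    # argsort of argsort yields each block's rank (0 = darkest block)
    return np.argsort(np.argsort(means)).astype(float)

# toy 6x6 "frame": brightness increases left to right, top to bottom
frame = np.arange(36, dtype=float).reshape(6, 6)
print(ordinal_signature(frame))  # ranks 0..8 in raster order
```

Because ranks, not raw intensities, are stored, the signature is unchanged by monotonic brightness shifts, which is why it resists the spatial variations discussed above.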

3.1. Key frame extraction
The first step is to extract key frames from the video clips. In addition to reducing storage and computation cost, this moderates the effects caused by temporal variations. Unlike traditional work that extracts key frames based on visual features, we extract key frames based on the ordinal signature, which has been shown to be robust under many spatial variations. Take the target clip T as an example. We define a 9×9 Laplacian of Gaussian filter F, which is often used to calculate second-order derivatives of a signal:

F(x, y) = −(1 / (πσ⁴)) [1 − (x² + y²) / (2σ²)] e^(−(x² + y²) / (2σ²)).

The second-order derivatives reveal signal transitions, which can be chosen as key frames. To detect these transitions, we first convolve T with F to obtain a 1×M vector A:

A(j) = Σ_{k = j−4}^{j+4} T_k · f_{k−j+5},

where f_k is the k-th column vector of F(x, y), and T_k · f_k denotes the dot product of T_k and f_k. Define A' = A − mean(A), where mean(x) returns the mean value of the vector x. Then the j-th frame is extracted as a key frame of T if: A(j) is a local maximum and A'(j) ≥ 0, or A(j) is a local minimum and A'(j) < 0. The two conditions are used to detect peaks and to remove noise. The extracted key frames are denoted t = {t_1, t_2, ..., t_m}. For the query clip Q, we repeat the same procedure to extract its key frame sequence q = {q_1, q_2, ..., q_n}. In this study, we set σ = 2 for the filter F. In our testing, a key frame is extracted every 6.43 frames on average; that is, representing the whole clip by its key frames reduces storage and computation cost by approximately 83% (i.e., 5/6) in this case.
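The filter construction and the two-condition peak test can be sketched as follows. This is a rough illustration assuming each frame has already been reduced to its 9-D ordinal signature; the function names are ours, and the boundary frames (first/last four), for which the convolution window is incomplete, are simply skipped.

```python
import numpy as np

def log_filter(size=9, sigma=2.0):
    """Laplacian-of-Gaussian kernel F(x, y) =
    -(1/(pi*sigma^4)) * (1 - (x^2+y^2)/(2*sigma^2)) * exp(-(x^2+y^2)/(2*sigma^2))."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = (x ** 2 + y ** 2) / (2.0 * sigma ** 2)
    return -(1.0 / (np.pi * sigma ** 4)) * (1.0 - r2) * np.exp(-r2)

def extract_key_frames(signatures, sigma=2.0):
    """Return indices of key frames from a sequence of 9-D ordinal signatures.

    A(j) = sum over o in [-4, 4] of signatures[j+o] . F[:, o+4]; frame j is a
    key frame if A(j) is a local maximum with A(j) - mean(A) >= 0, or a local
    minimum with A(j) - mean(A) < 0.
    """
    F = log_filter(9, sigma)
    M = len(signatures)
    A = np.zeros(M)
    for j in range(4, M - 4):  # skip boundary frames with incomplete windows
        A[j] = sum(signatures[j + o] @ F[:, o + 4] for o in range(-4, 5))
    Ap = A - A.mean()
    keys = []
    for j in range(1, M - 1):
        if (A[j] > A[j - 1] and A[j] > A[j + 1] and Ap[j] >= 0) or \
           (A[j] < A[j - 1] and A[j] < A[j + 1] and Ap[j] < 0):
            keys.append(j)
    return keys
```

On a synthetic sequence with an abrupt signature change, the detected key frames cluster around the transition, matching the intuition that the LoG response peaks at signal transitions.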

3.2. Candidate clip selection
Even after key frames are extracted, the key frame sequence of the target clip (i.e., t) is still very long. To avoid exhaustive search in the long sequence, we roughly scan t for subsequences that may be copies of q. First we search for start candidates C_start in t: these are frames similar to the first key frame of q (i.e., q_1):

C_start = { s | ‖t_s − q_1‖ ≤ ε, s = 1, 2, ..., m },

where ‖x‖ is the norm of the vector x and ε is a threshold; we set ε = 9 in this study. Similarly, end candidates C_end are selected from t:

C_end = { e | ‖t_e − q_n‖ ≤ ε, e = 1, 2, ..., m },

where q_n is the last key frame of q.
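The two threshold tests can be sketched as a single pass over the target key-frame signatures. This is a minimal sketch assuming the signatures are NumPy vectors and using the Euclidean norm; the function name is illustrative.

```python
import numpy as np

def candidate_frames(t, q_first, q_last, eps=9.0):
    """Return (C_start, C_end): indices of target key frames whose signature
    lies within eps (Euclidean norm) of the query's first / last key-frame
    signature."""
    t = np.asarray(t, dtype=float)
    c_start = [s for s in range(len(t)) if np.linalg.norm(t[s] - q_first) <= eps]
    c_end = [e for e in range(len(t)) if np.linalg.norm(t[e] - q_last) <= eps]
    return c_start, c_end
```

For example, with three target signatures `[0]*9, [5]*9, [0]*9` and a query whose first key frame is the zero vector and last is `[5]*9`, the start candidates are indices 0 and 2 and the sole end candidate is index 1 (the excluded distances are 5 × 3 = 15 > ε).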

We then scan the two candidate lists C_start and C_end. A subsequence y = {t_s, t_{s+1}, ..., t_e} in t is reported as a candidate clip if it satisfies the following conditions: (1) s < e; (2) no other start or end candidate lies between s and e; (3) a×n < (e − s) < b×n, where a and b are positive real numbers. Condition (1) preserves the order of the start and end candidates, condition (2) picks the smallest frame set among the candidate combinations, and condition (3) filters out clips that are too long or too short. In this study, we set a = 0.7 and b = 1.3.
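The three conditions above can be applied directly to the two candidate lists; a brute-force sketch over all (s, e) pairs is shown below (fine for illustration, since the lists are short after thresholding). The function name and the pair-enumeration strategy are ours.

```python
def candidate_clips(c_start, c_end, n, a=0.7, b=1.3):
    """Pair start/end candidates (s, e) into candidate clips subject to:
    (1) s < e; (2) no other candidate lies strictly between s and e;
    (3) a*n < e - s < b*n, where n is the query key-frame count."""
    all_cands = sorted(set(c_start) | set(c_end))
    clips = []
    for s in c_start:
        for e in c_end:
            if not s < e:
                continue  # condition (1): start must precede end
            if any(s < c < e for c in all_cands):
                continue  # condition (2): keep the smallest frame set
            if not (a * n < e - s < b * n):
                continue  # condition (3): reject over-/under-length clips
            clips.append((s, e))
    return clips

print(candidate_clips([10, 50], [20, 60], n=10))  # → [(10, 20), (50, 60)]
```

Note how the pair (10, 60) is rejected by condition (2), because candidates 20 and 50 lie between its endpoints.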

3.3. Time warping matching
The Dynamic Time Warping (DTW) algorithm is applied to compute the similarity between the query example q and a candidate clip y. Since DTW can compensate for length differences, it is suitable for handling video temporal variations. Denote q = {q_1, q_2, ..., q_n} and y = {t'_1, t'_2, ..., t'_l}, where n and l are the numbers of key frames of q and y, respectively. We define the following distance function:

dist(q, y) = cost(n, l),

where cost(u, v) is a recursive function:

cost(1, 1) = ‖q_1 − t'_1‖,
cost(u, v) = ‖q_u − t'_v‖ + min{ cost(u−1, v), cost(u, v−1), cost(u−1, v−1) },

subject to |u − v| ≤ d×n, where d×n is the maximum warping distance; we set d = 0.3 in this study. The distance is then normalized to determine whether y is a copy of q: if

dist(q, y) / max(n, l) ≤ τ,

then y is a copy of q, where τ is the threshold.
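A dynamic-programming sketch of the band-constrained DTW recursion above follows. The boundary handling (a virtual cost(0, 0) = 0, so that cost(1, 1) = ‖q_1 − t'_1‖) and the function names are our implementation choices.

```python
import numpy as np

def dtw_distance(q, y, d=0.3):
    """DTW between signature sequences q (n frames) and y (l frames), with
    warping band |u - v| <= d * n.  Implements
    cost(u, v) = ||q_u - y_v|| + min(cost(u-1, v), cost(u, v-1), cost(u-1, v-1))
    and returns dist(q, y) = cost(n, l)."""
    n, l = len(q), len(y)
    band = d * n
    cost = np.full((n + 1, l + 1), np.inf)
    cost[0, 0] = 0.0  # virtual origin so cost(1, 1) = ||q_1 - y_1||
    for u in range(1, n + 1):
        for v in range(1, l + 1):
            if abs(u - v) > band:
                continue  # outside the warping band
            cost[u, v] = np.linalg.norm(np.asarray(q[u - 1]) - np.asarray(y[v - 1])) \
                + min(cost[u - 1, v], cost[u, v - 1], cost[u - 1, v - 1])
    return cost[n, l]

def is_copy(q, y, tau, d=0.3):
    """y is reported as a copy of q if dist(q, y) / max(n, l) <= tau."""
    return dtw_distance(q, y, d) / max(len(q), len(y)) <= tau
```

As a sanity check, a clip matches itself with distance 0, and a slowed-down copy that repeats a frame (a length-l ≠ n sequence) still reaches distance 0 because DTW absorbs the repetition within the band.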

3.4. Time complexity analysis
We compare the time complexity of the time warping based approach with that of the exhaustive approach. Suppose that t and q are given. In time warping matching, the search for start and end candidates scans t twice, and each DTW run costs n². Therefore, the total cost of the time warping based approach is 2×m + c×n², where c is the number of selected candidate clips. The exhaustive approach, on the other hand, costs m×n. In our experiments, 2×m + c×n² is usually much smaller than m×n since m >> n, and thus the proposed time warping approach spends less computation than the exhaustive approach. Details are reported in the next section.
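A back-of-envelope check of the two cost formulas makes the gap concrete. The values of m and n below follow from the paper's own figures (712,060 frames and a 1,000-frame query, with one key frame per 6.43 frames on average); the candidate count c = 20 is a hypothetical number for illustration.

```python
# m: target key frames, n: query key frames, c: surviving candidate clips
m, n, c = 110_000, 155, 20  # ~712,060/6.43 and ~1,000/6.43; c is assumed

warping_cost = 2 * m + c * n * n   # two candidate scans + c DTW runs
exhaustive_cost = m * n            # fixed-size sliding-window matching

print(warping_cost, exhaustive_cost)  # → 700500 17050000
assert warping_cost < exhaustive_cost
```

Under these assumptions the ratio is roughly 24×, consistent with the more-than-tenfold reduction reported in the experiments.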

4. Experimental results
We collected more than six hours (712,060 frames) of video data from TRECVID 2001, the Open Video Project, and the MPEG-7 collection. The video format is MPEG-1 NTSC, i.e., 352×240 resolution at 29.97 fps. To test the performance of the proposed approach, the video data were modified to generate eight copies: brightness up, histogram equalization, resolution change to 176×120, frame-rate change to 15 fps and to 10 fps, slow motion (0.5×), fast motion (2×), and a hybrid modification (176×120 resolution, 10 fps, and 2× fast motion). All these video data are available at http://www.iis.sinica.edu.tw/~chli/Research/Video/VCD/. Note that the hybrid modification is often used to generate preview clips in video databases, yet such a hybrid case is seldom discussed in related work. Each copy serves as a target clip. From the original collection, 200 video clips of one thousand frames each were randomly selected as query clips; in total, 1,600 queries were submitted against the eight targets (copies). We compare the results of the proposed approach under various thresholds τ against Hua's [6] and Kim's [7] approaches. For Hua's approach, a 3×3 ordinal signature represents a video frame; for Kim's approach, 2×2 ordinal and temporal signatures are used. The comparison results are plotted as receiver operating characteristic (ROC) curves, with the false positive rate F_pr on the X axis and the false negative rate F_nr on the Y axis:

F_pr = F_p / N_T and F_nr = F_n / N_T,

where N_T is the total number of matches, F_n is the number of false negatives (copy clips that are not detected), and F_p is the number of false positives (non-copy clips that are detected). Fig. 1 illustrates the ROC curves of the three approaches for the various copy cases. Fig. 1(a) shows the result of combining three spatial variations: brightness up, histogram equalization, and resolution change. Since the ordinal signature has been shown to be robust to these spatial variations, all three approaches achieve acceptable results. Fig. 1(b) shows the result of combining two frame-rate changes: 10 fps and 15 fps. Kim's approach is worse than the other two; unlike it, Hua's resampling approach and our time warping approach can handle frame-rate changes. Fig. 1(c) shows the results for fast and slow motion. Both Hua's and Kim's performances are severely degraded, while the degradation of ours is limited. Finally, in the hybrid case of Fig. 1(d), our approach not only outperforms the other two but also shows its robustness under mixed spatial and temporal variations.
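The two rates plotted on the ROC axes follow directly from the formulas above; a tiny sketch with hypothetical counts (the function name and the example numbers are ours, not measured results from the paper):

```python
def roc_rates(f_p, f_n, n_t):
    """ROC coordinates: F_pr = Fp / N_T and F_nr = Fn / N_T,
    where N_T is the total number of matches."""
    return f_p / n_t, f_n / n_t

# e.g., 1,600 queries with 16 false alarms and 40 missed copies (hypothetical)
fpr, fnr = roc_rates(16, 40, 1600)
print(fpr, fnr)  # → 0.01 0.025
```

Sweeping the threshold τ traces out one (F_pr, F_nr) point per setting, which is how the curves in Fig. 1 are produced.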

Fig. 1. The ROC curves. (a) spatial variations; (b) frame rates change; (c) fast and slow motion; (d) hybrid.

Fig. 2 illustrates the time complexity of the approaches analyzed in Sec. 3.4. Different numbers of query frames (i.e., the parameter n) yield different numbers of candidate clips (i.e., the parameter c). As the number of query frames increases, the computation cost of our method grows only slightly and remains much lower than that of the exhaustive method. In a real case, when a 1,000-frame clip is submitted to query the target collection (712,060 frames), our method spends 12 seconds on a 2.8 GHz computer with 1 GB of RAM, while the exhaustive method (e.g., Hua's and Kim's approaches) spends 216 seconds.

Fig. 2. The comparison of computation costs.

5. Conclusion and future work
In this study, we present a time warping based approach for video copy detection. The proposed time warping matching algorithm can handle video temporal variations such as fast and slow motion and frame-rate changes. To speed up detection in a large video collection, we introduce a fast filtering method that generates key frames and selects candidate clips from the collection, so that the computation cost can be reduced. Experimental results show that the proposed approach is effective and efficient. For future work, we will explore the motion feature, which may be a prominent signature for video copy detection. Another direction is to employ an efficient index structure to speed up the process.

6. Acknowledgment
This work was partially supported by the National Science Council, Taiwan, under Grants NSC95-2422-H-001-007- and NSC95-2422-H-001-007-.

7. References
[1] C. Kim, "Content-based image copy detection," Signal Processing: Image Communication, Vol. 18, No. 3, pp. 169-184, 2003.
[2] A. Qamra, Y. Meng, and E. Y. Chang, "Enhanced perceptual distance functions and indexing for image replica recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 3, pp. 379-391, 2005.
[3] M. Naphade, M. Yeung, and B. Yeo, "A novel scheme for fast and efficient video sequence matching using compact signatures," The SPIE Conference on Storage and Retrieval for Media Databases, Jan. 2000.
[4] A. Hampapur, K.-H. Hyun, and R. M. Bolle, "Comparison of sequence matching techniques for video copy detection," The SPIE Conference on Storage and Retrieval for Media Databases, 2002.
[5] J. Yuan, Q. Tian, and S. Ranganath, "Fast and robust search method for short video clips from large video collection," International Conference on Pattern Recognition, 2004.
[6] X. S. Hua, X. Chen, and H. J. Zhang, "Robust video signature based on ordinal measure," International Conference on Image Processing, 2004.
[7] C. Kim and B. Vasudev, "Spatiotemporal sequence matching for efficient video copy detection," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 15, No. 1, pp. 127-132, 2005.
