Saliency-based Image Retargeting in the Compressed Domain∗

Yuming Fang
School of Computer Engineering, Nanyang Technological University, Singapore
[email protected]

Zhenzhong Chen
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
[email protected]

Weisi Lin
School of Computer Engineering, Nanyang Technological University, Singapore
[email protected]

Chia-Wen Lin
Department of Electrical Engineering, National Tsing Hua University, Taiwan, R.O.C.
[email protected]
ABSTRACT
In this paper, we propose a novel image retargeting algorithm that resizes images based on saliency information extracted from the compressed domain. First, we utilize the DCT coefficients in the JPEG bitstream to perform saliency detection with consideration of the human visual sensitivity. The obtained saliency information is used to determine the relative visual importance of each 8 × 8 block in the image. Furthermore, we propose a new adaptive block-level seam removal operation for connected blocks to resize the image. Thanks to the saliency information derived directly from the compressed domain, the proposed image retargeting algorithm effectively preserves the objects of attention, efficiently removes the less crucial regions, and therefore significantly outperforms the relevant state-of-the-art algorithms, as demonstrated by careful analysis and extensive experiments.

Categories and Subject Descriptors
I.4 [Image Processing and Computer Vision]: General—image displays

General Terms
Algorithms

Keywords
Image retargeting, saliency detection, human visual sensitivity, compressed domain

1. INTRODUCTION
Content-aware image retargeting algorithms such as seam carving are popular and effective for resizing images. The performance of these algorithms depends greatly on the visual significance maps they use to measure the visual importance of image pixels. The visual significance maps used in these algorithms include the gradient map, the saliency map, and high-level feature maps such as face and motion maps [1-6]. These existing image retargeting algorithms operate in the uncompressed domain. However, images are typically stored in a compressed format such as JPEG. Thus, it is crucial to design an efficient image retargeting algorithm in the compressed domain. In this paper, we propose a saliency-based image retargeting algorithm in the compressed domain. First, we use the DCT coefficients in the JPEG bitstream to obtain the saliency information with consideration of the human visual sensitivity. In JPEG, an image is split into blocks of 8 × 8 pixels. Although the minimum coded unit (MCU) can be as large as 16 × 16 (for the 4:2:0 component subsampling format), we perform our saliency detection and retargeting at the 8 × 8 block level. After obtaining the saliency information, we determine the visual importance of each 8 × 8 block in the image. Based on the resulting visual significance map, we design an adaptive seam removal approach for connected blocks to resize the image. Experimental results show the superior performance of the proposed saliency detection as well as the image retargeting.
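Since the proposed method operates directly on the per-block DCT coefficients that JPEG stores, the following sketch illustrates how an image decomposes into 8 × 8 blocks of DCT coefficients. This is illustrative only: a real implementation would read the quantized coefficients from the JPEG bitstream (e.g., via libjpeg) rather than recompute them from pixels.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (the transform used by JPEG)."""
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

def blockwise_dct(image, bs=8):
    """Split a grayscale image into bs x bs blocks and DCT-transform each.

    Returns an array of shape (H//bs, W//bs, bs, bs); entry [by, bx, 0, 0]
    is the DC coefficient of block (by, bx)."""
    C = dct_matrix(bs)
    h, w = image.shape
    h, w = h - h % bs, w - w % bs  # drop partial border blocks
    blocks = image[:h, :w].reshape(h // bs, bs, w // bs, bs).swapaxes(1, 2)
    # per-block 2D DCT: C @ block @ C.T for every block
    return np.einsum('ij,abjk,lk->abil', C, blocks, C)

# A flat 8x8 block of value 5 has DC coefficient 8 * 5 = 40
# (orthonormal scaling) and zero AC coefficients.
coeffs = blockwise_dct(np.full((8, 8), 5.0))
```

The DC coefficient `coeffs[by, bx, 0, 0]` of each block carries its average energy, which is what the feature extraction in Section 3.1.1 relies on.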
2. BACKGROUND AND MOTIVATIONS
The popular image retargeting algorithm of seam carving was proposed by Avidan and Shamir [1]. A seam is defined as an 8-connected path of low-energy pixels (from top to bottom, or from left to right) in an image, including exactly one pixel in each row or column. Seam carving reduces the width (or height) of an image by removing unimportant seams; a visual importance map is used to determine the importance of each pixel. Based on seam carving, many other algorithms have been proposed to improve the performance of image retargeting [2, 3]. Other advanced image retargeting algorithms have also been proposed. Wolf et al. [4] designed a video retargeting algorithm to resize videos by using the visual importance
∗Area chair: Lei Chen. Corresponding author.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MM’11, November 28–December 1, 2011, Scottsdale, Arizona, USA. Copyright 2011 ACM 978-1-4503-0616-4/11/11 ...$10.00.
Figure 1: Comparison of different image retargeting algorithms: (a) the original image; (b) the gradient map; (c) the saliency map from Itti’s model [8]; (d) the saliency map from our proposed model; (e)-(h) the retargeted images from [2], [4], [5], and our proposed algorithm.

map composed of local saliency, face detection and motion detection. Ren et al. [5] proposed an image retargeting algorithm based on global energy optimization, in which the saliency map and face detection were combined as the visual importance map. Jin et al. [6] presented a content-aware image resizing algorithm that warps a triangular mesh over the image, regarding salient line features and curved features as important regions. In this paper, we build a superior saliency detection model in the compressed domain to obtain the saliency map, which is used as the visual significance map for the proposed image retargeting. Thanks to the saliency map derived directly from the compressed domain, the proposed algorithm effectively preserves the objects of attention and removes the less crucial regions, as shown in Figure 1. From Figure 1, we can see that our saliency map detects the salient object more accurately than the gradient map and the saliency map from Itti’s model [8]. More details and comparisons are provided in the following sections.
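As a reference point for the block-level scheme developed later, the pixel-level seam carving of [1] can be sketched as follows (a minimal NumPy version; the energy map stands in for the visual importance map):

```python
import numpy as np

def min_vertical_seam(energy):
    """Classic seam-carving dynamic programming [1]: return, for each row,
    the column index of the minimal 8-connected vertical seam."""
    h, w = energy.shape
    cost = energy.astype(float).copy()
    for y in range(1, h):
        # each pixel may connect to the upper-left, upper, or upper-right pixel
        left = np.r_[np.inf, cost[y - 1, :-1]]
        right = np.r_[cost[y - 1, 1:], np.inf]
        cost[y] += np.minimum(np.minimum(left, cost[y - 1]), right)
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for y in range(h - 2, -1, -1):  # backtrack from the cheapest bottom pixel
        x = seam[y + 1]
        lo, hi = max(0, x - 1), min(w, x + 2)
        seam[y] = lo + int(np.argmin(cost[y, lo:hi]))
    return seam

def remove_seam(img, seam):
    """Remove one pixel per row, reducing the width by one."""
    h, w = img.shape[:2]
    mask = np.ones((h, w), dtype=bool)
    mask[np.arange(h), seam] = False
    return img[mask].reshape(h, w - 1, *img.shape[2:])
```

Repeating the find-and-remove cycle shrinks the image one column at a time while keeping high-energy pixels intact.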
3. THE FRAMEWORK
The proposed approach consists of two parts. First, we propose a saliency detection model in the compressed domain, which is used to obtain a saliency map measuring the visual importance of each block in a JPEG image. Second, based on this saliency detection, we design a new block-level seam carving operation for connected blocks to resize JPEG images.

3.1 Saliency Detection in the Compressed Domain
Images are usually stored in a compressed format such as JPEG. In this study, the saliency information is derived from the JPEG bitstream. The saliency detection model is built on the DCT coefficients with consideration of the human visual sensitivity. The saliency value S_i^k of block i for feature k is determined by the differences between block i and the other blocks in the image, weighted by the contrast sensitivities:

    S_i^k = Σ_{j≠i} α_ij D_ij^k    (1)

where S_i^k is the saliency value of block i for feature k; D_ij^k is the difference in feature k between blocks i and j; and α_ij is the weight for the block difference D_ij^k. We use three features extracted from the JPEG bitstream to calculate the saliency values: intensity, color and orientation. The final saliency map is a combination of these feature maps.

3.1.1 Feature Extraction from the JPEG Bitstream
In this study, we extract features from the DCT coefficients. The DCT coefficients in one block are shown in Figure 2.

Figure 2: DCT coefficients in one 8 × 8 block.

The DCT coefficients in one block are composed of the DC coefficient and the AC coefficients. The DC coefficient is a measure of the average energy of the 8 × 8 pixels, while the AC coefficients represent the orientation information of the block. Thus, we use the DC coefficient to extract the intensity and color features of a block, and the AC coefficients to extract its orientation feature.

Since the DC coefficients represent the luminance and chrominance information of the JPEG image in the YCrCb color space, we first convert them to the RGB color space to extract the intensity and color features. Let r, g and b denote the red, green and blue color components obtained from the DC coefficients. Four broadly-tuned color channels are generated as R = r − (g + b)/2 for the new red component, G = g − (r + b)/2 for the new green component, B = b − (r + g)/2 for the new blue component, and Y = (r + g)/2 − |r − g|/2 − b for the new yellow component. The intensity feature is calculated as I = (r + g + b)/3. Each color channel is then decomposed into red/green and blue/yellow double opponency according to the related property of the human primary visual cortex [7, 8]: Crg = R − G and Cby = B − Y. I, Crg and Cby are the extracted intensity and color features of an 8 × 8 block in the JPEG image. Note that a 16 × 16 MCU consists of four 8 × 8 luminance blocks and two 8 × 8 chrominance blocks (one for Cb and the other for Cr); thus, four luminance blocks share the same chrominance blocks in a typical 4:2:0 subsampled JPEG encoding system.

We use the AC coefficients in the YCrCb color space to extract the orientation feature of each block. In this color space, the Cr and Cb components represent the color information, and their AC coefficients provide little orientation information. Thus we use the AC coefficients of the Y component only to extract the orientation feature O. The AC coefficients used in this paper are those in the first row and first column of Figure 2: O = {AC_0i, AC_i0} (i ∈ {1, 2, ..., 7}).

3.1.2 Feature Maps in the Compressed Domain
In the next step, we use the intensity, color and orientation features to obtain the feature maps. Since we use the DC coefficients to calculate the intensity and color features of each block, the feature difference between blocks i and j can be calculated as:

    D_ij^k = C_k^i − C_k^j    (2)

where k = 1, 2, 3 indexes the intensity and color features, respectively (one intensity feature and two color features), and C_k ∈ {I, Crg, Cby}. As mentioned above, we use the AC coefficients of the luminance block to represent the orientation feature of each block in the JPEG image. The Hausdorff distance [9] is used to calculate the difference between the AC coefficient vectors of two different blocks. The orientation difference D_ij^4 between blocks i and j is computed as:

    D_ij^4 = max(h(O_i, O_j), h(O_j, O_i))    (3)

where O_i and O_j are the AC coefficient vectors of blocks i and j, respectively, and h(O_i, O_j) is calculated as:

    h(O_i, O_j) = max_{o_i ∈ O_i} min_{o_j ∈ O_j} ||o_i − o_j||    (4)

where ||·|| is the L2 norm.

We further propose to use the visual sensitivity to determine the weights of these feature differences. The model in [10] is adopted to measure the human contrast sensitivity as a function of eccentricity. The contrast sensitivity Cs(f, e) is defined as:

    Cs(f, e) = 1 / (C0 exp(a f (e + e2) / e2))    (5)

where f is the spatial frequency (cycles/degree); e is the retinal eccentricity (degrees) between blocks i and j; C0 is the minimum contrast threshold; a is the spatial frequency decay constant; and e2 is the half-resolution eccentricity. Based on the study in [10], these parameters are set to C0 = 1/64, a = 0.106 and e2 = 2.3. We let α_ij = Cs(f, e) be the weight of the difference between blocks i and j. According to (5), the weight α_ij decreases as the retinal eccentricity between blocks i and j increases, which results in a smaller contribution from D_ij^k to the final saliency value in (1).

Figure 3: Comparison of different image retargeting algorithms. The first column: the original images; the second to fifth columns: the retargeted images from [2], [4], [5], and our proposed algorithm.
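Putting (1), (2) and (5) together, the per-feature block saliency can be sketched as below. This is a simplified illustration, not the authors' code: the absolute difference stands in for D_ij^k of the intensity/color features (the orientation feature would use the Hausdorff distance of (3)-(4) instead), and the spatial frequency `f` and the blocks-to-degrees factor `deg_per_block` are assumed viewing-geometry constants that the text does not specify.

```python
import numpy as np

def contrast_sensitivity(f, e, c0=1/64, a=0.106, e2=2.3):
    """Eq. (5): contrast sensitivity as a function of spatial frequency f
    (cycles/degree) and retinal eccentricity e (degrees), after [10]."""
    return 1.0 / (c0 * np.exp(a * f * (e + e2) / e2))

def block_saliency_map(feature, f=4.0, deg_per_block=0.2):
    """Eq. (1) for one scalar feature map: the saliency of block i is the
    sum of its feature differences to all other blocks, each weighted by
    the contrast sensitivity at their mutual eccentricity.

    `feature` holds one value per 8x8 block; `deg_per_block` converts
    block distance to visual degrees (an assumed constant)."""
    h, w = feature.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pos = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    feat = feature.ravel()
    # pairwise feature differences D_ij (Eq. 2) and eccentricities e_ij
    D = np.abs(feat[:, None] - feat[None, :])
    ecc = deg_per_block * np.linalg.norm(pos[:, None] - pos[None, :], axis=2)
    alpha = contrast_sensitivity(f, ecc)   # Eq. (5) weights
    np.fill_diagonal(alpha, 0.0)           # exclude j = i
    return (alpha * D).sum(axis=1).reshape(h, w)
```

A block whose feature value deviates from its surroundings accumulates many large weighted differences and therefore receives a high saliency value, while the decaying weights keep far-away blocks from dominating.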
3.1.3 Saliency Map
After obtaining the four feature maps S_k, the final saliency map of the JPEG image is obtained by integrating them. According to (1), the four feature maps S_k can be calculated from D_ij^k (k ∈ {1, 2, 3, 4}) and α_ij in (2)-(5). We use the coherent normalization based fusion method to combine these four feature maps into the final saliency map S as follows:

    S = Σ_θ γ_θ N(θ) + β_θ Π_θ N(θ)    (6)

where N(·) is the normalization operation; θ ∈ {S_k}; γ_θ and β_θ are parameters determining the weight of each component in (6). In this paper, we set γ_θ = β_θ = 1/5.

3.2 Block-based Seam Removal Operation
Since we use 8 × 8 blocks to calculate the saliency map, the saliency map is only 1/64 the size of the original image, and each value in the final saliency map represents the saliency of one 8 × 8 block. Compared with traditional retargeting algorithms operating on an original-size saliency map, the computational cost on this smaller saliency map is much lower. Moreover, we propose a block-based seam removal operation for resizing the JPEG image. We use the 'forward energy' method [2] of seam carving to determine the optimal seam. Note that since our saliency map is at the block level, each seam indicates connected blocks rather than connected pixels in the original image. To reduce block artifacts, we propose a new method to adaptively determine how many parallel strips of a block-level seam should be removed (there are 8 one-pixel-wide strips in each 8-pixel-wide block-level seam). As the optimal block-level seams are chosen according to their visual importance, the block-level seams chosen first are the least important areas of the image. Therefore, we first remove all 8 strips of these block-level seams. For the other 8-pixel-wide block-level seams, we remove only part of their strips (fewer than 8) according to their increasing visual importance. Four different seam removal operations, i.e., removing 8, 6, 4 and 2 strips, are thus defined. The number of block-level seams removed by each operation is determined as follows:

    n_m = arg min_i |R_i − T_m|    (7)

where m ∈ {1, 2, 3, 4} denotes the operation removing 8, 6, 4 and 2 strips, respectively; n_m is the number of block-level seams removed by the m-th operation; N is the total number of block-level seams to be removed; R_i is the mean saliency value of the i-th block-level seam; and T_m is the threshold for the m-th operation. With the superior saliency detection model, the resulting artifacts are not obvious, as they are typically located away from the regions of attention, where the visual sensitivity is low.

Table 1: Comparison of different saliency detection models.

Models                 Precision  Recall  F-Measure
Hou's Model [12]       0.6152     0.2944  0.4916
Achanta's Model [13]   0.5450     0.3006  0.4288
Itti's Model [8]       0.5916     0.3784  0.4981
Our Model              0.6571     0.5961  0.6354
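One plausible reading of the seam-to-operation assignment in (7) — with the block-level seams sorted by increasing mean saliency R_i, each operation's cutoff index n_m is the seam whose mean saliency lies closest to its threshold T_m — can be sketched as follows. The threshold values are illustrative; the paper does not list the T_m it uses.

```python
import numpy as np

def seams_per_operation(R_sorted, thresholds=(0.2, 0.4, 0.6, 0.8)):
    """Eq. (7), under the reading above: for block-level seams sorted by
    increasing mean saliency R_i, return the cutoff index n_m for each
    operation m (removing 8, 6, 4 and 2 strips, respectively) as the seam
    whose mean saliency is closest to the operation's threshold T_m."""
    R = np.asarray(R_sorted, dtype=float)
    return [int(np.argmin(np.abs(R - t))) for t in thresholds]
```

Seams below the first cutoff lose all 8 strips, seams between successive cutoffs lose 6, 4 or 2 strips, so the removal becomes gentler as the seams' visual importance grows.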
4. EXPERIMENTS
Figure 4: Score comparisons of retargeted images from different algorithms.
In this section, we evaluate the overall performance of the proposed approach from two aspects: the performance of the proposed saliency detection algorithm, and the performance of the proposed image retargeting algorithm in resizing images.
Table 2: Mean scores for the 500 retargeted images from different algorithms.

Method      [2]    [4]    [5]    Our Algorithm
Mean score  3.278  3.348  3.424  3.708
4.1 Saliency Detection Evaluation
The performance of most existing image retargeting algorithms depends on the adopted visual significance maps, which should indicate the salient regions in images effectively, as demonstrated in Figure 1. In this experiment, we randomly select 500 images from the image database [11] to compare the saliency maps from our proposed model with those from other existing saliency detection models [8, 12, 13]. In this database, the salient objects in the ground-truth maps are marked by bounding boxes. As in [11], we use precision, recall and F-measure to compare the performance of the different saliency detection algorithms. Table 1 shows the comparison results, from which we can see that the precision, recall and F-measure values of our saliency detection algorithm are higher than those of the other algorithms. Our saliency detection algorithm in the compressed domain thus shows superior performance.

4.2 Image Retargeting Evaluation
In this experiment, we use the same 500 images to evaluate the performance of our proposed image retargeting algorithm. Three state-of-the-art image retargeting algorithms [2, 4, 5] are adopted for comparison. Some visual results are shown in Figure 3, from which we can observe that the retargeted images from our proposed algorithm are better than those from the other existing algorithms. A user study is performed to evaluate the performance of the different algorithms. 10 participants (3 female, 7 male) took part in this experiment. The 500 original images are used as reference images, and the experiments are conducted in a typical laboratory environment. The retargeted images from the four algorithms are displayed in random order on the screen. Mean opinion scores (1-5) are recorded from the participants, where 1 means bad viewing experience and 5 means excellent viewing experience. Each participant votes on 50 images. The statistical results for the retargeted images are shown in Table 2: the mean score of the retargeted images from our proposed algorithm is higher than those from the other algorithms. In Figure 4, we present the number of retargeted images under each score; most of the retargeted images from our proposed algorithm provide a better viewing experience for users.

5. CONCLUSIONS
In this paper, we have proposed a saliency-based image retargeting algorithm in the compressed domain. We have demonstrated the effectiveness of our novel saliency detection algorithm. Moreover, we have proposed a new block-based seam removal operation for resizing JPEG images. The user-study experiment shows that the proposed algorithm obtains better results than existing image retargeting algorithms.

6. REFERENCES
[1] S. Avidan and A. Shamir. Seam carving for content-aware image resizing. ACM TOG, 26(3), 2007.
[2] M. Rubinstein, A. Shamir, and S. Avidan. Improved seam carving for video retargeting. ACM TOG, 2008.
[3] R. Achanta and S. Susstrunk. Saliency detection for content-aware image resizing. In ICIP, 2009.
[4] L. Wolf, M. Guttmann, and D. Cohen-Or. Non-homogeneous content-driven video retargeting. In ICCV, 2007.
[5] T. Ren, Y. Liu, and G. Wu. Image retargeting based on global energy optimization. In ICME, 2009.
[6] Y. Jin, L. Liu, and Q. Wu. Nonhomogeneous scaling optimization for realtime image resizing. In CGI, 2010.
[7] S. Engel, X. Zhang, and B. Wandell. Colour tuning in human visual cortex measured with functional magnetic resonance imaging. Nature, 388(6), 1997.
[8] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE TPAMI, 20(11), 1998.
[9] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis. Springer-Verlag, 2005.
[10] W. S. Geisler and J. S. Perry. A real-time foveated multiresolution system for low-bandwidth video communication. SPIE, 3299, 1998.
[11] T. Liu, J. Sun, N. Zheng, X. Tang, and H. Y. Shum. Learning to detect a salient object. In CVPR, 2007.
[12] X. Hou and L. Zhang. Saliency detection: a spectral residual approach. In CVPR, 2007.
[13] R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk. Frequency-tuned salient region detection. In CVPR, 2009.