Saliency-based Image Retargeting in the Compressed Domain∗

Yuming Fang

School of Computer Engineering, Nanyang Technological University, Singapore

[email protected]

Zhenzhong Chen

School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore

[email protected]

Weisi Lin

School of Computer Engineering, Nanyang Technological University, Singapore

[email protected]

Chia-Wen Lin

Department of Electrical Engineering, National Tsing Hua University, Taiwan R.O.C.

[email protected]

ABSTRACT

In this paper, we propose a novel image retargeting algorithm that resizes images based on saliency information extracted from the compressed domain. First, we use the DCT coefficients in the JPEG bitstream to perform saliency detection with consideration of the human visual sensitivity. The obtained saliency information is used to determine the relative visual importance of each 8 × 8 block of the image. Furthermore, we propose a new adaptive block-level seam removal operation for connected blocks to resize the image. Because the saliency information is derived directly from the compressed domain, the proposed image retargeting algorithm effectively preserves the objects of attention and efficiently removes the less crucial regions, and therefore significantly outperforms the relevant state-of-the-art algorithms, as demonstrated by the analysis and the experiments.

Categories and Subject Descriptors
I.4 [Image Processing and Computer Vision]: General - image displays

General Terms
Algorithms

Keywords
Image retargeting, saliency detection, human visual sensitivity, compressed domain

1. INTRODUCTION

Content-aware image retargeting algorithms such as seam carving are popular and effective for resizing images. The performance of these algorithms depends greatly on the visual significance maps they use, which measure the visual importance of image pixels. The visual significance maps used in these algorithms include the gradient map, the saliency map and some high-level feature maps such as the facial map and the motion map [1-6]. These existing image retargeting algorithms are implemented in the uncompressed domain. However, images are typically stored in a compressed format such as JPEG. Thus, it is crucial to design an efficient image retargeting algorithm that works in the compressed domain. In this paper, we propose a saliency-based image retargeting algorithm in the compressed domain. First, we use the DCT coefficients in the JPEG bitstream to obtain the saliency information with consideration of the human visual sensitivity. In JPEG, an image is split into blocks of 8 × 8 pixels. Although the minimum coded unit (MCU) can be as large as 16 × 16 pixels (for the 4:2:0 component subsampling format), we perform saliency detection and retargeting at the 8 × 8 block level. After obtaining the saliency information, we determine the visual importance of each 8 × 8 block in the image. Based on this visual significance map, we design an adaptive seam removal approach for connected blocks to resize the image. Experimental results show the superior performance of the proposed saliency detection model as well as of the image retargeting algorithm.

∗Area chair: Lei Chen.  Corresponding author.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MM’11, November 28–December 1, 2011, Scottsdale, Arizona, USA. Copyright 2011 ACM 978-1-4503-0616-4/11/11 ...$10.00.

2. BACKGROUND AND MOTIVATIONS

The popular seam carving algorithm for image retargeting was proposed by Avidan et al. [1]. A seam is defined as an 8-connected path of low-energy pixels crossing the image from top to bottom or from left to right, containing exactly one pixel in each row or in each column, respectively. Seam carving reduces the width (or height) of an image by removing the least important seams, where a visual importance map determines the importance of each pixel. Building on seam carving, many other algorithms have been proposed to improve the performance of image retargeting [2,3]. Other advanced image retargeting algorithms have also been proposed. Wolf et al. [4] designed a video retargeting algorithm that resizes videos by using a visual importance map composed of local saliency, face detection and motion detection. Ren et al. [5] proposed an image retargeting algorithm based on global energy optimization, in which the saliency map and face detection are combined as the visual importance map. Jin et al. [6] presented a content-aware image resizing algorithm that warps a triangular mesh over the image, regarding salient line features and curved features as important regions.

Figure 1: Comparisons of different image retargeting algorithms: (a) the original image; (b) the gradient map; (c) the saliency map from Itti’s model [8]; (d) the saliency map from our proposed model; (e)-(h) the retargeted images from [2], [4], [5] and our proposed algorithm.

In this paper, we build a saliency detection model in the compressed domain to obtain the saliency map, which is used as the visual significance map for the proposed image retargeting. Because the saliency map is derived directly from the compressed domain, the proposed algorithm effectively preserves the objects of attention and removes the less crucial regions, as shown in Figure 1. From Figure 1, we can see that our saliency map detects the salient object more accurately than the gradient map and the saliency map from Itti’s model [8]. More information and comparisons are provided in the following sections.
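As a rough illustration of the seam definition used by seam carving [1], the following sketch finds one vertical 8-connected seam of minimum cumulative energy by dynamic programming. It is a minimal sketch, not the authors' implementation: the function name `find_vertical_seam` and the plain cumulative-energy cost are assumptions made for illustration, and the energy map can be any per-pixel (or per-block) importance map.

```python
import numpy as np


def find_vertical_seam(energy):
    """Return one column index per row for a minimum-energy 8-connected vertical seam."""
    energy = np.asarray(energy, dtype=float)
    h, w = energy.shape
    cost = energy.copy()
    # Forward pass: each cell adds the cheapest of its three upper neighbours.
    for y in range(1, h):
        prev = cost[y - 1]
        left = np.concatenate(([np.inf], prev[:-1]))
        right = np.concatenate((prev[1:], [np.inf]))
        cost[y] += np.minimum(np.minimum(left, prev), right)
    # Backward pass: start at the cheapest bottom cell and walk upwards,
    # moving at most one column per row so the seam stays 8-connected.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for y in range(h - 2, -1, -1):
        x = seam[y + 1]
        lo, hi = max(x - 1, 0), min(x + 2, w)
        seam[y] = lo + int(np.argmin(cost[y, lo:hi]))
    return seam
```

Applied to a block-level importance map, the same routine returns a seam of connected blocks rather than connected pixels.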

3. THE FRAMEWORK

The proposed approach consists of two parts. First, we propose a saliency detection model in the compressed domain, which produces a saliency map measuring the visual importance of each block in a JPEG image. Second, based on the saliency detection in the compressed domain, we design a new block-level seam carving operation for connected blocks to resize JPEG images.

3.1 Saliency Detection in the Compressed Domain

Images are usually stored in a compressed format such as JPEG. In this study, the saliency information is derived from the JPEG bitstream. The saliency detection model is built on the DCT coefficients with consideration of the human visual sensitivity. The saliency value of block i for feature k, $S_i^k$, is determined by the differences between block i and the other blocks in the image, with these differences weighted by the contrast sensitivities:

$$S_i^k = \sum_{j \neq i} \alpha_{ij} D_{ij}^k \qquad (1)$$

where $S_i^k$ is the saliency value from feature k for block i; $D_{ij}^k$ is the difference of feature k between blocks i and j; and $\alpha_{ij}$ is the weighting for the block difference $D_{ij}^k$. We use three kinds of features extracted from the JPEG bitstream to calculate the saliency values: intensity, color and orientation. The final saliency map is a combination of these feature maps.

3.1.1 Feature Extraction from JPEG Bitstream

Figure 2: DCT coefficients in one 8 × 8 block.

The DCT coefficients in one block are composed of the DC coefficient and the AC coefficients. The DC coefficient is a measure of the average energy of the 8 × 8 pixels, while the AC coefficients carry the orientation information of the block. Thus, we use the DC coefficient to extract the intensity and color features of the block, and the AC coefficients to extract its orientation feature.

Since the DC coefficients represent the luminance and chrominance information in the YCrCb color space, we first convert the DC coefficients from the YCrCb color space to the RGB color space to extract the intensity and color features. Let r, g and b denote the red, green and blue color components obtained from the DC coefficients. Four broadly-tuned color channels are generated as R = r − (g + b)/2 for red, G = g − (r + b)/2 for green, B = b − (r + g)/2 for blue, and Y = (r + g)/2 − |r − g|/2 − b for yellow. The intensity feature is calculated as I = (r + g + b)/3. The color channels are then combined into red/green and blue/yellow double-opponent channels, following the related property of the human primary visual cortex [7, 8]: $C_{rg} = R - G$ and $C_{by} = B - Y$. I, $C_{rg}$ and $C_{by}$ are the extracted intensity and color features for an 8 × 8 block in the JPEG image. Note that a 16 × 16 MCU consists of four 8 × 8 luminance blocks and two 8 × 8 chrominance blocks (one for Cb and the other for Cr). Thus, four luminance blocks share the same chrominance blocks in a typical 4:2:0 component subsampling JPEG encoding system.

We use the AC coefficients in the YCrCb color space to extract the orientation feature of each block. The Cr and Cb components represent the color information, and their AC coefficients provide little information about orientation. Thus, we use only the AC coefficients of the Y component to extract the orientation feature O. The AC coefficients used in this paper are those in the first row and the first column of Figure 2: $O = \{AC_{0i}, AC_{i0}\}$, $i \in \{1, 2, \ldots, 7\}$.
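The feature extraction of this subsection can be sketched as follows. This is a minimal sketch under stated assumptions: the function names `dc_features` and `orientation_vector` are hypothetical, the inputs r, g and b are assumed to be the DC values already converted from YCrCb to RGB, and the broadly-tuned channels follow the form used in Itti's model [8], with R = r − (g + b)/2 and the yellow channel using |r − g|.

```python
import numpy as np


def dc_features(r, g, b):
    """Return (I, C_rg, C_by) from the RGB values derived from DC coefficients."""
    r, g, b = (np.asarray(c, dtype=float) for c in (r, g, b))
    # Broadly-tuned color channels, as in Itti's model [8].
    R = r - (g + b) / 2
    G = g - (r + b) / 2
    B = b - (r + g) / 2
    Y = (r + g) / 2 - np.abs(r - g) / 2 - b
    intensity = (r + g + b) / 3
    # Red/green and blue/yellow opponent features.
    return intensity, R - G, B - Y


def orientation_vector(dct_block):
    """Collect the 14 luminance AC coefficients of the first row and first column."""
    block = np.asarray(dct_block, dtype=float)  # assumed shape (8, 8), DC at [0, 0]
    return np.concatenate((block[0, 1:], block[1:, 0]))
```

Both functions accept scalars or whole arrays of blocks for `dc_features`, so the three DC feature maps of an image can be computed in one call.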

3.1.2 Feature Maps in the Compressed Domain

As described above, the features are extracted from the DCT coefficients of each block; the DCT coefficients in one block are shown in Figure 2. In the next step, we use the intensity, color and orientation features to obtain the feature maps. Since we use the DC coefficients to calculate the intensity and color features for each block, the feature difference between blocks i and j can be calculated as:

$$D_{ij}^k = C_i^k - C_j^k \qquad (2)$$

where k = 1, 2, 3 indexes the intensity and color features (one intensity feature and two color features), and $C^k \in \{I, C_{rg}, C_{by}\}$.

As mentioned above, we use the AC coefficients from the luminance block to represent the orientation feature of each block in the JPEG image. The Hausdorff distance [9] is used here to calculate the difference between the AC coefficient vectors of two different blocks. The orientation difference $D_{ij}^4$ between two blocks i and j is computed as follows:

$$D_{ij}^4 = \max(h(O_i, O_j), h(O_j, O_i)) \qquad (3)$$

where $O_i$ and $O_j$ represent the AC coefficient vectors of blocks i and j, respectively, and $h(O_i, O_j)$ is calculated as:

$$h(O_i, O_j) = \max_{o_i \in O_i} \min_{o_j \in O_j} \| o_i - o_j \| \qquad (4)$$

where $\| \cdot \|$ is the L2 norm.

Figure 3: Comparisons of different image retargeting algorithms. The first column: the original images; the second to fifth columns: the retargeted images from [2], [4], [5], and our proposed algorithm.

We further propose to use the visual sensitivity to determine the weights for these feature differences. The model in [10] is adopted to measure the human contrast sensitivity as a function of eccentricity. The contrast sensitivity $C_s(f, e)$ is defined as:

$$C_s(f, e) = \frac{1}{C_0 \exp(a f (e + e_2)/e_2)} \qquad (5)$$

where f is the spatial frequency (cycles/degree); e is the retinal eccentricity (degrees) between blocks i and j; $C_0$ is the minimum contrast threshold; a is the spatial frequency decay constant; and $e_2$ is the half-resolution eccentricity. Based on the study in [10], these parameters are set to $C_0 = 1/64$, $a = 0.106$, and $e_2 = 2.3$. We let $\alpha_{ij} = C_s(f, e)$ be the weight for the difference between blocks i and j. According to (5), the weight $\alpha_{ij}$ depends on the retinal eccentricity between blocks i and j: it becomes smaller as the retinal eccentricity from block i increases, so that more distant blocks contribute less through $D_{ij}^k$ to the final saliency value in (1).
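The orientation difference in (3)-(4) and the contrast-sensitivity weight in (5) can be sketched as below. This is an illustrative sketch only. The source does not state how the eccentricity between two blocks is obtained or which spatial frequency f is used, so `eccentricity_deg` and its `pixels_per_degree` parameter are hypothetical, and f is left as an input. The Hausdorff distance here treats each AC coefficient as a scalar element of the set, which is also an assumption.

```python
import numpy as np

# Parameter values quoted in the text from [10].
C0, A, E2 = 1.0 / 64.0, 0.106, 2.3


def contrast_sensitivity(f, e):
    """Eq. (5): sensitivity at spatial frequency f (cycles/degree) and eccentricity e (degrees)."""
    return 1.0 / (C0 * np.exp(A * f * (e + E2) / E2))


def eccentricity_deg(block_i, block_j, pixels_per_degree=32.0, block_size=8):
    """Hypothetical mapping from a block-to-block distance to degrees of visual angle."""
    delta = np.subtract(block_i, block_j) * block_size  # offset in pixels
    return float(np.hypot(delta[0], delta[1])) / pixels_per_degree


def hausdorff(o_i, o_j):
    """Eqs. (3)-(4) for two sets of scalar AC coefficients."""
    o_i = np.asarray(o_i, dtype=float)
    o_j = np.asarray(o_j, dtype=float)
    d = np.abs(o_i[:, None] - o_j[None, :])  # pairwise |o_i - o_j|
    h_ij = d.min(axis=1).max()  # directed distance h(O_i, O_j)
    h_ji = d.min(axis=0).max()  # directed distance h(O_j, O_i)
    return float(max(h_ij, h_ji))
```

For example, `contrast_sensitivity(4.0, eccentricity_deg((0, 0), (3, 4)))` gives the weight between two blocks that are five block widths apart under the assumed viewing geometry.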

3.1.3 Saliency Map

After obtaining the four feature maps $S^k$, the final saliency map for the JPEG image can be obtained by integrating these feature maps. According to (1), the four feature maps $S^k$ are calculated from $D_{ij}^k$ ($k \in \{1, 2, 3, 4\}$) and $\alpha_{ij}$ in (2), (3) and (5). We use the coherent normalization based fusion method to combine these four feature maps into the final saliency map S as follows:

$$S = \sum_{\theta} \gamma_\theta N(\theta) + \beta_\theta \prod_{\theta} N(\theta) \qquad (6)$$

where N is the normalization operation; $\theta \in \{S^k\}$; and $\gamma_\theta$ and $\beta_\theta$ are parameters determining the weight of each component in (6). In this paper, we set $\gamma_\theta = \beta_\theta = 1/5$.

3.2 Block-based Seam Removal Operation

As we use blocks of size 8 × 8 to calculate the saliency map, the saliency map has only 1/64 as many entries as the original image has pixels, and each value in the final saliency map represents the saliency value of one 8 × 8 block. Compared with traditional retargeting algorithms operating on a saliency map of the original size, the computational cost is much lower on the smaller saliency map. Moreover, we propose a block-based seam removal operation for resizing the JPEG image. Here we use the ‘forward energy’ method [2] in seam carving to determine the optimal seam. Since our saliency map is at the block level, each seam consists of connected blocks instead of connected pixels in the original image.

To reduce block artifacts, we propose a new method to adaptively determine how many parallel strips in each block-level seam should be removed (each 8-pixel-wide block-level seam contains 8 strips). As the optimal block-level seams are chosen according to their visual importance, the block-level seams chosen first cover the least important areas of the image. Therefore, we remove all 8 strips of these block-level seams first. For the other 8-pixel-wide block-level seams, we remove only part of each seam (fewer than 8 strips) according to their increasing visual importance. Four different seam removal operations are thus defined, removing 8, 6, 4, and 2 strips, respectively. The number of block-level seams removed by each operation is determined as follows:

$$n = \arg\min_{i} |R_i - T_m| \qquad (7)$$

where $m \in \{1, 2, 3, 4\}$ denotes the operation that removes 8, 6, 4, and 2 strips from a block-level seam, respectively; n is the number of block-level seams removed in the mth operation; N is the total number of removed block-level seams; $R_i$ is the mean saliency value of the ith block-level seam; and $T_m$ is the threshold for the mth operation. With the superior saliency detection model, the artifacts are not obvious, as they are typically located away from regions of attention, where the visual sensitivity is low.

Table 1: Comparisons of different saliency detection models.

Models                  Precision   Recall   F-Measure
Hou’s Model [12]        0.6152      0.2944   0.4916
Achanta’s Model [13]    0.5450      0.3006   0.4288
Itti’s Model [8]        0.5916      0.3784   0.4981
Our Model               0.6571      0.5961   0.6354
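One possible reading of the adaptive strip removal in Section 3.2 and of (7) is sketched below. This is a sketch under several assumptions rather than the authors' procedure: the block-level seams are assumed to be listed in the order they were chosen (increasing mean saliency), the thresholds $T_m$ are hypothetical inputs because the text gives no values, the index returned by (7) is treated as the last seam handled by operation m, and seams beyond the last boundary are left untouched.

```python
import numpy as np

STRIPS = (8, 6, 4, 2)  # strips removed by operations m = 1, 2, 3, 4


def strips_per_seam(mean_saliency, thresholds):
    """Assign a number of strips to remove to each block-level seam."""
    r = np.asarray(mean_saliency, dtype=float)  # R_i, assumed sorted ascending
    # Eq. (7): for each operation, the seam whose mean saliency is closest to T_m.
    bounds = [int(np.argmin(np.abs(r - t))) for t in thresholds]
    strips = np.zeros(len(r), dtype=int)
    start = 0
    for n, s in zip(bounds, STRIPS):
        end = max(n + 1, start)  # keep the ranges non-overlapping
        strips[start:end] = s
        start = end
    return strips
```

With six seams of mean saliency 0.1 to 0.6 and hypothetical thresholds (0.2, 0.3, 0.5, 0.6), the two least salient seams lose all 8 strips and the most salient seam loses only 2, so the total width reduction is the sum of the returned array.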

4. EXPERIMENTS

In this section, we evaluate the overall performance of the proposed approach from two aspects: the performance of the proposed saliency detection algorithm, and the performance of the proposed image retargeting algorithm in resizing images.

4.1 Saliency Detection Evaluation

The performance of most existing image retargeting algorithms depends on the adopted visual significance maps, which should indicate the salient regions in images effectively, as demonstrated in Figure 1. In this experiment, we randomly select 500 images from the image database [11] to compare the saliency map from our proposed model with those from other existing saliency detection models [8, 12, 13]. The salient objects in the ground-truth maps of this database are marked by bounding boxes. Similar to [11], we use precision, recall and F-measure to compare the performance of different saliency detection algorithms. Table 1 shows the comparison results, in which the precision, recall and F-measure values of our saliency detection algorithm are higher than those of the other algorithms. Our saliency detection algorithm in the compressed domain thus shows superior performance.

4.2 Image Retargeting Evaluation

In this experiment, we use the same 500 images to evaluate the performance of our proposed image retargeting algorithm. Three state-of-the-art image retargeting algorithms [2, 4, 5] are adopted for comparison. Some visual results are shown in Figure 3. We can observe that the retargeted images from our proposed algorithm are better than those from the other existing algorithms. A user study is performed to evaluate the performance of the different algorithms. Ten participants (three female and seven male) took part in this experiment. The 500 original images are used as reference images. The experiments are conducted in a typical laboratory environment. All retargeted images from the four algorithms are displayed on the screen in random order. Mean opinion scores (1-5) are recorded by the participants, where 1 means a bad viewing experience and 5 means an excellent viewing experience. Each participant votes for 50 images. The statistical results for the retargeted images are shown in Table 2. We can see that the mean score of the retargeted images from our proposed algorithm is higher than those from the other algorithms. In Figure 4, we present the number of retargeted images under each score. Most of the retargeted images from our proposed algorithm provide a better viewing experience for users.

Table 2: Mean scores for the 500 retargeted images from different algorithms.

        [2]     [4]     [5]     Our Algorithm
Mean    3.278   3.348   3.424   3.708

Figure 4: Score comparisons of retargeted images from different algorithms.

5. CONCLUSIONS

In this paper, we have proposed a saliency-based image retargeting algorithm in the compressed domain. We demonstrate the effectiveness of our novel saliency detection algorithm. Moreover, we propose a new block-based operation for resizing JPEG images. The user-study experiment shows that the proposed algorithm obtains better results than existing ones in retargeting images.

6. REFERENCES

[1] S. Avidan and A. Shamir. Seam carving for content-aware image resizing. ACM TOG, 26(3), 2007.
[2] M. Rubinstein, A. Shamir, and S. Avidan. Improved seam carving for video retargeting. ACM TOG, 2008.
[3] R. Achanta and S. Susstrunk. Saliency detection for content-aware image resizing. In ICIP, 2009.
[4] L. Wolf, M. Guttmann, and D. Cohen-Or. Non-homogeneous content-driven video retargeting. In ICCV, 2007.
[5] T. Ren, Y. Liu, and G. Wu. Image retargeting based on global energy optimization. In ICME, 2009.
[6] Y. Jin, L. Liu, and Q. Wu. Nonhomogeneous scaling optimization for realtime image resizing. In CGI, 2010.
[7] S. Engel, X. Zhang, and B. Wandell. Colour tuning in human visual cortex measured with functional magnetic resonance imaging. Nature, 388(6), 1997.
[8] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE TPAMI, 20(11), 1998.
[9] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis. Springer-Verlag, 2005.
[10] W. S. Geisler and J. S. Perry. A real-time foveated multiresolution system for low-bandwidth video communication. SPIE, 3299, 1998.
[11] T. Liu, J. Sun, N. Zheng, X. Tang, and H. Y. Shum. Learning to detect a salient object. In CVPR, 2007.
[12] X. Hou and L. Zhang. Saliency detection: a spectral residual approach. In CVPR, 2007.
[13] R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk. Frequency-tuned salient region detection. In CVPR, 2009.
