Signal Processing 97 (2014) 262–268
Chromatic interpolation based on anisotropy-scale-mixture statistics

Satohiro Tajima*, Ryohei Funatsu, Yukihiro Nishida

Science & Technology Research Laboratories, Japan Broadcasting Corporation, 1-10-11, Kinuta, Setagaya, Tokyo, Japan
Article history: Received 19 July 2013; Received in revised form 17 September 2013; Accepted 27 October 2013; Available online 11 November 2013

Abstract
We present an efficient approach for reconstructing full-color images from imagery data acquired using a color filter array (CFA). On the basis of our understanding of early visual processing in humans, we utilize a correlation among the multi-scale directional-anisotropy statistics observed in natural images in order to estimate the missing high-spatial-frequency chromatic signals. The directional anisotropy in the high-spatial-frequency component is efficiently and robustly estimated according to a mixture of anisotropy statistics derived from lower-frequency components. We show that, in spite of its simple implementation, our proposed method gives excellent numerical performance as well as perceptually natural output. The present approach is expected to provide a promising methodology for the on-line processing of the increasingly common high-resolution image data acquired using a single-sensor CFA. © 2013 Elsevier B.V. All rights reserved.
Keywords: Image color processing Demosaicing Color filter array Image statistics
1. Introduction

Most single-sensor digital color cameras acquire images using color filter array (CFA) technology [1], which samples only one color component at each spatial location. To obtain full-color images with such an imaging device, the missing color information must be interpolated from the observable color-mosaic data; this process is termed color demosaicing, and a wide variety of approaches have been proposed to address the problem (for recent reviews, see [2–4]). Generally, there is a trade-off between computational cost and performance in color demosaicing. For example, naïve approaches, such as nearest-neighbor or bilinear interpolation, or other interpolation techniques using linear filters (e.g., [5,6]), are computationally less demanding, but can lead to a loss of the high-spatial-frequency component in images and the

* Corresponding author. Current address: Brain Science Institute, RIKEN, 2-1, Hirosawa, Wako, Saitama, Japan. Tel.: +81 48 462 1111; fax: +81 48 467 9670. E-mail address: [email protected] (S. Tajima).

ISSN 0165-1684. © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.sigpro.2013.10.027
emergence of false color (color signals that do not exist in the original photographic subject) as an artifact. On the other hand, more sophisticated methods based on iterative and relatively heavy computation (e.g., [7–12]) often yield high-quality reconstructions of the full-color images with fewer color errors. The latter group of methods can be suitable for off-line processing or for relatively small images. However, such computationally demanding approaches are not practical for the on-line processing of large images. In particular, for the growing field of high-resolution and high-frame-rate video imaging [13,14], a demosaicing algorithm that maintains low computational complexity remains a key requirement, even when taking into account the increasing speed of digital processing [6]. To date, there are few studies on demosaicing algorithms optimized for the real-time processing of huge image data [22].

We present an efficient demosaicing method for reconstructing full-color images from imagery data acquired using a single-sensor CFA. Humans are known to recognize visual textures according to the relations among multiple spatial-frequency-selective nervous units [15–18]. Signal intensities of neighboring
spatial frequencies and directions (orientations) are strongly correlated in natural scene images (Fig. 1), and artificial images generated to preserve such correlation structures tend to be perceived as natural textures [19]. This suggests that if we interpolate (extrapolate) the missing high-spatial-frequency component in CFA data so as to preserve the multi-scale correlation among the spatial-frequency bands, the resulting image can be expected to be perceived as natural. Building on this notion, we propose a method that reconstructs the missing high-spatial-frequency chromatic signals by utilizing the multi-resolution correlation among the directional-anisotropy statistics observed in natural scene images. The directional anisotropy in the high-spatial-frequency component is estimated according to a mixture of anisotropy statistics derived from lower-frequency components. Our algorithm can be implemented as a combination of simple linear filters, which keeps the computational cost reasonable. In contrast with most conventional approaches, our method does not explicitly minimize numerical reconstruction errors such as the peak signal-to-noise ratio (PSNR), but instead aims to preserve perceptual naturalness. As we show later, despite its simple implementation, the proposed method matches the high numerical performance of other approaches based on the error-minimizing concept. Our approach is expected to provide a promising methodology for the on-line demosaicing of high-resolution color image data.
2. Method

Our method takes a Bayer-type RGB color mosaic as input data and reconstructs the full-RGB image in the following three computational steps (Fig. 2): (1) estimation of local directional anisotropy, (2) interpolation of the green component, and (3) interpolation of the red and blue components. We elaborate on the details of each of these steps below.

2.1. Estimation of local directional anisotropy

For the following interpolation calculation, we estimate the local directional characteristics. Although the estimation of local direction itself has been a widely used strategy in recent demosaicing algorithms [20–23], we focus on the local directional anisotropy in the context of perceived natural image statistics. First, we apply a set of simple linear filters to the CFA data in order to extract local directional features at multiple spatial scales, as shown in Fig. 3a–d:

H_n = h_n * CFA,  (1)

V_n = v_n * CFA,  (2)

where H_n and V_n are the horizontal and vertical feature maps at the nth spatial scale, respectively; CFA is the input color mosaic data; h_n and v_n represent the horizontal and vertical feature images at the nth spatial scale, respectively; and the
Fig. 1. Directional anisotropies computed at different spatial scales tend to correlate with each other in natural images. (a) An example of a natural image. (b) Spatial distribution of directional anisotropy values computed with oriented Gabor filters at a coarse scale (0.125 cycles/pixel). The directional anisotropy was derived by computing the ratio of the absolute value of the horizontal edge-filter output to the sum of the absolute horizontal and vertical filter outputs (anisotropy = |horizontal| / (|horizontal| + |vertical|)). (c) Spatial distribution of directional anisotropy values computed with oriented Gabor filters at a fine scale (0.25 cycles/pixel). (d) Relation between the anisotropy values computed with the coarse and fine filters.
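This cross-scale correlation is easy to reproduce numerically. The sketch below is a hypothetical illustration, not the paper's code: plain pixel-difference filters stand in for the Gabor filters, and a synthetic two-region stripe image stands in for a natural photograph; the anisotropy maps of the two scales nevertheless correlate strongly across pixels.

```python
import numpy as np

def anisotropy(img, s, c=1e-3):
    """Directional anisotropy |H| / (|H| + |V| + c) at scale s.

    Pixel differences over +/-s pixels stand in for the Gabor filters
    of Fig. 1 (an assumption made for this sketch)."""
    pad = np.pad(img, s, mode="edge")
    h = np.abs(pad[:-2 * s, s:-s] - pad[2 * s:, s:-s])  # change along y -> horizontal structure
    v = np.abs(pad[s:-s, :-2 * s] - pad[s:-s, 2 * s:])  # change along x -> vertical structure
    return h / (h + v + c)

# Synthetic stand-in for a natural image: horizontal stripes on the
# left half, vertical stripes on the right half (period 8 pixels).
H, W = 40, 40
img = np.empty((H, W))
img[:, :W // 2] = 50.0 * np.sin(2 * np.pi * np.arange(H) / 8.0)[:, None] + 100.0
img[:, W // 2:] = 50.0 * np.sin(2 * np.pi * np.arange(W // 2) / 8.0)[None, :] + 100.0

lam_fine = anisotropy(img, 1)    # fine scale
lam_coarse = anisotropy(img, 2)  # coarse scale
corr = np.corrcoef(lam_fine.ravel(), lam_coarse.ravel())[0, 1]
```

On this toy image the correlation coefficient comes out strongly positive, mirroring the trend in panel (d).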
Fig. 2. A schematic illustration of the proposed method. First, we estimate the directional-anisotropy distribution by directly applying directional linear filters to the input CFA image (first row). Based on the anisotropy estimate, the missing green signals are interpolated as a weighted linear sum of the vertically and horizontally interpolated signals (second row). Finally, the red and blue signals are reconstructed via linear interpolation of the chromatic difference signals (R–G and B–G; third and fourth rows). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 3. Estimation of the spatial distribution of directional anisotropy. (a–d) We first obtain the multi-scale directional feature maps. The figure shows the case in which two directions (vertical [a and b] and horizontal [c and d]) and two spatial scales (fine [a and c] and coarse [b and d]) were used. The symbols above maps a–d indicate the features that were used for the acquisition of the individual directional maps in the present simulation; the white, black, and gray pixels respectively indicate the positive (+1), negative (−1), and neutral (0) values of the feature parameters. (e and f) Next, we derive multi-scale anisotropy maps by combining the directional maps within each spatial scale. (g) Finally, the multi-scale anisotropy maps are unified into a single anisotropy map. (h) The true RGB image.
asterisk indicates the two-dimensional convolution. In the numerical simulation in the subsequent section, we set the filter parameters to 1, −1, or 0 for the "positive," "negative," or "neutral" values of the feature templates, respectively (also see the Fig. 3 legend). Regarding directional estimation, some recently proposed algorithms interpolate the green or other individual chromatic signals before estimating direction [21,22]. However, we empirically found that it is more effective to apply the directional linear filters directly to the input CFA image. This not only utilizes multiple-color information but also reduces computational complexity.

Next, we compute the directional anisotropy at each spatial scale by combining the directional feature maps within individual spatial scales (Fig. 3e and f):

λ_n = |H_n|^p / (|H_n|^p + |V_n|^p + c),  (3)

where λ_n refers to the anisotropy statistics at the nth spatial scale, indicating a bias toward the horizontal component by 0.5 < λ_n ≤ 1 and a vertical bias by 0 ≤ λ_n < 0.5; p is an arbitrary real-valued parameter that controls how strongly small differences between the feature intensities are emphasized; and c is a small constant added to the denominator of (3) to avoid division by zero.

Finally, the multiple-scale anisotropy maps are unified into a single anisotropy map (Fig. 3g):

λ = (1/N) Σ_{n=1}^{N} λ_n,  (4)

where N is the number of spatial scales. We refer to the value represented by λ in (4) as the anisotropy-scale-mixture statistic. As mentioned above, the rationale behind combining the anisotropy statistics over multiple spatial scales is to extract the local texture (direction) statistics in a way that simulates human visual perception. Another practical advantage of the multiple-scale representation is that it yields a robust directional estimate by averaging out noise. Optionally, an additional smoothing process can be applied to the unified anisotropy map by using a filter with a moderately sized averaging field.

2.2. Interpolation of the green component

We reconstruct the green component by using the local directional anisotropy estimated in the previous step. The missing green signal at the ith row and jth column is interpolated as a weighted linear sum of horizontally and vertically directed interpolation results:

G(i, j) = λ G_H(i, j) + (1 − λ) G_V(i, j),  (5)

where the directional interpolations G_H and G_V are obtained by applying a linear filter to the input CFA data. Any kind of linear filter can be used for the directional interpolation; for example, a second-order directional Laplacian filter [8,24], which we used in the subsequent simulation, is easily implemented and provides desirable results. Aside from a second-order Laplacian filter, we also observed that a simple directional nearest-neighbor filter, given by (1/2, 0, 1/2), yields results that are of sufficiently high appearance quality for practical use.

2.3. Interpolation of the red and blue components

Once the green component is reconstructed, an arbitrary method can be utilized for the interpolation of the remaining red and blue components. In the simulation presented in the next section, we used a method inspired by the work of Zhang and Wu [21]. To reconstruct the red component, we first compute the difference between the red and green components (at the red filter locations in the CFA data):

Δ_RG(i, j) = R(i, j) − G(i, j).  (6)
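As a concrete, hypothetical illustration of Eqs. (1)–(5), the following numpy sketch implements the anisotropy-weighted green interpolation. The realization of the ±1/0 templates as pixel differences, the spacing of the scales, and the GRBG sample layout are our assumptions for this sketch, not specifications from the text:

```python
import numpy as np

def demosaic_green(cfa, n_scales=2, p=1.0, c=1e-3):
    """Sketch of Eqs. (1)-(5): anisotropy-weighted green interpolation.

    The +/-1/0 feature templates are realised as pixel differences over
    +/-n pixels; a GRBG Bayer layout is assumed (both are illustrative)."""
    lam = np.zeros(cfa.shape, dtype=float)
    for n in range(1, n_scales + 1):
        pad = np.pad(cfa, n, mode="edge")
        Hn = np.abs(pad[:-2 * n, n:-n] - pad[2 * n:, n:-n])  # Eq. (1): h_n * CFA
        Vn = np.abs(pad[n:-n, :-2 * n] - pad[n:-n, 2 * n:])  # Eq. (2): v_n * CFA
        lam += Hn**p / (Hn**p + Vn**p + c)                   # Eq. (3)
    lam /= n_scales                                          # Eq. (4)
    # Directional nearest-neighbour interpolations (1/2, 0, 1/2):
    pad = np.pad(cfa, 1, mode="edge")
    gh = 0.5 * (pad[1:-1, :-2] + pad[1:-1, 2:])              # horizontal G_H
    gv = 0.5 * (pad[:-2, 1:-1] + pad[2:, 1:-1])              # vertical G_V
    green = lam * gh + (1.0 - lam) * gv                      # Eq. (5)
    # Keep the measured green samples (GRBG: green at even/even and odd/odd sites).
    gmask = np.zeros(cfa.shape, dtype=bool)
    gmask[0::2, 0::2] = gmask[1::2, 1::2] = True
    green[gmask] = cfa[gmask]
    return green, lam
```

On a mosaic whose underlying scene varies only along the horizontal axis, the estimated λ stays near zero, so the vertical interpolation G_V dominates and the green plane is recovered exactly.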
We fill in the difference value at each blue filter location by using its four diagonally closest difference values:

Δ_RG(i, j) = (1/4) {Δ_RG(i−1, j−1) + Δ_RG(i−1, j+1) + Δ_RG(i+1, j−1) + Δ_RG(i+1, j+1)}.  (7)

Next, we interpolate the difference value at each green filter location by using the four closest difference values (including the filled-in values given by (7)):

Δ_RG(i, j) = (1/4) {Δ_RG(i−1, j) + Δ_RG(i+1, j) + Δ_RG(i, j−1) + Δ_RG(i, j+1)}.  (8)

In the final stage of this step, we recover the red value by adding the completed difference values to the green component at each spatial location:

R(i, j) = Δ_RG(i, j) + G(i, j).  (9)
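For concreteness, the difference-plane procedure of Eqs. (6)–(9) can be sketched as follows. A GRBG Bayer layout (R at even rows/odd columns, B at odd rows/even columns) and reflective border handling are assumed here for illustration:

```python
import numpy as np

def interp_red(cfa, green):
    """Reconstruct the red plane from the CFA samples and a full green
    plane via the colour-difference interpolation of Eqs. (6)-(9)."""
    h, w = cfa.shape
    rmask = np.zeros((h, w), dtype=bool); rmask[0::2, 1::2] = True  # R sites (GRBG assumed)
    bmask = np.zeros((h, w), dtype=bool); bmask[1::2, 0::2] = True  # B sites
    d = np.zeros((h, w))
    d[rmask] = cfa[rmask] - green[rmask]                  # Eq. (6): R - G at R sites
    dp = np.pad(d, 1, mode="reflect")
    diag = 0.25 * (dp[:-2, :-2] + dp[:-2, 2:]             # Eq. (7): fill B sites from
                   + dp[2:, :-2] + dp[2:, 2:])            # the 4 diagonal R sites
    d[bmask] = diag[bmask]
    dp = np.pad(d, 1, mode="reflect")
    axial = 0.25 * (dp[:-2, 1:-1] + dp[2:, 1:-1]          # Eq. (8): fill G sites from
                    + dp[1:-1, :-2] + dp[1:-1, 2:])       # the 4 axial neighbours
    gmask = ~(rmask | bmask)
    d[gmask] = axial[gmask]
    return green + d                                      # Eq. (9): R = dRG + G
```

Swapping the roles of the R and B site masks gives the blue plane, as the text notes.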
The blue component is interpolated by exchanging R and B in the above procedure. It should also be noted that we observed only minor variations in the overall performance under typical changes to the red- and blue-interpolation procedures, including replacing the subtraction in (6) with division by the green component.

3. Results and discussion

We compared the demosaicing performance of the proposed method with conventional algorithms reported in previous studies. To quantify performance, we first prepared full-RGB images as ground truths and generated color-mosaic data from them to simulate the subsampling process of a CFA. We set the parameters of the proposed method to p = 1 and c = 0.001 in (3). Many of the conventional algorithms were implemented using the source code provided by their authors. Fig. 4 depicts the demosaicing results of the conventional and proposed methods for two example images. We quantified the demosaicing performance with two measures: the overall PSNR and the S-CIELAB color error, which simulates perceptual color differences [25]. To compute the S-CIELAB color error, we used the sRGB color primaries and assumed an image resolution of 60 pixels/deg; slight changes in these parameters did not affect the trends of the results. Fig. 4a and c indicate that the relative performances were roughly
[Fig. 4 plots: PSNR [dB] versus log S-CIELAB error for Samples 1 and 2. Methods compared: 1. Nearest neighbor; 2. Bilinear; 3. Funatsu et al. (2011); 4. Malvar et al. (2004); 5. Dubois et al. (2012); 6. Zhang & Wu (2005); 7. Gunturk et al. (2002); 8. Getreuer (2011); 9. Buades et al. (2009); 10. Hamilton & Adams (1997); Proposed.]
Fig. 4. Demosaicing performance of the conventional and proposed methods for two sample images. (a and c) Demosaicing results quantified by PSNR and averaged S-CIELAB color error [25]. The number next to each symbol indicates each demosaicing method, as shown in panels b and d. The relative performances were roughly consistent between the two sample images, although the absolute performances could depend on the image contents. For both test images, the proposed method demonstrated relatively high performance in spite of its computational simplicity.
consistent between the two sample images. For both test images, the proposed algorithm performed relatively well. Of particular note, in spite of its computational simplicity, our approach yielded results comparable to conventional methods based on relatively heavy computations, such as iterative optimization or block matching. Fig. 4b and d demonstrate that the proposed method provided perceptually natural reconstructions of the original test images, without the emergence of false colors or a severe loss of image sharpness, even for the sample
containing extremely high spatial-frequency signals (e.g., Fig. 4c). We also evaluated the performance of our method on 12 different image samples (Fig. 5). It is worth noting that our method does not explicitly minimize numerical reconstruction errors, such as those measured by PSNR, but instead aims at preserving perceptual naturalness. Nevertheless, the proposed method also showed a high PSNR and small color errors on average, comparable to other approaches based on the error-minimizing concept (e.g., [21]).
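For reference, the overall PSNR used in these comparisons can be computed as below; the 8-bit peak value and the pooling of the squared error over all pixels and channels are assumed conventions, since the text does not spell them out:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Overall peak signal-to-noise ratio in dB between two images.

    `peak` is the assumed maximum signal value (255 for 8-bit data)."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(test, float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```

For example, an image offset from its reference by a uniform error of 16 gray levels scores about 24 dB.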
Fig. 5. The proposed method demonstrated high performance on average. (Left) The 12 test images used in the evaluation of the average demosaicing performance. Six images were cropped from Kodak's standard image data set, while the remaining six were pseudo-randomly obtained from online image resources. (Right) The PSNR and color errors (log S-CIELAB) obtained for the test images. The square symbols and bars respectively represent the average and standard error of performance for each method. The number next to each symbol indicates each demosaicing method (same convention as in Fig. 4); methods 8–10 are omitted for plot visibility.
The difference between our proposed method and conventional methods becomes clear when we focus on the perceived naturalness of the demosaicing results for images that contain sharp and continuous edges. Fig. 6 compares the results for image samples with added subtitles. The proposed method provided acceptable reconstructions of the subtle shapes of the calligraphic letters, while the conventional methods tended to generate "jagged" edges in the resulting images. It should be noted that since the major difference between our method and Zhang and Wu's method [21] is the use of the scale mixture, the apparent quality of our reconstructions can be attributed to the proposed scale-mixture-based approach, which is motivated by models of human visual perception. Finally, because of its low complexity, our method runs very efficiently. As a rough index, it took only 0.010 ± 0.0008 s (mean ± s.d.) to process each of the 12 images shown in Fig. 5 (unoptimized MATLAB® code on an Intel® Core™ i7 CPU, 2.40 GHz). This was much shorter than for the methods of Zhang and Wu [21]
(0.42 ± 0.015 s) and Funatsu et al. [22] (0.12 ± 0.0015 s), and even comparable to the naïve application of the MATLAB® built-in nearest-neighbor (0.18 ± 0.0063 s) or bilinear (0.15 ± 0.0021 s) interpolation functions (TriScatteredInterp). Although these running times do not directly reflect the complexity of the processing, we consider the results significant. In addition, it is straightforward to parallelize the multi-scale computation in our method to further boost its efficiency.

4. Conclusion

We proposed a color interpolation (demosaicing) method based on anisotropy-scale-mixture statistics. Our method, inspired by the multi-scale directional representation in human visual processing, was found to provide interpolation results from CFA data with a high level of perceptual naturalness as well as good numerical performance. The approach can be easily implemented with a combination of simple linear filters, which minimizes the computational cost and lends itself to efficient hardware
[Fig. 6 panels: 2. Bilinear; 4. Malvar et al. (2004); 5. Dubois et al. (2012); 6. Zhang & Wu (2005); Proposed.]
Fig. 6. Comparison of the demosaicing results for image samples with added subtitles. The proposed method yielded a more perceptually natural reconstruction of the calligraphic shapes compared to other conventional methods.
implementation. The present approach is expected to provide a promising methodology for the on-line processing of image data acquired using a single-sensor CFA at high spatiotemporal resolution.

References

[1] B.E. Bayer, Color imaging array, US Patent 3971065, 1976.
[2] B.K. Gunturk, J. Glotzbach, Y. Altunbasak, R.W. Schafer, R.M. Mersereau, Demosaicking: color filter array interpolation, IEEE Signal Processing Magazine 22 (1) (2005) 44–54.
[3] X. Li, B. Gunturk, L. Zhang, Image demosaicing: a systematic survey, Proceedings of SPIE 6822 (2008) 1–15.
[4] D. Menon, G. Calvagno, Color image demosaicking: an overview, Signal Processing: Image Communication 26 (2011) 518–533.
[5] H. Malvar, L. He, R. Cutler, High-quality linear interpolation for demosaicing of Bayer-patterned color images, Proceedings of ICASSP III (2004) 485–488.
[6] J. Portilla, Low-complexity linear demosaicing using joint spatial-chromatic image statistics, in: Proceedings of the IEEE International Conference on Image Processing, vol. 1, 2005, pp. I-61–64.
[7] A. Buades, B. Coll, J.-M. Morel, C. Sbert, Self-similarity driven color demosaicking, IEEE Transactions on Image Processing 18 (6) (2009) 1192–1202.
[8] J.F. Hamilton Jr., J.E. Adams, Adaptive color plane interpolation in single sensor color electronic camera, US Patent 5629734, 1997.
[9] B.K. Gunturk, Y. Altunbasak, R.M. Mersereau, Color plane interpolation using alternating projections, IEEE Transactions on Image Processing 11 (9) (2002) 997–1013.
[10] P. Getreuer, Color demosaicing with contour stencils, in: Proceedings of the 17th International Conference on Digital Signal Processing, 2011, pp. 1–6.
[11] K. Hirakawa, X. Meng, An empirical Bayes EM-wavelet unification for simultaneous denoising, interpolation, and/or demosaicing, in: Proceedings of the IEEE International Conference on Image Processing, 2006, pp. 1453–1456.
[12] N.-X. Lian, L. Chang, Y.-P. Tan, V. Zagorodnov, Adaptive filtering for color filter array demosaicking, IEEE Transactions on Image Processing 16 (10) (2007) 2515–2525.
[13] ITU-R BT.2020, Parameter values for ultra-high-definition television systems for production and international programme exchange, International Telecommunication Union, Geneva, 2012.
[14] T. Yamashita, K. Masaoka, K. Ohmura, M. Emoto, Y. Nishida, M. Sugawara, 'Super Hi-Vision' video parameters for next-generation television, SMPTE Motion Imaging Journal 121 (4) (2012) 63–68.
[15] V. Billock, Neural acclimation to 1/f spatial frequency spectra in natural images transduced by the human visual system, Physica D 137 (3–4) (2000) 379–391.
[16] C.A. Párraga, T. Troscianko, D.J. Tolhurst, The effects of amplitude-spectrum statistics on foveal and peripheral discrimination of changes in natural images, and a multi-resolution model, Vision Research 45 (25–26) (2005) 3145–3168.
[17] S. Tajima, M. Okada, Discriminating natural image statistics from neuronal population codes, PLoS ONE 5 (3) (2010) e9704.
[18] S.L. Elliott, M.A. Georgeson, M.A. Webster, Response normalization and blur adaptation: data and multi-scale model, Journal of Vision 11 (2) (2011) 1–18.
[19] J. Portilla, E.P. Simoncelli, A parametric texture model based on joint statistics of complex wavelet coefficients, International Journal of Computer Vision 40 (1) (2000) 49–70.
[20] J. Hwang, H. Lee, Adaptive image interpolation based on local gradient features, IEEE Signal Processing Letters 11 (3) (2004) 359–362.
[21] L. Zhang, X. Wu, Color demosaicking via directional linear minimum mean square-error estimation, IEEE Transactions on Image Processing 14 (12) (2005) 2167–2178.
[22] R. Funatsu, T. Yamashita, K. Mitani, Y. Nojiri, Single-chip color imaging for UHDTV camera with a 33 M-pixel CMOS image sensor, in: R. Widenhorn, V. Nguyen (Eds.), Proceedings of SPIE 7875, 2011, 787502:1–11.
[23] L. Zhang, X. Wu, A. Buades, X. Li, Color demosaicking by local directional interpolation and nonlocal adaptive thresholding, Journal of Electronic Imaging 20 (2) (2011) 023016.
[24] J.E. Adams, Design of practical color filter array interpolation algorithms for digital cameras, Proceedings of SPIE 3028 (1997) 117–125.
[25] X. Zhang, B. Wandell, A spatial extension of CIELAB for digital color image reproduction, SID International Symposium Digest of Technical Papers 27 (1996) 731–734.