

Mobile Camera Identification Using Demosaicing Features

Hong Cao
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
[email protected]

Alex C. Kot
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
[email protected]

Abstract—Mobile cameras are typically low-end cameras equipped on handheld devices such as personal digital assistants and cellular phones. The fast proliferation of these mobile cameras has raised concerns about the origin and integrity of their output images. In this paper, we blindly identify the source mobile camera by combining 3 types of demosaicing features extracted from a test image. After Eigenfeature regularization and feature reduction, comparison results show that our Eigen demosaicing features perform significantly better than several conventional features in distinguishing 9 mobile cameras of dissimilar models based on cropped image blocks. When cameras of the same and very similar models are included, in 15-camera identification, our Eigen demosaicing features achieve excellent classification accuracies in distinguishing cameras of dissimilar models, while confusion expectedly arises among cameras of the same or very similar models.

I. INTRODUCTION

Since the first commercial camera phone (J-SH04 by Sharp) was made in 2000, mobile cameras incorporated in handheld devices have experienced tremendous growth, to the point that over 90% of available handsets now have mobile cameras attached [1]. Though the quality of these mobile cameras is hardly comparable to that of digital still cameras (DSC), their resolution and image quality have been steadily improving, so that the technological gap between mobile cameras and DSCs is constantly narrowing. The application prospect of these mobile cameras is highly promising: their photos are popularly shared on the Internet (e.g. photo blogs), and newsworthy pictures can even be accepted and published in news reports to make great impact. This has also brought up serious concerns on the origin and integrity of these low-end mobile pictures.

Recently, passive image forensics has been extensively studied, mainly for high-quality photos from DSCs. Previous works have attempted to detect various intrinsic image regularities, mainly for the tasks of source identification, tampering discovery and steganalysis. Though mobile cameras in general share a similar processing pipeline, shown in Fig. 1(a), with DSCs, it is worth noting that a mobile camera is usually about 10 times cheaper, 10 times smaller in size (both sensor chip and camera head) and consumes 10 times less power than a DSC [1]. Under such constraints, capturing pictures of decent quality requires special software processing including strong denoising, crude demosaicing and white balancing algorithms and JPEG compression. As some special processes may render certain unique high-frequency regularities undetectable, not all passive image forensic methods for DSCs can be readily extended to mobile cameras.

Figure 1. (a) A Typical Camera Processing Pipeline; (b) 4 Possible Bayer Color Filter Arrays (CFA), where the Subscripts 1, 2, …, 4 Denote the Relative Position in the 2×2 Periodic CFA Lattice

978-1-4244-5309-2/10/$26.00 ©2010 IEEE

Several existing forensic works address mobile photos. By extending the identification technique developed for DSCs in [7], Alles et al. [8] detect the PRNU sensor noise to identify individual mobile cameras and webcams from their output images. McKay et al. [4] compute both color interpolation coefficients (CIC) and some noise statistics (NStats) features to identify image acquisition devices including cellular cameras and scanners. Tsai et al. [9] combine several sets of statistical image features including color features, image quality metrics (IQM) and wavelet features to identify both DSCs and mobile cameras. Similar to [9], Celiktutan et al. [12] use various fusion methods to combine 3 sets of statistical features including IQM, multi-scale wavelet (MSW) and binary similarity (BS) features to distinguish cell-phone models. With 192 features selected using sequential forward feature selection, their work achieves 95.1% accuracy in identifying 16 mobile cameras including one pair of mobile cameras of the same model.

Our previous work [5, 6, 15] accurately estimates the underlying demosaicing formulas from DSC photos through a precise reverse classification and a partial second-order image derivative correlation model. Since mobile cameras are also single-sensor based and also rely on color filtering and demosaicing techniques to economically produce color, we extend our previous work to compute 3 types of demosaicing features, which characterize both the applied demosaicing algorithm and the post-demosaicing processing [15]. Through Eigenfeature regularization and feature reduction, our 20-dimensional Eigen features show superior results in identifying the source mobile cameras based on cropped image blocks, as compared to the same number of Eigen features extracted from existing statistical features.

The rest of this paper is organized as follows. The details of computing our proposed demosaicing features are described in Section II. Section III shows experimental results and discussions. Section IV concludes this paper.

II. DEMOSAICING FEATURE EXTRACTION

To estimate the demosaicing weights for a given color image $I$, we first separate the demosaiced samples from the sensor samples. Since the Bayer CFAs in Fig. 1(b) have been dominantly used commercially [2], with the assumption that the first Bayer CFA in Fig. 1(b) is the correct underlying CFA, we can write $I$ and the RAW samples $S$ as

$$
I = \begin{bmatrix} \{r,G,B\}_{11} & \{R,g,B\}_{12} & \cdots \\ \{R,g,B\}_{21} & \{R,G,b\}_{22} & \\ \vdots & & \end{bmatrix}, \qquad
S = \begin{bmatrix} r_{11} & g_{12} & \cdots \\ g_{21} & b_{22} & \\ \vdots & & \end{bmatrix} \tag{1}
$$

where capital letters represent the demosaiced samples and lowercase letters the sensor samples. Then, by following a 2-pass reverse classification in [6, 15], we partition all demosaiced samples from 3 color channels into 16 categories with known demosaicing axes. We assume that the same demosaicing formula has been applied for each category. For the $k$th sample $x_{ik}$ of the $i$th category of demosaiced samples $\{x_{ik}\}$, we express the prediction error $e_{ik}$ as

$$
e_{ik} = x''_{ik} - {\mathbf{s}''_{ik}}^{T}\,\mathbf{w}_i \tag{2}
$$

where $1 \le i \le 16$ and $1 \le k \le K$, $K$ denotes the size of the $i$th category, $x''_{ik}$ is the partial second-order derivative of $x_{ik}$ computed from $I$ along the demosaicing axis of the $i$th category, $\mathbf{s}''_{ik} \in \mathbb{R}^{m \times 1}$ is the vector of the corresponding support partial second-order derivatives computed from $S$, $\mathbf{w}_i \in \mathbb{R}^{m \times 1}$ is the weight vector representing the underlying demosaicing formula of the $i$th category and $m$ is the selected number of weights. Since for each sample in the $i$th category we can write a similar linear equation as in (2), by organizing all these equations into a matrix form, we have

$$
\mathbf{e}_i = \mathbf{x}_i - Q_i \mathbf{w}_i \tag{3}
$$

where

$$
\mathbf{e}_i = \begin{bmatrix} e_{i1} \\ \vdots \\ e_{iK} \end{bmatrix}, \quad
\mathbf{x}_i = \begin{bmatrix} x''_{i1} \\ \vdots \\ x''_{iK} \end{bmatrix}, \quad
Q_i = \begin{bmatrix} {\mathbf{s}''_{i1}}^{T} \\ \vdots \\ {\mathbf{s}''_{iK}}^{T} \end{bmatrix} \quad \text{and} \quad
\mathbf{w}_i = \begin{bmatrix} w_{i1} \\ \vdots \\ w_{im} \end{bmatrix}.
$$

Since $K \gg m$, the optimal weights $\mathbf{w}_i$ can be solved as the regularized least squares solution below,

$$
\min\left( \|\mathbf{e}_i\|^2 + \lambda \|\mathbf{w}_i\|^2 \right) \;\Rightarrow\; \mathbf{w}_i = \left( Q_i^{T} Q_i + \lambda L \right)^{-1} Q_i^{T} \mathbf{x}_i \tag{4}
$$

where $\lambda$ is a small empirical regularization constant to avoid overfitting and $L \in \mathbb{R}^{m \times m}$ denotes an identity matrix. With the weights solved for each category, we readily collect 3 types of demosaicing features as follows.

Weights: As a representation of the underlying demosaicing formula, the $m$ weights of the $i$th category are used as features. The number of weights can vary from one category to another. For all 16 categories, we derive 312 weights.

Error cumulants (EC): The magnitude of the absolute prediction error indicates how well our estimated weights match the underlying demosaicing formula. For all 16 demosaicing categories, we compute 64 error cumulants including mean, variance, skewness and kurtosis.

Normalized group sizes (NGS): The reverse classification technique [6] partitions all demosaiced samples into 16 categories, which best reveal the actual demosaicing grouping. Since the distribution of the demosaiced samples is a good indication of the adopted demosaicing grouping, we compute 8 such features as the percentages of demosaiced samples distributed to 8 selected categories.

Since the other 3 shifted Bayer CFAs in Fig. 1(b) are almost equally likely to be adopted in practice, we also separate the demosaiced samples and sensor samples accordingly and repeat the above feature extraction process 3 more times. As a result, we derive a total of 1248 weights, 256 ECs and 32 NGSs. Though the dimensionality increases 4-fold, our features automatically include the useful information of the CFA and rich relative information for the 4 channels of demosaicing features. It is worth noting that other non-Bayer CFAs such as CMY and SuperCCD share some properties with Bayer CFAs, such as the 2-by-2 periodicity and mosaic lattice. Our technique based on Bayer CFAs is therefore also likely to capture the unique demosaicing characteristics of these non-Bayer CFAs. Moreover, we observe that our extracted features are distorted uniquely by different post-demosaicing processing. Our features can thus be used to distinguish not only various source demosaicing algorithms but also different post-demosaicing processing pipelines when the demosaicing algorithm is fixed. Therefore, our proposed features contain the fingerprint which is unique to the entire camera software processing pipeline.

In view of the high feature dimensionality, we further perform the Eigenfeature regularization and extraction (ERE) in [14] to derive a compact set of discriminant Eigen features. By regularizing the unreliable part of the Eigen spectrum computed from a small training set, followed by a whitening transformation and principal component analysis (PCA) feature reduction, the ERE method is effective in selecting a highly discriminant low-dimensional subspace from a very high-dimensional Eigenfeature space.

Figure 2. Proposed Identification Framework

III. EXPERIMENTAL RESULTS AND DISCUSSIONS
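As a concrete illustration of the weight estimation in (2)-(4) and of the error cumulants described in Section II, the following is a minimal NumPy sketch for a single demosaicing category. It is not the authors' implementation: the function names, the value of the regularization constant `lam`, the synthetic data, and the use of standardized moments for skewness and kurtosis are all assumptions made for illustration.

```python
import numpy as np


def solve_weights(Q, x, lam=1e-3):
    """Regularized least squares of (4): w = (Q^T Q + lam * I)^-1 Q^T x.

    Q is the K-by-m matrix of support second-order derivatives and x the
    length-K vector of second-order derivatives of the demosaiced samples.
    The value of lam is a hypothetical choice, not taken from the paper.
    """
    m = Q.shape[1]
    # Solving the linear system avoids forming the explicit inverse.
    return np.linalg.solve(Q.T @ Q + lam * np.eye(m), Q.T @ x)


def error_cumulants(Q, x, w):
    """Mean, variance, skewness and kurtosis of the absolute prediction error.

    Assumption: skewness and kurtosis are computed as standardized third and
    fourth moments; the paper does not spell out the exact definition.
    """
    abs_err = np.abs(x - Q @ w)
    mean = abs_err.mean()
    var = abs_err.var()
    std = np.sqrt(var)
    if std == 0.0:
        return np.array([mean, 0.0, 0.0, 0.0])
    z = (abs_err - mean) / std
    return np.array([mean, var, np.mean(z ** 3), np.mean(z ** 4)])


# Synthetic stand-in for one category: K samples, m weights, with K >> m.
rng = np.random.default_rng(0)
K, m = 2000, 8
w_true = rng.normal(size=m)
Q = rng.normal(size=(K, m))
x = Q @ w_true + 0.01 * rng.normal(size=K)

w_hat = solve_weights(Q, x)
ec = error_cumulants(Q, x, w_hat)
print("max weight error:", np.max(np.abs(w_hat - w_true)))
print("error cumulants:", ec)
```

In this sketch the estimated weights and the four cumulants of one category would be concatenated with those of the other categories to form the Weights and EC feature vectors.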

To test the efficacy of our proposed features, we have set up a mobile photo set containing 1500 photos from a total of 15 mobile cameras in Table I, where cameras of identical or very close models are present. These photos are collected from 15 contributors by their mobile cameras and all photos are direct camera output stored in the default JPEG format. The default photo sizes of different camera models vary from 1280×960 to 2048×1536. These photos cover a large variety of common indoor and outdoor scenes. We crop 4 non-overlapping blocks of about 512×512 at fixed locations from each photo to get 400 blocks per camera. For each camera, we randomly apportion the images into a training set of 300 blocks cropped from 75 mobile photos and a test set of the remaining blocks. The random apportion is repeated another 4 times so that we have 5 different combinations of the training and test image sets.

TABLE I. MOBILE CAMERAS USED

ID    Brand            Model
N1    Nokia            3230
N2    Nokia            3250
N3    Nokia            5300
N4    Nokia            6280
N5    Nokia            7390
N6    Nokia            7390
N7    Nokia            N73
N8    Nokia            N73
N9    Nokia            N73
S1    Sony Ericsson    K750
S2    Sony Ericsson    K750
S3    Sony Ericsson    K800
S4    Sony Ericsson    W800
L1    LG               KG320
O1    O2               XDA Atom
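The photo-level apportioning described above, in which all 4 blocks of a photo fall on the same side of the split, can be sketched as follows. This is only an illustration under assumed conventions: the block indexing scheme (photo index times 4 plus block position), the function name and the seeds are hypothetical and not taken from the paper.

```python
import numpy as np


def split_blocks_by_photo(n_photos=100, blocks_per_photo=4,
                          n_train_photos=75, seed=0):
    """Randomly split one camera's blocks into training and test sets.

    Photos, not blocks, are shuffled, so blocks cropped from the same photo
    never appear in both sets. Block ids follow the assumed convention
    block_id = photo_id * blocks_per_photo + position.
    """
    rng = np.random.default_rng(seed)
    photo_order = rng.permutation(n_photos)
    train_photos = photo_order[:n_train_photos]
    test_photos = photo_order[n_train_photos:]
    offsets = np.arange(blocks_per_photo)
    train_blocks = (train_photos[:, None] * blocks_per_photo + offsets).ravel()
    test_blocks = (test_photos[:, None] * blocks_per_photo + offsets).ravel()
    return train_blocks, test_blocks


# Five random apportionings per camera, as in the experimental setup.
splits = [split_blocks_by_photo(seed=s) for s in range(5)]
for run, (train_blocks, test_blocks) in enumerate(splits):
    print(run, len(train_blocks), len(test_blocks))
```

With 100 photos per camera and 4 blocks per photo, each run yields 300 training blocks from 75 photos and 100 test blocks from the remaining 25 photos.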

A. Comparison for 9-Camera Model Identification

Our proposed features are associated with camera software processing, which is known to be similar for mobile cameras of the same model. In this experiment, we test our proposed features for identification of 9 cameras of dissimilar models, namely N1, N3, N4, N5, N7, S1, S3, L1 and O1 in Table I. By applying the same ERE feature reduction and based on the same number of reduced Eigen features, we compare in Fig. 3 our demosaicing features with several state-of-the-art statistical image forensic features in the literature for mobile camera identification. These feature sets include multi-scale wavelet (MSW) features [11, 12], binary similarity measures (BS) [12], image quality metrics (IQM) [12], color interpolation coefficients (CIC) [3, 4], noise statistics (NStats) [4], the combination of MSW, BS and IQM [12] and the combination of CIC and NStats [4]. For the BS features, we compute a total of 432 features based on the description in [12]. Though this number is less than the 480 BS features used in [12], our BS feature set still covers the majority of the BS features.

[Figure 3(b) bar chart, vertical axis: test accuracy from 50% to 100%. Feature sets shown: NGS (32), EC (256), Weights (1248), MSW (216), BS (432), IQM (40), CIC (441), NStats (60), MSW+BS+IQM, CIC+NStats, Proposed, All]

Figure 3. Comparison of Various Feature Sets in 9-Cam Identification with the Number Inside Parentheses Indicating Dimension of the Original Feature Set; (a) Average Error Rates Vs Number of Eigen Features (1NN Classifier with Cosine Dissimilarity Measure is Used); (b) Average Test Accuracy Achieved using PSVM Classifier with 20 Eigen Features

With a simple first nearest neighbor (1NN) classifier based on the cosine dissimilarity measure, the comparison result in Fig. 3(a) shows that our proposed combination of Weights, EC and NGS works well and that, for all feature sets, the error rate stabilizes at a low level after selecting about 8 Eigen features. With 20 Eigen features and a more sophisticated probabilistic support vector machine (PSVM) classifier [13], the source identification accuracy is compared in Fig. 3(b). There, our Weights features are the best-performing individual type of features, with a test accuracy of 95.2%, which is 2.3% higher than that of the CIC features, 8.5% higher than MSW and 11.8% higher than BS. Our proposed combination of demosaicing features achieves an average test accuracy of 99.0%, which is 6.7% higher than the combination of MSW, BS and IQM features and 5.7% higher than the combination of CIC and NStats features. It should be noted that the test accuracy achieved by our combined demosaicing features is significantly better than that of the Weights features alone. This shows that our EC and NGS features complement the Weights features well. If we combine all individual feature sets including MSW, BS, IQM, NGS, EC, Weights, CIC and NStats and apply the same ERE feature reduction to 20 Eigen features, the obtained test accuracy (the last bar in Fig. 3(b)) is almost identical to that of the proposed combination of demosaicing features. This suggests that our good results can hardly be further improved by adding the several conventional feature sets. Under the assumption that the ERE process and the PSVM classifier have no bias, our proposed demosaicing features outperform the conventional forensic features in the context of mobile camera classification.

[Figure 4 confusion matrix: rows and columns are the 15 cameras N1 to N9 (Nokia), S1 to S4 (Sony Ericsson), L1 (LG) and O1 (O2) of Table I; Ave. Model Accuracy = 94.8]

Figure 4. Confusion Matrix (%) of Source Identification for 15 Mobile Cameras, where the Empty Fields Indicate Zeros, Bold-Face Single Brackets Highlight Cameras of the Same Models and the Shaded Fields are the Identification Results among Cameras of the Same Models
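The 1NN comparison with cosine dissimilarity used for Fig. 3(a), and the reading of a confusion matrix such as Fig. 4 at camera level versus model level, can be sketched as below. This is a toy illustration, not the experimental code: the synthetic 3-dimensional features stand in for the Eigen features, and the rule used in `grouped_accuracy` (a prediction counts as correct when it falls on any camera of the same group) is an assumed way of computing model-level or brand-level accuracy.

```python
import numpy as np


def cosine_dissimilarity(A, B):
    """Pairwise 1 - cosine similarity between rows of A and rows of B."""
    A_n = A / np.linalg.norm(A, axis=1, keepdims=True)
    B_n = B / np.linalg.norm(B, axis=1, keepdims=True)
    return 1.0 - A_n @ B_n.T


def predict_1nn(train_X, train_y, test_X):
    """Label each test sample with the label of its nearest training sample."""
    d = cosine_dissimilarity(test_X, train_X)
    return train_y[np.argmin(d, axis=1)]


def confusion_matrix_percent(y_true, y_pred, n_classes):
    """Row-normalized confusion matrix in percent; row i is the true class."""
    cm = np.zeros((n_classes, n_classes))
    np.add.at(cm, (y_true, y_pred), 1)
    return 100.0 * cm / cm.sum(axis=1, keepdims=True)


def grouped_accuracy(cm_percent, groups):
    """Average accuracy when cameras in the same group count as correct."""
    groups = np.asarray(groups)
    same_group = groups[:, None] == groups[None, :]
    return float((cm_percent * same_group).sum(axis=1).mean())


# Toy data: 3 well-separated classes in a 3-dimensional feature space.
rng = np.random.default_rng(0)
n_classes, per_class, dim = 3, 20, 3
centers = np.eye(n_classes, dim)


def sample(n):
    y = np.repeat(np.arange(n_classes), n)
    X = centers[y] + 0.05 * rng.normal(size=(y.size, dim))
    return X, y


train_X, train_y = sample(per_class)
test_X, test_y = sample(per_class)
pred = predict_1nn(train_X, train_y, test_X)
cm = confusion_matrix_percent(test_y, pred, n_classes)
camera_acc = float(np.mean(np.diag(cm)))

# Hand-made matrix: cameras 0 and 1 share a model and confuse with each other.
toy_cm = np.array([[80.0, 20.0, 0.0],
                   [10.0, 90.0, 0.0],
                   [0.0, 0.0, 100.0]])
per_camera = grouped_accuracy(toy_cm, [0, 1, 2])
per_model = grouped_accuracy(toy_cm, [0, 0, 1])
print(camera_acc, per_camera, per_model)
```

In the hand-made matrix, confusion confined to two cameras of the same model lowers the camera-level accuracy but leaves the model-level accuracy unaffected, which mirrors the pattern discussed for cameras of identical models.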

B. 15-Camera Model Identification

In this experiment, we derive 20 Eigen features from our combined demosaicing features and use the PSVM classifier to identify all 15 mobile cameras listed in Table I. The average results are presented in Fig. 4 as a confusion matrix, whose (i,j)th element is the probability of identifying images from the ith input camera as the jth camera. The result demonstrates that our demosaicing features are highly effective in distinguishing different mobile camera brands as well as dissimilar models of the same brand. The average test accuracies achieved in identifying the 4 brands and the 11 models are 99.4% and 94.8%, respectively, which is highly promising. For cameras of the same model or very close models, since the software processing is identical or very similar, the classifier expectedly confuses them with each other to an extent that depends on the camera model.

IV. CONCLUSION

In this paper, we propose to combine 3 types of demosaicing features for blind identification of mobile cameras. Comparison results for 9-cam model identification show that our combined demosaicing features achieve an average test accuracy of 99%, which clearly outperforms several conventional statistical image forensic features and their suggested combinations. This suggests that our proposed demosaicing features are an excellent choice for blind identification of low-end mobile cameras from their output images. When numerous cameras of the same model and very close models are included, in 15-cam identification, our average test accuracy in identifying the individual cameras is still as high as 89.4%. In this case, we find from the obtained confusion matrix that, to some extent, cameras of the same model expectedly tend to be confused with each other.

REFERENCES

[1] F. Mosleh (Kodak), “Cameras in Handsets Evolving from Novelty to DSC Performance, Despite Constraints,” Embedded.com, 2008
[2] X. Li, B. Gunturk and L. Zhang, “Image Demosaicing: a Systematic Survey,” in Proc. of SPIE, vol. 6822, 2008
[3] A. Swaminathan, M. Wu and K. J. R. Liu, “Nonintrusive Component Forensics of Visual Sensors Using Output Images,” IEEE Trans. on Information Forensics and Security, vol. 2-1, pp. 91-106, 2007
[4] C. McKay, A. Swaminathan, H. Gou and M. Wu, “Image Acquisition Forensics: Forensic Analysis to Identify Imaging Source,” Proc. of ICASSP, pp. 1657-1660, 2008
[5] H. Cao and A. C. Kot, “A Generalized Model for Detection of Demosaicing Characteristics,” Proc. of ICME, pp. 1513-1516, 2008
[6] H. Cao and A. C. Kot, “Accurate Detection of Demosaicing Regularity from Output Images,” Proc. of ISCAS, pp. 497-500, 2009
[7] J. Lukas, J. Fridrich and M. Goljan, “Digital Camera Identification from Sensor Pattern Noise,” IEEE Trans. on Information Forensics and Security, vol. 1-2, pp. 205-214, 2006
[8] E. J. Alles, Z. J. M. H. Geradts and C. J. Veenman, “Source Camera Identification for Low Resolution Heavily Compressed Images,” in Proc. of ICCSA, pp. 557-567, 2008
[9] M.-J. Tsai, C.-L. Lai and J. Liu, “Camera/Mobile Phone Source Identification for Digital Forensics,” Proc. of ICASSP, vol. 2, pp. 221-224, 2007
[10] I. Avcibas, M. Kharrazi, N. Memon and B. Sankur, “Image Steganalysis with Binary Similarity Measures,” EURASIP Journal on Applied Signal Processing, 17, pp. 2749-2757, 2005
[11] S. Lyu and H. Farid, “How Realistic is Photorealistic?,” IEEE Trans. on Signal Processing, vol. 53-2, pp. 845-850, 2005
[12] O. Celiktutan, B. Sankur and I. Avcibas, “Blind Identification of Source Cell-Phone Model,” IEEE Trans. on Information Forensics and Security, vol. 3-3, pp. 553-566, 2008
[13] C.-W. Hsu, C.-C. Chang and C.-J. Lin, “A Practical Guide to Support Vector Classification,” 2008
[14] X. Jiang, B. Mandal and A. C. Kot, “Eigenfeature Regularization and Extraction in Face Recognition,” IEEE Trans. on PAMI, vol. 30-3, pp. 383-394, 2008
[15] H. Cao and A. C. Kot, “Accurate Detection of Demosaicing Regularity for Digital Image Forensics,” accepted in IEEE Trans. on Information Forensics and Security, 2009
