High capacity audio watermarking based on wavelet transform
D. Zanganeh and H. Sajedi

Abstract—This paper proposes a novel high-capacity, robust audio watermarking algorithm that embeds data in the high frequency band of the wavelet decomposition, where the human auditory system (HAS) is not very sensitive to alteration. The main idea is to embed two secret bits into a single wavelet coefficient. The experimental results show that the method has very high capacity (about 8 kbps) without significant perceptual distortion (SNR about 21 dB) and is robust against common signal processing attacks such as high-pass filtering, echo, resampling, amplification and MPEG compression (MP3).

Index Terms—Audio watermarking, wavelet transform, high capacity.

1. INTRODUCTION
By virtue of recent advances in computer and telecommunication networks, multimedia files are produced, stored and distributed easily across the globe. However, the ownership and copyright of multimedia files are not usually protected. Digital watermarking has been proposed in recent years as a means of protecting multimedia content from intellectual piracy. This is achieved by modifying the original content, inserting a signature which can be extracted, when necessary, as a proof of ownership. Indeed, many effective digital image and video watermarking algorithms have been proposed and implemented at a commercial scale. However, because the human auditory system is far more complex and sensitive than the human visual system, few algorithms have been proposed for audio watermarking.

Watermarking is a technique in which copyright information (the watermark signal) is embedded imperceptibly in the host signal in a way that does not interfere with its normal usage. The watermark signal must be readily extractable from the watermarked signal so that the copyright owner can be identified unambiguously. Several important issues should be considered in watermarking systems: perceptual transparency, robustness against attacks, and capacity. A trade-off between the data rate of the watermark and its robustness against attacks is necessary: it is not possible to attain high robustness and a high embedding capacity at the same time. Furthermore, there are two kinds of detection in watermarking systems: blind detection, which has no knowledge of the original signal, and informed (non-blind) detection, which uses the original signal for extracting the watermark signal. Blind detection is beneficial when the original signal is not easily accessible.

According to their implementation process, audio watermarking algorithms can be divided into time domain methods and transform domain methods. In time domain schemes, the hidden bits are embedded directly into the time signal samples. Time domain watermarking systems are usually weaker against signal processing attacks than their transform domain counterparts, and the robustness of transform domain methods is commonly considered better than that of time domain methods [15]. Phase modulation [1] and echo hiding [2] are well-known methods in the time domain. In frequency domain watermarking, one of the usual transforms is applied to the signal, such as the Discrete Fourier Transform (DFT), the Discrete Cosine Transform (DCT), the Modified Discrete Cosine Transform (MDCT) [5] or the Wavelet Transform (WT) [4, 6, 7, 9, 10, 11], and the hidden bits are embedded into the resulting transform coefficients.

In this paper, we discuss audio watermark embedding based on the characteristics of human hearing, using the wavelet transform. The wavelet transform has many advantages in audio signal processing. Its inherent frequency multiresolution and logarithmic decomposition of frequency bands resemble the human perception of frequencies, since the decomposition mimics the critical band structure of the HAS. In the proposed scheme, the last high frequency band of the third level wavelet decomposition (cd3) is used for embedding. This band of wavelet samples is divided into frames, and the average of the absolute values of each frame's samples is computed [9]. Then, in the embedding process, all wavelet coefficients are scanned, and each coefficient that satisfies a given condition is used for embedding: two secret bits are embedded into a single wavelet coefficient, and the next two secret bits are embedded into the next suitable coefficient. The experimental results show that this scheme achieves an excellent capacity together with transparency and robustness against common signal processing attacks.

In Section 2, we give a brief description of the discrete wavelet transform. In Section 3, we describe in detail the watermark embedding and extraction procedures of the proposed algorithm. In Section 4, we evaluate the performance of the algorithm and present simulation results with respect to inaudibility and robustness. We conclude in Section 5 with some remarks.

• D. Zangene is with the Department of Computer Engineering, Dezful Branch, Islamic Azad University, Dezful, Iran.
• H. Sajedi is with the Department of Computer Engineering, Amirkabir University of Technology, Iran.
© 2011 JCSE http://sites.google.com/site/jcseuk/

2. THE DISCRETE WAVELET TRANSFORM
Wavelets are special functions which, in a form analogous to sines and cosines in Fourier analysis, are used as basis functions for representing signals. They provide a powerful multi-resolution tool for the analysis of non-stationary signals with good time localization. Starting from the original audio signal S, the DWT produces two sets of coefficients, as shown in Figure 1 [13]. The approximation coefficients (A) (low frequencies) are produced by passing the signal (S) through a low-pass filter. The detail coefficients (D) (high frequencies) are produced by passing the signal (S) through a high-pass filter.

Figure 1: One-level DWT decomposition.

Depending on the application and the length of the signal, the low frequencies part might be further decomposed into two parts of high and low frequencies. Figure 2 shows a 3-level DWT decomposition of signal (S). The original signal (S) can be reconstructed using the inverse DWT process.

Figure 2: Three-level DWT decomposition
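The decomposition of Figures 1 and 2 can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses the two-tap Haar wavelet instead of the db8 filter used in the experiments, so that it needs no external library, and the function names (haar_dwt, haar_idwt, wavedec3, waverec3) are hypothetical. It shows how the low-frequency part is split again at each level, which band is referred to as cd3, and that the inverse DWT restores the signal.

```python
import math
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_dwt(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One-level DWT (Figure 1) with the Haar wavelet.

    Returns (approximation, detail): scaled pairwise sums (low-pass)
    and scaled pairwise differences (high-pass), each downsampled by 2.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx: List[float], detail: List[float]) -> List[float]:
    """Inverse of haar_dwt: rebuild the signal from one decomposition level."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def wavedec3(signal: List[float]) -> Tuple[List[float], List[float], List[float], List[float]]:
    """Three-level decomposition (Figure 2): returns (ca3, cd3, cd2, cd1).

    Only the approximation branch is decomposed again at each level.
    The signal length must be a multiple of 8.
    """
    ca1, cd1 = haar_dwt(signal)
    ca2, cd2 = haar_dwt(ca1)
    ca3, cd3 = haar_dwt(ca2)
    return ca3, cd3, cd2, cd1


def waverec3(ca3: List[float], cd3: List[float],
             cd2: List[float], cd1: List[float]) -> List[float]:
    """Inverse three-level DWT: undo the levels from the deepest one up."""
    ca2 = haar_idwt(ca3, cd3)
    ca1 = haar_idwt(ca2, cd2)
    return haar_idwt(ca1, cd1)


if __name__ == "__main__":
    s = [float(i % 5) for i in range(16)]
    ca3, cd3, cd2, cd1 = wavedec3(s)
    print("band lengths:", len(ca3), len(cd3), len(cd2), len(cd1))
    rebuilt = waverec3(ca3, cd3, cd2, cd1)
    print("max reconstruction error:", max(abs(a - b) for a, b in zip(s, rebuilt)))
```

With a longer filter such as db8 the band lengths and boundary handling differ, but the tree structure stays the same: cd3 is the detail band produced at the third level.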

Due to its excellent time-frequency localization properties, the DWT is well suited to identifying areas in an audio signal where a watermark can be embedded effectively [14]. Many DWT-based audio watermarking techniques can be found in the literature [4, 6, 7, 9, 10, 11].

3. PROPOSED SCHEME
The proposed algorithm applies the DWT to the digital audio signal in which a watermark is to be embedded, and the embedding is performed in the wavelet domain. A binary image is embedded into selected coefficients of the detail sub-band cd3. The algorithm consists of two procedures: watermark embedding and watermark extraction.

3.1. Watermark Embedding Procedure
The embedding procedure performs three major operations: watermark pre-processing, DWT-based frequency decomposition of the audio signal, and watermark embedding in the DWT-transformed audio signal. The operations are described in the following steps:

1. Convert the two-dimensional image matrix Img into a one-dimensional vector W of length M1 × M2:

$$W = \{\, W(i) = Img(k, j),\ i = k \times M_2 + j,\ 0 \le k < M_1,\ 0 \le j < M_2 \,\} \qquad (1)$$

2. Decompose the audio signal using a three-level wavelet transform.

3. Divide the cd3 samples into frames of a given length and compute, for each frame, the average m_i of the absolute values of its samples:

$$m_i = \frac{1}{s} \sum_{j=(i-1)s+1}^{is} |c_j| \qquad (2)$$

where c_j are the wavelet coefficients of the high frequency sub-band (cd3), s is the frame size and m_i is the average of the i-th frame.

4. Obtain the marked wavelet coefficients c'_j using Equation (3):

$$c'_j = \begin{cases} -m_i, & c_j / m_i < k,\ w(l) = 0,\ w(l+1) = 0 \\ -3 m_i / 2, & c_j / m_i\ \dots \end{cases} \qquad (3)$$

Figure 3: The watermark embedding procedure.
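The pre-processing of Equation (1), the frame averages of Equation (2) and the idea of carrying two bits in one coefficient can be sketched as follows. This is a minimal illustration, not the authors' implementation. Two things are assumptions: the four embedding levels in embed_pair are hypothetical values placed at the centres of four contiguous intervals, and the interval test in extract_pair follows one reading of Equation (5), with thresholds at (k+α)/6 and (k+α)/2. For simplicity the frame average m is passed to the extractor unchanged rather than recomputed from the marked coefficients. All function names are illustrative.

```python
from typing import List, Optional, Tuple

Bits = Tuple[int, int]


def flatten_watermark(img: List[List[int]]) -> List[int]:
    """Equation (1): row-major flattening, W(k*M2 + j) = Img(k, j)."""
    return [bit for row in img for bit in row]


def frame_means(cd3: List[float], s: int) -> List[float]:
    """Equation (2): mean absolute value of each full frame of s coefficients."""
    n_frames = len(cd3) // s
    return [sum(abs(c) for c in cd3[i * s:(i + 1) * s]) / s for i in range(n_frames)]


def embed_pair(m: float, bits: Bits, k: float, alpha: float) -> float:
    """Return a marked coefficient carrying two bits.

    ASSUMPTION: the levels below are hypothetical; they sit at the centres
    of the four intervals tested in extract_pair.
    """
    t = k + alpha
    level = {(0, 0): -t / 3, (0, 1): -t / 12, (1, 0): t / 12, (1, 1): t / 3}
    return level[bits] * m


def extract_pair(c_marked: float, m: float, k: float, alpha: float) -> Optional[Bits]:
    """Interval test in the style of Equation (5).

    Returns None when the ratio c'/m falls outside all four intervals
    or when the frame average is zero.
    """
    if m == 0:
        return None
    t = k + alpha
    r = c_marked / m
    if -t / 2 <= r < -t / 6:
        return (0, 0)
    if -t / 6 <= r < 0:
        return (0, 1)
    if 0 <= r < t / 6:
        return (1, 0)
    if t / 6 <= r < t / 2:
        return (1, 1)
    return None


if __name__ == "__main__":
    k, alpha, s = 5.0, 1.2, 5  # parameter values reported with Table 1
    w = flatten_watermark([[1, 0], [0, 1]])
    m = frame_means([1.0, -2.0, 3.0, -4.0, 5.0], s)[0]
    for l in range(0, len(w), 2):
        pair = (w[l], w[l + 1])
        c = embed_pair(m, pair, k, alpha)
        print(pair, "->", round(c, 3), "->", extract_pair(c, m, k, alpha))
```

Because each bit pair maps to an interval rather than a single value, a moderate change of the coefficient or of the frame average leaves the decoded bits unchanged, which is the property the scheme relies on for robustness.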

3.2. Watermark Extraction Procedure
Watermark detection is performed using the DWT and the embedding parameters. Since the host audio signal is not required in the detection process, the detector is blind. The detection process can be summarized in the following steps:

1. Decompose the watermarked audio using a three-level wavelet transform.

2. Divide the cd3 samples into frames of a given length and compute, for each frame, the average m'_i of the absolute values of its samples:

$$m'_i = \frac{1}{s} \sum_{j=(i-1)s+1}^{is} |c'_j| \qquad (4)$$

3. Obtain the secret bit stream using Equation (5):

$$\big(w'(l),\, w'(l+1)\big) = \begin{cases}
(0, 0), & -\frac{k+\alpha}{2} \le c'_j / m'_i < -\frac{k+\alpha}{6} \\
(0, 1), & -\frac{k+\alpha}{6} \le c'_j / m'_i < 0 \\
(1, 0), & 0 \le c'_j / m'_i < \frac{k+\alpha}{6} \\
(1, 1), & \frac{k+\alpha}{6} \le c'_j / m'_i < \frac{k+\alpha}{2}
\end{cases} \qquad (5)$$

where c'_j is a sample of the high frequency band of the third level wavelet decomposition (cd3) of the marked signal, α is the strength of the watermark and w'(l) is the l-th bit of the extracted secret stream. If the signal is distorted by attacks, the absolute mean of the coefficients (m'_i) is slightly modified. However, the experimental results show that this change does not affect the extraction process, since an interval, not a constant number, is used for extraction.

Figure 4: The watermark extraction procedure.

4. EXPERIMENTAL RESULTS
To evaluate the performance of the proposed method, four pieces of audio were used for watermarking. The last clip was selected from the sound quality assessment material (SQAM) vocal audio files [16]. A three-level wavelet transform was applied to each frame. We used the 8-coefficient Daubechies wavelet (db8) to decompose the audio signal. Simulation results also showed that the Daubechies filter (db8) gave the best robustness against attacks. To remove the effects of subjective factors, we adopted the SNR and the normalized correlation (NC) to measure the performance of the embedding algorithm in this paper. The signal-to-noise ratio (SNR) is a statistical difference metric used to measure the similarity between the undistorted original audio signal and the distorted watermarked audio signal. The SNR is computed according to Equation (6), where x(n) is the original signal and x'(n) is the watermarked signal:

$$SNR = 10 \log_{10} \frac{\sum_{n=0}^{N-1} x^2(n)}{\sum_{n=0}^{N-1} [x(n) - x'(n)]^2} \qquad (6)$$

After embedding the watermark, the SNR of all selected audio signals is above 20 dB [10], which ensures the imperceptibility of the proposed system. The normalized correlation (NC) is defined as follows:

$$NC = \frac{\sum_{i=1}^{M_{length}} \sum_{j=1}^{M_{width}} M(i, j)\, M'(i, j)}{\sum_{i=1}^{M_{length}} \sum_{j=1}^{M_{width}} [M(i, j)]^2} \qquad (7)$$

where M(i, j) is the original binary image watermark, M'(i, j) is the recovered binary image watermark, and M_length × M_width is the size of the binary image watermark.

Table 1 shows the perceptual distortion and the payload obtained for the three songs with BER equal to zero (or near zero) under the attacks detailed in Table 2, for k = 5, α = 1.2 and a frame size of 5.

Table 1: Results of 3 signals (robust against Table 2 attacks)

Audio File                  Time (m:sec)   SNR (dB)   Payload (bps)
Beginning of the End        3:16           22.1       8011
Citizen, Go Back to Sleep   1:57           20.3       8001
Loopy Music                 0:10           20.7       8008
Average                     1:47           21.03      8005

Note that the average SNR is 21 dB and the capacity is around 8,000 bps in all the experiments. The proposed method is thus able to provide large capacity whilst keeping imperceptibility. Table 2 illustrates the effect of various attacks provided in the Stirmark Benchmark for Audio v1.0 on the audio signal Loopy Music.

Table 2: Robustness test results for five selected files and comparison with other schemes

                          NC          BER %
Attack                    proposed    proposed   [9]     [5]   [11]   [12]   [8]
AddBrumm_100              0.9971      0.11       _       0     _      0.05   0
AddBrumm_2100             0.9970      0.11       _       _     _      _
AddBrumm_3100             0.9960      0.15       _       0     _      _      _
Addnoise                  0.7831      9          0_4     0     0      0_1    0
AddSinus                  0.9633      1.4        0       0     _      0_8    _
Add FFT noise             0.7848      9          1_2     1     _      1_2    _
FFT_HLPassQuick           0.9925      0.29       0_2     5     _      1_4    _
FFT_RealReverse           0.9973      0.1        11_24   2     _      _      _
FFT_Stat1                 0.3498      30         14_23   1     _      _      _
MP3 128                   0.7763      10.1       0_3     _     37.1   0_5    14
Requantization 16 to 12   0.8985      5          2_4     _     0      _      1
Resampling 44/22/44       0.9980      0.08       38_47   0     0      5      2
Echo                      0.9514      1.8        0_3     63    _      5      6
LSBZero                   0.9996      0          0       0     _      0      0
RC_HighPass 1 to 22k      0.9908      0.37       0_1     _     _      0_1    _
RC_LowPass 2 to 22k       0.7810      9.1        0_4     0     12.7   0      3
Amplify                   0.9937      0.2        0_1     0     0      0      2
Compressor                1           0          _       _     _      _      _

The comparison shows that the compared schemes are robust against common attacks and that their transparency is in an acceptable range, about 21 dB. However, the capacity of these schemes is just a few hundred bps (except for the methods of [8, 11]). A few attacks, such as FFT_Stat1 in Table 2 and the RC low-pass filter with a cut-off frequency of less than 2 kHz in Table 3, remove the hidden data (BER > 15%). This comparison shows that the capacity of the proposed scheme is remarkable, whilst the transparency and BER remain in acceptable ranges.

Using frames of wavelet samples increases the robustness against attacks, since the average of the samples is more robust than the value of each individual sample. Thus, by increasing the frame size, better robustness can be achieved. However, by increasing the frame size, we enforce the same value for a greater number of samples, which decreases the audio quality and transparency. It may seem that using high frequencies for embedding the secret bits would lead to a scheme that is fragile against low-pass filtering. Indeed, the experimental results show that the secret stream is damaged by low-pass filters with a cut-off frequency of 2 kHz, but these filters damage the cover signal as well.

As already remarked, this scheme uses the high frequency band of the wavelet coefficients for embedding. Hence, it may seem that it would be fragile against attacks which manipulate or suppress the high frequency data. Table 3 analyzes the MP3 and RC low-pass filter attacks. The table shows that the BER increases as the MP3 bit rate decreases and as the cut-off frequency of the low-pass filter decreases. The suggested method is still robust (BER < 15%) against these attacks for a wide range of the attack parameters.

Table 3: Robustness results for a variety of audio types under MP3 and RC low-pass filter attacks

MP3 attack
MP3 rate (kbps)                             320   256   192   160    128
BER (%)                                     0.9   1.7   3.9   5.1    10.1

RC low-pass attack
Cut-off frequency of low-pass filter (kHz)  20    15    10    5      2
BER (%)                                     2     2.3   3.9   11.4   26

5. CONCLUSION
In this paper, we have described a high capacity data hiding algorithm for audio signals that uses the high frequency band of the wavelet decomposition and embeds two bits in each coefficient, which leads to a high-capacity watermarking algorithm for digital audio. In this scheme, we divide the high frequency band into frames and use the average of each frame as a key value. Furthermore, the suggested scheme is blind, since it does not need the original signal for extracting the hidden bits. The experimental results show that this scheme has an excellent capacity (about 8 kbps) with an SNR of about 21 dB and provides robustness against common signal processing attacks such as amplification, resampling, requantization, echo and filtering. A comparison with other schemes in the audio watermarking literature is also provided, which illustrates that the suggested scheme outperforms the capacity of other approaches while keeping robustness and transparency in acceptable ranges.

REFERENCES

[1] W. N. Lie, L. C. Chang, "Multiple Watermarks for Stereo Audio Signals Using Phase-Modulation Techniques", IEEE Trans. Signal Processing, Vol. 53, No. 2, pp. 806–815, Feb. 2005.
[2] H. J. Kim, Y. H. Choi, "A novel echo hiding scheme with backward and forward kernels", IEEE Trans. Circuit and Systems, pp. 885-889, Aug. 2003.
[3] S. Esmaili, S. Krishnan, K. Raahemifar, "A novel spread spectrum audio watermarking scheme based on time-frequency characteristics", IEEE Conf. Electrical and Computer Engineering, Vol. 3, pp. 1963-1966, May 2003.
[4] S. Wu, J. Huang, D. Huang, Y. Q. Shi, "Efficiently Self-Synchronized Audio Watermarking for Assured Audio Data Transmission", IEEE Trans. Broadcasting, Vol. 51, No. 1, pp. 69-76, Mar. 2005.
[5] J. J. Garcia-Hernandez, M. Nakano-Miyatake, and H. Perez-Meana, "Data hiding in audio signal using Rational Dither Modulation", IEICE Electron. Express, Vol. 5, No. 7, pp. 217-222, 2008.
[6] Z. Xu, W. Wang, "Digital Audio Watermarking Algorithm Based on Quantizing Coefficients", International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP'06), Pasadena, CA, USA, pp. 41-46, 2006.
[7] M. A. Akhaee, S. GhaemMaghami, and N. Khademi, "A Novel Technique for Audio Signals Watermarking in the Wavelet and Walsh Transform Domains", IEEE International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), Tottori, Japan, pp. 171-174, 2006.
[8] W. Lanxun, Y. Chao, P. Jiao, "An audio watermark embedding algorithm based on mean-quantization in wavelet domain", in: 8th International Conference on Electronic Measurement and Instruments, pp. 2-423–2-425, 2007.
[9] M. Fallahpour and D. Megias, "High capacity audio watermarking using the high frequency band of the wavelet domain", Multimedia Tools and Applications, 2010.
[10] N. Khademi Kalantari, S. M. Ahadi, "A Robust Audio Watermarking Scheme Using Mean Quantization in the Wavelet Transform Domain", submitted to IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), Cairo, Egypt, 2007.
[11] M. Pooyan, A. Delforouzi, "Adaptive and robust audio watermarking in wavelet domain", Third International Conference on International Information Hiding and Multimedia Signal Processing, V2, pp. 278-290, 2007.
[12] M. Fallahpour and D. Megias, "High capacity audio watermarking using FFT amplitude interpolation", IEICE Electronics Express, vol. 6, no. 14, pp. 1057-1063, 2009.
[13] S. Mallat, "A theory for multi-resolution signal decomposition: The wavelet representation", IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(7): 674-693, 1989.
[14] M. Hsieh, D. Tseng, and Y. Huang, "Hiding Digital Watermarks Using Multiresolution Wavelet Transform", IEEE Transactions on Industrial Electronics, 48(5): 875-882, 2001.
[15] J. Cox, J. Kilian, and T. Shamoon, "Secure Spread Spectrum Watermarking for Image, Audio and Video", IEEE Trans. on Image Processing, vol. 6, pp. 1673-1687, 1997.
[16] No, Really, "Rust". http://www.jamendo.com/en/album/7365.

Davod Zangene received the B.Sc. degree in Electronic Engineering and the M.Sc. degree in Computer Engineering from Islamic Azad University, Dezful, Iran, in 2008 and 2011, respectively. His interests cover signal and image processing.

Hedieh Sajedi received the B.Sc. degree in Computer Engineering from AmirKabir University of Technology in 2003, and the M.Sc. and Ph.D. degrees in Computer Engineering (Artificial Intelligence) from Sharif University of Technology in 2006 and 2010, respectively. She is now an invited professor at Amirkabir University of Technology. Her research interests include multimedia data hiding, steganography and steganalysis methods, pattern recognition, and machine learning for signal analysis and segmentation.