Wavelet Transform-based Clustering of Spectra in Chemometrics

A. Ukil, J. Bernasconi, H. Braendle
ABB Corporate Research, Segelhofstrasse 1K, Baden-Daettwil, CH-5405, Switzerland
{abhisek.ukil, jakob.bernasconi, hubert.braendle}@ch.abb.com

Keywords: Chemometrics, Spectroscopy, Spectrum, Wavelet transform, Data clustering.

1 Introduction

Spectroscopy is the study of the interaction between radiation (electromagnetic radiation, or light, as well as particle radiation) and matter. Spectrometry is the measurement of these interactions, and an instrument that performs such measurements is a spectrometer or spectrograph. A plot of the interaction is referred to as a spectrum [1]. Such spectra are typically used in food and agrochemical quality control, pharmaceuticals, medical diagnostics, and similar applications. Various statistical and mathematical data analysis techniques are applied to process these spectra, grouped under the umbrella term 'chemometrics' [2]. In this paper, we propose a novel algorithm that uses the wavelet transform to cluster spectra originating from different chemical applications and acquired by different spectrometers. The clustering technique is insensitive to spectrometer type and spectra acquisition method, and can be applied to classify spectra ahead of more complex chemometric operations, which are expected to benefit from clustered data.

2 Background Information

2.1 Spectroscopy & Chemometrics

Infrared (IR) or near-infrared (NIR) spectroscopy is a method used to identify a compound or to analyze the composition of a material. This is done by studying the interaction of infrared light with matter. The resulting plot, called the IR/NIR spectrum, shows the absorption of infrared light at different wavelengths. In IR spectroscopy the considered frequency is usually somewhere between 14,000 and 10 cm⁻¹. Note that the frequency scale is expressed in wavenumbers (measured in reciprocal centimeters) rather than wavelengths (measured in microns). The absorption of the material at each frequency is measured in percent. Chemometrics is the application of mathematical or statistical methods to chemical data. It is used for purposes such as multivariate calibration, signal processing and conditioning, pattern recognition, and experimental design [2].

2.2 Wavelet Transform

The Wavelet Transform (WT) is a mathematical tool for signal analysis, like the Fourier transform. Wavelet analysis breaks up a signal into shifted and scaled versions of the original (or mother) wavelet. The Continuous Wavelet Transform (CWT) is defined as the sum over all time of the signal multiplied by the scaled and shifted versions of the wavelet function ψ. The CWT of a signal x(t) is defined as

$$
\mathrm{CWT}(a, b) = \int_{-\infty}^{\infty} x(t)\, \psi^{*}_{a,b}(t)\, dt, \tag{1}
$$

$$
\psi_{a,b}(t) = |a|^{-1/2}\, \psi\!\left(\frac{t - b}{a}\right). \tag{2}
$$

Here ψ(t) is the mother wavelet, the asterisk in (1) denotes the complex conjugate, and $a, b \in R$, $a \neq 0$ (R is the set of real numbers) are the scaling and shifting parameters respectively. The Discrete Wavelet Transform (DWT) is obtained by choosing $a = a_0^m$, $b = n a_0^m b_0$, $t = kT$ in (1) and (2), where $T = 1.0$ and $k, m, n \in Z$ (Z is the set of positive integers).

$$
\mathrm{DWT}(m, n) = a_0^{-m/2} \left( \sum_{k} x[k]\, \psi^{*}\!\left[\frac{k - n a_0^m b_0}{a_0^m}\right] \right). \tag{3}
$$

The Multiresolution Signal Decomposition (MSD) [4] technique decomposes a given signal into its detailed and smoothed versions. Let x[n] be a discrete-time signal. The MSD technique decomposes it, in the form of WT coefficients at scale 1, into c1[n] and d1[n], where c1[n] is the smoothed version of the original signal and d1[n] is the detailed version:

$$
c_1[n] = \sum_{k} h[k - 2n]\, x[k], \tag{4}
$$

$$
d_1[n] = \sum_{k} g[k - 2n]\, x[k], \tag{5}
$$

where h[n] and g[n] are the associated filter coefficients that decompose x[n] into c1[n] and d1[n] respectively. The next higher scale decomposition will be based on c1[n]. Thus, the decomposition process can be iterated, with successive approximations being decomposed in turn, so that the original signal is broken down into many lower resolution components. This is called the wavelet decomposition tree [4], shown in Figure 1.

Figure 1 – Multiresolution signal decomposition and wavelet decomposition tree
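The decomposition in (4) and (5) and its iteration on the smoothed branch can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the orthonormal Haar filters (the paper does not list h[n] and g[n] explicitly), it pads odd-length inputs by repeating the last sample, and the function names `haar_step` and `haar_decompose` are hypothetical.

```python
import numpy as np

# Orthonormal Haar analysis filters. The normalization is an assumption;
# the paper does not give h[n] and g[n] explicitly.
H = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass:  h[0], h[1]
G = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass: g[0], g[1]


def haar_step(x):
    """One level of Eqs. (4)-(5): returns (smoothed c, detailed d)."""
    x = np.asarray(x, dtype=float)
    if x.size % 2:                      # odd length: repeat the last sample (assumption)
        x = np.append(x, x[-1])
    pairs = x.reshape(-1, 2)            # row n holds x[2n], x[2n+1]
    return pairs @ H, pairs @ G


def haar_decompose(x, scale):
    """Iterate haar_step on the smoothed branch.

    Returns (c_S, [d_1, ..., d_S]), i.e. the wavelet decomposition tree.
    """
    c = np.asarray(x, dtype=float)
    details = []
    for _ in range(scale):
        c, d = haar_step(c)
        details.append(d)
    return c, details
```

For a 220-point input and `scale=4`, the successive lengths under this padding rule are 110, 55, 28 and 14, so the final smoothed and detailed vectors each hold 14 coefficients.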

3 Application of Wavelet Transform & Clustering

3.1 WT on spectra

WT can be effectively applied for processing IR or NIR spectra in chemometrics [5],[6]. Figure 2 shows one such example. Plot (i) shows a typical NIR chemometrics spectrum recorded over 220 wavenumbers; plot (ii) shows the smoothed version (see (4)) of the wavelet decomposition, and plot (iii) the detailed version (see (5)). A 4-scale decomposition using the Haar [3] mother wavelet has been performed on the 220-point spectrum, giving $220 / 2^4 \approx 14$ wavelet coefficients (in plots ii and iii).

The smoothed coefficients (Figure 2, plot ii) resemble the original spectrum while reducing the data from 220 wavenumbers to 14 wavelet coefficients. The smoothed coefficients can therefore be used for data reduction, which is necessary in spectra calibration because using all the wavenumbers (220 in Figure 2, plot i) risks overfitting. Besides WT, other popular data reduction techniques in chemometrics are partial least squares (PLS) [2], [7] and principal component analysis (PCA) [2], [7]. If we have a spectrum with N wavenumbers and require m reduced data points (m and N are integers, m < N) using wavelet decomposition at scale S, then the relationship is given by

$$
m = \mathrm{round}\!\left(\frac{N}{2^{S}}\right). \tag{6}
$$

From (6), knowing N and m, the optimum scale can be determined as

$$
S = \mathrm{round}\!\left(\frac{\log(N/m)}{\log(2)}\right), \tag{7}
$$

where the round operation ensures that S is an integer. In practice, to avoid under- and overfitting, m is restricted to between 5 and 20 [5],[6]. The detailed coefficients (Figure 2, plot iii) show the changes in the frequency profile. For example, plot (iii) has three peaks at coefficients 1, 5 and 12, corresponding to the wavenumbers 1, 70 and 180 in plot (i) respectively. These peaks mark the points where the frequency profile of the spectrum changes because of changes in the constituent absorptions [4].
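As a quick check of (6) and (7), the following sketch computes the scale and the resulting number of coefficients for the 220-point example. The function names are hypothetical, and rounding half up is an assumption, since the paper does not say how ties are resolved.

```python
import math


def optimum_scale(n_points, m_target):
    """Eq. (7): S = round(log(N / m) / log(2)), rounding half up (assumption)."""
    if not 0 < m_target < n_points:
        raise ValueError("need 0 < m < N")
    return int(math.floor(math.log(n_points / m_target) / math.log(2.0) + 0.5))


def reduced_length(n_points, scale):
    """Eq. (6): m = round(N / 2^S), rounding half up (assumption)."""
    return int(math.floor(n_points / 2 ** scale + 0.5))


S = optimum_scale(220, 14)     # log2(220 / 14) is about 3.97, so S = 4
m = reduced_length(220, S)     # 220 / 16 = 13.75, so m = 14
```

Because S is rounded to an integer, the m recovered from (6) is not always the m that was requested in (7); for the example in Figure 2 the two agree.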

3.2 Proposed Clustering Algorithm

The clustering algorithm is depicted by the flowchart in Figure 3. For raw spectra of N wavenumbers, we determine the optimum scale using (7), assuming a realistic number of reduced data points m between 5 and 20. Wavelet decomposition using the Haar [3] mother wavelet up to this optimum scale gives the smoothed and detailed coefficients. The smoothed coefficients can be used for calibration. On the detailed coefficients, we search for the maximum coefficient (among the m coefficients). This maximum coefficient is the key parameter for clustering the spectra: for each spectrum, we monitor its maximum detailed wavelet coefficient.

Figure 2 – Application of wavelet transform on chemometrics spectra

Figure 3 – Flowchart of the spectra clustering algorithm
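The feature extraction part of the flowchart can be sketched as follows. This is an illustrative reading of the algorithm under stated assumptions, not the authors' code: it uses orthonormal Haar filters, pads odd lengths by repeating the last sample, takes the signed maximum of the detailed coefficients at the optimum scale, and defaults to m = 14 purely for illustration. The names `haar_details` and `clustering_feature` are hypothetical.

```python
import numpy as np


def haar_details(x, scale):
    """Haar detailed coefficients at the given scale (orthonormal filters assumed)."""
    c = np.asarray(x, dtype=float)
    d = np.empty(0)
    for _ in range(scale):
        if c.size % 2:                   # pad odd lengths by repeating the last sample
            c = np.append(c, c[-1])
        even, odd = c[0::2], c[1::2]
        c, d = (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)
    return d


def clustering_feature(spectrum, m_target=14):
    """Maximum detailed coefficient of a raw spectrum at the scale from Eq. (7)."""
    n = len(spectrum)
    scale = max(1, int(np.floor(np.log2(n / m_target) + 0.5)))  # Eq. (7), at least 1
    return float(np.max(haar_details(spectrum, scale)))
```

Applied to a set of raw (not preprocessed) spectra, this yields one number per spectrum; plotting those numbers against the spectrum index gives the kind of monitoring plot the clustering is read from.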

The kind of clustering revealed depends on the application and may reflect the spectra acquisition methodology, different samples, etc., as we shall see in the application results section. One key point in achieving effective clustering is to use the raw spectra before applying any preprocessing. In industrial chemometrics, standard preprocessing steps such as multiplicative scatter correction (MSC) [2] (described briefly in the following section) and mean centering [2] are performed on the spectra to minimize the adverse effects of instrumental variations, changes in recording conditions, etc.

3.3 Multiplicative Scatter Correction (MSC)

If $s_i(k)$ represents the spectral absorbance of sample i ($i = 1, 2, \ldots, n$) at wavelength number k ($k = 1, 2, \ldots, N$), then

$$
s_i(k) = a_i\, \bar{s}(k) + b_i, \tag{8}
$$

$$
\bar{s}(k) = \frac{1}{n} \sum_{i=1}^{n} s_i(k). \tag{9}
$$

For MSC, the coefficients $a_i$ and $b_i$ are obtained by solving the following optimization problem:

$$
\arg\min_{a_i,\, b_i} \sum_{k=1}^{N} \left[ s_i(k) - a_i\, \bar{s}(k) - b_i \right]^2. \tag{10}
$$

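This gives

$$
a_i = \frac{\langle s_i \bar{s} \rangle - \langle s_i \rangle \langle \bar{s} \rangle}{\langle \bar{s}^2 \rangle - \langle \bar{s} \rangle^2}, \tag{11}
$$

$$
b_i = \frac{\langle s_i \rangle \langle \bar{s}^2 \rangle - \langle s_i \bar{s} \rangle \langle \bar{s} \rangle}{\langle \bar{s}^2 \rangle - \langle \bar{s} \rangle^2}, \tag{12}
$$

with the averages taken over the wavenumbers, e.g.

$$
\langle s_i \rangle = \frac{1}{N} \sum_{k=1}^{N} s_i(k). \tag{13}
$$

Then, we have

$$
s_i^{\mathrm{MSC}}(k) = \frac{s_i(k) - b_i}{a_i}, \tag{14}
$$

where the $a_i$ and $b_i$ are obtained using (11) and (12).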
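Equations (8) to (14) translate directly into array operations. The sketch below is a minimal NumPy version under the assumption that the spectra are stored as rows of a two-dimensional array; the function name `msc` and its return signature are hypothetical.

```python
import numpy as np


def msc(spectra):
    """Multiplicative scatter correction following Eqs. (8)-(14).

    spectra: array of shape (n_samples, N_wavenumbers).
    Returns (corrected, a, b).
    """
    s = np.asarray(spectra, dtype=float)
    s_bar = s.mean(axis=0)                       # Eq. (9): mean spectrum over samples
    mean_ref = s_bar.mean()                      # <s_bar>
    mean_ref_sq = (s_bar ** 2).mean()            # <s_bar^2>
    mean_i = s.mean(axis=1)                      # Eq. (13): <s_i>
    mean_cross = (s * s_bar).mean(axis=1)        # <s_i s_bar>
    denom = mean_ref_sq - mean_ref ** 2
    a = (mean_cross - mean_i * mean_ref) / denom                  # Eq. (11)
    b = (mean_i * mean_ref_sq - mean_cross * mean_ref) / denom    # Eq. (12)
    corrected = (s - b[:, None]) / a[:, None]                     # Eq. (14)
    return corrected, a, b
```

If every spectrum is an exact scaled and shifted copy of one common shape, each corrected spectrum coincides with the mean spectrum. This illustrates why MSC can remove sample-to-sample differences in the detailed coefficients.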

4 Application Results

Figure 4 shows the monitoring of the maximum detailed wavelet coefficients of various NIR spectra of chemical components. It clearly shows two clusters (boxed and marked A and B). For this particular case, 487 different spectra were recorded using two methods, 'singlebeam' [2] and 'non-singlebeam' [2], and these two methods are reflected by the wavelet-based clusters. We used the maximum detailed coefficient as the clustering parameter because the classification gap between the clusters shrinks if other detailed wavelet coefficients are used. This is shown in Figure 5, where we plot the first, second and third highest detailed wavelet coefficients for the same chemical spectra. Figure 6 shows an example of how preprocessing destroys the clusters: after MSC preprocessing of the spectra, the distinct clustering pattern of Figure 4 is lost.

Figure 4 – Clustering of NIR spectra of chemical components

Figure 5 – 1st, 2nd, 3rd highest coefficients as clustering parameter

Figure 6 – MSC results in unsuccessful clustering
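The paper identifies the clusters visually from the monitoring plot. One simple way to automate a two-cluster split, and to quantify the classification gap, is to cut the sorted feature values at their widest gap. This rule is an assumption introduced here for illustration, not a step described in the paper, and the function name `split_by_largest_gap` is hypothetical.

```python
import numpy as np


def split_by_largest_gap(features):
    """Split 1-D feature values into two groups at the widest gap between sorted values.

    Returns (labels, threshold, gap): labels are 0 for the lower group and
    1 for the upper group, gap is the width of the separating interval.
    """
    f = np.asarray(features, dtype=float)
    if f.size < 2:
        raise ValueError("need at least two feature values")
    ordered = np.sort(f)
    gaps = np.diff(ordered)
    i = int(np.argmax(gaps))                          # position of the widest gap
    threshold = 0.5 * (ordered[i] + ordered[i + 1])   # midpoint of that gap
    labels = (f > threshold).astype(int)
    return labels, threshold, float(gaps[i])


labels, threshold, gap = split_by_largest_gap([0.1, 0.2, 0.15, 2.0, 2.1])
```

Comparing the returned gap for the first, second and third highest detailed coefficients would give a numerical counterpart to the comparison shown in Figure 5.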

5 Conclusion

In this paper, we have presented a novel algorithm based on wavelet decomposition for clustering chemometrics spectra. The maximum detailed wavelet coefficients of spectra decomposed to an optimum scale are used as the clustering parameter. The maximum detailed coefficient is chosen because it reflects changes in the frequency profile more strongly than the lower-ranked coefficients. The clustering step has to be applied before any preprocessing such as multiplicative scatter correction or baseline correction. The smoothed wavelet coefficients from the other branch of the wavelet decomposition can be used as a data reduction tool for calibration. It is proposed that calibration accuracy improves for clustered spectra compared to mixed spectra [4], [5]. The kind of clusters revealed depends on the particular application.

6 References

[1] Wikipedia resources. Available: http://en.wikipedia.org
[2] R.G. Brereton, Chemometrics: Data Analysis for the Laboratory and Chemical Plant, John Wiley & Sons, Ltd, England, 2003.
[3] I. Daubechies, Ten Lectures on Wavelets, Society for Industrial and Applied Mathematics, Philadelphia, 1992.
[4] S. Mallat, "A Theory for Multiresolution Signal Decomposition: The Wavelet Representation," IEEE Trans. Pattern Anal. Mach. Intelligence, vol. 11, no. 7, pp. 674-693, 1989.
[5] F.T. Chau, Y.Z. Liang, J. Gao, X.G. Shao, Chemometrics From Basics to Wavelet Transform, John Wiley, NJ, 2004.
[6] B. Walczak (ed.), Wavelets in Chemistry, Elsevier, Amsterdam, 2000.
[7] K.H. Esbensen, D. Guyot, F. Westad, L.P. Houmoller, Multivariate Data Analysis – In Practice, 5th ed., Camo Process AS, Norway, 2002.
