
IEEE SENSORS JOURNAL, VOL. 15, NO. 1, JANUARY 2015

Image Sensor-Based Heart Rate Evaluation From Face Reflectance Using Hilbert–Huang Transform

Duan-Yu Chen, Jun-Jhe Wang, Kuan-Yi Lin, Hen-Hong Chang, Han-Kuei Wu, Yung-Sheng Chen, Senior Member, IEEE, and Suh-Yin Lee

Abstract— Monitoring heart rates using conventional electrocardiogram equipment requires patients to wear adhesive gel patches or chest straps that can cause skin irritation and discomfort. Commercially available pulse oximetry sensors that attach to the fingertips or earlobes also cause inconvenience for patients and the spring-loaded clips can be painful to use. Therefore, a novel robust face-based heart rate monitoring technique is proposed to allow for the evaluation of heart rate variation without physical contact with the patient. Face reflectance is first decomposed from a single image and then heart rate evaluation is conducted from consecutive frames according to the periodic variation of reflectance strength resulting from changes to hemoglobin absorptivity across the visible light spectrum as heartbeats cause changes to blood volume in the blood vessels in the face. To achieve a robust evaluation, ensemble empirical mode decomposition of the Hilbert–Huang transform is used to acquire the primary heart rate signal while reducing the effect of ambient light changes. Our proposed approach is found to outperform the current state of the art, providing greater measurement accuracy with smaller variance and is shown to be feasible in real-world environments. Index Terms— Computer vision, biomedical signal processing.

I. INTRODUCTION

Nowadays, a high and growing percentage of deaths worldwide is related to cardiovascular disease [1], including sudden cardiac death [2], hypertension [3], hemorrhagic shock [4] and septic shock [5]. Heart rate monitors are simple, inexpensive tools that can detect potentially life-threatening arrhythmias or heart rhythm malfunctions. Heart rate variability is widely used as an indicator for the likelihood of fatal myocardial infarction [6] and liver cancer [7].

Manuscript received April 6, 2014; revised July 22, 2014 and August 5, 2014; accepted August 7, 2014. Date of publication August 13, 2014; date of current version November 11, 2014. The associate editor coordinating the review of this paper and approving it for publication was Prof. Octavian Postolache. D.-Y. Chen, J.-J. Wang, and Y.-S. Chen are with the Department of Electrical Engineering, Yuan Ze University, Chung-Li 32003, Taiwan (e-mail: [email protected]; [email protected]; eeyschen@ saturn.yzu.edu.tw). K.-Y. Lin and S.-Y. Lee are with the Department of Computer Science, National Chiao-Tung University, Hsinchu 300, Taiwan (e-mail: [email protected]; [email protected]). H.-H. Chang is with the Department of Chinese Medicine, Chang Gung University, Taoyuan 333, Taiwan, and also with the Center for Traditional Chinese Medicine, Taoyuan Chang Gung Memorial Hospital, Chang Gung Medical Foundation, Taoyuan 333, Taiwan (e-mail: [email protected]). H.-K. Wu is with the Department of Chinese Medicine, Chang Gung University, Taoyuan 333, Taiwan (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JSEN.2014.2347397

Conventional electrocardiogram (ECG) heart rate monitors require the patient to wear adhesive gel patches or chest straps that can cause skin irritation and discomfort. Commercial pulse oximetry sensors that attach to the fingertips or earlobes are also inconvenient for continuous use, and the spring-loaded clips can cause discomfort. Recently, increased attention has focused on touchless heart rate monitoring, which does not require physical contact [8]–[10]. Heart rates can be evaluated using consecutive visual images of the subject’s face by measuring the periodic variation of reflectance strength resulting from varying hemoglobin absorptivity across the visible light spectrum as blood volume in blood vessels increases and decreases with every heartbeat. In [11], each color sensor records a mixture of the original source signals with slightly different weights. Photoplethysmography (PPG) is typically implemented using dedicated light sources (e.g., red and/or infrared wavelengths), but recent work [9], [10] has shown that pulse measurements can be acquired using digital camcorders/cameras with normal ambient light as the illumination source. In [8], Poh et al. present a novel methodology for non-contact, automated, and motion-tolerant cardiac pulse measurements from video images based on blind source separation. First, they describe their approach and apply it to compute heart rate measurements from video images of the human face recorded using a simple webcam. Second, they demonstrate how this method can tolerate motion artifacts and then validate the accuracy of this approach using an FDA-approved finger blood volume pulse (BVP) measurement device. Their experimental results indicate that the best performance is obtained by estimating the heart rate directly from the green channel intensity image.

However, the success of face-based heart rate evaluation strongly depends on the measurement of facial illumination, which is the most significant factor affecting facial appearance. Both indoor and outdoor ambient lighting conditions are subject to constant change, and direct lighting sources can cast strong shadows that accentuate or diminish certain facial features. In recent years, many appearance-based algorithms have been proposed to deal with these problems [12]–[15]. Belhumeur and Kriegman [13] showed that the set of images of an object in a fixed pose but under varying illumination conditions forms a convex cone in the image space. The illumination cones of human faces can be approximated well by low-dimensional linear subspaces [16]. The linear subspaces are typically estimated from training data, requiring multiple images of the object

1530-437X © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

CHEN et al.: IMAGE SENSOR-BASED HEART RATE EVALUATION

under different illumination conditions. Alternatively, model-based approaches have been proposed to address the problem. Blanz et al. [17] fit a previously constructed morphable 3D model to single images. The algorithm works well across pose and illumination; however, the computational expense is very high. Computing the reflectance and the illuminance fields from real images is, in general, an ill-posed problem. Therefore, various assumptions and simplifications about illuminance, reflectance, or both are proposed to solve the problem. A common assumption is that illuminance varies slowly while reflectance can change abruptly. In our proposed framework, the decomposed face reflectance, rather than the illumination part, is used for heart rate evaluation since, as blood volume in the blood vessels in the face expands with every heartbeat, the reflectance strength shown by the face will vary with hemoglobin absorptivity across the visible light spectrum. In addition, in Chinese clinical studies [23], variations of heart condition can be observed from the appearance of the brow. Therefore, in our proposed framework, we first decompose reflectance from the green channel and then decompose intrinsic mode functions (IMF) using ensemble empirical mode decomposition (EEMD) to separate the component that represents the real heart rate variation from environmental noise. The details of the proposed framework are described in Section 2. Experimental results and discussions are presented in Section 3. Finally, some concluding remarks are drawn in Section 4.

II. FACE REFLECTANCE-BASED HEART RATE EVALUATION

This section describes the proposed framework for touchless heart rate evaluation. The overview of the framework is introduced in Fig. 1. An image covering the subject’s brow area is first captured and then the green channel is selected for reflectance decomposition since oxygenated hemoglobin absorbs green light. The variation of oxygen levels in blood can thus be modeled as the reflectance strength of the green channel. The method for reflectance decomposition is presented in Section 2.1. In Fig. 1(d), the reflectance strength alters with every heartbeat. However, this time signal is too noisy to identify each heartbeat period. To remove environmental noise, EEMD is used to split the original time signal from the face reflectance into several intrinsic mode functions. The function (IMF4) that represents the heart rate is demonstrated in Fig. 1(e). Details of IMF decomposition are described in Section 2.2. Finally, the method for peak detection and heart rate counting is presented in Section 2.3.

Fig. 1. Overview of the proposed framework: (a) original video frames covering the brow area, (b) green channel intensity image, (c) reflectance image, (d) time signal of reflectance strength, and (e) decomposed intrinsic mode function 4.

A. Reflectance Decomposition

RGB color sensors acquire signals including illumination from environmental light sources and reflectance from captured targets. To provide robust heart rate monitoring, the face reflectance should first be decomposed. In general, an original image $I(x, y)$ can be denoted as [18]

$$I(x, y) = R(x, y)\,L(x, y), \tag{1}$$

where $R$ is mostly the reflectance of the scene, and $L$ is mostly the illumination field, but they may not be “correctly” separated in a strict physical sense. After all, humans perceive reflectance details in shadows as well as in bright regions, but they are also cognizant of the presence of shadows. From this point on, we may respectively refer to $R$ and $L$ as reflectance and illuminance, but they are to be respectively understood as the perceived sensation and the perception gain. To derive our model, we turn to evidence gathered in experimental psychology. According to Weber’s Law, the sensitivity threshold to a small intensity change increases proportionally to the signal level [19]. This law follows from experimentation on brightness perception that consists of exposing an observer to a uniform field of intensity $I$ in which a disk is gradually increased in brightness by a quantity $\Delta I$. The value $\Delta I$ from which the observer perceives the existence of the disk against the background is called the brightness discrimination threshold. Weber noticed that $\Delta I / I$ is constant for a wide range of intensity values. Weber’s law gives a theoretical justification for assuming a logarithmic mapping from input stimulus to perceived sensation, as shown in Fig. 2. Due to the logarithmic mapping, for a weak stimulus (e.g., in deep shadows) small changes in the input stimulus elicit large changes in perceived sensation. When the stimulus is strong, small changes in the input stimulus are mapped to even smaller changes in perceived sensation. In fact, local variations in the input stimulus are mapped to the perceived sensation variations with the gain $1/I$, as defined by

$$I(x, y)\,\frac{1}{I_\psi(x, y)} = R(x, y), \quad (x, y) \in \psi, \tag{2}$$

where $I_\psi(x, y)$ is the stimulus level in a small neighborhood $\psi$ in the input image. By comparing Eqs. (1)-(2), the model for the perception gain is defined as

$$L(x, y) = I_\psi(x, y) = I(x, y), \tag{3}$$

where the neighborhood stimulus level is the stimulus at point $(x, y)$. Therefore, a good assumption for the reflectance estimation is that the illuminance image typically is spatially smooth, which can naturally be included as a constraint. The idea of the weight image/vector is that not all parts of the illuminance estimate should be equally smooth. Areas of salience (e.g., edges) should be preserved in the illuminance estimate over non-textured areas. However, salience is a relative concept. Gross and Brajovic [20] turned to Weber’s Law, which states that the sensitivity threshold to a small intensity change increases proportionally to the signal level. More specifically, $\Delta I / I$ is relatively constant in terms of perceived salience. However, estimating $R(x, y)$ and $L(x, y)$ in this way is an ill-posed problem; therefore constraints are required. The weighting $\omega$ at each pixel is proportional to the ratio of the absolute value of the local appearance change to the actual average local appearance, defined as

$$\omega \times (A * I) = \left| (\nabla_x + j\nabla_y) * I \right|, \tag{4}$$

where $\nabla_x$ and $\nabla_y$ are the x and y direction edge filters, and $A$ is an averaging filter. The complex unit $j$ is used here to separate the x-coordinate gradient from the y-coordinate gradient response. The result of applying weighting $\omega$ is demonstrated in Fig. 3.

Fig. 2. Compressive logarithmic mapping emphasizes changes at low stimulus levels and attenuates changes at high stimulus levels [19].

Fig. 3. Weighting ω at each pixel is proportional to the ratio of the absolute value of the local appearance change with the actual average local appearance. (a) An original brow image. (b) Weighting ω for each pixel.

Fig. 4. Demonstration of reflectance decomposition. (a) Green channel of original brow image. (b) Reflectance image obtained after 100 iterations.

The next step is to optimize the objective function using the Alternating Direction Method of Multipliers (ADMMs). One reason for this is the realization that ADMMs are extremely useful for objectives if sub-problems can be defined through the use of dummy variables that can be solved efficiently on an individual basis. Using ADMMs, Eq. (4) can be redefined as [24]:

$$\arg\min_{\xi,\, l}\; \|I - \xi\|_{\omega}^{2} + 0.5\gamma\left(\|\nabla_x * l\|^{2} + \|\nabla_y * l\|^{2}\right) + 0.5\omega\,\|l - \xi\|^{2} \quad \text{s.t. } l = \xi, \tag{5}$$

where $l$ denotes $L(x, y)$ and $\gamma$ is a constant. The iterative equations are defined by

$$l^{k+1} = \left(\omega \times I - R^{k} + \sigma^{k} \times y^{k}\right) / \left(\omega + \sigma^{k}\right), \tag{6}$$

$$y_f^{k+1} = \left(\sigma^{k} * l_f^{k+1} + R_f^{k}\right) / \left(\gamma * q_f + \sigma^{k}\right), \tag{7}$$

$$R^{k+1} = R^{k} + \sigma^{k} * \left(l^{k+1} - y^{k+1}\right), \tag{8}$$

$$\sigma^{k+1} = \sigma^{k} * 1.1, \tag{9}$$

where $R^{k}$, $y^{k}$, $\sigma^{k}$ and $q_f$ are dummy variables used to initialize the Lagrangian. $y_f^{k}$, $l_f^{k+1}$ and $R_f^{k}$ denote the values obtained after applying the fast Fourier transform (FFT). Filters are defined in the range [1, −1] as this is the highest fidelity filter possible. Since we are dividing by the filter when solving the least-squares problem, we want to ensure there are no high-frequency components that are not penalized. Therefore, the filter $g_x$ is defined as $g_x = [1\ \ {-1}]$ and $g_y$ is the transpose of $g_x$. Accordingly, $q_f$ can be defined by

$$q_f = g_{xf} \times g_{xf} + g_{yf} \times g_{yf}, \tag{10}$$

where $g_{xf}$ and $g_{yf}$ are the values obtained by applying the FFT. The result of face reflectance is demonstrated in Fig. 4.

B. Ensemble Empirical Mode Decomposition

The Hilbert–Huang transform (HHT) is a method for decomposing a signal into intrinsic mode functions (IMF) and obtaining instantaneous frequency data. It is designed to work well for data that are non-stationary and nonlinear. Using the empirical mode decomposition (EMD) method, any complicated data set can be decomposed into a finite and often small number of components, which form a collection of IMFs. EMD has no specified “basis”. Its “basis” is adaptively produced depending on the signal itself, which results not only in high decomposition efficiency but also in sharp frequency and time localization. A key point is that the signal analysis based on HHT is physically significant. Because of its excellent performance, HHT has been widely applied in signal processing and other related fields.

An IMF represents a generally simple oscillatory mode as a counterpart to the simple harmonic function. By definition, an IMF is any function with the same number of extrema and zero crossings, with its envelopes being symmetric with respect to zero. The definition of an IMF guarantees a well-behaved Hilbert transform of the IMF. An IMF is defined as a function that satisfies the following requirements:

1. In the whole data set, the number of extrema and the number of zero-crossings must either be equal or differ at most by one.
2. At any point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero.

An example of IMF decomposition is provided in Fig. 5. The horizontal and vertical axes respectively denote the time and amplitude. Therefore, an IMF represents a simple oscillatory mode as a counterpart to the simple harmonic function, but it is much more general: instead of constant amplitude and frequency in a simple harmonic component, an IMF can have variable amplitude and frequency along the time axis.

Fig. 5. Demonstration of IMF decomposition: the green curve is the original input signal, the red curve is the maxima envelope, the blue curve is the minima envelope and the pink curve is the mean envelope.

The procedure of extracting an IMF is called sifting and consists of the following steps:

1. Identify all the local extrema in the test data.
2. Connect all the local maxima by a cubic spline line as the upper envelope.
3. Repeat the procedure for the local minima to produce the lower envelope.

Fig. 6. The first component h1 is obtained by the difference between the original data and their mean m1.

The upper and lower envelopes should cover all the data between them. As demonstrated in Fig. 6, the difference between the data $x(t)$ and the mean $m_1$ of the two envelopes is the first component $h_1$, as defined by

$$h_1 = x(t) - m_1. \tag{11}$$
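The three sifting steps and Eq. (11) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names `local_extrema` and `sift_once` are hypothetical, and the envelopes are built with linear interpolation (`np.interp`) in place of the cubic spline described above, purely to keep the sketch NumPy-only. Values outside the first and last extremum are held constant, which is a further simplifying assumption.

```python
import numpy as np

def local_extrema(x):
    """Return the indices of strict local maxima and minima of a 1-D array."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima

def sift_once(x):
    """One sifting step: return (h, m) with h = x - m as in Eq. (11).

    m is the mean of the upper and lower envelopes. Assumption: the
    envelopes are linearly interpolated, whereas the text uses cubic splines.
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        raise ValueError("not enough extrema to build envelopes")
    upper = np.interp(t, maxima, x[maxima])  # upper envelope through maxima
    lower = np.interp(t, minima, x[minima])  # lower envelope through minima
    m = 0.5 * (upper + lower)                # mean envelope m1
    return x - m, m
```

Applied to an oscillation riding on a slow trend, the mean envelope `m` tracks the trend away from the signal boundaries and `h` retains the oscillation; replacing `np.interp` with a cubic spline would bring the sketch closer to the procedure described in the text.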

Theoretically, the decomposed $h_1$ could be an IMF if it satisfies all the requirements. However, original data usually contain significant variations, so a single iteration is insufficient to form a basic IMF. During the process, new extrema could be produced or shifted since the upper and lower envelopes are formed by connecting all the local extrema with a cubic spline line. In addition, some problems exist for non-linear data; for instance, the mean envelope could differ from the real local mean envelope, which would result in a non-symmetric signal regardless of the number of sift process iterations. Despite these problems, the critical components can be extracted by the sift process. In the subsequent sifting process, $h_1$ can only be treated as a proto-IMF. In the 2nd sift process, $h_1$ is treated as the data and $h_{11}$ is obtained by

$$h_{11} = h_1 - m_{11}. \tag{12}$$

After repeated sifting up to $k$ times, $h_{1k}$ becomes an IMF, that is

$$h_{1k} = h_{1(k-1)} - m_{1k}. \tag{13}$$

Accordingly, $h_{1k}$ is designated as the first IMF component $c_1$ from the data by

$$c_1 = h_{1k}. \tag{14}$$

Fig. 7. c1 is obtained by repeating the sift process up to ten times.

Figure 7 shows $c_1$ obtained by repeating the sift process 10 times. This component is clearly very close to an IMF since it satisfies the IMF requirements. Sifting is needed both to remove riding waves, so that an instantaneous frequency can be obtained, and to smooth the amplitude, which prevents significant differences from occurring in the neighboring data. However, overuse of the sift process would result in the decomposed IMFs having a fixed amplitude. To make IMFs fit the HHT properties (i.e., the variations of amplitude and frequency retain their physical meaning), we must define the stoppage criterion for the sift process. The stoppage criterion determines the number of sifting steps to produce an IMF. Two different stoppage criteria are used here and are defined as follows:

1. The first criterion is based on the S-number, which is defined as the number of consecutive siftings when the number of zero-crossings and extrema are equal or at most differ by one. Specifically, an S-number is pre-selected. The sifting process will stop only if the numbers of zero-crossings and extrema are equal or at most differ by one for S consecutive iterations.
2. The sifting process stops when the standard deviation SD is smaller than a pre-defined value. SD is defined by

$$SD = \sum_{t=0}^{T} \frac{\left|h_{k-1}(t) - h_{k}(t)\right|^{2}}{h_{k-1}^{2}(t)}. \tag{15}$$

Once a stoppage criterion is selected, the first IMF, $c_1$, can be obtained. Overall, $c_1$ should contain the finest scale or the shortest period component of the signal. We can then separate $c_1$ from the rest of the data by

$$r_1 = x(t) - c_1. \tag{16}$$
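Both stoppage criteria reduce to short array computations. The sketch below is illustrative only: `sd_criterion` follows Eq. (15) with a small `eps` added to the denominator (an assumption, to avoid division by zero wherever $h_{k-1}(t) = 0$), and `imf_counts_ok` checks the zero-crossing/extrema condition that the S-number criterion requires to hold for S consecutive iterations. Both function names are hypothetical.

```python
import numpy as np

def sd_criterion(h_prev, h_curr, eps=1e-12):
    """Eq. (15): sum over t of |h_{k-1}(t) - h_k(t)|^2 / h_{k-1}(t)^2.

    eps is an assumed guard against division by zero; it is not in the paper.
    """
    h_prev = np.asarray(h_prev, dtype=float)
    h_curr = np.asarray(h_curr, dtype=float)
    return float(np.sum((h_prev - h_curr) ** 2 / (h_prev ** 2 + eps)))

def count_extrema_and_zero_crossings(h):
    """Count interior local extrema and sign changes of a sampled signal."""
    h = np.asarray(h, dtype=float)
    d = np.diff(h)
    n_extrema = int(np.sum(d[:-1] * d[1:] < 0))   # slope changes sign
    n_crossings = int(np.sum(h[:-1] * h[1:] < 0)) # signal changes sign
    return n_extrema, n_crossings

def imf_counts_ok(h):
    """True if the extrema and zero-crossing counts differ by at most one."""
    n_extrema, n_crossings = count_extrema_and_zero_crossings(h)
    return abs(n_extrema - n_crossings) <= 1
```

In a sifting loop, one would stop either when `imf_counts_ok` has returned `True` for S iterations in a row or when `sd_criterion` falls below the chosen threshold; the threshold value itself is not stated in the text.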

The residue $r_1$ still contains longer period variations in the data; it is thus treated as the new data and subjected to the same sifting process as described above. This procedure can be repeated for all the subsequent $r_n$ by

$$r_n = r_{n-1} - c_n. \tag{17}$$

The sifting process finally stops when the residue $r_n$ becomes a monotonic function from which no further IMF can be extracted. From the above equations, we obtain $x(t)$ by

$$x(t) = \sum_{k=1}^{n} c_k(t) + r_n(t). \tag{18}$$

Thus, a decomposition of the data into n empirical modes is achieved. The components of the EMD are usually physically meaningful, for the characteristic scales are defined by the physical data. In this work, we use the Ensemble Empirical Mode Decomposition (EEMD) method proposed by Wu and Huang [21] to improve the decomposition of IMFs by solving the mode mixing problem. The steps are summarized as follows:

(1) Add white noise $w(t)$ to the signal.
(2) Decompose the data with the added white noise into n IMF components. EEMD is applied to decompose the data to produce several IMFs ($c_1$ until $c_n$).
(3) Repeat steps (1)-(2) with the addition of white noise to the signal but with a different white noise series each time. The process will stop when the value of $SD_k$ is smaller than the pre-given threshold.
(4) Obtain the (ensemble) means of the corresponding IMFs of the decompositions as the final result.

Although EEMD can solve the mode mixing problem, the decomposed component may not be a real IMF since the EEMD process includes the summation and averaging of IMFs. To overcome this problem, in [22] EMD is conducted as the post-process of EEMD. Here we replace $c$ with $D$ to represent the components obtained by EEMD. The first component $D_1$ is processed by EMD and then the first IMF component $c_1$ and a new residual $r_1$ can be obtained. In the next step, the new residual $r_1$ and the 2nd component $D_2$ are added up and are processed again using EMD to generate the 2nd IMF $c_2$ and residual $r_2$. The steps are defined as follows:

$$\begin{aligned}
x(t) &\xrightarrow{\text{EEMD}} \sum_{k=1}^{n} D_k(t) + r_n(t) \\
D_1(t) &\xrightarrow{\text{EMD}} c_1(t) + r_1(t) \\
D_2(t) + r_1(t) &\xrightarrow{\text{EMD}} c_2(t) + r_2(t) \\
&\;\;\vdots \\
D_k(t) + r_{k-1}(t) &\xrightarrow{\text{EMD}} c_k(t) + r_k(t).
\end{aligned} \tag{19}$$

Using EEMD thus produces a more accurate IMF. Figure 8 provides an example where the original signal of face reflectance shown at the top is decomposed into separate components. The components $c_1$ and $c_8$ are respectively the highest and lowest frequencies among $c_1$ to $c_8$. The following section provides a detailed description of the heart rate evaluation method and the components selected for measurement.

Algorithm 1 Find Location of Local Maxima
Input: time signal $S_n$ of IMF $c_4$, where n = 1 · · · n
Output: the location of local maxima Φ
Step 1. Identify whether the signal is rising or falling. Take the 1st derivative of $S_n$, namely $S_m$.
    Case 1: if $S_m(t) < 0$, then $M_m = -1$;
    Case 2: if $S_m(t) = 0$, then $M_m = 0$;
    Case 3: if $S_m(t) > 0$, then $M_m = 1$;
Step 2. Find points where the signal is rising before and falling after.
    Case 1: if $M_1 < 0$, then A = 1;
    Case 2: if diff($M_m$) < 0, then B = 1;
    Case 3: if $M_1 > 0$, then C = 1;
    Λ = max {A, B, C};
Step 3. Compute the number of peaks in Λ.
Step 4. Return the location of peaks to Φ.
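Algorithm 1 and the peak-count-to-rate conversion of Eq. (20) can be sketched in Python as follows. This is a hedged illustration under stated assumptions rather than the authors' code: the function names `find_local_maxima` and `heart_rate_bpm` are hypothetical, only the interior condition of Step 2 (a drop in the slope sign, diff(M) < 0) is implemented, and the endpoint cases A and C are left out because their exact handling is not spelled out in the text.

```python
import numpy as np

def find_local_maxima(s):
    """Return indices where the signal rises before and falls after.

    Step 1 of Algorithm 1: M is the sign of the first difference
    (-1 falling, 0 flat, +1 rising).
    Step 2 (interior case only): a peak is where diff(M) < 0.
    Caveat: an exactly flat plateau would be flagged at both of its ends.
    """
    s = np.asarray(s, dtype=float)
    m = np.sign(np.diff(s))
    return np.where(np.diff(m) < 0)[0] + 1

def heart_rate_bpm(num_peaks, num_frames, fps):
    """Eq. (20): peaks per frame, times frames per second, times 60."""
    return num_peaks * 60.0 * fps / num_frames
```

As a consistency check against the example reported in Fig. 10, `heart_rate_bpm(39, 900, 30)` corresponds to 39 peaks in a 900-frame clip recorded at 30 fps, which Eq. (20) maps to 78 beats per minute.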

C. Heart Rate Evaluation

Figure 9 shows the spectrum of IMFs $c_2$ to $c_5$ along with the highest frequency for each component. It is clear that the highest frequency of $c_4$ is about 1.3 Hz, which is very close to the normal heart rate frequency of 1.2 Hz. Therefore, IMF $c_4$ is selected for heart rate evaluation. In Fig. 8, each peak of $c_4$ corresponds to one heartbeat, and the number of peaks in a unit time interval represents the real heart rate. To detect the peak of each period, the location of the local maxima is computed via Algorithm 1. Using the algorithm, the location of peaks can be found in Φ and the number of peaks corresponds to the number of heartbeats in a time interval T. Accordingly, the real heart rate can be computed by

$$H_r = \Phi \times \frac{\Gamma}{O} \times 60, \tag{20}$$

where Φ is taken as the number of detected peaks, $O$ denotes the number of frames and $\Gamma$ represents the frame rate. An example of heart rate evaluation is demonstrated in Fig. 10(b), in which 39 peaks detected over 30 seconds give an evaluated heart rate of 78 for a male subject.

Fig. 8. Original signal of face reflectance shown in the top figure is decomposed into separate components. The components c1 and c8 are respectively the highest and lowest frequencies among c1 to c8.

Fig. 9. The spectrum of decomposed IMF2, IMF3, IMF4, IMF5. The highest frequency of IMF4 is close to the frequency 1.2 Hz of normal heart rate.

Fig. 10. Demonstration of the result of heart rate evaluation: (a) strength variation of face reflectance recorded over 30 seconds; (b) the number of peaks detected is 39 and thus Hr = 78 by Eq. (20). The ground truth of this test is also 78, as shown in the top-right corner of (b).

III. EXPERIMENTAL RESULTS

In the experiment, the ground truth of the heart rate is obtained by a TERUMO ES-P310 sphygmomanometer following the specification of IEC 60601-1-2:2001 with a measurement deviation within ±5%. The camera used was a Panasonic DMC-FX07GT with a capturing distance of 7-9 cm to the brow area. Testing primarily took place indoors. The frame rate was set to 30 fps (frames per second) and a total of 900 frames were selected for each heart rate evaluation. The testing data set included 59 video clips recorded from 8 subjects, including 7 males and 1 female, with ages ranging from 22 to 31.

Figure 11(a) shows the impact on heart rate evaluation of different numbers of iterations k for face reflectance decomposition. The blue, orange and purple bins respectively represent the heart rate obtained by ES-P310 and our proposed framework, and the difference between them. Figure 11(b) shows the difference between the detected heart rate and the ground truth when k is set to 100. The worst evaluation results occur when k is set to 50 and the best performance is obtained by setting k to 100, with the difference between each test result and the ground truth falling almost uniformly in the range ±5%. When k is set to 150 and 200, the evaluation deviation for the 59 tests exceeds ±5% and thus these settings were not used for evaluation. Figure 12(a) shows the average measurement deviation and Fig. 12(b) shows the average heart rate evaluated using different values of k. The best performance, with a standard deviation falling within ±5%, is achieved by setting k to 100. In Fig. 12(a), when k is set to 100, the average standard deviation (denoted as a star) is very close to zero, indicating the heart rate measurement is very robust under these conditions.

Fig. 11. (a) Impact on heart rate evaluation of different numbers of iterations k for face reflectance decomposition: the blue, orange and purple bins respectively represent the heart rate obtained by ES-P310 and our proposed framework, and the difference between them. (b) The difference between the detected heart rate and the ground truth by setting k to 100.

Fig. 12. (a) The difference of heart rate between ground truth and our proposed approach using different settings of k. (b) The average heart rate estimated using our method under different settings of k. The stars indicate the average and the corresponding bar indicates the standard deviation.

Fig. 13. Distribution of testing samples shown in the Bland-Altman plot. The best performance is achieved with the smallest range (about 17 bpm) when k is set to 100.

Figure 13 presents a Bland-Altman plot for performance comparison for different values of k using the mean, standard deviation (SD) of the differences, mean of the absolute differences and 95% limits of agreement with the confidence interval ± 1.96 SD. For k values of 50, 100, 150 and 200, the corresponding 95% confidence intervals are 16.6648 to −16.7126, 10.4045 to −6.5401, 17.0116 to −13.9947, and 17.2899 to −14.4085, respectively. The smallest range (about 17 bpm, i.e., 10.4045 − (−6.5401) ≈ 16.9) is achieved when k is set to 100.
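The 95% limits of agreement quoted above follow the usual Bland-Altman recipe of mean difference ± 1.96 SD. A minimal sketch is given below; the function name `bland_altman_limits` is hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not say which estimator was used.

```python
import numpy as np

def bland_altman_limits(reference, estimate):
    """Return (mean difference, lower limit, upper limit) in input units.

    The limits are mean_diff -/+ 1.96 * SD of the paired differences.
    Assumption: SD is the sample standard deviation (ddof=1).
    """
    diff = np.asarray(estimate, dtype=float) - np.asarray(reference, dtype=float)
    mean_diff = float(diff.mean())
    sd = float(diff.std(ddof=1))
    return mean_diff, mean_diff - 1.96 * sd, mean_diff + 1.96 * sd
```

Passing the ES-P310 readings as `reference` and the evaluated heart rates as `estimate` would yield the interval whose width (upper minus lower limit) is the "range" compared across the settings of k.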

Figure 14 compares the Bland-Altman plots for our proposed framework and the current state-of-the-art [8]. Almost all evaluated results fall into the 95% confidence interval (± 1.96 SD). For fair comparison with the results of [8], the detection range of heart rate is set between 50 and 90. Our proposed framework provides more robust evaluation with a smaller degree of deviation. For quantitative performance evaluation, the precision for different k settings is measured and is shown in Fig. 15. The threshold to determine whether the evaluated heart rate is correct or not is set to 6 bpm. The highest precision (about 84%) is achieved when k is set to 100. In the experiment, with k set to 100 for face reflectance decomposition and IMF 4 for heart rate evaluation, the evaluated heart rate is very close to the ground truth since most sources of environmental noise (e.g., 60 Hz ambient light) are removed while maintaining the reflectance resulting from environmental light sources. In order to verify the appropriateness of the use of $c_4$ for describing the heart rate, the correlation coefficient is employed for analyzing the relationship between the ground truth and our observation. As shown in Fig. 16, the two variables are positively correlated: Pearson’s product-moment correlation coefficient reaches 0.9101, which indicates that our proposed approach for heart rate evaluation is effective and the use of IMF 4 is meaningful.

Fig. 14. Bland-Altman plots obtained by (a) [8] and (b) our proposed framework.

Fig. 15. Precision measured under different values of k.

Fig. 16. The analysis between the ground truth and our observation using correlation coefficient.

Given that, according to the Beer-Lambert law, the reflected light intensity varies nonlinearly with distance traveled through the facial tissue, it is possible that the linearity assumed by EEMD during the experiment is not representative of the true underlying mixture in the signals. In addition, the physiological changes in blood volume due to motion are not well understood and could also be nonlinear. Nonetheless, given the short time window used for EEMD (30 seconds), a linear model can provide a reasonable local approximation. Subject respiration can complicate the image capturing process in that it sometimes results in moving artifacts, causing a certain degree of distortion in facial appearance and thus affecting the decomposition of facial reflectance. However, the results presented in this work verify the effectiveness of the proposed framework for removing noise under these assumptions. In addition, the recording time for the present work was relatively short, and the whole process of heart rate detection cannot be conducted in real time. Future work needs to extend the time window to enable long-term, continuous measurements, and the computational complexity should be reduced to achieve higher accuracy and real-time processing.

IV. CONCLUSION

A novel, robust face-based heart rate monitoring technique is proposed to evaluate heart rate variation without physical contact. Face reflectance is first decomposed from a single image, and heart rate evaluation is then conducted over consecutive frames according to the periodic variation of reflectance strength. This variation results from changes in hemoglobin absorptivity across the visible light spectrum as each heartbeat increases or decreases the blood volume in the blood vessels of the face. To achieve a robust evaluation, ensemble empirical mode decomposition (EEMD) of the Hilbert–Huang transform (HHT) has been used to acquire the primary heart rate signal while reducing the effect of changes in ambient light. The proposed approach outperforms the current state of the art, providing accurate measurement with smaller variance and thus demonstrating its applicability in real-world environments. Because the proposed method does not operate in real time, it is better suited to Traditional Chinese Medicine (TCM) or similar analytic systems than to emergency systems.

REFERENCES
[1] S. Cook, M. Togni, M. C. Schaub, P. Wenaweser, and O. M. Hess, “High heart rate: A cardiovascular risk factor?” Eur. Heart J., vol. 27, no. 20, pp. 2387–2393, Sep. 2006.
[2] L. Politano, A. Palladino, G. Nigro, M. Scutifero, and V. Cozza, “Usefulness of heart rate variability as a predictor of sudden cardiac death in muscular dystrophies,” Acta Myol., vol. 27, pp. 114–122, Dec. 2008.
[3] P. Palatini and S. Julius, “The role of cardiac autonomic function in hypertension and cardiovascular disease,” Current Hypertension Rep., vol. 11, no. 3, pp. 199–205, Jun. 2009.
[4] M. Kawase et al., “Heart rate variability during massive hemorrhage and progressive hemorrhagic shock in dogs,” Can. J. Anesthesia, vol. 47, no. 8, pp. 807–814, Aug. 2000.
[5] W. L. Chen and C. D. Kuo, “Characteristics of heart rate variability can predict impending septic shock in emergency department patients with sepsis,” Acad. Emerg. Med., vol. 14, no. 5, pp. 392–397, May 2007.

[6] W. L. Chen, T. H. Tsai, C. C. Huang, J. H. Chen, and C. D. Kuo, “Heart rate variability predicts short-term outcome for successfully resuscitated patients with out-of-hospital cardiac arrest,” Resuscitation, vol. 80, no. 10, pp. 1114–1118, Oct. 2009.
[7] J. K. Chiang, M. Koo, T. B. Kuo, and C. H. Fu, “Association between cardiovascular autonomic functions and time to death in patients with terminal hepatocellular carcinoma,” J. Pain Symptom Manage., vol. 39, no. 4, pp. 673–679, Apr. 2010.
[8] M. Z. Poh, D. J. McDuff, and R. W. Picard, “Non-contact, automated cardiac pulse measurements using video imaging and blind source separation,” Opt. Exp., vol. 18, no. 10, pp. 10762–10774, May 2010.
[9] C. Takano and Y. Ohta, “Heart rate measurement based on a time-lapse image,” Med. Eng. Phys., vol. 29, no. 8, pp. 853–857, Oct. 2007.
[10] W. Verkruysse, L. O. Svaasand, and J. S. Nelson, “Remote plethysmographic imaging using ambient light,” Opt. Exp., vol. 16, no. 26, pp. 21434–21445, Dec. 2008.
[11] W. G. Zijlstra, A. Buursma, and W. P. Meeuwsen-van der Roest, “Absorption spectra of human fetal and adult oxyhemoglobin, de-oxyhemoglobin, carboxyhemoglobin, and methemoglobin,” Clin. Chem., vol. 37, no. 9, pp. 1633–1638, Sep. 1991.
[12] P. N. Belhumeur, J. P. Hespanha, and D. Kriegman, “Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 711–720, Jul. 1997.
[13] P. Belhumeur and D. Kriegman, “What is the set of images of an object under all possible illumination conditions?” Int. J. Comput. Vis., vol. 28, no. 3, pp. 245–260, Jul. 1998.
[14] A. S. Georghiades, D. Kriegman, and P. N. Belhumeur, “From few to many: Generative models for recognition under variable pose and illumination,” in Proc. IEEE PAMI, Mar. 2000, pp. 277–284.
[15] T. Riklin-Raviv and A. Shashua, “The quotient image: Class-based re-rendering and recognition with varying illumination conditions,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 2, pp. 129–139, Feb. 2001.
[16] A. S. Georghiades, D. Kriegman, and P. N. Belhumeur, “Illumination cones for recognition under variable lighting: Faces,” in Proc. IEEE Conf. CVPR, Jun. 1998, pp. 23–25.
[17] V. Blanz, S. Romdhani, and T. Vetter, “Face identification across different poses and illuminations with a 3D morphable model,” in Proc. IEEE Conf. Autom. Face Gesture Recognit., May 2002, pp. 20–21.
[18] B. K. P. Horn, Robot Vision. Cambridge, MA, USA: MIT Press, 1986.
[19] B. A. Wandell, Foundations of Vision. Sunderland, MA, USA: Sinauer, 1995.
[20] R. Gross and V. Brajovic, “An image preprocessing algorithm for illumination invariant face recognition,” in Proc. 4th Int. Conf. Audio-Video-Based Biometric Person Authentication (AVBPA), Jun. 2003, pp. 10–18.
[21] Z. Wu and N. E. Huang, “Ensemble empirical mode decomposition: A noise-assisted data analysis method,” Centre Ocean-Land-Atmos. Stud., Tech. Rep. Ser., vol. 193, no. 173, pp. 1–41, 2004.
[22] Z. Wu and N. E. Huang, “Ensemble empirical mode decomposition: A noise-assisted data analysis method,” Adv. Adapt. Data Anal., vol. 1, no. 1, pp. 1–41, 2008.
[23] D. P. Reid, Chinese Herbal Medicine. Hong Kong: CFW Publications Ltd., 1997, p. 26.
[24] S. Lucey. Illumination Invariance Through ADMMs. [Online]. Available: http://www.simonlucey.com/category/research/, accessed 2012.

Duan-Yu Chen received the B.S. degree in computer science and information engineering from National Chiao Tung University, Hsinchu, Taiwan, in 1996, the M.S. degree in computer science from National Sun Yat-sen University, Kaohsiung, Taiwan, in 1998, and the Ph.D. degree in computer science and information engineering from National Chiao Tung University in 2004. He was a Postdoctoral Research Fellow with Academia Sinica, Taipei, Taiwan, from 2004 to 2008. He is currently an Associate Professor with the Department of Electrical Engineering, Yuan Ze University, Chung-Li, Taiwan. He was a recipient of the Young Scholar Research Award from Yuan Ze University in 2012. His research interests include computer vision, video signal processing, content-based video indexing and retrieval, and multimedia information systems.

Jun-Jhe Wang received the B.S. degree in electrical engineering from Chung Hua University, Hsinchu, Taiwan, in 2008, and the master’s degree from the Department of Electrical Engineering, Yuan Ze University, Chung-Li, Taiwan, in 2013. His research interests include computer vision, video signal processing, and video surveillance systems.

Kuan-Yi Lin received the B.S. degree in automatic control engineering from Feng Chia University, Taichung, Taiwan, in 2007, and the M.S. degree from the Department of Electrical Engineering, Yuan Ze University, Chung-Li, Taiwan, in 2009. He is currently pursuing the Ph.D. degree at the Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan. His research interests include computer vision, video signal processing, and video surveillance systems.

Hen-Hong Chang received the M.D. degree in medicine and traditional Chinese medicine (TCM) from China Medical College, Taichung, Taiwan, in 1980, the M.P.H. degree from the Epidemiology Division, School of Public Health, University of California at Los Angeles, Los Angeles, CA, USA, in 1992, and the Ph.D. degree from China Medical College in 1994. He is well versed in both TCM and modern medicine. He established a laboratory in the medical center, worked in cooperation with biomedical engineers and statisticians, specialized in the standardization and modernization of diagnostic procedures in TCM, and completed scientific assessments of the therapeutic effects of several traditional Chinese herbal formulas. He has held several important positions in the field of TCM, including Director of the Institute of Chinese Medical Science at China Medical University and Chang Gung University, Superintendent of the Taipei City Hospital of Traditional Chinese Medicine, and Member of the Committee of Chinese Medicine and Pharmaceutics. He currently leads the TCM medical team of Chang Gung Memorial Hospital, and is recognized for over 30 years of contribution to the medical community.

Han-Kuei Wu received the M.D. degree in medicine and traditional Chinese medicine (TCM), and the M.S. degree in TCM diagnosis and hemodynamics from Chang Gung University, Taoyuan, Taiwan, in 2010 and 2013, respectively. He is currently pursuing the Ph.D. degree at the Institute of Traditional Medicine, National Yang-Ming University, Taipei, Taiwan. His major field of study is TCM pulse diagnosis and the clinical assessment of hemodynamics.

Yung-Sheng Chen (M’93–SM’13) received the B.S. degree from Chung Yuan Christian University, Zhongli, Taiwan, in 1983, and the M.S. and Ph.D. degrees from the National Tsing Hua University, Hsinchu, Taiwan, in 1985 and 1989, respectively, all in electrical engineering. He was a recipient of the Best Paper Award from the Chinese Institute of Engineers in 1989, and an Outstanding Teaching Award from the Yuan Ze University, Chung Li, Taiwan, in 2005. In 1991, he joined the Department of Electrical Engineering, Yuan Ze University, where he is currently a Full Professor. Since 1998, his name has been listed in the Who’s Who in the World. He also currently serves as an Associate Editor of International Journal of Machine Learning and Cybernetics, and an Editorial Board Member of International Scholarly Research Notices. He is currently an invited Editor of the book entitled Image Processing (InTech). His interests include human visual perception, computer vision, circuit system design, and information management.

Suh-Yin Lee received the B.S. degree in electrical engineering from National Chiao Tung University, Hsinchu, Taiwan, in 1972, the M.S. degree in computer science from the University of Washington, Seattle, WA, USA, in 1975, and the Ph.D. degree in computer science from the Institute of Electronics, National Chiao Tung University. Her research interests include content-based indexing and retrieval, distributed multimedia information systems, mobile computing, and data mining.
