
A Generalized Data Detection Scheme Using Hyperplane for Magnetic Recording Channels With Pattern-Dependent Noise

Seiichi Mita and Vo Tam Van

Toyota Technological Institute, Hisakata, Tempaku, Nagoya 468-8511, Japan

We propose a novel data-detection scheme using support vector machine techniques in the presence of pattern-dependent noise on magnetic recording channels. First, the log-likelihood ratios (LLRs) of data series were generated using the Bahl–Cocke–Jelinek–Raviv (BCJR) algorithm. Second, these LLRs were mapped to a 3-D space, and hyperplanes for data discrimination were generated using the radial-basis-function kernel. Third, the LLR of each bit was rescaled on the basis of the distance from the hyperplanes and then fed to an LDPC decoder. We evaluated the performance of the proposed method using a real data series retrieved from a perpendicular magnetic recording channel with a bit-error rate of approximately $10^{-3}$. For projective geometry–low-density parity-check codes with a code rate of 0.93, the proposed method can reduce the number of sum-product algorithm iterations by approximately half compared with conventional LLRs.

Index Terms—Hyperplane, partial response, projective geometry–low-density parity-check (PG–LDPC) codes, support vector machine (SVM).

I. INTRODUCTION

THE choice of a data detection method is a key consideration in the implementation and development of hard-disk drives (HDDs) with higher recording density and higher data reliability. An increase in areal recording density has caused a severe deterioration in signal performance, such as from pattern-dependent noise and nonlinear distortion, and many methods have been proposed to overcome these problems [1]–[4]. In particular, considerable progress has been made in the theoretical study of maximum a posteriori (MAP) and maximum-likelihood detectors, which can be applied to intersymbol-interference (ISI) and pattern-dependent noise channels [5]. This approach can reduce data errors quite effectively at the cost of complexity. However, it can only be applied to a 1-D data series. In the near future, the signal retrieved from a magnetic disk with a very narrow track width will suffer significant deterioration from 2-D pattern dependency. In this environment, each combination of recording patterns needs its own appropriate threshold for data discrimination. Thus, we focused our attention on the search for a more general method, in order to reduce the errors due to multidimensional pattern-dependent noise.

The realization of a novel detection method requires a technique by which curved surfaces for thresholds can be arbitrarily generated. The introduction of the support vector machine (SVM) [6] to the data detection process may provide an answer to this requirement. We have already verified the effectiveness of the SVM by simulation [7]. This method can intuitively and directly reflect the pattern dependency of the noise included in a data series. In this study, we investigated the feasibility of a novel data detection method based on the radial basis function (RBF) kernel using a real perpendicular recording data series. Moreover, we evaluated the performance of the proposed

Manuscript received March 05, 2009; revised April 26, 2009. Current version published September 18, 2009. Corresponding author: S. Mita (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TMAG.2009.2023236

Fig. 1. Signal flow for performance evaluation.

method by applying it to low-density parity-check (LDPC) decoding.

This paper is organized as follows. Section II outlines the proposed method. Section III presents the results of data detection using the proposed method. Section IV presents the results of its application to iterative detection. Finally, Section V concludes the paper.

II. OUTLINE OF MEASUREMENT PROCESS AND SVM

A. Measurement Process

The signal flow for the performance evaluation is shown in Fig. 1. Analog signals retrieved from a read channel are sampled at adequate timing instants, fed to a 6-bit analog-to-digital converter (ADC), and then subjected to a simple dc restoration process that uses signal envelope detection. In the next step, these data series are equalized to a partial response class 1 (PR1) or class 3 (PR3) channel response using a nine-tap transversal filter. Then, log-likelihood ratios (LLRs) are calculated by applying the BCJR algorithm [8] to the PR1 or PR3 channel. In the case of additive noise, LLRs are independent of each other. However, in the case of pattern-dependent noise, LLRs depend on the adjacent bits. Let $LR(nT)$ denote the LLR series for each bit, where $n$ is the bit index and $T$ is the bit period. For simplicity, we consider combinations of three consecutive bit values. These combinations are mapped to the 3-D LLR space by assigning the first bit of the three-bit combination to the $x$ axis, the second bit to the $y$ axis, and the third bit to the $z$ axis. We evaluated two kinds of data series, Data1 and Data2, shown in Table I. These data

0018-9464/$26.00 © 2009 IEEE Authorized licensed use limited to: Toyota Technological Institute. Downloaded on October 18, 2009 at 22:15 from IEEE Xplore. Restrictions apply.


IEEE TRANSACTIONS ON MAGNETICS, VOL. 45, NO. 10, OCTOBER 2009

TABLE II STANDARD DEVIATION OF LLR’S DISTRIBUTION OF EACH PATTERN

Fig. 2. Distribution of LLRs in 3-D LLR space.

Fig. 3. Top view of LLRs’ distribution.

TABLE I BER OF THE MEASURED DATA SERIES

series, which had a bit-error rate (BER) of around $10^{-3}$, were obtained by increasing the linear recording density of a 2.5-in hard-disk drive (HDD). The data transfer rate for this data series was around 600 Mb/s, and the linear recording density was nearly 1000 kfci. The BERs obtained for Data1 with the PR1 and PR3 detectors are listed in Table I; Data2 showed a BER of the same order for both detectors.

An example of the LLR distribution is shown in Fig. 2. A top view of the 3-D space of the LLR distribution is shown in Fig. 3. It can be easily observed that the distributions of the “000” and “111” three-bit patterns were different from those of “010” and “101.” The former patterns were scattered uniformly. On the other hand, the latter patterns, which include two flux changes, have strong correlations between the axes. Table II shows the standard deviation of the LLR distribution of each pattern. Each deviation was normalized by the distance between the center of each distribution and the origin. Each pattern has a different standard deviation from the others. In particular, the “010” and “101” patterns have larger standard deviations than the other patterns.

B. Support Vector Machine (SVM)

The SVM (see WEB) is a classifier introduced by Vapnik [6], which can achieve classification performance comparable to that of artificial neural networks (ANNs). Generally, neural network training can become trapped in a local minimum. On the other hand, the SVM is mathematically transparent, and can provide global and unique solutions. There are two types of SVMs: 1) linear and 2) nonlinear. We will focus mainly on the use of

Fig. 4. Basic concept of SVM.

the nonlinear SVM. With an appropriate nonlinear mapping to a sufficiently high dimension, data from the two categories can always be separated by a hyperplane. However, this requires a large amount of computation. The kernel trick [6] makes it possible to work in high-dimensional feature spaces without explicitly performing computations in that space, as shown in Fig. 4.

A linear SVM is a classifier whose discriminant function $f(\mathbf{x})$ is a linear combination of the components of $\mathbf{x}$ (i.e., a hyperplane for data separation), given as

$$f(\mathbf{x}) = \mathbf{w}^{T}\mathbf{x} + b \tag{1}$$

where $\mathbf{x}$ denotes the data vector, namely a three-bit combination in the LLR space, $\mathbf{w}$ is the weight vector, and $b$ is the bias weight. Consider a given training set $\{\mathbf{x}_i, y_i\}_{i=1}^{N}$, with input data $\mathbf{x}_i$ and output data of class labels $y_i \in \{-1, +1\}$, where $N$ is the number of samples. The weight vector $\mathbf{w}$ is represented by a linear combination of this training data set

$$\mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i. \tag{2}$$

In this case, the linear classifier is defined as

$$f(\mathbf{x}) = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i^{T}\mathbf{x} + b. \tag{3}$$

Several of the resulting $\alpha_i$ values are set to zero through quadratic programming. Therefore, the computation required for data classification can be remarkably reduced. Training vectors with nonzero $\alpha_i$ values are known as support vectors. This implies that the decision boundary is uniquely determined by the support vectors; this decision boundary is called a hyperplane. It should be noted that the distance between the hyperplane and each support vector is normalized to 1. A nonlinear classifier can easily be achieved by replacing $\mathbf{x}_i^{T}\mathbf{x}$ with the RBF kernel $K(\mathbf{x}_i, \mathbf{x})$, thanks to the kernel trick

$$K(\mathbf{x}_i, \mathbf{x}) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{x}_i\|^{2}}{\sigma^{2}}\right) \tag{4}$$

where $\sigma$ is a constant.

Fig. 5. Example of hyperplane between “000” and “010.”
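The kernelized discriminant function of Section II-B can be illustrated with a minimal sketch. It assumes the common RBF form $\exp(-\|\mathbf{x}-\mathbf{x}_i\|^2/\sigma^2)$ and a three-bit window $(LR((n-1)T), LR(nT), LR((n+1)T))$ for the 3-D mapping. The support vectors, multipliers, bias, $\sigma$, and the LLR sequence below are hypothetical values chosen only for illustration; they are not taken from the measured data, and in practice the multipliers would come from quadratic programming.

```python
import math

def llr_triples(llrs):
    """Slide a three-bit window over an LLR sequence and return 3-D points.

    Assumed window: (LR((n-1)T), LR(nT), LR((n+1)T)), center bit at index n.
    """
    return [(llrs[n - 1], llrs[n], llrs[n + 1]) for n in range(1, len(llrs) - 1)]

def rbf_kernel(xi, x, sigma):
    """RBF kernel K(xi, x) = exp(-||x - xi||^2 / sigma^2) (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, x))
    return math.exp(-sq_dist / sigma ** 2)

def decision_value(x, support_vectors, labels, alphas, bias, sigma):
    """Kernelized discriminant: f(x) = sum_i alpha_i * y_i * K(x_i, x) + b."""
    return bias + sum(
        a * y * rbf_kernel(sv, x, sigma)
        for sv, y, a in zip(support_vectors, labels, alphas)
    )

# Hypothetical support vectors near the "000" (label -1) and "010" (label +1) clusters.
support_vectors = [(-4.0, -4.0, -4.0), (-4.0, 4.0, -4.0)]
labels = [-1, +1]
alphas = [1.0, 1.0]   # hypothetical multipliers, not a trained solution
bias = 0.0
sigma = 4.0

# Hypothetical LLR sequence (negative = bit 0, positive = bit 1 is an assumed convention).
points = llr_triples([-4.2, -3.9, 3.5, -4.1, -3.8])
values = [decision_value(p, support_vectors, labels, alphas, bias, sigma) for p in points]
for p, v in zip(points, values):
    print(p, "center bit ->", 1 if v > 0 else 0)
```

The sign of the decision value selects the side of the curved surface on which a three-bit LLR point lies, and its magnitude grows as the point moves away from the surface.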

III. GENERATION OF HYPERPLANE USING SVM

In this section, we explain the generation of hyperplanes for data discrimination. In a three-bit combination, we use the center bit for data discrimination. We assume that only one pattern-dependent error event occurs, in one of the following cases.
1) One bit of data is slipped to an adjacent bit period.
2) One bit of data is expanded to an adjacent bit period.
3) One bit of data is split to both adjacent bit periods.
4) One bit of data is erased.

In these situations, the number of hyperplanes necessary for data discrimination is limited to nine surfaces (S1–S9), as follows. The “010” pattern has neighboring surfaces with the patterns “000” (S1), “001” (S2), “100” (S3), and “101” (S4). It should be noted that no error occurs for the center bit of a three-bit combination when the “010” pattern changes to “011,” “110,” or “111.” Similarly, the “101” pattern has neighboring surfaces with the patterns “011” (S5), “110” (S6), “111” (S7), and “010” (S4). Moreover, the “100” pattern has a neighboring surface with the “110” pattern (S8). Similarly, the “001” pattern has a neighboring surface with the “011” pattern (S9).

In order to generate adequate hyperplanes for data discrimination, we carefully selected training data samples that satisfied the following conditions:
1) the distance between the center of each distribution and the selected training sample was larger than the standard deviation of that distribution;
2) the training data samples did not include error bits;
3) the values of the training data samples were rescaled so that most of the error bits were included within a set distance from a hyperplane.

In particular, condition 1) was important for generating adequate hyperplanes that included training data samples with very large LLRs, because of the use of the nonlinear kernel SVM. An example of a hyperplane that separates the “000” and “010” patterns is shown in Fig. 5. We can clearly see that an adequate surface for the discrimination of the two patterns differs from a flat plane aligned with the coordinate axes (i.e., the conventional threshold).

The numbers of errors that occurred between various patterns in Data1 (132 000 bits) are presented in Table III. PR3 was used as the channel response. The total number of errors was slightly reduced by using adequate hyperplanes, even though the number of errors for some patterns increased. These results suggest that the LLRs rescaled by the distance from the hyperplanes can reduce the bias due to the pattern dependence included in the conventional LLRs. The frequency distributions of the conventional LLRs

Fig. 6. Comparison of LLRs’ distributions. (a) Conventional plane. (b) Hyperplanes.

TABLE III NUMBER OF ERRORS INCLUDED IN DATA1

and the rescaled LLRs are presented in Fig. 6(a) and (b). The conventional distribution is normalized by the average LLR. Moreover, Table IV shows the numbers of correct and incorrect LLRs for the conventional plane and the hyperplanes. This comparison uses Data1 and PR3. The incorrect LLRs cause errors. We can see that the conventional LLRs included three times as many correct samples as the hyperplane-based LLRs for nearly the same number of errors. In other words, the conventional LLRs included numerous ambiguous samples. Therefore, we can expect the use of the new LLRs to reduce the number of iterations in the LDPC decoding process.
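One possible way to quantify "ambiguous" samples is to count the correct and incorrect LLRs whose magnitude falls inside a band around the decision threshold. The sketch below is only an illustration of that idea: the band width, the sign convention (positive LLR means bit 1), and the sample values are assumptions, and the exact counting rule behind Table IV is not specified here.

```python
def count_ambiguous(llrs, bits, band=1.0):
    """Count correct and incorrect LLRs with |LLR| < band.

    Assumed convention: a positive LLR decides bit 1, otherwise bit 0.
    Returns a (correct, incorrect) tuple for samples inside the band only.
    """
    correct = 0
    incorrect = 0
    for llr, bit in zip(llrs, bits):
        if abs(llr) >= band:
            continue  # confident sample, outside the ambiguous band
        decided = 1 if llr > 0 else 0
        if decided == bit:
            correct += 1
        else:
            incorrect += 1
    return correct, incorrect

# Hypothetical LLRs and the corresponding recorded bits.
example_llrs = [0.3, -0.2, 2.5, -0.7, 0.9, -3.0]
example_bits = [1, 1, 1, 0, 0, 0]
print(count_ambiguous(example_llrs, example_bits))  # -> (2, 2)
```

Under this reading, a detector whose LLRs place fewer correct samples inside the band, for the same number of incorrect ones, hands the decoder more confident inputs.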




TABLE IV COMPARISON OF NUMBERS OF CORRECT AND INCORRECT LLRS

Fig. 7. Reduction of the iteration number using LLRs rescaled by the distance from the hyperplane generated by the SVM. (a) PG–LDPC code with a code rate of 0.95. (b) PG–LDPC code with a code rate of 0.93.
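The code rates quoted for the two PG–LDPC codes (0.949 and 0.931, rounded to 0.95 and 0.93 in the Fig. 7 caption) can be checked from the code length and the redundant bit length given in Section IV-A. The short calculation below assumes that the shortened code keeps the same 296 redundant bits as the full-length code, which is consistent with the stated rate of 0.931 but is not stated explicitly.

```python
def code_rate(code_length, redundant_bits):
    """Code rate = information bits / code length."""
    return (code_length - redundant_bits) / code_length

full_rate = code_rate(5797, 296)
# Assumption: shortening removes information bits only, so 296 redundant bits remain.
shortened_rate = code_rate(4296, 296)

print(round(full_rate, 3), round(shortened_rate, 3))  # -> 0.949 0.931
```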

Fig. 8. BER of a four-state PR1 with PDN reduction paths.

IV. PERFORMANCE EVALUATION

A. PG–LDPC Code

In this evaluation, we used a type-2 PG–LDPC code [9] with a code length of 5797, a redundant bit length of 296, and a code rate of 0.949. Moreover, we evaluated another code, the shortened PG–LDPC code, which had a code length of 4296. The code rate of the shortened PG–LDPC code was set to 0.931. As is well known, these LDPC codes do not contain cycles of length four. We used the sum-product algorithm (SPA) as the decoding algorithm.

B. Performance Evaluation of Error-Correcting Codes

We evaluated the performance of these codes for Data1. Fig. 7(a) and (b) shows examples of the decoding performance of both the PG–LDPC code and the shortened code. When we used conventional LLRs for Data1, the PG–LDPC code with a code rate of 0.949 could not correct all of the errors, even after nine iterations. On the other hand, when rescaled LLRs were applied to the PG–LDPC code with a code rate of 0.949, all of the errors could be corrected after seven or nine iterations. In these figures, we used two kinds of hyperplanes. HP1 (the best case) was generated from a data series that included several errors in one sector. On the other hand, HP2 (the worst case) was generated from a data series that included several tens of errors in one sector. The BER of HP2 was slightly worse than that of PR3 before the iteration and at the first iteration. Nevertheless, its error-correcting capability outperformed that of PR3 after several iterations. This indicates the robustness of the proposed method. The PG–LDPC code with a code rate of 0.931 could correct all of the errors. The number of iterations for HP1 could be reduced by approximately half, compared to that for PR3. The performance improvement for PR1 was nearly the same as that for PR3. As a comparison, the performance improvement results of a four-state PR1 with branch-metric paths for pattern-dependent noise (PDN) reduction [3], using the same real data and LDPC codes, are shown in Fig. 8 [10]. The use of the four-state PR1 reduced the BER by more than one order of magnitude. However, this did not reduce the number of iterations for Data1.

V. CONCLUSION

The proposed method using SVM was able to reduce the number of iterations of the sum-product algorithm by approximately half, compared with conventional LLRs, for a data series with a BER of approximately $10^{-3}$ when using PR3 and the shortened PG–LDPC code with a code rate of 0.93. This study only used a 3-D space. Due to the nature of the SVM, the dimensions could easily be expanded to more than three.

ACKNOWLEDGMENT

This work was supported in part by Kakenhi No. 19560398, in part by the Storage Research Consortium (SRC), and in part by the NEDO project. The authors would like to thank student F. Haga, who assisted with the data analysis.

REFERENCES

[1] J.-G. Zhu and H. Wang, “Noise characteristics of interacting transitions in longitudinal thin film media,” IEEE Trans. Magn., vol. 31, no. 2, pp. 1065–1070, Mar. 1995.
[2] J. Moon and J. Park, “Pattern-dependent noise prediction in signal dependent noise,” IEEE J. Sel. Areas Commun., vol. 19, no. 4, pp. 730–743, Apr. 2001.
[3] S. Mita, “A robust detector based on a combination of PR1 and EEPR4 for perpendicular magnetic recording,” IEEE Trans. Magn., vol. 42, no. 10, pp. 2567–2569, Oct. 2006.
[4] Z. Wu, P. H. Siegel, J. K. Wolf, and N. Bertram, “Mean-adjusted pattern-dependent noise prediction for perpendicular recording channels with nonlinear transition shift,” IEEE Trans. Magn., vol. 44, no. 11, pp. 3761–3764, Nov. 2008.
[5] A. Kavcic and J. M. Moura, “The Viterbi algorithm and Markov noise memory,” IEEE Trans. Inf. Theory, vol. 46, no. 1, pp. 291–301, Jan. 2000.
[6] V. Vapnik, The Nature of Statistical Learning Theory. Berlin, Germany: Springer-Verlag, 1995.
[7] S. Mita, “A generalized data discrimination scheme using kernel machines in the presence of pattern-dependent noise,” J. Magn. Magn. Mater., vol. 287, pp. 426–431, 2005.
[8] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Trans. Inf. Theory, vol. IT-20, no. 2, pp. 284–287, Mar. 1974.
[9] Y. Kou, S. Lin, and M. Fossorier, “Low-density parity-check codes based on finite geometries: A rediscovery and new results,” IEEE Trans. Inf. Theory, vol. 47, no. 7, pp. 2711–2736, Nov. 2001.
[10] S. Mita and H. Matsui, “Performance comparison of various error correcting strategies using perpendicular magnetic recording data series,” InterMag Dig., p. 1511, 2008, (HT03).
