Reducing false-positive detections by combining two stage-1 computer-aided mass detection algorithms

Noah D. Bedard, Mehul P. Sampat, Patrick A. Stokes, and Mia K. Markey*
Department of Biomedical Engineering, The University of Texas at Austin, Austin, TX 78712, USA

ABSTRACT

In this paper we present a strategy for reducing the number of false positives in computer-aided mass detection. Our approach is to mark only "consensus" detections from among the suspicious sites identified by different “stage-1” detection algorithms. By “stage-1” we mean that each of the Computer-aided Detection (CADe) algorithms is designed to operate with high sensitivity, allowing for a large number of false positives. In this study, two mass detection methods were used: (1) Heath and Bowyer’s algorithm based on the average fraction under the minimum filter (AFUM) and (2) a low-threshold bilateral subtraction algorithm. The two methods were applied separately to a set of images from the Digital Database for Screening Mammography (DDSM) to obtain paired sets of mass candidates. The consensus mass candidates for each image were identified by a logical “and” operation of the two CADe algorithms so as to eliminate regions of suspicion that were not independently identified by both techniques. By combining the evidence from the AFUM filter method with that obtained from bilateral subtraction, the same sensitivity could be reached with fewer false positives per image than with the AFUM filter alone.

Keywords: computer-aided detection, mammography, breast cancer

1. INTRODUCTION

The American Cancer Society estimates that 211,240 women will be diagnosed with breast cancer in the U.S. in 2005 [1] and 40,410 women will die of the disease. In the U.S., breast cancer is the most common form of cancer among women and is the second leading cause of cancer deaths, after lung cancer [1]. Women in the U.S. have about a 1 in 8 lifetime risk of developing invasive breast cancer [2, 3]. Early detection of breast cancer increases the treatment options for patients and also increases the survival rate.

Screening mammography, x-ray imaging of the breast, is currently the most effective tool for early detection of breast cancer. Screening mammographic examinations are performed on asymptomatic women to detect early, clinically unsuspected breast cancer. Two views of each breast are recorded: the craniocaudal (CC) view, which is a top-to-bottom view, and the mediolateral oblique (MLO) view, which is a side view taken at an angle.

However, mammography is not perfect. Detection of suspicious abnormalities is a repetitive and fatiguing task. For every thousand cases analyzed by a radiologist, only 3 to 4 are cancerous, and thus an abnormality may be overlooked. As a result, radiologists fail to detect 10-30% of cancers [4-6]. Approximately two-thirds of these false-negative results are due to missed lesions that are evident retrospectively [7].

Computer-Aided Detection (CADe) systems have been developed to aid radiologists in detecting mammographic lesions that may indicate the presence of breast cancer [8]. These systems act only as a second reader, and the final decision is made by the radiologist. It is important to realize that mammographic image analysis is an extremely challenging task for a number of reasons. First, since the efficacy of CADe systems can have very serious implications, there is a need for near perfection. Second, the large variability in the appearance of abnormalities makes this a very difficult image analysis task. Finally, abnormalities are often occluded or hidden in dense breast tissue, which makes detection difficult.

*[email protected]; phone: +1.512.471.1771; fax: +1.512.471.0616; http://www.bme.utexas.edu/research/informatics/

Medical Imaging 2006: Image Processing, edited by Joseph M. Reinhardt, Josien P. W. Pluim, Proc. of SPIE Vol. 6144 (2006)
Although CADe algorithms can help radiologists identify true lesions in mammograms, they often also mark false areas in the tissue that can distract the radiologist. Most CADe algorithms consist of two stages. The aim of “stage 1” is to achieve a high sensitivity and the aim of “stage 2” is to reduce the number of false positives per image. For detecting masses, two such “stage 1” algorithms are the AFUM mass detection method [9] and the bilateral subtraction method [10-12].

The basic principle of the AFUM method is that it compares the relative gray scale intensities of potential masses to the surrounding tissue. The AFUM value represents the extent to which the neighboring region of a point radially decreases in intensity. Thus, points with a high AFUM value are suspicious for being masses, as masses generally have greater intensities than normal tissue [9].

Previous studies have demonstrated that bilateral subtraction of mammograms is effective for detecting masses [10-12]. The mammograms of an individual patient are commonly analyzed for abnormalities by comparing the images for asymmetries, such as comparing the right and left breast images or a current image to a previous one. Even though the density of breast tissue varies widely among patients, the left and right breasts of a patient should be similar and should change little between screenings, so regions of asymmetry between images are suggestive of abnormality.

Our hypothesis is that the outputs of multiple CADe algorithms that each have high sensitivity but are based on different image processing principles will tend to agree for true-positive detections but disagree for false-positive detections. Thus, by taking a "consensus" from the results of two or more CADe algorithms, the number of false positives can be reduced while maintaining the sensitivity of the component algorithms. In this study, a logical "and" of CADe outputs was used to identify consensus detections from the AFUM and bilateral subtraction methods, and the performance of the consensus detection method was compared to that of the AFUM alone.

2. METHODOLOGY

2.1 Database

The images used in this study were obtained from the Digital Database for Screening Mammography (DDSM). There are 2,620 cases in the DDSM, each containing the craniocaudal (CC) and mediolateral oblique (MLO) views of each breast, as well as boundary files of the image abnormalities outlined by a radiologist. The dataset used in this study was the same as the one used in the original AFUM study presented in Heath et al. [9], with the exception of 26 images that were excluded because of inaccuracies in the segmentation preprocessing step. This dataset was randomly selected from the DDSM by Heath et al. [13]. The training and testing datasets are composed of 140 and 150 left/right pairs of CC and MLO images, respectively. There was no more than one mass per image. The volumes and case numbers of the images used are shown in Table 1.

2.2 Segmentation of Breast Region

We encountered difficulties in using the code of Heath et al. [13] for segmenting the breast region. Thus, the segmentation algorithm of Gavrielides et al. [14] with an additional smoothing step was employed for this study. The smoothing step consisted of an opening and a closing operation with a 20 pixel radius. The results of the segmentation algorithm for one case are shown in Figure 1. The segmentation worked correctly in all images except where labels and other artifacts were close to the breast region; this occurred in 14 images from the original dataset of Heath et al., typically on mammograms of large breasts. These 14 images as well as their corresponding pairs were removed from the analysis.

2.3 AFUM filter based mass detection algorithm

The first stage-1 CADe algorithm used was the AFUM method of Heath and Bowyer [9], which is available from the Digital Database for Screening Mammography (DDSM) website, with the modification that the breast region segmentation algorithm of Gavrielides et al. [14] was used as described above. The AFUM method applies an average fraction under the minimum filter to analyze the degree to which the region around a point decreases in intensity. Each pixel is assigned an AFUM value by this method, and using this information, points are marked as suspicious sites for masses [9].
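To make the idea of a "fraction under the minimum" concrete, the following is a minimal sketch and not the code distributed with the DDSM. It assumes one plausible reading of the filter: for an inner radius r1 and an outer radius r2, take the minimum intensity within r1 of a point and measure the fraction of pixels at distance of about r2 that are darker than that minimum, then average over several inner radii. The function names `fum` and `afum`, the ring width of one pixel, and the radii used on the toy images are all assumptions made for illustration; the exact definition and parameters are given in [9].

```python
import math

def fum(image, cx, cy, r1, r2):
    """Fraction of pixels on the ring at distance ~r2 from (cx, cy) that are
    darker than the minimum intensity found within distance r1 (assumed form)."""
    height, width = len(image), len(image[0])
    inner_min = image[cy][cx]
    ring = []
    reach = int(math.ceil(r2 + 0.5))
    for y in range(max(0, cy - reach), min(height, cy + reach + 1)):
        for x in range(max(0, cx - reach), min(width, cx + reach + 1)):
            dist = math.hypot(x - cx, y - cy)
            if dist <= r1:
                inner_min = min(inner_min, image[y][x])
            elif abs(dist - r2) < 0.5:  # one-pixel-wide ring (assumption)
                ring.append(image[y][x])
    if not ring:
        return 0.0
    return sum(1 for v in ring if v < inner_min) / len(ring)

def afum(image, cx, cy, inner_radii, gap):
    """Average the FUM value over several inner radii, with r2 = r1 + gap."""
    values = [fum(image, cx, cy, r, r + gap) for r in inner_radii]
    return sum(values) / len(values)

# Toy 9x9 images: a blob that is brightest at the center, and a flat image.
blob = [[100 - 10 * max(abs(x - 4), abs(y - 4)) for x in range(9)] for y in range(9)]
flat = [[50] * 9 for _ in range(9)]

center_score = afum(blob, 4, 4, [1, 2], 2)
corner_score = afum(blob, 0, 0, [1, 2], 2)
flat_score = afum(flat, 4, 4, [1, 2], 2)
```

On the toy blob, the center point scores higher than a corner point, which matches the intuition that intensity decreases radially around a mass-like region.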


The basic principle of this method is that it compares the relative gray scale intensities of potential masses to the surrounding tissue. The AFUM value represents the extent to which the neighboring region of a point radially decreases in intensity. Thus, points with a high AFUM value are suspicious for being masses, as masses generally have greater intensities than normal tissue [9]. A total of 15 points per image were marked by the AFUM method so that a high sensitivity would be obtained. The C functions “afumfeature” and “detect” obtained from the DDSM website [13] were run for each image. The output of the detect function was a list of points with a corresponding level of suspicion. The points are ordered based on suspicion and location. As explained by Heath et al. [9], points that are within 5 mm of a highly suspicious point are ordered toward the end so that the first several points represent a high level of suspicion in areas at least 5 mm apart. To be consistent with this procedure, the first 15 points of the output of “detect” were used.

2.4 Bilateral subtraction based mass detection algorithm

The second stage-1 CADe algorithm used was bilateral subtraction. Subtraction is a well-documented method for identifying masses in digital mammograms [10-12]. To eliminate artifacts from outside the breast region, pixels outside the segmented breast region were set to zero. The variables explored during the development of the subtraction algorithm were the subtraction threshold level, image alignment, and morphological opening and closing. These parameters were empirically optimized using the training set only. To obtain a high sensitivity the subtraction threshold was set low, often marking most of the area of the mammogram as suspicious for a mass. The subtraction threshold for each image was set equal to 0; this meant that suspicious areas in one mammogram were areas that had greater pixel intensity than the corresponding area of the paired mammogram.
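The thresholding step can be sketched as follows. This is an illustrative sketch only: it assumes the two images have already been aligned and brought to the same size, represents images as nested lists, and uses the hypothetical function name `suspicious_mask`. With the threshold at 0, a pixel is marked when it is brighter than the corresponding pixel of the paired image and lies inside the breast mask.

```python
def suspicious_mask(image, paired, breast_mask, threshold=0):
    """Return a binary map: 1 where (image - paired) exceeds the threshold
    inside the breast mask, 0 elsewhere. Inputs are same-sized nested lists."""
    result = []
    for img_row, pair_row, mask_row in zip(image, paired, breast_mask):
        result.append([
            1 if m and (a - b) > threshold else 0
            for a, b, m in zip(img_row, pair_row, mask_row)
        ])
    return result

# Toy 2x2 "left" and "right" images with their breast masks.
left = [[5, 5], [1, 9]]
right = [[3, 7], [1, 2]]
left_mask = [[1, 1], [1, 0]]   # bottom-right pixel lies outside the breast
right_mask = [[1, 1], [1, 1]]

left_suspicious = suspicious_mask(left, right, left_mask)
right_suspicious = suspicious_mask(right, left, right_mask)
```

Each image of the pair gets its own map, so a region is marked in whichever breast is brighter at that location.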
To align two bilateral mammograms as effectively as possible, a centroid and a major axis were calculated for each mammogram mask. The line defining the major axis was made up of two points: the leftmost point on the lower two-thirds of the image and a point on the chest wall located halfway between the centroid and the leftmost point. Using the major axis in each image, the difference in slope between the two images was calculated. The images were then aligned by their centroids and one was rotated so that the angles of the two major axes were equal. The outer edges of the mammograms were finally zero padded to result in two aligned images of the same size. The centroid and major axis are identified in Figure 2, and the alignment of two mammograms based on centroid and major axis is shown in Figure 3.

To ensure that one image did not have much greater image intensity than its pair, each image was multiplied by the median of the corresponding image. For each resized image the opposite image was then subtracted, creating two resized image subtractions. The threshold was applied to each subtraction, where any pixel value above the threshold was defined as 1 and any value below as 0. Each thresholded image was then multiplied by its corresponding mask; all white regions were considered suspicious regions for masses, as shown in Figure 4. The images were then rotated back to their original positions and resized to their original dimensions.

2.5 Combination of the AFUM and the bilateral subtraction methods

To evaluate the performance of the AFUM method alone, a template of the ground truth region was created using code presented in Heath et al. [9]. Each point marked by the AFUM method was determined to be inside or outside the ground truth. In Heath et al. [9] it was unclear whether additional points in the ground truth region were disregarded or counted as false positives. In this study, only one point in the ground truth region was considered a true positive; to avoid biasing the false-positive reduction in favor of our system, all additional points within the ground truth were disregarded. This evaluation was done for 15 points, so the false positives per image could range from 0 to 15.

The AFUM points were then separately compared to the subtraction image. Any AFUM point that was in a positive subtraction region was classified as inside or outside the ground truth region and evaluated as before. If an AFUM point was not within the subtraction region, the point was not considered marked and the FP and TP counts were not advanced. Therefore the second set of points represents only the areas that both the AFUM and subtraction algorithms marked as suspicious. For comparison, the results after the combination of the two algorithms are shown in Figure 5.
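The logical "and" of the two outputs amounts to keeping only those AFUM points that land on a positive pixel of the subtraction map. The sketch below illustrates this under stated assumptions: points are (x, y) pixel coordinates already ordered by suspicion, the subtraction map is a nested list of 0/1 values in the original image geometry, and the function name `consensus_points` is hypothetical.

```python
def consensus_points(afum_points, subtraction_mask):
    """Keep the AFUM points (x, y) that fall on a positive pixel of the
    subtraction mask, preserving their suspicion order."""
    height = len(subtraction_mask)
    width = len(subtraction_mask[0]) if height else 0
    kept = []
    for x, y in afum_points:
        inside_image = 0 <= x < width and 0 <= y < height
        if inside_image and subtraction_mask[y][x]:
            kept.append((x, y))
    return kept

# Toy example: three AFUM points, only two of which lie in a marked region.
mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
]
points = [(2, 0), (1, 1), (0, 2)]
consensus = consensus_points(points, mask)
```

Points that are dropped here are simply not counted, which corresponds to the FP and TP counts not being advanced for them.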


The r[...] figure of the algorithm for the image case [...] case consists of a [...]sure [...] shows the segm[...] the segm[...]orithm proposed by Gavri[...]nm[...] the ce[...] The major axe[...] of the case [...] are shown in [...]igure [...] The aligned imag[...]orithm are shown in [...]igure [...] and [...] In [...]igure [...] we show a comparison of the d[...]orithm with and without the subtraction m[...] the two m[...] and t[...] datas[...] the conse[...]

This study d[...] the conse[...]” [...]e algorithms can reduce the numb[...]e and is not specific to the particular type of [...]ling modality used in this study or the particular pair of [...]e algorithms combined in this study [...] a larg[...]hlight areas of suspicion could advance this work. Improving the subtraction algorithm would h[...]hned imag[...]n the imag[...] to the [...]nmmy re[...] for both the training and the t[...]s[...] two “stage[...]” [...]e algorithms. The intuition is that [...]e algorithms d[...]ned based on fundam[...]h[...]sing modality and naturally [...]

Thanks to [...]r Marios Gavri[...] us to use his segm[...]ive a special thanks to our syst[...]

Training Set (156 images, 78 masses)
(Cancer Volume Number) Case Number
(07)1118 (07)1134 (06)1156 (08)1229 (10)1589 (10)1592 (10)1620 (10)1622* (14)1908
(07)1217 (11)1222 (07)1224 (10)1587 (11)1726 (11)1790 (14)1896 (06)1203 (06)1212
(08)1486 (14)1520 (08)1557 (11)1720 (06)1163 (07)1166 (06)1174 (08)1417 (08)1467
(11)1693 (10)1700 (07)1159 (11)1236 (11)1252 (07)1262 (08)1403 (10)1642 (11)1671

Test Set (156 images, 80 masses)
(Cancer Volume Number) Case Number
(06)1112 (07)1114 (06)1122 (07)1127 (06)1140 (07)1147 (07)1149 (06)1155 (06)1168 (06)1169
(06)1171 (07)1207 (06)1211 (07)1228 (07)1233 (07)1234 (07)1237 (07)1247^ (07)1258 (08)1401
(08)1416 (08)1468 (08)1485* (08)1504 (08)1510 (10)1573* (10)1577 (10)1618 (10)1628 (14)1999
(10)1669 (11)1673 (11)1674 (11)1804 (11)1821 (11)1827 (14)1892 (14)1906 (14)1985

Table 1: This table shows the volume number and case numbers of the images used in this study. All the images were obtained from the DDSM database [13]. (*only CC used; ^only MLO used)

Figure 1: This figure shows the output of the segmentation algorithm proposed by Gavrielides et al. [14], for the MLO and CC views

Figure 2: The major axes (in red) and the centroids (in blue) were identified for images of both the MLO and CC views.


Figure 3: This figure shows the results obtained after aligning and subtracting the left and right images of the MLO and CC views, respectively.

Figure 4: The left and right images of the MLO and CC views are subtracted and the regions of suspicion are those pixels where the pixel intensity is greater than zero. These suspicious regions for each of the four images of a case are depicted in this figure.

Figure 5: In this figure we show a comparison of the detection results of the AFUM algorithm and the results obtained by combining the two methods. We note that the number of false positives is reduced after the combination of the two algorithms. (red * = AFUM detections; blue circled * = AFUM and Subtraction detections; green circle = ground truth)

[Figure 6 plots: Training Set FROC (left) and Testing Set FROC (right). Curves: AFUM and Subtraction; AFUM. Horizontal axis: Avg FP/Image, 0 to 15.]

Figure 6: In this figure we show the results of the AFUM algorithm and the results obtained after combining the AFUM algorithm with the subtraction algorithm. The figure on the left shows the results for the training set and the figure on the right shows the results for the testing set.
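An FROC curve of this kind can be computed by varying the number of points marked per image and recording sensitivity against the average number of false positives per image. The following is a minimal sketch under stated assumptions, not the evaluation code used in the study: each image is represented by a list of booleans ordered by suspicion (True when the point lies inside the ground truth region), there is at most one mass per image, only one point inside the ground truth counts as a true positive, and additional points inside it are ignored. The function name `froc` and the toy data are hypothetical.

```python
def froc(detections_per_image, n_masses, max_points=15):
    """Return a list of (average FP per image, sensitivity) pairs, one for
    each number of marked points k = 1..max_points."""
    n_images = len(detections_per_image)
    curve = []
    for k in range(1, max_points + 1):
        true_pos = 0
        false_pos = 0
        for hits in detections_per_image:
            top_k = hits[:k]
            if any(top_k):
                true_pos += 1  # one TP per image; extra hits are ignored
            false_pos += sum(1 for hit in top_k if not hit)
        curve.append((false_pos / n_images, true_pos / n_masses))
    return curve

# Toy data: three images, two of which contain a mass.
toy = [
    [False, True, True],   # mass found by the second point
    [False, False],        # mass missed
    [],                    # normal image with no surviving points
]
toy_curve = froc(toy, n_masses=2, max_points=3)
```

For the consensus method, the per-image lists would contain only the points that survived the logical "and", so they can be shorter than 15 entries.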


REFERENCES

1. Cancer Facts and Figures 2005. 2005, American Cancer Society: Atlanta.
2. Feuer, E.J., et al., The Lifetime Risk of Developing Breast Cancer. Journal of the National Cancer Institute, 1993. 85(11): p. 892-897.
3. Wun, L., R.M. Merrill, and E.J. Feuer, Estimating Lifetime and Age-Conditional Probabilities of Developing Cancer. Lifetime Data Analysis, 1998. 4: p. 169-186.
4. Kerlikowske, K., et al., Performance of screening mammography among women with and without a first-degree relative with breast cancer. Annals of Internal Medicine, 2000. 133(11): p. 855-63.
5. Kolb, T.M., J. Lichy, and J.H. Newhouse, Comparison of the performance of screening mammography, physical examination, and breast US and evaluation of factors that influence them: an analysis of 27,825 patient evaluations. Radiology, 2002. 225(1): p. 165-75.
6. Bird, R.E., T.W. Wallace, and B.C. Yankaskas, Analysis of cancers missed at screening mammography. Radiology, 1992. 184(3): p. 613-7.
7. Giger, M.L., Computer-aided diagnosis in radiology. Academic Radiology, 2002. 9(1): p. 1-3.
8. Sampat, M.P., M.K. Markey, and A.C. Bovik, Computer-aided detection and diagnosis in mammography, in Handbook of Image and Video Processing, A.C. Bovik, Editor. 2005, Academic Press. p. 1195-1217.
9. Heath, M.D. and K.W. Bowyer. Mass Detection by Relative Image Intensity. in 5th International Workshop on Digital Mammography. 2000. Toronto, Canada.
10. Yin, F.F., et al., Computerized detection of masses in digital mammograms: analysis of bilateral subtraction images. Medical Physics, 1991. 18(5): p. 955-63.
11. Zheng, B., Y.H. Chang, and D. Gur, Computerized detection of masses from digitized mammograms: comparison of single-image segmentation and bilateral-image subtraction. Academic Radiology, 1995. 2(12): p. 1056-61.
12. Mendez, A.J., et al., Computer-aided diagnosis: automatic detection of malignant masses in digitized mammograms. Medical Physics, 1998. 25(6): p. 957-64.
13. Heath, M., et al. The Digital Database for Screening Mammography. in 5th International Workshop on Digital Mammography. 2000. Toronto, Canada.
14. Gavrielides, M.A., J.Y. Lo, and C.E. Floyd, Jr., Parameter optimization of a computer-aided diagnosis scheme for the segmentation of microcalcification clusters in mammograms. Medical Physics, 2002. 29(4): p. 475-83.
