Redundancy, Redundancy, Redundancy: The Three Keys to Highly Robust Anatomical Parsing in Medical Images

Xiang Sean Zhou(a), Zhigang Peng(a), Yiqiang Zhan(a), Maneesh Dewan(a), Bing Jian(a), Arun Krishnan(a), Yimo Tao(a), Martin Harder(b), Stefan Grosskopf(c), Ute Feuerlein(c)

Siemens Healthcare: (a) Malvern, Pennsylvania 19355 / (b) Erlangen, Germany / (c) Forchheim, Germany

[email protected]

ABSTRACT


Although redundancy reduction is the key for visual coding in the mammalian visual system [1, 2], at a higher level the visual understanding step, a central component of intelligence, achieves high robustness by exploiting redundancies in the images to resolve uncertainty, ambiguity, or contradiction [3, 4]. In this paper, an algorithmic framework, Learning Ensembles of Anatomical Patterns (LEAP), is presented for the automatic localization and parsing of human anatomy from medical images. It achieves high robustness by exploiting statistical redundancies at three levels: the anatomical level, the parts-whole level, and the voxel level in the scale space. The recognition-by-parts intuition is formulated in a more principled way as a spatial ensemble, with added redundancy and less parameter tuning for medical imaging applications. Different use cases were tested using 2D and 3D medical images, including X-ray, CT, and MRI images, for purposes such as view identification, organ and body-part localization, and MR imaging plane detection. LEAP is shown to significantly outperform existing methods or its "non-redundant" counterparts.

Categories and Subject Descriptors
H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing; J.3 [Computer Applications]: Life and Medical Sciences—Medical information systems

General Terms
Algorithms, Performance

Keywords
Medical image parsing, computational anatomy

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MIR'10, March 29–31, 2010, Philadelphia, Pennsylvania, USA.
Copyright 2010 ACM 978-1-60558-815-5/10/03 ...$10.00.

1. INTRODUCTION

Semantic content tagging and retrieval of medical images is an active research field with promising applications that can potentially help a wide spectrum of clinical as well as non-clinical users [5, 6]. At least two types of content tags can be derived from an image: one for diseases such as lesions or cancers, and the other for anatomical structures. This paper addresses the latter, i.e., automatic localization and parsing of human anatomy in medical images. Automatic detection and tagging of anatomical structures, even if healthy, could help image matching and retrieval, as well as clinical use cases such as intelligent target localization during CT and MR scans, automation of PACS (Picture Archiving and Communication Systems) workflows, or selective preprocessing for CAD (computer-aided detection and diagnosis) algorithms. Please see Fig. 1–3 for three examples.

In general, the goal is to find, in a 2D or 3D image, a properly positioned, scaled, and/or oriented bounding box of an anatomical structure. Both robustness and accuracy requirements are high. For example, for Use Case II, a bigger-than-necessary box may result in more harm to the patient (due to added radiation) and less voxel resolution. On the other hand, a smaller-than-necessary box means missing information, which may prompt the need for a re-scan, resulting in wasted time and potentially more radiation to the patient.

There have been significant advances in the research field of general-purpose object detection and pattern recognition in images. However, image understanding algorithms that work well with natural images may not work robustly enough with medical images, due to the high requirements of medical use cases [6] and the very nature of medical imaging, where anomaly is the norm. In Fig. 1–3, one can see the strong variability across the cases. The image analysis and tagging algorithm must work robustly on most, if not all, of these cases in order to achieve practical usability within a clinical workflow.

One must wonder, then, how does a human learn and perform these tasks? It is interesting to observe that when a disease or artifact affects an anatomical structure, the remaining portions of the structure, or other anatomical structures, hint at its existence and extent. It is redundancy that the human visual recognition system exploits to achieve high robustness. It is also clear that multiple aspects of redundancy are exploited at the same time by the human recognition system, from inter-organ and intra-organ relations to interactions among local image patterns such as edges or corners, down to pixel-level redundancies [2]. Not all of these aspects are exploited by existing methods, although many use a shape constraint or cross-shape constraints (e.g., [8]). This paper presents an attempt to address and exploit several, if not all, of these sources of redundancy.

Figure 1: Use case I: automatic imaging plane detection in 3D MR scout images. (a) Input: scout scan of a knee; (b) desired output: plane and center aligned for optimal imaging of the menisci; (c)-(f) studies with severe disease, artifacts, and missing data, etc. Rectangles show desired (and actual) alignment results.

Figure 2: Use case II: automatic organ and body parts detection in CT topograms—for example, where is the pelvis? (a)-(f) Studies with various artifacts, occlusions, and implants around the pelvis.

Figure 3: Use case III: radiograph classification, i.e., labeling radiographs as "Chest PA-AP", "Chest Lateral", "Pelvis", "Head", etc. (a)-(d) Pelvis examples (http://imageclef.org/2008/medaat). Notice the strong variability.

Figure 4: Human foveal vision, at any given time point, takes in only local evidence with a blurred peripheral context ([7]; http://www.learning-systems.ch/multimedia/vis_e02.htm).

1.1 Redundancies in Medical Images

Medical images indeed exhibit more statistical redundancies than other types of images. For example, many medical imaging modalities produce direct 3D representations of human anatomy, or 2D projections in canonical views (e.g., in either the PA-AP (posteroanterior/anteroposterior) or the lateral direction). Such a well-defined and well-controlled imaging process enhances structural predictability, or redundancy. More specifically, there are at least three types of redundancies worth exploiting:

1.1.1 Redundancy in part-whole relations

When it comes to visual object recognition and localization, the whole is less than the sum of the parts. In other words, one does not need to see all the trees in order to see the forest: a subset of all the parts is often sufficient to reveal the whole. This is a general rule, but in medical imaging the subset can be even smaller, thanks to well-defined imaging protocols and strong prior knowledge. Humans can make a well-educated guess of the extent of the pelvic bone in most images in Fig. 2 and 3 because sufficient evidence of its parts is present within the image.

1.1.2 Redundancy in anatomical constraints

Kidneys do not exist in a head and neck scan; and in a whole-body scan, the pelvis is always below the lungs in a predictable way, at least in a DICOM world (http://medical.nema.org/). These are just simple examples of a very rich set of anatomical constraints that one could exploit. In general, in addition to the part-whole relations, which are relatively local, there are "long-range" or "distant" relationships or constraints, all the way from head to toe, that provide strong redundancies for anatomical modeling. One could imagine the use of anatomical knowledge representations such as the Foundational Model of Anatomy (FMA) [9] as guidance for image analysis; for more discussion along this line, please refer to [6]. Here we aim to derive simple statistical models based on a set of training data with manual annotations of anatomical landmarks.
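As one concrete possibility, the statistics of such inter-landmark constraints can be estimated directly from the annotated training cases. The following is a minimal numpy sketch under assumptions of ours: the function name and the data layout (an array of shape cases x landmarks x dimensions) are illustrative, not the paper's actual implementation.

```python
import numpy as np

def pairwise_offset_stats(landmark_sets):
    """Estimate mean and covariance of the spatial offset between every
    ordered landmark pair across annotated training cases.
    landmark_sets: array of shape (N_cases, N_landmarks, D)."""
    L = np.asarray(landmark_sets, dtype=float)
    n_landmarks = L.shape[1]
    stats = {}
    for i in range(n_landmarks):
        for j in range(n_landmarks):
            if i == j:
                continue
            offsets = L[:, i, :] - L[:, j, :]   # where is i, as seen from j?
            stats[(i, j)] = (offsets.mean(axis=0),
                             np.cov(offsets, rowvar=False))
    return stats
```

At test time, such pairwise statistics let any detected landmark cast a probabilistic vote on where another landmark should be, which is the raw material for the sparse configurational voting of Section 2.2.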

1.1.3 Redundancy in the scale space

Much research has been done on face or car detection in natural images, where faces or cars can appear against almost any background. Most anatomical structures, in contrast, appear in a predictable context at a coarser scale: the aortic arch, or a smaller structure such as the carina of the trachea (i.e., the bifurcation of the airway), appears almost always roughly in the middle of the two lungs (exceptions can happen, for example, when one lung is collapsed). The inverse is also true: a large organ or structure may distinguish itself only in fine details. A good example is two adjacent vertebrae, thoracic T12 and lumbar L1: one can tell them apart only by a small difference in shape and connectivity to the ribs on the backside. So redundancy not only exists spatially, in part-whole relations or among distant structures; it also exists in the scale space. In general, exploiting redundant evidences across scales can improve the robustness of target localization, if the target has cross-scale support. The good news is that, for human anatomical structures, most (if not all) have cross-scale support (as represented in the space of medical images).

1.2 Exploiting Redundancies for Robustness

Redundancies can be exploited for either efficiency or robustness. In this paper, we focus on the latter, and introduce some ways of exploiting redundancies to achieve human-like, or better-than-human, robustness. The key idea is twofold: to collect as many redundant evidences as possible, and to combine them in a robust way. More specifically, we adopt the following three strategies to exploit the aforementioned redundancies, respectively:

1.2.1 Spatial ensemble through re-alignment

Existing ensemble methods such as bagging predictors [10] make redundant use of the training set by re-sampling with replacement. We also make redundant use of the training images, but in a spatial manner: we reuse all training images in each bag, but each time re-aligned to a different part of the whole target structure. This is similar to human foveal vision, where multiple examinations are performed at many focal points, each with peripheral context (see Fig. 4 and 5). Details are discussed in Section 2.1.

1.2.2 Learned sparse configurational models

To exploit redundancy in anatomical modeling, we use collections of images annotated with different anatomical landmarks throughout the body. The relationships among these landmarks are represented redundantly by a sparse configurational model, in which every pair or triple of landmarks forms a voting group and predicts among themselves. The collection of all groups is an overly redundant representation of the spatial relationships among the landmarks, but this brings extremely high robustness: even if severe diseases affect many landmarks, consensus can still form among the remaining landmarks.

1.2.3 Reducing cross-scale dependencies

To exploit redundancy in the scale space, we implement our detectors at multiple scales, but with the minimal dependencies allowable by the speed requirement. In other words, a coarse-level detector of the carina will "see" the lungs and thus predict the location of the carina, but such a prediction is largely disregarded by finer-level detectors, which re-search for the carina in a wide search space. Such loose dependency introduces added redundancy and performs more robustly when "strange", abnormal cases are encountered.

1.3 Related Work

1.3.1 Bag-of-visual-words and spatial bagging

The use of bag-of-visual-words is common nowadays for scene classification [11]. Notice, however, that the "bag" here is conceptually different from the "bag" in bagging algorithms: here the term signifies that the visual words are mixed without considering spatial origin. In our framework, the foveal evidences correspond to the visual words, and our "bag" corresponds to the training set for one visual word. For scene classification or object detection in natural images, it does not make sense to use a full-context spatial ensemble, because background variability is too great. But for medical images, our spatial ensemble is well suited to take advantage of the redundancy in anatomical context. For the same reason, the spatial bagging method proposed by Vucetic et al. [12], in which an image is broken up into patches that are randomly sampled as bags, is only suitable for textured images, for example, the precision-agriculture data that motivated their idea. As we have emphasized in Sections 1.1 and 1.2, one surely wants to take advantage of the spatial dependencies of foveal evidences or visual words; the throw-'em-in-a-bag-and-shake-it philosophy would therefore be ill-advised for most medical imaging applications.

1.3.2 Object recognition by parts

In the broader research field of object detection and recognition, many methods based on the use of local features have been proposed; the objects of interest were in many cases faces, cars, or people [13–18]. Cristinacce and Cootes [13] combined the boosted detector described by Viola and Jones [19] with a statistical shape model: multiple hypotheses of each local feature were screened using the shape model, and the winning hypothesis was determined for each feature. Agarwal et al. [14] presented an algorithm for detecting the side view of a car in a cluttered background, using a "part-based representation" of the object; the global shape constraint was imposed through learning with the Sparse Network of Winnows architecture. Mohan et al. [17] proposed a full-body pedestrian detection scheme: separate SVM classifiers first detect the body parts, such as heads, arms, and legs, and a second SVM classifier integrating those detected parts then makes the final detection decision. Leibe et al. [18] proposed robust object detection based on a learned codebook of local appearances; to integrate the global shape prior, an implicit shape model was learned to specify the locations where the codebook entries might occur.

Compared to existing methods, including the most recent learning-based detection methods (e.g., [20, 21]), the proposed algorithm emphasizes robustness through redundancy, with much more redundant use of the image content at the part level, both within and across image scales. In addition, an overly redundant set of sparse predictors is used to remove outliers and errors. Our tests showed that such redundancies notably improve robustness, especially for multi-class problems and for less-predictable, challenging input images. The most interesting difference, in our view, is that we have formulated the recognition-by-parts intuition in a more principled way as a spatial ensemble, so that we no longer need to worry about what context to use for each part, or what type of parts or visual words to pick—just use the whole image as context, and use as many foveal evidences as possible (Breiman [10] suggests about 20, and our experiments confirmed this to be reasonable).

2. LEARNING ENSEMBLES OF ANATOMICAL PATTERNS (LEAP)

In this section, we give more details of our algorithms, focusing on the three aspects mentioned in Section 1.2, i.e., how we exploit redundancies to achieve high robustness.

2.1 Spatial Ensemble through Re-Alignment

Given a training set $\mathcal{L} = \{(y_n, x_n),\ n = 1, \ldots, N\}$, where the $y$'s are the class labels and the $x$'s are the inputs, and a learning algorithm that uses this set to form a predictor $\varphi(x, \mathcal{L})$, a bagging predictor can be formulated as below [10]:

$$\varphi_B(x) = \mathrm{av}_B\, \varphi(x, \mathcal{L}^{(B)}), \qquad (1)$$

where the $\mathcal{L}^{(B)}$'s are the bootstrap samples or bags, and $\mathrm{av}_B$ denotes averaging or voting among the predictors. For learning visual patterns, where $x$ is an image (or a volume), the bootstrapping can be done in the spatial domain, with each bag using all the images in the training set, but re-aligned by different parts of the target pattern. Denoting the re-alignment process of the training set as $A_i \sim \mathcal{L}$, with $A_i$ representing the $i$th alignment parameters and $\sim$ the alignment operation, the formulation becomes

$$\varphi_A(x) = \mathrm{vote}_i\, \varphi(x, A_i \sim \mathcal{L}), \qquad (2)$$

with a similar number of re-alignments as the typical number of bags: $|A| \approx |B|$.

One way to re-align the training set is to align the images according to correspondence points, which we call anatomical landmarks, or landmarks for short. Landmarks can be easily identifiable feature points, such as the tip of a bony structure; or they can be "fuzzily" defined landmarks, such as a point on a line/surface or in a texture-less region, for example, the center of the liver. The subsequent learning step is designed to take care of the aperture problem (e.g., anisotropic variability) implicitly, achieving spatial specificity only comparable to that of the annotations. Fig. 5 shows statistical representations of some of the re-aligned bags based on some pelvis landmarks. The top row shows the medians of the training set. The bottom row shows the MADs (median absolute deviations), where

$$\mathrm{MAD} = \mathrm{median}_i\big(\,|x_i - \mathrm{median}_j(x_j)|\,\big). \qquad (3)$$

Robust statistics are used here to avoid the influence of outliers in the training set such as missing data caused by an abnormal FoV (Field of View).
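For concreteness, the robust per-pixel statistics of Fig. 5 can be computed as in the following minimal numpy sketch; the assumed input layout (a stack of images already re-aligned to one landmark) is illustrative.

```python
import numpy as np

def robust_bag_statistics(aligned_stack):
    """Per-pixel median and MAD (Eq. 3) over one re-aligned bag.
    aligned_stack: array of shape (N, H, W), holding the N training
    images after re-alignment to a single landmark."""
    median_img = np.median(aligned_stack, axis=0)
    mad_img = np.median(np.abs(aligned_stack - median_img), axis=0)
    return median_img, mad_img
```

Because both statistics are medians rather than means, a minority of outlier cases (e.g., truncated fields of view) leaves the representation essentially unchanged.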

Figure 5: Robust representations of multiple foveal evidences around the pelvis, based on about 300 CT topograms. Arrows indicate the points of computational foveal fixation achieved by re-aligning the training set. Top row shows medians of the re-aligned training set; bottom row shows median absolute deviations (MADs).

The spatial ensemble makes sense because it resembles the human foveal vision system, which makes repeated examinations of the same scene with different focus points. Fig. 4 illustrates how the human visual system works at any given time point: only a small neighborhood is in focus, and the context is blurred. Repeated examination of multiple focal points in the scene eventually arrives at, and confirms, a consensus recognition. Fig. 6 compares existing ensemble methods with the spatial ensemble.

Aside from the difference in how the training set is re-used, there are additional processing steps for the spatial ensemble at run time, which depend on whether the final goal is detection or segmentation (of either a bounding box or a contour/surface). For each "bag" of a re-aligned training set, a separate foveal detector is trained in the form of a classifier $C_i$ (see Fig. 6). Several existing algorithms can be used for this, including AdaBoost-based algorithms or the random forest/ferns algorithms (e.g., [22, 23]). It is interesting to note that, for matching objects in natural images and video, Lepetit and Fua [23] also used a large set of local evidences to ensure robustness. (The difference is that they could assume a known space of variations and thus synthesize as many training data as needed; this is not possible in our case, since we have neither unlimited training data nor fully anticipated variations.)

Once the foveal evidences are gathered, three questions can be answered: 1. Is the target structure present? 2. Are any of the foveal detections erroneous/outliers? 3. What is the extent of the target structure? The first question is answered by a majority voting scheme similar to that of existing ensemble methods such as bagging; a sketch of this step follows below. The second question is answered through a sparse voting algorithm (see the next section). The third question is answered by using the remaining inlier detections to predict the optimal output [24], which we omit in this paper due to space limitations.
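The presence decision (question 1) reduces to a majority vote over the independent foveal detectors. A minimal sketch follows, assuming each detector returns a confidence in [0, 1]; the threshold value is an illustrative choice, not the paper's.

```python
def target_present(foveal_confidences, detect_threshold=0.5):
    """Question 1: declare the target present when a majority of the
    foveal detectors fire, mirroring the vote in Eq. (2)."""
    votes = sum(conf > detect_threshold for conf in foveal_confidences)
    return votes > len(foveal_confidences) / 2

# e.g., target_present([0.9, 0.8, 0.2, 0.7, 0.6]) -> True:
# one occluded or diseased part cannot overturn the consensus.
```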

2.2 Learned Sparse Configurational Models

While the previous step detects redundant foveal evidences independently, the aim of this module is to model and exploit the dependencies among these local detections, in order to remove possible erroneous detections. Denote the conditional probability of one foveal evidence or landmark location $p_i$, given other landmarks, as $P(p_i \mid \bar{P}_i)$. If all the $p_j$'s in the set $\bar{P}_i$ are correct, this probability can be used to measure the quality of $p_i$. However, since it is not known which of the $p_j$'s are erroneous, we resort to a RANSAC-type strategy: we sample many subsets of landmarks and detect the outliers from the samples.

To address the potential challenge that the majority of a target anatomical pattern may be missing, occluded, or altered by disease or artifacts (see, for example, Fig. 11(c) or Fig. 12), we suggest the use of small or sparse $\bar{P}_i$'s. In other words, a landmark is judged each time by only a small subset of the others. In this paper, we take every pair or triple of landmarks to form a voting group and construct predictors only among themselves. This kind of sparsity, and "democracy" (i.e., aggregation of many small/sparse decisions), has two advantages: a decision can be made even when only a scarce set of evidences is available; and the final decision is robust to a potentially high percentage of erroneous detections, as long as the errors are inconsistent with one another.

The vote received by landmark point $p_i$ is denoted by $\eta(p_i \mid P_v)$, where $P_v$ is a voting group. The vote is defined as $p_i$'s likelihood of being accepted or predicted by $P_v$, based on the conditional distribution estimated using the annotated training set. Assuming Gaussianity with mean $\nu_i$ and covariance $\Sigma$, the vote is simply

$$\eta(p_i \mid P_v) = \frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}}\, e^{-(p_i - \nu_i)^T \Sigma^{-1} (p_i - \nu_i)}, \qquad (4)$$

where $D \in \{2, 3\}$ is the dimensionality of the image. The collection of all groups is an overly redundant representation of the spatial relationships among the parts of an anatomical structure. This brings robustness because, even if severe diseases affect many landmarks, causing wrong or missed detections, consensus can still form among the remaining landmarks. This sparse modeling algorithm is applied not only on or around the target anatomy, but also to distal organs and structures whenever present in the image. This exploits the redundancy in anatomical constraints and can improve robustness further: when accidental findings of pelvic evidence occur in the upper part of the torso, the consensus among the thoracic foveal evidences, and their joint vetoes against such accidental false findings, can quickly remove them.

Figure 6: Spatial ensemble through re-alignment. (a) traditional bagging; (b) BTR: a redundant re-sampling of training images but with different focus points, with each bag reusing all images and all available context.
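As an illustration of this voting scheme, the sketch below accumulates the Eq. (4) votes that each detected landmark receives from every pair of the other landmarks. The `group_predictors` interface, returning a learned conditional mean and covariance for a landmark given a voting group, is a hypothetical stand-in for the models estimated from the annotated training set.

```python
import itertools
import numpy as np

def gaussian_vote(p_i, nu_i, cov):
    """Eq. (4): likelihood of landmark position p_i under the voting
    group's Gaussian prediction (mean nu_i, covariance cov)."""
    d = np.asarray(p_i, dtype=float) - np.asarray(nu_i, dtype=float)
    D = d.size
    norm = (2.0 * np.pi) ** (D / 2.0) * np.sqrt(np.linalg.det(cov))
    return float(np.exp(-d @ np.linalg.solve(cov, d)) / norm)

def landmark_scores(points, group_predictors):
    """Total vote received by each detected landmark; unusually low
    totals flag likely outlier detections.
    group_predictors((j, k), i, points) -> (nu_i, cov) is an assumed
    interface giving the conditional mean/covariance of landmark i
    given the detected positions of landmarks j and k."""
    n = len(points)
    scores = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for group in itertools.combinations(others, 2):  # sparse pairs
            nu_i, cov = group_predictors(group, i, points)
            scores[i] += gaussian_vote(points[i], nu_i, cov)
    return scores
```

Because each group is tiny, an erroneous landmark corrupts only the few groups it belongs to; its own low accumulated score exposes it, while the consistent majority keeps voting each other up.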

2.3 Adding Redundancy in the Scale Space

Typically, a coarse-to-fine strategy is employed for recognition when cross-scale dependencies are high; this can improve the speed of target search in the image exponentially. We also employ a multi-scale approach to landmark detection, but pay special attention to the trade-off between efficiency and robustness. Specifically, we try to minimize the dependency of finer detectors on proposals from the coarser detectors, using a search range as wide as the speed requirements of the application allow. Such a strategy can catch occasional erroneous detections, especially at finer scales.

Ambiguity or contradiction can indeed happen, for example, when a large tumor in the abdomen grows into the shape of the heart. An analogous and interesting example is shown in Fig. 7: in this painting by Swiss artist Sandro Del Prete, one first sees Da Vinci doing a self-portrait, but a closer look, at a finer scale, clearly reveals two mule riders. In the end, is Da Vinci really painting himself? In medical imaging, such mind-twisting ambiguities do happen, but usually they can be resolved by contextual information. This is why redundant examination across scales is important for achieving robustness.
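The loose cross-scale coupling can be sketched as follows; the detector interface and the widening factor are illustrative assumptions, not the paper's actual parameters.

```python
def detect_across_scales(image, detectors, widen=4.0):
    """Coarse-to-fine landmark search with deliberately loose coupling:
    each finer detector re-searches a window much wider than the coarse
    proposal strictly requires, so an erroneous coarse hypothesis can
    still be overruled at the finer scale.

    detectors: list of (nominal_uncertainty, detect_fn) pairs, coarsest
    first, where detect_fn(image, center, radius) -> (center, confidence)
    is an assumed interface searching within `radius` of `center`
    (center=None means search the whole image).
    """
    center, radius = None, None
    for uncertainty, detect in detectors:
        search_radius = None if center is None else radius * widen
        center, confidence = detect(image, center, search_radius)
        radius = uncertainty  # expected localization error at this scale
    return center, confidence
```

Setting `widen` large trades speed for robustness: a tight coupling (small `widen`) would inherit any coarse-scale mistake, whereas the wide re-search lets independent fine-scale evidence override it.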

3. EXPERIMENTS AND RESULTS

We provide here experimental results for the three use cases illustrated in Fig. 1–3.

3.1 Imaging plane detection in knee MRI

MRI (magnetic resonance imaging) machines can take high-resolution images along any orientation to focus on important anatomical structures, such as the meniscus plane in the knee. The problem is how to find the exact location and orientation of the meniscus plane. A manual trial-and-error approach is time-consuming, error-prone, and inconsistent. An automatic solution is to take a 3D scout scan of the knee and find the meniscus plane in the scout scan; with the patient staying still, a high-resolution scan can then be taken to image the meniscus plane.

A total of 744 consecutive cases were collected from a hospital as unseen test data. The proposed algorithm has an extremely low failure rate of 0.4%, compared to 15.2% for a predecessor algorithm based on bone segmentation, which did not exploit any redundancies. The computation times are similar: both algorithms run in a few seconds on the MR scanner.

In Fig. 1, the bounding boxes in (c)-(f) show the robust outcome of the algorithm despite the severe disease, motion artifacts, or abnormal field-of-view, etc. Additional results are shown in Fig. 8. In one case, the menisci are completely missing from the field-of-view, but the algorithm reliably predicted the target location and orientation based on the limited information available—in the same way a human observer would, but perhaps with higher precision and consistency. For most of these challenging cases, the success of the algorithm can be attributed to the extra redundancies incorporated in the algorithm.

Figure 7: Do you see Da Vinci or two mule riders? Visual ambiguity and contradiction arise when this image is observed closely or from afar. See Section 2.3 for more discussion. (Painting by Swiss artist Sandro Del Prete.)

3.2 Scan range detection in CT topograms

On a test data set of 169 topograms, the detection rate varies from 98.2% to 100%, and the false detection rate from 0.0% to 0.5%, for different ROIs. The test was carried out on a Dell Precision 490 workstation (Intel Xeon 1.86 GHz CPU, 2 GB RAM); the typical execution time for the detection of multiple organ ROIs in an input image is around 1 second. We also conducted a stress test and comparison with the well-known active appearance model (AAM [25]) approach for the detection of the brain scan range in lateral topograms: on a total of 198 cases, some quite challenging (see Fig. 11), the failure rates for AAM and our approach are 11.6% and 3%, respectively. Figs. 9 through 12 show results for scan/reconstruction ranges for the pelvis as well as other anatomical regions. As shown on many of these challenging cases, our approach is very robust to missing data and large image variations, succeeding even on cases with 80% to 90% of the target organ out of the image (see, for example, Fig. 11(c)).

3.3 Radiograph classification

For this use case, we evaluated our method on four subtasks: the PA-AP/LAT chest radiograph view identification task, with and without an OTHER class, and the multi-class radiograph classification task, with and without an OTHER class. For the former, we used an in-house database of close to 1500 chest radiographs; for the latter, we used the IRMA/ImageCLEF2008 database (http://imageclef.org/2008/medaat), which contains more than 10,000 radiographs of various body regions. We randomly selected 500 PA-AP, 500 LAT, and 500 OTHER images for training all the foveal evidence detectors, each using about 200-300 cases. These training images were also used for training the configurational model; the remaining images were used as the testing set.

For the chest radiograph view identification, we compared our method with three other methods, described by Boone et al. [26], Lehmann et al. [27], and Kao et al. [28]. For the multi-class radiograph classification task, we compared our method with the bag-of-features method proposed by Deselaers and Ney [29] (denoted PatchBOW+SVM) and the method proposed by Tommasi et al. [30] (denoted SIFTBOW+SVM). For PatchBOW+SVM, we implemented the bag-of-features approach based on randomly cropped image sub-patches; the generated bag-of-features histogram for each image had 2000 bins and was classified using an SVM classifier with a linear kernel. For SIFTBOW+SVM, we implemented the same modified SIFT (modSIFT) descriptor and used the same parameters for extracting bag-of-features as those used by Tommasi et al. [30]; we combined the 32×32 pixel intensity features and the modSIFT bag-of-features into the final feature vector and used an SVM classifier with a linear kernel for classification. We also compared against the benchmark of directly using the 32×32 pixel intensities of the down-sampled image as the feature vector, along with an SVM classifier.

Tables 1 and 2 show the recognition rates of our method alongside the other methods. Our system obtained an almost perfect performance on the PA-AP/LAT separation task; the only failed case was a PA-AP image of a 3-year-old child. Our method also performed best on the other three tasks. Fig. 13 shows classification results along with the detected landmarks for different classes; it again demonstrates that our method is very robust against severe artifacts and diseases.
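For reference, the PatchBOW+SVM baseline can be sketched roughly as below, assuming scikit-learn; the patch size, patch count, and codebook training details are illustrative assumptions rather than the exact settings of [29].

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import LinearSVC

def random_patches(image, n_patches=100, size=16, rng=None):
    """Randomly crop square sub-patches from a 2D grayscale image."""
    rng = rng or np.random.default_rng(0)
    h, w = image.shape
    ys = rng.integers(0, h - size, n_patches)
    xs = rng.integers(0, w - size, n_patches)
    return np.stack([image[y:y + size, x:x + size].ravel()
                     for y, x in zip(ys, xs)])

def bow_histogram(image, codebook):
    """Quantize an image's patches into a 2000-bin bag-of-features."""
    words = codebook.predict(random_patches(image).astype(np.float64))
    return np.bincount(words, minlength=codebook.n_clusters)

def train_patchbow(train_images, train_labels):
    """Codebook from pooled random patches, then a linear-kernel SVM."""
    pooled = np.vstack([random_patches(img) for img in train_images])
    codebook = MiniBatchKMeans(n_clusters=2000, n_init=3).fit(
        pooled.astype(np.float64))
    features = np.stack([bow_histogram(img, codebook)
                         for img in train_images])
    return codebook, LinearSVC().fit(features, train_labels)
```

Note that this baseline, by construction, discards the spatial origin of each patch, which is precisely the information our spatial ensemble retains.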

4. DISCUSSION AND CONCLUSIONS

The key to robust visual recognition, whether by a human or by a machine, is to exploit the redundancies among multiple visual cues. For the purpose of parsing anatomical structures in medical images, this paper presents ways to exploit redundancies across image scales, in part-whole relations, and in human anatomy. We showed that such redundancies help the anatomical parsing algorithm reach very high robustness and success rates, approaching or exceeding human performance. The proposed algorithm and its high robustness are validated on three real-world use cases involving CT, MRI, and radiographic images. The same methodology can be applied to other applications.

Figure 8: Use case I results on four more cases with severe imaging artifacts, missing data due to poor positioning, or excessive fat.

Table 1: PA-AP/LAT/OTHER performance.
Method                              PA-AP/LAT   PA-AP/LAT/OTHER
Our method                          -           98.81%
Our method w/o FP reduction         99.98%      98.47%
Lehmann's method [27]               99.04%      96.18%
Boone's method [26]                 98.24%      -
Improved Projection Profile [28]    97.60%      -

Table 2: Multi-class performance.
Method                              Multi-class w/o OTHER   Multi-class w/ OTHER
Our method                          99.33%                  98.81%
Subimage pixel intensity + SVM      97.33%                  89.00%
PatchBOW + SVM [29]                 96.89%                  94.71%
SIFTBOW + SVM [30]                  98.89%                  95.86%

5. REFERENCES

[1] H. Barlow, "Sensory mechanisms, the reduction of redundancy, and intelligence," The Mechanisation of Thought Processes, pp. 535–539, 1959.
[2] D. Field, "Relations between the statistics of natural images and the response properties of cortical cells," Journal of the Optical Society of America A, vol. 4, no. 12, pp. 2379–2394, 1987.
[3] N. Sutherland, "Outlines of a theory of visual pattern recognition in animals and man," Proc. Royal Society of London, Series B, Biological Sciences, vol. 171, no. 1024, pp. 297–317, 1968.
[4] I. Biederman, "Recognition-by-components: A theory of human image understanding," Psychological Review, vol. 94, no. 2, pp. 115–147, 1987.
[5] H. Müller, N. Michoux, D. Bandon, and A. Geissbuhler, "A review of content-based image retrieval systems in medical applications - clinical benefits and future directions," Int'l J. of Medical Informatics, vol. 73, no. 1, pp. 1–23, 2004.
[6] X. Zhou, S. Zillner, M. Moeller, M. Sintek, Y. Zhan, A. Krishnan, and A. Gupta, "Semantics and CBIR: a medical imaging perspective," in Proc. Int'l Conf. Content-based Image and Video Retrieval, 2008, pp. 571–580.
[7] H. Hunziker, Im Auge des Lesers - foveale und periphere Wahrnehmung: vom Buchstabieren zur Lesefreude (The eye of the reader: foveal and peripheral perception - from letter recognition to the joy of reading), Transmedia, Zurich, 2006.
[8] J. Yang and J. Duncan, "Joint prior models of neighboring objects for 3D image segmentation," in Proc. IEEE Conf. on Computer Vision and Pattern Recog., Washington, DC, vol. 1, 2004.
[9] N. F. Noy and D. L. Rubin, "Translating the Foundational Model of Anatomy into OWL," Stanford Medical Informatics Technical Report, 2007.
[10] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, pp. 123–140, 1996.
[11] J. Yang, Y. Jiang, A. Hauptmann, and C. Ngo, "Evaluating bag-of-visual-words representations in scene classification," in Proc. Int'l Workshop on Multimedia Information Retrieval, 2007, p. 206.
[12] S. Vucetic, T. Fiez, and Z. Obradovic, "A data partitioning scheme for spatial regression," in Proc. IEEE/INNS Int'l Joint Conf. Neural Networks, 1999.
[13] D. Cristinacce and T. Cootes, "Facial feature detection using AdaBoost with shape constraints," in British Machine Vision Conference, 2003, pp. 231–240.
[14] S. Agarwal, A. Awan, and D. Roth, "Learning to detect objects in images via a sparse, part-based representation," IEEE Trans. Pattern Anal. Machine Intell., vol. 26, no. 11, pp. 1475–1490, 2004.

Figure 9: Use case II results for the cases shown in Fig. 2.

Figure 10: Use case II results on additional cases for the pelvis range. (Multiple choices, colored, are offered to satisfy different user requirements or preferences.)

[15] K. Yow and R. Cipolla, "Feature-based human face detection," Image and Vision Computing, vol. 15, no. 9, pp. 713–735, 1997.
[16] T. K. Leung, M. C. Burl, and P. Perona, "Finding faces in cluttered scenes using random labeled graph matching," in Proc. Int'l Conf. on Computer Vision, Cambridge, MA, 1995, pp. 637–644.
[17] A. Mohan, C. Papageorgiou, and T. Poggio, "Example-based object detection in images by components," IEEE Trans. Pattern Anal. Machine Intell., vol. 23, no. 4, pp. 349–361, 2001.
[18] B. Leibe, A. Leonardis, and B. Schiele, "Robust object detection with interleaved categorization and segmentation," Int'l J. of Computer Vision, vol. 77, no. 1, pp. 259–289, 2008.
[19] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proc. IEEE Conf. on Computer Vision and Pattern Recog., Hawaii, vol. 1, 2001, pp. 511–518.
[20] B. Georgescu, X. S. Zhou, D. Comaniciu, and A. Gupta, "Database-guided segmentation of anatomical structures with complex appearance," in Proc. IEEE Conf. on Computer Vision and Pattern Recog., San Diego, CA, 2005, pp. II:429–436.
[21] Y. Zheng, X. Lu, B. Georgescu, A. Littmann, E. Mueller, and D. Comaniciu, "Robust object detection using marginal space learning and ranking-based multi-detector aggregation," in Proc. IEEE Conf. on Computer Vision and Pattern Recog., Miami, FL, 2009.
[22] M. Ozuysal, P. Fua, and V. Lepetit, "Fast keypoint recognition in ten lines of code," in Proc. IEEE Conf. on Computer Vision and Pattern Recog., Minneapolis, MN, vol. 1, 2007, pp. 1–8.

[23] V. Lepetit and P. Fua, "Keypoint recognition using randomized trees," IEEE Trans. Pattern Anal. Machine Intell., vol. 28, no. 9, p. 1465, 2006.
[24] Z. Peng, Y. Zhan, X. S. Zhou, and A. Krishnan, "Robust anatomy detection from CT topograms," in Proc. SPIE Medical Imaging, vol. 7620, 2009, pp. 1–8.
[25] T. Cootes, G. Edwards, C. Taylor, et al., "Active appearance models," IEEE Trans. Pattern Anal. Machine Intell., vol. 23, no. 6, pp. 681–685, 2001.
[26] J. M. Boone, G. S. Hurlock, J. A. Seibert, and R. L. Kennedy, "Automated recognition of lateral from PA chest radiographs: saving seconds in a PACS environment," Journal of Digital Imaging, vol. 16, no. 4, pp. 345–349, 2003.
[27] T. M. Lehmann, O. Güld, D. Keysers, H. Schubert, M. Kohnen, and B. B. Wein, "Determining the view of chest radiographs," Journal of Digital Imaging, vol. 16, no. 3, pp. 280–291, 2003.
[28] E. Kao, C. Lee, T. Jaw, J. Hsu, and G. Liu, "Projection profile analysis for identifying different views of chest radiographs," Academic Radiology, vol. 13, no. 4, pp. 518–525, 2006.
[29] T. Deselaers and H. Ney, "Deformations, patches, and discriminative models for automatic annotation of medical radiographs," Pattern Recog. Letters, vol. 29, no. 15, pp. 2003–2010, 2008.
[30] T. Tommasi, F. Orabona, and B. Caputo, "Discriminative cue integration for medical image annotation," Pattern Recog. Letters, vol. 29, no. 15, pp. 1996–2002, 2008.

Figure 11: Use case II results for head scan ranges—The lower edge should go through the canthomeatal line. (e) shows a case of strong patient motion, which the algorithm handled robustly.

Figure 12: Use case II results for other anatomical regions: thorax, heart, abdomen.

Figure 13: Use case III: Examples of the detected landmarks on different images.
