

Visualization of Large Collections of Medical Images: State of the Art

Jorge Camargo

March 20, 2009

Abstract

Nowadays a huge amount of medical information is available as large collections of digital images, which contain relevant information that is useful for health professionals. Traditional information retrieval systems are not enough to provide good tools that help physicians in the diagnosis process. The information visualization area focuses on studying how to display large collections of information to users in an intuitive way. In this paper, we describe the state of the art of visualization of large collections of images so that it can be useful for addressing visualization in the medical context.

Keywords: Information visualization, images, projection methods, MDS, ISOMAP

Contents

1 Introduction
2 Problem Definition
  2.1 Motivation
  2.2 Problem Definition
3 Background and Related Work
  3.1 Background
    3.1.1 Dimensionality Reduction (Projection Methods)
    3.1.2 Nonlinear Dimensionality Reduction
    3.1.3 Performance Measures
  3.2 Related Work
    3.2.1 Visualization
    3.2.2 Summarization
    3.2.3 Exploration
    3.2.4 Indexing
  3.3 Information Visualization in Medical Area
4 Development Perspectives
  4.1 Main Working Areas
    4.1.1 Medical information retrieval
    4.1.2 Visual Data Mining
    4.1.3 Interactive Image Collection Navigation
  4.2 Open Problems
5 Conclusions

1 Introduction

The amount of visual and multimedia data is growing exponentially thanks to the development of the Internet and to the ease of producing and publishing multimedia data. This generates two main problems: how to find the information needed efficiently and effectively, and how to extract knowledge from the data. The problem has been mainly addressed from the Information Retrieval (IR) perspective, and this approach has been very useful when dealing with textual data [4]. However, there is still a huge amount of work to do on other kinds of non-textual data, such as images. Information visualization techniques [18] are an interesting alternative to address the problem in the case of large collections of images. Information visualization techniques offer ways to reveal hidden information (complex relationships) in a visual representation and allow users to seek information in a more efficient way [22]. Thanks to the human visual capacity for learning and identifying patterns, visualization is a good alternative to deal with this kind of problem. However, visualization itself is a hard problem; one of the main challenges is how to find low-dimensional, simple representations that faithfully represent the complete dataset and the relationships among data objects [14]. The majority of existing content-based image retrieval (CBIR) approaches use a 2D grid layout for visualizing results. Figure 1 shows a screenshot of the result for a query in Google Images.

Figure 1: Typical grid layout for visualizing the result obtained in a query using Google Images

The rest of this article is organized as follows: Section II presents the problem definition; background and related work are presented in Section III; in Section IV, development perspectives are presented; we conclude the article in Section V.

2 Problem Definition

Before reviewing visualization research, we first introduce the problem and motivate it with some issues in the medical context.

2.1 Motivation

In the medical field, many digital images (x-ray, ultrasound, tomography, etc.) are produced for diagnosis and therapy. The Radiology Department of the University Hospital of Geneva generated more than 12,000 images per day in 2002, which requires terabytes of storage per year [12]. Visualization tools are necessary in health centers to assist diagnosis tasks effectively and efficiently. For instance, a medical doctor may have a diagnostic image and want to find similar images associated with other cases that help to assess the current case. Previously, the doctor would need to sequentially traverse the image database looking for similar images, a process that could be infeasible even for moderately large databases. Nowadays, image visualization techniques provide a good alternative by generating compact representations of the collection, which are easier to navigate and allow the user to quickly find the information needed. The use of projection methods based only on low-level features is a common strategy in the visualization of image collections, but there is a huge semantic gap in the resulting visualization since domain knowledge is not taken into account.

2.2 Problem Definition

The general problem of visualization can be divided into more specific problems: projection method, summarization, exploration, and indexing. The projection method is concerned with how to reduce the high dimensionality of the original images into a low-dimensional space that preserves the original structure. Summarization is concerned with how to show the user a portion of the collection that summarizes the entire dataset. Exploration focuses on how to provide the user with an intuitive way of exploring the visualization. Indexing is concerned with how to organize the information extracted from images in an efficient way and how to use it in the visualization and summarization phases.

3 Background and Related Work

In this section we describe the background of visualization of large collections of images and comment on the main works in this area. Figure 3 shows a mental map of the main concepts involved in the visualization of large collections of images.

3.1 Background

3.1.1 Dimensionality Reduction (Projection Methods)

There are different methods to reduce the dimensionality of a set of data points. Generally these methods select the dimensions that best preserve the original information. Methods like Multidimensional Scaling (MDS) [20], Principal Component Analysis (PCA) [8], and Isometric Feature Mapping (Isomap) [19] have been useful for this projection task. MDS is a technique that focuses on finding the subspace that best preserves the interpoint distances, and it uses a linear algebra solution to the problem. The process involves the calculation of the eigenvalues and eigenvectors of a scalar product matrix and a proximity matrix. The input is a similarity matrix of images in a high-dimensional space and the result is a set of coordinates that represent the images in a low-dimensional space [22]. Figure 2 shows a visualization of the Corel dataset using this method.

Figure 2: Visualization of Corel dataset using MDS method

PCA is also an eigenvector method, designed to model linear variabilities in high-dimensional data. The method computes the linear projections of greatest variance from the top eigenvectors of the data covariance matrix. In classical MDS, the low-dimensional embedding is computed such that it best preserves pairwise distances among objects. If these distances correspond to Euclidean distances, the results of metric MDS are equivalent to PCA [16].

GPCA.

ISOMAP uses graph-based distance computation in order to measure the distance along local structures. The technique builds the neighborhood graph using k-nearest neighbors, then uses Dijkstra's algorithm to find the shortest paths between every pair of points in the graph. The distance for each pair is then set to the length of this shortest path and, finally, once the distances have been recomputed, MDS is applied to the new distance matrix [14]. In addition to ISOMAP, which preserves the non-linear structure of the objects, there exist other methods such as Locally Linear Embedding (LLE) [16], an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional data. Stochastic Neighbor Embedding (SNE) [6] is a method based on the computation of neighborhood probabilities, assuming a Gaussian distribution, in both the high-dimensional and the 2D space. The method then tries to match the two probability distributions. Finally, in [14] a combination of non-linear methods is proposed to build new methods.

3.1.2 Nonlinear Dimensionality Reduction

On the other hand, some works focus on nonlinear dimensionality reduction problems. In [21], the authors address the problem of learning a kernel matrix for nonlinear dimensionality reduction; they try to answer the question of how to choose a kernel that best preserves the original structure. In [3], the authors proposed kernel Isomap, a modification of the original Isomap method inspired by kernel PCA, in which they address generalization and topological stability problems found in the original Isomap method. Finally, it is important to highlight that, to our knowledge, the problem of medical image collection visualization has not been previously addressed by the information visualization community.
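The Isomap pipeline described in Section 3.1.1 (k-nearest-neighbor graph, Dijkstra shortest paths, then MDS on the resulting distances) can be illustrated with a small, dependency-free sketch. This is not the implementation used in the cited works: the function names (`knn_graph`, `dijkstra`, `classical_mds`, `isomap`) are hypothetical, power iteration with deflation is swapped in for a proper eigensolver, and the neighborhood graph is assumed to be connected.

```python
import heapq
import math
import random


def pairwise_distances(points):
    """Euclidean distance matrix of a list of coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]


def knn_graph(dist, k):
    """Symmetric k-nearest-neighbor graph as a list of {neighbor: weight} dicts."""
    n = len(dist)
    graph = [dict() for _ in range(n)]
    for i in range(n):
        # Index 0 of the sorted order is the point itself (distance 0), so skip it.
        for j in sorted(range(n), key=lambda j: dist[i][j])[1:k + 1]:
            graph[i][j] = dist[i][j]
            graph[j][i] = dist[i][j]
    return graph


def dijkstra(graph, source):
    """Shortest-path lengths from source to every node of the graph."""
    best = [math.inf] * len(graph)
    best[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < best[v]:
                best[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return best


def top_eigenpairs(matrix, count, iterations=500, seed=0):
    """Leading eigenpairs of a symmetric matrix via power iteration and deflation."""
    rng = random.Random(seed)
    n = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(count):
        vec = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:
                break
            vec = [x / norm for x in nxt]
        value = sum(vec[i] * work[i][j] * vec[j] for i in range(n) for j in range(n))
        pairs.append((value, vec))
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


def classical_mds(dist, dims=2):
    """Classical MDS: eigen-decompose the double-centered squared distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for axis, (value, vec) in enumerate(top_eigenpairs(b, dims)):
        scale = math.sqrt(max(value, 0.0))  # negative eigenvalues are clipped
        for i in range(n):
            coords[i][axis] = scale * vec[i]
    return coords


def isomap(points, k=2, dims=2):
    """Isomap sketch: kNN graph -> geodesic distances -> classical MDS."""
    graph = knn_graph(pairwise_distances(points), k)
    geodesic = [dijkstra(graph, s) for s in range(len(points))]
    return classical_mds(geodesic, dims)
```

As a usage example, calling `isomap([(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)], k=2)` on an L-shaped chain of points yields an embedding in which the two end points lie about 4 units apart (the path length along the chain) rather than their straight-line distance of roughly 2.83, which is the sense in which Isomap measures distance along local structures.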

3.1.3 Performance Measures

Visualization methods require measures that indicate how good a visualization is. Both formal and informal measures have been used in related works, and they are useful for future work.

Kruskal stress measure [9]. It is a formal and well-known measure used in MDS that expresses the difference between the distances d in the original space and the Euclidean distances D in the projected space for all pairs of images. A small stress value indicates that the original distances have been preserved in the projected space. This measure has been used in [17, 7]. Stress is calculated using equation 1.

\[ \mathrm{Stress} = \frac{\sum_{i,j} (d_{i,j} - D_{i,j})^2}{\sum_{i,j} D_{i,j}^2} \qquad (1) \]
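As an illustration of the stress measure, the following minimal sketch computes it from two precomputed distance matrices. It assumes the squared-difference form normalized by the sum of squared projected distances; Kruskal's stress-1 is usually reported as the square root of such a ratio. The function name `kruskal_stress` and its arguments are hypothetical.

```python
def kruskal_stress(original, projected):
    """Normalized stress between two symmetric distance matrices.

    original[i][j]  -- distance d between images i and j in the original space
    projected[i][j] -- Euclidean distance D between them in the projected space
    Assumes the form sum((d - D)^2) / sum(D^2), taken over all pairs i < j.
    """
    n = len(original)
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            diff = original[i][j] - projected[i][j]
            numerator += diff * diff
            denominator += projected[i][j] ** 2
    return numerator / denominator


# Two images whose original distance 2 is shrunk to 1 by the projection.
d = [[0.0, 2.0], [2.0, 0.0]]
D = [[0.0, 1.0], [1.0, 0.0]]
print(kruskal_stress(d, D))  # prints 1.0
print(kruskal_stress(d, d))  # prints 0.0 (distances perfectly preserved)
```

Because both matrices are symmetric, summing over pairs with i < j gives the same ratio as summing over all i, j.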

Kullback-Leibler measure [10]. It is a formal measure that calculates the difference between the probability distributions of the original and projected spaces. In [14], the authors try to match the two probability distributions to find the optimal projected positions by minimizing a cost function; they use this cost optimization in order to preserve the structure. The cost is calculated using equation 2, where P_{i,j} and Q_{i,j} denote the neighborhood probabilities of images i and j in the original and projected spaces, respectively. The lower this cost, the better the projection has preserved the relations between neighbors.

\[ C_s = \sum_{i} \sum_{j} P_{i,j} \log \frac{P_{i,j}}{Q_{i,j}} \qquad (2) \]

Hubert statistic measure [5]. It is a formal measure commonly used in clustering and used in [14] for measuring the quality of summarization. The authors use this cost optimization in order to fit the overview. The statistic ranges from 0 to 1, where the higher the value, the better the clustering. The overview cost function is expressed in equation 3.

\[ C_o = 1 - \frac{r - M_p M_c}{\sigma_p \sigma_c} \qquad (3) \]

where

\[ r = \frac{1}{M} \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} D_{i,j}\, d(m_i, m_j), \]
\[ M_p = \frac{1}{M} \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} D_{i,j}, \qquad M_c = \frac{1}{M} \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} d(m_i, m_j), \]
\[ \sigma_p^2 = \frac{1}{M} \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} D_{i,j}^2 - M_p^2, \qquad \sigma_c^2 = \frac{1}{M} \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} d^2(m_i, m_j) - M_c^2, \]
\[ M = k(k-1)/2, \]

where m_i is the center of the cluster containing image i and d(m_i, m_j) is the distance between two cluster centers.

Search time measure [15]. It is an informal measure used in the experimental phase with real users, in specific search tasks, in order to determine the time taken by users to find a target image.

Search efficiency measure [15]. It is an informal measure used in the experimental phase with real users, in specific search tasks, in order to determine the ratio between the percentage of correct images selected and the browsing duration. In [15], two plots are reported: search efficiency per method and search time.

3.2 Related Work

In this subsection, we review existing works that have addressed the visualization, summarization, exploration and indexing issues. We also review works that have addressed the information visualization problem in the medical context.

3.2.1 Visualization

In [14, 13], projection methods such as MDS, PCA, Isomap, Locally Linear Embedding (LLE) [16] and combinations of them are used in experiments. In these works the main concerns are overview, visibility and structure preservation. The optimization of the limited visualization space is addressed in [7]. This optimization is approached from a computational complexity perspective and makes use of unpopulated presentation areas. In [17], a modification of the MDS method is proposed that solves the overlapping and occlusion problems. The unused visualization space is used for placing images that are occluded or overlapping. A regular grid structure is used to relocate images. Chen [2] uses a pathfinder network scaling technique for visualization, and the images are related using a similarity measure based on color, layout and texture. That work uses the GSA framework [1], which provides a pathfinder implementation for experimentation. In [11], a browsing strategy is proposed that uses a one-page overview and a task-driven attention model in order to optimize the visualization space. Users can interact with the overview through a slider bar that adjusts the image overlap. Porta [15] developed different non-conventional methods for visualizing and exploring large collections of images, such as cube, snow, snake, volcano, funnel and others.

3.2.2 Summarization

3.2.3 Exploration

3.2.4 Indexing

3.3 Information Visualization in Medical Area

In the literature it is not easy to find works focused on medical datasets, and it is even more difficult to find researchers working on visualization for the medical domain. Normally, general-purpose datasets like ALOI, Corel, and TRECVID are used.

4 Development Perspectives

4.1 Main Working Areas

4.1.1 Medical information retrieval

Paper retrieval by visual content, diagnostic image collection exploration.

4.1.2 Visual Data Mining

Pattern finding in medical image collections.

4.1.3 Interactive Image Collection Navigation

Dynamic visualization based on online user relevance feedback.

The majority of works published in the area of image collection visualization use datasets like Corel, which contains general-purpose images, but it is not easy to find researchers working on visualization for Content-Based Medical Image Retrieval (CBMIR).

4.2 Open Problems

Medical image collection visualization is an unexplored area that offers interesting and challenging problems. Figure 4 presents a classification of research questions that currently remain unanswered in the visualization of large collections of medical images.

5 Conclusions

Medical image collection visualization is an unexplored area that offers interesting and challenging problems. First of all, a huge amount of medical images is produced routinely in health centers, which demands effective and efficient techniques for searching, exploration and retrieval. Second, these images have a good amount of semantic, domain-specific content that has to be modeled in order to build effective medical decision support systems. Information visualization methods coupled with machine learning techniques may provide meaningful representations of medical image collections.

Figure 3: Mental map of visualization of large collections of images

Figure 4: Research questions in visualization of large collections of medical images

References

[1] C. Chen. Generalised similarity analysis and pathfinder network scaling. Interacting with Computers, 10(2):107-128, May 1998.

[2] Chaomei Chen, George Gagaudakis, and Paul Rosin. Similarity-based image browsing. 2000.

[3] Heeyoul Choi and Seungjin Choi. Robust kernel isomap. Pattern Recognition, 40(3):853-862, March 2007.

[4] A. Del Bimbo. A perspective view on visual information retrieval systems. Content-Based Access of Image and Video Libraries, 1998. Proceedings. IEEE Workshop on, pages 108-109, Jun 1998.

[5] R. C. Dubes. How many clusters are best? - an experiment. Pattern Recognition, 20(6):645-663, 1987.

[6] Geoffrey Hinton and Sam Roweis. Stochastic neighbor embedding. In Advances in Neural Information Processing Systems 15. MIT Press, 2003.

[7] T. Janjusevic and E. Izquierdo. Layout methods for intuitive partitioning of visualization space. Information Visualisation, 2008. IV '08. 12th International Conference, pages 88-93, July 2008.

[8] I. T. Jolliffe. Principal component analysis. Springer-Verlag, 1989.

[9] J. B. Kruskal and M. Wish. Multidimensional scaling. Sage Publications, 1978.

[10] S. Kullback and R. A. Leibler. On information and sufficiency. Annals of Mathematical Statistics, 22:79-86, 1951.

[11] Bing Liu, Wei Wang, Jiangjiao Duan, Zhihui Wang, and Baile Shi. Subsequence similarity search under time shifting. In Information and Communication Technologies, 2006. ICTTA '06. 2nd, volume 2, pages 2935-2940, 2006.

[12] Henning Muller, Nicolas Michoux, David Bandon, and Antoine Geissbuhler. A review of content-based image retrieval systems in medical applications - clinical benefits and future directions. International Journal of Medical Informatics, 73(1):1-23, February 2004.

[13] G. P. Nguyen and M. Worring. Similarity based visualization of image collections. In Seventh International Workshop of the EU Network of Excellence DELOS on AUDIO-VISUAL CONTENT AND INFORMATION VISUALIZATION IN DIGITAL LIBRARIES (AVIVDiLib'05), 2005.

[14] G. P. Nguyen and M. Worring. Interactive access to large image collections using similarity-based visualization. Journal of Visual Languages & Computing, 19(2):203-224, April 2008.

[15] Marco Porta. Browsing large collections of images through unconventional visualization techniques. In AVI '06: Proceedings of the working conference on Advanced visual interfaces, pages 440-444, New York, NY, USA, 2006. ACM Press.

[16] L. Saul and S. Roweis. Nonlinear dimensionality reduction by locally linear embedding. Technical report, 2000.

[17] Gerald Schaefer and Simon Ruszala. Image database navigation on a hierarchical mds grid. http://dx.doi.org/10.1007/11861898_31, 2006.

[18] Jock D. Mackinlay, Stuart K. Card, and Ben Shneiderman. Readings in Information Visualization: Using Vision to Think. Morgan Kaufmann Publishers, 1999.

[19] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319-2323, 2000.

[20] W. S. Torgerson. Multidimensional scaling: I. theory and method. Psychometrika, 17(4):401-419, 1952.

[21] Kilian Q. Weinberger, Fei Sha, and Lawrence K. Saul. Learning a kernel matrix for nonlinear dimensionality reduction. In ICML '04: Proceedings of the twenty-first international conference on Machine learning, page 106, New York, NY, USA, 2004. ACM.

[22] Jin Zhang. Visualization for Information Retrieval. Springer, 2008.
