Automated Segmentation of Drosophila RNAi ...

Viewer
Transcript

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 53, NO. 11, NOVEMBER 2006

2415

Automated Segmentation of Drosophila RNAi Fluorescence Cellular Images Using Deformable Models Guanglei Xiong, Xiaobo Zhou, and Liang Ji

Abstract—Image-based high-throughput genome-wide RNA interference (RNAi) experiments are increasingly carried out to facilitate the understanding of gene functions in intricate biological processes. Robust automated segmentation of the large volumes of output images generated from image-based screening is much needed for data analyses. In this paper, we propose a new automated segmentation technique to fill the void. The technique consists of two steps: nuclei and cytoplasm segmentation. In the former step, nuclei are extracted, labeled, and used as starting points for the latter step. A new force obtained from rough segmentation is introduced into the classical level set curve evolution to improve the performance for odd shapes, such as spiky or ruffly cells. A scheme of preventing curve intersection is proposed to treat the difficulty of segmenting touching cells. Synthetic images are generated to test the capabilities of our approach. Then, we apply it to three types of Drosophila cells in RNAi fluorescence images. In all cases, accuracy of greater than 92% is obtained. Index Terms—Deformable model, fluorescence microscopy, fuzzy c-means (FCM), geodesic active contour, geometric active contour, image segmentation, level set, RNA interference (RNAi).

I. INTRODUCTION

I

N RECENT years, plenty of efforts have been focusing on the understanding of functions of genes in various biological phenomena. A gene’s function can be determined by inspecting changes in a biological process caused by its absence. Specific and reproducible loss-of-function phenotypes can be mimicked by addition of gene-specific double stranded RNA (dsRNA), which causes reduction or elimination of target gene expression by a process known as RNA interference (RNAi). The recent discovery and application of RNAi has revolutionized the functional analysis of genes and made high-throughput functional genetics a reality [1], [2]. Drosophila, a long-favored model organism for genetic studies, is emerging as a premier cell-based system for such systematic functional genetic analyses.

Manuscript received September 27, 2005. This work was supported in part by the Ministry of Science and Technology of China under Contract 2003CB517106. The work of X. Zhou was supported by the HCNR Center for Bioinformatics Research Program and the National Institutes for Health under Grant R01 LM008696. This paper was recommended by Guest Editor P.-C. Chung. G. Xiong and L. Ji are with Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing 100084, China (e-mail: [email protected]; [email protected]). X. Zhou is with Center for Bioinformatics, Harvard Center for Neurodegeneration and Repair, Harvard Medical School, Boston, MA 02115 USA, and also with the Functional and Molecular Imaging Center, Department of Radiology, Brigham and Women’s Hospital, Boston, MA 02215 USA (e-mail: [email protected]). Digital Object Identifier 10.1109/TCSI.2006.884461

Rac, Rho, and Cdc42 are members of the Rho family of small GTPases that play an important role in cellular signaling. They are involved in regulation of a variety of essential cellular processes, such as actin reorganization, endocytosis, cell migration, cell adhesion, cell polarity, and cell cycle. A number of effector molecules interact with them to relay or implement downstream responses. With the development of Drosophila RNAi technology to systematically disrupt gene expression, we are able to screen the entire genome for specific cellular functions. The present study intends to discover novel Rho protein effectors by image-based high-throughput RNAi screening. A dsRNA that leads to the loss of Rho-induced cytoskeletal structures will identify putative effectors. Although genome-wide assays are now routinely performed, manually interpreting such large-scale image datasets are timeconsuming and tedious. In addition, such a procedure is susceptible to observer bias, thus will not lead to reproducible and comparable results. A fully automated image analysis system is thus an urgent need. We present here a novel integrated and automated segmentation technique that aims at the task of analyzing the images produced by Drosophila RNAi experiments. It greatly reduces human labor and facilitates further quantitative analysis. Several automated methods on nuclei and cytoplasm segmentation have been proposed in recent years. Previous work can be mainly classified into two categories. The first approach is based on “region-growing” in which starting regions grow by connecting neighboring pixels satisfying some similarity measure. In [3], region growing is designed with local color features and global shape criteria to allow self-adaptation to difficulties of cell segmentation. In [4], regions are grown from manually or automatically marked small regions, known as seeds. Another kind of popular region growing method is the so-called “watershed” algorithm. It treats the intensity of an image as a topological surface. Watersheds, or crest lines, are built when the water rising from two minimums, i.e., catchment basins on the surface meet. All pixels associated with the same catchment basin are assigned to the same label. To overcome the problem of oversegmentation, seeded watersheds are often used in nuclei and cytoplasm segmentation [5]. In [6]–[8], Watershed algorithm and morphological reconstruction are employed to divide clusters of nuclei into smaller ones. In [9], a novel watershed algorithm that incorporates an improved distance transform and an explicit mathematical model for nuclei structure is proposed. Oversegmentation is eliminated by a statistical model-based merging mechanism. In [10], seeds are generated by the morphological filtering and gradient magnitude of the original image. Shape

1057-7122/$20.00 © 2006 IEEE

2416

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 53, NO. 11, NOVEMBER 2006

information is finally used to split and merge to obtain good results. The second approach is based on deformable models in which contours or surfaces evolve according to internal and external forces till they converge on nuclei or cytoplasm boundaries of interest. “Active contour” or “snake” is adopted in the first place. In [11], boundaries of axons are extracted by snakes beginning with a rough identification of their centers using an elliptical Hough transform. In [12], Voronoi diagram is employed to construct initial shapes of snakes, and then fiber boundaries are marked by snakes guided by different external forces in several stages. In recent years, deformable models based on level set have become increasingly popular because they neither require any explicit parameterization nor suffer from any constraints on the topology as snakes [13]. In [14], instead of using volume stains, specific stains labeling surfaces of nuclei and cells are used. Level set approach is adopted to guide the movement of surface in a sequence of flows. Dynamic programming approach that ensures that the identified border around each cell is globally optimal can be more reliable than local optimization methods in the case of cell surface marker [15]. Although the aforementioned methods are reasonably successful on the target images, none of them are proved to produce a satisfactory result on different set of images. The primary goal of this research will therefore focus on designing a specific image segmentation algorithm well applicable for the analysis of Drosophila RNAi images. There are at least three challenging problems ahead of this goal: 1) Intensity variations are present inside cells; 2) Shapes are sometimes not convex, e.g., spiky and ruffly cells are observed; and 3) cells are clustered. Because the methods based on the traditional watershed approach will fail to produce a satisfactory result as for 1) and 2), we choose the deformable model as our primary approach. The boundary of each cell is individually represented by one level set function. Nuclei and cytoplasm segmentation are simultaneously necessary in our case as the result of the former will serve as the starting points of curve evolution in the latter. The forces which guide the evolution are modified in order to make the contour stop accurately on nonconvex shapes. A scheme to prevent contours from intersection is proposed to divide touching cells, which exhibit weak or no edges across their touching boundaries. This paper is organized as follows. In the next section, we briefly introduce image acquisition procedure and discuss the preprocessing method that prepares the images for the subsequent segmentation. In Section III, our nuclei and cytoplasm segmentation algorithms are described. Tests on synthetic images and experiments on three types of Drosophila cells are presented in Section IV. We conclude in Section V. II. PROCEDURE FOR PAPER SUBMISSION A. Image Acquisition Drosophila Kc cells, stably transfected with a copper-inducible GFP RacV construct are seeded in a 384-well plate in which 250-ng dsRNA is previously arrayed. Cells take up the dsRNA in serum-free media during 45 min, and then serum-containing media is added. Cells are grown for three days to allow the RNAi process to reach efficiency. Expression of GFP

Fig. 1. RNAi typical dataset displayed as a 3-channel image. It is obtained by superimposing the image from F-actin, DNA, and GFP channels.

RacV is induced by adding copper sulfate to a final concentration of 0.75 mM. After 24 h to allow levels of RacV to accumulate, cells are fixed and stained with TRITC-conjugated phalloidin and DAPI to visualize F-actin and DNA, respectively. Three channels of digital images per well (UV for DAPI, FITC for GFP, and TRITC for TRITC-phalloidin) are acquired by objective of a Universal automated microscopy using a Imaging AutoScope. Expressions of Cdc V or RhoV in Kc cells are also acquired in the same way. FITC and TRITC channels are used for identifying the cell because their associated proteins happen to distribute across the cytoplasm of the cell. A typical RNAi dataset is shown as a 3-channel grayscale image in Fig. 1. B. Image Preprocessing As mentioned in the Section II-A, there are three channels in total for the same set of cells. The three channels are DNA, F-actin, and GFP channels. How to make the most of them for the subsequent segmentation task is of great importance. No imaging system is perfect, thus the effects of shot noise, uneven illumination, and degradation have to be reduced above all. Shot noise is suppressed by applying a median filter [16] to the images. Median filtering is nonlinear such that output pixel is determined by the median of the pixel in question and its neighborhood pixels. The median is not sensitive to outliers. Therefore, it works well to remove shot noise without blurring. Uneven illumination is corrected by a data-driven method invented in [17]. It works by iteratively making better cubic B-spline estimates of the background of the image. The corrected image is obtained by subtracting the estimated image from the original image. In practice, we find the convergence is fast and robust, and the result is satisfactory. Degradation is restored by the well-known Lucy-Richardson deconvolution method [18], [19]. The method iteratively maximizes the likelihood that the resulting image, when convolved with the point spread function (PSF), is an instance of blurred image. In addition, damping is adopted to control noise amplification and reduce ringing effect. The PSF is measured experimentally using fluorescent bead [20], [21], close to a Gaussian profile. In practice, we fix the number of iterations.

XIONG et al.: AUTOMATED SEGMENTATION OF DROSOPHILA RNAI FLUORESCENCE CELLULAR IMAGES

All images from the three channels are preprocessed in the same way as mentioned above. Some extra filtering and combination operations are still required. DNA stained with DAPI distribute uniformly across nuclei. Thus, image from DNA channel can be used for nuclei segmentation directly which is denoted by . An image is constructed whose gray value at each image point is the maximum of gray values at the corresponding points of images from F-actin and GFP channels. is then used for cytoplasm segmentation. Now we are in the position to segment the cells. III. NUCLEI AND CYTOPLASM SEGMENTATION Direct segmentation of cells that often touch each other in clusters is difficult if we do not know where each cell is located. Fortunately, the DNA channel provides us rough information of positions. Therefore, nuclei extracted from are appropriate to serve as the starting points for cell segmentation. We are capable of treating each cell individually and only need to pay special attention to the touching regions of cells. In this way, the burden of cytoplasm segmentation is greatly reduced. A. Nuclei Segmentation The nuclei segmentation is carried out in the DNA channel . Otsu’s thresholding method [22] is first applied to and each connected region is then labeled as nucleus candidate. We find that undersegmentation problem is predominant and oversegmentation is rare. Some basic prior information can be utilized for the further segmentation. In general, there is only one nucleus within each cell and it is usually near the center of cell. Therefore, it is rational to assume that one nucleus cannot enter another nucleus’s domain. In addition, the shape of nucleus is nearly convex. If we find one segmented nucleus is indeed nonconvex, this is most probably caused by undersegmentation. A method of detecting and resolving this problem is necessary to the accuracy of nuclei segmentation. Finally, since the range of nucleus size can be estimated empirically and the resolution of image acquisition equipment is known, the minimum and maximum of nucleus size can be used as a criterion to evaluate the result of nuclei segmentation. When one nucleus candidate is smaller than the minimum, it is treated as noise, whereas one is larger than the maximum, it is again caused by undersegmentation. We define the convexity of the nucleus candidate to indicate undersegmentation when it is small, as (1) where and are the area and convex hull area of the nucleus candidate respectively. The estimated minimum and maximum of nucleus size in pixels is defined as mins and maxs. The segmented nucleus candidate is treated as noise when its size is smaller than mins and a cluster of nuclei when its size is larger than maxs. Noise should be discarded and cluster should be further segmented. The smallest acceptable nucleus convexity is cv. The segmented nucleus candidate is treated as a cluster when its convexity is smaller than cv and again needs to be further segmented. Then, our algorithm of nuclei segmentation can be described as follows.

2417

Fig. 2. Example of nuclei segmentation. (a) Original DNA channel. (b) Segmentation by Otsu’s algorithm. (c) Segmentation by our algorithm.

1) Apply the Otsu’s algorithm [22] to threshold the DNA and label each nucleus candidate. A nucleus channel candidate list , is obtained. , then remove from the candidate list. 2) If or , split by applying Otsu’s 3) If algorithm again to its convex hull region and substitute it with obtained sub-candidates whose sizes are greater than mins. Repeat step 3) recursively for all sub-candidates. 4) Repeat steps 2) and 3) for all in the candidate list. In Fig. 2, we show an example of nuclei segmentation. The original channel and segmentation result by Otsu’s algorithm are shown in Fig. 2(a) and (b). Clearly, a simple threshold is not able to yield a good segmentation since noise and undersegmentation coexist in the result. However, our algorithm produces a satisfactory segmentation, as shown in Fig. 2(c), because of the consideration of suppression of noise and detection of undersegmentation. B. Cytoplasm Segmentation Although the rough positions of cells can be obtained from the result of nuclei segmentation, the cytoplasm segmentation remains a difficult task for a number of reasons. First, intensity variations are present inside cells. Simple thresholding algorithms, such as Otsu [22] and ISODATA [23], will give rise to holes or even division of one cell. Furthermore, the shape of cell is often nonconvex, spiky and ruffly cells are observed and exhibit more biologically significance. Segmentation methods based on distance propagation, such as Voronoi diagram, distance transform will fail to detect their complex boundaries. In addition, cells are sometimes clustered. Weak or even no edge exist across their touch regions. Edge-based approaches will definitely fail to handle these cases. To address the above problems, we propose a technique based on active contour model to segment cytoplasm. Prior to the description of our own method, we first briefly recall the background of active contour models and discuss their limitations of direct application to our data. Then, several modifications are proposed to make it more adaptive for our task. The idea of segmentation based on active contour models is straightforward. A contour deforms until it reaches the boundary of object to be detected. This is accomplished by constructing and solving a partial differential equation (PDE) that directs the evolution of the contour from its initial position and shape. There are mainly two kinds of active contour, namely, parametric active contour (PAC), and geometric active contour (GAC). PAC is based on energy minimization, whereas GAC is based on the theory of curve evolution and geometric flow [24]. Because GAC is well suitable to be implemented

2418

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 53, NO. 11, NOVEMBER 2006

with level set framework which is capable of handling touching cells naturally and does not require explicit parameterization, we adopt the GAC framework for the implementation of our algorithm. be a parameterized planar curve and Let be an input image in which the task of segmentation is considered. A fairly generic curve evolution can be defined by the following PDE [24]: (2) where is the Euclidean curvature, is the unit inward normal of the curve. and are scalar fields, and is a vector field defined on . All these fields need not to be constant and can depend on position and time. In (2), the first term expands or shrinks the curve along its normal. The second term deforms the curve guided by the vector field. The third term uses the curvature to make the curve stay smooth. Most of the previous GAC-based studies on image segmentation attempt to tailor (2) in order to suit their own data. Indeed, how to incorporate the image features effectively is the primary challenge when one decides to employ GAC to achieve one’s goal. Both edge and region properties can be utilized to improve the performance. Among the proposed models, geodesic active contour is popular and useful [24]. We choose it as the basis of our study. This model customizes (2) as (3) is to speed where is a constant. The term up the flow in those regions where image gradient is low or slow it down where there is high gradient . The term acts like a doublet, which aligns the curve closer to the , the centerline of edge, since it points towards the valley of the edge. The gradient operators in and can be efficiently implemented by convoluting with partially derivatives of 2-D Gaussian kernel [25]. This implementation makes the computation of gradient more robust to noise. For an ideal object that is homogeneous inside and sharp at the boundary, (3) works well since the first term pushes the curve continuously inside the object until the curve reaches the boundary while the second term aligns the curve to the boundary. In view of the challenges mentioned above, direct application of geodesic active contour to our data suffers from at least four disadvantages. Firstly, besides the boundary, there are also local inside the cell and will refuse to push minimums of forward the curve there. Secondly, the boundary of cell is often faint, and will continue to push the curve outside there. As a result, boundary leaking occurs. Thirdly, the capture range of needs to be expanded to yield a more robust result. Finally, the curvature term tends to make the curve stay smooth. Such geometric influence should be reduced if spiky and ruffly cells are considered. Modification of (3) is thus important to improve the performance of segmenting RNAi cellular images.

Fig. 3. Comparison between 3-class FCM clustering and 2-class Otsu’s algorithm. (a) Original preposessed image. (b) Histogram of (a) with thresholds by FCM and Otsu. (c) Binary image by Otsu. (d) Binary image by FCM.

To solve our particular problem, we design a novel PDE to achieve this goal. It is formulated as

(4) comparing with (3), the following modifications are made. 1) A term , which is a rough segmentation of cells, is introduced to supplement the local edge information in with global regional information. The constant is in the range (0, 1), which serves as the weight between them. 2) The term is replaced by which increases the capture range of the doublet. A constant is added to control the strength of doublet force. 3) The curvature term is attenuated by a factor . The factor is necessary because we want to control the effect of curvature term on contour smoothing in order to handle the spiky shape as well as the normal round shape, both of which are common in RNAi images. Although can be obtained by any binary segmentation technique, a simple but reliable one will not only provide a helpful regional information but also will not increase the computation burden too much. We herein propose a fairly effective method adopted in our algorithm. Most thresholding methods, such as Otsu and ISODATA, consider the segmentation as a problem of 2-class clustering. For example, the Otsu’s algorithm divides the histogram into two classes and the inter-class variance is minimized [22]. This assumption is often not true when dealing with fluorescence images. Take one of our images shown in Fig. 3(a) as an example. The threshold by Otsu’s algorithm is large and produces a poor result as in Fig. 3(c). From the histogram in Fig. 3(b), we can see there are indeed three peaks. Thus, the assumption of 2-class clustering is the cause of this problem. By clustering with 3-class, we ought to achieve a better result.

XIONG et al.: AUTOMATED SEGMENTATION OF DROSOPHILA RNAI FLUORESCENCE CELLULAR IMAGES

2419

Fig. 4. Original and new normal force to the curve. (a) Original normal force g (I ) of Fig. 3(a) normalized to [0, 1]. (b) New normal force g (I )+(1 )h(I ) of Fig. 3(a) where = 0:5.

0

Fuzzy c-means (FCM) clustering [26], [27] is an attractive clustering algorithm due to its property of convergence and low complexity. We use FCM with the 3-class concept to cluster the pixel values. The threshold is obtained by averaging the maximum in the class with the smallest center and the minimum in the class with the middle center. It is usually located at the position of the demarcation line of the first and second peaks in histogram. In Fig. 3(b), we compare the thresholds by FCM and Otsu. A more satisfactory result is produced by FCM in Fig. 3(d) than what is produced by Otsu in Fig. 3(c). can be The normal force to the curve thought as a tradeoff between the local and the global features. The idea behind it is straightforward. The edge information encoded in cannot utilize enough prior knowledge to guide the evolution of the curve. In addition, it is calculated from the gradient and thus highly sensitive to noise. Fortunately, global regional information encoded in can serve as a supplement. Constant is a scalar parameter weighting the importance of these two sources of information. If is small, we trust more regional information; otherwise, we trust more edge information. In Fig. 4, we compare the original and new normal force of Fig. 3(a), where . Clearly, the new force pushes forward the curve faster inside cells than the original one which traps it in a number of local minimums. Additionally, it decreases rapidly at the boundary and thus stops the curve there with the aid of . The doublet force in traditional geodesic active contour model can attract the curve to the desired boundary because it points towards the boundary on both sides [see the force on a simulated shape in Fig. 5(a)]. The doublet force is a diffusion of the original force which is inspired by the gradient vector flow (GVF) model [28]. It overcomes the problems of limited capture range and unable progressing into the boundary concavities of . As in GVF, is the steady-state solution of in the PDE

(5) where is a constant which controls the degree of diffusion. Because the information from the strong gradient is propagated throughout the image during the diffusion process,

0rg(I ) and 0rg (I ) with 0rg (I ) on the same shape. 0rg (I ) on the same region.

Fig. 5. Comparison of the two doublet forces = 0:1. (a) g (I ) on a simulated shape. (b) (c) g (I ) in the small box in Fig. 3(a). (d)

0r

0r

will have a larger capture range than in the sense of guiding the curve to the desired boundary more effectively [see on the simulated shape in Fig. 5(b)]. These the force two doublet forces of the small box in Fig. 3(a) are shown in has a larger Fig. 5(c) and (d). The new doublet force capture range than especially along the transition boundary from bright to dark. It should be noted that the values of the constants , , , in (4) and in (5) can be easily chosen if their effects on the curve evolution are known. The parameters , , are the tradeoff between the three terms in (4). While their absolute magnitudes determine the speed of curve evolution, their relative differences are more important for the final position of the curve convergence. Large accelerates curve expansion but may result in boundary leaking when it is large enough to exceed the constraint of the doublet force at the boundary. Large aligns the curve precisely with the desired boundary but may lead to the trap of the curve at the faint edge inside the object. Large smoothing the curve well is a benefit for the round object but a drawback for the spiky object. Small and large leads the curve more closely to the shape of the region encoded in and the region enclosed by edges in , respectively. Large allows the curve to approach fast near the object boundary. In practice, the parameters can be set intuitively by observing the results on a few images and then fixed in the segmentation process. For the implementation of our method, we adopt the level set approach [29]. Instead of manipulating the curve directly, the curve is embedded as the zero level set of a higher dimensional function, namely, level set function . The level set function is then evolved under the specific PDE. Meanwhile, the status of the curve can be obtained by extracting the

2420

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 53, NO. 11, NOVEMBER 2006

zero level set, . Equation (4) can be easily transferred to the level set representation

(6) Every cell owns one individual level set function, initialized from the segmented nucleus. It is evolved independently from others. This scheme works naturally for the case of isolated cells because the curve will stop at the boundary of the cell under investigation without interfering with other cells. However, special treatments need to be carried on with the segmentation of touching cells because two curves may intersect at the touching cell region. Several authors have proposed their solutions to the difficulties of curve intersection resulted from evolving on multiple touching objects. In [30], two topological operators are developed to split or merge the contours when contour intersection is detected. In [31], the edge map that is used to guide curve evolution of each cell is multiplied by a modulation image. This image acts as repulsive effects imposed by the neighboring cells. Both approaches are under the framework of parametric active contour. In terms of curve evolution implemented with level set, the issue of touching cells can be handled naturally by the following equation: (7) where is the total number of level functions which is equal to the number of cells and is the iteration step number. That is, whenever the level set function is updated one step, it is considered as the trial function. The actual updated function is determined by the maximum of this trial function and negative of functions of other cells to ensure that the contour of one cell cannot cross the contour of the other one. This scheme can be understood with an example illustrated in Fig. 6. In order to display the influence of (7) on level set functions clearly, this example is designed in one-dimensional. In 1-D, the region claimed by a levelset function is reduced to an interval where the levelset function is negative and its contour is just two single end points on both sides of the interval. Assume that in the th iteration step, there are three contours represented by , , and [as solid lines in Fig. 6(a)]. The role of (7) to prevent the intersection of the first two contours is shown in Fig. 6. The first is evolved to be the trial function accontour th step (as a dashed line in Fig. 6(a)). cording to (6) in the Obviously, the first contour runs into the region already claimed by the second contour . The touching region is drawn in the black region in Fig. 6(a). Thanks to (7), the actual contour in th step is updated to which is shown in Fig. 6(b). The first contour thus shares the same boundary with the second one at their touching region. Similar routines are also performed for the second and third contours to handle the touching issue. The process of contour evolution is finally checked by the root mean square change in the level set function and a predefined maximum iteration number. If the convergence of one level set function is detected, it is removed from the updating list to reduce the computation load.

Fig. 6. A 1-D example of preventing curve intersection. (a) Three contours , , and as solid lines are in the th iteration step. The first contour is evolved to the trial function as a dashed line in the th step. The touching region is drawn in the black region. (b) Actual first contour is the trail curve except that the dash line in the trail curve is replaced by the solid curve.

9 9 9

9

9

m

(m+1)

Fig. 7. Flowchart of our RNAi image segmentation algorithm.

Finally, we summarize the flow of our RNAi image segmentation algorithm in Fig. 7. IV. EXPERIMENTS AND RESULTS In this section, we first illustrate the ability of our approach to prevent weak-edge leakage, the robustness to noise, and the capability to handle spiky and ruffly shapes, interior intensity variation, a combination of artifacts, and touching objects on a number of synthetic images. Comparisons of results are made between the original geodesic active contour and our approach. Then, our method is applied to real RNAi Drosophila cells in three distinct shapes, namely, round, spiky, and ruffly. Performance is measured by comparing the difference between automated and manual segmentation. We adopt the described preprocessing step only on the real cell images. A. Ability to Prevent Weak-Edge Leakage The effect of weak-edge is occasionally seen in cell images. This issue may pose certain difficulties for segmentation of these images, particularly traditional active contour-based methods, such as geodesic active contour. The contour tends to “leak” to the background at locations of weak-edge. In Fig. 8, we illustrate the ability of our approach to prevent the problem of weak-edge leakage. The test object (as in [32]) is a filled circle with a small blurred region to the upper right as shown in Fig. 8(a). Results from the geodesic active contour and our approach are shown in Fig. 8(b) and (d), respectively [both are initialized with the small circle in in Fig. 8(a)]. The failure of the former is mainly due to the fact that there is no clear edge at blurred region that can trap the contour. As a result of the new normal force in our model [shown in Fig. 8(c)], the pushing force decreases dramatically into the background and thus helps to prevent the contour from leaking. B. Robustness to Noise Noise is common and unavoidable in fluorescence microscopy images. Therefore, we also compare the tolerance to

XIONG et al.: AUTOMATED SEGMENTATION OF DROSOPHILA RNAI FLUORESCENCE CELLULAR IMAGES

2421

Fig. 8. Ability to prevent weak-edge leaking. (a) Test object is a filled circle with a small blurred region to the upper right. A small circle is used to initialize the two evolutions. (b) Result from geodesic active contour. (c) New normal force. (d) Result by our approach.

noise between the geodesic active contour and our approach. The simulated object (as in [32]) is a harmonic shape generated , where by the equation in polar coordinate indicates there are six “bumps”. Gaussian white noises of different variance are added to the original image. The accuracy is measured by the root mean square radial error (RMSRE), which is the square root of the mean of the square of distance between the true boundary and the actual active contour. Noisy images are shown in the first column of Fig. 9 with a range of variances: 0.1, 0.2, , 0.7. The normal forces we propose are in the second column. The third and fourth columns display the final results of geodesic active contour and our approach. Both are initialized with a small circle in the center of the harmonic shape. The quantitative measure of RMSRE is listed in Table I. We can conclude that at low level of noise, both approaches work well, whereas, at high level of noise, our approach is more robust to the noise. This is mainly due to the new normal force that gathers global information from the initial rough segmentation and pushes the curves faster and stops them more accurately. C. Handling Spiky and Ruffly Shapes There are mainly three types of cell shapes observed in our Drosophila RNAi images, namely round, spiky, and ruffly. Because the round shape is easy to handle, we focus on testing our approach on the last two shapes. In the top row of Fig. 10, two spiky and two ruffly shapes are simulated. All contours are initialized to be a small circle in the center. In the last two rows, the final contours by geodesic active contour and our algorithm

Fig. 9. Robustness to noise. (Column 1) The original image with Gaussian white noise of variances: 0.1, 0.2, . . ., 0.7, from top to bottom; (Column 2) the normal force we propose; (Column 3) geodesic active contour’s results; (Column 4) our algorithm’s results.

are shown. For the spiky shape, geodesic active contour tends to smooth the contours due to the strong curvature term. For the ruffly shape, it tends to stop the contours at the interior boundaries because the normal force is rather weak there. Thanks to the new normal force and the explicit attenuation of curvature term, our algorithm works well on these two odd shapes. D. Handling Interior Intensity Variation Ordinary cells often show patterns of interior intensity variation. Segmentation algorithms are thus necessary to be checked on this issue before working on real cells. In the top row of Fig. 11, two objects with interior intensity variations are simulated. In the last row of Fig. 11, the final contours by geodesic active contour and our algorithm are shown. In view of the results, both methods work well. However, the average number of steps by geodesic active contour is about 1.2 times more than

2422

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 53, NO. 11, NOVEMBER 2006

TABLE I RMSRE FOR THE CONTOURS IN THE LAST TWO COLUMNS OF Fig. 9

Fig. 12. Handling combination of several artifacts. (a) Object is synthesized to possess artifacts of weak-edge, noise, and interior intensity variation. (b) Segmentation result by our approach.

Fig. 10. Handling spiky and ruffly shapes. (Row 1) Two spiky and two ruffly shapes are simulated. The contours are initialized with small circles; (Row 2) geodesic active contour’s results; (Row 3) our algorithm’s results.

Fig. 13. Handling touching objects. (a) Four touching objects are simulated. In the top left, there are two touching objects with a clear boundary between them. The other three clusters are two, three, and four objects without clear boundaries. Contours are initialized with circles; (b) Final showing that our approach can handle touching objects with and without clear boundaries.

intensity variation and converges to outer boundaries faster than geodesic active contour. E. Handling Combination of Several Artifacts In the last subsections, we have tested our approach on several single artifacts, such as noise and intensity variation. In Fig. 12, we assess if our algorithm is capable of handling a combination of several artifacts, namely weak-edge, noise, and interior intensity variation. These artifacts are synthesized in the test image. The contour is initialized to be a small circle in the center as in Fig. 12(a). The segmentation result in Fig. 12(b) demonstrates the successful handling of artifact combination. Fig. 11. Handling interior intensity variation. (Row 1) Two objects with interior intensity variations are simulated. The contours are initialized with small circles; (Row 2) geodesic active contour’s results; (Row 3) our algorithm’s results.

that of our algorithm. Thanks to the new normal force, our approach is less sensitive to the internal edge effect led by interior

F. Handling Touching Objects In Fig. 13, we will demonstrate our approach on handling multiple touching objects. Four clusters of objects are simulated for this purpose. In the top left, there are two touching objects with a clear boundary between them. The rest three clusters are

XIONG et al.: AUTOMATED SEGMENTATION OF DROSOPHILA RNAI FLUORESCENCE CELLULAR IMAGES

2423

TABLE II PARAMETERS USED IN OUR EXPERIMENTS

Fig. 16. Segmentation of ruffly shape (Left) DNA channels. (Middle) Cells with nuclei segmentation results superimposed. (Right) Cytoplasm segmentation.

Fig. 14. Segmentation of round shape (Left) DNA channels. (Middle) Cells with nuclei segmentation results superimposed. (Right) Cytoplasm segmentation.

segmentation of DNA channel shown in the middle by the curve and they evolve to the final cytoplasm segmentation shown in the right. The relative coincidences of automated and manual segmentation are 98.5%, 97.5%, 92.8%, 92.3%, 96.7%, and 94.1%, respectively. In all cases, accuracy of greater than 92% is obtained. V. CONCLUSION

Fig. 15. Segmentation of spiky shape (Left) DNA channels. (Middle) Cells with nuclei segmentation results superimposed. (Right) Cytoplasm segmentation.

two, three, and four objects without clear boundaries. The contours are initialized as small circles in Fig. 13(a). We can see from the results in Fig. 13(b) that our approach also yields a satisfactory on touching objects. It outputs a division following the edge if the touching objects own one, whereas a simple division close to a straight line if there is no clear boundary between them. G. Real Drosophila RNAi Images There are mainly three types of cell shapes observed in our Drosophila RNAi images, namely, round, spiky, and ruffly. Thus, we will test our algorithm on each type with the same parameters. The accuracy of segmentation is measured by computing the relative region coincidence, namely the area of the overlap region of automated and manual results divided by the area of the latter. In all experiments, an expert biologist is asked to perform the manual segmentation. Unless specified, the values for the parameters in all our experiments are listed in Table II. In Figs. 14–16, we illustrate two segmentation results on three types of cells, respectively. The DNA channels are shown in the left. The contours are initialized from nuclei

In this paper, we describe an automated segmentation method for high-throughput Drosophila cell RNAi fluorescence images. It comprises of two steps: nuclei segmentation and cytoplasm segmentation. The former serves as the starting point of the latter. In the first step, nuclei are extracted and labeled. In the second step, we propose a novel level set-based algorithm to segment the cells. In the experiment, we first illustrate the ability of our approach to prevent weak-edge leakage, the robustness to noise, and the capability to handle spiky and ruffly shapes, interior intensity variation, a combination of artifacts, and touching objects on a number of synthetic images. Comparisons and analyses of results are made between the original geodesic active contour and our approach. Then, three types of cells commonly observed in Drosophila RNAi images are considered and tested using our method. In all cases, we obtain accuracy of greater than 92%. This suggests that our method can potentially serve as a candidate tool in the computerized image analysis system for genome-wide RNAi bioassay screen. In addition, the cell segmentation method described in this paper is generic in its nature. We believe it can be utilized in other RNAi experiments with different cell types. Our future work will focus on fast implementation of our approach and extension of its application to high-throughput imaging experiments of other cell types. ACKNOWLEDGMENT The authors would like to acknowledge the collaboration in this research effort with their biology collaborators Dr. P. Bradley and Dr. N. Perrimon, the Department of Genetics, Harvard Medical School. REFERENCES [1] M. Boutros, A. A. Kiger, S. Armknecht, K. Kerr, M. Hild, B. Koch, S. A. Haas, R. Paro, and N. Perrimon, “Genome-wide RNAi analysis of growth and viability in Drosophila cells,” Science, vol. 303, pp. 832–835, 2004.

2424

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 53, NO. 11, NOVEMBER 2006

[2] A. A. Kiger, B. Baum, S. Jones, M. R. Jones, A. Coulson, C. Echeverri, and N. Perrimon, “A functional genomic analysis of cell morphology using RNA interference,” J. Biol., vol. 2, 2003. [3] C. Garbay, J. M. Chassery, and G. Brugal, “An iterative region-growing process for cell image segmentation based on local color similarity and global shape criteria,” Anal. Quant. Cytology Histology, vol. 8, pp. 25–34, 1986. [4] R. Adams and L. Bischof, “Seeded region growing,” IEEE Trans. Pattern Anal. Machine Intell., vol. 16, no. 6, pp. 641–647, Jun. 1994. [5] N. Malpica, C. O. de Solorzano, J. J. Vaquero, A. Santos, I. Vallcorba, J. M. Garcia Sagredo, and F. del Pozo, “Applying watershed algorithms to the segmentation of clustered nuclei,” Cytometry, vol. 28, pp. 289–297, 1997. [6] C. O. de Solorzano, E. G. Rodriguez, A. Jones, D. Pinkel, J. W. Gray, D. Sudar, and S. J. Lockett, “Segmentation of confocal microscope images of cell nuclei in thick tissue sections,” J. Microscop. Oxford, vol. 193, pp. 212–226, 1999. [7] X. Zhou, X. Cao, Z. Perlman, and S. Wong, “A computerized cellular imaging system for high content analysis in Monastrol suppressor screens,” J. Biomed. Inf., to be published. [8] X. Chen, X. Zhou, and S. Wong, “An automated method for cell phase identification in high-throughput time-lapse screens,” IEEE Trans. Biomed. Eng., to be published. [9] G. Lin, U. Adiga, K. Olson, J. F. Guzowski, C. A. Barnes, and B. Roysam, “A hybrid 3-D watershed algorithm incorporating gradient cues and object models for automatic segmentation of nuclei in confocal image stacks,” Cytometry Part A, vol. 56A, pp. 23–36, 2003. [10] C. Wahlby, I. M. Sintorn, F. Erlandsson, G. Borgefors, and E. Bengtsson, “Combining intensity, edge and shape information for 2-D and 3-D segmentation of cell nuclei in tissue sections,” J. Microscop. Oxford, vol. 215, pp. 67–76, 2004. [11] Y. L. Fok, J. C. K. Chan, and R. T. Chin, “Automated analysis of nervecell images using active contour models,” IEEE Trans. Med. Imag., vol. 15, no. 3, pp. 353–368, Jun. 1996. [12] A. Klemencic, S. Kovacic, and F. Pernus, “Automated segmentation of muscle fiber images using active contour models,” Cytometry, vol. 32, pp. 317–326, 1998. [13] R. Malladi, J. A. Sethian, and B. C. Vemuri, “Shape modeling with front propagation—a level set approach,” IEEE Trans. Pattern Anal. Machine Intell., vol. 17, no. 1, pp. 158–175, Jan. 1995. [14] C. O. De Solorzano, R. Malladi, S. A. Lelievre, and S. J. Lockett, “Segmentation of nuclei and cells using membrane related protein markers,” J. Microscop. Oxford, vol. 201, pp. 404–415, 2001. [15] D. Baggett, M. A. Nakaya, M. McAuliffe, T. P. Yamaguchi, and S. Lockett, “Whole cell segmentation in solid tissue sections,” Cytometry A, vol. 67A, pp. 137–143, 2005. [16] W. K. Pratt, Digital Image Processing. New York: Wiley, 1978. [17] C. Wahlby, J. Lindblad, M. Vondrus, E. Bengtsson, and L. Bjorkesten, “Algorithms for cytoplasm segmentation of fluorescence labelled cells,” Anal. Cell. Patholog., vol. 24, pp. 101–111, 2002. [18] L. B. Lucy, “An iterative technique for rectification of observed distributions,” Astronom. J., vol. 79, pp. 745–765, 1974. [19] W. H. Richardson, “Bayesian-based iterative method of image restoration,” J. Opt. Soc. Amer., vol. 62, pp. 55–59, 1972. [20] D. A. Agard, “Optical sectioning microscopy: cellular architecture in three dimensions,” Ann. Rev. Biophys. Bioeng., vol. 13, pp. 191–219, 1984. [21] D. A. Agard, Y. Hiraoka, P. Shaw, and J. W. Sedat, “Fluorescence microscopy in three dimensions,” Methods Cell Biol., vol. 30, pp. 353–377, 1989. [22] N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Trans. Syst., Man, Cybern., vol. SMC-9, no. 1, pp. 62–66, Jan. 1979. [23] T. W. Ridler and S. Calvard, “Picture thresholding using an iterative selection method,” IEEE Trans. Syst., Man, Cybern., vol. SMC-8, no. 8, pp. 630–632, Aug. 1978. [24] V. Caselles, R. Kimmel, and G. Sapiro, “Geodesic active contours,” Int. J. Comput. Vision, vol. 22, pp. 61–79, 1997. [25] R. Deriche, “Fast algorithms for low-level vision,” IEEE Trans. Pattern Anal. Machine Intell., vol. 12, no. 1, pp. 78–87, Jan. 1990. [26] J. C. Bezdek, Pattern Recognition With Fuzzy Objective Function Algorithms. New York: Plenum, 1981. [27] X. Zhou, X. Wang, E. R. Dougherty, D. Russ, and E. Suh, “Gene clustering based on cluster-wide mutual information,” J. Comput. Biol., vol. 11, pp. 151–165, 2004. [28] C. Y. Xu and J. L. Prince, “Snakes, shapes, and gradient vector flow,” IEEE Trans. Image Process., vol. 7, no. 3, pp. 359–369, Mar. 1998.

[29] J. A. Sethian, Level Set Methods: Evolving Interfaces in Geometry, Fluid Mechanics, Computer Vision, and Materials Science. Cambridge: Cambridge University Press, 1996. [30] H. Delingette and J. Montagnat, “Shape and topology constraints on parametric active contours,” Comput. Vision Image Understand., vol. 83, pp. 140–171, 2001. [31] C. Zimmer, E. Labruyere, V. Meas-Yedid, N. Guillen, and J. C. Olivo-Marin, “Segmentation and tracking of migrating cells in videomicroscopy with parametric active contours: A tool for cell-based drug testing,” IEEE Trans. Med. Imag., vol. 21, no. 10, pp. 1212–1221, Oct. 2002. [32] X. H. Xie and M. Mirmehdi, “RAGS: Region-aided geometric snake,” IEEE Trans. Image Process., vol. 13, no. 5, pp. 640–652, May 2004.

Guanglei Xiong received the B.S. degree in automation and the M.S. degree in pattern recognition and intelligent systems from the Department of Automation, Tsinghua University, Beijing, China, in 2003 and 2006, respectively. He is currently working toward the Ph.D. degree in biomedical informatics at Stanford University, Stanford, CA. His research interests include biomedical image analysis, biomedical imaging, and machine learning.

Xiaobo Zhou received the B.S. degree in mathematics from Lanzhou University, Lanzhou, China, in 1988, and the M.S. and Ph.D. degrees in mathematics from Peking University, Beijing, China, in 1995 and 1998, respectively. From 1988 to 1992, he was a Lecturer at the Training Center, 18th Building Company, Chongqing, China. From 1992 to 1998, he was a Research Assistant and Teaching Assistant in the Department of Mathematics, Peking University. From 1998 to 1999, he was a Postdoctoral Research Fellow in the Department of Automation, Tsinghua University, Beijing, China. From 1999 to 2000, he was a Senior Technical Manager with the 3G Wireless Communication Department, Huawei Technologies Co., Ltd., Beijing, China. From February 2000 to December 2000, he was a Postdoctoral Research Fellow in the Department of Computer Science, University of Missouri-Columbia, Columbia, MO. From 2001 to 2003, he was a Postdoctoral Research Fellow in the Department of Electrical Engineering at Texas A&M University, College Station. From 2003 to 2005, he was a Research Scientist with the Harvard Center for Neurodegeneration and Repair, Harvard Medical School, Boston, MA, and also with the Radiology Department, Brigham and Women’s Hospital, Boston, MA. Since 2005, he has been an Instructor in the Radiology Department at Brigham and Women’s Hospital and at Harvard Medical School. His current research interests include image and signal processing for high-content molecular and cellular imaging analysis, informatics for integrated multiscale and multimodality bio/medical imaging analysis, molecular imaging informatics, neuroinformatics, bioinformatics for genomics and proteomics. He has published more than 100 papers and holds five patents. He has authored one book titled Computational System Bioinformatics, which is to be published. Dr. Zhou is a Member of SPIE, and Sigma Xi and he was listed in Who’s Who in Science and Engineering in 2004.

Liang Ji received the B.S. degree from Tsinghua University, Beijing, China, and Ph.D. degree from Aalborg University, Aalborg, Denmark. He has worked at Beijing Nuclear Instrument Factory and MRC Human Genetics Unit, Edinburgh, Scotland, U.K. He is currently a Professor of Information Processing in the Department of Automation, Tsinghua University and Tsinghua National Laboratory for Information Science and Technology, Beijing, China. His research interests include bioinformatics, modernization of traditional Chinese medicine, molecular imaging, and biomedical image analysis.

Automated Segmentation and Anatomical Labeling of ...