
RECOGNIZING LIVE FISH SPECIES BY HIERARCHICAL PARTIAL CLASSIFICATION BASED ON THE EXPONENTIAL BENEFIT†

Meng-Che Chuang¹, Jenq-Neng Hwang¹, Fang-Fei Kuo¹, Man-Kwan Shan², Kresimir Williams³

¹ Dept. Electrical Engineering, University of Washington, {mengche, hwang, ffkuo}@uw.edu
² Dept. Computer Science, National Chengchi University, [email protected]
³ National Oceanic and Atmospheric Administration, [email protected]

† This project was made possible through research funds from NMFS' Advanced Sampling Technology Working Group, NOAA.

ABSTRACT

Live fish recognition in open aquatic habitats suffers from high uncertainty in much of the data. To alleviate this problem without discarding those data, the system should learn a species hierarchy so that high-level labels can be assigned to ambiguous data. In this paper, a systematic hierarchical partial classification algorithm is therefore proposed for underwater fish species recognition. Partial classification is applied at each level of the species hierarchy so that the coarse-to-fine categorization stops once the decision confidence is low. By defining the exponential benefit function, we formulate the selection of the decision threshold as an optimization problem. In addition, feature extraction focuses on important fish anatomical parts to generate discriminative feature descriptors. Experiments show that the proposed method achieves an accuracy of up to 94%, with a partial decision rate of less than 5%, on underwater fish images with high uncertainty and class imbalance.

Index Terms: hierarchical partial classification, exponential benefit, feature extraction, live fish recognition, underwater imagery

1. INTRODUCTION

Fish abundance estimation [1], which often calls for the use of bottom and mid-water trawls, is critically required for the conservation and management of commercially important fish populations. To improve the quality of surveys, we developed the Cam-trawl [2] to capture image/video data of live fish. We have also developed algorithms that successfully analyze the collected data by performing fish segmentation, length estimation, counting and tracking [3], [4]. This framework can be further extended with species identification, which allows the species composition to be monitored and thus provides a means of assessing the status of fish stocks and the ecosystem.

While object recognition in various contexts has been well investigated, there are fundamental challenges to identifying fish in an unconstrained natural habitat. For freely swimming fish, much of the data is highly uncertain because of poor image quality, non-lateral fish views or curved body shapes. Critical information in these data may be lost or may come with large measurement error. Huang et al. use a hierarchical classifier constructed by heuristics to control error accumulation [5]. However, once an error occurs it still propagates to the leaf layer, so the accuracy drops as the amount of ambiguous data increases. Even without uncertainty, fish species share strong visual similarity. Common features for image classification are therefore not sufficiently discriminative, since they represent only the global appearance of an object.

To address the uncertainty issue, one approach is partial classification, i.e., allowing indecision in certain regions of the data space [9], [10]. Partial classification has proven effective in practical scenarios such as medical diagnosis. However, unclassified instances lose all of their value, since no information about them is retrieved. Moreover, no systematic method has yet been proposed to determine the decision-making criteria. For fish features, previous work focuses on local attributes such as contours or texture [6], [7] or relies on feature selection theories [8]. However, these methods do not take advantage of the fish body structure, which conveys species information in a highly concentrated way.

In this paper, a hierarchical partial classification based on the exponential benefit is proposed for recognizing live fish images, to overcome the problems described above. Specifically, we (1) build a class hierarchy by unsupervised learning and then introduce partial classification to allow the assignment of incomplete but high-level labels; (2) define the exponential benefit to evaluate partial classifiers, and hence formulate the selection of decision criteria as an optimization problem; and (3) learn a fish classifier using part-aware features to identify visually similar fish species. The rest of this paper is organized as follows.
Section 2 introduces the proposed hierarchical partial classifier. Section 3 describes methods of extracting features from a fish image. Section 4 shows the experimental results, and the conclusion is given in Section 5.

2. HIERARCHICAL PARTIAL CLASSIFIER

To overcome the uncertainty in live fish images, we present a classification algorithm that discovers a tree structure for the classes and allows for indecision on ambiguous data. A hierarchy of classes (fish species) is established to determine higher-level class groupings for cases where the exact class label cannot be identified. In the testing phase, the input data instance is examined by layers of classifiers, each of which gives a label prediction. If the instance falls in the indecision range at any layer, the classification procedure stops and returns an incomplete sequence of class labels. In this way, misclassifications are avoided without losing all of the information provided by uncertain data. The concept of the hierarchical partial classifier is illustrated in Fig. 1.

Fig. 1. An example of a hierarchical partial classifier. The class hierarchy is learned from the training data via an unsupervised recursive clustering procedure. The fully classified instance (blue) reaches the leaf layer and receives the complete label sequence A-a-Sp1, while the ambiguous instances (green) stop at middle layers and receive the incomplete sequences A-a and B.

2.1. Unsupervised Class Hierarchy

The class hierarchy follows a binary tree structure, i.e., each node separates data into two categories. The arrangement of class groupings is learned by the following unsupervised recursive clustering procedure. The EM algorithm for a mixture of Gaussians is applied to separate all data into two clusters, which can be viewed as "positive" and "negative" data, respectively. All samples of each species are then relabeled with the cluster to which the majority of that species' samples belong. An RBF-kernel SVM is trained on these two super-classes. The above steps are then repeated separately within each cluster until only one species remains in each cluster. Note that the resulting class hierarchy may be unbalanced, which does not affect the recognition performance.

To handle class imbalance, which is caused by the dominance of one or more species in the sampled habitats, a biased-penalty approach is adopted during SVM training [11]. Rather than a single penalty parameter, the penalty parameters for the positive and negative classes are set separately to $C^+ = C\,N^-/N_{\text{total}}$ and $C^- = C\,N^+/N_{\text{total}}$, where $C$ is the original penalty parameter, $N^+$ and $N^-$ denote the numbers of positive and negative training samples, respectively, and $N_{\text{total}} = N^+ + N^-$.
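The recursive procedure of Section 2.1 can be sketched in Python as follows, under stated assumptions. The sketch substitutes a deterministic 2-means split (Lloyd's algorithm) for the EM fit of a two-component Gaussian mixture. It does not train the per-node RBF-kernel SVMs and only records the biased penalties $C^+$ and $C^-$ each node would use. The names `two_means`, `biased_penalties` and `build_hierarchy` are hypothetical.

```python
import numpy as np


def two_means(X, iters=20):
    """Split the rows of X into two clusters with Lloyd's k-means (k = 2).

    Stand-in for the paper's EM on a two-component Gaussian mixture;
    farthest-point initialization keeps the sketch deterministic."""
    X = np.asarray(X, dtype=float)
    c0 = X[0]
    c1 = X[np.argmax(((X - c0) ** 2).sum(axis=1))]  # sample farthest from c0
    centers = np.stack([c0, c1])
    assign = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for k in (0, 1):
            if np.any(assign == k):
                centers[k] = X[assign == k].mean(axis=0)
    return assign


def biased_penalties(C, n_pos, n_neg):
    """Per-class penalties C+ = C * N- / N_total and C- = C * N+ / N_total."""
    n_total = n_pos + n_neg
    return C * n_neg / n_total, C * n_pos / n_total


def build_hierarchy(X, y, C=1.0):
    """Recursively split the species in y into two super-classes.

    Returns nested dicts whose leaves are species labels; each internal node
    stores the biased penalties its (not trained here) RBF-SVM would use."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    species = sorted(set(y.tolist()))
    if len(species) == 1:
        return species[0]
    assign = two_means(X)
    # Relabel: each species joins the cluster holding the majority of its samples.
    pos = [s for s in species if assign[y == s].mean() > 0.5]
    neg = [s for s in species if s not in pos]
    if not pos or not neg:  # degenerate split: halve the species list instead
        half = len(species) // 2
        pos, neg = species[:half], species[half:]
    mask = np.isin(y, pos)
    c_pos, c_neg = biased_penalties(C, int(mask.sum()), int((~mask).sum()))
    return {
        "pos": build_hierarchy(X[mask], y[mask], C),
        "neg": build_hierarchy(X[~mask], y[~mask], C),
        "C_pos": c_pos,
        "C_neg": c_neg,
    }
```

If the per-node SVMs were trained with scikit-learn, the two penalties could be expressed through `SVC(kernel="rbf", C=C, class_weight={1: n_neg / n_total, -1: n_pos / n_total})`, because `class_weight` multiplies `C` per class. The minority side of each split then receives the larger penalty.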

Fig. 2. Partial classification for an SVM. Ambiguous data have small absolute decision values, so they fall in the indecision domain (ID) and are not assigned to either of the classes.
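The testing-phase procedure described at the start of Section 2 and in Fig. 2 can be sketched as follows: descend layer by layer, and stop as soon as $|f(\mathbf{x})| \le t$. The `Node` class, the `classify` function and the toy decision functions are hypothetical stand-ins for trained SVMs and their thresholds. The toy tree only loosely mirrors Fig. 1.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Node:
    name: str                                                    # label emitted on reaching this node
    decide: Optional[Callable[[Sequence[float]], float]] = None  # f(x); None at a leaf
    t: float = 0.0                                               # indecision threshold on |f(x)|
    pos: Optional["Node"] = None                                 # child taken when f(x) > t
    neg: Optional["Node"] = None                                 # child taken when f(x) < -t


def classify(root: Node, x: Sequence[float]) -> List[str]:
    """Walk down the hierarchy and return the (possibly incomplete) label sequence."""
    labels: List[str] = []
    node = root
    while node.decide is not None:
        score = node.decide(x)
        if abs(score) <= node.t:  # inside the indecision domain: stop here
            break
        node = node.pos if score > 0 else node.neg
        labels.append(node.name)
    return labels


# Toy hierarchy; each "SVM" just reads one coordinate of x.
sp = {n: Node(n) for n in ["Sp1", "Sp2", "Sp3", "Sp4", "Sp5"]}
a = Node("a", decide=lambda x: x[2], t=0.5, pos=sp["Sp1"], neg=sp["Sp2"])
A = Node("A", decide=lambda x: x[1], t=0.5, pos=a, neg=sp["Sp3"])
B = Node("B", decide=lambda x: x[1], t=0.5, pos=sp["Sp4"], neg=sp["Sp5"])
root = Node("Fish", decide=lambda x: x[0], t=0.5, pos=A, neg=B)

print("-".join(classify(root, [1.0, 1.0, 1.0])))   # A-a-Sp1 (complete)
print("-".join(classify(root, [1.0, 1.0, 0.1])))   # A-a (indecision at node a)
print("-".join(classify(root, [-1.0, 0.2, 0.0])))  # B (indecision at node B)
```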

2.2. Exponential Benefit for Partial Classification

After the SVM classifier is trained, its indecision domain must be defined to enable partial classification. Inspired by the evaluation of deferred decisions [5], we formulate this task as an optimization problem. Given a set of data $(\mathbf{x}_i, y_i)$, $i = 1, \dots, N$, and an SVM decision function $f: \mathbb{R}^d \to \mathbb{R}$ trained on these data, the generalized benefit function of partial classification is defined as

$$ B(D) = s_c(\mathbf{x})\,P(D, \hat{y} = y) - s_w(\mathbf{x})\,P(D, \hat{y} \neq y), \tag{1} $$

where $s_c(\mathbf{x})$ and $s_w(\mathbf{x})$ are score functions for correct and wrong decisions, respectively, and $D$ denotes the event that a decision is made. Eq. (1) can be interpreted as the expected total reward of classification: for the i-th data point, one earns $s_c(\mathbf{x}_i)$ points for a correct decision, loses $s_w(\mathbf{x}_i)$ points for a wrong one, and gets zero points for an indecision. The score functions can be any nonnegative functions that decrease monotonically with respect to $|f(\mathbf{x})|$, so that greater importance is given to ambiguous data. We therefore choose $s_c(\mathbf{x}) = s_w(\mathbf{x}) = \exp(-|f(\mathbf{x})|)$. The goal is to find the $D$ that maximizes (1). First note that $y \in \{-1, 1\}$. Also, a correct decision implies $y f(\mathbf{x}) > 0$ and a wrong decision implies $y f(\mathbf{x}) < 0$. As shown in Fig. 2, a partial SVM classifier makes a decision only if $|f(\mathbf{x})|$ is greater than a threshold $t$. Therefore (1) can be written as

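$$ B(t) = e^{-y f(\mathbf{x})}\,P\big(y f(\mathbf{x}) > t\big) - e^{y f(\mathbf{x})}\,P\big(y f(\mathbf{x}) < -t\big) = \frac{1}{N}\Big(\sum_{i=1}^{N} e^{-a_i}\,\mathbf{1}[t < a_i] - \sum_{i=1}^{N} e^{a_i}\,\mathbf{1}[t < -a_i]\Big), \tag{2} $$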

where ai : yi f (xi ) and 1[] denotes the 0-1 indicator function. It can be easily verified that indicator functions are bounded below by exponential functions, i.e.,

 1[t  a ]   (1  e  1[t  a ]   e N

i 1

N

i

N

i 1

t  ai

i 1

N

i

i 1

 t  ai

.

),

(3) (4)
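The empirical benefit (2), the exponential bound built from (3) and (4), and the threshold problem of (6) and (7) can be checked numerically with a short Python sketch. Two assumptions apply. First, the margins in `a` are made-up values. Second, the sketch does not use the barrier method [12] that the paper applies. It instead exploits the fact that the objective in (6) is a convex function of the scalar $t$. Setting its derivative $e^{t}\sum_i e^{-2a_i} - N e^{-t}$ to zero gives the unconstrained minimizer $t_0 = \tfrac{1}{2}\ln\big(N / \sum_i e^{-2a_i}\big)$, which is then clipped to $[f_{\min}, f_{\max}]$. Function names are hypothetical.

```python
import numpy as np


def benefit(t, a):
    """Empirical benefit B(t) of Eq. (2); a[i] = y_i * f(x_i)."""
    a = np.asarray(a, dtype=float)
    correct = np.exp(-a) * (a > t)  # e^{-a_i} 1[t < a_i]
    wrong = np.exp(a) * (-a > t)    # e^{a_i} 1[t < -a_i]
    return (correct.sum() - wrong.sum()) / a.size


def benefit_exp(t, a):
    """Exponential lower bound B_exp(t) of Eq. (5)."""
    a = np.asarray(a, dtype=float)
    return (np.exp(-a).sum() - np.exp(t) * np.exp(-2 * a).sum()) / a.size - np.exp(-t)


def select_threshold(a):
    """Closed-form solution of (6)-(7); returns None if (7) is infeasible.

    g(t) = e^t * S + N * e^{-t} with S = sum_i exp(-2 a_i) is convex, so the
    feasible set of (7) is an interval. The constrained minimizer is t0 clipped
    to [f_min, f_max]; if that point violates B_exp(t) >= B_exp(0), the
    feasible set is empty."""
    a = np.asarray(a, dtype=float)
    f_abs = np.abs(a)  # |f(x_i)| = |a_i| because y_i is +1 or -1
    t0 = 0.5 * np.log(a.size / np.exp(-2 * a).sum())
    t = float(np.clip(t0, f_abs.min(), f_abs.max()))
    return t if benefit_exp(t, a) >= benefit_exp(0.0, a) else None


# Hypothetical margins y_i * f(x_i) of one node's training data.
a = np.array([2.0, 1.5, 0.3, 0.1, 1.0, 0.8, -0.3, 2.5, 1.2, 0.05])
t_star = select_threshold(a)
print(t_star, benefit(t_star, a), benefit_exp(t_star, a))
```

Because $N B_{\exp}(t)$ equals $\sum_i e^{-a_i}$ minus the objective in (6), the clipped closed-form point and an interior-point (barrier) solver should agree on this one-dimensional problem. The closed form simply avoids the iterations.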

Using (3) and (4), we define the exponential benefit function $B_{\exp}(t)$, which serves as a lower bound of $B(t)$:

$$ B_{\exp}(t) = \frac{1}{N}\Big(\sum_{i=1}^{N} e^{-a_i}\big(1 - e^{t - a_i}\big) - \sum_{i=1}^{N} e^{a_i} e^{-t - a_i}\Big) = \frac{1}{N}\Big(\sum_{i=1}^{N} e^{-a_i} - e^{t} \sum_{i=1}^{N} e^{-2a_i}\Big) - e^{-t}. \tag{5} $$

An example of the exponential benefit function with respect to the decision threshold is shown in Fig. 3. Since $N B_{\exp}(t) = \sum_{i} e^{-a_i} - \big(e^{t}\sum_{i} e^{-2a_i} + N e^{-t}\big)$, maximizing $B_{\exp}(t)$ is equivalent to minimizing the term in parentheses. Selecting the decision threshold can therefore be written as an inequality-constrained minimization problem:

$$ \min_{t}\; e^{t} \sum_{i=1}^{N} e^{-2a_i} + N e^{-t} \tag{6} $$

$$ \text{s.t.}\quad f_{\min} \le t \le f_{\max}, \qquad B_{\exp}(t) \ge B_{\exp}(0), \tag{7} $$

where $f_{\min} = \min_{i=1,\dots,N} |f(\mathbf{x}_i)|$ and $f_{\max} = \max_{i=1,\dots,N} |f(\mathbf{x}_i)|$. The constraints in (7) ensure not only feasible solutions but also a gain in the exponential benefit function compared with full classification. The problem defined in (6) and (7) is solved by applying the barrier method [12]. Finally, the optimal threshold $t^*$ is found and used in the testing phase.

Fig. 3. Decision rate and exponential benefit vs. decision threshold. [Plot: decision rate and exponential benefit (vertical axis, about -2 to 1.5) against the decision threshold (horizontal axis, 0 to 2).]

3. FEATURE EXTRACTION

3.1. Head/Tail Localization

To determine the head/tail side of a fish body, we use the concept of 1-D image projection. For the i-th fish binary mask $B_i(u, v)$ with image size $W_i \times H_i$, the vertical projection is given by

$$ p_i^{B,\text{vert}}(u) = \Big(\sum_{v=1}^{H_i} B_i(u, v)\Big) * G_\sigma(u), \qquad u = 1, \dots, W_i, \tag{8} $$

where $G_\sigma(u)$ denotes the 1-D Gaussian filter with standard deviation $\sigma$ and $*$ denotes convolution. Using the vertical projection, the positions of the tail and head are estimated by

$$ u_i^{t} = \arg\min_{u \in \Omega_i^{t}} p_i^{B,\text{vert}}(u), \qquad u_i^{h} = \arg\min_{u \in \Omega_i^{h}} \frac{d\,p_i^{B,\text{vert}}(u)}{du}, \tag{9} $$

where $\Omega_i^{t}$ and $\Omega_i^{h}$ are the tail and head search ranges, respectively. The formulations in (9) follow from fish anatomy: in general, the boundary between the tail and the body is the thinnest part, while the boundary between the head and the body is where the body width stops increasing. Examples of head and tail localization are shown in Fig. 4.

In addition to the head and tail, the eye is also an important cue for distinguishing fish species. An object detector based on the Viola-Jones cascade classifier [13] is trained and applied inside the head region to locate the eyes. Details of training the fish eye detector are given in Section 4.1.

Fig. 4. Fish binary masks (top row) and their vertical projections (bottom row). The boundary positions for the tail and head regions are marked by red and turquoise lines and dots, respectively.

3.2. Feature Descriptors

The major visual differences among fish species often lie in certain external anatomical parts, such as the head, the caudal (tail) fin, and the location and shape of the eyes. These body parts are therefore emphasized by the proposed part-aware feature extraction to identify fish species.

TABLE I
FEATURES FOR FISH RECOGNITION

Feature          | Dim. | Description
Aspect ratio     | 1    | Body length / body height
Length ratio     | 1    | Length from mouth to tail / total length
Projection ratio | 1    | $\max_u p_i^{B,\text{vert}}(u) \,/\, \min_u p_i^{B,\text{vert}}(u)$
Tail size        | 1    | Tail area divided by body area
Tail shape       | 19   | Fourier descriptor of tail contour
Tail texture     | 16   | LBP histogram of tail
Head size        | 1    | Head area divided by body area
Head shape       | 19   | Fourier descriptor of head contour
Eye texture      | 16   | LBP histogram of eye

The size, shape and texture attributes of the fish body parts mentioned above are extracted as species features. The size attributes are estimated from the part area, computed by a connected-components algorithm and normalized by the whole body area. Shape is represented by the Fourier descriptor of the contour points, normalized by its DC component so that the descriptor is scale-invariant. Texture is represented by the histogram of local binary patterns (LBP) within the detected body parts [14]. Moreover, by leveraging fisheries knowledge, several global attributes of the fish body are also added to the features. These include the aspect ratio, the projection ratio and the length ratio of the fish body. The species features, with a total dimensionality of 75, are summarized in Table I.

4. EXPERIMENTAL RESULTS

4.1. Dataset and Simulation Setup

[Fig. 5 image panels, one per species with its number of examples: Adult Pollock (622), Eulachon (206), Rockfish (203), Juvenile Pollock (145), King Salmon (74), Capelin (45), Chum Salmon (30).]

Fig. 5. Fish species in underwater video, sorted in descending order of the number of training examples.

TABLE II
ACCURACY & PARTIAL DECISION RATES OF THE PROPOSED METHOD

Classification       | Accuracy (%) | Partial decision (%)
Hierarchical Full    | 90.79        | 0.00
Hierarchical Partial | 93.65        | 4.91

TABLE III
RECOGNITION PERFORMANCE VS. CLASSIFICATION ALGORITHMS

Classification | Accuracy (%)
Naïve Bayes    | 37.96
k-NN           | 64.83
Random Forest  | 71.85
Multiclass SVM | 84.23
Huang [5]      | 82.64
Proposed       | 93.65

Fig. 6. Precision, recall and F1-score using the proposed algorithm. [Bar chart: rate (%), 0 to 100, of precision, recall and F1-score for each species (KS: King Salmon, CS: Chum Salmon, JP: Juvenile Pollock, AP: Adult Pollock, C: Capelin, E: Eulachon, R: Rockfish) and their average.]

The proposed system is used to recognize 1325 grayscale fish images captured by the Cam-trawl system [2]. Each image is scaled to no larger than 300 × 300 pixels with its aspect ratio preserved. The dataset includes 7 fish species, as shown in Fig. 5. The species share high visual similarity (e.g., King Salmon and Chum Salmon), and the dataset is severely imbalanced across classes. Our previous work on automatic fish segmentation [3] is used to generate binary masks of live fish. In head/tail localization, the standard deviation of the Gaussian filter is set to $\sigma = 2$. We empirically assign $\Omega_i^{L} = [0.1\,l_i, 0.4\,l_i]$ and $\Omega_i^{R} = [0.6\,l_i, 0.9\,l_i]$, where $l_i$ is the fish length. The tail search range is given by $\Omega_i^{t} = \Omega_i^{L} \cup \Omega_i^{R}$, and the head search range by $\Omega_i^{h} = \Omega_i^{X}$, $X \in \{L, R\}$, such that $u_i^{t} \notin \Omega_i^{X}$. For fish eye detection, the Viola-Jones object detector is trained on a hand-labeled set of 1240 eye samples and 1680 non-eye samples at a resolution of 8 × 8 pixels. All classifier parameters are selected by 10-fold cross-validation.

4.2. Recognition Performance

The accuracy and partial decision rate (the percentage of data not classified to the species level) of the proposed algorithm are shown in Table II. Allowing partial classification with the optimal decision criterion increases the accuracy by about 3% (from 90.79% to 93.65%), while only 4.9% of the data receive incomplete categorizations. Since little fish recognition work focuses on the classifier, we compare our method with several popular classification algorithms available in the OpenCV library and with Huang's method [5]. The accuracy rates on the Cam-trawl dataset are reported in Table III. The proposed algorithm performs favorably under uncertainty and a highly skewed class composition. The precision, recall and F1-score for each class obtained by the proposed method are shown in Fig. 6. These metrics are defined as follows:

$$ \text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}, \tag{10} $$

$$ F_1\text{-score} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}, \tag{11} $$

where TP, FP and FN denote the numbers of true positives, false positives and false negatives, respectively. The proposed method achieves an average F1-score of 94%. To further investigate the importance of each feature, a sensitivity test is reported in Fig. 7. The F1-scores show that Tail Texture and Projection Ratio have a greater impact on recognition performance.

Fig. 7. F1-score of each species vs. unselected features. Each bar represents a feature that is not used during training. [Bar chart: F1-score (%), 0 to 100, for KS, CS, JP, AP, C, E, R and Average under six settings: All Features; All Features except Length Ratio; All Features except Projection Ratio; All Features except Eye Texture; All Features except Tail Texture; All Features except Head Shape.]

5. CONCLUSION

We proposed a hierarchical partial classification method to recognize live fish captured by underwater cameras. The problems of data uncertainty and class imbalance are successfully mitigated. Even when the baseline accuracy is already around 90%, the proposed algorithm further improves recognition performance, while information from ambiguous data can still be partially retrieved. Attributes of specific fish body parts generate features that discriminate visually similar species very well. Future work includes a systematic approach to discovering nontrivial but informative features for fish species classification, and the use of temporal information from video to enhance the robustness of recognition.

6. REFERENCES

[1] D. G. Hankin and G. H. Reeves, “Estimating total fish abundance and total habitat area in small streams based on visual estimation methods,” Can. J. Fish. Aquat. Sci., vol. 45, no. 5, pp. 834-844, 1988.
[2] K. Williams, R. Towler, and C. Wilson, “Cam-trawl: a combination trawl and stereo-camera system,” Sea Technol., vol. 51, no. 12, Dec. 2010.
[3] M.-C. Chuang, J.-N. Hwang, K. Williams, and R. Towler, “Automatic fish segmentation via double local thresholding for trawl-based underwater camera systems,” in Proc. 2011 IEEE Int. Conf. on Image Process., pp. 3145-3148, Sep. 2011.
[4] M.-C. Chuang, J.-N. Hwang, K. Williams, and R. Towler, “Multiple fish tracking via Viterbi data association for low-framerate underwater camera systems,” in Proc. 2013 IEEE Int. Symp. on Circuits and Syst., pp. 2400-2403, May 2013.
[5] P. X. Huang, B. J. Boom, and R. B. Fisher, “Hierarchical Classification for Live Fish Recognition,” in Proc. 2012 British Mach. Vision Conf., Sep. 2012.
[6] D. J. Lee, S. Redd, R. Schoenberger, X. Xu, and P. Zhan, “Contour matching for a fish recognition and migration-monitoring system,” in Optics East, Int. Soc. for Optics and Photonics, vol. 37, Dec. 2004.
[7] C. Spampinato, D. Giordano, R. Di Salvo, Y.-H. Chen-Burger, R. B. Fisher, and G. Nadarajan, “Automatic fish classification for underwater species behavior understanding,” in Proc. 1st ACM Int. Workshop on ARTEMIS, pp. 45-50, 2010.
[8] M. Nery and A. Machado, “Determining the appropriate feature set for fish classification tasks,” in Proc. 18th IEEE Brazilian Symp. on Computer Graphics and Image Process., 2005.
[9] Y. Baram, “Partial classification: The benefit of deferred decision,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, pp. 769-776, Aug. 1998.
[10] K. Ali, S. Manganaris, and R. Srikant, “Partial classification using association rules,” in Proc. 1997 KDD, pp. 115-118, 1997.
[11] K. Morik, P. Brockhausen, and T. Joachims, “Combining statistical learning with a knowledge-based approach - A case study in intensive care monitoring,” in Proc. 1999 Int. Conf. on Mach. Learning, pp. 268-277, 1999.
[12] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004, pp. 568-578.
[13] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” in Proc. 2001 IEEE Conf. on Comput. Vision and Pattern Recognition, vol. 1, pp. 511-518, 2001.
[14] T. Ojala, M. Pietikainen, and T. Maenpaa, “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, pp. 971-987, Jul. 2002.