
Relevance feedback based on non-negative matrix factorisation for image retrieval

D. Liang, J. Yang and Y. Chang

Abstract: Relevance feedback is a powerful tool for content-based image retrieval, and many relevance feedback techniques have been proposed. A relevance feedback approach based on non-negative matrix factorisation (NMF) is introduced. The approach uses a standard NMF algorithm to construct a reliable semantic space from a pool of relevant images gathered through the user’s interactions. NMF suits this task because the latent semantic space it derives need not be orthogonal, and each image is guaranteed to take only non-negative values along every latent semantic direction. As a result, each axis of the NMF-derived space corresponds directly to an image semantic class. In addition, the hidden semantic features of the query and of the images in the database are extracted with an NMF-projecting algorithm. By memorising the feedback provided by the user, the knowledge accumulated from past relevance interactions is used to update the semantic space, bringing it closer to the user’s expectation. Experiments show that the proposed NMF-based relevance feedback approach performs better than other relevance feedback approaches.

1 Introduction

Relevance feedback, initially developed for text retrieval [1], has been introduced into content-based image retrieval (CBIR) to narrow the gap between low-level features and high-level semantics. Most previous work on relevance feedback falls into three categories: query reweighting [2–4], query point movement [3, 5] and query expansion [6–8]. Query reweighting assigns different weights to different components of the query representation. Query point movement moves the query point towards the relevant points and away from the irrelevant ones. In both approaches, a query is represented by a single point in a feature space and nearest-neighbour sampling is used. Such nearest-neighbour sampling works well only if the initial query example is good and the query concept is convex in the feature space [7], an assumption that is unrealistic in practice. Query expansion can be regarded as a multiple-instance sampling approach: it does not assume that a query is represented by a single point in a multidimensional feature space, but refines the query by selectively adding new relevant samples to the query representation. The samples of each subsequent round are selected from the neighbourhood of the positive examples of the previous round. Porkaew et al. [6] showed, however, that query expansion does not provide a dramatic performance improvement over query point movement. All these approaches estimate and refine the ideal query from low-level features alone, without taking into account the actual semantics of the images. Users, however, pay more attention to the semantic content of an image than to the image data, and the correspondence between user-perceived semantic content and system-extracted low-level features is not one-to-one: similar low-level features may represent different semantic content, and similar semantic content may be associated with different sets of low-level features. Hence, low-level features alone may not represent the user’s intention effectively.

In recent years, much has been written about learning-based relevance feedback approaches [9–14]. Wu et al. [10] proposed a discriminant-EM algorithm that formulates image retrieval as a transductive learning problem in which both unlabelled and labelled samples are used in training. Because the EM algorithm is a greedy algorithm, this scheme is computationally expensive, especially when the database is large. A typical problem of CBIR systems with relevance feedback is the small number of training samples relative to the high dimension of the feature space. Given these characteristics, support vector machines (SVMs) have also been incorporated into relevance feedback [9, 11, 12] to learn a hyperplane, in a projected space, that separates the relevant images from the irrelevant ones. However, it is difficult to estimate the real distribution of negative images in the database from relevance feedback, because negative examples may belong to any semantic class. He et al. [13] used singular value decomposition (SVD) to construct a compact semantic space from a pool of relevant images based on the user’s interactions, and Su et al. [14] used principal component analysis (PCA) to extract and update a reduced subspace during the feedback process. However, the latent semantic space derived by SVD and PCA is orthogonal, which implies that all query topics are orthogonal.

© The Institution of Engineering and Technology 2006
IEE Proceedings online no. 20050168
doi:10.1049/ip-vis:20050168
Paper first received 17th June and in revised form 14th October 2005
The authors are with the Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200030, People’s Republic of China
E-mail: [email protected]
In practice, the high-level semantics in an image collection are often not completely independent of each other: they overlap, and different queries may involve common high-level semantics. In such a case, the axes of the semantic space that capture the individual semantics are not necessarily orthogonal. Moreover, human perception of an image cannot be obtained automatically from the image data; it must be combined with human knowledge, which requires experience accumulated from past interactions. However, most learning-based approaches share a common disadvantage: they do not memorise the relevance feedback provided by users in previous rounds, so the knowledge accumulated from past interactions is not used.

Recently, an unsupervised technique for finding a reduced representation of global data, non-negative matrix factorisation (NMF), was proposed by Lee and Seung [15]. Because the matrices produced by NMF contain only non-negative values, the original data are represented in the reduced space only by additive, not subtractive, combinations of non-orthogonal basis vectors. This property is appealing because it reflects the intuitive notion of combining parts to form a whole. In contrast, the basis vectors obtained by PCA and SVD are orthogonal, and representing the original data as linear combinations of them with positive and negative coefficients generally involves complex cancellations, which are at odds with physical reality. The motivation for introducing NMF into relevance feedback is that image data and features are non-negative, which makes NMF a suitable technique for the problem. Furthermore, the latent semantic space derived by NMF need not be orthogonal, and each image is guaranteed to take only non-negative values along every latent semantic direction. Consequently, each axis of the NMF-derived space corresponds much more directly to a semantic category than the axes of the spaces derived by SVD and PCA.

IEE Proc.-Vis. Image Signal Process., Vol. 153, No. 4, August 2006
Determining the class label of each image is as simple as finding the axis on which the image has the largest projection value [16]. In this paper, we propose an NMF-based relevance feedback approach for image retrieval. After the first retrieval, which is conducted in the low-level feature space, the user labels a pool of relevant images (positive examples), from which a semantic space is inferred by the standard NMF algorithm. The query image and each image in the database are then projected into this semantic space by the NMF-projecting algorithm to extract their semantic features. Finally, the images in the database are ranked by their distance to the refined query in the semantic space, and the top-ranked images are returned to the user for the next round of relevance feedback. In addition, in each round, the relevant images obtained from past retrieval interactions are accumulated to update the query and the derived semantic space, which gradually brings both closer to the user’s expectation. Our experiments show that the proposed approach performs better than other relevance feedback approaches.

2 Non-negative matrix factorisation

NMF is an effective method for decomposing a non-negative matrix (which makes it well suited to images) into two non-negative matrices. The basic idea of NMF is to find a reduced space that not only preserves much of the structure of the original data but also guarantees that both the low-dimensional basis and the accompanying encodings are non-negative. The non-negativity constraints lead to a parts-based representation because they allow only additive, not subtractive, combinations of basis vectors. When applied to images, the NMF algorithm finds a new reduced representation of the image data: a new image can be projected into the reduced space spanned by the NMF basis and handled through its projected coefficients. NMF is also advantageous for applications involving large matrices, because its computation is based on a simple iterative algorithm [16–20].

Let $x_1, x_2, \ldots, x_m \in \mathbb{R}^n$ be the feature vectors representing the images in the database. The image database is regarded as an $n \times m$ matrix $V$, each column of which is the $n$-dimensional non-negative feature vector of one of the $m$ examples in the original database. The goal of NMF is to find two new matrices that approximate the whole database as $V \approx WH$, or

$$V_{i\mu} \approx (WH)_{i\mu} = \sum_{a=1}^{r} W_{ia} H_{a\mu} \qquad (1)$$

The dimensions of $W$ and $H$ are $n \times r$ and $r \times m$, respectively. The product $WH$ can be regarded as a compressed form of the data in $V$, because the rank $r$ is usually chosen so that $(n + m)r < nm$. Each column of $W$ contains a basis vector. Each column of $H$ is called an encoding and is in one-to-one correspondence with an example in $V$. An encoding contains the coefficients with which an example is represented as a linear combination of basis vectors, and only additive combinations are allowed. To find an approximate factorisation $V \approx WH$, an objective function that quantifies the quality of the approximation must be defined, using some measure of distance between the original samples $V$ and their approximation $WH$. One useful objective function is based on the Kullback–Leibler divergence:

$$F = \sum_{i=1}^{n} \sum_{\mu=1}^{m} \left[ V_{i\mu} \log (WH)_{i\mu} - (WH)_{i\mu} \right] \qquad (2)$$

Maximising $F$ is equivalent to minimising the generalised Kullback–Leibler divergence between $V$ and $WH$, since the two differ only by terms that depend on $V$ alone.
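The link between objective (2) and the Kullback–Leibler divergence can be checked numerically: the generalised divergence $D(V \| WH)$ equals $\sum V \log V - \sum V - F$, so maximising $F$ minimises $D$. The minimal pure-Python sketch below evaluates both quantities on a toy example. Matrices are stored as lists of rows; the helper names (`matmul`, `objective_F`, `generalized_kl`, `v_only_term`) and the toy matrices are illustrative assumptions, and every entry of $WH$ is assumed to be strictly positive so that the logarithm is defined.

```python
import math


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def objective_F(V, W, H):
    """Objective (2): sum over i, mu of V_imu * log((WH)_imu) - (WH)_imu."""
    WH = matmul(W, H)
    return sum(v * math.log(wh) - wh
               for v_row, wh_row in zip(V, WH)
               for v, wh in zip(v_row, wh_row))


def generalized_kl(V, W, H):
    """Generalised KL divergence D(V || WH), taking 0 * log(0) as 0."""
    WH = matmul(W, H)
    total = 0.0
    for v_row, wh_row in zip(V, WH):
        for v, wh in zip(v_row, wh_row):
            total += (v * math.log(v / wh) if v > 0 else 0.0) - v + wh
    return total


def v_only_term(V):
    """sum V log V - sum V: the part of D(V || WH) independent of W and H."""
    return sum((v * math.log(v) if v > 0 else 0.0) - v for row in V for v in row)


V = [[1.0, 2.0], [3.0, 0.5]]
W = [[0.5, 0.2], [0.5, 0.8]]
H = [[1.0, 1.0], [2.0, 1.0]]
# D(V || WH) = v_only_term(V) - F, so maximising F minimises D.
print(generalized_kl(V, W, H), v_only_term(V) - objective_F(V, W, H))
```

Because `v_only_term(V)` does not involve $W$ or $H$, any update that increases $F$ decreases the divergence by the same amount.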

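An iterative approach for reaching a local maximum of this objective function is given by

$$W_{ia} \leftarrow W_{ia} \sum_{\mu} \frac{V_{i\mu}}{(WH)_{i\mu}} H_{a\mu}, \qquad W_{ia} \leftarrow \frac{W_{ia}}{\sum_{j} W_{ja}} \qquad (3)$$

$$H_{a\mu} \leftarrow H_{a\mu} \sum_{i} W_{ia} \frac{V_{i\mu}}{(WH)_{i\mu}} \qquad (4)$$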
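Update rules (3) and (4) translate directly into code. The sketch below is a minimal pure-Python illustration, not the authors’ implementation; the function name `nmf_kl`, the initialisation range (0.1, 1.0), the fixed seed and the small `eps` guard against division by zero are our assumptions. Because the columns of $W$ are normalised to sum to one before each $H$ update, the column sums of $WH$ match those of $V$ after every iteration, which gives a convenient sanity check.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def nmf_kl(V, r, maxiter=100, seed=0, eps=1e-12):
    """Factorise a non-negative n x m matrix V as V ~ W H with updates (3)-(4).

    W is n x r and H is r x m; eps guards against division by zero.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Positive random initial conditions for W and H.
    W = [[rng.uniform(0.1, 1.0) for _ in range(r)] for _ in range(n)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(r)]
    for _ in range(maxiter):
        # (3), first rule: W_ia <- W_ia * sum_mu V_imu / (WH)_imu * H_amu
        WH = matmul(W, H)
        R = [[V[i][mu] / (WH[i][mu] + eps) for mu in range(m)] for i in range(n)]
        W = [[W[i][a] * sum(R[i][mu] * H[a][mu] for mu in range(m))
              for a in range(r)] for i in range(n)]
        # (3), second rule: rescale every column of W to sum to one.
        col_sums = [sum(W[j][a] for j in range(n)) for a in range(r)]
        W = [[W[i][a] / col_sums[a] for a in range(r)] for i in range(n)]
        # (4): H_amu <- H_amu * sum_i W_ia * V_imu / (WH)_imu, using the new W
        WH = matmul(W, H)
        R = [[V[i][mu] / (WH[i][mu] + eps) for mu in range(m)] for i in range(n)]
        H = [[H[a][mu] * sum(W[i][a] * R[i][mu] for i in range(n))
              for mu in range(m)] for a in range(r)]
    return W, H


V = [[1.0, 0.0, 2.0, 1.0],
     [2.0, 1.0, 3.0, 1.0],
     [0.0, 1.0, 1.0, 0.5]]
W, H = nmf_kl(V, r=2, maxiter=200)
```

For real feature matrices a vectorised implementation would be preferable; the list-based form is used here only to keep each rule visible term by term.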

The NMF algorithm starts from positive random initial values for the matrices $W$ and $H$. The iterative process is easy to implement and inexpensive, since it involves only simple operations. Refer to Lee and Seung [21] for a detailed treatment of NMF and of different implementations of the NMF update rules. Besides non-negativity, another property of NMF is that the columns of $W$ tend to represent clusters of semantically related elements, as observed in the semantic analysis of a corpus of encyclopedia articles [15]. Thus, when NMF is applied to images, the basis matrix $W$ can be regarded as describing a hidden semantic space, and the matrix $H$ contains the semantic features of the samples in the original data matrix $V$.

To demonstrate the difference between NMF and other matrix factorisation algorithms such as SVD and PCA, we apply NMF, SVD and PCA to two data sets. The first data set consists of two clusters, each containing ten two-dimensional samples, so the original data matrix $V$ is $2 \times 20$. The number of basis vectors is two. Equation (5) shows the basis matrices generated from $V$ by NMF, SVD and PCA, respectively:

$$W_{\mathrm{NMF}} = \begin{bmatrix} 0.33544 & 0.6515 \\ 0.66546 & 0.3485 \end{bmatrix}, \quad U_{\mathrm{SVD}} = \begin{bmatrix} 0.74579 & 0.66618 \\ 0.66618 & 0.74579 \end{bmatrix}, \quad W_{\mathrm{PCA}} = \begin{bmatrix} 0.89855 & 0.43888 \\ 0.43888 & 0.89855 \end{bmatrix} \qquad (5)$$

We plot the data set and the directions found by NMF, SVD and PCA in Fig. 1. The figure shows that NMF does not require the derived low-dimensional space to be orthogonal, and that it guarantees that each sample takes only non-negative values in all directions. These characteristics bring some important benefits [16]. First, when clusters overlap, NMF can still find a latent semantic direction for each cluster, whereas the orthogonal latent semantic directions derived by SVD and PCA are less likely to correspond to individual clusters. Second, with NMF a sample is represented by an additive combination of the base latent semantics, which makes more sense in the image domain. Third, the cluster membership of each sample can be identified easily from NMF, whereas the latent semantic spaces derived by SVD and PCA give no direct indication of the data partition.

The second data set consists of three clusters of ten images each. Each image is represented by 100-dimensional low-level features, so the original data matrix $V$ is $100 \times 30$. The number of basis vectors is three. Fig. 2 plots the data set in the three-dimensional spaces derived by NMF, SVD and PCA; images belonging to the same semantic class are depicted by the same symbol (point, circle or triangle). In each of Figs. 2a, 2b and 2c, the top-left plot is for NMF, the top-right for SVD and the bottom for PCA. Fig. 2a plots the images in the planes $H_1$–$H_2$, $S_1$–$S_2$ and $C_1$–$C_2$; Fig. 2b in the planes $H_1$–$H_3$, $S_1$–$S_3$ and $C_1$–$C_3$; and Fig. 2c in the planes $H_2$–$H_3$, $S_2$–$S_3$ and $C_2$–$C_3$. Here $H_1$, $H_2$, $H_3$ are the three row vectors of $H$ obtained by NMF, $S_1$, $S_2$, $S_3$ are the first three singular vectors of the SVD, and $C_1$, $C_2$, $C_3$ are the three vectors of the coefficient matrix obtained by PCA. Fig. 2 supports the same observations as Fig. 1. First, in the NMF space each axis corresponds to a semantic class, and all the images belonging to the same class spread along the same axis, so the class label of each image is found by identifying the axis on which it has the largest projection value. Although the data points are also separated into clusters in the spaces derived by SVD and PCA, there is no direct relationship between the axes and the semantic classes. Second, each image is guaranteed to take only non-negative values along all the latent semantic directions in the NMF space, whereas an image may take negative values along some directions in the PCA and SVD spaces.

In our work, the role of the standard NMF algorithm is to infer a new reduced space from the sample data and to obtain the encoding matrix of the sample data in this space. The remaining problem is how to represent new vectors using a predefined set of basis vectors. For this we use the NMF-projecting algorithm [22], which projects new vectors into a predefined reduced space and obtains their coefficients. In the NMF-projecting algorithm, a matrix $V'$ is constructed from the new data vectors, and the same iterative algorithm is run without modifying the basis matrix $W$ (that is, only update (4) is applied), starting from a positive random matrix $H'$. When the iteration converges, we obtain the representation of the new data vectors in the reduced space as $V' \approx WH'$; the columns of $H'$ are the coefficients and are in one-to-one correspondence with the new examples in $V'$. The rank $r$ and the number of iterations maxiter in the NMF-projecting algorithm are the same as in the standard NMF algorithm.
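The NMF-projecting step amounts to running update (4) alone while $W$ is held fixed. The following is a minimal sketch under the assumption that the basis columns sum to one, as produced by (3); the function name `nmf_project` and the toy basis and vectors are hypothetical. Each column of $H'$ evolves independently, so a whole batch of new images can be projected at once.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def nmf_project(V_new, W, maxiter=100, seed=0, eps=1e-12):
    """NMF-projecting sketch: keep W fixed and iterate update (4) on H'."""
    rng = random.Random(seed)
    n, r, m = len(W), len(W[0]), len(V_new[0])
    # Positive random starting matrix H' (r x m).
    H = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(r)]
    for _ in range(maxiter):
        WH = matmul(W, H)
        H = [[H[a][mu] * sum(W[i][a] * V_new[i][mu] / (WH[i][mu] + eps)
                             for i in range(n))
              for mu in range(m)] for a in range(r)]
    return H


# Basis whose columns sum to one (as produced by update (3)).
W = [[0.5, 0.0],
     [0.5, 0.5],
     [0.0, 0.5]]
# Two new images: the first equals W @ [2, 4], the second W @ [1, 0].
V_new = [[1.0, 0.5],
         [3.0, 0.5],
         [2.0, 0.0]]
H_new = nmf_project(V_new, W)
```

Because both toy vectors lie in the cone spanned by the basis, the recovered coefficients approach $(2, 4)$ and $(1, 0)$ and never become negative.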

[Fig. 1 Directions found by different algorithms: a NMF, b SVD, c PCA]

[Fig. 2 Data distributions in the three derived spaces: a NMF, b SVD, c PCA]

3 NMF-based relevance feedback approach

In this section, we introduce our NMF-based relevance feedback approach in detail. Table 1 lists the symbols used in the following discussion.

3.1 First retrieval in low-level feature space

When an image Q is submitted as a query, the n-dimensional low-level features (colour, shape, texture, etc.) of Q and of each image in the database are extracted to conduct the first retrieval. Note that no semantic features are involved at this stage.

3.2 Inferring a semantic space by the standard NMF algorithm

Assuming there are u relevant images among the t returned images, the user selects these positive examples to construct the matrix $V^R$ (of size $n \times u$) under the vector space model [23]. In $V^R$, the $j$th sample is represented by the $n$-dimensional low-level feature vector $V^R_j = \{V^R_{j1}, V^R_{j2}, \ldots, V^R_{jn}\}^T$. The standard NMF algorithm is used to factorise $V^R$ into a basis matrix $W^R$ and an encoding matrix $H^R$. As described in Section 2, the basis matrix describes a hidden semantic space inferred from the positive examples, so the $j$th encoding $H^R_j = \{H^R_{j1}, H^R_{j2}, \ldots, H^R_{jr}\}^T$ is regarded as the semantic features of the $j$th sample in $V^R$.

3.3 Extracting the hidden semantic features of the images in the database

Assuming there are k images in the database besides the t returned images, represented by $V^P$ (of size $n \times k$), we project these k images into the semantic space $W^R$ with the NMF-projecting algorithm and obtain the encoding matrix $H^P$, whose columns are the semantic features of, and in one-to-one correspondence with, the samples in $V^P$.

3.4 Representing the user’s query with the semantic features of positive examples

Intuitively, the more information is given, the more precisely the semantic content of the query can be represented. The mean of the semantic features of the u positive examples is used to construct a new semantic feature vector $H^Q$, which both represents the semantics of the query and serves as the query vector in the next round. The semantic feature vector of the new query, $H^Q = \{H^Q_1, H^Q_2, \ldots, H^Q_r\}^T$, is

$$H^Q_i = \frac{1}{u} \sum_{j=1}^{u} H^R_{ij}, \qquad i = 1, \ldots, r \qquad (6)$$
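Equation (6) and the ranking rule of Section 3.5 can be sketched as follows. This is an illustration under stated assumptions: the paper writes only $d(H^Q, H^P_j)$, so Euclidean distance is our assumption, and the names `query_from_positives` and `rank_with_accumulation`, the image identifiers and the toy matrices are hypothetical.

```python
import math


def query_from_positives(H_R):
    """Equation (6): H^Q_i = (1/u) * sum_j H^R_ij for an r x u matrix H_R."""
    u = len(H_R[0])
    return [sum(row) / u for row in H_R]


def rank_with_accumulation(H_Q, H_P, positive_ids, database_ids):
    """Rank the k database images by distance to H_Q in the semantic space and
    put the u accumulated positive examples in front.

    H_P is r x k (one column per database image). Euclidean distance is an
    assumption; the paper only writes d(H^Q, H^P_j).
    """
    r, k = len(H_Q), len(database_ids)

    def distance(j):
        return math.sqrt(sum((H_Q[a] - H_P[a][j]) ** 2 for a in range(r)))

    order = sorted(range(k), key=distance)
    return list(positive_ids) + [database_ids[j] for j in order]


H_R = [[1.0, 3.0],   # r = 2 semantic dimensions, u = 2 positive examples
       [2.0, 2.0]]
H_Q = query_from_positives(H_R)  # [2.0, 2.0]
H_P = [[2.0, 0.0, 5.0],
       [2.0, 0.0, 5.0]]
ranked = rank_with_accumulation(H_Q, H_P, ["p1", "p2"], ["a", "b", "c"])
top_t = ranked[:4]  # the top t results shown to the user, here t = 4
```

Placing the positives first reproduces the behaviour described for Fig. 9, where previously confirmed relevant images always occupy the leading positions.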

3.5 Ranking and showing retrieval results

Before the distance $d(H^Q, H^P_j)$, $j = 1, \ldots, k$, between the query and each image in the database is calculated, we accumulate the u relevant images. After the images are ranked in the semantic space, these u relevant images are inserted at the front of the sorted list to form a new ranked list. Note that the first u images in the new top t results are exactly the u images used to infer the semantic space in Section 3.2. If the user is not satisfied with the new results, the feedback process is repeated until the returned results meet the user’s expectation. Besides the relevant images memorised before, there may be new relevant images among the returned images. The user can select all the relevant images to construct a new positive examples matrix, so the knowledge obtained from past retrieval interactions is used. The new basis matrix obtained by decomposing this matrix updates the semantic space, and the semantic features of the query and of all images in the database are updated accordingly. After re-ranking in the updated semantic space, a new set of top t images is returned to the user. Fig. 3 is the flowchart of our NMF-based relevance feedback approach.

Table 1: Some symbols and their definitions

| Symbol | Description |
|---|---|
| Q | the query image |
| n | the dimension of the low-level features |
| t | the number of returned images |
| u | the number of relevant images in the top t results |
| $V^R$ | the positive examples matrix constructed from the relevant images |
| $V^R_j = \{V^R_{j1}, V^R_{j2}, \ldots, V^R_{jn}\}^T$ | the $j$th sample in $V^R$ |
| $W^R$ | the basis matrix inferred from the positive examples |
| $H^R$ | the encoding matrix inferred from the positive examples |
| $H^R_j = \{H^R_{j1}, H^R_{j2}, \ldots, H^R_{jr}\}^T$ | the semantic features of $V^R_j$ |
| k | the number of images in the database besides the t returned results |
| $V^P$ | the matrix consisting of the images in the database |
| $H^P$ | the encoding matrix inferred from the database images |
| $H^Q = \{H^Q_1, H^Q_2, \ldots, H^Q_r\}^T$ | the semantic features of the query |
| $d(H^Q, H^P_j)$ | the distance between the query and each image in the database |

[Fig. 3 Flowchart of NMF-based relevance feedback approach]

4 Experimental results and discussion

4.1 Test data and features

We evaluate the performance of our image retrieval system integrated with the NMF-based relevance feedback approach. The image database consists of 4742 images collected from the Corel data set and the Internet. There are 80 semantic categories, each consisting of 15–100 images. Two types of colour features, one type of texture feature and one type of shape feature, listed in Table 2, are used in our system, giving a total feature dimension of 340 per image.

Table 2: The low-level features used in our system

| Feature | Description |
|---|---|
| Colour1 | sub-range accumulative histogram with quantisation 180 [24] |
| Colour2 | hue-saturation combined histogram with quantisation 100 [25] |
| Shape | wavelet modulus maxima with four levels and seven moments in each level |
| Texture | the first and second moments of the image filtered by Gabor wavelets with four levels and four orientations |
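Table 2 does not list per-feature dimensions, but the stated total of 340 is consistent with the following breakdown. The breakdown is our inference (one value per histogram bin, per moment per wavelet level, and per moment per Gabor filter), not a figure given in the paper.

```python
# Assumed per-feature dimensions for Table 2 (inferred, not stated in the paper):
# one value per histogram bin, per moment per wavelet level, and per moment
# per Gabor filter.
feature_dims = {
    "Colour1": 180,        # sub-range accumulative histogram, quantisation 180
    "Colour2": 100,        # hue-saturation combined histogram, quantisation 100
    "Shape": 4 * 7,        # 4 wavelet levels x 7 moments per level
    "Texture": 4 * 4 * 2,  # 4 levels x 4 orientations x 2 moments
}
total_dim = sum(feature_dims.values())
print(feature_dims, total_dim)
```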

4.2 Evaluation criteria

We use two criteria to evaluate the performance of the proposed approach. The first is the retrieval accuracy, defined as in He et al. [13]:

$$\text{Accuracy} = \frac{\text{relevant images retrieved in top } t}{t} \qquad (7)$$

The second is the average-retrieval-rank, defined as

$$\text{Average-retrieval-rank} = \frac{\sum_{i=1}^{N_R} p_i}{N_R} \qquad (8)$$

where $N_R$ is the number of images in the database relevant to the given query and $p_i$ is the position of the $i$th relevant image in the list ranked by distance to the query image. For example, suppose a query image has ten relevant images in a database of 500 images. In the best case, where the ten correct matches occupy the first ten positions, the average-retrieval-rank is $(1 + 2 + \cdots + 10)/10 = 5.5$; in the worst case, where they occupy the last ten positions, it is $(491 + 492 + \cdots + 500)/10 = 495.5$. Because it is computed over the whole ranked list, the average-retrieval-rank is independent of the number of returned images. For the same image database, the smaller the average-retrieval-rank, the more effective the retrieval system.
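Criteria (7) and (8) are straightforward to compute. The sketch below, with hypothetical function names, reproduces the 500-image worked example; it assumes `ranked_ids` orders the entire database, so that every relevant image has a position.

```python
def retrieval_accuracy(ranked_ids, relevant_ids, t):
    """Equation (7): fraction of the top t results that are relevant."""
    relevant = set(relevant_ids)
    return sum(1 for img in ranked_ids[:t] if img in relevant) / t


def average_retrieval_rank(ranked_ids, relevant_ids):
    """Equation (8): mean 1-based position p_i of the N_R relevant images.

    ranked_ids must rank the whole database, so every relevant image appears.
    """
    relevant = set(relevant_ids)
    positions = [p for p, img in enumerate(ranked_ids, start=1) if img in relevant]
    return sum(positions) / len(positions)


# Worked example from the text: 500 images, 10 of them relevant.
database = list(range(500))
best = average_retrieval_rank(database, relevant_ids=range(10))         # positions 1..10
worst = average_retrieval_rank(database, relevant_ids=range(490, 500))  # positions 491..500
print(best, worst)  # 5.5 495.5
```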

In our experiments, we randomly select 10% of the images in each semantic category as queries, and the reported retrieval accuracy is the average over all query images. For each query, the top 12 or top 20 images are returned to the user.

4.3 Parameter selection in the standard NMF algorithm

When the standard NMF algorithm is used to infer a semantic space from positive examples, two parameters must be determined: the rank r, which is the dimension of the semantic space, and maxiter, the number of iterations of the standard NMF algorithm. As described previously, the top t images are shown to the user in each round of interaction. If all the retrieved images are relevant, that is, u = t, no feedback is necessary. If only one relevant image exists in the returned results, that is, u = 1, it is not enough to construct a semantic space. The range of u for relevance feedback is therefore [2, t − 1]. The rank r should satisfy r < nu/(n + u). Since nu/(n + u) < u, r cannot exceed u − 1, which gives the range [1, u − 1] considered here; with n = 340, the upper end r = u − 1 meets the bound whenever u(u − 1) < n, that is, for u ≤ 18. We randomly select some images as queries to explore the system performance for different values of r in [1, u − 1]; in these experiments, u differs from round to round and from query to query. We find that the system performs best when r = u − 1. In addition, the retrieved results tend to be stable after 100 iterations. We therefore set r = u − 1 and maxiter = 100.
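The rank bound can be checked with integer arithmetic, since r < nu/(n + u) is equivalent to (n + u)r ≤ nu − 1 for integers. The sketch below (the helper name `max_admissible_rank` is ours) lists, for n = 340, the feedback rounds in which r = u − 1 would exceed the bound.

```python
def max_admissible_rank(n, u):
    """Largest integer r with (n + u) * r < n * u, i.e. r < n*u / (n + u)."""
    return (n * u - 1) // (n + u)


n = 340  # feature dimension used in the experiments
for t in (12, 20):
    for u in range(2, t):  # u ranges over [2, t - 1]
        r_max = max_admissible_rank(n, u)
        if r_max != u - 1:
            print(f"t={t}, u={u}: largest admissible r is {r_max}, not u - 1 = {u - 1}")
```

For t = 12 every round admits r = u − 1; for t = 20 the only exception is u = 19.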

the database are extracted using NMF-projecting algorithm. The second alternative approach is described in Feng et al. [20]. Figs. 4 and 5 show the average-retrieval-rank and average retrieval accuracy of different approaches for extracting hidden semantic features, respectively, as a function of the numbers of rounds of user’s feedback. As can be seen from the figures, our approach performs better than the other two approaches, although the reduced semantic space is same. The results indicate that the features extracted by NMF-projecting algorithm are more reliable to represent images in the semantic space than those extracted by matrix operation. In addition, the mean value of semantic features of the positive examples is more precise to represent the ideal query than only the semantic features of the original query image. 4.5 Effectiveness of NMF-based relevance feedback approach We compare NMF-based relevance feedback approach with three common relevance feedback approaches: SVD-based relevance feedback approach, SVM-based relevance feedback approach [11] and query reweighting approach [2]. In SVM-based approach, RBF kernel is used. The SVDbased approach, is similar to the NMF-based approach, but the SVD instead of NMF is used to infer a semantic space and the semantic features of each image in the database are extracted using matrix operation as in Feng et al. [20]. Fig. 6 shows average-retrieval-rank of four relevance feedback approaches, as a function of the number of rounds of user feedback. As can be seen, NMF-based approach performs best among four relevance feedback

4.4 Hidden semantic features of the query and images in the database The key problem in relevance feedback is to extract the hidden semantic features of the query and each image in the database. One common approach is introduced by Feng et al. [20] for face recognition, using NMF algorithm. In this approach, after obtaining basis matrix W, each face image xi is projected into the new reduced space to extract the semantic features with matrix operation hi ¼ W þxi , where W þ ¼ (WTW)21W T. A new face image xq is also projected into the reduced space with hq ¼ W þxq . However, there are not only positive components but also negative components in hi and hq . Because of the non-negative constraints of NMF, hi and hq which contain negative components are not suitable to represent the query image and each image in the database in the semantic space. In our work, we extract the semantic features of each image in the database, using NMF-projecting algorithm in order to follow non-negative constraints. For the hidden semantic features of the query, we use the mean value of the semantic features of the positive examples to represent the query as described by (6). When the user provides the feedback, the positive examples matrix is updated, which results in the update of not only the basis matrix but also the encoding matrix, which means the hidden semantic features of the query are refined with more relevant images and the query point moves to the user’s expectation. In order to evaluate the effectiveness of our proposed approach for extracting the hidden semantic features of the query and images in the database, we design experiments to compare our approach with two alternative approaches. In the first alternative approach, the query is represented by the semantic features of only the original query image, and the semantic features of each image in IEE Proc.-Vis. Image Signal Process., Vol. 153, No. 4, August 2006

Fig. 4 Average-retrieval-rank of semantic features extracted using different approaches

Fig. 5 Average retrieval accuracy of semantic features extracted using different approaches (in top 12 results) 441

Fig. 6 Average-retrieval-rank of four different relevance feedback approaches

approaches. It means that all the relevant images to the query occupy the anterior positions in NMF-based approach than in other approaches. In other words, for a given number of images returned, NMF-based approach can retrieve more relevant images than other approaches. In addition, SVDbased approach performs better than SVM-based approach and query reweighting approach. NMF-based and SVDbased approaches achieve their best performances with three iterations and keep stable in subsequent iterations, whereas SVM-based approach and query reweighting approach keep stable since the sixth iteration. It is obvious that relevance feedback conducted in reduced semantic space can reach a better performance with less number of feedback iterations than that conducted in original low-level feature space. In Figs. 7 and 8, average retrieval accuracy of four relevance feedback approaches in top 12 and 20 results are plotted against the number of feedback iterations. In the two figures, NMF-based approach still performs best among four relevance feedback approaches. It can be seen that more relevant images are retrieved for NMF-based approach, as the number of user’s feedbacks increases within three iterations and keep stable in subsequent iterations. However, the performance of all the relevance feedback approaches decreases when more images are returned. It is because the number of images in some semantic categories is less than 20, which means that for these categories, the best accuracy is less than 100% though all the relevant images are retrieved. In addition, we can find that

Fig. 8 Average retrieval accuracy of four different relevance feedback approaches (in top 20 results)

the decreasing degrees of four relevance feedback approaches are different and NMF-based approach decreases least. The reason is that the semantic space is inferred from more relevant images than before, which improves the representation power and reliability of the semantic space. It means NMF-based approach can retrieve more relevant images when top 20 images are returned than other approaches as described by the performance of average-retrieval-rank in Fig. 6. A typical retrieval process of our NMF-based relevance feedback approach over three iterations is given in Fig. 9. Fig. 9a is the original query image. Fig. 9b is the retrieval results without relevance feedback and eight relevant images are retrieved in top 12 results. Fig. 9c is the retrieval results after the first round of relevance feedback, in which we can see that eight relevant images obtained from the first retrieval results are accumulated and occupy the first eight positions in the current retrieval results. In addition, two new relevant images are retrieved and occupy the ninth and 12th positions, thus there are ten relevant images in the retrieval results after the first round of relevance feedback. Fig. 9d is the retrieval results after the second round of relevance feedback. In the same way, ten relevant images obtained from the past retrieval results occupy the first ten positions. A new relevant image is retrieved which occupy the 12th position. Fig. 9e is the retrieval results after the third round of relevance feedback. Eleven relevant images obtained from the past retrieval results are remembered in the current retrieval results. Although new relevant image is not retrieved, the non-relevant image in Fig. 9e looks more similar to the query in colour distribution than that in Fig. 9d. Note that, in each round of relevance feedback, the query is not the original query image in Fig. 9a but the mean value of the semantic features of positive examples. 5

Fig. 7 Average retrieval accuracy of four different relevance feedback approaches (in top 12 results)

5 Conclusion and future work

In this paper, we have described an NMF-based relevance feedback approach that uses the user’s feedback to enhance the performance of an image retrieval system. The proposed approach works in three steps:

1. It infers a semantic space from the user’s interactions with the standard NMF algorithm.
2. It extracts the semantic features of the query and of the images in the database in this space with the NMF-projecting algorithm.
3. It performs ranking and retrieval in the new semantic space and shows the retrieved results to the user, while the system waits for new feedback.

In addition, in each round, the relevant images obtained from past retrieval interactions are accumulated to update the derived semantic space. This brings the space gradually closer to the user’s expectation. Experimental results show that the proposed NMF-based relevance feedback approach performs better than the other relevance feedback approaches evaluated.

Fig. 9 Typical retrieval process of our proposed relevance feedback approach over three iterations
a Original query image
b Retrieval results without relevance feedback
c Retrieval results after the first round of relevance feedback
d Retrieval results after the second round of relevance feedback
e Retrieval results after the third round of relevance feedback

Much work remains to be done to improve the performance of the proposed relevance feedback approach. For example, one problem that may be encountered when applying NMF is random initialisation, because NMF is an
unsupervised technique. As NMF generally converges only to a local minimum, initialisation plays a key role in obtaining a ‘good’ solution. Entirely non-negative random initialisations, however, merely provide a random starting point for the search for a reduced representation of the global data. One direction of our future work is therefore to construct the initial matrices W and H so as to improve the stability and effectiveness of the NMF-based relevance feedback approach. Another is to combine latent semantic indexing (LSI) with relevance feedback. The first retrieval would be conducted with an NMF-based LSI approach (in which NMF is used as an alternative to SVD), and the semantic space would then be updated through relevance feedback. In addition, when the image collection is very large (for example, 10 000 images in the database), projecting all database images into the reduced space with the NMF-projecting algorithm may be time-consuming, even though NMF relies on a simple iterative algorithm. We therefore intend to generate prototype images with a clustering algorithm, each prototype representing one semantic category. After inferring a semantic space from the user’s interactions, we would project only the prototype images into that space. We would then identify the prototype closest to the query and return the images that belong to its semantic category.
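The conclusion relies on two NMF steps. The first infers a semantic space (basis W) from the pool of relevant images with the standard NMF algorithm. The second projects the query and database images into that space. The following minimal Python sketch rests on several assumptions, none of which is the paper’s own definition (those appear in its earlier sections):

- It uses the Euclidean-cost multiplicative updates of Lee and Seung [21].
- It assumes that NMF-projecting means keeping W fixed while iterating only the update of H.
- The function names, the seeded random initialisation, the iteration counts and `EPS` are hypothetical, illustrative choices.

```python
import random

EPS = 1e-9  # guards against division by zero in the multiplicative updates


def matmul(a, b):
    """Plain-Python matrix product of a (p x q) and b (q x s)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def update_h(w, v, h):
    """H <- H * (W^T V) / (W^T W H), element-wise, with W held fixed."""
    wt = transpose(w)
    num, den = matmul(wt, v), matmul(matmul(wt, w), h)
    return [[h[i][j] * num[i][j] / (den[i][j] + EPS) for j in range(len(h[0]))]
            for i in range(len(h))]


def update_w(w, v, h):
    """W <- W * (V H^T) / (W H H^T), element-wise, with H held fixed."""
    ht = transpose(h)
    num, den = matmul(v, ht), matmul(w, matmul(h, ht))
    return [[w[i][j] * num[i][j] / (den[i][j] + EPS) for j in range(len(w[0]))]
            for i in range(len(w))]


def random_nonneg(rows, cols, rng):
    # Strictly positive random start; the result depends on this choice.
    return [[rng.random() + EPS for _ in range(cols)] for _ in range(rows)]


def nmf(v, rank, iters=500, seed=0):
    """Factorise non-negative V (features x relevant images) as W H."""
    rng = random.Random(seed)
    w, h = random_nonneg(len(v), rank, rng), random_nonneg(rank, len(v[0]), rng)
    for _ in range(iters):
        h = update_h(w, v, h)
        w = update_w(w, v, h)
    return w, h


def project(w, y, iters=500, seed=1):
    """Semantic features of the columns of Y (query or database images), W fixed."""
    h = random_nonneg(len(w[0]), len(y[0]), random.Random(seed))
    for _ in range(iters):
        h = update_h(w, y, h)
    return h


def sq_error(v, w, h):
    """Squared Frobenius reconstruction error ||V - W H||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


# Toy pool: 3 feature dimensions x 4 relevant images (each column is one image).
pool = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 2.0], [1.0, 1.0, 3.0, 3.0]]
basis, coeffs = nmf(pool, rank=2)
query = [[0.5], [1.5], [2.0]]       # one image as a single column
print(project(basis, query))        # its 2-D semantic features
```

Because `project` updates every column of Y on each iteration, its cost grows with the number of images times the number of iterations. That is the cost the prototype-image idea above aims to reduce. The fixed seeds only make the sketch reproducible; they do not address the initialisation problem discussed above.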

6 Acknowledgment

This project was supported by the Key Technologies R&D Program of Shanghai (03DZ19320).

7 References

1 Salton, G.: ‘Automatic text processing’ (Addison-Wesley, Boston, 1989)
2 Rui, Y., Huang, T.S., Ortega, M., and Mehrotra, S.: ‘Relevance feedback: a power tool for interactive content-based image retrieval’, IEEE Trans. Circuits Syst. Video Technol., 1998, 8, (5), pp. 644–655
3 Ishikawa, Y., Subramanya, R., and Faloutsos, C.: ‘MindReader: query databases through multiple examples’. Proc. 24th Int. Conf. on Very Large Data Bases, New York, 1998 (Morgan Kaufmann), pp. 218–227
4 Rui, Y., and Huang, T.S.: ‘A novel relevance feedback technique in image retrieval’. Proc. 7th ACM Int. Conf. on Multimedia (Part 2), Orlando, Florida, 1999 (ACM Press), pp. 67–70
5 Rui, Y., Huang, T.S., Mehrotra, S., and Ortega, M.: ‘A relevance feedback architecture for content-based multimedia information retrieval systems’. Proc. IEEE Workshop Content-based Access of Image and Video Libraries, Puerto Rico, 1997 (IEEE Computer Society), pp. 82–89
6 Porkaew, K., Mehrotra, S., and Ortega, M.: ‘Query reformulation for content based multimedia retrieval in MARS’. IEEE Int. Conf. on Multimedia Computing and Systems, ICMCS, Florence, 1999 (IEEE Computer Society), pp. 747–751
7 Wu, L., Faloutsos, C., Sycara, K., and Payne, T.R.: ‘FALCON: feedback adaptive loop for content-based retrieval’. Proc. 26th Int. Conf. on Very Large Data Bases, Cairo, Egypt, 2000 (Morgan Kaufmann), pp. 297–306
8 Porkaew, K., and Chakrabarti, K.: ‘Query refinement in multimedia similarity retrieval in MARS’. The 7th ACM Int. Multimedia Conf., Orlando, Florida, 1999 (ACM Press), vol. 1, pp. 235–238


9 Tong, S., and Chang, E.: ‘Support vector machine active learning for image retrieval’. Proc. 9th ACM Int. Conf. on Multimedia, Ottawa, Canada, 2001 (ACM Press), pp. 107–118
10 Wu, Y., Tian, Q., and Huang, T.S.: ‘Discriminant-EM algorithm with application to image retrieval’. Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Hilton Head Island, SC, 2000 (IEEE Computer Society), vol. 1, pp. 222–227
11 Lei, Z., Fuzong, L., and Bo, Z.: ‘Support vector machines based relevance feedback algorithm in image retrieval’, J. Tsinghua Univ. Sci. Technol., 2002, 42, (1), pp. 80–83
12 Jing, F., Li, M., Zhang, H.-J., and Zhang, B.: ‘Support vector machines for region-based image retrieval’. Int. Conf. Multimedia and Expo, Baltimore, Maryland, 2003 (IEEE Computer Society), vol. 2, pp. 21–24
13 He, X., King, O., Ma, W.-Y., Li, M., and Zhang, H.-J.: ‘Learning a semantic space from user’s relevance feedback for image retrieval’, IEEE Trans. Circuits Syst. Video Technol., 2003, 13, (1), pp. 39–48
14 Su, Z., Zhang, H., Li, S., and Ma, S.: ‘Relevance feedback in content-based image retrieval: Bayesian framework, feature subspaces, and progressive learning’, IEEE Trans. Image Process., 2003, 12, (8), pp. 924–937
15 Lee, D.D., and Seung, H.S.: ‘Learning the parts of objects by non-negative matrix factorization’, Nature, 1999, 401, pp. 788–791
16 Wei, X., Xin, L., and Yihong, G.: ‘Document clustering based on non-negative matrix factorization’. SIGIR Forum (ACM Special Interest Group on Information Retrieval), Toronto, Canada, 2003 (ACM Press), pp. 267–273
17 Tsuge, S., Shishibori, M., Kuroiwa, S., and Kita, K.: ‘Dimensionality reduction using non-negative matrix factorization for information retrieval’. IEEE Int. Conf. on Systems, Man, and Cybernetics, Arizona, USA, 2001 (IEEE Computer Society), vol. 2, pp. 960–965
18 Guillamet, D., Schiele, B., and Vitria, J.: ‘Analyzing non-negative matrix factorization for image classification’. 16th Int. Conf. on Pattern Recognition, Quebec, Canada, 2002 (IEEE Computer Society), vol. 2, pp. 116–119
19 Foon, N.H., Jin, A.T.B., and Ling, D.N.C.: ‘Face recognition using wavelet transform and non-negative matrix factorization’ (2004), LNAI 3339, pp. 192–202
20 Feng, T., Li, S.Z., Shum, H.-Y., and Zhang, H.J.: ‘Local non-negative matrix factorization as a visual representation’. Proc. 2nd Int. Conf. on Development and Learning (ICDL’02), Massachusetts, USA, 2002 (IEEE Computer Society), pp. 1–6
21 Lee, D.D., and Seung, H.S.: ‘Algorithms for non-negative matrix factorization’. ‘Advances in neural information processing systems 13’ (Proc. NIPS 2000), 2001 (MIT Press), pp. 556–562
22 Guillamet, D., and Vitria, J.: ‘Discriminant basis for object classification’. 11th Int. Conf. on Image Analysis and Processing, Palermo, 2001 (IEEE Computer Society), pp. 256–261
23 Salton, G., and McGill, M.: ‘Introduction to modern information retrieval’ (McGraw-Hill, New York, 1983)
24 Zhang, Y.J., Liu, Z.W., and He, Y.: ‘Color-based image retrieval using sub-range cumulative histogram’, High Technol. Lett., 1998, 4, (2), pp. 71–75
25 Zhao, R., and Grosky, W.I.: ‘Negotiating the semantic gap: from feature maps to semantic landscapes’, Pattern Recognit., 2002, 35, (3), pp. 593–600

IEE Proc.-Vis. Image Signal Process., Vol. 153, No. 4, August 2006
