
Social Image Search with Diverse Relevance Ranking

Kuiyuan Yang1, Meng Wang2, Xian-Sheng Hua2, and Hong-Jiang Zhang3

1 University of Science and Technology of China, Hefei, Anhui, 230027 China
[email protected]
2 Microsoft Research Asia, 49 Zhichun Road, Beijing, 100080 China
{mengwang,xshua}@microsoft.com
3 Microsoft Adv. Tech. Center, 49 Zhichun Road, Beijing, 100080 China
[email protected]

Abstract. Recent years have witnessed the success of many online social media websites, which allow users to create and share media information as well as describe the media content with tags. However, the existing ranking approaches for tag-based image search frequently return results that are irrelevant or lack diversity. This paper proposes a diverse relevance ranking scheme that takes relevance and diversity into account simultaneously. It takes advantage of both the content of images and their associated tags. First, it estimates the relevance scores of images with respect to the query term based on both the visual information of images and the semantic information of associated tags. Then it mines the semantic similarities of social images based on their tags. With the relevance scores and the similarities, the ranking list is generated by a greedy ordering algorithm which optimizes Average Diverse Precision (ADP), a novel measure that is extended from the conventional Average Precision (AP). Comprehensive experiments and user studies demonstrate the effectiveness of the approach.

Keywords: Image search, diversity.

1 Introduction

There is an explosion of community-contributed multimedia content available online, on sites such as YouTube, Flickr and Zooomr. Such media repositories encourage users to collaboratively create, evaluate and distribute media information. They also allow users to annotate their uploaded media data with descriptive keywords called tags, which can greatly facilitate the organization of the social media. However, performing search on large-scale social media data is not an easy task.

This work was performed when Kuiyuan Yang was visiting Microsoft Research Asia as a research intern.

S. Boll et al. (Eds.): MMM 2010, LNCS 5916, pp. 174–184, 2010. © Springer-Verlag Berlin Heidelberg 2010


Commonly used tag-based image search is a straightforward approach, which returns the images annotated with a specific query tag. Currently, Flickr provides two ranking options for tag-based image search. One is “most recent”, which orders images based on their uploading time, and the other is “most interesting”, which ranks the images by “interestingness”, a measure that integrates the information of click-through, comments, etc. Both options rank images according to measures other than relevance (interestingness or time) and thus may bring many irrelevant images into the returned ranking lists.

In addition to relevance, lack of diversity is also a problem. Many images on social media websites are actually close to each other. For example, some users upload continuously captured images in batches, and many of these images are visually and semantically close. When these images appear simultaneously in the top results, users get only limited information. Therefore, a ranking scheme that can simultaneously generate relevant and diverse results is highly desired.

In this work we propose a Diverse Relevance Ranking (DRR) scheme for social image search. It ranks the images based on their relevance levels with respect to the query tag while simultaneously considering the diversity of the ranking list. The scheme works as follows. First, we estimate the relevance score of each image with respect to the query term as well as the semantic similarity of each image pair. The relevance estimation incorporates both the visual information of images and the semantic information of their associated tags into an optimization framework, and the semantic similarity is mined based on the associated tags of images. With the estimated relevance scores and similarities, we then implement the DRR algorithm, which can be viewed as a greedy ordering algorithm that optimizes Average Diverse Precision (ADP), a novel measure that is extended from the conventional Average Precision (AP). Different from AP, which only considers relevance, ADP further takes diversity into account, and thus the derived DRR algorithm can generate results that are both relevant and diverse.

The main contributions of this paper can be summarized as follows: (1) We propose a diverse relevance ranking scheme for social image search, which is complementary to the existing ranking approaches. (2) We propose a method to estimate the relevance scores of images with respect to a query tag. It leverages both the visual information of images and the semantic information of tags. (3) We extend the conventional AP measure to ADP to take diversity into account, and then derive a greedy ordering algorithm accordingly that balances relevance and diversity.

The rest of this paper is organized as follows. In Section 2, we provide a short review of the related work. In Section 3, we introduce the diverse relevance ranking approach. In Section 4, we detail the relevance and semantic similarity estimation of social images. The empirical study is presented in Section 5. Finally, we conclude the paper in Section 6.

2 Related Work

The last decade has witnessed great advances in image search technology [10,5,17]. Different from general web images, social images are usually associated with a set of user-provided descriptors called tags, and thus tag-based search can be easily accomplished by using the descriptors as index terms. But user-provided tags are usually very noisy [8,11,14], and this often makes search results unsatisfactory. In comparison with the extensive studies on how to help users better perform tagging or on leveraging tags as an information source in other applications, the literature regarding tag-based image search is still very sparse. Li et al. have proposed a tag relevance learning method which is able to assign each tag a relevance score, and they have shown its application in tag-based image search [11]. But they simply adopt a visual neighborhood voting method, and the semantic information of tags is not utilized. Their method also cannot deal with the aforementioned lack-of-diversity problem.

It has long been acknowledged that diversity plays an important role in information retrieval. In 1964, Goffman recognized that the relevance of a document must be determined with respect to the documents appearing before it [4]. Carbonell et al. propose a ranking method named Maximal Marginal Relevance (MMR), which attempts to maximize relevance while minimizing similarity to higher ranked documents [2]. Zhai et al. propose a subtopic search method, which aims to return results that cover more subtopics [19]. Many related works can be found in [19] and the references therein.

The diversity problem is actually more challenging in image search, as it involves not only the semantic ambiguity of queries but also the visual similarity of search results [9,16]. Currently there are two typical approaches to enhancing the diversity in image search: search results clustering [1,7,12,9] and duplicate removal [6,15,13]. The clustering and duplicate elimination techniques are both useful, but they have their limitations due to the involved heuristics. Both methods accomplish diversification by removing several images from the ranking list (near-duplicates or images that are not the representatives of clusters), and this introduces a dilemma. If we adopt too many clusters or a small threshold for near-duplicate detection, then the diversity of search results cannot be guaranteed; conversely, if clusters are too few or we set a large threshold for near-duplicate detection, many informative images will be removed.

The diverse relevance ranking scheme proposed in this work adopts a different approach. We rank all images and keep the top results diverse. Therefore, users will not miss information since we do not remove any image, and the relevance and diversity of top results can still both be kept.

3 Diverse Relevance Ranking

We introduce the DRR approach in this section. Here we present it as a general ranking algorithm and leave the two flexible components, i.e., the relevance score and similarity estimation of images, to the next section. We first prove that ranking by relevance scores can be viewed as optimizing the mathematical expectation of the conventional Average Precision (AP) measure. Then we analyze the limitation of AP and generalize it to an Average Diverse Precision (ADP) measure to take diversity into account. The DRR algorithm is then derived by greedily optimizing the mathematical expectation of the ADP measure.

3.1 Average Precision

AP is a widely applied performance evaluation measure in information retrieval. Given a collection of images $D = \{x_1, x_2, \ldots, x_n\}$, denote by $y(x_i)$ the binary relevance label of $x_i$ with respect to the given query, i.e., $y(x_i) = 1$ if $x_i$ is relevant and otherwise $y(x_i) = 0$. Denote by $\tau$ an ordering of the images, and let $\tau(i)$ be the image at the position of rank $i$ (a lower number means a higher ranked image). Let $R$ be the number of true relevant images in the set $D$. Then the non-interpolated AP is defined as

$$
AP(\tau, D) = \frac{1}{R} \sum_{j=1}^{n} y(\tau(j)) \, \frac{\sum_{k=1}^{j} y(\tau(k))}{j} \tag{1}
$$

Obviously, ranking images by their relevance scores in decreasing order is the most intuitive approach if we do not consider other factors. Now we prove that the ranking list generated in this way actually maximizes the mathematical expectation of the AP measurement. Denote by $r(x_i)$ the relevance score of $x_i$ (how to estimate it will be introduced in the next section), and it is reasonable for us to assume that $r(x_i) = P(y(x_i) = 1)$, i.e., we regard the relevance score $r(x_i)$ as the probability that $x_i$ is relevant. Since $R$ can be regarded as a constant, we do not take it into account in the expectation estimation. We also assume that the relevance of an image is independent of that of the other images, and hence the expected value of $AP(\tau, D)$ can be computed as follows

$$
\begin{aligned}
E\{AP(\tau, D)\} &= \frac{1}{R} \sum_{j=1}^{n} \frac{1}{j} \sum_{k=1}^{j} E\{y(\tau(k))\, y(\tau(j))\} \\
&= \frac{1}{R} \sum_{j=1}^{n} \frac{1}{j} \left( r(\tau(j)) + \sum_{k=1}^{j-1} r(\tau(k))\, r(\tau(j)) \right)
\end{aligned} \tag{2}
$$
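As an illustration only, Eq. 1 and Eq. 2 can be sketched in Python as follows. The function names `average_precision` and `expected_ap` are hypothetical and do not come from the paper; returning 0 when no image is relevant is likewise an assumption of this sketch.

```python
from itertools import permutations


def average_precision(labels):
    """Non-interpolated AP (Eq. 1) for binary labels listed in rank order."""
    num_relevant = sum(labels)  # R in Eq. 1
    if num_relevant == 0:
        return 0.0  # assumption of this sketch: AP is 0 when nothing is relevant
    hits, total = 0, 0.0
    for j, y in enumerate(labels, start=1):
        hits += y  # running sum of y(tau(k)) for k <= j
        if y:
            total += hits / j
    return total / num_relevant


def expected_ap(scores, num_relevant=1.0):
    """Expected AP (Eq. 2) for relevance probabilities listed in rank order.

    R is treated as a constant, as in the text.
    """
    total, prefix = 0.0, 0.0
    for j, r in enumerate(scores, start=1):
        total += r * (1.0 + prefix) / j  # r(j) + sum_{k<j} r(k) r(j)
        prefix += r
    return total / num_relevant


# Brute-force search over all orderings of three hypothetical scores.
scores = (0.2, 0.9, 0.5)
best_order = max(permutations(scores), key=expected_ap)
```

On such a small example the brute-force search can be compared directly against sorting the scores in non-increasing order.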

Then we have the following theorem:

Theorem 1. Ranking the images in $D$ by relevance scores $r(x_i)$ in non-increasing order maximizes $E\{AP(\tau, D)\}$.

Proof. Denote by $\tau^*$ the ranking of images in $D$ with relevance scores in non-increasing order, i.e., $r(\tau^*(i)) \ge r(\tau^*(i+1))$. Then we only need to prove $E\{AP(\tau^*, D)\} \ge E\{AP(\tau, D)\}$ for every possible $\tau$. Without loss of generality, we consider an ordering $\tau$ that exchanges the images at the positions of rank $i$ and $i+1$ in $\tau^*$, i.e., $\tau(i) = \tau^*(i+1)$ and $\tau(i+1) = \tau^*(i)$, since any change to $\tau^*$ can be decomposed into a series of such adjacent exchanges. So our task is simplified to proving that $\Delta = E\{AP(\tau^*, D)\} - E\{AP(\tau, D)\} \ge 0$. For simplicity, we denote $r_i = r(\tau^*(i))$ and $r'_i = r(\tau(i))$. Since $r'_i = r_{i+1}$, $r'_{i+1} = r_i$, and $r'_k = r_k$ when $k \ne i$ and $k \ne i+1$, the terms with $j \ne i$ and $j \ne i+1$ cancel, and we have

$$
\begin{aligned}
\Delta &= E\{AP(\tau^*, D)\} - E\{AP(\tau, D)\} \\
&= \frac{1}{R} \left( \sum_{\substack{1 \le j \le n \\ j \ne i,\, j \ne i+1}} \frac{r_j + \sum_{k=1}^{j-1} r_k r_j}{j} + \frac{r_i + \sum_{k=1}^{i-1} r_k r_i}{i} + \frac{r_{i+1} + \sum_{k=1}^{i} r_k r_{i+1}}{i+1} \right) \\
&\quad - \frac{1}{R} \left( \sum_{\substack{1 \le j \le n \\ j \ne i,\, j \ne i+1}} \frac{r'_j + \sum_{k=1}^{j-1} r'_k r'_j}{j} + \frac{r'_i + \sum_{k=1}^{i-1} r'_k r'_i}{i} + \frac{r'_{i+1} + \sum_{k=1}^{i} r'_k r'_{i+1}}{i+1} \right) \\
&= \frac{1}{R} \left( \frac{r_i - r_{i+1} + \sum_{k=1}^{i-1} r_k (r_i - r_{i+1})}{i} - \frac{r_i - r_{i+1} + \sum_{k=1}^{i-1} r_k (r_i - r_{i+1})}{i+1} \right) \\
&= \frac{1}{R} \left( 1 + \sum_{k=1}^{i-1} r_k \right) (r_i - r_{i+1}) \left( \frac{1}{i} - \frac{1}{i+1} \right)
\end{aligned} \tag{3}
$$

Since $r_i \ge r_{i+1}$, we have $\Delta \ge 0$, i.e., $E\{AP(\tau^*, D)\} \ge E\{AP(\tau, D)\}$. The same computation applies to an adjacent pair in any ordering, so every exchange that moves a higher-scored image ahead of a lower-scored one does not decrease the expectation, which completes the proof.

This proof demonstrates that adopting the AP performance evaluation measure will prioritize images with high relevance. However, the measure may not be consistent with users’ experience because it neglects diversity. Therefore, the AP measure can be enhanced by considering diversity.

3.2 Average Diverse Precision

Here we generalize the existing AP measure to Average Diverse Precision (ADP) to take diversity into account, which is defined as

$$
ADP(\tau, D) = \frac{1}{R} \sum_{j=1}^{n} y(\tau(j))\, Div(\tau(j)) \, \frac{\sum_{k=1}^{j} y(\tau(k))\, Div(\tau(k))}{j} \tag{4}
$$

where $Div(\tau(k))$ indicates the diversity score of $\tau(k)$. We define $Div(\tau(k))$ as its minimal difference from the images appearing before it, i.e.,

$$
Div(\tau(k)) = \min_{1 \le t < k} \left( 1 - s(\tau(t), \tau(k)) \right) \tag{5}
$$

where $s(\cdot, \cdot)$ is a similarity measure between two images. It is worth noting that it need not be visual similarity. Actually, in the next section we will introduce a semantic similarity for social images. Comparing the definitions of AP and ADP (see Eq. 1 and Eq. 4), we can see that the only difference is that we have changed $y(\tau(k))$ to $y(\tau(k))\, Div(\tau(k))$. For an image in the ranking list, its contribution to the ADP measure is determined not only by its relevance with respect to the query but also by its difference from the images appearing before it. If an image is identical to one of the previously appearing images, it will contribute zero to the ADP measurement. Thus the ADP measure takes both relevance and diversity into account. Denote by $\tau^*$ the optimal ranking list under the ADP performance evaluation measure, i.e., the list that achieves the highest ADP measurement. We can prove that $y(\tau^*(i))\, Div(\tau^*(i)) \ge y(\tau^*(j))\, Div(\tau^*(j))$ for any $i < j$. This indicates that the top images will tend to be more relevant and diverse. Here we omit the proof since it is analogous to that of Theorem 1.

3.3 Diverse Relevance Ranking

The DRR algorithm is actually a greedy approach to optimizing the expected value of the ADP measurement. Analogous to AP, we can estimate the expected value of ADP as

$$
\begin{aligned}
E\{ADP(\tau, D)\} &= \frac{1}{R} \sum_{j=1}^{n} \frac{1}{j} \sum_{k=1}^{j} E\{y(\tau(k))\, y(\tau(j))\, Div(\tau(k))\, Div(\tau(j))\} \\
&= \frac{1}{R} \sum_{j=1}^{n} r(\tau(j))\, Div(\tau(j)) \, \frac{Div(\tau(j)) + \sum_{k=1}^{j-1} r(\tau(k))\, Div(\tau(k))}{j}
\end{aligned} \tag{6}
$$
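For concreteness, the ADP measure of Eq. 4 together with the diversity score of Eq. 5 can be sketched as below. This is only a minimal sketch: the names `diversity_scores` and `average_diverse_precision` are hypothetical, and because Eq. 5 leaves the first-ranked image without any predecessor, the sketch assumes its diversity score is 1.

```python
def diversity_scores(order, sim):
    """Div of Eq. 5 for every position of `order` (image ids in rank order).

    `sim(a, b)` is any similarity in [0, 1] between two images.
    """
    div = []
    for k, x in enumerate(order):
        if k == 0:
            div.append(1.0)  # assumption: nothing precedes the first image
        else:
            div.append(min(1.0 - sim(prev, x) for prev in order[:k]))
    return div


def average_diverse_precision(order, labels, sim):
    """ADP of Eq. 4; `labels` maps an image id to its binary relevance."""
    num_relevant = sum(labels[x] for x in order)  # R in Eq. 4
    if num_relevant == 0:
        return 0.0  # assumption of this sketch, mirroring the usual AP convention
    div = diversity_scores(order, sim)
    total, prefix = 0.0, 0.0
    for j, x in enumerate(order, start=1):
        gain = labels[x] * div[j - 1]  # y(tau(j)) * Div(tau(j))
        prefix += gain                 # sum over k <= j
        total += gain * prefix / j
    return total / num_relevant
```

With a similarity that is zero everywhere, every diversity score is 1 and the function reduces to AP; an image identical to an earlier one contributes nothing, as the text describes.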

The direct optimization of $E\{ADP(\tau, D)\}$ is a permutation problem whose solution space is of size $O(n!)$. Thus here we propose a greedy method to solve it. Assuming that the top $i-1$ images have been established, based on Eq. 6 we can derive that the $i$-th image should be decided as follows

$$
\tau(i) = \arg\max_{x \in D - S_i} \frac{r(x)\, Div(x)\, (C + Div(x))}{i} \tag{7}
$$

where $Div(x)$ is computed with respect to the images in $S_i$, and

$$
S_i = \{\tau(1), \tau(2), \ldots, \tau(i-1)\} \tag{8}
$$

$$
C = \sum_{k=1}^{i-1} r(\tau(k))\, Div(\tau(k)) \tag{9}
$$
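The greedy selection of Eq. 7 to Eq. 9 can be sketched as follows. This is a sketch under stated assumptions rather than the authors' implementation: the function name `drr_rank` is hypothetical, the diversity score of every candidate is assumed to be 1 before any image has been ranked, and ties are broken by input order.

```python
def drr_rank(images, relevance, sim):
    """Greedy ordering following Eq. 7-9.

    images:    list of image ids
    relevance: dict mapping id -> r(x), assumed to lie in (0, 1]
    sim:       function sim(a, b) returning a similarity in [0, 1]
    """
    remaining = list(images)
    ranked = []
    c = 0.0  # C of Eq. 9: sum of r * Div over the images ranked so far
    # Div(x) with respect to the ranked set S_i, maintained incrementally.
    div = {x: 1.0 for x in remaining}  # assumption: Div = 1 while S_i is empty
    while remaining:
        # The 1/i factor of Eq. 7 is the same for all candidates at step i,
        # so it does not change the arg max and is omitted.
        best = max(remaining, key=lambda x: relevance[x] * div[x] * (c + div[x]))
        ranked.append(best)
        remaining.remove(best)
        c += relevance[best] * div[best]
        for x in remaining:
            div[x] = min(div[x], 1.0 - sim(best, x))
    return ranked
```

Maintaining the diversity scores incrementally keeps the number of similarity evaluations quadratic in the number of images, instead of recomputing the minimum of Eq. 5 from scratch at every step.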

4 Relevance and Similarity of Social Images

In this section, we introduce the estimation of relevance scores and similarities of social images, which are the two necessary components of the DRR algorithm. The following notations will be used. Given a query tag $t_q$, denote by $D = \{x_1, x_2, \ldots, x_n\}$ the collection of images that are associated with the tag. For image $x_i$, denote by $T_i = \{t_{i1}, t_{i2}, \ldots, t_{i|T_i|}\}$ the set of its associated tags. The relevance scores of all images in $D$ are represented in a vector $\mathbf{r} = [r(x_1), r(x_2), \ldots, r(x_n)]^T$, whose element $r(x_i) > 0$ denotes the relevance score of image $x_i$ with respect to query tag $t_q$. Denote by $W$ a similarity matrix whose element $W_{ij}$ indicates the visual similarity between images $x_i$ and $x_j$.

4.1 Relevance Estimation

Our relevance estimation approach leverages both the visual information of images and the semantic information of tags. Our first assumption is that the relevance of an image should depend on the “closeness” of its tags to the query tag. Thus we first have to define the similarity of tags. Different from images, which can be represented as sets of low-level features, tags are textual words and their similarity exists only in semantics. Recently, several works have aimed to address this issue [3,18]. Here we adopt an approach that is analogous to Google distance [3], in which the similarity between tags $t_i$ and $t_j$ is defined as

$$
sim(t_i, t_j) = \exp\left( - \frac{\max(\log c(t_i), \log c(t_j)) - \log c(t_i, t_j)}{\log M - \min(\log c(t_i), \log c(t_j))} \right) \tag{10}
$$

where $c(t_i)$ and $c(t_j)$ are the numbers of images associated with $t_i$ and $t_j$ on Flickr respectively, $c(t_i, t_j)$ is the number of images associated with both $t_i$ and $t_j$ simultaneously, and $M$ is the total number of images on Flickr. Therefore, the similarity of the query tag $t_q$ and the tag set of image $x_i$ can be computed as

$$
sim(t_q, T_i) = \frac{1}{|T_i|} \sum_{t \in T_i} sim(t_q, t) \tag{11}
$$
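A minimal sketch of Eq. 10 and Eq. 11 follows. The function names, the `count` dictionary and the `pair_count` callback are hypothetical stand-ins for whatever source provides the Flickr tag statistics; the sketch also assumes every tag count is positive and smaller than $M$, and it returns 0 for tags that never co-occur, which is the limit of Eq. 10 as $c(t_i, t_j)$ goes to 0.

```python
import math


def tag_similarity(c_i, c_j, c_ij, total_images):
    """Eq. 10 from raw counts.

    c_i, c_j:     number of images carrying each tag (assumed > 0 and < M)
    c_ij:         number of images carrying both tags
    total_images: M, the total number of images
    """
    if c_ij == 0:
        return 0.0  # limit of Eq. 10 when the tags never co-occur
    numerator = max(math.log(c_i), math.log(c_j)) - math.log(c_ij)
    denominator = math.log(total_images) - min(math.log(c_i), math.log(c_j))
    return math.exp(-numerator / denominator)


def query_tagset_similarity(query, tags, count, pair_count, total_images):
    """Eq. 11: mean similarity between the query tag and an image's tags.

    count:      dict mapping tag -> number of images carrying it
    pair_count: function (tag_a, tag_b) -> co-occurrence count
    """
    sims = [
        tag_similarity(count[query], count[t], pair_count(query, t), total_images)
        for t in tags
    ]
    return sum(sims) / len(sims)
```

A tag compared with itself co-occurs on all of its images, so its similarity to itself is 1 under this definition.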

Our second assumption is that the relevance scores of visually similar images should be close. The visual similarity between two images can be directly computed based on a Gaussian kernel function with a radius parameter $\sigma$, i.e.,

$$
W_{ij} = \exp\left( - \frac{\|x_i - x_j\|^2}{\sigma^2} \right) \tag{12}
$$

Note that this assumption may not hold for some images, but it is still reasonable in most cases. Based on the two assumptions, we formulate a regularization framework as follows

$$
Q(\mathbf{r}) = \sum_{i,j=1}^{n} W_{ij} \left( \frac{r(x_i)}{\sqrt{D_{ii}}} - \frac{r(x_j)}{\sqrt{D_{jj}}} \right)^2 + \lambda \sum_{i=1}^{n} \left( r(x_i) - sim(t_q, T_i) \right)^2, \qquad \mathbf{r}^* = \arg\min_{\mathbf{r}} Q(\mathbf{r}) \tag{13}
$$

where $r(x_i)$ is the relevance score of $x_i$, and $D_{ii} = \sum_{j=1}^{n} W_{ij}$. Up to a constant factor of 2 on the first term, which can be absorbed into $\lambda$, the above equation can be written in matrix form as

$$
Q(\mathbf{r}) = \mathbf{r}^T \left( I - D^{-1/2} W D^{-1/2} \right) \mathbf{r} + \lambda \|\mathbf{r} - \mathbf{v}\|^2 \tag{14}
$$

where $D = \mathrm{Diag}(D_{11}, D_{22}, \ldots, D_{nn})$ and $\mathbf{v} = [sim(t_q, T_1), sim(t_q, T_2), \ldots, sim(t_q, T_n)]^T$. $\mathbf{r}^*$ can be obtained in an iterative way:

(1) Construct the image affinity matrix $W$ by Eq. 12 for $i \ne j$, and set $W_{ii} = 0$.
(2) Initialize $\mathbf{r}^{(0)}$. The initial values will not influence the final results.
(3) Iterate $\mathbf{r}^{(t+1)} = \frac{1}{1+\lambda} D^{-1/2} W D^{-1/2} \mathbf{r}^{(t)} + \frac{\lambda}{1+\lambda} \mathbf{v}$ until convergence.

The method can be viewed as a random walk process, and it will converge to a fixed point.
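The three iteration steps above can be sketched in plain Python as follows. The function name `estimate_relevance`, the default values of `lam`, `tol` and `max_iter`, and the stopping rule are assumptions of this sketch, not values given in the paper; it also assumes that every image has at least one other image with positive affinity, so that no $D_{ii}$ is zero.

```python
import math


def estimate_relevance(W, v, lam=1.0, tol=1e-10, max_iter=1000):
    """Iterate r <- (S r + lam * v) / (1 + lam), with S = D^{-1/2} W D^{-1/2}.

    W: n x n symmetric affinity matrix (list of lists)
    v: list of sim(t_q, T_i) values, one per image
    """
    n = len(v)
    # Step (1): use the affinities with a zero diagonal.
    W = [[0.0 if i == j else float(W[i][j]) for j in range(n)] for i in range(n)]
    d = [sum(row) for row in W]  # D_ii, assumed positive for every image
    S = [[W[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]
    # Step (2): any initial vector works; zeros are used here.
    r = [0.0] * n
    # Step (3): iterate until the largest change falls below tol.
    for _ in range(max_iter):
        r_next = [
            (sum(S[i][j] * r[j] for j in range(n)) + lam * v[i]) / (1.0 + lam)
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(r_next, r)) < tol:
            return r_next
        r = r_next
    return r
```

For two images with unit affinity, `v = [1, 0]` and `lam = 1`, the fixed point solves $2 r_1 = r_2 + 1$ and $2 r_2 = r_1$, so the tag-based score of the first image is partly propagated to its visual neighbor.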

4.2 Semantic Similarity Estimation

We define a semantic similarity for social images, which is estimated based on their associated tag sets. Note that we have obtained the similarity of a tag pair in Eq. 10. Consequently, we estimate the semantic similarity of images $x_i$ and $x_j$ as

$$
s(x_i, x_j) = \frac{1}{2|T_i|} \sum_{k=1}^{|T_i|} \max_{t \in T_j} sim(t_{ik}, t) + \frac{1}{2|T_j|} \sum_{k=1}^{|T_j|} \max_{t \in T_i} sim(t_{jk}, t) \tag{15}
$$

We can see that the above definition satisfies the following properties:

(1) $s(x_i, x_j) = s(x_j, x_i)$, i.e., the semantic similarity is symmetric.
(2) $s(x_i, x_j) = 1$ if $T_i = T_j$, i.e., the semantic similarity of two images is 1 if their tag sets are identical.
(3) $s(x_i, x_j) = 0$ if and only if $sim(t', t'') = 0$ for every $t' \in T_i$ and $t'' \in T_j$, i.e., the semantic similarity is 0 if and only if every pair formed by the two tag sets has zero similarity.
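Eq. 15 translates into a few lines of Python. In this sketch the function name `semantic_similarity` is hypothetical, and `tag_sim` stands for any symmetric tag similarity in [0, 1] that returns 1 for identical tags, such as the one of Eq. 10; both tag sets are assumed to be non-empty.

```python
def semantic_similarity(tags_i, tags_j, tag_sim):
    """Eq. 15: symmetric best-match similarity between two tag sets."""

    def directed(src, dst):
        # Average, over the tags of `src`, of the best match found in `dst`.
        return sum(max(tag_sim(t, u) for u in dst) for t in src) / len(src)

    return 0.5 * directed(tags_i, tags_j) + 0.5 * directed(tags_j, tags_i)
```

Averaging the two directed terms is what makes the measure symmetric even when the two tag sets differ in size.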

5 Empirical Study

5.1 Experimental Settings

We evaluate our approach on a set of social images collected from Flickr. We first select a diverse set of popular queries, including airshow, apple, beach, bird, car, cow, dolphin, eagle, flower, fruit, jaguar, jellyfish, lion, owl, panda, starfish, triumphal, turtle, watch, waterfall, wolf, chopper, fighter, flame, hairstyle, horse, motorcycle, rabbit, shark, snowman, sport, wildlife, aquarium, basin, bmw, chicken, decoration, forest, furniture, glacier, hockey, matrix, Olympics, palace, rainbow, rice, sailboat, seagull, spider, swimmer, telephone, and weapon. For simplicity, we use 1 to 52 to denote the IDs of these queries, respectively. We then perform tag-based image search with the “ranking by most recent” option, and the top 2,000 returned images for each query are collected together with their associated information, including tags, uploading time, user identifier, etc. In this way, we obtain a social image collection consisting of 104,000 images and 83,999 unique tags. But many of the raw tags are misspelled or meaningless. Hence, we adopt a pre-filtering process on these tags. Specifically, we match each tag with the entries in a Wikipedia thesaurus and only the tags that have a corresponding entry in Wikipedia are kept. In this way, 12,921 unique tags are kept for our experiment, and there are 7.74 tags associated with an image on average.

For each image, we extract 428-dimensional features, including 225-dimensional block-wise color moment features generated from a 5-by-5 fixed partition of the image, 128-dimensional wavelet texture features, and 75-dimensional edge distribution histogram features. The ground truth of the relevance of each image is voted by three human labelers. The radius parameter $\sigma$ in Eq. 12 is empirically set to the median value of all the pair-wise Euclidean distances between images.
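The choice of $\sigma$ described above can be sketched as follows, using only the Python standard library. The function name `gaussian_affinity` is hypothetical, the feature vectors in the usage line are toy values rather than the 428-dimensional features of the paper, and the sketch assumes that the images are not all identical, so that the median distance is positive.

```python
import math
from statistics import median


def gaussian_affinity(features):
    """Affinity matrix of Eq. 12 with sigma set to the median pairwise distance.

    features: list of equal-length numeric feature vectors, one per image.
    Returns (W, sigma), where W has a zero diagonal.
    """
    n = len(features)
    dist = [[math.dist(features[i], features[j]) for j in range(n)] for i in range(n)]
    # Median over all unordered pairs of distinct images.
    sigma = median(dist[i][j] for i in range(n) for j in range(i + 1, n))
    W = [
        [0.0 if i == j else math.exp(-dist[i][j] ** 2 / sigma ** 2) for j in range(n)]
        for i in range(n)
    ]
    return W, sigma


W, sigma = gaussian_affinity([(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)])
```

Tying $\sigma$ to the median distance keeps the kernel on the same scale as the data, so that typical pairs get affinities that are neither all close to 0 nor all close to 1.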

5.2 Experimental Results

We first compare the following three ranking methods: (1) Time-based ranking, i.e., ordering the images according to their uploading time. (2) Relevance-based ranking, i.e., ordering the images according to their estimated relevance scores $r(x_i)$. (3) Diverse Relevance Ranking (DRR), i.e., the method proposed in this work.

Fig. 1 and Fig. 2 illustrate the AP and ADP measurements obtained by the different methods, respectively. We also illustrate the mean AP (MAP) and mean ADP (MADP) measurements that are averaged over all queries. From Fig. 1 it can be found that the time-based ranking performs the worst in terms of relevance. This is understandable since the ranking list is generated merely based on time information. The relevance-based ranking performs much better, which demonstrates the effectiveness of our relevance estimation method. The AP measurements of DRR degrade slightly in comparison with the relevance-based ranking. The MAP measurements of relevance-based ranking and DRR are 0.684 and 0.663, respectively. However, from Fig. 2 we can see that the ADP measurements of DRR are much higher. The MADP measurements of time-based ranking, relevance-based ranking and DRR are 0.308, 0.361 and 0.412, respectively. This shows that DRR achieves a good trade-off between relevance and diversity.

[Figure: AP (vertical axis, 0 to 1) per Query ID for time-based ranking, relevance-based ranking and DRR]

Fig. 1. The comparison of AP of different ranking schemes. The MAP measurements of time-based ranking, relevance-based ranking and DRR are 0.576, 0.676 and 0.655, respectively.

[Figure: ADP (vertical axis, 0 to 0.7) per Query ID for time-based ranking, relevance-based ranking and DRR]

Fig. 2. The comparison of ADP of different ranking schemes. The MADP measurements of time-based ranking, relevance-based ranking and DRR are 0.304, 0.356 and 0.401, respectively.

We then conduct a user study to compare the three ranking schemes. To avoid bias, a third-party data management company is involved. The company invites 30 anonymous participants, who declare that they are regular users of the Internet and familiar with image search and media sharing websites. We ask them to freely choose queries and compare DRR with each of the other two ranking approaches. The users are asked to give the comparison results using “>”, “>>” and “=”, which mean “better”, “much better” and “comparable”, respectively. To quantify the results, we convert them into ratings. We assign score 1 to the worse scheme, and the other scheme is assigned a score of 2, 3 or 1 if it is better than, much better than or comparable to the worse one, respectively. The average rating scores and the variances are illustrated in Table 1. From the results we can clearly see the preference of users towards the DRR scheme. We also perform an ANOVA test, and the results are illustrated in Table 2 and Table 3. The results demonstrate that the superiority of DRR is statistically significant.

Table 1. The average rating scores and variances converted from the user study results

    DRR vs. Time-based ranking        DRR vs. Relevance-based ranking
    DRR            Time-based         DRR            Relevance-based
    2.40±0.386     1.03±0.033         2.40±0.455     1.133±0.189

Table 2. The ANOVA test results on comparing DRR and time-based ranking

    The factor of ranking schemes     The factor of users
    F-statistic    p-value            F-statistic    p-value
    108.57         2.58 × 10^-11      0.656          0.894

Table 3. The ANOVA test results on comparing DRR and relevance-based ranking

    The factor of ranking schemes     The factor of users
    F-statistic    p-value            F-statistic    p-value
    46.74          2.5 × 10^-11       0.25           0.999

6 Conclusion

This paper proposes a diverse relevance ranking scheme for social image search, which is able to simultaneously take relevance and diversity into account. It leverages both the visual information of images and the semantic information of tags. The ranking algorithm optimizes an Average Diverse Precision (ADP) measure, which is generalized from the conventional AP measure by taking diversity into account. Experimental results have demonstrated the effectiveness of the approach. In addition, we have also shown the application of the DRR scheme in web image search diversification. In the future, we will test our scheme with more queries as well as comprehensively investigate the dependence of users’ search experience on relevance and diversity.

References

1. Cai, D., He, X., Li, Z., Ma, W.Y., Wen, J.R.: Hierarchical clustering of www image search results using visual, textual and link information. In: Proceedings of ACM Multimedia (2004)
2. Carbonell, J., Goldstein, J.: The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: Proceedings of SIGIR (1998)
3. Cilibrasi, R., Vitanyi, P.M.B.: The Google similarity distance. IEEE Transactions on Knowledge and Data Engineering (2007)
4. Goffman, W.: A searching procedure for information retrieval. Information Storage and Retrieval 2 (1964)
5. Hsu, W.H., Kennedy, L.S., Chang, S.F.: Video search reranking via information bottleneck principle. In: Proceedings of ACM Multimedia (2006)
6. Jaimes, A., Chang, S.F., Loui, A.C.: Detection of non-identical duplicate consumer photographs. In: Proceedings of ACM Multimedia (2003)
7. Jing, F., Wang, C., Yao, Y., Deng, K., Zhang, L., Ma, W.Y.: Igroup: web image search results clustering. In: Proceedings of ACM Multimedia (2006)
8. Kennedy, L.S., Chang, S.F., Kozintsev, I.V.: To search or to label? Predicting the performance of search-based automatic image classifiers. In: Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval (2006)
9. Leuken, R.H.V., Garcia, L., Olivares, X., Zwol, R.: Visual diversification of image search results. In: Proceedings of WWW (2009)
10. Li, J., Wang, J.: Real-time computerized annotation of pictures. IEEE Transactions on Pattern Analysis and Machine Intelligence 30(6) (2008)
11. Li, X.R., Snoek, C.G.M., Worring, M.: Learning tag relevance by neighbor voting for social image retrieval. In: Proceedings of the ACM International Conference on Multimedia Information Retrieval (2008)
12. Song, K., Tian, Y., Huang, T., Gao, W.: Diversifying the image retrieval results. In: Proceedings of ACM Multimedia (2006)
13. Srinivasan, S.H., Sawant, N.: Finding near-duplicate images on the web using fingerprints. In: Proceedings of ACM Multimedia (2008)
14. Tang, J., Yan, S., Hong, R., Qi, G.-J., Chua, T.-S.: Inferring semantic concepts from community-contributed images and noisy tags. In: ACM Multimedia (2009)
15. Wang, B., Li, Z., Li, M., Ma, W.Y.: Large-scale duplicate detection for web image search. In: Proceedings of ICME (2006)
16. Wang, M., Hua, X.-S., Tang, J., Hong, R.: Beyond distance measurement: Constructing neighborhood similarity for video annotation. IEEE Transactions on Multimedia 11(3) (2009)
17. Wang, M., Yang, K., Hua, X.-S., Zhang, H.-J.: Visual tag dictionary: Interpreting tags with visual words. In: ACM Workshop on Web-Scale Multimedia Corpus, in association with ACM MM (2009)
18. Wu, L., Hua, X.S., Yu, N., Ma, W.Y., Li, S.: Flickr distance. In: Proceedings of ACM Multimedia (2008)
19. Zhai, C., Cohen, W.W., Lafferty, J.: Beyond independent relevance: methods and evaluation metrics for subtopic retrieval. In: Information Processing and Management (2006)
