Image Annotation Using Search and Mining Technologies


Xin-Jing Wang*, Lei Zhang, Feng Jing, Wei-Ying Ma
Microsoft Research Asia, 49 Zhichun Road, Beijing 100080, China
[email protected], {leizhang, fengjing, wyma}@microsoft.com

* The work was done while Xin-Jing Wang was an intern at Microsoft Research Asia. She is now with IBM China Research Lab in Beijing; her contact email is [email protected].

Copyright is held by the author/owner(s). WWW 2006, May 23–26, 2006, Edinburgh, Scotland. ACM 1-59593-323-9/06/0005.

ABSTRACT

In this paper, we present a novel solution to the image annotation problem that annotates images using search and data mining technologies. An accurate keyword is required to initialize the process; leveraging a large-scale image database, the method then 1) searches for semantically and visually similar images, and 2) mines annotations from them. A notable advantage of this approach is that it supports an unlimited vocabulary, which none of the existing approaches can. Experimental results on real Web images show the effectiveness and efficiency of the proposed algorithm.

Categories and Subject Descriptors

I.4.8 [Image Processing and Computer Vision]: Scene Analysis – object recognition; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – search process.

General Terms: Algorithms, Performance.

Keywords: Image annotation, search result clustering, hash indexing.

1. INTRODUCTION

Despite the many computer vision and machine learning approaches proposed so far, image annotation is still far from practical and satisfactory. Possible reasons are: 1) it is still unclear how to model semantic concepts effectively and efficiently, and 2) there is a lack of training data to effectively bridge the semantic gap. Meanwhile, with its explosive growth, the Web has become a huge resource of all kinds of data and has enabled solutions to many problems once believed to be "unsolvable."

In this paper, we leverage the huge number of images on the Web and propose a novel approach to image auto-annotation. The key idea is to find a group of images that are similar to the query image both semantically and visually, extract key phrases from their textual descriptions, and select the highest-scored phrases as annotations. To bypass the semantic gap, an accurate keyword is assumed to be initially associated with the query image; we call it the query keyword. This requirement is less restrictive than it may first seem: in desktop photo search, for example, location or event names are usually available as folder names, and in Web image search a surrounding keyword can be chosen as the query.

A notable advantage of the proposed approach is that it is entirely unsupervised: no supervised learning is required to train a prediction model, as traditional approaches do. As a direct result, the method places no limitation on the vocabulary, which makes it fundamentally different from previous works.

2. ANNOTATING IMAGES BY SEARCH AND MINING

The entire process is as follows. Given one query image and one query keyword, a text-based search is first employed to retrieve a group of semantically similar images. Content-based image retrieval is then applied to these images to pick out the visually similar ones and rank them accordingly; to speed up this step, a hash coding-based algorithm is adopted that maps the high-dimensional visual features to compact hash codes. Key phrases are then mined from the textual annotations of the top N ranked images using a clustering approach. Finally, after duplicates are removed, the remaining phrases are output as the predicted annotations. This pipeline is sketched below; the following subsections detail its three key techniques.
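For clarity, here is a minimal Python sketch of the pipeline. The search and mining components are passed in as functions, since their details are given in Sections 2.1–2.3; all parameter names and the top_n default are illustrative assumptions rather than values from the paper.

    from typing import Callable, List, Tuple

    def annotate(query_keyword: str,
                 query_code: object,
                 text_search: Callable[[str], List[Tuple[object, str]]],
                 hamming: Callable[[object, object], int],
                 mine_phrases: Callable[[List[str]], List[str]],
                 top_n: int = 2000) -> List[str]:
        """Sketch of annotation by search and mining (illustrative only)."""
        # Step 1: text-based search retrieves semantically similar images,
        # each represented by a (hash_code, textual_description) pair.
        candidates = text_search(query_keyword)

        # Step 2: content-based re-ranking by visual similarity, computed
        # on hash codes for speed (Section 2.1; distances in Section 2.2).
        ranked = sorted(candidates, key=lambda c: hamming(query_code, c[0]))

        # Step 3: mine key phrases from the descriptions of the top-N
        # ranked images via clustering (Section 2.3).
        phrases = mine_phrases([desc for _, desc in ranked[:top_n]])

        # Step 4: remove duplicates; the rest are the predicted annotations.
        seen, out = set(), []
        for p in phrases:
            if p.lower() not in seen:
                seen.add(p.lower())
                out.append(p)
        return out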

2.1 Hash Coding Algorithm

We modified the algorithm proposed by Wang et al. [2] to encode image visual features into hash codes. First, each image is divided into even blocks, and a 36-bin color correlogram [1] is extracted to represent each block. The features are then transformed by a PCA mapping matrix learned beforehand and quantized into 32-dimensional hash codes. The quantization strategy is: if a feature component is larger than the mean of the vector, it is quantized to 1; otherwise, to 0.
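A minimal sketch of this encoding step, assuming the per-block correlograms have already been concatenated into one feature vector and the PCA mapping matrix was learned offline (both are hypothetical inputs here):

    import numpy as np

    def to_hash_code(feature: np.ndarray, pca: np.ndarray) -> np.ndarray:
        """Quantize a visual feature vector into a 32-bit hash code.

        pca is a (32 x d) mapping matrix learned beforehand; feature is
        the d-dimensional concatenation of the per-block correlograms.
        """
        projected = pca @ feature            # reduce to 32 dimensions
        threshold = projected.mean()         # mean of this vector
        # Components larger than the mean become 1, the others 0.
        return (projected > threshold).astype(np.uint8)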

2.2 Distance Measures

Three distance measures are proposed and compared. 1) Hamming distance: it measures the number of differing bits between two hash codes. 2) Weighted Hamming distance: since the higher bits of a hash code contain the majority of an image's energy, differences in higher bits should be weighted more heavily. We evenly separate the 32-bit hash codes into 8 bins and weight the Hamming distance of the i-th bin by 2^(8−i), 1 ≤ i ≤ 8. 3) Euclidean distance on color correlograms: we use this measure as a baseline to assess the effectiveness of the hash-code-based methods.
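As an illustration, here is a minimal sketch of the first two measures, assuming each hash code is stored as a 32-element NumPy array of 0/1 values with the highest-energy bits first:

    import numpy as np

    def hamming(a: np.ndarray, b: np.ndarray) -> int:
        """Number of differing bits between two binary hash codes."""
        return int(np.count_nonzero(a != b))

    def weighted_hamming(a: np.ndarray, b: np.ndarray) -> int:
        """Split the 32 bits into 8 bins of 4 bits each and weight the
        i-th bin (1-based) by 2**(8 - i), so that differences in the
        higher-energy bits cost more."""
        diff = (a != b).reshape(8, 4)             # 8 bins x 4 bits each
        weights = 2 ** (8 - np.arange(1, 9))      # 128, 64, ..., 1
        return int((diff.sum(axis=1) * weights).sum())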

2.3 Mining Annotations by Clustering

The Search Result Clustering (SRC) algorithm [3] is used to cluster the retrieved semantically and visually similar images according to their titles, URLs, and surrounding texts. Since the SRC algorithm can generate clusters with highly readable names, we use these names as our candidate phrases and rank them according to the following two criteria.

Maximum cluster size criterion. The score of a cluster equals the number of its member images. This assumes that the best key phrases are the dominant concepts among the member images.

Average member image score criterion. The score of a cluster is the average similarity of its member images to the query image.

Finally, we remove duplicates from the top-ranked phrases and output the remaining ones as annotations of the query image.
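The two ranking criteria can be expressed compactly as below; the cluster representation (a cluster name mapped to its member images' similarity scores) is a hypothetical format, not the actual SRC data structure:

    from typing import Dict, List

    def rank_cluster_names(clusters: Dict[str, List[float]],
                           criterion: str = "max_size") -> List[str]:
        """Rank SRC cluster names by one of the two criteria above."""
        if criterion == "max_size":
            # Cluster score = number of member images.
            score = lambda name: len(clusters[name])
        else:
            # Cluster score = average member-image similarity to the query.
            score = lambda name: sum(clusters[name]) / len(clusters[name])
        return sorted(clusters, key=score, reverse=True)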


[Figure 1. E vs. similarity weight for the three distance measures (Hamming distance, weighted Hamming, Euclidean distance on color correlograms) and the text-only baseline: (a) precision w.r.t. the maximum cluster size criterion; (b) precision w.r.t. the average member image score criterion.]

[Figure 2. Examples of the outputs.]

3. EXPERIMENTS

We collected 2.4 million high-quality photos with rich descriptions from online photo forums. Though the descriptions are noisy, they cover to a certain degree the concepts of the corresponding photos. The query set consists of 30 Google images randomly selected from 15 categories (Apple, Beach, Beijing, Bird, Butterfly, Clouds, Clownfish, Japan, Liberty, Lighthouse, Louvre, Paris, Sunset, Tiger, Tree). We propose an evaluation criterion (Eq. 1) that differentiates "perfect" annotations (e.g., "Eiffel Tower") from merely "correct" ones (e.g., "France" for an Eiffel Tower image):

E = (p + 0.5 × r − w) / n        (1)

where n denotes the number of predicted annotations, and p, r, and w are the numbers of "perfect," "correct," and "wrong" annotations, respectively.

Figure 1 shows the curves of E versus the similarity weight for the three distance measures. This weight scales the average similarity of the retrieved images to form a threshold for image filtering; the images that remain are clustered for annotation mining. The green square curves in Figure 1 represent the text-based method, a baseline for which no visual features are available.
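As a worked example of Eq. (1), using a hypothetical breakdown of five predicted annotations:

    def evaluate(p: int, r: int, w: int, n: int) -> float:
        """E = (p + 0.5 * r - w) / n, as in Eq. (1)."""
        return (p + 0.5 * r - w) / n

    # Suppose 2 of n = 5 predictions are perfect, 2 merely correct, 1 wrong:
    print(evaluate(p=2, r=2, w=1, n=5))  # -> 0.4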

Figure 1(a) shows that the weighted Hamming distance measure performs best under the maximum cluster size criterion. The reason is that it captures the important features of an image and weights them heavily. Interestingly, the Euclidean distance measure performs nearly the same as the Hamming distance measure, which suggests that the information loss due to PCA is negligible on this dataset.

Figure 1(b) shows that the average member image score criterion generally performs worse than the maximum cluster size criterion. A possible reason is that SRC is a text-based clustering algorithm, hence the images in a cluster may not be visually similar. Note that the system performance jumps when the threshold is set too large, because too few images are retrieved to ensure good clustering performance.

The efficiency of the approach was also tested on a computer with a dual Intel Pentium 4 Xeon hyper-threaded CPU and 2 GB of memory. On average, 24,000 images are retrieved per query, and the time cost is 0.034, 0.072, and 0.122 seconds for the three distance measures respectively (including the image ranking procedure). Figure 2 shows a few examples of query images and their predicted annotations; the boldfaced words are the query keywords.

4. CONCLUSIONS

In this paper, we proposed a novel approach that reformulates image auto-annotation as searching for semantically and visually similar images on the Web and mining annotations from their descriptions. To make the system work online, a hash coding algorithm is adopted to speed up the content-based search. Experiments conducted on 2.4 million photo-forum images demonstrated the effectiveness and efficiency of the proposed approach.

5. REFERENCES

[1] Huang, J., Kumar, S. R., Mitra, M., Zhu, W.-J., and Zabih, R. Image Indexing Using Color Correlograms. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, San Juan, Puerto Rico, 1997.

[2] Wang, B., Li, Z. W., and Li, M. J. Efficient Duplicate Image Detection Algorithm for Web Images and Large-scale Database. Microsoft Research Technical Report, 2005.

[3] Zeng, H. J., He, Q. C., Chen, Z., and Ma, W.-Y. Learning to Cluster Web Search Results. In Proc. 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Sheffield, United Kingdom, July 2004, 210–217.
