Approximate Asymmetric kNN Search for Binary Features

Chih-Yi Chiu and Yu-Cyuan Liou
Department of Computer Science and Information Engineering, National Chiayi University, Chiayi City, Taiwan 60004
E-mail: [email protected]; [email protected]

INTRODUCTION

A number of binary embedding algorithms have recently been proposed in the computer vision community. By transforming the real-number vector of a visual descriptor into a corresponding binary pattern, these algorithms enable fast nearest neighbor search and compact storage. Although search in the binary space can use efficient machine instructions (e.g., the POPCNT instruction), a linear search that exhaustively matches the query against all binary features is still inefficient on a large-scale dataset. Some studies [1-2] presented approximate nearest neighbor search methods in the binary space; they are fast but may be inaccurate. Norouzi et al. [3] proposed an exact search through efficient multi-index hashing. However, it consumes too much memory; for example, a dataset of one billion 64-bit binary features requires 86 GB of memory. Besides, the kNN ranking in the binary space is less discriminating than that in the real-number space.

On the other hand, asymmetric distance matching for binary features has been shown to be more accurate than Hamming distance matching [4]. "Asymmetric" means that the distance is computed between two different spaces, e.g., the query is a real-number vector while the reference data are binary patterns. Since asymmetric matching does not transform the query into a binary feature, it loses less information on the query side, and hence the kNN ranking is more accurate and discriminating. We propose an approximate asymmetric nearest neighbor search for binary features.
Based on [4], we assume that the binary embedding function $h(x_i) = q(f(x_i))$ can be decomposed into two functions $f$ and $q$, where $f(x_i): \{\mathbf{R}\}^s \to \{\mathbf{I}\}^t$ transforms $x_i$ to a $t$-dimensional intermediate vector $y_i$, and $q(y_i): \{\mathbf{I}\}^t \to \{\mathbf{B}\}^t$ transforms $y_i$ to a $t$-dimensional binary vector $z_i$ (see footnote 1). In [4], the authors adopted a linear search scheme, which is inefficient on a large-scale dataset. In this paper, we present an approximate version that yields comparable accuracy while being much faster than [4], and that uses less memory than [3].
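To make the decomposition $h = q \circ f$ concrete, the following minimal sketch composes the two stages in Python. It is only an illustration: the choice of a random linear projection for $f$ and of thresholding at zero for $q$ is an assumption made here, not something the paper specifies, and the helper names (`make_projection`, `hamming`) are hypothetical.

```python
import random


def make_projection(s, t, seed=0):
    """Hypothetical f-parameters: a t x s matrix of Gaussian weights (assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(s)] for _ in range(t)]


def f(x, W):
    """Map an s-dimensional real vector to a t-dimensional intermediate vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def q(y):
    """Map an intermediate vector to a binary vector by thresholding at zero (assumption)."""
    return [1 if v > 0 else 0 for v in y]


def h(x, W):
    """Binary embedding as the composition q(f(x))."""
    return q(f(x, W))


def hamming(z1, z2):
    """Number of differing bits between two binary vectors of equal length."""
    return sum(a != b for a, b in zip(z1, z2))
```

In an asymmetric setting, the query would stop after `f` and stay in the intermediate space, while only the reference data would go through `q`.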

THE PROPOSED METHOD

Suppose we have a reference dataset of $n$ $s$-dimensional real-number vectors $\{x_i \mid i = 1, 2, \ldots, n\}$ and their $t$-dimensional binary features $\{z_i \mid i = 1, 2, \ldots, n\}$. Every binary feature $z_i$ is divided into $m$ subvectors, each comprising $g = t/m$ bits (assume $t$ is divisible by $m$). Denote the $j$th subvector as $z_i^{(j)}$, $j = 1, 2, \ldots, m$. For a $g$-bit binary pattern $\beta \in \{\mathbf{B}\}^g$, we build the group $Z_\beta^{(j)} = \{\, i \mid z_i^{(j)} = \beta,\ i = 1, 2, \ldots, n \,\}$ and compute its mean in the intermediate space:

$$y_\beta^{(j)} = \frac{1}{\left|Z_\beta^{(j)}\right|} \sum_{i \in Z_\beta^{(j)}} y_i^{(j)}.$$

If a subvector $z^{(j)} = \beta$, its correspondence in the intermediate space can be considered to be $y_\beta^{(j)}$. We pre-compute $y_\beta^{(j)}$ for all $j$ and $\beta$ offline.

Given a query $x_q \in \{\mathbf{R}\}^s$, we list the online search algorithm as follows. We first compute the distance between $y_q^{(j)}$ and $y_\beta^{(j)}$ in the intermediate space for the $j$th subvector (Step 2), and sort the distances to obtain an index list $\{\beta_l\}$ (Step 3). Next, we sequentially accumulate the intermediate distances of the reference data, starting from the nearest group $Z_{\beta_l}^{(j)}$ (Step 5).

Input: A feature $x_q$, a reference binary dataset $\{z_i\}$, a set of groups $\{Z_\beta^{(j)}\}$ and corresponding intermediate means $\{y_\beta^{(j)}\}$.
Output: $k$ nearest neighbors.
Step 1: Generate the $t$-dimensional intermediate feature $y_q = f(x_q)$.
Step 2: Divide $y_q$ into $m$ subvectors. For each $j$ and $\beta$, compute the 2-norm distance $d(y_q^{(j)}, y_\beta^{(j)}) = \| y_q^{(j)} - y_\beta^{(j)} \|_2$ in the intermediate space.
Step 3: For each $j$, sort the intermediate distances in ascending order. Denote the sorted indexes as $\{\beta_l,\ l = 1, 2, \ldots, 2^g\}$, where $d(y_q^{(j)}, y_{\beta_l}^{(j)})$ is the $l$th nearest intermediate distance.
Step 4: Initialize $x_i$.distance = 0 and $x_i$.vote = 0 for all $i$. Initialize an empty nearest neighbor list $D$.
Step 5: Do the following loop
    for $l$ = 1 to $2^g$
        for $j$ = 1 to $m$
            for $i \in Z_{\beta_l}^{(j)}$
                $x_i$.distance = $x_i$.distance + $d(y_q^{(j)}, y_{\beta_l}^{(j)})$.
                $x_i$.vote = $x_i$.vote + 1.
                if $x_i$.vote == $m$
                    Add $x_i$ to $D$ and sort $D$ in ascending order of distance.
                end
            end
        end
        if the size of $D$ is equal to or greater than $k$
            return $D$.
        end
    end

COMPLEXITY ANALYSIS

The time complexity of the online search algorithm is about $m \cdot 2^g \cdot g + l \cdot m \cdot \frac{n}{2^g}$, i.e., the sorting time plus the number of computations for updating the reference distances, where $l$ is the number of outer-loop iterations of Step 5 that are executed and $n/2^g$ is the average group size. The space complexity is about $m \cdot n \cdot 4$ bytes $+\ n \cdot 4$ bytes, i.e., the size of $m$ index tables (each of $4n$ bytes) plus the distance voting table.
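The offline grouping and the online voting search can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: the function names (`split`, `build_index`, `search`, `index_memory_bytes`) are hypothetical, only non-empty groups are stored and ranked (an empty pattern has no mean), and the result is truncated to the $k$ best entries of $D$.

```python
import math
from collections import defaultdict


def split(v, m):
    """Split a length-t sequence into m subvectors of g = t/m entries each."""
    g = len(v) // m
    return [tuple(v[j * g:(j + 1) * g]) for j in range(m)]


def build_index(Y, Z, m):
    """Offline stage: group item ids by the bit pattern of each subvector and
    store the mean intermediate subvector of every non-empty group."""
    groups = [defaultdict(list) for _ in range(m)]
    for i, z in enumerate(Z):
        for j, beta in enumerate(split(z, m)):
            groups[j][beta].append(i)
    means = []
    for j in range(m):
        means_j = {}
        for beta, ids in groups[j].items():
            subs = [split(Y[i], m)[j] for i in ids]
            means_j[beta] = [sum(col) / len(ids) for col in zip(*subs)]
        means.append(means_j)
    return groups, means


def search(y_q, groups, means, m, k):
    """Online stage (Steps 2 to 5): rank groups per subvector, then accumulate
    distances and votes until at least k items have received m votes."""
    yq_sub = split(y_q, m)
    ranked = []
    for j in range(m):
        # Steps 2 and 3: 2-norm distance to every group mean, sorted ascending.
        ranked.append(sorted((math.dist(yq_sub[j], mu), beta)
                             for beta, mu in means[j].items()))
    distance = defaultdict(float)   # Step 4: accumulated distance per item
    vote = defaultdict(int)         # Step 4: vote count per item
    D = []
    for l in range(max(len(r) for r in ranked)):    # Step 5
        for j in range(m):
            if l >= len(ranked[j]):
                continue            # fewer non-empty groups for this subvector
            d, beta = ranked[j][l]
            for i in groups[j][beta]:
                distance[i] += d
                vote[i] += 1
                if vote[i] == m:    # item has been seen in all m subvectors
                    D.append((distance[i], i))
        D.sort()
        if len(D) >= k:
            break
    return [i for _, i in D[:k]]    # truncation to k is a choice of this sketch


def index_memory_bytes(n, m):
    """Space estimate from the text: m index tables of 4n bytes plus a 4n-byte voting table."""
    return m * n * 4 + n * 4
```

Each item belongs to exactly one group per subvector position, so its vote count reaches $m$ exactly once, at which point its accumulated distance is final.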

Footnote 1: $\mathbf{R}$, $\mathbf{I}$, and $\mathbf{B}$ represent the real-number, intermediate, and binary spaces, respectively.

REFERENCES

[1] M. Muja and D. G. Lowe, “Fast matching of binary features,” in Proceedings of International Conference on Computer and Robot Vision, 2012.
[2] M. M. Esmaeili, R. K. Ward, and M. Fatourechi, “A fast approximate nearest neighbor search algorithm in the Hamming space,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 34, No. 12, pp. 2481-2488, 2012.
[3] M. Norouzi, A. Punjani, and D. J. Fleet, “Fast search in Hamming space with multi-index hashing,” in Proceedings of International Conference on Computer Vision and Pattern Recognition, 2012.
[4] A. Gordo, F. Perronnin, Y. Gong, and S. Lazebnik, “Asymmetric distances for binary embeddings,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 36, No. 1, pp. 33-47, 2014.

School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798. Abstract— ... For 60GHz wireless communication systems, the ... the benefit of isolated DC noise from the tuning element. The load on ...