MISMATCH REMOVAL VIA COHERENT SPATIAL MAPPING

Jiayi Ma¹, Ji Zhao¹, Yu Zhou², Jinwen Tian¹

¹ Institute for Pattern Recognition and Artificial Intelligence
² Department of Electronics and Information Engineering
Huazhong University of Science and Technology, Wuhan, 430074, China.
{jyma2010, zhaoji84, zhouyu.hust}@gmail.com, jwtian@mail.hust.edu.cn

ABSTRACT

We propose a method for removing mismatches from given putative point correspondences in image pairs. Our algorithm aims to recover the underlying coherent spatial mapping of the inliers. The thin-plate spline (TPS) is chosen to parameterize the coherent spatial mapping, and we formulate its estimation as a maximum likelihood problem, which we solve with the EM algorithm. Once EM converges, the mismatches can be removed successfully. Quantitative results on various experimental data demonstrate that our method outperforms many state-of-the-art methods. Moreover, the proposed method can also handle image pairs that contain non-rigid motions.

Index Terms— Mismatch removal, thin-plate splines, nonlinear mapping, outlier, point correspondence

1. INTRODUCTION

This paper focuses on establishing accurate point correspondences between two images of the same scene. Many computer vision tasks, such as building 3D models, camera self-calibration, registration, object recognition, and structure and motion recovery [1], start by assuming that the point correspondences and the two-view image relations have been successfully recovered. Point correspondences are generally established by comparing the distances between the local features of keypoints. This may produce a number of mismatches (outliers) due to viewpoint changes, occlusions, repeated structures, etc. The presence of mismatches is usually enough to ruin traditional estimation methods. For this reason, robust estimators have been developed to provide reliable point correspondences [2]. These methods search for sets of matches that are consistent with some global geometric constraint.
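Putative correspondences of this kind are typically produced by nearest-neighbour matching of local descriptors combined with a distance-ratio test, which is also the role of the SIFT distance ratio threshold t in the experiments of Section 3. The Python sketch below is a minimal illustration under an assumed convention: a match is kept only when the second-nearest descriptor distance is at least t times the nearest one, so a larger t keeps fewer but less ambiguous matches. The function `ratio_test_matches` and the toy descriptors are hypothetical; this is not VLFeat's implementation.

```python
import numpy as np


def ratio_test_matches(desc1, desc2, t=1.5):
    """Hypothetical helper: nearest-neighbour matching with a distance-ratio test.

    desc1: (n1, d) array; desc2: (n2, d) array with n2 >= 2.
    Descriptor i of desc1 is matched to its nearest neighbour j in desc2
    only if the second-nearest distance is at least t times the nearest one.
    """
    # Pairwise Euclidean distances, shape (n1, n2).
    dist = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    matches = []
    for i in range(dist.shape[0]):
        order = np.argsort(dist[i])              # candidate indices by distance
        best, second = dist[i, order[0]], dist[i, order[1]]
        if second >= t * best:                   # reject ambiguous matches
            matches.append((i, int(order[0])))
    return matches


desc1 = np.array([[0.0, 0.0], [10.0, 0.0]])
desc2 = np.array([[0.0, 1.0], [0.0, 1.2], [10.0, 0.5], [20.0, 0.0]])
print(ratio_test_matches(desc1, desc2, t=1.5))  # [(1, 2)]
print(ratio_test_matches(desc1, desc2, t=1.0))  # [(0, 0), (1, 2)]
```

Raising t from 1.0 to 1.5 drops the ambiguous match of the first descriptor, in line with the remark that a greater t yields fewer matches with a higher correct match percentage. Matches that pass the test can still be wrong (e.g., on repeated structures), which is why a subsequent mismatch removal step is needed.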
During the last decades, various robust estimators have been proposed in the statistics and computer vision literature. Here we briefly review some of the most widely used ones. In the statistics community, two representatives are maximum-likelihood-type estimators (M-estimators) [3] and the Least Median of Squares (LMedS) estimator [4]. The former minimizes the sum of a symmetric, positive-definite function of the residuals with a unique minimum at zero, while the latter minimizes the median of the squared residuals. In the computer vision community, RANSAC [2] and MLESAC [5] are two widely used robust estimators. Both use resampling to find a minimal outlier-free subset for estimating a given parametric model; the difference is that MLESAC chooses the solution that maximizes the likelihood rather than the inlier count, as RANSAC does. Recently, some new methods based on non-parametric models have appeared, such as Identifying point correspondences by Correspondence Function (ICF) [6] and Vector Field Consensus (VFC) [7]. The former rejects mismatches by learning a correspondence function pair that maps points in one image to their corresponding points in the other image, while the latter converts mismatch removal into a robust vector field learning problem, learning a smooth field that fits the potential inliers while estimating a consensus inlier set.

In this paper, we present a new method for mismatch removal. Since the two images of a pair depict the same 3D scene, there generally exists a smooth nonlinear spatial mapping that fits the correct point correspondences well, and the mismatches can be removed easily once this mapping is recovered. Motivated by this, we parameterize the mapping with a general-purpose spline tool, the thin-plate spline (TPS), and focus on recovering it. Experimental results on various image data show the effectiveness of this method.

2. METHOD

Given a set of putative image point correspondences $S=\{(\mathbf{x}_n,\mathbf{y}_n)\}_{n=1}^{N}$ in two views, which may be perturbed by noise and outliers, the goal is to remove the outliers contained in the set and thereby establish reliable point correspondences. Without loss of generality, a nonlinear mapping $f:\ \mathbf{y}_n=f(\mathbf{x}_n)$ is adopted to characterize the underlying coherent spatial relation of the inliers.
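The difference between RANSAC and MLESAC noted in the introduction lies only in how a candidate model is scored. The following Python sketch contrasts an inlier count with a mixture log-likelihood of the same Gaussian-inlier/uniform-outlier form used in equation (1). The function names, residuals, noise level and image area (640 × 480) are hypothetical, and, as a simplification, the inlier fraction gamma is fixed here, whereas MLESAC [5] estimates it for each hypothesis.

```python
import numpy as np


def ransac_score(residuals, threshold):
    """RANSAC-style score: number of 2-D residuals whose norm is below threshold."""
    norms = np.linalg.norm(residuals, axis=1)
    return int(np.sum(norms < threshold))


def mlesac_score(residuals, sigma, gamma, area):
    """MLESAC-style score: log-likelihood under a Gaussian/uniform mixture.

    Simplification: gamma (inlier fraction) is fixed rather than estimated.
    """
    r2 = np.sum(residuals ** 2, axis=1)
    inlier = gamma / (2.0 * np.pi * sigma ** 2) * np.exp(-r2 / (2.0 * sigma ** 2))
    outlier = (1.0 - gamma) / area
    return float(np.sum(np.log(inlier + outlier)))


# Two hypothetical models: same inlier count, different fit quality.
tight = np.vstack([np.full((8, 2), 0.1), np.full((2, 2), 50.0)])
loose = np.vstack([np.full((8, 2), 0.6), np.full((2, 2), 50.0)])

print(ransac_score(tight, 1.0), ransac_score(loose, 1.0))  # 8 8
sigma, gamma, area = 0.5, 0.8, 640 * 480
print(mlesac_score(tight, sigma, gamma, area) > mlesac_score(loose, sigma, gamma, area))  # True
```

The inlier count cannot distinguish the two hypotheses, while the likelihood score prefers the one whose inliers fit more tightly.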
We call this mapping the coherent spatial mapping. Clearly, if the mapping $f$ is recovered successfully, the mismatches can be removed easily. However, estimating the coherent spatial mapping itself requires reliable point correspondences. To resolve this dilemma, we formulate the problem as a maximum likelihood problem and solve it within an EM framework.

2.1. A maximum likelihood formulation

We give a maximum likelihood formulation for computing the coherent spatial mapping $f$. In the following, we assume that the noise on the inliers is Gaussian with zero mean and uniform standard deviation $\sigma$, and that the outliers are uniformly distributed [5, 7]. The likelihood is thus a mixture model:

$$p(\mathbf{Y}|\mathbf{X},\theta)=\prod_{n=1}^{N}\left(\frac{\gamma}{2\pi\sigma^{2}}\,e^{-\frac{\|\mathbf{y}_n-f(\mathbf{x}_n)\|^{2}}{2\sigma^{2}}}+\frac{1-\gamma}{a}\right), \qquad (1)$$

where $\theta=\{f,\sigma^{2},\gamma\}$ is the set of unknown parameters, $\gamma$ is the percentage of inliers, and $a$ is a constant, namely the area of the image. $\mathbf{X}=(\mathbf{x}_1;\cdots;\mathbf{x}_N)$ and $\mathbf{Y}=(\mathbf{y}_1;\cdots;\mathbf{y}_N)$ are matrices of size $N\times 3$ because homogeneous coordinates are used for the image points, i.e., $\mathbf{x}=(x_x,x_y,1)$.

2.2. An EM solution

We estimate $\theta$ by maximizing the likelihood (1), i.e., $\theta^{*}=\arg\max_{\theta}p(\mathbf{Y}|\mathbf{X},\theta)$. The well-known EM algorithm provides a natural framework for solving this problem. The E-step estimates the responsibility, i.e., the degree to which each sample is an inlier under the current coherent spatial mapping $f$, while the M-step updates $f$ based on the current responsibilities. Following the standard approach, we summarize the EM iteration as follows.

E-step: denoting $\mathbf{v}_n=f(\mathbf{x}_n)$, we update the responsibility $p_n$ as

$$p_n=\frac{\gamma\,e^{-\frac{\|\mathbf{y}_n-\mathbf{v}_n\|^{2}}{2\sigma^{2}}}}{\gamma\,e^{-\frac{\|\mathbf{y}_n-\mathbf{v}_n\|^{2}}{2\sigma^{2}}}+2\pi\sigma^{2}(1-\gamma)/a}. \qquad (2)$$
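Equations (1) and (2), together with the M-step updates (3) and (4), can be evaluated directly once the predictions $\mathbf{v}_n=f(\mathbf{x}_n)$ are available. The Python sketch below assumes the points have already been normalized to Cartesian coordinates, so each row holds two entries (matching the factor 2 in (3)); the function names, the toy data and the image area $a=640\times 480$ are hypothetical. It uses the identity $\operatorname{tr}((\mathbf{Y}-\mathbf{V})^{T}\mathbf{P}(\mathbf{Y}-\mathbf{V}))=\sum_n p_n\|\mathbf{y}_n-\mathbf{v}_n\|^{2}$, which avoids forming $\mathbf{P}$ explicitly.

```python
import numpy as np


def neg_log_likelihood(Y, V, sigma2, gamma, area):
    """Negative log of the mixture likelihood (1); Y, V are (N, 2) arrays."""
    r2 = np.sum((Y - V) ** 2, axis=1)
    lik = gamma / (2.0 * np.pi * sigma2) * np.exp(-r2 / (2.0 * sigma2)) + (1.0 - gamma) / area
    return float(-np.sum(np.log(lik)))


def e_step(Y, V, sigma2, gamma, area):
    """Responsibilities p_n of equation (2)."""
    r2 = np.sum((Y - V) ** 2, axis=1)
    inlier = gamma * np.exp(-r2 / (2.0 * sigma2))
    outlier = 2.0 * np.pi * sigma2 * (1.0 - gamma) / area
    return inlier / (inlier + outlier)


def update_sigma2_gamma(Y, V, p):
    """Closed-form updates (3) and (4) given responsibilities p."""
    r2 = np.sum((Y - V) ** 2, axis=1)
    sigma2 = np.sum(p * r2) / (2.0 * np.sum(p))  # tr((Y-V)^T P (Y-V)) / (2 tr(P))
    gamma = np.sum(p) / len(p)                    # tr(P) / N
    return sigma2, gamma


# Hypothetical data: two correspondences close to the mapping, two far from it.
Y = np.zeros((4, 2))
V = np.array([[0.0, 0.0], [0.1, 0.0], [30.0, 0.0], [0.0, 40.0]])
area = 640 * 480
p = e_step(Y, V, sigma2=1.0, gamma=0.5, area=area)
sigma2, gamma = update_sigma2_gamma(Y, V, p)
print(np.round(p, 3))           # [1. 1. 0. 0.]
print(round(float(gamma), 3))   # 0.5
```

Correspondences far from the current mapping receive responsibilities near zero, so they barely influence the updated $\sigma^2$, which shrinks toward the spread of the remaining inliers.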

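M-step: the parameters $\sigma^{2}$ and $\gamma$ are updated as

$$\sigma^{2}=\frac{\operatorname{tr}\big((\mathbf{Y}-\mathbf{V})^{T}\mathbf{P}(\mathbf{Y}-\mathbf{V})\big)}{2\cdot\operatorname{tr}(\mathbf{P})}, \qquad (3)$$

$$\gamma=\operatorname{tr}(\mathbf{P})/N, \qquad (4)$$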

where $\mathbf{P}=\operatorname{diag}(p_1,\ldots,p_N)$, $\mathbf{V}=(\mathbf{v}_1;\cdots;\mathbf{v}_N)$, and $\operatorname{tr}(\cdot)$ denotes the trace. Since $\mathbf{v}_n$ is a homogeneous coordinate, it should be normalized so that its last entry equals 1 before computing $p_n$ and $\sigma^{2}$. To complete the EM algorithm, the mapping $f$ must also be estimated in the M-step. This is the key step of our method and is discussed in the next section.

2.3. Estimation of the coherent spatial mapping

According to the complete negative log-likelihood of equation (1), the mapping $f$ is estimated by minimizing the weighted empirical error $\frac{1}{2\sigma^{2}}\sum_{n=1}^{N}p_n\|\mathbf{y}_n-f(\mathbf{x}_n)\|^{2}$. This problem is in general ill-posed, since the minimizing mapping $f$ is not unique. Note that the responsibility $p_n$ is a posterior probability indicating the degree to which sample $n$ is an inlier. When $p_n=0$, the correspondence $(\mathbf{x}_n,\mathbf{y}_n)$ is considered an outlier and does not take part in estimating the coherent spatial mapping. Because $p_n$ takes continuous values in $[0,1]$, it acts as a soft decision.

To generate a smooth mapping that fits the image point correspondences, we choose the TPS for the parametrization. The TPS is a general-purpose spline tool that produces a smooth functional mapping for supervised learning [8]. It has no free parameters that need manual tuning, and it has a closed-form solution that can be decomposed into a global linear affine motion and a local non-affine warping component, controlled by the coefficients $\mathbf{A}$ and $\mathbf{W}$ respectively:

$$f(\mathbf{x})=\mathbf{x}\cdot\mathbf{A}+\widetilde{\mathbf{K}}(\mathbf{x})\cdot\mathbf{W}, \qquad (5)$$

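where $\widetilde{\mathbf{K}}(\mathbf{x})$ is a $1\times N$ vector defined by the TPS kernel $K(r)=r^{2}\log r$, with entries $\widetilde{K}_n(\mathbf{x})=K(|\mathbf{x}-\mathbf{x}_n|)$. Define the kernel matrix $\mathbf{K}_{N\times N}=\{K_{ij}\}$ with $K_{ij}=K(|\mathbf{x}_i-\mathbf{x}_j|)$. With a regularization parameter $\lambda$, the nonlinear spatial mapping $f$ can then be estimated by minimizing the TPS energy function

$$E(\mathbf{A},\mathbf{W})=\frac{1}{2\sigma^{2}}\big\|\mathbf{P}^{1/2}(\mathbf{Y}-\mathbf{X}\mathbf{A}-\mathbf{K}\mathbf{W})\big\|^{2}+\frac{\lambda}{2}\operatorname{tr}(\mathbf{W}^{T}\mathbf{K}\mathbf{W}). \qquad (6)$$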
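The TPS parametrization (5) and the minimization of (6) can be sketched in Python with NumPy, following the QR-based closed form of equations (7) and (8). This is a minimal sketch, not the authors' Matlab code: the function names are hypothetical, $\widetilde{\mathbf{Y}}=\mathbf{P}^{1/2}\mathbf{Y}$ is assumed, $K(0)$ is set to 0 (its limit), kernel distances use the Cartesian part of each homogeneous point (the constant last coordinates cancel anyway), and $\mathbf{P}^{1/2}\mathbf{X}$ is assumed to have full column rank.

```python
import numpy as np


def tps_kernel(r):
    """TPS kernel K(r) = r^2 log r, with K(0) = 0 (its limit as r -> 0)."""
    r = np.asarray(r, dtype=float)
    out = np.zeros_like(r)
    nz = r > 0
    out[nz] = r[nz] ** 2 * np.log(r[nz])
    return out


def kernel_matrix(Xa, Xb):
    """Matrix of K(|a_i - b_j|) for homogeneous rows; last coordinate ignored."""
    d = np.linalg.norm(Xa[:, None, :2] - Xb[None, :, :2], axis=2)
    return tps_kernel(d)


def solve_tps(X, Y, p, lam, sigma2, eps=1e-8):
    """Minimise energy (6) via the QR-based closed form (7)-(8).

    X, Y: (N, 3) homogeneous points; p: (N,) responsibilities.
    Returns A with shape (3, 3) and W with shape (N, 3).
    """
    N = X.shape[0]
    K = kernel_matrix(X, X)
    Ph = np.diag(np.sqrt(p))                        # P^{1/2}
    Q, R = np.linalg.qr(Ph @ X, mode="complete")    # P^{1/2} X = [Q1 Q2][R; 0]
    Q1, Q2, R = Q[:, :3], Q[:, 3:], R[:3, :]
    Yt = Ph @ Y                                     # assumed: Y~ = P^{1/2} Y
    S = Q2.T @ Ph @ K @ Q2
    T = Q2.T @ K @ Q2
    lhs = S.T @ S + lam * sigma2 * T + eps * np.eye(N - 3)
    W = Q2 @ np.linalg.solve(lhs, S.T @ Q2.T @ Yt)  # equation (7)
    A = np.linalg.solve(R, Q1.T @ (Yt - Ph @ K @ W))  # equation (8)
    return A, W


def tps_map(x, X, A, W):
    """f(x) = x A + K~(x) W, equation (5), for homogeneous rows x."""
    return x @ A + kernel_matrix(x, X) @ W


# Hypothetical check: a purely affine mapping needs no bending, so W should vanish.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 10, size=(10, 2)), np.ones(10)])
A_true = np.array([[1.1, 0.1, 0.0], [-0.2, 0.9, 0.0], [5.0, -3.0, 1.0]])
Y = X @ A_true
A, W = solve_tps(X, Y, p=np.ones(10), lam=800.0, sigma2=1.0)
print(np.allclose(A, A_true, atol=1e-6), np.allclose(W, 0.0, atol=1e-6))  # True True
```

The check reflects the remark that the bending energy is independent of the linear component: affine data is reproduced by $\mathbf{A}$ alone at zero bending cost. Setting some entries of `p` to zero removes the corresponding rows from both $\mathbf{P}^{1/2}\mathbf{X}$ and $\widetilde{\mathbf{Y}}$, which is how outliers drop out of the fit.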

The second term is the standard TPS smoothness (regularization) term. It is the bending energy, which has a physical interpretation and is independent of the linear component of the coherent spatial mapping. To solve for the TPS parameter pair $\mathbf{A}$ and $\mathbf{W}$, we use a QR decomposition [8, 9]:

$$\mathbf{P}^{1/2}\mathbf{X}=[\mathbf{Q}_1\;\mathbf{Q}_2]\begin{pmatrix}\mathbf{R}\\ \mathbf{0}\end{pmatrix}.$$

Minimizing the energy function (6), we obtain

$$\mathbf{W}=\mathbf{Q}_2\big(\mathbf{S}^{T}\mathbf{S}+\lambda\sigma^{2}\mathbf{T}+\epsilon\mathbf{I}\big)^{-1}\mathbf{S}^{T}\mathbf{Q}_2^{T}\widetilde{\mathbf{Y}}, \qquad (7)$$

$$\mathbf{A}=\mathbf{R}^{-1}\mathbf{Q}_1^{T}\big(\widetilde{\mathbf{Y}}-\mathbf{P}^{1/2}\mathbf{K}\mathbf{W}\big), \qquad (8)$$

where $\widetilde{\mathbf{Y}}=\mathbf{P}^{1/2}\mathbf{Y}$, $\mathbf{S}=\mathbf{Q}_2^{T}\mathbf{P}^{1/2}\mathbf{K}\mathbf{Q}_2$, $\mathbf{T}=\mathbf{Q}_2^{T}\mathbf{K}\mathbf{Q}_2$, and $\epsilon\mathbf{I}$ is added for numerical stability.

Once the EM algorithm converges, we obtain the coherent spatial mapping $f$. The mismatches can then be removed by checking whether they are consistent with the estimated mapping. This is equivalent to obtaining the inlier set $\mathcal{I}$ from the responsibilities $p_n$ with a predefined threshold $\tau$: $\mathcal{I}=\{n \mid p_n>\tau,\ n=1,\ldots,N\}$. The mismatch removal algorithm is summarized in Algorithm 1.

Algorithm 1: The Mismatch Removal Algorithm
Input: Putative correspondence set $S=\{(\mathbf{x}_n,\mathbf{y}_n)\}_{n=1}^{N}$, parameters $\lambda$, $\tau$, and kernel $K$
Output: Inlier set $\mathcal{I}$
    Initialization;
    Construct the kernel matrix $\mathbf{K}$ using the definition of $K$;
    repeat
        E-step: update $\mathbf{P}=\operatorname{diag}(p_1,\ldots,p_N)$ by equation (2);
        M-step: update the mapping $f$ by equations (7) and (8);
                update $\sigma^{2}$ and $\gamma$ by equations (3) and (4);
    until some stopping criterion is satisfied;
    The inlier set is determined by $\mathcal{I}=\{n \mid p_n>\tau\}$.

Relation to VFC: From the perspective of mismatch removal, our method is related to the VFC algorithm [7]. On the one hand, both algorithms seek a mapping $f$ that fits the inliers well under a Bayesian framework and use the EM approach to solve for it. On the other hand, our algorithm differs from VFC. VFC converts the correspondence problem into a vector field learning problem and learns a smooth field in a reproducing kernel Hilbert space (RKHS). In our method, by contrast, the spatial mapping of the inliers is parameterized by a TPS, so the mapping decomposes clearly into linear and nonlinear components. Moreover, the bending energy minimized by the TPS has a specific physical interpretation. This may be beneficial for image pairs with non-rigid motions.

3. EXPERIMENTAL RESULTS

To test the mismatch removal performance of our algorithm, we performed experiments on a wide range of real images. Four other mismatch removal methods are used for comparison: RANSAC, MLESAC, ICF and VFC. Our algorithm has two main parameters: the TPS regularization parameter $\lambda$ and the inlier threshold $\tau$. In practice, we find that our method is not very sensitive to their values; we set $\lambda=800$ and $\tau=0.75$ throughout this paper. The open-source VLFeat toolbox¹ [10] is used to determine the initial SIFT matches [11], and match correctness is determined using the same criterion as in [7].

Results on several image pairs. We first tested the mismatch removal performance of our method on several image pairs, including wide-baseline image pairs (Mex and Tree) and image pairs of non-rigid objects (Peacock and T-shirt). The results are presented in Fig. 1, and the performance is characterized by precision and recall. For the Mex pair, shown in the top row of Fig. 1, there are 158 initial correspondences with 76 mismatches, i.e., a correct match percentage of 51.90%. After our method removes the mismatches, 84 matches are preserved, including all 82 correct matches, so the precision-recall pair is about (97.62%, 100.00%). The remaining three rows of Fig. 1 present similar results on the Tree, Peacock and T-shirt pairs.

¹ Available at: http://www.vlfeat.org/

Fig. 1: Mismatch removal results of our method on image pairs of Mex, Tree, Peacock and T-shirt. The initial correct match percentages are 51.90%, 56.29%, 71.61% and 60.67% respectively. After using our method to remove mismatches, we obtain precision-recall pairs (97.62%, 100.00%), (98.85%, 91.49%), (100.00%, 99.41%) and (100.00%, 99.45%). For each group of results, the left pair denotes the identified suspect correct matches, and the right pair denotes the removed suspect mismatches.
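The Mex figures quoted above are mutually consistent, which a few lines of Python confirm: precision is the fraction of preserved matches that are correct, and recall is the fraction of all correct matches that are preserved. The helper name `precision_recall` is hypothetical; the counts are those reported for the Mex pair.

```python
def precision_recall(n_preserved, n_preserved_correct, n_correct_total):
    """Precision and recall of a mismatch-removal result, as fractions."""
    precision = n_preserved_correct / n_preserved
    recall = n_preserved_correct / n_correct_total
    return precision, recall


n_initial, n_mismatches = 158, 76            # Mex pair: initial putative matches
n_correct = n_initial - n_mismatches         # correct matches among them
initial_pct = 100 * n_correct / n_initial    # initial correct match percentage
p, r = precision_recall(84, 82, n_correct)   # 84 preserved, 82 of them correct
print(n_correct, round(initial_pct, 2), round(100 * p, 2), round(100 * r, 2))
# 82 51.9 97.62 100.0
```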

We now compare the performance on these image pairs with that of the other four mismatch removal methods, as shown in Table 1. MLESAC achieves slightly better precision than RANSAC at the cost of slightly lower recall. ICF has satisfactory precision, but its recall is quite low. Compared to these three methods, VFC and our method perform satisfactorily, achieving high precision and high recall simultaneously. On image pairs containing non-rigid objects, however, our method performs even better than VFC. The average run time of our method on these four image pairs is about 0.5 seconds on an Intel Pentium 2.0 GHz PC with Matlab code. Note that we did not compare against RANSAC and MLESAC on the image pairs of non-rigid objects, since in this case the two-view relation modeled in RANSAC and MLESAC, i.e., the fundamental matrix, no longer exists. In general, our method is effective for mismatch removal not only on image pairs related by rigid motions but also on image pairs with non-rigid motions.

Table 1: Performance comparison. The pairs in the table are precision-recall pairs (%). A dash marks combinations that were not evaluated.

Method       Mex              Tree             Peacock          T-shirt
RANSAC [2]   (91.76, 95.12)   (94.68, 94.68)   -                -
MLESAC [5]   (93.83, 92.68)   (98.82, 89.36)   -                -
ICF [6]      (96.15, 60.98)   (92.75, 68.09)   (99.12, 66.86)   (99.07, 58.79)
VFC [7]      (96.47, 100.00)  (94.85, 97.87)   (99.40, 98.82)   (98.88, 96.70)
Ours         (97.62, 100.00)  (98.85, 91.49)   (100.00, 99.41)  (100.00, 99.45)

Results on a dataset. We also tested our method on the dataset of Mikolajczyk et al. [12], which contains image pairs with large view angles, image rotation, affine transformation, etc. We use all 40 pairs, and for each pair we set the SIFT distance ratio threshold $t$ to 1.5, 1.3 and 1.0 respectively (a greater value of $t$ yields fewer matches with a higher correct match percentage). The initial average precision over all image pairs is 69.58%, and nearly 30 percent of the match sets have a correct match percentage below 50%. Fig. 2 gives the results of four methods on this dataset; each scattered dot represents the precision-recall pair of one image pair. The average precision-recall pairs are (93.84%, 98.50%), (94.99%, 92.40%), (93.95%, 62.69%), (98.57%, 97.75%) and (98.02%, 98.14%) for RANSAC, MLESAC, ICF, VFC and our method respectively. Since the performances of VFC and our method are quite close, we omit the VFC results from the figure for clarity. As shown, RANSAC and MLESAC perform well on most image pairs, but because of the low initial correct match percentage their results are less satisfactory on several pairs. ICF usually achieves high precision or high recall, but not both simultaneously. Our method (like VFC) has the best precision-recall trade-off, with the scattered dots concentrated near the upper right corner. These results demonstrate that the mismatch removal capability of our method is not affected by a low initial correct match percentage, large view angles, image rotation or affine transformation, since all of these cases are contained in the dataset.

[Fig. 2 plot: precision (vertical axis, 0 to 1) versus recall (horizontal axis, 0 to 1), one dot per image pair. Legend: RANSAC, p=93.84%, r=98.50%; MLESAC, p=94.99%, r=92.40%; ICF, p=93.95%, r=62.69%; Ours, p=98.02%, r=98.14%.]

Fig. 2: Precision-recall statistics. Our method (red circles, upper right corner) has the best precision and recall overall.

4. CONCLUSION

This paper has presented a novel mismatch removal method based on estimating a coherent spatial mapping of the inliers. It alternately fits a smooth spatial mapping to the inliers and detects outliers within an EM framework. Experimental results on benchmark datasets show that our method outperforms state-of-the-art methods such as RANSAC. Moreover, the effective results achieved in the non-rigid motion case show its potential value for image retrieval and image-based non-rigid registration.

5. REFERENCES

[1] R. Hartley and A. Zisserman, Multiple view geometry in computer vision (2nd ed.), Cambridge University Press, Cambridge, 2003.
[2] M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with application to image analysis and automated cartography,” Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.
[3] P. J. Huber, Robust Statistics, John Wiley & Sons, New York, 1981.
[4] P. J. Rousseeuw and A. Leroy, Robust Regression and Outlier Detection, John Wiley & Sons, New York, 1987.
[5] P. H. S. Torr and A. Zisserman, “MLESAC: A new robust estimator with application to estimating image geometry,” Computer Vision and Image Understanding, vol. 78, no. 1, pp. 138–156, 2000.
[6] X. Li and Z. Hu, “Rejecting mismatches by correspondence function,” International Journal of Computer Vision, vol. 89, no. 1, pp. 1–17, 2010.
[7] J. Zhao, J. Ma, J. Tian, J. Ma, and D. Zhang, “A robust method for vector field learning with application to mismatch removing,” in CVPR, 2011.
[8] G. Wahba, Spline models for observational data, SIAM, Philadelphia, PA, 1990.
[9] H. Chui and A. Rangarajan, “A new point matching algorithm for non-rigid registration,” Computer Vision and Image Understanding, vol. 89, pp. 114–141, 2003.
[10] A. Vedaldi and B. Fulkerson, “VLFeat - An open and portable library of computer vision algorithms,” in MM, 2010.
[11] D. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[12] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. van Gool, “A comparison of affine region detectors,” International Journal of Computer Vision, vol. 65, no. 1, pp. 43–72, 2005.
