Exploiting Low-rank Structure for Discriminative Sub-categorization

Zheng Xu^1      [email protected]
Xue Li^1        [email protected]
Kuiyuan Yang^2  [email protected]
Tom Goldstein^1 [email protected]

^1 Department of Computer Science, University of Maryland, College Park, USA
^2 Microsoft Research, Beijing, China

In visual recognition, sub-categorization, which divides a category into several sub-categories, has been proposed to handle the large intra-class variance found in real-world data. Recent discriminative sub-categorization approaches use samples outside the category under consideration as negative data for supervision, cluster the positive samples of the category into sub-categories, and simultaneously train a classifier for each sub-category [2, 4]. In the joint clustering-and-classification framework of previous methods, the classifier for each sub-category is trained using only the samples hard-assigned to that sub-category. However, since the intra-class variance of a category arises from complex factors, some samples could contribute to the training of several sub-categories. Moreover, sub-categories are closely related, since they are discovered from the same category, and the common information shared among them is beneficial for classifier training. We propose a new approach to discriminative sub-categorization that adopts an exemplar-based method to address intra-class variance and exploits low-rank structure to preserve common information while discovering sub-categories. Our approach builds upon exemplar-LDAs [3], which generate a set of exemplar classifiers, each trained from a single positive sample and all the negative samples. A sub-category containing only one positive sample is the extreme case: a compact set for training and modeling. We adopt exemplar classifiers to represent these compact sub-categories and thereby preserve the intra-class variance of a category. To share common information among the exemplar classifiers while preserving their diversity, we jointly train the exemplar-LDAs for all positive samples and place a trace-norm regularizer on the matrix of weights, under the assumption that the weight vectors lie on a union of subspaces, so that the weight matrix is low-rank.
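As a toy illustration of the low-rank assumption above (that the exemplar weight vectors lie on a union of low-dimensional subspaces), the following NumPy snippet, which is our own sketch and not the authors' code, builds a weight matrix whose columns span only a 2-dimensional subspace and checks that the trace (nuclear) norm equals the sum of its singular values:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stack 8 weight vectors in R^10 that all lie in a 2-dimensional subspace:
# the resulting 10 x 8 matrix has rank 2 despite its ambient size.
B = rng.standard_normal((10, 2))
C = rng.standard_normal((2, 8))
W = B @ C

# The trace norm ||W||_* is the sum of the singular values of W.
nuclear = np.linalg.norm(W, ord='nuc')
s = np.linalg.svd(W, compute_uv=False)
assert np.isclose(nuclear, s.sum())
assert np.linalg.matrix_rank(W) == 2  # only two nonzero singular values
```

Penalizing the trace norm therefore encourages many singular values of the weight matrix to shrink to zero, which is what drives the discovered exemplar classifiers toward a shared low-rank structure.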
We formulate the proposed low-rank least squares exemplar-LDAs (LRLSE-LDAs) as follows. Let $X_1 = [x_1^+, \ldots, x_n^+]$ and $X_2 = [x_1^-, \ldots, x_m^-]$ denote the centered data matrices^1 of the positive and negative samples, and let $W = [w_1, \ldots, w_n]$ denote the weight matrix, where each $w_i$ is the weight vector of the exemplar-LDA for one positive sample. The objective function for training the exemplar-LDAs of all positive samples together is

$$J_{\mathrm{LSE\text{-}LDAs}}(W) = \frac{\delta}{2}\|W\|_F^2 + \frac{1}{2}\|X_2^\top W\|_F^2 - \operatorname{trace}(X_1^\top W) \qquad (1)$$

where $\|\cdot\|_F$ is the Frobenius norm of a matrix and $\operatorname{trace}(\cdot)$ is the trace of a matrix. Inspired by [6], we minimize the least squares form in Eq. 1 instead of maximizing the Fisher criterion, so that the objective function is convex. Eq. 1 has the closed-form solution

$$W = (X_2 X_2^\top + \delta I)^{-1} X_1 \qquad (2)$$

where $I$ is the identity matrix. To discover the structure of sub-categories, we jointly learn the weights for the positive samples/exemplars of the category and regularize the weight matrix with a low-rank constraint. We thus arrive at the objective function of LRLSE-LDAs,

$$J_{\mathrm{LRLSE\text{-}LDAs}}(W) = \xi\|W\|_* + J_{\mathrm{LSE\text{-}LDAs}}(W) \qquad (3)$$

where $\|\cdot\|_*$ is the trace norm, a convex approximation of the rank of a matrix, used to regularize the weight matrix.

To solve the convex formulation in Eq. 3, we propose an efficient algorithm based on the scaled form of the alternating direction method of multipliers (scaled ADMM) [1]. We reformulate minimizing $J_{\mathrm{LRLSE\text{-}LDAs}}(W)$ in Eq. 3 as an equality-constrained convex optimization problem by introducing an intermediate variable $F$,

$$\min_{W,F}\; J_{\mathrm{LSE\text{-}LDAs}}(W) + \xi\|F\|_* \qquad \text{s.t.}\quad W = F \qquad (4)$$

The augmented Lagrangian for the formulation in Eq. 4 can be written as

$$L(W, F, \Lambda) = J_{\mathrm{LSE\text{-}LDAs}}(W) + \xi\|F\|_* + \frac{\tau}{2}\left(\|W - F + \Lambda\|_F^2 - \|\Lambda\|_F^2\right) \qquad (5)$$

where $\Lambda$ is the scaled dual parameter matrix and $\tau$ is the penalty parameter. We iteratively update the variables $W$, $F$, $\Lambda$ as in scaled ADMM: $W$ and $F$ are updated by solving two subproblems, both with closed-form solutions, and $\Lambda$ is updated by dual ascent. The two subproblems are

$$W = \arg\min_W\; J_{\mathrm{LSE\text{-}LDAs}}(W) + \frac{\tau}{2}\|W - F + \Lambda\|_F^2 \qquad (6)$$

$$F = \arg\min_F\; \xi\|F\|_* + \frac{\tau}{2}\|W - F + \Lambda\|_F^2 \qquad (7)$$

where Eq. 6 has a closed-form solution, benefiting from the least squares form, and Eq. 7 can be solved by the singular value thresholding method.

After training the weights of LRLSE-LDAs, we use the resulting exemplar classifiers to perform sub-category discovery and visual recognition. For sub-category discovery, we apply spectral clustering with an affinity matrix defined by the prediction scores on the positive samples. For visual recognition, we adopt the cross-domain recognition approach in [5], fusing the top-K prediction scores from the trained exemplar classifiers.

We conduct comprehensive experiments on various datasets to validate the effectiveness and efficiency of our approach for sub-category discovery and visual recognition. Following the experimental setting in [4] for sub-category discovery, we conduct experiments on ten public datasets from the UCI repository and on MNIST, which cover a wide variety of data types; LRLSE-LDAs based clustering achieves promising results measured by purity on these datasets. Following the experimental setting in [5] for visual recognition, we use the Office-Caltech dataset for object recognition and the IXMAS dataset for action recognition; LRLSE-LDAs based classification achieves an order-of-magnitude speedup with matching performance compared with the state of the art in [5].

[1] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.
[2] Pedro F. Felzenszwalb, Ross B. Girshick, David McAllester, and Deva Ramanan. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9):1627–1645, 2010.
[3] Bharath Hariharan, Jitendra Malik, and Deva Ramanan. Discriminative decorrelation for clustering and classification. In ECCV, 2012.
[4] Minh Hoai and Andrew Zisserman. Discriminative sub-categorization. In CVPR, 2013.
[5] Zheng Xu, Wen Li, Li Niu, and Dong Xu. Exploiting low-rank structure from latent domains for domain generalization. In ECCV, 2014.
[6] Jieping Ye. Least squares linear discriminant analysis. In ICML, 2007.
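To make the solver in Eqs. 4–7 concrete, here is a minimal NumPy sketch of the scaled-ADMM loop. It is our own illustration under stated assumptions (dense data, fixed hyper-parameters, a fixed iteration count rather than a convergence test); the function and variable names are ours, not the authors'. Setting the gradient of the subproblem in Eq. 6 to zero gives the linear system $(X_2 X_2^\top + (\delta+\tau)I)\,W = X_1 + \tau(F - \Lambda)$, and Eq. 7 is solved by singular value thresholding:

```python
import numpy as np

def svt(A, thresh):
    """Singular value thresholding: prox operator of thresh * ||.||_*."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - thresh, 0.0)) @ Vt

def lrlse_ldas(X1, X2, delta=1.0, xi=0.1, tau=1.0, iters=100):
    """Scaled-ADMM sketch for LRLSE-LDAs.
    X1: d x n centered positives, X2: d x m centered negatives."""
    d, n = X1.shape
    # System matrix for the W-update (Eq. 6); SPD, so factor/solve is stable.
    A = X2 @ X2.T + (delta + tau) * np.eye(d)
    F = np.zeros((d, n))
    Lam = np.zeros((d, n))
    for _ in range(iters):
        # Eq. 6: ridge-type least squares with the ADMM proximal term.
        W = np.linalg.solve(A, X1 + tau * (F - Lam))
        # Eq. 7: singular value thresholding with threshold xi / tau.
        F = svt(W + Lam, xi / tau)
        # Dual ascent on the scaled dual variable.
        Lam = Lam + W - F
    return W

# Hypothetical usage with synthetic data (d=5 features, n=4 positives, m=20 negatives):
rng = np.random.default_rng(0)
X1 = rng.standard_normal((5, 4))
X2 = rng.standard_normal((5, 20))
W = lrlse_ldas(X1, X2)
```

With a larger $\xi$, the thresholding step zeroes out more singular values of $F$, pulling the learned exemplar weights toward a common low-rank structure, which matches the role of the trace-norm regularizer in Eq. 3.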

^1 Each data matrix is centered by subtracting the mean of the training samples from each sample. For each exemplar classifier, we approximate this mean using the mean of all negative samples together with the single positive sample.
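One possible reading of this footnote, sketched below in NumPy with names of our own choosing, is that the per-exemplar training mean is computed from the $m$ negatives plus the one positive, and both data blocks are centered by it:

```python
import numpy as np

def center_for_exemplar(x_pos, X_neg):
    """Center data for one exemplar classifier (footnote 1, our reading).
    x_pos: (d,) single positive sample; X_neg: (d, m) negative samples."""
    m = X_neg.shape[1]
    # Approximate training mean: all negatives plus the one positive.
    mu = (X_neg.sum(axis=1) + x_pos) / (m + 1)
    x1 = x_pos - mu            # centered positive
    X2 = X_neg - mu[:, None]   # centered negatives
    return x1, X2

# The m+1 centered samples sum to zero by construction:
rng = np.random.default_rng(0)
x1, X2 = center_for_exemplar(rng.standard_normal(6), rng.standard_normal((6, 9)))
assert np.allclose(X2.sum(axis=1) + x1, 0.0)
```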
