arXiv:1301.4171v1 [cs.IR] 17 Jan 2013

Affinity Weighted Embedding

Ron Weiss Google Inc., New York, NY, USA. [email protected]

Jason Weston Google Inc., New York, NY, USA. [email protected]

Hector Yee Google Inc., San Bruno, CA, USA. [email protected]

Abstract

Supervised (linear) embedding models like Wsabie [5] and PSI [1] have proven successful at ranking, recommendation and annotation tasks. However, despite being scalable to large datasets, they do not take full advantage of the extra data due to their linear nature, and they typically underfit. We propose a new class of models which aim to provide improved performance while retaining many of the benefits of the existing class of embedding models. Our new approach works by iteratively learning a linear embedding model where the next iteration's features and labels are reweighted as a function of the previous iteration. We describe several variants of the family and give some initial results.

1 (Supervised) Linear Embedding Models

Standard linear embedding models are of the form:

f(x, y) = x^\top U^\top V y = \sum_{ij} x_i U_i^\top V_j y_j,

where x is the vector of input features and y is a possible label (in the annotation case), document (in the information retrieval case) or item (in the recommendation case). These models are used in both supervised and unsupervised settings. In the supervised ranking case they have proved successful in many of the tasks described above; e.g. the Wsabie algorithm [5, 4, 6], which approximately optimizes precision at the top of the ranked list, has proven useful for annotation and recommendation. These methods scale well to large data and are simple to implement and use. However, as they contain no nonlinearities (other than in the feature representations of x and y), they can be limited in their ability to fit large complex datasets, and in our experience they typically underfit.
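As an illustration, here is a minimal NumPy sketch of this scoring function (the code, shapes and names are ours, not the paper's):

import numpy as np

def linear_embedding_score(x, y, U, V):
    # f(x, y) = x^T U^T V y: embed both sides into a shared
    # k-dimensional space, then score by a dot product.
    # x: (d_x,) input features; y: (d_y,) label features
    # U: (k, d_x) input embedding; V: (k, d_y) label embedding
    return float(np.dot(U @ x, V @ y))

At prediction time, all candidate labels y are ranked by this score for a given input x.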

2 Affinity Weighted Embedding Models

In this work we propose the following generalized embedding model:

f(x, y) = \sum_{ij} G_{ij}(x, y) \, x_i U_i^\top V_j y_j,

where G is a function, built in a previous learning step, that measures the affinity between two points: given a pair (x, y) and feature indices i and j, G returns a scalar, with large values indicating a strong match. Different methods of learning (or choosing) G lead to different variants of our proposed approach:

• G_{ij}(x, y) = G(x, y). Here each feature index pair (i, j) returns the same scalar, so the model reduces to:

  f(x, y) = G(x, y) \, x^\top U^\top V y.

• G_{ij}(x, y) = G_{ij}. Here the returned scalar for (i, j) is the same independent of the input vector x and label y, i.e. it is a reweighting of the feature pairs. This gives the model:

  f(x, y) = \sum_{ij} G_{ij} \, x_i U_i^\top V_j y_j.

This is likely only useful in large sparse feature spaces, e.g. if G_{ij} represents the weight of a word pair in an information retrieval task or an item pair in a recommendation task. Further, it is possible that G_{ij} could take a particular form, e.g. a low-rank matrix G_{ij} = g_i^\top g_j, in which case we have the model f(x, y) = \sum_{ij} g_i^\top g_j \, x_i U_i^\top V_j y_j.

While it may be possible to learn the parameters of G jointly with U and V, here we advocate an iterative approach:

1. Train a standard embedding model: f(x, y) = x^\top U^\top V y.
2. Build G using the representation learnt in (1).
3. Train a weighted model: f(x, y) = \sum_{ij} G_{ij}(x, y) \, x_i \bar{U}_i^\top \bar{V}_j y_j.
4. Possibly repeat the procedure further: build G from (3). (So far we have not tried this.)

Note that the training algorithm used for (3) is the same as for (1); we only change the model. In the following we focus on the G_{ij}(x, y) = G(x, y) case (where we only weight examples, not features) and a particular choice of G¹:

G(x, y) = \sum_{i=1}^{m} \exp(-\lambda_x \|U x - U x_i\|^2) \, \exp(-\lambda_y \|y - y_i\|^2)        (1)

where the (x_i, y_i) range over the m training examples. G is built using the embedding U learnt in step (1), and is then used to train a new embedding model in step (3). Because the steps are decoupled, we can compute G for all examples in parallel using a MapReduce framework and store the training set needed for step (3), making learning straightforward. To decrease storage, instead of computing the smooth G above we can clip (sparsify) it by keeping only the top n nearest neighbors to U x and setting the rest to 0. Further, we take \lambda_y suitably large so that \exp(-\lambda_y \|y - y_i\|^2) gives 1 for y_i = y and 0 otherwise². In summary, for each training example we simply find its nearest neighboring examples in the embedding space (n = 20 in our experiments) and reweight their labels using eq. (1). (All other labels then receive a weight of zero, although one could also add a constant bias to guarantee that those labels can receive non-zero final scores.)
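To make this construction concrete, here is a rough sketch of the clipped G (our own code, not the authors'; a brute-force neighbor search stands in for the MapReduce computation, and all names are assumptions):

import numpy as np

def build_clipped_G(U, X, Y, n_neighbors=20, lam_x=1.0):
    # Step (2): embed all training inputs with the U learnt in step (1),
    # then keep, per example, only its n nearest neighbors in that space.
    # U: (k, d_x) embedding; X: (m, d_x) inputs; Y: (m,) label ids.
    E = X @ U.T  # (m, k) embedded training inputs
    G = []
    for e in E:
        d2 = np.sum((E - e) ** 2, axis=1)   # squared distances ||U x - U x_i||^2
        nn = np.argsort(d2)[:n_neighbors]   # the example itself is kept for simplicity
        G.append((nn, np.exp(-lam_x * d2[nn])))  # smooth weights from eq. (1)
    return G

def affinity(nn, weights, Y, y):
    # G(x, y): with lambda_y suitably large the label term is an indicator,
    # so only neighbors whose label equals y contribute to the sum.
    return weights[Y[nn] == y].sum()

Each training example thus stores at most n neighbor weights, and any label not carried by a neighbor receives weight zero, matching the clipped G described above.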

¹ Although perhaps G(x, y) = \sum_{i=1}^{m} \exp(-\lambda_x \|U x - U x_i\|^2) \, \exp(-\lambda_y \|V y - V y_i\|^2) would be more natural. Further, we could also consider G_{orig}(x, y) = \sum_{i=1}^{m} \exp(-\lambda_x \|x - x_i\|^2) \, \exp(-\lambda_y \|y - y_i\|^2), which does not make use of the embedding from step (1) at all. This would likely perform poorly when the input features are too sparse, which is precisely the point of improving the representation by learning it with U and V.

² This is useful in the label annotation or item ranking settings, but would not be a good idea in an information retrieval setting.

3 Experiments

So far we have conducted two preliminary experiments, on Magnatagatune (annotating music with text tags) and ImageNet (annotating images with labels). Wsabie has been applied to both tasks previously [4, 5]. On Magnatagatune we used MFCC features for both Wsabie and our method, similar to those used in [4], with an embedding dimension of 100 for both models. Our method improved over Wsabie marginally, as shown in Table 1. We speculate that this improvement is small due to the small size of the dataset (only 16,000 training examples, 104 input dimensions for the MFCCs and 160 unique tags). We believe our method will be more useful on larger tasks.


Table 1: Magnatagatune Results

Algorithm                           Prec@1   Prec@3
k-Nearest Neighbor                  39.4%    28.6%
k-Nearest Neighbor (Wsabie space)   45.2%    31.9%
Wsabie                              48.7%    37.5%
Affinity Weighted Embedding         52.7%    39.2%

Table 2: ImageNet Results (Fall 2011, 21k labels)

Algorithm                           Prec@1
Wsabie (KPCA features)              9.2%
k-Nearest Neighbor (Wsabie space)   13.7%
Affinity Weighted Embedding         16.4%
Convolutional Net [2]               15.6% (on a different train/test split)

On the ImageNet task (Fall 2011, 10M examples, 474 KPCA features and 21k classes) the improvement over Wsabie is much larger, as shown in Table 2. We used KPCA features similar to those in [5] for both Wsabie and our method, with an embedding dimension of 128 for both, and we also compare to nearest neighbor in the embedding space. For our method we used the max instead of the sum in eq. (1), as it gave better results. Our method is competitive with the convolutional neural network model of [2] (note that this result is on a different train/test split), although we believe the method of [3] would likely perform better still if applied in the same setting.
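Concretely, the max variant used here replaces the sum in eq. (1) with (our notation; this formula is not numbered in the original):

G(x, y) = \max_{i=1,\dots,m} \exp(-\lambda_x \|U x - U x_i\|^2) \, \exp(-\lambda_y \|y - y_i\|^2),

so that each candidate label is scored by its single closest matching neighbor rather than by the sum over all of them.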

4 Conclusions

By incorporating a learnt reweighting function G into supervised linear embedding, we can increase the capacity of the model, leading to improved results. One issue, however, is that reducing underfitting via G increases both the storage and the computational requirements of the model. One avenue we have begun exploring in that regard is the use of approximate methods to compute G.

References

[1] B. Bai, J. Weston, D. Grangier, R. Collobert, K. Sadamasa, Y. Qi, C. Cortes, and M. Mohri. Polynomial semantic indexing. In NIPS, 2009.
[2] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, Q. Le, M. Mao, A. Senior, P. Tucker, K. Yang, et al. Large scale distributed deep networks. In Advances in Neural Information Processing Systems 25, pages 1232–1240, 2012.
[3] A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, pages 1106–1114, 2012.
[4] J. Weston, S. Bengio, and P. Hamel. Large-scale music annotation and retrieval: Learning to rank in joint semantic spaces. Journal of New Music Research, 2012.
[5] J. Weston, S. Bengio, and N. Usunier. Wsabie: Scaling up to large vocabulary image annotation. In Intl. Joint Conf. on Artificial Intelligence (IJCAI), pages 2764–2770, 2011.
[6] J. Weston, C. Wang, R. Weiss, and A. Berenzeig. Latent collaborative retrieval. In ICML, 2012.

