arXiv:1301.4171v1 [cs.IR] 17 Jan 2013

Affinity Weighted Embedding

Ron Weiss Google Inc., New York, NY, USA. [email protected]

Jason Weston Google Inc., New York, NY, USA. [email protected]

Hector Yee Google Inc., San Bruno, CA, USA. [email protected]

Abstract

Supervised (linear) embedding models like Wsabie [5] and PSI [1] have proven successful at ranking, recommendation and annotation tasks. However, despite being scalable to large datasets, they do not take full advantage of the extra data due to their linear nature, and typically underfit. We propose a new class of models that aims to provide improved performance while retaining many of the benefits of the existing class of embedding models. Our new approach works by iteratively learning a linear embedding model in which the next iteration's features and labels are reweighted as a function of the previous iteration. We describe several variants of the family and give some initial results.

1 (Supervised) Linear Embedding Models

Standard linear embedding models are of the form:

f(x, y) = x^\top U^\top V y = \sum_{ij} x_i U_i^\top V_j y_j ,

where x are the input features and y is a possible label (in the annotation case), document (in the information retrieval case) or item (in the recommendation case). These models are used in both supervised and unsupervised settings. In the supervised ranking case they have proved successful in many of the tasks described above; e.g. the Wsabie algorithm [5, 4, 6], which approximately optimizes precision at the top of the ranked list, has proven useful for annotation and recommendation. These methods scale well to large data and are simple to implement and use. However, as they contain no nonlinearities (other than in the feature representation in x and y), they can be limited in their ability to fit large complex datasets, and in our experience they typically underfit.
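As a concrete illustration (ours, not code from the paper), a minimal numpy sketch of the scoring function above; the variable names and dimensions are assumptions, and the Wsabie training loop itself is omitted:

import numpy as np

def linear_embedding_score(x, y, U, V):
    # f(x, y) = x^T U^T V y: embed both sides, then take an inner product.
    # x: (d_x,) input feature vector; y: (d_y,) label/item feature vector
    # U: (k, d_x) input-side embedding; V: (k, d_y) label-side embedding
    return float((U @ x) @ (V @ y))

# Example: rank a handful of candidate labels for a single input.
rng = np.random.default_rng(0)
d_x, d_y, k = 100, 50, 10
U = rng.normal(size=(k, d_x))
V = rng.normal(size=(k, d_y))
x = rng.normal(size=d_x)
candidates = rng.normal(size=(5, d_y))
scores = np.array([linear_embedding_score(x, y, U, V) for y in candidates])
ranking = np.argsort(-scores)  # indices of candidates, best score first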

2 Affinity Weighted Embedding Models

In this work we propose the following generalized embedding model:

f(x, y) = \sum_{ij} G_{ij}(x, y) \, x_i U_i^\top V_j y_j ,

where G is a function, built from a previous learning step, that measures the affinity between two points. Given a pair x, y and feature indices i and j, G returns a scalar; large values indicate a high degree of match. Different methods of learning (or choosing) G lead to different variants of our proposed approach:

• G_{ij}(x, y) = G(x, y). In this case each feature index pair i, j returns the same scalar, so the model reduces to:

  f(x, y) = G(x, y) \, x^\top U^\top V y.

• G_{ij}(x, y) = G_{ij}. In this case the returned scalar for i, j is the same independent of the input vector x and label y, i.e. it is a reweighting of the feature pairs. This gives the model:

  f(x, y) = \sum_{ij} G_{ij} \, x_i U_i^\top V_j y_j .

This is likely only useful in large sparse feature spaces, e.g. if G_{ij} represents the weight of a word-pair in an information retrieval task or an item-pair in a recommendation task. Further, it is possible that G_{ij} could take a particular form, e.g. it is represented as a low-rank matrix G_{ij} = g_i^\top g_j. In that case we have the model f(x, y) = \sum_{ij} g_i^\top g_j \, x_i U_i^\top V_j y_j.

While it may be possible to learn the parameters of G jointly with U and V, here we advocate an iterative approach:

1. Train a standard embedding model: f(x, y) = x^\top U^\top V y.
2. Build G using the representation learnt in (1).
3. Train a weighted model: f(x, y) = \sum_{ij} G_{ij}(x, y) \, x_i \bar{U}_i^\top \bar{V}_j y_j.
4. Possibly repeat the procedure further: build \bar{G} from (3). (So far we have not tried this.)

Note that the training algorithm used for (3) is the same as for (1) – we only change the model. In the following, we will focus on the G_{ij}(x, y) = G(x, y) case (where we only weight examples, not features) and a particular choice of G¹:

G(x, y) = \sum_{i=1}^{m} \exp(-\lambda_x \|Ux - Ux_i\|^2) \, \exp(-\lambda_y \|y - y_i\|^2)    (1)

where the x_i and y_i are the vectors from the training set. G is built using the embedding U learnt in step (1), and is then used to build a new embedding model in step (3). Due to the iterative nature of the steps we can compute G for all examples in parallel using a MapReduce framework, and store the training set necessary for step (3), thus making learning straightforward. To decrease storage, instead of computing a smooth G as above we can clip (sparsify) G by taking only the top n nearest neighbors to Ux, and set the rest to 0. Further, we take \lambda_y suitably large such that \exp(-\lambda_y \|y - y_i\|^2) gives 1 when y_i = y and 0 otherwise². In summary, then, for each training example we simply have to find the (n = 20 in our experiments) nearest neighboring examples in the embedding space, and then reweight their labels using eq. (1). (All other labels would then receive a weight of zero, although one could also add a constant bias to guarantee that those labels can receive non-zero final scores.)
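To make steps (2) and (3) more concrete, here is a rough sketch of our own (dense arrays, brute-force neighbor search, hypothetical function names); a real implementation would precompute the affinities over sparse features with MapReduce as described above:

import numpy as np

def clipped_affinity(U, X_train, Y_train, x, y, lambda_x=1.0, n_neighbors=20):
    # Step (2): affinity of (x, y) to the training set, following the clipped
    # form of eq. (1): keep only the n nearest neighbors of Ux in the embedding
    # space learnt in step (1), and take lambda_y large enough that a neighbor
    # contributes only if its label y_i equals y.
    ux = U @ x                               # embed the input with the step-(1) U
    UX = X_train @ U.T                       # (m, k) embedded training inputs
    d2 = np.sum((UX - ux) ** 2, axis=1)      # squared distances ||Ux - Ux_i||^2
    nearest = np.argsort(d2)[:n_neighbors]   # indices of the n nearest neighbors
    g = 0.0
    for i in nearest:
        if np.array_equal(Y_train[i], y):    # exp(-lambda_y ||y - y_i||^2) -> {1, 0}
            g += np.exp(-lambda_x * d2[i])
    return g

def affinity_weighted_score(x, y, U_bar, V_bar, g):
    # Step (3): the example-weighted model f(x, y) = G(x, y) x^T Ubar^T Vbar y,
    # where Ubar, Vbar are trained with the same algorithm as in step (1).
    return g * float((U_bar @ x) @ (V_bar @ y))

In practice the affinities would be computed once per training example (the MapReduce step above) and stored, rather than recomputed inside the training loop.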

3 Experiments

So far, we have conducted two preliminary experiments: on Magnatagatune (annotating music with text tags) and on ImageNet (annotating images with labels). Wsabie has been applied to both tasks previously [4, 5]. On Magnatagatune we used MFCC features for both Wsabie and our method, similar to those used in [4]. For both models we used an embedding dimension of 100. Our method improved over Wsabie marginally, as shown in Table 1. We speculate that this improvement is small due to the small size of the dataset (only 16,000 training examples, 104 input dimensions for the MFCCs and 160 unique tags). We believe our method will be more useful on larger tasks.

¹ Although perhaps G(x, y) = \sum_{i=1}^{m} \exp(-\lambda_x \|Ux - Ux_i\|^2) \, \exp(-\lambda_y \|Vy - Vy_i\|^2) would be more natural. Further, we could also consider G_orig(x, y) = \sum_{i=1}^{m} \exp(-\lambda_x \|x - x_i\|^2) \, \exp(-\lambda_y \|y - y_i\|^2), which does not make use of the embedding in step (1) at all. This would likely perform poorly when the input features are too sparse, which is precisely the motivation for improving the representation by learning it with U and V.

² This is useful in the label annotation or item ranking settings, but would not be a good idea in an information retrieval setting.

Table 1: Magnatagatune Results

Algorithm                            Prec@1    Prec@3
k-Nearest Neighbor                   39.4%     28.6%
k-Nearest Neighbor (Wsabie space)    45.2%     31.9%
Wsabie                               48.7%     37.5%
Affinity Weighted Embedding          52.7%     39.2%

Table 2: ImageNet Results (Fall 2011, 21k labels)

Algorithm                            Prec@1
Wsabie (KPCA features)               9.2%
k-Nearest Neighbor (Wsabie space)    13.7%
Affinity Weighted Embedding          16.4%
Convolutional Net [2]                15.6% (note: on a different train/test split)

On the ImageNet task (Fall 2011, 10M examples, 474 KPCA features and 21k classes) the improvement over Wsabie is much larger, as shown in Table 2. We used KPCA features similar to those in [5] for both Wsabie and our method, and an embedding dimension of 128 for both. We also compare to nearest neighbor in the embedding space. For our method we used the max instead of the sum in eq. (1), as it gave better results. Our method is competitive with the convolutional neural network model of [2] (note that this is on a different train/test split). However, we believe the method of [3] would likely perform better still if applied in the same setting.
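The max variant mentioned above only changes how the retained neighbors are aggregated; a minimal sketch under the same assumptions as the earlier code (again ours, not the authors' implementation):

import numpy as np

def clipped_affinity_max(U, X_train, Y_train, x, y, lambda_x=1.0, n_neighbors=20):
    # Same as the clipped G sketched in Section 2, but aggregating the retained
    # neighbors with a max instead of a sum:
    #   G(x, y) = max_i exp(-lambda_x ||Ux - Ux_i||^2) [y_i = y]
    ux = U @ x
    d2 = np.sum((X_train @ U.T - ux) ** 2, axis=1)
    nearest = np.argsort(d2)[:n_neighbors]
    weights = [np.exp(-lambda_x * d2[i]) for i in nearest
               if np.array_equal(Y_train[i], y)]
    return max(weights) if weights else 0.0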

4 Conclusions

By incorporating a learnt reweighting function G into supervised linear embedding we can increase the capacity of the model, leading to improved results. One issue, however, is that reducing underfitting via G increases both the storage and computational requirements of the model. One avenue we have begun exploring in that regard is the use of approximate methods to compute G.

References

[1] B. Bai, J. Weston, D. Grangier, R. Collobert, K. Sadamasa, Y. Qi, C. Cortes, and M. Mohri. Polynomial semantic indexing. In NIPS, 2009.
[2] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, Q. Le, M. Mao, A. Senior, P. Tucker, K. Yang, et al. Large scale distributed deep networks. In Advances in Neural Information Processing Systems 25, pages 1232–1240, 2012.
[3] A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, pages 1106–1114, 2012.
[4] J. Weston, S. Bengio, and P. Hamel. Large-scale music annotation and retrieval: Learning to rank in joint semantic spaces. Journal of New Music Research, 2012.
[5] J. Weston, S. Bengio, and N. Usunier. Wsabie: Scaling up to large vocabulary image annotation. In Intl. Joint Conf. on Artificial Intelligence (IJCAI), pages 2764–2770, 2011.
[6] J. Weston, C. Wang, R. Weiss, and A. Berenzeig. Latent collaborative retrieval. In ICML, 2012.
