Knowledge-Based Systems (2016) 1–8
http://dx.doi.org/10.1016/j.knosys.2016.12.010
Mixed similarity learning for recommendation with implicit feedback

Mengsi Liu, Weike Pan1, Miao Liu, Yaofeng Chen, Xiaogang Peng∗, Zhong Ming∗
College of Computer Science and Software Engineering, Shenzhen University, China

Article history: Received 13 June 2016; Revised 8 December 2016; Accepted 9 December 2016; Available online xxx.

Keywords: Mixed similarity; Implicit feedback; Recommender systems

Abstract

Implicit feedback such as users' examination behaviors has been recognized as a very important source of information in most recommendation scenarios. For recommendation with implicit feedback, a good similarity measurement and a proper preference assumption are critical to the quality of personalization services. So far, the similarities used in state-of-the-art recommendation methods include the predefined similarity and the learned similarity, and the preference assumptions include the well-known pairwise assumption. In this paper, we exploit the complementarity of the predefined similarity and the learned similarity via a novel mixed similarity model. Furthermore, we develop a novel recommendation algorithm, i.e., the pairwise factored mixed similarity model (P-FMSM), based on the mixed similarity and the pairwise preference assumption. Our P-FMSM is able to (i) capture the locality of user-item interactions via the symmetric predefined similarity, (ii) model the global correlations among items via the asymmetric learned similarity, and (iii) digest the uncertain implicit feedback via the pairwise preference assumption. Empirical studies on four public datasets show that our P-FMSM recommends significantly more accurately than several state-of-the-art methods. © 2016 Elsevier B.V. All rights reserved.

1. Introduction

The task of recommendation with implicit feedback [6,27] aims to provide personalization services by learning users' preferences from their examination behaviors. It has recently been recognized as a more important problem, in most scenarios, than the well-studied numerical rating prediction problem of the Netflix $1 Million Prize [15]. It has also attracted attention from industry practitioners because of the pervasiveness of the problem setting in numerous recommendation systems, e.g., users' browsing behaviors in e-commerce, watching patterns on video streaming sites, and check-in records in location-based mobile social networks. For recommendation with implicit feedback, there are at least three categories of approaches. The first one adopts some predefined similarity such as the Cosine similarity or the Jaccard index, based on which a neighborhood of items is constructed and a prediction can then be made [5]. The second one chooses to learn the similarity instead of using a predefined one [14].



∗ Corresponding authors. E-mail addresses: [email protected] (M. Liu), [email protected] (W. Pan), [email protected] (M. Liu), [email protected] (Y. Chen), [email protected] (X. Peng), [email protected] (Z. Ming).
1 Co-first author. The first two authors contributed equally.

The third one first proposes a proper preference assumption and then designs a corresponding loss or objective function to be optimized for user behavior reasoning, among which the pairwise assumption [27] usually performs well regarding both accuracy and efficiency. However, most recommendation methods focus on one specific aspect among those three categories of approaches, such as a predefined similarity, a learned similarity, or some preference assumption. Each aspect has its own advantage, e.g., the ability to capture local and global relations for the predefined similarity and the learned similarity, respectively. As far as we know, no published work aims to integrate those three aspects into one single algorithm in a principled way. In this paper, we take each aspect as a reusable component and compile them into one single algorithm as a unified whole. Specifically, we first design a mixed similarity model that combines the predefined similarity and the learned similarity. Then, we embed the prediction rule based on the mixed similarity into a typical loss function under the well-known pairwise preference assumption. Finally, we obtain our algorithm, called the pairwise factored mixed similarity model (P-FMSM). We conduct extensive empirical studies with several closely related state-of-the-art methods on four public real-world datasets, and find that our proposed solution, i.e., P-FMSM, performs significantly better than methods with only one of the aforementioned three aspects or components. Furthermore, our mixed similarity learning model is significantly better than the corresponding unmixed similarity model in the same pairwise preference learning framework. These observations clearly show the effectiveness of the designed mixed similarity model and the developed recommendation algorithm.

We organize the rest of the paper as follows. We first give some background about the problem definition and some existing approaches for the studied problem in Section 2, and then describe our proposed solution in detail in Section 3. We conduct extensive empirical studies in Section 4, and discuss and summarize some closely related work in Section 5. Finally, we conclude our work in Section 6.
2. Background

2.1. Problem definition

In the studied problem of recommendation with implicit feedback, we have n users, m items and some associated examination records in the form of (user, item) pairs denoted as R. Our goal is to learn users' preferences from such pairs and provide a personalized recommendation list for each user. We put some heavily used notations and their explanations in Table 1.

Table 1
Some notations and explanations.

U                            User set, u ∈ U, |U| = n
I                            Item set, i, i', j ∈ I, |I| = m
U_i                          Users that examined item i
I_u                          Items examined by user u
N_i                          Nearest neighbors of item i
R = {(u, i)}                 Examination records
s_{ii'}^{(p)}                Predefined similarity between items i and i'
s_{ii'}^{(l)}                Learned similarity between items i and i'
s_{ii'}^{(m)}                Mixed similarity between items i and i'
U_{u.} ∈ R^{1×d}             User-specific latent feature vector
V_{i.}, W_{i.} ∈ R^{1×d}     Item-specific latent feature vectors
b_i                          Item bias
r̂_{ui}^{(p)}, r̂_{ui}^{(l)}, r̂_{ui}^{(m)}   Predicted preferences
T                            Iteration number
λ_s, α                       Tradeoff parameters
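To make the notation in Table 1 concrete, here is a minimal Python sketch (ours, not the authors') that builds the index sets U_i and I_u from raw examination records; the toy records and variable names are illustrative assumptions, and the later sketches in this paper reuse these two structures.

```python
from collections import defaultdict

# Hypothetical toy examination records R = {(u, i)}; in practice these
# would be (user, item) pairs extracted from browsing or watching logs.
R = [(0, 10), (0, 11), (1, 10), (2, 12)]

users_of_item = defaultdict(set)  # U_i: users that examined item i
items_of_user = defaultdict(set)  # I_u: items examined by user u
for u, i in R:
    users_of_item[i].add(u)
    items_of_user[u].add(i)
```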

2.2. Recommendation with predefined similarity

Most memory-based or neighborhood-based recommendation methods adopt some predefined similarity measurement, such as the well-known Cosine similarity between item i and item i',

s_{ii'}^{(p)} = \frac{|U_i \cap U_{i'}|}{\sqrt{|U_i|\,|U_{i'}|}},    (1)

where |U_i|, |U_{i'}| and |U_i ∩ U_{i'}| denote the numbers of users who examined item i, item i', and both items, respectively. Once the similarity of each item pair has been calculated, we can obtain a set of most similar items to a target item i, denoted as N_i. Finally, we can make a prediction for the preference of user u on item i as follows [5],

\hat{r}_{ui}^{(p)} = \sum_{i' \in I_u \cap N_i} s_{ii'}^{(p)},    (2)

where I_u ∩ N_i denotes the intersection of the items examined by user u, i.e., I_u, and the nearest neighbors of item i, i.e., N_i. The prediction rule in Eq. (2) represents the overall similarity between the target item i and the most similar items examined by user u, i.e., I_u ∩ N_i. With the predicted preference on each (user, item) pair, we can rank the items and recommend a few items with the highest preference scores. Note that the preference prediction rule in Eq. (2) is usually called item-oriented recommendation, parallel to the user-oriented recommendation based on a predefined similarity between each two users.
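As an illustration of Eqs. (1) and (2), the following hedged Python sketch computes the predefined Cosine similarity over implicit feedback and the item-oriented prediction; the function names and the neighborhood size k are our choices, and `users_of_item`/`items_of_user` are the index sets built in the sketch of Section 2.1.

```python
import heapq
from math import sqrt

def cosine_sim(i, i2, users_of_item):
    """Predefined similarity of Eq. (1): |U_i ∩ U_i'| / sqrt(|U_i| |U_i'|)."""
    ui, ui2 = users_of_item[i], users_of_item[i2]
    if not ui or not ui2:
        return 0.0
    return len(ui & ui2) / sqrt(len(ui) * len(ui2))

def predict_pref(u, i, users_of_item, items_of_user, k=20):
    """Item-oriented prediction of Eq. (2): sum the similarities between
    the target item i and the items in N_i that user u has examined."""
    sims = {i2: cosine_sim(i, i2, users_of_item)
            for i2 in users_of_item if i2 != i}
    neighbors = set(heapq.nlargest(k, sims, key=sims.get))  # N_i
    return sum(sims[i2] for i2 in items_of_user[u] & neighbors)
```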

2.3. Recommendation with learned similarity

The predefined similarity has good interpretability and eases defect tracking as well as system operation and maintenance. However, its main drawback is the lack of transitivity: two items will not be connected if they share no common user (U_i ∩ U_{i'} = ∅). As a result, such an unconnected item i' will not contribute to the preference prediction, since it will not be included in the item set N_i in Eq. (2). In order to address this limitation, a learned similarity is proposed [14],

s_{ii'}^{(l)} = V_{i.} W_{i'.}^{T},    (3)

where the similarity between item i and item i' is replaced by the inner product of two latent feature vectors, i.e., V_{i.} ∈ R^{1×d} for item i and W_{i'.} ∈ R^{1×d} for item i'. The inner product of two learned latent feature vectors is unlikely to be zero. Such a non-zero similarity means that any two items can be connected, which overcomes the aforementioned intransitivity of the predefined similarity. With the learned similarity s_{ii'}^{(l)}, a similar prediction rule can then be derived [14],

\hat{r}_{ui}^{(l)} = b_i + \frac{1}{\sqrt{|I_u \setminus \{i\}|}} \sum_{i' \in I_u \setminus \{i\}} s_{ii'}^{(l)},    (4)

where b_i is the newly added item bias denoting the learned popularity of item i, |I_u \ {i}| is the size of the item set I_u excluding item i, and 1/\sqrt{|I_u \setminus \{i\}|} is a normalization term for comparability among predictions from different users. For a ranking-oriented recommendation model, we usually do not include the user bias because it does not affect the ranking order for a given user. Note that a recommendation method with learned similarity is usually not a staged method like item-oriented recommendation with its separate steps of similarity calculation and neighborhood construction. Hence, we do not include the item set N_i in the prediction rule in Eq. (4), but use I_u \ {i} instead for simplicity [14].
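A minimal sketch of the prediction rule in Eq. (4), assuming the latent factors V and W (each an m×d NumPy array) and the bias vector b have already been learned; the function name is ours.

```python
import numpy as np

def predict_learned(u, i, V, W, b, items_of_user):
    """Prediction rule of Eq. (4): r_ui = b_i +
    (1 / sqrt(|I_u \ {i}|)) * sum_{i' in I_u \ {i}} V_i . W_i'^T."""
    rated = [i2 for i2 in items_of_user[u] if i2 != i]  # I_u \ {i}
    if not rated:
        return float(b[i])
    sims = W[rated] @ V[i]  # learned similarities s_ii' of Eq. (3)
    return float(b[i] + sims.sum() / np.sqrt(len(rated)))
```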

2.4. Recommendation with pairwise preference assumption

For the studied problem of recommendation with implicit feedback, besides the prediction rule based on the predefined or learned similarity, another important issue is the preference assumption. A proper preference assumption is usually proposed to digest the uncertain implicit feedback from different perspectives. There are typically three types of preference assumptions: the pointwise, pairwise and listwise preference assumptions. Empirically, the pairwise preference assumption [27] is usually more accurate than the pointwise assumption and more efficient than the listwise assumption, and is thus a good balance between accuracy and efficiency. In pairwise preference learning [27], we usually assume that a user prefers an examined item to an unexamined one. Specifically, there are three different instantiations of this assumption, including (i) r̂_{ui} > r̂_{uj}, meaning user u prefers item i over item j [27], (ii) r̂_{ui} − r̂_{uj} = 1, meaning the preference difference is a positive constant [11,14,31], and (iii) an extension with user groups [24].
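As a concrete illustration of the first instantiation, models such as BPR minimize the logistic loss −ln σ(r̂_ui − r̂_uj) over sampled triples; a one-line hedged sketch (the function name is ours):

```python
import numpy as np

def pairwise_loss(r_ui, r_uj):
    """BPR-style pairwise loss -ln sigma(r_ui - r_uj) for one sampled
    triple (u, i, j); minimizing it pushes the examined item i above
    the unexamined item j in the ranking."""
    return float(-np.log(1.0 / (1.0 + np.exp(-(r_ui - r_uj)))))
```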

Fig. 1. Illustration of mixed similarity learning for recommendation with implicit feedback.

3. Our solution

3.1. Mixed similarity model

The predefined similarity s_{ii'}^{(p)} is known to be good at capturing local relations among items, while the learned similarity s_{ii'}^{(l)} can model global relations well because of its transitivity. We propose to combine the predefined similarity and the learned similarity in a principled way in order to make full use of both. Specifically, (i) we first use the predefined similarity to augment the learned similarity, i.e., s_{ii'}^{(p)} s_{ii'}^{(l)}, which is a typical way of integrating two different signals [19]; and (ii) we then further integrate the augmented similarity with the learned similarity because of the latter's superiority in preference prediction in most cases, which will also be discussed in our empirical studies. The resulting similarity is expected to capture both the local and the global relations among items. We illustrate the main idea of our mixed similarity learning in Fig. 1. Mathematically, we formalize our mixed similarity model via a linear combination as follows,

(1 - \lambda_s)\, s_{ii'}^{(l)} + \lambda_s\, s_{ii'}^{(p)} s_{ii'}^{(l)},    (5)

where the tradeoff parameter λ_s ∈ [0, 1] is the weight on the augmented similarity s_{ii'}^{(p)} s_{ii'}^{(l)}. When λ_s = 0, it reduces to the learned similarity only [14]; when λ_s = 1, it is similar to the augmented similarity using auxiliary content information [19]; and when 0 < λ_s < 1, we have a mixed similarity model, which usually performs the best in our empirical studies.

3.2. Factored mixed similarity model

In order to overcome the limitation of intransitivity, we follow the seminal work [14] and use two latent feature vectors to represent each learned similarity, i.e., s_{ii'}^{(l)} = V_{i.} W_{i'.}^{T}, and obtain a factored version of the mixed similarity,

s_{ii'}^{(m)} = \big( (1 - \lambda_s) + \lambda_s s_{ii'}^{(p)} \big)\, V_{i.} W_{i'.}^{T}.    (6)

With the factored mixed similarity, we can estimate the preference of user u on item i in a similar way to that of [14], i.e., \hat{r}_{ui}^{(m)} = b_i + \frac{1}{\sqrt{|I_u \setminus \{i\}|}} \sum_{i' \in I_u \setminus \{i\}} s_{ii'}^{(m)} for an examined item i ∈ I_u, and \hat{r}_{uj}^{(m)} = b_j + \frac{1}{\sqrt{|I_u|}} \sum_{i' \in I_u} s_{ji'}^{(m)} for an unexamined item j ∉ I_u. We can see that the only difference between the prediction rule of \hat{r}_{ui}^{(l)} in FISM [14], as shown in Eq. (4), and ours is the mixed similarity s_{ii'}^{(m)}. Hence, we call our similarity model the factored mixed similarity model (FMSM).
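A hedged sketch of the FMSM prediction rule of Eq. (6), reusing `cosine_sim` from the sketch in Section 2.2 for s^(p); the flag `examined` switches between the two cases above, and all names are our assumptions.

```python
import numpy as np

def predict_fmsm(u, i, V, W, b, items_of_user, users_of_item, lam_s,
                 examined=True):
    """FMSM prediction with the factored mixed similarity of Eq. (6);
    for an examined item, the target itself is excluded from I_u."""
    rated = [i2 for i2 in items_of_user[u] if not (examined and i2 == i)]
    if not rated:
        return float(b[i])
    # s_ii'^(m) = ((1 - lam_s) + lam_s * s_ii'^(p)) * (V_i . W_i'^T)
    pre = np.array([cosine_sim(i, i2, users_of_item) for i2 in rated])
    sims = ((1.0 - lam_s) + lam_s * pre) * (W[rated] @ V[i])
    return float(b[i] + sims.sum() / np.sqrt(len(rated)))
```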

Fig. 2. The SGD algorithm for P-FMSM.

3.3. Pairwise factored mixed similarity model

Furthermore, in order to develop a recommendation algorithm, we adopt the well-known pairwise preference assumption [27], and assume that the preference of an examined item i ∈ I_u is larger than that of an unexamined item j ∉ I_u, i.e., r̂_{ui} > r̂_{uj}. For this reason, we call our algorithm the pairwise factored mixed similarity model (P-FMSM). With such a pairwise preference assumption, we reach the objective function of our P-FMSM,

\min_{\Theta} \sum_{u \in U} \sum_{i \in I_u} \sum_{j \in I \setminus I_u} f_{uij}^{(m)},    (7)

where f_{uij}^{(m)} = -\ln \sigma(\hat{r}_{uij}^{(m)}) + \frac{\alpha}{2} \|V_{i.}\|^2 + \frac{\alpha}{2} \|V_{j.}\|^2 + \frac{\alpha}{2} \sum_{i' \in I_u} \|W_{i'.}\|_F^2 + \frac{\alpha}{2} b_i^2 + \frac{\alpha}{2} b_j^2 is the tentative objective function for a sampled triple (u, i, j), with \hat{r}_{uij}^{(m)} = \hat{r}_{ui}^{(m)} - \hat{r}_{uj}^{(m)}, and \Theta = \{W_{i.}, V_{i.}, b_i,\ i = 1, 2, \ldots, m\} denotes the set of parameters to be learned. Note that although the objective function of P-FMSM in Eq. (7) looks very similar to that of BPR [27], the prediction rule is totally different because of the proposed factored mixed similarity model. The similarity measurement is actually a very fundamental issue, as discussed before, and is also our focus in this paper.

3.4. Learning the P-FMSM

For learning the model parameters θ ∈ Θ in Eq. (7), we use the popular stochastic gradient descent (SGD) algorithmic framework [27], which is described in Fig. 2. Note that the P-FMSM algorithm in Fig. 2 refers to the "Learn Parameters" step in the middle part of Fig. 1. Specifically, our algorithm contains two loops, where the outer loop is over the number of scans, and the inner loop is over one scan of the whole set of feedback. For each iteration of the inner loop, we first randomly sample an examined (user, item) pair (u, i) ∈ R and then sample an unexamined item j ∈ I \ I_u, which is a typical sampling strategy in recommendation methods based on the pairwise preference assumption [27]. Once a triple (u, i, j) is sampled, we can calculate the gradients and use them to update the model parameters via θ ← θ − γ∇θ with γ > 0 as the learning rate. The gradients of the model parameters, ∇θ = ∂f_{uij}^{(m)}/∂θ, are as follows,

\nabla b_j = -\sigma(-\hat{r}_{uij}^{(m)})(-1) + \alpha b_j,    (8)

\nabla V_{j.} = -\sigma(-\hat{r}_{uij}^{(m)}) \Big( -\frac{1}{\sqrt{|I_u|}} \sum_{i' \in I_u} \big( (1-\lambda_s) + \lambda_s s_{ji'}^{(p)} \big) W_{i'.} \Big) + \alpha V_{j.},    (9)

\nabla b_i = -\sigma(-\hat{r}_{uij}^{(m)}) + \alpha b_i,    (10)

\nabla V_{i.} = -\sigma(-\hat{r}_{uij}^{(m)}) \frac{1}{\sqrt{|I_u \setminus \{i\}|}} \sum_{i' \in I_u \setminus \{i\}} \big( (1-\lambda_s) + \lambda_s s_{ii'}^{(p)} \big) W_{i'.} + \alpha V_{i.},    (11)

\nabla W_{i'.} = -\sigma(-\hat{r}_{uij}^{(m)}) \Big[ \frac{ \big( (1-\lambda_s) + \lambda_s s_{ii'}^{(p)} \big) V_{i.} }{ \sqrt{|I_u \setminus \{i\}|} } - \frac{ \big( (1-\lambda_s) + \lambda_s s_{ji'}^{(p)} \big) V_{j.} }{ \sqrt{|I_u|} } \Big] + \alpha W_{i'.}, \quad i' \in I_u \setminus \{i\},    (12)

\nabla W_{i.} = -\sigma(-\hat{r}_{uij}^{(m)}) \frac{ -\big( (1-\lambda_s) + \lambda_s s_{ji}^{(p)} \big) V_{j.} }{ \sqrt{|I_u|} } + \alpha W_{i.},    (13)

where \hat{r}_{uij}^{(m)} = \hat{r}_{ui}^{(m)} - \hat{r}_{uj}^{(m)}. From the algorithm in Fig. 2, we can see that the time complexity of our P-FMSM is O(T|R|dρ), and that of BPR is O(T|R|d), where ρ denotes the average number of examined items by each user.
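Fig. 2 is given as a figure; the following Python sketch is our reading of that SGD procedure and of Eqs. (8)–(13), with the initialization scale, the `pre_sim` callback and other details as assumptions rather than the authors' exact implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_pfmsm(R, n_items, items_of_user, pre_sim,
                d=20, T=500, gamma=0.01, alpha=0.001, lam_s=0.4, seed=0):
    """A sketch of the SGD loop of Fig. 2 under our reading of Eqs.
    (8)-(13); pre_sim(i, i2) returns the predefined similarity s^(p)."""
    rng = np.random.default_rng(seed)
    V = rng.normal(0.0, 0.01, (n_items, d))  # target-item factors V_i.
    W = rng.normal(0.0, 0.01, (n_items, d))  # context-item factors W_i'.
    b = np.zeros(n_items)                    # item biases b_i
    R = list(R)
    for _ in range(T):                 # outer loop: T scans
        for _ in range(len(R)):        # inner loop: one scan of R
            u, i = R[rng.integers(len(R))]  # sample an examined pair
            j = int(rng.integers(n_items))  # sample an unexamined item
            while j in items_of_user[u]:
                j = int(rng.integers(n_items))
            rated_i = [i2 for i2 in items_of_user[u] if i2 != i]  # I_u\{i}
            rated_j = list(items_of_user[u])                      # I_u
            ci = np.array([(1 - lam_s) + lam_s * pre_sim(i, i2)
                           for i2 in rated_i])
            cj = np.array([(1 - lam_s) + lam_s * pre_sim(j, i2)
                           for i2 in rated_j])
            ni = np.sqrt(max(len(rated_i), 1))
            nj = np.sqrt(len(rated_j))
            r_ui = b[i] + (ci * (W[rated_i] @ V[i])).sum() / ni
            r_uj = b[j] + (cj * (W[rated_j] @ V[j])).sum() / nj
            e = -sigmoid(-(r_ui - r_uj))  # common factor -sigma(-r_uij)
            # Eqs. (8) and (10): bias gradients
            gb_j = -e + alpha * b[j]
            gb_i = e + alpha * b[i]
            # Eqs. (9) and (11): gradients of V_j. and V_i.
            gV_j = -e / nj * (cj[:, None] * W[rated_j]).sum(axis=0) \
                   + alpha * V[j]
            gV_i = e / ni * (ci[:, None] * W[rated_i]).sum(axis=0) \
                   + alpha * V[i]
            # Eqs. (12) and (13): gradients of W_i'. for every i' in I_u
            for k, i2 in enumerate(rated_j):
                g = -e * cj[k] / nj * V[j] + alpha * W[i2]
                if i2 != i:  # Eq. (12) adds the positive V_i term
                    g += e * ci[rated_i.index(i2)] / ni * V[i]
                W[i2] -= gamma * g
            b[j] -= gamma * gb_j
            b[i] -= gamma * gb_i
            V[j] -= gamma * gV_j
            V[i] -= gamma * gV_i
    return V, W, b
```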

4. Experimental results

4.1. Datasets

For direct comparative empirical studies, we use four public datasets (https://sites.google.com/site/weikep/GBPRdata.zip), where the training and test records of each dataset are obtained via a random splitting of the converted implicit feedback [24]. Specifically, the first dataset, MovieLens100K, contains 943 users, 1682 items, 27,688 training records and 27,687 test records; the second dataset, MovieLens1M, contains 6040 users, 3952 items, 287,641 training records and 287,640 test records; the third dataset, UserTag, contains 3000 users, 2000 items, 123,218 training records and 123,218 test records; and the fourth dataset, Netflix5K5K, contains 5000 users, 5000 items, 77,936 training records and 77,936 test records. Each dataset contains three copies of training records and test records. For the first copy of each dataset, we construct validation data for parameter tuning by sampling n (i.e., the number of users) records from the corresponding training records. We put the statistics of the first copy of each dataset in Table 2.

Table 2
Statistics of the data used in the experiments, including numbers of users (|U|), items (|I|), implicit feedback (|R|) of training records (Tr.) and test records (Te.), and densities (|R|/|U|/|I|) of training records.

Data set         |U|     |I|     |R| (Tr.)   |R| (Te.)   |R|/|U|/|I| (Tr.)
MovieLens100K    943     1682    27,688      27,687      1.75%
MovieLens1M      6040    3952    287,641     287,640     1.21%
UserTag          3000    2000    123,218     123,218     2.05%
Netflix5K5K      5000    5000    77,936      77,936      0.31%

4.2. Evaluation metrics

For recommendation with implicit feedback, a user is usually only interested in a few top-ranked items (say, 5) in the recommendation list rather than their predicted scores [3]. We thus use two commonly adopted ranking-oriented evaluation metrics in recommendation and information retrieval [24], i.e., Prec@5 and NDCG@5, for precision and normalized discounted cumulative gain (NDCG) at position 5, respectively. Mathematically, Prec@5 and NDCG@5 are defined as follows [26],

Prec@5 = \frac{1}{|U^{te}|} \sum_{u \in U^{te}} \sum_{p=1}^{5} \frac{\delta(L_u(p) \in I_u^{te})}{5},

NDCG@5 = \frac{1}{|U^{te}|} \sum_{u \in U^{te}} \left[ \left( \sum_{p=1}^{5} \frac{2^{\delta(L_u(p) \in I_u^{te})} - 1}{\log(p+1)} \right) \Big/ \left( \sum_{p=1}^{\min(5, |I_u^{te}|)} \frac{1}{\log(p+1)} \right) \right],

where L_u(p) is the pth item for user u in the recommendation list, and δ(x) is an indicator function with value 1 if x is true and 0 otherwise. Note that U^{te} and I_u^{te} denote the set of test users and the set of examined items by user u in the test data, respectively.
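A per-user sketch of the two metrics, under the assumption that `ranked_list` holds at least five recommended items and `test_items` (I_u^te) is non-empty; the reported numbers are averages of these per-user values over all test users.

```python
import numpy as np

def prec_ndcg_at_5(ranked_list, test_items):
    """Prec@5 and NDCG@5 for one test user u: `ranked_list` is the
    recommendation list L_u, `test_items` is the test set I_u^te."""
    hits = [1.0 if ranked_list[p] in test_items else 0.0 for p in range(5)]
    prec = sum(hits) / 5.0
    dcg = sum((2.0 ** h - 1.0) / np.log(p + 2) for p, h in enumerate(hits))
    idcg = sum(1.0 / np.log(p + 2) for p in range(min(5, len(test_items))))
    return prec, dcg / idcg
```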

4.3. Baselines and parameter settings

In order to study the effectiveness of the developed algorithm with the designed mixed similarity model in the pairwise preference learning framework, we choose state-of-the-art methods based on the predefined similarity, the learned similarity or the pairwise preference assumption. Specifically, we include the following baselines:


• PopRank: ranking based on items' popularity, a basic algorithm for recommendation with implicit feedback;
• ICF: item-oriented collaborative filtering with Cosine similarity [5], a classical algorithm based on the predefined similarity;
• ICF-A: ICF with an amplifier on the similarity [2], which favors the items with higher similarity;
• FISMauc and FISMrmse: factored item similarity models with AUC loss and RMSE loss, respectively [14], two seminal algorithms based on the learned similarity;
• HPF: hierarchical Poisson factorization [8], a recent Bayesian approach tailored for modeling implicit feedback in recommender systems;
• BPR: Bayesian personalized ranking [27], the first work that introduces the pairwise preference assumption into the task of recommendation with implicit feedback, and among the best performers in most cases;
• FA-KNN: adaptive k-nearest-neighbors based recommendation (A-KNN) [27] generalizes the pairwise preference assumption to ICF with similarity learning. However, the learned similarity in A-KNN is not factorized, and thus has the same intransitivity limitation. For fair comparison, we implement a factored version of adaptive KNN (FA-KNN); and
• GBPR: group preference based BPR [24], an extended pairwise preference learning method considering user groups.

For ICF and ICF-A, we use the Cosine similarity and |N_i| = 20 in neighborhood construction. For the amplifier in ICF-A, we use 2.5 as a typical value [2]. For the factorization-based methods, i.e., FISMauc, FISMrmse, BPR, FA-KNN, GBPR and our P-FMSM, we use d = 20 latent features, and search the tradeoff parameter on the regularization terms α ∈ {0.001, 0.01, 0.1} and the iteration number T ∈ {100, 500, 1000} using NDCG@5. For the Bayesian method HPF, we use the public code (https://github.com/premgopalan/hgaprec) and the default parameter configuration, i.e., 20 for the latent dimension number, 10 for the iteration-number multiple, 0.3 for the hyperparameter, and true for bias and hierarchy. The tradeoff parameter λ_s for the mixed similarity in P-FMSM is also searched via NDCG@5 from {0.2, 0.4, 0.6, 0.8, 1}. The learning rate γ is fixed as 0.01. The results of GBPR are copied from [24]. Note that the sampling strategy of BPR and P-FMSM in our empirical studies is shown in Fig. 2, which is more efficient than that of BPR and GBPR in [24].

Table 3
Recommendation performance of P-FMSM and other methods on MovieLens100K using Prec@5 and NDCG@5. The searched values of α, T and λ_s are also included for result reproducibility. Note that the results of GBPR are copied from [24]. The significantly best results are marked in bold (p-value < 0.01).

Method                                   Prec@5            NDCG@5
PopRank                                  0.2724 ± 0.0094   0.2915 ± 0.0072
ICF                                      0.3145 ± 0.0018   0.3305 ± 0.0010
ICF-A                                    0.2820 ± 0.0048   0.2943 ± 0.0091
FISMauc (α = 0.01, T = 100)              0.3651 ± 0.0086   0.3788 ± 0.0139
FISMrmse (α = 0.001, T = 100)            0.3987 ± 0.0085   0.4153 ± 0.0092
HPF                                      0.3412 ± 0.0077   0.3523 ± 0.0097
BPR (α = 0.1, T = 1000)                  0.3627 ± 0.0079   0.3779 ± 0.0079
FA-KNN (α = 0.1, T = 1000)               0.3679 ± 0.0044   0.3868 ± 0.0076
GBPR                                     0.4051 ± 0.0038   0.4201 ± 0.0031
P-FMSM (α = 0.001, T = 500, λ_s = 0.4)   0.4115 ± 0.0059   0.4320 ± 0.0054

Table 4
Recommendation performance of P-FMSM and other methods on MovieLens1M using Prec@5 and NDCG@5. The searched values of α, T and λ_s are also included for result reproducibility. Note that the results of GBPR are copied from [24]. The significantly best results are marked in bold (p-value < 0.01).

Method                                   Prec@5            NDCG@5
PopRank                                  0.2822 ± 0.0019   0.2935 ± 0.0010
ICF                                      0.3831 ± 0.0021   0.3966 ± 0.0020
ICF-A                                    0.3921 ± 0.0016   0.4066 ± 0.0022
FISMauc (α = 0.001, T = 100)             0.3502 ± 0.0069   0.3569 ± 0.0082
FISMrmse (α = 0.001, T = 100)            0.4227 ± 0.0006   0.4366 ± 0.0011
HPF                                      0.4131 ± 0.0080   0.4262 ± 0.0096
BPR (α = 0.01, T = 1000)                 0.4195 ± 0.0013   0.4307 ± 0.0008
FA-KNN (α = 0.01, T = 100)               0.3650 ± 0.0026   0.3808 ± 0.0017
GBPR                                     0.4494 ± 0.0020   0.4636 ± 0.0014
P-FMSM (α = 0.001, T = 500, λ_s = 0.4)   0.4562 ± 0.0037   0.4731 ± 0.0038


Table 5
Recommendation performance of P-FMSM and other methods on UserTag using Prec@5 and NDCG@5. The searched values of α, T and λ_s are also included for result reproducibility. Note that the results of GBPR are copied from [24]. The significantly best results are marked in bold (p-value < 0.01).

Method                                   Prec@5            NDCG@5
PopRank                                  0.2647 ± 0.0012   0.2730 ± 0.0014
ICF                                      0.2257 ± 0.0051   0.2306 ± 0.0045
ICF-A                                    0.2480 ± 0.0028   0.2540 ± 0.0032
FISMauc (α = 0.1, T = 500)               0.2536 ± 0.0031   0.2619 ± 0.0037
FISMrmse (α = 0.001, T = 100)            0.3006 ± 0.0046   0.3092 ± 0.0049
HPF                                      0.2684 ± 0.0070   0.2756 ± 0.0074
BPR (α = 0.1, T = 500)                   0.2849 ± 0.0036   0.2931 ± 0.0047
FA-KNN (α = 0.1, T = 1000)               0.2641 ± 0.0046   0.2720 ± 0.0036
GBPR                                     0.3011 ± 0.0008   0.3104 ± 0.0009
P-FMSM (α = 0.001, T = 500, λ_s = 0.6)   0.3129 ± 0.0026   0.3218 ± 0.0030

Table 6
Recommendation performance of P-FMSM and other methods on Netflix5K5K using Prec@5 and NDCG@5. The searched values of α, T and λ_s are also included for result reproducibility. Note that the results of GBPR are copied from [24]. The significantly best results are marked in bold (p-value < 0.01).

Method                                    Prec@5            NDCG@5
PopRank                                   0.1728 ± 0.0012   0.1794 ± 0.0004
ICF                                       0.2017 ± 0.0012   0.2186 ± 0.0008
ICF-A                                     0.1653 ± 0.0032   0.1804 ± 0.0041
FISMauc (α = 0.01, T = 500)               0.1932 ± 0.0039   0.2022 ± 0.0023
FISMrmse (α = 0.001, T = 100)             0.2217 ± 0.0043   0.2413 ± 0.0063
HPF                                       0.1899 ± 0.0047   0.2044 ± 0.0041
BPR (α = 0.01, T = 1000)                  0.2207 ± 0.0051   0.2374 ± 0.0055
FA-KNN (α = 0.001, T = 100)               0.2101 ± 0.0020   0.2254 ± 0.0029
GBPR                                      0.2411 ± 0.0027   0.2611 ± 0.0025
P-FMSM (α = 0.001, T = 1000, λ_s = 0.2)   0.2469 ± 0.0027   0.2661 ± 0.0027

4.4. Experimental results

4.4.1. Overall results

The main experimental results are shown in Tables 3–6. We have the following observations: (i) our proposed P-FMSM performs significantly better than all the other methods in all cases (the p-value of the significance test is smaller than 0.01), which clearly shows the advantage of integrating the predefined and learned similarities in the pairwise preference learning framework; (ii) the recommendation methods based on the learned similarity (e.g., FISMauc, FISMrmse, FA-KNN and our P-FMSM) usually perform better than those based on the predefined similarity (i.e., ICF and ICF-A), which shows the helpfulness of similarity learning for a recommendation algorithm; (iii) the recommendation methods based on the pairwise preference assumption (e.g., BPR, FA-KNN, GBPR and our P-FMSM) perform well, which shows its effectiveness for uncertain implicit feedback; and (iv) the very basic method PopRank usually performs worst, as expected, since it is a non-personalized solution and recommends the same set of most popular items to all users. We calculate the p-values of the statistical significance tests via the MATLAB function ttest2.m (http://www.mathworks.com/help/stats/ttest2.html).

4.4.2. Effect of mixed similarity

In this subsection, we study the effect of the mixed similarity in P-FMSM. For direct comparison, we fix λ_s = 0 and obtain a reduced model, P-FISM, with the unmixed learned similarity only. We show the results in Fig. 3, and can see that P-FMSM is better than P-FISM across all four datasets, which clearly shows the merit of the mixed similarity via exploiting the complementarity of the predefined similarity and the learned similarity in the pairwise preference learning framework.

Fig. 3. Recommendation performance of P-FISM (i.e., with the learned similarity only, λ_s = 0) and P-FMSM. Note that the searched values of P-FISM are α = 0.01, T = 1000 on MovieLens100K; α = 0.001, T = 500 on MovieLens1M; α = 0.001, T = 100 on UserTag; and α = 0.001, T = 500 on Netflix5K5K.

4.4.3. Effect of neighborhood size

In this subsection, we study the effect of using different neighborhood sizes, i.e., K ∈ {20, 30, 40, 50}, in P-FMSM, instead of using all the neighbors by default. For fair comparison, we also use the same numbers of neighbors in ICF. The recommendation performance on Prec@5 and NDCG@5 is shown in Figs. 4 and 5, respectively. We have the following observations: (i) both the default P-FMSM and P-FMSM with different K achieve significantly better results than ICF, by about 23% on MovieLens100K, 10% on MovieLens1M, 28% on UserTag, and 15% on Netflix5K5K, w.r.t. the corresponding neighborhood sizes K ∈ {20, 30, 40, 50}, which shows the effectiveness of the developed algorithm; and (ii) the performance of P-FMSM and P-FMSM with different K is close, which shows that we can use a properly sized neighborhood instead of all the neighbors to achieve low space complexity, an appealing property for real deployment by industry practitioners.

Fig. 4. Recommendation performance of Prec@5 using ICF and P-FMSM with different values of K (i.e., different neighborhood sizes), and the default P-FMSM with all neighbors.

Fig. 5. Recommendation performance of NDCG@5 using ICF and P-FMSM with different values of K (i.e., different neighborhood sizes), and the default P-FMSM with all neighbors.

The above experimental results show that our P-FMSM is significantly more accurate than the state-of-the-art recommendation methods for the studied problem, which shows the effectiveness of the mixed similarity learning model via integrating the predefined similarity, the learned similarity and pairwise preference learning in one single algorithm. One major potential limitation is its effectiveness in an extremely sparse case, because neither the predefined similarity nor the learned similarity can guarantee to succeed in bridging two latently related items well.
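The significance test used above is MATLAB's ttest2; an equivalent hedged sketch in Python via SciPy, where the score vectors are hypothetical placeholders rather than the reported results:

```python
from scipy import stats

# Hypothetical NDCG@5 scores of two methods over the three data copies.
pfmsm = [0.4320, 0.4301, 0.4339]
gbpr = [0.4201, 0.4188, 0.4214]
t, p = stats.ttest_ind(pfmsm, gbpr)  # two-sample t-test, like ttest2.m
print(p < 0.01)
```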

5. Related work

In the big data era, the information overload problem has become unavoidable for almost everyone. We are submerged in a world with various types of information, from social updates to recent scientific and technological advances. So far, there are mainly two approaches that address the information overload problem to some extent: information retrieval [4] and intelligent recommendation [28]. The paradigm of information retrieval goes from a user-issued query to a ranked list of documents w.r.t. a certain relevance metric, which requires proactive involvement of the user. On the contrary, an intelligent recommendation engine exploits users' historical behaviors and then generates a personalized ranked list of items (e.g., web pages [20], tags [1], travel sequences [13] and many others [18]) for a target user or a group of target users [34] without intensive user involvement. Technically, besides the very basic approach of recommending based on items' popularity (i.e., PopRank), there are mainly two types of recommendation methods: neighborhood-based recommendation and factorization-based recommendation. We discuss these two types separately in the following two subsections.

5.1. Neighborhood-based recommendation

There are usually two types of user feedback in a typical recommendation system [12,28]: explicit feedback such as 5-star ratings, and implicit feedback such as examination records. For either recommendation with explicit feedback or with implicit feedback, neighborhood construction is always a very critical step [21], which requires some similarity measurement between two users or two items. For explicit feedback, the most well-known similarity measurement is probably the Pearson correlation coefficient (PCC) [7,9], defined on the items co-rated by two users or the co-raters of two items. For implicit feedback, we have similarities such as the Cosine similarity, the Jaccard index and their extensions [5,33]. Note that the similarity between two users (or two items), such as the Cosine similarity, may be further amplified to favor users (or items) with high similarity [2].




Table 7
Summary of some related works.

                    Neighborhood-based recommendation     Factorization-based recommendation
Explicit feedback   Predefined similarity: e.g., [7]      w/o similarity: e.g., [29,35]
                                                          Learned similarity: e.g., [15]
                                                          Mixed similarity: N/A
Implicit feedback   Predefined similarity: e.g., [2,5]    w/o similarity: e.g., [8,10,22,24,27,30,36]
                                                          Learned similarity: e.g., [14]
                                                          Mixed similarity: e.g., P-FMSM

5.2. Factorization-based recommendation

Factorization-based methods have been well recognized as the state-of-the-art methods for most recommendation problems. For explicit feedback, a typical factorization-based method approximates each feedback, e.g., a numerical rating, with the inner product of two latent feature vectors [29], or additionally with the feature vectors of implicitly examined items [15]. There are also some factorization-based methods that directly optimize the ranking of the relative orderings derived from explicit feedback [35]. For implicit feedback, a factorization-based method is usually associated with a preference assumption, including the pointwise preference assumption [10,14,22], the pairwise preference assumption [14,24,27,36], and the listwise preference assumption [30]. A specific preference assumption is usually associated with a loss to be minimized or an objective to be maximized, such as the square loss [10,22] for the pointwise assumption; the logistic loss [27], the hinge loss [36] and the square loss on the relative preference difference [14] for the pairwise assumption; and the mean reciprocal rank (MRR) objective [30] for the listwise assumption. For the prediction rule in a typical loss or objective function, we usually have two choices: the inner product of two latent feature vectors [10,22,27,30,36], or the prediction rule of a neighborhood-based method [14]. Besides, a recent approach digests implicit user feedback in a Bayesian framework [8] instead of the well-known matrix factorization framework.

Similarity between two items or two users is a very fundamental issue in recommendation. Our mixed similarity learning model goes one step beyond traditional unmixed similarities such as the predefined similarity or the learned similarity alone, and inherits the merits of both. The designed mixed similarity learning model is also generic and can be applied to recommendation settings with heterogeneous feedback [25] and social connections [32]. We summarize some related works in Table 7, from which we can see that our proposed mixed similarity learning model is a novel solution for recommendation with implicit feedback.


6. Conclusions and future work

In this paper, we study an important recommendation problem with implicit feedback from the perspective of item similarity. Specifically, we propose a novel mixed similarity learning model that exploits the complementarity of the predefined similarity and the learned similarity commonly used in state-of-the-art recommendation methods. As far as we know, we are the first to study the integration of these two canonical approaches for bridging two items in a principled way. With the mixed similarity, we further develop a novel recommendation algorithm in the pairwise preference learning framework, i.e., the pairwise factored mixed similarity model (P-FMSM). Our P-FMSM recommends significantly better than the state-of-the-art methods with the predefined similarity or the learned similarity alone.

For future work, we are interested in studying the mixed similarity model in extremely sparse cases, generalizing it to cross-domain problem settings with heterogeneous auxiliary data of text and temporal information [16,23], and deploying it in a cloud computing environment [17].

Acknowledgments

We thank the handling editor and reviewers for their expert comments and constructive suggestions. We thank the support of National Natural Science Foundation of China Nos. 61502307 and 61672358, and Natural Science Foundation of Guangdong Province Nos. 2014A030310268 and 2016A030313038.

References

[1] F.M. Belém, C.S. Batista, R.L.T. Santos, J.M. Almeida, M.A. Gonçalves, Beyond relevance: explicitly promoting novelty and diversity in tag recommendation, ACM Trans. Intell. Syst. Technol. 7 (3) (2016) 26:1–26:34.
[2] J.S. Breese, D. Heckerman, C.M. Kadie, Empirical analysis of predictive algorithms for collaborative filtering, in: Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI '98), Madison, Wisconsin, 1998, pp. 43–52.
[3] L. Chen, P. Pu, Users' eye gaze pattern in organization-based recommender interfaces, in: Proceedings of the 16th International Conference on Intelligent User Interfaces (IUI '11), 2011, pp. 311–314.
[4] C.D. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
[5] M. Deshpande, G. Karypis, Item-based top-n recommendation algorithms, ACM Trans. Inf. Syst. 22 (1) (2004) 143–177.
[6] A. Elbadrawy, G. Karypis, User-specific feature-based similarity models for top-n recommendation of new items, ACM Trans. Intell. Syst. Technol. 6 (3) (2015) 57:1–57:22.
[7] D. Goldberg, D. Nichols, B.M. Oki, D. Terry, Using collaborative filtering to weave an information tapestry, Commun. ACM 35 (12) (1992) 61–70.
[8] P. Gopalan, J.M. Hofman, D.M. Blei, Scalable recommendation with hierarchical Poisson factorization, in: Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI '15), 2015, pp. 326–335.
[9] G. Guo, J. Zhang, N. Yorke-Smith, A novel Bayesian similarity measure for recommender systems, in: Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), 2013, pp. 2619–2625.
[10] Y. Hu, Y. Koren, C. Volinsky, Collaborative filtering for implicit feedback datasets, in: Proceedings of the 8th IEEE International Conference on Data Mining (ICDM '08), 2008, pp. 263–272.
[11] M. Jahrer, A. Töscher, Collaborative filtering ensemble for ranking, in: Proceedings of KDD Cup 2011 Competition, 2012, pp. 153–167.
[12] G. Jawaheer, P. Weller, P. Kostkova, Modeling user preferences in recommender systems: a classification framework for explicit and implicit user feedback, ACM Trans. Interact. Intell. Syst. 4 (2) (2014) 8:1–8:26.
[13] S. Jiang, X. Qian, T. Mei, Y. Fu, Personalized travel sequence recommendation on multi-source big social media, IEEE Trans. Big Data 2 (1) (2016) 43–56.
[14] S. Kabbur, X. Ning, G. Karypis, FISM: factored item similarity models for top-n recommender systems, in: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '13), 2013, pp. 659–667.
[15] Y. Koren, Factorization meets the neighborhood: a multifaceted collaborative filtering model, in: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '08), 2008, pp. 426–434.
[16] J. Li, J. Li, X. Fu, M.A. Masud, J.Z. Huang, Learning distributed word representation with multi-contextual mixed embedding, Knowl.-Based Syst. 106 (2016) 220–230.
[17] J. Li, M. Qiu, Z. Ming, G. Quan, X. Qin, Z. Gu, Online optimization for scheduling preemptable tasks on IaaS cloud systems, J. Parallel Distrib. Comput. 72 (5) (2012) 666–677.
[18] J. Lu, D. Wu, M. Mao, W. Wang, G. Zhang, Recommender system application developments: a survey, Decis. Support Syst. 74 (2015) 12–32, doi:10.1016/j.dss.2015.03.008.
[19] Z. Lu, Z. Dou, J. Lian, X. Xie, Q. Yang, Content-based collaborative filtering for news topic recommendation, in: Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI '15), 2015, pp. 217–223.
[20] T.T.S. Nguyen, H. Lu, J. Lu, Web-page recommendation based on web usage and domain knowledge, IEEE Trans. Knowl. Data Eng. 26 (10) (2014) 2574–2587.
[21] X. Ning, C. Desrosiers, G. Karypis, A comprehensive survey of neighborhood-based recommendation methods, in: Recommender Systems Handbook (Second Edition), Springer, New York, 2015, pp. 37–76.
[22] R. Pan, Y. Zhou, B. Cao, N.N. Liu, R. Lukose, M. Scholz, Q. Yang, One-class collaborative filtering, in: Proceedings of the 8th IEEE International Conference on Data Mining (ICDM '08), 2008, pp. 502–511.
[23] W. Pan, A survey of transfer learning for collaborative recommendation with auxiliary data, Neurocomputing 177 (2016) 447–453.
[24] W. Pan, L. Chen, GBPR: group preference based Bayesian personalized ranking for one-class collaborative filtering, in: Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), 2013, pp. 2691–2697.
[25] W. Pan, S. Xia, Z. Liu, X. Peng, Z. Ming, Mixed factorization for collaborative recommendation with heterogeneous explicit feedbacks, Inf. Sci. 332 (2016) 84–93.
[26] W. Pan, H. Zhong, C. Xu, Z. Ming, Adaptive Bayesian personalized ranking for heterogeneous implicit feedbacks, Knowl.-Based Syst. 73 (2015) 173–180.
[27] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme, BPR: Bayesian personalized ranking from implicit feedback, in: Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), 2009, pp. 452–461.
[28] F. Ricci, L. Rokach, B. Shapira, Recommender Systems Handbook (Second Edition), Springer, New York, 2015.
[29] R. Salakhutdinov, A. Mnih, Probabilistic matrix factorization, in: Annual Conference on Neural Information Processing Systems 20 (NIPS '08), 2008, pp. 1257–1264.
[30] Y. Shi, A. Karatzoglou, L. Baltrunas, M. Larson, N. Oliver, A. Hanjalic, CLiMF: learning to maximize reciprocal rank with collaborative less-is-more filtering, in: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys '12), 2012, pp. 139–146.
[31] G. Takács, D. Tikk, Alternating least squares for personalized ranking, in: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys '12), 2012, pp. 83–90.
[32] J. Tang, X. Hu, H. Liu, Social recommendation: a review, Soc. Netw. Anal. Min. 3 (4) (2013) 1113–1133.
[33] K. Verstrepen, B. Goethals, Unifying nearest neighbors collaborative filtering, in: Proceedings of the 8th ACM Conference on Recommender Systems (RecSys '14), 2014, pp. 177–184.
[34] W. Wang, G. Zhang, J. Lu, Member contribution-based group recommender system, Decis. Support Syst. 87 (2016) 80–93.
[35] M. Weimer, A. Karatzoglou, A.J. Smola, Improving maximum margin matrix factorization, Mach. Learn. 72 (3) (2008) 263–276.
[36] S. Yang, B. Long, A.J. Smola, H. Zha, Z. Zheng, Collaborative competitive filtering: learning recommender using context of user choice, in: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '11), 2011, pp. 295–304.
