Cross-Channel Query Recommendation on Commercial Mobile Search Engine: Why, How and Empirical Evaluation

Shunkai Fu1, Bingfeng Pi1, Ying Zhou1, Michel C. Desmarais2, Weilei Wang1, Song Han1, and Xunrong Rao1

1 Roboo Inc.
2 Ecole Polytechnique de Montreal
Abstract. Mobile search not only inherits some features of traditional search on PC, but also has many special characteristics of its own. In this paper, we first share some unique features of mobile search and discuss why vertical search is preferred. Providing multiple vertical searches has proved convenient to users, but it causes some minor problems as well. This motivates us to propose cross-channel query recommendation. Secondly, we briefly introduce how to realize the cross-channel recommendation effectively and efficiently online. Finally, we analyze the performance of the proposed method from three different but related perspectives: expected effect, off-line evaluation and on-line evaluation. All three studies together indicate that the proposed cross-channel recommendation is quite useful. As this is the first study of query recommendation on mobile search, we believe that the findings, proposed solution and collected feedback presented here will be beneficial to both researchers and companies considering how to provide better mobile search service.

Keywords: Cross query recommendation, mobile search, empirical evaluation.
T. Theeramunkong et al. (Eds.): PAKDD 2009, LNAI 5476, pp. 883–890, 2009. © Springer-Verlag Berlin Heidelberg 2009

1 Introduction

Mobile search is an emerging and promising application in China, given the potentially huge user population: over 600 million Chinese consume mobile communication services. As a start-up company, we are growing quickly in the Chinese market by providing an easy-to-use mobile search service. A large amount of firsthand usage data allows us to understand Chinese search behavior, to learn how to do better, and finally to evaluate how well we are doing with the feedback collected. In this paper, we refer to traditional search on a computer as PC search, and to search on a hand-held device as mobile search. We discuss in the following paragraphs why the two kinds of search have to be treated differently. Although the fundamental search algorithms and back-end architecture are similar for PC and mobile search, the latter faces more challenges, which can be summarized briefly as follows:
S. Fu et al.
1. Slow wireless access and processing speed: Neither GSM, GPRS nor EDGE, as available in China, offers transmission speed competitive with typical wired Internet access such as ADSL. Meanwhile, the computing capacity of mobile devices, in terms of CPU and memory, is quite limited;
2. Expensive connection fee: Compared with a wired connection, wireless data traffic is expensive, at more than 1 US dollar per MB;
3. Small screen: Only a limited amount of information can be seen without scrolling down (refer to Figure 1 for an example);
4. Difficult input: Typing on a standard 12-key cell phone is not convenient. Our analysis indicates that the average query on our mobile search engine is shorter than 4 Chinese characters, which is much shorter than on a typical PC search engine and is consistent with the findings reported in [1, 2];
5. Mobile search is brand new for many: A telephone survey we conducted on a randomly selected group of users indicates that most mobile search users have never used PC search before, which is consistent with the fact that there are over 600 million wireless communication users but only 200 million PC users. The survey also tells us that many experienced PC searchers do not have a long history of using mobile search either;
6. Much lower average education level for mobile search users: The survey mentioned above indicates this point as well;
7. One important way of amusement: The mobile phone is becoming one of the most popular pastimes in China because of its portability and intelligent functions. Looking for fun is an important goal for those who use our mobile search service, as indicated by our search log data.
Figure 1(1) is the home page of the Roboo mobile search engine (http://wap.roboo.com). Several vertical channels are provided in addition to page search, e.g., image, ring, MP3, short message, game, software, etc. Here, we refer to those non-page channels as vertical channels, since each is designed for a specific kind of resource. This design is built on the following points:

1. Page search is not enough for mobile search. Given the current keyword-based relevance model and the fact that there are many duplicate or near-duplicate pages, a mobile searcher may find it hard to locate what s/he wants within a long list of results, which becomes worse due to the small screen and slow processing rate;
2. Different from PC search, looking for fun is currently an important motivation for mobile search users. Reading some news, playing a game for a while, or browsing some images is a usual pastime for most mobile phone owners while they are waiting for a bus, train, plane, etc. Separating these resources helps users find what they prefer more easily;
3. Slow access speed and small screens are critical factors that make these vertical channels necessary. Again, asking users to scroll down to locate what they need in a long list of page search results is not convenient. The poor transmission speed makes it worse, since turning to the next page requires an additional connection request. Indexing different resources in independent vertical channels is one solution to this problem, since it allows users to find, for example, MP3s or images related to someone in a direct way;
4. Multimedia resources, like image, video and music, are welcomed by current mobile search users, at least when they are using a mobile device. Our search log indicates that over 70% of the total search traffic comes from non-page channels, which makes us more confident about the decision.
We have so far covered the necessary background about mobile search as well as the search service we provide. Although query recommendation is discussed in many research works [3,4,5] and can be found on most popular commercial search engines, like Google and Baidu (http://www.baidu.com), no effort, to the best of our knowledge, has been spent on cross-channel recommendation and, more importantly, on its importance for mobile search. Therefore, in this paper, we discuss the necessity of providing cross-channel recommendation, how to do it and the expected outcome in Section 2. Then, in Section 3, we demonstrate the effect of our work based on log data. Section 4 gives some concluding remarks.
Fig. 1. A series of examples of the Roboo mobile search engine and snapshots of the cross-channel recommendation as discussed. (1) The primary user interface of the Roboo mobile search engine; (2) Results retrieved for the query “Andy Liu” in the page search channel; (3) Recommendation of an appropriate channel, different from the original one (highlighted with a red rectangle); (4) Related resources retrieved and presented if the user agrees and clicks the recommendation in (3); (5) Recommendation of more related resources about a target (highlighted with a red rectangle); (6) Video results retrieved in response to the recommendation in (5).
2 Cross-Channel Recommendation

2.1 Why We Need It

Although the design of vertical channels allows for easier search and provides convenience for mobile searchers, it brings some problems as well:

1. Users search for a target in the wrong channel. For example, we find the query “images of Andy Liu” (a famous star) in the MP3 channel, and “songs of Andy Liu” in the image channel. This may happen to new users who are not familiar with our product, or accidentally due to a mistaken operation. Though such queries make up a small portion of the total queries in the corresponding channel, they deserve enough attention because we believe user experience can never be overemphasized, especially on a mobile search engine;
2. There are several related dimensions of information about one specific object. For instance, the query “Andy Liu” in the page channel indicates that the searcher is interested in information related to him. Typically, news relating to the queried object will be retrieved and ranked by some algorithm, with title and summary presented (Figure 1(2)). In fact, there exist more resources about this star, including images, MP3, ring, or video, but they are normally mixed in the results, and searchers have to go through the results manually to find something interesting.
To solve these two problems, we propose cross-channel recommendation, which may help us achieve the following benefits:

1. Giving timely and appropriate tips to searchers while they are searching in the wrong channel. For example, given the query “images of Andy” in the MP3 channel, no result will be found, but we would ask the searcher “if you are searching for Andy’s images.” (See Figure 1(3), the part highlighted in red. How it works is introduced in Section 2.2.) If this recommendation is clicked, the user is transferred to the image channel, and the related images of Andy are retrieved and presented to the user (Figure 1(4)). We do not do this implicitly for the user, but ask her/him explicitly, considering that (1) users may sometimes make mistaken operations, and our recommendation allows them to fix the error at no cost (i.e., no more typing); (2) this reminder helps our users get familiar with our product and avoid the same mistake next time; (3) the algorithm will not always work correctly, so a recommendation is better confirmed by the user before it is processed. This may save users time and money by preventing inappropriate service and transmission requests; and (4) it shows our respect for and emphasis on the users;
2. Allowing more channels to be introduced to the users, providing them with greater satisfaction as well as gaining more PV (page-visit). Given the query “Andy” in the image channel, in addition to the images retrieved, we will ask “Interested with Andy’s related video?” as shown in Figure 1(5). If this recommendation is accepted by the user (i.e., clicked), the related video resources about this star are retrieved and presented (Figure 1(6)). In this way, the users (1) have the chance to access more relevant information; (2) can browse the related information easily since it is well organized; (3) do not have to re-enter the query again and again since everything is one click away; (4) are able to control the possible cost by rejecting any recommendation they do not need;
3. The combination of the above two aspects will greatly improve the user experience, since (1) mistaken operations are detected automatically and a correct recommendation is presented in a friendly manner, and (2) more relevant information is recommended to searchers without requiring extra typing, just one click away. In short, user-centered design is believed to be the most important factor for the success of a mobile search engine.
2.2 Algorithms for the Cross-Channel Recommendation

This section addresses how to realize the cross-channel recommendation quickly and effectively online. The overall procedure is divided into the following steps:

1. Index the queries that appear in each channel’s search log separately, filtering out queries with no results. New queries can be added to their corresponding channel’s index online;
2. Given <Q, Ch>, i.e. query Q submitted in a specific channel Ch, search each channel’s query index as prepared in step 1;
3. If there is no exact match, go to step 4; otherwise, Qrec = Q. Then, if Q appears in more than one channel, select the channel with the highest frequency of this query; otherwise, go directly to step 6;
4. Filter out recommended candidates with frequency lower than the threshold θ1 and those with similarity score smaller than θ2;
5. Randomly select a recommended query as Qrec if there is at least one match by the end of step 4. Otherwise, Qrec = null;
6. If Qrec ≠ null, retrieve Qrec’s corresponding channel label, Chrec, to compose the final recommendation <Qrec, Chrec>. On the user interface, this recommendation appears as “If you are looking for Qrec’s Chrec”;
7. When “If you are looking for Qrec’s Chrec” is clicked, a search for Qrec is done in channel Chrec, and the retrieved results are presented to the searcher (see Figure 1(3) and Figure 1(5)).

From this brief description, it can be seen that two kinds of information are necessary for the cross-channel query recommendation:
1. Search log data, which records queries and their corresponding frequencies;
2. The similarity between the active query (i.e. the one being studied) and any other query in the search log. The vector space model (VSM) is used here, with cosine similarity as the basic measure of the distance between a pair of queries. Although it is known that two queries that share terms may be using those terms in different contexts, we still depend on this measure considering that (a) it is quite simple; (b) the computation is fast enough for online application; (c) we can filter out marginal matches by increasing θ2; and (d) our tests indicate that it works well in most cases.

2.3 More Features about the Proposed Solution

Besides the primary procedure covered above, some less obvious features have to be introduced as well, since they help ensure the success of this solution:

1. Queries with no results are first filtered out of each channel’s query log, which is necessary to avoid any fruitless recommendation;
2. Exact match is given the highest priority. This maximizes the probability that the recommendation will be accepted;
3. Ties due to multiple exact matches are resolved by choosing the more frequent one. For example, if a query is found in both the image and video channels, but is more frequent in the former, the image channel will be suggested;
4. One channel is selected at random when no exact match is found but there are several similar ones. Random selection has two obvious advantages: (a) each channel has a fair opportunity to be recommended; (b) the risk of wrong recommendation is decreased.
However, because the channels were released on different dates, the volume of queries may vary greatly from channel to channel. Therefore, bias still exists even after random selection is applied: channels with a longer operating history are more likely to be recommended.
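The procedure of Section 2.2 can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the production implementation: the function names `cosine` and `recommend`, the in-memory dictionary used as the per-channel query index, and the whitespace tokenization are all hypothetical, and the sketch assumes that queries with no results have already been removed from the index (step 1) and that only channels other than the current one are searched.

```python
import math
import random
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def recommend(query, channel, index, theta1, theta2, rng=random):
    """Return (Qrec, Chrec), or None when nothing should be recommended.

    index maps channel -> {query: frequency}; it is assumed to contain
    only queries that returned results (step 1).
    """
    # Assumption: recommendations always point to a different channel.
    others = {ch: qs for ch, qs in index.items() if ch != channel}

    # Step 3: an exact match has priority; ties between channels are
    # resolved by the frequency of the query in each channel.
    exact = [(ch, qs[query]) for ch, qs in others.items() if query in qs]
    if exact:
        best_channel, _ = max(exact, key=lambda pair: pair[1])
        return query, best_channel

    # Step 4: keep candidates that are frequent enough (theta1) and
    # similar enough (theta2). Whitespace tokenization is an assumption.
    qv = Counter(query.split())
    candidates = [
        (q, ch)
        for ch, qs in others.items()
        for q, freq in qs.items()
        if freq >= theta1 and cosine(qv, Counter(q.split())) >= theta2
    ]

    # Step 5: pick one surviving candidate at random, or give up.
    return rng.choice(candidates) if candidates else None


# Hypothetical toy index for illustration only.
toy_index = {
    "image": {"andy liu": 50},
    "video": {"andy liu": 20},
    "mp3": {"andy liu songs": 30},
}
exact_hit = recommend("andy liu", "page", toy_index, theta1=10, theta2=0.9)
similar_hit = recommend("songs andy liu", "image", toy_index, theta1=10, theta2=0.9)
no_hit = recommend("weather", "image", toy_index, theta1=10, theta2=0.9)
```

In this toy setting, the exact match resolves to the image channel because the query is more frequent there than in the video channel, while the second query has no exact match and falls back to the only candidate that passes both thresholds.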
3 Empirical Evaluation

Since it was released online, the cross-channel recommendation has been operating for about five months. Our evaluation covers the following three aspects:

1. Expected effect: Given the current algorithm, what is the expected response under different conditions?
2. Off-line evaluation: Manual voting on recommendations;
3. On-line evaluation: Assessment of recommendations using log data.
Expected effect. Different from the discussion in Section 2.3, here we care more about the possible performance from the user’s side: whether the recommendation works only at the right time, i.e. it should appear only when there is indeed a related resource for the query in a different channel. Based on the description and discussion in Section 2.2 and Section 2.3, we know that, given a query in a channel, a recommendation is presented only when an exactly or similarly matched query is found in a different channel. By filtering out queries with no results or with too low a frequency, the recommendation is expected to be reasonable and useful.
Off-line evaluation. When a recommendation is made and presented to the users, we validate whether the suggested query Qrec and the new channel Chrec are appropriate. To do this evaluation, we randomly chose from the query log 200 queries with exact matches and 200 with similar matches. Three professionals were employed to vote on the suggestions <Qrec, Chrec>: “Agree” if they think the recommendation is appropriate, “Disagree” if not, and “No idea” if they are not sure. If at least two of the three votes are the same (“Agree”, “Disagree” or “No idea”), that label is the result; if there is one “Agree”, one “Disagree” and one “No idea”, the result is viewed as “No idea” as well. From Table 1, we see that 187 out of 200 (93.50%) recommendations are judged appropriate when “exact match” is satisfied. The same index decreases to 69.50% (139 out of 200) for “similar match”. This reflects that (a) our method is useful for recommending more dimensions of information to the searchers, and (b) the similarity measure (VSM) used here requires future work.

Table 1. Breakdown of the voting assessment of 400 randomly selected queries by three persons

                 Agree
Exact match      187 (out of 200)
Similar match    139 (out of 200)
Total            326 (out of 400)
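The voting rule and the percentages above can be checked with a few lines of code. This is a minimal sketch; the helper names `aggregate` and `agree_rate` are hypothetical and only the counts come from Table 1.

```python
from collections import Counter


def aggregate(votes):
    """Combine three assessor votes into one label.

    A label shared by at least two assessors wins; a three-way split
    (one "Agree", one "Disagree", one "No idea") counts as "No idea".
    """
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= 2 else "No idea"


def agree_rate(agreed, total):
    """Percentage of recommendations judged appropriate."""
    return 100.0 * agreed / total


# Counts taken from Table 1.
exact_rate = agree_rate(187, 200)    # exact match
similar_rate = agree_rate(139, 200)  # similar match
overall_rate = agree_rate(326, 400)  # both groups together
```

The overall rate is simply the ratio of the two rows combined; it is derived here from the table and is not reported separately in the text.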
On-line evaluation. A large amount of query log data has been accumulated for the analysis of the method’s real performance. We initially considered the following measures for on-line evaluation:

1. How often the cross-channel recommendation is clicked by the searchers when it is presented for a query;
2. How often results are clicked among those retrieved by the cross-channel recommended query.
However, we later realized that these measures are not as suitable as expected. Firstly, although the first measure directly indicates whether the recommendation is accepted or not, it ignores at least three factors: (1) searchers may not notice the recommendations appearing below the retrieved results; (2) even if a recommendation is seen, users may not click it because the results already retrieved may satisfy them; (3) the contribution may not be fully captured by whether the recommendation is clicked or not, since the presence of the recommendation itself may impress the searchers and keep them on our search engine. So, we cannot simply conclude that the recommendations are not welcome if they are not clicked. Secondly, by similar reasoning, we cannot rely on whether the retrieved results are clicked or not to evaluate the recommendation.
Even so, we share some observations based on our study of the query log:

1. The recommendation works when no result is found in some channel for a query. The user clicks the suggestion and is led to a new channel where related information is presented, as in the example of Figure 1(3)(4);
2. The recommendation also works when there are results retrieved for a query in a channel. The recommendation triggers a new search and takes the user to a new channel, where some results are clicked;
3. The absolute number of clicks on the cross-channel recommendation is increasing, by about 50% per month;
4. Based on the most recent observations, it is estimated that about 8.0% of our monthly PVs come directly or indirectly from the cross-channel recommendation.
4 Conclusion

In this paper, we start with mobile search as the general background, along with some discussion of its special characteristics based on our experience. Then, the discussion is narrowed down to our own mobile search engine, currently a popular one in China. Although our product strategy has proved suitable for Chinese mobile searchers, it also causes some minor problems. To fix these problems, a simple cross-channel query recommendation solution is proposed on the basis of (1) our understanding of user habits, (2) the large-scale query log collected so far, and (3) an efficient and effective strategy. The overall procedure is described, and our evaluation covers (1) expected effect, (2) off-line evaluation and (3) on-line evaluation. Together, these present readers with a comprehensive picture of how the method works and how well it performs. Some future work remains. We may do more research to find a finer model for measuring the similarity between queries. Besides, more log data is required for a more comprehensive on-line performance evaluation.
References

1. Fu, S., Pi, B., Han, S., Guo, J., Zou, G., Wang, W.: User-centered solution to detect near-duplicate pages on mobile search engine. In: Proceedings of SIGIR Workshop on Mobile IR (MobIR), Singapore (2008)
2. Kamvar, M., Baluja, S.: Query suggestions for mobile search: Understanding usage patterns. In: Proceedings of the SIGCHI Conference on Human Factors in Computing (CHI) (2008)
3. Zhang, Z., Nasraoui, O.: Mining search engine query logs for query recommendation. In: Proceedings of 15th International World Wide Web Conference, W3C (2006)
4. Yates, R.B.: Query usage mining in search engines. In: Scime, A. (ed.) Web Mining: Applications and Techniques, Idea Group (2004)
5. Wen, J., Nie, J., Zhang, H.: Clustering user queries of a search engine. In: Proceedings at 10th International World Wide Web Conference, W3C (2001)
6. Sahami, M., Heilman, T.: Mining the Web to determine similarity between words, objects, and communities. In: Proceedings at 22nd ICML, Bonn, Germany (2005)