Conceptual Mile Markers for Exploratory Search

Keith Bagley

Haim Levkowitz

IBM; University of Massachusetts Lowell
Bedford, NH, USA
+1.603.818.9202

University of Massachusetts Lowell
One University Ave.
Lowell, MA, USA
+1.978.934.3654


ABSTRACT

During exploratory search sessions, users often have difficulty finding and re-finding high-quality results that are pertinent to their search goals. Various studies have found that many search sessions are unsuccessful, end prematurely, or lead to user frustration. This paper describes a framework and prototype system called Conceptual Mile Markers (CMM) that reduces the “time to value” of a search by matching a user query with existing search paths, thereby allowing users to profit from the “wisdom of the masses” and reuse previously successful search trails. Additionally, these search paths may be saved, shared, and retrieved later as a form of lightweight personalization that requires no explicit user tracking. We propose that our framework and prototype make users more likely to reach a successful end-state and high-quality results, and less apt to become frustrated or abandon searches prematurely.

Categories and Subject Descriptors
H.1.2 [User/Machine Systems]: Human Factors – exploration, context, search, collaboration.

General Terms
Human Factors.

Keywords
Conceptual Mile Markers, exploratory search.

1. INTRODUCTION

During exploratory search sessions, users often have difficulty finding and re-finding high-quality results that are pertinent to their search goals. Feild and Allan found that 36% of queries submitted in search sessions end with users being moderately to extremely frustrated [2]. Additionally, Shah and Marchionini found a high overlap of queries between users for a given task, as well as a high rate of query reuse when individual searchers return to re-search a task [9].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. HCIR 2011

In this paper, we describe a framework and prototype system called Conceptual Mile Markers (CMM) that addresses these findings. We propose that CMM reduces the “time to value” of a search by matching a user query with existing search paths (which we call “roadmaps”) captured by one or more users during previous search sessions. These roadmaps are presented alongside a clustered result set from traditional search engines, so users can choose either to explore on their own or to reuse previous work. Because existing search paths are easily accessible, we expect users to be more likely to reach a successful end-state, and less apt to become frustrated or abandon searches prematurely. Since CMM stores path information on the server, each user automatically benefits from the “wisdom of the masses” regarding effective search paths.

1.1 Motivation

Search optimization and ranking algorithms, while already very good, continue to improve as organizations and researchers such as Google and Microsoft refine their implementations. However, from the human side of exploratory search, the main challenges searchers face are not really about automation, keyword matching, and relevance ranking. Instead, users are concerned with ease of access, utility, and what we term “time to value”: how quickly a search yields a useful result. Research in this area indicates that the more effort or time a searcher expends, the lower their satisfaction, particularly if time elapses without a resolution [3]. Further, user frustration is not isolated to the current interaction but builds over time from previous states of frustration [2]. It therefore stands to reason that decreasing a user’s “time to value” for a search will improve search session quality. We believe that using CMM for exploratory search and re-search sessions does exactly that: by presenting the user with a list of viable, reusable roadmaps that have led to successful results for matching queries, the time needed to reach a successful end result can be reduced.

2. RELATED WORK

2.1 Built-in Browser Capabilities

Modern browsers have a number of features to track and organize navigation history, including site/link history, bookmarks, and “favorites”. While these artifacts may be searched and reused, they are not automatically shared across users or compared for relevance against new queries.

Figure 1: Results with matching Roadmaps displayed on top, preview thumbnails for Points-of-Interest, and traditional clustered results below

2.2 Research Systems

Several systems allow users to track search paths and mark preferred URLs and useful queries. We examine a number of them here and differentiate them from CMM. SearchBar [6] allows users to store query histories, browsing threads, user notes, and ratings. SearchBar focuses primarily on re-finding and task resumption across a single user’s search sessions, using that same user’s search history. SearchPad [1] implemented a virtual “notepad” that allowed users to track their search progress and stored search results for later review. SearchPad required users to explicitly “mark” useful or interesting sites, and focused on short-term search “leads” rather than long-term browsing milestones. Neither SearchPad nor SearchBar addresses automatic sharing of results and search paths, or relevance ranking of existing paths, as CMM does. Additionally, CMM combines fine-grained reuse of user roadmaps with traditional search engine clustering, giving the user multiple possible routes to success for a single query. CoScripter Reusable History (CRH, formerly ActionShot) [5] is an integrated web browser utility that creates a fine-grained history of users’ browsing activities. It is built on the CoScripter [4] platform and continually records fine-grained browsing actions (such as button clicks and text entries). CRH focuses heavily on detailed recording and playback of browsing sessions. Unlike CMM, CRH does not integrate new query results. Also, while recorded sessions may be shared between users, CRH requires this sharing to be done explicitly, whereas CMM shares automatically. Pedersen et al. [8] describe early research on “research trailing”: an approach that automatically filters and organizes user activity into groups of related work.
These trails are useful for task resumption and re-finding information but, unlike CMM, this work supports neither automatic sharing nor matching of trails (“trail searching”) against new queries. Teevan’s Re:Search Engine [10] supports both re-finding and new searches by preserving previous query result lists and merging them with new query results. However, unlike CMM, Re:Search does not track user paths and points of interest. CMM further differentiates itself by allowing user-saved roadmaps to be incrementally overlaid on top of current system results. While this is similar to Re:Search merging, CMM’s overlays are user-selected and may come from a variety of sources besides the current search session.

3. DEFINITIONS

The Conceptual Mile Marker prototype builds its capabilities on a framework that encompasses the basic notions of Milestones and Roadmaps.

Milestone: As with actual, physical milestones, a CMM milestone is a reference point: a web site or URL that represents an interim or final destination on a user’s search path.

Roadmap: One or more connected milestones along a user’s search path form a roadmap. The first milestone in a roadmap is its on-ramp; the last milestone (or destination) is its off-ramp. Any milestones between the on- and off-ramps are called points-of-interest.

User Route/Overlay: A roadmap that has been stored outside of the system and later uploaded to be merged with system-managed roadmaps. User routes serve as a lightweight personalization mechanism, a kind of “bring your own favorite roadmap”.

QR code: A QR code (Quick Response code) is a two-dimensional barcode readable by dedicated QR barcode readers and camera phones. QR codes can store many kinds of information, including plain text, URLs, business cards, and other data.

As the user interacts with the CMM prototype, the system automatically generates milestones and roadmaps for later query comparison and reuse.
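The definitions above amount to a simple ordered data structure. The following is a minimal Python sketch of it, not the CMM implementation (which the paper describes only as Ruby on the server and JavaScript on the client); the class and field names (`Milestone`, `Roadmap`, `query`, `milestones`) are hypothetical. It assumes a roadmap is stored as the query that started it plus its milestones in visit order, so the on-ramp, points-of-interest, and off-ramp are derived rather than stored separately.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Milestone:
    """A reference point on a search path: a visited URL plus an optional page title."""
    url: str
    title: str = ""


@dataclass
class Roadmap:
    """One or more connected milestones, kept in the order the user visited them.

    Hypothetical structure: `query` is the search that started the roadmap.
    """
    query: str
    milestones: List[Milestone] = field(default_factory=list)

    @property
    def on_ramp(self) -> Optional[Milestone]:
        # The first milestone is the entry point (None for an empty roadmap).
        return self.milestones[0] if self.milestones else None

    @property
    def off_ramp(self) -> Optional[Milestone]:
        # The last milestone is the destination; for a one-milestone roadmap
        # it coincides with the on-ramp.
        return self.milestones[-1] if self.milestones else None

    @property
    def points_of_interest(self) -> List[Milestone]:
        # Every milestone strictly between the on- and off-ramps.
        return self.milestones[1:-1]
```

Deriving all three roles from a single ordered list keeps them consistent as milestones are appended during a browsing session.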

4. CMM: THE SYSTEM

CMM is implemented as a client-server system: the core server components manage roadmap histories and search result clustering, and the client consists of a small amount of JavaScript code. The CMM Launchpad resembles the home pages of many search providers (such as Google or Yahoo).

Figure 2: A CMM browsing session (panels: partial view of the traditional browsing area; roadmap with active thumbnails). As the roadmap is built, the user is shown active thumbnails of the sites visited.

Figure 1 illustrates the key behavior of the CMM search interface. Users enter a search as they would with any other search provider. The query is sent to the clustering service, which draws its results from a group of common search sites. In parallel, CMM compares the query against stored roadmaps using Simon White’s string similarity algorithm [13]. We selected White’s algorithm for its ease of implementation and its reported performance: according to White, it outperforms both Edit Distance and Longest Common Substring (LCS) in many cases because it considers both common substrings and their ordering, thereby combining the strengths of the two approaches. After the query is submitted, the system presents any matching roadmaps at the top of the page, with a traditional clustered result set from the search engine(s) below. We chose to present the candidate roadmaps first because of the conventional wisdom, espoused by many researchers, that users abandon a web site or search if they do not find the needed information on the first page of results returned by the search engine [11].

Roadmaps have entry points and destinations (“on-” and “off-ramps”) for each query, and may also contain interim milestones that we refer to as “points-of-interest”. For each roadmap, we display web site previews as the user hovers over each item, to help trigger a visual, conceptual connection between the user’s goal and the point-of-interest. We believe these conceptual cues are significant because related user studies [12] found that participants intentionally sought a specific URL 48% of the time during re-finding sessions. As the user continues the browsing session, the CMM client tracks the URLs visited, collects page view and scroll metrics, and builds a new roadmap that is visible to the user.
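White’s measure [13] compares the adjacent letter pairs of two strings: each string is upper-cased and split into words, each word contributes its two-character pairs, and the similarity is twice the number of shared pairs divided by the total number of pairs (a Dice coefficient over letter pairs). Below is a minimal Python sketch of that measure, together with a hypothetical `rank_roadmaps` helper that assumes each stored roadmap is keyed by the query that produced it; the 0.5 cut-off is an illustrative assumption, not a value from the paper.

```python
from typing import List, Tuple


def letter_pairs(word: str) -> List[str]:
    """Adjacent character pairs of one word, e.g. 'FRANCE' -> FR, RA, AN, NC, CE."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def word_letter_pairs(text: str) -> List[str]:
    """Letter pairs of every whitespace-separated word; pairs never span a space."""
    pairs: List[str] = []
    for word in text.upper().split():
        pairs.extend(letter_pairs(word))
    return pairs


def similarity(s1: str, s2: str) -> float:
    """White's measure: 2 * |shared pairs| / (|pairs1| + |pairs2|), in [0, 1]."""
    pairs1 = word_letter_pairs(s1)
    pairs2 = word_letter_pairs(s2)
    union = len(pairs1) + len(pairs2)
    if union == 0:
        return 0.0  # assumption: strings with no letter pairs match nothing
    remaining = list(pairs2)
    shared = 0
    for pair in pairs1:
        if pair in remaining:
            remaining.remove(pair)  # each pair of s2 may be matched only once
            shared += 1
    return 2.0 * shared / union


def rank_roadmaps(query: str, stored_queries: List[str],
                  threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Hypothetical matcher: score stored roadmap queries, keep those above a cut-off."""
    scored = [(q, similarity(query, q)) for q in stored_queries]
    matches = [(q, s) for q, s in scored if s >= threshold]
    # Most similar roadmaps first, as they would be listed above the clustered results.
    return sorted(matches, key=lambda item: item[1], reverse=True)
```

For example, “FRANCE” and “FRENCH” share the pairs FR and NC out of five pairs each, giving 2 × 2 / 10 = 0.4. Because pairs are taken per word, word order does not matter: “fusion cold” scores 1.0 against “cold fusion”.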
All site thumbnails are hyperlinked, allowing the user to revisit recent sites easily without using the browser “Back” button or history. Figure 2 shows a screenshot of a browsing session in which CMM builds a dynamic roadmap at the bottom of the screen. When the user has finished the current browsing thread, the system stores the newly created roadmap in a server database. It also presents the user with a QR code that encodes the browsing session. These QR codes serve two purposes in CMM:

Figure 3: A QR code, encoding the most recent roadmap. These may be saved, shared and reloaded in future sessions for lightweight personalization and reuse.

1. Lightweight Personalization and Reuse: The QR codes may be thought of as “user preferred routes”. Later (weeks, months, or years), they can be reloaded into CMM as an overlay onto the roadmaps already stored in the system. When a query is submitted to CMM, the user-loaded roadmaps are searched as well, allowing for effective re-searching and re-finding of information.

2. Group Reuse: There has been a fair amount of research on collaborative search, sharing search results, and moving search from a solitary venture to a group activity [7]. CMM’s QR codes would allow multiple participants to embark on a joint search task and then merge their best roadmaps by loading their QR codes into a common search session.

A simple scenario illustrates the system behavior and the roadmap concept:

1. The user enters the query: cold fusion

2. The system responds with existing roadmaps selected, using White’s algorithm, for their relevance to the query. It also presents a set of clustered traditional results from search engines such as Google, giving the user the flexibility either to follow an existing roadmap or to explore on their own.

3. The user decides that none of the existing roadmaps is interesting and selects an item from the search engine results. A new roadmap is created with the on-ramp “Italian Scientist Claim Cold Fusion Breakthrough” (at foxnews.com).

4. As the user selects interesting links on this and subsequent pages, the system builds out the roadmap dynamically, adding the following points-of-interest based on the user’s navigation: “At Annual Convention, Chemists Warm to Cold Fusion” (at www.popsci.com/technology) and “North Korea Claims to Achieve Unprecedented Nuclear Fusion Success” (another page on the same site).

5. The user finishes the current search thread by returning to the main search page. The last milestone in the generated roadmap becomes the off-ramp, and a QR code containing the user roadmap is generated and displayed.
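The scenario’s last two steps (building the roadmap from navigation, then closing it into a QR code) can be sketched as follows. The paper does not specify the client’s tracking logic or the QR payload format, so this is a minimal Python sketch under stated assumptions: `RoadmapRecorder`, `load_overlay`, and the compact JSON payload with keys `q` and `m` are all hypothetical. The one hard constraint is QR capacity: the largest QR code (version 40, error-correction level L) holds at most 2,953 bytes in byte mode, so the sketch rejects roadmaps that would not fit in a single code.

```python
import json
from typing import Dict, List

# Byte-mode capacity of the largest QR code (version 40) at error-correction level L.
QR_MAX_BYTES = 2953


class RoadmapRecorder:
    """Hypothetical client-side recorder: each visited URL becomes a milestone."""

    def __init__(self, query: str) -> None:
        self.query = query
        self.urls: List[str] = []

    def visit(self, url: str) -> None:
        # Ignore an immediate revisit (e.g. a reload) so it adds no duplicate milestone.
        if not self.urls or self.urls[-1] != url:
            self.urls.append(url)

    def finish(self) -> str:
        """End the thread: the last URL becomes the off-ramp; return the QR payload text."""
        if not self.urls:
            raise ValueError("no milestones recorded")
        payload = json.dumps({"q": self.query, "m": self.urls}, separators=(",", ":"))
        if len(payload.encode("utf-8")) > QR_MAX_BYTES:
            raise ValueError("roadmap is too large for a single QR code")
        return payload


def load_overlay(payload: str) -> Dict[str, object]:
    """Decode a scanned payload back into on-ramp, points-of-interest and off-ramp."""
    data = json.loads(payload)
    urls = data["m"]
    return {
        "query": data["q"],
        "on_ramp": urls[0],
        "points_of_interest": urls[1:-1],
        "off_ramp": urls[-1],
    }
```

A decoded overlay would then join the pool of roadmaps that incoming queries are matched against, which is how user-loaded routes are “searched as well”.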

5. DISCUSSION

Although CMM is still a prototype, early indications are that it is easy to use and addresses the human-centered concerns articulated earlier in this paper. Further, because CMM presents both traditional clustered search results and reusable roadmaps, we believe users’ time to value and frustration levels will be no worse than with a traditional search paradigm, and in most cases should be significantly better. For usability, we attempted to make CMM as “non-invasive” as possible; in practice, our design aimed for as much “hands-free” operation as possible. Roadmaps are generated automatically. User routes/overlays are optional. Clustered search results are presented below any available roadmaps, so users who want to search in the traditional way (on their own) are free to do so. Presented roadmaps display only on-ramp information unless explicitly expanded, minimizing the “information overload” problem. To address other usability aspects, we used jQuery for many interactive user interface elements, giving the user a responsive, “information on demand” environment. This includes rich hover behavior as users move the mouse over roadmap elements, and information compression (collapsible table views, accordion elements) to show key data without overwhelming the screen.

Beyond usability and the overall roadmap framework, we believe our use of QR codes as a lightweight personalization and reuse mechanism is a significant contribution. QR codes have been pervasive in Asia for some time, but only now (2011) are they beginning to see viral adoption in the United States. The authors have seen them used for a variety of purposes, including email signatures, references to URLs, and even “hidden text”, but we believe our application of QR codes to store user routes and reload them into our system for later use is unique.
We generate QR codes using the freely accessible Google Chart API. For decoding, we use the ZXing Java library (also by Google), which is installed locally on the CMM server. Both are invoked directly via call-outs from our main Ruby code. Using QR codes for this purpose lets us store substantial roadmap session histories without cluttering the user interface, and without openly exposing on screen (where others may see it) which URLs a user has visited. We believe our approach addresses more than a simple privacy concern: our use of QR codes also frees the user from having to register or log in to gain some of the benefits of personalization.
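For illustration, the generation step amounts to building a Google Chart API request whose `chl` parameter carries the payload (`cht=qr` selects a QR code, `chs` sets the image size in pixels, and `choe` the output encoding). The following Python sketch assumes the `https://chart.googleapis.com/chart` endpoint and a 300 × 300 image; Google has since deprecated this chart service, so the sketch shows only the shape of the request CMM describes, not a currently supported API.

```python
from urllib.parse import urlencode

# Assumed endpoint of the (since-deprecated) Google Chart API.
CHART_ENDPOINT = "https://chart.googleapis.com/chart"


def qr_image_url(payload: str, size_px: int = 300) -> str:
    """Build a request URL that renders `payload` as a square QR code image."""
    params = {
        "cht": "qr",                    # chart type: QR code
        "chs": f"{size_px}x{size_px}",  # image width x height in pixels
        "chl": payload,                 # data to encode (percent-encoded by urlencode)
        "choe": "UTF-8",                # output encoding of the data
    }
    return CHART_ENDPOINT + "?" + urlencode(params)
```

For example, `qr_image_url("cold fusion")` yields `https://chart.googleapis.com/chart?cht=qr&chs=300x300&chl=cold+fusion&choe=UTF-8`. Decoding a scanned code is the reverse path, handled on the server by ZXing as described above.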

6. CONCLUSIONS

In this paper, we describe a framework and prototype system that reduces a user’s “time to value” by matching a search query with existing search paths relevant to it. These Conceptual Mile Markers are presented in conjunction with a traditional clustered result set, allowing users either to explore on their own or to reuse previous work.

As part of our prototype, we believe we have made a significant contribution through the use of QR codes for lightweight personalization and reuse. We use QR codes to store substantial roadmap session histories that can be reused and “overlaid” into the system to assist in re-searching and re-finding tasks, and as a mechanism to facilitate joint search sessions among two or more collaborators who are not co-located. Future work will examine how to manage very large numbers of system roadmaps as more users search for similar items over time. We will also explore adding more contextual information to roadmaps, including user ratings, search notes, and other annotations that would help users judge relevance.

7. REFERENCES

[1] Bharat, K. SearchPad: Explicit Capture of Search Context to Support Web Search. Computer Networks 33, 2000.
[2] Feild, H. and Allan, J. Modeling Searcher Frustration. Proceedings of HCIR 2009. 2009.
[3] Gwizdka, J. Cognitive Load and Web Search Tasks. Proceedings of HCIR 2009. 2009.
[4] Leshed, G., Haber, E., Matthews, T., and Lau, T. CoScripter: Automating & Sharing How-To Knowledge in the Enterprise. CHI 2008. ACM, 2008.
[5] Li, I., Nichols, J., Lau, T., Drews, C., and Cypher, A. Here’s What I Did: Sharing and Reusing Web Activity with ActionShot. CHI 2010, End-User Programming II. ACM, 2010.
[6] Morris, D., Morris, M.R., and Venolia, G. SearchBar: A Search-Centric Web History for Task Resumption and Information Re-finding. Proceedings of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems. ACM, 2008.
[7] Paul, S.A. and Morris, M.R. CoSense: Enhancing Sensemaking for Collaborative Web Search. Proceedings of the 27th International Conference on Human Factors in Computing Systems. ACM, 2009.
[8] Pedersen, E.R., Gyllstrom, K., Gu, S., and Hong, P.J. Automatic Generation of Research Trails in Web History. CHI ’09. ACM, 2009.
[9] Shah, C. and Marchionini, G. Query Reuse in Exploratory Search Tasks. Proceedings of HCIR 2009. 2009.
[10] Teevan, J. The Re:Search Engine: Simultaneous Support for Finding and Re-finding. Proceedings of the 20th Annual ACM Symposium on User Interface Software and Technology. Newport, RI: ACM, 2007, 23-32.
[11] Tunkelang, D. The Information Availability Problem. Proceedings of HCIR 2009. 2009.
[12] Tyler, S.K. and Teevan, J. Large Scale Query Log Analysis of Re-Finding. Web Search and Web Data Mining 2010 (WSDM ’10). ACM, 2010.
[13] White, S. How to Strike a Match (String Similarity Algorithm). http://www.catalysoft.com/articles/StrikeAMatch.html.
[14] Won, S.S., Jin, J., and Hong, J.J. Contextual Web History: Using Visual and Contextual Cues to Improve Web Browser History. Proceedings of CHI ’09. ACM, 2009.
