26 International Journal of Information Retrieval Research, 3(3), 26-39, July-September 2013

Domain Specific Custom Search for Quicker Information Retrieval Tushar Kanti Saha, Department of Computer Science and Engineering, Jatiya Kabi Kazi Nazrul Islam University, Mymensingh, Bangladesh A. B. M. Shawkat Ali, School of Engineering and Technology, Central Queensland University, North Rockhampton, QLD 4701, Australia & i-LaB Australia, QLD 4701, Australia

ABSTRACT Recently, researchers have been using Google Scholar widely to find research articles and the relevant experts in their domain. However, it cannot quickly find all experts in a given research area from a specific country, because a custom search facility is not available in the current Google Scholar setup. The authors combine custom search with domain-specific search, naming the result domain specific custom search. This research introduces, for the first time, a domain specific custom search technique using a new search methodology called the n-paged-m-items partial crawling algorithm. The algorithm crawls in real time and is fast because of its partial crawling strategy: it stores nothing in a database for later display to the user. The proposed algorithm is implemented on the scholar.google.com domain to find scholars or experts quickly. Finally, the authors observe better performance from the proposed algorithm compared with Google Scholar.

Keywords: Custom Search, Domain, Information Retrieval, Specific, Tool

1. INTRODUCTION The modern world is overwhelmed by a huge amount of information. Knowledge seekers generally require very specific information, quickly, from the huge information repositories reachable through the internet. A search engine is the common tool to meet this demand, but when users post a query, the engine returns thousands of results within a few seconds. The user may find the desired result in the first line or on the first page of the results; otherwise, the user has to look page by page for it. If the desired result is not found within 4-5 pages, the user usually tries another related query, or else moves to another search engine. This is a time-consuming process. Search engine providers such as Google, Yahoo, and Bing have engaged their spiders to collect as much information as they can to enrich their repositories. These engines also consider other issues, for instance page ranking, quick result display, and most-relevant-result display, but they do not focus on the user's precise custom requirements. In

DOI: 10.4018/ijirr.2013070102 Copyright © 2013, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.


this research we focus on that issue. Moreover, nowadays custom search engines (Schmick, 2012) and domain-specific search (DSS) engines (Bhatt, 2003; Wikipedia, 2013a; Hanbury & Lupu, 2013) are very popular among the people who are familiar with them. We therefore concentrate on combining these two search techniques into a domain specific custom search (DSCS) technique over search engine results. In our work, "domain" has two different senses: one is the search area, and the other is the website itself, also known as a domain. Previous researchers worked on personalized DSS (Zhang, 2008) to retrieve data from a Chinese domain; we instead personalize our search on a single domain, i.e. one website, and a single area. Our DSCS technique retrieves experts according to their research area and country. To discuss the DSCS tool for quicker information retrieval, we take expert retrieval as our custom search area and the Google Scholar domain as our working area. The rest of the paper is organized as follows. Section 2 surveys the literature on domain specific custom search from different viewpoints. Sections 3 and 4 explain what DSCS is and why it is needed. Applications of domain specific custom search engines are shown in Section 5. In Section 6, the current search trend at Google Scholar and its pitfalls are discussed. The proposed n-paged-m-items partial crawling algorithm is described in Section 7. A detailed description of the DSCS technique developed in this research is presented in Section 8. Section 9 shows the real-life implementation of the proposed algorithm as a tool, and its performance is summarized in Section 10. Future directions for upcoming researchers are outlined in Section 11, and the conclusion is given towards the end of the paper.

2. RELATED WORK Research in domain specific information retrieval started about a decade and a half ago. Witten et al. (1995) began working on domain specific information retrieval while building a digital library, and later reviewed their work (Witten, Nevill-Manning, McNab, & Cunningham, 1998). McCallum et al. (1999) used a machine learning approach for the creation and maintenance of a domain specific search engine whose task was to find computer science research papers. Another machine learning approach to information retrieval using DSS was proposed by Sharma (2008). A method named keyword spice was proposed by Oyama et al. (2001) for building domain-specific web search engines; they claim that adding domain specific keywords, called 'keyword spices', to the user's query improves search performance. Work on the effective retrieval of healthcare and shopping information was done by Bhavnani (2002), who explained that such knowledge is not automatically acquired by using general-purpose search engines. Bhatt (2003) described an invention directed to a search engine configured to locate data within a defined domain. In the same year, an object-oriented framework for domain specific search engines was proposed by Zhang et al. (2003), who claim that the framework increased the capability of building DSS applications. Oyama et al. (2004) later implemented their technique to build a domain-specific web search engine that used keyword spices to locate recipes, restaurants, and used cars. However, they need training examples classified by humans to discover keyword spices, collected from web documents using learning technology; these training examples are expensive, which is a significant disadvantage of their method. Guo et al. (2006) implemented a system called SESQ for building domain specific search engines. They introduced an advanced search system for a specific item type (e.g. research papers) in which the user first specifies the data schema and seeds for the data; based on this schema, the system finds new web sites and web pages by crawling. Chandrashekar et al. (2010) explained a new hypertext resource discovery system called a 'topic specific crawler'. The goal of this crawler was to find only those pages most relevant to the query, based on a predefined set of topics. Since then, different researchers have worked on domain specific retrieval systems in the


Figure 1. Basic block diagram of DSCS

search areas such as medical information (Hanbury, 2012), music (Li & Wang, 2012), and semantic annotation (Ahmed, 2009). Domain specific expertise retrieval for finding experts was done by Jiang et al. (2008), who concentrated on Google search results to find the experts. Deng (2009) proposed a weighted language model for expertise retrieval with graph-based regularization in his PhD thesis in the Department of Computer Science and Engineering at the Chinese University of Hong Kong; there, expert retrieval was based on the expert's web access or query log and the documents found for the query. Priyatam et al. (2012) used a DSS crawler to search tourism and health content from web domains developed in 10 major Indian languages. Hanbury and Lupu (2013) proposed a new definition of domain specific search, together with a corresponding model, to assist researchers, system designers, and system beneficiaries in analysing their own domains. Recently, Redkar et al. (2013) proposed a method for generating topic hierarchies from the search key to improve the performance of DSS. In our work, we implement our custom search technique (CST) on a specific domain to find scholars quickly, in a different manner, to help researchers.

searching a particular topic on a specific domain, but in a custom format; it is basically a combination of the DSS and CS techniques. The basic block diagram of DSCS, shown in Figure 1, has five steps. In this search technique, end users use a custom interface to send their query to a web crawler. The web crawler accepts the query, performs some pre-processing on it, and sends the processed query to the actual domain. The search results returned by the actual domain are then processed by the result processor, and finally the custom output is displayed to the user. About 42 domain specific search engines are listed in (Wikipedia, 2013c). Some examples of DSS are indeed.com for job search (Wikipedia, 2013c), rara.com for music search (Wikipedia, 2013a), and artcyclopedia.com as a fine-art search engine (Wikipedia, 2013a).
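The five-step flow just described (custom interface, query pre-processing, query to the actual domain, result processing, custom display) can be sketched as a simple pipeline. This is an illustrative sketch only; all function names are placeholders of ours, not part of the implemented tool:

```python
# Illustrative sketch of the DSCS pipeline; each stage is a caller-supplied
# placeholder standing in for a component of the block diagram.

def dscs_search(raw_query, preprocess, send_to_domain, process_results, render):
    query = preprocess(raw_query)         # step 2: crawler pre-processes the query
    raw_pages = send_to_domain(query)     # step 3: processed query sent to the domain
    results = process_results(raw_pages)  # step 4: result processor cleans the output
    return render(results)                # step 5: custom output shown to the user
```

With trivial stand-in stages, for example, `dscs_search("  smart grid ", str.strip, lambda q: [q], list, ", ".join)` passes a cleaned query through all five steps.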

3. WHAT IS DOMAIN SPECIFIC CUSTOM SEARCH (DSCS)?



Domain specific search (DSS), also called vertical search (Chau, 2003), means searching a specific area of web content in particular domains, while custom search (CS) means giving the user special search facilities beyond keyword-only search. In our work DSCS means

4. WHY DSCS? DSCS acts as a supporting tool for end users who generally use search engines to search web content. It helps them find the information they want quickly, and it is becoming more popular among web users and researchers day by day. Some major reasons for the popularity of DSCS are as follows:

• Extra search advantages compared with a regular search engine.
• Quicker information finding than a general search engine.
• Removal of the flaws of general search engines.
• Display of the most relevant results for a search query.
• An advanced custom interface for users to send their queries.


5. APPLICATION OF DSCS TECHNIQUE Both the DSS and CS techniques, employed together as DSCS, have already been applied in various fields of search by researchers all over the world. Here we review some application areas of earlier work and their findings to show the technique's wide applicability on the internet. Some applications of DSCS are stated below.

5.1. Medical Information Search Domain specific search is now widely, but carefully, used in this field; researchers in this domain must be cautious because a simple mistake may cost a life. Examples include finding healthcare information (Bhavnani, 2002), biomedicine (Fautsch, 2009), medical information (Hanbury, 2012), and a custom search engine for finding treatments for conditions such as heart disease (Dey & Abraham, 2010).

5.2. Music Search Domain specific search can be used to search music by features such as tempo, culture, and beat strength. For example, in gait training, a domain-specific music retrieval system helps music therapists find appropriate music for Parkinson's disease patients (Li & Wang, 2012).

5.3. Expert Search To do effective research work, companies are always looking for experts in different fields, so domain specific expert search can be useful here. For example, expert search can be done by analysing and customizing the underlying ranking facilities of search engines (Jiang, Han, & Lu, 2008; Santos, Macdonald, & Ounis, 2011; Balog, Fang, de Rijke, Serdyukov, & Si, 2012).

5.4. Wikipedia Search Wikipedia acts as a vast source of knowledge, which is why researchers are increasingly interested in building custom search engines for this domain alone, such as a Wikipedia-based search engine (Halling, Sigurðsson, Larsen, Knudsen, & Hansen, 2008) and Koru, a new search interface (Milne, Witten, & Nichols, 2007).

5.5. Meta Search Domain specific search is now also used for searching metadata on the web. For example, Wang et al. (2006) applied a domain-specific search method in a meta search engine over the internet.

5.6. E-Business Information Search Domain specific search is also used for business information, such as a domain-specific business information search system in e-commerce (Xia, Wang, Wang, & Liu, 2009) and shopping information (Bhavnani, 2002).

6. CURRENT SEARCH TRENDS AT GOOGLE SCHOLAR Before discussing the search trends of Google Scholar, what is it? It is a freely accessible web search engine that indexes the full text of scholarly literature across an array of publishing formats and disciplines (Wikipedia, 2013b). It also lets scholars all over the world create their own profiles and store their affiliation, fields of interest, articles, and the citations found in Google Scholar. At present, Google Scholar helps researchers and others find articles of interest in an advanced manner, as shown in Figure 2; advanced search here includes searching articles by keywords in the title or body, author's name, publisher, publication year, etc. It also provides an author search facility, found in the inner part of the Google Scholar search engine and organized by research field as shown in Figure 3. However, most novice researchers are not aware of how to use this facility, so we customize this part of Google Scholar to help researchers and others retrieve the expected information in a quicker


Figure 2. Current search interfaces at Google scholar for searching articles and legal documents

Figure 3. Search author interface at Google scholar

manner. Those who use it have to write their keywords with underscores between words, preceded by the text 'label' and a colon (:). For example, to search for experts in the field of 'transfer learning', the search keyword is 'label:transfer_learning', as shown in Figure 3. Besides this, Google Scholar also has a data repetition error, as shown in the second row of Figure 4: using the search key 'label:smart_grid', Google Scholar returned the same author profile again on the third page of its results. If a novice researcher wants to pursue a higher degree in a specific country in his or her

area of interest, it is hard to find this through the current search interface. He or she has to go through every result page and also visit each author's personal web page to find the country. One can sometimes tell from the e-mail address, if it contains a country code top-level domain (ccTLD), but this process is manual and time consuming. Moreover, the author results on Google Scholar pages are ordered by citation count, and Google Scholar may report wrong citation counts because citations on Google Scholar can be manipulated (Wikipedia, 2013b). So it

Figure 4. Author's information repetition error (some pictures are hidden for privacy reasons)


would be better if there were a custom interface for searching authors on Google Scholar.
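The 'label:' keyword format described above is mechanical enough to generate automatically. As a minimal sketch (the helper name is ours, not part of any Google Scholar API):

```python
def to_label_query(field):
    """Turn a research-field phrase into the Google Scholar label format,
    e.g. 'transfer learning' -> 'label:transfer_learning'."""
    return "label:" + "_".join(field.lower().split())
```

A custom interface can apply this conversion behind the scenes, sparing novice users the exact syntax.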

7. PARTIAL CRAWLING ALGORITHM In this section, we discuss our proposed algorithm, the n-paged-m-items partial crawling algorithm, which is used in our DSCS technique. Our crawling algorithm is quite different from previous DSS work (Yin, Liu, Yang, & Zhang, 2009), which used a crawling strategy combining a C4.5 decision tree, to predict the relevance of a link target, with the link characteristics of parent pages. Our algorithm makes the custom search process significantly faster, as shown in the evaluation section. The pseudocode is given in Algorithm 1, which has two sections: the main algorithm and a procedure called from the main function. In our algorithm, a dollar sign ($) precedes the name of each variable. The crawling is called partial because the whole crawling job is not done in a single crawl. Two special parameters, n and m, make the partial job work: n is the maximum number of pages searched while no items are found (the page limit), and m is the number of found items at which crawling stops (the item limit). The algorithm works as follows. First, it sets the values of n and m, both greater than zero, and sets the value of the URL to crawl, with the required query string. In the next line, a pro-

Algorithm 1.

ALGORITHM: n-PAGED-m-ITEMS PARTIAL CRAWLING
1. Input n and m. /* standard value is 10-20 for both n and m */
2. Set the value of variable $url to crawl in a particular domain with the required query string.
3. $result = getItems($url, $m, $n) /* call getItems */
4. display($result, $next_page_url, $prev_page_url, $message) /* display the output and wait for a decision to crawl again */
5. Exit

PROCEDURE: getItems($url, $m, $n)
1. Set $item_limit = $m and $page_limit = $n
2. Set $item_count = 0 and $page_count = 0
3. $page = readurlcontent($url) /* read the web page $url */
4. If $page is empty then display an error and return
5. $html = gethtml($page)
6. $items = finditem($html) /* find all items on the page */
7. $item_count = count($items) /* count the number of found items */
8. If $item_count <= 0 then $page_count = $page_count + 1 /* one more page with no items */
9. If more pages exist then set $next_page_url and $prev_page_url
10. If $item_count >= $m then return $next_page_url, $prev_page_url and $output /* item limit reached */
11. If $next_page_url is empty or $page_count >= $page_limit then return $output and $message;
    else return getItems($next_page_url, $item_limit, $page_limit) /* recursive call */


cedure named 'getItems' is called with these parameters. The procedure returns the desired items from the crawl, with a proper message and links. Now we describe how the procedure works. It first receives the three parameter values, then sets the item counter and page counter to zero. It then reads the page at the URL using the partial crawling technique and assigns the crawled output to the variable $page. From the page content it tries to find the desired items. If the number of items is zero, the page counter is increased by one, meaning that one more page with no items has been seen. Otherwise, items were found, and the procedure checks whether their number has reached the item limit. If it has and more pages exist, the procedure returns the output containing the items, the previous page URL, and the next page URL. If it has not and more pages exist, the procedure calls itself recursively until a partial crawling stop condition holds. For example, if we set n to 5, the algorithm will crawl at most 5 result pages while no item is found. If the required item is not found within those 5 pages and more pages remain, the crawler stops and shows a proper message to the user; if the user agrees to crawl again, it continues in the same way. Again, if we set m to 10, crawling stops once the number of found items is greater than or equal to 10, and waits for the user's next decision. If the user is satisfied with the result, no more crawling is done; otherwise crawling continues from the next page URL if more pages exist. In this way crawling can continue to the end of the search pages, if the user wants more output from the crawler.
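As a sketch only, the loop just described might look as follows in Python. Here `fetch_page` is a stand-in of ours for the real page retrieval and item extraction (not shown); it returns the items on a page together with the next-page URL, or `(None, None)` when the page cannot be read:

```python
# Sketch of the n-paged-m-items partial crawl. fetch_page(url) is a
# placeholder returning (items_on_page, next_page_url), or (None, None)
# when the page cannot be read.

def get_items(url, m, n, fetch_page):
    """Collect up to m items, tolerating at most n pages with no items."""
    items, empty_pages = [], 0
    while url is not None:
        page_items, next_url = fetch_page(url)
        if page_items is None:            # unreadable page: stop with an error
            return items, url, "error reading page"
        if not page_items:
            empty_pages += 1
            if empty_pages >= n:          # page limit reached
                return items, next_url, "page limit reached"
        else:
            items.extend(page_items)
            if len(items) >= m:           # item limit reached
                return items, next_url, "item limit reached"
        url = next_url
    return items, None, "no more pages"
```

The recursion in the pseudocode is written here as a loop for clarity; the caller resumes a stopped crawl by calling `get_items` again with the returned next-page URL, which is how the "decision to crawl again" in Algorithm 1 can be realized.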

8. DSCS TECHNIQUE In this section, we discuss the main DSCS technique, implemented on the Google Scholar domain. Here we apply our n-paged-m-items algorithm to find authors from the Google Scholar author search pages. The processing block of this technique is shown in Figure 5, and the strategy is described in the following steps.

Figure 5. Processing block of DSCS technique


8.1. DSCS User Interface When users browse our DSCS tool, they get the interface shown in Figure 6. It contains two fields: the author's research field, and the country where the author works. That is, users can locate authors by research field and job location, where job location is the country in which the author's research organization or institute is situated. Through this interface, users send their search query to our web crawler. This removes the weakness of Google Scholar's keyword input facility described in Section 6.

8.2. Query Filter Between the web user interface and the web crawler there is a phase named the query filter. Its function is to detect malicious queries with which an attacker might try to compromise our custom search engine, and to validate the user's input. When filtering succeeds, the query is sent to the web crawler, which searches the actual domain.
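A query filter of this kind can be as simple as a whitelist pattern. The sketch below is illustrative only; the pattern and length cap are our own choices, not the ones deployed in the tool:

```python
import re

# Illustrative whitelist filter: accept only letters, digits, spaces and a
# few harmless punctuation marks, with a length cap, before the query is
# passed on to the crawler. Anything else is rejected.
SAFE_QUERY = re.compile(r"[A-Za-z0-9 ,.'\-]{1,100}")

def filter_query(query):
    query = query.strip()
    if not SAFE_QUERY.fullmatch(query):
        raise ValueError("query rejected by filter")
    return query
```

A whitelist is usually safer than trying to enumerate dangerous inputs, since it rejects anything unexpected by default.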

8.3. Web Crawler In this phase, the crawling job is done using the Simple HTML DOM parser (Saha & Ambia, 2013). The DOM parser helps to parse the author pages in a structured way; through parsing, we extract every detail of each author provided in the Google Scholar search results page. Author information generally includes name, affiliation (institute or organization name), and verified e-mail. Here we use our n-paged-m-items partial crawling algorithm, discussed in the previous section. In our implementation on Google Scholar, n is the number of pages our crawler searches while no items are found, i.e. the page limit, and m is the number of authors to find in the crawl, i.e. the author limit. If m or more authors are found, the crawler stops and waits for the next search decision. Likewise, if fewer than m authors are found after completing an n-page search, the crawler also stops and waits for the next decision to crawl. The values of n and m are set by the site administrator depending on the speed of the web server where the DSCS tool is hosted. If the crawling succeeds, it produces some pages of author information; an example of the current search result page produced by Google Scholar is shown in Figure 7. If crawling fails, a proper message is displayed to the user. Successful search result pages are then sent to the next phase, the search result analyser.

8.4. Search Result Analyser In this phase, the content of every page returned by the crawler in the previous phase is analysed. Here we use a database containing institutions from all over the world, with their countries and ccTLDs; the E-R diagram of the database is shown in Figure 8. First, our algorithm reads each author's profile page in the Scholar domain, as shown in Figure 9, starting from the author listing pages shown in Figure 7. Then it finds each author's research

Figure 6. DSCS interface (available at http://find-scholar.i-lab.org.au)


Figure 7. Google scholar results page for searching authors

Figure 8. E-R diagram of the database used in the DSCS technique

Figure 9. Example of Google scholar author’s profile page


interests, affiliation, and verified e-mail using regular expression matching. Each author may have several research interests; in the example of Figure 9, Professor Giles has 5 areas of research interest. From the verified e-mail we extract the ccTLD using a regular expression match and look it up in our database to find the corresponding country. If no ccTLD is found in the e-mail address, we compare the university name with our database to find the country. When the country name matches the user's query, we match each author's research interests against the research interest in the user's query; if one of the interests matches, we count the author as a desired author. We thus skip some authors or scholars who work in business firms or research organizations outside our database list, unless their e-mail address contains a ccTLD. When the first partial crawl and analysis are completed, a search success or failure decision is made; if the search succeeds, the result is displayed. Search results are displayed in three ways, as shown in Figures 10-12. For example, suppose the tool's two parameters are set to n=10 and m=10, and the user is looking for all authors from Australia in the field of 'smart grid'. On the very first page the user gets the result shown in Figure 11; the number of authors equals the value of m, which is why crawling stopped and the result was displayed. If the user is not satisfied with this result, he or she can go on to the next pages. On the fifth page, only one author is shown, as in Figure 12, because the crawler has reached its limit on the number of pages searched at a time, i.e. the number of searched pages equals n; but some pages still remain to crawl, as indicated by the next link in the right section shown in Figure 11. Finally, at the end of the search the user gets a page, as in Figure 12, where the next page link is absent.
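The ccTLD extraction step can be illustrated with a short regular-expression match. The helper below is a sketch of ours; a real system would additionally validate the two-letter code against the country table in the database:

```python
import re

# Sketch: pull a two-letter country-code TLD (e.g. 'au', 'cn') from the
# domain part of a verified e-mail address. Generic TLDs such as .com or
# .edu yield None, in which case the institution name is looked up instead.
CCTLD = re.compile(r"\.([a-z]{2})$")

def cctld_from_email(email):
    domain = email.rsplit("@", 1)[-1].lower()
    match = CCTLD.search(domain)
    return match.group(1) if match else None
```

For example, an author with a verified address ending in `.edu.au` is mapped to Australia, while a `.com` address falls through to the institution lookup.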

8.5. Crawling Decision Maker This decision maker depends on the crawling parameters n and m. In this phase it is decided whether more crawling is required, and whether more search result pages exist. If they do, control returns to the third phase, the web crawler; otherwise a proper message is shown to the user.

9. TECHNIQUE IMPLEMENTED AS A TOOL Our DSCS technique is implemented as a tool hosted on our own domain; we have launched the site on the sub-domain http://find-scholar.i-lab.org.au. We are expecting its

Figure 10. Format-1 of search result display in DSCS technique


Figure 11. Format-2 of search result display in DSCS technique

Figure 12. Format-3 of search result display in DSCS technique

popularity to grow among students, teachers, and researchers all over the world. Through this tool we have removed the data repetition problem of Google Scholar described in Section 6. We also hope that this tool helps not only new researchers but also established researchers to find their position in their own country according to their citation index.

10. PERFORMANCE EVALUATION We evaluated the DSCS tool by applying some real world data. The performance of the tool for different values of the crawling parameters is shown in Figure 13. We performed a manual test to find all scholars or authors from China in the field of 'information retrieval'. Searching with Google Scholar directly, we got more than 130 pages to browse to reach the desired result. With our tool, we got 40 authors from different institutions in China by browsing a minimum of 4 pages and a maximum of 24 pages, in four different cases, represented pictorially by the 3-D chart in Figure 13. In this figure, the page limit is shown by the blue bar, the item limit by the red bar, and the number of pages

Figure 13. 3-D chart of performance evaluation


browsed by the green bar. The figure shows that the number of pages browsed by the user decreases as the values of the crawling parameters n and m increase, which gives a clear idea of the performance of our DSCS tool. Moreover, we can evaluate the accuracy of our n-paged-m-items partial crawling algorithm using the previous example of searching for 'information retrieval' scholars from China. The algorithm stops working when it completes n pages of crawling, or when the number of authors collected is greater than or equal to m, or when no more pages are left to crawl, as shown in Table 1. The algorithm is working correctly if one of these three conditions holds for each crawl; by this measure, partial crawling accuracy is 100%, since in each of the 6 partial crawls in this test at least one stop condition held, as shown in Table 1. In terms of search accuracy, our algorithm fails to find those authors who work at an organization other than an educational institution and whose e-mails contain no ccTLD. The algorithm needs more computing time if higher values are set for these parameters, but this time is nominal on a fast machine. So, given a high performance machine with a high speed internet connection, we can easily set our

crawling parameters to high values to get the best performance.

11. FUTURE SCOPE Work in information retrieval and search engines requires a high speed internet connection and a dedicated server machine on which the DSCS algorithm can run; both are beyond the scope of our lab. Moreover, this tool may function well in cloud environments, where computation time may be reduced through distributed processing. Future researchers may also work on this technique to make the search interface more customizable, for example finding authors by multiple research fields, institution, and country. They can also apply this technique to other search domains to make search faster than at present.

12. CONCLUSION This custom search technique and tool is genuinely helpful for researchers who need to find scholars from different countries according to their fields of interest. This research will also help novice researchers around the world to find the authors or scholars they seek in their targeted research fields, and future researchers can reuse our developed algorithm

Table 1. Accuracy evaluation of the n-paged-m-items partial crawling algorithm

                  |            Crawling Stop Criteria
  Crawling Number | n = 10 pages? | m ≥ 10 items? | No more pages left?
  ----------------|---------------|---------------|--------------------
  1               | Yes           | No            | No
  2               | No            | Yes           | No
  3               | No            | Yes           | No
  4               | No            | Yes           | No
  5               | Yes           | No            | No
  6               | No            | No            | Yes
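The claim that the partial crawling accuracy is 100% follows from Table 1: every crawl terminated with at least one stop criterion holding. A small check, with the table transcribed as tuples, illustrates the calculation:

```python
# Each tuple is one crawl from Table 1:
# (n pages crawled?, m items collected?, no more pages left?)
table1 = [
    (True,  False, False),  # crawl 1
    (False, True,  False),  # crawl 2
    (False, True,  False),  # crawl 3
    (False, True,  False),  # crawl 4
    (True,  False, False),  # crawl 5
    (False, False, True),   # crawl 6
]

# A crawl terminated correctly if at least one stop criterion holds.
correct = sum(any(row) for row in table1)
accuracy = correct / len(table1)
print(f"partial crawling accuracy: {accuracy:.0%}")  # 100%
```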

Copyright © 2013, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.


in other search areas and on different domains. We expect that this article has demonstrated how a custom search tool helps users obtain their desired results and how much faster it makes the search process.

ACKNOWLEDGMENT We are grateful to Google Scholar for the author searching facility in its domain. All data retrieved by our custom search engine remain the property of Google Scholar.


