IJRIT International Journal of Research in Information Technology, Volume 2, Issue 5, May 2014, Pg: 315-320
International Journal of Research in Information Technology (IJRIT) www.ijrit.com
ISSN 2001-5569
Mining Website Log for Improvement of Website Navigation
D. Thamaraiselvi*, S. Shobana**, K. Deepika***
*PG Scholar, CSE Department, Computer Science and Engineering, Anna University, Chennai
**Assistant Professor, CSE Department
***PG Scholar, CSE Department, Computer Science and Engineering, Anna University, Chennai
Vivekanandha College of Technology for Women, Elayampalayam, Thiruchengode, Tamilnadu, India.
Abstract: As Internet usage grows, so do users' expectations, and a website must meet its users' needs. A website should continually examine its users and how they use the site, as mined from web server logs, and modify itself accordingly to serve them best. Two approaches are used to make a website adaptive: web transformation and web personalization. Transformation approaches change the structure of a website to ease navigation for a large set of users, while personalization approaches customize the content and structure to the needs of specific users. To reduce the cost of altering the current structure, a mathematical programming model with maximum constraints is proposed to obtain effective navigation and to identify the user's target data. The user's target data and the target content pages are retrieved from the user's access logs. The access logs are analyzed, and the web pages are then relinked according to the user's preferences. A path threshold and an out-degree threshold are used to make navigation effective. These values are calculated from the web page links and the observed web page navigation.

Keywords: Web Personalization, Web Transformation, Website Design, User Navigation, Web Mining.
I. INTRODUCTION
The World Wide Web is only one of hundreds of services used on the Internet. The Web is a global set of documents, images and other resources, interrelated by hyperlinks and referenced with Uniform Resource Identifiers (URIs). These documents may also contain sound, text, video, multimedia and interactive content that runs while the user is interacting with the page. Traditionally, a web page was stored in completed form on a web server in HTML format, ready to be sent to a user's browser in response to a request. The process of creating and serving web pages has since become more automated and more dynamic.

The website developer creates the website and links the web pages according to the developer's own judgment [3]. The developer does not know how effective the web page links are at the time of creation. Once the website is made available to users, the effectiveness of the links can be analyzed from the website log files. The website logs are generally stored on the web server and maintain the history of page requests and page accesses. A web log file consists of the client IP address, session time, search history and web address of the accessed page [6]. A session is the time a user spends searching for particular content from a single browser.

Previous studies on website improvement alter the current structure with a large number of changes. Such changes are undesirable: the cost of changing the current structure can exceed the cost of developing the website, and restructuring takes considerable time. A mathematical programming model with maximum constraints is therefore used to reduce both the cost of altering the structure of the website and the time taken to alter it.

II. WEB PERSONALIZATION AND WEB TRANSFORMATION
Web personalization refers to customizing the content of web pages so that it is relevant to the individual user, based on the user's expectations and preferences as analyzed from the user access log files [2]. Here, preferences refer to the user's social activities, interests, context and so on. Owing to the explosive growth of the Internet and of users' expectations, web personalization plays an important role in both research and commercial settings. The link structures of websites are large and complicated, and users often lose track of the goal of their inquiry and receive ambiguous results when they navigate through the links.

The general function of web personalization is to classify the website content into conceptual categories. The website's content can then be enhanced with additional information acquired from other web sources, based on the user log files. Additional information about individual users can also be obtained so that the web pages can be navigated effectively. A publishing mechanism performs the site modification and ensures that each user navigates through the optimal structure of the web pages. The user's interests are ranked, and the web page links are modified according to the rank.

Web transformation refers to applying the changes made to the web page links so that the user sees them during the next visit.

III. EXISTING SYSTEM
In the existing system, the mathematical programming (MP) model is used to improve user navigation on a website while minimizing alterations to its current structure [1].

Figure 1. The existing MP model

The website developer develops the website and makes it available to users. A user then visits the website and searches for content, navigating the web pages through the links provided by the developer. If the structure of the website is complicated, users have difficulty finding the target page and leave the website. A website whose information is of high quality but whose structure is poor is still considered a poor website. To make the website efficient, its structure should therefore be altered for the users. This can be done with the help of the user log files kept for the website. The MP model improves the website with minimum changes to its current structure, based on the user access log files.
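As an illustration of how access-log records could be prepared for this kind of analysis, the following minimal sketch groups each client's requests into sessions. The record layout `(client_ip, timestamp_seconds, url)`, the function name `split_into_sessions` and the 30-minute inactivity timeout are assumptions made for this example; the paper does not specify them.

```python
from collections import defaultdict


def split_into_sessions(records, timeout=1800):
    """Group log records into sessions per client.

    records: iterable of (client_ip, timestamp_seconds, url) tuples (assumed layout).
    timeout: inactivity gap in seconds that starts a new session (assumed value).
    Returns a dict mapping client_ip to a list of sessions, each a list of URLs.
    """
    by_client = defaultdict(list)
    for ip, ts, url in records:
        by_client[ip].append((ts, url))

    sessions = {}
    for ip, hits in by_client.items():
        hits.sort()  # order each client's requests by time
        client_sessions, current, last_ts = [], [], None
        for ts, url in hits:
            # A long gap since the previous request closes the current session.
            if last_ts is not None and ts - last_ts > timeout:
                client_sessions.append(current)
                current = []
            current.append(url)
            last_ts = ts
        if current:
            client_sessions.append(current)
        sessions[ip] = client_sessions
    return sessions


records = [
    ("10.0.0.1", 0, "/a"),
    ("10.0.0.1", 100, "/b"),
    ("10.0.0.1", 5000, "/c"),
    ("10.0.0.2", 50, "/a"),
]
print(split_into_sessions(records))
# {'10.0.0.1': [['/a', '/b'], ['/c']], '10.0.0.2': [['/a']]}
```

Identifying a client by IP address alone is a simplification; a real log would usually need further fields to separate users who share an address.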
The working procedure for the existing system is as follows:
• The user log files stored for the website are extracted and analyzed to capture the interaction between users and the website.
• The log files are broken into sessions, i.e., the groups of activities performed by a user during a visit to the website.
• Mini sessions are constructed from the sessions. A mini session is the group of pages visited by the user in search of a single target page.
• If the user cannot find the target page, the user backtracks to the home page and looks for the target page again.
• A graph is constructed from the mini sessions, and the shortest paths to the target are identified.
• The objective function of the MP model minimizes the cost of improving the website structure. It is based on the number of new links to be added to the website and on the pages that contain excessive links.
• The web pages are relinked based on the out-degree threshold and the path threshold.

Path threshold is defined as the maximum number of paths allowed to reach the target page. The out-degree threshold for a page is highly dependent on the purpose of the page and the website. New links are added to the website subject to the out-degree threshold: if adding new links causes a page to exceed its out-degree threshold, the links on that page should be analyzed and adjusted to meet the threshold.

As an example of constructing mini sessions from the website's log file, consider a website with 10 web pages, numbered 1 to 10, as shown in Figure 2. Let page 10 be the target page.
Figure 2. Example of a mini session (figure label: "User starts browsing")
The user starts browsing at page 1, visits pages 4 and 7, and backtracks to 4. From 4 the user visits 3, 2, 5 and 9, and backtracks to 2. The user then goes from 2 to 6 and finally reaches the target page 10. The mini session S can be denoted by S = {{1,4,7},{3,2,5,9},{6,10}}, where each element of S is a path traversed by the user. In this example, the mini session S has three paths, because the user backtracks at 7 and 9 before reaching the target 10.

One way to help the user reach the target faster is to introduce more links, and the extra links can be added in many ways. If a link is added from 4 to 10, the user can reach 10 directly from 4 and hence reach the target in the first path. Adding this link therefore saves the user two paths. Similarly, a link from 2 to 10 enables the user to reach the target in the second path, which saves one path. A link from 5 to 10 is also possible and is considered equivalent to linking 2 to 10: both 2 and 5 are visited in the second path, so linking either one to 10 saves only one path. Yet another possibility is to link 3 to 6, a non-target page. In this case the user is assumed not to follow the new link, because it does not directly connect a page to the target.

The optimal solutions are obtained from the MP model. However, the MP model focuses mainly on the web page navigation of the website, and techniques that can accurately identify users' target data are critical to it.
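The worked example above can be reproduced with a short sketch. It splits a click sequence into paths at each backtrack and counts how many paths a new link to the target would save. The function names `split_into_paths` and `paths_saved_by_link` are hypothetical, and the sketch assumes that the target is the last page of the last path and that the user follows any new link that leads directly to the target.

```python
def split_into_paths(clicks):
    """Split one mini session's click sequence into paths.

    A backtrack (revisit to an already browsed page) ends the current path;
    the revisited page itself is not repeated in the next path.
    """
    paths, current, seen = [], [], set()
    for page in clicks:
        if page in seen:
            # Backtrack: close the current path and start a new one.
            if current:
                paths.append(current)
            current = []
        else:
            seen.add(page)
            current.append(page)
    if current:
        paths.append(current)
    return paths


def paths_saved_by_link(paths, source):
    """Number of paths saved if a direct link from `source` to the target is added.

    Returns 0 if `source` was not visited in this mini session.
    """
    for index, path in enumerate(paths):
        if source in path:
            # The user would reach the target in path index + 1.
            return len(paths) - (index + 1)
    return 0


clicks = [1, 4, 7, 4, 3, 2, 5, 9, 2, 6, 10]
paths = split_into_paths(clicks)
print(paths)                          # [[1, 4, 7], [3, 2, 5, 9], [6, 10]]
print(paths_saved_by_link(paths, 4))  # 2
print(paths_saved_by_link(paths, 2))  # 1
print(paths_saved_by_link(paths, 5))  # 1
```

The output matches the mini session S and the savings stated in the example: linking 4 to 10 saves two paths, while linking either 2 or 5 to 10 saves one.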
III. PROPOSED SYSTEM

Difficulty in identifying the user's target page is reported as the problem that drives most customers to abandon a website and switch to a competitor. The architecture of the proposed system is shown in Figure 3.
Figure 3. Proposed system

The user searches for content on the website. At each page the user faces a decision point, estimates the probability of reaching the target via each link, and makes a navigation decision accordingly. A user is assumed to follow the path that appears most likely to lead to the target. If the user cannot locate the target page on the current path, the user may backtrack to an already visited page and try a new path. Therefore, the number of paths a user has traversed to reach the target serves as an approximate measure of the number of times the user has attempted to locate that target. Backtracks are used to identify the paths that a user has traversed, where a backtrack is defined as a user's revisit to a previously browsed page.

The procedure for the improved mathematical programming model is as follows:
• Analyze the website log files and construct the mini sessions.
• Identify the links that are irrelevant to the user from the mini sessions.
• Apply the improved MP model to add target pages to the website along with the new links.
• Add the new links based on the path threshold and the out-degree threshold.
• Analyze the performance of the improved website using two metrics: the number of users who use the improved website and the time efficiency of users.

A. CONSTRAINTS FOR PROPOSED SYSTEM

The following are the constraints for the proposed system.
a) OUT DEGREE THRESHOLD
Web pages are generally classified into two types: index pages and content pages. An index page consists mainly of links that help users navigate. A content page consists of information and does not contain many links. The out-degree threshold for a page is highly dependent on the purpose of the page and the website. The out-degree threshold for index pages should be larger than that for content pages.
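A minimal sketch of how the out-degree constraint could be checked is given below. The function name `pages_over_out_degree`, the page names, the page-type labels and the threshold values are all assumptions for illustration; the paper only states that index pages should be allowed more links than content pages.

```python
def pages_over_out_degree(links, page_type, thresholds):
    """Return {page: excess} for pages whose out-degree exceeds their threshold.

    links:      dict mapping each page to the list of pages it links to.
    page_type:  dict mapping a page to "index" or "content";
                pages not listed are treated as content pages (assumption).
    thresholds: dict mapping each page type to its out-degree threshold.
    """
    excess = {}
    for page, targets in links.items():
        limit = thresholds[page_type.get(page, "content")]
        over = len(set(targets)) - limit  # count distinct outgoing links
        if over > 0:
            excess[page] = over
    return excess


links = {
    "home": ["a", "b", "c", "d"],
    "a": ["b", "c", "d"],
    "b": ["home"],
}
page_type = {"home": "index"}
thresholds = {"index": 4, "content": 2}  # illustrative values only

print(pages_over_out_degree(links, page_type, thresholds))  # {'a': 1}
```

In this example the index page "home" stays within its larger threshold, while the content page "a" has one link too many and would need its links reviewed.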
b) PATH THRESHOLD
Path threshold is defined as the maximum number of paths allowed to reach the target page.

c) IDENTIFYING TARGET PAGE
The target pages are identified based on the time spent by the user on the website and also by adding tags to the user's search.
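One way the time-based part of this constraint could be realized is sketched below: a page is treated as a target if the user stays on it for at least a minimum time before the next request. The function name `find_target_pages`, the `(page, timestamp_seconds)` input layout and the 60-second threshold are assumptions for illustration; the tag-based identification is not covered by this sketch.

```python
def find_target_pages(visits, min_seconds=60.0):
    """Return pages whose viewing time is at least `min_seconds`.

    visits: list of (page, timestamp_seconds) in visiting order (assumed layout).
    The viewing time of a page is the gap until the next request, so the
    last page of the session has no known viewing time and is left out.
    """
    targets = []
    for (page, start), (_, next_start) in zip(visits, visits[1:]):
        if next_start - start >= min_seconds:
            targets.append(page)
    return targets


visits = [
    ("/home", 0),
    ("/products", 10),
    ("/products/x", 25),
    ("/home", 200),
    ("/contact", 210),
]
print(find_target_pages(visits))  # ['/products/x']
```

The threshold would in practice depend on the website and the kind of content, and the treatment of the last page in a session is a design choice that this sketch leaves open.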
Based on these three constraints, the improved MP model is used to navigate the web pages effectively and to identify the user's target page. The web pages are navigated effectively with minimum changes to the current structure, based on the user's preferences.

IV. CONCLUSION
Web mining combined with web page navigation is now considered an emerging trend in marketing, educational systems and other web-based fields. This model is effective for informational websites whose content is stable over time. It can also be used for website maintenance on a progressive basis. The optimal solutions obtained from the model are very effective for real-world websites. The model provides a balance between the user and the web page navigation.

V. REFERENCES
[1] M. Chen and Y.U. Ryu, "Facilitating Effective User Navigation through Website Structure Improvement," IEEE Trans. Knowledge and Data Eng., vol. 25, no. 3, 2013.
[2] M. Eirinaki and M. Vazirgiannis, "Web Mining for Web Personalization," ACM Trans. Internet Technology, vol. 3, no. 1, pp. 1-27, 2003.
[3] R. Gupta, A. Bagchi, and S. Sarkar, "Improving Linkage of Web Pages," INFORMS J. Computing, vol. 19, no. 1, pp. 127-136, 2007.
[4] R. Cristobal, V. Sebastian, Z. Amelia, and B. Paul, "Applying Web Usage Mining for Personalizing Hyperlinks in Web-based Adaptive Educational Systems," Computers & Education, vol. 53, pp. 828-840, 2009.
[5] Y. Fu, M.Y. Shih, M. Creado, and C. Ju, "Reorganizing Web Sites Based on User Access Patterns," Intelligent Systems in Accounting, Finance and Management, vol. 11, no. 1, pp. 39-53, 2002.
[6] M. Ivory and M. Hearst, "Improving Web Site Design," IEEE Internet Computing, vol. 6, no. 2, pp. 56-63, Mar. 2002.
[7] H. Kao, S. Lin, J. Ho, and M. Chen, "Mining Web Informative Structures and Contents Based on Entropy Analysis," IEEE Trans. Knowledge and Data Eng., vol. 16, no. 1, pp. 41-55, Jan. 2004.
[8] J. Kim and B. Yoo, "Toward the Optimal Link Structure of the Cyber Shopping Mall," Int'l J. Human-Computer Studies, vol. 52, no. 3, pp. 531-551, 2000.
[9] C.C. Lin and L. Tseng, "Website Reorganization Using an Ant Colony System," Expert Systems with Applications, vol. 37, no. 12, pp. 7598-7605, 2010.
[10] Y. Fu, M.Y. Shih, M. Creado, and C. Ju, "Reorganizing Web Sites Based on User Access Patterns," Intelligent Systems in Accounting, Finance and Management, vol. 11, no. 1, pp. 39-53, 2002.
[11] B. Mobasher, H. Dai, T. Luo, and M. Nakagawa, "Discovery and Evaluation of Aggregate Usage Profiles for Web Personalization," Data Mining and Knowledge Discovery, vol. 6, no. 1, pp. 61-82, 2002.
[12] B. Mobasher, R. Cooley, and J. Srivastava, "Automatic Personalization Based on Web Usage Mining," Comm. ACM, vol. 43, no. 8, pp. 142-151, 2000.
[13] B. Mobasher, R. Cooley, and J. Srivastava, "Creating Adaptive Web Sites through Usage-Based Clustering of URLs," Proc. Workshop Knowledge and Data Eng. Exchange, 1999.
[14] W. Yan, M. Jacobsen, H. Garcia-Molina, and U. Dayal, "From User Access Patterns to Dynamic Hypertext Linking," Computer Networks and ISDN Systems, vol. 28, nos. 7-11, pp. 1007-1014, May 1996.
[15] R.E. Bucklin and C. Sismeiro, "A Model of Website Browsing Behavior Estimated on Clickstream Data," J. Marketing Research, vol. 40, no. 3, pp. 249-267, 2003.