IJRIT International Journal of Research in Information Technology, Volume 2, Issue 5, May 2014, Pg: 315-320

International Journal of Research in Information Technology (IJRIT) www.ijrit.com

ISSN 2001-5569

Mining Website Log for Improvement of Website Navigation

D.Thamaraiselvi*, S.Shobana**, K.Deepika***

*PG Scholar, CSE Department, Computer Science and Engineering, Anna University, Chennai
**Assistant Professor, CSE Department
***PG Scholar, CSE Department, Computer Science and Engineering, Anna University, Chennai
Vivekanandha College of Technology for Women, Elayampalayam, Thiruchengode, Tamilnadu, India.

*[email protected] **[email protected] ***[email protected]

Abstract: Internet usage has increased, and so have users' expectations. A website should meet its users' needs. To do so, it must constantly examine its users and their use of the site, as mined from web server logs, and modify itself accordingly to serve them best. Two approaches are used to make a website adaptive: web transformation and web personalization. Transformation approaches change the structure of a website to facilitate navigation for a large set of users, while personalization approaches customize the content and structure to the needs of specific users. To reduce the cost of altering the current structure, a mathematical programming model with maximum constraints is proposed to obtain effective navigation and to retrieve the user's target data. The user's target data and the target content pages are retrieved based on the user's access logs. The access logs are analyzed, and the web pages are then navigated according to the user's preferences. The path threshold and the out-degree threshold are used to navigate the web pages effectively. These values are calculated from the web page links and the web page navigation.

Keywords: Web Personalization, Web Transformation, Website Design, User Navigation, Web Mining.

I. INTRODUCTION

The World Wide Web is only one of hundreds of services used on the Internet. The Web is a global set of documents, images and other resources, interrelated by hyperlinks and referenced with Uniform Resource Identifiers (URIs). These documents may also contain sound, text, video, multimedia and interactive content that runs while the user is interacting with the page. A typical web page used to be stored in completed form on a web server in HTML format, ready to be sent to a user's browser in response to a request. The process of creating and serving web pages has since become more automated and more dynamic.

The website developer creates the website and links the web pages according to the developer's own judgment [3]. The developer does not know how effective the web page links are at creation time. Once the website is made available to users, the effectiveness of the links can be analyzed from the website log files. Website logs are generally stored on the web server and maintain the history of page requests and page accesses. A web log file records the client IP address, the session time, the search history, and the web address of each accessed page [6]. A session is the time a user spends searching for particular content from a single browser.

Previous studies on website restructuring alter the current structure with a large number of changes. Such changes are undesirable: the cost of changing the current structure becomes greater than the cost of developing the website, and the changes also take more time. Therefore, a mathematical programming model with maximum constraints is used to reduce both the cost of altering the structure of the website and the time taken to alter it.

II. WEB PERSONALIZATION AND WEB TRANSFORMATION

Web personalization refers to the customization of web page content so that it is relevant to the individual user, based on the user's expectations and preferences, which are analyzed from the user access log files [2]. Here, preferences refer to the user's social activities, interests, context, and so on. Due to the explosive growth of the Internet and of users' expectations, web personalization plays an important role in both research and commercial areas. Website structures are large and complicated, so users often lose sight of the goal of their inquiry and receive ambiguous results when they navigate through the links in the web pages.

The general function of web personalization is to classify the website content into conceptual categories. The website's content can then be enhanced with additional information acquired from other Web sources, based on user log files. Additional information about individual users can be obtained to help them navigate the web pages effectively. A publishing mechanism performs the site modification and ensures that each user navigates through the optimal structure of the web pages. The user's interests are ranked, and the web page links are modified based on that rank.

Web transformation refers to carrying the changes applied to the web page links over to the user during the next visit.

III. EXISTING SYSTEM

In the existing system, the mathematical programming (MP) model is used to improve user navigation on a website while minimizing alterations to its current structure [1]. The user searches for content on the website.

Figure 1 The Existing MP Model

The website developer develops the website and makes it available to users. A user then visits the website and searches for content, navigating the web pages through the links provided by the developer. If the structure of the website is too complicated, users have difficulty finding the target page and leave the website. A website whose information is of high quality but whose structure is poor is still considered a poor website. To make the website efficient, its structure should therefore be altered to suit the users. This can be done with the help of the user log files kept by the website. The MP model improves the current structure of the website with minimal changes, based on the user access log files.
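As a rough illustration of how access log records might be grouped into sessions before any structural analysis, consider the sketch below. The record layout (client IP, timestamp, page), the sample entries, and the 30-minute inactivity gap are all assumptions made for this example; they are not taken from the MP model itself.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical log records: (client IP, timestamp, requested page).
LOG = [
    ("10.0.0.1", "2014-05-01 10:00:00", "/index.html"),
    ("10.0.0.1", "2014-05-01 10:02:00", "/products.html"),
    ("10.0.0.2", "2014-05-01 10:03:00", "/index.html"),
    ("10.0.0.1", "2014-05-01 11:30:00", "/index.html"),
]


def sessionize(records, gap=timedelta(minutes=30)):
    """Group requests by client IP and split them on long inactivity gaps."""
    by_client = defaultdict(list)
    for ip, stamp, page in records:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        by_client[ip].append((when, page))

    sessions = []
    for ip, visits in by_client.items():
        visits.sort()  # order each client's requests by time
        current = [visits[0][1]]
        for (prev_time, _), (time, page) in zip(visits, visits[1:]):
            if time - prev_time > gap:
                # Assumed rule: a long pause ends the session.
                sessions.append((ip, current))
                current = []
            current.append(page)
        sessions.append((ip, current))
    return sessions


SESSIONS = sessionize(LOG)
```

With the sample records, the first client produces two sessions because of the long pause before its last request, and the second client produces one.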


The working procedure of the existing system is as follows:
• The user log files stored on the website are extracted and analyzed to capture the interaction between users and the website.
• The log files are then broken into sessions, i.e., the groups of activities performed by a user during a visit to the website.
• Mini sessions are then constructed from the sessions. A mini session is the group of pages visited by a user in search of a single target page.
• If the user cannot find the target page, the user backtracks to the home page to look for it again.
• A graph is constructed from the mini sessions, and the shortest path to reach the target is identified.
• The objective function of the MP model minimizes the cost needed to improve the website structure. It is based on the number of new links to be added to the website and on the pages that contain excessive links.
• The web pages are then relinked based on the out-degree threshold and the path threshold. The path threshold is defined as the maximum number of paths a user is allowed to traverse to reach the target page. The out-degree threshold for a page is highly dependent on the purpose of the page and the website. New links are added to the website subject to the out-degree threshold: if adding new links causes a page to exceed its out-degree threshold, the links should be analyzed and adjusted so that the page meets the threshold.

Consider an example of constructing a mini session from the log file of a website. The website consists of 10 web pages, numbered 1 to 10, as shown in Figure 2. Let page 10 be the target page.

Figure 2 Example of a mini session


The user starts browsing from web page 1, browses pages 4 and 7, and backtracks to 4. From 4 the user visits 3, 2, 5 and 9, and backtracks to 2. The user then goes from 2 to 6 and finally reaches the target page 10. The mini session S can be denoted by S = {{1,4,7},{3,2,5,9},{6,10}}, where each element of S is the set of pages on one path traversed by the user. In this example, the mini session S has three paths, because the user backtracks at 7 and 9 before reaching the target 10.

One way to help the user reach the target faster is to introduce more links. The extra links can be added in many ways. If a link is added from 4 to 10, the user can reach 10 directly from 4 and hence reach the target in the first path. Thus, adding this link saves the user two paths. Similarly, establishing a link from 2 to 10 enables the user to reach the target in the second path, which saves one path. It is also possible to insert a link from 5 to 10, which is considered the same as linking 2 to 10: both 2 and 5 are pages visited in the second path, so linking either one to 10 saves only one path. Yet another possibility is to link 3 to 6, a non-target page. In this case, the user is assumed not to follow the new link, because it does not connect a page directly to the target.

The optimal solutions are obtained from the MP model. However, the MP model focuses mainly on the web page navigation of the website, and techniques that can accurately identify users' target data are critical to it.

III. PROPOSED SYSTEM

Difficulty in identifying the target page is reported as the problem that triggers most customers to abandon a website and switch to a competitor. The architecture of the proposed system is shown in Figure 3.

Figure 3 Proposed System

The user searches for content on the website. At each page the user faces a decision point: the user estimates the probability of reaching the target via each link and makes navigation decisions accordingly. A user is assumed to follow the path that appears most likely to lead to the target. If the user cannot locate the target page on the current path, the user may backtrack to an already visited page and try a new path. Therefore, the number of paths a user has traversed to reach the target is used as an approximate measure of the number of times the user has attempted to locate that target. Backtracks are used to identify the paths that a user has traversed, where a backtrack is defined as a user's revisit to a previously browsed page.

The procedure for the improved mathematical programming model is as follows:
• Analyze the website log files and construct the mini sessions.
• Identify the links that are irrelevant to the user from the mini sessions.
• Apply the improved MP model to add target pages to the website along with the new links.
• Add the new links based on the path threshold and the out-degree threshold.
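The backtrack-based construction of a mini session can be sketched in a few lines. The sketch assumes that the click sequence of the ten-page example from the existing-system discussion is logged as 1, 4, 7, 4, 3, 2, 5, 9, 2, 6, 10, i.e., that each backtrack appears as a revisit in the log. The function names `split_into_paths` and `paths_saved` are hypothetical, and the second function only counts the paths a candidate link would save; it does not solve the MP model.

```python
def split_into_paths(clicks):
    """Split one click sequence into paths, cutting at every backtrack.

    A backtrack is a revisit to a page already seen in the sequence;
    the revisited page is not repeated in the following path.
    """
    paths, current, seen = [], [], set()
    for page in clicks:
        if page in seen:
            # Backtrack: close the current path and start a new one.
            if current:
                paths.append(current)
            current = []
        else:
            seen.add(page)
            current.append(page)
    if current:
        paths.append(current)
    return paths


def paths_saved(paths, source, target):
    """Number of paths saved by adding a direct link source -> target.

    Assumes the last page of the last path is the user's target and that
    the user only follows a new link if it leads straight to that target.
    """
    if not paths or target != paths[-1][-1]:
        return 0  # the link does not point at the target, so it is ignored
    for index, path in enumerate(paths):
        if source in path:
            # The user would finish on this path instead of the last one.
            return len(paths) - 1 - index
    return 0


CLICKS = [1, 4, 7, 4, 3, 2, 5, 9, 2, 6, 10]
PATHS = split_into_paths(CLICKS)
```

Here `PATHS` has three entries, one per traversed path, and `paths_saved(PATHS, 4, 10)` evaluates to 2, while a link from 2 or 5 to page 10 saves one path and a link from 3 to 6 saves none.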




The performance of the improved website can be analysed using two metrics: the number of users who use the improved website and the time efficiency of those users.

A. CONSTRAINTS FOR PROPOSED SYSTEM

The following are the constraints for the proposed system.

a) OUT DEGREE THRESHOLD

Web pages are generally classified into two types: index pages and content pages. An index page consists mainly of links that help users navigate. A content page consists mainly of information and does not contain many links. The out-degree threshold for a page is highly dependent on the purpose of the page and the website. The out-degree threshold for index pages should be larger than that for content pages.
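A small sketch can make the out-degree constraint concrete. The site graph, the page classification, and the threshold values (5 for index pages, 3 for content pages) are hypothetical; the only property carried over from the text is that index pages are given a larger threshold than content pages.

```python
# Hypothetical site graph: page -> pages it links to.
LINKS = {
    "home": ["news", "products", "about", "contact"],
    "products": ["item1", "item2", "item3"],
    "item1": ["products", "item2", "item3", "home"],
}

# Hypothetical page types and thresholds (index pages get the larger limit).
PAGE_TYPE = {"home": "index", "products": "index", "item1": "content"}
THRESHOLD = {"index": 5, "content": 3}


def over_threshold(links, page_type, threshold):
    """Map each page that exceeds its out-degree limit to the number of excess links."""
    excess = {}
    for page, targets in links.items():
        limit = threshold[page_type[page]]
        if len(targets) > limit:
            excess[page] = len(targets) - limit
    return excess


EXCESS = over_threshold(LINKS, PAGE_TYPE, THRESHOLD)
```

In this example only the content page `item1` exceeds its limit, by one link, so it is the page whose links would need to be reviewed before more are added.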

b) PATH THRESHOLD

The path threshold is defined as the maximum number of paths a user is allowed to traverse to reach the target page.

c) IDENTIFYING TARGET PAGE

The target pages are identified based on the time the user spends on the website and by attaching tags to the user's searches.
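The last two constraints can be illustrated with a minimal sketch. It assumes that a page counts as a target when the time spent on it reaches a cutoff (120 seconds here) and that a mini session needs improvement when its number of paths exceeds the path threshold. The dwell times, the cutoff, and both function names are hypothetical, and tag-based identification is not covered.

```python
def find_targets(visits, min_seconds=120):
    """Return the pages whose dwell time reaches the cutoff.

    visits is a list of (page, seconds spent on the page).
    """
    return [page for page, seconds in visits if seconds >= min_seconds]


def needs_improvement(paths, path_threshold):
    """True if the user traversed more paths than the path threshold allows."""
    return len(paths) > path_threshold


# Hypothetical dwell times for the ten-page example session.
VISITS = [(1, 5), (4, 8), (7, 3), (4, 2), (3, 6), (2, 4),
          (5, 7), (9, 5), (2, 3), (6, 10), (10, 300)]
TARGETS = find_targets(VISITS)

# The three paths of the example mini session.
EXAMPLE_PATHS = [[1, 4, 7], [3, 2, 5, 9], [6, 10]]
FLAGGED = needs_improvement(EXAMPLE_PATHS, path_threshold=2)
```

With these numbers, page 10 is the only page identified as a target, and the three-path mini session is flagged under a path threshold of 2.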

Based on these three constraints, the improved MP model is used to navigate the web pages effectively and to identify the user's target page. The web pages are navigated effectively with minimal changes to the current structure, based on the user's preferences.

IV. CONCLUSION

Web mining combined with web page navigation is now considered an emerging trend in marketing, educational systems and other web-based fields. The model is effective for informational websites whose content is stable over time. It can also be used for website maintenance on a progressive basis. The optimal solutions obtained from the model are effective for real-world websites. The model provides a balance between the user and web page navigation.

V. REFERENCES

[1] M. Chen and Y.U. Ryu, “Facilitating Effective User Navigation through Website Structure Improvement,” IEEE Trans. Knowledge and Data Eng., vol. 25, no. 3, 2013.
[2] M. Eirinaki and M. Vazirgiannis, “Web Mining for Web Personalization,” ACM Trans. Internet Technology, vol. 3, no. 1, pp. 1-27, 2003.
[3] R. Gupta, A. Bagchi, and S. Sarkar, “Improving Linkage of Web Pages,” INFORMS J. Computing, vol. 19, no. 1, pp. 127-136, 2007.
[4] C. Romero, S. Ventura, A. Zafra, and P. de Bra, “Applying Web Usage Mining for Personalizing Hyperlinks in Web-Based Adaptive Educational Systems,” Computers & Education, vol. 53, pp. 828-840, 2009.
[5] Y. Fu, M.Y. Shih, M. Creado, and C. Ju, “Reorganizing Web Sites Based on User Access Patterns,” Intelligent Systems in Accounting, Finance and Management, vol. 11, no. 1, pp. 39-53, 2002.
[6] M. Ivory and M. Hearst, “Improving Web Site Design,” IEEE Internet Computing, vol. 6, no. 2, pp. 56-63, Mar. 2002.
[7] H. Kao, S. Lin, J. Ho, and M. Chen, “Mining Web Informative Structures and Contents Based on Entropy Analysis,” IEEE Trans. Knowledge and Data Eng., vol. 16, no. 1, pp. 41-55, Jan. 2004.
[8] J. Kim and B. Yoo, “Toward the Optimal Link Structure of the Cyber Shopping Mall,” Int’l J. Human-Computer Studies, vol. 52, no. 3, pp. 531-551, 2000.
[9] C.C. Lin and L. Tseng, “Website Reorganization Using an Ant Colony System,” Expert Systems with Applications, vol. 37, no. 12, pp. 7598-7605, 2010.
[10] Y. Fu, M.Y. Shih, M. Creado, and C. Ju, “Reorganizing Web Sites Based on User Access Patterns,” Intelligent Systems in Accounting, Finance and Management, vol. 11, no. 1, pp. 39-53, 2002.
[11] B. Mobasher, H. Dai, T. Luo, and M. Nakagawa, “Discovery and Evaluation of Aggregate Usage Profiles for Web Personalization,” Data Mining and Knowledge Discovery, vol. 6, no. 1, pp. 61-82, 2002.
[12] B. Mobasher, R. Cooley, and J. Srivastava, “Automatic Personalization Based on Web Usage Mining,” Comm. ACM, vol. 43, no. 8, pp. 142-151, 2000.
[13] B. Mobasher, R. Cooley, and J. Srivastava, “Creating Adaptive Web Sites through Usage-Based Clustering of URLs,” Proc. Workshop Knowledge and Data Eng. Exchange, 1999.
[14] W. Yan, M. Jacobsen, H. Garcia-Molina, and U. Dayal, “From User Access Patterns to Dynamic Hypertext Linking,” Computer Networks and ISDN Systems, vol. 28, nos. 7-11, pp. 1007-1014, May 1996.
[15] R.E. Bucklin and C. Sismeiro, “A Model of Website Browsing Behavior Estimated on Clickstream Data,” J. Marketing Research, vol. 40, no. 3, pp. 249-267, 2003.
