Building User Communities of Interests by Using Latent Semantic Analysis

Author(s):  
Guandong Xu

Web users today face information overload caused by the rapid growth in both the amount of information and the number of users. As a result, providing Web users with exactly the information they need is becoming a critical issue in Web-based information retrieval and data management. To address these difficulties, Web mining was proposed as an efficient means of discovering the intrinsic relationships among Web data. In particular, Web usage mining discovers Web usage patterns and uses the discovered usage knowledge to construct interest-oriented user communities, which can in turn be used to present Web users with more personalized Web content, i.e. Web recommendation. Latent Semantic Analysis (LSA), meanwhile, is an approach for revealing the inherent correlations that reside in co-occurrence activities such as Web usage data. Moreover, LSA can capture hidden knowledge at the semantic level that traditional methods cannot. In this chapter, we address building user communities of interest by combining Web usage mining and latent semantic analysis. We also present the application of user communities to Web recommendation.
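
As a rough illustration of the idea in this abstract, the sketch below applies a rank-2 LSA to a toy user-by-page visit matrix and compares users in the latent space. It is a minimal sketch and not the chapter's method: the visit counts, the choice of two latent dimensions, and the helper names `top_singular_triplets` and `cosine` are all assumptions, and power iteration with deflation stands in for a full SVD routine.

```python
import math
import random

def top_singular_triplets(A, k, iters=500, seed=0):
    """Approximate the k largest singular triplets (sigma, u, v) of A
    by power iteration with deflation (a simple stand-in for a full SVD)."""
    rng = random.Random(seed)
    R = [row[:] for row in A]              # residual matrix, deflated after each component
    n_rows, n_cols = len(A), len(A[0])
    triplets = []
    for _ in range(k):
        v = [rng.random() for _ in range(n_cols)]
        for _ in range(iters):
            u = [sum(R[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]  # u = R v
            v = [sum(R[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]  # v = R^T u
            norm = math.sqrt(sum(x * x for x in v))
            if norm == 0.0:
                break
            v = [x / norm for x in v]
        u = [sum(R[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        sigma = math.sqrt(sum(x * x for x in u))
        if sigma == 0.0:
            break                          # nothing left to extract
        u = [x / sigma for x in u]
        triplets.append((sigma, u, v))
        for i in range(n_rows):            # remove the component just found
            for j in range(n_cols):
                R[i][j] -= sigma * u[i] * v[j]
    return triplets

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

# Hypothetical visit counts: rows are users, columns are pages.
# Users 0 and 1 browse pages 0-2; users 2 and 3 browse pages 3-5.
visits = [
    [3, 2, 0, 0, 0, 0],
    [0, 1, 3, 0, 0, 0],
    [0, 0, 0, 2, 3, 0],
    [0, 0, 0, 0, 1, 3],
]

triplets = top_singular_triplets(visits, k=2)
# Each user's coordinates in the 2-dimensional latent space (sigma * u).
latent = [[sigma * u[i] for sigma, u, _ in triplets] for i in range(len(visits))]

raw_sim = cosine(visits[0], visits[1])       # low: the two users share only one page
latent_sim = cosine(latent[0], latent[1])    # high: same latent interest
cross_sim = cosine(latent[0], latent[2])     # near zero: different interests
```

Users whose latent vectors are close could then be grouped into one community; any clustering step on top of `latent` is left out here and would be a further design choice.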

Big Data ◽  
2016 ◽  
pp. 899-928
Author(s):  
Abubakr Gafar Abdalla ◽  
Tarig Mohamed Ahmed ◽  
Mohamed Elhassan Seliaman

The web is a rich, dynamic, and fast-growing source for data mining, offering great opportunities that often go unexploited. Web data pose a real challenge to traditional data mining techniques because of their huge volume and unstructured nature. Web logs contain information about the interactions between visitors and the website. Analyzing these logs provides insights into visitors' behavior, usage patterns, and trends. Web usage mining, also known as web log mining, is the process of applying data mining techniques to discover useful information hidden in web server logs. Web logs are primarily used by Web administrators to see how much traffic they get and to detect broken links and other types of errors. Web usage mining extracts useful information that can benefit a number of application areas, such as web personalization, website restructuring, system performance improvement, and business intelligence. The Web usage mining process involves three main phases: pre-processing, pattern discovery, and pattern analysis. Various preprocessing techniques have been proposed to extract information from log files and group primitive data items into meaningful, higher-level abstractions that are suitable for mining, usually in the form of visitors' sessions. The major data mining techniques used for pattern discovery in web usage mining are clustering, association analysis, classification, and sequential pattern discovery. This chapter discusses the process of web usage mining, its procedure, methods, and pattern discovery techniques. The chapter also presents a practical example using real web log data.
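
The pre-processing phase described above, grouping raw hits into visitors' sessions, can be sketched as follows. This is a minimal sketch rather than the chapter's procedure: the function name `sessionize`, the use of an IP address as the visitor key, and the 30-minute inactivity timeout are assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def sessionize(hits, timeout=timedelta(minutes=30)):
    """Group (visitor, timestamp, page) hits into per-visitor sessions.
    A new session starts when the gap since the visitor's previous hit
    exceeds `timeout` (an assumed heuristic)."""
    by_visitor = defaultdict(list)
    for visitor, ts, page in hits:
        by_visitor[visitor].append((ts, page))

    sessions = []
    for visitor, events in by_visitor.items():
        events.sort()                      # chronological order per visitor
        current, last_ts = [], None
        for ts, page in events:
            if last_ts is not None and ts - last_ts > timeout:
                sessions.append((visitor, current))
                current = []
            current.append(page)
            last_ts = ts
        if current:
            sessions.append((visitor, current))
    return sessions

# Hypothetical hits, already extracted from a log file.
hits = [
    ("10.0.0.1", datetime(2016, 1, 1, 9, 0), "/"),
    ("10.0.0.1", datetime(2016, 1, 1, 9, 5), "/products"),
    ("10.0.0.2", datetime(2016, 1, 1, 9, 6), "/"),
    ("10.0.0.1", datetime(2016, 1, 1, 10, 30), "/contact"),
]
sessions = sessionize(hits)
# sessions == [("10.0.0.1", ["/", "/products"]),
#              ("10.0.0.1", ["/contact"]),
#              ("10.0.0.2", ["/"])]
```

The resulting page lists are the kind of session-level abstraction that clustering, association analysis, or sequential pattern discovery would then take as input.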


Author(s):  
T. Venkat Narayana Rao ◽  
D. Hiranmayi

Web usage mining attempts to discover useful knowledge from the secondary data obtained from users' interactions with the Web. It is the type of Web mining activity that involves automatically discovering what users are looking for on the Internet. This chapter explains the methodology of web usage mining in detail: data collection, data preprocessing, knowledge discovery, and pattern analysis. It describes the different Web usage mining techniques used for knowledge and pattern discovery: statistical analysis, sequential patterns, classification, association rule mining, clustering, and dependency modeling. Pattern analysis is needed to filter out uninteresting rules or patterns from the set found in the pattern discovery phase.
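
To make the pattern analysis step concrete, the sketch below scores candidate association rules over page-visit sessions by support and confidence and drops those below a threshold. It is a minimal sketch under stated assumptions: the sessions, the candidate rules, the thresholds, and the function names `rule_metrics` and `interesting_rules` are hypothetical, and support and confidence are only one common way of deciding which rules are uninteresting.

```python
def rule_metrics(sessions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent,
    where each session is a set of visited pages."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(sessions)
    n_a = sum(1 for s in sessions if antecedent <= set(s))
    n_ac = sum(1 for s in sessions if (antecedent | consequent) <= set(s))
    support = n_ac / n if n else 0.0
    confidence = n_ac / n_a if n_a else 0.0
    return support, confidence

def interesting_rules(sessions, candidates, min_support, min_confidence):
    """Keep only the candidate rules that meet both thresholds."""
    kept = []
    for antecedent, consequent in candidates:
        support, confidence = rule_metrics(sessions, antecedent, consequent)
        if support >= min_support and confidence >= min_confidence:
            kept.append((antecedent, consequent))
    return kept

# Hypothetical sessions and candidate rules.
sessions = [
    {"/", "/products", "/cart"},
    {"/", "/products"},
    {"/", "/about"},
    {"/products", "/cart"},
]
candidates = [
    ({"/products"}, {"/cart"}),   # support 2/4, confidence 2/3
    ({"/"}, {"/about"}),          # support 1/4, confidence 1/3
]
support, confidence = rule_metrics(sessions, {"/products"}, {"/cart"})
kept = interesting_rules(sessions, candidates, min_support=0.4, min_confidence=0.6)
```

With these thresholds only the rule from `/products` to `/cart` survives the filter.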


Author(s):  
P. K. Nizar Banu ◽  
H. Inbarani

As websites increase in complexity, locating needed information becomes a difficult task. Such difficulty is often related not only to the websites' design but also to ineffective and inefficient navigation processes. Research in web mining addresses this problem by applying techniques from data mining and machine learning to web data and documents. In this study, the authors examine web usage mining, which applies data mining techniques to web server logs. Web usage mining has gained much attention as a potential approach to meeting the requirements of web personalization. The authors propose K-means biclustering, rough biclustering, and fuzzy biclustering approaches to reveal the duality between users and pages by grouping them in both dimensions simultaneously. The simultaneous clustering of users and pages discovers biclusters that correspond to groups of users who exhibit highly correlated ratings on groups of pages. The results indicate that the fuzzy C-means biclustering algorithm performs best and is able to detect partial matching of preferences.


2014 ◽  
Vol 7 (4) ◽  
pp. 27-41
Author(s):  
Hanane Ezzikouri ◽  
Mohamed Fakir ◽  
Cherki Daoui ◽  
Mohamed Erritali

User behavior on a website triggers a sequence of queries that result in the display of certain pages. Information about these queries (including the names of the resources requested and the responses from the Web server) is stored in a text file called a log file. Analysis of the server log file can provide significant and useful information. Web mining is the extraction of interesting and potentially useful patterns and implicit information from artifacts or activity related to the World Wide Web. Web usage mining is a main research area in Web mining, focused on learning about Web users and their interactions with Web sites. The goal of mining is to find users' access patterns automatically and quickly from the vast Web log file, such as frequent access paths, frequent access page groups, and user clusters. Through Web usage mining, the information left behind by user access can be mined to provide a foundation for organizational decision making. The Web mining process, defined as the set of techniques designed to explore, process, and analyze the large volumes of information generated by activity on the Internet, has three main steps: data preprocessing, extraction of usage patterns, and interpretation of results. This paper starts by presenting the different formats of web log files, then presents the different preprocessing methods that have been used, and finally presents a system for "Web content and Usage Mining" for web data extraction and web site analysis using the data mining algorithms Apriori, FPGrowth, K-Means, KNN, and ID3.
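
As an illustration of what a log file entry contains, the sketch below parses one line into the requested resource and the server's response fields. The abstract does not say which log formats the paper covers, so the NCSA Common Log Format is assumed here; the sample line, the pattern `CLF_PATTERN`, and the function `parse_clf_line` are illustrative and not taken from the paper.

```python
import re
from datetime import datetime

# Assumed format: host ident user [time] "method resource protocol" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<resource>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf_line(line):
    """Parse one Common Log Format line into a dict, or return None
    if the line does not match the assumed format."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A size of "-" means no bytes were returned.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record

sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
record = parse_clf_line(sample)
```

Lines that fail to match return `None`, so a preprocessing step can count or discard malformed entries before mining.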


2017 ◽  
Vol 2 (1) ◽  
pp. 91 ◽  
Author(s):  
Rahmi Rohdiniyah ◽  
Ibnu Asror ◽  
Gede Agung Ary Wisudawan

Websites in education, particularly at a university, are used to store the various kinds of information that exist within the university environment. For that reason, the site structure needs to be improved to maintain the quality of the website. One technique that can be used is web usage mining. Web usage mining is a branch of web mining used to discover useful information or knowledge from users' navigation patterns on a website. This study uses a graph-based method for frequent sequential access patterns, with Telkom University's Igracias as the case study, because Igracias is used constantly by all entities at Telkom University. The method has the advantage of discovering the behavior of users' access patterns. Implementing this method yields the sequential access patterns of user groups.
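
The abstract does not describe its graph-based method in detail, so the following is only a simplified stand-in for the general idea: sessions are turned into a directed graph whose edge weights count page-to-page transitions, and the frequently traversed edges are kept. The page names, the threshold, and the function names `build_transition_graph` and `frequent_transitions` are hypothetical.

```python
from collections import Counter

def build_transition_graph(sessions):
    """Count directed page-to-page transitions across all sessions.
    Returns a Counter mapping (source, destination) edges to counts."""
    edges = Counter()
    for pages in sessions:
        for src, dst in zip(pages, pages[1:]):
            edges[(src, dst)] += 1
    return edges

def frequent_transitions(edges, min_count):
    """Return the edges traversed at least `min_count` times, sorted."""
    return sorted(edge for edge, count in edges.items() if count >= min_count)

# Hypothetical ordered sessions on a university portal.
sessions = [
    ["/login", "/dashboard", "/grades"],
    ["/login", "/dashboard", "/schedule"],
    ["/login", "/dashboard", "/grades"],
]
edges = build_transition_graph(sessions)
frequent = frequent_transitions(edges, min_count=2)
# frequent == [("/dashboard", "/grades"), ("/login", "/dashboard")]
```

Longer sequential patterns could be found by following chains of frequent edges, which this sketch does not attempt.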


Author(s):  
Marcello Pecoraro

This chapter provides an overview of the use of statistical methods in support of Web Usage Mining. The first part describes the framework of Web Usage Mining as a branch of Web Mining devoted to the study of how a Website is used. The data under analysis are then detailed, together with the problems linked to pre-processing. Once the origin of the data and their treatment for a correct Web Usage analysis have been clarified, the focus shifts to the statistical techniques that can be applied in this setting, with reference to binary segmentation methods. These methods discriminate by means of a response variable that determines the affiliation of users to a group, based on characteristics observed for those users.


Author(s):  
Anne Kao ◽  
Steve Poteet ◽  
Jason Wu ◽  
William Ferng ◽  
Rod Tjoelker ◽  
...  

Latent Semantic Analysis (LSA) or Latent Semantic Indexing (LSI), when applied to information retrieval, has been a major analysis approach in text mining. It is an extension of the vector space method in information retrieval, representing documents as numerical vectors but using a more sophisticated mathematical approach to characterize the essential features of the documents and reduce the number of features in the search space. This chapter summarizes several major approaches to this dimensionality reduction, each of which has strengths and weaknesses, and it describes recent breakthroughs and advances. It shows how the constructs and products of LSA applications can be made user-interpretable and reviews applications of LSA beyond information retrieval, in particular, to text information visualization. While the major application of LSA is for text mining, it is also highly applicable to cross-language information retrieval, Web mining, and analysis of text transcribed from speech and textual information in video.


2020 ◽  
Vol 17 (11) ◽  
pp. 5113-5116
Author(s):  
Varun Malik ◽  
Vikas Rattan ◽  
Jaiteg Singh ◽  
Ruchi Mittal ◽  
Urvashi Tandon

Web usage mining is the branch of web mining that deals with mining data on the web. Web mining can be categorized into web content mining, web structure mining, and web usage mining. In this paper, we summarize the web usage mining results obtained with WMOT (web mining optimized tool), a tool based on WEKA that was used to apply various classification algorithms such as Naïve Bayes, KNN, SVM, and tree-based algorithms. We compare the results of these classification algorithms on the basis of classified instances and identify the algorithms that give better accuracy.

