A Tag-Based Improved LDA and Web Page Clustering Analysis

With the rapid development of the Internet, tagging has been widely adopted across websites. Brief text labels on network resources make it much easier for people to access massive amounts of data. Social tagging allows users to tag network objects with any word and to share those tags; because it is simple and flexible to use, it has become a popular application. However, tags suffer from problems such as noise, a lack of usage criteria, and sparse distribution. The sparsity of tags in particular seriously limits their application in the semantic analysis of web pages. This paper uses a user-related tag expansion method to overcome this problem, and at the same time uses the topic model LDA to model the web tags, mine latent topics from large-scale web pages, and obtain the topic distribution of each text for text clustering analysis. The experimental results show that, compared with traditional clustering algorithms, the LDA-based clustering method achieves a considerable improvement in the analysis of web tags.
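
The idea of modelling tags with LDA and then clustering on the resulting topic distributions can be illustrated with a minimal sketch. It is not the paper's implementation: the collapsed Gibbs sampler, the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy `pages` data are all assumptions made for illustration, and the user-related tag expansion step is not shown.

```python
import random

def lda_gibbs(docs, num_topics, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tag lists.

    Returns one topic distribution (a list of probabilities) per document.
    """
    rng = random.Random(seed)
    vocab = sorted({tag for doc in docs for tag in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                 # tags of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # tag w in topic k
    topic_total = [0] * num_topics                               # all tags in topic k
    assignments = []

    # Random initial topic assignment for every tag occurrence.
    for d, doc in enumerate(docs):
        zs = []
        for tag in doc:
            k = rng.randrange(num_topics)
            doc_topic[d][k] += 1
            topic_word[k][word_id[tag]] += 1
            topic_total[k] += 1
            zs.append(k)
        assignments.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, tag in enumerate(doc):
                w = word_id[tag]
                k = assignments[d][i]
                # Remove the current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample the topic from its conditional distribution.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed per-document topic distribution.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        theta.append([(doc_topic[d][k] + alpha) / denom for k in range(num_topics)])
    return theta

# Hypothetical tagged pages: each page is represented only by its tags.
pages = [
    ["python", "code", "tutorial", "code"],
    ["python", "library", "code"],
    ["football", "league", "score"],
    ["football", "score", "match", "league"],
]
theta = lda_gibbs(pages, num_topics=2, seed=0)
for tags, dist in zip(pages, theta):
    print(tags, [round(p, 3) for p in dist])
```

Each row of `theta` is a low-dimensional representation of a page that could be passed to any clustering algorithm; taking the most probable topic per page is the simplest such grouping.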

Web Mining by Automatically Organizing Web Pages into Categories

Distributed Artificial Intelligence, Agent Technology, and Collaborative Applications - Advances in Intelligent Information Technologies ◽

10.4018/978-1-60566-144-5.ch012 ◽

2011 ◽

pp. 214-231 ◽

Cited By ~ 1

Author(s):

Ben Choi

Keyword(s):

Artificial Intelligence ◽

Web Mining ◽

Clustering Algorithm ◽

Constant Factor ◽

Web Pages ◽

Web Page ◽

Number Of Clusters ◽

Web Contents ◽

Web Page Clustering ◽

The Web

Web mining aims at searching, organizing, and extracting information on the Web, and search engines focus on searching. The next stage of Web mining is the organization of Web contents, which will then facilitate the extraction of useful information from the Web. This chapter will focus on organizing Web contents. Since a majority of Web contents are stored in the form of Web pages, this chapter will focus on techniques for automatically organizing Web pages into categories. Various artificial intelligence techniques have been used; however, the most successful ones are classification and clustering. This chapter will focus on clustering. Clustering is well suited for Web mining because it automatically organizes Web pages into categories, each of which contains Web pages having similar contents. However, one problem in clustering is the lack of general methods to automatically determine the number of categories or clusters. For the Web domain, until now there has been no such method suitable for Web page clustering. To address this problem, this chapter describes a method to discover a constant factor that characterizes the Web domain and proposes a new method for automatically determining the number of clusters in Web page datasets. This chapter also proposes a new bi-directional hierarchical clustering algorithm, which arranges individual Web pages into clusters and then arranges the clusters into larger clusters and so on until the average inter-cluster similarity approaches the constant factor. Having the constant factor together with the algorithm, this chapter provides a new clustering system suitable for mining the Web.
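
The stopping rule described above, merging clusters until the average inter-cluster similarity approaches a constant, can be sketched as follows. This is a plain bottom-up (agglomerative) stand-in rather than the chapter's bi-directional algorithm; the cosine measure, the centroid representation, the function names, and the value of `stop_similarity` are all assumptions, and the chapter's actual constant factor is not reproduced here.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cluster_until(vectors, stop_similarity):
    """Merge the two most similar clusters repeatedly, stopping once the
    average similarity between cluster centroids is at or below
    `stop_similarity` (a hypothetical stand-in for the constant factor)."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        cents = [centroid([vectors[i] for i in c]) for c in clusters]
        sims = {}
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sims[(a, b)] = cosine(cents[a], cents[b])
        average = sum(sims.values()) / len(sims)
        if average <= stop_similarity:
            break
        # Merge the most similar pair; a < b, so deleting b keeps a valid.
        a, b = max(sims, key=sims.get)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical term-weight vectors for four pages.
page_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
]
clusters = cluster_until(page_vectors, stop_similarity=0.2)
print(clusters)
```

With these toy vectors the loop stops at two clusters without the number of clusters being given in advance, which is the property the chapter is after.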

Improved Web page clustering algorithm based on partial tag tree matching

Journal of Computer Applications ◽

10.3724/sp.j.1087.2010.00818 ◽

2010 ◽

Vol 30 (3) ◽

pp. 818-820

Author(s):

Rui LI ◽

Jun-yu ZENG ◽

Si-wang ZHOU

Keyword(s):

Clustering Algorithm ◽

Web Page ◽

Web Page Clustering

A method of query expansion based on topic models and user profile for search in folksonomy

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210508 ◽

2021 ◽

pp. 1-11

Author(s):

Zhinan Gou ◽

Yan Li

Keyword(s):

Information Retrieval ◽

Query Expansion ◽

Information Overload ◽

Topic Model ◽

User Profile ◽

Expansion Method ◽

Collaborative Tagging ◽

Search Query ◽

Tagging System ◽

The Web

With the development of Web 2.0 communities, information retrieval based on collaborative tagging systems has been widely applied. However, users often issue brief queries with only one or two keywords, which leads to a series of problems such as inaccurate query words, information overload, and information disorientation. Query expansion addresses this issue by reformulating each search query with additional words. After analyzing the limitations of existing query expansion methods in folksonomy, this paper proposes a novel query expansion method, based on user profile and topic model, for search in folksonomy. In detail, the topic model is first constructed by a variational autoencoder with Word2Vec. Then, query expansion is conducted using the user profile and the topic model. Finally, the proposed method is evaluated on a real dataset. Evaluation results show that the proposed method outperforms the baseline methods.

A Web Page Clustering Method Based on Formal Concept Analysis

Information ◽

10.3390/info9090228 ◽

2018 ◽

Vol 9 (9) ◽

pp. 228 ◽

Cited By ~ 1

Author(s):

Zuping Zhang ◽

Jing Zhao ◽

Xiping Yan

Keyword(s):

Formal Concept Analysis ◽

Concept Lattice ◽

Concept Analysis ◽

Formal Context ◽

Formal Concept ◽

Web Pages ◽

Web Page ◽

Data Links ◽

Web Page Clustering ◽

The Web

Web page clustering is an important technology for sorting network resources. Through extraction and clustering based on Web page similarity, a large amount of Web page information can be organized effectively. In this paper, after describing the extraction of Web feature words, calculation methods for the weighting of feature words are studied in depth. Taking Web pages as objects and Web feature words as attributes, a formal context is constructed for use with formal concept analysis. An algorithm for constructing a concept lattice based on cross data links is proposed and successfully applied. This method can be used to cluster Web pages using the concept lattice hierarchy. Experimental results indicate that the proposed algorithm is better than previous competitors with regard to time consumption and the clustering effect.
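
To make the formal-context construction concrete, the sketch below treats pages as objects and feature words as attributes and enumerates the formal concepts (extent, intent pairs) of a toy context. It uses naive enumeration over object subsets, which is exponential and only suitable for tiny examples; it is not the cross-data-links lattice algorithm proposed in the paper, and the page names, feature words, and function names are hypothetical.

```python
from itertools import combinations

def intent(objects, context, all_attrs):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set(all_attrs)
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)

def extent(attrs, context):
    """Objects that carry every attribute in `attrs`."""
    return frozenset(obj for obj, owned in context.items() if attrs <= owned)

def formal_concepts(context):
    """Return all (extent, intent) pairs of the formal context.

    Each subset of objects is closed: its intent is computed, then the
    extent of that intent, which yields one formal concept.
    """
    all_attrs = set().union(*context.values())
    objects = sorted(context)
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            b = intent(subset, context, all_attrs)
            a = extent(b, context)
            concepts.add((a, b))
    return concepts

# Hypothetical formal context: page -> set of feature words.
context = {
    "p1": {"python", "tutorial"},
    "p2": {"python", "library"},
    "p3": {"football", "score"},
    "p4": {"football", "league"},
}
concepts = formal_concepts(context)
for a, b in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(a), sorted(b))
```

Concepts whose extent contains several pages, such as the one with intent {python}, correspond to candidate clusters, and the subset ordering of extents gives the lattice hierarchy.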

Extracting Top-k Company Acquisition Relations From the Web

International Journal on Semantic Web and Information Systems ◽

10.4018/ijswis.2017100102 ◽

2017 ◽

Vol 13 (4) ◽

pp. 27-41 ◽

Cited By ~ 1

Author(s):

Jie Zhao ◽

Jianfei Wang ◽

Jia Yang ◽

Peiquan Jin

Keyword(s):

Rapid Development ◽

Relation Extraction ◽

Experimental Results ◽

Competitive Intelligence ◽

Web Pages ◽

Web Content ◽

Web Page ◽

Competitive Strategies ◽

The Web ◽

Novel Algorithm

Company acquisition relations reflect a company's development intent and competitive strategies, and are an important type of enterprise competitive intelligence. In the traditional environment, the acquisition of competitive intelligence mainly relies on newspapers, internal reports, and so on, but the rapid development of the Web introduces a new way to extract company acquisition relations. In this paper, the authors study the problem of extracting company acquisition relations from huge numbers of Web pages, and propose a novel algorithm for company acquisition relation extraction. The authors' algorithm considers the tense feature of Web content and a classification technique for semantic strength when extracting company acquisition relations from Web pages. It first determines the tense of each sentence in a Web page, which is then applied in sentence classification so as to evaluate the semantic strength of the candidate sentences in describing company acquisition relations. After that, the authors rank the candidate acquisition relations and return the top-k company acquisition relations. They run experiments on 6144 pages crawled through Google, and measure the performance of their algorithm under different metrics. The experimental results show that the algorithm is effective in determining the tense of sentences as well as the company acquisition relations.

Translation of news reports related to COVID-19 of Japanese Linguistics based on page link mining

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189296 ◽

2020 ◽

Vol 39 (6) ◽

pp. 8981-8988

Author(s):

Xiaohua Liu

Keyword(s):

Clustering Algorithm ◽

Web Content ◽

Web Page ◽

News Reports ◽

The Face ◽

Public Emergency ◽

Japanese Linguistics ◽

Web Page Clustering ◽

Content Clustering ◽

Basic Content

In the current epidemic situation, news reports face a demand for higher accuracy. The speed and accuracy of public emergency news depend on the accuracy of web page link and tag clustering. An improved web page clustering method based on the combination of topic clustering and structure clustering is proposed in this paper. The algorithm takes the result of web page structure clustering as the weight factor. Combined with web content clustering by the K-means algorithm, the basic content that meets the conditions is selected. Through a translator improved by the clustering algorithm, this content is translated into Chinese and compared with the target content to analyze the similarity. This achieves the aim of translating Japanese-language news reports related to the COVID-19 epidemic based on page link mining.

On Internet Resources Service System Based on SN-Network Service Model and SOA Framework

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.846-847.1868 ◽

2013 ◽

Vol 846-847 ◽

pp. 1868-1872

Author(s):

Shuai Gang

Keyword(s):

Web Services ◽

Web Service ◽

Large Scale ◽

Rapid Development ◽

Ideological And Political Education ◽

Political Education ◽

The Internet ◽

Internet Resources ◽

Resource Service ◽

The Web

In recent years, the number and scale of Web services on the Internet have grown rapidly, and industry and academia have started to study them. If a web service cannot be found among Internet resources, it becomes meaningless. Large-scale management and its problems are therefore key to the study of Internet service resources. This paper studies large-scale distributed web services in network resources based on SOA architecture ideas. It also designs a unified management and organization system that takes ideological and political education as its content, and proposes an SN network resource service model for ideological and political education. With the development and popularization of the Internet, this study of Internet resources for ideological and political education provides a theoretical reference for innovation in ideological and political education.

PLSA-Based Personalized Information Retrieval with Network Regularization

Journal of Information Technology Research ◽

10.4018/jitr.2019010108 ◽

2019 ◽

Vol 12 (1) ◽

pp. 105-116

Author(s):

Qiuyu Zhu ◽

Dongmei Li ◽

Cong Dai ◽

Qichen Han ◽

Yi Lin

Keyword(s):

Information Retrieval ◽

Semantic Analysis ◽

Topic Model ◽

Rapid Development ◽

Probabilistic Latent Semantic Analysis ◽

Retrieval Model ◽

User Interest ◽

Model Based ◽

User Query ◽

Academic Information

With the rapid development of the Internet, information retrieval models based on keyword matching no longer meet the requirements of users, because people with different query histories often have different retrieval intentions. A user's query history often implies their interests. Therefore, it is of great importance to improve recall and precision by using query history to judge retrieval intentions. To this end, this article studies user query history and proposes a method to construct a user interest model from it. Accordingly, the authors design a model called PLSA-based Personalized Information Retrieval with Network Regularization. Finally, the model is applied to academic information retrieval, and the authors compare it with Baidu Scholar and with the personalized information retrieval model based on the probabilistic latent semantic analysis topic model. The experimental results show that this model can effectively extract topics and return results that better satisfy users' requirements. The model also clearly improves the quality of retrieval results. In addition, the retrieval model can be used not only in academic information retrieval, but also in personalized information retrieval for microblog search, associated recommendation, etc.

Image-Text Joint Learning for Social Images with Spatial Relation Model

Complexity ◽

10.1155/2020/1543947 ◽

2020 ◽

Vol 2020 ◽

pp. 1-11

Author(s):

Jiangfan Feng ◽

Xuejun Fu ◽

Yao Zhou ◽

Yuling Zhu ◽

Xiaobo Luo

Keyword(s):

Large Scale ◽

Clustering Algorithm ◽

Semantic Analysis ◽

Spatial Relationship ◽

Spatial Relation ◽

Spatial Relations ◽

Image Features ◽

Sensor Technology ◽

Joint Learning ◽

Social Images

The rapid development of sensor technology and mobile devices has brought a proliferation of social images, and large-scale social images have attracted increasing attention from researchers. Existing approaches generally rely on recognizing object instances individually with geo-tags, visual patterns, etc. However, a social image represents a web of interconnected relations; these relations between entities carry semantic meaning and help a viewer differentiate between instances of a substance. This article takes the perspective of spatial relationships to explore the joint learning of social images. Specifically, the model consists of three parts: (a) a module for deep semantic understanding of images based on a residual network (ResNet); (b) a deep semantic analysis module for text that goes beyond traditional bag-of-words methods; (c) a joint reasoning module in which the text weights are obtained using image features with self-attention and a novel tree-based clustering algorithm. Experimental results on the Flickr30k and Microsoft COCO datasets demonstrate the effectiveness of the method. Meanwhile, the method considers spatial relations while matching.

Download Full-text

Web Page Clustering using Heuristic Search in the Web Graph

10.21236/ada457111 ◽

2006 ◽

Cited By ~ 6

Author(s):

Ron Bekkerman ◽

Shlomo Zilberstein ◽

James Allan

Keyword(s):

Heuristic Search ◽

Web Page ◽

Web Graph ◽

Web Page Clustering ◽

The Web
