Massive Data Mining Algorithm for Web Text Based on Clustering Algorithm

Author(s):  
Nan-Chao Luo ◽  

Massive Web text data is high-dimensional and sparsely distributed, so current data mining algorithms suffer from low mining precision and long running times when applied to it. To solve these problems, a massive data mining algorithm for Web text based on a clustering algorithm is proposed. Using the chi-square test, the feature words of the massive data are extracted to obtain a set of feature words. The feature sets are hierarchically clustered, the TF-IDF value of each word in a cluster set is calculated, and a vector space model is constructed. By introducing a fair operation and a clone operation into the bee colony algorithm, the diversity of the vector space models can be improved. For the resulting cluster centers, K-means is introduced to extract the local centroids and improve the quality of data mining. Experimental results show that the proposed algorithm effectively improves data mining accuracy and reduces time consumption.
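
The TF-IDF step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common definition tf = (word count / document length) and idf = ln(N / document frequency); the abstract does not say which variant the paper uses, and the function name `tf_idf_vectors` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return (vocabulary, vectors) for a list of tokenised documents."""
    vocab = sorted({w for doc in docs for w in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(w for doc in docs for w in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # One vector component per vocabulary word: tf * idf.
        vectors.append([
            (counts[w] / total) * math.log(n_docs / df[w]) if total else 0.0
            for w in vocab
        ])
    return vocab, vectors

# Toy corpus (assumed data, for illustration only).
docs = [
    ["web", "text", "mining"],
    ["web", "clustering"],
    ["text", "clustering", "clustering"],
]
vocab, vectors = tf_idf_vectors(docs)
```

Each row of `vectors` is one document in the vector space model; a word that occurs in only one document, such as "mining" here, gets the largest idf weight.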

2011 ◽  
Vol 403-408 ◽  
pp. 223-227
Author(s):  
Bao Ling Liu

The Supervisory Information System (SIS) [1] is widely installed in power plants of more than 300 MW. Its massive data contains valuable information and resources that require further excavation. In this paper, a method of working-condition analysis based on a cluster-based data mining algorithm is explored and applied experimentally to SIS. The results show that the method can identify and analyze the working conditions very well.


2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Kai Ji

Wireless personal communication networks are easily affected by intrusion data during communication, which makes it impossible to ensure the security of personal information in wireless communication. Therefore, this paper proposes a malicious intrusion data mining algorithm based on legitimate big data in wireless personal communication networks. A clustering algorithm is used to iteratively obtain the central point of the malicious intrusion data and determine its expected membership. Noise in the malicious intrusion data is removed using an objective function, and the membership degree of the communication data is calculated. The change factor of the neighborhood center of gravity of the malicious intrusion data in the wireless personal communication network is determined, the similarity between the characteristics of the malicious intrusion data is determined using the Markov distance, and the mining of malicious intrusion data in the wireless personal communication network, supported by legal big data, is completed. The experimental results show that the accuracy of mining malicious data is high and the mining time is short.


Author(s):  
Zhi-Hua Zhou

Data mining attempts to identify valid, novel, potentially useful, and ultimately understandable patterns from huge volumes of data. The mined patterns must be ultimately understandable because the purpose of data mining is to aid decision-making. If decision-makers cannot understand what a mined pattern means, then the pattern cannot be used well. Since most decision-makers are not data mining experts, the patterns should ideally be in a style comprehensible to ordinary people. So the comprehensibility of data mining algorithms, that is, the ability of a data mining algorithm to produce patterns understandable to human beings, is an important factor.


Author(s):  
TZUNG-PEI HONG ◽  
CHAN-SHENG KUO ◽  
SHENG-CHAI CHI

Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Most conventional data-mining algorithms identify the relationships among transactions using binary values. Transactions with quantitative values are, however, commonly seen in real-world applications. We previously proposed a fuzzy mining algorithm in which each attribute used only the linguistic term with the maximum cardinality in the mining process. The number of items was thus the same as the number of original attributes, which reduced the processing time. The fuzzy association rules derived in this way are, however, not complete. This paper thus modifies that algorithm and proposes a new fuzzy data-mining algorithm for extracting interesting knowledge from transactions stored as quantitative values. The proposed algorithm can derive a more complete set of rules, but with more computation time than the previously proposed method. A trade-off thus exists between the computation time and the completeness of the rules. Choosing an appropriate learning method thus depends on the requirements of the application domain.
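
The idea of keeping, for each attribute, only the linguistic term with the maximum cardinality can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the triangular membership functions, the term boundaries in `TERMS`, and the sample quantities are all hypothetical, and cardinality is taken to be the sum of membership values over the transactions.

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet a and c and peak b."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def max_cardinality_term(values, terms):
    """Return the term whose summed membership over all values is largest."""
    cardinality = {
        name: sum(triangular(v, *abc) for v in values)
        for name, abc in terms.items()
    }
    best = max(cardinality, key=cardinality.get)
    return best, cardinality

# Hypothetical linguistic terms for a purchased quantity.
TERMS = {
    "low": (0, 1, 6),
    "middle": (1, 6, 11),
    "high": (6, 11, 11),
}

# Quantities of one item across four transactions (assumed data).
quantities = [4, 5, 7, 10]
best_term, cardinality = max_cardinality_term(quantities, TERMS)
```

With these assumed terms, "middle" has the largest cardinality, so only the region "middle" would represent this attribute in later mining. This keeps the item count equal to the attribute count, and it also discards the other two regions, which is the incompleteness the abstract refers to.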


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Jiangang Sun ◽  
Xiaoran Jiang ◽  
Guoliang Yuan ◽  
Zhenhuai Chen

With the continuous improvement of living standards, the level of physical development of adolescents has improved significantly. However, their physical function and health have developed relatively slowly and even appear to be declining. To address this problem and enhance young people’s physical fitness and mental health, this paper proposes a novel data mining algorithm based on big data for monitoring adolescent students’ physical health. Since big data technology has positive practical significance in promoting young people’s healthy development and individual health rights, this article implements commonly used data mining algorithms on Hadoop/Spark big data processing platforms. Comparing the running times of the algorithms on different platforms verifies that the big data platform offers good computing performance for data mining. The resulting work provides a complete physical health data management system that can effectively store, process, and analyze adolescents’ physical test data.


Author(s):  
Wenjun Yang ◽  
Jia Guo

E-commerce platforms can recommend products to users by analyzing consumers’ purchase behavior preferences. In the clustering process, existing methods of purchase behavior preference analysis easily fall into local optima, which makes the results of preference analysis inaccurate. Therefore, this paper proposes a method for analyzing consumer purchase behavior preferences on e-commerce platforms based on a data mining algorithm. An e-commerce platform user portrait template is created from consumer data records, attribute variables are selected, and value ranges are set. A data mining algorithm is used to extract the purchase behavior characteristics from the user portrait template; these characteristics are taken as the object of cluster analysis, a clustering algorithm for consumer purchase behavior is designed, and the common points of group behavior are identified. On this basis, a model of consumer purchase behavior preference is established to predict and evaluate behavior preferences. The experimental results show that the accuracy rate of this method is 91.74%, the recall rate is 88.67%, and the F1 value is 90.17%, all higher than those of existing methods, so the method can provide consumers with more satisfactory product information.
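
As a worked check of how the three reported metrics relate, the sketch below computes F1 as the harmonic mean of precision and recall. It assumes that the "accuracy rate" of 91.74% plays the role of precision, which the abstract does not state explicitly; the function name `f1_score` is hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Figures reported in the abstract, as fractions.
f1 = f1_score(0.9174, 0.8867)
```

Under that assumption the result is approximately 0.9018, close to the reported 90.17%. The small gap is consistent with the published precision and recall being rounded figures.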


Author(s):  
R. B. V. SUBRAMANYAM ◽  
A. GOSWAMI

In real-world applications, large numbers of transactions are constantly added to databases, and hence keeping the latest sequential patterns valid on the updated database is crucial. Existing data mining algorithms can incrementally mine sequential patterns from databases with binary values. Temporal transactions with quantitative values are, however, commonly seen in real-world applications. In addition, several methods have been proposed for representing uncertain data in a database. In this paper, a fuzzy data mining algorithm for incremental mining of sequential patterns from quantitative databases is proposed. The proposed algorithm, called the IQSP algorithm, uses the fuzzy grid notion to generate fuzzy sequential patterns validated on the updated database, which contains the transactions of both the original database and the incremental database. It uses information about sequential patterns already mined from the original database and avoids a start-from-scratch process. It also minimizes the number of candidates to check, as well as the number of scans of the original database, by identifying the potential sequences in the incremental database.


2015 ◽  
Vol 8 (4) ◽  
pp. 40
Author(s):  
Aleksandar Karadimce

New cloud-based services are being developed constantly to meet the need for faster, more reliable, and scalable methods of knowledge discovery. The major benefit of cloud-based services is the efficient execution of computation-heavy algorithms in the cloud simply by using Big Data storage and processing platforms. Therefore, we have proposed a model that provides data mining techniques as cloud-based services available to users on demand. Widely known data mining algorithms have been implemented as Map/Reduce jobs that are executed as services in a cloud architecture. The user simply chooses or uploads a dataset to the cloud, makes the appropriate settings for the data mining algorithm, submits the job request for processing, and receives the results. The major benefit of this model of cloud-based services is the efficient execution of computation-heavy data mining algorithms in the cloud simply by using the Ankus Open Source Big Data Mining Tool and the Starfish Hadoop Log Analyzer. The expected outcome of this research is an integration of cloud-based services for data mining analysis that provides researchers with a reliable collaborative data mining analysis model.


2014 ◽  
Vol 971-973 ◽  
pp. 1459-1462
Author(s):  
Wen Liang Cao ◽  
Li Ping Chen

Data mining has attracted a great deal of attention in the information industry in recent years and can be used for applications ranging from business management and production control to science exploration. Most existing data mining algorithms run on centralized systems; however, large databases are now usually distributed. To address the loss of frequent itemsets and the high communication traffic of the conventional and improved FDM algorithms for distributed databases, an improved distributed data mining algorithm, LTDM, based on association rules is proposed. The LTDM algorithm introduces the mapping indicated array mechanism to keep the integrity of frequent itemsets and decrease the communication traffic. The experimental results demonstrate the efficiency of the proposed algorithm. The algorithm can be applied to tasks such as information retrieval in digital libraries.
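
The problem of frequent itemsets being lost in a distributed setting can be made concrete with a small sketch. It is not LTDM or FDM: it simply sums the local counts from every site, which is the high-traffic baseline, and it does not implement the mapping indicated array mechanism. The function names, the two-site data, and the support threshold are assumptions.

```python
from collections import Counter
from itertools import combinations

def local_counts(transactions, size):
    """Count every itemset of the given size at one site."""
    counts = Counter()
    for t in transactions:
        for itemset in combinations(sorted(set(t)), size):
            counts[itemset] += 1
    return counts

def global_frequent(sites, size, min_support):
    """Merge local counts from all sites and keep globally frequent itemsets."""
    total = Counter()
    n_transactions = 0
    for transactions in sites:
        total.update(local_counts(transactions, size))
        n_transactions += len(transactions)
    return {
        itemset: count / n_transactions
        for itemset, count in total.items()
        if count / n_transactions >= min_support
    }

# Two sites holding horizontal partitions of one database (assumed data).
sites = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "c"], ["b", "c"], ["a", "b", "d"]],
]
frequent_pairs = global_frequent(sites, size=2, min_support=0.5)
```

Here the pair ("a", "b") appears in only one of three transactions at the second site, so pruning on local support alone would drop it there, yet it is globally frequent with support 3/5. Preserving such itemsets without shipping every count between sites is the trade-off the abstract describes.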


2018 ◽  
Vol 48 (4) ◽  
pp. 261-266
Author(s):  
G. E. WEI ◽  
L. GAO ◽  
F. SHI

With the continuous development and application of high-speed information technology such as the Internet, the acquisition and utilization of economic intelligence has an important impact on the operation of the national economy and of enterprises. Based on a detailed analysis of data mining algorithms, this paper constructs a user classification model based on a clustering algorithm and a user interest feature extraction model based on UR-LDA, and uses an improved K-means algorithm to cluster users in an unsupervised manner. Data mining experiments were conducted on users of Sina Weibo. The experimental results show that clustering the user data extracted from the interest feature topics with the improved K-means yields six clusters of similar users. The good clustering results indicate that the classification model constructed in this paper is effective.
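
For reference, the standard K-means loop that this abstract builds on can be sketched as below. This is plain K-means, not the paper's improved variant, whose modifications the abstract does not describe. Seeding the centroids with the first k points is an assumption made to keep the example deterministic, and the two-dimensional points are toy data.

```python
def kmeans(points, k, iterations=20):
    """Plain K-means; centroids are seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Two well-separated groups of 2-D points (assumed data).
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.1, 0.3), (4.8, 5.2)]
labels, centroids = kmeans(points, k=2)
```

Even with a poor seed (both initial centroids lie in the same group), the loop separates the two groups after two iterations on this data. On real data the result depends strongly on initialization, which is a common motivation for improved K-means variants.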

