Massive Data Mining Algorithm for Web Text Based on Clustering Algorithm

Massive Web text data is high-dimensional and sparsely distributed, so current data mining algorithms suffer from low mining precision and long running times when applied to it. To solve these problems, a massive data mining algorithm for Web text based on a clustering algorithm is proposed. A chi-square test is used to extract the feature words of the massive data and obtain the feature word set. The feature set is clustered hierarchically, the TF-IDF value of each word in the cluster set is calculated, and a vector space model is constructed. Introducing a fair operation and a clone operation into the bee colony algorithm improves the diversity of the vector space models. K-means is then applied to the resulting cluster centers to extract the local centroids and improve the quality of data mining. Experimental results show that the proposed algorithm effectively improves data mining accuracy and reduces running time.
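
The first two steps named in the abstract above, chi-square feature scoring and TF-IDF weighting, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the 2x2 contingency layout, the toy documents and the plain `tf * log(N / df)` weighting are all chosen here for illustration.

```python
import math
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class without the term
    n00: documents outside the class without the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def tf_idf(docs):
    """Return one {term: weight} dict per tokenised document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


# Hypothetical toy corpus and term/class counts.
docs = [["web", "text", "mining"], ["web", "cluster"], ["text", "cluster", "cluster"]]
vectors = tf_idf(docs)
score = chi_square(n11=8, n10=2, n01=2, n00=8)
```

In this toy corpus a term that occurs in only one document ("mining") receives a higher weight than one shared by two documents ("web"), which is the behaviour the vector space model relies on.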

Data Mining in Conditions Analysis at a Coal-Fired Utility Boiler

Advanced Materials Research ◽ 10.4028/www.scientific.net/amr.403-408.223 ◽ 2011 ◽ Vol 403-408 ◽ pp. 223-227

Author(s): Bao Ling Liu

Keyword(s): Data Mining ◽ Information System ◽ Power Plant ◽ Working Conditions ◽ Massive Data ◽ Data Mining Algorithm ◽ Utility Boiler ◽ Mining Algorithm ◽ The Way

The Supervisory Information System (SIS) [1] is widely installed in power plants of more than 300 MW. Its massive data contain valuable information and resources that require further mining. In this paper, a method of working-conditions analysis based on a cluster-based data mining algorithm is explored and tested on SIS data. The results show that the method identifies and analyzes the working conditions well.

Malicious Intrusion Data Mining Algorithm of Wireless Personal Communication Network Supported by Legal Big Data

Wireless Communications and Mobile Computing ◽ 10.1155/2021/8321636 ◽ 2021 ◽ Vol 2021 ◽ pp. 1-7

Author(s): Kai Ji

Keyword(s): Data Mining ◽ Communication Network ◽ Big Data ◽ Clustering Algorithm ◽ Personal Information ◽ Personal Communication ◽ Communication Process ◽ Data Mining Algorithm ◽ Mining Algorithm ◽ Wireless Personal Communication

Wireless personal communication networks are easily affected by intrusion data during communication, which makes it impossible to ensure the security of personal information in wireless communication. Therefore, this paper proposes a malicious intrusion data mining algorithm for wireless personal communication networks based on legitimate big data. A clustering algorithm is used to iteratively obtain the center point of the malicious intrusion data and determine its expected membership. Noise in the malicious intrusion data is removed using an objective function, and the membership degree of the communication data is calculated. The change factor of the neighborhood center of gravity of the malicious intrusion data is determined, the similarity between the characteristics of the malicious intrusion data is determined using the Markov distance, and the mining of malicious intrusion data in the wireless personal communication network, supported by legal big data, is completed. The experimental results show that the accuracy of mining malicious data is high and the mining time is short.
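
The abstract above speaks of cluster centers obtained iteratively and of membership degrees, which resembles fuzzy c-means clustering. The sketch below assumes that reading; the abstract does not name the algorithm. The fuzzifier `m`, the Euclidean distance (used here in place of the distance named in the abstract), the function names and the sample points are all assumptions made for illustration.

```python
import math


def memberships(point, centers, m=2.0):
    """Fuzzy c-means style membership of one point in each cluster."""
    dists = [math.dist(point, c) for c in centers]
    # A point sitting exactly on a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exp for d_k in dists) for d_i in dists]


def update_centers(points, u, m=2.0):
    """Recompute each center as the membership-weighted mean of the points."""
    dims = len(points[0])
    centers = []
    for j in range(len(u[0])):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        centers.append(
            tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dims)
            )
        )
    return centers


# Hypothetical two-dimensional traffic features forming two groups.
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
centers = [(0.0, 0.0), (5.0, 5.0)]
for _ in range(10):
    u = [memberships(p, centers) for p in points]
    centers = update_centers(points, u)
```

Each row of `u` sums to 1, so a record near one center receives a membership close to 1 for that cluster and close to 0 for the other.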

Comprehensibility of Data Mining Algorithms

Encyclopedia of Data Warehousing and Mining ◽ 10.4018/978-1-59140-557-3.ch037 ◽ 2011 ◽ pp. 190-195 ◽ Cited By ~ 7

Author(s): Zhi-Hua Zhou

Keyword(s): Data Mining ◽ Decision Making ◽ Decision Makers ◽ Data Mining Algorithm ◽ Human Beings ◽ Common People ◽ Data Mining Algorithms ◽ Mining Algorithm ◽ Aid Decision ◽ Mining Algorithms

Data mining attempts to identify valid, novel, potentially useful, and ultimately understandable patterns from huge volumes of data. The mined patterns must be ultimately understandable because the purpose of data mining is to aid decision-making. If decision-makers cannot understand what a mined pattern means, the pattern cannot be used well. Since most decision-makers are not data mining experts, the patterns should ideally be in a style comprehensible to common people. The comprehensibility of data mining algorithms, that is, the ability of a data mining algorithm to produce patterns understandable to human beings, is therefore an important factor.

Trade-Off Between Computation Time and Number of Rules for Fuzzy Mining from Quantitative Data

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽ 10.1142/s0218488501001071 ◽ 2001 ◽ Vol 09 (05) ◽ pp. 587-604 ◽ Cited By ~ 122

Author(s): Tzung-Pei Hong ◽ Chan-Sheng Kuo ◽ Sheng-Chai Chi

Keyword(s): Data Mining ◽ Computation Time ◽ Data Mining Algorithm ◽ Trade Off ◽ Fuzzy Association Rules ◽ Data Mining Algorithms ◽ Mining Algorithm ◽ Linguistic Term ◽ Complete Set ◽ Mining Algorithms

Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Most conventional data-mining algorithms identify the relationships among transactions using binary values. Transactions with quantitative values are, however, commonly seen in real-world applications. We previously proposed a fuzzy mining algorithm in which each attribute used only the linguistic term with the maximum cardinality in the mining process. The number of items was thus the same as the number of original attributes, which reduced the processing time. The fuzzy association rules derived in this way are, however, not complete. This paper thus modifies that algorithm and proposes a new fuzzy data-mining algorithm for extracting interesting knowledge from transactions stored as quantitative values. The proposed algorithm can derive a more complete set of rules but requires more computation time than the earlier method. A trade-off thus exists between the computation time and the completeness of the rules. Choosing an appropriate learning method thus depends on the requirements of the application domain.
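
The idea of keeping only the linguistic term with the maximum cardinality for an attribute can be illustrated with a small sketch. The triangular membership functions, the three terms and their breakpoints, and the sample quantities are hypothetical choices made here, not values from the paper.

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical linguistic terms for one quantitative attribute.
TERMS = {"low": (0, 2, 5), "middle": (2, 5, 8), "high": (5, 8, 11)}


def fuzzify(x):
    """Map one quantity to its membership degree in every linguistic term."""
    return {name: triangular(x, *abc) for name, abc in TERMS.items()}


def term_cardinalities(values):
    """Sum the membership degrees of all transactions for each term."""
    totals = {name: 0.0 for name in TERMS}
    for x in values:
        for name, mu in fuzzify(x).items():
            totals[name] += mu
    return totals


# Purchased quantities of one item across six transactions.
quantities = [1, 3, 4, 6, 7, 4]
card = term_cardinalities(quantities)
best = max(card, key=card.get)
```

Here `best` is the single term that would represent the attribute in the faster algorithm. Keeping all three terms instead, as the more complete variant does, triples the number of candidate items for this attribute, which is where the extra computation time comes from.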

Data Mining Algorithm for Physical Health Monitoring of Young Students Based on Big Data

Journal of Healthcare Engineering ◽ 10.1155/2021/9962906 ◽ 2021 ◽ Vol 2021 ◽ pp. 1-9

Author(s): Jiangang Sun ◽ Xiaoran Jiang ◽ Guoliang Yuan ◽ Zhenhuai Chen

Keyword(s): Data Mining ◽ Big Data ◽ Physical Health ◽ Data Management System ◽ Practical Significance ◽ Data Mining Algorithm ◽ Physical Test ◽ Healthy Development ◽ Data Mining Algorithms ◽ Mining Algorithm

With the continuous improvement of living standards, the level of physical development of adolescents has improved significantly. However, their physical functions and healthy development have progressed relatively slowly and even appear to be declining. To address this problem and enhance young people’s physical fitness and mental health, this paper proposes a novel data mining algorithm based on big data for monitoring adolescent students’ physical health. Since big data technology has positive practical significance in promoting young people’s healthy development and individual health rights, this article implements commonly used data mining algorithms with Hadoop/Spark big data processing. Running the algorithms on different platforms and comparing their running times verified that the big data platform offers good computing performance for data mining algorithms. The resulting work is intended as a complete physical health data management system that can effectively save, process, and analyze adolescents’ physical test data.

Consumers’ Purchase Behavior Preference in E-Commerce Platform Based on Data Mining Algorithm

International Journal of Circuits, Systems and Signal Processing ◽ 10.46300/9106.2022.16.75 ◽ 2022 ◽ Vol 16 ◽ pp. 603-609

Author(s): Wenjun Yang ◽ Jia Guo

Keyword(s): Data Mining ◽ Clustering Algorithm ◽ Product Information ◽ Group Behavior ◽ Purchase Behavior ◽ Data Mining Algorithm ◽ Purchasing Behavior ◽ Preference Analysis ◽ Mining Algorithm ◽ Behavior Preference

An e-commerce platform can recommend products to users by analyzing consumers’ purchase behavior preferences. Existing methods of purchase behavior preference analysis tend to fall into local optima during clustering, which makes the results of the preference analysis inaccurate. Therefore, this paper proposes a method for analyzing consumer purchase behavior preferences on e-commerce platforms based on a data mining algorithm. A user portrait template for the e-commerce platform is created from consumer data records, attribute variables are selected, and their value ranges are set. A data mining algorithm extracts the purchase behavior characteristics from the user portrait template; these characteristics serve as the objects of cluster analysis, a clustering algorithm for consumer purchase behavior is designed, and the common points of group behavior are identified. On this basis, a model of consumer purchase behavior preference is established to predict and evaluate behavior preferences. The experimental results show that the accuracy rate of this method is 91.74%, the recall rate is 88.67%, and the F1 value is 90.17%, all higher than those of existing methods, so the method can push more satisfactory product information to consumers.
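
As a quick arithmetic check of how an F1 value relates to the other two figures, the harmonic-mean formula can be applied to the reported numbers. This assumes the "accuracy rate" in the abstract is what is usually called precision; the abstract does not define the term, so the sketch is illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (any consistent unit)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures from the abstract, in percent; reading 91.74 as precision is an assumption.
f1 = f1_score(91.74, 88.67)
print(round(f1, 2))
```

Under that assumption the result is roughly 90.18, which agrees with the reported 90.17% to within rounding of the inputs.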

A Fuzzy Data Mining Algorithm for Incremental Mining of Quantitative Sequential Patterns

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽ 10.1142/s0218488505003722 ◽ 2005 ◽ Vol 13 (06) ◽ pp. 633-652 ◽ Cited By ~ 11

Author(s): R. B. V. Subramanyam ◽ A. Goswami

Keyword(s): Data Mining ◽ Real World ◽ Uncertain Data ◽ Sequential Patterns ◽ Data Mining Algorithm ◽ Fuzzy Data ◽ Incremental Mining ◽ Data Mining Algorithms ◽ Mining Algorithm ◽ Real World Applications

In real-world applications, large numbers of transactions are constantly added to databases, so keeping the latest sequential patterns valid on the updated database is crucial. Existing data mining algorithms can incrementally mine sequential patterns from databases with binary values. However, temporal transactions with quantitative values are commonly seen in real-world applications. In addition, several methods have been proposed for representing uncertain data in a database. In this paper, a fuzzy data mining algorithm for incremental mining of sequential patterns from quantitative databases is proposed. The proposed algorithm, called IQSP, uses the fuzzy grid notion to generate fuzzy sequential patterns validated on the updated database, which contains the transactions of both the original database and the incremental database. It uses the information about sequential patterns already mined from the original database and so avoids starting from scratch. It also minimizes the number of candidates to check and the number of scans of the original database by identifying the potential sequences in the incremental database.

Model of Cloud-Based Services for Data Mining Analysis

Computer and Information Science ◽ 10.5539/cis.v8n4p40 ◽ 2015 ◽ Vol 8 (4) ◽ pp. 40

Author(s): Aleksandar Karadimce

Keyword(s): Data Mining ◽ Big Data ◽ Data Storage ◽ Data Mining Algorithm ◽ Analysis Model ◽ Data Mining Algorithms ◽ Mining Algorithm ◽ Major Benefit ◽ Efficient Execution ◽ Data Mining Analysis

New cloud-based services are constantly being developed to meet the need for faster, more reliable and more scalable methods of knowledge discovery. The major benefit of cloud-based services is the efficient execution of computation-heavy algorithms in the cloud simply by using Big Data storage and processing platforms. Therefore, we have proposed a model that provides data mining techniques as cloud-based services available to users on demand. Widely known data mining algorithms have been implemented as Map/Reduce jobs that are executed as services in a cloud architecture. The user simply chooses or uploads a dataset to the cloud, makes the appropriate settings for the data mining algorithm, submits the job request for processing and receives the results. The major benefit of this model of cloud-based services is the efficient execution of computation-heavy data mining algorithms in the cloud simply by using the Ankus - Open Source Big Data Mining Tool and the Starfish Hadoop Log Analyzer. The expected outcome of this research is an integration of cloud-based services for data mining analysis that provides researchers with a reliable collaborative data mining analysis model.
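
The Map/Reduce pattern mentioned above can be sketched in plain Python as an item-frequency count, a common first pass of many mining algorithms. This is a single-process illustration of the map, shuffle and reduce phases only; it does not use Hadoop or Ankus, and the function names and sample transactions are hypothetical.

```python
from collections import defaultdict


def map_phase(record):
    """Emit an (item, 1) pair for every item in one transaction."""
    for item in record:
        yield item, 1


def shuffle(pairs):
    """Group the emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one item."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce over all records in one process."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


transactions = [["milk", "bread"], ["milk"], ["bread", "eggs", "milk"]]
counts = run_job(transactions)
```

Because `map_phase` looks at one record at a time and `reduce_phase` at one key at a time, both steps can be spread across many workers, which is what makes the pattern suitable for execution as a cloud service.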

A Distributed Association Rules Mining Algorithm

Advanced Materials Research ◽ 10.4028/www.scientific.net/amr.971-973.1459 ◽ 2014 ◽ Vol 971-973 ◽ pp. 1459-1462

Author(s): Wen Liang Cao ◽ Li Ping Chen

Keyword(s): Data Mining ◽ Production Control ◽ Distributed Database ◽ Frequent Itemsets ◽ Data Mining Algorithm ◽ Distributed Data ◽ Information Industry ◽ Data Mining Algorithms ◽ Mining Algorithm ◽ Communication Traffic

Data mining has attracted a great deal of attention in the information industry in recent years and can be used for applications ranging from business management and production control to scientific exploration. Most existing data mining algorithms run on centralized systems; however, large databases are now usually distributed. To address the loss of frequent itemsets and the high communication traffic of the conventional and improved FDM algorithms for distributed databases, an improved distributed data mining algorithm based on association rules, LTDM, is proposed. The LTDM algorithm introduces a mapping indicated array mechanism to preserve the integrity of the frequent itemsets and decrease the communication traffic. The experimental results prove the efficiency of the proposed algorithm. The algorithm can be applied to tasks such as information retrieval in digital libraries.
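
Why frequent itemsets can be lost in a distributed setting is easier to see with a small example. The sketch below simply sums local itemset counts from every site to obtain global support; it is a naive baseline under assumed names and toy data, and it reproduces neither FDM nor the mapping indicated array mechanism of LTDM.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, size):
    """Count every itemset of the given size at one site."""
    counts = Counter()
    for t in transactions:
        for itemset in combinations(sorted(set(t)), size):
            counts[itemset] += 1
    return counts


def global_frequent(sites, size, min_support):
    """Merge local counts and keep itemsets that are frequent globally."""
    total = Counter()
    n = 0
    for transactions in sites:
        total.update(local_counts(transactions, size))
        n += len(transactions)
    return {s: c / n for s, c in total.items() if c / n >= min_support}


# Two hypothetical sites holding partitions of one database.
site_a = [["a", "b", "c"], ["a", "b"], ["b", "c"]]
site_b = [["a", "b"], ["a", "c"], ["a", "b", "c"]]
frequent = global_frequent([site_a, site_b], size=2, min_support=0.5)
```

The pair `("a", "c")` has a local support of only 1/3 at `site_a` but a global support of 1/2. A scheme that pruned it locally at that site before exchanging counts would lose a globally frequent itemset, while sending every local count, as this baseline does, maximizes communication traffic. The abstract describes LTDM as targeting both problems.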

Application of Data Mining Algorithm in Intelligence Analysis of Enterprise Economic Intelligence

Latin American Applied Research - An international journal ◽ 10.52292/j.laar.2018.238 ◽ 2018 ◽ Vol 48 (4) ◽ pp. 261-266

Author(s): G. E. Wei ◽ L. Gao ◽ F. Shi

Keyword(s): Data Mining ◽ High Speed ◽ Clustering Algorithm ◽ Classification Model ◽ Intelligence Analysis ◽ Data Mining Algorithm ◽ User Interest ◽ User Classification ◽ Model Based ◽ Data Mining Algorithms

With the continuous development and application of high-speed information technology such as the Internet, the acquisition and utilization of economic intelligence has an important impact on the operation of the national economy and of enterprises. Based on a detailed analysis of data mining algorithms, this paper constructs a user classification model based on a clustering algorithm and a user interest feature extraction model based on UR-LDA, and uses an improved K-means algorithm to cluster users in an unsupervised manner. Data mining experiments were conducted on users of Sina Weibo. The experimental results show that clustering the user data extracted from the interest feature topics with the improved K-means yields six clusters of similar users. The clustering results are good, which indicates that the classification model constructed in this paper is effective.
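
Clustering users by their topic-interest vectors can be sketched with a plain K-means loop. This is the standard algorithm, not the improved variant the paper describes, and it does not include UR-LDA; the two-topic user vectors and the initial centers are hypothetical.

```python
import math


def kmeans(points, centers, iterations=20):
    """Plain Lloyd-style K-means starting from the given centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(old)  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical per-user interest weights over two topics.
users = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
labels, centers = kmeans(users, centers=[users[0], users[2]])
```

Users with similar topic weights end up sharing a label. In a setup like the one described, the input vectors would come from the topic model rather than being written by hand.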
