Solving the problem of calendar data preprocessing during the implementation of Data Mining technology

2020 ◽  
Vol 15 (90) ◽  
pp. 27-41
Author(s):  
Boris V. Okunev ◽  
Alexander S. Shurykin

At present, dirty (low-quality) data is becoming one of the main obstacles to solving Data Mining tasks effectively. Since source data is accumulated from a variety of sources, the probability of acquiring dirty data is very high. One of the most important tasks to be solved during the implementation of the Data Mining process is therefore the initial processing (cleaning) of data, i.e. preprocessing. It should be noted that preprocessing calendar data is a rather time-consuming procedure that can take up to half of the entire time of implementing the Data Mining technology. The time spent on data cleaning can be reduced by automating this process with specially designed tools (algorithms and programs). At the same time, it should be remembered that the use of such tools does not guarantee one hundred percent cleaning of "dirty" data and in some cases may even introduce additional errors into the source data. The authors developed a model for automated preprocessing of calendar data based on parsing and regular expressions. The proposed algorithm is characterized by flexible configuration of preprocessing parameters, fairly simple implementation and high interpretability of results, which in turn provides additional opportunities for analyzing unsuccessful applications of the Data Mining technology. Although the proposed algorithm cannot clean every type of dirty calendar data, it functions successfully in a significant share of real practical situations.
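The abstract does not reproduce the authors' algorithm, but the parsing-plus-regular-expressions approach it describes can be sketched roughly as follows. The pattern list, the ISO 8601 target format, and the rule that unparseable values are flagged rather than guessed are illustrative assumptions, not the paper's actual rule set:

```python
import re
from datetime import datetime

# Candidate regex patterns mapped to strptime formats (a hypothetical rule
# set; the paper's configurable preprocessing parameters would play this role).
PATTERNS = [
    (re.compile(r"^\d{4}-\d{2}-\d{2}$"), "%Y-%m-%d"),
    (re.compile(r"^\d{2}/\d{2}/\d{4}$"), "%d/%m/%Y"),
    (re.compile(r"^\d{2}\.\d{2}\.\d{4}$"), "%d.%m.%Y"),
    (re.compile(r"^[A-Za-z]+ \d{1,2}, \d{4}$"), "%B %d, %Y"),
]

def normalize_date(raw):
    """Return an ISO 8601 date string, or None if the value stays 'dirty'."""
    value = raw.strip()
    for pattern, fmt in PATTERNS:
        if pattern.match(value):
            try:
                return datetime.strptime(value, fmt).date().isoformat()
            except ValueError:
                # e.g. "31.02.2020" matches the regex but is not a real date
                return None
    return None
```

Returning `None` instead of a best guess reflects the abstract's caution that automated cleaning can otherwise introduce new errors into the source data; flagged values can then be inspected by hand.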

2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Jie Wan ◽  
Xue Cao ◽  
Kun Yao ◽  
Donghui Yang ◽  
E. Peng ◽  
...  

False information on the Internet is increasingly recognized as a serious social harm. To recognize false textual information, this paper proposes an effective method for mining text features in the field of false drug advertisements. Firstly, data on false and real drug advertisements were collected from official websites to build a database of false and real drug advertisements. Secondly, by performing feature extraction on the advertisement texts, this work built a feature matrix from the effective features and assigned a positive or negative label to each feature vector according to whether the advertisement is fake or not. Thirdly, this study trained and tested several different classifiers, selected the classification model with the best performance in identifying false drug advertisements, and identified the key features that determine the classification. Finally, the best-performing model was used to predict new false drug advertisements collected from Sina Weibo. In identifying false drug advertisements, the support vector machine (SVM) classifier built on the feature set obtained after feature selection was the most effective. The findings of this study can provide an effective method for the government to identify and combat false advertisements, and demonstrate the use of text data mining technology to identify and detect information fraud.
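A minimal sketch of the kind of pipeline the abstract describes, using TF-IDF features and a linear SVM via scikit-learn. The tiny toy corpus and labels below are invented for illustration; the study used a real database of drug advertisements and performed feature selection that this sketch omits:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy corpus standing in for the advertisement database (invented examples).
ads = [
    "cures all diseases overnight guaranteed",
    "miracle pill melts fat instantly no diet",
    "approved pain reliever follow dosage instructions",
    "consult your doctor before use clinically tested",
]
labels = [1, 1, 0, 0]  # 1 = fake advertisement, 0 = genuine

# TF-IDF feature matrix + linear SVM, mirroring the abstract's setup.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(ads, labels)

print(model.predict(["guaranteed miracle cure overnight"])[0])
```

On a real corpus one would also apply the feature selection step the abstract mentions (e.g. chi-squared filtering of the TF-IDF vocabulary) before training, since the study found the SVM performed best on the selected feature set.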


2014 ◽  
Vol 644-650 ◽  
pp. 1976-1979
Author(s):  
Chao Wang ◽  
Ying Jie Lian

This paper introduces the basic theory of data mining technology and the genetic algorithm, analyzes the feasibility of applying the genetic algorithm in data mining, and uses customer satisfaction as an example to demonstrate the feasibility and validity of the model.
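The abstract gives no implementation details, but the core genetic-algorithm loop it relies on (selection, crossover, mutation) can be sketched as follows. The bit-string encoding, fitness function, and parameter values are assumptions for illustration; in a data mining setting the bit string might encode, say, which attributes a candidate customer-satisfaction rule uses:

```python
import random

random.seed(0)

# Toy objective: evolve a bit string toward a target pattern (illustrative).
TARGET = [1, 0, 1, 1, 0, 1, 0, 1]

def fitness(chrom):
    """Count positions matching the target; higher is better."""
    return sum(1 for g, t in zip(chrom, TARGET) if g == t)

def evolve(pop_size=20, generations=60, mutation_rate=0.05):
    pop = [[random.randint(0, 1) for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]              # truncation selection
        children = []
        while len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(TARGET))  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if random.random() < mutation_rate else g
                     for g in child]                # bit-flip mutation
            children.append(child)
        pop = children
    return max(pop, key=fitness)

best = evolve()
print(fitness(best))
```

Swapping in a domain fitness function (e.g. predictive accuracy of the rule a chromosome encodes) turns this skeleton into the kind of data mining application the paper discusses.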


Author(s):  
Richard C. Kittler

Abstract Analysis of manufacturing data as a tool for failure analysts often meets with roadblocks due to the complex non-linear relationships between failure rates and explanatory variables drawn from process history. The current work describes how the use of a comprehensive engineering database and data mining technology overcomes some of these difficulties and enables new classes of problems to be solved. The characteristics of the database design necessary for adequate data coverage and unit traceability are discussed. Data mining technology is explained and contrasted with traditional statistical approaches as well as those of expert systems, neural nets, and signature analysis. Data mining is applied to a number of common problem scenarios. Finally, future trends in data mining technology relevant to failure analysis are discussed.


2021 ◽  
pp. 1-11
Author(s):  
Liu Narengerile ◽  
Li Di

At present, the college English testing system has become indispensable in many universities. However, existing English test systems are not very user-friendly due to problems such as an unreasonable framework structure. This paper applies data mining technology to build a college English testing framework. The test system software based on data mining automatically generates test papers, sets the test time, automatically judges the test takers' results, and reports scores on the spot. Test takers log in through the system software to complete the test. The system software replaces the traditional manual work of printing test papers, arranging invigilation classrooms and invigilating teachers, supervising the examination, collecting papers, and scoring and analyzing them. Finally, the system's performance is evaluated through experimental research. The results show that the system constructed in this paper has a definite practical effect.
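Two of the functions the abstract attributes to the system, randomly generating a test paper from a question bank and automatically scoring submitted answers, can be sketched as below. The question-bank structure and the one-point-per-question scoring rule are assumptions for illustration, not the paper's actual design:

```python
import random

random.seed(42)

# A toy question bank: each entry has an id, a prompt, and an A-D answer key
# (structure assumed for illustration).
BANK = [
    {"id": i, "question": f"Q{i}", "answer": chr(65 + i % 4)}
    for i in range(50)
]

def generate_paper(bank, n_questions=10):
    """Randomly sample a paper so each test taker gets a different set."""
    return random.sample(bank, n_questions)

def grade(paper, responses):
    """Score one point per correct answer and return the total."""
    return sum(1 for q in paper if responses.get(q["id"]) == q["answer"])

paper = generate_paper(BANK)
perfect = {q["id"]: q["answer"] for q in paper}
print(grade(paper, perfect))  # full marks when every answer matches the key
```

The immediate scoring shown here is what lets such a system "give out results on the spot" instead of waiting for manual marking.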


2020 ◽  
Vol 1684 ◽  
pp. 012024
Author(s):  
Yiqun Liu ◽  
Xiaogang Wang ◽  
Xiaoyuan Gong ◽  
Hua Mu
