An In-Depth Analysis of The Data Mining Algorithms and Their Effective Applicability in Health and Biomedical Informatics

Health care is one of the fastest-growing industries today. Extensive research in health care has produced large volumes of medical data that are not easy to mine. Data mining is an approach for discovering essential information in massive collections of data. Medical science therefore needs tools that help analyze the data, extract significant results from it, and support efficient use of the information. Generally, three kinds of record are mandatory for every patient: patient details, diagnosis, and medications. Converting these data into basic patterns for predicting a patient's disease helps in early diagnosis. This research focuses mainly on data mining approaches that are widely used in the medical field.

A Data Mining Approach for Generation of Control Signatures

Journal of Manufacturing Science and Engineering ◽ 10.1115/1.1511524 ◽ 2002 ◽ Vol 124 (4) ◽ pp. 923-926 ◽ Cited By ~ 13

Author(s): Andrew Kusiak

Keyword(s): Data Mining ◽ Metal Forming ◽ Rough Set Theory ◽ Decision Rules ◽ Forming Process ◽ Data Mining Algorithms ◽ Data Mining Approach ◽ Metal Forming Process ◽ Material Forming ◽ Mining Algorithms

Data mining offers methodologies and tools for data analysis, discovery of new knowledge, and autonomous process control. This paper introduces basic data mining algorithms. An approach based on rough set theory is used to derive associations between control parameters and product quality in the form of decision rules. The model presented in the paper produces control signatures leading to good-quality products of a metal forming process. The computational results reported in the paper indicate that data mining opens a new avenue for decision-making in the material forming industry.
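
As a rough illustration of how rough set theory can turn a process log into decision rules, the sketch below groups records that share the same condition values (an indiscernibility class) and keeps a rule only when every record in the class agrees on the decision, which is the lower approximation. The function name `certain_rules`, the attribute names, and the log values are all hypothetical; the abstract does not describe the paper's actual procedure or data.

```python
from collections import defaultdict

def certain_rules(rows, condition_attrs, decision_attr):
    """Return decision rules supported by the rough-set lower approximation.

    Rows that share the same condition values form one indiscernibility class.
    A class yields a certain rule only when all its rows agree on the decision.
    """
    classes = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in condition_attrs)
        classes[key].add(row[decision_attr])
    return {
        key: next(iter(decisions))
        for key, decisions in classes.items()
        if len(decisions) == 1
    }

# Hypothetical process log: parameter names and values are made up.
log = [
    {"pressure": "high", "temp": "low", "quality": "good"},
    {"pressure": "high", "temp": "low", "quality": "good"},
    {"pressure": "low", "temp": "low", "quality": "bad"},
    {"pressure": "low", "temp": "high", "quality": "good"},
    {"pressure": "low", "temp": "high", "quality": "bad"},  # conflicts with the row above
]
rules = certain_rules(log, ["pressure", "temp"], "quality")
```

Here the class with low pressure and high temperature contains both outcomes, so it falls in the boundary region and produces no certain rule.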

Fault Monitoring of Wind Turbine Generator Brushes: A Data-Mining Approach

Journal of Solar Energy Engineering ◽ 10.1115/1.4005624 ◽ 2012 ◽ Vol 134 (2) ◽ Cited By ~ 15

Author(s): Anoop Verma ◽ Andrew Kusiak

Keyword(s): Data Mining ◽ Wind Turbine ◽ Wind Turbines ◽ Turbine Generator ◽ Wind Turbine Generator ◽ Data Mining Algorithms ◽ Data Mining Approach ◽ Carbon Brushes ◽ Critical Components ◽ Mining Algorithms

Components of wind turbines are subjected to asymmetric loads caused by variable wind conditions. Carbon brushes are critical components of the wind turbine generator. Adequately maintaining and detecting abnormalities in the carbon brushes early is essential for proper turbine performance. In this paper, data-mining algorithms are applied for early prediction of carbon brush faults. Predicting generator brush faults early enables timely maintenance or replacement of brushes. The results discussed in this paper are based on analyzing generator brush faults that occurred on 27 wind turbines. The datasets used to analyze faults were collected from the supervisory control and data acquisition (SCADA) systems installed at the wind turbines. Twenty-four data-mining models are constructed to predict faults up to 12 h before the actual fault occurs. To increase the prediction accuracy of the models discussed, a data balancing approach is used. Four data-mining algorithms were studied to evaluate the quality of the models for predicting generator brush faults. Among the selected data-mining algorithms, the boosting tree algorithm provided the best prediction results. Research limitations attributed to the available datasets are discussed.
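
The abstract does not say which data balancing approach was used. One common option, shown here purely as an assumption, is random oversampling: minority-class records (fault cases) are duplicated at random until each class is equally frequent. The function name `oversample_minority` and the sample readings are hypothetical.

```python
import random

def oversample_minority(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until every class is equally frequent."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw extra copies only for classes smaller than the largest one.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

# Made-up SCADA-style readings: label 1 marks a record preceding a brush fault.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = oversample_minority(X, y)
```

Balancing of this kind is normally applied to the training split only, so that evaluation data keeps the original class proportions.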

Predicting COVID-19 Incidence Through Analysis of Google Trends Data in Iran: Data Mining and Deep Learning Pilot Study (Preprint)

10.2196/preprints.18828 ◽ 2020 ◽ Cited By ~ 5

Author(s): Seyed Mohammad Ayyoubzadeh ◽ Seyed Mehdi Ayyoubzadeh ◽ Hoda Zahedi ◽ Mahnaz Ahmadi ◽ Sharareh R Niakan Kalhori

Keyword(s): Data Mining ◽ Health Care ◽ Linear Regression ◽ Short Term Memory ◽ Google Trends ◽ Health Care Managers ◽ Health Crisis ◽ Data Mining Algorithms ◽ Performance Metric ◽ Mining Algorithms

BACKGROUND: The recent global outbreak of coronavirus disease (COVID-19) is affecting many countries worldwide. Iran is one of the top 10 most affected countries. Search engines provide useful data from populations, and these data might be useful to analyze epidemics. Utilizing data mining methods on electronic resources’ data might provide a better insight into the COVID-19 outbreak to manage the health crisis in each country and worldwide.
OBJECTIVE: This study aimed to predict the incidence of COVID-19 in Iran.
METHODS: Data were obtained from the Google Trends website. Linear regression and long short-term memory (LSTM) models were used to estimate the number of positive COVID-19 cases. All models were evaluated using 10-fold cross-validation, and root mean square error (RMSE) was used as the performance metric.
RESULTS: The linear regression model predicted the incidence with an RMSE of 7.562 (SD 6.492). The most effective factors besides previous day incidence included the search frequency of handwashing, hand sanitizer, and antiseptic topics. The RMSE of the LSTM model was 27.187 (SD 20.705).
CONCLUSIONS: Data mining algorithms can be employed to predict trends of outbreaks. This prediction might support policymakers and health care managers to plan and allocate health care resources accordingly.
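
The evaluation described in the abstract, 10-fold cross-validation scored by RMSE and reported as a mean with a standard deviation, can be sketched as follows. This is a minimal single-predictor version on synthetic data, not the study's model: the names `fit_line`, `rmse`, and `kfold_rmse`, the interleaved fold assignment, and the data are assumptions made for illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def kfold_rmse(xs, ys, k=10):
    """Return the mean and sample standard deviation of RMSE over k folds."""
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(xs), k))  # every k-th point goes to this fold
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in test_idx]
        test = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i in test_idx]
        a, b = fit_line([x for x, _ in train], [y for _, y in train])
        scores.append(rmse([y for _, y in test], [a + b * x for x, _ in test]))
    mean = sum(scores) / k
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / (k - 1))
    return mean, sd

# Synthetic data: a made-up search-interest index and an exactly linear case count.
search_index = list(range(1, 31))
cases = [3 + 2 * s for s in search_index]
mean_rmse, sd_rmse = kfold_rmse(search_index, cases, k=10)
```

Because the synthetic relationship is exactly linear, the RMSE here is close to zero. With real time-ordered data, interleaved folds let the model see future points during training, so a time-aware split may be more appropriate.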

Prediction for rice yield using data mining approach in Ranga Reddy district of Telangana, India

Journal of Agrometeorology ◽ 10.54386/jam.v23i2.75 ◽ 2021 ◽ Vol 23 (2) ◽ pp. 242-248

Author(s): BABY AKULA ◽ R.S.PARMAR ◽ M. P. RAJ ◽ K. INDUDHAR REDDY

Keyword(s): Data Mining ◽ Logistic Model ◽ Yield Data ◽ Data Mining Algorithms ◽ Data Mining Approach ◽ Rice Yields ◽ Using Data ◽ Mlp Classifier ◽ Mining Algorithms ◽ Best Fit

To explore the possibility of crop estimation, a data mining approach, which is multidisciplinary, was followed. The district of Ranga Reddy, Telangana State, India was chosen for the study, using its year-wise average rice yield data and daily weather data over a period of 31 years, i.e. from 1988-2019 (30th to 47th Standard Meteorological Weeks). The data mining tool WEKA (V3.8.1) was used. Min-Max normalization followed by the feature selection algorithm ‘cfsSubsetEval’ was adopted to improve the quality and accuracy of the data mining algorithms. After cleaning and sorting the data, five classifiers, viz. Logistic, MLP (Multi Layer Perceptron), J48, LMT (Logistic Model Trees) and PART, were applied to the training data. The results indicated that the function-based and tree-based models performed better than the rule-based model. Of the two function-based models examined, Logistic and MLP, the latter performed better. Of the two tree-based models, LMT performed better than J48. Thus, the MLP classifier was found to be the best-fit model for predicting rice yields, as it recorded an accuracy of 74.19 %, sensitivity of 0.742 and precision of 0.743 as compared with other models. The MLP also achieved the highest F1 score (0.742) and MCC (0.581).
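
For readers unfamiliar with the preprocessing step and the reported metrics, the sketch below shows min-max normalization and the standard two-class formulas for accuracy, sensitivity, precision, F1 and MCC. It is written in plain Python rather than WEKA, the helper names `min_max` and `binary_metrics` are hypothetical, and the numbers are made up. The abstract does not say how the study's figures were averaged across yield classes, so this two-class form is only an approximation of what was computed.

```python
import math

def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

def binary_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity (recall), precision, F1 and MCC from confusion counts."""
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
        "mcc": (tp * tn - fp * fn) / denom,
    }

rainfall = [12.0, 30.0, 48.0]                      # made-up weekly values
scaled = min_max(rainfall)
scores = binary_metrics(tp=8, fp=2, fn=2, tn=8)    # made-up confusion counts
```

MCC uses all four cells of the confusion matrix, which is why it can sit well below accuracy when classes are confused in both directions.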

Problems of KDD Cup 99 Dataset Existed and Data Preprocessing

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.667.218 ◽ 2014 ◽ Vol 667 ◽ pp. 218-225 ◽ Cited By ~ 5

Author(s): Yan Wang ◽ Kun Yang ◽ Xiang Jing ◽ Huang Long Jin

Keyword(s): Data Mining ◽ Intrusion Detection ◽ Detection System ◽ Data Preprocessing ◽ Vital Role ◽ Data Mining Algorithms ◽ Network Intrusion ◽ Depth Analysis ◽ Mining Algorithms ◽ Kdd Cup 99

The KDD Cup 99 dataset is not only the most widely used dataset in intrusion detection but also the de facto benchmark for evaluating the performance of intrusion detection systems. Nevertheless, the dataset has many issues that cannot be ignored. To build good data mining models for intrusion detection and to find the appropriate features of network intrusion attack types, researchers need a sound understanding of this dataset. In this paper, we first make an in-depth analysis of the problems that exist in the dataset and give related solutions. Second, we carry out extensive data preprocessing on the 10% subset of the KDD Cup 99 training set, which gives better results in the subsequent processing. Finally, by comparing 10 common data mining algorithms in our experiment, we conclude that data preprocessing plays a vital role in the performance of data mining algorithms.

Hybrid Data Mining for Medical Applications

Handbook of Research on Modern Systems Analysis and Design Technologies and Applications ◽ 10.4018/978-1-59904-887-1.ch029 ◽ 2009 ◽ pp. 523-543

Author(s): Syed Zahid Hassan ◽ Brijesh Verma

Keyword(s): Data Mining ◽ Medical Applications ◽ Medical Database ◽ Data Repositories ◽ Data Mining Algorithms ◽ Hybrid Approaches ◽ Data Mining Approach ◽ Hybrid Data ◽ Mining Algorithms ◽ Existing Data

This chapter focuses on hybrid data mining algorithms and their use in medical applications. It reviews existing data mining algorithms and presents a novel hybrid data mining approach, which takes advantage of intelligent and statistical modeling of data mining algorithms to extract meaningful patterns from medical data repositories. Various hybrid combinations of data mining algorithms are formulated and tested on a benchmark medical database. The chapter includes the experimental results with existing and new hybrid approaches to demonstrate the superiority of hybrid data mining algorithms over standard algorithms.

An Entropy-Based Car Failure Detection Method Based on Data Acquisition Pipeline

Entropy ◽ 10.3390/e21040426 ◽ 2019 ◽ Vol 21 (4) ◽ pp. 426 ◽ Cited By ~ 5

Author(s): Bartosz Kowalik ◽ Marcin Szpyrka

Keyword(s): Data Mining ◽ Detection Method ◽ Electronic Devices ◽ Failure Detection ◽ Point Of View ◽ Electronic Control ◽ Data Mining Algorithms ◽ Data Mining Approach ◽ Control Units ◽ Mining Algorithms

Modern cars are equipped with many electronic devices called Electronic Control Units (ECUs). ECUs collect diagnostic data from a car’s components, such as the engine and brakes. These data are then processed, and the appropriate information is communicated to the driver. From the point of view of the safety of the driver and the passengers, information about car faults is vital. Despite the development of on-board computers, only a small amount of information is passed on to the driver. With a data mining approach, it is possible to obtain much more information from the data than is provided by standard car equipment. This paper describes the environment built by the authors for data collection from ECUs. The collected data have been processed using parameterized entropies and data mining algorithms. Finally, we built a classifier able to detect a malfunctioning thermostat even if the car equipment does not indicate it.
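
The abstract does not name the parameterized entropies it uses. As an assumption, the sketch below computes Shannon entropy and the Rényi entropy of order alpha, a common parameterized family, over a window of discretized readings. The function names and the sample readings are hypothetical and only illustrate how an entropy value could serve as a feature for a fault classifier.

```python
import math
from collections import Counter

def probabilities(symbols):
    """Empirical probability of each distinct symbol in the window."""
    counts = Counter(symbols)
    total = len(symbols)
    return [c / total for c in counts.values()]

def shannon_entropy(symbols):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probabilities(symbols))

def renyi_entropy(symbols, alpha):
    """Rényi entropy of order alpha (alpha > 0, alpha != 1), in bits."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log2(sum(p ** alpha for p in probabilities(symbols))) / (1 - alpha)

# Made-up discretized coolant temperature readings for two windows.
steady = ["warm"] * 8
erratic = ["cold", "warm", "hot", "warm", "cold", "hot", "cold", "hot"]

h_steady = shannon_entropy(steady)
h_erratic = shannon_entropy(erratic)
```

A window that stays in one state has zero entropy, while a window that jumps between states scores higher. The order alpha controls how strongly frequent states dominate the Rényi value.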

Data Mining Algorithms: An Overview

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽ 10.24297/ijct.v15i6.1615 ◽ 2016 ◽ Vol 15 (6) ◽ pp. 6806-6813 ◽ Cited By ~ 2

Author(s): Sethunya R Joseph ◽ Hlomani Hlomani ◽ Keletso Letsholo

Keyword(s): Data Mining ◽ Large Scale ◽ Predictive Analytics ◽ Large Data ◽ Knowledge Discovery In Databases ◽ Data Sets ◽ Data Mining Algorithms ◽ Data Extract ◽ Mining Algorithms ◽ Operational Processes

Research on data mining has yielded numerous tools, algorithms, methods and approaches for handling large amounts of data for various purposes and for problem solving. Data mining has become an integral part of many application domains such as data warehousing, predictive analytics, business intelligence, bioinformatics and decision support systems. The prime objective of data mining is to handle large-scale data effectively, extract actionable patterns, and gain insightful knowledge. Data mining is part and parcel of the knowledge discovery in databases (KDD) process. Success and improved decision making normally depend on how quickly one can discover insights from data. These insights can be used to drive better actions in operational processes and even to predict future behaviour. This paper presents an overview of various algorithms necessary for handling large data sets. These algorithms define various structures and methods implemented to handle big data. The review also discusses the general strengths and limitations of these algorithms. This paper can serve as a quick guide or an eye-opener for data mining researchers on which algorithm(s) to select and apply in solving the problems they are investigating.

Prediction of Status Patterns of Wind Turbines: A Data-Mining Approach

Journal of Solar Energy Engineering ◽ 10.1115/1.4003188 ◽ 2011 ◽ Vol 133 (1) ◽ Cited By ~ 25

Author(s): Andrew Kusiak ◽ Anoop Verma

Keyword(s): Data Mining ◽ Wind Turbines ◽ Association Rule ◽ Performance Monitoring ◽ Prediction Models ◽ Rule Mining ◽ Data Mining Algorithms ◽ Data Mining Approach ◽ The Status ◽ Mining Algorithms

This paper presents the application of data-mining techniques for identification and prediction of status patterns in wind turbines. Early prediction of status patterns benefits turbine maintenance by indicating the deterioration of components. An association rule mining algorithm is used to identify frequent status patterns of turbine components and systems that are in turn predicted using historical wind turbine data. The status patterns are predicted at six time periods spaced at 10 min intervals. The prediction models are generated by five data-mining algorithms. The random forest algorithm has produced the best prediction results. The prediction results are used to develop a component performance monitoring scheme.
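
The idea behind association rule mining of status patterns can be sketched by counting how often status codes occur together (support) and how often one code accompanies another (confidence). The code below handles only pairs and is not the algorithm used in the paper, which the abstract does not specify. The function names and status codes are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs found in at least min_support of transactions."""
    counts = Counter()
    for t in transactions:
        # Sort so each unordered pair is always counted under the same key.
        counts.update(combinations(sorted(set(t)), 2))
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    with_a = [t for t in transactions if antecedent in t]
    if not with_a:
        return 0.0
    return sum(consequent in t for t in with_a) / len(with_a)

# Made-up status codes observed together in the same time window.
windows = [
    {"pitch_warn", "gearbox_hot"},
    {"pitch_warn", "gearbox_hot", "yaw_fault"},
    {"pitch_warn"},
    {"yaw_fault"},
]
pairs = frequent_pairs(windows, min_support=0.5)
conf = confidence(windows, "pitch_warn", "gearbox_hot")
```

In this toy data only the pair of `gearbox_hot` and `pitch_warn` reaches the 50% support threshold, and two of the three windows containing `pitch_warn` also contain `gearbox_hot`.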

The Exploration of Data Mining in Face of Cloud Computing

Advanced Materials Research ◽ 10.4028/www.scientific.net/amr.664.1066 ◽ 2013 ◽ Vol 664 ◽ pp. 1066-1071

Author(s): Chen Wang ◽ Shu Xiang Li

Keyword(s): Data Mining ◽ Cloud Computing ◽ Massive Data ◽ Data Mining Algorithms ◽ Mining Algorithms

This paper first discusses the limitations and shortcomings of data mining under the conditions of massive data. Building on the advantages of cloud computing, it designs a data mining architecture for cloud computing and, on this basis, discusses improvements to data mining algorithms for cloud computing. As a theoretical exploration, the paper proposes useful suggestions for optimizing data mining in the face of cloud computing.
