Data Mining Approach Improving Decision-Making Competency along the Business Digital Transformation Journey: A Case Study – Home Appliances after Sales Service

Abstract: Data mining, an essential part of artificial intelligence, is a powerful digital technology that helps businesses predict future trends, ease decision-making, and enhance customer experience along their digital transformation journey. This research presents a practical case study that provides guidance on analyzing information and predicting repairs in the home appliances after-sales service business. It is a comparative study of several classification algorithms run in the Weka tool. The algorithms are compared on the following measures: mean absolute error, root mean squared error, relative absolute error, root relative squared error, receiver operating characteristic (ROC) area, accuracy, Matthews correlation coefficient, precision-recall curve, precision, recall, F-measure, and statistical criteria. Five classification algorithms were applied to the dataset: Naive Bayes, J48, random forest, k-nearest neighbor, and logistic regression. J48 provided the best accuracy and the lowest error among the examined algorithms when applied to a home appliances after-sales service dataset to predict repairs based on the product guarantee period. The results show that data mining techniques can streamline decision-making and provide reliable predictions in an after-sales service business, especially for customers, and can increase business efficiency along the digital transformation journey.
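The four error measures named in this abstract can be illustrated with a minimal sketch. It assumes the common textbook definitions, in which the relative measures compare the model against a baseline that always predicts the mean of the actual values; the function name `error_metrics` and the sample numbers are hypothetical and are not taken from the paper or from Weka.

```python
import math

def error_metrics(actual, predicted):
    """Return MAE, RMSE, RAE and RRSE for two equal-length numeric sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    mean_actual = sum(actual) / n
    # Total absolute and squared error of the model's predictions.
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Baseline error: always predicting the mean of the actual values.
    # (This is zero when all actual values are identical, in which case
    # the relative measures are undefined.)
    base_abs = sum(abs(a - mean_actual) for a in actual)
    base_sq = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae": abs_err / base_abs,
        "rrse": math.sqrt(sq_err / base_sq),
    }

# Hypothetical class-probability predictions for four instances.
metrics = error_metrics([1.0, 0.0, 1.0, 0.0], [0.9, 0.2, 0.6, 0.1])
```

A relative error below 1 (often reported as below 100%) means the model does better than the mean-predicting baseline.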


Performance Research on Medical Data Classification using Traditional and Soft Computing Techniques

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1185.0782s319 ◽

2019 ◽

Vol 8 (2S3) ◽

pp. 990-995

Keyword(s):

Data Mining ◽

Soft Computing ◽

Nearest Neighbor ◽

Classification Performance ◽

Medical Data ◽

Support Vector ◽

Classification Algorithms ◽

K Nearest Neighbor ◽

Classification Techniques ◽

Soft Computing Techniques

Medicine has made giant leaps in recent years. A tremendous amount of research is being carried out in this field, leading to new discoveries that have a heavy impact on humankind. The data generated in this field is growing enormously, and a need has arisen to analyze it to find meaningful and relevant hidden patterns that can be used for clinical diagnosis. Data mining is an efficient approach to discovering these patterns. Among the many data mining techniques that exist, this paper analyzes medical data using various classification techniques. The hard computing algorithms used in this study are k-nearest neighbor (kNN), decision tree, and Naive Bayes; the soft computing algorithms are support vector machine (SVM), artificial neural network (ANN), and fuzzy k-means clustering. We applied these algorithms to three datasets: Breast Cancer Wisconsin, Haberman, and Contraceptive Method Choice. Our results show that soft computing based classification algorithms produce better classifications than the traditional classification algorithms in terms of various classification performance measures.


Predicting Weather Forecasting State Based on Data Mining Classification Algorithms

Asian Journal of Research in Computer Science ◽

10.9734/ajrcos/2021/v9i330222 ◽

2021 ◽

pp. 13-24

Author(s):

Fairoz Q. Kareem ◽

Adnan Mohsin Abdulazeez ◽

Dathar A. Hasan

Keyword(s):

Data Mining ◽

Random Forest ◽

Nearest Neighbor ◽

Prediction Models ◽

Weather Forecasting ◽

Model Performance ◽

Machine Learning Techniques ◽

Classification Algorithms ◽

K Nearest Neighbor ◽

Unseen Data

Weather forecasting is the process of predicting the state of the atmosphere for a given region or location using recent technology. Thousands of years ago, some civilizations tried to foretell the weather by studying the stars and astronomy. Knowing the weather conditions has a direct impact on many fields, such as commerce, agriculture, and aviation. With recent developments in technology, especially in data mining (DM) and machine learning techniques, many researchers have proposed weather forecasting systems based on data mining classification techniques. In this paper, we used neural networks, Naïve Bayes, random forest, and k-nearest neighbor algorithms to build weather prediction models. These models classify unseen data instances into five classes: rain, fog, partly-cloudy day, clear-day, and cloudy. The model for each algorithm was trained and tested using synoptic data from the Kaggle website; the dataset contains 1,796 instances and 8 attributes. Compared with the other algorithms, random forest achieved the best accuracy, at 89%. These results indicate that data mining classification algorithms can be effective tools for weather prediction.


Machine Learning Based Approaches for Modeling the Output Power of Photovoltaic Array in Real Outdoor Conditions

Electronics ◽

10.3390/electronics9020315 ◽

2020 ◽

Vol 9 (2) ◽

pp. 315 ◽

Cited By ~ 4

Author(s):

Maria ◽

Yassine

Keyword(s):

Machine Learning ◽

Output Power ◽

Nearest Neighbor ◽

Hybrid Approach ◽

Absolute Error ◽

Support Vector ◽

Classification Algorithms ◽

K Nearest Neighbor ◽

Photovoltaic Array ◽

Pv Panel

It is important to investigate the long-term performance of photovoltaic (PV) systems through accurate modeling, especially for predicting output power; single and double diode models are the configurations mainly applied for this purpose. However, using a single configuration to model a PV panel limits the accuracy of its predicted performance. This paper proposes a new hybrid approach based on classification algorithms in the machine learning framework that combines the single and double diode models according to the climatic conditions in order to predict the PV output power with higher accuracy. Classification trees, k-nearest neighbor, discriminant analysis, Naïve Bayes, support vector machines (SVMs), and classification ensemble algorithms are investigated to estimate the PV power under different conditions of the Mediterranean climate. The examined classification algorithms indicate that the double diode model seems more relevant for low and medium levels of solar irradiance and temperature. Accuracy between 86% and 87.5% demonstrates the high potential of classification techniques in PV power prediction. The normalized mean absolute error of up to 1.5% is lower than the errors obtained from both single-diode and double-diode equivalent-circuit models, with a reduction of up to 0.15%. The proposed hybrid approach using machine learning (ML) algorithms could be a key solution for photovoltaic and industrial software to predict performance more accurately.


Recent trends in big data using hadoop

International Journal of Informatics and Communication Technology (IJ-ICT) ◽

10.11591/ijict.v8i1.pp39-49 ◽

2019 ◽

Vol 8 (1) ◽

pp. 39

Author(s):

Chetna Kaushal ◽

Deepika Koundal

Keyword(s):

Data Mining ◽

Social Media ◽

Big Data ◽

Nearest Neighbor ◽

Classification Algorithms ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

Clustering Techniques ◽

Recent Trends ◽

K Nearest Neighbor Algorithm

Big data refers to huge sets of data, which are very common these days due to the growth of internet services. Data generated from social media is a common example. This paper summarizes big data and the ways in which it has been used. Data mining is a way of deriving essential knowledge from very large volumes of data that are challenging to interpret by conventional methods. The paper mainly focuses on issues related to clustering techniques in big data. For classifying big data, the existing classification algorithms are briefly reviewed, and the k-nearest neighbor algorithm is then chosen among them and described along with an example.


A Comparative Analysis of Classification Algorithms on Diverse Datasets

Engineering, Technology & Applied Science Research ◽

10.48084/etasr.1952 ◽

2018 ◽

Vol 8 (2) ◽

pp. 2790-2795 ◽

Cited By ~ 4

Author(s):

M. Alghobiri

Keyword(s):

Data Mining ◽

Performance Evaluation ◽

Comparative Analysis ◽

Nearest Neighbor ◽

Absolute Error ◽

Classification Algorithms ◽

Kappa Statistics ◽

Data Sets ◽

Evaluation Measures ◽

F Measure

Data mining is the computational process of finding patterns in large data sets. Classification, one of the main domains of data mining, involves generalizing known structure to apply to a new dataset and predict its class. Various classification algorithms are used to classify data sets. They are based on different methods such as probability, decision trees, neural networks, nearest neighbor, Boolean and fuzzy logic, and kernels. In this paper, we apply three diverse classification algorithms to ten datasets. The datasets were selected based on their size and/or the number and nature of their attributes. Results are discussed using performance evaluation measures such as precision, accuracy, F-measure, Kappa statistics, mean absolute error, relative absolute error, and ROC area. Comparative analysis was carried out using accuracy, precision, and F-measure. We specify the features and limitations of the classification algorithms for datasets of diverse nature.
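Several of the evaluation measures listed in this abstract can be derived from a single binary confusion matrix. The following is a minimal sketch under that assumption; the function name `binary_metrics` and the counts are hypothetical, and the paper itself may compute these measures differently (for example, averaged over multiple classes).

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, F-measure and Cohen's kappa
    from the four cells of a binary confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    # Chance agreement: probability that prediction and truth agree
    # if both were drawn independently from their marginal rates.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    expected = p_yes + p_no
    # Kappa rescales accuracy so that 0 means chance-level agreement.
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "kappa": kappa,
    }

# Hypothetical counts for 100 test instances.
scores = binary_metrics(tp=40, fp=10, fn=5, tn=45)
```

Kappa is useful next to accuracy because it discounts the agreement a classifier would reach by chance on an imbalanced dataset.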


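Perancangan Aplikasi Prediksi Kelulusan Tepat Waktu Bagi Mahasiswa Baru Dengan Teknik Data Mining (Studi Kasus: Data Akademik Mahasiswa STMIK Dipanegara Makassar) [Designing an Application to Predict On-Time Graduation of New Students Using Data Mining Techniques (Case Study: Academic Data of STMIK Dipanegara Makassar Students)]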

Creative Information Technology Journal ◽

10.24076/citec.2014v1i4.27 ◽

2015 ◽

Vol 1 (4) ◽

pp. 270

Author(s):

Muhammad Syukri Mustafa ◽

I. Wayan Simpen

Keyword(s):

Data Mining ◽

Nearest Neighbor ◽

Test Results ◽

K Nearest Neighbor ◽

Accuracy Rate ◽

Sample Data ◽

New Students ◽

K Nearest Neighbor Algorithm ◽

Using Data ◽

Existing Data

This research aims to predict whether new students are likely to complete their studies on time, using data mining analysis with the K-Nearest Neighbor (KNN) algorithm to explore accumulated historical data. The resulting application uses various attributes classified in data mining, including national examination (Ujian Nasional, UN) scores, school or region of origin, gender, parents' occupation and income, number of siblings, and others. By applying KNN analysis, a prediction can be made based on the proximity of existing historical data to new data, indicating whether a student is likely to complete their studies on time or not. Testing the KNN algorithm with sample data from alumni who graduated in 2004 through 2010 as the old cases and alumni who graduated in 2011 as the new cases yielded an accuracy rate of 83.36%.
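The proximity-based prediction described in this abstract can be sketched as a plain majority-vote KNN classifier. This is an illustration only: the two numeric features (a scaled exam score and a scaled sibling count), the labels, the data values, and the function name `knn_predict` are all hypothetical and are not drawn from the study's dataset or application.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training cases.

    train: list of (feature_tuple, label) pairs; query: feature tuple.
    """
    # Sort historical cases by Euclidean distance to the new case.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical historical cases: (scaled exam score, scaled sibling count).
history = [
    ((0.90, 0.2), "on_time"),
    ((0.80, 0.4), "on_time"),
    ((0.85, 0.1), "on_time"),
    ((0.30, 0.7), "late"),
    ((0.40, 0.9), "late"),
    ((0.20, 0.5), "late"),
]

strong_student = knn_predict(history, (0.75, 0.3), k=3)
weak_student = knn_predict(history, (0.30, 0.8), k=3)
```

Categorical attributes such as gender or region would need to be encoded numerically, and all features scaled to comparable ranges, before a distance like this is meaningful.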


Assessing the Relation between Mud Components and Rheology for Loss Circulation Prevention Using Polymeric Gels: A Machine Learning Approach

Energies ◽

10.3390/en14051377 ◽

2021 ◽

Vol 14 (5) ◽

pp. 1377

Author(s):

Musaab I. Magzoub ◽

Raj Kiran ◽

Saeed Salehi ◽

Ibnelwaleed A. Hussein ◽

Mustafa S. Nasser

Keyword(s):

Machine Learning ◽

Rheological Properties ◽

Nearest Neighbor ◽

Drilling Fluid ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

Wide Range ◽

Machine Learning Approach ◽

Drilling Operations

The traditional way to mitigate loss circulation in drilling operations is to use preventative and curative materials. However, it is difficult to quantify the amount of materials from every possible combination to produce customized rheological properties. In this study, machine learning (ML) is used to develop a framework to identify material composition for loss circulation applications based on the desired rheological characteristics. The relation between the rheological properties and the mud components for polyacrylamide/polyethyleneimine (PAM/PEI)-based mud is assessed experimentally. Four different ML algorithms were implemented to model the rheological data for various mud components at different concentrations and testing conditions. These four algorithms include (a) k-Nearest Neighbor, (b) Random Forest, (c) Gradient Boosting, and (d) AdaBoosting. The Gradient Boosting model showed the highest accuracy (91 and 74% for plastic and apparent viscosity, respectively), which can be further used for hydraulic calculations. Overall, the experimental study presented in this paper, together with the proposed ML-based framework, adds valuable information to the design of PAM/PEI-based mud. The ML models allowed a wide range of rheology assessments for various drilling fluid formulations with a mean accuracy of up to 91%. The case study has shown that with the appropriate combination of materials, reasonable rheological properties could be achieved to prevent loss circulation by managing the equivalent circulating density (ECD).


Data Mining Approach to Analyze COVID-19 Clinical Dataset

10.53350/pjmhs211561812 ◽

2021 ◽

Vol 15 (6) ◽

pp. 1812-1819

Author(s):

Azita Yazdani ◽

Ramin Ravangard ◽

Roxana Sharifian

Keyword(s):

Artificial Intelligence ◽

Data Mining ◽

Support Vector Machine ◽

Nearest Neighbor ◽

Clinical Signs ◽

Study Data ◽

Mining Machine ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Mining Approach

The new coronavirus has been spreading since the beginning of 2020, and many efforts have been made to develop vaccines to help patients recover. It is now clear that the world needs a rapid solution to curb the spread of COVID-19 with non-clinical approaches such as data mining, enhanced intelligence, and other artificial intelligence techniques. These approaches can be effective in reducing the burden on the health care system by providing the best possible way to diagnose and predict the COVID-19 epidemic. In this study, data mining models for early detection of COVID-19 were developed using an epidemiological dataset of patients and individuals suspected of having COVID-19 in Iran. C4.5, support vector machine, Naive Bayes, logistic regression, random forest, and k-nearest neighbor algorithms were applied directly to the dataset using RapidMiner to develop the models. Given clinical signs, the model diagnoses the risk of contracting the COVID-19 virus. Examination of the models showed that the support vector machine, with 93.41% accuracy, is the most efficient at diagnosing patients with COVID-19 and is the best of the developed models. Keywords: COVID-19, Data mining, Machine Learning, Artificial Intelligence, Classification


Performance of Naïve Bayes, C4.5 and KNN using Breast Cancer, Iris and Hypothyroid Datasets

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8795.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 2193-2197

Keyword(s):

Breast Cancer ◽

Data Mining ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Specific Pattern ◽

K Nearest Neighbor ◽

Data Mining Technique ◽

Digital Format ◽

Tree Classifier

Data mining usually refers to the discovery of specific patterns in, or the analysis of, a large dataset. Classification is an efficient data mining technique in which the classes into which data are classified are predefined using existing datasets. Classifying medical records by their symptoms using computerized methods, and storing the predicted information in digital format, is of great importance in diagnosing various diseases. This paper concentrates on finding the algorithm with the highest accuracy so that a cost-effective algorithm can be identified. The data mining classification algorithms are compared on their accuracy in finding the exact data according to the diagnosis report and on their execution rate, which indicates how fast the records are classified. The classification algorithms used in this study are the Naive Bayes classifier, the C4.5 tree classifier, and K-Nearest Neighbor (KNN), compared to determine which is best suited for classifying any kind of medical dataset. The Breast Cancer, Iris, and Hypothyroid datasets are used to determine which of the three algorithms classifies the datasets with the highest accuracy in finding the records of patients with particular health problems. The experimental results, presented as tables and graphs, show the performance and the importance of the Naïve Bayes, C4.5, and K-Nearest Neighbor algorithms. Based on the performance outcomes, the C4.5 algorithm performs much better than the Naïve Bayes and K-Nearest Neighbor algorithms.


Analysis and Prediction of CET4 Scores Based on Data Mining Algorithm

Complexity ◽

10.1155/2021/5577868 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Hongyan Wang

Keyword(s):

Data Mining ◽

Linear Regression ◽

Test Score ◽

Nearest Neighbor ◽

Classification Model ◽

Data Mining Algorithm ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

K Nearest Neighbor Algorithm ◽

Classification Efficiency

This paper presents the concepts and algorithms of data mining and focuses on the linear regression algorithm. Based on the multiple linear regression algorithm, many factors affecting CET-4 are analyzed. Following data mining ideas, historical data were collected and appropriately transformed, and statistical analysis techniques were used to analyze the many factors influencing the CET-4 test, yielding the CET-4 test results and their influencing factors. The degree of fit of the linear regression relationship was found to be relatively high. We further improve the algorithm and establish a partition-weighted K-nearest neighbor algorithm. The weighted K-nearest neighbor algorithm and the partition algorithm are used for CET-4 test score classification prediction, the statistical method is used to study the relevant factors that affect the CET-4 test score, and screen classification is performed to predict when the comparison verification will pass. The input features and the adjacent features are weighted. Although the classification effect of the partition algorithm is not significantly improved, its classification stability is better than that of the K-nearest neighbor algorithm, its classification time is greatly reduced, and its classification efficiency is increased by 119%. To detect at-risk graduating students earlier, this paper proposes an appropriate and timely early-warning K-nearest neighbor classification model. Taking test scores, make-up exams, and re-learning as input features, the classification model can effectively predict ordinary students who have not graduated.
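The abstract does not spell out how its partition-weighted K-nearest neighbor algorithm assigns weights, so the sketch below shows only the general idea of weighting neighbors, using the generic inverse-distance scheme rather than the paper's method. The function name `weighted_knn_predict`, the single score feature, and the data are hypothetical.

```python
import math
from collections import defaultdict

def weighted_knn_predict(train, query, k=3, eps=1e-9):
    """Classify `query` with inverse-distance-weighted votes from the
    k nearest training cases (a generic scheme, not the paper's)."""
    # Pair each training case with its distance to the query, keep the k closest.
    nearest = sorted((math.dist(x, query), label) for x, label in train)[:k]
    scores = defaultdict(float)
    for distance, label in nearest:
        # Closer neighbors contribute more; eps avoids division by zero.
        scores[label] += 1.0 / (distance + eps)
    return max(scores, key=scores.get)

# Hypothetical one-feature cases (e.g. a scaled mock-test score).
samples = [((1.0,), "pass"), ((3.0,), "fail"), ((3.2,), "fail")]

near_pass = weighted_knn_predict(samples, (1.2,), k=3)
near_fail = weighted_knn_predict(samples, (3.1,), k=3)
```

With `k=3`, an unweighted majority vote would label the query `(1.2,)` as "fail" because two of the three neighbors fail; the distance weighting lets the single very close "pass" case decide instead.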
