Data Mining Approach Improving Decision-Making Competency along the Business Digital Transformation Journey: A Case Study – Home Appliances after Sales Service

SEEU Review ◽  
2021 ◽  
Vol 16 (1) ◽  
pp. 45-65
Author(s):  
Hyrmet Mydyti

Abstract: Data mining, an essential part of artificial intelligence, is a powerful digital technology that enables businesses to predict future trends, ease decision-making, and enhance customer experience along their digital transformation journey. This research offers a practical case study that provides guidance on analyzing information and predicting repairs in the home appliances after-sales service business. It compares several classification algorithms using the Weka tool. The comparison considers parameters such as the mean absolute error, root mean square error, relative absolute error, root relative squared error, receiver operating characteristic area, accuracy, Matthews correlation coefficient, precision-recall curve, precision, F-measure, recall, and statistical criteria. Five classification algorithms (Naive Bayes, J48, random forest, K-Nearest Neighbor, and logistic regression) were applied to the dataset. J48 provided the best accuracy and the lowest error among the examined algorithms when applied to a home appliances after-sales service dataset to predict repairs based on the product guarantee period. The information and results extracted with data mining techniques help streamline decision-making, provide reliable predictions, especially for customers, and increase businesses' efficiency along their digital transformation journey.
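The four error measures named in the abstract can be illustrated with a minimal sketch. The definitions below are the commonly used ones, with the relative measures normalized by a baseline that always predicts the mean of the actual values; the function name `error_metrics` and the sample values are hypothetical and are not taken from the paper or from Weka's source.

```python
import math

def error_metrics(actual, predicted):
    """Return MAE, RMSE, RAE and RRSE for paired numeric values.

    Assumes the actual values are not all identical, otherwise the
    relative measures would divide by zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Baseline errors: always predicting the mean of the actual values.
    base_abs = sum(abs(a - mean_actual) for a in actual)
    base_sq = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae": abs_err / base_abs,
        "rrse": math.sqrt(sq_err / base_sq),
    }

# Hypothetical data: 1 = repair needed, 0 = no repair;
# predictions are the classifier's estimated probabilities of a repair.
actual = [1, 0, 1, 1, 0]
predicted = [0.9, 0.2, 0.6, 0.8, 0.1]
metrics = error_metrics(actual, predicted)
print(metrics)
```

Values of the relative measures below 1 indicate that the classifier does better than the mean-prediction baseline.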

The world today has made giant leaps in the field of medicine. A tremendous amount of research is being carried out in this field, leading to new discoveries that have a heavy impact on mankind. The data generated in this field is increasing enormously, and a need has arisen to analyze these data to find meaningful and relevant hidden patterns that can be used for clinical diagnosis. Data mining is an efficient approach to discovering these patterns. Among the many data mining techniques that exist, this paper analyzes medical data using various classification techniques. The hard computing algorithms used in this study are k-Nearest Neighbor (kNN), Decision Tree, and Naive Bayes, whereas the soft computing algorithms are Support Vector Machine (SVM), Artificial Neural Networks (ANN), and Fuzzy k-Means clustering. We applied these algorithms to three datasets: Breast Cancer Wisconsin, Haberman, and Contraceptive Method Choice. Our results show that soft computing based classification algorithms produce better classifications than the traditional classification algorithms in terms of various classification performance measures.


Author(s):  
Fairoz Q. Kareem ◽  
Adnan Mohsin Abdulazeez ◽  
Dathar A. Hasan

Weather forecasting is the process of predicting the state of the atmosphere for certain regions or locations using recent technology. Thousands of years ago, some civilizations tried to foretell the weather by studying the stars and astronomy. Knowing the weather conditions has a direct impact on many fields, such as commerce, agriculture, and aviation. With recent developments in technology, especially in data mining (DM) and machine learning techniques, many researchers have proposed weather forecasting systems based on data mining classification techniques. In this paper, we used neural networks, Naïve Bayes, random forest, and K-nearest neighbor algorithms to build weather forecasting models. These models classify unseen data instances into multiple classes: rain, fog, partly cloudy day, clear day, and cloudy. The model for each algorithm was trained and tested using synoptic data from the Kaggle website. The dataset contains 1796 instances and 8 attributes. Compared with the other algorithms, the random forest algorithm achieved the best accuracy, 89%. These results indicate that data mining classification algorithms can serve as effective tools for forecasting the weather.


Electronics ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 315 ◽  
Author(s):  
Maria ◽  
Yassine

It is important to investigate the long-term performance of photovoltaic (PV) systems through accurate modeling, especially for predicting output power; single and double diode models are the configurations mainly applied for this purpose. However, using a single configuration to model a PV panel limits the accuracy of its predicted performance. This paper proposes a new hybrid approach based on classification algorithms in the machine learning framework that combines the single and double diode models according to the climatic conditions in order to predict the output PV power with higher accuracy. Classification trees, k-nearest neighbor, discriminant analysis, Naïve Bayes, support vector machines (SVMs), and classification ensemble algorithms are investigated to estimate the PV power under different conditions of the Mediterranean climate. The examined classification algorithms demonstrate that the double diode model seems more relevant for low and medium levels of solar irradiance and temperature. Accuracy between 86% and 87.5% demonstrates the high potential of classification techniques in PV power prediction. The normalized mean absolute error of up to 1.5% is lower than the errors obtained from either the single-diode or the double-diode equivalent-circuit model, a reduction of up to 0.15%. The proposed hybrid approach using machine learning (ML) algorithms could be a key solution for photovoltaic and industrial software to predict performance more accurately.


Author(s):  
Chetna Kaushal ◽  
Deepika Koundal

Big data refers to huge sets of data, which are very common these days owing to the growth of internet utilities. Data generated from social media is a common example. This paper summarizes big data and the ways in which it has been utilized. Data mining is fundamentally a means of deriving essential knowledge from vast amounts of data that are difficult to interpret by conventional methods. The paper mainly focuses on issues related to clustering techniques in big data. For classifying big data, the existing classification algorithms are briefly reviewed, after which the k-nearest neighbor algorithm is chosen from among them and described with an example.


2018 ◽  
Vol 8 (2) ◽  
pp. 2790-2795 ◽  
Author(s):  
M. Alghobiri

Data mining involves the computational process of finding patterns in large data sets. Classification, one of the main domains of data mining, involves generalizing a known structure so that it can be applied to new data to predict its class. Various classification algorithms are used to classify data sets. They are based on different methods, such as probability, decision trees, neural networks, nearest neighbors, Boolean and fuzzy logic, and kernels. In this paper, we apply three diverse classification algorithms to ten datasets. The datasets were selected based on their size and/or the number and nature of their attributes. Results are discussed using performance evaluation measures such as precision, accuracy, F-measure, Kappa statistic, mean absolute error, relative absolute error, and ROC area. A comparative analysis is carried out using accuracy, precision, and F-measure. We specify the features and limitations of the classification algorithms for datasets of diverse nature.
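Several of the evaluation measures listed above can be derived from a binary confusion matrix. The sketch below is an illustration under assumed, made-up labels, not the paper's evaluation code; the function name `binary_metrics` is hypothetical, and the Kappa statistic is computed as Cohen's kappa.

```python
def binary_metrics(actual, predicted, positive=1):
    """Compute accuracy, precision, recall, F-measure and Cohen's kappa."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    n = tp + fp + fn + tn

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    # Expected agreement by chance, from the row and column totals.
    p_e = ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fn) / n) * ((tn + fp) / n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "kappa": kappa,
    }

# Hypothetical labels: 4 positives and 6 negatives, with two mistakes.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
scores = binary_metrics(actual, predicted)
print(scores)
```

Kappa is lower than accuracy here because part of the observed agreement would be expected by chance alone.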


2015 ◽  
Vol 1 (4) ◽  
pp. 270
Author(s):  
Muhammad Syukri Mustafa ◽  
I. Wayan Simpen

This research aims to predict the likelihood that new students will complete their studies on time, using data mining analysis to explore accumulated historical data with the K-Nearest Neighbor (KNN) algorithm. The application produced in this research uses various attributes classified in data mining, including national examination (Ujian Nasional, UN) scores, school or region of origin, gender, parents' occupation and income, number of siblings, and others. By applying KNN analysis, a prediction can be made, based on the proximity of the existing historical data to the new data, of whether a student is likely to complete their studies on time. Tests applying the KNN algorithm, with sample data on alumni who graduated from 2004 through 2010 as old cases and data on alumni who graduated in 2011 as new cases, yielded an accuracy of 83.36%.
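The idea of predicting from the proximity of a new case to historical cases can be sketched as a plain majority-vote KNN. This is a minimal illustration, not the authors' application: the two features (an exam score and a number of siblings), the labels, and all values are invented, and real attributes would normally be scaled and encoded before distances are computed.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Pair each Euclidean distance with its label, nearest first.
    distances = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical "old cases": (exam score, number of siblings) per alumnus.
history = [(8.5, 1), (8.0, 2), (7.8, 1), (5.5, 4), (6.0, 5), (5.8, 3)]
outcome = ["on_time", "on_time", "on_time", "late", "late", "late"]

# Hypothetical "new cases" to classify.
print(knn_predict(history, outcome, (7.9, 2)))
print(knn_predict(history, outcome, (5.9, 4)))
```

With an odd `k` and two classes a tie cannot occur; for more classes a tie-breaking rule would have to be chosen.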


Energies ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 1377
Author(s):  
Musaab I. Magzoub ◽  
Raj Kiran ◽  
Saeed Salehi ◽  
Ibnelwaleed A. Hussein ◽  
Mustafa S. Nasser

The traditional way to mitigate loss circulation in drilling operations is to use preventative and curative materials. However, it is difficult to quantify the amount of materials from every possible combination to produce customized rheological properties. In this study, machine learning (ML) is used to develop a framework to identify material composition for loss circulation applications based on the desired rheological characteristics. The relation between the rheological properties and the mud components for polyacrylamide/polyethyleneimine (PAM/PEI)-based mud is assessed experimentally. Four different ML algorithms were implemented to model the rheological data for various mud components at different concentrations and testing conditions. These four algorithms include (a) k-Nearest Neighbor, (b) Random Forest, (c) Gradient Boosting, and (d) AdaBoosting. The Gradient Boosting model showed the highest accuracy (91 and 74% for plastic and apparent viscosity, respectively), which can be further used for hydraulic calculations. Overall, the experimental study presented in this paper, together with the proposed ML-based framework, adds valuable information to the design of PAM/PEI-based mud. The ML models allowed a wide range of rheology assessments for various drilling fluid formulations with a mean accuracy of up to 91%. The case study has shown that with the appropriate combination of materials, reasonable rheological properties could be achieved to prevent loss circulation by managing the equivalent circulating density (ECD).


2021 ◽  
Vol 15 (6) ◽  
pp. 1812-1819
Author(s):  
Azita Yazdani ◽  
Ramin Ravangard ◽  
Roxana Sharifian

The new coronavirus has been spreading since the beginning of 2020, and many efforts have been made to develop vaccines to help patients recover. It is now clear that the world needs a rapid solution to curb the spread of COVID-19 with non-clinical approaches such as data mining, enhanced intelligence, and other artificial intelligence techniques. These approaches can help reduce the burden on the health care system by providing the best possible way to diagnose and predict the COVID-19 epidemic. In this study, data mining models for early detection of COVID-19 were developed using an epidemiological dataset of patients and individuals suspected of having COVID-19 in Iran. C4.5, support vector machine, Naive Bayes, logistic regression, random forest, and k-nearest neighbor algorithms were applied directly to the dataset using RapidMiner to develop the models. Given clinical signs, a model assesses the risk of having contracted the COVID-19 virus. Examination of the models in this study showed that the support vector machine, with 93.41% accuracy, is the best of the developed models for diagnosing patients with COVID-19. Keywords: COVID-19, Data mining, Machine Learning, Artificial Intelligence, Classification


Data mining usually refers to the discovery of specific patterns in, or the analysis of data from, a large dataset. Classification is an efficient data mining technique in which the classes into which the data are sorted are predefined using existing datasets. Classifying medical records by their symptoms using computerized methods, and storing the predicted information in digital format, is of great importance in the diagnosis of various diseases. This paper concentrates on finding the algorithm with the highest accuracy so that a cost-effective algorithm can be identified. The data mining classification algorithms are compared on their accuracy in finding the correct data according to the diagnosis report and on their execution rate, which shows how fast the records are classified. The classification algorithms used in this study are the Naive Bayes classifier, the C4.5 tree classifier, and K-Nearest Neighbor (KNN), compared to determine which is best suited for classifying any kind of medical dataset. The Breast Cancer, Iris, and Hypothyroid datasets are used to determine which of the three algorithms classifies the datasets with the highest accuracy in finding the records of patients with particular health problems. The experimental results, presented as tables and graphs, show the performance and importance of the Naïve Bayes, C4.5, and K-Nearest Neighbor algorithms. Based on these results, the C4.5 algorithm performs considerably better than the Naïve Bayes and K-Nearest Neighbor algorithms.


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Hongyan Wang

This paper presents the concept and algorithms of data mining and focuses on the linear regression algorithm. Based on the multiple linear regression algorithm, many factors affecting CET-4 results are analyzed. Following a data mining approach, historical data were collected and appropriately transformed, and statistical analysis techniques were used to analyze the many factors influencing the CET-4 test, yielding the CET-4 test results and their influencing factors. The linear regression was found to fit the data relatively well. We further improve the algorithm and establish a partition-weighted K-nearest neighbor algorithm. The weighted K-nearest neighbor algorithm and the partition algorithm are used to predict the classification of CET-4 test scores, a statistical method is used to study the relevant factors that affect the CET-4 test score, and the classification is checked by comparison to predict whether a student will pass. The input features and the neighboring samples are weighted. Although the partition algorithm does not significantly improve the classification accuracy, classification is more stable than with the K-nearest neighbor algorithm, classification time is greatly reduced, and classification efficiency increases by 119%. In order to detect graduating students at potential risk earlier, this paper proposes an appropriate and timely early warning and preschool K-nearest neighbor algorithm classification model. Taking test scores or make-up exams and re-learning as input features, the classification model can effectively predict ordinary students who have not graduated.
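The abstract does not specify how the partition-weighted algorithm assigns its weights, so the sketch below swaps in a generic, widely used variant: a distance-weighted KNN in which each of the k nearest neighbors votes with weight 1/(distance + eps). The feature values, labels, and the function name `weighted_knn_predict` are hypothetical and are not the paper's method.

```python
import math
from collections import defaultdict

def weighted_knn_predict(train_X, train_y, query, k=3, eps=1e-9):
    """Classify query by inverse-distance-weighted votes of its k nearest neighbors."""
    neighbors = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    scores = defaultdict(float)
    for distance, label in neighbors:
        # Closer neighbors contribute more; eps avoids division by zero.
        scores[label] += 1.0 / (distance + eps)
    return max(scores, key=scores.get)

# Hypothetical normalized features, e.g. (mock test score, attendance rate).
samples = [(0.55, 0.60), (0.30, 0.35), (0.25, 0.30), (0.90, 0.85)]
results = ["pass", "fail", "fail", "pass"]

prediction = weighted_knn_predict(samples, results, (0.52, 0.58), k=3)
print(prediction)
```

In this example the three nearest neighbors are one very close "pass" sample and two more distant "fail" samples, so an unweighted majority vote would return "fail" while the weighted vote returns "pass".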

