scholarly journals The Implementation of Classification and Clustering Techniques on Churn Analysis

Author(s):  
Ahmet Elbir ◽  
Hamza Osman Ilhan ◽  
Mehmet Furkan Aydin ◽  
Yunus Emre Demirbulut

One of the most important problems of telecommunication companies is the potential transfer of customers between the firms. In order to avoid this problem, it is very important to identify customers who are likely to leave. In this study, the performance of the classification and the clustering algorithms in machine learning techniques has been evaluated and compared on the analysis of potential customer trends, which have been reported as churn analysis. K nearest neighbors, decision trees, random forests, support vector machines and naive bayes methods were tested in scope of classification idea. Additionally, K-Means and hierarchical clustering methods were tested. The performances of the methods have been evaluated according to the accuracy, precision, sensitivity and F-measure performance metrics.

Advancement in medical science has always been one of the most vital aspects of the human race. With the progress in technology, the use of modern techniques and equipment is always imposed on treatment purposes. Nowadays, machine learning techniques have widely been used in medical science for assuring accuracy. In this work, we have constructed computational model building techniques for liver disease prediction accurately. We used some efficient classification algorithms: Random Forest, Perceptron, Decision Tree, K-Nearest Neighbors (KNN), and Support Vector Machine (SVM) for predicting liver diseases. Our works provide the implementation of hybrid model construction and comparative analysis for improving prediction performance. At first, classification algorithms are applied to the original liver patient datasets collected from the UCI repository. Then we analyzed features and tweaked to improve the performance of our predictor and made a comparative analysis among the classifiers. We examined that, KNN algorithm outperformed all other techniques with feature selection.


2020 ◽  
Vol 10 (19) ◽  
pp. 6750
Author(s):  
Ditsuhi Iskandaryan ◽  
Francisco Ramos ◽  
Denny Asarias Palinggi ◽  
Sergio Trilles

The growing popularity of soccer has led to the prediction of match results becoming of interest to the research community. The aim of this research is to detect the effects of weather on the result of matches by implementing Random Forest, Support Vector Machine, K-Nearest Neighbors Algorithm, and Extremely Randomized Trees Classifier. The analysis was executed using the Spanish La Liga and Segunda division from the seasons 2013–2014 to 2017–2018 in combination with weather data. Two tasks were proposed as part of this study: the first was to find out whether the game will end in a draw, a win by the hosts or a victory by the guests, and the second was to determine whether the match will end in a draw or if one of the teams will win. The results show that, for the first task, Extremely Randomized Trees Classifier is a better method, with an accuracy of 65.9%, and, for the second task, Support Vector Machine yielded better results with an accuracy of 79.3%. Moreover, it is possible to predict whether the game will end in a draw or not with 0.85 AUC-ROC. Additionally, for comparative purposes, the analysis was also performed without weather data.


2021 ◽  
pp. 1-29
Author(s):  
Ahmed Alsaihati ◽  
Mahmoud Abughaban ◽  
Salaheldin Elkatatny ◽  
Abdulazeez Abdulraheem

Abstract Fluid loss into formations is a common operational issue that is frequently encountered when drilling across naturally or induced fractured formations. This could pose significant operational risks, such as well-control, stuck pipe, and wellbore instability, which, in turn, lead to an increase of well time and cost. This research aims to use and evaluate different machine learning techniques, namely: support vector machines, random forests, and K-nearest neighbors in detecting loss circulation occurrences while drilling using solely drilling surface parameters. Actual field data of seven wells, which had suffered partial or severe loss circulation, were used to build predictive models, while Well-8 was used to compare the performance of the developed models. Different performance metrics were used to evaluate the performance of the developed models. Recall, precision, and F1-score measures were used to evaluate the ability of the developed model to detect loss circulation occurrences. The results showed the K-nearest neighbors classifier achieved a high F1-score of 0.912 in detecting loss circulation occurrence in the testing set, while the random forests was the second-best classifier with almost the same F1-score of 0.910. The support vector machines achieved an F1-score of 0.83 in predicting the loss circulation occurrence in the testing set. The K-nearest neighbors outperformed other models in detecting the loss circulation occurrences in Well-8 with an F1-score of 0.80. The main contribution of this research as compared to previous studies is that it identifies losses events based on real-time measurements of the active pit volume.


Author(s):  
Lorenzo Cesaretti ◽  
Laura Screpanti ◽  
David Scaradozzi ◽  
Eleni Mangina

AbstractThis paper presents the preliminary results of using machine learning techniques to analyze educational robotics activities. An experiment was conducted with 197 secondary school students in Italy: the authors updated Lego Mindstorms EV3 programming blocks to record log files with coding sequences students had designed in teams. The activities were part of a preliminary robotics exercise. We used four machine learning techniques—logistic regression, support-vector machine (SVM), K-nearest neighbors and random forests—to predict the students’ performance, comparing a supervised approach (using twelve indicators extracted from the log files as input for the algorithms) and a mixed approach (applying a k-means algorithm to calculate the machine learning features). The results showed that the mixed approach with SVM outperformed the other techniques, and that three predominant learning styles emerged from the data mining analysis.


Sentiment analysis or opinion mining has gained much attention in recent years.With the constantly evolving social networks and internet marketing sites, reviews and blogs have been obtained among them, they act as an significant source for future analysis and better decision making. These reviews are naturally unstructured and thus require pre processing and further classification to gain the significant information for future use. These reviews and blogs can be of different types such as positive, negative and neutral . Supervised machine learning techniquess help to classify these reviews. In this paper five machine learning algorithms (K-Nearest Neighbors (KNN), Decision Tree, Artificial neural networks (ANNs), Naïve bayes and Support Vector Machine (SVM))are used for classification of sentiments. These algorithms are analyzed usingTwitter dataset. Performance analysis of these algorithms are done by using various performance measures such as Accuracy, precision, recall and F-measure. The evaluation of these techniques on Twitter datasetshowed predictive ability of Machine Learning in opinion mining


2022 ◽  
Vol 2161 (1) ◽  
pp. 012013
Author(s):  
Chiradeep Gupta ◽  
Athina Saha ◽  
N V Subba Reddy ◽  
U Dinesh Acharya

Abstract Diagnosis of cardiac disease requires being more accurate, precise, and reliable. The number of death cases due to cardiac attacks is increasing exponentially day by day. Thus, practical approaches for earlier diagnosis of cardiac or heart disease are done to achieve prompt management of the disease. Various supervised machine learning techniques like K-Nearest Neighbour, Decision Tree, Logistic Regression, Naïve Bayes, and Support Vector Machine (SVM) model are used for predicting cardiac disease using a dataset that was collected from the repository of the University of California, Irvine (UCI). The results depict that Logistic Regression was better than all other supervised classifiers in terms of the performance metrics. The model is also less risky since the number of false negatives is low as compared to other models as per the confusion matrix of all the models. In addition, ensemble techniques can be approached for the accuracy improvement of the classifier. Jupyter notebook is the best tool, for the implementation of Python Programming having many types of libraries, header files, for accurate and precise work.


Author(s):  
Tenali Pranuthi

There are various algorithms and methodologies used for automated screening of cervical cancer by segmenting and classifying cervical cancer cells into different categories. This study presents a critical review of different research papers published that integrated ML methods in screening cervical cancer via different approaches analyzed in terms of typical metrics like dataset size, drawbacks, accuracy etc. An attempt has been made to furnish the reader with an insight of Machine Learning algorithms like SVM (Support Vector Machines), k-NN (k-Nearest Neighbors), RFT (Random Forest Trees), for feature extraction and classification. This paper also covers the publicly available datasets related to cervical cancer. It presents a holistic review on the computational methods that have evolved over the period of time, in detection of malignant cells. In this paper, we are going to train our model using various machine learning techniques and all the models thus made are compared in terms of accuracy, precision and recall.


2020 ◽  
Vol 6 (1) ◽  
Author(s):  
Maisa Cardoso Aniceto ◽  
Flavio Barboza ◽  
Herbert Kimura

AbstractCredit risk evaluation has a relevant role to financial institutions, since lending may result in real and immediate losses. In particular, default prediction is one of the most challenging activities for managing credit risk. This study analyzes the adequacy of borrower’s classification models using a Brazilian bank’s loan database, and exploring machine learning techniques. We develop Support Vector Machine, Decision Trees, Bagging, AdaBoost and Random Forest models, and compare their predictive accuracy with a benchmark based on a Logistic Regression model. Comparisons are analyzed based on usual classification performance metrics. Our results show that Random Forest and Adaboost perform better when compared to other models. Moreover, Support Vector Machine models show poor performance using both linear and nonlinear kernels. Our findings suggest that there are value creating opportunities for banks to improve default prediction models by exploring machine learning techniques.


2020 ◽  
Vol 12 (2) ◽  
pp. 84-99
Author(s):  
Li-Pang Chen

In this paper, we investigate analysis and prediction of the time-dependent data. We focus our attention on four different stocks are selected from Yahoo Finance historical database. To build up models and predict the future stock price, we consider three different machine learning techniques including Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN) and Support Vector Regression (SVR). By treating close price, open price, daily low, daily high, adjusted close price, and volume of trades as predictors in machine learning methods, it can be shown that the prediction accuracy is improved.


Author(s):  
Anantvir Singh Romana

Accurate diagnostic detection of the disease in a patient is critical and may alter the subsequent treatment and increase the chances of survival rate. Machine learning techniques have been instrumental in disease detection and are currently being used in various classification problems due to their accurate prediction performance. Various techniques may provide different desired accuracies and it is therefore imperative to use the most suitable method which provides the best desired results. This research seeks to provide comparative analysis of Support Vector Machine, Naïve bayes, J48 Decision Tree and neural network classifiers breast cancer and diabetes datsets.


Sign in / Sign up

Export Citation Format

Share Document