Telecom Churn Prediction Using Seven Machine Learning Experiments integrating Features engineering and Normalization

Author(s):  
Hemlata Jain ◽  
Ajay Khunteta ◽  
Sumit Private Shrivastav

Abstract Machine learning and deep learning classification have become important topics in telecom churn prediction. Researchers have devised highly efficient churn prediction experiments and have given the telecommunications industry a new direction for retaining its customers. Companies are eagerly developing churn prediction models and working to retain potential churners. A better churn prediction model therefore depends on identifying the factors that drive churn. This study aims to find these factors by evaluating users' past service usage details. For this purpose, it uses feature importance, feature normalisation, feature correlation and feature extraction. After feature selection and extraction, the study performs seven different experiments on the dataset and compares the techniques to identify the best results. The first experiment is a hybrid model of Decision Tree and Logistic Regression; the second combines PCA with Logistic Regression and Logit Boost; the third uses a deep learning technique, CNN-VAE (Convolutional Neural Network with Variational Autoencoder); and the fourth, fifth, sixth and seventh experiments use Logistic Regression, Logit Boost, XGBoost and Random Forest, respectively. The first three experiments are hybrid models and the remaining four use standalone techniques. The study uses the Orange dataset, which contains 3,333 subscriber entries and 21 features. The experiments are also compared with existing models developed in the literature. Performance was evaluated using accuracy, precision, recall, F-measure, the confusion matrix, macro average and weighted average. The experiments achieved better results than the earlier models: Random Forest performed best, with 95% accuracy, and all other experiments also produced very good results.
The study highlights the importance of data mining techniques for churn prediction models and proposes a comparison framework in which standalone machine learning techniques, a deep learning technique and hybrid models with feature extraction are all applied to the same dataset, so that their performance can be evaluated more reliably.
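The abstract reports both macro and weighted averages, which can diverge sharply on an imbalanced churn dataset. The Python sketch below is a hypothetical illustration, not the authors' code: it derives per-class F1 from a confusion matrix and then averages it both ways. The function names and the confusion-matrix counts are invented for illustration and are not results from the study.

```python
def per_class_f1(tp, fp, fn):
    """F1 for one class from its true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_and_weighted_f1(confusion):
    """confusion[i][j] = number of samples with true class i predicted as class j."""
    n_classes = len(confusion)
    f1_scores, supports = [], []
    for c in range(n_classes):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n_classes)) - tp  # column total minus hits
        fn = sum(confusion[c]) - tp                               # row total minus hits
        f1_scores.append(per_class_f1(tp, fp, fn))
        supports.append(sum(confusion[c]))  # number of true samples of class c
    macro = sum(f1_scores) / n_classes  # every class counts equally
    weighted = sum(f * s for f, s in zip(f1_scores, supports)) / sum(supports)  # by support
    return macro, weighted


# Hypothetical confusion matrix: rows = true class (0 = stays, 1 = churns).
confusion = [[80, 5],
             [5, 10]]
macro, weighted = macro_and_weighted_f1(confusion)
print(round(macro, 3), round(weighted, 3))
```

With these made-up counts the script prints `0.804 0.9`: the weighted average is pulled toward the majority (non-churn) class, while the macro average gives the minority churn class equal weight.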

Scientific knowledge and electronic devices are advancing rapidly. In this context, many expert systems based on machine learning algorithms are used in the healthcare industry. Deep neural networks often outperform conventional machine learning techniques and can take raw, i.e., unrefined, data to compute the target output. Deep learning, or feature learning, focuses on learning the most important features, which gives a fuller understanding of the generated model. Existing methodologies used data mining techniques such as rule-based classification and machine learning algorithms such as hybrid logistic regression to preprocess data and extract meaningful insights from it. These approaches, however, rely on supervised data. The proposed work is based on unsupervised data, that is, data without labels, and deep neural techniques are deployed to obtain the target output. The machine learning algorithms are compared in terms of accuracy with the proposed deep learning techniques, built with TensorFlow and Keras. The deep learning methodology outperforms the existing rule-based classification and hybrid logistic regression algorithms in terms of accuracy. The designed methodology is tested on the public MIT-BIH arrhythmia database, classifying four kinds of abnormal beats. The proposed deep learning approach offered better performance, improving on the results of state-of-the-art machine learning approaches.


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
R. Shashikant ◽  
P. Chetankumar

Cardiac arrest is a severe heart anomaly that causes a very large number of deaths every year. Smoking is a specific risk factor for cardiovascular pathology, including coronary heart disease, but data on smoking and cardiac death have not previously been reviewed. In this paper, Heart Rate Variability (HRV) parameters are used to predict cardiac arrest in smokers with machine learning techniques. Machine learning is a computational method that learns automatically from experience and improves its performance, here with the aim of improving prognosis. This study compares the performance of logistic regression, decision tree, and random forest models in predicting cardiac arrest in smokers. The models were applied to a dataset obtained from the data science research group MITU Skillogies, Pune, India. To determine whether a patient is at risk of cardiac arrest, three predictive models were developed with 19 HRV indices as input features and two output classes. The models were evaluated on accuracy, precision, sensitivity, specificity, F1 score, and area under the curve (AUC). The logistic regression model achieved an accuracy of 88.50%, a precision of 83.11%, a sensitivity of 91.79%, a specificity of 86.03%, an F1 score of 0.87, and an AUC of 0.88. The decision tree model achieved an accuracy of 92.59%, a precision of 97.29%, a sensitivity of 90.11%, a specificity of 97.38%, an F1 score of 0.93, and an AUC of 0.94. The random forest model achieved an accuracy of 93.61%, a precision of 94.59%, a sensitivity of 92.11%, a specificity of 95.03%, an F1 score of 0.93, and an AUC of 0.95. The random forest model achieved the best classification accuracy, followed by the decision tree, while logistic regression showed the lowest.
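The reported F1 scores can be cross-checked against the reported precision and sensitivity, since F1 is their harmonic mean, F1 = 2PR / (P + R). The short Python sketch below assumes the abstract's F1 refers to the positive (cardiac-arrest) class; the function name is hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    return 2 * precision * recall / (precision + recall)


# Precision and sensitivity (recall) as reported in the abstract.
reported = {
    "logistic regression": (0.8311, 0.9179),
    "decision tree": (0.9729, 0.9011),
    "random forest": (0.9459, 0.9211),
}
for model, (p, r) in reported.items():
    print(f"{model}: F1 = {f1_score(p, r):.2f}")
```

Under that assumption, the logistic regression and random forest values reproduce the reported 0.87 and 0.93. For the decision tree the formula gives about 0.94 rather than the reported 0.93; the abstract does not say whether the paper rounded differently or averaged F1 over classes, which could account for the gap.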


2019 ◽  
Vol 8 (1) ◽  
pp. 269-275 ◽  
Author(s):  
N. E. Md Isa ◽  
A. Amir ◽  
M. Z. Ilyas ◽  
M. S. Razalli

This paper focuses on the classification of motor imagery in Brain Computer Interface (BCI) systems using machine learning classifiers. The BCI system consists of two main steps: feature extraction and classification. Fast Fourier Transform (FFT) features are extracted from the electroencephalography (EEG) signals to transform them into the frequency domain. Because the feature extraction stage produces high-dimensional data, Linear Discriminant Analysis (LDA) is used to reduce the number of dimensions by finding the feature subspace that maximizes class separability. Five classifiers are used in the study: Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Naïve Bayes, Decision Tree and Logistic Regression. Performance was tested on Dataset 1 from BCI Competition IV, which consists of EEG data recorded during imagined hand and foot movements. SVM, Logistic Regression and Naïve Bayes achieved the highest performance, 89.09%, as measured by AUC.
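To make the FFT feature step concrete, the Python sketch below computes a magnitude spectrum and sums the power within frequency bands. It is an assumption-laden illustration, not the paper's pipeline: it uses a naive O(N^2) discrete Fourier transform from the standard library instead of a real FFT routine, a synthetic 10 Hz sine in place of BCI Competition IV EEG, and hypothetical band edges (4-8, 8-13 and 13-30 Hz). The LDA and classification stages are omitted.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Naive discrete Fourier transform; returns |X[k]| for k = 0..N/2.

    An FFT computes the same values in O(N log N); this O(N^2) loop is for clarity.
    """
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(total))
    return spectrum


def band_power(spectrum, fs, n, low_hz, high_hz):
    """Sum of squared magnitudes for bins whose frequency lies in [low_hz, high_hz)."""
    resolution = fs / n  # Hz per frequency bin
    return sum(m ** 2 for k, m in enumerate(spectrum) if low_hz <= k * resolution < high_hz)


# Hypothetical 1-second "EEG" segment: a 10 Hz oscillation sampled at 128 Hz.
fs = 128
n = 128
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(n)]
spectrum = magnitude_spectrum(signal)
features = [band_power(spectrum, fs, n, lo, hi) for lo, hi in [(4, 8), (8, 13), (13, 30)]]
print(spectrum.index(max(spectrum)))  # 10
```

The spectral peak lands in bin 10 (1 Hz per bin at these settings), so almost all of the power falls in the 8-13 Hz band; band powers like these would then be the inputs to a dimensionality-reduction step such as LDA.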


2021 ◽  
Vol 44 (4) ◽  
pp. 1-12
Author(s):  
Ratchainant Thammasudjarit ◽  
Punnathorn Ingsathit ◽  
Sigit Ari Saputro ◽  
Atiporn Ingsathit ◽  
Ammarin Thakkinstian

Background: Chronic kidney disease (CKD) consumes large amounts of treatment resources. Early detection with a risk prediction model should help identify at-risk patients and provide early treatment. Objective: To compare the performance of traditional logistic regression with that of machine learning (ML) in predicting the risk of CKD in the Thai population. Methods: This study used Thai Screening and Early Evaluation of Kidney Disease (SEEK) data. Seventeen features were initially considered in constructing prediction models using logistic regression and four ML methods (Random Forest, Naïve Bayes, Decision Tree, and Neural Network). The data were split into training and test sets in a 70:30 ratio. Model performance was assessed by estimating recall, C statistics, accuracy, F1, and precision. Results: Seven of the 17 features were included in the prediction models. The logistic regression model discriminated well between CKD and non-CKD patients, with C statistics of 0.79 and 0.78 in the training and test data, respectively. Among the ML models, the Neural Network performed best, followed by Random Forest, Naïve Bayes, and Decision Tree, with corresponding C statistics of 0.82, 0.80, 0.78, and 0.77 in the training data set. In the test data, the performance of these models decreased by about 5%, 3%, 1%, and 2%, respectively, compared with a decrease of about 2% for the logistic model. Conclusions: A CKD risk prediction model constructed with the logit equation may yield better discrimination and a lower tendency to overfit than ML models, including the Neural Network and Random Forest.  
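The C statistic used in this study is the probability that a randomly chosen CKD patient receives a higher predicted risk than a randomly chosen non-CKD patient; for a binary outcome it equals the area under the ROC curve. The Python sketch below computes it by brute-force pairwise comparison on hypothetical risk scores (not SEEK data). Comparing its value on the 70% training split with the 30% test split is one way to observe the overfitting the abstract describes.

```python
def c_statistic(labels, scores):
    """Probability that a random positive case scores higher than a random negative one.

    Ties in score count as half a concordant pair. For a binary outcome this equals
    the area under the ROC curve.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Hypothetical predicted CKD risks (1 = CKD, 0 = non-CKD).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.4]
print(c_statistic(labels, scores))
```

Here 8.5 of the 9 positive/negative pairs are concordant (the one tie counts as half), so the function returns about 0.944.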


2020 ◽  
Vol 38 (6_suppl) ◽  
pp. 343-343 ◽  
Author(s):  
Paul Sargos ◽  
Nicolas Leduc ◽  
Nicolas Giraud ◽  
Giorgio Gandaglia ◽  
Mathieu Roumiguie ◽  
...  

Background: Recent advances in machine learning algorithms and deep learning solutions have paved the way for improved accuracy in survival analysis. We aimed to investigate the accuracy of conventional machine learning and deep learning methods in predicting 3-year biochemical recurrence (BCR), compared with CAPRA score prediction. Methods: A total of 5043 men who underwent radical prostatectomy (RP) between 2000 and 2015 for clinically localized prostate cancer (PCa) were analyzed retrospectively. Three-year BCR was predicted using the following models: CAPRA score, Cox regression analysis, logistic regression, k-nearest neighbor, random forest and a densely connected feed-forward neural network classifier. The discrimination of the models was quantified using the C-index or the area under the receiver operating characteristic curve. Results: Patients with CAPRA scores of 2 and 3 accounted for 64% of the population. The C-index for predicting three-year BCR with the CAPRA score was 0.63. C-index values for the k-nearest neighbor classifier, logistic regression, Cox regression analysis, random forest classifier and densely connected neural network were 0.55, 0.63, 0.64, 0.64 and 0.70, respectively (pairwise, adjusted p-value < 0.01). After inclusion of available post-surgical variables, the C-index values reached 0.58, 0.77, 0.74, 0.75 and 0.84, respectively (pairwise, adjusted p-value < 0.05). Conclusions: Our results show that the CAPRA score performed poorly in intermediate-risk patients undergoing RP. Densely connected neural networks with a simple architecture further increased predictive power at low computational cost. For predicting 3-year BCR, adding post-surgical features to the model greatly enhanced its performance.
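For time-to-event outcomes such as BCR, where some patients are censored before any recurrence is observed, the C-index is commonly computed as Harrell's concordance index. The Python sketch below is a hypothetical illustration: the follow-up times, event flags and risk scores are invented, and the abstract does not state which C-index variant the authors used.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had the event (events[i] == 1)
    strictly before subject j's observed time. The pair is concordant when the
    model assigns i the higher risk; ties in risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up in months; event = 1 means BCR observed, 0 means censored.
times = [12, 24, 30, 36, 40]
events = [1, 0, 1, 0, 1]
risk = [0.9, 0.5, 0.6, 0.2, 0.7]
print(harrell_c_index(times, events, risk))
```

In this example, 5 of the 6 comparable pairs are concordant, giving a C-index of about 0.833; pairs whose earlier time is censored are skipped because the recurrence time of that patient is unknown.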


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Abolfazl Mehbodniya ◽  
Izhar Alam ◽  
Sagar Pande ◽  
Rahul Neware ◽  
Kantilal Pitambar Rane ◽  
...  

The healthcare sector is one of the prominent sectors in which large amounts of data can be collected, not only about health but also about finances. With the continuous growth of electronic payments, major frauds occur in the healthcare sector through the use of credit cards, and credit card fraud monitoring has been a financial challenge for service providers. Fraud detection systems therefore need continuous improvement. New fraud scenarios emerge continuously and cause massive financial losses. Techniques such as phishing and Trojan viruses are commonly used to collect sensitive information about credit cards and their owners. Efficient technology is therefore needed to identify the different types of fraudulent credit card activity. In this paper, various machine learning and deep learning approaches are used to detect credit card fraud: Naive Bayes, Logistic Regression, K-Nearest Neighbor (KNN), Random Forest, and a Sequential Convolutional Neural Network are trained on the normal and abnormal features of transactions. Publicly available data are used to evaluate the accuracy of the models. The algorithms achieved accuracies of 96.1%, 94.8%, 95.89%, 97.58%, and 92.3% for Naive Bayes, Logistic Regression, K-Nearest Neighbor (KNN), Random Forest, and the Sequential Convolutional Neural Network, respectively. The comparative analysis showed that the KNN algorithm produces better results than the other approaches.
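KNN, which the authors report as the best-performing approach, labels a transaction by majority vote among its k nearest training transactions. The Python sketch below is a minimal illustration under stated assumptions: two hypothetical, already-scaled transaction features, invented labels, Euclidean distance and k = 3. It is not the paper's implementation, and it needs Python 3.8+ for `math.dist`.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points (Euclidean)."""
    distances = sorted(
        (math.dist(point, query), label) for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical scaled features per transaction: (amount, time of day).
train = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.85, 0.95), (0.95, 0.9)]
labels = ["legit", "legit", "legit", "fraud", "fraud", "fraud"]
print(knn_predict(train, labels, (0.88, 0.85)))
print(knn_predict(train, labels, (0.12, 0.18)))
```

Because KNN relies on raw distances, features should be scaled to comparable ranges first; otherwise a large-valued feature such as the transaction amount would dominate the vote.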


Author(s):  
Farrikh Alzami ◽  
Erika Devi Udayanti ◽  
Dwi Puji Prabowo ◽  
Rama Aria Megantara

Sentiment analysis in the form of polarity classification is very important in everyday life: knowing the polarity of a document lets people find out whether it expresses positive or negative sentiment, which can help them choose and make decisions. Sentiment analysis is usually done manually, so an automatic sentiment classification process is needed. However, few studies discuss which feature extraction methods and learning models are suitable for unstructured sentiment analysis, such as the Amazon food review case. This research explores feature extraction methods such as Bag of Words, TF-IDF, Word2Vec, and a combination of TF-IDF and Word2Vec with several machine learning models, namely Random Forest, SVM, KNN and Naïve Bayes, to find combinations of feature extraction and learning model that add variety to polarity sentiment analysis. With document preprocessing that removes HTML tags, punctuation and special characters, and with Snowball stemming, TF-IDF features combined with SVM proved suitable for polarity classification in unstructured sentiment analysis of Amazon food reviews, with a performance of 87.3%.
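TF-IDF, which gave the best result with SVM here, weights a term by its frequency within a review and down-weights terms that appear in many reviews. The Python sketch below uses one common textbook variant (tf = count / document length, idf = ln(N / df)) on three invented, pre-tokenised reviews; the abstract does not specify which variant the authors used, and the preprocessing and SVM stages are omitted. scikit-learn's `TfidfVectorizer` uses a smoothed idf and L2 normalisation by default, so its weights would not match these.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenised document.

    tf  = term count / number of tokens in the document
    idf = ln(N / df), where df is the number of documents containing the term
    """
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical, already cleaned and tokenised reviews.
reviews = [
    "great coffee great price".split(),
    "stale coffee bad price".split(),
    "great snack".split(),
]
for vector in tf_idf(reviews):
    print({term: round(weight, 3) for term, weight in vector.items()})
```

A term that occurs in every review gets idf = ln(1) = 0, so it contributes nothing to the vector, while rarer words such as "stale" or "snack" receive the highest weights.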

