CUSTOMER CHURN PREDICTION IN A NATURAL GAS DISTRIBUTION COMPANY USING DATA MINING

2020 ◽  
Vol 24 (106) ◽  
pp. 79-87
Author(s):  
Fredy Humberto Troncoso Espinosa ◽  
Javiera Valentina Ruiz Tapia

Customer churn is a significant problem for service companies and can cause substantial economic losses. Identifying the factors that lead a customer to stop using a service is a complex task; however, a customer's behavior makes it possible to estimate a churn probability for each one. This research applies data mining to predict customer churn in a natural gas distribution company using two machine learning techniques: neural networks and support vector machines. The results show that these techniques make it possible to identify the customers most likely to churn, so that timely and targeted retention actions can be taken on them while minimizing the costs associated with misidentifying these customers.

Keywords: customer churn, data mining, machine learning, natural gas distribution.
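The idea of turning per-customer churn probabilities into retention decisions while keeping misidentification costs low can be sketched as a cost-based threshold search. This is only an illustration, not the authors' procedure: the probabilities, labels, and the two cost values below are hypothetical, and the function names are invented for this example.

```python
# Hypothetical example: pick the probability cut-off that minimizes the
# total cost of false alarms and missed churners on labelled data.

def total_cost(probs, labels, threshold, cost_fp, cost_fn):
    """Cost of flagging every customer whose churn probability >= threshold."""
    cost = 0.0
    for p, churned in zip(probs, labels):
        flagged = p >= threshold
        if flagged and not churned:
            cost += cost_fp  # retention offer wasted on a loyal customer
        elif not flagged and churned:
            cost += cost_fn  # churner who received no retention action
    return cost

def best_threshold(probs, labels, cost_fp, cost_fn):
    """Try each observed probability (plus 'flag nobody') as the cut-off."""
    candidates = sorted(set(probs)) + [1.1]  # 1.1 flags no customer at all
    best = min(candidates,
               key=lambda t: (total_cost(probs, labels, t, cost_fp, cost_fn), t))
    return best, total_cost(probs, labels, best, cost_fp, cost_fn)

# Assumed model outputs and true outcomes (1 = churned, 0 = stayed).
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

# Assumed costs: missing a churner is five times worse than a false alarm.
threshold, cost = best_threshold(probs, labels, cost_fp=1.0, cost_fn=5.0)
print(threshold, cost)
```

In practice the probabilities would come from the trained classifier (a neural network or support vector machine in the abstract), and the costs would be estimated from the company's own retention and revenue figures.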

Competition between online retailers has rapidly increased the need to provide better customer service. Losing customers reduces sales, and further investment is then needed to attract new ones. Companies now work continuously to improve their perceived quality by giving customers timely, high-quality service. Customer churn has become one of the primary challenges that many firms face. Several churn prediction models and techniques have been proposed in the literature for areas such as finance, telecom, and banking. Researchers are also working on customer churn prediction in e-commerce using data mining and machine learning techniques. This paper presents a comprehensive review of models for predicting customer churn in e-commerce using data mining and machine learning techniques, together with a critical review of recent research papers in this field. Important inferences and research gaps drawn from the literature are then presented. Finally, the research significance and concluding remarks are described.


2017 ◽  
Vol 117 (1) ◽  
pp. 90-109 ◽  
Author(s):  
Eui-Bang Lee ◽  
Jinwha Kim ◽  
Sang-Gun Lee

Purpose: The purpose of this paper is to identify the influence of the frequency of word exposure in online news, based on the availability heuristic concept. This differs from most churn prediction studies, which focus on subscriber data. Design/methodology/approach: This study examined churn prediction using words presented in previous studies and additionally identified words that generate churn, using data mining technology in combination with logistic regression, decision tree graphing, neural network models, and a partial least squares (PLS) model. Findings: This study found prediction rates similar to those delivered by subscriber data-based analyses. In addition, because previous studies do not clearly indicate the effects of the factors, this study uses decision tree graphing and PLS modeling to identify which words have positive or negative influences. Originality/value: These findings imply an expansion of churn prediction, advertising effect, and various psychological studies. The study also proposes concrete ideas to advance the competitive advantage of companies, which not only helps corporate development but also improves industry-wide efficiency.


Author(s):  
KM Jyoti Rani

Diabetes is a chronic disease with the potential to cause a worldwide health care crisis. According to the International Diabetes Federation, 382 million people are living with diabetes worldwide. By 2035, this number is expected to rise to 592 million. Diabetes is a disease caused by an increased level of blood glucose. High blood glucose produces the symptoms of frequent urination, increased thirst, and increased hunger. Diabetes is one of the leading causes of blindness, kidney failure, amputations, heart failure, and stroke. When we eat, our body turns food into sugars, or glucose. At that point, the pancreas is supposed to release insulin. Insulin serves as a key that opens our cells, allowing glucose to enter so that it can be used for energy. With diabetes, this system does not work. Type 1 and type 2 diabetes are the most common forms of the disease, but there are also other kinds, such as gestational diabetes, which occurs during pregnancy. Machine learning is an emerging scientific field in data science dealing with the ways in which machines learn from experience. The aim of this project is to develop a system that can perform early prediction of diabetes for a patient with higher accuracy by combining the results of different machine learning techniques. The algorithms used are K-nearest neighbour, logistic regression, random forest, support vector machine, and decision tree. The accuracy of the model is calculated for each algorithm, and the one with the best accuracy is taken as the model for predicting diabetes.
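The selection step described above (score each candidate algorithm, then keep the most accurate one) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the records are invented toy values, and two simple stand-ins (k-nearest neighbour with different k, and a nearest-centroid rule) replace the five algorithms named in the abstract so that the example needs only the standard library.

```python
import math
from collections import Counter

# Hypothetical toy records: (glucose mg/dL, BMI) -> 1 = diabetic, 0 = not.
TRAIN = [((85, 22), 0), ((90, 24), 0), ((100, 27), 0), ((95, 23), 0),
         ((150, 33), 1), ((165, 35), 1), ((140, 30), 1), ((180, 38), 1)]
TEST = [((88, 21), 0), ((105, 26), 0), ((155, 34), 1), ((135, 31), 1)]

def knn(train, x, k):
    """Majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def nearest_centroid(train, x):
    """Label of the class whose mean feature vector is closest to x."""
    groups = {}
    for feats, label in train:
        groups.setdefault(label, []).append(feats)
    centroids = {label: tuple(sum(col) / len(rows) for col in zip(*rows))
                 for label, rows in groups.items()}
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

def accuracy(predict, train, test):
    """Fraction of held-out records that the classifier labels correctly."""
    return sum(predict(train, x) == y for x, y in test) / len(test)

# Candidate models (stand-ins for the algorithms listed in the abstract).
CANDIDATES = {
    "1-NN": lambda train, x: knn(train, x, 1),
    "3-NN": lambda train, x: knn(train, x, 3),
    "nearest centroid": nearest_centroid,
}

def select_best(candidates, train, test):
    """Score every candidate on held-out data and return the most accurate."""
    scores = {name: accuracy(fn, train, test) for name, fn in candidates.items()}
    return max(scores, key=scores.get), scores

best, scores = select_best(CANDIDATES, TRAIN, TEST)
print(best, scores)
```

With a real dataset, the accuracy used for selection is usually estimated on a validation split or by cross-validation, and the chosen model is then checked once more on data it has not been selected on.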


Author(s):  
Mohammad M. Masud ◽  
Latifur Khan ◽  
Bhavani Thuraisingham

This chapter applies data mining techniques to detect email worms. Email messages contain a number of different features, such as the total number of words in the message body/subject, the presence/absence of binary attachments, the type of attachments, and so on. The goal is to obtain an efficient classification model based on these features. The solution consists of several steps. First, the number of features is reduced using two different approaches: feature selection and dimension reduction. This step is necessary to reduce noise and redundancy in the data. The feature-selection technique is called Two-phase Selection (TPS), which is a novel combination of a decision tree and a greedy selection algorithm. The dimension reduction is performed by Principal Component Analysis. Second, the reduced data is used to train a classifier. Different classification techniques have been used, such as Support Vector Machine (SVM), Naïve Bayes, and their combination. Finally, the trained classifiers are tested on a dataset containing both known and unknown types of worms. These results have been compared with published results. It is found that the proposed TPS selection along with SVM classification achieves the best accuracy in detecting both known and unknown types of worms.


Author(s):  
Bhavani Thuraisingham

Data mining is the process of posing queries to large quantities of data and extracting information often previously unknown using mathematical, statistical, and machine-learning techniques. Data mining has many applications in a number of areas, including marketing and sales, medicine, law, manufacturing, and, more recently, homeland security. Using data mining, one can uncover hidden dependencies between terrorist groups as well as possibly predict terrorist events based on past experience. One particular data-mining technique that is being investigated a great deal for homeland security is link analysis, where links are drawn between various nodes, possibly detecting some hidden links.
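One very simple form of the link analysis mentioned above is to ask whether two nodes that share no direct link are still connected through intermediaries. The sketch below does this with a breadth-first search over an undirected graph; the node names and edges are invented for illustration, and real link-analysis systems use considerably richer models than a shortest chain.

```python
from collections import deque

def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def shortest_link_chain(graph, start, goal):
    """Return the shortest chain of nodes from start to goal, or None."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}  # node -> node it was first reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            chain = []
            while node is not None:  # walk back to the start node
                chain.append(node)
                node = previous[node]
            return chain[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None

# Hypothetical observed links; A and D are never linked directly.
graph = build_graph([("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")])
print(shortest_link_chain(graph, "A", "D"))  # chain through B and C
print(shortest_link_chain(graph, "A", "E"))  # no connection found
```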


Author(s):  
Baban. U. Rindhe ◽  
Nikita Ahire ◽  
Rupali Patil ◽  
Shweta Gagare ◽  
Manisha Darade

Heart-related diseases, or cardiovascular diseases (CVDs), have been the main cause of a huge number of deaths worldwide over the last few decades and have emerged as the most life-threatening diseases, not only in India but in the whole world. There is therefore a need for a reliable, accurate, and feasible system to diagnose such diseases in time for proper treatment. Machine learning algorithms and techniques have been applied to various medical datasets to automate the analysis of large and complex data. In recent times, many researchers have been using machine learning techniques to help the health care industry and its professionals diagnose heart-related diseases. After the brain, the heart is the next most important organ in the human body. It pumps blood and supplies it to all the organs of the body. Predicting the occurrence of heart disease is significant work in the medical field. Data analytics is useful for making predictions from large amounts of information, and it helps medical centers predict various diseases. A huge amount of patient-related data is maintained on a monthly basis, and the stored data can serve as a source for predicting the occurrence of future diseases. Data mining and machine learning techniques used to predict heart disease include Artificial Neural Network (ANN), Random Forest, and Support Vector Machine (SVM). Predicting and diagnosing heart disease has become a challenge for doctors and hospitals both in India and abroad. To reduce the large number of deaths from heart disease, a quick and efficient detection technique needs to be found. Data mining techniques and machine learning algorithms play a very important role in this area. Researchers are accelerating their work to develop software, with the help of machine learning algorithms, that can help doctors with both the prediction and the diagnosis of heart disease. The main objective of this research project is to predict the heart disease of a patient using machine learning algorithms.

