Prediction and Analysis Model of Telecom Customer Churn Based on Missing Data

Author(s):  
Rui Zeng ◽  
Lingyun Yuan ◽  
Zhixia Ye ◽  
Jinyan Cai
2021 ◽  
Vol 50 (Supplement_1) ◽  
Author(s):  
Katherine Lee ◽  
Kate Tilling ◽  
Rosie Cornish ◽  
James Carpenter

Abstract. Focus of presentation: Missing data are ubiquitous in medical research. Although there is increasing guidance on how to handle missing data, practice is changing slowly and misapprehensions abound, particularly in observational research. We present a practical framework for handling and reporting the analysis of incomplete data in observational studies, which we illustrate using a case study from the Avon Longitudinal Study of Parents and Children. Findings: The framework consists of three steps: 1) Develop an analysis plan specifying the analysis model and how missing data are going to be addressed. Important considerations are whether a complete records analysis is likely to be valid, whether multiple imputation or an alternative approach is likely to offer benefits, and whether a sensitivity analysis regarding the missingness mechanism is required. 2) Explore the data, checking that the methods outlined in the analysis plan are appropriate, and conduct the pre-planned analysis. 3) Report the results, including a description of the missing data, details on how missing data were addressed, and the results from all analyses, interpreted in light of the missing data and clinical relevance. Conclusions/Implications: This framework encourages researchers to think carefully about their missing data and be transparent about the potential effect on the study results. This will increase confidence in the reliability and reproducibility of results from published papers. Key messages: Researchers need to develop a plan for missing data prior to conducting their analysis, and be transparent about how they handled the missing data and its potential effect when reporting their results.
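The exploration step (step 2) can be sketched in a few lines. The data frame and variable names below are purely illustrative, not taken from ALSPAC:

```python
import numpy as np
import pandas as pd

# Hypothetical cohort data; the variables are illustrative, not ALSPAC's.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.normal(30, 5, 200),
    "bmi": rng.normal(25, 4, 200),
    "outcome": rng.integers(0, 2, 200).astype(float),
})
df.loc[rng.random(200) < 0.2, "bmi"] = np.nan   # induce some missingness

# Step 2 of the framework: describe the missing data before analysing.
missing_summary = df.isna().mean().rename("proportion_missing")
complete_records = df.dropna()
print(missing_summary)
print(f"complete records: {len(complete_records)} of {len(df)}")
```

A summary like this feeds directly into the reporting step: the proportion missing per variable and the number of complete records belong in the write-up.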


2017 ◽  
Vol 28 (1) ◽  
pp. 289-308 ◽  
Author(s):  
Peng Yin ◽  
Jian Q Shi

Sensitivity analysis is popular for dealing with missing data problems, particularly non-ignorable missingness, where the full-likelihood method cannot be adopted. It analyses how sensitively the conclusions (output) depend on assumptions or parameters (input) about the missing data, i.e. the missing-data mechanism; we call models subject to this uncertainty sensitivity models. To make conventional sensitivity analysis more useful in practice, we need to define simple and interpretable statistical quantities to assess the sensitivity models and enable evidence-based analysis. In this paper we propose a novel approach that investigates the plausibility of each missing-data mechanism assumption by comparing simulated datasets from various MNAR models with the observed data non-parametrically, using K-nearest-neighbour distances. Asymptotic theory is also provided. A key step of this method is to plug in a plausibility evaluation system for each sensitivity parameter, selecting plausible values and rejecting unlikely ones, instead of considering all proposed values of the sensitivity parameters as in conventional sensitivity analysis. The method is generic and is applied successfully to several specific models in this paper, including a meta-analysis model with publication bias, analysis of incomplete longitudinal data, and mean estimation with non-ignorable missing data.
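A minimal sketch of the K-nearest-neighbour comparison idea, under an assumed logistic MNAR selection model indexed by a hypothetical sensitivity parameter `delta`; the model, names, and candidate values are illustrative, not the paper's:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)

# "Observed" data generated under a logistic MNAR selection model with
# true sensitivity parameter 1.0 (which we pretend not to know).
full = rng.normal(0.0, 1.0, 500)
p_miss = 1.0 / (1.0 + np.exp(-1.0 * full))      # larger values more likely missing
observed = full[rng.random(500) >= p_miss]

def simulate_observed(delta, n=500):
    """Simulate the observed part of a dataset under the MNAR model
    indexed by the hypothetical sensitivity parameter delta."""
    x = rng.normal(0.0, 1.0, n)
    p = 1.0 / (1.0 + np.exp(-delta * x))
    return x[rng.random(n) >= p]

def mean_nn_distance(a, b):
    """Mean distance from each point of a to its nearest neighbour in b."""
    nn = NearestNeighbors(n_neighbors=1).fit(b.reshape(-1, 1))
    d, _ = nn.kneighbors(a.reshape(-1, 1))
    return float(d.mean())

# Smaller distance means the simulated observed data look more like what we
# actually saw, so the corresponding delta is judged more plausible.
scores = {d: mean_nn_distance(observed, simulate_observed(d))
          for d in (0.0, 0.5, 1.0, 2.0)}
print(scores)
```

The plausibility evaluation system described in the abstract would then keep the candidate values of `delta` with small distances and reject the rest, rather than carrying every proposed value through the sensitivity analysis.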


2009 ◽  
Vol 69-70 ◽  
pp. 675-679 ◽
Author(s):  
D.S. Liu ◽  
Chun Hua Ju

To address the problem of customer churn in CRM in the manufacturing industry, this paper proposes a prediction model based on the Support Vector Machine (SVM). Because churn data are large-scale and imbalanced, principal component analysis (PCA) is adopted to reduce dimensionality and eliminate redundant information, making the sample space for the SVM more compact and reasonable. An improved SVM is then used to predict customer churn: PCA first processes the 17-dimensional feature vectors of the customer churn data, and an application in the manufacturing industry verifies that the combined PCA-SVM model outperforms both the SVM-only model and other traditional models.
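A sketch of the PCA-then-SVM idea using scikit-learn on synthetic imbalanced data; the dataset, component count, and class weighting are assumptions for illustration, not the paper's configuration:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for 17-dimensional churn feature vectors with a
# roughly 9:1 class imbalance (churners are the minority class).
X, y = make_classification(n_samples=2000, n_features=17, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# PCA compacts the feature space before the SVM, as in the PCA+SVM model;
# class_weight="balanced" is one (assumed) way to handle the imbalance.
model = make_pipeline(StandardScaler(),
                      PCA(n_components=8),
                      SVC(kernel="rbf", class_weight="balanced"))
model.fit(X_tr, y_tr)
print(f"test accuracy: {model.score(X_te, y_te):.3f}")
```

Chaining the scaler, PCA, and SVM in one pipeline keeps the dimensionality reduction inside the fitted model, so the same transformation is applied consistently at prediction time.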


2016 ◽  
Vol 27 (9) ◽  
pp. 2610-2626 ◽  
Author(s):  
Thomas R Sullivan ◽  
Ian R White ◽  
Amy B Salter ◽  
Philip Ryan ◽  
Katherine J Lee

The use of multiple imputation has increased markedly in recent years, and journal reviewers may expect to see multiple imputation used to handle missing data. However, in randomized trials, where treatment group is always observed and independent of baseline covariates, other approaches may be preferable. Using data simulation, we evaluated multiple imputation, performed both overall and separately by randomized group, across a range of commonly encountered scenarios. We considered both missing outcome and missing baseline data, with missing outcome data induced under missing at random mechanisms. Provided the analysis model was correctly specified, multiple imputation produced unbiased treatment effect estimates, but alternative unbiased approaches were often more efficient. When the analysis model overlooked an interaction effect involving randomized group, multiple imputation produced biased estimates of the average treatment effect when applied to missing outcome data, unless imputation was performed separately by randomized group. Based on these results, we conclude that multiple imputation should not be seen as the only acceptable way to handle missing data in randomized trials. In settings where multiple imputation is adopted, we recommend that imputation be carried out separately by randomized group.
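The recommendation to impute separately by randomized group can be sketched as follows. This shows a single imputation with scikit-learn's IterativeImputer standing in for one draw of a full MI procedure (a real analysis would repeat the imputation and pool estimates with Rubin's rules); the simulated trial is illustrative only:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
n = 400
arm = rng.integers(0, 2, n)                       # randomized group, fully observed
baseline = rng.normal(0.0, 1.0, n)
# True model contains an arm-by-baseline interaction; average arm effect is 1.0.
outcome = arm + 0.5 * baseline + 0.8 * arm * baseline + rng.normal(0.0, 1.0, n)
outcome[rng.random(n) < 0.3] = np.nan             # missing outcomes

# Impute separately within each randomized arm, so the interaction is
# respected without having to write it into the imputation model explicitly.
X = np.column_stack([baseline, outcome])
X_imp = X.copy()
for g in (0, 1):
    X_imp[arm == g] = IterativeImputer(random_state=0).fit_transform(X[arm == g])

est = X_imp[arm == 1, 1].mean() - X_imp[arm == 0, 1].mean()
print(f"estimated treatment effect: {est:.2f}")
```

Fitting one imputation model per arm is exactly the device the abstract recommends: it lets the baseline-outcome relationship differ by group, which an overall imputation model without the interaction term would average away.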


2020 ◽  
Vol 29 (10) ◽  
pp. 3076-3092 ◽  
Author(s):  
Susan Gachau ◽  
Matteo Quartagno ◽  
Edmund Njeru Njagi ◽  
Nelson Owuor ◽  
Mike English ◽  
...  

Missing information is a major drawback in analyzing data collected in many routine health care settings. Multiple imputation assuming a missing at random mechanism is a popular method to handle missing data. The missing at random assumption cannot be confirmed from the observed data alone, hence the need for sensitivity analysis to assess robustness of inference. However, sensitivity analysis is rarely conducted and reported in practice. We analyzed routine paediatric data collected during a cluster randomized trial conducted in Kenyan hospitals. We imputed missing patient and clinician-level variables assuming the missing at random mechanism. We also imputed missing clinician-level variables assuming a missing not at random mechanism. We incorporated opinions from 15 clinical experts in the form of prior distributions and shift parameters in the delta-adjustment method. An interaction between trial intervention arm and follow-up time, together with hospital-, clinician- and patient-level factors, was included in a proportional-odds random-effects analysis model. We performed these analyses using R functions derived from the jomo package. Parameter estimates from multiple imputation under the missing at random mechanism were similar to those assuming the missing not at random mechanism. Our inferences were insensitive to departures from the missing at random assumption using either the prior-distribution or shift-parameter sensitivity analysis approach.
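A rough Python analogue of the delta-adjustment idea (the study itself used R's jomo package); the shift is applied to singly imputed values here purely to illustrate the mechanism, and the data and shift values are made up:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(2)
n = 500
x = rng.normal(0.0, 1.0, n)                  # fully observed covariate
y = 2.0 + x + rng.normal(0.0, 1.0, n)        # partially observed variable
y_obs = y.copy()
y_obs[rng.random(n) < 0.3] = np.nan

def mean_under_delta(delta):
    """Impute y under MAR, then shift only the imputed values by delta:
    the delta-adjustment sketch of a missing-not-at-random mechanism."""
    imputed = IterativeImputer(random_state=0).fit_transform(
        np.column_stack([x, y_obs]))
    y_imp = imputed[:, 1].copy()
    y_imp[np.isnan(y_obs)] += delta
    return float(y_imp.mean())

# delta = 0 reproduces the MAR analysis; nonzero deltas probe departures from it.
for delta in (-1.0, 0.0, 1.0):
    print(f"delta = {delta:+.1f}: mean(y) = {mean_under_delta(delta):.2f}")
```

If the estimates barely move as `delta` ranges over the expert-elicited values, the inference is insensitive to departures from MAR, which is the conclusion the abstract reports.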


2021 ◽  
Author(s):  
Melissa Middleton ◽  
Cattram Nguyen ◽  
Margarita Moreno-Betancur ◽  
John B Carlin ◽  
Katherine J Lee

Abstract. Background: In case-cohort studies a random subcohort is selected from the inception cohort and acts as the sample of controls for several outcome investigations. Analysis is conducted using only the cases and the subcohort, with inverse probability weighting (IPW) used to account for the unequal sampling probabilities resulting from the study design. Like all epidemiological studies, case-cohort studies are susceptible to missing data. Multiple imputation (MI) has become increasingly popular for addressing missing data in epidemiological studies. It is currently unclear how best to incorporate the weights from a case-cohort analysis in MI procedures used to address missing covariate data. Method: A simulation study was conducted with missingness in two covariates, motivated by a case study within the Barwon Infant Study. MI methods considered were: using the outcome, a proxy for weights in the simple case-cohort design considered, as a predictor in the imputation model, with and without exposure and covariate interactions; imputing separately within each weight category; and using a weighted imputation model. These methods were compared to a complete case analysis (CCA) within the context of a standard IPW analysis model estimating either the risk or odds ratio. The strength of associations, missing data mechanism, proportion of observations with incomplete covariate data, and subcohort selection probability varied across the simulation scenarios. Methods were also applied to the case study. Results: There was similar performance in terms of relative bias and precision with all MI methods across the scenarios considered, with expected improvements compared with the CCA. Slight underestimation of the standard error was seen throughout but the nominal level of coverage (95%) was generally achieved. All MI methods showed a similar increase in precision as the subcohort selection probability increased, irrespective of the scenario. A similar pattern of results was seen in the case study. Conclusions: How weights were incorporated into the imputation model had minimal effect on the performance of MI; this may be due to case-cohort studies only having two weight categories. In this context, inclusion of the outcome in the imputation model was sufficient to account for the unequal sampling probabilities in the analysis model.
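The finding that including the outcome in the imputation model suffices can be sketched with a toy case-cohort sample, using scikit-learn in place of the MI software used in the study; all names and probabilities are illustrative:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(3)
n = 1000
z = rng.normal(0.0, 1.0, n)                          # covariate, partly missing
case = (rng.random(n) < 1.0 / (1.0 + np.exp(-z))).astype(float)  # outcome
subcohort = rng.random(n) < 0.2                      # random subcohort
sampled = subcohort | (case == 1.0)                  # case-cohort sample

z_obs = z.copy()
z_obs[rng.random(n) < 0.25] = np.nan                 # missing covariate values

# Include the outcome ("case") as a predictor in the imputation model: in this
# simple design it is a proxy for the two weight categories of the IPW analysis.
X = np.column_stack([case, z_obs])[sampled]
X_imp = IterativeImputer(random_state=0).fit_transform(X)
print(f"{int(sampled.sum())} sampled records, no missing values after imputation: "
      f"{not np.isnan(X_imp).any()}")
```

Because every case has one weight and every non-case subcohort member the other, the binary outcome carries the same information as the weight category, which is why the simpler imputation model performed comparably in the simulations.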

