Survival Analysis and Data Mining

Author(s):  
Qiyang Chen ◽  
Alan Oppenheim ◽  
Dajin Wang

Survival analysis (SA) consists of a variety of methods for analyzing the timing of events and/or the times of transition among several states or conditions. The event of interest can happen at most once to any individual or subject. Alternate terms for this process include Failure Analysis (FA), Reliability Analysis (RA), Lifetime Data Analysis (LDA), Time to Event Analysis (TEA), Event History Analysis (EHA), and Time Failure Analysis (TFA), depending on the type of application for which the method is used (Elashoff, 1997). Survival Data Mining (SDM) is a more recently coined term (SAS, 2004). There are many models and variations of SA. This article discusses some of the more common methods of SA with real-life applications. The calculations for the various models of SA are very complex, and multiple software packages are now available to perform the necessary analyses much more quickly.
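The core quantity behind most of the SA methods listed above is the survival function S(t), commonly estimated with the Kaplan-Meier product-limit estimator. As a minimal illustrative sketch (plain Python, toy data; real analyses would use one of the software packages the abstract mentions):

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate: return [(t, S(t))] at each distinct event time.

    durations: time to event or censoring for each subject.
    observed:  1 if the event occurred, 0 if the subject was censored.
    """
    pairs = sorted(zip(durations, observed))
    n_at_risk = len(pairs)
    surv, points = 1.0, []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        at_t = [e for d, e in pairs if d == t]   # everyone leaving at time t
        deaths = sum(at_t)
        if deaths:
            surv *= 1.0 - deaths / n_at_risk      # product-limit step
            points.append((t, surv))
        n_at_risk -= len(at_t)                    # drop events and censorings
        i += len(at_t)
    return points
```

For example, `kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])` steps the survival curve down at times 1, 2 and 3, with the censored subjects reducing only the risk set.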



2021 ◽  
Author(s):  
Shekoufeh Gorgi Zadeh ◽  
Charlotte Behning ◽  
Matthias Schmid

Abstract With the popularity of deep neural networks (DNNs) in recent years, many researchers have proposed DNNs for the analysis of survival data (time-to-event data). These networks learn the distribution of survival times directly from the predictor variables without making strong assumptions on the underlying stochastic process. In survival analysis, it is common to observe several types of events, also called competing events. The occurrences of these competing events are usually not independent of one another and have to be incorporated in the modeling process in addition to censoring. In classical survival analysis, a popular method to incorporate competing events is the subdistribution hazard model, which is usually fitted using weighted Cox regression. In the DNN framework, only a few architectures have been proposed to model the distribution of time to a specific event in a competing events situation. These architectures are characterized by a separate subnetwork/pathway per event, leading to large networks with huge numbers of parameters that may become difficult to train. In this work, we propose a novel imputation strategy for data preprocessing that incorporates the subdistribution weights derived from the classical model. With this, it is no longer necessary to add multiple subnetworks to the DNN to handle competing events. Our experiments on synthetic and real-world datasets show that DNNs with multiple subnetworks per event can simply be replaced by a DNN designed for a single-event analysis without loss in accuracy.
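The subdistribution weights the abstract refers to can be illustrated in simplified form: in a Fine-Gray style weighted analysis, a subject whose competing event occurred at time s remains in the risk set at a later time t with weight G(t)/G(s), where G is the Kaplan-Meier estimate of the censoring distribution. The sketch below is an assumption-laden toy version (function names and data are illustrative, not the authors' code):

```python
def censoring_km(durations, censored):
    """Kaplan-Meier estimate of the censoring survival G as a step function.

    censored: 1 if the subject was censored at that time, 0 otherwise.
    Returns {time: G(time)} at each observed time.
    """
    pairs = sorted(zip(durations, censored))
    n, g, steps = len(pairs), 1.0, {}
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        ties = [c for d, c in pairs if d == t]
        g *= 1.0 - sum(ties) / n   # step down only at censoring times
        steps[t] = g
        n -= len(ties)
        i += len(ties)
    return steps

def g_at(steps, t):
    """Value of the step function G at time t."""
    g = 1.0
    for u in sorted(steps):
        if u <= t:
            g = steps[u]
    return g

def subdist_weight(steps, s, t):
    """Weight at analysis time t for a subject whose competing event was at s."""
    return g_at(steps, t) / g_at(steps, s) if t > s else 1.0
```

The weight decays toward zero as t moves past s, gradually removing the competing-event subject from the risk set.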


Author(s):  
Patricia Cerrito ◽  
John Cerrito

Survival analysis is almost always reserved for an endpoint of mortality or recurrence (Mantel, 1966). However, it can be used for many different types of endpoints, since the survival distribution is defined as the time to an event, and that event can be any endpoint of interest. For patients with chronic diseases, there are many endpoints to examine. For example, patients with diabetes want to avoid organ failure as well as death, and treatments that can prolong the time to organ failure will be beneficial. For patients with resistant infections, treatments that prevent one or multiple recurrences should be explored. Survival data mining differs from survival analysis in that multiple patient events can occur in sequence; it can be thought of as a sequence of survival functions. The first step in survival data mining is to define an episode of treatment so that the patient events can be found for analysis. In this chapter, we will look at the time to a switch in medications and contrast how prescriptions are given to patients, either following or disregarding treatment guidelines.
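The first step described above, defining an episode of treatment, can be sketched as grouping a patient's prescription records into episodes whenever the gap between fills stays below a threshold, and then measuring the time to the first fill of a different drug. Field names, the 30-day gap rule, and the toy data are illustrative assumptions, not the chapter's actual definitions:

```python
def episodes(fills, max_gap=30):
    """Group one patient's fills into treatment episodes.

    fills: list of (day, drug) tuples. A gap of more than max_gap days
    between consecutive fills starts a new episode.
    """
    eps, cur = [], []
    for day, drug in sorted(fills):
        if cur and day - cur[-1][0] > max_gap:
            eps.append(cur)
            cur = []
        cur.append((day, drug))
    if cur:
        eps.append(cur)
    return eps

def time_to_switch(episode):
    """Days from episode start to the first fill of a different drug,
    or None if no switch occurred within the episode (censored)."""
    start_day, start_drug = episode[0]
    for day, drug in episode[1:]:
        if drug != start_drug:
            return day - start_day
    return None
```

Each episode then contributes one (possibly censored) observation to the time-to-switch survival function.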


2019 ◽  
Vol 1 (3) ◽  
pp. 1013-1038 ◽  
Author(s):  
Frank Emmert-Streib ◽  
Matthias Dehmer

The modeling of time-to-event data is an important topic with many applications in diverse areas. The collection of methods for analyzing such data is called survival analysis, event history analysis, or duration analysis. Survival analysis is widely applicable because the definition of an ’event’ can take many forms; examples include death, graduation, purchase, or bankruptcy. Hence, application areas range from medicine and sociology to marketing and economics. In this paper, we review the theoretical basics of survival analysis, including estimators for survival and hazard functions. We discuss the Cox proportional hazards model in detail, along with approaches for testing the proportional hazards (PH) assumption. Furthermore, we discuss stratified Cox models for cases in which the PH assumption does not hold. Our discussion is complemented with a worked example using the statistical programming language R to enable the practical application of the methodology.
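The Cox model discussed above is fitted by maximizing a partial likelihood: at each event time, the contribution is the event subject's relative risk divided by the summed relative risks of everyone still at risk. A hand-rolled sketch for a single covariate (the paper's own example uses R's `survival` package; this toy grid-search fit is an assumption for illustration and ignores tied event times):

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Partial log-likelihood of a one-covariate Cox model (no tied event
    times). The risk set at an event is everyone still under observation."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    ll = 0.0
    for k, i in enumerate(order):
        if events[i]:
            risk = sum(math.exp(beta * x[j]) for j in order[k:])
            ll += beta * x[i] - math.log(risk)
    return ll

def fit_beta(times, events, x, lo=-5.0, hi=5.0, steps=2000):
    """Crude grid-search maximizer for the single coefficient."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda b: cox_partial_loglik(b, times, events, x))
```

With events of the x=1 group interleaved but earlier on average, the fitted coefficient comes out positive, i.e. a hazard ratio exp(beta) above one.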


1995 ◽  
Vol 49 (2) ◽  
pp. 355-357
Author(s):  
Johannes Huinink

1998 ◽  
Vol 10 (1-3) ◽  
pp. 1-9
Author(s):  
Onno Boonstra ◽  
Maarten Panhuysen

Population registers are recognised to be a very important source for demographic research, because they enable us to study the lifecourse of individuals as well as households. A very good technique for lifecourse analysis is event history analysis. Unfortunately, there are marked differences between the way the data are available in population registers and the way event history analysis expects them to be. The source-oriented approach to computing historical data calls for a ‘five-file structure’, whereas event history analysis can only handle flat files. In this article, we suggest a series of twelve steps with which population register data can be transposed from a five-file structured database into a ‘flat file’ event history analysis dataset.
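The flattening problem described above can be illustrated in miniature: relationally stored register data (here just a persons table and an events table, standing in for the five-file structure) is reshaped into one flat row per person, as event history analysis software expects. The table layouts below are assumptions for exposition, not the article's actual twelve steps:

```python
def flatten(persons, events):
    """Reshape relational register data into one flat row per person.

    persons: {pid: {'birth': year, ...}} -- person-level attributes.
    events:  list of (pid, year, kind) -- one row per recorded event.
    The first occurrence of each event kind becomes a column.
    """
    rows = []
    for pid, attrs in sorted(persons.items()):
        row = {'pid': pid, **attrs}
        for _, year, kind in sorted(e for e in events if e[0] == pid):
            row.setdefault(kind, year)   # keep only the first occurrence
        rows.append(row)
    return rows
```

A person with no record of some event simply lacks that column, which the analysis then treats as a censored (never-observed) transition.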


Author(s):  
Yujin Kim

In the context of South Korea, characterized by increasing population aging and a changing family structure, this study examined differences in the risk of cognitive impairment by marital status and investigated whether this association differs by gender. The data were derived from the 2006–2018 Korean Longitudinal Study of Aging. The sample comprised 7,568 respondents aged 45 years or older, who contributed 30,414 person-year observations. Event history analysis was used to predict the odds of cognitive impairment by marital status and gender. Relative to their married counterparts, never-married and divorced people were the most disadvantaged in terms of cognitive health. In addition, the association between marital status and cognitive impairment was much stronger for men than for women. Further, gender-stratified analyses showed that, compared with married men, never-married men had a higher risk of cognitive impairment, but there were no significant effects of marital status for women.
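The person-year observations mentioned above come from splitting each subject's follow-up into discrete yearly records, each carrying a 0/1 indicator for whether the event occurred in that year; a logistic regression on these records then gives the discrete-time event history model predicting the odds of impairment. A minimal sketch of the expansion step (variable names and toy data are assumptions):

```python
def person_years(subjects):
    """Expand subjects into person-year rows for discrete-time analysis.

    subjects: list of (sid, start_age, end_age, event), where event=1 means
    the outcome (e.g. cognitive impairment) was observed at end_age.
    """
    rows = []
    for sid, start, end, event in subjects:
        for age in range(start, end + 1):
            rows.append({'sid': sid, 'age': age,
                         'event': 1 if (event and age == end) else 0})
    return rows
```

Covariates such as marital status and gender would be attached to each yearly row before fitting the logit model.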


Author(s):  
Krzysztof Jurczuk ◽  
Marcin Czajkowski ◽  
Marek Kretowski

Abstract This paper concerns the evolutionary induction of decision trees (DT) for large-scale data. Such a global approach is one of the alternatives to top-down inducers: it searches for the tree structure and the tests simultaneously, and thus in many situations improves the prediction and size of the resulting classifiers. However, this population-based, iterative approach can be too computationally demanding to apply directly to big data mining. The paper demonstrates that this barrier can be overcome by smart distributed/parallel processing. Moreover, we ask whether the global approach can truly compete with greedy systems on large-scale data. For this purpose, we propose a novel multi-GPU approach. It combines the knowledge of global DT induction and evolutionary algorithm parallelization with efficient utilization of the GPUs' memory and computing resources. The search for the tree structure and tests is performed on a CPU, while the fitness calculations are delegated to GPUs. A data-parallel decomposition strategy and the CUDA framework are applied. Experimental validation is performed on both artificial and real-life datasets; in both cases, the obtained acceleration is very satisfactory. The solution is able to process even billions of instances in a few hours on a single workstation equipped with 4 GPUs. The impact of data characteristics (size and dimension) on the convergence and speedup of the evolutionary search is also shown. When the number of GPUs grows, nearly linear scalability is observed, which suggests that the data-size boundaries for evolutionary DT mining are fading.
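The division of labour described above (structure search on the CPU, fitness delegated to data-parallel devices) can be sketched in plain Python, with a thread pool standing in for the GPUs and a toy threshold-split fitness in place of the real tree evaluation. All names here are illustrative assumptions; the actual system uses CUDA kernels:

```python
from concurrent.futures import ThreadPoolExecutor

def fitness_chunk(args):
    """Score one data chunk: count correct predictions of a threshold split."""
    threshold, chunk = args
    return sum(1 for value, label in chunk if (value > threshold) == label)

def evaluate(threshold, data, workers=2, n_chunks=4):
    """Data-parallel fitness: partition the data, score chunks in parallel
    (the map step, delegated to GPUs in the real system), then reduce the
    partial sums into one fitness value on the host."""
    size = (len(data) + n_chunks - 1) // n_chunks
    parts = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(fitness_chunk, [(threshold, p) for p in parts]))
```

The evolutionary loop on the CPU would call `evaluate` once per candidate tree per generation, which is why offloading this map-reduce dominates the achievable speedup.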


Author(s):  
Louisa Vogiazides ◽  
Hernan Mondani

Abstract Many countries actively seek to disperse refugees to counteract residential segregation and/or take measures to attract and retain international migrants in smaller communities to mitigate or reverse population decline. This study explores the regional distribution and inter-regional mobility of refugees in Sweden. It uses individual-level register data to follow two cohorts for 8 years after their arrival in Sweden, distinguishing between refugees subject to a placement policy in the 1990s and recent cohorts that either arranged their own housing or were assigned housing. It uses sequence analysis and multinomial logit regression to analyse regional trajectories, and event history analysis to examine mobility determinants. The results indicate that most refugees remained in the same type of region throughout the period. A significant proportion of refugees assigned housing in large-city or small-city/rural regions stayed there over a long period, suggesting that refugee settlement policies have long-lasting consequences.

