survival regression
Recently Published Documents


TOTAL DOCUMENTS

59
(FIVE YEARS 23)

H-INDEX

13
(FIVE YEARS 0)

2021 ◽  
Author(s):  
◽  
Nazrina Aziz

<p>This thesis investigates three research problems that arise in multivariate data and censored regression. The first is the identification of outliers in multivariate data. The second is a dissimilarity measure for clustering purposes. The third is diagnostic analysis for the Buckley-James method in censored regression. An outlier can be defined simply as an observation (or a subset of observations) that is isolated from the other observations in the data set. Two main reasons motivate the search for outliers. The first is the researcher's intention. The second is the effect of outliers on analyses: outliers affect means, variances and regression coefficients, bias or distort estimates, and inflate the sums of squares, so that false conclusions are likely to be drawn. Sometimes the identification of outliers is itself the main objective of the analysis; in other cases the question is whether to remove the outliers or to down-weight them before fitting a non-robust model. This thesis does not differentiate between the various justifications for outlier detection. The aim is to alert the analyst to observations that differ considerably from the majority. The techniques for the identification of outliers introduced in this thesis are applicable to a wide variety of settings, and they are applied to both large and small data sets. In this thesis, observations that are located far away from the remaining data are considered to be outliers. Some techniques for the identification of outliers can also be used for finding clusters. There are two major challenges in clustering. The first is that identifying clusters in high-dimensional data sets is difficult because of the curse of dimensionality. The second is that a new dissimilarity measure is needed, as some traditional distance functions cannot capture the pattern dissimilarity among the objects. 
This thesis deals with the latter challenge. It introduces the Influence Angle Cluster Approach (iaca), which may be used as a dissimilarity matrix, and shows that iaca successfully forms clusters when used in partitioning clustering, even if the data set has mixed variables, i.e. interval and categorical variables. The iaca is based on the influence eigenstructure. The first two problems in this thesis deal with complete data sets. It is also of interest to study incomplete data sets, i.e. censored data sets. The term 'censored' is mostly used in the biological sciences, for example in survival analysis. Researchers are often interested in comparing the survival distributions of two samples. Although this can be done with the logrank test, that method cannot examine the effects of more than one variable at a time. This difficulty can be overcome by using a survival regression model. Examples of survival regression models are the Cox model, Miller's model, the Buckley-James model and the Koul-Susarla-Van Ryzin model. The performance of the Buckley-James model is comparable with that of the Cox model, and it performs better than both the Miller model and the Koul-Susarla-Van Ryzin model. Previous comparison studies showed that the Buckley-James estimator is more stable and easier to explain to non-statisticians than the Cox model. Nevertheless, researchers today tend to use the Cox model rather than the Buckley-James model, because the Buckley-James model is poorly supported in statistical software and offers few choices of diagnostic analysis. Currently, only a few diagnostic analyses exist for the Buckley-James model. Therefore, this thesis proposes two new diagnostic analyses for the Buckley-James model. The first is called the renovated Cook's distance. This method produces results comparable with previous findings. 
Nevertheless, this method cannot identify influential observations in the censored group; it can only detect influential observations in the uncensored group. This issue needs further investigation, because censored points may also be influential cases in censored regression. The second proposal is a local influence approach for the Buckley-James model. This thesis presents local influence diagnostics for the Buckley-James model, which consist of variance perturbation, response variable perturbation, censoring status perturbation, and independent variables perturbation. The proposed diagnostics improve on, and also challenge, previous findings by allowing both censored and uncensored observations to be identified as influential.</p>


2021 ◽  
Vol 42 (Supplement_1) ◽  
Author(s):  
M Temtem ◽  
M Serrao ◽  
M I Mendonca ◽  
M Santos ◽  
A Sousa ◽  
...  

Abstract Background The hepatocyte nuclear factor 4 alpha (HNF4A) gene has been associated by GWAS with atherosclerosis and CAD susceptibility. Loss-of-function mutations in human hepatocyte nuclear factor 4α (HNF4α), a transcription factor encoded by the HNF4A gene, are associated with maturity-onset diabetes of the young and lipid disorders. However, the mechanisms underlying the lipid disorders are poorly understood. Aim We aim to identify genetic predisposition to atherosclerosis progression and event occurrence, or to regression and better prognosis, in a cohort study from the GENEMACOR population. Methods We investigated a cohort of 1,712 patients who underwent coronary angiography with more than 70% stenosis of at least one main coronary vessel. Thirty-three SNPs associated with the risk of CAD in previous GWAS were genotyped by TaqMan assays. We evaluated the best genetic model associated with CAD prognosis (events) with a 95% CI in bivariate analysis. Hazards were modelled with a Cox survival regression model adjusted for age, sex, type 2 diabetes, hypertension, and hypercholesterolemia, to evaluate the relationship of the variants with event incidence. Finally, we constructed Kaplan–Meier cumulative-event curves for the significant genetic variants. Results Our evaluation revealed a SNP paradoxically associated with protection from atherosclerosis progression and event occurrence: rs1884613 C>G in the HNF4A gene on chromosome 20, dominant model [OR=0.653; 95% CI (0.522–0.817); p=0.0002]. The Cox survival regression model showed a CAD protective effect of HNF4A with a hazard ratio (HR) of 0.771; p=0.007. The Kaplan–Meier cumulative-event analysis showed that the CG+GG vs CC genotype of rs1884613 HNF4α was associated with a better prognosis (Breslow test, p=0.004) at the end of follow-up. Conclusion In this study we identified one SNP, rs1884613 in HNF4A, paradoxically associated with a better CAD prognosis. 
The HNF4A gene variants could induce loss of HNF4α function, modifying and modulating hepatic lipase and lipid metabolism and conferring a beneficial effect on atherosclerosis progression and event occurrence. Funding Acknowledgement Type of funding sources: None.
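
To make the Kaplan–Meier step concrete, here is a minimal sketch of the product-limit estimator in plain Python. It is not the authors' analysis code: the function name `kaplan_meier` and the toy follow-up data are assumptions made for illustration only. A cumulative-event curve of the kind described in the abstract would be one minus the survival estimate.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed event time.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)   # every subject leaves the risk set at their own time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving past t.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        # Events and censored subjects at t both leave the risk set.
        at_risk -= exits[t]
    return curve

# Hypothetical toy data (not from the study): six subjects, two censored.
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t}  S(t)={s:.4f}  cumulative events={1 - s:.4f}")
```

Censored subjects contribute to the risk set up to their censoring time but never trigger a drop in the curve, which is what distinguishes this estimator from a simple proportion of events.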


2021 ◽  
Author(s):  
Stepan Nersisyan ◽  
Victor Novosad ◽  
Alexei Galatenko ◽  
Andrey Sokolov ◽  
Grigoriy Bokov ◽  
...  

Motivation: Feature selection is one of the main techniques used to prevent overfitting in machine learning applications. The most straightforward approach to feature selection is exhaustive search: one can go over all possible feature combinations and pick the model with the highest accuracy. This method, together with its optimizations, has been actively used in biomedical research; however, a publicly available implementation is missing. Results: We present ExhauFS, a user-friendly command-line implementation of the exhaustive search approach for classification and survival regression. Aside from the tool description, we included three application examples in the manuscript to comprehensively review the implemented functionality. First, we executed ExhauFS on a toy cervical cancer dataset to illustrate basic concepts. Then, multi-cohort microarray and RNA-seq breast cancer datasets were used to construct gene signatures for 5-year recurrence classification. Finally, Cox survival regression models were used to fit isomiR signatures for overall survival prediction for patients with colorectal cancer. Availability: Source codes and documentation of ExhauFS are available on GitHub: https://github.com/s-a-nersisyan/ExhauFS.
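
The exhaustive search idea described in the abstract can be sketched in a few lines of Python. This is a generic illustration, not the ExhauFS implementation or its interface: `exhaustive_search`, `toy_score` and the feature names are hypothetical, and the toy scoring function stands in for whatever cross-validated accuracy or concordance measure a real pipeline would compute.

```python
from itertools import combinations

def exhaustive_search(features, k, score):
    """Score every k-element subset of `features`; return (best_subset, best_score)."""
    best_subset, best_score = None, float("-inf")
    for subset in combinations(features, k):
        s = score(subset)
        if s > best_score:
            best_subset, best_score = subset, s
    return best_subset, best_score

# Hypothetical stand-in for a model-quality measure (assumption, for illustration).
toy_weights = {"g1": 0.2, "g2": 0.5, "g3": 0.1, "g4": 0.4}

def toy_score(subset):
    return sum(toy_weights[f] for f in subset)

best, best_score = exhaustive_search(list(toy_weights), 2, toy_score)
print(best, round(best_score, 3))
```

The number of subsets grows as the binomial coefficient C(n, k), so in practice the candidate feature list is usually pre-filtered to a small n before the search is run.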


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
A. F. Fagbamigbe ◽  
M. M. Salawu ◽  
S. M. Abatan ◽  
O. Ajumobi

Abstract The need for more pragmatic approaches to achieving the Sustainable Development Goal on childhood mortality reduction necessitated this study. Studies that simultaneously consider the influence of where children live and the censored nature of child survival data are scarce. We identified the compositional and contextual factors associated with under-five (U5M) and infant (INM) mortality in Nigeria using five MCMC Bayesian hierarchical Poisson regression models as approximations of the Cox survival regression model. The 2018 DHS data of 33,924 under-five children were used. Life table techniques and the MLwiN 3.05 module for the analysis of hierarchical data were implemented in Stata Version 16. The overall INM rate (INMR) was 70 per 1000 live births compared with a U5M rate (U5MR) of 131 per 1000 live births. The INMR was lowest in Ogun (17 per 1000 live births) and highest in Kaduna (106), Gombe (112) and Kebbi (116), while the U5MR was lowest in Ogun (29) and highest in Jigawa (212) and Kebbi (248). In the fully adjusted model, the risks of INM and U5M were highest among children with none/low maternal education, multiple births, low birthweight, short birth interval, poorer households, when spouses decide on healthcare access, having a big problem getting to a healthcare facility, high community illiteracy level, and from states with a high proportion of rural population. Compared with the null model, 81% vs 13% and 59% vs 35% of the total variation in INM and U5M were explained by the state- and neighbourhood-level factors respectively. Infant and under-five mortality in Nigeria are influenced by compositional and contextual factors. The Bayesian hierarchical Poisson regression model used in estimating the factors associated with childhood deaths in Nigeria fitted the survival data.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Moses M. Ngari ◽  
Susanne Schmitz ◽  
Christopher Maronga ◽  
Lazarus K. Mramba ◽  
Michel Vaillant

Abstract Background Survival analysis methods (SAMs) are central to analysing time-to-event outcomes. Appropriate application and reporting of such methods are important to ensure correct interpretation of the data. In this study, we systematically review the application and reporting of SAMs in studies of tuberculosis (TB) patients in Africa. It is the first review to assess the application and reporting of SAMs in this context. Methods Systematic review of studies involving TB patients from Africa published in English between January 2010 and April 2020. Studies were eligible if they reported use of SAMs. Application and reporting of SAMs were evaluated based on seven author-defined criteria. Results Seventy-six studies were included, with patient numbers ranging from 56 to 182,890. Forty-three (57%) studies involved a statistician/epidemiologist. The number of published papers per year applying SAMs increased from two in 2010 to 18 in 2019 (P = 0.004). Sample size estimation was not reported by 67 (88%) studies. A total of 22 (29%) studies did not report summary follow-up time. The survival function was commonly presented using Kaplan-Meier survival curves (n = 51 (67%) studies) and group comparisons were performed using log-rank tests (n = 44 (58%) studies). Sixty-seven (91%), 3 (4.1%) and 4 (5.4%) studies reported Cox proportional hazard, competing risk and parametric survival regression models, respectively. A total of 37 (49%) studies had hierarchical clustering, of which 28 (76%) did not adjust for the clustering in the analysis. Reporting was adequate in 4.0%, 1.3% and 6.6% of studies for sample size estimation, plotting of survival curves and testing of the assumptions underlying survival regression, respectively. Forty-five (59%), 52 (68%) and 73 (96%) studies adequately reported comparison of survival curves, follow-up time and measures of effect, respectively. 
Conclusion The quality of reporting of survival analyses remains inadequate despite their increasing application. Because similar reporting deficiencies may be common in other diseases in low- and middle-income countries, reporting guidelines, additional training, and more capacity building are needed, along with more vigilance by reviewers and journal editors.
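
As an illustration of the group comparison most often reported in the reviewed studies, the following is a minimal sketch of a two-sample log-rank test in plain Python. The function name `logrank_test` and the toy data are assumptions for illustration, not taken from any reviewed study; the statistic uses the usual hypergeometric variance and is referred to a chi-square distribution with one degree of freedom.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test; returns (chi2, p_value) on 1 degree of freedom."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    obs_minus_exp, var = 0.0, 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)                 # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)    # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e)           # events at t, both groups
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n                       # observed minus expected in A
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / var
    # Upper tail of chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical toy data: all events observed, group A fails earlier than group B.
chi2, p_value = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(f"chi2={chi2:.3f}, p={p_value:.4f}")
```

The test compares whole survival curves but handles only one grouping variable at a time, which is why the reviewed studies typically pair it with a regression model when adjusting for covariates.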

