Fast Lasso method for Large-scale and Ultrahigh-dimensional Cox Model with applications to UK Biobank

Abstract: We develop a scalable and highly efficient algorithm to fit a Cox proportional hazard model by maximizing the L1-regularized (Lasso) partial likelihood function, based on the Batch Screening Iterative Lasso (BASIL) method developed in Qian et al. (2019). The output of our algorithm is the full Lasso path, the parameter estimates at all predefined regularization parameters, as well as their validation accuracy measured using the concordance index (C-index) or the validation deviance. To demonstrate the effectiveness of our algorithm, we analyze a large genotype-survival time dataset across 306 disease outcomes from the UK Biobank (Sudlow et al. 2015). Our approach, which we refer to as snpnet-Cox, is implemented in a publicly available package.
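For orientation, the objective described in the abstract can be written out explicitly. The block below is the standard form of the L1-penalized Cox partial likelihood, assuming no tied event times; the symbols $x_i$ (covariates), $t_i$ (observed time), $\delta_i$ (event indicator), $R_i$ (risk set) and $\lambda$ (regularization parameter) are notation assumed here rather than taken from the abstract, and the $1/n$ scaling is one common convention among several.

```latex
\hat{\beta}(\lambda) = \arg\min_{\beta \in \mathbb{R}^p}
\left\{ -\frac{1}{n} \sum_{i:\,\delta_i = 1}
\left[ x_i^\top \beta - \log \sum_{j \in R_i} \exp\!\left(x_j^\top \beta\right) \right]
+ \lambda \lVert \beta \rVert_1 \right\},
\qquad R_i = \{\, j : t_j \ge t_i \,\}
```

The Lasso path mentioned in the abstract is the collection of solutions $\hat{\beta}(\lambda)$ over a decreasing grid of $\lambda$ values.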


Fast Lasso method for large-scale and ultrahigh-dimensional Cox model with applications to UK Biobank

Biostatistics ◽

10.1093/biostatistics/kxaa038 ◽

2020 ◽

Cited By ~ 1

Author(s):

Ruilin Li ◽

Christopher Chang ◽

Johanne M Justesen ◽

Yosuke Tanigawa ◽

Junyang Qian ◽

...

Keyword(s):

Large Scale ◽

Likelihood Function ◽

Cox Model ◽

Partial Likelihood ◽

Parameter Estimates ◽

Proportional Hazard ◽

UK Biobank ◽

Cox Proportional Hazard ◽

The UK ◽

Lasso Method

Summary We develop a scalable and highly efficient algorithm to fit a Cox proportional hazard model by maximizing the $L^1$-regularized (Lasso) partial likelihood function, based on the Batch Screening Iterative Lasso (BASIL) method developed in Qian and others (2019). Our algorithm is particularly suitable for large-scale and high-dimensional data that do not fit in memory. The output of our algorithm is the full Lasso path, the parameter estimates at all predefined regularization parameters, as well as their validation accuracy measured using the concordance index (C-index) or the validation deviance. To demonstrate the effectiveness of our algorithm, we analyze a large genotype-survival time dataset across 306 disease outcomes from the UK Biobank (Sudlow and others, 2015). We provide a publicly available implementation of the proposed approach for genetics data on top of the PLINK2 package and name it snpnet-Cox.
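The validation metric named in the summary, the C-index, can be illustrated with a minimal sketch. This assumes Harrell's definition for right-censored data and a simple quadratic-time loop; it is not the snpnet-Cox implementation, and the function name `concordance_index` and the toy data are hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (O(n^2) illustrative sketch).

    times       -- observed follow-up times
    events      -- 1/True if the event was observed, 0/False if censored
    risk_scores -- model output; a higher score should mean earlier events
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject a has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the
        # earlier time is an observed event (not a censoring time).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # ties in the score count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data: risk scores perfectly ordered against survival time.
    print(concordance_index([1, 2, 3], [1, 1, 1], [3.0, 2.0, 1.0]))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of risk scores against event times.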


Corrigendum to: Fast Lasso method for large-scale and ultrahigh-dimensional Cox model with applications to UK Biobank

Biostatistics ◽

10.1093/biostatistics/kxab019 ◽

2021 ◽

Author(s):

Ruilin Li ◽

Christopher Chang ◽

Johanne M Justesen ◽

Yosuke Tanigawa ◽

Junyang Qian ◽

...

Keyword(s):

Large Scale ◽

Cox Model ◽

UK Biobank ◽

Lasso Method


Clustered Mendelian Randomization analyses identifies distinct and opposing pathways in the causal association between insulin-like growth factor-1 and type 2 diabetes mellitus

10.1101/2021.05.12.21257093 ◽

2021 ◽

Author(s):

Wenyi Wang ◽

Ephrem Baraki Tesfay ◽

Ko Willems van Dijk ◽

Andrzej Bartke ◽

Diana van Heemst ◽

...

Keyword(s):

Type 2 Diabetes ◽

Growth Factor ◽

Mendelian Randomization ◽

Causal Effect ◽

Proportional Hazard ◽

UK Biobank ◽

Heterogeneous Distribution ◽

Cox Proportional Hazard ◽

The UK

Aims/hypothesis: There is inconsistent evidence for the causal role of serum insulin-like growth factor-1 (IGF-1) concentration in the pathogenesis of type 2 diabetes. Here, we investigated the association between IGF-1 and type 2 diabetes using a combination of multivariable-adjusted and (clustered) Mendelian Randomization (MR) analyses in the UK Biobank. Methods: We conducted Cox proportional hazard analyses in 451,232 European-ancestry individuals of the UK Biobank (55.3% women, mean age at recruitment 56.6 years), among which 13,247 individuals developed type 2 diabetes during up to 12 years of follow-up. In addition, we conducted two-sample MR analyses based on independent SNPs associated with IGF-1. Given the heterogeneity between the causal estimates of individual instruments (P-value for Q statistic=4.03e-145), we also conducted clustered MR analyses. Biological pathway analyses of the identified clusters were performed by overrepresentation analyses. Results: In the Cox proportional hazard models, with IGF-1 concentrations stratified in quintiles, we observed that participants in the lowest quintile had the highest relative risk of type 2 diabetes (HR: 1.31; CI: 1.23-1.39). In contrast, in the two-sample MR analyses, higher genetically-influenced IGF-1 was associated with a higher risk of type 2 diabetes. Based on the heterogeneous distribution of causal effect estimates, six clusters associated either with a lower or a higher risk of type 2 diabetes were identified. The main clusters in which a higher IGF-1 was associated with a lower risk of type 2 diabetes consisted of instruments mapping to genes in the growth-hormone signaling pathway, whereas the main clusters in which a higher IGF-1 was associated with a higher risk of type 2 diabetes consisted of instruments mapping to genes in pathways related to amino acid metabolism and genomic integrity. 
Conclusion: The IGF-1 associated SNPs used as genetic instruments in MR analyses showed a heterogeneous distribution of causal effect estimates on the risk of type 2 diabetes. This was likely explained by differences in the underlying molecular pathways that increase IGF-1 concentration and differentially mediate the effects of IGF-1 on type 2 diabetes.
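The two-sample MR step and the heterogeneity check mentioned in the Methods can be illustrated with a minimal sketch. It assumes the standard Wald-ratio and inverse-variance-weighted (IVW) estimators with first-order standard errors, and computes Cochran's Q across instruments; the abstract does not state which estimators the authors used, and the function name `ivw_and_cochran_q` and the toy summary statistics are hypothetical.

```python
def ivw_and_cochran_q(beta_exposure, beta_outcome, se_outcome):
    """Return (IVW estimate, Cochran's Q, per-SNP Wald ratios).

    beta_exposure -- SNP-exposure effect estimates (must be non-zero)
    beta_outcome  -- SNP-outcome effect estimates
    se_outcome    -- standard errors of the SNP-outcome estimates
    """
    if any(bx == 0 for bx in beta_exposure):
        raise ValueError("SNP-exposure effects must be non-zero")
    # Wald ratio per instrument: outcome effect divided by exposure effect.
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    # First-order standard error of each ratio (ignores exposure uncertainty).
    ratio_se = [se / abs(bx) for bx, se in zip(beta_exposure, se_outcome)]
    weights = [1.0 / s ** 2 for s in ratio_se]
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate;
    # under homogeneity it is approximately chi-square with len(ratios) - 1 df.
    q = sum(w * (r - ivw) ** 2 for w, r in zip(weights, ratios))
    return ivw, q, ratios


if __name__ == "__main__":
    # Two hypothetical instruments pointing in opposite directions.
    estimate, q_stat, wald = ivw_and_cochran_q(
        [0.1, 0.1], [0.05, -0.05], [0.01, 0.01]
    )
    print(estimate, q_stat, wald)
```

A large Q relative to its degrees of freedom signals that the instruments disagree, which is the situation that motivates clustering the instruments by their causal estimates.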


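ANALISIS SURVIVAL UNTUK DURASI PROSES KELAHIRAN MENGGUNAKAN MODEL REGRESI HAZARD ADDITIF (Survival Analysis of the Duration of the Birth Process Using the Additive Hazard Regression Model)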

Jurnal Gaussian ◽

10.14710/j.gauss.v9i4.29259 ◽

2020 ◽

Vol 9 (4) ◽

pp. 402-410

Author(s):

Triastuti Wuryandari ◽

Sri Haryatmi Kartiko ◽

Danardono Danardono

Keyword(s):

Survival Data ◽

Cox Model ◽

Hazard Model ◽

Partial Likelihood ◽

Cox Proportional Hazard Model ◽

Proportional Hazard ◽

Proportional Hazard Model ◽

Birth Process ◽

Cox Proportional Hazard ◽

Additive Hazard Model

Survival data record the length of time until an event occurs. If the survival time is affected by other factors, it can be modeled with a regression model. Regression models for survival data are commonly based on the Cox proportional hazard model, in which the covariate effects act multiplicatively on an unknown baseline hazard. An alternative to the multiplicative hazard model is the additive hazard model. One such model is the semiparametric additive hazard model introduced by Lin and Ying in 1994. The regression coefficient estimates in this model mimic the score equation of the Cox model. The score equation of the Cox model is the derivative of the partial likelihood, which is maximized by Newton-Raphson iteration. This paper describes the multiplicative and additive hazard models as applied to the duration of the birth process. The data were obtained from two clinics, one of which applies the gentle birth method while the other does not. The analysis shows that the factors affecting the duration of the birth process are the baby's weight, the baby's height and the method of birth. Keywords: survival, additive hazard model, Cox proportional hazard, partial likelihood, gentle birth, duration
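The contrast between the two model families described above can be stated in one line each. The forms below are the standard Cox model and the Lin and Ying additive model for a time-fixed covariate vector; the symbols $\lambda_0(t)$ (baseline hazard), $Z$ (covariates) and $\beta$ (regression coefficients) are notation assumed here rather than taken from the abstract.

```latex
\begin{align*}
\text{Cox (multiplicative):}\quad & \lambda(t \mid Z) = \lambda_0(t)\,\exp\!\left(\beta^\top Z\right) \\
\text{Lin--Ying (additive):}\quad & \lambda(t \mid Z) = \lambda_0(t) + \beta^\top Z
\end{align*}
```

In the first form a coefficient is read as a log hazard ratio; in the second it is read as a hazard difference.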


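Penanganan Ties Event dalam Regresi Cox Proportional Hazard Menggunakan Metode Breslow (Kasus: Pasien Rawat Inap DBD di RSAL Jala Ammari Makassar) (Handling Tied Events in Cox Proportional Hazard Regression Using the Breslow Method; Case: Dengue Hemorrhagic Fever Inpatients at RSAL Jala Ammari Makassar)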

VARIANSI: Journal of Statistics and Its application on Teaching and Research ◽

10.35580/variansiunm12897 ◽

2020 ◽

Vol 2 (1) ◽

pp. 13

Author(s):

Herawati Hafid ◽

Muhammad Nadjib Bustan ◽

Muhammad Kasim Aidid

Keyword(s):

Survival Analysis ◽

Dengue Hemorrhagic Fever ◽

Recovery Rate ◽

Cox Model ◽

Hemorrhagic Fever ◽

Hazard Ratio ◽

Partial Likelihood ◽

P Value ◽

Proportional Hazard ◽

Cox Proportional Hazard

Abstract Survival analysis is a statistical procedure for analyzing data in which the variable of interest is the time until an event occurs. Time can be expressed in days, weeks, months or years. One objective of survival analysis is to determine the relationship between the time to event and the independent variables measured during the study. A method often used in survival analysis, especially for health data, is Cox proportional hazard (PH) regression, because it does not depend on an assumed distribution of the event time. In data such as those on patients with dengue hemorrhagic fever (DHF), tied events occur and affect the formation of the risk sets in the parameter estimation of the Cox model; in the case of tied events, the partial likelihood is modified in order to determine the factors that influence the recovery rate of patients with DHF. According to the analysis, the factor that most influences the recovery rate from DHF is the leukocyte count, with a p-value reported as 0.097 < α = 0.05 and a hazard ratio of 1.1024, followed by the hematocrit, with p-value = 0.0141 < α = 0.05 and a hazard ratio of 1.595. Keywords: Survival Analysis, Cox PH Regression, Ties Event, Breslow Method, Dengue Hemorrhagic Fever (DHF).
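The modification of the partial likelihood for tied events can be written out for the Breslow method named in the title. The block below is the standard Breslow approximation; the notation is assumed here rather than taken from the abstract: $t_1 < \dots < t_k$ are the distinct event times, $d_j$ is the number of events at $t_j$, $s_j$ is the sum of the covariate vectors of those $d_j$ subjects, and $R(t_j)$ is the risk set at $t_j$.

```latex
L_{\mathrm{Breslow}}(\beta) = \prod_{j=1}^{k}
\frac{\exp\!\left(\beta^\top s_j\right)}
     {\left[\sum_{l \in R(t_j)} \exp\!\left(\beta^\top x_l\right)\right]^{d_j}}
```

When every $d_j = 1$ this reduces to the ordinary Cox partial likelihood.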


EZ-ALBI Score for Predicting Hepatocellular Carcinoma Prognosis

Liver Cancer ◽

10.1159/000508971 ◽

2020 ◽

Vol 9 (6) ◽

pp. 734-743

Author(s):

Kazuya Kariyama ◽

Kazuhiro Nouso ◽

Atsushi Hiraoka ◽

Akiko Wakuta ◽

Ayano Oonishi ◽

...

Keyword(s):

Hepatocellular Carcinoma ◽

Liver Function ◽

Large Scale ◽

Regression Coefficient ◽

Information Criterion ◽

Proportional Hazard ◽

Cox Proportional Hazard ◽

Highly Correlated ◽

Survival Risk ◽

Good Agreement

Introduction: The ALBI score is acknowledged as the gold standard for the assessment of liver function in patients with hepatocellular carcinoma (HCC). Unlike the Child-Pugh score, the ALBI score uses only objective parameters, albumin (Alb) and total bilirubin (T.Bil), enabling a better evaluation. However, the complex calculation of the ALBI score limits its applicability. Therefore, we developed a simplified ALBI score, based on data from a large-scale HCC database. We used the data of 5,249 naïve HCC cases registered in eight collaborating hospitals. Methods: We developed a new score, the EZ (Easy)-ALBI score, based on regression coefficients of Alb and T.Bil for survival risk in a multivariate Cox proportional hazard model. We also developed the EZ-ALBI grade and EZ-ALBI-T grade as alternative options for the ALBI grade and ALBI-T grade and evaluated their stratifying ability. Results: The equation used to calculate the EZ-ALBI score was simple, {[T.Bil (mg/dL)] – [9 × Alb (g/dL)]}; this value was highly correlated with the ALBI score (correlation coefficient, 0.981; p < 0.0001). The correlation was preserved across different Barcelona Clinic Liver Cancer grade scores (regression coefficient, 0.93–0.98) and across different hospitals (regression coefficient, 0.98–0.99), indicating good generalizability. Although good agreement was observed between ALBI and EZ-ALBI, discrepancies were observed in patients with poor liver function (T.Bil, ≥3 mg/dL; regression coefficient, 0.877). The stratifying abilities of the EZ-ALBI grade and EZ-ALBI-T grade were good, and their Akaike's information criterion values (35,897 and 34,812, respectively) were comparable with those of the ALBI grade and ALBI-T grade (35,914 and 34,816, respectively). Conclusions: The EZ-ALBI score, EZ-ALBI grade, and EZ-ALBI-T grade are useful, simple scores, which might replace the conventional ALBI score in the future.
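The EZ-ALBI equation quoted in the Results is simple enough to state as code. The sketch below implements only that equation as given in the abstract; the function name `ez_albi_score` and the example values are hypothetical, and no grade cut-offs are included because the abstract does not state them.

```python
def ez_albi_score(total_bilirubin_mg_dl, albumin_g_dl):
    """EZ-ALBI score as given in the abstract: T.Bil (mg/dL) - 9 x Alb (g/dL)."""
    if total_bilirubin_mg_dl < 0 or albumin_g_dl < 0:
        raise ValueError("laboratory values must be non-negative")
    return total_bilirubin_mg_dl - 9.0 * albumin_g_dl


if __name__ == "__main__":
    # Hypothetical patient: T.Bil 1.0 mg/dL, Alb 4.0 g/dL.
    print(ez_albi_score(1.0, 4.0))
```

Lower (more negative) values arise from higher albumin and lower bilirubin.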


The temporal relationship between cancer and adult onset anti-transcriptional intermediary factor 1 antibody–positive dermatomyositis

Rheumatology ◽

10.1093/rheumatology/key357 ◽

2018 ◽

Vol 58 (4) ◽

pp. 650-655 ◽

Cited By ~ 26

Author(s):

Alexander Oldroyd ◽

Jamie C Sergeant ◽

Paul New ◽

Neil J McHugh ◽

Zoe Betteridge ◽

...

Keyword(s):

Proportional Hazard ◽

Adult Onset ◽

Cox Proportional Hazard ◽

Kaplan Meier ◽

Specific Cancer ◽

Associated Malignancy ◽

Cancer Types ◽

The UK ◽

Screening Approaches

Abstract Objectives To characterize the 10 year relationship between anti-transcriptional intermediary factor 1 antibody (anti-TIF1-Ab) positivity and cancer onset in a large UK-based adult DM cohort. Methods Data from anti-TIF1-Ab-positive/-negative adults with verified diagnoses of DM from the UK Myositis Network register were analysed. Each patient was followed up until they developed cancer. Kaplan–Meier methods and Cox proportional hazard modelling were employed to estimate the cumulative cancer incidence. Results Data from 263 DM cases were analysed, with a total of 3252 person-years and a median 11 years of follow-up; 55 (21%) DM cases were anti-TIF1-Ab positive. After 10 years of follow-up, a higher proportion of anti-TIF1-Ab-positive cases developed cancer compared with anti-TIF1-Ab-negative cases: 38% vs 15% [hazard ratio 3.4 (95% CI 2.2, 5.4)]. All the detected malignancy cases in the anti-TIF1-Ab-positive cohort occurred between 3 years prior to and 2.5 years after DM onset. No cancer cases were detected within the following 7.5 years in this group, whereas cancers were detected during this period in the anti-TIF1-Ab-negative cases. Ovarian cancer was more common in the anti-TIF1-Ab-positive vs -negative cohort: 19% vs 2%, respectively (P < 0.05). No anti-TIF1-Ab-positive case <39 years of age developed cancer, compared with 21 (53%) of those ≥39 years of age. Conclusion Anti-TIF1-Ab-positive-associated malignancy occurs exclusively within the 3 year period on either side of DM onset, the risk being highest in those ≥39 years of age. Cancer types differ according to anti-TIF1-Ab status, and this may warrant specific cancer screening approaches.


Accuracy of Electronic Health Record Data for Identifying Stroke Cases in Large-Scale Epidemiological Studies: A Systematic Review from the UK Biobank Stroke Outcomes Group

PLoS ONE ◽

10.1371/journal.pone.0140533 ◽

2015 ◽

Vol 10 (10) ◽

pp. e0140533 ◽

Cited By ~ 39

Author(s):

Rebecca Woodfield ◽

Ian Grant ◽

Cathie L. M. Sudlow

Keyword(s):

Systematic Review ◽

Electronic Health Record ◽

Large Scale ◽

Epidemiological Studies ◽

Health Record ◽

UK Biobank ◽

Electronic Health Record Data ◽

Stroke Outcomes ◽

Record Data ◽

The UK


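ESTIMASI PARAMETER PADA MODEL COX MULTIVARIAT DENGAN METODE MAXIMUM PARTIAL LIKELIHOOD ESTIMATION (Parameter Estimation in the Multivariate Cox Model Using the Maximum Partial Likelihood Estimation Method)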

Jurnal Ilmiah Matematika dan Pendidikan Matematika ◽

10.20884/1.jmp.2012.4.1.2954 ◽

2012 ◽

Vol 4 (1) ◽

pp. 185

Author(s):

Irfan Wahyudi ◽

Purhadi Purhadi ◽

Sutikno Sutikno ◽

Irhamah Irhamah

Keyword(s):

Parameter Estimation ◽

Likelihood Function ◽

Cox Model ◽

Hessian Matrix ◽

Likelihood Estimation ◽

Partial Likelihood ◽

Estimation Methods ◽

Hazard Functions ◽

Score Vector ◽

Parameter Estimation Methods

Multivariate Cox proportional hazard models have the ratio property: the ratio of the hazard functions of two individuals with covariate vectors z1 and z2 is constant (time independent). In this study we discuss the estimation of parameters in the multivariate Cox model using the Maximum Partial Likelihood Estimation (MPLE) method. To determine the estimators that maximize the log partial likelihood function, a score vector and a Hessian matrix are derived and a numerical iteration method is then applied. Here we use the Newton-Raphson method. A numerical method is needed because the system of equations obtained by setting the score vector equal to the zero vector has no closed-form solution. Studies of the multivariate Cox model, including its parameter estimation methods, are limited, yet such methods are urgently needed in related fields such as economics, engineering and medical sciences. For this reason, the goal of this study is to extend parameter estimation methods from the univariate to the multivariate case.
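The Newton-Raphson iteration described above can be sketched for the simplest setting. The example below assumes a single covariate, right-censored data and no special handling of tied event times, so the score vector and Hessian matrix reduce to scalars; it is an illustration of the general technique, not the authors' multivariate method, and the function name `cox_newton_raphson` and the toy data are hypothetical.

```python
import math


def cox_newton_raphson(times, events, x, tol=1e-8, max_iter=50):
    """Maximize the Cox log partial likelihood for one covariate.

    times  -- observed follow-up times
    events -- 1/True if the event was observed, 0/False if censored
    x      -- covariate value per subject
    Returns the estimated regression coefficient beta.
    """
    beta = 0.0
    for _ in range(max_iter):
        score = 0.0    # first derivative of the log partial likelihood
        hessian = 0.0  # second derivative (non-positive)
        for i, (t_i, d_i) in enumerate(zip(times, events)):
            if not d_i:
                continue  # censored subjects contribute only via risk sets
            s0 = s1 = s2 = 0.0
            for t_j, x_j in zip(times, x):
                if t_j >= t_i:  # subject j is still at risk at time t_i
                    w = math.exp(beta * x_j)
                    s0 += w
                    s1 += w * x_j
                    s2 += w * x_j * x_j
            mean = s1 / s0
            score += x[i] - mean
            hessian -= s2 / s0 - mean * mean
        if hessian == 0.0:
            raise ValueError("zero Hessian: covariate does not vary in any risk set")
        step = score / hessian
        beta -= step  # Newton-Raphson update
        if abs(step) < tol:
            return beta
    raise RuntimeError("Newton-Raphson did not converge")


if __name__ == "__main__":
    # Toy data: four subjects, all events observed, binary covariate.
    print(cox_newton_raphson([1, 2, 3, 4], [1, 1, 1, 1], [1, 0, 1, 0]))
```

For this toy data set the score equation can be solved by hand, giving exp(beta) = (1 + sqrt(17)) / 2, which offers a check on the iteration. With several covariates the same update applies with the score as a vector and the Hessian as a matrix, and each step requires solving a linear system.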
