Fast Lasso method for Large-scale and Ultrahigh-dimensional Cox Model with applications to UK Biobank

Author(s):  
Ruilin Li ◽  
Christopher Chang ◽  
Johanne Marie Justesen ◽  
Yosuke Tanigawa ◽  
Junyang Qian ◽  
...  

Abstract We develop a scalable and highly efficient algorithm to fit a Cox proportional hazard model by maximizing the L1-regularized (Lasso) partial likelihood function, based on the Batch Screening Iterative Lasso (BASIL) method developed in Qian et al. (2019). The output of our algorithm is the full Lasso path, that is, the parameter estimates at all predefined regularization parameters, as well as their validation accuracy measured using the concordance index (C-index) or the validation deviance. To demonstrate the effectiveness of our algorithm, we analyze a large genotype-survival time dataset across 306 disease outcomes from the UK Biobank (Sudlow et al. 2015). Our approach, which we refer to as snpnet-Cox, is implemented in a publicly available package.
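For orientation, the objective described in this abstract can be written out as follows. This is a sketch of the standard Lasso-penalized Cox partial likelihood, not a formula quoted from the paper: it assumes no tied event times, and the $1/n$ scaling is one common convention that may differ from the one used in snpnet-Cox. The symbols $x_i$ (covariates), $t_i$ (observed time), $\delta_i$ (event indicator) and $\lambda$ (regularization parameter) are notation introduced here.

```latex
\begin{align*}
\ell(\beta) &= \sum_{i:\,\delta_i = 1} \Big[ x_i^\top \beta
  - \log \sum_{j:\, t_j \ge t_i} \exp\big(x_j^\top \beta\big) \Big], \\
\hat\beta(\lambda) &= \arg\min_{\beta} \Big\{ -\tfrac{1}{n}\,\ell(\beta)
  + \lambda \lVert \beta \rVert_1 \Big\}.
\end{align*}
```

The Lasso path is then the collection of solutions $\hat\beta(\lambda)$ over the predefined grid of $\lambda$ values.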

Author(s):  
Ruilin Li ◽  
Christopher Chang ◽  
Johanne M Justesen ◽  
Yosuke Tanigawa ◽  
Junyang Qian ◽  
...  

Summary We develop a scalable and highly efficient algorithm to fit a Cox proportional hazard model by maximizing the $L^1$-regularized (Lasso) partial likelihood function, based on the Batch Screening Iterative Lasso (BASIL) method developed in Qian and others (2019). Our algorithm is particularly suitable for large-scale and high-dimensional data that do not fit in memory. The output of our algorithm is the full Lasso path, that is, the parameter estimates at all predefined regularization parameters, as well as their validation accuracy measured using the concordance index (C-index) or the validation deviance. To demonstrate the effectiveness of our algorithm, we analyze a large genotype-survival time dataset across 306 disease outcomes from the UK Biobank (Sudlow and others, 2015). We provide a publicly available implementation of the proposed approach for genetics data on top of the PLINK2 package and name it snpnet-Cox.
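The validation metric named in the summary, the concordance index, can be illustrated with a small pure-Python sketch. This is a naive O(n²) version of Harrell's C-index written for clarity, not the implementation used in snpnet-Cox; the function name `concordance_index` and its arguments are hypothetical, and pairs with tied observation times are simply skipped.

```python
def concordance_index(times, events, risk_scores):
    """Naive Harrell's C-index (illustrative sketch, not the snpnet-Cox code).

    times: observed follow-up times
    events: 1 if the event was observed, 0 if censored
    risk_scores: model scores, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only comparable if the earlier time is an event
        for j in range(n):
            if times[j] > times[i]:  # subject j outlived subject i
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # higher risk failed first: concordant
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means the scores order every comparable pair correctly, and 0.5 corresponds to random ordering.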


Biostatistics ◽  
2021 ◽  
Author(s):  
Ruilin Li ◽  
Christopher Chang ◽  
Johanne M Justesen ◽  
Yosuke Tanigawa ◽  
Junyang Qian ◽  
...  

2021 ◽  
Author(s):  
Wenyi Wang ◽  
Ephrem Baraki Tesfay ◽  
Ko Willems van Dijk ◽  
Andrzej Bartke ◽  
Diana van Heemst ◽  
...  

Aims/hypothesis: There is inconsistent evidence for the causal role of serum insulin-like growth factor-1 (IGF-1) concentration in the pathogenesis of type 2 diabetes. Here, we investigated the association between IGF-1 and type 2 diabetes using a combination of multivariable-adjusted and (clustered) Mendelian Randomization (MR) analyses in the UK Biobank. Methods: We conducted Cox proportional hazard analyses in 451,232 European-ancestry individuals of the UK Biobank (55.3% women, mean age at recruitment 56.6 years), of whom 13,247 developed type 2 diabetes during up to 12 years of follow-up. In addition, we conducted two-sample MR analyses based on independent SNPs associated with IGF-1. Given the heterogeneity between the causal estimates of individual instruments (P-value for Q statistic=4.03e-145), we also conducted clustered MR analyses. Biological pathway analyses of the identified clusters were performed by overrepresentation analyses. Results: In the Cox proportional hazard models, with IGF-1 concentrations stratified in quintiles, we observed that participants in the lowest quintile had the highest relative risk of type 2 diabetes (HR: 1.31; CI: 1.23-1.39). In contrast, in the two-sample MR analyses, higher genetically-influenced IGF-1 was associated with a higher risk of type 2 diabetes. Based on the heterogeneous distribution of causal effect estimates, six clusters associated either with a lower or a higher risk of type 2 diabetes were identified. The main clusters in which a higher IGF-1 was associated with a lower risk of type 2 diabetes consisted of instruments mapping to genes in the growth-hormone signaling pathway, whereas the main clusters in which a higher IGF-1 was associated with a higher risk of type 2 diabetes consisted of instruments mapping to genes in pathways related to amino acid metabolism and genomic integrity.
Conclusion: The IGF-1 associated SNPs used as genetic instruments in MR analyses showed a heterogeneous distribution of causal effect estimates on the risk of type 2 diabetes. This was likely explained by differences in the underlying molecular pathways that increase IGF-1 concentration and differentially mediate the effects of IGF-1 on type 2 diabetes.
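The heterogeneity check mentioned in the Methods (the Q statistic across instruments) can be sketched as below. This is a generic inverse-variance-weighted (IVW) estimate with Cochran's Q using first-order weights, given only as an illustration; it is an assumption that the study used this exact form, and the function and argument names are hypothetical.

```python
def ivw_and_cochran_q(beta_exposure, beta_outcome, se_outcome):
    """IVW causal estimate and Cochran's Q from per-SNP summary statistics.

    beta_exposure: SNP effects on the exposure (e.g. IGF-1)
    beta_outcome:  SNP effects on the outcome (e.g. type 2 diabetes)
    se_outcome:    standard errors of the SNP-outcome effects
    """
    # Wald ratio estimate of the causal effect for each instrument.
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    # First-order inverse-variance weights: 1 / (se_y / |bx|)^2.
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    total_weight = sum(weights)
    ivw = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    # Cochran's Q: weighted squared deviations of per-SNP estimates from IVW.
    q = sum(w * (r - ivw) ** 2 for w, r in zip(weights, ratios))
    return ivw, q
```

Under homogeneous causal effects, Q is approximately chi-squared with (number of instruments − 1) degrees of freedom, so a very large Q, as reported in the abstract, motivates clustering the instruments.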


2020 ◽  
Vol 9 (4) ◽  
pp. 402-410
Author(s):  
Triastuti Wuryandari ◽  
Sri Haryatmi Kartiko ◽  
Danardono Danardono

Survival data record the length of time until an event occurs. If the survival time is affected by other factors, it can be modeled with a regression model. Regression models for survival data are commonly based on the Cox proportional hazard model, in which the covariate effects act multiplicatively on an unknown baseline hazard. An alternative to the multiplicative hazard model is the additive hazard model. One such model is the semiparametric additive hazard model introduced by Lin and Ying in 1994. The estimating equation for the regression coefficients in this model mimics the score equation of the Cox model. The score equation of the Cox model is the derivative of the log partial likelihood, which is maximized by Newton-Raphson iteration. The subject of this paper is to describe the multiplicative and additive hazard models as applied to the duration of the birth process. The data were obtained from two different clinics: one applies the gentlebirth method and the other does not. The analysis shows that the factors affecting the duration of the birth process are the baby's weight, the baby's height, and the method of birth. Keywords: survival, additive hazard model, cox proportional hazard, partial likelihood, gentlebirth, duration
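The contrast between the two model classes in this abstract can be stated compactly. The forms below are the standard textbook versions with time-fixed covariates $Z$ and an unspecified baseline hazard $\lambda_0(t)$; the notation is introduced here and is not taken from the paper.

```latex
\begin{align*}
\text{Cox (multiplicative):}\quad & \lambda(t \mid Z) = \lambda_0(t)\,\exp\big(\beta^\top Z\big), \\
\text{Lin and Ying (additive):}\quad & \lambda(t \mid Z) = \lambda_0(t) + \beta^\top Z .
\end{align*}
```

In the first form $\exp(\beta_k)$ is a hazard ratio per unit of covariate $k$, whereas in the second $\beta_k$ is a hazard difference.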


Author(s):  
Herawati Hafid ◽  
Muhammad Nadjib Bustan ◽  
Muhammad Kasim Aidid

Abstract Survival analysis is a statistical procedure used to analyze data in which the variable of interest is the time until an event occurs. Time can be expressed in days, weeks, months, or years. One objective of survival analysis is to determine the relationship between the time to event and the independent variables measured at the time of the study. The method most often used in survival analysis, especially for health data, is Cox Proportional Hazard (PH) regression, because it does not depend on an assumption about the distribution of the event times. In data such as those on patients with Dengue Hemorrhagic Fever (DHF), tied events occur, which affect the formation of the risk sets in the parameter estimation of the Cox model. For tied events, the partial likelihood is modified in order to identify the factors that influence the recovery rate of DHF patients. The analysis found that the factor most strongly influencing the recovery rate is leukocyte count (reported p-value = 0.097 against α = 0.05; hazard ratio 1.1024), followed by hematocrit (reported p-value = 0.0141 against α = 0.05; hazard ratio 1.595). Keywords: Survival Analysis, Cox PH Regression, Ties Event, Breslow Method, Dengue Hemorrhagic Fever (DHF).
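The Breslow method listed in the keywords modifies the partial likelihood so that all subjects tied at an event time share the same risk-set denominator. A minimal pure-Python sketch of the Breslow-approximated log partial likelihood follows; the function name and argument layout are hypothetical and this is not the authors' code.

```python
import math

def breslow_log_partial_likelihood(times, events, covariates, beta):
    """Cox log partial likelihood with the Breslow approximation for ties.

    times: observed times; events: 1 for an event, 0 for censoring
    covariates: one list of covariate values per subject
    beta: list of regression coefficients
    """
    # Linear predictor beta'x for every subject.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    n = len(times)
    loglik = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        # Subjects whose event occurs exactly at t (the tied set).
        tied = [i for i in range(n) if times[i] == t and events[i]]
        # Risk set: everyone still under observation at time t.
        risk_sum = sum(math.exp(eta[i]) for i in range(n) if times[i] >= t)
        # Breslow: each tied event uses the same full risk-set denominator.
        loglik += sum(eta[i] for i in tied) - len(tied) * math.log(risk_sum)
    return loglik
```

With no ties this reduces to the ordinary Cox log partial likelihood; the approximation is usually considered adequate when the number of ties at each time is small relative to the risk set.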


Liver Cancer ◽  
2020 ◽  
Vol 9 (6) ◽  
pp. 734-743
Author(s):  
Kazuya Kariyama ◽  
Kazuhiro Nouso ◽  
Atsushi Hiraoka ◽  
Akiko Wakuta ◽  
Ayano Oonishi ◽  
...  

Introduction: The ALBI score is acknowledged as the gold standard for the assessment of liver function in patients with hepatocellular carcinoma (HCC). Unlike the Child-Pugh score, the ALBI score uses only objective parameters, albumin (Alb) and total bilirubin (T.Bil), enabling a better evaluation. However, the complex calculation of the ALBI score limits its applicability. Therefore, we developed a simplified ALBI score, based on data from a large-scale HCC database. We used the data of 5,249 naïve HCC cases registered in eight collaborating hospitals. Methods: We developed a new score, the EZ (Easy)-ALBI score, based on regression coefficients of Alb and T.Bil for survival risk in a multivariate Cox proportional hazard model. We also developed the EZ-ALBI grade and EZ-ALBI-T grade as alternative options for the ALBI grade and ALBI-T grade and evaluated their stratifying ability. Results: The equation used to calculate the EZ-ALBI score was simple, {[T.Bil (mg/dL)] – [9 × Alb (g/dL)]}; this value correlated highly with the ALBI score (correlation coefficient, 0.981; p < 0.0001). The correlation was preserved across different Barcelona Clinic Liver Cancer grades (regression coefficient, 0.93–0.98) and across different hospitals (regression coefficient, 0.98–0.99), indicating good generalizability. Although good agreement was observed between ALBI and EZ-ALBI, discrepancies were observed in patients with poor liver function (T.Bil, ≥3 mg/dL; regression coefficient, 0.877). The stratifying ability of the EZ-ALBI grade and EZ-ALBI-T grade was good, and their Akaike's information criterion values (35,897 and 34,812, respectively) were comparable with those of the ALBI grade and ALBI-T grade (35,914 and 34,816, respectively). Conclusions: The EZ-ALBI score, EZ-ALBI grade, and EZ-ALBI-T grade are useful, simple scores, which might replace the conventional ALBI score in the future.
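The EZ-ALBI equation quoted in the Results is simple enough to write as a one-line function. The sketch below only encodes that equation; the function name is hypothetical, and the cut-offs for the EZ-ALBI grade are not given in the abstract, so no grading is attempted.

```python
def ez_albi_score(total_bilirubin_mg_dl, albumin_g_dl):
    """EZ-ALBI score as stated in the abstract: T.Bil (mg/dL) - 9 * Alb (g/dL)."""
    return total_bilirubin_mg_dl - 9.0 * albumin_g_dl


# Example: T.Bil 1.0 mg/dL and Alb 4.0 g/dL give 1.0 - 36.0 = -35.0.
example = ez_albi_score(1.0, 4.0)
```

Lower (more negative) values correspond to higher albumin and lower bilirubin, which points the same way as the ALBI score the abstract reports it correlates with.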


Rheumatology ◽  
2018 ◽  
Vol 58 (4) ◽  
pp. 650-655 ◽  
Author(s):  
Alexander Oldroyd ◽  
Jamie C Sergeant ◽  
Paul New ◽  
Neil J McHugh ◽  
Zoe Betteridge ◽  
...  

Abstract Objectives: To characterize the 10 year relationship between anti-transcriptional intermediary factor 1 antibody (anti-TIF1-Ab) positivity and cancer onset in a large UK-based adult DM cohort. Methods: Data from anti-TIF1-Ab-positive/-negative adults with verified diagnoses of DM from the UK Myositis Network register were analysed. Each patient was followed up until they developed cancer. Kaplan–Meier methods and Cox proportional hazard modelling were employed to estimate the cumulative cancer incidence. Results: Data from 263 DM cases were analysed, with a total of 3252 person-years and a median 11 years of follow-up; 55 (21%) DM cases were anti-TIF1-Ab positive. After 10 years of follow-up, a higher proportion of anti-TIF1-Ab-positive cases developed cancer compared with anti-TIF1-Ab-negative cases: 38% vs 15% [hazard ratio 3.4 (95% CI 2.2, 5.4)]. All the detected malignancy cases in the anti-TIF1-Ab-positive cohort occurred between 3 years prior to and 2.5 years after DM onset. No cancer cases were detected within the following 7.5 years in this group, whereas cancers were detected during this period in the anti-TIF1-Ab-negative cases. Ovarian cancer was more common in the anti-TIF1-Ab-positive vs -negative cohort: 19% vs 2%, respectively (P < 0.05). No anti-TIF1-Ab-positive case <39 years of age developed cancer, compared with 21 (53%) of those ≥39 years of age. Conclusion: Anti-TIF1-Ab-positive-associated malignancy occurs exclusively within the 3 year period on either side of DM onset, the risk being highest in those ≥39 years of age. Cancer types differ according to anti-TIF1-Ab status, and this may warrant specific cancer screening approaches.
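The Kaplan–Meier approach mentioned in the Methods can be sketched in a few lines. This is a generic product-limit estimator written for illustration under the usual right-censoring assumptions, not the authors' analysis code; the function name is hypothetical, and cumulative incidence is taken as one minus the survival estimate.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival at each distinct event time.

    times: follow-up times; events: 1 if the event (e.g. cancer) occurred,
    0 if the subject was censored. Returns a list of (time, survival) pairs.
    """
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e}):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Events occurring exactly at time t.
        observed = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - observed / at_risk
        curve.append((t, survival))
    return curve


def cumulative_incidence(curve):
    """Convert a survival curve into cumulative incidence, 1 - S(t)."""
    return [(t, 1.0 - s) for t, s in curve]
```

Comparing such curves between antibody-positive and antibody-negative groups is the descriptive counterpart of the hazard ratio reported in the Results.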


2012 ◽  
Vol 4 (1) ◽  
pp. 185
Author(s):  
Irfan Wahyudi ◽  
Purhadi Purhadi ◽  
Sutikno Sutikno ◽  
Irhamah Irhamah

Multivariate Cox proportional hazard models have the proportionality property: the ratio of the hazard functions for two individuals with covariate vectors z1 and z2 is constant (time independent). In this study we discuss the estimation of parameters in the multivariate Cox model by the Maximum Partial Likelihood Estimation (MPLE) method. To find the estimators that maximize the ln-partial likelihood function, a score vector and a Hessian matrix are derived and a numerical iteration method is then applied; here we use the Newton-Raphson method. A numerical method is needed because the system of equations obtained by setting the score vector equal to the zero vector has no closed-form solution. Studies of the multivariate Cox model, including its parameter estimation methods, are limited, yet such methods are urgently needed in fields such as economics, engineering, and medical sciences. For this reason, the goal of this study is to extend the parameter estimation methods from the univariate to the multivariate case.
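The estimation steps described here (score vector, Hessian matrix, Newton-Raphson update) take the following form in the standard single-outcome Cox model without ties. This is given as a reference sketch only; the multivariate extension developed in the paper is not reproduced, and the notation ($z_i$ covariates, $\delta_i$ event indicator, $R_i$ risk set at the event time of subject $i$) is introduced here.

```latex
\begin{align*}
w_{ij}(\beta) &= \frac{\exp(\beta^\top z_j)}{\sum_{k \in R_i} \exp(\beta^\top z_k)},
\qquad
\bar z_i(\beta) = \sum_{j \in R_i} w_{ij}(\beta)\, z_j, \\
U(\beta) &= \frac{\partial \ln L(\beta)}{\partial \beta}
  = \sum_{i=1}^{n} \delta_i \big[ z_i - \bar z_i(\beta) \big], \\
H(\beta) &= \frac{\partial^2 \ln L(\beta)}{\partial \beta\, \partial \beta^\top}
  = -\sum_{i=1}^{n} \delta_i \Big[ \sum_{j \in R_i} w_{ij}(\beta)\, z_j z_j^\top
  - \bar z_i(\beta)\, \bar z_i(\beta)^\top \Big], \\
\beta^{(m+1)} &= \beta^{(m)} - H\big(\beta^{(m)}\big)^{-1} U\big(\beta^{(m)}\big).
\end{align*}
```

Iteration stops when the change in $\beta$ or the norm of $U(\beta)$ falls below a chosen tolerance.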


2019 ◽  
Vol 29 ◽  
pp. S125-S126
Author(s):  
Amanda Gentry ◽  
Roseann Peterson ◽  
Alexis Edwards ◽  
Brien Riley ◽  
B. Todd Webb
