Network-adjusted Kendall’s Tau Measure for Feature Screening with Application to High-dimensional Survival Genomic Data

Author(s):  
Jie-Huei Wang, Yi-Hau Chen

Abstract

Motivation: In high-dimensional genetic/genomic data, the identification of genes related to a clinical survival trait is a challenging and important issue. In particular, right-censored survival outcomes and contaminated biomarker data make the relevant feature screening difficult. Several independence screening methods have been developed, but they fail to account for gene–gene dependency information and may be sensitive to outlying feature data.

Results: We improve the inverse probability-of-censoring weighted (IPCW) Kendall’s tau statistic by using Google’s PageRank Markov matrix to incorporate feature dependency network information. In addition, to tackle outlying feature data, the nonparanormal approach, which transforms the feature data to multivariate normal variates, is utilized in the graphical lasso procedure to estimate the network structure of the feature data. Simulation studies under various scenarios show that the proposed network-adjusted weighted Kendall’s tau approach leads to more accurate feature selection and survival prediction than methods that do not account for feature dependency network information and outlying feature data. Applications to the clinical survival outcome data of diffuse large B-cell lymphoma and of The Cancer Genome Atlas lung adenocarcinoma patients clearly demonstrate the advantages of the new proposal over the alternative methods.

Supplementary information: Supplementary data are available at Bioinformatics online.
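The IPCW Kendall's tau statistic that the proposal builds on can be sketched in a few lines: an ordered pair is usable only when the earlier observed time is an event, and it is weighted by the inverse squared Kaplan–Meier estimate of the censoring survival function. The sketch below is a minimal illustration under those assumptions only; the function names, the O(n²) pair loop, and tie handling via `np.sign` are our own choices, and the network (PageRank) adjustment described in the abstract is not included.

```python
import numpy as np

def km_censoring(time, event):
    """Kaplan-Meier estimate of the censoring survival function G(t),
    evaluated at each subject's observed time (censoring = 1 - event)."""
    order = np.argsort(time)
    t_sorted, cens = time[order], 1 - event[order]
    n = len(t_sorted)
    at_risk = n - np.arange(n)                      # n, n-1, ..., 1
    surv = np.cumprod(1.0 - cens / at_risk)         # step-function values
    G = np.ones(n)
    for i, ti in enumerate(time):
        k = np.searchsorted(t_sorted, ti, side="right")
        if k > 0:
            G[i] = surv[k - 1]
    return np.clip(G, 1e-8, None)

def ipcw_kendall_tau(x, time, event):
    """IPCW Kendall's tau between feature x and a right-censored time.
    An ordered pair (i, j) with time[i] < time[j] contributes only when
    subject i had the event, weighted by 1 / G(time[i])^2."""
    x, time, event = (np.asarray(a, float) for a in (x, time, event))
    G = km_censoring(time, event)
    n = len(x)
    num = 0.0
    for i in range(n):
        if event[i] == 0:
            continue                                # censored: pair not usable
        for j in range(n):
            if time[i] < time[j]:
                num += np.sign(x[j] - x[i]) / G[i] ** 2
    return 2.0 * num / (n * (n - 1))
```

With no censoring the weights are all 1 and the statistic reduces to ordinary Kendall's tau.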

1997, Vol 81 (2), pp. 655-658
Author(s):  
Manuel Martinez-Pons

A rank-free method of calculating Kendall's τ is described. Derived from Daniels' (1944) general treatment of correlation, it is based on the signs of all possible paired comparisons in a data set. Unlike the alternative methods of Cooper (1975) and Stuart (1977), it allows for ties and thus yields the same coefficient as Kendall's original method. Execution is simpler, however, because the method does not require ranking the data.
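The rank-free computation described here is straightforward to express with sign comparisons alone. A minimal sketch (our own function name; we assume the tie-corrected tau-b normalization, which is what makes tied data work out):

```python
import numpy as np

def tau_rank_free(x, y):
    """Kendall's tau computed directly from the signs of all paired
    comparisons (no ranking step), with a tie-corrected denominator."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    dx = np.sign(x[:, None] - x[None, :])   # sign of every paired x comparison
    dy = np.sign(y[:, None] - y[None, :])
    num = np.sum(dx * dy)                   # 2 * (concordant - discordant)
    den = np.sqrt(np.sum(dx * dx) * np.sum(dy * dy))
    return num / den
```

Tied pairs contribute a sign of zero, so they drop out of the numerator and shrink the denominator, which is exactly the tie correction.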


Author(s):  
Baoshan Ma, Ge Yan, Bingjie Chai, Xiaoyu Hou

Abstract

Motivation: Survival analysis using gene expression profiles plays a crucial role in the interpretation of clinical research and the assessment of disease therapy programs. Several prediction models have been developed to explore the relationship between patients’ covariates and survival. However, high-dimensional genomic features limit the prediction performance of survival models. Thus, an accurate and reliable prediction model is necessary for survival analysis using high-dimensional genomic data.

Results: In this study, we propose an improved survival prediction model based on the XGBoost framework, called XGBLC, which uses Lasso-Cox to enhance the ability to analyze high-dimensional genomic data. Novel first- and second-order gradient statistics of Lasso-Cox are defined to construct the loss function of XGBLC. We extensively tested the XGBLC algorithm on both simulated and real-world datasets, and evaluated model performance with 5-fold cross-validation. On 20 cancer datasets from The Cancer Genome Atlas (TCGA), XGBLC outperforms five state-of-the-art survival methods in terms of C-index, Brier score and AUC. Results on simulated datasets of different scales show that XGBLC maintains good accuracy and robustness. The developed prediction model would help physicians understand the effects of a patient’s genomic characteristics on survival and make personalized treatment decisions.

Availability and implementation: An R implementation of the XGBLC algorithm is available at https://github.com/lab319/XGBLC

Supplementary information: Supplementary data are available at Bioinformatics online.
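The paper defines its own first- and second-order gradient statistics for the Lasso-Cox loss; as a rough illustration of the two ingredients such a custom boosting objective must supply, here is a plain (unpenalized, Breslow-ties) negative Cox log partial likelihood gradient and diagonal Hessian with respect to the per-subject predictions. This is a generic sketch with our own names, not XGBLC's actual loss:

```python
import numpy as np

def cox_grad_hess(eta, time, event):
    """First-order gradient and diagonal Hessian of the negative Cox log
    partial likelihood (Breslow ties) w.r.t. the linear predictor eta --
    the two per-sample statistics a gradient-boosting objective returns."""
    ex = np.exp(eta)
    grad = np.zeros_like(eta, dtype=float)
    hess = np.zeros_like(eta, dtype=float)
    for j in range(len(eta)):
        if event[j] == 0:
            continue                        # censored subjects add no event term
        in_risk = time >= time[j]           # risk set at the j-th event time
        p = np.where(in_risk, ex, 0.0) / ex[in_risk].sum()
        grad += p                           # softmax weight of each subject
        hess += p * (1.0 - p)
    grad -= event                           # minus the event indicator
    return grad, hess
```

An L1 (lasso) penalty on the boosted coefficients would then be layered on top of these statistics, which is the part that is specific to XGBLC.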


2020, Vol 36 (10), pp. 3004-3010
Author(s):  
Huang Xu, Xiang Li, Yaning Yang, Yi Li, Jose Pinheiro, ...

Abstract

Motivation: With the emergence of high-dimensional genomic data, genetic analyses such as genome-wide association studies (GWAS) have played an important role in identifying disease-related genetic variants and novel treatments. Complex longitudinal phenotypes are commonly collected in medical studies. However, since limited analytical approaches are available for longitudinal traits, these data are often underutilized. In this article, we develop a high-throughput machine learning approach for multilocus GWAS using longitudinal traits, coupling empirical Bayesian estimates from mixed-effects modeling with a novel ℓ0-norm algorithm.

Results: Extensive simulations demonstrated that the proposed approach not only provides accurate selection of single-nucleotide polymorphisms (SNPs) with comparable or higher power but also robust control of false positives. More importantly, the approach is highly scalable and can be more than 1000 times faster than recently published approaches, making genome-wide multilocus analysis of longitudinal traits possible. In addition, it can analyze millions of SNPs simultaneously if computer memory allows, thereby potentially enabling a true multilocus analysis of high-dimensional genomic data. Applying the method to data from the Alzheimer's Disease Neuroimaging Initiative, we confirmed that it identifies well-known SNPs associated with Alzheimer's disease and is much faster than recently published approaches (≥6000 times).

Availability and implementation: The source code and the testing datasets are available at https://github.com/Myuan2019/EBE_APML0.

Supplementary information: Supplementary data are available at Bioinformatics online.
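As a toy illustration of the two-stage idea (reduce each subject's longitudinal trait to a per-subject summary, then run an ℓ0-type selection over SNPs), the sketch below substitutes per-subject least-squares slopes for the empirical Bayesian estimates and exhaustive best-subset search for the published ℓ0-norm algorithm. Both substitutions, and all names, are ours; the real method scales to millions of SNPs, which exhaustive search obviously does not.

```python
import numpy as np
from itertools import combinations

def subject_slopes(times, Y):
    """Stage 1 (toy stand-in for the empirical Bayesian estimates): reduce
    each subject's longitudinal trait to a per-subject least-squares slope.
    times: (T,) shared measurement times; Y: (n_subjects, T) trait values."""
    X = np.column_stack([np.ones_like(times), times])
    coef, *_ = np.linalg.lstsq(X, Y.T, rcond=None)  # coef shape: (2, n_subjects)
    return coef[1]

def l0_best_subset(G, y, k):
    """Stage 2 (toy stand-in for the l0-norm algorithm): exhaustive search
    for at most k SNP columns of G minimising the residual sum of squares
    when regressing the slope phenotype y on the selected columns."""
    best, best_rss = (), np.sum((y - y.mean()) ** 2)
    for size in range(1, k + 1):
        for idx in combinations(range(G.shape[1]), size):
            Xs = np.column_stack([np.ones(len(y)), G[:, idx]])
            fit = Xs @ np.linalg.lstsq(Xs, y, rcond=None)[0]
            rss = np.sum((y - fit) ** 2)
            if rss < best_rss:
                best, best_rss = idx, rss
    return best
```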


2020, Vol 11 (1)
Author(s):  
R. J. M. Bruls, R. M. Kwee

Abstract

Background: The objective of this study is to investigate the workload for radiologists during on-call hours and to quantify the 15-year trend in a large general hospital in Western Europe.

Methods: Data on the number of X-ray, ultrasound and computed tomography (CT) studies during on-call hours (weekdays between 6.00 p.m. and 7.00 a.m., weekends, and national holidays) between 2006 and 2020 were extracted from the picture archiving and communication system. All studies were converted into relative value units (RVUs) to estimate the on-call workload. The Mann–Kendall test was performed to assess the temporal trend.

Results: The total RVUs during on-call hours increased significantly between 2006 and 2020 (Kendall's tau-b = 0.657, p = 0.001); the overall on-call workload in terms of RVUs quadrupled. The number of X-ray studies decreased significantly (Kendall's tau-b = −0.433, p = 0.026), whereas the number of CT studies increased significantly (Kendall's tau-b = 0.875, p < 0.001). CT studies that increased by more than 500% between 2006 and 2020 are CT for head trauma, brain CTA, brain CTV, chest CT (for suspected pulmonary embolism), spinal CT, neck CT, pelvic CT, and CT for suspected aortic dissection. The number of ultrasound studies did not change significantly (Kendall's tau-b = 0.202, p = 0.298).

Conclusions: The workload for radiologists during on-call hours increased dramatically over the past 15 years, driven by the growing number of CT studies. The radiologist and technician workforce should be matched to this ongoing upward trend to avoid potential burn-out and to maintain the quality and safety of radiological care.
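The Mann–Kendall trend test used here amounts to computing Kendall's tau between the observation years and the yearly values. A minimal sketch with `scipy.stats.kendalltau` (the RVU numbers below are invented for illustration and are not the study's data):

```python
import numpy as np
from scipy.stats import kendalltau

# Hypothetical yearly on-call workload in relative value units (RVUs),
# one value per year 2006-2020; illustrative numbers only.
years = np.arange(2006, 2021)
rvu = np.array([100, 108, 121, 135, 150, 166, 184, 205, 228,
                254, 283, 315, 351, 390, 412])

# Mann-Kendall trend test: Kendall's tau-b of the values against the years
tau, p_value = kendalltau(years, rvu)
```

A positive tau with a small p-value indicates a significant monotone upward trend, which is how the tau-b and p values quoted in the abstract should be read.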

