Some recent statistical learning methods for longitudinal high-dimensional data

2013 ◽  
Vol 6 (1) ◽  
pp. 10-18 ◽  
Author(s):  
Shuo Chen ◽  
Edward Grant ◽  
Tong Tong Wu ◽  
F. DuBois Bowman


2019 ◽  
pp. 1-9 ◽  
Author(s):  
Yize Zhao ◽  
Changgee Chang ◽  
Qi Long

High-dimensional -omics data such as genomic, transcriptomic, and metabolomic data offer great promise in advancing precision medicine. In particular, such data have enabled the investigation of complex diseases such as cancer at an unprecedented scale and in multiple dimensions. However, a number of analytical challenges complicate the analysis of high-dimensional -omics data. One is the growing recognition that complex diseases such as cancer are multifactorial and may be attributed to harmful changes on multiple -omics levels and on the pathway level. When individual genes in an important pathway have relatively weak signals, it can be challenging to detect them on their own, but the aggregated signal in the pathway can be considerably stronger and hence easier to detect with the same sample size. To address these challenges, there is a growing body of literature on knowledge-guided statistical learning methods for the analysis of high-dimensional -omics data that can incorporate biological knowledge such as functional genomics and functional proteomics. These methods have been shown to improve prediction and classification accuracy and to yield more biologically interpretable results than statistical learning methods that do not use biological knowledge. In this review, we survey current knowledge-guided statistical learning methods, including both supervised learning and unsupervised learning, and their applications to precision oncology, and we discuss future research directions.


2021 ◽  
Author(s):  
Mu Yue

In high-dimensional data, penalized regression is often used for variable selection and parameter estimation. However, these methods typically require time-consuming cross-validation to select tuning parameters, and they tend to retain more false positives under high dimensionality. This chapter discusses sparse-boosting-based machine learning methods for three high-dimensional problems. First, a sparse boosting method for selecting important biomarkers is studied for right-censored survival data with high-dimensional biomarkers. Then, a two-step sparse boosting method for variable selection and model-based prediction is studied for high-dimensional longitudinal observations measured repeatedly over time. Finally, a multi-step sparse boosting method for identifying patient subgroups that exhibit different treatment effects is studied for high-dimensional dense longitudinal observations. The chapter addresses how to improve the accuracy and computational speed of variable selection and parameter estimation in high-dimensional data. It aims to expand the application scope of sparse boosting and to develop new methods for high-dimensional survival analysis, longitudinal data analysis, and subgroup analysis, which have great application prospects.
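
To make the idea of boosting as a variable selector concrete, here is a minimal sketch of plain componentwise L2 boosting for a linear model. It is not the sparse boosting procedure studied in the chapter, whose details the abstract does not give; the function name `componentwise_l2_boost`, the fixed number of steps, the step size, and the simulated data are all assumptions made for illustration.

```python
import random


def componentwise_l2_boost(X, y, n_steps=50, step_size=0.1):
    """Componentwise L2 boosting for a linear model (illustrative sketch).

    X is a list of n rows with p numeric entries each, y a list of n
    responses. Returns (intercept, coefs) with coefs a list of length p.
    """
    n, p = len(X), len(X[0])
    # Center each column and the response; the intercept is recovered at the end.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - col_means[j] for row in X] for j in range(p)]
    col_ss = [sum(v * v for v in col) for col in cols]
    y_mean = sum(y) / n
    residual = [yi - y_mean for yi in y]
    coefs = [0.0] * p

    for _ in range(n_steps):
        best_j, best_gain, best_beta = None, 0.0, 0.0
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # a constant column carries no information
            dot = sum(c * r for c, r in zip(cols[j], residual))
            gain = dot * dot / col_ss[j]  # drop in residual sum of squares
            if gain > best_gain:
                best_j, best_gain, best_beta = j, gain, dot / col_ss[j]
        if best_j is None:
            break  # residual is orthogonal to every predictor
        # Take a shrunken step along the single best-fitting predictor.
        coefs[best_j] += step_size * best_beta
        residual = [r - step_size * best_beta * c
                    for r, c in zip(residual, cols[best_j])]

    intercept = y_mean - sum(b * m for b, m in zip(coefs, col_means))
    return intercept, coefs


if __name__ == "__main__":
    # Hypothetical simulated data with more predictors than observations.
    rng = random.Random(0)
    n, p = 40, 200
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3 * row[0] - 2 * row[5] + rng.gauss(0, 0.1) for row in X]
    intercept, coefs = componentwise_l2_boost(X, y, n_steps=50)
    selected = [j for j, b in enumerate(coefs) if b != 0.0]
    print("selected predictors:", selected)
```

Because each step updates only one coefficient, the fitted model can have at most `n_steps` nonzero coefficients, so stopping early acts as the selection mechanism. In this simplified sketch the number of steps plays the role of a tuning parameter and would still have to be chosen somehow, for example by an information criterion.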


2015 ◽  
Vol 14s5 ◽  
pp. CIN.S30804 ◽  
Author(s):  
Amin Zollanvari

High-dimensional data generally refer to data in which the number of variables is larger than the sample size. Analyzing such datasets poses great challenges for classical statistical learning because the finite-sample performance of methods developed within classical statistical learning does not live up to classical asymptotic premises, in which the sample size grows without bound for a fixed dimensionality of observations. Much work has been done in developing mathematical-statistical techniques for analyzing high-dimensional data. Despite remarkable progress in this field, many practitioners still use classical methods for analyzing such datasets. This state of affairs can be attributed, in part, to a lack of knowledge and, in part, to the ready-to-use computational and statistical software packages that are well developed for classical techniques. Moreover, many scientists working in a specific field of high-dimensional statistical learning are either not aware of other existing machineries in the field or are not willing to try them out. The primary goal of this work is to bring together various machineries of high-dimensional analysis, give an overview of the important results, and present the operating conditions upon which they are grounded. When appropriate, readers are referred to relevant review articles for more information on a specific subject.
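
As a small illustration of why the regime with more variables than observations strains classical methods, consider ordinary least squares. This example is chosen here and is not taken from the abstract; the symbols $X$ (an $n \times p$ design matrix), $y$, $\beta$, and $\lambda$ are notation assumed for the sketch.

```latex
\hat{\beta}_{\mathrm{OLS}} = (X^\top X)^{-1} X^\top y,
\qquad
\operatorname{rank}(X^\top X) = \operatorname{rank}(X) \le \min(n, p) = n < p .
```

When $p > n$, the $p \times p$ matrix $X^\top X$ is therefore singular, the inverse does not exist, and the least-squares solution is not unique. A regularized estimator such as ridge regression, $\hat{\beta}_{\lambda} = (X^\top X + \lambda I_p)^{-1} X^\top y$ with $\lambda > 0$, remains well defined because $X^\top X + \lambda I_p$ is positive definite.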


2009 ◽  
Vol 35 (7) ◽  
pp. 859-866
Author(s):  
Ming LIU ◽  
Xiao-Long WANG ◽  
Yuan-Chao LIU
