Some recent statistical learning methods for longitudinal high-dimensional data

Shuo Chen; Edward Grant; Tong Tong Wu; F. DuBois Bowman

doi:10.1002/wics.1282

Knowledge-Guided Statistical Learning Methods for Analysis of High-Dimensional -Omics Data in Precision Oncology

JCO Precision Oncology, 2019, pp. 1-9

doi:10.1200/po.19.00018

Cited by ~2

Author(s): Yize Zhao; Changgee Chang; Qi Long

Keyword(s): Statistical Learning; Current Knowledge; Complex Diseases; High Dimensional; Future Research; Great Promise; Biological Knowledge; Precision Oncology; Omics Data; Learning Methods

High-dimensional -omics data such as genomic, transcriptomic, and metabolomic data offer great promise for advancing precision medicine. In particular, such data have enabled the investigation of complex diseases such as cancer at an unprecedented scale and in multiple dimensions. However, several analytical challenges complicate the analysis of high-dimensional -omics data. One is the growing recognition that complex diseases such as cancer are multifactorial and may be attributed to harmful changes at multiple -omics levels and at the pathway level. When individual genes in an important pathway have relatively weak signals, they can be difficult to detect on their own, but the aggregated signal of the pathway can be considerably stronger and hence easier to detect with the same sample size. To address these challenges, a growing body of literature has developed knowledge-guided statistical learning methods for high-dimensional -omics data that incorporate biological knowledge such as functional genomics and functional proteomics. These methods have been shown to improve prediction and classification accuracy and to yield more biologically interpretable results than statistical learning methods that do not use biological knowledge. In this review, we survey current knowledge-guided statistical learning methods, including both supervised and unsupervised learning, and their applications to precision oncology, and we discuss future research directions.
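
The pathway-aggregation argument can be checked with a toy simulation. This is a minimal sketch under strong simplifying assumptions that do not come from the review: the k pathway genes are independent with unit variance, all share the same small case-control mean shift `delta`, and the pathway score is simply each sample's average over the k genes. The function names and parameter values are hypothetical. Under these assumptions the per-gene t statistic is roughly delta·sqrt(n/2), while averaging shrinks the noise variance by a factor of k, so the pathway-score t statistic is roughly sqrt(k) times larger.

```python
import numpy as np

def two_sample_t(a, b):
    """Welch two-sample t statistic for 1-D arrays a and b."""
    va, vb = a.var(ddof=1), b.var(ddof=1)
    return (a.mean() - b.mean()) / np.sqrt(va / a.size + vb / b.size)

def simulate_pathway(n=50, k=20, delta=0.3, seed=0):
    """Hypothetical toy model: k independent, unit-variance pathway genes,
    each shifted by the same small amount `delta` in cases vs. controls
    (n samples per group). Returns the per-gene t statistics and the
    t statistic of the pathway score (per-sample mean over the k genes)."""
    rng = np.random.default_rng(seed)
    controls = rng.normal(0.0, 1.0, size=(n, k))
    cases = rng.normal(delta, 1.0, size=(n, k))
    gene_t = np.array([two_sample_t(cases[:, j], controls[:, j]) for j in range(k)])
    pathway_t = two_sample_t(cases.mean(axis=1), controls.mean(axis=1))
    return gene_t, pathway_t

gene_t, pathway_t = simulate_pathway()
print("median per-gene |t|:", np.median(np.abs(gene_t)).round(2))
print("pathway-score t:   ", round(pathway_t, 2))
```

With correlated genes, or genes whose effects differ in sign, a plain average gains much less, so this sketch overstates the benefit relative to realistic settings.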

Sparse Boosting Based Machine Learning Methods for High-Dimensional Data

2021

doi:10.5772/intechopen.100506

Author(s): Mu Yue

Keyword(s): Machine Learning; Parameter Estimation; Variable Selection; Survival Data; High Dimensional Data; High Dimensional; Learning Methods; Require Time; Machine Learning Methods; Boosting Method

In high-dimensional data, penalized regression is often used for variable selection and parameter estimation. However, these methods typically require time-consuming cross-validation to select tuning parameters, and they tend to retain more false positives in high dimensions. This chapter discusses sparse boosting based machine learning methods for the following high-dimensional problems. First, a sparse boosting method for selecting important biomarkers is studied for right-censored survival data with high-dimensional biomarkers. Second, a two-step sparse boosting method that performs variable selection and model-based prediction is studied for high-dimensional longitudinal observations measured repeatedly over time. Finally, a multi-step sparse boosting method for identifying patient subgroups that exhibit different treatment effects is studied for high-dimensional dense longitudinal observations. The chapter addresses how to improve the accuracy and computational speed of variable selection and parameter estimation in high-dimensional data. It aims to broaden the scope of sparse boosting and to develop new methods for high-dimensional survival analysis, longitudinal data analysis, and subgroup analysis, which have wide potential applications.
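
The abstract does not give enough detail to reproduce the chapter's sparse boosting methods. As a rough illustration of the underlying idea only, the sketch below implements plain componentwise L2 boosting, a simpler relative of sparse boosting: each step updates the single predictor that best fits the current residuals, so a modest number of small steps leaves most coefficients at exactly zero. Sparse boosting as usually defined instead chooses each update with a penalized criterion that accounts for model complexity and uses such a criterion to decide when to stop; here the step count `n_steps` and shrinkage `nu` are fixed, hypothetical choices, and the simulated data are purely illustrative.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_steps=200, nu=0.1):
    """Plain componentwise L2 boosting (not the chapter's sparse boosting).
    At each step, fit every single-predictor least-squares learner to the
    current residuals, keep the one that reduces the residual sum of squares
    most, and move its coefficient by a shrunken step nu.
    Assumes the columns of X and the response y are centered."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()
    col_ss = (X ** 2).sum(axis=0)          # per-column sum of squares
    for _ in range(n_steps):
        coef = X.T @ resid / col_ss        # univariate LS coefficient per column
        rss_drop = coef ** 2 * col_ss      # RSS reduction for each candidate
        j = int(np.argmax(rss_drop))
        beta[j] += nu * coef[j]
        resid -= nu * coef[j] * X[:, j]
    return beta

# Illustrative p > n example: only the first three predictors matter.
rng = np.random.default_rng(1)
n, p = 100, 500
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
true_beta = np.zeros(p)
true_beta[:3] = [3.0, -2.0, 1.5]
y = X @ true_beta + rng.normal(scale=0.5, size=n)
y -= y.mean()

beta_hat = componentwise_l2_boost(X, y)
print("number of nonzero coefficients:", np.count_nonzero(beta_hat))
```

Because at most one coefficient changes per step, the number of selected predictors is bounded by `n_steps`; in practice the stopping step is the tuning parameter, which is the quantity the chapter's methods aim to choose without costly cross-validation.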

High-Dimensional Statistical Learning: Roots, Justifications, and Potential Machineries

Cancer Informatics, 2015, Vol 14s5, pp. CIN.S30804

doi:10.4137/cin.s30804

Cited by ~2

Author(s): Amin Zollanvari

Keyword(s): Sample Size; Statistical Learning; High Dimensional Data; Operating Conditions; High Dimensional; Finite Sample; Statistical Software; Software Packages; State Of Affairs; Remarkable Progress

High-dimensional data generally refer to data in which the number of variables is larger than the sample size. Analyzing such datasets poses great challenges for classical statistical learning, because the finite-sample performance of classical methods does not match what classical asymptotic theory predicts: that theory assumes the sample size grows without bound while the dimensionality of the observations stays fixed. Much work has been done on developing mathematical-statistical techniques for analyzing high-dimensional data. Despite remarkable progress in this field, many practitioners still use classical methods to analyze such datasets. This state of affairs can be attributed partly to a lack of awareness and partly to the ready-to-use computational and statistical software packages that are well developed for classical techniques. Moreover, many scientists working in a specific area of high-dimensional statistical learning are either unaware of other existing machineries in the field or unwilling to try them out. The primary goal of this work is to bring together various machineries of high-dimensional analysis, give an overview of the important results, and present the operating conditions on which they are grounded. Where appropriate, readers are referred to relevant review articles for more information on a specific subject.
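
Two quick numerical checks illustrate why fixed-dimension reasoning can break down. They are a sketch with arbitrary, hypothetical sizes and seeds, not an example from the article. First, when p > n the p×p sample covariance matrix has rank at most n − 1 and cannot be inverted, which rules out methods that plug in its inverse, such as classical linear discriminant analysis. Second, even when n > p, if p/n is not small the sample eigenvalues spread far from the true value: for an identity covariance and p/n → c ≤ 1 they fill roughly the interval [(1 − √c)², (1 + √c)²] (the Marchenko–Pastur law), about [0.09, 2.91] for c = 0.5.

```python
import numpy as np

def sample_cov_rank(n, p, seed=0):
    """Rank of the p x p sample covariance of n i.i.d. N(0, I_p) draws."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    S = np.cov(X, rowvar=False)   # rows are observations; centered, ddof=1
    return np.linalg.matrix_rank(S)

def sample_eig_range(n, p, seed=0):
    """Smallest and largest eigenvalues of the sample covariance when the
    true covariance is the identity (every true eigenvalue equals 1)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    eig = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    return eig.min(), eig.max()

print(sample_cov_rank(n=20, p=50))    # p > n: rank is at most n - 1
print(sample_cov_rank(n=200, p=50))   # p < n: full rank p
lo, hi = sample_eig_range(n=200, p=100)
print(round(lo, 3), round(hi, 3))     # spread far from 1 despite n > p
```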

Machine Learning Methods for Mortality Prediction of Polytraumatized Patients in Intensive Care Units – Dealing with Imbalanced and High-Dimensional Data

Intelligent Data Engineering and Automated Learning – IDEAL 2014, Lecture Notes in Computer Science, 2014, pp. 309-317

doi:10.1007/978-3-319-10840-7_38

Cited by ~3

Author(s): María N. Moreno García; Javier González Robledo; Félix Martín González; Fernando Sánchez Hernández; Mercedes Sánchez Barba

Keyword(s): Machine Learning; Intensive Care; Intensive Care Units; High Dimensional Data; Mortality Prediction; High Dimensional; Learning Methods; Machine Learning Methods

Large Sample Covariance Matrices and High-Dimensional Data Analysis

2015

doi:10.1017/cbo9781107588080

Cited by ~26

Author(s): Jianfeng Yao; Shurong Zheng; Zhidong Bai

Keyword(s): Data Analysis; High Dimensional Data; Covariance Matrices; High Dimensional; Large Sample; Sample Covariance Matrices; Sample Covariance; High Dimensional Data Analysis

Fractal-Based Methods as a Technique for Estimating the Intrinsic Dimensionality of High-Dimensional Data: A Survey

Informatica, 2016, Vol 27 (2), pp. 257-281

doi:10.15388/informatica.2016.84

Cited by ~5

Author(s): Rasa Karbauskaitė; Gintautas Dzemyda

Keyword(s): High Dimensional Data; High Dimensional; Intrinsic Dimensionality

A Fast Clustering Algorithm for Large-scale and High Dimensional Data

ACTA AUTOMATICA SINICA, 2009, Vol 35 (7), pp. 859-866

doi:10.3724/sp.j.1004.2009.00859

Author(s): Ming LIU; Xiao-Long WANG; Yuan-Chao LIU

Keyword(s): Large Scale; Clustering Algorithm; High Dimensional Data; High Dimensional

Improved negative selection algorithm for network anomaly detection on high-dimensional data

Journal of Computer Applications, 2009, Vol 29 (3), pp. 805-807

doi:10.3724/sp.j.1087.2009.00805

Cited by ~1

Author(s): Wen-zhong GUO; Guo-long CHEN; Qing-liang CHEN

Keyword(s): Anomaly Detection; Negative Selection; High Dimensional Data; High Dimensional; Selection Algorithm; Negative Selection Algorithm; Network Anomaly Detection

An Advanced Mining Services in Predicting and Ranking User Vitality across Dynamic and High Dimensional Data Sets

SSRN Electronic Journal, 2019

doi:10.2139/ssrn.3395242

Author(s): Ch. Durga Bhavani; Dr. A. Daveedu Raju; Dr. V. Surya Narayana

Keyword(s): High Dimensional Data; High Dimensional; Data Sets

Outlier Detection in High Dimensional Data Based on the Anti-Hub and Regression Technique

International Journal for Research in Applied Science and Engineering Technology, 2017, Vol V (VIII), pp. 1543-1551

doi:10.22214/ijraset.2017.8219

Author(s): Golla Hemalatha

Keyword(s): Outlier Detection; High Dimensional Data; Regression Technique; High Dimensional
