Sparse Boosting Based Machine Learning Methods for High-Dimensional Data

Mapping Intimacies ◽

10.5772/intechopen.100506 ◽

2021 ◽

Author(s):

Mu Yue

Keyword(s):

Machine Learning ◽

Parameter Estimation ◽

Variable Selection ◽

Survival Data ◽

High Dimensional Data ◽

High Dimensional ◽

Learning Methods ◽

Require Time ◽

Machine Learning Methods ◽

Boosting Method

In high-dimensional data, penalized regression is often used for variable selection and parameter estimation. However, these methods typically require time-consuming cross-validation to select tuning parameters, and they tend to retain more false positives as dimensionality grows. This chapter discusses sparse boosting based machine learning methods for three high-dimensional problems. First, a sparse boosting method for selecting important biomarkers is studied for right-censored survival data with high-dimensional biomarkers. Then, a two-step sparse boosting method for variable selection and model-based prediction is studied for high-dimensional longitudinal observations measured repeatedly over time. Finally, a multi-step sparse boosting method for identifying patient subgroups that exhibit different treatment effects is studied for high-dimensional dense longitudinal observations. The chapter addresses how to improve the accuracy and computational speed of variable selection and parameter estimation in high-dimensional data. It aims to broaden the application scope of sparse boosting and to develop new methods for high-dimensional survival analysis, longitudinal data analysis, and subgroup analysis, areas with broad application prospects.
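
The abstract above does not spell out the algorithm, so the following is only a minimal sketch of plain componentwise L2 boosting, the base procedure that sparse boosting variants build on. It is not the chapter's method: it uses a fixed number of steps rather than a complexity-penalized selection and stopping rule, and the function name, step size `nu`, and synthetic data are all assumptions made for illustration.

```python
import random

def componentwise_l2_boost(X, y, steps=100, nu=0.1):
    """Componentwise L2 boosting (illustrative, fixed number of steps).

    X is a list of n rows with p features, y a list of n responses.
    Each step fits the single column that most reduces the residual
    sum of squares and moves its coefficient by a shrunken amount.
    """
    n, p = len(X), len(X[0])
    # Centre the response and every column so no intercept is needed.
    y_mean = sum(y) / n
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - col_means[j] for row in X] for j in range(p)]
    resid = [v - y_mean for v in y]
    norms = [sum(v * v for v in c) for c in cols]
    beta = [0.0] * p
    for _ in range(steps):
        best_j, best_gain, best_b = None, 0.0, 0.0
        for j in range(p):
            if norms[j] == 0.0:
                continue
            dot = sum(c * r for c, r in zip(cols[j], resid))
            gain = dot * dot / norms[j]  # RSS reduction from fitting column j
            if gain > best_gain:
                best_j, best_gain, best_b = j, gain, dot / norms[j]
        if best_j is None:
            break
        beta[best_j] += nu * best_b
        resid = [r - nu * best_b * c for r, c in zip(resid, cols[best_j])]
    return beta

# Synthetic example (hypothetical data): 200 features, only two carry signal.
random.seed(0)
n, p = 60, 200
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[3] + random.gauss(0, 0.1) for row in X]
beta = componentwise_l2_boost(X, y)
selected = [j for j, b in enumerate(beta) if b != 0.0]
```

Features never chosen keep a coefficient of exactly zero, which is how boosting performs variable selection without a separately cross-validated penalty parameter. With a fixed step count some noise features may still enter; a data-driven stopping rule is what would control that.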

Machine Learning Methods for Mortality Prediction of Polytraumatized Patients in Intensive Care Units – Dealing with Imbalanced and High-Dimensional Data

Intelligent Data Engineering and Automated Learning – IDEAL 2014 - Lecture Notes in Computer Science ◽

10.1007/978-3-319-10840-7_38 ◽

2014 ◽

pp. 309-317 ◽

Cited By ~ 3

Author(s):

María N. Moreno García ◽

Javier González Robledo ◽

Félix Martín González ◽

Fernando Sánchez Hernández ◽

Mercedes Sánchez Barba

Keyword(s):

Machine Learning ◽

Intensive Care ◽

Intensive Care Units ◽

High Dimensional Data ◽

Mortality Prediction ◽

High Dimensional ◽

Learning Methods ◽

Machine Learning Methods

Prediction of Hanwoo Cattle Phenotypes from Genotypes Using Machine Learning Methods

Animals ◽

10.3390/ani11072066 ◽

2021 ◽

Vol 11 (7) ◽

pp. 2066

Author(s):

Swati Srivastava ◽

Bryan Irvine Lopez ◽

Himansu Kumar ◽

Myoungjin Jang ◽

Han-Ha Chai ◽

...

Keyword(s):

Machine Learning ◽

Support Vector ◽

Learning Methods ◽

Eye Muscle ◽

Important Species ◽

Machine Learning Methods ◽

Extreme Gradient Boosting ◽

Boosting Method ◽

Predictive Correlation ◽

Hanwoo Cattle

Hanwoo cattle were originally raised for draft purposes, but rising local demand for red meat shifted that purpose to full-scale meat-type cattle rearing; Hanwoo is now considered one of the most economically important species and a vital food source for Koreans. The application of genomic selection in Hanwoo breeding programs in recent years was expected to lead to higher genetic progress. However, better statistical methods that can improve genomic prediction accuracy are required. Hence, this study aimed to compare the predictive performance of three machine learning methods, namely random forest (RF), extreme gradient boosting (XGB), and support vector machine (SVM), when predicting carcass weight (CWT), marbling score (MS), backfat thickness (BFT), and eye muscle area (EMA). Phenotypic and genotypic data (53,866 SNPs) from 7324 commercial Hanwoo cattle slaughtered at around 30 months of age were used. XGB showed the highest predictive correlation for CWT and MS, followed by genomic best linear unbiased prediction (GBLUP), SVM, and RF. Meanwhile, the best predictive correlation for BFT and EMA was delivered by GBLUP, followed by SVM, RF, and XGB. Although XGB presented the highest predictive correlations for some traits, we did not find an advantage of XGB or any other machine learning method over GBLUP according to the mean squared error of prediction. Thus, we still recommend the use of GBLUP in the prediction of genomic breeding values for carcass traits in Hanwoo cattle.
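
The abstract above ranks methods differently by predictive correlation and by mean squared error of prediction. A small worked example, using made-up numbers rather than any data from the study, shows why the two metrics can disagree: correlation ignores scale and bias, while mean squared error penalizes both. The helper names `pearson` and `mse` are assumptions for this sketch.

```python
def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def mse(a, b):
    """Mean squared error between observations and predictions."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Hypothetical phenotypes and two hypothetical sets of predictions.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
pred_a = [2.0 * v for v in observed]   # perfectly correlated but mis-scaled
pred_b = [1.2, 1.7, 3.4, 3.8, 5.1]     # slightly noisy but well calibrated

corr_a, corr_b = pearson(observed, pred_a), pearson(observed, pred_b)
mse_a, mse_b = mse(observed, pred_a), mse(observed, pred_b)
```

Here `pred_a` has the higher correlation yet the larger error, so a method can lead on one metric and trail on the other.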

A Comparison of Machine Learning Methods in a High-Dimensional Classification Problem

Business Systems Research Journal ◽

10.2478/bsrj-2014-0021 ◽

2014 ◽

Vol 5 (3) ◽

pp. 82-96 ◽

Cited By ~ 3

Author(s):

Marijana Zekić-Sušac ◽

Sanja Pfeifer ◽

Nataša Šarlija

Keyword(s):

Neural Network ◽

Machine Learning ◽

Classification Accuracy ◽

Classification Problem ◽

High Dimensional ◽

Nearest Neighbour ◽

Learning Methods ◽

Machine Learning Methods ◽

Dimensional Classification ◽

Artificial Neural

Background: High-dimensional data modelling often relies on variable reduction methods in the pre-processing and post-processing stages. However, such reduction usually discards information and lowers the accuracy of the model. Objectives: The aim of this paper is to assess the high-dimensional classification problem of recognizing the entrepreneurial intentions of students using machine learning methods. Methods/Approach: Four methods were tested on the same dataset to compare their classification accuracy: artificial neural networks, CART classification trees, support vector machines, and k-nearest neighbour. The performance of each method was compared on ten subsamples in a 10-fold cross-validation procedure, and the sensitivity and specificity of each model were computed. Results: The artificial neural network model based on a multilayer perceptron yielded a higher classification rate than the models produced by the other methods. The pairwise t-test showed a statistically significant difference between the artificial neural network and the k-nearest neighbour model, while the differences among the other methods were not statistically significant. Conclusions: The tested machine learning methods are able to learn fast and achieve high classification accuracy. However, further improvement may be possible by testing a few additional methodological refinements.
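
The evaluation procedure described above (10-fold cross-validation with sensitivity and specificity) can be sketched as follows. This is an illustration under stated assumptions, not the paper's setup: it uses only a simple k-nearest-neighbour classifier on synthetic two-class data, and every function name and parameter here is hypothetical.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (labels 0/1)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = sum(label for _, label in dists[:k])
    return 1 if 2 * votes > k else 0

def cross_validate(X, y, folds=10):
    """Pool the confusion counts over all held-out folds."""
    tp = tn = fp = fn = 0
    for test_idx in kfold_indices(len(X), folds):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        for i in test_idx:
            pred = knn_predict(train_X, train_y, X[i])
            if pred == 1 and y[i] == 1:
                tp += 1
            elif pred == 0 and y[i] == 0:
                tn += 1
            elif pred == 1:
                fp += 1
            else:
                fn += 1
    return {
        "accuracy": (tp + tn) / len(X),
        "sensitivity": tp / (tp + fn),  # share of true positives recovered
        "specificity": tn / (tn + fp),  # share of true negatives recovered
    }

# Synthetic two-class data: 50 points around (0, 0) and 50 around (3, 3).
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(50)]
y = [0] * 50 + [1] * 50
scores = cross_validate(X, y)
```

Comparing several classifiers would mean running each one over the same folds, so that differences in the scores reflect the methods rather than the split.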

TADA: phylogenetic augmentation of microbiome samples enhances phenotype classification

Bioinformatics ◽

10.1093/bioinformatics/btz394 ◽

2019 ◽

Vol 35 (14) ◽

pp. i31-i40 ◽

Cited By ~ 1

Author(s):

Erfan Sayyari ◽

Ban Kawas ◽

Siavash Mirarab

Keyword(s):

Machine Learning ◽

Sample Size ◽

Data Augmentation ◽

Training Data ◽

Supplementary Information ◽

High Dimensional ◽

Learning Methods ◽

Machine Learning Methods ◽

Phenotype Classification ◽

Microbiome Data

Motivation: Learning associations of traits with the microbial composition of a set of samples is a fundamental goal in microbiome studies. Recently, machine learning methods have been explored for this goal, with some promise. However, in comparison to other fields, microbiome data are high-dimensional and not abundant, leading to a high-dimensional, low-sample-size, under-determined system. Moreover, microbiome data are often unbalanced and biased. Given such training data, machine learning methods often fail to perform a classification task with sufficient accuracy. Lack of signal is especially problematic when classes are represented in an unbalanced way in the training data, with some classes under-represented. The presence of inter-correlations among subsets of observations further compounds these issues. As a result, machine learning methods have had only limited success in predicting many traits from the microbiome. Data augmentation, which consists of building synthetic samples and adding them to the training data, is a technique that has proved helpful for many machine learning tasks. Results: In this paper, we propose a new data augmentation technique for classifying phenotypes based on the microbiome. Our algorithm, called TADA, uses available data and a statistical generative model to create new samples augmenting existing ones, addressing issues of low sample size. In generating new samples, TADA takes into account phylogenetic relationships between microbial species. On two real datasets, we show that adding these synthetic samples to the training set improves the accuracy of downstream classification, especially when the training data have an unbalanced representation of classes. Availability and implementation: TADA is available at https://github.com/tada-alg/TADA. Supplementary information: Supplementary data are available at Bioinformatics online.

The Joint Model of Longitudinal and Survival Data—Based on Machine Learning Methods

Statistics and Applications ◽

10.12677/sa.2015.44028 ◽

2015 ◽

Vol 04 (04) ◽

pp. 252-261

Author(s):

征温

Keyword(s):

Machine Learning ◽

Survival Data ◽

Joint Model ◽

Learning Methods ◽

Machine Learning Methods ◽

Longitudinal And Survival Data

A comparative study of machine learning methods for time-to-event survival data for radiomics risk modelling

Scientific Reports ◽

10.1038/s41598-017-13448-3 ◽

2017 ◽

Vol 7 (1) ◽

Cited By ~ 69

Author(s):

Stefan Leger ◽

Alex Zwanenburg ◽

Karoline Pilz ◽

Fabian Lohaus ◽

Annett Linge ◽

...

Keyword(s):

Machine Learning ◽

Comparative Study ◽

Survival Data ◽

Time To Event ◽

Learning Methods ◽

Risk Modelling ◽

Machine Learning Methods

Morphological classification of brains via high-dimensional shape transformations and machine learning methods

NeuroImage ◽

10.1016/j.neuroimage.2003.09.027 ◽

2004 ◽

Vol 21 (1) ◽

pp. 46-57 ◽

Cited By ~ 236

Author(s):

Zhiqiang Lao ◽

Dinggang Shen ◽

Zhong Xue ◽

Bilge Karacali ◽

Susan M. Resnick ◽

...

Keyword(s):

Machine Learning ◽

High Dimensional ◽

Morphological Classification ◽

Learning Methods ◽

Machine Learning Methods ◽

Shape Transformations ◽

Dimensional Shape

Using high-dimensional machine learning methods to estimate an anatomical risk factor for Alzheimer's disease across imaging databases

NeuroImage ◽

10.1016/j.neuroimage.2018.08.040 ◽

2018 ◽

Vol 183 ◽

pp. 401-411 ◽

Cited By ~ 13

Author(s):

Ramon Casanova ◽

Ryan T. Barnard ◽

Sarah A. Gaussoin ◽

Santiago Saldana ◽

Kathleen M. Hayden ◽

...

Keyword(s):

Machine Learning ◽

Alzheimer's Disease ◽

Risk Factor ◽

High Dimensional ◽

Learning Methods ◽

Machine Learning Methods

Machine learning methods for robust parameter estimation

Artificial Intelligence for Computational Modeling of the Heart ◽

10.1016/b978-0-12-817594-1.00016-4 ◽

2020 ◽

pp. 161-181

Author(s):

Dominik Neumann ◽

Tommaso Mansi

Keyword(s):

Machine Learning ◽

Parameter Estimation ◽

Learning Methods ◽

Machine Learning Methods ◽

Robust Parameter ◽

Robust Parameter Estimation

Editorial: Application of Novel Statistical and Machine-Learning Methods to High-Dimensional Clinical Cancer and (Multi-)Omics Data

Frontiers in Genetics ◽

10.3389/fgene.2021.739442 ◽

2021 ◽

Vol 12 ◽

Author(s):

Chao Xu ◽

Shaolong Cao ◽

Md. Ashad Alam

Keyword(s):

Machine Learning ◽

High Dimensional ◽

Omics Data ◽

Learning Methods ◽

Machine Learning Methods ◽

Clinical Cancer
