AucPR: An AUC-based approach using penalized regression for disease prediction with high-dimensional omics data

BMC Genomics ◽  
2014 ◽  
Vol 15 (S10) ◽  
Author(s):  
Wenbao Yu ◽  
Taesung Park
2017 ◽  
Vol 2017 ◽  
pp. 1-14 ◽  
Author(s):  
Anne-Laure Boulesteix ◽  
Riccardo De Bin ◽  
Xiaoyu Jiang ◽  
Mathias Fuchs

As modern biotechnologies advance, it is increasingly common for different modalities of high-dimensional molecular data (termed “omics” data in this paper), such as gene expression, methylation, and copy number, to be collected from the same patient cohort to predict the clinical outcome. While prediction based on omics data has been widely studied in the last fifteen years, little has been done in the statistical literature on the integration of multiple omics modalities to select a subset of variables for prediction, which is a critical task in personalized medicine. In this paper, we propose a simple penalized regression method to address this problem by assigning different penalty factors to different data modalities for feature selection and prediction. The penalty factors can be chosen in a fully data-driven fashion by cross-validation or by taking practical considerations into account. In simulation studies, we compare the prediction performance of our approach, called IPF-LASSO (Integrative LASSO with Penalty Factors) and implemented in the R package ipflasso, with the standard LASSO and sparse group LASSO. The use of IPF-LASSO is also illustrated through applications to two real-life cancer datasets. All data and code are available on the companion website to ensure reproducibility.
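The idea of modality-specific penalty factors in the abstract above can be illustrated with a minimal sketch. It is not the ipflasso implementation: it is a plain coordinate-descent LASSO written for this listing, and the function names, the objective scaling, the simulated data, and the penalty factors are all assumptions chosen for illustration.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by LASSO coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, penalty_factors, n_iter=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * sum_j pf_j * |b_j|.

    penalty_factors[j] is the (hypothetical) factor of the modality that
    feature j belongs to; a larger factor penalizes that modality more.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # covariance of feature j with the partial residual
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam * penalty_factors[j]) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Simulated example (assumed data): features 0-2 form modality A,
# features 3-5 form modality B, and only feature 0 drives the outcome.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(40)]
y = [2.0 * row[0] + rng.gauss(0, 0.1) for row in X]

# Modality B is penalized four times as heavily as modality A.
beta = weighted_lasso(X, y, lam=0.1, penalty_factors=[1, 1, 1, 4, 4, 4])
print([round(b, 3) for b in beta])
```

In practice the penalty factors would be chosen from a candidate grid by cross-validation, as the abstract describes; the fixed values here only show how the factors enter the penalty.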


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Hongwei Sun ◽  
Jiu Wang ◽  
Zhongwen Zhang ◽  
Naibao Hu ◽  
Tong Wang

High dimensionality and noise make it difficult to detect related biomarkers in omics data. Previous studies have shown that penalized maximum trimmed likelihood estimation is effective in identifying mislabeled samples in high-dimensional data with labeling errors. However, the algorithm commonly used in these studies is the concentration step (C-step), and the C-step algorithm applied to robust penalized regression does not guarantee that the criterion function improves at each iteration, because the regularization parameters change during the iteration. This makes the C-step algorithm run very slowly, especially on high-dimensional omics data. The AR-Cstep (C-step combined with an acceptance-rejection scheme) algorithm is proposed to address this. In simulation experiments, the AR-Cstep algorithm converged faster (the average computation time was only 2% of that of the C-step algorithm) and was more accurate in terms of variable selection and outlier identification than the C-step algorithm. The two algorithms were further compared on triple negative breast cancer (TNBC) RNA-seq data. AR-Cstep solves the problem of the C-step not converging and ensures that each iteration moves in a direction that improves the criterion function. As an improvement of the C-step algorithm, the AR-Cstep algorithm can be extended to other robust models with regularization parameters.
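A concentration step can be sketched on a toy problem. The code below is not the AR-Cstep algorithm, whose details the abstract does not give: it is a trimmed, ridge-shrunk location estimate with a simple accept-or-reject guard that keeps a candidate subset only if the criterion decreases. The function names, the starting subset, and the data are assumptions made for illustration.

```python
def fit_location(values, lam):
    """Ridge-shrunk mean: argmin over mu of sum (v - mu)^2 + lam * mu^2."""
    return sum(values) / (len(values) + lam)


def criterion(y, subset, mu, lam):
    """Trimmed penalized criterion evaluated on the retained subset."""
    return sum((y[i] - mu) ** 2 for i in subset) + lam * mu ** 2


def c_step_with_guard(y, h, lam=0.0, max_iter=50):
    """Iterate concentration steps, accepting a step only if it improves."""
    subset = list(range(h))  # naive start: the first h observations
    mu = fit_location([y[i] for i in subset], lam)
    best = criterion(y, subset, mu, lam)
    for _ in range(max_iter):
        # concentration step: keep the h points with the smallest residuals
        cand = sorted(range(len(y)), key=lambda i: (y[i] - mu) ** 2)[:h]
        cand_mu = fit_location([y[i] for i in cand], lam)
        cand_val = criterion(y, cand, cand_mu, lam)
        if cand_val >= best:  # reject: the criterion did not improve
            break
        subset, mu, best = cand, cand_mu, cand_val  # accept the candidate
    outliers = sorted(set(range(len(y))) - set(subset))
    return mu, outliers


# Toy data (assumed): observations 0 and 6 are contaminated.
y = [10.0, 1.0, 1.1, 0.9, 1.2, 0.8, 12.0]
mu, outliers = c_step_with_guard(y, h=5)
print(mu, outliers)
```

With a fixed penalty, as here, the plain concentration step already decreases the criterion monotonically; the guard matters in the setting the abstract describes, where the regularization parameter is re-tuned between iterations.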


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Van Hoan Do ◽  
Stefan Canzar

Emerging single-cell technologies profile multiple types of molecules within individual cells. A fundamental step in the analysis of the produced high-dimensional data is their visualization using dimensionality reduction techniques such as t-SNE and UMAP. We introduce j-SNE and j-UMAP as their natural generalizations to the joint visualization of multimodal omics data. Our approach automatically learns the relative contribution of each modality to a concise representation of cellular identity that promotes discriminative features but suppresses noise. On eight datasets, j-SNE and j-UMAP produce unified embeddings that better agree with known cell types and that harmonize RNA and protein velocity landscapes.


2019 ◽  
Author(s):  
Wikum Dinalankara ◽  
Qian Ke ◽  
Donald Geman ◽  
Luigi Marchionni

Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that is significantly different from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample level analysis, and is applicable on many different omics platforms. The divergence package, available on the R platform through the Bioconductor repository collection, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with sample high throughput sequencing data from the Cancer Genome Atlas.
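The univariate ternary coding described above can be sketched as follows. This does not use or reproduce the divergence package: as a simplifying assumption, the baseline range of each feature is taken to be the minimum and maximum observed in the baseline cohort, whereas the package estimates the baseline support differently. All function names and values are hypothetical.

```python
def ternary_code(value, low, high):
    """Return -1 below the baseline range, +1 above it, and 0 inside it."""
    if value < low:
        return -1
    if value > high:
        return 1
    return 0


def digitize_profile(profile, baseline_profiles):
    """Digitize one omics profile feature by feature against a baseline cohort."""
    codes = []
    for j, value in enumerate(profile):
        column = [b[j] for b in baseline_profiles]
        # assumed baseline range: min and max over the baseline cohort
        codes.append(ternary_code(value, min(column), max(column)))
    return codes


# Toy values (assumed): three baseline samples, three features each.
baseline = [
    [1.0, 5.0, 3.0],
    [2.0, 6.0, 3.5],
    [1.5, 5.5, 2.5],
]
sample = [0.5, 5.2, 4.0]
print(digitize_profile(sample, baseline))  # [-1, 0, 1]
```

A binary code follows by mapping both -1 and +1 to 1, so that each entry records only whether the feature diverges from the baseline.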


Diabetes is a chronic disease that leads to multiple complications. It has become a silent killer because it often shows no signs to patients until it is too late. It causes complications in other organs and systems, such as the kidneys, the cardiovascular system, the liver, and blood pressure [1]. This work applies a multitask learning approach [2] to simultaneously model the relationships among multiple complications, where each task corresponds to modelling the risk of one complication [3]. It also uses feature selection to reduce the set of risk factors from high-dimensional datasets. Then, using correlation, it measures the degree of association among the various side effects. The proposed method can identify possible future health hazards for a diabetes patient. This helps explain medical conditions and can improve healthcare applications and disease prediction performance.


PLoS ONE ◽  
2021 ◽  
Vol 16 (4) ◽  
pp. e0249002
Author(s):  
Wikum Dinalankara ◽  
Qian Ke ◽  
Donald Geman ◽  
Luigi Marchionni

Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that is significantly different from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample level analysis, and is applicable on many different omics platforms. The divergence package, available on the R platform through the Bioconductor repository collection, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with data from the Cancer Genome Atlas.


2018 ◽  
pp. 447-472
Author(s):  
N. Sedaghat ◽  
I.B. Stanway ◽  
S.Z. Zangeneh ◽  
T. Bammler ◽  
A. Shojaie
