Enhancing the Lasso Approach for Developing a Survival Prediction Model Based on Gene Expression Data

2015 ◽  
Vol 2015 ◽  
pp. 1-7 ◽  
Author(s):  
Shuhei Kaneko ◽  
Akihiro Hirakawa ◽  
Chikuma Hamada

In the past decade, researchers in oncology have sought to develop survival prediction models using gene expression data. The least absolute shrinkage and selection operator (lasso) has been widely used to select genes that are truly correlated with a patient’s survival. The lasso selects genes for prediction by shrinking the coefficients of a large number of candidate genes towards zero based on a tuning parameter, which is often determined by cross-validation (CV). However, this method can fail to identify true positive genes (i.e., it produces false negatives) in certain instances, because the lasso tends to favor a simple prediction model. Here, we attempt to monitor the occurrence of false negatives by developing a method that assumes a mixture distribution for the lasso estimates and estimates the number of true positive (TP) genes for a series of values of the tuning parameter. We performed a simulation study to examine the precision of the method in estimating the number of TP genes. Additionally, we applied our method to a real gene expression dataset and found that it was able to identify genes correlated with survival that a CV method was unable to detect.
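The shrinkage behaviour described in the abstract above can be illustrated with a minimal sketch. This is not the authors' method: it uses a plain linear-regression lasso fitted by cyclic coordinate descent rather than a survival (Cox) model, and the simulated data, the function names `soft_threshold` and `lasso_coordinate_descent`, and the chosen values of the tuning parameter are all assumptions made for illustration. It only shows how a larger tuning parameter sets more coefficients exactly to zero, which is how true positive genes can be passed over.

```python
import random

def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # residual = y - X @ beta, and beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes j's own contribution
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta

# Hypothetical simulated data: 20 candidate "genes", only the first 3 truly associated.
random.seed(0)
n, p = 60, 20
true_beta = [2.0, -2.0, 1.5] + [0.0] * (p - 3)
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(x * b for x, b in zip(row, true_beta)) + random.gauss(0, 0.5) for row in X]

# Smallest tuning parameter at which every coefficient is shrunk to zero.
lam_max = max(abs(sum(X[i][j] * y[i] for i in range(n)) / n) for j in range(p))

for lam in (0.05, 0.5, lam_max):
    beta = lasso_coordinate_descent(X, y, lam)
    selected = [j for j, b in enumerate(beta) if b != 0.0]
    print(f"lambda={lam:.3f}: {len(selected)} selected -> {selected}")
```

In practice the tuning parameter would be chosen by cross-validation over a grid of values; the sketch skips that step and simply prints the selected set at three values to make the trade-off visible.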

2015 ◽  
Vol 16 (1) ◽  
Author(s):  
Putri W. Novianti ◽  
Victor L. Jong ◽  
Kit C. B. Roes ◽  
Marinus J. C. Eijkemans

2010 ◽  
Vol 27 (3) ◽  
pp. 359-367 ◽  
Author(s):  
Vinicius Bonato ◽  
Veerabhadran Baladandayuthapani ◽  
Bradley M. Broom ◽  
Erik P. Sulman ◽  
Kenneth D. Aldape ◽  
...  

Leukemia ◽  
2021 ◽  
Author(s):  
Adrián Mosquera Orgueira ◽  
Marta Sonia González Pérez ◽  
José Ángel Díaz Arias ◽  
Beatriz Antelo Rodríguez ◽  
Natalia Alonso Vence ◽  
...  

2021 ◽  
Vol 12 ◽  
Author(s):  
Farzaneh Hamidi ◽  
Neda Gilani ◽  
Reza Arabi Belaghi ◽  
Parvin Sarbakhsh ◽  
Tuba Edgünlü ◽  
...  

Ovarian cancer is the second most dangerous gynecologic cancer and has a high mortality rate. Classification based on high-dimensional, small-sample gene expression data is a challenging task. The discovery of miRNAs, small non-coding RNAs 18–25 nucleotides in length that regulate gene expression, has revealed a new mechanism of gene regulation, and miRNAs have been reported to play an important role in cancer. By using LASSO and Elastic Net as embedded feature selection algorithms, the present study identified 10 miRNAs that were regulated in ovarian cancer serum samples compared to non-cancer samples in the publicly available dataset GSE106817: hsa-miR-5100, hsa-miR-6800-5p, hsa-miR-1233-5p, hsa-miR-4532, hsa-miR-4783-3p, hsa-miR-4787-3p, hsa-miR-1228-5p, hsa-miR-1290, hsa-miR-3184-5p, and hsa-miR-320b. Further, we implemented state-of-the-art machine learning classifiers, such as logistic regression, random forest, artificial neural network, XGBoost, and decision trees, to build clinical prediction models. Next, the diagnostic performance of these models with the identified miRNAs was evaluated in the internal (GSE106817) and external (GSE113486) validation datasets by ROC analysis. The results showed that the first four prediction models consistently yielded an AUC of 100%. Our findings provide significant evidence that the serum miRNA profile represents a promising diagnostic biomarker for ovarian cancer.
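To make the reported AUC figures concrete, the following minimal sketch computes the area under the ROC curve directly from its pairwise definition. It is not the study's code: the function name `roc_auc` and the toy labels and scores are hypothetical, and a real analysis would normally use an established library routine. Under this definition, an AUC of 100% means every cancer sample received a higher score than every non-cancer sample.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores: 1 = cancer, 0 = non-cancer.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch; rank-based formulas give the same value more efficiently on large datasets.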


2021 ◽  
Vol 11 ◽  
Author(s):  
Adrián Mosquera Orgueira ◽  
Andrés Peleteiro Raíndo ◽  
Miguel Cid López ◽  
José Ángel Díaz Arias ◽  
Marta Sonia González Pérez ◽  
...  

Acute Myeloid Leukemia (AML) is a heterogeneous neoplasm characterized by cytogenetic and molecular alterations that drive patient prognosis. Currently established risk stratification guidelines show moderate predictive accuracy, and newer tools that integrate multiple molecular variables have been shown to provide better results. In this report, we aimed to create a new machine learning model of AML survival using gene expression data. We used gene expression data from two publicly available cohorts to create and validate a random forest predictor of survival, which we named ST-123. The most important variables in the model were age and the expression of KDM5B and LAPTM4B, two genes previously associated with the biology and prognostication of myeloid neoplasms. This classifier achieved high concordance indexes in the training and validation sets (0.7228 and 0.6988, respectively), and predictions were particularly accurate in patients at the highest risk of death. Additionally, ST-123 provided significant prognostic improvements in patients with high-risk mutations. Our results indicate that survival of patients with AML can be predicted to a great extent by applying machine learning tools to transcriptomic data, and that such predictions are particularly precise among patients with high-risk mutations.
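The concordance index quoted above measures how well predicted risk orders patients by survival time. The sketch below is a simplified version of Harrell's C-index, not the ST-123 pipeline: the function name `concordance_index`, the toy data, and the decision to skip pairs with tied times are assumptions made for illustration. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering.

```python
def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C for right-censored data.

    times: follow-up times; events: 1 if death observed, 0 if censored;
    risk_scores: higher means higher predicted risk of death.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i has an observed event strictly before time j.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical patients: the third one is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.5, 0.1]
print(concordance_index(times, events, risk))
```

Censored patients still contribute as the longer-surviving member of a pair, which is why the third patient here is compared against the first two but never starts a comparison of their own.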


PLoS ONE ◽  
2020 ◽  
Vol 15 (3) ◽  
pp. e0230536
Author(s):  
Guillermo López-García ◽  
José M. Jerez ◽  
Leonardo Franco ◽  
Francisco J. Veredas
