An ensemble greedy algorithm for feature selection in cancer genomics

Author(s):  
S. M. Pagnotta ◽  
C. Laudanna ◽  
M. Pancione ◽  
L. Cerulo ◽  
V. Colantuoni ◽  
...  
2021 ◽  
pp. 153537022199201
Author(s):  
Runmin Li ◽  
Guosheng Wang ◽  
ZhouJie Wu ◽  
HuaGuang Lu ◽  
Gen Li ◽  
...  

High-throughput multi-omics sequencing has laid a solid foundation for identifying genes associated with cancer prognosis, and multi-omics studies can reveal the mechanisms of cancer occurrence and development from several angles. The prognosis of osteosarcoma remains poor, so a genetic marker is needed to predict clinically relevant overall survival. RNA-Seq data, copy number variation data, and clinical follow-up data were obtained from the Office of Cancer Genomics (OCG Target). Prognosis-associated genes and genes with copy number differences were screened in the training set, and these genes were integrated for feature selection with the least absolute shrinkage and selection operator (Lasso) to screen for effective biomarkers. Finally, a gene-based prognostic model was built and validated on the test set and a Gene Expression Omnibus validation set. In total, 512 prognosis-related genes (P < 0.01), 336 genes with copy number amplification (P < 0.05), and 36 genes with copy number deletion (P < 0.05) were obtained, and these genomic variant genes are closely associated with tumor occurrence and development. Integrating the genomic variant genes with the prognosis-related genes yielded 10 candidate genes. Six representative genes (MYC, CHIC2, CCDC152, LYL1, GPR142, and MMP27) were obtained by Lasso feature selection and stepwise multivariate regression, many of which have been reported to be related to tumor progression. Cox regression was used to build a 6-gene signature that acts as a single prognostic factor for patients with osteosarcoma. In addition, samples could be risk stratified in the training set, the test set, and the external validation set. The AUC for five-year survival in the training set and the validation set exceeded 0.85, a better predictive performance than that reported in existing studies. The 6-gene signature is therefore proposed as a new prognostic marker for assessing the survival of osteosarcoma patients.
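To illustrate how Lasso performs feature selection, the following is a minimal sketch of Lasso fitted by coordinate descent in plain Python. It is not the authors' pipeline: the toy design matrix, the response, the penalty `alpha`, and the feature labels `gene_1` to `gene_3` are all hypothetical, and a real analysis would fit a penalized Cox model on expression data instead of a least-squares objective.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1 / (2n)) * ||y - X b||^2 + alpha * ||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # sum of squares of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
    return beta


# Hypothetical toy data: three orthogonal features, response driven by the first only.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.0, 2.0, -2.0, -2.0]
names = ["gene_1", "gene_2", "gene_3"]  # placeholder labels, not real genes

beta = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [name for name, b in zip(names, beta) if b != 0.0]
print(beta, selected)
```

Features whose coefficients are shrunk exactly to zero are dropped, which is why the L1 penalty doubles as a selection step.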


2015 ◽  
Vol 2015 ◽  
pp. 1-11 ◽  
Author(s):  
Sheng Yang ◽  
Li Guo ◽  
Fang Shao ◽  
Yang Zhao ◽  
Feng Chen

Sequencing is widely used to discover associations between microRNAs (miRNAs) and diseases. However, the negative binomial (NB) distribution and high dimensionality of sequencing data can lead to low statistical power and low reproducibility. Several statistical learning algorithms have been proposed to address sequencing data, and although evaluation of these methods is essential, such studies are relatively rare. The performance of seven feature selection (FS) algorithms, including baySeq, DESeq, edgeR, the rank sum test, lasso, particle swarm optimization decision tree, and random forest (RF), was compared by simulation under different conditions based on the difference of the mean, the dispersion parameter of the NB, and the signal-to-noise ratio. Real data were used to evaluate the performance of RF, logistic regression, and support vector machine. Based on the simulation and real data, we discuss the behaviour of the FS and classification algorithms. The Apriori algorithm identified frequent item sets (mir-133a, mir-133b, mir-183, mir-937, and mir-96) from among the deregulated miRNAs of six datasets from The Cancer Genome Atlas. Taking these findings together and considering computational memory requirements, we propose a strategy that combines edgeR and DESeq for large sample sizes.
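The frequent item set step can be sketched with a small Apriori implementation in plain Python. This is an illustrative sketch only: the transactions below are invented placeholders (items `A` to `D` standing in for deregulated miRNAs, one set per dataset), and the `min_support` threshold is an assumption, not a value taken from the study.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]  # size-1 candidates
    frequent = {}
    k = 1
    while current:
        # Keep the candidates that occur in enough transactions.
        level = {}
        for cand in current:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates and prune any
        # candidate that has an infrequent k-subset (the Apriori property).
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = list(candidates)
    return frequent


# Invented toy input: each set lists the items flagged in one dataset.
datasets = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
    {"D"},
]
frequent = apriori(datasets, min_support=0.5)
for itemset, support in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The pruning step is what keeps the search tractable: an item set is only counted if all of its smaller subsets were already frequent.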


Biostatistics ◽  
2019 ◽  
Author(s):  
Magnus M Münch ◽  
Carel F W Peeters ◽  
Aad W Van Der Vaart ◽  
Mark A Van De Wiel

Summary: In high-dimensional data settings, additional information on the features is often available. Examples of such external information in omics research are: (i) $p$-values from a previous study and (ii) omics annotation. The inclusion of this information in the analysis may enhance classification performance and feature selection but is not straightforward. We propose a group-regularized (logistic) elastic net regression method, where each penalty parameter corresponds to a group of features based on the external information. The method, termed gren, makes use of the Bayesian formulation of logistic elastic net regression to estimate both the model and penalty parameters in an approximate empirical–variational Bayes framework. Simulations and applications to three cancer genomics studies and one Alzheimer metabolomics study show that, if the partitioning of the features is informative, classification performance and feature selection are indeed enhanced.
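The idea of group-specific penalties can be made concrete with a short sketch. The function below evaluates an elastic net penalty in which each feature's contribution is scaled by a multiplier for its group. The exact parameterization used by gren is not given in the summary, so the form here (one multiplier per group applied to both the L1 and L2 terms) is an assumption, and all names and numbers are hypothetical.

```python
def group_elastic_net_penalty(beta, groups, lambda1, lambda2, group_multipliers):
    """Elastic net penalty with a per-group scaling factor (assumed form).

    beta              -- list of coefficients
    groups            -- group label of each coefficient, same length as beta
    lambda1, lambda2  -- global L1 and L2 penalty weights
    group_multipliers -- dict mapping group label to its penalty multiplier
    """
    total = 0.0
    for b, g in zip(beta, groups):
        total += group_multipliers[g] * (lambda1 * abs(b) + 0.5 * lambda2 * b * b)
    return total


# Hypothetical example: group 0 is believed informative (lighter penalty),
# group 1 is believed uninformative (heavier penalty).
beta = [1.0, -2.0, 0.5]
groups = [0, 0, 1]
penalty = group_elastic_net_penalty(
    beta, groups, lambda1=1.0, lambda2=2.0, group_multipliers={0: 0.5, 1: 2.0}
)
print(penalty)
```

A smaller multiplier lets coefficients in that group grow more freely, which is how informative external grouping can translate into better selection.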


Author(s):  
Lindsey M. Kitchell ◽  
Francisco J. Parada ◽  
Brandi L. Emerick ◽  
Tom A. Busey

CCIT Journal ◽  
2019 ◽  
Vol 12 (2) ◽  
pp. 170-176
Author(s):  
Anggit Dwi Hartanto ◽  
Aji Surya Mandala ◽  
Dimas Rio P.L. ◽  
Sidiq Aminudin ◽  
Andika Yudirianto

Pacman is a maze game that uses artificial intelligence, implemented as several algorithms embedded in the program. This work implements Dijkstra's algorithm to solve the minimum-route problem for the Pacman ghosts, whose role is to chase the player. Dijkstra's algorithm works on a principle similar to a greedy algorithm: it starts from the initial node, examines the nodes connected to it, compares their path costs measured from the starting point, and extends the cheapest path step by step until the destination is reached. Testing showed that Dijkstra's algorithm solves the minimum-route problem for pursuing the player quite well, obtaining a value of 13, which matches the manual calculation.
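As a minimal sketch of the route search described above, the following plain-Python version of Dijkstra's algorithm uses a priority queue from the standard `heapq` module. The graph, its node names, and its edge weights are hypothetical and do not reproduce the maze or the value of 13 reported in the paper.

```python
import heapq


def dijkstra(graph, start, goal):
    """Return (cost, path) of the cheapest route from start to goal.

    graph maps each node to a dict {neighbor: non-negative edge weight}.
    """
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for nxt, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if goal not in dist:
        return float("inf"), []
    # Walk the predecessor links backwards to rebuild the route.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]


# Hypothetical maze junctions between a ghost and the player.
maze = {
    "ghost": {"a": 2, "b": 5},
    "a": {"b": 1, "c": 4},
    "b": {"player": 3},
    "c": {"player": 1},
    "player": {},
}
cost, route = dijkstra(maze, "ghost", "player")
print(cost, route)
```

The greedy flavor comes from always expanding the node with the smallest known distance first; with non-negative weights this guarantees the first time the goal is popped its distance is minimal.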

