Heuristic Breadth-First Search Algorithm for Informative Gene Selection Based on Gene Expression Profiles

2009 ◽  
Vol 31 (4) ◽  
pp. 636-649 ◽  
Author(s):  
Shu-Lin WANG ◽  
Ji WANG ◽  
Huo-Wang CHEN ◽  
Shu-Tao LI ◽  
Bo-Yun ZHANG


2021 ◽  
Vol 16 ◽  
Author(s):  
Yueling Xiong ◽  
Qingqing Li ◽  
Peipei Wang ◽  
Mingquan Ye

Background: Informative gene selection is an essential step in tumor classification. However, selecting tumor-related informative genes from large-scale gene expression profiles is difficult because the data are high-dimensional, have relatively few samples, are class-imbalanced, and contain many superfluous and irrelevant genes. Objective: Many researchers use machine learning methods to analyze and process gene expression data and obtain gene subsets for classification. However, tumor gene expression profiles often pose massive computational challenges. In addition, traditional feature selection algorithms often ignore cost estimation while improving feature importance and classification accuracy, which makes tumor classification more difficult. Method: In this study, a novel informative gene selection method based on cost-sensitive fast correlation-based feature selection (CS-FCBF) is proposed. Results: First, the symmetric uncertainty index is used to evaluate the correlation between genes and class labels, and a large number of irrelevant and redundant genes are then quickly filtered out according to importance, generating a candidate gene subset. Second, cost-sensitive learning, which introduces the misclassification cost matrix and support vector machine attribute evaluation, is used to obtain the top-ranked gene subset with minimum misclassification loss. Finally, the candidate gene subset is optimized. Conclusion: The method was validated on eight independent tumor datasets. Compared with three other hybrids of typical gene selection algorithms combined with cost-sensitive learning, the proposed method exhibited better classification performance with fewer selected genes, which might provide guidance in tumor diagnosis and research.
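The symmetric uncertainty filter mentioned in this abstract can be illustrated with a minimal sketch. It uses the standard definition SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)) and assumes expression values have already been discretized. The function names, the dictionary layout, and the threshold value are illustrative assumptions, not the authors' CS-FCBF implementation, and the cost-sensitive stage is not shown.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), ranging from 0 to 1."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def filter_genes_by_su(discretized, labels, threshold=0.1):
    """Keep genes whose SU with the class label reaches the threshold.

    `discretized` maps a gene name to its discretized expression levels;
    the threshold is a hypothetical choice, not a value from the paper.
    Returns (gene, SU) pairs sorted from most to least relevant.
    """
    scored = {g: symmetric_uncertainty(v, labels) for g, v in discretized.items()}
    kept = [(g, s) for g, s in scored.items() if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

A gene whose discretized levels match the class labels exactly scores 1, and a gene that is statistically independent of the labels scores 0, so the threshold controls how aggressively irrelevant genes are dropped.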


2010 ◽  
Vol 9 ◽  
pp. CIN.S3794 ◽  
Author(s):  
Xiaosheng Wang ◽  
Osamu Gotoh

Gene selection is of vital importance in the molecular classification of cancer using high-dimensional gene expression data. Because of the distinct characteristics inherent to specific cancerous gene expression profiles, developing flexible and robust feature selection methods is crucial. We investigated the properties of a feature selection approach proposed in our previous work, which generalizes the feature selection method based on the depended degree of attribute in rough sets. We compared this method with established methods (the depended degree, chi-square, information gain, Relief-F, and symmetric uncertainty) and analyzed its properties through a series of classification experiments. The results revealed that our method was superior to the canonical depended-degree-of-attribute-based method in robustness and applicability. Moreover, the method was comparable to the other four commonly used methods. More importantly, the method can reveal the inherent classification difficulty of different gene expression datasets, indicating the inherent biology of specific cancers.
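For orientation, the canonical depended degree from rough set theory can be sketched as follows: it is the fraction of samples whose attribute value determines the class unambiguously (the positive region divided by the size of the universe). This sketch covers only the canonical single-attribute measure on discretized data. The generalized method studied in the abstract is not described there and is not reproduced here, and the function name is an assumption.

```python
from collections import defaultdict


def depended_degree(attribute, decision):
    """Fraction of samples lying in the positive region of one attribute.

    `attribute` holds the discretized values of a single gene and
    `decision` holds the class labels, one entry per sample. A sample is
    in the positive region when every sample sharing its attribute value
    has the same class label.
    """
    labels_seen = defaultdict(set)
    counts = defaultdict(int)
    for value, label in zip(attribute, decision):
        labels_seen[value].add(label)
        counts[value] += 1
    positive = sum(counts[v] for v, seen in labels_seen.items() if len(seen) == 1)
    return positive / len(attribute)
```

With values `["lo", "lo", "hi", "hi", "mid", "mid"]` and labels `["N", "N", "T", "T", "N", "T"]`, the `lo` and `hi` groups are pure and the `mid` group is mixed, so four of the six samples fall in the positive region.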


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Kota Fujisawa ◽  
Mamoru Shimo ◽  
Y.-H. Taguchi ◽  
Shinya Ikematsu ◽  
Ryota Miyata

Abstract: Coronavirus disease 2019 (COVID-19) is raging worldwide. This potentially fatal infectious disease is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). However, the complete mechanism of COVID-19 is not well understood. Therefore, we analyzed gene expression profiles of COVID-19 patients to identify disease-related genes through an innovative machine learning method that enables a data-driven strategy for gene selection from a data set with a small number of samples and many candidates. Principal-component-analysis-based unsupervised feature extraction (PCAUFE) was applied to the RNA expression profiles of 16 COVID-19 patients and 18 healthy control subjects. The results identified 123 genes as critical for COVID-19 progression from 60,683 candidate probes, including immune-related genes. The 123 genes were enriched in binding sites for transcription factors NFKB1 and RELA, which are involved in various biological phenomena such as immune response and cell survival: the primary mediator of canonical nuclear factor-kappa B (NF-κB) activity is the heterodimer RelA-p50. The genes were also enriched in histone modification H3K36me3, and they largely overlapped the target genes of NFKB1 and RELA. We found that the overlapping genes were downregulated in COVID-19 patients. These results suggest that canonical NF-κB activity was suppressed by H3K36me3 in COVID-19 patient blood.
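The general idea behind PCA-based unsupervised feature extraction can be sketched roughly as below. The abstract does not give the procedure, so every step here is an assumption: per-sample standardization, choosing the component whose sample loadings best separate the two groups with a Welch-style statistic, a chi-squared test with one degree of freedom on the standardized gene scores, and Benjamini-Hochberg correction at a hypothetical 0.01 level. The sketch assumes NumPy is available, and the function name `pca_ufe` is illustrative.

```python
import math

import numpy as np


def pca_ufe(expr, is_case, alpha=0.01):
    """Select outlier genes along a group-separating principal component.

    expr: array of shape (genes, samples); is_case: boolean per sample.
    Returns (component index, sorted indices of selected genes).
    """
    X = np.asarray(expr, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    # Assumption: standardize each sample (column) across genes.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are sample loadings; U * S gives the gene scores.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U * S
    # Pick the component whose loadings differ most between the groups.
    case, ctrl = Vt[:, is_case], Vt[:, ~is_case]
    spread = np.sqrt(case.var(axis=1, ddof=1) / case.shape[1]
                     + ctrl.var(axis=1, ddof=1) / ctrl.shape[1])
    k = int(np.argmax(np.abs(case.mean(axis=1) - ctrl.mean(axis=1)) / spread))
    # P-value of z**2 under chi-squared with 1 d.o.f. equals erfc(|z| / sqrt(2)).
    z = scores[:, k] / scores[:, k].std()
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    # Benjamini-Hochberg adjustment.
    order = np.argsort(p)
    n = len(p)
    adjusted = p[order] * n / np.arange(1, n + 1)
    adjusted = np.minimum.accumulate(adjusted[::-1])[::-1]
    return k, np.sort(order[adjusted < alpha])
```

Because genes are selected as outliers along a component rather than by a supervised per-gene test, this style of method can be applied when samples are few and candidate genes are many, which is the setting the abstract describes.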


2017 ◽  
Vol 35 (15_suppl) ◽  
pp. e23162-e23162
Author(s):  
Konstantin Volyanskyy ◽  
Minghao Zhong ◽  
Payal Keswarpu ◽  
John T Fallon ◽  
Michael Paul Fanucchi ◽  
...  

e23162 Background: Cancer is characterized by a variety of heterogeneous genomic and transcriptomic patterns involving highly complex biological signaling pathways. Identifying the factors that drive tumor progression becomes even more challenging because of the intricate interaction mechanisms between these pathways. Using novel approaches in machine learning, we demonstrate the ability to quantitatively describe characteristic signaling patterns in cancer based on transcriptomic data. Methods: We used RNASeq data from 20531 genes in 174 samples of GBM from The Cancer Genome Atlas, including 5 major histological subtypes – Classical, G-CIMP, Mesenchymal, Neural, and Proneural – and developed a predictive computational framework for differentiating molecular subtypes from normal tissue, relying on variance-based gene selection and the random forest algorithm. Results: We obtained a few key findings – (1) genes from cell signaling pathways alone differentiate each subtype from normal tissue with 100% accuracy; (2) predictive genes are specific to each subtype; (3) inferred pathway interactions are also specific to each subtype; (4) typically, most of the predictive genes involved in signaling are down-regulated in tumor compared to normal tissue (MAPT, PRKCG, PDE2A, RYR2, ATP1B1, GRN1, GNAO1); however, in each subtype we observed a smaller subset of predictive genes that are highly up-regulated in tumor (ID3, FN1, JAG1, F2R, COL4A1, EDAR, CDK2, CDK4, MFNG, BIRC5, CCNB2). We detected and quantitatively evaluated characteristic signaling pathway involvement across the GBM subtypes for MAPK, RAP1, RAS, Notch, PI3K-Akt, mTOR, FoxO, Jak-STAT, Wnt, cAMP, and Calcium Signaling, providing a unique approximation for each subtype signaling profile. Conclusions: In this study, we identified gene expression profiles and associated signaling pathways for distinguishing GBM subtypes from normal tissue. We observed and described a dense, complex picture of interacting signaling pathways. The detected interactions may provide clinical insights and could be used to identify potential therapeutic targets; however, more research is needed to confirm this.
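The variance-based gene selection step named in the Methods can be sketched in a few lines. The abstract gives no details, so this sketch assumes the simplest reading: rank genes by the variance of their expression across samples and keep the top k. The function name, the dictionary layout, and k are illustrative assumptions, and the random forest classification stage is not shown.

```python
from statistics import pvariance


def top_variance_genes(expr, k):
    """Return the names of the k genes with the highest expression variance.

    `expr` maps a gene name to its expression values across samples.
    """
    variances = {gene: pvariance(values) for gene, values in expr.items()}
    return sorted(variances, key=variances.get, reverse=True)[:k]
```

Filtering by variance before training discards genes that barely change across samples, which reduces the number of features a downstream classifier has to consider.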

