A genetic programming approach to oral cancer prognosis

PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e2482 ◽  
Author(s):  
Mei Sze Tan ◽  
Jing Wei Tan ◽  
Siow-Wee Chang ◽  
Hwa Jen Yap ◽  
Sameem Abdul Kareem ◽  
...  

Background: Genetic programming (GP) has shown its potential in various fields in recent years. In the biomedical field, much GP research has focused on the recognition of cancerous cells and on gene expression profiling data. This research aims to study the performance of GP on survival prediction for a small-sample oral cancer prognosis dataset; it is the first such study in the field of oral cancer prognosis. Method: GP is applied to an oral cancer dataset that contains 31 cases collected from the Malaysia Oral Cancer Database and Tissue Bank System (MOCDTBS). The feature subsets that were automatically selected by GP were noted, and the influence of each subset on the GP results was recorded. In addition, the GP performance was compared with that of the Support Vector Machine (SVM) and logistic regression (LR) in order to verify the predictive capabilities of GP. Result: GP performed best (average accuracy of 83.87% and average AUROC of 0.8341) when the selected features were smoking, drinking, chewing, histological differentiation of SCC, and oncogene p63. In addition, based on the comparison results, GP outperformed the SVM and LR in oral cancer prognosis. Discussion: Some of the features in the dataset were found to be statistically correlated, as indicated by the drop in GP prediction accuracy when any one of the features in the best feature subset is excluded. Thus, GP provides an automatic feature selection function, which chooses features that are highly correlated with the prognosis of oral cancer. This makes GP an ideal prediction model for clinical and genomic cancer data that can be used to aid physicians in the decision-making stage of diagnosis or prognosis.

2020 ◽  
Author(s):  
Andrew Lensen ◽  
Harith Al-Sahaf ◽  
Mengjie Zhang ◽  
Bing Xue

© 2015 IEEE. Image classification is a crucial task in Computer Vision. Feature detection represents a key component of the image classification process, which aims at detecting a set of important features that have the potential to facilitate the classification task. In this paper, we propose a Genetic Programming (GP) approach to image feature detection. The proposed method uses the Speeded Up Robust Features (SURF) method to extract features from regions automatically selected by GP, and adopts a wrapper approach combined with a voting scheme to perform image classification. The proposed approach is evaluated using three datasets of increasing difficulty, and is compared to five popularly used machine learning methods: Support Vector Machines, Random Forest, Naive Bayes, Decision Trees, and Adaptive Boosting. The experimental results show the proposed approach has achieved comparable or better performance than the five existing methods on all three datasets, and reveal its capability to automatically detect good regions from a large image from which good features are automatically constructed.


2019 ◽  
Vol 11 (21) ◽  
pp. 2571 ◽  
Author(s):  
Ma ◽  
Liang ◽  
Wang ◽  
Lv ◽  
Yu ◽  
...  

Research on distinguishing humans from animals can help set priorities and optimize the distribution of resources in earthquake- or mining-related rescue missions. However, existing solutions are few, and their classification stability and accuracy are limited. This study proposes an accurate method for distinguishing stationary human targets from dog targets under through-wall conditions based on ultra-wideband (UWB) radar. Eight humans and five beagles were used to collect 130 samples of through-wall signals using the UWB radar. Twelve corresponding features belonging to four categories were combined using the support vector machine (SVM) method. A recursive feature elimination (RFE) method determined an optimal feature subset from the twelve features to overcome overfitting and poor generalization. The results after ten-fold cross-validation showed that the area under the receiver operating characteristic (ROC) curve can reach 0.9993, which indicates that the two subjects can be distinguished under through-wall conditions. The study also compared the performance of the four feature categories when each was used independently in a classifier. The comparison indicated that the wavelet entropy-based features performed best among them. The method and results are envisioned to be applied in various practical situations, such as post-disaster searching, hostage rescues, and intelligent homecare.
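
The area under the ROC curve reported above can be read as the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative one. The following is a minimal Python sketch of that definition; the labels and scores are hypothetical illustration values, not data from the study, and the function name `roc_auc` is an assumption made for this example.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: 1 = human target, 0 = dog target.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

The pairwise form is quadratic in the number of samples, which is acceptable for a dataset of about 130 samples; a rank-based formulation would be preferable for larger sets.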


2018 ◽  
Vol Volume-2 (Issue-3) ◽  
pp. 2422-2426
Author(s):  
Rasika Joat ◽  
Dr. A. P. Thakare ◽  
Dr. Ketaki Kalele | Dr. Viashali Thakare ◽  

2019 ◽  
Vol 17 ◽  
Author(s):  
Yanqiu Yao ◽  
Xiaosa Zhao ◽  
Qiao Ning ◽  
Junping Zhou

Background: Glycation is a nonenzymatic post-translational modification process in which a sugar molecule is attached to a protein or lipid molecule. It may impair the function and change the characteristics of proteins, which may lead to some metabolic diseases. In order to understand the underlying molecular mechanisms of glycation, computational prediction methods have been developed because of their convenience and high speed. However, developing a more effective computational tool remains a challenging task in computational biology. Methods: In this study, we present an accurate identification tool named ABC-Gly for predicting lysine glycation sites. First, we used three informative features, including position-specific amino acid propensity, secondary structure and the composition of k-spaced amino acid pairs, to encode the peptides. Moreover, to exploit discriminative features sufficiently and thereby improve the prediction and generalization ability of the model, we developed a two-step feature selection method that combines the Fisher score with an improved binary artificial bee colony algorithm based on a support vector machine. Finally, based on the optimal feature subset, we constructed the model by using a Support Vector Machine on the training dataset. Results: The proposed predictor ABC-Gly achieved a sensitivity of 76.43%, a specificity of 91.10%, a balanced accuracy of 83.76%, an area under the receiver-operating characteristic curve (AUC) of 0.9313, and a Matthews correlation coefficient (MCC) of 0.6861 in 10-fold cross-validation on the training dataset, and a balanced accuracy of 59.05% on the independent dataset. Compared with the state-of-the-art predictors on the training dataset, the proposed predictor achieved significant improvements of 0.156 in AUC and 0.336 in MCC.
Conclusion: The detailed analysis results indicate that our predictor may serve as a powerful complementary tool to other existing methods for predicting protein lysine glycation. The source code and datasets of ABC-Gly are provided in Supplementary File 1.
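
The evaluation measures named in the results (sensitivity, specificity, balanced accuracy, and MCC) all follow from the four cells of a binary confusion matrix. The sketch below shows the standard definitions in Python. The confusion-matrix counts are hypothetical, since the abstract does not report them, and `binary_metrics` is a name chosen for this example. The last lines check that the reported balanced accuracy is the mean of the reported sensitivity and specificity.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    balanced_accuracy = (sensitivity + specificity) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
m = binary_metrics(tp=40, fn=10, tn=45, fp=5)
print(m)

# Consistency check on the figures quoted in the abstract:
# mean of 76.43% sensitivity and 91.10% specificity.
reported_ba = (0.7643 + 0.9110) / 2
print(f"balanced accuracy from reported values: {reported_ba:.4f}")
```

Balanced accuracy and MCC are commonly preferred over plain accuracy when positive and negative sites are imbalanced, because neither can be inflated by always predicting the majority class.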


2020 ◽  
Vol 15 ◽  
Author(s):  
Chun Qiu ◽  
Sai Li ◽  
Shenghui Yang ◽  
Lin Wang ◽  
Aihui Zeng ◽  
...  

Aim: To search for genes related to the mechanisms of glioma occurrence and to build a prediction model for glioblastomas. Background: The morbidity and mortality of glioblastomas are very high, which seriously endangers human health. At present, the goals of many investigations on gliomas are mainly to understand the cause and mechanism of these tumors at the molecular level and to explore clinical diagnosis and treatment methods. However, there is no effective early diagnosis method for this disease, and there are no effective prevention, diagnosis or treatment measures. Methods: First, the gene expression profiles derived from GEO were downloaded. Then, differentially expressed genes (DEGs) in the disease samples and the control samples were identified. After that, GO and KEGG enrichment analyses of DEGs were performed by DAVID. Furthermore, the correlation-based feature subset (CFS) method was applied to the selection of key DEGs. In addition, the classification model between the glioblastoma samples and the controls was built by a Support Vector Machine (SVM) based on selected key genes. Results and Discussion: Thirty-six DEGs, including 17 upregulated and 19 downregulated genes, were selected by the CFS method as the feature genes to build the classification model between the glioma samples and the control samples. The accuracy of the classification model was 76.25% in a 10-fold cross-validation test and 70.3% in an independent set test. In addition, PPP2R2B and CYBB can also be found in the top 5 hub genes screened by the protein–protein interaction (PPI) network. Conclusions: This study indicated that the CFS method is a useful tool to identify key genes in glioblastomas. In addition, we also predicted that genes such as PPP2R2B and CYBB might be potential biomarkers for the diagnosis of glioblastomas.
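
Correlation-based feature subset selection is usually described through a merit score that rewards features correlated with the class and penalizes features correlated with each other. The abstract does not state which variant or implementation was used, so the Python sketch below only illustrates the commonly cited merit heuristic, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), under that assumption. The correlation values are hypothetical and `cfs_merit` is a name introduced for this example.

```python
import math


def cfs_merit(feature_class_corr, feature_feature_corr):
    """Merit of a feature subset under the usual CFS heuristic.

    feature_class_corr:   one correlation per feature with the class label.
    feature_feature_corr: one correlation per pair of features in the subset.
    """
    k = len(feature_class_corr)
    if k == 0:
        raise ValueError("subset must contain at least one feature")
    r_cf = sum(abs(r) for r in feature_class_corr) / k
    if feature_feature_corr:
        r_ff = sum(abs(r) for r in feature_feature_corr) / len(feature_feature_corr)
    else:
        r_ff = 0.0  # a single feature has no inter-feature correlation
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Hypothetical two-gene subsets with equal relevance to the class.
independent = cfs_merit([0.6, 0.6], [0.0])  # genes uncorrelated with each other
redundant = cfs_merit([0.6, 0.6], [1.0])    # genes perfectly correlated
print(f"independent: {independent:.4f}, redundant: {redundant:.4f}")
```

With equal class relevance, the subset of mutually uncorrelated genes scores higher, which is the behaviour that lets a search over subsets discard redundant DEGs.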


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Florent Le Borgne ◽  
Arthur Chatton ◽  
Maxime Léger ◽  
Rémi Lenain ◽  
Yohann Foucher

In clinical research, there is a growing interest in the use of propensity score-based methods to estimate causal effects. G-computation (GC) is an alternative because of its high statistical power. Machine learning is also increasingly used because of its possible robustness to model misspecification. In this paper, we aimed to propose an approach that combines machine learning and G-computation when both the outcome and the exposure status are binary and that is able to deal with small samples. Through simulations, we evaluated the performance of several methods, including penalized logistic regressions, a neural network, a support vector machine, boosted classification and regression trees, and a super learner. We proposed six different scenarios characterised by various sample sizes, numbers of covariates and relationships between covariates, exposure statuses, and outcomes. We have also illustrated the application of these methods, in which they were used to estimate the efficacy of barbiturates prescribed during the first 24 h of an episode of intracranial hypertension. In the context of GC, for estimating the individual outcome probabilities in two counterfactual worlds, we reported that the super learner tended to outperform the other approaches in terms of both bias and variance, especially for small sample sizes. The support vector machine performed well, but its mean bias was slightly higher than that of the super learner. In the investigated scenarios, G-computation associated with the super learner was a performant method for drawing causal inferences, even from small sample sizes.
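
G-computation with a binary exposure and a binary outcome can be summarized in two steps: fit a model for the outcome given exposure and covariates, then average its predictions over the whole sample twice, once with everyone set to exposed and once with everyone set to unexposed. The Python sketch below illustrates this under strong simplifying assumptions: a single binary covariate, and stratum frequencies standing in for the outcome model where the paper uses machine learning methods such as the super learner. The data and all names are hypothetical.

```python
from collections import defaultdict


def g_computation(records):
    """Marginal outcome probabilities under exposure=1 and exposure=0.

    records: iterable of (x, a, y) with covariate x, exposure a, outcome y,
    all binary. The outcome model is the observed frequency of y within
    each (a, x) stratum, a stand-in for a fitted regression or learner.
    """
    records = list(records)
    counts = defaultdict(lambda: [0, 0])  # (a, x) -> [n, events]
    for x, a, y in records:
        counts[(a, x)][0] += 1
        counts[(a, x)][1] += y

    def predict(a, x):
        n, events = counts.get((a, x), (0, 0))
        if n == 0:
            raise ValueError(f"no data for exposure={a}, covariate={x}")
        return events / n

    n_total = len(records)
    # Predict for every individual in both counterfactual worlds, then average.
    p1 = sum(predict(1, x) for x, _, _ in records) / n_total
    p0 = sum(predict(0, x) for x, _, _ in records) / n_total
    return p1, p0


def make_stratum(x, a, n, events):
    """Build n hypothetical records for one stratum, `events` of them with y=1."""
    return [(x, a, 1)] * events + [(x, a, 0)] * (n - events)


# Hypothetical data in which the covariate confounds exposure and outcome.
records = (
    make_stratum(x=0, a=1, n=10, events=2)
    + make_stratum(x=0, a=0, n=30, events=12)
    + make_stratum(x=1, a=1, n=30, events=15)
    + make_stratum(x=1, a=0, n=10, events=7)
)
p1, p0 = g_computation(records)
naive = 17 / 40 - 19 / 40  # crude exposed-minus-unexposed risk difference
print(f"GC risk difference: {p1 - p0:.3f}, naive: {naive:.3f}")
```

In this toy dataset the standardized risk difference differs markedly from the crude one, which illustrates why the outcome model's quality matters and why the choice of learner is the focus of the comparison.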

