Primal-dual for classification with rejection (PD-CR): a novel method for classification and feature selection—an application in metabolomics studies

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
David Chardin ◽  
Olivier Humbert ◽  
Caroline Bailleux ◽  
Fanny Burel-Vandenbos ◽  
Valerie Rigau ◽  
...  

Abstract Background: Supervised classification methods have been used for many years for feature selection in metabolomics and other omics studies. We developed a novel primal-dual based classification method (PD-CR) that can perform classification with rejection and feature selection on high-dimensional datasets. PD-CR projects data onto a low-dimensional space and performs classification by minimizing an appropriate quadratic cost. It simultaneously optimizes the selected features and the prediction accuracy with a new tailored, constrained primal-dual method. The primal-dual framework is general enough to encompass various robust losses and to allow for convergence analysis. Here, we compare PD-CR to three commonly used methods: partial least squares discriminant analysis (PLS-DA), random forests and support vector machines (SVM). We analyzed two metabolomics datasets: one urinary metabolomics dataset concerning lung cancer patients and healthy controls; and a metabolomics dataset obtained from frozen glial tumor samples with mutated isocitrate dehydrogenase (IDH) or wild-type IDH. Results: PD-CR was more accurate than PLS-DA, random forests and SVM for classification using the two metabolomics datasets. It also selected biologically relevant metabolites. PD-CR has the advantage of providing a confidence score for each prediction, which can be used to perform classification with rejection. This substantially reduces the false discovery rate. Conclusion: PD-CR is an accurate method for classification of metabolomics datasets which can outperform PLS-DA, random forests and SVM while selecting biologically relevant features. Furthermore, the confidence score provided with PD-CR can be used to perform classification with rejection and reduce the false discovery rate.
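As an illustration of what "classification with rejection" means in general, the sketch below withholds a prediction whenever the top confidence score falls below a threshold. It is not the PD-CR algorithm: the function names, the threshold value, and the toy scores are all hypothetical, and how PD-CR actually computes its confidence score is not described in the abstract.

```python
from typing import List, Optional, Tuple


def classify_with_rejection(
    scores: List[List[float]], threshold: float
) -> List[Optional[int]]:
    """Return the top-scoring class per sample, or None when rejected.

    `scores` holds one list of per-class confidence values per sample
    (a hypothetical input; any classifier's scores could be used).
    """
    decisions: List[Optional[int]] = []
    for row in scores:
        best = max(range(len(row)), key=row.__getitem__)
        # Reject the sample when the classifier is not confident enough.
        decisions.append(best if row[best] >= threshold else None)
    return decisions


def accuracy_and_coverage(
    decisions: List[Optional[int]], truth: List[int]
) -> Tuple[float, float]:
    """Accuracy on accepted samples, and the fraction of samples accepted."""
    kept = [(d, t) for d, t in zip(decisions, truth) if d is not None]
    coverage = len(kept) / len(truth)
    accuracy = sum(d == t for d, t in kept) / len(kept) if kept else float("nan")
    return accuracy, coverage


# Toy data (invented for illustration): two classes, four samples.
scores = [[0.95, 0.05], [0.55, 0.45], [0.10, 0.90], [0.48, 0.52]]
truth = [0, 1, 1, 0]

decisions = classify_with_rejection(scores, threshold=0.8)
print(decisions)
print(accuracy_and_coverage(decisions, truth))
```

In this toy case, the two low-confidence samples are exactly the ones that would have been misclassified, so rejecting them trades coverage for accuracy on the remaining predictions. Real data will rarely separate this cleanly.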

2021 ◽  
Author(s):  
David Chardin ◽  
Michel Barlaud ◽  
Olivier Humbert ◽  
Fanny Burel-vandenbos ◽  
Thierry Pourcher ◽  
...  

Abstract Background: Supervised classification methods have been used for many years for feature selection in metabolomics and other omics studies. We developed a novel primal-dual based classification method (PD-CR) that can perform classification with rejection and feature selection on high-dimensional datasets. PD-CR projects data onto a low-dimensional space and performs classification by minimizing an appropriate quadratic cost. It simultaneously optimizes the selected features and the prediction accuracy with a new tailored, constrained primal-dual method. The primal-dual framework is general enough to encompass various robust losses and to allow for convergence analysis. Here, we compared PD-CR to two commonly used methods: partial least squares discriminant analysis (PLS-DA) and random forests. We analyzed two metabolomics datasets: one urinary metabolomics dataset concerning lung cancer patients and healthy controls; and a metabolomics dataset obtained from frozen glial tumor samples with mutated isocitrate dehydrogenase (IDH) or wild-type IDH. Results: PD-CR was more accurate than PLS-DA and random forests for classification using the two metabolomics datasets. It also selected biologically relevant metabolites. PD-CR has the advantage of providing a confidence score for each prediction, which can be used to perform classification with rejection. This substantially reduces the false discovery rate. Conclusion: The confidence score provided with PD-CR adds considerable value to the prediction, as it includes a metric that is implicitly used by every physician when making a medical decision: the probability of making the wrong choice. So far, one of the main obstacles to the use of machine learning in medicine has been that, when it comes to health issues, it is harder to trust the decision of a machine learning method than that of a physician. We believe that providing a confidence score associated with the decision would make these new tools more convincing if used in routine clinical practice.


2008 ◽  
Vol 1 (2) ◽  
pp. 57-66 ◽  
Author(s):  
Seoung Bum Kim ◽  
Victoria C. P. Chen ◽  
Youngja Park ◽  
Thomas R. Ziegler ◽  
Dean P. Jones

2018 ◽  
Vol 15 (4) ◽  
pp. 1066-1078 ◽  
Author(s):  
Alexej Gossmann ◽  
Shaolong Cao ◽  
Damian Brzyski ◽  
Lan-Juan Zhao ◽  
Hong-Wen Deng ◽  
...  

2019 ◽  
Vol 12 (1) ◽  
Author(s):  
Alassane Thiam ◽  
Michel Sanka ◽  
Rokhaya Ndiaye Diallo ◽  
Magali Torres ◽  
Babacar Mbengue ◽  
...  

Abstract Background: Plasmodium falciparum malaria remains a major health problem in Africa. The mechanisms of pathogenesis are not fully understood. Transcriptomic studies may provide new insights into molecular pathways involved in the severe form of the disease. Methods: Blood transcriptional levels were assessed in patients with cerebral malaria, non-cerebral malaria, or mild malaria by using microarray technology to look for gene expression profiles associated with clinical status. Multi-way ANOVA was used to extract differentially expressed genes. Network and pathways analyses were used to detect enrichment for biological pathways. Results: We identified a set of 443 genes that were differentially expressed in the three patient groups after applying a false discovery rate of 10%. Since the cerebral patients displayed a particular transcriptional pattern, we focused our analysis on the differences between cerebral malaria patients and mild malaria patients. We further found 842 differentially expressed genes after applying a false discovery rate of 10%. Unsupervised hierarchical clustering of cerebral malaria-informative genes led to clustering of the cerebral malaria patients. The support vector machine method allowed us to correctly classify five out of six cerebral malaria patients and six of six mild malaria patients. Furthermore, the products of the differentially expressed genes were mapped onto a human protein-protein network. This led to the identification of the proteins with the highest number of interactions, including GSK3B, RELA, and APP. The enrichment analysis of the gene functional annotation indicates that genes involved in immune signalling pathways play a role in the occurrence of cerebral malaria. These include BCR-, TCR-, TLR-, cytokine-, FcεRI-, and FCGR-signalling pathways and natural killer cell cytotoxicity pathways, which are involved in the activation of immune cells. In addition, our results revealed an enrichment of genes involved in Alzheimer’s disease. Conclusions: In the present study, we examine a set of genes whose expression differed in cerebral malaria patients and mild malaria patients. Moreover, our results provide new insights into the potential effect of the dysregulation of gene expression in immune pathways. Host genetic variation may partly explain such alteration of gene expression. Further studies are required to investigate this in African populations.
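For readers unfamiliar with how a gene list is obtained "after applying a false discovery rate of 10%", the sketch below shows the Benjamini-Hochberg step-up procedure, one common way to control the false discovery rate. That this particular procedure was used in the study is an assumption, since the abstract does not name the correction method. The function name and the p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], fdr: float = 0.10) -> List[bool]:
    """Flag which hypotheses are rejected at the given FDR level.

    Step-up rule: find the largest rank k (1-based, ascending p-values)
    such that p_(k) <= k / m * fdr, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Invented p-values standing in for per-gene test results.
p_values = [0.9, 0.045, 0.5, 0.04]
print(benjamini_hochberg(p_values, fdr=0.10))
```

Note that the smallest p-value (0.04) does not pass its own rank-1 threshold of 0.025, yet it is still rejected because the second-smallest (0.045) passes the rank-2 threshold of 0.05. That is what makes the procedure "step-up".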


2020 ◽  
Vol 4 (3) ◽  
pp. 504-512
Author(s):  
Faried Zamachsari ◽  
Gabriel Vangeran Saragih ◽  
Susafa'ati ◽  
Windu Gata

The decision to move Indonesia's capital city to East Kalimantan received mixed responses on social media. The still-high poverty rate and the country's strained finances are factors in disapproval of the relocation. Twitter, as one of the popular social media platforms, is used by the public to express these opinions. This study aims to determine the tendency of public responses to the relocation of the national capital, and to perform sentiment analysis of public opinion on the relocation using the Naive Bayes and Support Vector Machine algorithms with feature selection in order to obtain the highest accuracy. The data consist of Indonesian-language public opinion collected by crawling tweets from Twitter. The search terms used are #IbuKotaBaru and #PindahIbuKota. The research stages consist of collecting data from Twitter, polarity labeling, and preprocessing, which comprises case transformation, cleansing, tokenizing, filtering and stemming. Feature selection is applied to increase accuracy, and the data are then split into training and testing sets according to predetermined ratios. The next step is a comparison between the Support Vector Machine and Naive Bayes methods to determine which is more accurate. In the data period studied, 24.26% of the sentiment about the move to a new capital city was positive and 75.74% was negative. Using RapidMiner software, the best accuracy for Naive Bayes with feature selection was 88.24% at a ratio of 9:1, while the best accuracy for Support Vector Machine with feature selection was 78.77% at a ratio of 5:5.

