Discrimination of Gentiana and Its Related Species Using IR Spectroscopy Combined with Feature Selection and Stacked Generalization

Molecules ◽  
2020 ◽  
Vol 25 (6) ◽  
pp. 1442 ◽  
Author(s):  
Tao Shen ◽  
Hong Yu ◽  
Yuan-Zhong Wang

Gentiana is one of the largest genera of Gentianoideae; most of its species have potential pharmaceutical value and are used in local traditional medicine. Because phytochemical composition and bioactive compounds differ among species, it is crucial to accurately identify authentic Gentiana species. In this paper, the feasibility of using infrared spectroscopy combined with chemometric analysis to identify Gentiana and its related species was studied. A total of 180 batches of raw spectral fingerprints were obtained from 18 species of Gentiana and Tripterospermum by near-infrared (NIR: 10,000–4000 cm−1) and Fourier transform mid-infrared (MIR: 4000–600 cm−1) spectroscopy. Firstly, principal component analysis (PCA) was used to explore the natural grouping of the 180 samples. Secondly, random forest (RF), support vector machine (SVM), and K-nearest neighbors (KNN) models were built using the full spectra (1487 NIR variables and 1214 FT-MIR variables, respectively). Based on the results for the calibration and prediction sets, the MIR-SVM model had a higher classification accuracy than the other models. Five feature selection strategies, VIP (variable importance in the projection), Boruta, GARF (genetic algorithm combined with random forest), GASVM (genetic algorithm combined with support vector machine), and Venn diagram calculation, were then used to reduce the number of variables for modeling. Finally, 101 NIR and 73 FT-MIR bands were selected as the feature variables, respectively. Thirdly, stacking models were built on the optimal spectral dataset. Most of the stacking models performed better than the full-spectra models. RF and SVM as base learners, combined with an SVM meta-classifier, formed the optimal stacked generalization strategy.
For the SG-Ven-MIR-SVM model, the accuracy (ACC) on both the calibration set and the validation set was 100%. Sensitivity (SE), specificity (SP), efficiency (EFF), Matthews correlation coefficient (MCC), and Cohen’s kappa coefficient (K) were all 1, which showed that the model had the best authenticity identification performance. These parameters indicate that stacked generalization combined with feature selection is probably an important technique for improving the predictive accuracy of classification models and avoiding overfitting. The results can provide a valuable reference for the safety and effectiveness of the clinical application of medicinal Gentiana.
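
The evaluation metrics named in this abstract can be computed from a confusion matrix. The following is a minimal sketch for the binary case, not the authors' code: the function name binary_metrics and the example counts are hypothetical, and EFF is assumed here to be the geometric mean of SE and SP, which the abstract does not spell out. The study itself is multi-class (18 species), so a per-class or averaged treatment would be needed there.

```python
import math

def binary_metrics(tp, fn, fp, tn):
    """Return ACC, SE, SP, EFF, MCC and Cohen's kappa for one binary confusion matrix."""
    n = tp + fn + fp + tn
    acc = (tp + tn) / n
    se = tp / (tp + fn)        # sensitivity: true positive rate
    sp = tn / (tn + fp)        # specificity: true negative rate
    eff = math.sqrt(se * sp)   # ASSUMED definition: geometric mean of SE and SP
    # Matthews correlation coefficient
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Cohen's kappa: observed agreement corrected for chance agreement
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (acc - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return {"ACC": acc, "SE": se, "SP": sp, "EFF": eff, "MCC": mcc, "K": kappa}

# Hypothetical counts: a perfect classifier on 10 positives and 170 negatives.
perfect = binary_metrics(tp=10, fn=0, fp=0, tn=170)
# Hypothetical counts: an imperfect classifier on 10 positives and 10 negatives.
imperfect = binary_metrics(tp=8, fn=2, fp=1, tn=9)
```

With a perfect confusion matrix every metric evaluates to 1, which matches the pattern reported for the SG-Ven-MIR-SVM model. The sketch does not guard against a class with zero samples.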

Author(s):  
Zhenhua Li ◽  
Junjie Cheng ◽  
A. Abu-Siada

Background: Winding deformation is one of the most common faults that a power transformer experiences over its operational life. Thus, it is essential to detect and rectify such faults at an early stage to avoid potentially catastrophic consequences for the transformer. At present, methods published in the literature for transformer winding fault diagnosis mainly focus on identifying the fault type and quantifying its extent, with little attention given to identifying the fault location. Methods: This paper presents a method based on a genetic algorithm and support vector machine (GA-SVM) to improve the classification of power transformer faults in terms of type and location. A sinusoidal sweep signal in the frequency range of 600 kHz to 1 MHz is applied to one terminal of the transformer winding. A mathematical index of the induced current at the head and end of the transformer winding under various fault conditions is used to extract unique features that are fed to a support vector machine (SVM) model for training. The parameters of the SVM model are optimized using a genetic algorithm (GA). Results: The effectiveness of the mathematical indicators in extracting fault type characteristics, and of the proposed fault classification model in diagnosing faults, is demonstrated through extensive simulation analysis of various transformer winding faults at different locations. Conclusion: The proposed model can effectively identify different fault types and determine their location within the transformer winding; the diagnostic rates for fault type and fault location are 100% and 90%, respectively.
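
The idea of tuning SVM parameters with a genetic algorithm can be sketched as follows. This is an illustration under stated assumptions, not the method of the paper: the search is over log10(C) and log10(gamma) with hypothetical bounds, the GA settings are arbitrary, and toy_fitness is a stand-in for what would in practice be the cross-validated accuracy of an SVM trained with those parameters.

```python
import random

# Hypothetical search bounds for log10(C) and log10(gamma); not taken from the paper.
BOUNDS = [(-2.0, 3.0), (-4.0, 1.0)]

def toy_fitness(genes):
    """Stand-in for cross-validated SVM accuracy; peaks at log10(C)=1, log10(gamma)=-2."""
    log_c, log_gamma = genes
    return 1.0 / (1.0 + (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2)

def run_ga(fitness, bounds, pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(scored[0]))
        next_pop = [scored[0][:]]  # elitism: the best individual survives unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: the fitter of two random individuals becomes a parent
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            w = rng.random()  # blend weight for arithmetic crossover
            child = []
            for (lo, hi), a, b in zip(bounds, p1, p2):
                gene = w * a + (1 - w) * b
                if rng.random() < mutation_rate:
                    gene += rng.gauss(0, 0.3)  # Gaussian mutation
                child.append(min(hi, max(lo, gene)))  # clip to the search bounds
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history

best_genes, best_history = run_ga(toy_fitness, BOUNDS)
best_c, best_gamma = 10 ** best_genes[0], 10 ** best_genes[1]
```

Because of elitism, the best fitness never decreases from one generation to the next. Replacing toy_fitness with a real cross-validation score is the only change needed to apply the same loop to an actual classifier.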


Author(s):  
Alok Kumar Shukla ◽  
Pradeep Singh ◽  
Manu Vardhan

The explosion of high-dimensional datasets in scientific repositories has encouraged interdisciplinary research on data mining, pattern recognition and bioinformatics. The fundamental problem for an individual feature selection (FS) method is to extract informative features for a classification model and to detect malignant disease at low computational cost. In addition, existing FS approaches overlook the fact that, for a given cardinality, there can be several subsets with similar information. This paper introduces a novel hybrid FS algorithm, called Filter-Wrapper Feature Selection (FWFS), for classification problems that addresses the limitations of existing methods. In the proposed model, a front-end filter ranking method, Conditional Mutual Information Maximization (CMIM), selects the high-ranked feature subset, while a subsequent Binary Genetic Algorithm (BGA) accelerates the search for significant feature subsets. One merit of the proposed method is that, unlike an exhaustive method, it speeds up the FS procedure without loss of classification accuracy on the reduced dataset when a learning model is applied to the selected feature subsets. The efficacy of the proposed FWFS method is examined with a Naive Bayes (NB) classifier, which serves as the fitness function. The effectiveness of the selected feature subset is evaluated using numerous classifiers on five biological datasets and five UCI datasets of varied dimensionality and number of instances. The experimental results show that the proposed method significantly reduces the number of features and outperforms the existing methods. For the microarray datasets, the lowest classification accuracy is 61.24% on the SRBCT dataset and the highest is 99.32% on the diffuse large B-cell lymphoma (DLBCL) dataset.
For the UCI datasets, the lowest classification accuracy is 40.04% on the Lymphography dataset using k-nearest neighbor (k-NN) and the highest is 99.05% on the Ionosphere dataset using support vector machine (SVM).
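
The CMIM filter stage mentioned in this abstract can be illustrated with a small greedy selector for discrete features. This is a sketch under stated assumptions, not the FWFS implementation: the function names cond_mutual_info and cmim_select and the toy data are hypothetical, probabilities are estimated from raw counts, and the BGA wrapper stage with its Naive Bayes fitness is not shown.

```python
import math
from collections import Counter

def cond_mutual_info(x, y, z):
    """I(X; Y | Z) in bits for discrete sequences, estimated from counts."""
    n = len(x)
    n_xyz = Counter(zip(x, y, z))
    n_xz = Counter(zip(x, z))
    n_yz = Counter(zip(y, z))
    n_z = Counter(z)
    total = 0.0
    for (xi, yi, zi), c in n_xyz.items():
        total += (c / n) * math.log2(n_z[zi] * c / (n_xz[(xi, zi)] * n_yz[(yi, zi)]))
    return total

def cmim_select(features, labels, k):
    """Greedy CMIM: pick k feature indices by their minimum conditional MI with the labels."""
    constant = [0] * len(labels)  # conditioning on a constant gives plain I(X; Y)
    # Each score starts as the feature's unconditional mutual information with the labels
    score = [cond_mutual_info(f, labels, constant) for f in features]
    selected = []
    for _ in range(k):
        candidates = [i for i in range(len(features)) if i not in selected]
        best = max(candidates, key=lambda i: score[i])
        selected.append(best)
        # Lower every remaining score to its information given the newly selected feature
        for i in candidates:
            if i != best:
                score[i] = min(score[i], cond_mutual_info(features[i], labels, features[best]))
    return selected

# Toy data (hypothetical): four classes encoded by a high bit and a low bit.
labels = [0, 0, 1, 1, 2, 2, 2, 3]
features = [
    [y // 2 for y in labels],  # f0: high bit of the class
    [y // 2 for y in labels],  # f1: exact duplicate of f0 (redundant)
    [y % 2 for y in labels],   # f2: low bit of the class (complementary)
    [0] * len(labels),         # f3: constant (uninformative)
]
selected = cmim_select(features, labels, k=2)
```

On this toy data the sketch picks f0 first and then f2, skipping the duplicate f1 because it adds no information once f0 is known. That behaviour is the motivation for using a conditional criterion rather than ranking features by plain mutual information.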


2016 ◽  
Vol 78 (5-10) ◽  
Author(s):  
Farzana Kabir Ahmad ◽  
Abdullah Yousef Awwad Al-Qammaz ◽  
Yuhanis Yusof

Human-computer intelligent interaction (HCII) is a rising field of science that aims to refine and enhance the interaction between computers and humans. Since emotion plays a vital role in daily human life, the ability of a computer to interpret and respond to human emotion is a crucial element of future intelligent systems. Accordingly, several studies have sought to recognise human emotion using different modalities such as facial expression, speech, galvanic skin response (GSR), or heart rate (HR). However, such techniques suffer mainly in terms of credibility and reliability, as people can fake their feelings and responses. Electroencephalogram (EEG), on the other hand, has been shown to be a very effective way of recognising human emotion, as it records human brain activity, which can hardly be deceived by voluntary control. Despite the popularity of EEG in recognising human emotion, this field of study is relatively challenging because the EEG signal is nonlinear, involves myriad factors, and is chaotic in nature. These issues lead to high dimensionality and poor classification results. To address such problems, this study proposes a novel computational model that consists of three main stages: a) feature extraction; b) feature selection; and c) classification. Discrete wavelet packet transform (DWPT) is used to extract features from the EEG signals, and ultimately 204,800 features are obtained from 32 subjects in a subject-independent setting. Meanwhile, a Genetic Algorithm (GA) and a Least squares support vector machine (LS-SVM) are used as the feature selection technique and the classifier, respectively. This computational model is tested on the common DEAP pre-processed EEG dataset in order to classify three levels of valence and arousal.
The empirical results show that the proposed GA-LSSVM improves the classification results to 49.22% and 54.83% for valence and arousal, respectively, whereas 46.33% for valence and 48.30% for arousal were achieved when no feature selection technique was applied with the identical classifier.
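
The feature extraction stage of this abstract can be illustrated with a tiny wavelet packet decomposition that turns a signal segment into sub-band energies. This is only a sketch: the abstract does not name the wavelet or the decomposition level, so the Haar wavelet, level 3, the function names, and the synthetic segment are all assumptions made for illustration.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail) halves."""
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def wavelet_packet_energies(signal, level):
    """Split every band at every level (wavelet packet tree) and return leaf-band energies."""
    if len(signal) % (2 ** level) != 0:
        raise ValueError("signal length must be divisible by 2**level")
    bands = [list(signal)]
    for _ in range(level):
        next_bands = []
        for band in bands:
            # Unlike the plain wavelet transform, detail bands are split again as well
            approx, detail = haar_step(band)
            next_bands.extend([approx, detail])
        bands = next_bands
    # Bands are returned in filter-bank order, not sorted by frequency
    return [sum(v * v for v in band) for band in bands]

# Hypothetical 16-sample segment standing in for one EEG channel window.
segment = [math.sin(0.4 * t) + 0.5 * math.sin(2.1 * t) for t in range(16)]
energies = wavelet_packet_energies(segment, level=3)
```

A level-3 decomposition yields 8 band energies per segment, and because the Haar transform is orthonormal their sum equals the energy of the original segment. Repeating such a computation over many channels and windows produces a very large feature vector, which is the situation that motivates a feature selection stage.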


2014 ◽  
Vol 587-589 ◽  
pp. 2100-2104
Author(s):  
Qin Liu ◽  
Jian Min Xu ◽  
Kai Lu

Oversaturation often occurs in modern urban traffic. In order to describe the degree of oversaturation, indexes of intersection oversaturation degree are put forward, including dissipation time, stranded queue, overflow queue and travel speed. On the basis of the selected indexes, a genetic algorithm support vector machine (GA-SVM) model is proposed to quantify the degree of oversaturation. In this method, the genetic algorithm is used to select the model parameters. Using traffic volume data from intersections in Guangzhou city, the method is computed and simulated through programming. The simulation results show that the GA-SVM method is effective and that its accuracy is higher than that of the support vector machine (SVM) alone. This method provides a theoretical basis for the analysis of traffic systems under oversaturated conditions.


2011 ◽  
Vol 3 ◽  
pp. BECB.S7503 ◽  
Author(s):  
Sangeetha Subramaniam ◽  
Monica Mehrotra ◽  
Dinesh Gupta

There is an urgent need to develop novel antimalarials in view of the increasing disease burden and the growing resistance of malarial parasites to currently used drugs. Proliferation inhibitors targeting the P. falciparum intraerythrocytic cycle are one of the important classes of compounds being explored for their potential as novel antimalarials. The support vector machine (SVM) based model that we developed can facilitate rapid screening of large and diverse chemical libraries by reducing false hits and prioritising compounds before expensive high-throughput screening experiments are set up. The SVM model, trained with molecular descriptors of proliferation inhibitors and non-inhibitors, displayed satisfactory performance in cross-validation and on an independent dataset, with an average accuracy of 83% and an AUC of 0.88. Intriguingly, the method displayed remarkable accuracy on the recently submitted P. falciparum whole-cell screening datasets. The method also predicted several inhibitors in the National Cancer Institute diversity set, most of them similar to known inhibitors.
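
The two figures of merit quoted in this abstract, accuracy and AUC, can be computed from classifier decision scores as sketched below. The function names, the zero threshold, and the example scores are hypothetical and serve only to show the calculation; they are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.0):
    """Fraction of examples whose thresholded score matches the label."""
    predictions = [1 if s > threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Hypothetical decision scores for 3 inhibitors (1) and 3 non-inhibitors (0).
y_true = [1, 1, 1, 0, 0, 0]
y_score = [1.2, 0.4, -0.3, 0.1, -0.8, -1.5]
auc = roc_auc(y_true, y_score)
acc = accuracy(y_true, y_score)
```

Accuracy depends on the chosen threshold, while AUC summarises the ranking quality across all thresholds, which is why screening studies often report both.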


2021 ◽  
Vol 18 (17) ◽  
Author(s):  
Micheal Olaolu AROWOLO ◽  
Marion Olubunmi ADEBIYI ◽  
Chiebuka Timothy NNODIM ◽  
Sulaiman Olaniyi ABDULSALAM ◽  
Ayodele Ariyo ADEBIYI

As mosquito parasites breed across many parts of sub-Saharan Africa, infected cells undergo an unpredictable and erratic life period. Millions of individual parasites have gene expressions. Ribonucleic acid sequencing (RNA-seq) is a popular transcriptional technique that has improved the detection of major genetic probes. RNA-seq analysis generally requires computational improvements from machine learning techniques, since it computes interpretations of gene expressions. In this study, an adaptive genetic algorithm (A-GA) combined with recursive feature elimination (RFE), denoted A-GA-RFE, was used as the feature selection algorithm to detect important information in a high-dimensional gene expression malaria vector RNA-seq dataset. Support vector machine (SVM) kernels were used as the classification algorithms to evaluate predictive performance. The feasibility of this study was confirmed using an RNA-seq dataset from the mosquito Anopheles gambiae. The technique achieved accuracy rates of 98.3% and 96.7%, respectively.
HIGHLIGHTS
Dimensionality reduction method based on feature selection
Classification using support vector machine
Classification of malaria vector dataset using an adaptive GA-RFE-SVM

