Stable Isotope Ratio and Elemental Profile Combined with Support Vector Machine for Provenance Discrimination of Oolong Tea (Wuyi-Rock Tea)

2017 ◽  
Vol 2017 ◽  
pp. 1-8 ◽  
Author(s):  
Yun-xiao Lou ◽  
Xian-shu Fu ◽  
Xiao-ping Yu ◽  
Zi-hong Ye ◽  
Hai-feng Cui ◽  
...  

This paper presents an effective method for discriminating the geographical origin of Wuyi-Rock tea using the stable isotope ratio (SIR) and metallic element profiling (MEP) combined with support vector machine (SVM) analysis. Wuyi-Rock tea samples (n=99) collected from nine producing areas and non-Wuyi-Rock tea samples (n=33) from eleven nonproducing areas were analysed for SIR and MEP using established methods. The SVM model based on the coupled SIR and MEP data produced the best prediction accuracy (0.9773). This result shows that instrumental methods combined with a classification model can provide an effective and stable tool for provenance discrimination. Moreover, every feature variable in the stable isotope and metallic element data was ranked by its contribution to the model. The results show that δ²H, δ¹⁸O, Cs, Cu, Ca, and Rb are significant indicators for provenance discrimination, and that not all of the metallic elements improve the prediction accuracy of the SVM model.
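The abstract does not say how each variable's contribution was measured. One simple possibility, assumed here purely for illustration, works in three steps. First, standardize the concatenated SIR and MEP columns. Second, fit a linear SVM separating Wuyi-Rock (+1) from non-Wuyi-Rock (-1) samples. Third, rank the variables by the magnitude of their weights. The sketch below uses the Pegasos stochastic sub-gradient solver in plain Python. The function names, hyperparameters, and toy data are hypothetical, and the paper's own SVM (possibly with a nonlinear kernel) may rank features differently.

```python
import random
from statistics import mean, pstdev


def standardize(X):
    """Z-score each column so isotope ratios and element contents share one scale."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]


def pegasos(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos; y in {-1, +1}.
    A constant 1.0 is appended to each row so the last weight acts as a bias
    (for simplicity the bias is regularized too)."""
    rng = random.Random(seed)
    Xb = [list(row) + [1.0] for row in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(Xb)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage step
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def predict(w, row):
    """Sign of the linear decision function (bias is the last weight)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, list(row) + [1.0])) > 0 else -1


def rank_features(w, names):
    """Order feature names by |weight|, largest first (bias weight excluded)."""
    pairs = zip(names, w[:len(names)])
    return [n for n, _ in sorted(pairs, key=lambda p: abs(p[1]), reverse=True)]
```

In use, each row of `X` would hold one tea sample's isotope values (for example δ²H and δ¹⁸O) followed by its element contents. `rank_features(pegasos(standardize(X), y), names)` then returns the variables ordered by their weight in the linear decision function.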

Molecules ◽  
2020 ◽  
Vol 25 (6) ◽  
pp. 1442 ◽  
Author(s):  
Tao Shen ◽  
Hong Yu ◽  
Yuan-Zhong Wang

Gentiana, one of the largest genera of Gentianoideae, contains many species with potential pharmaceutical value that are used in local traditional medicine. Because phytochemical composition and bioactive compounds differ among species, it is crucial to accurately identify authentic Gentiana species. In this paper, the feasibility of using infrared spectroscopy combined with chemometric analysis to identify Gentiana and its related species was studied. A total of 180 batches of raw spectral fingerprints were obtained from 18 species of Gentiana and Tripterospermum by near-infrared (NIR: 10,000–4000 cm⁻¹) and Fourier transform mid-infrared (MIR: 4000–600 cm⁻¹) spectroscopy. First, principal component analysis (PCA) was used to explore the natural grouping of the 180 samples. Second, random forest (RF), support vector machine (SVM), and K-nearest neighbors (KNN) models were built using the full spectra (1487 NIR variables and 1214 FT-MIR variables, respectively). Based on the results for the calibration and prediction sets, the MIR-SVM model achieved a higher classification accuracy than the other models. Five feature selection strategies were used to reduce the dimensionality of the data and thus the number of variables used for modeling: VIP (variable importance in projection), Boruta, GARF (genetic algorithm combined with random forest), GASVM (genetic algorithm combined with support vector machine), and Venn diagram calculation. Finally, 101 NIR and 73 FT-MIR bands were selected as the feature variables, respectively. Third, stacking models were built on the optimal spectral dataset. Most of the stacking models performed better than the full-spectrum models. Using RF and SVM as base learners combined with an SVM meta-classifier was the optimal stacked generalization strategy.
For the SG-Ven-MIR-SVM model, the accuracies (ACC) of the calibration and validation sets were both 100%. Sensitivity (SE), specificity (SP), efficiency (EFF), the Matthews correlation coefficient (MCC), and Cohen's kappa coefficient (K) were all 1, showing that the model had optimal authenticity identification performance. These results suggest that stacked generalization combined with feature selection may be an important technique for improving the predictive accuracy of classification models and avoiding overfitting. The results can provide a valuable reference for ensuring the safety and effectiveness of medicinal Gentiana in clinical application.
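The abstract reports ACC, SE, SP, EFF, MCC, and K without giving formulas. The sketch below computes them from the four cells of a binary confusion matrix. With 18 species, such metrics would typically be computed per class, one species against the rest; that is an assumption here. EFF is taken as the geometric mean √(SE·SP). This definition is common in chemometrics papers, but it is assumed rather than confirmed for this study.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """SE, SP, EFF, ACC, MCC and Cohen's kappa from a binary confusion matrix."""
    n = tp + fp + tn + fn
    se = tp / (tp + fn) if tp + fn else 0.0  # sensitivity (recall of the positive class)
    sp = tn / (tn + fp) if tn + fp else 0.0  # specificity
    eff = sqrt(se * sp)                      # assumed: geometric mean of SE and SP
    acc = (tp + tn) / n
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Cohen's kappa: observed agreement (acc) against agreement expected by chance
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (acc - p_chance) / (1 - p_chance) if p_chance < 1 else 0.0  # degenerate case -> 0
    return {"SE": se, "SP": sp, "EFF": eff, "ACC": acc, "MCC": mcc, "K": kappa}
```

A perfectly classified validation set (FP = FN = 0) yields 1 for every metric, which matches the values reported for the SG-Ven-MIR-SVM model.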


2011 ◽  
Vol 3 ◽  
pp. BECB.S7503 ◽  
Author(s):  
Sangeetha Subramaniam ◽  
Monica Mehrotra ◽  
Dinesh Gupta

There is an urgent need to develop novel antimalarials in view of the increasing disease burden and the growing resistance of malarial parasites to currently used drugs. Proliferation inhibitors targeting the P. falciparum intraerythrocytic cycle are one of the important classes of compounds being explored for their potential as novel antimalarials. The Support Vector Machine (SVM) based model we developed can facilitate rapid screening of large and diverse chemical libraries by reducing false hits and prioritising compounds before expensive high-throughput screening experiments are set up. The SVM model, trained with molecular descriptors of proliferation inhibitors and non-inhibitors, displayed satisfactory performance in cross-validation and on an independent data set, with an average accuracy of 83% and an AUC of 0.88. Intriguingly, the method displayed remarkable accuracy on the recently submitted P. falciparum whole-cell screening datasets. The method also predicted several inhibitors in the National Cancer Institute diversity set, most of which are similar to known inhibitors.
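The reported AUC of 0.88 can be read as a probability. It is the chance that a randomly chosen inhibitor receives a higher SVM score than a randomly chosen non-inhibitor, which is the normalized Mann-Whitney U statistic. The sketch below computes it directly from scores, such as SVM decision values, and binary labels. The scores in the usage line are made up for illustration.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison: P(score of a positive > score of a negative),
    with ties counted as 1/2. labels: 1 = inhibitor, 0 = non-inhibitor."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0.9, 0.2, 0.5, 0.1], [1, 1, 0, 0])` gives 0.75, because three of the four positive/negative pairs are ranked correctly. This pairwise version is O(n·m) and is meant for clarity. A sort-based rank computation scales better to large libraries.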


Author(s):  
Jie Xu ◽  
Xianglong Liu ◽  
Zhouyuan Huo ◽  
Cheng Deng ◽  
Feiping Nie ◽  
...  

Support Vector Machine (SVM) was originally proposed as a binary classification model and has achieved great success in many applications. In practice, however, problems more often involve more than two classes, so it is natural to extend SVM to a multi-class classifier. Many approaches have been proposed to construct a multi-class classifier from binary SVMs, such as the one-versus-all strategy, the one-versus-one strategy, and Weston's multi-class SVM. The one-versus-all and one-versus-one strategies split the multi-class problem into multiple binary classification subproblems, so multiple binary classifiers must be trained. Weston's multi-class SVM is formed by enforcing risk constraints and imposing a specific regularization, such as the Frobenius norm. It is not derived by maximizing the margin between the hyperplane and the training data, which is the motivation behind SVM. In this paper, we propose a multi-class SVM model from the perspective of maximizing the margin between the training points and the hyperplane, and we analyze the relation between our model and other related methods. Experiments show that our model achieves better or comparable results relative to other related methods.
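The two decomposition strategies differ in how many binary subproblems they create. With K classes, one-versus-all trains K classifiers, while one-versus-one trains K(K-1)/2. The sketch below builds the binary label vectors for each strategy and combines one-versus-one decisions by majority vote. The binary SVM trainer itself is deliberately left out. The helper names are hypothetical and do not belong to any library, and the sketch does not implement the paper's proposed margin-based model.

```python
from collections import Counter
from itertools import combinations


def ovr_tasks(labels):
    """One-versus-all: one binary task per class, that class (+1) vs. all others (-1)."""
    return {c: [1 if y == c else -1 for y in labels] for c in sorted(set(labels))}


def ovo_tasks(labels):
    """One-versus-one: one binary task per pair of classes, using only the
    samples that belong to those two classes. Returns {(a, b): (indices, ±1 labels)}."""
    tasks = {}
    for a, b in combinations(sorted(set(labels)), 2):
        idx = [i for i, y in enumerate(labels) if y in (a, b)]
        tasks[(a, b)] = (idx, [1 if labels[i] == a else -1 for i in idx])
    return tasks


def ovo_vote(pairwise):
    """Combine pairwise decisions {(a, b): +1 meaning a, -1 meaning b} by majority
    vote; ties go to the class that sorts first."""
    votes = Counter(a if d > 0 else b for (a, b), d in pairwise.items())
    return max(sorted(votes), key=votes.__getitem__)
```

With four classes, `ovr_tasks` yields four subproblems and `ovo_tasks` yields six. Each one-versus-one subproblem is smaller, but there are more of them. This trade-off is part of what motivates single-model formulations such as Weston's multi-class SVM.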


2018 ◽  
Vol 173 ◽  
pp. 01007 ◽
Author(s):  
Han Aoyang ◽  
Yu Litao ◽  
An Shuhuai ◽  
Zhang Zhisheng

Short-term load forecasting for microgrids is the basis of research on microgrid scheduling techniques. Accurate load forecasting provides the necessary basis for cooperative optimization scheduling of the microgrid. In this paper, a short-term load forecasting model for microgrids based on a support vector machine (SVM) is constructed. The harmony search optimization algorithm (HSA) is used to optimize the parameters of the SVM model because of its fast convergence and strong optimization ability. Simulations and tests on an actual microgrid load system show that the HSA-SVM short-term load forecasting model can effectively improve prediction accuracy.
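The abstract does not give the HSA settings, so the sketch below uses illustrative values. It shows the three moves of a basic harmony search:

- memory consideration (HMCR): reuse a value from an existing harmony;
- pitch adjustment (PAR, bandwidth): nudge that reused value;
- random selection: draw a fresh value at random.

In an HSA-SVM setting, each coordinate might be log10(C) or log10(γ) of an RBF-kernel SVM, and `objective` would return the forecasting error on validation data. Here a simple quadratic stands in for that hypothetical objective.

```python
import random


def harmony_search(objective, bounds, hms=10, hmcr=0.9, par=0.3, bw=0.05,
                   iters=2000, seed=0):
    """Minimize objective(x) over a box [(lo, hi), ...].
    hms: harmony memory size; hmcr: memory-considering rate;
    par: pitch-adjusting rate; bw: bandwidth as a fraction of each range."""
    rng = random.Random(seed)
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [objective(h) for h in memory]
    for _ in range(iters):
        new = []
        for j, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                x = memory[rng.randrange(hms)][j]  # take a value from memory
                if rng.random() < par:
                    x += rng.uniform(-1, 1) * bw * (hi - lo)  # pitch adjustment
            else:
                x = rng.uniform(lo, hi)  # fresh random value
            new.append(min(max(x, lo), hi))  # keep within bounds
        s = objective(new)
        worst = max(range(hms), key=scores.__getitem__)
        if s < scores[worst]:  # replace the worst harmony if the new one is better
            memory[worst], scores[worst] = new, s
    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]
```

Because the objective is treated as a black box, the same routine works whether it wraps an SVM regressor's validation error or any other cost. Each evaluation would involve training an SVM, so `iters` is the main budget knob.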


2019 ◽  
Vol 11 (14) ◽  
pp. 3981 ◽  
Author(s):  
Xiaoqian Zu ◽  
Yongxiang Wu ◽  
Zhenduo Zhang ◽  
Lu Yu

To examine how cross-strata neighboring behavior in a mixed-income community influences the consumption choices of individuals in low-income groups, and to improve the prediction accuracy of the consumption choice model for low-income groups with small sample sizes, we developed a support vector machine (SVM) algorithm based on the influence of neighboring behavior. We substituted the predicted latent variables into the SVM classifier and constructed an SVM prediction model with latent variables based on reference group theory. We established the model parameters using cross-validation and used low-income residents of a mixed-income community in Shanghai as study subjects to empirically test the model's performance. The results show that the SVM selection model with latent variables has good prediction accuracy. The proposed model's accuracy was 1.29% higher than that of the particle swarm optimization (PSO)-SVM model without latent variables and 19.35% higher than that of the SVM model with latent variables. The proposed model can be employed to predict the consumption choices of individuals in low-income groups. This paper offers a theoretical reference for investigating neighboring behavior in mixed-income communities and the consumption choices of low-income individuals, and it is of practical importance for urban community planning systems.
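The abstract states that the model parameters were established by cross-validation but gives neither the number of folds nor the parameter grid. The sketch below shows generic k-fold selection over candidate parameter sets. `fit_and_score` is a hypothetical callback. It would train an SVM, with or without the latent-variable features, on the training indices and return accuracy on the validation indices.

```python
import random


def kfold_indices(n, k=5, seed=0):
    """Shuffle sample indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_select(candidates, n, fit_and_score, k=5, seed=0):
    """Return (best_params, mean_score) over k folds.
    fit_and_score(params, train_idx, val_idx) -> score, higher is better."""
    folds = kfold_indices(n, k, seed)
    best, best_score = None, float("-inf")
    for params in candidates:
        scores = []
        for i, val in enumerate(folds):
            # Training set = every fold except the held-out one
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            scores.append(fit_and_score(params, train, val))
        avg = sum(scores) / k
        if avg > best_score:
            best, best_score = params, avg
    return best, best_score
```

With small samples such as the one described, every observation is used for validation exactly once. This makes the averaged score a steadier basis for choosing parameters than a single hold-out split.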


Author(s):  
Zhenhua Li ◽  
Junjie Cheng ◽  
A. Abu-Siada

Background: Winding deformation is one of the most common faults that a power transformer experiences over its operational life. It is therefore essential to detect and rectify such faults at an early stage to avoid potentially catastrophic consequences for the transformer. At present, methods published in the literature for transformer winding fault diagnosis mainly focus on identifying the fault type and quantifying its extent, with little attention paid to identifying the fault location. Methods: This paper presents a method based on a genetic algorithm and support vector machine (GA-SVM) to improve the classification of power transformer faults in terms of type and location. A sinusoidal sweep signal in the frequency range of 600 kHz to 1 MHz is applied to one terminal of the transformer winding. A mathematical index of the induced current at the head and end of the transformer winding under various fault conditions is used to extract unique features, which are fed to a support vector machine (SVM) model for training. The parameters of the SVM model are optimized using a genetic algorithm (GA). Results: Extensive simulation analysis of various transformer winding faults at different locations demonstrates the effectiveness of the mathematical indicators in extracting fault-type characteristics and of the proposed fault classification model in diagnosing faults. Conclusion: The proposed model can effectively identify different fault types and determine their location within the transformer winding, with diagnostic rates of 100% for fault type and 90% for fault location.
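The abstract says only that a GA optimizes the SVM parameters. The encoding, operators, and rates below are therefore assumptions chosen for a minimal, readable sketch: real-valued genes, tournament selection, arithmetic crossover, Gaussian mutation, and elitism. In a GA-SVM setting the genes might be log10(C) and log10(γ), and `objective` would return the misclassification rate on validation data built from the winding-current features. A quadratic stands in for that hypothetical objective here.

```python
import random


def genetic_search(objective, bounds, pop_size=30, generations=60,
                   mutation_sd=0.05, seed=0):
    """Minimize objective(x) over a box [(lo, hi), ...] with a simple real-coded GA:
    tournament selection, arithmetic crossover, Gaussian mutation and elitism."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluate each individual once per generation (each call would train an SVM)
        scored = sorted(((objective(ind), ind) for ind in pop), key=lambda t: t[0])

        def tournament(k=3):
            return min(rng.sample(scored, k), key=lambda t: t[0])[1]

        nxt = [ind for _, ind in scored[:2]]  # elitism: carry the two best over
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            a = rng.random()
            child = [a * u + (1 - a) * v for u, v in zip(p1, p2)]  # arithmetic crossover
            child = [min(max(c + rng.gauss(0, mutation_sd * (hi - lo)), lo), hi)
                     for c, (lo, hi) in zip(child, bounds)]  # mutation, clipped to bounds
            nxt.append(child)
        pop = nxt
    best = min(pop, key=objective)
    return best, objective(best)
```

Elitism guarantees that the best parameter set found never gets worse from one generation to the next. This matters when each fitness evaluation is an expensive SVM training run.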


2020 ◽  
Vol 10 (11) ◽  
pp. 2628-2633 ◽  
Author(s):  
A. Sheryl Oliver ◽  
M. Anuradha ◽  
J. Jean Justus ◽  
Kiranmai Bellam ◽  
T. Jayasankar

Lung cancer is a serious illness that affects people all over the globe. To increase the survival rate of patients with lung cancer, early recognition combined with effective treatment is important. This study introduces a new deep learning (DL) based feature extraction and classification technique for CT lung images. A DL model using a Coding Network (CN) is presented for extracting both high-level features and classical features. First, a convolutional neural network is trained as a coding network, and the raw pixels are coded into feature vectors that represent high-level concepts for classification. Next, selected classical features are extracted based on background knowledge of lung CT images. An automatic feature fusion step is also applied to avoid tedious manual parameter selection. Finally, a support vector machine (SVM) model is employed to classify the CT lung images. For experimentation, a benchmark dataset is used to evaluate the presented CN-SVM model, which is validated along several dimensions.
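The abstract does not describe how the "automatic feature fusion" combines the CNN-coded features with the classical ones. As a hedged stand-in, one common approach follows three steps:

1. Z-score each feature block.
2. Scale each block by 1/√(width) so that a wide CNN vector does not swamp a handful of hand-crafted features.
3. Concatenate the blocks into the vector passed to the SVM.

The function names and the block-scaling rule are assumptions for illustration, not the paper's method.

```python
from statistics import mean, pstdev


def zscore_columns(block):
    """Z-score each column of a list-of-rows feature block."""
    stats = [(mean(c), pstdev(c) or 1.0) for c in zip(*block)]  # guard constant columns
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in block]


def fuse(deep, classical):
    """Concatenate per-sample deep and classical features after z-scoring each
    block and scaling it by 1/sqrt(block width), so both blocks carry equal
    total variance in the fused vector."""
    d_w, c_w = len(deep[0]), len(classical[0])
    D = [[v / d_w ** 0.5 for v in row] for row in zscore_columns(deep)]
    C = [[v / c_w ** 0.5 for v in row] for row in zscore_columns(classical)]
    return [d + c for d, c in zip(D, C)]
```

After this scaling, each block contributes the same total sum of squares across the samples. A distance- or margin-based classifier such as an SVM then sees both feature sources on an equal footing.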


2021 ◽  
Vol 8 (1) ◽  
pp. 57-64 ◽
Author(s):  
Lionel Reinhart Halim ◽  
Alethea Suryadibrata

Depression and social anxiety are the two main negative impacts of cyberbullying. Unfortunately, a survey conducted by UNICEF on 3 September 2019 showed that 1 in 3 young people in 30 countries had been victims of cyberbullying. In this research, sentiment analysis is conducted to detect comments that contain cyberbullying. The cyberbullying dataset is obtained from the Kaggle Toxic Comment Classification Challenge. Pre-processing consists of four stages: comment generalization (converting text to lowercase and removing punctuation), tokenization, stop word removal, and lemmatization. Word embedding with Word2Vec is used to conduct the sentiment analysis. The One-Against-All (OAA) method with a Support Vector Machine (SVM) model is then used to make multi-label predictions. The SVM model undergoes hyperparameter tuning using Randomized Search CV. Evaluation is carried out using the micro-averaged F1 score to assess prediction accuracy and Hamming loss to assess the fraction of sample-label pairs that are incorrectly classified. The Word2Vec and OAA SVM implementation gives the best result on data that has undergone all four pre-processing stages (comment generalization, tokenization, stop word removal, and lemmatization) and is represented with 100 features in the Word2Vec model. The micro-averaged F1 score and Hamming loss of the tuned model are 83.40% and 15.13%, respectively.

Index Terms: Sentiment Analysis; Word Embedding; Word2Vec; One-Against-All; Support Vector Machine; Toxic Comment Classification Challenge; Multi Labelling
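The two evaluation measures named above have simple closed forms for 0/1 label matrices, where each row is a comment and each column a label. Micro-averaged F1 pools true positives, false positives, and false negatives over all labels before computing F1 = 2TP / (2TP + FP + FN). Hamming loss is the fraction of sample-label pairs that are predicted wrongly. The sketch below is a plain-Python illustration on made-up matrices, not the authors' evaluation code.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over all (sample, label) pairs of 0/1 matrices."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # convention when no positives exist at all


def hamming_loss(y_true, y_pred):
    """Fraction of (sample, label) pairs whose predicted value is wrong."""
    pairs = [(t, p) for t_row, p_row in zip(y_true, y_pred) for t, p in zip(t_row, p_row)]
    return sum(t != p for t, p in pairs) / len(pairs)
```

For `y_true = [[1, 0, 1], [0, 1, 0]]` and `y_pred = [[1, 0, 0], [0, 1, 1]]` there are two true positives, one false positive and one false negative. Micro-F1 is therefore 4/6 ≈ 0.667, and two of the six label entries are wrong, giving a Hamming loss of 1/3.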

