GI-SVM: A sensitive method for predicting genomic islands based on unannotated sequence of a single genome

2016 ◽  
Vol 14 (01) ◽  
pp. 1640003 ◽  
Author(s):  
Bingxin Lu ◽  
Hon Wai Leong

Genomic islands (GIs) are clusters of functionally related genes acquired by lateral genetic transfer (LGT), and they are present in many bacterial genomes. GIs are extremely important for bacterial research, because they not only promote genome evolution but also contain genes that enhance adaptation and enable antibiotic resistance. Many methods have been proposed to predict GIs, but most of them rely on either annotations or comparisons with other closely related genomes. Hence, these methods cannot be easily applied to new genomes. As the number of newly sequenced bacterial genomes rapidly increases, there is a need for methods that detect GIs based solely on the sequence of a single genome. In this paper, we propose a novel method, GI-SVM, to predict GIs given only the unannotated genome sequence. GI-SVM is based on a one-class support vector machine (SVM) and exploits compositional bias in terms of k-mer content. In our evaluations on three real genomes, GI-SVM achieves higher recall than current methods without much loss of precision. In addition, GI-SVM allows flexible parameter tuning to obtain optimal results for each genome. In short, GI-SVM provides a more sensitive method for researchers interested in first-pass detection of GIs in newly sequenced genomes.
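The k-mer composition features mentioned in the abstract can be illustrated with a minimal sketch. The window size, step, and value of k below are illustrative assumptions, not settings from the paper, and the function names are hypothetical. The resulting vectors are the kind of input one could pass to a one-class SVM implementation.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=4):
    """Return the normalized k-mer frequency vector of a DNA sequence."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers over A, C, G, T are counted; others (e.g. with N) are ignored.
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


def window_profiles(genome, window=5000, step=2500, k=4):
    """Yield (start, profile) for each sliding window over the genome."""
    last_start = max(len(genome) - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, kmer_profile(genome[start:start + window], k)
```

Windows whose profiles deviate strongly from the bulk of the genome would be the candidates flagged by a composition-based detector.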

2019 ◽  
Vol 15 (3) ◽  
pp. 206-211 ◽  
Author(s):  
Jihui Tang ◽  
Jie Ning ◽  
Xiaoyan Liu ◽  
Baoming Wu ◽  
Rongfeng Hu

Introduction: Machine learning is a useful tool for predicting cell-penetrating compounds as drug candidates. Materials and Methods: In this study, we developed a novel method for predicting the membrane-penetrating capability of Cell-Penetrating Peptides (CPPs). For this, we used orthogonal encoding to encode each amino acid, treating each amino acid position as one variable. IBM SPSS Modeler software and a dataset of 533 CPPs were then used for model screening. Results: The results indicated that the Support Vector Machine (SVM) model was suitable for predicting membrane-penetrating capability. For improvement, the three CPPs with the greatest lengths were used to predict CPPs. The penetration capability can be predicted with an accuracy of close to 95%. Conclusion: All the results indicated that using amino acid position as a variable can be a promising method for predicting the membrane-penetrating capability of CPPs.
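Orthogonal (one-hot) encoding with one variable block per amino acid position can be sketched as follows. Zero-padding to a fixed maximum length is an assumption made here for illustration; the abstract does not say how peptides of different lengths were handled, and the function name is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def orthogonal_encode(peptide, max_len):
    """One-hot encode a peptide: one 20-element block per position.

    Positions beyond the peptide length are left as all-zero blocks
    (an assumed padding scheme). Non-standard residues raise ValueError.
    """
    vector = []
    for pos in range(max_len):
        block = [0] * len(AMINO_ACIDS)
        if pos < len(peptide):
            block[AMINO_ACIDS.index(peptide[pos])] = 1
        vector.extend(block)
    return vector
```

Each peptide then becomes a fixed-length binary vector that a classifier such as an SVM can consume directly.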


2014 ◽  
Vol 26 (01) ◽  
pp. 1450002 ◽  
Author(s):  
Hanguang Xiao

Early detection of, and intervention in, artery stenosis is very important for reducing the mortality of cardiovascular disease. A novel method for predicting artery stenosis was proposed that uses the input impedance of the systemic arterial tree and a support vector machine (SVM). Based on a transmission line model of a 55-segment systemic arterial tree, the input impedance of the arterial tree was calculated using a recursive algorithm. A sample database of the input impedance was established by specifying different positions and degrees of artery stenosis. An SVM prediction model was trained on the sample database, and 10-fold cross-validation was used to evaluate its performance. The effects of stenosis position and degree on the accuracy of the prediction were discussed. The results showed that the mean specificity, sensitivity and overall accuracy of the SVM are 80.2%, 98.2% and 89.2%, respectively, for the 50% threshold of stenosis degree. Increasing the threshold of the stenosis degree from 10% to 90% increases the overall accuracy from 82.2% to 97.4%. Increasing the distance of the stenosed artery from the heart gradually decreases the overall accuracy from 97.1% to 58%. When the stenosis degree worsens to 90%, the prediction accuracy of the SVM rises to more than 90% for stenosis of a peripheral artery. The simulation demonstrated theoretically the feasibility of the proposed method for predicting artery stenosis via the input impedance of the systemic arterial tree and SVM.
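The three reported measures follow the usual confusion-matrix definitions. A minimal sketch is given below, assuming binary labels with 1 for stenosis and 0 for no stenosis; the function name and label convention are illustrative, not taken from the paper.

```python
def stenosis_metrics(y_true, y_pred):
    """Return (specificity, sensitivity, accuracy) for binary labels (1 = stenosis)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    accuracy = (tp + tn) / len(pairs) if pairs else float("nan")
    return specificity, sensitivity, accuracy
```

When the two classes are equally sized, accuracy is the mean of specificity and sensitivity. The reported 89.2% is exactly the mean of 80.2% and 98.2%, which is what balanced classes would give.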


PLoS ONE ◽  
2009 ◽  
Vol 4 (2) ◽  
pp. e4524 ◽  
Author(s):  
Cheong Xin Chan ◽  
Aaron E. Darling ◽  
Robert G. Beiko ◽  
Mark A. Ragan

2021 ◽  
Author(s):  
Leila Zahedi ◽  
Farid Ghareh Mohammadi ◽  
M. Hadi Amini

Machine learning techniques lend themselves as promising decision-making and analytic tools in a wide range of applications. Different ML algorithms have various hyper-parameters. In order to tailor an ML model towards a specific application, a large number of hyper-parameters should be tuned. Tuning the hyper-parameters directly affects the performance (accuracy and run-time). However, for large-scale search spaces, efficiently exploring the ample number of combinations of hyper-parameters is computationally challenging. Existing automated hyper-parameter tuning techniques suffer from high time complexity. In this paper, we propose HyP-ABC, an automatic innovative hybrid hyper-parameter optimization algorithm using the modified artificial bee colony approach, to measure the classification accuracy of three ML algorithms, namely random forest, extreme gradient boosting, and support vector machine. Compared to the state-of-the-art techniques, HyP-ABC is more efficient and has a limited number of parameters to be tuned, making it worthwhile for real-world hyper-parameter optimization problems. We further compare our proposed HyP-ABC algorithm with state-of-the-art techniques. In order to ensure the robustness of the proposed method, the algorithm takes a wide range of feasible hyper-parameter values, and is tested using a real-world educational dataset.
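For orientation, the generic artificial bee colony loop (employed, onlooker and scout phases) can be sketched as below. This is the textbook scheme, not the modified HyP-ABC variant described in the abstract. The toy objective stands in for a classifier's validation error, and all names and default values are assumptions.

```python
import random


def abc_minimize(objective, bounds, n_sources=10, limit=5, n_iter=50, seed=0):
    """Minimize objective(params) over box bounds with a basic bee colony."""
    rng = random.Random(seed)
    dim = len(bounds)

    def random_source():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    sources = [random_source() for _ in range(n_sources)]
    costs = [objective(s) for s in sources]
    trials = [0] * n_sources  # consecutive failed improvements per source
    best_cost, best = min(zip(costs, sources))
    best = best[:]

    def try_improve(i):
        # Perturb one coordinate of source i relative to a different source k.
        j = rng.randrange(dim)
        k = rng.choice([s for s in range(n_sources) if s != i])
        cand = sources[i][:]
        cand[j] += rng.uniform(-1, 1) * (sources[i][j] - sources[k][j])
        lo, hi = bounds[j]
        cand[j] = min(max(cand[j], lo), hi)  # clip to the search box
        cost = objective(cand)
        if cost < costs[i]:
            sources[i], costs[i], trials[i] = cand, cost, 0
        else:
            trials[i] += 1

    for _ in range(n_iter):
        # Employed bees: one local search step per source.
        for i in range(n_sources):
            try_improve(i)
        # Onlooker bees: sample sources in proportion to their fitness.
        fitness = [1 / (1 + c) if c >= 0 else 1 + abs(c) for c in costs]
        for _ in range(n_sources):
            i = rng.choices(range(n_sources), weights=fitness)[0]
            try_improve(i)
        # Remember the best source before scouts may replace it.
        i_best = min(range(n_sources), key=costs.__getitem__)
        if costs[i_best] < best_cost:
            best_cost, best = costs[i_best], sources[i_best][:]
        # Scout bees: abandon sources that stopped improving.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i] = objective(sources[i])
                trials[i] = 0
    # Final check in case a scout landed on a better point in the last round.
    i_best = min(range(n_sources), key=costs.__getitem__)
    if costs[i_best] < best_cost:
        best_cost, best = costs[i_best], sources[i_best][:]
    return best, best_cost


def toy_error(params):
    """Hypothetical stand-in for validation error, minimal at (3, -1)."""
    x, y = params
    return (x - 3) ** 2 + (y + 1) ** 2


best, cost = abc_minimize(toy_error, [(-10, 10), (-10, 10)], n_sources=20, n_iter=200)
```

In a real tuning setup, the objective would train a model with the candidate hyper-parameters and return its cross-validated error, which makes each evaluation far more expensive than this toy function.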


Author(s):  
S. Mustak ◽  
G. Uday ◽  
B. Ramesh ◽  
B. Praveen

Abstract. Crop discrimination and acreage estimation play a vital role in interpreting the cropping pattern, the statistics of the produce and the market value of each product. Sultan Bathery is an area where large amounts of irrigated and rainfed paddy are grown along with rubber, arecanut and coconut. In addition, the northern region of Sultan Bathery is covered with evergreen and deciduous forest. The main objective of this study is to evaluate the performance of optical and Synthetic Aperture Radar (SAR)-optical hybrid fusion imagery for crop discrimination in Sultan Bathery Taluk of Wayanad district in Kerala. Seven land use classes, namely paddy, rubber, coconut, deciduous forest, evergreen forest, water bodies and other land uses (e.g., built-up, barren etc.), were selected based on a literature review and the local land use classification policy. Both Sentinel-2A (optical) and Sentinel-1A (SAR) satellite imagery of the 2017 Kharif season were used for classification with three machine learning classifiers: Support Vector Machine (SVM), Random Forest (RF) and Classification and Regression Trees (CART). The performance of these techniques was also compared in order to select the best classifier. In addition, spectral indices and textural matrices (NDVI, GLCM) were extracted from the images, and the best features were selected using the sequential feature selection approach. Then, 10-fold cross-validation was employed for parameter tuning of the classifiers to select the best hyperparameters and improve the classification accuracy. Finally, the best features and best hyperparameters were used for final classification and accuracy assessment. The results show that SVM outperforms RF and CART and, similarly, that the Optical+SAR dataset outperforms the optical-only and SAR-only satellite imagery. This study can help earth observation scientists provide guidelines to agricultural scientists, policy-makers and local government for sustainable agricultural practice.
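The NDVI spectral index mentioned above is computed per pixel from the near-infrared and red reflectances as (NIR - Red) / (NIR + Red). A minimal sketch follows. For Sentinel-2 these are usually taken from bands B8 and B4, but the band choice and the handling of zero-valued pixels here are assumptions, not details from the study.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); returns 0.0 when both bands are zero."""
    denom = nir + red
    return (nir - red) / denom if denom else 0.0


def ndvi_image(nir_band, red_band):
    """Apply NDVI pixel by pixel to two equally shaped 2D lists of reflectances."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]
```

Dense vegetation tends toward values near 1, while water and bare surfaces fall near or below 0, which is why the index is a common input feature for crop classifiers.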

