Open-Source Essential Protein Prediction Model by Integrating Chi-Square and Support Vector Machine

2020 ◽  
Vol 11 (3) ◽  
pp. 38-56
Author(s):  
S. R. Mani Sekhar ◽  
Siddesh G. M. ◽  
Sunilkumar S. Manvi

Identifying and analyzing proteins plays a vital role in drug design and disease prediction. Several open-source applications have been developed to identify essential proteins based on biological or topological features. These techniques infer how likely a protein is to be essential from network topology and feature selection; because feature selection can discard some features to reduce complexity, accuracy can suffer. In this paper, the authors used the Selenium driver to scrape the dataset. They then integrated the chi-square method with a support vector machine to predict essential proteins in baker's yeast. Here, chi-square is a test of dissimilarity used to alter the records, after which the support vector machine classifies the test dataset. The results show that the proposed Chi-SVM model achieves an accuracy of 99.56%, whereas BC and CC achieve accuracies of 84.0% and 86.0%, respectively. Finally, the proposed model is validated using statistical performance measures such as PPA, NPA, SA, and STA.
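To make the chi-square step more concrete, the following is a minimal sketch of chi-square feature scoring for binary features against a binary class label. It is not the authors' implementation: the function names `chi_square_score` and `select_top_features` and the toy data are assumptions made for illustration, and the abstract does not say how the scores feed into the support vector machine.

```python
def chi_square_score(feature, labels):
    """Chi-square statistic for a binary feature against binary class labels."""
    n = len(feature)
    # Observed counts of the 2x2 contingency table, keyed by (feature, label).
    observed = {(f, c): 0 for f in (0, 1) for c in (0, 1)}
    for f, c in zip(feature, labels):
        observed[(f, c)] += 1

    score = 0.0
    for f in (0, 1):
        row_total = observed[(f, 0)] + observed[(f, 1)]
        for c in (0, 1):
            col_total = observed[(0, c)] + observed[(1, c)]
            # Count expected if the feature and the label were independent.
            expected = row_total * col_total / n
            if expected > 0:
                score += (observed[(f, c)] - expected) ** 2 / expected
    return score


def select_top_features(columns, labels, k):
    """Return the indices of the k columns with the highest chi-square score."""
    scores = [chi_square_score(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data: three binary features for six samples.
labels = [1, 1, 1, 0, 0, 0]
columns = [
    [1, 1, 1, 0, 0, 0],  # moves exactly with the label
    [1, 0, 1, 0, 1, 0],  # weakly related to the label
    [1, 1, 0, 1, 1, 0],  # independent of the label in this sample
]
selected = select_top_features(columns, labels, k=1)
```

In a pipeline of the kind the abstract describes, the selected columns would then be passed to a classifier such as an SVM; that step is omitted here.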

2018 ◽  
Vol 5 (5) ◽  
pp. 537 ◽  
Author(s):  
Oman Somantri ◽  
Dyah Apriliani

Customers need decision support when choosing a place to eat that matches their preferences, for example in Tegal City. Sentiment analysis offers a solution to this problem by applying a Support Vector Machine (SVM) model. The purpose of this research is to optimize the resulting model by applying feature selection with the Information Gain (IG) and Chi-Square algorithms to the best model produced by SVM for classifying customer satisfaction with food stalls and restaurants in Tegal City, so that the model's accuracy increases. The results show that the best accuracy, 72.45%, was achieved by the SVM-IG model, an increase of about 3.08% over the initial 69.36%. The average accuracy gain after optimizing SVM with feature selection is 2.51%. Based on these results, feature selection using Information Gain (SVM-IG) gives better accuracy than plain SVM and Chi-Square (SVM-CS), so the proposed model improves on the accuracy obtained by SVM alone.


2014 ◽  
Vol 628 ◽  
pp. 383-389 ◽  
Author(s):  
Ya Hui Peng ◽  
Kang Peng ◽  
Jian Zhou ◽  
Zhi Xiang Liu

Because rock burst hazard assessment systems are complex, this study establishes a support vector machine (SVM) model for predicting rock burst classification, based on SVM theory and the actual characteristics of the project. The main factors of rock burst, such as coal seam, dip, buried depth, structure situation, change of pitch angle, change of coal thickness, gas concentration, roof management, pressure relief, and shooting, were defined as the criterion indices for rock burst prediction in the proposed model. To determine reasonable and efficient SVM parameters, an appropriate fitness function for the genetic algorithm (GA) was determined first, and the SVM parameters were then optimized with a real-coded GA, giving the combined genetic algorithm and support vector machine (GSVM) model. The GSVM model was trained on 23 sets of measured data; cross-validation was used to verify its stability, and the misdiscrimination ratio was 0. The model was then used to predict rock burst for 12 new samples; the prediction accuracy was 91.6667%, and the results are consistent with the actual situation. The results show that the genetic algorithm can speed up the SVM parameter search, and that the proposed model is highly credible for rock burst risk classification and can be applied in practical engineering.


2018 ◽  
Vol 1 (1) ◽  
pp. 120-130 ◽  
Author(s):  
Chunxiang Qian ◽  
Wence Kang ◽  
Hao Ling ◽  
Hua Dong ◽  
Chengyao Liang ◽  
...  

A Support Vector Machine (SVM) model optimized by K-fold cross-validation was built to predict and evaluate the degradation of concrete strength in a complicated marine environment. Several other models, such as an Artificial Neural Network (ANN) and a Decision Tree (DT), were also built and compared with the SVM to determine which made the most accurate predictions. Both material and environmental factors that influence the results were considered. The material factors mainly involved the original concrete strength and the amount of cement replaced by fly ash and slag. The environmental factors consisted of the concentrations of Mg²⁺, SO₄²⁻, and Cl⁻, the temperature, and the exposure time. The prediction results indicated that the optimized SVM model performed better than the other models in predicting concrete strength. Based on the SVM model, a simulation method of variable limitation was used to determine the sensitivity of the various factors and their degree of influence on the degradation of concrete strength.
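The K-fold cross-validation mentioned above can be sketched as an index-splitting routine. This is a generic illustration, not the authors' procedure: the helper name `k_fold_indices` is hypothetical, the folds are contiguous, and in practice the sample order is usually shuffled before splitting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Example: 10 samples split into 3 folds.
folds = list(k_fold_indices(10, 3))
```

Each candidate parameter setting would be trained on every `train` split and scored on the matching `test` split, with the average score used to compare settings.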


2012 ◽  
Vol 2012 ◽  
pp. 1-10
Author(s):  
Pijush Samui

The main objective of site characterization is the prediction of in situ soil properties at any half-space point at a site based on a limited number of tests. In this study, the Support Vector Machine (SVM) has been used to develop a three-dimensional site characterization model for Bangalore, India, based on a large number of Standard Penetration Test (SPT) results. SVM is a novel type of learning machine based on statistical learning theory; for regression, it introduces an ε-insensitive loss function. The database consists of 766 boreholes, with more than 2700 field SPT values spread over a 220 sq km area of Bangalore. The model is applied to the corrected SPT values. The three input variables of the SVM model were the spatial coordinates of each measurement point in Bangalore, and the output was the corrected SPT value. The results presented in this paper clearly highlight that the SVM is a robust tool for site characterization. A sensitivity analysis of the SVM parameters (including σ and ε) is also presented.
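The ε-insensitive loss named above has a simple closed form: errors smaller than ε cost nothing, and larger errors are penalized linearly by the amount they exceed ε. A minimal sketch follows; the function name and the sample numbers are illustrative only.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Return max(0, |y_true - y_pred| - epsilon)."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# A prediction within the epsilon tube incurs no loss.
inside_tube = epsilon_insensitive_loss(10.0, 10.5, epsilon=1.0)
# A prediction outside the tube is penalized by the excess over epsilon.
outside_tube = epsilon_insensitive_loss(10.0, 13.0, epsilon=1.0)
```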


2014 ◽  
Vol 24 (2) ◽  
pp. 397-404 ◽  
Author(s):  
Baozhen Yao ◽  
Ping Hu ◽  
Mingheng Zhang ◽  
Maoqing Jin

Automated Incident Detection (AID) is an important part of Advanced Traffic Management and Information Systems (ATMISs). An automated incident detection system can effectively provide information on an incident, which can help initiate the required measure to reduce the influence of the incident. To accurately detect incidents in expressways, a Support Vector Machine (SVM) is used in this paper. Since the selection of optimal parameters for the SVM can improve prediction accuracy, the tabu search algorithm is employed to optimize the SVM parameters. The proposed model is evaluated with data for two freeways in China. The results show that the tabu search algorithm can effectively provide better parameter values for the SVM, and SVM models outperform Artificial Neural Networks (ANNs) in freeway incident detection.


2018 ◽  
Vol 141 (4) ◽  
Author(s):  
Qihong Feng ◽  
Ronghao Cui ◽  
Sen Wang ◽  
Jin Zhang ◽  
Zhe Jiang

The diffusion coefficient of carbon dioxide (CO2), a significant parameter describing the mass transfer process, exerts a profound influence on the safety of CO2 storage in depleted reservoirs, saline aquifers, and marine ecosystems. However, experimental determination of the diffusion coefficient in the CO2-brine system is time-consuming and complex because the procedure requires sophisticated laboratory equipment and reasonable interpretation methods. To facilitate the acquisition of more accurate values, an intelligent model, termed MKSVM-GA, is developed using a hybrid technique of support vector machine (SVM), mixed kernels (MK), and genetic algorithm (GA). As confirmed by the statistical evaluation indicators, the proposed model exhibits excellent performance with high accuracy and strong robustness over a wide range of temperatures (273–473.15 K), pressures (0.1–49.3 MPa), and viscosities (0.139–1.950 mPa·s). Our results show that the proposed model is more applicable than the artificial neural network (ANN) model at this sample size, which is superior to four commonly used traditional empirical correlations. The technique presented in this study can provide a fast and precise prediction of CO2 diffusivity in brine at reservoir conditions for engineering design and technical risk assessment during CO2 injection.
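The abstract does not state which kernels are mixed. One common construction, shown here purely as an assumption, is a convex combination of a radial basis function (RBF) kernel and a polynomial kernel; all function names and parameter values below are hypothetical. In a hybrid of this kind, the mixing weight and kernel parameters are the sort of quantities a genetic algorithm could tune.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian RBF kernel: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def poly_kernel(x, z, degree, coef0=1.0):
    """Polynomial kernel: (x . z + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, z)) + coef0) ** degree


def mixed_kernel(x, z, weight, gamma, degree):
    """Convex combination of the RBF and polynomial kernels (assumed form)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * rbf_kernel(x, z, gamma) + (1.0 - weight) * poly_kernel(x, z, degree)


# Illustrative evaluation on a single point paired with itself.
point = [1.0, 2.0]
value = mixed_kernel(point, point, weight=0.5, gamma=0.1, degree=2)
```

A weighted sum with non-negative weights of two valid kernels is itself a valid kernel, which is why this form is a convenient choice.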


2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Hongbo Zhao ◽  
Zenghui Huang ◽  
Zhengsheng Zou

The stress-strain relationship of geomaterials is important to numerical analysis in geotechnical engineering, but it is difficult to represent accurately with a conventional constitutive model. The artificial neural network (ANN) has been proposed as a more effective approach to representing this complex, nonlinear relationship, but ANN itself still has some limitations that restrict the applicability of the method. In this paper, an alternative method, the support vector machine (SVM), is proposed to simulate this type of complex constitutive relationship. The SVM model can overcome the limitations of the ANN model while still possessing its advantages over the traditional model. The application examples show that it is an effective and accurate modeling approach for representing the stress-strain relationship of geomaterials.


2020 ◽  
Vol 14 (1) ◽  
pp. 41-50 ◽  
Author(s):  
Hai-Bang Ly ◽  
Binh Thai Pham

Background: The shear strength of soil, the magnitude of shear stress that a soil can sustain, is an important factor in geotechnical engineering. Objective: The main objective of this study is to develop a machine learning model, namely a Support Vector Machine (SVM), to predict the shear strength of soil from 6 input variables: clay content, moisture content, specific gravity, void ratio, liquid limit, and plastic limit. Methods: A large set of experimental measurements, comprising more than 500 samples, was gathered from the technical reports of the Long Phu 1 power plant project. The accuracy of the proposed SVM was evaluated using statistical indicators such as the coefficient of correlation (R), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE) over 200 simulations to account for the effect of random sampling. Finally, the most accurate SVM model was used to interpret the prediction results using Partial Dependence Plots (PDP). Results: Validation results showed that the SVM model performed well in predicting soil shear strength (R = 0.9 to 0.95), and that moisture content, liquid limit, and plastic limit were the three features that most affected the prediction. Conclusion: This study might help in the quick and accurate prediction of soil shear strength for practical purposes in civil engineering.
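The three indicators named above can be computed directly from paired measured and predicted values. The sketch below uses the standard definitions (Pearson correlation for R); the function name and the sample numbers are made up for illustration and are not taken from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R, RMSE, MAE) for paired measured and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    # Pearson correlation coefficient between measurements and predictions.
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    r = cov / math.sqrt(var_t * var_p)
    return r, rmse, mae


# Hypothetical values: every prediction is exactly one unit too high.
r, rmse, mae = regression_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

The example shows why R alone is not sufficient: a constant offset leaves the correlation perfect while RMSE and MAE both report the one-unit error.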


Author(s):  
Sajid Umair ◽  
Muhammad Majid Sharif

Prediction of student performance on the basis of habits has been a very important research topic in academics. Studies show that selection of the correct data set also plays a vital role in these predictions. In this chapter, the authors took data from different schools that contains student habits and their comments, analyzed it using latent semantic analysis to get semantics, and then used support vector machine to classify the data into two classes, important for prediction and not important. Finally, they used artificial neural networks to predict the grades of students. Regression was also used to predict data coming from support vector machine, while giving only the important data for prediction.


Sensors ◽  
2019 ◽  
Vol 19 (22) ◽  
pp. 5018 ◽  
Author(s):  
Kyu-Won Jang ◽  
Jong-Hyeok Choi ◽  
Ji-Hoon Jeon ◽  
Hyun-Seok Kim

Combustible gases, such as CH4 and CO, directly or indirectly affect the human body. Thus, leakage detection of combustible gases is essential for various industrial sites and daily life. Many types of gas sensors are used to identify these combustible gases, but since gas sensors generally have low selectivity among gases, coupling issues often arise that adversely affect gas detection accuracy. To solve this problem, we built a decoupling algorithm for different gas sensors using machine learning. Commercially available semiconductor sensors were employed to detect CH4 and CO, and a support vector machine (SVM) was then applied as a supervised learning algorithm for gas classification. We also introduced a pairing plot scheme to classify gas type more effectively. The proposed model classified CH4 and CO gases 100% correctly at all levels above the minimum concentration the gas sensors could detect. Consequently, SVM with pairing plot is a memory-efficient and promising method for more accurate gas classification.

