Optimasi Data Tidak Seimbang pada Interaksi Drug Target dengan Sampling dan Ensemble Support Vector Machine (Optimization of Imbalanced Data in Drug-Target Interaction with Sampling and Ensemble Support Vector Machine)

2020 ◽  
Vol 7 (6) ◽  
pp. 1221
Author(s):  
Nabila Sekar Ramadhanti ◽  
Wisnu Ananta Kusuma ◽  
Annisa Annisa

<p><em><strong>Abstract</strong></em></p>
<p>Imbalanced data is a common problem in prediction and classification tasks. This research focuses on handling imbalanced data in the prediction of <em>drug-target interaction</em> (compound-protein interaction). Compound-protein interaction databases contain many target proteins and drug compounds whose interactions have not yet been validated experimentally. Because so many interactions are unknown, the proportion of known to unknown interactions is imbalanced, and highly imbalanced interaction data can bias prediction results. There are many ways to handle imbalanced data; this research implements a method that combines <em>Biased Support Vector Machine</em> (BSVM), <em>oversampling</em>, and <em>undersampling</em> with an <em>Ensemble Support Vector Machine</em> (SVM). These methods have already addressed the imbalanced data problem on other kinds of data, such as image data. This research explores the effect of the sampling combined in the method on compound-protein interaction data. The method was tested on the <em>Nuclear Receptor</em>, <em>G-Protein Coupled Receptor</em> (GPCR), and <em>Ion Channel</em> datasets, with imbalance ratios of 14.6%, 32.36%, and 28.2%, respectively. Evaluation on these three datasets gave <em>area under curve</em> (AUC) values of 63.4%, 71.4%, and 61.3% and F-measure values of 54%, 60.7%, and 39%, respectively. The accuracy of the method is still fairly good, although its accuracy and precision are lower than those of an SVM without any treatment. That higher score is biased, however, because the corresponding AUC and F-measure are lower. This shows that the proposed method can reduce bias on the imbalanced data evaluated and increase AUC and F-measure by about 5% to 20%.</p>
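The abstract's point that accuracy can look good while AUC and F-measure reveal bias can be illustrated with a small sketch. The toy labels and predictions below are invented for illustration and are not taken from the paper's datasets; the helper names (`confusion`, `accuracy`, `f_measure`, `auc`) are hypothetical, and hard 0/1 predictions are used as scores for simplicity.

```python
def confusion(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the minority class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall on the positive class."""
    tp, fp, fn, _ = confusion(y_true, y_pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data: 10 known interactions (label 1) among 100 compound-protein pairs.
y_true = [1] * 10 + [0] * 90

# A classifier that always predicts the majority class.
always_negative = [0] * 100

# A hypothetical rebalanced classifier: finds 8 of 10 positives, with 15 false alarms.
rebalanced = [1] * 8 + [0] * 2 + [1] * 15 + [0] * 75

for name, pred in [("always negative", always_negative), ("rebalanced", rebalanced)]:
    print(name,
          round(accuracy(y_true, pred), 3),
          round(f_measure(y_true, pred), 3),
          round(auc(y_true, pred), 3))
```

In this toy setting the majority-class predictor has the higher accuracy (0.9 against 0.83) but an F-measure of 0 and an AUC of 0.5, which is the kind of bias the abstract describes.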

2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Ji-Yong An ◽  
Fan-Rong Meng ◽  
Zi-Ji Yan

Abstract. Background: Prediction of novel drug–target interactions (DTIs) plays an important role in discovering new drug candidates and finding new proteins to target. Because experimental methods are time-consuming and expensive, developing efficient computational approaches that accurately predict potential associations between drugs and targets remains a challenging task. Results: In this paper, we propose a novel computational method called WELM-SURF, based on drug fingerprints and protein evolutionary information, for identifying DTIs. Specifically, to exploit protein sequence features, a Position-Specific Scoring Matrix (PSSM) is used to capture protein evolutionary information, and Speeded-Up Robust Features (SURF) is employed to extract key sequence features from the PSSM. For drugs, molecular substructure fingerprints of the chemical structure are used to represent each drug as a feature vector. The Weighted Extreme Learning Machine (WELM) has a short training time, good generalization ability, and, most importantly, the ability to perform classification efficiently by optimizing the loss function of the weight matrix. The WELM classifier is therefore used to classify the extracted features and predict DTIs. The performance of the WELM-SURF model was evaluated on enzyme, ion channel, GPCR, and nuclear receptor datasets using fivefold cross-validation. WELM-SURF obtained average accuracies of 93.54%, 90.58%, 85.43%, and 77.45% on the enzyme, ion channel, GPCR, and nuclear receptor datasets, respectively. We also compared its performance with the Extreme Learning Machine (ELM) and the state-of-the-art Support Vector Machine (SVM) on the enzyme and ion channel datasets, and with other existing methods on all four datasets. The experimental results show that WELM-SURF performs significantly better than ELM, SVM, and other previous methods in the domain. Conclusion: The results demonstrate that the proposed WELM-SURF model can predict DTIs with high accuracy and robustness. The WELM-SURF method is expected to be a useful computational tool for bioinformatics studies related to DTI prediction.


2018 ◽  
Vol 12 (3) ◽  
pp. 341-347 ◽  
Author(s):  
Feng Wang ◽  
Shaojiang Liu ◽  
Weichuan Ni ◽  
Zhiming Xu ◽  
Zemin Qiu ◽  
...  

Author(s):  
Noviyanti Santoso ◽  
Wahyu Wibowo ◽  
Hilda Hikmawati

In data mining, class imbalance is a problematic issue that calls for a solution. This is probably because machine learning algorithms are built on the assumption that each class has a balanced number of instances, so when the classes are imbalanced the prediction results may be inappropriate. Solutions offered for class imbalance include oversampling, undersampling, and the synthetic minority oversampling technique (SMOTE). Both oversampling and undersampling have disadvantages, so SMOTE is an alternative that overcomes them. Integrating SMOTE into data mining classification methods such as Naive Bayes, Support Vector Machine (SVM), and Random Forest (RF) is expected to improve accuracy. In this research, it was found that the SMOTE-processed data gave better accuracy than the original data. Among the three classification methods used, RF gives the highest average AUC, F-measure, and G-means score.
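The core idea of SMOTE, creating synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, can be sketched as follows. This is a minimal standard-library illustration, not the implementation used in the study; the function name `smote`, its parameters, and the toy points are assumptions made for the example.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from a list of minority-class feature tuples.

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Invented toy data: 4 minority points against a majority class of 20 points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority_size = 20

# Create enough synthetic points to match the majority class size.
new_points = smote(minority, n_new=majority_size - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority))
```

Because every synthetic point is a convex combination of two real minority points, the new samples stay inside the region the minority class already occupies, unlike plain oversampling, which only duplicates existing points.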


2014 ◽  
Vol 47 (9) ◽  
pp. 3158-3167 ◽  
Author(s):  
Yuan-Hai Shao ◽  
Wei-Jie Chen ◽  
Jing-Jing Zhang ◽  
Zhen Wang ◽  
Nai-Yang Deng

Author(s):  
Prayag Tiwari ◽  
Brojo Kishore Mishra ◽  
Sachin Kumar ◽  
Vivek Kumar

Sentiment analysis aims to capture the underlying viewpoint of a text that carries a subjective opinion, for example an online review, comments on blog posts, or a film rating. These reviews and blogs may be classified into different polarity groups, such as negative, positive, and neutral, in order to extract information from the input dataset. Supervised machine learning methods classify these reviews. In this paper, three machine learning algorithms, Support Vector Machine (SVM), Maximum Entropy (ME), and Naive Bayes (NB), are considered for the classification of human sentiments. The accuracy of the methods is critically examined in order to assess their performance on the basis of parameters such as precision, recall, f-measure, and accuracy.
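Of the three classifiers named above, Naive Bayes is the simplest to sketch. The example below is a minimal multinomial Naive Bayes with add-one (Laplace) smoothing over whitespace tokens; it is an illustration under assumptions, not the paper's setup, and the function names (`train_nb`, `predict_nb`) and the four training sentences are invented.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Count class frequencies and per-class word frequencies from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log posterior; unseen words are ignored."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue
            # Add-one smoothing so a word unseen in this class does not zero the score.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy reviews.
training = [
    ("great film loved it", "positive"),
    ("wonderful acting great story", "positive"),
    ("terrible plot boring film", "negative"),
    ("awful boring waste", "negative"),
]
model = train_nb(training)
print(predict_nb(model, "great story"))
print(predict_nb(model, "boring plot"))
```

A real evaluation would hold out test reviews and report precision, recall, f-measure, and accuracy on them rather than checking a couple of hand-picked sentences.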

