Computational Prediction of Compound–Protein Interactions for Orphan Targets Using CGBVS

A variety of artificial intelligence (AI)-based machine learning techniques have been developed for in silico prediction of compound–protein interactions (CPI). One of these is a technique we refer to as chemical genomics-based virtual screening (CGBVS). The main feature of CGBVS is that predictions are computed with a pairwise kernel-based support vector machine (SVM), which gives high prediction accuracy while remaining simple to implement and easy to handle. We studied whether CGBVS can identify ligands for targets that lack ligand information (orphan targets), using data from G protein-coupled receptor (GPCR) families. As the validation method, we tested whether ligands were predicted correctly for a virtual orphan GPCR, created by omitting all ligand information for one selected target from the training data. We expressed the results of this study as an applicability index and developed a method to determine whether CGBVS can be used to predict GPCR ligands. Validation showed that prediction accuracy differed greatly among GPCRs, but models using multiple sequence alignment (MSA) as the protein descriptor performed well in terms of overall prediction accuracy. We also found that the type of compound descriptor had a smaller effect on prediction accuracy than the type of protein descriptor. Furthermore, the accuracy of ligand prediction depends on the amount of ligand information available for GPCRs related to the target: prediction accuracy tends to be high when a large amount of ligand information for related proteins is used in training.
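A pairwise kernel scores a compound–protein pair by combining a compound kernel with a protein kernel. The abstract does not state which pairwise form CGBVS uses, so the sketch below assumes the common tensor-product form K((c, p), (c', p')) = K_c(c, c') * K_p(p, p'), with a Tanimoto kernel on fingerprint bit sets and a Gaussian kernel on numeric protein descriptors. It also shows the virtual-orphan split described above, in which every pair involving the held-out target is removed from training. All function names and toy descriptors are hypothetical.

```python
import math

# Toy descriptors (hypothetical): compound fingerprints as sets of "on" bits,
# protein descriptors as short numeric vectors (e.g. derived from an MSA).
compounds = {"cA": {1, 2, 3}, "cB": {2, 3, 4}, "cC": {7, 8}}
proteins = {"gpcr1": [0.1, 0.9], "gpcr2": [0.2, 0.8], "gpcr3": [0.9, 0.1]}
pairs = [("cA", "gpcr1"), ("cB", "gpcr2"), ("cC", "gpcr3"), ("cA", "gpcr2")]


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprint bit sets (assumed compound kernel)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rbf(x, y, gamma=0.5):
    """Gaussian kernel on protein descriptor vectors (assumed protein kernel)."""
    return math.exp(-gamma * sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def pairwise_kernel(pair1, pair2):
    """Tensor-product pairwise kernel: K_c(c, c') * K_p(p, p')."""
    (c1, p1), (c2, p2) = pair1, pair2
    return tanimoto(compounds[c1], compounds[c2]) * rbf(proteins[p1], proteins[p2])


def virtual_orphan_split(pairs, target):
    """Hold out every pair involving `target`, mimicking an orphan receptor."""
    train = [pr for pr in pairs if pr[1] != target]
    test = [pr for pr in pairs if pr[1] == target]
    return train, test


train, test = virtual_orphan_split(pairs, "gpcr2")
# Train-vs-train and test-vs-train Gram matrices for a precomputed-kernel SVM.
K_train = [[pairwise_kernel(a, b) for b in train] for a in train]
K_test = [[pairwise_kernel(a, b) for b in train] for a in test]
```

The test-versus-train matrix `K_test` has the shape expected by an SVM that accepts a precomputed kernel (for example scikit-learn's `SVC(kernel="precomputed")`). Because the product of two positive semidefinite kernels is itself positive semidefinite, the pairwise kernel remains a valid SVM kernel.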

Optimization of PV Systems Using Data Mining and Regression Learner MPPT Techniques

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v10.i3.pp1080-1089 ◽

2018 ◽

Vol 10 (3) ◽

pp. 1080

Author(s):

Adedayo M. Farayola ◽

Ali N Hasan ◽

Ahmed Ali

Keyword(s):

Gaussian Process Regression ◽

Weather Conditions ◽

Training Data ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

Improved Performance ◽

Using Data ◽

Artificial Neural Network (ANN) ◽

Power Point

Supervised machine learning techniques such as artificial neural networks (ANN) and adaptive neuro-fuzzy inference systems (ANFIS) are powerful tools for maximum power point tracking (MPPT) in photovoltaic systems. However, these offline MPPT techniques still require large and accurate training data sets for successful tracking. This paper presents an innovative use of the rational quadratic Gaussian process regression (RQGPR) technique to generate the large and highly accurate training data required for the MPPT task. To confirm the effectiveness of RQGPR, the results of ANN combined with RQGPR (ANN-RQGPR) were compared with those of the conventional ANN technique and of ANN combined with linear support vector machine regression (ANN-LSVM) under different weather conditions. The results show that ANN-RQGPR produced the best overall results, with improved performance.
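For reference, the rational quadratic (RQ) covariance commonly used in Gaussian process regression is k(x, x') = sigma_f^2 * (1 + r^2 / (2 * alpha * l^2))^(-alpha), where r is the distance between inputs, and with a zero prior mean the GP posterior mean at a new input is k_*^T (K + sigma_n^2 I)^(-1) y. The sketch below implements that prediction in plain Python. The hyperparameters are fixed by hand rather than optimized, and the two-dimensional toy inputs (think scaled irradiance and temperature) and targets are hypothetical; this is not the toolchain or data used in the paper.

```python
def rq_kernel(x1, x2, sigma_f=1.0, length=1.0, alpha=1.0):
    """Rational quadratic kernel: sigma_f^2 * (1 + r^2 / (2*alpha*length^2))^(-alpha)."""
    r2 = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return sigma_f ** 2 * (1.0 + r2 / (2.0 * alpha * length ** 2)) ** (-alpha)


def solve(A, b):
    """Solve A x = b with Gaussian elimination and partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gpr_predict(X, y, X_new, noise_var=1e-4, **kernel_args):
    """Zero-mean GP posterior mean: k_*^T (K + noise_var * I)^-1 y."""
    n = len(X)
    K = [[rq_kernel(X[i], X[j], **kernel_args) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    weights = solve(K, y)
    return [sum(w * rq_kernel(x, xi, **kernel_args) for w, xi in zip(weights, X))
            for x in X_new]


# Hypothetical toy data: scaled (irradiance, temperature) -> scaled MPP voltage.
X = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
y = [1.0, 2.0, 3.0]
pred_train = gpr_predict(X, y, X)
pred_far = gpr_predict(X, y, [(100.0, 100.0)])
```

With a small noise variance the posterior mean nearly reproduces the training targets, and far from the training inputs it reverts toward the zero prior mean.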

A sentiment analysis system for social media using machine learning techniques: Social enablement

Digital Scholarship in the Humanities ◽

10.1093/llc/fqy037 ◽

2018 ◽

Vol 34 (3) ◽

pp. 569-581 ◽

Cited By ~ 1

Author(s):

Sujata Rani ◽

Parteek Kumar

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Media Analysis ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Analysis Tool ◽

Data Set ◽

Learning Techniques

In this article, an innovative approach to sentiment analysis (SA) is presented. The proposed system handles Romanized or abbreviated text and spelling variations when performing sentiment analysis. The training data set of 3,000 movie reviews and tweets was manually labeled by native Hindi speakers into three classes: positive, negative, and neutral. The system uses the WEKA (Waikato Environment for Knowledge Analysis) tool to convert the string data into numerical matrices and applies three machine learning techniques: Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system was tested on 100 movie reviews and tweets; SVM performed best among the classifiers, with an accuracy of 68% for movie reviews and 82% for tweets. The results are very promising, and the system can be used in emerging applications such as SA of product reviews and social media analysis. Additionally, the proposed system could serve other cultural and social purposes, such as predicting or fighting human riots.
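The abstract names Naive Bayes as one of the three classifiers applied after the text is converted into numeric features. A minimal multinomial Naive Bayes over bag-of-words counts, with Laplace smoothing, can be sketched as follows. The normalization map for abbreviations and spelling variants and the toy training sentences are hypothetical stand-ins; the paper itself performed vectorization and classification in WEKA.

```python
import math
from collections import Counter

# Hypothetical normalization map for abbreviations and spelling variants.
NORMALIZE = {"gud": "good", "gr8": "great", "nyc": "nice"}


def tokenize(text):
    return [NORMALIZE.get(tok, tok) for tok in text.lower().split()]


class MultinomialNB:
    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, doc, c):
        """Unnormalized log P(c | doc) with Laplace (add-one) smoothing."""
        v = len(self.vocab)
        return self.log_prior[c] + sum(
            math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
            for t in tokenize(doc) if t in self.vocab)  # unseen words are ignored

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))


docs = ["movie was gud", "gr8 acting", "boring movie",
        "worst story ever", "released on friday", "movie releases today"]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
model = MultinomialNB().fit(docs, labels)
```

Normalizing variants before counting means "gud" and "good" share one feature, which is one simple way to reduce the sparsity that Romanized spelling variation causes.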

A max-margin training of RNA secondary structure prediction integrated with the thermodynamic model

10.1101/205047 ◽

2017 ◽

Cited By ~ 1

Author(s):

Manato Akiyama ◽

Kengo Sato ◽

Yasubumi Sakakibara

Keyword(s):

Machine Learning ◽

Secondary Structure ◽

Structure Prediction ◽

RNA Secondary Structure ◽

Prediction Accuracy ◽

Secondary Structure Prediction ◽

Training Data ◽

Support Vector ◽

RNA Secondary Structure Prediction ◽

Fine Grained

Motivation: A popular approach for predicting RNA secondary structure is the thermodynamic nearest neighbor model, which finds the thermodynamically most stable secondary structure, that is, the one with the minimum free energy (MFE). For further improvement, an alternative approach based on machine learning techniques has been developed. The machine learning based approach can employ a fine-grained model that includes much richer feature representations and can fit the training data. Although a machine learning based fine-grained model achieved extremely high prediction accuracy, a risk of overfitting has been reported for such models.

Results: In this paper, we propose a novel algorithm for RNA secondary structure prediction that integrates the thermodynamic approach and the machine learning based weighted approach. Our fine-grained model combines experimentally determined thermodynamic parameters with a large number of scoring parameters for detailed feature contexts; these parameters are trained by a structured support vector machine (SSVM) with ℓ1 regularization to avoid overfitting. Our benchmark shows that our algorithm achieves the best prediction accuracy compared with existing methods, and no heavy overfitting is observed.

Availability: The implementation of our algorithm is available at https://github.com/keio-bioinformatics/mxfold.
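The training scheme described above can be illustrated with a toy structured hinge loss. Each candidate structure is scored as a fixed thermodynamic term plus learned weights times features; the loss is max over y of [score(y) + Delta(y, y_ref)] - score(y_ref), where Delta counts base pairs that differ from the reference structure, and the ℓ1 penalty is applied with a soft-thresholding (proximal) step after each subgradient step. This is a sketch under stated assumptions: candidates are enumerated explicitly instead of being decoded by dynamic programming, the features and thermodynamic scores are invented, and the abstract does not say which optimizer was used.

```python
def dot(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))


def pair_loss(y, y_ref):
    """Delta(y, y_ref): number of base pairs in one structure but not the other."""
    return len(set(y) ^ set(y_ref))


def soft_threshold(w, t):
    """Proximal step for an l1 penalty: shrink each weight toward zero by t."""
    return [(abs(v) - t) * (1.0 if v > 0 else -1.0) if abs(v) > t else 0.0 for v in w]


def score(w, cand):
    _, features, thermo = cand
    return thermo + dot(w, features)  # fixed thermodynamic term + learned part


def ssvm_l1_step(w, candidates, y_ref, lr=0.1, lam=0.01):
    """One subgradient step on the structured hinge loss, then the l1 prox."""
    truth = next(c for c in candidates if c[0] == y_ref)
    # Loss-augmented decoding (by enumeration in this toy example).
    violator = max(candidates, key=lambda c: score(w, c) + pair_loss(c[0], y_ref))
    loss = score(w, violator) + pair_loss(violator[0], y_ref) - score(w, truth)
    grad = [fv - ft for fv, ft in zip(violator[1], truth[1])]
    w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return soft_threshold(w, lr * lam), loss


def decode(w, candidates):
    return max(candidates, key=lambda c: score(w, c))[0]


# Hypothetical candidates: (base pairs, feature vector, thermodynamic score).
y_ref = ((1, 8), (2, 7))
candidates = [
    (((1, 8), (2, 7)), [1.0, 1.0, 0.0], 1.0),
    (((1, 8),), [1.0, 0.0, 0.0], 1.2),
    (((3, 6),), [0.0, 0.0, 1.0], 1.5),
    ((), [0.0, 0.0, 0.0], 0.0),
]
w, losses = [0.0, 0.0, 0.0], []
for _ in range(50):
    w, loss = ssvm_l1_step(w, candidates, y_ref)
    losses.append(loss)
```

With zero learned weights the thermodynamic term alone prefers a wrong structure; after training, the combined score decodes the reference structure, while the soft-threshold step keeps pulling unneeded weights toward zero.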

Android Malware Detection using Machine Learning

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1011.0982s1219 ◽

2020 ◽

Vol 8 (2S12) ◽

pp. 65-70

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Machine Learning Algorithms ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbor ◽

User Interest ◽

Android Malware ◽

Android Malware Detection

Machine learning is empowering many aspects of day-to-day life, from filtering content on social networks to suggesting products that we may be looking for. This technology focuses on taking objects, such as images, as input to find new observations or show items based on user interest. The main focus here is supervised learning, in which the computer learns from the input (training) data and predicts results based on that experience. We discuss the machine learning algorithms Naïve Bayes Classifier, K-Nearest Neighbor, Random Forest, Decision Trees, Boosted Trees, and Support Vector Machine, and apply these classifiers to Malgenome and Drebin, two Android malware datasets. Android is an operating system that is gaining popularity, and the rising demand for Android devices has been accompanied by a rise in Android malware. The traditional methods used to detect malware were unable to detect unknown applications. We ran the datasets on different machine learning classifiers and recorded the results. The experimental results provide a comparative analysis based on performance, accuracy, and cost.
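K-Nearest Neighbor is the simplest of the listed classifiers to show end to end. The sketch below represents each app as a hypothetical binary vector of requested permissions and labels a new app by majority vote among its k nearest training apps under Hamming distance. The feature set, the distance, and the toy data are assumptions, not the Malgenome or Drebin feature format.

```python
from collections import Counter

# Hypothetical binary features: [SEND_SMS, READ_CONTACTS, INTERNET, RECEIVE_BOOT_COMPLETED]
train_X = [[1, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1],
           [0, 0, 1, 0], [0, 0, 1, 1], [0, 1, 1, 0]]
train_y = ["malware", "malware", "malware", "benign", "benign", "benign"]


def hamming(a, b):
    """Number of positions where two binary feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(X, y, query, k=3):
    """Majority label among the k training samples closest to `query`."""
    nearest = sorted(zip(X, y), key=lambda pair: hamming(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


test_X = [[1, 1, 1, 0], [0, 0, 1, 0]]
test_y = ["malware", "benign"]
preds = [knn_predict(train_X, train_y, q) for q in test_X]
```

k-NN has no training phase, so its cost is paid at prediction time (one distance per training sample), which is one reason a comparison "based on performance, accuracy, and cost" can rank it differently from model-based classifiers.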

Optimizing the prediction accuracy of load-settlement behavior of single pile using a self-learning data mining approach

MATEC Web of Conferences ◽

10.1051/matecconf/201925802010 ◽

2019 ◽

Vol 258 ◽

pp. 02010

Author(s):

Doddy Prayogo ◽

Yudas Tadeus Teddy Susanto

Keyword(s):

Data Mining ◽

Prediction Accuracy ◽

Soil Layer ◽

Training Data ◽

Support Vector ◽

Data Mining Approach ◽

Settlement Behavior ◽

Data Points ◽

Single Piles ◽

Self Learning

Pile foundations are usually used when the upper soil layers are soft clay and therefore unable to support the structures' loads. Piles are needed to carry these loads deep into the hard soil layer. The safety and stability of pile-supported structures therefore depend on the behavior of the piles, and an accurate prediction of that behavior is very important to ensure satisfactory performance of the structures. Although many methods in the literature estimate pile settlement both theoretically and experimentally, methods for comprehensively predicting the load-settlement behavior of piles are very limited. This study develops a new data mining approach called the self-learning support vector machine (SL-SVM) to predict the load-settlement behavior of single piles. SL-SVM performance is investigated using 446 training data points and 53 test data points of cone penetration test (CPT) data obtained from previous literature. The prediction accuracy is then compared with that of other prediction methods using three statistical measures: mean absolute error (MAE), coefficient of correlation (R), and root mean square error (RMSE). The results show that SL-SVM achieves better accuracy than LS-SVM and BPNN, confirming the capability of the proposed data mining method to accurately model the load-settlement behavior of single piles from CPT data. The paper offers useful insights for geotechnical engineers involved in estimating pile behavior.
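The three statistics named above have standard definitions: MAE is the mean of |y_i - p_i|, RMSE is the square root of the mean squared error, and R is the Pearson correlation between measured and predicted values. A minimal sketch, using hypothetical settlement values rather than the study's CPT data:

```python
import math


def mae(y, p):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)


def rmse(y, p):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))


def pearson_r(y, p):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(y)
    my, mp = sum(y) / n, sum(p) / n
    cov = sum((a - my) * (b - mp) for a, b in zip(y, p))
    sy = math.sqrt(sum((a - my) ** 2 for a in y))
    sp = math.sqrt(sum((b - mp) ** 2 for b in p))
    return cov / (sy * sp)


# Hypothetical measured vs. predicted settlements (mm).
measured = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
```

MAE and RMSE measure error magnitude in the units of the target (RMSE penalizes large errors more), while R measures how well the predictions track the trend, so the three together give a fuller picture than any one alone.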

Automatic Task Classification via Support Vector Machine and Crowdsourcing

Mobile Information Systems ◽

10.1155/2018/6920679 ◽

2018 ◽

Vol 2018 ◽

pp. 1-9 ◽

Cited By ~ 3

Author(s):

Hyungsik Shin ◽

Jeongyeup Paek

Keyword(s):

Support Vector Machine ◽

Mobile Devices ◽

Prediction Accuracy ◽

Training Data ◽

Amazon Mechanical Turk ◽

Support Vector ◽

Data Set ◽

English Sentence ◽

Task Classification ◽

Personal Assistant

Automatic task classification is a core part of the personal assistant systems widely used on mobile devices such as smartphones and tablets. Although many industry leaders provide their own personal assistant services, their proprietary internals and implementations are not well known to the public. In this work, we show through a real implementation and evaluation that automatic task classification can be implemented for mobile devices using the support vector machine algorithm and crowdsourcing. To train our task classifier, we collected the training data set via crowdsourcing on the Amazon Mechanical Turk platform. Our classifier can classify a short English sentence into one of thirty-two predefined tasks that are frequently requested on personal mobile devices. Evaluation results show high prediction accuracy, ranging from 82% to 99%. Using a large amount of crowdsourced data, we also illustrate the relationship between training data size and the prediction accuracy of our task classifier.
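The abstract does not say how the SVM was trained or how the thirty-two-way decision was decomposed. The sketch below assumes a one-vs-rest scheme of linear SVMs over binary bag-of-words features, trained with the Pegasos subgradient method (a swapped-in solver) and without a bias term for simplicity. The three task names and example sentences are hypothetical.

```python
def featurize(sentence):
    """Binary bag-of-words features for a short sentence."""
    return {tok: 1.0 for tok in sentence.lower().split()}


def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def pegasos(samples, lam=0.1, epochs=10):
    """Binary linear SVM (no bias term) trained with Pegasos subgradient steps.

    samples: list of (feature dict, label in {-1, +1}).
    """
    w, t = {}, 0
    for _ in range(epochs):
        for x, y in samples:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1.0  # hinge loss is active for this sample
            w = {k: (1.0 - eta * lam) * v for k, v in w.items()}  # regularizer shrink
            if violated:
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def train_one_vs_rest(sentences, labels, **kwargs):
    feats = [featurize(s) for s in sentences]
    return {c: pegasos([(f, 1 if lab == c else -1) for f, lab in zip(feats, labels)], **kwargs)
            for c in sorted(set(labels))}


def classify(models, sentence):
    """Pick the task whose binary classifier gives the highest score."""
    x = featurize(sentence)
    return max(models, key=lambda c: dot(models[c], x))


# Hypothetical crowdsourced examples for three of the task categories.
sentences = ["set an alarm for seven", "wake me up at six",
             "will it rain today", "what is the weather tomorrow",
             "call my mother", "phone john now"]
labels = ["alarm", "alarm", "weather", "weather", "call", "call"]
models = train_one_vs_rest(sentences, labels)
```

One-vs-rest needs one binary model per task (thirty-two here), and each model only has to separate its task from everything else; more crowdsourced sentences mainly help by covering more of the vocabulary users actually type.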

Information-assisted volume rendering and visual evaluation through machine intelligence

10.32920/ryerson.14654745 ◽

2021 ◽

Author(s):

Naimul M. Khan

Keyword(s):

Volume Rendering ◽

Transfer Functions ◽

Design Method ◽

Machine Intelligence ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Complex Data ◽

Volume Data ◽

Visual Evaluation

Exploration and visualization of complex data have become an integral part of life. However, there is a semantic gap between users and visualization scientists: the users' priority is usability, while the scientists' priority is techniques. Information-Assisted Visualization (IAV) can help bridge this gap by presenting additional information extracted from the raw data to the user in an easily interpretable way. This thesis proposes several novel machine intelligence based systems for intuitive IAV. Most of the thesis focuses on Direct Volume Rendering, where Transfer Functions (TF) are used to color the volume data to expose structures. Existing TF design methods require manipulating complex widgets, which may be difficult for the user. We propose two novel approaches to TF design. In the data-centric approach, we generate an organized representation of the data through clustering and give the user intuitive control over the output in the cluster domain. We use Spherical Self-Organizing Maps (SSOM) as the core of this approach. Instead of manipulating complex widgets, the user interacts with the simple color-coded SSOM lattice to design the TF. In the image-centric approach, the user's interaction with the data is direct and minimal. The user interactions create the training data, and supervised classification is used to generate the TF. First, we propose novel supervised classifiers that combine the local information available through Support Vector Machine-based classifiers with the global information available through Nonparametric Discriminant Analysis-based classifiers. Using these classifiers, we propose a TF design method in which the user interacts with the volume slices directly to generate the output. Finally, we explore the use of IAV for home-based physical rehabilitation.
We propose an information-assisted visual evaluation framework that compares a user's performance of a physical exercise with that of an expert using our novel Incremental Dynamic Time Warping method and communicates the results visually through our color-mapped skeleton silhouette. All the proposed techniques are accompanied by detailed experimental results comparing them against the state of the art. The results show the potential of machine learning techniques to achieve visualization tasks in a simpler yet more effective way.
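The Incremental Dynamic Time Warping method is specific to the thesis and is not described here; for orientation, the standard (non-incremental) DTW distance that the name refers to can be sketched as follows. It aligns two sequences that may differ in speed, such as a hypothetical per-frame joint angle recorded from a user and from an expert, by dynamic programming over a cumulative-cost table.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic DTW: minimal cumulative cost of aligning sequences a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: repeat a frame of b, repeat a frame of a, or advance both.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


# Hypothetical elbow angles (degrees) per frame: the user is slower and adds one off-path frame.
expert = [0, 10, 20, 30, 20, 10, 0]
user = [0, 5, 10, 20, 30, 30, 20, 10, 0]
```

Because DTW lets one frame match several frames of the other sequence, a user who performs the exercise correctly but more slowly is not penalized; only the frame with angle 5, which has no exact counterpart in the expert sequence, contributes cost here.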

Rotor Unbalance Kind and Severity Identification by Current Signature Analysis with Adaptative Update to Multiclass Machine Learning Algorithms

Studies in Engineering and Technology ◽

10.11114/set.v8i1.5213 ◽

2021 ◽

Vol 8 (1) ◽

pp. 28

Author(s):

S. L. Ávila ◽

H. M. Schaberle ◽

S. Youssef ◽

F. S. Pacheco ◽

C. A. Penz

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Signature Analysis ◽

Data Set ◽

Learning Techniques ◽

Environmental Variations ◽

Current Signature

The health of a rotating electric machine can be evaluated by monitoring electrical and mechanical parameters. The more information is available, the easier it becomes to diagnose the machine's operating condition. We built a laboratory test bench to study rotor unbalance issues according to ISO standards. Using harmonic analysis of the electric stator current, this paper presents a comparative study of Support Vector Machines, Decision Tree classifiers, and the One-vs-One strategy for identifying the kind and severity of rotor unbalance, a nonlinear multiclass task. Moreover, we propose a methodology for updating the classifier to better handle changes produced by environmental variations and natural machinery usage. The adaptive update consists of adding recent data to the training data set while retaining the entire original historical data. This is relevant for engineering maintenance. Our results show that current signature analysis is appropriate for identifying the type and severity of rotor unbalance. Moreover, we show that machine learning techniques can be effective in an industrial application.
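The One-vs-One strategy trains one binary classifier for every pair of classes and predicts by majority vote. The sketch below shows only that voting scheme: it uses a nearest-centroid rule as a stand-in binary classifier (where the study used SVMs or decision trees), and the class names and two harmonic-amplitude features are hypothetical.

```python
from collections import Counter
from itertools import combinations


def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_ovo(X, y):
    """One stand-in binary model (a pair of class centroids) per pair of classes."""
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        ca = centroid([x for x, lab in zip(X, y) if lab == a])
        cb = centroid([x for x, lab in zip(X, y) if lab == b])
        models[(a, b)] = (ca, cb)
    return models


def predict_ovo(models, x):
    """Each pairwise model votes for one of its two classes; the majority wins."""
    votes = Counter()
    for (a, b), (ca, cb) in models.items():
        votes[a if sqdist(x, ca) <= sqdist(x, cb) else b] += 1
    return votes.most_common(1)[0][0]


# Hypothetical features: normalized amplitudes of two stator-current harmonics.
X = [[0.1, 0.1], [0.2, 0.1], [1.0, 0.2], [1.1, 0.3], [2.0, 0.9], [2.1, 1.0]]
y = ["none", "none", "static_light", "static_light", "static_heavy", "static_heavy"]
models = train_ovo(X, y)
```

With K classes, One-vs-One trains K(K-1)/2 binary models, each on only two classes' data, which keeps every individual problem small; the adaptive update described above would simply refit these models on the historical data plus the recent samples.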

The Analysis Performance of Heart Failure Classification by Using Machine Learning Techniques

Journal of Soft Computing and Data Mining ◽

10.30880/jscdm.2021.02.02.009 ◽

2021 ◽

Vol 2 (2) ◽

Author(s):

Nurul Farhana Hamzah ◽

Nazri Mohd Nawi ◽

Abdulkareem A. Hezam ◽

...

Keyword(s):

Heart Failure ◽

Congestive Heart Failure ◽

Machine Learning Techniques ◽

Support Vector ◽

Classification Algorithms ◽

Learning Techniques ◽

Boosted Decision Tree ◽

Failure Classification ◽

Artery Disease ◽

Using Data

Heart failure means that the heart is not pumping as well as it should. Congestive heart failure is a form of heart failure that calls for timely medical care, although the two terms are sometimes used interchangeably. Heart failure happens when the heart muscle does not pump blood as well as it should, and it is often referred to as congestive heart failure. Some conditions, such as narrowed arteries in the heart (coronary artery disease) or high blood pressure, gradually leave the heart too weak or too stiff to fill and pump effectively. Early detection of heart failure using data mining techniques has gained popularity among researchers. This research applies classification techniques to heart failure classification from medical data. It analyzes the performance of three classification algorithms, namely Support Vector Machine (SVM), Decision Forest (DF), and Boosted Decision Tree (BDT), in accurately classifying heart failure risk from the input data. The best of the three algorithms for heart failure classification is identified at the end of this research.

Analysis of Time Series Data Prediction Patterns Using Support Vector Regression, Multilayer Perceptron, and Simple Linear Regression

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v3i2.1013 ◽

2019 ◽

Vol 3 (2) ◽

pp. 282-287

Author(s):

Ika Oktavianti ◽

Ermatita Ermatita ◽

Dian Palupi Rini

Keyword(s):

Time Series ◽

Support Vector Regression ◽

Multilayer Perceptron ◽

Time Series Data ◽

Training Data ◽

Machine Learning Techniques ◽

Series Data ◽

Support Vector ◽

Contributing Factors ◽

Testing Data

Licensing services are a form of public service that is important in supporting increased investment in Indonesia; they are currently provided by the Investment and Licensing Services Department. A common problem is the length of time needed to process licenses, and one of the contributing factors is the limited number of licensing officers. Licensing data are time series data with monthly observations. An artificial neural network (ANN) and support vector regression (SVR) are used as machine learning techniques to predict the licensing pattern from the time series data. For both datasets used (Dataset 1 and Dataset 2), the data were split into 70% training data and 30% testing data, on the consideration that the training data must be larger than the testing data. The results show that for Dataset 1, the ANN multilayer perceptron (MLP) performed better than SVR, with MSE, MAE, and RMSE values of 251.09, 11.45, and 15.84. For Dataset 2, linear SVR performed better than MLP, with MSE, MAE, and RMSE values of 1839.93, 32.80, and 42.89. Dataset 2 is the one used to predict the number of licenses. The study also used simple linear regression (SLR) to examine the causal relationship between the number of licenses issued and the number of licensing service officers. The result is that this relationship is less significant, because other factors also affect the number of licenses.
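Because RMSE is the square root of MSE, the reported figures can be cross-checked: the square roots of 251.09 and 1839.93 agree with the reported RMSE values of 15.84 and 42.89 to within 0.01. The sketch below performs that check and shows a 70/30 split. Keeping the split in time order (no shuffling) is an assumption for monthly time series, since the abstract gives only the proportions, and the 48-month series is hypothetical.

```python
import math


def chronological_split(series, train_frac=0.7):
    """Split a time series in time order (no shuffling): first part trains, rest tests."""
    cut = int(round(len(series) * train_frac))
    return series[:cut], series[cut:]


# (MSE, RMSE) pairs as reported in the abstract.
reported = {"Dataset 1, MLP": (251.09, 15.84), "Dataset 2, linear SVR": (1839.93, 42.89)}
for name, (mse, rmse) in reported.items():
    # RMSE should equal sqrt(MSE); the published values agree to within 0.01.
    print(f"{name}: sqrt(MSE) = {math.sqrt(mse):.3f}, reported RMSE = {rmse}")

train, test = chronological_split(list(range(1, 49)))  # 48 hypothetical monthly values
```

Splitting in time order keeps every test month after every training month, so the reported errors reflect forecasting the future rather than interpolating between shuffled observations.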
