Target prediction of compounds on jamu formula using nearest profile method

2020 ◽  
Vol 17 (2) ◽  
pp. 293-303
Author(s):  
Nur Hilal A Syahrir ◽  
Sumarheni Sumarheni ◽  
Supri Bin Hj Amir ◽  
Hedi Kuswanto

Jamu is part of Indonesia's cultural heritage: herbal formulas composed of several plants that have been used for centuries in Indonesian society to maintain health and treat disease. One effort in the scientification of jamu, aimed at revealing its mechanism of action, is to predict the protein targets of its active ingredients. In this study, target prediction for jamu compounds was carried out using a supervised learning approach with conventional medicinal compounds as training data. The method used is the nearest profile method, adapted from the nearest neighbor algorithm, and applied to drug compound data to construct a learning model. The AUC values measuring the performance of the three implemented models are 0.62 for the fixed-compound model, 0.78 for the fixed-target model, and 0.83 for the mixed model. The fixed-compound model was then used to build a prediction model on the herbal medicine data with an optimal threshold of 0.91. The model produced 10 potential compounds in the jamu formula and 44 unique protein targets. Despite its limitations in achieving high performance, the nearest profile method can be used to predict the targets of herbal compounds whose targets are not yet known.
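As a rough illustration of the idea behind a nearest-profile prediction (a sketch of one plausible reading, not the authors' implementation), the snippet below assumes each compound is described by a binary fingerprint and each training drug by a binary target profile; a jamu compound inherits the target profile of its most similar drug when the similarity exceeds a threshold such as the 0.91 reported above. All names (`predict_targets`, the toy data) are illustrative.

```python
import numpy as np

def jaccard_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Tanimoto/Jaccard similarity between two binary fingerprints."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def predict_targets(query_fp, drug_fps, drug_target_profiles, threshold=0.91):
    """Assign the target profile of the nearest training drug if it is similar enough.

    query_fp: binary fingerprint of the jamu compound
    drug_fps: (n_drugs, n_bits) binary fingerprints of training drugs
    drug_target_profiles: (n_drugs, n_proteins) binary drug-target matrix
    """
    sims = np.array([jaccard_similarity(query_fp, fp) for fp in drug_fps])
    best = int(sims.argmax())
    if sims[best] < threshold:
        return np.zeros(drug_target_profiles.shape[1], dtype=int)  # no confident prediction
    return drug_target_profiles[best]

# toy usage with random data
rng = np.random.default_rng(0)
drug_fps = rng.integers(0, 2, size=(50, 128))
drug_targets = rng.integers(0, 2, size=(50, 20))
query = drug_fps[3]  # a compound identical to drug 3
print(predict_targets(query, drug_fps, drug_targets).nonzero()[0])
```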

Author(s):  
Sumarlin Sumarlin ◽  
Dewi Anggraini

Data on graduates is an important part of determining the quality of private and public universities, and graduate data is included in the assessments used in the accreditation process. Graduate data at STIKOM Uyelindo Kupang grows and accumulates every year but is rarely used. To turn student data into information the university can use, the data must be processed; in this case it serves as training data in a data mining study to predict the graduation of STIKOM Uyelindo Kupang students. The method used in this study is K-Nearest Neighbor, with RapidMiner software used to measure its accuracy on the graduate data. The attributes used are the student's name, gender, and cumulative grade point average (GPA) from semesters 1 to 6. To measure the performance of the K-Nearest Neighbor algorithm, cross validation, a confusion matrix, and ROC curves were used; this study applied 5-fold cross validation to predict student graduation. From 100 records of STIKOM Uyelindo Kupang graduates, the accuracy reached 82%, and the model is classed as very good because its AUC value lies between 0.90 and 1.00, namely 0.971. It can be concluded that the accuracy of student graduation models using the K-Nearest Neighbor (K-NN) algorithm is influenced by the choice of k; the highest accuracy and AUC in the 5-fold validation were obtained at k = 4, with an accuracy of 90%.
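A minimal sketch of this kind of evaluation, using scikit-learn instead of RapidMiner. The CSV file name, column names, and a numerically coded binary target are assumptions; the sweep over k and the 5-fold cross validation mirror the setup described above.

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("graduates.csv")          # hypothetical file
X = df[["gender", "gpa1", "gpa2", "gpa3", "gpa4", "gpa5", "gpa6"]]
y = df["graduated_on_time"]                # assumed 0/1 label

for k in (1, 2, 3, 4, 5):
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"k={k}: accuracy={acc:.3f}, AUC={auc:.3f}")
```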


Mathematics ◽  
2021 ◽  
Vol 9 (8) ◽  
pp. 830
Author(s):  
Seokho Kang

k-nearest neighbor (kNN) is a widely used learning algorithm for supervised learning tasks. In practice, the main challenge when using kNN is its high sensitivity to its hyperparameter setting, including the number of nearest neighbors k, the distance function, and the weighting function. To improve the robustness to hyperparameters, this study presents a novel kNN learning method based on a graph neural network, named kNNGNN. Given training data, the method learns a task-specific kNN rule in an end-to-end fashion by means of a graph neural network that takes the kNN graph of an instance to predict the label of the instance. The distance and weighting functions are implicitly embedded within the graph neural network. For a query instance, the prediction is obtained by performing a kNN search from the training data to create a kNN graph and passing it through the graph neural network. The effectiveness of the proposed method is demonstrated using various benchmark datasets for classification and regression tasks.
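A highly simplified sketch of the kNNGNN idea: build the kNN graph of a query instance over the training set, then let a graph network aggregate the neighbors' features and labels to predict the query's label. The real method learns the network end-to-end; the fixed distance-weighted aggregation below is only a stand-in for the learned message passing, and all data are toy placeholders.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

k = 7
nn = NearestNeighbors(n_neighbors=k).fit(X_train)

def predict(query):
    # nodes of the query's kNN graph: the query plus its k training neighbors
    _, idx = nn.kneighbors(query.reshape(1, -1))
    neigh_feats = X_train[idx[0]]            # neighbor features
    neigh_labels = y_train[idx[0]]           # neighbor labels carried as messages
    # one aggregation step: weight neighbor labels by a softmax over negative
    # distances (stand-in for the learned distance/weighting functions)
    dists = np.linalg.norm(neigh_feats - query, axis=1)
    w = np.exp(-dists) / np.exp(-dists).sum()
    return int((w * neigh_labels).sum() > 0.5)

print(predict(np.array([1.0, 1.0, 0.0, 0.0, 0.0])))
```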


2013 ◽  
Vol 2013 ◽  
pp. 1-10
Author(s):  
Lei Luo ◽  
Chao Zhang ◽  
Yongrui Qin ◽  
Chunyuan Zhang

With the explosive growth of the data volume in modern applications such as web search and multimedia retrieval, hashing is becoming increasingly important for efficient nearest neighbor (similar item) search. Recently, a number of data-dependent methods have been developed, reflecting the great potential of learning for hashing. Inspired by the classic nonlinear dimensionality reduction algorithm—maximum variance unfolding, we propose a novel unsupervised hashing method, named maximum variance hashing, in this work. The idea is to maximize the total variance of the hash codes while preserving the local structure of the training data. To solve the derived optimization problem, we propose a column generation algorithm, which directly learns the binary-valued hash functions. We then extend it using anchor graphs to reduce the computational cost. Experiments on large-scale image datasets demonstrate that the proposed method outperforms state-of-the-art hashing methods in many cases.
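To make the objective concrete, here is a much simpler stand-in for variance-maximizing hashing (not the paper's column-generation method): project the data onto its top principal directions, where projected variance is maximal, and threshold each projection at its median to obtain balanced binary codes that can be compared by Hamming distance.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.datasets import load_digits

X = load_digits().data
n_bits = 16

pca = PCA(n_components=n_bits).fit(X)
proj = pca.transform(X)                       # directions of maximum variance
thresholds = np.median(proj, axis=0)
codes = (proj > thresholds).astype(np.uint8)  # one bit per direction

def hamming_search(query_code, database_codes, top=5):
    """Return indices of the items whose codes are closest in Hamming distance."""
    dists = (database_codes != query_code).sum(axis=1)
    return np.argsort(dists)[:top]

print(hamming_search(codes[0], codes))
```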


2021 ◽  
Vol 22 (10) ◽  
pp. 5118
Author(s):  
Matthieu Najm ◽  
Chloé-Agathe Azencott ◽  
Benoit Playe ◽  
Véronique Stoven

Identification of the protein targets of hit molecules is essential in the drug discovery process. Target prediction with machine learning algorithms can help accelerate this search by limiting the number of required experiments. However, the drug-target interaction databases used for training exhibit a strong statistical bias, leading to a high number of false positives and thus increasing the time and cost of experimental validation campaigns. To minimize the number of false positives among predicted targets, we propose a new scheme for choosing negative examples, so that each protein and each drug appears an equal number of times in positive and negative examples. We artificially reproduce the process of target identification for three specific drugs, and more globally for 200 approved drugs. For both the detailed three-drug examples and the larger set of 200 drugs, training with the proposed scheme for choosing negative examples improved target prediction: the average number of false positives among the top-ranked predicted targets decreased, and overall, the rank of the true targets improved. Our method corrects the databases' statistical bias and reduces the number of false positive predictions, and therefore the number of useless experiments potentially undertaken.
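A rough sketch of one way to realize the balanced negative-sampling scheme described above (an assumed interpretation, not the authors' code): draw unlabeled (drug, protein) pairs as negatives so that each drug and each protein appears among the negatives as many times as it appears among the positives. The greedy loop below may leave a small quota unfilled on unlucky shuffles; it is only meant to show the constraint.

```python
import random
from collections import Counter

positives = [("drugA", "P1"), ("drugA", "P2"), ("drugB", "P1"), ("drugC", "P3")]
drugs = ["drugA", "drugB", "drugC"]
proteins = ["P1", "P2", "P3", "P4"]

drug_quota = Counter(d for d, _ in positives)   # negatives still needed per drug
prot_quota = Counter(p for _, p in positives)   # negatives still needed per protein
positive_set = set(positives)
negatives = []

candidates = [(d, p) for d in drugs for p in proteins if (d, p) not in positive_set]
random.shuffle(candidates)
for d, p in candidates:
    if drug_quota[d] > 0 and prot_quota[p] > 0:
        negatives.append((d, p))
        drug_quota[d] -= 1
        prot_quota[p] -= 1

print(negatives)
```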


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Osval Antonio Montesinos-López ◽  
Abelardo Montesinos-López ◽  
Paulino Pérez-Rodríguez ◽  
José Alberto Barrón-López ◽  
Johannes W. R. Martini ◽  
...  

Abstract Background Several conventional genomic Bayesian (or non-Bayesian) prediction methods have been proposed, including the standard additive genetic effect model for which the variance components are estimated with mixed model equations. In recent years, deep learning (DL) methods have been considered in the context of genomic prediction. DL methods are nonparametric models that provide the flexibility to adapt to complicated associations between data and output and the ability to capture very complex patterns. Main body We review the applications of deep learning (DL) methods in genomic selection (GS) to obtain a meta-picture of GS performance and highlight how these tools can help solve challenging plant breeding problems. We also provide general guidance for the effective use of DL methods, including the fundamentals of DL and the requirements for its appropriate use. We discuss the pros and cons of this technique compared to traditional genomic prediction approaches, as well as current trends in DL applications. Conclusions The main requirement for using DL is a sufficiently large, high-quality training data set. Based on the current literature on GS in plant and animal breeding, we did not find clear superiority of DL in terms of prediction power compared to conventional genome-based prediction models. Nevertheless, there is clear evidence that DL algorithms capture nonlinear patterns more efficiently than conventional genome-based models. DL algorithms are able to integrate data from different sources, as is usually needed in GS-assisted breeding, and show the ability to improve prediction accuracy for large plant breeding data sets. It is important to apply DL to large training-testing data sets.


Machine learning is empowering many aspects of day-to-day life, from filtering the content on social networks to suggesting products that we may be looking for. This technology focuses on taking objects, such as images, as input to find new observations or show items based on user interest. The major discussion here is of machine learning techniques using supervised learning, where the computer learns from input/training data and predicts results based on experience. We also discuss the machine learning algorithms Naïve Bayes, K-Nearest Neighbor, Random Forest, Decision Trees, Boosted Trees, and Support Vector Machine, and use these classifiers on the Malgenome and Drebin datasets, which are Android malware datasets. Android is an operating system that is gaining popularity these days, and with the rise in demand for these devices comes a rise in Android malware. The traditional techniques used to detect malware were unable to detect unknown applications. We have run these datasets on different machine learning classifiers and recorded the results. The experimental results provide a comparative analysis based on performance, accuracy, and cost.
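A minimal sketch of the classifier comparison described above, assuming the Drebin/Malgenome feature matrix and labels have already been prepared; the random placeholder data below only stands in for real binary permission/API feature vectors.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 50))      # placeholder for real feature vectors
y = rng.integers(0, 2, size=500)            # placeholder malware/benign labels

classifiers = {
    "Naive Bayes": BernoulliNB(),
    "K-Nearest Neighbor": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
    "Boosted Trees": GradientBoostingClassifier(),
    "SVM": SVC(),
}
for name, clf in classifiers.items():
    acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {acc:.3f}")
```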


Kursor ◽  
2018 ◽  
Vol 9 (3) ◽  
Author(s):  
Candra Dewi ◽  
Muhammad Sa’idul Umam ◽  
Imam Cholissodin

Disease of the soybean crop is one of the obstacles to increasing soybean production in Indonesia. Some of these diseases are usually found on the leaves and cause the crop to become unhealthy. This study aims to identify disease on soybean leaves from leaf images by applying the Learning Vector Quantization (LVQ) algorithm. The identification begins with preprocessing using a modified Otsu method to segment the diseased parts of the leaves with a certain threshold value. The next process is to identify the type of disease using LVQ. This process uses the minimum, maximum, and average values of the red, green, and blue color channels of the image. The testing conducted in this study identifies two diseases, Peronospora manshurica (downy mildew) and Phakopsora pachyrhizi (rust). Testing with 60 training data and all recommended parameter values obtained a highest identification accuracy of 95%, with a more stable accuracy of 90%. This result shows that the method performs quite well in identifying the two mentioned diseases.
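A rough sketch of this pipeline under stated assumptions: plain Otsu thresholding from scikit-image stands in for the paper's modified Otsu method, the diseased spots are assumed darker than the rest of the leaf, and the tiny LVQ1 routine with one prototype per class is only an illustration of the classification step.

```python
import numpy as np
from skimage import io, color, filters

def leaf_features(path):
    """Min, max and mean of R, G, B over the Otsu-segmented (assumed darker) region."""
    rgb = io.imread(path)
    gray = color.rgb2gray(rgb)
    mask = gray < filters.threshold_otsu(gray)
    pixels = rgb[mask].astype(float)                 # (n_pixels, 3) RGB values
    return np.concatenate([pixels.min(0), pixels.max(0), pixels.mean(0)])

def train_lvq1(X, y, n_epochs=30, lr=0.05):
    """Minimal LVQ1: prototypes pulled toward correct samples, pushed from wrong ones."""
    classes = np.unique(y)
    protos = np.array([X[y == c].mean(axis=0) for c in classes])
    for _ in range(n_epochs):
        for xi, yi in zip(X, y):
            j = np.argmin(np.linalg.norm(protos - xi, axis=1))
            sign = 1.0 if classes[j] == yi else -1.0
            protos[j] += sign * lr * (xi - protos[j])
    return classes, protos

def predict_lvq(x, classes, protos):
    return classes[np.argmin(np.linalg.norm(protos - x, axis=1))]
```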


2021 ◽  
Vol 87 (6) ◽  
pp. 445-455
Author(s):  
Yi Ma ◽  
Zezhong Zheng ◽  
Yutang Ma ◽  
Mingcang Zhu ◽  
Ran Huang ◽  
...  

Many manifold learning algorithms conduct an eigenvector analysis on a data-similarity matrix of size N×N, where N is the number of data points. Thus, the memory complexity of the analysis is no less than O(N²). We present in this article an incremental manifold learning approach to handle large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained with the training data set. A local curvature variation algorithm is utilized to sample a subset of data points as landmarks. Then a manifold skeleton is identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k-nearest-neighbor classifier and achieving the second best performance with a support vector machine.
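A simplified landmark-based sketch of this approach: sample a subset of points as landmarks (here uniformly at random rather than with the authors' local-curvature criterion), learn a low-dimensional embedding on the landmarks only so the N×N similarity matrix is never formed, then map and classify the remaining pixels. All data below are placeholders.

```python
import numpy as np
from sklearn.manifold import Isomap
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 100))                # placeholder hyperspectral pixels
y = rng.integers(0, 4, size=5000)               # placeholder land-use labels

landmarks = rng.choice(len(X), size=500, replace=False)
embedder = Isomap(n_neighbors=10, n_components=10).fit(X[landmarks])

Z = embedder.transform(X)                       # embed all pixels via the landmark model
clf = KNeighborsClassifier(n_neighbors=5).fit(Z[landmarks], y[landmarks])
print(clf.score(Z, y))
```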


2020 ◽  
Vol 202 ◽  
pp. 16005
Author(s):  
Chashif Syadzali ◽  
Suryono Suryono ◽  
Jatmiko Endro Suseno

Customer behavior classification can be useful to assist companies in conducting business intelligence analysis. Data mining techniques can classify customer behavior using the K-Nearest Neighbor algorithm based on the customer life cycle, consisting of prospect, responder, active, and former. The data used for classification include age, gender, number of donations, donation retention, and number of user visits. The calculation results from 2,114 records give the share of each customer category as active 1.18%, prospect 8.99%, responder 4.26%, and former 85.57%. Evaluating the system accuracy over a range of K from K = 1 to K = 20 shows that the highest accuracy is 94.3731%, at K = 4. The classification of user behavior produced from the training data can be used in business intelligence analysis, which is useful for companies in determining business strategies by knowing the optimal market target.
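A minimal sketch of the K sweep described above. The file name, column names, and numeric coding of the features are assumptions; the target is the customer life-cycle stage.

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("customers.csv")               # hypothetical file
X = df[["age", "gender", "n_donations", "donation_retention", "n_visits"]]
y = df["life_cycle"]                            # prospect / responder / active / former

scores = {}
for k in range(1, 21):
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(f"best K = {best_k}, accuracy = {scores[best_k]:.4f}")
```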


Diagnostics ◽  
2019 ◽  
Vol 9 (3) ◽  
pp. 104 ◽  
Author(s):  
Ahmed ◽  
Yigit ◽  
Isik ◽  
Alpkocak

Leukemia is a fatal cancer and has two main types: acute and chronic. Each type has two subtypes: lymphoid and myeloid. Hence, in total, there are four subtypes of leukemia. This study proposes a new approach for diagnosis of all subtypes of leukemia from microscopic blood cell images using convolutional neural networks (CNN), which require a large training data set. Therefore, we also investigated the effects of data augmentation for synthetically increasing the number of training samples. We used two publicly available leukemia data sources: ALL-IDB and the ASH Image Bank. Next, we applied seven different image transformation techniques as data augmentation. We designed a CNN architecture capable of recognizing all subtypes of leukemia. In addition, we explored other well-known machine learning algorithms such as naive Bayes, support vector machine, k-nearest neighbor, and decision tree. To evaluate our approach, we set up a set of experiments and used 5-fold cross-validation. The experimental results showed that our CNN model achieves 88.25% and 81.74% accuracy in leukemia versus healthy classification and in multiclass classification of all subtypes, respectively. Finally, we also showed that the CNN model performs better than the other well-known machine learning algorithms.
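A minimal sketch of a CNN with image augmentation of the general kind described above. The architecture, input size, and the specific augmentations are assumptions, not the paper's exact design or its seven transformation techniques.

```python
import tensorflow as tf
from tensorflow.keras import layers

num_classes = 5  # four leukemia subtypes plus healthy (assumed label scheme)

augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
])

model = tf.keras.Sequential([
    layers.Input(shape=(128, 128, 3)),
    augment,
    layers.Rescaling(1.0 / 255),
    layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```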

