Analysis and Identification of Aptamer-Compound Interactions with a Maximum Relevance Minimum Redundancy and Nearest Neighbor Algorithm

2016 ◽  
Vol 2016 ◽  
pp. 1-9 ◽  
Author(s):  
ShaoPeng Wang ◽  
Yu-Hang Zhang ◽  
Jing Lu ◽  
Weiren Cui ◽  
Jerry Hu ◽  
...  

The development of biochemistry and molecular biology has revealed an increasingly important role of compounds in several biological processes. Like the aptamer-protein interaction, the aptamer-compound interaction is attracting increasing attention. However, selecting proper aptamers against compounds with traditional methods, such as exponential enrichment, is time-consuming. Thus, there is an urgent need to design effective computational methods for finding effective aptamers against compounds. This study attempted to extract important features for aptamer-compound interactions using feature selection methods, namely Maximum Relevance Minimum Redundancy and incremental feature selection. Each aptamer-compound pair was represented by properties derived from the aptamer and the compound, including frequencies of single nucleotides and dinucleotides for the aptamer, as well as the constitutional, electrostatic, quantum-chemical, and space conformational descriptors of the compound. As a result, some important features were obtained. To confirm the importance of the obtained features, we further discussed the associations between them and aptamer-compound interactions. In addition, an optimal prediction model based on the nearest neighbor algorithm was built to identify aptamer-compound interactions, which has the potential to be a useful tool for the identification of novel aptamer-compound interactions. The program is available upon request.
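The aptamer-side features named in the abstract above (frequencies of single nucleotides and dinucleotides) can be illustrated with a minimal sketch. The function name `composition_features`, the DNA alphabet `ACGT`, and the use of overlapping dinucleotide windows are assumptions made for illustration; the abstract does not state how the authors computed these frequencies.

```python
from itertools import product

# Assumption: a DNA alphabet; an RNA aptamer would use "ACGU" instead.
NUCLEOTIDES = "ACGT"


def composition_features(sequence):
    """Return single-nucleotide and dinucleotide frequencies of a sequence."""
    seq = sequence.upper()
    features = {}

    # Single-nucleotide frequency: count divided by sequence length.
    for n in NUCLEOTIDES:
        features[n] = seq.count(n) / len(seq) if seq else 0.0

    # Dinucleotide frequency over overlapping windows of length 2
    # (an illustrative choice, not confirmed by the source).
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    for a, b in product(NUCLEOTIDES, repeat=2):
        di = a + b
        features[di] = pairs.count(di) / len(pairs) if pairs else 0.0

    return features


example = composition_features("ACGTAC")
```

This yields 4 + 16 = 20 values per aptamer, which would then be concatenated with the compound descriptors before feature selection.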

2020 ◽  
Vol 37 (4) ◽  
pp. 563-569
Author(s):  
Dželila Mehanović ◽  
Jasmin Kevrić

Security is one of the most topical issues in the online world, and lists of security threats are constantly updated. One of those threats is phishing websites. In this work, we address the problem of phishing website classification. Three classifiers were used, K-Nearest Neighbor, Decision Tree, and Random Forest, together with the feature selection methods from Weka. The achieved accuracy was 100%, and the number of features was reduced to seven. Moreover, reducing the number of features also reduced the time needed to build the models. The build time for Random Forest decreased from the initial 2.88 s and 3.05 s for percentage split and 10-fold cross-validation to 0.02 s and 0.16 s, respectively.


2010 ◽  
Vol 81 (6) ◽  
pp. 574-584 ◽  
Author(s):  
Yong Yu ◽  
Chi Leung Patrick Hui ◽  
Tsan-Ming Choi ◽  
Sau Fun Frency Ng

2020 ◽  
Vol 4 (2) ◽  
pp. 39-47
Author(s):  
Junta Zeniarja ◽  
Anisatawalanita Ukhifahdhina ◽  
Abu Salam

The heart is one of the essential organs and plays a significant part in the human body. However, diseases of the heart can lead to death. World Health Organization (WHO) data from 2012 showed that, of all deaths from cardiovascular disease, 7.4 million (42.3%) were caused by heart disease. The rising number of heart disease cases calls for early prevention efforts through early diagnosis. In this research, early diagnosis of heart disease is carried out using a data mining process in the form of classification. The algorithm used is the K-Nearest Neighbor algorithm with the Forward Selection method. The K-Nearest Neighbor algorithm is used for classification to obtain a decision from the diagnosis of heart disease, while forward selection is used for feature selection, with the aim of increasing accuracy. Forward selection works by leaving out attributes that are irrelevant to the classification process. In this research, the accuracy of heart disease diagnosis with the K-Nearest Neighbor algorithm alone is 73.44%, while the accuracy of the K-Nearest Neighbor algorithm with the feature selection method is 78.66%. Combining the K-Nearest Neighbor algorithm with the forward selection method therefore improved accuracy. Keywords: K-Nearest Neighbor, Classification, Heart Disease, Forward Selection, Data Mining
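As a rough illustration of pairing K-Nearest Neighbor with forward selection, the sketch below greedily adds one feature at a time and keeps it only if leave-one-out accuracy improves. The helper names, the Euclidean distance, the leave-one-out scoring, and the toy data are all assumptions; the abstract does not describe the authors' tooling or validation scheme.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k, cols):
    """Majority vote among the k nearest training rows, using only `cols`."""
    dists = sorted(
        (math.dist([row[c] for c in cols], [x[c] for c in cols]), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k, cols):
    """Leave-one-out accuracy of k-NN restricted to the feature set `cols`."""
    hits = 0
    for i in range(len(X)):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k, cols) == y[i]:
            hits += 1
    return hits / len(X)


def forward_selection(X, y, k=3):
    """Greedily add the feature that most improves accuracy; stop when none does."""
    remaining = list(range(len(X[0])))
    selected, best = [], 0.0
    while remaining:
        scored = [(loo_accuracy(X, y, k, selected + [c]), c) for c in remaining]
        score, col = max(scored, key=lambda t: t[0])
        if score <= best:
            break
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected, best


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[1.0, 5.0], [1.2, 1.0], [0.9, 9.0], [1.1, 3.0],
     [3.0, 4.0], [3.2, 8.0], [2.9, 2.0], [3.1, 6.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
selected, best = forward_selection(X, y, k=3)
```

On this toy data the procedure keeps only the informative feature, which mirrors the abstract's point that irrelevant attributes end up excluded from the final model.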


2017 ◽  
Vol 18 (1) ◽  
pp. 15 ◽  
Author(s):  
Yuri Elias Rodrigues ◽  
Evandro Manica ◽  
Eduardo Rigon Zimmer ◽  
Tharick Ali Pascoal ◽  
Sulantha Sanjeewa Mathotaarachchi ◽  
...  

Biomarkers are characteristics that are objectively measured and evaluated as indicators of normal biological processes, pathogenic processes, or pharmacological responses to a therapeutic intervention. The combination of different biomarker modalities often allows an accurate diagnostic classification. In Alzheimer's disease (AD), biomarkers are indispensable for identifying cognitively normal individuals destined to develop dementia symptoms. However, using the combination of canonical AD biomarkers, studies have repeatedly shown poor classification rates when differentiating between AD, mild cognitive impairment, and control individuals. Furthermore, the design of classifiers to assess multiple biomarker combinations involves issues such as imbalanced classes and missing data. Since the number of biomarker combinations is large, wrappers are used to avoid multiple comparisons. Here, we compare the ability of three wrapper feature selection methods to obtain biomarker combinations that maximize classification rates. As the criterion for wrapper feature selection, we use the k-nearest neighbor classifier with balance aids, random undersampling and SMOTE. Overall, our analyses showed how biomarker combinations affect classifier accuracy and how imbalance strategies improve it. We show that non-defining and non-cognitive biomarkers have lower accuracy than cognitive measures when classifying AD. Our approach surpasses, on average, the support vector machine and the weighted k-nearest neighbors classifiers and reaches 94.34 ± 3.91% accuracy in reproducing class definitions.
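Of the two balance aids named above, random undersampling is the simpler one: every class is cut down to the size of the smallest class by sampling without replacement. The sketch below is a generic illustration under that definition; the function name, the fixed seed, and the toy labels are hypothetical, and it does not reproduce the authors' pipeline or SMOTE.

```python
import random
from collections import Counter, defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class has as many as the smallest class."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable

    # Group samples by class label.
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # Size of the minority class sets the target for all classes.
    n_min = min(len(group) for group in by_class.values())

    out_samples, out_labels = [], []
    for label, group in by_class.items():
        for sample in rng.sample(group, n_min):  # sampling without replacement
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical imbalanced toy set: 6 controls, 3 MCI, 2 AD.
samples = list(range(11))
labels = ["control"] * 6 + ["MCI"] * 3 + ["AD"] * 2
balanced_samples, balanced_labels = random_undersample(samples, labels)
class_counts = Counter(balanced_labels)
```

Undersampling discards data from the larger classes, which is one reason it is often compared against oversampling approaches such as SMOTE.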


2020 ◽  
Vol 9 (1) ◽  
pp. 1560-1568

Feature selection is a dimension reduction method that selects a subset of appropriate features from the original features by removing unnecessary and redundant features that bring no benefit to classification or prediction. In this paper, feature selection was conducted using three kinds of methods, namely filter-based, wrapper-based, and embedded, to predict household food insecurity from household income, consumption, and expenditure (HICE) survey data. To implement these feature selection methods, we proposed a new hybrid method that integrates the filter-based feature selection methods, namely feature importance, univariate (chi-square), and correlation coefficient. To validate the efficiency of the proposed feature selection methods, we used five classification algorithms, namely K-Nearest Neighbor (KNN), Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Naive Bayes (NB).


2019 ◽  
Vol 21 (4) ◽  
pp. 1378-1390 ◽  
Author(s):  
Jing Tang ◽  
Yunxia Wang ◽  
Jianbo Fu ◽  
Ying Zhou ◽  
Yongchao Luo ◽  
...  

Microbial communities (MCs) have a great impact on mediating complex disease indications, biogeochemical cycling, and agricultural productivity, which makes metaproteomics a powerful technique for quantifying the diverse and dynamic composition of proteins or peptides. The key role of biostatistical strategies in MC studies is reported to be underestimated; in particular, the appropriate application of feature selection methods (FSMs) is largely ignored. Although extensive efforts have been devoted to assessing the performance of FSMs, previous studies focused only on their classification accuracy without considering their ability to correctly and comprehensively identify the spiked proteins. In this study, the performance of 14 FSMs was comprehensively assessed based on two key criteria (sample classification and spiked protein discovery) using a variety of metaproteomics benchmarks. First, the classification accuracies of the 14 FSMs were evaluated. Then, their ability to identify proteins at different spiked concentrations was assessed. Finally, seven FSMs (FC, LMEB, OPLS-DA, PLS-DA, SAM, SVM-RFE and T-Test) were identified as performing consistently superior or good under both criteria, with PLS-DA performing consistently superior. In summary, this study serves as a comprehensive analysis of the performance of current FSMs and could provide a valuable guideline for researchers in metaproteomics.

