scholarly journals Adaptive Sparse Representation of Continuous Input for Tsetlin Machines Based on Stochastic Searching on the Line

Electronics ◽  
2021 ◽  
Vol 10 (17) ◽  
pp. 2107
Author(s):  
Kuruge Darshana Abeyrathna ◽  
Ole-Christoffer Granmo ◽  
Morten Goodwin

This paper introduces a novel approach to representing continuous inputs in Tsetlin Machines (TMs). Instead of using one Tsetlin Automaton (TA) for every unique threshold found when Booleanizing continuous input, we employ two Stochastic Searching on the Line (SSL) automata to learn discriminative lower and upper bounds. The two resulting Boolean features are adapted to the rest of the clause by equipping each clause with its own team of SSLs, which update the bounds during the learning process. Two standard TAs finally decide whether to include the resulting features as part of the clause. In this way, only four automata altogether represent one continuous feature (instead of potentially hundreds of them). We evaluate the performance of the new scheme empirically using five datasets, along with a study of interpretability. On average, TMs with SSL feature representation use 4.3 times fewer literals than the TM with static threshold-based features. Furthermore, in terms of average memory usage and F1-Score, our approach outperforms simple Multi-Layered Artificial Neural Networks, Decision Trees, Support Vector Machines, K-Nearest Neighbor, Random Forest, Gradient Boosted Trees (XGBoost), and Explainable Boosting Machines (EBMs), as well as the standard and real-value weighted TMs. Our approach further outperforms Neural Additive Models on Fraud Detection and StructureBoost on CA-58 in terms of the Area Under Curve while performing competitively on COMPAS.

2018 ◽  
Vol 35 (16) ◽  
pp. 2757-2765 ◽  
Author(s):  
Balachandran Manavalan ◽  
Shaherin Basith ◽  
Tae Hwan Shin ◽  
Leyi Wei ◽  
Gwang Lee

AbstractMotivationCardiovascular disease is the primary cause of death globally accounting for approximately 17.7 million deaths per year. One of the stakes linked with cardiovascular diseases and other complications is hypertension. Naturally derived bioactive peptides with antihypertensive activities serve as promising alternatives to pharmaceutical drugs. So far, there is no comprehensive analysis, assessment of diverse features and implementation of various machine-learning (ML) algorithms applied for antihypertensive peptide (AHTP) model construction.ResultsIn this study, we utilized six different ML algorithms, namely, Adaboost, extremely randomized tree (ERT), gradient boosting (GB), k-nearest neighbor, random forest (RF) and support vector machine (SVM) using 51 feature descriptors derived from eight different feature encodings for the prediction of AHTPs. While ERT-based trained models performed consistently better than other algorithms regardless of various feature descriptors, we treated them as baseline predictors, whose predicted probability of AHTPs was further used as input features separately for four different ML-algorithms (ERT, GB, RF and SVM) and developed their corresponding meta-predictors using a two-step feature selection protocol. Subsequently, the integration of four meta-predictors through an ensemble learning approach improved the balanced prediction performance and model robustness on the independent dataset. Upon comparison with existing methods, mAHTPred showed superior performance with an overall improvement of approximately 6–7% in both benchmarking and independent datasets.Availability and implementationThe user-friendly online prediction tool, mAHTPred is freely accessible at http://thegleelab.org/mAHTPred.Supplementary informationSupplementary data are available at Bioinformatics online.


2015 ◽  
Vol 11 (A29A) ◽  
pp. 209-209
Author(s):  
Bo Han ◽  
Hongpeng Ding ◽  
Yanxia Zhang ◽  
Yongheng Zhao

AbstractCatastrophic failure is an unsolved problem existing in the most photometric redshift estimation approaches for a long history. In this study, we propose a novel approach by integration of k-nearest-neighbor (KNN) and support vector machine (SVM) methods together. Experiments based on the quasar sample from SDSS show that the fusion approach can significantly mitigate catastrophic failure and improve the accuracy of photometric redshift estimation.


Author(s):  
SHITALA PRASAD ◽  
GYANENDRA K. VERMA ◽  
BHUPESH KUMAR SINGH ◽  
PIYUSH KUMAR

This paper, proposes a novel approach for feature extraction based on the segmentation and morphological alteration of handwritten multi-lingual characters. We explored multi-resolution and multi-directional transforms such as wavelet, curvelet and ridgelet transform to extract classifying features of handwritten multi-lingual images. Evaluating the pros and cons of each multi-resolution algorithm has been discussed and resolved that Curvelet-based features extraction is most promising for multi-lingual character recognition. We have also applied some morphological operation such as thinning and thickening then feature level fusion is performed in order to create robust feature vector for classification. The classification is performed with K-nearest neighbor (K-NN) and support vector machine (SVM) classifier with their relative performance. We experiment with our in-house dataset, compiled in our lab by more than 50 personnel.


2019 ◽  
Vol 2019 ◽  
pp. 1-7
Author(s):  
Pengliang Chen ◽  
Pengwei Shi ◽  
Gang Du ◽  
Zhen Zhang ◽  
Liang Liu

Predicting the outcome after a cancer diagnosis is critical. Advances in high-throughput sequencing technologies provide physicians with vast amounts of data, yet prognostication remains challenging because the data are greatly dimensional and complex. We evaluated Wnt/β-catenin, carbohydrate metabolism, and PI3K-Akt signaling pathway-related genes as predictive features for classifying tumors and normal samples. Using differentially expressed genes as controls, these pathway-related genes were assessed for accuracy using support-vector machines and three other recommended machine learning models, namely, the random forest, decision tree, and k-nearest neighbor algorithms. The first two outperformed the others. All candidate pathway-related genes yielded areas under the curve exceeding 95.00% for cancer outcomes, and they were most accurate in predicting colorectal cancer. These results suggest that these pathway-related genes are useful and accurate biomarkers for understanding the mechanisms behind cancer development.


2021 ◽  
pp. 179-218
Author(s):  
Magy Seif El-Nasr ◽  
Truong Huy Nguyen Dinh ◽  
Alessandro Canossa ◽  
Anders Drachen

This chapter discusses several classification and regression methods that can be used with game data. Specifically, we will discuss regression methods, including Linear Regression, and classification methods, including K-Nearest Neighbor, Naïve Bayes, Logistic Regression, Linear Discriminant Analysis, Support Vector Machines, Decisions Trees, and Random Forests. We will discuss how you can setup the data to apply these algorithms, as well as how you can interpret the results and the pros and cons for each of the methods discussed. We will conclude the chapter with some remarks on the process of application of these methods to games and the expected outcomes. The chapter also includes practical labs to walk you through the process of applying these methods to real game data.


2020 ◽  
Vol 38 (4) ◽  
pp. 1073-1082
Author(s):  
Florença das Graças MOURA ◽  
Álvaro Xavier FERREIRA ◽  
Tati ALMEIDA ◽  
Jérémie GARNIER ◽  
Rejane Ennes CICERELLI ◽  
...  

O lago Poópo é o segundo maior lago da Bolívia e atualmente vem passando por uma forte crise hídrica que alguns autores associam diretamente a mudança de ocupação da terra. Neste trabalho foi realizada a classificação do uso e ocupação do solo na sub-bacia P6 do lago entre os anos de 1985 e 2017. Foi analisado o desempenho dos classificadores SVM (Support Vector Machines), KNN (K-Nearest Neighbor) e MaxVer (Máxima Verossimilhança). A classificação que obteve melhor acurácia foi a gerada pelo classificador SVM, em que o valor do índice Kappa foi de 82,28% e 83,7% para as imagens Landat-5 e Landsat-8, respectivamente, e a exatidão global foi de 92% para ambas as imagens. A partir das classificações geradas foi verificado que as maiores alterações se deram nas classes de vegetação nativa, agricultura e área úmida. A perda de área úmida na sub-bacia vem ocorrendo desde 1995, 15 anos antes do aumento da atividade agrícola, que começou a partir de 2010. Assim, diversos são os fatores que podem estar contribuindo com essa redução acelerada dos corpos de água, como variações climáticas locais e as atividades antrópicas que interferem no ciclo hidrológico de forma regional.


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Jie Pan ◽  
Li-Ping Li ◽  
Chang-Qing Yu ◽  
Zhu-Hong You ◽  
Zhong-Hao Ren ◽  
...  

Protein-protein interactions (PPIs) in plants are crucial for understanding biological processes. Although high-throughput techniques produced valuable information to identify PPIs in plants, they are usually expensive, inefficient, and extremely time-consuming. Hence, there is an urgent need to develop novel computational methods to predict PPIs in plants. In this article, we proposed a novel approach to predict PPIs in plants only using the information of protein sequences. Specifically, plants’ protein sequences are first converted as position-specific scoring matrix (PSSM); then, the fast Walsh–Hadamard transform (FWHT) algorithm is used to extract feature vectors from PSSM to obtain evolutionary information of plant proteins. Lastly, the rotation forest (RF) classifier is trained for prediction and produced a series of evaluation results. In this work, we named this approach FWHT-RF because FWHT and RF are used for feature extraction and classification, respectively. When applying FWHT-RF on three plants’ PPI datasets Maize, Rice, and Arabidopsis thaliana (Arabidopsis), the average accuracies of FWHT-RF using 5-fold cross validation were achieved as high as 95.20%, 94.42%, and 83.85%, respectively. To further evaluate the predictive power of FWHT-RF, we compared it with the state-of-art support vector machine (SVM) and K-nearest neighbor (KNN) classifier in different aspects. The experimental results demonstrated that FWHT-RF can be a useful supplementary method to predict potential PPIs in plants.


2020 ◽  
Vol 10 (2) ◽  
pp. 127-142
Author(s):  
An-Vinh Luong ◽  
Diep Nguyen ◽  
Dien Dinh

The readability of the text plays a very important role in selecting appropriate materials for the level of the reader. Text readability in Vietnamese language has received a lot of attention in recent years, however, studies have mainly been limited to simple statistics at the level of a sentence length, word length, etc. In this article, we investigate the role of word-level grammatical characteristics in assessing the difficulty of texts in Vietnamese textbooks. We have used machine learning models (for instance, Decision Tree, K-nearest neighbor, Support Vector Machines, etc.) to evaluate the accuracy of classifying texts according to readability, using grammatical features in word level along with other statistical characteristics. Empirical results show that the presence of POS-level characteristics increases the accuracy of the classification by 2-4%.


Sign in / Sign up

Export Citation Format

Share Document