Global sequence properties for superfamily prediction: a machine learning approach

SummaryFunctional annotation of a protein sequence in the absence of experimental data or clear similarity to a sequence of known function is difficult. In this study, a simple set of sequence attributes based on physicochemical and predicted structural characteristics were used as input to machine learning methods. In order to improve performance through increasing the data available for training, a technique of sequence enrichment was explored. These methods were used to predict membership to 24 and 49 large and diverse protein superfamiles from the SCOP database.We found the best performance was obtained using an enriched training dataset. Accuracies of 66.3% and 55.6% were achieved on datasets comprising 24 and 49 superfamilies with LibSVM and AdaBoostM1 respectively.The methods used here confirm that domains within superfamilies share global sequence properties. We show machine learning models used to predict categories within the SCOP database can be significantly improved via a simple sequence enrichment step. These approaches can be used to complement profile methods for detecting distant relationships where function is difficult to infer.

Download Full-text

How can machine-learning methods assist in virtual screening for hyperuricemia? A healthcare machine-learning approach

Journal of Biomedical Informatics ◽

10.1016/j.jbi.2016.09.012 ◽

2016 ◽

Vol 64 ◽

pp. 20-24 ◽

Cited By ~ 26

Author(s):

Daisuke Ichikawa ◽

Toki Saito ◽

Waka Ujita ◽

Hiroshi Oyama

Keyword(s):

Machine Learning ◽

Virtual Screening ◽

Learning Approach ◽

Learning Methods ◽

Machine Learning Methods ◽

Machine Learning Approach

Download Full-text

To switch or not to switch – a machine learning approach for ferroelectricity

Nanoscale Advances ◽

10.1039/c9na00731h ◽

2020 ◽

Vol 2 (5) ◽

pp. 2063-2072 ◽

Cited By ~ 1

Author(s):

Sabine M. Neumayer ◽

Stephen Jesse ◽

Gabriel Velarde ◽

Andrei L. Kholkin ◽

Ivan Kravchenko ◽

...

Keyword(s):

Machine Learning ◽

Measured Signal ◽

Learning Approach ◽

Two Dimensional ◽

Dimensional Representation ◽

Machine Learning Methods ◽

Machine Learning Approach ◽

Signal Dependence ◽

Two Parameter

The introduced two-dimensional representation of two-parameter signal dependence allows for clear interpretation and classification of the measured signal upon using machine learning methods.

Download Full-text

Classification of different skarn deposits based on the compositional variability of associated grandite garnets: a data science and Machine Learning approach

10.5194/egusphere-egu21-10537 ◽

2021 ◽

Author(s):

Urmi Ghosh ◽

Tuhin Chakraborty

Keyword(s):

Machine Learning ◽

Trace Element ◽

Data Science ◽

Training Dataset ◽

Support Vector ◽

Learning Approach ◽

Machine Learning Approach ◽

Skarn Deposits

<p>Rapid technological improvements made in in-situ analysis techniques, including LA-ICPMS, have transformed the field of analytical geochemistry. This has a far-reaching impact for different petrogenetic and ore-genetic studies where minute major and trace element compositional changes between different mineral zones within a single crystal can now be demarcated. Minerals such as garnet although robust are quite sensitive to the changing P-T and fluid conditions during their formation. These minerals have become powerful tools to characterize mineralization types. Previously, Meinert (1992) has used in-situ major element EPMA analysis results to classify different skarn deposit based on the end-member composition of hydrothermal garnets. Alternatively, Tian et al. (2019) used the garnet trace element composition for the similar purpose. However, these discrimination plots/ classification schemes show major overlap in different skarn deposits, such as Fe, Cu, Zn, and Au. The present study is an attempt to use machine learning approach on available garnet data to found a more potent classification scheme for skarn deposits, thus reaffirming garnet as a faithful indicator for hydrothermal ore deposits. We have meticulously collected major and trace element data of Ca-rich garnets, associated with different skarn deposits worldwide from 40 publications. This collected data is then used to train a model for fingerprinting the skarn deposits. Stratified random sampling method has been used on the dataset with 80% of the samples as test set and the rest 20 % as training dataset. We have used K-nearest neighbour (KNN), Support Vector Machine (SVM) and Random Forest algorithms on the data by using Python as a platform. These ML classification algorithm performs better than the earlier existing models available for classification of ore types based on garnet composition in skarn system. Factor importance is calculated that shows which elements play a pivotal role in classification of the ore type. Our results depict that multiple garnet forming elements taken together can reliably be used to discriminate between different ore formation settings.</p>

Download Full-text

Machine Learning Approach to Dysphonia Detection

Applied Sciences ◽

10.3390/app8101927 ◽

2018 ◽

Vol 8 (10) ◽

pp. 1927 ◽

Cited By ~ 1

Author(s):

Zuzana Dankovičová ◽

Dávid Sovák ◽

Peter Drotár ◽

Liberios Vokorokos

Keyword(s):

Machine Learning ◽

State Of The Art ◽

Nearest Neighbors ◽

Classification Model ◽

Support Vector ◽

Learning Approach ◽

K Nearest Neighbors ◽

Machine Learning Methods ◽

Machine Learning Approach ◽

Speech Features

This paper addresses the processing of speech data and their utilization in a decision support system. The main aim of this work is to utilize machine learning methods to recognize pathological speech, particularly dysphonia. We extracted 1560 speech features and used these to train the classification model. As classifiers, three state-of-the-art methods were used: K-nearest neighbors, random forests, and support vector machine. We analyzed the performance of classifiers with and without gender taken into account. The experimental results showed that it is possible to recognize pathological speech with as high as a 91.3% classification accuracy.

Download Full-text

Classifying Non-Sentential Utterances in Dialogue: A Machine Learning Approach

Computational Linguistics ◽

10.1162/coli.2007.33.3.397 ◽

2007 ◽

Vol 33 (3) ◽

pp. 397-427 ◽

Cited By ~ 17

Author(s):

Raquel Fernández ◽

Jonathan Ginzburg ◽

Shalom Lappin

Keyword(s):

Machine Learning ◽

Pilot Study ◽

Full Range ◽

Learning Approach ◽

Learning Methods ◽

Fine Grained ◽

Machine Learning Methods ◽

Machine Learning Approach ◽

The Right

In this article we use well-known machine learning methods to tackle a novel task, namely the classification of non-sentential utterances (NSUs) in dialogue. We introduce a fine-grained taxonomy of NSU classes based on corpus work, and then report on the results of several machine learning experiments. First, we present a pilot study focused on one of the NSU classes in the taxonomy—bare wh-phrases or “sluices”—and explore the task of disambiguating between the different readings that sluices can convey. We then extend the approach to classify the full range of NSU classes, obtaining results of around an 87% weighted F-score. Thus our experiments show that, for the taxonomy adopted, the task of identifying the right NSU class can be successfully learned, and hence provide a very encouraging basis for the more general enterprise of fully processing NSUs.

Download Full-text

On the Potential for Open-Endedness in Neural Networks

Artificial Life ◽

10.1162/artl_a_00286 ◽

2019 ◽

Vol 25 (2) ◽

pp. 145-167 ◽

Cited By ~ 2

Author(s):

Nicholas Guttenberg ◽

Nathaniel Virgo ◽

Alexandra Penn

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Neural Networks ◽

Gradient Descent ◽

The Other ◽

Learning Approach ◽

Natural Evolution ◽

Machine Learning Methods ◽

Machine Learning Approach ◽

Common Barriers

Natural evolution gives the impression of leading to an open-ended process of increasing diversity and complexity. If our goal is to produce such open-endedness artificially, this suggests an approach driven by evolutionary metaphor. On the other hand, techniques from machine learning and artificial intelligence are often considered too narrow to provide the sort of exploratory dynamics associated with evolution. In this article, we hope to bridge that gap by reviewing common barriers to open-endedness in the evolution-inspired approach and how they are dealt with in the evolutionary case—collapse of diversity, saturation of complexity, and failure to form new kinds of individuality. We then show how these problems map onto similar ones in the machine learning approach, and discuss how the same insights and solutions that alleviated those barriers in evolutionary approaches can be ported over. At the same time, the form these issues take in the machine learning formulation suggests new ways to analyze and resolve barriers to open-endedness. Ultimately, we hope to inspire researchers to be able to interchangeably use evolutionary and gradient-descent-based machine learning methods to approach the design and creation of open-ended systems.

Download Full-text

Machine learning for identifying relevant publications in updates of systematic reviews of diagnostic test studies

10.1101/2020.06.16.20132670 ◽

2020 ◽

Author(s):

Toni Lange ◽

Guido Schwarzer ◽

Thomas Datzmann ◽

Harald Binder

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Diagnostic Test ◽

Systematic Reviews ◽

Data Preprocessing ◽

Learning Approach ◽

Learning Methods ◽

Final Performance ◽

Machine Learning Methods ◽

Machine Learning Approach

AbstractBackgroundUpdating systematic reviews is often a time-consuming process involving a lot of human effort and is therefore not carried out as often as it should be. Our aim was therefore to explore the potential of machine learning methods to reduce the human workload, and to particularly also gauge the performance of deep learning methods as compared to more established machine learning methods.MethodsWe used three available reviews of diagnostic test studies as data basis. In order to identify relevant publications we used typical text pre-processing methods. The reference standard for the evaluation was the human-consensus based binary classification (inclusion, exclusion). For the evaluation of models various scenarios were generated using a grid of combinations of data preprocessing steps. Furthermore, we evaluated each machine learning approach with an approach-specific predefined grid of tuning parameters using the Brier score metric.ResultsThe best performance was obtained with an ensemble method for two of the reviews, and by a deep learning approach for the other review. Yet, the final performance of approaches is seen to strongly depend on data preparation. Overall, machine learning methods provided reasonable classification.ConclusionIt seems possible to reduce the human workload in updating systematic reviews by using machine learning methods. Yet, as the influence of data preprocessing on the final performance seems to be at least as important as choosing the specific machine learning approach, users should not blindly expect good performance just by using approaches from a popular class, such as deep learning.

Download Full-text

Prediction of Glucose Metabolism Disorder Risk Using a Machine Learning Algorithm: Pilot Study (Preprint)

10.2196/preprints.10212 ◽

2018 ◽

Author(s):

Katsutoshi Maeta ◽

Yu Nishiyama ◽

Kazutoshi Fujibayashi ◽

Toshiaki Gunji ◽

Noriko Sasabe ◽

...

Keyword(s):

Machine Learning ◽

Glucose Metabolism ◽

Plasma Glucose ◽

Prediction Accuracy ◽

Learning Approach ◽

Classification Models ◽

Machine Learning Methods ◽

Machine Learning Approach ◽

Future Risk ◽

Input Variables

BACKGROUND A 75-g oral glucose tolerance test (OGTT) provides important information about glucose metabolism, although the test is expensive and invasive. Complete OGTT information, such as 1-hour and 2-hour postloading plasma glucose and immunoreactive insulin levels, may be useful for predicting the future risk of diabetes or glucose metabolism disorders (GMD), which includes both diabetes and prediabetes. OBJECTIVE We trained several classification models for predicting the risk of developing diabetes or GMD using data from thousands of OGTTs and a machine learning technique (XGBoost). The receiver operating characteristic (ROC) curves and their area under the curve (AUC) values for the trained classification models are reported, along with the sensitivity and specificity determined by the cutoff values of the Youden index. We compared the performance of the machine learning techniques with logistic regressions (LR), which are traditionally used in medical research studies. METHODS Data were collected from subjects who underwent multiple OGTTs during comprehensive check-up medical examinations conducted at a single facility in Tokyo, Japan, from May 2006 to April 2017. For each examination, a subject was diagnosed with diabetes or prediabetes according to the American Diabetes Association guidelines. Given the data, 2 studies were conducted: predicting the risk of developing diabetes (study 1) or GMD (study 2). For each study, to apply supervised machine learning methods, the required label data was prepared. If a subject was diagnosed with diabetes or GMD at least once during the period, then that subject’s data obtained in previous trials were classified into the risk group (y=1). After data processing, 13,581 and 6760 OGTTs were analyzed for study 1 and study 2, respectively. For each study, a randomly chosen subset representing 80% of the data was used for training 9 classification models and the remaining 20% was used for evaluating the models. Three classification models, A to C, used XGBoost with various input variables, some including OGTT data. The other 6 classification models, D to I, used LR for comparison. RESULTS For study 1, the AUC values ranged from 0.78 to 0.93. For study 2, the AUC values ranged from 0.63 to 0.78. The machine learning approach using XGBoost showed better performance compared with traditional LR methods. The AUC values increased when the full OGTT variables were included. In our analysis using a particular setting of input variables, XGBoost showed that the OGTT variables were more important than fasting plasma glucose or glycated hemoglobin. CONCLUSIONS A machine learning approach, XGBoost, showed better prediction accuracy compared with LR, suggesting that advanced machine learning methods are useful for detecting the early signs of diabetes or GMD. The prediction accuracy increased when all OGTT variables were added. This indicates that complete OGTT information is important for predicting the future risk of diabetes and GMD accurately.

Download Full-text

Novel Machine learning approach for Self-Aware prediction based on the Contextual reasoning

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2021.4.4345 ◽

2021 ◽

Vol 16 (4) ◽

Author(s):

Andrius Daranda ◽

Gintautas Dzemyda

Keyword(s):

Machine Learning ◽

Threat Assessment ◽

Support Vector ◽

Learning Approach ◽

Context Aware ◽

Learning Methods ◽

Vector Method ◽

Machine Learning Methods ◽

Contextual Reasoning ◽

Machine Learning Approach

Machine learning is compelling in solving various applied problems. Nevertheless, machine learning methods lack the contextual reasoning capabilities and cannot be fitted to utilize additional information about circumstances, environments, backgrounds, etc. Such information provides essential knowledge about possible reasons for particular actions. This knowledge could not be processed directly by either machine learning methods. This paper presents the context-aware machine learning approach for actor behavior contextual reasoning analysis and context-based prediction for threat assessment. Moreover, the proposed approach uses context-aware prediction to tackle the interaction between actors. An idea of the technique lies in the cooperative use of two classification methods when one way predicts an actor’s behavior. The second method discloses such predicted action (behavior) that is non-typical or unusual. Such integration of two-method allows the actor to make the self-awareness threat assessment based on relations between different actors where some multidimensional numerical data define the connections. This approach predicts the possible further situation and makes its threat assessment without any waiting for future actions. The suggested approach is based on the Decision Tree and Support Vector Method algorithm. Due to the complexity of context, marine traffic data was chosen to demonstrate the proposed approach capability. This technique could deal with the end-to-end approach for safe vessel navigation in maritime traffic with considerable ship congestion.

Download Full-text

A Machine-Learning Approach to Identify the Influence of Temperature on FRA Measurements

Energies ◽

10.3390/en14185718 ◽

2021 ◽

Vol 14 (18) ◽

pp. 5718

Author(s):

Regelii Suassuna de Andrade Ferreira ◽

Patrick Picher ◽

Hassan Ezzaidi ◽

Issouf Fofana

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Training Dataset ◽

Response Analysis ◽

Learning Approach ◽

Influence Of Temperature ◽

Machine Learning Approach ◽

Electrical Faults ◽

Transformer Model ◽

The Impact

Frequency response analysis (FRA) is a powerful and widely used tool for condition assessment in power transformers. However, interpretation schemes are still challenging. Studies show that FRA data can be influenced by parameters other than winding deformation, including temperature. In this study, a machine-learning approach with temperature as an input attribute was used to objectively identify faults in FRA traces. To the best knowledge of the authors, this has not been reported in the literature. A single-phase transformer model was specifically designed and fabricated for use as a test object for the study. The model is unique in that it allows the non-destructive interchange of healthy and distorted winding sections and, hence, reproducible and repeatable FRA measurements. FRA measurements taken at temperatures ranging from −40 °C to 40 °C were used first to describe the impact of temperature on FRA traces and then to test the ability of the machine learning algorithms to discriminate between fault conditions and temperature variation. The results show that when temperature is not considered in the training dataset, the algorithm may misclassify healthy measurements, taken at different temperatures, as mechanical or electrical faults. However, once the influence of temperature was considered in the training set, the performance of the classifier as studied was restored. The results indicate the feasibility of using the proposed approach to prevent misclassification based on temperature changes.

Download Full-text