AMINO ACIDS PREDICT COGNITION BEYOND CLINICAL METABOLIC MARKERS: A MACHINE LEARNING APPROACH

2019 ◽  
Vol 3 (Supplement_1) ◽  
pp. S952-S952
Author(s):  
Jessie Alwerdt ◽  
Yuan Tian ◽  
Andrew D Patterson ◽  
Martin Sliwinski

Abstract Prior work has suggested that metabolic disorders increase the risk for cognitive decline. Further, studies have identified amino acids (AAs) as potential biomarkers for dementia and diabetes. This study examines AAs and metabolic clinical markers (MCMs) as predictors of cognition (Processing Speed (SOP), Working Memory (WM), Fluid Intelligence (Gf), and Crystallized Intelligence (Gc)). The sample included 241 middle-aged adults from the Bronx, NY. Predictors included age, gender, education, ethnicity, smoking, diabetes status, glucose, insulin, triglycerides, diastolic and systolic blood pressure (BP), and cholesterol. AAs and associated derivatives were obtained from serum using NMR-based metabolomics. Analyses were conducted for each cognitive domain using repeated cross-validation random forests and lasso regressions. Overall, all models had acceptable cross-validation mean squared error except for WM. Several MCMs were specific to each cognitive domain: lower triglycerides and glucose were associated with better SOP, and higher systolic BP was associated with better Gc, while none were identified for Gf. The Gf model had the fewest AAs, with lower serine associated with better Gf. Two AAs, higher histidine and alanine, were associated with better SOP. Further, higher alanine, valine, isoleucine, serine, methionine, betaine, and moderate tyrosine were associated with better Gc. These results indicate that AAs were specific to each cognitive domain and ranked similar to or higher than several MCMs in importance. They suggest that further investigation of AAs alongside associated MCMs is needed to assess the metabolic contribution to cognitive performance. Such research will help identify specific metabolic targets relating to cognition.

Soil Research ◽  
2015 ◽  
Vol 53 (8) ◽  
pp. 907 ◽  
Author(s):  
David Clifford ◽  
Yi Guo

Given the wide variety of ways one can measure and record soil properties, it is not uncommon to have multiple overlapping predictive maps for a particular soil property. One is then faced with the challenge of choosing the best prediction at a particular point, either by selecting one of the maps or by combining them in some optimal manner. This question was recently examined in detail when Malone et al. (2014) compared four different methods for combining a digital soil mapping product with a disaggregation product based on legacy data. These authors also examined how to compute confidence intervals for the resulting map based on confidence intervals associated with the original input products. In this paper, we propose a new method to combine models called adaptive gating, which is inspired by the use of gating functions in mixture of experts, a machine learning approach to forming hierarchical classifiers. We compare it here with two standard approaches: inverse-variance weights and a regression-based approach. One of the benefits of the adaptive gating approach is that it allows weights to vary based on covariate information or across geographic space. As such, it explicitly takes advantage of the spatial nature of the maps we are trying to blend. We also suggest a conservative method for combining confidence intervals. We show that the root mean-squared error of predictions from the adaptive gating approach is similar to that of other standard approaches under cross-validation. However, under independent validation the adaptive gating approach works better than the alternatives, and as such it warrants further study in other areas of application and further development to reduce its computational complexity.
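One of the standard baselines named in this abstract, inverse-variance weighting, can be illustrated with a minimal sketch. This is not the authors' adaptive gating method. The function name and the input values are hypothetical, and the combined variance is only valid under the assumption that the two maps have independent errors.

```python
def inverse_variance_blend(predictions, variances):
    """Blend several predictions of one quantity, weighting each by 1/variance.

    Hypothetical helper for illustration; assumes independent prediction errors.
    """
    if not predictions or len(predictions) != len(variances):
        raise ValueError("need one variance per prediction")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    blended = sum(w * p for w, p in zip(weights, predictions)) / total
    blended_variance = 1.0 / total  # holds only for independent errors
    return blended, blended_variance


# Two hypothetical map predictions at one location, with their variances.
blended, blended_variance = inverse_variance_blend([6.0, 8.0], [1.0, 4.0])
print(blended, blended_variance)
```

The weights here are fixed per map, which is the limitation the abstract contrasts with adaptive gating, where weights may vary with covariates or location.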


Author(s):  
Adiba Yaseen ◽  
Sadaf Gull ◽  
Naeem Akhtar ◽  
Imran Amin ◽  
Fayyaz Minhas

Quantifying the hemolytic activity of peptides is a crucial step in the discovery of novel therapeutic peptides. Computational methods are attractive in this domain due to their ability to guide wet-lab experimental discovery or screening of peptides based on their hemolytic activity. However, existing methods are unable to accurately model various important aspects of this predictive problem, such as the role of N/C-terminal modifications, D- and L-amino acids, etc. In this work, we have developed a novel neural network-based approach called HemoNet for predicting the hemolytic activity of peptides. The proposed method captures the contextual importance of different amino acids in a given peptide sequence using a specialized feature embedding in conjunction with a SMILES-based fingerprint representation of N/C-terminal modifications. We have analyzed the predictive performance of the proposed method using stratified cross-validation in comparison with previous methods, non-redundant cross-validation, and validation on external peptides and clinical antimicrobial peptides. Our analysis shows that the proposed approach achieves significantly better predictive performance (AUC-ROC of 88%) than previous approaches (HemoPI and HemoPred, with AUC-ROC of 73%). HemoNet can be a useful tool in the search for novel therapeutic peptides. The Python implementation of the proposed method is available at the URL: https://github.com/adibayaseen/HemoNet.


2019 ◽  
Vol 15 ◽  
pp. 117693431987129 ◽  
Author(s):  
Yiyou Song ◽  
Qingru Xu ◽  
Zhen Wei ◽  
Di Zhen ◽  
Jionglong Su ◽  
...  

Currently, although many successful bioinformatics efforts have been reported in the epitranscriptomics field for N6-methyladenosine (m6A) site identification, none is focused on the substrate specificity of different m6A-related enzymes, i.e., the methyltransferases (writers) and demethylases (erasers). In this work, to untangle the target specificity and the regulatory functions of different RNA m6A writers (METTL3-METTL14 and METTL16) and erasers (ALKBH5 and FTO), we extracted 49 genomic features along with the conventional sequence features and used the machine learning approach of random forest to predict their epitranscriptome substrates. Our method achieved reasonable performance on both the writer target prediction (as high as 0.918) and the eraser target prediction (as high as 0.888) in a 5-fold cross-validation, and results of the gene ontology analysis of their preferential targets further revealed the functional relevance of different RNA methylation writers and erasers.


Palaios ◽  
2020 ◽  
Vol 35 (9) ◽  
pp. 391-402 ◽  
Author(s):  
RAFAEL PIRES DE LIMA ◽  
KATIE F. WELCH ◽  
JAMES E. BARRICK ◽  
KURT J. MARFURT ◽  
ROGER BURKHALTER ◽  
...  

ABSTRACT Accurate taxonomic classification of microfossils in thin-sections is an important biostratigraphic procedure. As paleontological expertise is typically restricted to specific taxonomic groups and experts are not present in all institutions, geoscience researchers often suffer from lack of quick access to critical taxonomic knowledge for biostratigraphic analyses. Moreover, diminishing emphasis on education and training in systematics poses a major challenge for the future of biostratigraphy, and on associated endeavors reliant on systematics. Here we present a machine learning approach to classify and organize fusulinids—microscopic index fossils for the late Paleozoic. The technique we employ has the potential to use such important taxonomic knowledge in models that can be applied to recognize and categorize fossil specimens. Our results demonstrate that, given adequate images and training, convolutional neural network models can correctly identify fusulinids with high levels of accuracy. Continued efforts in digitization of biological and paleontological collections at numerous museums and adoption of machine learning by paleontologists can enable the development of highly accurate and easy-to-use classification tools and, thus, facilitate biostratigraphic analyses by non-experts as well as allow for cross-validation of disparate collections around the world. Automation of classification work would also enable expert paleontologists and others to focus efforts on exploration of more complex interpretations and concepts.


2005 ◽  
Vol 22 (2) ◽  
pp. 198-206 ◽  
Author(s):  
Phillip C. Usera ◽  
John T. Foley ◽  
Joonkoo Yun

The purpose of this study was to cross-validate skinfold and anthropometric measurements for individuals with Down syndrome (DS). Estimated body fat of 14 individuals with DS and 13 individuals without DS was compared between a criterion measurement (BOD POD®) and three prediction equations. Correlations between criterion and field-based tests ranged from .81 to .94 for the non-DS group and from .11 to .54 for the DS group. Root-mean-squared error was employed to examine the amount of error in the field-based measurements. A MANOVA indicated significant differences in accuracy between groups for Jackson's equation and Lohman's equation. Based on the results, efforts should now be directed toward developing new equations that can assess the body composition of individuals with DS in a clinically feasible way.
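The root-mean-squared error used above to compare field-based estimates against the criterion measurement can be sketched as follows. The function name and the body-fat percentages are hypothetical illustrations, not data from the study.

```python
import math


def rmse(criterion, predicted):
    """Root-mean-squared error between criterion values and predicted values."""
    if not criterion or len(criterion) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(c - p) ** 2 for c, p in zip(criterion, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical percent body fat: criterion device vs. a prediction equation.
criterion_values = [20.0, 25.0, 30.0]
equation_values = [22.0, 25.0, 26.0]
error = rmse(criterion_values, equation_values)
print(round(error, 3))
```

Because the errors are squared before averaging, a single large miss raises the RMSE more than several small ones, which is why it is a stricter check than correlation alone.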


Teknika ◽  
2021 ◽  
Vol 10 (1) ◽  
pp. 18-26
Author(s):  
Hendry Cipta Husada ◽  
Adi Suryaputra Paramita

Today's technological developments have made it easy for many people to obtain and share information on various social media platforms. Twitter is one of the media frequently used to express opinions as a reaction to something. Opinions found on Twitter can be used by airlines as a key parameter for gauging public satisfaction and as evaluation material for the company. Accordingly, a method is needed that can automatically classify opinions into positive, negative, or neutral categories through sentiment analysis. The sentiment analysis process consists of data preprocessing, term weighting using the TF-IDF method, application of the algorithm, and discussion of the classification results. Opinions are classified with a machine learning approach using the multi-class Support Vector Machine (SVM) algorithm. The data used in this study are English-language opinions from Twitter users about airlines. Based on the tests performed, the best classification results were obtained using an SVM with the RBF kernel at parameter values 𝐶 (complexity) = 10 and 𝛾 (gamma) = 1, with an accuracy of 84.37%, and 80.41% when using 10-fold cross-validation.
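The TF-IDF term-weighting step in this pipeline can be sketched in plain Python. This is a minimal illustration under stated assumptions: the abstract does not say which TF-IDF variant was used, so the sketch takes term frequency as count divided by document length and inverse document frequency as log(N / document frequency). The function name and the example tweets are hypothetical, and the SVM classification step is not shown.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / doc frequency).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    weighted = []
    for tokens in documents:
        counts = Counter(tokens)
        weighted.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weighted


# Hypothetical, already preprocessed airline tweets.
tweets = [
    "flight was late".split(),
    "flight was great".split(),
    "crew was rude".split(),
]
weights = tf_idf(tweets)
print(weights[0])
```

A term that occurs in every tweet (here "was") receives weight zero, while a term unique to one tweet (here "late") receives the largest weight, which is what makes the representation useful as classifier input.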


2021 ◽  
Author(s):  
P. Antony Seba ◽  
Bibal Benifa JV

Abstract Chronic Kidney Disease (CKD) is a gradual loss of kidney function over time, and it is irreversible once functionality reaches the critical state. Detecting the various stages of CKD helps to reduce the progression of the disease. Accurate prediction of CKD stages is one of the urgent needs in the medical industry, and it can be done effectively by adopting Machine Learning (ML) techniques. The primary objective of the present research is to develop an effective classification model for the accurate prediction of CKD stages based on the patient's health profile as well as the clinical test reports. Here, a hybrid ML strategy is employed that integrates Random Forest (RF) and AdaBoost (AB) techniques through a voting classifier (VC). The standard CKD dataset with 400 tuples and 25 parameters is used for the proposed investigation. The Modification of Diet in Renal Disease (MDRD) equation is used to extract an additional feature known as “estimated Glomerular Filtration Rate (eGFR)” for the prediction of the CKD stage. Pre-processing is carried out on the CKD dataset to fill the missing values by considering the skewness of the parameters, and the issue of data leakage is also addressed. Medically important features are considered, and correlation analysis is carried out to select the appropriate features for the model building process. The proposed Hybrid Ensemble Model (HEM) aids in lowering the bias and variance. The efficiency of the HEM is assessed using performance metrics such as cross-validation score (CVS), accuracy, precision, recall, F1 measure, Mean Squared Error (MSE), bias, and variance, and it is compared with state-of-the-art classification schemes.
The analysis shows that the proposed HEM predicts the CKD stage with accuracies of 99.16%, 100%, and 100% on reduced feature sets I, II, and III, and with cross-validation scores of 97.85%, 99.28%, and 99.64%, respectively.
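The eGFR feature described above can be sketched with the commonly cited four-variable MDRD equation. The abstract does not state which variant of the equation or which stage thresholds the authors used, so both are assumptions here: the sketch uses the IDMS-traceable coefficient of 175 and the conventional five-stage eGFR cut-offs. The function names are hypothetical.

```python
def egfr_mdrd(serum_creatinine_mg_dl, age_years, is_female, is_black=False):
    """Estimated GFR in mL/min/1.73 m^2 (four-variable MDRD, assumed variant)."""
    if serum_creatinine_mg_dl <= 0 or age_years <= 0:
        raise ValueError("creatinine and age must be positive")
    egfr = 175.0 * serum_creatinine_mg_dl ** -1.154 * age_years ** -0.203
    if is_female:
        egfr *= 0.742
    if is_black:
        egfr *= 1.212
    return egfr


def ckd_stage(egfr):
    """Map eGFR to stages 1-5 using the conventional cut-offs (an assumption)."""
    if egfr >= 90:
        return 1
    if egfr >= 60:
        return 2
    if egfr >= 30:
        return 3
    if egfr >= 15:
        return 4
    return 5


# Hypothetical patient: creatinine 1.0 mg/dL, age 50, male.
example_egfr = egfr_mdrd(1.0, 50, is_female=False)
print(round(example_egfr, 1), ckd_stage(example_egfr))
```

Staging from eGFR alone is a simplification: stages 1 and 2 clinically also require other evidence of kidney damage, which is one reason a classifier over the full feature set can add information beyond the derived feature.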


2020 ◽  
Author(s):  
Rafael Massahiro Yassue ◽  
José Felipe Gonzaga Sabadin ◽  
Giovanni Galli ◽  
Filipe Couto Alves ◽  
Roberto Fritsche-Neto

Abstract Usually, the comparison among genomic prediction models is based on validation schemes such as Repeated Random Subsampling (RRS) or K-fold cross-validation. Nevertheless, the design of training and validation sets strongly affects how, and how subjectively, we compare models. The procedures cited above overlap across replicates, which might cause overestimation and a lack of independence among residuals due to resampling, and thus less accurate results. Furthermore, post hoc tests, such as ANOVA, are not recommended because the assumption of residual independence is not fulfilled. Thus, we propose a new way to sample observations to build training and validation sets based on a cross-validation alpha-based design (CV-α). The CV-α was meant to create several scenarios of validation (replicates x folds), regardless of the number of treatments. Using CV-α, the number of genotypes in the same fold across replicates was much lower than with K-fold, indicating higher residual independence. Therefore, based on the CV-α results, as proof of concept, via ANOVA, we could compare the proposed methodology to RRS and K-fold, applying four genomic prediction models with a simulated and a real dataset. Concerning the predictive ability and bias, all validation methods showed similar performance. However, regarding the mean squared error and coefficient of variation, the CV-α method presented the best performance under the evaluated scenarios. Moreover, as it adds neither cost nor complexity, it is more reliable and allows the use of non-subjective methods to compare models and factors. Therefore, CV-α can be considered a more precise validation methodology for model selection.
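The overlap problem this abstract attributes to repeated K-fold can be made concrete with a small sketch that counts how often two observations land in the same fold across replicates. This illustrates only the ordinary randomized K-fold baseline, not CV-α, whose construction the abstract does not describe. The function names and the sizes (20 genotypes, 5 folds, 10 replicates) are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def k_fold_assignment(n_items, k, rng):
    """Randomly assign item indices to k folds of near-equal size."""
    order = list(range(n_items))
    rng.shuffle(order)
    return {item: position % k for position, item in enumerate(order)}


def pair_co_occurrence(n_items, k, n_replicates, seed=0):
    """Count, for each pair of items, the replicates in which they share a fold."""
    rng = random.Random(seed)
    shared = Counter()
    for _ in range(n_replicates):
        fold_of = k_fold_assignment(n_items, k, rng)
        for a, b in combinations(range(n_items), 2):
            if fold_of[a] == fold_of[b]:
                shared[(a, b)] += 1
    return shared


shared = pair_co_occurrence(n_items=20, k=5, n_replicates=10)
repeated_pairs = sum(1 for count in shared.values() if count > 1)
print(repeated_pairs, "pairs shared a fold in more than one replicate")
```

Pairs that repeatedly share a fold are the source of the dependence between replicates that the abstract describes; a design that limits such repeats would lower these counts.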


Author(s):  
Peter T. Habib ◽  
Alsamman M. Alsamman ◽  
Sameh E. Hassnein ◽  
Ghada A. Shereif ◽  
Aladdin Hamwieh

Abstract In 2019, there were an estimated 268,600 new cases of breast cancer. Breast cancer is one of the most common cancers and one of the world's leading causes of death for women. Classification and data mining are efficient ways to classify information, particularly in the medical field, where prediction techniques are commonly used for early detection and effective treatment in diagnosis and research. This paper tests models for the mammogram analysis of breast cancer data using 23 of the more widely used machine learning algorithms, such as Decision Tree, Random Forest, K-Nearest Neighbors, and Support Vector Machine. Results are obtained from randomly split data using a repeated 10-fold cross-validation method. Accuracy is calculated with regression metrics such as Mean Absolute Error, Mean Squared Error, and R2 Score, and with clustering metrics such as Adjusted Rand Index, Homogeneity, and V-measure; it is also checked with F-Measure, AUC, and cross-validation. Thus, proper identification of patients with breast cancer would create care opportunities; for example, supervision and the implementation of intervention plans could benefit the quality of long-term care. Experimental results reveal that the maximum precision of 100%, with the lowest error rate, is obtained with the AdaBoost classifier.


Plant Disease ◽  
2012 ◽  
Vol 96 (6) ◽  
pp. 889-896 ◽  
Author(s):  
S. Landschoot ◽  
W. Waegeman ◽  
K. Audenaert ◽  
J. Vandepitte ◽  
G. Haesaert ◽  
...  

Despite great efforts to forecast plant diseases, many of the existing systems often fall short in providing farmers with accurate predictions. One of the main problems arises from the existence of year and location effects, so that more advanced procedures are required for evaluating existing systems in an unbiased manner. This paper illustrates the case of Fusarium head blight of winter wheat in Belgium. We present a new cross-validation strategy that enables the evaluation of the predictive performance of a forecasting system for years and locations that are different from the years and locations on which the forecast was developed. Four different cross-validation strategies and five regression techniques are used. The results demonstrated that traditional evaluation strategies are too optimistic in their predictions, whereas the cross-year cross-location validation strategy yielded more realistic outcomes. Using this procedure, the mean squared error increased and the coefficient of determination decreased in predicting disease severity and deoxynivalenol content, suggesting that existing evaluation strategies may generate a substantial optimistic bias. The strongest discrepancies between the cross-validation strategies were observed for multiple linear regression models.
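One plausible reading of the cross-year cross-location strategy can be sketched as a split generator: each test fold is one (year, location) cell, and the training set keeps only records that differ from it in both year and location. The abstract does not spell out the exact procedure, so this is an assumption, and the function name, record layout, and example values are hypothetical.

```python
def cross_year_cross_location_splits(records):
    """Yield (train_indices, test_indices) with no shared year or location.

    Each record is a dict with "year" and "location" keys (assumed layout).
    """
    cells = sorted({(r["year"], r["location"]) for r in records})
    for year, location in cells:
        test = [i for i, r in enumerate(records)
                if r["year"] == year and r["location"] == location]
        train = [i for i, r in enumerate(records)
                 if r["year"] != year and r["location"] != location]
        if train:  # skip cells that leave nothing to train on
            yield train, test


# Hypothetical field records: two years by two locations.
records = [
    {"year": 2002, "location": "A", "severity": 0.10},
    {"year": 2002, "location": "B", "severity": 0.30},
    {"year": 2003, "location": "A", "severity": 0.20},
    {"year": 2003, "location": "B", "severity": 0.40},
]
splits = list(cross_year_cross_location_splits(records))
print(splits)
```

Excluding every training record that shares a year or a location with the test cell removes the year and location effects that make ordinary random splits look too optimistic, at the cost of a smaller training set per fold.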

