Flexible Bayesian Nonlinear Model Configuration

2021 ◽  
Vol 72 ◽  
pp. 901-942
Author(s):  
Aliaksandr Hubin ◽  
Geir Storvik ◽  
Florian Frommlet

Regression models are used in a wide range of applications, providing a powerful scientific tool for researchers from different fields. Linear, or simple parametric, models are often not sufficient to describe complex relationships between input variables and a response. Such relationships can be better described through flexible approaches such as neural networks, but this results in less interpretable models and potential overfitting. Alternatively, specific parametric nonlinear functions can be used, but the specification of such functions is in general complicated. In this paper, we introduce a flexible approach for the construction and selection of highly flexible nonlinear parametric regression models. Nonlinear features are generated hierarchically, similarly to deep learning, but have additional flexibility on the possible types of features to be considered. This flexibility, combined with variable selection, allows us to find a small set of important features and thereby more interpretable models. Within the space of possible functions, a Bayesian approach, introducing priors for functions based on their complexity, is considered. A genetically modified mode jumping Markov chain Monte Carlo algorithm is adopted to perform Bayesian inference and estimate posterior probabilities for model averaging. In various applications, we illustrate how our approach is used to obtain meaningful nonlinear models. Additionally, we compare its predictive performance with several machine learning algorithms.

2021 ◽  
Author(s):  
Zhen Chen ◽  
Pei Zhao ◽  
Chen Li ◽  
Fuyi Li ◽  
Dongxu Xiang ◽  
...  

Abstract Sequence-based analysis and prediction are fundamental bioinformatic tasks that facilitate understanding of the sequence(-structure)-function paradigm for DNAs, RNAs and proteins. Rapid accumulation of sequences requires equally pervasive development of new predictive models, which depends on the availability of effective tools that support these efforts. We introduce iLearnPlus, the first machine-learning platform with graphical- and web-based interfaces for the construction of machine-learning pipelines for analysis and predictions using nucleic acid and protein sequences. iLearnPlus provides a comprehensive set of algorithms and automates sequence-based feature extraction and analysis, construction and deployment of models, assessment of predictive performance, statistical analysis, and data visualization; all without programming. iLearnPlus includes a wide range of feature sets which encode information from the input sequences and over twenty machine-learning algorithms that cover several deep-learning approaches, outnumbering the current solutions by a wide margin. Our solution caters to experienced bioinformaticians, given the broad range of options, and biologists with no programming background, given the point-and-click interface and easy-to-follow design process. We showcase iLearnPlus with two case studies concerning prediction of long noncoding RNAs (lncRNAs) from RNA transcripts and prediction of crotonylation sites in protein chains. iLearnPlus is an open-source platform available at https://github.com/Superzchen/iLearnPlus/ with the webserver at http://ilearnplus.erc.monash.edu/.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Onkar Singh ◽  
Wen-Lian Hsu ◽  
Emily Chia-Yu Su

Abstract Background Antimicrobial peptides (AMPs) are oligopeptides that act as crucial components of innate immunity, occur naturally in all multicellular organisms, and are involved in the first line of defense. Recent studies have shown that AMPs hold great potential that is not limited to antimicrobial activity. They are also crucial regulators of host immune responses that can modulate a wide range of activities, such as immune regulation, wound healing, and apoptosis. However, the ability of microorganisms to adapt to and resist existing antibiotics has prompted the scientific community to develop alternatives to conventional antibiotics. To address this issue, we propose Co-AMPpred, an in silico-aided AMP prediction method based on compositional features of amino acid residues to classify AMPs and non-AMPs. Results In our study, we developed a prediction method that incorporates composition-based sequence and physicochemical features into various machine-learning algorithms. The Boruta feature-selection algorithm was then used to identify discriminative biological features, and only these discriminative features were used to develop our model. Additionally, we performed stratified tenfold cross-validation to validate the predictive performance of our AMP prediction model and evaluated it on an independent holdout test dataset. A benchmark dataset was collected from previous studies to evaluate the predictive performance of our model. Conclusions Experimental results show that combining composition-based and physicochemical features outperformed existing methods on both the benchmark training dataset and a reduced training dataset. Finally, our proposed method achieved an accuracy of 80.8% and an area under the receiver operating characteristic curve of 0.871 on the independent test set. Our code and datasets are available at https://github.com/onkarS23/CoAMPpred.
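
The stratified tenfold cross-validation mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function name `stratified_k_fold`, the toy labels, and the round-robin assignment are assumptions chosen only to show how folds can preserve the class ratio.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Split sample indices into k folds that roughly preserve class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Hypothetical toy labels (not the study's data): 30 positives, 70 negatives.
labels = [1] * 30 + [0] * 70
folds = stratified_k_fold(labels, k=10)

for test_fold in folds:
    held_out = set(test_fold)
    train = [i for i in range(len(labels)) if i not in held_out]
    # A model would be fitted on `train` and scored on `test_fold` here.
```

Each fold serves once as the held-out set while the remaining nine are used for training. When class sizes are not multiples of k, this simple round-robin places the extra samples in the earliest folds, which a production implementation would likely balance more carefully.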


2021 ◽  
Author(s):  
Lyudmila Babeshko ◽  
Mihail Bich ◽  
Irina Orlova

The textbook covers a wide range of issues related to econometric modeling. Regression models are the core of econometric modeling, so the issues of their evaluation, testing of assumptions, adjustment and verification are given a significant place. Various aspects of multiple regression models are included: multicollinearity, dummy variables, and lag structure of variables. Methods of linearization and estimation of nonlinear models are considered. An apparatus for evaluating systems of simultaneous and seemingly unrelated equations is presented. Attention is paid to time series models. Detailed solutions of the examples in Excel and the R software environment are included. Meets the requirements of the federal state educational standards of higher education of the latest generation. For undergraduate and graduate students studying in the field of "Economics", the curriculum of which includes the disciplines "Econometrics", "Econometric Modeling", "Econometric Research".


2020 ◽  
Author(s):  
Sarah Delanys ◽  
Farah Benamara ◽  
Véronique Moriceau ◽  
François Olivier ◽  
Josiane Mothe

BACKGROUND With the advent of digital technology, and specifically user-generated content in social media, new ways have emerged for studying the possible stigmatization of people in relation to mental health. Several pieces of work have studied the discourse conveyed about psychiatric pathologies on Twitter, considering mostly tweets in English and a limited number of psychiatric disorder terms. This paper proposes the first study to analyze the use of a wide range of psychiatric terms in tweets in French. OBJECTIVE Our aim is to study how generic, nosographic and therapeutic psychiatric terms are used on Twitter in French. More specifically, our study has three complementary goals: (1) to analyze the types of psychiatric word use, namely medical, misuse, and irrelevant, (2) to analyze the polarity conveyed in the tweets that use these terms (positive/negative/neutral), and (3) to compare the frequency of these terms to those observed in related work (mainly in English). METHODS Our study has been conducted on a corpus of tweets in French posted between 01/01/2016 and 12/31/2018 and collected using dedicated keywords. The corpus has been manually annotated by clinical psychiatrists following a multilayer annotation scheme that includes the type of word use and the opinion orientation of the tweet. Two analyses have been performed: first, a qualitative analysis to measure the reliability of the produced manual annotation, then a quantitative analysis considering mainly term frequency in each layer and exploring the interactions between them. RESULTS One of the first results is a resource in the form of an annotated dataset. The initial dataset is composed of 22,579 tweets in French containing at least one of the selected psychiatric terms. From this set, experts in psychiatry annotated a random sample of 3,040 tweets, which constitutes the resource resulting from our work.
The second result is the analysis of the annotations; it shows that terms are misused in 45.3% of the tweets and that their associated polarity is negative in 86.2% of the cases. When considering the three types of term use, 59.5% of the tweets are associated with a negative polarity. Misused terms related to psychotic disorders (55.5%) are more frequent than those related to mood disorders (26.5%). CONCLUSIONS Some psychiatric terms are misused in the corpora we studied, which is consistent with the results reported in related work in other languages. Thanks to the great diversity of studied terms, this work highlighted a disparity in the representations and ways of using psychiatric terms. Moreover, our study helps psychiatrists become aware of how these terms are used in new communication media such as social networks, which are widely used. This study has the major advantage of being reproducible thanks to the framework and guidelines we produced, so that it could be repeated in order to analyze the evolution of term usage. While the newly built dataset is a valuable resource for other analytical studies, it could also serve to train machine learning algorithms to automatically identify stigma in social media.


Author(s):  
Kosuke Inoue ◽  
Roch Nianogo ◽  
Donatello Telesca ◽  
Atsushi Goto ◽  
Vahe Khachadourian ◽  
...  

Abstract Objective It is unclear whether relatively low glycated haemoglobin (HbA1c) levels are beneficial or harmful for the long-term health outcomes among people without diabetes. We aimed to investigate the association between low HbA1c levels and mortality among the US general population. Methods This study includes a nationally representative sample of 39 453 US adults from the National Health and Nutrition Examination Surveys 1999–2014, linked to mortality data through 2015. We employed the parametric g-formula with pooled logistic regression models and the ensemble machine learning algorithms to estimate the time-varying risk of all-cause and cardiovascular mortality by HbA1c categories (low, 4.0 to <5.0%; mid-level, 5.0 to <5.7%; prediabetes, 5.7 to <6.5%; and diabetes, ≥6.5% or taking antidiabetic medication), adjusting for 72 potential confounders including demographic characteristics, lifestyle, biomarkers, comorbidities and medications. Results Over a median follow-up of 7.5 years, 5118 (13%) all-cause deaths, and 1116 (3%) cardiovascular deaths were observed. Logistic regression models and machine learning algorithms showed nearly identical predictive performance of death and risk estimates. Compared with mid-level HbA1c, low HbA1c was associated with a 30% (95% CI, 16 to 48) and a 12% (95% CI, 3 to 22) increased risk of all-cause mortality at 5 years and 10 years of follow-up, respectively. We found no evidence that low HbA1c levels were associated with cardiovascular mortality risk. The diabetes group, but not the prediabetes group, also showed an increased risk of all-cause mortality. Conclusions Using the US national database and adjusting for an extensive set of potential confounders with flexible modelling, we found that adults with low HbA1c were at increased risk of all-cause mortality. Further evaluation and careful monitoring of low HbA1c levels need to be considered.


2021 ◽  
Vol 13 (11) ◽  
pp. 2074
Author(s):  
Ryan R. Reisinger ◽  
Ari S. Friedlaender ◽  
Alexandre N. Zerbini ◽  
Daniel M. Palacios ◽  
Virginia Andrews-Goff ◽  
...  

Machine learning algorithms are often used to model and predict animal habitat selection—the relationships between animal occurrences and habitat characteristics. For broadly distributed species, habitat selection often varies among populations and regions; thus, it would seem preferable to fit region- or population-specific models of habitat selection for more accurate inference and prediction, rather than fitting large-scale models using pooled data. However, where the aim is to make range-wide predictions, including areas for which there are no existing data or models of habitat selection, how can regional models best be combined? We propose that ensemble approaches commonly used to combine different algorithms for a single region can be reframed, treating regional habitat selection models as the candidate models. By doing so, we can incorporate regional variation when fitting predictive models of animal habitat selection across large ranges. We test this approach using satellite telemetry data from 168 humpback whales across five geographic regions in the Southern Ocean. Using random forests, we fitted a large-scale model relating humpback whale locations, versus background locations, to 10 environmental covariates, and made a circumpolar prediction of humpback whale habitat selection. We also fitted five regional models, the predictions of which we used as input features for four ensemble approaches: an unweighted ensemble, an ensemble weighted by environmental similarity in each cell, stacked generalization, and a hybrid approach wherein the environmental covariates and regional predictions were used as input features in a new model. We tested the predictive performance of these approaches on an independent validation dataset of humpback whale sightings and whaling catches. These multiregional ensemble approaches resulted in models with higher predictive performance than the circumpolar naive model. 
These approaches can be used to incorporate regional variation in animal habitat selection when fitting range-wide predictive models using machine learning algorithms. This can yield more accurate predictions across regions or populations of animals that may show variation in habitat selection.
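
To make the first two ensemble variants named in the abstract above concrete, here is a minimal sketch for a single grid cell. It is an illustration under stated assumptions, not the authors' implementation: the function names, the example predictions, and the use of a plain normalized weighted mean for the similarity-weighted ensemble are all hypothetical.

```python
def unweighted_ensemble(regional_preds):
    """Average the regional model predictions for one grid cell."""
    return sum(regional_preds) / len(regional_preds)


def weighted_ensemble(regional_preds, weights):
    """Weighted mean; the weights stand in for environmental similarity."""
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return sum(p * w for p, w in zip(regional_preds, weights)) / total


# Hypothetical predictions from five regional models for one cell.
preds = [0.2, 0.4, 0.6, 0.8, 0.5]
# Hypothetical similarity of the cell to each region's environment.
similarity = [3.0, 0.0, 0.0, 1.0, 0.0]

plain = unweighted_ensemble(preds)
weighted = weighted_ensemble(preds, similarity)
```

The weighted variant lets regions whose environment resembles the target cell dominate the prediction. Stacked generalization and the hybrid approach would instead fit a further model on the regional predictions, which is beyond this sketch.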


2021 ◽  
Vol 11 (2) ◽  
pp. 787
Author(s):  
Bartłomiej Ambrożkiewicz ◽  
Grzegorz Litak ◽  
Anthimos Georgiadis ◽  
Nicolas Meier ◽  
Alexander Gassner

The input values used in mathematical models of rolling bearings often span a wide range: very small values of deformation and damping are combined with large values of stiffness in the governing equations, which leads to miscalculations. This paper presents a two-degree-of-freedom (2-DOF) dimensionless mathematical model for ball bearings and describes a procedure that helps to scale the problem and reveal the relationships between dimensionless terms and their influence on the system’s response. The derived mathematical model considers nonlinear features such as stiffness, damping, and radial internal clearance, referring to the Hertzian contact theory. Further important features are also taken into account, including an external load, the eccentricity of the shaft-bearing system, and shape errors on the raceway, in order to investigate the variable dynamics of the ball bearing. Analysis of the obtained responses with the Fast Fourier Transform, phase plots, orbit plots, and recurrences provides a rich source of information about the dynamics of the system; it helped to find the transition between the periodic and chaotic response and how it affects the topology of recurrence plots (RPs) and recurrence quantificators.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sakthi Kumar Arul Prakash ◽  
Conrad Tucker

Abstract This work investigates the ability to classify misinformation in online social media networks in a manner that avoids the need for ground truth labels. Rather than approach the classification problem as a task for humans or machine learning algorithms, this work leverages user–user and user–media (i.e., media likes) interactions to infer the type of information (fake vs. authentic) being spread, without needing to know the actual details of the information itself. To study the inception and evolution of user–user and user–media interactions over time, we create an experimental platform that mimics the functionality of real-world social media networks. We develop a graphical model that considers the evolution of this network topology to model the uncertainty (entropy) propagation when fake and authentic media disseminate across the network. The creation of a real-world social media network enables a wide range of hypotheses to be tested pertaining to users, their interactions with other users, and with media content. The discovery that the entropy of user–user and user–media interactions approximates fake and authentic media likes enables us to classify fake media in an unsupervised learning manner.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Imogen Schofield ◽  
David C. Brodbelt ◽  
Noel Kennedy ◽  
Stijn J. M. Niessen ◽  
David B. Church ◽  
...  

Abstract Cushing’s syndrome is an endocrine disease in dogs that negatively impacts the quality of life of affected animals. Cushing’s syndrome can be a challenging diagnosis to confirm; therefore, new methods to aid diagnosis are warranted. Four machine-learning algorithms were applied to predict a future diagnosis of Cushing’s syndrome, using structured clinical data from the VetCompass programme in the UK. Dogs suspected of having Cushing’s syndrome were included in the analysis and classified based on their final reported diagnosis within their clinical records. Demographic and clinical features available at the point of first suspicion by the attending veterinarian were included within the models. The machine-learning methods were able to classify the recorded Cushing’s syndrome diagnoses, with good predictive performance. The LASSO penalised regression model indicated the best overall performance when applied to the test set with an AUROC = 0.85 (95% CI 0.80–0.89), sensitivity = 0.71, specificity = 0.82, PPV = 0.75 and NPV = 0.78. The findings of our study indicate that machine-learning methods could predict the future diagnosis of a practicing veterinarian. New approaches using these methods could support clinical decision-making and contribute to improved diagnosis of Cushing’s syndrome in dogs.
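
For readers unfamiliar with the metrics reported above, the following minimal sketch shows how sensitivity, specificity, PPV and NPV follow from a confusion matrix. The function name and the counts are hypothetical and chosen only for illustration; they are not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all diseased
        "specificity": tn / (tn + fp),  # true negatives among all healthy
        "ppv": tp / (tp + fp),          # diseased among predicted positives
        "npv": tn / (tn + fn),          # healthy among predicted negatives
    }


# Hypothetical counts, not taken from the study.
metrics = diagnostic_metrics(tp=40, fp=5, tn=45, fn=10)
```

Sensitivity and specificity describe the classifier itself, whereas PPV and NPV also depend on how common the disease is in the evaluated population.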

