Decision Tree in Biology

Intelligent Techniques Analysis for Glycosylation Site Prediction

Current Bioinformatics ◽

10.2174/1574893615666210108094847 ◽

2021 ◽

Vol 15 ◽

Author(s):

Alhassan Alkuhlani ◽

Walaa Gad ◽

Mohamed Roushdy ◽

Abdel-Badeeh M. Salem

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Cell Interaction ◽

Glycosylation Site ◽

Machine Learning Classification ◽

Site Prediction ◽

Glycosylation Sites ◽

Wide Range ◽

Feature Extraction And Selection ◽

Computational Intelligent

Background: Glycosylation is one of the most common post-translational modifications (PTMs) in cells. It plays important roles in several biological processes, including cell-cell interaction, protein folding, antigen recognition, and immune response. In addition, glycosylation is associated with many human diseases, such as cancer, diabetes, and coronavirus infections. Experimental techniques for identifying glycosylation sites are time-consuming, laborious, and expensive, so computational intelligence techniques are becoming very important for glycosylation site prediction. Objective: This paper is a theoretical discussion of the technical aspects of applying biotechnological approaches (e.g., artificial intelligence and machine learning) to digital bioinformatics research and intelligent biocomputing. Computational intelligence techniques have proven effective for predicting N-linked, O-linked, and C-linked glycosylation sites, and many studies over the last two decades have used them for glycosylation site prediction. In this paper, we analyze and compare, from multiple aspects, a wide range of the intelligent techniques used in these studies. The current challenges and difficulties facing software developers and knowledge engineers in predicting glycosylation sites are also discussed. Method: The studies are compared on several criteria, including databases, feature extraction and selection, machine learning classification methods, evaluation measures, and performance results. Results and conclusions: Many challenges and problems are presented. Consequently, more effort is needed to obtain more accurate prediction models for the three basic types of glycosylation sites.
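
The abstract surveys predictors rather than specifying one. As a reference point only: many N-linked predictors start from asparagine residues in the canonical N-X-S/T sequon (X being any residue except proline) and encode a fixed sequence window around each candidate as classifier input. A minimal sketch of that candidate-and-window step, assuming a plain one-letter amino-acid string; `find_sequons`, `site_windows`, and the 7-residue flank are illustrative choices, not taken from the paper, and the sketch covers neither O-linked nor C-linked sites.

```python
import re

# Canonical N-linked sequon N-X-S/T with X != P (illustrative rule-based candidate filter).
# The lookahead consumes only the N, so adjacent or overlapping sequons are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(seq: str) -> list[int]:
    """Return 0-based positions of asparagines that start an N-X-S/T sequon."""
    return [m.start() for m in SEQUON.finditer(seq.upper())]


def site_windows(seq: str, positions: list[int], flank: int = 7) -> list[str]:
    """Cut a (2*flank + 1)-residue window centred on each position, padding ends with '-'."""
    padded = "-" * flank + seq.upper() + "-" * flank
    # Residue seq[p] sits at padded[p + flank], so its window starts at padded[p].
    return [padded[p : p + 2 * flank + 1] for p in positions]


if __name__ == "__main__":
    protein = "ANGSANPSNAT"  # toy sequence, not real protein data
    sites = find_sequons(protein)
    print(sites, site_windows(protein, sites, flank=2))
```

For the toy sequence, the N-P-S motif starting at position 5 is skipped because of the proline, so positions 1 and 8 are returned; a learned classifier would then score the extracted windows.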

Prediction Models for Public Health Containment Measures on COVID-19 Using Artificial Intelligence and Machine Learning: A Systematic Review

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18094499 ◽

2021 ◽

Vol 18 (9) ◽

pp. 4499

Author(s):

Anil Babu Payedimarri ◽

Diego Concina ◽

Luigi Portinale ◽

Massimo Canonico ◽

Deborah Seys ◽

...

Keyword(s):

Public Health ◽

Artificial Intelligence ◽

Machine Learning ◽

Systematic Review ◽

Public Transportation ◽

Prediction Models ◽

Positive Impact ◽

Health Interventions ◽

Public Health Interventions ◽

Commercial Activities

Artificial intelligence (AI) and machine learning (ML) are being used increasingly across different fields of medicine. During the SARS-CoV-2 outbreak, AI and ML were also applied to evaluate and/or implement public health interventions aimed at flattening the epidemic curve. This systematic review aims to evaluate the effectiveness of AI and ML when applied to public health interventions to contain the spread of SARS-CoV-2. Our findings showed that quarantine appears to be the best strategy for containing COVID-19. A nationwide lockdown also showed a positive impact, whereas social distancing should be considered effective only in combination with other interventions, including the closure of schools and commercial activities and the limitation of public transportation. Our findings also showed that all interventions should be initiated early in the pandemic and continued for a sustained period. Despite the study's limitations, we conclude that AI and ML could help policy makers define strategies for containing the COVID-19 pandemic.

Retrosynthetic Accessibility Score (RAscore) - Rapid Machine Learned Synthesizability Classification from AI Driven Retrosynthetic Planning

10.26434/chemrxiv.13019993.v1 ◽

2020 ◽

Author(s):

Amol Thakkar ◽

Veronika Chadimova ◽

Esben Jannik Bjerrum ◽

Ola Engkvist ◽

Jean-Louis Reymond

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Virtual Screening ◽

Generative Models ◽

Synthetic Route ◽

Synthetic Accessibility ◽

Wide Range ◽

Computer Aided ◽

Synthesis Planning ◽

Retrosynthetic Analysis

Computer-aided synthesis planning (CASP) is part of a suite of artificial intelligence (AI)-based tools that can propose syntheses for a wide range of compounds. However, at present these tools are too slow to screen the synthetic feasibility of millions of generated or enumerated compounds before potential bioactivity is identified by virtual screening (VS) workflows. Herein we report a machine learning (ML)-based method that classifies whether or not the CASP tool AiZynthFinder can identify a synthetic route for a particular compound. The resulting ML models return a retrosynthetic accessibility score (RAscore) for any molecule of interest and compute it 4,500 times faster than the retrosynthetic analysis performed by the underlying CASP tool. The RAscore should be useful for pre-screening millions of virtual molecules from enumerated databases or generative models for synthetic accessibility, producing higher-quality databases for virtual screening of biological activity.

Emerging Technologies Based on Artificial Intelligence to Assess the Quality and Consumer Preference of Beverages

Beverages ◽

10.3390/beverages5040062 ◽

2019 ◽

Vol 5 (4) ◽

pp. 62 ◽

Cited By ~ 10

Author(s):

Claudia Gonzalez Viejo ◽

Damir D. Torrico ◽

Frank R. Dunshea ◽

Sigfredo Fuentes

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Food Industry ◽

Emerging Technologies ◽

Low Cost ◽

Consumer Preference ◽

Beverage Industry ◽

Wide Range ◽

Alcoholic Drinks ◽

Different Levels

Beverages are a broad and important category within the food industry, comprising a wide range of sub-categories and types of drinks with different levels of complexity in their manufacturing and quality assessment. Traditional methods for evaluating the quality traits of beverages are tedious, time-consuming, and costly, and they do not allow researchers to obtain results in real time. Therefore, there is a need to test and implement emerging technologies to automate and facilitate these analyses within the industry. This paper aimed to present the most recent publications and trends regarding the use of low-cost, reliable, and accurate remote or non-contact techniques based on robotics, machine learning, computer vision, biometrics, and artificial intelligence, and to identify the research gaps within the beverage industry. It was found that there is a wide opportunity for the development and use of robotics and biometrics for all types of beverages, but especially for hot and non-alcoholic drinks. Furthermore, both industry and research lack knowledge and clarity about the concepts of artificial intelligence and machine learning, and about the correct design and interpretation of models, with problems including the omission of relevant data and the presentation of over- or under-fitted models.

Predicting Tree Sap Flux and Stomatal Conductance from Drone-Recorded Surface Temperatures in a Mixed Agroforestry System—A Machine Learning Approach

Remote Sensing ◽

10.3390/rs12244070 ◽

2020 ◽

Vol 12 (24) ◽

pp. 4070

Author(s):

Florian Ellsäßer ◽

Alexander Röll ◽

Joyson Ahongshangbam ◽

Pierre-André Waite ◽

Hendrayanto ◽

...

Keyword(s):

Machine Learning ◽

Stomatal Conductance ◽

Prediction Models ◽

Hydrological Cycle ◽

Machine Learning Algorithms ◽

Sap Flux ◽

Plant Transpiration ◽

Surface Temperatures ◽

Whole Plant ◽

Wide Range

Plant transpiration is a key element of the hydrological cycle. Widely used methods for assessing it include sap flux techniques for whole-plant transpiration and porometry for leaf stomatal conductance. Recently emerging approaches based on surface temperatures and a wide range of machine learning techniques offer new possibilities to quantify transpiration. The focus of this study was to predict sap flux and leaf stomatal conductance from drone-recorded and meteorological data and to compare these predictions with in-situ measured transpiration. To build the prediction models, we applied classical statistical approaches and machine learning algorithms. The field work was conducted in an oil palm agroforest in lowland Sumatra. Random forest predictions yielded the highest congruence with measured sap flux (r² = 0.87 for trees and r² = 0.58 for palms), and confidence intervals for the intercept and slope of a Passing-Bablok regression suggest that the methods are interchangeable. The results indicate differences in model performance across tree species. Predictions of stomatal conductance were less congruent for all prediction methods, likely due to spatial and temporal offsets between the measurements. Overall, the applied drone and modelling scheme predicts whole-plant transpiration with high accuracy. We conclude that machine learning approaches have large potential for ecological applications such as predicting transpiration.
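
The abstract judges agreement between predicted and measured sap flux with r² and with Passing-Bablok regression. A minimal sketch of both point estimates, assuming r² is the squared Pearson correlation between predictions and measurements (the paper may compute it differently). The confidence intervals used to judge interchangeability (slope interval containing 1, intercept interval containing 0) are omitted, and the function names are illustrative.

```python
import math
import statistics


def r_squared(pred, obs):
    """Squared Pearson correlation between predicted and observed values."""
    mp, mo = statistics.fmean(pred), statistics.fmean(obs)
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    vp = sum((p - mp) ** 2 for p in pred)
    vo = sum((o - mo) ** 2 for o in obs)
    return cov * cov / (vp * vo)


def passing_bablok(x, y):
    """Passing-Bablok point estimates (slope, intercept); confidence intervals omitted."""
    n = len(x)
    slopes = []
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[j] - x[i], y[j] - y[i]
            if dx == 0 and dy == 0:
                continue  # identical points carry no slope information
            s = dy / dx if dx != 0 else math.copysign(math.inf, dy)
            if s == -1:
                continue  # pairwise slopes of exactly -1 are excluded by the method
            slopes.append(s)
    slopes.sort()
    m = len(slopes)
    k = sum(1 for s in slopes if s < -1)  # offset: the median is shifted by k positions
    if m == 0 or k >= (m + 1) // 2:
        raise ValueError("Passing-Bablok assumes positively correlated measurements")
    if m % 2 == 1:
        b = slopes[(m + 1) // 2 + k - 1]  # shifted median, converted to 0-based index
    else:
        b = 0.5 * (slopes[m // 2 + k - 1] + slopes[m // 2 + k])
    a = statistics.median([yi - b * xi for xi, yi in zip(x, y)])
    return b, a


if __name__ == "__main__":
    measured = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # toy values, not the paper's data
    predicted = [2.0, 4.0, 6.0, 8.0, 10.0, 40.0]  # last point is a deliberate outlier
    print(passing_bablok(measured, predicted), r_squared(predicted, measured))
```

Because both estimates are medians, the single outlier in the toy data leaves the fitted slope at 2 and intercept at 0, which is why the method suits comparisons between two measurement approaches that both carry error.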

New Challenges Facing Integrative Biological Science in the Post-Genomic Era

Journal of Biological Systems ◽

10.1142/s0218339006001805 ◽

2006 ◽

Vol 14 (02) ◽

pp. 275-293 ◽

Cited By ~ 2

Author(s):

Christopher S. Oehmen ◽

Tjerk P. Straatsma ◽

Gordon A. Anderson ◽

Galya Orr ◽

Bobbie-Jo M. Webb-Robertson ◽

...

Keyword(s):

Paradigm Shift ◽

Large Scale ◽

Experimental Testing ◽

Spatial Scales ◽

Geographical Area ◽

Biological Data ◽

Biological Research ◽

Complex Data ◽

Discovery Research ◽

Wide Range

The future of biology will be increasingly driven by a fundamental paradigm shift from hypothesis-driven research to data-driven discovery research, which employs the growing volume of biological data coupled with experimental testing of new discoveries. However, hardware and software limitations in the current workflow infrastructure make it impossible or intractable to use real data from disparate sources for large-scale biological research. We identify key technological developments needed to enable this paradigm shift: (1) the ability to store and manage extremely large datasets dispersed over a wide geographical area, (2) novel analysis and visualization tools capable of operating on enormous data resources without overwhelming researchers with unusable information, and (3) formalisms for integrating mathematical models of biosystems from the molecular level to the organism population level. This will require algorithms and tools that efficiently utilize high-performance computing power and large storage infrastructures. The end result will be the ability of a researcher to integrate complex data from many different sources with simulations to analyze a given system across a wide range of temporal and spatial scales in a single conceptual model.

Interpretable machine learning in damage detection using Shapley Additive Explanations

10.31224/osf.io/96yf5 ◽

2021 ◽

Author(s):

Artur Movsessian ◽

David Garcia Cava ◽

Dmitri Tcherniak

Keyword(s):

Machine Learning ◽

Damage Detection ◽

Wind Turbine ◽

Prediction Models ◽

Turbine Blades ◽

Wind Turbine Blades ◽

Lumped Mass ◽

Damage Indices ◽

Wide Range ◽

The Difference

In recent years, machine learning (ML) techniques have gained popularity in structural health monitoring (SHM). They have been used particularly for damage detection in a wide range of engineering applications, such as wind turbine blades. The outcomes of previous research in this area have demonstrated the capabilities of ML for robust damage detection. However, the primary challenge facing ML in SHM is the lack of interpretability of the prediction models, which hinders broader implementation of these techniques. To address this, this study integrates the novel Shapley Additive exPlanations (SHAP) method into an ML-based damage detection process as a tool for introducing interpretability and, thus, building evidence for reliable decision-making in SHM applications. The SHAP method is based on coalitional game theory and adds global and local interpretability to ML-based models by computing the marginal contribution of each feature. These contributions are used to understand the nature of the damage indices (DIs). The applicability of the SHAP method is first demonstrated on a simple lumped mass-spring-damper system with simulated temperature variability. The SHAP method is then evaluated on data from an in-operation V27 wind turbine with artificially introduced damage in one of its blades. The results show the relationship between the environmental and operational variabilities (EOVs) and their direct influence on the damage indices. This ultimately helps distinguish false positives caused by EOVs from true positives resulting from damage to the structure.
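
The abstract describes SHAP as computing each feature's marginal contribution in the sense of coalitional game theory. A brute-force sketch of exact Shapley values, assuming a single baseline vector stands in for "absent" features; the shap library instead averages over background data and uses faster estimators such as KernelSHAP or TreeSHAP. The cost grows as 2**n, so this only suits a handful of inputs, such as a few damage indices, and the toy model below is hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at input x.

    phi_i = sum over subsets S of the other features of
            |S|! (n - |S| - 1)! / n! * (v(S with i) - v(S)),
    where v(S) evaluates f with features in S taken from x and all others from baseline.
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition keep their observed value; the rest are replaced
        # by the baseline (an assumption of this sketch, not SHAP's background averaging).
        z = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical damage indicator with one linear term and one interaction term."""
    return 2.0 * z[0] + z[1] * z[2]


if __name__ == "__main__":
    print(shapley_values(toy_model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0]))
```

For the toy model, the linear term contributes 2 to the first feature, and the interaction term's contribution of 6 is split equally between the other two, giving roughly [2, 3, 3]; the values sum to f(x) minus f(baseline), which is the additivity property SHAP relies on.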

Deep Learning Prediction of Adverse Drug Reactions in Drug Discovery Using Open TG–GATEs and FAERS Databases

Frontiers in Drug Discovery ◽

10.3389/fddsv.2021.768792 ◽

2021 ◽

Vol 1 ◽

Author(s):

Attayeb Mohsen ◽

Lokesh P. Tripathi ◽

Kenji Mizuguchi

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Adverse Drug Reactions ◽

Predictive Models ◽

Prediction Models ◽

Expression Profiles ◽

Fine Tuning ◽

Machine Learning Techniques ◽

Drug Reactions ◽

Wide Range

Machine learning techniques are increasingly used in the analysis of clinical and omics data, primarily due to advancements in artificial intelligence (AI) and the build-up of health-related big data. In this paper, we aimed to estimate the likelihood of adverse drug reactions or events (ADRs) during drug discovery using various machine learning methods. We also describe a novel machine learning-based framework for predicting the likelihood of ADRs. Our framework combines two distinct datasets, drug-induced gene expression profiles from Open TG–GATEs (Toxicogenomics Project–Genomics Assisted Toxicity Evaluation Systems) and ADR occurrence information from the FAERS (FDA [Food and Drug Administration] Adverse Events Reporting System) database, and can be applied to many different ADRs. It incorporates data filtering and cleaning, as well as feature selection and hyperparameter fine-tuning. Using this framework with deep neural networks (DNNs), we built a total of 14 predictive models with a mean validation accuracy of 89.4%, indicating that our approach successfully and consistently predicted ADRs for a wide range of drugs. As case studies, we investigated the performance of our prediction models in the context of duodenal ulcer and fulminant hepatitis, highlighting mechanistic insights into those ADRs. We have generated predictive models to help assess the likelihood of ADRs when testing novel pharmaceutical compounds. We believe that our findings offer a promising approach to ADR prediction and will be useful for researchers in drug discovery.

COVID-19 Outbreak Prediction with Machine Learning

10.35542/osf.io/pzhfj ◽

2020 ◽

Author(s):

Sina Faizollahzadeh Ardabili ◽

Amir Mosavi ◽

Pedram Ghamisi ◽

Filip Ferdinand ◽

Annamaria R. Varkonyi-Koczy ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Fuzzy Inference ◽

Control Measures ◽

Future Research ◽

Complex Nature ◽

Inference System ◽

Wide Range ◽

Standard Models ◽

High Level

Several outbreak prediction models for COVID-19 are being used by officials around the world to make informed decisions and enforce relevant control measures. Among the standard models for COVID-19 global pandemic prediction, simple epidemiological and statistical models have received more attention from authorities and are popular in the media. Due to a high level of uncertainty and a lack of essential data, standard models have shown low accuracy for long-term prediction. Although the literature includes several attempts to address this issue, the generalization and robustness of existing models need to be improved. This paper presents a comparative analysis of machine learning and soft computing models for predicting the COVID-19 outbreak as an alternative to SIR and SEIR models. Among the wide range of machine learning models investigated, two showed promising results: the multi-layered perceptron (MLP) and the adaptive network-based fuzzy inference system (ANFIS). Based on the results reported here, and given the highly complex nature of the COVID-19 outbreak and the variation in its behavior from nation to nation, this study suggests machine learning as an effective tool for modeling the outbreak. This paper provides an initial benchmark to demonstrate the potential of machine learning for future research. The paper further suggests that real novelty in outbreak prediction can be realized by integrating machine learning and SEIR models.
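
For context on the SEIR baseline that the paper proposes alternatives to, here is a minimal deterministic SEIR model integrated with forward Euler steps. It assumes a closed population and constant rates; the parameter values in the usage block are illustrative assumptions, not fitted to any country's data, and the paper's MLP and ANFIS models are not reproduced here.

```python
def seir(beta, sigma, gamma, s0, e0, i0, rec0, days, dt=0.1):
    """Deterministic SEIR model integrated with forward Euler steps.

    dS/dt = -beta*S*I/N
    dE/dt =  beta*S*I/N - sigma*E
    dI/dt =  sigma*E - gamma*I
    dR/dt =  gamma*I
    Returns one (S, E, I, R) tuple per day, starting with the initial state.
    """
    n = s0 + e0 + i0 + rec0  # closed population: no births, deaths, or migration
    s, e, i, r = float(s0), float(e0), float(i0), float(rec0)
    steps_per_day = max(1, round(1 / dt))
    h = 1.0 / steps_per_day  # actual step size, so each outer loop covers exactly one day
    daily = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * h  # S -> E (infection)
            new_infectious = sigma * e * h      # E -> I (end of latent period)
            new_recovered = gamma * i * h       # I -> R (recovery or removal)
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        daily.append((s, e, i, r))
    return daily


if __name__ == "__main__":
    # Illustrative parameters only: R0 = beta/gamma = 3, 5-day mean latent period,
    # 10-day mean infectious period, population of 10,000.
    trajectory = seir(beta=0.3, sigma=0.2, gamma=0.1, s0=9990, e0=0, i0=10, rec0=0, days=120)
    peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][2])
    print("peak infectious on day", peak_day)
```

The compartment flows cancel in pairs, so S + E + I + R stays equal to N up to rounding; fitting beta, sigma, and gamma to reported case counts is where the data scarcity noted in the abstract limits long-term accuracy.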