Technological innovation and the future of predictive models of pandemics (Preprint)

2021 ◽  
Author(s):  
Xavier Dupont

BACKGROUND As of October 2020, the COVID-19 death toll had passed one million, with 38 million confirmed cases globally. This pandemic is shaking the foundations of economies and reminding us of the fragility of our systems. Epidemics have affected societies since biblical times, but the recent acceleration of science and technology, together with global cooperation, has given scientists and mathematicians new resources they can use to anticipate how a pandemic will spread through mathematical modelling. Compartmental modelling techniques, such as the SIR model, have been well established for more than a century and have proven efficient and reliable in helping governments decide which strategies to use to fight pandemics. OBJECTIVE A state-of-the-art report on predictive models and technology. METHODS Field research and interviews. RESULTS More recently, digitalisation and rapid progress in fields such as Machine Learning, IoT and big data have brought new perspectives to predictive models, improving their ability to predict how a pandemic will unfold and, therefore, which actions should be taken to eradicate the disease. This report first reviews how pandemic modelling works. CONCLUSIONS It then discusses the benefits and limitations of those models before outlining how new initiatives in several fields of technology are being used to fight the virus that causes COVID-19.
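
As a concrete illustration of the compartmental approach described above, the following is a minimal sketch of the classic SIR model integrated with SciPy. The parameter values (population size, transmission rate beta, recovery rate gamma) are illustrative assumptions, not figures from the preprint.

```python
# Minimal SIR compartmental model. beta (transmission rate) and gamma
# (recovery rate) are illustrative values, not estimates from the preprint.
import numpy as np
from scipy.integrate import odeint

def sir(y, t, beta, gamma):
    S, I, R = y
    N = S + I + R
    dS = -beta * S * I / N               # new infections leave S
    dI = beta * S * I / N - gamma * I    # infections enter I, recoveries leave
    dR = gamma * I                       # recoveries accumulate in R
    return dS, dI, dR

N = 1_000_000                            # assumed population size
y0 = (N - 1, 1, 0)                       # a single initial case
t = np.linspace(0, 180, 181)             # simulate 180 days
beta, gamma = 0.3, 0.1                   # basic reproduction number R0 = 3

S, I, R = odeint(sir, y0, t, args=(beta, gamma)).T
print(f"Peak infections: {I.max():,.0f} on day {t[I.argmax()]:.0f}")
```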

PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0253789
Author(s):  
Magdalyn E. Elkin ◽  
Xingquan Zhu

As of March 30, 2021, over 5,193 COVID-19 clinical trials had been registered through ClinicalTrials.gov. Among them, 191 trials were terminated, suspended, or withdrawn (indicating cessation of the study), while 909 trials had been completed. In this study, we examine the underlying factors of COVID-19 trial completion vs. cessation, and design predictive models to accurately predict whether a COVID-19 trial will complete or cease in the future. We collect 4,441 COVID-19 trials from ClinicalTrials.gov to build a testbed, and design four types of features to characterize clinical trial administration, eligibility, study information, criteria, drug types, and study keywords, as well as embedding features commonly used in state-of-the-art machine learning. Our study shows that drug features and study keywords are the most informative features, but all four types of features are essential for accurate trial prediction. Our predictive models achieve more than a 0.87 AUC (Area Under the Curve) score and 0.81 balanced accuracy in correctly predicting COVID-19 clinical trial completion vs. cessation. Our research shows that computational methods can deliver effective features for understanding the difference between completed and ceased COVID-19 trials. Such models can also predict COVID-19 trial status with satisfactory accuracy, helping stakeholders better plan trials and minimize costs.
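
As a hedged sketch of the prediction setup this abstract describes, the snippet below trains a gradient-boosted classifier and reports the same metrics (AUC and balanced accuracy). The feature matrix is a random placeholder for the four engineered feature types; it is not the authors' pipeline or data.

```python
# Sketch of a completion-vs-cessation trial classifier. X stands in for
# the engineered features (administration, eligibility, drugs, keywords,
# embeddings); the metrics mirror the abstract, not the authors' code.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, balanced_accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(4441, 50))          # placeholder feature matrix
y = rng.integers(0, 2, size=4441)        # 1 = completed, 0 = ceased

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
print("Balanced accuracy:", balanced_accuracy_score(y_te, clf.predict(X_te)))
```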


Foods ◽  
2021 ◽  
Vol 10 (1) ◽  
pp. 82
Author(s):  
Otilia Carvalho ◽  
Maria N. Charalambides ◽  
Ilija Djekić ◽  
Christos Athanassiou ◽  
Serafim Bakalis ◽  
...  

In recent years, modelling techniques have been adopted more frequently in the field of food processing, especially for cereal-based products, which are among the most consumed foods in the world. Predictive models and simulations make it possible to explore new approaches and optimize processes, potentially helping companies reduce costs and limit carbon emissions. Nevertheless, as the different phases of the food processing chain are highly specialized, advances in modelling are often unknown outside a single domain, and models rarely take more than one step into account. This paper introduces the first high-level overview of modelling techniques employed in different parts of the cereal supply chain: from farming to storage, from drying to milling, from processing to consumption. This review, arising from a networking project including researchers from over 30 different countries, presents the current state of the art in each domain, shows common trends and synergies, and suggests promising avenues for future research.


2022 ◽  
Author(s):  
Alexandre Perez-Lebel ◽  
Gaël Varoquaux ◽  
Marine Le Morvan ◽  
Julie Josse ◽  
Jean-Baptiste Poline

BACKGROUND As databases grow larger, it becomes harder to fully control their collection, and they frequently come with missing values: incomplete observations. These large databases are well suited to training machine-learning models, for instance for forecasting or for extracting biomarkers in biomedical settings. Such predictive approaches can use discriminative, rather than generative, modeling, and thus open the door to new missing-values strategies. Yet existing empirical evaluations of strategies to handle missing values have focused on inferential statistics. RESULTS Here we conduct a systematic benchmark of missing-values strategies in predictive models, with a focus on large health databases: four electronic health record datasets, a population brain-imaging dataset, a health survey and two intensive care datasets. Using gradient-boosted trees, we compare native support for missing values with simple and state-of-the-art imputation prior to learning, investigating both prediction accuracy and computational time. For prediction after imputation, we find that adding an indicator to express which values have been imputed is important, suggesting that the data are missing not at random. Elaborate missing-value imputation can improve prediction over simple strategies, but requires longer computation on large data. Learning trees that model missing values, through the missing incorporated in attribute (MIA) approach, leads to robust, fast, and well-performing predictive modeling. CONCLUSIONS Native support for missing values in supervised machine learning predicts better than state-of-the-art imputation at a much lower computational cost. When using imputation, it is important to add indicator columns expressing which values have been imputed.
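
The comparison at the heart of this benchmark can be sketched with scikit-learn, whose HistGradientBoostingClassifier handles NaNs natively, while SimpleImputer(add_indicator=True) implements imputation with indicator columns. The synthetic data below is a stand-in for the health databases used in the paper.

```python
# Native missing-value support vs. imputation-with-indicator.
# Synthetic data stands in for the paper's health databases.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
y = (X[:, 0] + rng.normal(scale=0.5, size=2000) > 0).astype(int)
X[rng.random(X.shape) < 0.3] = np.nan    # make 30% of values missing

# 1) Trees with native missing-value handling (MIA-style splits).
native = HistGradientBoostingClassifier(random_state=0)

# 2) Mean imputation plus indicator columns, then the same learner.
imputed = make_pipeline(
    SimpleImputer(strategy="mean", add_indicator=True),
    HistGradientBoostingClassifier(random_state=0),
)

for name, model in [("native", native), ("impute+indicator", imputed)]:
    print(name, cross_val_score(model, X, y, scoring="roc_auc").mean())
```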


2021 ◽  
Author(s):  
Norberto Sánchez-Cruz ◽  
Jose L. Medina-Franco

Epigenetic targets are a significant focus of drug discovery research, as demonstrated by the eight epigenetic drugs approved for the treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. These data represent a large body of structure-activity relationships that has not yet been exploited for the development of predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26,318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different designs, we built predictive models with high accuracy for the epigenetic target profiling of small molecules. The models were thoroughly validated, showing mean precisions up to 0.952 for the epigenetic target prediction task. Our results indicate that the models reported herein have considerable potential to identify small molecules with epigenetic activity. Our results have therefore been implemented as a freely accessible and easy-to-use web application.
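
A minimal sketch of the fingerprint-plus-classifier setup this abstract describes, using RDKit Morgan fingerprints and a random forest. The SMILES strings and activity labels are toy illustrations; the study itself benchmarks several fingerprint designs across 55 epigenetic targets, and its web application is not reproduced here.

```python
# Fingerprint-based activity prediction sketch (RDKit + scikit-learn).
# SMILES and labels are illustrative only, not data from the study.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

smiles = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O", "CCN(CC)CC"]
labels = [0, 1, 1, 0]                      # toy activity labels

def fingerprint(smi, n_bits=2048):
    """Morgan (ECFP4-like) fingerprint as a dense numpy bit vector."""
    mol = Chem.MolFromSmiles(smi)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)
    return np.array(list(fp))

X = np.stack([fingerprint(s) for s in smiles])
clf = RandomForestClassifier(random_state=0).fit(X, labels)
print(clf.predict_proba(X[:1]))            # activity probability for "CCO"
```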


2020 ◽  
Author(s):  
Saeed Nosratabadi ◽  
Amir Mosavi ◽  
Puhong Duan ◽  
Pedram Ghamisi ◽  
Ferdinand Filip ◽  
...  

This paper provides a state-of-the-art investigation of advances in data science in emerging economic applications. The analysis covers novel data science methods in four classes: deep learning models, hybrid deep learning models, hybrid machine learning models, and ensemble models. Application domains include a wide and diverse range of economics research, from the stock market, marketing, and e-commerce to corporate banking and cryptocurrency. The PRISMA method, a systematic literature review methodology, was used to ensure the quality of the survey. The findings reveal that the trends follow the advancement of hybrid models, which, on the accuracy metric, outperform other learning algorithms. It is further expected that the trends will converge toward the advancement of sophisticated hybrid deep learning models.


2020 ◽  
Author(s):  
Pathikkumar Patel ◽  
Bhargav Lad ◽  
Jinan Fiaidhi

During the last few years, RNN models have been used extensively and have proven well suited to sequence and text data. RNNs have achieved state-of-the-art performance in several applications such as text classification, sequence-to-sequence modelling and time series forecasting. In this article we review different Machine Learning and Deep Learning based approaches for text data and look at the results obtained with these methods. This work also explores the use of transfer learning in NLP and how it affects the performance of models on the specific application of sentiment analysis.
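
As a small illustration of the RNN family reviewed here, below is a minimal PyTorch LSTM classifier for binary sentiment analysis. The vocabulary size, embedding and hidden dimensions are arbitrary assumptions, and tokenisation is left out.

```python
# Minimal LSTM sentiment classifier in PyTorch; sizes are arbitrary.
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)      # positive / negative

    def forward(self, token_ids):                 # (batch, seq_len)
        x = self.embed(token_ids)
        _, (h_n, _) = self.lstm(x)                # final hidden state
        return self.head(h_n[-1])                 # (batch, 2) logits

model = LSTMClassifier()
logits = model(torch.randint(0, 10_000, (4, 32)))  # 4 dummy sequences
print(logits.shape)                                # torch.Size([4, 2])
```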


2018 ◽  
Vol 7 (4) ◽  
pp. 603-622 ◽  
Author(s):  
Leonardo Gutiérrez-Gómez ◽  
Jean-Charles Delvenne

Several social, medical, engineering and biological challenges rely on discovering the functionality of networks from their structure and node metadata, when the latter is available. For example, in chemoinformatics one might want to detect whether a molecule is toxic based on its structure and atomic types, or discover the research field of a scientific collaboration network. Existing techniques rely on counting or measuring structural patterns that are known to show large variations from network to network, such as the number of triangles or the assortativity of node metadata. We introduce the concept of multi-hop assortativity, which captures the similarity of the nodes situated at the extremities of a randomly selected path of a given length. We show that multi-hop assortativity unifies various existing concepts and offers a versatile family of ‘fingerprints’ to characterize networks. These fingerprints in turn allow the functionality of a network to be recovered with the help of the machine learning toolbox. Our method is evaluated empirically on established social and chemoinformatic network benchmarks. The results reveal that our assortativity-based features are competitive, providing highly accurate results that often outperform state-of-the-art methods for the network classification task.
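
One simplified reading of multi-hop assortativity can be sketched as the correlation between a numeric node attribute at the two ends of random walks of length h; this is an illustrative estimator, not the authors' exact formulation. Sweeping h yields the 'fingerprint' vector that can feed a standard classifier.

```python
# Simplified multi-hop assortativity estimator: correlate a node
# attribute at the endpoints of random walks of length h. Illustrative
# only; the paper's exact estimator may differ.
import random
import numpy as np
import networkx as nx

def multi_hop_assortativity(G, attr, h, n_walks=10_000, seed=0):
    rng = random.Random(seed)
    nodes = list(G.nodes)
    starts, ends = [], []
    for _ in range(n_walks):
        u = v = rng.choice(nodes)
        for _ in range(h):                   # take h random steps
            nbrs = list(G.neighbors(v))
            if not nbrs:
                break
            v = rng.choice(nbrs)
        starts.append(G.nodes[u][attr])
        ends.append(G.nodes[v][attr])
    return np.corrcoef(starts, ends)[0, 1]

G = nx.karate_club_graph()
nx.set_node_attributes(G, dict(G.degree), "deg")  # degree as toy metadata
for h in (1, 2, 3):                               # fingerprint over hops
    print(h, round(multi_hop_assortativity(G, "deg", h), 3))
```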


2021 ◽  
Vol 188 ◽  
pp. 105264
Author(s):  
M. Pilar Romero ◽  
Yu-Mei Chang ◽  
Lucy A. Brunton ◽  
Alison Prosser ◽  
Paul Upton ◽  
...  

Energies ◽  
2021 ◽  
Vol 14 (16) ◽  
pp. 4776
Author(s):  
Seyed Mahdi Miraftabzadeh ◽  
Michela Longo ◽  
Federica Foiadelli ◽  
Marco Pasetti ◽  
Raul Igual

The recent advances in computing technologies and the increasing availability of large amounts of data in smart grids and smart cities are generating new research opportunities in the application of Machine Learning (ML) to improving the observability and efficiency of modern power grids. However, as the number and diversity of ML techniques increase, questions arise about their performance and applicability, and about which ML method is most suitable for a given application. To answer these questions, this manuscript presents a systematic review of the state-of-the-art studies implementing ML techniques in the context of power systems, with a specific focus on the analysis of power flows, power quality, photovoltaic systems, intelligent transportation, and load forecasting. For each of the selected topics, the survey investigates the most recent and promising ML techniques proposed in the literature, highlighting their main characteristics and relevant results. The review revealed that, compared to traditional approaches, ML algorithms can handle massive quantities of high-dimensional data and identify hidden characteristics of even complex systems. In particular, even though very different techniques can be used for each application, hybrid models generally show better performance than single ML-based models.
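
As one hedged example from the load-forecasting topic surveyed above, the sketch below builds lagged features over a synthetic hourly load series and fits gradient-boosted trees; the series and the lag choices are illustrative assumptions, not a method from the review.

```python
# Short-term load forecasting sketch: lagged features + boosted trees.
# The sinusoidal hourly "load" series is synthetic.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_absolute_percentage_error

hours = np.arange(24 * 365)
load = (100 + 20 * np.sin(2 * np.pi * hours / 24)
        + np.random.default_rng(0).normal(scale=3, size=hours.size))

lags = [1, 2, 24, 168]                        # hour, day and week history
X = np.column_stack([np.roll(load, l) for l in lags])[max(lags):]
y = load[max(lags):]

split = -24 * 30                              # hold out the last 30 days
model = HistGradientBoostingRegressor(random_state=0)
model.fit(X[:split], y[:split])
pred = model.predict(X[split:])
print("MAPE:", mean_absolute_percentage_error(y[split:], pred))
```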


2021 ◽  
Vol 16 (1) ◽  
pp. 1-23
Author(s):  
Min-Ling Zhang ◽  
Jun-Peng Fang ◽  
Yi-Bo Wang

In multi-label classification, the task is to induce predictive models that can assign a set of relevant labels to an unseen instance. The strategy of label-specific features has been widely employed in learning from multi-label examples, where the classification model predicting the relevancy of each class label is induced from features tailored to that label rather than the original features. Existing approaches work by generating a group of tailored features for each class label independently, so label correlations are not fully considered in the label-specific features generation process. In this article, we extend the existing strategy by proposing a simple yet effective approach based on BiLabel-specific features. Specifically, a group of tailored features is generated for a pair of class labels with heuristic prototype selection and embedding. Thereafter, the predictions of classifiers induced by BiLabel-specific features are ensembled to determine the relevancy of each class label for the unseen instance. To thoroughly evaluate the BiLabel-specific features strategy, extensive experiments are conducted over a total of 35 benchmark datasets. Comparative studies against state-of-the-art label-specific features techniques clearly validate the superiority of BiLabel-specific features in yielding stronger generalization performance for multi-label classification.
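
A toy reduction of the BiLabel-specific features idea can be sketched as follows: for each pair of labels, select prototypes per joint-assignment group with k-means, re-represent instances by their distances to those prototypes, train per-label classifiers on the tailored features, and ensemble the votes. This is an illustrative simplification, not the authors' algorithm.

```python
# Toy sketch of BiLabel-specific features: k-means prototypes per
# joint label-pair group, distance features, ensembled pairwise votes.
from itertools import combinations
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import pairwise_distances

def fit_bilabel(X, Y, n_proto=2, seed=0):
    models = {}
    for j, k in combinations(range(Y.shape[1]), 2):
        protos = []
        for a in (0, 1):                       # joint assignments of (j, k)
            for b in (0, 1):
                grp = X[(Y[:, j] == a) & (Y[:, k] == b)]
                if len(grp) >= n_proto:        # heuristic prototype selection
                    km = KMeans(n_clusters=n_proto, n_init=10,
                                random_state=seed).fit(grp)
                    protos.append(km.cluster_centers_)
        P = np.vstack(protos)
        Z = pairwise_distances(X, P)           # BiLabel-specific features
        clf_j = LogisticRegression(max_iter=1000).fit(Z, Y[:, j])
        clf_k = LogisticRegression(max_iter=1000).fit(Z, Y[:, k])
        models[(j, k)] = (P, clf_j, clf_k)
    return models

def predict_bilabel(models, X, n_labels):
    votes, counts = np.zeros((len(X), n_labels)), np.zeros(n_labels)
    for (j, k), (P, clf_j, clf_k) in models.items():
        Z = pairwise_distances(X, P)
        votes[:, j] += clf_j.predict_proba(Z)[:, 1]; counts[j] += 1
        votes[:, k] += clf_k.predict_proba(Z)[:, 1]; counts[k] += 1
    return (votes / counts) > 0.5              # ensembled label relevancy

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
Y = (X[:, :3] + rng.normal(scale=0.5, size=(200, 3)) > 0).astype(int)
print(predict_bilabel(fit_bilabel(X, Y), X[:5], Y.shape[1]))
```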

