Understanding and predicting COVID-19 clinical trial completion vs. cessation

As of March 30, 2021, over 5,193 COVID-19 clinical trials had been registered through ClinicalTrials.gov. Among them, 191 trials were terminated, suspended, or withdrawn (indicating cessation of the study), while 909 trials had been completed (indicating completion of the study). In this study, we investigate the underlying factors of COVID-19 trial completion vs. cessation, and design predictive models to predict whether a COVID-19 trial may complete or cease in the future. We collect 4,441 COVID-19 trials from ClinicalTrials.gov to build a testbed, and design four types of features to characterize clinical trial administration, eligibility, study information, criteria, drug types, and study keywords, as well as embedding features commonly used in state-of-the-art machine learning. Our study shows that drug features and study keywords are the most informative features, but all four types of features are essential for accurate trial prediction. Using predictive models, our approach achieves more than 0.87 AUC (Area Under the Curve) and 0.81 balanced accuracy in predicting COVID-19 clinical trial completion vs. cessation. Our research shows that computational methods can deliver effective features for understanding the difference between completed and ceased COVID-19 trials. In addition, such models can predict COVID-19 trial status with satisfactory accuracy and help stakeholders better plan trials and minimize costs.
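
The two metrics reported above can be illustrated with a minimal sketch. The labels and scores below are made-up toy values, not data from the study, and the function names are hypothetical; the abstract does not say how the authors computed their metrics. Balanced accuracy is taken as the mean of sensitivity and specificity, and AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (assumed values): 1 = ceased trial, 0 = completed trial.
y_true = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 cut-off

print(balanced_accuracy(y_true, y_pred))
print(roc_auc(y_true, scores))
```

Balanced accuracy is a natural choice when one class is much rarer than the other, as with 191 ceased vs. 909 completed trials, because plain accuracy would reward always predicting the majority class.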

Predictive modeling of clinical trial terminations using feature engineering and embedding learning

Scientific Reports ◽

10.1038/s41598-021-82840-x ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Magdalyn E. Elkin ◽

Xingquan Zhu

Keyword(s):

Machine Learning ◽

Clinical Trial ◽

Clinical Trials ◽

Area Under The Curve ◽

Common Factors ◽

Information Criteria ◽

Feature Engineering ◽

Exclusion Criteria ◽

Direct Estimate ◽

Satisfactory Prediction

Abstract In this study, we propose to use machine learning to understand terminated clinical trials. Our goal is to answer two fundamental questions: (1) what are the common factors/markers associated with terminated clinical trials? and (2) how can we accurately predict whether a clinical trial may be terminated or not? The answer to the first question provides effective ways to understand characteristics of terminated trials so that stakeholders can better plan their trials; the answer to the second question can directly estimate the chance of success of a clinical trial in order to minimize costs. By using 311,260 trials to build a testbed with 68,999 samples, we use feature engineering to create 640 features, reflecting clinical trial administration, eligibility, study information, criteria, etc. Using feature ranking, a handful of features, such as trial eligibility, trial inclusion/exclusion criteria, and sponsor types, are found to be related to clinical trial termination. By using sampling and ensemble learning, we achieve over 67% Balanced Accuracy and over 0.73 AUC (Area Under the Curve) in predicting clinical trial termination, indicating that machine learning can help achieve satisfactory prediction results for clinical trial studies.

O-203 Application of machine learning to predict aneuploidy and mosaicism in embryos from in vitro fertilization (IVF) cycles

Human Reproduction ◽

10.1093/humrep/deab128.014 ◽

2021 ◽

Vol 36 (Supplement_1) ◽

Author(s):

J A Ortiz ◽

R Morales ◽

B Lledo ◽

E Garcia-Hernandez ◽

A Cascales ◽

...

Keyword(s):

Machine Learning ◽

Predictive Model ◽

Predictive Models ◽

Maternal Age ◽

The Other ◽

Predictor Variables ◽

Learning Models ◽

Male Factor ◽

Factors Associated ◽

Machine Learning Models

Abstract Study question: Is it possible to predict the likelihood of an IVF embryo being aneuploid and/or mosaic using a machine learning algorithm? Summary answer: There are paternal, maternal, embryonic and IVF-cycle factors that are associated with embryonic chromosomal status and that can be used as predictors in machine learning models. What is known already: The factors associated with embryonic aneuploidy have been extensively studied. Mostly maternal age and, to a lesser extent, male factor and ovarian stimulation have been related to the occurrence of chromosomal alterations in the embryo. On the other hand, the main factors that may increase the incidence of embryo mosaicism have not yet been established. The models obtained using classical statistical methods to predict embryonic aneuploidy and mosaicism are not highly reliable. As an alternative to traditional methods, different machine and deep learning algorithms are being used to generate predictive models in different areas of medicine, including human reproduction. Study design, size, duration: The study design is observational and retrospective. A total of 4654 embryos from 1558 PGT-A cycles were included (January 2017 to December 2020). The trophectoderm biopsies on D5, D6 or D7 blastocysts were analysed by NGS. Embryos with ≤25% aneuploid cells were considered euploid, those with 25-50% were classified as mosaic, and those with >50% as aneuploid. The variables of the PGT-A were recorded in a database from which predictive models of embryonic aneuploidy and mosaicism were developed. Participants/materials, setting, methods: The main indications for PGT-A were advanced maternal age, abnormal sperm FISH and recurrent miscarriage or implantation failure. Embryo analyses were performed using Veriseq-NGS (Illumina). The software used to carry out all the analyses was R (RStudio). The library used to implement the different algorithms was caret. In the machine learning models, 22 predictor variables were introduced, which can be classified into 4 categories: maternal, paternal, embryonic and those specific to the IVF cycle. Main results and the role of chance: The different couple, embryo and stimulation cycle variables were recorded in a database (22 predictor variables). Two different predictive models were built, one for aneuploidy and the other for mosaicism. The predictor variable was of multi-class type since it included the segmental and whole chromosome alteration categories. The data frame was first preprocessed and the different classes to be predicted were balanced. 80% of the data were used for training the model and 20% were reserved for further testing. The classification algorithms applied include multinomial regression, neural networks, support vector machines, neighborhood-based methods, classification trees, gradient boosting, ensemble methods, Bayesian and discriminant analysis-based methods. The algorithms were optimized by minimizing the Log_Loss, which measures accuracy while penalizing misclassifications. The best predictive models were achieved with the XG-Boost and random forest algorithms. The AUC of the predictive model for aneuploidy was 80.8% (Log_Loss: 1.028) and for mosaicism 84.1% (Log_Loss: 0.929). The best predictor variables of the models were maternal age, embryo quality, day of biopsy and whether or not the couple had a history of pregnancies with chromosomopathies. The male factor played a relevant role only in the mosaicism model, not in the aneuploidy model. Limitations, reasons for caution: Although the predictive models obtained can be very useful for estimating the probability of achieving euploid embryos in an IVF cycle, increasing the sample size and including additional variables could improve the models and thus increase their predictive capacity. Wider implications of the findings: Machine learning can be a very useful tool in reproductive medicine since it can allow the determination of factors associated with embryonic aneuploidies and mosaicism in order to establish a predictive model for both. Identifying couples at risk of embryo aneuploidy/mosaicism could allow them to benefit from the use of PGT-A. Trial registration number: Not Applicable
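
Two ingredients of the abstract above lend themselves to a short illustration: the thresholds used to label an embryo from its share of aneuploid cells, and the log loss that the models were tuned to minimize. The sketch below is written in Python purely for illustration (the study itself used R with caret); the function names and toy probabilities are assumptions, and the log loss shown is the standard multiclass definition, which may differ in detail from what caret computes.

```python
import math


def classify_embryo(aneuploid_fraction):
    """Label an embryo from the fraction of aneuploid cells in the biopsy.

    Thresholds follow the abstract: <=25% euploid, 25-50% mosaic, >50% aneuploid.
    """
    if aneuploid_fraction <= 0.25:
        return "euploid"
    if aneuploid_fraction <= 0.50:
        return "mosaic"
    return "aneuploid"


def log_loss(y_true, probabilities, eps=1e-15):
    """Average negative log-probability assigned to the true class.

    y_true holds class indices; probabilities holds one list of class
    probabilities per sample. Probabilities are clipped to avoid log(0).
    """
    total = 0.0
    for label, probs in zip(y_true, probabilities):
        p = min(max(probs[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)


# Toy example (assumed values): three samples, three classes.
labels = [0, 2, 1]
predicted = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.5, 0.3]]
print(classify_embryo(0.40))
print(log_loss(labels, predicted))
```

Because log loss grows quickly when a model assigns low probability to the true class, confident wrong predictions are penalized more heavily than hesitant ones.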

Technological innovation and the future of predictive model of pandemics (Preprint)

10.2196/preprints.28566 ◽

2021 ◽

Author(s):

Xavier Dupont

Keyword(s):

Machine Learning ◽

Technological Innovation ◽

Predictive Models ◽

State Of The Art ◽

Field Research ◽

Sir Model ◽

Rapid Progress ◽

Compartmental Modelling ◽

Research Interview ◽

Modelling Techniques

BACKGROUND: As of October 2020, the COVID-19 death toll has reached over one million, with 38 million confirmed cases globally. This pandemic is shaking the foundations of economies and reminding us of the fragility of our system. Epidemics have affected societies since biblical times, but the recent acceleration in science and technology, as well as global cooperation, has given scientists and mathematicians new resources that they can use to anticipate, through mathematical modelling, how a pandemic will spread. Compartmental modelling techniques, such as the SIR model, have been well established for more than a century and have proven efficient and reliable in helping governments decide what strategies to use to fight pandemics. OBJECTIVE: State-of-the-art report on predictive models and technology. METHODS: Field research, interview. RESULTS: More recently, digitalisation and rapid progress in fields such as Machine Learning, IoT and big data have brought new perspectives to predictive models that improve their ability to predict how a pandemic will unfold and therefore which actions should be taken to eradicate the disease. This report will first review how pandemic modelling works. CONCLUSIONS: It will then discuss the benefits and limitations of those models before outlining how new initiatives in several fields of technology are being used to fight the virus that causes COVID-19.
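
As a rough illustration of the compartmental modelling mentioned above, the sketch below integrates the classic SIR equations (dS/dt = -βSI/N, dI/dt = βSI/N - γI, dR/dt = γI) with simple forward Euler steps. The function name, the parameter values and the step size are assumptions chosen for illustration; they are not taken from the report.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Integrate the SIR equations with forward Euler steps.

    beta is the transmission rate and gamma the recovery rate (per day).
    Returns a list of (S, I, R) tuples, one per time step, starting at t = 0.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    trajectory = [(s, i, r)]
    steps = int(round(days / dt))
    for _ in range(steps):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


# Hypothetical outbreak: 1000 people, 1 initial case, beta = 0.3, gamma = 0.1.
trajectory = simulate_sir(1000, 1, beta=0.3, gamma=0.1, days=160)
peak_infected = max(i for _, i, _ in trajectory)
print(round(peak_infected, 1))
```

The ratio beta / gamma plays the role of the basic reproduction number in this model; when it exceeds 1, the number of infected people initially grows.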

Data science in economics: comprehensive review of advanced machine learning and deep learning methods

10.21203/rs.3.rs-91905/v1 ◽

2020 ◽

Author(s):

Saeed Nosratabadi ◽

Amir Mosavi ◽

Puhong Duan ◽

Pedram Ghamisi ◽

Filip Ferdinand ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Prediction Accuracy ◽

Data Science ◽

State Of The Art ◽

Hybrid Models ◽

The Other ◽

Learning Models ◽

Comprehensive Review

Abstract This paper provides the state of the art of data science in economics. Through a novel taxonomy of applications and methods, advances in data science are investigated. These advances are investigated in three individual classes: deep learning models, ensemble models, and hybrid models. Application domains include stock market, marketing, e-commerce, corporate banking, and cryptocurrency. The PRISMA method, a systematic literature review methodology, is used to ensure the quality of the survey. The findings reveal a trend toward hybrid models, as more than 51% of the reviewed articles applied a hybrid model. In addition, based on the RMSE accuracy metric, hybrid models had higher prediction accuracy than other algorithms, although the trend is expected to move toward deep learning models.
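
For readers unfamiliar with the RMSE metric used in the comparison above, here is a minimal sketch. The function name and the toy numbers are assumptions for illustration only and do not come from the reviewed articles.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: sqrt of the mean squared difference."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy example (assumed values): two hypothetical forecasts of the same series.
actual = [10.0, 12.0, 11.0, 13.0]
model_a = [10.5, 11.5, 11.0, 12.0]
model_b = [9.0, 13.5, 10.0, 15.0]
print(rmse(actual, model_a))
print(rmse(actual, model_b))
```

A lower RMSE means the predictions sit closer to the observed values, so "higher prediction accuracy" in the abstract corresponds to a smaller RMSE.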

Data Security in Clinical Trials Using Blockchain Technology

Political and Economic Implications of Blockchain Technology in Business and Healthcare - Advances in Data Mining and Database Management ◽

10.4018/978-1-7998-7363-1.ch010 ◽

2021 ◽

pp. 250-268

Author(s):

Marta de-Melo-Diogo ◽

Jorge Tavares ◽

Ângelo Nunes Luís

Keyword(s):

Clinical Trial ◽

Clinical Trials ◽

Literature Review ◽

Data Security ◽

Critical Issue ◽

The Other ◽

Clinical Trial Setting ◽

Blockchain Technology ◽

Current Utilization ◽

Trial Setting

Blockchain technology in a clinical trial setting is a valuable asset due to its decentralization, immutability, transparency, and traceability features. For this chapter, a literature review was conducted to map the current utilization of blockchain systems in clinical trials, particularly data security management systems and their characteristics, such as applicability, interests of use, limitations, and issues. The data security advantages are a more transparent and tamper-proof clinical trial that provides accurate, validated data and is therefore more reliable and credible. On the other hand, data integrity is a critical issue since data obtained from trials are not instantly made public to all participants. Work needs to be done to establish the significant implications for data security when applying blockchain technology in a real-world clinical trial setting, and the generalized conditions of use needed to establish its security.
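
The immutability property mentioned above rests on hash chaining: each block stores the hash of the previous block, so altering an earlier record invalidates every later link. The sketch below shows only that core idea, without consensus or networking; the record fields and function names are hypothetical and are not drawn from any system reviewed in the chapter.

```python
import hashlib
import json


def block_hash(index, record, prev_hash):
    """SHA-256 over a canonical JSON encoding of the block contents."""
    payload = json.dumps(
        {"index": index, "record": record, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records):
    """Link records so that each block commits to the one before it."""
    chain = []
    prev_hash = "0" * 64  # placeholder hash for the first block
    for index, record in enumerate(records):
        current = block_hash(index, record, prev_hash)
        chain.append(
            {"index": index, "record": record, "prev_hash": prev_hash, "hash": current}
        )
        prev_hash = current
    return chain


def is_valid(chain):
    """Recompute every hash and check that the links are unbroken."""
    prev_hash = "0" * 64
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["record"], block["prev_hash"]):
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical trial records.
chain = build_chain([{"patient": "P001", "visit": 1}, {"patient": "P001", "visit": 2}])
print(is_valid(chain))
chain[0]["record"]["visit"] = 99  # tamper with an earlier record
print(is_valid(chain))
```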

Disconnecting structure and dynamics in glassy thin films

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1703927114 ◽

2017 ◽

Vol 114 (40) ◽

pp. 10601-10605 ◽

Cited By ~ 25

Author(s):

Daniel M. Sussman ◽

Samuel S. Schoenholz ◽

Ekin D. Cubuk ◽

Andrea J. Liu

Keyword(s):

Thin Film ◽

Machine Learning ◽

Thin Films ◽

Predictive Models ◽

Local Structure ◽

The Other ◽

Microscopic Structure ◽

Glassy Dynamics ◽

Structure And Dynamics ◽

Machine Learning Methods

Nanometrically thin glassy films depart strikingly from the behavior of their bulk counterparts. We investigate whether the dynamical differences between a bulk and thin film polymeric glass former can be understood by differences in local microscopic structure. Machine learning methods have shown that local structure can serve as the foundation for successful, predictive models of particle rearrangement dynamics in bulk systems. By contrast, in thin glassy films, we find that particles at the center of the film and those near the surface are structurally indistinguishable despite exhibiting very different dynamics. Next, we show that structure-independent processes, already present in bulk systems and demonstrably different from simple facilitated dynamics, are crucial for understanding glassy dynamics in thin films. Our analysis suggests a picture of glassy dynamics in which two dynamical processes coexist, with relative strengths that depend on the distance from an interface. One of these processes depends on local structure and is unchanged throughout most of the film, while the other is purely Arrhenius, does not depend on local structure, and is strongly enhanced near the free surface of a film.
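
For context on the "purely Arrhenius" process mentioned above: an Arrhenius rate has the form k = A exp(-E / (k_B T)), so its logarithm is linear in 1/T. The sketch below evaluates that formula in reduced units (k_B = 1); the function name and the parameter values are illustrative assumptions and are not taken from the paper.

```python
import math


def arrhenius_rate(prefactor, activation_energy, temperature, k_b=1.0):
    """Rate k = A * exp(-E / (k_B * T)) for a thermally activated process."""
    return prefactor * math.exp(-activation_energy / (k_b * temperature))


# Hypothetical values in reduced units: the rate rises steeply with temperature.
for temperature in (0.4, 0.5, 0.6):
    print(temperature, arrhenius_rate(1.0, 2.0, temperature))
```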

Predicting the potency of anti-Alzheimer drug combinations using machine learning

10.1101/2020.04.28.066340 ◽

2020 ◽

Author(s):

Thomas J Anastasio

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Clinical Trials ◽

Antihypertensive Drugs ◽

Demographic Variables ◽

Drug Combinations ◽

The Other ◽

Lipid Lowering ◽

Strongly Correlated ◽

Repurposed Drugs

ABSTRACT BACKGROUND: Clinical trials of single drugs for the treatment of Alzheimer Disease (AD) have been notoriously unsuccessful. Combinations of repurposed drugs could provide effective treatments for AD. The challenge is to identify potentially potent combinations. OBJECTIVE: To use machine learning (ML) to extract the knowledge from two leading AD databases, and then use the machine to predict which combinations of the drugs in common between the two databases would be the most effective as treatments for AD. METHODS: Three-layered neural networks (NNs) having compound, gated units in their internal layer were trained using ML to predict the cognitive scores of participants in either database, given the other data fields including age, demographic variables, comorbidities, and drugs taken. RESULTS: The predictions from the separately trained NNs were strongly correlated. The best drug combinations, jointly determined from both sets of predictions, were high in NSAID, anticoagulant, lipid-lowering, and antihypertensive drugs, and female hormones. CONCLUSION: The results suggest that AD, as a multifactorial disorder, could be effectively treated using a combination of repurposed drugs.

Data Science in Economics: Comprehensive Review of Advanced Machine Learning and Deep Learning Methods

10.20944/preprints202010.0263.v1 ◽

2020 ◽

Author(s):

Saeed Nosratabadi ◽

Amir Mosavi ◽

Puhong Duan ◽

Pedram Ghamisi ◽

Filip Ferdinand ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Prediction Accuracy ◽

Data Science ◽

State Of The Art ◽

Hybrid Models ◽

The Other ◽

Learning Models ◽

Comprehensive Review

This paper provides the state of the art of data science in economics. Through a novel taxonomy of applications and methods, advances in data science are investigated. These advances are investigated in three individual classes: deep learning models, ensemble models, and hybrid models. Application domains include stock market, marketing, e-commerce, corporate banking, and cryptocurrency. The PRISMA method, a systematic literature review methodology, is used to ensure the quality of the survey. The findings reveal a trend toward hybrid models, as more than 51% of the reviewed articles applied a hybrid model. In addition, based on the RMSE accuracy metric, hybrid models had higher prediction accuracy than other algorithms, although the trend is expected to move toward deep learning models.

A Systematic Methodology to Evaluate Prediction Models for Driving Style Classification

Sensors ◽

10.3390/s20061692 ◽

2020 ◽

Vol 20 (6) ◽

pp. 1692 ◽

Cited By ~ 6

Author(s):

Iván Silva ◽

José Eugenio Naranjo

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Performance Metrics ◽

Prediction Models ◽

Statistical Tests ◽

Area Under The Curve ◽

The Other ◽

Support Vector ◽

Classification Models ◽

K Nearest Neighbor

Identifying driving styles using classification models with in-vehicle data can provide automated feedback to drivers on their driving behavior, particularly on whether they are driving safely. Although several classification models have been developed for this purpose, there is no consensus on which classifier performs best at identifying driving styles. Therefore, more research is needed to evaluate classification models by comparing performance metrics. In this paper, a data-driven machine-learning methodology for classifying driving styles is introduced. This methodology is grounded in well-established machine-learning (ML) methods and literature related to driving-styles research. The methodology is illustrated through a study involving data collected from 50 drivers from two different cities in a naturalistic setting. Five features were extracted from the raw data. Fifteen experts were involved in the data labeling to derive the ground truth of the dataset. The dataset was fed to five different models: Support Vector Machines (SVM), Artificial Neural Networks (ANN), fuzzy logic, k-Nearest Neighbor (kNN), and Random Forests (RF). These models were evaluated in terms of a set of performance metrics and statistical tests. The experimental results from performance metrics showed that SVM outperformed the other four models, achieving an average accuracy of 0.96, F1-Score of 0.9595, Area Under the Curve (AUC) of 0.9730, and Kappa of 0.9375. In addition, Wilcoxon tests indicated that ANN predicts differently from the other four models. These promising results demonstrate that the proposed methodology may support researchers in making informed decisions about which ML model performs best for driving-styles classification.
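
Three of the performance metrics named above (accuracy, F1-score and Cohen's kappa) can be sketched in a few lines. The class labels and predictions below are invented toy values, and the function names are hypothetical; the paper's own evaluation pipeline is not described in the abstract.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return 2 * tp / (2 * tp + fp + fn)


def cohen_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance."""
    n = len(y_true)
    observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Toy example (assumed labels for two driving styles).
y_true = ["calm", "calm", "aggressive", "aggressive"]
y_pred = ["calm", "calm", "aggressive", "calm"]
print(accuracy(y_true, y_pred))
print(f1_score(y_true, y_pred, positive="aggressive"))
print(cohen_kappa(y_true, y_pred))
```

Kappa is lower than accuracy on the same predictions because it subtracts the agreement that would be expected from the label frequencies alone.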

Review—Computational Methods for Internal Flows With Emphasis on Turbomachinery

Journal of Fluids Engineering ◽

10.1115/1.3242443 ◽

1985 ◽

Vol 107 (1) ◽

pp. 6-22 ◽

Cited By ~ 21

Author(s):

William D. McNally ◽

Peter M. Sockol

Keyword(s):

Stream Function ◽

Computational Methods ◽

State Of The Art ◽

The State ◽

The Other ◽

Internal Flows ◽

Other Hand

A review is given of current computational methods for analyzing flows in turbomachinery and other related internal propulsion components. The methods are divided primarily into two classes, inviscid and viscous. The inviscid methods deal specifically with turbomachinery applications. Viscous methods, on the other hand, due to the state-of-the-art, deal with generalized duct flows as well as flows in turbomachinery passages. Inviscid methods are categorized into the potential, stream function, and Euler approaches. Viscous methods are treated in terms of parabolic, partially parabolic, and elliptic procedures.
