predictive approaches
Recently Published Documents


TOTAL DOCUMENTS

116
(FIVE YEARS 52)

H-INDEX

18
(FIVE YEARS 4)

2022 ◽  
Author(s):  
Alexandre Perez-Lebel ◽  
Gaël Varoquaux ◽  
Marine Le Morvan ◽  
Julie Josse ◽  
Jean-Baptiste Poline

BACKGROUND As databases grow larger, it becomes harder to fully control their collection, and they frequently come with missing values: incomplete observations. These large databases are well suited to train machine-learning models, for instance for forecasting or to extract biomarkers in biomedical settings. Such predictive approaches can use discriminative --rather than generative-- modeling, and thus open the door to new missing-values strategies. Yet existing empirical evaluations of strategies to handle missing values have focused on inferential statistics. RESULTS Here we conduct a systematic benchmark of missing-values strategies in predictive models with a focus on large health databases: four electronic health record datasets, a population brain imaging one, a health survey and two intensive care ones. Using gradient-boosted trees, we compare native support for missing values with simple and state-of-the-art imputation prior to learning. We investigate prediction accuracy and computational time. For prediction after imputation, we find that adding an indicator to express which values have been imputed is important, suggesting that the data are missing not at random. Elaborate missing values imputation can improve prediction compared to simple strategies but requires longer computational time on large data. Learning trees that model missing values --with missing incorporated attribute-- leads to robust, fast, and well-performing predictive modeling. CONCLUSIONS Native support for missing values in supervised machine learning predicts better than state-of-the-art imputation with much less computational cost. When using imputation, it is important to add indicator columns expressing which values have been imputed.
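The imputation-plus-indicator strategy described in this abstract can be illustrated with a minimal sketch. It assumes mean imputation as the "simple" strategy and uses a hypothetical helper, `impute_with_indicator`, which is not taken from the paper; the toy data are invented for illustration.

```python
import math


def impute_with_indicator(rows):
    """Mean-impute each column and append one 0/1 indicator per column.

    `rows` is a list of equal-length lists of floats, with NaN marking
    a missing value. Hypothetical helper, for illustration only.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        # Assumption: fall back to 0.0 if a column is entirely missing.
        means.append(sum(observed) / len(observed) if observed else 0.0)

    out = []
    for r in rows:
        # Replace each missing entry with its column mean.
        filled = [means[j] if math.isnan(v) else v for j, v in enumerate(r)]
        # Record which entries were imputed, so a downstream model can
        # still use the missingness pattern itself as a feature.
        flags = [1.0 if math.isnan(v) else 0.0 for v in r]
        out.append(filled + flags)
    return out


# Toy data (invented): two features, three observations.
X = [[1.0, float("nan")], [3.0, 4.0], [float("nan"), 8.0]]
X_imputed = impute_with_indicator(X)
```

Each output row holds the imputed features followed by the indicator columns, so the feature count doubles. A learner with native missing-value support would instead receive `X` unchanged.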


2021 ◽  
Author(s):  
Gary William Gunter ◽  
Mohamed Yacine Sahar ◽  
David F. Allen ◽  
Eduardo Jose Viro ◽  
Shahin Negabahn ◽  
...  

Abstract This paper discusses integrating common methods and applications for "Rock Typing" (also known as Petrophysical Rock Typing, PRT) including empirical, deterministic, statistical, probabilistic and automatic/predictive approaches. Many industry asset teams apply one or more of these methods when creating static reservoir models, using dynamic reservoir simulations, completing petrophysical studies for saturation height models and determining reservoir volumetrics as part of reservoir characterization studies. Our intention is to provide guidance and important information on how and when to use the various methods, so people can make an informed selection. This discussion is important as many disciplines apply these PRT techniques without understanding the pros, cons and limitations of the different methods. An important tool is comparing PRT results from multiple methods. The topics and workflows that are covered focus on various PRT techniques and workflows. We will use case studies to illustrate the key features and make important comparisons. Key results include comparing pros and cons, how to use and combine multiple PRT techniques and verify results. This paper includes these techniques and workflows: MICP, core analysis and pore throat calibration; Core-Log Integration focused on PRT analysis; Winland, Pittman, Aguilera and Hartmann et al. Gameboard methods; K-Phi ratio, Flow Zone Indicators and Rock Quality Index methods; Classic, Modified and Stratigraphic Lorenz methods; IPSOM and HRA Probabilistic methods; Case Study: Super Plot and Advanced Automatic PRT Method; Special Topics: Carbonate Methods, NMR and Single Well Vertical Line. Practical approaches based on case studies show how PRT analysis can be applied in mature fields to identify by-passed hydrocarbon zones and zones that have a high probability of producing water using open hole, cased hole and production logs. 
Traditional Rock Typing (PRT) analysis can be applied as a single well technique or as a multi-well method so operations teams can identify additional business opportunities (remedial workovers, infill drilling locations or exploitation targets) and compare reservoir performance with intrinsic rock properties. New applications and additional topics cover single, multiple well approaches and new emerging PRT techniques (including NMR well logs and machine learning). We recommend how to merge classic facies with PRT analysis for 3-D applications including populating a 3D volume.


PLoS ONE ◽  
2021 ◽  
Vol 16 (12) ◽  
pp. e0260632
Author(s):  
Fatima-Zahra Jaouimaa ◽  
Daniel Dempsey ◽  
Suzanne Van Osch ◽  
Stephen Kinsella ◽  
Kevin Burke ◽  
...  

Strategies adopted globally to mitigate the threat of COVID–19 have primarily involved lockdown measures with substantial economic and social costs with varying degrees of success. Morbidity patterns of COVID–19 variants have a strong association with age, while restrictive lockdown measures have association with negative mental health outcomes in some age groups. Reduced economic prospects may also afflict some age cohorts more than others. Motivated by this, we propose a model to describe COVID–19 community spread incorporating the role of age-specific social interactions. Through a flexible parameterisation of an age-structured deterministic Susceptible Exposed Infectious Removed (SEIR) model, we provide a means for characterising different forms of lockdown which may impact specific age groups differently. Social interactions are represented through age group to age group contact matrices, which can be trained using available data and are thus locally adapted. This framework is easy to interpret and suitable for describing counterfactual scenarios, which could assist policy makers with regard to minimising morbidity balanced with the costs of prospective suppression strategies. Our work originates from an Irish context and we use disease monitoring data from February 29th 2020 to January 31st 2021 gathered by Irish governmental agencies. We demonstrate how Irish lockdown scenarios can be constructed using the proposed model formulation and show results of retrospective fitting to incidence rates and forward planning with relevant “what if / instead of” lockdown counterfactuals. Uncertainty quantification for the predictive approaches is described. Our formulation is agnostic to a specific locale, in that lockdown strategies in other regions can be straightforwardly encoded using this model.
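A minimal sketch of an age-structured deterministic SEIR model with a contact matrix is given below, to illustrate the kind of formulation this abstract describes. It is not the authors' model: the forward-Euler integration, the two age groups, every parameter value, and the encoding of a lockdown as a uniform scaling of the contact matrix are assumptions made for illustration.

```python
def seir_step(S, E, I, R, contact, beta, sigma, gamma, dt):
    """One forward-Euler step of an age-structured SEIR model.

    S, E, I, R are lists of compartment sizes per age group;
    contact[a][b] is the contact rate of group a with group b.
    """
    n = len(S)
    N = [S[a] + E[a] + I[a] + R[a] for a in range(n)]
    nS, nE, nI, nR = [], [], [], []
    for a in range(n):
        # Force of infection on group a, summed over all groups b.
        lam = beta * sum(contact[a][b] * I[b] / N[b] for b in range(n))
        infections = lam * S[a] * dt   # S -> E
        onsets = sigma * E[a] * dt     # E -> I
        removals = gamma * I[a] * dt   # I -> R
        nS.append(S[a] - infections)
        nE.append(E[a] + infections - onsets)
        nI.append(I[a] + onsets - removals)
        nR.append(R[a] + removals)
    return nS, nE, nI, nR


def simulate(contact, days=100, dt=0.1):
    """Run the model from one initial infection in the first group."""
    # All values below are invented for illustration.
    S, E, I, R = [999.0, 1000.0], [0.0, 0.0], [1.0, 0.0], [0.0, 0.0]
    beta, sigma, gamma = 0.05, 1 / 5, 1 / 7
    for _ in range(int(days / dt)):
        S, E, I, R = seir_step(S, E, I, R, contact, beta, sigma, gamma, dt)
    return S, E, I, R


# Hypothetical contact matrix for two age groups.
baseline = [[10.0, 3.0], [3.0, 5.0]]
# Assumed lockdown scenario: all contacts reduced to 30% of baseline.
lockdown = [[0.3 * c for c in row] for row in baseline]

S_base, E_base, I_base, R_base = simulate(baseline)
S_lock, E_lock, I_lock, R_lock = simulate(lockdown)
```

A lockdown that affects age groups differently would scale individual entries of the contact matrix rather than the whole matrix, and comparing the two runs gives a simple counterfactual.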


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Ahmed Ali ◽  
Ahmed Fathalla ◽  
Ahmad Salah ◽  
Mahmoud Bekhit ◽  
Esraa Eldesouky

Ocean observation technology continues to progress, resulting in a huge increase in marine data volume and dimensionality. This volume of data provides a golden opportunity to train predictive models, since more data generally yields better predictive models. Predicting marine data such as sea surface temperature (SST) and significant wave height (SWH) is a vital task in a variety of disciplines, including marine activities, deep-sea, and marine biodiversity monitoring. Previous efforts to forecast such marine data can be classified into three classes: machine learning, deep learning, and statistical predictive models. To the best of the authors’ knowledge, no study has compared the performance of these three approaches on a real dataset. This paper focuses on the prediction of two critical marine features: the SST and SWH. In this work, we implement statistical, deep learning, and machine learning models for predicting the SST and SWH on a real dataset obtained from the Korea Hydrographic and Oceanographic Agency. We then compare these three predictive approaches on four different evaluation metrics. Experimental results reveal that the deep learning model slightly outperformed the machine learning models in overall performance, and both of these approaches greatly outperformed the statistical predictive model.


2021 ◽  
Vol 13 (20) ◽  
pp. 11450
Author(s):  
Liping Ge ◽  
Malek Sarhani ◽  
Stefan Voß ◽  
Lin Xie

Public transport has become one of the major transport options, especially when it comes to reducing motorized individual transport and achieving sustainability while reducing emissions, noise and so on. The use of public transport data has evolved and rapidly improved over the past decades. Indeed, the availability of data from different sources, coupled with advances in analytical and predictive approaches, has contributed to increased attention being paid to the exploitation of available data to improve public transport service. In this paper, we review the current state of the art of public transport data sources. More precisely, we summarize and analyze the potential and challenges of the main data sources. In addition, we show the complementary aspects of these data sources and how to merge them to broaden their contributions and face their challenges. This is complemented by an information management framework to enhance the use of data sources. Specifically, we seek to bridge the gap between traditional data sources and recent ones, present a unified overview of them and show how they can all leverage recent advances in data-driven methods and how they can help achieve a balance between transit service and passenger behavior.


2021 ◽  
Author(s):  
Amanda Kowalczyk ◽  
Omotola Gbadamosi ◽  
Kathryn Kolor ◽  
Jahree Sosa ◽  
Livia Andrzejczuk ◽  
...  

Recent advances in genome sequencing have led to the identification of new ion and metabolite transporters, many of which have not been characterized. Due to the variety of subcellular localizations, cargo and transport mechanisms, such characterization is a daunting task, and predictive approaches focused on the functional context of transporters are very much needed. Here we present a case for identifying a transporter localization using evolutionary rate covariation (ERC), a computational approach based on pairwise correlations of amino acid sequence evolutionary rates across the mammalian phylogeny. As a case study, we find that poorly characterized transporter SLC30A9 (ZnT9) coevolves with several components of the mitochondrial oxidative phosphorylation chain, suggesting mitochondrial localization. We confirmed this computational finding experimentally using recombinant human SLC30A9. SLC30A9 loss caused zinc mishandling in the mitochondria, suggesting that under normal conditions it acts as a zinc exporter. We therefore propose that ERC can be used to predict the functional context of novel transporters and other poorly characterized proteins.


2021 ◽  
Vol 13 (15) ◽  
pp. 8194
Author(s):  
Imen Rahmouni ◽  
Geoffrey Promis ◽  
Omar Douzane ◽  
Frédéric Rosquoet

The suitability of replacing mineral aggregate with carbon-negative ones mainly depends on the properties of the aggregates produced from waste recycling, reducing CO2 emissions. This study aimed to investigate the predictive approaches adapted to concrete mixtures where mineral aggregates are replaced by carbonated aggregates (at different substitution rates from 25 to 100% with aggregates of various origins). A large experimental campaign of aggregates and carbonated aggregate concretes highlighted their physical, mechanical, thermal and hygric properties and the influence of density and porosity of aggregates on these properties. Thanks to these results, predictive approaches were formulated to establish the main engineering properties: mechanical compressive strength, elasticity modulus, thermal conductivity, thermal mass capacity and hygric diffusivity. These empirical and analytical models were based on the density of aggregates. Maximum deviations of around 15% were obtained with the experimental data, highlighting the influence of grain density on carbonated aggregate concretes. These models could then be used to optimize the formulation of concrete mixtures with carbonated aggregates, replacing international standards adapted to mineral aggregates.


2021 ◽  
Vol 12 ◽  
Author(s):  
Johannes W. R. Martini ◽  
Terence L. Molnar ◽  
José Crossa ◽  
Sarah J. Hearne ◽  
Kevin V. Pixley

Author(s):  
Hirra Hussain ◽  
Edward A McKenzie ◽  
Andrew M Robinson ◽  
Neill A Gingles ◽  
Fiona Marston ◽  
...  

Abstract Bacterial expression systems remain a widely used host for recombinant protein production. However, overexpression of recombinant target proteins in bacterial systems such as Escherichia coli can result in poor solubility and the formation of insoluble aggregates. As a consequence, numerous strategies or alternative engineering approaches have been employed to increase recombinant protein production. In this case study, we present the strategies used to increase the recombinant production and solubility of ‘difficult-to-express’ bacterial antigens, termed Ant2 and Ant3, from Absynth Biologics Ltd.’s Clostridium difficile vaccine programme. Single recombinant antigens (Ant2 and Ant3) and fusion proteins (Ant2-3 and Ant3-2) formed insoluble aggregates (inclusion bodies) when overexpressed in bacterial cells. Further, proteolytic cleavage of Ant2-3 was observed. Optimisation of culture conditions and changes to the construct design to include N-terminal solubility tags did not improve antigen solubility. However, screening of different buffer/additives showed that the addition of 1–15 mM dithiothreitol alone decreased the formation of insoluble aggregates and improved the stability of both Ant2 and Ant3. Structural models were generated for Ant2 and Ant3, and solubility-based prediction tools were employed to determine the role of hydrophobicity and charge on protein production. The results showed that a large non-polar region (containing hydrophobic amino acids) was detected on the surface of Ant2 structures, whereas positively charged regions (containing lysine and arginine amino acids) were observed for Ant3, both of which were associated with poor protein solubility. 
We present a guide of strategies and predictive approaches that aim to guide the construct design, prior to expression studies, to define and engineer sequences/structures that could lead to increased expression and stability of single and potentially multi-domain (or fusion) antigens in bacterial expression systems.

