Hybridizing physical and data-driven prediction methods for physicochemical properties

2020 ◽  
Vol 56 (82) ◽  
pp. 12407-12410
Author(s):  
Fabian Jirasek ◽  
Robert Bamler ◽  
Stephan Mandt

We present a generic, highly effective approach to combine physical and data-driven prediction methods for physicochemical properties based on Bayesian machine learning and model distillation.

2021 ◽  
Author(s):  
Fabian Jirasek ◽  
Robert Bamler ◽  
Stephan Mandt

We present a generic way to hybridize physical and data-driven methods for predicting physicochemical properties. The approach ‘distills’ the physical method's predictions into a prior model and combines it with sparse experimental data using Bayesian inference. We apply the new approach to predict activity coefficients at infinite dilution and obtain significant improvements compared to the physical and data-driven baselines and established ensemble methods from the machine learning literature.
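The idea of combining a physical prediction with sparse measurements can be illustrated with a deliberately simplified sketch. It assumes a single scalar property, a Gaussian prior centered on the physical method's prediction, and Gaussian measurement noise with known variance. The function name and all numerical values are hypothetical and chosen only for illustration; this is not the model used in the paper.

```python
def gaussian_posterior(prior_mean, prior_var, observations, noise_var):
    """Conjugate Gaussian update for one scalar quantity.

    prior_mean, prior_var: prior belief (here: taken from a physical model).
    observations: list of noisy measurements of the same quantity.
    noise_var: assumed known variance of the measurement noise.
    Returns the posterior mean and variance.
    """
    n = len(observations)
    if n == 0:
        # Without data, the posterior equals the prior.
        return prior_mean, prior_var
    # Precisions (inverse variances) add up under Gaussian assumptions.
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    # Posterior mean is the precision-weighted average of prior and data.
    post_mean = post_var * (prior_mean / prior_var + sum(observations) / noise_var)
    return post_mean, post_var


# Hypothetical numbers: a physical model's prediction used as the prior mean,
# plus two made-up experimental values for the same quantity.
physical_prediction = 1.20
prior_variance = 0.25
measurements = [1.60, 1.50]
noise_variance = 0.05

mean, var = gaussian_posterior(
    physical_prediction, prior_variance, measurements, noise_variance
)
print(f"posterior mean = {mean:.4f}, posterior variance = {var:.4f}")
```

With no measurements the estimate falls back to the physical prediction, and each additional measurement pulls the estimate toward the data while shrinking the variance. This is the qualitative behavior one would expect from a prior built on a physical method.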


2019 ◽  
Vol 14 (3) ◽  
pp. 178-189 ◽  
Author(s):  
Xiaoyang Jing ◽  
Qimin Dong ◽  
Ruqian Lu ◽  
Qiwen Dong

Background: Protein inter-residue contact prediction plays an important role in protein structure and function research. As a low-dimensional representation of protein tertiary structure, inter-residue contacts can greatly help de novo protein structure prediction methods reduce the conformational search space. Over the past two decades, various methods have been developed for protein inter-residue contact prediction. Objective: We provide a comprehensive and systematic review of protein inter-residue contact prediction methods. Results: Protein inter-residue contact prediction methods are roughly classified into five categories: correlated mutation methods, machine-learning methods, fusion methods, template-based methods, and 3D model-based methods. In this paper, we first describe the common definition of protein inter-residue contacts and show their typical applications. We then present a comprehensive review of the three main categories: correlated mutation methods, machine-learning methods, and fusion methods. We also analyze the constraints of each category. Furthermore, we compare several representative methods on the CASP11 dataset and discuss their performance in detail. Conclusion: Correlated mutation methods achieve better performance for long-range contacts, while machine-learning methods perform well for short-range contacts. Fusion methods can take advantage of both machine-learning and correlated mutation methods. Employing more effective fusion strategies could further improve the performance of fusion methods.


2020 ◽  
Vol 15 (2) ◽  
pp. 121-134 ◽  
Author(s):  
Eunmi Kwon ◽  
Myeongji Cho ◽  
Hayeon Kim ◽  
Hyeon S. Son

Background: The host tropism determinants of influenza virus, which cause changes in the host range and increase the likelihood of interaction with specific hosts, are critical for understanding the infection and propagation of the virus in diverse host species. Methods: Six types of protein sequences of influenza viral strains isolated from three classes of hosts (avian, human, and swine) were obtained. Random forest, naïve Bayes classification, and k-nearest neighbor algorithms were used for host classification. The Java language was used for sequence analysis programming and for identifying host-specific position markers. Results: A machine learning technique was explored to derive the physicochemical properties of amino acids used in host classification and prediction. HA protein was found to play the most important role in determining host tropism of the influenza virus, and the random forest method yielded the highest accuracy in host prediction. Conserved amino acids that exhibited host-specific differences were also selected and verified, and they were found to be useful position markers for host classification. Finally, ANOVA and post-hoc testing revealed that the physicochemical properties of amino acids, comprising protein sequences combined with position markers, differed significantly among hosts. Conclusion: The host tropism determinants and position markers described in this study can be used in related research to classify, identify, and predict the hosts of influenza viruses that are currently susceptible or likely to be infected in the future.


Author(s):  
Ekaterina Kochmar ◽  
Dung Do Vu ◽  
Robert Belfer ◽  
Varun Gupta ◽  
Iulian Vlad Serban ◽  
...  

Intelligent tutoring systems (ITS) have been shown to be highly effective at promoting learning compared to other computer-based instructional approaches. However, many ITS rely heavily on expert design and hand-crafted rules. This makes them difficult to build and transfer across domains and limits their potential efficacy. In this paper, we investigate how feedback in a large-scale ITS can be automatically generated in a data-driven way, and more specifically how personalization of feedback can lead to improvements in student performance outcomes. First, we propose a machine learning approach to generate personalized feedback in an automated way, which takes the individual needs of students into account while reducing the need for expert intervention and the design of hand-crafted rules. We leverage state-of-the-art machine learning and natural language processing techniques to provide students with personalized feedback using hints and Wikipedia-based explanations. Second, we demonstrate that personalized feedback leads to improved success rates at solving exercises in practice: our personalized feedback model is used in , a large-scale dialogue-based ITS with around 20,000 students launched in 2019. We present the results of experiments with students and show that the automated, data-driven, personalized feedback leads to a significant overall improvement of 22.95% in student performance outcomes and substantial improvements in the subjective evaluation of the feedback.


Water ◽  
2021 ◽  
Vol 13 (9) ◽  
pp. 1208
Author(s):  
Massimiliano Bordoni ◽  
Fabrizio Inzaghi ◽  
Valerio Vivaldi ◽  
Roberto Valentino ◽  
Marco Bittelli ◽  
...  

Soil water potential is a key factor in studying water dynamics in soil and in estimating the occurrence of natural hazards such as landslides. This parameter can be measured in the field or estimated through physically-based models, which are limited by the availability of effective input soil properties and by the need for preliminary calibration. Data-driven models based on machine learning techniques could overcome these gaps. The aim of this paper is to develop an innovative machine learning methodology to assess soil water potential trends and to implement them in models to predict shallow landslides. Monitoring data collected since 2012 from test-site slopes in Oltrepò Pavese (northern Italy) were used to build the models. Among the tested techniques, Random Forest models allowed an outstanding reconstruction of measured soil water potential temporal trends. Each model is sensitive to meteorological and hydrological characteristics according to soil depths and features. The reliability of the proposed models was confirmed by the correct estimation of the days when shallow landslides were triggered in the study areas in December 2020, after implementing the modeled trends in a slope stability model, and by the correct choice of physically-based rainfall thresholds. These results confirm the potential application of the developed methodology to estimate hydrological scenarios that could be used for decision-making purposes.

