Ensemble Learning Approach with LASSO for Predicting Catalytic Reaction Rates

Synlett ◽  
2020 ◽  
Author(s):  
Akira Yada ◽  
Kazuhiko Sato ◽  
Tarojiro Matsumura ◽  
Yasunobu Ando ◽  
Kenji Nagata ◽  
...  

Abstract The prediction of the initial reaction rate in the tungsten-catalyzed epoxidation of alkenes by using a machine learning approach is demonstrated. The ensemble learning framework used in this study consists of random sampling with replacement from the training dataset, the construction of several predictive models (weak learners), and the combination of their outputs. This approach enables us to obtain a reasonable prediction model that avoids the problem of overfitting, even when analyzing a small dataset.
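The bootstrap-and-combine scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions: the weak learners are taken to be LASSO linear regressors (as the title suggests), fitted by plain coordinate descent, and their outputs are combined by simple averaging. The regularization strength `lam`, the number of models and all function names are hypothetical choices, not the authors' settings.

```python
import random

def soft_threshold(rho, lam):
    """Soft-thresholding operator used by LASSO coordinate descent."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_fit(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - b0 - Xw||^2 + lam*||w||_1 (intercept unpenalised)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b0 = 0.0
    for _ in range(n_iter):
        # Intercept update: mean residual when the intercept is left out.
        b0 = sum(y[i] - sum(w[k] * X[i][k] for k in range(p)) for i in range(n)) / n
        for j in range(p):
            rho = 0.0  # (1/n) * sum_i x_ij * (partial residual excluding feature j)
            z = 0.0    # (1/n) * sum_i x_ij^2
            for i in range(n):
                partial = y[i] - b0 - sum(w[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * partial
                z += X[i][j] ** 2
            rho, z = rho / n, z / n
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return b0, w

def lasso_predict(model, x):
    b0, w = model
    return b0 + sum(wj * xj for wj, xj in zip(w, x))

def bagged_lasso(X, y, lam, n_models=20, seed=0):
    """Fit one LASSO weak learner per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(lasso_fit([X[i] for i in idx], [y[i] for i in idx], lam))
    return models

def ensemble_predict(models, x):
    """Combine the weak learners by averaging their predictions."""
    return sum(lasso_predict(m, x) for m in models) / len(models)
```

Averaging many models fitted to different resamples reduces the variance of any single fit, which is the usual argument for why bagging helps on small datasets; the L1 penalty additionally drives irrelevant descriptor weights to zero.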

Catalysts ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 474
Author(s):  
Jan-Paul Grass ◽  
Katharina Klühspies ◽  
Bastian Reiprich ◽  
Wilhelm Schwieger ◽  
Alexandra Inayat

This study is dedicated to the comparative investigation of the catalytic activity of layer-like Faujasite-type (FAU) zeolite X obtained from three different synthesis routes (additive-free route, Li2CO3 route, and TPOAC route) in a liquid-phase Knoevenagel condensation of benzaldehyde and ethyl cyanoacetate to ethyl trans-α-cyanocinnamate. It is shown that the charge-balancing cations (Na+ and K+) and the morphological properties have a strong influence on the apparent reaction rate and degree of conversion. The highest initial reaction rate could be found for the layer-like zeolite X synthesised by the additive-free route in the potassium form. In most cases, the potassium-exchanged zeolites enabled higher maximum conversions and higher reaction rates compared to the zeolite X catalysts in sodium form. However, very thin crystal plates (below 100 nm thickness), similar to those obtained in the presence of TPOAC, did not withstand the multiple aqueous ion exchange procedure, with the remaining coarse crystals facilitating less enhancement of the catalytic activity.
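The comparison above is based on initial reaction rates, but the abstract does not state how they were extracted from the kinetic data. One common approach, shown here only as a hedged sketch, is a least-squares slope of product concentration against time over the first few sampling points. The function name, the choice of `n_points` and the example data are hypothetical.

```python
def initial_rate(times_min, conversions, c0_mol_per_l, n_points=3):
    """Least-squares slope of product concentration vs. time over the first n_points.

    Product concentration is approximated as conversion * initial substrate
    concentration, so the result is in mol L^-1 min^-1.
    """
    if n_points < 2:
        raise ValueError("need at least two points to fit a slope")
    t = times_min[:n_points]
    c = [c0_mol_per_l * x for x in conversions[:n_points]]
    t_mean = sum(t) / len(t)
    c_mean = sum(c) / len(c)
    num = sum((ti - t_mean) * (ci - c_mean) for ti, ci in zip(t, c))
    den = sum((ti - t_mean) ** 2 for ti in t)
    return num / den
```

Restricting the fit to the earliest points keeps it in the region where conversion still grows roughly linearly with time, before substrate depletion or deactivation bends the curve.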


2015 ◽  
Vol 2015 ◽  
pp. 1-8
Author(s):  
Gang Zhang ◽  
Yonghui Huang ◽  
Ling Zhong ◽  
Shanxing Ou ◽  
Yi Zhang ◽  
...  

Objective. This study aims to establish a model to analyze the clinical experience of veteran traditional Chinese medicine (TCM) doctors. We propose an ensemble learning based framework to analyze clinical records with ICD-10 label information for effective diagnosis and acupoint recommendation. Methods. We propose an ensemble learning framework for the analysis task. A set of base learners composed of decision trees (DT) and support vector machines (SVM) is trained by bootstrapping the training dataset. The base learners are sorted by accuracy and diversity through a nondominated sort (NDS) algorithm and combined through a deep ensemble learning strategy. Results. We evaluate the proposed method against two currently successful methods on a clinical diagnosis dataset with manually labeled ICD-10 information. ICD-10 label annotation and acupoint recommendation are evaluated for all three methods. The proposed method achieves an accuracy rate of 88.2% ± 2.8% measured by zero-one loss in the first evaluation session and 79.6% ± 3.6% measured by Hamming loss, both superior to the other two methods. Conclusion. The proposed ensemble model can effectively capture the implicit knowledge and experience in historical clinical records. The computational cost of training a set of base learners is relatively low.
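Two computational ingredients named above, ranking base learners by nondominated sorting on (accuracy, diversity) and the zero-one and Hamming losses, can be sketched as follows. This is a minimal illustration under assumptions: how the authors measured diversity and how the resulting fronts feed their deep ensemble strategy are not described here, and the scores in the usage check are hypothetical. An accuracy rate "measured by" a loss is presumably 1 minus that loss.

```python
def dominates(a, b):
    """True if score tuple a is >= b in every objective and > in at least one (maximisation)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def nondominated_sort(points):
    """Split indices of points into Pareto fronts, best front first."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts

def zero_one_loss(y_true, y_pred):
    """Fraction of samples whose predicted label set differs from the true set."""
    return sum(set(t) != set(p) for t, p in zip(y_true, y_pred)) / len(y_true)

def hamming_loss(y_true, y_pred, labels):
    """Fraction of (sample, label) pairs that are predicted wrongly."""
    errors = sum(len(set(t) ^ set(p)) for t, p in zip(y_true, y_pred))
    return errors / (len(y_true) * len(labels))
```

Zero-one loss counts a multi-label prediction as wrong unless every label is right, while Hamming loss gives partial credit per label, so the two figures reported above are not directly comparable.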


2021 ◽  
Author(s):  
Changming Zhao ◽  
Dongrui Wu ◽  
Jian Huang ◽  
Ye Yuan ◽  
Hai-Tao Zhang ◽  
...  

Abstract Bootstrap aggregating (Bagging) and boosting are two popular ensemble learning approaches, which combine multiple base learners to generate a composite model for more accurate and more reliable performance. They have been widely used in biology, engineering, healthcare, etc. This article proposes BoostForest, which is an ensemble learning approach using BoostTree as base learners and can be used for both classification and regression. BoostTree constructs a tree model by gradient boosting. It achieves high randomness (diversity) by sampling its parameters randomly from a parameter pool, and selecting a subset of features randomly at node splitting. BoostForest further increases the randomness by bootstrapping the training data in constructing different BoostTrees. BoostForest outperformed four classical ensemble learning approaches (Random Forest, Extra-Trees, XGBoost and LightGBM) on 34 classification and regression datasets. Remarkably, BoostForest has only one hyper-parameter (the number of BoostTrees), which can be easily specified. Our code is publicly available, and the proposed ensemble learning framework can also be used to combine many other base learners.
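The combination of bagging with boosting described above can be illustrated with a deliberately simplified sketch. It does not reproduce BoostTree. As an assumption, each base learner here is a gradient-boosted ensemble of regression stumps under squared loss, whose learning rate and number of rounds are drawn from a hypothetical parameter pool and which considers a random subset of features at each split. Extra diversity then comes from fitting each base learner on a bootstrap resample, and the forest averages their predictions.

```python
import random

def fit_stump(X, r, features):
    """Least-squares regression stump (one split) on residuals r over the given features."""
    n = len(X)
    best = None
    for j in features:
        values = sorted(set(row[j] for row in X))
        for a, b in zip(values, values[1:]):
            thr = (a + b) / 2
            left = [r[i] for i in range(n) if X[i][j] <= thr]
            right = [r[i] for i in range(n) if X[i][j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    if best is None:  # every sampled feature is constant: fall back to a constant stump
        m = sum(r) / n
        return (features[0], float("inf"), m, m)
    return best[1:]

def stump_predict(stump, x):
    j, thr, lm, rm = stump
    return lm if x[j] <= thr else rm

def fit_boosted(X, y, rng, param_pool):
    """Gradient boosting of stumps; hyperparameters sampled from a (hypothetical) pool."""
    lr = rng.choice(param_pool["learning_rate"])
    n_rounds = rng.choice(param_pool["n_rounds"])
    p = len(X[0])
    k = max(1, int(round(p ** 0.5)))  # size of the random feature subset per split
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared loss
        stump = fit_stump(X, residuals, rng.sample(range(p), k))
        stumps.append(stump)
        pred = [pi + lr * stump_predict(stump, x) for pi, x in zip(pred, X)]
    return base, lr, stumps

def boosted_predict(model, x):
    base, lr, stumps = model
    return base + lr * sum(stump_predict(s, x) for s in stumps)

def fit_forest(X, y, n_trees, param_pool, seed=0):
    """Bag boosted learners: each one is fitted on its own bootstrap resample."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_boosted([X[i] for i in idx], [y[i] for i in idx], rng, param_pool))
    return models

def forest_predict(models, x):
    return sum(boosted_predict(m, x) for m in models) / len(models)
```

The design mirrors the abstract's idea that randomness enters at three levels (sampled hyperparameters, random feature subsets, bootstrapped data), but the base learner and pool here are stand-ins chosen for brevity.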


2021 ◽  
Author(s):  
Urmi Ghosh ◽  
Tuhin Chakraborty

Rapid technological improvements in in-situ analysis techniques, including LA-ICPMS, have transformed the field of analytical geochemistry. This has had a far-reaching impact on petrogenetic and ore-genetic studies, where minute changes in major and trace element composition between different mineral zones within a single crystal can now be demarcated. Minerals such as garnet, although robust, are quite sensitive to the changing P-T and fluid conditions during their formation, and have become powerful tools for characterizing mineralization types. Meinert (1992) used in-situ major element EPMA results to classify different skarn deposits based on the end-member composition of hydrothermal garnets. Alternatively, Tian et al. (2019) used garnet trace element composition for a similar purpose. However, these discrimination plots and classification schemes show major overlap between different skarn deposits, such as Fe, Cu, Zn, and Au. The present study is an attempt to apply a machine learning approach to available garnet data to find a more potent classification scheme for skarn deposits, thus reaffirming garnet as a faithful indicator for hydrothermal ore deposits. We have meticulously collected major and trace element data of Ca-rich garnets associated with different skarn deposits worldwide from 40 publications. These data were then used to train a model for fingerprinting the skarn deposits. A stratified random sampling method was applied to the dataset, with 80% of the samples as the test set and the remaining 20% as the training set. We used K-nearest neighbour (KNN), Support Vector Machine (SVM) and Random Forest algorithms, implemented in Python. These ML classification algorithms perform better than the earlier models available for classifying ore types based on garnet composition in skarn systems.
Factor importance was calculated to show which elements play a pivotal role in classifying the ore type. Our results show that multiple garnet-forming elements taken together can reliably be used to discriminate between different ore formation settings.
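A stratified split followed by KNN classification, as described above, might look like the sketch below. It assumes numeric feature vectors (for example, element concentrations, which in practice would usually be standardized first because they span very different ranges) and hypothetical deposit labels. `test_fraction` is left as a parameter, and the points in the usage check are invented.

```python
import math
import random
from collections import Counter, defaultdict

def stratified_split(X, y, test_fraction, seed=0):
    """Shuffle each class separately so train and test keep the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        idx = idx[:]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx])

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training samples closest in Euclidean distance."""
    dists = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

Stratifying matters when some deposit types are represented by far fewer garnet analyses than others; a purely random split could leave a rare class out of one subset entirely.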


2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Absalom E. Ezugwu ◽  
Ibrahim Abaker Targio Hashem ◽  
Olaide N. Oyelade ◽  
Mubarak Almutari ◽  
Mohammed A. Al-Garadi ◽  
...  

COVID-19 continues to spread worldwide despite multidimensional efforts to curtail it and provide treatment. Efforts to contain the COVID-19 pandemic have triggered partial or full lockdowns across the globe. This paper presents a novel framework that intelligently combines machine learning models and Internet of Things (IoT) technology specifically to combat COVID-19 in smart cities. The purpose of the study is to promote the interoperability of machine learning algorithms with IoT technology, which interacts with a population and its environment, to curtail the COVID-19 pandemic. Furthermore, the study investigates and discusses solution frameworks that can generate, capture, store, and analyze data using machine learning algorithms. These algorithms can detect, prevent, and trace the spread of COVID-19 and provide a better understanding of the disease in smart cities. Similarly, the study outlines case studies on the application of machine learning to help fight COVID-19 in hospitals worldwide. The framework proposed in the study is a comprehensive presentation of the major components needed to integrate the machine learning approach with other AI-based solutions. Finally, the machine learning framework presented in this study has the potential to help national healthcare systems curtail the COVID-19 pandemic in smart cities. In addition, the proposed framework is intended as a pointer for generating research interest that could yield outcomes capable of being integrated to form an improved framework.


BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Jinchan Qu ◽  
Albert Steppi ◽  
Dongrui Zhong ◽  
Jie Hao ◽  
Jian Wang ◽  
...  

Abstract Background: Information on protein-protein interactions affected by mutations is very useful for understanding the biological effect of mutations and for developing treatments targeting the interactions. In this study, we developed a natural language processing (NLP) based machine learning approach for extracting such information from the literature. Our aim is to identify journal abstracts or paragraphs in full-text articles that contain at least one occurrence of a protein-protein interaction (PPI) affected by a mutation. Results: Our system makes use of the latest NLP methods with a large number of engineered features, including some based on pre-trained word embeddings. Our final model achieved satisfactory performance in the Document Triage Task of the BioCreative VI Precision Medicine Track, with the highest recall and a comparable F1-score. Conclusions: The performance of our method indicates that it is ideally suited to being combined with manual annotation. Our machine learning framework and engineered features should also help other researchers further improve this and related biological text mining tasks using either traditional machine learning or deep learning based methods.
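The evaluation above balances recall against F1-score. As a reminder of how these metrics relate for a binary triage task (a document either contains a mutation-affected PPI or it does not), here is a minimal sketch; the labels in the usage check are hypothetical and the function is not part of the authors' system.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Because F1 is the harmonic mean of precision and recall, a system can lead on recall while only matching others on F1, which suits a triage stage whose output is later checked by human annotators.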


2017 ◽  
Vol 42 (6) ◽  
Author(s):  
Aslı Soyer Malyemez ◽  
Emine Bayraktar ◽  
Ülkü Mehmetoğlu

Abstract Introduction: To produce (S)-2-pentanol, which has been used as a key chiral intermediate in the synthesis of several potential anti-Alzheimer's drugs, the effects of enzyme, acyl donor, substrate concentration and acyl donor/racemic-2-pentanol mole ratio on the kinetic resolution of racemic 2-pentanol were investigated. Methods: Reactions were performed in a bioreactor of 50 mL capacity with a working volume of 30 mL on an orbital shaker at 150 rpm and 30°C. Production parameters were investigated with different types of enzyme and acyl donor. Results: The best results were obtained with Novozym 435 and vinyl butyrate, giving 50% conversion and 99% enantiomeric excess for the substrate at 30 min. The optimum conditions were 1500 mM substrate and 4 mg/mL enzyme, with a maximum initial reaction rate of 24.88 mM/min. The reaction kinetics were found to follow a Ping-Pong bi-bi mechanism. The kinetic parameters determined with Polymath 6.1 software were a maximum reaction rate of 4.16 mmol/min/g enzyme, a Km of 103.73 mM for (R)-2-pentanol, and a Km of 51.18 mM for vinyl butyrate. Conclusion: (S)-2-pentanol was obtained with 99% enantiomeric excess. These data should help in producing (S)-2-pentanol on a larger industrial scale in the future.
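For reference, the standard Ping-Pong bi-bi rate law without inhibition terms is v = Vmax[A][B] / (KmB[A] + KmA[B] + [A][B]). The sketch below evaluates it with the kinetic parameters reported above, assuming A = (R)-2-pentanol and B = vinyl butyrate. The abstract does not say whether the model fitted in Polymath included inhibition terms, so treat this as an illustration only; the rate is in mmol/min/g enzyme.

```python
def ping_pong_rate(a_mM, b_mM, vmax=4.16, km_a=103.73, km_b=51.18):
    """Ping-Pong bi-bi rate (no inhibition terms), mmol/min/g enzyme.

    a_mM: (R)-2-pentanol concentration; b_mM: vinyl butyrate concentration.
    km_a and km_b are the Michaelis constants for A and B, respectively.
    """
    return vmax * a_mM * b_mM / (km_b * a_mM + km_a * b_mM + a_mM * b_mM)
```

A quick sanity check: at [A] = KmA and [B] = KmB every term in the denominator equals KmA·KmB, so the rate is exactly Vmax/3; at very high concentrations of both substrates it approaches Vmax.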


2009 ◽  
Vol 6 (1) ◽  
Author(s):  
Richard Jb. Dobson ◽  
Patricia B Munroe ◽  
Mark J Caulfield ◽  
Mansoor Saqi

Summary Functional annotation of a protein sequence in the absence of experimental data or clear similarity to a sequence of known function is difficult. In this study, a simple set of sequence attributes based on physicochemical and predicted structural characteristics was used as input to machine learning methods. To improve performance by increasing the data available for training, a technique of sequence enrichment was explored. These methods were used to predict membership of 24 and 49 large and diverse protein superfamilies from the SCOP database. We found that the best performance was obtained using an enriched training dataset. Accuracies of 66.3% and 55.6% were achieved on datasets comprising 24 and 49 superfamilies with LibSVM and AdaBoostM1, respectively. The methods used here confirm that domains within superfamilies share global sequence properties. We show that machine learning models used to predict categories within the SCOP database can be significantly improved via a simple sequence enrichment step. These approaches can complement profile methods for detecting distant relationships where function is difficult to infer.
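The "global sequence properties" used as inputs can be illustrated with two simple attribute types: amino acid composition and mean Kyte-Doolittle hydropathy (GRAVY). This is a hypothetical feature set chosen for illustration; the paper's actual attributes, including the predicted structural ones, are not reproduced here.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def sequence_attributes(seq):
    """Return [length, 20 composition fractions, GRAVY] for a protein sequence.

    Non-standard residue codes (e.g. 'X') are ignored.
    """
    residues = [aa for aa in seq.upper() if aa in KD]
    n = len(residues)
    if n == 0:
        raise ValueError("sequence contains no standard amino acids")
    composition = [residues.count(aa) / n for aa in AMINO_ACIDS]
    gravy = sum(KD[aa] for aa in residues) / n
    return [n] + composition + [gravy]
```

Features like these depend only on the whole sequence, not on alignment to a known homologue, which is why they remain usable when sequence similarity is too weak for profile methods.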


Energies ◽  
2021 ◽  
Vol 14 (18) ◽  
pp. 5718
Author(s):  
Regelii Suassuna de Andrade Ferreira ◽  
Patrick Picher ◽  
Hassan Ezzaidi ◽  
Issouf Fofana

Frequency response analysis (FRA) is a powerful and widely used tool for condition assessment in power transformers. However, interpreting FRA results remains challenging. Studies show that FRA data can be influenced by parameters other than winding deformation, including temperature. In this study, a machine-learning approach with temperature as an input attribute was used to objectively identify faults in FRA traces. To the best of the authors' knowledge, this has not been reported in the literature. A single-phase transformer model was specifically designed and fabricated for use as a test object for the study. The model is unique in that it allows the non-destructive interchange of healthy and distorted winding sections and, hence, reproducible and repeatable FRA measurements. FRA measurements taken at temperatures ranging from −40 °C to 40 °C were used first to describe the impact of temperature on FRA traces and then to test the ability of the machine learning algorithms to discriminate between fault conditions and temperature variation. The results show that when temperature is not considered in the training dataset, the algorithm may misclassify healthy measurements taken at different temperatures as mechanical or electrical faults. However, once the influence of temperature was considered in the training set, the classifier's performance was restored. The results indicate the feasibility of using the proposed approach to prevent misclassification caused by temperature changes.
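The effect described above, where temperature-shifted healthy traces are mistaken for faults unless temperature is represented in training, can be reproduced in a toy sketch. It assumes each FRA trace is reduced to magnitudes (dB) at a few fixed frequencies, appends a scaled temperature as an extra attribute, and uses a 1-nearest-neighbour classifier. The classifier, the scaling constant and the trace values are all hypothetical, since the abstract does not name the algorithms.

```python
import math

def make_features(trace_db, temperature_c, temp_scale=40.0):
    """FRA magnitudes at fixed frequencies plus temperature scaled to roughly [-1, 1]."""
    return list(trace_db) + [temperature_c / temp_scale]

def nearest_neighbour(train, query):
    """train: list of (features, label); return the label of the closest sample."""
    return min(train, key=lambda item: math.dist(item[0], query))[1]

# Hypothetical traces: a cold healthy winding shifts one magnitude toward the fault trace.
healthy_20 = (make_features([-20.0, -30.0, -25.0], 20), "healthy")
fault_20 = (make_features([-20.0, -35.0, -25.0], 20), "fault")
healthy_cold = (make_features([-20.0, -34.2, -25.0], -40), "healthy")
query = make_features([-20.0, -34.0, -25.0], -40)  # healthy winding measured at -40 °C

without_temperature_data = nearest_neighbour([healthy_20, fault_20], query)
with_temperature_data = nearest_neighbour([healthy_20, fault_20, healthy_cold], query)
```

With only room-temperature training data the cold healthy trace lands nearest the fault example; adding a cold healthy example (and the temperature attribute that separates it) restores the correct label, mirroring the qualitative finding above.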


Author(s):  
Matteo Calabrese ◽  
Martin Cimmino ◽  
Martina Manfrin ◽  
Francesca Fiume ◽  
Dimos Kapetis ◽  
...  

Abstract Predictive Maintenance concerns the smart monitoring of machines to avoid possible future failures, because it is better to intervene before damage occurs, saving time and money. In this paper, a Predictive Maintenance methodology based on a machine learning approach is presented and applied to a real cutting machine, a woodworking machine in a real industrial group, producing accurate estimations. This kind of strategy is important for dealing with maintenance problems given the ever-increasing need to reduce downtime and associated costs. The implemented Predictive Maintenance methodology allows dynamic decision rules to be considered for maintenance prediction, using a combined approach on Azure Machine Learning Studio. Thanks to the pre-processing phases, the three models (RF, GBM and XGBM) accurately predicted machine downtime caused by gripped bearings.
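The abstract credits the pre-processing phases for the models' accuracy without detailing them. A common predictive-maintenance step, sketched here as an assumption rather than the authors' pipeline, is to summarize a sensor signal over sliding windows and label each window by whether a failure follows within a chosen horizon. The function names, the window length and the horizon are hypothetical; the resulting table could then be fed to models such as RF or GBM.

```python
from statistics import mean, pstdev

def window_features(signal, window):
    """(mean, population std) of each full sliding window over the signal."""
    feats = []
    for end in range(window, len(signal) + 1):
        w = signal[end - window:end]
        feats.append((mean(w), pstdev(w)))
    return feats

def label_windows(n_samples, window, failure_times, horizon):
    """1 if a failure occurs within `horizon` samples after a window's last index, else 0."""
    labels = []
    for end in range(window, n_samples + 1):
        t = end - 1  # index of the last sample in this window
        labels.append(int(any(t < f <= t + horizon for f in failure_times)))
    return labels
```

Labelling by a forward horizon turns "will this bearing fail soon?" into an ordinary classification problem, and both functions produce one row per window so features and labels stay aligned.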

