Highly accurate prediction of reaction performance and enantioselectivity with a uniform machine learning protocol

Abstract Synthetic reactions, especially asymmetric reactions are key components of modern chemistry. Chemists have put enormous experimental effort into recognizing various molecule patterns to enable efficient synthesis and asymmetric catalysis. Recent application of machine learning algorithms and chemoinformatics in this field demonstrated their huge potential in facilitating this process by accurate prediction. However, existing methods are relatively limited to specific designed data set, and only implement single prediction of reaction performance or reaction enantioselectivity, rendering their general use in broader scenarios challenging. Here we provide a uniform machine learning protocol that can predict both reaction performance and enantioselectivity with high accuracy. Reconstruction of molecular chemical space derived from more comprehensive three-dimensional atomic and molecular descriptors allow for training of our neural network-based model over four representative datasets. This uniform machine learning protocol was validated with outperformance of accuracy than other methods over all four cases (C-C, C-N, C-S cross coupling reactions and asymmetric hydrogenation) in the prediction of both reaction performance and enantioselectivity. It was also successfully applied to the out-of-set and sparse set prediction, leveraging its possible wide application in accelerating synthesis improve and molecular architects.

Download Full-text

Swarm Intelligence Optimization: An Exploration and Application of Machine Learning Technology

Journal of Intelligent Systems ◽

10.1515/jisys-2020-0084 ◽

2021 ◽

Vol 30 (1) ◽

pp. 460-469

Author(s):

Yinying Cai ◽

Amit Sharma

Keyword(s):

Machine Learning ◽

Swarm Intelligence ◽

Research Result ◽

Machine Learning Algorithms ◽

Learning Technology ◽

Data Set ◽

Rice Pests ◽

Machine Leaning ◽

Smart Agriculture ◽

Swarm Intelligence Optimization

Abstract In the agriculture development and growth, the efficient machinery and equipment plays an important role. Various research studies are involved in the implementation of the research and patents to aid the smart agriculture and authors and reviewers that machine leaning technologies are providing the best support for this growth. To explore machine learning technology and machine learning algorithms, the most of the applications are studied based on the swarm intelligence optimization. An optimized V3CFOA-RF model is built through V3CFOA. The algorithm is tested in the data set collected concerning rice pests, later analyzed and compared in detail with other existing algorithms. The research result shows that the model and algorithm proposed are not only more accurate in recognition and prediction, but also solve the time lagging problem to a degree. The model and algorithm helped realize a higher accuracy in crop pest prediction, which ensures a more stable and higher output of rice. Thus they can be employed as an important decision-making instrument in the agricultural production sector.

Download Full-text

Why machine learning algorithms fail in misuse detection on KDD intrusion detection data set

Intelligent Data Analysis ◽

10.3233/ida-2004-8406 ◽

2004 ◽

Vol 8 (4) ◽

pp. 403-415 ◽

Cited By ~ 72

Author(s):

Maheshkumar Sabhnani ◽

Gursel Serpen

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Misuse Detection ◽

Data Set

Download Full-text

Birds Sound Classification Based on Machine Learning Algorithms

Asian Journal of Research in Computer Science ◽

10.9734/ajrcos/2021/v9i430227 ◽

2021 ◽

pp. 1-11

Author(s):

Aska E. Mehyadin ◽

Adnan Mohsin Abdulazeez ◽

Dathar Abas Hasan ◽

Jwan N. Saeed

Keyword(s):

Machine Learning ◽

Noise Suppression ◽

Bird Species ◽

Machine Learning Algorithms ◽

Data Sets ◽

Learning Technology ◽

Species Classification ◽

Data Set ◽

Sound Classification ◽

Mel Frequency Cepstral Coefficient

The bird classifier is a system that is equipped with an area machine learning technology and uses a machine learning method to store and classify bird calls. Bird species can be known by recording only the sound of the bird, which will make it easier for the system to manage. The system also provides species classification resources to allow automated species detection from observations that can teach a machine how to recognize whether or classify the species. Non-undesirable noises are filtered out of and sorted into data sets, where each sound is run via a noise suppression filter and a separate classification procedure so that the most useful data set can be easily processed. Mel-frequency cepstral coefficient (MFCC) is used and tested through different algorithms, namely Naïve Bayes, J4.8 and Multilayer perceptron (MLP), to classify bird species. J4.8 has the highest accuracy (78.40%) and is the best. Accuracy and elapsed time are (39.4 seconds).

Download Full-text

PERFORMANCE COMPARISON OF MACHINE LEARNING ALGORITHMS FOR PREDICTIVE MAINTENANCE

Informatyka Automatyka Pomiary w Gospodarce i Ochronie Środowiska ◽

10.35784/iapgos.1834 ◽

2020 ◽

Vol 10 (3) ◽

pp. 32-35

Author(s):

Jakub Gęca

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Performance Comparison ◽

Machine Learning Algorithms ◽

Predictive Maintenance ◽

Model Parameters ◽

Data Set ◽

Reduction Techniques ◽

Machine Reliability ◽

Dimensionality Reduction Techniques

The consequences of failures and unscheduled maintenance are the reasons why engineers have been trying to increase the reliability of industrial equipment for years. In modern solutions, predictive maintenance is a frequently used method. It allows to forecast failures and alert about their possibility. This paper presents a summary of the machine learning algorithms that can be used in predictive maintenance and comparison of their performance. The analysis was made on the basis of data set from Microsoft Azure AI Gallery. The paper presents a comprehensive approach to the issue including feature engineering, preprocessing, dimensionality reduction techniques, as well as tuning of model parameters in order to obtain the highest possible performance. The conducted research allowed to conclude that in the analysed case , the best algorithm achieved 99.92% accuracy out of over 122 thousand test data records. In conclusion, predictive maintenance based on machine learning represents the future of machine reliability in industry.

Download Full-text

Non-Invasive Risk Stratification of Hypertension: A Systematic Comparison of Machine Learning Algorithms

Journal of Sensor and Actuator Networks ◽

10.3390/jsan9030034 ◽

2020 ◽

Vol 9 (3) ◽

pp. 34

Author(s):

Giovanna Sannino ◽

Ivanoe De Falco ◽

Giuseppe De Pietro

Keyword(s):

Machine Learning ◽

Blood Pressure ◽

Risk Stratification ◽

Learning Algorithms ◽

Circulatory System ◽

Machine Learning Algorithms ◽

Learning Mechanisms ◽

Data Set ◽

Non Invasive ◽

Blood Pressure Estimation

One of the most important physiological parameters of the cardiovascular circulatory system is Blood Pressure. Several diseases are related to long-term abnormal blood pressure, i.e., hypertension; therefore, the early detection and assessment of this condition are crucial. The identification of hypertension, and, even more the evaluation of its risk stratification, by using wearable monitoring devices are now more realistic thanks to the advancements in Internet of Things, the improvements of digital sensors that are becoming more and more miniaturized, and the development of new signal processing and machine learning algorithms. In this scenario, a suitable biomedical signal is represented by the PhotoPlethysmoGraphy (PPG) signal. It can be acquired by using a simple, cheap, and wearable device, and can be used to evaluate several aspects of the cardiovascular system, e.g., the detection of abnormal heart rate, respiration rate, blood pressure, oxygen saturation, and so on. In this paper, we take into account the Cuff-Less Blood Pressure Estimation Data Set that contains, among others, PPG signals coming from a set of subjects, as well as the Blood Pressure values of the latter that is the hypertension level. Our aim is to investigate whether or not machine learning methods applied to these PPG signals can provide better results for the non-invasive classification and evaluation of subjects’ hypertension levels. To this aim, we have availed ourselves of a wide set of machine learning algorithms, based on different learning mechanisms, and have compared their results in terms of the effectiveness of the classification obtained.

Download Full-text

Spoken words as biomarkers: using machine learning to gain insight into communication as a predictor of anxiety

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocaa049 ◽

2020 ◽

Vol 27 (6) ◽

pp. 929-933

Author(s):

George Demiris ◽

Kristin L Corey Magan ◽

Debra Parker Oliver ◽

Karla T Washington ◽

Chad Chadwick ◽

...

Keyword(s):

Machine Learning ◽

Secondary Data ◽

Health Indicators ◽

Machine Learning Algorithms ◽

Standardized Assessments ◽

Learning Tools ◽

Data Set ◽

Problem Solving Therapy ◽

Audio Communication ◽

The Impact

Abstract Objective The goal of this study was to explore whether features of recorded and transcribed audio communication data extracted by machine learning algorithms can be used to train a classifier for anxiety. Materials and Methods We used a secondary data set generated by a clinical trial examining problem-solving therapy for hospice caregivers consisting of 140 transcripts of multiple, sequential conversations between an interviewer and a family caregiver along with standardized assessments of anxiety prior to each session; 98 of these transcripts (70%) served as the training set, holding the remaining 30% of the data for evaluation. Results A classifier for anxiety was developed relying on language-based features. An 86% precision, 78% recall, 81% accuracy, and 84% specificity were achieved with the use of the trained classifiers. High anxiety inflections were found among recently bereaved caregivers and were usually connected to issues related to transitioning out of the caregiving role. This analysis highlighted the impact of lowering anxiety by increasing reciprocity between interviewers and caregivers. Conclusion Verbal communication can provide a platform for machine learning tools to highlight and predict behavioral health indicators and trends.

Download Full-text

A comparison of deep machine learning and Monte Carlo methods for facies classification from seismic data

Geophysics ◽

10.1190/geo2019-0405.1 ◽

2020 ◽

Vol 85 (4) ◽

pp. WA41-WA52 ◽

Cited By ~ 3

Author(s):

Dario Grana ◽

Leonardo Azevedo ◽

Mingliang Liu

Keyword(s):

Neural Network ◽

Machine Learning ◽

Monte Carlo ◽

Monte Carlo Methods ◽

Seismic Data ◽

Transition Probabilities ◽

Machine Learning Algorithms ◽

Data Set ◽

Facies Classification ◽

The Neural Network

Among the large variety of mathematical and computational methods for estimating reservoir properties such as facies and petrophysical variables from geophysical data, deep machine-learning algorithms have gained significant popularity for their ability to obtain accurate solutions for geophysical inverse problems in which the physical models are partially unknown. Solutions of classification and inversion problems are generally not unique, and uncertainty quantification studies are required to quantify the uncertainty in the model predictions and determine the precision of the results. Probabilistic methods, such as Monte Carlo approaches, provide a reliable approach for capturing the variability of the set of possible models that match the measured data. Here, we focused on the classification of facies from seismic data and benchmarked the performance of three different algorithms: recurrent neural network, Monte Carlo acceptance/rejection sampling, and Markov chain Monte Carlo. We tested and validated these approaches at the well locations by comparing classification predictions to the reference facies profile. The accuracy of the classification results is defined as the mismatch between the predictions and the log facies profile. Our study found that when the training data set of the neural network is large enough and the prior information about the transition probabilities of the facies in the Monte Carlo approach is not informative, machine-learning methods lead to more accurate solutions; however, the uncertainty of the solution might be underestimated. When some prior knowledge of the facies model is available, for example, from nearby wells, Monte Carlo methods provide solutions with similar accuracy to the neural network and allow a more robust quantification of the uncertainty, of the solution.

Download Full-text

A comprehensive review and performance evaluation of bioinformatics tools for HLA class I peptide-binding prediction

Briefings in Bioinformatics ◽

10.1093/bib/bbz051 ◽

2020 ◽

Vol 21 (4) ◽

pp. 1119-1135 ◽

Cited By ~ 27

Author(s):

Shutao Mei ◽

Fuyi Li ◽

André Leier ◽

Tatiana T Marquez-Lago ◽

Kailin Giam ◽

...

Keyword(s):

Machine Learning ◽

T Cell ◽

Peptide Binding ◽

Hla Class I ◽

Machine Learning Algorithms ◽

Class I ◽

Target Cells ◽

Binding Prediction ◽

Validation Data ◽

Data Set

Abstract Human leukocyte antigen class I (HLA-I) molecules are encoded by major histocompatibility complex (MHC) class I loci in humans. The binding and interaction between HLA-I molecules and intracellular peptides derived from a variety of proteolytic mechanisms play a crucial role in subsequent T-cell recognition of target cells and the specificity of the immune response. In this context, tools that predict the likelihood for a peptide to bind to specific HLA class I allotypes are important for selecting the most promising antigenic targets for immunotherapy. In this article, we comprehensively review a variety of currently available tools for predicting the binding of peptides to a selection of HLA-I allomorphs. Specifically, we compare their calculation methods for the prediction score, employed algorithms, evaluation strategies and software functionalities. In addition, we have evaluated the prediction performance of the reviewed tools based on an independent validation data set, containing 21 101 experimentally verified ligands across 19 HLA-I allotypes. The benchmarking results show that MixMHCpred 2.0.1 achieves the best performance for predicting peptides binding to most of the HLA-I allomorphs studied, while NetMHCpan 4.0 and NetMHCcons 1.1 outperform the other machine learning-based and consensus-based tools, respectively. Importantly, it should be noted that a peptide predicted with a higher binding score for a specific HLA allotype does not necessarily imply it will be immunogenic. That said, peptide-binding predictors are still very useful in that they can help to significantly reduce the large number of epitope candidates that need to be experimentally verified. Several other factors, including susceptibility to proteasome cleavage, peptide transport into the endoplasmic reticulum and T-cell receptor repertoire, also contribute to the immunogenicity of peptide antigens, and some of them can be considered by some predictors. Therefore, integrating features derived from these additional factors together with HLA-binding properties by using machine-learning algorithms may increase the prediction accuracy of immunogenic peptides. As such, we anticipate that this review and benchmarking survey will assist researchers in selecting appropriate prediction tools that best suit their purposes and provide useful guidelines for the development of improved antigen predictors in the future.

Download Full-text

Sedimentary environment prediction of grain-size data based on machine learning approach

Interpretation ◽

10.1190/int-2019-0153.1 ◽

2020 ◽

Vol 8 (3) ◽

pp. SL71-SL78

Author(s):

Qiao Su ◽

Yanhui Zhu ◽

Fang Hu ◽

Xingyong Xu

Keyword(s):

Machine Learning ◽

Grain Size ◽

Prediction Model ◽

Sedimentary Environment ◽

Machine Learning Algorithms ◽

Grain Size Analysis ◽

Size Analysis ◽

Data Set ◽

Sedimentary Environments ◽

Size Data

Grain size is one of the most important records for sedimentary environment, and researchers have made remarkable progress in the interpretation of sedimentary environments by grain size analysis in the past few decades. However, these advances often depend on the personal experience of the scholars and combination with other methods used together. Here, we constructed a prediction model using the K-nearest neighbors algorithm, one of the machine learning methods, which can predict the sedimentary environments of one core through a known core. Compared to the results of other studies based on the comprehensive data set of grain size and four other indicators, this model achieved a high precision value only using the grain size data. We have also compared our prediction model with other mainstream machine learning algorithms, and the experimental results of six evaluation metrics shed light on that this prediction model can achieve the higher precision. The main errors of the model reflect the length of the conversation area of sedimentary environment, which is controlled by the sedimentary dynamics. This model can provide a quick comparison method of the cores in a similar environment; thus, it may point out the preliminary guidance for further study.

Download Full-text

Evaluation of Prognosis in Nasopharyngeal Cancer Using Machine Learning

Technology in Cancer Research & Treatment ◽

10.1177/1533033820909829 ◽

2020 ◽

Vol 19 ◽

pp. 153303382090982

Author(s):

Melek Akcay ◽

Durmus Etiz ◽

Ozer Celik ◽

Alaattin Ozen

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Naive Bayes ◽

Nasopharyngeal Cancer ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Support Vector ◽

Tumor Diameter ◽

Survival Prognosis ◽

Data Set

Background and Aim: Although the prognosis of nasopharyngeal cancer largely depends on a classification based on the tumor-lymph node metastasis staging system, patients at the same stage may have different clinical outcomes. This study aimed to evaluate the survival prognosis of nasopharyngeal cancer using machine learning. Settings and Design: Original, retrospective. Materials and Methods: A total of 72 patients with a diagnosis of nasopharyngeal cancer who received radiotherapy ± chemotherapy were included in the study. The contribution of patient, tumor, and treatment characteristics to the survival prognosis was evaluated by machine learning using the following techniques: logistic regression, artificial neural network, XGBoost, support-vector clustering, random forest, and Gaussian Naive Bayes. Results: In the analysis of the data set, correlation analysis, and binary logistic regression analyses were applied. Of the 18 independent variables, 10 were found to be effective in predicting nasopharyngeal cancer-related mortality: age, weight loss, initial neutrophil/lymphocyte ratio, initial lactate dehydrogenase, initial hemoglobin, radiotherapy duration, tumor diameter, number of concurrent chemotherapy cycles, and T and N stages. Gaussian Naive Bayes was determined as the best algorithm to evaluate the prognosis of machine learning techniques (accuracy rate: 88%, area under the curve score: 0.91, confidence interval: 0.68-1, sensitivity: 75%, specificity: 100%). Conclusion: Many factors affect prognosis in cancer, and machine learning algorithms can be used to determine which factors have a greater effect on survival prognosis, which then allows further research into these factors. In the current study, Gaussian Naive Bayes was identified as the best algorithm for the evaluation of prognosis of nasopharyngeal cancer.

Download Full-text