On the influence of template size, canonicalization and exclusivity for retrosynthesis and reaction prediction applications

Heuristic and machine learning models for rank-ordering reaction templates comprise an important basis for computer-aided organic synthesis regarding both product prediction and retrosynthetic pathway planning. Their viability relies heavily on the quality and characteristics of the underlying template database. With the advent of automated reaction and template extraction software and consequently the creation of template databases too large to be curated manually, a data-driven approach to assess and improve the quality of template sets is needed. We therefore systematically studied the influence of template generality, canonicalization and exclusivity on the performance of different template ranking models. We find that duplicate and non-exclusive templates, i.e. templates which describe the same chemical transformation on identical or overlapping sets of molecules, decrease both the accuracy of the ranking algorithm and the applicability of the respective top-ranked templates significantly. To remedy the negative effects of non-exclusivity, we developed a general and computationally efficient framework to deduplicate and hierarchically correct templates. As a result, performance improved for both heuristic and machine learning template ranking algorithms across different template sizes. The canonicalization and correction code was made freely available.

Not all biases are bad: equitable and inequitable biases in machine learning and radiology

Insights into Imaging ◽

10.1186/s13244-020-00955-7 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Mirjam Pot ◽

Nathalie Kieusseyan ◽

Barbara Prainsack

Keyword(s):

Machine Learning ◽

Human Error ◽

Technical Problem ◽

Social Dimension ◽

Patient Treatment ◽

Technical Solution ◽

Negative Effects ◽

Different Types ◽

Human Decision

The application of machine learning (ML) technologies in medicine generally, and in radiology more specifically, is hoped to improve clinical processes and the provision of healthcare. A central motivation in this regard is to advance patient treatment by reducing human error and increasing the accuracy of prognosis, diagnosis and therapy decisions. There is, however, also increasing awareness about bias in ML technologies and its potentially harmful consequences. Biases refer to systematic distortions of datasets, algorithms, or human decision making. These systematic distortions are understood to have negative effects on the quality of an outcome in terms of accuracy, fairness, or transparency. But biases are not only a technical problem that requires a technical solution. Because they often also have a social dimension, the ‘distorted’ outcomes they yield often have implications for equity. This paper assesses different types of biases that can emerge within applications of ML in radiology, and discusses in what cases such biases are problematic. Drawing upon theories of equity in healthcare, we argue that while some biases are harmful and should be acted upon, others might be unproblematic and even desirable, precisely because they can contribute to overcoming inequities.

Data-driven studies of magnetic two-dimensional materials

Scientific Reports ◽

10.1038/s41598-020-72811-z ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Trevor David Rhone ◽

Wei Chen ◽

Shaan Desai ◽

Steven B. Torrisi ◽

Daniel T. Larson ◽

...

Keyword(s):

Machine Learning ◽

Dft Calculations ◽

Density Functional ◽

Magnetic Coupling ◽

Magnetic Ordering ◽

Magnetic Behavior ◽

Data Driven ◽

Two Dimensional ◽

Computationally Efficient ◽

Data Driven Approach

We use a data-driven approach to study the magnetic and thermodynamic properties of van der Waals (vdW) layered materials. We investigate monolayers of the form $\mathrm{A}_2\mathrm{B}_2\mathrm{X}_6$, based on the known material $\mathrm{Cr}_2\mathrm{Ge}_2\mathrm{Te}_6$, using density functional theory (DFT) calculations and machine learning methods to determine their magnetic properties, such as magnetic order and magnetic moment. We also examine formation energies and use them as a proxy for chemical stability. We show that machine learning tools, combined with DFT calculations, can provide a computationally efficient means to predict properties of such two-dimensional (2D) magnetic materials. Our data analytics approach provides insights into the microscopic origins of magnetic ordering in these systems. For instance, we find that the X site strongly affects the magnetic coupling between neighboring A sites, which drives the magnetic ordering. Our approach opens new ways for rapid discovery of chemically stable vdW materials that exhibit magnetic behavior.

Universal Algorithms for Probability Forecasting

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213012400155 ◽

2012 ◽

Vol 21 (04) ◽

pp. 1240015

Author(s):

Fedor Zhdanov ◽

Yuri Kalnishkan

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Linear Model ◽

Loss Function ◽

Efficient Algorithms ◽

Classification Problems ◽

Computationally Efficient ◽

Multi Class Classification ◽

Computationally Efficient Algorithms

Multi-class classification is one of the most important tasks in machine learning. In this paper we consider two online multi-class classification problems: classification by a linear model and by a kernelized model. The quality of predictions is measured by the Brier loss function. We obtain two computationally efficient algorithms for these problems by applying the Aggregating Algorithms to certain pools of experts and prove theoretical guarantees on the losses of these algorithms. We kernelize one of the algorithms and prove theoretical guarantees on its loss. We perform experiments and compare our algorithms with logistic regression.
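As a rough illustration of the quality measure named in this abstract, the sketch below computes the multi-class Brier loss in its common form: the squared Euclidean distance between the predicted probability vector and the one-hot encoding of the true class. The function names `brier_loss` and `mean_brier_loss` are hypothetical, and the exact normalization used in the paper is not stated here, so this is an assumption rather than the authors' definition.

```python
from typing import Sequence


def brier_loss(probs: Sequence[float], true_class: int) -> float:
    """Multi-class Brier loss for a single prediction.

    Assumed form: sum over classes k of (p_k - 1[k == true_class])^2.
    """
    return sum(
        (p - (1.0 if k == true_class else 0.0)) ** 2
        for k, p in enumerate(probs)
    )


def mean_brier_loss(
    predictions: Sequence[Sequence[float]], labels: Sequence[int]
) -> float:
    """Average Brier loss over a sequence of (prediction, label) pairs."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    total = sum(brier_loss(p, y) for p, y in zip(predictions, labels))
    return total / len(labels)


if __name__ == "__main__":
    # A uniform forecast over two classes, with class 0 being correct.
    print(brier_loss([0.5, 0.5], 0))
```

Under this form the loss is 0 for a fully confident correct forecast and 2 for a fully confident wrong one, which is what makes it a proper measure of probability forecasts rather than of hard class labels.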

Study on Ontology Ranking Models Based on the Ensemble Learning

International Journal on Semantic Web and Information Systems ◽

10.4018/ijswis.2018040107 ◽

2018 ◽

Vol 14 (2) ◽

pp. 138-161

Author(s):

Liu Jie ◽

Yuan Kerou ◽

Zhou Jianshe ◽

Shi Jinsheng

Keyword(s):

Machine Learning ◽

Learning Strategies ◽

Ensemble Learning ◽

The Internet ◽

Feature Ranking ◽

Ranking Algorithms ◽

Ranking Models ◽

Single Feature ◽

Ontology Ranking ◽

Internal Character

This article describes how more knowledge appears on the Internet than in an ontological form. Displaying precise results to users when they search is the key issue in research on ontology retrieval. The factors considered in ontology ranking are not limited to internal character matching but also include analysis of metadata, including the entities, structures and relations in ontologies. Existing single-feature ranking algorithms focus on the structures, elements and contents of one particular aspect of an ontology, so their results are not satisfactory. Combining multiple single-feature models seems to achieve better results, but the objectivity and versatility of the models' weights are debatable. Machine learning effectively addresses this problem, and combining the advantages of learning-to-rank algorithms is the pressing issue. We therefore propose ensemble learning strategies to combine different algorithms in ontology ranking. The resulting ranking is more satisfactory than those of Swoogle and the base algorithms.

Machine-Learning Methods for Computational Science and Engineering

Computation ◽

10.3390/computation8010015 ◽

2020 ◽

Vol 8 (1) ◽

pp. 15 ◽

Cited By ~ 2

Author(s):

Michael Frank ◽

Dimitris Drikakis ◽

Vassilis Charissis

Keyword(s):

Machine Learning ◽

Computational Science ◽

Computationally Efficient ◽

Science And Engineering ◽

Machine Learning Methods ◽

Simulation Techniques ◽

Computational Science And Engineering ◽

Speed Up ◽

Scientific Fields

The re-kindled fascination in machine learning (ML), observed over the last few decades, has also percolated into natural sciences and engineering. ML algorithms are now used in scientific computing, as well as in data-mining and processing. In this paper, we provide a review of the state-of-the-art in ML for computational science and engineering. We discuss ways of using ML to speed up or improve the quality of simulation techniques such as computational fluid dynamics, molecular dynamics, and structural analysis. We explore the ability of ML to produce computationally efficient surrogate models of physical applications that circumvent the need for the more expensive simulation techniques entirely. We also discuss how ML can be used to process large amounts of data, using as examples many different scientific fields, such as engineering, medicine, astronomy and computing. Finally, we review how ML has been used to create more realistic and responsive virtual reality applications.

Can trust become a factor of economic growth? Dynamic changes in the level of trust of Russian youth

Voprosy Ekonomiki ◽

10.32609/0042-8736-2020-7-92-107 ◽

2020 ◽

pp. 92-107 ◽

Cited By ~ 2

Author(s):

A. I. Bakhtigaraeva ◽

A. A. Stavinskaya

Keyword(s):

Institutional Environment ◽

Negative Effects ◽

Dynamic Changes ◽

Generalized Trust ◽

Level Of Education ◽

Advantages And Disadvantages ◽

The Future ◽

Analysis Of Dynamics

The article considers the role of trust in the economy, the mechanisms of its accumulation and the possibility of using it as one of the growth factors in the future. The advantages and disadvantages of measuring the level of generalized trust using two alternative questions — about trusting people in general and trusting strangers — are analyzed. The results of the analysis of dynamics of the level of generalized trust among Russian youth, obtained within the study of the Institute for National Projects in 10 regions of Russia, are presented. It is shown that there are no significant changes in trust in people in general during the study at university. At the same time, the level of trust in strangers falls, which can negatively affect the level of trust in the country as a whole, and as a result have negative effects on the development of the economy in the future. Possible causes of the observed trends and the role of universities are discussed. Also the question about the connection between the level of education and generalized trust in countries with different quality of the institutional environment is raised.

A Literature Review Study of Software Defect Prediction using Machine Learning Techniques

International Journal of Emerging Research in Management and Technology ◽

10.23956/ijermt.v6i6.286 ◽

2018 ◽

Vol 6 (6) ◽

pp. 300 ◽

Cited By ~ 3

Author(s):

Feidu Akmel ◽

Ermiyas Birihanu ◽

Bahir Siraj

Keyword(s):

Machine Learning ◽

Software Metrics ◽

Quality Standard ◽

Machine Learning Techniques ◽

Software Systems ◽

Health Care Insurance ◽

Software Defect ◽

Learning Techniques ◽

Software Product

Software systems are software products or applications that support business domains such as manufacturing, aviation, health care, insurance and so on. Software quality is a means of measuring how software is designed and how well the software conforms to that design. Some of the variables considered for software quality are correctness, product quality, scalability, completeness and absence of bugs. However, the quality standard used by one organization differs from that of another; for this reason it is better to apply software metrics to measure the quality of software. Attributes gathered from source code through software metrics can serve as input for a software defect predictor. Software defects are errors introduced by software developers and stakeholders. Finally, this study examines the application of machine learning to software defects, as gathered from previous research works.

Accurate Detection and Quantization of Leaf-Diseases through Soft Computing

International Journal of Computational Physics Series ◽

10.29167/a1i1p236-247 ◽

2018 ◽

Vol 1 (1) ◽

pp. 236-247

Author(s):

Divya Srivastava ◽

Rajitha B. ◽

Suneeta Agarwal

Keyword(s):

Machine Learning ◽

Image Processing ◽

Agricultural Production ◽

Bacterial Blight ◽

Early Stage ◽

Second Phase ◽

Computationally Efficient ◽

Stage Of Disease ◽

Accurate Detection ◽

Two Phases

Diseases in leaves can cause a significant reduction in both the quality and quantity of agricultural production. If early and accurate detection of leaf diseases can be automated, the proper remedy can be applied in time. This paper presents a simple and computationally efficient approach for detecting diseases on leaves. Detecting a disease is of limited benefit without knowing its stage, so the paper also determines the stage of the disease by quantizing the affected area of the leaves using digital image processing and machine learning. Although a variety of diseases occur on leaves, bacterial and fungal spots (Early Scorch, Late Scorch, and Leaf Spot) are the most prominent. With this in mind, the paper deals with the detection of Bacterial Blight and Fungal Spot at both an early stage (Early Scorch) and a late stage (Late Scorch) on a variety of leaves. The proposed approach is divided into two phases: the first phase identifies one or more diseases present on the leaves, and the second phase calculates the area affected by the disease. The experimental results showed 97% accuracy for the proposed approach.

Data science in economics: comprehensive review of advanced machine learning and deep learning methods

10.31232/osf.io/4pxq2 ◽

2020 ◽

Author(s):

Saeed Nosratabadi ◽

Amir Mosavi ◽

Puhong Duan ◽

Pedram Ghamisi ◽

Ferdinand Filip ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Data Science ◽

State Of The Art ◽

Science Methods ◽

Learning Models ◽

Diverse Range ◽

Hybrid Machine ◽

Economics Research

This paper provides a state-of-the-art investigation of advances in data science in emerging economic applications. The analysis was performed on novel data science methods in four individual classes of deep learning models, hybrid deep learning models, hybrid machine learning, and ensemble models. Application domains include a wide and diverse range of economics research from the stock market, marketing, and e-commerce to corporate banking and cryptocurrency. The PRISMA method, a systematic literature review methodology, was used to ensure the quality of the survey. The findings reveal that the trends follow the advancement of hybrid models, which, based on the accuracy metric, outperform other learning algorithms. It is further expected that the trends will converge toward the advancements of sophisticated hybrid deep learning models.

Pollutants in Organic Chemistry and Medicinal Chemistry Education Laboratory. Experimental and Machine Learning Studies

Current Topics in Medicinal Chemistry ◽

10.2174/1568026620666200211110043 ◽

2020 ◽

Vol 20 (9) ◽

pp. 720-730

Author(s):

Iker Montes-Bageneta ◽

Urtzi Akesolo ◽

Sara López ◽

Maria Merino ◽

Eneritz Anakabe ◽

...

Keyword(s):

Organic Chemistry ◽

Machine Learning ◽

Chemistry Education ◽

Organic Waste ◽

Computational Modelling ◽

University Education ◽

Academic Factors ◽

Academic Year ◽

Statistical Analysis Software

Aims: Computational modelling may help us to detect the more important factors governing this process in order to optimize it. Background: The generation of hazardous organic waste in teaching and research laboratories poses a big problem that universities have to manage. Methods: In this work, we report on the experimental measurement of waste generation in the chemical education laboratories within our department. We measured the waste generated in the teaching laboratories of the Organic Chemistry Department II (UPV/EHU), in the second semester of the 2017/2018 academic year. Likewise, to know the anthropogenic and social factors related to the generation of waste, a questionnaire has been utilized. We focused on all students of Experimentation in Organic Chemistry (EOC) and Organic Chemistry II (OC2) subjects. It helped us to know their prior knowledge about waste, awareness of the problem of separate organic waste and the correct use of the containers. These results, together with the volumetric data, have been analyzed with statistical analysis software. We obtained two Perturbation-Theory Machine Learning (PTML) models including chemical, operational, and academic factors. The dataset analyzed included 6050 cases of laboratory practices vs. practices of reference. Results: These models predict the values of acetone waste with R² = 0.88 and non-halogenated waste with R² = 0.91. Conclusion: This work opens a new gate to the implementation of more sustainable techniques and a circular economy with the aim of improving the quality of university education processes.
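For readers unfamiliar with the fit statistic quoted in the Results, the following is a minimal sketch of the usual coefficient of determination, computed as one minus the ratio of the residual sum of squares to the total sum of squares. Whether the authors used exactly this definition is an assumption; the function name `r_squared` and the sample numbers are hypothetical and are not data from the study.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up illustrative values, not measurements from the paper.
    observed = [1.0, 2.0, 3.0]
    predicted = [1.0, 2.0, 3.0]
    print(r_squared(observed, predicted))
```

Read this way, a value near 0.9 means the model's squared error is roughly a tenth of what a constant mean prediction would give.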
