Data Mining algorithms in search of effective conditions for conducting chemical reactions

A significant amount of experimental data recording the course of chemical reactions has now been accumulated. Analyzing these data with a set of Data Mining algorithms yields practical information for finding effective reaction conditions, under which the maximum amount of the target product is obtained at minimal cost. Using a database on the carbonylation of various olefins as an example, this paper shows how the software package we developed extracts useful knowledge that helps increase the efficiency of chemical reactions.


BioMedR: an R/CRAN package for integrated data analysis pipeline in biomedical study

Briefings in Bioinformatics ◽

10.1093/bib/bbz150 ◽

2019 ◽

Cited By ~ 2

Author(s):

Jie Dong ◽

Min-Feng Zhu ◽

Yong-Huan Yun ◽

Ai-Ping Lu ◽

Ting-Jun Hou ◽

...

Keyword(s):

Data Mining ◽

Clustering Algorithms ◽

R Package ◽

Integrated Analysis ◽

Analysis Pipeline ◽

Molecular Fingerprints ◽

Useful Knowledge ◽

Data Mining Algorithms ◽

Mining Methods ◽

Mining Algorithms

Background: With the continuing development of biotechnology and information technology, publicly available data in chemistry and biology are undergoing explosive growth. The wealth of information in these resources needs to be extracted and then transformed into useful knowledge by various data mining methods. However, a main computational challenge is how to effectively represent or encode the molecular objects under investigation, such as chemicals, proteins, DNAs and even complicated interactions, when data mining methods are employed. To further explore these complicated data, an integrated toolkit that represents different types of molecular objects and supports various data mining algorithms is urgently needed. Results: We developed a freely available R/CRAN package, called BioMedR, for molecular representations of chemicals, proteins, DNAs and pairwise samples of their interactions. The current version of BioMedR can calculate 293 molecular descriptors and 13 kinds of molecular fingerprints for small molecules, 9920 protein descriptors based on protein sequences and six types of generalized scale-based descriptors for proteochemometric modeling, more than 6000 DNA descriptors from nucleotide sequences, and six types of interaction descriptors using three different combining strategies. Moreover, the package implements five similarity calculation methods and four powerful clustering algorithms as well as several useful auxiliary tools, with the aim of building an integrated analysis pipeline for data acquisition, data checking, descriptor calculation and data modeling. Conclusion: BioMedR provides a comprehensive and uniform R package that links different representations of molecular objects with each other and will benefit cheminformatics/bioinformatics and other biomedical users. It is available at: https://CRAN.R-project.org/package=BioMedR and https://github.com/wind22zhu/BioMedR/.
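To make the idea of fingerprint similarity concrete, here is a minimal Python sketch. It is not BioMedR code and does not use the package's API. The abstract does not name its five similarity methods, so the Tanimoto coefficient is assumed here as one commonly used choice, and the fingerprints are hypothetical sets of on-bit positions.

```python
from typing import Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 1.0  # convention chosen for this sketch: two empty fingerprints are identical
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical on-bit positions for three molecules (not real fingerprints).
mol_a = {1, 4, 7, 9}
mol_b = {1, 4, 8, 9}
mol_c = {2, 3}

sim_ab = tanimoto(mol_a, mol_b)  # 3 shared bits out of 5 distinct bits
sim_ac = tanimoto(mol_a, mol_c)  # no shared bits
```

A pairwise matrix of such scores is the kind of input a clustering algorithm could consume, which is how similarity calculation and clustering fit into one pipeline.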


Data Mining in Proteomics Using Grid Computing

Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine, and Healthcare ◽

10.4018/978-1-60566-374-6.ch013 ◽

2011 ◽

pp. 245-267

Author(s):

Fotis Psomopoulos ◽

Pericles Mitkas

Keyword(s):

Data Mining ◽

Data Retrieval ◽

Protein Classification ◽

Classification Problems ◽

Proteomics Data ◽

Grid Environment ◽

Useful Knowledge ◽

Domain Specific ◽

Data Mining Algorithms ◽

Mining Algorithms

This chapter presents Data Mining techniques for knowledge extraction in proteomics, taking into account both the particular features of most proteomics problems (such as data retrieval and system complexity) and the opportunities and constraints found in a Grid environment. The chapter discusses how new and potentially useful knowledge can be extracted from proteomics data, utilizing Grid resources in a transparent way. Protein classification is introduced as a current research issue in proteomics that also demonstrates most of the domain-specific traits. An overview of common and custom-made Data Mining algorithms is provided, with emphasis on the specific needs of protein classification problems. A unified methodology is presented for complex Data Mining processes on the Grid, highlighting the different application types and the benefits and drawbacks in each case. Finally, the methodology is validated through real-world case studies deployed over the EGEE grid environment.


Data Mining in Proteomics Using Grid Computing

Grid and Cloud Computing ◽

10.4018/978-1-4666-0879-5.ch409 ◽

2012 ◽

pp. 918-940

Author(s):

Fotis Psomopoulos ◽

Pericles Mitkas

Keyword(s):

Data Mining ◽

Data Retrieval ◽

Protein Classification ◽

Classification Problems ◽

Proteomics Data ◽

Grid Environment ◽

Useful Knowledge ◽

Domain Specific ◽

Data Mining Algorithms ◽

Mining Algorithms


Data Mining and Privacy

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch061 ◽

2011 ◽

pp. 388-393

Author(s):

Esma Aïmeur

Keyword(s):

Data Mining ◽

Data Privacy ◽

Reconstruction Algorithm ◽

Learning Task ◽

Privacy Preserving ◽

Sources Of Information ◽

Privacy Preserving Data Mining ◽

Useful Knowledge ◽

Data Mining Algorithms ◽

Mining Algorithms

With the emergence of the Internet, it is now possible to connect to and access sources of information and databases throughout the world. At the same time, this raises many questions regarding the privacy and security of the data, in particular how to mine useful information while preserving the privacy of sensitive and confidential data. Privacy-preserving data mining is a relatively new but rapidly growing field that studies how data mining algorithms affect the privacy of data and tries to find and analyze new algorithms that preserve this privacy. At first glance, it may seem that data mining and privacy have orthogonal goals, the first being concerned with the discovery of useful knowledge from data and the second with the protection of the data's privacy. The interactions between privacy and data mining have been questioned and studied for more than a decade, but the name of the domain itself was coined more recently by two seminal papers attacking the subject from two very different perspectives (Agrawal & Srikant, 2000; Lindell & Pinkas, 2000). The first paper (Agrawal & Srikant, 2000) takes the approach of randomizing the data through the injection of noise, and then recovers from it by applying a reconstruction algorithm before a learning task (the induction of a decision tree) is carried out on the reconstructed dataset. The second paper (Lindell & Pinkas, 2000) adopts a cryptographic view of the problem and rephrases it within the general framework of secure multiparty computation. The chapter is organized as follows. First, the area of privacy-preserving data mining is illustrated through three scenarios; then a classification of privacy-preserving algorithms is described and the three main approaches currently in use are detailed. Finally, the future trends and challenges that await the domain are discussed before concluding.
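The randomization idea can be illustrated with a small Python sketch. This is not the reconstruction algorithm of Agrawal & Srikant, which recovers the whole distribution; it is a simpler moment-based illustration, assuming independent zero-mean uniform noise whose width is public. The data, the seed and all names are hypothetical.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical sensitive attribute (for example, ages); not data from the cited papers.
true_values = [rng.gauss(40.0, 10.0) for _ in range(20000)]

# Each record is perturbed with independent zero-mean uniform noise on [-A, A].
NOISE_HALF_WIDTH = 30.0
noisy_values = [
    v + rng.uniform(-NOISE_HALF_WIDTH, NOISE_HALF_WIDTH) for v in true_values
]

# The variance of Uniform(-A, A) is A^2 / 3 and is assumed to be public knowledge.
noise_variance = NOISE_HALF_WIDTH ** 2 / 3

# Zero-mean noise leaves the mean unchanged in expectation, and the variance of
# the sum of independent variables is the sum of their variances.
est_mean = statistics.mean(noisy_values)
est_variance = statistics.pvariance(noisy_values) - noise_variance
```

Individual noisy records reveal little about the corresponding true values, yet the aggregate estimates stay close to those of the original data, which is what allows a learning task to run on the perturbed dataset.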


Novel Adverse Events of Iloperidone: A Disproportionality Analysis in US Food and Drug Administration Adverse Event Reporting System (FAERS) Database

Current Drug Safety ◽

10.2174/1574886313666181026100000 ◽

2019 ◽

Vol 14 (1) ◽

pp. 21-26 ◽

Cited By ~ 2

Author(s):

Viswam Subeesh ◽

Eswaran Maheswari ◽

Hemendra Singh ◽

Thomas Elsa Beulah ◽

Ann Mary Swaroop

Keyword(s):

Data Mining ◽

Adverse Event ◽

Adverse Events ◽

Reporting System ◽

Adverse Event Reporting System ◽

Adverse Event Reporting ◽

Disproportionality Analysis ◽

Positive Signal ◽

Data Mining Algorithms ◽

Mining Algorithms

Background: A signal is defined as “reported information on a possible causal relationship between an adverse event and a drug, of which the relationship is unknown or incompletely documented previously”. Objective: To detect novel adverse events of iloperidone by disproportionality analysis in the FDA Adverse Event Reporting System (FAERS) database using Data Mining Algorithms (DMAs). Methodology: The US FAERS database contains 1028 iloperidone-associated Drug Event Combinations (DECs) reported from 2010 Q1 to 2016 Q3. We considered a DEC for disproportionality analysis only if at least ten reports were present in the database for the given adverse event and the event had not been detected earlier (in clinical trials). Two data mining algorithms, namely Reporting Odds Ratio (ROR) and Information Component (IC), were applied retrospectively over the aforementioned time period. Values of ROR-1.96SE>1 and IC-2SD>0 were considered the threshold for a positive signal. Results: The mean age of patients with iloperidone-associated events was 44 years [95% CI: 36-51], although age was not mentioned in twenty-one reports. The data mining algorithms exhibited a positive signal for akathisia (ROR-1.96SE=43.15, IC-2SD=2.99), dyskinesia (21.24, 3.06), peripheral oedema (6.67, 1.08), priapism (425.7, 9.09) and sexual dysfunction (26.6, 1.5), as these values were well above the pre-set threshold. Conclusion: Five potential signals associated with iloperidone were generated by data mining in the FDA AERS database. The result requires further clinical surveillance to quantify and validate the possible risks for the adverse events reported for iloperidone.
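The two disproportionality measures can be sketched in Python from a 2x2 table of report counts. This is a hedged illustration and not the authors' implementation. It assumes that the ROR lower bound is taken on the log scale, which is a common convention, and that the IC is a shrinkage point estimate with 0.5 added to the observed and expected counts. The standard deviation needed for IC-2SD is not computed. All counts and function names are hypothetical.

```python
import math
from typing import Tuple


def ror_with_lower_bound(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Return (ROR, lower 95% bound) for a 2x2 table of report counts.

    a: drug of interest, event of interest   b: drug of interest, other events
    c: other drugs, event of interest        d: other drugs, other events
    """
    ror = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(ROR)
    lower = math.exp(math.log(ror) - 1.96 * se_log)
    return ror, lower


def information_component(a: int, b: int, c: int, d: int) -> float:
    """Shrinkage point estimate of the IC: log2 of observed over expected counts."""
    total = a + b + c + d
    expected = (a + b) * (a + c) / total  # count expected if drug and event were independent
    return math.log2((a + 0.5) / (expected + 0.5))


# Hypothetical counts, not taken from FAERS.
ror, ror_lower = ror_with_lower_bound(20, 980, 100, 98900)
ic = information_component(20, 980, 100, 98900)
signal_by_ror = ror_lower > 1  # mirrors the ROR threshold stated in the abstract
```

With these hypothetical counts the drug-event pair is reported far more often than independence would predict, so the ROR lower bound exceeds 1 and the IC point estimate is positive.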
