ReactionCode: a new versatile format for searching, analysis, classification, transform, and encoding/decoding of reactions

Database Organization ◽

Machine Readable ◽

In the past two decades, many different formats for molecules and reactions have been created. These formats were mostly developed for the purposes of identifiers, representation, classification, analysis and data exchange. Much effort has gone into molecule formats but far less into reaction formats, where the work has been done mostly by companies, leading to proprietary formats. Here, we developed a new open-source format that allows one to encode and decode a reaction into multi-layer machine-readable code, which aggregates reactants and products into a condensed graph of reaction (CGR). This format is flexible and can be used in the context of reaction similarity searching and classification. It is also designed for database organization, machine learning applications and as a new reaction transform language.

ReactionCode: format for reaction searching, analysis, classification, transform, and encoding/decoding

Journal of Cheminformatics ◽

10.1186/s13321-020-00476-x ◽

2020 ◽

Vol 12 (1) ◽

Author(s):

Victorien Delannée ◽

Marc C. Nicklaus

Keyword(s):

Machine Learning ◽

Open Source ◽

Data Exchange ◽

Similarity Searching ◽

Classification Analysis ◽

The Past ◽

Database Organization ◽

Machine Readable ◽

Abstract: In the past two decades, many different formats for molecules and reactions have been created. These formats were mostly developed for the purposes of identifiers, representation, classification, analysis and data exchange. Much effort has gone into molecule formats but far less into reaction formats, where the work has been done mostly by companies, leading to proprietary formats. Here, we present ReactionCode: a new open-source format that allows one to encode and decode a reaction into multi-layer machine-readable code, which aggregates reactants and products into a condensed graph of reaction (CGR). This format is flexible and can be used in the context of reaction similarity searching and classification. It is also designed for database organization, machine learning applications and as a new reaction transform language.
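The idea of aggregating reactants and products into a single condensed graph of reaction can be illustrated with a minimal Python sketch. This is not the ReactionCode encoding itself: it assumes atoms are already atom-mapped with integer labels, ignores hydrogens, charges and stereochemistry, and the names `bond`, `condensed_graph` and `reaction_center` are hypothetical helpers introduced only for this example.

```python
from typing import Dict, FrozenSet, Tuple

# Assumption: each atom carries an integer atom-map number shared by
# reactants and products; a bond is an unordered pair of those numbers.
Bond = FrozenSet[int]
BondTable = Dict[Bond, int]          # bond -> bond order


def bond(a: int, b: int) -> Bond:
    """Build an unordered bond key from two atom-map numbers."""
    return frozenset((a, b))


def condensed_graph(reactants: BondTable,
                    products: BondTable) -> Dict[Bond, Tuple[int, int]]:
    """Merge both sides of a reaction into one graph.

    Every edge is labelled (order before, order after); 0 means the
    bond does not exist on that side of the reaction.
    """
    return {
        b: (reactants.get(b, 0), products.get(b, 0))
        for b in set(reactants) | set(products)
    }


def reaction_center(cgr: Dict[Bond, Tuple[int, int]]) -> Dict[Bond, Tuple[int, int]]:
    """Keep only the edges whose bond order changes during the reaction."""
    return {b: orders for b, orders in cgr.items() if orders[0] != orders[1]}


# Toy substitution on heavy atoms only: C1-C2-Br3 + O4 -> C1-C2-O4 + Br3
reactant_bonds = {bond(1, 2): 1, bond(2, 3): 1}
product_bonds = {bond(1, 2): 1, bond(2, 4): 1}

cgr = condensed_graph(reactant_bonds, product_bonds)
center = reaction_center(cgr)
```

In this toy case the C1-C2 edge is labelled `(1, 1)` because it is unchanged, while the broken C2-Br3 bond and the formed C2-O4 bond carry `(1, 0)` and `(0, 1)`, so only those two edges make up the reaction center.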

ReactionCode: Format for Reaction Searching, Analysis, Classification, Transform, and Encoding/Decoding

10.21203/rs.3.rs-48395/v1 ◽

2020 ◽

Author(s):

Victorien Delannée ◽

Marc C. Nicklaus

Keyword(s):

Machine Learning ◽

Open Source ◽

Data Exchange ◽

Similarity Searching ◽

Classification Analysis ◽

The Past ◽

Database Organization ◽

Machine Readable ◽

Abstract: In the past two decades, many different formats for molecules and reactions have been created. These formats were mostly developed for the purposes of identifiers, representation, classification, analysis and data exchange. Much effort has gone into molecule formats but far less into reaction formats, where the work has been done mostly by companies, leading to proprietary formats. Here, we developed a new open-source format that allows one to encode and decode a reaction into multi-layer machine-readable code, which aggregates reactants and products into a condensed graph of reaction (CGR). This format is flexible and can be used in the context of reaction similarity searching and classification. It is also designed for database organization, machine learning applications and as a new reaction transform language.

ReactionCode: a new versatile format for searching, analysis, classification, transform, and encoding/decoding of reactions

10.26434/chemrxiv.12058971.v2 ◽

2020 ◽

Author(s):

Victorien Delannée ◽

Marc Nicklaus

Keyword(s):

Machine Learning ◽

Open Source ◽

Data Exchange ◽

Similarity Searching ◽

Classification Analysis ◽

The Past ◽

Database Organization ◽

Machine Readable ◽

In the past two decades, many different formats for molecules and reactions have been created. These formats were mostly developed for the purposes of identifiers, representation, classification, analysis and data exchange. Much effort has gone into molecule formats but far less into reaction formats, where the work has been done mostly by companies, leading to proprietary formats. Here, we developed a new open-source format that allows one to encode and decode a reaction into multi-layer machine-readable code, which aggregates reactants and products into a condensed graph of reaction (CGR). This format is flexible and can be used in the context of reaction similarity searching and classification. It is also designed for database organization, machine learning applications and as a new reaction transform language.

ReactionCode: Format for Reaction Searching, Analysis, Classification, Transform, and Encoding/Decoding

10.26434/chemrxiv.12058971 ◽

2020 ◽

Author(s):

Victorien Delannée ◽

Marc Nicklaus

Keyword(s):

Machine Learning ◽

Open Source ◽

Data Exchange ◽

Similarity Searching ◽

Classification Analysis ◽

The Past ◽

Database Organization ◽

Machine Readable ◽

In the past two decades, many different formats for molecules and reactions have been created. These formats were mostly developed for the purposes of identifiers, representation, classification, analysis and data exchange. Much effort has gone into molecule formats but far less into reaction formats, where the work has been done mostly by companies, leading to proprietary formats. Here, we developed a new open-source format that allows one to encode and decode a reaction into multi-layer machine-readable code, which aggregates reactants and products into a condensed graph of reaction (CGR). This format is flexible and can be used in the context of reaction similarity searching and classification. It is also designed for database organization, machine learning applications and as a new reaction transform language.

PAAIcon2016: Multivariate statistics for hydrogeology: moving forward from "the present is the key to the past"

10.31227/osf.io/fbgmy ◽

2017 ◽

Author(s):

Dasapta Erwin Irawan

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Cluster Analysis ◽

Open Source ◽

Multivariate Statistics ◽

Family Life ◽

Principal Component ◽

Demographic Health Survey ◽

The Past ◽

The Future

This abstract was presented at the PAAI conference 2016, 16-17 Nov 2016. It consists of two parts: Part 1, Introduction to Open Science (zip file), and Part 2, Multivariate statistics in hydrogeology.
ABSTRACT Geology is one of the oldest sciences in the world. Originating from natural science, it has grown from the observation of sea shells to the sophisticated interpretation of the earth's interior. In recent developments, the geological approach needs to be more quantitative, driven by the need for prediction and simulation. Geology has shifted from “the present is the key to the past” towards “the present is the key to the past as the base of prediction of the future”. Hydrogeology is one of the promising branches of geology that relies more on quantitative analysis. Multivariate statistics is one of the most frequently used resources in this field. We carried out a literature search and web scraping to analyze the current situation and future trends of multivariate statistics applications for geological synthesis. We used several sets of keywords, but this set gave the most satisfying results: “(all in title) multivariate statistics (and) groundwater”, on the Google Scholar, Crossref, and ScienceOpen databases. The final result was 164 papers. We used VosViewer and Zotero for text mining operations. Based on the analysis we can draw some results. Cluster analysis and principal component analysis are still the most frequently used methods in hydrogeology. Both are mostly used to extract hydrochemical and isotope data to analyze the hydrogeological nature of groundwater flow. More machine learning methods have been introduced in the last five years in hydrogeological science. `Random forest` and `decision tree` techniques are used extensively to learn from the physical and chemical properties of groundwater. Open source tools have also displaced major statistical or programming packages such as SAS and Matlab. Python and R are the two best-known open source languages in this field. We also note an increase in papers discussing hydrogeology and the public health sector. Therefore such methods are also being used to analyze open demographic data like DHS (Demographic Health Survey) and FLS (Family Life Survey). A strong community of programmers drives the exponential development of both languages, via platforms like Github. This has become the future of hydrogeology.
ABSTRAK Geology is one of the oldest sciences in the world. Originating from natural science, it developed from the observation of sea shells towards the complex interpretation of the earth's interior. In its current development, geology requires a more quantitative approach, related to the need for prediction and simulation. Geology has shifted from “the present is the key to the past” to “the present is the key to the past as the base of prediction of the future”. Hydrogeology is one of the branches of geology that relies on quantitative analysis. Multivariate statistics is one of the methods used in this field. We conducted a literature review and web scraping to analyze the current state and future trends of multivariate statistics applications for geological synthesis. Several sets of keywords were used, but the following gave the most satisfying results: “(all in title) multivariate statistics (and) groundwater”. The Google Scholar, Crossref, and ScienceOpen databases served as information sources, yielding a selected result of 164 scientific papers. We used the VosViewer and Zotero applications to process text data (text mining). Based on the analysis, cluster analysis and principal component analysis are still the most widely used techniques. Both are generally used to extract hydrochemical and isotope data to analyze hydrogeological conditions and groundwater flow. More machine learning methods have been introduced and used in the last five years. The “Random forest” and “decision tree” techniques, which are developments of linear regression techniques, have also been widely used to study the physical and chemical properties of groundwater. The use of open source applications has also displaced expensive paid software such as SAS and Matlab. Python and R are just some of the well-known programming languages in the field of machine learning. We also observe an increase in the number of papers at the intersection of hydrogeology and public health. Therefore machine learning techniques are also used to analyze open demographic data such as DHS (Demographic Health Survey) and FLS (Family Life Survey). A strong programmer community is able to develop this open source software exponentially, via platforms such as Github. This has become the future of hydrogeology.

Machine Learning Applications in Head and Neck Radiation Oncology: Lessons From Open-Source Radiomics Challenges

Frontiers in Oncology ◽

10.3389/fonc.2018.00294 ◽

2018 ◽

Vol 8 ◽

Cited By ~ 11

Author(s):

Hesham Elhalawani ◽

Timothy A. Lin ◽

Stefania Volpe ◽

Abdallah S. R. Mohamed ◽

Aubrey L. White ◽

...

Keyword(s):

Machine Learning ◽

Head And Neck ◽

Open Source ◽

Radiation Oncology ◽

Head And Neck Radiation

LITERATURE SURVEY ON MACHINE LEARNING BASED TECHNIQUES IN MEDICAL DATA ANALYSIS

10.36106/ijar/2113665 ◽

2019 ◽

pp. 1-4

Author(s):

Lavanya Vemulapalli

Keyword(s):

Machine Learning ◽

Disease Diagnosis ◽

Research Area ◽

Machine Learning Techniques ◽

Intelligent Algorithms ◽

The Past ◽

Disease Prognosis ◽

Learning Techniques ◽

Comprehensive Survey

Machine learning (ML) plays a significant role among the areas of artificial intelligence (AI). In recent years, ML has attracted many researchers and has been applied successfully in fields such as medicine, education and forecasting. At present, the diagnosis of diseases relies mostly on expert decisions. Diagnosis is a major task in clinical science, as it is crucial in determining whether a patient has a disease or not, which in turn decides the suitable path of treatment. Applying machine learning techniques to disease diagnosis using intelligent algorithms has been a hot research area in computer science. This paper presents a comprehensive survey of machine learning applications in medical disease prognosis during the past decades.

Machine learning applications in proteomics research: How the past can boost the future

PROTEOMICS ◽

10.1002/pmic.201300289 ◽

2014 ◽

Vol 14 (4-5) ◽

pp. 353-366 ◽

Cited By ~ 35

Author(s):

Pieter Kelchtermans ◽

Wout Bittremieux ◽

Kurt De Grave ◽

Sven Degroeve ◽

Jan Ramon ◽

...

Keyword(s):

Machine Learning ◽

The Past ◽

The Future ◽

Proteomics Research

Feature Selection Algorithms as One of the Python Data Analytical Tools

Future Internet ◽

10.3390/fi12030054 ◽

2020 ◽

Vol 12 (3) ◽

pp. 54

Author(s):

Nikita Pilnenskiy ◽

Ivan Smetannikov

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Open Source ◽

Information Technologies ◽

Current Trend ◽

Modern State ◽

Selection Algorithms ◽

Python Programming ◽

Analytical Tools

With the rapidly growing popularity of the Python programming language for machine learning applications, the gap between the needs of machine learning engineers and existing Python tools is increasing. This is especially noticeable in more classical machine learning fields such as feature selection, as community attention in the last decade has mainly shifted to neural networks. This paper has two main purposes. First, we give an overview of existing open-source Python and Python-compatible feature selection libraries, show their problems, if any, and demonstrate the gap between these libraries and the modern state of the feature selection field. Then, we present the new open-source, scikit-learn-compatible ITMO FS (Information Technologies, Mechanics and Optics University feature selection) library that is currently under development, explain how its architecture covers modern views on feature selection, and provide some code examples of how to use it with Python, along with its performance compared with other Python feature selection libraries.
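For readers unfamiliar with the topic, a filter-style feature selection step can be sketched in plain Python as below. This is only an illustration of the general technique, assuming numeric features stored column-wise; it does not use or reproduce the ITMO FS API, and `pearson` and `select_k_best` are hypothetical names chosen for this example.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_k_best(columns: List[Sequence[float]],
                  target: Sequence[float],
                  k: int) -> List[int]:
    """Return the indices of the k columns most correlated with the target.

    Features are scored independently by |Pearson r|, which is the
    defining property of a filter method: no model is trained.
    """
    scores = [abs(pearson(col, target)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Three toy features: informative, constant, weakly related.
features = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 5.0, 5.0, 5.0],
    [4.0, 1.0, 3.0, 2.0],
]
target = [2.0, 4.0, 6.0, 8.0]

selected = select_k_best(features, target, k=2)
```

Here the constant column scores zero and is dropped, so the selection keeps the first and third features. Wrapper and embedded methods, which the paper's overview also concerns, would instead score features through a trained model.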