IP Analytics and Machine Learning Applied to Create Process Visualization Graphs for Chemical Utility Patents

Researchers must read and understand a large volume of technical papers, including patent documents, to fully grasp the state of the art in a given domain. Chemical research is particularly challenging because of the fast growth of newly registered utility patents (a form of intellectual property, or IP) that provide detailed descriptions of the processes used to create a new chemical or of a new process to manufacture a known chemical. The researcher must understand the latest patents and literature in order to develop new chemicals and processes that do not infringe on existing claims and processes. This research uses text mining, integrated machine learning, and knowledge visualization techniques to support the accurate extraction and graphical presentation of chemical processes disclosed in patent documents. The computer framework trains a machine learning model called ALBERT for automatic paragraph text classification. ALBERT separates the chemical descriptive paragraphs of a patent from the non-chemical ones so that chemical terms can be extracted effectively. ChemDataExtractor is then used to classify chemical terms, such as inputs, units, and reactions, in the chemical paragraphs. A computer-supported graph-based knowledge representation interface is developed to plot the extracted chemical terms and their chemical process links as a network of nodes with connecting arcs. This computer-supported chemical knowledge visualization approach helps researchers quickly understand the innovative and unique chemicals or processes of any chemical patent of interest.
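The node-and-arc representation described in this abstract can be illustrated with a minimal sketch. It assumes the extraction step has already produced (input, process step, output) triples; the triples, the function names `build_graph` and `to_dot`, and the choice of Graphviz DOT text as the output format are all illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical triples, standing in for terms extracted from a patent:
# (input chemical, process step, output chemical).
triples = [
    ("ethylene", "oxidation", "ethylene oxide"),
    ("oxygen", "oxidation", "ethylene oxide"),
    ("ethylene oxide", "hydrolysis", "ethylene glycol"),
    ("water", "hydrolysis", "ethylene glycol"),
]


def build_graph(triples):
    """Map each source node to its outgoing arcs as (label, target) pairs."""
    graph = defaultdict(list)
    for source, step, target in triples:
        graph[source].append((step, target))
    return dict(graph)


def to_dot(graph):
    """Render the graph as Graphviz DOT text, one labeled arc per line."""
    lines = ["digraph process {"]
    for source, arcs in graph.items():
        for step, target in arcs:
            lines.append(f'  "{source}" -> "{target}" [label="{step}"];')
    lines.append("}")
    return "\n".join(lines)


graph = build_graph(triples)
print(to_dot(graph))
```

The printed DOT text can be passed to any Graphviz renderer to draw the network; an adjacency dictionary is used here only because it keeps the sketch free of third-party dependencies.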

Knowledge Visualization Techniques for Machine Learning

Intelligent Data Analysis ◽

10.3233/ida-1998-2406 ◽

1998 ◽

Vol 2 (4) ◽

pp. 333-347 ◽

Cited By ~ 2

Author(s):

Matt Humphrey ◽

Sally Jo Cunningham ◽

Ian H. Witten

Keyword(s):

Machine Learning ◽

Knowledge Visualization ◽

Visualization Techniques

Prediction and Chemical Interpretation of Singlet-Oxygen-Scavenging Activity of Small Molecule Compounds by Using Machine Learning

Antioxidants ◽

10.3390/antiox10111751 ◽

2021 ◽

Vol 10 (11) ◽

pp. 1751

Author(s):

Taiki Fujimoto ◽

Hiroaki Gotoh

Keyword(s):

Machine Learning ◽

Singlet Oxygen ◽

Molecular Descriptors ◽

High Accuracy ◽

Model Ensemble ◽

Explanatory Variables ◽

Oxygen Scavenging ◽

Machine Learning Model ◽

Feature Importance ◽

Chemical Knowledge

A chemically explainable machine learning model was constructed from a small dataset to quantitatively predict singlet-oxygen-scavenging ability. In this model, ensemble learning based on decision trees resulted in high accuracy. As explanatory variables, molecular descriptors obtained by computational chemistry and Morgan fingerprints were used to achieve high accuracy and simple prediction. The singlet-oxygen-scavenging mechanism was explained by the feature importance obtained from the machine learning outputs. The results are consistent with conventional chemical knowledge. Using machine learning and reducing the number of measurements needed to screen high-antioxidant-capacity compounds can considerably improve prediction accuracy and efficiency.

Knowledge visualization techniques for machine learning

Intelligent Data Analysis ◽

10.1016/s1088-467x(98)00029-8 ◽

1998 ◽

Vol 2 (1-4) ◽

pp. 333-347 ◽

Cited By ~ 6

Author(s):

M HUMPHREY ◽

S CUNNINGHAM ◽

I WITTEN

Keyword(s):

Machine Learning ◽

Knowledge Visualization ◽

Visualization Techniques

Design of Machine Learning Model for Urban Planning and Management Improvement

International Journal of Performability Engineering ◽

10.23940/ijpe.20.06.p14.958967 ◽

2020 ◽

Vol 16 (6) ◽

pp. 958 ◽

Cited By ~ 1

Author(s):

Zhou Jiafeng ◽

Liu Tian ◽

Zou Lin

Keyword(s):

Machine Learning ◽

Urban Planning ◽

Learning Model ◽

Planning And Management ◽

Machine Learning Model ◽

Urban Planning And Management ◽

Management Improvement

A Novel Machine Learning Model for Early Operational Anomaly Detection Using LWD/MWD Data

10.2523/iptc-19230-ms ◽

2019 ◽

Author(s):

Mohammed Al-Ghazal ◽

Viranchi Vedpathak

Keyword(s):

Machine Learning ◽

Anomaly Detection ◽

Learning Model ◽

Machine Learning Model

Machine Learning Accelerated Genetic Algorithms for Computational Materials Search

10.26434/chemrxiv.7411172 ◽

2018 ◽

Author(s):

Steen Lysgaard ◽

Paul C. Jennings ◽

Jens Strabo Hummelshøj ◽

Thomas Bligaard ◽

Tejs Vegge

Keyword(s):

Machine Learning ◽

Genetic Algorithm ◽

Genetic Algorithms ◽

Au Nanoparticles ◽

Learning Model ◽

Energy Calculations ◽

Atomic Distribution ◽

Machine Learning Model ◽

Fold Reduction ◽

Computational Materials

A machine learning model is used as a surrogate fitness evaluator in a genetic algorithm (GA) optimization of the atomic distribution of Pt-Au nanoparticles. The machine learning accelerated genetic algorithm (MLaGA) yields a 50-fold reduction of required energy calculations compared to a traditional GA.
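The idea of a surrogate fitness evaluator can be sketched as follows. This is a toy illustration under stated assumptions, not the MLaGA of the paper: the "energy" function is a made-up stand-in for an expensive calculation, the surrogate is a simple nearest-neighbour lookup (the abstract does not say which model is used), and all names and parameter values are hypothetical. It does not attempt to reproduce the reported 50-fold reduction.

```python
import random


def true_energy(config):
    """Toy stand-in for an expensive energy calculation (not a physical model)."""
    # Lower is better: reward neighbouring sites occupied by different species.
    return -sum(1 for a, b in zip(config, config[1:]) if a != b)


def surrogate_predict(config, archive):
    """Cheap surrogate: energy of the closest already-evaluated configuration."""
    nearest = min(archive, key=lambda c: sum(x != y for x, y in zip(c, config)))
    return archive[nearest]


def mutate(config, rng):
    """Flip the species at one randomly chosen site."""
    i = rng.randrange(len(config))
    flipped = list(config)
    flipped[i] = 1 - flipped[i]
    return tuple(flipped)


def crossover(a, b, rng):
    """Single-point crossover of two parent configurations."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def surrogate_ga(n_sites=12, pop_size=6, generations=5,
                 n_candidates=20, n_verify=2, seed=0):
    rng = random.Random(seed)
    archive = {}          # configuration -> true energy
    expensive_calls = 0   # how often the "expensive" function was run

    def evaluate(config):
        nonlocal expensive_calls
        if config not in archive:
            archive[config] = true_energy(config)
            expensive_calls += 1
        return archive[config]

    # Distinct random starting population (0 and 1 label the two species).
    population = []
    while len(population) < pop_size:
        config = tuple(rng.randint(0, 1) for _ in range(n_sites))
        if config not in population:
            population.append(config)
    for config in population:
        evaluate(config)

    for _ in range(generations):
        candidates = []
        for _ in range(n_candidates):
            a, b = rng.sample(population, 2)
            candidates.append(mutate(crossover(a, b, rng), rng))
        # Rank offspring with the surrogate; verify only the most promising.
        candidates.sort(key=lambda c: surrogate_predict(c, archive))
        for config in candidates[:n_verify]:
            evaluate(config)
        population = sorted(archive, key=archive.get)[:pop_size]

    best = min(archive, key=archive.get)
    return best, archive[best], expensive_calls


best, energy, calls = surrogate_ga()
print(f"best energy {energy} after {calls} expensive evaluations")
```

With these settings the loop generates 100 offspring but runs the expensive function on at most 16 configurations, which is the mechanism by which a surrogate saves calculations.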

BAND NN: A Deep Learning Framework For Energy Prediction and Geometry Optimization of Organic Small Molecules

10.26434/chemrxiv.9763094 ◽

2019 ◽

Author(s):

Siddhartha Laghuvarapu ◽

Yashaswi Pathak ◽

U. Deva Priyakumar

Keyword(s):

Machine Learning ◽

Density Functional ◽

Computational Cost ◽

Geometry Optimization ◽

Dft Methods ◽

Energy Prediction ◽

Machine Learning Model ◽

Equilibrium Structures ◽

High Level ◽

Non Equilibrium

Recent advances in artificial intelligence, along with the development of large datasets of energies calculated using quantum mechanical (QM)/density functional theory (DFT) methods, have enabled prediction of accurate molecular energies at reasonably low computational cost. However, the machine learning models reported so far require atomic positions obtained from geometry optimizations with high-level QM/DFT methods as input in order to predict the energies, and they do not allow for geometry optimization. In this paper, a transferable and molecule-size-independent machine learning model (BAND NN), based on a chemically intuitive representation inspired by molecular mechanics force fields, is presented. The model predicts the atomization energies of equilibrium and non-equilibrium structures as a sum of energy contributions from bonds (B), angles (A), nonbonds (N), and dihedrals (D) with remarkable accuracy. The robustness of the proposed model is further validated by calculations that span the conformational, configurational, and reaction space. The transferability of the model to systems larger than those in the dataset is demonstrated by performing calculations on select large molecules. Importantly, with the BAND NN model it is possible to perform geometry optimizations starting from non-equilibrium structures, along with predicting their energies.
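The additive decomposition described in the abstract can be written out as a formula. The notation below is an assumption made for illustration (the abstract gives only the four contribution types, not symbols); each term stands for the energy contribution the model assigns to one bond, angle, nonbonded pair, or dihedral.

```latex
E_{\text{atomization}} \approx
  \sum_{b \,\in\, \text{bonds}} E_B(b)
  + \sum_{a \,\in\, \text{angles}} E_A(a)
  + \sum_{n \,\in\, \text{nonbonds}} E_N(n)
  + \sum_{d \,\in\, \text{dihedrals}} E_D(d)
```

Because every sum runs over the terms present in a given molecule, the form is independent of molecule size, which is consistent with the transferability claim in the abstract.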

A Novel Amino Acid Sequence-based Computational Approach to Predicting Cell-penetrating Peptides

Current Computer-Aided Drug Design ◽

10.2174/1573409914666180925100355 ◽

2019 ◽

Vol 15 (3) ◽

pp. 206-211 ◽

Cited By ~ 2

Author(s):

Jihui Tang ◽

Jie Ning ◽

Xiaoyan Liu ◽

Baoming Wu ◽

Rongfeng Hu

Keyword(s):

Machine Learning ◽

Amino Acid ◽

Amino Acid Position ◽

Cell Penetrating Peptides ◽

Support Vector ◽

Cell Penetration ◽

Drug Candidates ◽

Machine Learning Model ◽

Cell Penetrating ◽

Novel Method

Introduction: Machine learning is a useful tool for predicting cell-penetrating compounds as drug candidates. Materials and Methods: In this study, we developed a novel method for predicting the membrane-penetrating capability of Cell-Penetrating Peptides (CPPs). We used orthogonal encoding to encode each amino acid, treating each amino acid position as one variable. IBM SPSS Modeler and a dataset of 533 CPPs were then used for model screening. Results: The results indicated that the Support Vector Machine (SVM) machine learning model was suitable for predicting membrane-penetrating capability. For improvement, the three longest CPPs were used to predict CPPs. The penetration capability can be predicted with an accuracy of close to 95%. Conclusion: The results indicate that using amino acid position as a variable can be a promising method for predicting the membrane-penetrating capability of CPPs.
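One common reading of "orthogonal encoding" is one-hot encoding, in which every position of a peptide becomes a block of 20 indicator values, one per standard amino acid. The sketch below follows that reading; the padding scheme, the function name `orthogonal_encode`, and the example sequence are assumptions for illustration, since the abstract does not specify them.

```python
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def orthogonal_encode(sequence, max_len):
    """One-hot encode a peptide, position by position.

    Each position contributes a block of 20 values with a single 1 marking
    the residue found there. Positions beyond the end of the sequence are
    padded with all-zero blocks (an assumption made here so that peptides
    of different lengths produce vectors of equal size).
    """
    vector = []
    for pos in range(max_len):
        block = [0] * len(AMINO_ACIDS)
        if pos < len(sequence):
            block[AMINO_ACIDS.index(sequence[pos])] = 1
        vector.extend(block)
    return vector


# Illustrative arginine-rich sequence, not taken from the paper's dataset.
peptide = "RKKRRQRRR"
encoded = orthogonal_encode(peptide, max_len=12)
print(len(encoded), sum(encoded))
```

Fixed-length vectors of this kind can be fed directly to a classifier such as an SVM, which is the model the abstract reports as suitable.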

An Efficient Machine Learning Model for Prediction of Acute Myocardial Infarction

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666200325104317 ◽

2020 ◽

Vol 13 ◽

Author(s):

Dhilsath Fathima.M ◽

S. Justin Samuel ◽

R. Hari Haran

Keyword(s):

Machine Learning ◽

Myocardial Infarction ◽

Acute Myocardial Infarction ◽

Logistic Regression ◽

Decision Tree ◽

Learning Model ◽

Training Dataset ◽

Data Set ◽

Machine Learning Model ◽

Proposed Model

Aim: This work develops an improved and robust machine learning model for predicting myocardial infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine-learning-based computer-aided analysis system for early and accurate prediction of MI, using the Framingham Heart Study dataset for validation and evaluation. The proposed computer-aided analysis model will support medical professionals in predicting myocardial infarction proficiently. Methods: The proposed model uses mean imputation to fill in the missing values in the data set and then applies principal component analysis (PCA) to extract the optimal features and enhance the performance of the classifiers. After PCA, the reduced features are partitioned into a training dataset and a testing dataset: the 70% training portion is given as input to four popular classifiers (support vector machine, k-nearest neighbor, logistic regression, and decision tree), and the 30% test portion is used to evaluate the output of the machine learning model with the following performance metrics: confusion matrix, classifier accuracy, precision, sensitivity, F1-score, and AUC-ROC curve. Results: The outputs of the classifiers were evaluated using the performance measures. We observed that logistic regression provides higher accuracy than the k-NN, SVM, and decision tree classifiers and that PCA performs well as a feature extraction method for enhancing the performance of the proposed model. From these analyses, we conclude that logistic regression has a good mean accuracy and standard deviation of accuracy compared with the other three algorithms. The AUC-ROC curves of the classifiers (Figures 4 and 5) show that logistic regression exhibits a good AUC-ROC score, around 70%, compared to the k-NN and decision tree algorithms. Conclusion: From the result analysis, we infer that the proposed machine learning model will act as an optimal decision-making system for predicting acute myocardial infarction at an earlier stage than existing machine-learning-based prediction models, and that it can predict the presence of acute myocardial infarction in humans from heart disease risk factors, in order to decide when to start lifestyle modification and medical treatment to prevent heart disease.
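Two of the preprocessing steps named in the Methods, mean imputation and the 70/30 partition, can be sketched with the standard library alone. The rows below are made-up values, not Framingham data, and the function names are hypothetical; the PCA and classifier stages are deliberately left out because they would normally come from a numerical library.

```python
import random
from statistics import mean


def impute_mean(rows):
    """Replace each missing value (None) with the mean of its column."""
    n_cols = len(rows[0])
    col_means = [
        mean(row[j] for row in rows if row[j] is not None)
        for j in range(n_cols)
    ]
    return [
        [col_means[j] if row[j] is None else row[j] for j in range(n_cols)]
        for row in rows
    ]


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Made-up records: [age, systolic blood pressure, total cholesterol].
records = [
    [39, 106.0, 195.0], [46, 121.0, None], [48, 127.5, 245.0],
    [61, None, 225.0], [46, 130.0, 285.0], [43, 180.0, 228.0],
    [63, 138.0, None], [45, 100.0, 313.0], [52, 141.5, 260.0],
    [43, None, 225.0],
]

complete = impute_mean(records)
train, test = train_test_split(complete)
print(len(train), len(test))
```

In a full pipeline the column means would be computed on the training portion only and then reused for the test portion, so that no information from the test data leaks into training; the abstract does not say which order the authors used.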

Smart-ML: A System for Machine Learning Model Exploration using Pipeline Graph

2020 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata50022.2020.9378082 ◽

2020 ◽

Author(s):

Dhaval Patel ◽

Shrey Shrivastava ◽

Wesley Gifford ◽

Stuart Siegel ◽

Jayant Kalagnanam ◽

...

Keyword(s):

Machine Learning ◽

Learning Model ◽

Machine Learning Model
