AI Application in Pharmaceutical Industries Being Beneficial to Material Science

Abstract: The application of AI is likely to advance in materials science much as it has in the pharmaceutical industry. In this article, we explain how AI is applied in the pharmaceutical industry and in materials science. First, we show the trends of AI in data analysis across different areas of the pharmaceutical industry. Second, we explain how a new machine learning platform, AutoML, benefits this type of data analysis, using supervised machine learning as the example. If a target value can be defined, supervised machine learning is a feasible way to solve the problem, and in that case implementing an AutoML process is a simple way to look for insight. Third, we present and discuss an example output from an analysis using unsupervised machine learning, namely topological data analysis (TDA), as a new approach. Finally, we explain that these successful examples of AI applications in pharma provide a potential roadmap for how AI may be applied to materials informatics. Adding new data to the current data is almost always required. Achievements are observed in the life sciences because many databases have been consolidated into one. Thus, creating new data with appropriate definitions and expanding the amount of applicable data will help materials informatics evolve into a field with higher-quality and more robust analyses.

Classification of apatite structures via topological data analysis: a framework for a ‘Materials Barcode’ representation of structure maps

Scientific Reports ◽

10.1038/s41598-021-90070-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Scott Broderick ◽

Ruhil Dongol ◽

Tianmu Zhang ◽

Krishna Rajan

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Crystal Chemistry ◽

Persistent Homology ◽

Hierarchical Classification ◽

Topological Data Analysis ◽

Learning Tool ◽

Coordination Polyhedra ◽

Machine Learning Tool ◽

Topological Data

Abstract: This paper introduces the use of topological data analysis (TDA) as an unsupervised machine learning tool to uncover classification criteria in complex inorganic crystal chemistries. Using apatite chemistry as a template, we use persistent homology to track the topological connectivity of input crystal chemistry descriptors in defining similarity between different stoichiometries of apatites. It is shown that TDA automatically identifies a hierarchical classification scheme within apatites based on the commonality of the number of discrete coordination polyhedra that constitute the structural building units common among the compounds. This information is presented as a barcode of homology classifications, a visualization scheme in which the persistence of similarity between compounds is tracked. Unlike traditional perspectives of structure maps, this new “Materials Barcode” schema serves as an automated exploratory machine learning tool that can uncover structural associations from crystal chemistry databases and provide a more nuanced insight into what defines similarity among homologous compounds.
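
As an illustration of what a persistence barcode records, the following is a minimal sketch of the simplest case: connected components (0-dimensional homology) of a point cloud under a growing distance threshold. It is not the authors' pipeline. The function name `h0_barcode` and the toy descriptor values are hypothetical, and plain Euclidean distance between descriptors is an assumption.

```python
from itertools import combinations
from math import dist


def h0_barcode(points):
    """Return the 0-dimensional persistence bars (birth, death) of a point cloud.

    Every point starts as its own connected component at scale 0. Components
    merge as the distance threshold grows, and each merge ends one bar. The
    last surviving component never dies, so its death is infinity.
    """
    if not points:
        return []

    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the component root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, sorted from shortest to longest.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    bars = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j      # merge the two components
            bars.append((0.0, length))   # one component dies at this scale
    bars.append((0.0, float("inf")))     # the component that persists forever
    return bars


# Hypothetical 2-D descriptors: two tight pairs that sit far apart.
descriptors = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
bars = h0_barcode(descriptors)
```

In this toy input, the two short bars end when each tight pair merges, and the one long finite bar ends only when the two pairs join. Long bars of this kind are what a barcode reading treats as persistent structure.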

On the Application of Topological Data Analysis and Machine Learning to Flood Incidents, and Decision Making

SSRN Electronic Journal ◽

10.2139/ssrn.3981505 ◽

2021 ◽

Author(s):

Felix Obi Ohanuba ◽

Mohd Tahir Ismail ◽

Majid Khan Ali

Keyword(s):

Machine Learning ◽

Decision Making ◽

Data Analysis ◽

Topological Data Analysis ◽

Topological Data

An Overview of Supervised Machine Learning Methods and Data Analysis for COVID-19 Detection

Journal of Healthcare Engineering ◽

10.1155/2021/4733167 ◽

2021 ◽

Vol 2021 ◽

pp. 1-18

Author(s):

Aurelle Tchagna Kouanou ◽

Thomas Mih Attia ◽

Cyrille Feudjio ◽

Anges Fleurio Djeumo ◽

Adèle Ngo Mouelas ◽

...

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

High Rate ◽

Supervised Machine Learning ◽

Polymerase Chain Reaction Test ◽

Support Vector ◽

Machine Learning Algorithm ◽

Test Results

Background and Objective. To mitigate the spread of SARS-CoV-2, the virus responsible for COVID-19, there is an urgent need for massive population testing. Because of the constant shortage of reagents for PCR (polymerase chain reaction) tests, which are the COVID-19 tests par excellence, several medical centers have opted for immunological tests that look for antibodies produced against the virus. However, these tests have a high rate of false positives (positive results for people who are actually negative) and false negatives (negative results for people who are actually positive) and are therefore not always reliable. In this paper, we propose a solution based on data analysis and machine learning to detect COVID-19 infections. Methods. Our analysis and machine learning algorithm are based on the two most cited clinical datasets in the literature: one from San Raffaele Hospital in Milan, Italy, and the other from Hospital Israelita Albert Einstein in São Paulo, Brazil. The datasets were processed to select the features that most influence the target, and almost all of them turned out to be blood parameters. EDA (exploratory data analysis) methods were applied to the datasets, and a comparative study of supervised machine learning models was carried out, after which the support vector machine (SVM) was selected as the best-performing model. Results. As the best performer, SVM is used as our proposed supervised machine learning algorithm. After optimizing the SVM, an accuracy of 99.29%, a sensitivity of 92.79%, and a specificity of 100% were obtained with the dataset from Kaggle (https://www.kaggle.com/einsteindata4u/covid19). The same procedure was applied to the dataset from San Raffaele Hospital (https://zenodo.org/record/3886927#.YIluB5AzbMV). Once more, SVM performed best among the machine learning algorithms, with an accuracy of 92.86%, a sensitivity of 93.55%, and a specificity of 90.91%. Conclusion. The results obtained are superior to others in the literature based on the same datasets, leading us to conclude that our proposed solution is reliable for COVID-19 diagnosis.
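
The three reported metrics follow directly from counts of true and false positives and negatives. The sketch below shows the arithmetic only. The function name `diagnostic_metrics` and the label vectors are hypothetical and are not taken from either dataset.

```python
def diagnostic_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for binary labels.

    Labels are assumed to be 1 for infected and 0 for not infected. The input
    is assumed to contain at least one sample of each class, otherwise the
    sensitivity or specificity denominator would be zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of infected cases detected
        "specificity": tn / (tn + fp),  # share of healthy cases cleared
    }


# Hypothetical ground truth and predictions for ten patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = diagnostic_metrics(y_true, y_pred)
```

Because sensitivity and specificity are computed per class, a model can score a high accuracy on an imbalanced dataset while still missing a noticeable share of positive cases, which is why all three numbers are usually reported together.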

Persistence Bag-of-Words for Topological Data Analysis

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/624 ◽

2019 ◽

Cited By ~ 1

Author(s):

Bartosz Zieliński ◽

Michał Lipiński ◽

Mateusz Juda ◽

Matthias Zeppelzauer ◽

Paweł Dłotko

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Mathematical Theory ◽

State Of The Art ◽

Persistent Homology ◽

Complex Structure ◽

Topological Data Analysis ◽

Bag Of Words ◽

Seamless Integration ◽

Alternative Approaches

Persistent homology (PH) is a rigorous mathematical theory that provides a robust descriptor of data in the form of persistence diagrams (PDs). PDs, however, exhibit complex structure and are difficult to integrate into today's machine learning workflows. This paper introduces persistence bag-of-words: a novel and stable vectorized representation of PDs that enables seamless integration with machine learning. Comprehensive experiments show that the new representation achieves state-of-the-art performance and beyond in much less time than alternative approaches.
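
The general bag-of-words idea can be sketched as follows: each point of a diagram is assigned to its nearest codeword, and the counts form a fixed-length vector. This is a simplified illustration, not the paper's implementation. The hand-picked `codebook`, the use of (birth, persistence) coordinates, and the function name `persistence_bow` are assumptions. In practice a codebook would be learned from training diagrams, for example by clustering.

```python
from math import dist


def persistence_bow(diagram, codebook):
    """Encode a persistence diagram as counts of nearest codewords.

    `diagram` is a list of (birth, death) pairs. `codebook` is a list of
    (birth, persistence) points. The result has one entry per codeword, so
    its length does not depend on how many points the diagram contains.
    """
    # Work in (birth, persistence) coordinates, where persistence = death - birth.
    points = [(birth, death - birth) for birth, death in diagram]
    histogram = [0] * len(codebook)
    for point in points:
        nearest = min(range(len(codebook)), key=lambda k: dist(point, codebook[k]))
        histogram[nearest] += 1
    return histogram


# Hypothetical codebook and diagram, chosen by hand for illustration.
codebook = [(0.0, 0.5), (0.0, 5.0), (2.0, 1.0)]
diagram = [(0.0, 0.4), (0.1, 0.7), (0.0, 6.0), (2.1, 3.0)]
vector = persistence_bow(diagram, codebook)
```

Diagrams with different numbers of points all map to vectors of the same length, which is what lets them feed standard classifiers.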

HMMRATAC: a Hidden Markov ModeleR for ATAC-seq

Nucleic Acids Research ◽

10.1093/nar/gkz533 ◽

2019 ◽

Vol 47 (16) ◽

pp. e91-e91 ◽

Cited By ~ 17

Author(s):

Evan D Tarbell ◽

Tao Liu

Keyword(s):

Machine Learning ◽

Hidden Markov ◽

Current Data ◽

Supervised Machine Learning ◽

Learning Approach ◽

Analysis Tool ◽

Peak Calling ◽

Entire Genome ◽

Machine Learning Approach ◽

Accessible Chromatin

Abstract: ATAC-seq has been widely adopted to identify accessible chromatin regions across the genome. However, current data analysis still relies on approaches initially designed for ChIP-seq or DNase-seq, which do not consider the transposase-digested DNA fragments that carry additional nucleosome positioning information. We present the first dedicated ATAC-seq analysis tool, a semi-supervised machine learning approach named HMMRATAC. HMMRATAC splits a single ATAC-seq dataset into nucleosome-free and nucleosome-enriched signals, learns the unique chromatin structure around accessible regions, and then predicts accessible regions across the entire genome. We show that HMMRATAC outperforms popular peak-calling algorithms on published human ATAC-seq datasets. We also find that single-end sequenced or size-selected ATAC-seq datasets result in a loss of sensitivity compared to paired-end datasets without size selection.

Towards Personalized Diagnosis of Glioblastoma in Fluid-Attenuated Inversion Recovery (FLAIR) by Topological Interpretable Machine Learning

Mathematics ◽

10.3390/math8050770 ◽

2020 ◽

Vol 8 (5) ◽

pp. 770

Author(s):

Matteo Rucco ◽

Giovanna Viticchi ◽

Lorenzo Falsetti

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Tumor Growth ◽

Topological Data Analysis ◽

Textural Features ◽

The Third ◽

Interpretable Machine Learning ◽

Fluid Attenuated Inversion Recovery ◽

Topological Data

Glioblastoma multiforme (GBM) is a fast-growing and highly invasive brain tumor that tends to occur in adults between the ages of 45 and 70 and accounts for 52 percent of all primary brain tumors. GBMs are usually detected by magnetic resonance imaging (MRI). Among MRI sequences, fluid-attenuated inversion recovery (FLAIR) produces a high-quality digital representation of the tumor. Fast computer-aided detection and segmentation techniques are needed to overcome the subjectivity of medical doctors' (MDs') judgment. This study has three main novelties that demonstrate the role of topological features as a new set of radiomics features, which can serve as pillars of a personalized diagnostic system for GBM analysis from FLAIR. First, topological data analysis is used for the first time to analyze GBM from three complementary perspectives: tumor growth at the cell level, temporal evolution of GBM in the follow-up period, and GBM detection. The second novelty is the definition of a new Shannon-like topological entropy, the so-called Generator Entropy. The third novelty is the combination of topological and textural features for training automatic interpretable machine learning. These novelties are demonstrated by three numerical experiments. In the first, topological data analysis of a simplified 2D mathematical model of tumor growth clarified the biochemical conditions that facilitate tumor growth: the higher the concentration of chemical nutrients, the more virulent the process. In the second, topological data analysis was used to evaluate GBM temporal progression on FLAIR recorded within 90 days following treatment completion and at progression; this experiment confirmed that persistent entropy is a viable statistic for monitoring GBM evolution during the follow-up period. In the third experiment, we developed a novel methodology based on topological and textural features and automatic interpretable machine learning for automatic GBM classification on FLAIR. The algorithm reached a classification accuracy of up to 97%.
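
Persistent entropy, the statistic named in the second experiment, is commonly defined as the Shannon entropy of the normalized bar lengths of a persistence barcode. The sketch below follows that standard definition. It does not implement the paper's Generator Entropy, and the function name `persistent_entropy` and the example bars are hypothetical.

```python
from math import log


def persistent_entropy(bars):
    """Shannon entropy of the normalized lengths of finite persistence bars.

    `bars` is a list of (birth, death) pairs with finite death values.
    Bars of zero length are ignored because they carry no probability mass.
    """
    lengths = [death - birth for birth, death in bars if death > birth]
    total = sum(lengths)
    if total == 0:
        return 0.0
    # p_i = l_i / L, and the entropy is -sum(p_i * log(p_i)).
    return -sum((length / total) * log(length / total) for length in lengths)


# Hypothetical barcodes: four equal bars versus one dominant bar.
uniform = persistent_entropy([(0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0)])
skewed = persistent_entropy([(0.0, 9.7), (0.0, 0.1), (0.0, 0.1), (0.0, 0.1)])
```

The entropy is largest when all bars have the same length and drops as a few bars dominate, so a change in its value between scans summarizes a change in the overall shape of the barcode in a single number.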
