AI Application in Pharmaceutical Industries Being Beneficial to Material Science

Author(s):  
Hideo Suzuki ◽  
Shin Kurosawa ◽  
Stephen Marcella ◽  
Masaru Kanba ◽  
Yuichi Koretaka ◽  
...  

Abstract The application of AI will develop further in the area of material technology similarly to how the application has advanced in the pharmaceutical industry. In this article, we explain how AI is applied in the pharmaceutical industry and in the material sciences. First, we show the trends of AI in data analysis for the different areas of the pharmaceutical industry. Second, we explain how the new machine learning platform (AutoML), in particular, benefits this type of data analysis by describing supervised machine learning. If the target value is available to define, executing the supervised machine learning is feasible to solve the problem. In this case, Implementing an AutoML process is the simple solution to look for insight. Third, we provide and discuss an example of an output from analysis done using unsupervised machine learning such as topological data analysis (TDA) as a new approach. Finally, we explain that these successful examples of AI applications in pharma provide a potential roadmap of how they may be applied to the science of material informatics. Adding new data to the current data is almost always required. Achievements are observed in the area of life science because many databases are consolidated into one database. Thus, creating new data with appropriate definitions and expanding the amount of applicable data will help materials informatics evolve into a field with both higher quality and more robust analyses in the future.

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Scott Broderick ◽  
Ruhil Dongol ◽  
Tianmu Zhang ◽  
Krishna Rajan

AbstractThis paper introduces the use of topological data analysis (TDA) as an unsupervised machine learning tool to uncover classification criteria in complex inorganic crystal chemistries. Using the apatite chemistry as a template, we track through the use of persistent homology the topological connectivity of input crystal chemistry descriptors on defining similarity between different stoichiometries of apatites. It is shown that TDA automatically identifies a hierarchical classification scheme within apatites based on the commonality of the number of discrete coordination polyhedra that constitute the structural building units common among the compounds. This information is presented in the form of a visualization scheme of a barcode of homology classifications, where the persistence of similarity between compounds is tracked. Unlike traditional perspectives of structure maps, this new “Materials Barcode” schema serves as an automated exploratory machine learning tool that can uncover structural associations from crystal chemistry databases, as well as to achieve a more nuanced insight into what defines similarity among homologous compounds.


2021 ◽  
Vol 2021 ◽  
pp. 1-18
Author(s):  
Aurelle Tchagna Kouanou ◽  
Thomas Mih Attia ◽  
Cyrille Feudjio ◽  
Anges Fleurio Djeumo ◽  
Adèle Ngo Mouelas ◽  
...  

Background and Objective. To mitigate the spread of the virus responsible for COVID-19, known as SARS-CoV-2, there is an urgent need for massive population testing. Due to the constant shortage of PCR (polymerase chain reaction) test reagents, which are the tests for COVID-19 by excellence, several medical centers have opted for immunological tests to look for the presence of antibodies produced against this virus. However, these tests have a high rate of false positives (positive but actually negative test results) and false negatives (negative but actually positive test results) and are therefore not always reliable. In this paper, we proposed a solution based on Data Analysis and Machine Learning to detect COVID-19 infections. Methods. Our analysis and machine learning algorithm is based on most cited two clinical datasets from the literature: one from San Raffaele Hospital Milan Italia and the other from Hospital Israelita Albert Einstein São Paulo Brasilia. The datasets were processed to select the best features that most influence the target, and it turned out that almost all of them are blood parameters. EDA (Exploratory Data Analysis) methods were applied to the datasets, and a comparative study of supervised machine learning models was done, after which the support vector machine (SVM) was selected as the one with the best performance. Results. SVM being the best performant is used as our proposed supervised machine learning algorithm. An accuracy of 99.29%, sensitivity of 92.79%, and specificity of 100% were obtained with the dataset from Kaggle (https://www.kaggle.com/einsteindata4u/covid19) after applying optimization to SVM. The same procedure and work were performed with the dataset taken from San Raffaele Hospital (https://zenodo.org/record/3886927#.YIluB5AzbMV). Once more, the SVM presented the best performance among other machine learning algorithms, and 92.86%, 93.55%, and 90.91% for accuracy, sensitivity, and specificity, respectively, were obtained. Conclusion. The obtained results, when compared with others from the literature based on these same datasets, are superior, leading us to conclude that our proposed solution is reliable for the COVID-19 diagnosis.


Author(s):  
Bartosz Zieliński ◽  
Michał Lipiński ◽  
Mateusz Juda ◽  
Matthias Zeppelzauer ◽  
Paweł Dłotko

Persistent homology (PH) is a rigorous mathematical theory that provides a robust descriptor of data in the form of persistence diagrams (PDs). PDs exhibit, however, complex structure and are difficult to integrate in today's machine learning workflows. This paper introduces persistence bag-of-words: a novel and stable vectorized representation of PDs that enables the seamless integration with machine learning. Comprehensive experiments show that the new representation achieves state-of-the-art performance and beyond in much less time than alternative approaches.


2019 ◽  
Vol 47 (16) ◽  
pp. e91-e91 ◽  
Author(s):  
Evan D Tarbell ◽  
Tao Liu

Abstract ATAC-seq has been widely adopted to identify accessible chromatin regions across the genome. However, current data analysis still utilizes approaches initially designed for ChIP-seq or DNase-seq, without considering the transposase digested DNA fragments that contain additional nucleosome positioning information. We present the first dedicated ATAC-seq analysis tool, a semi-supervised machine learning approach named HMMRATAC. HMMRATAC splits a single ATAC-seq dataset into nucleosome-free and nucleosome-enriched signals, learns the unique chromatin structure around accessible regions, and then predicts accessible regions across the entire genome. We show that HMMRATAC outperforms the popular peak-calling algorithms on published human ATAC-seq datasets. We find that single-end sequenced or size-selected ATAC-seq datasets result in a loss of sensitivity compared to paired-end datasets without size-selection.


Mathematics ◽  
2020 ◽  
Vol 8 (5) ◽  
pp. 770
Author(s):  
Matteo Rucco ◽  
Giovanna Viticchi ◽  
Lorenzo Falsetti

Glioblastoma multiforme (GBM) is a fast-growing and highly invasive brain tumor, which tends to occur in adults between the ages of 45 and 70 and it accounts for 52 percent of all primary brain tumors. Usually, GBMs are detected by magnetic resonance images (MRI). Among MRI, a fluid-attenuated inversion recovery (FLAIR) sequence produces high quality digital tumor representation. Fast computer-aided detection and segmentation techniques are needed for overcoming subjective medical doctors (MDs) judgment. This study has three main novelties for demonstrating the role of topological features as new set of radiomics features which can be used as pillars of a personalized diagnostic systems of GBM analysis from FLAIR. For the first time topological data analysis is used for analyzing GBM from three complementary perspectives—tumor growth at cell level, temporal evolution of GBM in follow-up period and eventually GBM detection. The second novelty is represented by the definition of a new Shannon-like topological entropy, the so-called Generator Entropy. The third novelty is the combination of topological and textural features for training automatic interpretable machine learning. These novelties are demonstrated by three numerical experiments. Topological Data Analysis of a simplified 2D tumor growth mathematical model had allowed to understand the bio-chemical conditions that facilitate tumor growth—the higher the concentration of chemical nutrients the more virulent the process. Topological data analysis was used for evaluating GBM temporal progression on FLAIR recorded within 90 days following treatment completion and at progression. The experiment had confirmed that persistent entropy is a viable statistics for monitoring GBM evolution during the follow-up period. In the third experiment we developed a novel methodology based on topological and textural features and automatic interpretable machine learning for automatic GBM classification on FLAIR. The algorithm reached a classification accuracy up to 97%.


PLoS ONE ◽  
2020 ◽  
Vol 15 (2) ◽  
pp. e0229821
Author(s):  
Eric Cawi ◽  
Patricio S La Rosa ◽  
Arye Nehorai

2018 ◽  
Vol 51 (14) ◽  
pp. 195-200 ◽  
Author(s):  
Firas A. Khasawneh ◽  
Elizabeth Munch ◽  
Jose A. Perea

Sign in / Sign up

Export Citation Format

Share Document