Classification of apatite structures via topological data analysis: a framework for a ‘Materials Barcode’ representation of structure maps

This paper introduces the use of topological data analysis (TDA) as an unsupervised machine learning tool to uncover classification criteria in complex inorganic crystal chemistries. Using the apatite chemistry as a template, we use persistent homology to track the topological connectivity of input crystal chemistry descriptors in defining similarity between different stoichiometries of apatites. It is shown that TDA automatically identifies a hierarchical classification scheme within apatites based on the number of discrete coordination polyhedra that constitute the structural building units shared among the compounds. This information is presented as a barcode of homology classifications, in which the persistence of similarity between compounds is tracked. Unlike traditional perspectives of structure maps, this new “Materials Barcode” schema serves as an automated exploratory machine learning tool that can uncover structural associations from crystal chemistry databases and provide a more nuanced insight into what defines similarity among homologous compounds.
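As a rough illustration of what a barcode of similarity persistence looks like, the sketch below computes the 0-dimensional Vietoris-Rips barcode of a small point cloud using a union-find over sorted pairwise distances. It is not the authors' pipeline: the function name `h0_barcode`, the Euclidean metric, and the two-feature descriptor values are all assumptions made up for the example.

```python
import math
from itertools import combinations

def h0_barcode(points):
    """Return 0-dimensional Vietoris-Rips bars (birth, death) for a point cloud."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, shortest first (Kruskal-style sweep).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    bars = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two clusters merge at this scale, so one component dies here.
            parent[ri] = rj
            bars.append((0.0, dist))
    # The last surviving component never dies.
    bars.append((0.0, math.inf))
    return bars

# Hypothetical two-feature descriptors for five compounds (made-up numbers).
descriptors = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (3.0, 3.0), (3.1, 3.0)]
bars = h0_barcode(descriptors)
long_bars = [bar for bar in bars if bar[1] > 1.0]
```

Bars that die at a small scale correspond to compounds that are nearly identical in descriptor space; the few long bars indicate well-separated groups, which is the kind of hierarchy a barcode makes visible.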

Topological Machine Learning for Mixed Numeric and Categorical Data

International Journal of Artificial Intelligence Tools ◽ 10.1142/s0218213021500251 ◽ 2021 ◽ Vol 30 (05) ◽ pp. 2150025

Author(s): Chengyuan Wu ◽ Carol Anne Hargreaves

Keyword(s): Machine Learning ◽ Heart Disease ◽ Data Analysis ◽ Real World ◽ Persistent Homology ◽ Topological Data Analysis ◽ Mixed Data ◽ Topological Methods ◽ Cloud Data ◽ Topological Data

Topological data analysis is a relatively new branch of machine learning that excels in studying high-dimensional data, and is theoretically known to be robust against noise. Meanwhile, data objects with mixed numeric and categorical attributes are ubiquitous in real-world applications. However, topological methods are usually applied to point cloud data, and to the best of our knowledge there is no available framework for the classification of mixed data using topological methods. In this paper, we propose a novel topological machine learning method for mixed data classification. In the proposed method, we use theory from topological data analysis such as persistent homology, persistence diagrams and Wasserstein distance to study mixed data. The performance of the proposed method is demonstrated by experiments on a real-world heart disease dataset. Experimental results show that our topological method outperforms several state-of-the-art algorithms in the prediction of heart disease.
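To make the Wasserstein distance between persistence diagrams concrete, here is a minimal brute-force sketch. It is not the classifier proposed in the paper; it only shows the distance itself, under the assumptions that points are compared with the L-infinity norm and that unmatched points are sent to the diagonal. The function name `wasserstein_distance` is chosen for this example, and the exhaustive search over matchings is only practical for very small diagrams.

```python
from itertools import permutations

def wasserstein_distance(diag_a, diag_b, p=1):
    """p-Wasserstein distance between two small persistence diagrams."""
    n, m = len(diag_a), len(diag_b)
    size = n + m  # each diagram is padded with diagonal slots for the other

    def to_diagonal(pt):
        # L-infinity distance from (birth, death) to the diagonal.
        return (pt[1] - pt[0]) / 2.0

    def cost(i, j):
        if i < n and j < m:
            (b1, d1), (b2, d2) = diag_a[i], diag_b[j]
            return max(abs(b1 - b2), abs(d1 - d2))
        if i < n:
            return to_diagonal(diag_a[i])   # point of A matched to the diagonal
        if j < m:
            return to_diagonal(diag_b[j])   # point of B matched to the diagonal
        return 0.0                          # diagonal matched to diagonal

    # Exhaustive search over all perfect matchings of the padded diagrams.
    best = min(
        sum(cost(i, j) ** p for i, j in enumerate(perm))
        for perm in permutations(range(size))
    )
    return best ** (1.0 / p)

# Made-up diagrams: the long bars match each other, the short bar goes to the diagonal.
diag_a = [(0.0, 2.0), (1.0, 1.5)]
diag_b = [(0.0, 2.5)]
distance = wasserstein_distance(diag_a, diag_b)
```

For diagrams of realistic size, an optimal-assignment solver would replace the permutation search; the cost structure stays the same.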

Topological data analysis for true step detection in periodic piecewise constant signals

Proceedings of The Royal Society A Mathematical Physical and Engineering Sciences ◽ 10.1098/rspa.2018.0027 ◽ 2018 ◽ Vol 474 (2218) ◽ pp. 20180027 ◽ Cited By ~ 4

Author(s): Firas A. Khasawneh ◽ Elizabeth Munch

Keyword(s): Data Analysis ◽ Persistent Homology ◽ Topological Data Analysis ◽ Future Research ◽ Accurate Identification ◽ Piecewise Constant ◽ Higher Dimensional ◽ Powerful Approach ◽ Hadamard Transforms ◽ Topological Data

This paper introduces a simple yet powerful approach based on topological data analysis for detecting true steps in a periodic, piecewise constant (PWC) signal. The signal is a two-state square wave with randomly varying in-between-pulse spacing, subject to spurious steps at the rising or falling edges which we call digital ringing. We use persistent homology to derive mathematical guarantees for the resulting change detection which enables accurate identification and counting of the true pulses. The approach is tested using both synthetic and experimental data obtained using an engine lathe instrumented with a laser tachometer. The described algorithm enables accurate and automatic calculations of the spindle speed without any choice of parameters. The results are compared with the frequency and sequency methods of the Fourier and Walsh–Hadamard transforms, respectively. Both our approach and the Fourier analysis yield comparable results for pulses with regular spacing and digital ringing while the latter causes large errors using the Walsh–Hadamard method. Further, the described approach significantly outperforms the frequency/sequency analyses when the spacing between the peaks is varied. We discuss generalizing the approach to higher dimensional PWC signals, although using this extension remains an interesting question for future research.
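The sketch below shows the general idea of using 0-dimensional sublevel-set persistence to separate large steps from small fluctuations in a 1D signal. It is not the authors' algorithm: it does not handle full-amplitude digital ringing, and unlike the parameter-free method described above it relies on an explicit persistence threshold (0.5 here, an arbitrary choice). The function name and the sample signal are assumptions for illustration.

```python
import math

def sublevel_persistence(signal):
    """0-dimensional sublevel-set persistence pairs (birth, death) of a 1D signal."""
    n = len(signal)
    order = sorted(range(n), key=lambda i: signal[i])  # sweep values upward
    parent = [None] * n   # None marks samples not yet added to the sweep
    birth = {}            # component root -> value at which it was born

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for i in order:
        parent[i] = i
        birth[i] = signal[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component with the later birth dies here.
                old, young = (ri, rj) if birth[ri] <= birth[rj] else (rj, ri)
                if signal[i] > birth[young]:
                    pairs.append((birth[young], signal[i]))
                parent[young] = old
    # The component of the global minimum never dies.
    pairs.append((min(signal), math.inf))
    return pairs

# Made-up two-pulse signal with small-amplitude wobble on both levels.
signal = [0.0, 0.1, 0.0, 1.0, 0.9, 1.0, 0.0, 0.05, 0.0, 1.0, 1.0, 0.1, 0.0]
pairs = sublevel_persistence(signal)
steps = [p for p in pairs if math.isfinite(p[1]) and p[1] - p[0] > 0.5]
```

Each pulse separates two low regions, so every pulse contributes one long finite bar, while the wobble produces only short bars that fall below the threshold.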

On the Application of Topological Data Analysis and Machine Learning to Flood Incidents, and Decision Making

SSRN Electronic Journal ◽ 10.2139/ssrn.3981505 ◽ 2021

Author(s): Felix Obi Ohanuba ◽ Mohd Tahir Ismail ◽ Majid Khan Ali

Keyword(s): Machine Learning ◽ Decision Making ◽ Data Analysis ◽ Topological Data Analysis ◽ Topological Data

On Topological Data Analysis for SHM: An Introduction to Persistent Homology

10.1007/978-3-030-76004-5_20 ◽ 2021 ◽ pp. 169-184

Author(s): T. Gowdridge ◽ N. Dervilis ◽ K. Worden

Keyword(s): Data Analysis ◽ Persistent Homology ◽ Topological Data Analysis ◽ Topological Data

Topological Data Analysis as a Morphometric Method: Using Persistent Homology to Demarcate a Leaf Morphospace

Frontiers in Plant Science ◽ 10.3389/fpls.2018.00553 ◽ 2018 ◽ Vol 9 ◽ Cited By ~ 18

Author(s): Mao Li ◽ Hong An ◽ Ruthie Angelovici ◽ Clement Bagaza ◽ Albert Batushansky ◽ ...

Keyword(s): Data Analysis ◽ Persistent Homology ◽ Topological Data Analysis ◽ Morphometric Method ◽ Topological Data

Flow Estimation Only from Image Data, based on Persistent Homology

10.21203/rs.3.rs-330050/v1 ◽ 2021

Author(s): Anna Suzuki ◽ Miyuki Miyazawa ◽ James Minto ◽ Takeshi Tsuji ◽ Ippei Obayashi ◽ ...

Keyword(s): Data Analysis ◽ Persistent Homology ◽ Image Data ◽ Fracture Network ◽ Topological Data Analysis ◽ Geometric Information ◽ Flow Estimation ◽ Flow Phenomena ◽ Network Patterns ◽ Topological Data

Topological data analysis is an emerging concept of data analysis for characterizing shapes. A state-of-the-art tool in topological data analysis is persistent homology, which is expected to summarize quantified topological and geometric features. Although persistent homology is useful for revealing topological and geometric information, its parameters are difficult to interpret and difficult to relate directly to physical properties. In this study, we focus on the connectivity and apertures of flow channels detected by persistent homology analysis. We propose a method to estimate permeability in fracture networks from parameters of persistent homology. Synthetic 3D fracture network patterns and their direct flow simulations are used for validation. The results suggest that persistent homology can estimate fluid flow in fracture networks based on image data. This method can easily derive flow phenomena from information about the structure.

Using Topological Data Analysis (TDA) and Persistent Homology to Analyze the Stock Markets in Singapore and Taiwan

Frontiers in Physics ◽ 10.3389/fphy.2021.572216 ◽ 2021 ◽ Vol 9

Author(s): Peter Tsung-Wen Yen ◽ Siew Ann Cheong

Keyword(s): Data Analysis ◽ Stock Markets ◽ Persistent Homology ◽ Betti Numbers ◽ Stock Index ◽ Topological Data Analysis ◽ Series Data ◽ Topological Features ◽ Market Crashes ◽ Topological Data

In recent years, persistent homology (PH) and topological data analysis (TDA) have gained increasing attention in the fields of shape recognition, image analysis, data analysis, machine learning, computer vision, computational biology, brain functional networks, financial networks, haze detection, etc. In this article, we focus on stock markets and demonstrate how TDA can be useful in this regard. We first explain signatures that can be detected using TDA, for three toy models of topological changes. We then show how to go beyond network concepts like nodes (0-simplex) and links (1-simplex), and the standard minimal spanning tree or planar maximally filtered graph picture of the cross correlations in stock markets, to work with faces (2-simplex) or any k-dimensional simplex in TDA. By scanning through a full range of correlation thresholds in a procedure called filtration, we were able to examine robust topological features (i.e. features less susceptible to random noise) in higher dimensions. To demonstrate the advantages of TDA, we collected time-series data from the Straits Times Index and Taiwan Capitalization Weighted Stock Index (TAIEX), and then computed barcodes, persistence diagrams, persistent entropy, the bottleneck distance, Betti numbers, and Euler characteristic. We found that during periods of market crashes, the homology groups become less persistent as we vary the characteristic correlation. For both markets, we found consistent signatures associated with market crashes in the Betti numbers, Euler characteristics, and persistent entropy, in agreement with our theoretical expectations.
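The filtration over correlation thresholds can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes two assets are linked when their correlation is at least the threshold, builds the clique complex only up to triangles (2-simplices), and therefore reports an Euler characteristic truncated at dimension 2. The function name and the 4x4 correlation matrix are made up for the example.

```python
from itertools import combinations

def clique_complex_summary(corr, threshold):
    """Return (Betti-0, truncated Euler characteristic) at one correlation threshold."""
    n = len(corr)
    # 1-simplices: pairs whose correlation reaches the threshold (assumed convention).
    edges = {
        (i, j) for i, j in combinations(range(n), 2)
        if corr[i][j] >= threshold
    }
    # 2-simplices: triples whose three edges are all present.
    triangles = [
        (i, j, k) for i, j, k in combinations(range(n), 3)
        if (i, j) in edges and (i, k) in edges and (j, k) in edges
    ]

    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in edges:
        parent[find(i)] = find(j)

    betti0 = len({find(i) for i in range(n)})       # connected components
    euler = n - len(edges) + len(triangles)         # V - E + T
    return betti0, euler

# Made-up correlation matrix for four hypothetical stocks.
corr = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]
# Scan from strict to permissive thresholds.
summaries = {t: clique_complex_summary(corr, t) for t in (0.95, 0.75, 0.5, 0.25)}
```

Tracking how these numbers change as the threshold is lowered gives a coarse view of which features persist across scales; a full persistent homology computation would also pair births and deaths of individual features.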

Persistence Bag-of-Words for Topological Data Analysis

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2019/624 ◽ 2019 ◽ Cited By ~ 1

Author(s): Bartosz Zieliński ◽ Michał Lipiński ◽ Mateusz Juda ◽ Matthias Zeppelzauer ◽ Paweł Dłotko

Keyword(s): Machine Learning ◽ Data Analysis ◽ Mathematical Theory ◽ State Of The Art ◽ Persistent Homology ◽ Complex Structure ◽ Topological Data Analysis ◽ Bag Of Words ◽ Seamless Integration ◽ Alternative Approaches

Persistent homology (PH) is a rigorous mathematical theory that provides a robust descriptor of data in the form of persistence diagrams (PDs). PDs exhibit, however, complex structure and are difficult to integrate in today's machine learning workflows. This paper introduces persistence bag-of-words: a novel and stable vectorized representation of PDs that enables the seamless integration with machine learning. Comprehensive experiments show that the new representation achieves state-of-the-art performance and beyond in much less time than alternative approaches.
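A bag-of-words style vectorization of a persistence diagram can be sketched as a histogram of nearest codewords. The example below is a simplified illustration rather than the paper's implementation: the codebook is hard-coded (in practice it would be learned, for example by clustering points pooled from many diagrams), points are mapped to birth-persistence coordinates as an assumption, and the function name `pbow_vector` is invented here.

```python
import math

def pbow_vector(diagram, codebook):
    """Count how many diagram points fall nearest to each codeword."""
    counts = [0] * len(codebook)
    for birth, death in diagram:
        # Work in (birth, persistence) coordinates; an assumption of this sketch.
        point = (birth, death - birth)
        nearest = min(
            range(len(codebook)),
            key=lambda k: math.dist(point, codebook[k]),
        )
        counts[nearest] += 1
    return counts

# Hypothetical codebook: short-lived, long-lived, and late-born features.
codebook = [(0.0, 0.1), (0.0, 1.0), (1.0, 0.5)]
# Made-up persistence diagram given as (birth, death) pairs.
diagram = [(0.0, 0.1), (0.05, 0.2), (0.0, 1.1), (0.9, 1.5)]
vector = pbow_vector(diagram, codebook)
```

The result is a fixed-length vector regardless of how many points the diagram has, which is what allows diagrams to be fed to standard classifiers.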

Persistent homology approach distinguishes potential pattern between “Early” and “Not Early” hepatic decompensation groups using MRI modalities

Current Directions in Biomedical Engineering ◽ 10.1515/cdbme-2021-2124 ◽ 2021 ◽ Vol 7 (2) ◽ pp. 488-491

Author(s): Yashbir Singh ◽ William Jons ◽ Gian Marco Conte ◽ Jaidip Jagtap ◽ Kuan Zhang ◽ ...

Keyword(s): Data Analysis ◽ Liver Failure ◽ Algebraic Topology ◽ Persistent Homology ◽ Topological Data Analysis ◽ Topological Representation ◽ Hepatic Decompensation ◽ Topological Data

Primary sclerosing cholangitis (PSC) predisposes individuals to liver failure, but it is challenging for radiologists examining radiologic images to predict which patients with PSC will ultimately develop liver failure. Motivated by algebraic topology, a topological data analysis-inspired framework was adopted to study the imaging patterns of the “Early Decompensation” and “Not Early” groups. The results demonstrate that the proposed methodology discriminates between the “Early Decompensation” and “Not Early” groups. Our study is the first attempt to apply a topological representation-based method to distinguishing early hepatic decompensation from not-early groups.

Towards Personalized Diagnosis of Glioblastoma in Fluid-Attenuated Inversion Recovery (FLAIR) by Topological Interpretable Machine Learning

Mathematics ◽ 10.3390/math8050770 ◽ 2020 ◽ Vol 8 (5) ◽ pp. 770

Author(s): Matteo Rucco ◽ Giovanna Viticchi ◽ Lorenzo Falsetti

Keyword(s): Machine Learning ◽ Data Analysis ◽ Tumor Growth ◽ Topological Data Analysis ◽ Textural Features ◽ The Third ◽ Interpretable Machine Learning ◽ Fluid Attenuated Inversion Recovery ◽ Topological Data

Glioblastoma multiforme (GBM) is a fast-growing and highly invasive brain tumor, which tends to occur in adults between the ages of 45 and 70 and accounts for 52 percent of all primary brain tumors. Usually, GBMs are detected by magnetic resonance imaging (MRI). Among MRI sequences, fluid-attenuated inversion recovery (FLAIR) produces a high-quality digital tumor representation. Fast computer-aided detection and segmentation techniques are needed to overcome the subjectivity of medical doctors' (MDs') judgment. This study has three main novelties that demonstrate the role of topological features as a new set of radiomics features which can be used as pillars of a personalized diagnostic system for GBM analysis from FLAIR. For the first time, topological data analysis is used to analyze GBM from three complementary perspectives: tumor growth at the cell level, temporal evolution of GBM in the follow-up period, and GBM detection. The second novelty is the definition of a new Shannon-like topological entropy, the so-called Generator Entropy. The third novelty is the combination of topological and textural features for training automatic interpretable machine learning. These novelties are demonstrated by three numerical experiments. Topological data analysis of a simplified 2D mathematical model of tumor growth allowed us to understand the biochemical conditions that facilitate tumor growth: the higher the concentration of chemical nutrients, the more virulent the process. Topological data analysis was used to evaluate GBM temporal progression on FLAIR recorded within 90 days following treatment completion and at progression. The experiment confirmed that persistent entropy is a viable statistic for monitoring GBM evolution during the follow-up period. In the third experiment we developed a novel methodology based on topological and textural features and automatic interpretable machine learning for automatic GBM classification on FLAIR. The algorithm reached a classification accuracy of up to 97%.
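Persistent entropy, mentioned above as a monitoring statistic, is the Shannon entropy of the normalized bar lengths of a barcode. The sketch below shows that standard definition only; it does not implement the Generator Entropy introduced in the paper. Dropping infinite bars is an assumption of this sketch (capping them at a maximum filtration value is a common alternative), and the function name is chosen for the example.

```python
import math

def persistent_entropy(bars):
    """Shannon entropy of normalized bar lengths for (birth, death) bars."""
    # Keep only finite bars with positive length (assumed convention).
    lengths = [d - b for b, d in bars if math.isfinite(d) and d > b]
    total = sum(lengths)
    if total == 0:
        return 0.0
    return -sum((l / total) * math.log(l / total) for l in lengths)

# Made-up barcodes: one dominated by a single long bar, one with equal bars.
dominated = [(0.0, 10.0), (0.0, 0.1), (0.0, 0.1)]
uniform = [(0.0, 1.0), (2.0, 3.0)]
entropy_dominated = persistent_entropy(dominated)
entropy_uniform = persistent_entropy(uniform)
```

Entropy is highest when all bars have the same length and drops as one feature dominates, so a change in the value over time signals a change in how persistence is distributed across features.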
