Optimization of Spectral Wavelets for Persistence-Based Graph Classification

A graph's spectral wavelet signature determines a filtration, and consequently an associated set of extended persistence diagrams. We propose a framework that optimizes the choice of wavelet for a dataset of graphs, such that their associated persistence diagrams capture features of the graphs that are best suited to a given data science problem. Since the spectral wavelet signature of a graph is derived from its Laplacian, our framework encodes geometric properties of graphs in their associated persistence diagrams and can be applied to graphs without a priori node attributes. We apply our framework to graph classification problems and obtain performances competitive with other persistence-based architectures. To provide the underlying theoretical foundations, we extend the differentiability result for ordinary persistent homology to extended persistent homology.

Download Full-text

A Persistent Homology Approach to Heart Rate Variability Analysis With an Application to Sleep-Wake Classification

Frontiers in Physiology ◽

10.3389/fphys.2021.637684 ◽

2021 ◽

Vol 12 ◽

Author(s):

Yu-Min Chung ◽

Chuan-Shen Hu ◽

Yu-Lun Lo ◽

Hau-Tieng Wu

Keyword(s):

Heart Rate ◽

Time Series ◽

Heart Rate Variability ◽

Algebraic Topology ◽

Persistent Homology ◽

Heart Rate Variability Analysis ◽

Analysis Tool ◽

Classification Problems ◽

Hrv Analysis ◽

Persistence Diagrams

Persistent homology is a recently developed theory in the field of algebraic topology to study shapes of datasets. It is an effective data analysis tool that is robust to noise and has been widely applied. We demonstrate a general pipeline to apply persistent homology to study time series, particularly the instantaneous heart rate time series for the heart rate variability (HRV) analysis. The first step is capturing the shapes of time series from two different aspects—the persistent homologies and hence persistence diagrams of its sub-level set and Taken's lag map. Second, we propose a systematic and computationally efficient approach to summarize persistence diagrams, which we coined persistence statistics. To demonstrate our proposed method, we apply these tools to the HRV analysis and the sleep-wake, REM-NREM (rapid eyeball movement and non rapid eyeball movement) and sleep-REM-NREM classification problems. The proposed algorithm is evaluated on three different datasets via the cross-database validation scheme. The performance of our approach is better than the state-of-the-art algorithms, and the result is consistent throughout different datasets.

Download Full-text

Coloring Graph Neural Networks for Node Disambiguation

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/294 ◽

2020 ◽

Author(s):

George Dasoulas ◽

Ludovic Dos Santos ◽

Kevin Scaman ◽

Aladin Virmaux

Keyword(s):

Neural Networks ◽

Message Passing ◽

State Of The Art ◽

Structural Characteristics ◽

Expressive Power ◽

Continuous Functions ◽

Graph Classification ◽

Node Attributes ◽

Graph Neural Networks ◽

Coloring Graph

In this paper, we show that a simple coloring scheme can improve, both theoretically and empirically, the expressive power of Message Passing Neural Networks (MPNNs). More specifically, we introduce a graph neural network called Colored Local Iterative Procedure (CLIP) that uses colors to disambiguate identical node attributes, and show that this representation is a universal approximator of continuous functions on graphs with node attributes. Our method relies on separability, a key topological characteristic that allows to extend well-chosen neural networks into universal representations. Finally, we show experimentally that CLIP is capable of capturing structural characteristics that traditional MPNNs fail to distinguish, while being state-of-the-art on benchmark graph classification datasets.

Download Full-text

Persistent Homology in Data Science

Data Science – Analytics and Applications ◽

10.1007/978-3-658-32182-6_13 ◽

2021 ◽

pp. 81-88

Author(s):

Stefan Huber

Keyword(s):

Data Science ◽

Persistent Homology

Download Full-text

A procedure for meaningful unsupervised clustering and its application for solvent classification

Open Chemistry ◽

10.2478/s11532-014-0514-6 ◽

2014 ◽

Vol 12 (5) ◽

pp. 594-603 ◽

Cited By ~ 2

Author(s):

Yaroslava Pushkarova ◽

Yuriy Kholin

Keyword(s):

Neural Networks ◽

A Priori ◽

Initial Distribution ◽

C60 Fullerene ◽

Classification Problems ◽

Probabilistic Networks ◽

Reliable Classification ◽

Priori Information ◽

Number Of Classes

AbstractArtificial neural networks have proven to be a powerful tool for solving classification problems. Some difficulties still need to be overcome for their successful application to chemical data. The use of supervised neural networks implies the initial distribution of patterns between the pre-determined classes, while attribution of objects to the classes may be uncertain. Unsupervised neural networks are free from this problem, but do not always reveal the real structure of data. Classification algorithms which do not require a priori information about the distribution of patterns between the pre-determined classes and provide meaningful results are of special interest. This paper presents an approach based on the combination of Kohonen and probabilistic networks which enables the determination of the number of classes and the reliable classification of objects. This is illustrated for a set of 76 solvents based on nine characteristics. The resulting classification is chemically interpretable. The approach proved to be also applicable in a different field, namely in examining the solubility of C60 fullerene. The solvents belonging to the same group demonstrate similar abilities to dissolve C60. This makes it possible to estimate the solubility of fullerenes in solvents for which there are no experimental data

Download Full-text

Evaluating recommender systems for AI-driven biomedical informatics

Bioinformatics ◽

10.1093/bioinformatics/btaa698 ◽

2020 ◽

Cited By ~ 1

Author(s):

William La Cava ◽

Heather Williams ◽

Weixuan Fu ◽

Steve Vitale ◽

Durga Srivatsan ◽

...

Keyword(s):

Septic Shock ◽

Recommender Systems ◽

Data Science ◽

Expert Knowledge ◽

State Of The Art ◽

Optimal Algorithm ◽

Supplementary Information ◽

Biomedical Data ◽

Classification Problems ◽

Systems Research

Abstract Motivation Many researchers with domain expertise are unable to easily apply machine learning (ML) to their bioinformatics data due to a lack of ML and/or coding expertise. Methods that have been proposed thus far to automate ML mostly require programming experience as well as expert knowledge to tune and apply the algorithms correctly. Here, we study a method of automating biomedical data science using a web-based AI platform to recommend model choices and conduct experiments. We have two goals in mind: first, to make it easy to construct sophisticated models of biomedical processes; and second, to provide a fully automated AI agent that can choose and conduct promising experiments for the user, based on the user’s experiments as well as prior knowledge. To validate this framework, we conduct an experiment on 165 classification problems, comparing to state-of-the-art, automated approaches. Finally, we use this tool to develop predictive models of septic shock in critical care patients. Results We find that matrix factorization-based recommendation systems outperform metalearning methods for automating ML. This result mirrors the results of earlier recommender systems research in other domains. The proposed AI is competitive with state-of-the-art automated ML methods in terms of choosing optimal algorithm configurations for datasets. In our application to prediction of septic shock, the AI-driven analysis produces a competent ML model (AUROC 0.85±0.02) that performs on par with state-of-the-art deep learning results for this task, with much less computational effort. Availability and implementation PennAI is available free of charge and open-source. It is distributed under the GNU public license (GPL) version 3. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Constructing Cost-Sensitive Fuzzy-Rule-Based Systems for Pattern Classification Problems

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2007.p0546 ◽

2007 ◽

Vol 11 (6) ◽

pp. 546-553 ◽

Cited By ~ 4

Author(s):

Tomoharu Nakashima ◽

◽

Yasuyuki Yokota ◽

Hisao Ishibuchi ◽

Gerald Schaefer ◽

...

Keyword(s):

Pattern Classification ◽

A Priori ◽

Fuzzy Rule ◽

Fuzzy Classification ◽

Classification Error ◽

Rule Generation ◽

Classification Problems ◽

Misclassification Cost ◽

Rule Based ◽

Rule Based Systems

We evaluate the performance of cost-sensitive fuzzy-rule-based systems for pattern classification problems. We assume that a misclassification cost is given a priori for each training pattern. The task of classification thus becomes to minimize both classification error and misclassification cost. We examine the performance of two types of fuzzy classification based on fuzzy if-then rules generated from training patterns. The difference is whether or not they consider misclassification costs in rule generation. In our computational experiments, we use several specifications of misclassification cost to evaluate the performance of the two classifiers. Experimental results show that both classification error and misclassification cost are reduced by considering the misclassification cost in fuzzy rule generation.

Download Full-text

Approximate conversion of Bézier curves

Bulletin of the Australian Mathematical Society ◽

10.1017/s0004972700013988 ◽

1995 ◽

Vol 51 (1) ◽

pp. 153-162 ◽

Cited By ~ 2

Author(s):

Yungeom Park ◽

U Jin Choi ◽

Ha-Jine Kimn

Keyword(s):

Error Analysis ◽

A Priori ◽

Approximation Error ◽

Bézier Curve ◽

Bezier Curve ◽

A Priori Bounds ◽

Bezier Curves ◽

Bézier Curves ◽

Geometric Properties

The methods for generating a polynomial Bézier approximation of degree n − 1 to an nth degree Bézier curve, and error analysis, are presented. The methods are based on observations of the geometric properties of Bézier curves. The approximation agrees at the two endpoints up to a preselected smoothness order. The methods allow a detailed error analysis, providing a priori bounds of the point-wise approximation error. The error analysis for other authors’ methods is also presented.

Download Full-text

Integration and modularity in Procrustes shape data: is there a risk of spurious results?

10.1101/371187 ◽

2018 ◽

Author(s):

Andrea Cardini

Keyword(s):

A Priori ◽

Relative Size ◽

Evolutionary Developmental Biology ◽

Morphological Integration ◽

Future Research ◽

Anatomical Landmarks ◽

Spurious Results ◽

Theoretical Foundations ◽

Shape Data ◽

3D Surfaces

AbstractStudies of morphological integration and modularity are a hot topic in evolutionary developmental biology. Geometric morphometrics using Procrustes methods offers powerful tools to quantitatively investigate morphological variation and, within this methodological framework, a number of different methods has been put forward to test if different regions within an anatomical structure behave like modules or,vice versa, are highly integrated and covary strongly. Although some exploratory techniques do not requirea priorimodules, commonly modules are specified in advance based on prior knowledge. Once this is done, most of the methods can be applied either by subdividing modules and performing separate Procrustes alignments or by splitting shape coordinates of anatomical landmarks into modules after a common superimposition. This second approach is particularly interesting because, contrary to completely separate blocks analyses, it preserves information on relative size and position of the putative modules. However, it also violates one of the fundamental assumptions on which Procrustes methods are based, which is that one should not analyse or interpret subsets of landmarks from a common superimposition, because the choice of that superimposition is purely based on statistical convenience (although with sound theoretical foundations) and not on a biological model of variance and covariance. In this study, I offer a first investigation of the effects of testing integration and modularity within a configuration of commonly superimposed landmarks using some of the most widely employed statistical methods available to this aim. When applied to simulated shapes with random non-modular isotropic variation, standard methods frequently recovered significant but arbitrary patterns of integration and modularity. Re-superimposing landmarks within each module, before testing integration or modularity, generally removes this artifact. The study, although preliminary and exploratory in nature, raises an important issue and indicates an avenue for future research. It also suggests that great caution should be exercised in the application and interpretation of findings from analyses of modularity and integration using Procrustes shape data, and that issues might be even more serious using some of the most common methods for handling the increasing popular semilandmark data used to analyse 2D outlines and 3D surfaces.

Download Full-text

Persistent Homology for Virtual Screening

10.26434/chemrxiv.6969260 ◽

2018 ◽

Author(s):

Bryn Keller ◽

Michael Lesnick ◽

Theodore L. Willke

Keyword(s):

Virtual Screening ◽

Open Source Software ◽

Data Science ◽

Persistent Homology ◽

General Purpose ◽

Pharmaceutical Companies ◽

Domain Specific ◽

Current State ◽

Chemical Signatures ◽

Simple Question

<div> <div> <div> <p>Finding new medicines is one of the most important tasks of pharmaceutical companies. One of the best approaches to finding a new drug starts with answering this simple question: Given a known effective drug X, what are the top 100 molecules in our database most similar to X? Thus the essence of the problem is a nearest-neighbors search, and the key question is how to define the distance between two molecules in the database. In this paper, we investigate the use of topological, rather than geometric, or chemical, signatures for molecules, and two notions of distance that come from comparing these topological signatures. We introduce PH_VS (Persistent Homology for Virtual Screening), a new system for ligand-based screening using a topological technique known as multi-parameter persistent homology. We show that our approach can match or exceed a reasonable estimate of current state of the art (including well-funded commercial tools), even with relatively little domain-specific tuning. Indeed, most of the components we have built for this system are general-purpose tools for data science and will be released soon as open source software. </p> </div> </div> </div>

Download Full-text

Topological Data Analysis with Applications

10.1017/9781108975704 ◽

2021 ◽

Author(s):

Gunnar Carlsson ◽

Mikael Vejdemo-Johansson

Keyword(s):

Data Analysis ◽

Data Science ◽

Persistent Homology ◽

Topological Data Analysis ◽

Data Sets ◽

Advanced Learners ◽

Computer Scientists ◽

Dramatic Rise ◽

Modeling Data ◽

Topological Data

The continued and dramatic rise in the size of data sets has meant that new methods are required to model and analyze them. This timely account introduces topological data analysis (TDA), a method for modeling data by geometric objects, namely graphs and their higher-dimensional versions: simplicial complexes. The authors outline the necessary background material on topology and data philosophy for newcomers, while more complex concepts are highlighted for advanced learners. The book covers all the main TDA techniques, including persistent homology, cohomology, and Mapper. The final section focuses on the diverse applications of TDA, examining a number of case studies drawn from monitoring the progression of infectious diseases to the study of motion capture data. Mathematicians moving into data science, as well as data scientists or computer scientists seeking to understand this new area, will appreciate this self-contained resource which explains the underlying technology and how it can be used.

Download Full-text