Active learning and relevance vector machine in efficient estimate of basin stability for large-scale dynamic networks

2021 · Vol 31 (5) · pp. 053129
Author(s): Yiming Che, Changqing Cheng
2019
Author(s): Kyle Konze, Pieter Bos, Markus Dahlgren, Karl Leswing, Ivan Tubert-Brohman, ...

We report a new computational technique, PathFinder, that uses retrosynthetic analysis followed by combinatorial synthesis to generate novel compounds in synthetically accessible chemical space. Coupling PathFinder with active learning and cloud-based free energy calculations allows large-scale potency predictions on a timescale that impacts drug discovery. The process is further accelerated by combining population-based statistics with active learning techniques. Using this approach, we rapidly optimized R-groups and core hops for inhibitors of cyclin-dependent kinase 2. We explored more than 300,000 ideas and identified 35 ligands with diverse commercially available R-groups and a predicted IC50 < 100 nM, as well as four unique cores with a predicted IC50 < 100 nM. The rapid turnaround time and scale of chemical exploration suggest that this is a useful approach for accelerating the discovery of novel chemical matter in drug discovery campaigns.
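
To make the active-learning component concrete, here is a minimal sketch (not the authors' PathFinder code) of a batched loop that uses a cheap surrogate model to decide which enumerated compounds are worth sending to expensive free-energy scoring. The fep_score oracle and the featurized compound pool are hypothetical stand-ins for the cloud-based FEP calculation and a fingerprint generator.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def tree_predictions(model, X):
    # Per-tree predictions give a cheap, spread-based uncertainty estimate.
    return np.stack([tree.predict(X) for tree in model.estimators_])

def active_learning_fep(pool_features, fep_score, n_init=100, batch=50, rounds=10, seed=0):
    """pool_features: (n_compounds, n_bits) fingerprint matrix; fep_score(i) is the
    expensive oracle (hypothetical stand-in for a cloud FEP job) returning a potency
    score where higher means better predicted binding."""
    rng = np.random.default_rng(seed)
    scores = {int(i): fep_score(int(i))
              for i in rng.choice(len(pool_features), n_init, replace=False)}
    for _ in range(rounds):
        labeled = list(scores)
        model = RandomForestRegressor(n_estimators=200, n_jobs=-1, random_state=seed)
        model.fit(pool_features[labeled], [scores[i] for i in labeled])
        unlabeled = [i for i in range(len(pool_features)) if i not in scores]
        preds = tree_predictions(model, pool_features[unlabeled])
        # Upper-confidence-bound utility: exploit predicted potency, explore uncertainty.
        utility = preds.mean(axis=0) + preds.std(axis=0)
        for j in np.argsort(utility)[-batch:]:
            idx = unlabeled[int(j)]
            scores[idx] = fep_score(idx)   # run the expensive calculation only on selected ideas
    return scores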


2020 · Vol 20 (4) · pp. 1-24
Author(s): Weichao Gao, James Nguyen, Yalong Wu, William G. Hatcher, Wei Yu

2021
Author(s): Kai Xu, Lei Yan, Bingran You

A force field is a central requirement in molecular dynamics (MD) simulation for accurately describing the potential energy landscape and the time evolution of individual atomic motions. Most energy models are limited by a fundamental tradeoff between accuracy and speed. Although ab initio MD based on density functional theory (DFT) is highly accurate, its computational cost prevents its use for large-scale and long-timescale simulations. Here, we use Bayesian active learning to construct a Gaussian process model of interatomic forces to describe Pt deposited on Ag(111). An accurate model is obtained within one day of wall time after selecting only 126 atomic environments based on two- and three-body interactions, yielding mean absolute errors of 52 and 142 meV/Å for Ag and Pt, respectively. Our work highlights automated and minimalistic training of machine-learning force fields with high fidelity to DFT, which would enable large-scale, long-timescale simulations of alloy surfaces at first-principles accuracy.
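
As a rough illustration of the on-the-fly selection strategy described above (and only that: the descriptor, kernel, and dft_forces oracle below are simplified, hypothetical stand-ins, not the authors' setup), a Gaussian process can be queried for its predictive uncertainty on each new atomic environment and retrained only when that uncertainty exceeds a threshold.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def train_on_the_fly(environments, descriptor, dft_forces, threshold=0.05, max_calls=126):
    """environments: iterable of atomic environments (e.g. frames of an MD run);
    descriptor(env) -> 1-D feature vector; dft_forces(env) -> force components (expensive)."""
    X, y = [], []
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-4))
    for env in environments:
        x = np.atleast_2d(descriptor(env))
        if not X:
            uncertainty = np.inf                      # nothing learned yet: always call DFT first
        else:
            _, std = gp.predict(x, return_std=True)
            uncertainty = float(np.max(std))
        if uncertainty > threshold and len(X) < max_calls:
            X.append(x[0])
            y.append(dft_forces(env))                 # expensive ab initio call, made sparingly
            gp.fit(np.asarray(X), np.asarray(y))
        # otherwise the GP force prediction is trusted and the simulation proceeds
    return gp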


2022 · Vol 16 (1) · pp. 1-34
Author(s): Yiji Zhao, Youfang Lin, Zhihao Wu, Yang Wang, Haomin Wen

Dynamic networks are widely used in the social, physical, and biological sciences as a concise mathematical representation of the evolving interactions in dynamic complex systems. Measuring distances between network snapshots is important for analyzing and understanding the evolution of dynamic systems. To the best of our knowledge, however, existing network distance measures are designed for static networks; when measuring the distance between any two snapshots of a dynamic network, they ignore the valuable context structure information present in the other snapshots. To guide the construction of context-aware distance measures, we propose a context-aware distance paradigm, which introduces context information to enrich the general definition of network distance measures. A Context-aware Spectral Distance (CSD) is then given as an instance of the paradigm by constructing a context-aware spectral representation to replace the core component of the traditional Spectral Distance (SD). In a node-aligned dynamic network, the context gives CSD three main advantages over SD: (1) CSD is not affected by isospectral problems; (2) CSD satisfies all the requirements of a metric, while SD does not; and (3) CSD is computationally efficient. To handle large-scale networks, we develop kCSD, which computes only the top-k eigenvalues to further reduce the computational complexity of CSD. Although kCSD is a pseudo-metric, it retains most of the advantages of CSD. Experimental results in two practical applications, i.e., event detection and network clustering in dynamic networks, show that our context-aware spectral distance outperforms the traditional spectral distance in terms of accuracy, stability, and computational efficiency. In addition, context-aware spectral distance outperforms other baseline methods.
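
For reference, a minimal sketch of the traditional top-k spectral distance that SD/kCSD is built around is shown below; the context-aware spectral representation itself is the paper's contribution and is not reproduced here.

import numpy as np
import networkx as nx

def topk_spectral_distance(G1, G2, k=10):
    """Euclidean distance between the k largest-magnitude adjacency eigenvalues of two
    (undirected) snapshots -- the baseline spectral distance, with no context information."""
    def topk_eigs(G):
        A = nx.to_numpy_array(G)                      # symmetric adjacency -> real spectrum
        vals = np.linalg.eigvalsh(A)
        return np.sort(np.abs(vals))[::-1][:k]
    e1, e2 = topk_eigs(G1), topk_eigs(G2)
    m = min(len(e1), len(e2))                         # snapshots may have fewer than k nodes
    return float(np.linalg.norm(e1[:m] - e2[:m]))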


Author(s): Shaolei Wang, Zhongyuan Wang, Wanxiang Che, Sendong Zhao, Ting Liu

Spoken language is fundamentally different from written language in that it contains frequent disfluencies, parts of an utterance that are corrected by the speaker. Disfluency detection (removing these disfluencies) is desirable to clean the input for downstream NLP tasks. Most existing approaches to disfluency detection rely heavily on human-annotated data, which is scarce and expensive to obtain in practice. To tackle this training-data bottleneck, we investigate methods for combining self-supervised learning and active learning for disfluency detection. First, we construct large-scale pseudo training data by randomly adding or deleting words from unlabeled data and propose two self-supervised pre-training tasks: (i) a tagging task to detect the added noisy words and (ii) sentence classification to distinguish original sentences from grammatically incorrect sentences. We then combine these two tasks to jointly pre-train a neural network, which is subsequently fine-tuned on human-annotated disfluency detection data. The self-supervised learning method captures task-specific knowledge for disfluency detection and achieves better performance than other supervised methods when fine-tuned on a small annotated dataset. However, because the pseudo training data are generated with simple heuristics and cannot cover all disfluency patterns, a performance gap remains compared to supervised models trained on the full training dataset. We further explore how to bridge this gap by integrating active learning into the fine-tuning process. Active learning strives to reduce annotation costs by choosing the most critical examples to label, and can address the weakness of self-supervised learning with a small annotated dataset. We show that by combining self-supervised learning with active learning, our model matches state-of-the-art performance with only about 10% of the original training data on both the commonly used English Switchboard test set and a set of in-house annotated Chinese data.
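
A small sketch of the pseudo-data construction described above is given below; the corruption probabilities and the sampling vocabulary are illustrative assumptions, not the paper's exact settings. Each token of the corrupted sentence is tagged as added noise (1) or original (0) for the tagging task, and a sentence-level label marks whether any corruption happened for the classification task.

import random

def make_pseudo_example(tokens, vocab, p_add=0.15, p_del=0.10, rng=random):
    """tokens: list of words from an unlabeled sentence; vocab: word list to sample noise from."""
    noisy, tags, changed = [], [], False
    for tok in tokens:
        if rng.random() < p_add:        # randomly insert a noisy word before this token
            noisy.append(rng.choice(vocab))
            tags.append(1)              # 1 = added word the tagging task must detect
            changed = True
        if rng.random() < p_del:        # randomly delete the original token
            changed = True
            continue
        noisy.append(tok)
        tags.append(0)                  # 0 = word from the original sentence
    sentence_label = int(changed)       # 1 = grammatically corrupted, 0 = original
    return noisy, tags, sentence_label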


2012 · pp. 232-259
Author(s): Eddy Caron, Frédéric Desprez, Franck Petit, Cédric Tedeschi

Within distributed computing platforms, some computing abilities (or services) are offered to clients. To build dynamic applications using such services as basic blocks, a critical prerequisite is to discover those services. Traditional approaches to the service discovery problem have historically relied upon centralized solutions, which are unable to scale well on large, unreliable platforms. In this chapter, we first give an overview of the state of the art of service discovery solutions based on peer-to-peer (P2P) technologies, which allow this functionality to remain efficient at large scale. We then focus on one of these approaches: the Distributed Lexicographic Placement Table (DLPT) architecture, which provides dedicated mechanisms for load balancing and fault tolerance. This solution centers on three key points. First, it calls upon an indexing system structured as a prefix tree, allowing multi-attribute range queries. Second, it allows such structures to be mapped onto heterogeneous and dynamic networks and proposes load-balancing heuristics for them. Third, as our target platform is dynamic and unreliable, we describe its powerful fault-tolerance mechanisms, based on self-stabilization. Finally, we present the software prototype of this architecture and its early experiments.
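
As a toy, single-process illustration of the first point only (the prefix-tree index; the distributed mapping, load balancing, and self-stabilization layers are not modeled), service names can be stored in a trie and retrieved with prefix queries. The keys and peer names in the usage lines are made up for illustration.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.providers = []                           # peers registered under this exact key

class ServiceIndex:
    """Toy prefix-tree index: insert service keys, answer prefix (range) queries."""
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, provider):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.providers.append(provider)

    def prefix_query(self, prefix):
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        found, stack = [], [node]                     # collect every registration below the prefix
        while stack:
            n = stack.pop()
            found.extend(n.providers)
            stack.extend(n.children.values())
        return found

index = ServiceIndex()
index.insert("dgemm/x86", "peer-A")
index.insert("dgemm/arm", "peer-B")
print(index.prefix_query("dgemm"))                    # -> ['peer-A', 'peer-B'] (order not guaranteed)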


Author(s): Peter Grindrod, Desmond J. Higham

To gain insights about dynamic networks, the dominant paradigm is to study discrete snapshots, or timeslices, as the interactions evolve. Here, we develop and test a new mathematical framework where network evolution is handled over continuous time, giving an elegant dynamical systems representation for the important concept of node centrality. The resulting system allows us to track the relative influence of each individual. This new setting is natural in many digital applications, offering both conceptual and computational advantages. The novel differential equations approach is convenient for modelling and analysis of network evolution and gives rise to an interesting application of the matrix logarithm function. From a computational perspective, it avoids the awkward up-front compromises between accuracy, efficiency and redundancy required in the prevalent discrete-time setting. Instead, we can rely on state-of-the-art ODE software, where discretization takes place adaptively in response to the prevailing system dynamics. The new centrality system generalizes the widely used Katz measure, and allows us to identify and track, at any resolution, the most influential nodes in terms of broadcasting and receiving information through time-dependent links. In addition to the classical static network notion of attenuation across edges, the new ODE also allows for attenuation over time, as information becomes stale. This allows ‘running measures’ to be computed, so that networks can be monitored in real time over arbitrarily long intervals. With regard to computational efficiency, we explain why it is cheaper to track good receivers of information than good broadcasters. An important consequence is that the overall broadcast activity in the network can also be monitored efficiently. We use two synthetic examples to validate the relevance of the new measures. We then illustrate the ideas on a large-scale voice call network, where key features are discovered that are not evident from snapshots or aggregates.
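
As a hedged sketch of the continuous-time idea, the right-hand side below uses an assumed Katz-style form, dS/dt = -b*S + a*A(t)*(I + S), chosen only to show edge attenuation (a) and temporal attenuation (b); it is not necessarily the authors' exact formulation. The running centrality matrix is handed to a standard adaptive ODE solver, with row sums read as broadcast scores and column sums as receive scores. The three-node example network is made up for illustration.

import numpy as np
from scipy.integrate import solve_ivp

def running_centrality(A_of_t, n, a=0.1, b=0.5, t_span=(0.0, 10.0)):
    """A_of_t(t) -> n x n adjacency matrix of the time-dependent network."""
    I = np.eye(n)

    def rhs(t, s_flat):
        S = s_flat.reshape(n, n)
        dS = -b * S + a * A_of_t(t) @ (I + S)     # assumed Katz-style evolution
        return dS.ravel()

    sol = solve_ivp(rhs, t_span, np.zeros(n * n), rtol=1e-6)   # adaptive time discretization
    S_end = sol.y[:, -1].reshape(n, n)
    return S_end.sum(axis=1), S_end.sum(axis=0)   # broadcast scores, receive scores

# Hypothetical example: one directed edge 0 -> 1 that switches to 1 -> 2 halfway through.
def A_of_t(t):
    A = np.zeros((3, 3))
    A[0, 1] = 1.0 if t < 5.0 else 0.0
    A[1, 2] = 0.0 if t < 5.0 else 1.0
    return A

broadcast, receive = running_centrality(A_of_t, n=3)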


2014 · Vol 36 (2) · pp. 276-288
Author(s): Sudheendra Vijayanarasimhan, Prateek Jain, Kristen Grauman
