Angel: a new large-scale machine learning system

Abstract: Machine learning (ML) techniques are now ubiquitous tools for extracting structural information from data collections. With the increasing volume of data, large-scale ML applications require efficient implementations to improve performance. Existing systems parallelize algorithms through either data parallelism or model parallelism. However, data parallelism cannot obtain good statistical efficiency because of conflicting updates to parameters, while the performance of model-parallel methods is damaged by global barriers. In this paper, we propose a new system, named Angel, to facilitate the development of large-scale ML applications in production environments. By allowing concurrent updates to the model across different groups and scheduling the updates within each group, Angel can achieve a good balance between hardware efficiency and statistical efficiency. In addition, Angel reduces network latency by overlapping parameter pulling with update computation, and it exploits the sparsity of data to avoid pulling unnecessary parameters. We also enhance the usability of Angel by providing a set of efficient tools for integration with application pipelines and by providing efficient fault-tolerance mechanisms. We conduct extensive experiments to demonstrate the superiority of Angel.


A billion synthetic 3D-antibody-antigen complexes enable unconstrained machine-learning formalized investigation of antibody specificity prediction

10.1101/2021.07.06.451258 ◽

2021 ◽

Author(s):

Philippe Auguste Robert ◽

Rahmad Akbar ◽

Robert Frank ◽

Milena Pavlović ◽

Michael Widrich ◽

...

Keyword(s):

Machine Learning ◽

In Silico ◽

Prediction Accuracy ◽

Large Scale ◽

Structural Information ◽

Antigen Binding ◽

Antibody Specificity ◽

Binding Prediction ◽

Information Encoding ◽

Prediction Problems

Machine learning (ML) is a key technology to enable accurate prediction of antibody-antigen binding, a prerequisite for in silico vaccine and antibody design. Two orthogonal problems hinder the current application of ML to antibody-specificity prediction and the benchmarking thereof: (i) the lack of a unified formalized mapping of immunological antibody specificity prediction problems into ML notation and (ii) the unavailability of large-scale training datasets. Here, we developed the Absolut! software suite that allows the parameter-based unconstrained generation of synthetic lattice-based 3D-antibody-antigen binding structures with ground-truth access to conformational paratope, epitope, and affinity. We show that Absolut!-generated datasets recapitulate critical biological sequence and structural features that render antibody-antigen binding prediction challenging. To demonstrate the immediate, high-throughput, and large-scale applicability of Absolut!, we have created an online database of 1 billion antibody-antigen structures, the extension of which is only constrained by moderate computational resources. We translated immunological antibody specificity prediction problems into ML tasks and used our database to investigate paratope-epitope binding prediction accuracy as a function of structural information encoding, dataset size, and ML method, which is infeasible with existing experimental data. Furthermore, we found that the conditions investigated in silico and predicted to increase antibody specificity prediction accuracy align with and extend conclusions drawn from experimental antibody-antigen structural data. In summary, the Absolut! framework enables the development and benchmarking of ML strategies for biotherapeutics discovery and design.


A Deep and Scalable Unsupervised Machine Learning System for Cyber-Attack Detection in Large-Scale Smart Grids

IEEE Access ◽

10.1109/access.2019.2920326 ◽

2019 ◽

Vol 7 ◽

pp. 80778-80788 ◽

Cited By ~ 49

Author(s):

Hadis Karimipour ◽

Ali Dehghantanha ◽

Reza M. Parizi ◽

Kim-Kwang Raymond Choo ◽

Henry Leung

Keyword(s):

Machine Learning ◽

Smart Grids ◽

Large Scale ◽

Attack Detection ◽

Learning System ◽

Cyber Attack ◽

Unsupervised Machine Learning


A large scale machine learning system for recommending heterogeneous content in social networks

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information - SIGIR '11 ◽

10.1145/2009916.2010189 ◽

2011 ◽

Author(s):

Yanxin Shi ◽

David Ye ◽

Andrey Goder ◽

Srinivas Narayanan

Keyword(s):

Machine Learning ◽

Social Networks ◽

Large Scale ◽

Learning System


SKICAT: A Machine Learning System for Automated Cataloging of Large Scale Sky Surveys

Machine Learning Proceedings 1993 ◽

10.1016/b978-1-55860-307-3.50021-6 ◽

1993 ◽

pp. 112-119 ◽

Cited By ~ 13

Author(s):

Usama M. Fayyad ◽

Nicholas Weir ◽

S. Djorgovski

Keyword(s):

Machine Learning ◽

Large Scale ◽

Learning System ◽

Sky Surveys


Predicting Hard Disk Failure by Means of Automatized Labeling and Machine Learning Approach

Applied Sciences ◽

10.3390/app11188293 ◽

2021 ◽

Vol 11 (18) ◽

pp. 8293

Author(s):

Federico Gargiulo ◽

Dirk Duellmann ◽

Pasquale Arpaia ◽

Rosario Schiano Lo Moriello

Keyword(s):

Machine Learning ◽

Large Scale ◽

Dominant Role ◽

Storage System ◽

Supervised Machine Learning ◽

Large Set ◽

Production Environment ◽

Hard Drives ◽

Computer Center ◽

Computer Centers

Today, cloud systems provide many key services to development and production environments; reliable storage services are crucial for a multitude of applications ranging from commercial manufacturing, distribution and sales up to scientific research, which is often at the forefront of computing resource demands. In large-scale computer centers, the storage system requires particular attention and investment; usually, a large number of diverse storage devices need to be deployed in order to match the varying performance and volume requirements of changing user applications. As of today, magnetic drives still play a dominant role in terms of deployed storage volume and of service outages due to device failure. In this paper, we study methods to facilitate automated proactive disk replacement. We propose a method to identify disks with media failures in a production environment and describe an application of supervised machine learning to predict disk failures. In particular, we present a stage that automatically labels the disks (healthy/at-risk) during training and validation, along with a tuning strategy to optimize the hyperparameters of the associated machine learning classifier. The approach is trained and validated against a large set of 65,000 hard drives in the CERN computer center, and the achieved results are discussed.


Large-Scale Data Learning Method for Anomaly Detection using Machine Learning for Monitoring Vibration in Vehicle Equipment

IEEJ Transactions on Industry Applications ◽

10.1541/ieejias.140.480 ◽

2020 ◽

Vol 140 (6) ◽

pp. 480-487

Author(s):

Minoru Kondo

Keyword(s):

Machine Learning ◽

Anomaly Detection ◽

Large Scale ◽

Learning Method ◽

Large Scale Data ◽

Scale Data


Coded Computing: Mitigating Fundamental Bottlenecks in Large-Scale Distributed Computing and Machine Learning

10.1561/9781680837056 ◽

2020 ◽

Author(s):

Songze Li ◽

Salman Avestimehr

Keyword(s):

Machine Learning ◽

Distributed Computing ◽

Large Scale


Evolution of Metastable Structures in Bimetallic Catalysts from Microscopy and Machine-Learning Molecular Dynamics

10.26434/chemrxiv.11811660.v1 ◽

2020 ◽

Author(s):

Jin Soo Lim ◽

Jonathan Vandermause ◽

Matthijs A. van Spronsen ◽

Albert Musaelian ◽

Christopher R. O’Connor ◽

...

Keyword(s):

Machine Learning ◽

Molecular Dynamics ◽

Large Scale ◽

Materials Science ◽

Complete Characterization ◽

Layer By Layer ◽

Surface Restructuring ◽

Metastable Structures ◽

Mechanistic Investigation ◽

Underlying Mechanisms

Restructuring of interfaces plays a crucial role in materials science and heterogeneous catalysis. Bimetallic systems, in particular, often adopt very different compositions and morphologies at surfaces compared to the bulk. For the first time, we reveal a detailed atomistic picture of the long-timescale restructuring of Pd deposited on Ag, using microscopy, spectroscopy, and novel simulation methods. Encapsulation of Pd by Ag always precedes layer-by-layer dissolution of Pd, resulting in significant Ag migration out of the surface and extensive vacancy pits. These metastable structures are of vital catalytic importance, as Ag-encapsulated Pd remains much more accessible to reactants than bulk-dissolved Pd. The underlying mechanisms are uncovered by performing fast and large-scale machine-learning molecular dynamics, followed by our newly developed method for complete characterization of atomic surface restructuring events. Our approach is broadly applicable to other multimetallic systems of interest and enables the previously impractical mechanistic investigation of restructuring dynamics.


Epigenetic Target Prediction with Accurate Machine Learning Models

10.26434/chemrxiv.13522313 ◽

2021 ◽

Author(s):

Norberto Sánchez-Cruz ◽

Jose L. Medina-Franco

Keyword(s):

Machine Learning ◽

Small Molecules ◽

Predictive Models ◽

Large Scale ◽

Target Prediction ◽

Quantitative Measure ◽

Learning Models ◽

Discovery Research ◽

Drug Discovery Research ◽

Machine Learning Models

Epigenetic targets are a significant focus for drug discovery research, as demonstrated by the eight approved epigenetic drugs for treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. These data represent a large number of structure-activity relationships that have not been exploited thus far for the development of predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26,318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different design, we built predictive models with high accuracy for the epigenetic target profiling of small molecules. The models were thoroughly validated, showing mean precisions up to 0.952 for the epigenetic target prediction task. Our results indicate that the herein reported models have considerable potential to identify small molecules with epigenetic activity. Therefore, our results were implemented as a freely accessible and easy-to-use web application.
