Federated Learning Versus Classical Machine Learning: A Convergence Comparison

In the past few decades, machine learning has revolutionized data processing for large-scale applications. At the same time, growing privacy threats in emerging applications have led to a redesign of classical training models. Classical machine learning relies on centralized training: the data is gathered, and the entire training process runs at a central server. Although this approach converges well, it exposes participants’ data to privacy threats once that data is shared with the central cloud server. For this reason, federated learning has gained considerable importance as a form of distributed training. Federated learning allows participants to collaboratively train local models on local data without revealing their sensitive information to the central cloud server. In this paper, we perform a convergence comparison between classical machine learning and federated learning on two publicly available datasets, namely, the logistic-regression-MNIST dataset and the image-classification-CIFAR-10 dataset. The simulation results demonstrate that federated learning achieves higher convergence within limited communication rounds while maintaining participants’ anonymity. We hope that this research will show the benefits of federated learning and help it to be implemented widely.
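As an illustration of the training scheme the abstract describes, the following is a minimal sketch of federated averaging on a toy one-parameter linear model. It is not the authors' experimental setup: the model `y = w * x`, the synthetic client data, and the names `local_sgd`, `fed_avg` and `make_clients` are all assumptions made for this example. Only the locally trained weights leave each client; the raw data stays local.

```python
import random


def local_sgd(w, data, lr=0.5, epochs=5):
    """Run a few gradient steps on one client's private data (model: y = w * x)."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def fed_avg(clients, rounds=20, w0=0.0):
    """Each round: clients train locally, the server averages their weights."""
    w = w0
    for _ in range(rounds):
        # Each client starts from the current global weight.
        local = [(local_sgd(w, data), len(data)) for data in clients]
        total = sum(n for _, n in local)
        # Server step: average weighted by each client's sample count.
        w = sum(w_i * n for w_i, n in local) / total
    return w


def make_clients(num_clients=4, samples=20, true_w=3.0, seed=0):
    """Hypothetical synthetic data: each client holds noisy samples of y = true_w * x."""
    rng = random.Random(seed)
    clients = []
    for _ in range(num_clients):
        xs = [rng.uniform(-1, 1) for _ in range(samples)]
        clients.append([(x, true_w * x + rng.gauss(0, 0.01)) for x in xs])
    return clients


if __name__ == "__main__":
    print(fed_avg(make_clients()))
```

A centralized baseline for comparison would pool all client lists into one and call `local_sgd` on the pooled data, which is exactly the data sharing that the federated scheme avoids.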

Recent Progress in Machine Learning-based Prediction of Peptide Activity for Drug Discovery

Current Topics in Medicinal Chemistry, 2019, Vol 19 (1), pp. 4-16

DOI: 10.2174/1568026619666190122151634

Cited By ~ 6

Author(s): Qihui Wu, Hanzhong Ke, Dongli Li, Qi Wang, Jiansong Fang, ...

Keyword(s): Machine Learning, Drug Discovery, Large Scale, Recent Progress, High Specificity, Learning Approaches, Anticancer Peptides, The Past, Traditional Approaches, Large Scale Screening

Over the past decades, peptides as therapeutic candidates have received increasing attention in drug discovery, especially antimicrobial peptides (AMPs), anticancer peptides (ACPs) and anti-inflammatory peptides (AIPs). Peptides are considered able to regulate various complex diseases that were previously untouchable. In recent years, the critical problem of antimicrobial resistance has driven the pharmaceutical industry to look for new therapeutic agents. Compared to small organic drugs, peptide-based therapy exhibits high specificity and minimal toxicity. Thus, peptides are widely used in the design and discovery of new potent drugs. Currently, large-scale screening of peptide activity with traditional approaches is costly, time-consuming and labor-intensive. Hence, in silico methods, mainly machine learning approaches, have been introduced to predict peptide activity because of their accuracy and effectiveness. In this review, we document the recent progress in machine learning-based prediction of peptide activity, which will be of great benefit to the discovery of potentially active AMPs, ACPs and AIPs.

Wind Energy Forecast Conditioned on Großwetterlage (large scale weather situation)

2021

DOI: 10.5194/egusphere-egu21-13132

Author(s): Greta Denisenko, Markus Abel, Detlef Siebert, Paul Seidler, Thomas Seidler

Keyword(s): Machine Learning, Probability Distribution, Large Scale, Quantitative Measure, Absolute Error, Challenging Problem, The Past, Weather Situation, Future Work, Energy Forecast

Obtaining a quantitative measure for the uncertainty of forecasts for renewable energy has proven to be a challenging problem in the past. We present results on predicting the uncertainty of a forecast conditioned on the large-scale weather situation (Großwetterlage). As a first attempt, we use the objective weather classification by the German Meteorological Service (DWD), which sorts the weather into 40 situations based on wind direction, cyclonality and moisture in the atmosphere. The considered forecasts concern the day-ahead production of solar power for two exemplary solar parks. To quantify the uncertainty, we define five different metrics (based on normalized absolute error and probability distribution), each of which is trained individually using machine learning. As a result, we obtain measures for over- and underprediction conditioned on the said Großwetterlage. We consider this to be a very promising yet accessible approach to derive a quantitative measure for uncertainties based on the current, day-to-day weather situation. Future work may concern an improvement of the Großwetterlagen characterization and a general, probabilistic formulation of the problem, e.g. using Bayesian inference.
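To make the idea of an error measure conditioned on the weather class concrete, here is a minimal sketch that groups forecast errors by class and reports separate figures for over- and underprediction. It is a plain empirical summary, not the five learned metrics of the abstract, which are not specified there. The record layout `(weather_class, forecast, actual)`, the normalization by installed capacity, and the name `error_by_class` are assumptions made for this example.

```python
from collections import defaultdict


def error_by_class(records, capacity):
    """Summarize normalized forecast errors per weather class.

    records:  iterable of (weather_class, forecast, actual) tuples
    capacity: installed capacity used to normalize the errors (assumed choice)
    """
    signed = defaultdict(list)
    for weather_class, forecast, actual in records:
        # Positive values are overpredictions, negative ones underpredictions.
        signed[weather_class].append((forecast - actual) / capacity)

    summary = {}
    for weather_class, errors in signed.items():
        over = [e for e in errors if e > 0]
        under = [-e for e in errors if e < 0]
        summary[weather_class] = {
            # Normalized mean absolute error over all samples of this class.
            "nmae": sum(abs(e) for e in errors) / len(errors),
            # Mean size of overpredictions and underpredictions (0.0 if none).
            "mean_over": sum(over) / len(over) if over else 0.0,
            "mean_under": sum(under) / len(under) if under else 0.0,
        }
    return summary


if __name__ == "__main__":
    sample = [("A", 12.0, 10.0), ("A", 8.0, 10.0), ("B", 5.0, 5.0)]
    print(error_by_class(sample, capacity=10.0))
```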

Outsourcing Computing of Large Matrix Jordan Decomposition

Mathematical Problems in Engineering, 2019, Vol 2019, pp. 1-7

DOI: 10.1155/2019/6410626

Author(s): Hongfeng Wu, Jingjing Yan

Keyword(s): Large Scale, Security Analysis, Sensitive Information, Jordan Decomposition, Verification Algorithm, Resource Limited, Large Matrix, Cloud Server, Scale Matrix, Efficient Verification

The Jordan decomposition of a matrix is a typical scientific and engineering computational task, but for large matrices it requires enormous computing resources, which is burdensome for resource-limited clients. Cloud computing enables such clients to economically outsource these problems to a cloud server. However, outsourcing the Jordan decomposition of large-scale matrices to the cloud raises serious security concerns and challenges, since the matrices usually contain sensitive information. In this paper, we present a secure, verifiable, efficient, and privacy-preserving algorithm for outsourcing the Jordan decomposition of large-scale matrices. Security analysis shows that our algorithm is practically secure. An efficient verification algorithm is used to verify the results returned from the cloud.

A Defense Framework for Privacy Risks in Remote Machine Learning Service

Security and Communication Networks, 2021, Vol 2021, pp. 1-13

DOI: 10.1155/2021/9924684

Author(s): Yang Bai, Yu Li, Mingchuang Xie, Mingyu Fan

Keyword(s): Machine Learning, Differential Privacy, Original Data, Privacy Preserving, Training Data, Sensitive Information, Learning Approaches, Local Data, Sensitive Data, Privacy Risks

In recent years, machine learning approaches have been widely adopted for many applications, including classification. Machine learning models that handle collective sensitive data are usually trained on a remote public cloud server, for instance, a machine learning as a service (MLaaS) system. In this scenario, users upload their local data and use the server's computation capability to train models, or users directly access models trained by MLaaS. Unfortunately, recent works reveal that the curious server (which trains the model with users’ sensitive local data and is curious to know information about individuals) and the malicious MLaaS user (who abuses queries to the MLaaS system) both cause privacy risks. The adversarial method, as one typical mitigation, has been studied in several recent works. However, most of them focus on privacy preservation against the malicious user; in other words, they commonly treat the data owner and the model provider as one role. Under this assumption, the privacy leakage risks from the curious server are neglected. Differential privacy methods can defend against privacy threats from both the curious server and the malicious MLaaS user by directly adding noise to the training data. Nonetheless, the differential privacy method heavily decreases the classification accuracy of the target model. In this work, we propose a generic privacy-preserving framework based on the adversarial method to defend against both the curious server and the malicious MLaaS user. The framework can be combined with several adversarial algorithms to generate adversarial examples directly from data owners’ original data. By doing so, sensitive information about the original data is hidden. We then explore the constraint conditions of this framework, which help us find the balance between privacy protection and model utility. The experimental results show that our defense framework with the AdvGAN method is effective against MIA and that our defense framework with the FGSM method can protect the sensitive data from direct content exposure attacks. In addition, our method achieves a better balance between privacy and utility than the existing method.
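For readers unfamiliar with FGSM, which the abstract names as one of the adversarial algorithms plugged into the framework, the following is a minimal sketch of the standard fast gradient sign method on a toy logistic-regression model. It shows only the perturbation step, not the authors' framework or their constraint conditions. The model, the weights, and the names `bce_loss` and `fgsm` are assumptions made for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)


def bce_loss(x, y, w, b):
    """Binary cross-entropy loss for a single example with label y in {0, 1}."""
    p = predict(x, w, b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def fgsm(x, y, w, b, eps):
    """Perturb x by eps in the sign direction of the loss gradient w.r.t. x."""
    p = predict(x, w, b)
    # For logistic regression with cross-entropy, d(loss)/dx_i = (p - y) * w_i.
    grad = [(p - y) * w_i for w_i in w]
    signs = [(g > 0) - (g < 0) for g in grad]
    return [x_i + eps * s for x_i, s in zip(x, signs)]


if __name__ == "__main__":
    w, b = [1.0, -2.0], 0.0
    x, y = [0.5, 0.5], 1
    x_adv = fgsm(x, y, w, b, eps=0.1)
    print(x_adv, bce_loss(x, y, w, b), bce_loss(x_adv, y, w, b))
```

The perturbed input differs from the original by at most `eps` per feature, which is the property that lets a data owner release `x_adv` in place of `x` while bounding the distortion.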

VADAF: Visualization for Abnormal Client Detection and Analysis in Federated Learning

ACM Transactions on Interactive Intelligent Systems, 2021, Vol 11 (3-4), pp. 1-23

DOI: 10.1145/3426866

Author(s): Linhao Meng, Yating Wei, Rusheng Pan, Shuyue Zhou, Jianwei Zhang, ...

Keyword(s): Machine Learning, Anomaly Detection, Visual Analytics, Detection Method, Training Process, Local Data, Privacy And Security, Potential Client, Distributed Machine Learning, Large Corpus

Federated Learning (FL) provides a powerful solution to distributed machine learning on a large corpus of decentralized data. It ensures privacy and security by performing computation on devices (which we refer to as clients) based on local data to improve the shared global model. However, the inaccessibility of the data and the invisibility of the computation make it challenging to interpret and analyze the training process, especially to distinguish potential client anomalies. Identifying these anomalies can help experts diagnose and improve FL models. For this reason, we propose a visual analytics system, VADAF, to depict the training dynamics and facilitate analyzing potential client anomalies. Specifically, we design a visualization scheme that supports massive training dynamics in the FL environment. Moreover, we introduce an anomaly detection method to detect potential client anomalies, which are further analyzed based on both the client model’s visual and objective estimation. Three case studies have demonstrated the effectiveness of our system in understanding the FL training process and supporting abnormal client detection and analysis.

Formal semantics and high performance in declarative machine learning using Datalog

The VLDB Journal, 2021

DOI: 10.1007/s00778-021-00665-6

Author(s): Jin Wang, Jiacheng Wu, Mingda Li, Jiaqi Gu, Ariyam Das, ...

Keyword(s): Machine Learning, High Performance, Large Scale, Formal Semantics, Distributed Data, Recursive Programs, Diverse Application, User Friendly, Performance Gains, New Framework

With an escalating arms race to adopt machine learning (ML) in diverse application domains, there is an urgent need to support declarative machine learning over distributed data platforms. Toward this goal, a new framework is needed where users can specify ML tasks in a manner where programming is decoupled from the underlying algorithmic and system concerns. In this paper, we argue that declarative abstractions based on Datalog are natural fits for machine learning and propose a purely declarative ML framework with a Datalog query interface. We show that using aggregates in recursive Datalog programs entails a concise expression of ML applications, while providing a strictly declarative formal semantics. This is achieved by introducing simple conditions under which the semantics of recursive programs is guaranteed to be equivalent to that of aggregate-stratified ones. We further provide specialized compilation and planning techniques for semi-naive fixpoint computation in the presence of aggregates and optimization strategies that are effective on diverse recursive programs and distributed data platforms. To test and demonstrate these research advances, we have developed a powerful and user-friendly system on top of Apache Spark. Extensive evaluations on large-scale datasets illustrate that this approach will achieve promising performance gains while improving both programming flexibility and ease of development and deployment for ML applications.
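The semi-naive fixpoint computation mentioned in the abstract can be illustrated on the textbook recursive Datalog program for graph reachability. The sketch below is a plain single-machine version without aggregates, so it shows only the basic evaluation strategy and not the paper's compilation techniques or its Spark-based system. The function name `transitive_closure` and the edge representation are assumptions made for this example.

```python
def transitive_closure(edges):
    """Semi-naive evaluation of the recursive Datalog program

        path(X, Y) :- edge(X, Y).
        path(X, Z) :- path(X, Y), edge(Y, Z).

    edges: iterable of (source, target) pairs.
    """
    edges = set(edges)
    # Index edge(Y, Z) by Y so the join with path(X, Y) is a lookup.
    successors = {}
    for source, target in edges:
        successors.setdefault(source, set()).add(target)

    path = set(edges)   # all facts derived so far
    delta = set(edges)  # facts that were new in the previous iteration
    while delta:
        # Join only the new facts with edge, instead of re-joining all of path.
        derived = {(x, z) for x, y in delta for z in successors.get(y, ())}
        delta = derived - path
        path |= delta
    return path


if __name__ == "__main__":
    print(sorted(transitive_closure([(1, 2), (2, 3), (3, 4)])))
```

Joining only `delta` in each round is what distinguishes semi-naive from naive evaluation, which would recompute every already-known `path` fact on every iteration.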

Large-Scale Data Learning Method for Anomaly Detection using Machine Learning for Monitoring Vibration in Vehicle Equipment

IEEJ Transactions on Industry Applications, 2020, Vol 140 (6), pp. 480-487

DOI: 10.1541/ieejias.140.480

Author(s): Minoru Kondo

Keyword(s): Machine Learning, Anomaly Detection, Large Scale, Learning Method, Large Scale Data, Scale Data

Coded Computing: Mitigating Fundamental Bottlenecks in Large-Scale Distributed Computing and Machine Learning

2020

DOI: 10.1561/9781680837056

Author(s): Songze Li, Salman Avestimehr

Keyword(s): Machine Learning, Distributed Computing, Large Scale

Evolution of Metastable Structures in Bimetallic Catalysts from Microscopy and Machine-Learning Molecular Dynamics

2020

DOI: 10.26434/chemrxiv.11811660.v1

Author(s): Jin Soo Lim, Jonathan Vandermause, Matthijs A. van Spronsen, Albert Musaelian, Christopher R. O’Connor, ...

Keyword(s): Machine Learning, Molecular Dynamics, Large Scale, Materials Science, Complete Characterization, Layer By Layer, Surface Restructuring, Metastable Structures, Mechanistic Investigation, Underlying Mechanisms

Restructuring of interfaces plays a crucial role in materials science and heterogeneous catalysis. Bimetallic systems, in particular, often adopt very different composition and morphology at surfaces compared to the bulk. For the first time, we reveal a detailed atomistic picture of the long-timescale restructuring of Pd deposited on Ag, using microscopy, spectroscopy, and novel simulation methods. Encapsulation of Pd by Ag always precedes layer-by-layer dissolution of Pd, resulting in significant Ag migration out of the surface and extensive vacancy pits. These metastable structures are of vital catalytic importance, as Ag-encapsulated Pd remains much more accessible to reactants than bulk-dissolved Pd. The underlying mechanisms are uncovered by performing fast and large-scale machine-learning molecular dynamics, followed by our newly developed method for complete characterization of atomic surface restructuring events. Our approach is broadly applicable to other multimetallic systems of interest and enables the previously impractical mechanistic investigation of restructuring dynamics.

Epigenetic Target Prediction with Accurate Machine Learning Models

2021

DOI: 10.26434/chemrxiv.13522313

Author(s): Norberto Sánchez-Cruz, Jose L. Medina-Franco

Keyword(s): Machine Learning, Small Molecules, Predictive Models, Large Scale, Target Prediction, Quantitative Measure, Learning Models, Discovery Research, Drug Discovery Research, Machine Learning Models

Epigenetic targets are a significant focus for drug discovery research, as demonstrated by the eight approved epigenetic drugs for the treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. These data represent a large number of structure-activity relationships that have not been exploited thus far for the development of predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different design, we built predictive models with high accuracy for the epigenetic target profiling of small molecules. The models were thoroughly validated, showing mean precisions up to 0.952 for the epigenetic target prediction task. Our results indicate that the models reported herein have considerable potential to identify small molecules with epigenetic activity. Therefore, our results were implemented as a freely accessible and easy-to-use web application.
