Resource Management and Model Personalization for Federated Learning over Wireless Edge Networks

2021 ◽  
Vol 10 (1) ◽  
pp. 17
Author(s):  
Ravikumar Balakrishnan ◽  
Mustafa Akdeniz ◽  
Sagar Dhakal ◽  
Arjun Anand ◽  
Ariela Zeira ◽  
...  

Client and Internet of Things devices are increasingly equipped with the ability to sense, process, and communicate data with high efficiency. This is resulting in a major shift of machine learning (ML) computation to the network edge. Distributed learning approaches such as federated learning, which move ML training to end devices, have emerged, promising lower latency, reduced bandwidth costs, and enhanced privacy for end users’ data. However, new challenges must be addressed that arise from the heterogeneity of the devices’ communication rates and compute capabilities and from the limited observability of the training data at each device. All these factors can significantly affect training performance in terms of overall accuracy, model fairness, and convergence time. We present compute-, communication-, and data-importance-aware resource management schemes that optimize these metrics and evaluate the training performance on benchmark datasets. We also develop a federated meta-learning solution, based on task similarity, that serves as a sample-efficient initialization for federated learning and improves model personalization and generalization across non-IID (not independent and identically distributed) data. We present experimental results on benchmark federated learning datasets to highlight the performance gains of the proposed methods over the well-known federated averaging algorithm and its variants.
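As a point of reference for the baseline mentioned above, the following is a minimal sketch of the federated averaging (FedAvg) aggregation step, assuming hypothetical per-client weight lists and sample counts; it is not the resource-management scheme proposed in the paper.

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """Weighted average of client model parameters (the FedAvg baseline).

    client_weights: list of per-client parameter lists (one np.ndarray per layer).
    client_sizes:   number of local training samples held by each client.
    """
    total = float(sum(client_sizes))
    num_layers = len(client_weights[0])
    global_weights = []
    for layer in range(num_layers):
        # Each client's contribution is weighted by its share of the total data.
        layer_avg = sum((n / total) * w[layer]
                        for w, n in zip(client_weights, client_sizes))
        global_weights.append(layer_avg)
    return global_weights

# Toy usage: three clients with unequal data sizes and a single-layer model.
clients = [[np.ones((2, 2)) * k] for k in (1.0, 2.0, 3.0)]
sizes = [10, 30, 60]
print(fedavg_aggregate(clients, sizes)[0])  # weighted toward the larger clients
```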

Author(s):  
Yaohui Zhu ◽  
Chenlong Liu ◽  
Shuqiang Jiang

The goal of few-shot image recognition is to distinguish different categories with only one or a few training samples. Previous work on few-shot learning has mainly addressed general object images, and current solutions usually learn a global image representation from training tasks to adapt to novel tasks. However, fine-grained categories are distinguished by subtle and local parts, which cannot be captured effectively by global representations. This may hinder existing few-shot learning approaches from handling fine-grained categories well. In this work, we propose a multi-attention meta-learning (MattML) method for few-shot fine-grained image recognition (FSFGIR). Instead of using only a base learner for general feature learning, the proposed meta-learning method uses attention mechanisms in both the base learner and the task learner to capture discriminative parts of images. The base learner is equipped with two convolutional block attention modules (CBAM) and a classifier. The two CBAM modules learn diverse and informative parts, and the initial weights of the classifier are attended by the task learner, which gives the classifier a task-related, sensitive initialization. For adaptation, a gradient-based meta-learning approach is employed that updates the parameters of the two CBAM modules and the attended classifier, enabling the updated base learner to adaptively focus on discriminative parts. We experimentally analyze the different components of our method, and experimental results on four benchmark datasets demonstrate its effectiveness and superiority.
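For readers unfamiliar with the attention module used by the base learner, below is a minimal, generic CBAM sketch in PyTorch (channel attention followed by spatial attention). It illustrates the mechanism only and is not the authors' MattML implementation; the module sizes and names are assumptions.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Minimal convolutional block attention module: channel then spatial attention."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        # Channel attention: shared MLP over global average- and max-pooled features.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: 7x7 conv over channel-wise average and max maps.
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)           # channel attention
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))                   # spatial attention

# Example: attend a feature map produced by the base learner's backbone.
feat = torch.randn(4, 64, 21, 21)
print(CBAM(64)(feat).shape)  # torch.Size([4, 64, 21, 21])
```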


Author(s):  
Shipeng Yan ◽  
Songyang Zhang ◽  
Xuming He

Despite the recent success of deep neural networks, it remains challenging to efficiently learn new visual concepts from limited training data. To address this problem, a prevailing strategy is to build a meta-learner that acquires prior knowledge about learning from a small set of annotated data. However, most existing meta-learning approaches rely on a global representation of images and a meta-learner with a complex model structure, which are sensitive to background clutter and difficult to interpret. We propose a novel meta-learning method for few-shot classification based on two simple attention mechanisms: a spatial attention that localizes relevant object regions and a task attention that selects similar training data for label prediction. We implement our method as a dual-attention network and design a semantic-aware meta-learning loss to train the meta-learner network end to end. We validate our model on three few-shot image classification datasets with an extensive ablative study, and our approach shows competitive performance on these datasets with fewer parameters. To facilitate future research, the code and data splits are available at https://github.com/tonysy/STANet-PyTorch
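The two attentions can be pictured with a toy sketch: spatial attention re-weights query feature-map locations and task attention re-weights support samples before a vote over classes. This is an illustrative reading under assumed tensor shapes and helper names, not the released STANet code.

```python
import torch
import torch.nn.functional as F

def dual_attention_predict(query_feat, support_feats, support_labels, num_classes):
    """Toy sketch of the two attentions described above (not the authors' STANet code).

    query_feat:     (C, H, W) feature map of one query image.
    support_feats:  (N, C) pooled features of N support images.
    support_labels: (N,) integer class labels.
    """
    # Spatial attention: weight query locations by similarity to the mean support feature.
    c, h, w = query_feat.shape
    locs = query_feat.view(c, h * w).t()                       # (H*W, C)
    spatial = F.softmax(locs @ support_feats.mean(0), dim=0)   # (H*W,)
    query_vec = (spatial.unsqueeze(1) * locs).sum(0)           # attended (C,) descriptor

    # Task attention: weight support samples by cosine similarity to the attended query.
    task = F.softmax(
        F.cosine_similarity(support_feats, query_vec.unsqueeze(0), dim=1), dim=0)
    scores = torch.zeros(num_classes).scatter_add(0, support_labels, task)
    return scores  # higher score = more attended support mass for that class

query = torch.randn(64, 5, 5)
support = torch.randn(10, 64)
labels = torch.randint(0, 5, (10,))
print(dual_attention_predict(query, support, labels, 5))
```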


Author(s):  
Pinzhuo Tian ◽  
Lei Qi ◽  
Shaokang Dong ◽  
Yinghuan Shi ◽  
Yang Gao

In the few-shot learning scenario, a data-distribution discrepancy between the training data and test data of a task usually exists because data are limited. However, most existing meta-learning approaches seldom consider this intra-task discrepancy in the meta-training phase, which can degrade performance. To overcome this limitation, we develop a new consistent meta-regularization method to reduce the intra-task data-distribution discrepancy. Moreover, the proposed meta-regularization method can be readily inserted into existing optimization-based meta-learning models to learn better meta-knowledge. In particular, we provide a theoretical analysis proving that, with the proposed meta-regularization, the conventional gradient-based meta-learning method reaches a lower regret bound. Extensive experiments also demonstrate the effectiveness of our method, which indeed improves the performance of state-of-the-art gradient-based meta-learning models on the few-shot classification task.
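One plausible way to picture the idea is an inner-loop step that adds a penalty on the statistics gap between the support (training) and query (test) splits of a task. The sketch below is such a reading with hypothetical function and variable names; the paper's actual meta-regularization term and regret analysis are not reproduced here.

```python
import torch
import torch.nn as nn

def inner_adapt_with_consistency(model, loss_fn, support_x, support_y,
                                 query_x, lam=0.1):
    """One MAML-style inner step with a hypothetical consistency regularizer that
    penalizes the gap between support- and query-batch feature statistics.
    Illustrative reading of 'reducing intra-task distribution discrepancy' only."""
    support_feat = model[0](support_x)            # backbone features
    query_feat = model[0](query_x)
    task_loss = loss_fn(model[1](support_feat), support_y)
    # Consistency term: match first- and second-order moments across the two splits.
    reg = ((support_feat.mean(0) - query_feat.mean(0)) ** 2).sum() \
        + ((support_feat.var(0) - query_feat.var(0)) ** 2).sum()
    loss = task_loss + lam * reg
    grads = torch.autograd.grad(loss, list(model.parameters()), create_graph=True)
    return loss, grads  # grads feed the usual fast-weight (inner-loop) update

backbone = nn.Sequential(nn.Linear(32, 16), nn.ReLU())
head = nn.Linear(16, 5)
model = nn.Sequential(backbone, head)
loss, grads = inner_adapt_with_consistency(
    model, nn.CrossEntropyLoss(),
    torch.randn(25, 32), torch.randint(0, 5, (25,)), torch.randn(75, 32))
print(loss.item(), len(grads))
```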


2019 ◽  
Vol 9 (24) ◽  
pp. 5562 ◽  
Author(s):  
Jawad Khan ◽  
Young-Koo Lee

Sentiment analysis (SA) is an active research area. SA aims to classify online unstructured user-generated content (UUGC) into positive and negative classes. Reliable training data are vital for learning a sentiment classifier for textual sentiment classification, but due to domain heterogeneity, manual construction of reliably labeled sentiment corpora is a laborious and time-consuming task. In the absence of enough labeled data, the alternative use of sentiment lexicons and semi-supervised learning approaches for sentiment classification has attracted substantial attention from the research community. However, state-of-the-art techniques for semi-supervised sentiment classification still face research challenges expressed in questions like the following. How can the concealed yet significant information in the unstructured data be utilized effectively? How can the model be learned while considering the most effective sentiment features? How can noisy and redundant features be removed? How can the initial training data be refined for initial model learning, given that random selection may lead to performance degradation? Moreover, most existing lexicons suffer from limited word coverage and may ignore key domain-specific sentiment words, so further research is required to improve sentiment lexicons for textual sentiment classification. To address these research issues, in this paper we propose a novel unified sentiment analysis framework for textual sentiment classification called LeSSA. Our main contributions are fourfold: (a) lexicon construction, generating a high-quality and wide-coverage sentiment lexicon; (b) classification model training based on a high-quality training dataset generated using k-means clustering, active learning, self-learning, and co-training algorithms; (c) classification fusion, whereby the predictions from multiple learners are combined by majority voting to determine the final sentiment polarity; and (d) practicality, in that we validate our claims by applying the model to benchmark datasets. Empirical evaluation on benchmark datasets from multiple domains demonstrates that the proposed framework outperforms existing semi-supervised learning techniques in terms of classification accuracy.
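As a small illustration of the classification-fusion step (c), the sketch below trains a few heterogeneous learners on a toy seed set and fuses their predictions by majority vote. The corpus, learners, and variable names are placeholders, not the LeSSA pipeline itself.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Tiny toy corpus standing in for a refined labeled seed set (1 = positive, 0 = negative).
texts = ["great phone, love it", "terrible battery, very bad", "excellent camera quality",
         "awful screen, hate it", "good value and fast", "poor build, disappointing"]
labels = np.array([1, 0, 1, 0, 1, 0])

vec = TfidfVectorizer().fit(texts)
Xtr = vec.transform(texts)

# Several heterogeneous learners, fused by majority vote on the test prediction.
learners = [LogisticRegression(max_iter=1000), MultinomialNB(), LinearSVC()]
preds = []
for clf in learners:
    clf.fit(Xtr, labels)
    preds.append(clf.predict(vec.transform(["good value, love the camera"])))
votes = np.stack(preds)                              # shape (n_learners, n_samples)
final = (votes.sum(axis=0) > len(learners) / 2).astype(int)
print(final)  # 1 = positive, 0 = negative by majority vote
```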


Entropy ◽  
2020 ◽  
Vol 22 (6) ◽  
pp. 625
Author(s):  
Fang Dong ◽  
Li Liu ◽  
Fanzhang Li

Deep learning has achieved many successes in different fields but can encounter overfitting when labeled samples are insufficient. To address the problem of learning with limited training data, meta-learning has been proposed to retain common knowledge by leveraging a large number of similar few-shot tasks and learning how to adapt a base learner to a new task for which only a few labeled samples are available. Current meta-learning approaches typically use shallow neural networks (SNNs) to avoid overfitting, thus wasting much information when adapting to a new task. Moreover, the Euclidean-space gradient descent used in existing meta-learning approaches can lead to inaccurate updates of the meta-learner, which poses a challenge for meta-learning models in extracting features from samples and updating network parameters. In this paper, we propose a novel meta-learning model called Multi-Stage Meta-Learning (MSML) to overcome this bottleneck in the adaptation process. The proposed method constrains the network to the Stiefel manifold so that the meta-learner can perform a more stable gradient descent within a limited number of steps, thereby accelerating adaptation. Experiments on mini-ImageNet demonstrate that the proposed method achieves better accuracy under 5-way 1-shot and 5-way 5-shot conditions.
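To make the manifold constraint concrete, the sketch below performs one retraction-based gradient step that keeps a weight matrix on the Stiefel manifold (orthonormal columns). It is a generic illustration of this family of updates, assuming a QR retraction, and is not the MSML training code.

```python
import torch

def stiefel_step(W, euclid_grad, lr=0.1):
    """One retraction-based gradient step that keeps W (n x p, n >= p) on the
    Stiefel manifold, i.e. W^T W = I. Illustrative of the kind of constrained
    update described above, not the MSML implementation."""
    # Project the Euclidean gradient onto the tangent space at W.
    sym = (W.t() @ euclid_grad + euclid_grad.t() @ W) / 2
    riem_grad = euclid_grad - W @ sym
    # Retract the updated point back onto the manifold with a QR decomposition.
    Q, R = torch.linalg.qr(W - lr * riem_grad)
    # Fix column signs so the retraction is unique.
    Q = Q * torch.sign(torch.diagonal(R)).unsqueeze(0)
    return Q

W = torch.linalg.qr(torch.randn(8, 3))[0]        # a point on the Stiefel manifold
G = torch.randn(8, 3)                            # pretend gradient from the meta-loss
W_new = stiefel_step(W, G)
print(torch.allclose(W_new.t() @ W_new, torch.eye(3), atol=1e-5))  # True: still orthonormal
```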


2018 ◽  
Author(s):  
Roman Zubatyuk ◽  
Justin S. Smith ◽  
Jerzy Leszczynski ◽  
Olexandr Isayev

Atomic and molecular properties can be evaluated from the fundamental Schrödinger equation and therefore represent different modalities of the same quantum phenomena. Here we present AIMNet, a modular and chemically inspired deep neural network potential. We used AIMNet with multitarget training to learn multiple modalities of the state of an atom in a molecular system. On several benchmark datasets, the resulting model shows state-of-the-art accuracy, comparable to the results of DFT methods that are orders of magnitude more expensive, and it can simultaneously predict several atomic and molecular properties without an increase in computational cost. With AIMNet we demonstrate a new dimension of transferability: the ability to learn new targets by utilizing multimodal information from previous training. The model can learn implicit solvation energy (as in SMD) using only a fraction of the original training data and achieves a MAD of 1.1 kcal/mol compared with experimental solvation free energies in the MNSol database.
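A generic picture of multitarget training is a shared representation feeding several property heads whose losses are summed with weights. The sketch below illustrates that pattern with hypothetical head names and dimensions; it is not the AIMNet architecture.

```python
import torch
import torch.nn as nn

class MultiTargetHead(nn.Module):
    """Toy multitarget readout: one shared atomic embedding feeds several property heads.
    A generic sketch of multitarget training, not the AIMNet model itself."""
    def __init__(self, feat_dim=128, targets=("energy", "charge", "volume")):
        super().__init__()
        self.heads = nn.ModuleDict({t: nn.Linear(feat_dim, 1) for t in targets})

    def forward(self, atom_feats):                        # (n_atoms, feat_dim)
        return {t: head(atom_feats).squeeze(-1) for t, head in self.heads.items()}

def multitarget_loss(pred, ref, weights):
    # Weighted sum of per-property MSE losses; weights balance the different modalities.
    return sum(w * torch.mean((pred[t] - ref[t]) ** 2) for t, w in weights.items())

feats = torch.randn(12, 128)                              # embeddings for 12 atoms
model = MultiTargetHead()
ref = {t: torch.randn(12) for t in ("energy", "charge", "volume")}
loss = multitarget_loss(model(feats), ref,
                        {"energy": 1.0, "charge": 0.5, "volume": 0.5})
print(loss.item())
```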


2019 ◽  
Vol 11 (3) ◽  
pp. 284 ◽  
Author(s):  
Linglin Zeng ◽  
Shun Hu ◽  
Daxiang Xiang ◽  
Xiang Zhang ◽  
Deren Li ◽  
...  

Soil moisture mapping at a regional scale is commonplace, since these data are required in many applications, such as hydrological and agricultural analyses. The use of remotely sensed data for estimating deep soil moisture at a regional scale has received far less emphasis. The objective of this study was to map 500-m, 8-day average and daily soil moisture at different soil depths in Oklahoma from remotely sensed and ground-measured data using the random forest (RF) method, a machine-learning approach. To investigate the estimation accuracy of the RF method at both spatial and temporal scales, two independent soil moisture estimation experiments were conducted using data from 2010 to 2014: a year-to-year experiment (with a root mean square error (RMSE) ranging from 0.038 to 0.050 m3/m3) and a station-to-station experiment (with an RMSE ranging from 0.044 to 0.057 m3/m3). The data requirements, importance factors, and spatial and temporal variations in estimation accuracy were then discussed based on results obtained with training data selected by iterated random sampling. The highly accurate estimates of both surface and deep soil moisture for the study area reveal the potential of RF methods for mapping soil moisture at a regional scale, especially considering the high heterogeneity of land-cover types and topography in the study area.
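The workflow described above can be sketched with a random forest regressor and an RMSE evaluation on synthetic stand-in data; the feature set, split, and numbers below are placeholders, not the study's actual remote-sensing inputs or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Synthetic stand-in for the predictors (e.g. reflectance bands, land-surface temperature,
# vegetation indices) and for ground-measured soil moisture in m3/m3.
X = rng.normal(size=(2000, 8))
y = 0.25 + 0.05 * X[:, 0] - 0.03 * X[:, 1] + rng.normal(scale=0.02, size=2000)

# Hold out a block of samples, loosely mimicking a station-to-station style split.
train, test = slice(0, 1500), slice(1500, 2000)
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X[train], y[train])

rmse = np.sqrt(mean_squared_error(y[test], rf.predict(X[test])))
print(f"RMSE = {rmse:.3f} m3/m3")        # the evaluation metric reported above
print(rf.feature_importances_)           # importance factors, as discussed above
```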


Sensors ◽  
2020 ◽  
Vol 20 (20) ◽  
pp. 5966
Author(s):  
Ke Wang ◽  
Gong Zhang

The challenge of small data has emerged in synthetic aperture radar automatic target recognition (SAR-ATR) problems. Most SAR-ATR methods are data-driven and require large amounts of training data, which are expensive to collect. To address this challenge, we propose a recognition model that incorporates meta-learning and amortized variational inference (AVI). Specifically, the model consists of global parameters and task-specific parameters. The global parameters, trained by meta-learning, construct a common feature extractor shared among all recognition tasks. The task-specific parameters, modeled by probability distributions, can adapt to new tasks with a small amount of training data. To reduce computation and storage costs, the task-specific parameters are inferred by AVI implemented with set-to-set functions. Extensive experiments were conducted on a real SAR dataset to evaluate the effectiveness of the model. Comparisons with the latest SAR-ATR methods show the superior performance of our model, especially on recognition tasks with limited data.
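A toy sketch of the amortized inference idea: a permutation-invariant encoder maps a task's support set to a Gaussian over task-specific classifier weights, which are sampled and applied to query features from the shared extractor. Names and dimensions are assumptions; this is not the paper's set-to-set implementation.

```python
import torch
import torch.nn as nn

class AmortizedTaskInference(nn.Module):
    """Toy amortized inference: a permutation-invariant encoder maps the support set
    to a Gaussian over task-specific classifier weights. Illustrative only."""
    def __init__(self, feat_dim=64, n_classes=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU())
        self.to_mu = nn.Linear(128, n_classes * feat_dim)
        self.to_logvar = nn.Linear(128, n_classes * feat_dim)
        self.n_classes, self.feat_dim = n_classes, feat_dim

    def forward(self, support_feats):                     # (N, feat_dim) from the shared extractor
        pooled = self.encoder(support_feats).mean(0)      # mean-pool => permutation invariant
        mu, logvar = self.to_mu(pooled), self.to_logvar(pooled)
        w = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterized sample
        return w.view(self.n_classes, self.feat_dim), mu, logvar

net = AmortizedTaskInference()
support = torch.randn(15, 64)                 # 3-way 5-shot support features
query = torch.randn(10, 64)
weights, mu, logvar = net(support)
logits = query @ weights.t()                  # task-specific linear classifier on queries
print(logits.shape)                           # torch.Size([10, 3])
```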


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Xujun Zhang ◽  
Chao Shen ◽  
Xueying Guo ◽  
Zhe Wang ◽  
Gaoqi Weng ◽  
...  

Virtual screening (VS) based on molecular docking has emerged as one of the mainstream technologies of drug discovery due to its low cost and high efficiency. However, the scoring functions (SFs) implemented in most docking programs are not always accurate enough, and how to improve their prediction accuracy remains a major challenge. Here, we propose an integrated platform called ASFP, a web server for the development of customized SFs for structure-based VS. ASFP has three main modules: (1) a descriptor generation module that can generate up to 3437 descriptors for modelling protein–ligand interactions; (2) an AI-based SF construction module that can establish target-specific SFs from the pre-generated descriptors through three machine learning (ML) techniques; and (3) an online prediction module that provides well-constructed target-specific SFs for VS and an additional generic SF for binding affinity prediction. Our methodology has been validated on several benchmark datasets: the target-specific SFs achieve an average ROC AUC of 0.973 across 32 targets, and the generic SF achieves a Pearson correlation coefficient of 0.81 on the PDBbind version 2016 core set. In summary, the ASFP server is a powerful tool for structure-based VS.
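In spirit, a target-specific SF is a supervised model trained on interaction descriptors to separate actives from decoys and evaluated by ROC AUC. The sketch below shows that pattern on synthetic descriptors with an off-the-shelf learner; the descriptors, learner, and numbers are placeholders, not ASFP's actual models.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Synthetic stand-in for protein-ligand interaction descriptors of actives (1) vs. decoys (0);
# the real server generates up to 3437 descriptors per complex.
X = rng.normal(size=(1000, 50))
y = (X[:, :5].sum(axis=1) + rng.normal(scale=1.0, size=1000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)

# A "target-specific scoring function" as a binary classifier over the descriptors.
sf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, sf.predict_proba(X_te)[:, 1])
print(f"ROC AUC on held-out actives/decoys: {auc:.3f}")   # the metric reported above
```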


Entropy ◽  
2021 ◽  
Vol 23 (1) ◽  
pp. 126
Author(s):  
Sharu Theresa Jose ◽  
Osvaldo Simeone

Meta-learning, or “learning to learn”, refers to techniques that infer an inductive bias from data corresponding to multiple related tasks with the goal of improving sample efficiency on new, previously unobserved tasks. A key performance measure for meta-learning is the meta-generalization gap, that is, the difference between the average loss measured on the meta-training data and that on a new, randomly selected task. This paper presents novel information-theoretic upper bounds on the meta-generalization gap. Two broad classes of meta-learning algorithms are considered that use either separate within-task training and test sets, as in model-agnostic meta-learning (MAML), or joint within-task training and test sets, as in Reptile. Extending existing work on conventional learning, an upper bound on the meta-generalization gap is derived for the former class that depends on the mutual information (MI) between the output of the meta-learning algorithm and its input meta-training data. For the latter, the derived bound includes an additional MI term between the output of the per-task learning procedure and the corresponding dataset, capturing within-task uncertainty. Tighter bounds are then developed for both classes via novel individual-task MI (ITMI) bounds. Applications of the derived bounds are finally discussed, including to a broad class of noisy iterative algorithms for meta-learning.
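For context, the conventional single-task bound that this line of work extends can be written as follows, assuming a \(\sigma\)-sub-Gaussian loss, a training set \(S\) of \(n\) samples, and algorithm output \(W\). The meta-learning bounds in the paper have an analogous form, with the meta-learner's output and the meta-training data of the \(N\) tasks in place of \(W\) and \(S\); the exact constants and the additional per-task MI term for the joint-set case are as derived in the paper and are not reproduced here.

```latex
% Conventional (single-task) information-theoretic generalization bound
% that the meta-learning bounds above extend; loss assumed \sigma-sub-Gaussian.
\left| \, \mathbb{E}\!\left[ L_{\mathcal{D}}(W) - L_{S}(W) \right] \right|
  \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(W; S)}
```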

