Large-Scale Local Online Similarity/Distance Learning Framework Based on Passive/Aggressive

Author(s):  
Baida Hamdan ◽  
Davood Zabihzadeh

Similarity/distance measures play a key role in many machine learning, pattern recognition, and data mining algorithms, which has led to the emergence of the metric learning field. Many metric learning algorithms learn a global distance function from data that satisfies the constraints of the problem. However, in many real-world datasets, where the discrimination power of features varies across different regions of the input space, a global metric is often unable to capture the complexity of the task. To address this challenge, local metric learning methods have been proposed that learn multiple metrics across the different regions of the input space. These methods offer high flexibility and can learn a nonlinear mapping, but these advantages typically come at the expense of higher time requirements and overfitting problems. To overcome these challenges, this research presents an online multiple metric learning framework. Each metric in the proposed framework is composed of a global and a local component learned simultaneously. Adding a global component to a local metric efficiently reduces the problem of overfitting. The proposed framework is also scalable with both the sample size and the dimension of the input data. To the best of our knowledge, this is the first local online similarity/distance learning framework based on Passive/Aggressive (PA). In addition, for scalability with the dimension of the input data, Dual Random Projection (DRP) is extended for local online learning in the present work. It enables our methods to run efficiently on high-dimensional datasets while maintaining their predictive performance. The proposed framework provides a straightforward local extension to any global online similarity/distance learning algorithm based on PA. Experimental results on some challenging datasets from the machine vision community confirm that the extended methods considerably enhance the performance of the related global ones without increasing the time complexity.
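A PA-style triplet update for a bilinear similarity can be sketched as follows. This is a generic illustration in the spirit of the abstract, not the authors' algorithm: the hinge-margin formulation, the PA-I cap `C`, and all variable names are assumptions.

```python
import numpy as np

def pa_similarity_update(W, x_anchor, x_pos, x_neg, C=1.0):
    """One PA step on a triplet for the bilinear similarity S(x, y) = x^T W y:
    pull S(anchor, pos) above S(anchor, neg) by a unit margin, and update
    only when the hinge loss is positive."""
    # Hinge loss on the triplet margin
    loss = max(0.0, 1.0 - x_anchor @ W @ x_pos + x_anchor @ W @ x_neg)
    if loss == 0.0:
        return W  # passive: the constraint is already satisfied
    # Gradient direction of the margin with respect to W
    V = np.outer(x_anchor, x_pos - x_neg)
    # Aggressive step size, capped by C (PA-I style)
    tau = min(C, loss / (np.linalg.norm(V) ** 2 + 1e-12))
    return W + tau * V

rng = np.random.default_rng(0)
d = 5
W = np.eye(d)
a, p, n = rng.normal(size=(3, d))
W_new = pa_similarity_update(W, a, p, n)
margin_before = a @ W @ p - a @ W @ n
margin_after = a @ W_new @ p - a @ W_new @ n
print(margin_after >= margin_before)  # the update never shrinks the margin
```

Because the step size is derived in closed form from the current loss, each update is cheap, which is what makes PA-based schemes attractive for online, large-scale settings.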

2017 ◽  
Vol 59 ◽  
pp. 495-541 ◽  
Author(s):  
Ramya Ramakrishnan ◽  
Chongjie Zhang ◽  
Julie Shah

In this work, we design and evaluate a computational learning model that enables a human-robot team to co-develop joint strategies for performing novel tasks that require coordination. The joint strategies are learned through "perturbation training," a human team-training strategy that requires team members to practice variations of a given task to help their team generalize to new variants of that task. We formally define the problem of human-robot perturbation training and develop and evaluate the first end-to-end framework for such training, which incorporates a multi-agent transfer learning algorithm, a human-robot co-learning framework, and a communication protocol. Our transfer learning algorithm, Adaptive Perturbation Training (AdaPT), is a hybrid of transfer and reinforcement learning techniques that learns quickly and robustly for new task variants. We empirically validate the benefits of AdaPT through comparison to other hybrid reinforcement and transfer learning techniques aimed at transferring knowledge from multiple source tasks to a single target task. We also demonstrate that AdaPT's rapid learning supports live interaction between a person and a robot, during which the human-robot team trains to achieve a high level of performance for new task variants. We augment AdaPT with a co-learning framework and a computational bi-directional communication protocol so that the robot can co-train with a person during live interaction. Results from large-scale human subject experiments (n=48) indicate that AdaPT enables an agent to learn in a manner compatible with a human's own learning process, and that a robot undergoing perturbation training with a human achieves a high level of team performance. Finally, we demonstrate that human-robot training using AdaPT in a simulation environment produces effective performance for a team incorporating an embodied robot partner.
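The multi-source transfer idea that AdaPT hybridises can be illustrated with a deliberately tiny sketch: seed the target task's value function with whichever source policy scores best on the target, then continue with ordinary reinforcement learning. Everything below (the toy deterministic MDP, `greedy_return`, `transfer_init`) is a hypothetical simplification, far simpler than AdaPT itself.

```python
import numpy as np

def greedy_return(Q, P, R, start, steps=10):
    """Roll out the greedy policy of Q-table Q in a deterministic MDP
    given by transition table P[s][a] -> s' and reward table R[s][a]."""
    s, total = start, 0.0
    for _ in range(steps):
        a = int(np.argmax(Q[s]))
        total += R[s][a]
        s = P[s][a]
    return total

def transfer_init(source_Qs, P, R, start):
    """Seed the target task with the source Q-table whose greedy policy
    earns the most reward on the target MDP."""
    scores = [greedy_return(Q, P, R, start) for Q in source_Qs]
    return source_Qs[int(np.argmax(scores))].copy()

# Tiny 2-state, 2-action target task: action 1 always pays off.
P = [[0, 1], [0, 1]]
R = [[0.0, 1.0], [0.0, 1.0]]
bad_Q = np.array([[1.0, 0.0], [1.0, 0.0]])   # prefers the zero-reward action
good_Q = np.array([[0.0, 1.0], [0.0, 1.0]])  # prefers the rewarding action
Q0 = transfer_init([bad_Q, good_Q], P, R, start=0)
print(greedy_return(Q0, P, R, start=0))  # 10.0: the better source was chosen
```

Starting from a well-chosen source policy rather than from scratch is what supports the rapid learning needed for live human-robot interaction.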


Author(s):  
Ali Salim Rasheed ◽  
Davood Zabihzadeh ◽  
Sumia Abdulhussien Razooqi Al-Obaidi

Metric learning algorithms aim to bring conceptually related data items closer together while keeping dissimilar ones at a distance. The most common approach to metric learning is the Mahalanobis method. Despite its success, this method is limited to finding a linear projection and also suffers from scalability issues with respect to both the dimensionality and the size of the input data. To address these problems, this paper presents a new scalable metric learning algorithm for multi-modal data. Our method learns an optimal metric for any feature set of the multi-modal data in an online fashion. We also combine the learned metrics with a novel Passive/Aggressive (PA)-based algorithm, which results in a higher convergence rate compared to state-of-the-art methods. To address scalability with respect to dimensionality, Dual Random Projection (DRP) is adopted in this paper. The present method is evaluated on some challenging machine vision datasets for image classification and Content-Based Information Retrieval (CBIR) tasks. The experimental results confirm that the proposed method significantly surpasses other state-of-the-art metric learning methods on most of these datasets in terms of both accuracy and efficiency.
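One illustrative reading of "combining the learned metrics" across feature sets is a weighted sum of per-modality bilinear similarities. The sketch below is an assumption about the general shape of such a combination; the weights, matrices, and function names are placeholders, not the paper's learned values.

```python
import numpy as np

def multimodal_similarity(x_views, y_views, Ws, alphas):
    """x_views / y_views: one feature vector per modality (dimensions may
    differ); Ws: one learned bilinear matrix per modality; alphas:
    non-negative combination weights. Returns a single similarity score."""
    return sum(alpha * (x @ W @ y)
               for alpha, W, x, y in zip(alphas, Ws, x_views, y_views))

# Two modalities of different dimensionality for the same pair of items
x = [np.array([1.0, 0.0]), np.array([0.0, 1.0, 0.0])]
y = [np.array([1.0, 0.0]), np.array([0.0, 1.0, 0.0])]
Ws = [np.eye(2), np.eye(3)]  # identity metrics as stand-ins
print(multimodal_similarity(x, y, Ws, alphas=[0.5, 0.5]))  # 1.0
```

Keeping one metric per feature set lets each modality be updated online independently, which is consistent with the scalability goals stated above.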


2019 ◽  
Author(s):  
Longzhu Shen ◽  
Giuseppe Amatulli ◽  
Tushar Sethi ◽  
Peter Raymond ◽  
Sami Domisch

Nitrogen (N) and Phosphorus (P) are essential nutrients for life processes in water bodies but in excessive quantities, they are a significant source of aquatic pollution. Eutrophication has now become widespread due to such an imbalance, and is largely attributed to anthropogenic activity. In view of this phenomenon, we present a new dataset and statistical method for estimating and mapping elemental and compound concentrations of N and P at a resolution of 30 arc-seconds (∼1 km) for the conterminous US. The model is based on a Random Forest (RF) machine learning algorithm that was fitted with environmental variables and seasonal N and P concentration observations from 230,000 stations spanning US stream networks. Accounting for spatial and temporal variability offers improved accuracy in the analysis of N and P cycles. The algorithm has been validated with an internal and external validation procedure that is able to explain 70-83% of the variance in the model. The dataset is ready for use as input in a variety of environmental models and analyses, and the methodological framework can be applied to large-scale studies on N and P pollution, which include water quality, species distribution and water ecology research worldwide.
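The modelling setup described above follows a standard Random Forest regression pattern, sketched below with scikit-learn. The synthetic predictors and response only stand in for the real station observations and environmental covariates; the held-out R² is analogous in role (not in value) to the 70-83% variance explained reported by the authors.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 500
# Placeholder covariates, e.g. land cover, slope, climate, season
X = rng.uniform(size=(n, 4))
# Placeholder nutrient concentration with a nonlinear dependence + noise
y = 2.0 * X[:, 0] + X[:, 1] ** 2 + 0.05 * rng.normal(size=n)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:400], y[:400])          # fit on "station" records
r2 = model.score(X[400:], y[400:])   # internal validation on held-out data
print(round(r2, 2))
```

Random Forests need no linearity assumption and handle interactions between covariates automatically, which suits heterogeneous environmental predictors.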


2021 ◽  
Vol 15 (4) ◽  
pp. 1-27
Author(s):  
Daokun Zhang ◽  
Jie Yin ◽  
Xingquan Zhu ◽  
Chengqi Zhang

Traditional network embedding primarily focuses on learning a continuous vector representation for each node, preserving network structure and/or node content information, such that off-the-shelf machine learning algorithms can be easily applied to the vector-format node representations for network analysis. However, the learned continuous vector representations are inefficient for large-scale similarity search, which often involves finding nearest neighbors measured by distance or similarity in a continuous vector space. In this article, we propose a search efficient binary network embedding algorithm called BinaryNE to learn a binary code for each node, by simultaneously modeling node context relations and node attribute relations through a three-layer neural network. BinaryNE learns binary node representations using a stochastic gradient descent-based online learning algorithm. The learned binary encoding not only reduces memory usage to represent each node, but also allows fast bit-wise comparisons to support faster node similarity search than using Euclidean or other distance measures. Extensive experiments and comparisons demonstrate that BinaryNE not only delivers more than 25 times faster search speed, but also provides comparable or better search quality than traditional continuous vector based network embedding methods. The binary codes learned by BinaryNE also render competitive performance on node classification and node clustering tasks. The source code of the BinaryNE algorithm is available at https://github.com/daokunzhang/BinaryNE.
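The speed advantage the abstract highlights comes from comparing nodes with bit-wise operations. In the sketch below, the binary codes come from sign-thresholded random projections standing in for BinaryNE's learned encoder (an assumption for illustration); the XOR-plus-popcount Hamming comparison is the part that replaces Euclidean distance.

```python
import numpy as np

def to_binary_codes(X, n_bits=64, seed=0):
    """Map real-valued embeddings to packed binary codes via random
    hyperplanes (a stand-in for a learned binary encoder)."""
    rng = np.random.default_rng(seed)
    proj = rng.normal(size=(X.shape[1], n_bits))
    bits = (X @ proj > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)  # 64 bits -> 8 bytes per node

def hamming_distance(a, b):
    """Bit-wise XOR + popcount instead of a floating-point distance."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

X = np.array([[1.0, 2.0], [1.1, 2.1], [-3.0, 0.5]])
codes = to_binary_codes(X)
d_close = hamming_distance(codes[0], codes[1])
d_far = hamming_distance(codes[0], codes[2])
print(d_close < d_far)  # near-identical embeddings share more bits
```

Each node costs only 8 bytes here versus hundreds of bytes for a float vector, and XOR/popcount run in a handful of CPU instructions, which is where the reported search speedups come from.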


2019 ◽  
Author(s):  
Longzhu Shen ◽  
Giuseppe Amatulli ◽  
Tushar Sethi ◽  
Peter Raymond ◽  
Sami Domisch

Nitrogen (N) and Phosphorus (P) are essential nutrients for life processes in water bodies but in excessive quantities, they are a significant source of aquatic pollution. Eutrophication has now become widespread due to such an imbalance, and is largely attributed to anthropogenic activity. In view of this phenomenon, we present a new dataset and statistical method for estimating and mapping elemental and compound con- centrations of N and P at a resolution of 30 arc-seconds (∼1 km) for the conterminous US. The model is based on a Random Forest (RF) machine learning algorithm that was fitted with environmental variables and seasonal N and P concentration observations from 230,000 stations spanning across US stream networks. Accounting for spatial and temporal variability offers improved accuracy in the analysis of N and P cycles. The algorithm has been validated with an internal and external validation procedure that is able to explain 70-83% of the variance in the model. The dataset is ready for use as input in a variety of environmental models and analyses, and the methodological framework can be applied to large-scale studies on N and P pollution, which include water quality, species distribution and water ecology research worldwide.


Author(s):  
Houjie Li ◽  
Min Yang ◽  
Yu Zhou ◽  
Ruirui Zheng ◽  
Wenpeng Liu ◽  
...  

Partial label learning is a new weakly supervised learning framework. In this framework, the real category label of a training sample is usually concealed in a set of candidate labels, which leads to lower accuracy of learning algorithms compared with traditional strongly supervised cases. Recently, it has been found that metric learning technology can be used to improve the accuracy of partial label learning algorithms. However, because it is difficult to ascertain similar pairs from training samples, at present there are few metric learning algorithms for the partial label learning framework. In view of this, this paper proposes a similar-pair-free partial label metric learning algorithm. The main idea of the algorithm is to define two probability distributions on the training samples, i.e., the probability distribution determined by the distance of sample pairs and the probability distribution determined by the similarity of the candidate label sets of sample pairs, and then to obtain the metric matrix by minimizing the KL divergence of the two probability distributions. Experimental results on several real-world partial label datasets show that the proposed algorithm improves the accuracy of the k-nearest neighbor partial label learning algorithm (PL-KNN) more than the existing partial label metric learning algorithms, by up to 8 percentage points.
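The two pairwise distributions can be sketched as follows. This is an illustrative reading, not the authors' exact construction: `p` here is built from Jaccard overlap of candidate label sets, `q` from a Gaussian kernel on Mahalanobis distances under a metric `M`, and both normalisations are assumptions.

```python
import numpy as np

def pair_distribution_from_labels(candidate_sets):
    """p: normalised Jaccard overlap of candidate label sets."""
    n = len(candidate_sets)
    sims = np.array([[len(candidate_sets[i] & candidate_sets[j]) /
                      len(candidate_sets[i] | candidate_sets[j])
                      for j in range(n)] for i in range(n)])
    np.fill_diagonal(sims, 0.0)
    return sims / sims.sum()

def pair_distribution_from_metric(X, M):
    """q: larger probability for pairs that are close under metric M."""
    diffs = X[:, None, :] - X[None, :, :]
    d2 = np.einsum('ijk,kl,ijl->ij', diffs, M, diffs)
    w = np.exp(-d2)
    np.fill_diagonal(w, 0.0)
    return w / w.sum()

def kl(p, q, eps=1e-12):
    """KL divergence over the support of p; the quantity minimised
    with respect to M during learning."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))))

X = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]])
labels = [{0, 1}, {0, 1}, {2}]  # first two samples share candidate labels
p = pair_distribution_from_labels(labels)
q = pair_distribution_from_metric(X, np.eye(2))
print(kl(p, q) >= 0.0)
```

Because `p` never asks which candidate label is the true one, no similar pairs need to be identified explicitly, which is exactly the obstacle the method sidesteps.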


2021 ◽  
Vol 8 ◽  
pp. 238212052110003
Author(s):  
Aida J Azar ◽  
Amar Hassan Khamis ◽  
Nerissa Naidoo ◽  
Marjam Lindsbro ◽  
Juliana Helena Boukhaled ◽  
...  

Background: The COVID-19 pandemic has forced medical schools to suspend on-campus live sessions and shift to distance learning (DL). This precipitous shift presented medical educators with a challenge: to create in DL a "simulacrum" of the learning environment that students experience in the classroom. This requires the design of an adaptable and versatile DL framework that bears in mind the theoretical underpinnings of DL. Additionally, the effectiveness of such a DL framework in content delivery, followed by its evaluation at the user level and in cognitive development, needs to be pursued so that medical educators can be convinced to adopt the framework in a competency-based medical programme. Main: In this study, we define a DL framework that provides a "simulacrum" of the classroom experience. The framework's blueprint was designed by amalgamating the principles of Garrison's community of inquiry, Siemens' connectivism, and Harasim's online collaborative learning, and refined using Anderson's DL model. The effectiveness of the DL framework in course delivery was demonstrated using the exemplar of the Fundamentals in Epidemiology and Biostatistics (FEB) course during the COVID-19 lockdown. Virtual live sessions integrated in the framework employed a blended approach informed by the instructional-design strategies of Gagne and Peyton. The efficiency of the framework was evaluated using the first two levels of Kirkpatrick's framework. Of 60 students, 51 (85%) responded to the survey assessing perception towards DL (Kirkpatrick's Level 1). The survey items, validated using exploratory factor analysis, were classified into four categories: computer expertise, DL flexibility, DL usefulness, and DL satisfaction. The overall perception across the four categories highlighted respondents' satisfaction with the framework. Scores for specific survey items attested that the framework promoted collaborative learning and student autonomy.
For Kirkpatrick's Level 2, that is, cognitive development, performance in FEB's summative assessment of students experiencing DL was compared with that of students taught using traditional methods. Similar mean scores for both groups indicated that the shift to DL did not adversely affect students' learning. Conclusion: In conclusion, we present the design, implementation, and evaluation of a DL framework, an efficient pedagogical approach pertinent for medical schools to adopt (elaborated using Bourdieu's Theory of Practice) to address students' learning trajectories during unprecedented times such as the COVID-19 pandemic.


2021 ◽  
Vol 13 (5) ◽  
pp. 168781402110131
Author(s):  
Junfeng Wu ◽  
Li Yao ◽  
Bin Liu ◽  
Zheyuan Ding ◽  
Lei Zhang

As more and more sensor data are collected, automated detection and diagnosis systems are urgently needed to lessen the increasing monitoring burden and reduce the risk of system faults. A plethora of research has been done on anomaly detection, event detection, and anomaly diagnosis individually. However, none of the current approaches explores all of these aspects in one unified framework. In this work, a Multi-Task Learning based Encoder-Decoder (MTLED) that can simultaneously detect anomalies, diagnose anomalies, and detect events is proposed. In MTLED, a feature matrix is introduced so that features are extracted for each time point and point-wise anomaly detection can be realized in an end-to-end way. Anomaly diagnosis and event detection share the same feature matrix with anomaly detection in the multi-task learning framework and also provide important information for system monitoring. To train such a comprehensive detection and diagnosis system, a large-scale multivariate time series dataset containing anomalies of multiple types is generated with simulation tools. Extensive experiments on the synthetic dataset verify the effectiveness of MTLED and its multi-task learning framework, and an evaluation on a real-world dataset demonstrates that MTLED can be used in other application scenarios through transfer learning.
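The shared-feature-matrix layout described above can be sketched as a single encoder feeding three task heads. The random weights, shapes, and head names below are placeholders; MTLED's actual encoder-decoder architecture and training loop are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_sensors, n_feat = 20, 4, 8  # time steps, sensor channels, shared features

def encode(X, W_enc):
    """Shared encoder: one feature vector per time point, giving the
    (T, n_feat) feature matrix that all three tasks consume."""
    return np.tanh(X @ W_enc)

X = rng.normal(size=(T, n_sensors))        # multivariate time series window
W_enc = rng.normal(size=(n_sensors, n_feat))
F = encode(X, W_enc)                       # the shared feature matrix

# Three task heads on the same feature matrix -> point-wise outputs
W_anom = rng.normal(size=(n_feat, 1))      # anomaly score per time point
W_diag = rng.normal(size=(n_feat, 3))      # scores over 3 anomaly types
W_event = rng.normal(size=(n_feat, 1))     # event score per time point

anomaly_scores = F @ W_anom
diagnosis_logits = F @ W_diag
event_scores = F @ W_event
print(anomaly_scores.shape, diagnosis_logits.shape, event_scores.shape)
```

Because every head reads the same per-time-point features, the three tasks regularise one another during training and all produce point-wise, end-to-end outputs.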

